From nobody Sun Nov 24 18:33:01 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1717751430; cv=none; d=zohomail.com; s=zohoarc; b=aPlkaev/44c9IT4+uj8EX5phqVk6yhsAwC4vF0hPXjhD2rxeJiyH4ExmbEUuMS1jhh6SRTN6oPc8xoOvGJI12oSZ/N1v2Nky+7I9fe2BBIzQd5C71gx8WUmSqPpVwHmAjnNLSSmmBAheFUHdJb4giA+wgmbHoVRtPUHi5om3UdA= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1717751430; h=Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To; bh=mc4++sRx3ukGDekZxB1N0n7JzUxaxxhg3TlGVFiWWaQ=; b=JcHXH2GZw1LuV/dvP2ilZyQq5df4M19gpoHtqruEf3hjAQUBvkVYxgUk48v3gvtDAzUkHa6Tbt8DHP7QuJudYRBRasatoY+hYwXEyCFZ0vi1rKHq6t5rzSRMeAIwbRRgSnSgXVehAzSK76arTIacObLrLsHfpZ12F11kX0jAQuc= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1717751430832521.4223072025765; Fri, 7 Jun 2024 02:10:30 -0700 (PDT) Received: from list by lists.xenproject.org with outflank-mailman.736429.1142518 (Exim 4.92) (envelope-from ) id 1sFVbs-0007ZO-6j; Fri, 07 Jun 2024 09:10:08 +0000 Received: by outflank-mailman (output) from mailman id 736429.1142518; Fri, 07 Jun 2024 09:10:08 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1sFVbs-0007ZD-34; Fri, 07 Jun 2024 09:10:08 +0000 Received: by outflank-mailman (input) for mailman id 736429; Fri, 07 Jun 2024 09:10:06 +0000 Received: from se1-gles-sth1-in.inumbo.com ([159.253.27.254] helo=se1-gles-sth1.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1sFVbq-0006Bc-3b for xen-devel@lists.xenproject.org; Fri, 07 Jun 2024 09:10:06 +0000 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by se1-gles-sth1.inumbo.com (Halon) with ESMTPS id b9f949be-24ad-11ef-90a2-e314d9c70b13; Fri, 07 Jun 2024 11:10:05 +0200 (CEST) Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-606-eUzDhlzTMDqdmYuPAblQDA-1; Fri, 07 Jun 2024 05:09:56 -0400 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 964D43C01221; Fri, 7 Jun 2024 09:09:55 +0000 (UTC) Received: from t14s.fritz.box (unknown [10.39.194.94]) by smtp.corp.redhat.com (Postfix) with ESMTP id 30BEC37E5; Fri, 7 Jun 2024 09:09:51 +0000 (UTC) X-Outflank-Mailman: Message body and most headers restored to incoming version X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: b9f949be-24ad-11ef-90a2-e314d9c70b13 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1717751403; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mc4++sRx3ukGDekZxB1N0n7JzUxaxxhg3TlGVFiWWaQ=; b=i3l+1xbWhuleJwBKwwSrDqP43KsQnfifvtyAnYuVufCcL0d0NFBBHXPiLH0MHGUKkDxvfj WWrtixlSiV3Fj/7e8bSJYMfx5LkRmpSLW0vEc4HIqqBnUlpQeOlJpHndGG82/ltRA9tDGN XsuZBbOBpONWjoqPTYqFHPyaYQlzwsg= X-MC-Unique: eUzDhlzTMDqdmYuPAblQDA-1 From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, linux-hyperv@vger.kernel.org, virtualization@lists.linux.dev, xen-devel@lists.xenproject.org, kasan-dev@googlegroups.com, David Hildenbrand , Andrew Morton , Mike Rapoport , Oscar Salvador , "K. Y. Srinivasan" , Haiyang Zhang , Wei Liu , Dexuan Cui , "Michael S. Tsirkin" , Jason Wang , Xuan Zhuo , =?UTF-8?q?Eugenio=20P=C3=A9rez?= , Juergen Gross , Stefano Stabellini , Oleksandr Tyshchenko , Alexander Potapenko , Marco Elver , Dmitry Vyukov Subject: [PATCH v1 2/3] mm/memory_hotplug: initialize memmap of !ZONE_DEVICE with PageOffline() instead of PageReserved() Date: Fri, 7 Jun 2024 11:09:37 +0200 Message-ID: <20240607090939.89524-3-david@redhat.com> In-Reply-To: <20240607090939.89524-1-david@redhat.com> References: <20240607090939.89524-1-david@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.1 X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1717751432117100003 Content-Type: text/plain; charset="utf-8" We currently initialize the memmap such that PG_reserved is set and the refcount of the page is 1. In virtio-mem code, we have to manually clear that PG_reserved flag to make memory offlining with partially hotplugged memory blocks possible: has_unmovable_pages() would otherwise bail out on such pages. We want to avoid PG_reserved where possible and move to typed pages instead. Further, we want to further enlighten memory offlining code about PG_offline: offline pages in an online memory section. One example is handling managed page count adjustments in a cleaner way during memory offlining. So let's initialize the pages with PG_offline instead of PG_reserved. generic_online_page()->__free_pages_core() will now clear that flag before handing that memory to the buddy. Note that the page refcount is still 1 and would forbid offlining of such memory except when special care is take during GOING_OFFLINE as currently only implemented by virtio-mem. With this change, we can now get non-PageReserved() pages in the XEN balloon list. From what I can tell, that can already happen via decrease_reservation(), so that should be fine. HV-balloon should not really observe a change: partial online memory blocks still cannot get surprise-offlined, because the refcount of these PageOffline() pages is 1. Update virtio-mem, HV-balloon and XEN-balloon code to be aware that hotplugged pages are now PageOffline() instead of PageReserved() before they are handed over to the buddy. We'll leave the ZONE_DEVICE case alone for now. Signed-off-by: David Hildenbrand Acked-by: Oscar Salvador # for the generic --- drivers/hv/hv_balloon.c | 5 ++--- drivers/virtio/virtio_mem.c | 18 ++++++++++++------ drivers/xen/balloon.c | 9 +++++++-- include/linux/page-flags.h | 12 +++++------- mm/memory_hotplug.c | 16 ++++++++++------ mm/mm_init.c | 10 ++++++++-- mm/page_alloc.c | 32 +++++++++++++++++++++++--------- 7 files changed, 67 insertions(+), 35 deletions(-) diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c index e000fa3b9f978..c1be38edd8361 100644 --- a/drivers/hv/hv_balloon.c +++ b/drivers/hv/hv_balloon.c @@ -693,9 +693,8 @@ static void hv_page_online_one(struct hv_hotadd_state *= has, struct page *pg) if (!PageOffline(pg)) __SetPageOffline(pg); return; - } - if (PageOffline(pg)) - __ClearPageOffline(pg); + } else if (!PageOffline(pg)) + return; =20 /* This frame is currently backed; online the page. */ generic_online_page(pg, 0); diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index a3857bacc8446..b90df29621c81 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -1146,12 +1146,16 @@ static void virtio_mem_set_fake_offline(unsigned lo= ng pfn, for (; nr_pages--; pfn++) { struct page *page =3D pfn_to_page(pfn); =20 - __SetPageOffline(page); - if (!onlined) { + if (!onlined) + /* + * Pages that have not been onlined yet were initialized + * to PageOffline(). Remember that we have to route them + * through generic_online_page(). + */ SetPageDirty(page); - /* FIXME: remove after cleanups */ - ClearPageReserved(page); - } + else + __SetPageOffline(page); + VM_WARN_ON_ONCE(!PageOffline(page)); } page_offline_end(); } @@ -1166,9 +1170,11 @@ static void virtio_mem_clear_fake_offline(unsigned l= ong pfn, for (; nr_pages--; pfn++) { struct page *page =3D pfn_to_page(pfn); =20 - __ClearPageOffline(page); if (!onlined) + /* generic_online_page() will clear PageOffline(). */ ClearPageDirty(page); + else + __ClearPageOffline(page); } } =20 diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c index aaf2514fcfa46..528395133b4f8 100644 --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -146,7 +146,8 @@ static DECLARE_WAIT_QUEUE_HEAD(balloon_wq); /* balloon_append: add the given page to the balloon. */ static void balloon_append(struct page *page) { - __SetPageOffline(page); + if (!PageOffline(page)) + __SetPageOffline(page); =20 /* Lowmem is re-populated first, so highmem pages go at list tail. */ if (PageHighMem(page)) { @@ -412,7 +413,11 @@ static enum bp_state increase_reservation(unsigned lon= g nr_pages) =20 xenmem_reservation_va_mapping_update(1, &page, &frame_list[i]); =20 - /* Relinquish the page back to the allocator. */ + /* + * Relinquish the page back to the allocator. Note that + * some pages, including ones added via xen_online_page(), might + * not be marked reserved; free_reserved_page() will handle that. + */ free_reserved_page(page); } =20 diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index f04fea86324d9..e0362ce7fc109 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -30,16 +30,11 @@ * - Pages falling into physical memory gaps - not IORESOURCE_SYSRAM. Tryi= ng * to read/write these pages might end badly. Don't touch! * - The zero page(s) - * - Pages not added to the page allocator when onlining a section because - * they were excluded via the online_page_callback() or because they are - * PG_hwpoison. * - Pages allocated in the context of kexec/kdump (loaded kernel image, * control pages, vmcoreinfo) * - MMIO/DMA pages. Some architectures don't allow to ioremap pages that = are * not marked PG_reserved (as they might be in use by somebody else who = does * not respect the caching strategy). - * - Pages part of an offline section (struct pages of offline sections sh= ould - * not be trusted as they will be initialized when first onlined). * - MCA pages on ia64 * - Pages holding CPU notes for POWER Firmware Assisted Dump * - Device memory (e.g. PMEM, DAX, HMM) @@ -1021,6 +1016,10 @@ PAGE_TYPE_OPS(Buddy, buddy, buddy) * The content of these pages is effectively stale. Such pages should not * be touched (read/write/dump/save) except by their owner. * + * When a memory block gets onlined, all pages are initialized with a + * refcount of 1 and PageOffline(). generic_online_page() will + * take care of clearing PageOffline(). + * * If a driver wants to allow to offline unmovable PageOffline() pages wit= hout * putting them back to the buddy, it can do so via the memory notifier by * decrementing the reference count in MEM_GOING_OFFLINE and incrementing = the @@ -1028,8 +1027,7 @@ PAGE_TYPE_OPS(Buddy, buddy, buddy) * pages (now with a reference count of zero) are treated like free pages, * allowing the containing memory block to get offlined. A driver that * relies on this feature is aware that re-onlining the memory block will - * require to re-set the pages PageOffline() and not giving them to the - * buddy via online_page_callback_t. + * require not giving them to the buddy via generic_online_page(). * * There are drivers that mark a page PageOffline() and expect there won't= be * any further access to page content. PFN walkers that read content of ra= ndom diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 27e3be75edcf7..0254059efcbe1 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -734,7 +734,7 @@ static inline void section_taint_zone_device(unsigned l= ong pfn) /* * Associate the pfn range with the given zone, initializing the memmaps * and resizing the pgdat/zone data to span the added pages. After this - * call, all affected pages are PG_reserved. + * call, all affected pages are PageOffline(). * * All aligned pageblocks are initialized to the specified migratetype * (usually MIGRATE_MOVABLE). Besides setting the migratetype, no related @@ -1100,8 +1100,12 @@ int mhp_init_memmap_on_memory(unsigned long pfn, uns= igned long nr_pages, =20 move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE); =20 - for (i =3D 0; i < nr_pages; i++) - SetPageVmemmapSelfHosted(pfn_to_page(pfn + i)); + for (i =3D 0; i < nr_pages; i++) { + struct page *page =3D pfn_to_page(pfn + i); + + __ClearPageOffline(page); + SetPageVmemmapSelfHosted(page); + } =20 /* * It might be that the vmemmap_pages fully span sections. If that is @@ -1959,9 +1963,9 @@ int __ref offline_pages(unsigned long start_pfn, unsi= gned long nr_pages, * Don't allow to offline memory blocks that contain holes. * Consequently, memory blocks with holes can never get onlined * via the hotplug path - online_pages() - as hotplugged memory has - * no holes. This way, we e.g., don't have to worry about marking - * memory holes PG_reserved, don't need pfn_valid() checks, and can - * avoid using walk_system_ram_range() later. + * no holes. This way, we don't have to worry about memory holes, + * don't need pfn_valid() checks, and can avoid using + * walk_system_ram_range() later. */ walk_system_ram_range(start_pfn, nr_pages, &system_ram_pages, count_system_ram_pages_cb); diff --git a/mm/mm_init.c b/mm/mm_init.c index feb5b6e8c8875..c066c1c474837 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -892,8 +892,14 @@ void __meminit memmap_init_range(unsigned long size, i= nt nid, unsigned long zone =20 page =3D pfn_to_page(pfn); __init_single_page(page, pfn, zone, nid); - if (context =3D=3D MEMINIT_HOTPLUG) - __SetPageReserved(page); + if (context =3D=3D MEMINIT_HOTPLUG) { +#ifdef CONFIG_ZONE_DEVICE + if (zone =3D=3D ZONE_DEVICE) + __SetPageReserved(page); + else +#endif + __SetPageOffline(page); + } =20 /* * Usually, we want to mark the pageblock MIGRATE_MOVABLE, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index e0c8a8354be36..039bc52cc9091 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1225,18 +1225,23 @@ void __free_pages_core(struct page *page, unsigned = int order, * When initializing the memmap, __init_single_page() sets the refcount * of all pages to 1 ("allocated"/"not free"). We have to set the * refcount of all involved pages to 0. + * + * Note that hotplugged memory pages are initialized to PageOffline(). + * Pages freed from memblock might be marked as reserved. */ - prefetchw(p); - for (loop =3D 0; loop < (nr_pages - 1); loop++, p++) { - prefetchw(p + 1); - __ClearPageReserved(p); - set_page_count(p, 0); - } - __ClearPageReserved(p); - set_page_count(p, 0); - if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG) && unlikely(context =3D=3D MEMINIT_HOTPLUG)) { + prefetchw(p); + for (loop =3D 0; loop < (nr_pages - 1); loop++, p++) { + prefetchw(p + 1); + VM_WARN_ON_ONCE(PageReserved(p)); + __ClearPageOffline(p); + set_page_count(p, 0); + } + VM_WARN_ON_ONCE(PageReserved(p)); + __ClearPageOffline(p); + set_page_count(p, 0); + /* * Freeing the page with debug_pagealloc enabled will try to * unmap it; some archs don't like double-unmappings, so @@ -1245,6 +1250,15 @@ void __free_pages_core(struct page *page, unsigned i= nt order, debug_pagealloc_map_pages(page, nr_pages); adjust_managed_page_count(page, nr_pages); } else { + prefetchw(p); + for (loop =3D 0; loop < (nr_pages - 1); loop++, p++) { + prefetchw(p + 1); + __ClearPageReserved(p); + set_page_count(p, 0); + } + __ClearPageReserved(p); + set_page_count(p, 0); + /* memblock adjusts totalram_pages() ahead of time. */ atomic_long_add(nr_pages, &page_zone(page)->managed_pages); } --=20 2.45.1