From nobody Sat Jun 20 00:54:54 2026
From: Zi Yan
To: David Hildenbrand, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, Vlastimil Babka, Mel Gorman, Eric Ren, Mike Rapoport, Oscar Salvador, Christophe Leroy, Zi Yan, Mike Rapoport
Subject: [PATCH v9 1/5] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c
Date: Thu, 24 Mar 2022 18:44:31 -0400
Message-Id: <20220324224435.17794-2-zi.yan@sent.com>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20220324224435.17794-1-zi.yan@sent.com>
References: <20220324224435.17794-1-zi.yan@sent.com>
Reply-To: Zi Yan

From: Zi Yan

has_unmovable_pages()
is only used in mm/page_isolation.c. Move it from mm/page_alloc.c and
make it static.

Signed-off-by: Zi Yan
Reviewed-by: Oscar Salvador
Reviewed-by: Mike Rapoport
Acked-by: David Hildenbrand
---
 include/linux/page-isolation.h |   2 -
 mm/page_alloc.c                | 119 ---------------------------------
 mm/page_isolation.c            | 119 +++++++++++++++++++++++++++++++++
 3 files changed, 119 insertions(+), 121 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 572458016331..e14eddf6741a 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,8 +33,6 @@ static inline bool is_migrate_isolate(int migratetype)
 #define MEMORY_OFFLINE	0x1
 #define REPORT_FAILURE	0x2
 
-struct page *has_unmovable_pages(struct zone *zone, struct page *page,
-				 int migratetype, int flags);
 void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
 				int migratetype, int *num_movable);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f648decfe39d..6de57d058d3d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8936,125 +8936,6 @@ void *__init alloc_large_system_hash(const char *tablename,
 	return table;
 }
 
-/*
- * This function checks whether pageblock includes unmovable pages or not.
- *
- * PageLRU check without isolation or lru_lock could race so that
- * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
- * check without lock_page also may miss some movable non-lru pages at
- * race condition. So you can't expect this function should be exact.
- *
- * Returns a page without holding a reference. If the caller wants to
- * dereference that page (e.g., dumping), it has to make sure that it
- * cannot get removed (e.g., via memory unplug) concurrently.
- *
- */
-struct page *has_unmovable_pages(struct zone *zone, struct page *page,
-				 int migratetype, int flags)
-{
-	unsigned long iter = 0;
-	unsigned long pfn = page_to_pfn(page);
-	unsigned long offset = pfn % pageblock_nr_pages;
-
-	if (is_migrate_cma_page(page)) {
-		/*
-		 * CMA allocations (alloc_contig_range) really need to mark
-		 * isolate CMA pageblocks even when they are not movable in fact
-		 * so consider them movable here.
-		 */
-		if (is_migrate_cma(migratetype))
-			return NULL;
-
-		return page;
-	}
-
-	for (; iter < pageblock_nr_pages - offset; iter++) {
-		page = pfn_to_page(pfn + iter);
-
-		/*
-		 * Both, bootmem allocations and memory holes are marked
-		 * PG_reserved and are unmovable. We can even have unmovable
-		 * allocations inside ZONE_MOVABLE, for example when
-		 * specifying "movablecore".
-		 */
-		if (PageReserved(page))
-			return page;
-
-		/*
-		 * If the zone is movable and we have ruled out all reserved
-		 * pages then it should be reasonably safe to assume the rest
-		 * is movable.
-		 */
-		if (zone_idx(zone) == ZONE_MOVABLE)
-			continue;
-
-		/*
-		 * Hugepages are not in LRU lists, but they're movable.
-		 * THPs are on the LRU, but need to be counted as #small pages.
-		 * We need not scan over tail pages because we don't
-		 * handle each tail page individually in migration.
-		 */
-		if (PageHuge(page) || PageTransCompound(page)) {
-			struct page *head = compound_head(page);
-			unsigned int skip_pages;
-
-			if (PageHuge(page)) {
-				if (!hugepage_migration_supported(page_hstate(head)))
-					return page;
-			} else if (!PageLRU(head) && !__PageMovable(head)) {
-				return page;
-			}
-
-			skip_pages = compound_nr(head) - (page - head);
-			iter += skip_pages - 1;
-			continue;
-		}
-
-		/*
-		 * We can't use page_count without pin a page
-		 * because another CPU can free compound page.
-		 * This check already skips compound tails of THP
-		 * because their page->_refcount is zero at all time.
-		 */
-		if (!page_ref_count(page)) {
-			if (PageBuddy(page))
-				iter += (1 << buddy_order(page)) - 1;
-			continue;
-		}
-
-		/*
-		 * The HWPoisoned page may be not in buddy system, and
-		 * page_count() is not 0.
-		 */
-		if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
-			continue;
-
-		/*
-		 * We treat all PageOffline() pages as movable when offlining
-		 * to give drivers a chance to decrement their reference count
-		 * in MEM_GOING_OFFLINE in order to indicate that these pages
-		 * can be offlined as there are no direct references anymore.
-		 * For actually unmovable PageOffline() where the driver does
-		 * not support this, we will fail later when trying to actually
-		 * move these pages that still have a reference count > 0.
-		 * (false negatives in this function only)
-		 */
-		if ((flags & MEMORY_OFFLINE) && PageOffline(page))
-			continue;
-
-		if (__PageMovable(page) || PageLRU(page))
-			continue;
-
-		/*
-		 * If there are RECLAIMABLE pages, we need to check
-		 * it. But now, memory offline itself doesn't call
-		 * shrink_node_slabs() and it still to be fixed.
-		 */
-		return page;
-	}
-	return NULL;
-}
-
 #ifdef CONFIG_CONTIG_ALLOC
 static unsigned long pfn_max_align_down(unsigned long pfn)
 {
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index f67c4c70f17f..b34f1310aeaa 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -15,6 +15,125 @@
 #define CREATE_TRACE_POINTS
 #include
 
+/*
+ * This function checks whether pageblock includes unmovable pages or not.
+ *
+ * PageLRU check without isolation or lru_lock could race so that
+ * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
+ * check without lock_page also may miss some movable non-lru pages at
+ * race condition. So you can't expect this function should be exact.
+ *
+ * Returns a page without holding a reference. If the caller wants to
+ * dereference that page (e.g., dumping), it has to make sure that it
+ * cannot get removed (e.g., via memory unplug) concurrently.
+ *
+ */
+static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
+				 int migratetype, int flags)
+{
+	unsigned long iter = 0;
+	unsigned long pfn = page_to_pfn(page);
+	unsigned long offset = pfn % pageblock_nr_pages;
+
+	if (is_migrate_cma_page(page)) {
+		/*
+		 * CMA allocations (alloc_contig_range) really need to mark
+		 * isolate CMA pageblocks even when they are not movable in fact
+		 * so consider them movable here.
+		 */
+		if (is_migrate_cma(migratetype))
+			return NULL;
+
+		return page;
+	}
+
+	for (; iter < pageblock_nr_pages - offset; iter++) {
+		page = pfn_to_page(pfn + iter);
+
+		/*
+		 * Both, bootmem allocations and memory holes are marked
+		 * PG_reserved and are unmovable. We can even have unmovable
+		 * allocations inside ZONE_MOVABLE, for example when
+		 * specifying "movablecore".
+		 */
+		if (PageReserved(page))
+			return page;
+
+		/*
+		 * If the zone is movable and we have ruled out all reserved
+		 * pages then it should be reasonably safe to assume the rest
+		 * is movable.
+		 */
+		if (zone_idx(zone) == ZONE_MOVABLE)
+			continue;
+
+		/*
+		 * Hugepages are not in LRU lists, but they're movable.
+		 * THPs are on the LRU, but need to be counted as #small pages.
+		 * We need not scan over tail pages because we don't
+		 * handle each tail page individually in migration.
+		 */
+		if (PageHuge(page) || PageTransCompound(page)) {
+			struct page *head = compound_head(page);
+			unsigned int skip_pages;
+
+			if (PageHuge(page)) {
+				if (!hugepage_migration_supported(page_hstate(head)))
+					return page;
+			} else if (!PageLRU(head) && !__PageMovable(head)) {
+				return page;
+			}
+
+			skip_pages = compound_nr(head) - (page - head);
+			iter += skip_pages - 1;
+			continue;
+		}
+
+		/*
+		 * We can't use page_count without pin a page
+		 * because another CPU can free compound page.
+		 * This check already skips compound tails of THP
+		 * because their page->_refcount is zero at all time.
+		 */
+		if (!page_ref_count(page)) {
+			if (PageBuddy(page))
+				iter += (1 << buddy_order(page)) - 1;
+			continue;
+		}
+
+		/*
+		 * The HWPoisoned page may be not in buddy system, and
+		 * page_count() is not 0.
+		 */
+		if ((flags & MEMORY_OFFLINE) && PageHWPoison(page))
+			continue;
+
+		/*
+		 * We treat all PageOffline() pages as movable when offlining
+		 * to give drivers a chance to decrement their reference count
+		 * in MEM_GOING_OFFLINE in order to indicate that these pages
+		 * can be offlined as there are no direct references anymore.
+		 * For actually unmovable PageOffline() where the driver does
+		 * not support this, we will fail later when trying to actually
+		 * move these pages that still have a reference count > 0.
+		 * (false negatives in this function only)
+		 */
+		if ((flags & MEMORY_OFFLINE) && PageOffline(page))
+			continue;
+
+		if (__PageMovable(page) || PageLRU(page))
+			continue;
+
+		/*
+		 * If there are RECLAIMABLE pages, we need to check
+		 * it. But now, memory offline itself doesn't call
+		 * shrink_node_slabs() and it still to be fixed.
+		 */
+		return page;
+	}
+	return NULL;
+}
+
 static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
 {
 	struct zone *zone = page_zone(page);
-- 
2.35.1

From nobody Sat Jun 20 00:54:54 2026
From: Zi Yan
To: David Hildenbrand, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, Vlastimil Babka, Mel Gorman, Eric Ren, Mike Rapoport, Oscar Salvador, Christophe Leroy, Zi Yan
Subject: [PATCH v9 2/5] mm: page_isolation: check specified range for unmovable pages
Date: Thu, 24 Mar 2022 18:44:32 -0400
Message-Id: <20220324224435.17794-3-zi.yan@sent.com>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20220324224435.17794-1-zi.yan@sent.com>
References: <20220324224435.17794-1-zi.yan@sent.com>
Reply-To: Zi Yan

From: Zi Yan

Enable set_migratetype_isolate() to check a specified sub-range for
unmovable pages during isolation. Page isolation is done at
MAX_ORDER_NR_PAGES granularity, but not all pages within that granularity
are intended to be isolated. For example, alloc_contig_range(), which uses
page isolation, allows ranges without alignment. This commit makes the
unmovable page check look only at the pages of interest, so that page
isolation can succeed for any non-overlapping ranges.

Signed-off-by: Zi Yan
---
 mm/page_alloc.c     | 16 ++--------
 mm/page_isolation.c | 78 ++++++++++++++++++++++++++++++++-------------
 2 files changed, 57 insertions(+), 37 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6de57d058d3d..f24fe057389f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8937,16 +8937,6 @@ void *__init alloc_large_system_hash(const char *tablename,
 }
 
 #ifdef CONFIG_CONTIG_ALLOC
-static unsigned long pfn_max_align_down(unsigned long pfn)
-{
-	return ALIGN_DOWN(pfn, MAX_ORDER_NR_PAGES);
-}
-
-static unsigned long pfn_max_align_up(unsigned long pfn)
-{
-	return ALIGN(pfn, MAX_ORDER_NR_PAGES);
-}
-
 #if defined(CONFIG_DYNAMIC_DEBUG) || \
 	(defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
 /* Usage: See admin-guide/dynamic-debug-howto.rst */
@@ -9091,8 +9081,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * put back to page allocator so that buddy can use them.
 	 */
 
-	ret = start_isolate_page_range(pfn_max_align_down(start),
-				       pfn_max_align_up(end), migratetype, 0);
+	ret = start_isolate_page_range(start, end, migratetype, 0);
 	if (ret)
 		return ret;
 
@@ -9173,8 +9162,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 		free_contig_range(end, outer_end - end);
 
 done:
-	undo_isolate_page_range(pfn_max_align_down(start),
-				pfn_max_align_up(end), migratetype);
+	undo_isolate_page_range(start, end, migratetype);
 	return ret;
 }
 EXPORT_SYMBOL(alloc_contig_range);
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index b34f1310aeaa..0223c9a4cff3 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -16,7 +16,9 @@
 #include
 
 /*
- * This function checks whether pageblock includes unmovable pages or not.
+ * This function checks whether the range [start_pfn, end_pfn) includes
+ * unmovable pages or not. The range must fall into a single pageblock and
+ * consequently belong to a single zone.
  *
  * PageLRU check without isolation or lru_lock could race so that
  * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
@@ -28,12 +30,14 @@
  * cannot get removed (e.g., via memory unplug) concurrently.
  *
  */
-static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
-				 int migratetype, int flags)
+static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long end_pfn,
+				 int migratetype, int flags)
 {
-	unsigned long iter = 0;
-	unsigned long pfn = page_to_pfn(page);
-	unsigned long offset = pfn % pageblock_nr_pages;
+	unsigned long pfn = start_pfn;
+	struct page *page = pfn_to_page(pfn);
+
+	VM_BUG_ON(ALIGN_DOWN(start_pfn, pageblock_nr_pages) !=
+		  ALIGN_DOWN(end_pfn - 1, pageblock_nr_pages));
 
 	if (is_migrate_cma_page(page)) {
 		/*
@@ -47,8 +51,11 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 		return page;
 	}
 
-	for (; iter < pageblock_nr_pages - offset; iter++) {
-		page = pfn_to_page(pfn + iter);
+	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+		struct zone *zone;
+
+		page = pfn_to_page(pfn);
+		zone = page_zone(page);
 
 		/*
 		 * Both, bootmem allocations and memory holes are marked
@@ -85,7 +92,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 			}
 
 			skip_pages = compound_nr(head) - (page - head);
-			iter += skip_pages - 1;
+			pfn += skip_pages - 1;
 			continue;
 		}
 
@@ -97,7 +104,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 		 */
 		if (!page_ref_count(page)) {
 			if (PageBuddy(page))
-				iter += (1 << buddy_order(page)) - 1;
+				pfn += (1 << buddy_order(page)) - 1;
 			continue;
 		}
 
@@ -134,11 +141,18 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 	return NULL;
 }
 
-static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
+/*
+ * This function set pageblock migratetype to isolate if no unmovable page is
+ * present in [start_pfn, end_pfn). The pageblock must intersect with
+ * [start_pfn, end_pfn).
+ */
+static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags,
+			unsigned long start_pfn, unsigned long end_pfn)
 {
 	struct zone *zone = page_zone(page);
 	struct page *unmovable;
 	unsigned long flags;
+	unsigned long check_unmovable_start, check_unmovable_end;
 
 	spin_lock_irqsave(&zone->lock, flags);
 
@@ -155,8 +169,16 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
 	/*
 	 * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
 	 * We just check MOVABLE pages.
+	 *
+	 * Pass the intersection of [start_pfn, end_pfn) and the page's pageblock
+	 * to avoid redundant checks.
 	 */
-	unmovable = has_unmovable_pages(zone, page, migratetype, isol_flags);
+	check_unmovable_start = max(page_to_pfn(page), start_pfn);
+	check_unmovable_end = min(ALIGN(page_to_pfn(page) + 1, pageblock_nr_pages),
+				  end_pfn);
+
+	unmovable = has_unmovable_pages(check_unmovable_start, check_unmovable_end,
+			migratetype, isol_flags);
 	if (!unmovable) {
 		unsigned long nr_pages;
 		int mt = get_pageblock_migratetype(page);
@@ -262,12 +284,21 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 	return NULL;
 }
 
+static unsigned long pfn_max_align_down(unsigned long pfn)
+{
+	return ALIGN_DOWN(pfn, MAX_ORDER_NR_PAGES);
+}
+
+static unsigned long pfn_max_align_up(unsigned long pfn)
+{
+	return ALIGN(pfn, MAX_ORDER_NR_PAGES);
+}
+
 /**
  * start_isolate_page_range() - make page-allocation-type of range of pages to
  * be MIGRATE_ISOLATE.
  * @start_pfn:		The lower PFN of the range to be isolated.
  * @end_pfn:		The upper PFN of the range to be isolated.
- *			start_pfn/end_pfn must be aligned to pageblock_order.
  * @migratetype:	Migrate type to set in error recovery.
* @flags: The following flags are allowed (they can be combined in * a bit mask) @@ -309,15 +340,16 @@ int start_isolate_page_range(unsigned long start_pfn,= unsigned long end_pfn, unsigned long pfn; struct page *page; =20 - BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages)); - BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages)); + unsigned long isolate_start =3D pfn_max_align_down(start_pfn); + unsigned long isolate_end =3D pfn_max_align_up(end_pfn); =20 - for (pfn =3D start_pfn; - pfn < end_pfn; + for (pfn =3D isolate_start; + pfn < isolate_end; pfn +=3D pageblock_nr_pages) { page =3D __first_valid_page(pfn, pageblock_nr_pages); - if (page && set_migratetype_isolate(page, migratetype, flags)) { - undo_isolate_page_range(start_pfn, pfn, migratetype); + if (page && set_migratetype_isolate(page, migratetype, flags, + start_pfn, end_pfn)) { + undo_isolate_page_range(isolate_start, pfn, migratetype); return -EBUSY; } } @@ -332,12 +364,12 @@ void undo_isolate_page_range(unsigned long start_pfn,= unsigned long end_pfn, { unsigned long pfn; struct page *page; + unsigned long isolate_start =3D pfn_max_align_down(start_pfn); + unsigned long isolate_end =3D pfn_max_align_up(end_pfn); =20 - BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages)); - BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages)); =20 - for (pfn =3D start_pfn; - pfn < end_pfn; + for (pfn =3D isolate_start; + pfn < isolate_end; pfn +=3D pageblock_nr_pages) { page =3D __first_valid_page(pfn, pageblock_nr_pages); if (!page || !is_migrate_isolate_page(page)) --=20 2.35.1 From nobody Sat Jun 20 00:54:54 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8F4C8C433F5 for ; Thu, 24 Mar 2022 22:45:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1353801AbiCXWqg (ORCPT ); Thu, 24 Mar 2022 18:46:36 -0400 
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40118 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1355497AbiCXWqR (ORCPT ); Thu, 24 Mar 2022 18:46:17 -0400 Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3C1C9B6E4E for ; Thu, 24 Mar 2022 15:44:42 -0700 (PDT) Received: from compute2.internal (compute2.nyi.internal [10.202.2.46]) by mailout.nyi.internal (Postfix) with ESMTP id 9A8B55C01A0; Thu, 24 Mar 2022 18:44:41 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute2.internal (MEProxy); Thu, 24 Mar 2022 18:44:41 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sent.com; h=cc :cc:content-transfer-encoding:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to :reply-to:sender:subject:subject:to:to; s=fm3; bh=w52cqJdzfOThCh +TtTPJbCC7qinoFmlJ6eyWd7DNA6c=; b=o2rDRRjB42zopoLrHPKY1Mrn8ZJDdN dZkC8p+09ZPCAjZxjHIr6wVyIizn1kz5N9H/k2yxmDUjoPmikZw7PGaAzVJ/Dtvt S49kIZomrwgmr3KZwUbn5WNQLW07vzHZDCrc2ElB61PGMILXln7ra5PHmHm3vg+R H8iYM2Q+tCGOF8cfLy7exwDRk4fNJjoZZhgWEBNwbTXhqPM0X3/M5C2DKe6CBEc4 vcuXxCbYpJdcNTCh8vijgpKMWzIODP79SP66e5FzLx2U7SN96wksuwzIAL7kxNDc 8/6TIjLlYIB1MGnVP4hK6GUSniu8eOG620kr1xiXg8+HcB00qe720oEw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding:date:date :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:reply-to:sender:subject:subject:to:to :x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; bh=w52cqJdzfOThCh+TtTPJbCC7qinoFmlJ6eyWd7DNA6c=; b=c60eWj+y LkqUfVtvq2yuCXdD+ifKlhRcLH1kM7jDQRXlCUDpmch2/6LExup1jKIC7ce+ccL3 DLCXXHChNUAmmIfOtdnqBum+no0sTUHOoMabIMHsXqpzWeInQaLr06Y5YTkdJcoe QlMQ7B+pdnVgOCgoLqyaNc51a0vJYTcLDM/SPZ7mPH2dM5JMP68rFQHnRiQ0UB6O x8pjE+kqxI/O1La1KVEUi3N07qjeYfaXBLGGPTIRQDDOzFhvoGpwktqlg0qexDno 
YKpehHHih9XDFfkZr0bjWsAx6LyVIbtifT1Qi1nOZUopVk+6SUep/bYL7lF7dPw+ zc1PG4QpP4qelA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvvddrudegledgudeivdcutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd enucfjughrpefhvffufffkofgjfhhrggfgsedtqhertdertddtnecuhfhrohhmpegkihcu jggrnhcuoeiiihdrhigrnhesshgvnhhtrdgtohhmqeenucggtffrrghtthgvrhhnpeeije euvdeuudeuhfeghfehieeuvdetvdeugfeigeevteeuieeuhedtgeduheefleenucevlhhu shhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpeiiihdrhigrnhessh gvnhhtrdgtohhm X-ME-Proxy: Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 24 Mar 2022 18:44:41 -0400 (EDT) From: Zi Yan To: David Hildenbrand , linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, Vlastimil Babka , Mel Gorman , Eric Ren , Mike Rapoport , Oscar Salvador , Christophe Leroy , Zi Yan , kernel test robot Subject: [PATCH v9 3/5] mm: make alloc_contig_range work at pageblock granularity Date: Thu, 24 Mar 2022 18:44:33 -0400 Message-Id: <20220324224435.17794-4-zi.yan@sent.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220324224435.17794-1-zi.yan@sent.com> References: <20220324224435.17794-1-zi.yan@sent.com> Reply-To: Zi Yan MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Zi Yan alloc_contig_range() worked at MAX_ORDER_NR_PAGES granularity to avoid merging pageblocks with different migratetypes. It might unnecessarily convert extra pageblocks at the beginning and at the end of the range. Change alloc_contig_range() to work at pageblock granularity. Special handling is needed for free pages and in-use pages across the boundaries of the range specified alloc_contig_range(). Because these partially isolated pages causes free page accounting issues. 
The free pages will be split and freed into separate migratetype lists; the in-use pages will be migrated then the freed pages will be handled. Reported-by: kernel test robot Signed-off-by: Zi Yan --- include/linux/page-isolation.h | 2 +- mm/internal.h | 6 ++ mm/memory_hotplug.c | 3 +- mm/page_alloc.c | 107 +++++++++---------- mm/page_isolation.c | 189 ++++++++++++++++++++++++++++++--- 5 files changed, 236 insertions(+), 71 deletions(-) diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index e14eddf6741a..52060514f920 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -42,7 +42,7 @@ int move_freepages_block(struct zone *zone, struct page *= page, */ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, - unsigned migratetype, int flags); + unsigned migratetype, int flags, gfp_t gfp_flags); =20 /* * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE. diff --git a/mm/internal.h b/mm/internal.h index 9be0227ccc94..9d0a6a898ba8 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -269,6 +269,9 @@ extern void *memmap_alloc(phys_addr_t size, phys_addr_t= align, phys_addr_t min_addr, int nid, bool exact_nid); =20 +void split_free_page(struct page *free_page, + int order, unsigned long split_pfn_offset); + #if defined CONFIG_COMPACTION || defined CONFIG_CMA =20 /* @@ -332,6 +335,9 @@ isolate_freepages_range(struct compact_control *cc, int isolate_migratepages_range(struct compact_control *cc, unsigned long low_pfn, unsigned long end_pfn); + +int __alloc_contig_migrate_range(struct compact_control *cc, + unsigned long start, unsigned long end); #endif int find_suitable_fallback(struct free_area *area, unsigned int order, int migratetype, bool only_stealable, bool *can_steal); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 416b38ca8def..1cf4d4b60772 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1836,7 +1836,8 @@ int __ref offline_pages(unsigned long start_pfn, unsi= gned 
long nr_pages, /* set above range as isolated */ ret =3D start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE, - MEMORY_OFFLINE | REPORT_FAILURE); + MEMORY_OFFLINE | REPORT_FAILURE, + GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL); if (ret) { reason =3D "failure to isolate range"; goto failed_removal_pcplists_disabled; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index f24fe057389f..57ebc9e41414 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1138,6 +1138,43 @@ static inline void __free_one_page(struct page *page, page_reporting_notify_free(order); } =20 +/** + * split_free_page() -- split a free page at split_pfn_offset + * @free_page: the original free page + * @order: the order of the page + * @split_pfn_offset: split offset within the page + * + * It is used when the free page crosses two pageblocks with different mig= ratetypes + * at split_pfn_offset within the page. The split free page will be put in= to + * separate migratetype lists afterwards. Otherwise, the function achieves + * nothing. 
+ */ +void split_free_page(struct page *free_page, + int order, unsigned long split_pfn_offset) +{ + struct zone *zone =3D page_zone(free_page); + unsigned long free_page_pfn =3D page_to_pfn(free_page); + unsigned long pfn; + unsigned long flags; + int free_page_order; + + spin_lock_irqsave(&zone->lock, flags); + del_page_from_free_list(free_page, zone, order); + for (pfn =3D free_page_pfn; + pfn < free_page_pfn + (1UL << order);) { + int mt =3D get_pfnblock_migratetype(pfn_to_page(pfn), pfn); + + free_page_order =3D ffs(split_pfn_offset) - 1; + __free_one_page(pfn_to_page(pfn), pfn, zone, free_page_order, + mt, FPI_NONE); + pfn +=3D 1UL << free_page_order; + split_pfn_offset -=3D (1UL << free_page_order); + /* we have done the first part, now switch to second part */ + if (split_pfn_offset =3D=3D 0) + split_pfn_offset =3D (1UL << order) - (pfn - free_page_pfn); + } + spin_unlock_irqrestore(&zone->lock, flags); +} /* * A bad page could be due to a number of fields. Instead of multiple bran= ches, * try and check multiple fields with one check. The caller must do a deta= iled @@ -8959,7 +8996,7 @@ static inline void alloc_contig_dump_pages(struct lis= t_head *page_list) #endif =20 /* [start, end) must belong to a single zone. */ -static int __alloc_contig_migrate_range(struct compact_control *cc, +int __alloc_contig_migrate_range(struct compact_control *cc, unsigned long start, unsigned long end) { /* This function is based on compact_zone() from compaction.c. 
*/
@@ -9041,8 +9078,9 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 int alloc_contig_range(unsigned long start, unsigned long end,
 		       unsigned migratetype, gfp_t gfp_mask)
 {
-	unsigned long outer_start, outer_end;
-	unsigned int order;
+	unsigned long outer_end;
+	unsigned long alloc_start = ALIGN_DOWN(start, pageblock_nr_pages);
+	unsigned long alloc_end = ALIGN(end, pageblock_nr_pages);
 	int ret = 0;
 
 	struct compact_control cc = {
@@ -9061,14 +9099,11 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * What we do here is we mark all pageblocks in range as
 	 * MIGRATE_ISOLATE. Because pageblock and max order pages may
 	 * have different sizes, and due to the way page allocator
-	 * work, we align the range to biggest of the two pages so
-	 * that page allocator won't try to merge buddies from
-	 * different pageblocks and change MIGRATE_ISOLATE to some
-	 * other migration type.
+	 * work, start_isolate_page_range() has special handlings for this.
 	 *
 	 * Once the pageblocks are marked as MIGRATE_ISOLATE, we
 	 * migrate the pages from an unaligned range (ie. pages that
-	 * we are interested in).  This will put all the pages in
+	 * we are interested in). This will put all the pages in
 	 * range back to page allocator as MIGRATE_ISOLATE.
 	 *
 	 * When this is done, we take the pages in range from page
@@ -9081,9 +9116,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * put back to page allocator so that buddy can use them.
 	 */
 
-	ret = start_isolate_page_range(start, end, migratetype, 0);
+	ret = start_isolate_page_range(start, end, migratetype, 0, gfp_mask);
 	if (ret)
-		return ret;
+		goto done;
 
 	drain_all_pages(cc.zone);
 
@@ -9102,64 +9137,24 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 		goto done;
 	ret = 0;
 
-	/*
-	 * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
-	 * aligned blocks that are marked as MIGRATE_ISOLATE.
What's
-	 * more, all pages in [start, end) are free in page allocator.
-	 * What we are going to do is to allocate all pages from
-	 * [start, end) (that is remove them from page allocator).
-	 *
-	 * The only problem is that pages at the beginning and at the
-	 * end of interesting range may be not aligned with pages that
-	 * page allocator holds, ie. they can be part of higher order
-	 * pages. Because of this, we reserve the bigger range and
-	 * once this is done free the pages we are not interested in.
-	 *
-	 * We don't have to hold zone->lock here because the pages are
-	 * isolated thus they won't get removed from buddy.
-	 */
-
-	order = 0;
-	outer_start = start;
-	while (!PageBuddy(pfn_to_page(outer_start))) {
-		if (++order >= MAX_ORDER) {
-			outer_start = start;
-			break;
-		}
-		outer_start &= ~0UL << order;
-	}
-
-	if (outer_start != start) {
-		order = buddy_order(pfn_to_page(outer_start));
-
-		/*
-		 * outer_start page could be small order buddy page and
-		 * it doesn't include start page. Adjust outer_start
-		 * in this case to report failed page properly
-		 * on tracepoint in test_pages_isolated()
-		 */
-		if (outer_start + (1UL << order) <= start)
-			outer_start = start;
-	}
-
 	/* Make sure the range is really isolated. */
-	if (test_pages_isolated(outer_start, end, 0)) {
+	if (test_pages_isolated(alloc_start, alloc_end, 0)) {
 		ret = -EBUSY;
 		goto done;
 	}
 
 	/* Grab isolated pages from freelists.
*/
-	outer_end = isolate_freepages_range(&cc, outer_start, end);
+	outer_end = isolate_freepages_range(&cc, alloc_start, alloc_end);
 	if (!outer_end) {
 		ret = -EBUSY;
 		goto done;
 	}
 
 	/* Free head and tail (if any) */
-	if (start != outer_start)
-		free_contig_range(outer_start, start - outer_start);
-	if (end != outer_end)
-		free_contig_range(end, outer_end - end);
+	if (start != alloc_start)
+		free_contig_range(alloc_start, start - alloc_start);
+	if (end != alloc_end)
+		free_contig_range(end, alloc_end - end);
 
 done:
 	undo_isolate_page_range(start, end, migratetype);
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 0223c9a4cff3..a24a521f62c6 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -284,16 +284,156 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 	return NULL;
 }
 
-static unsigned long pfn_max_align_down(unsigned long pfn)
+/**
+ * isolate_single_pageblock() -- tries to isolate a pageblock that might be
+ * within a free or in-use page.
+ * @boundary_pfn:		pageblock-aligned pfn that a page might cross
+ * @gfp_flags:			GFP flags used for migrating pages
+ * @isolate_before:	isolate the pageblock before the boundary_pfn
+ *
+ * Free and in-use pages can be as big as MAX_ORDER-1 and contain more than one
+ * pageblock. When not all pageblocks within a page are isolated at the same
+ * time, free page accounting can go wrong. For example, in the case of
+ * MAX_ORDER-1 = pageblock_order + 1, a MAX_ORDER-1 page has two pageblocks.
+ * [       MAX_ORDER-1       ]
+ * [ pageblock0 | pageblock1 ]
+ * When either pageblock is isolated, if it is a free page, the page is not
+ * split into separate migratetype lists, which is supposed to; if it is an
+ * in-use page and freed later, __free_one_page() does not split the free page
+ * either. The function handles this by splitting the free page or migrating
+ * the in-use page then splitting the free page.
+ */
+static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
+			bool isolate_before)
 {
-	return ALIGN_DOWN(pfn, MAX_ORDER_NR_PAGES);
-}
+	unsigned char saved_mt;
+	unsigned long start_pfn;
+	unsigned long isolate_pageblock;
+	unsigned long pfn;
+	struct zone *zone;
 
-static unsigned long pfn_max_align_up(unsigned long pfn)
-{
-	return ALIGN(pfn, MAX_ORDER_NR_PAGES);
+	VM_BUG_ON(!IS_ALIGNED(boundary_pfn, pageblock_nr_pages));
+
+	if (isolate_before)
+		isolate_pageblock = boundary_pfn - pageblock_nr_pages;
+	else
+		isolate_pageblock = boundary_pfn;
+
+	/*
+	 * scan at the beginning of MAX_ORDER_NR_PAGES aligned range to avoid
+	 * only isolating a subset of pageblocks from a bigger than pageblock
+	 * free or in-use page. Also make sure all to-be-isolated pageblocks
+	 * are within the same zone.
+	 */
+	zone = page_zone(pfn_to_page(isolate_pageblock));
+	start_pfn = max(ALIGN_DOWN(isolate_pageblock, MAX_ORDER_NR_PAGES),
+			zone->zone_start_pfn);
+
+	saved_mt = get_pageblock_migratetype(pfn_to_page(isolate_pageblock));
+	set_pageblock_migratetype(pfn_to_page(isolate_pageblock), MIGRATE_ISOLATE);
+
+	/*
+	 * Bail out early when the to-be-isolated pageblock does not form
+	 * a free or in-use page across boundary_pfn:
+	 *
+	 * 1. isolate before boundary_pfn: the page after is not online
+	 * 2. isolate after boundary_pfn: the page before is not online
+	 *
+	 * This also ensures correctness. Without it, when isolate_before is
+	 * false, the page can be NULL in the for loop below.
+	 */
+	if (isolate_before) {
+		if (!pfn_to_online_page(boundary_pfn))
+			return 0;
+	} else {
+		if (!pfn_to_online_page(boundary_pfn - 1))
+			return 0;
+	}
+
+	for (pfn = start_pfn; pfn < boundary_pfn;) {
+		struct page *page = __first_valid_page(pfn, boundary_pfn - pfn);
+
+		VM_BUG_ON(!page);
+		pfn = page_to_pfn(page);
+		/*
+		 * start_pfn is MAX_ORDER_NR_PAGES aligned, if there is any
+		 * free pages in [start_pfn, boundary_pfn), its head page will
+		 * always be in the range.
+		 */
+		if (PageBuddy(page)) {
+			int order = buddy_order(page);
+
+			if (pfn + (1UL << order) > boundary_pfn)
+				split_free_page(page, order, boundary_pfn - pfn);
+			pfn += (1UL << order);
+			continue;
+		}
+		/*
+		 * migrate compound pages then let the free page handling code
+		 * above do the rest. If migration is not enabled, just fail.
+		 */
+		if (PageHuge(page) || PageTransCompound(page)) {
+#if defined CONFIG_COMPACTION || defined CONFIG_CMA
+			unsigned long nr_pages = compound_nr(page);
+			int order = compound_order(page);
+			struct page *head = compound_head(page);
+			unsigned long head_pfn = page_to_pfn(head);
+			int ret;
+			struct compact_control cc = {
+				.nr_migratepages = 0,
+				.order = -1,
+				.zone = page_zone(pfn_to_page(head_pfn)),
+				.mode = MIGRATE_SYNC,
+				.ignore_skip_hint = true,
+				.no_set_skip_hint = true,
+				.gfp_mask = gfp_flags,
+				.alloc_contig = true,
+			};
+			INIT_LIST_HEAD(&cc.migratepages);
+
+			if (head_pfn + nr_pages < boundary_pfn) {
+				pfn += nr_pages;
+				continue;
+			}
+
+			ret = __alloc_contig_migrate_range(&cc, head_pfn,
+						head_pfn + nr_pages);
+
+			if (ret)
+				goto failed;
+			/*
+			 * reset pfn, let the free page handling code above
+			 * split the free page to the right migratetype list.
+			 *
+			 * head_pfn is not used here as a hugetlb page order
+			 * can be bigger than MAX_ORDER-1, but after it is
+			 * freed, the free page order is not.
Use pfn within
+			 * the range to find the head of the free page and
+			 * reset order to 0 if a hugetlb page with
+			 * >MAX_ORDER-1 order is encountered.
+			 */
+			if (order > MAX_ORDER-1)
+				order = 0;
+			while (!PageBuddy(pfn_to_page(pfn))) {
+				order++;
+				pfn &= ~0UL << order;
+			}
+			continue;
+#else
+			goto failed;
+#endif
+		}
+
+		pfn++;
+	}
+	return 0;
+failed:
+	/* restore the original migratetype */
+	set_pageblock_migratetype(pfn_to_page(isolate_pageblock), saved_mt);
+	return -EBUSY;
 }
 
+
 /**
  * start_isolate_page_range() - make page-allocation-type of range of pages to
  * be MIGRATE_ISOLATE.
@@ -307,6 +447,8 @@ static unsigned long pfn_max_align_up(unsigned long pfn)
  *			 and PageOffline() pages.
  *			REPORT_FAILURE - report details about the failure to
  *			isolate the range
+ * @gfp_flags:		GFP flags used for migrating pages that sit across the
+ *			range boundaries.
  *
  * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
  * the range will never be allocated. Any free pages and pages freed in the
@@ -315,6 +457,10 @@ static unsigned long pfn_max_align_up(unsigned long pfn)
  * pages in the range finally, the caller have to free all pages in the range.
  * test_page_isolated() can be used for test it.
  *
+ * The function first tries to isolate the pageblocks at the beginning and end
+ * of the range, since there might be pages across the range boundaries.
+ * Afterwards, it isolates the rest of the range.
+ *
  * There is no high level synchronization mechanism that prevents two threads
  * from trying to isolate overlapping ranges. If this happens, one thread
  * will notice pageblocks in the overlapping range already set to isolate.
@@ -335,21 +481,38 @@ static unsigned long pfn_max_align_up(unsigned long pfn)
  * Return: 0 on success and -EBUSY if any part of range cannot be isolated.
 */
 int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			     unsigned migratetype, int flags)
+			     unsigned migratetype, int flags, gfp_t gfp_flags)
 {
 	unsigned long pfn;
 	struct page *page;
+	/* isolation is done at page block granularity */
+	unsigned long isolate_start = ALIGN_DOWN(start_pfn, pageblock_nr_pages);
+	unsigned long isolate_end = ALIGN(end_pfn, pageblock_nr_pages);
+	int ret;
 
-	unsigned long isolate_start = pfn_max_align_down(start_pfn);
-	unsigned long isolate_end = pfn_max_align_up(end_pfn);
+	/* isolate [isolate_start, isolate_start + pageblock_nr_pages) pageblock */
+	ret = isolate_single_pageblock(isolate_start, gfp_flags, false);
+	if (ret)
+		return ret;
 
-	for (pfn = isolate_start;
-	     pfn < isolate_end;
+	/* isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock */
+	ret = isolate_single_pageblock(isolate_end, gfp_flags, true);
+	if (ret) {
+		unset_migratetype_isolate(pfn_to_page(isolate_start), migratetype);
+		return ret;
+	}
+
+	/* skip isolated pageblocks at the beginning and end */
+	for (pfn = isolate_start + pageblock_nr_pages;
+	     pfn < isolate_end - pageblock_nr_pages;
 	     pfn += pageblock_nr_pages) {
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (page && set_migratetype_isolate(page, migratetype, flags,
 					start_pfn, end_pfn)) {
 			undo_isolate_page_range(isolate_start, pfn, migratetype);
+			unset_migratetype_isolate(
+				pfn_to_page(isolate_end - pageblock_nr_pages),
+				migratetype);
 			return -EBUSY;
 		}
 	}
@@ -364,8 +527,8 @@ void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 {
 	unsigned long pfn;
 	struct page *page;
-	unsigned long isolate_start = pfn_max_align_down(start_pfn);
-	unsigned long isolate_end = pfn_max_align_up(end_pfn);
+	unsigned long isolate_start = ALIGN_DOWN(start_pfn, pageblock_nr_pages);
+	unsigned long isolate_end = ALIGN(end_pfn, pageblock_nr_pages);
 
 
 	for (pfn = isolate_start;
-- 
2.35.1

From nobody Sat Jun 20 00:54:54
2026
From: Zi Yan
To: David Hildenbrand , linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, Vlastimil Babka , Mel Gorman , Eric Ren , Mike Rapoport , Oscar Salvador , Christophe Leroy , Zi Yan
Subject: [PATCH v9 4/5] mm: cma: use pageblock_order as the single alignment
Date: Thu, 24 Mar 2022 18:44:34 -0400
Message-Id: <20220324224435.17794-5-zi.yan@sent.com>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20220324224435.17794-1-zi.yan@sent.com>
References: <20220324224435.17794-1-zi.yan@sent.com>
Reply-To: Zi Yan
X-Mailing-List: linux-kernel@vger.kernel.org

From: Zi Yan

Now alloc_contig_range() works at pageblock granularity.
Change CMA allocation, which uses alloc_contig_range(), to use
pageblock_nr_pages alignment.

Signed-off-by: Zi Yan
---
 include/linux/cma.h    | 4 ++--
 include/linux/mmzone.h | 5 +----
 mm/page_alloc.c        | 4 ++--
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/include/linux/cma.h b/include/linux/cma.h
index a6f637342740..63873b93deaa 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -17,11 +17,11 @@
 #define CMA_MAX_NAME 64
 
 /*
- * TODO: once the buddy -- especially pageblock merging and alloc_contig_range()
+ * the buddy -- especially pageblock merging and alloc_contig_range()
  * -- can deal with only some pageblocks of a higher-order page being
  *  MIGRATE_CMA, we can use pageblock_nr_pages.
  */
-#define CMA_MIN_ALIGNMENT_PAGES MAX_ORDER_NR_PAGES
+#define CMA_MIN_ALIGNMENT_PAGES pageblock_nr_pages
 #define CMA_MIN_ALIGNMENT_BYTES (PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES)
 
 struct cma;
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 962b14d403e8..0725c50ca0cb 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -54,10 +54,7 @@ enum migratetype {
 	 *
 	 * The way to use it is to change migratetype of a range of
 	 * pageblocks to MIGRATE_CMA which can be done by
-	 * __free_pageblock_cma() function. What is important though
-	 * is that a range of pageblocks must be aligned to
-	 * MAX_ORDER_NR_PAGES should biggest page be bigger than
-	 * a single pageblock.
+	 * __free_pageblock_cma() function.
 	 */
 	MIGRATE_CMA,
 #endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 57ebc9e41414..e5b545d60456 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9064,8 +9064,8 @@ int __alloc_contig_migrate_range(struct compact_control *cc,
  *			be either of the two.
  * @gfp_mask:	GFP mask to use during compaction
  *
- * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
- * aligned. The PFN range must belong to a single zone.
+ * The PFN range does not have to be pageblock aligned.
The PFN range must
+ * belong to a single zone.
  *
  * The first thing this routine does is attempt to MIGRATE_ISOLATE all
  * pageblocks in the range. Once isolated, the pageblocks should not
-- 
2.35.1

From nobody Sat Jun 20 00:54:54 2026
From: Zi Yan
To: David Hildenbrand , linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, Vlastimil Babka , Mel Gorman , Eric Ren , Mike Rapoport , Oscar Salvador , Christophe Leroy , Zi Yan
Subject: [PATCH v9 5/5] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size.
Date: Thu, 24 Mar 2022 18:44:35 -0400
Message-Id: <20220324224435.17794-6-zi.yan@sent.com>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20220324224435.17794-1-zi.yan@sent.com>
References: <20220324224435.17794-1-zi.yan@sent.com>
Reply-To: Zi Yan
X-Mailing-List: linux-kernel@vger.kernel.org

From: Zi Yan

alloc_contig_range() now only needs to be aligned to pageblock_nr_pages,
so drop the virtio_mem size requirement that it needs to be
MAX_ORDER_NR_PAGES.

Signed-off-by: Zi Yan
---
 drivers/virtio/virtio_mem.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index e7d6b679596d..e07486f01999 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -2476,10 +2476,10 @@ static int virtio_mem_init_hotplug(struct virtio_mem *vm)
 				      VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD);
 
 	/*
-	 * TODO: once alloc_contig_range() works reliably with pageblock
-	 * granularity on ZONE_NORMAL, use pageblock_nr_pages instead.
+	 * alloc_contig_range() works reliably with pageblock
+	 * granularity on ZONE_NORMAL, use pageblock_nr_pages.
 	 */
-	sb_size = PAGE_SIZE * MAX_ORDER_NR_PAGES;
+	sb_size = PAGE_SIZE * pageblock_nr_pages;
 	sb_size = max_t(uint64_t, vm->device_block_size, sb_size);
 
 	if (sb_size < memory_block_size_bytes() && !force_bbm) {
-- 
2.35.1
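
Editorial sketch (not part of the posted series): the chunking done by the
loop in split_free_page(), added to mm/page_alloc.c earlier in this series,
can be modeled in userspace to see which orders it hands to
__free_one_page(). The names split_orders() and chunk_orders are
hypothetical and exist only in this sketch. It assumes the split offset
lies strictly inside the page, and it records orders only, with no zone
lock or free-list handling.

```c
#include <assert.h>
#include <strings.h>	/* ffs(), POSIX */

/*
 * Userspace model of the loop in split_free_page(). It only records the
 * order of each chunk that the kernel loop would pass to __free_one_page().
 * split_orders() and chunk_orders are hypothetical names for this sketch.
 *
 * Returns the number of chunks written to chunk_orders[].
 */
static int split_orders(int order, unsigned long split_pfn_offset,
			int *chunk_orders, int max_chunks)
{
	unsigned long total = 1UL << order;
	unsigned long done = 0;	/* plays the role of pfn - free_page_pfn */
	int n = 0;

	/* assumption of this sketch: the split point is inside the page */
	assert(split_pfn_offset > 0 && split_pfn_offset < total);

	while (done < total && n < max_chunks) {
		/* lowest set bit of the remaining offset selects the order */
		int chunk_order = ffs((int)split_pfn_offset) - 1;

		chunk_orders[n++] = chunk_order;
		done += 1UL << chunk_order;
		split_pfn_offset -= 1UL << chunk_order;
		/* first part is done, switch to the part after the split */
		if (split_pfn_offset == 0)
			split_pfn_offset = total - done;
	}
	return n;
}
```

With example values order = 10 and split_pfn_offset = 512 (the two-pageblock
layout drawn in the isolate_single_pageblock() comment, assuming a pageblock
order of 9), the model produces two order-9 chunks, one per pageblock.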