From nobody Thu Apr 2 15:38:04 2026
From: Muhammad Usama Anjum
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett",
 Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
 Brendan Jackman, Johannes Weiner, Zi Yan, Uladzislau Rezki, Nick Terrell,
 David Sterba, Vishal Moola, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, bpf@vger.kernel.org, Ryan.Roberts@arm.com,
 david.hildenbrand@arm.com
Cc: Ryan Roberts, usama.anjum@arm.com
Subject: [PATCH v4 1/3] mm/page_alloc: Optimize free_contig_range()
Date: Fri, 27 Mar 2026 12:57:13 +0000
Message-ID: <20260327125720.2270651-2-usama.anjum@arm.com>
In-Reply-To: <20260327125720.2270651-1-usama.anjum@arm.com>
References: <20260327125720.2270651-1-usama.anjum@arm.com>

From: Ryan Roberts

Decompose the range of order-0 pages to be freed into the set of largest
possible power-of-2 sized and aligned chunks, and free them to the pcp or
buddy. This improves on the previous approach, which freed each order-0
page individually in a loop. Testing shows performance improved by more
than 10x in some cases.

Since each page is order-0, we must decrement each page's reference count
individually, and only consider the page for freeing as part of a high
order chunk if the reference count drops to zero. Additionally,
free_pages_prepare() must be called for each individual order-0 page too,
so that the struct page state and global accounting state can be
appropriately managed.
But once this is done, the resulting high order chunks can be freed as a
unit to the pcp or buddy. This significantly speeds up the free operation,
and has the side benefit that high order blocks are added to the pcp
instead of each page ending up on the pcp order-0 list; memory remains
more readily available in high orders.

vmalloc will shortly become a user of this new optimized
free_contig_range(), since it aggressively allocates high order
non-compound pages but then calls split_page() to end up with contiguous
order-0 pages. These can now be freed much more efficiently.

The execution time of the following function was measured on a server
class arm64 machine:

static int page_alloc_high_order_test(void)
{
	unsigned int order = HPAGE_PMD_ORDER;
	struct page *page;
	int i;

	for (i = 0; i < 100000; i++) {
		page = alloc_pages(GFP_KERNEL, order);
		if (!page)
			return -1;
		split_page(page, order);
		free_contig_range(page_to_pfn(page), 1UL << order);
	}

	return 0;
}

Execution time before: 4097358 usec
Execution time after:   729831 usec

Perf trace before:

    99.63%  0.00%  kthreadd  [kernel.kallsyms]  [.] kthread
            |
            ---kthread
               0xffffb33c12a26af8
               |
               |--98.13%--0xffffb33c12a26060
               |     |
               |     |--97.37%--free_contig_range
               |     |     |
               |     |     |--94.93%--___free_pages
               |     |     |     |
               |     |     |     |--55.42%--__free_frozen_pages
               |     |     |     |     |
               |     |     |     |      --43.20%--free_frozen_page_commit
               |     |     |     |          |
               |     |     |     |           --35.37%--_raw_spin_unlock_irqrestore
               |     |     |     |
               |     |     |     |--11.53%--_raw_spin_trylock
               |     |     |     |--8.19%--__preempt_count_dec_and_test
               |     |     |     |--5.64%--_raw_spin_unlock
               |     |     |     |--2.37%--__get_pfnblock_flags_mask.isra.0
               |     |     |      --1.07%--free_frozen_page_commit
               |     |     |
               |     |      --1.54%--__free_frozen_pages
               |     |
               |      --0.77%--___free_pages
               |
                --0.98%--0xffffb33c12a26078
                         alloc_pages_noprof

Perf trace after:

     8.42%  2.90%  kthreadd  [kernel.kallsyms]  [k] __free_contig_range
            |
            |--5.52%--__free_contig_range
            |     |
            |     |--5.00%--free_prepared_contig_range
            |     |     |
            |     |     |--1.43%--__free_frozen_pages
            |     |     |     |
            |     |     |      --0.51%--free_frozen_page_commit
            |     |     |
            |     |     |--1.08%--_raw_spin_trylock
            |     |      --0.89%--_raw_spin_unlock
            |     |
            |      --0.52%--free_pages_prepare
            |
             --2.90%--ret_from_fork
                      kthread
                      0xffffae1c12abeaf8
                      0xffffae1c12abe7a0
                      |
                       --2.69%--vfree
                                __free_contig_range

Signed-off-by: Ryan Roberts
Co-developed-by: Muhammad Usama Anjum
Signed-off-by: Muhammad Usama Anjum
Reviewed-by: Zi Yan
---
Changes since v3:
- Move __free_contig_range() to the more generic
  __free_contig_range_common(), which will be used to free frozen pages
  as well
- Simplify the loop in __free_contig_range_common()
- Rewrite the comment

Changes since v2:
- Handle different possible section boundaries in __free_contig_range()
- Drop the TODO
- Remove return value from __free_contig_range()
- Remove non-functional change from __free_pages_ok()

Changes since v1:
- Rebase on mm-new
- Move FPI_PREPARED check inside __free_pages_prepare() now that
  fpi_flags are already being passed.
- Add todo (Zi Yan)
- Rerun benchmarks
- Convert VM_BUG_ON_PAGE() to VM_WARN_ON_ONCE()
- Rework order calculation in free_prepared_contig_range() and use
  MAX_PAGE_ORDER as the upper limit instead of pageblock_order, as it must
  be up to the internal __free_frozen_pages() how it frees them
---
 include/linux/gfp.h |   2 +
 mm/page_alloc.c     | 103 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 103 insertions(+), 2 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index f82d74a77cad8..7c1f9da7c8e56 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -467,6 +467,8 @@ void free_contig_frozen_range(unsigned long pfn, unsigned long nr_pages);
 void free_contig_range(unsigned long pfn, unsigned long nr_pages);
 #endif
 
+void __free_contig_range(unsigned long pfn, unsigned long nr_pages);
+
 DEFINE_FREE(free_page, void *, free_page((unsigned long)_T))
 
 #endif /* __LINUX_GFP_H */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 75ee81445640b..18a96b51aa0be 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -91,6 +91,9 @@ typedef int __bitwise fpi_t;
 /* Free the page without taking locks. Rely on trylock only. */
 #define FPI_TRYLOCK		((__force fpi_t)BIT(2))
 
+/* free_pages_prepare() has already been called for page(s) being freed. */
+#define FPI_PREPARED		((__force fpi_t)BIT(3))
+
 /* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
 static DEFINE_MUTEX(pcp_batch_high_lock);
 #define MIN_PERCPU_PAGELIST_HIGH_FRACTION (8)
@@ -1310,6 +1313,9 @@ __always_inline bool __free_pages_prepare(struct page *page,
 	bool compound = PageCompound(page);
 	struct folio *folio = page_folio(page);
 
+	if (fpi_flags & FPI_PREPARED)
+		return true;
+
 	VM_BUG_ON_PAGE(PageTail(page), page);
 
 	trace_mm_page_free(page, order);
@@ -6784,6 +6790,100 @@ void __init page_alloc_sysctl_init(void)
 	register_sysctl_init("vm", page_alloc_sysctl_table);
 }
 
+static void free_prepared_contig_range(struct page *page,
+				       unsigned long nr_pages)
+{
+	while (nr_pages) {
+		unsigned int order;
+		unsigned long pfn;
+
+		pfn = page_to_pfn(page);
+		/* We are limited by the largest buddy order. */
+		order = pfn ? __ffs(pfn) : MAX_PAGE_ORDER;
+		/* Don't exceed the number of pages to free. */
+		order = min_t(unsigned int, order, ilog2(nr_pages));
+		order = min_t(unsigned int, order, MAX_PAGE_ORDER);
+
+		/*
+		 * Free the chunk as a single block. Our caller has already
+		 * called free_pages_prepare() for each order-0 page.
+		 */
+		__free_frozen_pages(page, order, FPI_PREPARED);
+
+		page += 1UL << order;
+		nr_pages -= 1UL << order;
+	}
+}
+
+static void __free_contig_range_common(unsigned long pfn, unsigned long nr_pages,
+				       bool is_frozen)
+{
+	struct page *page = pfn_to_page(pfn);
+	struct page *start = NULL;
+	unsigned long start_sec;
+	bool can_free = true;
+	unsigned long i;
+
+	/*
+	 * Contiguous PFNs might not have contiguous "struct pages" in some
+	 * kernel configs. Therefore, check memdesc_section(), and stop
+	 * batching once it changes, see num_pages_contiguous().
+	 */
+	for (i = 0; i < nr_pages; i++, page++) {
+		VM_WARN_ON_ONCE(PageHead(page));
+		VM_WARN_ON_ONCE(PageTail(page));
+
+		if (!is_frozen)
+			can_free = put_page_testzero(page);
+
+		if (can_free)
+			can_free = free_pages_prepare(page, 0);
+
+		if (!can_free) {
+			if (start) {
+				free_prepared_contig_range(start, page - start);
+				start = NULL;
+			}
+			continue;
+		}
+
+		if (start && memdesc_section(page->flags) != start_sec) {
+			free_prepared_contig_range(start, page - start);
+			start = page;
+			start_sec = memdesc_section(page->flags);
+		} else if (!start) {
+			start = page;
+			start_sec = memdesc_section(page->flags);
+		}
+	}
+
+	if (start)
+		free_prepared_contig_range(start, page - start);
+}
+
+/**
+ * __free_contig_range - Free contiguous range of order-0 pages.
+ * @pfn: Page frame number of the first page in the range.
+ * @nr_pages: Number of pages to free.
+ *
+ * For each order-0 struct page in the physically contiguous range, put a
+ * reference. Free any page whose reference count falls to zero. The
+ * implementation is functionally equivalent to, but significantly faster
+ * than, calling __free_page() for each struct page in a loop.
+ *
+ * Memory allocated with alloc_pages(order>=1) then subsequently split to
+ * order-0 with split_page() is an example of appropriate contiguous pages
+ * that can be freed with this API.
+ *
+ * Context: May be called in interrupt context or while holding a normal
+ * spinlock, but not in NMI context or while holding a raw spinlock.
+ */
+void __free_contig_range(unsigned long pfn, unsigned long nr_pages)
+{
+	__free_contig_range_common(pfn, nr_pages, false);
+}
+EXPORT_SYMBOL(__free_contig_range);
+
 #ifdef CONFIG_CONTIG_ALLOC
 /* Usage: See admin-guide/dynamic-debug-howto.rst */
 static void alloc_contig_dump_pages(struct list_head *page_list)
@@ -7330,8 +7430,7 @@ void free_contig_range(unsigned long pfn, unsigned long nr_pages)
 	if (WARN_ON_ONCE(PageHead(pfn_to_page(pfn))))
 		return;
 
-	for (; nr_pages--; pfn++)
-		__free_page(pfn_to_page(pfn));
+	__free_contig_range(pfn, nr_pages);
 }
 EXPORT_SYMBOL(free_contig_range);
 #endif /* CONFIG_CONTIG_ALLOC */
-- 
2.47.3

From nobody Thu Apr 2 15:38:04 2026
From: Muhammad Usama Anjum
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett",
 Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
 Brendan Jackman, Johannes Weiner, Zi Yan, Uladzislau Rezki, Nick Terrell,
 David Sterba, Vishal Moola, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, bpf@vger.kernel.org, Ryan.Roberts@arm.com,
 david.hildenbrand@arm.com
Cc: Ryan Roberts, usama.anjum@arm.com
Subject: [PATCH v4 2/3] vmalloc: Optimize vfree
Date: Fri, 27 Mar 2026 12:57:14 +0000
Message-ID: <20260327125720.2270651-3-usama.anjum@arm.com>
In-Reply-To: <20260327125720.2270651-1-usama.anjum@arm.com>
References: <20260327125720.2270651-1-usama.anjum@arm.com>

From: Ryan Roberts

Whenever vmalloc allocates high order pages (e.g. for a huge mapping), it
must immediately split_page() them to order-0 so that it remains
compatible with users that want to access the underlying struct pages.
Commit a06157804399 ("mm/vmalloc: request large order pages from buddy
allocator") recently made it much more likely for vmalloc to allocate
high order pages which are subsequently split to order-0. Unfortunately
this had the side effect of causing performance regressions for tight
vmalloc/vfree loops (e.g. test_vmalloc.ko benchmarks). See the Closes:
tag.

This happens because the high order pages must be obtained from the
buddy, but since they are split to order-0, they are freed to the order-0
pcp. Previously the allocation was for order-0 pages, so they were
recycled from the pcp.

It would be preferable if, when vmalloc allocates an (e.g.) order-3 page,
it also freed that order-3 page to the order-3 pcp; then the regression
could be removed. So let's do exactly that; update stats separately
first, as coalescing is hard to do correctly without complexity. Use
free_pages_bulk(), which uses the new __free_contig_range() API to
batch-free contiguous ranges of pfns.

This not only removes the regression, but significantly improves the
performance of vfree beyond the baseline.

A selection of test_vmalloc benchmarks running on an arm64 server class
system. mm-new is the baseline. Commit a06157804399 ("mm/vmalloc: request
large order pages from buddy allocator") was added in v6.19-rc1, where we
see regressions. Then with this change performance is much better.
(>0 is faster, <0 is slower, (R)/(I) = statistically significant
Regression/Improvement):

+-----------------+----------------------------------------------------------+------------+-------------+
| Benchmark       | Result Class                                             | mm-new     | this series |
+=================+==========================================================+============+=============+
| micromm/vmalloc | fix_align_alloc_test: p:1, h:0, l:500000 (usec)          | 1331843.33 | (I)  67.17% |
|                 | fix_size_alloc_test: p:1, h:0, l:500000 (usec)           |  415907.33 |      -5.14% |
|                 | fix_size_alloc_test: p:4, h:0, l:500000 (usec)           |  755448.00 | (I)  53.55% |
|                 | fix_size_alloc_test: p:16, h:0, l:500000 (usec)          | 1591331.33 | (I)  57.26% |
|                 | fix_size_alloc_test: p:16, h:1, l:500000 (usec)          | 1594345.67 | (I)  68.46% |
|                 | fix_size_alloc_test: p:64, h:0, l:100000 (usec)          | 1071826.00 | (I)  79.27% |
|                 | fix_size_alloc_test: p:64, h:1, l:100000 (usec)          | 1018385.00 | (I)  84.17% |
|                 | fix_size_alloc_test: p:256, h:0, l:100000 (usec)         | 3970899.67 | (I)  77.01% |
|                 | fix_size_alloc_test: p:256, h:1, l:100000 (usec)         | 3821788.67 | (I)  89.44% |
|                 | fix_size_alloc_test: p:512, h:0, l:100000 (usec)         | 7795968.00 | (I)  82.67% |
|                 | fix_size_alloc_test: p:512, h:1, l:100000 (usec)         | 6530169.67 | (I) 118.09% |
|                 | full_fit_alloc_test: p:1, h:0, l:500000 (usec)           |  626808.33 |      -0.98% |
|                 | kvfree_rcu_1_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |  532145.67 |      -1.68% |
|                 | kvfree_rcu_2_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |  537032.67 |      -0.96% |
|                 | long_busy_list_alloc_test: p:1, h:0, l:500000 (usec)     | 8805069.00 | (I)  74.58% |
|                 | pcpu_alloc_test: p:1, h:0, l:500000 (usec)               |  500824.67 |       4.35% |
|                 | random_size_align_alloc_test: p:1, h:0, l:500000 (usec)  | 1637554.67 | (I)  76.99% |
|                 | random_size_alloc_test: p:1, h:0, l:500000 (usec)        | 4556288.67 | (I)  72.23% |
|                 | vm_map_ram_test: p:1, h:0, l:500000 (usec)               |  107371.00 |      -0.70% |
+-----------------+----------------------------------------------------------+------------+-------------+

Fixes: a06157804399 ("mm/vmalloc: request large order pages from buddy allocator")
Closes: https://lore.kernel.org/all/66919a28-bc81-49c9-b68f-dd7c73395a0d@arm.com/
Acked-by: Zi Yan
Signed-off-by: Ryan Roberts
Co-developed-by: Muhammad Usama Anjum
Signed-off-by: Muhammad Usama Anjum
Acked-by: Vlastimil Babka (SUSE)
Reviewed-by: Uladzislau Rezki (Sony)
---
Changes since v3:
- Add kerneldoc comment and update description
- Add tag

Changes since v2:
- Remove BUG_ON in favour of the simple implementation, as it has never
  been seen to trigger in the past
- Move the free loop to a separate function, free_pages_bulk()
- Update stats, lruvec_stat in a separate loop

Changes since v1:
- Rebase on mm-new
- Rerun benchmarks
---
 include/linux/gfp.h |  2 ++
 mm/page_alloc.c     | 38 ++++++++++++++++++++++++++++++++++++++
 mm/vmalloc.c        | 16 +++++-----------
 3 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 7c1f9da7c8e56..71f9097ab99a0 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -239,6 +239,8 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
 				      struct page **page_array);
 #define __alloc_pages_bulk(...)	alloc_hooks(alloc_pages_bulk_noprof(__VA_ARGS__))
 
+void free_pages_bulk(struct page **page_array, unsigned long nr_pages);
+
 unsigned long alloc_pages_bulk_mempolicy_noprof(gfp_t gfp,
 				unsigned long nr_pages,
 				struct page **page_array);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 18a96b51aa0be..64be8a9019dca 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5175,6 +5175,44 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
 }
 EXPORT_SYMBOL_GPL(alloc_pages_bulk_noprof);
 
+/*
+ * free_pages_bulk - Free an array of order-0 pages
+ * @page_array: Array of pages to free
+ * @nr_pages: The number of pages in the array
+ *
+ * Free the order-0 pages. Adjacent entries whose PFNs form a contiguous
+ * run are released with a single __free_contig_range() call.
+ *
+ * This assumes page_array is sorted in ascending PFN order. Without that,
+ * the function still frees all pages, but contiguous runs may not be
+ * detected and the freeing pattern can degrade to freeing one page at a
+ * time.
+ *
+ * Context: Sleepable process context only; calls cond_resched()
+ */
+void free_pages_bulk(struct page **page_array, unsigned long nr_pages)
+{
+	unsigned long start_pfn = 0, pfn;
+	unsigned long i, nr_contig = 0;
+
+	for (i = 0; i < nr_pages; i++) {
+		pfn = page_to_pfn(page_array[i]);
+		if (!nr_contig) {
+			start_pfn = pfn;
+			nr_contig = 1;
+		} else if (start_pfn + nr_contig != pfn) {
+			__free_contig_range(start_pfn, nr_contig);
+			start_pfn = pfn;
+			nr_contig = 1;
+			cond_resched();
+		} else {
+			nr_contig++;
+		}
+	}
+	if (nr_contig)
+		__free_contig_range(start_pfn, nr_contig);
+}
+
 /*
  * This is the 'heart' of the zoned buddy allocator.
  */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index c607307c657a6..e9b3d6451e48b 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3459,19 +3459,13 @@ void vfree(const void *addr)
 
 	if (unlikely(vm->flags & VM_FLUSH_RESET_PERMS))
 		vm_reset_perms(vm);
-	for (i = 0; i < vm->nr_pages; i++) {
-		struct page *page = vm->pages[i];
 
-		BUG_ON(!page);
-		/*
-		 * High-order allocs for huge vmallocs are split, so
-		 * can be freed as an array of order-0 allocations
-		 */
-		if (!(vm->flags & VM_MAP_PUT_PAGES))
-			mod_lruvec_page_state(page, NR_VMALLOC, -1);
-		__free_page(page);
-		cond_resched();
+	if (!(vm->flags & VM_MAP_PUT_PAGES)) {
+		for (i = 0; i < vm->nr_pages; i++)
+			mod_lruvec_page_state(vm->pages[i], NR_VMALLOC, -1);
 	}
+	free_pages_bulk(vm->pages, vm->nr_pages);
+
 	kvfree(vm->pages);
 	kfree(vm);
 }
-- 
2.47.3

From nobody Thu Apr 2 15:38:04 2026
From: Muhammad Usama Anjum
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett",
 Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
 Brendan Jackman, Johannes Weiner, Zi Yan, Uladzislau Rezki, Nick Terrell,
 David Sterba, Vishal Moola, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, bpf@vger.kernel.org, Ryan.Roberts@arm.com,
 david.hildenbrand@arm.com
Cc: Muhammad Usama Anjum
Subject: [PATCH v4 3/3] mm/page_alloc: Optimize __free_contig_frozen_range()
Date: Fri, 27 Mar 2026 12:57:15 +0000
Message-ID: <20260327125720.2270651-4-usama.anjum@arm.com>
In-Reply-To: <20260327125720.2270651-1-usama.anjum@arm.com>
References: <20260327125720.2270651-1-usama.anjum@arm.com>

Apply the same batch-freeing optimization from free_contig_range() to the
frozen page path. The previous __free_contig_frozen_range() freed each
order-0 page individually via free_frozen_pages(), which is slow for the
same reason the old free_contig_range() was: each page goes to the
order-0 pcp list rather than being coalesced into higher-order blocks.

Rewrite __free_contig_frozen_range() to call free_pages_prepare() for
each order-0 page, then batch the prepared pages into the largest
possible power-of-2 aligned chunks via free_prepared_contig_range(). If
free_pages_prepare() fails (e.g. HWPoison, bad page) the page is
deliberately not freed; it should not be returned to the allocator.

I've tested CMA through debugfs. The test allocates 16384 pages per
allocation for several iterations. There is a 3.5x improvement.

Before: 1406 usec per iteration
After:   402 usec per iteration

Before:

    70.89%  0.69%  cma  [kernel.kallsyms]  [.] free_contig_frozen_range
            |
            |--70.20%--free_contig_frozen_range
            |     |
            |     |--46.41%--__free_frozen_pages
            |     |     |
            |     |      --36.18%--free_frozen_page_commit
            |     |          |
            |     |           --29.63%--_raw_spin_unlock_irqrestore
            |     |
            |     |--8.76%--_raw_spin_trylock
            |     |--7.03%--__preempt_count_dec_and_test
            |     |--4.57%--_raw_spin_unlock
            |     |--1.96%--__get_pfnblock_flags_mask.isra.0
            |      --1.15%--free_frozen_page_commit
            |
             --0.69%--el0t_64_sync

After:

    23.57%  0.00%  cma  [kernel.kallsyms]  [.] free_contig_frozen_range
            |
            ---free_contig_frozen_range
               |
               |--20.45%--__free_contig_frozen_range
               |     |
               |     |--17.77%--free_pages_prepare
               |      --0.72%--free_prepared_contig_range
               |          |
               |           --0.55%--__free_frozen_pages
               |
                --3.12%--free_pages_prepare

Suggested-by: Zi Yan
Signed-off-by: Muhammad Usama Anjum
Acked-by: David Hildenbrand (Arm)
Acked-by: Vlastimil Babka (SUSE)
Reviewed-by: Zi Yan
---
Changes since v3:
- Use the newly introduced __free_contig_range_common(), as the pattern
  was very similar to __free_contig_range()

Changes since v2:
- Rework the loop to check for memory sections just like
  __free_contig_range()
- Didn't add reviewed-by tags because of the rework
---
 mm/page_alloc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 64be8a9019dca..110e912fa785e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7059,8 +7059,7 @@ static int __alloc_contig_verify_gfp_mask(gfp_t gfp_mask, gfp_t *gfp_cc_mask)
 
 static void __free_contig_frozen_range(unsigned long pfn, unsigned long nr_pages)
 {
-	for (; nr_pages--; pfn++)
-		free_frozen_pages(pfn_to_page(pfn), 0);
+	__free_contig_range_common(pfn, nr_pages, true);
 }
 
 /**
-- 
2.47.3