Currently, THP CMA pages share PCP lists with UNMOVABLE and RECLAIMABLE
pages. This may result in CMA THP pages being allocated from the PCP
list for other migratetypes. When this occurs, these pages may fail to
be isolated, leading to CMA allocation failures when drivers request
them.
This patch introduces a dedicated PCP list for the THP CMA migratetype,
ensuring that CMA THP pages are not mixed with other migratetypes and
remain available for CMA allocations as intended.
Signed-off-by: akash.tyagi <akash.tyagi@mediatek.com>
---
include/linux/mmzone.h | 10 ++++++++--
mm/page_alloc.c | 5 +++++
2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 283913d42d7b..dd93088ce851 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -696,11 +696,17 @@ enum zone_watermarks {
 
 /*
  * One per migratetype for each PAGE_ALLOC_COSTLY_ORDER. Two additional lists
- * are added for THP. One PCP list is used by GPF_MOVABLE, and the other PCP list
- * is used by GFP_UNMOVABLE and GFP_RECLAIMABLE.
+ * are added for THP: one for GFP_MOVABLE, and one for GFP_UNMOVABLE and
+ * GFP_RECLAIMABLE. With CMA enabled, an extra THP PCP list is added for
+ * MIGRATE_CMA, allowing further distinction between MIGRATE_MOVABLE and
+ * MIGRATE_CMA for THP allocations.
  */
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_CMA
+#define NR_PCP_THP 3
+#else
 #define NR_PCP_THP 2
+#endif
 #else
 #define NR_PCP_THP 0
 #endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2ef3c07266b3..35f8041afbcc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -519,6 +519,11 @@ static inline unsigned int order_to_pindex(int migratetype, int order)
 	if (order > PAGE_ALLOC_COSTLY_ORDER) {
 		VM_BUG_ON(order != HPAGE_PMD_ORDER);
 
+#ifdef CONFIG_CMA
+		if (migratetype == MIGRATE_CMA)
+			return NR_LOWORDER_PCP_LISTS + 2;
+#endif
+
 		movable = migratetype == MIGRATE_MOVABLE;
 
 		return NR_LOWORDER_PCP_LISTS + movable;
--
2.18.0
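
For readers who want to see which PCP index each migratetype would map to with this patch applied, here is a minimal userspace sketch. It is not kernel code: the enum values, PAGE_ALLOC_COSTLY_ORDER = 3 and HPAGE_PMD_ORDER = 9 are assumptions modelled on a typical mainline configuration with CONFIG_CMA and 4 KiB pages, and assert() stands in for VM_BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, modelled on mainline's enum migratetype with CONFIG_CMA=y. */
enum migratetype {
	MIGRATE_UNMOVABLE,
	MIGRATE_MOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_PCPTYPES,	/* number of types with per-order PCP lists */
	MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
	MIGRATE_CMA,
};

#define PAGE_ALLOC_COSTLY_ORDER	3	/* assumed */
#define HPAGE_PMD_ORDER		9	/* assumed: 2 MiB THP on 4 KiB pages */
#define NR_LOWORDER_PCP_LISTS	(MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1))
#define NR_PCP_THP		3	/* value the patch sets when CONFIG_CMA=y */
#define NR_PCP_LISTS		(NR_LOWORDER_PCP_LISTS + NR_PCP_THP)

/* Userspace model of order_to_pindex() with the patch applied. */
static unsigned int order_to_pindex(int migratetype, int order)
{
	bool movable;

	if (order > PAGE_ALLOC_COSTLY_ORDER) {
		assert(order == HPAGE_PMD_ORDER);	/* stands in for VM_BUG_ON() */

		/* New in the patch: CMA THPs get the third THP list. */
		if (migratetype == MIGRATE_CMA)
			return NR_LOWORDER_PCP_LISTS + 2;

		movable = migratetype == MIGRATE_MOVABLE;
		return NR_LOWORDER_PCP_LISTS + movable;
	}

	return (MIGRATE_PCPTYPES * order) + migratetype;
}
```

Under these assumptions the three THP lists sit at indices 12 (UNMOVABLE and RECLAIMABLE), 13 (MOVABLE) and 14 (CMA), for 15 lists in total.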
On 24.07.25 09:53, akash.tyagi wrote:
> Currently, THP CMA pages share PCP lists with UNMOVABLE and RECLAIMABLE
> pages. This may result in CMA THP pages being allocated from the PCP
> list for other migratetypes. When this occurs, these pages may fail to
> be isolated, leading to CMA allocation failures when drivers request
> them.

Curious, did you run into that in practice?

Having MIGRATE_CMA pages allocated for unmovable allocations would
indeed be broken.

But, MIGRATE_PCPTYPES does not include MIGRATE_CMA. So there is also no
dedicated PCP list for CMA?

In free_unref_folios(), we have "Non-isolated types over
MIGRATE_PCPTYPES get added to the MIGRATE_MOVABLE pcp list."

	if (unlikely(migratetype >= MIGRATE_PCPTYPES))
		migratetype = MIGRATE_MOVABLE;

So ... shouldn't that save us here as well for THPs?

> This patch introduces a dedicated PCP list for the THP CMA migratetype,
> ensuring that CMA THP pages are not mixed with other migratetypes and
> remain available for CMA allocations as intended.
>
> [...]

-- 
Cheers,

David / dhildenb
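
The clamp quoted from free_unref_folios() can be modelled in userspace to see where a freed CMA THP would land in an unpatched tree. This is only a sketch: pcp_migratetype() and free_path_pindex() are hypothetical helper names, the constants are assumptions modelled on mainline with CONFIG_CMA, and the MIGRATE_ISOLATE handling of the real free path is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, modelled on mainline's enum migratetype with CONFIG_CMA=y. */
enum migratetype {
	MIGRATE_UNMOVABLE,
	MIGRATE_MOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_PCPTYPES,
	MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
	MIGRATE_CMA,
};

#define PAGE_ALLOC_COSTLY_ORDER	3	/* assumed */
#define HPAGE_PMD_ORDER		9	/* assumed */
#define NR_LOWORDER_PCP_LISTS	(MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1))

/* Hypothetical helper wrapping the clamp quoted from free_unref_folios(). */
static int pcp_migratetype(int migratetype)
{
	if (migratetype >= MIGRATE_PCPTYPES)
		migratetype = MIGRATE_MOVABLE;
	return migratetype;
}

/* Userspace model of the unpatched order_to_pindex(). */
static unsigned int order_to_pindex(int migratetype, int order)
{
	bool movable;

	if (order > PAGE_ALLOC_COSTLY_ORDER) {
		assert(order == HPAGE_PMD_ORDER);	/* stands in for VM_BUG_ON() */
		movable = migratetype == MIGRATE_MOVABLE;
		return NR_LOWORDER_PCP_LISTS + movable;
	}

	return (MIGRATE_PCPTYPES * order) + migratetype;
}

/* Hypothetical helper: the PCP index a freed page ends up on. */
static unsigned int free_path_pindex(int block_migratetype, int order)
{
	return order_to_pindex(pcp_migratetype(block_migratetype), order);
}
```

In this model a page freed from a MIGRATE_CMA pageblock is clamped to MIGRATE_MOVABLE before the index is computed, so a CMA THP ends up on the movable THP list and not on the list shared by UNMOVABLE and RECLAIMABLE.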
Hi David/Zi,

Is there any reason why MIGRATE_CMA pages are not in the PCP lists?

Many devices need fast allocation of MIGRATE_CMA pages, and they have
to get them from the buddy allocator, which is somewhat slower than the
PCP lists.

We also have cases where the MIGRATE_CMA memory requirements are large.
For example, GPUs need MIGRATE_CMA memory in the range of 30 MiB to
500 MiB. These cases would benefit from THP support for CMA.

Could we add support for MIGRATE_CMA pages on the PCP and THP lists?

Thanks
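
To put the sizes mentioned above in perspective, the following sketch converts a buffer size into the number of PMD-sized THPs and base pages it spans. It assumes 4 KiB base pages (PAGE_SHIFT = 12) and HPAGE_PMD_ORDER = 9; other architectures and configurations differ, and the helper names are made up for this example.

```c
#include <assert.h>

#define PAGE_SHIFT	12	/* assumed: 4 KiB base pages */
#define HPAGE_PMD_ORDER	9	/* assumed: PMD-sized THP is order 9 */
#define MIB		(1024UL * 1024UL)

/* Size in bytes of one PMD-sized THP under the assumptions above. */
static unsigned long thp_size_bytes(void)
{
	return 1UL << (PAGE_SHIFT + HPAGE_PMD_ORDER);
}

/* Hypothetical helper: THPs needed to cover 'bytes', rounded up. */
static unsigned long thps_needed(unsigned long bytes)
{
	return (bytes + thp_size_bytes() - 1) / thp_size_bytes();
}

/* Hypothetical helper: base pages needed to cover 'bytes', rounded up. */
static unsigned long base_pages_needed(unsigned long bytes)
{
	unsigned long page_size = 1UL << PAGE_SHIFT;

	return (bytes + page_size - 1) / page_size;
}
```

With these assumptions a 30 MiB buffer spans 15 THPs or 7680 base pages, and a 500 MiB buffer spans 250 THPs or 128000 base pages.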
Hi David,

Thank you for your feedback.

We encountered this issue in the Android Common Kernel (version 6.12), which uses PCP lists for CMA pages.

page_owner trace:
Page allocated via order 9, mask 0x52dc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO), pid 1, tgid 1 (swapper/0), ts 1065952310 ns
PFN 0x23d200 type Unmovable Block 4585 type CMA Flags 0x4000000000000040(head|zone=1|kasantag=0x0)
 post_alloc_hook+0x228/0x230
 prep_new_page+0x28/0x148
 get_page_from_freelist+0x19d0/0x1a38
 __alloc_pages_noprof+0x1b0/0x440
 ___kmalloc_large_node+0xb4/0x1ec
 __kmalloc_large_node_noprof+0x2c/0xec
 __kmalloc_node_noprof+0x39c/0x548
 __kvmalloc_node_noprof+0xd8/0x18c
 nf_ct_alloc_hashtable+0x64/0x108
 nf_nat_init+0x3c/0xf8
 do_one_initcall+0x150/0x3c0
 do_initcall_level+0xa4/0x15c
 do_initcalls+0x70/0xc0
 do_basic_setup+0x1c/0x28
 kernel_init_freeable+0xcc/0x130
 kernel_init+0x20/0x1ac

This UNMOVABLE page was allocated from CMA, but it could not be migrated, so the CMA allocation failed.
At first, we fixed this by adding CMA THP pages to the movable THP PCP list.
This fixed the issue of CMA pages being put in the wrong list, but now any movable allocation can use these CMA pages.

Later, we saw that a movable allocation used a CMA page that was then pinned by __filemap_get_folio().
This page was pinned for too long, and eventually the CMA allocation failed.

page_owner trace:
Page allocated via order 0, mask 0x140c48(GFP_NOFS|__GFP_COMP|__GFP_HARDWALL|__GFP_MOVABLE), pid 1198, tgid 1194 (ccci_mdinit), ts 17918751965 ns
PFN 0x207233 type Movable Block 4153 type CMA Flags 0x4020000000008224(referenced|lru|workingset|private|zone=1|kasantag=0x0)
 post_alloc_hook+0x23c/0x254
 prep_new_page+0x28/0x148
 get_page_from_freelist+0x19d8/0x1a40
 __alloc_pages_noprof+0x1a8/0x430
 __folio_alloc_noprof+0x14/0x5c
 __filemap_get_folio+0x1bc/0x430
 bdev_getblk+0xd4/0x294
 __read_extent_tree_block+0x6c/0x260
 ext4_find_extent+0x22c/0x3dc
 ext4_ext_map_blocks+0x88/0x173c
 ext4_map_query_blocks+0x54/0xe0
 ext4_map_blocks+0xf8/0x518
 _ext4_get_block+0x70/0x188
 ext4_get_block+0x18/0x24
 ext4_block_write_begin+0x154/0x62c
 ext4_write_begin+0x20c/0x630
Page has been migrated, last migrate reason: compaction
Charged to memcg /

Currently, free_unref_page() treats CMA pages as movable. So some MOVABLE allocations may use these CMA pages and pin them. Later, when CMA needs these pages, they fail to migrate.

free_unref_page()/free_unref_folios():

	migratetype = get_pfnblock_migratetype(page, pfn);
	if (unlikely(migratetype >= MIGRATE_PCPTYPES)) {
		migratetype = MIGRATE_MOVABLE;
	}

Best Regards,
Akash Tyagi
On 25.07.25 07:08, akash.tyagi wrote:
> [...]
>
> Currently, free_unref_page treats CMA pages as movable. So, some MOVABLE allocations may use these CMA pages and pinned them. Later, when CMA needs these pages, these pages failed to migrate.

MOVABLE allocations commonly fall back to CMA allocations, independent of pcp.

Long-term pinning is forbidden on MIGRATE_CMA pages. We had a bug recently fixed,
maybe you ran into that?

See

commit 517f496e1e61bd169d585dab4dd77e7147506322
Author: David Hildenbrand <david@redhat.com>
Date:   Wed Jun 11 15:13:14 2025 +0200

    mm/gup: revert "mm: gup: fix infinite loop within __get_longterm_locked"

    After commit 1aaf8c122918 ("mm: gup: fix infinite loop within
    __get_longterm_locked") we are able to longterm pin folios that are not
    supposed to get longterm pinned, simply because they temporarily have the
    LRU flag cleared (esp. temporarily isolated).

    For example, two __get_longterm_locked() callers can race, or
    __get_longterm_locked() can race with anything else that temporarily
    isolates folios.

But there is this known problem that CMA can fail temporarily due to
short-term pinnings. See the "reliable CMA" work (don't remember the exact name).

So what you proposed in the patch at least does not apply, I think.

-- 
Cheers,

David / dhildenb
On 25 Jul 2025, at 3:04, David Hildenbrand wrote:

> On 25.07.25 07:08, akash.tyagi wrote:
>> [...]
>
> [...]
>
> But there is this known problem that CMA can fail temporarily due to
> short-term pinnings. See the "reliable CMA" work (don't remember the exact name).

I think you mean Guaranteed CMA[1].

[1] https://lore.kernel.org/linux-mm/CAJuCfpEWVEqsivd7oTvp4foEho_HaD1XNP8KTeKWzG_X2skfGQ@mail.gmail.com/

Best Regards,
Yan, Zi
On Fri, 25 Jul 2025 at 10:27, Zi Yan <ziy@nvidia.com> wrote:
>> But there is this known problem that CMA can fail temporarily due to
>> short-term pinnings. See the "reliable CMA" work (don't remember the exact name).
>
> I think you mean Guaranteed CMA[1].
>
> [1] https://lore.kernel.org/linux-mm/CAJuCfpEWVEqsivd7oTvp4foEho_HaD1XNP8KTeKWzG_X2skfGQ@mail.gmail.com/
>
> Best Regards,
> Yan, Zi

Hi,

Yes, the issue I described is actually related to Guaranteed CMA[1].

I have rewritten our problem statement to address concerns more specifically related to the Android common kernels.

Problem statement:
Android common kernels usually carry an out-of-tree patch that prevents file-backed pages from being allocated from CMA.
To improve CMA utilization, it lets allocations with a lower chance of being pinned use CMA, controlled by the __GFP_CMA flag.
https://lore.kernel.org/lkml/cover.1604282969.git.cgoldswo@codeaurora.org/

Additionally, Android kernels create a CMA PCP list for pages of order less than PAGE_ALLOC_COSTLY_ORDER, but not for THP pages.
As a result, we noticed that some UNMOVABLE allocations were also served from CMA via the PCP list, because for THP there is a PCP list for MOVABLE and one for UNMOVABLE/RECLAIMABLE, but none for CMA.
So we moved CMA pages to the movable list for THP:

	movable = migratetype == MIGRATE_MOVABLE;
#ifdef CONFIG_CMA
	movable |= migratetype == MIGRATE_CMA;
#endif
	return NR_LOWORDER_PCP_LISTS + movable;

This fixes the issue of CMA pages being wrongly placed in the UNMOVABLE PCP list.
But now a GFP_MOVABLE allocation, even one without __GFP_CMA (the flag from the out-of-tree Android patch linked above), might pull that CMA page from the PCP list.
This breaks the intended use case of that patch, which is to let only allocations that set __GFP_CMA use CMA.

To address this, we have proposed introducing a CMA PCP list for THP pages as well.

I would appreciate your review and feedback on whether adding a new PCP list is a feasible approach from the Android common kernel perspective, because having many MIGRATE_CMA pages in the THP lists could cause several performance issues.

Best Regards,
Akash Tyagi
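
The difference between the interim workaround described above and the proposed dedicated list can be sketched as two THP index mappings. This is a userspace illustration only: the function names are made up for this example, and the enum values and PAGE_ALLOC_COSTLY_ORDER = 3 are assumptions modelled on mainline with CONFIG_CMA.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, modelled on mainline's enum migratetype with CONFIG_CMA=y. */
enum migratetype {
	MIGRATE_UNMOVABLE,
	MIGRATE_MOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_PCPTYPES,
	MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
	MIGRATE_CMA,
};

#define PAGE_ALLOC_COSTLY_ORDER	3	/* assumed */
#define NR_LOWORDER_PCP_LISTS	(MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1))

/* Interim workaround: CMA THPs are folded into the movable THP list. */
static unsigned int thp_pindex_workaround(int migratetype)
{
	bool movable = migratetype == MIGRATE_MOVABLE;

	movable |= migratetype == MIGRATE_CMA;
	return NR_LOWORDER_PCP_LISTS + movable;
}

/* Proposed patch: CMA THPs get their own, third THP list. */
static unsigned int thp_pindex_proposed(int migratetype)
{
	bool movable;

	if (migratetype == MIGRATE_CMA)
		return NR_LOWORDER_PCP_LISTS + 2;

	movable = migratetype == MIGRATE_MOVABLE;
	return NR_LOWORDER_PCP_LISTS + movable;
}

/* True when two migratetypes are served from the same THP PCP list. */
static bool shares_thp_list(unsigned int (*map)(int), int a, int b)
{
	return map(a) == map(b);
}
```

With the workaround, MOVABLE and CMA share one list, so any movable THP allocation can be handed a CMA page; with the proposed mapping the two are kept apart, which is the property the out-of-tree __GFP_CMA scheme relies on.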
On Tue, Jul 29, 2025 at 06:00:28PM +0530, akash.tyagi wrote:
> Additionally, android kernels create cma pcp list for pages less than PAGE_ALLOC_COSTLY_ORDER, but not for THP pages.

Why bother? If it's a CMA allocation, just free it back to CMA
straight away.
On 29.07.25 14:30, akash.tyagi wrote:
> [...]
>
> Additionally, android kernels create cma pcp list for pages less than PAGE_ALLOC_COSTLY_ORDER, but not for THP pages.

I'm afraid you have to fix this in the android kernels.

-- 
Cheers,

David / dhildenb