From: Kairui Song <kasong@tencent.com>
There are some problems with the code implementation of the THP fallback.
suitable_orders could be zero, and calling highest_order() on a zero value
returns an invalid order. The order check loop also updates the index on
every iteration, which may leave the index aligned to a larger value while
the loop shrinks the order. It also forgot to try order 0 after the final
loop.

This is usually fine because shmem_add_to_page_cache() ensures the shmem
mapping is still sane, but it might cause many potential issues, like
allocating folios at random positions in the mapping or returning -ENOMEM
by accident. This triggered some strange userspace errors [1], and
shouldn't have happened in the first place.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/linux-mm/CAMgjq7DqgAmj25nDUwwu1U2cSGSn8n4-Hqpgottedy0S6YYeUw@mail.gmail.com/ [1]
Fixes: e7a2ab7b3bb5d ("mm: shmem: add mTHP support for anonymous shmem")
Signed-off-by: Kairui Song <kasong@tencent.com>
---
mm/shmem.c | 26 +++++++++++++++-----------
1 file changed, 15 insertions(+), 11 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index b50ce7dbc84a..25303711f123 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1824,6 +1824,9 @@ static unsigned long shmem_suitable_orders(struct inode *inode, struct vm_fault
         unsigned long pages;
         int order;
 
+        if (!orders)
+                return 0;
+
         if (vma) {
                 orders = thp_vma_suitable_orders(vma, vmf->address, orders);
                 if (!orders)
@@ -1888,27 +1891,28 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
         if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
                 orders = 0;
 
-        if (orders > 0) {
-                suitable_orders = shmem_suitable_orders(inode, vmf,
-                                                        mapping, index, orders);
+        suitable_orders = shmem_suitable_orders(inode, vmf,
+                                                mapping, index, orders);
 
+        if (suitable_orders) {
                 order = highest_order(suitable_orders);
-                while (suitable_orders) {
+                do {
                         pages = 1UL << order;
-                        index = round_down(index, pages);
-                        folio = shmem_alloc_folio(gfp, order, info, index);
-                        if (folio)
+                        folio = shmem_alloc_folio(gfp, order, info, round_down(index, pages));
+                        if (folio) {
+                                index = round_down(index, pages);
                                 goto allocated;
+                        }
 
                         if (pages == HPAGE_PMD_NR)
                                 count_vm_event(THP_FILE_FALLBACK);
                         count_mthp_stat(order, MTHP_STAT_SHMEM_FALLBACK);
                         order = next_order(&suitable_orders, order);
-                }
-        } else {
-                pages = 1;
-                folio = shmem_alloc_folio(gfp, 0, info, index);
+                } while (suitable_orders);
         }
+
+        pages = 1;
+        folio = shmem_alloc_folio(gfp, 0, info, index);
         if (!folio)
                 return ERR_PTR(-ENOMEM);
 
--
2.51.0
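
To make the index-alignment problem described in the commit message concrete, here is a
minimal userspace sketch; it is not kernel code, round_down() is a simplified stand-in,
and the fault index and fallback orders are made-up values. It walks the orders the way
the current loop does, rounding the index down on every iteration, and shows that once a
large-order attempt has clamped the index, a later small-order attempt lands at a
position that no longer covers the faulting index:

#include <stdio.h>

/* Userspace stand-in for the kernel's round_down() (power-of-two sizes only). */
#define round_down(x, y)  ((x) & ~((unsigned long)(y) - 1))

int main(void)
{
	const unsigned long fault_index = 20;	/* page index the fault is for */
	const int orders[] = { 4, 2, 0 };	/* shrinking fallback orders */
	unsigned long index = fault_index;
	unsigned long pages = 1;

	/* Buggy loop shape: index is rounded down on every iteration. */
	for (int i = 0; i < 3; i++) {
		pages = 1UL << orders[i];
		index = round_down(index, pages);
		printf("order %d: would insert folio at [%lu, %lu)\n",
		       orders[i], index, index + pages);
	}

	/* Suppose only the final (order 0) attempt succeeded. */
	if (fault_index < index || fault_index >= index + pages)
		printf("bug: folio [%lu, %lu) does not cover fault index %lu\n",
		       index, index + pages, fault_index);
	return 0;
}

With fault_index = 20 and orders 4/2/0, the failed order-4 attempt clamps the index to
16, so the later attempts try [16, 20) and [16, 17), neither of which contains page 20.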
On 2025/10/22 03:04, Kairui Song wrote:
> From: Kairui Song <kasong@tencent.com>
>
> There are some problems with the code implementation of the THP fallback.
> suitable_orders could be zero, and calling highest_order() on a zero value
> returns an invalid order. The order check loop also updates the index on
> every iteration, which may leave the index aligned to a larger value while
> the loop shrinks the order.

No, this is not true. Although 'suitable_orders' might be 0, it will not
enter the 'while (suitable_orders)' loop, and 'order' will not be used
(this is also how the highest_order() function is used in other places).

> It also forgot to try order 0 after the final loop.

This is also not true. We will fall back to an order-0 allocation in
shmem_get_folio_gfp() if the large order allocation fails.

> This is usually fine because shmem_add_to_page_cache() ensures the shmem
> mapping is still sane, but it might cause many potential issues, like
> allocating folios at random positions in the mapping or returning -ENOMEM
> by accident. This triggered some strange userspace errors [1], and
> shouldn't have happened in the first place.

I tested tmpfs with swap on ZRAM in the mm-new branch, and no issues
were encountered. However, it is worth mentioning that, after commit
69e0a3b49003 ("mm: shmem: fix the strategy for the tmpfs 'huge='
options"), tmpfs may consume more memory (because we no longer allocate
large folios based on the write size), which might cause your test case
to run out of memory (OOM) and trigger some errors. You may need to
adjust your swap size or memcg limit.
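
For reference, a small userspace sketch of the order-walk helpers being discussed; the
fls_long()-based definitions below are assumed stand-ins for the kernel's
highest_order()/next_order(), not copies of the headers. It shows that highest_order(0)
evaluates to -1, which is harmless as long as the while (suitable_orders) guard keeps
the value from ever being used; the if (!orders) check added by the patch only avoids
the redundant call.

#include <stdio.h>

/* Userspace stand-ins for the kernel helpers (assumed fls_long()-based). */
static int fls_long(unsigned long x)
{
	return x ? (int)(8 * sizeof(unsigned long)) - __builtin_clzl(x) : 0;
}

static int highest_order(unsigned long orders)
{
	return fls_long(orders) - 1;
}

static int next_order(unsigned long *orders, int prev)
{
	*orders &= ~(1UL << prev);
	return highest_order(*orders);
}

int main(void)
{
	unsigned long suitable_orders = 0;
	int order = highest_order(suitable_orders);

	/* With a zero mask the result is -1 ... */
	printf("highest_order(0) = %d\n", order);

	/* ... but the loop guard means it is never used as a shift count. */
	while (suitable_orders) {
		printf("trying order %d\n", order);
		order = next_order(&suitable_orders, order);
	}

	/* A non-zero mask is walked from the highest set bit downwards. */
	suitable_orders = 0x16;	/* bits 4, 2, 1 */
	order = highest_order(suitable_orders);
	while (suitable_orders) {
		printf("trying order %d\n", order);
		order = next_order(&suitable_orders, order);
	}
	return 0;
}

Running it prints -1 for the zero mask and then walks the non-zero mask as 4, 2, 1.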
On Wed, Oct 22, 2025 at 9:25 AM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
> On 2025/10/22 03:04, Kairui Song wrote:
> > From: Kairui Song <kasong@tencent.com>
> >
> > There are some problems with the code implementation of the THP fallback.
> > suitable_orders could be zero, and calling highest_order() on a zero value
> > returns an invalid order. The order check loop also updates the index on
> > every iteration, which may leave the index aligned to a larger value while
> > the loop shrinks the order.
>
> No, this is not true. Although 'suitable_orders' might be 0, it will not
> enter the 'while (suitable_orders)' loop, and 'order' will not be used
> (this is also how the highest_order() function is used in other places).

Maybe I shouldn't have mixed the trivial issue with the real issue here;
sorry, my bad, I was in a hurry :P. I mean that if suitable_orders is zero
we should just skip calling highest_order(), since that returns a negative
value. It's not causing an issue, it's just redundant.

> > It also forgot to try order 0 after the final loop.
>
> This is also not true. We will fall back to an order-0 allocation in
> shmem_get_folio_gfp() if the large order allocation fails.

I thought that after the fix we could simplify the code, and maybe reduce
the calls to shmem_alloc_and_add_folio to only one so it will be inlined
by the compiler. On second thought, some more changes are needed to
respect the huge_gfp. Maybe I should send a series that splits the hot
fix from the cleanups.

But the index being modified during the loop does need a fix, I think,
so for the fix part we just need:

diff --git a/mm/shmem.c b/mm/shmem.c
index 29e1eb690125..e89ae4dd6859 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1895,10 +1895,11 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
                 order = highest_order(suitable_orders);
                 while (suitable_orders) {
                         pages = 1UL << order;
-                        index = round_down(index, pages);
-                        folio = shmem_alloc_folio(gfp, order, info, index);
-                        if (folio)
+                        folio = shmem_alloc_folio(gfp, order, info, round_down(index, pages));
+                        if (folio) {
+                                index = round_down(index, pages);
                                 goto allocated;
+                        }
 
                         if (pages == HPAGE_PMD_NR)
                                 count_vm_event(THP_FILE_FALLBACK);

> > This is usually fine because shmem_add_to_page_cache() ensures the shmem
> > mapping is still sane, but it might cause many potential issues, like
> > allocating folios at random positions in the mapping or returning -ENOMEM
> > by accident. This triggered some strange userspace errors [1], and
> > shouldn't have happened in the first place.
>
> I tested tmpfs with swap on ZRAM in the mm-new branch, and no issues
> were encountered. However, it is worth mentioning that, after commit
> 69e0a3b49003 ("mm: shmem: fix the strategy for the tmpfs 'huge='
> options"), tmpfs may consume more memory (because we no longer allocate
> large folios based on the write size), which might cause your test case
> to run out of memory (OOM) and trigger some errors. You may need to
> adjust your swap size or memcg limit.

I'm not seeing any OOM issue, and I tested with different memory sizes;
the error occurs with every setup. If I compare the built objects with
the ones from before 69e0a3b49003, I can see that the build output is
corrupted. Compare using hexdump & diff on any .o generated by gcc:

--- GOOD    2025-10-21 12:17:44.121000287 +0000
+++ BAD     2025-10-21 12:18:01.094000870 +0000
@@ -3371,425 +3371,7 @@
...
<summary: whole chunk of data & symbols missing>
...
The GCC compile didn't fail because the problem occurs when writing the
objects; instead LD failed, and not due to OOM. The reproduction rate is
very high with different setups; I'm not sure why it didn't occur on your
setup. I suspect it's related to how different gcc versions handle write
(not fault)? With 69e0a3b49003 reverted, the build is OK. 69e0a3b49003
isn't the root cause, but it triggered this. The minimized fix posted
above also fixes the problem just fine.
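
As a counterpart to the earlier sketch, here is the same made-up scenario with the loop
shape of the minimized fix: the index is only rounded down once an allocation actually
succeeds. alloc_succeeds() is a fake policy (only order 2 succeeds) used purely for
illustration; the point is that the successful folio is placed at
round_down(fault_index, pages) and therefore always covers the faulting index.

#include <stdio.h>
#include <stdbool.h>

#define round_down(x, y)  ((x) & ~((unsigned long)(y) - 1))

/* Hypothetical allocation policy: pretend only the order-2 attempt succeeds. */
static bool alloc_succeeds(int order)
{
	return order == 2;
}

int main(void)
{
	const unsigned long fault_index = 20;
	const int orders[] = { 4, 2, 0 };
	unsigned long index = fault_index;

	/* Fixed loop shape: index is only updated once an allocation succeeds. */
	for (int i = 0; i < 3; i++) {
		unsigned long pages = 1UL << orders[i];
		unsigned long aligned = round_down(index, pages);

		if (alloc_succeeds(orders[i])) {
			index = aligned;
			printf("order %d succeeded: folio [%lu, %lu) covers fault index %lu\n",
			       orders[i], index, index + pages, fault_index);
			break;
		}
		printf("order %d failed, falling back\n", orders[i]);
	}
	return 0;
}

Here the order-4 attempt fails and the order-2 attempt places the folio at [20, 24),
which contains the faulting page 20.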