Commit 524c48072e56 ("mm/page_alloc: rename ALLOC_HIGH to
ALLOC_MIN_RESERVE") is the start of a series that explains how __GFP_HIGH,
which implies ALLOC_MIN_RESERVE, is going to be used instead of
__GFP_ATOMIC for high atomic reserves.
Commit eb2e2b425c69 ("mm/page_alloc: explicitly record high-order atomic
allocations in alloc_flags") introduced ALLOC_HIGHATOMIC for such
allocations of order higher than 0. It still used __GFP_ATOMIC, though.
Then, commit 1ebbb21811b7 ("mm/page_alloc: explicitly define how __GFP_HIGH
non-blocking allocations accesses reserves") just turned that check for
!__GFP_DIRECT_RECLAIM, ignoring that high atomic reserves were expected to
test for __GFP_HIGH.
This leads to high atomic reserves being added for high-order GFP_NOWAIT
allocations and others that clear __GFP_DIRECT_RECLAIM, which is
unexpected. Later, those reserves lead to 0-order allocations going to the
slow path and starting reclaim.
From /proc/pagetypeinfo, without the patch:
Node    0, zone      DMA, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type   HighAtomic      1      8     10      9      7      3      0      0      0      0      0
Node    0, zone   Normal, type   HighAtomic     64     20     12      5      0      0      0      0      0      0      0
With the patch:
Node    0, zone      DMA, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Fixes: 1ebbb21811b7 ("mm/page_alloc: explicitly define how __GFP_HIGH non-blocking allocations accesses reserves")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Tested-by: Helen Koike <koike@igalia.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: NeilBrown <neilb@suse.de>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2ef3c07266b3..bf52e3bef626 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4219,7 +4219,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
 		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
 			alloc_flags |= ALLOC_NON_BLOCK;
 
-			if (order > 0)
+			if (order > 0 && (alloc_flags & ALLOC_MIN_RESERVE))
 				alloc_flags |= ALLOC_HIGHATOMIC;
 		}
 
--
2.47.2
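
The HighAtomic rows quoted in the patch above can be turned into a page count. The sketch below is a hypothetical userspace helper, not part of the patch: highatomic_free_pages() and MODEL_MAX_ORDERS are names invented here, and it assumes the row layout shown above, where column i holds the number of free blocks of order i. It returns the number of free base pages on a HighAtomic row, or -1 for any other line.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical cap on the number of order columns this sketch will read. */
#define MODEL_MAX_ORDERS 16

/*
 * Parse one row of the "Free pages count per migrate type" table.
 * Returns the free base pages on a "type HighAtomic" row, else -1.
 */
static long highatomic_free_pages(const char *line)
{
	static const char type_name[] = "HighAtomic";
	const char *p;
	long total = 0;
	unsigned int order;

	/* Data rows start with "Node"; header lines do not. */
	if (strncmp(line, "Node", 4) != 0)
		return -1;

	p = strstr(line, "type");
	if (!p)
		return -1;
	p += strlen("type");
	while (*p == ' ' || *p == '\t')
		p++;

	if (strncmp(p, type_name, strlen(type_name)) != 0)
		return -1;
	p += strlen(type_name);

	/* A free block of order N spans 2^N base pages. */
	for (order = 0; order < MODEL_MAX_ORDERS; order++) {
		char *end;
		unsigned long count = strtoul(p, &end, 10);

		if (end == p)	/* no more numeric columns */
			break;
		total += (long)(count << order);
		p = end;
	}

	return total;
}
```

Fed the "without the patch" rows, this gives 337 free pages for DMA32 and 192 for Normal. A caller would typically pass it each line read with fgets() from /proc/pagetypeinfo, which may require root on recent kernels.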
On 8/14/25 19:22, Thadeu Lima de Souza Cascardo wrote:
[..]
> Fixes: 1ebbb21811b7 ("mm/page_alloc: explicitly define how __GFP_HIGH non-blocking allocations accesses reserves")
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
> Tested-by: Helen Koike <koike@igalia.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: NeilBrown <neilb@suse.de>
> Cc: Thierry Reding <thierry.reding@gmail.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>

Agreed with others that this change matches the original intention and it
must have been an oversight. Also found nothing to the contrary in the
original threads.

[..]

On 8/29/25 10:36, Vlastimil Babka wrote:
> On 8/14/25 19:22, Thadeu Lima de Souza Cascardo wrote:
[..]
> Agreed with others that this change matches the original intention and it
> must have been an oversight. Also found nothing to the contrary in the
> original threads.

Oops, forgot to add

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

[..]

On (25/08/29 10:38), Vlastimil Babka wrote:
> On 8/29/25 10:36, Vlastimil Babka wrote:
> > On 8/14/25 19:22, Thadeu Lima de Souza Cascardo wrote:
[..]
> > Agreed with others that this change matches the original intention and it
> > must have been an oversight. Also found nothing to the contrary in the
> > original threads.
>
> Oops, forgot to add
>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Thank you!

FWIW
Tested-by: Sergey Senozhatsky <senozhatsky@chromium.org>

This needs Cc: stable@vger.kernel.org all the way down to 5.15,
as far as I can tell.

On 8/29/25 10:56, Sergey Senozhatsky wrote:
> On (25/08/29 10:38), Vlastimil Babka wrote:
>> On 8/29/25 10:36, Vlastimil Babka wrote:
>> > On 8/14/25 19:22, Thadeu Lima de Souza Cascardo wrote:
> [..]
>> > Agreed with others that this change matches the original intention and it
>> > must have been an oversight. Also found nothing to the contrary in the
>> > original threads.
>>
>> Oops, forgot to add
>>
>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>
> Thank you!
>
> FWIW
> Tested-by: Sergey Senozhatsky <senozhatsky@chromium.org>
>
> This needs Cc: stable@vger.kernel.org all the way down to 5.15,
> as far as I can tell.

If that problem bothers users of LTS kernels in that range, we can do that.
I wonder a bit how it was only found out now as a regression in the
browser/desktop environment test if it's that old? Is there another factor,
i.e. some new frequent source of allocations that trigger it?

On Fri, Aug 29, 2025 at 11:30:17AM +0200, Vlastimil Babka wrote:
[..]
> If that problem bothers users of LTS kernels in that range, we can do that.
> I wonder a bit how it was only found out now as a regression in the
> browser/desktop environment test if it's that old? Is there another factor
> i.e. some new frequent source of allocations that trigger it?

No, there is not. It is just that there was an upgrade all the way from 5.4
and then I caught this while doing some code inspection and reviewing the
patchset I referred to.

Well, I also tested that it really happens and caught a unix socket skb
allocation triggering that as it masks off __GFP_DIRECT_RECLAIM.

Cascardo.

On Thu 14-08-25 14:22:45, Thadeu Lima de Souza Cascardo wrote:
[..]
> This leads to high atomic reserves being added for high-order GFP_NOWAIT
> allocations and others that clear __GFP_DIRECT_RECLAIM, which is
> unexpected. Later, those reserves lead to 0-order allocations going to the
> slow path and starting reclaim.
[..]

Yes, this makes a lot of sense to me. GFP_NOWAIT allocations should be
opportunistic and quick to fail rather than dipping into memory reserves.
We must have overlooked that during the review.

Acked-by: Michal Hocko <mhocko@suse.com>

[..]
-- 
Michal Hocko
SUSE Labs

On (25/08/14 14:22), Thadeu Lima de Souza Cascardo wrote:
[..]
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2ef3c07266b3..bf52e3bef626 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4219,7 +4219,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
>  		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
>  			alloc_flags |= ALLOC_NON_BLOCK;
>
> -			if (order > 0)
> +			if (order > 0 && (alloc_flags & ALLOC_MIN_RESERVE))
>  				alloc_flags |= ALLOC_HIGHATOMIC;
>  		}

From my limited understanding, it does look like this was the intention.
Vlastimil, Mel, got a minute to take a look?

Hello Thadeu,

On Thu, Aug 14, 2025 at 02:22:45PM -0300, Thadeu Lima de Souza Cascardo wrote:
> Commit 524c48072e56 ("mm/page_alloc: rename ALLOC_HIGH to
> ALLOC_MIN_RESERVE") is the start of a series that explains how __GFP_HIGH,
> which implies ALLOC_MIN_RESERVE, is going to be used instead of
> __GFP_ATOMIC for high atomic reserves.
>
> Commit eb2e2b425c69 ("mm/page_alloc: explicitly record high-order atomic
> allocations in alloc_flags") introduced ALLOC_HIGHATOMIC for such
> allocations of order higher than 0. It still used __GFP_ATOMIC, though.
>
> Then, commit 1ebbb21811b7 ("mm/page_alloc: explicitly define how __GFP_HIGH
> non-blocking allocations accesses reserves") just turned that check for
> !__GFP_DIRECT_RECLAIM, ignoring that high atomic reserves were expected to
> test for __GFP_HIGH.

It indeed looks accidental. From the cover letter,

    High-order atomic allocations are explicitly handled with the caveat that
    no __GFP_ATOMIC flag means that any high-order allocation that specifies
    GFP_HIGH and cannot enter direct reclaim will be treated as if it was
    GFP_ATOMIC.

it sounds like the intent was what your patch does, and not to extend
those privileges to anybody who is !gfp_direct_reclaim.

> This leads to high atomic reserves being added for high-order GFP_NOWAIT
> allocations and others that clear __GFP_DIRECT_RECLAIM, which is
> unexpected. Later, those reserves lead to 0-order allocations going to the
> slow path and starting reclaim.

Can you please provide more background on the workload and the
environment in which you observed this?

Which GFP_NOWAIT requests you saw participating in the reserves etc.

I would feel better with Mel or Vlastimil chiming in as well, but your
fix looks correct to me.

On Thu, Aug 14, 2025 at 04:12:11PM -0400, Johannes Weiner wrote:
[..]
> Can you please provide more background on the workload and the
> environment in which you observed this?
>
> Which GFP_NOWAIT requests you saw participating in the reserves etc.
>
> I would feel better with Mel or Vlastimil chiming in as well, but your
> fix looks correct to me.

Thanks for the review, Johannes.

This has been observed in a browser/desktop environment test, where we have
noticed some memory pressure regression. This change alone does not make
the regression go away entirely, but it improves it.

I noticed some unix skb allocation going on and I found this at
net/core/skbuff.c:alloc_skb_with_frags:

	page = alloc_pages((gfp_mask & ~__GFP_DIRECT_RECLAIM) |
			   __GFP_COMP |
			   __GFP_NOWARN,
			   order);

But I tested this on a simple VM with the simplest workload (no swap,
writing to tmpfs) and it triggered with xarrays. At lib/xarray.c:xas_alloc:

	gfp_t gfp = GFP_NOWAIT | __GFP_NOWARN;

	if (xas->xa->xa_flags & XA_FLAGS_ACCOUNT)
		gfp |= __GFP_ACCOUNT;

	node = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);

Where radix_tree_node_cachep, on that VM, uses a 4-page slab.

I tested with something like:

	if (order > 0) {
		WARN_ON_ONCE(!(alloc_flags & ALLOC_MIN_RESERVE));
		alloc_flags |= ALLOC_HIGHATOMIC;
	}

Thanks.
Cascardo.
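
To make the effect of the one-line change concrete, here is a minimal userspace sketch of the non-blocking branch of gfp_to_alloc_flags(). It is not kernel code: the MODEL_* bit values and the model_* function names are hypothetical, chosen only for this illustration, and everything outside that branch is left out. The patched parameter switches between the old "order > 0" check and the new one that also requires ALLOC_MIN_RESERVE.

```c
#include <stdbool.h>

/* Hypothetical bit values, chosen only for this userspace model. */
#define MODEL_GFP_HIGH			0x01u
#define MODEL_GFP_DIRECT_RECLAIM	0x02u
#define MODEL_GFP_NOMEMALLOC		0x04u

#define MODEL_ALLOC_MIN_RESERVE		0x01u
#define MODEL_ALLOC_NON_BLOCK		0x02u
#define MODEL_ALLOC_HIGHATOMIC		0x04u

/*
 * Simplified model of the non-blocking branch of gfp_to_alloc_flags().
 * "patched" selects between the old and the new order > 0 check.
 */
static unsigned int model_alloc_flags(unsigned int gfp, unsigned int order,
				      bool patched)
{
	unsigned int alloc_flags = 0;

	/* __GFP_HIGH implies ALLOC_MIN_RESERVE, as the commit message says. */
	if (gfp & MODEL_GFP_HIGH)
		alloc_flags |= MODEL_ALLOC_MIN_RESERVE;

	if (!(gfp & MODEL_GFP_DIRECT_RECLAIM) &&
	    !(gfp & MODEL_GFP_NOMEMALLOC)) {
		alloc_flags |= MODEL_ALLOC_NON_BLOCK;

		/* Old check: order only. New check: order and MIN_RESERVE. */
		if (order > 0 &&
		    (!patched || (alloc_flags & MODEL_ALLOC_MIN_RESERVE)))
			alloc_flags |= MODEL_ALLOC_HIGHATOMIC;
	}

	return alloc_flags;
}

/* Convenience wrapper: would this request use the high atomic reserve? */
static bool model_is_highatomic(unsigned int gfp, unsigned int order,
				bool patched)
{
	return model_alloc_flags(gfp, order, patched) & MODEL_ALLOC_HIGHATOMIC;
}
```

Under this model, a GFP_NOWAIT-like request (neither MODEL_GFP_HIGH nor MODEL_GFP_DIRECT_RECLAIM set) of order 2 gets MODEL_ALLOC_HIGHATOMIC only with patched == false, while a GFP_ATOMIC-like request (MODEL_GFP_HIGH set) gets it in both cases. That matches the two callers mentioned above, which clear or never set __GFP_DIRECT_RECLAIM without setting __GFP_HIGH.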