try_alloc_pages() will not attempt to allocate memory if the system has
*any* unaccepted memory. Memory is accepted as needed and can remain in
the system indefinitely, causing the interface to always fail.
Rather than immediately giving up, attempt to use already accepted
memory on free lists.
Pass 'alloc_flags' to cond_accept_memory() and do not accept new memory
for ALLOC_TRYLOCK requests.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: 97769a53f117 ("mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation")
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
---
mm/page_alloc.c | 28 +++++++++++++++-------------
1 file changed, 15 insertions(+), 13 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5669baf2a6fe..5fccf5fce084 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -290,7 +290,8 @@ EXPORT_SYMBOL(nr_online_nodes);
#endif
static bool page_contains_unaccepted(struct page *page, unsigned int order);
-static bool cond_accept_memory(struct zone *zone, unsigned int order);
+static bool cond_accept_memory(struct zone *zone, unsigned int order,
+ int alloc_flags);
static bool __free_unaccepted(struct page *page);
int page_group_by_mobility_disabled __read_mostly;
@@ -3616,7 +3617,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
}
}
- cond_accept_memory(zone, order);
+ cond_accept_memory(zone, order, alloc_flags);
/*
* Detect whether the number of free pages is below high
@@ -3643,7 +3644,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
gfp_mask)) {
int ret;
- if (cond_accept_memory(zone, order))
+ if (cond_accept_memory(zone, order, alloc_flags))
goto try_this_zone;
/*
@@ -3696,7 +3697,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
return page;
} else {
- if (cond_accept_memory(zone, order))
+ if (cond_accept_memory(zone, order, alloc_flags))
goto try_this_zone;
/* Try again if zone has deferred pages */
@@ -4849,7 +4850,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
goto failed;
}
- cond_accept_memory(zone, 0);
+ cond_accept_memory(zone, 0, alloc_flags);
retry_this_zone:
mark = wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK) + nr_pages;
if (zone_watermark_fast(zone, 0, mark,
@@ -4858,7 +4859,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
break;
}
- if (cond_accept_memory(zone, 0))
+ if (cond_accept_memory(zone, 0, alloc_flags))
goto retry_this_zone;
/* Try again if zone has deferred pages */
@@ -7284,7 +7285,8 @@ static inline bool has_unaccepted_memory(void)
return static_branch_unlikely(&zones_with_unaccepted_pages);
}
-static bool cond_accept_memory(struct zone *zone, unsigned int order)
+static bool cond_accept_memory(struct zone *zone, unsigned int order,
+ int alloc_flags)
{
long to_accept, wmark;
bool ret = false;
@@ -7295,6 +7297,10 @@ static bool cond_accept_memory(struct zone *zone, unsigned int order)
if (list_empty(&zone->unaccepted_pages))
return false;
+ /* Bailout, since try_to_accept_memory_one() needs to take a lock */
+ if (alloc_flags & ALLOC_TRYLOCK)
+ return false;
+
wmark = promo_wmark_pages(zone);
/*
@@ -7351,7 +7357,8 @@ static bool page_contains_unaccepted(struct page *page, unsigned int order)
return false;
}
-static bool cond_accept_memory(struct zone *zone, unsigned int order)
+static bool cond_accept_memory(struct zone *zone, unsigned int order,
+ int alloc_flags)
{
return false;
}
@@ -7422,11 +7429,6 @@ struct page *try_alloc_pages_noprof(int nid, unsigned int order)
if (!pcp_allowed_order(order))
return NULL;
-#ifdef CONFIG_UNACCEPTED_MEMORY
- /* Bailout, since try_to_accept_memory_one() needs to take a lock */
- if (has_unaccepted_memory())
- return NULL;
-#endif
/* Bailout, since _deferred_grow_zone() needs to take a lock */
if (deferred_pages_enabled())
return NULL;
--
2.47.2
On Tue, 6 May 2025 14:25:08 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote: > try_alloc_pages() will not attempt to allocate memory if the system has > *any* unaccepted memory. Memory is accepted as needed and can remain in > the system indefinitely, causing the interface to always fail. > > Rather than immediately giving up, attempt to use already accepted > memory on free lists. > > Pass 'alloc_flags' to cond_accept_memory() and do not accept new memory > for ALLOC_TRYLOCK requests. What are the userspace-visible effects, please? Was the omission of cc:stable intentional? I cannot locally determine this without the above info. If the cc:stable omission was indeed intentional then it would be better if this series was presented as two standalone patches.
On Tue, May 06, 2025 at 05:00:34PM -0700, Andrew Morton wrote: > On Tue, 6 May 2025 14:25:08 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote: > > > try_alloc_pages() will not attempt to allocate memory if the system has > > *any* unaccepted memory. Memory is accepted as needed and can remain in > > the system indefinitely, causing the interface to always fail. > > > > Rather than immediately giving up, attempt to use already accepted > > memory on free lists. > > > > Pass 'alloc_flags' to cond_accept_memory() and do not accept new memory > > for ALLOC_TRYLOCK requests. > > What are the userspace-visible effects, please? I cannot say I fully understand the implications. The interface obviously allows failure, but the caller might expect eventual success on retry. So far, only BPF uses the interface. Maybe Alexei can comment on what will happen if the function always fails. I noticed the issue by code analysis because the second patch removes has_unaccepted_memory(). > Was the omission of cc:stable intentional? I cannot locally determine > this without the above info. > > If the cc:stable omission was indeed intentional then it would be better > if this series was presented as two standalone patches. Given that the second patch cannot be applied to current Linus' tree without this one, it is better to add stable@. -- Kiryl Shutsemau / Kirill A. Shutemov
On Tue, May 6, 2025 at 4:25 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> try_alloc_pages() will not attempt to allocate memory if the system has
> *any* unaccepted memory. Memory is accepted as needed and can remain in
> the system indefinitely, causing the interface to always fail.
>
> Rather than immediately giving up, attempt to use already accepted
> memory on free lists.
>
> Pass 'alloc_flags' to cond_accept_memory() and do not accept new memory
> for ALLOC_TRYLOCK requests.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Fixes: 97769a53f117 ("mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation")
Thanks for working on this, but the fixes tag is overkill.
This limitation is not causing any issues in our setups.
Improving it is certainly better, of course.
Acked-by: Alexei Starovoitov <ast@kernel.org>
On Tue, May 06, 2025 at 10:18:21AM -0700, Alexei Starovoitov wrote:
> On Tue, May 6, 2025 at 4:25 AM Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> >
> > try_alloc_pages() will not attempt to allocate memory if the system has
> > *any* unaccepted memory. Memory is accepted as needed and can remain in
> > the system indefinitely, causing the interface to always fail.
> >
> > Rather than immediately giving up, attempt to use already accepted
> > memory on free lists.
> >
> > Pass 'alloc_flags' to cond_accept_memory() and do not accept new memory
> > for ALLOC_TRYLOCK requests.
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Fixes: 97769a53f117 ("mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation")
>
> Thanks for working on this, but the fixes tag is overkill.
> This limitation is not causing any issues in our setups.
Have you had chance to test it on any platform with unaccepted memory?
So far it is only Intel TDX and AMD SEV guests.
> Improving it is certainly better, of course.
> Acked-by: Alexei Starovoitov <ast@kernel.org>
Thanks!
--
Kiryl Shutsemau / Kirill A. Shutemov
On Tue, May 6, 2025 at 12:00 PM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> On Tue, May 06, 2025 at 10:18:21AM -0700, Alexei Starovoitov wrote:
> > On Tue, May 6, 2025 at 4:25 AM Kirill A. Shutemov
> > <kirill.shutemov@linux.intel.com> wrote:
> > >
> > > try_alloc_pages() will not attempt to allocate memory if the system has
> > > *any* unaccepted memory. Memory is accepted as needed and can remain in
> > > the system indefinitely, causing the interface to always fail.
> > >
> > > Rather than immediately giving up, attempt to use already accepted
> > > memory on free lists.
> > >
> > > Pass 'alloc_flags' to cond_accept_memory() and do not accept new memory
> > > for ALLOC_TRYLOCK requests.
> > >
> > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > Fixes: 97769a53f117 ("mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation")
> >
> > Thanks for working on this, but the fixes tag is overkill.
> > This limitation is not causing any issues in our setups.
>
> Have you had chance to test it on any platform with unaccepted memory?
> So far it is only Intel TDX and AMD SEV guests.
We don't use them, and my understanding is that such
unaccepted memory will be there only during boot time.
On Tue, May 06, 2025 at 03:05:31PM -0700, Alexei Starovoitov wrote:
> On Tue, May 6, 2025 at 12:00 PM Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> >
> > On Tue, May 06, 2025 at 10:18:21AM -0700, Alexei Starovoitov wrote:
> > > On Tue, May 6, 2025 at 4:25 AM Kirill A. Shutemov
> > > <kirill.shutemov@linux.intel.com> wrote:
> > > >
> > > > try_alloc_pages() will not attempt to allocate memory if the system has
> > > > *any* unaccepted memory. Memory is accepted as needed and can remain in
> > > > the system indefinitely, causing the interface to always fail.
> > > >
> > > > Rather than immediately giving up, attempt to use already accepted
> > > > memory on free lists.
> > > >
> > > > Pass 'alloc_flags' to cond_accept_memory() and do not accept new memory
> > > > for ALLOC_TRYLOCK requests.
> > > >
> > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > > Fixes: 97769a53f117 ("mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation")
> > >
> > > Thanks for working on this, but the fixes tag is overkill.
> > > This limitation is not causing any issues in our setups.
> >
> > Have you had chance to test it on any platform with unaccepted memory?
> > So far it is only Intel TDX and AMD SEV guests.
>
> We don't use them, and my understanding is that such
> unaccepted memory will be there only during boot time.
That's false. Unaccepted memory can be there indefinitely after boot. It
only gets accepted on demand.
--
Kiryl Shutsemau / Kirill A. Shutemov
On Tue May 6, 2025 at 11:25 AM UTC, Kirill A. Shutemov wrote: > + /* Bailout, since try_to_accept_memory_one() needs to take a lock */ > + if (alloc_flags & ALLOC_TRYLOCK) > + return false; > + Quick lazy question: why don't we just trylock it like we do for the zone lock?
On Tue, May 06, 2025 at 01:20:25PM +0000, Brendan Jackman wrote: > On Tue May 6, 2025 at 11:25 AM UTC, Kirill A. Shutemov wrote: > > + /* Bailout, since try_to_accept_memory_one() needs to take a lock */ > > + if (alloc_flags & ALLOC_TRYLOCK) > > + return false; > > + > > Quick lazy question: why don't we just trylock it like we do for the zone > lock? It is not only zone lock. There's also unaccepted_memory_lock inside accept_memory(). -- Kiryl Shutsemau / Kirill A. Shutemov
On Tue May 6, 2025 at 1:34 PM UTC, Kirill A. Shutemov wrote: > On Tue, May 06, 2025 at 01:20:25PM +0000, Brendan Jackman wrote: >> On Tue May 6, 2025 at 11:25 AM UTC, Kirill A. Shutemov wrote: >> > + /* Bailout, since try_to_accept_memory_one() needs to take a lock */ >> > + if (alloc_flags & ALLOC_TRYLOCK) >> > + return false; >> > + >> >> Quick lazy question: why don't we just trylock it like we do for the zone >> lock? > > It is not only zone lock. There's also unaccepted_memory_lock inside > accept_memory(). Right, but my lazy question was why can't we "just" trylock that too? But anyway, that's no use because if we win the trylock we'd still have to do __free_pages_ok().
© 2016 - 2025 Red Hat, Inc.