[RFC 6/7] mm/vmalloc: Support non-blocking GFP flags in __vmalloc_area_node()

Uladzislau Rezki (Sony) posted 7 patches 3 months ago
Posted by Uladzislau Rezki (Sony) 3 months ago
This patch makes __vmalloc_area_node() correctly handle non-blocking
allocation requests, such as GFP_ATOMIC and GFP_NOWAIT. Main changes:

- the nested_gfp flags follow the same non-blocking constraints
  as the primary gfp_mask, ensuring consistency and avoiding
  sleeping allocations in atomic contexts;

- if blocking is not allowed, __GFP_NOFAIL is forcibly cleared
  and a warning is issued if it was set, since __GFP_NOFAIL is
  incompatible with non-blocking contexts;

- __GFP_HIGHMEM is added to gfp_mask only for blocking requests,
  and only if there are no DMA constraints;

- in non-blocking mode, memalloc_noreclaim_save/restore() is used
  to prevent reclaim-related operations that may sleep while
  setting up page tables or mapping pages.

This is particularly important for page table allocations that
internally use GFP_PGTABLE_KERNEL, which may sleep unless such
scope restrictions are applied. For example:

<snip>
    #define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO)

    __pte_alloc_kernel()
        pte_alloc_one_kernel(&init_mm);
            pagetable_alloc_noprof(GFP_PGTABLE_KERNEL & ~__GFP_HIGHMEM, 0);
<snip>

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 2eaff0575a9e..fe1699e01e02 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3711,7 +3711,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 				 pgprot_t prot, unsigned int page_shift,
 				 int node)
 {
-	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
+	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
 	bool nofail = gfp_mask & __GFP_NOFAIL;
 	unsigned long addr = (unsigned long)area->addr;
 	unsigned long size = get_vm_area_size(area);
@@ -3719,12 +3719,28 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	unsigned int nr_small_pages = size >> PAGE_SHIFT;
 	unsigned int page_order;
 	unsigned int flags;
+	bool noblock;
 	int ret;
 
 	array_size = (unsigned long)nr_small_pages * sizeof(struct page *);
+	noblock = !gfpflags_allow_blocking(gfp_mask);
 
-	if (!(gfp_mask & (GFP_DMA | GFP_DMA32)))
-		gfp_mask |= __GFP_HIGHMEM;
+	if (noblock) {
+		/* __GFP_NOFAIL is incompatible with non-blocking contexts. */
+		WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
+		gfp_mask &= ~__GFP_NOFAIL;
+
+		/*
+		 * In non-sleeping contexts, ensure nested allocations follow
+		 * same non-blocking rules.
+		 */
+		nested_gfp = gfp_mask | __GFP_ZERO;
+		nofail = false;
+	} else {
+		/* Allow highmem allocations if there are no DMA constraints. */
+		if (!(gfp_mask & (GFP_DMA | GFP_DMA32)))
+			gfp_mask |= __GFP_HIGHMEM;
+	}
 
 	/* Please note that the recursion is strictly bounded. */
 	if (array_size > PAGE_SIZE) {
@@ -3788,7 +3804,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	 * page tables allocations ignore external gfp mask, enforce it
 	 * by the scope API
 	 */
-	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
+	if (noblock)
+		flags = memalloc_noreclaim_save();
+	else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
 		flags = memalloc_nofs_save();
 	else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == 0)
 		flags = memalloc_noio_save();
@@ -3800,7 +3818,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 			schedule_timeout_uninterruptible(1);
 	} while (nofail && (ret < 0));
 
-	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
+	if (noblock)
+		memalloc_noreclaim_restore(flags);
+	else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
 		memalloc_nofs_restore(flags);
 	else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == 0)
 		memalloc_noio_restore(flags);
-- 
2.39.5
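
For readers following the flag handling, the second hunk above can be
modeled in user space. This is only a sketch: the bit values and the
GFP_* composites below are simplified placeholders rather than the
kernel's real definitions, GFP_RECLAIM_MASK is reduced to a subset, the
WARN_ON_ONCE() is left out, and struct vmalloc_gfp and derive_gfp() are
hypothetical names introduced for illustration.

```c
#include <stdbool.h>

/* Placeholder bit values; the kernel's real gfp_t layout differs. */
typedef unsigned int gfp_t;
#define __GFP_DMA            0x001u
#define __GFP_HIGHMEM        0x002u
#define __GFP_DMA32          0x004u
#define __GFP_DIRECT_RECLAIM 0x008u
#define __GFP_KSWAPD_RECLAIM 0x010u
#define __GFP_NOFAIL         0x020u
#define __GFP_ZERO           0x040u
#define __GFP_IO             0x080u
#define __GFP_FS             0x100u
#define __GFP_HIGH           0x200u

/* Simplified composites (assumption: extra modifier bits are omitted). */
#define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM)
#define GFP_ATOMIC (__GFP_HIGH | __GFP_KSWAPD_RECLAIM)
#define GFP_KERNEL (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM | \
		    __GFP_IO | __GFP_FS)
#define GFP_RECLAIM_MASK (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM | \
			  __GFP_HIGH | __GFP_IO | __GFP_FS | __GFP_NOFAIL)

/* Hypothetical container for the values the hunk computes. */
struct vmalloc_gfp {
	gfp_t gfp_mask;   /* mask used for the data pages */
	gfp_t nested_gfp; /* mask used for the nested (page array) allocation */
	bool nofail;      /* whether the mapping loop may retry forever */
	bool noblock;     /* caller cannot sleep */
};

static bool gfpflags_allow_blocking(gfp_t gfp)
{
	return (gfp & __GFP_DIRECT_RECLAIM) != 0;
}

static struct vmalloc_gfp derive_gfp(gfp_t gfp_mask)
{
	struct vmalloc_gfp r;

	r.nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
	r.nofail = (gfp_mask & __GFP_NOFAIL) != 0;
	r.noblock = !gfpflags_allow_blocking(gfp_mask);

	if (r.noblock) {
		/* __GFP_NOFAIL cannot be honoured without sleeping. */
		gfp_mask &= ~__GFP_NOFAIL;
		/* Nested allocations inherit the non-blocking mask. */
		r.nested_gfp = gfp_mask | __GFP_ZERO;
		r.nofail = false;
	} else if (!(gfp_mask & (__GFP_DMA | __GFP_DMA32))) {
		/* Highmem is only added for blocking, non-DMA requests. */
		gfp_mask |= __GFP_HIGHMEM;
	}

	r.gfp_mask = gfp_mask;
	return r;
}
```

In this model, a GFP_ATOMIC | __GFP_NOFAIL request ends up with noblock
set, nofail cleared and no __GFP_HIGHMEM, while a plain GFP_KERNEL
request keeps the previous behaviour and gains __GFP_HIGHMEM.
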
Re: [RFC 6/7] mm/vmalloc: Support non-blocking GFP flags in __vmalloc_area_node()
Posted by Michal Hocko 3 months ago
On Fri 04-07-25 17:25:36, Uladzislau Rezki wrote:
> This patch makes __vmalloc_area_node() to correctly handle non-blocking
> allocation requests, such as GFP_ATOMIC and GFP_NOWAIT. Main changes:
> 
> - nested_gfp flag follows the same non-blocking constraints
>   as the primary gfp_mask, ensuring consistency and avoiding
>   sleeping allocations in atomic contexts.
> 
> - if blocking is not allowed, __GFP_NOFAIL is forcibly cleared
>   and warning is issued if it was set, since __GFP_NOFAIL is
>   incompatible with non-blocking contexts;
> 
> - Add a __GFP_HIGHMEM to gfp_mask only for blocking requests
>   if there are no DMA constraints.
> 
> - in non-blocking mode we use memalloc_noreclaim_save/restore()
>   to prevent reclaim related operations that may sleep while
>   setting up page tables or mapping pages.
> 
> This is particularly important for page table allocations that
> internally use GFP_PGTABLE_KERNEL, which may sleep unless such
> scope restrictions are applied. For example:
> 
> <snip>
>     #define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO)
> 
>     __pte_alloc_kernel()
>         pte_alloc_one_kernel(&init_mm);
>             pagetable_alloc_noprof(GFP_PGTABLE_KERNEL & ~__GFP_HIGHMEM, 0);
> <snip>
> 
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
>  mm/vmalloc.c | 30 +++++++++++++++++++++++++-----
>  1 file changed, 25 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 2eaff0575a9e..fe1699e01e02 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3711,7 +3711,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  				 pgprot_t prot, unsigned int page_shift,
>  				 int node)
>  {
> -	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> +	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
>  	bool nofail = gfp_mask & __GFP_NOFAIL;
>  	unsigned long addr = (unsigned long)area->addr;
>  	unsigned long size = get_vm_area_size(area);
> @@ -3719,12 +3719,28 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	unsigned int nr_small_pages = size >> PAGE_SHIFT;
>  	unsigned int page_order;
>  	unsigned int flags;
> +	bool noblock;
>  	int ret;
>  
>  	array_size = (unsigned long)nr_small_pages * sizeof(struct page *);
> +	noblock = !gfpflags_allow_blocking(gfp_mask);
>  
> -	if (!(gfp_mask & (GFP_DMA | GFP_DMA32)))
> -		gfp_mask |= __GFP_HIGHMEM;
> +	if (noblock) {
> +		/* __GFP_NOFAIL is incompatible with non-blocking contexts. */
> +		WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
> +		gfp_mask &= ~__GFP_NOFAIL;

Btw. we already ignore __GFP_NOFAIL for atomic allocations and warn about
that at the page allocator level (__alloc_pages_slowpath).

What we can do though is to add a pr_warn + dump_stack for requests whose
size would require (in the worst case) a page table allocation larger
than a portion of min_free_kbytes (to scale with different memory
sizes). That should be plenty for any reasonable non-blocking vmalloc.
That would give us a means to catch abusers.

-- 
Michal Hocko
SUSE Labs
Re: [RFC 6/7] mm/vmalloc: Support non-blocking GFP flags in __vmalloc_area_node()
Posted by Uladzislau Rezki 3 months ago
On Tue, Jul 08, 2025 at 05:47:21PM +0200, Michal Hocko wrote:
> On Fri 04-07-25 17:25:36, Uladzislau Rezki wrote:
> > This patch makes __vmalloc_area_node() to correctly handle non-blocking
> > allocation requests, such as GFP_ATOMIC and GFP_NOWAIT. Main changes:
> > 
> > - nested_gfp flag follows the same non-blocking constraints
> >   as the primary gfp_mask, ensuring consistency and avoiding
> >   sleeping allocations in atomic contexts.
> > 
> > - if blocking is not allowed, __GFP_NOFAIL is forcibly cleared
> >   and warning is issued if it was set, since __GFP_NOFAIL is
> >   incompatible with non-blocking contexts;
> > 
> > - Add a __GFP_HIGHMEM to gfp_mask only for blocking requests
> >   if there are no DMA constraints.
> > 
> > - in non-blocking mode we use memalloc_noreclaim_save/restore()
> >   to prevent reclaim related operations that may sleep while
> >   setting up page tables or mapping pages.
> > 
> > This is particularly important for page table allocations that
> > internally use GFP_PGTABLE_KERNEL, which may sleep unless such
> > scope restrictions are applied. For example:
> > 
> > <snip>
> >     #define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO)
> > 
> >     __pte_alloc_kernel()
> >         pte_alloc_one_kernel(&init_mm);
> >             pagetable_alloc_noprof(GFP_PGTABLE_KERNEL & ~__GFP_HIGHMEM, 0);
> > <snip>
> > 
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > ---
> >  mm/vmalloc.c | 30 +++++++++++++++++++++++++-----
> >  1 file changed, 25 insertions(+), 5 deletions(-)
> > 
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 2eaff0575a9e..fe1699e01e02 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3711,7 +3711,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> >  				 pgprot_t prot, unsigned int page_shift,
> >  				 int node)
> >  {
> > -	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> > +	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> >  	bool nofail = gfp_mask & __GFP_NOFAIL;
> >  	unsigned long addr = (unsigned long)area->addr;
> >  	unsigned long size = get_vm_area_size(area);
> > @@ -3719,12 +3719,28 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> >  	unsigned int nr_small_pages = size >> PAGE_SHIFT;
> >  	unsigned int page_order;
> >  	unsigned int flags;
> > +	bool noblock;
> >  	int ret;
> >  
> >  	array_size = (unsigned long)nr_small_pages * sizeof(struct page *);
> > +	noblock = !gfpflags_allow_blocking(gfp_mask);
> >  
> > -	if (!(gfp_mask & (GFP_DMA | GFP_DMA32)))
> > -		gfp_mask |= __GFP_HIGHMEM;
> > +	if (noblock) {
> > +		/* __GFP_NOFAIL is incompatible with non-blocking contexts. */
> > +		WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
> > +		gfp_mask &= ~__GFP_NOFAIL;
> 
> Btw. we already ignore GFP_NOFAIL for atomic allocations and warn about
> that at the page allocator level (__alloc_pages_slowpath)
> 
Thank you. I will add a comment about this!

>
> What we can do though is to add a pr_warn + dump_stack for request with
> size that would require (in the worst case) page tables allocation
> larger than a portion of min_free_kbytes (to scale with different memory
> sizes). That should be plenty for any reasonable non blocking vmalloc.
> We would have means to catch abusers in that way.
> 
OK, I will add it. I assume you mean:

  unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);

  if (request_pages > pages_min)
    dump();
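
The unit conversion in the snippet above can be checked in user space.
A minimal sketch, assuming 4 KiB pages (PAGE_SHIFT of 12);
kbytes_to_pages() and request_exceeds_min_free() are hypothetical
helper names, and what exactly request_pages should count (the request
itself or its worst-case page-table pages) and which portion of
min_free_kbytes to compare against are left open here, as in the
discussion.

```c
/* Assumption: 4 KiB pages, so one page holds 2^(12 - 10) = 4 KiB units. */
#define PAGE_SHIFT 12

/* Convert a size in KiB (such as min_free_kbytes) into a page count. */
static unsigned long kbytes_to_pages(unsigned long kbytes)
{
	return kbytes >> (PAGE_SHIFT - 10);
}

/*
 * Return 1 when the request is larger than the min_free_kbytes
 * watermark expressed in pages, i.e. when the caller would want to
 * emit the warning and stack dump.
 */
static int request_exceeds_min_free(unsigned long request_pages,
				    unsigned long min_free_kbytes)
{
	return request_pages > kbytes_to_pages(min_free_kbytes);
}
```

For example, a min_free_kbytes value of 67584 corresponds to 16896
pages under this assumption, so only requests above that page count
would be flagged.
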

--
Uladzislau Rezki
Re: [RFC 6/7] mm/vmalloc: Support non-blocking GFP flags in __vmalloc_area_node()
Posted by Michal Hocko 3 months ago
On Fri 04-07-25 17:25:36, Uladzislau Rezki wrote:
> This patch makes __vmalloc_area_node() to correctly handle non-blocking
> allocation requests, such as GFP_ATOMIC and GFP_NOWAIT. Main changes:
> 
> - nested_gfp flag follows the same non-blocking constraints
>   as the primary gfp_mask, ensuring consistency and avoiding
>   sleeping allocations in atomic contexts.
> 
> - if blocking is not allowed, __GFP_NOFAIL is forcibly cleared
>   and warning is issued if it was set, since __GFP_NOFAIL is
>   incompatible with non-blocking contexts;
> 
> - Add a __GFP_HIGHMEM to gfp_mask only for blocking requests
>   if there are no DMA constraints.
> 
> - in non-blocking mode we use memalloc_noreclaim_save/restore()
>   to prevent reclaim related operations that may sleep while
>   setting up page tables or mapping pages.
> 
> This is particularly important for page table allocations that
> internally use GFP_PGTABLE_KERNEL, which may sleep unless such
> scope restrictions are applied. For example:
> 
> <snip>
>     #define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO)
> 
>     __pte_alloc_kernel()
>         pte_alloc_one_kernel(&init_mm);
>             pagetable_alloc_noprof(GFP_PGTABLE_KERNEL & ~__GFP_HIGHMEM, 0);
> <snip>

The changelog doesn't explain the actual implementation and that is
really crucial here. You rely on memalloc_noreclaim_save (i.e.
PF_MEMALLOC) to never trigger memory reclaim but you are not explaining
how you deal with the biggest caveat of this interface. Let me
quote the documentation:
 * Users of this scope have to be extremely careful to not deplete the reserves
 * completely and implement a throttling mechanism which controls the
 * consumption of the reserve based on the amount of freed memory. Usage of a
 * pre-allocated pool (e.g. mempool) should be always considered before using
 * this scope.

Unless I am missing something, _any_ vmalloc(GFP_NOWAIT|GFP_ATOMIC) user
would get practically unbounded access to the whole available memory. This
is not really acceptable.
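
To make the scope semantics concrete, here is a minimal user-space
model of how such a save/restore pair nests. It assumes the pair works
by setting PF_MEMALLOC in the task flags and handing back the previous
state of that bit; struct task and the *_model() names are hypothetical
stand-ins for current and the kernel helpers, and the flag value is
only illustrative.

```c
/* Illustrative value; treat it as a placeholder for the kernel's flag. */
#define PF_MEMALLOC 0x00000800u

/* Hypothetical stand-in for the relevant part of struct task_struct. */
struct task {
	unsigned int flags;
};

/* Enter the scope: remember whether the bit was already set, then set it. */
static unsigned int memalloc_noreclaim_save_model(struct task *t)
{
	unsigned int old = t->flags & PF_MEMALLOC;

	t->flags |= PF_MEMALLOC;
	return old;
}

/* Leave the scope: put the bit back to what it was on entry. */
static void memalloc_noreclaim_restore_model(struct task *t, unsigned int old)
{
	t->flags = (t->flags & ~PF_MEMALLOC) | old;
}
```

Because the saved value is just the prior state of the bit, an inner
scope restores to "still set" and only the outermost restore clears it.
Everything allocated in between runs with access to the reserves, which
is why the extent of the scope matters.
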

> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
>  mm/vmalloc.c | 30 +++++++++++++++++++++++++-----
>  1 file changed, 25 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 2eaff0575a9e..fe1699e01e02 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3711,7 +3711,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  				 pgprot_t prot, unsigned int page_shift,
>  				 int node)
>  {
> -	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> +	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
>  	bool nofail = gfp_mask & __GFP_NOFAIL;
>  	unsigned long addr = (unsigned long)area->addr;
>  	unsigned long size = get_vm_area_size(area);
> @@ -3719,12 +3719,28 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	unsigned int nr_small_pages = size >> PAGE_SHIFT;
>  	unsigned int page_order;
>  	unsigned int flags;
> +	bool noblock;
>  	int ret;
>  
>  	array_size = (unsigned long)nr_small_pages * sizeof(struct page *);
> +	noblock = !gfpflags_allow_blocking(gfp_mask);
>  
> -	if (!(gfp_mask & (GFP_DMA | GFP_DMA32)))
> -		gfp_mask |= __GFP_HIGHMEM;
> +	if (noblock) {
> +		/* __GFP_NOFAIL is incompatible with non-blocking contexts. */
> +		WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
> +		gfp_mask &= ~__GFP_NOFAIL;
> +
> +		/*
> +		 * In non-sleeping contexts, ensure nested allocations follow
> +		 * same non-blocking rules.
> +		 */
> +		nested_gfp = gfp_mask | __GFP_ZERO;
> +		nofail = false;
> +	} else {
> +		/* Allow highmem allocations if there are no DMA constraints. */
> +		if (!(gfp_mask & (GFP_DMA | GFP_DMA32)))
> +			gfp_mask |= __GFP_HIGHMEM;
> +	}
>  
>  	/* Please note that the recursion is strictly bounded. */
>  	if (array_size > PAGE_SIZE) {
> @@ -3788,7 +3804,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	 * page tables allocations ignore external gfp mask, enforce it
>  	 * by the scope API
>  	 */
> -	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
> +	if (noblock)
> +		flags = memalloc_noreclaim_save();
> +	else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
>  		flags = memalloc_nofs_save();
>  	else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == 0)
>  		flags = memalloc_noio_save();
> @@ -3800,7 +3818,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  			schedule_timeout_uninterruptible(1);
>  	} while (nofail && (ret < 0));
>  
> -	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
> +	if (noblock)
> +		memalloc_noreclaim_restore(flags);
> +	else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
>  		memalloc_nofs_restore(flags);
>  	else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == 0)
>  		memalloc_noio_restore(flags);
> -- 
> 2.39.5
> 

-- 
Michal Hocko
SUSE Labs
Re: [RFC 6/7] mm/vmalloc: Support non-blocking GFP flags in __vmalloc_area_node()
Posted by Uladzislau Rezki 3 months ago
On Mon, Jul 07, 2025 at 09:13:04AM +0200, Michal Hocko wrote:
> On Fri 04-07-25 17:25:36, Uladzislau Rezki wrote:
> > This patch makes __vmalloc_area_node() to correctly handle non-blocking
> > allocation requests, such as GFP_ATOMIC and GFP_NOWAIT. Main changes:
> > 
> > - nested_gfp flag follows the same non-blocking constraints
> >   as the primary gfp_mask, ensuring consistency and avoiding
> >   sleeping allocations in atomic contexts.
> > 
> > - if blocking is not allowed, __GFP_NOFAIL is forcibly cleared
> >   and warning is issued if it was set, since __GFP_NOFAIL is
> >   incompatible with non-blocking contexts;
> > 
> > - Add a __GFP_HIGHMEM to gfp_mask only for blocking requests
> >   if there are no DMA constraints.
> > 
> > - in non-blocking mode we use memalloc_noreclaim_save/restore()
> >   to prevent reclaim related operations that may sleep while
> >   setting up page tables or mapping pages.
> > 
> > This is particularly important for page table allocations that
> > internally use GFP_PGTABLE_KERNEL, which may sleep unless such
> > scope restrictions are applied. For example:
> > 
> > <snip>
> >     #define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO)
> > 
> >     __pte_alloc_kernel()
> >         pte_alloc_one_kernel(&init_mm);
> >             pagetable_alloc_noprof(GFP_PGTABLE_KERNEL & ~__GFP_HIGHMEM, 0);
> > <snip>
> 
> The changelog doesn't explain the actual implementation and that is
> really crucial here. You rely on memalloc_noreclaim_save (i.e.
> PF_MEMALLOC) to never trigger memory reclaim but you are not explaining
> how do you prevent from the biggest caveat of this interface. Let me
> quote the documentation
>  * Users of this scope have to be extremely careful to not deplete the reserves
>  * completely and implement a throttling mechanism which controls the
>  * consumption of the reserve based on the amount of freed memory. Usage of a
>  * pre-allocated pool (e.g. mempool) should be always considered before using
>  * this scope.
> 
I am aware of that comment. I had the same concern, but it
looks like we may be overshooting here. Yes, we have access to memory
reserves, but only for page-table manipulations, i.e. to allocate
a page for the 5-level page table structure. We have PGD, P4D, PUD, PMD
and PTE, which is the lowest level and the one that needs pages the most.

As far as I can see, we do not free pages at least on the PTE level, which
means that the address space is only populated forward and never shrinks
back. Most of the time you do not need to allocate; this mostly occurs
initially after boot.
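
To give a feel for how small the page-table footprint is, the sketch
below counts how many PTE-level tables a mapping can touch, which is an
upper bound on the PTE pages that would have to be allocated if none of
them existed yet. It assumes 4 KiB pages with one PTE table covering
2 MiB (PMD_SHIFT of 21, as on x86-64); pte_tables_spanned() is a
hypothetical helper, and the upper levels are ignored because they are
needed far less often.

```c
/* Assumption: each PTE table maps 2 MiB of virtual address space. */
#define PMD_SHIFT 21

/*
 * Number of PTE tables covering [addr, addr + size). This is the worst
 * case for new PTE pages; tables that are already populated cost
 * nothing.
 */
static unsigned long pte_tables_spanned(unsigned long addr,
					unsigned long size)
{
	unsigned long first, last;

	if (size == 0)
		return 0;

	first = addr >> PMD_SHIFT;
	last = (addr + size - 1) >> PMD_SHIFT;
	return last - first + 1;
}
```

Under these assumptions, a 2 MiB-aligned 1 GiB mapping spans 512 PTE
tables, i.e. at most about 2 MiB of page-table pages.
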

>
> Unless I am missing something _any_ vmalloc(GFP_NOWAIT|GFP_ATOMIC) user
> would get practically unbound access to the whole available memory. This
> is not really acceptable.
> 
See the above comment. If there is a big concern about this, I can add
a memalloc_noblock_save()/memalloc_noblock_restore() pair to eliminate
that concern. The context would be converted in a way that drops the
__GFP_DIRECT_RECLAIM flag.
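
A rough user-space sketch of what such a pair could look like. All of it
is hypothetical: the pair is only being proposed here, and the
PF_MEMALLOC_NOBLOCK bit, struct task, apply_scope() and the flag values
are invented for illustration. The idea is that the scope marks the
task, and every allocation inside it has __GFP_DIRECT_RECLAIM masked
out instead of gaining access to the reserves.

```c
typedef unsigned int gfp_t;

/* Placeholder bit values, not the kernel's real ones. */
#define __GFP_DIRECT_RECLAIM 0x08u
#define __GFP_KSWAPD_RECLAIM 0x10u
#define GFP_KERNEL_MODEL (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)

/* Hypothetical per-task scope bit. */
#define PF_MEMALLOC_NOBLOCK 0x1u

struct task {
	unsigned int flags;
};

/* Enter the scope and return the previous state of the bit. */
static unsigned int memalloc_noblock_save(struct task *t)
{
	unsigned int old = t->flags & PF_MEMALLOC_NOBLOCK;

	t->flags |= PF_MEMALLOC_NOBLOCK;
	return old;
}

/* Leave the scope, restoring the state seen on entry. */
static void memalloc_noblock_restore(struct task *t, unsigned int old)
{
	t->flags = (t->flags & ~PF_MEMALLOC_NOBLOCK) | old;
}

/* Filter an allocation mask through the task's current scope. */
static gfp_t apply_scope(const struct task *t, gfp_t gfp)
{
	if (t->flags & PF_MEMALLOC_NOBLOCK)
		gfp &= ~__GFP_DIRECT_RECLAIM;
	return gfp;
}
```

With this shape, a page-table allocation that hardcodes a blocking mask
would be downgraded to a non-blocking one inside the scope and left
untouched outside it.
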

Thank you for your comments and input, I appreciate it.

--
Uladzislau Rezki
Re: [RFC 6/7] mm/vmalloc: Support non-blocking GFP flags in __vmalloc_area_node()
Posted by Michal Hocko 3 months ago
On Tue 08-07-25 14:27:57, Uladzislau Rezki wrote:
> On Mon, Jul 07, 2025 at 09:13:04AM +0200, Michal Hocko wrote:
> > On Fri 04-07-25 17:25:36, Uladzislau Rezki wrote:
> > > This patch makes __vmalloc_area_node() to correctly handle non-blocking
> > > allocation requests, such as GFP_ATOMIC and GFP_NOWAIT. Main changes:
> > > 
> > > - nested_gfp flag follows the same non-blocking constraints
> > >   as the primary gfp_mask, ensuring consistency and avoiding
> > >   sleeping allocations in atomic contexts.
> > > 
> > > - if blocking is not allowed, __GFP_NOFAIL is forcibly cleared
> > >   and warning is issued if it was set, since __GFP_NOFAIL is
> > >   incompatible with non-blocking contexts;
> > > 
> > > - Add a __GFP_HIGHMEM to gfp_mask only for blocking requests
> > >   if there are no DMA constraints.
> > > 
> > > - in non-blocking mode we use memalloc_noreclaim_save/restore()
> > >   to prevent reclaim related operations that may sleep while
> > >   setting up page tables or mapping pages.
> > > 
> > > This is particularly important for page table allocations that
> > > internally use GFP_PGTABLE_KERNEL, which may sleep unless such
> > > scope restrictions are applied. For example:
> > > 
> > > <snip>
> > >     #define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO)
> > > 
> > >     __pte_alloc_kernel()
> > >         pte_alloc_one_kernel(&init_mm);
> > >             pagetable_alloc_noprof(GFP_PGTABLE_KERNEL & ~__GFP_HIGHMEM, 0);
> > > <snip>
> > 
> > The changelog doesn't explain the actual implementation and that is
> > really crucial here. You rely on memalloc_noreclaim_save (i.e.
> > PF_MEMALLOC) to never trigger memory reclaim but you are not explaining
> > how do you prevent from the biggest caveat of this interface. Let me
> > quote the documentation
> >  * Users of this scope have to be extremely careful to not deplete the reserves
> >  * completely and implement a throttling mechanism which controls the
> >  * consumption of the reserve based on the amount of freed memory. Usage of a
> >  * pre-allocated pool (e.g. mempool) should be always considered before using
> >  * this scope.
> > 
> I am aware of that comment. I had the same concern, but it
> looks like we may be overshooting here. Yes, we have access to memory
> reserves, but only for page-table manipulations, i.e. to allocate
> a page for the 5-level page table structure. We have PGD, P4D, PUD, PMD
> and PTE, which is the lowest level and the one that needs pages the most.
> 
> As far as I can see, we do not free pages at least on the PTE level, which
> means that the address space is only populated forward and never shrinks
> back. Most of the time you do not need to allocate; this mostly occurs
> initially after boot.

You are right, I have misread the patch. I thought this included
vm_area_alloc_pages as well, but you are right that this is only for page
tables, and that seems much more reasonable. Having that outlined in the
changelog would have helped ;)
-- 
Michal Hocko
SUSE Labs
Re: [RFC 6/7] mm/vmalloc: Support non-blocking GFP flags in __vmalloc_area_node()
Posted by Uladzislau Rezki 3 months ago
On Tue, Jul 08, 2025 at 05:22:52PM +0200, Michal Hocko wrote:
> On Tue 08-07-25 14:27:57, Uladzislau Rezki wrote:
> > On Mon, Jul 07, 2025 at 09:13:04AM +0200, Michal Hocko wrote:
> > > On Fri 04-07-25 17:25:36, Uladzislau Rezki wrote:
> > > > This patch makes __vmalloc_area_node() to correctly handle non-blocking
> > > > allocation requests, such as GFP_ATOMIC and GFP_NOWAIT. Main changes:
> > > > 
> > > > - nested_gfp flag follows the same non-blocking constraints
> > > >   as the primary gfp_mask, ensuring consistency and avoiding
> > > >   sleeping allocations in atomic contexts.
> > > > 
> > > > - if blocking is not allowed, __GFP_NOFAIL is forcibly cleared
> > > >   and warning is issued if it was set, since __GFP_NOFAIL is
> > > >   incompatible with non-blocking contexts;
> > > > 
> > > > - Add a __GFP_HIGHMEM to gfp_mask only for blocking requests
> > > >   if there are no DMA constraints.
> > > > 
> > > > - in non-blocking mode we use memalloc_noreclaim_save/restore()
> > > >   to prevent reclaim related operations that may sleep while
> > > >   setting up page tables or mapping pages.
> > > > 
> > > > This is particularly important for page table allocations that
> > > > internally use GFP_PGTABLE_KERNEL, which may sleep unless such
> > > > scope restrictions are applied. For example:
> > > > 
> > > > <snip>
> > > >     #define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO)
> > > > 
> > > >     __pte_alloc_kernel()
> > > >         pte_alloc_one_kernel(&init_mm);
> > > >             pagetable_alloc_noprof(GFP_PGTABLE_KERNEL & ~__GFP_HIGHMEM, 0);
> > > > <snip>
> > > 
> > > The changelog doesn't explain the actual implementation and that is
> > > really crucial here. You rely on memalloc_noreclaim_save (i.e.
> > > PF_MEMALLOC) to never trigger memory reclaim but you are not explaining
> > > how do you prevent from the biggest caveat of this interface. Let me
> > > quote the documentation
> > >  * Users of this scope have to be extremely careful to not deplete the reserves
> > >  * completely and implement a throttling mechanism which controls the
> > >  * consumption of the reserve based on the amount of freed memory. Usage of a
> > >  * pre-allocated pool (e.g. mempool) should be always considered before using
> > >  * this scope.
> > > 
> > I am aware of that comment. I had the same concern, but it
> > looks like we may be overshooting here. Yes, we have access to memory
> > reserves, but only for page-table manipulations, i.e. to allocate
> > a page for the 5-level page table structure. We have PGD, P4D, PUD, PMD
> > and PTE, which is the lowest level and the one that needs pages the most.
> > 
> > As far as I can see, we do not free pages at least on the PTE level, which
> > means that the address space is only populated forward and never shrinks
> > back. Most of the time you do not need to allocate; this mostly occurs
> > initially after boot.
> 
> You are right, I have misread the patch. I thought this includes
> vm_area_alloc_pages as well but you are right this is only for page
> tables and that seems much more reasonable. Having that outlined in the
> changelog would have helped ;)
>
I will describe this in more detail in the commit message of my next version.

Thank you!

--
Uladzislau Rezki