This patch makes __vmalloc_area_node() correctly handle non-blocking
allocation requests, such as GFP_ATOMIC and GFP_NOWAIT. Main changes:

- the nested_gfp mask follows the same non-blocking constraints
  as the primary gfp_mask, ensuring consistency and avoiding
  sleeping allocations in atomic contexts;

- if blocking is not allowed, __GFP_NOFAIL is forcibly cleared
  and a warning is issued if it was set, since __GFP_NOFAIL is
  incompatible with non-blocking contexts;

- __GFP_HIGHMEM is added to gfp_mask only for blocking requests
  with no DMA constraints;

- in non-blocking mode, memalloc_noreclaim_save/restore() is used
  to prevent reclaim-related operations that may sleep while
  setting up page tables or mapping pages.

This is particularly important for page table allocations that
internally use GFP_PGTABLE_KERNEL, which may sleep unless such
scope restrictions are applied. For example:
<snip>
#define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO)
__pte_alloc_kernel()
pte_alloc_one_kernel(&init_mm);
pagetable_alloc_noprof(GFP_PGTABLE_KERNEL & ~__GFP_HIGHMEM, 0);
<snip>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
mm/vmalloc.c | 30 +++++++++++++++++++++++++-----
1 file changed, 25 insertions(+), 5 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 2eaff0575a9e..fe1699e01e02 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3711,7 +3711,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 				 pgprot_t prot, unsigned int page_shift,
 				 int node)
 {
-	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
+	gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
 	bool nofail = gfp_mask & __GFP_NOFAIL;
 	unsigned long addr = (unsigned long)area->addr;
 	unsigned long size = get_vm_area_size(area);
@@ -3719,12 +3719,28 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	unsigned int nr_small_pages = size >> PAGE_SHIFT;
 	unsigned int page_order;
 	unsigned int flags;
+	bool noblock;
 	int ret;
 
 	array_size = (unsigned long)nr_small_pages * sizeof(struct page *);
+	noblock = !gfpflags_allow_blocking(gfp_mask);
 
-	if (!(gfp_mask & (GFP_DMA | GFP_DMA32)))
-		gfp_mask |= __GFP_HIGHMEM;
+	if (noblock) {
+		/* __GFP_NOFAIL is incompatible with non-blocking contexts. */
+		WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
+		gfp_mask &= ~__GFP_NOFAIL;
+
+		/*
+		 * In non-sleeping contexts, ensure nested allocations follow
+		 * same non-blocking rules.
+		 */
+		nested_gfp = gfp_mask | __GFP_ZERO;
+		nofail = false;
+	} else {
+		/* Allow highmem allocations if there are no DMA constraints. */
+		if (!(gfp_mask & (GFP_DMA | GFP_DMA32)))
+			gfp_mask |= __GFP_HIGHMEM;
+	}
 
 	/* Please note that the recursion is strictly bounded. */
 	if (array_size > PAGE_SIZE) {
@@ -3788,7 +3804,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	 * page tables allocations ignore external gfp mask, enforce it
 	 * by the scope API
 	 */
-	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
+	if (noblock)
+		flags = memalloc_noreclaim_save();
+	else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
 		flags = memalloc_nofs_save();
 	else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == 0)
 		flags = memalloc_noio_save();
@@ -3800,7 +3818,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 		schedule_timeout_uninterruptible(1);
 	} while (nofail && (ret < 0));
 
-	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
+	if (noblock)
+		memalloc_noreclaim_restore(flags);
+	else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
 		memalloc_nofs_restore(flags);
 	else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == 0)
 		memalloc_noio_restore(flags);
--
2.39.5
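[Editorial note: the gfp-mask fixups in the hunk above can be modeled in
plain C. The sketch below is a userspace model only, assuming simplified
stand-in values for the GFP bits (the real kernel bits differ) and a
hypothetical `fixup_gfp()` helper name; it is not the kernel code itself.]

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel GFP bits used here. */
#define __GFP_DIRECT_RECLAIM	0x01u
#define __GFP_NOFAIL		0x02u
#define __GFP_HIGHMEM		0x04u
#define GFP_DMA			0x08u
#define GFP_DMA32		0x10u

/* Model of gfpflags_allow_blocking(): a request may sleep only
 * if it carries __GFP_DIRECT_RECLAIM. */
static bool allow_blocking(unsigned int gfp)
{
	return gfp & __GFP_DIRECT_RECLAIM;
}

/* Model of the mask adjustments __vmalloc_area_node() applies. */
static unsigned int fixup_gfp(unsigned int gfp)
{
	if (!allow_blocking(gfp)) {
		/* __GFP_NOFAIL cannot be honored without sleeping. */
		gfp &= ~__GFP_NOFAIL;
	} else if (!(gfp & (GFP_DMA | GFP_DMA32))) {
		/* Caller may block and has no DMA constraints,
		 * so highmem pages are acceptable. */
		gfp |= __GFP_HIGHMEM;
	}
	return gfp;
}
```

Note the asymmetry the patch introduces: __GFP_HIGHMEM is only ever added
on the blocking path, while the non-blocking path only strips flags.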
On Fri 04-07-25 17:25:36, Uladzislau Rezki wrote:
> This patch makes __vmalloc_area_node() to correctly handle non-blocking
> allocation requests, such as GFP_ATOMIC and GFP_NOWAIT. Main changes:
[...]
> +	if (noblock) {
> +		/* __GFP_NOFAIL is incompatible with non-blocking contexts. */
> +		WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
> +		gfp_mask &= ~__GFP_NOFAIL;

Btw. we already ignore GFP_NOFAIL for atomic allocations and warn about
that at the page allocator level (__alloc_pages_slowpath).

What we can do though is to add a pr_warn + dump_stack for requests whose
size would require (in the worst case) page table allocations larger than
a portion of min_free_kbytes (to scale with different memory sizes). That
should be plenty for any reasonable non-blocking vmalloc. We would have a
means to catch abusers that way.
--
Michal Hocko
SUSE Labs
On Tue, Jul 08, 2025 at 05:47:21PM +0200, Michal Hocko wrote:
> On Fri 04-07-25 17:25:36, Uladzislau Rezki wrote:
[...]
> Btw. we already ignore GFP_NOFAIL for atomic allocations and warn about
> that at the page allocator level (__alloc_pages_slowpath)
>
Thank you. I will comment on this!

> What we can do though is to add a pr_warn + dump_stack for request with
> size that would require (in the worst case) page tables allocation
> larger than a portion of min_free_kbytes (to scale with different memory
> sizes). That should be plenty for any reasonable non blocking vmalloc.
> We would have means to catch abusers in that way.
>
OK, I will add it. I assume you mean:

	unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);

	if (request_pages > pages_min)
		dump();

--
Uladzislau Rezki
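[Editorial note: the threshold check sketched in the reply above can be
modeled in userspace as follows. This is a hedged sketch only: the
`should_report()` name is hypothetical, 4 KiB pages are assumed, and
`min_free_kbytes` here is a plain variable standing in for the kernel
tunable; the real kernel would emit pr_warn() + dump_stack() instead of
returning a flag.]

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12	/* 4 KiB pages assumed */

/* Stand-in for the kernel tunable (value is in KiB). */
static unsigned long min_free_kbytes = 65536;

/*
 * Return true when a non-blocking vmalloc request is large enough that,
 * in the worst case, its page table allocations could consume a
 * noticeable share of the min_free_kbytes reserve and should therefore
 * be reported.
 */
static bool should_report(unsigned long request_pages)
{
	/* min_free_kbytes is in KiB; shift by (PAGE_SHIFT - 10)
	 * converts KiB to pages. */
	unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);

	return request_pages > pages_min;
}
```

Because the threshold derives from min_free_kbytes, it scales with the
machine's memory size, which is the point of Michal's suggestion.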
On Fri 04-07-25 17:25:36, Uladzislau Rezki wrote:
> This patch makes __vmalloc_area_node() to correctly handle non-blocking
> allocation requests, such as GFP_ATOMIC and GFP_NOWAIT. Main changes:
[...]
> <snip>
> #define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO)
>
> __pte_alloc_kernel()
>     pte_alloc_one_kernel(&init_mm);
>         pagetable_alloc_noprof(GFP_PGTABLE_KERNEL & ~__GFP_HIGHMEM, 0);
> <snip>

The changelog doesn't explain the actual implementation and that is
really crucial here. You rely on memalloc_noreclaim_save (i.e.
PF_MEMALLOC) to never trigger memory reclaim but you are not explaining
how you prevent the biggest caveat of this interface. Let me quote the
documentation:

 * Users of this scope have to be extremely careful to not deplete the reserves
 * completely and implement a throttling mechanism which controls the
 * consumption of the reserve based on the amount of freed memory. Usage of a
 * pre-allocated pool (e.g. mempool) should be always considered before using
 * this scope.

Unless I am missing something, _any_ vmalloc(GFP_NOWAIT|GFP_ATOMIC) user
would get practically unbound access to the whole available memory. This
is not really acceptable.

[... patch quoted in full ...]

--
Michal Hocko
SUSE Labs
On Mon, Jul 07, 2025 at 09:13:04AM +0200, Michal Hocko wrote:
> On Fri 04-07-25 17:25:36, Uladzislau Rezki wrote:
[...]
> The changelog doesn't explain the actual implementation and that is
> really crucial here. You rely on memalloc_noreclaim_save (i.e.
> PF_MEMALLOC) to never trigger memory reclaim but you are not explaining
> how do you prevent from the biggest caveat of this interface. Let me
> quote the documentation
> * Users of this scope have to be extremely careful to not deplete the reserves
> * completely and implement a throttling mechanism which controls the
> * consumption of the reserve based on the amount of freed memory. Usage of a
> * pre-allocated pool (e.g. mempool) should be always considered before using
> * this scope.
>
I am aware of that comment. I had the same concern about this, but it
looks like we may have overshot here. Yes, we have access to memory
reserves, but only for page-table manipulations, i.e. to allocate a page
for the 5-level page table structure. We have PGD, P4D, PUD, PMD and
PTE, which is the lowest level and which needs pages the most.

As I see it, we do not free pages, at least on the PTE level, which
means an address space is populated forward only and never shrinks back.
Most of the time you do not need to allocate; this mostly occurs
initially, after boot.

> Unless I am missing something _any_ vmalloc(GFP_NOWAIT|GFP_ATOMIC) user
> would get practically unbound access to the whole available memory. This
> is not really acceptable.
>
See the above comment. If there is a big concern about this, I can add a

	memalloc_noblock_save()
	memalloc_noblock_restore()

pair to eliminate that concern. The context will be converted in a way
that drops the __GFP_DIRECT_RECLAIM flag.

Thank you for your comments and input, I appreciate it.

--
Uladzislau Rezki
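[Editorial note: the memalloc_noblock_save()/restore() pair proposed
above does not exist in the quoted patch. The sketch below is a rough
userspace model of how such a scope could work, shaped after the
existing memalloc_*_save() helpers; the PF_MEMALLOC_NOBLOCK flag, the
bit values, and the `current_flags` plumbing are all assumptions made
for illustration.]

```c
#include <assert.h>

#define __GFP_DIRECT_RECLAIM	0x400u	/* illustrative bit value */
#define PF_MEMALLOC_NOBLOCK	0x01u	/* hypothetical task flag */

/* Stand-in for current->flags. */
static unsigned int current_flags;

/* Enter a scope in which nested allocations must not block;
 * returns the previous state so scopes can nest. */
static unsigned int memalloc_noblock_save(void)
{
	unsigned int old = current_flags & PF_MEMALLOC_NOBLOCK;

	current_flags |= PF_MEMALLOC_NOBLOCK;
	return old;
}

/* Leave the scope, restoring the previous nesting state. */
static void memalloc_noblock_restore(unsigned int old)
{
	current_flags = (current_flags & ~PF_MEMALLOC_NOBLOCK) | old;
}

/* What the allocator side would do: inside the scope, strip
 * __GFP_DIRECT_RECLAIM so nested allocations cannot sleep. */
static unsigned int current_gfp_context(unsigned int gfp)
{
	if (current_flags & PF_MEMALLOC_NOBLOCK)
		gfp &= ~__GFP_DIRECT_RECLAIM;
	return gfp;
}
```

Unlike PF_MEMALLOC, such a scope would only forbid sleeping rather than
grant access to the memory reserves, which is what addresses the
depletion concern raised in the review.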
On Tue 08-07-25 14:27:57, Uladzislau Rezki wrote:
> On Mon, Jul 07, 2025 at 09:13:04AM +0200, Michal Hocko wrote:
[...]
> I am aware about that comment. I had same concern about this, but it
> looks like i/you may overshot here. Yes, we have access to memory
> resrves but this only for page-table manipulations, i.e. to allocate
> a page for 5-level page table structure. We have PGD, P4D, PUD, PMD
> and PTE which is the lowest level and which needs pages the most.
>
> As i see we do not free pages at least on PTE level, it means that
> an address space is populated forward only and never shrink back.
> Most of the time you do not need to allocate, this mostly occurs
> initially after the boot.

You are right, I have misread the patch. I thought this includes
vm_area_alloc_pages as well but you are right this is only for page
tables and that seems much more reasonable. Having that outlined in the
changelog would have helped ;)
--
Michal Hocko
SUSE Labs
On Tue, Jul 08, 2025 at 05:22:52PM +0200, Michal Hocko wrote:
> On Tue 08-07-25 14:27:57, Uladzislau Rezki wrote:
[...]
> You are right, I have misread the patch. I thought this includes
> vm_area_alloc_pages as well but you are right this is only for page
> tables and that seems much more reasonable. Having that outlined in the
> changelog would have helped ;)
>
I will update the commit message in more detail in my next version.
Thank you!

--
Uladzislau Rezki