[PATCH 2/2] mm, slab: support NUMA policy for large kmalloc

Posted by Vlastimil Babka 6 months, 3 weeks ago
The slab allocator observes the task's numa policy in various places
such as allocating slab pages. Large kmalloc allocations currently do
not, which seems to be an unintended omission. It is simple to correct
that, so make ___kmalloc_large_node() behave the same way as
alloc_slab_page().

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index d7a62063a1676a327e13536bf724f0160f1fc8dc..d87015fad2df65629050d9bcd224facd3d2f4033 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4281,11 +4281,13 @@ static void *___kmalloc_large_node(size_t size, gfp_t flags, int node)
 	if (unlikely(flags & GFP_SLAB_BUG_MASK))
 		flags = kmalloc_fix_flags(flags);
 
+	flags |= __GFP_COMP;
+
 	if (node == NUMA_NO_NODE)
-		node = numa_mem_id();
+		folio = (struct folio *)alloc_frozen_pages_noprof(flags, order);
+	else
+		folio = (struct folio *)__alloc_frozen_pages_noprof(flags, order, node, NULL);
 
-	flags |= __GFP_COMP;
-	folio = (struct folio *)__alloc_frozen_pages_noprof(flags, order, node, NULL);
 	if (folio) {
 		ptr = folio_address(folio);
 		lruvec_stat_mod_folio(folio, NR_SLAB_UNRECLAIMABLE_B,

-- 
2.49.0
Re: [PATCH 2/2] mm, slab: support NUMA policy for large kmalloc
Posted by Harry Yoo 6 months, 2 weeks ago
On Thu, May 29, 2025 at 10:56:27AM +0200, Vlastimil Babka wrote:
> The slab allocator observes the task's numa policy in various places
> such as allocating slab pages. Large kmalloc allocations currently do
> not, which seems to be an unintended omission. It is simple to correct
> that, so make ___kmalloc_large_node() behave the same way as
> alloc_slab_page().
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---

Reviewed-by: Harry Yoo <harry.yoo@oracle.com>

-- 
Cheers,
Harry / Hyeonggon

>  mm/slub.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index d7a62063a1676a327e13536bf724f0160f1fc8dc..d87015fad2df65629050d9bcd224facd3d2f4033 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4281,11 +4281,13 @@ static void *___kmalloc_large_node(size_t size, gfp_t flags, int node)
>  	if (unlikely(flags & GFP_SLAB_BUG_MASK))
>  		flags = kmalloc_fix_flags(flags);
>  
> +	flags |= __GFP_COMP;
> +
>  	if (node == NUMA_NO_NODE)
> -		node = numa_mem_id();
> +		folio = (struct folio *)alloc_frozen_pages_noprof(flags, order);
> +	else
> +		folio = (struct folio *)__alloc_frozen_pages_noprof(flags, order, node, NULL);
>  
> -	flags |= __GFP_COMP;
> -	folio = (struct folio *)__alloc_frozen_pages_noprof(flags, order, node, NULL);
>  	if (folio) {
>  		ptr = folio_address(folio);
>  		lruvec_stat_mod_folio(folio, NR_SLAB_UNRECLAIMABLE_B,
> 
> -- 
> 2.49.0
>
Re: [PATCH 2/2] mm, slab: support NUMA policy for large kmalloc
Posted by Christoph Lameter (Ampere) 6 months, 2 weeks ago
On Thu, 29 May 2025, Vlastimil Babka wrote:

> The slab allocator observes the task's numa policy in various places
> such as allocating slab pages. Large kmalloc allocations currently do
> not, which seems to be an unintended omission. It is simple to correct
> that, so make ___kmalloc_large_node() behave the same way as
> alloc_slab_page().

Large kmalloc allocations lead to the use of the page allocator, which
implements the NUMA policies for the allocations.

This patch is not necessary.
Re: [PATCH 2/2] mm, slab: support NUMA policy for large kmalloc
Posted by Vlastimil Babka 6 months, 2 weeks ago
On 5/29/25 16:57, Christoph Lameter (Ampere) wrote:
> On Thu, 29 May 2025, Vlastimil Babka wrote:
> 
>> The slab allocator observes the task's numa policy in various places
>> such as allocating slab pages. Large kmalloc allocations currently do
>> not, which seems to be an unintended omission. It is simple to correct
>> that, so make ___kmalloc_large_node() behave the same way as
>> alloc_slab_page().
> 
> Large kmalloc allocations lead to the use of the page allocator, which
> implements the NUMA policies for the allocations.
>
> This patch is not necessary.

I'm confused, as that's only true depending on which page allocator entry
point you use. AFAICS before this series, it's using
alloc_pages_node_noprof() which only does


        if (nid == NUMA_NO_NODE)
                nid = numa_mem_id();

and no mempolicies.

I see this patch as analogous to your commit 1941b31482a6 ("Reenable NUMA
policy support in the slab allocator")

Am I missing something?
Re: [PATCH 2/2] mm, slab: support NUMA policy for large kmalloc
Posted by Christoph Lameter (Ampere) 6 months, 2 weeks ago
On Thu, 29 May 2025, Vlastimil Babka wrote:

> On 5/29/25 16:57, Christoph Lameter (Ampere) wrote:
> > On Thu, 29 May 2025, Vlastimil Babka wrote:
> >
> >> The slab allocator observes the task's numa policy in various places
> >> such as allocating slab pages. Large kmalloc allocations currently do
> >> not, which seems to be an unintended omission. It is simple to correct
> >> that, so make ___kmalloc_large_node() behave the same way as
> >> alloc_slab_page().
> >
> > Large kmalloc allocations lead to the use of the page allocator, which
> > implements the NUMA policies for the allocations.
> >
> > This patch is not necessary.
>
> I'm confused, as that's only true depending on which page allocator entry
> point you use. AFAICS before this series, it's using
> alloc_pages_node_noprof() which only does
>
>
>         if (nid == NUMA_NO_NODE)
>                 nid = numa_mem_id();
>
> and no mempolicies.

That is a bug.

> I see this patch as analogous to your commit 1941b31482a6 ("Reenable NUMA
> policy support in the slab allocator")
>
> Am I missing something?

The page allocator has its own NUMA support.

The patch to reenable NUMA support dealt with an issue within the
allocator where the memory policies were ignored.

It seems that the error was repeated for large kmalloc allocations.
Instead of respecting memory allocation policies, the allocation is forced
to be local to the node.

Forcing the allocation to a node is possible with __GFP_THISNODE. The
default needs to be to follow memory policies.
Re: [PATCH 2/2] mm, slab: support NUMA policy for large kmalloc
Posted by Vlastimil Babka 6 months, 2 weeks ago
On 5/30/25 21:05, Christoph Lameter (Ampere) wrote:
> On Thu, 29 May 2025, Vlastimil Babka wrote:
> 
>> On 5/29/25 16:57, Christoph Lameter (Ampere) wrote:
>> > On Thu, 29 May 2025, Vlastimil Babka wrote:
>> >
>> >> The slab allocator observes the task's numa policy in various places
>> >> such as allocating slab pages. Large kmalloc allocations currently do
>> >> not, which seems to be an unintended omission. It is simple to correct
>> >> that, so make ___kmalloc_large_node() behave the same way as
>> >> alloc_slab_page().
>> >
>> > Large kmalloc allocations lead to the use of the page allocator, which
>> > implements the NUMA policies for the allocations.
>> >
>> > This patch is not necessary.
>>
>> I'm confused, as that's only true depending on which page allocator entry
>> point you use. AFAICS before this series, it's using
>> alloc_pages_node_noprof() which only does
>>
>>
>>         if (nid == NUMA_NO_NODE)
>>                 nid = numa_mem_id();
>>
>> and no mempolicies.
> 
> That is a bug.
> 
>> I see this patch as analogous to your commit 1941b31482a6 ("Reenable NUMA
>> policy support in the slab allocator")
>>
>> Am I missing something?
> 
> The page allocator has its own NUMA support.

It has support for respecting a preferred node (or forced node with
__GFP_THISNODE), nodemask, cpusets.

Mempolicy support is arguably outside the page allocator itself -
alloc_pages_mpol() lives in mm/mempolicy.c. Some generic-looking wrappers
such as alloc_pages() lead to it, while others (__alloc_pages()) don't.
It's kind of a mess.

> The patch to reenable NUMA support dealt with an issue within the
> allocator where the memory policies were ignored.

I'd agree in the sense that the issue was within the slab allocator: it
called the non-mpol-aware page allocator entry point, which was not intended.

> It seems that the error was repeated for large kmalloc allocations.

After some digging, seems you're right and the error was done in commit
c4cab557521a ("mm/slab_common: cleanup kmalloc_large()") in v6.1. I'll add a
Fixes: tag and reword changelog accordingly.

> Instead of respecting memory allocation policies, the allocation is forced
> to be local to the node.

It's not forced but preferred, unless the kmalloc() caller itself passes
__GFP_THISNODE. The slab layer doesn't add it.

> Forcing the allocation to a node is possible with __GFP_THISNODE. The
> default needs to be to follow memory policies.


