Ming Lei has reported [1] a performance regression due to replacing cpu
(partial) slabs with sheaves. With slub stats enabled, a large number of
slowpath allocations were observed. The affected system has 8 online
NUMA nodes but only 2 have memory.
For sheaves to work effectively on a given cpu, its NUMA node has to have
a struct node_barn allocated. Those are currently only allocated on nodes
with memory (N_MEMORY), where kmem_cache_node also exists, as the goal is
to cache only node-local objects. But in order to have good performance
on a memoryless node, we need its barn to exist and use sheaves to cache
non-local objects (as no local objects can exist anyway).
Therefore change the implementation to allocate barns on all online
nodes, tracked in a new nodemask slab_barn_nodes. Also add a cpu hotplug
callback as that's when a memoryless node can become online.
Change rcu_sheaf->node assignment to numa_node_id() so it's returned to
the barn of the local cpu's (potentially memoryless) node, and not to
the nearest node with memory anymore.
Reported-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/all/aZ0SbIqaIkwoW2mB@fedora/ [1]
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
---
mm/slub.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 59 insertions(+), 4 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 609a183f8533..d8496b37e364 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -472,6 +472,12 @@ static inline struct node_barn *get_barn(struct kmem_cache *s)
*/
static nodemask_t slab_nodes;
+/*
+ * Similar to slab_nodes but for where we have node_barn allocated.
+ * Corresponds to N_ONLINE nodes.
+ */
+static nodemask_t slab_barn_nodes;
+
/*
* Workqueue used for flushing cpu and kfree_rcu sheaves.
*/
@@ -4084,6 +4090,51 @@ void flush_all_rcu_sheaves(void)
rcu_barrier();
}
+static int slub_cpu_setup(unsigned int cpu)
+{
+ int nid = cpu_to_node(cpu);
+ struct kmem_cache *s;
+ int ret = 0;
+
+ /*
+ * we never clear a nid so it's safe to do a quick check before taking
+ * the mutex, and then recheck to handle parallel cpu hotplug safely
+ */
+ if (node_isset(nid, slab_barn_nodes))
+ return 0;
+
+ mutex_lock(&slab_mutex);
+
+ if (node_isset(nid, slab_barn_nodes))
+ goto out;
+
+ list_for_each_entry(s, &slab_caches, list) {
+ struct node_barn *barn;
+
+ /*
+ * barn might already exist if a previous callback failed midway
+ */
+ if (!cache_has_sheaves(s) || get_barn_node(s, nid))
+ continue;
+
+ barn = kmalloc_node(sizeof(*barn), GFP_KERNEL, nid);
+
+ if (!barn) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ barn_init(barn);
+ s->per_node[nid].barn = barn;
+ }
+ node_set(nid, slab_barn_nodes);
+
+out:
+ mutex_unlock(&slab_mutex);
+
+ return ret;
+}
+
/*
* Use the cpu notifier to insure that the cpu slabs are flushed when
* necessary.
@@ -5936,7 +5987,7 @@ bool __kfree_rcu_sheaf(struct kmem_cache *s, void *obj)
rcu_sheaf = NULL;
} else {
pcs->rcu_free = NULL;
- rcu_sheaf->node = numa_mem_id();
+ rcu_sheaf->node = numa_node_id();
}
/*
@@ -7597,7 +7648,7 @@ static int init_kmem_cache_nodes(struct kmem_cache *s)
if (slab_state == DOWN || !cache_has_sheaves(s))
return 1;
- for_each_node_mask(node, slab_nodes) {
+ for_each_node_mask(node, slab_barn_nodes) {
struct node_barn *barn;
barn = kmalloc_node(sizeof(*barn), GFP_KERNEL, node);
@@ -8250,6 +8301,7 @@ static int slab_mem_going_online_callback(int nid)
* and barn initialized for the new node.
*/
node_set(nid, slab_nodes);
+ node_set(nid, slab_barn_nodes);
out:
mutex_unlock(&slab_mutex);
return ret;
@@ -8328,7 +8380,7 @@ static void __init bootstrap_cache_sheaves(struct kmem_cache *s)
if (!capacity)
return;
- for_each_node_mask(node, slab_nodes) {
+ for_each_node_mask(node, slab_barn_nodes) {
struct node_barn *barn;
barn = kmalloc_node(sizeof(*barn), GFP_KERNEL, node);
@@ -8400,6 +8452,9 @@ void __init kmem_cache_init(void)
for_each_node_state(node, N_MEMORY)
node_set(node, slab_nodes);
+ for_each_online_node(node)
+ node_set(node, slab_barn_nodes);
+
create_boot_cache(kmem_cache_node, "kmem_cache_node",
sizeof(struct kmem_cache_node),
SLAB_HWCACHE_ALIGN | SLAB_NO_OBJ_EXT, 0, 0);
@@ -8426,7 +8481,7 @@ void __init kmem_cache_init(void)
/* Setup random freelists for each cache */
init_freelist_randomization();
- cpuhp_setup_state_nocalls(CPUHP_SLUB_DEAD, "slub:dead", NULL,
+ cpuhp_setup_state_nocalls(CPUHP_SLUB_DEAD, "slub:dead", slub_cpu_setup,
slub_cpu_dead);
pr_info("SLUB: HWalign=%d, Order=%u-%u, MinObjects=%u, CPUs=%u, Nodes=%u\n",
--
2.53.0
On Wed, Mar 11, 2026 at 09:25:56AM +0100, Vlastimil Babka (SUSE) wrote:
> Ming Lei has reported [1] a performance regression due to replacing cpu
> (partial) slabs with sheaves. With slub stats enabled, a large amount of
> slowpath allocations were observed. The affected system has 8 online
> NUMA nodes but only 2 have memory.
>
> For sheaves to work effectively on given cpu, its NUMA node has to have
> struct node_barn allocated. Those are currently only allocated on nodes
> with memory (N_MEMORY) where kmem_cache_node also exist as the goal is
> to cache only node-local objects. But in order to have good performance
> on a memoryless node, we need its barn to exist and use sheaves to cache
> non-local objects (as no local objects can exist anyway).
>
> Therefore change the implementation to allocate barns on all online
> nodes, tracked in a new nodemask slab_barn_nodes. Also add a cpu hotplug
> callback as that's when a memoryless node can become online.
>
> Change rcu_sheaf->node assignment to numa_node_id() so it's returned to
> the barn of the local cpu's (potentially memoryless) node, and not to
> the nearest node with memory anymore.
>
> Reported-by: Ming Lei <ming.lei@redhat.com>
> Link: https://lore.kernel.org/all/aZ0SbIqaIkwoW2mB@fedora/ [1]
> Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> ---
> mm/slub.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 59 insertions(+), 4 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 609a183f8533..d8496b37e364 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
[...]
>
> /*
> @@ -7597,7 +7648,7 @@ static int init_kmem_cache_nodes(struct kmem_cache *s)
> if (slab_state == DOWN || !cache_has_sheaves(s))
> return 1;
>
> - for_each_node_mask(node, slab_nodes) {
> + for_each_node_mask(node, slab_barn_nodes) {
> struct node_barn *barn;
>
> barn = kmalloc_node(sizeof(*barn), GFP_KERNEL, node);
> @@ -8250,6 +8301,7 @@ static int slab_mem_going_online_callback(int nid)
> * and barn initialized for the new node.
> */
> node_set(nid, slab_nodes);
> + node_set(nid, slab_barn_nodes);
I had a somewhat related question here.
During memory hotplug, we call node_set() on slab_nodes when memory is brought
online, but we do not seem to call node_clear() when memory is taken offline. I
was wondering what the reasoning behind this is.
That also made me wonder about a related case. If I am understanding this
correctly, even if all memory of a node has been offlined, slab_nodes would
still make it appear that the node has memory, even though in reality it no
longer does. If so, then in patch 3, the condition
"if (unlikely(!node_isset(numa_node, slab_nodes)))" in can_free_to_pcs() seems
like it would cause the object free path to skip sheaves.
--
Thanks,
Hao
On 3/18/26 10:27, Hao Li wrote:
> On Wed, Mar 11, 2026 at 09:25:56AM +0100, Vlastimil Babka (SUSE) wrote:
>> [...]
>>
>> /*
>> @@ -7597,7 +7648,7 @@ static int init_kmem_cache_nodes(struct kmem_cache *s)
>> if (slab_state == DOWN || !cache_has_sheaves(s))
>> return 1;
>>
>> - for_each_node_mask(node, slab_nodes) {
>> + for_each_node_mask(node, slab_barn_nodes) {
>> struct node_barn *barn;
>>
>> barn = kmalloc_node(sizeof(*barn), GFP_KERNEL, node);
>> @@ -8250,6 +8301,7 @@ static int slab_mem_going_online_callback(int nid)
>> * and barn initialized for the new node.
>> */
>> node_set(nid, slab_nodes);
>> + node_set(nid, slab_barn_nodes);
>
> I had a somewhat related question here.
>
> During memory hotplug, we call node_set() on slab_nodes when memory is brought
> online, but we do not seem to call node_clear() when memory is taken offline. I
> was wondering what the reasoning behind this is.
Probably nobody took on the task of implementing the necessary teardown.
> That also made me wonder about a related case. If I am understanding this
> correctly, even if all memory of a node has been offlined, slab_nodes would
> still make it appear that the node has memory, even though in reality it no
> longer does. If so, then in patch 3, the condition
> "if (unlikely(!node_isset(numa_node, slab_nodes)))" in can_free_to_pcs() seems
> would cause the object free path to skip sheaves.
Maybe the condition should be looking at N_MEMORY then?
Also ideally we should be using N_NORMAL_MEMORY everywhere for slab_nodes.
Oh we actually did, but gave that up in commit 1bf47d4195e45.
Note in practice full memory offline of a node can only be achieved if it
was all ZONE_MOVABLE and thus no slab allocations ever happened on it. But
if it has only movable memory, it's practically memoryless for slab
purposes. Maybe the condition should be looking at N_NORMAL_MEMORY then.
That would cover the case when it became offline and also the case when it's
online but with only movable memory?
I don't know if with CONFIG_HAVE_MEMORYLESS_NODES it's possible that
numa_mem_id() (the closest node with memory) would be ZONE_MOVABLE only.
Maybe let's hope not, and not adjust that part?
On Wed, Mar 18, 2026 at 01:11:58PM +0100, Vlastimil Babka (SUSE) wrote:
> On 3/18/26 10:27, Hao Li wrote:
> > On Wed, Mar 11, 2026 at 09:25:56AM +0100, Vlastimil Babka (SUSE) wrote:
> >> [...]
> >>
> >> /*
> >> @@ -7597,7 +7648,7 @@ static int init_kmem_cache_nodes(struct kmem_cache *s)
> >> if (slab_state == DOWN || !cache_has_sheaves(s))
> >> return 1;
> >>
> >> - for_each_node_mask(node, slab_nodes) {
> >> + for_each_node_mask(node, slab_barn_nodes) {
> >> struct node_barn *barn;
> >>
> >> barn = kmalloc_node(sizeof(*barn), GFP_KERNEL, node);
> >> @@ -8250,6 +8301,7 @@ static int slab_mem_going_online_callback(int nid)
> >> * and barn initialized for the new node.
> >> */
> >> node_set(nid, slab_nodes);
> >> + node_set(nid, slab_barn_nodes);
> >
> > I had a somewhat related question here.
> >
> > During memory hotplug, we call node_set() on slab_nodes when memory is brought
> > online, but we do not seem to call node_clear() when memory is taken offline. I
> > was wondering what the reasoning behind this is.
>
> Probably nobody took the task the implement the necessary teardown.
>
> > That also made me wonder about a related case. If I am understanding this
> > correctly, even if all memory of a node has been offlined, slab_nodes would
> > still make it appear that the node has memory, even though in reality it no
> > longer does. If so, then in patch 3, the condition
> > "if (unlikely(!node_isset(numa_node, slab_nodes)))" in can_free_to_pcs() seems
> > would cause the object free path to skip sheaves.
>
> Maybe the condition should be looking at N_MEMORY then?
Yes, that's what I was thinking too.
I feel that, at least for the current patchset, this is probably a reasonable
approach.
>
> Also ideally we should be using N_NORMAL_MEMORY everywhere for slab_nodes.
> Oh we actually did, but give that up in commit 1bf47d4195e45.
Thanks, I hadn't realized that node_clear had actually existed before.
>
> Note in practice full memory offline of a node can only be achieved if it
> was all ZONE_MOVABLE and thus no slab allocations ever happened on it. But
> if it has only movable memory, it's practically memoryless for slab
> purposes.
That's a good point! I just realized that too.
> Maybe the condition should be looking at N_NORMAL_MEMORY then.
> That would cover the case when it became offline and also the case when it's
> online but with only movable memory?
Exactly; conceptually, N_NORMAL_MEMORY seems more precise than N_MEMORY. I took
a quick look through the code, though, and it seems that N_NORMAL_MEMORY hasn't
been fully handled in the hotplug code.
Given that, I think it makes sense to use N_MEMORY for now, and then switch to
N_NORMAL_MEMORY later once the handling there is improved.
>
> I don't know if with CONFIG_HAVE_MEMORYLESS_NODES it's possible that
> numa_mem_id() (the closest node with memory) would be ZONE_MOVABLE only.
> Maybe let's hope not, and not adjust that part?
>
I think that, in the CONFIG_HAVE_MEMORYLESS_NODES=y case, numa_mem_id() ends up
calling local_memory_node(), and the NUMA node it returns should be one that
can allocate slab memory. So the slab_node == numa_node check seems reasonable
to me.
So it seems that the issue being discussed here may only be specific to the
CONFIG_HAVE_MEMORYLESS_NODES=n case.
--
Thanks,
Hao
On 3/19/26 08:01, Hao Li wrote:
> On Wed, Mar 18, 2026 at 01:11:58PM +0100, Vlastimil Babka (SUSE) wrote:
>> On 3/18/26 10:27, Hao Li wrote:
>> > On Wed, Mar 11, 2026 at 09:25:56AM +0100, Vlastimil Babka (SUSE) wrote:
>> >
>> > I had a somewhat related question here.
>> >
>> > During memory hotplug, we call node_set() on slab_nodes when memory is brought
>> > online, but we do not seem to call node_clear() when memory is taken offline. I
>> > was wondering what the reasoning behind this is.
>>
>> Probably nobody took the task the implement the necessary teardown.
>>
>> > That also made me wonder about a related case. If I am understanding this
>> > correctly, even if all memory of a node has been offlined, slab_nodes would
>> > still make it appear that the node has memory, even though in reality it no
>> > longer does. If so, then in patch 3, the condition
>> > "if (unlikely(!node_isset(numa_node, slab_nodes)))" in can_free_to_pcs() seems
>> > would cause the object free path to skip sheaves.
>>
>> Maybe the condition should be looking at N_MEMORY then?
>
> Yes, that's what I was thinking too.
> I feel that, at least for the current patchset, this is probably a reasonable
> approach.
Ack.
>>
>> Also ideally we should be using N_NORMAL_MEMORY everywhere for slab_nodes.
>> Oh we actually did, but give that up in commit 1bf47d4195e45.
>
> Thanks, I hadn't realized that node_clear had actually existed before.
>
>>
>> Note in practice full memory offline of a node can only be achieved if it
>> was all ZONE_MOVABLE and thus no slab allocations ever happened on it. But
>> if it has only movable memory, it's practically memoryless for slab
>> purposes.
>
> That's a good point! I just realized that too.
>
>> Maybe the condition should be looking at N_NORMAL_MEMORY then.
>> That would cover the case when it became offline and also the case when it's
>> online but with only movable memory?
>
> Exactly, conceptually, N_NORMAL_MEMORY seems more precise than N_MEMORY. I took
> a quick look through the code, though, and it seems that N_NORMAL_MEMORY hasn't
> been fully handled in the hotplug code.
Huh, you're right, the hotplug code doesn't seem to set it. How much of
our code is broken by that?
It seems hotplug doesn't handle it since 2007 in commit 37b07e4163f7
("memoryless nodes: fixup uses of node_online_map in generic code"),
although the initial support in 7ea1530ab3fd ("Memoryless nodes: introduce
mask of nodes with memory") did set it from hotplug.
> Given that, I think it makes sense to use N_MEMORY for now, and then switch to
> N_NORMAL_MEMORY later once the handling there is improved.
So I'll do this:
diff --git a/mm/slub.c b/mm/slub.c
index 01ab90bb4622..fb2c5c57bc4e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -6029,7 +6029,7 @@ static __always_inline bool can_free_to_pcs(struct slab *slab)
	 * point to the closest node as we would on a proper memoryless node
	 * setup.
	 */
-	if (unlikely(!node_isset(numa_node, slab_nodes)))
+	if (unlikely(!node_state(numa_node, N_MEMORY)))
 		goto check_pfmemalloc;
 #endif
>>
>> I don't know if with CONFIG_HAVE_MEMORYLESS_NODES it's possible that
>> numa_mem_id() (the closest node with memory) would be ZONE_MOVABLE only.
>> Maybe let's hope not, and not adjust that part?
>>
>
> I think that, in the CONFIG_HAVE_MEMORYLESS_NODES=y case, numa_mem_id() ends up
> calling local_memory_node(), and the NUMA node it returns should be one that
> can allocate slab memory. So the slab_node == numa_node check seems reasonable
> to me.
>
> So it seems that the issue being discussed here may only be specific to the
> CONFIG_HAVE_MEMORYLESS_NODES=n case.
Great. Thanks!
On Thu, Mar 19, 2026 at 10:56:09AM +0100, Vlastimil Babka (SUSE) wrote:
> On 3/19/26 08:01, Hao Li wrote:
> > On Wed, Mar 18, 2026 at 01:11:58PM +0100, Vlastimil Babka (SUSE) wrote:
> >> On 3/18/26 10:27, Hao Li wrote:
> >> > On Wed, Mar 11, 2026 at 09:25:56AM +0100, Vlastimil Babka (SUSE) wrote:
> >> > [...]
> >> > That also made me wonder about a related case. If I am understanding this
> >> > correctly, even if all memory of a node has been offlined, slab_nodes would
> >> > still make it appear that the node has memory, even though in reality it no
> >> > longer does. If so, then in patch 3, the condition
> >> > "if (unlikely(!node_isset(numa_node, slab_nodes)))" in can_free_to_pcs() seems
> >> > would cause the object free path to skip sheaves.
> >>
> >> Maybe the condition should be looking at N_MEMORY then?
> >
> > Yes, that's what I was thinking too.
> > I feel that, at least for the current patchset, this is probably a reasonable
> > approach.
>
> Ack.
>
> >>
> >> Also ideally we should be using N_NORMAL_MEMORY everywhere for slab_nodes.
> >> Oh we actually did, but give that up in commit 1bf47d4195e45.
> >
> > Thanks, I hadn't realized that node_clear had actually existed before.
> >
> >>
> >> Note in practice full memory offline of a node can only be achieved if it
> >> was all ZONE_MOVABLE and thus no slab allocations ever happened on it. But
> >> if it has only movable memory, it's practically memoryless for slab
> >> purposes.
> >
> > That's a good point! I just realized that too.
> >
> >> Maybe the condition should be looking at N_NORMAL_MEMORY then.
> >> That would cover the case when it became offline and also the case when it's
> >> online but with only movable memory?
> >
> > Exactly, conceptually, N_NORMAL_MEMORY seems more precise than N_MEMORY. I took
> > a quick look through the code, though, and it seems that N_NORMAL_MEMORY hasn't
> > been fully handled in the hotplug code.
>
> Huh you're right, the hotplug code doesn't seem to set it. How much code
> that we have is broken by that?
This probably needs a bit more digging.
> It seems hotplug doesn't handle it since 2007 in commit 37b07e4163f7
> ("memoryless nodes: fixup uses of node_online_map in generic code"),
> although the initial support in 7ea1530ab3fd ("Memoryless nodes: introduce
> mask of nodes with memory") did set it from hotplug.
Yes, this really is quite an old issue. It looks like we may need to dig
through the git history a bit more carefully.
I'd be happy to dig into it further.
>
> > Given that, I think it makes sense to use N_MEMORY for now, and then switch to
> > N_NORMAL_MEMORY later once the handling there is improved.
>
> So I'll do this:
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 01ab90bb4622..fb2c5c57bc4e 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -6029,7 +6029,7 @@ static __always_inline bool can_free_to_pcs(struct
> slab *slab)
> * point to the closest node as we would on a proper memoryless node
> * setup.
> */
> - if (unlikely(!node_isset(numa_node, slab_nodes)))
> + if (unlikely(!node_state(numa_node, N_MEMORY)))
Looks good to me.
I've gone through the full series, including the range-diff updates, and the
rest looks good to me.
Feel free to add my rb-tag to three updated patches. Thanks!
Reviewed-by: Hao Li <hao.li@linux.dev>
> goto check_pfmemalloc;
> #endif
>
>
On 3/19/26 12:27, Hao Li wrote:
> On Thu, Mar 19, 2026 at 10:56:09AM +0100, Vlastimil Babka (SUSE) wrote:
>> >
>> > Exactly, conceptually, N_NORMAL_MEMORY seems more precise than N_MEMORY. I took
>> > a quick look through the code, though, and it seems that N_NORMAL_MEMORY hasn't
>> > been fully handled in the hotplug code.
>>
>> Huh you're right, the hotplug code doesn't seem to set it. How much code
>> that we have is broken by that?
>
> This probably needs a bit more digging.
>
>> It seems hotplug doesn't handle it since 2007 in commit 37b07e4163f7
>> ("memoryless nodes: fixup uses of node_online_map in generic code"),
>> although the initial support in 7ea1530ab3fd ("Memoryless nodes: introduce
>> mask of nodes with memory") did set it from hotplug.
>
> Yes, this really is quite an old issue. It looks like we may need to dig
> through the git history a bit more carefully.
>
> I'd be happy to dig into it further.
Great!
>
>>
>> > Given that, I think it makes sense to use N_MEMORY for now, and then switch to
>> > N_NORMAL_MEMORY later once the handling there is improved.
>>
>> So I'll do this:
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 01ab90bb4622..fb2c5c57bc4e 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -6029,7 +6029,7 @@ static __always_inline bool can_free_to_pcs(struct
>> slab *slab)
>> * point to the closest node as we would on a proper memoryless node
>> * setup.
>> */
>> - if (unlikely(!node_isset(numa_node, slab_nodes)))
>> + if (unlikely(!node_state(numa_node, N_MEMORY)))
>
> Looks good to me.
>
> I've gone through the full series, including the range-diff updates, and the
> rest looks good to me.
> Feel free to add my rb-tag to three updated patches. Thanks!
>
> Reviewed-by: Hao Li <hao.li@linux.dev>
Thanks, updated in slab/for-next
On Wed, Mar 11, 2026 at 09:25:56AM +0100, Vlastimil Babka (SUSE) wrote:
> Ming Lei has reported [1] a performance regression due to replacing cpu
> [...]

Looks good to me,

Reviewed-by: Harry Yoo <harry.yoo@oracle.com>

--
Cheers,
Harry / Hyeonggon