From: "Vlastimil Babka (SUSE)"
Date: Wed, 11 Mar 2026 09:25:56 +0100
Subject: [PATCH 2/3] slab: create barns for online memoryless nodes
Message-Id: <20260311-b4-slab-memoryless-barns-v1-2-70ab850be4ce@kernel.org>
References: <20260311-b4-slab-memoryless-barns-v1-0-70ab850be4ce@kernel.org>
In-Reply-To: <20260311-b4-slab-memoryless-barns-v1-0-70ab850be4ce@kernel.org>
To: Ming Lei, Harry Yoo
Cc: Hao Li, Andrew Morton, Christoph Lameter, David Rientjes, Roman Gushchin, linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Vlastimil Babka (SUSE)"
X-Mailer: b4 0.14.3

Ming Lei has reported [1] a performance regression caused by replacing cpu
(partial) slabs with sheaves. With slub stats enabled, a large number of
slowpath allocations were observed. The affected system has 8 online NUMA
nodes, but only 2 of them have memory.

For sheaves to work effectively on a given cpu, its NUMA node must have a
struct node_barn allocated. Barns are currently allocated only on nodes
with memory (N_MEMORY), where kmem_cache_node structures also exist,
because the goal is to cache only node-local objects. But in order to get
good performance on a memoryless node, we need its barn to exist and use
sheaves to cache non-local objects (as no local objects can exist anyway).

Therefore, change the implementation to allocate barns on all online
nodes, tracked in a new nodemask slab_barn_nodes. Also add a cpu hotplug
callback, as that is when a memoryless node can become online. Change the
rcu_sheaf->node assignment to numa_node_id() so the sheaf is returned to
the barn of the local cpu's (potentially memoryless) node, and not to the
nearest node with memory anymore.
Reported-by: Ming Lei
Link: https://lore.kernel.org/all/aZ0SbIqaIkwoW2mB@fedora/ [1]
Signed-off-by: Vlastimil Babka (SUSE)
Reviewed-by: Hao Li
Reviewed-by: Harry Yoo
---
 mm/slub.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 59 insertions(+), 4 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 609a183f8533..d8496b37e364 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -472,6 +472,12 @@ static inline struct node_barn *get_barn(struct kmem_cache *s)
  */
 static nodemask_t slab_nodes;
 
+/*
+ * Similar to slab_nodes but for where we have node_barn allocated.
+ * Corresponds to N_ONLINE nodes.
+ */
+static nodemask_t slab_barn_nodes;
+
 /*
  * Workqueue used for flushing cpu and kfree_rcu sheaves.
  */
@@ -4084,6 +4090,51 @@ void flush_all_rcu_sheaves(void)
 	rcu_barrier();
 }
 
+static int slub_cpu_setup(unsigned int cpu)
+{
+	int nid = cpu_to_node(cpu);
+	struct kmem_cache *s;
+	int ret = 0;
+
+	/*
+	 * we never clear a nid so it's safe to do a quick check before taking
+	 * the mutex, and then recheck to handle parallel cpu hotplug safely
+	 */
+	if (node_isset(nid, slab_barn_nodes))
+		return 0;
+
+	mutex_lock(&slab_mutex);
+
+	if (node_isset(nid, slab_barn_nodes))
+		goto out;
+
+	list_for_each_entry(s, &slab_caches, list) {
+		struct node_barn *barn;
+
+		/*
+		 * barn might already exist if a previous callback failed midway
+		 */
+		if (!cache_has_sheaves(s) || get_barn_node(s, nid))
+			continue;
+
+		barn = kmalloc_node(sizeof(*barn), GFP_KERNEL, nid);
+
+		if (!barn) {
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		barn_init(barn);
+		s->per_node[nid].barn = barn;
+	}
+	node_set(nid, slab_barn_nodes);
+
+out:
+	mutex_unlock(&slab_mutex);
+
+	return ret;
+}
+
 /*
  * Use the cpu notifier to insure that the cpu slabs are flushed when
  * necessary.
@@ -5936,7 +5987,7 @@ bool __kfree_rcu_sheaf(struct kmem_cache *s, void *obj)
 		rcu_sheaf = NULL;
 	} else {
 		pcs->rcu_free = NULL;
-		rcu_sheaf->node = numa_mem_id();
+		rcu_sheaf->node = numa_node_id();
 	}
 
 	/*
@@ -7597,7 +7648,7 @@ static int init_kmem_cache_nodes(struct kmem_cache *s)
 	if (slab_state == DOWN || !cache_has_sheaves(s))
 		return 1;
 
-	for_each_node_mask(node, slab_nodes) {
+	for_each_node_mask(node, slab_barn_nodes) {
 		struct node_barn *barn;
 
 		barn = kmalloc_node(sizeof(*barn), GFP_KERNEL, node);
@@ -8250,6 +8301,7 @@ static int slab_mem_going_online_callback(int nid)
 	 * and barn initialized for the new node.
 	 */
 	node_set(nid, slab_nodes);
+	node_set(nid, slab_barn_nodes);
 out:
 	mutex_unlock(&slab_mutex);
 	return ret;
@@ -8328,7 +8380,7 @@ static void __init bootstrap_cache_sheaves(struct kmem_cache *s)
 	if (!capacity)
 		return;
 
-	for_each_node_mask(node, slab_nodes) {
+	for_each_node_mask(node, slab_barn_nodes) {
 		struct node_barn *barn;
 
 		barn = kmalloc_node(sizeof(*barn), GFP_KERNEL, node);
@@ -8400,6 +8452,9 @@ void __init kmem_cache_init(void)
 	for_each_node_state(node, N_MEMORY)
 		node_set(node, slab_nodes);
 
+	for_each_online_node(node)
+		node_set(node, slab_barn_nodes);
+
 	create_boot_cache(kmem_cache_node, "kmem_cache_node",
 			sizeof(struct kmem_cache_node),
 			SLAB_HWCACHE_ALIGN | SLAB_NO_OBJ_EXT, 0, 0);
@@ -8426,7 +8481,7 @@ void __init kmem_cache_init(void)
 	/* Setup random freelists for each cache */
 	init_freelist_randomization();
 
-	cpuhp_setup_state_nocalls(CPUHP_SLUB_DEAD, "slub:dead", NULL,
+	cpuhp_setup_state_nocalls(CPUHP_SLUB_DEAD, "slub:dead", slub_cpu_setup,
 				  slub_cpu_dead);
 
 	pr_info("SLUB: HWalign=%d, Order=%u-%u, MinObjects=%u, CPUs=%u, Nodes=%u\n",
-- 
2.53.0