From nobody Thu Jan 1 08:55:42 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 00578C25B47 for ; Tue, 24 Oct 2023 09:34:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234219AbjJXJel (ORCPT ); Tue, 24 Oct 2023 05:34:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40138 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234531AbjJXJeI (ORCPT ); Tue, 24 Oct 2023 05:34:08 -0400 Received: from out-206.mta0.migadu.com (out-206.mta0.migadu.com [IPv6:2001:41d0:1004:224b::ce]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 45085D7F for ; Tue, 24 Oct 2023 02:33:59 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1698140037; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mVsHpTi8/ybc90IBC2PUgLSTYbIjUF6GvQZHOJT0cXI=; b=WkSYYMuXGHeVTVvJKeEDkqylfQ5ISuw3VBdmhEEMq+LWy2qJeirb8he9LzRmfOp5qZ6+kT ZswKHPlHfUqDgf+oW4CzD+qBmu/0yksI5xZDpGRNHQHcev4VmDR1H8HChl9GYIrTSRecZo StPLjAgG42c97ebsxtS4Z+pyEQMwhsc= From: chengming.zhou@linux.dev To: cl@linux.com, penberg@kernel.org Cc: rientjes@google.com, iamjoonsoo.kim@lge.com, akpm@linux-foundation.org, vbabka@suse.cz, roman.gushchin@linux.dev, 42.hyeyoo@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, chengming.zhou@linux.dev, Chengming Zhou Subject: [RFC PATCH v3 1/7] slub: Keep track of whether slub is on the per-node partial list Date: Tue, 24 Oct 2023 09:33:39 +0000 Message-Id: <20231024093345.3676493-2-chengming.zhou@linux.dev> In-Reply-To: <20231024093345.3676493-1-chengming.zhou@linux.dev> References: <20231024093345.3676493-1-chengming.zhou@linux.dev> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Chengming Zhou Now we rely on the "frozen" bit to see if we should manipulate the slab->slab_list, which will be changed in the following patch. Instead we introduce another way to keep track of whether slub is on the per-node partial list, here we reuse the PG_workingset bit. We use __set_bit and __clear_bit directly instead of the atomic version for better performance and it's safe since it's protected by the slub node list_lock. Signed-off-by: Chengming Zhou --- mm/slab.h | 19 +++++++++++++++++++ mm/slub.c | 3 +++ 2 files changed, 22 insertions(+) diff --git a/mm/slab.h b/mm/slab.h index 8cd3294fedf5..50522b688cfb 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -193,6 +193,25 @@ static inline void __slab_clear_pfmemalloc(struct slab= *slab) __folio_clear_active(slab_folio(slab)); } =20 +/* + * Slub reuse PG_workingset bit to keep track of whether it's on + * the per-node partial list. 
+ */ +static inline bool slab_test_node_partial(const struct slab *slab) +{ + return folio_test_workingset((struct folio *)slab_folio(slab)); +} + +static inline void slab_set_node_partial(struct slab *slab) +{ + __set_bit(PG_workingset, folio_flags(slab_folio(slab), 0)); +} + +static inline void slab_clear_node_partial(struct slab *slab) +{ + __clear_bit(PG_workingset, folio_flags(slab_folio(slab), 0)); +} + static inline void *slab_address(const struct slab *slab) { return folio_address(slab_folio(slab)); diff --git a/mm/slub.c b/mm/slub.c index 63d281dfacdb..3fad4edca34b 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -2127,6 +2127,7 @@ __add_partial(struct kmem_cache_node *n, struct slab = *slab, int tail) list_add_tail(&slab->slab_list, &n->partial); else list_add(&slab->slab_list, &n->partial); + slab_set_node_partial(slab); } =20 static inline void add_partial(struct kmem_cache_node *n, @@ -2141,6 +2142,7 @@ static inline void remove_partial(struct kmem_cache_n= ode *n, { lockdep_assert_held(&n->list_lock); list_del(&slab->slab_list); + slab_clear_node_partial(slab); n->nr_partial--; } =20 @@ -4831,6 +4833,7 @@ static int __kmem_cache_do_shrink(struct kmem_cache *= s) =20 if (free =3D=3D slab->objects) { list_move(&slab->slab_list, &discard); + slab_clear_node_partial(slab); n->nr_partial--; dec_slabs_node(s, node, slab->objects); } else if (free <=3D SHRINK_PROMOTE_MAX) --=20 2.40.1 From nobody Thu Jan 1 08:55:42 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D15B9C00A8F for ; Tue, 24 Oct 2023 09:44:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234152AbjJXJoO (ORCPT ); Tue, 24 Oct 2023 05:44:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41038 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234546AbjJXJeJ (ORCPT ); Tue, 24 Oct 2023 05:34:09 -0400 Received: from out-190.mta0.migadu.com (out-190.mta0.migadu.com [91.218.175.190]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7A57E1FC6 for ; Tue, 24 Oct 2023 02:34:01 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1698140039; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=h11SacZ+d1SZT9G0QAPU/rLkN5EaAt8R9Llcg4aahcg=; b=XdLQFwKjBt2izoxJqBOj93psrMaXNRty4xClGHCDT9p4T6juuhYWB/eoSnFrU+EPl/QvmP 6gAcQ3g4yVWbXa8UyMXIbOBPExVeP2TXXqyMEawXzbGshHntMsONmYh2xHZQ0ydgsgepOq WH5zAaZdRSTxxEDvvW3W5dkYKS7D57s= From: chengming.zhou@linux.dev To: cl@linux.com, penberg@kernel.org Cc: rientjes@google.com, iamjoonsoo.kim@lge.com, akpm@linux-foundation.org, vbabka@suse.cz, roman.gushchin@linux.dev, 42.hyeyoo@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, chengming.zhou@linux.dev, Chengming Zhou Subject: [RFC PATCH v3 2/7] slub: Prepare __slab_free() for unfrozen partial slab out of node partial list Date: Tue, 24 Oct 2023 09:33:40 +0000 Message-Id: <20231024093345.3676493-3-chengming.zhou@linux.dev> In-Reply-To: <20231024093345.3676493-1-chengming.zhou@linux.dev> References: <20231024093345.3676493-1-chengming.zhou@linux.dev> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Chengming Zhou Now the partial slub will be frozen when taken out of node partial list, so the __slab_free() will know from "was_frozen" that the partial slab is not on node partial list and is used by one kmem_cache_cpu. But we will change this, make partial slabs leave the node partial list with unfrozen state, so we need to change __slab_free() to use the new slab_test_node_partial() we just introduced. Signed-off-by: Chengming Zhou --- mm/slub.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/mm/slub.c b/mm/slub.c index 3fad4edca34b..f568a32d7332 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3610,6 +3610,7 @@ static void __slab_free(struct kmem_cache *s, struct = slab *slab, unsigned long counters; struct kmem_cache_node *n =3D NULL; unsigned long flags; + bool on_node_partial; =20 stat(s, FREE_SLOWPATH); =20 @@ -3657,6 +3658,7 @@ static void __slab_free(struct kmem_cache *s, struct = slab *slab, */ spin_lock_irqsave(&n->list_lock, flags); =20 + on_node_partial =3D slab_test_node_partial(slab); } } =20 @@ -3685,6 +3687,15 @@ static void __slab_free(struct kmem_cache *s, struct= slab *slab, return; } =20 + /* + * This slab was partial but not on the per-node partial list, + * in which case we shouldn't manipulate its list, just return. 
+ */ + if (prior && !on_node_partial) { + spin_unlock_irqrestore(&n->list_lock, flags); + return; + } + if (unlikely(!new.inuse && n->nr_partial >=3D s->min_partial)) goto slab_empty; =20 --=20 2.40.1 From nobody Thu Jan 1 08:55:42 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 19F47C00A8F for ; Tue, 24 Oct 2023 09:44:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234229AbjJXJoL (ORCPT ); Tue, 24 Oct 2023 05:44:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47726 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234547AbjJXJeJ (ORCPT ); Tue, 24 Oct 2023 05:34:09 -0400 Received: from out-191.mta0.migadu.com (out-191.mta0.migadu.com [91.218.175.191]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BBF0610F6 for ; Tue, 24 Oct 2023 02:34:03 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1698140042; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=nZerFoAj8INaI6ZSxOyjE3BeTDGPig5MeJKjJh1pMQ8=; b=V5gwmpkG8243Dq0GOsqzdIUFu+jgwdS+wsU2S/1oNpANH7Jiro5jaQ+wSigcoAXTIwUHR0 tz233Wke1fEJVMWATf22pDzMl0BDfdcprGotQ76eoP2bW65VEb0nau30v/bz1ONYQ7C1iq uidyW/b1k2Vm8H6knFM73iZoGZXo8uc= From: chengming.zhou@linux.dev To: cl@linux.com, penberg@kernel.org Cc: rientjes@google.com, iamjoonsoo.kim@lge.com, akpm@linux-foundation.org, vbabka@suse.cz, roman.gushchin@linux.dev, 42.hyeyoo@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, chengming.zhou@linux.dev, Chengming Zhou Subject: [RFC PATCH v3 3/7] slub: Reflow ___slab_alloc() Date: Tue, 24 Oct 2023 09:33:41 +0000 Message-Id: <20231024093345.3676493-4-chengming.zhou@linux.dev> In-Reply-To: <20231024093345.3676493-1-chengming.zhou@linux.dev> References: <20231024093345.3676493-1-chengming.zhou@linux.dev> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Chengming Zhou The get_partial() interface used in ___slab_alloc() may return a single object in the "kmem_cache_debug(s)" case, in which we will just return the "freelist" object. Move this handling up to prepare for later changes. And the "pfmemalloc_match()" part is not needed for node partial slab, since we already check this in the get_partial_node(). Signed-off-by: Chengming Zhou --- mm/slub.c | 31 +++++++++++++++---------------- 1 file changed, 15 insertions(+), 16 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index f568a32d7332..cd8aa68c156e 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3218,8 +3218,21 @@ static void *___slab_alloc(struct kmem_cache *s, gfp= _t gfpflags, int node, pc.slab =3D &slab; pc.orig_size =3D orig_size; freelist =3D get_partial(s, node, &pc); - if (freelist) - goto check_new_slab; + if (freelist) { + if (kmem_cache_debug(s)) { + /* + * For debug caches here we had to go through + * alloc_single_from_partial() so just store the + * tracking info and return the object. 
+ */ + if (s->flags & SLAB_STORE_USER) + set_track(s, freelist, TRACK_ALLOC, addr); + + return freelist; + } + + goto retry_load_slab; + } =20 slub_put_cpu_ptr(s->cpu_slab); slab =3D new_slab(s, gfpflags, node); @@ -3255,20 +3268,6 @@ static void *___slab_alloc(struct kmem_cache *s, gfp= _t gfpflags, int node, =20 inc_slabs_node(s, slab_nid(slab), slab->objects); =20 -check_new_slab: - - if (kmem_cache_debug(s)) { - /* - * For debug caches here we had to go through - * alloc_single_from_partial() so just store the tracking info - * and return the object - */ - if (s->flags & SLAB_STORE_USER) - set_track(s, freelist, TRACK_ALLOC, addr); - - return freelist; - } - if (unlikely(!pfmemalloc_match(slab, gfpflags))) { /* * For !pfmemalloc_match() case we don't load freelist so that --=20 2.40.1 From nobody Thu Jan 1 08:55:42 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1BD3AC07545 for ; Tue, 24 Oct 2023 09:44:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234112AbjJXJoG (ORCPT ); Tue, 24 Oct 2023 05:44:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47792 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234550AbjJXJeK (ORCPT ); Tue, 24 Oct 2023 05:34:10 -0400 Received: from out-209.mta0.migadu.com (out-209.mta0.migadu.com [IPv6:2001:41d0:1004:224b::d1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D48C410E6 for ; Tue, 24 Oct 2023 02:34:05 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1698140044; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=M0oaaQjRtsggr60xf/TcRhISey4PmiC9GN3ZTwclV0I=; b=qe+5aC/P6zRu1TdKIHVLvCkRJ0utMiqJQbaCAMAjQVzHN5g58p3EkWiMWER0AdEMI6PKuU X/FgD2Z340OUYybYcb2S1udN+T5kspyS4qlvEyxH5DfTOIwhkqfb34by37jVV286bVVgXe Fb/uzGHTjVq7WYepvR7yMsyzuEjRMHI= From: chengming.zhou@linux.dev To: cl@linux.com, penberg@kernel.org Cc: rientjes@google.com, iamjoonsoo.kim@lge.com, akpm@linux-foundation.org, vbabka@suse.cz, roman.gushchin@linux.dev, 42.hyeyoo@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, chengming.zhou@linux.dev, Chengming Zhou Subject: [RFC PATCH v3 4/7] slub: Change get_partial() interfaces to return slab Date: Tue, 24 Oct 2023 09:33:42 +0000 Message-Id: <20231024093345.3676493-5-chengming.zhou@linux.dev> In-Reply-To: <20231024093345.3676493-1-chengming.zhou@linux.dev> References: <20231024093345.3676493-1-chengming.zhou@linux.dev> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Chengming Zhou We need all get_partial() related interfaces to return a slab, instead of returning the freelist (or object). Use the partial_context.object to return back freelist or object for now. This patch shouldn't have any functional changes. 
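For clarity, the caller-side pattern in ___slab_alloc() changes roughly as follows. This is an illustrative sketch mirroring the hunks below, not an extra hunk; the debug-cache handling and error paths are elided:

	/* Before: get_partial() filled *pc->slab and returned the freelist. */
	pc.flags = gfpflags;
	pc.slab = &slab;
	pc.orig_size = orig_size;
	freelist = get_partial(s, node, &pc);

	/*
	 * After: get_partial() returns the slab itself, and the freelist
	 * (or the single object in the debug case) travels back in pc.object.
	 */
	pc.flags = gfpflags;
	pc.orig_size = orig_size;
	slab = get_partial(s, node, &pc);
	if (slab)
		freelist = pc.object;
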
Signed-off-by: Chengming Zhou Reviewed-by: Vlastimil Babka --- mm/slub.c | 63 +++++++++++++++++++++++++++++-------------------------- 1 file changed, 33 insertions(+), 30 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index cd8aa68c156e..7d0234bffad3 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -204,9 +204,9 @@ DEFINE_STATIC_KEY_FALSE(slub_debug_enabled); =20 /* Structure holding parameters for get_partial() call chain */ struct partial_context { - struct slab **slab; gfp_t flags; unsigned int orig_size; + void *object; }; =20 static inline bool kmem_cache_debug(struct kmem_cache *s) @@ -2271,10 +2271,11 @@ static inline bool pfmemalloc_match(struct slab *sl= ab, gfp_t gfpflags); /* * Try to allocate a partial slab from a specific node. */ -static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node= *n, - struct partial_context *pc) +static struct slab *get_partial_node(struct kmem_cache *s, + struct kmem_cache_node *n, + struct partial_context *pc) { - struct slab *slab, *slab2; + struct slab *slab, *slab2, *partial =3D NULL; void *object =3D NULL; unsigned long flags; unsigned int partial_slabs =3D 0; @@ -2290,27 +2291,28 @@ static void *get_partial_node(struct kmem_cache *s,= struct kmem_cache_node *n, =20 spin_lock_irqsave(&n->list_lock, flags); list_for_each_entry_safe(slab, slab2, &n->partial, slab_list) { - void *t; - if (!pfmemalloc_match(slab, pc->flags)) continue; =20 if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) { object =3D alloc_single_from_partial(s, n, slab, pc->orig_size); - if (object) + if (object) { + partial =3D slab; + pc->object =3D object; break; + } continue; } =20 - t =3D acquire_slab(s, n, slab, object =3D=3D NULL); - if (!t) + object =3D acquire_slab(s, n, slab, object =3D=3D NULL); + if (!object) break; =20 - if (!object) { - *pc->slab =3D slab; + if (!partial) { + partial =3D slab; + pc->object =3D object; stat(s, ALLOC_FROM_PARTIAL); - object =3D t; } else { put_cpu_partial(s, slab, 0); stat(s, CPU_PARTIAL_NODE); @@ -2326,20 +2328,21 @@ static void *get_partial_node(struct kmem_cache *s,= struct kmem_cache_node *n, =20 } spin_unlock_irqrestore(&n->list_lock, flags); - return object; + return partial; } =20 /* * Get a slab from somewhere. Search in increasing NUMA distances. */ -static void *get_any_partial(struct kmem_cache *s, struct partial_context = *pc) +static struct slab *get_any_partial(struct kmem_cache *s, + struct partial_context *pc) { #ifdef CONFIG_NUMA struct zonelist *zonelist; struct zoneref *z; struct zone *zone; enum zone_type highest_zoneidx =3D gfp_zone(pc->flags); - void *object; + struct slab *slab; unsigned int cpuset_mems_cookie; =20 /* @@ -2374,8 +2377,8 @@ static void *get_any_partial(struct kmem_cache *s, st= ruct partial_context *pc) =20 if (n && cpuset_zone_allowed(zone, pc->flags) && n->nr_partial > s->min_partial) { - object =3D get_partial_node(s, n, pc); - if (object) { + slab =3D get_partial_node(s, n, pc); + if (slab) { /* * Don't check read_mems_allowed_retry() * here - if mems_allowed was updated in @@ -2383,7 +2386,7 @@ static void *get_any_partial(struct kmem_cache *s, st= ruct partial_context *pc) * between allocation and the cpuset * update */ - return object; + return slab; } } } @@ -2395,17 +2398,18 @@ static void *get_any_partial(struct kmem_cache *s, = struct partial_context *pc) /* * Get a partial slab, lock it and return it. 
*/ -static void *get_partial(struct kmem_cache *s, int node, struct partial_co= ntext *pc) +static struct slab *get_partial(struct kmem_cache *s, int node, + struct partial_context *pc) { - void *object; + struct slab *slab; int searchnode =3D node; =20 if (node =3D=3D NUMA_NO_NODE) searchnode =3D numa_mem_id(); =20 - object =3D get_partial_node(s, get_node(s, searchnode), pc); - if (object || node !=3D NUMA_NO_NODE) - return object; + slab =3D get_partial_node(s, get_node(s, searchnode), pc); + if (slab || node !=3D NUMA_NO_NODE) + return slab; =20 return get_any_partial(s, pc); } @@ -3215,10 +3219,10 @@ static void *___slab_alloc(struct kmem_cache *s, gf= p_t gfpflags, int node, new_objects: =20 pc.flags =3D gfpflags; - pc.slab =3D &slab; pc.orig_size =3D orig_size; - freelist =3D get_partial(s, node, &pc); - if (freelist) { + slab =3D get_partial(s, node, &pc); + if (slab) { + freelist =3D pc.object; if (kmem_cache_debug(s)) { /* * For debug caches here we had to go through @@ -3410,12 +3414,11 @@ static void *__slab_alloc_node(struct kmem_cache *s, void *object; =20 pc.flags =3D gfpflags; - pc.slab =3D &slab; pc.orig_size =3D orig_size; - object =3D get_partial(s, node, &pc); + slab =3D get_partial(s, node, &pc); =20 - if (object) - return object; + if (slab) + return pc.object; =20 slab =3D new_slab(s, gfpflags, node); if (unlikely(!slab)) { --=20 2.40.1 From nobody Thu Jan 1 08:55:42 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EFF28C25B47 for ; Tue, 24 Oct 2023 09:44:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234213AbjJXJoQ (ORCPT ); Tue, 24 Oct 2023 05:44:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47752 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234554AbjJXJeL (ORCPT ); Tue, 24 Oct 2023 05:34:11 -0400 Received: from out-194.mta0.migadu.com (out-194.mta0.migadu.com [91.218.175.194]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4285E10F5 for ; Tue, 24 Oct 2023 02:34:08 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1698140046; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UijxEztLRTexuc2wmlND22JYjAvU8peNSSAQw8rq2PA=; b=gykdRjMal5sST/bB2trCR8USAxy2UxFWhSeBpIeqaRzeF5OYDFVsgF9netz4vEDOO1YDYl eTbp0AKHNXuWCACBnN9eOw9UgYy91M8GYL3m9MQgwc5I0MLP0NIaEzdLHE9x29GwvY+Peq YX7cFeuPsGeiRsn56VsLqoroQhjglBc= From: chengming.zhou@linux.dev To: cl@linux.com, penberg@kernel.org Cc: rientjes@google.com, iamjoonsoo.kim@lge.com, akpm@linux-foundation.org, vbabka@suse.cz, roman.gushchin@linux.dev, 42.hyeyoo@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, chengming.zhou@linux.dev, Chengming Zhou Subject: [RFC PATCH v3 5/7] slub: Introduce freeze_slab() Date: Tue, 24 Oct 2023 09:33:43 +0000 Message-Id: <20231024093345.3676493-6-chengming.zhou@linux.dev> In-Reply-To: <20231024093345.3676493-1-chengming.zhou@linux.dev> References: <20231024093345.3676493-1-chengming.zhou@linux.dev> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Chengming Zhou We will have unfrozen slabs out of the node partial list later, so we need a freeze_slab() function to freeze the partial slab and get its freelist. Signed-off-by: Chengming Zhou Reviewed-by: Vlastimil Babka --- mm/slub.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/mm/slub.c b/mm/slub.c index 7d0234bffad3..5b428648021f 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3079,6 +3079,33 @@ static inline void *get_freelist(struct kmem_cache *= s, struct slab *slab) return freelist; } =20 +/* + * Freeze the partial slab and return the pointer to the freelist. + */ +static inline void *freeze_slab(struct kmem_cache *s, struct slab *slab) +{ + struct slab new; + unsigned long counters; + void *freelist; + + do { + freelist =3D slab->freelist; + counters =3D slab->counters; + + new.counters =3D counters; + VM_BUG_ON(new.frozen); + + new.inuse =3D slab->objects; + new.frozen =3D 1; + + } while (!__slab_update_freelist(s, slab, + freelist, counters, + NULL, new.counters, + "freeze_slab")); + + return freelist; +} + /* * Slow path. The lockless freelist is empty or we need to perform * debugging duties. --=20 2.40.1 From nobody Thu Jan 1 08:55:42 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B9CD3C07545 for ; Tue, 24 Oct 2023 09:35:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234322AbjJXJe5 (ORCPT ); Tue, 24 Oct 2023 05:34:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47800 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234563AbjJXJeM (ORCPT ); Tue, 24 Oct 2023 05:34:12 -0400 Received: from out-190.mta0.migadu.com (out-190.mta0.migadu.com [91.218.175.190]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 97C441706 for ; Tue, 24 Oct 2023 02:34:10 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1698140048; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1189G8OwJQsHaUqQ7omEWGXJiWlZJSsRqA7qcxA5H0E=; b=Y7iQvWYnYomwhH9HpqqMB6QnGTBOBbLrimV35vPGob+QAc/EDk42XtH9Qiw36iRppPZAfY nbCKMQcN9UNCzVRzFAAYL7APDUqtxspLiF8pMGCAFass0h7cocBOURvFszr/QxFy9Eav/T md+C1DNGHg5ckM7dyaawphlNjCc6pmI= From: chengming.zhou@linux.dev To: cl@linux.com, penberg@kernel.org Cc: rientjes@google.com, iamjoonsoo.kim@lge.com, akpm@linux-foundation.org, vbabka@suse.cz, roman.gushchin@linux.dev, 42.hyeyoo@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, chengming.zhou@linux.dev, Chengming Zhou Subject: [RFC PATCH v3 6/7] slub: Delay freezing of partial slabs Date: Tue, 24 Oct 2023 09:33:44 +0000 Message-Id: <20231024093345.3676493-7-chengming.zhou@linux.dev> In-Reply-To: <20231024093345.3676493-1-chengming.zhou@linux.dev> References: <20231024093345.3676493-1-chengming.zhou@linux.dev> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Chengming Zhou Now we will freeze slabs when moving them out of node partial list to cpu partial list, this method needs two cmpxchg_double operations: 1. freeze slab (acquire_slab()) under the node list_lock 2. get_freelist() when pick used in ___slab_alloc() Actually we don't need to freeze when moving slabs out of node partial list, we can delay freezing to when use slab freelist in ___slab_alloc(), so we can save one cmpxchg_double(). And there are other good points: - The moving of slabs between node partial list and cpu partial list becomes simpler, since we don't need to freeze or unfreeze at all. - The node list_lock contention would be less, since we don't need to freeze any slab under the node list_lock. We can achieve this because there is no concurrent path would manipulate the partial slab list except the __slab_free() path, which is serialized now. Since the slab returned by get_partial() interfaces is not frozen anymore and no freelist in the partial_context, so we need to use the introduced freeze_slab() to freeze it and get its freelist. Similarly, the slabs on the CPU partial list are not frozen anymore, we need to freeze_slab() on it before use. Signed-off-by: Chengming Zhou Reviewed-by: Vlastimil Babka --- mm/slub.c | 111 +++++++++++------------------------------------------- 1 file changed, 21 insertions(+), 90 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index 5b428648021f..486d44421432 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -2215,51 +2215,6 @@ static void *alloc_single_from_new_slab(struct kmem_= cache *s, return object; } =20 -/* - * Remove slab from the partial list, freeze it and - * return the pointer to the freelist. - * - * Returns a list of objects or NULL if it fails. - */ -static inline void *acquire_slab(struct kmem_cache *s, - struct kmem_cache_node *n, struct slab *slab, - int mode) -{ - void *freelist; - unsigned long counters; - struct slab new; - - lockdep_assert_held(&n->list_lock); - - /* - * Zap the freelist and set the frozen bit. - * The old freelist is the list of objects for the - * per cpu allocation list. 
- */ - freelist =3D slab->freelist; - counters =3D slab->counters; - new.counters =3D counters; - if (mode) { - new.inuse =3D slab->objects; - new.freelist =3D NULL; - } else { - new.freelist =3D freelist; - } - - VM_BUG_ON(new.frozen); - new.frozen =3D 1; - - if (!__slab_update_freelist(s, slab, - freelist, counters, - new.freelist, new.counters, - "acquire_slab")) - return NULL; - - remove_partial(n, slab); - WARN_ON(!freelist); - return freelist; -} - #ifdef CONFIG_SLUB_CPU_PARTIAL static void put_cpu_partial(struct kmem_cache *s, struct slab *slab, int d= rain); #else @@ -2276,7 +2231,6 @@ static struct slab *get_partial_node(struct kmem_cach= e *s, struct partial_context *pc) { struct slab *slab, *slab2, *partial =3D NULL; - void *object =3D NULL; unsigned long flags; unsigned int partial_slabs =3D 0; =20 @@ -2295,7 +2249,7 @@ static struct slab *get_partial_node(struct kmem_cach= e *s, continue; =20 if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) { - object =3D alloc_single_from_partial(s, n, slab, + void *object =3D alloc_single_from_partial(s, n, slab, pc->orig_size); if (object) { partial =3D slab; @@ -2305,13 +2259,10 @@ static struct slab *get_partial_node(struct kmem_ca= che *s, continue; } =20 - object =3D acquire_slab(s, n, slab, object =3D=3D NULL); - if (!object) - break; + remove_partial(n, slab); =20 if (!partial) { partial =3D slab; - pc->object =3D object; stat(s, ALLOC_FROM_PARTIAL); } else { put_cpu_partial(s, slab, 0); @@ -2610,9 +2561,6 @@ static void __unfreeze_partials(struct kmem_cache *s,= struct slab *partial_slab) unsigned long flags =3D 0; =20 while (partial_slab) { - struct slab new; - struct slab old; - slab =3D partial_slab; partial_slab =3D slab->next; =20 @@ -2625,23 +2573,7 @@ static void __unfreeze_partials(struct kmem_cache *s= , struct slab *partial_slab) spin_lock_irqsave(&n->list_lock, flags); } =20 - do { - - old.freelist =3D slab->freelist; - old.counters =3D slab->counters; - VM_BUG_ON(!old.frozen); - - new.counters =3D old.counters; - new.freelist =3D old.freelist; - - new.frozen =3D 0; - - } while (!__slab_update_freelist(s, slab, - old.freelist, old.counters, - new.freelist, new.counters, - "unfreezing slab")); - - if (unlikely(!new.inuse && n->nr_partial >=3D s->min_partial)) { + if (unlikely(!slab->inuse && n->nr_partial >=3D s->min_partial)) { slab->next =3D slab_to_discard; slab_to_discard =3D slab; } else { @@ -3148,7 +3080,6 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_= t gfpflags, int node, node =3D NUMA_NO_NODE; goto new_slab; } -redo: =20 if (unlikely(!node_match(slab, node))) { /* @@ -3224,7 +3155,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_= t gfpflags, int node, =20 new_slab: =20 - if (slub_percpu_partial(c)) { + while (slub_percpu_partial(c)) { local_lock_irqsave(&s->cpu_slab->lock, flags); if (unlikely(c->slab)) { local_unlock_irqrestore(&s->cpu_slab->lock, flags); @@ -3236,11 +3167,20 @@ static void *___slab_alloc(struct kmem_cache *s, gf= p_t gfpflags, int node, goto new_objects; } =20 - slab =3D c->slab =3D slub_percpu_partial(c); + slab =3D slub_percpu_partial(c); slub_set_percpu_partial(c, slab); local_unlock_irqrestore(&s->cpu_slab->lock, flags); stat(s, CPU_PARTIAL_ALLOC); - goto redo; + + if (unlikely(!node_match(slab, node) || + !pfmemalloc_match(slab, gfpflags))) { + slab->next =3D NULL; + __unfreeze_partials(s, slab); + continue; + } + + freelist =3D freeze_slab(s, slab); + goto retry_load_slab; } =20 new_objects: @@ -3249,8 +3189,8 @@ static void *___slab_alloc(struct kmem_cache *s, 
gfp_= t gfpflags, int node, pc.orig_size =3D orig_size; slab =3D get_partial(s, node, &pc); if (slab) { - freelist =3D pc.object; if (kmem_cache_debug(s)) { + freelist =3D pc.object; /* * For debug caches here we had to go through * alloc_single_from_partial() so just store the @@ -3262,6 +3202,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_= t gfpflags, int node, return freelist; } =20 + freelist =3D freeze_slab(s, slab); goto retry_load_slab; } =20 @@ -3663,18 +3604,8 @@ static void __slab_free(struct kmem_cache *s, struct= slab *slab, was_frozen =3D new.frozen; new.inuse -=3D cnt; if ((!new.inuse || !prior) && !was_frozen) { - - if (kmem_cache_has_cpu_partial(s) && !prior) { - - /* - * Slab was on no list before and will be - * partially empty - * We can defer the list move and instead - * freeze it. - */ - new.frozen =3D 1; - - } else { /* Needs to be taken off a list */ + /* Needs to be taken off a list */ + if (!kmem_cache_has_cpu_partial(s) || prior) { =20 n =3D get_node(s, slab_nid(slab)); /* @@ -3704,9 +3635,9 @@ static void __slab_free(struct kmem_cache *s, struct = slab *slab, * activity can be necessary. */ stat(s, FREE_FROZEN); - } else if (new.frozen) { + } else if (kmem_cache_has_cpu_partial(s) && !prior) { /* - * If we just froze the slab then put it onto the + * If we started with a full slab then put it onto the * per cpu partial list. */ put_cpu_partial(s, slab, 1); --=20 2.40.1 From nobody Thu Jan 1 08:55:42 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B8412C07545 for ; Tue, 24 Oct 2023 09:34:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234291AbjJXJes (ORCPT ); Tue, 24 Oct 2023 05:34:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47996 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234581AbjJXJeP (ORCPT ); Tue, 24 Oct 2023 05:34:15 -0400 Received: from out-197.mta0.migadu.com (out-197.mta0.migadu.com [91.218.175.197]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2DBD1170D for ; Tue, 24 Oct 2023 02:34:13 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1698140051; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cdvJNGs14r77ax8dSUQgIW9duarJJopRajvvGp2TDMo=; b=LFQxXGGRKgIbWAldNJ1TYZrnkEc0I+pD5nGoJ0Dqwyqt32dcDScSAvf2HkxURKE+MuV4uf VITI/OvMyXtzkGrkQIUB2hTODauxk/xJEPGsyPf7Ow4uqMZV7ixHrnzuwbIKcLr4S5SXvq SpnsSqfJOa1quAz1cL2pg8WHPN5pFGw= From: chengming.zhou@linux.dev To: cl@linux.com, penberg@kernel.org Cc: rientjes@google.com, iamjoonsoo.kim@lge.com, akpm@linux-foundation.org, vbabka@suse.cz, roman.gushchin@linux.dev, 42.hyeyoo@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, chengming.zhou@linux.dev, Chengming Zhou Subject: [RFC PATCH v3 7/7] slub: Optimize deactivate_slab() Date: Tue, 24 Oct 2023 09:33:45 +0000 Message-Id: <20231024093345.3676493-8-chengming.zhou@linux.dev> In-Reply-To: <20231024093345.3676493-1-chengming.zhou@linux.dev> References: <20231024093345.3676493-1-chengming.zhou@linux.dev> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Chengming Zhou Since the introduce of unfrozen slabs on cpu partial list, we don't need to synchronize the slab frozen state under the node list_lock. The caller of deactivate_slab() and the caller of __slab_free() won't manipulate the slab list concurrently. So we can get node list_lock in the last stage if we really need to manipulate the slab list in this path. Signed-off-by: Chengming Zhou --- mm/slub.c | 70 ++++++++++++++++++++----------------------------------- 1 file changed, 25 insertions(+), 45 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index 486d44421432..64d550e415eb 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -2449,10 +2449,8 @@ static void init_kmem_cache_cpus(struct kmem_cache *= s) static void deactivate_slab(struct kmem_cache *s, struct slab *slab, void *freelist) { - enum slab_modes { M_NONE, M_PARTIAL, M_FREE, M_FULL_NOLIST }; struct kmem_cache_node *n =3D get_node(s, slab_nid(slab)); int free_delta =3D 0; - enum slab_modes mode =3D M_NONE; void *nextfree, *freelist_iter, *freelist_tail; int tail =3D DEACTIVATE_TO_HEAD; unsigned long flags =3D 0; @@ -2499,58 +2497,40 @@ static void deactivate_slab(struct kmem_cache *s, s= truct slab *slab, * unfrozen and number of objects in the slab may have changed. * Then release lock and retry cmpxchg again. 
*/ -redo: - - old.freelist =3D READ_ONCE(slab->freelist); - old.counters =3D READ_ONCE(slab->counters); - VM_BUG_ON(!old.frozen); - - /* Determine target state of the slab */ - new.counters =3D old.counters; - if (freelist_tail) { - new.inuse -=3D free_delta; - set_freepointer(s, freelist_tail, old.freelist); - new.freelist =3D freelist; - } else - new.freelist =3D old.freelist; + do { + old.freelist =3D READ_ONCE(slab->freelist); + old.counters =3D READ_ONCE(slab->counters); + VM_BUG_ON(!old.frozen); + + /* Determine target state of the slab */ + new.counters =3D old.counters; + new.frozen =3D 0; + if (freelist_tail) { + new.inuse -=3D free_delta; + set_freepointer(s, freelist_tail, old.freelist); + new.freelist =3D freelist; + } else + new.freelist =3D old.freelist; =20 - new.frozen =3D 0; + } while (!slab_update_freelist(s, slab, + old.freelist, old.counters, + new.freelist, new.counters, + "unfreezing slab")); =20 + /* + * Stage three: Manipulate the slab list based on the updated state. + */ if (!new.inuse && n->nr_partial >=3D s->min_partial) { - mode =3D M_FREE; + stat(s, DEACTIVATE_EMPTY); + discard_slab(s, slab); + stat(s, FREE_SLAB); } else if (new.freelist) { - mode =3D M_PARTIAL; - /* - * Taking the spinlock removes the possibility that - * acquire_slab() will see a slab that is frozen - */ spin_lock_irqsave(&n->list_lock, flags); - } else { - mode =3D M_FULL_NOLIST; - } - - - if (!slab_update_freelist(s, slab, - old.freelist, old.counters, - new.freelist, new.counters, - "unfreezing slab")) { - if (mode =3D=3D M_PARTIAL) - spin_unlock_irqrestore(&n->list_lock, flags); - goto redo; - } - - - if (mode =3D=3D M_PARTIAL) { add_partial(n, slab, tail); spin_unlock_irqrestore(&n->list_lock, flags); stat(s, tail); - } else if (mode =3D=3D M_FREE) { - stat(s, DEACTIVATE_EMPTY); - discard_slab(s, slab); - stat(s, FREE_SLAB); - } else if (mode =3D=3D M_FULL_NOLIST) { + } else stat(s, DEACTIVATE_FULL); - } } =20 #ifdef CONFIG_SLUB_CPU_PARTIAL --=20 2.40.1
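
For readers skimming the series, the reworked deactivate_slab() from patch 7 ends up with roughly the shape below. This is a condensed sketch of the patched mm/slub.c, not a literal excerpt: stage one (detaching and counting the per-cpu free objects) and the exact freelist splicing are abbreviated.

	static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
				    void *freelist)
	{
		struct kmem_cache_node *n = get_node(s, slab_nid(slab));
		int tail = DEACTIVATE_TO_HEAD;
		unsigned long flags = 0;
		struct slab old, new;

		/* ... stage one: walk the detached freelist, count free objects ... */

		/* Stage two: unfreeze the slab while updating its freelist. */
		do {
			old.freelist = READ_ONCE(slab->freelist);
			old.counters = READ_ONCE(slab->counters);
			VM_BUG_ON(!old.frozen);

			new.counters = old.counters;
			new.frozen = 0;
			/* ... splice the detached free objects into new.freelist ... */
		} while (!slab_update_freelist(s, slab,
					       old.freelist, old.counters,
					       new.freelist, new.counters,
					       "unfreezing slab"));

		/* Stage three: only now touch the node partial list, if needed. */
		if (!new.inuse && n->nr_partial >= s->min_partial) {
			stat(s, DEACTIVATE_EMPTY);
			discard_slab(s, slab);
			stat(s, FREE_SLAB);
		} else if (new.freelist) {
			spin_lock_irqsave(&n->list_lock, flags);
			add_partial(n, slab, tail);
			spin_unlock_irqrestore(&n->list_lock, flags);
			stat(s, tail);
		} else {
			stat(s, DEACTIVATE_FULL);
		}
	}

The point of the restructuring is visible here: the cmpxchg retry loop no longer runs under n->list_lock, and the lock is taken only in the final stage, when the slab actually has to be put back on the node partial list.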