From: Shengming Hu <hu.shengming@zte.com.cn>
X-Mailing-List: linux-kernel@vger.kernel.org
Date: Sat, 28 Mar 2026 12:55:38 +0800 (CST)
Message-ID: <20260328125538341lvTGRpS62UNdRiAAz2gH3@zte.com.cn>
Subject: [PATCH] mm/slub: skip freelist construction for whole-slab bulk refill

refill_objects() still carries a long-standing note that a whole-slab
bulk refill could avoid building a freelist that is immediately
drained. When the remaining bulk allocation is large enough to fully
consume a new slab, constructing the freelist is unnecessary overhead.
Instead, allocate the slab without building its freelist and hand out
all objects directly to the caller.
The slab is then initialized as fully in-use.

Keep the existing behavior when CONFIG_SLAB_FREELIST_RANDOM is enabled,
as freelist construction is required to provide randomized object
order.

Additionally, mark setup_object() as inline. After introducing this
optimization, the compiler no longer consistently inlines this helper,
which can regress performance in this hot path. Explicitly marking it
inline restores the expected code generation.

This reduces per-object overhead in bulk allocation paths and improves
allocation throughput significantly.

Benchmark results (slub_bulk_bench):
Machine: qemu-system-x86_64 -m 1024M -smp 8
Kernel: Linux 7.0.0-rc5-next-20260326
Config: x86_64_defconfig
Rounds: 20
Total: 256MB

obj_size=16, batch=256:
  before: 28.80 ± 1.20 ns/object
  after:  17.95 ± 0.94 ns/object
  delta:  -37.7%

obj_size=32, batch=128:
  before: 33.00 ± 0.00 ns/object
  after:  21.75 ± 0.44 ns/object
  delta:  -34.1%

obj_size=64, batch=64:
  before: 44.30 ± 0.73 ns/object
  after:  30.60 ± 0.50 ns/object
  delta:  -30.9%

obj_size=128, batch=32:
  before: 81.40 ± 1.85 ns/object
  after:  47.00 ± 0.00 ns/object
  delta:  -42.3%

obj_size=256, batch=32:
  before: 101.20 ± 1.28 ns/object
  after:  52.55 ± 0.60 ns/object
  delta:  -48.1%

obj_size=512, batch=32:
  before: 109.40 ± 2.30 ns/object
  after:  53.80 ± 0.62 ns/object
  delta:  -50.8%

Link: https://github.com/HSM6236/slub_bulk_test.git
Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn>
---
 mm/slub.c | 90 +++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 71 insertions(+), 19 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index fb2c5c57bc4e..c0ecfb42b035 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2733,7 +2733,7 @@ bool slab_free_freelist_hook(struct kmem_cache *s, void **head, void **tail,
 	return *head != NULL;
 }
 
-static void *setup_object(struct kmem_cache *s, void *object)
+static inline void *setup_object(struct kmem_cache *s, void *object)
 {
 	setup_object_debug(s, object);
 	object = kasan_init_slab_obj(s, object);
@@ -3438,7 +3438,8 @@ static __always_inline void unaccount_slab(struct slab *slab, int order,
 			    -(PAGE_SIZE << order));
 }
 
-static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
+static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node,
+				  bool build_freelist)
 {
 	bool allow_spin = gfpflags_allow_spinning(flags);
 	struct slab *slab;
@@ -3446,7 +3447,7 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	gfp_t alloc_gfp;
 	void *start, *p, *next;
 	int idx;
-	bool shuffle;
+	bool shuffle = false;
 
 	flags &= gfp_allowed_mask;
 
@@ -3483,6 +3484,7 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 
 	slab->frozen = 0;
 	slab->slab_cache = s;
+	slab->freelist = NULL;
 
 	kasan_poison_slab(slab);
 
@@ -3497,9 +3499,10 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	alloc_slab_obj_exts_early(s, slab);
 	account_slab(slab, oo_order(oo), s, flags);
 
-	shuffle = shuffle_freelist(s, slab, allow_spin);
+	if (build_freelist)
+		shuffle = shuffle_freelist(s, slab, allow_spin);
 
-	if (!shuffle) {
+	if (build_freelist && !shuffle) {
 		start = fixup_red_left(s, start);
 		start = setup_object(s, start);
 		slab->freelist = start;
@@ -3515,7 +3518,8 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	return slab;
 }
 
-static struct slab *new_slab(struct kmem_cache *s, gfp_t flags, int node)
+static struct slab *new_slab(struct kmem_cache *s, gfp_t flags, int node,
+			     bool build_freelist)
 {
 	if (unlikely(flags & GFP_SLAB_BUG_MASK))
 		flags = kmalloc_fix_flags(flags);
@@ -3523,7 +3527,7 @@ static struct slab *new_slab(struct kmem_cache *s, gfp_t flags, int node)
 	WARN_ON_ONCE(s->ctor && (flags & __GFP_ZERO));
 
 	return allocate_slab(s,
-		flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);
+		flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node, build_freelist);
 }
 
 static void __free_slab(struct kmem_cache *s, struct slab *slab, bool allow_spin)
@@ -4395,6 +4399,45 @@ static unsigned int alloc_from_new_slab(struct kmem_cache *s, struct slab *slab,
 	return allocated;
 }
 
+static unsigned int alloc_whole_from_new_slab(struct kmem_cache *s,
+		struct slab *slab, void **p)
+{
+	unsigned int allocated = 0;
+	void *object;
+
+	object = fixup_red_left(s, slab_address(slab));
+	object = setup_object(s, object);
+
+	while (allocated < slab->objects - 1) {
+		p[allocated] = object;
+		maybe_wipe_obj_freeptr(s, object);
+
+		allocated++;
+		object += s->size;
+		object = setup_object(s, object);
+	}
+
+	p[allocated] = object;
+	maybe_wipe_obj_freeptr(s, object);
+	allocated++;
+
+	slab->freelist = NULL;
+	slab->inuse = slab->objects;
+	inc_slabs_node(s, slab_nid(slab), slab->objects);
+
+	return allocated;
+}
+
+static inline bool bulk_refill_consumes_whole_slab(struct kmem_cache *s,
+		unsigned int count)
+{
+#ifdef CONFIG_SLAB_FREELIST_RANDOM
+	return false;
+#else
+	return count >= oo_objects(s->oo);
+#endif
+}
+
 /*
  * Slow path. We failed to allocate via percpu sheaves or they are not available
  * due to bootstrap or debugging enabled or SLUB_TINY.
@@ -4441,7 +4484,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	if (object)
 		goto success;
 
-	slab = new_slab(s, pc.flags, node);
+	slab = new_slab(s, pc.flags, node, true);
 
 	if (unlikely(!slab)) {
 		if (node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE)
@@ -7244,18 +7287,27 @@ refill_objects(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int min,
 
 new_slab:
 
-	slab = new_slab(s, gfp, local_node);
-	if (!slab)
-		goto out;
-
-	stat(s, ALLOC_SLAB);
-
 	/*
-	 * TODO: possible optimization - if we know we will consume the whole
-	 * slab we might skip creating the freelist?
+	 * If the remaining bulk allocation is large enough to consume
+	 * an entire slab, avoid building the freelist only to drain it
+	 * immediately. Instead, allocate a slab without a freelist and
+	 * hand out all objects directly.
 	 */
-	refilled += alloc_from_new_slab(s, slab, p + refilled, max - refilled,
-					/* allow_spin = */ true);
+	if (bulk_refill_consumes_whole_slab(s, max - refilled)) {
+		slab = new_slab(s, gfp, local_node, false);
+		if (!slab)
+			goto out;
+		stat(s, ALLOC_SLAB);
+		refilled += alloc_whole_from_new_slab(s, slab, p + refilled);
+	} else {
+		slab = new_slab(s, gfp, local_node, true);
+		if (!slab)
+			goto out;
+		stat(s, ALLOC_SLAB);
+		refilled += alloc_from_new_slab(s, slab, p + refilled,
+						max - refilled,
+						/* allow_spin = */ true);
+	}
 
 	if (refilled < min)
 		goto new_slab;
@@ -7587,7 +7639,7 @@ static void early_kmem_cache_node_alloc(int node)
 
 	BUG_ON(kmem_cache_node->size < sizeof(struct kmem_cache_node));
 
-	slab = new_slab(kmem_cache_node, GFP_NOWAIT, node);
+	slab = new_slab(kmem_cache_node, GFP_NOWAIT, node, true);
 
 	BUG_ON(!slab);
 	if (slab_nid(slab) != node) {
-- 
2.25.1