From nobody Thu Apr 9 14:49:47 2026
Message-ID: <20260302155105.306721019@redhat.com>
User-Agent: quilt/0.69
Date: Mon, 02 Mar 2026 12:49:50 -0300
From: Marcelo Tosatti
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
 Muchun Song, Andrew Morton, Christoph Lameter, Pekka Enberg,
 David Rientjes, Joonsoo Kim, Vlastimil Babka,
 Hyeonggon Yoo <42.hyeyoo@gmail.com>, Leonardo Bras, Thomas Gleixner,
 Waiman Long, Boqun Feng, Frederic Weisbecker, Marcelo Tosatti
Subject: [PATCH v2 5/5] slub: apply new queue_percpu_work_on() interface
References:
 <20260302154945.143996316@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Make use of the new qpw_{un,}lock*() and queue_percpu_work_on() interface
to improve performance and latency.

For functions that may be scheduled on a different CPU, replace
local_{un,}lock*() with qpw_{un,}lock*(), and schedule_work_on() with
queue_percpu_work_on(). The same applies to flush_work(), which becomes
flush_percpu_work().

This change requires allocating qpw_structs instead of work_structs, and
adding a cpu parameter to a few functions.

It should have no relevant performance impact on non-QPW kernels: for
functions that may be scheduled on a different CPU, the local_*lock's
this_cpu_ptr() becomes per_cpu_ptr(smp_processor_id()).

Signed-off-by: Leonardo Bras
Signed-off-by: Marcelo Tosatti

---
 mm/slub.c |  146 +++++++++++++++++++++++++++++++------------------------------
 1 file changed, 74 insertions(+), 72 deletions(-)

Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c
+++ linux/mm/slub.c
@@ -50,6 +50,7 @@
 #include
 #include
 #include
+#include
 #include

 #include "internal.h"
@@ -129,7 +130,7 @@
  * For debug caches, all allocations are forced to go through a list_lock
  * protected region to serialize against concurrent validation.
  *
- * cpu_sheaves->lock (local_trylock)
+ * cpu_sheaves->lock (qpw_trylock)
  *
  *   This lock protects fastpath operations on the percpu sheaves. On !RT it
  *   only disables preemption and does no atomic operations. As long as the main
@@ -157,7 +158,7 @@
  * Interrupts are disabled as part of list_lock or barn lock operations, or
  * around the slab_lock operation, in order to make the slab allocator safe
  * to use in the context of an irq.
- * Preemption is disabled as part of local_trylock operations.
+ * Preemption is disabled as part of qpw_trylock operations.
  * kmalloc_nolock() and kfree_nolock() are safe in NMI context but see
  * their limitations.
 *
@@ -418,7 +419,7 @@ struct slab_sheaf {
 };

 struct slub_percpu_sheaves {
-	local_trylock_t lock;
+	qpw_trylock_t lock;
 	struct slab_sheaf *main; /* never NULL when unlocked */
 	struct slab_sheaf *spare; /* empty or full, may be NULL */
 	struct slab_sheaf *rcu_free; /* for batching kfree_rcu() */
@@ -480,7 +481,7 @@ static nodemask_t slab_nodes;
 static struct workqueue_struct *flushwq;

 struct slub_flush_work {
-	struct work_struct work;
+	struct qpw_struct qpw;
 	struct kmem_cache *s;
 	bool skip;
 };
@@ -2849,16 +2850,14 @@ static void __kmem_cache_free_bulk(struc
  *
  * Returns how many objects are remaining to be flushed
  */
-static unsigned int __sheaf_flush_main_batch(struct kmem_cache *s)
+static unsigned int __sheaf_flush_main_batch(struct kmem_cache *s, int cpu)
 {
 	struct slub_percpu_sheaves *pcs;
 	unsigned int batch, remaining;
 	void *objects[PCS_BATCH_MAX];
 	struct slab_sheaf *sheaf;

-	lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock));
-
-	pcs = this_cpu_ptr(s->cpu_sheaves);
+	pcs = per_cpu_ptr(s->cpu_sheaves, cpu);
 	sheaf = pcs->main;

 	batch = min(PCS_BATCH_MAX, sheaf->size);
@@ -2868,7 +2867,7 @@ static unsigned int __sheaf_flush_main_b

 	remaining = sheaf->size;

-	local_unlock(&s->cpu_sheaves->lock);
+	qpw_unlock(&s->cpu_sheaves->lock, cpu);

 	__kmem_cache_free_bulk(s, batch, &objects[0]);

@@ -2877,14 +2876,14 @@ static unsigned int __sheaf_flush_main_b
 	return remaining;
 }

-static void sheaf_flush_main(struct kmem_cache *s)
+static void sheaf_flush_main(struct kmem_cache *s, int cpu)
 {
 	unsigned int remaining;

 	do {
-		local_lock(&s->cpu_sheaves->lock);
+		qpw_lock(&s->cpu_sheaves->lock, cpu);

-		remaining = __sheaf_flush_main_batch(s);
+		remaining = __sheaf_flush_main_batch(s, cpu);

 	} while (remaining);
 }
@@ -2898,11 +2897,13 @@ static bool sheaf_try_flush_main(struct
 	bool ret = false;

 	do {
-		if (!local_trylock(&s->cpu_sheaves->lock))
+		if (!local_qpw_trylock(&s->cpu_sheaves->lock))
 			return ret;

 		ret = true;
-		remaining = __sheaf_flush_main_batch(s);
+
+		lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock));
+		remaining = __sheaf_flush_main_batch(s, smp_processor_id());

 	} while (remaining);

@@ -2979,13 +2980,13 @@ static void rcu_free_sheaf_nobarn(struct
  * flushing operations are rare so let's keep it simple and flush to slabs
  * directly, skipping the barn
  */
-static void pcs_flush_all(struct kmem_cache *s)
+static void pcs_flush_all(struct kmem_cache *s, int cpu)
 {
 	struct slub_percpu_sheaves *pcs;
 	struct slab_sheaf *spare, *rcu_free;

-	local_lock(&s->cpu_sheaves->lock);
-	pcs = this_cpu_ptr(s->cpu_sheaves);
+	qpw_lock(&s->cpu_sheaves->lock, cpu);
+	pcs = per_cpu_ptr(s->cpu_sheaves, cpu);

 	spare = pcs->spare;
 	pcs->spare = NULL;
@@ -2993,7 +2994,7 @@ static void pcs_flush_all(struct kmem_ca
 	rcu_free = pcs->rcu_free;
 	pcs->rcu_free = NULL;

-	local_unlock(&s->cpu_sheaves->lock);
+	qpw_unlock(&s->cpu_sheaves->lock, cpu);

 	if (spare) {
 		sheaf_flush_unused(s, spare);
@@ -3003,7 +3004,7 @@ static void pcs_flush_all(struct kmem_ca
 	if (rcu_free)
 		call_rcu(&rcu_free->rcu_head, rcu_free_sheaf_nobarn);

-	sheaf_flush_main(s);
+	sheaf_flush_main(s, cpu);
 }

 static void __pcs_flush_all_cpu(struct kmem_cache *s, unsigned int cpu)
@@ -3953,13 +3954,13 @@ static void flush_cpu_sheaves(struct wor
 {
 	struct kmem_cache *s;
 	struct slub_flush_work *sfw;
+	int cpu = qpw_get_cpu(w);

-	sfw = container_of(w, struct slub_flush_work, work);
-
+	sfw = &per_cpu(slub_flush, cpu);
 	s = sfw->s;

 	if (cache_has_sheaves(s))
-		pcs_flush_all(s);
+		pcs_flush_all(s, cpu);
 }

 static void flush_all_cpus_locked(struct kmem_cache *s)
@@ -3976,17 +3977,17 @@ static void flush_all_cpus_locked(struct
 			sfw->skip = true;
 			continue;
 		}
-		INIT_WORK(&sfw->work, flush_cpu_sheaves);
+		INIT_QPW(&sfw->qpw, flush_cpu_sheaves, cpu);
 		sfw->skip = false;
 		sfw->s = s;
-		queue_work_on(cpu, flushwq, &sfw->work);
+		queue_percpu_work_on(cpu, flushwq, &sfw->qpw);
 	}

 	for_each_online_cpu(cpu) {
 		sfw = &per_cpu(slub_flush, cpu);
 		if (sfw->skip)
 			continue;
-		flush_work(&sfw->work);
+		flush_percpu_work(&sfw->qpw);
 	}

 	mutex_unlock(&flush_lock);
@@ -4005,17 +4006,18 @@ static void flush_rcu_sheaf(struct work_
 	struct slab_sheaf *rcu_free;
 	struct slub_flush_work *sfw;
 	struct kmem_cache *s;
+	int cpu = qpw_get_cpu(w);

-	sfw = container_of(w, struct slub_flush_work, work);
+	sfw = &per_cpu(slub_flush, cpu);
 	s = sfw->s;

-	local_lock(&s->cpu_sheaves->lock);
-	pcs = this_cpu_ptr(s->cpu_sheaves);
+	qpw_lock(&s->cpu_sheaves->lock, cpu);
+	pcs = per_cpu_ptr(s->cpu_sheaves, cpu);

 	rcu_free = pcs->rcu_free;
 	pcs->rcu_free = NULL;

-	local_unlock(&s->cpu_sheaves->lock);
+	qpw_unlock(&s->cpu_sheaves->lock, cpu);

 	if (rcu_free)
 		call_rcu(&rcu_free->rcu_head, rcu_free_sheaf_nobarn);
@@ -4040,14 +4042,14 @@ void flush_rcu_sheaves_on_cache(struct k
 		 * sure the __kfree_rcu_sheaf() finished its call_rcu()
 		 */

-		INIT_WORK(&sfw->work, flush_rcu_sheaf);
+		INIT_QPW(&sfw->qpw, flush_rcu_sheaf, cpu);
 		sfw->s = s;
-		queue_work_on(cpu, flushwq, &sfw->work);
+		queue_percpu_work_on(cpu, flushwq, &sfw->qpw);
 	}

 	for_each_online_cpu(cpu) {
 		sfw = &per_cpu(slub_flush, cpu);
-		flush_work(&sfw->work);
+		flush_percpu_work(&sfw->qpw);
 	}

 	mutex_unlock(&flush_lock);
@@ -4555,11 +4557,11 @@ __pcs_replace_empty_main(struct kmem_cac
 	struct node_barn *barn;
 	bool can_alloc;

-	lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock));
+	qpw_lockdep_assert_held(&s->cpu_sheaves->lock);

 	/* Bootstrap or debug cache, back off */
 	if (unlikely(!cache_has_sheaves(s))) {
-		local_unlock(&s->cpu_sheaves->lock);
+		local_qpw_unlock(&s->cpu_sheaves->lock);
 		return NULL;
 	}
@@ -4570,7 +4572,7 @@ __pcs_replace_empty_main(struct kmem_cac

 	barn = get_barn(s);
 	if (!barn) {
-		local_unlock(&s->cpu_sheaves->lock);
+		local_qpw_unlock(&s->cpu_sheaves->lock);
 		return NULL;
 	}
@@ -4596,7 +4598,7 @@ __pcs_replace_empty_main(struct kmem_cac
 		}
 	}

-	local_unlock(&s->cpu_sheaves->lock);
+	local_qpw_unlock(&s->cpu_sheaves->lock);

 	if (!can_alloc)
 		return NULL;
@@ -4622,7 +4624,7 @@ __pcs_replace_empty_main(struct kmem_cac
 	 * we can reach here only when gfpflags_allow_blocking
 	 * so this must not be an irq
 	 */
-	local_lock(&s->cpu_sheaves->lock);
+	local_qpw_lock(&s->cpu_sheaves->lock);
 	pcs = this_cpu_ptr(s->cpu_sheaves);

 	/*
@@ -4699,7 +4701,7 @@ void *alloc_from_pcs(struct kmem_cache *
 		return NULL;
 	}

-	if (!local_trylock(&s->cpu_sheaves->lock))
+	if (!local_qpw_trylock(&s->cpu_sheaves->lock))
 		return NULL;

 	pcs = this_cpu_ptr(s->cpu_sheaves);
@@ -4719,7 +4721,7 @@ void *alloc_from_pcs(struct kmem_cache *
 		 * the current allocation or previous freeing process.
 		 */
 		if (page_to_nid(virt_to_page(object)) != node) {
-			local_unlock(&s->cpu_sheaves->lock);
+			local_qpw_unlock(&s->cpu_sheaves->lock);
 			stat(s, ALLOC_NODE_MISMATCH);
 			return NULL;
 		}
@@ -4727,7 +4729,7 @@ void *alloc_from_pcs(struct kmem_cache *

 	pcs->main->size--;

-	local_unlock(&s->cpu_sheaves->lock);
+	local_qpw_unlock(&s->cpu_sheaves->lock);

 	stat(s, ALLOC_FASTPATH);

@@ -4744,7 +4746,7 @@ unsigned int alloc_from_pcs_bulk(struct
 	unsigned int batch;

 next_batch:
-	if (!local_trylock(&s->cpu_sheaves->lock))
+	if (!local_qpw_trylock(&s->cpu_sheaves->lock))
 		return allocated;

 	pcs = this_cpu_ptr(s->cpu_sheaves);
@@ -4755,7 +4757,7 @@ next_batch:
 		struct node_barn *barn;

 		if (unlikely(!cache_has_sheaves(s))) {
-			local_unlock(&s->cpu_sheaves->lock);
+			local_qpw_unlock(&s->cpu_sheaves->lock);
 			return allocated;
 		}

@@ -4766,7 +4768,7 @@ next_batch:

 		barn = get_barn(s);
 		if (!barn) {
-			local_unlock(&s->cpu_sheaves->lock);
+			local_qpw_unlock(&s->cpu_sheaves->lock);
 			return allocated;
 		}

@@ -4781,7 +4783,7 @@ next_batch:

 		stat(s, BARN_GET_FAIL);

-		local_unlock(&s->cpu_sheaves->lock);
+		local_qpw_unlock(&s->cpu_sheaves->lock);

 		/*
 		 * Once full sheaves in barn are depleted, let the bulk
@@ -4799,7 +4801,7 @@ do_alloc:
 	main->size -= batch;
 	memcpy(p, main->objects + main->size, batch * sizeof(void *));

-	local_unlock(&s->cpu_sheaves->lock);
+	local_qpw_unlock(&s->cpu_sheaves->lock);

 	stat_add(s, ALLOC_FASTPATH, batch);

@@ -4978,7 +4980,7 @@ kmem_cache_prefill_sheaf(struct kmem_cac
 		return sheaf;
 	}

-	local_lock(&s->cpu_sheaves->lock);
+	local_qpw_lock(&s->cpu_sheaves->lock);
 	pcs = this_cpu_ptr(s->cpu_sheaves);

 	if (pcs->spare) {
@@ -4997,7 +4999,7 @@ kmem_cache_prefill_sheaf(struct kmem_cac
 			stat(s, BARN_GET_FAIL);
 	}

-	local_unlock(&s->cpu_sheaves->lock);
+	local_qpw_unlock(&s->cpu_sheaves->lock);


 	if (!sheaf)
@@ -5041,7 +5043,7 @@ void kmem_cache_return_sheaf(struct kmem
 		return;
 	}

-	local_lock(&s->cpu_sheaves->lock);
+	local_qpw_lock(&s->cpu_sheaves->lock);
 	pcs = this_cpu_ptr(s->cpu_sheaves);
 	barn = get_barn(s);

@@ -5051,7 +5053,7 @@ void kmem_cache_return_sheaf(struct kmem
 		stat(s, SHEAF_RETURN_FAST);
 	}

-	local_unlock(&s->cpu_sheaves->lock);
+	local_qpw_unlock(&s->cpu_sheaves->lock);

 	if (!sheaf)
 		return;
@@ -5581,7 +5583,7 @@ static void __pcs_install_empty_sheaf(st
 		struct slub_percpu_sheaves *pcs, struct slab_sheaf *empty,
 		struct node_barn *barn)
 {
-	lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock));
+	qpw_lockdep_assert_held(&s->cpu_sheaves->lock);

 	/* This is what we expect to find if nobody interrupted us. */
 	if (likely(!pcs->spare)) {
@@ -5618,9 +5620,9 @@ static void __pcs_install_empty_sheaf(st
 /*
 * Replace the full main sheaf with a (at least partially) empty sheaf.
 *
- * Must be called with the cpu_sheaves local lock locked. If successful, returns
- * the pcs pointer and the local lock locked (possibly on a different cpu than
- * initially called). If not successful, returns NULL and the local lock
+ * Must be called with the cpu_sheaves qpw lock locked. If successful, returns
+ * the pcs pointer and the qpw lock locked (possibly on a different cpu than
+ * initially called). If not successful, returns NULL and the qpw lock
 * unlocked.
 */
 static struct slub_percpu_sheaves *
@@ -5632,17 +5634,17 @@ __pcs_replace_full_main(struct kmem_cach
 	bool put_fail;

 restart:
-	lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock));
+	qpw_lockdep_assert_held(&s->cpu_sheaves->lock);

 	/* Bootstrap or debug cache, back off */
 	if (unlikely(!cache_has_sheaves(s))) {
-		local_unlock(&s->cpu_sheaves->lock);
+		local_qpw_unlock(&s->cpu_sheaves->lock);
 		return NULL;
 	}

 	barn = get_barn(s);
 	if (!barn) {
-		local_unlock(&s->cpu_sheaves->lock);
+		local_qpw_unlock(&s->cpu_sheaves->lock);
 		return NULL;
 	}

@@ -5679,7 +5681,7 @@ restart:
 		stat(s, BARN_PUT_FAIL);

 		pcs->spare = NULL;
-		local_unlock(&s->cpu_sheaves->lock);
+		local_qpw_unlock(&s->cpu_sheaves->lock);

 		sheaf_flush_unused(s, to_flush);
 		empty = to_flush;
@@ -5695,7 +5697,7 @@ restart:
 	put_fail = true;

 alloc_empty:
-	local_unlock(&s->cpu_sheaves->lock);
+	local_qpw_unlock(&s->cpu_sheaves->lock);

 	/*
 	 * alloc_empty_sheaf() doesn't support !allow_spin and it's
@@ -5715,7 +5717,7 @@ alloc_empty:
 	if (!sheaf_try_flush_main(s))
 		return NULL;

-	if (!local_trylock(&s->cpu_sheaves->lock))
+	if (!local_qpw_trylock(&s->cpu_sheaves->lock))
 		return NULL;

 	pcs = this_cpu_ptr(s->cpu_sheaves);
@@ -5731,7 +5733,7 @@ alloc_empty:
 	return pcs;

 got_empty:
-	if (!local_trylock(&s->cpu_sheaves->lock)) {
+	if (!local_qpw_trylock(&s->cpu_sheaves->lock)) {
 		barn_put_empty_sheaf(barn, empty);
 		return NULL;
 	}
@@ -5751,7 +5753,7 @@ bool free_to_pcs(struct kmem_cache *s, v
 {
 	struct slub_percpu_sheaves *pcs;

-	if (!local_trylock(&s->cpu_sheaves->lock))
+	if (!local_qpw_trylock(&s->cpu_sheaves->lock))
 		return false;

 	pcs = this_cpu_ptr(s->cpu_sheaves);
@@ -5765,7 +5767,7 @@ bool free_to_pcs(struct kmem_cache *s, v

 	pcs->main->objects[pcs->main->size++] = object;

-	local_unlock(&s->cpu_sheaves->lock);
+	local_qpw_unlock(&s->cpu_sheaves->lock);

 	stat(s, FREE_FASTPATH);

@@ -5855,7 +5857,7 @@ bool __kfree_rcu_sheaf(struct kmem_cache

 	lock_map_acquire_try(&kfree_rcu_sheaf_map);

-	if (!local_trylock(&s->cpu_sheaves->lock))
+	if (!local_qpw_trylock(&s->cpu_sheaves->lock))
 		goto fail;

 	pcs = this_cpu_ptr(s->cpu_sheaves);
@@ -5867,7 +5869,7 @@ bool __kfree_rcu_sheaf(struct kmem_cache

 	/* Bootstrap or debug cache, fall back */
 	if (unlikely(!cache_has_sheaves(s))) {
-		local_unlock(&s->cpu_sheaves->lock);
+		local_qpw_unlock(&s->cpu_sheaves->lock);
 		goto fail;
 	}

@@ -5879,7 +5881,7 @@ bool __kfree_rcu_sheaf(struct kmem_cache

 	barn = get_barn(s);
 	if (!barn) {
-		local_unlock(&s->cpu_sheaves->lock);
+		local_qpw_unlock(&s->cpu_sheaves->lock);
 		goto fail;
 	}

@@ -5890,14 +5892,14 @@ bool __kfree_rcu_sheaf(struct kmem_cache
 		goto do_free;
 	}

-	local_unlock(&s->cpu_sheaves->lock);
+	local_qpw_unlock(&s->cpu_sheaves->lock);

 	empty = alloc_empty_sheaf(s, GFP_NOWAIT);

 	if (!empty)
 		goto fail;

-	if (!local_trylock(&s->cpu_sheaves->lock)) {
+	if (!local_qpw_trylock(&s->cpu_sheaves->lock)) {
 		barn_put_empty_sheaf(barn, empty);
 		goto fail;
 	}
@@ -5934,7 +5936,7 @@ do_free:
 	if (rcu_sheaf)
 		call_rcu(&rcu_sheaf->rcu_head, rcu_free_sheaf);

-	local_unlock(&s->cpu_sheaves->lock);
+	local_qpw_unlock(&s->cpu_sheaves->lock);

 	stat(s, FREE_RCU_SHEAF);
 	lock_map_release(&kfree_rcu_sheaf_map);
@@ -5990,7 +5992,7 @@ next_remote_batch:
 		goto flush_remote;

 next_batch:
-	if (!local_trylock(&s->cpu_sheaves->lock))
+	if (!local_qpw_trylock(&s->cpu_sheaves->lock))
 		goto fallback;

 	pcs = this_cpu_ptr(s->cpu_sheaves);
@@ -6033,7 +6035,7 @@ do_free:
 	memcpy(main->objects + main->size, p, batch * sizeof(void *));
 	main->size += batch;

-	local_unlock(&s->cpu_sheaves->lock);
+	local_qpw_unlock(&s->cpu_sheaves->lock);

 	stat_add(s, FREE_FASTPATH, batch);

@@ -6049,7 +6051,7 @@ do_free:
 	return;

 no_empty:
-	local_unlock(&s->cpu_sheaves->lock);
+	local_qpw_unlock(&s->cpu_sheaves->lock);

 	/*
 	 * if we depleted all empty sheaves in the barn or there are too
@@ -7454,7 +7456,7 @@ static int init_percpu_sheaves(struct km

 	pcs = per_cpu_ptr(s->cpu_sheaves, cpu);

-	local_trylock_init(&pcs->lock);
+	qpw_trylock_init(&pcs->lock);

 	/*
	 * Bootstrap sheaf has zero size so fast-path allocation fails.
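For reviewers unfamiliar with the QPW conversion, the core transformation above — take the lock of an explicit target CPU, reach that CPU's data through per_cpu_ptr() instead of this_cpu_ptr(), then unlock — can be modeled in plain user-space C. This is a minimal sketch under stated assumptions: every name below (qpw_lock_model(), per_cpu_ptr_model(), pcs_flush_all_model()) is a hypothetical stand-in for illustration, not the kernel API.

```c
#include <assert.h>

#define NR_CPUS 4

/* Toy stand-in for struct slub_percpu_sheaves: one slot per CPU. */
struct pcs_model {
	int lock_held;
	int size;
};

struct pcs_model cpu_sheaves_model[NR_CPUS];

/* Stand-in for per_cpu_ptr(): address the data of an explicit cpu, the
 * way the patch replaces this_cpu_ptr() in functions that may run on a
 * different CPU than the one whose sheaves they flush. */
struct pcs_model *per_cpu_ptr_model(int cpu)
{
	return &cpu_sheaves_model[cpu];
}

/* Stand-ins for qpw_lock()/qpw_unlock(): the lock is taken for the
 * target cpu argument, not implicitly for the current CPU. */
void qpw_lock_model(int cpu)
{
	cpu_sheaves_model[cpu].lock_held = 1;
}

void qpw_unlock_model(int cpu)
{
	cpu_sheaves_model[cpu].lock_held = 0;
}

/* Mirrors the shape of pcs_flush_all(s, cpu): lock the target CPU's
 * sheaves, detach the contents under the lock, unlock, and "flush"
 * outside the critical section. Returns how many objects were flushed. */
int pcs_flush_all_model(int cpu)
{
	struct pcs_model *pcs;
	int flushed;

	qpw_lock_model(cpu);
	pcs = per_cpu_ptr_model(cpu);
	flushed = pcs->size;
	pcs->size = 0;
	qpw_unlock_model(cpu);

	return flushed;
}
```

On a non-QPW kernel the cpu argument is always the current CPU, so per_cpu_ptr_model(cpu) degenerates to the this_cpu_ptr() it replaces; on a QPW kernel the same function can be driven from a housekeeping CPU with any target cpu, which is what queue_percpu_work_on() enables.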