From nobody Thu Apr 9 14:49:47 2026
Message-ID: <20260302155105.306721019@redhat.com>
User-Agent: quilt/0.69
Date: Mon, 02 Mar 2026 12:49:50 -0300
From: Marcelo Tosatti
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
 Muchun Song, Andrew Morton, Christoph Lameter, Pekka Enberg,
 David Rientjes, Joonsoo Kim, Vlastimil Babka,
 Hyeonggon Yoo <42.hyeyoo@gmail.com>, Leonardo Bras, Thomas Gleixner,
 Waiman Long, Boqun Feng, Frederic Weisbecker, Marcelo Tosatti
Subject: [PATCH v2 5/5] slub: apply new queue_percpu_work_on() interface
References:
 <20260302154945.143996316@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Make use of the new qpw_{un,}lock*() and queue_percpu_work_on() interface
to improve performance and latency.

For functions that may be scheduled on a different CPU, replace
local_{un,}lock*() with qpw_{un,}lock*(), and schedule_work_on() with
queue_percpu_work_on(). The same applies to flush_work(), which becomes
flush_percpu_work().

This change requires allocating qpw_structs instead of work_structs, and
adding a cpu parameter to a few functions.

It should have no relevant performance impact on non-QPW kernels: for
functions that may be scheduled on a different CPU, the local_*lock's
this_cpu_ptr() becomes per_cpu_ptr(smp_processor_id()).

Signed-off-by: Leonardo Bras
Signed-off-by: Marcelo Tosatti

---
 mm/slub.c |  146 +++++++++++++++++++++++++++++++------------------------------
 1 file changed, 74 insertions(+), 72 deletions(-)

Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c
+++ linux/mm/slub.c
@@ -50,6 +50,7 @@
 #include
 #include
 #include
+#include
 #include

 #include "internal.h"
@@ -129,7 +130,7 @@
  * For debug caches, all allocations are forced to go through a list_lock
  * protected region to serialize against concurrent validation.
  *
- * cpu_sheaves->lock (local_trylock)
+ * cpu_sheaves->lock (qpw_trylock)
  *
  *   This lock protects fastpath operations on the percpu sheaves. On !RT it
  *   only disables preemption and does no atomic operations. As long as the main
@@ -157,7 +158,7 @@
  * Interrupts are disabled as part of list_lock or barn lock operations, or
  * around the slab_lock operation, in order to make the slab allocator safe
  * to use in the context of an irq.
- * Preemption is disabled as part of local_trylock operations.
+ * Preemption is disabled as part of qpw_trylock operations.
  * kmalloc_nolock() and kfree_nolock() are safe in NMI context but see
  * their limitations.
 *
@@ -418,7 +419,7 @@ struct slab_sheaf {
 };

 struct slub_percpu_sheaves {
-	local_trylock_t lock;
+	qpw_trylock_t lock;
 	struct slab_sheaf *main; /* never NULL when unlocked */
 	struct slab_sheaf *spare; /* empty or full, may be NULL */
 	struct slab_sheaf *rcu_free; /* for batching kfree_rcu() */
@@ -480,7 +481,7 @@ static nodemask_t slab_nodes;
 static struct workqueue_struct *flushwq;

 struct slub_flush_work {
-	struct work_struct work;
+	struct qpw_struct qpw;
 	struct kmem_cache *s;
 	bool skip;
 };
@@ -2849,16 +2850,14 @@ static void __kmem_cache_free_bulk(struc
  *
  * Returns how many objects are remaining to be flushed
  */
-static unsigned int __sheaf_flush_main_batch(struct kmem_cache *s)
+static unsigned int __sheaf_flush_main_batch(struct kmem_cache *s, int cpu)
 {
 	struct slub_percpu_sheaves *pcs;
 	unsigned int batch, remaining;
 	void *objects[PCS_BATCH_MAX];
 	struct slab_sheaf *sheaf;

-	lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock));
-
-	pcs = this_cpu_ptr(s->cpu_sheaves);
+	pcs = per_cpu_ptr(s->cpu_sheaves, cpu);
 	sheaf = pcs->main;

 	batch = min(PCS_BATCH_MAX, sheaf->size);
@@ -2868,7 +2867,7 @@ static unsigned int __sheaf_flush_main_b

 	remaining = sheaf->size;

-	local_unlock(&s->cpu_sheaves->lock);
+	qpw_unlock(&s->cpu_sheaves->lock, cpu);

 	__kmem_cache_free_bulk(s, batch, &objects[0]);

@@ -2877,14 +2876,14 @@ static unsigned int __sheaf_flush_main_b
 	return remaining;
 }

-static void sheaf_flush_main(struct kmem_cache *s)
+static void sheaf_flush_main(struct kmem_cache *s, int cpu)
 {
 	unsigned int remaining;

 	do {
-		local_lock(&s->cpu_sheaves->lock);
+		qpw_lock(&s->cpu_sheaves->lock, cpu);

-		remaining = __sheaf_flush_main_batch(s);
+		remaining = __sheaf_flush_main_batch(s, cpu);

 	} while (remaining);
 }
@@ -2898,11 +2897,13 @@ static bool sheaf_try_flush_main(struct
 	bool ret = false;

 	do {
-		if (!local_trylock(&s->cpu_sheaves->lock))
+		if (!local_qpw_trylock(&s->cpu_sheaves->lock))
 			return ret;

 		ret = true;
-		remaining = __sheaf_flush_main_batch(s);
+
+		lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock));
+		remaining = __sheaf_flush_main_batch(s, smp_processor_id());

 	} while (remaining);

@@ -2979,13 +2980,13 @@ static void rcu_free_sheaf_nobarn(struct
  * flushing operations are rare so let's keep it simple and flush to slabs
  * directly, skipping the barn
  */
-static void pcs_flush_all(struct kmem_cache *s)
+static void pcs_flush_all(struct kmem_cache *s, int cpu)
 {
 	struct slub_percpu_sheaves *pcs;
 	struct slab_sheaf *spare, *rcu_free;

-	local_lock(&s->cpu_sheaves->lock);
-	pcs = this_cpu_ptr(s->cpu_sheaves);
+	qpw_lock(&s->cpu_sheaves->lock, cpu);
+	pcs = per_cpu_ptr(s->cpu_sheaves, cpu);

 	spare = pcs->spare;
 	pcs->spare = NULL;
@@ -2993,7 +2994,7 @@ static void pcs_flush_all(struct kmem_ca
 	rcu_free = pcs->rcu_free;
 	pcs->rcu_free = NULL;

-	local_unlock(&s->cpu_sheaves->lock);
+	qpw_unlock(&s->cpu_sheaves->lock, cpu);

 	if (spare) {
 		sheaf_flush_unused(s, spare);
@@ -3003,7 +3004,7 @@ static void pcs_flush_all(struct kmem_ca
 	if (rcu_free)
 		call_rcu(&rcu_free->rcu_head, rcu_free_sheaf_nobarn);

-	sheaf_flush_main(s);
+	sheaf_flush_main(s, cpu);
 }

 static void __pcs_flush_all_cpu(struct kmem_cache *s, unsigned int cpu)
@@ -3953,13 +3954,13 @@ static void flush_cpu_sheaves(struct wor
 {
 	struct kmem_cache *s;
 	struct slub_flush_work *sfw;
+	int cpu = qpw_get_cpu(w);

-	sfw = container_of(w, struct slub_flush_work, work);
-
+	sfw = &per_cpu(slub_flush, cpu);
 	s = sfw->s;

 	if (cache_has_sheaves(s))
-		pcs_flush_all(s);
+		pcs_flush_all(s, cpu);
 }

 static void flush_all_cpus_locked(struct kmem_cache *s)
@@ -3976,17 +3977,17 @@ static void flush_all_cpus_locked(struct
 			sfw->skip = true;
 			continue;
 		}
-		INIT_WORK(&sfw->work, flush_cpu_sheaves);
+		INIT_QPW(&sfw->qpw, flush_cpu_sheaves, cpu);
 		sfw->skip = false;
 		sfw->s = s;
-		queue_work_on(cpu, flushwq, &sfw->work);
+		queue_percpu_work_on(cpu, flushwq, &sfw->qpw);
 	}

 	for_each_online_cpu(cpu) {
 		sfw = &per_cpu(slub_flush, cpu);
 		if (sfw->skip)
 			continue;
-		flush_work(&sfw->work);
+		flush_percpu_work(&sfw->qpw);
 	}

 	mutex_unlock(&flush_lock);
@@ -4005,17 +4006,18 @@ static void flush_rcu_sheaf(struct work_
 	struct slab_sheaf *rcu_free;
 	struct slub_flush_work *sfw;
 	struct kmem_cache *s;
+	int cpu = qpw_get_cpu(w);

-	sfw = container_of(w, struct slub_flush_work, work);
+	sfw = &per_cpu(slub_flush, cpu);
 	s = sfw->s;

-	local_lock(&s->cpu_sheaves->lock);
-	pcs = this_cpu_ptr(s->cpu_sheaves);
+	qpw_lock(&s->cpu_sheaves->lock, cpu);
+	pcs = per_cpu_ptr(s->cpu_sheaves, cpu);

 	rcu_free = pcs->rcu_free;
 	pcs->rcu_free = NULL;

-	local_unlock(&s->cpu_sheaves->lock);
+	qpw_unlock(&s->cpu_sheaves->lock, cpu);

 	if (rcu_free)
 		call_rcu(&rcu_free->rcu_head, rcu_free_sheaf_nobarn);
@@ -4040,14 +4042,14 @@ void flush_rcu_sheaves_on_cache(struct k
 		 * sure the __kfree_rcu_sheaf() finished its call_rcu()
 		 */

-		INIT_WORK(&sfw->work, flush_rcu_sheaf);
+		INIT_QPW(&sfw->qpw, flush_rcu_sheaf, cpu);
 		sfw->s = s;
-		queue_work_on(cpu, flushwq, &sfw->work);
+		queue_percpu_work_on(cpu, flushwq, &sfw->qpw);
 	}

 	for_each_online_cpu(cpu) {
 		sfw = &per_cpu(slub_flush, cpu);
-		flush_work(&sfw->work);
+		flush_percpu_work(&sfw->qpw);
 	}

 	mutex_unlock(&flush_lock);
@@ -4555,11 +4557,11 @@ __pcs_replace_empty_main(struct kmem_cac
 	struct node_barn *barn;
 	bool can_alloc;

-	lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock));
+	qpw_lockdep_assert_held(&s->cpu_sheaves->lock);

 	/* Bootstrap or debug cache, back off */
 	if (unlikely(!cache_has_sheaves(s))) {
-		local_unlock(&s->cpu_sheaves->lock);
+		local_qpw_unlock(&s->cpu_sheaves->lock);
 		return NULL;
 	}
@@ -4570,7 +4572,7 @@ __pcs_replace_empty_main(struct kmem_cac

 	barn = get_barn(s);
 	if (!barn) {
-		local_unlock(&s->cpu_sheaves->lock);
+		local_qpw_unlock(&s->cpu_sheaves->lock);
 		return NULL;
 	}
@@ -4596,7 +4598,7 @@ __pcs_replace_empty_main(struct kmem_cac
 		}
 	}

-	local_unlock(&s->cpu_sheaves->lock);
+	local_qpw_unlock(&s->cpu_sheaves->lock);

 	if (!can_alloc)
 		return NULL;
@@ -4622,7 +4624,7 @@ __pcs_replace_empty_main(struct kmem_cac
 	 * we can reach here only when gfpflags_allow_blocking
 	 * so this must not be an irq
 	 */
-	local_lock(&s->cpu_sheaves->lock);
+	local_qpw_lock(&s->cpu_sheaves->lock);
 	pcs = this_cpu_ptr(s->cpu_sheaves);

 	/*
@@ -4699,7 +4701,7 @@ void *alloc_from_pcs(struct kmem_cache *
 		return NULL;
 	}

-	if (!local_trylock(&s->cpu_sheaves->lock))
+	if (!local_qpw_trylock(&s->cpu_sheaves->lock))
 		return NULL;

 	pcs = this_cpu_ptr(s->cpu_sheaves);
@@ -4719,7 +4721,7 @@ void *alloc_from_pcs(struct kmem_cache *
 		 * the current allocation or previous freeing process.
 		 */
 		if (page_to_nid(virt_to_page(object)) != node) {
-			local_unlock(&s->cpu_sheaves->lock);
+			local_qpw_unlock(&s->cpu_sheaves->lock);
 			stat(s, ALLOC_NODE_MISMATCH);
 			return NULL;
 		}
@@ -4727,7 +4729,7 @@ void *alloc_from_pcs(struct kmem_cache *

 	pcs->main->size--;

-	local_unlock(&s->cpu_sheaves->lock);
+	local_qpw_unlock(&s->cpu_sheaves->lock);

 	stat(s, ALLOC_FASTPATH);

@@ -4744,7 +4746,7 @@ unsigned int alloc_from_pcs_bulk(struct
 	unsigned int batch;

 next_batch:
-	if (!local_trylock(&s->cpu_sheaves->lock))
+	if (!local_qpw_trylock(&s->cpu_sheaves->lock))
 		return allocated;

 	pcs = this_cpu_ptr(s->cpu_sheaves);
@@ -4755,7 +4757,7 @@ next_batch:
 		struct node_barn *barn;

 		if (unlikely(!cache_has_sheaves(s))) {
-			local_unlock(&s->cpu_sheaves->lock);
+			local_qpw_unlock(&s->cpu_sheaves->lock);
 			return allocated;
 		}

@@ -4766,7 +4768,7 @@ next_batch:

 		barn = get_barn(s);
 		if (!barn) {
-			local_unlock(&s->cpu_sheaves->lock);
+			local_qpw_unlock(&s->cpu_sheaves->lock);
 			return allocated;
 		}

@@ -4781,7 +4783,7 @@ next_batch:

 		stat(s, BARN_GET_FAIL);

-		local_unlock(&s->cpu_sheaves->lock);
+		local_qpw_unlock(&s->cpu_sheaves->lock);

 		/*
 		 * Once full sheaves in barn are depleted, let the bulk
@@ -4799,7 +4801,7 @@ do_alloc:
 	main->size -= batch;
 	memcpy(p, main->objects + main->size, batch * sizeof(void *));

-	local_unlock(&s->cpu_sheaves->lock);
+	local_qpw_unlock(&s->cpu_sheaves->lock);

 	stat_add(s, ALLOC_FASTPATH, batch);

@@ -4978,7 +4980,7 @@ kmem_cache_prefill_sheaf(struct kmem_cac
 		return sheaf;
 	}

-	local_lock(&s->cpu_sheaves->lock);
+	local_qpw_lock(&s->cpu_sheaves->lock);
 	pcs = this_cpu_ptr(s->cpu_sheaves);

 	if (pcs->spare) {
@@ -4997,7 +4999,7 @@ kmem_cache_prefill_sheaf(struct kmem_cac
 			stat(s, BARN_GET_FAIL);
 	}

-	local_unlock(&s->cpu_sheaves->lock);
+	local_qpw_unlock(&s->cpu_sheaves->lock);


 	if (!sheaf)
@@ -5041,7 +5043,7 @@ void kmem_cache_return_sheaf(struct kmem
 		return;
 	}

-	local_lock(&s->cpu_sheaves->lock);
+	local_qpw_lock(&s->cpu_sheaves->lock);
 	pcs = this_cpu_ptr(s->cpu_sheaves);
 	barn = get_barn(s);

@@ -5051,7 +5053,7 @@ void kmem_cache_return_sheaf(struct kmem
 		stat(s, SHEAF_RETURN_FAST);
 	}

-	local_unlock(&s->cpu_sheaves->lock);
+	local_qpw_unlock(&s->cpu_sheaves->lock);

 	if (!sheaf)
 		return;
@@ -5581,7 +5583,7 @@ static void __pcs_install_empty_sheaf(st
 		struct slub_percpu_sheaves *pcs, struct slab_sheaf *empty,
 		struct node_barn *barn)
 {
-	lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock));
+	qpw_lockdep_assert_held(&s->cpu_sheaves->lock);

 	/* This is what we expect to find if nobody interrupted us. */
 	if (likely(!pcs->spare)) {
@@ -5618,9 +5620,9 @@ static void __pcs_install_empty_sheaf(st
 /*
 * Replace the full main sheaf with a (at least partially) empty sheaf.
 *
- * Must be called with the cpu_sheaves local lock locked. If successful, returns
- * the pcs pointer and the local lock locked (possibly on a different cpu than
- * initially called). If not successful, returns NULL and the local lock
+ * Must be called with the cpu_sheaves qpw lock locked. If successful, returns
+ * the pcs pointer and the qpw lock locked (possibly on a different cpu than
+ * initially called). If not successful, returns NULL and the qpw lock
 * unlocked.
 */
 static struct slub_percpu_sheaves *
@@ -5632,17 +5634,17 @@ __pcs_replace_full_main(struct kmem_cach
 	bool put_fail;

 restart:
-	lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock));
+	qpw_lockdep_assert_held(&s->cpu_sheaves->lock);

 	/* Bootstrap or debug cache, back off */
 	if (unlikely(!cache_has_sheaves(s))) {
-		local_unlock(&s->cpu_sheaves->lock);
+		local_qpw_unlock(&s->cpu_sheaves->lock);
 		return NULL;
 	}

 	barn = get_barn(s);
 	if (!barn) {
-		local_unlock(&s->cpu_sheaves->lock);
+		local_qpw_unlock(&s->cpu_sheaves->lock);
 		return NULL;
 	}

@@ -5679,7 +5681,7 @@ restart:
 		stat(s, BARN_PUT_FAIL);

 		pcs->spare = NULL;
-		local_unlock(&s->cpu_sheaves->lock);
+		local_qpw_unlock(&s->cpu_sheaves->lock);

 		sheaf_flush_unused(s, to_flush);
 		empty = to_flush;
@@ -5695,7 +5697,7 @@ restart:
 	put_fail = true;

 alloc_empty:
-	local_unlock(&s->cpu_sheaves->lock);
+	local_qpw_unlock(&s->cpu_sheaves->lock);

 	/*
 	 * alloc_empty_sheaf() doesn't support !allow_spin and it's
@@ -5715,7 +5717,7 @@ alloc_empty:
 	if (!sheaf_try_flush_main(s))
 		return NULL;

-	if (!local_trylock(&s->cpu_sheaves->lock))
+	if (!local_qpw_trylock(&s->cpu_sheaves->lock))
 		return NULL;

 	pcs = this_cpu_ptr(s->cpu_sheaves);
@@ -5731,7 +5733,7 @@ alloc_empty:
 	return pcs;

 got_empty:
-	if (!local_trylock(&s->cpu_sheaves->lock)) {
+	if (!local_qpw_trylock(&s->cpu_sheaves->lock)) {
 		barn_put_empty_sheaf(barn, empty);
 		return NULL;
 	}
@@ -5751,7 +5753,7 @@ bool free_to_pcs(struct kmem_cache *s, v
 {
 	struct slub_percpu_sheaves *pcs;

-	if (!local_trylock(&s->cpu_sheaves->lock))
+	if (!local_qpw_trylock(&s->cpu_sheaves->lock))
 		return false;

 	pcs = this_cpu_ptr(s->cpu_sheaves);
@@ -5765,7 +5767,7 @@ bool free_to_pcs(struct kmem_cache *s, v

 	pcs->main->objects[pcs->main->size++] = object;

-	local_unlock(&s->cpu_sheaves->lock);
+	local_qpw_unlock(&s->cpu_sheaves->lock);

 	stat(s, FREE_FASTPATH);

@@ -5855,7 +5857,7 @@ bool __kfree_rcu_sheaf(struct kmem_cache

 	lock_map_acquire_try(&kfree_rcu_sheaf_map);

-	if (!local_trylock(&s->cpu_sheaves->lock))
+	if (!local_qpw_trylock(&s->cpu_sheaves->lock))
 		goto fail;

 	pcs = this_cpu_ptr(s->cpu_sheaves);
@@ -5867,7 +5869,7 @@ bool __kfree_rcu_sheaf(struct kmem_cache

 	/* Bootstrap or debug cache, fall back */
 	if (unlikely(!cache_has_sheaves(s))) {
-		local_unlock(&s->cpu_sheaves->lock);
+		local_qpw_unlock(&s->cpu_sheaves->lock);
 		goto fail;
 	}

@@ -5879,7 +5881,7 @@ bool __kfree_rcu_sheaf(struct kmem_cache

 	barn = get_barn(s);
 	if (!barn) {
-		local_unlock(&s->cpu_sheaves->lock);
+		local_qpw_unlock(&s->cpu_sheaves->lock);
 		goto fail;
 	}

@@ -5890,14 +5892,14 @@ bool __kfree_rcu_sheaf(struct kmem_cache
 		goto do_free;
 	}

-	local_unlock(&s->cpu_sheaves->lock);
+	local_qpw_unlock(&s->cpu_sheaves->lock);

 	empty = alloc_empty_sheaf(s, GFP_NOWAIT);

 	if (!empty)
 		goto fail;

-	if (!local_trylock(&s->cpu_sheaves->lock)) {
+	if (!local_qpw_trylock(&s->cpu_sheaves->lock)) {
 		barn_put_empty_sheaf(barn, empty);
 		goto fail;
 	}
@@ -5934,7 +5936,7 @@ do_free:
 	if (rcu_sheaf)
 		call_rcu(&rcu_sheaf->rcu_head, rcu_free_sheaf);

-	local_unlock(&s->cpu_sheaves->lock);
+	local_qpw_unlock(&s->cpu_sheaves->lock);

 	stat(s, FREE_RCU_SHEAF);
 	lock_map_release(&kfree_rcu_sheaf_map);
@@ -5990,7 +5992,7 @@ next_remote_batch:
 		goto flush_remote;

 next_batch:
-	if (!local_trylock(&s->cpu_sheaves->lock))
+	if (!local_qpw_trylock(&s->cpu_sheaves->lock))
 		goto fallback;

 	pcs = this_cpu_ptr(s->cpu_sheaves);
@@ -6033,7 +6035,7 @@ do_free:
 	memcpy(main->objects + main->size, p, batch * sizeof(void *));
 	main->size += batch;

-	local_unlock(&s->cpu_sheaves->lock);
+	local_qpw_unlock(&s->cpu_sheaves->lock);

 	stat_add(s, FREE_FASTPATH, batch);

@@ -6049,7 +6051,7 @@ do_free:
 	return;

 no_empty:
-	local_unlock(&s->cpu_sheaves->lock);
+	local_qpw_unlock(&s->cpu_sheaves->lock);

 	/*
 	 * if we depleted all empty sheaves in the barn or there are too
@@ -7454,7 +7456,7 @@ static int init_percpu_sheaves(struct km

 	pcs = per_cpu_ptr(s->cpu_sheaves, cpu);

-	local_trylock_init(&pcs->lock);
+	qpw_trylock_init(&pcs->lock);

 	/*
	 * Bootstrap sheaf has zero size so fast-path allocation fails.
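For reviewers unfamiliar with the QPW conversion, the core transformation above — take the lock of an explicit target CPU, reach that CPU's data through per_cpu_ptr() instead of this_cpu_ptr(), then unlock — can be modeled in plain user-space C. This is a minimal sketch under stated assumptions: every name below (qpw_lock_model(), per_cpu_ptr_model(), pcs_flush_all_model()) is a hypothetical stand-in for illustration, not the kernel API.

```c
#include <assert.h>

#define NR_CPUS 4

/* Toy stand-in for struct slub_percpu_sheaves: one slot per CPU. */
struct pcs_model {
	int lock_held;
	int size;
};

struct pcs_model cpu_sheaves_model[NR_CPUS];

/* Stand-in for per_cpu_ptr(): address the data of an explicit cpu, the
 * way the patch replaces this_cpu_ptr() in functions that may run on a
 * different CPU than the one whose sheaves they flush. */
struct pcs_model *per_cpu_ptr_model(int cpu)
{
	return &cpu_sheaves_model[cpu];
}

/* Stand-ins for qpw_lock()/qpw_unlock(): the lock is taken for the
 * target cpu argument, not implicitly for the current CPU. */
void qpw_lock_model(int cpu)
{
	cpu_sheaves_model[cpu].lock_held = 1;
}

void qpw_unlock_model(int cpu)
{
	cpu_sheaves_model[cpu].lock_held = 0;
}

/* Mirrors the shape of pcs_flush_all(s, cpu): lock the target CPU's
 * sheaves, detach the contents under the lock, unlock, and "flush"
 * outside the critical section. Returns how many objects were flushed. */
int pcs_flush_all_model(int cpu)
{
	struct pcs_model *pcs;
	int flushed;

	qpw_lock_model(cpu);
	pcs = per_cpu_ptr_model(cpu);
	flushed = pcs->size;
	pcs->size = 0;
	qpw_unlock_model(cpu);

	return flushed;
}
```

On a non-QPW kernel the cpu argument is always the current CPU, so per_cpu_ptr_model(cpu) degenerates to the this_cpu_ptr() it replaces; on a QPW kernel the same function can be driven from a housekeeping CPU with any target cpu, which is what queue_percpu_work_on() enables.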