From: Vlastimil Babka
Date: Thu, 23 Oct 2025 15:52:29 +0200
Subject: [PATCH RFC 07/19] slab: make percpu sheaves compatible with kmalloc_nolock()/kfree_nolock()
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Message-Id: <20251023-sheaves-for-all-v1-7-6ffa2c9941c0@suse.cz>
References: <20251023-sheaves-for-all-v1-0-6ffa2c9941c0@suse.cz>
In-Reply-To: <20251023-sheaves-for-all-v1-0-6ffa2c9941c0@suse.cz>
To: Andrew Morton, Christoph Lameter, David Rientjes, Roman Gushchin,
 Harry Yoo
Cc: Uladzislau Rezki, "Liam R. Howlett", Suren Baghdasaryan,
 Sebastian Andrzej Siewior, Alexei Starovoitov, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, linux-rt-devel@lists.linux.dev,
 bpf@vger.kernel.org, kasan-dev@googlegroups.com, Vlastimil Babka
X-Mailer: b4 0.14.3

Before we enable percpu sheaves for kmalloc caches, we need to make sure
kmalloc_nolock() and kfree_nolock() will continue working properly and
not spin when not allowed to.
Percpu sheaves themselves use local_trylock(), so they are already
compatible. We just need to be careful with the barn->lock spin_lock.
Pass a new allow_spin parameter where necessary to use
spin_trylock_irqsave().

In kmalloc_nolock_noprof() we can now attempt alloc_from_pcs() safely;
for now it will always fail, until we enable sheaves for kmalloc caches
next. Similarly, in kfree_nolock() we can attempt free_to_pcs().

Signed-off-by: Vlastimil Babka
---
 mm/slub.c | 74 ++++++++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 52 insertions(+), 22 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index ecb10ed5acfe..5d0b2cf66520 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2876,7 +2876,8 @@ static void pcs_destroy(struct kmem_cache *s)
 	s->cpu_sheaves = NULL;
 }
 
-static struct slab_sheaf *barn_get_empty_sheaf(struct node_barn *barn)
+static struct slab_sheaf *barn_get_empty_sheaf(struct node_barn *barn,
+					       bool allow_spin)
 {
 	struct slab_sheaf *empty = NULL;
 	unsigned long flags;
@@ -2884,7 +2885,10 @@ static struct slab_sheaf *barn_get_empty_sheaf(struct node_barn *barn)
 	if (!data_race(barn->nr_empty))
 		return NULL;
 
-	spin_lock_irqsave(&barn->lock, flags);
+	if (likely(allow_spin))
+		spin_lock_irqsave(&barn->lock, flags);
+	else if (!spin_trylock_irqsave(&barn->lock, flags))
+		return NULL;
 
 	if (likely(barn->nr_empty)) {
 		empty = list_first_entry(&barn->sheaves_empty,
@@ -2961,7 +2965,8 @@ static struct slab_sheaf *barn_get_full_or_empty_sheaf(struct node_barn *barn)
  * change.
  */
 static struct slab_sheaf *
-barn_replace_empty_sheaf(struct node_barn *barn, struct slab_sheaf *empty)
+barn_replace_empty_sheaf(struct node_barn *barn, struct slab_sheaf *empty,
+			 bool allow_spin)
 {
 	struct slab_sheaf *full = NULL;
 	unsigned long flags;
@@ -2969,7 +2974,10 @@ barn_replace_empty_sheaf(struct node_barn *barn, struct slab_sheaf *empty)
 	if (!data_race(barn->nr_full))
 		return NULL;
 
-	spin_lock_irqsave(&barn->lock, flags);
+	if (likely(allow_spin))
+		spin_lock_irqsave(&barn->lock, flags);
+	else if (!spin_trylock_irqsave(&barn->lock, flags))
+		return NULL;
 
 	if (likely(barn->nr_full)) {
 		full = list_first_entry(&barn->sheaves_full, struct slab_sheaf,
@@ -2990,7 +2998,8 @@ barn_replace_empty_sheaf(struct node_barn *barn, struct slab_sheaf *empty)
  * barn. But if there are too many full sheaves, reject this with -E2BIG.
  */
 static struct slab_sheaf *
-barn_replace_full_sheaf(struct node_barn *barn, struct slab_sheaf *full)
+barn_replace_full_sheaf(struct node_barn *barn, struct slab_sheaf *full,
+			bool allow_spin)
 {
 	struct slab_sheaf *empty;
 	unsigned long flags;
@@ -3001,7 +3010,10 @@ barn_replace_full_sheaf(struct node_barn *barn, struct slab_sheaf *full)
 	if (!data_race(barn->nr_empty))
 		return ERR_PTR(-ENOMEM);
 
-	spin_lock_irqsave(&barn->lock, flags);
+	if (likely(allow_spin))
+		spin_lock_irqsave(&barn->lock, flags);
+	else if (!spin_trylock_irqsave(&barn->lock, flags))
+		return ERR_PTR(-EBUSY);
 
 	if (likely(barn->nr_empty)) {
 		empty = list_first_entry(&barn->sheaves_empty, struct slab_sheaf,
@@ -5000,7 +5012,8 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
 		return NULL;
 	}
 
-	full = barn_replace_empty_sheaf(barn, pcs->main);
+	full = barn_replace_empty_sheaf(barn, pcs->main,
+					gfpflags_allow_spinning(gfp));
 
 	if (full) {
 		stat(s, BARN_GET);
@@ -5017,7 +5030,7 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
 			empty = pcs->spare;
 			pcs->spare = NULL;
 		} else {
-			empty = barn_get_empty_sheaf(barn);
+			empty = barn_get_empty_sheaf(barn, true);
 		}
 	}
 
@@ -5154,7 +5167,8 @@ void *alloc_from_pcs(struct kmem_cache *s, gfp_t gfp, int node)
 }
 
 static __fastpath_inline
-unsigned int alloc_from_pcs_bulk(struct kmem_cache *s, size_t size, void **p)
+unsigned int alloc_from_pcs_bulk(struct kmem_cache *s, gfp_t gfp, size_t size,
+				 void **p)
 {
 	struct slub_percpu_sheaves *pcs;
 	struct slab_sheaf *main;
@@ -5188,7 +5202,8 @@ unsigned int alloc_from_pcs_bulk(struct kmem_cache *s, size_t size, void **p)
 		return allocated;
 	}
 
-	full = barn_replace_empty_sheaf(barn, pcs->main);
+	full = barn_replace_empty_sheaf(barn, pcs->main,
+					gfpflags_allow_spinning(gfp));
 
 	if (full) {
 		stat(s, BARN_GET);
@@ -5693,7 +5708,7 @@ void *kmalloc_nolock_noprof(size_t size, gfp_t gfp_flags, int node)
 	gfp_t alloc_gfp = __GFP_NOWARN | __GFP_NOMEMALLOC | gfp_flags;
 	struct kmem_cache *s;
 	bool can_retry = true;
-	void *ret = ERR_PTR(-EBUSY);
+	void *ret;
 
 	VM_WARN_ON_ONCE(gfp_flags & ~(__GFP_ACCOUNT | __GFP_ZERO |
 				      __GFP_NO_OBJ_EXT));
@@ -5720,6 +5735,13 @@ void *kmalloc_nolock_noprof(size_t size, gfp_t gfp_flags, int node)
 		 */
 		return NULL;
 
+	ret = alloc_from_pcs(s, alloc_gfp, node);
+
+	if (ret)
+		goto success;
+
+	ret = ERR_PTR(-EBUSY);
+
 	/*
 	 * Do not call slab_alloc_node(), since trylock mode isn't
 	 * compatible with slab_pre_alloc_hook/should_failslab and
@@ -5756,6 +5778,7 @@ void *kmalloc_nolock_noprof(size_t size, gfp_t gfp_flags, int node)
 		ret = NULL;
 	}
 
+success:
 	maybe_wipe_obj_freeptr(s, ret);
 	slab_post_alloc_hook(s, NULL, alloc_gfp, 1, &ret,
 			     slab_want_init_on_alloc(alloc_gfp, s), size);
@@ -6047,7 +6070,8 @@ static void __pcs_install_empty_sheaf(struct kmem_cache *s,
  * unlocked.
  */
 static struct slub_percpu_sheaves *
-__pcs_replace_full_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs)
+__pcs_replace_full_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
+			bool allow_spin)
 {
 	struct slab_sheaf *empty;
 	struct node_barn *barn;
@@ -6071,7 +6095,7 @@ __pcs_replace_full_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs)
 	put_fail = false;
 
 	if (!pcs->spare) {
-		empty = barn_get_empty_sheaf(barn);
+		empty = barn_get_empty_sheaf(barn, allow_spin);
 		if (empty) {
 			pcs->spare = pcs->main;
 			pcs->main = empty;
@@ -6085,7 +6109,7 @@ __pcs_replace_full_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs)
 		return pcs;
 	}
 
-	empty = barn_replace_full_sheaf(barn, pcs->main);
+	empty = barn_replace_full_sheaf(barn, pcs->main, allow_spin);
 
 	if (!IS_ERR(empty)) {
 		stat(s, BARN_PUT);
@@ -6093,6 +6117,11 @@ __pcs_replace_full_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs)
 		return pcs;
 	}
 
+	if (!allow_spin) {
+		local_unlock(&s->cpu_sheaves->lock);
+		return NULL;
+	}
+
 	if (PTR_ERR(empty) == -E2BIG) {
 		/* Since we got here, spare exists and is full */
 		struct slab_sheaf *to_flush = pcs->spare;
@@ -6160,7 +6189,7 @@ __pcs_replace_full_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs)
  * The object is expected to have passed slab_free_hook() already.
  */
 static __fastpath_inline
-bool free_to_pcs(struct kmem_cache *s, void *object)
+bool free_to_pcs(struct kmem_cache *s, void *object, bool allow_spin)
 {
 	struct slub_percpu_sheaves *pcs;
 
@@ -6171,7 +6200,7 @@ bool free_to_pcs(struct kmem_cache *s, void *object)
 
 	if (unlikely(pcs->main->size == s->sheaf_capacity)) {
 
-		pcs = __pcs_replace_full_main(s, pcs);
+		pcs = __pcs_replace_full_main(s, pcs, allow_spin);
 		if (unlikely(!pcs))
 			return false;
 	}
@@ -6278,7 +6307,7 @@ bool __kfree_rcu_sheaf(struct kmem_cache *s, void *obj)
 		goto fail;
 	}
 
-	empty = barn_get_empty_sheaf(barn);
+	empty = barn_get_empty_sheaf(barn, true);
 
 	if (empty) {
 		pcs->rcu_free = empty;
@@ -6398,7 +6427,7 @@ static void free_to_pcs_bulk(struct kmem_cache *s, size_t size, void **p)
 		goto no_empty;
 
 	if (!pcs->spare) {
-		empty = barn_get_empty_sheaf(barn);
+		empty = barn_get_empty_sheaf(barn, true);
 		if (!empty)
 			goto no_empty;
 
@@ -6412,7 +6441,7 @@ static void free_to_pcs_bulk(struct kmem_cache *s, size_t size, void **p)
 		goto do_free;
 	}
 
-	empty = barn_replace_full_sheaf(barn, pcs->main);
+	empty = barn_replace_full_sheaf(barn, pcs->main, true);
 	if (IS_ERR(empty)) {
 		stat(s, BARN_PUT_FAIL);
 		goto no_empty;
@@ -6659,7 +6688,7 @@ void slab_free(struct kmem_cache *s, struct slab *slab, void *object,
 
 	if (likely(!IS_ENABLED(CONFIG_NUMA) || slab_nid(slab) == numa_mem_id()) &&
 	    likely(!slab_test_pfmemalloc(slab))) {
-		if (likely(free_to_pcs(s, object)))
+		if (likely(free_to_pcs(s, object, true)))
 			return;
 	}
 
@@ -6922,7 +6951,8 @@ void kfree_nolock(const void *object)
 	 * since kasan quarantine takes locks and not supported from NMI.
 	 */
 	kasan_slab_free(s, x, false, false, /* skip quarantine */true);
-	do_slab_free(s, slab, x, x, 0, _RET_IP_);
+	if (!free_to_pcs(s, x, false))
+		do_slab_free(s, slab, x, x, 0, _RET_IP_);
 }
 EXPORT_SYMBOL_GPL(kfree_nolock);
 
@@ -7465,7 +7495,7 @@ int kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags, size_t size,
 		size--;
 	}
 
-	i = alloc_from_pcs_bulk(s, size, p);
+	i = alloc_from_pcs_bulk(s, flags, size, p);
 
 	if (i < size) {
 		/*

-- 
2.51.1