From nobody Fri Oct 3 18:10:03 2025
From: Vlastimil Babka <vbabka@suse.cz>
Date: Wed, 27 Aug 2025 10:26:33 +0200
Subject: [PATCH v6 01/10] slab: simplify init_kmem_cache_nodes() error handling
Message-Id: <20250827-slub-percpu-caches-v6-1-f0f775a3f73f@suse.cz>
References: <20250827-slub-percpu-caches-v6-0-f0f775a3f73f@suse.cz>
In-Reply-To: <20250827-slub-percpu-caches-v6-0-f0f775a3f73f@suse.cz>
To: Suren Baghdasaryan, "Liam R. Howlett", Christoph Lameter, David Rientjes
Cc: Roman Gushchin, Harry Yoo, Uladzislau Rezki, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
 maple-tree@lists.infradead.org, vbabka@suse.cz

We don't need to call free_kmem_cache_nodes() immediately when failing
to allocate a kmem_cache_node, because when we return 0,
do_kmem_cache_create() calls __kmem_cache_release(), which also
performs free_kmem_cache_nodes().
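
The pattern is easiest to see in a minimal userspace model (a sketch
only; all names below are hypothetical stand-ins for
init_kmem_cache_nodes(), do_kmem_cache_create() and
__kmem_cache_release(), not kernel code): cleanup of partially
initialized nodes happens in exactly one place, the caller's error
path.

	#include <stdio.h>
	#include <stdlib.h>

	struct cache { void *node[4]; };

	/* stand-in for free_kmem_cache_nodes(); free(NULL) is a no-op */
	static void free_nodes(struct cache *s)
	{
		for (int i = 0; i < 4; i++) {
			free(s->node[i]);
			s->node[i] = NULL;
		}
	}

	/*
	 * stand-in for init_kmem_cache_nodes(); returns 1 on success, 0 on
	 * failure, and may leave partially initialized state behind
	 */
	static int init_nodes(struct cache *s, int fail_at)
	{
		for (int i = 0; i < 4; i++) {
			if (i == fail_at)
				return 0;	/* no cleanup here anymore */
			s->node[i] = malloc(16);
			if (!s->node[i])
				return 0;
		}
		return 1;
	}

	/* stand-in for do_kmem_cache_create() + __kmem_cache_release() */
	static int create(struct cache *s, int fail_at)
	{
		if (!init_nodes(s, fail_at)) {
			free_nodes(s);	/* the single cleanup point */
			return -1;
		}
		return 0;
	}

	int main(void)
	{
		struct cache s = { { NULL } };

		printf("create: %d\n", create(&s, 2)); /* fails at node 2, leaks nothing */
		return 0;
	}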
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 30003763d224c2704a4b93082b8b47af12dcffc5..9f671ec76131c4b0b28d5d568aa45842b5efb6d4 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5669,10 +5669,8 @@ static int init_kmem_cache_nodes(struct kmem_cache *s)
 
 		n = kmem_cache_alloc_node(kmem_cache_node, GFP_KERNEL, node);
 
-		if (!n) {
-			free_kmem_cache_nodes(s);
+		if (!n)
 			return 0;
-		}
 
 		init_kmem_cache_node(n);
 		s->node[node] = n;
-- 
2.51.0
From nobody Fri Oct 3 18:10:03 2025
From: Vlastimil Babka <vbabka@suse.cz>
Date: Wed, 27 Aug 2025 10:26:34 +0200
Subject: [PATCH v6 02/10] slab: add opt-in caching layer of percpu sheaves
Message-Id: <20250827-slub-percpu-caches-v6-2-f0f775a3f73f@suse.cz>
References: <20250827-slub-percpu-caches-v6-0-f0f775a3f73f@suse.cz>
In-Reply-To: <20250827-slub-percpu-caches-v6-0-f0f775a3f73f@suse.cz>
To: Suren Baghdasaryan, "Liam R. Howlett", Christoph Lameter, David Rientjes
Howlett" , Christoph Lameter , David Rientjes Cc: Roman Gushchin , Harry Yoo , Uladzislau Rezki , linux-mm@kvack.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, maple-tree@lists.infradead.org, vbabka@suse.cz X-Mailer: b4 0.14.2 X-Spam-Level: X-Spamd-Result: default: False [-8.30 / 50.00]; REPLY(-4.00)[]; BAYES_HAM(-3.00)[100.00%]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_SHORT(-0.20)[-0.999]; MIME_GOOD(-0.10)[text/plain]; FUZZY_RATELIMITED(0.00)[rspamd.com]; RCVD_VIA_SMTP_AUTH(0.00)[]; MIME_TRACE(0.00)[0:+]; TO_DN_SOME(0.00)[]; RCPT_COUNT_TWELVE(0.00)[12]; ARC_NA(0.00)[]; RCVD_TLS_ALL(0.00)[]; FREEMAIL_ENVRCPT(0.00)[gmail.com]; R_RATELIMIT(0.00)[to_ip_from(RLwn5r54y1cp81no5tmbbew5oc)]; FROM_HAS_DN(0.00)[]; FREEMAIL_CC(0.00)[linux.dev,oracle.com,gmail.com,kvack.org,vger.kernel.org,lists.infradead.org,suse.cz]; MID_RHS_MATCH_FROM(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; RCVD_COUNT_TWO(0.00)[2]; TO_MATCH_ENVRCPT_ALL(0.00)[]; DKIM_SIGNED(0.00)[suse.cz:s=susede2_rsa,suse.cz:s=susede2_ed25519]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.cz:email,suse.cz:mid,imap1.dmz-prg2.suse.org:helo] X-Spam-Flag: NO X-Spam-Score: -8.30 Specifying a non-zero value for a new struct kmem_cache_args field sheaf_capacity will setup a caching layer of percpu arrays called sheaves of given capacity for the created cache. Allocations from the cache will allocate via the percpu sheaves (main or spare) as long as they have no NUMA node preference. Frees will also put the object back into one of the sheaves. When both percpu sheaves are found empty during an allocation, an empty sheaf may be replaced with a full one from the per-node barn. If none are available and the allocation is allowed to block, an empty sheaf is refilled from slab(s) by an internal bulk alloc operation. When both percpu sheaves are full during freeing, the barn can replace a full one with an empty one, unless over a full sheaves limit. In that case a sheaf is flushed to slab(s) by an internal bulk free operation. Flushing sheaves and barns is also wired to the existing cpu flushing and cache shrinking operations. The sheaves do not distinguish NUMA locality of the cached objects. If an allocation is requested with kmem_cache_alloc_node() (or a mempolicy with strict_numa mode enabled) with a specific node (not NUMA_NO_NODE), the sheaves are bypassed. The bulk operations exposed to slab users also try to utilize the sheaves as long as the necessary (full or empty) sheaves are available on the cpu or in the barn. Once depleted, they will fallback to bulk alloc/free to slabs directly to avoid double copying. The sheaf_capacity value is exported in sysfs for observability. Sysfs CONFIG_SLUB_STATS counters alloc_cpu_sheaf and free_cpu_sheaf count objects allocated or freed using the sheaves (and thus not counting towards the other alloc/free path counters). Counters sheaf_refill and sheaf_flush count objects filled or flushed from or to slab pages, and can be used to assess how effective the caching is. The refill and flush operations will also count towards the usual alloc_fastpath/slowpath, free_fastpath/slowpath and other counters for the backing slabs. For barn operations, barn_get and barn_put count how many full sheaves were get from or put to the barn, the _fail variants count how many such requests could not be satisfied mainly because the barn was either empty or full. While the barn also holds empty sheaves to make some operations easier, these are not as critical to mandate own counters. Finally, there are sheaf_alloc/sheaf_free counters. 
Access to the percpu sheaves is protected by local_trylock() when
potential callers include irq context, and local_lock() otherwise (such
as when we already know the gfp flags allow blocking). The trylock
failures should be rare and we can easily fall back. Each per-NUMA-node
barn has a spin_lock.

When slub_debug is enabled for a cache with sheaf_capacity also
specified, the latter is ignored so that allocations and frees reach
the slow path where debugging hooks are processed. Similarly, we ignore
it with CONFIG_SLUB_TINY, which prefers low memory usage to
performance.
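
In condensed form, the trylock pattern on the free fast path looks like
this (simplified from free_to_pcs() in the diff below; try_free_fast is
a hypothetical name, and unlike this sketch the real function tries to
replace a full main sheaf instead of bailing out):

	static bool try_free_fast(struct kmem_cache *s, void *object)
	{
		struct slub_percpu_sheaves *pcs;

		/* callers may run in irq context, so only trylock */
		if (!local_trylock(&s->cpu_sheaves->lock))
			return false;	/* rare; caller takes the slow path */

		pcs = this_cpu_ptr(s->cpu_sheaves);
		if (pcs->main->size < s->sheaf_capacity) {
			pcs->main->objects[pcs->main->size++] = object;
			local_unlock(&s->cpu_sheaves->lock);
			return true;
		}

		local_unlock(&s->cpu_sheaves->lock);
		return false;
	}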
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/slab.h |   31 ++
 mm/slab.h            |    2 +
 mm/slab_common.c     |    5 +-
 mm/slub.c            | 1159 +++++++++++++++++++++++++++++++++++++++++++++++---
 4 files changed, 1135 insertions(+), 62 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index d5a8ab98035cf3e3d9043e3b038e1bebeff05b52..49acbcdc6696fd120c402adf757b3f41660ad50a 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -335,6 +335,37 @@ struct kmem_cache_args {
 	 * %NULL means no constructor.
 	 */
 	void (*ctor)(void *);
+	/**
+	 * @sheaf_capacity: Enable sheaves of given capacity for the cache.
+	 *
+	 * With a non-zero value, allocations from the cache go through caching
+	 * arrays called sheaves. Each cpu has a main sheaf that's always
+	 * present, and a spare sheaf that may be not present. When both become
+	 * empty, there's an attempt to replace an empty sheaf with a full sheaf
+	 * from the per-node barn.
+	 *
+	 * When no full sheaf is available, and gfp flags allow blocking, a
+	 * sheaf is allocated and filled from slab(s) using bulk allocation.
+	 * Otherwise the allocation falls back to the normal operation
+	 * allocating a single object from a slab.
+	 *
+	 * Analogically when freeing and both percpu sheaves are full, the barn
+	 * may replace it with an empty sheaf, unless it's over capacity. In
+	 * that case a sheaf is bulk freed to slab pages.
+	 *
+	 * The sheaves do not enforce NUMA placement of objects, so allocations
+	 * via kmem_cache_alloc_node() with a node specified other than
+	 * NUMA_NO_NODE will bypass them.
+	 *
+	 * Bulk allocation and free operations also try to use the cpu sheaves
+	 * and barn, but fallback to using slab pages directly.
+	 *
+	 * When slub_debug is enabled for the cache, the sheaf_capacity argument
+	 * is ignored.
+	 *
+	 * %0 means no sheaves will be created.
+	 */
+	unsigned int sheaf_capacity;
 };
 
 struct kmem_cache *__kmem_cache_create_args(const char *name,
diff --git a/mm/slab.h b/mm/slab.h
index 248b34c839b7ca39cf14e139c62d116efb97d30f..206987ce44a4d053ebe3b5e50784d2dd23822cd1 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -235,6 +235,7 @@ struct kmem_cache {
 #ifndef CONFIG_SLUB_TINY
 	struct kmem_cache_cpu __percpu *cpu_slab;
 #endif
+	struct slub_percpu_sheaves __percpu *cpu_sheaves;
 	/* Used for retrieving partial slabs, etc. */
 	slab_flags_t flags;
 	unsigned long min_partial;
@@ -248,6 +249,7 @@ struct kmem_cache {
 	/* Number of per cpu partial slabs to keep around */
 	unsigned int cpu_partial_slabs;
 #endif
+	unsigned int sheaf_capacity;
 	struct kmem_cache_order_objects oo;
 
 	/* Allocation and freeing of slabs */
diff --git a/mm/slab_common.c b/mm/slab_common.c
index bfe7c40eeee1a01c175766935c1e3c0304434a53..e2b197e47866c30acdbd1fee4159f262a751c5a7 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -163,6 +163,9 @@ int slab_unmergeable(struct kmem_cache *s)
 		return 1;
 #endif
 
+	if (s->cpu_sheaves)
+		return 1;
+
 	/*
 	 * We may have set a slab to be unmergeable during bootstrap.
 	 */
@@ -321,7 +324,7 @@ struct kmem_cache *__kmem_cache_create_args(const char *name,
 		     object_size - args->usersize < args->useroffset))
 		args->usersize = args->useroffset = 0;
 
-	if (!args->usersize)
+	if (!args->usersize && !args->sheaf_capacity)
 		s = __kmem_cache_alias(name, object_size, args->align, flags,
 				       args->ctor);
 	if (s)
diff --git a/mm/slub.c b/mm/slub.c
index 9f671ec76131c4b0b28d5d568aa45842b5efb6d4..0822a817c28c2c4666e853ef0f433842c64f607a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -363,8 +363,10 @@ static inline void debugfs_slab_add(struct kmem_cache *s) { }
 #endif
 
 enum stat_item {
+	ALLOC_PCS,		/* Allocation from percpu sheaf */
 	ALLOC_FASTPATH,		/* Allocation from cpu slab */
 	ALLOC_SLOWPATH,		/* Allocation by getting a new cpu slab */
+	FREE_PCS,		/* Free to percpu sheaf */
 	FREE_FASTPATH,		/* Free to cpu slab */
 	FREE_SLOWPATH,		/* Freeing not to cpu slab */
 	FREE_FROZEN,		/* Freeing to frozen slab */
@@ -389,6 +391,14 @@ enum stat_item {
 	CPU_PARTIAL_FREE,	/* Refill cpu partial on free */
 	CPU_PARTIAL_NODE,	/* Refill cpu partial from node partial */
 	CPU_PARTIAL_DRAIN,	/* Drain cpu partial to node partial */
+	SHEAF_FLUSH,		/* Objects flushed from a sheaf */
+	SHEAF_REFILL,		/* Objects refilled to a sheaf */
+	SHEAF_ALLOC,		/* Allocation of an empty sheaf */
+	SHEAF_FREE,		/* Freeing of an empty sheaf */
+	BARN_GET,		/* Got full sheaf from barn */
+	BARN_GET_FAIL,		/* Failed to get full sheaf from barn */
+	BARN_PUT,		/* Put full sheaf to barn */
+	BARN_PUT_FAIL,		/* Failed to put full sheaf to barn */
 	NR_SLUB_STAT_ITEMS
 };
 
@@ -435,6 +445,33 @@ void stat_add(const struct kmem_cache *s, enum stat_item si, int v)
 #endif
 }
 
+#define MAX_FULL_SHEAVES	10
+#define MAX_EMPTY_SHEAVES	10
+
+struct node_barn {
+	spinlock_t lock;
+	struct list_head sheaves_full;
+	struct list_head sheaves_empty;
+	unsigned int nr_full;
+	unsigned int nr_empty;
+};
+
+struct slab_sheaf {
+	union {
+		struct rcu_head rcu_head;
+		struct list_head barn_list;
+	};
+	unsigned int size;
+	void *objects[];
+};
+
+struct slub_percpu_sheaves {
+	local_trylock_t lock;
+	struct slab_sheaf *main; /* never NULL when unlocked */
+	struct slab_sheaf *spare; /* empty or full, may be NULL */
+	struct node_barn *barn;
+};
+
 /*
  * The slab lists for all objects.
  */
@@ -447,6 +484,7 @@ struct kmem_cache_node {
 	atomic_long_t total_objects;
 	struct list_head full;
 #endif
+	struct node_barn *barn;
 };
 
 static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
@@ -470,12 +508,19 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
  */
 static nodemask_t slab_nodes;
 
-#ifndef CONFIG_SLUB_TINY
 /*
  * Workqueue used for flush_cpu_slab().
  */
 static struct workqueue_struct *flushwq;
-#endif
+
+struct slub_flush_work {
+	struct work_struct work;
+	struct kmem_cache *s;
+	bool skip;
+};
+
+static DEFINE_MUTEX(flush_lock);
+static DEFINE_PER_CPU(struct slub_flush_work, slub_flush);
 
 /********************************************************************
  *			Core slab cache functions
@@ -2473,6 +2518,360 @@ static void *setup_object(struct kmem_cache *s, void *object)
 	return object;
 }
 
+static struct slab_sheaf *alloc_empty_sheaf(struct kmem_cache *s, gfp_t gfp)
+{
+	struct slab_sheaf *sheaf = kzalloc(struct_size(sheaf, objects,
+					   s->sheaf_capacity), gfp);
+
+	if (unlikely(!sheaf))
+		return NULL;
+
+	stat(s, SHEAF_ALLOC);
+
+	return sheaf;
+}
+
+static void free_empty_sheaf(struct kmem_cache *s, struct slab_sheaf *sheaf)
+{
+	kfree(sheaf);
+
+	stat(s, SHEAF_FREE);
+}
+
+static int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
+				   size_t size, void **p);
+
+
+static int refill_sheaf(struct kmem_cache *s, struct slab_sheaf *sheaf,
+			gfp_t gfp)
+{
+	int to_fill = s->sheaf_capacity - sheaf->size;
+	int filled;
+
+	if (!to_fill)
+		return 0;
+
+	filled = __kmem_cache_alloc_bulk(s, gfp, to_fill,
+					 &sheaf->objects[sheaf->size]);
+
+	sheaf->size += filled;
+
+	stat_add(s, SHEAF_REFILL, filled);
+
+	if (filled < to_fill)
+		return -ENOMEM;
+
+	return 0;
+}
+
+
+static struct slab_sheaf *alloc_full_sheaf(struct kmem_cache *s, gfp_t gfp)
+{
+	struct slab_sheaf *sheaf = alloc_empty_sheaf(s, gfp);
+
+	if (!sheaf)
+		return NULL;
+
+	if (refill_sheaf(s, sheaf, gfp)) {
+		free_empty_sheaf(s, sheaf);
+		return NULL;
+	}
+
+	return sheaf;
+}
+
+/*
+ * Maximum number of objects freed during a single flush of main pcs sheaf.
+ * Translates directly to an on-stack array size.
+ */
+#define PCS_BATCH_MAX 32U
+
+static void __kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p);
+
+/*
+ * Free all objects from the main sheaf. In order to perform
+ * __kmem_cache_free_bulk() outside of cpu_sheaves->lock, work in batches where
+ * object pointers are moved to an on-stack array under the lock. To bound the
+ * stack usage, limit each batch to PCS_BATCH_MAX.
+ *
+ * returns true if at least partially flushed
+ */
+static bool sheaf_flush_main(struct kmem_cache *s)
+{
+	struct slub_percpu_sheaves *pcs;
+	unsigned int batch, remaining;
+	void *objects[PCS_BATCH_MAX];
+	struct slab_sheaf *sheaf;
+	bool ret = false;
+
+next_batch:
+	if (!local_trylock(&s->cpu_sheaves->lock))
+		return ret;
+
+	pcs = this_cpu_ptr(s->cpu_sheaves);
+	sheaf = pcs->main;
+
+	batch = min(PCS_BATCH_MAX, sheaf->size);
+
+	sheaf->size -= batch;
+	memcpy(objects, sheaf->objects + sheaf->size, batch * sizeof(void *));
+
+	remaining = sheaf->size;
+
+	local_unlock(&s->cpu_sheaves->lock);
+
+	__kmem_cache_free_bulk(s, batch, &objects[0]);
+
+	stat_add(s, SHEAF_FLUSH, batch);
+
+	ret = true;
+
+	if (remaining)
+		goto next_batch;
+
+	return ret;
+}
+
+/*
+ * Free all objects from a sheaf that's unused, i.e. not linked to any
+ * cpu_sheaves, so we need no locking and batching. The locking is also not
+ * necessary when flushing cpu's sheaves (both spare and main) during cpu
+ * hotremove as the cpu is not executing anymore.
+ */
+static void sheaf_flush_unused(struct kmem_cache *s, struct slab_sheaf *sheaf)
+{
+	if (!sheaf->size)
+		return;
+
+	stat_add(s, SHEAF_FLUSH, sheaf->size);
+
+	__kmem_cache_free_bulk(s, sheaf->size, &sheaf->objects[0]);
+
+	sheaf->size = 0;
+}
+
+/*
+ * Caller needs to make sure migration is disabled in order to fully flush
+ * single cpu's sheaves
+ *
+ * must not be called from an irq
+ *
+ * flushing operations are rare so let's keep it simple and flush to slabs
+ * directly, skipping the barn
+ */
+static void pcs_flush_all(struct kmem_cache *s)
+{
+	struct slub_percpu_sheaves *pcs;
+	struct slab_sheaf *spare;
+
+	local_lock(&s->cpu_sheaves->lock);
+	pcs = this_cpu_ptr(s->cpu_sheaves);
+
+	spare = pcs->spare;
+	pcs->spare = NULL;
+
+	local_unlock(&s->cpu_sheaves->lock);
+
+	if (spare) {
+		sheaf_flush_unused(s, spare);
+		free_empty_sheaf(s, spare);
+	}
+
+	sheaf_flush_main(s);
+}
+
+static void __pcs_flush_all_cpu(struct kmem_cache *s, unsigned int cpu)
+{
+	struct slub_percpu_sheaves *pcs;
+
+	pcs = per_cpu_ptr(s->cpu_sheaves, cpu);
+
+	/* The cpu is not executing anymore so we don't need pcs->lock */
+	sheaf_flush_unused(s, pcs->main);
+	if (pcs->spare) {
+		sheaf_flush_unused(s, pcs->spare);
+		free_empty_sheaf(s, pcs->spare);
+		pcs->spare = NULL;
+	}
+}
+
+static void pcs_destroy(struct kmem_cache *s)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		struct slub_percpu_sheaves *pcs;
+
+		pcs = per_cpu_ptr(s->cpu_sheaves, cpu);
+
+		/* can happen when unwinding failed create */
+		if (!pcs->main)
+			continue;
+
+		/*
+		 * We have already passed __kmem_cache_shutdown() so everything
+		 * was flushed and there should be no objects allocated from
+		 * slabs, otherwise kmem_cache_destroy() would have aborted.
+		 * Therefore something would have to be really wrong if the
+		 * warnings here trigger, and we should rather leave objects and
+		 * sheaves to leak in that case.
+		 */
+
+		WARN_ON(pcs->spare);
+
+		if (!WARN_ON(pcs->main->size)) {
+			free_empty_sheaf(s, pcs->main);
+			pcs->main = NULL;
+		}
+	}
+
+	free_percpu(s->cpu_sheaves);
+	s->cpu_sheaves = NULL;
+}
+
+static struct slab_sheaf *barn_get_empty_sheaf(struct node_barn *barn)
+{
+	struct slab_sheaf *empty = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&barn->lock, flags);
+
+	if (barn->nr_empty) {
+		empty = list_first_entry(&barn->sheaves_empty,
+					 struct slab_sheaf, barn_list);
+		list_del(&empty->barn_list);
+		barn->nr_empty--;
+	}
+
+	spin_unlock_irqrestore(&barn->lock, flags);
+
+	return empty;
+}
+
+/*
+ * The following two functions are used mainly in cases where we have to undo an
+ * intended action due to a race or cpu migration. Thus they do not check the
+ * empty or full sheaf limits for simplicity.
+ */
+
+static void barn_put_empty_sheaf(struct node_barn *barn, struct slab_sheaf *sheaf)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&barn->lock, flags);
+
+	list_add(&sheaf->barn_list, &barn->sheaves_empty);
+	barn->nr_empty++;
+
+	spin_unlock_irqrestore(&barn->lock, flags);
+}
+
+static void barn_put_full_sheaf(struct node_barn *barn, struct slab_sheaf *sheaf)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&barn->lock, flags);
+
+	list_add(&sheaf->barn_list, &barn->sheaves_full);
+	barn->nr_full++;
+
+	spin_unlock_irqrestore(&barn->lock, flags);
+}
+
+/*
+ * If a full sheaf is available, return it and put the supplied empty one to
+ * barn. We ignore the limit on empty sheaves as the number of sheaves doesn't
+ * change.
+ */
+static struct slab_sheaf *
+barn_replace_empty_sheaf(struct node_barn *barn, struct slab_sheaf *empty)
+{
+	struct slab_sheaf *full = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&barn->lock, flags);
+
+	if (barn->nr_full) {
+		full = list_first_entry(&barn->sheaves_full, struct slab_sheaf,
+					barn_list);
+		list_del(&full->barn_list);
+		list_add(&empty->barn_list, &barn->sheaves_empty);
+		barn->nr_full--;
+		barn->nr_empty++;
+	}
+
+	spin_unlock_irqrestore(&barn->lock, flags);
+
+	return full;
+}
+
+/*
+ * If an empty sheaf is available, return it and put the supplied full one to
+ * barn. But if there are too many full sheaves, reject this with -E2BIG.
+ */
+static struct slab_sheaf *
+barn_replace_full_sheaf(struct node_barn *barn, struct slab_sheaf *full)
+{
+	struct slab_sheaf *empty;
+	unsigned long flags;
+
+	spin_lock_irqsave(&barn->lock, flags);
+
+	if (barn->nr_full >= MAX_FULL_SHEAVES) {
+		empty = ERR_PTR(-E2BIG);
+	} else if (!barn->nr_empty) {
+		empty = ERR_PTR(-ENOMEM);
+	} else {
+		empty = list_first_entry(&barn->sheaves_empty, struct slab_sheaf,
+					 barn_list);
+		list_del(&empty->barn_list);
+		list_add(&full->barn_list, &barn->sheaves_full);
+		barn->nr_empty--;
+		barn->nr_full++;
+	}
+
+	spin_unlock_irqrestore(&barn->lock, flags);
+
+	return empty;
+}
+
+static void barn_init(struct node_barn *barn)
+{
+	spin_lock_init(&barn->lock);
+	INIT_LIST_HEAD(&barn->sheaves_full);
+	INIT_LIST_HEAD(&barn->sheaves_empty);
+	barn->nr_full = 0;
+	barn->nr_empty = 0;
+}
+
+static void barn_shrink(struct kmem_cache *s, struct node_barn *barn)
+{
+	struct list_head empty_list;
+	struct list_head full_list;
+	struct slab_sheaf *sheaf, *sheaf2;
+	unsigned long flags;
+
+	INIT_LIST_HEAD(&empty_list);
+	INIT_LIST_HEAD(&full_list);
+
+	spin_lock_irqsave(&barn->lock, flags);
+
+	list_splice_init(&barn->sheaves_full, &full_list);
+	barn->nr_full = 0;
+	list_splice_init(&barn->sheaves_empty, &empty_list);
+	barn->nr_empty = 0;
+
+	spin_unlock_irqrestore(&barn->lock, flags);
+
+	list_for_each_entry_safe(sheaf, sheaf2, &full_list, barn_list) {
+		sheaf_flush_unused(s, sheaf);
+		free_empty_sheaf(s, sheaf);
+	}
+
+	list_for_each_entry_safe(sheaf, sheaf2, &empty_list, barn_list)
+		free_empty_sheaf(s, sheaf);
+}
+
 /*
  * Slab allocation and freeing
  */
@@ -3344,11 +3743,42 @@ static inline void __flush_cpu_slab(struct kmem_cache *s, int cpu)
 	put_partials_cpu(s, c);
 }
 
-struct slub_flush_work {
-	struct work_struct work;
-	struct kmem_cache *s;
-	bool skip;
-};
+static inline void flush_this_cpu_slab(struct kmem_cache *s)
+{
+	struct kmem_cache_cpu *c = this_cpu_ptr(s->cpu_slab);
+
+	if (c->slab)
+		flush_slab(s, c);
+
+	put_partials(s);
+}
+
+static bool has_cpu_slab(int cpu, struct kmem_cache *s)
+{
+	struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
+
+	return c->slab || slub_percpu_partial(c);
+}
+
+#else /* CONFIG_SLUB_TINY */
+static inline void __flush_cpu_slab(struct kmem_cache *s, int cpu) { }
+static inline bool has_cpu_slab(int cpu, struct kmem_cache *s) { return false; }
+static inline void flush_this_cpu_slab(struct kmem_cache *s) { }
+#endif /* CONFIG_SLUB_TINY */
+
+static bool has_pcs_used(int cpu, struct kmem_cache *s)
+{
+	struct slub_percpu_sheaves *pcs;
+
+	if (!s->cpu_sheaves)
+		return false;
+
+	pcs = per_cpu_ptr(s->cpu_sheaves, cpu);
+
+	return (pcs->spare || pcs->main->size);
+}
+
+static void pcs_flush_all(struct kmem_cache *s);
 
 /*
  * Flush cpu slab.
@@ -3358,30 +3788,18 @@ struct slub_flush_work {
 static void flush_cpu_slab(struct work_struct *w)
 {
 	struct kmem_cache *s;
-	struct kmem_cache_cpu *c;
 	struct slub_flush_work *sfw;
 
 	sfw = container_of(w, struct slub_flush_work, work);
 
 	s = sfw->s;
-	c = this_cpu_ptr(s->cpu_slab);
-
-	if (c->slab)
-		flush_slab(s, c);
-
-	put_partials(s);
-}
 
-static bool has_cpu_slab(int cpu, struct kmem_cache *s)
-{
-	struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
+	if (s->cpu_sheaves)
+		pcs_flush_all(s);
 
-	return c->slab || slub_percpu_partial(c);
+	flush_this_cpu_slab(s);
 }
 
-static DEFINE_MUTEX(flush_lock);
-static DEFINE_PER_CPU(struct slub_flush_work, slub_flush);
-
 static void flush_all_cpus_locked(struct kmem_cache *s)
 {
 	struct slub_flush_work *sfw;
@@ -3392,7 +3810,7 @@ static void flush_all_cpus_locked(struct kmem_cache *s)
 
 	for_each_online_cpu(cpu) {
 		sfw = &per_cpu(slub_flush, cpu);
-		if (!has_cpu_slab(cpu, s)) {
+		if (!has_cpu_slab(cpu, s) && !has_pcs_used(cpu, s)) {
 			sfw->skip = true;
 			continue;
 		}
@@ -3428,19 +3846,15 @@ static int slub_cpu_dead(unsigned int cpu)
 	struct kmem_cache *s;
 
 	mutex_lock(&slab_mutex);
-	list_for_each_entry(s, &slab_caches, list)
+	list_for_each_entry(s, &slab_caches, list) {
 		__flush_cpu_slab(s, cpu);
+		if (s->cpu_sheaves)
+			__pcs_flush_all_cpu(s, cpu);
+	}
 	mutex_unlock(&slab_mutex);
 	return 0;
 }
 
-#else /* CONFIG_SLUB_TINY */
-static inline void flush_all_cpus_locked(struct kmem_cache *s) { }
-static inline void flush_all(struct kmem_cache *s) { }
-static inline void __flush_cpu_slab(struct kmem_cache *s, int cpu) { }
-static inline int slub_cpu_dead(unsigned int cpu) { return 0; }
-#endif /* CONFIG_SLUB_TINY */
-
 /*
  * Check if the objects in a per cpu structure fit numa
  * locality expectations.
@@ -4191,30 +4605,237 @@ bool slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
 }
 
 /*
- * Inlined fastpath so that allocation functions (kmalloc, kmem_cache_alloc)
- * have the fastpath folded into their functions. So no function call
- * overhead for requests that can be satisfied on the fastpath.
- *
- * The fastpath works by first checking if the lockless freelist can be used.
- * If not then __slab_alloc is called for slow processing.
+ * Replace the empty main sheaf with a (at least partially) full sheaf.
  *
- * Otherwise we can simply pick the next object from the lockless free list.
+ * Must be called with the cpu_sheaves local lock locked. If successful, returns
+ * the pcs pointer and the local lock locked (possibly on a different cpu than
+ * initially called). If not successful, returns NULL and the local lock
+ * unlocked.
  */
-static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list_lru *lru,
-		gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
+static struct slub_percpu_sheaves *
+__pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs, gfp_t gfp)
 {
-	void *object;
-	bool init = false;
+	struct slab_sheaf *empty = NULL;
+	struct slab_sheaf *full;
+	bool can_alloc;
 
-	s = slab_pre_alloc_hook(s, gfpflags);
-	if (unlikely(!s))
-		return NULL;
+	lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock.llock));
 
-	object = kfence_alloc(s, orig_size, gfpflags);
-	if (unlikely(object))
+	if (pcs->spare && pcs->spare->size > 0) {
+		swap(pcs->main, pcs->spare);
+		return pcs;
+	}
+
+	full = barn_replace_empty_sheaf(pcs->barn, pcs->main);
+
+	if (full) {
+		stat(s, BARN_GET);
+		pcs->main = full;
+		return pcs;
+	}
+
+	stat(s, BARN_GET_FAIL);
+
+	can_alloc = gfpflags_allow_blocking(gfp);
+
+	if (can_alloc) {
+		if (pcs->spare) {
+			empty = pcs->spare;
+			pcs->spare = NULL;
+		} else {
+			empty = barn_get_empty_sheaf(pcs->barn);
+		}
+	}
+
+	local_unlock(&s->cpu_sheaves->lock);
+
+	if (!can_alloc)
+		return NULL;
+
+	if (empty) {
+		if (!refill_sheaf(s, empty, gfp)) {
+			full = empty;
+		} else {
+			/*
+			 * we must be very low on memory so don't bother
+			 * with the barn
+			 */
+			free_empty_sheaf(s, empty);
+		}
+	} else {
+		full = alloc_full_sheaf(s, gfp);
+	}
+
+	if (!full)
+		return NULL;
+
+	/*
+	 * we can reach here only when gfpflags_allow_blocking
+	 * so this must not be an irq
+	 */
+	local_lock(&s->cpu_sheaves->lock);
+	pcs = this_cpu_ptr(s->cpu_sheaves);
+
+	/*
+	 * If we are returning empty sheaf, we either got it from the
+	 * barn or had to allocate one. If we are returning a full
+	 * sheaf, it's due to racing or being migrated to a different
+	 * cpu. Breaching the barn's sheaf limits should be thus rare
+	 * enough so just ignore them to simplify the recovery.
+	 */
+
+	if (pcs->main->size == 0) {
+		barn_put_empty_sheaf(pcs->barn, pcs->main);
+		pcs->main = full;
+		return pcs;
+	}
+
+	if (!pcs->spare) {
+		pcs->spare = full;
+		return pcs;
+	}
+
+	if (pcs->spare->size == 0) {
+		barn_put_empty_sheaf(pcs->barn, pcs->spare);
+		pcs->spare = full;
+		return pcs;
+	}
+
+	barn_put_full_sheaf(pcs->barn, full);
+	stat(s, BARN_PUT);
+
+	return pcs;
+}
+
+static __fastpath_inline
+void *alloc_from_pcs(struct kmem_cache *s, gfp_t gfp)
+{
+	struct slub_percpu_sheaves *pcs;
+	void *object;
+
+#ifdef CONFIG_NUMA
+	if (static_branch_unlikely(&strict_numa)) {
+		if (current->mempolicy)
+			return NULL;
+	}
+#endif
+
+	if (!local_trylock(&s->cpu_sheaves->lock))
+		return NULL;
+
+	pcs = this_cpu_ptr(s->cpu_sheaves);
+
+	if (unlikely(pcs->main->size == 0)) {
+		pcs = __pcs_replace_empty_main(s, pcs, gfp);
+		if (unlikely(!pcs))
+			return NULL;
+	}
+
+	object = pcs->main->objects[--pcs->main->size];
+
+	local_unlock(&s->cpu_sheaves->lock);
+
+	stat(s, ALLOC_PCS);
+
+	return object;
+}
+
+static __fastpath_inline
+unsigned int alloc_from_pcs_bulk(struct kmem_cache *s, size_t size, void **p)
+{
+	struct slub_percpu_sheaves *pcs;
+	struct slab_sheaf *main;
+	unsigned int allocated = 0;
+	unsigned int batch;
+
+next_batch:
+	if (!local_trylock(&s->cpu_sheaves->lock))
+		return allocated;
+
+	pcs = this_cpu_ptr(s->cpu_sheaves);
+
+	if (unlikely(pcs->main->size == 0)) {
+
+		struct slab_sheaf *full;
+
+		if (pcs->spare && pcs->spare->size > 0) {
+			swap(pcs->main, pcs->spare);
+			goto do_alloc;
+		}
+
+		full = barn_replace_empty_sheaf(pcs->barn, pcs->main);
+
+		if (full) {
+			stat(s, BARN_GET);
+			pcs->main = full;
+			goto do_alloc;
+		}
+
+		stat(s, BARN_GET_FAIL);
+
+		local_unlock(&s->cpu_sheaves->lock);
+
+		/*
+		 * Once full sheaves in barn are depleted, let the bulk
+		 * allocation continue from slab pages, otherwise we would just
+		 * be copying arrays of pointers twice.
+		 */
+		return allocated;
+	}
+
+do_alloc:
+
+	main = pcs->main;
+	batch = min(size, main->size);
+
+	main->size -= batch;
+	memcpy(p, main->objects + main->size, batch * sizeof(void *));
+
+	local_unlock(&s->cpu_sheaves->lock);
+
+	stat_add(s, ALLOC_PCS, batch);
+
+	allocated += batch;
+
+	if (batch < size) {
+		p += batch;
+		size -= batch;
+		goto next_batch;
+	}
+
+	return allocated;
+}
+
+
+/*
+ * Inlined fastpath so that allocation functions (kmalloc, kmem_cache_alloc)
+ * have the fastpath folded into their functions. So no function call
+ * overhead for requests that can be satisfied on the fastpath.
+ *
+ * The fastpath works by first checking if the lockless freelist can be used.
+ * If not then __slab_alloc is called for slow processing.
+ *
+ * Otherwise we can simply pick the next object from the lockless free list.
+ */
+static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list_lru *lru,
+		gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
+{
+	void *object;
+	bool init = false;
+
+	s = slab_pre_alloc_hook(s, gfpflags);
+	if (unlikely(!s))
+		return NULL;
+
+	object = kfence_alloc(s, orig_size, gfpflags);
+	if (unlikely(object))
 		goto out;
 
-	object = __slab_alloc_node(s, gfpflags, node, addr, orig_size);
+	if (s->cpu_sheaves && node == NUMA_NO_NODE)
+		object = alloc_from_pcs(s, gfpflags);
+
+	if (!object)
+		object = __slab_alloc_node(s, gfpflags, node, addr, orig_size);
 
 	maybe_wipe_obj_freeptr(s, object);
 	init = slab_want_init_on_alloc(gfpflags, s);
@@ -4591,6 +5212,288 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
 	discard_slab(s, slab);
 }
 
+/*
+ * pcs is locked. We should have gotten rid of the spare sheaf and obtained an
+ * empty sheaf, while the main sheaf is full. We want to install the empty sheaf
+ * as a main sheaf, and make the current main sheaf a spare sheaf.
+ *
+ * However due to having relinquished the cpu_sheaves lock when obtaining
+ * the empty sheaf, we need to handle some unlikely but possible cases.
+ *
+ * If we put any sheaf to barn here, it's because we were interrupted or have
+ * been migrated to a different cpu, which should be rare enough so just ignore
+ * the barn's limits to simplify the handling.
+ *
+ * An alternative scenario that gets us here is when we fail
+ * barn_replace_full_sheaf(), because there's no empty sheaf available in the
+ * barn, so we had to allocate it by alloc_empty_sheaf(). But because we saw the
+ * limit on full sheaves was not exceeded, we assume it didn't change and just
+ * put the full sheaf there.
+ */
+static void __pcs_install_empty_sheaf(struct kmem_cache *s,
+		struct slub_percpu_sheaves *pcs, struct slab_sheaf *empty)
+{
+	lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock.llock));
+
+	/* This is what we expect to find if nobody interrupted us. */
+	if (likely(!pcs->spare)) {
+		pcs->spare = pcs->main;
+		pcs->main = empty;
+		return;
+	}
+
+	/*
+	 * Unlikely because if the main sheaf had space, we would have just
+	 * freed to it. Get rid of our empty sheaf.
+	 */
+	if (pcs->main->size < s->sheaf_capacity) {
+		barn_put_empty_sheaf(pcs->barn, empty);
+		return;
+	}
+
+	/* Also unlikely for the same reason */
+	if (pcs->spare->size < s->sheaf_capacity) {
+		swap(pcs->main, pcs->spare);
+		barn_put_empty_sheaf(pcs->barn, empty);
+		return;
+	}
+
+	/*
+	 * We probably failed barn_replace_full_sheaf() due to no empty sheaf
+	 * available there, but we allocated one, so finish the job.
+	 */
+	barn_put_full_sheaf(pcs->barn, pcs->main);
+	stat(s, BARN_PUT);
+	pcs->main = empty;
+}
+
+/*
+ * Replace the full main sheaf with a (at least partially) empty sheaf.
+ *
+ * Must be called with the cpu_sheaves local lock locked. If successful, returns
+ * the pcs pointer and the local lock locked (possibly on a different cpu than
+ * initially called). If not successful, returns NULL and the local lock
+ * unlocked.
+ */
+static struct slub_percpu_sheaves *
+__pcs_replace_full_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs)
+{
+	struct slab_sheaf *empty;
+	bool put_fail;
+
+restart:
+	lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock.llock));
+
+	put_fail = false;
+
+	if (!pcs->spare) {
+		empty = barn_get_empty_sheaf(pcs->barn);
+		if (empty) {
+			pcs->spare = pcs->main;
+			pcs->main = empty;
+			return pcs;
+		}
+		goto alloc_empty;
+	}
+
+	if (pcs->spare->size < s->sheaf_capacity) {
+		swap(pcs->main, pcs->spare);
+		return pcs;
+	}
+
+	empty = barn_replace_full_sheaf(pcs->barn, pcs->main);
+
+	if (!IS_ERR(empty)) {
+		stat(s, BARN_PUT);
+		pcs->main = empty;
+		return pcs;
+	}
+
+	if (PTR_ERR(empty) == -E2BIG) {
+		/* Since we got here, spare exists and is full */
+		struct slab_sheaf *to_flush = pcs->spare;
+
+		stat(s, BARN_PUT_FAIL);
+
+		pcs->spare = NULL;
+		local_unlock(&s->cpu_sheaves->lock);
+
+		sheaf_flush_unused(s, to_flush);
+		empty = to_flush;
+		goto got_empty;
+	}
+
+	/*
+	 * We could not replace full sheaf because barn had no empty
+	 * sheaves. We can still allocate it and put the full sheaf in
+	 * __pcs_install_empty_sheaf(), but if we fail to allocate it,
+	 * make sure to count the fail.
+	 */
+	put_fail = true;
+
+alloc_empty:
+	local_unlock(&s->cpu_sheaves->lock);
+
+	empty = alloc_empty_sheaf(s, GFP_NOWAIT);
+	if (empty)
+		goto got_empty;
+
+	if (put_fail)
+		stat(s, BARN_PUT_FAIL);
+
+	if (!sheaf_flush_main(s))
+		return NULL;
+
+	if (!local_trylock(&s->cpu_sheaves->lock))
+		return NULL;
+
+	pcs = this_cpu_ptr(s->cpu_sheaves);
+
+	/*
+	 * we flushed the main sheaf so it should be empty now,
+	 * but in case we got preempted or migrated, we need to
+	 * check again
+	 */
+	if (pcs->main->size == s->sheaf_capacity)
+		goto restart;
+
+	return pcs;
+
+got_empty:
+	if (!local_trylock(&s->cpu_sheaves->lock)) {
+		barn_put_empty_sheaf(pcs->barn, empty);
+		return NULL;
+	}
+
+	pcs = this_cpu_ptr(s->cpu_sheaves);
+	__pcs_install_empty_sheaf(s, pcs, empty);
+
+	return pcs;
+}
+
+/*
+ * Free an object to the percpu sheaves.
+ * The object is expected to have passed slab_free_hook() already.
+ */
+static __fastpath_inline
+bool free_to_pcs(struct kmem_cache *s, void *object)
+{
+	struct slub_percpu_sheaves *pcs;
+
+	if (!local_trylock(&s->cpu_sheaves->lock))
+		return false;
+
+	pcs = this_cpu_ptr(s->cpu_sheaves);
+
+	if (unlikely(pcs->main->size == s->sheaf_capacity)) {
+
+		pcs = __pcs_replace_full_main(s, pcs);
+		if (unlikely(!pcs))
+			return false;
+	}
+
+	pcs->main->objects[pcs->main->size++] = object;
+
+	local_unlock(&s->cpu_sheaves->lock);
+
+	stat(s, FREE_PCS);
+
+	return true;
+}
+
+/*
+ * Bulk free objects to the percpu sheaves.
+ * Unlike free_to_pcs() this includes the calls to all necessary hooks
+ * and the fallback to freeing to slab pages.
+ */
+static void free_to_pcs_bulk(struct kmem_cache *s, size_t size, void **p)
+{
+	struct slub_percpu_sheaves *pcs;
+	struct slab_sheaf *main, *empty;
+	unsigned int batch, i = 0;
+	bool init;
+
+	init = slab_want_init_on_free(s);
+
+	while (i < size) {
+		struct slab *slab = virt_to_slab(p[i]);
+
+		memcg_slab_free_hook(s, slab, p + i, 1);
+		alloc_tagging_slab_free_hook(s, slab, p + i, 1);
+
+		if (unlikely(!slab_free_hook(s, p[i], init, false))) {
+			p[i] = p[--size];
+			if (!size)
+				return;
+			continue;
+		}
+
+		i++;
+	}
+
+next_batch:
+	if (!local_trylock(&s->cpu_sheaves->lock))
+		goto fallback;
+
+	pcs = this_cpu_ptr(s->cpu_sheaves);
+
+	if (likely(pcs->main->size < s->sheaf_capacity))
+		goto do_free;
+
+	if (!pcs->spare) {
+		empty = barn_get_empty_sheaf(pcs->barn);
+		if (!empty)
+			goto no_empty;
+
+		pcs->spare = pcs->main;
+		pcs->main = empty;
+		goto do_free;
+	}
+
+	if (pcs->spare->size < s->sheaf_capacity) {
+		swap(pcs->main, pcs->spare);
+		goto do_free;
+	}
+
+	empty = barn_replace_full_sheaf(pcs->barn, pcs->main);
+	if (IS_ERR(empty)) {
+		stat(s, BARN_PUT_FAIL);
+		goto no_empty;
+	}
+
+	stat(s, BARN_PUT);
+	pcs->main = empty;
+
+do_free:
+	main = pcs->main;
+	batch = min(size, s->sheaf_capacity - main->size);
+
+	memcpy(main->objects + main->size, p, batch * sizeof(void *));
+	main->size += batch;
+
+	local_unlock(&s->cpu_sheaves->lock);
+
+	stat_add(s, FREE_PCS, batch);
+
+	if (batch < size) {
+		p += batch;
+		size -= batch;
+		goto next_batch;
+	}
+
+	return;
+
+no_empty:
+	local_unlock(&s->cpu_sheaves->lock);
+
+	/*
+	 * if we depleted all empty sheaves in the barn or there are too
+	 * many full sheaves, free the rest to slab pages
+	 */
+fallback:
+	__kmem_cache_free_bulk(s, size, p);
+}
+
 #ifndef CONFIG_SLUB_TINY
 /*
  * Fastpath with forced inlining to produce a kfree and kmem_cache_free that
@@ -4677,7 +5580,10 @@ void slab_free(struct kmem_cache *s, struct slab *slab, void *object,
 	memcg_slab_free_hook(s, slab, &object, 1);
 	alloc_tagging_slab_free_hook(s, slab, &object, 1);
 
-	if (likely(slab_free_hook(s, object, slab_want_init_on_free(s), false)))
+	if (unlikely(!slab_free_hook(s, object, slab_want_init_on_free(s), false)))
+		return;
+
+	if (!s->cpu_sheaves || !free_to_pcs(s, object))
 		do_slab_free(s, slab, object, object, 1, addr);
 }
 
@@ -5273,6 +6179,15 @@ void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
 	if (!size)
 		return;
 
+	/*
+	 * freeing to sheaves is incompatible with the detached freelist, so
+	 * once we go that way, we have to do everything differently
+	 */
+	if (s && s->cpu_sheaves) {
+		free_to_pcs_bulk(s, size, p);
+		return;
+	}
+
 	do {
 		struct detached_freelist df;
 
@@ -5391,7 +6306,7 @@ static int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
 int kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags, size_t size,
 				 void **p)
 {
-	int i;
+	unsigned int i = 0;
 
 	if (!size)
 		return 0;
@@ -5400,9 +6315,20 @@ int kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags, size_t size,
 	if (unlikely(!s))
 		return 0;
 
-	i = __kmem_cache_alloc_bulk(s, flags, size, p);
-	if (unlikely(i == 0))
-		return 0;
+	if (s->cpu_sheaves)
+		i = alloc_from_pcs_bulk(s, size, p);
+
+	if (i < size) {
+		/*
+		 * If we ran out of memory, don't bother with freeing back to
+		 * the percpu sheaves, we have bigger problems.
+		 */
+		if (unlikely(__kmem_cache_alloc_bulk(s, flags, size - i, p + i) == 0)) {
+			if (i > 0)
+				__kmem_cache_free_bulk(s, i, p);
+			return 0;
+		}
+	}
 
 	/*
 	 * memcg and kmem_cache debug support and memory initialization.
@@ -5412,11 +6338,11 @@ int kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags, size_t size,
 			    slab_want_init_on_alloc(flags, s), s->object_size))) {
 		return 0;
 	}
-	return i;
+
+	return size;
 }
 EXPORT_SYMBOL(kmem_cache_alloc_bulk_noprof);
 
-
 /*
  * Object placement in a slab is made very easy because we always start at
  * offset 0. If we tune the size of the object to the alignment then we can
@@ -5550,7 +6476,7 @@ static inline int calculate_order(unsigned int size)
 }
 
 static void
-init_kmem_cache_node(struct kmem_cache_node *n)
+init_kmem_cache_node(struct kmem_cache_node *n, struct node_barn *barn)
 {
 	n->nr_partial = 0;
 	spin_lock_init(&n->list_lock);
@@ -5560,6 +6486,9 @@ init_kmem_cache_node(struct kmem_cache_node *n)
 	atomic_long_set(&n->total_objects, 0);
 	INIT_LIST_HEAD(&n->full);
 #endif
+	n->barn = barn;
+	if (barn)
+		barn_init(barn);
 }
 
 #ifndef CONFIG_SLUB_TINY
@@ -5590,6 +6519,30 @@ static inline int alloc_kmem_cache_cpus(struct kmem_cache *s)
 }
 #endif /* CONFIG_SLUB_TINY */
 
+static int init_percpu_sheaves(struct kmem_cache *s)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		struct slub_percpu_sheaves *pcs;
+		int nid;
+
+		pcs = per_cpu_ptr(s->cpu_sheaves, cpu);
+
+		local_trylock_init(&pcs->lock);
+
+		nid = cpu_to_mem(cpu);
+
+		pcs->barn = get_node(s, nid)->barn;
+		pcs->main = alloc_empty_sheaf(s, GFP_KERNEL);
+
+		if (!pcs->main)
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
 static struct kmem_cache *kmem_cache_node;
 
 /*
@@ -5625,7 +6578,7 @@ static void early_kmem_cache_node_alloc(int node)
 	slab->freelist = get_freepointer(kmem_cache_node, n);
 	slab->inuse = 1;
 	kmem_cache_node->node[node] = n;
-	init_kmem_cache_node(n);
+	init_kmem_cache_node(n, NULL);
 	inc_slabs_node(kmem_cache_node, node, slab->objects);
 
 	/*
@@ -5641,6 +6594,13 @@ static void free_kmem_cache_nodes(struct kmem_cache *s)
 	struct kmem_cache_node *n;
 
 	for_each_kmem_cache_node(s, node, n) {
+		if (n->barn) {
+			WARN_ON(n->barn->nr_full);
+			WARN_ON(n->barn->nr_empty);
+			kfree(n->barn);
+			n->barn = NULL;
+		}
+
 		s->node[node] = NULL;
 		kmem_cache_free(kmem_cache_node, n);
 	}
@@ -5649,6 +6609,8 @@ static void free_kmem_cache_nodes(struct kmem_cache *s)
 void __kmem_cache_release(struct kmem_cache *s)
 {
 	cache_random_seq_destroy(s);
+	if (s->cpu_sheaves)
+		pcs_destroy(s);
 #ifndef CONFIG_SLUB_TINY
 	free_percpu(s->cpu_slab);
 #endif
@@ -5661,18 +6623,29 @@ static int init_kmem_cache_nodes(struct kmem_cache *s)
 
 	for_each_node_mask(node, slab_nodes) {
 		struct kmem_cache_node *n;
+		struct node_barn *barn = NULL;
 
 		if (slab_state == DOWN) {
 			early_kmem_cache_node_alloc(node);
 			continue;
 		}
+
+		if (s->cpu_sheaves) {
+			barn = kmalloc_node(sizeof(*barn), GFP_KERNEL, node);
+
+			if (!barn)
+				return 0;
+		}
+
 		n = kmem_cache_alloc_node(kmem_cache_node, GFP_KERNEL, node);
-
-		if (!n)
+		if (!n) {
+			kfree(barn);
 			return 0;
+		}
+
+		init_kmem_cache_node(n, barn);
 
-		init_kmem_cache_node(n);
 		s->node[node] = n;
 	}
 	return 1;
@@ -5929,6 +6902,8 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
 	flush_all_cpus_locked(s);
 	/* Attempt to free all objects */
 	for_each_kmem_cache_node(s, node, n) {
+		if (n->barn)
+			barn_shrink(s, n->barn);
 		free_partial(s, n);
 		if (n->nr_partial || node_nr_slabs(n))
 			return 1;
@@ -6132,6 +7107,9 @@ static int __kmem_cache_do_shrink(struct kmem_cache *s)
 		for (i = 0; i < SHRINK_PROMOTE_MAX; i++)
 			INIT_LIST_HEAD(promote + i);
 
+		if (n->barn)
+			barn_shrink(s, n->barn);
+
 		spin_lock_irqsave(&n->list_lock, flags);
 
 		/*
@@ -6211,12 +7189,24 @@ static int slab_mem_going_online_callback(int nid)
 	 */
 	mutex_lock(&slab_mutex);
 	list_for_each_entry(s, &slab_caches, list) {
+		struct node_barn *barn = NULL;
+
 		/*
 		 * The structure may already exist if the node was previously
 		 * onlined and offlined.
		 */
 		if (get_node(s, nid))
 			continue;
+
+		if (s->cpu_sheaves) {
+			barn = kmalloc_node(sizeof(*barn), GFP_KERNEL, nid);
+
+			if (!barn) {
+				ret = -ENOMEM;
+				goto out;
+			}
+		}
+
 		/*
 		 * XXX: kmem_cache_alloc_node will fallback to other nodes
 		 *      since memory is not yet available from the node that
@@ -6224,10 +7214,13 @@ static int slab_mem_going_online_callback(int nid)
 		 */
 		n = kmem_cache_alloc(kmem_cache_node, GFP_KERNEL);
 		if (!n) {
+			kfree(barn);
 			ret = -ENOMEM;
 			goto out;
 		}
-		init_kmem_cache_node(n);
+
+		init_kmem_cache_node(n, barn);
+
 		s->node[nid] = n;
 	}
 	/*
@@ -6440,6 +7433,17 @@ int do_kmem_cache_create(struct kmem_cache *s, const char *name,
 
 	set_cpu_partial(s);
 
+	if (args->sheaf_capacity && !IS_ENABLED(CONFIG_SLUB_TINY)
+			&& !(s->flags & SLAB_DEBUG_FLAGS)) {
+		s->cpu_sheaves = alloc_percpu(struct slub_percpu_sheaves);
+		if (!s->cpu_sheaves) {
+			err = -ENOMEM;
+			goto out;
+		}
+		// TODO: increase capacity to grow slab_sheaf up to next kmalloc size?
+		s->sheaf_capacity = args->sheaf_capacity;
+	}
+
 #ifdef CONFIG_NUMA
 	s->remote_node_defrag_ratio = 1000;
 #endif
@@ -6456,6 +7460,12 @@ int do_kmem_cache_create(struct kmem_cache *s, const char *name,
 	if (!alloc_kmem_cache_cpus(s))
 		goto out;
 
+	if (s->cpu_sheaves) {
+		err = init_percpu_sheaves(s);
+		if (err)
+			goto out;
+	}
+
 	err = 0;
 
 	/* Mutex is not taken during early boot */
@@ -6908,6 +7918,12 @@ static ssize_t order_show(struct kmem_cache *s, char *buf)
 }
 SLAB_ATTR_RO(order);
 
+static ssize_t sheaf_capacity_show(struct kmem_cache *s, char *buf)
+{
+	return sysfs_emit(buf, "%u\n", s->sheaf_capacity);
+}
+SLAB_ATTR_RO(sheaf_capacity);
+
 static ssize_t min_partial_show(struct kmem_cache *s, char *buf)
 {
 	return sysfs_emit(buf, "%lu\n", s->min_partial);
@@ -7255,8 +8271,10 @@ static ssize_t text##_store(struct kmem_cache *s,		\
 }									\
 SLAB_ATTR(text);							\
 
+STAT_ATTR(ALLOC_PCS, alloc_cpu_sheaf);
 STAT_ATTR(ALLOC_FASTPATH, alloc_fastpath);
 STAT_ATTR(ALLOC_SLOWPATH, alloc_slowpath);
+STAT_ATTR(FREE_PCS, free_cpu_sheaf);
 STAT_ATTR(FREE_FASTPATH, free_fastpath);
 STAT_ATTR(FREE_SLOWPATH, free_slowpath);
 STAT_ATTR(FREE_FROZEN, free_frozen);
@@ -7281,6 +8299,14 @@ STAT_ATTR(CPU_PARTIAL_ALLOC, cpu_partial_alloc);
 STAT_ATTR(CPU_PARTIAL_FREE, cpu_partial_free);
 STAT_ATTR(CPU_PARTIAL_NODE, cpu_partial_node);
 STAT_ATTR(CPU_PARTIAL_DRAIN, cpu_partial_drain);
+STAT_ATTR(SHEAF_FLUSH, sheaf_flush);
+STAT_ATTR(SHEAF_REFILL, sheaf_refill);
+STAT_ATTR(SHEAF_ALLOC, sheaf_alloc);
+STAT_ATTR(SHEAF_FREE, sheaf_free);
+STAT_ATTR(BARN_GET, barn_get);
+STAT_ATTR(BARN_GET_FAIL, barn_get_fail);
+STAT_ATTR(BARN_PUT, barn_put);
+STAT_ATTR(BARN_PUT_FAIL, barn_put_fail);
 #endif	/* CONFIG_SLUB_STATS */
 
 #ifdef CONFIG_KFENCE
@@ -7311,6 +8337,7 @@ static struct attribute *slab_attrs[] = {
 	&object_size_attr.attr,
 	&objs_per_slab_attr.attr,
 	&order_attr.attr,
+	&sheaf_capacity_attr.attr,
 	&min_partial_attr.attr,
 	&cpu_partial_attr.attr,
 	&objects_partial_attr.attr,
@@ -7342,8 +8369,10 @@ static struct attribute *slab_attrs[] = {
 	&remote_node_defrag_ratio_attr.attr,
 #endif
 #ifdef CONFIG_SLUB_STATS
+	&alloc_cpu_sheaf_attr.attr,
 	&alloc_fastpath_attr.attr,
 	&alloc_slowpath_attr.attr,
+	&free_cpu_sheaf_attr.attr,
 	&free_fastpath_attr.attr,
 	&free_slowpath_attr.attr,
 	&free_frozen_attr.attr,
@@ -7368,6 +8397,14 @@ static struct attribute *slab_attrs[] = {
 	&cpu_partial_free_attr.attr,
 	&cpu_partial_node_attr.attr,
 	&cpu_partial_drain_attr.attr,
+	&sheaf_flush_attr.attr,
+	&sheaf_refill_attr.attr,
+	&sheaf_alloc_attr.attr,
+	&sheaf_free_attr.attr,
+	&barn_get_attr.attr,
+	&barn_get_fail_attr.attr,
+	&barn_put_attr.attr,
+	&barn_put_fail_attr.attr,
 #endif
 #ifdef CONFIG_FAILSLAB
 	&failslab_attr.attr,
-- 
2.51.0
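[With the sysfs plumbing above in place, a cache opts into percpu sheaves simply by passing a nonzero sheaf_capacity at creation time. A minimal usage sketch follows; the cache name and object type are hypothetical, the API is the kmem_cache_args-based kmem_cache_create() this series builds on:

	struct demo_struct {
		unsigned long payload[4];
	};

	struct kmem_cache_args args = {
		/* objects kept per sheaf; leaving this 0 disables sheaves */
		.sheaf_capacity = 32,
	};

	demo_cachep = kmem_cache_create("demo_struct", sizeof(struct demo_struct),
					&args, SLAB_HWCACHE_ALIGN);

Per the do_kmem_cache_create() hunk above, the request is silently ignored under CONFIG_SLUB_TINY or slub_debug and the cache then works without sheaves.]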
From nobody Fri Oct 3 18:10:03 2025
From: Vlastimil Babka
Date: Wed, 27 Aug 2025 10:26:35 +0200
Subject: [PATCH v6 03/10] slab: add sheaf support for batching kfree_rcu() operations
Message-Id: <20250827-slub-percpu-caches-v6-3-f0f775a3f73f@suse.cz>
References: <20250827-slub-percpu-caches-v6-0-f0f775a3f73f@suse.cz>
In-Reply-To: <20250827-slub-percpu-caches-v6-0-f0f775a3f73f@suse.cz>
To: Suren Baghdasaryan, "Liam R. Howlett", Christoph Lameter, David Rientjes
Howlett" , Christoph Lameter , David Rientjes Cc: Roman Gushchin , Harry Yoo , Uladzislau Rezki , linux-mm@kvack.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, maple-tree@lists.infradead.org, vbabka@suse.cz X-Mailer: b4 0.14.2 X-Spam-Level: X-Spamd-Result: default: False [-8.30 / 50.00]; REPLY(-4.00)[]; BAYES_HAM(-3.00)[100.00%]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_SHORT(-0.20)[-0.999]; MIME_GOOD(-0.10)[text/plain]; FUZZY_RATELIMITED(0.00)[rspamd.com]; RCVD_VIA_SMTP_AUTH(0.00)[]; MIME_TRACE(0.00)[0:+]; TO_DN_SOME(0.00)[]; RCPT_COUNT_TWELVE(0.00)[12]; ARC_NA(0.00)[]; RCVD_TLS_ALL(0.00)[]; FREEMAIL_ENVRCPT(0.00)[gmail.com]; R_RATELIMIT(0.00)[to_ip_from(RLwn5r54y1cp81no5tmbbew5oc)]; FROM_HAS_DN(0.00)[]; FREEMAIL_CC(0.00)[linux.dev,oracle.com,gmail.com,kvack.org,vger.kernel.org,lists.infradead.org,suse.cz]; MID_RHS_MATCH_FROM(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; RCVD_COUNT_TWO(0.00)[2]; TO_MATCH_ENVRCPT_ALL(0.00)[]; DKIM_SIGNED(0.00)[suse.cz:s=susede2_rsa,suse.cz:s=susede2_ed25519]; DBL_BLOCKED_OPENRESOLVER(0.00)[imap1.dmz-prg2.suse.org:helo,suse.cz:email,suse.cz:mid] X-Spam-Flag: NO X-Spam-Score: -8.30 Extend the sheaf infrastructure for more efficient kfree_rcu() handling. For caches with sheaves, on each cpu maintain a rcu_free sheaf in addition to main and spare sheaves. kfree_rcu() operations will try to put objects on this sheaf. Once full, the sheaf is detached and submitted to call_rcu() with a handler that will try to put it in the barn, or flush to slab pages using bulk free, when the barn is full. Then a new empty sheaf must be obtained to put more objects there. It's possible that no free sheaves are available to use for a new rcu_free sheaf, and the allocation in kfree_rcu() context can only use GFP_NOWAIT and thus may fail. In that case, fall back to the existing kfree_rcu() implementation. Expected advantages: - batching the kfree_rcu() operations, that could eventually replace the existing batching - sheaves can be reused for allocations via barn instead of being flushed to slabs, which is more efficient - this includes cases where only some cpus are allowed to process rcu callbacks (Android) Possible disadvantage: - objects might be waiting for more than their grace period (it is determined by the last object freed into the sheaf), increasing memory usage - but the existing batching does that too. Only implement this for CONFIG_KVFREE_RCU_BATCHED as the tiny implementation favors smaller memory footprint over performance. Add CONFIG_SLUB_STATS counters free_rcu_sheaf and free_rcu_sheaf_fail to count how many kfree_rcu() used the rcu_free sheaf successfully and how many had to fall back to the existing implementation. 
Reviewed-by: Harry Yoo
Reviewed-by: Suren Baghdasaryan
Signed-off-by: Vlastimil Babka
---
 mm/slab.h        |   2 +
 mm/slab_common.c |  24 +++++++
 mm/slub.c        | 193 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 214 insertions(+), 5 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index 206987ce44a4d053ebe3b5e50784d2dd23822cd1..f1866f2d9b211bb0d7f24644b80ef4b50a7c3d24 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -435,6 +435,8 @@ static inline bool is_kmalloc_normal(struct kmem_cache *s)
 	return !(s->flags & (SLAB_CACHE_DMA|SLAB_ACCOUNT|SLAB_RECLAIM_ACCOUNT));
 }
 
+bool __kfree_rcu_sheaf(struct kmem_cache *s, void *obj);
+
 #define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | \
 			 SLAB_CACHE_DMA32 | SLAB_PANIC | \
 			 SLAB_TYPESAFE_BY_RCU | SLAB_DEBUG_OBJECTS | \
diff --git a/mm/slab_common.c b/mm/slab_common.c
index e2b197e47866c30acdbd1fee4159f262a751c5a7..2d806e02568532a1000fd3912db6978e945dcfa8 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1608,6 +1608,27 @@ static void kfree_rcu_work(struct work_struct *work)
 	kvfree_rcu_list(head);
 }
 
+static bool kfree_rcu_sheaf(void *obj)
+{
+	struct kmem_cache *s;
+	struct folio *folio;
+	struct slab *slab;
+
+	if (is_vmalloc_addr(obj))
+		return false;
+
+	folio = virt_to_folio(obj);
+	if (unlikely(!folio_test_slab(folio)))
+		return false;
+
+	slab = folio_slab(folio);
+	s = slab->slab_cache;
+	if (s->cpu_sheaves)
+		return __kfree_rcu_sheaf(s, obj);
+
+	return false;
+}
+
 static bool need_offload_krc(struct kfree_rcu_cpu *krcp)
 {
@@ -1952,6 +1973,9 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr)
 	if (!head)
 		might_sleep();
 
+	if (kfree_rcu_sheaf(ptr))
+		return;
+
 	// Queue the object but don't yet schedule the batch.
 	if (debug_rcu_head_queue(ptr)) {
 		// Probable double kfree_rcu(), just leak.
diff --git a/mm/slub.c b/mm/slub.c
index 0822a817c28c2c4666e853ef0f433842c64f607a..7492076cf8c388793c09a64496a3b8850ef0d8ec 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -367,6 +367,8 @@ enum stat_item {
 	ALLOC_FASTPATH,		/* Allocation from cpu slab */
 	ALLOC_SLOWPATH,		/* Allocation by getting a new cpu slab */
 	FREE_PCS,		/* Free to percpu sheaf */
+	FREE_RCU_SHEAF,		/* Free to rcu_free sheaf */
+	FREE_RCU_SHEAF_FAIL,	/* Failed to free to a rcu_free sheaf */
 	FREE_FASTPATH,		/* Free to cpu slab */
 	FREE_SLOWPATH,		/* Freeing not to cpu slab */
 	FREE_FROZEN,		/* Freeing to frozen slab */
@@ -461,6 +463,7 @@ struct slab_sheaf {
 		struct rcu_head rcu_head;
 		struct list_head barn_list;
 	};
+	struct kmem_cache *cache;
 	unsigned int size;
 	void *objects[];
 };
@@ -469,6 +472,7 @@ struct slub_percpu_sheaves {
 	local_trylock_t lock;
 	struct slab_sheaf *main; /* never NULL when unlocked */
 	struct slab_sheaf *spare; /* empty or full, may be NULL */
+	struct slab_sheaf *rcu_free; /* for batching kfree_rcu() */
 	struct node_barn *barn;
 };
 
@@ -2526,6 +2530,8 @@ static struct slab_sheaf *alloc_empty_sheaf(struct kmem_cache *s, gfp_t gfp)
 	if (unlikely(!sheaf))
 		return NULL;
 
+	sheaf->cache = s;
+
 	stat(s, SHEAF_ALLOC);
 
 	return sheaf;
@@ -2650,6 +2656,43 @@ static void sheaf_flush_unused(struct kmem_cache *s, struct slab_sheaf *sheaf)
 	sheaf->size = 0;
 }
 
+static void __rcu_free_sheaf_prepare(struct kmem_cache *s,
+				     struct slab_sheaf *sheaf)
+{
+	bool init = slab_want_init_on_free(s);
+	void **p = &sheaf->objects[0];
+	unsigned int i = 0;
+
+	while (i < sheaf->size) {
+		struct slab *slab = virt_to_slab(p[i]);
+
+		memcg_slab_free_hook(s, slab, p + i, 1);
+		alloc_tagging_slab_free_hook(s, slab, p + i, 1);
+
+		if (unlikely(!slab_free_hook(s, p[i], init, true))) {
+			p[i] = p[--sheaf->size];
+			continue;
+		}
+
+		i++;
+	}
+}
+
+static void rcu_free_sheaf_nobarn(struct rcu_head *head)
+{
+	struct slab_sheaf *sheaf;
+	struct kmem_cache *s;
+
+	sheaf = container_of(head, struct slab_sheaf, rcu_head);
+	s = sheaf->cache;
+
+	__rcu_free_sheaf_prepare(s, sheaf);
+
+	sheaf_flush_unused(s, sheaf);
+
+	free_empty_sheaf(s, sheaf);
+}
+
 /*
  * Caller needs to make sure migration is disabled in order to fully flush
  * single cpu's sheaves
@@ -2662,7 +2705,7 @@ static void sheaf_flush_unused(struct kmem_cache *s, struct slab_sheaf *sheaf)
 static void pcs_flush_all(struct kmem_cache *s)
 {
 	struct slub_percpu_sheaves *pcs;
-	struct slab_sheaf *spare;
+	struct slab_sheaf *spare, *rcu_free;
 
 	local_lock(&s->cpu_sheaves->lock);
 	pcs = this_cpu_ptr(s->cpu_sheaves);
@@ -2670,6 +2713,9 @@ static void pcs_flush_all(struct kmem_cache *s)
 	spare = pcs->spare;
 	pcs->spare = NULL;
 
+	rcu_free = pcs->rcu_free;
+	pcs->rcu_free = NULL;
+
 	local_unlock(&s->cpu_sheaves->lock);
 
 	if (spare) {
@@ -2677,6 +2723,9 @@ static void pcs_flush_all(struct kmem_cache *s)
 		free_empty_sheaf(s, spare);
 	}
 
+	if (rcu_free)
+		call_rcu(&rcu_free->rcu_head, rcu_free_sheaf_nobarn);
+
 	sheaf_flush_main(s);
 }
 
@@ -2693,6 +2742,11 @@ static void __pcs_flush_all_cpu(struct kmem_cache *s, unsigned int cpu)
 		free_empty_sheaf(s, pcs->spare);
 		pcs->spare = NULL;
 	}
+
+	if (pcs->rcu_free) {
+		call_rcu(&pcs->rcu_free->rcu_head, rcu_free_sheaf_nobarn);
+		pcs->rcu_free = NULL;
+	}
 }
 
 static void pcs_destroy(struct kmem_cache *s)
@@ -2718,6 +2772,7 @@ static void pcs_destroy(struct kmem_cache *s)
 		 */
 
 		WARN_ON(pcs->spare);
+		WARN_ON(pcs->rcu_free);
 
 		if (!WARN_ON(pcs->main->size)) {
 			free_empty_sheaf(s, pcs->main);
@@ -3775,7 +3830,7 @@ static bool has_pcs_used(int cpu, struct kmem_cache *s)
 
 	pcs = per_cpu_ptr(s->cpu_sheaves, cpu);
 
-	return (pcs->spare || pcs->main->size);
+	return (pcs->spare || pcs->rcu_free || pcs->main->size);
 }
 
 static void pcs_flush_all(struct kmem_cache *s);
@@ -5401,6 +5456,127 @@ bool free_to_pcs(struct kmem_cache *s, void *object)
 	return true;
 }
 
+static void rcu_free_sheaf(struct rcu_head *head)
+{
+	struct slab_sheaf *sheaf;
+	struct node_barn *barn;
+	struct kmem_cache *s;
+
+	sheaf = container_of(head, struct slab_sheaf, rcu_head);
+
+	s = sheaf->cache;
+
+	/*
+	 * This may remove some objects due to slab_free_hook() returning false,
+	 * so that the sheaf might no longer be completely full. But it's easier
+	 * to handle it as full (unless it became completely empty), as the code
+	 * handles it fine. The only downside is that sheaf will serve fewer
+	 * allocations when reused. It only happens due to debugging, which is a
+	 * performance hit anyway.
+	 */
+	__rcu_free_sheaf_prepare(s, sheaf);
+
+	barn = get_node(s, numa_mem_id())->barn;
+
+	/* due to slab_free_hook() */
+	if (unlikely(sheaf->size == 0))
+		goto empty;
+
+	/*
+	 * Checking nr_full/nr_empty outside lock avoids contention in case the
+	 * barn is at the respective limit. Due to the race we might go over the
+	 * limit but that should be rare and harmless.
+	 */
+
+	if (data_race(barn->nr_full) < MAX_FULL_SHEAVES) {
+		stat(s, BARN_PUT);
+		barn_put_full_sheaf(barn, sheaf);
+		return;
+	}
+
+	stat(s, BARN_PUT_FAIL);
+	sheaf_flush_unused(s, sheaf);
+
+empty:
+	if (data_race(barn->nr_empty) < MAX_EMPTY_SHEAVES) {
+		barn_put_empty_sheaf(barn, sheaf);
+		return;
+	}
+
+	free_empty_sheaf(s, sheaf);
+}
+
+bool __kfree_rcu_sheaf(struct kmem_cache *s, void *obj)
+{
+	struct slub_percpu_sheaves *pcs;
+	struct slab_sheaf *rcu_sheaf;
+
+	if (!local_trylock(&s->cpu_sheaves->lock))
+		goto fail;
+
+	pcs = this_cpu_ptr(s->cpu_sheaves);
+
+	if (unlikely(!pcs->rcu_free)) {
+
+		struct slab_sheaf *empty;
+
+		if (pcs->spare && pcs->spare->size == 0) {
+			pcs->rcu_free = pcs->spare;
+			pcs->spare = NULL;
+			goto do_free;
+		}
+
+		empty = barn_get_empty_sheaf(pcs->barn);
+
+		if (empty) {
+			pcs->rcu_free = empty;
+			goto do_free;
+		}
+
+		local_unlock(&s->cpu_sheaves->lock);
+
+		empty = alloc_empty_sheaf(s, GFP_NOWAIT);
+
+		if (!empty)
+			goto fail;
+
+		if (!local_trylock(&s->cpu_sheaves->lock)) {
+			barn_put_empty_sheaf(pcs->barn, empty);
+			goto fail;
+		}
+
+		pcs = this_cpu_ptr(s->cpu_sheaves);
+
+		if (unlikely(pcs->rcu_free))
+			barn_put_empty_sheaf(pcs->barn, empty);
+		else
+			pcs->rcu_free = empty;
+	}
+
+do_free:
+
+	rcu_sheaf = pcs->rcu_free;
+
+	rcu_sheaf->objects[rcu_sheaf->size++] = obj;
+
+	if (likely(rcu_sheaf->size < s->sheaf_capacity))
+		rcu_sheaf = NULL;
+	else
+		pcs->rcu_free = NULL;
+
+	local_unlock(&s->cpu_sheaves->lock);
+
+	if (rcu_sheaf)
+		call_rcu(&rcu_sheaf->rcu_head, rcu_free_sheaf);
+
+	stat(s, FREE_RCU_SHEAF);
+	return true;
+
+fail:
+	stat(s, FREE_RCU_SHEAF_FAIL);
+	return false;
+}
+
 /*
  * Bulk free objects to the percpu sheaves.
 * Unlike free_to_pcs() this includes the calls to all necessary hooks
@@ -5410,10 +5586,8 @@ static void free_to_pcs_bulk(struct kmem_cache *s, size_t size, void **p)
 {
 	struct slub_percpu_sheaves *pcs;
 	struct slab_sheaf *main, *empty;
+	bool init = slab_want_init_on_free(s);
 	unsigned int batch, i = 0;
-	bool init;
-
-	init = slab_want_init_on_free(s);
 
 	while (i < size) {
 		struct slab *slab = virt_to_slab(p[i]);
@@ -6900,6 +7074,11 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
 	struct kmem_cache_node *n;
 
 	flush_all_cpus_locked(s);
+
+	/* we might have rcu sheaves in flight */
+	if (s->cpu_sheaves)
+		rcu_barrier();
+
 	/* Attempt to free all objects */
 	for_each_kmem_cache_node(s, node, n) {
 		if (n->barn)
@@ -8275,6 +8454,8 @@ STAT_ATTR(ALLOC_PCS, alloc_cpu_sheaf);
 STAT_ATTR(ALLOC_FASTPATH, alloc_fastpath);
 STAT_ATTR(ALLOC_SLOWPATH, alloc_slowpath);
 STAT_ATTR(FREE_PCS, free_cpu_sheaf);
+STAT_ATTR(FREE_RCU_SHEAF, free_rcu_sheaf);
+STAT_ATTR(FREE_RCU_SHEAF_FAIL, free_rcu_sheaf_fail);
 STAT_ATTR(FREE_FASTPATH, free_fastpath);
 STAT_ATTR(FREE_SLOWPATH, free_slowpath);
 STAT_ATTR(FREE_FROZEN, free_frozen);
@@ -8373,6 +8554,8 @@ static struct attribute *slab_attrs[] = {
 	&alloc_fastpath_attr.attr,
 	&alloc_slowpath_attr.attr,
 	&free_cpu_sheaf_attr.attr,
+	&free_rcu_sheaf_attr.attr,
+	&free_rcu_sheaf_fail_attr.attr,
 	&free_fastpath_attr.attr,
 	&free_slowpath_attr.attr,
 	&free_frozen_attr.attr,
-- 
2.51.0
From nobody Fri Oct 3 18:10:03 2025
From: Vlastimil Babka
Date: Wed, 27 Aug 2025 10:26:36 +0200
Subject: [PATCH v6 04/10] slab: sheaf prefilling for guaranteed allocations
Message-Id: <20250827-slub-percpu-caches-v6-4-f0f775a3f73f@suse.cz>
References: <20250827-slub-percpu-caches-v6-0-f0f775a3f73f@suse.cz>
In-Reply-To: <20250827-slub-percpu-caches-v6-0-f0f775a3f73f@suse.cz>
To: Suren Baghdasaryan, "Liam R. Howlett", Christoph Lameter, David Rientjes
Cc: Roman Gushchin, Harry Yoo, Uladzislau Rezki, linux-mm@kvack.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, maple-tree@lists.infradead.org, vbabka@suse.cz

Add functions for efficient guaranteed allocations e.g. in a critical
section that cannot sleep, when the exact number of allocations is not
known beforehand, but an upper limit can be calculated.

kmem_cache_prefill_sheaf() returns a sheaf containing at least the given
number of objects.

kmem_cache_alloc_from_sheaf() will allocate an object from the sheaf and
is guaranteed not to fail until the sheaf is depleted.

kmem_cache_return_sheaf() is for giving the sheaf back to the slab
allocator after the critical section. This will also attempt to refill
it to the cache's sheaf capacity for better efficiency of sheaves
handling, but it's not strictly necessary for that refill to succeed.

kmem_cache_refill_sheaf() can be used to refill a previously obtained
sheaf to the requested size. If the current size is sufficient, it does
nothing. If the requested size exceeds the cache's sheaf_capacity and
the sheaf's current capacity, the sheaf will be replaced with a new one,
hence the indirect pointer parameter.

kmem_cache_sheaf_size() can be used to query the current size.

The implementation supports requesting sizes that exceed the cache's
sheaf_capacity, but it is not efficient - such "oversize" sheaves are
allocated fresh in kmem_cache_prefill_sheaf() and flushed and freed
immediately by kmem_cache_return_sheaf(). kmem_cache_refill_sheaf()
might be especially ineffective when replacing a sheaf with a new one of
a larger capacity. It is therefore better to size the cache's
sheaf_capacity accordingly to make oversize sheaves exceptional.

CONFIG_SLUB_STATS counters are added for sheaf prefill and return
operations. A prefill or return is considered _fast when it is able to
grab or return a percpu spare sheaf (even if the sheaf needs a refill to
satisfy the request, as those should amortize over time), and _slow
otherwise (when the barn or even sheaf allocation/freeing has to be
involved).
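[A usage sketch of the API just described; the cache pointer s, the demo_lock and the max_objs bound are hypothetical caller context:

	struct slab_sheaf *sheaf;
	void *obj;

	/* may sleep; done before entering the critical section */
	sheaf = kmem_cache_prefill_sheaf(s, GFP_KERNEL, max_objs);
	if (!sheaf)
		return -ENOMEM;

	spin_lock(&demo_lock);
	/* guaranteed to succeed up to max_objs times */
	obj = kmem_cache_alloc_from_sheaf(s, GFP_KERNEL, sheaf);
	spin_unlock(&demo_lock);

	/* also tries to refill the sheaf so it can be cached for reuse */
	kmem_cache_return_sheaf(s, GFP_KERNEL, sheaf);

As the kerneldoc in the diff below notes, the gfp passed to kmem_cache_alloc_from_sheaf() only matters for __GFP_ZERO and __GFP_ACCOUNT.]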
sheaf_prefill_oversize is provided to determine how many prefills were
oversize (a counter for oversize returns is not necessary, as all
oversize refills result in oversize returns).

When slub_debug is enabled for a cache with sheaves, no percpu sheaves
exist for it, but the prefill functionality is still provided simply by
all prefilled sheaves becoming oversize. If percpu sheaves are not
created for a cache due to not passing the sheaf_capacity argument on
cache creation, the prefills also work through oversize sheaves, but
there's a WARN_ON_ONCE() to indicate the omission.

Reviewed-by: Suren Baghdasaryan
Reviewed-by: Harry Yoo
Signed-off-by: Vlastimil Babka
---
 include/linux/slab.h |  16 ++++
 mm/slub.c            | 265 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 281 insertions(+)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 49acbcdc6696fd120c402adf757b3f41660ad50a..680193356ac7a22f9df5cd9b71ff8b81e26404ad 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -829,6 +829,22 @@ void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t flags, int node)
 			__assume_slab_alignment __malloc;
 #define kmem_cache_alloc_node(...)	alloc_hooks(kmem_cache_alloc_node_noprof(__VA_ARGS__))
 
+struct slab_sheaf *
+kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned int size);
+
+int kmem_cache_refill_sheaf(struct kmem_cache *s, gfp_t gfp,
+		struct slab_sheaf **sheafp, unsigned int size);
+
+void kmem_cache_return_sheaf(struct kmem_cache *s, gfp_t gfp,
+		struct slab_sheaf *sheaf);
+
+void *kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *cachep, gfp_t gfp,
+		struct slab_sheaf *sheaf) __assume_slab_alignment __malloc;
+#define kmem_cache_alloc_from_sheaf(...)	\
+		alloc_hooks(kmem_cache_alloc_from_sheaf_noprof(__VA_ARGS__))
+
+unsigned int kmem_cache_sheaf_size(struct slab_sheaf *sheaf);
+
 /*
  * These macros allow declaring a kmem_buckets * parameter alongside size, which
  * can be compiled out with CONFIG_SLAB_BUCKETS=n so that a large number of call
diff --git a/mm/slub.c b/mm/slub.c
index 7492076cf8c388793c09a64496a3b8850ef0d8ec..c8dda640f95e7e738cf2ceb05b98d1176df6e83f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -401,6 +401,11 @@ enum stat_item {
 	BARN_GET_FAIL,		/* Failed to get full sheaf from barn */
 	BARN_PUT,		/* Put full sheaf to barn */
 	BARN_PUT_FAIL,		/* Failed to put full sheaf to barn */
+	SHEAF_PREFILL_FAST,	/* Sheaf prefill grabbed the spare sheaf */
+	SHEAF_PREFILL_SLOW,	/* Sheaf prefill found no spare sheaf */
+	SHEAF_PREFILL_OVERSIZE,	/* Allocation of oversize sheaf for prefill */
+	SHEAF_RETURN_FAST,	/* Sheaf return reattached spare sheaf */
+	SHEAF_RETURN_SLOW,	/* Sheaf return could not reattach spare */
 	NR_SLUB_STAT_ITEMS
 };
 
@@ -462,6 +467,8 @@ struct slab_sheaf {
 	union {
 		struct rcu_head rcu_head;
 		struct list_head barn_list;
+		/* only used for prefilled sheaves */
+		unsigned int capacity;
 	};
 	struct kmem_cache *cache;
 	unsigned int size;
@@ -2833,6 +2840,30 @@ static void barn_put_full_sheaf(struct node_barn *barn, struct slab_sheaf *sheaf)
 	spin_unlock_irqrestore(&barn->lock, flags);
 }
 
+static struct slab_sheaf *barn_get_full_or_empty_sheaf(struct node_barn *barn)
+{
+	struct slab_sheaf *sheaf = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&barn->lock, flags);
+
+	if (barn->nr_full) {
+		sheaf = list_first_entry(&barn->sheaves_full, struct slab_sheaf,
+					 barn_list);
+		list_del(&sheaf->barn_list);
+		barn->nr_full--;
+	} else if (barn->nr_empty) {
+		sheaf = list_first_entry(&barn->sheaves_empty,
+					 struct slab_sheaf, barn_list);
+		list_del(&sheaf->barn_list);
+		barn->nr_empty--;
+	}
+
+	spin_unlock_irqrestore(&barn->lock, flags);
+
+	return sheaf;
+}
+
 /*
  * If a full sheaf is available, return it and put the supplied empty one to
  * barn. We ignore the limit on empty sheaves as the number of sheaves doesn't
@@ -4962,6 +4993,230 @@ void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t gfpflags, int nod
 }
 EXPORT_SYMBOL(kmem_cache_alloc_node_noprof);
 
+/*
+ * returns a sheaf that has at least the requested size
+ * when prefilling is needed, do so with given gfp flags
+ *
+ * return NULL if sheaf allocation or prefilling failed
+ */
+struct slab_sheaf *
+kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned int size)
+{
+	struct slub_percpu_sheaves *pcs;
+	struct slab_sheaf *sheaf = NULL;
+
+	if (unlikely(size > s->sheaf_capacity)) {
+
+		/*
+		 * slab_debug disables cpu sheaves intentionally so all
+		 * prefilled sheaves become "oversize" and we give up on
+		 * performance for the debugging. Same with SLUB_TINY.
+		 * Creating a cache without sheaves and then requesting a
+		 * prefilled sheaf is however not expected, so warn.
+		 */
+		WARN_ON_ONCE(s->sheaf_capacity == 0 &&
+			     !IS_ENABLED(CONFIG_SLUB_TINY) &&
+			     !(s->flags & SLAB_DEBUG_FLAGS));
+
+		sheaf = kzalloc(struct_size(sheaf, objects, size), gfp);
+		if (!sheaf)
+			return NULL;
+
+		stat(s, SHEAF_PREFILL_OVERSIZE);
+		sheaf->cache = s;
+		sheaf->capacity = size;
+
+		if (!__kmem_cache_alloc_bulk(s, gfp, size,
+					     &sheaf->objects[0])) {
+			kfree(sheaf);
+			return NULL;
+		}
+
+		sheaf->size = size;
+
+		return sheaf;
+	}
+
+	local_lock(&s->cpu_sheaves->lock);
+	pcs = this_cpu_ptr(s->cpu_sheaves);
+
+	if (pcs->spare) {
+		sheaf = pcs->spare;
+		pcs->spare = NULL;
+		stat(s, SHEAF_PREFILL_FAST);
+	} else {
+		stat(s, SHEAF_PREFILL_SLOW);
+		sheaf = barn_get_full_or_empty_sheaf(pcs->barn);
+		if (sheaf && sheaf->size)
+			stat(s, BARN_GET);
+		else
+			stat(s, BARN_GET_FAIL);
+	}
+
+	local_unlock(&s->cpu_sheaves->lock);
+
+
+	if (!sheaf)
+		sheaf = alloc_empty_sheaf(s, gfp);
+
+	if (sheaf && sheaf->size < size) {
+		if (refill_sheaf(s, sheaf, gfp)) {
+			sheaf_flush_unused(s, sheaf);
+			free_empty_sheaf(s, sheaf);
+			sheaf = NULL;
+		}
+	}
+
+	if (sheaf)
+		sheaf->capacity = s->sheaf_capacity;
+
+	return sheaf;
+}
+
+/*
+ * Use this to return a sheaf obtained by kmem_cache_prefill_sheaf()
+ *
+ * If the sheaf cannot simply become the percpu spare sheaf, but there's space
+ * for a full sheaf in the barn, we try to refill the sheaf back to the cache's
+ * sheaf_capacity to avoid handling partially full sheaves.
+ *
+ * If the refill fails because gfp is e.g. GFP_NOWAIT, or the barn is full, the
+ * sheaf is instead flushed and freed.
+ */
+void kmem_cache_return_sheaf(struct kmem_cache *s, gfp_t gfp,
+			     struct slab_sheaf *sheaf)
+{
+	struct slub_percpu_sheaves *pcs;
+	struct node_barn *barn;
+
+	if (unlikely(sheaf->capacity != s->sheaf_capacity)) {
+		sheaf_flush_unused(s, sheaf);
+		kfree(sheaf);
+		return;
+	}
+
+	local_lock(&s->cpu_sheaves->lock);
+	pcs = this_cpu_ptr(s->cpu_sheaves);
+
+	if (!pcs->spare) {
+		pcs->spare = sheaf;
+		sheaf = NULL;
+		stat(s, SHEAF_RETURN_FAST);
+	}
+
+	local_unlock(&s->cpu_sheaves->lock);
+
+	if (!sheaf)
+		return;
+
+	stat(s, SHEAF_RETURN_SLOW);
+
+	/* Accessing pcs->barn outside local_lock is safe. */
+	barn = pcs->barn;
+
+	/*
+	 * If the barn has too many full sheaves or we fail to refill the sheaf,
+	 * simply flush and free it.
+	 */
+	if (data_race(pcs->barn->nr_full) >= MAX_FULL_SHEAVES ||
+	    refill_sheaf(s, sheaf, gfp)) {
+		sheaf_flush_unused(s, sheaf);
+		free_empty_sheaf(s, sheaf);
+		return;
+	}
+
+	barn_put_full_sheaf(barn, sheaf);
+	stat(s, BARN_PUT);
+}
+
+/*
+ * refill a sheaf previously returned by kmem_cache_prefill_sheaf to at least
+ * the given size
+ *
+ * the sheaf might be replaced by a new one when requesting more than
+ * s->sheaf_capacity objects if such replacement is necessary; if the refill
+ * fails (returning -ENOMEM), the existing sheaf is left intact
+ *
+ * In practice we always refill to the full sheaf capacity.
+ */
+int kmem_cache_refill_sheaf(struct kmem_cache *s, gfp_t gfp,
+			    struct slab_sheaf **sheafp, unsigned int size)
+{
+	struct slab_sheaf *sheaf;
+
+	/*
+	 * TODO: do we want to support *sheaf == NULL to be equivalent of
+	 * kmem_cache_prefill_sheaf() ?
+	 */
+	if (!sheafp || !(*sheafp))
+		return -EINVAL;
+
+	sheaf = *sheafp;
+	if (sheaf->size >= size)
+		return 0;
+
+	if (likely(sheaf->capacity >= size)) {
+		if (likely(sheaf->capacity == s->sheaf_capacity))
+			return refill_sheaf(s, sheaf, gfp);
+
+		if (!__kmem_cache_alloc_bulk(s, gfp, sheaf->capacity - sheaf->size,
+					     &sheaf->objects[sheaf->size])) {
+			return -ENOMEM;
+		}
+		sheaf->size = sheaf->capacity;
+
+		return 0;
+	}
+
+	/*
+	 * We had a regular sized sheaf and need an oversize one, or we had an
+	 * oversize one already but need a larger one now.
+	 * This should be a very rare path so let's not complicate it.
+	 */
+	sheaf = kmem_cache_prefill_sheaf(s, gfp, size);
+	if (!sheaf)
+		return -ENOMEM;
+
+	kmem_cache_return_sheaf(s, gfp, *sheafp);
+	*sheafp = sheaf;
+	return 0;
+}
+
+/*
+ * Allocate from a sheaf obtained by kmem_cache_prefill_sheaf()
+ *
+ * Guaranteed not to fail for as many allocations as was the requested size.
+ * After the sheaf is emptied, it fails - no fallback to the slab cache itself.
+ *
+ * The gfp parameter is meant only to specify __GFP_ZERO or __GFP_ACCOUNT;
+ * memcg charging is forced over limit if necessary, to avoid failure.
+ */
+void *
+kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *s, gfp_t gfp,
+				   struct slab_sheaf *sheaf)
+{
+	void *ret = NULL;
+	bool init;
+
+	if (sheaf->size == 0)
+		goto out;
+
+	ret = sheaf->objects[--sheaf->size];
+
+	init = slab_want_init_on_alloc(gfp, s);
+
+	/* add __GFP_NOFAIL to force successful memcg charging */
+	slab_post_alloc_hook(s, NULL, gfp | __GFP_NOFAIL, 1, &ret, init, s->object_size);
+out:
+	trace_kmem_cache_alloc(_RET_IP_, ret, s, gfp, NUMA_NO_NODE);
+
+	return ret;
+}
+
+unsigned int kmem_cache_sheaf_size(struct slab_sheaf *sheaf)
+{
+	return sheaf->size;
+}
+
 /*
  * To avoid unnecessary overhead, we pass through large allocation requests
  * directly to the page allocator. We use __GFP_COMP, because we will need to
@@ -8488,6 +8743,11 @@ STAT_ATTR(BARN_GET, barn_get);
 STAT_ATTR(BARN_GET_FAIL, barn_get_fail);
 STAT_ATTR(BARN_PUT, barn_put);
 STAT_ATTR(BARN_PUT_FAIL, barn_put_fail);
+STAT_ATTR(SHEAF_PREFILL_FAST, sheaf_prefill_fast);
+STAT_ATTR(SHEAF_PREFILL_SLOW, sheaf_prefill_slow);
+STAT_ATTR(SHEAF_PREFILL_OVERSIZE, sheaf_prefill_oversize);
+STAT_ATTR(SHEAF_RETURN_FAST, sheaf_return_fast);
+STAT_ATTR(SHEAF_RETURN_SLOW, sheaf_return_slow);
 #endif	/* CONFIG_SLUB_STATS */
 
 #ifdef CONFIG_KFENCE
@@ -8588,6 +8848,11 @@ static struct attribute *slab_attrs[] = {
 	&barn_get_fail_attr.attr,
 	&barn_put_attr.attr,
 	&barn_put_fail_attr.attr,
+	&sheaf_prefill_fast_attr.attr,
+	&sheaf_prefill_slow_attr.attr,
+	&sheaf_prefill_oversize_attr.attr,
+	&sheaf_return_fast_attr.attr,
+	&sheaf_return_slow_attr.attr,
 #endif
 #ifdef CONFIG_FAILSLAB
 	&failslab_attr.attr,
-- 
2.51.0

From nobody Fri Oct 3 18:10:03 2025
From: Vlastimil Babka
Date: Wed, 27 Aug 2025 10:26:37 +0200
Subject: [PATCH v6 05/10] slab: determine barn status racily outside of lock
Message-Id: <20250827-slub-percpu-caches-v6-5-f0f775a3f73f@suse.cz>
References: <20250827-slub-percpu-caches-v6-0-f0f775a3f73f@suse.cz>
In-Reply-To: <20250827-slub-percpu-caches-v6-0-f0f775a3f73f@suse.cz>
To: Suren Baghdasaryan, "Liam R. Howlett", Christoph Lameter, David Rientjes
Cc: Roman Gushchin, Harry Yoo, Uladzislau Rezki, linux-mm@kvack.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, maple-tree@lists.infradead.org, vbabka@suse.cz

Whether many barn operations can succeed at all is determined by the
current number of full or empty sheaves. Taking the barn->lock just to
find out that e.g. there are no empty sheaves results in unnecessary
overhead and lock contention. Thus perform these checks outside of the
lock with a data_race() annotated variable read and fail quickly without
taking the lock.

Checks for sheaf availability that racily succeed obviously have to be
repeated under the lock for correctness, but we can skip repeating
checks if there are too many sheaves on the given list, as those limits
don't need to be strict.

Signed-off-by: Vlastimil Babka
Reviewed-by: Suren Baghdasaryan
Reviewed-by: Harry Yoo
---
 mm/slub.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index c8dda640f95e7e738cf2ceb05b98d1176df6e83f..ee3a222acd6b15389a71bb47429d22b5326a4624 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2796,9 +2796,12 @@ static struct slab_sheaf *barn_get_empty_sheaf(struct node_barn *barn)
 	struct slab_sheaf *empty = NULL;
 	unsigned long flags;
 
+	if (!data_race(barn->nr_empty))
+		return NULL;
+
 	spin_lock_irqsave(&barn->lock, flags);
 
-	if (barn->nr_empty) {
+	if (likely(barn->nr_empty)) {
 		empty = list_first_entry(&barn->sheaves_empty,
 					 struct slab_sheaf, barn_list);
 		list_del(&empty->barn_list);
@@ -2845,6 +2848,9 @@ static struct slab_sheaf *barn_get_full_or_empty_sheaf(struct node_barn *barn)
 	struct slab_sheaf *sheaf = NULL;
 	unsigned long flags;
 
+	if (!data_race(barn->nr_full) && !data_race(barn->nr_empty))
+		return NULL;
+
 	spin_lock_irqsave(&barn->lock, flags);
 
 	if (barn->nr_full) {
@@ -2875,9 +2881,12 @@ barn_replace_empty_sheaf(struct node_barn *barn, struct slab_sheaf *empty)
 	struct slab_sheaf *full = NULL;
 	unsigned long flags;
 
+	if (!data_race(barn->nr_full))
+		return NULL;
+
 	spin_lock_irqsave(&barn->lock, flags);
 
-	if (barn->nr_full) {
+	if (likely(barn->nr_full)) {
 		full = list_first_entry(&barn->sheaves_full, struct slab_sheaf,
 					barn_list);
 		list_del(&full->barn_list);
@@ -2901,19 +2910,23 @@ barn_replace_full_sheaf(struct node_barn *barn, struct slab_sheaf *full)
 	struct slab_sheaf *empty;
 	unsigned long flags;
 
+	/* we don't repeat this check under barn->lock as it's not critical */
+	if (data_race(barn->nr_full) >= MAX_FULL_SHEAVES)
+		return ERR_PTR(-E2BIG);
+
+	if (!data_race(barn->nr_empty))
+		return ERR_PTR(-ENOMEM);
+
 	spin_lock_irqsave(&barn->lock, flags);
 
-	if (barn->nr_full >= MAX_FULL_SHEAVES) {
-		empty = ERR_PTR(-E2BIG);
-	} else if (!barn->nr_empty) {
-		empty = ERR_PTR(-ENOMEM);
-	} else {
+	if (likely(barn->nr_empty)) {
 		empty = list_first_entry(&barn->sheaves_empty,
 					 struct slab_sheaf, barn_list);
 		list_del(&empty->barn_list);
 		list_add(&full->barn_list, &barn->sheaves_full);
 		barn->nr_empty--;
 		barn->nr_full++;
+	} else {
+		empty = ERR_PTR(-ENOMEM);
 	}
 
 	spin_unlock_irqrestore(&barn->lock, flags);
-- 
2.51.0
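[The resulting pattern, distilled into a standalone sketch; barn_try_get_empty() is a renamed restatement of barn_get_empty_sheaf() from the diff above, not new code. The unlocked data_race() read only gates the fast path, so a false positive is simply rechecked under the lock, while limit checks may stay approximate and are not repeated:

	static struct slab_sheaf *barn_try_get_empty(struct node_barn *barn)
	{
		struct slab_sheaf *empty = NULL;
		unsigned long flags;

		if (!data_race(barn->nr_empty))		/* racy, lockless */
			return NULL;

		spin_lock_irqsave(&barn->lock, flags);
		if (likely(barn->nr_empty)) {		/* authoritative recheck */
			empty = list_first_entry(&barn->sheaves_empty,
						 struct slab_sheaf, barn_list);
			list_del(&empty->barn_list);
			barn->nr_empty--;
		}
		spin_unlock_irqrestore(&barn->lock, flags);

		return empty;
	}
]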
Howlett" , Christoph Lameter , David Rientjes Cc: Roman Gushchin , Harry Yoo , Uladzislau Rezki , linux-mm@kvack.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, maple-tree@lists.infradead.org, vbabka@suse.cz X-Mailer: b4 0.14.2 X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Rspamd-Action: no action X-Spam-Flag: NO X-Spam-Level: X-Rspamd-Server: rspamd2.dmz-prg2.suse.org X-Spamd-Result: default: False [-4.00 / 50.00]; REPLY(-4.00)[] X-Rspamd-Queue-Id: 8CABB1FF23 X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Spam-Score: -4.00 Since we don't control the NUMA locality of objects in percpu sheaves, allocations with node restrictions bypass them. Allocations without restrictions may however still expect to get local objects with high probability, and the introduction of sheaves can decrease it due to freed object from a remote node ending up in percpu sheaves. The fraction of such remote frees seems low (5% on an 8-node machine) but it can be expected that some cache or workload specific corner cases exist. We can either conclude that this is not a problem due to the low fraction, or we can make remote frees bypass percpu sheaves and go directly to their slabs. This will make the remote frees more expensive, but if if's only a small fraction, most frees will still benefit from the lower overhead of percpu sheaves. This patch thus makes remote object freeing bypass percpu sheaves, including bulk freeing, and kfree_rcu() via the rcu_free sheaf. However it's not intended to be 100% guarantee that percpu sheaves will only contain local objects. The refill from slabs does not provide that guarantee in the first place, and there might be cpu migrations happening when we need to unlock the local_lock. Avoiding all that could be possible but complicated so we can leave it for later investigation whether it would be worth it. It can be expected that the more selective freeing will itself prevent accumulation of remote objects in percpu sheaves so any such violations would have only short-term effects. 
Reviewed-by: Harry Yoo
Signed-off-by: Vlastimil Babka
---
 mm/slab_common.c |  7 +++++--
 mm/slub.c        | 42 ++++++++++++++++++++++++++++++++++++------
 2 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 2d806e02568532a1000fd3912db6978e945dcfa8..08f5baee1309e5b5f10a22b8b3b0a09dfb314419 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1623,8 +1623,11 @@ static bool kfree_rcu_sheaf(void *obj)
 
 	slab = folio_slab(folio);
 	s = slab->slab_cache;
-	if (s->cpu_sheaves)
-		return __kfree_rcu_sheaf(s, obj);
+	if (s->cpu_sheaves) {
+		if (likely(!IS_ENABLED(CONFIG_NUMA) ||
+			   slab_nid(slab) == numa_mem_id()))
+			return __kfree_rcu_sheaf(s, obj);
+	}
 
 	return false;
 }
diff --git a/mm/slub.c b/mm/slub.c
index ee3a222acd6b15389a71bb47429d22b5326a4624..b37e684457e7d14781466c0086d1b64df2fd8e9d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -472,6 +472,7 @@ struct slab_sheaf {
 	};
 	struct kmem_cache *cache;
 	unsigned int size;
+	int node; /* only used for rcu_sheaf */
 	void *objects[];
 };
 
@@ -5744,7 +5745,7 @@ static void rcu_free_sheaf(struct rcu_head *head)
 	 */
 	__rcu_free_sheaf_prepare(s, sheaf);
 
-	barn = get_node(s, numa_mem_id())->barn;
+	barn = get_node(s, sheaf->node)->barn;
 
 	/* due to slab_free_hook() */
 	if (unlikely(sheaf->size == 0))
@@ -5827,10 +5828,12 @@ bool __kfree_rcu_sheaf(struct kmem_cache *s, void *obj)
 
 	rcu_sheaf->objects[rcu_sheaf->size++] = obj;
 
-	if (likely(rcu_sheaf->size < s->sheaf_capacity))
+	if (likely(rcu_sheaf->size < s->sheaf_capacity)) {
 		rcu_sheaf = NULL;
-	else
+	} else {
 		pcs->rcu_free = NULL;
+		rcu_sheaf->node = numa_mem_id();
+	}
 
 	local_unlock(&s->cpu_sheaves->lock);
 
@@ -5856,7 +5859,11 @@ static void free_to_pcs_bulk(struct kmem_cache *s, size_t size, void **p)
 	struct slab_sheaf *main, *empty;
 	bool init = slab_want_init_on_free(s);
 	unsigned int batch, i = 0;
+	void *remote_objects[PCS_BATCH_MAX];
+	unsigned int remote_nr = 0;
+	int node = numa_mem_id();
 
+next_remote_batch:
 	while (i < size) {
 		struct slab *slab = virt_to_slab(p[i]);
 
@@ -5866,7 +5873,15 @@ static void free_to_pcs_bulk(struct kmem_cache *s, size_t size, void **p)
 		if (unlikely(!slab_free_hook(s, p[i], init, false))) {
 			p[i] = p[--size];
 			if (!size)
-				return;
+				goto flush_remote;
+			continue;
+		}
+
+		if (unlikely(IS_ENABLED(CONFIG_NUMA) && slab_nid(slab) != node)) {
+			remote_objects[remote_nr] = p[i];
+			p[i] = p[--size];
+			if (++remote_nr >= PCS_BATCH_MAX)
+				goto flush_remote;
 			continue;
 		}
 
@@ -5934,6 +5949,15 @@ static void free_to_pcs_bulk(struct kmem_cache *s, size_t size, void **p)
 	 */
fallback:
	__kmem_cache_free_bulk(s, size, p);
+
+flush_remote:
+	if (remote_nr) {
+		__kmem_cache_free_bulk(s, remote_nr, &remote_objects[0]);
+		if (i < size) {
+			remote_nr = 0;
+			goto next_remote_batch;
+		}
+	}
 }
 
 #ifndef CONFIG_SLUB_TINY
@@ -6025,8 +6049,14 @@ void slab_free(struct kmem_cache *s, struct slab *slab, void *object,
 	if (unlikely(!slab_free_hook(s, object, slab_want_init_on_free(s), false)))
 		return;
 
-	if (!s->cpu_sheaves || !free_to_pcs(s, object))
-		do_slab_free(s, slab, object, object, 1, addr);
+	if (s->cpu_sheaves && likely(!IS_ENABLED(CONFIG_NUMA) ||
+				     slab_nid(slab) == numa_mem_id())) {
+		if (likely(free_to_pcs(s, object))) {
+			return;
+		}
+	}
+
+	do_slab_free(s, slab, object, object, 1, addr);
 }
 
 #ifdef CONFIG_MEMCG
-- 
2.51.0
From nobody Fri Oct 3 18:10:03 2025
From: Vlastimil Babka
Date: Wed, 27 Aug 2025 10:26:39 +0200
Subject: [PATCH v6 07/10] slab: allow NUMA restricted allocations to use percpu sheaves
Message-Id: <20250827-slub-percpu-caches-v6-7-f0f775a3f73f@suse.cz>
References: <20250827-slub-percpu-caches-v6-0-f0f775a3f73f@suse.cz>
In-Reply-To: <20250827-slub-percpu-caches-v6-0-f0f775a3f73f@suse.cz>
To: Suren Baghdasaryan, "Liam R. Howlett", Christoph Lameter, David Rientjes
Howlett" , Christoph Lameter , David Rientjes Cc: Roman Gushchin , Harry Yoo , Uladzislau Rezki , linux-mm@kvack.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, maple-tree@lists.infradead.org, vbabka@suse.cz X-Mailer: b4 0.14.2 X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Rspamd-Action: no action X-Spam-Flag: NO X-Spam-Level: X-Rspamd-Server: rspamd2.dmz-prg2.suse.org X-Spamd-Result: default: False [-4.00 / 50.00]; REPLY(-4.00)[] X-Rspamd-Queue-Id: 953222207C X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Spam-Score: -4.00 Currently allocations asking for a specific node explicitly or via mempolicy in strict_numa node bypass percpu sheaves. Since sheaves contain mostly local objects, we can try allocating from them if the local node happens to be the requested node or allowed by the mempolicy. If we find the object from percpu sheaves is not from the expected node, we skip the sheaves - this should be rare. Reviewed-by: Harry Yoo Signed-off-by: Vlastimil Babka --- mm/slub.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 46 insertions(+), 7 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index b37e684457e7d14781466c0086d1b64df2fd8e9d..aeaffcbca49b3e50ef345c3a6f2= 4d007b53ef24e 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -4808,18 +4808,43 @@ __pcs_replace_empty_main(struct kmem_cache *s, stru= ct slub_percpu_sheaves *pcs, } =20 static __fastpath_inline -void *alloc_from_pcs(struct kmem_cache *s, gfp_t gfp) +void *alloc_from_pcs(struct kmem_cache *s, gfp_t gfp, int node) { struct slub_percpu_sheaves *pcs; + bool node_requested; void *object; =20 #ifdef CONFIG_NUMA - if (static_branch_unlikely(&strict_numa)) { - if (current->mempolicy) - return NULL; + if (static_branch_unlikely(&strict_numa) && + node =3D=3D NUMA_NO_NODE) { + + struct mempolicy *mpol =3D current->mempolicy; + + if (mpol) { + /* + * Special BIND rule support. If the local node + * is in permitted set then do not redirect + * to a particular node. + * Otherwise we apply the memory policy to get + * the node we need to allocate on. + */ + if (mpol->mode !=3D MPOL_BIND || + !node_isset(numa_mem_id(), mpol->nodes)) + + node =3D mempolicy_slab_node(); + } } #endif =20 + node_requested =3D IS_ENABLED(CONFIG_NUMA) && node !=3D NUMA_NO_NODE; + + /* + * We assume the percpu sheaves contain only local objects although it's + * not completely guaranteed, so we verify later. + */ + if (unlikely(node_requested && node !=3D numa_mem_id())) + return NULL; + if (!local_trylock(&s->cpu_sheaves->lock)) return NULL; =20 @@ -4831,7 +4856,21 @@ void *alloc_from_pcs(struct kmem_cache *s, gfp_t gfp) return NULL; } =20 - object =3D pcs->main->objects[--pcs->main->size]; + object =3D pcs->main->objects[pcs->main->size - 1]; + + if (unlikely(node_requested)) { + /* + * Verify that the object was from the node we want. This could + * be false because of cpu migration during an unlocked part of + * the current allocation or previous freeing process. 
+		 */
+		if (folio_nid(virt_to_folio(object)) != node) {
+			local_unlock(&s->cpu_sheaves->lock);
+			return NULL;
+		}
+	}
+
+	pcs->main->size--;
 
 	local_unlock(&s->cpu_sheaves->lock);
 
@@ -4931,8 +4970,8 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
 	if (unlikely(object))
 		goto out;
 
-	if (s->cpu_sheaves && node == NUMA_NO_NODE)
-		object = alloc_from_pcs(s, gfpflags);
+	if (s->cpu_sheaves)
+		object = alloc_from_pcs(s, gfpflags, node);
 
 	if (!object)
 		object = __slab_alloc_node(s, gfpflags, node, addr, orig_size);
-- 
2.51.0
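[For illustration (hypothetical caller; nid is whatever node the caller is restricted to), an explicit-node allocation such as

	obj = kmem_cache_alloc_node(s, GFP_KERNEL, nid);

can now be served from percpu sheaves whenever nid happens to be the local memory node, with the folio_nid() recheck above catching the rare cached object that is not from nid.]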
From nobody Fri Oct 3 18:10:03 2025
From: Vlastimil Babka
Date: Wed, 27 Aug 2025 10:26:40 +0200
Subject: [PATCH v6 08/10] mm, vma: use percpu sheaves for vm_area_struct cache
Message-Id: <20250827-slub-percpu-caches-v6-8-f0f775a3f73f@suse.cz>
References: <20250827-slub-percpu-caches-v6-0-f0f775a3f73f@suse.cz>
In-Reply-To: <20250827-slub-percpu-caches-v6-0-f0f775a3f73f@suse.cz>
To: Suren Baghdasaryan, "Liam R. Howlett", Christoph Lameter, David Rientjes
Howlett" , Christoph Lameter , David Rientjes Cc: Roman Gushchin , Harry Yoo , Uladzislau Rezki , linux-mm@kvack.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, maple-tree@lists.infradead.org, vbabka@suse.cz X-Mailer: b4 0.14.2 X-Spam-Level: X-Spamd-Result: default: False [-8.30 / 50.00]; REPLY(-4.00)[]; BAYES_HAM(-3.00)[100.00%]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_SHORT(-0.20)[-0.999]; MIME_GOOD(-0.10)[text/plain]; FUZZY_RATELIMITED(0.00)[rspamd.com]; RCVD_VIA_SMTP_AUTH(0.00)[]; MIME_TRACE(0.00)[0:+]; TO_DN_SOME(0.00)[]; RCPT_COUNT_TWELVE(0.00)[12]; ARC_NA(0.00)[]; RCVD_TLS_ALL(0.00)[]; FREEMAIL_ENVRCPT(0.00)[gmail.com]; R_RATELIMIT(0.00)[to_ip_from(RLwn5r54y1cp81no5tmbbew5oc)]; FROM_HAS_DN(0.00)[]; FREEMAIL_CC(0.00)[linux.dev,oracle.com,gmail.com,kvack.org,vger.kernel.org,lists.infradead.org,suse.cz]; MID_RHS_MATCH_FROM(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; RCVD_COUNT_TWO(0.00)[2]; TO_MATCH_ENVRCPT_ALL(0.00)[]; DKIM_SIGNED(0.00)[suse.cz:s=susede2_rsa,suse.cz:s=susede2_ed25519]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.cz:email,suse.cz:mid,imap1.dmz-prg2.suse.org:helo] X-Spam-Flag: NO X-Spam-Score: -8.30 Create the vm_area_struct cache with percpu sheaves of size 32 to improve its performance. Reviewed-by: Suren Baghdasaryan Signed-off-by: Vlastimil Babka --- mm/vma_init.c | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/vma_init.c b/mm/vma_init.c index 8e53c7943561e7324e7992946b4065dec1149b82..52c6b55fac4519e0da39ca75ad0= 18e14449d1d95 100644 --- a/mm/vma_init.c +++ b/mm/vma_init.c @@ -16,6 +16,7 @@ void __init vma_state_init(void) struct kmem_cache_args args =3D { .use_freeptr_offset =3D true, .freeptr_offset =3D offsetof(struct vm_area_struct, vm_freeptr), + .sheaf_capacity =3D 32, }; =20 vm_area_cachep =3D kmem_cache_create("vm_area_struct", --=20 2.51.0 From nobody Fri Oct 3 18:10:03 2025 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.223.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9640C33A020 for ; Wed, 27 Aug 2025 08:27:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.131 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756283233; cv=none; b=QCdjXQuKmE4J8KPNUEcNEuj+OUIrZK4Se1Q2NS3ioIZahuGA0dIYMJP9fz3C8KDtdLl4Qow78mcUjqJd9HgGAz/+jyV6ieb8XZypjSNAdeyWH7qD+vrUgN/JGAoM/k8W7pBuQjcwqGP1OD3tmkaEoJzm9aANqX0QAEvvUGGeuZE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756283233; c=relaxed/simple; bh=w0FwJcPgX1e36DmPBz/2cYjO8qP6xKfM6E0UW0U0XDc=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=MwFus4jkDPh657aErUE3OibK+R9Vjk8e0HmIubCYd5rlnTIxKpw+dWEDN9f4Jw+phlL4nFGpQ3DP/mtrQcOvS5ZoL1CHNiJZ0wpJc8xxz+EGQnY2JAl/odHtvHaVGMb7l+NORVyYQ5iuhs0cNBtVZjFp22lWyWEtqIjpPBQO3UQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=suse.cz; spf=pass smtp.mailfrom=suse.cz; arc=none smtp.client-ip=195.135.223.131 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.cz Received: from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org [IPv6:2a07:de40:b281:104:10:150:64:97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by 
From nobody Fri Oct 3 18:10:03 2025
From: Vlastimil Babka
Date: Wed, 27 Aug 2025 10:26:41 +0200
Subject: [PATCH v6 09/10] tools/testing: Add testing support for slab caches with sheaves
Message-Id: <20250827-slub-percpu-caches-v6-9-f0f775a3f73f@suse.cz>
References: <20250827-slub-percpu-caches-v6-0-f0f775a3f73f@suse.cz>
In-Reply-To: <20250827-slub-percpu-caches-v6-0-f0f775a3f73f@suse.cz>
To: Suren Baghdasaryan, "Liam R. Howlett", Christoph Lameter, David Rientjes
Cc: Roman Gushchin, Harry Yoo, Uladzislau Rezki, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
 maple-tree@lists.infradead.org, vbabka@suse.cz, "Liam R. Howlett"
X-Mailer: b4 0.14.2

From: "Liam R. Howlett"

Make testing work for the slab changes that have come in with the
sheaves work. That means supporting kmem_cache_args with
sheaf_capacity.

[vbabka@suse.cz: remove kfree_rcu() support, to be added later]
Signed-off-by: Liam R. Howlett
Signed-off-by: Vlastimil Babka
---
 tools/include/linux/slab.h   | 41 ++++++++++++++++++++++++++++++++++++++---
 tools/testing/shared/linux.c | 12 ++++++++----
 2 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/tools/include/linux/slab.h b/tools/include/linux/slab.h
index c87051e2b26f5a7fee0362697fae067076b8e84d..d1444e79f2685edb828adbce8b3fbb500c0f8844 100644
--- a/tools/include/linux/slab.h
+++ b/tools/include/linux/slab.h
@@ -23,6 +23,12 @@ enum slab_state {
 	FULL
 };
 
+struct kmem_cache_args {
+	unsigned int align;
+	unsigned int sheaf_capacity;
+	void (*ctor)(void *);
+};
+
 static inline void *kzalloc(size_t size, gfp_t gfp)
 {
 	return kmalloc(size, gfp | __GFP_ZERO);
@@ -37,9 +43,38 @@ static inline void *kmem_cache_alloc(struct kmem_cache *cachep, int flags)
 }
 void kmem_cache_free(struct kmem_cache *cachep, void *objp);
 
-struct kmem_cache *kmem_cache_create(const char *name, unsigned int size,
-			unsigned int align, unsigned int flags,
-			void (*ctor)(void *));
+
+struct kmem_cache *
+__kmem_cache_create_args(const char *name, unsigned int size,
+			 struct kmem_cache_args *args, unsigned int flags);
+
+/* If NULL is passed for @args, use this variant with default arguments. */
+static inline struct kmem_cache *
+__kmem_cache_default_args(const char *name, unsigned int size,
+			  struct kmem_cache_args *args, unsigned int flags)
+{
+	struct kmem_cache_args kmem_default_args = {};
+
+	return __kmem_cache_create_args(name, size, &kmem_default_args, flags);
+}
+
+static inline struct kmem_cache *
+__kmem_cache_create(const char *name, unsigned int size, unsigned int align,
+		    unsigned int flags, void (*ctor)(void *))
+{
+	struct kmem_cache_args kmem_args = {
+		.align = align,
+		.ctor = ctor,
+	};
+
+	return __kmem_cache_create_args(name, size, &kmem_args, flags);
+}
+
+#define kmem_cache_create(__name, __object_size, __args, ...)		\
+	_Generic((__args),						\
+		struct kmem_cache_args *: __kmem_cache_create_args,	\
+		void *: __kmem_cache_default_args,			\
+		default: __kmem_cache_create)(__name, __object_size, __args, __VA_ARGS__)
 
 void kmem_cache_free_bulk(struct kmem_cache *cachep, size_t size, void **list);
 int kmem_cache_alloc_bulk(struct kmem_cache *cachep, gfp_t gfp, size_t size,
diff --git a/tools/testing/shared/linux.c b/tools/testing/shared/linux.c
index 0f97fb0d19e19c327aa4843a35b45cc086f4f366..04730abe4dffbd6849b848373ec110b87c81bf33 100644
--- a/tools/testing/shared/linux.c
+++ b/tools/testing/shared/linux.c
@@ -20,6 +20,7 @@ struct kmem_cache {
 	pthread_mutex_t lock;
 	unsigned int size;
 	unsigned int align;
+	unsigned int sheaf_capacity;
 	int nr_objs;
 	void *objs;
 	void (*ctor)(void *);
@@ -234,23 +235,26 @@ int kmem_cache_alloc_bulk(struct kmem_cache *cachep, gfp_t gfp, size_t size,
 }
 
 struct kmem_cache *
-kmem_cache_create(const char *name, unsigned int size, unsigned int align,
-		unsigned int flags, void (*ctor)(void *))
+__kmem_cache_create_args(const char *name, unsigned int size,
+			 struct kmem_cache_args *args,
+			 unsigned int flags)
 {
 	struct kmem_cache *ret = malloc(sizeof(*ret));
 
 	pthread_mutex_init(&ret->lock, NULL);
 	ret->size = size;
-	ret->align = align;
+	ret->align = args->align;
+	ret->sheaf_capacity = args->sheaf_capacity;
 	ret->nr_objs = 0;
 	ret->nr_allocated = 0;
 	ret->nr_tallocated = 0;
 	ret->objs = NULL;
-	ret->ctor = ctor;
+	ret->ctor = args->ctor;
 	ret->non_kernel = 0;
 	ret->exec_callback = false;
 	ret->callback = NULL;
 	ret->private = NULL;
+
 	return ret;
 }

-- 
2.51.0
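The _Generic dispatch above can be exercised in a standalone program;
this sketch (hypothetical make_cache/create_* names, stripped-down
bodies, compile with -std=c11) shows which variant each call shape
selects:

#include <stdio.h>

struct kmem_cache_args {
	unsigned int align;
	unsigned int sheaf_capacity;
	void (*ctor)(void *);
};

/* Stripped-down stand-ins for the three variants declared in slab.h. */
static int create_args(const char *name, unsigned int size,
		       struct kmem_cache_args *args, unsigned int flags)
{
	printf("%s: args variant, align=%u sheaf_capacity=%u\n",
	       name, args->align, args->sheaf_capacity);
	return 0;
}

static int create_default(const char *name, unsigned int size,
			  struct kmem_cache_args *args, unsigned int flags)
{
	printf("%s: default-args variant\n", name);
	return 0;
}

static int create_legacy(const char *name, unsigned int size,
			 unsigned int align, unsigned int flags,
			 void (*ctor)(void *))
{
	printf("%s: legacy variant, align=%u\n", name, align);
	return 0;
}

/* Same dispatch shape as the kmem_cache_create() macro in the patch. */
#define make_cache(__name, __object_size, __args, ...)		\
	_Generic((__args),					\
		struct kmem_cache_args *: create_args,		\
		void *: create_default,				\
		default: create_legacy)(__name, __object_size, __args, __VA_ARGS__)

int main(void)
{
	struct kmem_cache_args args = { .align = 64, .sheaf_capacity = 32 };

	make_cache("a", 128u, &args, 0u);		/* kmem_cache_args pointer */
	make_cache("b", 128u, (void *)NULL, 0u);	/* NULL selects defaults */
	make_cache("c", 128u, 64u, 0u, (void (*)(void *))NULL); /* legacy 5-arg form */
	return 0;
}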
From nobody Fri Oct 3 18:10:03 2025
From: Vlastimil Babka
Date: Wed, 27 Aug 2025 10:26:42 +0200
Subject: [PATCH v6 10/10] maple_tree: use percpu sheaves for maple_node_cache
Message-Id: <20250827-slub-percpu-caches-v6-10-f0f775a3f73f@suse.cz>
References: <20250827-slub-percpu-caches-v6-0-f0f775a3f73f@suse.cz>
In-Reply-To: <20250827-slub-percpu-caches-v6-0-f0f775a3f73f@suse.cz>
To: Suren Baghdasaryan, "Liam R. Howlett", Christoph Lameter, David Rientjes
Cc: Roman Gushchin, Harry Yoo, Uladzislau Rezki, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
 maple-tree@lists.infradead.org, vbabka@suse.cz
X-Mailer: b4 0.14.2

Set up the maple_node_cache with percpu sheaves of size 32 to hopefully
improve its performance. Note that this will not immediately take
advantage of sheaf batching of kfree_rcu() operations, because the
maple tree uses call_rcu() with custom callbacks. The follow-up changes
to the maple tree will change that and also make use of the prefilled
sheaves functionality.

Signed-off-by: Vlastimil Babka
Reviewed-by: Suren Baghdasaryan
---
 lib/maple_tree.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index b4ee2d29d7a962ca374467d0533185f2db3d35ff..a0db6bdc63793b8bbd544e246391d99e880dede3 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -6302,9 +6302,14 @@ bool mas_nomem(struct ma_state *mas, gfp_t gfp)
 
 void __init maple_tree_init(void)
 {
+	struct kmem_cache_args args = {
+		.align = sizeof(struct maple_node),
+		.sheaf_capacity = 32,
+	};
+
 	maple_node_cache = kmem_cache_create("maple_node",
-			sizeof(struct maple_node), sizeof(struct maple_node),
-			SLAB_PANIC, NULL);
+			sizeof(struct maple_node), &args,
+			SLAB_PANIC);
 }
 
 /**

-- 
2.51.0
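For contrast, the maple_node_cache creation call before and after this
patch, extracted directly from the hunk above; alignment moves from a
positional argument (with a NULL ctor) into kmem_cache_args, alongside
the new sheaf capacity:

/* before: alignment passed positionally, no sheaves */
maple_node_cache = kmem_cache_create("maple_node",
		sizeof(struct maple_node), sizeof(struct maple_node),
		SLAB_PANIC, NULL);

/* after: alignment and sheaf capacity carried by kmem_cache_args */
maple_node_cache = kmem_cache_create("maple_node",
		sizeof(struct maple_node), &args,
		SLAB_PANIC);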