From: Vlastimil Babka
Date: Mon, 12 Jan 2026 16:17:08 +0100
Subject: [PATCH RFC v2 14/20] slab: remove struct kmem_cache_cpu
Message-Id: <20260112-sheaves-for-all-v2-14-98225cfb50cf@suse.cz>
References: <20260112-sheaves-for-all-v2-0-98225cfb50cf@suse.cz>
In-Reply-To: <20260112-sheaves-for-all-v2-0-98225cfb50cf@suse.cz>
To: Harry Yoo, Petr Tesarik, Christoph Lameter, David Rientjes,
 Roman Gushchin
Cc: Hao Li, Andrew Morton, Uladzislau Rezki, "Liam R. Howlett",
 Suren Baghdasaryan, Sebastian Andrzej Siewior, Alexei Starovoitov,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-rt-devel@lists.linux.dev, bpf@vger.kernel.org,
 kasan-dev@googlegroups.com, Vlastimil Babka

The cpu slab is no longer used for allocation or freeing; the code
that remains only handles flushing, and it is effectively dead. Remove
the whole struct kmem_cache_cpu, the flushing code and the other
orphaned functions.

The only field of kmem_cache_cpu still in use is the stat array under
CONFIG_SLUB_STATS. Move it to a new struct kmem_cache_stats; in struct
kmem_cache the corresponding field is cpu_stats, placed near the end
of the struct.

Signed-off-by: Vlastimil Babka
---
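Note for reviewers, kept below the fold so it stays out of git
history: a condensed sketch of the CONFIG_SLUB_STATS pattern that
results from the hunks below. sum_stat() is an illustrative stand-in
for what show_stat() does in the actual patch; the BUILD_BUG_ON and
the sysfs plumbing are elided, everything else matches the diff.

	/* Sketch only, assuming CONFIG_SLUB_STATS=y. */
	struct kmem_cache_stats {
		unsigned int stat[NR_SLUB_STAT_ITEMS];
	};

	/* One stats array per CPU, allocated at cache creation. */
	static inline int alloc_kmem_cache_stats(struct kmem_cache *s)
	{
		s->cpu_stats = alloc_percpu(struct kmem_cache_stats);
		return s->cpu_stats ? 1 : 0;
	}

	/* Fast path: the racy rmw is acceptable and avoids
	 * this_cpu_add()'s irq-disable overhead. */
	static inline void stat(const struct kmem_cache *s, enum stat_item si)
	{
		raw_cpu_inc(s->cpu_stats->stat[si]);
	}

	/* Read side: sum the per-CPU counters. */
	static unsigned long sum_stat(struct kmem_cache *s, enum stat_item si)
	{
		unsigned long sum = 0;
		int cpu;

		for_each_online_cpu(cpu)
			sum += per_cpu_ptr(s->cpu_stats, cpu)->stat[si];

		return sum;
	}

Unlike the old kmem_cache_cpu, nothing here needs the double-word
alignment of the cmpxchg fast path, so plain alloc_percpu() is enough.
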
Howlett" , Suren Baghdasaryan , Sebastian Andrzej Siewior , Alexei Starovoitov , linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-rt-devel@lists.linux.dev, bpf@vger.kernel.org, kasan-dev@googlegroups.com, Vlastimil Babka X-Mailer: b4 0.14.3 X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Spam-Score: -4.00 X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Rspamd-Action: no action X-Rspamd-Queue-Id: 075C33369A X-Rspamd-Server: rspamd1.dmz-prg2.suse.org X-Spam-Level: X-Spamd-Result: default: False [-4.00 / 50.00]; REPLY(-4.00)[] X-Spam-Flag: NO The cpu slab is not used anymore for allocation or freeing, the remaining code is for flushing, but it's effectively dead. Remove the whole struct kmem_cache_cpu, the flushing code and other orphaned functions. The remaining used field of kmem_cache_cpu is the stat array with CONFIG_SLUB_STATS. Put it instead in a new struct kmem_cache_stats. In struct kmem_cache, the field is cpu_stats and placed near the end of the struct. Signed-off-by: Vlastimil Babka --- mm/slab.h | 7 +- mm/slub.c | 298 +++++-----------------------------------------------------= ---- 2 files changed, 24 insertions(+), 281 deletions(-) diff --git a/mm/slab.h b/mm/slab.h index e9a0738133ed..87faeb6143f2 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -21,14 +21,12 @@ # define system_has_freelist_aba() system_has_cmpxchg128() # define try_cmpxchg_freelist try_cmpxchg128 # endif -#define this_cpu_try_cmpxchg_freelist this_cpu_try_cmpxchg128 typedef u128 freelist_full_t; #else /* CONFIG_64BIT */ # ifdef system_has_cmpxchg64 # define system_has_freelist_aba() system_has_cmpxchg64() # define try_cmpxchg_freelist try_cmpxchg64 # endif -#define this_cpu_try_cmpxchg_freelist this_cpu_try_cmpxchg64 typedef u64 freelist_full_t; #endif /* CONFIG_64BIT */ =20 @@ -189,7 +187,6 @@ struct kmem_cache_order_objects { * Slab cache management. */ struct kmem_cache { - struct kmem_cache_cpu __percpu *cpu_slab; struct slub_percpu_sheaves __percpu *cpu_sheaves; /* Used for retrieving partial slabs, etc. */ slab_flags_t flags; @@ -238,6 +235,10 @@ struct kmem_cache { unsigned int usersize; /* Usercopy region size */ #endif =20 +#ifdef CONFIG_SLUB_STATS + struct kmem_cache_stats __percpu *cpu_stats; +#endif + struct kmem_cache_node *node[MAX_NUMNODES]; }; =20 diff --git a/mm/slub.c b/mm/slub.c index 07d977e12478..882f607fb4ad 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -400,28 +400,11 @@ enum stat_item { NR_SLUB_STAT_ITEMS }; =20 -struct freelist_tid { - union { - struct { - void *freelist; /* Pointer to next available object */ - unsigned long tid; /* Globally unique transaction id */ - }; - freelist_full_t freelist_tid; - }; -}; - -/* - * When changing the layout, make sure freelist and tid are still compatib= le - * with this_cpu_cmpxchg_double() alignment requirements. - */ -struct kmem_cache_cpu { - struct freelist_tid; - struct slab *slab; /* The slab from which we are allocating */ - local_trylock_t lock; /* Protects the fields above */ #ifdef CONFIG_SLUB_STATS +struct kmem_cache_stats { unsigned int stat[NR_SLUB_STAT_ITEMS]; -#endif }; +#endif =20 static inline void stat(const struct kmem_cache *s, enum stat_item si) { @@ -430,7 +413,7 @@ static inline void stat(const struct kmem_cache *s, enu= m stat_item si) * The rmw is racy on a preemptible kernel but this is acceptable, so * avoid this_cpu_add()'s irq-disable overhead. 
 	 */
-	raw_cpu_inc(s->cpu_slab->stat[si]);
+	raw_cpu_inc(s->cpu_stats->stat[si]);
 #endif
 }
 
@@ -438,7 +421,7 @@ static inline void stat_add(const struct kmem_cache *s,
 			    enum stat_item si, int v)
 {
 #ifdef CONFIG_SLUB_STATS
-	raw_cpu_add(s->cpu_slab->stat[si], v);
+	raw_cpu_add(s->cpu_stats->stat[si], v);
 #endif
 }
 
@@ -1148,20 +1131,6 @@ static void object_err(struct kmem_cache *s, struct slab *slab,
 	WARN_ON(1);
 }
 
-static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab,
-			       void **freelist, void *nextfree)
-{
-	if ((s->flags & SLAB_CONSISTENCY_CHECKS) &&
-	    !check_valid_pointer(s, slab, nextfree) && freelist) {
-		object_err(s, slab, *freelist, "Freechain corrupt");
-		*freelist = NULL;
-		slab_fix(s, "Isolate corrupted freechain");
-		return true;
-	}
-
-	return false;
-}
-
 static void __slab_err(struct slab *slab)
 {
 	if (slab_in_kunit_test())
@@ -1943,11 +1912,6 @@ static inline void inc_slabs_node(struct kmem_cache *s, int node,
 							int objects) {}
 static inline void dec_slabs_node(struct kmem_cache *s, int node,
 							int objects) {}
-static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab,
-			       void **freelist, void *nextfree)
-{
-	return false;
-}
 #endif /* CONFIG_SLUB_DEBUG */
 
 /*
@@ -3638,191 +3602,6 @@ static void *get_partial(struct kmem_cache *s, int node,
 	return get_any_partial(s, pc);
 }
 
-#ifdef CONFIG_PREEMPTION
-/*
- * Calculate the next globally unique transaction for disambiguation
- * during cmpxchg. The transactions start with the cpu number and are then
- * incremented by CONFIG_NR_CPUS.
- */
-#define TID_STEP  roundup_pow_of_two(CONFIG_NR_CPUS)
-#else
-/*
- * No preemption supported therefore also no need to check for
- * different cpus.
- */
-#define TID_STEP 1
-#endif /* CONFIG_PREEMPTION */
-
-static inline unsigned long next_tid(unsigned long tid)
-{
-	return tid + TID_STEP;
-}
-
-#ifdef SLUB_DEBUG_CMPXCHG
-static inline unsigned int tid_to_cpu(unsigned long tid)
-{
-	return tid % TID_STEP;
-}
-
-static inline unsigned long tid_to_event(unsigned long tid)
-{
-	return tid / TID_STEP;
-}
-#endif
-
-static inline unsigned int init_tid(int cpu)
-{
-	return cpu;
-}
-
-static void init_kmem_cache_cpus(struct kmem_cache *s)
-{
-	int cpu;
-	struct kmem_cache_cpu *c;
-
-	for_each_possible_cpu(cpu) {
-		c = per_cpu_ptr(s->cpu_slab, cpu);
-		local_trylock_init(&c->lock);
-		c->tid = init_tid(cpu);
-	}
-}
-
-/*
- * Finishes removing the cpu slab. Merges cpu's freelist with slab's freelist,
- * unfreezes the slabs and puts it on the proper list.
- * Assumes the slab has been already safely taken away from kmem_cache_cpu
- * by the caller.
- */
-static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
-			    void *freelist)
-{
-	struct kmem_cache_node *n = get_node(s, slab_nid(slab));
-	int free_delta = 0;
-	void *nextfree, *freelist_iter, *freelist_tail;
-	int tail = DEACTIVATE_TO_HEAD;
-	unsigned long flags = 0;
-	struct freelist_counters old, new;
-
-	if (READ_ONCE(slab->freelist)) {
-		stat(s, DEACTIVATE_REMOTE_FREES);
-		tail = DEACTIVATE_TO_TAIL;
-	}
-
-	/*
-	 * Stage one: Count the objects on cpu's freelist as free_delta and
-	 * remember the last object in freelist_tail for later splicing.
-	 */
-	freelist_tail = NULL;
-	freelist_iter = freelist;
-	while (freelist_iter) {
-		nextfree = get_freepointer(s, freelist_iter);
-
-		/*
-		 * If 'nextfree' is invalid, it is possible that the object at
-		 * 'freelist_iter' is already corrupted. So isolate all objects
-		 * starting at 'freelist_iter' by skipping them.
-		 */
-		if (freelist_corrupted(s, slab, &freelist_iter, nextfree))
-			break;
-
-		freelist_tail = freelist_iter;
-		free_delta++;
-
-		freelist_iter = nextfree;
-	}
-
-	/*
-	 * Stage two: Unfreeze the slab while splicing the per-cpu
-	 * freelist to the head of slab's freelist.
-	 */
-	do {
-		old.freelist = READ_ONCE(slab->freelist);
-		old.counters = READ_ONCE(slab->counters);
-		VM_BUG_ON(!old.frozen);
-
-		/* Determine target state of the slab */
-		new.counters = old.counters;
-		new.frozen = 0;
-		if (freelist_tail) {
-			new.inuse -= free_delta;
-			set_freepointer(s, freelist_tail, old.freelist);
-			new.freelist = freelist;
-		} else {
-			new.freelist = old.freelist;
-		}
-	} while (!slab_update_freelist(s, slab, &old, &new, "unfreezing slab"));
-
-	/*
-	 * Stage three: Manipulate the slab list based on the updated state.
-	 */
-	if (!new.inuse && n->nr_partial >= s->min_partial) {
-		stat(s, DEACTIVATE_EMPTY);
-		discard_slab(s, slab);
-		stat(s, FREE_SLAB);
-	} else if (new.freelist) {
-		spin_lock_irqsave(&n->list_lock, flags);
-		add_partial(n, slab, tail);
-		spin_unlock_irqrestore(&n->list_lock, flags);
-		stat(s, tail);
-	} else {
-		stat(s, DEACTIVATE_FULL);
-	}
-}
-
-static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
-{
-	unsigned long flags;
-	struct slab *slab;
-	void *freelist;
-
-	local_lock_irqsave(&s->cpu_slab->lock, flags);
-
-	slab = c->slab;
-	freelist = c->freelist;
-
-	c->slab = NULL;
-	c->freelist = NULL;
-	c->tid = next_tid(c->tid);
-
-	local_unlock_irqrestore(&s->cpu_slab->lock, flags);
-
-	if (slab) {
-		deactivate_slab(s, slab, freelist);
-		stat(s, CPUSLAB_FLUSH);
-	}
-}
-
-static inline void __flush_cpu_slab(struct kmem_cache *s, int cpu)
-{
-	struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
-	void *freelist = c->freelist;
-	struct slab *slab = c->slab;
-
-	c->slab = NULL;
-	c->freelist = NULL;
-	c->tid = next_tid(c->tid);
-
-	if (slab) {
-		deactivate_slab(s, slab, freelist);
-		stat(s, CPUSLAB_FLUSH);
-	}
-}
-
-static inline void flush_this_cpu_slab(struct kmem_cache *s)
-{
-	struct kmem_cache_cpu *c = this_cpu_ptr(s->cpu_slab);
-
-	if (c->slab)
-		flush_slab(s, c);
-}
-
-static bool has_cpu_slab(int cpu, struct kmem_cache *s)
-{
-	struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
-
-	return c->slab;
-}
-
 static bool has_pcs_used(int cpu, struct kmem_cache *s)
 {
 	struct slub_percpu_sheaves *pcs;
@@ -3836,7 +3615,7 @@ static bool has_pcs_used(int cpu, struct kmem_cache *s)
 }
 
 /*
- * Flush cpu slab.
+ * Flush percpu sheaves
  *
  * Called from CPU work handler with migration disabled.
  */
@@ -3851,8 +3630,6 @@ static void flush_cpu_slab(struct work_struct *w)
 
 	if (s->sheaf_capacity)
 		pcs_flush_all(s);
-
-	flush_this_cpu_slab(s);
 }
 
 static void flush_all_cpus_locked(struct kmem_cache *s)
@@ -3865,7 +3642,7 @@ static void flush_all_cpus_locked(struct kmem_cache *s)
 
 	for_each_online_cpu(cpu) {
 		sfw = &per_cpu(slub_flush, cpu);
-		if (!has_cpu_slab(cpu, s) && !has_pcs_used(cpu, s)) {
+		if (!has_pcs_used(cpu, s)) {
 			sfw->skip = true;
 			continue;
 		}
@@ -3975,7 +3752,6 @@ static int slub_cpu_dead(unsigned int cpu)
 
 	mutex_lock(&slab_mutex);
 	list_for_each_entry(s, &slab_caches, list) {
-		__flush_cpu_slab(s, cpu);
 		if (s->sheaf_capacity)
 			__pcs_flush_all_cpu(s, cpu);
 	}
@@ -7115,26 +6891,21 @@ init_kmem_cache_node(struct kmem_cache_node *n, struct node_barn *barn)
 	barn_init(barn);
 }
 
-static inline int alloc_kmem_cache_cpus(struct kmem_cache *s)
+#ifdef CONFIG_SLUB_STATS
+static inline int alloc_kmem_cache_stats(struct kmem_cache *s)
 {
 	BUILD_BUG_ON(PERCPU_DYNAMIC_EARLY_SIZE <
 			NR_KMALLOC_TYPES * KMALLOC_SHIFT_HIGH *
-			sizeof(struct kmem_cache_cpu));
+			sizeof(struct kmem_cache_stats));
 
-	/*
-	 * Must align to double word boundary for the double cmpxchg
-	 * instructions to work; see __pcpu_double_call_return_bool().
-	 */
-	s->cpu_slab = __alloc_percpu(sizeof(struct kmem_cache_cpu),
-				     2 * sizeof(void *));
+	s->cpu_stats = alloc_percpu(struct kmem_cache_stats);
 
-	if (!s->cpu_slab)
+	if (!s->cpu_stats)
 		return 0;
 
-	init_kmem_cache_cpus(s);
-
 	return 1;
 }
+#endif
 
 static int init_percpu_sheaves(struct kmem_cache *s)
 {
@@ -7246,7 +7017,9 @@ void __kmem_cache_release(struct kmem_cache *s)
 	cache_random_seq_destroy(s);
 	if (s->cpu_sheaves)
 		pcs_destroy(s);
-	free_percpu(s->cpu_slab);
+#ifdef CONFIG_SLUB_STATS
+	free_percpu(s->cpu_stats);
+#endif
 	free_kmem_cache_nodes(s);
 }
 
@@ -7938,12 +7711,6 @@ static struct kmem_cache * __init bootstrap(struct kmem_cache *static_cache)
 
 	memcpy(s, static_cache, kmem_cache->object_size);
 
-	/*
-	 * This runs very early, and only the boot processor is supposed to be
-	 * up. Even if it weren't true, IRQs are not up so we couldn't fire
-	 * IPIs around.
-	 */
-	__flush_cpu_slab(s, smp_processor_id());
 	for_each_kmem_cache_node(s, node, n) {
 		struct slab *p;
 
@@ -8158,8 +7925,10 @@ int do_kmem_cache_create(struct kmem_cache *s, const char *name,
 	if (!init_kmem_cache_nodes(s))
 		goto out;
 
-	if (!alloc_kmem_cache_cpus(s))
+#ifdef CONFIG_SLUB_STATS
+	if (!alloc_kmem_cache_stats(s))
 		goto out;
+#endif
 
 	err = init_percpu_sheaves(s);
 	if (err)
@@ -8478,33 +8247,6 @@ static ssize_t show_slab_objects(struct kmem_cache *s,
 	if (!nodes)
 		return -ENOMEM;
 
-	if (flags & SO_CPU) {
-		int cpu;
-
-		for_each_possible_cpu(cpu) {
-			struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab,
-							       cpu);
-			int node;
-			struct slab *slab;
-
-			slab = READ_ONCE(c->slab);
-			if (!slab)
-				continue;
-
-			node = slab_nid(slab);
-			if (flags & SO_TOTAL)
-				x = slab->objects;
-			else if (flags & SO_OBJECTS)
-				x = slab->inuse;
-			else
-				x = 1;
-
-			total += x;
-			nodes[node] += x;
-
-		}
-	}
-
 	/*
 	 * It is impossible to take "mem_hotplug_lock" here with "kernfs_mutex"
 	 * already held which will conflict with an existing lock order:
@@ -8875,7 +8617,7 @@ static int show_stat(struct kmem_cache *s, char *buf, enum stat_item si)
 		return -ENOMEM;
 
 	for_each_online_cpu(cpu) {
-		unsigned x = per_cpu_ptr(s->cpu_slab, cpu)->stat[si];
+		unsigned x = per_cpu_ptr(s->cpu_stats, cpu)->stat[si];
 
 		data[cpu] = x;
 		sum += x;
@@ -8901,7 +8643,7 @@ static void clear_stat(struct kmem_cache *s, enum stat_item si)
 	int cpu;
 
 	for_each_online_cpu(cpu)
-		per_cpu_ptr(s->cpu_slab, cpu)->stat[si] = 0;
+		per_cpu_ptr(s->cpu_stats, cpu)->stat[si] = 0;
 }
 
 #define STAT_ATTR(si, text) \

-- 
2.52.0