kmem_cache_create() and friends create new instances of
struct kmem_cache and return pointers to those. Quite a few things in the
core kernel are allocated from such caches; each allocation involves
dereferencing an assign-once pointer, and for sufficiently hot caches that
dereference does show up in profiles.
There have been patches floating around that switch some of those
to the runtime_const infrastructure. Unfortunately, it's arch-specific
and most architectures lack it.
There's an alternative approach applicable at least to the caches
that are never destroyed, which covers a lot of them. No matter what,
runtime_const for pointers is not going to be faster than plain &,
so if we had struct kmem_cache instances with static storage duration, we
would be at least no worse off than we are with runtime_const variants.
There are obstacles to doing that, but they turn out to be easy
to deal with.
1) As it is, struct kmem_cache is opaque for anything outside of a few
files in mm/*; that avoids a serious headache with header dependencies,
etc., and it's not something we want to lose. Solution: struct
kmem_cache_opaque, with size and alignment identical to those of struct
kmem_cache. Size and alignment can be calculated via the same
mechanism we use for asm-offsets.h and rq-offsets.h, with a build-time
check for mismatches. With that done, we get an opaque type defined in
linux/slab-static.h that can be used for declaring those caches.
In linux/slab.h we add a forward declaration of kmem_cache_opaque plus a
helper (to_kmem_cache()) that converts a pointer to kmem_cache_opaque
into a pointer to kmem_cache.
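A minimal userspace sketch of the idea in 1), assuming a toy layout for struct kmem_cache and hand-written stand-ins for the build-time generated size and alignment constants (KMEM_CACHE_SIZE and KMEM_CACHE_ALIGN are hypothetical names; in the series the values would come from the asm-offsets-style machinery). The public side sees only a forward declaration plus a byte blob of matching size and alignment; the private side checks the match at compile time.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignas, alignof */
#include <stdint.h>

/* ---- stand-in for a generated header (cf. asm-offsets.h) ---- */
#define KMEM_CACHE_SIZE  16	/* hypothetical generated value */
#define KMEM_CACHE_ALIGN 4	/* hypothetical generated value */

/* ---- public side: what slab.h / slab-static.h would expose ---- */
struct kmem_cache;		/* forward declaration only */

struct kmem_cache_opaque {
	alignas(KMEM_CACHE_ALIGN) unsigned char bytes[KMEM_CACHE_SIZE];
};

/* Converts the opaque instance into the real type without seeing its layout. */
static inline struct kmem_cache *to_kmem_cache(struct kmem_cache_opaque *p)
{
	return (struct kmem_cache *)p;
}

/* ---- private side: only a few files in mm/ would see this ---- */
struct kmem_cache {		/* toy layout, not the real one */
	uint32_t size;
	uint32_t object_size;
	uint32_t align;
	uint32_t refcount;
};

/* Build-time mismatch check: compilation fails if the layouts diverge. */
static_assert(sizeof(struct kmem_cache) == sizeof(struct kmem_cache_opaque),
	      "kmem_cache_opaque size mismatch");
static_assert(alignof(struct kmem_cache) == alignof(struct kmem_cache_opaque),
	      "kmem_cache_opaque alignment mismatch");

/* A cache instance with static storage duration. */
static struct kmem_cache_opaque demo_cache;
```

The cast inside to_kmem_cache() is plain type punning; the kernel builds with -fno-strict-aliasing, which is what makes that unproblematic there.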
2) The real constructor of kmem_cache needs to be taught to deal with
preallocated instances. That turns out to be easy - we already pass an
obscene number of optional arguments via struct kmem_cache_args, so we
can stash the pointer to the preallocated instance in there. Changes in
mm/slab_common.c are very minor - we should treat preallocated caches
as unmergeable, use the instance passed to us instead of allocating a
new one, and not free them. That's it.
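The mm/slab_common.c changes described in 2) can be sketched as a toy model. This is not the actual patch: the args field name (preallocated), the bit position of SLAB_PREALLOCATED, and the toy_* functions are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef unsigned int slab_flags_t;
#define SLAB_PREALLOCATED (1u << 0)	/* hypothetical bit position */

struct kmem_cache {			/* toy layout */
	const char *name;
	unsigned int size;
	slab_flags_t flags;
	int refcount;
};

struct kmem_cache_args {
	unsigned int align;
	struct kmem_cache *preallocated;	/* hypothetical field name */
};

/* Use the caller's instance if there is one, otherwise allocate as before. */
static struct kmem_cache *toy_create_cache(const char *name, unsigned int size,
					   struct kmem_cache_args *args,
					   slab_flags_t flags)
{
	struct kmem_cache *s = args->preallocated;

	if (s)
		flags |= SLAB_PREALLOCATED;
	else
		s = calloc(1, sizeof(*s));
	if (!s)
		return NULL;
	s->name = name;
	s->size = size;
	s->flags = flags;
	s->refcount = 1;
	return s;
}

/* Preallocated caches never take part in merging. */
static bool toy_unmergeable(const struct kmem_cache *s)
{
	return s->flags & SLAB_PREALLOCATED;
}

/* Only dynamically allocated instances are freed. */
static void toy_release_cache(struct kmem_cache *s)
{
	if (!(s->flags & SLAB_PREALLOCATED))
		free(s);
}

static struct kmem_cache demo_static_cache;
```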
A set of helpers parallel to kmem_cache_create() and friends
(kmem_cache_setup(), etc.) is provided in the same linux/slab-static.h;
generally, conversion affects only a few lines.
Note that slab-static.h is needed only in places that create
such instances; all users need only slab.h (and they can be modular,
unlike with the runtime_const-based approach).
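To illustrate how small a conversion is, here is a hypothetical before/after for a cache of struct foo. All toy_* helpers and the toy_cache_setup() signature are made up for the sketch (the real kmem_cache_setup() in linux/slab-static.h may look different), and the opaque type is simplified to a plain wrapper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy allocator standing in for linux/slab.h; all toy_* names are hypothetical. */
struct kmem_cache {
	const char *name;
	size_t size;
};

/* Simplified: the real opaque type is a size/alignment-matched blob. */
struct kmem_cache_opaque {
	struct kmem_cache c;
};

static inline struct kmem_cache *to_kmem_cache(struct kmem_cache_opaque *p)
{
	return &p->c;
}

static struct kmem_cache *toy_cache_create(const char *name, size_t size)
{
	struct kmem_cache *s = malloc(sizeof(*s));

	if (s) {
		s->name = name;
		s->size = size;
	}
	return s;
}

static void toy_cache_setup(struct kmem_cache_opaque *p, const char *name,
			    size_t size)
{
	struct kmem_cache *s = to_kmem_cache(p);

	s->name = name;
	s->size = size;
}

static void *toy_cache_alloc(struct kmem_cache *s)
{
	return malloc(s->size);
}

struct foo {
	int x;
};

/* Before: the cache is reached through an assign-once pointer. */
static struct kmem_cache *foo_cachep;

/* After: the cache itself has static storage duration. */
static struct kmem_cache_opaque foo_cache;

static void foo_init_before(void)
{
	foo_cachep = toy_cache_create("foo", sizeof(struct foo));
}

static void foo_init_after(void)
{
	toy_cache_setup(&foo_cache, "foo", sizeof(struct foo));
}

/* Every call first has to load foo_cachep from memory. */
static struct foo *foo_alloc_before(void)
{
	return toy_cache_alloc(foo_cachep);
}

/* &foo_cache is fixed at link (or module load) time: no pointer load. */
static struct foo *foo_alloc_after(void)
{
	return toy_cache_alloc(to_kmem_cache(&foo_cache));
}
```

Only the declaration, the init call and the argument at each use site change; users of foo_cache need nothing beyond the slab.h-level declarations.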
That covers the instances that never get destroyed. Quite a few
fall into that category, but there's a major exception - anything in
a module must be destroyed before the module gets removed. Note that,
unlike with the runtime_const-based approach, cache _uses_ in a module are
fine - if a kmem_cache_opaque instance is exported, its address is available
to modules without any problems. It's caches _created_ in a module
that add an extra twist.
Teaching kmem_cache_destroy() to skip the actual freeing of a given
kmem_cache instance is trivial; the problem is that kmem_cache_destroy()
may overlap with sysfs access to attributes of that cache. In that
case kmem_cache_destroy() may return before the instance gets freed -
freeing (from slab_kmem_cache_release()) happens when the refcount of
the embedded kobject drops to zero. That's fine, since all references
to data structures in the module's memory are already gone by the time
kmem_cache_destroy() returns. That, however, relies upon struct
kmem_cache itself not being in the module's memory; we must avoid
having it unmapped before slab_kmem_cache_release() has run.
It's not hard to deal with, though. We need to make sure that an
instance in a module gets to slab_kmem_cache_release() before the
module data gets freed. That's only a problem on sysfs setups -
otherwise it'll definitely be finished before kmem_cache_destroy()
returns.
Note that modules themselves have sysfs-exposed attributes,
so a similar problem already exists there. That's dealt with by
having mod_sysfs_teardown() wait for the refcount of module->mkobj.kobj
to reach zero. Let's make use of that - have static-duration kmem_cache
instances in modules grab a reference to that kobject upon setup and
drop it at the end of slab_kmem_cache_release().
Let the setup helpers store the kobject to be pinned in
kmem_cache_args->owner (for preallocated instances; if somebody manually
sets it in the non-preallocated case, it'll be ignored). That would be
&THIS_MODULE->mkobj.kobj for a module and NULL for built-in code.
If sysfs is enabled and we are dealing with a preallocated instance,
let create_cache() grab that reference and stash it in kmem_cache->owner,
and let slab_kmem_cache_release() drop it instead of freeing the
kmem_cache instance.
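The lifetime rules from the last few paragraphs can be modelled with a toy refcount. The sketch assumes CONFIG_SYSFS=y (otherwise there is no deferred release to worry about); toy_kobj stands in for struct kobject, toy_release() for slab_kmem_cache_release(), and module_can_unmap() for the wait in mod_sysfs_teardown(). Only the owner fields are named in the text above; everything else is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy refcounted object standing in for struct kobject. */
struct toy_kobj {
	int refcount;
	void (*release)(struct toy_kobj *);
};

static void toy_get(struct toy_kobj *k)
{
	k->refcount++;
}

static void toy_put(struct toy_kobj *k)
{
	if (--k->refcount == 0 && k->release)
		k->release(k);
}

struct kmem_cache {
	struct toy_kobj kobj;	/* what sysfs holds references to */
	struct toy_kobj *owner;	/* pinned module kobject, or NULL */
	bool released;
};

struct kmem_cache_args {
	struct kmem_cache *preallocated;	/* hypothetical field name */
	struct toy_kobj *owner;	/* &THIS_MODULE->mkobj.kobj or NULL */
};

/* Stand-in for slab_kmem_cache_release(): runs when kobj hits zero. */
static void toy_release(struct toy_kobj *k)
{
	struct kmem_cache *s = (struct kmem_cache *)
		((char *)k - offsetof(struct kmem_cache, kobj));

	s->released = true;	/* preallocated: nothing to free */
	if (s->owner)		/* drop the module pin as the very last step */
		toy_put(s->owner);
}

/* Stand-in for create_cache() on a preallocated instance. */
static void toy_setup(struct kmem_cache_args *args)
{
	struct kmem_cache *s = args->preallocated;

	s->kobj.refcount = 1;
	s->kobj.release = toy_release;
	s->released = false;
	s->owner = args->owner;
	if (s->owner)
		toy_get(s->owner);
}

/* kmem_cache_destroy() only drops its own reference. */
static void toy_destroy(struct kmem_cache *s)
{
	toy_put(&s->kobj);
}

/* Module data may be unmapped only once its kobject refcount is zero. */
static bool module_can_unmap(const struct toy_kobj *mkobj)
{
	return mkobj->refcount == 0;
}

static struct kmem_cache demo_cache;
```

Scenario: a sysfs reader holds a reference to the cache while rmmod runs. toy_destroy() returns with the release still pending, and because the module pin is dropped only inside toy_release(), the module's kobject cannot reach zero before the static kmem_cache is done with.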
Costs:
* a bit (SLAB_PREALLOCATED) is stolen from slab_flags_t
* such caches can't be merged. If you want them mergeable, don't use this
technique.
* you can't do kmem_cache_setup()/kmem_cache_destroy()/kmem_cache_setup()
on the same instance. Just don't do that.
Al Viro (15):
static kmem_cache instances for core caches
allow static-duration kmem_cache in modules
make mnt_cache static-duration
turn thread_cache static-duration
turn signal_cache static-duration
turn bh_cachep static-duration
turn dentry_cache static-duration
turn files_cachep static-duration
make filp and bfilp caches static-duration
turn sighand_cache static-duration
turn mm_cachep static-duration
turn task_struct_cachep static-duration
turn fs_cachep static-duration
turn inode_cachep static-duration
turn ufs_inode_cache static-duration
Kbuild | 13 +++++-
fs/buffer.c | 6 ++-
fs/dcache.c | 8 ++--
fs/file_table.c | 32 +++++++-------
fs/inode.c | 6 ++-
fs/namespace.c | 6 ++-
fs/ufs/super.c | 9 ++--
include/asm-generic/vmlinux.lds.h | 3 +-
include/linux/fdtable.h | 3 +-
include/linux/fs_struct.h | 3 +-
include/linux/signal.h | 3 +-
include/linux/slab-static.h | 69 +++++++++++++++++++++++++++++++
include/linux/slab.h | 11 +++++
kernel/fork.c | 37 ++++++++++-------
mm/kmem_cache_size.c | 20 +++++++++
mm/slab.h | 1 +
mm/slab_common.c | 44 +++++++++++++-------
mm/slub.c | 7 ++++
18 files changed, 214 insertions(+), 67 deletions(-)
create mode 100644 include/linux/slab-static.h
create mode 100644 mm/kmem_cache_size.c
--
2.47.3
On Sat, 10 Jan 2026, Al Viro wrote:
> 1) as it is, struct kmem_cache is opaque for anything outside of a few
> files in mm/*; that avoids serious headache with header dependencies,
> etc., and it's not something we want to lose. Solution: struct
> kmem_cache_opaque, with the size and alignment identical to struct
> kmem_cache. Calculation of size and alignment can be done via the same
> mechanism we use for asm-offsets.h and rq-offsets.h, with build-time
> check for mismatches. With that done, we get an opaque type defined in
> linux/slab-static.h that can be used for declaring those caches.
> In linux/slab.h we add a forward declaration of kmem_cache_opaque +
> helper (to_kmem_cache()) converting a pointer to kmem_cache_opaque
> into pointer to kmem_cache.
Hmmm. A new kernel infrastructure feature: Opaque objects
Would that deserve a separate abstraction so it is usable by other
subsystems?
On Wed, Jan 14, 2026 at 04:46:04PM -0800, Christoph Lameter (Ampere) wrote:
> On Sat, 10 Jan 2026, Al Viro wrote:
>
> > 1) as it is, struct kmem_cache is opaque for anything outside of a few
> > files in mm/*; that avoids serious headache with header dependencies,
> > etc., and it's not something we want to lose. Solution: struct
> > kmem_cache_opaque, with the size and alignment identical to struct
> > kmem_cache. Calculation of size and alignment can be done via the same
> > mechanism we use for asm-offsets.h and rq-offsets.h, with build-time
> > check for mismatches. With that done, we get an opaque type defined in
> > linux/slab-static.h that can be used for declaring those caches.
> > In linux/slab.h we add a forward declaration of kmem_cache_opaque +
> > helper (to_kmem_cache()) converting a pointer to kmem_cache_opaque
> > into pointer to kmem_cache.
>
> Hmmm. A new kernel infrastructure feature: Opaque objects
>
> Would that deserve a separate abstraction so it is usable by other
> subsystems?
*shrug*
Probably could be done, but I don't see many applications for that.
Note that in this case objects are either of "never destroyed at all"
sort or "never destroyed until rmmod" one, and the latter already
requires a pretty careful handling.
If it's dynamically allocated, we have much more straightforward
mechanisms - see e.g. struct mount vs. struct vfsmount, where most
of the containing object is opaque for everyone outside of several
files in fs/*.c and the public part is embedded into it.
I'm not saying that no other similar cases exist, but until somebody
comes up with other examples...
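The struct mount vs. struct vfsmount arrangement mentioned here is the usual container_of() pattern: the public structure is embedded in a private one, and code that may see the private layout converts back. A toy sketch with made-up names (priv_mount, pub_mount); in fs/mount.h the equivalent conversion is done by real_mount().

```c
#include <assert.h>
#include <stddef.h>

/* Recover the containing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Public part: visible to everyone (cf. struct vfsmount). Toy field. */
struct pub_mount {
	int flags;
};

/* Private container: only a few files see this (cf. struct mount). */
struct priv_mount {
	int count;
	struct pub_mount pub;	/* embedded public part, not at offset 0 */
};

/* Only code that knows struct priv_mount can do this conversion. */
static inline struct priv_mount *to_priv_mount(struct pub_mount *m)
{
	return container_of(m, struct priv_mount, pub);
}
```

This works because the private side does the allocation; it gives outsiders no way to declare a full object with static storage duration, which is the gap kmem_cache_opaque fills.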
On Thu, 15 Jan 2026, Al Viro wrote:
> > Would that deserve a separate abstraction so it is usable by other
> > subsystems?
>
> *shrug*
>
> Probably could be done, but I don't see many applications for that.
> Note that in this case objects are either of "never destroyed at all"
> sort or "never destroyed until rmmod" one, and the latter already
> requires a pretty careful handling.
>
> If it's dynamically allocated, we have much more straightforward
> mechanisms - see e.g. struct mount vs. struct vfsmount, where most
> of the containing object is opaque for everyone outside of several
> files in fs/*.c and the public part is embedded into it.
>
> I'm not saying that no other similar cases exist, but until somebody
> comes up with other examples...
Internal functions exist in the slab allocator that do what you want if
the opaqueness requirement is dropped. F.e. for the creation of kmalloc
caches we use do_kmem_cache_create():
void __init create_boot_cache(struct kmem_cache *s, const char *name,
		unsigned int size, slab_flags_t flags,
		unsigned int useroffset, unsigned int usersize)
{
	int err;
	unsigned int align = ARCH_KMALLOC_MINALIGN;
	struct kmem_cache_args kmem_args = {};
	/*
	 * kmalloc caches guarantee alignment of at least the largest
	 * power-of-two divisor of the size. For power-of-two sizes,
	 * it is the size itself.
	 */
	if (flags & SLAB_KMALLOC)
		align = max(align, 1U << (ffs(size) - 1));
	kmem_args.align = calculate_alignment(flags, align, size);
#ifdef CONFIG_HARDENED_USERCOPY
	kmem_args.useroffset = useroffset;
	kmem_args.usersize = usersize;
#endif
	err = do_kmem_cache_create(s, name, size, &kmem_args, flags);
	if (err)
		panic("Creation of kmalloc slab %s size=%u failed. Reason %d\n",
					name, size, err);
	s->refcount = -1;	/* Exempt from merging for now */
}
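The alignment rule in the comment above (largest power-of-two divisor of the size) is just the lowest set bit of size, which is what 1U << (ffs(size) - 1) computes. A small worked sketch; arch_min stands in for ARCH_KMALLOC_MINALIGN, whose value is architecture-dependent, and the later calculate_alignment() step is not modelled.

```c
#include <assert.h>
#include <strings.h>	/* ffs() (POSIX) */

/* Largest power of two dividing size, i.e. its lowest set bit. size must be nonzero. */
static unsigned int pow2_divisor(unsigned int size)
{
	return 1U << (ffs((int)size) - 1);	/* same expression as create_boot_cache() */
}

/* The SLAB_KMALLOC branch: never go below the architecture minimum. */
static unsigned int kmalloc_min_align(unsigned int size, unsigned int arch_min)
{
	unsigned int a = pow2_divisor(size);

	return a > arch_min ? a : arch_min;
}
```

For example, 96 = 0b1100000 gives 32, 192 gives 64, and 24 gives 8, so a 24-byte kmalloc cache with an 8-byte architecture minimum stays at 8.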
On Thu, Jan 15, 2026 at 11:10:00AM -0800, Christoph Lameter (Ampere) wrote:
> Internal functions exist in the slab allocator that do what you want if
> the opaqueness requirement is dropped. F.e. for the creation of kmalloc
> caches we use do_kmem_cache_create():
Yes, I know. Do you really want to expose e.g. slab_caches and slab_mutex
to the rest of the kernel?
Surgery needed to have __kmem_cache_create() do everything is not large -
see the mm/slab_common.c parts in the first two commits in this series.
On Fri, 9 Jan 2026 at 18:01, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> There's an alternative approach applicable at least to the caches
> that are never destroyed, which covers a lot of them. No matter what,
> runtime_const for pointers is not going to be faster than plain &,
> so if we had struct kmem_cache instances with static storage duration, we
> would be at least no worse off than we are with runtime_const variants.
I like it. Much better than runtime_const for these things.
That said, I don't love the commit messages. "turn xyzzy
static-duration" reads very oddly to me, and because I saw the emails
out of order originally it just made me go "whaa?"
So can we please explain this some more obvious way. Maybe just "Make
xyz be statically allocated". Yes, I'm nitpicking, but I feel like
explaining core patches is worth the effort.
And maybe that's for the sad reason that I read more explanations than
code these days '/
Linus
On Fri, Jan 09, 2026 at 07:33:41PM -1000, Linus Torvalds wrote:
> On Fri, 9 Jan 2026 at 18:01, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > There's an alternative approach applicable at least to the caches
> > that are never destroyed, which covers a lot of them. No matter what,
> > runtime_const for pointers is not going to be faster than plain &,
> > so if we had struct kmem_cache instances with static storage duration, we
> > would be at least no worse off than we are with runtime_const variants.
>
> I like it. Much better than runtime_const for these things.
>
> That said, I don't love the commit messages. "turn xyzzy
> static-duration" reads very oddly to me, and because I saw the emails
> out of order originally it just made me go "whaa?"
>
> So can we please explain this some more obvious way. Maybe just "Make
> xyz be statically allocated". Yes, I'm nitpicking, but I feel like
> explaining core patches is worth the effort.
Point, but TBH the tail of the series is basically a demo for conversions
as well as "this is what I'd been testing, FSVO". In non-RFC form these
would be folded into fewer commits, if nothing else...
I'd really like to hear comments on the first two commits from SLAB
maintainers - for example, if slab_flags_t bits are considered a scarce
resource, the second commit would need to be modified. Still doable, but
representation would be more convoluted...
Another question is whether it's worth checking for accidental call
of e.g. kmem_cache_setup() on an already initialized cache, statically
or dynamically allocated. Again, up to maintainers - their subsystem,
their preferences.
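On the question of catching an accidental second kmem_cache_setup(): for statically allocated instances, one cheap option would rely on static storage being zero-initialised and treat any instance whose name is already set as used. This is purely a hypothetical sketch (toy_cache_setup() and the -EBUSY return are assumptions, not anything in the series), and it does not cover dynamically allocated caches.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy cache; a never-initialised static instance is all zeroes. */
struct kmem_cache {
	const char *name;
	unsigned int size;
	int live;
};

/* Hypothetical checked setup: refuse an instance that was ever set up. */
static int toy_cache_setup(struct kmem_cache *s, const char *name,
			   unsigned int size)
{
	if (s->name != NULL)	/* set up before, destroyed since or not */
		return -EBUSY;
	s->name = name;
	s->size = size;
	s->live = 1;
	return 0;
}

/* Destroy deliberately leaves ->name set, so setup/destroy/setup is caught too. */
static void toy_cache_destroy(struct kmem_cache *s)
{
	s->live = 0;
}

static struct kmem_cache demo_cache;
```

Keeping the marker set after destroy is what makes the forbidden setup/destroy/setup sequence from the cover letter detectable rather than silently corrupting the instance.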
On Sat, Jan 10, 2026 at 06:16:00AM +0000, Al Viro wrote:
> On Fri, Jan 09, 2026 at 07:33:41PM -1000, Linus Torvalds wrote:
> > On Fri, 9 Jan 2026 at 18:01, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > There's an alternative approach applicable at least to the caches
> > > that are never destroyed, which covers a lot of them. No matter what,
> > > runtime_const for pointers is not going to be faster than plain &,
> > > so if we had struct kmem_cache instances with static storage duration, we
> > > would be at least no worse off than we are with runtime_const variants.
> >
> > I like it. Much better than runtime_const for these things.
> >
> > That said, I don't love the commit messages. "turn xyzzy
> > static-duration" reads very oddly to me, and because I saw the emails
> > out of order originally it just made me go "whaa?"
> >
> > So can we please explain this some more obvious way. Maybe just "Make
> > xyz be statically allocated". Yes, I'm nitpicking, but I feel like
> > explaining core patches is worth the effort.
>
> Point, but TBH the tail of the series is basically a demo for conversions
> as well as "this is what I'd been testing, FSVO". In non-RFC form these
> would be folded into fewer commits, if nothing else...
>
> I'd really like to hear comments on the first two commits from SLAB
> maintainers - for example, if slab_flags_t bits are considered a scarce
> resource, the second commit would need to be modified. Still doable, but
I think it's okay to introduce a new cache flag as long as it's simpler.
IMHO it's not a scarce resource (yet).
> representation would be more convoluted...
>
> Another question is whether it's worth checking for accidental call
> of e.g. kmem_cache_setup() on an already initialized cache, statically
> or dynamically allocated.
No strong opinion from me.
> Again, up to maintainers - their subsystem,
> their preferences.
--
Cheers,
Harry / Hyeonggon