kmem_cache instances with static storage duration

[RFC PATCH 00/15] kmem_cache instances with static storage duration

Posted by Al Viro 4 weeks ago

        kmem_cache_create() and friends create new instances of
struct kmem_cache and return pointers to those.  Quite a few things in
core kernel are allocated from such caches; each allocation involves
dereferencing an assign-once pointer and for sufficiently hot ones that
dereferencing does show in profiles.

        There had been patches floating around switching some of those
to runtime_const infrastructure.  Unfortunately, it's arch-specific
and most of the architectures lack it.

        There's an alternative approach applicable at least to the caches
that are never destroyed, which covers a lot of them.  No matter what,
runtime_const for pointers is not going to be faster than plain &,
so if we had struct kmem_cache instances with static storage duration, we
would be at least no worse off than we are with runtime_const variants.
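A userspace model of the cost being described. It assumes nothing about real slab internals: `struct cache`, `dyn_cachep` and `static_cache` are hypothetical stand-ins. In the dynamic variant, every use must first load the assign-once pointer. In the static variant, the object's address is an address constant that the compiler can fold into the access. That is the "plain &" the text compares runtime_const against.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for struct kmem_cache. */
struct cache {
	const char *name;
	size_t object_size;
};

/* Dynamic style: the instance is allocated at init time and only an
 * assign-once pointer to it has a fixed address, so every use has to
 * load the pointer before it can reach the cache. */
static struct cache *dyn_cachep;

/* Static style: the instance itself has static storage duration, so
 * &static_cache is an address constant and no pointer load is needed. */
static struct cache static_cache;

static void init_caches(void)
{
	dyn_cachep = malloc(sizeof(*dyn_cachep));
	if (!dyn_cachep)
		abort();
	dyn_cachep->name = "dyn";
	dyn_cachep->object_size = 64;

	static_cache.name = "static";
	static_cache.object_size = 64;
}

/* Hot path, dynamic: load dyn_cachep, then dereference it. */
static size_t dyn_object_size(void)
{
	return dyn_cachep->object_size;
}

/* Hot path, static: the compiler already knows where static_cache lives. */
static size_t static_object_size(void)
{
	return static_cache.object_size;
}
```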

        There are obstacles to doing that, but they turn out to be easy
to deal with.

1) as it is, struct kmem_cache is opaque for anything outside of a few
files in mm/*; that avoids serious headache with header dependencies,
etc., and it's not something we want to lose.  Solution: struct
kmem_cache_opaque, with the size and alignment identical to struct
kmem_cache.  Calculation of size and alignment can be done via the same
mechanism we use for asm-offsets.h and rq-offsets.h, with build-time
check for mismatches.  With that done, we get an opaque type defined in
linux/slab-static.h that can be used for declaring those caches.
In linux/slab.h we add a forward declaration of kmem_cache_opaque +
helper (to_kmem_cache()) converting a pointer to kmem_cache_opaque
into pointer to kmem_cache.

2) real constructor of kmem_cache needs to be taught to deal with
preallocated instances.  That turns out to be easy - we already pass an
obscene amount of optional arguments via struct kmem_cache_args, so we
can stash the pointer to preallocated instance in there.  Changes in
mm/slab_common.c are very minor - we should treat preallocated caches
as unmergeable, use the instance passed to us instead of allocating a
new one and we should not free them.  That's it.
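A userspace sketch of point 1). The field list of `struct kmem_cache` below is invented, and `KMEM_CACHE_SIZE`/`KMEM_CACHE_ALIGN` are hypothetical names. In the series, size and alignment come from a generated header built with the asm-offsets.h-style mechanism. Here they are taken directly from the private type, so the checks are trivially true, but they show the shape of the build-time mismatch test. The `to_kmem_cache()` signature is assumed.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* "Private" definition; in the kernel it is visible only in a few mm/ files.
 * These fields are placeholders, not the real layout. */
struct kmem_cache {
	unsigned int object_size;
	unsigned int align;
	const char *name;
	void *cpu_slab;
};

/* Hypothetical names for the build-time generated constants. */
#define KMEM_CACHE_SIZE  sizeof(struct kmem_cache)
#define KMEM_CACHE_ALIGN alignof(struct kmem_cache)

/* "Public" opaque type: same size and alignment, no visible fields. */
struct kmem_cache_opaque {
	alignas(KMEM_CACHE_ALIGN) unsigned char bytes[KMEM_CACHE_SIZE];
};

/* Build-time checks for mismatches between the two types. */
_Static_assert(sizeof(struct kmem_cache_opaque) == sizeof(struct kmem_cache),
	       "kmem_cache_opaque size mismatch");
_Static_assert(alignof(struct kmem_cache_opaque) == alignof(struct kmem_cache),
	       "kmem_cache_opaque alignment mismatch");

/* Conversion helper in the spirit of to_kmem_cache(); signature assumed. */
static inline struct kmem_cache *to_kmem_cache(struct kmem_cache_opaque *p)
{
	return (struct kmem_cache *)p;
}

/* A cache instance with static storage duration. */
static struct kmem_cache_opaque demo_cache;
```

Accessing the opaque storage through `struct kmem_cache *` is type punning. The kernel tolerates this because it is built with -fno-strict-aliasing. This model sidesteps the question and only checks addresses and alignment, without writing through the converted pointer.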

	A set of helpers parallel to kmem_cache_create() and friends
(kmem_cache_setup(), etc.) is provided in the same linux/slab-static.h;
generally, conversion affects only a few lines.
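The constructor changes described in point 2) can be modeled roughly as follows. Everything here is hypothetical. `struct cache_args` plays the role of `struct kmem_cache_args` with an extra `prealloc` pointer, and `CACHE_PREALLOCATED` stands in for the SLAB_PREALLOCATED bit. A preallocated instance is used as-is, reported as unmergeable, and never freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_PREALLOCATED 0x1u	/* hypothetical stand-in for SLAB_PREALLOCATED */

struct cache {
	const char *name;
	size_t size;
	unsigned int flags;
};

/* Hypothetical stand-in for struct kmem_cache_args with the extra field. */
struct cache_args {
	size_t align;
	struct cache *prealloc;	/* NULL: allocate a new instance */
};

static int nr_frees;	/* counts real frees, for illustration */

static struct cache *create_cache(const char *name, size_t size,
				  const struct cache_args *args)
{
	struct cache *s = args->prealloc;

	if (s) {
		/* use the instance passed to us instead of allocating one */
		s->flags = CACHE_PREALLOCATED;
	} else {
		s = malloc(sizeof(*s));
		if (!s)
			return NULL;
		s->flags = 0;
	}
	s->name = name;
	s->size = size;
	return s;
}

/* Preallocated caches are never merged with other caches. */
static bool cache_mergeable(const struct cache *s)
{
	return !(s->flags & CACHE_PREALLOCATED);
}

static void release_cache(struct cache *s)
{
	if (s->flags & CACHE_PREALLOCATED)
		return;	/* the storage is not ours to free */
	free(s);
	nr_frees++;
}
```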

	Note that slab-static.h is needed only in places that create
such instances; all users need only slab.h (and they can be modular,
unlike the runtime_const-based approach).
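Why users can get by with only a forward declaration: C allows code to take the address of an extern object whose struct type is incomplete, so the layout does not need to be visible at the point of use. The sketch below is a single file. `foo_cache` and the 64-byte body are placeholders, and in a real build the two halves would live in different files.

```c
#include <assert.h>

/* User side (slab.h-style): only a forward declaration and an extern
 * object.  The type is incomplete here, yet taking the address is fine. */
struct kmem_cache_opaque;
extern struct kmem_cache_opaque foo_cache;	/* hypothetical cache */

static const void *foo_cache_addr(void)
{
	return &foo_cache;	/* just the symbol's address, no pointer load */
}

/* Defining side (slab-static.h-style); the body is a placeholder. */
struct kmem_cache_opaque {
	unsigned char bytes[64];
};
struct kmem_cache_opaque foo_cache;
```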


	That covers the instances that never get destroyed.  Quite a few
fall into that category, but there's a major exception - anything in
modules must be destroyed before the module gets removed.  Note that
unlike the runtime_const-based approach, cache _uses_ in a module are
fine - if kmem_cache_opaque instance is exported, its address is available
to modules without any problems.  It's caches _created_ in a module
that offer an extra twist.

	Teaching kmem_cache_destroy() to skip actual freeing of given
kmem_cache instance is trivial; the problem is that kmem_cache_destroy()
may overlap with sysfs access to attributes of that cache.  In that
case kmem_cache_destroy() may return before the instance gets freed -
freeing (from slab_kmem_cache_release()) happens when the refcount of
embedded kobject drops to zero.  That's fine, since all references
to data structures in module's memory are already gone by the time
kmem_cache_destroy() returns.  That, however, relies upon the struct
kmem_cache itself not being in module's memory; getting it unmapped
before slab_kmem_cache_release() has run needs to be avoided.
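The lifetime issue can be modeled with a plain reference count. This is a hypothetical, single-threaded sketch with invented names and no atomics or locking. Destroy drops the creator's reference. If a sysfs reader still holds a reference, the release callback runs only when that reader lets go, which happens after destroy has already returned.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-in for a kobject: a count plus a release callback. */
struct kobj {
	int refcount;
	void (*release)(struct kobj *);
};

struct cache {
	struct kobj kobj;	/* embedded, first member */
	bool released;
};

static void kobj_get(struct kobj *k)
{
	k->refcount++;
}

static void kobj_put(struct kobj *k)
{
	if (--k->refcount == 0)
		k->release(k);
}

static void cache_release(struct kobj *k)
{
	/* kobj is the first member, so this cast recovers the container */
	struct cache *s = (struct cache *)k;

	s->released = true;	/* real code would free (or unpin) here */
}

static void cache_init(struct cache *s)
{
	s->kobj.refcount = 1;	/* the creator's reference */
	s->kobj.release = cache_release;
	s->released = false;
}

/* Destroy drops the creator's reference; release may happen later. */
static void cache_destroy(struct cache *s)
{
	kobj_put(&s->kobj);
}
```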

	It's not hard to deal with, though.  We need to make sure that
an instance in a module will get to slab_kmem_cache_release() before the
module data gets freed.  That's only a problem on sysfs setups -
otherwise it'll definitely be finished before kmem_cache_destroy()
returns.

	Note that modules themselves have sysfs-exposed attributes,
so a similar problem already exists there.  That's dealt with by
having mod_sysfs_teardown() wait for the refcount of module->mkobj.kobj
to reach zero.  Let's make use of that - have static-duration-in-module
kmem_cache instances grab a reference to that kobject upon setup and
drop it at the end of slab_kmem_cache_release().

	Let the setup helpers store the kobject to be pinned in
kmem_cache_args->owner (for preallocated instances only; if somebody
manually sets it for the non-preallocated case, it'll be ignored).  That would be
&THIS_MODULE->mkobj.kobj for a module and NULL in built-in.

	If sysfs is enabled and we are dealing with a preallocated instance,
let create_cache() grab that reference and stash it in kmem_cache->owner,
and let slab_kmem_cache_release() drop it instead of freeing the kmem_cache
instance.
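The pinning scheme can be put together as a single-threaded model. Here `struct kobj` stands in for a kobject, `mod` for `THIS_MODULE->mkobj.kobj`, and `cache_setup()`/`cache_release()` for the setup helper and `slab_kmem_cache_release()`. All names are hypothetical. Only the preallocated case is modeled, and `sysfs_enabled` replaces the sysfs configuration check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct kobj {
	int refcount;
};

/* NULL-tolerant, so a built-in cache (owner == NULL) needs no special case. */
static void kobj_get(struct kobj *k) { if (k) k->refcount++; }
static void kobj_put(struct kobj *k) { if (k) k->refcount--; }

struct cache {
	struct kobj *owner;	/* pinned module kobject, or NULL */
	int refcount;		/* the cache's own kobject refcount */
};

struct cache_args {
	struct cache *prealloc;	/* the static instance */
	struct kobj *owner;	/* module kobject, or NULL if built-in */
};

static const bool sysfs_enabled = true;	/* stands in for the config check */

static struct cache *cache_setup(const struct cache_args *args)
{
	struct cache *s = args->prealloc;

	s->refcount = 1;
	s->owner = NULL;
	if (sysfs_enabled && args->owner) {
		kobj_get(args->owner);	/* pin the module */
		s->owner = args->owner;
	}
	return s;
}

/* Last step of release: drop the pin instead of freeing the instance. */
static void cache_release(struct cache *s)
{
	kobj_put(s->owner);
}

static void cache_get(struct cache *s) { s->refcount++; }

static void cache_put(struct cache *s)
{
	if (--s->refcount == 0)
		cache_release(s);
}

/* Module teardown may finish only once nobody pins the module kobject. */
static bool module_teardown_may_finish(const struct kobj *mod)
{
	return mod->refcount == 0;
}
```

In this model, the module's own reference can already be gone while a late sysfs reader still pins the module kobject through the cache. Teardown therefore cannot complete until the cache has been released.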


	Costs:
* a bit (SLAB_PREALLOCATED) is stolen from slab_flags_t
* such caches can't be merged.  If you want them mergeable, don't use this
technique.
* you can't do kmem_cache_setup()/kmem_cache_destroy()/kmem_cache_setup()
on the same instance.  Just don't do that.

Al Viro (15):
  static kmem_cache instances for core caches
  allow static-duration kmem_cache in modules
  make mnt_cache static-duration
  turn thread_cache static-duration
  turn signal_cache static-duration
  turn bh_cachep static-duration
  turn dentry_cache static-duration
  turn files_cachep static-duration
  make filp and bfilp caches static-duration
  turn sighand_cache static-duration
  turn mm_cachep static-duration
  turn task_struct_cachep static-duration
  turn fs_cachep static-duration
  turn inode_cachep static-duration
  turn ufs_inode_cache static-duration

 Kbuild                            | 13 +++++-
 fs/buffer.c                       |  6 ++-
 fs/dcache.c                       |  8 ++--
 fs/file_table.c                   | 32 +++++++-------
 fs/inode.c                        |  6 ++-
 fs/namespace.c                    |  6 ++-
 fs/ufs/super.c                    |  9 ++--
 include/asm-generic/vmlinux.lds.h |  3 +-
 include/linux/fdtable.h           |  3 +-
 include/linux/fs_struct.h         |  3 +-
 include/linux/signal.h            |  3 +-
 include/linux/slab-static.h       | 69 +++++++++++++++++++++++++++++++
 include/linux/slab.h              | 11 +++++
 kernel/fork.c                     | 37 ++++++++++-------
 mm/kmem_cache_size.c              | 20 +++++++++
 mm/slab.h                         |  1 +
 mm/slab_common.c                  | 44 +++++++++++++-------
 mm/slub.c                         |  7 ++++
 18 files changed, 214 insertions(+), 67 deletions(-)
 create mode 100644 include/linux/slab-static.h
 create mode 100644 mm/kmem_cache_size.c

-- 
2.47.3

Re: [RFC PATCH 00/15] kmem_cache instances with static storage duration

Posted by Christoph Lameter (Ampere) 3 weeks, 2 days ago

On Sat, 10 Jan 2026, Al Viro wrote:

> 1) as it is, struct kmem_cache is opaque for anything outside of a few
> files in mm/*; that avoids serious headache with header dependencies,
> etc., and it's not something we want to lose.  Solution: struct
> kmem_cache_opaque, with the size and alignment identical to struct
> kmem_cache.  Calculation of size and alignment can be done via the same
> mechanism we use for asm-offsets.h and rq-offsets.h, with build-time
> check for mismatches.  With that done, we get an opaque type defined in
> linux/slab-static.h that can be used for declaring those caches.
> In linux/slab.h we add a forward declaration of kmem_cache_opaque +
> helper (to_kmem_cache()) converting a pointer to kmem_cache_opaque
> into pointer to kmem_cache.

Hmmm. A new kernel infrastructure feature: Opaque objects

Would that deserve a separate abstraction so it is usable by other
subsystems?

Re: [RFC PATCH 00/15] kmem_cache instances with static storage duration

Posted by Al Viro 3 weeks, 2 days ago

On Wed, Jan 14, 2026 at 04:46:04PM -0800, Christoph Lameter (Ampere) wrote:
> On Sat, 10 Jan 2026, Al Viro wrote:
> 
> > 1) as it is, struct kmem_cache is opaque for anything outside of a few
> > files in mm/*; that avoids serious headache with header dependencies,
> > etc., and it's not something we want to lose.  Solution: struct
> > kmem_cache_opaque, with the size and alignment identical to struct
> > kmem_cache.  Calculation of size and alignment can be done via the same
> > mechanism we use for asm-offsets.h and rq-offsets.h, with build-time
> > check for mismatches.  With that done, we get an opaque type defined in
> > linux/slab-static.h that can be used for declaring those caches.
> > In linux/slab.h we add a forward declaration of kmem_cache_opaque +
> > helper (to_kmem_cache()) converting a pointer to kmem_cache_opaque
> > into pointer to kmem_cache.
> 
> Hmmm. A new kernel infrastructure feature: Opaque objects
> 
> Would that deserve a separate abstraction so it is usable by other
> subsystems?

*shrug*

Probably could be done, but I don't see many applications for that.
Note that in this case objects are either of "never destroyed at all"
sort or "never destroyed until rmmod" one, and the latter already
requires a pretty careful handling.

If it's dynamically allocated, we have much more straightforward
mechanisms - see e.g. struct mount vs. struct vfsmount, where most
of the containing object is opaque for everyone outside of several
files in fs/*.c and the public part is embedded into it.
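The struct mount / struct vfsmount arrangement relies on embedding plus container_of(). Below is a userspace sketch with simplified fields. The real structures have many more fields, and the kernel's container_of() also type-checks its argument.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace container_of(): step back from a member to its container. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Public part, visible to everyone (fields simplified). */
struct vfsmount {
	int mnt_flags;
};

/* Private containing object, known only to a few files in fs/. */
struct mount {
	int mnt_id;		/* placeholder private field */
	struct vfsmount mnt;	/* public part embedded */
};

/* Recover the private object from a pointer to its public part. */
static inline struct mount *real_mount(struct vfsmount *mnt)
{
	return container_of(mnt, struct mount, mnt);
}
```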

I'm not saying that no other similar cases exist, but until somebody
comes up with other examples...

Re: [RFC PATCH 00/15] kmem_cache instances with static storage duration

Posted by Christoph Lameter (Ampere) 3 weeks, 1 day ago

On Thu, 15 Jan 2026, Al Viro wrote:

> > Would that deserve a separate abstraction so it is usable by other
> > subsystems?
>
> *shrug*
>
> Probably could be done, but I don't see many applications for that.
> Note that in this case objects are either of "never destroyed at all"
> sort or "never destroyed until rmmod" one, and the latter already
> requires a pretty careful handling.
>
> If it's dynamically allocated, we have much more straightforward
> mechanisms - see e.g. struct mount vs. struct vfsmount, where most
> of the containing object is opaque for everyone outside of several
> files in fs/*.c and the public part is embedded into it.
>
> I'm not saying that no other similar cases exist, but until somebody
> comes up with other examples...

Internal functions exist in the slab allocator that do what you want if
the opaqueness requirement is dropped. F.e. for the creation of kmalloc
caches we use do_kmem_cache_create():

void __init create_boot_cache(struct kmem_cache *s, const char *name,
                unsigned int size, slab_flags_t flags,
                unsigned int useroffset, unsigned int usersize)
{
        int err;
        unsigned int align = ARCH_KMALLOC_MINALIGN;
        struct kmem_cache_args kmem_args = {};

        /*
         * kmalloc caches guarantee alignment of at least the largest
         * power-of-two divisor of the size. For power-of-two sizes,
         * it is the size itself.
         */
        if (flags & SLAB_KMALLOC)
                align = max(align, 1U << (ffs(size) - 1));
        kmem_args.align = calculate_alignment(flags, align, size);

#ifdef CONFIG_HARDENED_USERCOPY
        kmem_args.useroffset = useroffset;
        kmem_args.usersize = usersize;
#endif

        err = do_kmem_cache_create(s, name, size, &kmem_args, flags);

        if (err)
                panic("Creation of kmalloc slab %s size=%u failed. Reason %d\n",
                                        name, size, err);

        s->refcount = -1;       /* Exempt from merging for now */
}
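The alignment line in create_boot_cache() picks the largest power of two that divides the size. ffs() returns the 1-based index of the lowest set bit, so `1U << (ffs(size) - 1)` isolates that bit. The userspace sketch below covers just that rule. `MINALIGN` is a hypothetical stand-in for ARCH_KMALLOC_MINALIGN, and the later calculate_alignment() step is left out.

```c
#include <assert.h>
#include <strings.h>	/* ffs() (POSIX) */

#define MINALIGN 8u	/* hypothetical stand-in for ARCH_KMALLOC_MINALIGN */

/* Largest power of two dividing size; size must be non-zero. */
static unsigned int pow2_divisor(unsigned int size)
{
	return 1u << (ffs((int)size) - 1);
}

/* max(MINALIGN, largest power-of-two divisor of size) */
static unsigned int kmalloc_align(unsigned int size)
{
	unsigned int d = pow2_divisor(size);

	return d > MINALIGN ? d : MINALIGN;
}
```

For example, a 96-byte cache gets 32-byte alignment and a 192-byte cache gets 64. A 24-byte cache falls back to the minimum of 8.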

Re: [RFC PATCH 00/15] kmem_cache instances with static storage duration

Posted by Al Viro 3 weeks, 1 day ago

On Thu, Jan 15, 2026 at 11:10:00AM -0800, Christoph Lameter (Ampere) wrote:

> Internal functions exist in the slab allocator that do what you want if
> the opaqueness requirement is dropped. F.e. for the creation of kmalloc
> caches we use do_kmem_cache_create():

Yes, I know.  Do you really want to expose e.g. slab_caches and slab_mutex
to the rest of the kernel?  Surgery needed to have __kmem_cache_create()
do everything is not large - see the mm/slab_common.c parts in the first
two commits in this series.

Re: [RFC PATCH 00/15] kmem_cache instances with static storage duration

Posted by Linus Torvalds 4 weeks ago

On Fri, 9 Jan 2026 at 18:01, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
>         There's an alternative approach applicable at least to the caches
> that are never destroyed, which covers a lot of them.  No matter what,
> runtime_const for pointers is not going to be faster than plain &,
> so if we had struct kmem_cache instances with static storage duration, we
> would be at least no worse off than we are with runtime_const variants.

I like it. Much better than runtime_const for these things.

That said, I don't love the commit messages. "turn xyzzy
static-duration" reads very oddly to me, and because I saw the emails
out of order originally it just made me go "whaa?"

So can we please explain this some more obvious way. Maybe just "Make
xyz be statically allocated". Yes, I'm nitpicking, but I feel like
explaining core patches is worth the effort.

And maybe that's for the sad reason that I read more explanations than
code these days '/

                Linus

Re: [RFC PATCH 00/15] kmem_cache instances with static storage duration

Posted by Al Viro 4 weeks ago

On Fri, Jan 09, 2026 at 07:33:41PM -1000, Linus Torvalds wrote:
> On Fri, 9 Jan 2026 at 18:01, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> >         There's an alternative approach applicable at least to the caches
> > that are never destroyed, which covers a lot of them.  No matter what,
> > runtime_const for pointers is not going to be faster than plain &,
> > so if we had struct kmem_cache instances with static storage duration, we
> > would be at least no worse off than we are with runtime_const variants.
> 
> I like it. Much better than runtime_const for these things.
> 
> That said, I don't love the commit messages. "turn xyzzy
> static-duration" reads very oddly to me, and because I saw the emails
> out of order originally it just made me go "whaa?"
> 
> So can we please explain this some more obvious way. Maybe just "Make
> xyz be statically allocated". Yes, I'm nitpicking, but I feel like
> explaining core patches is worth the effort.

Point, but TBH the tail of the series is basically a demo for conversions
as well as "this is what I'd been testing, FSVO".  In non-RFC form these
would be folded into fewer commits, if nothing else...

I'd really like to hear comments on the first two commits from SLAB
maintainers - for example, if slab_flags_t bits are considered a scarce
resource, the second commit would need to be modified.  Still doable, but
representation would be more convoluted...

Another question is whether it's worth checking for accidental call
of e.g. kmem_cache_setup() on an already initialized cache, statically
or dynamically allocated.  Again, up to maintainers - their subsystem,
their preferences.
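One possible shape for such a check, sketched in userspace with a hypothetical `initialized` marker. Objects with static storage duration start out zeroed, so a never-initialized instance is recognizable and a second setup can be refused. If destroy left the marker set, the same check would also catch setup/destroy/setup on one instance. Whether real code should refuse, WARN, or skip the check entirely is the open question raised above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cache with a marker recording that setup has run. */
struct cache {
	bool initialized;
	unsigned int size;
};

/* Returns 0 on success, -1 if this instance was already set up.
 * A kernel version might return an errno or WARN instead. */
static int cache_setup_checked(struct cache *s, unsigned int size)
{
	if (s->initialized)
		return -1;	/* accidental second setup */
	s->size = size;
	s->initialized = true;
	return 0;
}
```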

Re: [RFC PATCH 00/15] kmem_cache instances with static storage duration

Posted by Harry Yoo 3 weeks, 3 days ago

On Sat, Jan 10, 2026 at 06:16:00AM +0000, Al Viro wrote:
> On Fri, Jan 09, 2026 at 07:33:41PM -1000, Linus Torvalds wrote:
> > On Fri, 9 Jan 2026 at 18:01, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > >         There's an alternative approach applicable at least to the caches
> > > that are never destroyed, which covers a lot of them.  No matter what,
> > > runtime_const for pointers is not going to be faster than plain &,
> > > so if we had struct kmem_cache instances with static storage duration, we
> > > would be at least no worse off than we are with runtime_const variants.
> > 
> > I like it. Much better than runtime_const for these things.
> > 
> > That said, I don't love the commit messages. "turn xyzzy
> > static-duration" reads very oddly to me, and because I saw the emails
> > out of order originally it just made me go "whaa?"
> > 
> > So can we please explain this some more obvious way. Maybe just "Make
> > xyz be statically allocated". Yes, I'm nitpicking, but I feel like
> > explaining core patches is worth the effort.
> 
> Point, but TBH the tail of the series is basically a demo for conversions
> as well as "this is what I'd been testing, FSVO".  In non-RFC form these
> would be folded into fewer commits, if nothing else...
> 
> I'd really like to hear comments on the first two commits from SLAB
> maintainers - for example, if slab_flags_t bits are considered a scarce
> resource, the second commit would need to be modified.  Still doable, but

I think it's okay to introduce a new cache flag as long as it's simpler.
IMHO it's not a scarce resource (yet).

> representation would be more convoluted...
> 
> Another question is whether it's worth checking for accidental call
> of e.g. kmem_cache_setup() on an already initialized cache, statically
> or dynamically allocated.

No strong opinion from me.

> Again, up to maintainers - their subsystem,
> their preferences.

-- 
Cheers,
Harry / Hyeonggon