Background
==========
Sheaves were introduced in v6.18, and starting from v7.0, they are
enabled for all slab caches (except for kmem_cache{,_node}). In the
pre-sheaves era, there was a cpu_partial parameter to tune the number
of objects cached per CPU. However, sheaves don't have an equivalent
and the sheaf capacity is determined in the kernel code.
The goal is to allow tuning sheaves at runtime by the next LTS.
Overview
========
This patchset does two main things:
1. Make the sheaf_capacity sysfs attribute writable so that the number
of objects cached per CPU can be changed at runtime, and
2. Expose MAX_FULL_SHEAVES and MAX_EMPTY_SHEAVES as sysfs attributes
rather than constants, so that users can tune them.
Measuring the performance impact of these tunables is TBD.
Roughly, the sequence to change sheaf_capacity is as follows:
1. Disable sheaves. Make all online CPUs replace their main sheaves
with the bootstrap sheaf under local_lock and wait for completion.
2. Wait for all in-flight RCU callbacks to be processed.
3. Flush and free all existing sheaves.
4. Re-enable sheaves with a new capacity.
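The four steps above can be sketched as a single-threaded userspace model. This is only an illustration of the ordering, not the code from the patches: all model_* names are hypothetical, the per-CPU local_lock and the RCU wait are not modelled, and old sheaves are assumed to have been flushed already.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for the model */

struct model_sheaf {
	unsigned short capacity;
	unsigned short size;
	void *objects[];
};

/* Zero-capacity placeholder; a CPU pointing here has sheaves disabled. */
static struct model_sheaf model_bootstrap_sheaf;

struct model_cache {
	unsigned short sheaf_capacity;
	struct model_sheaf *main[MODEL_NR_CPUS];
};

static void model_cache_init(struct model_cache *s)
{
	s->sheaf_capacity = 0;
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		s->main[cpu] = &model_bootstrap_sheaf;
}

static struct model_sheaf *model_alloc_sheaf(unsigned short capacity)
{
	struct model_sheaf *sheaf;

	sheaf = calloc(1, sizeof(*sheaf) + capacity * sizeof(void *));
	if (sheaf)
		sheaf->capacity = capacity;
	return sheaf;
}

/* Returns 0 on success, -1 if re-enabling failed on some CPU. */
static int model_set_capacity(struct model_cache *s, unsigned short new_capacity)
{
	struct model_sheaf *old[MODEL_NR_CPUS];
	int cpu;

	/* Step 1: disable sheaves by swapping in the bootstrap sheaf. */
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		old[cpu] = s->main[cpu];
		s->main[cpu] = &model_bootstrap_sheaf;
	}

	/* Step 2: waiting for in-flight RCU callbacks is not modelled. */

	/* Step 3: free the old sheaves (assumed to be flushed already). */
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (old[cpu] != &model_bootstrap_sheaf)
			free(old[cpu]);

	/* Step 4: re-enable sheaves with the new capacity. */
	s->sheaf_capacity = new_capacity;
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		struct model_sheaf *sheaf = model_alloc_sheaf(new_capacity);

		if (!sheaf)
			return -1;	/* remaining CPUs stay on the bootstrap sheaf */
		s->main[cpu] = sheaf;
	}
	return 0;
}
```

In this model, an allocation failure in step 4 leaves the affected CPUs on the bootstrap sheaf rather than aborting, which is the behaviour a non-panicking re-enable would need.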
Challenges
==========
1. Allocations and frees can happen concurrently at any point between
these steps, and we cannot introduce heavyweight synchronization
mechanisms on the fastpath.
2. Currently, cache_has_sheaves() checks whether a cache has sheaves.
This works now because sheaves cannot be enabled or disabled once
the cache is created.
The question "Does this cache have sheaves?" should be split into
"Does this cache support sheaves?" and "Does this CPU actually have
sheaves enabled right now?".
3. Once the sheaf capacity update is complete, no sheaf with a stale
capacity may remain. Flushing and freeing all existing sheaves is
relatively simple, but under the current design it is quite
challenging to prevent sheaves with a stale capacity from being
installed in the pcs or the barn. Reading s->sheaf_capacity without
an expensive synchronization primitive is racy.
Patch 6 introduces a copy of s->sheaf_capacity to struct
slub_percpu_sheaves to address this. pcs->capacity is copied from
s->sheaf_capacity and it is stable under local_lock. If
s->sheaf_capacity and pcs->capacity don't match, the sheaf_capacity
writer is responsible for flushing and freeing them before completing
the process.
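A minimal userspace sketch of that rule follows. It assumes the caller already holds the per-CPU lock, so pcs->capacity cannot change underneath it. The model_* types and helpers are hypothetical stand-ins, not the structures or functions from patch 6.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_sheaf {
	unsigned short capacity;
	unsigned short size;
};

struct model_pcs {
	unsigned short capacity;	/* copy of the cache-wide value */
	struct model_sheaf *main;
};

struct model_cache {
	unsigned short sheaf_capacity;	/* cache-wide value, changed by the writer */
};

/*
 * Install a sheaf as the main sheaf only if its capacity matches the
 * per-CPU copy. The caller is assumed to hold the per-CPU lock.
 */
static bool model_install_main(struct model_pcs *pcs, struct model_sheaf *sheaf)
{
	if (sheaf->capacity != pcs->capacity)
		return false;	/* stale sheaf, caller must discard it */
	pcs->main = sheaf;
	return true;
}

/* A mismatch means the capacity writer still has to flush this CPU. */
static bool model_pcs_needs_flush(const struct model_cache *s,
				  const struct model_pcs *pcs)
{
	return s->sheaf_capacity != pcs->capacity;
}
```

The point of the per-CPU copy in this model is that the fastpath compares against a value that is stable under a lock it already takes, instead of reading the cache-wide field racily.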
Patch Sequence
==============
Patch 1-3: A per-sheaf capacity is required for the following steps,
but I didn't want to grow struct slab_sheaf. So patch 1 drops the cache
pointer (which was used only on the slowpath), patch 2 changes
sheaf_capacity from unsigned int to unsigned short, and patch 3 adds
per-sheaf capacity.
In fact, struct slab_sheaf ends up smaller after those patches.
After (24 bytes, excluding the objects flex array):
	struct slab_sheaf {
		union {
			struct rcu_head rcu_head;
			struct list_head barn_list;
			bool pfmemalloc;
		};
		unsigned short capacity;
		unsigned short size;
		int node;
		void *objects[];
	};
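The 24-byte figure can be checked in userspace with stand-in types. The sketch below assumes an LP64 target and that struct rcu_head and struct list_head are two pointers each. The stand-in definitions and the model_sheaf_alloc_size() helper are illustrative, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins; assumed layout is two pointers each. */
struct list_head {
	struct list_head *next, *prev;
};

struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

struct slab_sheaf {
	union {
		struct rcu_head rcu_head;
		struct list_head barn_list;
		bool pfmemalloc;
	};
	unsigned short capacity;
	unsigned short size;
	int node;
	void *objects[];
};

/* On LP64: 16 (union) + 2 + 2 + 4 = 24 bytes before the flex array. */
_Static_assert(sizeof(void *) != 8 || sizeof(struct slab_sheaf) == 24,
	       "header is expected to be 24 bytes on LP64");

/* Bytes needed for a sheaf that can hold 'capacity' object pointers. */
static size_t model_sheaf_alloc_size(unsigned short capacity)
{
	return sizeof(struct slab_sheaf) + (size_t)capacity * sizeof(void *);
}
```

With 8-byte pointers, a sheaf of capacity 32 would then need 24 + 32 * 8 = 280 bytes under these assumptions.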
Patch 4 allows bootstrap_cache_sheaves() to fail so that it can be
used to re-enable sheaves without panicking the kernel.
Patch 5 splits cache_has_sheaves() into cache_supports_sheaves()
and pcs_has_sheaves().
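One way to picture the split is below, as a userspace model only. It assumes that "supports sheaves" is a property fixed at cache creation, and that a CPU has sheaves enabled whenever its main sheaf is not the bootstrap sheaf. The real predicates in patch 5 may be implemented differently, and all model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct model_sheaf {
	unsigned short capacity;
	unsigned short size;
};

/* Zero-capacity placeholder used while sheaves are disabled. */
static struct model_sheaf model_bootstrap_sheaf;

struct model_cache {
	bool supports_sheaves;		/* immutable after creation */
};

struct model_pcs {
	struct model_sheaf *main;	/* changes while capacity is being updated */
};

/* Immutable property: safe to check without any lock. */
static bool model_cache_supports_sheaves(const struct model_cache *s)
{
	return s->supports_sheaves;
}

/* Per-CPU state: the caller is assumed to hold the per-CPU lock. */
static bool model_pcs_has_sheaves(const struct model_pcs *pcs)
{
	return pcs->main != &model_bootstrap_sheaf;
}
```

In this model a cache can support sheaves while a given CPU temporarily has none, which is the state a capacity update passes through.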
Patch 6 enables tuning the sheaf capacity at runtime.
Patch 7 adds lockdep asserts to verify the new rule "Always hold
local_lock when accessing the barn" to make sure there is no sheaf
with stale capacity.
Patch 8 turns MAX_FULL_SHEAVES and MAX_EMPTY_SHEAVES into sysfs
attributes (max_full_sheaves, max_empty_sheaves) and allows tuning.
RFC V1 is also available in git at:
https://git.kernel.org/pub/scm/linux/kernel/git/harry/linux.git/log/?h=sheaves-tuning-rfc-v1r1
Signed-off-by: Harry Yoo (Oracle) <harry@kernel.org>
---
Harry Yoo (Oracle) (8):
mm/slab: do not store cache pointer in struct slab_sheaf
mm/slab: change sheaf_capacity type to unsigned short
mm/slab: track capacity per sheaf
mm/slab: allow bootstrap_cache_sheaves() to fail
mm/slab: rework cache_has_sheaves() to check immutable properties only
mm/slab: allow changing sheaf_capacity at runtime
mm/slab: add pcs->lock lockdep assert when accessing the barn
mm/slab: allow changing max_{full,empty}_sheaves at runtime
include/linux/slab.h | 8 +-
mm/slab.h | 40 ++-
mm/slab_common.c | 2 +-
mm/slub.c | 715 ++++++++++++++++++++++++++++++-------------
tools/include/linux/slab.h | 14 +-
tools/testing/shared/linux.c | 4 +-
6 files changed, 563 insertions(+), 220 deletions(-)
---
base-commit: e98d21c170b01ddef366f023bbfcf6b31509fa83
change-id: 20260515-sheaves-tuning-e1f897dc7f5e
Best regards,
--
Cheers,
Harry / Hyeonggon
On Sat, May 16, 2026 at 01:24:24AM +0900, Harry Yoo (Oracle) wrote:
> Background
> ==========
>
> Sheaves were introduced in v6.18, and starting from v7.0, they are
> enabled for all slab caches (except for kmem_cache{,_node}). In the
> pre-sheaves era, there was a cpu_partial parameter to tune the number
> of objects cached per CPU. However, sheaves don't have an equivalent
> and the sheaf capacity is determined in the kernel code.
What semantic do you need from this?
>
> The goal is to allow tuning sheaves at runtime by the next LTS.
>
> Overview
> ========
>
> This patchset does two main things:
>
> 1. Make the sheaf_capacity sysfs attribute writable so that the number
> of objects cached per CPU can be changed at runtime, and
>
> 2. Expose MAX_FULL_SHEAVES and MAX_EMPTY_SHEAVES as sysfs attributes
> rather than constants, so that users can tune them.
>
> Measuring the performance impact of these tunables is TBD.
>
> Roughly, the sequence to change sheaf_capacity is as follows:
>
> 1. Disable sheaves. Make all online CPUs replace their main sheaves
> with the bootstrap sheaf under local_lock and wait for completion.
This is extremely destabilizing performance-wise, were I to guess.
>
> 2. Wait for all in-flight RCU callbacks to be processed.
and this too.
>
> 3. Flush and free all existing sheaves.
>
> 4. Re-enable sheaves with a new capacity.
>
> Challenges
> ==========
>
> 1. Allocations and frees can happen concurrently at any point between
> these steps, and we cannot introduce heavyweight synchronization
> mechanisms on the fastpath.
>
> 2. Currently, cache_has_sheaves() checks whether a cache has sheaves.
> This works now because sheaves cannot be enabled or disabled once
> the cache is created.
>
> The question "Does this cache has sheaves?" should be split into
> "Does this cache support sheaves?" and "Does this CPU actually has
> sheaves enabled right now?".
>
> 3. Once the sheaf capacity update is complete, no sheaf with stale
> capacity must remain.
Why? I don't see a huge problem with having multiple sheaves with different
capacities, as long as you adequately, opportunistically kill the sheaves
if they don't have the desired size (say, once a sheaf is fully empty).
--
Pedro
On 5/18/26 8:52 PM, Pedro Falcato wrote:
> On Sat, May 16, 2026 at 01:24:24AM +0900, Harry Yoo (Oracle) wrote:
>> Background
>> ==========
>>
>> Sheaves were introduced in v6.18, and starting from v7.0, they are
>> enabled for all slab caches (except for kmem_cache{,_node}). In the
>> pre-sheaves era, there was a cpu_partial parameter to tune the number
>> of objects cached per CPU. However, sheaves don't have an equivalent
>> and the sheaf capacity is determined in the kernel code.
>
> What semantic do you need from this?
The intent is to allow adjusting sheaf capacity to mitigate per-node
barn / slab list contention on the slowpath (for servers with many
CPUs), similar to the 'cpu_partial' tunable in SLUB and the 'limit'
tunable in SLAB.
However, the semantics are slightly different from 'cpu_partial' and
'limit', as changing sheaf_capacity also affects the number of objects
cached in the barn.
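As a rough illustration of that coupling: if a barn keeps at most some number of full sheaves, the upper bound on objects it caches scales linearly with the sheaf capacity. The helper below is a hypothetical back-of-the-envelope model, and the values used with it are examples rather than kernel defaults.

```c
#include <assert.h>

/*
 * Upper bound on objects cached in one barn, assuming it holds at most
 * 'max_full_sheaves' full sheaves of 'sheaf_capacity' objects each.
 * Empty sheaves hold no objects and do not contribute.
 */
static unsigned long model_barn_max_cached(unsigned int max_full_sheaves,
					   unsigned short sheaf_capacity)
{
	return (unsigned long)max_full_sheaves * sheaf_capacity;
}
```

For example, with 10 full sheaves the bound is 320 objects at capacity 32 and 640 at capacity 64, so doubling the capacity doubles what the barn may hold.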
>> Challenges
>> ==========
>>
>> 1. Allocations and frees can happen concurrently at any point between
>> these steps, and we cannot introduce heavyweight synchronization
>> mechanisms on the fastpath.
>>
>> 2. Currently, cache_has_sheaves() checks whether a cache has sheaves.
>> This works now because sheaves cannot be enabled or disabled once
>> the cache is created.
>>
>> The question "Does this cache has sheaves?" should be split into
>> "Does this cache support sheaves?" and "Does this CPU actually has
>> sheaves enabled right now?".
>>
>> 3. Once the sheaf capacity update is complete, no sheaf with stale
>> capacity must remain.
>
> Why? I don't see a huge problem with having multiple sheaves with different
> capacities, as long as you adequately, opportunistically kill the sheaves
> if they don't have the desired size (say, once a sheaf is fully empty).
Haha, you got me.
Right, enforcing a single capacity at any given point introduced so much
complexity that I started wondering myself about whether this is really
essential.
My main concern was that the performance characteristics would become
too unpredictable, but actually, users can avoid that by disabling
sheaves, shrinking it, and re-enabling it. So that's not sufficient
justification.
When I first started, I was quite cautious and obsessed with the
invariant because many parts of the current implementation assume "a
kmem_cache has only a single capacity, and it doesn't change", but
that's also addressed by this patchset. So that's not a big issue either.
I agree that it is worth trying to allow sheaves of different capacities
and hopefully that would be less intrusive. Let's see.
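A minimal sketch of the opportunistic approach, assuming a sheaf is only replaced once it is fully empty and its capacity differs from the desired one. This is a userspace model with hypothetical names, not a proposed implementation.

```c
#include <assert.h>
#include <stdlib.h>

struct model_sheaf {
	unsigned short capacity;
	unsigned short size;
	void *objects[];
};

static struct model_sheaf *model_alloc_sheaf(unsigned short capacity)
{
	struct model_sheaf *sheaf;

	sheaf = calloc(1, sizeof(*sheaf) + capacity * sizeof(void *));
	if (sheaf)
		sheaf->capacity = capacity;
	return sheaf;
}

/*
 * Return the sheaf to keep using. A stale sheaf is retired only once it
 * is empty; a non-empty stale sheaf keeps being used until it drains.
 */
static struct model_sheaf *model_refresh_if_empty(struct model_sheaf *sheaf,
						  unsigned short wanted)
{
	struct model_sheaf *fresh;

	if (sheaf->size != 0 || sheaf->capacity == wanted)
		return sheaf;

	fresh = model_alloc_sheaf(wanted);
	if (!fresh)
		return sheaf;	/* keep the stale sheaf if allocation fails */

	free(sheaf);
	return fresh;
}
```

In this model sheaves of different capacities coexist for a while, and no global synchronization is needed to converge on the new capacity.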
Thank you, Pedro.
--
Cheers,
Harry / Hyeonggon