After we submit the rcu_free sheaves to call_rcu() we need to make sure
the rcu callbacks complete. kvfree_rcu_barrier() does that via
flush_all_rcu_sheaves() but kvfree_rcu_barrier_on_cache() doesn't. Fix
that.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202601121442.c530bed3-lkp@intel.com
Fixes: 0f35040de593 ("mm/slab: introduce kvfree_rcu_barrier_on_cache() for cache destruction")
Cc: stable@vger.kernel.org
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/slab_common.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index eed7ea556cb1..ee994ec7f251 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -2133,8 +2133,11 @@ EXPORT_SYMBOL_GPL(kvfree_rcu_barrier);
*/
void kvfree_rcu_barrier_on_cache(struct kmem_cache *s)
{
- if (s->cpu_sheaves)
+ if (s->cpu_sheaves) {
flush_rcu_sheaves_on_cache(s);
+ rcu_barrier();
+ }
+
/*
* TODO: Introduce a version of __kvfree_rcu_barrier() that works
* on a specific slab cache.
--
2.52.0
On Mon, Jan 12, 2026 at 04:16:55PM +0100, Vlastimil Babka wrote:
> After we submit the rcu_free sheaves to call_rcu() we need to make sure
> the rcu callbacks complete. kvfree_rcu_barrier() does that via
> flush_all_rcu_sheaves() but kvfree_rcu_barrier_on_cache() doesn't. Fix
> that.
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202601121442.c530bed3-lkp@intel.com
> Fixes: 0f35040de593 ("mm/slab: introduce kvfree_rcu_barrier_on_cache() for cache destruction")
> Cc: stable@vger.kernel.org
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
LGTM,
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
and I reproduced it locally and this resolves the issue, so:
Tested-by: Harry Yoo <harry.yoo@oracle.com>
--
Cheers,
Harry / Hyeonggon
On Mon, Jan 12, 2026 at 04:16:55PM +0100, Vlastimil Babka wrote:
> After we submit the rcu_free sheaves to call_rcu() we need to make sure
> the rcu callbacks complete. kvfree_rcu_barrier() does that via
> flush_all_rcu_sheaves() but kvfree_rcu_barrier_on_cache() doesn't. Fix
> that.
Oops, my bad.
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202601121442.c530bed3-lkp@intel.com
> Fixes: 0f35040de593 ("mm/slab: introduce kvfree_rcu_barrier_on_cache() for cache destruction")
> Cc: stable@vger.kernel.org
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
The fix looks good to me, but I wonder why
`if (s->sheaf_capacity) rcu_barrier();` in __kmem_cache_shutdown()
didn't prevent the bug from happening?
> mm/slab_common.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index eed7ea556cb1..ee994ec7f251 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -2133,8 +2133,11 @@ EXPORT_SYMBOL_GPL(kvfree_rcu_barrier);
> */
> void kvfree_rcu_barrier_on_cache(struct kmem_cache *s)
> {
> - if (s->cpu_sheaves)
> + if (s->cpu_sheaves) {
> flush_rcu_sheaves_on_cache(s);
> + rcu_barrier();
> + }
> +
> /*
> * TODO: Introduce a version of __kvfree_rcu_barrier() that works
> * on a specific slab cache.
>
> --
> 2.52.0
>
--
Cheers,
Harry / Hyeonggon
On 1/13/26 3:08 AM, Harry Yoo wrote:
> On Mon, Jan 12, 2026 at 04:16:55PM +0100, Vlastimil Babka wrote:
>> After we submit the rcu_free sheaves to call_rcu() we need to make sure
>> the rcu callbacks complete. kvfree_rcu_barrier() does that via
>> flush_all_rcu_sheaves() but kvfree_rcu_barrier_on_cache() doesn't. Fix
>> that.
>
> Oops, my bad.
>
>> Reported-by: kernel test robot <oliver.sang@intel.com>
>> Closes: https://lore.kernel.org/oe-lkp/202601121442.c530bed3-lkp@intel.com
>> Fixes: 0f35040de593 ("mm/slab: introduce kvfree_rcu_barrier_on_cache() for cache destruction")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> ---
>
> The fix looks good to me, but I wonder why
> `if (s->sheaf_capacity) rcu_barrier();` in __kmem_cache_shutdown()
> didn't prevent the bug from happening?
Hmm good point, didn't notice it's there.
I think it doesn't help because it happens only after
flush_all_cpus_locked(). And the callback from rcu_free_sheaf_nobarn()
will do sheaf_flush_unused() and end up installing the cpu slab again.
Note the bot flagged commit "slab: add sheaves to most caches", where
cpu slabs still exist. It's thus possible that with the full series
the bug is gone, but we should prevent it upfront anyway. The
rcu_barrier() in __kmem_cache_shutdown(), however, is probably
unnecessary then, and we can remove it, right?
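
For context, the ordering in question, heavily simplified (paraphrased
from memory, not verbatim mm/slub.c at that point in the series, so
details may differ):

int __kmem_cache_shutdown(struct kmem_cache *s)
{
	int node;
	struct kmem_cache_node *n;

	flush_all_cpus_locked(s);	/* cpu slabs/sheaves flushed first */
	if (s->sheaf_capacity)
		rcu_barrier();		/* waits only here, after the flush */

	/* tear down partial lists; leftover objects make this fail */
	for_each_kmem_cache_node(s, node, n) {
		free_partial(s, n);
		if (n->nr_partial || node_nr_slabs(n))
			return 1;
	}
	return 0;
}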
>> mm/slab_common.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>> index eed7ea556cb1..ee994ec7f251 100644
>> --- a/mm/slab_common.c
>> +++ b/mm/slab_common.c
>> @@ -2133,8 +2133,11 @@ EXPORT_SYMBOL_GPL(kvfree_rcu_barrier);
>> */
>> void kvfree_rcu_barrier_on_cache(struct kmem_cache *s)
>> {
>> - if (s->cpu_sheaves)
>> + if (s->cpu_sheaves) {
>> flush_rcu_sheaves_on_cache(s);
>> + rcu_barrier();
>> + }
>> +
>> /*
>> * TODO: Introduce a version of __kvfree_rcu_barrier() that works
>> * on a specific slab cache.
>>
>> --
>> 2.52.0
>>
>
On Tue, Jan 13, 2026 at 10:32:33AM +0100, Vlastimil Babka wrote:
> On 1/13/26 3:08 AM, Harry Yoo wrote:
> > On Mon, Jan 12, 2026 at 04:16:55PM +0100, Vlastimil Babka wrote:
> >> After we submit the rcu_free sheaves to call_rcu() we need to make sure
> >> the rcu callbacks complete. kvfree_rcu_barrier() does that via
> >> flush_all_rcu_sheaves() but kvfree_rcu_barrier_on_cache() doesn't. Fix
> >> that.
> >
> > Oops, my bad.
> >
> >> Reported-by: kernel test robot <oliver.sang@intel.com>
> >> Closes: https://lore.kernel.org/oe-lkp/202601121442.c530bed3-lkp@intel.com
> >> Fixes: 0f35040de593 ("mm/slab: introduce kvfree_rcu_barrier_on_cache() for cache destruction")
> >> Cc: stable@vger.kernel.org
> >> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> >> ---
> >
> > The fix looks good to me, but I wonder why
> > `if (s->sheaf_capacity) rcu_barrier();` in __kmem_cache_shutdown()
> > didn't prevent the bug from happening?
>
> Hmm good point, didn't notice it's there.
>
> I think it doesn't help because it happens only after
> flush_all_cpus_locked(). And the callback from rcu_free_sheaf_nobarn()
> will do sheaf_flush_unused() and end up installing the cpu slab again.
I thought about it a little bit more...
It's not because a cpu slab was installed again (for list_slab_objects()
to be called on a slab, it must be on n->partial list), but because
flush_slab() cannot handle concurrent frees to the cpu slab.

CPU X                                CPU Y

- flush_slab() reads
  c->freelist
                                     rcu_free_sheaf_nobarn()
                                     ->sheaf_flush_unused()
                                     ->__kmem_cache_free_bulk()
                                     ->do_slab_free()
                                     -> sees slab == c->slab
                                     -> frees to c->freelist
- c->slab = NULL,
  c->freelist = NULL
- call deactivate_slab()
^ the object freed by sheaf_flush_unused() is leaked,
thus slab->inuse != 0

That said, flush_slab() works fine only when it is guaranteed that
there will be no concurrent frees to the cpu slab (acquiring the
local_lock in flush_slab() doesn't help because the free fastpath
doesn't take it).

Calling rcu_barrier() before flush_all_cpus_locked() ensures
there will be no concurrent frees.
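
For reference, flush_slab() looks roughly like this (paraphrased from
mm/slub.c, so details may differ; the comments mark the window being
discussed):

static void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
{
	unsigned long flags;
	struct slab *slab;
	void *freelist;

	local_lock_irqsave(&s->cpu_slab->lock, flags);

	/* snapshot the cpu slab state ... */
	slab = c->slab;
	freelist = c->freelist;

	/* ... then detach it; the local_lock does not exclude the
	 * lockless free fastpath, which is the concern raised above */
	c->slab = NULL;
	c->freelist = NULL;
	c->tid = next_tid(c->tid);

	local_unlock_irqrestore(&s->cpu_slab->lock, flags);

	if (slab)
		deactivate_slab(s, slab, freelist);
}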
A side question; I'm not sure how __kmem_cache_shrink(),
validate_slab_cache(), cpu_partial_store() are supposed to work
correctly? They call flush_all() without guaranteeing there will be
no concurrent frees to the cpu slab.
...probably doesn't matter after sheaves-for-all :)
> Note the bot flagged commit "slab: add sheaves to most caches", where
> cpu slabs still exist. It's thus possible that with the full series
> the bug is gone, but we should prevent it upfront anyway.
> The rcu_barrier() in __kmem_cache_shutdown(), however, is probably
> unnecessary then, and we can remove it, right?
Agreed. As it's called (after flushing rcu sheaves) in
kvfree_rcu_barrier_on_cache(), it's not necessary in
__kmem_cache_shutdown().
> >> mm/slab_common.c | 5 ++++-
> >> 1 file changed, 4 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/mm/slab_common.c b/mm/slab_common.c
> >> index eed7ea556cb1..ee994ec7f251 100644
> >> --- a/mm/slab_common.c
> >> +++ b/mm/slab_common.c
> >> @@ -2133,8 +2133,11 @@ EXPORT_SYMBOL_GPL(kvfree_rcu_barrier);
> >> */
> >> void kvfree_rcu_barrier_on_cache(struct kmem_cache *s)
> >> {
> >> - if (s->cpu_sheaves)
> >> + if (s->cpu_sheaves) {
> >> flush_rcu_sheaves_on_cache(s);
> >> + rcu_barrier();
> >> + }
> >> +
> >> /*
> >> * TODO: Introduce a version of __kvfree_rcu_barrier() that works
> >> * on a specific slab cache.
--
Cheers,
Harry / Hyeonggon
On 1/13/26 1:31 PM, Harry Yoo wrote:
> On Tue, Jan 13, 2026 at 10:32:33AM +0100, Vlastimil Babka wrote:
>> On 1/13/26 3:08 AM, Harry Yoo wrote:
>>> On Mon, Jan 12, 2026 at 04:16:55PM +0100, Vlastimil Babka wrote:
>>>> After we submit the rcu_free sheaves to call_rcu() we need to make sure
>>>> the rcu callbacks complete. kvfree_rcu_barrier() does that via
>>>> flush_all_rcu_sheaves() but kvfree_rcu_barrier_on_cache() doesn't. Fix
>>>> that.
>>>
>>> Oops, my bad.
>>>
>>>> Reported-by: kernel test robot <oliver.sang@intel.com>
>>>> Closes: https://lore.kernel.org/oe-lkp/202601121442.c530bed3-lkp@intel.com
>>>> Fixes: 0f35040de593 ("mm/slab: introduce kvfree_rcu_barrier_on_cache() for cache destruction")
>>>> Cc: stable@vger.kernel.org
>>>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>>>> ---
>>>
>>> The fix looks good to me, but I wonder why
>>> `if (s->sheaf_capacity) rcu_barrier();` in __kmem_cache_shutdown()
>>> didn't prevent the bug from happening?
>>
>> Hmm good point, didn't notice it's there.
>>
>> I think it doesn't help because it happens only after
>> flush_all_cpus_locked(). And the callback from rcu_free_sheaf_nobarn()
>> will do sheaf_flush_unused() and end up installing the cpu slab again.
>
> I thought about it a little bit more...
>
> It's not because a cpu slab was installed again (for list_slab_objects()
> to be called on a slab, it must be on n->partial list), but because
Hmm that's true.
> flush_slab() cannot handle concurrent frees to the cpu slab.
>
> CPU X                                CPU Y
>
> - flush_slab() reads
>   c->freelist
>                                      rcu_free_sheaf_nobarn()
>                                      ->sheaf_flush_unused()
>                                      ->__kmem_cache_free_bulk()
>                                      ->do_slab_free()
>                                      -> sees slab == c->slab
>                                      -> frees to c->freelist
> - c->slab = NULL,
>   c->freelist = NULL
> - call deactivate_slab()
> ^ the object freed by sheaf_flush_unused() is leaked,
> thus slab->inuse != 0
But for this to be the same "c" it has to be the same cpu, not different
X and Y, no?
And that case is protected, I think: the action by X under
local_lock_irqsave() prevents an irq handler from executing Y,
and Y uses __update_cpu_freelist_fast() to find out it was
interrupted by X messing with the fields of c.
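
For reference, the fastpath commit being referred to, roughly (a
paraphrase of do_slab_free() in mm/slub.c; the exact code differs by
version, so treat this as a sketch):

redo:
	tid = READ_ONCE(c->tid);	/* flush_slab() bumps this via next_tid() */
	barrier();

	if (unlikely(slab != c->slab)) {
		/* not the cpu slab (anymore): take the slow path */
		__slab_free(s, slab, head, tail, cnt, addr);
		return;
	}

	freelist = READ_ONCE(c->freelist);
	set_freepointer(s, tail, freelist);

	/* a single cmpxchg commits the freelist+tid pair; if flush_slab()
	 * ran in between, the stale tid makes it fail and we retry */
	if (unlikely(!__update_cpu_freelist_fast(s, freelist, head, tid)))
		goto redo;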
> That said, flush_slab() works fine only when it is guaranteed that
> there will be no concurrent frees to the cpu slab (acquiring the
> local_lock in flush_slab() doesn't help because the free fastpath
> doesn't take it).
>
> Calling rcu_barrier() before flush_all_cpus_locked() ensures
> there will be no concurrent frees.
>
> A side question; I'm not sure how __kmem_cache_shrink(),
> validate_slab_cache(), cpu_partial_store() are supposed to work
> correctly? They call flush_all() without guaranteeing there will be
> no concurrent frees to the cpu slab.
>
> ...probably doesn't matter after sheaves-for-all :)
>
>> Note the bot flagged commit "slab: add sheaves to most caches", where
>> cpu slabs still exist. It's thus possible that with the full series
>> the bug is gone, but we should prevent it upfront anyway.
>
>> The rcu_barrier() in __kmem_cache_shutdown(), however, is probably
>> unnecessary then, and we can remove it, right?
>
> Agreed. As it's called (after flushing rcu sheaves) in
> kvfree_rcu_barrier_on_cache(), it's not necessary in
> __kmem_cache_shutdown().
>
>>>> mm/slab_common.c | 5 ++++-
>>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>>>> index eed7ea556cb1..ee994ec7f251 100644
>>>> --- a/mm/slab_common.c
>>>> +++ b/mm/slab_common.c
>>>> @@ -2133,8 +2133,11 @@ EXPORT_SYMBOL_GPL(kvfree_rcu_barrier);
>>>> */
>>>> void kvfree_rcu_barrier_on_cache(struct kmem_cache *s)
>>>> {
>>>> - if (s->cpu_sheaves)
>>>> + if (s->cpu_sheaves) {
>>>> flush_rcu_sheaves_on_cache(s);
>>>> + rcu_barrier();
>>>> + }
>>>> +
>>>> /*
>>>> * TODO: Introduce a version of __kvfree_rcu_barrier() that works
>>>> * on a specific slab cache.
>
On Tue, Jan 13, 2026 at 02:09:33PM +0100, Vlastimil Babka wrote:
> On 1/13/26 1:31 PM, Harry Yoo wrote:
> > On Tue, Jan 13, 2026 at 10:32:33AM +0100, Vlastimil Babka wrote:
> >> On 1/13/26 3:08 AM, Harry Yoo wrote:
> >>> On Mon, Jan 12, 2026 at 04:16:55PM +0100, Vlastimil Babka wrote:
> >>>> After we submit the rcu_free sheaves to call_rcu() we need to make sure
> >>>> the rcu callbacks complete. kvfree_rcu_barrier() does that via
> >>>> flush_all_rcu_sheaves() but kvfree_rcu_barrier_on_cache() doesn't. Fix
> >>>> that.
> >>>
> >>> Oops, my bad.
> >>>
> >>>> Reported-by: kernel test robot <oliver.sang@intel.com>
> >>>> Closes: https://lore.kernel.org/oe-lkp/202601121442.c530bed3-lkp@intel.com
> >>>> Fixes: 0f35040de593 ("mm/slab: introduce kvfree_rcu_barrier_on_cache() for cache destruction")
> >>>> Cc: stable@vger.kernel.org
> >>>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> >>>> ---
> >>>
> >>> The fix looks good to me, but I wonder why
> >>> `if (s->sheaf_capacity) rcu_barrier();` in __kmem_cache_shutdown()
> >>> didn't prevent the bug from happening?
> >>
> >> Hmm good point, didn't notice it's there.
> >>
> >> I think it doesn't help because it happens only after
> >> flush_all_cpus_locked(). And the callback from rcu_free_sheaf_nobarn()
> >> will do sheaf_flush_unused() and end up installing the cpu slab again.
> >
> > I thought about it a little bit more...
> >
> > It's not because a cpu slab was installed again (for list_slab_objects()
> > to be called on a slab, it must be on n->partial list), but because
>
> Hmm that's true.
>
> > flush_slab() cannot handle concurrent frees to the cpu slab.
> >
> > CPU X                                CPU Y
> >
> > - flush_slab() reads
> >   c->freelist
> >                                      rcu_free_sheaf_nobarn()
> >                                      ->sheaf_flush_unused()
> >                                      ->__kmem_cache_free_bulk()
> >                                      ->do_slab_free()
> >                                      -> sees slab == c->slab
> >                                      -> frees to c->freelist
> > - c->slab = NULL,
> >   c->freelist = NULL
> > - call deactivate_slab()
> > ^ the object freed by sheaf_flush_unused() is leaked,
> > thus slab->inuse != 0
>
> But for this to be the same "c" it has to be the same cpu, not different
> X and Y, no?
You're absolutely right! It just slipped my mind.
> And that case is protected, I think: the action by X under
> local_lock_irqsave() prevents an irq handler from executing Y,
> and Y uses __update_cpu_freelist_fast() to find out it was
> interrupted by X messing with the fields of c.
Right.
Also, the test module is just freeing one object (with slab merging
disabled), so there is no concurrent freeing in the test.
For the record, an accurate analysis of the problem (as discussed
off-list):

It turns out the object freed by sheaf_flush_unused() was in the KASAN
percpu quarantine list (confirmed by dumping the list) by the time
__kmem_cache_shutdown() returned an error.

Quarantined objects are supposed to be flushed by kasan_cache_shutdown(),
but things go wrong if the rcu callback (rcu_free_sheaf_nobarn()) is
processed after kasan_cache_shutdown() finishes.

That's why the rcu_barrier() in __kmem_cache_shutdown() didn't help:
it's called after kasan_cache_shutdown().

Calling rcu_barrier() in kvfree_rcu_barrier_on_cache() guarantees that
the object is added to the quarantine list before kasan_cache_shutdown()
is called. So it's a valid fix!
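
In other words, a simplified sketch of the destruction ordering as
described above (call sites paraphrased, not verbatim code):

	/* Flush rcu_free sheaves and wait for the rcu_free_sheaf_nobarn()
	 * callbacks; any objects they free sit in the KASAN quarantine
	 * by the time this returns. */
	kvfree_rcu_barrier_on_cache(s);

	kasan_cache_shutdown(s);	/* flushes the quarantine */

	__kmem_cache_shutdown(s);	/* finds no leftover objects */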
--
Cheers,
Harry / Hyeonggon
On 1/14/26 12:14, Harry Yoo wrote:
> For the record, an accurate analysis of the problem (as discussed
> off-list):
>
> It turns out the object freed by sheaf_flush_unused() was in the KASAN
> percpu quarantine list (confirmed by dumping the list) by the time
> __kmem_cache_shutdown() returned an error.
>
> Quarantined objects are supposed to be flushed by kasan_cache_shutdown(),
> but things go wrong if the rcu callback (rcu_free_sheaf_nobarn()) is
> processed after kasan_cache_shutdown() finishes.
>
> That's why the rcu_barrier() in __kmem_cache_shutdown() didn't help:
> it's called after kasan_cache_shutdown().
>
> Calling rcu_barrier() in kvfree_rcu_barrier_on_cache() guarantees that
> the object is added to the quarantine list before kasan_cache_shutdown()
> is called. So it's a valid fix!

Thanks a lot! Will incorporate this into the commit log.
This being KASAN-only further reduces the urgency.
On Wed, Jan 14, 2026 at 1:02 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 1/14/26 12:14, Harry Yoo wrote:
> > For the record, an accurate analysis of the problem (as discussed
> > off-list):
> >
> > It turns out the object freed by sheaf_flush_unused() was in the KASAN
> > percpu quarantine list (confirmed by dumping the list) by the time
> > __kmem_cache_shutdown() returned an error.
> >
> > Quarantined objects are supposed to be flushed by kasan_cache_shutdown(),
> > but things go wrong if the rcu callback (rcu_free_sheaf_nobarn()) is
> > processed after kasan_cache_shutdown() finishes.
> >
> > That's why the rcu_barrier() in __kmem_cache_shutdown() didn't help:
> > it's called after kasan_cache_shutdown().
> >
> > Calling rcu_barrier() in kvfree_rcu_barrier_on_cache() guarantees that
> > the object is added to the quarantine list before kasan_cache_shutdown()
> > is called. So it's a valid fix!
>
> Thanks a lot! Will incorporate this into the commit log.
> This being KASAN-only further reduces the urgency.

Thanks for the detailed explanation!

Reviewed-by: Suren Baghdasaryan <surenb@google.com>