[PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()

Vlastimil Babka posted 1 patch 3 months, 2 weeks ago
mm/slub.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
[PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()
Posted by Vlastimil Babka 3 months, 2 weeks ago
Since commit af92793e52c3 ("slab: Introduce kmalloc_nolock() and
kfree_nolock().") there's a possibility in alloc_single_from_new_slab()
that we discard the newly allocated slab if we can't spin and we fail to
trylock. As a result we don't perform inc_slabs_node() later in the
function. Instead we perform a deferred deactivate_slab() which can
either put the unacounted slab on partial list, or discard it
immediately while performing dec_slabs_node(). Either way will cause an
accounting imbalance.

Fix this by not marking the slab as frozen, and using free_slab()
instead of deactivate_slab() for non-frozen slabs in
free_deferred_objects(). For CONFIG_SLUB_TINY, that's the only possible
case. By not using discard_slab() we avoid dec_slabs_node().

Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
Changes in v2:
- Fix the problem differently. Harry pointed out that we can't move
  inc_slabs_node() outside of list_lock protected regions as that would
  reintroduce issues fixed by commit c7323a5ad078
- Link to v1: https://patch.msgid.link/20251022-fix-slab-accounting-v1-1-27870ec363ce@suse.cz
---
 mm/slub.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 23d8f54e9486..87a1d2f9de0d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3422,7 +3422,6 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
 
 	if (!allow_spin && !spin_trylock_irqsave(&n->list_lock, flags)) {
 		/* Unlucky, discard newly allocated slab */
-		slab->frozen = 1;
 		defer_deactivate_slab(slab, NULL);
 		return NULL;
 	}
@@ -6471,9 +6470,12 @@ static void free_deferred_objects(struct irq_work *work)
 		struct slab *slab = container_of(pos, struct slab, llnode);
 
 #ifdef CONFIG_SLUB_TINY
-		discard_slab(slab->slab_cache, slab);
+		free_slab(slab->slab_cache, slab);
 #else
-		deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
+		if (slab->frozen)
+			deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
+		else
+			free_slab(slab->slab_cache, slab);
 #endif
 	}
 }

---
base-commit: 6ed8bfd24ce1cb31742b09a3eb557cd008533eec
change-id: 20251022-fix-slab-accounting-f0abbda8a6ff

Best regards,
-- 
Vlastimil Babka <vbabka@suse.cz>
Re: [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()
Posted by Alexei Starovoitov 3 months, 2 weeks ago
On Thu, Oct 23, 2025 at 5:01 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> Since commit af92793e52c3 ("slab: Introduce kmalloc_nolock() and
> kfree_nolock().") there's a possibility in alloc_single_from_new_slab()
> that we discard the newly allocated slab if we can't spin and we fail to
> trylock. As a result we don't perform inc_slabs_node() later in the
> function. Instead we perform a deferred deactivate_slab() which can
> either put the unacounted slab on partial list, or discard it
> immediately while performing dec_slabs_node(). Either way will cause an
> accounting imbalance.
>
> Fix this by not marking the slab as frozen, and using free_slab()
> instead of deactivate_slab() for non-frozen slabs in
> free_deferred_objects(). For CONFIG_SLUB_TINY, that's the only possible
> case. By not using discard_slab() we avoid dec_slabs_node().
>
> Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> Changes in v2:
> - Fix the problem differently. Harry pointed out that we can't move
>   inc_slabs_node() outside of list_lock protected regions as that would
>   reintroduce issues fixed by commit c7323a5ad078
> - Link to v1: https://patch.msgid.link/20251022-fix-slab-accounting-v1-1-27870ec363ce@suse.cz
> ---
>  mm/slub.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 23d8f54e9486..87a1d2f9de0d 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3422,7 +3422,6 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
>
>         if (!allow_spin && !spin_trylock_irqsave(&n->list_lock, flags)) {
>                 /* Unlucky, discard newly allocated slab */
> -               slab->frozen = 1;
>                 defer_deactivate_slab(slab, NULL);
>                 return NULL;
>         }
> @@ -6471,9 +6470,12 @@ static void free_deferred_objects(struct irq_work *work)
>                 struct slab *slab = container_of(pos, struct slab, llnode);
>
>  #ifdef CONFIG_SLUB_TINY
> -               discard_slab(slab->slab_cache, slab);
> +               free_slab(slab->slab_cache, slab);
>  #else
> -               deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
> +               if (slab->frozen)
> +                       deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
> +               else
> +                       free_slab(slab->slab_cache, slab);

A bit odd to use 'frozen' flag as such a signal.
I guess I'm worried that truly !frozen slab can come here
via ___slab_alloc() -> retry_load_slab: -> defer_deactivate_slab().
And things will be much worse than just accounting.

Maybe add
  inc_slabs_node(s, nid, slab->objects);
right before
  defer_deactivate_slab(slab, NULL);
  return NULL;

I don't quite get why c7323a5ad078 is doing everything under n->list_lock.
It's been 3 years since.
We have an empty slab here that is going to be freed soon.
It's effectively frozen, so inc_slabs_node() on it seems like a safe fix.
Re: [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()
Posted by Harry Yoo 3 months, 2 weeks ago
On Thu, Oct 23, 2025 at 04:13:37PM -0700, Alexei Starovoitov wrote:
> On Thu, Oct 23, 2025 at 5:01 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > Since commit af92793e52c3 ("slab: Introduce kmalloc_nolock() and
> > kfree_nolock().") there's a possibility in alloc_single_from_new_slab()
> > that we discard the newly allocated slab if we can't spin and we fail to
> > trylock. As a result we don't perform inc_slabs_node() later in the
> > function. Instead we perform a deferred deactivate_slab() which can
> > either put the unacounted slab on partial list, or discard it
> > immediately while performing dec_slabs_node(). Either way will cause an
> > accounting imbalance.
> >
> > Fix this by not marking the slab as frozen, and using free_slab()
> > instead of deactivate_slab() for non-frozen slabs in
> > free_deferred_objects(). For CONFIG_SLUB_TINY, that's the only possible
> > case. By not using discard_slab() we avoid dec_slabs_node().
> >
> > Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
> > Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> > ---
> > Changes in v2:
> > - Fix the problem differently. Harry pointed out that we can't move
> >   inc_slabs_node() outside of list_lock protected regions as that would
> >   reintroduce issues fixed by commit c7323a5ad078
> > - Link to v1: https://patch.msgid.link/20251022-fix-slab-accounting-v1-1-27870ec363ce@suse.cz
> > ---
> >  mm/slub.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 23d8f54e9486..87a1d2f9de0d 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3422,7 +3422,6 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
> >
> >         if (!allow_spin && !spin_trylock_irqsave(&n->list_lock, flags)) {
> >                 /* Unlucky, discard newly allocated slab */
> > -               slab->frozen = 1;
> >                 defer_deactivate_slab(slab, NULL);
> >                 return NULL;
> >         }
> > @@ -6471,9 +6470,12 @@ static void free_deferred_objects(struct irq_work *work)
> >                 struct slab *slab = container_of(pos, struct slab, llnode);
> >
> >  #ifdef CONFIG_SLUB_TINY
> > -               discard_slab(slab->slab_cache, slab);
> > +               free_slab(slab->slab_cache, slab);
> >  #else
> > -               deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
> > +               if (slab->frozen)
> > +                       deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
> > +               else
> > +                       free_slab(slab->slab_cache, slab);
> 
> A bit odd to use 'frozen' flag as such a signal.
> I guess I'm worried that truly !frozen slab can come here
> via ___slab_alloc() -> retry_load_slab: -> defer_deactivate_slab().
> And things will be much worse than just accounting.

But the cpu slab must have been frozen before it's attached to
c->slab?

> Maybe add
>   inc_slabs_node(s, nid, slab->objects);
> right before
>   defer_deactivate_slab(slab, NULL);
>   return NULL;
> 
> I don't quite get why c7323a5ad078 is doing everything under n->list_lock.
> It's been 3 years since.

When n->nr_slabs is inconsistent, validate_slab_node() might report an
error (false positive) when someone wrote '1' to
/sys/kernel/slab/<cache name>/validate

> We have an empty slab here that is going to be freed soon.
> It's effectively frozen, so inc_slabs_node() on it seems like a safe fix.

-- 
Cheers,
Harry / Hyeonggon
Re: [PATCH v2] slab: fix slab accounting imbalance due to defer_deactivate_slab()
Posted by Harry Yoo 3 months, 2 weeks ago
On Thu, Oct 23, 2025 at 02:01:07PM +0200, Vlastimil Babka wrote:
> Since commit af92793e52c3 ("slab: Introduce kmalloc_nolock() and
> kfree_nolock().") there's a possibility in alloc_single_from_new_slab()
> that we discard the newly allocated slab if we can't spin and we fail to
> trylock. As a result we don't perform inc_slabs_node() later in the
> function. Instead we perform a deferred deactivate_slab() which can
> either put the unacounted slab on partial list, or discard it
> immediately while performing dec_slabs_node(). Either way will cause an
> accounting imbalance.
> 
> Fix this by not marking the slab as frozen, and using free_slab()
> instead of deactivate_slab() for non-frozen slabs in
> free_deferred_objects(). For CONFIG_SLUB_TINY, that's the only possible
> case. By not using discard_slab() we avoid dec_slabs_node().
> 
> Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> Changes in v2:
> - Fix the problem differently. Harry pointed out that we can't move
>   inc_slabs_node() outside of list_lock protected regions as that would
>   reintroduce issues fixed by commit c7323a5ad078
> - Link to v1: https://patch.msgid.link/20251022-fix-slab-accounting-v1-1-27870ec363ce@suse.cz
> ---

Looks good to me,
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>

-- 
Cheers,
Harry / Hyeonggon