[PATCH v5 7/9] slub: Optimize deactivate_slab()

Posted by chengming.zhou@linux.dev 2 years, 1 month ago
From: Chengming Zhou <zhouchengming@bytedance.com>

Since the introduction of unfrozen slabs on the cpu partial list, we no
longer need to synchronize the slab frozen state under the node list_lock.

The caller of deactivate_slab() and the caller of __slab_free() won't
manipulate the slab list concurrently.

So we can take the node list_lock in the last stage only if we really need
to manipulate the slab list in this path.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 mm/slub.c | 79 ++++++++++++++++++-------------------------------------
 1 file changed, 26 insertions(+), 53 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index bcb5b2c4e213..d137468fe4b9 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2468,10 +2468,8 @@ static void init_kmem_cache_cpus(struct kmem_cache *s)
 static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
 			    void *freelist)
 {
-	enum slab_modes { M_NONE, M_PARTIAL, M_FREE, M_FULL_NOLIST };
 	struct kmem_cache_node *n = get_node(s, slab_nid(slab));
 	int free_delta = 0;
-	enum slab_modes mode = M_NONE;
 	void *nextfree, *freelist_iter, *freelist_tail;
 	int tail = DEACTIVATE_TO_HEAD;
 	unsigned long flags = 0;
@@ -2509,65 +2507,40 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
 	/*
 	 * Stage two: Unfreeze the slab while splicing the per-cpu
 	 * freelist to the head of slab's freelist.
-	 *
-	 * Ensure that the slab is unfrozen while the list presence
-	 * reflects the actual number of objects during unfreeze.
-	 *
-	 * We first perform cmpxchg holding lock and insert to list
-	 * when it succeed. If there is mismatch then the slab is not
-	 * unfrozen and number of objects in the slab may have changed.
-	 * Then release lock and retry cmpxchg again.
 	 */
-redo:
-
-	old.freelist = READ_ONCE(slab->freelist);
-	old.counters = READ_ONCE(slab->counters);
-	VM_BUG_ON(!old.frozen);
-
-	/* Determine target state of the slab */
-	new.counters = old.counters;
-	if (freelist_tail) {
-		new.inuse -= free_delta;
-		set_freepointer(s, freelist_tail, old.freelist);
-		new.freelist = freelist;
-	} else
-		new.freelist = old.freelist;
-
-	new.frozen = 0;
+	do {
+		old.freelist = READ_ONCE(slab->freelist);
+		old.counters = READ_ONCE(slab->counters);
+		VM_BUG_ON(!old.frozen);
+
+		/* Determine target state of the slab */
+		new.counters = old.counters;
+		new.frozen = 0;
+		if (freelist_tail) {
+			new.inuse -= free_delta;
+			set_freepointer(s, freelist_tail, old.freelist);
+			new.freelist = freelist;
+		} else {
+			new.freelist = old.freelist;
+		}
+	} while (!slab_update_freelist(s, slab,
+		old.freelist, old.counters,
+		new.freelist, new.counters,
+		"unfreezing slab"));
 
+	/*
+	 * Stage three: Manipulate the slab list based on the updated state.
+	 */
 	if (!new.inuse && n->nr_partial >= s->min_partial) {
-		mode = M_FREE;
+		stat(s, DEACTIVATE_EMPTY);
+		discard_slab(s, slab);
+		stat(s, FREE_SLAB);
 	} else if (new.freelist) {
-		mode = M_PARTIAL;
-		/*
-		 * Taking the spinlock removes the possibility that
-		 * acquire_slab() will see a slab that is frozen
-		 */
 		spin_lock_irqsave(&n->list_lock, flags);
-	} else {
-		mode = M_FULL_NOLIST;
-	}
-
-
-	if (!slab_update_freelist(s, slab,
-				old.freelist, old.counters,
-				new.freelist, new.counters,
-				"unfreezing slab")) {
-		if (mode == M_PARTIAL)
-			spin_unlock_irqrestore(&n->list_lock, flags);
-		goto redo;
-	}
-
-
-	if (mode == M_PARTIAL) {
 		add_partial(n, slab, tail);
 		spin_unlock_irqrestore(&n->list_lock, flags);
 		stat(s, tail);
-	} else if (mode == M_FREE) {
-		stat(s, DEACTIVATE_EMPTY);
-		discard_slab(s, slab);
-		stat(s, FREE_SLAB);
-	} else if (mode == M_FULL_NOLIST) {
+	} else {
 		stat(s, DEACTIVATE_FULL);
 	}
 }
-- 
2.20.1
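For readers skimming the thread, the resulting tail of deactivate_slab()
after this patch boils down to roughly the following sketch (condensed from
the hunk above; stage one and some stat() calls are elided, so treat it as
an outline rather than the exact mm/slub.c code):

	/* Stage two: lockless cmpxchg loop, splice the freelist and unfreeze */
	do {
		old.freelist = READ_ONCE(slab->freelist);
		old.counters = READ_ONCE(slab->counters);
		VM_BUG_ON(!old.frozen);

		new.counters = old.counters;
		new.frozen = 0;
		if (freelist_tail) {
			new.inuse -= free_delta;
			set_freepointer(s, freelist_tail, old.freelist);
			new.freelist = freelist;
		} else {
			new.freelist = old.freelist;
		}
	} while (!slab_update_freelist(s, slab, old.freelist, old.counters,
				       new.freelist, new.counters,
				       "unfreezing slab"));

	/* Stage three: take the node list_lock only if the slab goes on a list */
	if (!new.inuse && n->nr_partial >= s->min_partial) {
		discard_slab(s, slab);		/* empty: free it without the lock */
	} else if (new.freelist) {
		spin_lock_irqsave(&n->list_lock, flags);
		add_partial(n, slab, tail);	/* has free objects: partial list */
		spin_unlock_irqrestore(&n->list_lock, flags);
	} else {
		stat(s, DEACTIVATE_FULL);	/* full: goes on no list */
	}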
Re: [PATCH v5 7/9] slub: Optimize deactivate_slab()
Posted by Hyeonggon Yoo 2 years ago
On Thu, Nov 2, 2023 at 12:25 PM <chengming.zhou@linux.dev> wrote:
>
> From: Chengming Zhou <zhouchengming@bytedance.com>
>
> Since the introduce of unfrozen slabs on cpu partial list, we don't
> need to synchronize the slab frozen state under the node list_lock.
>
> The caller of deactivate_slab() and the caller of __slab_free() won't
> manipulate the slab list concurrently.
>
> So we can get node list_lock in the last stage if we really need to
> manipulate the slab list in this path.
>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> ---
>  mm/slub.c | 79 ++++++++++++++++++-------------------------------------
>  1 file changed, 26 insertions(+), 53 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index bcb5b2c4e213..d137468fe4b9 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2468,10 +2468,8 @@ static void init_kmem_cache_cpus(struct kmem_cache *s)
>  static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
>                             void *freelist)
>  {
> -       enum slab_modes { M_NONE, M_PARTIAL, M_FREE, M_FULL_NOLIST };
>         struct kmem_cache_node *n = get_node(s, slab_nid(slab));
>         int free_delta = 0;
> -       enum slab_modes mode = M_NONE;
>         void *nextfree, *freelist_iter, *freelist_tail;
>         int tail = DEACTIVATE_TO_HEAD;
>         unsigned long flags = 0;
> @@ -2509,65 +2507,40 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
>         /*
>          * Stage two: Unfreeze the slab while splicing the per-cpu
>          * freelist to the head of slab's freelist.
> -        *
> -        * Ensure that the slab is unfrozen while the list presence
> -        * reflects the actual number of objects during unfreeze.
> -        *
> -        * We first perform cmpxchg holding lock and insert to list
> -        * when it succeed. If there is mismatch then the slab is not
> -        * unfrozen and number of objects in the slab may have changed.
> -        * Then release lock and retry cmpxchg again.
>          */
> -redo:
> -
> -       old.freelist = READ_ONCE(slab->freelist);
> -       old.counters = READ_ONCE(slab->counters);
> -       VM_BUG_ON(!old.frozen);
> -
> -       /* Determine target state of the slab */
> -       new.counters = old.counters;
> -       if (freelist_tail) {
> -               new.inuse -= free_delta;
> -               set_freepointer(s, freelist_tail, old.freelist);
> -               new.freelist = freelist;
> -       } else
> -               new.freelist = old.freelist;
> -
> -       new.frozen = 0;
> +       do {
> +               old.freelist = READ_ONCE(slab->freelist);
> +               old.counters = READ_ONCE(slab->counters);
> +               VM_BUG_ON(!old.frozen);
> +
> +               /* Determine target state of the slab */
> +               new.counters = old.counters;
> +               new.frozen = 0;
> +               if (freelist_tail) {
> +                       new.inuse -= free_delta;
> +                       set_freepointer(s, freelist_tail, old.freelist);
> +                       new.freelist = freelist;
> +               } else {
> +                       new.freelist = old.freelist;
> +               }
> +       } while (!slab_update_freelist(s, slab,
> +               old.freelist, old.counters,
> +               new.freelist, new.counters,
> +               "unfreezing slab"));
>
> +       /*
> +        * Stage three: Manipulate the slab list based on the updated state.
> +        */

deactivate_slab() might unconsciously put empty slabs into partial list, like:

deactivate_slab()                       __slab_free()
cmpxchg(), slab's not empty
                                        cmpxchg(), slab's empty and unfrozen
                                        spin_lock(&n->list_lock)
                                        (slab's empty but not on partial list,
                                         spin_unlock(&n->list_lock) and return)
spin_lock(&n->list_lock)
put slab into partial list
spin_unlock(&n->list_lock)

IMHO it should be fine in the real world, but I just wanted to mention it
as it doesn't seem to be intentional.

Otherwise it looks good to me!

>         if (!new.inuse && n->nr_partial >= s->min_partial) {
> -               mode = M_FREE;
> +               stat(s, DEACTIVATE_EMPTY);
> +               discard_slab(s, slab);
> +               stat(s, FREE_SLAB);
>         } else if (new.freelist) {
> -               mode = M_PARTIAL;
> -               /*
> -                * Taking the spinlock removes the possibility that
> -                * acquire_slab() will see a slab that is frozen
> -                */
>                 spin_lock_irqsave(&n->list_lock, flags);
> -       } else {
> -               mode = M_FULL_NOLIST;
> -       }
> -
> -
> -       if (!slab_update_freelist(s, slab,
> -                               old.freelist, old.counters,
> -                               new.freelist, new.counters,
> -                               "unfreezing slab")) {
> -               if (mode == M_PARTIAL)
> -                       spin_unlock_irqrestore(&n->list_lock, flags);
> -               goto redo;
> -       }
> -
> -
> -       if (mode == M_PARTIAL) {
>                 add_partial(n, slab, tail);
>                 spin_unlock_irqrestore(&n->list_lock, flags);
>                 stat(s, tail);
> -       } else if (mode == M_FREE) {
> -               stat(s, DEACTIVATE_EMPTY);
> -               discard_slab(s, slab);
> -               stat(s, FREE_SLAB);
> -       } else if (mode == M_FULL_NOLIST) {
> +       } else {
>                 stat(s, DEACTIVATE_FULL);
>         }
>  }
> --
> 2.20.1
>
Re: [PATCH v5 7/9] slub: Optimize deactivate_slab()
Posted by Vlastimil Babka 2 years ago
On 12/3/23 10:23, Hyeonggon Yoo wrote:
> On Thu, Nov 2, 2023 at 12:25 PM <chengming.zhou@linux.dev> wrote:
>>
>> From: Chengming Zhou <zhouchengming@bytedance.com>
>>
>> Since the introduce of unfrozen slabs on cpu partial list, we don't
>> need to synchronize the slab frozen state under the node list_lock.
>>
>> The caller of deactivate_slab() and the caller of __slab_free() won't
>> manipulate the slab list concurrently.
>>
>> So we can get node list_lock in the last stage if we really need to
>> manipulate the slab list in this path.
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
>> ---
>>  mm/slub.c | 79 ++++++++++++++++++-------------------------------------
>>  1 file changed, 26 insertions(+), 53 deletions(-)
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index bcb5b2c4e213..d137468fe4b9 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -2468,10 +2468,8 @@ static void init_kmem_cache_cpus(struct kmem_cache *s)
>>  static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
>>                             void *freelist)
>>  {
>> -       enum slab_modes { M_NONE, M_PARTIAL, M_FREE, M_FULL_NOLIST };
>>         struct kmem_cache_node *n = get_node(s, slab_nid(slab));
>>         int free_delta = 0;
>> -       enum slab_modes mode = M_NONE;
>>         void *nextfree, *freelist_iter, *freelist_tail;
>>         int tail = DEACTIVATE_TO_HEAD;
>>         unsigned long flags = 0;
>> @@ -2509,65 +2507,40 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
>>         /*
>>          * Stage two: Unfreeze the slab while splicing the per-cpu
>>          * freelist to the head of slab's freelist.
>> -        *
>> -        * Ensure that the slab is unfrozen while the list presence
>> -        * reflects the actual number of objects during unfreeze.
>> -        *
>> -        * We first perform cmpxchg holding lock and insert to list
>> -        * when it succeed. If there is mismatch then the slab is not
>> -        * unfrozen and number of objects in the slab may have changed.
>> -        * Then release lock and retry cmpxchg again.
>>          */
>> -redo:
>> -
>> -       old.freelist = READ_ONCE(slab->freelist);
>> -       old.counters = READ_ONCE(slab->counters);
>> -       VM_BUG_ON(!old.frozen);
>> -
>> -       /* Determine target state of the slab */
>> -       new.counters = old.counters;
>> -       if (freelist_tail) {
>> -               new.inuse -= free_delta;
>> -               set_freepointer(s, freelist_tail, old.freelist);
>> -               new.freelist = freelist;
>> -       } else
>> -               new.freelist = old.freelist;
>> -
>> -       new.frozen = 0;
>> +       do {
>> +               old.freelist = READ_ONCE(slab->freelist);
>> +               old.counters = READ_ONCE(slab->counters);
>> +               VM_BUG_ON(!old.frozen);
>> +
>> +               /* Determine target state of the slab */
>> +               new.counters = old.counters;
>> +               new.frozen = 0;
>> +               if (freelist_tail) {
>> +                       new.inuse -= free_delta;
>> +                       set_freepointer(s, freelist_tail, old.freelist);
>> +                       new.freelist = freelist;
>> +               } else {
>> +                       new.freelist = old.freelist;
>> +               }
>> +       } while (!slab_update_freelist(s, slab,
>> +               old.freelist, old.counters,
>> +               new.freelist, new.counters,
>> +               "unfreezing slab"));
>>
>> +       /*
>> +        * Stage three: Manipulate the slab list based on the updated state.
>> +        */
> 
> deactivate_slab() might unconsciously put empty slabs into partial list, like:
> 
> deactivate_slab()                    __slab_free()
> cmpxchg(), slab's not empty
>                                                cmpxchg(), slab's empty
> and unfrozen
>                                                spin_lock(&n->list_lock)
>                                                (slab's empty but not
> on partial list,
> 
> spin_unlock(&n->list_lock) and return)
> spin_lock(&n->list_lock)
> put slab into partial list
> spin_unlock(&n->list_lock)
> 
> IMHO it should be fine in the real world, but just wanted to
> mention as it doesn't seem to be intentional.

I noticed it too during review, but then realized it's not new behavior;
the same thing could happen with deactivate_slab() already before this
series. Free slabs on the partial list are supported, and we even keep some
intentionally as long as "n->nr_partial < s->min_partial" (that check is
racy too), so there is no need to try making this stricter.
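To make that concrete, the check in question is the one opening stage three
of this patch (quoting the hunk above, with the tolerance described here
spelled out in added comments):

	if (!new.inuse && n->nr_partial >= s->min_partial) {
		/*
		 * Empty and the node already holds at least min_partial
		 * slabs: discard it. nr_partial is read here without
		 * n->list_lock, so the comparison is inherently racy.
		 */
		stat(s, DEACTIVATE_EMPTY);
		discard_slab(s, slab);
		stat(s, FREE_SLAB);
	} else if (new.freelist) {
		/*
		 * Otherwise the slab goes on the partial list, even if a
		 * concurrent __slab_free() emptied it after our cmpxchg;
		 * free slabs on the partial list are allowed.
		 */
		spin_lock_irqsave(&n->list_lock, flags);
		add_partial(n, slab, tail);
		spin_unlock_irqrestore(&n->list_lock, flags);
		stat(s, tail);
	}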

> Otherwise it looks good to me!

Good enough for a reviewed-by? :)

Re: [PATCH v5 7/9] slub: Optimize deactivate_slab()
Posted by Hyeonggon Yoo 2 years ago
On Tue, Dec 5, 2023 at 2:55 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 12/3/23 10:23, Hyeonggon Yoo wrote:
> > On Thu, Nov 2, 2023 at 12:25 PM <chengming.zhou@linux.dev> wrote:
> >>
> >> From: Chengming Zhou <zhouchengming@bytedance.com>
> >>
> >> Since the introduce of unfrozen slabs on cpu partial list, we don't
> >> need to synchronize the slab frozen state under the node list_lock.
> >>
> >> The caller of deactivate_slab() and the caller of __slab_free() won't
> >> manipulate the slab list concurrently.
> >>
> >> So we can get node list_lock in the last stage if we really need to
> >> manipulate the slab list in this path.
> >>
> >> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> >> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> >> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> >> ---
> >>  mm/slub.c | 79 ++++++++++++++++++-------------------------------------
> >>  1 file changed, 26 insertions(+), 53 deletions(-)
> >>
> >> diff --git a/mm/slub.c b/mm/slub.c
> >> index bcb5b2c4e213..d137468fe4b9 100644
> >> --- a/mm/slub.c
> >> +++ b/mm/slub.c
> >> @@ -2468,10 +2468,8 @@ static void init_kmem_cache_cpus(struct kmem_cache *s)
> >>  static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
> >>                             void *freelist)
> >>  {
> >> -       enum slab_modes { M_NONE, M_PARTIAL, M_FREE, M_FULL_NOLIST };
> >>         struct kmem_cache_node *n = get_node(s, slab_nid(slab));
> >>         int free_delta = 0;
> >> -       enum slab_modes mode = M_NONE;
> >>         void *nextfree, *freelist_iter, *freelist_tail;
> >>         int tail = DEACTIVATE_TO_HEAD;
> >>         unsigned long flags = 0;
> >> @@ -2509,65 +2507,40 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
> >>         /*
> >>          * Stage two: Unfreeze the slab while splicing the per-cpu
> >>          * freelist to the head of slab's freelist.
> >> -        *
> >> -        * Ensure that the slab is unfrozen while the list presence
> >> -        * reflects the actual number of objects during unfreeze.
> >> -        *
> >> -        * We first perform cmpxchg holding lock and insert to list
> >> -        * when it succeed. If there is mismatch then the slab is not
> >> -        * unfrozen and number of objects in the slab may have changed.
> >> -        * Then release lock and retry cmpxchg again.
> >>          */
> >> -redo:
> >> -
> >> -       old.freelist = READ_ONCE(slab->freelist);
> >> -       old.counters = READ_ONCE(slab->counters);
> >> -       VM_BUG_ON(!old.frozen);
> >> -
> >> -       /* Determine target state of the slab */
> >> -       new.counters = old.counters;
> >> -       if (freelist_tail) {
> >> -               new.inuse -= free_delta;
> >> -               set_freepointer(s, freelist_tail, old.freelist);
> >> -               new.freelist = freelist;
> >> -       } else
> >> -               new.freelist = old.freelist;
> >> -
> >> -       new.frozen = 0;
> >> +       do {
> >> +               old.freelist = READ_ONCE(slab->freelist);
> >> +               old.counters = READ_ONCE(slab->counters);
> >> +               VM_BUG_ON(!old.frozen);
> >> +
> >> +               /* Determine target state of the slab */
> >> +               new.counters = old.counters;
> >> +               new.frozen = 0;
> >> +               if (freelist_tail) {
> >> +                       new.inuse -= free_delta;
> >> +                       set_freepointer(s, freelist_tail, old.freelist);
> >> +                       new.freelist = freelist;
> >> +               } else {
> >> +                       new.freelist = old.freelist;
> >> +               }
> >> +       } while (!slab_update_freelist(s, slab,
> >> +               old.freelist, old.counters,
> >> +               new.freelist, new.counters,
> >> +               "unfreezing slab"));
> >>
> >> +       /*
> >> +        * Stage three: Manipulate the slab list based on the updated state.
> >> +        */
> >
> > deactivate_slab() might unconsciously put empty slabs into partial list, like:
> >
> > deactivate_slab()                    __slab_free()
> > cmpxchg(), slab's not empty
> >                                                cmpxchg(), slab's empty
> > and unfrozen
> >                                                spin_lock(&n->list_lock)
> >                                                (slab's empty but not
> > on partial list,
> >
> > spin_unlock(&n->list_lock) and return)
> > spin_lock(&n->list_lock)
> > put slab into partial list
> > spin_unlock(&n->list_lock)
> >
> > IMHO it should be fine in the real world, but just wanted to
> > mention as it doesn't seem to be intentional.
>
> I've noticed it too during review, but then realized it's not a new
> behavior, same thing could happen with deactivate_slab() already before the
> series.

Ah, you are right.

>  Free slabs on partial list are supported, we even keep some
> intentionally as long as "n->nr_partial < s->min_partial" (and that check is
> racy too) so no need to try making this more strict.

Agreed.

> > Otherwise it looks good to me!
>
> Good enough for a reviewed-by? :)

Yes,
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Thanks!
--
Hyeonggon
Re: [PATCH v5 7/9] slub: Optimize deactivate_slab()
Posted by Chengming Zhou 2 years ago
On 2023/12/3 17:23, Hyeonggon Yoo wrote:
> On Thu, Nov 2, 2023 at 12:25 PM <chengming.zhou@linux.dev> wrote:
>>
>> From: Chengming Zhou <zhouchengming@bytedance.com>
>>
>> Since the introduce of unfrozen slabs on cpu partial list, we don't
>> need to synchronize the slab frozen state under the node list_lock.
>>
>> The caller of deactivate_slab() and the caller of __slab_free() won't
>> manipulate the slab list concurrently.
>>
>> So we can get node list_lock in the last stage if we really need to
>> manipulate the slab list in this path.
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
>> ---
>>  mm/slub.c | 79 ++++++++++++++++++-------------------------------------
>>  1 file changed, 26 insertions(+), 53 deletions(-)
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index bcb5b2c4e213..d137468fe4b9 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -2468,10 +2468,8 @@ static void init_kmem_cache_cpus(struct kmem_cache *s)
>>  static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
>>                             void *freelist)
>>  {
>> -       enum slab_modes { M_NONE, M_PARTIAL, M_FREE, M_FULL_NOLIST };
>>         struct kmem_cache_node *n = get_node(s, slab_nid(slab));
>>         int free_delta = 0;
>> -       enum slab_modes mode = M_NONE;
>>         void *nextfree, *freelist_iter, *freelist_tail;
>>         int tail = DEACTIVATE_TO_HEAD;
>>         unsigned long flags = 0;
>> @@ -2509,65 +2507,40 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
>>         /*
>>          * Stage two: Unfreeze the slab while splicing the per-cpu
>>          * freelist to the head of slab's freelist.
>> -        *
>> -        * Ensure that the slab is unfrozen while the list presence
>> -        * reflects the actual number of objects during unfreeze.
>> -        *
>> -        * We first perform cmpxchg holding lock and insert to list
>> -        * when it succeed. If there is mismatch then the slab is not
>> -        * unfrozen and number of objects in the slab may have changed.
>> -        * Then release lock and retry cmpxchg again.
>>          */
>> -redo:
>> -
>> -       old.freelist = READ_ONCE(slab->freelist);
>> -       old.counters = READ_ONCE(slab->counters);
>> -       VM_BUG_ON(!old.frozen);
>> -
>> -       /* Determine target state of the slab */
>> -       new.counters = old.counters;
>> -       if (freelist_tail) {
>> -               new.inuse -= free_delta;
>> -               set_freepointer(s, freelist_tail, old.freelist);
>> -               new.freelist = freelist;
>> -       } else
>> -               new.freelist = old.freelist;
>> -
>> -       new.frozen = 0;
>> +       do {
>> +               old.freelist = READ_ONCE(slab->freelist);
>> +               old.counters = READ_ONCE(slab->counters);
>> +               VM_BUG_ON(!old.frozen);
>> +
>> +               /* Determine target state of the slab */
>> +               new.counters = old.counters;
>> +               new.frozen = 0;
>> +               if (freelist_tail) {
>> +                       new.inuse -= free_delta;
>> +                       set_freepointer(s, freelist_tail, old.freelist);
>> +                       new.freelist = freelist;
>> +               } else {
>> +                       new.freelist = old.freelist;
>> +               }
>> +       } while (!slab_update_freelist(s, slab,
>> +               old.freelist, old.counters,
>> +               new.freelist, new.counters,
>> +               "unfreezing slab"));
>>
>> +       /*
>> +        * Stage three: Manipulate the slab list based on the updated state.
>> +        */
> 
> deactivate_slab() might unconsciously put empty slabs into partial list, like:
> 
> deactivate_slab()                    __slab_free()
> cmpxchg(), slab's not empty
>                                                cmpxchg(), slab's empty
> and unfrozen

Hi,

Sorry, but I don't get how __slab_free() could see the slab as empty here,
since the slab is not empty on the deactivate_slab() path, and it can't be
used by any other CPU at that time?

Thanks for the review!

>                                                spin_lock(&n->list_lock)
>                                                (slab's empty but not
> on partial list,
> 
> spin_unlock(&n->list_lock) and return)
> spin_lock(&n->list_lock)
> put slab into partial list
> spin_unlock(&n->list_lock)
> 
> IMHO it should be fine in the real world, but just wanted to
> mention as it doesn't seem to be intentional.
> 
> Otherwise it looks good to me!
> 
>>         if (!new.inuse && n->nr_partial >= s->min_partial) {
>> -               mode = M_FREE;
>> +               stat(s, DEACTIVATE_EMPTY);
>> +               discard_slab(s, slab);
>> +               stat(s, FREE_SLAB);
>>         } else if (new.freelist) {
>> -               mode = M_PARTIAL;
>> -               /*
>> -                * Taking the spinlock removes the possibility that
>> -                * acquire_slab() will see a slab that is frozen
>> -                */
>>                 spin_lock_irqsave(&n->list_lock, flags);
>> -       } else {
>> -               mode = M_FULL_NOLIST;
>> -       }
>> -
>> -
>> -       if (!slab_update_freelist(s, slab,
>> -                               old.freelist, old.counters,
>> -                               new.freelist, new.counters,
>> -                               "unfreezing slab")) {
>> -               if (mode == M_PARTIAL)
>> -                       spin_unlock_irqrestore(&n->list_lock, flags);
>> -               goto redo;
>> -       }
>> -
>> -
>> -       if (mode == M_PARTIAL) {
>>                 add_partial(n, slab, tail);
>>                 spin_unlock_irqrestore(&n->list_lock, flags);
>>                 stat(s, tail);
>> -       } else if (mode == M_FREE) {
>> -               stat(s, DEACTIVATE_EMPTY);
>> -               discard_slab(s, slab);
>> -               stat(s, FREE_SLAB);
>> -       } else if (mode == M_FULL_NOLIST) {
>> +       } else {
>>                 stat(s, DEACTIVATE_FULL);
>>         }
>>  }
>> --
>> 2.20.1
>>
Re: [PATCH v5 7/9] slub: Optimize deactivate_slab()
Posted by Hyeonggon Yoo 2 years ago
On Sun, Dec 3, 2023 at 7:26 PM Chengming Zhou <chengming.zhou@linux.dev> wrote:
>
> On 2023/12/3 17:23, Hyeonggon Yoo wrote:
> > On Thu, Nov 2, 2023 at 12:25 PM <chengming.zhou@linux.dev> wrote:
> >>
> >> From: Chengming Zhou <zhouchengming@bytedance.com>
> >>
> >> Since the introduce of unfrozen slabs on cpu partial list, we don't
> >> need to synchronize the slab frozen state under the node list_lock.
> >>
> >> The caller of deactivate_slab() and the caller of __slab_free() won't
> >> manipulate the slab list concurrently.
> >>
> >> So we can get node list_lock in the last stage if we really need to
> >> manipulate the slab list in this path.
> >>
> >> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> >> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> >> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> >> ---
> >>  mm/slub.c | 79 ++++++++++++++++++-------------------------------------
> >>  1 file changed, 26 insertions(+), 53 deletions(-)
> >>
> >> diff --git a/mm/slub.c b/mm/slub.c
> >> index bcb5b2c4e213..d137468fe4b9 100644
> >> --- a/mm/slub.c
> >> +++ b/mm/slub.c
> >> @@ -2468,10 +2468,8 @@ static void init_kmem_cache_cpus(struct kmem_cache *s)
> >>  static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
> >>                             void *freelist)
> >>  {
> >> -       enum slab_modes { M_NONE, M_PARTIAL, M_FREE, M_FULL_NOLIST };
> >>         struct kmem_cache_node *n = get_node(s, slab_nid(slab));
> >>         int free_delta = 0;
> >> -       enum slab_modes mode = M_NONE;
> >>         void *nextfree, *freelist_iter, *freelist_tail;
> >>         int tail = DEACTIVATE_TO_HEAD;
> >>         unsigned long flags = 0;
> >> @@ -2509,65 +2507,40 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
> >>         /*
> >>          * Stage two: Unfreeze the slab while splicing the per-cpu
> >>          * freelist to the head of slab's freelist.
> >> -        *
> >> -        * Ensure that the slab is unfrozen while the list presence
> >> -        * reflects the actual number of objects during unfreeze.
> >> -        *
> >> -        * We first perform cmpxchg holding lock and insert to list
> >> -        * when it succeed. If there is mismatch then the slab is not
> >> -        * unfrozen and number of objects in the slab may have changed.
> >> -        * Then release lock and retry cmpxchg again.
> >>          */
> >> -redo:
> >> -
> >> -       old.freelist = READ_ONCE(slab->freelist);
> >> -       old.counters = READ_ONCE(slab->counters);
> >> -       VM_BUG_ON(!old.frozen);
> >> -
> >> -       /* Determine target state of the slab */
> >> -       new.counters = old.counters;
> >> -       if (freelist_tail) {
> >> -               new.inuse -= free_delta;
> >> -               set_freepointer(s, freelist_tail, old.freelist);
> >> -               new.freelist = freelist;
> >> -       } else
> >> -               new.freelist = old.freelist;
> >> -
> >> -       new.frozen = 0;
> >> +       do {
> >> +               old.freelist = READ_ONCE(slab->freelist);
> >> +               old.counters = READ_ONCE(slab->counters);
> >> +               VM_BUG_ON(!old.frozen);
> >> +
> >> +               /* Determine target state of the slab */
> >> +               new.counters = old.counters;
> >> +               new.frozen = 0;
> >> +               if (freelist_tail) {
> >> +                       new.inuse -= free_delta;
> >> +                       set_freepointer(s, freelist_tail, old.freelist);
> >> +                       new.freelist = freelist;
> >> +               } else {
> >> +                       new.freelist = old.freelist;
> >> +               }
> >> +       } while (!slab_update_freelist(s, slab,
> >> +               old.freelist, old.counters,
> >> +               new.freelist, new.counters,
> >> +               "unfreezing slab"));
> >>
> >> +       /*
> >> +        * Stage three: Manipulate the slab list based on the updated state.
> >> +        */
> >
> > deactivate_slab() might unconsciously put empty slabs into partial list, like:
> >
> > deactivate_slab()                    __slab_free()
> > cmpxchg(), slab's not empty
> >                                                cmpxchg(), slab's empty
> > and unfrozen
>
> Hi,
>
> Sorry, but I don't get it here how __slab_free() can see the slab empty,
> since the slab is not empty from deactivate_slab() path, and it can't be
> used by any CPU at that time?

The scenario is that CPU B previously allocated an object from slab X,
slab X then ended up on the node partial list, and later CPU A took slab X
as its cpu slab.

While slab X is CPU A's cpu slab, when CPU B frees an object belonging to
slab X, it puts the object onto slab X's freelist using cmpxchg.

Let's say the deactivation path on CPU A performs its cmpxchg while X.inuse
is 1, and then CPU B frees (__slab_free()) to slab X's freelist using
cmpxchg, _before_ slab X is put on the partial list by CPU A.

Then CPU A thinks the slab is not empty, so it puts it on the partial list,
but from CPU B's side the slab has become empty.
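A rough sketch of that interleaving, following the diagram above (the
inuse values are only illustrative, not from a real trace):

	/* slab X: one object still in use by CPU B, X is CPU A's cpu slab */

	/* CPU A, deactivate_slab() stage two */
	old.counters = READ_ONCE(slab->counters);	/* sees inuse == 1 */
	/* slab_update_freelist() succeeds: X unfrozen, new.inuse == 1 */

	/* CPU B, __slab_free() of its last object in X */
	/* cmpxchg: X.inuse -> 0, X is now empty (and already unfrozen) */
	/* takes n->list_lock, X is empty but not on the partial list, */
	/* so it just releases the lock and returns */

	/* CPU A, deactivate_slab() stage three, using its own new.inuse == 1 */
	spin_lock_irqsave(&n->list_lock, flags);
	add_partial(n, slab, tail);	/* an already-empty slab lands on the partial list */
	spin_unlock_irqrestore(&n->list_lock, flags);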

Maybe I am confused, in that case please tell me I'm wrong :)

Thanks!

--
Hyeonggon
Re: [PATCH v5 7/9] slub: Optimize deactivate_slab()
Posted by Chengming Zhou 2 years ago
On 2023/12/3 19:19, Hyeonggon Yoo wrote:
> On Sun, Dec 3, 2023 at 7:26 PM Chengming Zhou <chengming.zhou@linux.dev> wrote:
>>
>> On 2023/12/3 17:23, Hyeonggon Yoo wrote:
>>> On Thu, Nov 2, 2023 at 12:25 PM <chengming.zhou@linux.dev> wrote:
>>>>
>>>> From: Chengming Zhou <zhouchengming@bytedance.com>
>>>>
>>>> Since the introduce of unfrozen slabs on cpu partial list, we don't
>>>> need to synchronize the slab frozen state under the node list_lock.
>>>>
>>>> The caller of deactivate_slab() and the caller of __slab_free() won't
>>>> manipulate the slab list concurrently.
>>>>
>>>> So we can get node list_lock in the last stage if we really need to
>>>> manipulate the slab list in this path.
>>>>
>>>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>>>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>>>> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
>>>> ---
>>>>  mm/slub.c | 79 ++++++++++++++++++-------------------------------------
>>>>  1 file changed, 26 insertions(+), 53 deletions(-)
>>>>
>>>> diff --git a/mm/slub.c b/mm/slub.c
>>>> index bcb5b2c4e213..d137468fe4b9 100644
>>>> --- a/mm/slub.c
>>>> +++ b/mm/slub.c
>>>> @@ -2468,10 +2468,8 @@ static void init_kmem_cache_cpus(struct kmem_cache *s)
>>>>  static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
>>>>                             void *freelist)
>>>>  {
>>>> -       enum slab_modes { M_NONE, M_PARTIAL, M_FREE, M_FULL_NOLIST };
>>>>         struct kmem_cache_node *n = get_node(s, slab_nid(slab));
>>>>         int free_delta = 0;
>>>> -       enum slab_modes mode = M_NONE;
>>>>         void *nextfree, *freelist_iter, *freelist_tail;
>>>>         int tail = DEACTIVATE_TO_HEAD;
>>>>         unsigned long flags = 0;
>>>> @@ -2509,65 +2507,40 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
>>>>         /*
>>>>          * Stage two: Unfreeze the slab while splicing the per-cpu
>>>>          * freelist to the head of slab's freelist.
>>>> -        *
>>>> -        * Ensure that the slab is unfrozen while the list presence
>>>> -        * reflects the actual number of objects during unfreeze.
>>>> -        *
>>>> -        * We first perform cmpxchg holding lock and insert to list
>>>> -        * when it succeed. If there is mismatch then the slab is not
>>>> -        * unfrozen and number of objects in the slab may have changed.
>>>> -        * Then release lock and retry cmpxchg again.
>>>>          */
>>>> -redo:
>>>> -
>>>> -       old.freelist = READ_ONCE(slab->freelist);
>>>> -       old.counters = READ_ONCE(slab->counters);
>>>> -       VM_BUG_ON(!old.frozen);
>>>> -
>>>> -       /* Determine target state of the slab */
>>>> -       new.counters = old.counters;
>>>> -       if (freelist_tail) {
>>>> -               new.inuse -= free_delta;
>>>> -               set_freepointer(s, freelist_tail, old.freelist);
>>>> -               new.freelist = freelist;
>>>> -       } else
>>>> -               new.freelist = old.freelist;
>>>> -
>>>> -       new.frozen = 0;
>>>> +       do {
>>>> +               old.freelist = READ_ONCE(slab->freelist);
>>>> +               old.counters = READ_ONCE(slab->counters);
>>>> +               VM_BUG_ON(!old.frozen);
>>>> +
>>>> +               /* Determine target state of the slab */
>>>> +               new.counters = old.counters;
>>>> +               new.frozen = 0;
>>>> +               if (freelist_tail) {
>>>> +                       new.inuse -= free_delta;
>>>> +                       set_freepointer(s, freelist_tail, old.freelist);
>>>> +                       new.freelist = freelist;
>>>> +               } else {
>>>> +                       new.freelist = old.freelist;
>>>> +               }
>>>> +       } while (!slab_update_freelist(s, slab,
>>>> +               old.freelist, old.counters,
>>>> +               new.freelist, new.counters,
>>>> +               "unfreezing slab"));
>>>>
>>>> +       /*
>>>> +        * Stage three: Manipulate the slab list based on the updated state.
>>>> +        */
>>>
>>> deactivate_slab() might unconsciously put empty slabs into partial list, like:
>>>
>>> deactivate_slab()                    __slab_free()
>>> cmpxchg(), slab's not empty
>>>                                                cmpxchg(), slab's empty
>>> and unfrozen
>>
>> Hi,
>>
>> Sorry, but I don't get it here how __slab_free() can see the slab empty,
>> since the slab is not empty from deactivate_slab() path, and it can't be
>> used by any CPU at that time?
> 
> The scenario is CPU B previously allocated an object from slab X, but
> put it into node partial list and then CPU A have taken slab X into cpu slab.
> 
> While slab X is CPU A's cpu slab, when CPU B frees an object from slab X,
> it puts the object into slab X's freelist using cmpxchg.
> 
> Let's say in CPU A the deactivation path performs cmpxchg and X.inuse was 1,
> and then CPU B frees (__slab_free()) to slab X's freelist using cmpxchg,
> _before_ slab X's put into partial list by CPU A.
> 
> Then CPU A thinks it's not empty so put it into partial list, but by CPU B
> the slab has become empty.
> 
> Maybe I am confused, in that case please tell me I'm wrong :)
> 

Ah, you're right! I confused the slab being "empty" with it being "full". :)

Yes, in this case the "empty" slab would be put on the node partial list,
and it should be fine in the real world, as you noted earlier.

Thanks!