Introduce barn_get_full_sheaf(), a helper that detaches a full sheaf from
the per-node barn without requiring an empty sheaf in exchange.
Use this helper in __pcs_replace_empty_main() to change how an empty main
per-CPU sheaf is handled:
- If pcs->spare is NULL and pcs->main is empty, first try to obtain a
full sheaf from the barn via barn_get_full_sheaf(). On success, park
the empty main sheaf in pcs->spare and install the full sheaf as the
new pcs->main.
- If pcs->spare already exists and has objects, keep the existing
behavior of simply swapping pcs->main and pcs->spare.
- Only when both pcs->main and pcs->spare are empty do we fall back to
barn_replace_empty_sheaf() and trade the empty main sheaf into the
barn in exchange for a full one.
This makes the empty-main path more symmetric with __pcs_replace_full_main(),
which for a full main sheaf parks the full sheaf in pcs->spare and pulls an
empty sheaf from the barn. It also matches the documented design more closely:
"When both percpu sheaves are found empty during an allocation, an empty
sheaf may be replaced with a full one from the per-node barn."
Signed-off-by: Hao Li <haoli.tcs@gmail.com>
---
* This patch is based on b4/sheaves-for-all branch
mm/slub.c | 50 +++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 43 insertions(+), 7 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index a94c64f56504..1fd28aa204e1 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2746,6 +2746,32 @@ static void pcs_destroy(struct kmem_cache *s)
 	s->cpu_sheaves = NULL;
 }
 
+static struct slab_sheaf *barn_get_full_sheaf(struct node_barn *barn,
+					      bool allow_spin)
+{
+	struct slab_sheaf *full = NULL;
+	unsigned long flags;
+
+	if (!data_race(barn->nr_full))
+		return NULL;
+
+	if (likely(allow_spin))
+		spin_lock_irqsave(&barn->lock, flags);
+	else if (!spin_trylock_irqsave(&barn->lock, flags))
+		return NULL;
+
+	if (likely(barn->nr_full)) {
+		full = list_first_entry(&barn->sheaves_full,
+					struct slab_sheaf, barn_list);
+		list_del(&full->barn_list);
+		barn->nr_full--;
+	}
+
+	spin_unlock_irqrestore(&barn->lock, flags);
+
+	return full;
+}
+
 static struct slab_sheaf *barn_get_empty_sheaf(struct node_barn *barn,
 					       bool allow_spin)
 {
@@ -4120,7 +4146,7 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
 	struct slab_sheaf *empty = NULL;
 	struct slab_sheaf *full;
 	struct node_barn *barn;
-	bool can_alloc;
+	bool can_alloc, allow_spin;
 
 	lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock));
 
@@ -4130,10 +4156,7 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
 		return NULL;
 	}
 
-	if (pcs->spare && pcs->spare->size > 0) {
-		swap(pcs->main, pcs->spare);
-		return pcs;
-	}
+	allow_spin = gfpflags_allow_spinning(gfp);
 
 	barn = get_barn(s);
 	if (!barn) {
@@ -4141,8 +4164,21 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
 		return NULL;
 	}
 
-	full = barn_replace_empty_sheaf(barn, pcs->main,
-					gfpflags_allow_spinning(gfp));
+	if (!pcs->spare) {
+		full = barn_get_full_sheaf(barn, allow_spin);
+		if (full) {
+			pcs->spare = pcs->main;
+			pcs->main = full;
+			return pcs;
+		}
+	} else if (pcs->spare->size > 0) {
+		swap(pcs->main, pcs->spare);
+		return pcs;
+	}
+
+	/* both main and spare are empty */
+
+	full = barn_replace_empty_sheaf(barn, pcs->main, allow_spin);
 
 	if (full) {
 		stat(s, BARN_GET);
--
2.50.1
On Tue, Dec 02, 2025 at 05:00:08PM +0800, Hao Li wrote:
> Introduce barn_get_full_sheaf(), a helper that detaches a full sheaf from
> the per-node barn without requiring an empty sheaf in exchange.
>
> Use this helper in __pcs_replace_empty_main() to change how an empty main
> per-CPU sheaf is handled:
>
> - If pcs->spare is NULL and pcs->main is empty, first try to obtain a
> full sheaf from the barn via barn_get_full_sheaf(). On success, park
> the empty main sheaf in pcs->spare and install the full sheaf as the
> new pcs->main.
>
> - If pcs->spare already exists and has objects, keep the existing
> behavior of simply swapping pcs->main and pcs->spare.
>
> - Only when both pcs->main and pcs->spare are empty do we fall back to
> barn_replace_empty_sheaf() and trade the empty main sheaf into the
> barn in exchange for a full one.

Hi Hao,

Yeah this is a very subtle difference between __pcs_replace_full_main()
and __pcs_replace_empty_main(), that the former installs the full main
sheaf in pcs->spare, while the latter replaces the empty main sheaf with
a full sheaf from the barn without populating pcs->spare.

Is it intentional, Vlastimil?

> This makes the empty-main path more symmetric with __pcs_replace_full_main(),
> which for a full main sheaf parks the full sheaf in pcs->spare and pulls an
> empty sheaf from the barn. It also matches the documented design more closely:
>
> "When both percpu sheaves are found empty during an allocation, an empty
> sheaf may be replaced with a full one from the per-node barn."

I'm not convinced that this change is worthwhile by adding more code;
you probably need to make a stronger argument for why it should be done.

> Signed-off-by: Hao Li <haoli.tcs@gmail.com>
> ---

--
Cheers,
Harry / Hyeonggon
On Wed, Dec 03, 2025 at 02:46:22PM +0900, Harry Yoo wrote:
> On Tue, Dec 02, 2025 at 05:00:08PM +0800, Hao Li wrote:
> > Introduce barn_get_full_sheaf(), a helper that detaches a full sheaf from
> > the per-node barn without requiring an empty sheaf in exchange.
> >
> > Use this helper in __pcs_replace_empty_main() to change how an empty main
> > per-CPU sheaf is handled:
> >
> > - If pcs->spare is NULL and pcs->main is empty, first try to obtain a
> > full sheaf from the barn via barn_get_full_sheaf(). On success, park
> > the empty main sheaf in pcs->spare and install the full sheaf as the
> > new pcs->main.
> >
> > - If pcs->spare already exists and has objects, keep the existing
> > behavior of simply swapping pcs->main and pcs->spare.
> >
> > - Only when both pcs->main and pcs->spare are empty do we fall back to
> > barn_replace_empty_sheaf() and trade the empty main sheaf into the
> > barn in exchange for a full one.
>
> Hi Hao,
>
> Yeah this is a very subtle difference between __pcs_replace_full_main()
> and __pcs_replace_empty_main(), that the former installs the full main
> sheaf in pcs->spare, while the latter replaces the empty main sheaf with
> a full sheaf from the barn without populating pcs->spare.

Exactly.

> Is it intentional, Vlastimil?
>
> > This makes the empty-main path more symmetric with __pcs_replace_full_main(),
> > which for a full main sheaf parks the full sheaf in pcs->spare and pulls an
> > empty sheaf from the barn. It also matches the documented design more closely:
> >
> > "When both percpu sheaves are found empty during an allocation, an empty
> > sheaf may be replaced with a full one from the per-node barn."
>
> I'm not convinced that this change is worthwhile by adding more code;
> you probably need to make a stronger argument for why it should be done.

Hi Harry,

Let me explain my intuition in more detail.

Previously, when pcs->main was empty and pcs->spare was NULL, we used
barn_replace_empty_sheaf() to trade the empty main sheaf into the barn
in exchange for a full one. As a result, pcs->main became full, but
pcs->spare remained NULL. Later, when frees filled pcs->main again,
__pcs_replace_full_main() had to call into the barn to obtain an empty
sheaf, because there was still no local spare to use.

With this patch, when pcs->main is empty and pcs->spare is NULL,
__pcs_replace_empty_main() instead uses barn_get_full_sheaf() to pull a
full sheaf from the barn while keeping the now‑empty main sheaf locally
as pcs->spare. The next time pcs->main becomes full,
__pcs_replace_full_main() can simply swap main and spare, with no barn
operations and no need to allocate a new empty sheaf.

In other words, although we still need one barn operation when main
first becomes empty in __pcs_replace_empty_main(), we avoid a future
barn operation on the subsequent “main full” path in
__pcs_replace_full_main.

Thanks.

>
> > Signed-off-by: Hao Li <haoli.tcs@gmail.com>
> > ---
>
> --
> Cheers,
> Harry / Hyeonggon
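The barn-operation argument above can be checked with a small userspace model. This is only a sketch under stated assumptions, not the mm/slub.c code: struct pcs_model, SHEAF_CAPACITY, NO_SPARE and the model_*() functions are hypothetical names, the barn is assumed to always have a full or an empty sheaf on hand, and the swap-when-spare-has-room branch of the full-main path is an assumption about how that path behaves.

```c
#include <assert.h>
#include <stdbool.h>

#define SHEAF_CAPACITY 4	/* hypothetical; the real capacity is per-cache */
#define NO_SPARE (-1)		/* models pcs->spare == NULL */

/* Toy per-CPU state: only object counts, no real sheaves. */
struct pcs_model {
	int main_size;
	int spare_size;		/* NO_SPARE, or objects held by the spare */
	int barn_ops;		/* how often the modelled barn was touched */
};

static void swap_main_spare(struct pcs_model *pcs)
{
	int tmp = pcs->main_size;

	pcs->main_size = pcs->spare_size;
	pcs->spare_size = tmp;
}

/* Models the empty-main refill; assumes the barn always has a full sheaf. */
static void model_replace_empty_main(struct pcs_model *pcs, bool patched)
{
	assert(pcs->main_size == 0);

	if (pcs->spare_size > 0) {
		swap_main_spare(pcs);
		return;
	}

	pcs->barn_ops++;
	/* Patched: take a full sheaf and keep the empty main as the spare. */
	if (patched && pcs->spare_size == NO_SPARE)
		pcs->spare_size = 0;
	/* Otherwise the empty main is traded into the barn. */
	pcs->main_size = SHEAF_CAPACITY;
}

/* Models the full-main path; assumes the barn always has an empty sheaf. */
static void model_replace_full_main(struct pcs_model *pcs)
{
	assert(pcs->main_size == SHEAF_CAPACITY);

	if (pcs->spare_size == NO_SPARE) {
		/* Park the full main as spare and pull an empty sheaf. */
		pcs->barn_ops++;
		pcs->spare_size = pcs->main_size;
		pcs->main_size = 0;
		return;
	}

	if (pcs->spare_size < SHEAF_CAPACITY) {
		swap_main_spare(pcs);
		return;
	}

	/* Both sheaves full: trade the full main for an empty one. */
	pcs->barn_ops++;
	pcs->main_size = 0;
}

/* One "main runs empty, then fills up again" cycle, starting without a spare. */
static int count_barn_ops(bool patched)
{
	struct pcs_model pcs = { .main_size = 0, .spare_size = NO_SPARE };

	model_replace_empty_main(&pcs, patched);
	model_replace_full_main(&pcs);
	return pcs.barn_ops;
}
```

Under these assumptions count_barn_ops(false) returns 2 and count_barn_ops(true) returns 1, which is the single saved barn operation described above. The model says nothing about how often this pattern occurs in practice.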
On Wed, Dec 03, 2025 at 07:15:12PM +0800, Hao Li wrote:
> On Wed, Dec 03, 2025 at 02:46:22PM +0900, Harry Yoo wrote:
> > On Tue, Dec 02, 2025 at 05:00:08PM +0800, Hao Li wrote:
> > > Introduce barn_get_full_sheaf(), a helper that detaches a full sheaf from
> > > the per-node barn without requiring an empty sheaf in exchange.
> > >
> > > Use this helper in __pcs_replace_empty_main() to change how an empty main
> > > per-CPU sheaf is handled:
> > >
> > > - If pcs->spare is NULL and pcs->main is empty, first try to obtain a
> > > full sheaf from the barn via barn_get_full_sheaf(). On success, park
> > > the empty main sheaf in pcs->spare and install the full sheaf as the
> > > new pcs->main.
> > >
> > > - If pcs->spare already exists and has objects, keep the existing
> > > behavior of simply swapping pcs->main and pcs->spare.
> > >
> > > - Only when both pcs->main and pcs->spare are empty do we fall back to
> > > barn_replace_empty_sheaf() and trade the empty main sheaf into the
> > > barn in exchange for a full one.
> >
> > Hi Hao,
> >
> > Yeah this is a very subtle difference between __pcs_replace_full_main()
> > and __pcs_replace_empty_main(), that the former installs the full main
> > sheaf in pcs->spare, while the latter replaces the empty main sheaf with
> > a full sheaf from the barn without populating pcs->spare.
>
> Exactly.
>
> > Is it intentional, Vlastimil?
Let's first see if Vlastimil had an intention, and...
> > > This makes the empty-main path more symmetric with __pcs_replace_full_main(),
> > > which for a full main sheaf parks the full sheaf in pcs->spare and pulls an
> > > empty sheaf from the barn. It also matches the documented design more closely:
> > >
> > > "When both percpu sheaves are found empty during an allocation, an empty
> > > sheaf may be replaced with a full one from the per-node barn."
> >
> > I'm not convinced that this change is worthwhile by adding more code;
> > you probably need to make a stronger argument for why it should be done.
>
> Hi Harry,
>
> Let me explain my intuition in more detail.
>
> Previously, when pcs->main was empty and pcs->spare was NULL, we used
> barn_replace_empty_sheaf() to trade the empty main sheaf into the barn
> in exchange for a full one. As a result, pcs->main became full, but
> pcs->spare remained NULL. Later, when frees filled pcs->main again,
> __pcs_replace_full_main() had to call into the barn to obtain an empty
> sheaf, because there was still no local spare to use.
>
> With this patch, when pcs->main is empty and pcs->spare is NULL,
> __pcs_replace_empty_main() instead uses barn_get_full_sheaf() to pull a
> full sheaf from the barn while keeping the now‑empty main sheaf locally
> as pcs->spare. The next time pcs->main becomes full,
> __pcs_replace_full_main() can simply swap main and spare, with no barn
> operations and no need to allocate a new empty sheaf.
I'm still not sure that either way is superior, as it really depends on
the alloc/free pattern. If the CPU keeps allocating more objects, keeping
the empty sheaf is unnecessary, but we don't know what the alloc/free
pattern will be.
No strong opinion from me, but I think it'd be better to make
__pcs_replace_{full,empty}_main() handle it consistently,
if there is no special intention.
> In other words, although we still need one barn operation when main
> first becomes empty in __pcs_replace_empty_main(), we avoid a future
> barn operation on the subsequent “main full” path in
> __pcs_replace_full_main.
>
> Thanks.
>
> >
> > > Signed-off-by: Hao Li <haoli.tcs@gmail.com>
--
Cheers,
Harry / Hyeonggon
On 12/7/25 14:59, Harry Yoo wrote:
> On Wed, Dec 03, 2025 at 07:15:12PM +0800, Hao Li wrote:
>> On Wed, Dec 03, 2025 at 02:46:22PM +0900, Harry Yoo wrote:
>> > On Tue, Dec 02, 2025 at 05:00:08PM +0800, Hao Li wrote:
>> > > Introduce barn_get_full_sheaf(), a helper that detaches a full sheaf from
>> > > the per-node barn without requiring an empty sheaf in exchange.
>> > >
>> > > Use this helper in __pcs_replace_empty_main() to change how an empty main
>> > > per-CPU sheaf is handled:
>> > >
>> > > - If pcs->spare is NULL and pcs->main is empty, first try to obtain a
>> > > full sheaf from the barn via barn_get_full_sheaf(). On success, park
>> > > the empty main sheaf in pcs->spare and install the full sheaf as the
>> > > new pcs->main.
>> > >
>> > > - If pcs->spare already exists and has objects, keep the existing
>> > > behavior of simply swapping pcs->main and pcs->spare.
>> > >
>> > > - Only when both pcs->main and pcs->spare are empty do we fall back to
>> > > barn_replace_empty_sheaf() and trade the empty main sheaf into the
>> > > barn in exchange for a full one.
>> >
>> > Hi Hao,
>> >
>> > Yeah this is a very subtle difference between __pcs_replace_full_main()
>> > and __pcs_replace_empty_main(), that the former installs the full main
>> > sheaf in pcs->spare, while the latter replaces the empty main sheaf with
>> > a full sheaf from the barn without populating pcs->spare.
>>
>> Exactly.
>>
>> > Is it intentional, Vlastimil?
>
> Let's first see if Vlastimil had an intention, and...
Hm I don't think I aimed to make this difference on purpose, but I also
didn't aim to make the alloc/free paths completely symmetric. Rather the goal
was just to do what seemed the best option in each situation. And probably
getting a full sheaf and populating spare never seemed to be an important
case to warrant the extra code for a situation that's only transient after
boot (see below).
>> > > This makes the empty-main path more symmetric with __pcs_replace_full_main(),
>> > > which for a full main sheaf parks the full sheaf in pcs->spare and pulls an
>> > > empty sheaf from the barn. It also matches the documented design more closely:
>> > >
>> > > "When both percpu sheaves are found empty during an allocation, an empty
>> > > sheaf may be replaced with a full one from the per-node barn."
>> >
>> > I'm not convinced that this change is worthwhile by adding more code;
>> > you probably need to make a stronger argument for why it should be done.
>>
>> Hi Harry,
>>
>> Let me explain my intuition in more detail.
>>
>> Previously, when pcs->main was empty and pcs->spare was NULL, we used
>> barn_replace_empty_sheaf() to trade the empty main sheaf into the barn
>> in exchange for a full one. As a result, pcs->main became full, but
>> pcs->spare remained NULL. Later, when frees filled pcs->main again,
>> __pcs_replace_full_main() had to call into the barn to obtain an empty
>> sheaf, because there was still no local spare to use.
As Harry suggests, that assumes a specific pattern where we exhaust main
sheaf first and then we fill it fully back. But even then this can only
happen once per cpu and then we have populated the spare and are very
unlikely to run into this situation again.
Also it's unlikely that full sheaves even exist in the barn during this
early stage when we would request them. That assumes cpus behave differently
and some have returned full sheaves to the barn before other cpus have
consumed their first full sheaf and request another.
More likely both barn_replace_empty_sheaf() and barn_get_empty_sheaf() will
fail and we do alloc_full_sheaf(). And then... I think I can see an issue in
__pcs_replace_empty_main() that's more likely to be suboptimal than the lack
of symmetry you point out. When we reach the last part below "we can reach
here only when gfpflags_allow_blocking..." and we have empty pcs->main, a
full sheaf from alloc_full_sheaf() and no spare, we should be doing
"pcs->spare = pcs->main" and not barn_put_empty_sheaf(). Right? This is what
is more likely to delay populating the spare, I think.
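A minimal userspace sketch of the change suggested above, for illustration only. It assumes the caller already holds a freshly allocated full sheaf and that pcs->main is empty; struct sheaf_model, struct pcs_model, struct barn_model, model_barn_put_empty() and model_install_full() are hypothetical stand-ins, not the mm/slub.c types or helpers, and the real tail of __pcs_replace_empty_main() handles more cases than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct sheaf_model {
	unsigned int size;		/* objects currently in the sheaf */
};

struct pcs_model {
	struct sheaf_model *main;
	struct sheaf_model *spare;	/* NULL until first populated */
};

struct barn_model {
	unsigned int nr_empty;		/* empty sheaves parked in the barn */
};

/* Stand-in for barn_put_empty_sheaf(): only counts the returned sheaf. */
static void model_barn_put_empty(struct barn_model *barn,
				 struct sheaf_model *sheaf)
{
	assert(sheaf->size == 0);
	barn->nr_empty++;
}

/*
 * Install a full sheaf as main when the current main is empty.
 * If there is no spare yet, keep the empty main locally as the spare
 * instead of handing it to the barn.
 */
static void model_install_full(struct pcs_model *pcs,
			       struct barn_model *barn,
			       struct sheaf_model *full)
{
	assert(pcs->main->size == 0);

	if (!pcs->spare)
		pcs->spare = pcs->main;
	else
		model_barn_put_empty(barn, pcs->main);

	pcs->main = full;
}
```

With no spare, the empty main ends up in pcs->spare and the barn is not touched; with a spare already present, the empty main still goes back to the barn as before.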
>> With this patch, when pcs->main is empty and pcs->spare is NULL,
>> __pcs_replace_empty_main() instead uses barn_get_full_sheaf() to pull a
>> full sheaf from the barn while keeping the now‑empty main sheaf locally
>> as pcs->spare. The next time pcs->main becomes full,
>> __pcs_replace_full_main() can simply swap main and spare, with no barn
>> operations and no need to allocate a new empty sheaf.
>
> I'm not still sure that either way is superior, as it really depends on
> the alloc/free pattern. If the CPU keeps allocating more objects, keeping
> the empty sheaf is unnecessary, but we don't know what the alloc/free
> pattern will be.
Yeah.
> So strong opinion from me, but I think it'd be better make
> __pcs_replace_{full,empty}_main() handle it consistently,
> if there is no special intention.
I'd rather see some numbers. But the suboptimality pointed out above is more
obvious to me. Do you agree and want to send a patch? :)
>> In other words, although we still need one barn operation when main
>> first becomes empty in __pcs_replace_empty_main(), we avoid a future
>> barn operation on the subsequent “main full” path in
>> __pcs_replace_full_main.
>>
>> Thanks.
>>
>> >
>> > > Signed-off-by: Hao Li <haoli.tcs@gmail.com>
>
On Tue, Dec 9, 2025 at 2:51 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 12/7/25 14:59, Harry Yoo wrote:
> > On Wed, Dec 03, 2025 at 07:15:12PM +0800, Hao Li wrote:
> >> On Wed, Dec 03, 2025 at 02:46:22PM +0900, Harry Yoo wrote:
> >> > On Tue, Dec 02, 2025 at 05:00:08PM +0800, Hao Li wrote:
> >> > > Introduce barn_get_full_sheaf(), a helper that detaches a full sheaf from
> >> > > the per-node barn without requiring an empty sheaf in exchange.
> >> > >
> >> > > Use this helper in __pcs_replace_empty_main() to change how an empty main
> >> > > per-CPU sheaf is handled:
> >> > >
> >> > > - If pcs->spare is NULL and pcs->main is empty, first try to obtain a
> >> > > full sheaf from the barn via barn_get_full_sheaf(). On success, park
> >> > > the empty main sheaf in pcs->spare and install the full sheaf as the
> >> > > new pcs->main.
> >> > >
> >> > > - If pcs->spare already exists and has objects, keep the existing
> >> > > behavior of simply swapping pcs->main and pcs->spare.
> >> > >
> >> > > - Only when both pcs->main and pcs->spare are empty do we fall back to
> >> > > barn_replace_empty_sheaf() and trade the empty main sheaf into the
> >> > > barn in exchange for a full one.
> >> >
> >> > Hi Hao,
> >> >
> >> > Yeah this is a very subtle difference between __pcs_replace_full_main()
> >> > and __pcs_replace_empty_main(), that the former installs the full main
> >> > sheaf in pcs->spare, while the latter replaces the empty main sheaf with
> >> > a full sheaf from the barn without populating pcs->spare.
> >>
> >> Exactly.
> >>
> >> > Is it intentional, Vlastimil?
> >
> > Let's first see if Vlastimil had an intention, and...
>
> Hm I don't think I aimed to make this difference on purpose, but I didn't
> also aim to make the alloc/free paths completely symmetric.
Got it.
> Rather the goal
> was just to do what seemed the best option in each situation. And probably
> getting a full sheaf and populating spare never seemed to be an important
> case to warrant the extra code for a situation that's only transient after
> boot (see below).
>
> >> > > This makes the empty-main path more symmetric with __pcs_replace_full_main(),
> >> > > which for a full main sheaf parks the full sheaf in pcs->spare and pulls an
> >> > > empty sheaf from the barn. It also matches the documented design more closely:
> >> > >
> >> > > "When both percpu sheaves are found empty during an allocation, an empty
> >> > > sheaf may be replaced with a full one from the per-node barn."
> >> >
> >> > I'm not convinced that this change is worthwhile by adding more code;
> >> > you probably need to make a stronger argument for why it should be done.
> >>
> >> Hi Harry,
> >>
> >> Let me explain my intuition in more detail.
> >>
> >> Previously, when pcs->main was empty and pcs->spare was NULL, we used
> >> barn_replace_empty_sheaf() to trade the empty main sheaf into the barn
> >> in exchange for a full one. As a result, pcs->main became full, but
> >> pcs->spare remained NULL. Later, when frees filled pcs->main again,
> >> __pcs_replace_full_main() had to call into the barn to obtain an empty
> >> sheaf, because there was still no local spare to use.
>
> As Harry suggests, that assumes a specific pattern where we exhaust main
> sheaf first and then we fill it fully back. But even then this can only
> happen once per cpu and then we have populated the spare and are very
> unlikely to run into this situation again.
I agree that my original patch was trying to optimize a rare pattern and
added more code than is justified.
>
> Also it's unlikely that full sheaves even exist in the barn during this
> early stage when we would request them. That assumes cpus behave differently
> and some have returned full sheaves to the barn before other cpus have
> consumed their first full sheaf and request another.
>
> More likely both barn_replace_empty_sheaf() and barn_get_empty_sheaf() will
> fail and we do alloc_full_sheaf(). And then... I think I can see an issue in
> __pcs_replace_empty_main() that's more likely to be suboptimal than the lack
> of symmetry you point out. When we reach the last part below "we can reach
> here only when gfpflags_allow_blocking..." and we have empty pcs->main, a
> full sheaf from alloc_full_sheaf() and no spare, we should be doing
> "pcs->spare = pcs->main" and not barn_put_empty_sheaf(). Right? This is what
> can delay populating the spare more likely I think.
Thanks, this suboptimal case makes sense to me.
>
> >> With this patch, when pcs->main is empty and pcs->spare is NULL,
> >> __pcs_replace_empty_main() instead uses barn_get_full_sheaf() to pull a
> >> full sheaf from the barn while keeping the now‑empty main sheaf locally
> >> as pcs->spare. The next time pcs->main becomes full,
> >> __pcs_replace_full_main() can simply swap main and spare, with no barn
> >> operations and no need to allocate a new empty sheaf.
> >
> > I'm not still sure that either way is superior, as it really depends on
> > the alloc/free pattern. If the CPU keeps allocating more objects, keeping
> > the empty sheaf is unnecessary, but we don't know what the alloc/free
> > pattern will be.
>
> Yeah.
>
> > So strong opinion from me, but I think it'd be better make
> > __pcs_replace_{full,empty}_main() handle it consistently,
> > if there is no special intention.
>
> I'd rather see some numbers. But the suboptimality pointed out above is more
> obvious to me. Do you agree and want to send a patch? :)
Sure! I’ve prepared a smaller patch and will send it as v2.
>
> >> In other words, although we still need one barn operation when main
> >> first becomes empty in __pcs_replace_empty_main(), we avoid a future
> >> barn operation on the subsequent “main full” path in
> >> __pcs_replace_full_main.
> >>
> >> Thanks.
> >>
> >> >
> >> > > Signed-off-by: Hao Li <haoli.tcs@gmail.com>
> >
>
On Mon, Dec 08, 2025 at 07:51:40PM +0100, Vlastimil Babka wrote:
> On 12/7/25 14:59, Harry Yoo wrote:
> > On Wed, Dec 03, 2025 at 07:15:12PM +0800, Hao Li wrote:
> >> On Wed, Dec 03, 2025 at 02:46:22PM +0900, Harry Yoo wrote:
> >> > On Tue, Dec 02, 2025 at 05:00:08PM +0800, Hao Li wrote:
> >> > > Introduce barn_get_full_sheaf(), a helper that detaches a full sheaf from
> >> > > the per-node barn without requiring an empty sheaf in exchange.
> >> > >
> >> > > Use this helper in __pcs_replace_empty_main() to change how an empty main
> >> > > per-CPU sheaf is handled:
> >> > >
> >> > > - If pcs->spare is NULL and pcs->main is empty, first try to obtain a
> >> > > full sheaf from the barn via barn_get_full_sheaf(). On success, park
> >> > > the empty main sheaf in pcs->spare and install the full sheaf as the
> >> > > new pcs->main.
> >> > >
> >> > > - If pcs->spare already exists and has objects, keep the existing
> >> > > behavior of simply swapping pcs->main and pcs->spare.
> >> > >
> >> > > - Only when both pcs->main and pcs->spare are empty do we fall back to
> >> > > barn_replace_empty_sheaf() and trade the empty main sheaf into the
> >> > > barn in exchange for a full one.
> >> >
> >> > Hi Hao,
> >> >
> >> > Yeah this is a very subtle difference between __pcs_replace_full_main()
> >> > and __pcs_replace_empty_main(), that the former installs the full main
> >> > sheaf in pcs->spare, while the latter replaces the empty main sheaf with
> >> > a full sheaf from the barn without populating pcs->spare.
> >>
> >> Exactly.
> >>
> >> > Is it intentional, Vlastimil?
> >
> > Let's first see if Vlastimil had an intention, and...
>
> Hm I don't think I aimed to make this difference on purpose, but I didn't
> also aim to make the alloc/free paths completely symmetric. Rather the goal
> was just to do what seemed the best option in each situation. And probably
> getting a full sheaf and populating spare never seemed to be an important
> case to warrant the extra code for a situation that's only transient after
> boot (see below).
>
> >> > > This makes the empty-main path more symmetric with __pcs_replace_full_main(),
> >> > > which for a full main sheaf parks the full sheaf in pcs->spare and pulls an
> >> > > empty sheaf from the barn. It also matches the documented design more closely:
> >> > >
> >> > > "When both percpu sheaves are found empty during an allocation, an empty
> >> > > sheaf may be replaced with a full one from the per-node barn."
> >> >
> >> > I'm not convinced that this change is worthwhile by adding more code;
> >> > you probably need to make a stronger argument for why it should be done.
> >>
> >> Hi Harry,
> >>
> >> Let me explain my intuition in more detail.
> >>
> >> Previously, when pcs->main was empty and pcs->spare was NULL, we used
> >> barn_replace_empty_sheaf() to trade the empty main sheaf into the barn
> >> in exchange for a full one. As a result, pcs->main became full, but
> >> pcs->spare remained NULL. Later, when frees filled pcs->main again,
> >> __pcs_replace_full_main() had to call into the barn to obtain an empty
> >> sheaf, because there was still no local spare to use.
>
> As Harry suggests, that assumes a specific pattern where we exhaust main
> sheaf first and then we fill it fully back.
Right.
> But even then this can only
> happen once per cpu and then we have populated the spare and are very
> unlikely to run into this situation again.
Good point!
> Also it's unlikely that full sheaves even exist in the barn during this
> early stage when we would request them. That assumes cpus behave differently
> and some have returned full sheaves to the barn before other cpus have
> consumed their first full sheaf and request another.
Right.
> More likely both barn_replace_empty_sheaf() and barn_get_empty_sheaf() will
> fail and we do alloc_full_sheaf().
>
> And then... I think I can see an issue in
> __pcs_replace_empty_main() that's more likely to be suboptimal than the lack
> of symmetry you point out.
> When we reach the last part below "we can reach
> here only when gfpflags_allow_blocking..." and we have empty pcs->main, a
> full sheaf from alloc_full_sheaf() and no spare, we should be doing
> "pcs->spare = pcs->main" and not barn_put_empty_sheaf(). Right? This is what
> can delay populating the spare more likely I think.
That makes sense to me.
> >> With this patch, when pcs->main is empty and pcs->spare is NULL,
> >> __pcs_replace_empty_main() instead uses barn_get_full_sheaf() to pull a
> >> full sheaf from the barn while keeping the now‑empty main sheaf locally
> >> as pcs->spare. The next time pcs->main becomes full,
> >> __pcs_replace_full_main() can simply swap main and spare, with no barn
> >> operations and no need to allocate a new empty sheaf.
> >
> > I'm not still sure that either way is superior, as it really depends on
> > the alloc/free pattern. If the CPU keeps allocating more objects, keeping
> > the empty sheaf is unnecessary, but we don't know what the alloc/free
> > pattern will be.
>
> Yeah.
>
> > So strong opinion from me, but I think it'd be better make
> > __pcs_replace_{full,empty}_main() handle it consistently,
> > if there is no special intention.
>
> I'd rather see some numbers. But the suboptimality pointed out above is more
> obvious to me. Do you agree and want to send a patch? :)
I agree and would like Hao Li to try this path as he raised this topic,
if he's interested ;)
> >> In other words, although we still need one barn operation when main
> >> first becomes empty in __pcs_replace_empty_main(), we avoid a future
> >> barn operation on the subsequent “main full” path in
> >> __pcs_replace_full_main.
> >>
> >> Thanks.
> >>
> >> >
> >> > > Signed-off-by: Hao Li <haoli.tcs@gmail.com>
--
Cheers,
Harry / Hyeonggon
On Tue, Dec 9, 2025 at 10:39 AM Harry Yoo <harry.yoo@oracle.com> wrote:
>
> On Mon, Dec 08, 2025 at 07:51:40PM +0100, Vlastimil Babka wrote:
> > On 12/7/25 14:59, Harry Yoo wrote:
> > > On Wed, Dec 03, 2025 at 07:15:12PM +0800, Hao Li wrote:
> > >> On Wed, Dec 03, 2025 at 02:46:22PM +0900, Harry Yoo wrote:
> > >> > On Tue, Dec 02, 2025 at 05:00:08PM +0800, Hao Li wrote:
> > >> > > Introduce barn_get_full_sheaf(), a helper that detaches a full sheaf from
> > >> > > the per-node barn without requiring an empty sheaf in exchange.
> > >> > >
> > >> > > Use this helper in __pcs_replace_empty_main() to change how an empty main
> > >> > > per-CPU sheaf is handled:
> > >> > >
> > >> > > - If pcs->spare is NULL and pcs->main is empty, first try to obtain a
> > >> > > full sheaf from the barn via barn_get_full_sheaf(). On success, park
> > >> > > the empty main sheaf in pcs->spare and install the full sheaf as the
> > >> > > new pcs->main.
> > >> > >
> > >> > > - If pcs->spare already exists and has objects, keep the existing
> > >> > > behavior of simply swapping pcs->main and pcs->spare.
> > >> > >
> > >> > > - Only when both pcs->main and pcs->spare are empty do we fall back to
> > >> > > barn_replace_empty_sheaf() and trade the empty main sheaf into the
> > >> > > barn in exchange for a full one.
> > >> >
> > >> > Hi Hao,
> > >> >
> > >> > Yeah this is a very subtle difference between __pcs_replace_full_main()
> > >> > and __pcs_replace_empty_main(), that the former installs the full main
> > >> > sheaf in pcs->spare, while the latter replaces the empty main sheaf with
> > >> > a full sheaf from the barn without populating pcs->spare.
> > >>
> > >> Exactly.
> > >>
> > >> > Is it intentional, Vlastimil?
> > >
> > > Let's first see if Vlastimil had an intention, and...
> >
> > Hm I don't think I aimed to make this difference on purpose, but I didn't
> > also aim to make the alloc/free paths completely symmetric. Rather the goal
> > was just to do what seemed the best option in each situation. And probably
> > getting a full sheaf and populating spare never seemed to be an important
> > case to warrant the extra code for a situation that's only transient after
> > boot (see below).
> >
> > >> > > This makes the empty-main path more symmetric with __pcs_replace_full_main(),
> > >> > > which for a full main sheaf parks the full sheaf in pcs->spare and pulls an
> > >> > > empty sheaf from the barn. It also matches the documented design more closely:
> > >> > >
> > >> > > "When both percpu sheaves are found empty during an allocation, an empty
> > >> > > sheaf may be replaced with a full one from the per-node barn."
> > >> >
> > >> > I'm not convinced that this change is worthwhile by adding more code;
> > >> > you probably need to make a stronger argument for why it should be done.
> > >>
> > >> Hi Harry,
> > >>
> > >> Let me explain my intuition in more detail.
> > >>
> > >> Previously, when pcs->main was empty and pcs->spare was NULL, we used
> > >> barn_replace_empty_sheaf() to trade the empty main sheaf into the barn
> > >> in exchange for a full one. As a result, pcs->main became full, but
> > >> pcs->spare remained NULL. Later, when frees filled pcs->main again,
> > >> __pcs_replace_full_main() had to call into the barn to obtain an empty
> > >> sheaf, because there was still no local spare to use.
> >
> > As Harry suggests, that assumes a specific pattern where we exhaust main
> > sheaf first and then we fill it fully back.
>
> Right.
>
> > But even then this can only
> > happen once per cpu and then we have populated the spare and are very
> > unlikely to run into this situation again.
>
> Good point!
>
> > Also it's unlikely that full sheaves even exist in the barn during this
> > early stage when we would request them. That assumes cpus behave differently
> > and some have returned full sheaves to the barn before other cpus have
> > consumed their first full sheaf and request another.
>
> Right.
>
> > More likely both barn_replace_empty_sheaf() and barn_get_empty_sheaf() will
> > fail and we do alloc_full_sheaf().
> >
> > And then... I think I can see an issue in
> > __pcs_replace_empty_main() that's more likely to be suboptimal than the lack
> > of symmetry you point out.
>
> > When we reach the last part below "we can reach
> > here only when gfpflags_allow_blocking..." and we have empty pcs->main, a
> > full sheaf from alloc_full_sheaf() and no spare, we should be doing
> > "pcs->spare = pcs->main" and not barn_put_empty_sheaf(). Right? This is what
> > can delay populating the spare more likely I think.
>
> That makes sense to me.
>
> > >> With this patch, when pcs->main is empty and pcs->spare is NULL,
> > >> __pcs_replace_empty_main() instead uses barn_get_full_sheaf() to pull a
> > >> full sheaf from the barn while keeping the now‑empty main sheaf locally
> > >> as pcs->spare. The next time pcs->main becomes full,
> > >> __pcs_replace_full_main() can simply swap main and spare, with no barn
> > >> operations and no need to allocate a new empty sheaf.
> > >
> > > I'm not still sure that either way is superior, as it really depends on
> > > the alloc/free pattern. If the CPU keeps allocating more objects, keeping
> > > the empty sheaf is unnecessary, but we don't know what the alloc/free
> > > pattern will be.
> >
> > Yeah.
> >
> > > So strong opinion from me, but I think it'd be better make
> > > __pcs_replace_{full,empty}_main() handle it consistently,
> > > if there is no special intention.
> >
> > I'd rather see some numbers. But the suboptimality pointed out above is more
> > obvious to me. Do you agree and want to send a patch? :)
>
> I agree and would like Hao Li to try this path as he raised this topic,
> if he's interested ;)
Thanks Harry for reviewing and letting me work on this as a newcomer to SLUB.
>
> > >> In other words, although we still need one barn operation when main
> > >> first becomes empty in __pcs_replace_empty_main(), we avoid a future
> > >> barn operation on the subsequent “main full” path in
> > >> __pcs_replace_full_main.
> > >>
> > >> Thanks.
> > >>
> > >> >
> > >> > > Signed-off-by: Hao Li <haoli.tcs@gmail.com>
>
> --
> Cheers,
> Harry / Hyeonggon