This is the draft patch from [1] turned into a proper series with
incremental changes. It's based on v7.0-rc3. It's too intrusive for a
7.0 hotfix, so we'll only be able to fix/reduce the regression in 7.1. I
hope that's acceptable given that it's a non-standard configuration, 7.0
is not an LTS, and it's a perf regression, not a functional one.
Ming, can you please retest this on top of v7.0-rc3, which already has
fb1091febd66 ("mm/slab: allow sheaf refill if blocking is not
allowed")? A separate data point for v7.0-rc3 itself could also be useful.
[1] https://lore.kernel.org/all/c6a01f7e-c6eb-454b-9b9e-734526dd659d@kernel.org/
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
---
Vlastimil Babka (SUSE) (3):
slab: decouple pointer to barn from kmem_cache_node
slab: create barns for online memoryless nodes
slab: free remote objects to sheaves on memoryless nodes
mm/slab.h | 7 +-
mm/slub.c | 256 +++++++++++++++++++++++++++++++++++++++++++++-----------------
2 files changed, 191 insertions(+), 72 deletions(-)
---
base-commit: 1f318b96cc84d7c2ab792fcc0bfd42a7ca890681
change-id: 20260311-b4-slab-memoryless-barns-fad64172ba05
Best regards,
--
Vlastimil Babka (SUSE) <vbabka@kernel.org>
On 3/11/26 09:25, Vlastimil Babka (SUSE) wrote:
> This is the draft patch from [1] turned into a proper series with
> incremental changes. It's based on v7.0-rc3. It's too intrusive for a
> 7.0 hotfix, so we'll only be able to fix/reduce the regression in 7.1. I
> hope it's acceptable given it's a non-standard configuration, 7.0 is not
> a LTS, and it's a perf regression, not functionality.
>
> Ming can you please retest this on top of v7.0-rc3, which already has
> fb1091febd66 ("mm/slab: allow sheaf refill if blocking is not
> allowed"). Separate data point for v7.0-rc3 could be also useful.
>
> [1] https://lore.kernel.org/all/c6a01f7e-c6eb-454b-9b9e-734526dd659d@kernel.org/
>
> Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> ---
> Vlastimil Babka (SUSE) (3):
> slab: decouple pointer to barn from kmem_cache_node
> slab: create barns for online memoryless nodes
> slab: free remote objects to sheaves on memoryless nodes
>
> mm/slab.h | 7 +-
> mm/slub.c | 256 +++++++++++++++++++++++++++++++++++++++++++++-----------------
> 2 files changed, 191 insertions(+), 72 deletions(-)
> ---
> base-commit: 1f318b96cc84d7c2ab792fcc0bfd42a7ca890681
> change-id: 20260311-b4-slab-memoryless-barns-fad64172ba05
>
> Best regards,
Range-diff in slab/for-7.1/sheaves after applying Harry's feedback:
2: cc67056e94f1 ! 472: b002755da434 slab: decouple pointer to barn from kmem_cache_node
@@ Commit message
Link: https://patch.msgid.link/20260311-b4-slab-memoryless-barns-v1-1-70ab850be4ce@kernel.org
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
+ Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
## mm/slab.h ##
@@ mm/slab.h: struct kmem_cache_order_objects {
@@ mm/slub.c: struct kmem_cache_node {
/*
- * Get the barn of the current cpu's closest memory node. It may not exist on
- * systems with memoryless nodes but without CONFIG_HAVE_MEMORYLESS_NODES
-+ * Get the barn of the current cpu's memory node. It may be a memoryless node.
++ * Get the barn of the current cpu's NUMA node. It may be a memoryless node.
*/
static inline struct node_barn *get_barn(struct kmem_cache *s)
{
@@ mm/slub.c: struct kmem_cache_node {
- return NULL;
-
- return n->barn;
-+ return get_barn_node(s, numa_node_id());
++ return get_barn_node(s, numa_mem_id());
}
/*
3: 285bca63cf15 ! 473: f811cc3d9f6e slab: create barns for online memoryless nodes
@@ Commit message
nodes, tracked in a new nodemask slab_barn_nodes. Also add a cpu hotplug
callback as that's when a memoryless node can become online.
- Change rcu_sheaf->node assignment to numa_node_id() so it's returned to
- the barn of the local cpu's (potentially memoryless) node, and not to
- the nearest node with memory anymore.
+ Change both get_barn() and rcu_sheaf->node assignment to numa_node_id()
+ so it's returned to the barn of the local cpu's (potentially memoryless)
+ node, and not to the nearest node with memory anymore.
+
+ On systems with CONFIG_HAVE_MEMORYLESS_NODES=y (which are not the main
+ target of this change) barns did not exist on memoryless nodes, but
+ get_barn() using numa_mem_id() meant a barn was returned from the
+ nearest node with memory. This works, but the barn lock contention
+ increases with every such memoryless node. With this change, barn will
+ be allocated also on the memoryless node, reducing this contention in
+ exchange for increased memory consumption.
Reported-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/all/aZ0SbIqaIkwoW2mB@fedora/ [1]
Link: https://patch.msgid.link/20260311-b4-slab-memoryless-barns-v1-2-70ab850be4ce@kernel.org
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
+ Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
## mm/slub.c ##
+@@ mm/slub.c: static inline struct node_barn *get_barn_node(struct kmem_cache *s, int node)
+ */
+ static inline struct node_barn *get_barn(struct kmem_cache *s)
+ {
+- return get_barn_node(s, numa_mem_id());
++ return get_barn_node(s, numa_node_id());
+ }
+
+ /*
@@ mm/slub.c: static inline struct node_barn *get_barn(struct kmem_cache *s)
*/
static nodemask_t slab_nodes;
4: 1fe49af3aa46 ! 474: 86e18f36844f slab: free remote objects to sheaves on memoryless nodes
@@ Commit message
Link: https://patch.msgid.link/20260311-b4-slab-memoryless-barns-v1-3-70ab850be4ce@kernel.org
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
+ Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
## mm/slub.c ##
@@ mm/slub.c: bool __kfree_rcu_sheaf(struct kmem_cache *s, void *obj)
On Wed, Mar 11, 2026 at 09:25:54AM +0100, Vlastimil Babka (SUSE) wrote:
> This is the draft patch from [1] turned into a proper series with
> incremental changes. It's based on v7.0-rc3. It's too intrusive for a
> 7.0 hotfix, so we'll only be able to fix/reduce the regression in 7.1. I
> hope it's acceptable given it's a non-standard configuration, 7.0 is not
> a LTS, and it's a perf regression, not functionality.
>
> Ming can you please retest this on top of v7.0-rc3, which already has
> fb1091febd66 ("mm/slab: allow sheaf refill if blocking is not
> allowed"). Separate data point for v7.0-rc3 could be also useful.
>
> [1] https://lore.kernel.org/all/c6a01f7e-c6eb-454b-9b9e-734526dd659d@kernel.org/
>
> Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> ---
> Vlastimil Babka (SUSE) (3):
> slab: decouple pointer to barn from kmem_cache_node
> slab: create barns for online memoryless nodes
> slab: free remote objects to sheaves on memoryless nodes
Hi Vlastimil and Guys,
I re-ran the test case used in https://lore.kernel.org/all/aZ0SbIqaIkwoW2mB@fedora/
- v6.19-rc5: 34M
- 815c8e35511d Merge branch 'slab/for-7.0/sheaves' into slab/for-next: 13M
- v7.0-rc3: 13M
- v7.0-rc3 + the three patches: 24M
# Test Machines
- AMD Zen4, dual socket, 64 cores, 8 NUMA nodes (BIOS configured for per-CCD NUMA; only 2 of them are memory nodes)
- numactl -H:
https://lore.kernel.org/all/aZ7p9uF8H8u6RxrK@fedora/
# slab stat log
root@tomsrv:~/temp/mm/7.0-rc3/patched# (cd /sys/kernel/slab/bio-256/ && find . -type f -exec grep -aH . {} \;)
./remote_node_defrag_ratio:100
./total_objects:7344 N1=3417 N5=3927
./alloc_fastpath:476106437 C0=128 C1=26852005 C2=128 C3=27291181 C4=65 C5=35617011 C6=97 C7=34258221 C8=96 C9=28158690 C11=26433128 C12=128 C13=31715794 C15=28819773 C16=97 C17=26168947 C19=30768051 C20=128 C21=32964376 C23=34696825 C25=26471644 C26=130 C27=27844688 C28=97 C29=28480054 C31=29564950 C40=1 C42=2 C63=2
./cpu_slabs:0
./objects:7265 N1=3374 N5=3891
./sheaf_return_slow:0
./objects_partial:533 N1=212 N5=321
./sheaf_return_fast:0
./cpu_partial:0
./free_slowpath:295 C4=158 C6=136 C20=1
./barn_get_fail:270 C0=5 C1=16 C2=5 C3=6 C4=3 C5=21 C6=4 C7=14 C8=2 C9=7 C11=23 C12=3 C13=10 C15=19 C16=3 C17=4 C19=25 C20=5 C21=22 C23=6 C25=21 C26=5 C27=6 C28=1 C29=4 C31=27 C40=1 C42=1 C63=1
./sheaf_prefill_oversize:0
./skip_kfence:0
./min_partial:5
./order_fallback:0
./sheaf_capacity:28
./sheaf_flush:0
./free_rcu_sheaf:0
./sheaf_alloc:179 C0=9 C1=1 C2=4 C4=8 C5=1 C6=4 C7=65 C8=3 C10=10 C11=1 C12=2 C14=11 C15=1 C16=5 C18=8 C19=1 C20=8 C21=1 C22=5 C24=8 C25=1 C26=5 C28=5 C30=8 C31=1 C40=1 C42=1 C63=1
./sheaf_free:0
./sheaf_prefill_slow:0
./sheaf_prefill_fast:0
./poison:0
./red_zone:0
./free_slab:0
./slabs:144 N1=67 N5=77
./barn_get:17003547 C1=958985 C3=974680 C5=1272016 C7=1223494 C8=2 C9=1005661 C11=944018 C12=2 C13=1132697 C15=1029259 C16=1 C17=934602 C19=1098834 C21=1177278 C23=1239167 C25=945395 C27=994448 C28=3 C29=1017141 C31=1055864
./alloc_slowpath:0
./destroy_by_rcu:1
./free_rcu_sheaf_fail:0
./barn_put:17003623 C0=958995 C2=974679 C4=1272023 C6=1223496 C8=1005661 C10=944030 C12=1132701 C14=1029267 C16=934598 C18=1098848 C20=1177293 C22=1239162 C24=945405 C26=994447 C28=1017138 C30=1055880
./usersize:0
./sanity_checks:0
./barn_put_fail:0
./align:64
./alloc_node_mismatch:0
./alloc_slab:144 C0=2 C1=8 C2=3 C3=2 C4=1 C5=5 C6=1 C7=3 C8=2 C9=4 C11=14 C12=2 C13=7 C15=11 C16=2 C17=3 C19=20 C20=1 C21=5 C23=1 C25=13 C26=4 C27=5 C29=1 C31=21 C40=1 C42=1 C63=1
./free_remove_partial:0
./aliases:0
./store_user:0
./trace:0
./reclaim_account:0
./order:2
./sheaf_refill:7560 C0=140 C1=448 C2=140 C3=168 C4=84 C5=588 C6=112 C7=392 C8=56 C9=196 C11=644 C12=84 C13=280 C15=532 C16=84 C17=112 C19=700 C20=140 C21=616 C23=168 C25=588 C26=140 C27=168 C28=28 C29=112 C31=756 C40=28 C42=28 C63=28
./object_size:256
./free_fastpath:476102026 C0=26851883 C2=27291053 C4=35616664 C6=34257923 C8=28158529 C9=1 C10=26432875 C11=2 C12=31715665 C14=28819520 C16=26168783 C18=30767788 C20=32964224 C21=2 C22=34696578 C24=26471388 C26=27844558 C27=2 C28=28479894 C30=29564692 C31=2
./hwcache_align:1
./cmpxchg_double_fail:0
./objs_per_slab:51
./partial:12 N1=5 N5=7
./slabs_cpu_partial:0(0)
./free_add_partial:143 C0=3 C1=8 C2=2 C3=4 C4=11 C5=16 C6=13 C7=9 C9=3 C11=8 C12=1 C13=3 C15=8 C16=1 C17=1 C19=5 C20=5 C21=17 C23=5 C25=8 C26=1 C27=1 C28=1 C29=3 C31=6
./slab_size:320
./cache_dma:0
Thanks,
Ming
On 3/11/26 10:49, Ming Lei wrote:
> On Wed, Mar 11, 2026 at 09:25:54AM +0100, Vlastimil Babka (SUSE) wrote:
>> This is the draft patch from [1] turned into a proper series with
>> incremental changes. It's based on v7.0-rc3. It's too intrusive for a
>> 7.0 hotfix, so we'll only be able to fix/reduce the regression in 7.1. I
>> hope it's acceptable given it's a non-standard configuration, 7.0 is not
>> a LTS, and it's a perf regression, not functionality.
>>
>> Ming can you please retest this on top of v7.0-rc3, which already has
>> fb1091febd66 ("mm/slab: allow sheaf refill if blocking is not
>> allowed"). Separate data point for v7.0-rc3 could be also useful.
>>
>> [1] https://lore.kernel.org/all/c6a01f7e-c6eb-454b-9b9e-734526dd659d@kernel.org/
>>
>> Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
>> ---
>> Vlastimil Babka (SUSE) (3):
>> slab: decouple pointer to barn from kmem_cache_node
>> slab: create barns for online memoryless nodes
>> slab: free remote objects to sheaves on memoryless nodes
>
> Hi Vlastimil and Guys,
>
> I re-run the test case used in https://lore.kernel.org/all/aZ0SbIqaIkwoW2mB@fedora/
>
> - v6.19-rc5: 34M
>
> - 815c8e35511d Merge branch 'slab/for-7.0/sheaves' into slab/for-next: 13M
>
> - v7.0-rc3: 13M
Thanks, that's in line with your previous testing of "mm/slab: allow sheaf
refill if blocking is not allowed" making no difference here. At least we
just learned it helps other benchmarks :)
> - v7.0-rc3 + the three patches: 24M
OK. So now it might really be down to the total per-CPU caching capacity difference.
> # Test Machines
>
> - AMD Zen4, dual sockets, 64 cores, 8 NUMA node(configure BIOS to use per-CCD numa, just 2 memory node)
>
> - numactl -H:
>
> https://lore.kernel.org/all/aZ7p9uF8H8u6RxrK@fedora/
>
> # slab stat log
>
> root@tomsrv:~/temp/mm/7.0-rc3/patched# (cd /sys/kernel/slab/bio-256/ && find . -type f -exec grep -aH . {} \;)
> ./remote_node_defrag_ratio:100
> ./total_objects:7344 N1=3417 N5=3927
> ./alloc_fastpath:476106437 C0=128 C1=26852005 C2=128 C3=27291181 C4=65 C5=35617011 C6=97 C7=34258221 C8=96 C9=28158690 C11=26433128 C12=128 C13=31715794 C15=28819773 C16=97 C17=26168947 C19=30768051 C20=128 C21=32964376 C23=34696825 C25=26471644 C26=130 C27=27844688 C28=97 C29=28480054 C31=29564950 C40=1 C42=2 C63=2
> ./cpu_slabs:0
> ./objects:7265 N1=3374 N5=3891
> ./sheaf_return_slow:0
> ./objects_partial:533 N1=212 N5=321
> ./sheaf_return_fast:0
> ./cpu_partial:0
> ./free_slowpath:295 C4=158 C6=136 C20=1
> ./barn_get_fail:270 C0=5 C1=16 C2=5 C3=6 C4=3 C5=21 C6=4 C7=14 C8=2 C9=7 C11=23 C12=3 C13=10 C15=19 C16=3 C17=4 C19=25 C20=5 C21=22 C23=6 C25=21 C26=5 C27=6 C28=1 C29=4 C31=27 C40=1 C42=1 C63=1
> ./sheaf_prefill_oversize:0
> ./skip_kfence:0
> ./min_partial:5
> ./order_fallback:0
> ./sheaf_capacity:28
> ./sheaf_flush:0
> ./free_rcu_sheaf:0
> ./sheaf_alloc:179 C0=9 C1=1 C2=4 C4=8 C5=1 C6=4 C7=65 C8=3 C10=10 C11=1 C12=2 C14=11 C15=1 C16=5 C18=8 C19=1 C20=8 C21=1 C22=5 C24=8 C25=1 C26=5 C28=5 C30=8 C31=1 C40=1 C42=1 C63=1
> ./sheaf_free:0
> ./sheaf_prefill_slow:0
> ./sheaf_prefill_fast:0
> ./poison:0
> ./red_zone:0
> ./free_slab:0
> ./slabs:144 N1=67 N5=77
> ./barn_get:17003547 C1=958985 C3=974680 C5=1272016 C7=1223494 C8=2 C9=1005661 C11=944018 C12=2 C13=1132697 C15=1029259 C16=1 C17=934602 C19=1098834 C21=1177278 C23=1239167 C25=945395 C27=994448 C28=3 C29=1017141 C31=1055864
> ./alloc_slowpath:0
> ./destroy_by_rcu:1
> ./free_rcu_sheaf_fail:0
> ./barn_put:17003623 C0=958995 C2=974679 C4=1272023 C6=1223496 C8=1005661 C10=944030 C12=1132701 C14=1029267 C16=934598 C18=1098848 C20=1177293 C22=1239162 C24=945405 C26=994447 C28=1017138 C30=1055880
> ./usersize:0
> ./sanity_checks:0
> ./barn_put_fail:0
> ./align:64
> ./alloc_node_mismatch:0
> ./alloc_slab:144 C0=2 C1=8 C2=3 C3=2 C4=1 C5=5 C6=1 C7=3 C8=2 C9=4 C11=14 C12=2 C13=7 C15=11 C16=2 C17=3 C19=20 C20=1 C21=5 C23=1 C25=13 C26=4 C27=5 C29=1 C31=21 C40=1 C42=1 C63=1
> ./free_remove_partial:0
> ./aliases:0
> ./store_user:0
> ./trace:0
> ./reclaim_account:0
> ./order:2
> ./sheaf_refill:7560 C0=140 C1=448 C2=140 C3=168 C4=84 C5=588 C6=112 C7=392 C8=56 C9=196 C11=644 C12=84 C13=280 C15=532 C16=84 C17=112 C19=700 C20=140 C21=616 C23=168 C25=588 C26=140 C27=168 C28=28 C29=112 C31=756 C40=28 C42=28 C63=28
> ./object_size:256
> ./free_fastpath:476102026 C0=26851883 C2=27291053 C4=35616664 C6=34257923 C8=28158529 C9=1 C10=26432875 C11=2 C12=31715665 C14=28819520 C16=26168783 C18=30767788 C20=32964224 C21=2 C22=34696578 C24=26471388 C26=27844558 C27=2 C28=28479894 C30=29564692 C31=2
> ./hwcache_align:1
> ./cmpxchg_double_fail:0
> ./objs_per_slab:51
> ./partial:12 N1=5 N5=7
> ./slabs_cpu_partial:0(0)
> ./free_add_partial:143 C0=3 C1=8 C2=2 C3=4 C4=11 C5=16 C6=13 C7=9 C9=3 C11=8 C12=1 C13=3 C15=8 C16=1 C17=1 C19=5 C20=5 C21=17 C23=5 C25=8 C26=1 C27=1 C28=1 C29=3 C31=6
> ./slab_size:320
> ./cache_dma:0
>
>
> Thanks,
> Ming
>
Hi Vlastimil,
On 11/03/2026 17:22, Vlastimil Babka (SUSE) wrote:
> On 3/11/26 10:49, Ming Lei wrote:
>> On Wed, Mar 11, 2026 at 09:25:54AM +0100, Vlastimil Babka (SUSE) wrote:
>>> This is the draft patch from [1] turned into a proper series with
>>> incremental changes. It's based on v7.0-rc3. It's too intrusive for a
>>> 7.0 hotfix, so we'll only be able to fix/reduce the regression in 7.1. I
>>> hope it's acceptable given it's a non-standard configuration, 7.0 is not
>>> a LTS, and it's a perf regression, not functionality.
>>>
>>> Ming can you please retest this on top of v7.0-rc3, which already has
>>> fb1091febd66 ("mm/slab: allow sheaf refill if blocking is not
>>> allowed"). Separate data point for v7.0-rc3 could be also useful.
>>>
>>> [1] https://lore.kernel.org/all/c6a01f7e-c6eb-454b-9b9e-734526dd659d@kernel.org/
>>>
>>> Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
>>> ---
>>> Vlastimil Babka (SUSE) (3):
>>> slab: decouple pointer to barn from kmem_cache_node
>>> slab: create barns for online memoryless nodes
>>> slab: free remote objects to sheaves on memoryless nodes
>>
>> Hi Vlastimil and Guys,
>>
>> I re-run the test case used in https://lore.kernel.org/all/aZ0SbIqaIkwoW2mB@fedora/
>>
>> - v6.19-rc5: 34M
>>
>> - 815c8e35511d Merge branch 'slab/for-7.0/sheaves' into slab/for-next: 13M
>>
>> - v7.0-rc3: 13M
>
> Thanks, that's in line with your previous testing of "mm/slab: allow sheaf
> refill if blocking is not allowed" making no difference here. At least we
> just learned it helps other benchmarks :)
>
>> - v7.0-rc3 + the three patches: 24M
>
> OK. So now it might be really the total per-cpu caching capacity difference.
I have also observed a performance regression for Linux v7.0-rc in
some graphics-related tests we run. I bisected it to ...
# first bad commit: [e47c897a29491ade20b27612fdd3107c39a07357] slab: add
sheaves to most caches
I came across Ming's report and hence found this series. I have also
tested the 3 patches in this series and they did appear to help with one
test, but overall I am still seeing a ~25% performance regression (the
tests are taking about 25% longer to run). I am not the owner or author
of these specific tests and I have not dug in to see exactly what is
taking longer; I just know they are taking longer to run.
Anyway, I have not seen any recent updates on this, so I am not sure
what the current status is. If there are any more patches available I
will be happy to test.
Thanks!
Jon
--
nvpublic
On Wed, Apr 08, 2026 at 02:04:54PM +0100, Jon Hunter wrote:
> Hi Vlastimil,
Hi Jon,
> On 11/03/2026 17:22, Vlastimil Babka (SUSE) wrote:
> > On 3/11/26 10:49, Ming Lei wrote:
> > > On Wed, Mar 11, 2026 at 09:25:54AM +0100, Vlastimil Babka (SUSE) wrote:
> > > > This is the draft patch from [1] turned into a proper series with
> > > > incremental changes. It's based on v7.0-rc3. It's too intrusive for a
> > > > 7.0 hotfix, so we'll only be able to fix/reduce the regression in 7.1. I
> > > > hope it's acceptable given it's a non-standard configuration, 7.0 is not
> > > > a LTS, and it's a perf regression, not functionality.
> > > >
> > > > Ming can you please retest this on top of v7.0-rc3, which already has
> > > > fb1091febd66 ("mm/slab: allow sheaf refill if blocking is not
> > > > allowed"). Separate data point for v7.0-rc3 could be also useful.
> > > >
> > > > [1] https://lore.kernel.org/all/c6a01f7e-c6eb-454b-9b9e-734526dd659d@kernel.org/
> > > >
> > > > Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> > > > ---
> > > > Vlastimil Babka (SUSE) (3):
> > > > slab: decouple pointer to barn from kmem_cache_node
> > > > slab: create barns for online memoryless nodes
> > > > slab: free remote objects to sheaves on memoryless nodes
> > >
> > > Hi Vlastimil and Guys,
> > >
> > > I re-run the test case used in https://lore.kernel.org/all/aZ0SbIqaIkwoW2mB@fedora/
> > >
> > > - v6.19-rc5: 34M
> > >
> > > - 815c8e35511d Merge branch 'slab/for-7.0/sheaves' into slab/for-next: 13M
> > >
> > > - v7.0-rc3: 13M
> >
> > Thanks, that's in line with your previous testing of "mm/slab: allow sheaf
> > refill if blocking is not allowed" making no difference here. At least we
> > just learned it helps other benchmarks :)
> >
> > > - v7.0-rc3 + the three patches: 24M
> >
> > OK. So now it might be really the total per-cpu caching capacity difference.
>
> I have also observed a performance regresssion for Linux v7.0-rc for some
> graphics related tests we run. I bisected to ...
>
> # first bad commit: [e47c897a29491ade20b27612fdd3107c39a07357] slab: add
> sheaves to most caches
>
> I came across Ming's report and hence, found this series. I have also tested
> the 3 patches in this series and it did appear to help with one test, but
> overall I am still seeing a ~25% performance regression (the tests are
> taking about 25% longer to run). I am not the owner or author of these
> specific tests and I have not dived into see exactly what is taking longer,
> but I just know they are taking longer to run.
>
> Anyway, I have not seen any recent updates on this, and so I am not sure if
> there are any other updates or what the current status of this is?
As far as I remember, we haven't managed to fully recover the
performance yet. Interestingly, even when most allocations go through
the fastpath, it didn't fully recover [1].
[1] https://lore.kernel.org/all/abI9DKxuwl_4Gasj@hyeyoo
I suspect it's probably because of:
- false sharing on something (sheaves, obj metadata, etc.), or
- suboptimal NUMA placement, or
- something outside slab involved
But I don't have enough data to back up any of these theories yet.
> If there are any more patches available I will be happy to test.
Thanks!
Before diving deeper, could you please share the NUMA topology from
`numactl -H` on your machine?
It's probably a NUMA machine? (and hopefully one without memoryless nodes!)
--
Cheers,
Harry / Hyeonggon
On Wed, Apr 08, 2026 at 02:04:54PM +0100, Jon Hunter wrote:
> Hi Vlastimil,
>
> On 11/03/2026 17:22, Vlastimil Babka (SUSE) wrote:
> > On 3/11/26 10:49, Ming Lei wrote:
> > > On Wed, Mar 11, 2026 at 09:25:54AM +0100, Vlastimil Babka (SUSE) wrote:
> > > > This is the draft patch from [1] turned into a proper series with
> > > > incremental changes. It's based on v7.0-rc3. It's too intrusive for a
> > > > 7.0 hotfix, so we'll only be able to fix/reduce the regression in 7.1. I
> > > > hope it's acceptable given it's a non-standard configuration, 7.0 is not
> > > > a LTS, and it's a perf regression, not functionality.
> > > >
> > > > Ming can you please retest this on top of v7.0-rc3, which already has
> > > > fb1091febd66 ("mm/slab: allow sheaf refill if blocking is not
> > > > allowed"). Separate data point for v7.0-rc3 could be also useful.
> > > >
> > > > [1] https://lore.kernel.org/all/c6a01f7e-c6eb-454b-9b9e-734526dd659d@kernel.org/
> > > >
> > > > Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> > > > ---
> > > > Vlastimil Babka (SUSE) (3):
> > > > slab: decouple pointer to barn from kmem_cache_node
> > > > slab: create barns for online memoryless nodes
> > > > slab: free remote objects to sheaves on memoryless nodes
> > >
> > > Hi Vlastimil and Guys,
> > >
> > > I re-run the test case used in https://lore.kernel.org/all/aZ0SbIqaIkwoW2mB@fedora/
> > >
> > > - v6.19-rc5: 34M
> > >
> > > - 815c8e35511d Merge branch 'slab/for-7.0/sheaves' into slab/for-next: 13M
> > >
> > > - v7.0-rc3: 13M
> >
> > Thanks, that's in line with your previous testing of "mm/slab: allow sheaf
> > refill if blocking is not allowed" making no difference here. At least we
> > just learned it helps other benchmarks :)
> >
> > > - v7.0-rc3 + the three patches: 24M
> >
> > OK. So now it might be really the total per-cpu caching capacity difference.
>
>
> I have also observed a performance regresssion for Linux v7.0-rc for some
> graphics related tests we run. I bisected to ...
>
> # first bad commit: [e47c897a29491ade20b27612fdd3107c39a07357] slab: add
> sheaves to most caches
Hi, Jon
Thanks for the report.
This first bad commit is surprising; in theory, this commit shouldn't
hurt performance.
Could you manually check out the commits and verify this bad commit
again, without using git bisect?
>
> I came across Ming's report and hence, found this series. I have also tested
> the 3 patches in this series and it did appear to help with one test, but
> overall I am still seeing a ~25% performance regression (the tests are
> taking about 25% longer to run). I am not the owner or author of these
> specific tests and I have not dived into see exactly what is taking longer,
> but I just know they are taking longer to run.
>
> Anyway, I have not seen any recent updates on this, and so I am not sure if
> there are any other updates or what the current status of this is?
>
> If there are any more patches available I will be happy to test.
>
> Thanks!
> Jon
>
> --
> nvpublic
--
Thanks,
Hao