After sending out v7 of Proxy Execution, I got feedback that the patch
series was getting a bit unwieldy to review, and Qais suggested I break
out just the cleanup/preparatory components of the patch series and
submit them on their own, in the hope that we can start to merge the
less complex bits and discussion can focus on the more complicated
portions afterwards. So for v8 of this series, I only submitted those
earlier cleanup/preparatory changes:
https://lore.kernel.org/lkml/20240210002328.4126422-1-jstultz@google.com/

After sending this out a few weeks back, I've not heard much, so I
wanted to resend it. (I did correct one detail here: I had accidentally
lost the author credit on one of the patches, and I've fixed that in
this submission.)

As before, if you are interested, the full v8 series can be found here:
https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v8-6.8-rc3
https://github.com/johnstultz-work/linux-dev.git proxy-exec-v8-6.8-rc3

However, I've been focusing pretty intensely on the series to shake out
some issues with the more complicated later patches (not in what I'm
submitting here), and have resolved a number of problems uncovered in
wider testing (along with lots of review feedback from Metin), so v9
and all of its improvements will hopefully be ready to send out soon.

If you want a preview, my current WIP tree (careful, as I rebase it
frequently) is here:
https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-WIP
https://github.com/johnstultz-work/linux-dev.git proxy-exec-WIP

Review and feedback would be greatly appreciated!

Thanks so much!
-john

Cc: Joel Fernandes <joelaf@google.com>
Cc: Qais Yousef <qyousef@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Zimuzo Ezeozue <zezeozue@google.com>
Cc: Youssef Esmat <youssefesmat@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-team@android.com

Connor O'Brien (2):
  sched: Add do_push_task helper
  sched: Consolidate pick_*_task to task_is_pushable helper

John Stultz (1):
  sched: Split out __schedule() deactivate task logic into a helper

Juri Lelli (2):
  locking/mutex: Make mutex::wait_lock irq safe
  locking/mutex: Expose __mutex_owner()

Peter Zijlstra (2):
  locking/mutex: Remove wakeups from under mutex::wait_lock
  sched: Split scheduler and execution contexts

 kernel/locking/mutex.c       |  60 +++++++----------
 kernel/locking/mutex.h       |  25 +++++++
 kernel/locking/rtmutex.c     |  26 +++++---
 kernel/locking/rwbase_rt.c   |   4 +-
 kernel/locking/rwsem.c       |   4 +-
 kernel/locking/spinlock_rt.c |   3 +-
 kernel/locking/ww_mutex.h    |  49 ++++++++------
 kernel/sched/core.c          | 122 +++++++++++++++++++++--------------
 kernel/sched/deadline.c      |  53 ++++++---------
 kernel/sched/fair.c          |  18 +++---
 kernel/sched/rt.c            |  59 +++++++----------
 kernel/sched/sched.h         |  44 ++++++++++++-
 12 files changed, 268 insertions(+), 199 deletions(-)

--
2.44.0.rc0.258.g7320e95886-goog
Hello John,
Happy to report that, as expected, I did not see any regressions with
the series. Full results below.
On 2/24/2024 5:41 AM, John Stultz wrote:
> After sending out v7 of Proxy Execution, I got feedback that the
> patch series was getting a bit unwieldy to review, and Qais
> suggested I break out just the cleanups/preparatory components
> of the patch series and submit them on their own in the hope we
> can start to merge the less complex bits and discussion can focus
> on the more complicated portions afterwards.
>
> So for the v8 of this series, I only submitted those earlier
> cleanup/preparatory changes:
> https://lore.kernel.org/lkml/20240210002328.4126422-1-jstultz@google.com/
>
> After sending this out a few weeks back, I’ve not heard much, so
> I wanted to resend this again.
>
> (I did correct one detail here, which was that I had accidentally
> lost the author credit to one of the patches, and I’ve fixed that
> in this submission).
>
> As before, if you are interested, the full v8 series can be
> found here:
> https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v8-6.8-rc3
> https://github.com/johnstultz-work/linux-dev.git proxy-exec-v8-6.8-rc3
>
> However, I’ve been focusing pretty intensely on the series to
> shake out some issues with the more complicated later patches in
> the series (not in what I’m submitting here), and have resolved
> a number of problems I uncovered in doing wider testing (along
> with lots of review feedback from Metin), so v9 and all of its
> improvements will hopefully be ready to send out soon.
>
> If you want a preview, my current WIP tree (careful, as I rebase
> it frequently) is here:
> https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-WIP
> https://github.com/johnstultz-work/linux-dev.git proxy-exec-WIP
>
> Review and feedback would be greatly appreciated!
o System Details
- 3rd Generation EPYC System
- 2 x 64C/128T
- NPS1 mode
o Kernels
tip: tip:sched/core at commit 8cec3dd9e593 ("sched/core:
Simplify code by removing duplicate #ifdefs")
proxy-setup: tip + this series
o Results
==================================================================
Test : hackbench
Units : Normalized time in seconds
Interpretation: Lower is better
Statistic : AMean
==================================================================
Case: tip[pct imp](CV) proxy-setup[pct imp](CV)
1-groups 1.00 [ -0.00]( 2.08) 1.01 [ -0.53]( 2.45)
2-groups 1.00 [ -0.00]( 0.89) 1.03 [ -3.32]( 1.48)
4-groups 1.00 [ -0.00]( 0.81) 1.02 [ -2.26]( 1.22)
8-groups 1.00 [ -0.00]( 0.78) 1.00 [ -0.29]( 0.97)
16-groups 1.00 [ -0.00]( 1.60) 1.00 [ -0.27]( 1.86)
==================================================================
Test : tbench
Units : Normalized throughput
Interpretation: Higher is better
Statistic : AMean
==================================================================
Clients: tip[pct imp](CV) proxy-setup[pct imp](CV)
1 1.00 [ 0.00]( 0.71) 1.00 [ 0.31]( 0.37)
2 1.00 [ 0.00]( 0.25) 0.99 [ -0.56]( 0.31)
4 1.00 [ 0.00]( 0.85) 0.98 [ -2.35]( 0.69)
8 1.00 [ 0.00]( 1.00) 0.99 [ -0.99]( 0.12)
16 1.00 [ 0.00]( 1.25) 0.99 [ -0.78]( 1.35)
32 1.00 [ 0.00]( 0.35) 1.00 [ 0.12]( 2.23)
64 1.00 [ 0.00]( 0.71) 0.99 [ -0.97]( 0.55)
128 1.00 [ 0.00]( 0.46) 0.96 [ -4.38]( 0.47)
256 1.00 [ 0.00]( 0.24) 0.99 [ -1.32]( 0.95)
512 1.00 [ 0.00]( 0.30) 0.98 [ -1.52]( 0.10)
1024 1.00 [ 0.00]( 0.40) 0.98 [ -1.59]( 0.23)
==================================================================
Test : stream-10
Units : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic : HMean
==================================================================
Test: tip[pct imp](CV) proxy-setup[pct imp](CV)
Copy 1.00 [ 0.00]( 9.73) 1.04 [ 4.18]( 3.12)
Scale 1.00 [ 0.00]( 5.57) 0.99 [ -1.35]( 5.74)
Add 1.00 [ 0.00]( 5.43) 0.99 [ -1.29]( 5.93)
Triad 1.00 [ 0.00]( 5.50) 0.97 [ -3.47]( 7.81)
==================================================================
Test : stream-100
Units : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic : HMean
==================================================================
Test: tip[pct imp](CV) proxy-setup[pct imp](CV)
Copy 1.00 [ 0.00]( 3.26) 1.01 [ 0.83]( 2.69)
Scale 1.00 [ 0.00]( 1.26) 1.00 [ -0.32]( 4.52)
Add 1.00 [ 0.00]( 1.47) 1.01 [ 0.63]( 0.96)
Triad 1.00 [ 0.00]( 1.77) 1.02 [ 1.81]( 1.00)
==================================================================
Test : netperf
Units : Normalized Throughput
Interpretation: Higher is better
Statistic : AMean
==================================================================
Clients: tip[pct imp](CV) proxy-setup[pct imp](CV)
1-clients 1.00 [ 0.00]( 0.22) 0.99 [ -0.53]( 0.26)
2-clients 1.00 [ 0.00]( 0.57) 1.00 [ -0.44]( 0.41)
4-clients 1.00 [ 0.00]( 0.43) 1.00 [ -0.48]( 0.39)
8-clients 1.00 [ 0.00]( 0.27) 1.00 [ -0.31]( 0.42)
16-clients 1.00 [ 0.00]( 0.46) 1.00 [ -0.11]( 0.42)
32-clients 1.00 [ 0.00]( 0.95) 1.00 [ -0.41]( 0.56)
64-clients 1.00 [ 0.00]( 1.79) 1.00 [ -0.15]( 1.65)
128-clients 1.00 [ 0.00]( 0.89) 1.00 [ -0.43]( 0.80)
256-clients 1.00 [ 0.00]( 3.88) 1.00 [ -0.37]( 4.74)
512-clients 1.00 [ 0.00](35.06) 1.01 [ 1.05](50.84)
==================================================================
Test : schbench
Units : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic : Median
==================================================================
#workers: tip[pct imp](CV) proxy-setup[pct imp](CV)
1 1.00 [ -0.00](27.28) 1.31 [-31.25]( 2.38)
2 1.00 [ -0.00]( 3.85) 1.00 [ -0.00]( 8.85)
4 1.00 [ -0.00](14.00) 1.11 [-10.53](11.18)
8 1.00 [ -0.00]( 4.68) 1.08 [ -8.33]( 9.93)
16 1.00 [ -0.00]( 4.08) 0.92 [ 8.06]( 3.70)
32 1.00 [ -0.00]( 6.68) 0.95 [ 5.10]( 2.22)
64 1.00 [ -0.00]( 1.79) 0.99 [ 1.02]( 3.18)
128 1.00 [ -0.00]( 6.30) 1.02 [ -2.48]( 7.37)
256 1.00 [ -0.00](43.39) 1.00 [ -0.00](37.06)
512 1.00 [ -0.00]( 2.26) 0.98 [ 1.88]( 6.96)
Note: schbench is known to have high run-to-run variance for
16 workers and below.
==================================================================
Test : Unixbench
Units : Normalized scores
Interpretation: Higher is better
Statistic : Various (Mentioned)
==================================================================
Metric Variant tip proxy-setup
Hmean unixbench-dhry2reg-1 0.00% -0.60%
Hmean unixbench-dhry2reg-512 0.00% -0.01%
Amean unixbench-syscall-1 0.00% -0.41%
Amean unixbench-syscall-512 0.00% 0.13%
Hmean unixbench-pipe-1 0.00% 1.02%
Hmean unixbench-pipe-512 0.00% 0.53%
Hmean unixbench-spawn-1 0.00% -2.68%
Hmean unixbench-spawn-512 0.00% 3.24%
Hmean unixbench-execl-1 0.00% 0.61%
Hmean unixbench-execl-512 0.00% 1.97%
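For reference, the normalized value, [pct imp], and (CV) columns in the
tables above can be recomputed from raw per-run samples roughly as
follows. This is only a sketch (the helper name `summarize` is mine,
not part of the mmtests tooling), and the actual post-processing may
use AMean, HMean, or Median as noted in each table's Statistic line:

```python
import statistics

def summarize(baseline_runs, test_runs, lower_is_better=True):
    """Sketch of how 'norm [pct imp](CV)' columns can be derived."""
    base_mean = statistics.mean(baseline_runs)
    test_mean = statistics.mean(test_runs)
    # Value normalized against the baseline mean (baseline reads 1.00).
    norm = test_mean / base_mean
    # Percent improvement; the sign flips with the metric's direction.
    if lower_is_better:
        pct_imp = (base_mean - test_mean) / base_mean * 100.0
    else:
        pct_imp = (test_mean - base_mean) / base_mean * 100.0
    # Coefficient of variation of the test samples, as a percentage.
    cv = statistics.stdev(test_runs) / test_mean * 100.0
    return norm, pct_imp, cv
```

For example, a lower-is-better metric whose mean slows from 10s on the
baseline to 10.5s on the test kernel would report 1.05 [ -5.00].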
--
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
>
> Thanks so much!
> -john
>
> [..snip..]
>
--
Thanks and Regards,
Prateek
On Tue, Feb 27, 2024 at 8:43 PM 'K Prateek Nayak' via kernel-team
<kernel-team@android.com> wrote:
> Happy to report that I did not see any regressions with the series
> as expected. Full results below.
>
[snip]
> o System Details
>
> - 3rd Generation EPYC System
> - 2 x 64C/128T
> - NPS1 mode
>
> o Kernels
>
> tip: tip:sched/core at commit 8cec3dd9e593 ("sched/core:
> Simplify code by removing duplicate #ifdefs")
>
> proxy-setup: tip + this series
>
Hey! Thank you so much for taking the time to run these through the
testing! I *really* appreciate it!
Just to clarify: by "this series" did you test just the 7 preparatory
patches submitted to the list here, or did you pull the full
proxy-exec-v8-6.8-rc3 set from git?
(Either is great! I just wanted to make sure it's clear which were covered.)
[snip]
> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Thanks so much again!
-john
Hello John,
On 2/28/2024 10:21 AM, John Stultz wrote:
> On Tue, Feb 27, 2024 at 8:43 PM 'K Prateek Nayak' via kernel-team
> <kernel-team@android.com> wrote:
>> Happy to report that I did not see any regressions with the series
>> as expected. Full results below.
>>
> [snip]
>> o System Details
>>
>> - 3rd Generation EPYC System
>> - 2 x 64C/128T
>> - NPS1 mode
>>
>> o Kernels
>>
>> tip: tip:sched/core at commit 8cec3dd9e593 ("sched/core:
>> Simplify code by removing duplicate #ifdefs")
>>
>> proxy-setup: tip + this series
>>
>
> Hey! Thank you so much for taking the time to run these through the
> testing! I *really* appreciate it!
>
> Just to clarify: by "this series" did you test just the 7 preparatory
> patches submitted to the list here, or did you pull the full
> proxy-exec-v8-6.8-rc3 set from git?
Just these preparatory patches for now. On my way to queue a run for the
whole set from your tree. I'll use the "proxy-exec-v8-6.8-rc3" branch and
pick the commits past the
"[ANNOTATION] === Proxy Exec patches past this point ===" till the commit
ff90fb583a81 ("FIX: Avoid using possibly uninitialized cpu value with
activate_blocked_entities()") on top of the tip:sched/core mentioned
above since it'll allow me to reuse the baseline numbers :)
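The cherry-pick described above amounts to something like the
following. This is only a sketch: <annotation-sha> is a hypothetical
placeholder for the SHA of the "[ANNOTATION]" commit, which is not
spelled out in this thread.

```sh
# Fetch the v8 branch (URL from earlier in the thread)
git fetch https://github.com/johnstultz-work/linux-dev.git proxy-exec-v8-6.8-rc3
# Start from the tip:sched/core baseline mentioned above
git checkout -b proxy-exec-full 8cec3dd9e593
# Apply everything after the annotation commit up to the fix commit
git cherry-pick <annotation-sha>..ff90fb583a81
```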
> (Either is great! I just wanted to make sure its clear which were covered)
>
> [snip]
>> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
>
> Thanks so much again!
> -john
--
Thanks and Regards,
Prateek
On Tue, Feb 27, 2024 at 9:12 PM K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> On 2/28/2024 10:21 AM, John Stultz wrote:
> > Just to clarify: by "this series" did you test just the 7 preparatory
> > patches submitted to the list here, or did you pull the full
> > proxy-exec-v8-6.8-rc3 set from git?
>
> Just these preparatory patches for now. On my way to queue a run for the
> whole set from your tree. I'll use the "proxy-exec-v8-6.8-rc3" branch and
> pick the commits past the
> "[ANNOTATION] === Proxy Exec patches past this point ===" till the commit
> ff90fb583a81 ("FIX: Avoid using possibly uninitialized cpu value with
> activate_blocked_entities()") on top of the tip:sched/core mentioned
> above since it'll allow me to reuse the baseline numbers :)
>
Ah, thank you for the clarification!
Also, I really appreciate your testing with the rest of the series as
well. It will be good to have any potential problems identified early
(I'm trying to get v9 ready as soon as I can here, as it's fixed a
number of smaller issues - However, I've also managed to uncover some
new problems in stress testing, so we'll see how quickly I can chase
those down).
thanks
-john
Hello John,
On 2/28/2024 10:54 AM, John Stultz wrote:
> On Tue, Feb 27, 2024 at 9:12 PM K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>> On 2/28/2024 10:21 AM, John Stultz wrote:
>>> Just to clarify: by "this series" did you test just the 7 preparatory
>>> patches submitted to the list here, or did you pull the full
>>> proxy-exec-v8-6.8-rc3 set from git?
>>
>> Just these preparatory patches for now. On my way to queue a run for the
>> whole set from your tree. I'll use the "proxy-exec-v8-6.8-rc3" branch and
>> pick the commits past the
>> "[ANNOTATION] === Proxy Exec patches past this point ===" till the commit
>> ff90fb583a81 ("FIX: Avoid using possibly uninitialized cpu value with
>> activate_blocked_entities()") on top of the tip:sched/core mentioned
>> above since it'll allow me to reuse the baseline numbers :)
>>
>
> Ah, thank you for the clarification!
>
> Also, I really appreciate your testing with the rest of the series as
> well. It will be good to have any potential problems identified early
I got a chance to test the whole of v8 patches on the same dual socket
3rd Generation EPYC system:
tl;dr
- There is a slight regression in hackbench, but instead of the 10x
blowup seen previously, it is only around 5%, with the overloaded case
not regressing at all.
- A small but consistent (~2-3%) regression is seen in tbench and
netperf.
- schbench is inconclusive due to run-to-run variance, and stream is
perf-neutral with proxy execution.
I've not yet looked into the regressions; I'll let you know if I spot
anything when digging deeper. Below are the full results:
o System Details
- 3rd Generation EPYC System
- 2 x 64C/128T
- NPS1 mode
o Kernels
tip: tip:sched/core at commit 8cec3dd9e593
("sched/core: Simplify code by removing
duplicate #ifdefs")
proxy-exec-full: tip + proxy execution commits from
"proxy-exec-v8-6.8-rc3" described previously in
this thread.
o Results
==================================================================
Test : hackbench
Units : Normalized time in seconds
Interpretation: Lower is better
Statistic : AMean
==================================================================
Case: tip[pct imp](CV) proxy-exec-full[pct imp](CV)
1-groups 1.00 [ -0.00]( 2.08) 1.00 [ -0.18]( 3.90)
2-groups 1.00 [ -0.00]( 0.89) 1.04 [ -4.43]( 0.78)
4-groups 1.00 [ -0.00]( 0.81) 1.05 [ -4.82]( 1.03)
8-groups 1.00 [ -0.00]( 0.78) 1.02 [ -1.90]( 1.00)
16-groups 1.00 [ -0.00]( 1.60) 1.01 [ -0.80]( 1.18)
==================================================================
Test : tbench
Units : Normalized throughput
Interpretation: Higher is better
Statistic : AMean
==================================================================
Clients: tip[pct imp](CV) proxy-exec-full[pct imp](CV)
1 1.00 [ 0.00]( 0.71) 0.97 [ -3.00]( 0.15)
2 1.00 [ 0.00]( 0.25) 0.97 [ -3.35]( 0.98)
4 1.00 [ 0.00]( 0.85) 0.97 [ -3.26]( 1.40)
8 1.00 [ 0.00]( 1.00) 0.97 [ -2.75]( 0.46)
16 1.00 [ 0.00]( 1.25) 0.99 [ -1.27]( 0.11)
32 1.00 [ 0.00]( 0.35) 0.98 [ -2.42]( 0.06)
64 1.00 [ 0.00]( 0.71) 0.97 [ -2.76]( 1.81)
128 1.00 [ 0.00]( 0.46) 0.97 [ -2.67]( 0.88)
256 1.00 [ 0.00]( 0.24) 0.98 [ -1.97]( 0.98)
512 1.00 [ 0.00]( 0.30) 0.98 [ -2.41]( 0.38)
1024 1.00 [ 0.00]( 0.40) 0.98 [ -2.21]( 0.11)
==================================================================
Test : stream-10
Units : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic : HMean
==================================================================
Test: tip[pct imp](CV) proxy-exec-full[pct imp](CV)
Copy 1.00 [ 0.00]( 9.73) 1.00 [ 0.26]( 6.36)
Scale 1.00 [ 0.00]( 5.57) 1.02 [ 1.59]( 2.98)
Add 1.00 [ 0.00]( 5.43) 1.00 [ 0.48]( 2.77)
Triad 1.00 [ 0.00]( 5.50) 0.98 [ -2.18]( 6.06)
==================================================================
Test : stream-100
Units : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic : HMean
==================================================================
Test: tip[pct imp](CV) proxy-exec-full[pct imp](CV)
Copy 1.00 [ 0.00]( 3.26) 0.98 [ -1.96]( 3.24)
Scale 1.00 [ 0.00]( 1.26) 0.96 [ -3.61]( 6.41)
Add 1.00 [ 0.00]( 1.47) 0.98 [ -1.84]( 4.14)
Triad 1.00 [ 0.00]( 1.77) 1.00 [ 0.27]( 2.60)
==================================================================
Test : netperf
Units : Normalized Throughput
Interpretation: Higher is better
Statistic : AMean
==================================================================
Clients: tip[pct imp](CV) proxy-exec-full[pct imp](CV)
1-clients 1.00 [ 0.00]( 0.22) 0.97 [ -3.01]( 0.40)
2-clients 1.00 [ 0.00]( 0.57) 0.97 [ -3.25]( 0.45)
4-clients 1.00 [ 0.00]( 0.43) 0.97 [ -3.26]( 0.59)
8-clients 1.00 [ 0.00]( 0.27) 0.97 [ -2.83]( 0.55)
16-clients 1.00 [ 0.00]( 0.46) 0.97 [ -2.99]( 0.65)
32-clients 1.00 [ 0.00]( 0.95) 0.97 [ -2.98]( 0.71)
64-clients 1.00 [ 0.00]( 1.79) 0.97 [ -2.61]( 1.38)
128-clients 1.00 [ 0.00]( 0.89) 0.97 [ -2.72]( 0.94)
256-clients 1.00 [ 0.00]( 3.88) 0.98 [ -1.89]( 2.92)
512-clients 1.00 [ 0.00](35.06) 0.99 [ -0.78](47.83)
==================================================================
Test : schbench
Units : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic : Median
==================================================================
#workers: tip[pct imp](CV) proxy-exec-full[pct imp](CV)
1 1.00 [ -0.00](27.28) 1.31 [-31.25]( 6.45)
2 1.00 [ -0.00]( 3.85) 0.95 [ 5.00](10.02)
4 1.00 [ -0.00](14.00) 1.11 [-10.53]( 1.36)
8 1.00 [ -0.00]( 4.68) 1.15 [-14.58](14.55)
16 1.00 [ -0.00]( 4.08) 0.98 [ 1.61]( 3.28)
32 1.00 [ -0.00]( 6.68) 1.02 [ -2.04]( 1.71)
64 1.00 [ -0.00]( 1.79) 1.12 [-11.73]( 7.08)
128 1.00 [ -0.00]( 6.30) 1.11 [-10.84]( 5.52)
256 1.00 [ -0.00](43.39) 1.37 [-37.14](20.11)
512 1.00 [ -0.00]( 2.26) 0.99 [ 1.17]( 1.43)
==================================================================
Test : Unixbench
Units : Normalized scores
Interpretation: Higher is better
Statistic : Various (Mentioned)
==================================================================
Metric Variant tip proxy-exec-full
Hmean unixbench-dhry2reg-1 0.00% -0.67%
Hmean unixbench-dhry2reg-512 0.00% 0.14%
Amean unixbench-syscall-1 0.00% -0.86%
Amean unixbench-syscall-512 0.00% -6.42%
Hmean unixbench-pipe-1 0.00% 0.79%
Hmean unixbench-pipe-512 0.00% 0.57%
Hmean unixbench-spawn-1 0.00% -3.91%
Hmean unixbench-spawn-512 0.00% 3.17%
Hmean unixbench-execl-1 0.00% -1.18%
Hmean unixbench-execl-512 0.00% 1.26%
--
> (I'm trying to get v9 ready as soon as I can here, as it's fixed a
> number of smaller issues - However, I've also managed to uncover some
> new problems in stress testing, so we'll see how quickly I can chase
> those down).
I haven't seen any splats when running the above tests. I'll test some
larger workloads next. Please let me know if you would like me to test
any specific workload or need additional data from these tests :)
>
> thanks
> -john
--
Thanks and Regards,
Prateek
On Wed, Feb 28, 2024 at 9:37 AM 'K Prateek Nayak' via kernel-team
<kernel-team@android.com> wrote:
> I got a chance to test the whole of v8 patches on the same dual socket
> 3rd Generation EPYC system:
>
> tl;dr
>
> - There is a slight regression in hackbench but instead of the 10x
>   blowup seen previously, it is only around 5% with overloaded case
>   not regressing at all.
>
> - A small but consistent (~2-3%) regression is seen in tbench and
>   netperf.

Once again, thank you so much for your testing and reporting of the
data! I really appreciate it!

Do you mind sharing exactly how you're running the benchmarks? (I'd
like to try to reproduce these locally, though my machine is much
smaller.)

I'm guessing the hackbench one is the same command you shared earlier
with v6?

thanks
-john
Hello John,
On 2/29/2024 11:49 AM, John Stultz wrote:
> On Wed, Feb 28, 2024 at 9:37 AM 'K Prateek Nayak' via kernel-team
> <kernel-team@android.com> wrote:
>> I got a chance to test the whole of v8 patches on the same dual socket
>> 3rd Generation EPYC system:
>>
>> tl;dr
>>
>> - There is a slight regression in hackbench but instead of the 10x
>> blowup seen previously, it is only around 5% with overloaded case
>> not regressing at all.
>>
>> - A small but consistent (~2-3%) regression is seen in tbench and
>> netperf.
>
> Once again, thank you so much for your testing and reporting of the
> data! I really appreciate it!
>
> Do you mind sharing exactly how you're running the benchmarks? (I'd
> like to try to reproduce these locally, though my machine is much
> smaller.)
>
> I'm guessing the hackbench one is the same command you shared earlier with v6?
Yup, it is the same as earlier. I'll list all the commands down below:
o Hackbench
perf bench sched messaging -p -t -l 100000 -g <# of groups>
o Old schbench
git://git.kernel.org/pub/scm/linux/kernel/git/mason/schbench.git
at commit e4aa540 ("Make sure rps isn't zero in auto_rps mode.")
schbench -m 2 -t <# workers> -r 30
(I should probably upgrade this to the latest! Let me get on it)
o tbench (https://www.samba.org/ftp/tridge/dbench/dbench-4.0.tar.gz)
nohup tbench_srv 0 &
tbench -c client.txt -t 60 <# clients> 127.0.0.1
o Stream (https://www.cs.virginia.edu/stream/FTP/Code/)
export ARRAY_SIZE=128000000; # 4 * Local L3 size
gcc -DSTREAM_ARRAY_SIZE=$ARRAY_SIZE -DNTIMES=<Loops internally> -fopenmp -O2 stream.c -o stream
export OMP_NUM_THREADS=16; # Number of CCX on my machine
./stream;
o netperf
netserver -L 127.0.0.1
for i in `seq 0 1 <num clients>`;
do
netperf -H 127.0.0.1 -t TCP_RR -l 100 -- -r 100 -k REQUEST_SIZE,RESPONSE_SIZE,ELAPSED_TIME,THROUGHPUT,THROUGHPUT_UNITS,MIN_LATENCY,MEAN_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MAX_LATENCY,STDDEV_LATENCY&
done
wait;
o Unixbench (from mmtest)
./run-mmtests.sh --no-monitor --config configs/config-workload-unixbench
--
If you have any other questions, please do let me know :)
>
> thanks
> -john
--
Thanks and Regards,
Prateek
On Wed, Feb 28, 2024 at 10:44 PM 'K Prateek Nayak' via kernel-team
<kernel-team@android.com> wrote:
> On 2/29/2024 11:49 AM, John Stultz wrote:
> > Do you mind sharing exactly how you're running the benchmarks? (I'd
> > like to try to reproduce these locally, though my machine is much
> > smaller.)
> >
> > I'm guessing the hackbench one is the same command you shared
> > earlier with v6?
>
> Yup, it is the same as earlier. I'll list all the commands down below:

Great! I'll try to take a swing at reproducing these locally before I
send out v9.

[snip]

> If you have any other questions, please do let me know :)

Thank you so much for the details! Your efforts here are very
appreciated!
-john