Hey All,

I didn't get any feedback on the last iteration, so I wanted to resend
this next chunk of the series: Donor Migration

The main change from v20 is that I previously had logic where the
ww_mutex paths took the blocked_lock of the task they were waking
(either the lock waiter->task or the owner), from a context in
__mutex_lock_common() where we already held current->blocked_lock.
This required using the spin_lock_nested() annotation to keep lockdep
happy, and I was leaning on the logic that there is an implied order
between the running current task and the existing not-running lock
waiters, which should avoid loops. In the wound case, there is also an
order used if the owner's context is younger, which sounded likely to
avoid loops. However, after thinking more about the case where we are
wounding a lock owner: since that owner is not waiting and could be
trying to acquire a mutex current owns, I couldn't quite convince
myself we couldn't get into an ABBA-style deadlock with the nested
blocked_lock accesses (though I've not been able to contrive it to
happen, that doesn't prove anything).

So the main difference in v21 is a reworking of how we hold the
blocked_lock in the __mutex_lock_common() code, reducing it so we
don't call into the ww_mutex paths while holding it. The
lock->wait_lock still serializes things at the top level, but the
blocked_lock is no longer held completely in parallel under it, and is
instead focused on its purpose: protecting blocked_on,
blocked_on_state and similar proxy-related values in the task struct.

I also did some cleanups to be more consistent in how the
blocked_on_state is handled. I had a few spots previously where I was
cheating and just set the value instead of going through the helpers.
And sure enough, in fixing those I realized there were a few spots
where I wasn't always holding the right blocked_lock, so some minor
rework helped clean that up. (A rough sketch of the per-task state
involved is appended after the diffstat below.)

I'm trying to submit this larger work in smallish digestible pieces,
so in this portion of the series I'm only submitting for review and
consideration the logic that allows us to do donor (blocked-waiter)
migration, allowing us to proxy-execute lock owners that might be on
other cpus' runqueues. This requires some additional changes to
locking and extra state tracking to ensure we don't accidentally run a
migrated donor on a cpu it isn't affined to, as well as some extra
handling to deal with balance callback state that needs to be reset
when we decide to pick a different task after doing donor migration.

I'd love to get some feedback on any place where these patches are
confusing, or could use additional clarification.

Also you can find the full proxy-exec series here:
  https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v21-6.17-rc4/
  https://github.com/johnstultz-work/linux-dev.git proxy-exec-v21-6.17-rc4

Issues still to address with the full series:
* Need to sort out what is needed for sched_ext to be ok with
  proxy-execution enabled. This is my next priority.
* K Prateek Nayak did some testing a bit over a year ago with an
  earlier version of the full series and saw ~3-5% regressions in some
  cases. Need to re-evaluate this now that the proxy-migration
  avoidance optimization Suleiman suggested has been implemented.
* The chain migration functionality needs further iterations and
  better validation to ensure it truly maintains the RT/DL load
  balancing invariants (despite this currently being broken in vanilla
  upstream with RT_PUSH_IPI).

Future work:
* Expand to other locking primitives: Suleiman is looking at
  rw_semaphores, as that is another common source of priority
  inversion. Figuring out pi-futexes would be good too.
* Eventually: work to replace rt_mutexes and get things happy with
  PREEMPT_RT.

I'd really appreciate any feedback or review thoughts on the full
series as well. I'm trying to keep the chunks small, reviewable and
iteratively testable, but if you have any suggestions on how to
improve the series, I'm all ears.

Credit/Disclaimer:
--------------------
As always, this Proxy Execution series has a long history with lots of
developers that deserve credit: first described in a paper[1] by
Watkins, Straub and Niehaus, then prototyped in patches from Peter
Zijlstra, and extended with lots of work by Juri Lelli, Valentin
Schneider, and Connor O'Brien. (And thank you to Steven Rostedt for
providing additional details here!)

So again, many thanks to those above, as all the credit for this
series really is due to them - while the mistakes are likely mine.

Thanks so much!
-john

[1] https://static.lwn.net/images/conf/rtlws11/papers/proc/p38.pdf

Cc: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Zimuzo Ezeozue <zezeozue@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: kuyo chang <kuyo.chang@mediatek.com>
Cc: hupu <hupu.gm@gmail.com>
Cc: kernel-team@android.com

John Stultz (5):
  locking: Add task::blocked_lock to serialize blocked_on state
  sched/locking: Add blocked_on_state to provide necessary tri-state
    for proxy return-migration
  sched: Add logic to zap balance callbacks if we pick again
  sched: Handle blocked-waiter migration (and return migration)
  sched: Migrate whole chain in proxy_migrate_task()

Peter Zijlstra (1):
  sched: Add blocked_donor link to task for smarter mutex handoffs

 include/linux/sched.h        | 120 ++++++++-----
 init/init_task.c             |   4 +
 kernel/fork.c                |   4 +
 kernel/locking/mutex-debug.c |   4 +-
 kernel/locking/mutex.c       |  83 +++++++--
 kernel/locking/ww_mutex.h    |  20 +--
 kernel/sched/core.c          | 329 +++++++++++++++++++++++++++++++++--
 kernel/sched/fair.c          |   3 +-
 kernel/sched/sched.h         |   2 +-
 9 files changed, 473 insertions(+), 96 deletions(-)

-- 
2.51.0.338.gd7d06c2dae-goog
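For reviewers new to the series, here is the rough sketch of the
per-task state this chunk revolves around. The names follow the patch
titles above, but the exact definitions live in the patches
themselves, so treat this as an approximation rather than the actual
code:

/*
 * Approximate shape of the proxy state added to task_struct by
 * patches 1, 2 and 6 above (details may differ from the series):
 */
enum blocked_on_state {
	BO_RUNNABLE,	/* not blocked on a mutex, free to run */
	BO_BLOCKED,	/* blocked on a mutex; eligible as a donor */
	BO_WAKING,	/* woken, but may need return-migration back to
			 * a cpu in its affinity mask before running */
};

struct task_struct {
	/* ... */
	raw_spinlock_t		blocked_lock;	/* protects the fields below */
	struct mutex		*blocked_on;	/* mutex we are waiting on */
	enum blocked_on_state	blocked_on_state;
	struct task_struct	*blocked_donor;	/* waiter that donated its
						 * context to us, for smarter
						 * handoff on unlock */
	/* ... */
};

The locking rule the v21 rework aims for is then: lock->wait_lock is
taken first and serializes the top level, and only a single task's
blocked_lock is taken underneath it at a time, never two nested, which
is what removes the ABBA worry described above.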
Hi John,

On 04/09/25 00:21, John Stultz wrote:
> Hey All,
>
> I didn't get any feedback on the last iteration, so I wanted to
> resend this next chunk of the series: Donor Migration

Not ignoring you, but I had to spend some time putting together some
testing infra I am now trying to use to see how DEADLINE behaves with
the series, as it's somewhat difficult for me to think in the abstract
about all this. :)

> Also you can find the full proxy-exec series here:
>   https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v21-6.17-rc4/
>   https://github.com/johnstultz-work/linux-dev.git proxy-exec-v21-6.17-rc4
>
> I'd really appreciate any feedback or review thoughts on the
> full series as well.

I currently have the following on top of your complete series

https://github.com/jlelli/linux/commits/experimental/eval-mbwi/
https://github.com/jlelli/linux experimental/eval-mbwi

of which

https://github.com/jlelli/linux/commit/9d4bbb1aca624e76e5b34938d848dc9a418c6146

introduces the testing infra (M-BWI is Multiprocessor Bandwidth
Inheritance), and the rest adds some additional tracepoints (based on
Gabriele's patch) to get more DEADLINE info out of tests (in
conjunction with sched_tp [1]).

Nothing big to report just yet; I mainly spent time getting this
working. One thing I noticed though (and probably forgot from previous
discussions) is that spin_on_owner might be 'confusing' from an
RT/DEADLINE perspective, as it deviates from what one expects from the
ideal theoretical world (tasks don't immediately block and potentially
donate). Not sure what to do about it. Maybe special-case it for
RT/DEADLINE, but I've just started playing with it.

Anyway, I will keep playing with all this. Just wanted to give
you/others a quick update. Also adding Luca, Tommaso and Yuri to the
thread so that they are aware of the testing framework. :)

Thanks!
Juri

[1] https://github.com/jlelli/sched_tp/tree/deadline-tp - I will open
    an MR against Qais' mainline repo as soon as the TPs are hopefully
    merged upstream as well.
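For readers who haven't looked at the mutex slowpath recently, the
behavior Juri refers to is the optimistic spinning the mutex code does
before blocking. Very roughly, and ignoring the OSQ and cancellation
details, the shape is something like the sketch below; the helper
names here are illustrative stand-ins, not the actual kernel functions
(see mutex_optimistic_spin() in kernel/locking/mutex.c for the real
logic):

/* Illustrative sketch only, not the actual kernel code. */
static bool optimistic_spin_sketch(struct mutex *lock)
{
	for (;;) {
		struct task_struct *owner = mutex_owner_task(lock); /* hypothetical */

		if (!owner)
			return mutex_try_acquire(lock);	/* hypothetical */
		if (!task_is_on_cpu(owner))		/* hypothetical */
			return false;	/* owner not running: give up, block */
		cpu_relax();		/* owner running: busy-wait instead of
					 * blocking, so no donation yet */
	}
}

It is this busy-wait window, before the waiter actually blocks and can
donate its context, that deviates from the block-and-inherit model the
RT/DEADLINE analysis assumes.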
On Thu, Sep 11, 2025 at 6:59 AM Juri Lelli <juri.lelli@redhat.com> wrote:
> On 04/09/25 00:21, John Stultz wrote:
> > I'd really appreciate any feedback or review thoughts on the
> > full series as well.
>
> I currently have the following on top of your complete series
>
> https://github.com/jlelli/linux/commits/experimental/eval-mbwi/
> https://github.com/jlelli/linux experimental/eval-mbwi
>
> of which
>
> https://github.com/jlelli/linux/commit/9d4bbb1aca624e76e5b34938d848dc9a418c6146
>
> introduces the testing infra (M-BWI is Multiprocessor Bandwidth
> Inheritance), and the rest adds some additional tracepoints (based on
> Gabriele's patch) to get more DEADLINE info out of tests (in
> conjunction with sched_tp [1]).
>
> Nothing big to report just yet; I mainly spent time getting this
> working.

Very cool to see! I'll have to pull those and take a look!

And I'm of course very interested to hear if you find anything with
the proxy set that I need to revise.

> One thing I noticed though (and probably forgot from previous
> discussions) is that spin_on_owner might be 'confusing' from an
> RT/DEADLINE perspective, as it deviates from what one expects from the
> ideal theoretical world (tasks don't immediately block and potentially
> donate). Not sure what to do about it. Maybe special-case it for
> RT/DEADLINE, but I've just started playing with it.

Can you refresh me a bit on why blocking to donate is preferred? If
the lock owner is running, there's not much that blocking to donate
would help with. Does this concern not apply to the current mutex
logic without proxy? With proxy-exec, I'm trying to preserve the
existing mutex behavior of spin_on_owner; the main tweak is just the
lock handoff to the current donor when we are proxying, otherwise the
intent is that it should be the same.

Now, I do recognize that rt_mutexes and mutexes have different lock
handoff requirements for RT tasks (the lock needs to strictly go to
the highest priority waiter, and we can't let a lower priority task
steal it), which is why I've not yet enabled proxy-exec on rt_mutexes.

> Anyway, I will keep playing with all this. Just wanted to give
> you/others a quick update. Also adding Luca, Tommaso and Yuri to the
> thread so that they are aware of the testing framework. :)

Thanks so much for sharing!
-john
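As a rough illustration of the handoff tweak John mentions here: the
actual change is the blocked_donor handling in the unlock slowpath
(patch 6 above), and the names below are approximations of that idea
rather than the real code:

/*
 * Approximation of the donor handoff: when the task releasing the
 * mutex has been running on a donated context, prefer handing the
 * lock to that donor rather than to whichever waiter happens to be
 * first in the wait list.
 */
static struct task_struct *pick_handoff_target(struct mutex *lock)
{
	struct task_struct *donor = current->blocked_donor;

	if (donor && donor->blocked_on == lock)
		return donor;		/* hand the lock to our donor */
	return first_waiter(lock);	/* hypothetical helper: default
					 * to the first queued waiter */
}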
On 11/09/25 16:21, John Stultz wrote:
> On Thu, Sep 11, 2025 at 6:59 AM Juri Lelli <juri.lelli@redhat.com> wrote:
> > One thing I noticed though (and probably forgot from previous
> > discussions) is that spin_on_owner might be 'confusing' from an
> > RT/DEADLINE perspective, as it deviates from what one expects from the
> > ideal theoretical world (tasks don't immediately block and potentially
> > donate). Not sure what to do about it. Maybe special-case it for
> > RT/DEADLINE, but I've just started playing with it.
>
> Can you refresh me a bit on why blocking to donate is preferred? If
> the lock owner is running, there's not much that blocking to donate
> would help with. Does this concern not apply to the current mutex
> logic without proxy? With proxy-exec, I'm trying to preserve the
> existing mutex behavior of spin_on_owner; the main tweak is just the
> lock handoff to the current donor when we are proxying, otherwise the
> intent is that it should be the same.

Yeah, I think we want to preserve that behavior for non-RT mutexes for
throughput, but for RT I fear we might risk priority inversion if
tasks spin (for a bit) before blocking. My understanding is that with
PI-enabled futexes (apart from some initial tries to get the lock with
atomic ops) we then call into __rt_mutex_start_proxy_lock(), which
enqueues the blocked task onto the PI chain (so that PI rules are
respected, etc.).

Guess we could end up reintroducing this behavior when we eventually
kill rt_mutexes, so don't worry too much about it yet, I think; just
something to keep in mind. :)

> Now, I do recognize that rt_mutexes and mutexes have different lock
> handoff requirements for RT tasks (the lock needs to strictly go to
> the highest priority waiter, and we can't let a lower priority task
> steal it), which is why I've not yet enabled proxy-exec on rt_mutexes.

Right. I will probably hack something in to test the DEADLINE
scenarios, but again, don't worry about it.

Thanks,
Juri
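For context on the ordering Juri describes: with PI futexes the kernel
side joins the PI chain and blocks right away, with no spin-on-owner
phase. A heavily simplified sketch follows; the helpers here are
hypothetical stand-ins, and the real flow is futex_lock_pi() in
kernel/futex/pi.c:

/* Heavily simplified sketch of the PI-futex slowpath ordering. */
static int pi_futex_lock_sketch(u32 __user *uaddr)
{
	/* 1) A few attempts to grab the futex word with atomic ops. */
	if (futex_word_cmpxchg_acquire(uaddr))		/* hypothetical */
		return 0;

	/*
	 * 2) No spinning: immediately enqueue on the owner's PI chain
	 *    (the real code does this via __rt_mutex_start_proxy_lock())
	 *    so inheritance rules apply before we sleep, then block
	 *    until we are granted the lock.
	 */
	return pi_chain_enqueue_and_block(uaddr);	/* hypothetical */
}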
Hello John,

On 9/4/2025 5:51 AM, John Stultz wrote:
> Also you can find the full proxy-exec series here:
>   https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v21-6.17-rc4/
>   https://github.com/johnstultz-work/linux-dev.git proxy-exec-v21-6.17-rc4

tl;dr

This series seems fine from a performance standpoint, but the above
branch may have some performance issues. Take them with a grain of
salt, since this is not an entirely apples-to-apples comparison.

For this series things are alright. My harness for longer-running
benchmarks gave up for some reason, so I'll rerun those tests and
report back later; either tip has some improvements for netperf /
tbench, or "proxy-exec-v21-6.17-rc4" may have some issues around it.
I'll take a deeper look later in the week.

o System Details

- 3rd Generation EPYC System
- 2 x 64C/128T
- NPS1 mode

o Kernels

- tip:        tip:sched/core at commit 5b726e9bf954 ("sched/fair: Get
              rid of throttled_lb_pair()") (CONFIG_SCHED_PROXY_EXEC
              disabled)
- proxy-v21:  tip + this series as is (CONFIG_SCHED_PROXY_EXEC=y)
- proxy-full: proxy-exec-v21-6.17-rc4 as is (CONFIG_SCHED_PROXY_EXEC=y)

o Benchmark results

==================================================================
Test            : hackbench
Units           : Normalized time in seconds
Interpretation  : Lower is better
Statistic       : AMean
==================================================================
Case:          tip[pct imp](CV)  proxy-v21[pct imp](CV)  proxy-full[pct imp](CV)
 1-groups   1.00 [ -0.00](10.57)   0.94 [  6.24]( 7.88)   0.91 [  9.46](10.11)
 2-groups   1.00 [ -0.00]( 3.33)   1.02 [ -1.75]( 3.16)   1.04 [ -4.17]( 2.51)
 4-groups   1.00 [ -0.00]( 2.41)   1.01 [ -0.87]( 2.29)   1.03 [ -3.03]( 1.27)
 8-groups   1.00 [ -0.00]( 2.67)   1.02 [ -1.66]( 2.10)   1.01 [ -0.55]( 1.45)
16-groups   1.00 [ -0.00]( 1.83)   1.01 [ -0.82]( 2.30)   1.00 [ -0.25]( 1.72)

==================================================================
Test            : tbench
Units           : Normalized throughput
Interpretation  : Higher is better
Statistic       : AMean
==================================================================
Clients:       tip[pct imp](CV)  proxy-v21[pct imp](CV)  proxy-full[pct imp](CV)
    1       1.00 [  0.00]( 0.81)   1.00 [ -0.13]( 0.16)   0.92 [ -8.06]( 0.39)
    2       1.00 [  0.00]( 0.32)   0.99 [ -0.84]( 0.66)   0.91 [ -8.85]( 0.54)
    4       1.00 [  0.00]( 0.32)   0.98 [ -2.37]( 1.40)   0.92 [ -8.28]( 0.28)
    8       1.00 [  0.00]( 0.69)   0.98 [ -2.47]( 0.53)   0.90 [ -9.58]( 0.36)
   16       1.00 [  0.00]( 1.24)   0.96 [ -3.94]( 1.51)   0.90 [ -9.83]( 0.69)
   32       1.00 [  0.00]( 0.60)   0.99 [ -1.47]( 3.38)   0.89 [-11.43]( 5.60)
   64       1.00 [  0.00]( 1.22)   0.99 [ -1.33]( 0.88)   0.91 [ -8.52]( 2.67)
  128       1.00 [  0.00]( 0.34)   0.99 [ -1.48]( 0.99)   0.92 [ -7.51]( 0.13)
  256       1.00 [  0.00]( 1.32)   0.98 [ -1.75]( 0.96)   0.97 [ -3.35]( 1.22)
  512       1.00 [  0.00]( 0.25)   0.99 [ -1.29]( 0.41)   0.97 [ -2.90]( 0.17)
 1024       1.00 [  0.00]( 0.24)   0.99 [ -0.59]( 0.14)   0.98 [ -2.36]( 0.33)

==================================================================
Test            : stream-10
Units           : Normalized Bandwidth, MB/s
Interpretation  : Higher is better
Statistic       : HMean
==================================================================
Test:          tip[pct imp](CV)  proxy-v21[pct imp](CV)  proxy-full[pct imp](CV)
 Copy       1.00 [  0.00](10.90)   1.07 [  6.53]( 8.21)   1.07 [  7.26]( 7.22)
 Scale      1.00 [  0.00]( 9.62)   1.04 [  4.00]( 6.99)   1.05 [  4.71]( 5.85)
 Add        1.00 [  0.00](10.17)   1.05 [  5.07]( 6.14)   1.06 [  6.03]( 6.56)
 Triad      1.00 [  0.00]( 8.48)   1.04 [  4.34]( 5.09)   1.04 [  4.07]( 4.40)

==================================================================
Test            : stream-100
Units           : Normalized Bandwidth, MB/s
Interpretation  : Higher is better
Statistic       : HMean
==================================================================
Test:          tip[pct imp](CV)  proxy-v21[pct imp](CV)  proxy-full[pct imp](CV)
 Copy       1.00 [  0.00]( 1.38)   1.01 [  0.99]( 1.21)   1.02 [  1.68]( 1.50)
 Scale      1.00 [  0.00]( 6.19)   1.02 [  1.94]( 4.34)   1.03 [  3.00]( 1.19)
 Add        1.00 [  0.00]( 4.42)   1.01 [  0.94]( 4.17)   1.02 [  1.58]( 1.54)
 Triad      1.00 [  0.00]( 1.30)   1.01 [  0.61]( 1.37)   1.00 [  0.18]( 2.65)

==================================================================
Test            : netperf
Units           : Normalized Throughput
Interpretation  : Higher is better
Statistic       : AMean
==================================================================
Clients:       tip[pct imp](CV)  proxy-v21[pct imp](CV)  proxy-full[pct imp](CV)
  1-clients  1.00 [  0.00]( 0.41)   0.99 [ -1.03]( 0.34)   0.90 [ -9.96]( 0.46)
  2-clients  1.00 [  0.00]( 0.31)   0.99 [ -1.17]( 0.72)   0.90 [ -9.77]( 0.78)
  4-clients  1.00 [  0.00]( 0.57)   0.99 [ -0.68]( 0.32)   0.90 [-10.21]( 0.89)
  8-clients  1.00 [  0.00]( 0.46)   0.99 [ -0.69]( 0.32)   0.90 [-10.20]( 0.70)
 16-clients  1.00 [  0.00]( 0.57)   0.99 [ -1.39]( 1.28)   0.90 [-10.37]( 1.34)
 32-clients  1.00 [  0.00]( 1.03)   0.97 [ -2.53]( 1.92)   0.90 [-10.00]( 1.23)
 64-clients  1.00 [  0.00]( 1.23)   0.97 [ -3.15]( 2.94)   0.90 [ -9.94]( 1.52)
128-clients  1.00 [  0.00]( 1.14)   0.99 [ -1.07]( 0.95)   0.90 [ -9.91]( 0.90)
256-clients  1.00 [  0.00]( 3.73)   0.98 [ -1.80]( 3.66)   0.97 [ -3.41]( 4.47)
512-clients  1.00 [  0.00](54.79)   0.97 [ -3.03](48.98)   0.95 [ -4.63](51.77)

==================================================================
Test            : schbench
Units           : Normalized 99th percentile latency in us
Interpretation  : Lower is better
Statistic       : Median
==================================================================
#workers:      tip[pct imp](CV)  proxy-v21[pct imp](CV)  proxy-full[pct imp](CV)
    1       1.00 [ -0.00](30.14)   1.11 [-11.11](35.78)   1.31 [-30.56](42.87)
    2       1.00 [ -0.00]( 7.87)   0.93 [  7.14]( 8.45)   0.95 [  4.76]( 7.50)
    4       1.00 [ -0.00]( 7.87)   1.07 [ -7.14]( 7.36)   1.14 [-14.29](12.73)
    8       1.00 [ -0.00]( 4.59)   1.08 [ -8.16]( 5.09)   1.12 [-12.24]( 7.44)
   16       1.00 [ -0.00]( 5.33)   1.05 [ -5.08]( 0.93)   1.05 [ -5.08]( 2.75)
   32       1.00 [ -0.00]( 1.04)   1.00 [ -0.00]( 3.12)   1.07 [ -7.29]( 4.49)
   64       1.00 [ -0.00]( 1.04)   0.96 [  3.50]( 3.78)   1.01 [ -1.00]( 2.24)
  128       1.00 [ -0.00]( 5.11)   1.06 [ -6.11]( 7.56)   1.09 [ -8.60]( 6.26)
  256       1.00 [ -0.00](19.39)   1.29 [-28.73](14.92)   1.15 [-14.71](14.83)
  512       1.00 [ -0.00]( 0.15)   0.98 [  2.02]( 1.85)   0.99 [  1.01]( 1.66)

==================================================================
Test            : new-schbench-requests-per-second
Units           : Normalized Requests per second
Interpretation  : Higher is better
Statistic       : Median
==================================================================
#workers:      tip[pct imp](CV)  proxy-v21[pct imp](CV)  proxy-full[pct imp](CV)
    1       1.00 [  0.00]( 0.26)   1.00 [  0.29]( 0.15)   1.00 [ -0.29]( 0.30)
    2       1.00 [  0.00]( 0.00)   1.00 [  0.00]( 0.15)   1.00 [  0.00]( 0.15)
    4       1.00 [  0.00]( 0.00)   1.00 [  0.00]( 0.15)   1.00 [  0.00]( 0.00)
    8       1.00 [  0.00]( 0.15)   1.00 [  0.29]( 0.15)   1.00 [  0.29]( 0.15)
   16       1.00 [  0.00]( 0.00)   1.00 [  0.00]( 0.00)   1.00 [  0.00]( 0.00)
   32       1.00 [  0.00]( 1.86)   1.00 [ -0.31]( 0.28)   1.00 [ -0.31]( 2.12)
   64       1.00 [  0.00](13.62)   0.99 [ -0.77]( 4.78)   0.81 [-18.52](11.11)
  128       1.00 [  0.00]( 0.00)   1.00 [  0.38]( 0.00)   1.00 [  0.38]( 0.00)
  256       1.00 [  0.00]( 1.49)   1.02 [  1.82]( 1.63)   1.00 [  0.00]( 1.19)
  512       1.00 [  0.00]( 0.75)   1.01 [  0.71]( 1.65)   1.01 [  1.19]( 1.53)

==================================================================
Test            : new-schbench-wakeup-latency
Units           : Normalized 99th percentile latency in us
Interpretation  : Lower is better
Statistic       : Median
==================================================================
#workers:      tip[pct imp](CV)  proxy-v21[pct imp](CV)  proxy-full[pct imp](CV)
    1       1.00 [ -0.00]( 6.74)   1.00 [ -0.00]( 6.74)   1.12 [-12.50](19.26)
    2       1.00 [ -0.00](11.18)   1.00 [ -0.00](17.21)   1.50 [-50.00]( 7.45)
    4       1.00 [ -0.00]( 9.94)   1.00 [ -0.00](19.26)   1.56 [-55.56](15.78)
    8       1.00 [ -0.00](10.68)   1.00 [ -0.00](10.68)   1.44 [-44.44](28.77)
   16       1.00 [ -0.00]( 9.68)   1.00 [ -0.00]( 9.68)   1.20 [-20.00]( 8.15)
   32       1.00 [ -0.00](14.08)   1.00 [ -0.00]( 5.34)   1.20 [-20.00](14.70)
   64       1.00 [ -0.00]( 3.52)   1.13 [-13.33]( 5.26)   1.27 [-26.67]( 2.77)
  128       1.00 [ -0.00]( 1.79)   1.07 [ -6.56]( 2.70)   1.07 [ -6.97]( 7.71)
  256       1.00 [ -0.00]( 9.89)   1.04 [ -4.50]( 3.81)   1.02 [ -2.00]( 7.78)
  512       1.00 [ -0.00]( 0.00)   1.01 [ -0.77]( 0.34)   1.00 [ -0.00]( 0.20)

==================================================================
Test            : new-schbench-request-latency
Units           : Normalized 99th percentile latency in us
Interpretation  : Lower is better
Statistic       : Median
==================================================================
#workers:      tip[pct imp](CV)  proxy-v21[pct imp](CV)  proxy-full[pct imp](CV)
    1       1.00 [ -0.00]( 1.33)   0.96 [  3.89]( 1.46)   1.02 [ -1.82]( 3.02)
    2       1.00 [ -0.00]( 0.14)   1.01 [ -1.09]( 0.24)   1.02 [ -2.44]( 2.73)
    4       1.00 [ -0.00]( 1.24)   1.00 [ -0.26]( 1.69)   0.97 [  2.65]( 0.14)
    8       1.00 [ -0.00]( 0.54)   1.00 [ -0.00]( 1.02)   0.99 [  1.31]( 2.16)
   16       1.00 [ -0.00]( 0.36)   1.00 [ -0.00]( 1.70)   0.98 [  1.59]( 1.00)
   32       1.00 [ -0.00]( 5.51)   0.99 [  0.73]( 2.09)   1.01 [ -1.45]( 7.52)
   64       1.00 [ -0.00]( 5.38)   1.09 [ -9.27]( 0.88)   1.09 [ -8.56]( 0.11)
  128       1.00 [ -0.00]( 0.32)   1.00 [ -0.36]( 0.32)   1.03 [ -2.54]( 1.15)
  256       1.00 [ -0.00](10.51)   1.14 [-14.23](11.19)   1.00 [  0.24](11.42)
  512       1.00 [ -0.00]( 2.00)   1.03 [ -3.27]( 0.94)   1.02 [ -2.41]( 1.96)

-- 
Thanks and Regards,
Prateek
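For anyone re-deriving numbers from tables like these: assuming the
usual convention for such reports (the harness itself is Prateek's, so
treat this as an approximation), the columns can be read as the per-
kernel mean normalized to tip, the percent improvement versus tip with
the sign adjusted so positive is always better, and the coefficient of
variation as a run-to-run noise estimate:

/*
 * Assumed reading of the table columns (approximation, not the
 * actual harness):
 *   value     = mean(kernel) / mean(tip), i.e. normalized to tip
 *   [pct imp] = percent improvement vs tip, sign-adjusted so that
 *               positive is always "better"
 *   (CV)      = coefficient of variation, stddev / mean * 100
 */
static double pct_imp(double tip_mean, double mean, bool higher_is_better)
{
	double delta = (mean - tip_mean) / tip_mean * 100.0;

	return higher_is_better ? delta : -delta;
}

For example, tbench at 8 clients for proxy-full gives
pct_imp(1.00, 0.90, true) = roughly -10%, matching the -9.58 in the
table up to rounding of the displayed normalized value.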
On Mon, Sep 15, 2025 at 8:19 PM K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>
> Hello John,
>
> On 9/4/2025 5:51 AM, John Stultz wrote:
> > Also you can find the full proxy-exec series here:
> >   https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v21-6.17-rc4/
> >   https://github.com/johnstultz-work/linux-dev.git proxy-exec-v21-6.17-rc4
>
> tl;dr
>
> This series seems fine from a performance standpoint, but the above
> branch may have some performance issues. Take them with a grain of
> salt, since this is not an entirely apples-to-apples comparison.

Thank you so much for running these tests! I really appreciate it!

It does look like I need to spend some more time on the full series,
as those regressions with the full proxy patch set do seem
problematic. Thank you for raising this!

(Thank you also for your feedback on the patches in this series; I've
not gotten a chance to address and reply to each individually yet, but
I hope to do so soon!)

> For this series things are alright. My harness for longer-running
> benchmarks gave up for some reason, so I'll rerun those tests and
> report back later; either tip has some improvements for netperf /
> tbench, or "proxy-exec-v21-6.17-rc4" may have some issues around it.
> I'll take a deeper look later in the week.

Let me know if you find anything further! Thank you so much again for
all your efforts here, both in testing and review! As well as the
testing you do for the whole community!

-john