[RESEND][PATCH v21 0/6] Donor Migration for Proxy Execution (v21)
Posted by John Stultz 4 weeks, 1 day ago
Hey All,

I didn't get any feedback on the last iteration, so I wanted to
resend this next chunk of the series: Donor Migration.

The main change from v20 is that I previously had
logic where the ww_mutex paths took the blocked_lock of the task
being woken (either the lock waiter->task or the owner), but in a
context from __mutex_lock_common() where we already held
current->blocked_lock. This required using the spin_lock_nested()
annotation to keep lockdep happy, and I was leaning on the logic
that there is an implied order between the running current and the
existing not-running lock waiters, which should avoid loops. In
the wound case, there is also an order used if the owner's
context is younger, which sounded likely to avoid loops.

However, after thinking more about the wound case, where we are
wounding a lock owner: since that owner is not waiting and could
be trying to acquire a mutex current owns, I couldn't quite
convince myself we couldn't get into an ABBA-style deadlock with
the nested blocked_lock accesses (though I've not been able to
contrive it in practice, that doesn't prove anything).

So the main difference in v21 is a rework of how we hold the
blocked_lock in the __mutex_lock_common() code, reducing its
scope so we don't call into ww_mutex paths while holding it. The
lock->wait_lock still serializes things at the top level, but the
blocked_lock isn't held completely in parallel under that, and
is focused on its purpose of protecting the blocked_on,
blocked_on_state and similar proxy-related values in the task
struct.

I also did some cleanups to be more consistent in how the
blocked_on_state is handled. I had a few spots previously where
I was cheating and just set the value instead of going through
the helpers. And sure enough, in fixing those I realized there
were a few spots where I wasn’t always holding the right
blocked_lock, so some minor rework helped clean that up.

I’m trying to submit this larger work in smallish digestible
pieces, so in this portion of the series, I’m only submitting
for review and consideration the logic that allows us to do
donor (blocked waiter) migration, allowing us to proxy-execute
lock owners that might be on other cpu runqueues. This requires
some additional changes to locking and extra state tracking to
ensure we don’t accidentally run a migrated donor on a cpu it
isn’t affined to, as well as some extra handling to deal with
balance callback state that needs to be reset when we decide to
pick a different task after doing donor migration.

I’d love to get some feedback on any place where these patches
are confusing, or could use additional clarifications.

Also you can find the full proxy-exec series here:
  https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v21-6.17-rc4/
  https://github.com/johnstultz-work/linux-dev.git proxy-exec-v21-6.17-rc4

Issues still to address with the full series:
* Need to sort out what is needed for sched_ext to be ok with
  proxy-execution enabled. This is my next priority.

* K Prateek Nayak did some testing a bit over a year ago
  with an earlier version of the full series and saw ~3-5%
  regressions in some cases. Need to re-evaluate this with the
  proxy-migration avoidance optimization Suleiman suggested
  having now been implemented.

* The chain migration functionality needs further iterations and
  better validation to ensure it truly maintains the RT/DL load
  balancing invariants (despite this being broken in vanilla
  upstream with RT_PUSH_IPI currently)

Future work:
* Expand to other locking primitives: Suleiman is looking at
  rw_semaphores, as that is another common source of priority
  inversion. Figuring out pi-futexes would be good too.
* Eventually: Work to replace rt_mutexes and get things happy
  with PREEMPT_RT

I’d really appreciate any feedback or review thoughts on the
full series as well. I’m trying to keep the chunks small,
reviewable and iteratively testable, but if you have any
suggestions on how to improve the series, I’m all ears.

Credit/Disclaimer:
---------------------
As always, this Proxy Execution series has a long history with
lots of developers that deserve credit: 

First described in a paper[1] by Watkins, Straub, Niehaus, then
from patches from Peter Zijlstra, extended with lots of work by
Juri Lelli, Valentin Schneider, and Connor O'Brien. (and thank
you to Steven Rostedt for providing additional details here!)

So again, many thanks to those above, as all the credit for this
series really is due to them - while the mistakes are likely mine.

Thanks so much!
-john

[1] https://static.lwn.net/images/conf/rtlws11/papers/proc/p38.pdf

Cc: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Qais Yousef <qyousef@layalina.io>   
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Zimuzo Ezeozue <zezeozue@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: kuyo chang <kuyo.chang@mediatek.com>
Cc: hupu <hupu.gm@gmail.com>
Cc: kernel-team@android.com


John Stultz (5):
  locking: Add task::blocked_lock to serialize blocked_on state
  sched/locking: Add blocked_on_state to provide necessary tri-state for
    proxy return-migration
  sched: Add logic to zap balance callbacks if we pick again
  sched: Handle blocked-waiter migration (and return migration)
  sched: Migrate whole chain in proxy_migrate_task()

Peter Zijlstra (1):
  sched: Add blocked_donor link to task for smarter mutex handoffs

 include/linux/sched.h        | 120 ++++++++-----
 init/init_task.c             |   4 +
 kernel/fork.c                |   4 +
 kernel/locking/mutex-debug.c |   4 +-
 kernel/locking/mutex.c       |  83 +++++++--
 kernel/locking/ww_mutex.h    |  20 +--
 kernel/sched/core.c          | 329 +++++++++++++++++++++++++++++++++--
 kernel/sched/fair.c          |   3 +-
 kernel/sched/sched.h         |   2 +-
 9 files changed, 473 insertions(+), 96 deletions(-)

-- 
2.51.0.338.gd7d06c2dae-goog
Re: [RESEND][PATCH v21 0/6] Donor Migration for Proxy Execution (v21)
Posted by Juri Lelli 3 weeks ago
Hi John,

On 04/09/25 00:21, John Stultz wrote:
> Hey All,
> 
> I didn't get any feedback on the last iteration, so I wanted to
> resend this next chunk of the series: Donor Migration 

Not ignoring you, but I had to spend some time putting together some
testing infra I am now trying to use to see how DEADLINE behaves with
the series, as it's somewhat difficult for me to think in the abstract
about all this. :)

> Also you can find the full proxy-exec series here:
>   https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v21-6.17-rc4/
>   https://github.com/johnstultz-work/linux-dev.git proxy-exec-v21-6.17-rc4
> 

> I’d really appreciate any feedback or review thoughts on the
> full series as well.

I currently have the following on top of your complete series

https://github.com/jlelli/linux/commits/experimental/eval-mbwi/
https://github.com/jlelli/linux experimental/eval-mbwi

of which

https://github.com/jlelli/linux/commit/9d4bbb1aca624e76e5b34938d848dc9a418c6146

introduces the testing (M-BWI is Multiprocessor Bandwidth Inheritance)
infra and the rest some additional tracepoints (based on Gabriele's
patch) to get more DEADLINE info out of tests (in conjunction with
sched_tp [1]).

Nothing big to report just yet; I've mainly spent time getting this working.

One thing I noticed though (and probably forgot from previous
discussions) is that spin_on_owner might be 'confusing' from an
RT/DEADLINE perspective, as it deviates from what one expects from the
ideal theoretical world (tasks don't immediately block and
potentially donate). Not sure what to do about it. Maybe special-case
it for RT/DEADLINE, but I've just started playing with it.

Anyway, I will keep playing with all this. Just wanted to give
you/others a quick update. Also adding Luca, Tommaso and Yuri to the
thread so that they are aware of the testing framework. :)

Thanks!
Juri

1 - https://github.com/jlelli/sched_tp/tree/deadline-tp
    will open an MR against Qais' mainline repo as soon as TPs are
    hopefully merged upstream as well.

Re: [RESEND][PATCH v21 0/6] Donor Migration for Proxy Execution (v21)
Posted by John Stultz 3 weeks ago
On Thu, Sep 11, 2025 at 6:59 AM Juri Lelli <juri.lelli@redhat.com> wrote:
> On 04/09/25 00:21, John Stultz wrote:
> > I’d really appreciate any feedback or review thoughts on the
> > full series as well.
>
> I currently have the following on top of your complete series
>
> https://github.com/jlelli/linux/commits/experimental/eval-mbwi/
> https://github.com/jlelli/linux experimental/eval-mbwi
>
> of which
>
> https://github.com/jlelli/linux/commit/9d4bbb1aca624e76e5b34938d848dc9a418c6146
>
> introduces the testing (M-BWI is Multiprocessor Bandwidth Inheritance)
> infra and the rest some additional tracepoints (based on Gabriele's
> patch) to get more DEADLINE info out of tests (in conjunction with
> sched_tp [1]).
>
> Nothing big to report just yet, mainly spent time getting this working.

Very cool to see! I'll have to pull those and take a look at it!

And I'm of course very interested to hear if you find anything with
the proxy set that I need to revise.

> One thing I noticed though (and probably forgot from previous
> discussions) is that spin_on_owner might be 'confusing' from an
> RT/DEADLINE perspective as it deviates from what one expects from the
> ideal theoretical world (as tasks don't immediately block and
> potentially donate). Not sure what to do about it. Maybe special case it
> for RT/DEADLINE, but just started playing with it.

Can you refresh me a bit on why blocking to donate is preferred? If
the lock owner is running, there's not much that blocking to donate
would help with. Does this concern not apply to the current mutex
logic without proxy? With proxy-exec, I'm trying to preserve the
existing mutex behavior of spin_on_owner; the main tweak is just
the lock handoff to the current donor when we are proxying, otherwise
the intent is that it should be the same.

Now, I do recognize that rt_mutexes and mutexes do have different lock
handoff requirements for RT tasks (needs to strictly go to the highest
priority waiter, and we can't let a lower priority task steal it),
which is why I've not yet enabled proxy-exec on rt_mutexes.

> Anyway, I will keep playing with all this. Just wanted to give
> you/others a quick update. Also adding Luca, Tommaso and Yuri to the
> thread so that they are aware of the testing framework. :)

Thanks so much for sharing!
-john
Re: [RESEND][PATCH v21 0/6] Donor Migration for Proxy Execution (v21)
Posted by Juri Lelli 2 weeks, 6 days ago
On 11/09/25 16:21, John Stultz wrote:
> On Thu, Sep 11, 2025 at 6:59 AM Juri Lelli <juri.lelli@redhat.com> wrote:
> > On 04/09/25 00:21, John Stultz wrote:
> > > I’d really appreciate any feedback or review thoughts on the
> > > full series as well.
> >
> > I currently have the following on top of your complete series
> >
> > https://github.com/jlelli/linux/commits/experimental/eval-mbwi/
> > https://github.com/jlelli/linux experimental/eval-mbwi
> >
> > of which
> >
> > https://github.com/jlelli/linux/commit/9d4bbb1aca624e76e5b34938d848dc9a418c6146
> >
> > introduces the testing (M-BWI is Multiprocessor Bandwidth Inheritance)
> > infra and the rest some additional tracepoints (based on Gabriele's
> > patch) to get more DEADLINE info out of tests (in conjunction with
> > sched_tp [1]).
> >
> > Nothing big to report just yet, mainly spent time getting this working.
> 
> Very cool to see! I'll have to pull those and take a look at it!
> 
> And I'm of course very interested to hear if you find anything with
> the proxy set that I need to revise.
> 
> > One thing I noticed though (and probably forgot from previous
> > discussions) is that spin_on_owner might be 'confusing' from an
> > RT/DEADLINE perspective as it deviates from what one expects from the
> > ideal theoretical world (as tasks don't immediately block and
> > potentially donate). Not sure what to do about it. Maybe special case it
> > for RT/DEADLINE, but just started playing with it.
> 
> Can you refresh me a bit on why blocking to donate is preferred? If
> the lock owner is running, there's not much that blocking to donate
> would help with. Does this concern not apply to the current mutex
> logic without proxy? With proxy-exec, I'm trying to preserve the
> existing mutex behavior of spin_on_owner; the main tweak is just
> the lock handoff to the current donor when we are proxying, otherwise
> the intent is that it should be the same.

Yeah, I think we want to preserve that behavior for non-RT mutexes for
throughput, but for RT I fear we might risk priority inversion if tasks
spin (for a bit) before blocking. My understanding is that with PI
enabled futexes (apart from some initial tries to get the lock with
atomic ops) we then call into __rt_mutex_start_proxy_lock(), which
enqueues the blocked task onto the PI chain (so that PI rules are
respected, etc.). I guess we may end up reintroducing this behavior
when we eventually kill rt_mutexes, so don't worry too much about it
yet, I think; just something to keep in mind. :)

> Now, I do recognize that rt_mutexes and mutexes do have different lock
> handoff requirements for RT tasks (needs to strictly go to the highest
> priority waiter, and we can't let a lower priority task steal it),
> which is why I've not yet enabled proxy-exec on rt_mutexes.

Right. I will probably hack something in to test the DEADLINE scenarios,
but again don't worry about it.

Thanks,
Juri

Re: [RESEND][PATCH v21 0/6] Donor Migration for Proxy Execution (v21)
Posted by K Prateek Nayak 2 weeks, 3 days ago
Hello John,

On 9/4/2025 5:51 AM, John Stultz wrote:
> Also you can find the full proxy-exec series here:
>   https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v21-6.17-rc4/
>   https://github.com/johnstultz-work/linux-dev.git proxy-exec-v21-6.17-rc4

tl;dr

This series seems fine from a performance standpoint, but the above
branch may have some performance issues. Take those with a grain of
salt, since this is not an entirely apples-to-apples comparison.

For this series things are alright. My harness for the longer-running
benchmarks gave up for some reason, so I'll rerun those tests and
report back later, but either tip has some improvements for
netperf / tbench, or "proxy-exec-v21-6.17-rc4" may have some issues
around them. I'll take a deeper look later in the week.


o System Details

- 3rd Generation EPYC System
- 2 x 64C/128T
- NPS1 mode


o Kernels

- tip		tip:sched/core at commit 5b726e9bf954 ("sched/fair: Get
		rid of throttled_lb_pair()")
		(CONFIG_SCHED_PROXY_EXEC disabled)

- proxy-v21	tip + this series as is
		(CONFIG_SCHED_PROXY_EXEC=y)

- proxy-full	proxy-exec-v21-6.17-rc4 as is
		(CONFIG_SCHED_PROXY_EXEC=y)


o Benchmark results

    ==================================================================
    Test          : hackbench
    Units         : Normalized time in seconds
    Interpretation: Lower is better
    Statistic     : AMean
    ==================================================================
    Case:           tip[pct imp](CV)      proxy-v21[pct imp](CV)   proxy-full[pct imp](CV)
     1-groups     1.00 [ -0.00](10.57)     0.94 [  6.24]( 7.88)     0.91 [  9.46](10.11)
     2-groups     1.00 [ -0.00]( 3.33)     1.02 [ -1.75]( 3.16)     1.04 [ -4.17]( 2.51)
     4-groups     1.00 [ -0.00]( 2.41)     1.01 [ -0.87]( 2.29)     1.03 [ -3.03]( 1.27)
     8-groups     1.00 [ -0.00]( 2.67)     1.02 [ -1.66]( 2.10)     1.01 [ -0.55]( 1.45)
    16-groups     1.00 [ -0.00]( 1.83)     1.01 [ -0.82]( 2.30)     1.00 [ -0.25]( 1.72)


    ==================================================================
    Test          : tbench
    Units         : Normalized throughput
    Interpretation: Higher is better
    Statistic     : AMean
    ==================================================================
    Clients:    tip[pct imp](CV)      proxy-v21[pct imp](CV)   proxy-full[pct imp](CV)
        1     1.00 [  0.00]( 0.81)     1.00 [ -0.13]( 0.16)     0.92 [ -8.06]( 0.39)
        2     1.00 [  0.00]( 0.32)     0.99 [ -0.84]( 0.66)     0.91 [ -8.85]( 0.54)
        4     1.00 [  0.00]( 0.32)     0.98 [ -2.37]( 1.40)     0.92 [ -8.28]( 0.28)
        8     1.00 [  0.00]( 0.69)     0.98 [ -2.47]( 0.53)     0.90 [ -9.58]( 0.36)
       16     1.00 [  0.00]( 1.24)     0.96 [ -3.94]( 1.51)     0.90 [ -9.83]( 0.69)
       32     1.00 [  0.00]( 0.60)     0.99 [ -1.47]( 3.38)     0.89 [-11.43]( 5.60)
       64     1.00 [  0.00]( 1.22)     0.99 [ -1.33]( 0.88)     0.91 [ -8.52]( 2.67)
      128     1.00 [  0.00]( 0.34)     0.99 [ -1.48]( 0.99)     0.92 [ -7.51]( 0.13)
      256     1.00 [  0.00]( 1.32)     0.98 [ -1.75]( 0.96)     0.97 [ -3.35]( 1.22)
      512     1.00 [  0.00]( 0.25)     0.99 [ -1.29]( 0.41)     0.97 [ -2.90]( 0.17)
     1024     1.00 [  0.00]( 0.24)     0.99 [ -0.59]( 0.14)     0.98 [ -2.36]( 0.33)


    ==================================================================
    Test          : stream-10
    Units         : Normalized Bandwidth, MB/s
    Interpretation: Higher is better
    Statistic     : HMean
    ==================================================================
    Test:       tip[pct imp](CV)      proxy-v21[pct imp](CV)   proxy-full[pct imp](CV)
     Copy     1.00 [  0.00](10.90)     1.07 [  6.53]( 8.21)     1.07 [  7.26]( 7.22)
    Scale     1.00 [  0.00]( 9.62)     1.04 [  4.00]( 6.99)     1.05 [  4.71]( 5.85)
      Add     1.00 [  0.00](10.17)     1.05 [  5.07]( 6.14)     1.06 [  6.03]( 6.56)
    Triad     1.00 [  0.00]( 8.48)     1.04 [  4.34]( 5.09)     1.04 [  4.07]( 4.40)


    ==================================================================
    Test          : stream-100
    Units         : Normalized Bandwidth, MB/s
    Interpretation: Higher is better
    Statistic     : HMean
    ==================================================================
    Test:       tip[pct imp](CV)      proxy-v21[pct imp](CV)   proxy-full[pct imp](CV)
     Copy     1.00 [  0.00]( 1.38)     1.01 [  0.99]( 1.21)     1.02 [  1.68]( 1.50)
    Scale     1.00 [  0.00]( 6.19)     1.02 [  1.94]( 4.34)     1.03 [  3.00]( 1.19)
      Add     1.00 [  0.00]( 4.42)     1.01 [  0.94]( 4.17)     1.02 [  1.58]( 1.54)
    Triad     1.00 [  0.00]( 1.30)     1.01 [  0.61]( 1.37)     1.00 [  0.18]( 2.65)


    ==================================================================
    Test          : netperf
    Units         : Normalized Throughput
    Interpretation: Higher is better
    Statistic     : AMean
    ==================================================================
    Clients:         tip[pct imp](CV)      proxy-v21[pct imp](CV)   proxy-full[pct imp](CV)
     1-clients     1.00 [  0.00]( 0.41)     0.99 [ -1.03]( 0.34)     0.90 [ -9.96]( 0.46)
     2-clients     1.00 [  0.00]( 0.31)     0.99 [ -1.17]( 0.72)     0.90 [ -9.77]( 0.78)
     4-clients     1.00 [  0.00]( 0.57)     0.99 [ -0.68]( 0.32)     0.90 [-10.21]( 0.89)
     8-clients     1.00 [  0.00]( 0.46)     0.99 [ -0.69]( 0.32)     0.90 [-10.20]( 0.70)
    16-clients     1.00 [  0.00]( 0.57)     0.99 [ -1.39]( 1.28)     0.90 [-10.37]( 1.34)
    32-clients     1.00 [  0.00]( 1.03)     0.97 [ -2.53]( 1.92)     0.90 [-10.00]( 1.23)
    64-clients     1.00 [  0.00]( 1.23)     0.97 [ -3.15]( 2.94)     0.90 [ -9.94]( 1.52)
    128-clients    1.00 [  0.00]( 1.14)     0.99 [ -1.07]( 0.95)     0.90 [ -9.91]( 0.90)
    256-clients    1.00 [  0.00]( 3.73)     0.98 [ -1.80]( 3.66)     0.97 [ -3.41]( 4.47)
    512-clients    1.00 [  0.00](54.79)     0.97 [ -3.03](48.98)     0.95 [ -4.63](51.77)


    ==================================================================
    Test          : schbench
    Units         : Normalized 99th percentile latency in us
    Interpretation: Lower is better
    Statistic     : Median
    ==================================================================
    #workers: tip[pct imp](CV)      proxy-v21[pct imp](CV)   proxy-full[pct imp](CV)
      1     1.00 [ -0.00](30.14)     1.11 [-11.11](35.78)     1.31 [-30.56](42.87)
      2     1.00 [ -0.00]( 7.87)     0.93 [  7.14]( 8.45)     0.95 [  4.76]( 7.50)
      4     1.00 [ -0.00]( 7.87)     1.07 [ -7.14]( 7.36)     1.14 [-14.29](12.73)
      8     1.00 [ -0.00]( 4.59)     1.08 [ -8.16]( 5.09)     1.12 [-12.24]( 7.44)
     16     1.00 [ -0.00]( 5.33)     1.05 [ -5.08]( 0.93)     1.05 [ -5.08]( 2.75)
     32     1.00 [ -0.00]( 1.04)     1.00 [ -0.00]( 3.12)     1.07 [ -7.29]( 4.49)
     64     1.00 [ -0.00]( 1.04)     0.96 [  3.50]( 3.78)     1.01 [ -1.00]( 2.24)
    128     1.00 [ -0.00]( 5.11)     1.06 [ -6.11]( 7.56)     1.09 [ -8.60]( 6.26)
    256     1.00 [ -0.00](19.39)     1.29 [-28.73](14.92)     1.15 [-14.71](14.83)
    512     1.00 [ -0.00]( 0.15)     0.98 [  2.02]( 1.85)     0.99 [  1.01]( 1.66)


    ==================================================================
    Test          : new-schbench-requests-per-second
    Units         : Normalized Requests per second
    Interpretation: Higher is better
    Statistic     : Median
    ==================================================================
    #workers: tip[pct imp](CV)      proxy-v21[pct imp](CV)   proxy-full[pct imp](CV)
      1     1.00 [  0.00]( 0.26)     1.00 [  0.29]( 0.15)     1.00 [ -0.29]( 0.30)
      2     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.15)     1.00 [  0.00]( 0.15)
      4     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.15)     1.00 [  0.00]( 0.00)
      8     1.00 [  0.00]( 0.15)     1.00 [  0.29]( 0.15)     1.00 [  0.29]( 0.15)
     16     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.00)
     32     1.00 [  0.00]( 1.86)     1.00 [ -0.31]( 0.28)     1.00 [ -0.31]( 2.12)
     64     1.00 [  0.00](13.62)     0.99 [ -0.77]( 4.78)     0.81 [-18.52](11.11)
    128     1.00 [  0.00]( 0.00)     1.00 [  0.38]( 0.00)     1.00 [  0.38]( 0.00)
    256     1.00 [  0.00]( 1.49)     1.02 [  1.82]( 1.63)     1.00 [  0.00]( 1.19)
    512     1.00 [  0.00]( 0.75)     1.01 [  0.71]( 1.65)     1.01 [  1.19]( 1.53)


    ==================================================================
    Test          : new-schbench-wakeup-latency
    Units         : Normalized 99th percentile latency in us
    Interpretation: Lower is better
    Statistic     : Median
    ==================================================================
    #workers: tip[pct imp](CV)      proxy-v21[pct imp](CV)   proxy-full[pct imp](CV)
      1     1.00 [ -0.00]( 6.74)     1.00 [ -0.00]( 6.74)     1.12 [-12.50](19.26)
      2     1.00 [ -0.00](11.18)     1.00 [ -0.00](17.21)     1.50 [-50.00]( 7.45)
      4     1.00 [ -0.00]( 9.94)     1.00 [ -0.00](19.26)     1.56 [-55.56](15.78)
      8     1.00 [ -0.00](10.68)     1.00 [ -0.00](10.68)     1.44 [-44.44](28.77)
     16     1.00 [ -0.00]( 9.68)     1.00 [ -0.00]( 9.68)     1.20 [-20.00]( 8.15)
     32     1.00 [ -0.00](14.08)     1.00 [ -0.00]( 5.34)     1.20 [-20.00](14.70)
     64     1.00 [ -0.00]( 3.52)     1.13 [-13.33]( 5.26)     1.27 [-26.67]( 2.77)
    128     1.00 [ -0.00]( 1.79)     1.07 [ -6.56]( 2.70)     1.07 [ -6.97]( 7.71)
    256     1.00 [ -0.00]( 9.89)     1.04 [ -4.50]( 3.81)     1.02 [ -2.00]( 7.78)
    512     1.00 [ -0.00]( 0.00)     1.01 [ -0.77]( 0.34)     1.00 [ -0.00]( 0.20)


    ==================================================================
    Test          : new-schbench-request-latency
    Units         : Normalized 99th percentile latency in us
    Interpretation: Lower is better
    Statistic     : Median
    ==================================================================
    #workers: tip[pct imp](CV)      proxy-v21[pct imp](CV)   proxy-full[pct imp](CV)
      1     1.00 [ -0.00]( 1.33)     0.96 [  3.89]( 1.46)     1.02 [ -1.82]( 3.02)
      2     1.00 [ -0.00]( 0.14)     1.01 [ -1.09]( 0.24)     1.02 [ -2.44]( 2.73)
      4     1.00 [ -0.00]( 1.24)     1.00 [ -0.26]( 1.69)     0.97 [  2.65]( 0.14)
      8     1.00 [ -0.00]( 0.54)     1.00 [ -0.00]( 1.02)     0.99 [  1.31]( 2.16)
     16     1.00 [ -0.00]( 0.36)     1.00 [ -0.00]( 1.70)     0.98 [  1.59]( 1.00)
     32     1.00 [ -0.00]( 5.51)     0.99 [  0.73]( 2.09)     1.01 [ -1.45]( 7.52)
     64     1.00 [ -0.00]( 5.38)     1.09 [ -9.27]( 0.88)     1.09 [ -8.56]( 0.11)
    128     1.00 [ -0.00]( 0.32)     1.00 [ -0.36]( 0.32)     1.03 [ -2.54]( 1.15)
    256     1.00 [ -0.00](10.51)     1.14 [-14.23](11.19)     1.00 [  0.24](11.42)
    512     1.00 [ -0.00]( 2.00)     1.03 [ -3.27]( 0.94)     1.02 [ -2.41]( 1.96)


-- 
Thanks and Regards,
Prateek
Re: [RESEND][PATCH v21 0/6] Donor Migration for Proxy Execution (v21)
Posted by John Stultz 2 weeks, 3 days ago
On Mon, Sep 15, 2025 at 8:19 PM K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>
> Hello John,
>
> On 9/4/2025 5:51 AM, John Stultz wrote:
> > Also you can find the full proxy-exec series here:
> >   https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v21-6.17-rc4/
> >   https://github.com/johnstultz-work/linux-dev.git proxy-exec-v21-6.17-rc4
>
> tl;dr
>
> This series seems fine from a performance standpoint, but the above
> branch may have some performance issues. Take those with a grain of
> salt, since this is not an entirely apples-to-apples comparison.
>

Thank you so much for running these tests! I really appreciate it!
It does look like I need to spend some more work on the full series,
as those regressions with the full proxy patch set do seem
problematic. Thank you for raising this!

(Thank you also for your feedback on the patches in this series, I've
not gotten a chance to address and reply back individually yet, but I
hope to do so soon!)


> For this series things are alright. My harness for the longer-running
> benchmarks gave up for some reason, so I'll rerun those tests and
> report back later, but either tip has some improvements for
> netperf / tbench, or "proxy-exec-v21-6.17-rc4" may have some issues
> around them. I'll take a deeper look later in the week.
>

Let me know if you find anything further!

Thank you so much again for all your efforts here, both in testing and
review! As well as the testing you do for the whole community!
-john