[RFC][PATCH v13 0/7] Single CPU Proxy Execution (v13)

John Stultz posted 7 patches 2 weeks, 4 days ago
Hey All,

  Since the earlier proxy-execution preparation patches have
been queued in tip/sched/core, I wanted to send out the next
chunk of Proxy Execution - an approach for a generalized form of
priority inheritance. 

In this series, I’m only submitting the logic to support Proxy
Execution as both a build and runtime option, the mutex
blocked_on rework, some small fixes for assumptions that proxy
changes, and the initial logic to run lock owners in place of
the waiting task on the same cpu.
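
For readers new to the idea, the last item (running the lock owner
in place of the waiting task, same cpu only) can be pictured with a
small userspace model. This is only a sketch and not code from the
series: the struct layouts, find_proxy_model() and MAX_CHAIN_DEPTH
are made up for illustration, and the real scheduler code has to
deal with locking, races and task state that are ignored here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for kernel structures. */
struct task_model;

struct mutex_model {
	struct task_model *owner;	/* current lock holder, or NULL */
};

struct task_model {
	const char *name;
	int cpu;			/* runqueue this task sits on */
	struct mutex_model *blocked_on;	/* lock being waited for, or NULL */
};

/* Arbitrary bound so a cyclic chain cannot loop forever in this model. */
#define MAX_CHAIN_DEPTH 64

/*
 * Starting from the task the scheduler picked (the "donor"), follow
 * blocked_on -> owner links until a task that is not blocked is found.
 * Returns that task, or NULL when the model gives up: the lock has no
 * owner, the owner is on another cpu (this models same-cpu proxying
 * only), or the chain is too long.
 */
static struct task_model *find_proxy_model(struct task_model *donor,
					   int this_cpu)
{
	struct task_model *p = donor;
	int depth;

	assert(donor != NULL);

	for (depth = 0; depth < MAX_CHAIN_DEPTH; depth++) {
		struct mutex_model *m = p->blocked_on;

		if (!m)
			return p;	/* not blocked: run this task */
		if (!m->owner)
			return NULL;	/* no owner to run; caller picks again */
		if (m->owner->cpu != this_cpu)
			return NULL;	/* owner is on a different cpu */
		p = m->owner;
	}
	return NULL;
}
```

With tasks A -> B -> C chained through two mutexes on the same cpu,
picking A as the donor makes the model return C. If C sits on another
cpu, the model returns NULL instead.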

With v13 of this series, there have been quite a number of
changes:
* Mostly dealing with collisions from changes that landed in
  6.12-rc1
* The most basic handling of delayed-dequeue tasks (just
  deactivate for now)
* Renaming “next” as “donor” to clarify things in proxy related
  functions
* Lots of cleanups

I’ve also continued working on the rest of the series, which
you can find here:
  https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v13-6.12-rc6
  https://github.com/johnstultz-work/linux-dev.git proxy-exec-v13-6.12-rc6

New changes in the full series include:
* After talking with Juri at LPC, he suggested for now that we
  re-add donor migrations of SCHED_DEADLINE tasks, so I’ve
  dropped the logic that previously disabled this.
* Workaround handling for the “lost wakeups” issue (see below)

Issues still to address with the full series:
* The new delayed dequeuing logic added in 6.12-rc1 really
  conceptually collides with proxy-execution: we now have
  multiple kinds of tasks that aren’t runnable for different
  reasons (one may be sleeping, another may be blocked on a
  mutex) but are left on the RQ, and the rules for how we handle
  these un-runnable but enqueued tasks are different for each
  case. Right now my workaround is to just stop proxying if we
  hit a sched_delayed task, but I’d like to have a better
  solution. My plan is to treat it similarly to sleeping tasks,
  and do the same deactivated-owner-queuing (queuing the waiters
  on the sched_delayed task). The problem is that when a
  sched_delayed task gets a wakeup, we won’t hit the logic to do
  the blocked-waiters activation, so I’ll need to change that.
  Just getting it working won’t address the conceptual
  collision, so I’d love any thoughts or feedback on how to
  generalize these two new forms of unrunnable-on-the-runqueue
  states.

* In testing with the full series (again, for clarity, not with
  this same-rq proxying series I’m sending out), I hit some
  rare cases of what seem to be lost wakeups, where a task was
  marked as BO_WAKING, but then ttwu never managed to transition
  it to BO_RUNNABLE. This can cause us to get stuck either in the
  pick-again loop, or in an idle resched loop. I’ve added handlers
  to detect this and to safely do the BO_WAKING -> BO_RUNNABLE
  transition along with return migration if needed to avoid this
  issue, but this really is papering over the underlying issue.
  This has been difficult to diagnose as by the time the issue
  is noticed, the wakeup may have been long in the past and the
  tracebuffer overwritten.

* K Prateek Nayak did some testing with an earlier version of the
  series and saw ~3-5% regressions in some cases. I’m hoping to
  look into this soon to see if we can reduce those further.

* The chain migration functionality needs further iterations and
  better validation to ensure it truly maintains the RT/DL load
  balancing invariants (despite this being broken in vanilla
  upstream with RT_PUSH_IPI currently).
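
To make the lost-wakeup item above a bit more concrete, the
BO_WAKING -> BO_RUNNABLE fixup can be modeled roughly as below. This
is a userspace sketch under assumptions, not the handler from the
full series: BO_BLOCKED, struct task_model, its wake_cpu field and
fixup_stuck_waking() are hypothetical names, and "return migration"
is reduced to assigning a cpu number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* BO_WAKING and BO_RUNNABLE are named above; BO_BLOCKED is assumed. */
enum bo_state_model { BO_RUNNABLE, BO_BLOCKED, BO_WAKING };

struct task_model {
	enum bo_state_model bo_state;
	int cpu;	/* cpu the task is currently enqueued on */
	int wake_cpu;	/* cpu the task should return to (hypothetical) */
};

/*
 * If a task is found stuck in BO_WAKING, finish the transition that
 * the wakeup path never completed. Returns true if a fixup was done.
 */
static bool fixup_stuck_waking(struct task_model *p)
{
	assert(p != NULL);

	if (p->bo_state != BO_WAKING)
		return false;		/* nothing to repair */

	if (p->cpu != p->wake_cpu)
		p->cpu = p->wake_cpu;	/* stand-in for return migration */

	p->bo_state = BO_RUNNABLE;
	return true;
}
```

In this model a task left in BO_WAKING on the wrong cpu ends up
BO_RUNNABLE on its wake_cpu, while tasks in any other state are left
untouched.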

I’d really appreciate any feedback or review thoughts on this
series. I’m trying to keep the chunks small, reviewable and
iteratively testable, but if you have any suggestions on how to
improve the series, I’m all ears.

Credit/Disclaimer:
--------------------
As mentioned previously, this Proxy Execution series has a long
history: 

It was first described in a paper[1] by Watkins, Straub, and
Niehaus, then came patches from Peter Zijlstra, extended with lots
of work by Juri Lelli, Valentin Schneider, and Connor O'Brien (and
thank you to Steven Rostedt for providing additional details here!).

So again, many thanks to those above, as all the credit for this
series really is due to them - while the mistakes are likely mine.

Thanks so much!
-john

[1] https://static.lwn.net/images/conf/rtlws11/papers/proc/p38.pdf


Cc: Joel Fernandes <joelaf@google.com>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Zimuzo Ezeozue <zezeozue@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: kernel-team@android.com



John Stultz (4):
  sched: Add CONFIG_SCHED_PROXY_EXEC & boot argument to enable/disable
  sched: Fix runtime accounting w/ split exec & sched contexts
  sched: Fix psi_dequeue for Proxy Execution
  sched: Add an initial sketch of the find_proxy_task() function

Peter Zijlstra (2):
  locking/mutex: Rework task_struct::blocked_on
  sched: Start blocked_on chain processing in find_proxy_task()

Valentin Schneider (1):
  sched: Fix proxy/current (push,pull)ability

 .../admin-guide/kernel-parameters.txt         |   5 +
 include/linux/sched.h                         |  79 ++++-
 init/Kconfig                                  |   7 +
 init/init_task.c                              |   1 +
 kernel/fork.c                                 |   4 +-
 kernel/locking/mutex-debug.c                  |   9 +-
 kernel/locking/mutex.c                        |  40 ++-
 kernel/locking/ww_mutex.h                     |  24 +-
 kernel/sched/core.c                           | 300 +++++++++++++++++-
 kernel/sched/fair.c                           |  31 +-
 kernel/sched/rt.c                             |  15 +-
 kernel/sched/sched.h                          |  22 +-
 kernel/sched/stats.h                          |   6 +-
 13 files changed, 507 insertions(+), 36 deletions(-)

-- 
2.47.0.199.ga7371fff76-goog