[PATCH v10 0/7] Preparatory changes for Proxy Execution v10

John Stultz posted 7 patches 1 week, 5 days ago
kernel/locking/mutex.c       |  60 +++++++----------
kernel/locking/mutex.h       |  27 ++++++++
kernel/locking/rtmutex.c     |  30 ++++++---
kernel/locking/rwbase_rt.c   |   8 ++-
kernel/locking/rwsem.c       |   4 +-
kernel/locking/spinlock_rt.c |   3 +-
kernel/locking/ww_mutex.h    |  49 ++++++++------
kernel/sched/core.c          | 122 +++++++++++++++++++++--------------
kernel/sched/deadline.c      |  53 ++++++---------
kernel/sched/fair.c          |  18 +++---
kernel/sched/rt.c            |  61 +++++++-----------
kernel/sched/sched.h         |  48 +++++++++++++-
12 files changed, 282 insertions(+), 201 deletions(-)
As mentioned a few times previously[1], after earlier
submissions of the Proxy Execution series didn’t get much in the
way of feedback, it was noted that the patch series was getting
a bit unwieldy to review. Qais suggested I break out just the
cleanups/preparatory components of the patch series and submit
them on their own in the hope we can start to merge the less
complex bits and discussion can focus on the more complicated
portions afterwards.  This so far has not been very successful,
with the submission & RESEND of the v8 & v9 preparatory changes
not getting all that much in the way of review or feedback.

For v10 of this series, I’m again only submitting those early
cleanup/preparatory changes here. However, please let me know if
there is any way to make reviewing the series easier to move
this forward.

In the meantime, I’ve continued to put effort into the full
series, mostly focused on polishing the series for correctness.

Unfortunately, one issue I found took a while to track down and
turned out to be a problem in mainline (the RT_PUSH_IPI feature
broke the RT scheduling invariant; after disabling it I don’t
see problems with mainline or with proxy-exec). But going
through the analysis process was helpful, and I’ve made some
tweaks to Metin’s patch for trace events to make it easier to
follow the proxy behavior using ftrace & perfetto. Doing this
also helped find a case where, when proxy-migrating current, we
first schedule idle but didn’t preserve the needs_resched flag,
needlessly delaying things.

If you are interested, the full v10 series can be found here:
  https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v10-6.9-rc7
  https://github.com/johnstultz-work/linux-dev.git proxy-exec-v10-6.9-rc7


New in v10 (in the preparatory patches submitted here)
---------
* Moved preempt_enable lower, closer to the unlock, as
  suggested by Valentin

* Added additional preempt_disable coverage around the wake_q
  calls as again noted by Valentin

* Handle null lock ptr in __mutex_owner, to simplify later code,
  as suggested by Metin Kaya

* Changed do_push_task to move_queued_task_locked as suggested
  by Valentin

* Use rq_selected in push_rt_task & get_push_task

* Added Reviewed-by tags
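
To illustrate the __mutex_owner item above: the mutex owner field
packs a task pointer together with low flag bits, so a lookup has
to mask the flags off, and tolerating a NULL lock lets callers
skip their own check. What follows is a minimal user-space sketch
and not the patch itself. The toy_* names are hypothetical
stand-ins, and the three-bit mask is assumed to mirror
MUTEX_FLAGS in kernel/locking/mutex.h.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for task_struct. It is aligned to 8 bytes so
 * that the low three bits of its address are always zero. */
struct toy_task {
	_Alignas(8) int pid;
};

/* Assumption: the low three bits carry flags, like MUTEX_FLAGS. */
#define TOY_MUTEX_FLAGS ((uintptr_t)0x07)

/* Hypothetical stand-in for struct mutex: owner pointer ORed with flags. */
struct toy_mutex {
	atomic_uintptr_t owner;
};

/* Record an owner plus flag bits in a single word. */
static void toy_mutex_set_owner(struct toy_mutex *lock,
				struct toy_task *task, uintptr_t flags)
{
	atomic_store(&lock->owner,
		     (uintptr_t)task | (flags & TOY_MUTEX_FLAGS));
}

/* Return the owning task, or NULL if the lock pointer is NULL or the
 * lock currently has no owner. */
static struct toy_task *toy_mutex_owner(struct toy_mutex *lock)
{
	if (!lock)
		return NULL;

	return (struct toy_task *)(atomic_load(&lock->owner) &
				   ~TOY_MUTEX_FLAGS);
}
```

With the NULL check inside the helper, a caller that walks from a
task to the lock it may be blocked on can pass that pointer
straight through without first testing it.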

New in v10 (in the rest of the series)
---------
* Tweak so that if find_proxy_task returns idle, we should
  always preserve needs_resched

* Drop WARN_ON(task_is_blocked(p)) in ttwu current case

* Add more details to the trace events (owner task for proxy
  migrations, and prev, selected and next for task selection)
  so it’s easier to understand the proxy behavior.

* Simplify task_queued_on_rq logic, as suggested by Metin

* Rework from do_push_task usage to move_queued_task_locked

* Further cleanups suggested by Metin


Performance:
---------
K Prateek Nayak provided some feedback on the full v8 series
here[2]. Given the potential extra overhead of doing rq
migrations/return migrations/etc for the proxy case, it’s not
completely surprising a few of K Prateek’s test cases saw ~3-5%
regressions, but I’m hoping to look into this soon to see if we
can reduce those further.


Issues still to address:
---------
* The chain migration functionality needs further iterations and
  better validation to ensure it truly maintains the RT/DL load
  balancing invariants.

* CFS load balancing. There was concern that blocked tasks may
  carry forward load (PELT) to the lock owner's CPU, so the CPU
  may look like it is overloaded. Needs investigation.

* The sleeping owner handling (where we deactivate waiting tasks
  and enqueue them onto a list, then reactivate them when the
  owner wakes up) doesn’t feel great. This is in part because
  when we want to activate tasks, we’re already holding a
  task.pi_lock and a rq_lock, just not the locks for the task
  we’re activating, nor the rq we’re enqueuing it onto. So there
  has to be a bit of lock juggling to drop and acquire the right
  locks (in the right order). It feels like there’s got to be a
  better way. Also needs some rework to get rid of the
  recursion.
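
The lock juggling described in the last item can be sketched in
user space. This is only an illustration of the general
drop-and-reacquire pattern, not code from the series: it models
just two runqueue-style locks (no pi_lock), uses pthread mutexes,
and picks address order as the locking order. All toy_* names
are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a runqueue: a lock plus a task count. */
struct toy_rq {
	pthread_mutex_t lock;
	int nr_running;
};

/* Take both locks in address order so that two callers locking the
 * same pair can never deadlock against each other (ABBA). */
static void toy_double_lock(struct toy_rq *a, struct toy_rq *b)
{
	if (a == b) {
		pthread_mutex_lock(&a->lock);
		return;
	}
	if ((uintptr_t)a > (uintptr_t)b) {
		struct toy_rq *tmp = a;

		a = b;
		b = tmp;
	}
	pthread_mutex_lock(&a->lock);
	pthread_mutex_lock(&b->lock);
}

/* Called with held->lock taken. Enqueue one task on target and return
 * with only held->lock taken, as on entry. */
static void toy_activate_on(struct toy_rq *held, struct toy_rq *target)
{
	if (held != target) {
		/* We hold the "wrong" lock: drop it, then retake both
		 * in the agreed order. */
		pthread_mutex_unlock(&held->lock);
		toy_double_lock(held, target);
	}

	/* Anything read under held->lock before the drop may be stale
	 * here; real code would have to revalidate it. */
	target->nr_running++;

	if (held != target)
		pthread_mutex_unlock(&target->lock);
}
```

Having to drop the lock already held is what makes this
uncomfortable: every assumption made before the drop has to be
rechecked once the locks are retaken.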


Credit/Disclaimer:
---------
As mentioned previously, this Proxy Execution series has a long
history:

It was first described in a paper[3] by Watkins, Straub, and
Niehaus, then came patches from Peter Zijlstra, extended with
lots of work by Juri Lelli, Valentin Schneider, and Connor
O'Brien (and thank you to Steven Rostedt for providing
additional details here!).

So again, many thanks to those above, as all the credit for this
series really is due to them - while the mistakes are likely
mine.

Thanks so much!
-john

[1] https://lore.kernel.org/lkml/20240401234439.834544-1-jstultz@google.com/
[2] https://lore.kernel.org/lkml/c26251d2-e1bf-e5c7-0636-12ad886e1ea8@amd.com/
[3] https://static.lwn.net/images/conf/rtlws11/papers/proc/p38.pdf

Cc: Joel Fernandes <joelaf@google.com>
Cc: Qais Yousef <qyousef@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Zimuzo Ezeozue <zezeozue@google.com>
Cc: Youssef Esmat <youssefesmat@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-team@android.com


Connor O'Brien (2):
  sched: Add move_queued_task_locked helper
  sched: Consolidate pick_*_task to task_is_pushable helper

John Stultz (1):
  sched: Split out __schedule() deactivate task logic into a helper

Juri Lelli (2):
  locking/mutex: Make mutex::wait_lock irq safe
  locking/mutex: Expose __mutex_owner()

Peter Zijlstra (2):
  locking/mutex: Remove wakeups from under mutex::wait_lock
  sched: Split scheduler and execution contexts

 kernel/locking/mutex.c       |  60 +++++++----------
 kernel/locking/mutex.h       |  27 ++++++++
 kernel/locking/rtmutex.c     |  30 ++++++---
 kernel/locking/rwbase_rt.c   |   8 ++-
 kernel/locking/rwsem.c       |   4 +-
 kernel/locking/spinlock_rt.c |   3 +-
 kernel/locking/ww_mutex.h    |  49 ++++++++------
 kernel/sched/core.c          | 122 +++++++++++++++++++++--------------
 kernel/sched/deadline.c      |  53 ++++++---------
 kernel/sched/fair.c          |  18 +++---
 kernel/sched/rt.c            |  61 +++++++-----------
 kernel/sched/sched.h         |  48 +++++++++++++-
 12 files changed, 282 insertions(+), 201 deletions(-)

-- 
2.45.0.rc1.225.g2a3ae87e7f-goog