[PATCH v25 0/9] Simple Donor Migration for Proxy Execution

John Stultz posted 9 patches 3 weeks, 4 days ago
There is a newer version of this series
Hey All,

Yet another iteration on the next chunk of the Proxy Exec
series: Simple Donor Migration

This is just the next step for Proxy Execution, to allow us to
migrate blocked donors across runqueues to boost remote lock
owners.

As always, I’m trying to submit this larger work in smallish
digestible pieces. In this portion of the series, I’m only
submitting for review and consideration some recent fixups and
the logic that allows us to do donor (blocked waiter) migration.
That logic requires some additional changes to locking and extra
state tracking to ensure we don’t accidentally run a migrated
donor on a CPU it isn’t affined to, as well as some extra
handling to deal with balance callback state that needs to be
reset when we decide to pick a different task after doing donor
migration.
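
For readers new to the idea, here is a rough userspace sketch of
what donor migration means. It is a toy model under stated
assumptions, not the code in these patches: every toy_* name is
hypothetical, there is no locking, blocked-on chains are assumed
to be acyclic, and CPUs are plain bits in an unsigned long.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model. These types are hypothetical and do not mirror
 * the kernel's task_struct, mutex, or runqueue structures. */
struct toy_mutex;

struct toy_task {
	int cpu;                      /* runqueue the task is queued on */
	unsigned long cpus_allowed;   /* bitmask of CPUs it may execute on */
	struct toy_mutex *blocked_on; /* lock it waits for, NULL if runnable */
};

struct toy_mutex {
	struct toy_task *owner;       /* current lock holder, NULL if free */
};

/* Follow blocked_on -> owner links until reaching a task that is not
 * waiting on an owned lock. Assumes the chain has no cycles. */
static struct toy_task *toy_find_owner(struct toy_task *donor)
{
	struct toy_task *p = donor;

	assert(donor != NULL);
	while (p->blocked_on && p->blocked_on->owner)
		p = p->blocked_on->owner;
	return p;
}

/* Move a blocked donor to the runqueue of the task at the end of its
 * chain. Returns 1 if the donor changed runqueues, 0 otherwise. */
static int toy_migrate_donor(struct toy_task *donor)
{
	struct toy_task *owner = toy_find_owner(donor);

	if (owner == donor || owner->cpu == donor->cpu)
		return 0;
	donor->cpu = owner->cpu; /* the donor's affinity is not consulted */
	return 1;
}

/* A task may only execute for itself where it is runnable and its
 * affinity mask includes the CPU it is queued on. */
static int toy_may_run_here(const struct toy_task *p)
{
	return p->blocked_on == NULL && ((p->cpus_allowed >> p->cpu) & 1UL);
}
```

The point of the sketch is the last helper: once a donor has been
moved to the owner's runqueue, it can end up queued on a CPU
outside its own affinity mask, so something has to stop it from
being run there directly once it stops being blocked.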

Much of the new logic in this version is thanks to K Prateek,
who provided a lot of insightful suggestions to the v24 series!

New in this iteration:
* With additional changes, the previous full Donor Migration
  series had gotten pretty long, so to go easy on reviewers I’ve
  dropped the later Donor Migration patches I had in v24, which
  basically provided optimizations so try_to_wake_up() would do
  return-migration, smarter mutex handoffs, and proxy migrating
  the entire chain in one pass. K Prateek also had some
  suggestions for further improvements in these later patches
  that I have not yet addressed, so for now I’m going to table
  them and will revisit once progress is made with this set.

* Fix for proxy_tag_curr() erroneously leaving tasks off of the
  pushable list, reported by K Prateek and suggested by Peter,
  allowing us to drop the proxy_tag_curr() logic completely.

* Peter noted compilers don’t always optimize as we would like,
  and suggested reworked logic to reduce repetitive
  sched_proxy_exec() branches.

* Rework of proxy_force_return() suggested by K Prateek to use
  WF_TTWU flags, and to use attach_one_task() helper to simplify
  code.

* Other small cleanups through the series suggested by
  K Prateek.
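
To illustrate the return-migration idea behind the
proxy_force_return() item above, here is a small userspace
sketch. It is only a toy model under stated assumptions: the
toy_* names and the TOY_PROXY_WAKING sentinel are hypothetical
stand-ins, nothing here uses WF_TTWU or attach_one_task(), and
the real code has to deal with runqueue locking that is ignored
here.

```c
#include <assert.h>
#include <stddef.h>

struct toy_mutex { int unused; };

/* Hypothetical sentinel meaning "no longer waiting on a lock, but not
 * yet confirmed to be on a CPU the task is allowed to run on". */
#define TOY_PROXY_WAKING ((struct toy_mutex *)-1L)

struct toy_task {
	int cpu;                      /* runqueue the task is queued on */
	unsigned long cpus_allowed;   /* bitmask of CPUs it may execute on */
	struct toy_mutex *blocked_on; /* NULL, a lock, or TOY_PROXY_WAKING */
};

/* Lowest-numbered CPU set in the mask, or -1 if the mask is empty. */
static int toy_first_allowed_cpu(unsigned long mask)
{
	int cpu;

	for (cpu = 0; cpu < (int)(8 * sizeof(mask)); cpu++)
		if ((mask >> cpu) & 1UL)
			return cpu;
	return -1;
}

/* The lock was handed over: mark the waiter as waking instead of
 * clearing blocked_on outright, so a later step can fix up its CPU. */
static void toy_mark_waking(struct toy_task *p)
{
	if (p->blocked_on)
		p->blocked_on = TOY_PROXY_WAKING;
}

/* If a waking task sits on a CPU outside its affinity mask, move it to
 * an allowed one before it becomes runnable. Returns 1 if it moved. */
static int toy_force_return(struct toy_task *p)
{
	int moved = 0;

	if (p->blocked_on != TOY_PROXY_WAKING)
		return 0;
	if (!((p->cpus_allowed >> p->cpu) & 1UL)) {
		int target = toy_first_allowed_cpu(p->cpus_allowed);

		assert(target >= 0);
		p->cpu = target;
		moved = 1;
	}
	p->blocked_on = NULL; /* only now is the task plainly runnable */
	return moved;
}
```

The design choice the sketch tries to show is the intermediate
state: a task that has stopped waiting is not treated as plainly
runnable until its placement has been checked.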

I’d love to get further feedback on any place where these
patches are confusing, or could use additional clarifications.

There have also been some further improvements in the full Proxy
Execution series:
* David Stevens reported and diagnosed an issue with loadavg
  being incorrect due to incorrect nr_uninterruptible accounting
  in the sleeping-owner handling. 

* An issue with rwsem support was found and fixed, along with
  other simplifications to the changes.

* Fix suggested by Peter for an edge case with DL adding tasks
  twice to the pushable list when Proxy Exec pushes the donor
  task.

* K Prateek had further suggestions to improve the optimized
  donor migration changes, dropping the unnecessary
  migration_node addition to the task_struct, and using
  attach_tasks() to simplify the full chain migration.

* Tiffany Yang pointed out some unnecessary CONFIG_SMP bits
  were still lingering and could be cleaned up.

* An initial draft of a Documentation update describing Proxy
  Execution.

I’d appreciate any testing or comments that folks have with
the full set!

You can find the full Proxy Exec series here:
  https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v25-7.0-rc3/
  https://github.com/johnstultz-work/linux-dev.git proxy-exec-v25-7.0-rc3


Issues still to address with the full series:
* Resolve a regression in the later optimized donor-migration
  changes combined with “Fix 'stuck' dl_server” change in 6.19

* With the full series against 7.0-rc3, when doing heavy stress
  testing, I’m occasionally hitting crashes due to a NULL return
  from __pick_eevdf(). I need to dig into this and find out why
  it doesn’t happen against 6.18.

* Try to integrate and rework K Prateek’s suggestions for the
  later optimized donor-migration changes.

* Continue working to get sched_ext to be ok with Proxy
  Execution enabled.

* Reevaluate performance regression K Prateek Nayak found with
  the full series.

* The chain migration functionality needs further iterations and
  better validation to ensure it truly maintains the RT/DL load
  balancing invariants (despite this being broken in vanilla
  upstream with RT_PUSH_IPI currently)

Future work:
* Expand to more locking primitives: Figuring out pi-futexes
  would be good, using proxy for Binder PI is something else
  we’re exploring.

* Eventually: Work to replace rt_mutexes and get things happy
  with PREEMPT_RT

I’d really appreciate any feedback or review thoughts on the
full series as well. I’m trying to keep the chunks small,
reviewable and iteratively testable, but if you have any
suggestions on how to improve the larger series, I’m all ears.

Credit/Disclaimer:
------------------
As always, this Proxy Execution series has a long history with
lots of developers that deserve credit:

First described in a paper[1] by Watkins, Straub, Niehaus, then
in patches from Peter Zijlstra, extended with lots of work by
Juri Lelli, Valentin Schneider, and Connor O'Brien (and thank
you to Steven Rostedt for providing additional details here!).
Thanks also to Joel Fernandes, Dietmar Eggemann, Metin Kaya,
K Prateek Nayak and Suleiman Souhlal for their substantial
review, suggestion, and patch contributions.

So again, many thanks to those above, as all the credit for this
series really is due to them - while the mistakes are surely mine.

Thanks so much!
-john

[1] https://static.lwn.net/images/conf/rtlws11/papers/proc/p38.pdf

Cc: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Zimuzo Ezeozue <zezeozue@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: kuyo chang <kuyo.chang@mediatek.com>
Cc: hupu <hupu.gm@gmail.com>
Cc: kernel-team@android.com

John Stultz (9):
  sched: Make class_schedulers avoid pushing current, and get rid of
    proxy_tag_curr()
  sched: Minimise repeated sched_proxy_exec() checking
  locking: Add task::blocked_lock to serialize blocked_on state
  sched: Fix modifying donor->blocked on without proper locking
  sched/locking: Add special p->blocked_on==PROXY_WAKING value for proxy
    return-migration
  sched: Add assert_balance_callbacks_empty helper
  sched: Add logic to zap balance callbacks if we pick again
  sched: Move attach_one_task and attach_task helpers to sched.h
  sched: Handle blocked-waiter migration (and return migration)

 include/linux/sched.h        |  91 +++++++----
 init/init_task.c             |   1 +
 kernel/fork.c                |   1 +
 kernel/locking/mutex-debug.c |   4 +-
 kernel/locking/mutex.c       |  40 +++--
 kernel/locking/mutex.h       |   6 +
 kernel/locking/ww_mutex.h    |  16 +-
 kernel/sched/core.c          | 300 +++++++++++++++++++++++++++++------
 kernel/sched/deadline.c      |  16 +-
 kernel/sched/fair.c          |  26 ---
 kernel/sched/rt.c            |  15 +-
 kernel/sched/sched.h         |  35 +++-
 12 files changed, 414 insertions(+), 137 deletions(-)

-- 
2.53.0.880.g73c4285caa-goog