[PATCH v4 00/20] sched: Introduce cpu_preferred_mask and steal-driven vCPU backoff

Shrikanth Hegde posted 20 patches 1 day, 9 hours ago
.../ABI/testing/sysfs-devices-system-cpu      |  11 +
Documentation/scheduler/sched-arch.rst        |  49 ++++
Documentation/scheduler/sched-debug.rst       |  34 +++
drivers/base/cpu.c                            |   8 +
include/linux/cpumask.h                       |  21 +-
include/linux/sched.h                         |  20 +-
kernel/Kconfig.preempt                        |  13 +
kernel/cpu.c                                  |  14 +
kernel/sched/core.c                           | 269 +++++++++++++++++-
kernel/sched/debug.c                          |  55 +++-
kernel/sched/fair.c                           |   4 +
kernel/sched/sched.h                          |  42 +++
12 files changed, 531 insertions(+), 9 deletions(-)
Very briefly,
- Maintain a set of CPUs that the workload can use, denoted as
  cpu_preferred_mask.
- Periodically compute the steal time. If steal time is above the high
  threshold, reduce the number of preferred CPUs; if it is below the low
  threshold, increase it.
- If a CPU is marked as non-preferred, push the task running on it off
  that CPU, if possible.
- Use this CPU state in wakeup and load balancing so that tasks run
  within the preferred CPUs.
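As a rough illustration of the steal-time step above, here is a small user-space sketch in C. It assumes that steal is measured as the growth of cumulative steal time summed over all monitored CPUs, normalised by elapsed time multiplied by the CPU count, and then compared against a high and a low percentage threshold. The struct, the function names and the 10%/5% threshold values are hypothetical and are not taken from the series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch: names and threshold values are illustrative
 * assumptions, not the series' implementation.
 */
#define STEAL_HIGH_PCT 10	/* assumed: above this, shrink the preferred set */
#define STEAL_LOW_PCT   5	/* assumed: below this, grow the preferred set */

enum steal_action { STEAL_KEEP, STEAL_DEC_PREFERRED, STEAL_INC_PREFERRED };

struct steal_sample {
	uint64_t steal_ns;	/* cumulative steal time summed over CPUs */
	uint64_t time_ns;	/* timestamp at which the sample was taken */
};

/* Steal percentage over one interval, normalised by the CPU count. */
static unsigned int steal_pct(const struct steal_sample *prev,
			      const struct steal_sample *cur,
			      unsigned int nr_cpus)
{
	assert(cur->time_ns >= prev->time_ns);

	uint64_t dsteal = cur->steal_ns - prev->steal_ns;
	uint64_t dtime = (cur->time_ns - prev->time_ns) * nr_cpus;

	if (!dtime)
		return 0;	/* no elapsed time: report no steal */
	return (unsigned int)(dsteal * 100 / dtime);
}

/* Map a steal percentage to an action on the preferred set. */
static enum steal_action steal_decide(unsigned int pct)
{
	if (pct > STEAL_HIGH_PCT)
		return STEAL_DEC_PREFERRED;
	if (pct < STEAL_LOW_PCT)
		return STEAL_INC_PREFERRED;
	return STEAL_KEEP;	/* between thresholds: leave the set alone */
}
```

For example, with 8 CPUs sampled 1 s apart and 1.2 s of total steal, steal_pct() returns 15, which is above the assumed 10% high threshold, so steal_decide() asks for one fewer preferred CPU. Keeping a gap between the two thresholds avoids flipping the set on every small change in steal.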

For more details on the idea, problem statement, and performance numbers,
please refer to the v2 cover letter [2] and the OSPM talk [1].

*** Please review and provide your feedback!! ***

[1] OSPM talk: https://youtu.be/adxUKFPlOp0
[2] v2: https://lore.kernel.org/all/20260407191950.643549-1-sshegde@linux.ibm.com/#t
[3] v3: https://lore.kernel.org/all/20260514152204.481115-1-sshegde@linux.ibm.com/

v3->v4: 
- Made the preferred mask a subset of active CPUs instead of online CPUs
  (K Prateek Nayak, Peter Zijlstra).
- Dropped the RT patch.
- Decided that a generic sched_ext change doesn't make sense. Instead,
  it has to be a custom sched_ext scheduler with its own select_cpu,
  enqueue/dequeue, etc. This will be done later.
- Changed is_cpu_allowed/select_fallback_rq to avoid O(N^2) behaviour
  (K Prateek Nayak). Two bits of information are encoded there; let me
  know if this needs to be split into two.
- Added concurrency protection for enabling/disabling the steal monitor
  (Ilya Leoshkevich).
- Dropped tmp_mask and reset the steal_monitor state (Ilya Leoshkevich).
- Added a few cpumask_check() calls (Yury Norov).
- Picked up a tag for patch 1 (thanks to K Prateek Nayak).
- Decided not to add too much complexity for NUMA splicing.
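To illustrate the first v4 change, the following hypothetical user-space model keeps the preferred set a subset of the active set while CPUs are added to or removed from it. A 64-bit word stands in for a cpumask (bit N set means CPU N is in the mask). Dropping the highest-numbered preferred CPU and adding the lowest-numbered active one are assumptions made for this sketch; the series' default arch handling for inc/dec may choose CPUs differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a 64-bit word stands in for a cpumask. */
struct cpu_masks {
	uint64_t active;
	uint64_t preferred;	/* invariant: preferred is a subset of active */
};

/* Re-establish the invariant, e.g. after a CPU leaves the active set. */
static void preferred_sync_active(struct cpu_masks *m)
{
	m->preferred &= m->active;
}

/* Drop the highest-numbered preferred CPU, always keeping at least one. */
static int preferred_dec(struct cpu_masks *m)
{
	for (int cpu = 63; cpu >= 0; cpu--) {
		uint64_t bit = UINT64_C(1) << cpu;

		if (m->preferred & bit) {
			if ((m->preferred & ~bit) == 0)
				return -1;	/* last preferred CPU: keep it */
			m->preferred &= ~bit;
			return cpu;
		}
	}
	return -1;	/* empty preferred set */
}

/* Add the lowest-numbered active CPU that is not yet preferred. */
static int preferred_inc(struct cpu_masks *m)
{
	uint64_t candidates = m->active & ~m->preferred;

	for (int cpu = 0; cpu < 64; cpu++) {
		uint64_t bit = UINT64_C(1) << cpu;

		if (candidates & bit) {
			m->preferred |= bit;
			/* Only active CPUs were candidates, so this holds. */
			assert((m->preferred & ~m->active) == 0);
			return cpu;
		}
	}
	return -1;	/* every active CPU is already preferred */
}
```

Because preferred_inc() only picks from active CPUs and preferred_sync_active() clears anything that became inactive, a preferred CPU can never be one the scheduler is not allowed to use.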

There are no major TODO items at this point. There are a few minor
additions that may be worth doing if the numbers justify them.
Performance numbers are expected to be the same as in v2.

base: tip/sched/core at 
c095741713d1 ("sched/fair: Fix newidle vs core-sched")


Shrikanth Hegde (20):
  sched/debug: Remove unused schedstats
  sched/docs: Document cpu_preferred_mask and Preferred CPU concept
  kconfig: Provide PREFERRED_CPU option
  cpumask: Introduce cpu_preferred_mask
  sysfs: Add preferred CPU file
  sched/core: allow only preferred CPUs in is_cpu_allowed
  sched/fair: Select preferred CPU at wakeup when possible
  sched/fair: load balance only among preferred CPUs
  sched/core: Keep tick on non-preferred CPUs until tasks are out
  sched/core: Push current task from non preferred CPU
  sched/debug: Add migration stats due to non preferred CPUs
  sched/debug: Create debugfs folder steal monitor
  sched/debug: Provide debugfs to enable/disable steal monitor
  sched/core: Introduce a simple steal monitor
  sched/core: Compute steal values at regular intervals
  sched/core: Introduce default arch handling code for inc/dec preferred
    CPUs
  sched/core: Handle steal values and mark CPUs as preferred
  sched/core: Mark the direction of steal values to avoid oscillations
  sched/debug: Add debug knobs for steal monitor
  sched/core: Add a few check for valid CPU in inc/dec of preferred CPUs
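The "Mark the direction of steal values to avoid oscillations" patch suggests some form of hysteresis. One possible approach, sketched below purely as an assumption, is to shrink the preferred set only while steal is high and not already falling, and to grow it only while steal is low and not rising. All names and the exact rule are hypothetical and may differ from what the series does.

```c
#include <assert.h>

/* Hypothetical direction-aware hysteresis; not the series' code. */
enum steal_dir { STEAL_FLAT, STEAL_RISING, STEAL_FALLING };

struct steal_state {
	unsigned int prev_pct;	/* steal % seen in the previous interval */
	enum steal_dir dir;	/* direction relative to the previous interval */
};

/* Return -1 to drop a preferred CPU, +1 to add one, 0 to do nothing. */
static int steal_step(struct steal_state *s, unsigned int pct,
		      unsigned int low, unsigned int high)
{
	int action = 0;

	assert(low < high);

	if (pct > s->prev_pct)
		s->dir = STEAL_RISING;
	else if (pct < s->prev_pct)
		s->dir = STEAL_FALLING;
	else
		s->dir = STEAL_FLAT;

	/* Shrink only while steal is high and not already coming down. */
	if (pct > high && s->dir != STEAL_FALLING)
		action = -1;
	/* Grow only while steal is low and not climbing back up. */
	else if (pct < low && s->dir != STEAL_RISING)
		action = 1;

	s->prev_pct = pct;
	return action;
}
```

For example, a drop from 20% to 15% steal with a 10% high threshold leaves the preferred set unchanged for that interval, since steal is already coming down after the previous reduction.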

-- 
2.47.3