[v3] sched_ext: Improve exit-time diagnostics

[PATCH v3 0/3] sched_ext: Improve exit-time diagnostics

Posted by Changwoo Min 1 month, 2 weeks ago

When sched_ext is disabled by an error, the per-CPU state dump in the
exit info can get truncated on systems with many CPUs. If the CPU that
triggered the exit happens to be in the middle or end of the CPU list,
its state may never appear in the output, making it difficult to
diagnose the failure.

This series addresses that by always dumping the exit CPU first and
surfacing the same CPU id to BPF schedulers and userspace tools.

Patch 1 is a preparatory refactor that extracts the per-CPU dump logic
into a scx_dump_cpu() helper.

Patch 2 adds an exit_cpu field to scx_exit_info and threads it through
the exit path. The scx_exit() wrapper is reworked into a macro that
captures the calling CPU automatically for all error paths, while the
watchdog stall site records cpu_of(rq) explicitly. scx_dump_state()
reports the CPU in the dump header and emits it before the rest of the
per-CPU loop so it survives any output truncation.
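
As a rough illustration of the ordering described above, here is a minimal
userspace sketch. It is not the kernel code: dump_order(), its parameters,
and the max_entries cap (standing in for output truncation) are all
hypothetical names chosen for this example. It only models the idea that a
valid exit CPU is emitted first and skipped in the regular loop, and that a
negative exit_cpu (as in the SysRq-D case) promotes no CPU.

```c
#include <stddef.h>

/*
 * Hypothetical model of the dump order, not the sched_ext implementation.
 *
 * Fills order[] with the sequence in which CPUs would be dumped: the exit
 * CPU first when it is valid, then the remaining CPUs in ascending order.
 * max_entries models truncation of the dump output. Returns the number of
 * entries written.
 */
static size_t dump_order(int nr_cpus, int exit_cpu, int *order,
			 size_t max_entries)
{
	size_t n = 0;

	/* A negative exit_cpu means "no exit CPU"; nothing is promoted. */
	if (exit_cpu >= 0 && exit_cpu < nr_cpus && n < max_entries)
		order[n++] = exit_cpu;

	for (int cpu = 0; cpu < nr_cpus && n < max_entries; cpu++) {
		if (cpu == exit_cpu)
			continue;	/* already emitted at the front */
		order[n++] = cpu;
	}

	return n;
}
```

With 8 CPUs, exit_cpu = 6 and room for only 4 entries, this model emits
6, 0, 1, 2, so the exit CPU survives the cut even though it sits near the
end of the CPU list.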

Patch 3 propagates exit_cpu to struct user_exit_info, the BPF /
userspace shared exit record. UEI_RECORD() defaults the field to -1
before its CO-RE-gated copy so older kernels remain distinguishable
from "exit happened on CPU 0", and UEI_REPORT() appends "on CPU N" to
the EXIT line so scheduler authors see the most diagnostically useful
piece of exit info without cracking open the debug dump.
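
The -1 default can be sketched in plain C as below. This is only a model
under stated assumptions: struct uei_model, record_exit() and
format_exit_line() are hypothetical stand-ins, not the real UEI_RECORD() /
UEI_REPORT() macros, and a NULL pointer stands in for "the running kernel
has no exit_cpu field" (the case the CO-RE gate handles). The exact
placement of "on CPU N" in the real EXIT line may differ.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace model of the shared exit record. */
struct uei_model {
	int exit_cpu;		/* -1 when the kernel did not report one */
	char reason[64];
};

/*
 * Record the exit. kernel_exit_cpu is NULL when the (modelled) kernel
 * lacks the field, so the -1 default set beforehand is left in place.
 */
static void record_exit(struct uei_model *uei, const char *reason,
			const int *kernel_exit_cpu)
{
	snprintf(uei->reason, sizeof(uei->reason), "%s", reason);
	uei->exit_cpu = -1;
	if (kernel_exit_cpu)
		uei->exit_cpu = *kernel_exit_cpu;
}

/* Build the EXIT line, appending the CPU only when it is known. */
static int format_exit_line(const struct uei_model *uei, char *buf,
			    size_t len)
{
	if (uei->exit_cpu >= 0)
		return snprintf(buf, len, "EXIT: %s on CPU %d",
				uei->reason, uei->exit_cpu);
	return snprintf(buf, len, "EXIT: %s", uei->reason);
}
```

Defaulting before the conditional copy is what keeps "field absent" (-1)
distinct from a genuine exit on CPU 0, which a zero-initialized record
could not express.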

Changes since v2:
- Use s32 (instead of int) for the new exit_cpu field and the
  __scx_exit() / scx_vexit() parameter, matching the convention for
  CPU ids in sched_ext.
- v2: https://lore.kernel.org/sched-ext/20260429060726.359024-1-changwoo@igalia.com/

Changes since v1:
- Generalized "stall CPU" to "exit CPU"; the scx_exit_info field is
  now exit_cpu and is populated for any path through scx_exit() /
  __scx_exit() / scx_vexit(), not just the watchdog stall path.
- Added patch 3 to expose exit_cpu via struct user_exit_info.
- SysRq-D initializes exit_cpu to -1 so debug dumps not tied to an
  exit don't arbitrarily promote CPU 0.
- Dump header now reports "on cpu N" alongside the exit kind.
- v1: https://lore.kernel.org/sched-ext/20260408031113.76005-1-changwoo@igalia.com/

Changwoo Min (3):
  sched_ext: Extract scx_dump_cpu() from scx_dump_state()
  sched_ext: Dump the exit CPU first
  sched_ext: Expose exit_cpu to BPF and userspace

 kernel/sched/ext.c                            | 221 ++++++++++--------
 kernel/sched/ext_internal.h                   |   6 +
 .../include/scx/user_exit_info.bpf.h          |   3 +
 tools/sched_ext/include/scx/user_exit_info.h  |   2 +
 .../include/scx/user_exit_info_common.h       |   5 +
 5 files changed, 142 insertions(+), 95 deletions(-)

-- 
2.54.0

Re: [PATCH v3 0/3] sched_ext: Improve exit-time diagnostics

Posted by Tejun Heo 1 month, 2 weeks ago

Hello,

> Changwoo Min (3):
>   sched_ext: Extract scx_dump_cpu() from scx_dump_state()
>   sched_ext: Dump the exit CPU first
>   sched_ext: Expose exit_cpu to BPF and userspace

Applied 1-3 to sched_ext/for-7.2, thank you.

A few things I noticed that might be worth a follow-up:

1. scx_rcu_cpu_stall() takes no cpu, so the captured exit_cpu ends
   up being the detector rather than the stalled one. We could
   probably plumb it through from print_other_cpu_stall(), where
   the stalled cpu is known.

2. scx_hardlockup_irq_workfn() already has the hung cpu locally, so
   passing it via __scx_exit() might be a bit more robust than
   relying on irq_work routing.

3. Minor: "on cpu N" (kernel) vs "on CPU N" (UEI) - the casing
   could probably match.

Thanks.

--
tejun

Re: [PATCH v3 0/3] sched_ext: Improve exit-time diagnostics

Posted by Cheng-Yang Chou 1 month, 2 weeks ago

Hi Tejun,

On Tue, Apr 28, 2026 at 10:57:27PM -1000, Tejun Heo wrote:
> A few things I noticed that might be worth a follow-up:
> 
> 1. scx_rcu_cpu_stall() takes no cpu, so the captured exit_cpu ends
>    up being the detector rather than the stalled one. We could
>    probably plumb it through from print_other_cpu_stall(), where
>    the stalled cpu is known.

Do you mean we should change the function signatures to pass the stalled
CPU through, e.g. panic_on_rcu_stall(int stalled_cpu) and
scx_rcu_cpu_stall(int stalled_cpu)?

> 
> 2. scx_hardlockup_irq_workfn() already has the hung cpu locally, so
>    passing it via __scx_exit() might be a bit more robust than
>    relying on irq_work routing.
> 
> 3. Minor: "on cpu N" (kernel) vs "on CPU N" (UEI) - the casing
>    could probably match.
> 

I have a draft patch and can send it out. If Changwoo or anyone else is
already working on this, pls let me know!

-- 
Cheers,
Cheng-Yang

Re: [PATCH v3 0/3] sched_ext: Improve exit-time diagnostics

Posted by Tejun Heo 1 month, 2 weeks ago

On Wed, Apr 29, 2026 at 07:29:30PM +0800, Cheng-Yang Chou wrote:
> Hi Tejun,
> 
> On Tue, Apr 28, 2026 at 10:57:27PM -1000, Tejun Heo wrote:
> > A few things I noticed that might be worth a follow-up:
> > 
> > 1. scx_rcu_cpu_stall() takes no cpu, so the captured exit_cpu ends
> >    up being the detector rather than the stalled one. We could
> >    probably plumb it through from print_other_cpu_stall(), where
> >    the stalled cpu is known.
> 
> Do you mean we should change the function signatures to pass the stalled
> CPU through, e.g. panic_on_rcu_stall(int stalled_cpu) and
> scx_rcu_cpu_stall(int stalled_cpu)?

Yeah.

Thanks.

-- 
tejun

Re: [PATCH v3 0/3] sched_ext: Improve exit-time diagnostics

Posted by Changwoo Min 1 month, 2 weeks ago

Hi Cheng-Yang,

On 4/29/26 8:29 PM, Cheng-Yang Chou wrote:
>> 2. scx_hardlockup_irq_workfn() already has the hung cpu locally, so
>>     passing it via __scx_exit() might be a bit more robust than
>>     relying on irq_work routing.
>>
>> 3. Minor: "on cpu N" (kernel) vs "on CPU N" (UEI) - the casing
>>     could probably match.
>>
> I have a draft patch and can send it out. If Changwoo or anyone else is
> already working on this, pls let me know!

Feel free to go ahead. Thanks!

Regards,
Changwoo Min