[PATCH 0/3] sched_ext: Fix kick_cpus_irq_workfn() NULL pointer warning during exit

Zhao Mengmeng posted 3 patches 2 months, 1 week ago
From: Zhao Mengmeng <zhaomengmeng@kylinos.cn>

On an 8-core PC running scx_central under stress-ng -c 2 --io 8,
frequently enabling and disabling scx_central makes dmesg print the
following message:

[   38.214510] sched_ext: BPF scheduler "central" enabled
[   47.809167] hrtimer: interrupt took 11287895 ns
[   48.233813] sched_ext: BPF scheduler "central" disabled (unregistered from user space)
[   48.248460] kick_cpus_irq_workfn() called with NULL scx_kick_syncs <---- here
[   54.597735] sched_ext: BPF scheduler "central" enabled

After adding dump_stack() to the warning path, the trace shows:

[  231.269159] sched_ext: BPF scheduler "central" disabled (unregistered from user space)
[  231.278500] CPU: 0 UID: 0 PID: 123 Comm: tmux: server Not tainted 7.0.0-rc2-virtme #13 PREEMPT(full)
[  231.278519] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  231.278536] Call Trace:
[  231.278556]  <IRQ>
[  231.278571]  dump_stack_lvl+0x6f/0xb0
[  231.278640]  kick_cpus_irq_workfn.cold+0x19/0x57
[  231.278670]  ? mark_held_locks+0x40/0x70
[  231.278691]  ? _raw_spin_unlock_irqrestore+0x4b/0x70
[  231.278716]  ? __pfx_kick_cpus_irq_workfn+0x10/0x10
[  231.278750]  irq_work_single+0xcd/0x250
[  231.278776]  irq_work_run_list+0x6a/0xa0
[  231.278794]  irq_work_run+0x18/0x30
[  231.278805]  __sysvec_irq_work+0x4f/0x320
[  231.278842]  sysvec_irq_work+0x43/0xb0
[  231.278863]  asm_sysvec_irq_work+0x1a/0x20
[  231.278883] RIP: 0010:scx_kick_cpu.part.0+0x25b/0x680
[  231.278897] Code: 00 02 75 31 48 89 df e8 03 34 fd ff 4d 85 ed 0f 84 3c ff
[  231.278906] RSP: 0018:ffff888035e08cb8 EFLAGS: 00000286
[  231.278934] RAX: 0000000000000082 RBX: 0000000000000000 RCX: ffffffff88cec5cb
[  231.278946] RDX: ffff888002821d40 RSI: 0000000000000000 RDI: ffffffff88cec5cb
[  231.278956] RBP: ffff888035e49d00 R08: 0000000000000000 R09: 0000000000000001
[  231.278961] R10: ffff8880028228a0 R11: ffff888002821d40 R12: 0000000000000004
[  231.278967] R13: 0000000000000200 R14: 0000000000000002 R15: 0000000400000000
[  231.278998]  ? scx_kick_cpu.part.0+0x24b/0x680
[  231.279015]  ? scx_kick_cpu.part.0+0x24b/0x680
[  231.279039]  ? ops_cpu_valid+0x6c/0xf0
[  231.279060]  scx_bpf_kick_cpu+0xba/0x2e0
[  231.279086]  bpf_prog_402c05d73c49876a_central_timerfn+0x24f/0x26c
[  231.279117]  ? 0xffffffffc0409050
[  231.279150]  ? rcu_read_lock_bh_held+0x2b/0x60
[  231.279181]  bpf_timer_cb+0x168/0x2b0
[  231.279201]  ? __pfx_bpf_timer_cb+0x10/0x10

The sequence is as follows: scx_root_disable() sets scx_kick_syncs to
NULL in free_kick_syncs() and then exits bypass mode. The BPF timer is
stored in a BPF map, which is destroyed only when the BPF program exits,
so the timer can still fire at this point and call scx_bpf_kick_cpu().
Because bypass mode is already off, the call passes the scx_bypassing()
check and queues the irq work. As a result, the NULL check in
kick_cpus_irq_workfn() is hit.

Fix it in two ways:
1. In the kernel, check sch->aborting in scx_kick_cpu() and reject new
kicks while the scheduler is aborting.
2. In scx_central, cancel the BPF timer in ops.exit, which is called
before free_kick_syncs(), so the CPU kick is stopped at its source.
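
To make the ordering easier to follow, here is a small userspace C model
of the teardown sequence described above. It is only a sketch and not
kernel code: struct model, simulate_disable() and the other helpers are
hypothetical names made up for illustration, and the model assumes that
the aborting flag is already set by the time free_kick_syncs() runs. It
counts how often the modelled irq work function sees a NULL kick_syncs
pointer, with and without each of the two guards.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy state; field names mirror the cover letter, nothing here is kernel API. */
struct model {
	int *kick_syncs;	/* stands in for scx_kick_syncs */
	bool bypassing;		/* stands in for scx_bypassing() */
	bool aborting;		/* stands in for sch->aborting (assumed set early) */
	bool timer_armed;	/* the BPF timer that lives in a BPF map */
	bool irq_work_pending;	/* a kick has been queued */
	int null_hits;		/* times the work function saw NULL kick_syncs */
};

/* Modelled kick path: queue irq work unless one of the checks rejects it. */
static void model_kick_cpu(struct model *m, bool check_aborting)
{
	if (m->bypassing)
		return;
	if (check_aborting && m->aborting)
		return;		/* guard 1: no new kicks while aborting */
	m->irq_work_pending = true;
}

/* Modelled irq work function: record a hit if kick_syncs is already gone. */
static void model_irq_workfn(struct model *m)
{
	if (!m->irq_work_pending)
		return;
	m->irq_work_pending = false;
	if (m->kick_syncs == NULL)
		m->null_hits++;
}

/*
 * Replay the disable sequence once and return the number of NULL hits.
 * check_aborting enables guard 1, cancel_timer_in_exit enables guard 2.
 */
int simulate_disable(bool check_aborting, bool cancel_timer_in_exit)
{
	static int syncs;
	struct model m = {
		.kick_syncs = &syncs,
		.timer_armed = true,
	};

	/* Disable starts: the scheduler is aborting and enters bypass mode. */
	m.aborting = true;
	m.bypassing = true;

	/* Guard 2: the exit callback cancels the timer before the free. */
	if (cancel_timer_in_exit)
		m.timer_armed = false;

	m.kick_syncs = NULL;	/* models free_kick_syncs() */
	m.bypassing = false;	/* bypass mode is exited afterwards */

	/* The timer outlives the scheduler and may fire now. */
	if (m.timer_armed)
		model_kick_cpu(&m, check_aborting);
	model_irq_workfn(&m);

	return m.null_hits;
}
```

In this model simulate_disable(false, false) reports one NULL hit, while
enabling either guard brings the count to zero. The two guards are
independent: the first protects the kernel against any scheduler that
kicks late, the second removes the late kick in scx_central itself.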

In addition, patch 3 proposes replacing pr_warn_once() with
WARN_ON_ONCE() as a stronger check for this race. If that is considered
inappropriate, it can be dropped.

Zhao Mengmeng (3):
  sched_ext: Prevent CPU kicks from exiting schedulers
  scx_central: Cancel BPF timer during ops.exit
  sched_ext: Use WARN_ON_ONCE to check whether scx_kick_syncs is NULL

 kernel/sched/ext.c                | 7 ++++---
 tools/sched_ext/scx_central.bpf.c | 7 +++++++
 2 files changed, 11 insertions(+), 3 deletions(-)

-- 
2.43.0