[PATCH 0/4] sched_ext: fix three unreliable selftests and compat kfunc signature

zhidao su posted 4 patches 6 days, 20 hours ago
Three selftests added in c50dcf533149 ("selftests/sched_ext: Add tests
for SCX_ENQ_IMMED and scx_bpf_dsq_reenq()") have design flaws that
prevent them from reliably triggering the kernel behavior they were
written to verify.  This series fixes all three, plus a related
compat.bpf.h signature bug.

- Patch 1: compat.bpf.h declares scx_bpf_dsq_reenq___compat with an
  explicit aux__prog parameter, but scx_bpf_dsq_reenq() is a
  KF_IMPLICIT_ARGS kfunc, so the BPF-side declaration must NOT include
  the implicit parameter.

- Patch 2 (enq_immed): when the IMMED slow path fires, reenqueued tasks
  were re-inserted into CPU 0's local DSQ, immediately re-triggering
  the slow path and creating an infinite cycle that hits
  SCX_REENQ_LOCAL_MAX_REPEAT (256).  Fix: redirect reenqueued tasks to
  SCX_DSQ_GLOBAL.

- Patch 3 (dsq_reenq): the dispatch path consumed USER_DSQ directly,
  racing with the 5 ms BPF timer.  By the time the timer fired, USER_DSQ
  was empty and scx_bpf_dsq_reenq() found nothing to reenqueue.  Fix: a
  two-DSQ design where USER_DSQ is never touched by dispatch.

- Patch 4 (consume_immed): workers were spread across all CPUs, so
  USER_DSQ usually had only one task and dsq->nr never exceeded 1.  Fix:
  pin all workers to CPU 0 via CPU affinity so USER_DSQ always has a
  backlog, and add a dispatch loop to accumulate 2+ tasks and reliably
  trigger the IMMED slow path.
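
The livelock described for patch 2 can be illustrated with a small
userspace toy model.  This is only a sketch, not kernel or BPF code:
struct model, pick_target() and run_reenq() are hypothetical names, the
queues are reduced to plain counters, and the only value taken from the
description above is the repeat cap of 256.

```c
#include <stdbool.h>

/* Mirrors the SCX_REENQ_LOCAL_MAX_REPEAT value quoted in the cover letter. */
#define REENQ_MAX_REPEAT 256

enum target { TARGET_LOCAL_CPU0, TARGET_GLOBAL };

/* Toy state: number of tasks waiting on CPU 0's local queue and on the
 * global queue.  Real DSQs hold tasks, not counters. */
struct model {
	int local_nr;
	int global_nr;
};

/* Hypothetical enqueue policy for a reenqueued task.  Without the
 * redirect the task goes back to CPU 0's local queue (the old test
 * behavior); with it the task goes to the global queue (the fix). */
static enum target pick_target(bool redirect_reenq)
{
	return redirect_reenq ? TARGET_GLOBAL : TARGET_LOCAL_CPU0;
}

/* Run reenqueue passes until the local queue is empty or the cap is
 * reached.  Returns the number of passes executed. */
static int run_reenq(struct model *m, bool redirect_reenq)
{
	int passes = 0;

	while (m->local_nr > 0 && passes < REENQ_MAX_REPEAT) {
		int pending = m->local_nr;

		/* Pull every waiting task off the local queue ... */
		m->local_nr = 0;

		/* ... and hand each one back to the enqueue policy. */
		for (int i = 0; i < pending; i++) {
			if (pick_target(redirect_reenq) == TARGET_LOCAL_CPU0)
				m->local_nr++;
			else
				m->global_nr++;
		}
		passes++;
	}
	return passes;
}
```

In this model, starting with two tasks on the local queue, the
non-redirecting policy puts both tasks back on every pass and stops only
at the cap, while the redirecting policy drains the local queue in a
single pass.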

All three tests were verified with virtme-ng (PASSED: 30, FAILED: 0
across the full sched_ext selftest suite).

zhidao su (4):
  tools/sched_ext: compat.bpf.h: fix scx_bpf_dsq_reenq___compat
    signature
  selftests/sched_ext: enq_immed: fix IMMED reenqueue livelock
  selftests/sched_ext: dsq_reenq: fix reliability with two-DSQ design
  selftests/sched_ext: consume_immed: fix reliability with CPU affinity

 tools/sched_ext/include/scx/compat.bpf.h      |  4 +-
 .../selftests/sched_ext/consume_immed.bpf.c   | 19 ++++++--
 .../selftests/sched_ext/consume_immed.c       | 15 +++++--
 .../selftests/sched_ext/dsq_reenq.bpf.c       | 45 ++++++++++++++-----
 .../selftests/sched_ext/enq_immed.bpf.c       | 10 ++++-
 5 files changed, 73 insertions(+), 20 deletions(-)

-- 
2.43.0