[PATCH 0/2] sched_ext: Fix SCX_KICK_WAIT cycle deadlock

Christian Loehle posted 2 patches 3 weeks ago
Posted by Christian Loehle 3 weeks ago
When using SCX_KICK_WAIT I noticed that it lacks any deadlock
prevention: if CPUs kick-wait on each other in a cycle, they deadlock
with no chance of sched_ext even ejecting the BPF scheduler.
The BPF scheduler cannot impose any reasonable synchronisation itself,
except for a strict partition of which CPUs are allowed to SCX_KICK_WAIT
which other CPUs.
Since SCX_KICK_WAIT seems to be used quite rarely, just synchronize
all SCX_KICK_WAIT globally and don't try to be clever about cycle
detection.
Also add a testcase that reproduces the issue.

Christian Loehle (2):
  sched_ext: Prevent SCX_KICK_WAIT deadlock by serialization
  sched_ext/selftests: Add SCX_KICK_WAIT cycle tests

 kernel/sched/ext.c                            |  45 +++-
 tools/testing/selftests/sched_ext/Makefile    |   1 +
 .../selftests/sched_ext/wait_kick_cycle.bpf.c |  70 ++++++
 .../selftests/sched_ext/wait_kick_cycle.c     | 223 ++++++++++++++++++
 4 files changed, 337 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/sched_ext/wait_kick_cycle.bpf.c
 create mode 100644 tools/testing/selftests/sched_ext/wait_kick_cycle.c

-- 
2.34.1
Re: [PATCH 1/2] sched_ext: Prevent SCX_KICK_WAIT deadlock by serialization
Posted by Tejun Heo 1 week, 2 days ago
Hello,

I posted an alternative fix here:

  https://lore.kernel.org/r/20260329001856.835643-1-tj@kernel.org

Instead of serializing the kicks, it defers the wait to a balance callback
which can drop the rq lock and enable IRQs, avoiding the deadlock while
preserving the concurrent kick_wait semantics.

Thanks.

--
tejun