Hi Peter,
thanks for jumping on this. Comments below.
On Wed, Sep 10, 2025 at 05:44:09PM +0200, Peter Zijlstra wrote:
> Hi,
>
> As mentioned [1], a fair amount of sched ext weirdness (current and proposed)
> is down to the core code not quite working right for shared runqueue stuff.
>
> Instead of endlessly hacking around that, bite the bullet and fix it all up.
>
> With these patches, it should be possible to clean up pick_task_scx() to not
> rely on balance_scx(). Additionally it should be possible to fix that RT issue,
> and the dl_server issue without further propagating lock breaks.
>
> As is, these patches boot and run/pass selftests/sched_ext with lockdep on.
>
> I meant to do more sched_ext cleanups, but since this has all already taken
> longer than I would've liked (real life interrupted :/), I figured I should
> post this as is and let TJ/Andrea poke at it.
>
> Patches are also available at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/cleanup
>
>
> [1] https://lkml.kernel.org/r/20250904202858.GN4068168@noisy.programming.kicks-ass.net

I've done a quick test with this patch set applied and I was able to
trigger the following lockdep splat:
[ 49.746281] ============================================
[ 49.746457] WARNING: possible recursive locking detected
[ 49.746559] 6.17.0-rc4-virtme #85 Not tainted
[ 49.746666] --------------------------------------------
[ 49.746763] stress-ng-race-/5818 is trying to acquire lock:
[ 49.746856] ffff890e0adacc18 (&dsq->lock){-.-.}-{2:2}, at: dispatch_dequeue+0x125/0x1f0
[ 49.747052]
[ 49.747052] but task is already holding lock:
[ 49.747234] ffff890e0adacc18 (&dsq->lock){-.-.}-{2:2}, at: task_rq_lock+0x6c/0x170
[ 49.747416]
[ 49.747416] other info that might help us debug this:
[ 49.747557] Possible unsafe locking scenario:
[ 49.747557]
[ 49.747689] CPU0
[ 49.747740] ----
[ 49.747793] lock(&dsq->lock);
[ 49.747867] lock(&dsq->lock);
[ 49.747950]
[ 49.747950] *** DEADLOCK ***
[ 49.747950]
[ 49.748086] May be due to missing lock nesting notation
[ 49.748086]
[ 49.748197] 3 locks held by stress-ng-race-/5818:
[ 49.748335] #0: ffff890e0f0fce70 (&p->pi_lock){-.-.}-{2:2}, at: task_rq_lock+0x38/0x170
[ 49.748474] #1: ffff890e3b6bcc98 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x20/0xa0
[ 49.748652] #2: ffff890e0adacc18 (&dsq->lock){-.-.}-{2:2}, at: task_rq_lock+0x6c/0x170

Reproducer:
$ cd tools/sched_ext
$ make scx_simple
$ sudo ./build/bin/scx_simple
... and in another shell
$ stress-ng --race-sched 0

I added an explicit BUG_ON() to see where the double locking happens
(a rough sketch of the check is included after the trace):
[ 15.160400] Call Trace:
[ 15.160706] dequeue_task_scx+0x14a/0x270
[ 15.160857] move_queued_task+0x7d/0x2d0
[ 15.160952] affine_move_task+0x6ca/0x700
[ 15.161210] __set_cpus_allowed_ptr+0x64/0xa0
[ 15.161348] __sched_setaffinity+0x72/0x100
[ 15.161459] sched_setaffinity+0x261/0x2f0
[ 15.161569] __x64_sys_sched_setaffinity+0x50/0x80
[ 15.161705] do_syscall_64+0xbb/0x370
[ 15.161816] entry_SYSCALL_64_after_hwframe+0x77/0x7f
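
For reference, the check was nothing more sophisticated than roughly
this (lockdep-only debug hack, obviously not meant for merging; exact
placement reconstructed from memory, right before dispatch_dequeue()
takes dsq->lock in kernel/sched/ext.c):

	/*
	 * Fire if this context already holds dsq->lock, i.e. the
	 * recursive acquisition the splat above complains about.
	 * Needs lockdep enabled for lockdep_is_held().
	 */
	BUG_ON(lockdep_is_held(&dsq->lock));
	raw_spin_lock(&dsq->lock);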

Looking at the splat, with this series task_rq_lock() now also acquires
dsq->lock (lock #2 above), and dispatch_dequeue() then tries to take it
again in the same context. Are we missing a DEQUEUE_LOCKED in the
sched_setaffinity() path?
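
To make the question more concrete, below is a completely untested
sketch of what I have in mind, assuming DEQUEUE_LOCKED in your series
means "task_rq_lock() already holds the class lock" (dsq->lock in the
scx case); the dequeue done by the migration path would then need to
carry the flag so dispatch_dequeue() doesn't retake dsq->lock:

/* kernel/sched/core.c, surrounding code elided / quoted from memory */
static struct rq *move_queued_task(struct rq *rq, struct rq_flags *rf,
				   struct task_struct *p, int new_cpu)
{
	lockdep_assert_rq_held(rq);

	/*
	 * Tell the sched class that task_rq_lock() already holds its
	 * lock (DEQUEUE_LOCKED semantics assumed here, I haven't
	 * checked whether this is the intended use of the flag).
	 */
	deactivate_task(rq, p, DEQUEUE_NOCLOCK | DEQUEUE_LOCKED);
	set_task_cpu(p, new_cpu);
	...
}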
Thanks,
-Andrea