When a task is direct-dispatched from ops.select_cpu() or ops.enqueue(),
ddsp_dsq_id is set to indicate the target DSQ. If the task is dequeued
before dispatch_enqueue() completes (e.g., the task is killed or receives
a signal), dispatch_dequeue() is called with dsq == NULL.
In this case, the task is unlinked from ddsp_deferred_locals and
holding_cpu is cleared, but ddsp_dsq_id and ddsp_enq_flags are left
stale. On the next wakeup, when ops.select_cpu() calls
scx_bpf_dsq_insert(), mark_direct_dispatch() finds ddsp_dsq_id already
set and triggers:
WARNING: CPU: 56 PID: 2323042 at kernel/sched/ext.c:2157
scx_bpf_dsq_insert+0x16b/0x1d0
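
For reference, the check that fires is the stale-state guard in
mark_direct_dispatch(), roughly (simplified sketch; the exact surrounding
code and parameter names may differ):

  /*
   * Simplified sketch of mark_direct_dispatch(): a ddsp_dsq_id left
   * over from a previous enqueue trips the warning before the new
   * target DSQ is recorded.
   */
  WARN_ON_ONCE(p->scx.ddsp_dsq_id != SCX_DSQ_INVALID);

  p->scx.ddsp_dsq_id = dsq_id;
  p->scx.ddsp_enq_flags = enq_flags;
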
Fix this by clearing ddsp_dsq_id and ddsp_enq_flags in dispatch_dequeue()
when dsq is NULL, ensuring clean state for subsequent wakeups.
Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
Signed-off-by: Daniel Hodges <hodgesd@meta.com>
---
kernel/sched/ext.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index b563b8c3fd24..fdfef3fd8814 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -1143,20 +1143,28 @@ static void dispatch_dequeue(struct rq *rq, struct task_struct *p)

		/*
		 * When dispatching directly from the BPF scheduler to a local
		 * DSQ, the task isn't associated with any DSQ but
		 * @p->scx.holding_cpu may be set under the protection of
		 * %SCX_OPSS_DISPATCHING.
		 */
		if (p->scx.holding_cpu >= 0)
			p->scx.holding_cpu = -1;

+		/*
+		 * Clear direct dispatch state. The task may have been
+		 * direct-dispatched from ops.select_cpu() or ops.enqueue()
+		 * but dequeued before the dispatch completed.
+		 */
+		p->scx.ddsp_dsq_id = SCX_DSQ_INVALID;
+		p->scx.ddsp_enq_flags = 0;
+
		return;
	}

	if (!is_local)
		raw_spin_lock(&dsq->lock);

	/*
	 * Now that we hold @dsq->lock, @p->holding_cpu and @p->scx.dsq_* can't
	 * change underneath us.
	 */
--
2.47.3
Hi Daniel,
On Wed, Jan 21, 2026 at 07:56:02AM -0800, Daniel Hodges wrote:
> When a task is direct-dispatched from ops.select_cpu() or ops.enqueue(),
> ddsp_dsq_id is set to indicate the target DSQ. If the task is dequeued
> before dispatch_enqueue() completes (e.g., task killed or receives a
> signal), dispatch_dequeue() is called with dsq == NULL.
>
> In this case, the task is unlinked from ddsp_deferred_locals and
> holding_cpu is cleared, but ddsp_dsq_id and ddsp_enq_flags are left
> stale. On the next wakeup, when ops.select_cpu() calls
> scx_bpf_dsq_insert(), mark_direct_dispatch() finds ddsp_dsq_id already
> set and triggers:
>
> WARNING: CPU: 56 PID: 2323042 at kernel/sched/ext.c:2157
> scx_bpf_dsq_insert+0x16b/0x1d0
>
> Fix this by clearing ddsp_dsq_id and ddsp_enq_flags in dispatch_dequeue()
> when dsq is NULL, ensuring clean state for subsequent wakeups.
I tried to fix this a while ago (same as this, right?
https://github.com/sched-ext/scx/issues/2758). I remember applying
exactly the same patch, but I was still able to trigger the warning.

IIRC there's also a race between tasks woken via ttwu_queue_wakelist()
and sched_setscheduler() that can hit the stale ddsp_dsq_id (and maybe
other cases too).

Long story short, the only thing that worked reliably for me was to
clear ddsp_dsq_id and ddsp_enq_flags in select_task_rq_scx(), but I
thought it was a bit overkill and I never finished investigating the
real issue...
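
Roughly something like this (untested sketch; the exact placement inside
select_task_rq_scx() is just a guess):

  /*
   * Untested sketch, early in select_task_rq_scx(): drop any stale
   * direct-dispatch state before ops.select_cpu() can call
   * scx_bpf_dsq_insert() again for this task.
   */
  p->scx.ddsp_dsq_id = SCX_DSQ_INVALID;
  p->scx.ddsp_enq_flags = 0;
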
In conclusion, I think this fixes some of the warnings we're seeing and
is probably worth applying, but it doesn't fix all of them.
Anyway, I'll do some tests with this patch and report back!
Thanks,
-Andrea
On Wed, Jan 21, 2026 at 10:10:59PM +0100, Andrea Righi wrote:
> Hi Daniel,
...
> I tried to fix this a while ago (same as this, right?
> https://github.com/sched-ext/scx/issues/2758). I remember applying
> exactly the same patch, but I was still able to trigger the warning.
>
> IIRC there's also a race between tasks woken via ttwu_queue_wakelist()
> and sched_setscheduler() that can hit the stale ddsp_dsq_id (and maybe
> other cases too).

I figured there were probably some other paths it could race with.

> Long story short, the only thing that worked reliably for me was to
> clear ddsp_dsq_id and ddsp_enq_flags in select_task_rq_scx(), but I
> thought it was a bit overkill and I never finished investigating the
> real issue...
>
> In conclusion, I think this fixes some of the warnings we're seeing and
> is probably worth applying, but it doesn't fix all of them.
>
> Anyway, I'll do some tests with this patch and report back!
>
> Thanks,
> -Andrea

Sounds good, I hit this running cosmos on a moderately loaded machine.
I'll see if I can get a reproducer made and do some more testing.
On Wed, Jan 21, 2026 at 04:31:02PM -0800, Daniel Hodges wrote:
...
> Sounds good, I hit this running cosmos on a moderately loaded machine.
> I'll see if I can get a reproducer made and do some more testing.

This is with 6.19.0-rc7 + this fix on top:

  WARNING: kernel/sched/ext.c:1282 at scx_dsq_insert_commit+0xf2/0x120, CPU#13: alacritty/6070

which is the WARN_ON_ONCE(p->scx.ddsp_dsq_id != SCX_DSQ_INVALID) in
mark_direct_dispatch(). It triggered almost immediately after loading
scx_cosmos on my laptop.

I'll also try to find a better reproducer (ideally inside a VM).

-Andrea