[PATCH sched_ext/for-6.12] sched_ext: TASK_DEAD tasks must be switched out of SCX on ops_disable

Tejun Heo posted 1 patch 1 year, 3 months ago
kernel/sched/ext.c |   24 ++++++++----------------
1 file changed, 8 insertions(+), 16 deletions(-)
scx_ops_disable_workfn() only switches !TASK_DEAD tasks out of SCX while
calling scx_ops_exit_task() on all tasks including dead ones. This can leave
a dead task on SCX but with SCX_TASK_NONE state, which is inconsistent.

If another task was in the process of changing the TASK_DEAD task's
scheduling class and grabs the rq lock after scx_ops_disable_workfn() is
done with the task, that other task ends up calling scx_ops_disable_task()
on the dead task, which is in an inconsistent state, triggering a warning:

  WARNING: CPU: 6 PID: 3316 at kernel/sched/ext.c:3411 scx_ops_disable_task+0x12c/0x160
  ...
  RIP: 0010:scx_ops_disable_task+0x12c/0x160
  ...
  Call Trace:
   <TASK>
   check_class_changed+0x2c/0x70
   __sched_setscheduler+0x8a0/0xa50
   do_sched_setscheduler+0x104/0x1c0
   __x64_sys_sched_setscheduler+0x18/0x30
   do_syscall_64+0x7b/0x140
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
  RIP: 0033:0x7f140d70ea5b
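The sequence above can be reproduced in a deliberately simplified user-space model. Everything in it is hypothetical: `struct model_task`, the `model_*` enums and functions, and the NONE/READY/ENABLED progression are invented stand-ins for the kernel's task class and SCX task state, not kernel APIs. The model assumes the warning fires because scx_ops_disable_task() expects a task that is still enabled. It only mirrors the shape of the pre-patch loop: dead tasks skip the class switch but are still exited.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the pre-patch behaviour; not kernel code. */
enum model_class { MODEL_CLASS_FAIR, MODEL_CLASS_SCX };
enum model_scx_state { MODEL_SCX_NONE, MODEL_SCX_READY, MODEL_SCX_ENABLED };

struct model_task {
	bool dead;                      /* stands in for __state == TASK_DEAD */
	enum model_class class;         /* stands in for p->sched_class */
	enum model_scx_state scx_state; /* stands in for the SCX task state */
};

static int model_warnings; /* number of would-be WARN splats */

/* scx_ops_disable_task() analogue: only valid on an enabled task. */
static void model_disable_task(struct model_task *p)
{
	if (p->scx_state != MODEL_SCX_ENABLED)
		model_warnings++;
	p->scx_state = MODEL_SCX_READY;
}

/* scx_ops_exit_task() analogue (simplified): leaves the task in NONE. */
static void model_exit_task(struct model_task *p)
{
	p->scx_state = MODEL_SCX_NONE;
}

/* Class change away from SCX: leaving SCX disables the task on the way out. */
static void model_switch_out(struct model_task *p)
{
	if (p->class == MODEL_CLASS_SCX) {
		p->class = MODEL_CLASS_FAIR;
		model_disable_task(p);
	}
}

/* Pre-patch loop: only live tasks are switched out, but every task is exited. */
static void model_old_disable_all(struct model_task *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!tasks[i].dead)
			model_switch_out(&tasks[i]);
		model_exit_task(&tasks[i]);
	}
}
```

After model_old_disable_all(), the dead task is still on the SCX class but in the NONE state. Calling model_switch_out() on it afterwards plays the role of the late sched_setscheduler() caller from the trace: the task is disabled again while already in NONE, and the counter records one warning.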

There is no reason to leave dead tasks on SCX when unloading the BPF
scheduler. Fix by making scx_ops_disable_workfn() eject all tasks including
the dead ones from SCX.
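The effect of the fix can be sketched with a hypothetical user-space model in which every `model_*` name is invented for illustration and is not a kernel API. The only behavioural change from the pre-patch shape is that the disable loop switches every task out of SCX, TASK_DEAD or not, before exiting it. A later class change then finds the task already off SCX and has nothing to disable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the post-patch behaviour; not kernel code. */
enum model_class { MODEL_CLASS_FAIR, MODEL_CLASS_SCX };
enum model_scx_state { MODEL_SCX_NONE, MODEL_SCX_READY, MODEL_SCX_ENABLED };

struct model_task {
	bool dead;                      /* stands in for __state == TASK_DEAD */
	enum model_class class;         /* stands in for p->sched_class */
	enum model_scx_state scx_state; /* stands in for the SCX task state */
};

static int model_warnings; /* number of would-be WARN splats */

/* scx_ops_disable_task() analogue: only valid on an enabled task. */
static void model_disable_task(struct model_task *p)
{
	if (p->scx_state != MODEL_SCX_ENABLED)
		model_warnings++;
	p->scx_state = MODEL_SCX_READY;
}

/* scx_ops_exit_task() analogue (simplified): leaves the task in NONE. */
static void model_exit_task(struct model_task *p)
{
	p->scx_state = MODEL_SCX_NONE;
}

/* Class change away from SCX: a no-op if the task already left SCX. */
static void model_switch_out(struct model_task *p)
{
	if (p->class == MODEL_CLASS_SCX) {
		p->class = MODEL_CLASS_FAIR;
		model_disable_task(p);
	}
}

/* Post-patch loop: switch out every task, including dead ones, then exit it. */
static void model_new_disable_all(struct model_task *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		model_switch_out(&tasks[i]);
		model_exit_task(&tasks[i]);
	}
}
```

With the unconditional switch-out, no task is left on SCX in the NONE state, so the late class change on the dead task is a no-op and the warning counter stays at zero.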

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/sched/ext.c |   24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 57f30b1604db..a1340d3c711c 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4051,30 +4051,22 @@ static void scx_ops_disable_workfn(struct kthread_work *work)
 	spin_lock_irq(&scx_tasks_lock);
 	scx_task_iter_init(&sti);
 	/*
-	 * Invoke scx_ops_exit_task() on all non-idle tasks, including
-	 * TASK_DEAD tasks. Because dead tasks may have a nonzero refcount,
-	 * we may not have invoked sched_ext_free() on them by the time a
-	 * scheduler is disabled. We must therefore exit the task here, or we'd
-	 * fail to invoke ops.exit_task(), as the scheduler will have been
-	 * unloaded by the time the task is subsequently exited on the
-	 * sched_ext_free() path.
+	 * The BPF scheduler is going away. All tasks including %TASK_DEAD ones
+	 * must be switched out and exited synchronously.
 	 */
 	while ((p = scx_task_iter_next_locked(&sti, true))) {
 		const struct sched_class *old_class = p->sched_class;
 		struct sched_enq_and_set_ctx ctx;
 
-		if (READ_ONCE(p->__state) != TASK_DEAD) {
-			sched_deq_and_put_task(p, DEQUEUE_SAVE | DEQUEUE_MOVE,
-					       &ctx);
+		sched_deq_and_put_task(p, DEQUEUE_SAVE | DEQUEUE_MOVE, &ctx);
 
-			p->scx.slice = min_t(u64, p->scx.slice, SCX_SLICE_DFL);
-			__setscheduler_prio(p, p->prio);
-			check_class_changing(task_rq(p), p, old_class);
+		p->scx.slice = min_t(u64, p->scx.slice, SCX_SLICE_DFL);
+		__setscheduler_prio(p, p->prio);
+		check_class_changing(task_rq(p), p, old_class);
 
-			sched_enq_and_set_task(&ctx);
+		sched_enq_and_set_task(&ctx);
 
-			check_class_changed(task_rq(p), p, old_class, p->prio);
-		}
+		check_class_changed(task_rq(p), p, old_class, p->prio);
 		scx_ops_exit_task(p);
 	}
 	scx_task_iter_exit(&sti);
Re: [PATCH sched_ext/for-6.12] sched_ext: TASK_DEAD tasks must be switched out of SCX on ops_disable
Posted by Tejun Heo 1 year, 3 months ago
On Fri, Aug 30, 2024 at 01:44:40PM -1000, Tejun Heo wrote:
> scx_ops_disable_workfn() only switches !TASK_DEAD tasks out of SCX while
> calling scx_ops_exit_task() on all tasks including dead ones. This can leave
> a dead task on SCX but with SCX_TASK_NONE state, which is inconsistent.

Applied to sched_ext/for-6.12.

Thanks.

-- 
tejun