Date: Fri, 30 Aug 2024 13:44:40 -1000
From: Tejun Heo
To: David Vernet
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, Andrea Righi
Subject: [PATCH sched_ext/for-6.12] sched_ext: TASK_DEAD tasks must be switched out of SCX on ops_disable
X-Mailing-List: linux-kernel@vger.kernel.org

scx_ops_disable_workfn() only switches !TASK_DEAD tasks out of SCX while
calling scx_ops_exit_task() on all tasks including dead ones. This can
leave a dead task on SCX but with SCX_TASK_NONE state, which is
inconsistent.

If another task was in the process of changing the TASK_DEAD task's
scheduling class and grabs the rq lock after scx_ops_disable_workfn() is
done with the task, the task ends up calling scx_ops_disable_task() on the
dead task which is in an inconsistent state triggering a warning:

  WARNING: CPU: 6 PID: 3316 at kernel/sched/ext.c:3411 scx_ops_disable_task+0x12c/0x160
  ...
  RIP: 0010:scx_ops_disable_task+0x12c/0x160
  ...
  Call Trace:
   check_class_changed+0x2c/0x70
   __sched_setscheduler+0x8a0/0xa50
   do_sched_setscheduler+0x104/0x1c0
   __x64_sys_sched_setscheduler+0x18/0x30
   do_syscall_64+0x7b/0x140
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
  RIP: 0033:0x7f140d70ea5b

There is no reason to leave dead tasks on SCX when unloading the BPF
scheduler. Fix by making scx_ops_disable_workfn() eject all tasks including
the dead ones from SCX.

Signed-off-by: Tejun Heo
---
 kernel/sched/ext.c | 24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 57f30b1604db..a1340d3c711c 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4051,30 +4051,22 @@ static void scx_ops_disable_workfn(struct kthread_work *work)
 	spin_lock_irq(&scx_tasks_lock);
 	scx_task_iter_init(&sti);
 	/*
-	 * Invoke scx_ops_exit_task() on all non-idle tasks, including
-	 * TASK_DEAD tasks. Because dead tasks may have a nonzero refcount,
-	 * we may not have invoked sched_ext_free() on them by the time a
-	 * scheduler is disabled. We must therefore exit the task here, or we'd
-	 * fail to invoke ops.exit_task(), as the scheduler will have been
-	 * unloaded by the time the task is subsequently exited on the
-	 * sched_ext_free() path.
+	 * The BPF scheduler is going away. All tasks including %TASK_DEAD ones
+	 * must be switched out and exited synchronously.
 	 */
 	while ((p = scx_task_iter_next_locked(&sti, true))) {
 		const struct sched_class *old_class = p->sched_class;
 		struct sched_enq_and_set_ctx ctx;
 
-		if (READ_ONCE(p->__state) != TASK_DEAD) {
-			sched_deq_and_put_task(p, DEQUEUE_SAVE | DEQUEUE_MOVE,
-					       &ctx);
+		sched_deq_and_put_task(p, DEQUEUE_SAVE | DEQUEUE_MOVE, &ctx);
 
-			p->scx.slice = min_t(u64, p->scx.slice, SCX_SLICE_DFL);
-			__setscheduler_prio(p, p->prio);
-			check_class_changing(task_rq(p), p, old_class);
+		p->scx.slice = min_t(u64, p->scx.slice, SCX_SLICE_DFL);
+		__setscheduler_prio(p, p->prio);
+		check_class_changing(task_rq(p), p, old_class);
 
-			sched_enq_and_set_task(&ctx);
+		sched_enq_and_set_task(&ctx);
 
-			check_class_changed(task_rq(p), p, old_class, p->prio);
-		}
+		check_class_changed(task_rq(p), p, old_class, p->prio);
 		scx_ops_exit_task(p);
 	}
 	scx_task_iter_exit(&sti);