From: zhidao su
To: tj@kernel.org
Cc: void@manifault.com, arighi@nvidia.com, changwoo@igalia.com,
 sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: [PATCH v2] sched_ext: Fix sched_ext_dead() race with scx_root_enable_workfn()
Date: Wed, 6 May 2026 13:40:01 +0800
Message-ID: <20260506054001.1105522-1-suzhidao@xiaomi.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260429133155.3825247-1-suzhidao@xiaomi.com>
References: <20260429133155.3825247-1-suzhidao@xiaomi.com>

With CONFIG_EXT_SUB_SCHED, scx_task_sched(p) returns p->scx.sched instead
of scx_root. scx_root_enable_workfn() iterates all tasks and for each
releases scx_tasks_lock via scx_task_iter_unlock() before calling
scx_init_task(). A concurrent sched_ext_dead() can race in this window.

Two bugs:

1. NULL deref: If sched_ext_dead() runs after scx_init_task() sets
   state=INIT but before the callsite sets p->scx.sched, the invariant
   "state != NONE => p->scx.sched != NULL" is broken. sched_ext_dead()
   calls scx_disable_and_exit_task(scx_task_sched(p)=NULL, p), which
   crashes in SCX_HAS_OP(NULL, ...).

2. Resource leak: If sched_ext_dead() runs before scx_init_task() when
   state=NONE, it skips scx_disable_and_exit_task() (state check fails).
   scx_init_task() then calls ops.init_task() and sets state=INIT. The
   enable loop never calls ops.exit_task(), leaking whatever
   ops.init_task() allocated.

Fix both:

- Move scx_set_task_sched(p, sch) into scx_init_task(), before the state
  transition off NONE. This restores the invariant so sched_ext_dead()
  always finds a valid scheduler pointer (fixes bug 1).

- After scx_init_task() returns, check under scx_tasks_lock whether @p
  is still on scx_tasks. If not, sched_ext_dead() raced us. If state
  != NONE, sched_ext_dead() saw state=NONE before ops.init_task() ran,
  so call scx_disable_and_exit_task() with cancelled=true to release
  the resources (fixes bug 2). If state=NONE, sched_ext_dead() already
  cleaned up.

Fixes: 88234b075c3f ("sched_ext: Introduce scx_task_sched[_rcu]()")
Signed-off-by: zhidao su
---
v2: Rewrite as writer-side fix per Tejun's review:
  - Move scx_set_task_sched(p, sch) into scx_init_task() before the state
    transition off NONE, restoring the "state!=NONE => p->scx.sched!=NULL"
    invariant. Bug 1 (NULL deref) is fixed without touching sched_ext_dead().
  - Handle bug 2 (resource leak) in the workfn's list_empty() path by calling
    scx_disable_and_exit_task() when state!=NONE, instead of the v1
    reader-side branch in sched_ext_dead() that leaked resources.
  - Update Fixes: to 88234b075c3f ("sched_ext: Introduce scx_task_sched[_rcu]()")
    which is when scx_task_sched(p) started dereferencing p->scx.sched.

 kernel/sched/ext.c | 59 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 52 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index f7b1b16e81a5..99560f77af81 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -3583,7 +3583,15 @@ static int scx_init_task(struct scx_sched *sch, struct task_struct *p, bool fork
 		/*
 		 * While @p's rq is not locked. @p is not visible to the rest of
 		 * SCX yet and it's safe to update the flags and state.
+		 *
+		 * Install p->scx.sched before transitioning state off NONE so
+		 * that the invariant state!=NONE => p->scx.sched!=NULL holds as
+		 * soon as state becomes observable. A concurrent sched_ext_dead()
+		 * that races the INIT window will then always find a valid
+		 * scheduler pointer and can call scx_disable_and_exit_task()
+		 * to release resources allocated by ops.init_task().
 		 */
+		scx_set_task_sched(p, sch);
 		p->scx.flags |= SCX_TASK_RESET_RUNNABLE_AT;
 		scx_set_task_state(p, SCX_TASK_INIT);
 	}
@@ -3769,8 +3777,6 @@ void scx_pre_fork(struct task_struct *p)
 
 int scx_fork(struct task_struct *p, struct kernel_clone_args *kargs)
 {
-	s32 ret;
-
 	percpu_rwsem_assert_held(&scx_fork_rwsem);
 
 	p->scx.tid = scx_alloc_tid();
@@ -3781,10 +3787,7 @@ int scx_fork(struct task_struct *p, struct kernel_clone_args *kargs)
 #else
 		struct scx_sched *sch = scx_root;
 #endif
-		ret = scx_init_task(sch, p, true);
-		if (!ret)
-			scx_set_task_sched(p, sch);
-		return ret;
+		return scx_init_task(sch, p, true);
 	}
 
 	return 0;
@@ -6937,7 +6940,49 @@ static void scx_root_enable_workfn(struct kthread_work *work)
 			goto err_disable_unlock_all;
 		}
 
-		scx_set_task_sched(p, sch);
+		/*
+		 * sched_ext_dead() may have raced while locks were dropped in
+		 * scx_task_iter_unlock(). Two cases:
+		 *
+		 * (a) sched_ext_dead() ran after scx_init_task() set state=INIT:
+		 *     it called scx_disable_and_exit_task() (cancelled=true) and
+		 *     reset state to NONE. ops.exit_task() already ran; skip.
+		 *
+		 * (b) sched_ext_dead() ran before scx_init_task() (state=NONE at
+		 *     the time): it skipped scx_disable_and_exit_task() because
+		 *     state was NONE. scx_init_task() subsequently called
+		 *     ops.init_task() and set state=INIT, leaving allocated
+		 *     resources with no owner. We must call
+		 *     scx_disable_and_exit_task() here to release them.
+		 *
+		 * Distinguish case (a) from (b) by reading state: (a) leaves
+		 * state=NONE (reset by scx_disable_and_exit_task); (b) leaves
+		 * state=INIT (set by scx_init_task, never reset).
+		 */
+		{
+			bool p_dead = false, need_exit = false;
+
+			scoped_guard(raw_spinlock_irq, &scx_tasks_lock) {
+				if (list_empty(&p->scx.tasks_node)) {
+					p_dead = true;
+					need_exit = scx_get_task_state(p) != SCX_TASK_NONE;
+				}
+			}
+
+			if (p_dead) {
+				if (need_exit) {
+					struct rq_flags rf;
+					struct rq *rq;
+
+					rq = task_rq_lock(p, &rf);
+					scx_disable_and_exit_task(sch, p);
+					task_rq_unlock(rq, p, &rf);
+				}
+				put_task_struct(p);
+				continue;
+			}
+		}
+
 		scx_set_task_state(p, SCX_TASK_READY);
 
 		/*
-- 
2.43.0
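
The two interleavings described in the patch can be replayed in a small
userspace model. This is only a sketch under stated assumptions, not kernel
code: every model_* name is hypothetical, on_list stands in for
!list_empty(&p->scx.tasks_node), there is no locking, and the "race" is
replayed sequentially in a fixed order. It illustrates the two properties the
patch relies on: the scheduler pointer is installed before the state leaves
NONE, and the post-init check releases resources only when the exit path has
not already done so.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for SCX_TASK_NONE / SCX_TASK_INIT. */
enum model_state { MODEL_NONE, MODEL_INIT };

/* Counts init/exit callbacks so a leak shows up as init_calls != exit_calls. */
struct model_sched {
	int init_calls;
	int exit_calls;
};

struct model_task {
	enum model_state state;
	struct model_sched *sched;
	bool on_list;	/* assumption: models membership on scx_tasks */
};

/* Writer side: install the pointer first, then publish the state. */
static void model_init_task(struct model_sched *s, struct model_task *t)
{
	s->init_calls++;
	t->sched = s;
	t->state = MODEL_INIT;
}

/* Cleanup: relies on the invariant state != NONE => sched != NULL. */
static void model_exit_task(struct model_task *t)
{
	assert(t->sched != NULL);
	t->sched->exit_calls++;
	t->state = MODEL_NONE;
}

/* Exit path: unlink the task, clean up only if it was initialised. */
static void model_dead(struct model_task *t)
{
	t->on_list = false;
	if (t->state != MODEL_NONE)
		model_exit_task(t);
}

/*
 * Enable loop's check after init. Returns true when the task died while
 * locks were dropped and must be skipped; releases resources if the exit
 * path ran too early to do so itself (case (b)).
 */
static bool model_post_init_check(struct model_task *t)
{
	if (t->on_list)
		return false;
	if (t->state != MODEL_NONE)
		model_exit_task(t);
	return true;
}
```

Calling model_init_task() then model_dead() replays case (a), and
model_dead() then model_init_task() replays case (b). In both orders,
following up with model_post_init_check() leaves init_calls equal to
exit_calls, which is the no-leak property the commit message argues for.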