From: Zqiang
To: tj@kernel.org, void@manifault.com, arighi@nvidia.com, changwoo@igalia.com
Cc: sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org, qiang.zhang@linux.dev
Subject: [PATCH] sched_ext: Fix incorrect sched_class settings for per-cpu migration tasks
Date: Mon, 1 Dec 2025 19:25:40 +0800
Message-Id: <20251201112540.5119-1-qiang.zhang@linux.dev>

When loading the eBPF scheduler, the tasks in the scx_tasks list are
traversed and __setscheduler_class() is invoked to get each task's new
sched_class. However, this also incorrectly sets the per-cpu migration
tasks' ->sched_class to rt_sched_class, and even after the scheduler is
unloaded, their ->sched_class remains rt_sched_class.

The log for this issue is as follows:

./scx_rustland --stats 1
[  199.245639][  T630] sched_ext: "rustland" does not implement cgroup cpu.weight
[  199.269213][  T630] sched_ext: BPF scheduler "rustland" enabled
04:25:09 [INFO] RustLand scheduler attached

bpftrace -e 'iter:task /strcontains(ctx->task->comm, "migration")/
{ printf("%s:%d->%pS\n", ctx->task->comm, ctx->task->pid, ctx->task->sched_class); }'
Attaching 1 probe...
migration/0:24->rt_sched_class+0x0/0xe0
migration/1:27->rt_sched_class+0x0/0xe0
migration/2:33->rt_sched_class+0x0/0xe0
migration/3:39->rt_sched_class+0x0/0xe0
migration/4:45->rt_sched_class+0x0/0xe0
migration/5:52->rt_sched_class+0x0/0xe0
migration/6:58->rt_sched_class+0x0/0xe0
migration/7:64->rt_sched_class+0x0/0xe0

sched_ext: BPF scheduler "rustland" disabled (unregistered from user space)
EXIT: unregistered from user space
04:25:21 [INFO] Unregister RustLand scheduler

bpftrace -e 'iter:task /strcontains(ctx->task->comm, "migration")/
{ printf("%s:%d->%pS\n", ctx->task->comm, ctx->task->pid, ctx->task->sched_class); }'
Attaching 1 probe...
migration/0:24->rt_sched_class+0x0/0xe0
migration/1:27->rt_sched_class+0x0/0xe0
migration/2:33->rt_sched_class+0x0/0xe0
migration/3:39->rt_sched_class+0x0/0xe0
migration/4:45->rt_sched_class+0x0/0xe0
migration/5:52->rt_sched_class+0x0/0xe0
migration/6:58->rt_sched_class+0x0/0xe0
migration/7:64->rt_sched_class+0x0/0xe0

Therefore, this commit introduces a new scx_setscheduler_class() helper
that checks for stop_sched_class, and uses it in place of
__setscheduler_class().

Signed-off-by: Zqiang
Reviewed-by: Andrea Righi
---
 kernel/sched/ext.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index b40d35964cd4..9447fada0050 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -248,6 +248,14 @@ static struct scx_dispatch_q *find_user_dsq(struct scx_sched *sch, u64 dsq_id)
 	return rhashtable_lookup(&sch->dsq_hash, &dsq_id, dsq_hash_params);
 }
 
+static const struct sched_class *scx_setscheduler_class(struct task_struct *p)
+{
+	if (p->sched_class == &stop_sched_class)
+		return &stop_sched_class;
+
+	return __setscheduler_class(p->policy, p->prio);
+}
+
 /*
  * scx_kf_mask enforcement. Some kfuncs can only be called from specific SCX
  * ops. When invoking SCX ops, SCX_CALL_OP[_RET]() should be used to indicate
@@ -4241,8 +4249,7 @@ static void scx_disable_workfn(struct kthread_work *work)
 	while ((p = scx_task_iter_next_locked(&sti))) {
 		unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK;
 		const struct sched_class *old_class = p->sched_class;
-		const struct sched_class *new_class =
-			__setscheduler_class(p->policy, p->prio);
+		const struct sched_class *new_class = scx_setscheduler_class(p);
 
 		update_rq_clock(task_rq(p));
 
@@ -5045,8 +5052,7 @@ static int scx_enable(struct sched_ext_ops *ops, struct bpf_link *link)
 	while ((p = scx_task_iter_next_locked(&sti))) {
 		unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE;
 		const struct sched_class *old_class = p->sched_class;
-		const struct sched_class *new_class =
-			__setscheduler_class(p->policy, p->prio);
+		const struct sched_class *new_class = scx_setscheduler_class(p);
 
 		if (scx_get_task_state(p) != SCX_TASK_READY)
 			continue;
-- 
2.17.1
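
For illustration only, the behaviour change can be sketched as a small
user-space model. This is not kernel code: struct sched_class and
struct task_struct below are minimal stand-ins, and every model_*() name is
hypothetical. model_setscheduler_class() is a simplified assumption about how
__setscheduler_class() derives a class from policy and priority alone, and the
stopper values (SCHED_FIFO, prio 0) are likewise assumed. The point is that a
task whose class is stop_sched_class but whose priority lies in the RT range
gets rt_sched_class from a policy/prio lookup, while the added check keeps it
on stop_sched_class.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the kernel's sched_class; only object identity matters here. */
struct sched_class { const char *name; };

static const struct sched_class stop_sched_class = { "stop_sched_class" };
static const struct sched_class rt_sched_class   = { "rt_sched_class" };
static const struct sched_class fair_sched_class = { "fair_sched_class" };

#define MODEL_MAX_RT_PRIO 100	/* assumed boundary of the RT priority range */

/* Minimal stand-in for task_struct with just the fields the patch reads. */
struct task_struct {
	const struct sched_class *sched_class;
	int policy;
	int prio;
};

/* Assumed simplification: class is chosen from prio alone (policy ignored). */
const struct sched_class *model_setscheduler_class(int policy, int prio)
{
	(void)policy;
	if (prio < MODEL_MAX_RT_PRIO)
		return &rt_sched_class;
	return &fair_sched_class;
}

/* Mirrors the shape of the helper added by the patch. */
const struct sched_class *model_scx_setscheduler_class(const struct task_struct *p)
{
	if (p->sched_class == &stop_sched_class)
		return &stop_sched_class;

	return model_setscheduler_class(p->policy, p->prio);
}

/* Compare both lookups for a hypothetical per-cpu stopper task. */
void model_demo(void)
{
	struct task_struct stopper = { &stop_sched_class, 1 /* SCHED_FIFO */, 0 };
	const struct sched_class *before =
		model_setscheduler_class(stopper.policy, stopper.prio);
	const struct sched_class *after = model_scx_setscheduler_class(&stopper);

	assert(after == &stop_sched_class);
	printf("policy/prio lookup: %s, with stop check: %s\n",
	       before->name, after->name);
}
```

Calling model_demo() from a main() shows the policy/prio lookup picking
rt_sched_class for the stopper, while the checked variant returns
stop_sched_class. Tasks in any other class fall through to the ordinary
lookup unchanged.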