From nobody Sun Feb 8 04:31:06 2026
Message-ID: <20251006104526.613879143@infradead.org>
User-Agent: quilt/0.68
Date: Mon, 06 Oct 2025 12:44:03 +0200
From: Peter Zijlstra
To: tj@kernel.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@kernel.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com, void@manifault.com, arighi@nvidia.com, changwoo@igalia.com, cgroups@vger.kernel.org, sched-ext@lists.linux.dev, liuwenfang@honor.com, tglx@linutronix.de
Subject: [PATCH 01/12] sched: Employ sched_change guards
References: <20251006104402.946760805@infradead.org>
MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" As proposed a long while ago -- and half done by scx -- wrap the scheduler's 'change' pattern in a guard helper. Signed-off-by: Peter Zijlstra (Intel) Acked-by: Tejun Heo Acked-by: Andrea Righi Acked-by: Juri Lelli Acked-by: Vincent Guittot for the serie Reviewed-by: Juri Lelli Reviewed-by: Valentin Schneider --- include/linux/cleanup.h | 5 + kernel/sched/core.c | 157 ++++++++++++++++++-------------------------= ----- kernel/sched/ext.c | 39 +++++------ kernel/sched/sched.h | 21 +++--- kernel/sched/syscalls.c | 65 +++++++------------ 5 files changed, 116 insertions(+), 171 deletions(-) --- a/include/linux/cleanup.h +++ b/include/linux/cleanup.h @@ -340,6 +340,11 @@ _label: #define __DEFINE_CLASS_IS_CONDITIONAL(_name, _is_cond) \ static __maybe_unused const bool class_##_name##_is_conditional =3D _is_co= nd =20 +#define DEFINE_CLASS_IS_UNCONDITIONAL(_name) \ + __DEFINE_CLASS_IS_CONDITIONAL(_name, false); \ + static inline void * class_##_name##_lock_ptr(class_##_name##_t *_T) \ + { return (void *)1; } + #define __GUARD_IS_ERR(_ptr) \ ({ \ unsigned long _rc =3D (__force unsigned long)(_ptr); \ --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -7326,7 +7326,7 @@ void rt_mutex_post_schedule(void) */ void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task) { - int prio, oldprio, queued, running, queue_flag =3D + int prio, oldprio, queue_flag =3D DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK; const struct sched_class *prev_class, *next_class; struct rq_flags rf; @@ -7391,52 +7391,42 @@ void rt_mutex_setprio(struct task_struct if (prev_class !=3D next_class && p->se.sched_delayed) dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_NOCLOCK); =20 - queued =3D task_on_rq_queued(p); - running =3D task_current_donor(rq, p); - if (queued) - dequeue_task(rq, p, queue_flag); - if (running) - put_prev_task(rq, p); - - /* - * Boosting condition are: - * 1. -rt task is running and holds mutex A - * --> -dl task blocks on mutex A - * - * 2. -dl task is running and holds mutex A - * --> -dl task blocks on mutex A and could preempt the - * running task - */ - if (dl_prio(prio)) { - if (!dl_prio(p->normal_prio) || - (pi_task && dl_prio(pi_task->prio) && - dl_entity_preempt(&pi_task->dl, &p->dl))) { - p->dl.pi_se =3D pi_task->dl.pi_se; - queue_flag |=3D ENQUEUE_REPLENISH; + scoped_guard (sched_change, p, queue_flag) { + /* + * Boosting condition are: + * 1. -rt task is running and holds mutex A + * --> -dl task blocks on mutex A + * + * 2. 
-dl task is running and holds mutex A + * --> -dl task blocks on mutex A and could preempt the + * running task + */ + if (dl_prio(prio)) { + if (!dl_prio(p->normal_prio) || + (pi_task && dl_prio(pi_task->prio) && + dl_entity_preempt(&pi_task->dl, &p->dl))) { + p->dl.pi_se =3D pi_task->dl.pi_se; + scope->flags |=3D ENQUEUE_REPLENISH; + } else { + p->dl.pi_se =3D &p->dl; + } + } else if (rt_prio(prio)) { + if (dl_prio(oldprio)) + p->dl.pi_se =3D &p->dl; + if (oldprio < prio) + scope->flags |=3D ENQUEUE_HEAD; } else { - p->dl.pi_se =3D &p->dl; + if (dl_prio(oldprio)) + p->dl.pi_se =3D &p->dl; + if (rt_prio(oldprio)) + p->rt.timeout =3D 0; } - } else if (rt_prio(prio)) { - if (dl_prio(oldprio)) - p->dl.pi_se =3D &p->dl; - if (oldprio < prio) - queue_flag |=3D ENQUEUE_HEAD; - } else { - if (dl_prio(oldprio)) - p->dl.pi_se =3D &p->dl; - if (rt_prio(oldprio)) - p->rt.timeout =3D 0; - } =20 - p->sched_class =3D next_class; - p->prio =3D prio; + p->sched_class =3D next_class; + p->prio =3D prio; =20 - check_class_changing(rq, p, prev_class); - - if (queued) - enqueue_task(rq, p, queue_flag); - if (running) - set_next_task(rq, p); + check_class_changing(rq, p, prev_class); + } =20 check_class_changed(rq, p, prev_class, oldprio); out_unlock: @@ -8084,26 +8074,9 @@ int migrate_task_to(struct task_struct * */ void sched_setnuma(struct task_struct *p, int nid) { - bool queued, running; - struct rq_flags rf; - struct rq *rq; - - rq =3D task_rq_lock(p, &rf); - queued =3D task_on_rq_queued(p); - running =3D task_current_donor(rq, p); - - if (queued) - dequeue_task(rq, p, DEQUEUE_SAVE); - if (running) - put_prev_task(rq, p); - - p->numa_preferred_nid =3D nid; - - if (queued) - enqueue_task(rq, p, ENQUEUE_RESTORE | ENQUEUE_NOCLOCK); - if (running) - set_next_task(rq, p); - task_rq_unlock(rq, p, &rf); + guard(task_rq_lock)(p); + scoped_guard (sched_change, p, DEQUEUE_SAVE) + p->numa_preferred_nid =3D nid; } #endif /* CONFIG_NUMA_BALANCING */ =20 @@ -9205,8 +9178,9 @@ static void sched_change_group(struct ta */ void sched_move_task(struct task_struct *tsk, bool for_autogroup) { - int queued, running, queue_flags =3D + unsigned int queue_flags =3D DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK; + bool resched =3D false; struct rq *rq; =20 CLASS(task_rq_lock, rq_guard)(tsk); @@ -9214,29 +9188,16 @@ void sched_move_task(struct task_struct =20 update_rq_clock(rq); =20 - running =3D task_current_donor(rq, tsk); - queued =3D task_on_rq_queued(tsk); + scoped_guard (sched_change, tsk, queue_flags) { + sched_change_group(tsk); + if (!for_autogroup) + scx_cgroup_move_task(tsk); + if (scope->running) + resched =3D true; + } =20 - if (queued) - dequeue_task(rq, tsk, queue_flags); - if (running) - put_prev_task(rq, tsk); - - sched_change_group(tsk); - if (!for_autogroup) - scx_cgroup_move_task(tsk); - - if (queued) - enqueue_task(rq, tsk, queue_flags); - if (running) { - set_next_task(rq, tsk); - /* - * After changing group, the running task may have joined a - * throttled one but it's still the running task. Trigger a - * resched to make sure that task can still run. 
- */ + if (resched) resched_curr(rq); - } } =20 static struct cgroup_subsys_state * @@ -10892,37 +10853,39 @@ void sched_mm_cid_fork(struct task_struc } #endif /* CONFIG_SCHED_MM_CID */ =20 -#ifdef CONFIG_SCHED_CLASS_EXT -void sched_deq_and_put_task(struct task_struct *p, int queue_flags, - struct sched_enq_and_set_ctx *ctx) +static DEFINE_PER_CPU(struct sched_change_ctx, sched_change_ctx); + +struct sched_change_ctx *sched_change_begin(struct task_struct *p, unsigne= d int flags) { + struct sched_change_ctx *ctx =3D this_cpu_ptr(&sched_change_ctx); struct rq *rq =3D task_rq(p); =20 lockdep_assert_rq_held(rq); =20 - *ctx =3D (struct sched_enq_and_set_ctx){ + *ctx =3D (struct sched_change_ctx){ .p =3D p, - .queue_flags =3D queue_flags, + .flags =3D flags, .queued =3D task_on_rq_queued(p), .running =3D task_current(rq, p), }; =20 - update_rq_clock(rq); if (ctx->queued) - dequeue_task(rq, p, queue_flags | DEQUEUE_NOCLOCK); + dequeue_task(rq, p, flags); if (ctx->running) put_prev_task(rq, p); + + return ctx; } =20 -void sched_enq_and_set_task(struct sched_enq_and_set_ctx *ctx) +void sched_change_end(struct sched_change_ctx *ctx) { - struct rq *rq =3D task_rq(ctx->p); + struct task_struct *p =3D ctx->p; + struct rq *rq =3D task_rq(p); =20 lockdep_assert_rq_held(rq); =20 if (ctx->queued) - enqueue_task(rq, ctx->p, ctx->queue_flags | ENQUEUE_NOCLOCK); + enqueue_task(rq, p, ctx->flags | ENQUEUE_NOCLOCK); if (ctx->running) - set_next_task(rq, ctx->p); + set_next_task(rq, p); } -#endif /* CONFIG_SCHED_CLASS_EXT */ --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -3780,11 +3780,10 @@ static void scx_bypass(bool bypass) */ list_for_each_entry_safe_reverse(p, n, &rq->scx.runnable_list, scx.runnable_node) { - struct sched_enq_and_set_ctx ctx; - /* cycling deq/enq is enough, see the function comment */ - sched_deq_and_put_task(p, DEQUEUE_SAVE | DEQUEUE_MOVE, &ctx); - sched_enq_and_set_task(&ctx); + scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_MOVE) { + /* nothing */ ; + } } =20 /* resched to restore ticks and idle state */ @@ -3916,17 +3915,16 @@ static void scx_disable_workfn(struct kt const struct sched_class *old_class =3D p->sched_class; const struct sched_class *new_class =3D __setscheduler_class(p->policy, p->prio); - struct sched_enq_and_set_ctx ctx; - - if (old_class !=3D new_class && p->se.sched_delayed) - dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEUE_DELAYED); =20 - sched_deq_and_put_task(p, DEQUEUE_SAVE | DEQUEUE_MOVE, &ctx); + update_rq_clock(task_rq(p)); =20 - p->sched_class =3D new_class; - check_class_changing(task_rq(p), p, old_class); + if (old_class !=3D new_class && p->se.sched_delayed) + dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_N= OCLOCK); =20 - sched_enq_and_set_task(&ctx); + scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOC= LOCK) { + p->sched_class =3D new_class; + check_class_changing(task_rq(p), p, old_class); + } =20 check_class_changed(task_rq(p), p, old_class, p->prio); scx_exit_task(p); @@ -4660,21 +4658,20 @@ static int scx_enable(struct sched_ext_o const struct sched_class *old_class =3D p->sched_class; const struct sched_class *new_class =3D __setscheduler_class(p->policy, p->prio); - struct sched_enq_and_set_ctx ctx; =20 if (!tryget_task_struct(p)) continue; =20 - if (old_class !=3D new_class && p->se.sched_delayed) - dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEUE_DELAYED); - - sched_deq_and_put_task(p, DEQUEUE_SAVE | DEQUEUE_MOVE, &ctx); + update_rq_clock(task_rq(p)); =20 - p->scx.slice =3D 
SCX_SLICE_DFL; - p->sched_class =3D new_class; - check_class_changing(task_rq(p), p, old_class); + if (old_class !=3D new_class && p->se.sched_delayed) + dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_N= OCLOCK); =20 - sched_enq_and_set_task(&ctx); + scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOC= LOCK) { + p->scx.slice =3D SCX_SLICE_DFL; + p->sched_class =3D new_class; + check_class_changing(task_rq(p), p, old_class); + } =20 check_class_changed(task_rq(p), p, old_class, p->prio); put_task_struct(p); --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -3885,23 +3885,22 @@ extern void check_class_changed(struct r extern struct balance_callback *splice_balance_callbacks(struct rq *rq); extern void balance_callbacks(struct rq *rq, struct balance_callback *head= ); =20 -#ifdef CONFIG_SCHED_CLASS_EXT -/* - * Used by SCX in the enable/disable paths to move tasks between sched_cla= sses - * and establish invariants. - */ -struct sched_enq_and_set_ctx { +struct sched_change_ctx { struct task_struct *p; - int queue_flags; + int flags; bool queued; bool running; }; =20 -void sched_deq_and_put_task(struct task_struct *p, int queue_flags, - struct sched_enq_and_set_ctx *ctx); -void sched_enq_and_set_task(struct sched_enq_and_set_ctx *ctx); +struct sched_change_ctx *sched_change_begin(struct task_struct *p, unsigne= d int flags); +void sched_change_end(struct sched_change_ctx *ctx); =20 -#endif /* CONFIG_SCHED_CLASS_EXT */ +DEFINE_CLASS(sched_change, struct sched_change_ctx *, + sched_change_end(_T), + sched_change_begin(p, flags), + struct task_struct *p, unsigned int flags) + +DEFINE_CLASS_IS_UNCONDITIONAL(sched_change) =20 #include "ext.h" =20 --- a/kernel/sched/syscalls.c +++ b/kernel/sched/syscalls.c @@ -64,7 +64,6 @@ static int effective_prio(struct task_st =20 void set_user_nice(struct task_struct *p, long nice) { - bool queued, running; struct rq *rq; int old_prio; =20 @@ -90,22 +89,12 @@ void set_user_nice(struct task_struct *p return; } =20 - queued =3D task_on_rq_queued(p); - running =3D task_current_donor(rq, p); - if (queued) - dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK); - if (running) - put_prev_task(rq, p); - - p->static_prio =3D NICE_TO_PRIO(nice); - set_load_weight(p, true); - old_prio =3D p->prio; - p->prio =3D effective_prio(p); - - if (queued) - enqueue_task(rq, p, ENQUEUE_RESTORE | ENQUEUE_NOCLOCK); - if (running) - set_next_task(rq, p); + scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK) { + p->static_prio =3D NICE_TO_PRIO(nice); + set_load_weight(p, true); + old_prio =3D p->prio; + p->prio =3D effective_prio(p); + } =20 /* * If the task increased its priority or is running and @@ -515,7 +504,7 @@ int __sched_setscheduler(struct task_str bool user, bool pi) { int oldpolicy =3D -1, policy =3D attr->sched_policy; - int retval, oldprio, newprio, queued, running; + int retval, oldprio, newprio; const struct sched_class *prev_class, *next_class; struct balance_callback *head; struct rq_flags rf; @@ -698,33 +687,25 @@ int __sched_setscheduler(struct task_str if (prev_class !=3D next_class && p->se.sched_delayed) dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_NOCLOCK); =20 - queued =3D task_on_rq_queued(p); - running =3D task_current_donor(rq, p); - if (queued) - dequeue_task(rq, p, queue_flags); - if (running) - put_prev_task(rq, p); - - if (!(attr->sched_flags & SCHED_FLAG_KEEP_PARAMS)) { - __setscheduler_params(p, attr); - p->sched_class =3D next_class; - p->prio =3D newprio; - } - 
__setscheduler_uclamp(p, attr); - check_class_changing(rq, p, prev_class); + scoped_guard (sched_change, p, queue_flags) { =20 - if (queued) { - /* - * We enqueue to tail when the priority of a task is - * increased (user space view). - */ - if (oldprio < p->prio) - queue_flags |=3D ENQUEUE_HEAD; + if (!(attr->sched_flags & SCHED_FLAG_KEEP_PARAMS)) { + __setscheduler_params(p, attr); + p->sched_class =3D next_class; + p->prio =3D newprio; + } + __setscheduler_uclamp(p, attr); + check_class_changing(rq, p, prev_class); =20 - enqueue_task(rq, p, queue_flags); + if (scope->queued) { + /* + * We enqueue to tail when the priority of a task is + * increased (user space view). + */ + if (oldprio < p->prio) + scope->flags |=3D ENQUEUE_HEAD; + } } - if (running) - set_next_task(rq, p); =20 check_class_changed(rq, p, prev_class, oldprio); From nobody Sun Feb 8 04:31:06 2026 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C047316EB42; Mon, 6 Oct 2025 10:45:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.50.34 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759747560; cv=none; b=g3dGviZH0Ho2qGi82cv4foRYOqPJXPLzH8mHluB/3ScXx1h/pCAm2b9U2n9TP5qJGY5ScFNoTrQ8nzuBDkV0ZWV4cdmrAGYSaWSOJWM6/z+4DZGxMBTsn8WnmvpIKX5G3gVSbfOEkn/b2IiLLlU3eGL4hmMF5vgmTuQATSiBw/o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759747560; c=relaxed/simple; bh=MXTMcR9YZX0DfzxJOhZPMSsGeJkLOquRCKrvD4CNy4s=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=Gp5oL8K2DCvc/WeQuMYR+eazMiilpf8ExaKKJgXnHIRpTZ0kF38MoIuxDudIqcYw8sn01c70hWQXFpsbvmyO5f3wHLXoqounRQuXBGe4Ygmcb1lBV0JpS5LqmGEXS5IZ2gpswqCXmrS3lJHHFiWbwMLIsheGpAfxq+mo6lDibgU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=gsdPOHYX; arc=none smtp.client-ip=90.155.50.34 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="gsdPOHYX" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=c5AKA1Ul7x9P1KOonIgzBsVrilkhe+QAcdwdwNnhhOc=; b=gsdPOHYXR0mS2XtAnzCkc6Zd9o kfGRotyBCA5cph+HhI4FuQcUMCtiGUXYsTeG/qR6UI0zuMEsZgbliaP/xCmAklw6QnYFRI4mv2EiH tssIcUgXEIXON5rylbwczUnlowLqNNPTL55gnfAPOUFXBCjFt8BZZkb9Tt9uTuR25A+/lYwsQpd1L EmiCDb0tF8Oyi5uZlN3aW1+DfrQVgOy9V3QSkRKz9paw08Yaut1VRoFqTeUeQ7GCvv4XTGnZxKPFR BpTpfE6v0ZJ7m8PZ+DcYCdsw3Q4K2brVX4OZJDoE3vrK4jzcMkJxokzpMjKVqNI7MHiTS3hDYw3LZ E+bs/8Aw==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by casper.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1v5iix-00000009yac-0Rta; Mon, 06 Oct 2025 10:45:48 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 0) id C370A30039F; Mon, 06 Oct 2025 
12:45:47 +0200 (CEST) Message-ID: <20251006104526.729048003@infradead.org> User-Agent: quilt/0.68 Date: Mon, 06 Oct 2025 12:44:04 +0200 From: Peter Zijlstra To: tj@kernel.org Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@kernel.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com, void@manifault.com, arighi@nvidia.com, changwoo@igalia.com, cgroups@vger.kernel.org, sched-ext@lists.linux.dev, liuwenfang@honor.com, tglx@linutronix.de Subject: [PATCH 02/12] sched: Re-arrange the {EN,DE}QUEUE flags References: <20251006104402.946760805@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Ensure the matched flags are in the low word while the unmatched flags go into the second word. Signed-off-by: Peter Zijlstra (Intel) Acked-by: Tejun Heo Acked-by: Andrea Righi Acked-by: Juri Lelli Acked-by: Vincent Guittot for the serie Reviewed-by: Juri Lelli Reviewed-by: Valentin Schneider --- kernel/sched/sched.h | 45 ++++++++++++++++++++++++--------------------- 1 file changed, 24 insertions(+), 21 deletions(-) --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2362,27 +2362,30 @@ extern const u32 sched_prio_to_wmult[40 * */ =20 -#define DEQUEUE_SLEEP 0x01 /* Matches ENQUEUE_WAKEUP */ -#define DEQUEUE_SAVE 0x02 /* Matches ENQUEUE_RESTORE */ -#define DEQUEUE_MOVE 0x04 /* Matches ENQUEUE_MOVE */ -#define DEQUEUE_NOCLOCK 0x08 /* Matches ENQUEUE_NOCLOCK */ -#define DEQUEUE_SPECIAL 0x10 -#define DEQUEUE_MIGRATING 0x100 /* Matches ENQUEUE_MIGRATING */ -#define DEQUEUE_DELAYED 0x200 /* Matches ENQUEUE_DELAYED */ -#define DEQUEUE_THROTTLE 0x800 - -#define ENQUEUE_WAKEUP 0x01 -#define ENQUEUE_RESTORE 0x02 -#define ENQUEUE_MOVE 0x04 -#define ENQUEUE_NOCLOCK 0x08 - -#define ENQUEUE_HEAD 0x10 -#define ENQUEUE_REPLENISH 0x20 -#define ENQUEUE_MIGRATED 0x40 -#define ENQUEUE_INITIAL 0x80 -#define ENQUEUE_MIGRATING 0x100 -#define ENQUEUE_DELAYED 0x200 -#define ENQUEUE_RQ_SELECTED 0x400 +#define DEQUEUE_SLEEP 0x0001 /* Matches ENQUEUE_WAKEUP */ +#define DEQUEUE_SAVE 0x0002 /* Matches ENQUEUE_RESTORE */ +#define DEQUEUE_MOVE 0x0004 /* Matches ENQUEUE_MOVE */ +#define DEQUEUE_NOCLOCK 0x0008 /* Matches ENQUEUE_NOCLOCK */ + +#define DEQUEUE_MIGRATING 0x0010 /* Matches ENQUEUE_MIGRATING */ +#define DEQUEUE_DELAYED 0x0020 /* Matches ENQUEUE_DELAYED */ + +#define DEQUEUE_SPECIAL 0x00010000 +#define DEQUEUE_THROTTLE 0x00020000 + +#define ENQUEUE_WAKEUP 0x0001 +#define ENQUEUE_RESTORE 0x0002 +#define ENQUEUE_MOVE 0x0004 +#define ENQUEUE_NOCLOCK 0x0008 + +#define ENQUEUE_MIGRATING 0x0010 +#define ENQUEUE_DELAYED 0x0020 + +#define ENQUEUE_HEAD 0x00010000 +#define ENQUEUE_REPLENISH 0x00020000 +#define ENQUEUE_MIGRATED 0x00040000 +#define ENQUEUE_INITIAL 0x00080000 +#define ENQUEUE_RQ_SELECTED 0x00100000 =20 #define RETRY_TASK ((void *)-1UL) From nobody Sun Feb 8 04:31:06 2026 Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4A41F26A0D5; Mon, 6 Oct 2025 10:46:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.92.199 ARC-Seal: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1759747564; cv=none; b=iUaZYlTXouLJGtCr6O+ADvCCj14bgv3wS/OuxxEy8GvIOtb2ZIdKFGrMoDQDQ7Dz/dBWKZe9yhoe2d0cnJSaAK4PR7tq01czOXanrMwvEz1YUMIQiD+bEwcGeHbiQ/gZRwy6mWfjEPNCrzesxFwMtlZYyzXarNL5eCA8Oq+vWjw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759747564; c=relaxed/simple; bh=95IGUrX5ihHjj6Vu1vP5ruvAiYX5mwIcetO5Xw1Q9eI=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=CwLOH36RYVMvSGqebu3RnJl9DbEotcRXRT9oHo10IRPh5aPL+CWt1x9heT7+GyXWT720ZxIG3gVCh0jZIfC5buSHVA9eTkbInzkTwxAxbW7X/DHM2oyYlXhTwKl0OkHFz3GtPFdDNPP3lyvEw+U8G1i35aUUhrE0yb/MrkhHWtA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=hPW8CzD7; arc=none smtp.client-ip=90.155.92.199 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="hPW8CzD7" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=kgMb8FY5Ba7jk0kr0FcfqMV6B6P7o3R/ebzqbWBa26Y=; b=hPW8CzD7+VBryG7aAp4TmkWI6i LCrzlpE4TJWFpiVI9Vw7TB4/g30OyI5K4ROvHjfLXiN6wn3EwG4eBdH8qstyXp2IOT9c6l5TywCNs UbL7YzUPo4V6rAkK+xYdjY54CVQjPbWD3JXnTxL78BauQ8TvRJaWRrsGFP/1JDZE279vzR3fs26Qd Nj0uxw/4QAu2GskTtGBUBZhi0dhPofHEsYFfzAPwtPAK05QzDqQRtqBb+fOZVY4uVr0TQn/4EpvSh OoljFZUdg3wNI2KfJwo7kBpBNFxkYH2reW/g6BHR1cyr+X++ERfq+/t/v1Z6nZaKDwJtSXfPasjuJ I6R76mvQ==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by desiato.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1v5iiy-0000000Grdo-3KFb; Mon, 06 Oct 2025 10:45:49 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 0) id C7703300462; Mon, 06 Oct 2025 12:45:47 +0200 (CEST) Message-ID: <20251006104526.861755244@infradead.org> User-Agent: quilt/0.68 Date: Mon, 06 Oct 2025 12:44:05 +0200 From: Peter Zijlstra To: tj@kernel.org Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@kernel.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com, void@manifault.com, arighi@nvidia.com, changwoo@igalia.com, cgroups@vger.kernel.org, sched-ext@lists.linux.dev, liuwenfang@honor.com, tglx@linutronix.de Subject: [PATCH 03/12] sched: Fold sched_class::switch{ing,ed}_{to,from}() into the change pattern References: <20251006104402.946760805@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add {DE,EN}QUEUE_CLASS and fold the sched_class::switch* methods into the change pattern. This completes and makes the pattern more symmetric. This changes the order of callbacks slightly: | | switching_from() dequeue_task(); | dequeue_task() put_prev_task(); | put_prev_task() | switched_from() | ... 
change task ... | ... change task ... | switching_to(); | switching_to() enqueue_task(); | enqueue_task() set_next_task(); | set_next_task() prev_class->switched_from() | switched_to() | switched_to() | Notably, it moves the switched_from() callback right after the dequeue/put. Existing implementations don't appear to be affected by this change in location -- specifically the task isn't enqueued on the class in question in either location. Make (CLASS)^(SAVE|MOVE), because there is nothing to save-restore when changing scheduling classes. Signed-off-by: Peter Zijlstra (Intel) Acked-by: Tejun Heo Acked-by: Andrea Righi Acked-by: Juri Lelli Acked-by: Vincent Guittot for the serie Reviewed-by: Juri Lelli Reviewed-by: Valentin Schneider --- kernel/sched/core.c | 56 +++++++++++++++++++++---------------------= ----- kernel/sched/ext.c | 26 ++++++++++++++++----- kernel/sched/idle.c | 4 +-- kernel/sched/rt.c | 2 - kernel/sched/sched.h | 22 ++++++------------ kernel/sched/stop_task.c | 4 +-- kernel/sched/syscalls.c | 9 +++++-- 7 files changed, 66 insertions(+), 57 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2169,34 +2169,9 @@ inline int task_curr(const struct task_s return cpu_curr(task_cpu(p)) =3D=3D p; } =20 -/* - * ->switching_to() is called with the pi_lock and rq_lock held and must n= ot - * mess with locking. - */ -void check_class_changing(struct rq *rq, struct task_struct *p, - const struct sched_class *prev_class) +void check_prio_changed(struct rq *rq, struct task_struct *p, int oldprio) { - if (prev_class !=3D p->sched_class && p->sched_class->switching_to) - p->sched_class->switching_to(rq, p); -} - -/* - * switched_from, switched_to and prio_changed must _NOT_ drop rq->lock, - * use the balance_callback list if you want balancing. - * - * this means any call to check_class_changed() must be followed by a call= to - * balance_callback(). 
- */ -void check_class_changed(struct rq *rq, struct task_struct *p, - const struct sched_class *prev_class, - int oldprio) -{ - if (prev_class !=3D p->sched_class) { - if (prev_class->switched_from) - prev_class->switched_from(rq, p); - - p->sched_class->switched_to(rq, p); - } else if (oldprio !=3D p->prio || dl_task(p)) + if (oldprio !=3D p->prio || dl_task(p)) p->sched_class->prio_changed(rq, p, oldprio); } =20 @@ -7388,6 +7363,11 @@ void rt_mutex_setprio(struct task_struct prev_class =3D p->sched_class; next_class =3D __setscheduler_class(p->policy, prio); =20 + if (prev_class !=3D next_class) { + queue_flag |=3D DEQUEUE_CLASS; + queue_flag &=3D ~(DEQUEUE_SAVE | DEQUEUE_MOVE); + } + if (prev_class !=3D next_class && p->se.sched_delayed) dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_NOCLOCK); =20 @@ -7424,11 +7404,10 @@ void rt_mutex_setprio(struct task_struct =20 p->sched_class =3D next_class; p->prio =3D prio; - - check_class_changing(rq, p, prev_class); } =20 - check_class_changed(rq, p, prev_class, oldprio); + if (!(queue_flag & DEQUEUE_CLASS)) + check_prio_changed(rq, p, oldprio); out_unlock: /* Avoid rq from going away on us: */ preempt_disable(); @@ -10862,6 +10841,14 @@ struct sched_change_ctx *sched_change_be =20 lockdep_assert_rq_held(rq); =20 + if (flags & DEQUEUE_CLASS) { + if (WARN_ON_ONCE(flags & (DEQUEUE_SAVE | DEQUEUE_MOVE))) + flags &=3D ~(DEQUEUE_SAVE | DEQUEUE_MOVE); + + if (p->sched_class->switching_from) + p->sched_class->switching_from(rq, p); + } + *ctx =3D (struct sched_change_ctx){ .p =3D p, .flags =3D flags, @@ -10874,6 +10861,9 @@ struct sched_change_ctx *sched_change_be if (ctx->running) put_prev_task(rq, p); =20 + if ((flags & DEQUEUE_CLASS) && p->sched_class->switched_from) + p->sched_class->switched_from(rq, p); + return ctx; } =20 @@ -10884,8 +10874,14 @@ void sched_change_end(struct sched_chang =20 lockdep_assert_rq_held(rq); =20 + if ((ctx->flags & ENQUEUE_CLASS) && p->sched_class->switching_to) + p->sched_class->switching_to(rq, p); + if (ctx->queued) enqueue_task(rq, p, ctx->flags | ENQUEUE_NOCLOCK); if (ctx->running) set_next_task(rq, p); + + if ((ctx->flags & ENQUEUE_CLASS) && p->sched_class->switched_to) + p->sched_class->switched_to(rq, p); } --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -3912,21 +3912,28 @@ static void scx_disable_workfn(struct kt =20 scx_task_iter_start(&sti); while ((p =3D scx_task_iter_next_locked(&sti))) { + unsigned int queue_flags =3D DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLO= CK; const struct sched_class *old_class =3D p->sched_class; const struct sched_class *new_class =3D __setscheduler_class(p->policy, p->prio); =20 update_rq_clock(task_rq(p)); =20 + if (old_class !=3D new_class) { + queue_flags |=3D DEQUEUE_CLASS; + queue_flags &=3D ~(DEQUEUE_SAVE | DEQUEUE_MOVE); + } + if (old_class !=3D new_class && p->se.sched_delayed) dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_N= OCLOCK); =20 - scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOC= LOCK) { + scoped_guard (sched_change, p, queue_flags) { p->sched_class =3D new_class; - check_class_changing(task_rq(p), p, old_class); } =20 - check_class_changed(task_rq(p), p, old_class, p->prio); + if (!(queue_flags & DEQUEUE_CLASS)) + check_prio_changed(task_rq(p), p, p->prio); + scx_exit_task(p); } scx_task_iter_stop(&sti); @@ -4655,6 +4662,7 @@ static int scx_enable(struct sched_ext_o percpu_down_write(&scx_fork_rwsem); scx_task_iter_start(&sti); while ((p =3D scx_task_iter_next_locked(&sti))) { + unsigned int 
queue_flags =3D DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLO= CK; const struct sched_class *old_class =3D p->sched_class; const struct sched_class *new_class =3D __setscheduler_class(p->policy, p->prio); @@ -4664,16 +4672,22 @@ static int scx_enable(struct sched_ext_o =20 update_rq_clock(task_rq(p)); =20 + if (old_class !=3D new_class) { + queue_flags |=3D DEQUEUE_CLASS; + queue_flags &=3D ~(DEQUEUE_SAVE | DEQUEUE_MOVE); + } + if (old_class !=3D new_class && p->se.sched_delayed) dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_N= OCLOCK); =20 - scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOC= LOCK) { + scoped_guard (sched_change, p, queue_flags) { p->scx.slice =3D SCX_SLICE_DFL; p->sched_class =3D new_class; - check_class_changing(task_rq(p), p, old_class); } =20 - check_class_changed(task_rq(p), p, old_class, p->prio); + if (!(queue_flags & DEQUEUE_CLASS)) + check_prio_changed(task_rq(p), p, p->prio); + put_task_struct(p); } scx_task_iter_stop(&sti); --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -498,7 +498,7 @@ static void task_tick_idle(struct rq *rq { } =20 -static void switched_to_idle(struct rq *rq, struct task_struct *p) +static void switching_to_idle(struct rq *rq, struct task_struct *p) { BUG(); } @@ -536,6 +536,6 @@ DEFINE_SCHED_CLASS(idle) =3D { .task_tick =3D task_tick_idle, =20 .prio_changed =3D prio_changed_idle, - .switched_to =3D switched_to_idle, + .switching_to =3D switching_to_idle, .update_curr =3D update_curr_idle, }; --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -2589,8 +2589,8 @@ DEFINE_SCHED_CLASS(rt) =3D { =20 .get_rr_interval =3D get_rr_interval_rt, =20 - .prio_changed =3D prio_changed_rt, .switched_to =3D switched_to_rt, + .prio_changed =3D prio_changed_rt, =20 .update_curr =3D update_curr_rt, =20 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -20,7 +20,6 @@ #include #include #include - #include #include #include @@ -2369,6 +2368,7 @@ extern const u32 sched_prio_to_wmult[40 =20 #define DEQUEUE_MIGRATING 0x0010 /* Matches ENQUEUE_MIGRATING */ #define DEQUEUE_DELAYED 0x0020 /* Matches ENQUEUE_DELAYED */ +#define DEQUEUE_CLASS 0x0040 /* Matches ENQUEUE_CLASS */ =20 #define DEQUEUE_SPECIAL 0x00010000 #define DEQUEUE_THROTTLE 0x00020000 @@ -2380,6 +2380,7 @@ extern const u32 sched_prio_to_wmult[40 =20 #define ENQUEUE_MIGRATING 0x0010 #define ENQUEUE_DELAYED 0x0020 +#define ENQUEUE_CLASS 0x0040 =20 #define ENQUEUE_HEAD 0x00010000 #define ENQUEUE_REPLENISH 0x00020000 @@ -2443,14 +2444,11 @@ struct sched_class { void (*task_fork)(struct task_struct *p); void (*task_dead)(struct task_struct *p); =20 - /* - * The switched_from() call is allowed to drop rq->lock, therefore we - * cannot assume the switched_from/switched_to pair is serialized by - * rq->lock. They are however serialized by p->pi_lock. 
- */ - void (*switching_to) (struct rq *this_rq, struct task_struct *task); - void (*switched_from)(struct rq *this_rq, struct task_struct *task); - void (*switched_to) (struct rq *this_rq, struct task_struct *task); + void (*switching_from)(struct rq *this_rq, struct task_struct *task); + void (*switched_from) (struct rq *this_rq, struct task_struct *task); + void (*switching_to) (struct rq *this_rq, struct task_struct *task); + void (*switched_to) (struct rq *this_rq, struct task_struct *task); + void (*reweight_task)(struct rq *this_rq, struct task_struct *task, const struct load_weight *lw); void (*prio_changed) (struct rq *this_rq, struct task_struct *task, @@ -3879,11 +3877,7 @@ extern void set_load_weight(struct task_ extern void enqueue_task(struct rq *rq, struct task_struct *p, int flags); extern bool dequeue_task(struct rq *rq, struct task_struct *p, int flags); =20 -extern void check_class_changing(struct rq *rq, struct task_struct *p, - const struct sched_class *prev_class); -extern void check_class_changed(struct rq *rq, struct task_struct *p, - const struct sched_class *prev_class, - int oldprio); +extern void check_prio_changed(struct rq *rq, struct task_struct *p, int o= ldprio); =20 extern struct balance_callback *splice_balance_callbacks(struct rq *rq); extern void balance_callbacks(struct rq *rq, struct balance_callback *head= ); --- a/kernel/sched/stop_task.c +++ b/kernel/sched/stop_task.c @@ -75,7 +75,7 @@ static void task_tick_stop(struct rq *rq { } =20 -static void switched_to_stop(struct rq *rq, struct task_struct *p) +static void switching_to_stop(struct rq *rq, struct task_struct *p) { BUG(); /* its impossible to change to this class */ } @@ -112,6 +112,6 @@ DEFINE_SCHED_CLASS(stop) =3D { .task_tick =3D task_tick_stop, =20 .prio_changed =3D prio_changed_stop, - .switched_to =3D switched_to_stop, + .switching_to =3D switching_to_stop, .update_curr =3D update_curr_stop, }; --- a/kernel/sched/syscalls.c +++ b/kernel/sched/syscalls.c @@ -684,6 +684,11 @@ int __sched_setscheduler(struct task_str prev_class =3D p->sched_class; next_class =3D __setscheduler_class(policy, newprio); =20 + if (prev_class !=3D next_class) { + queue_flags |=3D DEQUEUE_CLASS; + queue_flags &=3D ~(DEQUEUE_SAVE | DEQUEUE_MOVE); + } + if (prev_class !=3D next_class && p->se.sched_delayed) dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_NOCLOCK); =20 @@ -695,7 +700,6 @@ int __sched_setscheduler(struct task_str p->prio =3D newprio; } __setscheduler_uclamp(p, attr); - check_class_changing(rq, p, prev_class); =20 if (scope->queued) { /* @@ -707,7 +711,8 @@ int __sched_setscheduler(struct task_str } } =20 - check_class_changed(rq, p, prev_class, oldprio); + if (!(queue_flags & DEQUEUE_CLASS)) + check_prio_changed(rq, p, oldprio); =20 /* Avoid rq from going away on us: */ preempt_disable(); From nobody Sun Feb 8 04:31:06 2026 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EFF752627FC; Mon, 6 Oct 2025 10:46:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.50.34 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759747562; cv=none; b=WoGM0x3Q38NqXE9fT5sMdL/FIfnvhQtQGJMKCzKuduip3HOHdD2iLI65zsKA295dCDHU6UEv94/CwnbcpNuCp8uRSPaFDbuLDjJT6AECENYqpux0xGktiIeymfg5JhCjv5FWa2UmIeaGed+VWodvMys15Zcx2H5PJIktha9wr3M= ARC-Message-Signature: i=1; 
Message-ID: <20251006104526.964100769@infradead.org>
User-Agent: quilt/0.68
Date: Mon, 06 Oct 2025 12:44:06 +0200
From: Peter Zijlstra
To: tj@kernel.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@kernel.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com, void@manifault.com, arighi@nvidia.com, changwoo@igalia.com, cgroups@vger.kernel.org, sched-ext@lists.linux.dev, liuwenfang@honor.com, tglx@linutronix.de
Subject: [PATCH 04/12] sched: Cleanup sched_delayed handling for class switches
References: <20251006104402.946760805@infradead.org>

Use the new sched_class::switching_from() method to dequeue delayed
tasks before switching to another class.
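For illustration only (condensed from the core.c, ext.c and fair.c hunks
below, not an additional change): the delayed-dequeue that callers used
to open-code ahead of the change pattern,

	if (old_class != new_class && p->se.sched_delayed)
		dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_NOCLOCK);

	scoped_guard (sched_change, p, queue_flags) {
		...
	}

now lives with the class itself and is run from sched_change_begin() via
the ->switching_from() callback:

	static void switching_from_fair(struct rq *rq, struct task_struct *p)
	{
		if (p->se.sched_delayed)
			dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_NOCLOCK);
	}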
Signed-off-by: Peter Zijlstra (Intel) Acked-by: Tejun Heo Acked-by: Andrea Righi Acked-by: Juri Lelli Acked-by: Vincent Guittot for the serie Reviewed-by: Juri Lelli Reviewed-by: Valentin Schneider Reviewed-by: Vincent Guittot --- kernel/sched/core.c | 12 ++++++++---- kernel/sched/ext.c | 6 ------ kernel/sched/fair.c | 7 +++++++ kernel/sched/syscalls.c | 3 --- 4 files changed, 15 insertions(+), 13 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -7368,9 +7368,6 @@ void rt_mutex_setprio(struct task_struct queue_flag &=3D ~(DEQUEUE_SAVE | DEQUEUE_MOVE); } =20 - if (prev_class !=3D next_class && p->se.sched_delayed) - dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_NOCLOCK); - scoped_guard (sched_change, p, queue_flag) { /* * Boosting condition are: @@ -10845,8 +10842,15 @@ struct sched_change_ctx *sched_change_be if (WARN_ON_ONCE(flags & (DEQUEUE_SAVE | DEQUEUE_MOVE))) flags &=3D ~(DEQUEUE_SAVE | DEQUEUE_MOVE); =20 - if (p->sched_class->switching_from) + if (p->sched_class->switching_from) { + /* + * switching_from_fair() assumes CLASS implies NOCLOCK; + * fixing this assumption would mean switching_from() + * would need to be able to change flags. + */ + WARN_ON(!(flags & DEQUEUE_NOCLOCK)); p->sched_class->switching_from(rq, p); + } } =20 *ctx =3D (struct sched_change_ctx){ --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -3924,9 +3924,6 @@ static void scx_disable_workfn(struct kt queue_flags &=3D ~(DEQUEUE_SAVE | DEQUEUE_MOVE); } =20 - if (old_class !=3D new_class && p->se.sched_delayed) - dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_N= OCLOCK); - scoped_guard (sched_change, p, queue_flags) { p->sched_class =3D new_class; } @@ -4677,9 +4674,6 @@ static int scx_enable(struct sched_ext_o queue_flags &=3D ~(DEQUEUE_SAVE | DEQUEUE_MOVE); } =20 - if (old_class !=3D new_class && p->se.sched_delayed) - dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_N= OCLOCK); - scoped_guard (sched_change, p, queue_flags) { p->scx.slice =3D SCX_SLICE_DFL; p->sched_class =3D new_class; --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -13237,6 +13237,12 @@ static void attach_task_cfs_rq(struct ta attach_entity_cfs_rq(se); } =20 +static void switching_from_fair(struct rq *rq, struct task_struct *p) +{ + if (p->se.sched_delayed) + dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_NOCLOCK); +} + static void switched_from_fair(struct rq *rq, struct task_struct *p) { detach_task_cfs_rq(p); @@ -13638,6 +13644,7 @@ DEFINE_SCHED_CLASS(fair) =3D { =20 .reweight_task =3D reweight_task_fair, .prio_changed =3D prio_changed_fair, + .switching_from =3D switching_from_fair, .switched_from =3D switched_from_fair, .switched_to =3D switched_to_fair, =20 --- a/kernel/sched/syscalls.c +++ b/kernel/sched/syscalls.c @@ -689,9 +689,6 @@ int __sched_setscheduler(struct task_str queue_flags &=3D ~(DEQUEUE_SAVE | DEQUEUE_MOVE); } =20 - if (prev_class !=3D next_class && p->se.sched_delayed) - dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_NOCLOCK); - scoped_guard (sched_change, p, queue_flags) { =20 if (!(attr->sched_flags & SCHED_FLAG_KEEP_PARAMS)) { From nobody Sun Feb 8 04:31:06 2026 Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4BAA826A0F8; Mon, 6 Oct 2025 10:46:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none 
Message-ID: <20251006104527.083607521@infradead.org>
User-Agent: quilt/0.68
Date: Mon, 06 Oct 2025 12:44:07 +0200
From: Peter Zijlstra
To: tj@kernel.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@kernel.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com, void@manifault.com, arighi@nvidia.com, changwoo@igalia.com, cgroups@vger.kernel.org, sched-ext@lists.linux.dev, liuwenfang@honor.com, tglx@linutronix.de
Subject: [PATCH 05/12] sched: Move sched_class::prio_changed() into the change pattern
References: <20251006104402.946760805@infradead.org>

Move sched_class::prio_changed() into the change pattern. And while
there, extend it with sched_class::get_prio() in order to fix the
deadline situation.
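For illustration only (condensed from the core.c and deadline.c hunks
below, not an additional change): sched_change_begin() now snapshots the
old priority -- using ->get_prio() where plain p->prio is not enough, as
for deadline tasks -- and sched_change_end() reports it back through
->prio_changed():

	/* sched_change_begin() */
	if (!(flags & DEQUEUE_CLASS)) {
		if (p->sched_class->get_prio)
			ctx->prio = p->sched_class->get_prio(rq, p);
		else
			ctx->prio = p->prio;
	}

	/* sched_change_end() */
	if (ctx->flags & ENQUEUE_CLASS) {
		if (p->sched_class->switched_to)
			p->sched_class->switched_to(rq, p);
	} else {
		p->sched_class->prio_changed(rq, p, ctx->prio);
	}

This is what allows prio_changed_dl() to compare the old and new deadline
values instead of having to guess whether the priority went up or down.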
Suggested-by: Tejun Heo Signed-off-by: Peter Zijlstra (Intel) Acked-by: Tejun Heo Acked-by: Andrea Righi Acked-by: Juri Lelli Acked-by: Vincent Guittot for the serie Reviewed-by: Juri Lelli Reviewed-by: Valentin Schneider --- kernel/sched/core.c | 24 +++++++++++++----------- kernel/sched/deadline.c | 20 +++++++++++--------- kernel/sched/ext.c | 8 +------- kernel/sched/fair.c | 8 ++++++-- kernel/sched/idle.c | 5 ++++- kernel/sched/rt.c | 5 ++++- kernel/sched/sched.h | 7 ++++--- kernel/sched/stop_task.c | 5 ++++- kernel/sched/syscalls.c | 9 --------- 9 files changed, 47 insertions(+), 44 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2169,12 +2169,6 @@ inline int task_curr(const struct task_s return cpu_curr(task_cpu(p)) =3D=3D p; } =20 -void check_prio_changed(struct rq *rq, struct task_struct *p, int oldprio) -{ - if (oldprio !=3D p->prio || dl_task(p)) - p->sched_class->prio_changed(rq, p, oldprio); -} - void wakeup_preempt(struct rq *rq, struct task_struct *p, int flags) { struct task_struct *donor =3D rq->donor; @@ -7402,9 +7396,6 @@ void rt_mutex_setprio(struct task_struct p->sched_class =3D next_class; p->prio =3D prio; } - - if (!(queue_flag & DEQUEUE_CLASS)) - check_prio_changed(rq, p, oldprio); out_unlock: /* Avoid rq from going away on us: */ preempt_disable(); @@ -10860,6 +10851,13 @@ struct sched_change_ctx *sched_change_be .running =3D task_current(rq, p), }; =20 + if (!(flags & DEQUEUE_CLASS)) { + if (p->sched_class->get_prio) + ctx->prio =3D p->sched_class->get_prio(rq, p); + else + ctx->prio =3D p->prio; + } + if (ctx->queued) dequeue_task(rq, p, flags); if (ctx->running) @@ -10886,6 +10884,10 @@ void sched_change_end(struct sched_chang if (ctx->running) set_next_task(rq, p); =20 - if ((ctx->flags & ENQUEUE_CLASS) && p->sched_class->switched_to) - p->sched_class->switched_to(rq, p); + if (ctx->flags & ENQUEUE_CLASS) { + if (p->sched_class->switched_to) + p->sched_class->switched_to(rq, p); + } else { + p->sched_class->prio_changed(rq, p, ctx->prio); + } } --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -3042,23 +3042,24 @@ static void switched_to_dl(struct rq *rq } } =20 +static u64 get_prio_dl(struct rq *rq, struct task_struct *p) +{ + return p->dl.deadline; +} + /* * If the scheduling parameters of a -deadline task changed, * a push or pull operation might be needed. */ -static void prio_changed_dl(struct rq *rq, struct task_struct *p, - int oldprio) +static void prio_changed_dl(struct rq *rq, struct task_struct *p, u64 old_= deadline) { if (!task_on_rq_queued(p)) return; =20 - /* - * This might be too much, but unfortunately - * we don't have the old deadline value, and - * we can't argue if the task is increasing - * or lowering its prio, so... 
- */ - if (!rq->dl.overloaded) + if (p->dl.deadline =3D=3D old_deadline) + return; + + if (dl_time_before(old_deadline, p->dl.deadline)) deadline_queue_pull_task(rq); =20 if (task_current_donor(rq, p)) { @@ -3113,6 +3114,7 @@ DEFINE_SCHED_CLASS(dl) =3D { .task_tick =3D task_tick_dl, .task_fork =3D task_fork_dl, =20 + .get_prio =3D get_prio_dl, .prio_changed =3D prio_changed_dl, .switched_from =3D switched_from_dl, .switched_to =3D switched_to_dl, --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -2961,7 +2961,7 @@ static void reweight_task_scx(struct rq p, p->scx.weight); } =20 -static void prio_changed_scx(struct rq *rq, struct task_struct *p, int old= prio) +static void prio_changed_scx(struct rq *rq, struct task_struct *p, u64 old= prio) { } =20 @@ -3928,9 +3928,6 @@ static void scx_disable_workfn(struct kt p->sched_class =3D new_class; } =20 - if (!(queue_flags & DEQUEUE_CLASS)) - check_prio_changed(task_rq(p), p, p->prio); - scx_exit_task(p); } scx_task_iter_stop(&sti); @@ -4679,9 +4676,6 @@ static int scx_enable(struct sched_ext_o p->sched_class =3D new_class; } =20 - if (!(queue_flags & DEQUEUE_CLASS)) - check_prio_changed(task_rq(p), p, p->prio); - put_task_struct(p); } scx_task_iter_stop(&sti); --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -13138,11 +13138,14 @@ static void task_fork_fair(struct task_s * the current task. */ static void -prio_changed_fair(struct rq *rq, struct task_struct *p, int oldprio) +prio_changed_fair(struct rq *rq, struct task_struct *p, u64 oldprio) { if (!task_on_rq_queued(p)) return; =20 + if (p->prio =3D=3D oldprio) + return; + if (rq->cfs.nr_queued =3D=3D 1) return; =20 @@ -13154,8 +13157,9 @@ prio_changed_fair(struct rq *rq, struct if (task_current_donor(rq, p)) { if (p->prio > oldprio) resched_curr(rq); - } else + } else { wakeup_preempt(rq, p, 0); + } } =20 #ifdef CONFIG_FAIR_GROUP_SCHED --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -504,8 +504,11 @@ static void switching_to_idle(struct rq } =20 static void -prio_changed_idle(struct rq *rq, struct task_struct *p, int oldprio) +prio_changed_idle(struct rq *rq, struct task_struct *p, u64 oldprio) { + if (p->prio =3D=3D oldprio) + return; + BUG(); } =20 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -2437,11 +2437,14 @@ static void switched_to_rt(struct rq *rq * us to initiate a push or pull. 
*/ static void -prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio) +prio_changed_rt(struct rq *rq, struct task_struct *p, u64 oldprio) { if (!task_on_rq_queued(p)) return; =20 + if (p->prio =3D=3D oldprio) + return; + if (task_current_donor(rq, p)) { /* * If our priority decreases while running, we --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2451,8 +2451,10 @@ struct sched_class { =20 void (*reweight_task)(struct rq *this_rq, struct task_struct *task, const struct load_weight *lw); + + u64 (*get_prio) (struct rq *this_rq, struct task_struct *task); void (*prio_changed) (struct rq *this_rq, struct task_struct *task, - int oldprio); + u64 oldprio); =20 unsigned int (*get_rr_interval)(struct rq *rq, struct task_struct *task); @@ -3877,12 +3879,11 @@ extern void set_load_weight(struct task_ extern void enqueue_task(struct rq *rq, struct task_struct *p, int flags); extern bool dequeue_task(struct rq *rq, struct task_struct *p, int flags); =20 -extern void check_prio_changed(struct rq *rq, struct task_struct *p, int o= ldprio); - extern struct balance_callback *splice_balance_callbacks(struct rq *rq); extern void balance_callbacks(struct rq *rq, struct balance_callback *head= ); =20 struct sched_change_ctx { + u64 prio; struct task_struct *p; int flags; bool queued; --- a/kernel/sched/stop_task.c +++ b/kernel/sched/stop_task.c @@ -81,8 +81,11 @@ static void switching_to_stop(struct rq } =20 static void -prio_changed_stop(struct rq *rq, struct task_struct *p, int oldprio) +prio_changed_stop(struct rq *rq, struct task_struct *p, u64 oldprio) { + if (p->prio =3D=3D oldprio) + return; + BUG(); /* how!?, what priority? */ } =20 --- a/kernel/sched/syscalls.c +++ b/kernel/sched/syscalls.c @@ -95,12 +95,6 @@ void set_user_nice(struct task_struct *p old_prio =3D p->prio; p->prio =3D effective_prio(p); } - - /* - * If the task increased its priority or is running and - * lowered its priority, then reschedule its CPU: - */ - p->sched_class->prio_changed(rq, p, old_prio); } EXPORT_SYMBOL(set_user_nice); =20 @@ -708,9 +702,6 @@ int __sched_setscheduler(struct task_str } } =20 - if (!(queue_flags & DEQUEUE_CLASS)) - check_prio_changed(rq, p, oldprio); - /* Avoid rq from going away on us: */ preempt_disable(); head =3D splice_balance_callbacks(rq); From nobody Sun Feb 8 04:31:06 2026 Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4A87026A0EB; Mon, 6 Oct 2025 10:46:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.92.199 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759747564; cv=none; b=oLF22OoLU3tUhDdQR8FFjRCbC+1e0VDJKjVeXBEpY6Q3hLy4+i6udZ/dKRRwHkhZh3zAnXIVrXcVcUm1uzs4sa5QFc2f91+QzFNquc1KaiBKuitcWEnYeGI5b/5DisAVb3n0q3OCmp+oVt/UaNTNSKcTLayq5vqP5nUO34NlBhs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759747564; c=relaxed/simple; bh=gTI742AaE0sWs/CFpbysRhf9OccN75qridDrxhx998o=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=JXOlpiJuob6Yx2b22t6V47vaPJLuI4uiL7FXGR/7WGTEllKSHfmPCSU7NxUikJ9GQ5aRk7tdqScrGAS8lT1Sc42/vgJkZ2jVDaGAVNsfpgc+hAaM8FfHVKX2d4Mu+1e9AO2qs+QISKo8jy+VPzr0iwMgdhU7Dz4rSyMwj+GjoQs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass 
(2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=gIxJnzUR; arc=none smtp.client-ip=90.155.92.199 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="gIxJnzUR" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=s73925o8F4ztmgHRHrIqf3eXQDX1t2LjryoHd9LMTzQ=; b=gIxJnzURgvag+QCVNMyydUloom AIZ16CmLkc2+tTNLSDN8DUEiM9FrbHg7vOXi0KpEoMORL/3/0a99n5EYj5jbwPwoyJ05m+mbzyf+s qiIzu+VbjkYBy2HVV8Z7CEqZeGYuiVbBiRDlKZAoHY5+bo18TQ6s36SNGxB4nb5hSrdnLbC74qtZk 1uA3JeuwnJ1SWnpKxXLikFKoXUGnjNWYCd/1lIuhp1jhTYTuVbUqXiMslpYYoKHMGT7Y5oSMp0A8u oaf6EM80PnNb9DhlzvVsk6L/PnHkUqxI/GPjPNjP88miVl91GS4DsYnGzLdcPcHktjpU2vIvZmEN7 yWb+8QNA==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by desiato.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1v5iiz-0000000Grdr-0EgO; Mon, 06 Oct 2025 10:45:49 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 0) id D5158301A54; Mon, 06 Oct 2025 12:45:47 +0200 (CEST) Message-ID: <20251006104527.202601024@infradead.org> User-Agent: quilt/0.68 Date: Mon, 06 Oct 2025 12:44:08 +0200 From: Peter Zijlstra To: tj@kernel.org Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@kernel.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com, void@manifault.com, arighi@nvidia.com, changwoo@igalia.com, cgroups@vger.kernel.org, sched-ext@lists.linux.dev, liuwenfang@honor.com, tglx@linutronix.de Subject: [PATCH 06/12] sched: Fix migrate_disable_switch() locking References: <20251006104402.946760805@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" For some reason migrate_disable_switch() was more complicated than it needs to be, resulting in mind bending locking of dubious quality. Recognise that migrate_disable_switch() must be called before a context switch, but any place before that switch is equally good. Since the current place results in troubled locking, simply move the thing before taking rq->lock. Signed-off-by: Peter Zijlstra (Intel) Acked-by: Andrea Righi Acked-by: Juri Lelli Acked-by: Tejun Heo Acked-by: Vincent Guittot for the serie Reviewed-by: Juri Lelli Reviewed-by: Valentin Schneider --- kernel/sched/core.c | 27 ++++++--------------------- 1 file changed, 6 insertions(+), 21 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2346,10 +2346,10 @@ static void migrate_disable_switch(struc if (p->cpus_ptr !=3D &p->cpus_mask) return; =20 - /* - * Violates locking rules! See comment in __do_set_cpus_allowed(). 
- */ - __do_set_cpus_allowed(p, &ac); + scoped_guard (task_rq_lock, p) { + update_rq_clock(scope.rq); + __do_set_cpus_allowed(p, &ac); + } } =20 void ___migrate_enable(void) @@ -2667,22 +2667,7 @@ __do_set_cpus_allowed(struct task_struct struct rq *rq =3D task_rq(p); bool queued, running; =20 - /* - * This here violates the locking rules for affinity, since we're only - * supposed to change these variables while holding both rq->lock and - * p->pi_lock. - * - * HOWEVER, it magically works, because ttwu() is the only code that - * accesses these variables under p->pi_lock and only does so after - * smp_cond_load_acquire(&p->on_cpu, !VAL), and we're in __schedule() - * before finish_task(). - * - * XXX do further audits, this smells like something putrid. - */ - if (ctx->flags & SCA_MIGRATE_DISABLE) - WARN_ON_ONCE(!p->on_cpu); - else - lockdep_assert_held(&p->pi_lock); + lockdep_assert_held(&p->pi_lock); =20 queued =3D task_on_rq_queued(p); running =3D task_current_donor(rq, p); @@ -6781,6 +6766,7 @@ static void __sched notrace __schedule(i =20 local_irq_disable(); rcu_note_context_switch(preempt); + migrate_disable_switch(rq, prev); =20 /* * Make sure that signal_pending_state()->signal_pending() below @@ -6887,7 +6873,6 @@ static void __sched notrace __schedule(i */ ++*switch_count; =20 - migrate_disable_switch(rq, prev); psi_account_irqtime(rq, prev, next); psi_sched_switch(prev, next, !task_on_rq_queued(prev) || prev->se.sched_delayed); From nobody Sun Feb 8 04:31:06 2026 Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4A25926A08F; Mon, 6 Oct 2025 10:46:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.92.199 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759747564; cv=none; b=VQ9L5/unNwCy0pVtsO9/sIyiHqXY5lNAXGWvcS5vJokt6a5bxxobk23dS2yeDYrtMFyBoxgJXCWX8J/gczY/DgaGWF9QiRBP2t71eCnclxkLDlwxRmHzSLni10PKn3CwpUyCqWomUE53ZRKtDRAUpki1tr4bFvQloRvkkfetgTM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759747564; c=relaxed/simple; bh=F1C+9QHNtIdrMXPFJNmdK9DTUeQQgH/b3wgeCQqewdw=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=USQMkNwSMFBfYB3pa40E9HB1IhmwaS85GVxToumOpkd/RNtGfUiqZs70HvZrxL/8Z4o0tw949GXk5yM2PaPZJBhCUBAyTcrWq4moJzt0N2jr0VlqrfbX264H8FFWiZuydKiOuCN5EFrCy7lkzd5yXvyVFnsltUC5wAgjpx2eaWs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=G09lvud0; arc=none smtp.client-ip=90.155.92.199 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="G09lvud0" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=M59h4os1VP7Nr1E0HNWCErqmH/RtIRWZRp8A3WSdowA=; b=G09lvud0qqJNIfTPOJw5icFoKr 
6tGZ6DJtcursvko8i3sH/LXuRTykAPBIGcVG8FJNuoKWczPV2SluqBesOJgXrsxUrgRJ7Itr2pVoS HWmdhNkS6RiPG4LmRVLdFruQ5EYVhU85sqNQe+Sq7m5Cx/y+xfSCyp8Aw/trgLsUS9xGsF+gTciee xZvordzYv8W/rqKeps+5B2FcR//PHppoQxvAF4mGICJR52eW8BQoszy4NkT0+xI3cfp46XsYWAoVd 1CBTceQ5OKg2SerE6HxfG9pO9+tHt6jhk64hPjRGuae7FCBMgAtZiUW9aa2YcuFWCtAu8FAdsLT8Z BT4qSpKA==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by desiato.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1v5iiz-0000000Grdp-0C9f; Mon, 06 Oct 2025 10:45:49 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 0) id D9D21302186; Mon, 06 Oct 2025 12:45:47 +0200 (CEST) Message-ID: <20251006104527.331463972@infradead.org> User-Agent: quilt/0.68 Date: Mon, 06 Oct 2025 12:44:09 +0200 From: Peter Zijlstra To: tj@kernel.org Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@kernel.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com, void@manifault.com, arighi@nvidia.com, changwoo@igalia.com, cgroups@vger.kernel.org, sched-ext@lists.linux.dev, liuwenfang@honor.com, tglx@linutronix.de Subject: [PATCH 07/12] sched: Fix do_set_cpus_allowed() locking References: <20251006104402.946760805@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" All callers of do_set_cpus_allowed() only take p->pi_lock, which is not sufficient to actually change the cpumask. Again, this is mostly ok in these cases, but it results in unnecessarily complicated reasoning. Furthermore, there is no reason what so ever to not just take all the required locks, so do just that. Signed-off-by: Peter Zijlstra (Intel) Acked-by: Andrea Righi Acked-by: Juri Lelli Acked-by: Tejun Heo Acked-by: Vincent Guittot for the serie Reviewed-by: Juri Lelli Reviewed-by: Valentin Schneider --- kernel/kthread.c | 15 +++++---------- kernel/sched/core.c | 21 +++++++-------------- kernel/sched/sched.h | 5 +++++ 3 files changed, 17 insertions(+), 24 deletions(-) --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -593,18 +593,16 @@ EXPORT_SYMBOL(kthread_create_on_node); =20 static void __kthread_bind_mask(struct task_struct *p, const struct cpumas= k *mask, unsigned int state) { - unsigned long flags; - if (!wait_task_inactive(p, state)) { WARN_ON(1); return; } =20 + scoped_guard (raw_spinlock_irqsave, &p->pi_lock) + do_set_cpus_allowed(p, mask); + /* It's safe because the task is inactive. */ - raw_spin_lock_irqsave(&p->pi_lock, flags); - do_set_cpus_allowed(p, mask); p->flags |=3D PF_NO_SETAFFINITY; - raw_spin_unlock_irqrestore(&p->pi_lock, flags); } =20 static void __kthread_bind(struct task_struct *p, unsigned int cpu, unsign= ed int state) @@ -857,7 +855,6 @@ int kthread_affine_preferred(struct task { struct kthread *kthread =3D to_kthread(p); cpumask_var_t affinity; - unsigned long flags; int ret =3D 0; =20 if (!wait_task_inactive(p, TASK_UNINTERRUPTIBLE) || kthread->started) { @@ -882,10 +879,8 @@ int kthread_affine_preferred(struct task list_add_tail(&kthread->hotplug_node, &kthreads_hotplug); kthread_fetch_affinity(kthread, affinity); =20 - /* It's safe because the task is inactive. 
*/ - raw_spin_lock_irqsave(&p->pi_lock, flags); - do_set_cpus_allowed(p, affinity); - raw_spin_unlock_irqrestore(&p->pi_lock, flags); + scoped_guard (raw_spinlock_irqsave, &p->pi_lock) + do_set_cpus_allowed(p, affinity); =20 mutex_unlock(&kthreads_hotplug_lock); out: --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2668,18 +2668,14 @@ __do_set_cpus_allowed(struct task_struct bool queued, running; =20 lockdep_assert_held(&p->pi_lock); + lockdep_assert_rq_held(rq); =20 queued =3D task_on_rq_queued(p); running =3D task_current_donor(rq, p); =20 - if (queued) { - /* - * Because __kthread_bind() calls this on blocked tasks without - * holding rq->lock. - */ - lockdep_assert_rq_held(rq); + if (queued) dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK); - } + if (running) put_prev_task(rq, p); =20 @@ -2708,7 +2704,10 @@ void do_set_cpus_allowed(struct task_str struct rcu_head rcu; }; =20 - __do_set_cpus_allowed(p, &ac); + scoped_guard (__task_rq_lock, p) { + update_rq_clock(scope.rq); + __do_set_cpus_allowed(p, &ac); + } =20 /* * Because this is called with p->pi_lock held, it is not possible @@ -3483,12 +3482,6 @@ static int select_fallback_rq(int cpu, s } fallthrough; case possible: - /* - * XXX When called from select_task_rq() we only - * hold p->pi_lock and again violate locking order. - * - * More yuck to audit. - */ do_set_cpus_allowed(p, task_cpu_fallback_mask(p)); state =3D fail; break; --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1847,6 +1847,11 @@ DEFINE_LOCK_GUARD_1(task_rq_lock, struct task_rq_unlock(_T->rq, _T->lock, &_T->rf), struct rq *rq; struct rq_flags rf) =20 +DEFINE_LOCK_GUARD_1(__task_rq_lock, struct task_struct, + _T->rq =3D __task_rq_lock(_T->lock, &_T->rf), + __task_rq_unlock(_T->rq, &_T->rf), + struct rq *rq; struct rq_flags rf) + static inline void rq_lock_irqsave(struct rq *rq, struct rq_flags *rf) __acquires(rq->lock) { From nobody Sun Feb 8 04:31:06 2026 Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4A31326A0A7; Mon, 6 Oct 2025 10:46:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.92.199 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759747564; cv=none; b=GDj8g3EdaMMQ5KbLfVytO85UAqa6MYAcf+UJtGK81MUiQQqifokRhLIqC/+3vvmepbeUk/QHe2pkN3Ugw33a5Nwr2PB0az5NHD1f6kpn9cCZscV/9s7GYhB0ntwNIKVVH6OFUeFalVRn0TS5GrutkO2VHZDURofm/yM8IIgmuzw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759747564; c=relaxed/simple; bh=/WwwZQpNkiY7Vdc4Weg9Fg9nNoU6Mnp6DBjM+eZYf8I=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=ABDiOsoP6n9R/MLRfFwQ0qIRLOz5kC1aEG5xJ3uetek0NvZgvbALKYEm77IPlqg+tTDQJkbWMPHtSprkuAFoWrxQYNkP/G+CyWBGiFScKWolMWmiyoLEpVSIwOck15T70eBgGbMsTlfQ2nAN8ah9T4X4rVJi6BKzOq22dDUVOTY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=oiOtLHZM; arc=none smtp.client-ip=90.155.92.199 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org 
Message-ID: <20251006104527.451083206@infradead.org>
User-Agent: quilt/0.68
Date: Mon, 06 Oct 2025 12:44:10 +0200
From: Peter Zijlstra
To: tj@kernel.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@kernel.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com, void@manifault.com, arighi@nvidia.com, changwoo@igalia.com, cgroups@vger.kernel.org, sched-ext@lists.linux.dev, liuwenfang@honor.com, tglx@linutronix.de
Subject: [PATCH 08/12] sched: Rename do_set_cpus_allowed()
References: <20251006104402.946760805@infradead.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Hopefully saner naming.
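For illustration only (not part of the patch): the intended split between the two interfaces after the rename, condensed from the header comment and call sites in the diff below. p and new_mask stand for whatever task and mask the caller has; locking is as at those call sites.

	/* Normal affinity change: validated against the user's mask, may fail. */
	set_cpus_allowed_ptr(p, new_mask);

	/*
	 * Forced change, as used by kthread_bind() and select_fallback_rq():
	 * overrides and destroys any user-set affinity; caller holds p->pi_lock.
	 */
	scoped_guard (raw_spinlock_irqsave, &p->pi_lock)
		set_cpus_allowed_force(p, new_mask);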
Signed-off-by: Peter Zijlstra (Intel) Acked-by: Andrea Righi Acked-by: Juri Lelli Acked-by: Tejun Heo Acked-by: Vincent Guittot for the serie Reviewed-by: Juri Lelli Reviewed-by: Valentin Schneider --- include/linux/sched.h | 4 ++-- kernel/cgroup/cpuset.c | 2 +- kernel/kthread.c | 4 ++-- kernel/sched/core.c | 16 ++++++++-------- kernel/sched/sched.h | 2 +- 5 files changed, 14 insertions(+), 14 deletions(-) --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1861,8 +1861,8 @@ extern int task_can_attach(struct task_s extern int dl_bw_alloc(int cpu, u64 dl_bw); extern void dl_bw_free(int cpu, u64 dl_bw); =20 -/* do_set_cpus_allowed() - consider using set_cpus_allowed_ptr() instead */ -extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumas= k *new_mask); +/* set_cpus_allowed_force() - consider using set_cpus_allowed_ptr() instea= d */ +extern void set_cpus_allowed_force(struct task_struct *p, const struct cpu= mask *new_mask); =20 /** * set_cpus_allowed_ptr - set CPU affinity mask of a task --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -4180,7 +4180,7 @@ bool cpuset_cpus_allowed_fallback(struct rcu_read_lock(); cs_mask =3D task_cs(tsk)->cpus_allowed; if (is_in_v2_mode() && cpumask_subset(cs_mask, possible_mask)) { - do_set_cpus_allowed(tsk, cs_mask); + set_cpus_allowed_force(tsk, cs_mask); changed =3D true; } rcu_read_unlock(); --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -599,7 +599,7 @@ static void __kthread_bind_mask(struct t } =20 scoped_guard (raw_spinlock_irqsave, &p->pi_lock) - do_set_cpus_allowed(p, mask); + set_cpus_allowed_force(p, mask); =20 /* It's safe because the task is inactive. */ p->flags |=3D PF_NO_SETAFFINITY; @@ -880,7 +880,7 @@ int kthread_affine_preferred(struct task kthread_fetch_affinity(kthread, affinity); =20 scoped_guard (raw_spinlock_irqsave, &p->pi_lock) - do_set_cpus_allowed(p, affinity); + set_cpus_allowed_force(p, affinity); =20 mutex_unlock(&kthreads_hotplug_lock); out: --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2331,7 +2331,7 @@ unsigned long wait_task_inactive(struct } =20 static void -__do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx); +do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx); =20 static void migrate_disable_switch(struct rq *rq, struct task_struct *p) { @@ -2348,7 +2348,7 @@ static void migrate_disable_switch(struc =20 scoped_guard (task_rq_lock, p) { update_rq_clock(scope.rq); - __do_set_cpus_allowed(p, &ac); + do_set_cpus_allowed(p, &ac); } } =20 @@ -2662,7 +2662,7 @@ void set_cpus_allowed_common(struct task } =20 static void -__do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx) +do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx) { struct rq *rq =3D task_rq(p); bool queued, running; @@ -2692,7 +2692,7 @@ __do_set_cpus_allowed(struct task_struct * Used for kthread_bind() and select_fallback_rq(), in both cases the user * affinity (if any) should be destroyed too. 
*/ -void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_= mask) +void set_cpus_allowed_force(struct task_struct *p, const struct cpumask *n= ew_mask) { struct affinity_context ac =3D { .new_mask =3D new_mask, @@ -2706,7 +2706,7 @@ void do_set_cpus_allowed(struct task_str =20 scoped_guard (__task_rq_lock, p) { update_rq_clock(scope.rq); - __do_set_cpus_allowed(p, &ac); + do_set_cpus_allowed(p, &ac); } =20 /* @@ -2745,7 +2745,7 @@ int dup_user_cpus_ptr(struct task_struct * Use pi_lock to protect content of user_cpus_ptr * * Though unlikely, user_cpus_ptr can be reset to NULL by a concurrent - * do_set_cpus_allowed(). + * set_cpus_allowed_force(). */ raw_spin_lock_irqsave(&src->pi_lock, flags); if (src->user_cpus_ptr) { @@ -3073,7 +3073,7 @@ static int __set_cpus_allowed_ptr_locked goto out; } =20 - __do_set_cpus_allowed(p, ctx); + do_set_cpus_allowed(p, ctx); =20 return affine_move_task(rq, p, rf, dest_cpu, ctx->flags); =20 @@ -3482,7 +3482,7 @@ static int select_fallback_rq(int cpu, s } fallthrough; case possible: - do_set_cpus_allowed(p, task_cpu_fallback_mask(p)); + set_cpus_allowed_force(p, task_cpu_fallback_mask(p)); state =3D fail; break; case fail: --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2617,7 +2617,7 @@ static inline bool task_allowed_on_cpu(s static inline cpumask_t *alloc_user_cpus_ptr(int node) { /* - * See do_set_cpus_allowed() above for the rcu_head usage. + * See set_cpus_allowed_force() above for the rcu_head usage. */ int size =3D max_t(int, cpumask_size(), sizeof(struct rcu_head)); From nobody Sun Feb 8 04:31:06 2026 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C04DF215F42; Mon, 6 Oct 2025 10:45:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.50.34 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759747560; cv=none; b=iup0T9Dzu0B+LYJ867PeQTC1txv5E/NrlmIg/J7IcWcQMOrdJYX+j8sxKtEw6sIU4keazZhgyL5JKUj9DhGXjJN4Jw/IcmvZG2YBOTzDSfVcD/kFkqRVSAi/QRxk74YsN81sj628Qc88K17Zrw1BboF6+F9ARXdZtYSVJ+Z2bpw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759747560; c=relaxed/simple; bh=q/mQQad02fec3KRDgHLKjGaVdgdetNPm65tL4bPFZfg=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=E2uNuTWupFoYGewTtGqQ0qwAolSE5BOvmks5TATt+6pYilo+0mANuKq9NH4s9Iq5XfqLdTyLolTxHSPsADRTQ+6oKXhCoaZNnco5op1a3B4L1UYeTEEO29Ze6TSOwQ6KXvXUGGNkxWr2EtxUqn0tO4J2NwqJuGeXRNITpdUnVSY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=LBYFt7IH; arc=none smtp.client-ip=90.155.50.34 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="LBYFt7IH" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; 
Message-ID: <20251006104527.574803551@infradead.org>
User-Agent: quilt/0.68
Date: Mon, 06 Oct 2025 12:44:11 +0200
From: Peter Zijlstra
To: tj@kernel.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@kernel.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com, void@manifault.com, arighi@nvidia.com, changwoo@igalia.com, cgroups@vger.kernel.org, sched-ext@lists.linux.dev, liuwenfang@honor.com, tglx@linutronix.de
Subject: [PATCH 09/12] sched: Make __do_set_cpus_allowed() use the sched_change pattern
References: <20251006104402.946760805@infradead.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Now that do_set_cpus_allowed() holds all the regular locks, convert it to use the sched_change pattern helper.
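For readers new to the pattern, the conversion in the diff below amounts to the following (illustrative sketch, not the exact patch text; rq, p and ctx as in do_set_cpus_allowed()):

	/* Before: the open-coded change bracket. */
	queued  = task_on_rq_queued(p);
	running = task_current_donor(rq, p);
	if (queued)
		dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK);
	if (running)
		put_prev_task(rq, p);

	p->sched_class->set_cpus_allowed(p, ctx);
	mm_set_cpus_allowed(p->mm, ctx->new_mask);

	if (queued)
		enqueue_task(rq, p, ENQUEUE_RESTORE | ENQUEUE_NOCLOCK);
	if (running)
		set_next_task(rq, p);

	/* After: the same bracket expressed as a guard. */
	scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK) {
		p->sched_class->set_cpus_allowed(p, ctx);
		mm_set_cpus_allowed(p->mm, ctx->new_mask);
	}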
Signed-off-by: Peter Zijlstra (Intel) Acked-by: Andrea Righi Acked-by: Juri Lelli Acked-by: Tejun Heo Acked-by: Vincent Guittot for the serie Reviewed-by: Juri Lelli Reviewed-by: Valentin Schneider --- kernel/sched/core.c | 26 +++++--------------------- 1 file changed, 5 insertions(+), 21 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2664,28 +2664,12 @@ void set_cpus_allowed_common(struct task static void do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx) { - struct rq *rq =3D task_rq(p); - bool queued, running; + u32 flags =3D DEQUEUE_SAVE | DEQUEUE_NOCLOCK; =20 - lockdep_assert_held(&p->pi_lock); - lockdep_assert_rq_held(rq); - - queued =3D task_on_rq_queued(p); - running =3D task_current_donor(rq, p); - - if (queued) - dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK); - - if (running) - put_prev_task(rq, p); - - p->sched_class->set_cpus_allowed(p, ctx); - mm_set_cpus_allowed(p->mm, ctx->new_mask); - - if (queued) - enqueue_task(rq, p, ENQUEUE_RESTORE | ENQUEUE_NOCLOCK); - if (running) - set_next_task(rq, p); + scoped_guard (sched_change, p, flags) { + p->sched_class->set_cpus_allowed(p, ctx); + mm_set_cpus_allowed(p->mm, ctx->new_mask); + } } =20 /* From nobody Sun Feb 8 04:31:06 2026 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A73DA261B71; Mon, 6 Oct 2025 10:45:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.50.34 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759747562; cv=none; b=JMp0NETFi69RqOs5G6d+FAwNjYpKsr6qNZXMHC6EX5iJJepAb1G+0zVMp4eVPgBylExAXBrb4mxxni8kQSAosWzU7DgUND/UzJgB7xwz7Rtzqt/qiw8Q9QQKXDUZFC6GHwcRiyQkTLNuvHKsUSXLzh18atXU4cT4n32jgK9dfto= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759747562; c=relaxed/simple; bh=v72Sl8acF1wtEVNJ5ebk6hAwHuSQbIfFFkRhsGxBvgI=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=Ykt1gF+Q57lzkgDtM1PDaIsx7JHbauD76C5merEdNy4pHiPAOJoz9/BxbvBGF/9qaLq2Yxpr8w2ududHaoVQ8PBcEu+GA0VrKEcqoYwwfRs16pwexQoE7oWTI9J/IuUd2bGj+agBF5WPvVTrno7ZsQyRFelI3lt/2G7dF3w/8yM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=mmgpuF7Y; arc=none smtp.client-ip=90.155.50.34 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="mmgpuF7Y" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=SWO9rCAVqQzNfjdIyJT3efkTH2n23/SbcKVjKjVsQhc=; b=mmgpuF7Y+CdYAETXpaF1WWwwtO SWj+B1ct3Dep/hjCrWKm/EAuvGvjP4oeHifUnFWg+rdWH8waC67/Q+BQPL3jaF/9Oggg5YRZqhK8E H9jU/OO9m/TM/vIQRs9JOaEW10LvjPDJrUbN9bsh6BqZysQhTmUvBEq3eGpUWbBw8zQKOsr57Mcao lzYwsQ9kNx6g+BLfWsE7ClDNsfjYP2CTLqt7WCLAy0xbCiEsRw9tzdzm7Ju3J8/DV9uby5kzb+xqC 
l6AcGNAz5FJfD/QcDyn6tbAyWMh9IuazHLQ1l5ZpAGyc5CRdBy7oSOUgnOhEwARHbGMq0zZiJ/VXR Ze8udoWQ==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by casper.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1v5iix-00000009ydB-3kfB; Mon, 06 Oct 2025 10:45:49 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 0) id E5606303031; Mon, 06 Oct 2025 12:45:47 +0200 (CEST) Message-ID: <20251006104527.694841522@infradead.org> User-Agent: quilt/0.68 Date: Mon, 06 Oct 2025 12:44:12 +0200 From: Peter Zijlstra To: tj@kernel.org Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@kernel.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com, void@manifault.com, arighi@nvidia.com, changwoo@igalia.com, cgroups@vger.kernel.org, sched-ext@lists.linux.dev, liuwenfang@honor.com, tglx@linutronix.de Subject: [PATCH 10/12] sched: Add locking comments to sched_class methods References: <20251006104402.946760805@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" 'Document' the locking context the various sched_class methods are called under. Signed-off-by: Peter Zijlstra (Intel) Acked-by: Andrea Righi Acked-by: Juri Lelli Acked-by: Tejun Heo Acked-by: Vincent Guittot for the serie Reviewed-by: Juri Lelli Reviewed-by: Valentin Schneider --- kernel/sched/core.c | 6 +- kernel/sched/sched.h | 106 ++++++++++++++++++++++++++++++++++++++++++++++= ++--- 2 files changed, 103 insertions(+), 9 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -583,8 +583,8 @@ EXPORT_SYMBOL(__trace_set_current_state) * * p->on_rq <- { 0, 1 =3D TASK_ON_RQ_QUEUED, 2 =3D TASK_ON_RQ_MIGRATING }: * - * is set by activate_task() and cleared by deactivate_task(), under - * rq->lock. Non-zero indicates the task is runnable, the special + * is set by activate_task() and cleared by deactivate_task()/block_task= (), + * under rq->lock. Non-zero indicates the task is runnable, the special * ON_RQ_MIGRATING state is used for migration without holding both * rq->locks. It indicates task_cpu() is not stable, see task_rq_lock(). * @@ -4162,7 +4162,7 @@ int try_to_wake_up(struct task_struct *p * __schedule(). See the comment for smp_mb__after_spinlock(). * * Form a control-dep-acquire with p->on_rq =3D=3D 0 above, to ensure - * schedule()'s deactivate_task() has 'happened' and p will no longer + * schedule()'s block_task() has 'happened' and p will no longer * care about it's own p->state. See the comment in __schedule(). */ smp_acquire__after_ctrl_dep(); --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2345,8 +2345,7 @@ extern const u32 sched_prio_to_wmult[40 /* * {de,en}queue flags: * - * DEQUEUE_SLEEP - task is no longer runnable - * ENQUEUE_WAKEUP - task just became runnable + * SLEEP/WAKEUP - task is no-longer/just-became runnable * * SAVE/RESTORE - an otherwise spurious dequeue/enqueue, done to ensure ta= sks * are in a known state which allows modification. 
Such pai= rs @@ -2359,6 +2358,11 @@ extern const u32 sched_prio_to_wmult[40 * * MIGRATION - p->on_rq =3D=3D TASK_ON_RQ_MIGRATING (used for DEADLINE) * + * DELAYED - de/re-queue a sched_delayed task + * + * CLASS - going to update p->sched_class; makes sched_change call the + * various switch methods. + * * ENQUEUE_HEAD - place at front of runqueue (tail if not specified) * ENQUEUE_REPLENISH - CBS (replenish runtime and postpone deadline) * ENQUEUE_MIGRATED - the task was migrated during wakeup @@ -2409,14 +2413,50 @@ struct sched_class { int uclamp_enabled; #endif =20 + /* + * move_queued_task/activate_task/enqueue_task: rq->lock + * ttwu_do_activate/activate_task/enqueue_task: rq->lock + * wake_up_new_task/activate_task/enqueue_task: task_rq_lock + * ttwu_runnable/enqueue_task: task_rq_lock + * proxy_task_current: rq->lock + * sched_change_end + */ void (*enqueue_task) (struct rq *rq, struct task_struct *p, int flags); + /* + * move_queued_task/deactivate_task/dequeue_task: rq->lock + * __schedule/block_task/dequeue_task: rq->lock + * proxy_task_current: rq->lock + * wait_task_inactive: task_rq_lock + * sched_change_begin + */ bool (*dequeue_task) (struct rq *rq, struct task_struct *p, int flags); + + /* + * do_sched_yield: rq->lock + */ void (*yield_task) (struct rq *rq); + /* + * yield_to: rq->lock (double) + */ bool (*yield_to_task)(struct rq *rq, struct task_struct *p); =20 + /* + * move_queued_task: rq->lock + * __migrate_swap_task: rq->lock + * ttwu_do_activate: rq->lock + * ttwu_runnable: task_rq_lock + * wake_up_new_task: task_rq_lock + */ void (*wakeup_preempt)(struct rq *rq, struct task_struct *p, int flags); =20 + /* + * schedule/pick_next_task/prev_balance: rq->lock + */ int (*balance)(struct rq *rq, struct task_struct *prev, struct rq_flags *= rf); + + /* + * schedule/pick_next_task: rq->lock + */ struct task_struct *(*pick_task)(struct rq *rq); /* * Optional! 
When implemented pick_next_task() should be equivalent to: @@ -2429,48 +2469,102 @@ struct sched_class { */ struct task_struct *(*pick_next_task)(struct rq *rq, struct task_struct *= prev); =20 + /* + * sched_change: + * __schedule: rq->lock + */ void (*put_prev_task)(struct rq *rq, struct task_struct *p, struct task_s= truct *next); void (*set_next_task)(struct rq *rq, struct task_struct *p, bool first); =20 + /* + * select_task_rq: p->pi_lock + * sched_exec: p->pi_lock + */ int (*select_task_rq)(struct task_struct *p, int task_cpu, int flags); =20 + /* + * set_task_cpu: p->pi_lock || rq->lock (ttwu like) + */ void (*migrate_task_rq)(struct task_struct *p, int new_cpu); =20 + /* + * ttwu_do_activate: rq->lock + * wake_up_new_task: task_rq_lock + */ void (*task_woken)(struct rq *this_rq, struct task_struct *task); =20 + /* + * do_set_cpus_allowed: task_rq_lock + sched_change + */ void (*set_cpus_allowed)(struct task_struct *p, struct affinity_context *= ctx); =20 + /* + * sched_set_rq_{on,off}line: rq->lock + */ void (*rq_online)(struct rq *rq); void (*rq_offline)(struct rq *rq); =20 + /* + * push_cpu_stop: p->pi_lock && rq->lock + */ struct rq *(*find_lock_rq)(struct task_struct *p, struct rq *rq); =20 + /* + * hrtick: rq->lock + * sched_tick: rq->lock + * sched_tick_remote: rq->lock + */ void (*task_tick)(struct rq *rq, struct task_struct *p, int queued); + /* + * sched_cgroup_fork: p->pi_lock + */ void (*task_fork)(struct task_struct *p); + /* + * finish_task_switch: no locks + */ void (*task_dead)(struct task_struct *p); =20 + /* + * sched_change + */ void (*switching_from)(struct rq *this_rq, struct task_struct *task); void (*switched_from) (struct rq *this_rq, struct task_struct *task); void (*switching_to) (struct rq *this_rq, struct task_struct *task); void (*switched_to) (struct rq *this_rq, struct task_struct *task); - - void (*reweight_task)(struct rq *this_rq, struct task_struct *task, - const struct load_weight *lw); - u64 (*get_prio) (struct rq *this_rq, struct task_struct *task); void (*prio_changed) (struct rq *this_rq, struct task_struct *task, u64 oldprio); =20 + /* + * set_load_weight: task_rq_lock + sched_change + * __setscheduler_parms: task_rq_lock + sched_change + */ + void (*reweight_task)(struct rq *this_rq, struct task_struct *task, + const struct load_weight *lw); + + /* + * sched_rr_get_interval: task_rq_lock + */ unsigned int (*get_rr_interval)(struct rq *rq, struct task_struct *task); =20 + /* + * task_sched_runtime: task_rq_lock + */ void (*update_curr)(struct rq *rq); =20 #ifdef CONFIG_FAIR_GROUP_SCHED + /* + * sched_change_group: task_rq_lock + sched_change + */ void (*task_change_group)(struct task_struct *p); #endif =20 #ifdef CONFIG_SCHED_CORE + /* + * pick_next_task: rq->lock + * try_steal_cookie: rq->lock (double) + */ int (*task_is_throttled)(struct task_struct *p, int cpu); #endif }; From nobody Sun Feb 8 04:31:06 2026 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7228123314B; Mon, 6 Oct 2025 10:45:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.50.34 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759747561; cv=none; b=egmm3C/PTY8KVI2ttfsR/2GagjaPqqdkQodODdktAmB584DyyKeDWrrTaO8zZZK+dEuPfpiA9iVTrx0YvYWPurcgKxlmqQFY8E2Ob+KD8R4g+0mem+Joeg5kbNvX0Fp4lDej6NjH+5t5dZqWbFnWAzep+8Ktj2/dhScOchv29wU= 
Message-ID: <20251006104527.813272361@infradead.org>
User-Agent: quilt/0.68
Date: Mon, 06 Oct 2025 12:44:13 +0200
From: Peter Zijlstra
To: tj@kernel.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@kernel.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com, void@manifault.com, arighi@nvidia.com, changwoo@igalia.com, cgroups@vger.kernel.org, sched-ext@lists.linux.dev, liuwenfang@honor.com, tglx@linutronix.de
Subject: [PATCH 11/12] sched: Match __task_rq_{,un}lock()
References: <20251006104402.946760805@infradead.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

In preparation for adding more rules to __task_rq_lock(), such that __task_rq_unlock() will no longer be equivalent to rq_unlock(), make sure every __task_rq_lock() is matched by a __task_rq_unlock() and vice-versa.
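Illustrative sketch (not patch text) of a matched pair after this change, mirroring the psi_ttwu_dequeue() hunk below; p is the task, rq/rf the usual runqueue and flags cookies:

	struct rq_flags rf;
	struct rq *rq;

	rq = __task_rq_lock(p, &rf);
	/* ... work on p under rq->lock ... */
	__task_rq_unlock(rq, p, &rf);	/* the unlock now names the task as well */

	/* or, equivalently, via the guard built on the same pair: */
	scoped_guard (__task_rq_lock, p) {
		/* scope.rq is the locked runqueue */
	}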
Signed-off-by: Peter Zijlstra (Intel) Acked-by: Andrea Righi Acked-by: Juri Lelli Acked-by: Tejun Heo Acked-by: Vincent Guittot for the serie Reviewed-by: Juri Lelli Reviewed-by: Valentin Schneider --- kernel/sched/core.c | 13 ++++++++----- kernel/sched/sched.h | 8 ++++---- kernel/sched/stats.h | 2 +- 3 files changed, 13 insertions(+), 10 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2582,7 +2582,8 @@ static int migration_cpu_stop(void *data */ WARN_ON_ONCE(!pending->stop_pending); preempt_disable(); - task_rq_unlock(rq, p, &rf); + rq_unlock(rq, &rf); + raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags); stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop, &pending->arg, &pending->stop_work); preempt_enable(); @@ -2591,7 +2592,8 @@ static int migration_cpu_stop(void *data out: if (pending) pending->stop_pending =3D false; - task_rq_unlock(rq, p, &rf); + rq_unlock(rq, &rf); + raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags); =20 if (complete) complete_all(&pending->done); @@ -3708,7 +3710,7 @@ static int ttwu_runnable(struct task_str ttwu_do_wakeup(p); ret =3D 1; } - __task_rq_unlock(rq, &rf); + __task_rq_unlock(rq, p, &rf); =20 return ret; } @@ -4301,7 +4303,7 @@ int task_call_func(struct task_struct *p ret =3D func(p, arg); =20 if (rq) - rq_unlock(rq, &rf); + __task_rq_unlock(rq, p, &rf); =20 raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags); return ret; @@ -7364,7 +7366,8 @@ void rt_mutex_setprio(struct task_struct =20 rq_unpin_lock(rq, &rf); __balance_callbacks(rq); - raw_spin_rq_unlock(rq); + rq_repin_lock(rq, &rf); + __task_rq_unlock(rq, p, &rf); =20 preempt_enable(); } --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1825,7 +1825,8 @@ struct rq *task_rq_lock(struct task_stru __acquires(p->pi_lock) __acquires(rq->lock); =20 -static inline void __task_rq_unlock(struct rq *rq, struct rq_flags *rf) +static inline void +__task_rq_unlock(struct rq *rq, struct task_struct *p, struct rq_flags *rf) __releases(rq->lock) { rq_unpin_lock(rq, rf); @@ -1837,8 +1838,7 @@ task_rq_unlock(struct rq *rq, struct tas __releases(rq->lock) __releases(p->pi_lock) { - rq_unpin_lock(rq, rf); - raw_spin_rq_unlock(rq); + __task_rq_unlock(rq, p, rf); raw_spin_unlock_irqrestore(&p->pi_lock, rf->flags); } =20 @@ -1849,7 +1849,7 @@ DEFINE_LOCK_GUARD_1(task_rq_lock, struct =20 DEFINE_LOCK_GUARD_1(__task_rq_lock, struct task_struct, _T->rq =3D __task_rq_lock(_T->lock, &_T->rf), - __task_rq_unlock(_T->rq, &_T->rf), + __task_rq_unlock(_T->rq, _T->lock, &_T->rf), struct rq *rq; struct rq_flags rf) =20 static inline void rq_lock_irqsave(struct rq *rq, struct rq_flags *rf) --- a/kernel/sched/stats.h +++ b/kernel/sched/stats.h @@ -206,7 +206,7 @@ static inline void psi_ttwu_dequeue(stru =20 rq =3D __task_rq_lock(p, &rf); psi_task_change(p, p->psi_flags, 0); - __task_rq_unlock(rq, &rf); + __task_rq_unlock(rq, p, &rf); } } From nobody Sun Feb 8 04:31:06 2026 Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4A80126A0DB; Mon, 6 Oct 2025 10:46:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.92.199 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759747565; cv=none; b=sGCeZvRhAEUQ2FbawQCr+1idX8QyOhoMmGv5Xuu3Le7nDEpI1teJu8pRYzF/SPxm1ZcXvDAnr9SADDi/J0eBCljqWBWW20AVpKDHulKkf7HU7LiFXdNBQGCorbRlSYYGAO/Ma3htFxY91YIf4+clXOTapu0ABhK5wg5BFit2DC8= 
Message-ID: <20251006104527.928947651@infradead.org>
User-Agent: quilt/0.68
Date: Mon, 06 Oct 2025 12:44:14 +0200
From: Peter Zijlstra
To: tj@kernel.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@kernel.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com, void@manifault.com, arighi@nvidia.com, changwoo@igalia.com, cgroups@vger.kernel.org, sched-ext@lists.linux.dev, liuwenfang@honor.com, tglx@linutronix.de, K Prateek Nayak
Subject: [PATCH 12/12] sched: Cleanup the sched_change NOCLOCK usage
References: <20251006104402.946760805@infradead.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Teach the sched_change pattern how to do update_rq_clock(); this allows for some simplifications / cleanups.
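What this amounts to at the call sites, as an illustrative sketch (not patch text) condensed from the sched_change_begin() and set_user_nice() hunks below:

	/* sched_change_begin() now updates the clock unless told otherwise: */
	if (!(flags & DEQUEUE_NOCLOCK)) {
		update_rq_clock(rq);
		flags |= DEQUEUE_NOCLOCK;
	}

	/* ... so callers can drop the explicit update_rq_clock() / NOCLOCK pair: */
	guard(task_rq_lock)(p);
	scoped_guard (sched_change, p, DEQUEUE_SAVE) {
		p->static_prio = NICE_TO_PRIO(nice);
		set_load_weight(p, true);
	}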
Suggested-by: K Prateek Nayak Signed-off-by: Peter Zijlstra (Intel) Acked-by: Andrea Righi Acked-by: Juri Lelli Acked-by: Tejun Heo Acked-by: Vincent Guittot for the serie Reviewed-by: Juri Lelli Reviewed-by: Valentin Schneider --- kernel/sched/core.c | 33 +++++++++++---------------------- kernel/sched/ext.c | 8 ++------ kernel/sched/syscalls.c | 8 ++------ 3 files changed, 15 insertions(+), 34 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2346,10 +2346,8 @@ static void migrate_disable_switch(struc if (p->cpus_ptr !=3D &p->cpus_mask) return; =20 - scoped_guard (task_rq_lock, p) { - update_rq_clock(scope.rq); + scoped_guard (task_rq_lock, p) do_set_cpus_allowed(p, &ac); - } } =20 void ___migrate_enable(void) @@ -2666,9 +2664,7 @@ void set_cpus_allowed_common(struct task static void do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx) { - u32 flags =3D DEQUEUE_SAVE | DEQUEUE_NOCLOCK; - - scoped_guard (sched_change, p, flags) { + scoped_guard (sched_change, p, DEQUEUE_SAVE) { p->sched_class->set_cpus_allowed(p, ctx); mm_set_cpus_allowed(p->mm, ctx->new_mask); } @@ -2690,10 +2686,8 @@ void set_cpus_allowed_force(struct task_ struct rcu_head rcu; }; =20 - scoped_guard (__task_rq_lock, p) { - update_rq_clock(scope.rq); + scoped_guard (__task_rq_lock, p) do_set_cpus_allowed(p, &ac); - } =20 /* * Because this is called with p->pi_lock held, it is not possible @@ -9110,16 +9104,13 @@ static void sched_change_group(struct ta */ void sched_move_task(struct task_struct *tsk, bool for_autogroup) { - unsigned int queue_flags =3D - DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK; + unsigned int queue_flags =3D DEQUEUE_SAVE | DEQUEUE_MOVE; bool resched =3D false; struct rq *rq; =20 CLASS(task_rq_lock, rq_guard)(tsk); rq =3D rq_guard.rq; =20 - update_rq_clock(rq); - scoped_guard (sched_change, tsk, queue_flags) { sched_change_group(tsk); if (!for_autogroup) @@ -10794,19 +10785,17 @@ struct sched_change_ctx *sched_change_be =20 lockdep_assert_rq_held(rq); =20 + if (!(flags & DEQUEUE_NOCLOCK)) { + update_rq_clock(rq); + flags |=3D DEQUEUE_NOCLOCK; + } + if (flags & DEQUEUE_CLASS) { if (WARN_ON_ONCE(flags & (DEQUEUE_SAVE | DEQUEUE_MOVE))) flags &=3D ~(DEQUEUE_SAVE | DEQUEUE_MOVE); =20 - if (p->sched_class->switching_from) { - /* - * switching_from_fair() assumes CLASS implies NOCLOCK; - * fixing this assumption would mean switching_from() - * would need to be able to change flags. 
- */ - WARN_ON(!(flags & DEQUEUE_NOCLOCK)); + if (p->sched_class->switching_from) p->sched_class->switching_from(rq, p); - } } =20 *ctx =3D (struct sched_change_ctx){ @@ -10845,7 +10834,7 @@ void sched_change_end(struct sched_chang p->sched_class->switching_to(rq, p); =20 if (ctx->queued) - enqueue_task(rq, p, ctx->flags | ENQUEUE_NOCLOCK); + enqueue_task(rq, p, ctx->flags); if (ctx->running) set_next_task(rq, p); =20 --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -3912,13 +3912,11 @@ static void scx_disable_workfn(struct kt =20 scx_task_iter_start(&sti); while ((p =3D scx_task_iter_next_locked(&sti))) { - unsigned int queue_flags =3D DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLO= CK; + unsigned int queue_flags =3D DEQUEUE_SAVE | DEQUEUE_MOVE; const struct sched_class *old_class =3D p->sched_class; const struct sched_class *new_class =3D __setscheduler_class(p->policy, p->prio); =20 - update_rq_clock(task_rq(p)); - if (old_class !=3D new_class) { queue_flags |=3D DEQUEUE_CLASS; queue_flags &=3D ~(DEQUEUE_SAVE | DEQUEUE_MOVE); @@ -4656,7 +4654,7 @@ static int scx_enable(struct sched_ext_o percpu_down_write(&scx_fork_rwsem); scx_task_iter_start(&sti); while ((p =3D scx_task_iter_next_locked(&sti))) { - unsigned int queue_flags =3D DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLO= CK; + unsigned int queue_flags =3D DEQUEUE_SAVE | DEQUEUE_MOVE; const struct sched_class *old_class =3D p->sched_class; const struct sched_class *new_class =3D __setscheduler_class(p->policy, p->prio); @@ -4664,8 +4662,6 @@ static int scx_enable(struct sched_ext_o if (!tryget_task_struct(p)) continue; =20 - update_rq_clock(task_rq(p)); - if (old_class !=3D new_class) { queue_flags |=3D DEQUEUE_CLASS; queue_flags &=3D ~(DEQUEUE_SAVE | DEQUEUE_MOVE); --- a/kernel/sched/syscalls.c +++ b/kernel/sched/syscalls.c @@ -64,7 +64,6 @@ static int effective_prio(struct task_st =20 void set_user_nice(struct task_struct *p, long nice) { - struct rq *rq; int old_prio; =20 if (task_nice(p) =3D=3D nice || nice < MIN_NICE || nice > MAX_NICE) @@ -73,10 +72,7 @@ void set_user_nice(struct task_struct *p * We have to be careful, if called from sys_setpriority(), * the task might be in the middle of scheduling on another CPU. */ - CLASS(task_rq_lock, rq_guard)(p); - rq =3D rq_guard.rq; - - update_rq_clock(rq); + guard(task_rq_lock)(p); =20 /* * The RT priorities are set via sched_setscheduler(), but we still @@ -89,7 +85,7 @@ void set_user_nice(struct task_struct *p return; } =20 - scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK) { + scoped_guard (sched_change, p, DEQUEUE_SAVE) { p->static_prio =3D NICE_TO_PRIO(nice); set_load_weight(p, true); old_prio =3D p->prio;