From nobody Wed Sep 10 23:21:40 2025
Message-ID: <20250910155808.415580225@infradead.org>
Date: Wed, 10 Sep 2025 17:44:10 +0200
From: Peter Zijlstra
To: tj@kernel.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com,
 juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
 rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
 longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com, void@manifault.com,
 arighi@nvidia.com, changwoo@igalia.com, cgroups@vger.kernel.org,
 sched-ext@lists.linux.dev, liuwenfang@honor.com, tglx@linutronix.de
Subject: [PATCH 01/14] sched: Employ sched_change guards
References: <20250910154409.446470175@infradead.org>
MIME-Version: 
1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" As proposed a long while ago -- and half done by scx -- wrap the scheduler's 'change' pattern in a guard helper. Signed-off-by: Peter Zijlstra (Intel) --- include/linux/cleanup.h | 5 + kernel/sched/core.c | 156 +++++++++++++++++--------------------------= ----- kernel/sched/ext.c | 39 +++++------- kernel/sched/sched.h | 21 +++--- kernel/sched/syscalls.c | 65 +++++++------------- 5 files changed, 114 insertions(+), 172 deletions(-) --- a/include/linux/cleanup.h +++ b/include/linux/cleanup.h @@ -340,6 +340,11 @@ _label: #define __DEFINE_CLASS_IS_CONDITIONAL(_name, _is_cond) \ static __maybe_unused const bool class_##_name##_is_conditional =3D _is_co= nd =20 +#define DEFINE_CLASS_IS_UNCONDITIONAL(_name) \ + __DEFINE_CLASS_IS_CONDITIONAL(_name, false); \ + static inline void * class_##_name##_lock_ptr(class_##_name##_t *_T) \ + { return (void *)1; } + #define __GUARD_IS_ERR(_ptr) \ ({ \ unsigned long _rc =3D (__force unsigned long)(_ptr); \ --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -7361,7 +7361,7 @@ void rt_mutex_post_schedule(void) */ void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task) { - int prio, oldprio, queued, running, queue_flag =3D + int prio, oldprio, queue_flag =3D DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK; const struct sched_class *prev_class, *next_class; struct rq_flags rf; @@ -7426,52 +7426,42 @@ void rt_mutex_setprio(struct task_struct if (prev_class !=3D next_class && p->se.sched_delayed) dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_NOCLOCK); =20 - queued =3D task_on_rq_queued(p); - running =3D task_current_donor(rq, p); - if (queued) - dequeue_task(rq, p, queue_flag); - if (running) - put_prev_task(rq, p); - - /* - * Boosting condition are: - * 1. -rt task is running and holds mutex A - * --> -dl task blocks on mutex A - * - * 2. -dl task is running and holds mutex A - * --> -dl task blocks on mutex A and could preempt the - * running task - */ - if (dl_prio(prio)) { - if (!dl_prio(p->normal_prio) || - (pi_task && dl_prio(pi_task->prio) && - dl_entity_preempt(&pi_task->dl, &p->dl))) { - p->dl.pi_se =3D pi_task->dl.pi_se; - queue_flag |=3D ENQUEUE_REPLENISH; + scoped_guard (sched_change, p, queue_flag) { + /* + * Boosting condition are: + * 1. -rt task is running and holds mutex A + * --> -dl task blocks on mutex A + * + * 2. 
-dl task is running and holds mutex A + * --> -dl task blocks on mutex A and could preempt the + * running task + */ + if (dl_prio(prio)) { + if (!dl_prio(p->normal_prio) || + (pi_task && dl_prio(pi_task->prio) && + dl_entity_preempt(&pi_task->dl, &p->dl))) { + p->dl.pi_se =3D pi_task->dl.pi_se; + scope->flags |=3D ENQUEUE_REPLENISH; + } else { + p->dl.pi_se =3D &p->dl; + } + } else if (rt_prio(prio)) { + if (dl_prio(oldprio)) + p->dl.pi_se =3D &p->dl; + if (oldprio < prio) + scope->flags |=3D ENQUEUE_HEAD; } else { - p->dl.pi_se =3D &p->dl; + if (dl_prio(oldprio)) + p->dl.pi_se =3D &p->dl; + if (rt_prio(oldprio)) + p->rt.timeout =3D 0; } - } else if (rt_prio(prio)) { - if (dl_prio(oldprio)) - p->dl.pi_se =3D &p->dl; - if (oldprio < prio) - queue_flag |=3D ENQUEUE_HEAD; - } else { - if (dl_prio(oldprio)) - p->dl.pi_se =3D &p->dl; - if (rt_prio(oldprio)) - p->rt.timeout =3D 0; - } =20 - p->sched_class =3D next_class; - p->prio =3D prio; + p->sched_class =3D next_class; + p->prio =3D prio; =20 - check_class_changing(rq, p, prev_class); - - if (queued) - enqueue_task(rq, p, queue_flag); - if (running) - set_next_task(rq, p); + check_class_changing(rq, p, prev_class); + } =20 check_class_changed(rq, p, prev_class, oldprio); out_unlock: @@ -8119,26 +8109,9 @@ int migrate_task_to(struct task_struct * */ void sched_setnuma(struct task_struct *p, int nid) { - bool queued, running; - struct rq_flags rf; - struct rq *rq; - - rq =3D task_rq_lock(p, &rf); - queued =3D task_on_rq_queued(p); - running =3D task_current_donor(rq, p); - - if (queued) - dequeue_task(rq, p, DEQUEUE_SAVE); - if (running) - put_prev_task(rq, p); - - p->numa_preferred_nid =3D nid; - - if (queued) - enqueue_task(rq, p, ENQUEUE_RESTORE | ENQUEUE_NOCLOCK); - if (running) - set_next_task(rq, p); - task_rq_unlock(rq, p, &rf); + guard(task_rq_lock)(p); + scoped_guard (sched_change, p, DEQUEUE_SAVE) + p->numa_preferred_nid =3D nid; } #endif /* CONFIG_NUMA_BALANCING */ =20 @@ -9240,8 +9213,9 @@ static void sched_change_group(struct ta */ void sched_move_task(struct task_struct *tsk, bool for_autogroup) { - int queued, running, queue_flags =3D + unsigned int queue_flags =3D DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK; + bool resched =3D false; struct rq *rq; =20 CLASS(task_rq_lock, rq_guard)(tsk); @@ -9249,28 +9223,12 @@ void sched_move_task(struct task_struct =20 update_rq_clock(rq); =20 - running =3D task_current_donor(rq, tsk); - queued =3D task_on_rq_queued(tsk); - - if (queued) - dequeue_task(rq, tsk, queue_flags); - if (running) - put_prev_task(rq, tsk); - - sched_change_group(tsk); - if (!for_autogroup) - scx_cgroup_move_task(tsk); - - if (queued) - enqueue_task(rq, tsk, queue_flags); - if (running) { - set_next_task(rq, tsk); - /* - * After changing group, the running task may have joined a - * throttled one but it's still the running task. Trigger a - * resched to make sure that task can still run. 
- */ - resched_curr(rq); + scoped_guard (sched_change, tsk, queue_flags) { + sched_change_group(tsk); + if (!for_autogroup) + scx_cgroup_move_task(tsk); + if (scope->running) + resched =3D true; } } =20 @@ -10929,37 +10887,39 @@ void sched_mm_cid_fork(struct task_struc } #endif /* CONFIG_SCHED_MM_CID */ =20 -#ifdef CONFIG_SCHED_CLASS_EXT -void sched_deq_and_put_task(struct task_struct *p, int queue_flags, - struct sched_enq_and_set_ctx *ctx) +static DEFINE_PER_CPU(struct sched_change_ctx, sched_change_ctx); + +struct sched_change_ctx *sched_change_begin(struct task_struct *p, unsigne= d int flags) { + struct sched_change_ctx *ctx =3D this_cpu_ptr(&sched_change_ctx); struct rq *rq =3D task_rq(p); =20 lockdep_assert_rq_held(rq); =20 - *ctx =3D (struct sched_enq_and_set_ctx){ + *ctx =3D (struct sched_change_ctx){ .p =3D p, - .queue_flags =3D queue_flags, + .flags =3D flags, .queued =3D task_on_rq_queued(p), .running =3D task_current(rq, p), }; =20 - update_rq_clock(rq); if (ctx->queued) - dequeue_task(rq, p, queue_flags | DEQUEUE_NOCLOCK); + dequeue_task(rq, p, flags); if (ctx->running) put_prev_task(rq, p); + + return ctx; } =20 -void sched_enq_and_set_task(struct sched_enq_and_set_ctx *ctx) +void sched_change_end(struct sched_change_ctx *ctx) { - struct rq *rq =3D task_rq(ctx->p); + struct task_struct *p =3D ctx->p; + struct rq *rq =3D task_rq(p); =20 lockdep_assert_rq_held(rq); =20 if (ctx->queued) - enqueue_task(rq, ctx->p, ctx->queue_flags | ENQUEUE_NOCLOCK); + enqueue_task(rq, p, ctx->flags | ENQUEUE_NOCLOCK); if (ctx->running) - set_next_task(rq, ctx->p); + set_next_task(rq, p); } -#endif /* CONFIG_SCHED_CLASS_EXT */ --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -4867,11 +4867,10 @@ static void scx_bypass(bool bypass) */ list_for_each_entry_safe_reverse(p, n, &rq->scx.runnable_list, scx.runnable_node) { - struct sched_enq_and_set_ctx ctx; - /* cycling deq/enq is enough, see the function comment */ - sched_deq_and_put_task(p, DEQUEUE_SAVE | DEQUEUE_MOVE, &ctx); - sched_enq_and_set_task(&ctx); + scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_MOVE) { + /* nothing */ ; + } } =20 /* resched to restore ticks and idle state */ @@ -5003,17 +5002,16 @@ static void scx_disable_workfn(struct kt const struct sched_class *old_class =3D p->sched_class; const struct sched_class *new_class =3D __setscheduler_class(p->policy, p->prio); - struct sched_enq_and_set_ctx ctx; - - if (old_class !=3D new_class && p->se.sched_delayed) - dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEUE_DELAYED); =20 - sched_deq_and_put_task(p, DEQUEUE_SAVE | DEQUEUE_MOVE, &ctx); + update_rq_clock(task_rq(p)); =20 - p->sched_class =3D new_class; - check_class_changing(task_rq(p), p, old_class); + if (old_class !=3D new_class && p->se.sched_delayed) + dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_N= OCLOCK); =20 - sched_enq_and_set_task(&ctx); + scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOC= LOCK) { + p->sched_class =3D new_class; + check_class_changing(task_rq(p), p, old_class); + } =20 check_class_changed(task_rq(p), p, old_class, p->prio); scx_exit_task(p); @@ -5747,21 +5745,20 @@ static int scx_enable(struct sched_ext_o const struct sched_class *old_class =3D p->sched_class; const struct sched_class *new_class =3D __setscheduler_class(p->policy, p->prio); - struct sched_enq_and_set_ctx ctx; =20 if (!tryget_task_struct(p)) continue; =20 - if (old_class !=3D new_class && p->se.sched_delayed) - dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEUE_DELAYED); - - 
sched_deq_and_put_task(p, DEQUEUE_SAVE | DEQUEUE_MOVE, &ctx); + update_rq_clock(task_rq(p)); =20 - p->scx.slice =3D SCX_SLICE_DFL; - p->sched_class =3D new_class; - check_class_changing(task_rq(p), p, old_class); + if (old_class !=3D new_class && p->se.sched_delayed) + dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_N= OCLOCK); =20 - sched_enq_and_set_task(&ctx); + scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOC= LOCK) { + p->scx.slice =3D SCX_SLICE_DFL; + p->sched_class =3D new_class; + check_class_changing(task_rq(p), p, old_class); + } =20 check_class_changed(task_rq(p), p, old_class, p->prio); put_task_struct(p); --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -3860,23 +3860,22 @@ extern void check_class_changed(struct r extern struct balance_callback *splice_balance_callbacks(struct rq *rq); extern void balance_callbacks(struct rq *rq, struct balance_callback *head= ); =20 -#ifdef CONFIG_SCHED_CLASS_EXT -/* - * Used by SCX in the enable/disable paths to move tasks between sched_cla= sses - * and establish invariants. - */ -struct sched_enq_and_set_ctx { +struct sched_change_ctx { struct task_struct *p; - int queue_flags; + int flags; bool queued; bool running; }; =20 -void sched_deq_and_put_task(struct task_struct *p, int queue_flags, - struct sched_enq_and_set_ctx *ctx); -void sched_enq_and_set_task(struct sched_enq_and_set_ctx *ctx); +struct sched_change_ctx *sched_change_begin(struct task_struct *p, unsigne= d int flags); +void sched_change_end(struct sched_change_ctx *ctx); =20 -#endif /* CONFIG_SCHED_CLASS_EXT */ +DEFINE_CLASS(sched_change, struct sched_change_ctx *, + sched_change_end(_T), + sched_change_begin(p, flags), + struct task_struct *p, unsigned int flags) + +DEFINE_CLASS_IS_UNCONDITIONAL(sched_change) =20 #include "ext.h" =20 --- a/kernel/sched/syscalls.c +++ b/kernel/sched/syscalls.c @@ -64,7 +64,6 @@ static int effective_prio(struct task_st =20 void set_user_nice(struct task_struct *p, long nice) { - bool queued, running; struct rq *rq; int old_prio; =20 @@ -90,22 +89,12 @@ void set_user_nice(struct task_struct *p return; } =20 - queued =3D task_on_rq_queued(p); - running =3D task_current_donor(rq, p); - if (queued) - dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK); - if (running) - put_prev_task(rq, p); - - p->static_prio =3D NICE_TO_PRIO(nice); - set_load_weight(p, true); - old_prio =3D p->prio; - p->prio =3D effective_prio(p); - - if (queued) - enqueue_task(rq, p, ENQUEUE_RESTORE | ENQUEUE_NOCLOCK); - if (running) - set_next_task(rq, p); + scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK) { + p->static_prio =3D NICE_TO_PRIO(nice); + set_load_weight(p, true); + old_prio =3D p->prio; + p->prio =3D effective_prio(p); + } =20 /* * If the task increased its priority or is running and @@ -515,7 +504,7 @@ int __sched_setscheduler(struct task_str bool user, bool pi) { int oldpolicy =3D -1, policy =3D attr->sched_policy; - int retval, oldprio, newprio, queued, running; + int retval, oldprio, newprio; const struct sched_class *prev_class, *next_class; struct balance_callback *head; struct rq_flags rf; @@ -698,33 +687,25 @@ int __sched_setscheduler(struct task_str if (prev_class !=3D next_class && p->se.sched_delayed) dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_NOCLOCK); =20 - queued =3D task_on_rq_queued(p); - running =3D task_current_donor(rq, p); - if (queued) - dequeue_task(rq, p, queue_flags); - if (running) - put_prev_task(rq, p); - - if (!(attr->sched_flags & 
SCHED_FLAG_KEEP_PARAMS)) {
-		__setscheduler_params(p, attr);
-		p->sched_class = next_class;
-		p->prio = newprio;
-	}
-	__setscheduler_uclamp(p, attr);
-	check_class_changing(rq, p, prev_class);
+	scoped_guard (sched_change, p, queue_flags) {

-	if (queued) {
-		/*
-		 * We enqueue to tail when the priority of a task is
-		 * increased (user space view).
-		 */
-		if (oldprio < p->prio)
-			queue_flags |= ENQUEUE_HEAD;
+		if (!(attr->sched_flags & SCHED_FLAG_KEEP_PARAMS)) {
+			__setscheduler_params(p, attr);
+			p->sched_class = next_class;
+			p->prio = newprio;
+		}
+		__setscheduler_uclamp(p, attr);
+		check_class_changing(rq, p, prev_class);

-		enqueue_task(rq, p, queue_flags);
+		if (scope->queued) {
+			/*
+			 * We enqueue to tail when the priority of a task is
+			 * increased (user space view).
+			 */
+			if (oldprio < p->prio)
+				scope->flags |= ENQUEUE_HEAD;
+		}
 	}
-	if (running)
-		set_next_task(rq, p);

 	check_class_changed(rq, p, prev_class, oldprio);
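For illustration only (not part of the patch): once the guard exists, a typical
caller collapses to the shape below. example_set_nid() is a made-up name,
modelled directly on the sched_setnuma() conversion above.

static void example_set_nid(struct task_struct *p, int nid)
{
	/* sched_change_begin()/end() do the dequeue/put and enqueue/set bookkeeping */
	guard(task_rq_lock)(p);
	scoped_guard (sched_change, p, DEQUEUE_SAVE)
		p->numa_preferred_nid = nid;
}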
From nobody Wed Sep 10 23:21:40 2025
Message-ID: <20250910155808.536785976@infradead.org>
Date: Wed, 10 Sep 2025 17:44:11 +0200
From: Peter Zijlstra
To: tj@kernel.org
Subject: [PATCH 02/14] sched: Re-arrange the {EN,DE}QUEUE flags
References: <20250910154409.446470175@infradead.org>

Ensure the matched flags are in the low word while the unmatched flags
go into the second word.

Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/sched.h |   45 ++++++++++++++++++++++++---------------------
 1 file changed, 24 insertions(+), 21 deletions(-)

--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2337,27 +2337,30 @@ extern const u32 sched_prio_to_wmult[40
  *
  */

-#define DEQUEUE_SLEEP		0x01 /* Matches ENQUEUE_WAKEUP */
-#define DEQUEUE_SAVE		0x02 /* Matches ENQUEUE_RESTORE */
-#define DEQUEUE_MOVE		0x04 /* Matches ENQUEUE_MOVE */
-#define DEQUEUE_NOCLOCK		0x08 /* Matches ENQUEUE_NOCLOCK */
-#define DEQUEUE_SPECIAL		0x10
-#define DEQUEUE_MIGRATING	0x100 /* Matches ENQUEUE_MIGRATING */
-#define DEQUEUE_DELAYED		0x200 /* Matches ENQUEUE_DELAYED */
-#define DEQUEUE_THROTTLE	0x800
-
-#define ENQUEUE_WAKEUP		0x01
-#define ENQUEUE_RESTORE		0x02
-#define ENQUEUE_MOVE		0x04
-#define ENQUEUE_NOCLOCK		0x08
-
-#define ENQUEUE_HEAD		0x10
-#define ENQUEUE_REPLENISH	0x20
-#define ENQUEUE_MIGRATED	0x40
-#define ENQUEUE_INITIAL		0x80
-#define ENQUEUE_MIGRATING	0x100
-#define ENQUEUE_DELAYED		0x200
-#define ENQUEUE_RQ_SELECTED	0x400
+#define DEQUEUE_SLEEP		0x0001 /* Matches ENQUEUE_WAKEUP */
+#define DEQUEUE_SAVE		0x0002 /* Matches ENQUEUE_RESTORE */
+#define DEQUEUE_MOVE		0x0004 /* Matches ENQUEUE_MOVE */
+#define DEQUEUE_NOCLOCK		0x0008 /* Matches ENQUEUE_NOCLOCK */
+
+#define DEQUEUE_MIGRATING	0x0010 /* Matches ENQUEUE_MIGRATING */
+#define DEQUEUE_DELAYED		0x0020 /* Matches ENQUEUE_DELAYED */
+
+#define DEQUEUE_SPECIAL		0x00010000
+#define DEQUEUE_THROTTLE	0x00020000
+
+#define ENQUEUE_WAKEUP		0x0001
+#define ENQUEUE_RESTORE		0x0002
+#define ENQUEUE_MOVE		0x0004
+#define ENQUEUE_NOCLOCK		0x0008
+
+#define ENQUEUE_MIGRATING	0x0010
+#define ENQUEUE_DELAYED		0x0020
+
+#define ENQUEUE_HEAD		0x00010000
+#define ENQUEUE_REPLENISH	0x00020000
+#define ENQUEUE_MIGRATED	0x00040000
+#define ENQUEUE_INITIAL		0x00080000
+#define ENQUEUE_RQ_SELECTED	0x00100000

 #define RETRY_TASK		((void *)-1UL)
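For illustration only (not part of the patch): with this layout the low-word
DEQUEUE_* values stay numerically identical to their ENQUEUE_* counterparts,
so flags captured at dequeue time can be handed straight back to
enqueue_task(); hypothetical compile-time checks of that invariant would read:

static_assert(DEQUEUE_SLEEP     == ENQUEUE_WAKEUP);
static_assert(DEQUEUE_SAVE      == ENQUEUE_RESTORE);
static_assert(DEQUEUE_MOVE      == ENQUEUE_MOVE);
static_assert(DEQUEUE_NOCLOCK   == ENQUEUE_NOCLOCK);
static_assert(DEQUEUE_MIGRATING == ENQUEUE_MIGRATING);
static_assert(DEQUEUE_DELAYED   == ENQUEUE_DELAYED);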
From nobody Wed Sep 10 23:21:40 2025
Message-ID: <20250910155808.649485170@infradead.org>
Date: Wed, 10 Sep 2025 17:44:12 +0200
From: Peter Zijlstra
To: tj@kernel.org
Subject: [PATCH 03/14] sched: Fold sched_class::switch{ing,ed}_{to,from}() into the change pattern
References: <20250910154409.446470175@infradead.org>

Add {DE,EN}QUEUE_CLASS and fold the sched_class::switch* methods into
the change pattern. This completes and makes the pattern more
symmetric.

This changes the order of callbacks slightly: | | switching_from() dequeue_task(); | dequeue_task() put_prev_task(); | put_prev_task() | switched_from() | ... change task ... | ... change task ... 
| switching_to(); | switching_to() enqueue_task(); | enqueue_task() set_next_task(); | set_next_task() prev_class->switched_from() | switched_to() | switched_to() | Notably, it moves the switched_from() callback right after the dequeue/put. Existing implementations don't appear to be affected by this change in location -- specifically the task isn't enqueued on the class in question in either location. Make (CLASS)^(SAVE|MOVE), because there is nothing to save-restore when changing scheduling classes. Signed-off-by: Peter Zijlstra (Intel) --- kernel/sched/core.c | 56 +++++++++++++++++++++---------------------= ----- kernel/sched/ext.c | 26 ++++++++++++++++----- kernel/sched/idle.c | 4 +-- kernel/sched/rt.c | 2 - kernel/sched/sched.h | 22 ++++++------------ kernel/sched/stop_task.c | 4 +-- kernel/sched/syscalls.c | 9 +++++-- 7 files changed, 66 insertions(+), 57 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2167,34 +2167,9 @@ inline int task_curr(const struct task_s return cpu_curr(task_cpu(p)) =3D=3D p; } =20 -/* - * ->switching_to() is called with the pi_lock and rq_lock held and must n= ot - * mess with locking. - */ -void check_class_changing(struct rq *rq, struct task_struct *p, - const struct sched_class *prev_class) +void check_prio_changed(struct rq *rq, struct task_struct *p, int oldprio) { - if (prev_class !=3D p->sched_class && p->sched_class->switching_to) - p->sched_class->switching_to(rq, p); -} - -/* - * switched_from, switched_to and prio_changed must _NOT_ drop rq->lock, - * use the balance_callback list if you want balancing. - * - * this means any call to check_class_changed() must be followed by a call= to - * balance_callback(). - */ -void check_class_changed(struct rq *rq, struct task_struct *p, - const struct sched_class *prev_class, - int oldprio) -{ - if (prev_class !=3D p->sched_class) { - if (prev_class->switched_from) - prev_class->switched_from(rq, p); - - p->sched_class->switched_to(rq, p); - } else if (oldprio !=3D p->prio || dl_task(p)) + if (oldprio !=3D p->prio || dl_task(p)) p->sched_class->prio_changed(rq, p, oldprio); } =20 @@ -7423,6 +7398,11 @@ void rt_mutex_setprio(struct task_struct prev_class =3D p->sched_class; next_class =3D __setscheduler_class(p->policy, prio); =20 + if (prev_class !=3D next_class) { + queue_flag |=3D DEQUEUE_CLASS; + queue_flag &=3D ~(DEQUEUE_SAVE | DEQUEUE_MOVE); + } + if (prev_class !=3D next_class && p->se.sched_delayed) dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_NOCLOCK); =20 @@ -7459,11 +7439,10 @@ void rt_mutex_setprio(struct task_struct =20 p->sched_class =3D next_class; p->prio =3D prio; - - check_class_changing(rq, p, prev_class); } =20 - check_class_changed(rq, p, prev_class, oldprio); + if (!(queue_flag & DEQUEUE_CLASS)) + check_prio_changed(rq, p, oldprio); out_unlock: /* Avoid rq from going away on us: */ preempt_disable(); @@ -10896,6 +10875,14 @@ struct sched_change_ctx *sched_change_be =20 lockdep_assert_rq_held(rq); =20 + if (flags & DEQUEUE_CLASS) { + if (WARN_ON_ONCE(flags & (DEQUEUE_SAVE | DEQUEUE_MOVE))) + flags &=3D ~(DEQUEUE_SAVE | DEQUEUE_MOVE); + + if (p->sched_class->switching_from) + p->sched_class->switching_from(rq, p); + } + *ctx =3D (struct sched_change_ctx){ .p =3D p, .flags =3D flags, @@ -10908,6 +10895,9 @@ struct sched_change_ctx *sched_change_be if (ctx->running) put_prev_task(rq, p); =20 + if ((flags & DEQUEUE_CLASS) && p->sched_class->switched_from) + p->sched_class->switched_from(rq, p); + return ctx; } =20 @@ -10918,8 +10908,14 @@ void 
sched_change_end(struct sched_chang =20 lockdep_assert_rq_held(rq); =20 + if ((ctx->flags & ENQUEUE_CLASS) && p->sched_class->switching_to) + p->sched_class->switching_to(rq, p); + if (ctx->queued) enqueue_task(rq, p, ctx->flags | ENQUEUE_NOCLOCK); if (ctx->running) set_next_task(rq, p); + + if ((ctx->flags & ENQUEUE_CLASS) && p->sched_class->switched_to) + p->sched_class->switched_to(rq, p); } --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -4999,21 +4999,28 @@ static void scx_disable_workfn(struct kt =20 scx_task_iter_start(&sti); while ((p =3D scx_task_iter_next_locked(&sti))) { + unsigned int queue_flags =3D DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLO= CK; const struct sched_class *old_class =3D p->sched_class; const struct sched_class *new_class =3D __setscheduler_class(p->policy, p->prio); =20 update_rq_clock(task_rq(p)); =20 + if (old_class !=3D new_class) { + queue_flags |=3D DEQUEUE_CLASS; + queue_flags &=3D ~(DEQUEUE_SAVE | DEQUEUE_MOVE); + } + if (old_class !=3D new_class && p->se.sched_delayed) dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_N= OCLOCK); =20 - scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOC= LOCK) { + scoped_guard (sched_change, p, queue_flags) { p->sched_class =3D new_class; - check_class_changing(task_rq(p), p, old_class); } =20 - check_class_changed(task_rq(p), p, old_class, p->prio); + if (!(queue_flags & DEQUEUE_CLASS)) + check_prio_changed(task_rq(p), p, p->prio); + scx_exit_task(p); } scx_task_iter_stop(&sti); @@ -5742,6 +5749,7 @@ static int scx_enable(struct sched_ext_o percpu_down_write(&scx_fork_rwsem); scx_task_iter_start(&sti); while ((p =3D scx_task_iter_next_locked(&sti))) { + unsigned int queue_flags =3D DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLO= CK; const struct sched_class *old_class =3D p->sched_class; const struct sched_class *new_class =3D __setscheduler_class(p->policy, p->prio); @@ -5751,16 +5759,22 @@ static int scx_enable(struct sched_ext_o =20 update_rq_clock(task_rq(p)); =20 + if (old_class !=3D new_class) { + queue_flags |=3D DEQUEUE_CLASS; + queue_flags &=3D ~(DEQUEUE_SAVE | DEQUEUE_MOVE); + } + if (old_class !=3D new_class && p->se.sched_delayed) dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_N= OCLOCK); =20 - scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOC= LOCK) { + scoped_guard (sched_change, p, queue_flags) { p->scx.slice =3D SCX_SLICE_DFL; p->sched_class =3D new_class; - check_class_changing(task_rq(p), p, old_class); } =20 - check_class_changed(task_rq(p), p, old_class, p->prio); + if (!(queue_flags & DEQUEUE_CLASS)) + check_prio_changed(task_rq(p), p, p->prio); + put_task_struct(p); } scx_task_iter_stop(&sti); --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -498,7 +498,7 @@ static void task_tick_idle(struct rq *rq { } =20 -static void switched_to_idle(struct rq *rq, struct task_struct *p) +static void switching_to_idle(struct rq *rq, struct task_struct *p) { BUG(); } @@ -536,6 +536,6 @@ DEFINE_SCHED_CLASS(idle) =3D { .task_tick =3D task_tick_idle, =20 .prio_changed =3D prio_changed_idle, - .switched_to =3D switched_to_idle, + .switching_to =3D switching_to_idle, .update_curr =3D update_curr_idle, }; --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -2589,8 +2589,8 @@ DEFINE_SCHED_CLASS(rt) =3D { =20 .get_rr_interval =3D get_rr_interval_rt, =20 - .prio_changed =3D prio_changed_rt, .switched_to =3D switched_to_rt, + .prio_changed =3D prio_changed_rt, =20 .update_curr =3D update_curr_rt, =20 --- a/kernel/sched/sched.h 
+++ b/kernel/sched/sched.h @@ -20,7 +20,6 @@ #include #include #include - #include #include #include @@ -2344,6 +2343,7 @@ extern const u32 sched_prio_to_wmult[40 =20 #define DEQUEUE_MIGRATING 0x0010 /* Matches ENQUEUE_MIGRATING */ #define DEQUEUE_DELAYED 0x0020 /* Matches ENQUEUE_DELAYED */ +#define DEQUEUE_CLASS 0x0040 /* Matches ENQUEUE_CLASS */ =20 #define DEQUEUE_SPECIAL 0x00010000 #define DEQUEUE_THROTTLE 0x00020000 @@ -2355,6 +2355,7 @@ extern const u32 sched_prio_to_wmult[40 =20 #define ENQUEUE_MIGRATING 0x0010 #define ENQUEUE_DELAYED 0x0020 +#define ENQUEUE_CLASS 0x0040 =20 #define ENQUEUE_HEAD 0x00010000 #define ENQUEUE_REPLENISH 0x00020000 @@ -2418,14 +2419,11 @@ struct sched_class { void (*task_fork)(struct task_struct *p); void (*task_dead)(struct task_struct *p); =20 - /* - * The switched_from() call is allowed to drop rq->lock, therefore we - * cannot assume the switched_from/switched_to pair is serialized by - * rq->lock. They are however serialized by p->pi_lock. - */ - void (*switching_to) (struct rq *this_rq, struct task_struct *task); - void (*switched_from)(struct rq *this_rq, struct task_struct *task); - void (*switched_to) (struct rq *this_rq, struct task_struct *task); + void (*switching_from)(struct rq *this_rq, struct task_struct *task); + void (*switched_from) (struct rq *this_rq, struct task_struct *task); + void (*switching_to) (struct rq *this_rq, struct task_struct *task); + void (*switched_to) (struct rq *this_rq, struct task_struct *task); + void (*reweight_task)(struct rq *this_rq, struct task_struct *task, const struct load_weight *lw); void (*prio_changed) (struct rq *this_rq, struct task_struct *task, @@ -3854,11 +3852,7 @@ extern void set_load_weight(struct task_ extern void enqueue_task(struct rq *rq, struct task_struct *p, int flags); extern bool dequeue_task(struct rq *rq, struct task_struct *p, int flags); =20 -extern void check_class_changing(struct rq *rq, struct task_struct *p, - const struct sched_class *prev_class); -extern void check_class_changed(struct rq *rq, struct task_struct *p, - const struct sched_class *prev_class, - int oldprio); +extern void check_prio_changed(struct rq *rq, struct task_struct *p, int o= ldprio); =20 extern struct balance_callback *splice_balance_callbacks(struct rq *rq); extern void balance_callbacks(struct rq *rq, struct balance_callback *head= ); --- a/kernel/sched/stop_task.c +++ b/kernel/sched/stop_task.c @@ -75,7 +75,7 @@ static void task_tick_stop(struct rq *rq { } =20 -static void switched_to_stop(struct rq *rq, struct task_struct *p) +static void switching_to_stop(struct rq *rq, struct task_struct *p) { BUG(); /* its impossible to change to this class */ } @@ -112,6 +112,6 @@ DEFINE_SCHED_CLASS(stop) =3D { .task_tick =3D task_tick_stop, =20 .prio_changed =3D prio_changed_stop, - .switched_to =3D switched_to_stop, + .switching_to =3D switching_to_stop, .update_curr =3D update_curr_stop, }; --- a/kernel/sched/syscalls.c +++ b/kernel/sched/syscalls.c @@ -684,6 +684,11 @@ int __sched_setscheduler(struct task_str prev_class =3D p->sched_class; next_class =3D __setscheduler_class(policy, newprio); =20 + if (prev_class !=3D next_class) { + queue_flags |=3D DEQUEUE_CLASS; + queue_flags &=3D ~(DEQUEUE_SAVE | DEQUEUE_MOVE); + } + if (prev_class !=3D next_class && p->se.sched_delayed) dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_NOCLOCK); =20 @@ -695,7 +700,6 @@ int __sched_setscheduler(struct task_str p->prio =3D newprio; } __setscheduler_uclamp(p, attr); - check_class_changing(rq, p, prev_class); 
=20 if (scope->queued) { /* @@ -707,7 +711,8 @@ int __sched_setscheduler(struct task_str } } =20 - check_class_changed(rq, p, prev_class, oldprio); + if (!(queue_flags & DEQUEUE_CLASS)) + check_prio_changed(rq, p, oldprio); =20 /* Avoid rq from going away on us: */ preempt_disable(); From nobody Wed Sep 10 23:21:40 2025 Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7C69E334733; Wed, 10 Sep 2025 16:00:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.92.199 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757520045; cv=none; b=lUXAjEjVu8kKjRb+yaBdOL+OruwYUcsR/B178PsfTWUTvePfSf8hpEOB66s+gCA8xy9j1dqmVhdnrQxvB1w+mCvZPiIMPRLhL24mI4d8M2mAW+MqCAPiQjsTpESW80BxhK/Zf3pSOKhzsgG6fxv9l2f1uNu6YeD5jnP2Fct7vQ0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757520045; c=relaxed/simple; bh=IUCkWQ6wGqZwAbUmhFozAUW9M10IVL1wf8pIxmTstes=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=qDu3yHbiJ1EodNoXVbMrNicmCUzclZYAo2Ft/2WQZhik7lQCJHPl78E36I6Dd4OzXP9ngiA0/HAZTeOnkgK39p8lT5WL6kfhaS1dJpMroZu59Q9nwTX8sYzOpvWB6AsisWEpM83EhVv5Zwpv9bUfPkKuG46W1pWTk6bIKA9GZJU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=gSwRbE9B; arc=none smtp.client-ip=90.155.92.199 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="gSwRbE9B" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=KROeO6ylt+iaOjtKwEutyUCejaZHIqUyNIxZI7cvLNw=; b=gSwRbE9BunXYI8CC6h6+QMoY7u umuBf1guLbuHCBol2ESKhgjJ2bwMdBSEHi28ojadQHCZOn/r4/9imVOD0jh/Q6bUPwaN2gqjn45ut mm5CiyfkwYJiJcCS+V1jBEtZzqeupc4vZ1w5WT+FxfiuDuH9CBqVIBBO71ydbbyTPJp0+U0yh2jde aJmFD6uXOJIrxTh7zPbPubFFCQ8ANYztNd99zm0u2vmcqgDWdI54QLFpuhZxT0X4avUTivNHb2A85 hUALMCP310FoddZWfN8F2SC2SbJ2z3S6gPUL4A+o+q9/jm6TPxHEwCzOnaD3gmjzK6Hm5Cd2tH0UV xZdnpT+A==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by desiato.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1uwNFG-00000005ufD-0UXk; Wed, 10 Sep 2025 16:00:30 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 0) id 86FD6302E5F; Wed, 10 Sep 2025 18:00:28 +0200 (CEST) Message-ID: <20250910155808.764084708@infradead.org> User-Agent: quilt/0.68 Date: Wed, 10 Sep 2025 17:44:13 +0200 From: Peter Zijlstra To: tj@kernel.org Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com, void@manifault.com, arighi@nvidia.com, changwoo@igalia.com, cgroups@vger.kernel.org, 
sched-ext@lists.linux.dev, liuwenfang@honor.com, tglx@linutronix.de Subject: [PATCH 04/14] sched: Cleanup sched_delayed handling for class switches References: <20250910154409.446470175@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Use the new sched_class::switching_from() method to dequeue delayed tasks before switching to another class. Signed-off-by: Peter Zijlstra (Intel) --- kernel/sched/core.c | 12 ++++++++---- kernel/sched/ext.c | 6 ------ kernel/sched/fair.c | 7 +++++++ kernel/sched/syscalls.c | 3 --- 4 files changed, 15 insertions(+), 13 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -7403,9 +7403,6 @@ void rt_mutex_setprio(struct task_struct queue_flag &=3D ~(DEQUEUE_SAVE | DEQUEUE_MOVE); } =20 - if (prev_class !=3D next_class && p->se.sched_delayed) - dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_NOCLOCK); - scoped_guard (sched_change, p, queue_flag) { /* * Boosting condition are: @@ -10879,8 +10876,15 @@ struct sched_change_ctx *sched_change_be if (WARN_ON_ONCE(flags & (DEQUEUE_SAVE | DEQUEUE_MOVE))) flags &=3D ~(DEQUEUE_SAVE | DEQUEUE_MOVE); =20 - if (p->sched_class->switching_from) + if (p->sched_class->switching_from) { + /* + * switching_from_fair() assumes CLASS implies NOCLOCK; + * fixing this assumption would mean switching_from() + * would need to be able to change flags. + */ + WARN_ON(!(flags & DEQUEUE_NOCLOCK)); p->sched_class->switching_from(rq, p); + } } =20 *ctx =3D (struct sched_change_ctx){ --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -5011,9 +5011,6 @@ static void scx_disable_workfn(struct kt queue_flags &=3D ~(DEQUEUE_SAVE | DEQUEUE_MOVE); } =20 - if (old_class !=3D new_class && p->se.sched_delayed) - dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_N= OCLOCK); - scoped_guard (sched_change, p, queue_flags) { p->sched_class =3D new_class; } @@ -5764,9 +5761,6 @@ static int scx_enable(struct sched_ext_o queue_flags &=3D ~(DEQUEUE_SAVE | DEQUEUE_MOVE); } =20 - if (old_class !=3D new_class && p->se.sched_delayed) - dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_N= OCLOCK); - scoped_guard (sched_change, p, queue_flags) { p->scx.slice =3D SCX_SLICE_DFL; p->sched_class =3D new_class; --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -13221,6 +13221,12 @@ static void attach_task_cfs_rq(struct ta attach_entity_cfs_rq(se); } =20 +static void switching_from_fair(struct rq *rq, struct task_struct *p) +{ + if (p->se.sched_delayed) + dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_NOCLOCK); +} + static void switched_from_fair(struct rq *rq, struct task_struct *p) { detach_task_cfs_rq(p); @@ -13622,6 +13628,7 @@ DEFINE_SCHED_CLASS(fair) =3D { =20 .reweight_task =3D reweight_task_fair, .prio_changed =3D prio_changed_fair, + .switching_from =3D switching_from_fair, .switched_from =3D switched_from_fair, .switched_to =3D switched_to_fair, =20 --- a/kernel/sched/syscalls.c +++ b/kernel/sched/syscalls.c @@ -689,9 +689,6 @@ int __sched_setscheduler(struct task_str queue_flags &=3D ~(DEQUEUE_SAVE | DEQUEUE_MOVE); } =20 - if (prev_class !=3D next_class && p->se.sched_delayed) - dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_NOCLOCK); - scoped_guard (sched_change, p, queue_flags) { =20 if (!(attr->sched_flags & SCHED_FLAG_KEEP_PARAMS)) { From nobody Wed Sep 10 23:21:40 2025 Received: from 
casper.infradead.org
Message-ID: <20250910155808.876041318@infradead.org>
Date: Wed, 10 Sep 2025 17:44:14 +0200
From: Peter Zijlstra
To: tj@kernel.org
Subject: [PATCH 05/14] sched: Move sched_class::prio_changed() into the change pattern
References: <20250910154409.446470175@infradead.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Move sched_class::prio_changed() into the change pattern. And while there, extend it with sched_class::get_prio() in order to fix the deadline sitation. Suggested-by: Tejun Heo Signed-off-by: Peter Zijlstra (Intel) --- kernel/sched/core.c | 24 +++++++++++++----------- kernel/sched/deadline.c | 20 +++++++++++--------- kernel/sched/ext.c | 8 +------- kernel/sched/fair.c | 8 ++++++-- kernel/sched/idle.c | 5 ++++- kernel/sched/rt.c | 5 ++++- kernel/sched/sched.h | 7 ++++--- kernel/sched/stop_task.c | 5 ++++- kernel/sched/syscalls.c | 9 --------- 9 files changed, 47 insertions(+), 44 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2167,12 +2167,6 @@ inline int task_curr(const struct task_s return cpu_curr(task_cpu(p)) =3D=3D p; } =20 -void check_prio_changed(struct rq *rq, struct task_struct *p, int oldprio) -{ - if (oldprio !=3D p->prio || dl_task(p)) - p->sched_class->prio_changed(rq, p, oldprio); -} - void wakeup_preempt(struct rq *rq, struct task_struct *p, int flags) { struct task_struct *donor =3D rq->donor; @@ -7437,9 +7431,6 @@ void rt_mutex_setprio(struct task_struct p->sched_class =3D next_class; p->prio =3D prio; } - - if (!(queue_flag & DEQUEUE_CLASS)) - check_prio_changed(rq, p, oldprio); out_unlock: /* Avoid rq from going away on us: */ preempt_disable(); @@ -10894,6 +10885,13 @@ struct sched_change_ctx *sched_change_be .running =3D task_current(rq, p), }; =20 + if (!(flags & DEQUEUE_CLASS)) { + if (p->sched_class->get_prio) + ctx->prio =3D p->sched_class->get_prio(rq, p); + else + ctx->prio =3D p->prio; + } + if (ctx->queued) dequeue_task(rq, p, flags); if (ctx->running) @@ -10920,6 +10918,10 @@ void sched_change_end(struct sched_chang if (ctx->running) set_next_task(rq, p); =20 - if ((ctx->flags & ENQUEUE_CLASS) && p->sched_class->switched_to) - p->sched_class->switched_to(rq, p); + if (ctx->flags & ENQUEUE_CLASS) { + if (p->sched_class->switched_to) + p->sched_class->switched_to(rq, p); + } else { + p->sched_class->prio_changed(rq, p, ctx->prio); + } } --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -3071,23 +3071,24 @@ static void switched_to_dl(struct rq *rq } } =20 +static u64 get_prio_dl(struct rq *rq, struct task_struct *p) +{ + return p->dl.deadline; +} + /* * If the scheduling parameters of a -deadline task changed, * a push or pull operation might be needed. */ -static void prio_changed_dl(struct rq *rq, struct task_struct *p, - int oldprio) +static void prio_changed_dl(struct rq *rq, struct task_struct *p, u64 old_= deadline) { if (!task_on_rq_queued(p)) return; =20 - /* - * This might be too much, but unfortunately - * we don't have the old deadline value, and - * we can't argue if the task is increasing - * or lowering its prio, so... 
- */ - if (!rq->dl.overloaded) + if (p->dl.deadline =3D=3D old_deadline) + return; + + if (dl_time_before(old_deadline, p->dl.deadline)) deadline_queue_pull_task(rq); =20 if (task_current_donor(rq, p)) { @@ -3142,6 +3143,7 @@ DEFINE_SCHED_CLASS(dl) =3D { .task_tick =3D task_tick_dl, .task_fork =3D task_fork_dl, =20 + .get_prio =3D get_prio_dl, .prio_changed =3D prio_changed_dl, .switched_from =3D switched_from_dl, .switched_to =3D switched_to_dl, --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -4023,7 +4023,7 @@ static void reweight_task_scx(struct rq p, p->scx.weight); } =20 -static void prio_changed_scx(struct rq *rq, struct task_struct *p, int old= prio) +static void prio_changed_scx(struct rq *rq, struct task_struct *p, u64 old= prio) { } =20 @@ -5015,9 +5015,6 @@ static void scx_disable_workfn(struct kt p->sched_class =3D new_class; } =20 - if (!(queue_flags & DEQUEUE_CLASS)) - check_prio_changed(task_rq(p), p, p->prio); - scx_exit_task(p); } scx_task_iter_stop(&sti); @@ -5766,9 +5763,6 @@ static int scx_enable(struct sched_ext_o p->sched_class =3D new_class; } =20 - if (!(queue_flags & DEQUEUE_CLASS)) - check_prio_changed(task_rq(p), p, p->prio); - put_task_struct(p); } scx_task_iter_stop(&sti); --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -13122,11 +13122,14 @@ static void task_fork_fair(struct task_s * the current task. */ static void -prio_changed_fair(struct rq *rq, struct task_struct *p, int oldprio) +prio_changed_fair(struct rq *rq, struct task_struct *p, u64 oldprio) { if (!task_on_rq_queued(p)) return; =20 + if (p->prio =3D=3D oldprio) + return; + if (rq->cfs.nr_queued =3D=3D 1) return; =20 @@ -13138,8 +13141,9 @@ prio_changed_fair(struct rq *rq, struct if (task_current_donor(rq, p)) { if (p->prio > oldprio) resched_curr(rq); - } else + } else { wakeup_preempt(rq, p, 0); + } } =20 #ifdef CONFIG_FAIR_GROUP_SCHED --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -504,8 +504,11 @@ static void switching_to_idle(struct rq } =20 static void -prio_changed_idle(struct rq *rq, struct task_struct *p, int oldprio) +prio_changed_idle(struct rq *rq, struct task_struct *p, u64 oldprio) { + if (p->prio =3D=3D oldprio) + return; + BUG(); } =20 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -2437,11 +2437,14 @@ static void switched_to_rt(struct rq *rq * us to initiate a push or pull. 
*/ static void -prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio) +prio_changed_rt(struct rq *rq, struct task_struct *p, u64 oldprio) { if (!task_on_rq_queued(p)) return; =20 + if (p->prio =3D=3D oldprio) + return; + if (task_current_donor(rq, p)) { /* * If our priority decreases while running, we --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2426,8 +2426,10 @@ struct sched_class { =20 void (*reweight_task)(struct rq *this_rq, struct task_struct *task, const struct load_weight *lw); + + u64 (*get_prio) (struct rq *this_rq, struct task_struct *task); void (*prio_changed) (struct rq *this_rq, struct task_struct *task, - int oldprio); + u64 oldprio); =20 unsigned int (*get_rr_interval)(struct rq *rq, struct task_struct *task); @@ -3852,12 +3854,11 @@ extern void set_load_weight(struct task_ extern void enqueue_task(struct rq *rq, struct task_struct *p, int flags); extern bool dequeue_task(struct rq *rq, struct task_struct *p, int flags); =20 -extern void check_prio_changed(struct rq *rq, struct task_struct *p, int o= ldprio); - extern struct balance_callback *splice_balance_callbacks(struct rq *rq); extern void balance_callbacks(struct rq *rq, struct balance_callback *head= ); =20 struct sched_change_ctx { + u64 prio; struct task_struct *p; int flags; bool queued; --- a/kernel/sched/stop_task.c +++ b/kernel/sched/stop_task.c @@ -81,8 +81,11 @@ static void switching_to_stop(struct rq } =20 static void -prio_changed_stop(struct rq *rq, struct task_struct *p, int oldprio) +prio_changed_stop(struct rq *rq, struct task_struct *p, u64 oldprio) { + if (p->prio =3D=3D oldprio) + return; + BUG(); /* how!?, what priority? */ } =20 --- a/kernel/sched/syscalls.c +++ b/kernel/sched/syscalls.c @@ -95,12 +95,6 @@ void set_user_nice(struct task_struct *p old_prio =3D p->prio; p->prio =3D effective_prio(p); } - - /* - * If the task increased its priority or is running and - * lowered its priority, then reschedule its CPU: - */ - p->sched_class->prio_changed(rq, p, old_prio); } EXPORT_SYMBOL(set_user_nice); =20 @@ -708,9 +702,6 @@ int __sched_setscheduler(struct task_str } } =20 - if (!(queue_flags & DEQUEUE_CLASS)) - check_prio_changed(rq, p, oldprio); - /* Avoid rq from going away on us: */ preempt_disable(); head =3D splice_balance_callbacks(rq); From nobody Wed Sep 10 23:21:40 2025 Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D8C06334391; Wed, 10 Sep 2025 16:00:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.92.199 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757520045; cv=none; b=FuIl/LWWip2U3NhwoECOsRbupV7YO9ENXjKr+wMWjPVzljpQQheVKU7SunQRI+G5V9roqKQGlikpoHgzlOGzIjAz34vsPi02G+lgR5VpiLZJOClfbQTE1r25GFnnvV5UsWGkY2e6t7qLLXRJWum9uH4nKRVqWkVsofu67m2450U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757520045; c=relaxed/simple; bh=osnhLCq+rTGvUo4rO5z2cPoV6pOR2BCW5uwsZ2PrbYA=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=DXGpRhUILtNau3vyZ3rHpl1f14krcYgvWnJ2VYRL5oTSoMu6eqUdOABGXWYLBxEl6CleD1xV2KJfIWr0aZqseSnoU+d/LxckuF4vFOM/odxo0Mllg9UsTA+nDTqsP4ZlMRZLyLEyLcsAdyo3H1sUSxJ/sNs/7n8ObcdKIVJuOpg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass 
(2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=YwZX65ms; arc=none smtp.client-ip=90.155.92.199 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="YwZX65ms" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=8oEWYSUZB/IIQUVHNUsQ2fhwwdqwSyCfZUQ3TS6Le5k=; b=YwZX65msD5IAoXkZTZqvRa0pGe 7gP3nWoFZwrxdUjsyg/YU9YsyTGCpoUQntPR1n+3jJE6D4os92PLnGuqmldHYTJ9j7pE4fikDhhEA MYIAgLEzne8ay+9JOKtDxx09knIqiyRQyezJvIHHGT780Ir4RWbh4xdoL5xR5Li7UlC8+b1lIvrGA vXjFvoCD8OQ00lN4hX4pRUNQ5RyQIy/PzFD8CMn5uIgIeIdZWL6xnwRFTbfZGPoZU5iDCZZcEyH73 E5pcz280dFI5QvwJ8NYB74NEVvIPYhKmSQ2Rs7KwG2uxYm5wx7WhWy70engR5IhfwEeNPoXMQifCC 0jB36nEg==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by desiato.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1uwNFG-00000005ufG-37qb; Wed, 10 Sep 2025 16:00:30 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 0) id 8FDCB302F12; Wed, 10 Sep 2025 18:00:28 +0200 (CEST) Message-ID: <20250910155808.987080375@infradead.org> User-Agent: quilt/0.68 Date: Wed, 10 Sep 2025 17:44:15 +0200 From: Peter Zijlstra To: tj@kernel.org Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com, void@manifault.com, arighi@nvidia.com, changwoo@igalia.com, cgroups@vger.kernel.org, sched-ext@lists.linux.dev, liuwenfang@honor.com, tglx@linutronix.de Subject: [PATCH 06/14] sched: Fix migrate_disable_switch() locking References: <20250910154409.446470175@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" For some reason migrate_disable_switch() was more complicated than it needs to be, resulting in mind bending locking of dubious quality. Recognise that migrate_disable_switch() must be called before a context switch, but any place before that switch is equally good. Since the current place results in troubled locking, simply move the thing before taking rq->lock. Signed-off-by: Peter Zijlstra (Intel) --- kernel/sched/core.c | 27 ++++++--------------------- 1 file changed, 6 insertions(+), 21 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2344,10 +2344,10 @@ static void migrate_disable_switch(struc if (p->cpus_ptr !=3D &p->cpus_mask) return; =20 - /* - * Violates locking rules! See comment in __do_set_cpus_allowed(). 
- */ - __do_set_cpus_allowed(p, &ac); + scoped_guard (task_rq_lock, p) { + update_rq_clock(scope.rq); + __do_set_cpus_allowed(p, &ac); + } } =20 void migrate_disable(void) @@ -2702,22 +2702,7 @@ __do_set_cpus_allowed(struct task_struct struct rq *rq =3D task_rq(p); bool queued, running; =20 - /* - * This here violates the locking rules for affinity, since we're only - * supposed to change these variables while holding both rq->lock and - * p->pi_lock. - * - * HOWEVER, it magically works, because ttwu() is the only code that - * accesses these variables under p->pi_lock and only does so after - * smp_cond_load_acquire(&p->on_cpu, !VAL), and we're in __schedule() - * before finish_task(). - * - * XXX do further audits, this smells like something putrid. - */ - if (ctx->flags & SCA_MIGRATE_DISABLE) - WARN_ON_ONCE(!p->on_cpu); - else - lockdep_assert_held(&p->pi_lock); + lockdep_assert_held(&p->pi_lock); =20 queued =3D task_on_rq_queued(p); running =3D task_current_donor(rq, p); @@ -6816,6 +6801,7 @@ static void __sched notrace __schedule(i =20 local_irq_disable(); rcu_note_context_switch(preempt); + migrate_disable_switch(rq, prev); =20 /* * Make sure that signal_pending_state()->signal_pending() below @@ -6922,7 +6908,6 @@ static void __sched notrace __schedule(i */ ++*switch_count; =20 - migrate_disable_switch(rq, prev); psi_account_irqtime(rq, prev, next); psi_sched_switch(prev, next, !task_on_rq_queued(prev) || prev->se.sched_delayed); From nobody Wed Sep 10 23:21:40 2025 Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EAEDE334394; Wed, 10 Sep 2025 16:00:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.92.199 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757520046; cv=none; b=C65sSjnsZaBdiDwO9hhaiZg3gYhM+86TFMHVzve7/dgPdMXjMjNmL/t0gxWvmOe4SHXlflV8WCmIg0A+oS5eT/E76ArfJpfP5pr+2p/5prG3LeLWr1KTeZAK5RChVXs68n90aN4Uq9gHYWyovv7r5CYfpX90lw4cGyBVScvQQCk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757520046; c=relaxed/simple; bh=c5T0jUWnLXQTc3r4WHSJJ8qpcfmmo29c845DsrXaOyw=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=W6JoqCiKi+xR06i6uzAox+AegvW3Dvui5SpNCEwhIrKpQE+PTRs5uYg34vCPeSCbssBxFP/g2vvVPacYP6284EMQPNp95xo5cPNU0k8ruikSaP7ZPFF6LZQjhAFKaQX7Az2dkn5LhY/FF1XeGe512sXD2KdfXatGtSIP3bmbUn8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=g2YPGlwc; arc=none smtp.client-ip=90.155.92.199 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="g2YPGlwc" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=Ld6FtW1Fo6RjyE9mTvnA50fxy0ZwQbJgnqO54sTZjlQ=; b=g2YPGlwciMVnKROglpWKYgBr1Q 
3MLhkJYLKXDmkIBBwR4N77vDsVvGha0lLT3eP41ptsoNLRsVCulyzuG0r6QE7eW8xhL3AQ6S+xzus tgXH/pxJIRsdVk5caXDF/RLCPTXojdq0Q9+zJEpcJ3mKISpPTxXFd/gOSMJpoaDt6UeDg7dtf1zuU rqPd+2rT4s2WqtGKtsqtI1oB9h8H1PuFjkOWV0BU/tXbLlIJd5l/dGi7aw7S5swZlpermwkuKgMaV mIKDAf8hj48srXW8WND1sZw3a0/MnOmzFjwp8xmzoYq6jH/TWoNXFxtcJHRYvCPd2uzNGj28hRZnJ WvIGR4EQ==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by desiato.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1uwNFG-00000005ufF-34LG; Wed, 10 Sep 2025 16:00:30 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 0) id 93F2B302F27; Wed, 10 Sep 2025 18:00:28 +0200 (CEST) Message-ID: <20250910155809.103475671@infradead.org> User-Agent: quilt/0.68 Date: Wed, 10 Sep 2025 17:44:16 +0200 From: Peter Zijlstra To: tj@kernel.org Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com, void@manifault.com, arighi@nvidia.com, changwoo@igalia.com, cgroups@vger.kernel.org, sched-ext@lists.linux.dev, liuwenfang@honor.com, tglx@linutronix.de Subject: [PATCH 07/14] sched: Fix do_set_cpus_allowed() locking References: <20250910154409.446470175@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" All callers of do_set_cpus_allowed() only take p->pi_lock, which is not sufficient to actually change the cpumask. Again, this is mostly ok in these cases, but it results in unnecessarily complicated reasoning. Furthermore, there is no reason what so ever to not just take all the required locks, so do just that. Signed-off-by: Peter Zijlstra (Intel) --- kernel/kthread.c | 15 +++++---------- kernel/sched/core.c | 21 +++++++-------------- kernel/sched/sched.h | 5 +++++ 3 files changed, 17 insertions(+), 24 deletions(-) --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -593,18 +593,16 @@ EXPORT_SYMBOL(kthread_create_on_node); =20 static void __kthread_bind_mask(struct task_struct *p, const struct cpumas= k *mask, unsigned int state) { - unsigned long flags; - if (!wait_task_inactive(p, state)) { WARN_ON(1); return; } =20 + scoped_guard (raw_spinlock_irqsave, &p->pi_lock) + do_set_cpus_allowed(p, mask); + /* It's safe because the task is inactive. */ - raw_spin_lock_irqsave(&p->pi_lock, flags); - do_set_cpus_allowed(p, mask); p->flags |=3D PF_NO_SETAFFINITY; - raw_spin_unlock_irqrestore(&p->pi_lock, flags); } =20 static void __kthread_bind(struct task_struct *p, unsigned int cpu, unsign= ed int state) @@ -857,7 +855,6 @@ int kthread_affine_preferred(struct task { struct kthread *kthread =3D to_kthread(p); cpumask_var_t affinity; - unsigned long flags; int ret =3D 0; =20 if (!wait_task_inactive(p, TASK_UNINTERRUPTIBLE) || kthread->started) { @@ -882,10 +879,8 @@ int kthread_affine_preferred(struct task list_add_tail(&kthread->hotplug_node, &kthreads_hotplug); kthread_fetch_affinity(kthread, affinity); =20 - /* It's safe because the task is inactive. 
*/ - raw_spin_lock_irqsave(&p->pi_lock, flags); - do_set_cpus_allowed(p, affinity); - raw_spin_unlock_irqrestore(&p->pi_lock, flags); + scoped_guard (raw_spinlock_irqsave, &p->pi_lock) + do_set_cpus_allowed(p, affinity); =20 mutex_unlock(&kthreads_hotplug_lock); out: --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2703,18 +2703,14 @@ __do_set_cpus_allowed(struct task_struct bool queued, running; =20 lockdep_assert_held(&p->pi_lock); + lockdep_assert_rq_held(rq); =20 queued =3D task_on_rq_queued(p); running =3D task_current_donor(rq, p); =20 - if (queued) { - /* - * Because __kthread_bind() calls this on blocked tasks without - * holding rq->lock. - */ - lockdep_assert_rq_held(rq); + if (queued) dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK); - } + if (running) put_prev_task(rq, p); =20 @@ -2743,7 +2739,10 @@ void do_set_cpus_allowed(struct task_str struct rcu_head rcu; }; =20 - __do_set_cpus_allowed(p, &ac); + scoped_guard (__task_rq_lock, p) { + update_rq_clock(scope.rq); + __do_set_cpus_allowed(p, &ac); + } =20 /* * Because this is called with p->pi_lock held, it is not possible @@ -3518,12 +3517,6 @@ static int select_fallback_rq(int cpu, s } fallthrough; case possible: - /* - * XXX When called from select_task_rq() we only - * hold p->pi_lock and again violate locking order. - * - * More yuck to audit. - */ do_set_cpus_allowed(p, task_cpu_fallback_mask(p)); state =3D fail; break; --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1822,6 +1822,11 @@ DEFINE_LOCK_GUARD_1(task_rq_lock, struct task_rq_unlock(_T->rq, _T->lock, &_T->rf), struct rq *rq; struct rq_flags rf) =20 +DEFINE_LOCK_GUARD_1(__task_rq_lock, struct task_struct, + _T->rq =3D __task_rq_lock(_T->lock, &_T->rf), + __task_rq_unlock(_T->rq, &_T->rf), + struct rq *rq; struct rq_flags rf) + static inline void rq_lock_irqsave(struct rq *rq, struct rq_flags *rf) __acquires(rq->lock) { From nobody Wed Sep 10 23:21:40 2025 Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 66A863314BC; Wed, 10 Sep 2025 16:00:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.92.199 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757520044; cv=none; b=Q7TRm4S9y03wZChX59LHzY+wL8JLD/018GsX2OB6zIug4OpD5L2hy0g512ZDetfokVZuR9XKUGG/nkIw/4lJWMoJIyOeNn24uXtOFBJl1q0Zm4Z5FGJmns/abr32kSCf4Gf6oe1PvEGP7lSGhl6p5Tq36K+NBaAmCuw+GxdT8NU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757520044; c=relaxed/simple; bh=rIt1lwzsp9XzrZJT1bWsfyVgjSQqSvGb+smlHq7qH/A=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=GVodH72pNSHqJZmxh5DE3s6RXfsgf4CPZpakCNJlwhyly+UIRdhgtxhttCJGSkttloBzPFJH6HOgI69sEtjB3uBdYdw8NVCahazgS/qjBsQocFY8ug8fXKo9lhGfmVdpDy7veH32gtvuIYbAMYBoDRJmB1Iwa2+EIM43CcL1tL4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=paNyjDhm; arc=none smtp.client-ip=90.155.92.199 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) 
header.d=infradead.org header.i=@infradead.org header.b="paNyjDhm" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=c76h7a/CjjaErQ2AyCehuglLa0jKRK4q3XvZjeWGlBI=; b=paNyjDhmVYhSaEVsZLHANoq5Yw dggdv+W6D0xAVmmCAo2UOX18Ijl4dpo4Zn6ioMhxta27uG5NejEfZF2Im+MnZM1G7run23ftLXa9C HbpLDFbjEHBI+l45pESmMByCfGudvmJm+WYWMEMT7QhbsHPP8RwPYvlwnTpHCllneFnXrPDejU/qk eDk+B63cZc7QfgcYnQkPEKu8vd/pVb+JP0AmlNine8s85nCxVw7dxTLdakg02lD5DwynQg52CfqBI cdIrOxjoavrV+cEE686OSJrnFsh7iNxW/rvToxaytR6d5zpEQosdiaWMgcdcvQ5dT3Tl3I2adBwRX pQX6Z84A==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by desiato.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1uwNFG-00000005ufH-37qg; Wed, 10 Sep 2025 16:00:30 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 0) id 97E19302F33; Wed, 10 Sep 2025 18:00:28 +0200 (CEST) Message-ID: <20250910155809.219537898@infradead.org> User-Agent: quilt/0.68 Date: Wed, 10 Sep 2025 17:44:17 +0200 From: Peter Zijlstra To: tj@kernel.org Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com, void@manifault.com, arighi@nvidia.com, changwoo@igalia.com, cgroups@vger.kernel.org, sched-ext@lists.linux.dev, liuwenfang@honor.com, tglx@linutronix.de Subject: [PATCH 08/14] sched: Rename do_set_cpus_allowed() References: <20250910154409.446470175@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Hopefully saner naming. Signed-off-by: Peter Zijlstra (Intel) --- include/linux/sched.h | 4 ++-- kernel/cgroup/cpuset.c | 2 +- kernel/kthread.c | 4 ++-- kernel/sched/core.c | 16 ++++++++-------- kernel/sched/sched.h | 2 +- 5 files changed, 14 insertions(+), 14 deletions(-) --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1860,8 +1860,8 @@ extern int task_can_attach(struct task_s extern int dl_bw_alloc(int cpu, u64 dl_bw); extern void dl_bw_free(int cpu, u64 dl_bw); =20 -/* do_set_cpus_allowed() - consider using set_cpus_allowed_ptr() instead */ -extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumas= k *new_mask); +/* set_cpus_allowed_force() - consider using set_cpus_allowed_ptr() instea= d */ +extern void set_cpus_allowed_force(struct task_struct *p, const struct cpu= mask *new_mask); =20 /** * set_cpus_allowed_ptr - set CPU affinity mask of a task --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -4122,7 +4122,7 @@ bool cpuset_cpus_allowed_fallback(struct rcu_read_lock(); cs_mask =3D task_cs(tsk)->cpus_allowed; if (is_in_v2_mode() && cpumask_subset(cs_mask, possible_mask)) { - do_set_cpus_allowed(tsk, cs_mask); + set_cpus_allowed_force(tsk, cs_mask); changed =3D true; } rcu_read_unlock(); --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -599,7 +599,7 @@ static void __kthread_bind_mask(struct t } =20 scoped_guard (raw_spinlock_irqsave, &p->pi_lock) - do_set_cpus_allowed(p, mask); + set_cpus_allowed_force(p, mask); =20 /* It's safe because the task is inactive. 
*/ p->flags |=3D PF_NO_SETAFFINITY; @@ -880,7 +880,7 @@ int kthread_affine_preferred(struct task kthread_fetch_affinity(kthread, affinity); =20 scoped_guard (raw_spinlock_irqsave, &p->pi_lock) - do_set_cpus_allowed(p, affinity); + set_cpus_allowed_force(p, affinity); =20 mutex_unlock(&kthreads_hotplug_lock); out: --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2329,7 +2329,7 @@ unsigned long wait_task_inactive(struct } =20 static void -__do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx); +do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx); =20 static void migrate_disable_switch(struct rq *rq, struct task_struct *p) { @@ -2346,7 +2346,7 @@ static void migrate_disable_switch(struc =20 scoped_guard (task_rq_lock, p) { update_rq_clock(scope.rq); - __do_set_cpus_allowed(p, &ac); + do_set_cpus_allowed(p, &ac); } } =20 @@ -2697,7 +2697,7 @@ void set_cpus_allowed_common(struct task } =20 static void -__do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx) +do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx) { struct rq *rq =3D task_rq(p); bool queued, running; @@ -2727,7 +2727,7 @@ __do_set_cpus_allowed(struct task_struct * Used for kthread_bind() and select_fallback_rq(), in both cases the user * affinity (if any) should be destroyed too. */ -void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_= mask) +void set_cpus_allowed_force(struct task_struct *p, const struct cpumask *n= ew_mask) { struct affinity_context ac =3D { .new_mask =3D new_mask, @@ -2741,7 +2741,7 @@ void do_set_cpus_allowed(struct task_str =20 scoped_guard (__task_rq_lock, p) { update_rq_clock(scope.rq); - __do_set_cpus_allowed(p, &ac); + do_set_cpus_allowed(p, &ac); } =20 /* @@ -2780,7 +2780,7 @@ int dup_user_cpus_ptr(struct task_struct * Use pi_lock to protect content of user_cpus_ptr * * Though unlikely, user_cpus_ptr can be reset to NULL by a concurrent - * do_set_cpus_allowed(). + * set_cpus_allowed_force(). */ raw_spin_lock_irqsave(&src->pi_lock, flags); if (src->user_cpus_ptr) { @@ -3108,7 +3108,7 @@ static int __set_cpus_allowed_ptr_locked goto out; } =20 - __do_set_cpus_allowed(p, ctx); + do_set_cpus_allowed(p, ctx); =20 return affine_move_task(rq, p, rf, dest_cpu, ctx->flags); =20 @@ -3517,7 +3517,7 @@ static int select_fallback_rq(int cpu, s } fallthrough; case possible: - do_set_cpus_allowed(p, task_cpu_fallback_mask(p)); + set_cpus_allowed_force(p, task_cpu_fallback_mask(p)); state =3D fail; break; case fail: --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2592,7 +2592,7 @@ static inline bool task_allowed_on_cpu(s static inline cpumask_t *alloc_user_cpus_ptr(int node) { /* - * See do_set_cpus_allowed() above for the rcu_head usage. + * See set_cpus_allowed_force() above for the rcu_head usage. 
*/ int size =3D max_t(int, cpumask_size(), sizeof(struct rcu_head)); From nobody Wed Sep 10 23:21:40 2025 Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D3A5A32F774; Wed, 10 Sep 2025 16:00:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.92.199 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757520043; cv=none; b=einX+nebhJCyVua0niIwcc2pnNos2rKcssjQ86znKEWpCF4sFis10KC8pdcZ+2k+kAlH1/WXsZj29JnPOOSL3fEM61GD6KJh7gpsxQK7ZFa5dm4fDrj8lAQm+P+ABlYT+JQGMb9K8jAPSisn5AWyMzXvDOaJBQulkW5Ap8f6kJk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757520043; c=relaxed/simple; bh=+SBu/NKPdyC5nAz1A9E/LJLm0LC/xkfFy8dViSf1hSs=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=HgoXVKsxkWMTwKKp4cJFJgp59rjAMrI6kaSDwqBHOiDavfTMp9x7XVNucV2vCkpjqWDxOEOA589MyvzgLvKSew7R1IFm7l02EmaSg8Mhr1RKxPajk/EUPPWMbBSFr+7FNZIyrbJgMc7tPAf51yfXhR+dWmkAoCZjipSw82UVE4g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=bcWTM8YD; arc=none smtp.client-ip=90.155.92.199 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="bcWTM8YD" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=U/+Fx4A53xEfXCRTuNuO6C8QlTHTslSx7NN0UvlNNqQ=; b=bcWTM8YDZG0VJ4zuUGgxfAvGyJ I8sSG4DFfOgGhfmhOWVUqycFGoFKbnVu+d/BjfOUar6wYxz8ASouzcxUnSXsKDqGySTpXwwtlrezm ftu10Vv1TO7RECuV8VEz2wQ7YSyYslETQNLtANtNbC8EPPK1x9CULrLSUSA9bKJHFEWBoBr9dwfgz VLZpAIZXd+rXJD21llGUECfgoICgVgINRh/l2up3dUleKYb9ysyqnCPBHHJ+vkAngAGA5URvwKtjd lI+OnwBg0kdgYA6XBoMqIM9HEBs/RFJJ1O1AX2+HxkuxknyasjGPRLXxhff7PPQeX8nxXMwyy7U0R /nIqK6Kg==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by desiato.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1uwNFG-00000005ufI-3FL8; Wed, 10 Sep 2025 16:00:30 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 0) id 9C053302F52; Wed, 10 Sep 2025 18:00:28 +0200 (CEST) Message-ID: <20250910155809.336621043@infradead.org> User-Agent: quilt/0.68 Date: Wed, 10 Sep 2025 17:44:18 +0200 From: Peter Zijlstra To: tj@kernel.org Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com, void@manifault.com, arighi@nvidia.com, changwoo@igalia.com, cgroups@vger.kernel.org, sched-ext@lists.linux.dev, liuwenfang@honor.com, tglx@linutronix.de Subject: [PATCH 09/14] sched: Make __do_set_cpus_allowed() use the sched_change pattern References: <20250910154409.446470175@infradead.org> Precedence: 
bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Now that do_set_cpus_allowed() holds all the regular locks, convert it to use the sched_change pattern helper. Signed-off-by: Peter Zijlstra (Intel) --- kernel/sched/core.c | 26 +++++--------------------- 1 file changed, 5 insertions(+), 21 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2697,28 +2697,12 @@ void set_cpus_allowed_common(struct task static void do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx) { - struct rq *rq =3D task_rq(p); - bool queued, running; + u32 flags =3D DEQUEUE_SAVE | DEQUEUE_NOCLOCK; =20 - lockdep_assert_held(&p->pi_lock); - lockdep_assert_rq_held(rq); - - queued =3D task_on_rq_queued(p); - running =3D task_current_donor(rq, p); - - if (queued) - dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK); - - if (running) - put_prev_task(rq, p); - - p->sched_class->set_cpus_allowed(p, ctx); - mm_set_cpus_allowed(p->mm, ctx->new_mask); - - if (queued) - enqueue_task(rq, p, ENQUEUE_RESTORE | ENQUEUE_NOCLOCK); - if (running) - set_next_task(rq, p); + scoped_guard (sched_change, p, flags) { + p->sched_class->set_cpus_allowed(p, ctx); + mm_set_cpus_allowed(p->mm, ctx->new_mask); + } } =20 /* From nobody Wed Sep 10 23:21:40 2025 Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3F7CD3375AD; Wed, 10 Sep 2025 16:00:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.92.199 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757520046; cv=none; b=GlG4Qx+bo/rAM3ejJDGC99dW8mzzL8WvrChQZGsyZlAhd7svI7oyK3++dTe2TQXPAsoLd3CZEdQnEHxQtAWZxUGXwHZdpK50AWkNYhq4mRGDZL4q5QJq3qXoUhCH1vAEZ6Jft4QDrSK0g72OzOEIS4ofrii7xhY/+oiKsBzTZkw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757520046; c=relaxed/simple; bh=KfepqktiJ/syjXebA141BLmjGiDQ3zjGf/IqCiTygCM=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=OswDdvRGnz7sS/MVa1/JtdZE8+vFl4ipbKOHLmRQWEQ1R5mKic5GCh1U8LiGPD5cjYe5qALgAGrooYAXs5wYTTkdNeNBciFNTbdN0MVmO9UCJ+S99zZ9s14l9vBPBLRegDkA1JEBaZHb/7y5MnIqI35y5w90RMnznyY2CKCAaTc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=bna4uYzo; arc=none smtp.client-ip=90.155.92.199 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="bna4uYzo" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=hM4Arz3ECp2TB6CkTXCrwFeiKmOj4WOQ4RjqWcON0XU=; b=bna4uYzo5OJEogGWmcdOKqMFi/ bOn5jIcSb9kpFz7I130r+/uZY9DI/COA37AH4BR2f9fKYSlzwBp4XCjdAZ5lcIfYiy4S/4bhpNAHV 
aLWSZos61eEuuf+m+uf3iPZnPCeZC1f/CK3vYdXvpUTFy8C7HQCGc4eWYICtnQmua4oOXjFONskBp zajxWi//oI4eM++3/Vm5D+5K0RjX3fQgGraJ3KMunrQBl3Upw7eT92lZv5xrMjN6RwV8tqIc9pjyp 5IRxP1w881ZXUzW2L/MzfXib6Hb/Li/INPEp+mL3lUlkz0GRki+/SpHt7XoFH1iMEVR62JWveSUaG DnvcBD1Q==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by desiato.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1uwNFG-00000005ufJ-3LN1; Wed, 10 Sep 2025 16:00:30 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 0) id A0117302F5F; Wed, 10 Sep 2025 18:00:28 +0200 (CEST) Message-ID: <20250910155809.456383178@infradead.org> User-Agent: quilt/0.68 Date: Wed, 10 Sep 2025 17:44:19 +0200 From: Peter Zijlstra To: tj@kernel.org Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com, void@manifault.com, arighi@nvidia.com, changwoo@igalia.com, cgroups@vger.kernel.org, sched-ext@lists.linux.dev, liuwenfang@honor.com, tglx@linutronix.de Subject: [PATCH 10/14] sched: Add locking comments to sched_class methods References: <20250910154409.446470175@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" 'Document' the locking context the various sched_class methods are called under. Signed-off-by: Peter Zijlstra (Intel) --- kernel/sched/core.c | 6 +- kernel/sched/sched.h | 106 ++++++++++++++++++++++++++++++++++++++++++++++= ++--- 2 files changed, 103 insertions(+), 9 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -581,8 +581,8 @@ EXPORT_SYMBOL(__trace_set_current_state) * * p->on_rq <- { 0, 1 =3D TASK_ON_RQ_QUEUED, 2 =3D TASK_ON_RQ_MIGRATING }: * - * is set by activate_task() and cleared by deactivate_task(), under - * rq->lock. Non-zero indicates the task is runnable, the special + * is set by activate_task() and cleared by deactivate_task()/block_task= (), + * under rq->lock. Non-zero indicates the task is runnable, the special * ON_RQ_MIGRATING state is used for migration without holding both * rq->locks. It indicates task_cpu() is not stable, see task_rq_lock(). * @@ -4193,7 +4193,7 @@ int try_to_wake_up(struct task_struct *p * __schedule(). See the comment for smp_mb__after_spinlock(). * * Form a control-dep-acquire with p->on_rq =3D=3D 0 above, to ensure - * schedule()'s deactivate_task() has 'happened' and p will no longer + * schedule()'s block_task() has 'happened' and p will no longer * care about it's own p->state. See the comment in __schedule(). */ smp_acquire__after_ctrl_dep(); --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2320,8 +2320,7 @@ extern const u32 sched_prio_to_wmult[40 /* * {de,en}queue flags: * - * DEQUEUE_SLEEP - task is no longer runnable - * ENQUEUE_WAKEUP - task just became runnable + * SLEEP/WAKEUP - task is no-longer/just-became runnable * * SAVE/RESTORE - an otherwise spurious dequeue/enqueue, done to ensure ta= sks * are in a known state which allows modification. 
Such pai= rs @@ -2334,6 +2333,11 @@ extern const u32 sched_prio_to_wmult[40 * * MIGRATION - p->on_rq =3D=3D TASK_ON_RQ_MIGRATING (used for DEADLINE) * + * DELAYED - de/re-queue a sched_delayed task + * + * CLASS - going to update p->sched_class; makes sched_change call the + * various switch methods. + * * ENQUEUE_HEAD - place at front of runqueue (tail if not specified) * ENQUEUE_REPLENISH - CBS (replenish runtime and postpone deadline) * ENQUEUE_MIGRATED - the task was migrated during wakeup @@ -2384,14 +2388,50 @@ struct sched_class { int uclamp_enabled; #endif =20 + /* + * move_queued_task/activate_task/enqueue_task: rq->lock + * ttwu_do_activate/activate_task/enqueue_task: rq->lock + * wake_up_new_task/activate_task/enqueue_task: task_rq_lock + * ttwu_runnable/enqueue_task: task_rq_lock + * proxy_task_current: rq->lock + * sched_change_end + */ void (*enqueue_task) (struct rq *rq, struct task_struct *p, int flags); + /* + * move_queued_task/deactivate_task/dequeue_task: rq->lock + * __schedule/block_task/dequeue_task: rq->lock + * proxy_task_current: rq->lock + * wait_task_inactive: task_rq_lock + * sched_change_begin + */ bool (*dequeue_task) (struct rq *rq, struct task_struct *p, int flags); + + /* + * do_sched_yield: rq->lock + */ void (*yield_task) (struct rq *rq); + /* + * yield_to: rq->lock (double) + */ bool (*yield_to_task)(struct rq *rq, struct task_struct *p); =20 + /* + * move_queued_task: rq->lock + * __migrate_swap_task: rq->lock + * ttwu_do_activate: rq->lock + * ttwu_runnable: task_rq_lock + * wake_up_new_task: task_rq_lock + */ void (*wakeup_preempt)(struct rq *rq, struct task_struct *p, int flags); =20 + /* + * schedule/pick_next_task/prev_balance: rq->lock + */ int (*balance)(struct rq *rq, struct task_struct *prev, struct rq_flags *= rf); + + /* + * schedule/pick_next_task: rq->lock + */ struct task_struct *(*pick_task)(struct rq *rq); /* * Optional! 
When implemented pick_next_task() should be equivalent to: @@ -2404,48 +2444,102 @@ struct sched_class { */ struct task_struct *(*pick_next_task)(struct rq *rq, struct task_struct *= prev); =20 + /* + * sched_change: + * __schedule: rq->lock + */ void (*put_prev_task)(struct rq *rq, struct task_struct *p, struct task_s= truct *next); void (*set_next_task)(struct rq *rq, struct task_struct *p, bool first); =20 + /* + * select_task_rq: p->pi_lock + * sched_exec: p->pi_lock + */ int (*select_task_rq)(struct task_struct *p, int task_cpu, int flags); =20 + /* + * set_task_cpu: p->pi_lock || rq->lock (ttwu like) + */ void (*migrate_task_rq)(struct task_struct *p, int new_cpu); =20 + /* + * ttwu_do_activate: rq->lock + * wake_up_new_task: task_rq_lock + */ void (*task_woken)(struct rq *this_rq, struct task_struct *task); =20 + /* + * do_set_cpus_allowed: task_rq_lock + sched_change + */ void (*set_cpus_allowed)(struct task_struct *p, struct affinity_context *= ctx); =20 + /* + * sched_set_rq_{on,off}line: rq->lock + */ void (*rq_online)(struct rq *rq); void (*rq_offline)(struct rq *rq); =20 + /* + * push_cpu_stop: p->pi_lock && rq->lock + */ struct rq *(*find_lock_rq)(struct task_struct *p, struct rq *rq); =20 + /* + * hrtick: rq->lock + * sched_tick: rq->lock + * sched_tick_remote: rq->lock + */ void (*task_tick)(struct rq *rq, struct task_struct *p, int queued); + /* + * sched_cgroup_fork: p->pi_lock + */ void (*task_fork)(struct task_struct *p); + /* + * finish_task_switch: no locks + */ void (*task_dead)(struct task_struct *p); =20 + /* + * sched_change + */ void (*switching_from)(struct rq *this_rq, struct task_struct *task); void (*switched_from) (struct rq *this_rq, struct task_struct *task); void (*switching_to) (struct rq *this_rq, struct task_struct *task); void (*switched_to) (struct rq *this_rq, struct task_struct *task); - - void (*reweight_task)(struct rq *this_rq, struct task_struct *task, - const struct load_weight *lw); - u64 (*get_prio) (struct rq *this_rq, struct task_struct *task); void (*prio_changed) (struct rq *this_rq, struct task_struct *task, u64 oldprio); =20 + /* + * set_load_weight: task_rq_lock + sched_change + * __setscheduler_parms: task_rq_lock + sched_change + */ + void (*reweight_task)(struct rq *this_rq, struct task_struct *task, + const struct load_weight *lw); + + /* + * sched_rr_get_interval: task_rq_lock + */ unsigned int (*get_rr_interval)(struct rq *rq, struct task_struct *task); =20 + /* + * task_sched_runtime: task_rq_lock + */ void (*update_curr)(struct rq *rq); =20 #ifdef CONFIG_FAIR_GROUP_SCHED + /* + * sched_change_group: task_rq_lock + sched_change + */ void (*task_change_group)(struct task_struct *p); #endif =20 #ifdef CONFIG_SCHED_CORE + /* + * pick_next_task: rq->lock + * try_steal_cookie: rq->lock (double) + */ int (*task_is_throttled)(struct task_struct *p, int cpu); #endif }; From nobody Wed Sep 10 23:21:40 2025 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 894DE27FD76; Wed, 10 Sep 2025 16:00:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.50.34 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757520042; cv=none; b=KysK0HYmI5lMu2bwVdyHqJk6VPk5q5A5gAu05Tjch5zUCSoyXsKx/baneNWu+ZL5zArARL5VQOoMDcm8fJGpEq+CzEMzM6QwAl/rIVFYCihnef/oGR4QB7kYymf4LAQteY+wtP6/P+NYtk3N1/YJxuSeNInzr1t7V5aXelZiJ3U= 
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757520042; c=relaxed/simple; bh=ez4YCv5vbah9NEmQRbnqaVJy8liiti6yFa2mJFPFaeQ=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=GDg0rV94fRxzdhLpay3EXjL9xLKZH3jxzQnRYOvIQhGQpgyeb+6Ryq2WvEbcslr8gHdljK3bFwZBwyufX5qDd438HzD2U1CzGCeFvldThZYnE5wv+AStSkxHUUowz2/x4gW9jT2CK3qRk1nc1ieVoxuvUG/NNGf1Yo286F8Gquc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=huXY/J11; arc=none smtp.client-ip=90.155.50.34 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="huXY/J11" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=6HCkL6pDdwM+l3KCDVSNPOCM5Jj8sjUY/Qzwb+O256w=; b=huXY/J118POEIqYFDc1P50Gvwo OzR2l37ochnVBUh/hM1gCtQeLM2c1jlYaKW83a8UIzHQcBswLLcnH6rlgnfzWvAAncAvygSBeOElY G3B7u4QUWX4rBSTMM9iRd/5/5LCUgDXMgNcavFbdGBmouFibtIO0v6JPCShalqr2nQ+r3HDWwcLYR 5nxNA6ZbfQPpozrNLoMXU0QYnszOvRi6sZLxlrMtDqIoh5hVWzt6x8QXPGSxTTuxTMf6KzjM7HwMH jkrcdCrvCAnKSqyLjKsKBHJHdx5uJASFWKv/wfOclWgKxwV08dNDmuw8gwd9vFjnbqvHG3yFdj3pE itv05B1w==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by casper.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1uwNFH-0000000BXpG-0TbN; Wed, 10 Sep 2025 16:00:31 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 0) id A3FFF302F8A; Wed, 10 Sep 2025 18:00:28 +0200 (CEST) Message-ID: <20250910155809.571531402@infradead.org> User-Agent: quilt/0.68 Date: Wed, 10 Sep 2025 17:44:20 +0200 From: Peter Zijlstra To: tj@kernel.org Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com, void@manifault.com, arighi@nvidia.com, changwoo@igalia.com, cgroups@vger.kernel.org, sched-ext@lists.linux.dev, liuwenfang@honor.com, tglx@linutronix.de Subject: [PATCH 11/14] sched: Add flags to {put_prev,set_next}_task() methods References: <20250910154409.446470175@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Signed-off-by: Peter Zijlstra (Intel) --- kernel/sched/core.c | 4 ++-- kernel/sched/deadline.c | 6 ++++-- kernel/sched/ext.c | 4 ++-- kernel/sched/fair.c | 8 +++++--- kernel/sched/idle.c | 5 +++-- kernel/sched/rt.c | 6 ++++-- kernel/sched/sched.h | 18 ++++++++++-------- kernel/sched/stop_task.c | 5 +++-- 8 files changed, 33 insertions(+), 23 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -10857,7 +10857,7 @@ struct sched_change_ctx *sched_change_be if (ctx->queued) dequeue_task(rq, p, flags); if (ctx->running) - put_prev_task(rq, p); + 
put_prev_task(rq, p, flags); =20 if ((flags & DEQUEUE_CLASS) && p->sched_class->switched_from) p->sched_class->switched_from(rq, p); @@ -10878,7 +10878,7 @@ void sched_change_end(struct sched_chang if (ctx->queued) enqueue_task(rq, p, ctx->flags | ENQUEUE_NOCLOCK); if (ctx->running) - set_next_task(rq, p); + set_next_task(rq, p, ctx->flags); =20 if (ctx->flags & ENQUEUE_CLASS) { if (p->sched_class->switched_to) --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -2340,10 +2340,11 @@ static void start_hrtick_dl(struct rq *r } #endif /* !CONFIG_SCHED_HRTICK */ =20 -static void set_next_task_dl(struct rq *rq, struct task_struct *p, bool fi= rst) +static void set_next_task_dl(struct rq *rq, struct task_struct *p, int fla= gs) { struct sched_dl_entity *dl_se =3D &p->dl; struct dl_rq *dl_rq =3D &rq->dl; + bool first =3D flags & ENQUEUE_FIRST; =20 p->se.exec_start =3D rq_clock_task(rq); if (on_dl_rq(&p->dl)) @@ -2413,7 +2414,8 @@ static struct task_struct *pick_task_dl( return __pick_task_dl(rq); } =20 -static void put_prev_task_dl(struct rq *rq, struct task_struct *p, struct = task_struct *next) +static void put_prev_task_dl(struct rq *rq, struct task_struct *p, + struct task_struct *next, int flags) { struct sched_dl_entity *dl_se =3D &p->dl; struct dl_rq *dl_rq =3D &rq->dl; --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -3243,7 +3243,7 @@ static void process_ddsp_deferred_locals } } =20 -static void set_next_task_scx(struct rq *rq, struct task_struct *p, bool f= irst) +static void set_next_task_scx(struct rq *rq, struct task_struct *p, int fl= ags) { struct scx_sched *sch =3D scx_root; =20 @@ -3346,7 +3346,7 @@ static void switch_class(struct rq *rq, } =20 static void put_prev_task_scx(struct rq *rq, struct task_struct *p, - struct task_struct *next) + struct task_struct *next, int flags) { struct scx_sched *sch =3D scx_root; update_curr_scx(rq); --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8839,7 +8839,7 @@ static struct task_struct *pick_task_fai } =20 static void __set_next_task_fair(struct rq *rq, struct task_struct *p, boo= l first); -static void set_next_task_fair(struct rq *rq, struct task_struct *p, bool = first); +static void set_next_task_fair(struct rq *rq, struct task_struct *p, int f= lags); =20 struct task_struct * pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_fla= gs *rf) @@ -8955,7 +8955,8 @@ void fair_server_init(struct rq *rq) /* * Account for a descheduled task: */ -static void put_prev_task_fair(struct rq *rq, struct task_struct *prev, st= ruct task_struct *next) +static void put_prev_task_fair(struct rq *rq, struct task_struct *prev, + struct task_struct *next, int flags) { struct sched_entity *se =3D &prev->se; struct cfs_rq *cfs_rq; @@ -13286,9 +13287,10 @@ static void __set_next_task_fair(struct * This routine is mostly called to set cfs_rq->curr field when a task * migrates between groups/classes. 
*/ -static void set_next_task_fair(struct rq *rq, struct task_struct *p, bool = first) +static void set_next_task_fair(struct rq *rq, struct task_struct *p, int f= lags) { struct sched_entity *se =3D &p->se; + bool first =3D flags & ENQUEUE_FIRST; =20 for_each_sched_entity(se) { struct cfs_rq *cfs_rq =3D cfs_rq_of(se); --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -452,13 +452,14 @@ static void wakeup_preempt_idle(struct r resched_curr(rq); } =20 -static void put_prev_task_idle(struct rq *rq, struct task_struct *prev, st= ruct task_struct *next) +static void put_prev_task_idle(struct rq *rq, struct task_struct *prev, + struct task_struct *next, int flags) { dl_server_update_idle_time(rq, prev); scx_update_idle(rq, false, true); } =20 -static void set_next_task_idle(struct rq *rq, struct task_struct *next, bo= ol first) +static void set_next_task_idle(struct rq *rq, struct task_struct *next, in= t flags) { update_idle_core(rq); scx_update_idle(rq, true, true); --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -1636,10 +1636,11 @@ static void wakeup_preempt_rt(struct rq check_preempt_equal_prio(rq, p); } =20 -static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, = bool first) +static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, = int flags) { struct sched_rt_entity *rt_se =3D &p->rt; struct rt_rq *rt_rq =3D &rq->rt; + bool first =3D flags & ENQUEUE_FIRST; =20 p->se.exec_start =3D rq_clock_task(rq); if (on_rt_rq(&p->rt)) @@ -1707,7 +1708,8 @@ static struct task_struct *pick_task_rt( return p; } =20 -static void put_prev_task_rt(struct rq *rq, struct task_struct *p, struct = task_struct *next) +static void put_prev_task_rt(struct rq *rq, struct task_struct *p, + struct task_struct *next, int flags) { struct sched_rt_entity *rt_se =3D &p->rt; struct rt_rq *rt_rq =3D &rq->rt; --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2370,7 +2370,9 @@ extern const u32 sched_prio_to_wmult[40 #define ENQUEUE_REPLENISH 0x00020000 #define ENQUEUE_MIGRATED 0x00040000 #define ENQUEUE_INITIAL 0x00080000 + #define ENQUEUE_RQ_SELECTED 0x00100000 +#define ENQUEUE_FIRST 0x00200000 =20 #define RETRY_TASK ((void *)-1UL) =20 @@ -2448,8 +2450,8 @@ struct sched_class { * sched_change: * __schedule: rq->lock */ - void (*put_prev_task)(struct rq *rq, struct task_struct *p, struct task_s= truct *next); - void (*set_next_task)(struct rq *rq, struct task_struct *p, bool first); + void (*put_prev_task)(struct rq *rq, struct task_struct *p, struct task_s= truct *next, int flags); + void (*set_next_task)(struct rq *rq, struct task_struct *p, int flags); =20 /* * select_task_rq: p->pi_lock @@ -2544,15 +2546,15 @@ struct sched_class { #endif }; =20 -static inline void put_prev_task(struct rq *rq, struct task_struct *prev) +static inline void put_prev_task(struct rq *rq, struct task_struct *prev, = int flags) { WARN_ON_ONCE(rq->donor !=3D prev); - prev->sched_class->put_prev_task(rq, prev, NULL); + prev->sched_class->put_prev_task(rq, prev, NULL, flags); } =20 -static inline void set_next_task(struct rq *rq, struct task_struct *next) +static inline void set_next_task(struct rq *rq, struct task_struct *next, = int flags) { - next->sched_class->set_next_task(rq, next, false); + next->sched_class->set_next_task(rq, next, flags); } =20 static inline void @@ -2576,8 +2578,8 @@ static inline void put_prev_set_next_tas if (next =3D=3D prev) return; =20 - prev->sched_class->put_prev_task(rq, prev, next); - next->sched_class->set_next_task(rq, next, true); + 
prev->sched_class->put_prev_task(rq, prev, next, 0); + next->sched_class->set_next_task(rq, next, ENQUEUE_FIRST); } =20 /* --- a/kernel/sched/stop_task.c +++ b/kernel/sched/stop_task.c @@ -27,7 +27,7 @@ wakeup_preempt_stop(struct rq *rq, struc /* we're never preempted */ } =20 -static void set_next_task_stop(struct rq *rq, struct task_struct *stop, bo= ol first) +static void set_next_task_stop(struct rq *rq, struct task_struct *stop, in= t flags) { stop->se.exec_start =3D rq_clock_task(rq); } @@ -58,7 +58,8 @@ static void yield_task_stop(struct rq *r BUG(); /* the stop task should never yield, its pointless. */ } =20 -static void put_prev_task_stop(struct rq *rq, struct task_struct *prev, st= ruct task_struct *next) +static void put_prev_task_stop(struct rq *rq, struct task_struct *prev, + struct task_struct *next, int flags) { update_curr_common(rq); } From nobody Wed Sep 10 23:21:40 2025 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CB40926B75C; Wed, 10 Sep 2025 16:00:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.50.34 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757520041; cv=none; b=WnyNrYTCGbITH3wjexACjJkXTVmIuJtvojVbnMGaV0tOH/YI57vCbz3NKcRmmAmtZSmLSSWIMfzFnXdbNWDmIytpghUVN3lO3x74BzKdJUJFuhzb2ZnPkcT4Fg2c+j4GOxoms3bgUqELC8hUHBJbKleFEDfu1IW2PtNZESk+tTM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757520041; c=relaxed/simple; bh=v/Iu70dxRzbGDCv8MSQ/4Ric2NCqoqB/0rR4wmEUtFU=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=l39jzq5GmhQ+73ObXnF8wAGUCk2y5FCdqf83xvAbS7BoEMJN7yYH69FLnzLO6aQgyZ3c4xXuKensR6vwRmiQsDtqSlRqR06LqLEb+RLuK82yrwPTyhR24qk+Fi0oHGNcj4T+jw8bssuWM+opdmXMZ9rzRU1jt3mOreuaFJ4zQPs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=SXt8fcWT; arc=none smtp.client-ip=90.155.50.34 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="SXt8fcWT" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=5odGo3alJRTX8gNydTE0XZfL6eVbL5R2OiB+CAHbetE=; b=SXt8fcWTTp6l1Z/JZkZuUcRAZL bjrlm97YT27B/8/d9gHCTxDXjFU6+/7ECshmtbp2PxsPidaGdmWMK1ipDbp8vr4ptbGCp6jF18tAl +ZldycbT1OxQL6rdp1QGWkdcvleFsT2eR4urMobeRYY+6zdtqWF4nRZlnCm0dgIFUfqzCEZGdQjgv ts/NnmMKBVb45T/dRaVseY1EHhmkfNopU5VfDR4R+c+O+/Kev+oJ5yoYhBUVSzELLStYjfWl90tVd Utd2PvSBImiOCspMAL/vJrZ6X45YLk7RZ9T1FNbqsPrPtA5QxtWW8/VT0f1PSnuMtauYAKSwOg3Ci Nhp4jaZw==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by casper.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1uwNFH-0000000BXpI-0Wsj; Wed, 10 Sep 2025 16:00:31 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 0) id 
A82F8302FD7; Wed, 10 Sep 2025 18:00:28 +0200 (CEST) Message-ID: <20250910155809.684653538@infradead.org> User-Agent: quilt/0.68 Date: Wed, 10 Sep 2025 17:44:21 +0200 From: Peter Zijlstra To: tj@kernel.org Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com, void@manifault.com, arighi@nvidia.com, changwoo@igalia.com, cgroups@vger.kernel.org, sched-ext@lists.linux.dev, liuwenfang@honor.com, tglx@linutronix.de Subject: [PATCH 12/14] sched: Add shared runqueue locking to __task_rq_lock() References: <20250910154409.446470175@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In order to fix the whole SCHED_EXT balance/pick mess, and avoid further complicating all this, make the regular: p->pi_lock rq->lock dsq->lock order work. Notably, while sched_class::pick_task() is called with rq->lock held, and pick_task_scx() takes dsq->lock, and while the normal sched_change pattern goes into dequeue/enqueue and thus takes dsq->lock, various other things like task_call_func() / sched_setaffinity() do not necessarily do so. Therefore, add a per task spinlock pointer that can be set to reference the shared runqueue lock where appropriate and teach __task_rq_lock() to take this lock along with rq->lock. This ensures all 'normal' scheduling operations serialize against the shared lock. Signed-off-by: Peter Zijlstra (Intel) --- include/linux/sched.h | 2 +- kernel/sched/core.c | 27 ++++++++++++++++++++++----- kernel/sched/sched.h | 10 ++++++---- kernel/sched/stats.h | 2 +- 4 files changed, 30 insertions(+), 11 deletions(-) --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1225,8 +1225,8 @@ struct task_struct { /* Protection against (de-)allocation: mm, files, fs, tty, keyrings, mems= _allowed, mempolicy: */ spinlock_t alloc_lock; =20 - /* Protection of the PI data structures: */ raw_spinlock_t pi_lock; + raw_spinlock_t *srq_lock; =20 struct wake_q_node wake_q; =20 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -703,17 +703,24 @@ void double_rq_lock(struct rq *rq1, stru struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf) __acquires(rq->lock) { + raw_spinlock_t *slock; struct rq *rq; =20 lockdep_assert_held(&p->pi_lock); =20 for (;;) { rq =3D task_rq(p); + slock =3D p->srq_lock; raw_spin_rq_lock(rq); - if (likely(rq =3D=3D task_rq(p) && !task_on_rq_migrating(p))) { + if (slock) + raw_spin_lock(slock); + if (likely(rq =3D=3D task_rq(p) && !task_on_rq_migrating(p) && + (!slock || p->srq_lock =3D=3D slock))) { rq_pin_lock(rq, rf); return rq; } + if (slock) + raw_spin_unlock(slock); raw_spin_rq_unlock(rq); =20 while (unlikely(task_on_rq_migrating(p))) @@ -728,12 +735,16 @@ struct rq *task_rq_lock(struct task_stru __acquires(p->pi_lock) __acquires(rq->lock) { + raw_spinlock_t *slock; struct rq *rq; =20 for (;;) { raw_spin_lock_irqsave(&p->pi_lock, rf->flags); rq =3D task_rq(p); + slock =3D p->srq_lock; raw_spin_rq_lock(rq); + if (slock) + raw_spin_lock(slock); /* * move_queued_task() task_rq_lock() * @@ -751,10 +762,14 @@ struct rq *task_rq_lock(struct task_stru * dependency headed by '[L] rq =3D task_rq()' and the acquire * will pair with the WMB to ensure we then also see
migrating. */ - if (likely(rq =3D=3D task_rq(p) && !task_on_rq_migrating(p))) { + if (likely(rq =3D=3D task_rq(p) && !task_on_rq_migrating(p) && + (!slock || p->srq_lock =3D=3D slock))) { rq_pin_lock(rq, rf); return rq; } + + if (slock) + raw_spin_unlock(slock); raw_spin_rq_unlock(rq); raw_spin_unlock_irqrestore(&p->pi_lock, rf->flags); =20 @@ -2617,7 +2632,8 @@ static int migration_cpu_stop(void *data */ WARN_ON_ONCE(!pending->stop_pending); preempt_disable(); - task_rq_unlock(rq, p, &rf); + rq_unlock(rq, &rf); + raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags); stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop, &pending->arg, &pending->stop_work); preempt_enable(); @@ -2626,7 +2642,8 @@ static int migration_cpu_stop(void *data out: if (pending) pending->stop_pending =3D false; - task_rq_unlock(rq, p, &rf); + rq_unlock(rq, &rf); + raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags); =20 if (complete) complete_all(&pending->done); @@ -3743,7 +3760,7 @@ static int ttwu_runnable(struct task_str ttwu_do_wakeup(p); ret =3D 1; } - __task_rq_unlock(rq, &rf); + __task_rq_unlock(rq, p, &rf); =20 return ret; } --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1800,10 +1800,13 @@ struct rq *task_rq_lock(struct task_stru __acquires(p->pi_lock) __acquires(rq->lock); =20 -static inline void __task_rq_unlock(struct rq *rq, struct rq_flags *rf) +static inline void +__task_rq_unlock(struct rq *rq, struct task_struct *p, struct rq_flags *rf) __releases(rq->lock) { rq_unpin_lock(rq, rf); + if (p->srq_lock) + raw_spin_unlock(p->srq_lock); raw_spin_rq_unlock(rq); } =20 @@ -1812,8 +1815,7 @@ task_rq_unlock(struct rq *rq, struct tas __releases(rq->lock) __releases(p->pi_lock) { - rq_unpin_lock(rq, rf); - raw_spin_rq_unlock(rq); + __task_rq_unlock(rq, p, rf); raw_spin_unlock_irqrestore(&p->pi_lock, rf->flags); } =20 @@ -1824,7 +1826,7 @@ DEFINE_LOCK_GUARD_1(task_rq_lock, struct =20 DEFINE_LOCK_GUARD_1(__task_rq_lock, struct task_struct, _T->rq =3D __task_rq_lock(_T->lock, &_T->rf), - __task_rq_unlock(_T->rq, &_T->rf), + __task_rq_unlock(_T->rq, _T->lock, &_T->rf), struct rq *rq; struct rq_flags rf) =20 static inline void rq_lock_irqsave(struct rq *rq, struct rq_flags *rf) --- a/kernel/sched/stats.h +++ b/kernel/sched/stats.h @@ -206,7 +206,7 @@ static inline void psi_ttwu_dequeue(stru =20 rq =3D __task_rq_lock(p, &rf); psi_task_change(p, p->psi_flags, 0); - __task_rq_unlock(rq, &rf); + __task_rq_unlock(rq, p, &rf); } } From nobody Wed Sep 10 23:21:40 2025 Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A91E1334375; Wed, 10 Sep 2025 16:00:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.92.199 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757520044; cv=none; b=udxh0w3vxvdPvq9wLKUnuHkrj4JF25mtird0FvkpF3/JEjowp5rdd7bpwobZ8k7CcpegnayTrBBpowrhZhpDPcxOxCLcVWi4KRV/R+5/FbHyVaHEQe2Mbhhb1x9kUbwwqklINGgu5KtLBD1ZW6slUIuzKoJol+CH/+qTQ7WFV1k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757520044; c=relaxed/simple; bh=PmRp8XW4Z8YyfaN/EKoesgC7XksetQTnuq0iyfQMVmw=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=Wu/SCMTANXaJ8j5pJwM7ps4UPz1qiC7ZqHKzGReOjWOsGcjxDQKQ/U2saUl0EPE3mH9chg+tbawlikPfioxpopcb8gtzNa/L6XFxTsTin2CofyxZTw1bLCxGipDUkLx9hzi5/stNxzwTKB8o7N4UxpKnuud23AarddZ7lCx/O14= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=Lny8shyH; arc=none smtp.client-ip=90.155.92.199 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="Lny8shyH" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=8vR+1d+dAYzwiQmuRl68SeQrWl74OXubMqH95iJkoXk=; b=Lny8shyHL0DcOeja3lfUQs2MQA xKHpgltAtqn6wIN2VmQ44rsdFqQN0x3iuYk40rEom6xglktWMIB2SVsdnXbDEmXQZX88RAmojhG+c uC/s27iR+m0gvnosKCj6gZMp7UFd4VkChTynAZBPWus97VqMVKU8lHbpsVVCR9z+ob5RrPNHtTLYY ubhawpenIqJ3SfEpEk5BR+wfm1V7LbtIm4Bcc0XGkIoueBNj5Vw5TazqLMtB+LZsrOGxTVeGGqDIm JS8g50842bDWanV6OiH/ScGLpp05yiz+vUnPMnS42gwH8/LjWFukGokvC/Z43OlbY4zCH2JIYZFnm XMpVUWBg==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by desiato.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1uwNFH-00000005ufO-0WGg; Wed, 10 Sep 2025 16:00:31 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 0) id AC282302FFB; Wed, 10 Sep 2025 18:00:28 +0200 (CEST) Message-ID: <20250910155809.800554594@infradead.org> User-Agent: quilt/0.68 Date: Wed, 10 Sep 2025 17:44:22 +0200 From: Peter Zijlstra To: tj@kernel.org Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com, void@manifault.com, arighi@nvidia.com, changwoo@igalia.com, cgroups@vger.kernel.org, sched-ext@lists.linux.dev, liuwenfang@honor.com, tglx@linutronix.de Subject: [PATCH 13/14] sched: Add {DE,EN}QUEUE_LOCKED References: <20250910154409.446470175@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Provide a LOCKED queue flag, indicating that the {en,de}queue() operation is in task_rq_lock() context. Note: the sched_change in scx_bypass() is the only one that does not use task_rq_lock(). If that were fixed, we could have sched_change imply LOCKED. 
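For illustration only (not part of this patch), a LOCKED change site modelled on the sched_setnuma() conversion in the diff below looks roughly like this; task_rq_lock() supplies p->pi_lock, rq->lock and (when set) p->srq_lock, and the flag merely tells the {en,de}queue paths that they may rely on that context:

static void example_locked_change(struct task_struct *p, int nid)
{
	guard(task_rq_lock)(p);		/* p->pi_lock + rq->lock (+ p->srq_lock) */
	scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_LOCKED)
		p->numa_preferred_nid = nid;	/* modify @p while it is dequeued */
}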
Signed-off-by: Peter Zijlstra (Intel) --- kernel/sched/core.c | 31 +++++++++++++++++++++++++------ kernel/sched/sched.h | 7 +++++++ kernel/sched/syscalls.c | 4 ++-- 3 files changed, 34 insertions(+), 8 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2716,7 +2716,7 @@ void set_cpus_allowed_common(struct task static void do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx) { - u32 flags =3D DEQUEUE_SAVE | DEQUEUE_NOCLOCK; + u32 flags =3D DEQUEUE_SAVE | DEQUEUE_NOCLOCK | DEQUEUE_LOCKED; =20 scoped_guard (sched_change, p, flags) { p->sched_class->set_cpus_allowed(p, ctx); @@ -3749,7 +3749,7 @@ static int ttwu_runnable(struct task_str if (task_on_rq_queued(p)) { update_rq_clock(rq); if (p->se.sched_delayed) - enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED); + enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED | ENQUEUE_LOCKED); if (!task_on_cpu(rq, p)) { /* * When on_rq && !on_cpu the task is preempted, see if @@ -4816,7 +4816,7 @@ void wake_up_new_task(struct task_struct update_rq_clock(rq); post_init_entity_util_avg(p); =20 - activate_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_INITIAL); + activate_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_INITIAL | ENQUEUE_LOCKED); trace_sched_wakeup_new(p); wakeup_preempt(rq, p, wake_flags); if (p->sched_class->task_woken) { @@ -7310,7 +7310,7 @@ void rt_mutex_post_schedule(void) void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task) { int prio, oldprio, queue_flag =3D - DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK; + DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK | DEQUEUE_LOCKED; const struct sched_class *prev_class, *next_class; struct rq_flags rf; struct rq *rq; @@ -8056,7 +8056,7 @@ int migrate_task_to(struct task_struct * void sched_setnuma(struct task_struct *p, int nid) { guard(task_rq_lock)(p); - scoped_guard (sched_change, p, DEQUEUE_SAVE) + scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_LOCKED) p->numa_preferred_nid =3D nid; } #endif /* CONFIG_NUMA_BALANCING */ @@ -9160,7 +9160,7 @@ static void sched_change_group(struct ta void sched_move_task(struct task_struct *tsk, bool for_autogroup) { unsigned int queue_flags =3D - DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK; + DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK | DEQUEUE_LOCKED; bool resched =3D false; struct rq *rq; =20 @@ -10841,6 +10841,13 @@ struct sched_change_ctx *sched_change_be struct rq *rq =3D task_rq(p); =20 lockdep_assert_rq_held(rq); +#ifdef CONFIG_PROVE_LOCKING + if (flags & DEQUEUE_LOCKED) { + lockdep_assert_held(&p->pi_lock); + if (p->srq_lock) + lockdep_assert_held(p->srq_lock); + } +#endif =20 if (flags & DEQUEUE_CLASS) { if (WARN_ON_ONCE(flags & (DEQUEUE_SAVE | DEQUEUE_MOVE))) @@ -10862,6 +10869,9 @@ struct sched_change_ctx *sched_change_be .flags =3D flags, .queued =3D task_on_rq_queued(p), .running =3D task_current(rq, p), +#ifdef CONFIG_PROVE_LOCKING + .srq_lock =3D p->srq_lock, +#endif }; =20 if (!(flags & DEQUEUE_CLASS)) { @@ -10888,6 +10898,15 @@ void sched_change_end(struct sched_chang struct rq *rq =3D task_rq(p); =20 lockdep_assert_rq_held(rq); +#ifdef CONFIG_PROVE_LOCKING + if (ctx->flags & ENQUEUE_LOCKED) { + lockdep_assert_held(&p->pi_lock); + if (p->srq_lock) + lockdep_assert_held(p->srq_lock); + if (ctx->srq_lock && ctx->srq_lock !=3D p->srq_lock) + lockdep_assert_not_held(ctx->srq_lock); + } +#endif =20 if ((ctx->flags & ENQUEUE_CLASS) && p->sched_class->switching_to) p->sched_class->switching_to(rq, p); --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2340,6 +2340,8 @@ extern const u32 
sched_prio_to_wmult[40 * CLASS - going to update p->sched_class; makes sched_change call the * various switch methods. * + * LOCKED - task_rq_lock() context, implies p->srq_lock taken when set. + * * ENQUEUE_HEAD - place at front of runqueue (tail if not specified) * ENQUEUE_REPLENISH - CBS (replenish runtime and postpone deadline) * ENQUEUE_MIGRATED - the task was migrated during wakeup @@ -2355,6 +2357,7 @@ extern const u32 sched_prio_to_wmult[40 #define DEQUEUE_MIGRATING 0x0010 /* Matches ENQUEUE_MIGRATING */ #define DEQUEUE_DELAYED 0x0020 /* Matches ENQUEUE_DELAYED */ #define DEQUEUE_CLASS 0x0040 /* Matches ENQUEUE_CLASS */ +#define DEQUEUE_LOCKED 0x0080 /* Matches ENQUEUE_LOCKED */ #define DEQUEUE_SPECIAL 0x00010000 #define DEQUEUE_THROTTLE 0x00020000 @@ -2367,6 +2370,7 @@ extern const u32 sched_prio_to_wmult[40 #define ENQUEUE_MIGRATING 0x0010 #define ENQUEUE_DELAYED 0x0020 #define ENQUEUE_CLASS 0x0040 +#define ENQUEUE_LOCKED 0x0080 #define ENQUEUE_HEAD 0x00010000 #define ENQUEUE_REPLENISH 0x00020000 @@ -3963,6 +3967,9 @@ extern void balance_callbacks(struct rq struct sched_change_ctx { u64 prio; struct task_struct *p; +#ifdef CONFIG_PROVE_LOCKING + raw_spinlock_t *srq_lock; +#endif int flags; bool queued; bool running; --- a/kernel/sched/syscalls.c +++ b/kernel/sched/syscalls.c @@ -89,7 +89,7 @@ void set_user_nice(struct task_struct *p return; } - scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK) { + scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK | DEQUEUE_LOCKED) { p->static_prio = NICE_TO_PRIO(nice); set_load_weight(p, true); old_prio = p->prio; @@ -503,7 +503,7 @@ int __sched_setscheduler(struct task_str struct balance_callback *head; struct rq_flags rf; int reset_on_fork; - int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK; + int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK | DEQUEUE_LOCKED; struct rq *rq; bool cpuset_locked = false; From nobody Wed Sep 10 23:21:40 2025
Message-ID: <20250910155809.916720757@infradead.org> Date: Wed, 10 Sep 2025 17:44:23 +0200 From: Peter Zijlstra To: tj@kernel.org Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com, void@manifault.com, arighi@nvidia.com, changwoo@igalia.com, cgroups@vger.kernel.org, sched-ext@lists.linux.dev, liuwenfang@honor.com, tglx@linutronix.de Subject: [PATCH 14/14] sched/ext: Implement p->srq_lock support References: <20250910154409.446470175@infradead.org> Have enqueue set p->srq_lock to &dsq->lock and have dequeue clear it, when dst is non-local. When enqueue sees ENQUEUE_LOCKED, it must lock dsq->lock (since p->srq_lock will be NULL on entry) but must not unlock on exit when it sets p->srq_lock. When dequeue sees DEQUEUE_LOCKED, it must not lock dsq->lock when p->srq_lock is set (instead it must verify they are the same), but it must unlock on exit, since it will have cleared p->srq_lock. For DEQUEUE_SAVE/ENQUEUE_RESTORE it can retain p->srq_lock, since the extra unlock+lock cycle is pointless. Note: set_next_task_scx() relies on LOCKED to avoid self-recursion on dsq->lock in the enqueue_task/set_next_task case.
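For readers following along, here is a minimal user-space sketch of the hand-off rules described above; it is not the kernel code: pthread_mutex_t stands in for the DSQ raw_spinlock_t, the Q_* flags and sketch_*() helpers are invented names, and local DSQs, the holding_cpu race and the lockdep annotations are all omitted.

#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define Q_LOCKED	0x1	/* caller is in a LOCKED (task_rq_lock-style) section */
#define Q_SAVE		0x2	/* DEQUEUE_SAVE / ENQUEUE_RESTORE style pair */

struct dsq {
	pthread_mutex_t lock;		/* stand-in for the DSQ raw_spinlock_t */
	int nr;
};

struct task {
	pthread_mutex_t *srq_lock;	/* lock published by the last enqueue */
};

/* Enqueue: take dsq->lock unless a LOCKED caller already pinned it,
 * publish it through p->srq_lock, and keep it held for LOCKED callers. */
static void sketch_enqueue(struct task *p, struct dsq *dsq, int flags)
{
	if (!(flags & Q_LOCKED) || !p->srq_lock)
		pthread_mutex_lock(&dsq->lock);

	dsq->nr++;
	p->srq_lock = &dsq->lock;

	if (!(flags & Q_LOCKED))
		pthread_mutex_unlock(&dsq->lock);
	/* else: dsq->lock stays held, now pinned via p->srq_lock */
}

/* Dequeue: a LOCKED caller already holds p->srq_lock, so only verify it;
 * clear the pin unless this is a SAVE, and unlock once the pin is gone. */
static void sketch_dequeue(struct task *p, struct dsq *dsq, int flags)
{
	if (!(flags & Q_LOCKED))
		pthread_mutex_lock(&dsq->lock);
	else
		assert(p->srq_lock == &dsq->lock);

	dsq->nr--;
	if (!(flags & Q_SAVE))
		p->srq_lock = NULL;

	if (!(flags & Q_LOCKED) || !p->srq_lock)
		pthread_mutex_unlock(&dsq->lock);
	/* else: SAVE kept the pin, so the matching RESTORE reuses the lock */
}

int main(void)
{
	struct dsq dsq = { .lock = PTHREAD_MUTEX_INITIALIZER, .nr = 0 };
	struct task p = { .srq_lock = NULL };

	sketch_enqueue(&p, &dsq, 0);			/* plain: lock, link, unlock */
	sketch_dequeue(&p, &dsq, 0);			/* plain: lock, unlink, unlock */

	sketch_enqueue(&p, &dsq, Q_LOCKED);		/* leaves dsq.lock held */
	sketch_dequeue(&p, &dsq, Q_LOCKED | Q_SAVE);	/* keeps pin and lock */
	sketch_enqueue(&p, &dsq, Q_LOCKED);		/* restore: reuses held lock */
	sketch_dequeue(&p, &dsq, Q_LOCKED);		/* clears pin, unlocks */
	return 0;
}

Under these rules a LOCKED save/restore pair never drops dsq->lock in between, which is why the extra unlock+lock cycle called out above can be skipped.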
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- kernel/sched/ext.c | 68 ++++++++++++++++++++++++++++++++++++----------------- 1 file changed, 47 insertions(+), 21 deletions(-) --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -1952,13 +1952,16 @@ static void dispatch_enqueue(struct scx_ struct task_struct *p, u64 enq_flags) { bool is_local = dsq->id == SCX_DSQ_LOCAL; + bool locked = enq_flags & ENQUEUE_LOCKED; + bool restore = enq_flags & ENQUEUE_RESTORE; WARN_ON_ONCE(p->scx.dsq || !list_empty(&p->scx.dsq_list.node)); WARN_ON_ONCE((p->scx.dsq_flags & SCX_TASK_DSQ_ON_PRIQ) || !RB_EMPTY_NODE(&p->scx.dsq_priq)); if (!is_local) { - raw_spin_lock(&dsq->lock); + if (!locked || !p->srq_lock) + raw_spin_lock(&dsq->lock); if (unlikely(dsq->id == SCX_DSQ_INVALID)) { scx_error(sch, "attempting to dispatch to a destroyed dsq"); /* fall back to the global dsq */ @@ -2028,6 +2031,10 @@ static void dispatch_enqueue(struct scx_ dsq_mod_nr(dsq, 1); p->scx.dsq = dsq; + if (!is_local) { + WARN_ON_ONCE(locked && restore && p->srq_lock && p->srq_lock != &dsq->lock); + p->srq_lock = &dsq->lock; + } /* * scx.ddsp_dsq_id and scx.ddsp_enq_flags are only relevant on the @@ -2059,13 +2066,17 @@ static void dispatch_enqueue(struct scx_ rq->curr->sched_class)) resched_curr(rq); } else { - raw_spin_unlock(&dsq->lock); + if (!locked) + raw_spin_unlock(&dsq->lock); } } static void task_unlink_from_dsq(struct task_struct *p, - struct scx_dispatch_q *dsq) + struct scx_dispatch_q *dsq, + int deq_flags) { + bool save = deq_flags & DEQUEUE_SAVE; + WARN_ON_ONCE(list_empty(&p->scx.dsq_list.node)); if (p->scx.dsq_flags & SCX_TASK_DSQ_ON_PRIQ) { @@ -2076,12 +2087,15 @@ static void task_unlink_from_dsq(struct list_del_init(&p->scx.dsq_list.node); dsq_mod_nr(dsq, -1); + if (!save) + p->srq_lock = NULL; } -static void dispatch_dequeue(struct rq *rq, struct task_struct *p) +static void dispatch_dequeue(struct rq *rq, struct task_struct *p, int deq_flags) { struct scx_dispatch_q *dsq = p->scx.dsq; bool is_local = dsq == &rq->scx.local_dsq; + bool locked = deq_flags & DEQUEUE_LOCKED; if (!dsq) { /* @@ -2103,8 +2117,10 @@ static void dispatch_dequeue(struct rq * return; } - if (!is_local) - raw_spin_lock(&dsq->lock); + if (!is_local) { + if (!locked) + raw_spin_lock(&dsq->lock); + } /* * Now that we hold @dsq->lock, @p->holding_cpu and @p->scx.dsq_* can't @@ -2112,7 +2128,8 @@ static void dispatch_dequeue(struct rq * */ if (p->scx.holding_cpu < 0) { /* @p must still be on @dsq, dequeue */ - task_unlink_from_dsq(p, dsq); + WARN_ON_ONCE(!is_local && !p->srq_lock); + task_unlink_from_dsq(p, dsq, deq_flags); } else { /* * We're racing against dispatch_to_local_dsq() which already @@ -2125,8 +2142,10 @@ static void dispatch_dequeue(struct rq * } p->scx.dsq = NULL; - if (!is_local) - raw_spin_unlock(&dsq->lock); + if (!is_local) { + if (!locked || !p->srq_lock) + raw_spin_unlock(&dsq->lock); + } } static struct scx_dispatch_q *find_dsq_for_dispatch(struct scx_sched *sch, @@ -2372,7 +2391,7 @@ static void clr_task_runnable(struct tas p->scx.flags |= SCX_TASK_RESET_RUNNABLE_AT; } -static void set_next_task_scx(struct rq *rq, struct task_struct *p, bool first); +static void __set_next_task_scx(struct rq *rq, struct task_struct *p, u32 qf); static void enqueue_task_scx(struct rq *rq, struct task_struct *p, u32 enq_flags) { @@ -2421,7 +2440,7 @@ static void enqueue_task_scx(struct rq * __scx_add_event(sch, SCX_EV_SELECT_CPU_FALLBACK, 1);
if (enq_flags & ENQUEUE_CURR) - set_next_task_scx(rq, p, false); + __set_next_task_scx(rq, p, enq_flags); } static void ops_dequeue(struct rq *rq, struct task_struct *p, u64 deq_flags) @@ -2516,7 +2535,7 @@ static bool dequeue_task_scx(struct rq * rq->scx.nr_running--; sub_nr_running(rq, 1); - dispatch_dequeue(rq, p); + dispatch_dequeue(rq, p, deq_flags); out: if (deq_flags & DEQUEUE_CURR) @@ -2710,7 +2729,7 @@ static bool unlink_dsq_and_lock_src_rq(s lockdep_assert_held(&dsq->lock); WARN_ON_ONCE(p->scx.holding_cpu >= 0); - task_unlink_from_dsq(p, dsq); + task_unlink_from_dsq(p, dsq, 0); p->scx.holding_cpu = cpu; raw_spin_unlock(&dsq->lock); @@ -2782,7 +2801,7 @@ static struct rq *move_task_between_dsqs if (dst_dsq->id == SCX_DSQ_LOCAL) { /* @p is going from a non-local DSQ to a local DSQ */ if (src_rq == dst_rq) { - task_unlink_from_dsq(p, src_dsq); + task_unlink_from_dsq(p, src_dsq, 0); move_local_task_to_local_dsq(p, enq_flags, src_dsq, dst_rq); raw_spin_unlock(&src_dsq->lock); @@ -2796,7 +2815,7 @@ static struct rq *move_task_between_dsqs * @p is going from a non-local DSQ to a non-local DSQ. As * @src_dsq is already locked, do an abbreviated dequeue. */ - task_unlink_from_dsq(p, src_dsq); + task_unlink_from_dsq(p, src_dsq, 0); p->scx.dsq = NULL; raw_spin_unlock(&src_dsq->lock); @@ -2862,7 +2881,7 @@ static bool consume_dispatch_q(struct sc struct rq *task_rq = task_rq(p); if (rq == task_rq) { - task_unlink_from_dsq(p, dsq); + task_unlink_from_dsq(p, dsq, 0); move_local_task_to_local_dsq(p, 0, dsq, rq); raw_spin_unlock(&dsq->lock); return true; @@ -3256,7 +3275,7 @@ static void process_ddsp_deferred_locals } } -static void set_next_task_scx(struct rq *rq, struct task_struct *p, bool first) +static void __set_next_task_scx(struct rq *rq, struct task_struct *p, u32 qf) { struct scx_sched *sch = scx_root; @@ -3266,7 +3285,7 @@ static void set_next_task_scx(struct rq * dispatched. Call ops_dequeue() to notify the BPF scheduler.
*/ ops_dequeue(rq, p, SCX_DEQ_CORE_SCHED_EXEC); - dispatch_dequeue(rq, p); + dispatch_dequeue(rq, p, qf); } p->se.exec_start = rq_clock_task(rq); @@ -3300,6 +3319,11 @@ static void set_next_task_scx(struct rq } } +static void set_next_task_scx(struct rq *rq, struct task_struct *p, bool first) +{ + __set_next_task_scx(rq, p, 0); +} + static enum scx_cpu_preempt_reason preempt_reason_from_class(const struct sched_class *class) { @@ -5012,7 +5036,8 @@ static void scx_disable_workfn(struct kt scx_task_iter_start(&sti); while ((p = scx_task_iter_next_locked(&sti))) { - unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK; + unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE | + DEQUEUE_NOCLOCK | DEQUEUE_LOCKED; const struct sched_class *old_class = p->sched_class; const struct sched_class *new_class = __setscheduler_class(p->policy, p->prio); @@ -5756,7 +5781,8 @@ static int scx_enable(struct sched_ext_o percpu_down_write(&scx_fork_rwsem); scx_task_iter_start(&sti); while ((p = scx_task_iter_next_locked(&sti))) { - unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK; + unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE | + DEQUEUE_NOCLOCK | DEQUEUE_LOCKED; const struct sched_class *old_class = p->sched_class; const struct sched_class *new_class = __setscheduler_class(p->policy, p->prio); @@ -6808,7 +6834,7 @@ __bpf_kfunc u32 scx_bpf_reenqueue_local( if (p->migration_pending || is_migration_disabled(p) || p->nr_cpus_allowed == 1) continue; - dispatch_dequeue(rq, p); + dispatch_dequeue(rq, p, 0); list_add_tail(&p->scx.dsq_list.node, &tasks); }