Message-Id: <20241030152142.372771313@infradead.org>
Date: Wed, 30 Oct 2024 16:12:56 +0100
From: Peter Zijlstra <peterz@infradead.org>
To: mingo@kernel.org
Cc: peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org,
 dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
 mgorman@suse.de, vschneid@redhat.com, tj@kernel.org, void@manifault.com,
 linux-kernel@vger.kernel.org
Subject: [PATCH 1/6] sched/ext: Fix scx vs sched_delayed
References: <20241030151255.300069509@infradead.org>

Commit 98442f0ccd82 ("sched: Fix delayed_dequeue vs switched_from_fair()")
forgot about scx :/

Fixes: 98442f0ccd82 ("sched: Fix delayed_dequeue vs switched_from_fair()")
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20241030104934.GK14555@noisy.programming.kicks-ass.net
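The invariant being restored can be paraphrased outside the kernel: a task whose dequeue was deferred (p->se.sched_delayed) must be dequeued for real before p->sched_class changes. A minimal user-space sketch — the harness is hypothetical, only the added condition mirrors the patch:

/*
 * Hypothetical stand-alone rendering of the fix's logic; not kernel
 * code.  Only the "complete the delayed dequeue before switching
 * class" check corresponds to the patch below.
 */
#include <stdbool.h>
#include <stdio.h>

enum sched_class_id { CLASS_FAIR, CLASS_EXT };

struct task {
        enum sched_class_id class;
        bool sched_delayed;
};

static void dequeue_delayed(struct task *p)
{
        p->sched_delayed = false;
        printf("delayed dequeue completed\n");
}

static void switch_class(struct task *p, enum sched_class_id new_class)
{
        /* The added check: finish the delayed dequeue first. */
        if (p->class != new_class && p->sched_delayed)
                dequeue_delayed(p);
        p->class = new_class;
        printf("class switched\n");
}

int main(void)
{
        struct task p = { .class = CLASS_FAIR, .sched_delayed = true };
        switch_class(&p, CLASS_EXT);
        return 0;
}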
("sched: Fix delayed_dequeue vs switched_from_fair()") Reported-by: Tejun Heo Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20241030104934.GK14555@noisy.programming.ki= cks-ass.net --- kernel/sched/ext.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -4489,11 +4489,16 @@ static void scx_ops_disable_workfn(struc scx_task_iter_start(&sti); while ((p =3D scx_task_iter_next_locked(&sti))) { const struct sched_class *old_class =3D p->sched_class; + const struct sched_class *new_class =3D + __setscheduler_class(p->policy, p->prio); struct sched_enq_and_set_ctx ctx; =20 + if (old_class !=3D new_class && p->se.sched_delayed) + dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEUE_DELAYED); + sched_deq_and_put_task(p, DEQUEUE_SAVE | DEQUEUE_MOVE, &ctx); =20 - p->sched_class =3D __setscheduler_class(p->policy, p->prio); + p->sched_class =3D new_class; check_class_changing(task_rq(p), p, old_class); =20 sched_enq_and_set_task(&ctx); @@ -5199,12 +5204,17 @@ static int scx_ops_enable(struct sched_e scx_task_iter_start(&sti); while ((p =3D scx_task_iter_next_locked(&sti))) { const struct sched_class *old_class =3D p->sched_class; + const struct sched_class *new_class =3D + __setscheduler_class(p->policy, p->prio); struct sched_enq_and_set_ctx ctx; =20 + if (old_class !=3D new_class && p->se.sched_delayed) + dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEE_DELAYED); + sched_deq_and_put_task(p, DEQUEUE_SAVE | DEQUEUE_MOVE, &ctx); =20 p->scx.slice =3D SCX_SLICE_DFL; - p->sched_class =3D __setscheduler_class(p->policy, p->prio); + p->sched_class =3D new_class; check_class_changing(task_rq(p), p, old_class); =20 sched_enq_and_set_task(&ctx); From nobody Mon Nov 25 03:07:22 2024 Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E8A33213ED9 for ; Wed, 30 Oct 2024 15:25:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.92.199 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730301956; cv=none; b=DlIl1QpM8jnSqac5ypF5ZqDx+3lkl0zLmDsHiqt2jGR9Qua+rIXlqSKCc79XaFt7TZRZTlh6M43NKM2c+P1rHzP383gxJo2u7d+L9ZeVO09m7QOyitff2J2vMWY0jHzj/YjOEQ881WiNS4mSuer0Nu88f83MRfGWjwAZlfCysQQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730301956; c=relaxed/simple; bh=mccBWL9WEoEDXhSL3qlwPQc6HCn5mfi3S4mWhduGjio=; h=Message-Id:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=aA7q8V+eJy7bFYwDciLU+yURNsPE30NlS2U2m9wfpd5dm7EAtt2/kh3H10XM4F/enCVhxtYeRTM1OUS4CwAvJ8NDGuazRtotc9IesYoz7Y/l730ajYP+kH10LKRG4q3zRDIwh3HyczedpL9vyAIe/1+ThIHjkqWzxruLCII6l0s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=N7wIEeVd; arc=none smtp.client-ip=90.155.92.199 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="N7wIEeVd" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; 
---
 include/linux/cleanup.h |   5 +
 kernel/sched/core.c     | 158 ++++++++++++++++++-----------------------------
 kernel/sched/ext.c      |  33 +++-------
 kernel/sched/sched.h    |  21 +++---
 kernel/sched/syscalls.c |  65 ++++++-------------
 5 files changed, 112 insertions(+), 170 deletions(-)

--- a/include/linux/cleanup.h
+++ b/include/linux/cleanup.h
@@ -297,6 +297,11 @@ static inline class_##_name##_t class_##
 #define __DEFINE_CLASS_IS_CONDITIONAL(_name, _is_cond) \
 static __maybe_unused const bool class_##_name##_is_conditional = _is_cond
 
+#define DEFINE_CLASS_IS_UNCONDITIONAL(_name) \
+	__DEFINE_CLASS_IS_CONDITIONAL(_name, false); \
+	static inline void * class_##_name##_lock_ptr(class_##_name##_t *_T) \
+	{ return (void *)1; }
+
 #define DEFINE_GUARD(_name, _type, _lock, _unlock) \
 	__DEFINE_CLASS_IS_CONDITIONAL(_name, false); \
 	DEFINE_CLASS(_name, _type, if (_T) { _unlock; }, ({ _lock; _T; }), _type _T); \
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7099,7 +7099,7 @@ void rt_mutex_post_schedule(void)
  */
 void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task)
 {
-	int prio, oldprio, queued, running, queue_flag =
+	int prio, oldprio, queue_flag =
 		DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK;
 	const struct sched_class *prev_class, *next_class;
 	struct rq_flags rf;
@@ -7164,52 +7164,42 @@ void rt_mutex_setprio(struct task_struct
 	if (prev_class != next_class && p->se.sched_delayed)
 		dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_NOCLOCK);
 
-	queued = task_on_rq_queued(p);
-	running = task_current_donor(rq, p);
-	if (queued)
-		dequeue_task(rq, p, queue_flag);
-	if (running)
-		put_prev_task(rq, p);
-
-	/*
-	 * Boosting condition are:
-	 * 1. -rt task is running and holds mutex A
-	 *      --> -dl task blocks on mutex A
-	 *
-	 * 2. -dl task is running and holds mutex A
-	 *      --> -dl task blocks on mutex A and could preempt the
-	 *          running task
-	 */
-	if (dl_prio(prio)) {
-		if (!dl_prio(p->normal_prio) ||
-		    (pi_task && dl_prio(pi_task->prio) &&
-		     dl_entity_preempt(&pi_task->dl, &p->dl))) {
-			p->dl.pi_se = pi_task->dl.pi_se;
-			queue_flag |= ENQUEUE_REPLENISH;
+	scoped_guard (sched_change, p, queue_flag) {
+		/*
+		 * Boosting condition are:
+		 * 1. -rt task is running and holds mutex A
+		 *      --> -dl task blocks on mutex A
+		 *
+		 * 2. -dl task is running and holds mutex A
+		 *      --> -dl task blocks on mutex A and could preempt the
+		 *          running task
+		 */
+		if (dl_prio(prio)) {
+			if (!dl_prio(p->normal_prio) ||
+			    (pi_task && dl_prio(pi_task->prio) &&
+			     dl_entity_preempt(&pi_task->dl, &p->dl))) {
+				p->dl.pi_se = pi_task->dl.pi_se;
+				scope.flags |= ENQUEUE_REPLENISH;
+			} else {
+				p->dl.pi_se = &p->dl;
+			}
+		} else if (rt_prio(prio)) {
+			if (dl_prio(oldprio))
+				p->dl.pi_se = &p->dl;
+			if (oldprio < prio)
+				scope.flags |= ENQUEUE_HEAD;
 		} else {
-			p->dl.pi_se = &p->dl;
+			if (dl_prio(oldprio))
+				p->dl.pi_se = &p->dl;
+			if (rt_prio(oldprio))
+				p->rt.timeout = 0;
 		}
-	} else if (rt_prio(prio)) {
-		if (dl_prio(oldprio))
-			p->dl.pi_se = &p->dl;
-		if (oldprio < prio)
-			queue_flag |= ENQUEUE_HEAD;
-	} else {
-		if (dl_prio(oldprio))
-			p->dl.pi_se = &p->dl;
-		if (rt_prio(oldprio))
-			p->rt.timeout = 0;
-	}
-
-	p->sched_class = next_class;
-	p->prio = prio;
 
-	check_class_changing(rq, p, prev_class);
+		p->sched_class = next_class;
+		p->prio = prio;
 
-	if (queued)
-		enqueue_task(rq, p, queue_flag);
-	if (running)
-		set_next_task(rq, p);
+		check_class_changing(rq, p, prev_class);
+	}
 
 	check_class_changed(rq, p, prev_class, oldprio);
 out_unlock:
@@ -7819,26 +7809,9 @@ int migrate_task_to(struct task_struct *
  */
 void sched_setnuma(struct task_struct *p, int nid)
 {
-	bool queued, running;
-	struct rq_flags rf;
-	struct rq *rq;
-
-	rq = task_rq_lock(p, &rf);
-	queued = task_on_rq_queued(p);
-	running = task_current_donor(rq, p);
-
-	if (queued)
-		dequeue_task(rq, p, DEQUEUE_SAVE);
-	if (running)
-		put_prev_task(rq, p);
-
-	p->numa_preferred_nid = nid;
-
-	if (queued)
-		enqueue_task(rq, p, ENQUEUE_RESTORE | ENQUEUE_NOCLOCK);
-	if (running)
-		set_next_task(rq, p);
-	task_rq_unlock(rq, p, &rf);
+	guard(task_rq_lock)(p);
+	scoped_guard (sched_change, p, DEQUEUE_SAVE)
+		p->numa_preferred_nid = nid;
 }
 #endif /* CONFIG_NUMA_BALANCING */
 
@@ -8957,9 +8930,10 @@ static void sched_change_group(struct ta
  */
 void sched_move_task(struct task_struct *tsk)
 {
-	int queued, running, queue_flags =
+	unsigned int queue_flags =
 		DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK;
 	struct task_group *group;
+	bool resched = false;
 	struct rq *rq;
 
 	CLASS(task_rq_lock, rq_guard)(tsk);
@@ -8975,21 +8949,14 @@ void sched_move_task(struct task_struct
 
 	update_rq_clock(rq);
 
-	running = task_current_donor(rq, tsk);
-	queued = task_on_rq_queued(tsk);
+	scoped_guard (sched_change, tsk, queue_flags) {
+		sched_change_group(tsk, group);
+		scx_move_task(tsk);
+		if (scope.running)
+			resched = true;
+	}
 
-	if (queued)
-		dequeue_task(rq, tsk, queue_flags);
-	if (running)
-		put_prev_task(rq, tsk);
-
-	sched_change_group(tsk, group);
-	scx_move_task(tsk);
-
-	if (queued)
-		enqueue_task(rq, tsk, queue_flags);
-	if (running) {
-		set_next_task(rq, tsk);
+	if (resched) {
 		/*
 		 * After changing group, the running task may have joined a
 		 * throttled one but it's still the running task. Trigger a
@@ -10580,37 +10547,34 @@ void sched_mm_cid_fork(struct task_struc
 }
 #endif
 
-#ifdef CONFIG_SCHED_CLASS_EXT
-void sched_deq_and_put_task(struct task_struct *p, int queue_flags,
-			    struct sched_enq_and_set_ctx *ctx)
+struct sched_change_ctx sched_change_begin(struct task_struct *p, unsigned int flags)
 {
 	struct rq *rq = task_rq(p);
-
-	lockdep_assert_rq_held(rq);
-
-	*ctx = (struct sched_enq_and_set_ctx){
+	struct sched_change_ctx ctx = {
 		.p = p,
-		.queue_flags = queue_flags,
+		.flags = flags,
 		.queued = task_on_rq_queued(p),
 		.running = task_current(rq, p),
 	};
 
-	update_rq_clock(rq);
-	if (ctx->queued)
-		dequeue_task(rq, p, queue_flags | DEQUEUE_NOCLOCK);
-	if (ctx->running)
+	lockdep_assert_rq_held(rq);
+
+	if (ctx.queued)
+		dequeue_task(rq, p, flags);
+	if (ctx.running)
 		put_prev_task(rq, p);
+
+	return ctx;
 }
 
-void sched_enq_and_set_task(struct sched_enq_and_set_ctx *ctx)
+void sched_change_end(struct sched_change_ctx ctx)
 {
-	struct rq *rq = task_rq(ctx->p);
+	struct rq *rq = task_rq(ctx.p);
 
 	lockdep_assert_rq_held(rq);
 
-	if (ctx->queued)
-		enqueue_task(rq, ctx->p, ctx->queue_flags | ENQUEUE_NOCLOCK);
-	if (ctx->running)
-		set_next_task(rq, ctx->p);
+	if (ctx.queued)
+		enqueue_task(rq, ctx.p, ctx.flags | ENQUEUE_NOCLOCK);
+	if (ctx.running)
+		set_next_task(rq, ctx.p);
 }
-#endif /* CONFIG_SCHED_CLASS_EXT */
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4355,11 +4355,10 @@ static void scx_ops_bypass(bool bypass)
 		 */
 		list_for_each_entry_safe_reverse(p, n, &rq->scx.runnable_list,
 						 scx.runnable_node) {
-			struct sched_enq_and_set_ctx ctx;
-
 			/* cycling deq/enq is enough, see the function comment */
-			sched_deq_and_put_task(p, DEQUEUE_SAVE | DEQUEUE_MOVE, &ctx);
-			sched_enq_and_set_task(&ctx);
+			scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_MOVE) {
+				/* nothing */ ;
+			}
 		}
 
 		rq_unlock_irqrestore(rq, &rf);
@@ -4491,17 +4490,14 @@ static void scx_ops_disable_workfn(struc
 		const struct sched_class *old_class = p->sched_class;
 		const struct sched_class *new_class =
 			__setscheduler_class(p->policy, p->prio);
-		struct sched_enq_and_set_ctx ctx;
 
 		if (old_class != new_class && p->se.sched_delayed)
 			dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
 
-		sched_deq_and_put_task(p, DEQUEUE_SAVE | DEQUEUE_MOVE, &ctx);
-
-		p->sched_class = new_class;
-		check_class_changing(task_rq(p), p, old_class);
-
-		sched_enq_and_set_task(&ctx);
+		scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_MOVE) {
+			p->sched_class = new_class;
+			check_class_changing(task_rq(p), p, old_class);
+		}
 
 		check_class_changed(task_rq(p), p, old_class, p->prio);
 		scx_ops_exit_task(p);
@@ -5206,18 +5202,15 @@ static int scx_ops_enable(struct sched_e
 		const struct sched_class *old_class = p->sched_class;
 		const struct sched_class *new_class =
 			__setscheduler_class(p->policy, p->prio);
-		struct sched_enq_and_set_ctx ctx;
 
 		if (old_class != new_class && p->se.sched_delayed)
 			dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
-
-		sched_deq_and_put_task(p, DEQUEUE_SAVE | DEQUEUE_MOVE, &ctx);
-
-		p->scx.slice = SCX_SLICE_DFL;
-		p->sched_class = new_class;
-		check_class_changing(task_rq(p), p, old_class);
 
-		sched_enq_and_set_task(&ctx);
+		scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_MOVE) {
+			p->scx.slice = SCX_SLICE_DFL;
+			p->sched_class = new_class;
+			check_class_changing(task_rq(p), p, old_class);
+		}
 
 		check_class_changed(task_rq(p), p, old_class, p->prio);
 	}
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3921,23 +3921,22 @@ static inline void balance_callbacks(str
 
 #endif
 
-#ifdef CONFIG_SCHED_CLASS_EXT
-/*
- * Used by SCX in the enable/disable paths to move tasks between sched_classes
- * and establish invariants.
- */
-struct sched_enq_and_set_ctx {
+struct sched_change_ctx {
 	struct task_struct *p;
-	int queue_flags;
+	unsigned int flags;
 	bool queued;
 	bool running;
 };
 
-void sched_deq_and_put_task(struct task_struct *p, int queue_flags,
-			    struct sched_enq_and_set_ctx *ctx);
-void sched_enq_and_set_task(struct sched_enq_and_set_ctx *ctx);
+struct sched_change_ctx sched_change_begin(struct task_struct *p, unsigned int flags);
+void sched_change_end(struct sched_change_ctx ctx);
 
-#endif /* CONFIG_SCHED_CLASS_EXT */
+DEFINE_CLASS(sched_change, struct sched_change_ctx,
+	     sched_change_end(_T),
+	     sched_change_begin(p, flags),
+	     struct task_struct *p, unsigned int flags)
+
+DEFINE_CLASS_IS_UNCONDITIONAL(sched_change)
 
 #include "ext.h"
 
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -64,7 +64,6 @@ static int effective_prio(struct task_st
 
 void set_user_nice(struct task_struct *p, long nice)
 {
-	bool queued, running;
 	struct rq *rq;
 	int old_prio;
 
@@ -90,22 +89,12 @@ void set_user_nice(struct task_struct *p
 		return;
 	}
 
-	queued = task_on_rq_queued(p);
-	running = task_current_donor(rq, p);
-	if (queued)
-		dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK);
-	if (running)
-		put_prev_task(rq, p);
-
-	p->static_prio = NICE_TO_PRIO(nice);
-	set_load_weight(p, true);
-	old_prio = p->prio;
-	p->prio = effective_prio(p);
-
-	if (queued)
-		enqueue_task(rq, p, ENQUEUE_RESTORE | ENQUEUE_NOCLOCK);
-	if (running)
-		set_next_task(rq, p);
+	scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK) {
+		p->static_prio = NICE_TO_PRIO(nice);
+		set_load_weight(p, true);
+		old_prio = p->prio;
+		p->prio = effective_prio(p);
+	}
 
 	/*
 	 * If the task increased its priority or is running and
@@ -528,7 +517,7 @@ int __sched_setscheduler(struct task_str
 		    bool user, bool pi)
 {
 	int oldpolicy = -1, policy = attr->sched_policy;
-	int retval, oldprio, newprio, queued, running;
+	int retval, oldprio, newprio;
 	const struct sched_class *prev_class, *next_class;
 	struct balance_callback *head;
 	struct rq_flags rf;
@@ -712,33 +701,25 @@ int __sched_setscheduler(struct task_str
 	if (prev_class != next_class && p->se.sched_delayed)
 		dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_NOCLOCK);
 
-	queued = task_on_rq_queued(p);
-	running = task_current_donor(rq, p);
-	if (queued)
-		dequeue_task(rq, p, queue_flags);
-	if (running)
-		put_prev_task(rq, p);
-
-	if (!(attr->sched_flags & SCHED_FLAG_KEEP_PARAMS)) {
-		__setscheduler_params(p, attr);
-		p->sched_class = next_class;
-		p->prio = newprio;
-	}
-	__setscheduler_uclamp(p, attr);
-	check_class_changing(rq, p, prev_class);
+	scoped_guard (sched_change, p, queue_flags) {
 
-	if (queued) {
-		/*
-		 * We enqueue to tail when the priority of a task is
-		 * increased (user space view).
-		 */
-		if (oldprio < p->prio)
-			queue_flags |= ENQUEUE_HEAD;
+		if (!(attr->sched_flags & SCHED_FLAG_KEEP_PARAMS)) {
+			__setscheduler_params(p, attr);
+			p->sched_class = next_class;
+			p->prio = newprio;
+		}
+		__setscheduler_uclamp(p, attr);
+		check_class_changing(rq, p, prev_class);
 
-		enqueue_task(rq, p, queue_flags);
+		if (scope.queued) {
+			/*
+			 * We enqueue to tail when the priority of a task is
+			 * increased (user space view).
+			 */
+			if (oldprio < p->prio)
+				scope.flags |= ENQUEUE_HEAD;
+		}
 	}
-	if (running)
-		set_next_task(rq, p);
 
 	check_class_changed(rq, p, prev_class, oldprio);

Message-Id: <20241030152142.602414152@infradead.org>
Date: Wed, 30 Oct 2024 16:12:58 +0100
From: Peter Zijlstra <peterz@infradead.org>
Subject: [RFC][PATCH 3/6] sched: Re-arrange the {EN,DE}QUEUE flags

Ensure the matched flags are in the low byte while the unmatched flags
go into the second byte.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
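Why matching matters: sched_change_end() passes the flag word saved at dequeue time straight to enqueue_task(), so every DEQUEUE_* bit with an ENQUEUE_* counterpart must share its value. A compile-time restatement of the invariant, with the values copied from the patch (DEQUEUE_SLEEP/ENQUEUE_WAKEUP at 0x01 are untouched by it):

/*
 * Stand-alone check of the matched-flag layout this patch
 * establishes; compiles to nothing if the invariant holds.
 */
#define DEQUEUE_SLEEP           0x01
#define DEQUEUE_SAVE            0x02
#define DEQUEUE_MOVE            0x04
#define DEQUEUE_NOCLOCK         0x08
#define DEQUEUE_MIGRATING       0x10
#define DEQUEUE_DELAYED         0x20

#define ENQUEUE_WAKEUP          0x01
#define ENQUEUE_RESTORE         0x02
#define ENQUEUE_MOVE            0x04
#define ENQUEUE_NOCLOCK         0x08
#define ENQUEUE_MIGRATING       0x10
#define ENQUEUE_DELAYED         0x20

_Static_assert(DEQUEUE_SLEEP     == ENQUEUE_WAKEUP,    "sleep/wakeup");
_Static_assert(DEQUEUE_SAVE      == ENQUEUE_RESTORE,   "save/restore");
_Static_assert(DEQUEUE_MOVE      == ENQUEUE_MOVE,      "move");
_Static_assert(DEQUEUE_NOCLOCK   == ENQUEUE_NOCLOCK,   "noclock");
_Static_assert(DEQUEUE_MIGRATING == ENQUEUE_MIGRATING, "migrating");
_Static_assert(DEQUEUE_DELAYED   == ENQUEUE_DELAYED,   "delayed");

int main(void) { return 0; }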
---
 kernel/sched/sched.h | 27 +++++++++++++-------------
 1 file changed, 13 insertions(+), 14 deletions(-)

--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2337,26 +2337,25 @@ extern const u32 sched_prio_to_wmult[40
 #define DEQUEUE_SAVE		0x02 /* Matches ENQUEUE_RESTORE */
 #define DEQUEUE_MOVE		0x04 /* Matches ENQUEUE_MOVE */
 #define DEQUEUE_NOCLOCK		0x08 /* Matches ENQUEUE_NOCLOCK */
-#define DEQUEUE_SPECIAL		0x10
-#define DEQUEUE_MIGRATING	0x100 /* Matches ENQUEUE_MIGRATING */
-#define DEQUEUE_DELAYED		0x200 /* Matches ENQUEUE_DELAYED */
+
+#define DEQUEUE_MIGRATING	0x10 /* Matches ENQUEUE_MIGRATING */
+#define DEQUEUE_DELAYED		0x20 /* Matches ENQUEUE_DELAYED */
+
+#define DEQUEUE_SPECIAL		0x0100
 
 #define ENQUEUE_WAKEUP		0x01
 #define ENQUEUE_RESTORE		0x02
 #define ENQUEUE_MOVE		0x04
 #define ENQUEUE_NOCLOCK		0x08
 
-#define ENQUEUE_HEAD		0x10
-#define ENQUEUE_REPLENISH	0x20
-#ifdef CONFIG_SMP
-#define ENQUEUE_MIGRATED	0x40
-#else
-#define ENQUEUE_MIGRATED	0x00
-#endif
-#define ENQUEUE_INITIAL		0x80
-#define ENQUEUE_MIGRATING	0x100
-#define ENQUEUE_DELAYED		0x200
-#define ENQUEUE_RQ_SELECTED	0x400
+#define ENQUEUE_MIGRATING	0x10
+#define ENQUEUE_DELAYED		0x20
+
+#define ENQUEUE_HEAD		0x0100
+#define ENQUEUE_REPLENISH	0x0200
+#define ENQUEUE_MIGRATED	(0x0400*IS_ENABLED(CONFIG_SMP))
+#define ENQUEUE_INITIAL		0x0800
+#define ENQUEUE_RQ_SELECTED	0x1000
 
 #define RETRY_TASK		((void *)-1UL)

Message-Id: <20241030152142.711768679@infradead.org>
Date: Wed, 30 Oct 2024 16:12:59 +0100
From: Peter Zijlstra <peterz@infradead.org>
Subject: [RFC][PATCH 4/6] sched: Fold sched_class::switch{ing,ed}_{to,from}() into the change pattern

Add {DE,EN}QUEUE_CLASS and fold the sched_class::switch* methods into
the change pattern. This completes the pattern and makes it more
symmetric.

This changes the order of callbacks slightly:

                                |
                                |   switching_from()
    dequeue_task();             |   dequeue_task()
    put_prev_task();            |   put_prev_task()
                                |   switched_from()
                                |
    ... change task ...         |   ... change task ...
                                |
    switching_to();             |   switching_to()
    enqueue_task();             |   enqueue_task()
    set_next_task();            |   set_next_task()
    prev_class->switched_from() |
    switched_to()               |   switched_to()
                                |

Notably, it moves the switched_from() callback right after the
dequeue/put. Existing implementations don't appear to be affected by
this change in location -- specifically the task isn't enqueued on the
class in question in either location.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
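The new end-to-end order lives in sched_change_begin()/sched_change_end(). A user-space trace for a class switch (both CLASS flags set); these are printf stubs only, and the real hooks take (rq, p) at this point in the series, growing a flags argument in the next patch:

/* Stub trace of the callback order established by this patch. */
#include <stdio.h>

static void sched_change_begin(void)
{
        printf("switching_from()\n");   /* new hook, before dequeue */
        printf("dequeue_task()\n");
        printf("put_prev_task()\n");
        printf("switched_from()\n");    /* moved: right after dequeue/put */
}

static void sched_change_end(void)
{
        printf("switching_to()\n");
        printf("enqueue_task()\n");
        printf("set_next_task()\n");
        printf("switched_to()\n");      /* previously fired from check_class_changed() */
}

int main(void)
{
        sched_change_begin();
        printf("... change task ...\n");
        sched_change_end();
        return 0;
}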
---
 kernel/sched/core.c      | 61 +++++++++++++++++++-----------------------
 kernel/sched/ext.c       | 19 ++++++++++----
 kernel/sched/idle.c      |  4 +--
 kernel/sched/rt.c        |  4 +--
 kernel/sched/sched.h     | 21 ++++++---------
 kernel/sched/stop_task.c |  4 +--
 kernel/sched/syscalls.c  |  7 +++--
 7 files changed, 59 insertions(+), 61 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2102,34 +2102,9 @@ inline int task_curr(const struct task_s
 	return cpu_curr(task_cpu(p)) == p;
 }
 
-/*
- * ->switching_to() is called with the pi_lock and rq_lock held and must not
- * mess with locking.
- */
-void check_class_changing(struct rq *rq, struct task_struct *p,
-			  const struct sched_class *prev_class)
+void check_prio_changed(struct rq *rq, struct task_struct *p, int oldprio)
 {
-	if (prev_class != p->sched_class && p->sched_class->switching_to)
-		p->sched_class->switching_to(rq, p);
-}
-
-/*
- * switched_from, switched_to and prio_changed must _NOT_ drop rq->lock,
- * use the balance_callback list if you want balancing.
- *
- * this means any call to check_class_changed() must be followed by a call to
- * balance_callback().
- */
-void check_class_changed(struct rq *rq, struct task_struct *p,
-			 const struct sched_class *prev_class,
-			 int oldprio)
-{
-	if (prev_class != p->sched_class) {
-		if (prev_class->switched_from)
-			prev_class->switched_from(rq, p);
-
-		p->sched_class->switched_to(rq, p);
-	} else if (oldprio != p->prio || dl_task(p))
+	if (oldprio != p->prio || dl_task(p))
 		p->sched_class->prio_changed(rq, p, oldprio);
 }
 
@@ -7161,6 +7136,9 @@ void rt_mutex_setprio(struct task_struct
 	prev_class = p->sched_class;
 	next_class = __setscheduler_class(p->policy, prio);
 
+	if (prev_class != next_class)
+		queue_flag |= DEQUEUE_CLASS;
+
 	if (prev_class != next_class && p->se.sched_delayed)
 		dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_NOCLOCK);
 
@@ -7197,11 +7175,10 @@ void rt_mutex_setprio(struct task_struct
 
 		p->sched_class = next_class;
 		p->prio = prio;
-
-		check_class_changing(rq, p, prev_class);
 	}
 
-	check_class_changed(rq, p, prev_class, oldprio);
+	if (!(queue_flag & DEQUEUE_CLASS))
+		check_prio_changed(rq, p, oldprio);
 out_unlock:
 	/* Avoid rq from going away on us: */
 	preempt_disable();
@@ -10550,6 +10527,12 @@ void sched_mm_cid_fork(struct task_struc
 struct sched_change_ctx sched_change_begin(struct task_struct *p, unsigned int flags)
 {
 	struct rq *rq = task_rq(p);
+
+	lockdep_assert_rq_held(rq);
+
+	if ((flags & DEQUEUE_CLASS) && p->sched_class->switching_from)
+		p->sched_class->switching_from(rq, p);
+
 	struct sched_change_ctx ctx = {
 		.p = p,
 		.flags = flags,
@@ -10557,24 +10540,32 @@ struct sched_change_ctx sched_change_beg
 		.running = task_current(rq, p),
 	};
 
-	lockdep_assert_rq_held(rq);
-
 	if (ctx.queued)
 		dequeue_task(rq, p, flags);
 	if (ctx.running)
 		put_prev_task(rq, p);
 
+	if ((flags & DEQUEUE_CLASS) && p->sched_class->switched_from)
+		p->sched_class->switched_from(rq, p);
+
 	return ctx;
 }
 
 void sched_change_end(struct sched_change_ctx ctx)
 {
-	struct rq *rq = task_rq(ctx.p);
+	struct task_struct *p = ctx.p;
+	struct rq *rq = task_rq(p);
 
 	lockdep_assert_rq_held(rq);
 
+	if ((ctx.flags & ENQUEUE_CLASS) && p->sched_class->switching_to)
+		p->sched_class->switching_to(rq, p);
+
 	if (ctx.queued)
-		enqueue_task(rq, ctx.p, ctx.flags | ENQUEUE_NOCLOCK);
+		enqueue_task(rq, p, ctx.flags | ENQUEUE_NOCLOCK);
 	if (ctx.running)
-		set_next_task(rq, ctx.p);
+		set_next_task(rq, p);
+
+	if ((ctx.flags & ENQUEUE_CLASS) && p->sched_class->switched_to)
+		p->sched_class->switched_to(rq, p);
 }
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4487,19 +4487,24 @@ static void scx_ops_disable_workfn(struc
 
 	scx_task_iter_start(&sti);
 	while ((p = scx_task_iter_next_locked(&sti))) {
+		unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE;
 		const struct sched_class *old_class = p->sched_class;
 		const struct sched_class *new_class =
 			__setscheduler_class(p->policy, p->prio);
 
+		if (old_class != new_class)
+			queue_flags |= DEQUEUE_CLASS;
+
 		if (old_class != new_class && p->se.sched_delayed)
 			dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
 
-		scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_MOVE) {
+		scoped_guard (sched_change, p, queue_flags) {
 			p->sched_class = new_class;
-			check_class_changing(task_rq(p), p, old_class);
 		}
 
-		check_class_changed(task_rq(p), p, old_class, p->prio);
+		if (!(queue_flags & DEQUEUE_CLASS))
+			check_prio_changed(task_rq(p), p, p->prio);
+
 		scx_ops_exit_task(p);
 	}
 	scx_task_iter_stop(&sti);
@@ -5199,20 +5204,24 @@ static int scx_ops_enable(struct sched_e
 	percpu_down_write(&scx_fork_rwsem);
 	scx_task_iter_start(&sti);
 	while ((p = scx_task_iter_next_locked(&sti))) {
+		unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE;
 		const struct sched_class *old_class = p->sched_class;
 		const struct sched_class *new_class =
 			__setscheduler_class(p->policy, p->prio);
 
+		if (old_class != new_class)
+			queue_flags |= DEQUEUE_CLASS;
+
 		if (old_class != new_class && p->se.sched_delayed)
 			dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
 
 		scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_MOVE) {
 			p->scx.slice = SCX_SLICE_DFL;
 			p->sched_class = new_class;
-			check_class_changing(task_rq(p), p, old_class);
 		}
 
-		check_class_changed(task_rq(p), p, old_class, p->prio);
+		if (!(queue_flags & DEQUEUE_CLASS))
+			check_prio_changed(task_rq(p), p, p->prio);
 	}
 	scx_task_iter_stop(&sti);
 	percpu_up_write(&scx_fork_rwsem);
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -494,7 +494,7 @@ static void task_tick_idle(struct rq *rq
 {
 }
 
-static void switched_to_idle(struct rq *rq, struct task_struct *p)
+static void switching_to_idle(struct rq *rq, struct task_struct *p)
 {
 	BUG();
 }
@@ -534,6 +534,6 @@ DEFINE_SCHED_CLASS(idle) = {
 	.task_tick		= task_tick_idle,
 
 	.prio_changed		= prio_changed_idle,
-	.switched_to		= switched_to_idle,
+	.switching_to		= switching_to_idle,
 	.update_curr		= update_curr_idle,
 };
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2633,7 +2633,6 @@ DEFINE_SCHED_CLASS(rt) = {
 	.rq_online		= rq_online_rt,
 	.rq_offline		= rq_offline_rt,
 	.task_woken		= task_woken_rt,
-	.switched_from		= switched_from_rt,
 	.find_lock_rq		= find_lock_lowest_rq,
 #endif
 
@@ -2641,8 +2640,9 @@ DEFINE_SCHED_CLASS(rt) = {
 
 	.get_rr_interval	= get_rr_interval_rt,
 
-	.prio_changed		= prio_changed_rt,
+	.switched_from		= switched_from_rt,
 	.switched_to		= switched_to_rt,
+	.prio_changed		= prio_changed_rt,
 
 	.update_curr		= update_curr_rt,
 
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2340,6 +2340,7 @@ extern const u32 sched_prio_to_wmult[40
 
 #define DEQUEUE_MIGRATING	0x10 /* Matches ENQUEUE_MIGRATING */
 #define DEQUEUE_DELAYED		0x20 /* Matches ENQUEUE_DELAYED */
+#define DEQUEUE_CLASS		0x40 /* Matches ENQUEUE_CLASS */
 
 #define DEQUEUE_SPECIAL		0x0100
 
@@ -2350,6 +2351,7 @@ extern const u32 sched_prio_to_wmult[40
 
 #define ENQUEUE_MIGRATING	0x10
 #define ENQUEUE_DELAYED		0x20
+#define ENQUEUE_CLASS		0x40
 
 #define ENQUEUE_HEAD		0x0100
 #define ENQUEUE_REPLENISH	0x0200
@@ -2415,14 +2417,11 @@ struct sched_class {
 	void (*task_fork)(struct task_struct *p);
 	void (*task_dead)(struct task_struct *p);
 
-	/*
-	 * The switched_from() call is allowed to drop rq->lock, therefore we
-	 * cannot assume the switched_from/switched_to pair is serialized by
-	 * rq->lock. They are however serialized by p->pi_lock.
-	 */
-	void (*switching_to) (struct rq *this_rq, struct task_struct *task);
-	void (*switched_from)(struct rq *this_rq, struct task_struct *task);
-	void (*switched_to)  (struct rq *this_rq, struct task_struct *task);
+	void (*switching_from)(struct rq *this_rq, struct task_struct *task);
+	void (*switched_from) (struct rq *this_rq, struct task_struct *task);
+	void (*switching_to)  (struct rq *this_rq, struct task_struct *task);
+	void (*switched_to)   (struct rq *this_rq, struct task_struct *task);
+
 	void (*reweight_task)(struct rq *this_rq, struct task_struct *task,
 			      const struct load_weight *lw);
 	void (*prio_changed) (struct rq *this_rq, struct task_struct *task,
@@ -3898,11 +3897,7 @@ extern void set_load_weight(struct task_
 extern void enqueue_task(struct rq *rq, struct task_struct *p, int flags);
 extern bool dequeue_task(struct rq *rq, struct task_struct *p, int flags);
 
-extern void check_class_changing(struct rq *rq, struct task_struct *p,
-				 const struct sched_class *prev_class);
-extern void check_class_changed(struct rq *rq, struct task_struct *p,
-				const struct sched_class *prev_class,
-				int oldprio);
+extern void check_prio_changed(struct rq *rq, struct task_struct *p, int oldprio);
 
 #ifdef CONFIG_SMP
 extern struct balance_callback *splice_balance_callbacks(struct rq *rq);
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -76,7 +76,7 @@ static void task_tick_stop(struct rq *rq
 {
 }
 
-static void switched_to_stop(struct rq *rq, struct task_struct *p)
+static void switching_to_stop(struct rq *rq, struct task_struct *p)
 {
 	BUG(); /* its impossible to change to this class */
 }
@@ -115,6 +115,6 @@ DEFINE_SCHED_CLASS(stop) = {
 	.task_tick		= task_tick_stop,
 
 	.prio_changed		= prio_changed_stop,
-	.switched_to		= switched_to_stop,
+	.switching_to		= switching_to_stop,
 	.update_curr		= update_curr_stop,
 };
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -698,6 +698,9 @@ int __sched_setscheduler(struct task_str
 	prev_class = p->sched_class;
 	next_class = __setscheduler_class(policy, newprio);
 
+	if (prev_class != next_class)
+		queue_flags |= DEQUEUE_CLASS;
+
 	if (prev_class != next_class && p->se.sched_delayed)
 		dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_NOCLOCK);
 
@@ -709,7 +712,6 @@ int __sched_setscheduler(struct task_str
 			p->prio = newprio;
 		}
 		__setscheduler_uclamp(p, attr);
-		check_class_changing(rq, p, prev_class);
 
 		if (scope.queued) {
 			/*
@@ -721,7 +723,8 @@ int __sched_setscheduler(struct task_str
 		}
 	}
 
-	check_class_changed(rq, p, prev_class, oldprio);
+	if (!(queue_flags & DEQUEUE_CLASS))
+		check_prio_changed(rq, p, oldprio);
 
 	/* Avoid rq from going away on us: */
 	preempt_disable();
Message-Id: <20241030152142.820157837@infradead.org>
Date: Wed, 30 Oct 2024 16:13:00 +0100
From: Peter Zijlstra <peterz@infradead.org>
Subject: [RFC][PATCH 5/6] sched: Add flags to the switch{ing,ed}_{to,from}() methods

... and remove switched_to_scx(), since all methods are now optional.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
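"Optional" here means the call sites NULL-check every method pointer, so an empty stub like switched_to_scx() no longer earns its keep. The idiom, as a hypothetical stand-alone rendering (not kernel code):

/* Optional-callback idiom: implement only the hooks you need. */
#include <stddef.h>
#include <stdio.h>

struct rq;
struct task_struct;

struct switch_ops {
        void (*switching_to)(struct rq *rq, struct task_struct *p, int flags);
        void (*switched_to)(struct rq *rq, struct task_struct *p, int flags);
};

static void switching_to_demo(struct rq *rq, struct task_struct *p, int flags)
{
        (void)rq; (void)p;
        printf("switching_to(flags=%#x)\n", flags);
}

/* .switched_to left NULL: the checks below make an empty stub unnecessary. */
static const struct switch_ops demo_ops = {
        .switching_to = switching_to_demo,
};

static void run_switch_hooks(const struct switch_ops *ops, int flags)
{
        if (ops->switching_to)
                ops->switching_to(NULL, NULL, flags);
        if (ops->switched_to)
                ops->switched_to(NULL, NULL, flags);
}

int main(void)
{
        run_switch_hooks(&demo_ops, 0x40 /* ENQUEUE_CLASS */);
        return 0;
}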
---
 kernel/sched/core.c      | 8 ++++----
 kernel/sched/deadline.c  | 4 ++--
 kernel/sched/ext.c       | 6 ++----
 kernel/sched/fair.c      | 4 ++--
 kernel/sched/idle.c      | 2 +-
 kernel/sched/rt.c        | 4 ++--
 kernel/sched/sched.h     | 8 ++++----
 kernel/sched/stop_task.c | 2 +-
 8 files changed, 18 insertions(+), 20 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10531,7 +10531,7 @@ struct sched_change_ctx sched_change_beg
 	lockdep_assert_rq_held(rq);
 
 	if ((flags & DEQUEUE_CLASS) && p->sched_class->switching_from)
-		p->sched_class->switching_from(rq, p);
+		p->sched_class->switching_from(rq, p, flags);
 
 	struct sched_change_ctx ctx = {
 		.p = p,
@@ -10546,7 +10546,7 @@ struct sched_change_ctx sched_change_beg
 		put_prev_task(rq, p);
 
 	if ((flags & DEQUEUE_CLASS) && p->sched_class->switched_from)
-		p->sched_class->switched_from(rq, p);
+		p->sched_class->switched_from(rq, p, flags);
 
 	return ctx;
 }
@@ -10559,7 +10559,7 @@ void sched_change_end(struct sched_chang
 	lockdep_assert_rq_held(rq);
 
 	if ((ctx.flags & ENQUEUE_CLASS) && p->sched_class->switching_to)
-		p->sched_class->switching_to(rq, p);
+		p->sched_class->switching_to(rq, p, ctx.flags);
 
 	if (ctx.queued)
 		enqueue_task(rq, p, ctx.flags | ENQUEUE_NOCLOCK);
@@ -10567,5 +10567,5 @@ void sched_change_end(struct sched_chang
 		set_next_task(rq, p);
 
 	if ((ctx.flags & ENQUEUE_CLASS) && p->sched_class->switched_to)
-		p->sched_class->switched_to(rq, p);
+		p->sched_class->switched_to(rq, p, ctx.flags);
 }
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2968,7 +2968,7 @@ void dl_clear_root_domain(struct root_do
 
 #endif /* CONFIG_SMP */
 
-static void switched_from_dl(struct rq *rq, struct task_struct *p)
+static void switched_from_dl(struct rq *rq, struct task_struct *p, int flags)
 {
 	/*
 	 * task_non_contending() can start the "inactive timer" (if the 0-lag
@@ -3022,7 +3022,7 @@ static void switched_from_dl(struct rq *
  * When switching to -deadline, we may overload the rq, then
  * we try to push someone off, if possible.
  */
-static void switched_to_dl(struct rq *rq, struct task_struct *p)
+static void switched_to_dl(struct rq *rq, struct task_struct *p, int flags)
 {
 	if (hrtimer_try_to_cancel(&p->dl.inactive_timer) == 1)
 		put_task_struct(p);
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -3663,7 +3663,7 @@ static void prio_changed_scx(struct rq *
 {
 }
 
-static void switching_to_scx(struct rq *rq, struct task_struct *p)
+static void switching_to_scx(struct rq *rq, struct task_struct *p, int flags)
 {
 	scx_ops_enable_task(p);
 
@@ -3676,13 +3676,12 @@ static void switching_to_scx(struct rq *
 				 (struct cpumask *)p->cpus_ptr);
 }
 
-static void switched_from_scx(struct rq *rq, struct task_struct *p)
+static void switched_from_scx(struct rq *rq, struct task_struct *p, int flags)
 {
 	scx_ops_disable_task(p);
 }
 
 static void wakeup_preempt_scx(struct rq *rq, struct task_struct *p,int wake_flags) {}
-static void switched_to_scx(struct rq *rq, struct task_struct *p) {}
 
 int scx_check_setscheduler(struct task_struct *p, int policy)
 {
@@ -3971,7 +3970,6 @@ DEFINE_SCHED_CLASS(ext) = {
 
 	.switching_to		= switching_to_scx,
 	.switched_from		= switched_from_scx,
-	.switched_to		= switched_to_scx,
 	.reweight_task		= reweight_task_scx,
 	.prio_changed		= prio_changed_scx,
 
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -13175,12 +13175,12 @@ static void attach_task_cfs_rq(struct ta
 	attach_entity_cfs_rq(se);
 }
 
-static void switched_from_fair(struct rq *rq, struct task_struct *p)
+static void switched_from_fair(struct rq *rq, struct task_struct *p, int flags)
 {
 	detach_task_cfs_rq(p);
 }
 
-static void switched_to_fair(struct rq *rq, struct task_struct *p)
+static void switched_to_fair(struct rq *rq, struct task_struct *p, int flags)
 {
 	SCHED_WARN_ON(p->se.sched_delayed);
 
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -494,7 +494,7 @@ static void task_tick_idle(struct rq *rq
 {
 }
 
-static void switching_to_idle(struct rq *rq, struct task_struct *p)
+static void switching_to_idle(struct rq *rq, struct task_struct *p, int flags)
 {
 	BUG();
 }
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2420,7 +2420,7 @@ static void rq_offline_rt(struct rq *rq)
  * When switch from the rt queue, we bring ourselves to a position
  * that we might want to pull RT tasks from other runqueues.
  */
-static void switched_from_rt(struct rq *rq, struct task_struct *p)
+static void switched_from_rt(struct rq *rq, struct task_struct *p, int flags)
 {
 	/*
 	 * If there are other RT tasks then we will reschedule
@@ -2451,7 +2451,7 @@ void __init init_sched_rt_class(void)
 * with RT tasks. In this case we try to push them off to
 * other runqueues.
 */
-static void switched_to_rt(struct rq *rq, struct task_struct *p)
+static void switched_to_rt(struct rq *rq, struct task_struct *p, int flags)
 {
 	/*
 	 * If we are running, update the avg_rt tracking, as the running time
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2417,10 +2417,10 @@ struct sched_class {
 	void (*task_fork)(struct task_struct *p);
 	void (*task_dead)(struct task_struct *p);
 
-	void (*switching_from)(struct rq *this_rq, struct task_struct *task);
-	void (*switched_from) (struct rq *this_rq, struct task_struct *task);
-	void (*switching_to)  (struct rq *this_rq, struct task_struct *task);
-	void (*switched_to)   (struct rq *this_rq, struct task_struct *task);
+	void (*switching_from)(struct rq *this_rq, struct task_struct *task, int flags);
+	void (*switched_from) (struct rq *this_rq, struct task_struct *task, int flags);
+	void (*switching_to)  (struct rq *this_rq, struct task_struct *task, int flags);
+	void (*switched_to)   (struct rq *this_rq, struct task_struct *task, int flags);
 
 	void (*reweight_task)(struct rq *this_rq, struct task_struct *task,
 			      const struct load_weight *lw);
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -76,7 +76,7 @@ static void task_tick_stop(struct rq *rq
 {
 }
 
-static void switching_to_stop(struct rq *rq, struct task_struct *p)
+static void switching_to_stop(struct rq *rq, struct task_struct *p, int flags)
 {
 	BUG(); /* its impossible to change to this class */
 }

Message-Id: <20241030152142.928034708@infradead.org>
Date: Wed, 30 Oct 2024 16:13:01 +0100
From: Peter Zijlstra <peterz@infradead.org>
Subject: [RFC][PATCH 6/6] sched: Cleanup sched_delayed handling for class switches

Use the new sched_class::switching_from() method to dequeue delayed
tasks before switching to another class.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
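With this, the open-coded "dequeue if delayed" checks at the call sites collapse into fair's switching_from(), which sched_change_begin() invokes whenever DEQUEUE_CLASS is set. A user-space paraphrase of the resulting flow (not kernel code; flag values follow the layout from patches 3 and 4):

/* Paraphrase of the post-patch flow: delayed-dequeue handling lives
 * only in fair's switching_from() hook. */
#include <stdbool.h>
#include <stdio.h>

#define DEQUEUE_NOCLOCK 0x08
#define DEQUEUE_CLASS   0x40

struct task { bool sched_delayed; };

static void switching_from_fair(struct task *p, int flags)
{
        if ((flags & DEQUEUE_CLASS) && p->sched_delayed) {
                p->sched_delayed = false;
                printf("completed delayed dequeue\n");
        }
}

static void sched_change_begin(struct task *p, int flags)
{
        /* stands in for p->sched_class->switching_from(rq, p, flags) */
        switching_from_fair(p, flags);
        printf("dequeue/put, flags=%#x\n", flags);
}

int main(void)
{
        struct task p = { .sched_delayed = true };
        sched_change_begin(&p, DEQUEUE_CLASS | DEQUEUE_NOCLOCK);
        return 0;
}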
---
 kernel/sched/core.c     | 12 ++++++++----
 kernel/sched/ext.c      | 12 ++++--------
 kernel/sched/fair.c     |  7 +++++++
 kernel/sched/syscalls.c |  3 ---
 4 files changed, 19 insertions(+), 15 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7139,9 +7139,6 @@ void rt_mutex_setprio(struct task_struct
 	if (prev_class != next_class)
 		queue_flag |= DEQUEUE_CLASS;
 
-	if (prev_class != next_class && p->se.sched_delayed)
-		dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_NOCLOCK);
-
 	scoped_guard (sched_change, p, queue_flag) {
 		/*
 		 * Boosting condition are:
@@ -10530,8 +10527,15 @@ struct sched_change_ctx sched_change_beg
 
 	lockdep_assert_rq_held(rq);
 
-	if ((flags & DEQUEUE_CLASS) && p->sched_class->switching_from)
+	if ((flags & DEQUEUE_CLASS) && p->sched_class->switching_from) {
+		/*
+		 * switching_from_fair() assumes CLASS implies NOCLOCK; fixing
+		 * this assumption would mean switching_from() would need to be
+		 * able to change flags.
+		 */
+		SCHED_WARN_ON(!(flags & DEQUEUE_NOCLOCK));
 		p->sched_class->switching_from(rq, p, flags);
+	}
 
 	struct sched_change_ctx ctx = {
 		.p = p,
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4485,7 +4485,7 @@ static void scx_ops_disable_workfn(struc
 
 	scx_task_iter_start(&sti);
 	while ((p = scx_task_iter_next_locked(&sti))) {
-		unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE;
+		unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK;
 		const struct sched_class *old_class = p->sched_class;
 		const struct sched_class *new_class =
 			__setscheduler_class(p->policy, p->prio);
@@ -4493,9 +4493,7 @@ static void scx_ops_disable_workfn(struc
 		if (old_class != new_class)
 			queue_flags |= DEQUEUE_CLASS;
 
-		if (old_class != new_class && p->se.sched_delayed)
-			dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
-
+		update_rq_clock(task_rq(p));
 		scoped_guard (sched_change, p, queue_flags) {
 			p->sched_class = new_class;
 		}
@@ -5202,7 +5200,7 @@ static int scx_ops_enable(struct sched_e
 	percpu_down_write(&scx_fork_rwsem);
 	scx_task_iter_start(&sti);
 	while ((p = scx_task_iter_next_locked(&sti))) {
-		unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE;
+		unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK;
 		const struct sched_class *old_class = p->sched_class;
 		const struct sched_class *new_class =
 			__setscheduler_class(p->policy, p->prio);
@@ -5210,9 +5208,7 @@ static int scx_ops_enable(struct sched_e
 		if (old_class != new_class)
 			queue_flags |= DEQUEUE_CLASS;
 
-		if (old_class != new_class && p->se.sched_delayed)
-			dequeue_task(task_rq(p), p, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
-
+		update_rq_clock(task_rq(p));
 		scoped_guard (sched_change, p, DEQUEUE_SAVE | DEQUEUE_MOVE) {
 			p->scx.slice = SCX_SLICE_DFL;
 			p->sched_class = new_class;
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -13175,6 +13175,12 @@ static void attach_task_cfs_rq(struct ta
 	attach_entity_cfs_rq(se);
 }
 
+static void switching_from_fair(struct rq *rq, struct task_struct *p, int flags)
+{
+	if ((flags & DEQUEUE_CLASS) && p->se.sched_delayed)
+		dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_NOCLOCK);
+}
+
 static void switched_from_fair(struct rq *rq, struct task_struct *p, int flags)
 {
 	detach_task_cfs_rq(p);
@@ -13592,6 +13598,7 @@ DEFINE_SCHED_CLASS(fair) = {
 
 	.reweight_task		= reweight_task_fair,
 	.prio_changed		= prio_changed_fair,
+	.switching_from		= switching_from_fair,
 	.switched_from		= switched_from_fair,
 	.switched_to		= switched_to_fair,
 
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -701,9 +701,6 @@ int __sched_setscheduler(struct task_str
 	if (prev_class != next_class)
 		queue_flags |= DEQUEUE_CLASS;
 
-	if (prev_class != next_class && p->se.sched_delayed)
-		dequeue_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_DELAYED | DEQUEUE_NOCLOCK);
-
 	scoped_guard (sched_change, p, queue_flags) {
 
 		if (!(attr->sched_flags & SCHED_FLAG_KEEP_PARAMS)) {