Date: Mon, 15 Dec 2025 07:59:07 -0000
From: "tip-bot2 for Peter Zijlstra"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: sched/core] sched/core: Rework sched_class::wakeup_preempt() and rq_modified_*()
Cc: "Peter Zijlstra (Intel)", Ingo Molnar, x86@kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20251127154725.901391274@infradead.org>
References: <20251127154725.901391274@infradead.org>
Message-ID: <176578554720.498.799395370694223734.tip-bot2@tip-bot2>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     5d1f0b2f278eb55aebe29210fbc8f352c53497d6
Gitweb:        https://git.kernel.org/tip/5d1f0b2f278eb55aebe29210fbc8f352c53497d6
Author:        Peter Zijlstra
AuthorDate:    Wed, 10 Dec 2025 09:06:50 +01:00
Committer:     Ingo Molnar
CommitterDate: Mon, 15 Dec 2025 07:53:35 +01:00

sched/core: Rework sched_class::wakeup_preempt() and rq_modified_*()

Change sched_class::wakeup_preempt() to also get called for cross-class
wakeups, specifically those where the woken task is of a higher class
than the previous highest class.

In order to do this, track the current highest class of the runqueue in
rq::next_class and have wakeup_preempt() track this upwards for each new
wakeup. Additionally have schedule() re-set the value on pick.

Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Ingo Molnar
Link: https://patch.msgid.link/20251127154725.901391274@infradead.org
---
 kernel/sched/core.c      | 32 +++++++++++++++++++++++---------
 kernel/sched/deadline.c  | 14 +++++++++-----
 kernel/sched/ext.c       |  7 +++----
 kernel/sched/fair.c      | 17 ++++++++++-------
 kernel/sched/idle.c      |  3 ---
 kernel/sched/rt.c        |  9 ++++++---
 kernel/sched/sched.h     | 26 ++------------------------
 kernel/sched/stop_task.c |  3 ---
 8 files changed, 53 insertions(+), 58 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4479f7d..7d0a862 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2090,7 +2090,6 @@ void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
 	 */
 	uclamp_rq_inc(rq, p, flags);
 
-	rq->queue_mask |= p->sched_class->queue_mask;
 	p->sched_class->enqueue_task(rq, p, flags);
 
 	psi_enqueue(p, flags);
@@ -2123,7 +2122,6 @@ inline bool dequeue_task(struct rq *rq, struct task_struct *p, int flags)
 	 * and mark the task ->sched_delayed.
 	 */
 	uclamp_rq_dec(rq, p);
-	rq->queue_mask |= p->sched_class->queue_mask;
 	return p->sched_class->dequeue_task(rq, p, flags);
 }
 
@@ -2174,10 +2172,14 @@ void wakeup_preempt(struct rq *rq, struct task_struct *p, int flags)
 {
 	struct task_struct *donor = rq->donor;
 
-	if (p->sched_class == donor->sched_class)
-		donor->sched_class->wakeup_preempt(rq, p, flags);
-	else if (sched_class_above(p->sched_class, donor->sched_class))
+	if (p->sched_class == rq->next_class) {
+		rq->next_class->wakeup_preempt(rq, p, flags);
+
+	} else if (sched_class_above(p->sched_class, rq->next_class)) {
+		rq->next_class->wakeup_preempt(rq, p, flags);
 		resched_curr(rq);
+		rq->next_class = p->sched_class;
+	}
 
 	/*
 	 * A queue event has occurred, and we're going to schedule.  In
@@ -6804,6 +6806,7 @@ static void __sched notrace __schedule(int sched_mode)
 pick_again:
 	next = pick_next_task(rq, rq->donor, &rf);
 	rq_set_donor(rq, next);
+	rq->next_class = next->sched_class;
 	if (unlikely(task_is_blocked(next))) {
 		next = find_proxy_task(rq, next, &rf);
 		if (!next)
@@ -8650,6 +8653,8 @@ void __init sched_init(void)
 		rq->rt.rt_runtime = global_rt_runtime();
 		init_tg_rt_entry(&root_task_group, &rq->rt, NULL, i, NULL);
 #endif
+		rq->next_class = &idle_sched_class;
+
 		rq->sd = NULL;
 		rq->rd = NULL;
 		rq->cpu_capacity = SCHED_CAPACITY_SCALE;
@@ -10775,10 +10780,8 @@ struct sched_change_ctx *sched_change_begin(struct task_struct *p, unsigned int
 			flags |= DEQUEUE_NOCLOCK;
 	}
 
-	if (flags & DEQUEUE_CLASS) {
-		if (p->sched_class->switching_from)
-			p->sched_class->switching_from(rq, p);
-	}
+	if ((flags & DEQUEUE_CLASS) && p->sched_class->switching_from)
+		p->sched_class->switching_from(rq, p);
 
 	*ctx = (struct sched_change_ctx){
 		.p = p,
@@ -10831,6 +10834,17 @@ void sched_change_end(struct sched_change_ctx *ctx)
 		p->sched_class->switched_to(rq, p);
 
 	/*
+	 * If this was a class promotion; let the old class know it
+	 * got preempted. Note that none of the switch*_from() methods
+	 * know the new class and none of the switch*_to() methods
+	 * know the old class.
+	 */
+	if (ctx->running && sched_class_above(p->sched_class, ctx->class)) {
+		rq->next_class->wakeup_preempt(rq, p, 0);
+		rq->next_class = p->sched_class;
+	}
+
+	/*
 	 * If this was a degradation in class someone should have set
 	 * need_resched by now.
 	 */
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 319439f..80c9559 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2499,9 +2499,16 @@ static int balance_dl(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
  * Only called when both the current and waking task are -deadline
  * tasks.
  */
-static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p,
-			      int flags)
+static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p, int flags)
 {
+	/*
+	 * Can only get preempted by stop-class, and those should be
+	 * few and short lived, doesn't really make sense to push
+	 * anything away for that.
+	 */
+	if (p->sched_class != &dl_sched_class)
+		return;
+
 	if (dl_entity_preempt(&p->dl, &rq->donor->dl)) {
 		resched_curr(rq);
 		return;
@@ -3346,9 +3353,6 @@ static int task_is_throttled_dl(struct task_struct *p, int cpu)
 #endif
 
 DEFINE_SCHED_CLASS(dl) = {
-
-	.queue_mask		= 8,
-
 	.enqueue_task		= enqueue_task_dl,
 	.dequeue_task		= dequeue_task_dl,
 	.yield_task		= yield_task_dl,
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 05f5a49..3058777 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -2431,7 +2431,7 @@ do_pick_task_scx(struct rq *rq, struct rq_flags *rf, bool force_scx)
 	/* see kick_cpus_irq_workfn() */
 	smp_store_release(&rq->scx.kick_sync, rq->scx.kick_sync + 1);
 
-	rq_modified_clear(rq);
+	rq->next_class = &fair_sched_class;
 
 	rq_unpin_lock(rq, rf);
 	balance_one(rq, prev);
@@ -3075,7 +3075,8 @@ static void switched_from_scx(struct rq *rq, struct task_struct *p)
 	scx_disable_task(p);
 }
 
-static void wakeup_preempt_scx(struct rq *rq, struct task_struct *p,int wake_flags) {}
+static void wakeup_preempt_scx(struct rq *rq, struct task_struct *p, int wake_flags) {}
+
 static void switched_to_scx(struct rq *rq, struct task_struct *p) {}
 
 int scx_check_setscheduler(struct task_struct *p, int policy)
@@ -3336,8 +3337,6 @@ static void scx_cgroup_unlock(void) {}
  * their current sched_class. Call them directly from sched core instead.
  */
 DEFINE_SCHED_CLASS(ext) = {
-	.queue_mask		= 1,
-
 	.enqueue_task		= enqueue_task_scx,
 	.dequeue_task		= dequeue_task_scx,
 	.yield_task		= yield_task_scx,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d588eb8..76f5e4b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8736,7 +8736,7 @@ preempt_sync(struct rq *rq, int wake_flags,
 /*
  * Preempt the current task with a newly woken task if needed:
  */
-static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int wake_flags)
+static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_flags)
 {
 	enum preempt_wakeup_action preempt_action = PREEMPT_WAKEUP_PICK;
 	struct task_struct *donor = rq->donor;
@@ -8744,6 +8744,12 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
 	struct cfs_rq *cfs_rq = task_cfs_rq(donor);
 	int cse_is_idle, pse_is_idle;
 
+	/*
+	 * XXX Getting preempted by higher class, try and find idle CPU?
+	 */
+	if (p->sched_class != &fair_sched_class)
+		return;
+
 	if (unlikely(se == pse))
 		return;
 
@@ -12911,7 +12917,7 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
 	t0 = sched_clock_cpu(this_cpu);
 	__sched_balance_update_blocked_averages(this_rq);
 
-	rq_modified_clear(this_rq);
+	this_rq->next_class = &fair_sched_class;
 	raw_spin_rq_unlock(this_rq);
 
 	for_each_domain(this_cpu, sd) {
@@ -12978,7 +12984,7 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
 		pulled_task = 1;
 
 	/* If a higher prio class was modified, restart the pick */
-	if (rq_modified_above(this_rq, &fair_sched_class))
+	if (sched_class_above(this_rq->next_class, &fair_sched_class))
 		pulled_task = -1;
 
 out:
@@ -13882,15 +13888,12 @@ static unsigned int get_rr_interval_fair(struct rq *rq, struct task_struct *task
  * All the scheduling class methods:
  */
 DEFINE_SCHED_CLASS(fair) = {
-
-	.queue_mask		= 2,
-
 	.enqueue_task		= enqueue_task_fair,
 	.dequeue_task		= dequeue_task_fair,
 	.yield_task		= yield_task_fair,
 	.yield_to_task		= yield_to_task_fair,
 
-	.wakeup_preempt		= check_preempt_wakeup_fair,
+	.wakeup_preempt		= wakeup_preempt_fair,
 
 	.pick_task		= pick_task_fair,
 	.pick_next_task		= pick_next_task_fair,
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index c174afe..65eb8f8 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -536,9 +536,6 @@ static void update_curr_idle(struct rq *rq)
  * Simple, special scheduling class for the per-CPU idle tasks:
  */
 DEFINE_SCHED_CLASS(idle) = {
-
-	.queue_mask		= 0,
-
 	/* no enqueue/yield_task for idle tasks */
 
 	/* dequeue is not valid, we print a debug message there: */
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index f1867fe..0a9b2cd 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1615,6 +1615,12 @@ static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
 {
 	struct task_struct *donor = rq->donor;
 
+	/*
+	 * XXX If we're preempted by DL, queue a push?
+	 */
+	if (p->sched_class != &rt_sched_class)
+		return;
+
 	if (p->prio < donor->prio) {
 		resched_curr(rq);
 		return;
@@ -2568,9 +2574,6 @@ static int task_is_throttled_rt(struct task_struct *p, int cpu)
 #endif /* CONFIG_SCHED_CORE */
 
 DEFINE_SCHED_CLASS(rt) = {
-
-	.queue_mask		= 4,
-
 	.enqueue_task		= enqueue_task_rt,
 	.dequeue_task		= dequeue_task_rt,
 	.yield_task		= yield_task_rt,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ab1bfa0..bdb1e74 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1119,7 +1118,6 @@ struct rq {
 	raw_spinlock_t		__lock;
 
 	/* Per class runqueue modification mask; bits in class order. */
-	unsigned int		queue_mask;
 	unsigned int		nr_running;
 #ifdef CONFIG_NUMA_BALANCING
 	unsigned int		nr_numa_running;
@@ -1179,6 +1178,7 @@ struct rq {
 	struct sched_dl_entity	*dl_server;
 	struct task_struct	*idle;
 	struct task_struct	*stop;
+	const struct sched_class *next_class;
 	unsigned long		next_balance;
 	struct mm_struct	*prev_mm;
 
@@ -2426,15 +2426,6 @@ struct sched_class {
 #ifdef CONFIG_UCLAMP_TASK
 	int uclamp_enabled;
 #endif
-	/*
-	 * idle: 0
-	 * ext:  1
-	 * fair: 2
-	 * rt:   4
-	 * dl:   8
-	 * stop: 16
-	 */
-	unsigned int queue_mask;
 
 	/*
 	 * move_queued_task/activate_task/enqueue_task: rq->lock
@@ -2593,20 +2584,6 @@ struct sched_class {
 #endif
 };
 
-/*
- * Does not nest; only used around sched_class::pick_task() rq-lock-breaks.
- */
-static inline void rq_modified_clear(struct rq *rq)
-{
-	rq->queue_mask = 0;
-}
-
-static inline bool rq_modified_above(struct rq *rq, const struct sched_class *class)
-{
-	unsigned int mask = class->queue_mask;
-	return rq->queue_mask & ~((mask << 1) - 1);
-}
-
 static inline void put_prev_task(struct rq *rq, struct task_struct *prev)
 {
 	WARN_ON_ONCE(rq->donor != prev);
@@ -3899,6 +3876,7 @@ void move_queued_task_locked(struct rq *src_rq, struct rq *dst_rq, struct task_s
 	deactivate_task(src_rq, task, 0);
 	set_task_cpu(task, dst_rq->cpu);
 	activate_task(dst_rq, task, 0);
+	wakeup_preempt(dst_rq, task, 0);
 }
 
 static inline
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 4f9192b..f95798b 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -97,9 +97,6 @@ static void update_curr_stop(struct rq *rq)
  * Simple, special scheduling class for the per-CPU stop tasks:
  */
 DEFINE_SCHED_CLASS(stop) = {
-
-	.queue_mask		= 16,
-
 	.enqueue_task		= enqueue_task_stop,
 	.dequeue_task		= dequeue_task_stop,
 	.yield_task		= yield_task_stop,
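
For readers following along, the rq->next_class tracking the commit
message describes can be modeled outside the kernel. The sketch below is
NOT kernel code: the class hierarchy is stood in for by an ordered enum
(mirroring what sched_class_above() tests), struct rq and the task are
reduced to the fields this patch touches, and the helper names
class_wakeup_preempt(), schedule_pick() and cls_name are invented for
illustration. It shows the invariant the patch establishes: next_class
only ratchets upward between picks, the previous top class always gets
its wakeup_preempt() callback on a cross-class wakeup (the hook that
wakeup_preempt_dl()/wakeup_preempt_rt() above use to bail out early),
and the pick in __schedule() resets the ratchet.

/* Userspace model of the rq->next_class ratchet; not kernel code. */
#include <stdio.h>

enum sched_class { IDLE, EXT, FAIR, RT, DL, STOP };

static const char * const cls_name[] = {
	"idle", "ext", "fair", "rt", "dl", "stop",
};

struct task { enum sched_class sched_class; };

struct rq {
	/* highest class to have woken a task since the last pick */
	enum sched_class next_class;
	int need_resched;
};

static int sched_class_above(enum sched_class a, enum sched_class b)
{
	return a > b;
}

/*
 * Stand-in for sched_class::wakeup_preempt(); with this patch the
 * per-class callback also sees cross-class wakeups, so it can e.g.
 * notice p->sched_class != its own class and return early.
 */
static void class_wakeup_preempt(struct rq *rq, struct task *p)
{
	printf("  %s->wakeup_preempt(p is %s)\n",
	       cls_name[rq->next_class], cls_name[p->sched_class]);
}

/* Mirrors the reworked wakeup_preempt() in kernel/sched/core.c. */
static void wakeup_preempt(struct rq *rq, struct task *p)
{
	if (p->sched_class == rq->next_class) {
		class_wakeup_preempt(rq, p);
	} else if (sched_class_above(p->sched_class, rq->next_class)) {
		class_wakeup_preempt(rq, p);	 /* old top class is told */
		rq->need_resched = 1;		 /* resched_curr() */
		rq->next_class = p->sched_class; /* ratchet moves up */
	}
	/* wakeups below the current top class leave next_class alone */
}

/* Mirrors the reset on pick in __schedule(). */
static void schedule_pick(struct rq *rq, struct task *next)
{
	rq->next_class = next->sched_class;
	rq->need_resched = 0;
	printf("pick: next_class reset to %s\n", cls_name[rq->next_class]);
}

int main(void)
{
	struct rq rq = { .next_class = IDLE };
	struct task f1 = { FAIR }, r1 = { RT }, f2 = { FAIR };

	wakeup_preempt(&rq, &f1);	/* idle -> fair: ratchet up */
	wakeup_preempt(&rq, &r1);	/* fair -> rt: fair notified first */
	wakeup_preempt(&rq, &f2);	/* below rt: no effect */
	schedule_pick(&rq, &r1);	/* pick resets the ratchet */
	return 0;
}

Built with "cc next_class.c", this prints the callback sequence: fair's
callback fires exactly once, when the RT wakeup overtakes it, and the
later fair wakeup is ignored until the next pick resets next_class --
the upward-only behaviour that replaces the removed queue_mask /
rq_modified_above() bit tracking.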