From: "tip-bot2 for Peter Zijlstra"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Date: Thu, 28 May 2026 09:42:00 -0000
Subject: [tip: sched/core] sched: Remove sched_class::pick_next_task()
Cc: "Peter Zijlstra (Intel)", Vincent Guittot, x86@kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20260511120628.057634261@infradead.org>
References: <20260511120628.057634261@infradead.org>
Message-ID: <177996132028.1039918.10780330032704172152.tip-bot2@tip-bot2>
X-Mailing-List: linux-kernel@vger.kernel.org

The following
commit has been merged into the sched/core branch of tip:

Commit-ID:     5ad278dd20bdf59714443894d7b3044471af97d0
Gitweb:        https://git.kernel.org/tip/5ad278dd20bdf59714443894d7b3044471af97d0
Author:        Peter Zijlstra
AuthorDate:    Mon, 11 May 2026 13:31:13 +02:00
Committer:     Peter Zijlstra
CommitterDate: Tue, 26 May 2026 13:53:14 +02:00

sched: Remove sched_class::pick_next_task()

The reason for pick_next_task_fair() is the put/set optimization that
avoids touching the common ancestors. However, it is possible to
implement this in the put_prev_task() and set_next_task() calls as
used in put_prev_set_next_task().

Notably, put_prev_set_next_task() is the only site that:

 - calls put_prev_task() with a .next argument;
 - calls set_next_task() with .first = true.

This means that put_prev_task() can determine the common hierarchy and
stop there, and then set_next_task() can terminate where put_prev_task
stopped.

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Vincent Guittot
Link: https://patch.msgid.link/20260511120628.057634261@infradead.org
---
 kernel/sched/core.c  |  27 ++------
 kernel/sched/fair.c  | 139 ++++++++++++++----------------------------
 kernel/sched/sched.h |  14 +----
 3 files changed, 57 insertions(+), 123 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 83202f0..3c8bfd6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6030,16 +6030,15 @@ __pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	if (likely(!sched_class_above(prev->sched_class, &fair_sched_class) &&
 		   rq->nr_running == rq->cfs.h_nr_queued)) {
 
-		p = pick_next_task_fair(rq, prev, rf);
+		p = pick_task_fair(rq, rf);
 		if (unlikely(p == RETRY_TASK))
 			goto restart;
 
 		/* Assume the next prioritized class is idle_sched_class */
-		if (!p) {
+		if (!p)
 			p = pick_task_idle(rq, rf);
-			put_prev_set_next_task(rq, prev, p);
-		}
 
+		put_prev_set_next_task(rq, prev, p);
 		return p;
 	}
 
@@ -6047,20 +6046,12 @@ restart:
 	prev_balance(rq, prev, rf);
 
 	for_each_active_class(class) {
-		if (class->pick_next_task) {
-			p = class->pick_next_task(rq, prev, rf);
-			if (unlikely(p == RETRY_TASK))
-				goto restart;
-			if (p)
-				return p;
-		} else {
-			p = class->pick_task(rq, rf);
-			if (unlikely(p == RETRY_TASK))
-				goto restart;
-			if (p) {
-				put_prev_set_next_task(rq, prev, p);
-				return p;
-			}
+		p = class->pick_task(rq, rf);
+		if (unlikely(p == RETRY_TASK))
+			goto restart;
+		if (p) {
+			put_prev_set_next_task(rq, prev, p);
+			return p;
 		}
 	}
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5f48af7..62a2dcb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9862,7 +9862,7 @@ preempt:
 	resched_curr_lazy(rq);
 }
 
-static struct task_struct *pick_task_fair(struct rq *rq, struct rq_flags *rf)
+struct task_struct *pick_task_fair(struct rq *rq, struct rq_flags *rf)
 	__must_hold(__rq_lockp(rq))
 {
 	struct sched_entity *se;
@@ -9905,72 +9905,6 @@ idle:
 	return NULL;
 }
 
-static void __set_next_task_fair(struct rq *rq, struct task_struct *p, bool first);
-static void set_next_task_fair(struct rq *rq, struct task_struct *p, bool first);
-
-struct task_struct *
-pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
-	__must_hold(__rq_lockp(rq))
-{
-	struct sched_entity *se;
-	struct task_struct *p;
-
-	p = pick_task_fair(rq, rf);
-	if (unlikely(p == RETRY_TASK))
-		return p;
-	if (!p)
-		return p;
-	se = &p->se;
-
-#ifdef CONFIG_FAIR_GROUP_SCHED
-	if (prev->sched_class != &fair_sched_class)
-		goto simple;
-
-	__put_prev_set_next_dl_server(rq, prev, p);
-
-	/*
-	 * Because of the set_next_buddy() in dequeue_task_fair() it is rather
-	 * likely that a next task is from the same cgroup as the current.
-	 *
-	 * Therefore attempt to avoid putting and setting the entire cgroup
-	 * hierarchy, only change the part that actually changes.
-	 *
-	 * Since we haven't yet done put_prev_entity and if the selected task
-	 * is a different task than we started out with, try and touch the
-	 * least amount of cfs_rqs.
-	 */
-	if (prev != p) {
-		struct sched_entity *pse = &prev->se;
-		struct cfs_rq *cfs_rq;
-
-		while (!(cfs_rq = is_same_group(se, pse))) {
-			int se_depth = se->depth;
-			int pse_depth = pse->depth;
-
-			if (se_depth <= pse_depth) {
-				put_prev_entity(cfs_rq_of(pse), pse);
-				pse = parent_entity(pse);
-			}
-			if (se_depth >= pse_depth) {
-				set_next_entity(cfs_rq_of(se), se, true);
-				se = parent_entity(se);
-			}
-		}
-
-		put_prev_entity(cfs_rq, pse);
-		set_next_entity(cfs_rq, se, true);
-
-		__set_next_task_fair(rq, p, true);
-	}
-
-	return p;
-
-simple:
-#endif /* CONFIG_FAIR_GROUP_SCHED */
-	put_prev_set_next_task(rq, prev, p);
-	return p;
-}
-
 static struct task_struct *
 fair_server_pick_task(struct sched_dl_entity *dl_se, struct rq_flags *rf)
 	__must_hold(__rq_lockp(dl_se->rq))
@@ -9994,10 +9928,33 @@ static void put_prev_task_fair(struct rq *rq, struct task_struct *prev, struct t
 {
 	struct sched_entity *se = &prev->se;
 	struct cfs_rq *cfs_rq;
+	struct sched_entity *nse = NULL;
 
-	for_each_sched_entity(se) {
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	if (next && next->sched_class == &fair_sched_class)
+		nse = &next->se;
+#endif
+
+	while (se) {
 		cfs_rq = cfs_rq_of(se);
-		put_prev_entity(cfs_rq, se);
+		if (!nse || cfs_rq->curr)
+			put_prev_entity(cfs_rq, se);
+#ifdef CONFIG_FAIR_GROUP_SCHED
+		if (nse) {
+			if (is_same_group(se, nse))
+				break;
+
+			int d = nse->depth - se->depth;
+			if (d >= 0) {
+				/* nse has equal or greater depth, ascend */
+				nse = parent_entity(nse);
+				/* if nse is the deeper, do not ascend se */
+				if (d > 0)
+					continue;
+			}
+		}
+#endif
+		se = parent_entity(se);
 	}
 }
 
@@ -15021,10 +14978,30 @@ static void switched_to_fair(struct rq *rq, struct task_struct *p)
 	}
 }
 
-static void __set_next_task_fair(struct rq *rq, struct task_struct *p, bool first)
+/*
+ * Account for a task changing its policy or group.
+ *
+ * This routine is mostly called to set cfs_rq->curr field when a task
+ * migrates between groups/classes.
+ */
+static void set_next_task_fair(struct rq *rq, struct task_struct *p, bool first)
 {
 	struct sched_entity *se = &p->se;
 
+	for_each_sched_entity(se) {
+		struct cfs_rq *cfs_rq = cfs_rq_of(se);
+
+		if (IS_ENABLED(CONFIG_FAIR_GROUP_SCHED) &&
+		    first && cfs_rq->curr)
+			break;
+
+		set_next_entity(cfs_rq, se, first);
+		/* ensure bandwidth has been allocated on our new cfs_rq */
+		account_cfs_rq_runtime(cfs_rq, 0);
+	}
+
+	se = &p->se;
+
 	if (task_on_rq_queued(p)) {
 		/*
 		 * Move the next running task to the front of the list, so our
@@ -15044,27 +15021,6 @@ static void __set_next_task_fair(struct rq *rq, struct task_struct *p, bool firs
 	sched_fair_update_stop_tick(rq, p);
 }
 
-/*
- * Account for a task changing its policy or group.
- *
- * This routine is mostly called to set cfs_rq->curr field when a task
- * migrates between groups/classes.
- */
-static void set_next_task_fair(struct rq *rq, struct task_struct *p, bool first)
-{
-	struct sched_entity *se = &p->se;
-
-	for_each_sched_entity(se) {
-		struct cfs_rq *cfs_rq = cfs_rq_of(se);
-
-		set_next_entity(cfs_rq, se, first);
-		/* ensure bandwidth has been allocated on our new cfs_rq */
-		account_cfs_rq_runtime(cfs_rq, 0);
-	}
-
-	__set_next_task_fair(rq, p, first);
-}
-
 void init_cfs_rq(struct cfs_rq *cfs_rq)
 {
 	cfs_rq->tasks_timeline = RB_ROOT_CACHED;
@@ -15376,7 +15332,6 @@ DEFINE_SCHED_CLASS(fair) = {
 	.wakeup_preempt		= wakeup_preempt_fair,
 
 	.pick_task		= pick_task_fair,
-	.pick_next_task		= pick_next_task_fair,
 	.put_prev_task		= put_prev_task_fair,
 	.set_next_task		= set_next_task_fair,
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 8eb8f83..6b48bb3 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2589,17 +2589,6 @@ struct sched_class {
 	 * schedule/pick_next_task: rq->lock
 	 */
 	struct task_struct *(*pick_task)(struct rq *rq, struct rq_flags *rf);
-	/*
-	 * Optional! When implemented pick_next_task() should be equivalent to:
-	 *
-	 *   next = pick_task();
-	 *   if (next) {
-	 *       put_prev_task(prev);
-	 *       set_next_task_first(next);
-	 *   }
-	 */
-	struct task_struct *(*pick_next_task)(struct rq *rq, struct task_struct *prev,
-					      struct rq_flags *rf);
 
 	/*
 	 * sched_change:
@@ -2823,8 +2812,7 @@ static inline bool sched_fair_runnable(struct rq *rq)
 	return rq->cfs.nr_queued > 0;
 }
 
-extern struct task_struct *pick_next_task_fair(struct rq *rq, struct task_struct *prev,
-					       struct rq_flags *rf);
+extern struct task_struct *pick_task_fair(struct rq *rq, struct rq_flags *rf);
 extern struct task_struct *pick_task_idle(struct rq *rq, struct rq_flags *rf);
 
 #define SCA_CHECK		0x01
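For readers following the changelog above, here is a rough user-space sketch of how the put side can stop at the common hierarchy and how the set side can then terminate where the put side stopped. It is not kernel code. struct group, struct entity, put_entity(), set_entity(), switch_entities() and the nr_put/nr_set counters are hypothetical stand-ins invented for this illustration; only the shape of the two walks is taken from the patch, and the sketch assumes every depth-0 entity is queued on the same root group.

```c
#include <assert.h>
#include <stddef.h>

struct entity;

/* Hypothetical stand-in for struct cfs_rq: only tracks the running entity. */
struct group {
	struct entity *curr;
};

/* Hypothetical stand-in for struct sched_entity. */
struct entity {
	struct entity *parent;	/* entity representing our group one level up */
	struct group *grp;	/* group this entity is queued on */
	int depth;		/* 0 for entities queued on the root group */
};

/* Counters so a caller can see how many levels were actually touched. */
static int nr_put, nr_set;

static void put_entity(struct entity *se)
{
	se->grp->curr = NULL;
	nr_put++;
}

static void set_entity(struct entity *se)
{
	se->grp->curr = se;
	nr_set++;
}

/*
 * Walk up from prev (se). With a next entity (nse) in hand, stop at the
 * first group both hierarchies share, leaving common ancestors untouched.
 */
static void put_prev(struct entity *se, struct entity *nse)
{
	while (se) {
		/* Skip a level that an earlier iteration already put. */
		if (!nse || se->grp->curr)
			put_entity(se);
		if (nse) {
			int d;

			if (se->grp == nse->grp)
				break;
			d = nse->depth - se->depth;
			if (d >= 0) {
				/* nse is at least as deep: move it up. */
				nse = nse->parent;
				/* nse was strictly deeper: keep se in place. */
				if (d > 0)
					continue;
			}
		}
		se = se->parent;
	}
}

/* Walk up from next and stop at the first group that still has a curr. */
static void set_next(struct entity *se)
{
	for (; se; se = se->parent) {
		if (se->grp->curr)
			break;
		set_entity(se);
	}
}

/* Switch from prev to next; report the work done via *n_put and *n_set. */
static void switch_entities(struct entity *prev, struct entity *next,
			    int *n_put, int *n_set)
{
	assert(prev && next);
	nr_put = nr_set = 0;
	put_prev(prev, next);
	set_next(next);
	*n_put = nr_put;
	*n_set = nr_set;
}
```

Take a three-level hierarchy: a root group holding P, P's group holding A and B, and tasks under A and B. In this model, switching between two tasks under A touches one level in each direction. Switching from a task under A to a task under B touches two levels and leaves the root group's curr alone, which is the effect the removed pick_next_task_fair() was after.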