From nobody Wed Jun 17 05:10:17 2026 Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3B63E34A79E for ; Wed, 22 Apr 2026 23:07:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776899230; cv=none; b=cBLRiJLd1rAU+lHMHv1njrH6E2EFRrUcCXMpJ44SRjKk1chGycV1zZ6Yj+BOZ9R3x8/DiZtNAPQSsHQ6eE1q1NIyU4g5/aR/N9bSnfLdQnCMYv4Ph/7SYVcz3xSWo8LQmfFr/bsBtBOh2zHxeurR6fFMiMdUVecDQlt0CI+yVnM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776899230; c=relaxed/simple; bh=fGa6TOAYeHeInmFA+Ft5ufCLZZ9luP3dGsCVoOTPo+Y=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=QWs1KNmqWzyK8xJaCo3io1kuqA/7mkaCBRCTgoD/A6oKry8m0QRwRjWYQ+Gq0YU43Jh/TkhRsl54mDjHX7OFL0DmefjshsPwD8iGBVWdJQEwd5y7jLZ7E1hSXWf2E4c6J0QoxTheU0LpC48U86Yb2kdDzf2+9Om1iMAgNDslq+g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=UhDo+jFh; arc=none smtp.client-ip=209.85.216.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="UhDo+jFh" Received: by mail-pj1-f73.google.com with SMTP id 98e67ed59e1d1-358e425c261so7185838a91.3 for ; Wed, 22 Apr 2026 16:07:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20251104; t=1776899227; x=1777504027; 
darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=TJ4qjatYydvK9gZxdeDSyJL/1YC/vMfoiWEL4QJRCcU=; b=UhDo+jFh/2hVFP8Fm+zkYM1BaUsXsqMdkfNz97wtzUjKvAb66yQOaOoX6AUCCMW38z p/KFZs10hQ0PWMQx5T15tEKLEeynIo3JaH0ekW56ZvQBIta1WjNPt7uNF4Rt62slmrxB X5d5lq2iMOSBmRR5Zhu8TxZatbSeB9SUEwEEB5KwAhSvZf+BXl8qQacScK4Je5DdqNsA jim18mqm8gLiATfz3B0zWzYZ7Iu7UkEHDRFmSwmOJXXFt91RPN0sXqN47q4yl2Nx3IUp jI5hrgRLV4H+BYIpgrmRmMcgwxR8xD1IGBDQuI0teJfl+4gkoQCYmV2IPbGGVz8QxI0H F8Gw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1776899227; x=1777504027; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=TJ4qjatYydvK9gZxdeDSyJL/1YC/vMfoiWEL4QJRCcU=; b=Q7CjXT0pkCWrxkvJkeKPKjuEa3nN6imwnXAaq6uf5YVVfxQ8i/J4cPnUgXbv05C0Fo 6hLC8p93bWCH0wxvVYwMUKIYK7vsVRqcrMKQLkRGPVYeS+rgqde57Jq0jhLPmslNPsoS MbCOQdda6ZPtGvP/fJzzqJ84Uw6s+erurgi+RZJXW7sj6HGFcbr8lUjMKZ76u2nU8bmL ubDq04qsB7hTcedMYofAZ2BFqFgqBziJysRfwMISEvEUJtAtyeLys9cMXnus62oLIUco RoFTLX4U1ox+lSPqzYZyzDV+97I0otOTjzKXReZ9BnSDIGIUJjnscIsPaRVfmPnNHY8V SLvA== X-Gm-Message-State: AOJu0Ywz/zm/YlZRmTtMHGH3nv+e+DC8O1thstk39zxKRkmhZD12t0SZ smJ5YMvuc5susGH5gStdRkOpiuqtKsecMtzbS+mNnvdh6lT+Oamq3Ak1jTL2h8yTfrfGCCGaDK+ xaaiV10MbQUCpuTpseHXSHbow1CyKvAY3VB503iPiO4bu+QfvLCdtijDfgdEDRZYSCu13buoaBn 1fag6ccNAHQFPnaH9YOde7DrGRhM3DumFwYJRfTEMkZGEQeC4q X-Received: from pjjd7.prod.google.com ([2002:a17:90a:6287:b0:35c:dbb:e44a]) (user=jstultz job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:3811:b0:35f:c729:de9f with SMTP id 98e67ed59e1d1-361404c2053mr25011836a91.27.1776899227147; Wed, 22 Apr 2026 16:07:07 -0700 (PDT) Date: Wed, 22 Apr 2026 23:06:42 +0000 In-Reply-To: <20260422230659.903191-1-jstultz@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: 
<20260422230659.903191-1-jstultz@google.com> X-Mailer: git-send-email 2.54.0.rc2.533.g4f5dca5207-goog Message-ID: <20260422230659.903191-2-jstultz@google.com> Subject: [PATCH v28 1/8] sched: Rework pick_next_task() and prev_balance() to avoid stale prev references From: John Stultz To: LKML Cc: John Stultz , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , Suleiman Souhlal , kuyo chang , hupu , kernel-team@android.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

Historically, the prev value from __schedule() was rq->curr. This prev value is passed down through numerous functions and used in the scheduler class implementations. Because prev was on_cpu until the end of __schedule(), it was stable across the rq lock drops that the class->pick_next_task() and ->balance() implementations often do.

However, with proxy-exec, the prev passed to functions called by __schedule() is rq->donor, which may not be the same as rq->curr and may not be on_cpu. This makes the prev value potentially unstable across rq lock drops.

A recently found issue with proxy-exec is that when we begin doing return migration from try_to_wake_up(), it's possible we may be waking up the rq->donor. When we do this, we call proxy_resched_idle() to put_prev_set_next(), setting the rq->donor to rq->idle, allowing the rq->donor to be return migrated and allowed to run.

This however runs into trouble, as on another cpu we might be in the middle of calling __schedule(). Conceptually the rq lock is held for the majority of the time, but in calling pick_next_task() it's possible the class->pick_next_task() handler or the ->balance() call may briefly drop the rq lock.
This opens a window for try_to_wake_up() to wake and return migrate the rq->donor before the class logic reacquires the rq lock.

Unfortunately pick_next_task() and prev_balance() pass in a prev argument, to which we pass rq->donor. However this prev value can now become stale and incorrect across a rq lock drop.

So, to correct this, rework the pick_next_task() and prev_balance() calls so that they do not take a "prev" argument.

Also rework the class ->pick_next_task() and ->balance() implementations to drop the prev argument and, in the cases where it was used, have the class functions reference rq->donor directly, without saving the value across rq lock drops, so that we don't end up with stale references.

Signed-off-by: John Stultz --- Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. 
McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: Suleiman Souhlal Cc: kuyo chang Cc: hupu Cc: kernel-team@android.com --- kernel/sched/core.c | 37 ++++++++++++++++++------------------- kernel/sched/deadline.c | 8 +++++++- kernel/sched/fair.c | 9 +++++++-- kernel/sched/idle.c | 2 +- kernel/sched/rt.c | 8 +++++++- kernel/sched/sched.h | 10 ++++------ kernel/sched/stop_task.c | 2 +- 7 files changed, 45 insertions(+), 31 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index da20fb6ea25ae..3ac6dd4d3c587 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5971,10 +5971,9 @@ static inline void schedule_debug(struct task_struct= *prev, bool preempt) schedstat_inc(this_rq()->sched_count); } =20 -static void prev_balance(struct rq *rq, struct task_struct *prev, - struct rq_flags *rf) +static void prev_balance(struct rq *rq, struct rq_flags *rf) { - const struct sched_class *start_class =3D prev->sched_class; + const struct sched_class *start_class =3D rq->donor->sched_class; const struct sched_class *class; =20 /* @@ -5986,7 +5985,7 @@ static void prev_balance(struct rq *rq, struct task_s= truct *prev, * a runnable task of @class priority or higher. */ for_active_class_range(class, start_class, &idle_sched_class) { - if (class->balance && class->balance(rq, prev, rf)) + if (class->balance && class->balance(rq, rf)) break; } } @@ -5995,7 +5994,7 @@ static void prev_balance(struct rq *rq, struct task_s= truct *prev, * Pick up the highest-prio task: */ static inline struct task_struct * -__pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags = *rf) +__pick_next_task(struct rq *rq, struct rq_flags *rf) __must_hold(__rq_lockp(rq)) { const struct sched_class *class; @@ -6012,28 +6011,28 @@ __pick_next_task(struct rq *rq, struct task_struct = *prev, struct rq_flags *rf) * higher scheduling class, because otherwise those lose the * opportunity to pull in more work from other CPUs. 
*/ - if (likely(!sched_class_above(prev->sched_class, &fair_sched_class) && + if (likely(!sched_class_above(rq->donor->sched_class, &fair_sched_class) = && rq->nr_running =3D=3D rq->cfs.h_nr_queued)) { =20 - p =3D pick_next_task_fair(rq, prev, rf); + p =3D pick_next_task_fair(rq, rf); if (unlikely(p =3D=3D RETRY_TASK)) goto restart; =20 /* Assume the next prioritized class is idle_sched_class */ if (!p) { p =3D pick_task_idle(rq, rf); - put_prev_set_next_task(rq, prev, p); + put_prev_set_next_task(rq, rq->donor, p); } =20 return p; } =20 restart: - prev_balance(rq, prev, rf); + prev_balance(rq, rf); =20 for_each_active_class(class) { if (class->pick_next_task) { - p =3D class->pick_next_task(rq, prev, rf); + p =3D class->pick_next_task(rq, rf); if (unlikely(p =3D=3D RETRY_TASK)) goto restart; if (p) @@ -6043,7 +6042,7 @@ __pick_next_task(struct rq *rq, struct task_struct *p= rev, struct rq_flags *rf) if (unlikely(p =3D=3D RETRY_TASK)) goto restart; if (p) { - put_prev_set_next_task(rq, prev, p); + put_prev_set_next_task(rq, rq->donor, p); return p; } } @@ -6096,7 +6095,7 @@ extern void task_vruntime_update(struct rq *rq, struc= t task_struct *p, bool in_f static void queue_core_balance(struct rq *rq); =20 static struct task_struct * -pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *r= f) +pick_next_task(struct rq *rq, struct rq_flags *rf) __must_hold(__rq_lockp(rq)) { struct task_struct *next, *p, *max; @@ -6109,7 +6108,7 @@ pick_next_task(struct rq *rq, struct task_struct *pre= v, struct rq_flags *rf) bool need_sync; =20 if (!sched_core_enabled(rq)) - return __pick_next_task(rq, prev, rf); + return __pick_next_task(rq, rf); =20 cpu =3D cpu_of(rq); =20 @@ -6122,7 +6121,7 @@ pick_next_task(struct rq *rq, struct task_struct *pre= v, struct rq_flags *rf) */ rq->core_pick =3D NULL; rq->core_dl_server =3D NULL; - return __pick_next_task(rq, prev, rf); + return __pick_next_task(rq, rf); } =20 /* @@ -6146,7 +6145,7 @@ pick_next_task(struct rq *rq, 
struct task_struct *pre= v, struct rq_flags *rf) goto out_set_next; } =20 - prev_balance(rq, prev, rf); + prev_balance(rq, rf); =20 smt_mask =3D cpu_smt_mask(cpu); need_sync =3D !!rq->core->core_cookie; @@ -6328,7 +6327,7 @@ pick_next_task(struct rq *rq, struct task_struct *pre= v, struct rq_flags *rf) } =20 out_set_next: - put_prev_set_next_task(rq, prev, next); + put_prev_set_next_task(rq, rq->donor, next); if (rq->core->core_forceidle_count && next =3D=3D rq->idle) queue_core_balance(rq); =20 @@ -6551,10 +6550,10 @@ static inline void sched_core_cpu_deactivate(unsign= ed int cpu) {} static inline void sched_core_cpu_dying(unsigned int cpu) {} =20 static struct task_struct * -pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *r= f) +pick_next_task(struct rq *rq, struct rq_flags *rf) __must_hold(__rq_lockp(rq)) { - return __pick_next_task(rq, prev, rf); + return __pick_next_task(rq, rf); } =20 #endif /* !CONFIG_SCHED_CORE */ @@ -7101,7 +7100,7 @@ static void __sched notrace __schedule(int sched_mode) =20 pick_again: assert_balance_callbacks_empty(rq); - next =3D pick_next_task(rq, rq->donor, &rf); + next =3D pick_next_task(rq, &rf); rq->next_class =3D next->sched_class; if (sched_proxy_exec()) { struct task_struct *prev_donor =3D rq->donor; diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index edca7849b165d..f07a888314450 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -2506,8 +2506,14 @@ static void check_preempt_equal_dl(struct rq *rq, st= ruct task_struct *p) resched_curr(rq); } =20 -static int balance_dl(struct rq *rq, struct task_struct *p, struct rq_flag= s *rf) +static int balance_dl(struct rq *rq, struct rq_flags *rf) { + /* + * Note, rq->donor may change during rq lock drops, + * so don't re-use p across lock drops + */ + struct task_struct *p =3D rq->donor; + if (!on_dl_rq(&p->dl) && need_pull_dl_task(rq, p)) { /* * This is OK, because current is on_cpu, which avoids it being diff --git 
a/kernel/sched/fair.c b/kernel/sched/fair.c index 69361c63353ad..b843f9a876d6d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -9195,14 +9195,19 @@ static void __set_next_task_fair(struct rq *rq, str= uct task_struct *p, bool firs static void set_next_task_fair(struct rq *rq, struct task_struct *p, bool = first); =20 struct task_struct * -pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_fla= gs *rf) +pick_next_task_fair(struct rq *rq, struct rq_flags *rf) __must_hold(__rq_lockp(rq)) { struct sched_entity *se; - struct task_struct *p; + struct task_struct *p, *prev; int new_tasks; =20 again: + /* + * Re-read rq->donor at the top as it may have + * changed across a rq lock drop + */ + prev =3D rq->donor; p =3D pick_task_fair(rq, rf); if (!p) goto idle; diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c index a83be0c834ddb..ff39120d723a9 100644 --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -462,7 +462,7 @@ select_task_rq_idle(struct task_struct *p, int cpu, int= flags) } =20 static int -balance_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) +balance_idle(struct rq *rq, struct rq_flags *rf) { return WARN_ON_ONCE(1); } diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 4ee8faf01441a..3c5f37c858b60 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -1596,8 +1596,14 @@ static void check_preempt_equal_prio(struct rq *rq, = struct task_struct *p) resched_curr(rq); } =20 -static int balance_rt(struct rq *rq, struct task_struct *p, struct rq_flag= s *rf) +static int balance_rt(struct rq *rq, struct rq_flags *rf) { + /* + * Note, rq->donor may change during rq lock drops, + * so don't re-use p across lock drops + */ + struct task_struct *p =3D rq->donor; + if (!on_rt_rq(&p->rt) && need_pull_rt_task(rq, p)) { /* * This is OK, because current is on_cpu, which avoids it being diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 9f63b15d309d1..2b3a97735efeb 100644 --- a/kernel/sched/sched.h 
+++ b/kernel/sched/sched.h @@ -2561,7 +2561,7 @@ struct sched_class { /* * schedule/pick_next_task/prev_balance: rq->lock */ - int (*balance)(struct rq *rq, struct task_struct *prev, struct rq_flags *= rf); + int (*balance)(struct rq *rq, struct rq_flags *rf); =20 /* * schedule/pick_next_task: rq->lock @@ -2572,12 +2572,11 @@ struct sched_class { * * next =3D pick_task(); * if (next) { - * put_prev_task(prev); + * put_prev_task(rq->donor); * set_next_task_first(next); * } */ - struct task_struct *(*pick_next_task)(struct rq *rq, struct task_struct *= prev, - struct rq_flags *rf); + struct task_struct *(*pick_next_task)(struct rq *rq, struct rq_flags *rf); =20 /* * sched_change: @@ -2801,8 +2800,7 @@ static inline bool sched_fair_runnable(struct rq *rq) return rq->cfs.nr_queued > 0; } =20 -extern struct task_struct *pick_next_task_fair(struct rq *rq, struct task_= struct *prev, - struct rq_flags *rf); +extern struct task_struct *pick_next_task_fair(struct rq *rq, struct rq_fl= ags *rf); extern struct task_struct *pick_task_idle(struct rq *rq, struct rq_flags *= rf); =20 #define SCA_CHECK 0x01 diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c index f95798baddebb..c909ca0d8c87c 100644 --- a/kernel/sched/stop_task.c +++ b/kernel/sched/stop_task.c @@ -16,7 +16,7 @@ select_task_rq_stop(struct task_struct *p, int cpu, int f= lags) } =20 static int -balance_stop(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) +balance_stop(struct rq *rq, struct rq_flags *rf) { return sched_stop_runnable(rq); } --=20 2.54.0.rc2.533.g4f5dca5207-goog From nobody Wed Jun 17 05:10:17 2026 Received: from mail-pl1-f202.google.com (mail-pl1-f202.google.com [209.85.214.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0CF4039F172 for ; Wed, 22 Apr 2026 23:07:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none 
smtp.client-ip=209.85.214.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776899231; cv=none; b=ulTQ6ErXJkZNZVY94Zx4l53xDwQgWi6ripa7c94izkKmd2JnRe14ghCRr7E9VfwGQg+2DmC06uT2tTyVZF+7KNHVwwt79eohhSZxYNudv5lwyMSyY3Jd7JGxbzlDelp/m4Pe7gfLNaaWPrJuShCFtEsEJigZwWChUltxFHfNGOA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776899231; c=relaxed/simple; bh=YRW/EPe5+CC0yj8CG1OH6gDifbLU+FWNeYZ8EzC0ipY=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=obG5faUTtp9298HTkczEqkBW624JfLPB1WYT7yjkFBN4DdKd9iElpoE+yIZ7RBbAAgkfyVZsaaJK0s7l7k6SA0yRVNCL3UBzytCWSdfW095ZGr4I0go6QY+0CXD/K/IT3WMzTxo+HcNLgrkL6CtIway6q4xJ+daQ9aO6cGMtuW8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=exBnH6R6; arc=none smtp.client-ip=209.85.214.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="exBnH6R6" Received: by mail-pl1-f202.google.com with SMTP id d9443c01a7336-2b2e06219cbso76805805ad.3 for ; Wed, 22 Apr 2026 16:07:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20251104; t=1776899229; x=1777504029; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=XNUOzC8w0maPp8alwRX3ZW5+YpT5DiPfDrAYQzTzt3U=; b=exBnH6R6zYLUh/o0Al1zvgUymKuN4x3+WXNuBhVcgsU622jP4PkRT9DQBwWV0LwGK7 zsOJaebelzUhtvelf/PGiN6QNt2TKnXHjrWc+Sqae4mq8sKt6n6fu5glBuRFo78PGo8h 
HQiPHVygQ8nIpxby1JWuLsIBFxxCsr6m18p2ROC/2Vl+RfhKKgLi+bzpdKrIxLRPmAM6 d5Rwcbaon9Xfe9cxoStOGtvxRnaIH61lIQhf1zf8EICTQdlbok+0zQ7OB9DQ5teM0uEM BL8WI+oVKb3NmJltSeKBjv2xYHGOfJpHC0b2ZQL2subYodfQ0aOOExKTko+n04tSd7nZ 1K4Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1776899229; x=1777504029; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=XNUOzC8w0maPp8alwRX3ZW5+YpT5DiPfDrAYQzTzt3U=; b=OQfRLUhTx95NjChArrMccyjMJo+NspoJPeg/pOGH1/MTywzTiOm9zzG0art44bsGi4 Np6CwsucqzIq+UeTC8IVRdURrgljkU6N5Ej29PVF/lRW8f4X0Qm6bILYjPNKNsD3Yr00 ZpDdzOAR673p/WnP4vAZPNmE7kgZAs9+Gliu/pYD9CgUNhhjoQ4SLIW2XuqYWkX3L0vi fscBrLArLOgxFD6O5/ky2RAgYWBfs/55GWfaJDezV8+FlSINYvTUm2xcvSykDv1vl8T0 OLuluRHsHz8loe2Qq3XUojLXDVNhhrR8kYNVAgFEIUzzei6OoWTqt/BULxWVh4dKNhGy rrmA== X-Gm-Message-State: AOJu0Yw+ykZ7ZyQbeWoDTFSztTgfIW/An78UtaouPsEbY7BEbJvH2P/z JF4tdcmV1infjE9Wo8MDPG2dFXZ3/Sv8jcIEs84I6Fd4wqOqsD42xBavCvZCmh+mOwyMZtsuTV8 lBOXPYS7ogZQ5ZCQm0OgNt4SuShA5Q0V8pA23Q46rZxqz6zLJ9+4KzrI4VjKyVyApIJByrnhlve EkptI1ZsSlEDcivpMVHwCOV3dcv+1RmIHIyf7Ng6aul1kE2k1g X-Received: from plbmh12.prod.google.com ([2002:a17:903:9cc:b0:2b2:98eb:2605]) (user=jstultz job=prod-delivery.src-stubby-dispatcher) by 2002:a17:903:1450:b0:2b7:a350:463f with SMTP id d9443c01a7336-2b7a3505063mr5185105ad.10.1776899228698; Wed, 22 Apr 2026 16:07:08 -0700 (PDT) Date: Wed, 22 Apr 2026 23:06:43 +0000 In-Reply-To: <20260422230659.903191-1-jstultz@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260422230659.903191-1-jstultz@google.com> X-Mailer: git-send-email 2.54.0.rc2.533.g4f5dca5207-goog Message-ID: <20260422230659.903191-3-jstultz@google.com> Subject: [PATCH v28 2/8] sched: deadline: Add some helper variables to cleanup deadline logic From: John Stultz To: LKML Cc: John Stultz , Peter Zijlstra , Joel Fernandes , Qais 
Yousef , Ingo Molnar , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , Suleiman Souhlal , kuyo chang , hupu , kernel-team@android.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

As part of an improvement to the handling of pushable deadline tasks, Peter suggested this cleanup[1]: use helper variables for the sched_dl_entity and dl_rq in the enqueue_task_dl() and put_prev_task_dl() functions.

There should be no functional change from this patch.

To make sure this cleanup doesn't obscure later logic changes, I've split it into its own patch.

[1]: https://lore.kernel.org/lkml/20260304095123.GP606826@noisy.programming.kicks-ass.net/

Suggested-by: Peter Zijlstra Signed-off-by: John Stultz --- Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. 
McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: Suleiman Souhlal Cc: kuyo chang Cc: hupu Cc: kernel-team@android.com --- kernel/sched/deadline.c | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index f07a888314450..4ff3e164d9880 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -2292,7 +2292,10 @@ static void dequeue_dl_entity(struct sched_dl_entity= *dl_se, int flags) =20 static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flag= s) { - if (is_dl_boosted(&p->dl)) { + struct sched_dl_entity *dl_se =3D &p->dl; + struct dl_rq *dl_rq =3D &rq->dl; + + if (is_dl_boosted(dl_se)) { /* * Because of delays in the detection of the overrun of a * thread's runtime, it might be the case that a thread @@ -2305,14 +2308,14 @@ static void enqueue_task_dl(struct rq *rq, struct t= ask_struct *p, int flags) * * In this case, the boost overrides the throttle. */ - if (p->dl.dl_throttled) { + if (dl_se->dl_throttled) { /* * The replenish timer needs to be canceled. No * problem if it fires concurrently: boosted threads * are ignored in dl_task_timer(). */ - cancel_replenish_timer(&p->dl); - p->dl.dl_throttled =3D 0; + cancel_replenish_timer(dl_se); + dl_se->dl_throttled =3D 0; } } else if (!dl_prio(p->normal_prio)) { /* @@ -2324,7 +2327,7 @@ static void enqueue_task_dl(struct rq *rq, struct tas= k_struct *p, int flags) * being boosted again with no means to replenish the runtime and clear * the throttle. 
*/ - p->dl.dl_throttled =3D 0; + dl_se->dl_throttled =3D 0; if (!(flags & ENQUEUE_REPLENISH)) printk_deferred_once("sched: DL de-boosted task PID %d: REPLENISH flag = missing\n", task_pid_nr(p)); @@ -2333,14 +2336,14 @@ static void enqueue_task_dl(struct rq *rq, struct t= ask_struct *p, int flags) } =20 check_schedstat_required(); - update_stats_wait_start_dl(dl_rq_of_se(&p->dl), &p->dl); + update_stats_wait_start_dl(dl_rq, dl_se); =20 if (p->on_rq =3D=3D TASK_ON_RQ_MIGRATING) flags |=3D ENQUEUE_MIGRATING; =20 - enqueue_dl_entity(&p->dl, flags); + enqueue_dl_entity(dl_se, flags); =20 - if (dl_server(&p->dl)) + if (dl_server(dl_se)) return; =20 if (task_is_blocked(p)) @@ -2643,7 +2646,7 @@ static void put_prev_task_dl(struct rq *rq, struct ta= sk_struct *p, struct task_s struct sched_dl_entity *dl_se =3D &p->dl; struct dl_rq *dl_rq =3D &rq->dl; =20 - if (on_dl_rq(&p->dl)) + if (on_dl_rq(dl_se)) update_stats_wait_start_dl(dl_rq, dl_se); =20 update_curr_dl(rq); @@ -2653,7 +2656,7 @@ static void put_prev_task_dl(struct rq *rq, struct ta= sk_struct *p, struct task_s if (task_is_blocked(p)) return; =20 - if (on_dl_rq(&p->dl) && p->nr_cpus_allowed > 1) + if (on_dl_rq(dl_se) && p->nr_cpus_allowed > 1) enqueue_pushable_dl_task(rq, p); } =20 --=20 2.54.0.rc2.533.g4f5dca5207-goog From nobody Wed Jun 17 05:10:17 2026 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 928E539C005 for ; Wed, 22 Apr 2026 23:07:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776899233; cv=none; b=j5Ddc2+vykR6m7Cqg/SYa+yJOuLikKT7xeAm+0v6q2U6hbS8p+d34WlNJbifZ2CaQcVC/ZiuKimDS5OZ60k60rL26HujJJt8v8daBQAIVVecoVes+y/cDhLop11/DzNcGPIpSU97yVgO5C9fkt5JOFQbpWZX+CpevkCeTIPF2EM= ARC-Message-Signature: 
i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776899233; c=relaxed/simple; bh=ycivuRo1T5jW2YQ9L0yq7POovtAqlpPgLPZzt7RQY+E=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=KTSHKzQ9RX/Rrhmtis8EvaUAF2LVONWzlvV+f6BYc2yjm38lHItyU+LjUJfTZU9/PvTAHUoGfJL2p2dcn3tlRMGNdp6cajJLBXfgf/SN5ai6tOFmUZUG0P1A2rOdoOMfhdYn0OZV3n7y9Y3b1e3ZYnoKAmWJrSz4FldnX6ZYHVY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=He751+qH; arc=none smtp.client-ip=209.85.216.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="He751+qH" Received: by mail-pj1-f74.google.com with SMTP id 98e67ed59e1d1-354c0234c1fso6510607a91.2 for ; Wed, 22 Apr 2026 16:07:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20251104; t=1776899231; x=1777504031; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=lAboLaNdfEg5ZaKcXWXhV1PkcwhTcJXf/EXqaZs21hM=; b=He751+qHa7fBiNcHMQNGq7I3yDYXyEX7meEH6wSyowH2T41Wc2/RukSBHCWlaaLbnQ bD4pD33oJeFoi/0IC8trZKeKvtD7a1U30mVa57Bt9OjpiUA8Kt0RQQVr2Uacudr5qfWy PAK+pL/NGWgg3+3eo5Kr4mIraJqPZk3ES0+G2jcL9As6JaCBLxVKR75xSNqhlM5aVfIh B2/K0iH+MuMr7W5lsmRIOR+x8IYSXcD72kRHN1pxAZMoBHD/45x/L5IPDxAyf4i8SbtN tDJsj4O7qwmzXAOav+9tbcXJpTpL/U4O9E0ds2/ON+79DbVglWbMitESsYArdzedGsMY DnuQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1776899231; x=1777504031; 
h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=lAboLaNdfEg5ZaKcXWXhV1PkcwhTcJXf/EXqaZs21hM=; b=cMcVpY4akp5oa1PTqQ4ThHjEJeqXu+lYdjXrVjwpPYZRNLwBKXWlfZyUf2DB5hgSuH Shq8SPf16H5FJs+NfGdlX+gnOqIPt7ox/BdaT0IVb+GDMIVetUT39xdfyIVcexVbJteP ZaC/XFXnsIKTMhu/XH013qbF+Pr4jwIwEoIe+XpN6OS21wE3fgJL572v8mIxCYkqF1Kf XVbNkh2LrPO01i1zuqpCgGi9J4kjJ+OB4b0SSch4YyD5K5wSbaCc0aD0MPba26J2bsYz XYxMIaRaNZSg7bpu/9n8VqdGcc1k/o7gClubmjqm1soH9YzslHAGy7Afyi4SioehZyL8 YmwQ== X-Gm-Message-State: AOJu0YxjK0fKNuqqgfcsNlj0kxQYI1zTHL0mys9m7kD/XG9ltllIQ6Dz 0kh5KfziErnTTZFSlEH99Ljq6AzBxR+fojIzew9HGTo6Z5WjWJbfrIsayE4a/cAFb+CRKEq9VML fpmCacCbkaYRCkqbfOtop5Pch7sWyzV4gHWU97ujNkcXslY/TQofN2NOd7odbjoRSgncWg/HDSk AHCz8FEqSZyXLQLfbK0s1BWD3brcoOt1S/ohjTyaXcCrZctaHA X-Received: from pjvc17.prod.google.com ([2002:a17:90a:d911:b0:35d:9ed4:368c]) (user=jstultz job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90a:e704:b0:35d:9f7c:142c with SMTP id 98e67ed59e1d1-361404a7d14mr23099475a91.26.1776899230549; Wed, 22 Apr 2026 16:07:10 -0700 (PDT) Date: Wed, 22 Apr 2026 23:06:44 +0000 In-Reply-To: <20260422230659.903191-1-jstultz@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260422230659.903191-1-jstultz@google.com> X-Mailer: git-send-email 2.54.0.rc2.533.g4f5dca5207-goog Message-ID: <20260422230659.903191-4-jstultz@google.com> Subject: [PATCH v28 3/8] sched: deadline: Add dl_rq->curr pointer to address issues with Proxy Exec From: John Stultz To: LKML Cc: John Stultz , Peter Zijlstra , Joel Fernandes , Qais Yousef , Ingo Molnar , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Will Deacon , Waiman Long , Boqun Feng , "Paul E. 
McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , Suleiman Souhlal , kuyo chang , hupu , kernel-team@android.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

The DL scheduler keeps the current task in the rbtree, since the deadline value isn't usually changed while the task is runnable. This results in set_next_task() and put_prev_task() being simpler, but unfortunately this causes complexity elsewhere. Specifically, when update_curr_dl() updates the deadline, it has to dequeue and then enqueue the task.

From put_prev_task_dl(), we first call update_curr_dl(), and then call enqueue_pushable_dl_task().

However, with Proxy Exec this goes awry. When a mutex is released, we might wake the waiting rq->donor. This will cause put_prev_task() to be called on the donor to take it off the cpu for return migration. At that point, from put_prev_task_dl() the update_curr_dl() logic will dequeue & enqueue the task, and the enqueue function will call enqueue_pushable_dl_task() (since the task_current() check won't prevent it). Then back up the callstack in put_prev_task_dl() we'll end up calling enqueue_pushable_dl_task() again, tripping the !RB_EMPTY_NODE(&p->pushable_dl_tasks) warning.

So to avoid this, use Peter's suggested[1] approach, and add a dl_rq->curr pointer that is set/cleared from set_next_task()/put_prev_task(), which effectively tracks the rq->donor. We can then use this to avoid adding the active donor to the pushable list from enqueue_task_dl().

[1]: https://lore.kernel.org/lkml/20260304095123.GP606826@noisy.programming.kicks-ass.net/

Suggested-by: Peter Zijlstra Signed-off-by: John Stultz --- Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. 
McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: Suleiman Souhlal Cc: kuyo chang Cc: hupu Cc: kernel-team@android.com --- kernel/sched/deadline.c | 13 +++++++++++++ kernel/sched/sched.h | 1 + 2 files changed, 14 insertions(+) diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 4ff3e164d9880..df56406b12766 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -2349,6 +2349,9 @@ static void enqueue_task_dl(struct rq *rq, struct tas= k_struct *p, int flags) if (task_is_blocked(p)) return; =20 + if (dl_rq->curr =3D=3D dl_se) + return; + if (!task_current(rq, p) && !p->dl.dl_throttled && p->nr_cpus_allowed > 1) enqueue_pushable_dl_task(rq, p); } @@ -2571,6 +2574,10 @@ static void start_hrtick_dl(struct rq *rq, struct sc= hed_dl_entity *dl_se) } #endif /* !CONFIG_SCHED_HRTICK */ =20 +/* + * DL keeps current in tree, because ->deadline is not typically changed w= hile + * a task is runnable. + */ static void set_next_task_dl(struct rq *rq, struct task_struct *p, bool fi= rst) { struct sched_dl_entity *dl_se =3D &p->dl; @@ -2583,6 +2590,9 @@ static void set_next_task_dl(struct rq *rq, struct ta= sk_struct *p, bool first) /* You can't push away the running task */ dequeue_pushable_dl_task(rq, p); =20 + WARN_ON_ONCE(dl_rq->curr); + dl_rq->curr =3D dl_se; + if (!first) return; =20 @@ -2653,6 +2663,9 @@ static void put_prev_task_dl(struct rq *rq, struct ta= sk_struct *p, struct task_s =20 update_dl_rq_load_avg(rq_clock_pelt(rq), rq, 1); =20 + WARN_ON_ONCE(dl_rq->curr !=3D dl_se); + dl_rq->curr =3D NULL; + if (task_is_blocked(p)) return; =20 diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 2b3a97735efeb..d0f3e164a61d2 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -889,6 +889,7 @@ struct dl_rq { =20 bool overloaded; =20 + struct sched_dl_entity *curr; /* * Tasks on this rq that can be pushed away. 
They are kept in * an rb-tree, ordered by tasks' deadlines, with caching --=20 2.54.0.rc2.533.g4f5dca5207-goog From nobody Wed Jun 17 05:10:17 2026 Received: from mail-pl1-f201.google.com (mail-pl1-f201.google.com [209.85.214.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 40000361657 for ; Wed, 22 Apr 2026 23:07:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776899237; cv=none; b=WTcHdK4v4NTyH0g9eURp2bsSQ2eFcYTpWXAPpkvo23Dv24UGvJ6Oa79tpA/Nzo6fOtO/KUDXrliI5KfvqySzj+p5WkBwVNS/KXPJx72Som8mbdMIxFFJqSoQwwMUc2nvFPtnF8Z5KIzBcNPEaLO/oEYtUPHohl4DG1G55N8/EcU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776899237; c=relaxed/simple; bh=cRCb/LO+Fvqrp+IZrGguXi182Zak1qasNs6VIFWWQ78=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=qL7L+t8lhci2+qM28yUhyav7U6HnlgdcVIXy2d+TW0idwSN8f5reCsgcwxy0n8M7Jlq/k1frN0Rcit5Y4HlSJssD23gOMvtn7SYa6UVFfmrN7p2uyZvoB2ry15eY5IjLBtCDV2DF6l3nddvYnAEuyR2Jfm4dXjiJxcO8ycVj4QY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=WmbwIJhG; arc=none smtp.client-ip=209.85.214.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="WmbwIJhG" Received: by mail-pl1-f201.google.com with SMTP id d9443c01a7336-2aad5fec175so84531075ad.2 for ; Wed, 22 Apr 2026 16:07:13 -0700 
(PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20251104; t=1776899232; x=1777504032; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=XX06oNYxOK+/mKT5tcRc7TAoC7ph1g70pIJ9OefEGrI=; b=WmbwIJhGbErqFipvqkL1imhIC1RFLYphQ3k1eZ62FQL234SI5n6qJLDgshG5hOooc0 KfHBRYJyVOQXS259XyGE6V/q9TGzN7a53slonkKFm/LDx6yrrRvzjcOMxVk9y9QZmjcU rFnFfaj5iYQb1rLjU/Kwjw/r/R+mIphT5fcL1Hi2TJoOVUe29aWDUH0jYNKHhstTO9R0 AJNmexnWTDgA6nITHfz4ajMIkS+Lg+D96ACBAmd9LPGt9YFUUivUvTK5p4rZW7Xl7rVS ZTQ5i9EmHzWEdB9z1oTGaVv6NAlehKodq7JxaeYpRTqVeoPIqs4JgVv6ybr0ioW5QnQ6 dpWw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1776899232; x=1777504032; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=XX06oNYxOK+/mKT5tcRc7TAoC7ph1g70pIJ9OefEGrI=; b=qF+XXOPyfaZACX3Q2gD6DCAFBRmnfig7C8I+ngZ2+c138qtXwwJluNo0PIaEMxRip7 Ccj2QEjcahnOpBgUbIYJGAa2tzMYQ8xxxOiugCWPZslfFeZ+3sUteMwoo7CFbNYq3lqQ uFF/7Vmd/WIlFat7+xZYwJHFXCyflB0IMEaoMeglsNwqp3G8OcTiEdRe+0izRBQl+X1g +nYGCyADlYJQ3J+Otqys8m6zI517izgpEb5XoBcS6mQJU6OECAI7w8Lt6MnP/DINErHd OrUwiMu231qEwvoH7uUVowsfXkJg1ohKC7xXsuN3wLvL25TuYWGTIwTTXxLwXc3KCyst Pbeg== X-Gm-Message-State: AOJu0Yx+1Q/PCwAJ4EVOO0D6I1KmTvS9mazd3r2jH2LhDaZ65JPEMcsq 242jsZ7yWztK+bzg98NnPGyHhSt/XvkR1U2Zt06KXaHa23kQuNp+V6nPw9DMcr+bShk0PoL2w79 zimws3CIDFuQGl/LfN4p8DOBTITg3rzg++ATwXMprgPd7xAWULRPeqJ4djxkn62Bh2swrjKIFYP sFzFutFweFVovmHC48y6ikp1VY4empwzXQUXegsm1DZr0jkgsA X-Received: from plvx5.prod.google.com ([2002:a17:902:9a45:b0:2b2:a704:94d8]) (user=jstultz job=prod-delivery.src-stubby-dispatcher) by 2002:a17:903:32c9:b0:2b0:5923:5194 with SMTP id d9443c01a7336-2b5f9fa70e1mr252694595ad.27.1776899232191; Wed, 22 Apr 2026 16:07:12 -0700 (PDT) Date: Wed, 22 Apr 2026 23:06:45 +0000 In-Reply-To: <20260422230659.903191-1-jstultz@google.com> Precedence: bulk X-Mailing-List: 
linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260422230659.903191-1-jstultz@google.com> X-Mailer: git-send-email 2.54.0.rc2.533.g4f5dca5207-goog Message-ID: <20260422230659.903191-5-jstultz@google.com> Subject: [PATCH v28 4/8] sched: Rework block_task so it can be directly called From: John Stultz To: LKML Cc: John Stultz , Peter Zijlstra , Joel Fernandes , Qais Yousef , Ingo Molnar , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , Suleiman Souhlal , kuyo chang , hupu , kernel-team@android.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Pull most of the logic out of try_to_block_task() and put it into block_task() directly, so that we can call block_task() and not have to worry about the failing cases in try_to_block_task() Suggested-by: Peter Zijlstra Signed-off-by: John Stultz --- Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. 
McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: Suleiman Souhlal Cc: kuyo chang Cc: hupu Cc: kernel-team@android.com --- kernel/sched/core.c | 45 +++++++++++++++++++++++---------------------- 1 file changed, 23 insertions(+), 22 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 3ac6dd4d3c587..942af3b34ffe0 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2223,8 +2223,29 @@ void deactivate_task(struct rq *rq, struct task_stru= ct *p, int flags) dequeue_task(rq, p, flags); } =20 -static void block_task(struct rq *rq, struct task_struct *p, int flags) +static void block_task(struct rq *rq, struct task_struct *p, unsigned long= task_state) { + int flags =3D DEQUEUE_NOCLOCK; + + p->sched_contributes_to_load =3D + (task_state & TASK_UNINTERRUPTIBLE) && + !(task_state & TASK_NOLOAD) && + !(task_state & TASK_FROZEN); + + if (unlikely(is_special_task_state(task_state))) + flags |=3D DEQUEUE_SPECIAL; + + /* + * __schedule() ttwu() + * prev_state =3D prev->state; if (p->on_rq && ...) + * if (prev_state) goto out; + * p->on_rq =3D 0; smp_acquire__after_ctrl_dep(); + * p->state =3D TASK_WAKING + * + * Where __schedule() and ttwu() have matching control dependencies. + * + * After this, schedule() must not care about p->state any more. 
+ */ if (dequeue_task(rq, p, DEQUEUE_SLEEP | flags)) __block_task(rq, p); } @@ -6581,7 +6602,6 @@ static bool try_to_block_task(struct rq *rq, struct t= ask_struct *p, unsigned long *task_state_p, bool should_block) { unsigned long task_state =3D *task_state_p; - int flags =3D DEQUEUE_NOCLOCK; =20 if (signal_pending_state(task_state, p)) { WRITE_ONCE(p->__state, TASK_RUNNING); @@ -6601,26 +6621,7 @@ static bool try_to_block_task(struct rq *rq, struct = task_struct *p, if (!should_block) return false; =20 - p->sched_contributes_to_load =3D - (task_state & TASK_UNINTERRUPTIBLE) && - !(task_state & TASK_NOLOAD) && - !(task_state & TASK_FROZEN); - - if (unlikely(is_special_task_state(task_state))) - flags |=3D DEQUEUE_SPECIAL; - - /* - * __schedule() ttwu() - * prev_state =3D prev->state; if (p->on_rq && ...) - * if (prev_state) goto out; - * p->on_rq =3D 0; smp_acquire__after_ctrl_dep(); - * p->state =3D TASK_WAKING - * - * Where __schedule() and ttwu() have matching control dependencies. - * - * After this, schedule() must not care about p->state any more. 
- */ - block_task(rq, p, flags); + block_task(rq, p, task_state); return true; } =20 --=20 2.54.0.rc2.533.g4f5dca5207-goog From nobody Wed Jun 17 05:10:17 2026 Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CE7053A255D for ; Wed, 22 Apr 2026 23:07:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776899239; cv=none; b=AIbAEohT9NDbCDP7kOpNPqMz0dGc7mFS1wEL1UBtLUrbWcOuyhHS67Y2tK4Ag2QYj7/qnZd+18Ilr2d2VqyPnV/7tiE5DSoIl9pJhI+GjTj6I35DGoVZvWdNBWEKLoOhB20CVBrQWOPXb0zya0ktyDyRsDwR7HQ8l1hWgdOT/wc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776899239; c=relaxed/simple; bh=NPXh0KmRYOFaMMEvwWTWk0BKW250gM9qA7pumOyUA4k=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=jGrFA10TiXY8zMg7A52h4U17zZq4bwjqCVJUHonUE1L/sj2zXDWwmLq4f6N3qa2cvVht08Oa9Ox9QgWvtoIJjL2IM8wu3vYNQr82M0lU57aM76Jx2tQm6XNqDUPHWbB9kclKgUUTk+np+N2CtKwDW7NYLGEaAQ5oBxaZzq75btA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=AqbZpMXv; arc=none smtp.client-ip=209.85.216.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="AqbZpMXv" Received: by mail-pj1-f73.google.com with SMTP id 98e67ed59e1d1-35fbb5779e8so6966743a91.3 for ; Wed, 22 Apr 2026 16:07:14 
-0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20251104; t=1776899234; x=1777504034; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=WEMdwADTR5bjH1z+sSz1SGoDcZgjFXtGqF46AN0pidA=; b=AqbZpMXvVhuIqPC3H+QINbFjeOYi+Ka65uVNZy/Oa3ZewVFHrbD+8IIg5Q9UqfPQtB cEu2s1Zz67jHmK2638v+vfYAah1Y8FB7QOOsVDUDbRXJZwf/1kgeGtwNitHIsHqgY50p t1/TlqJ0L/5/Wiqqw0k/qEtUwRO+cZc/GG8rkz7bsyI13sWAsSHMPH+ssbVvB3WWoo4O UHasSd6KS/QwPyYp/B1gy59f02ZZaxSsXL9NUIL4yjWO5fXDhm3aIqRJ6nhvVBvtSpQR XqwX2qOY2H92ADTZS6TgJmEPk/g/ZMVh54gpi8wqftr1P+VduHiY2QssbjJyBxtu7FDr sGww== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1776899234; x=1777504034; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=WEMdwADTR5bjH1z+sSz1SGoDcZgjFXtGqF46AN0pidA=; b=laOcWvEe6DcafCIXgr/LwqXVs2SzPQdpOQPLqSVKHw+C9bGsRP6eJPcM3v0vqrYGvW 2fDaEN8MvwzMMtcB430JWufACZeQaRazN4Gy3dFTOLxRyl/4guVpe38g2Sb6L1m0OcpV Nu8Rtfe3sGbKYJbWgDXlOpkPwpThVioh+Zzb6Smm8yv61gMi5A3LC+e+DBV4AIfajMnM NjrGjt6ZqRZ++2cvEMySf8lKe28yOElnea80E8ldFfgy2gxtnXtb3zO6TGivi+s1q8ss y+bft87vjdfqOg9rFUUUg0jESRbVAHtD2z22CoC++yHxqZUcWA75MTaK6FpSYbjud4Qm Orbg== X-Gm-Message-State: AOJu0Ywo0nVaDseCk+nQ/7j+2rnlyAvwVlZHgCD21Kt/ygWELD8qELuR Pq6reyLIWGQm2dkCdeZJYlu+X8Gux8BdyA/j09E/eoSY/rg9FuboO7uTj3lhGPrO6yia3X1skPA FKiMNNDU4GgOmNz3/6IT0/rOYe+BPSLSHnyf9k5azGsLi0MwrkVuYMUg+W1GGMQ2oOznHn7y31M OkdFuXwRXwLryi9jzpMKBREA6p0FkQNwEayPWlN3pFqtseKfqr X-Received: from plbbb3.prod.google.com ([2002:a17:902:bc83:b0:29f:25cf:e576]) (user=jstultz job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:3c52:b0:35f:b50e:defc with SMTP id 98e67ed59e1d1-36140473f07mr25888192a91.16.1776899233791; Wed, 22 Apr 2026 16:07:13 -0700 (PDT) Date: Wed, 22 Apr 2026 23:06:46 +0000 In-Reply-To: <20260422230659.903191-1-jstultz@google.com> Precedence: bulk 
X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260422230659.903191-1-jstultz@google.com> X-Mailer: git-send-email 2.54.0.rc2.533.g4f5dca5207-goog Message-ID: <20260422230659.903191-6-jstultz@google.com> Subject: [PATCH v28 5/8] sched: Have try_to_wake_up() handle return-migration for PROXY_WAKING case From: John Stultz To: LKML Cc: John Stultz , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , Suleiman Souhlal , kuyo chang , hupu , kernel-team@android.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This patch adds logic so try_to_wake_up() will notice if we are waking a task where blocked_on =3D=3D PROXY_WAKING, and if necessary dequeue the task so the wakeup will naturally return-migrate the donor task back to a cpu it can run on. This helps performance as we do the dequeue and wakeup under the locks normally taken in the try_to_wake_up() and avoids having to do proxy_force_return() from __schedule(), which has to re-take similar locks and then force a pick again loop. This was split out from the larger proxy patch, and significantly reworked. 
Credits for the original patch go to: Peter Zijlstra (Intel) Juri Lelli Valentin Schneider Connor O'Brien Signed-off-by: John Stultz --- XXX - Make sure to switch to use ACQUIRE(__task_rq_lock, guard) in 7.0-rc+ branches --- v24: * Reworked proxy_needs_return() so its less nested as suggested by K Prateek * Switch to using block_task with DEQUEUE_SPECIAL as suggested by K Prateek * Fix edge case to reset wake_cpu if select_task_rq() chooses the current rq and we skip set_task_cpu() v26: * Handle both blocked and PROXY_WAKING tasks in proxy_needs_return(), as suggested by K Prateek * Try to handle signal edge case in ttwu that K Prateek pointed out v27: * Integrate simplifications to proxy_needs_return() suggested by K Prateek * Rework ttwu_runnable() to align with ACQUIRE(__task_rq_lock, guard)(p) usage as suggested by Peter * Major rework suggested by Peter to get rid of proxy_force_return() completely, using proxy_deactivate() and allow ttwu to handle all the return migration. Lots of helpful improvements suggested by K Prateek included as well here. v28: * Folded in change suggested by K Prateek to introduce proxy_reset_donor() to reset the donor to current task when the donor is woken up. * Drop an unnecessary PROXY_WAKING assignment which was noted by K Prateek Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. 
McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: Suleiman Souhlal Cc: kuyo chang Cc: hupu Cc: kernel-team@android.com --- include/linux/sched.h | 2 +- kernel/sched/core.c | 196 +++++++++++++++++++++--------------------- 2 files changed, 98 insertions(+), 100 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 368c7b4d7cb51..5b68a1c9eedcf 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -161,7 +161,7 @@ struct user_event_mm; */ #define is_special_task_state(state) \ ((state) & (__TASK_STOPPED | __TASK_TRACED | TASK_PARKED | \ - TASK_DEAD | TASK_FROZEN)) + TASK_DEAD | TASK_WAKING | TASK_FROZEN)) =20 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP # define debug_normal_state_change(state_value) \ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 942af3b34ffe0..17797e1f76f25 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3722,6 +3722,54 @@ void update_rq_avg_idle(struct rq *rq) rq->idle_stamp =3D 0; } =20 +#ifdef CONFIG_SCHED_PROXY_EXEC +static void zap_balance_callbacks(struct rq *rq); + +static inline void proxy_reset_donor(struct rq *rq) +{ + WARN_ON_ONCE(rq->donor =3D=3D rq->curr); + + put_prev_set_next_task(rq, rq->donor, rq->curr); + rq_set_donor(rq, rq->curr); + zap_balance_callbacks(rq); + resched_curr(rq); +} + +/* + * Checks to see if task p has been proxy-migrated to another rq + * and needs to be returned. 
If so, we deactivate the task here + * so that it can be properly woken up on the p->wake_cpu + * (or whichever cpu select_task_rq() picks at the bottom of + * try_to_wake_up() + */ +static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p) +{ + if (!task_is_blocked(p)) + return false; + + guard(raw_spinlock)(&p->blocked_lock); + + /* Task is waking up; clear any blocked_on relationship */ + __clear_task_blocked_on(p, NULL); + + /* If already current, don't need to return migrate */ + if (task_current(rq, p)) + return false; + + /* If we're return migrating the rq->donor, switch it out for idle */ + if (task_current_donor(rq, p)) + proxy_reset_donor(rq); + + block_task(rq, p, TASK_WAKING); + return true; +} +#else /* !CONFIG_SCHED_PROXY_EXEC */ +static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p) +{ + return false; +} +#endif /* CONFIG_SCHED_PROXY_EXEC */ + static void ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags, struct rq_flags *rf) @@ -3786,28 +3834,26 @@ ttwu_do_activate(struct rq *rq, struct task_struct = *p, int wake_flags, */ static int ttwu_runnable(struct task_struct *p, int wake_flags) { - struct rq_flags rf; - struct rq *rq; - int ret =3D 0; + ACQUIRE(__task_rq_lock, guard)(p); + struct rq *rq =3D guard.rq; =20 - rq =3D __task_rq_lock(p, &rf); - if (task_on_rq_queued(p)) { - update_rq_clock(rq); - if (p->se.sched_delayed) - enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED); - if (!task_on_cpu(rq, p)) { - /* - * When on_rq && !on_cpu the task is preempted, see if - * it should preempt the task that is current now. 
- */ - wakeup_preempt(rq, p, wake_flags); - } - ttwu_do_wakeup(p); - ret =3D 1; - } - __task_rq_unlock(rq, p, &rf); + if (!task_on_rq_queued(p)) + return 0; =20 - return ret; + update_rq_clock(rq); + if (p->se.sched_delayed) + enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED); + if (proxy_needs_return(rq, p)) + return 0; + if (!task_on_cpu(rq, p)) { + /* + * When on_rq && !on_cpu the task is preempted, see if + * it should preempt the task that is current now. + */ + wakeup_preempt(rq, p, wake_flags); + } + ttwu_do_wakeup(p); + return 1; } =20 void sched_ttwu_pending(void *arg) @@ -4194,6 +4240,8 @@ int try_to_wake_up(struct task_struct *p, unsigned in= t state, int wake_flags) * it disabling IRQs (this allows not taking ->pi_lock). */ WARN_ON_ONCE(p->se.sched_delayed); + /* If p is current, we know we can run here, so clear blocked_on */ + clear_task_blocked_on(p, NULL); if (!ttwu_state_match(p, state, &success)) goto out; =20 @@ -4210,6 +4258,7 @@ int try_to_wake_up(struct task_struct *p, unsigned in= t state, int wake_flags) */ scoped_guard (raw_spinlock_irqsave, &p->pi_lock) { smp_mb__after_spinlock(); + if (!ttwu_state_match(p, state, &success)) break; =20 @@ -4274,6 +4323,14 @@ int try_to_wake_up(struct task_struct *p, unsigned i= nt state, int wake_flags) */ WRITE_ONCE(p->__state, TASK_WAKING); =20 + /* + * We never clear the blocked_on relation on proxy_deactivate. + * If we don't clear it here, we have TASK_RUNNING + p->blocked_on + * when waking up. Since this is a fully blocked, off CPU task + * waking up, it should be safe to clear the blocked_on relation. 
+ */ + if (task_is_blocked(p)) + clear_task_blocked_on(p, NULL); /* * If the owning (remote) CPU is still in the middle of schedule() with * this task as prev, considering queueing p on the remote CPUs wake_list @@ -4318,6 +4375,16 @@ int try_to_wake_up(struct task_struct *p, unsigned i= nt state, int wake_flags) wake_flags |=3D WF_MIGRATED; psi_ttwu_dequeue(p); set_task_cpu(p, cpu); + } else if (cpu !=3D p->wake_cpu) { + /* + * If we were proxy-migrated to cpu, then + * select_task_rq() picks cpu instead of wake_cpu + * to return to, we won't call set_task_cpu(), + * leaving a stale wake_cpu pointing to where we + * proxy-migrated from. So just fixup wake_cpu here + * if its not correct + */ + p->wake_cpu =3D cpu; } =20 ttwu_queue(p, cpu, wake_flags); @@ -6606,7 +6673,7 @@ static bool try_to_block_task(struct rq *rq, struct t= ask_struct *p, if (signal_pending_state(task_state, p)) { WRITE_ONCE(p->__state, TASK_RUNNING); *task_state_p =3D TASK_RUNNING; - set_task_blocked_on_waking(p, NULL); + clear_task_blocked_on(p, NULL); =20 return false; } @@ -6649,13 +6716,11 @@ static inline struct task_struct *proxy_resched_idl= e(struct rq *rq) return rq->idle; } =20 -static bool proxy_deactivate(struct rq *rq, struct task_struct *donor) +static void proxy_deactivate(struct rq *rq, struct task_struct *donor) { unsigned long state =3D READ_ONCE(donor->__state); =20 - /* Don't deactivate if the state has been changed to TASK_RUNNING */ - if (state =3D=3D TASK_RUNNING) - return false; + WARN_ON_ONCE(state =3D=3D TASK_RUNNING); /* * Because we got donor from pick_next_task(), it is *crucial* * that we call proxy_resched_idle() before we deactivate it. @@ -6666,7 +6731,7 @@ static bool proxy_deactivate(struct rq *rq, struct ta= sk_struct *donor) * need to be changed from next *before* we deactivate. 
*/ proxy_resched_idle(rq); - return try_to_block_task(rq, donor, &state, true); + block_task(rq, donor, state); } =20 static inline void proxy_release_rq_lock(struct rq *rq, struct rq_flags *r= f) @@ -6740,71 +6805,6 @@ static void proxy_migrate_task(struct rq *rq, struct= rq_flags *rf, proxy_reacquire_rq_lock(rq, rf); } =20 -static void proxy_force_return(struct rq *rq, struct rq_flags *rf, - struct task_struct *p) - __must_hold(__rq_lockp(rq)) -{ - struct rq *task_rq, *target_rq =3D NULL; - int cpu, wake_flag =3D WF_TTWU; - - lockdep_assert_rq_held(rq); - WARN_ON(p =3D=3D rq->curr); - - if (p =3D=3D rq->donor) - proxy_resched_idle(rq); - - proxy_release_rq_lock(rq, rf); - /* - * We drop the rq lock, and re-grab task_rq_lock to get - * the pi_lock (needed for select_task_rq) as well. - */ - scoped_guard (task_rq_lock, p) { - task_rq =3D scope.rq; - - /* - * Since we let go of the rq lock, the task may have been - * woken or migrated to another rq before we got the - * task_rq_lock. So re-check we're on the same RQ. If - * not, the task has already been migrated and that CPU - * will handle any futher migrations. - */ - if (task_rq !=3D rq) - break; - - /* - * Similarly, if we've been dequeued, someone else will - * wake us - */ - if (!task_on_rq_queued(p)) - break; - - /* - * Since we should only be calling here from __schedule() - * -> find_proxy_task(), no one else should have - * assigned current out from under us. But check and warn - * if we see this, then bail. 
- */ - if (task_current(task_rq, p) || task_on_cpu(task_rq, p)) { - WARN_ONCE(1, "%s rq: %i current/on_cpu task %s %d on_cpu: %i\n", - __func__, cpu_of(task_rq), - p->comm, p->pid, p->on_cpu); - break; - } - - update_rq_clock(task_rq); - deactivate_task(task_rq, p, DEQUEUE_NOCLOCK); - cpu =3D select_task_rq(p, p->wake_cpu, &wake_flag); - set_task_cpu(p, cpu); - target_rq =3D cpu_rq(cpu); - clear_task_blocked_on(p, NULL); - } - - if (target_rq) - attach_one_task(target_rq, p); - - proxy_reacquire_rq_lock(rq, rf); -} - /* * Find runnable lock owner to proxy for mutex blocked donor * @@ -6840,7 +6840,7 @@ find_proxy_task(struct rq *rq, struct task_struct *do= nor, struct rq_flags *rf) clear_task_blocked_on(p, PROXY_WAKING); return p; } - goto force_return; + goto deactivate; } =20 /* @@ -6875,7 +6875,7 @@ find_proxy_task(struct rq *rq, struct task_struct *do= nor, struct rq_flags *rf) __clear_task_blocked_on(p, NULL); return p; } - goto force_return; + goto deactivate; } =20 if (!READ_ONCE(owner->on_rq) || owner->se.sched_delayed) { @@ -6954,12 +6954,7 @@ find_proxy_task(struct rq *rq, struct task_struct *d= onor, struct rq_flags *rf) return owner; =20 deactivate: - if (proxy_deactivate(rq, donor)) - return NULL; - /* If deactivate fails, force return */ - p =3D donor; -force_return: - proxy_force_return(rq, rf, p); + proxy_deactivate(rq, p); return NULL; migrate_task: proxy_migrate_task(rq, rf, p, owner_cpu); @@ -7106,6 +7101,9 @@ static void __sched notrace __schedule(int sched_mode) if (sched_proxy_exec()) { struct task_struct *prev_donor =3D rq->donor; =20 + if (!prev_state && prev->blocked_on) + clear_task_blocked_on(prev, NULL); + rq_set_donor(rq, next); if (unlikely(next->blocked_on)) { next =3D find_proxy_task(rq, next, &rf); --=20 2.54.0.rc2.533.g4f5dca5207-goog From nobody Wed Jun 17 05:10:17 2026 Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client 
certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 48057383C97 for ; Wed, 22 Apr 2026 23:07:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776899240; cv=none; b=TRl5RJSpCjuf5lrmAWoLZBd0u25ogoBx3jzjXzVoaL1G15d2p+BQYKExubSxBiTYFyDuzHs4keoSN1DBOtKyVQt8iSlS8DaWUBjDt3H+Vvm28qv4cFO696FSJ04MGBSSSHy8gytlbnA8x+MN5xWA6ixn6mNWETPHnajD5EUMi8E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776899240; c=relaxed/simple; bh=zLAWmC8SEilSP2upfSM5Xp7xMO6OSj1mdldNAqGQMs4=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=jbA9PRNfvR3Tkf2vGtALXwIUwTEizO4bthKiIjrFAQKclJ35IjayyMoaaqUVMXUa262AtjKXus7rHSfxUftBB7cRxWrpjPsIUXJXo4aYVhAjSUfUaah8le5Ccs6C+TYXmoRHxgufms/3X2ckmbvGaalNbov1acAgMEw1mqc1xXI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=TgU+Rl5F; arc=none smtp.client-ip=209.85.216.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="TgU+Rl5F" Received: by mail-pj1-f73.google.com with SMTP id 98e67ed59e1d1-358f058973fso7422109a91.1 for ; Wed, 22 Apr 2026 16:07:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20251104; t=1776899236; x=1777504036; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=1lvtAoImYg+N2TMW1qJHwAlCROLzg7IxwF+Mm2urAgQ=; 
b=TgU+Rl5FoqhDlv2WXEFn3uQOys3kXUeg0WcZjmrn+YDOT7DwuwDrUN2Cr7l3ZnaMRu ruhM8LFUbwnocbBN/pfFsaCnebtjLnxymFBEdxQIDN6Kidc518C6OsmQU0NtcLGCGNjf Hv6kmygamT7OgqKthrEeoUDAJCyEVfQbW1ekQ/D7tntzu5Q16nfho8lTDD3wF05UzJ/4 7twRyHiOrzy4xK/eJzbpQ+dq8/v1iFi7soECnS2wrlbgykdsWTmzW8cl0cd8c3aRu+Cq sHdP+DNm5mJsr3BQhBNujldNAmFqN1MeBNKPYxtzTtCk90lj34kDi/jPX27zxouMcUB+ qdEA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1776899236; x=1777504036; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=1lvtAoImYg+N2TMW1qJHwAlCROLzg7IxwF+Mm2urAgQ=; b=FUX+2AyXGceUsauL8E62bSl640yu82iBqPsK32UvIyat6qg5gy545GSO5gJJ4ctNIU 39c41OtnXKANCF2j47MhE4c/+aRWWiETiGbTWPnCc4uABLocbxMbbhM5XvkjtLQIzOhI agX0gKmsM8simXLrxqpUZNY7y7rkTTExF8r+Y4GLACXrzi7X8SMGO4oCoa4x1E+Ie0l1 KlDd1Jt8tkFdZq/+ulPSJcvMhYGyI6r87xSb0TKQlh7XFVoBDs/kw+FarDBPFLBj1gKv TvcAkUbGErNmum+huX4R/weXQGidiCZ3JX301QyvMQjkHlxjnSUwifk6mc2VSDyuCa1J JGCA== X-Gm-Message-State: AOJu0Yy/8e1chFmFspKTaSD+9kbBjD8bZI3qBH6tOf/5H284DJ+HX8L3 PYKZtHFByRuO5Q0Gw6i8nT4ErhfeCMBfrS7Pp6WuBmpSZrj1CupOojTrAgPtydF4Sqyfw6FraYg 8Z3xuww9T7fht27AIJYF4pyQdC2kU087jC5S0ZmozFaCagUkVNUys+vmVlCtyuHbkez9cH4FRfN o7W3fWksYkpnJdup/ve44KMM0uo6Q8zONzyt30r1vzh0OYhe0w X-Received: from pjbmj17.prod.google.com ([2002:a17:90b:3691:b0:35d:a01d:80ab]) (user=jstultz job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:4cca:b0:35d:93ff:2854 with SMTP id 98e67ed59e1d1-361403f1311mr24521039a91.8.1776899235402; Wed, 22 Apr 2026 16:07:15 -0700 (PDT) Date: Wed, 22 Apr 2026 23:06:47 +0000 In-Reply-To: <20260422230659.903191-1-jstultz@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260422230659.903191-1-jstultz@google.com> X-Mailer: git-send-email 2.54.0.rc2.533.g4f5dca5207-goog Message-ID: <20260422230659.903191-7-jstultz@google.com> Subject: [PATCH v28 6/8] sched: Add 
blocked_donor link to task for smarter mutex handoffs From: John Stultz To: LKML Cc: Peter Zijlstra , Juri Lelli , Valentin Schneider , "Connor O'Brien" , John Stultz , Joel Fernandes , Qais Yousef , Ingo Molnar , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , Suleiman Souhlal , kuyo chang , hupu , kernel-team@android.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Peter Zijlstra Add a link to the task this task is proxying for, and use it so the mutex owner can do an intelligent hand-off of the mutex to the task that the owner is running on behalf of. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Juri Lelli Signed-off-by: Valentin Schneider Signed-off-by: Connor O'Brien [jstultz: This patch was split out from larger proxy patch] Signed-off-by: John Stultz --- v5: * Split out from larger proxy patch v6: * Moved proxied value from earlier patch to this one where it is actually used * Rework logic to check sched_proxy_exec() instead of using ifdefs * Moved comment change to this patch where it makes sense v7: * Use more descriptive term than "us" in comments, as suggested by Metin Kaya. 
* Minor typo fixup from Metin Kaya
* Reworked proxied variable to prev_not_proxied to simplify usage
v8:
* Use helper for donor blocked_on_state transition
v9:
* Re-add mutex lock handoff in the unlock path, but only when we
  have a blocked donor
* Slight reword of commit message suggested by Metin
v18:
* Add task_init initialization for blocked_donor, suggested by
  Suleiman
v23:
* Reworks for PROXY_WAKING approach suggested by PeterZ
v25:
* Simplified some logic now we don't have proxy_tag_curr()
v28:
* Remove sched_proxy_exec() conditionalized blocked_lock usage
  as it was hitting llvm's "'blocked_lock' is not held on every
  path through here [-Wthread-safety-analysis]" errors.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com

Fixup for llvm 'blocked_lock' is not held on every path through here [-Wthread-safety-analysis]

Clang is fussy so let's not do conditional locking
---
 include/linux/sched.h  |  1 +
 init/init_task.c       |  1 +
 kernel/fork.c          |  1 +
 kernel/locking/mutex.c | 43 +++++++++++++++++++++++++++++++++++++++---
 kernel/sched/core.c    | 14 +++++++++++++-
 5 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5b68a1c9eedcf..65791d7359553 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1239,6 +1239,7 @@ struct task_struct {
 #endif
 
 	struct mutex *blocked_on; /* lock we're blocked on */
+	struct task_struct *blocked_donor; /* task that is boosting this task */
 	raw_spinlock_t blocked_lock;
 
 #ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
diff --git a/init/init_task.c b/init/init_task.c
index b5f48ebdc2b6e..41c19670c8f6b 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -200,6 +200,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
 	.mems_allowed_seq = SEQCNT_SPINLOCK_ZERO(init_task.mems_allowed_seq,
 						 &init_task.alloc_lock),
 #endif
+	.blocked_donor = NULL,
 #ifdef CONFIG_RT_MUTEXES
 	.pi_waiters = RB_ROOT_CACHED,
 	.pi_top_task = NULL,
diff --git a/kernel/fork.c b/kernel/fork.c
index f1ad69c6dc2d4..c1d65b86729d6 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2216,6 +2216,7 @@ __latent_entropy struct task_struct *copy_process(
 	lockdep_init_task(p);
 
 	p->blocked_on = NULL; /* not blocked yet */
+	p->blocked_donor = NULL; /* nobody is boosting p yet */
 
 #ifdef CONFIG_BCACHE
 	p->sequential_io = 0;
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 6e6699d9fcba6..333f33d33ef06 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -976,7 +976,7 @@ EXPORT_SYMBOL_GPL(ww_mutex_lock_interruptible);
 static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigned long ip)
 	__releases(lock)
 {
-	struct task_struct *next = NULL;
+	struct task_struct *donor, *next = NULL;
 	struct mutex_waiter *waiter;
 	DEFINE_WAKE_Q(wake_q);
 	unsigned long owner;
@@ -997,6 +997,12 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
 		MUTEX_WARN_ON(__owner_task(owner) != current);
 		MUTEX_WARN_ON(owner & MUTEX_FLAG_PICKUP);
 
+		if (sched_proxy_exec() && current->blocked_donor) {
+			/* force handoff if we have a blocked_donor */
+			owner = MUTEX_FLAG_HANDOFF;
+			break;
+		}
+
 		if (owner & MUTEX_FLAG_HANDOFF)
 			break;
 
@@ -1009,19 +1015,50 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
 	}
 
 	raw_spin_lock_irqsave(&lock->wait_lock, flags);
+	raw_spin_lock(&current->blocked_lock);
 	debug_mutex_unlock(lock);
+
+	if (sched_proxy_exec()) {
+		/*
+		 * If we have a task boosting current, and that task was boosting
+		 * current through this lock, hand the lock to that task, as that
+		 * is the highest waiter, as selected by the scheduling function.
+		 */
+		donor = current->blocked_donor;
+		if (donor) {
+			struct mutex *next_lock;
+
+			raw_spin_lock_nested(&donor->blocked_lock, SINGLE_DEPTH_NESTING);
+			next_lock = __get_task_blocked_on(donor);
+			if (next_lock == lock) {
+				next = donor;
+				__set_task_blocked_on_waking(donor, next_lock);
+				wake_q_add(&wake_q, donor);
+				current->blocked_donor = NULL;
+			}
+			raw_spin_unlock(&donor->blocked_lock);
+		}
+	}
+
+	/*
+	 * Failing that, pick first on the wait list.
+	 */
 	waiter = lock->first_waiter;
-	if (waiter) {
+	if (!next && waiter) {
 		next = waiter->task;
 
+		raw_spin_lock_nested(&next->blocked_lock, SINGLE_DEPTH_NESTING);
 		debug_mutex_wake_waiter(lock, waiter);
-		set_task_blocked_on_waking(next, lock);
+		__set_task_blocked_on_waking(next, lock);
+		raw_spin_unlock(&next->blocked_lock);
 		wake_q_add(&wake_q, next);
+
 	}
 
 	if (owner & MUTEX_FLAG_HANDOFF)
 		__mutex_handoff(lock, next);
 
+	raw_spin_unlock(&current->blocked_lock);
 	raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags, &wake_q);
 }
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 17797e1f76f25..edd418dde1e47 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6809,7 +6809,17 @@ static void proxy_migrate_task(struct rq *rq, struct rq_flags *rf,
  * Find runnable lock owner to proxy for mutex blocked donor
  *
  * Follow the blocked-on relation:
- *   task->blocked_on -> mutex->owner -> task...
+ *
+ *                ,-> task
+ *                |     | blocked-on
+ *                |     v
+ *  blocked_donor |   mutex
+ *                |     | owner
+ *                |     v
+ *                `-- task
+ *
+ * and set the blocked_donor relation, this latter is used by the mutex
+ * code to find which (blocked) task to hand-off to.
  *
  * Lock order:
  *
@@ -6949,6 +6959,7 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
 		 * rq, therefore holding @rq->lock is sufficient to
 		 * guarantee its existence, as per ttwu_remote().
 		 */
+		owner->blocked_donor = p;
 	}
 	WARN_ON_ONCE(owner && !owner->on_rq);
 	return owner;
@@ -7105,6 +7116,7 @@ static void __sched notrace __schedule(int sched_mode)
 		clear_task_blocked_on(prev, NULL);
 
 	rq_set_donor(rq, next);
+	next->blocked_donor = NULL;
 	if (unlikely(next->blocked_on)) {
 		next = find_proxy_task(rq, next, &rf);
 		if (!next) {
-- 
2.54.0.rc2.533.g4f5dca5207-goog

From nobody Wed Jun 17 05:10:17 2026
Date: Wed, 22 Apr 2026 23:06:48 +0000
In-Reply-To: <20260422230659.903191-1-jstultz@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260422230659.903191-1-jstultz@google.com>
X-Mailer: git-send-email 2.54.0.rc2.533.g4f5dca5207-goog
Message-ID: <20260422230659.903191-8-jstultz@google.com>
Subject: [PATCH v28 7/8] sched: Break out core of attach_tasks() helper into sched.h
From: John Stultz
To: LKML
Cc: John Stultz, K Prateek Nayak, Joel Fernandes, Qais Yousef,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall,
	Zimuzo Ezeozue, Will Deacon, Waiman Long, Boqun Feng,
	"Paul E. McKenney", Metin Kaya, Xuewen Yan, Thomas Gleixner,
	Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu,
	kernel-team@android.com
Content-Type: text/plain; charset="utf-8"

Pull the core of attach_tasks() out into sched.h so it can be used
more generically.

Suggested-by: K Prateek Nayak
Signed-off-by: John Stultz
---
Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 kernel/sched/fair.c  | 16 +---------------
 kernel/sched/sched.h | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b843f9a876d6d..af6bd2fdd460f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10011,21 +10011,7 @@ static int detach_tasks(struct lb_env *env)
  */
 static void attach_tasks(struct lb_env *env)
 {
-	struct list_head *tasks = &env->tasks;
-	struct task_struct *p;
-	struct rq_flags rf;
-
-	rq_lock(env->dst_rq, &rf);
-	update_rq_clock(env->dst_rq);
-
-	while (!list_empty(tasks)) {
-		p = list_first_entry(tasks, struct task_struct, se.group_node);
-		list_del_init(&p->se.group_node);
-
-		attach_task(env->dst_rq, p);
-	}
-
-	rq_unlock(env->dst_rq, &rf);
+	__attach_tasks(env->dst_rq, &env->tasks);
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d0f3e164a61d2..e1927e53e85cb 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3045,6 +3045,25 @@ static inline void attach_one_task(struct rq *rq, struct task_struct *p)
 	attach_task(rq, p);
 }
 
+/*
+ * __attach_tasks() - attaches a list of tasks (using se.group_node) to
+ * the new rq
+ */
+static inline void __attach_tasks(struct rq *rq, struct list_head *tasks)
+{
+	guard(rq_lock)(rq);
+	update_rq_clock(rq);
+
+	while (!list_empty(tasks)) {
+		struct task_struct *p;
+
+		p = list_first_entry(tasks, struct task_struct, se.group_node);
+		list_del_init(&p->se.group_node);
+
+		attach_task(rq, p);
+	}
+}
+
 #ifdef CONFIG_PREEMPT_RT
 # define SCHED_NR_MIGRATE_BREAK 8
 #else
-- 
2.54.0.rc2.533.g4f5dca5207-goog

From nobody Wed Jun 17 05:10:17 2026
Date: Wed, 22 Apr 2026 23:06:49 +0000
In-Reply-To: <20260422230659.903191-1-jstultz@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260422230659.903191-1-jstultz@google.com>
X-Mailer: git-send-email 2.54.0.rc2.533.g4f5dca5207-goog
Message-ID: <20260422230659.903191-9-jstultz@google.com>
Subject: [PATCH v28 8/8] sched: Migrate whole chain in proxy_migrate_task()
From: John Stultz
To: LKML
Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue,
	Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney",
	Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner,
	Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu,
	kernel-team@android.com
Content-Type: text/plain; charset="utf-8"

Instead of migrating one task each time through find_proxy_task(),
we can walk up the blocked_donor ptrs and migrate the entire
current chain in one go.

This was broken out of earlier patches and held back while the
series was being stabilized, but I wanted to re-introduce it.

Signed-off-by: John Stultz
---
v12:
* Earlier this was re-using blocked_node, but I hit a race with
  activating blocked entities, and to avoid it introduced a new
  migration_node listhead
v18:
* Add init_task initialization of migration_node as suggested by
  Suleiman
v22:
* Move migration_node under CONFIG_SCHED_PROXY_EXEC as suggested
  by K Prateek
v25:
* Use se.group_node instead of adding migration_node, as suggested
  by K Prateek
* Integrated attach_tasks() cleanups suggested by K Prateek

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 kernel/sched/core.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index edd418dde1e47..886984a02adad 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6779,9 +6779,9 @@ static void proxy_migrate_task(struct rq *rq, struct rq_flags *rf,
 	__must_hold(__rq_lockp(rq))
 {
 	struct rq *target_rq = cpu_rq(target_cpu);
+	LIST_HEAD(migrate_list);
 
 	lockdep_assert_rq_held(rq);
-	WARN_ON(p == rq->curr);
 	/*
 	 * Since we are migrating a blocked donor, it could be rq->donor,
 	 * and we want to make sure there aren't any references from this
@@ -6794,13 +6794,20 @@ static void proxy_migrate_task(struct rq *rq, struct rq_flags *rf,
 	 * before we release the lock.
 	 */
 	proxy_resched_idle(rq);
-
-	deactivate_task(rq, p, DEQUEUE_NOCLOCK);
-	proxy_set_task_cpu(p, target_cpu);
-
+	for (; p; p = p->blocked_donor) {
+		WARN_ON(p == rq->curr);
+		deactivate_task(rq, p, DEQUEUE_NOCLOCK);
+		proxy_set_task_cpu(p, target_cpu);
+		/*
+		 * We can re-use se.group_node to migrate the thing,
+		 * because @p is deactivated (won't be balanced) and
+		 * we hold the rq_lock.
+		 */
+		list_add(&p->se.group_node, &migrate_list);
+	}
 	proxy_release_rq_lock(rq, rf);
 
-	attach_one_task(target_rq, p);
+	__attach_tasks(target_rq, &migrate_list);
 
 	proxy_reacquire_rq_lock(rq, rf);
 }
-- 
2.54.0.rc2.533.g4f5dca5207-goog
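
For readers following the 8/8 change outside the kernel tree, the chain walk can be modelled in plain userspace C. This is only an illustrative sketch, not kernel code: struct task, next_migrate, collect_chain() and attach_chain() are hypothetical stand-ins for task_struct, se.group_node, the loop in proxy_migrate_task() and __attach_tasks(), and no locking is modelled. It shows one property of the approach: because list_add() inserts at the head, a list drained from the front yields the tasks in the reverse of the order in which the blocked_donor walk visited them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for task_struct; not the kernel type. */
struct task {
	int cpu;			/* CPU the task is queued on */
	struct task *blocked_donor;	/* task boosting this one, or NULL */
	struct task *next_migrate;	/* link for the temporary migrate list */
};

/*
 * Walk from @p up the blocked_donor pointers, retarget every task to
 * @target_cpu and push it on the front of a temporary list (head
 * insertion, as list_add() does). Returns the list head.
 */
static struct task *collect_chain(struct task *p, int target_cpu)
{
	struct task *head = NULL;

	for (; p; p = p->blocked_donor) {
		p->cpu = target_cpu;
		p->next_migrate = head;
		head = p;
	}
	return head;
}

/*
 * Drain the list from the front, recording the order in which tasks
 * would be attached. Returns the number of tasks drained.
 */
static int attach_chain(struct task *head, struct task **order, int cap)
{
	int n = 0;

	while (head) {
		struct task *p = head;

		head = p->next_migrate;
		p->next_migrate = NULL;
		assert(n < cap);	/* caller must size @order for the chain */
		order[n++] = p;
	}
	return n;
}
```

With a three-task chain a -> b -> c linked through blocked_donor, collect_chain(&a, 2) followed by attach_chain() hands back c, b, a, all retargeted to CPU 2.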