From nobody Fri Jun 12 22:47:00 2026 Received: from mail-pg1-f202.google.com (mail-pg1-f202.google.com [209.85.215.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AB1313750D6 for ; Tue, 12 May 2026 02:56:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778554607; cv=none; b=nRP5/X/lcn5RfICGnU0gro4G9w48FHGMyR6H1cSLqFL+GSS23WDFZntogf6Hq7WuKTTSbw8Q54H0NUUBMrmJ0fG+n5EsEbqB2+aUHJ1rUva64X8CMw8IXhQxObarUlOlibK9RFVZIbCbNT8JAzk9dQ8CYCUYDgNxtTqgw8Wktuk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778554607; c=relaxed/simple; bh=0Nx0AIsWuAR373rAzSAIyBJVk6zw03bKwyuWUeG88Kw=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=CaXhyj4PBd7XVqTSXcS9TxuJjuUGEcFsroASx5Wkqje9STHO1ifCbM0T9AJOFENt5kA7PhAHAkvbVq0GoCZSzph4tmKHX5GFm9ORgElFcmO7BvZhu8Sn8DByPnzIJsaD2Rim7m7WLuHSI9Bv0DKQo/S+AglABD7ukam6FXr1g2A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=XiuJH1aG; arc=none smtp.client-ip=209.85.215.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="XiuJH1aG" Received: by mail-pg1-f202.google.com with SMTP id 41be03b00d2f7-c8294d8c48eso931774a12.0 for ; Mon, 11 May 2026 19:56:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20251104; t=1778554605; x=1779159405; 
darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=dpxXTpDdXoVQYH6QDJvYP+ckVlmSDRa1TL9zMnV28Tc=; b=XiuJH1aGaSpnGQuySfQO2Gr9LOVf5VcxSRdjEx0OwSqPS5CZcV4qAZWxsWmRRe/67z 5dN6pqcrMtQ56kZIlzB3q3U+vHbmR9Mx3WGjcYQDFH3Nh1m85HvhxC1kdxYzPS0a3ijl Lk9cvKFdh6HI01MWjadmHHIdshZpGeI/XRY8YdZ9xKrielyYd0ApMaDte7mNPstS1O9R MwlgmW0UJq19Sfekne1+LSul4wH9/lA3t5EEMet8OjPI4Xc8kwUvpKdkXwJXq9MA2wAq e05QSR6qodLmSYz6PIuzlP0ElEteKJIP2Uks9vF+4HxT6VdHC4/6NZRO/4OmGfNx7NiT Mj+A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1778554605; x=1779159405; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=dpxXTpDdXoVQYH6QDJvYP+ckVlmSDRa1TL9zMnV28Tc=; b=T5gUzScGHmHm3a+68GGvjptmgywa6Ouef5xyB9Hp8ETC/fKbN1tak7B/43W00hGKiI UWwwrtp6ml8mDtoU/ae8pS5tZirg7ic/+n3x5XLQ+evjjq6kQYbBYvBcHc96r+amXwDG eCw1c+o1+7lW3Fi64beXzdG6D+sYOZz+96zoeb6caTvsBclcMTHIzNLrgCRnxrFt2faW Y7ZjRP6Vpaav+RkqwEI7C/+WluAmjmIC3R/hwg0PYmVN04kOq6pT5sMun1HvplPbVX64 C2aZC6RLOS8wR1a7IIindHBHojtmpERV3ImBTO8hbcMMAoHIiSownLsmqTlkmMSqRiv9 EHzQ== X-Gm-Message-State: AOJu0YxEIMytpdgcvqxa2gw2kivSX4NXiMFIrcqQq5b2mVSuKC7+I5rh 2j8rsrDA3wJAZ2qNUjeshtXKInxijM+r1vnvNExwqy5xrtUYSWrnRFf2iv09z1OfrZzoEvEyd9h nWwSOQQ67bwUzBbUmnv1ippRcSbmKj4LbRMmxSn9CF9FWXdRPLg+01aCNfoLf/GlvSj9Yb7+sQs j8dWmRU/cni/uPKj1Wp/zxVwS64E5XaJN3JwA4LxKx1IMY1g47 X-Received: from pfbg4.prod.google.com ([2002:a05:6a00:ae04:b0:82f:c34b:9799]) (user=jstultz job=prod-delivery.src-stubby-dispatcher) by 2002:aa7:9069:0:b0:82f:b0:28f0 with SMTP id d2e1a72fcca58-83eebc2c64amr1081862b3a.34.1778554604602; Mon, 11 May 2026 19:56:44 -0700 (PDT) Date: Tue, 12 May 2026 02:56:11 +0000 In-Reply-To: <20260512025635.2840817-1-jstultz@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: 
 <20260512025635.2840817-1-jstultz@google.com>
X-Mailer: git-send-email 2.54.0.563.g4f69b47b94-goog
Message-ID: <20260512025635.2840817-2-jstultz@google.com>
Subject: [PATCH v29 1/9] sched: Rework pick_next_task() and prev_balance() to avoid stale prev references
From: John Stultz
To: LKML
Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra,
 Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
 Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon, Waiman Long,
 Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak,
 Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu,
 kernel-team@android.com
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Historically, the prev value from __schedule() was the rq->curr.
This prev value is passed down through numerous functions, and
used in the class scheduler implementations. The fact that
prev was on_cpu until the end of __schedule() meant it was
stable across the rq lock drops that the class->pick_next_task()
and ->balance() implementations often do.

However, with proxy-exec, the prev passed to functions called
by __schedule() is rq->donor, which may not be the same as
rq->curr and may not be on_cpu. This makes the prev value
potentially unstable across rq lock drops.

A recently found issue with proxy-exec is that when we begin
doing return migration from try_to_wake_up(), it's possible we
may be waking up the rq->donor. When we do this, we call
proxy_resched_idle() to put_prev_set_next(), setting the
rq->donor to rq->idle, allowing the rq->donor to be return
migrated and allowed to run.

This, however, runs into trouble, as on another cpu we might be
in the middle of calling __schedule(). Conceptually the rq lock
is held for the majority of the time, but in calling
pick_next_task() it's possible the class->pick_next_task()
handler or the ->balance() call may briefly drop the rq lock.
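As a rough illustration of the hazard described so far, here is a
single-threaded userspace model in C. It is not kernel code: struct task,
struct rq_model, drop_and_retake_lock(), pick_with_cached_prev() and
pick_rereading_donor() are all hypothetical names, and the concurrent
wakeup is simulated inline at the point where the lock would be dropped.

```c
#include <assert.h>

/* Hypothetical stand-ins, not the kernel's definitions. */
struct task { int pid; };

struct rq_model {
	struct task *donor;	/* models rq->donor */
	struct task *idle;	/* models rq->idle */
	int locked;		/* models the rq lock: 1 = held */
};

/*
 * Models a class callback briefly dropping the rq lock. While the
 * lock is released, a wakeup on another CPU is simulated: the donor
 * is return migrated and rq->donor is switched to rq->idle.
 */
void drop_and_retake_lock(struct rq_model *rq)
{
	assert(rq->locked);
	rq->locked = 0;
	rq->donor = rq->idle;	/* simulated try_to_wake_up() side effect */
	rq->locked = 1;
}

/* Old pattern: the caller's prev is used again after the lock drop. */
struct task *pick_with_cached_prev(struct rq_model *rq, struct task *prev)
{
	drop_and_retake_lock(rq);
	return prev;		/* may no longer match rq->donor */
}

/* New pattern: no prev argument, rq->donor is read after the drop. */
struct task *pick_rereading_donor(struct rq_model *rq)
{
	drop_and_retake_lock(rq);
	return rq->donor;
}
```

In this model the cached-prev variant hands back a task that is no
longer the donor once the lock has been retaken, while the re-reading
variant always agrees with the current rq->donor.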
This opens a window for try_to_wake_up() to wake and return
migrate the rq->donor before the class logic reacquires the rq
lock.

Unfortunately pick_next_task() and prev_balance() pass in a prev
argument, to which we pass rq->donor. However, this prev value can
now become stale and incorrect across an rq lock drop.

So, to correct this, rework the pick_next_task() and
prev_balance() calls so that they do not take a "prev" argument.

Also rework the class ->pick_next_task() and ->balance()
implementations to drop the prev argument and, in the cases
where it was used, have the class functions reference rq->donor
directly, and not save the value across rq lock drops, so that
we don't end up with stale references.

Signed-off-by: John Stultz
---
Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E.
McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 kernel/sched/core.c      | 37 ++++++++++++++++++-------------------
 kernel/sched/deadline.c  |  8 +++++++-
 kernel/sched/fair.c      |  9 +++++++--
 kernel/sched/idle.c      |  2 +-
 kernel/sched/rt.c        |  8 +++++++-
 kernel/sched/sched.h     | 10 ++++------
 kernel/sched/stop_task.c |  2 +-
 7 files changed, 45 insertions(+), 31 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b8871449d3c69..90267454077ea 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5972,10 +5972,9 @@ static inline void schedule_debug(struct task_struct *prev, bool preempt)
 	schedstat_inc(this_rq()->sched_count);
 }
 
-static void prev_balance(struct rq *rq, struct task_struct *prev,
-			 struct rq_flags *rf)
+static void prev_balance(struct rq *rq, struct rq_flags *rf)
 {
-	const struct sched_class *start_class = prev->sched_class;
+	const struct sched_class *start_class = rq->donor->sched_class;
 	const struct sched_class *class;
 
 	/*
@@ -5987,7 +5986,7 @@ static void prev_balance(struct rq *rq, struct task_struct *prev,
 	 * a runnable task of @class priority or higher.
 	 */
 	for_active_class_range(class, start_class, &idle_sched_class) {
-		if (class->balance && class->balance(rq, prev, rf))
+		if (class->balance && class->balance(rq, rf))
 			break;
 	}
 }
@@ -5996,7 +5995,7 @@ static void prev_balance(struct rq *rq, struct task_struct *prev,
  * Pick up the highest-prio task:
  */
 static inline struct task_struct *
-__pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
+__pick_next_task(struct rq *rq, struct rq_flags *rf)
 	__must_hold(__rq_lockp(rq))
 {
 	const struct sched_class *class;
@@ -6013,28 +6012,28 @@ __pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	 * higher scheduling class, because otherwise those lose the
 	 * opportunity to pull in more work from other CPUs.
 	 */
-	if (likely(!sched_class_above(prev->sched_class, &fair_sched_class) &&
+	if (likely(!sched_class_above(rq->donor->sched_class, &fair_sched_class) &&
 		   rq->nr_running == rq->cfs.h_nr_queued)) {
 
-		p = pick_next_task_fair(rq, prev, rf);
+		p = pick_next_task_fair(rq, rf);
 		if (unlikely(p == RETRY_TASK))
 			goto restart;
 
 		/* Assume the next prioritized class is idle_sched_class */
 		if (!p) {
 			p = pick_task_idle(rq, rf);
-			put_prev_set_next_task(rq, prev, p);
+			put_prev_set_next_task(rq, rq->donor, p);
 		}
 
 		return p;
 	}
 
 restart:
-	prev_balance(rq, prev, rf);
+	prev_balance(rq, rf);
 
 	for_each_active_class(class) {
 		if (class->pick_next_task) {
-			p = class->pick_next_task(rq, prev, rf);
+			p = class->pick_next_task(rq, rf);
 			if (unlikely(p == RETRY_TASK))
 				goto restart;
 			if (p)
@@ -6044,7 +6043,7 @@ __pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 			if (unlikely(p == RETRY_TASK))
 				goto restart;
 			if (p) {
-				put_prev_set_next_task(rq, prev, p);
+				put_prev_set_next_task(rq, rq->donor, p);
 				return p;
 			}
 		}
@@ -6097,7 +6096,7 @@ extern void task_vruntime_update(struct rq *rq, struct task_struct *p, bool in_f
 static void queue_core_balance(struct rq *rq);
 
 static struct task_struct *
-pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
+pick_next_task(struct rq *rq, struct rq_flags *rf)
 	__must_hold(__rq_lockp(rq))
 {
 	struct task_struct *next, *p, *max;
@@ -6110,7 +6109,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	bool need_sync;
 
 	if (!sched_core_enabled(rq))
-		return __pick_next_task(rq, prev, rf);
+		return __pick_next_task(rq, rf);
 
 	cpu = cpu_of(rq);
 
@@ -6123,7 +6122,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 		 */
 		rq->core_pick = NULL;
 		rq->core_dl_server = NULL;
-		return __pick_next_task(rq, prev, rf);
+		return __pick_next_task(rq, rf);
 	}
 
 	/*
@@ -6147,7 +6146,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 		goto out_set_next;
 	}
 
-	prev_balance(rq, prev, rf);
+	prev_balance(rq, rf);
 
 	smt_mask = cpu_smt_mask(cpu);
 	need_sync = !!rq->core->core_cookie;
@@ -6329,7 +6328,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	}
 
 out_set_next:
-	put_prev_set_next_task(rq, prev, next);
+	put_prev_set_next_task(rq, rq->donor, next);
 	if (rq->core->core_forceidle_count && next == rq->idle)
 		queue_core_balance(rq);
 
@@ -6552,10 +6551,10 @@ static inline void sched_core_cpu_deactivate(unsigned int cpu) {}
 static inline void sched_core_cpu_dying(unsigned int cpu) {}
 
 static struct task_struct *
-pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
+pick_next_task(struct rq *rq, struct rq_flags *rf)
 	__must_hold(__rq_lockp(rq))
 {
-	return __pick_next_task(rq, prev, rf);
+	return __pick_next_task(rq, rf);
 }
 
 #endif /* !CONFIG_SCHED_CORE */
@@ -7102,7 +7101,7 @@ static void __sched notrace __schedule(int sched_mode)
 
 pick_again:
 	assert_balance_callbacks_empty(rq);
-	next = pick_next_task(rq, rq->donor, &rf);
+	next = pick_next_task(rq, &rf);
 	rq->next_class = next->sched_class;
 	if (sched_proxy_exec()) {
 		struct task_struct *prev_donor = rq->donor;
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index edca7849b165d..f07a888314450 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2506,8 +2506,14 @@ static void check_preempt_equal_dl(struct rq *rq, struct task_struct *p)
 	resched_curr(rq);
 }
 
-static int balance_dl(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
+static int balance_dl(struct rq *rq, struct rq_flags *rf)
 {
+	/*
+	 * Note, rq->donor may change during rq lock drops,
+	 * so don't re-use prev across lock drops
+	 */
+	struct task_struct *p = rq->donor;
+
 	if (!on_dl_rq(&p->dl) && need_pull_dl_task(rq, p)) {
 		/*
 		 * This is OK, because current is on_cpu, which avoids it being
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3ebec186f9823..9fb2b2ec1a1ec 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9231,14 +9231,19 @@ static void __set_next_task_fair(struct rq *rq, struct task_struct *p, bool firs
 static void set_next_task_fair(struct rq *rq, struct task_struct *p, bool first);
 
 struct task_struct *
-pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
+pick_next_task_fair(struct rq *rq, struct rq_flags *rf)
 	__must_hold(__rq_lockp(rq))
 {
 	struct sched_entity *se;
-	struct task_struct *p;
+	struct task_struct *p, *prev;
 	int new_tasks;
 
 again:
+	/*
+	 * Re-read rq->donor at the top as it may have
+	 * changed across a rq lock drop
+	 */
+	prev = rq->donor;
 	p = pick_task_fair(rq, rf);
 	if (!p)
 		goto idle;
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index a83be0c834ddb..ff39120d723a9 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -462,7 +462,7 @@ select_task_rq_idle(struct task_struct *p, int cpu, int flags)
 }
 
 static int
-balance_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
+balance_idle(struct rq *rq, struct rq_flags *rf)
 {
 	return WARN_ON_ONCE(1);
 }
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 4ee8faf01441a..3c5f37c858b60 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1596,8 +1596,14 @@ static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p)
 	resched_curr(rq);
 }
 
-static int balance_rt(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
+static int balance_rt(struct rq *rq, struct rq_flags *rf)
 {
+	/*
+	 * Note, rq->donor may change during rq lock drops,
+	 * so don't re-use p across lock drops
+	 */
+	struct task_struct *p = rq->donor;
+
 	if (!on_rt_rq(&p->rt) && need_pull_rt_task(rq, p)) {
 		/*
 		 * This is OK, because current is on_cpu, which avoids it being
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9f63b15d309d1..2b3a97735efeb 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2561,7 +2561,7 @@ struct sched_class {
 	/*
 	 * schedule/pick_next_task/prev_balance: rq->lock
 	 */
-	int (*balance)(struct rq *rq, struct task_struct *prev, struct rq_flags *rf);
+	int (*balance)(struct rq *rq, struct rq_flags *rf);
 
 	/*
 	 * schedule/pick_next_task: rq->lock
@@ -2572,12 +2572,11 @@ struct sched_class {
 	 *
 	 *   next = pick_task();
 	 *   if (next) {
-	 *       put_prev_task(prev);
+	 *       put_prev_task(rq->donor);
 	 *       set_next_task_first(next);
 	 *   }
 	 */
-	struct task_struct *(*pick_next_task)(struct rq *rq, struct task_struct *prev,
-					      struct rq_flags *rf);
+	struct task_struct *(*pick_next_task)(struct rq *rq, struct rq_flags *rf);
 
 	/*
 	 * sched_change:
@@ -2801,8 +2800,7 @@ static inline bool sched_fair_runnable(struct rq *rq)
 	return rq->cfs.nr_queued > 0;
 }
 
-extern struct task_struct *pick_next_task_fair(struct rq *rq, struct task_struct *prev,
-					       struct rq_flags *rf);
+extern struct task_struct *pick_next_task_fair(struct rq *rq, struct rq_flags *rf);
 extern struct task_struct *pick_task_idle(struct rq *rq, struct rq_flags *rf);
 
 #define SCA_CHECK 0x01
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index f95798baddebb..c909ca0d8c87c 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -16,7 +16,7 @@ select_task_rq_stop(struct task_struct *p, int cpu, int flags)
 }
 
 static int
-balance_stop(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
+balance_stop(struct rq *rq, struct rq_flags *rf)
 {
 	return sched_stop_runnable(rq);
 }
-- 
2.54.0.563.g4f69b47b94-goog

From nobody Fri Jun 12 22:47:00 2026
Date: Tue, 12 May 2026 02:56:12 +0000
In-Reply-To: <20260512025635.2840817-1-jstultz@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260512025635.2840817-1-jstultz@google.com>
X-Mailer: git-send-email 2.54.0.563.g4f69b47b94-goog
Message-ID: <20260512025635.2840817-3-jstultz@google.com>
Subject: [PATCH v29 2/9] sched: deadline: Add some helper variables to cleanup deadline logic
From: John Stultz
To: LKML
Cc: John Stultz, Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar,
 Juri Lelli, Vincent Guittot, Dietmar
 Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue,
 Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya,
 Xuewen Yan, K Prateek Nayak, Thomas Gleixner, Daniel Lezcano,
 Suleiman Souhlal, kuyo chang, hupu, kernel-team@android.com
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

As part of an improvement to handling pushable deadline tasks,
Peter suggested this cleanup[1], to use helper values for
dl_entity and dl_rq in the enqueue_task_dl() and
put_prev_task_dl() functions.

There should be no functional change from this patch.

To make sure this cleanup change doesn't obscure later logic
changes, I've split it into its own patch.

[1]: https://lore.kernel.org/lkml/20260304095123.GP606826@noisy.programming.kicks-ass.net/

Suggested-by: Peter Zijlstra
Signed-off-by: John Stultz
---
v29:
* Replace a missed p->dl usage with dl_se

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E.
McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 kernel/sched/deadline.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index f07a888314450..4aec446cd900c 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2292,7 +2292,10 @@ static void dequeue_dl_entity(struct sched_dl_entity *dl_se, int flags)
 
 static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 {
-	if (is_dl_boosted(&p->dl)) {
+	struct sched_dl_entity *dl_se = &p->dl;
+	struct dl_rq *dl_rq = &rq->dl;
+
+	if (is_dl_boosted(dl_se)) {
 		/*
 		 * Because of delays in the detection of the overrun of a
 		 * thread's runtime, it might be the case that a thread
@@ -2305,14 +2308,14 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 		 *
 		 * In this case, the boost overrides the throttle.
 		 */
-		if (p->dl.dl_throttled) {
+		if (dl_se->dl_throttled) {
 			/*
 			 * The replenish timer needs to be canceled. No
 			 * problem if it fires concurrently: boosted threads
 			 * are ignored in dl_task_timer().
 			 */
-			cancel_replenish_timer(&p->dl);
-			p->dl.dl_throttled = 0;
+			cancel_replenish_timer(dl_se);
+			dl_se->dl_throttled = 0;
 		}
 	} else if (!dl_prio(p->normal_prio)) {
 		/*
@@ -2324,7 +2327,7 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 		 * being boosted again with no means to replenish the runtime and clear
 		 * the throttle.
 		 */
-		p->dl.dl_throttled = 0;
+		dl_se->dl_throttled = 0;
 		if (!(flags & ENQUEUE_REPLENISH))
 			printk_deferred_once("sched: DL de-boosted task PID %d: REPLENISH flag missing\n",
 					     task_pid_nr(p));
@@ -2333,20 +2336,20 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 	}
 
 	check_schedstat_required();
-	update_stats_wait_start_dl(dl_rq_of_se(&p->dl), &p->dl);
+	update_stats_wait_start_dl(dl_rq, dl_se);
 
 	if (p->on_rq == TASK_ON_RQ_MIGRATING)
 		flags |= ENQUEUE_MIGRATING;
 
-	enqueue_dl_entity(&p->dl, flags);
+	enqueue_dl_entity(dl_se, flags);
 
-	if (dl_server(&p->dl))
+	if (dl_server(dl_se))
 		return;
 
 	if (task_is_blocked(p))
 		return;
 
-	if (!task_current(rq, p) && !p->dl.dl_throttled && p->nr_cpus_allowed > 1)
+	if (!task_current(rq, p) && !dl_se->dl_throttled && p->nr_cpus_allowed > 1)
 		enqueue_pushable_dl_task(rq, p);
 }
 
@@ -2643,7 +2646,7 @@ static void put_prev_task_dl(struct rq *rq, struct task_struct *p, struct task_s
 	struct sched_dl_entity *dl_se = &p->dl;
 	struct dl_rq *dl_rq = &rq->dl;
 
-	if (on_dl_rq(&p->dl))
+	if (on_dl_rq(dl_se))
 		update_stats_wait_start_dl(dl_rq, dl_se);
 
 	update_curr_dl(rq);
@@ -2653,7 +2656,7 @@ static void put_prev_task_dl(struct rq *rq, struct task_struct *p, struct task_s
 	if (task_is_blocked(p))
 		return;
 
-	if (on_dl_rq(&p->dl) && p->nr_cpus_allowed > 1)
+	if (on_dl_rq(dl_se) && p->nr_cpus_allowed > 1)
 		enqueue_pushable_dl_task(rq, p);
 }
 
-- 
2.54.0.563.g4f69b47b94-goog

From nobody Fri Jun 12 22:47:00 2026
Date: Tue, 12 May 2026 02:56:13 +0000
In-Reply-To: <20260512025635.2840817-1-jstultz@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260512025635.2840817-1-jstultz@google.com>
X-Mailer: git-send-email 2.54.0.563.g4f69b47b94-goog
Message-ID: <20260512025635.2840817-4-jstultz@google.com>
Subject: [PATCH v29 3/9] sched: deadline: Add dl_rq->curr pointer to address issues with Proxy Exec
From: John Stultz
To: LKML
Cc: John Stultz, Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar,
 Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
 Steven Rostedt, Ben Segall, Zimuzo
 Ezeozue, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney",
 Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner, Daniel Lezcano,
 Suleiman Souhlal, kuyo chang, hupu, kernel-team@android.com
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

The DL scheduler keeps the current task in the rbtree, since
the deadline value isn't usually changed while the task is
runnable.

This makes set_next_task() and put_prev_task() simpler, but
unfortunately it causes complexity elsewhere. Specifically,
when update_curr_dl() updates the deadline, it has to dequeue
and then enqueue the task. From put_prev_task_dl(), we first
call update_curr_dl(), and then call enqueue_pushable_dl_task().

However, with Proxy Exec this goes awry. When a mutex is
released, we might wake the waiting rq->donor. This will cause
put_prev_task() to be called on the donor to take it off the
cpu for return migration. At that point, from
put_prev_task_dl(), the update_curr_dl() logic will dequeue &
enqueue the task, and the enqueue function will call
enqueue_pushable_dl_task() (since the task_current() check
won't prevent it). Then, back up the call stack in
put_prev_task_dl(), we'll end up calling
enqueue_pushable_dl_task() again, tripping the
!RB_EMPTY_NODE(&p->pushable_dl_tasks) warning.

So to avoid this, use Peter's suggested[1] approach, and add a
dl_rq->curr pointer that is set/cleared from set_next_task()/
put_prev_task(), which effectively tracks the rq->donor. We
can then use this to avoid adding the active donor to the
pushable list from enqueue_task_dl().
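The double enqueue and the dl_rq->curr guard described above can be
sketched as a small userspace model in C. This is only an illustration
under stated assumptions, not the kernel implementation: struct
dl_entity_model, struct dl_rq_model and the four functions are
hypothetical names, the rbtree is reduced to a single flag, and the
update_curr_dl() dequeue/enqueue step is modeled as a direct call into
the enqueue path.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's definitions. */
struct dl_entity_model {
	bool on_pushable;	/* models being linked on the pushable tree */
};

struct dl_rq_model {
	struct dl_entity_model *curr;	/* models the new dl_rq->curr */
	int double_enqueues;		/* counts would-be warnings */
};

/* Models enqueue_pushable_dl_task(): a second insert is the bug. */
void enqueue_pushable(struct dl_rq_model *dl_rq, struct dl_entity_model *se)
{
	if (se->on_pushable) {
		dl_rq->double_enqueues++;	/* models the !RB_EMPTY_NODE warning */
		return;
	}
	se->on_pushable = true;
}

/* Models the tail of enqueue_task_dl(), with the guard switchable. */
void enqueue_task(struct dl_rq_model *dl_rq, struct dl_entity_model *se,
		  bool use_curr_guard)
{
	if (use_curr_guard && dl_rq->curr == se)
		return;		/* the active donor stays off the pushable list */
	enqueue_pushable(dl_rq, se);
}

/* Models set_next_task_dl(): leave the pushable list, become curr. */
void set_next_task(struct dl_rq_model *dl_rq, struct dl_entity_model *se)
{
	se->on_pushable = false;
	dl_rq->curr = se;
}

/*
 * Models put_prev_task_dl() on a donor: the update_curr_dl() step is
 * assumed to re-enqueue the task, then curr is cleared and the task
 * is added to the pushable list. Returns the number of double enqueues.
 */
int put_prev_task(struct dl_rq_model *dl_rq, struct dl_entity_model *se,
		  bool use_curr_guard)
{
	enqueue_task(dl_rq, se, use_curr_guard);	/* via update_curr_dl() */
	dl_rq->curr = NULL;
	enqueue_pushable(dl_rq, se);
	return dl_rq->double_enqueues;
}
```

With use_curr_guard set to false the model reports one double enqueue,
matching the warning described in the commit message; with the guard
enabled the task is added to the pushable list exactly once.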
[1]: https://lore.kernel.org/lkml/20260304095123.GP606826@noisy.programming.kicks-ass.net/

Suggested-by: Peter Zijlstra
Signed-off-by: John Stultz
---
Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 kernel/sched/deadline.c | 13 +++++++++++++
 kernel/sched/sched.h    |  1 +
 2 files changed, 14 insertions(+)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 4aec446cd900c..2033b58d3c741 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2349,6 +2349,9 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 	if (task_is_blocked(p))
 		return;
 
+	if (dl_rq->curr == dl_se)
+		return;
+
 	if (!task_current(rq, p) && !dl_se->dl_throttled && p->nr_cpus_allowed > 1)
 		enqueue_pushable_dl_task(rq, p);
 }
@@ -2571,6 +2574,10 @@ static void start_hrtick_dl(struct rq *rq, struct sched_dl_entity *dl_se)
 }
 #endif /* !CONFIG_SCHED_HRTICK */
 
+/*
+ * DL keeps current in tree, because ->deadline is not typically changed while
+ * a task is runnable.
+ */
 static void set_next_task_dl(struct rq *rq, struct task_struct *p, bool first)
 {
 	struct sched_dl_entity *dl_se = &p->dl;
@@ -2583,6 +2590,9 @@ static void set_next_task_dl(struct rq *rq, struct task_struct *p, bool first)
 	/* You can't push away the running task */
 	dequeue_pushable_dl_task(rq, p);
 
+	WARN_ON_ONCE(dl_rq->curr);
+	dl_rq->curr = dl_se;
+
 	if (!first)
 		return;
 
@@ -2653,6 +2663,9 @@ static void put_prev_task_dl(struct rq *rq, struct task_struct *p, struct task_s
 
 	update_dl_rq_load_avg(rq_clock_pelt(rq), rq, 1);
 
+	WARN_ON_ONCE(dl_rq->curr != dl_se);
+	dl_rq->curr = NULL;
+
 	if (task_is_blocked(p))
 		return;
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2b3a97735efeb..d0f3e164a61d2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -889,6 +889,7 @@ struct dl_rq {
 
 	bool overloaded;
 
+	struct sched_dl_entity *curr;
 	/*
 	 * Tasks on this rq that can be pushed away. They are kept in
 	 * an rb-tree, ordered by tasks' deadlines, with caching
-- 
2.54.0.563.g4f69b47b94-goog

From nobody Fri Jun 12 22:47:00 2026 Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7167337BE8E for ; Tue, 12 May 2026 02:56:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778554610; cv=none; b=Kvl2CTxPIo0Fd5Kf17pcpS02OrsiPLxoJal0lkzQOS97qKnBkud52kQCrBD4MUH7JHru3rjh4Bq+TUUM3SAWE1ddZ2gAOQd2OAyeESt+pnnEYR6s8WBMCJUw0jvhw/J7yRmrAnWYQizDeaFD2zXeVVt61QgOp1qE5deYcI67vWA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778554610; c=relaxed/simple; bh=4jlq+XUvXO9yYL4QCOBG2vgh1Bbl211AsG6vwc7wVSE=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; 
b=cQj4RuBQyL7dQaTRJzODsajYV/UrwxKZbTdr2LN+ONwvU6riqWg+AJDMTWPwZ8a7q60lyD4/NaiMltz8KJnnVe3ee5e2I7yt9DsB87S39X3Pnbqr0S3QLemozVssJe4K7UU60EUbxYZvyEMkGCxfXQms+Ku3zME0uGkvTgI0M4I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=KXglkY8E; arc=none smtp.client-ip=209.85.216.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="KXglkY8E" Received: by mail-pj1-f73.google.com with SMTP id 98e67ed59e1d1-368b15eeb3bso747520a91.2 for ; Mon, 11 May 2026 19:56:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20251104; t=1778554608; x=1779159408; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=AS46FFpJxT5bSe9nNRpsnj5UbRCBsW10mrd9YZXtxxA=; b=KXglkY8EqEcXW2mEOSujC9st64Jsb9uwJjPGXK87S+OnEfbF0o7BYLjy+WpGqH3gwN PlZ4IUN0v7+CkM4Ukstkzl0n12iV9n0VNjSzAp91uCeEsP/ZxsHYeDcFghl5ZRKqPMYI 0Xf3Lr+mf68MCPSNfpMStKnJ8h3wqxtDPnqmrXtmrwJ5N7LnsIvOMmw7oPwETmrMsJBD q9ttkN2Ee4BNporqi6YjnBg6Vzao0vqtOz9xfLi7Qn0l91Sj9MkkYOZ3p36O5E+cEHzA qQaayJtmrw9pRIa8lIT5J3KiRV8HNSHofIO1AxRk0Ba8JWOFvTacrPgmryHYiKxDhv09 LQUg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1778554608; x=1779159408; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=AS46FFpJxT5bSe9nNRpsnj5UbRCBsW10mrd9YZXtxxA=; b=ppnv1ySjUGmz1o3Hzw/9QaOsdo/GlIfZhoHrDH9sVS4wiWZDWRtSTqNzp67KBC1iIy 
fUFNG//nTqnqgpDtJj6k4lzUqumJUMCbC1mWC0Q7XHu6pckHbMqqe85HIn9pQj+0s/QK KoVrLY6jtaateH69g+2keixsu889q3S1wFYu6B4I4anaqC4o2AJLdVKj5//b6h9ISQa3 IxylTVyDfPP8raJrvtAHdhq8SRzVu5O1DdfmIutN83xf4rBEAVMalBujeA48XSa/YR9X LPP9M1yYD1l8xgX1PzkDnS1355AMid7/eu3i2w8NmssfBrkfu+iTvLYfQsp6JuRLjFU1 IzZA== X-Gm-Message-State: AOJu0Yyq9x5tySnK9oJ74Oy1Cyo9m192LCTdR+ghNcJDOatLMyPM+e2Q gOnoDuqXfS8AGC2sxBWmLdpOw9Kl7h2BwcJQOYhUaVsoP+7UCTaNDWZph3uWg5tkgmclO6suZqG ojp/+ZcrM/U5xCXPA3MQLkEACdaXjn1m2yJ6gpqMvgZ0wV5IeBDVQ1QZ9wWfpga5ZnhCXd1bIKd 8hEg/7bZsjgIz69d0u/nDPOwoQsO3a2mVMYcjSsvRemOZdsjGA X-Received: from pgbbc18.prod.google.com ([2002:a65:6d92:0:b0:c73:7b68:90d9]) (user=jstultz job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6a20:7345:b0:3a3:a9b1:e12f with SMTP id adf61e73a8af0-3aad430788dmr13961413637.21.1778554607239; Mon, 11 May 2026 19:56:47 -0700 (PDT) Date: Tue, 12 May 2026 02:56:14 +0000 In-Reply-To: <20260512025635.2840817-1-jstultz@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260512025635.2840817-1-jstultz@google.com> X-Mailer: git-send-email 2.54.0.563.g4f69b47b94-goog Message-ID: <20260512025635.2840817-5-jstultz@google.com> Subject: [PATCH v29 4/9] sched: Rework block_task so it can be directly called From: John Stultz To: LKML Cc: John Stultz , Peter Zijlstra , Joel Fernandes , Qais Yousef , Ingo Molnar , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Will Deacon , Waiman Long , Boqun Feng , "Paul E. 
McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , Suleiman Souhlal , kuyo chang , hupu , kernel-team@android.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Pull most of the logic out of try_to_block_task() and put it into block_task() directly, so that we can call block_task() and not have to worry about the failing cases in try_to_block_task() Suggested-by: Peter Zijlstra Signed-off-by: John Stultz --- Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: Suleiman Souhlal Cc: kuyo chang Cc: hupu Cc: kernel-team@android.com --- kernel/sched/core.c | 45 +++++++++++++++++++++++---------------------- 1 file changed, 23 insertions(+), 22 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 90267454077ea..7440bd7c05ca8 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2223,8 +2223,29 @@ void deactivate_task(struct rq *rq, struct task_stru= ct *p, int flags) dequeue_task(rq, p, flags); } =20 -static void block_task(struct rq *rq, struct task_struct *p, int flags) +static void block_task(struct rq *rq, struct task_struct *p, unsigned long= task_state) { + int flags =3D DEQUEUE_NOCLOCK; + + p->sched_contributes_to_load =3D + (task_state & TASK_UNINTERRUPTIBLE) && + !(task_state & TASK_NOLOAD) && + !(task_state & TASK_FROZEN); + + if (unlikely(is_special_task_state(task_state))) + flags |=3D DEQUEUE_SPECIAL; + + /* + * __schedule() ttwu() + * prev_state =3D prev->state; if (p->on_rq && ...) 
+ * if (prev_state) goto out; + * p->on_rq =3D 0; smp_acquire__after_ctrl_dep(); + * p->state =3D TASK_WAKING + * + * Where __schedule() and ttwu() have matching control dependencies. + * + * After this, schedule() must not care about p->state any more. + */ if (dequeue_task(rq, p, DEQUEUE_SLEEP | flags)) __block_task(rq, p); } @@ -6582,7 +6603,6 @@ static bool try_to_block_task(struct rq *rq, struct t= ask_struct *p, unsigned long *task_state_p, bool should_block) { unsigned long task_state =3D *task_state_p; - int flags =3D DEQUEUE_NOCLOCK; =20 if (signal_pending_state(task_state, p)) { WRITE_ONCE(p->__state, TASK_RUNNING); @@ -6602,26 +6622,7 @@ static bool try_to_block_task(struct rq *rq, struct = task_struct *p, if (!should_block) return false; =20 - p->sched_contributes_to_load =3D - (task_state & TASK_UNINTERRUPTIBLE) && - !(task_state & TASK_NOLOAD) && - !(task_state & TASK_FROZEN); - - if (unlikely(is_special_task_state(task_state))) - flags |=3D DEQUEUE_SPECIAL; - - /* - * __schedule() ttwu() - * prev_state =3D prev->state; if (p->on_rq && ...) - * if (prev_state) goto out; - * p->on_rq =3D 0; smp_acquire__after_ctrl_dep(); - * p->state =3D TASK_WAKING - * - * Where __schedule() and ttwu() have matching control dependencies. - * - * After this, schedule() must not care about p->state any more. 
- */ - block_task(rq, p, flags); + block_task(rq, p, task_state); return true; } =20 --=20 2.54.0.563.g4f69b47b94-goog From nobody Fri Jun 12 22:47:00 2026 Received: from mail-pl1-f202.google.com (mail-pl1-f202.google.com [209.85.214.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 093C837C0FF for ; Tue, 12 May 2026 02:56:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778554613; cv=none; b=JkMQ9MNY+D3EvtqvkZq0vVcE/Ft+TQ0K9PxEp/KnurLdXoVDCSasAxVU54BMZbKnYlSVGK4d1iQqo6fO6oIKNAnZQKd8dPkitXbA3IJ4fdsz09M0POK4QcI6YK4pYhjDL9pbTmNfGaw79Ug3VRWKh5kJLN64KYx991pnbdv2jzM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778554613; c=relaxed/simple; bh=P9W67kzoztNKAztCiEsPiO7oHtYdlTccP/9qGh49THY=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=edhKUtgwTpI7LiKyanwpyxmHlXnH+GKbLe7tIXKWwB+bI3KO+747zSbveQ33qut0U9bA7f1EClxkt+Prrl1CNYqU5VxDYWoVJZlSe8dRv8XZLl71RHTdyIzA5FRxzn7H+yo2hGJ15r40MI/xl0DIPGDcBTlCWsMb0hJEfcfJKMc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=DzZ7zlC2; arc=none smtp.client-ip=209.85.214.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="DzZ7zlC2" Received: by mail-pl1-f202.google.com with SMTP id d9443c01a7336-2b2ecc96a9aso52512825ad.1 for ; Mon, 11 May 2026 19:56:48 
-0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20251104; t=1778554608; x=1779159408; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=4DQyEb3aqqwRBoQmDWLLJhtTNMKIg01JJoupwb/G4L4=; b=DzZ7zlC2ejJ57M0KM8R8xWkPLvcWQnsqiZVuxvRRhRGQiRKaXV1p5D48gnCQ0P1ly3 /LaIViSBi+NkDaPJwgy2qnbkGPz6IBpj7F6gOpGqVE0hpyinN0xxmHYMoVR+rn6pj6hu sveSMsuE4QOwo9Opy/oiP9XaglR1G3BSj5cXqbVtSO4a+GIYuiHX+rH14VGKh5ccsdPm XHs9Rkv/iMM2XMbZ3uLqfMXdH7NFR3uhxNAl6bPczUQ994DJl7d3tCQCoycL7TySTjJ+ a+EpozQH2gHUGGGohC0VG1cmiomrGxtVcJYbCyo9fc3A8L9Dx8StoXzYNxDtwZHeroHe +Rmg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1778554608; x=1779159408; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=4DQyEb3aqqwRBoQmDWLLJhtTNMKIg01JJoupwb/G4L4=; b=BuEuG0/HowNoSRVddStJpOM0ysX0ZSFCRE/QJtjWJFqED+grXhNCmeUsgQ016NItGM 2tfsNB3o1h+DaEPNk6kKAkV0oe7Lh33vKLJSa3zhbqCW/zMGiHk9d2UUC6ldG0+tVbhc XAE033m2mGek/WF1jz3IZCOav7xh06pVPy9WQUab9wquU+XBbe8ILkGUN6tyUm6u37/j 8GGZThGcRSU55z0QfhkeymnKQTj67G+PTl643KDD0ScRzQ5co5OpM7ZHIpiPM5LK/fmN LEN1SzJCO6X5B3YcXXnEAAwhjscl4ZMyezSInvn7fmTBrVuu1xH6NNo0XBcN2/3XtW+k erYg== X-Gm-Message-State: AOJu0Ywd2+MDIoT0KYDCKjOyRY/5N4DA/i7JZxMG6pwWkYV2VOBIAfD0 ABvL01FxxAOTxIeG0O89eWhaG+Bryvw+7IGHyc2AA3wigEqnZgLfJFEeB+26iUz4PSKPJoA9dEv KlR0liMWsCRGF25C6R5s7CKQ1V5C+KEGB65iya59kJnlp9zET18EQ8R3b8Hur7xsnw3AsGVJgxp 0kdnFE3A+HIKDqI4SwfMPxw690PTM3cK4GlueXN3N1TffO2Pnh X-Received: from plko11.prod.google.com ([2002:a17:902:6b0b:b0:2b2:a721:3e36]) (user=jstultz job=prod-delivery.src-stubby-dispatcher) by 2002:a17:903:3c68:b0:2bc:db02:d1e6 with SMTP id d9443c01a7336-2bd012d6d00mr12089805ad.38.1778554608018; Mon, 11 May 2026 19:56:48 -0700 (PDT) Date: Tue, 12 May 2026 02:56:15 +0000 In-Reply-To: <20260512025635.2840817-1-jstultz@google.com> Precedence: bulk 
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
Mime-Version: 1.0
References: <20260512025635.2840817-1-jstultz@google.com>
X-Mailer: git-send-email 2.54.0.563.g4f69b47b94-goog
Message-ID: <20260512025635.2840817-6-jstultz@google.com>
Subject: [PATCH v29 5/9] sched: Have try_to_wake_up() handle return-migration for PROXY_WAKING case
From: John Stultz
To: LKML
Cc: John Stultz , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , Suleiman Souhlal , kuyo chang , hupu , kernel-team@android.com
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Add logic so try_to_wake_up() will notice if we are waking a task
where blocked_on =3D=3D PROXY_WAKING, and if necessary dequeue the
task so the wakeup will naturally return-migrate the donor task back
to a cpu it can run on.

This helps performance, as we do the dequeue and wakeup under the
locks normally taken in try_to_wake_up(), and it avoids having to do
proxy_force_return() from __schedule(), which has to re-take similar
locks and then force a pick-again loop.

This was split out from the larger proxy patch, and significantly
reworked.
Credits for the original patch go to:
  Peter Zijlstra (Intel)
  Juri Lelli
  Valentin Schneider
  Connor O'Brien

Signed-off-by: John Stultz
---
v24:
* Reworked proxy_needs_return() so it's less nested as suggested
  by K Prateek
* Switch to using block_task with DEQUEUE_SPECIAL as suggested by
  K Prateek
* Fix edge case to reset wake_cpu if select_task_rq() chooses the
  current rq and we skip set_task_cpu()
v26:
* Handle both blocked and PROXY_WAKING tasks in
  proxy_needs_return(), as suggested by K Prateek
* Try to handle signal edge case in ttwu that K Prateek pointed out
v27:
* Integrate simplifications to proxy_needs_return() suggested by
  K Prateek
* Rework ttwu_runnable() to align with
  ACQUIRE(__task_rq_lock, guard)(p) usage as suggested by Peter
* Major rework suggested by Peter to get rid of proxy_force_return()
  completely, using proxy_deactivate() and allowing ttwu to handle
  all the return migration. Lots of helpful improvements suggested
  by K Prateek are included here as well.
v28:
* Folded in change suggested by K Prateek to introduce
  proxy_reset_donor() to reset the donor to the current task when
  the donor is woken up.
* Drop an unnecessary PROXY_WAKING assignment which was noted by
  K Prateek
v29:
* Use scoped_guard in proxy_needs_return() to avoid block_task being
  called with blocked_lock held.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E.
McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: Suleiman Souhlal Cc: kuyo chang Cc: hupu Cc: kernel-team@android.com --- include/linux/sched.h | 2 +- kernel/sched/core.c | 195 +++++++++++++++++++++--------------------- 2 files changed, 97 insertions(+), 100 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 368c7b4d7cb51..5b68a1c9eedcf 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -161,7 +161,7 @@ struct user_event_mm; */ #define is_special_task_state(state) \ ((state) & (__TASK_STOPPED | __TASK_TRACED | TASK_PARKED | \ - TASK_DEAD | TASK_FROZEN)) + TASK_DEAD | TASK_WAKING | TASK_FROZEN)) =20 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP # define debug_normal_state_change(state_value) \ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 7440bd7c05ca8..633dd5b8428e5 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3722,6 +3722,53 @@ void update_rq_avg_idle(struct rq *rq) rq->idle_stamp =3D 0; } =20 +#ifdef CONFIG_SCHED_PROXY_EXEC +static void zap_balance_callbacks(struct rq *rq); + +static inline void proxy_reset_donor(struct rq *rq) +{ + WARN_ON_ONCE(rq->donor =3D=3D rq->curr); + + put_prev_set_next_task(rq, rq->donor, rq->curr); + rq_set_donor(rq, rq->curr); + zap_balance_callbacks(rq); + resched_curr(rq); +} + +/* + * Checks to see if task p has been proxy-migrated to another rq + * and needs to be returned. 
If so, we deactivate the task here + * so that it can be properly woken up on the p->wake_cpu + * (or whichever cpu select_task_rq() picks at the bottom of + * try_to_wake_up() + */ +static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p) +{ + if (!task_is_blocked(p)) + return false; + + scoped_guard(raw_spinlock, &p->blocked_lock) { + /* Task is waking up; clear any blocked_on relationship */ + __clear_task_blocked_on(p, NULL); + + /* If already current, don't need to return migrate */ + if (task_current(rq, p)) + return false; + + /* If we're return migrating the rq->donor, switch it out for idle */ + if (task_current_donor(rq, p)) + proxy_reset_donor(rq); + } + block_task(rq, p, TASK_WAKING); + return true; +} +#else /* !CONFIG_SCHED_PROXY_EXEC */ +static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p) +{ + return false; +} +#endif /* CONFIG_SCHED_PROXY_EXEC */ + static void ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags, struct rq_flags *rf) @@ -3786,28 +3833,26 @@ ttwu_do_activate(struct rq *rq, struct task_struct = *p, int wake_flags, */ static int ttwu_runnable(struct task_struct *p, int wake_flags) { - struct rq_flags rf; - struct rq *rq; - int ret =3D 0; + ACQUIRE(__task_rq_lock, guard)(p); + struct rq *rq =3D guard.rq; =20 - rq =3D __task_rq_lock(p, &rf); - if (task_on_rq_queued(p)) { - update_rq_clock(rq); - if (p->se.sched_delayed) - enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED); - if (!task_on_cpu(rq, p)) { - /* - * When on_rq && !on_cpu the task is preempted, see if - * it should preempt the task that is current now. 
- */ - wakeup_preempt(rq, p, wake_flags); - } - ttwu_do_wakeup(p); - ret =3D 1; - } - __task_rq_unlock(rq, p, &rf); + if (!task_on_rq_queued(p)) + return 0; =20 - return ret; + update_rq_clock(rq); + if (p->se.sched_delayed) + enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED); + if (proxy_needs_return(rq, p)) + return 0; + if (!task_on_cpu(rq, p)) { + /* + * When on_rq && !on_cpu the task is preempted, see if + * it should preempt the task that is current now. + */ + wakeup_preempt(rq, p, wake_flags); + } + ttwu_do_wakeup(p); + return 1; } =20 void sched_ttwu_pending(void *arg) @@ -4194,6 +4239,8 @@ int try_to_wake_up(struct task_struct *p, unsigned in= t state, int wake_flags) * it disabling IRQs (this allows not taking ->pi_lock). */ WARN_ON_ONCE(p->se.sched_delayed); + /* If p is current, we know we can run here, so clear blocked_on */ + clear_task_blocked_on(p, NULL); if (!ttwu_state_match(p, state, &success)) goto out; =20 @@ -4210,6 +4257,7 @@ int try_to_wake_up(struct task_struct *p, unsigned in= t state, int wake_flags) */ scoped_guard (raw_spinlock_irqsave, &p->pi_lock) { smp_mb__after_spinlock(); + if (!ttwu_state_match(p, state, &success)) break; =20 @@ -4274,6 +4322,14 @@ int try_to_wake_up(struct task_struct *p, unsigned i= nt state, int wake_flags) */ WRITE_ONCE(p->__state, TASK_WAKING); =20 + /* + * We never clear the blocked_on relation on proxy_deactivate. + * If we don't clear it here, we have TASK_RUNNING + p->blocked_on + * when waking up. Since this is a fully blocked, off CPU task + * waking up, it should be safe to clear the blocked_on relation. 
+ */ + if (task_is_blocked(p)) + clear_task_blocked_on(p, NULL); /* * If the owning (remote) CPU is still in the middle of schedule() with * this task as prev, considering queueing p on the remote CPUs wake_list @@ -4318,6 +4374,16 @@ int try_to_wake_up(struct task_struct *p, unsigned i= nt state, int wake_flags) wake_flags |=3D WF_MIGRATED; psi_ttwu_dequeue(p); set_task_cpu(p, cpu); + } else if (cpu !=3D p->wake_cpu) { + /* + * If we were proxy-migrated to cpu, then + * select_task_rq() picks cpu instead of wake_cpu + * to return to, we won't call set_task_cpu(), + * leaving a stale wake_cpu pointing to where we + * proxy-migrated from. So just fixup wake_cpu here + * if its not correct + */ + p->wake_cpu =3D cpu; } =20 ttwu_queue(p, cpu, wake_flags); @@ -6607,7 +6673,7 @@ static bool try_to_block_task(struct rq *rq, struct t= ask_struct *p, if (signal_pending_state(task_state, p)) { WRITE_ONCE(p->__state, TASK_RUNNING); *task_state_p =3D TASK_RUNNING; - set_task_blocked_on_waking(p, NULL); + clear_task_blocked_on(p, NULL); =20 return false; } @@ -6650,13 +6716,11 @@ static inline struct task_struct *proxy_resched_idl= e(struct rq *rq) return rq->idle; } =20 -static bool proxy_deactivate(struct rq *rq, struct task_struct *donor) +static void proxy_deactivate(struct rq *rq, struct task_struct *donor) { unsigned long state =3D READ_ONCE(donor->__state); =20 - /* Don't deactivate if the state has been changed to TASK_RUNNING */ - if (state =3D=3D TASK_RUNNING) - return false; + WARN_ON_ONCE(state =3D=3D TASK_RUNNING); /* * Because we got donor from pick_next_task(), it is *crucial* * that we call proxy_resched_idle() before we deactivate it. @@ -6667,7 +6731,7 @@ static bool proxy_deactivate(struct rq *rq, struct ta= sk_struct *donor) * need to be changed from next *before* we deactivate. 
*/ proxy_resched_idle(rq); - return try_to_block_task(rq, donor, &state, true); + block_task(rq, donor, state); } =20 static inline void proxy_release_rq_lock(struct rq *rq, struct rq_flags *r= f) @@ -6741,71 +6805,6 @@ static void proxy_migrate_task(struct rq *rq, struct= rq_flags *rf, proxy_reacquire_rq_lock(rq, rf); } =20 -static void proxy_force_return(struct rq *rq, struct rq_flags *rf, - struct task_struct *p) - __must_hold(__rq_lockp(rq)) -{ - struct rq *task_rq, *target_rq =3D NULL; - int cpu, wake_flag =3D WF_TTWU; - - lockdep_assert_rq_held(rq); - WARN_ON(p =3D=3D rq->curr); - - if (p =3D=3D rq->donor) - proxy_resched_idle(rq); - - proxy_release_rq_lock(rq, rf); - /* - * We drop the rq lock, and re-grab task_rq_lock to get - * the pi_lock (needed for select_task_rq) as well. - */ - scoped_guard (task_rq_lock, p) { - task_rq =3D scope.rq; - - /* - * Since we let go of the rq lock, the task may have been - * woken or migrated to another rq before we got the - * task_rq_lock. So re-check we're on the same RQ. If - * not, the task has already been migrated and that CPU - * will handle any futher migrations. - */ - if (task_rq !=3D rq) - break; - - /* - * Similarly, if we've been dequeued, someone else will - * wake us - */ - if (!task_on_rq_queued(p)) - break; - - /* - * Since we should only be calling here from __schedule() - * -> find_proxy_task(), no one else should have - * assigned current out from under us. But check and warn - * if we see this, then bail. 
- */ - if (task_current(task_rq, p) || task_on_cpu(task_rq, p)) { - WARN_ONCE(1, "%s rq: %i current/on_cpu task %s %d on_cpu: %i\n", - __func__, cpu_of(task_rq), - p->comm, p->pid, p->on_cpu); - break; - } - - update_rq_clock(task_rq); - deactivate_task(task_rq, p, DEQUEUE_NOCLOCK); - cpu =3D select_task_rq(p, p->wake_cpu, &wake_flag); - set_task_cpu(p, cpu); - target_rq =3D cpu_rq(cpu); - clear_task_blocked_on(p, NULL); - } - - if (target_rq) - attach_one_task(target_rq, p); - - proxy_reacquire_rq_lock(rq, rf); -} - /* * Find runnable lock owner to proxy for mutex blocked donor * @@ -6841,7 +6840,7 @@ find_proxy_task(struct rq *rq, struct task_struct *do= nor, struct rq_flags *rf) clear_task_blocked_on(p, PROXY_WAKING); return p; } - goto force_return; + goto deactivate; } =20 /* @@ -6876,7 +6875,7 @@ find_proxy_task(struct rq *rq, struct task_struct *do= nor, struct rq_flags *rf) __clear_task_blocked_on(p, NULL); return p; } - goto force_return; + goto deactivate; } =20 if (!READ_ONCE(owner->on_rq) || owner->se.sched_delayed) { @@ -6955,12 +6954,7 @@ find_proxy_task(struct rq *rq, struct task_struct *d= onor, struct rq_flags *rf) return owner; =20 deactivate: - if (proxy_deactivate(rq, donor)) - return NULL; - /* If deactivate fails, force return */ - p =3D donor; -force_return: - proxy_force_return(rq, rf, p); + proxy_deactivate(rq, p); return NULL; migrate_task: proxy_migrate_task(rq, rf, p, owner_cpu); @@ -7107,6 +7101,9 @@ static void __sched notrace __schedule(int sched_mode) if (sched_proxy_exec()) { struct task_struct *prev_donor =3D rq->donor; =20 + if (!prev_state && prev->blocked_on) + clear_task_blocked_on(prev, NULL); + rq_set_donor(rq, next); if (unlikely(next->blocked_on)) { next =3D find_proxy_task(rq, next, &rf); --=20 2.54.0.563.g4f69b47b94-goog From nobody Fri Jun 12 22:47:00 2026 Received: from mail-pf1-f202.google.com (mail-pf1-f202.google.com [209.85.210.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client 
certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E53CA37BE91 for ; Tue, 12 May 2026 02:56:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778554618; cv=none; b=kfuOnJ88O29ey/LJuMFSb3Ni8NDFothhpL9FSbli2RE3CkF4cEwHX6DEq2aL2OnzPP0Uw2/SwsVKPMZNpw2UDCrGT1imZCx6uBReabDIwoxwsh+9bRnWQiHEZ0ulwymCgnsXsSWN5yXl/v8YqQEXLcTVTNpOf3V+pnW767MzVRo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778554618; c=relaxed/simple; bh=uu5jZh5sMFXc54UY1yj0t6YpHLulP+EolXIMP6z1wiU=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=BWrrfQBwn2EhJ0HPhjkXRml3javmWXZ3bmq3lrhl9WD50FZeMMMP0k42HEIb0/7ouZCQipOC8ERlQlSAz2GSl35304DAczqUtRSorF6Ld60Gbs7UGTuVHWXF6eGEzS4MSQPCoqTVoSVvIjApPCwTzSDPuUrA27QGz2L2BeZ0SWA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=avMW0kb3; arc=none smtp.client-ip=209.85.210.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="avMW0kb3" Received: by mail-pf1-f202.google.com with SMTP id d2e1a72fcca58-82fd55bf6cdso3288298b3a.3 for ; Mon, 11 May 2026 19:56:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20251104; t=1778554609; x=1779159409; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=bPmrYQ8MlObtJ/qg9HEe4fMV04tB2JQRuyWVKdMHA9k=; 
b=avMW0kb3JYE0Gh4sEGmj+UNK55KC8OeQDxPkyP/8mEiXQ7lyIdfkoX/XTh7B7lEPd5 TvJqvLLL9OxE62gqfSugW4iuOULvxakkkYjXKOPcxPVDODKKFo4ZbeSVMdGF+Tw12ZTa 7KjRFqf17hGSzxyaG9QVxJbCj5YzKqszAgBWZemBQh9Jmy9yDqaKE753nKlR70MttmpI Hgfntcmjgtej05mXXwNMlwBu17419H24HS5lH2988ibdsmMvFSiLQ25hE7kVGfjx5z74 WYZyDUvVhobltscp/Q0/Gdp9lRHQP1SN/2YK+mkb+fO3cx8sQK8C36fs8oiqiv2+S4ZG R9hQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1778554609; x=1779159409; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=bPmrYQ8MlObtJ/qg9HEe4fMV04tB2JQRuyWVKdMHA9k=; b=BriVMy5soYxbjqf5hCxXM2Nt4TBltS79edwUKdSLiGpVzC4nmbZtiopQ2e39zVahdw Jlx0fFFMHeKi6gtpALDraIoC1JLGa0qTHVz7KVa9HSPcWhV+bI0UFvA2nqmJuDzyQ4a7 x1azoFnacbg+7OwaNmVqt4ai77i2sET1mcemYoxbE1OOT2yFQcbSsqtjBLDyUyT0/ghK O2jgbH0HdSCrosveaajM7kTNSumMWttAre4vIj48rqp05cuxDLrHWauwE8NaO9DcFZkz KqrSjuKgcuLH4wd/7P5ZP4TzH8F0eZITpmv6QG/kDe4N9k+m+swA34TM4BblAi2j+Gqd yAaw== X-Gm-Message-State: AOJu0Yw/AzDx22d1TbL2A76ebz7vCWXEkTGZ8FyBUQVCfmtG0T3/ikfm g5dmGybrufxaUEIcjclQ2K52aHOYKMWgNa2tXFtXDUnA/7Va/9ujyYf1wAXonv69tKiMcNCOrft ugaH9P/ZuEHSRtW3gR3w9FPD/C9pVhPUY4/RshshnMWkCvz3C50l/MMOEucViJ2mYGzosMNO/tx DNi7KM8eR5PbvJfWoUNGADDwepTNKsNUK2qWr1GbDtirrkrU7/ X-Received: from pfblh16.prod.google.com ([2002:a05:6a00:7110:b0:82f:805:b62a]) (user=jstultz job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6a00:4655:b0:82f:21ee:270e with SMTP id d2e1a72fcca58-83a5dc60001mr25681328b3a.42.1778554608811; Mon, 11 May 2026 19:56:48 -0700 (PDT) Date: Tue, 12 May 2026 02:56:16 +0000 In-Reply-To: <20260512025635.2840817-1-jstultz@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260512025635.2840817-1-jstultz@google.com> X-Mailer: git-send-email 2.54.0.563.g4f69b47b94-goog Message-ID: <20260512025635.2840817-7-jstultz@google.com> Subject: [PATCH v29 6/9] sched: Add 
is_blocked task flag
From: John Stultz
To: LKML
Cc: John Stultz , Vineeth Pillai , Peter Zijlstra , Joel Fernandes , Qais Yousef , Ingo Molnar , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , Suleiman Souhlal , kuyo chang , hupu , kernel-team@android.com
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Add a new is_blocked flag to the task struct. This flag is set by
try_to_block_task() and cleared by ttwu_do_wakeup(), and tracks
whether the task is blocked.

Traditionally this would mirror !p->on_rq, however due to things
like DELAY_DEQUEUE and PROXY_EXEC, this can diverge, so it's useful
to manage it separately.

Additionally, with this we might be able to get rid of the
p->se.sched_delayed (ab)use in the core code (eventually).

Taken whole cloth from Peter's email:
https://lore.kernel.org/lkml/20260501132143.GC1026330@noisy.programming.k=
icks-ass.net/

With a few additional p->is_blocked =3D 0 assignments in the cases
where we return current if blocked_on gets zeroed or there is no
owner. This may hint that these current special cases might be
dropped eventually.

This change also helps resolve wait-queue stalls seen with
proxy-execution. See previous patch attempts for details:
https://lore.kernel.org/lkml/20260430215103.2978955-2-jstultz@google.com/

Reported-by: Vineeth Pillai
Suggested-by: Peter Zijlstra
Signed-off-by: John Stultz
---
Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E.
McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: Suleiman Souhlal Cc: kuyo chang Cc: hupu Cc: kernel-team@android.com --- include/linux/sched.h | 7 +++++-- kernel/sched/core.c | 16 +++++++++++++++- 2 files changed, 20 insertions(+), 3 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 5b68a1c9eedcf..bbb183233855a 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -846,7 +846,11 @@ struct task_struct { struct alloc_tag *alloc_tag; #endif =20 - int on_cpu; + u8 on_cpu; + u8 on_rq; + u8 is_blocked; + u8 __pad; + struct __call_single_node wake_entry; unsigned int wakee_flips; unsigned long wakee_flip_decay_ts; @@ -861,7 +865,6 @@ struct task_struct { */ int recent_used_cpu; int wake_cpu; - int on_rq; =20 int prio; int static_prio; diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 633dd5b8428e5..8a223555be2e9 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -615,6 +615,12 @@ EXPORT_SYMBOL(__trace_set_current_state); * [ The astute reader will observe that it is possible for two tasks on= one * CPU to have ->on_cpu =3D 1 at the same time. ] * + * p->is_blocked <- { 0, 1 }: + * + * is set by try_to_block_task() and cleared by ttwu_do_wakeup() and tra= cks + * if the task is blocked. Traditionally this would mirror p->on_rq, how= ever + * due things like DELAY_DEQUEUE and PROXY_EXEC, this can diverge. + * * task_cpu(p): is changed by set_task_cpu(), the rules are: * * - Don't call set_task_cpu() on a blocked task: @@ -3706,6 +3712,7 @@ ttwu_stat(struct task_struct *p, int cpu, int wake_fl= ags) */ static inline void ttwu_do_wakeup(struct task_struct *p) { + p->is_blocked =3D 0; WRITE_ONCE(p->__state, TASK_RUNNING); trace_sched_wakeup(p); } @@ -4239,6 +4246,7 @@ int try_to_wake_up(struct task_struct *p, unsigned in= t state, int wake_flags) * it disabling IRQs (this allows not taking ->pi_lock). 
*/ WARN_ON_ONCE(p->se.sched_delayed); + WARN_ON_ONCE(p->is_blocked); /* If p is current, we know we can run here, so clear blocked_on */ clear_task_blocked_on(p, NULL); if (!ttwu_state_match(p, state, &success)) @@ -4550,6 +4558,7 @@ static void __sched_fork(u64 clone_flags, struct task= _struct *p) =20 /* A delayed task cannot be in clone(). */ WARN_ON_ONCE(p->se.sched_delayed); + WARN_ON_ONCE(p->is_blocked); =20 #ifdef CONFIG_FAIR_GROUP_SCHED p->se.cfs_rq =3D NULL; @@ -6671,6 +6680,7 @@ static bool try_to_block_task(struct rq *rq, struct t= ask_struct *p, unsigned long task_state =3D *task_state_p; =20 if (signal_pending_state(task_state, p)) { + p->is_blocked =3D 0; WRITE_ONCE(p->__state, TASK_RUNNING); *task_state_p =3D TASK_RUNNING; clear_task_blocked_on(p, NULL); @@ -6678,6 +6688,8 @@ static bool try_to_block_task(struct rq *rq, struct t= ask_struct *p, return false; } =20 + p->is_blocked =3D 1; + /* * We check should_block after signal_pending because we * will want to wake the task in that case. But if @@ -6837,6 +6849,7 @@ find_proxy_task(struct rq *rq, struct task_struct *do= nor, struct rq_flags *rf) /* if its PROXY_WAKING, do return migration or run if current */ if (mutex =3D=3D PROXY_WAKING) { if (task_current(rq, p)) { + p->is_blocked =3D 0; clear_task_blocked_on(p, PROXY_WAKING); return p; } @@ -6872,6 +6885,7 @@ find_proxy_task(struct rq *rq, struct task_struct *do= nor, struct rq_flags *rf) * just run on this rq), or return-migrate the task. 
*/ if (task_current(rq, p)) { + p->is_blocked =3D 0; __clear_task_blocked_on(p, NULL); return p; } @@ -7105,7 +7119,7 @@ static void __sched notrace __schedule(int sched_mode) clear_task_blocked_on(prev, NULL); =20 rq_set_donor(rq, next); - if (unlikely(next->blocked_on)) { + if (unlikely(next->is_blocked && next->blocked_on)) { next =3D find_proxy_task(rq, next, &rf); if (!next) { zap_balance_callbacks(rq); --=20 2.54.0.563.g4f69b47b94-goog From nobody Fri Jun 12 22:47:00 2026 Received: from mail-pf1-f202.google.com (mail-pf1-f202.google.com [209.85.210.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 999BC37C90D for ; Tue, 12 May 2026 02:56:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778554623; cv=none; b=Fy7/LY2cDYir0Qg7hjp4c/T8y5DJFbH3C0XPitw4FGt0RMS/+0m9SXicHKdm3VEy8O0ZKP4Ax+zF3Cmc77QoGYwHPJkG3WhmkZwV4mKucMlBbr0kXIEgJgmxBDbhOhndoakLOxIBazLYTbvsWCPWsZcSOQd73y1AOPGD7ueM+Fw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778554623; c=relaxed/simple; bh=SKueJJHIHNOunKZIsv5psY9ilW2Rwc5SI7MWOjSr2dE=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=ElStZie6ogjPZ6W4WH4ZiUVComs4NktDQPPbjgRwiLFQDf3C7//6D3oXYkdESK/PXdAuF/ytJLUjiRD0Bsi3tXm3tDfZ4Ej/jXwH/YpEDz8mc3wf6n8nKicZtRakHpdapRZVAT096cNPJozu2COMkKhKZMdZ0+Un+n8sO7shNMw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=E0OZvvpo; arc=none smtp.client-ip=209.85.210.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: 
Date: Tue, 12 May 2026 02:56:17 +0000
In-Reply-To: <20260512025635.2840817-1-jstultz@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260512025635.2840817-1-jstultz@google.com>
X-Mailer: git-send-email 2.54.0.563.g4f69b47b94-goog
Message-ID: <20260512025635.2840817-8-jstultz@google.com>
Subject: [PATCH v29 7/9] sched: Add blocked_donor link to task for smarter
	mutex handoffs
From: John Stultz
To: LKML
Cc: Peter Zijlstra, Juri Lelli, Valentin Schneider, "Connor O'Brien",
	John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar,
	Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
	Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
	Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya,
	Xuewen Yan, K Prateek Nayak, Thomas Gleixner, Daniel Lezcano,
	Suleiman Souhlal, kuyo chang, hupu, kernel-team@android.com
Content-Type: text/plain; charset="utf-8"

From: Peter Zijlstra

Add link to the task this task is proxying for, and use it so
the mutex owner can do an intelligent hand-off of the mutex to
the task that the owner is running on behalf of.

Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Juri Lelli
Signed-off-by: Valentin Schneider
Signed-off-by: Connor O'Brien
[jstultz: This patch was split out from larger proxy patch]
Signed-off-by: John Stultz
---
v5:
* Split out from larger proxy patch
v6:
* Moved proxied value from earlier patch to this one where it
  is actually used
* Rework logic to check sched_proxy_exec() instead of using ifdefs
* Moved comment change to this patch where it makes sense
v7:
* Use more descriptive term than "us" in comments, as suggested
  by Metin Kaya.
* Minor typo fixup from Metin Kaya
* Reworked proxied variable to prev_not_proxied to simplify usage
v8:
* Use helper for donor blocked_on_state transition
v9:
* Re-add mutex lock handoff in the unlock path, but only when we
  have a blocked donor
* Slight reword of commit message suggested by Metin
v18:
* Add task_init initialization for blocked_donor, suggested by
  Suleiman
v23:
* Reworks for PROXY_WAKING approach suggested by PeterZ
v25:
* Simplified some logic now we don't have proxy_tag_curr()
v28:
* Remove sched_proxy_exec() conditionalized blocked_lock usage
  as it was hitting llvm's "'blocked_lock' is not held on every
  path through here [-Wthread-safety-analysis]" errors.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 include/linux/sched.h  |  1 +
 init/init_task.c       |  1 +
 kernel/fork.c          |  1 +
 kernel/locking/mutex.c | 43 +++++++++++++++++++++++++++++++++++++++---
 kernel/sched/core.c    | 14 +++++++++++++-
 5 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index bbb183233855a..c93883ce82ee4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1242,6 +1242,7 @@ struct task_struct {
 #endif
 
 	struct mutex			*blocked_on;	/* lock we're blocked on */
+	struct task_struct		*blocked_donor;	/* task that is boosting this task */
 	raw_spinlock_t			blocked_lock;
 
 #ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
diff --git a/init/init_task.c b/init/init_task.c
index b5f48ebdc2b6e..41c19670c8f6b 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -200,6 +200,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
 	.mems_allowed_seq = SEQCNT_SPINLOCK_ZERO(init_task.mems_allowed_seq,
 						 &init_task.alloc_lock),
 #endif
+	.blocked_donor = NULL,
 #ifdef CONFIG_RT_MUTEXES
 	.pi_waiters	= RB_ROOT_CACHED,
 	.pi_top_task	= NULL,
diff --git a/kernel/fork.c b/kernel/fork.c
index 5f3fdfdb14c7c..7e50e42ec4d97 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2218,6 +2218,7 @@ __latent_entropy struct task_struct *copy_process(
 	lockdep_init_task(p);
 
 	p->blocked_on = NULL; /* not blocked yet */
+	p->blocked_donor = NULL; /* nobody is boosting p yet */
 
 #ifdef CONFIG_BCACHE
 	p->sequential_io	= 0;
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 09534628dc01a..0064b724ccda3 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -980,7 +980,7 @@ EXPORT_SYMBOL_GPL(ww_mutex_lock_interruptible);
 static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigned long ip)
 	__releases(lock)
 {
-	struct task_struct *next = NULL;
+	struct task_struct *donor, *next = NULL;
 	struct mutex_waiter *waiter;
 	DEFINE_WAKE_Q(wake_q);
 	unsigned long owner;
@@ -1001,6 +1001,12 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
 		MUTEX_WARN_ON(__owner_task(owner) != current);
 		MUTEX_WARN_ON(owner & MUTEX_FLAG_PICKUP);
 
+		if (sched_proxy_exec() && current->blocked_donor) {
+			/* force handoff if we have a blocked_donor */
+			owner = MUTEX_FLAG_HANDOFF;
+			break;
+		}
+
 		if (owner & MUTEX_FLAG_HANDOFF)
 			break;
 
@@ -1013,19 +1019,50 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
 	}
 
 	raw_spin_lock_irqsave(&lock->wait_lock, flags);
+	raw_spin_lock(&current->blocked_lock);
 	debug_mutex_unlock(lock);
+
+	if (sched_proxy_exec()) {
+		/*
+		 * If we have a task boosting current, and that task was boosting
+		 * current through this lock, hand the lock to that task, as that
+		 * is the highest waiter, as selected by the scheduling function.
+		 */
+		donor = current->blocked_donor;
+		if (donor) {
+			struct mutex *next_lock;
+
+			raw_spin_lock_nested(&donor->blocked_lock, SINGLE_DEPTH_NESTING);
+			next_lock = __get_task_blocked_on(donor);
+			if (next_lock == lock) {
+				next = donor;
+				__set_task_blocked_on_waking(donor, next_lock);
+				wake_q_add(&wake_q, donor);
+				current->blocked_donor = NULL;
+			}
+			raw_spin_unlock(&donor->blocked_lock);
+		}
+	}
+
+	/*
+	 * Failing that, pick first on the wait list.
+	 */
 	waiter = lock->first_waiter;
-	if (waiter) {
+	if (!next && waiter) {
 		next = waiter->task;
 
+		raw_spin_lock_nested(&next->blocked_lock, SINGLE_DEPTH_NESTING);
 		debug_mutex_wake_waiter(lock, waiter);
-		set_task_blocked_on_waking(next, lock);
+		__set_task_blocked_on_waking(next, lock);
+		raw_spin_unlock(&next->blocked_lock);
 		wake_q_add(&wake_q, next);
+
 	}
 
 	if (owner & MUTEX_FLAG_HANDOFF)
 		__mutex_handoff(lock, next);
 
+	raw_spin_unlock(&current->blocked_lock);
 	raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags, &wake_q);
 }
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8a223555be2e9..2226f594376d6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6821,7 +6821,17 @@ static void proxy_migrate_task(struct rq *rq, struct rq_flags *rf,
  * Find runnable lock owner to proxy for mutex blocked donor
  *
  * Follow the blocked-on relation:
- *   task->blocked_on -> mutex->owner -> task...
+ *
+ *                ,-> task
+ *                |     | blocked-on
+ *                |     v
+ *  blocked_donor |   mutex
+ *                |     | owner
+ *                |     v
+ *                `-- task
+ *
+ * and set the blocked_donor relation, this latter is used by the mutex
+ * code to find which (blocked) task to hand-off to.
  *
  * Lock order:
  *
@@ -6963,6 +6973,7 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
 		 * rq, therefore holding @rq->lock is sufficient to
 		 * guarantee its existence, as per ttwu_remote().
 		 */
+		owner->blocked_donor = p;
 	}
 	WARN_ON_ONCE(owner && !owner->on_rq);
 	return owner;
@@ -7119,6 +7130,7 @@ static void __sched notrace __schedule(int sched_mode)
 		clear_task_blocked_on(prev, NULL);
 
 	rq_set_donor(rq, next);
+	next->blocked_donor = NULL;
 	if (unlikely(next->is_blocked && next->blocked_on)) {
 		next = find_proxy_task(rq, next, &rf);
 		if (!next) {
-- 
2.54.0.563.g4f69b47b94-goog

From nobody Fri Jun 12 22:47:00 2026
Date: Tue, 12 May 2026 02:56:18 +0000
In-Reply-To: <20260512025635.2840817-1-jstultz@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260512025635.2840817-1-jstultz@google.com>
X-Mailer: git-send-email 2.54.0.563.g4f69b47b94-goog
Message-ID: <20260512025635.2840817-9-jstultz@google.com>
Subject: [PATCH v29 8/9] sched: Break out core of attach_tasks() helper into
	sched.h
From: John Stultz
To: LKML
Cc: John Stultz, K Prateek Nayak, Joel Fernandes, Qais Yousef,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall,
	Zimuzo Ezeozue, Will Deacon, Waiman Long, Boqun Feng,
	"Paul E. McKenney", Metin Kaya, Xuewen Yan, Thomas Gleixner,
	Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu,
	kernel-team@android.com
Content-Type: text/plain; charset="utf-8"

Pull the core of attach_tasks() out into sched.h so it can be
used more generically.

Suggested-by: K Prateek Nayak
Signed-off-by: John Stultz
---
Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 kernel/sched/fair.c  | 16 +---------------
 kernel/sched/sched.h | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9fb2b2ec1a1ec..7f5a1b4328249 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10047,21 +10047,7 @@ static int detach_tasks(struct lb_env *env)
  */
 static void attach_tasks(struct lb_env *env)
 {
-	struct list_head *tasks = &env->tasks;
-	struct task_struct *p;
-	struct rq_flags rf;
-
-	rq_lock(env->dst_rq, &rf);
-	update_rq_clock(env->dst_rq);
-
-	while (!list_empty(tasks)) {
-		p = list_first_entry(tasks, struct task_struct, se.group_node);
-		list_del_init(&p->se.group_node);
-
-		attach_task(env->dst_rq, p);
-	}
-
-	rq_unlock(env->dst_rq, &rf);
+	__attach_tasks(env->dst_rq, &env->tasks);
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d0f3e164a61d2..e1927e53e85cb 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3045,6 +3045,25 @@ static inline void attach_one_task(struct rq *rq, struct task_struct *p)
 	attach_task(rq, p);
 }
 
+/*
+ * __attach_tasks() - attaches a list of tasks (using se.group_node) to
+ * the new rq
+ */
+static inline void __attach_tasks(struct rq *rq, struct list_head *tasks)
+{
+	guard(rq_lock)(rq);
+	update_rq_clock(rq);
+
+	while (!list_empty(tasks)) {
+		struct task_struct *p;
+
+		p = list_first_entry(tasks, struct task_struct, se.group_node);
+		list_del_init(&p->se.group_node);
+
+		attach_task(rq, p);
+	}
+}
+
 #ifdef CONFIG_PREEMPT_RT
 # define SCHED_NR_MIGRATE_BREAK 8
 #else
-- 
2.54.0.563.g4f69b47b94-goog

From nobody Fri Jun 12 22:47:00 2026
Date: Tue, 12 May 2026 02:56:19 +0000
In-Reply-To: <20260512025635.2840817-1-jstultz@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260512025635.2840817-1-jstultz@google.com>
X-Mailer: git-send-email 2.54.0.563.g4f69b47b94-goog
Message-ID: <20260512025635.2840817-10-jstultz@google.com>
Subject: [PATCH v29 9/9] sched: Migrate whole chain in proxy_migrate_task()
From: John Stultz
To: LKML
Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue,
	Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney",
	Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner,
	Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu,
	kernel-team@android.com
Content-Type: text/plain; charset="utf-8"

Instead of migrating one task each time through find_proxy_task(),
we can walk up the blocked_donor ptrs and migrate the entire
current chain in one go.

This was broken out of earlier patches and held back while the
series was being stabilized, but I wanted to re-introduce it.

Signed-off-by: John Stultz
---
v12:
* Earlier this was re-using blocked_node, but I hit
  a race with activating blocked entities, and to
  avoid it introduced a new migration_node listhead
v18:
* Add init_task initialization of migration_node as suggested
  by Suleiman
v22:
* Move migration_node under CONFIG_SCHED_PROXY_EXEC as suggested
  by K Prateek
v25:
* Use se.group_node instead of adding migration_node, as suggested
  by K Prateek
* Integrated attach_tasks() cleanups suggested by K Prateek

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 kernel/sched/core.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2226f594376d6..c6958a20f4758 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6791,9 +6791,9 @@ static void proxy_migrate_task(struct rq *rq, struct rq_flags *rf,
 	__must_hold(__rq_lockp(rq))
 {
 	struct rq *target_rq = cpu_rq(target_cpu);
+	LIST_HEAD(migrate_list);
 
 	lockdep_assert_rq_held(rq);
-	WARN_ON(p == rq->curr);
 	/*
 	 * Since we are migrating a blocked donor, it could be rq->donor,
 	 * and we want to make sure there aren't any references from this
@@ -6806,13 +6806,20 @@ static void proxy_migrate_task(struct rq *rq, struct rq_flags *rf,
 	 * before we release the lock.
 	 */
 	proxy_resched_idle(rq);
-
-	deactivate_task(rq, p, DEQUEUE_NOCLOCK);
-	proxy_set_task_cpu(p, target_cpu);
-
+	for (; p; p = p->blocked_donor) {
+		WARN_ON(p == rq->curr);
+		deactivate_task(rq, p, DEQUEUE_NOCLOCK);
+		proxy_set_task_cpu(p, target_cpu);
+		/*
+		 * We can re-use se.group_node to migrate the thing,
+		 * because @p is deactivated (won't be balanced) and
+		 * we hold the rq_lock.
+		 */
+		list_add(&p->se.group_node, &migrate_list);
+	}
 	proxy_release_rq_lock(rq, rf);
 
-	attach_one_task(target_rq, p);
+	__attach_tasks(target_rq, &migrate_list);
 
 	proxy_reacquire_rq_lock(rq, rf);
 }
-- 
2.54.0.563.g4f69b47b94-goog
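
For anyone following along outside the tree, the chain walk in
proxy_migrate_task() can be modelled in userspace. This is only a
minimal sketch, not kernel code: struct task, next_migrate (standing in
for se.group_node) and collect_chain() are hypothetical names made up
for illustration, and the rq locking, deactivate_task() and
proxy_set_task_cpu() details are left out. It shows just the
blocked_donor walk and the head insertion that list_add() does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct; only the fields used here. */
struct task {
	const char *name;
	int cpu;
	struct task *blocked_donor;	/* task boosting this one, or NULL */
	struct task *next_migrate;	/* models se.group_node linkage */
};

/*
 * Walk from @p back up the blocked_donor links, retarget each task to
 * @target_cpu and push it onto a local list (head insertion, like
 * list_add()). Returns the list head and stores the number of tasks
 * collected in *@count.
 */
static struct task *collect_chain(struct task *p, int target_cpu, int *count)
{
	struct task *head = NULL;

	*count = 0;
	for (; p; p = p->blocked_donor) {
		p->cpu = target_cpu;	/* models proxy_set_task_cpu() */
		p->next_migrate = head;
		head = p;
		(*count)++;
	}
	return head;
}
```

With B->blocked_donor = A and A->blocked_donor = NULL, calling
collect_chain(&B, 3, &n) retargets both tasks to CPU 3 and returns a
list headed by A and followed by B, so one pass over that list can
attach the whole chain on the target rq.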