From nobody Fri Apr 3 08:34:57 2026
Date: Tue, 24 Mar 2026 19:13:16 +0000
In-Reply-To: <20260324191337.1841376-1-jstultz@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References:
<20260324191337.1841376-1-jstultz@google.com>
X-Mailer: git-send-email 2.53.0.1018.g2bb0e51243-goog
Message-ID: <20260324191337.1841376-2-jstultz@google.com>
Subject: [PATCH v26 01/10] sched: Make class_schedulers avoid pushing current, and get rid of proxy_tag_curr()
From: John Stultz
To: LKML
Cc: John Stultz, K Prateek Nayak, Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan, Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu, kernel-team@android.com
Content-Type: text/plain; charset="utf-8"

With proxy-execution, the scheduler selects the donor, but for blocked
donors we end up running the lock owner instead. This caused some
complexity: the class schedulers remove the task they pick from their
pushable-task lists, which prevents the donor from being migrated, but
nothing prevented rq->curr from being migrated when rq->curr != rq->donor.

This was hacked around by calling proxy_tag_curr() on the rq->curr task
whenever we were running something other than the donor. proxy_tag_curr()
did a dequeue/enqueue pair on the rq->curr task, allowing the class
schedulers to remove it from their pushable lists.

The dequeue/enqueue pair was wasteful, and additionally K Prateek
highlighted that we didn't properly undo things when we stopped proxying,
leaving the lock owner off the pushable list.

After some alternative approaches were considered, Peter suggested having
the RT/DL classes simply avoid migrating a task that is task_on_cpu(). So
rework pick_next_pushable_dl_task() and the RT pick_next_pushable_task()
so that they skip over pushable tasks that are on_cpu, and then drop all
of the proxy_tag_curr() logic.
Fixes: be39617e38e0 ("sched: Fix proxy/current (push,pull)ability")
Reported-by: K Prateek Nayak
Closes: https://lore.kernel.org/lkml/e735cae0-2cc9-4bae-b761-fcb082ed3e94@amd.com/
Suggested-by: Peter Zijlstra
Signed-off-by: John Stultz
---
v26:
* Fix issue Juri noticed by using a separate iterator value in
  pick_next_pushable_dl_task()
Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 kernel/sched/core.c     | 24 ------------------------
 kernel/sched/deadline.c | 18 ++++++++++++++++--
 kernel/sched/rt.c       | 15 ++++++++++++---
 3 files changed, 28 insertions(+), 29 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 496dff740dcaf..92b1807c05a4e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6705,23 +6705,6 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
 }
 #endif /* SCHED_PROXY_EXEC */
 
-static inline void proxy_tag_curr(struct rq *rq, struct task_struct *owner)
-{
-	if (!sched_proxy_exec())
-		return;
-	/*
-	 * pick_next_task() calls set_next_task() on the chosen task
-	 * at some point, which ensures it is not push/pullable.
-	 * However, the chosen/donor task *and* the mutex owner form an
-	 * atomic pair wrt push/pull.
-	 *
-	 * Make sure owner we run is not pushable. Unfortunately we can
-	 * only deal with that by means of a dequeue/enqueue cycle. :-/
-	 */
-	dequeue_task(rq, owner, DEQUEUE_NOCLOCK | DEQUEUE_SAVE);
-	enqueue_task(rq, owner, ENQUEUE_NOCLOCK | ENQUEUE_RESTORE);
-}
-
 /*
  * __schedule() is the main scheduler function.
 *
@@ -6874,9 +6857,6 @@ static void __sched notrace __schedule(int sched_mode)
 	 */
 	RCU_INIT_POINTER(rq->curr, next);
 
-	if (!task_current_donor(rq, next))
-		proxy_tag_curr(rq, next);
-
 	/*
 	 * The membarrier system call requires each architecture
 	 * to have a full memory barrier after updating
@@ -6910,10 +6890,6 @@ static void __sched notrace __schedule(int sched_mode)
 		/* Also unlocks the rq: */
 		rq = context_switch(rq, prev, next, &rf);
 	} else {
-		/* In case next was already curr but just got blocked_donor */
-		if (!task_current_donor(rq, next))
-			proxy_tag_curr(rq, next);
-
 		rq_unpin_lock(rq, &rf);
 		__balance_callbacks(rq, NULL);
 		raw_spin_rq_unlock_irq(rq);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index d08b004293234..52c524f5ba4dd 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2801,12 +2801,26 @@ static int find_later_rq(struct task_struct *task)
 
 static struct task_struct *pick_next_pushable_dl_task(struct rq *rq)
 {
-	struct task_struct *p;
+	struct task_struct *i, *p = NULL;
+	struct rb_node *next_node;
 
 	if (!has_pushable_dl_tasks(rq))
 		return NULL;
 
-	p = __node_2_pdl(rb_first_cached(&rq->dl.pushable_dl_tasks_root));
+	next_node = rb_first_cached(&rq->dl.pushable_dl_tasks_root);
+	while (next_node) {
+		i = __node_2_pdl(next_node);
+		/* make sure task isn't on_cpu (possible with proxy-exec) */
+		if (!task_on_cpu(rq, i)) {
+			p = i;
+			break;
+		}
+
+		next_node = rb_next(next_node);
+	}
+
+	if (!p)
+		return NULL;
 
 	WARN_ON_ONCE(rq->cpu != task_cpu(p));
 	WARN_ON_ONCE(task_current(rq, p));
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index f69e1f16d9238..61569b622d1a3 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1853,13 +1853,22 @@ static int find_lowest_rq(struct task_struct *task)
 
 static struct task_struct *pick_next_pushable_task(struct rq *rq)
 {
-	struct task_struct *p;
+	struct plist_head *head = &rq->rt.pushable_tasks;
+	struct task_struct *i, *p = NULL;
 
 	if (!has_pushable_tasks(rq))
 		return NULL;
 
-	p = plist_first_entry(&rq->rt.pushable_tasks,
-			      struct task_struct, pushable_tasks);
+	plist_for_each_entry(i, head, pushable_tasks) {
+		/* make sure task isn't on_cpu (possible with proxy-exec) */
+		if (!task_on_cpu(rq, i)) {
+			p = i;
+			break;
+		}
+	}
+
+	if (!p)
+		return NULL;
 
 	BUG_ON(rq->cpu != task_cpu(p));
 	BUG_ON(task_current(rq, p));
-- 
2.53.0.1018.g2bb0e51243-goog

From nobody Fri Apr 3 08:34:57 2026
Date: Tue, 24 Mar 2026 19:13:17 +0000
In-Reply-To: <20260324191337.1841376-1-jstultz@google.com>
References: <20260324191337.1841376-1-jstultz@google.com>
Message-ID: <20260324191337.1841376-3-jstultz@google.com>
Subject: [PATCH v26 02/10] sched: Minimise repeated sched_proxy_exec() checking
From: John Stultz
To: LKML
Cc: John Stultz, K Prateek Nayak, Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan, Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu, kernel-team@android.com
Content-Type: text/plain; charset="utf-8"

Peter noted:

  Compilers are really bad (as in they utterly refuse) at optimizing
  the static branch things (even when marked with __pure), and will
  happily emit multiple identical checks in a row.

So pull out the one obvious sched_proxy_exec() branch in __schedule()
and remove some of the 'implicit' ones in that path.

Reviewed-by: K Prateek Nayak
Suggested-by: Peter Zijlstra
Signed-off-by: John Stultz
---
Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 kernel/sched/core.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 92b1807c05a4e..dc044a405f83b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6600,11 +6600,7 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
 	struct mutex *mutex;
 
 	/* Follow blocked_on chain. */
-	for (p = donor; task_is_blocked(p); p = owner) {
-		mutex = p->blocked_on;
-		/* Something changed in the chain, so pick again */
-		if (!mutex)
-			return NULL;
+	for (p = donor; (mutex = p->blocked_on); p = owner) {
 		/*
 		 * By taking mutex->wait_lock we hold off concurrent mutex_unlock()
 		 * and ensure @owner sticks around.
@@ -6835,12 +6831,14 @@ static void __sched notrace __schedule(int sched_mode)
 	next = pick_next_task(rq, rq->donor, &rf);
 	rq_set_donor(rq, next);
 	rq->next_class = next->sched_class;
-	if (unlikely(task_is_blocked(next))) {
-		next = find_proxy_task(rq, next, &rf);
-		if (!next)
-			goto pick_again;
-		if (next == rq->idle)
-			goto keep_resched;
+	if (sched_proxy_exec()) {
+		if (unlikely(next->blocked_on)) {
+			next = find_proxy_task(rq, next, &rf);
+			if (!next)
+				goto pick_again;
+			if (next == rq->idle)
+				goto keep_resched;
+		}
 	}
 picked:
 	clear_tsk_need_resched(prev);
-- 
2.53.0.1018.g2bb0e51243-goog

From nobody Fri Apr 3 08:34:57 2026
Date: Tue, 24 Mar 2026 19:13:18 +0000
In-Reply-To: <20260324191337.1841376-1-jstultz@google.com>
References: <20260324191337.1841376-1-jstultz@google.com>
Message-ID: <20260324191337.1841376-4-jstultz@google.com>
Subject: [PATCH v26 03/10] sched: Fix potentially missing balancing with Proxy Exec
From: John Stultz
To: LKML
Cc: John Stultz, K Prateek Nayak, Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan, Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu, kernel-team@android.com
Content-Type: text/plain; charset="utf-8"

K Prateek pointed out that with Proxy Exec we may have cases where we
context switch in __schedule() while the donor remains the same. This
could cause balancing issues, since the put_prev_set_next() logic
short-circuits when prev == next. With proxy-exec, prev is the previous
donor and next is the next donor. Should the donor remain the same while
a different task is picked to actually run, the shortcut will have
skipped enqueueing the sched class balance callback.

So, if we are context switching, add logic to catch the same-donor case
and trigger the put_prev/set_next calls, ensuring the balance callbacks
get enqueued.

Reported-by: K Prateek Nayak
Closes: https://lore.kernel.org/lkml/20ea3670-c30a-433b-a07f-c4ff98ae2379@amd.com/
Suggested-by: Peter Zijlstra
Signed-off-by: John Stultz
---
Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 kernel/sched/core.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dc044a405f83b..610e48cdb66a9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6829,9 +6829,11 @@ static void __sched notrace __schedule(int sched_mode)
 
 pick_again:
 	next = pick_next_task(rq, rq->donor, &rf);
-	rq_set_donor(rq, next);
 	rq->next_class = next->sched_class;
 	if (sched_proxy_exec()) {
+		struct task_struct *prev_donor = rq->donor;
+
+		rq_set_donor(rq, next);
 		if (unlikely(next->blocked_on)) {
 			next = find_proxy_task(rq, next, &rf);
 			if (!next)
@@ -6839,7 +6841,27 @@ static void __sched notrace __schedule(int sched_mode)
 			if (next == rq->idle)
 				goto keep_resched;
 		}
+		if (rq->donor == prev_donor && prev != next) {
+			struct task_struct *donor = rq->donor;
+			/*
+			 * When transitioning like:
+			 *
+			 *        prev    next
+			 * donor:  B       B
+			 * curr:   A       B or C
+			 *
+			 * then put_prev_set_next_task() will not have done
+			 * anything, since B == B. However, A might have
+			 * missed a RT/DL balance opportunity due to being
+			 * on_cpu.
+			 */
+			donor->sched_class->put_prev_task(rq, donor, donor);
+			donor->sched_class->set_next_task(rq, donor, true);
+		}
+	} else {
+		rq_set_donor(rq, next);
 	}
+
 picked:
 	clear_tsk_need_resched(prev);
 	clear_preempt_need_resched();
-- 
2.53.0.1018.g2bb0e51243-goog

From nobody Fri Apr 3 08:34:57 2026
Date: Tue, 24 Mar 2026 19:13:19 +0000
In-Reply-To: <20260324191337.1841376-1-jstultz@google.com>
References: <20260324191337.1841376-1-jstultz@google.com>
Message-ID: <20260324191337.1841376-5-jstultz@google.com>
Subject: [PATCH v26 04/10] locking: Add task::blocked_lock to serialize blocked_on state
From: John Stultz
To: LKML
Cc: John Stultz, K Prateek Nayak, Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan, Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu, kernel-team@android.com
Content-Type: text/plain; charset="utf-8"

So far we have been able to use the mutex::wait_lock for serializing
the blocked_on state, but when we move to proxying across runqueues we
will need to add more state, and a way to serialize changes to that
state in contexts where we don't hold the mutex::wait_lock.

So introduce task::blocked_lock, which nests under the mutex::wait_lock
in the locking order, and rework the locking to use it.

Signed-off-by: John Stultz
Reviewed-by: K Prateek Nayak
---
v15:
* Split back out into later in the series
v16:
* Fixups to mark tasks unblocked before sleeping in
  mutex_optimistic_spin()
* Rework to use guard() as suggested by Peter
v19:
* Rework logic for PREEMPT_RT issues reported by K Prateek Nayak
v21:
* After recently thinking more on ww_mutex code, I reworked the
  blocked_lock usage in mutex lock to avoid having to take nested
  locks in the ww_mutex paths, as I was concerned the lock ordering
  constraints weren't as strong as I had previously thought.
v22:
* Added some extra spaces to avoid dense code blocks, suggested by
  K Prateek
v23:
* Move get_task_blocked_on() to kernel/locking/mutex.h as requested
  by PeterZ
Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 include/linux/sched.h        | 48 +++++++++++++-----------------
 init/init_task.c             |  1 +
 kernel/fork.c                |  1 +
 kernel/locking/mutex-debug.c |  4 +--
 kernel/locking/mutex.c       | 40 +++++++++++++++++++-----------
 kernel/locking/mutex.h       |  6 +++++
 kernel/locking/ww_mutex.h    |  4 +--
 kernel/sched/core.c          |  4 ++-
 8 files changed, 58 insertions(+), 50 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5a5d3dbc9cdf3..2eef9bc6daaab 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1238,6 +1238,7 @@ struct task_struct {
 #endif
 
 	struct mutex			*blocked_on;	/* lock we're blocked on */
+	raw_spinlock_t			blocked_lock;
 
 #ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
 	/*
@@ -2181,57 +2182,42 @@ extern int __cond_resched_rwlock_write(rwlock_t *lock) __must_hold(lock);
 #ifndef CONFIG_PREEMPT_RT
 static inline struct mutex *__get_task_blocked_on(struct task_struct *p)
 {
-	struct mutex *m = p->blocked_on;
-
-	if (m)
-		lockdep_assert_held_once(&m->wait_lock);
-	return m;
+	lockdep_assert_held_once(&p->blocked_lock);
+	return p->blocked_on;
 }
 
 static inline void __set_task_blocked_on(struct task_struct *p, struct mutex *m)
 {
-	struct mutex *blocked_on = READ_ONCE(p->blocked_on);
-
 	WARN_ON_ONCE(!m);
 	/* The task should only be setting itself as blocked */
 	WARN_ON_ONCE(p != current);
-	/* Currently we serialize blocked_on under the mutex::wait_lock */
-	lockdep_assert_held_once(&m->wait_lock);
+	/* Currently we serialize blocked_on under the task::blocked_lock */
+	lockdep_assert_held_once(&p->blocked_lock);
 	/*
 	 * Check ensure we don't overwrite existing mutex value
 	 * with a different mutex. Note, setting it to the same
 	 * lock repeatedly is ok.
	 */
-	WARN_ON_ONCE(blocked_on && blocked_on != m);
-	WRITE_ONCE(p->blocked_on, m);
-}
-
-static inline void set_task_blocked_on(struct task_struct *p, struct mutex *m)
-{
-	guard(raw_spinlock_irqsave)(&m->wait_lock);
-	__set_task_blocked_on(p, m);
+	WARN_ON_ONCE(p->blocked_on && p->blocked_on != m);
+	p->blocked_on = m;
 }
 
 static inline void __clear_task_blocked_on(struct task_struct *p, struct mutex *m)
 {
-	if (m) {
-		struct mutex *blocked_on = READ_ONCE(p->blocked_on);
-
-		/* Currently we serialize blocked_on under the mutex::wait_lock */
-		lockdep_assert_held_once(&m->wait_lock);
-		/*
-		 * There may be cases where we re-clear already cleared
-		 * blocked_on relationships, but make sure we are not
-		 * clearing the relationship with a different lock.
-		 */
-		WARN_ON_ONCE(blocked_on && blocked_on != m);
-	}
-	WRITE_ONCE(p->blocked_on, NULL);
+	/* Currently we serialize blocked_on under the task::blocked_lock */
+	lockdep_assert_held_once(&p->blocked_lock);
+	/*
+	 * There may be cases where we re-clear already cleared
+	 * blocked_on relationships, but make sure we are not
+	 * clearing the relationship with a different lock.
+	 */
+	WARN_ON_ONCE(m && p->blocked_on && p->blocked_on != m);
+	p->blocked_on = NULL;
 }
 
 static inline void clear_task_blocked_on(struct task_struct *p, struct mutex *m)
 {
-	guard(raw_spinlock_irqsave)(&m->wait_lock);
+	guard(raw_spinlock_irqsave)(&p->blocked_lock);
 	__clear_task_blocked_on(p, m);
 }
 #else
diff --git a/init/init_task.c b/init/init_task.c
index 5c838757fc10e..b5f48ebdc2b6e 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -169,6 +169,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
 	.journal_info	= NULL,
 	INIT_CPU_TIMERS(init_task)
 	.pi_lock	= __RAW_SPIN_LOCK_UNLOCKED(init_task.pi_lock),
+	.blocked_lock	= __RAW_SPIN_LOCK_UNLOCKED(init_task.blocked_lock),
 	.timer_slack_ns	= 50000, /* 50 usec default slack */
 	.thread_pid	= &init_struct_pid,
 	.thread_node	= LIST_HEAD_INIT(init_signals.thread_head),
diff --git a/kernel/fork.c b/kernel/fork.c
index bc2bf58b93b65..079802cb61002 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2076,6 +2076,7 @@ __latent_entropy struct task_struct *copy_process(
 	ftrace_graph_init_task(p);
 
 	rt_mutex_init_task(p);
+	raw_spin_lock_init(&p->blocked_lock);
 
 	lockdep_assert_irqs_enabled();
 #ifdef CONFIG_PROVE_LOCKING
diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c
index 2c6b02d4699be..cc6aa9c6e9813 100644
--- a/kernel/locking/mutex-debug.c
+++ b/kernel/locking/mutex-debug.c
@@ -54,13 +54,13 @@ void debug_mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter,
 	lockdep_assert_held(&lock->wait_lock);
 
 	/* Current thread can't be already blocked (since it's executing!)
*/ - DEBUG_LOCKS_WARN_ON(__get_task_blocked_on(task)); + DEBUG_LOCKS_WARN_ON(get_task_blocked_on(task)); } =20 void debug_mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *wa= iter, struct task_struct *task) { - struct mutex *blocked_on =3D __get_task_blocked_on(task); + struct mutex *blocked_on =3D get_task_blocked_on(task); =20 DEBUG_LOCKS_WARN_ON(list_empty(&waiter->list)); DEBUG_LOCKS_WARN_ON(waiter->task !=3D task); diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 2a1d165b3167e..4aa79bcab08c7 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -656,6 +656,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas goto err_early_kill; } =20 + raw_spin_lock(¤t->blocked_lock); __set_task_blocked_on(current, lock); set_current_state(state); trace_contention_begin(lock, LCB_F_MUTEX); @@ -669,8 +670,9 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas * the handoff. */ if (__mutex_trylock(lock)) - goto acquired; + break; =20 + raw_spin_unlock(¤t->blocked_lock); /* * Check for signals and kill conditions while holding * wait_lock. This ensures the lock cancellation is ordered @@ -693,12 +695,14 @@ __mutex_lock_common(struct mutex *lock, unsigned int = state, unsigned int subclas =20 first =3D __mutex_waiter_is_first(lock, &waiter); =20 + raw_spin_lock_irqsave(&lock->wait_lock, flags); + raw_spin_lock(¤t->blocked_lock); /* * As we likely have been woken up by task * that has cleared our blocked_on state, re-set * it to the lock we are trying to acquire. 
*/ - set_task_blocked_on(current, lock); + __set_task_blocked_on(current, lock); set_current_state(state); /* * Here we order against unlock; we must either see it change @@ -709,25 +713,33 @@ __mutex_lock_common(struct mutex *lock, unsigned int = state, unsigned int subclas break; =20 if (first) { - trace_contention_begin(lock, LCB_F_MUTEX | LCB_F_SPIN); + bool opt_acquired; + /* * mutex_optimistic_spin() can call schedule(), so - * clear blocked on so we don't become unselectable + * we need to release these locks before calling it, + * and clear blocked on so we don't become unselectable * to run. */ - clear_task_blocked_on(current, lock); - if (mutex_optimistic_spin(lock, ww_ctx, &waiter)) + __clear_task_blocked_on(current, lock); + raw_spin_unlock(¤t->blocked_lock); + raw_spin_unlock_irqrestore(&lock->wait_lock, flags); + + trace_contention_begin(lock, LCB_F_MUTEX | LCB_F_SPIN); + opt_acquired =3D mutex_optimistic_spin(lock, ww_ctx, &waiter); + + raw_spin_lock_irqsave(&lock->wait_lock, flags); + raw_spin_lock(¤t->blocked_lock); + __set_task_blocked_on(current, lock); + + if (opt_acquired) break; - set_task_blocked_on(current, lock); trace_contention_begin(lock, LCB_F_MUTEX); } - - raw_spin_lock_irqsave(&lock->wait_lock, flags); } - raw_spin_lock_irqsave(&lock->wait_lock, flags); -acquired: __clear_task_blocked_on(current, lock); __set_current_state(TASK_RUNNING); + raw_spin_unlock(¤t->blocked_lock); =20 if (ww_ctx) { /* @@ -756,11 +768,11 @@ __mutex_lock_common(struct mutex *lock, unsigned int = state, unsigned int subclas return 0; =20 err: - __clear_task_blocked_on(current, lock); + clear_task_blocked_on(current, lock); __set_current_state(TASK_RUNNING); __mutex_remove_waiter(lock, &waiter); err_early_kill: - WARN_ON(__get_task_blocked_on(current)); + WARN_ON(get_task_blocked_on(current)); trace_contention_end(lock, ret); raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags, &wake_q); debug_mutex_free_waiter(&waiter); @@ -971,7 +983,7 @@ static noinline 
void __sched __mutex_unlock_slowpath(st= ruct mutex *lock, unsigne next =3D waiter->task; =20 debug_mutex_wake_waiter(lock, waiter); - __clear_task_blocked_on(next, lock); + clear_task_blocked_on(next, lock); wake_q_add(&wake_q, next); } =20 diff --git a/kernel/locking/mutex.h b/kernel/locking/mutex.h index 9ad4da8cea004..7a8ba13fee949 100644 --- a/kernel/locking/mutex.h +++ b/kernel/locking/mutex.h @@ -47,6 +47,12 @@ static inline struct task_struct *__mutex_owner(struct m= utex *lock) return (struct task_struct *)(atomic_long_read(&lock->owner) & ~MUTEX_FLA= GS); } =20 +static inline struct mutex *get_task_blocked_on(struct task_struct *p) +{ + guard(raw_spinlock_irqsave)(&p->blocked_lock); + return __get_task_blocked_on(p); +} + #ifdef CONFIG_DEBUG_MUTEXES extern void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter); diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h index 31a785afee6c0..e4a81790ea7dd 100644 --- a/kernel/locking/ww_mutex.h +++ b/kernel/locking/ww_mutex.h @@ -289,7 +289,7 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER = *waiter, * blocked_on pointer. Otherwise we can see circular * blocked_on relationships that can't resolve. */ - __clear_task_blocked_on(waiter->task, lock); + clear_task_blocked_on(waiter->task, lock); wake_q_add(wake_q, waiter->task); } =20 @@ -347,7 +347,7 @@ static bool __ww_mutex_wound(struct MUTEX *lock, * are waking the mutex owner, who may be currently * blocked on a different mutex. 
*/ - __clear_task_blocked_on(owner, NULL); + clear_task_blocked_on(owner, NULL); wake_q_add(wake_q, owner); } return true; diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 610e48cdb66a9..7187c63174cd2 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6587,6 +6587,7 @@ static struct task_struct *proxy_deactivate(struct rq= *rq, struct task_struct *d * p->pi_lock * rq->lock * mutex->wait_lock + * p->blocked_lock * * Returns the task that is going to be used as execution context (the one * that is actually going to be run on cpu_of(rq)). @@ -6606,8 +6607,9 @@ find_proxy_task(struct rq *rq, struct task_struct *do= nor, struct rq_flags *rf) * and ensure @owner sticks around. */ guard(raw_spinlock)(&mutex->wait_lock); + guard(raw_spinlock)(&p->blocked_lock); =20 - /* Check again that p is blocked with wait_lock held */ + /* Check again that p is blocked with blocked_lock held */ if (mutex !=3D __get_task_blocked_on(p)) { /* * Something changed in the blocked_on chain and --=20 2.53.0.1018.g2bb0e51243-goog From nobody Fri Apr 3 08:34:57 2026 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6D26D3BC691 for ; Tue, 24 Mar 2026 19:13:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774379630; cv=none; b=lOVA97O6D2pVwiDu4ayn8bE5dSnYjK6CS3QS3WKaY5BUDl+2AVhtzXznDY8J7kEQIc9/BJnRIZc9KefWoev4ZcV6a+fIyMo2SPY8xL6bz0kA0OJeiEEZ7W8hoctUmptp8Q6H/YWyeKFuyShVGHQnn9wEdIY/pmwRB4QmJPEF8Nw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774379630; c=relaxed/simple; bh=msTCyA1PkdHuWuPlFQDkzCVw5Fo1Q6YWL1Ro+s/waj0=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; 
Date: Tue, 24 Mar 2026 19:13:20 +0000 In-Reply-To: <20260324191337.1841376-1-jstultz@google.com> Mime-Version: 1.0 References: <20260324191337.1841376-1-jstultz@google.com> X-Mailer: git-send-email 2.53.0.1018.g2bb0e51243-goog Message-ID: <20260324191337.1841376-6-jstultz@google.com> Subject: [PATCH v26 05/10] sched: Fix modifying donor->blocked on without proper locking From: John Stultz To: LKML Cc: John Stultz , K Prateek Nayak , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Mel Gorman , Will Deacon , Waiman Long , Boqun Feng , "Paul E. 
McKenney" , Metin Kaya , Xuewen Yan , Thomas Gleixner , Daniel Lezcano , Suleiman Souhlal , kuyo chang , hupu , kernel-team@android.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Introduce an action enum in find_proxy_task() which allows us to handle work needed to be done outside the mutex.wait_lock and task.blocked_lock guard scopes. This ensures proper locking when we clear the donor's blocked_on pointer in proxy_deactivate(), and the switch statement will be useful as we add more cases to handle later in this series. Reviewed-by: K Prateek Nayak Signed-off-by: John Stultz --- v23: * Split out from earlier patch. v24: * Minor re-ordering local variables to keep with style as suggested by K Prateek Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Mel Gorman Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: Suleiman Souhlal Cc: kuyo chang Cc: hupu Cc: kernel-team@android.com --- kernel/sched/core.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 7187c63174cd2..c43e7926fda51 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6571,7 +6571,7 @@ static struct task_struct *proxy_deactivate(struct rq= *rq, struct task_struct *d * as unblocked, as we aren't doing proxy-migrations * yet (more logic will be needed then). 
*/ - donor->blocked_on =3D NULL; + clear_task_blocked_on(donor, NULL); } return NULL; } @@ -6595,6 +6595,7 @@ static struct task_struct *proxy_deactivate(struct rq= *rq, struct task_struct *d static struct task_struct * find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags = *rf) { + enum { FOUND, DEACTIVATE_DONOR } action =3D FOUND; struct task_struct *owner =3D NULL; int this_cpu =3D cpu_of(rq); struct task_struct *p; @@ -6628,12 +6629,14 @@ find_proxy_task(struct rq *rq, struct task_struct *= donor, struct rq_flags *rf) =20 if (!READ_ONCE(owner->on_rq) || owner->se.sched_delayed) { /* XXX Don't handle blocked owners/delayed dequeue yet */ - return proxy_deactivate(rq, donor); + action =3D DEACTIVATE_DONOR; + break; } =20 if (task_cpu(owner) !=3D this_cpu) { /* XXX Don't handle migrations yet */ - return proxy_deactivate(rq, donor); + action =3D DEACTIVATE_DONOR; + break; } =20 if (task_on_rq_migrating(owner)) { @@ -6691,6 +6694,13 @@ find_proxy_task(struct rq *rq, struct task_struct *d= onor, struct rq_flags *rf) */ } =20 + /* Handle actions we need to do outside of the guard() scope */ + switch (action) { + case DEACTIVATE_DONOR: + return proxy_deactivate(rq, donor); + case FOUND: + /* fallthrough */; + } WARN_ON_ONCE(owner && !owner->on_rq); return owner; } --=20 2.53.0.1018.g2bb0e51243-goog From nobody Fri Apr 3 08:34:57 2026 Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D96443BE65E for ; Tue, 24 Mar 2026 19:13:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774379632; cv=none; 
Date: Tue, 24 Mar 2026 19:13:21 +0000 In-Reply-To: <20260324191337.1841376-1-jstultz@google.com> Mime-Version: 1.0 References: <20260324191337.1841376-1-jstultz@google.com> X-Mailer: git-send-email 2.53.0.1018.g2bb0e51243-goog Message-ID: <20260324191337.1841376-7-jstultz@google.com> Subject: [PATCH v26 06/10] sched/locking: Add special p->blocked_on==PROXY_WAKING value for proxy return-migration From: John Stultz To: LKML Cc: John Stultz , K Prateek Nayak , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven 
Rostedt , Ben Segall , Zimuzo Ezeozue , Mel Gorman , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , Thomas Gleixner , Daniel Lezcano , Suleiman Souhlal , kuyo chang , hupu , kernel-team@android.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" As we add functionality to proxy execution, we may migrate a donor task to a runqueue where it can't run due to cpu affinity. Thus, we must be careful to ensure we return-migrate the task back to a cpu in its cpumask when it becomes unblocked. Peter helpfully provided the following example with pictures: "Suppose we have a ww_mutex cycle: ,-+-* Mutex-1 <-. Task-A ---' | | ,-- Task-B `-> Mutex-2 *-+-' Where Task-A holds Mutex-1 and tries to acquire Mutex-2, and where Task-B holds Mutex-2 and tries to acquire Mutex-1. Then the blocked_on->owner chain will go in circles. Task-A -> Mutex-2 ^ | | v Mutex-1 <- Task-B We need two things: - find_proxy_task() to stop iterating the circle; - the woken task to 'unblock' and run, such that it can back-off and re-try the transaction. Now, the current code [without this patch] does: __clear_task_blocked_on(); wake_q_add(); And surely clearing ->blocked_on is sufficient to break the cycle. Suppose it is Task-B that is made to back-off, then we have: Task-A -> Mutex-2 -> Task-B (no further blocked_on) and it would attempt to run Task-B. Or worse, it could directly pick Task-B and run it, without ever getting into find_proxy_task(). Now, here is a problem because Task-B might not be runnable on the CPU it is currently on; and because !task_is_blocked() we don't get into the proxy paths, so nobody is going to fix this up. Ideally we would have dequeued Task-B alongside of clearing ->blocked_on, but alas, [the lock ordering prevents us from getting the task_rq_lock() and] spoils things." Thus we need more than just a binary concept of the task being blocked on a mutex or not. 
So allow setting blocked_on to PROXY_WAKING as a special value which specifies the task is no longer blocked, but needs to be evaluated for return migration *before* it can be run. This will then be used in a later patch to handle proxy return-migration. Reviewed-by: K Prateek Nayak Signed-off-by: John Stultz --- v15: * Split blocked_on_state into its own patch later in the series, as the tri-state isn't necessary until we deal with proxy/return migrations v16: * Handle case where task in the chain is being set as BO_WAKING by another cpu (usually via ww_mutex die code). Make sure we release the rq lock so the wakeup can complete. * Rework to use guard() in find_proxy_task() as suggested by Peter v18: * Add initialization of blocked_on_state for init_task v19: * PREEMPT_RT build fixups and rework suggested by K Prateek Nayak v20: * Simplify one of the blocked_on_state changes to avoid extra PREEMPT_RT conditionals v21: * Slight reworks due to avoiding nested blocked_lock locking * Be consistent in use of blocked_on_state helper functions * Rework calls to proxy_deactivate() to do proper locking around blocked_on_state changes that we were cheating in previous versions. * Minor cleanups, some comment improvements v22: * Re-order blocked_on_state helpers to try to make it clearer that set_task_blocked_on() and clear_task_blocked_on() are the main enter/exit states and the blocked_on_state helpers help manage the transition states within. Per feedback from K Prateek Nayak. * Rework blocked_on_state to be defined within CONFIG_SCHED_PROXY_EXEC as suggested by K Prateek Nayak. * Reworked empty stub functions to just take one line as suggested by K Prateek * Avoid using gotos out of a guard() scope, as highlighted by K Prateek, and instead rework logic to break and switch() on an action value. v23: * Big rework to use PROXY_WAKING instead of blocked_on_state as suggested by Peter. 
* Reworked commit message to include Peter's nice diagrams and example for why this extra state is necessary. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Mel Gorman Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: Suleiman Souhlal Cc: kuyo chang Cc: hupu Cc: kernel-team@android.com --- include/linux/sched.h | 51 +++++++++++++++++++++++++++++++++++++-- kernel/locking/mutex.c | 2 +- kernel/locking/ww_mutex.h | 16 ++++++------ kernel/sched/core.c | 16 ++++++++++++ 4 files changed, 74 insertions(+), 11 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 2eef9bc6daaab..8ec3b6d7d718b 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2180,10 +2180,20 @@ extern int __cond_resched_rwlock_write(rwlock_t *lo= ck) __must_hold(lock); }) =20 #ifndef CONFIG_PREEMPT_RT + +/* + * With proxy exec, if a task has been proxy-migrated, it may be a donor + * on a cpu that it can't actually run on. Thus we need a special state + * to denote that the task is being woken, but that it needs to be + * evaluated for return-migration before it is run. So if the task is + * blocked_on PROXY_WAKING, return migrate it before running it. + */ +#define PROXY_WAKING ((struct mutex *)(-1L)) + static inline struct mutex *__get_task_blocked_on(struct task_struct *p) { lockdep_assert_held_once(&p->blocked_lock); - return p->blocked_on; + return p->blocked_on =3D=3D PROXY_WAKING ? 
NULL : p->blocked_on; } =20 static inline void __set_task_blocked_on(struct task_struct *p, struct mut= ex *m) @@ -2211,7 +2221,7 @@ static inline void __clear_task_blocked_on(struct tas= k_struct *p, struct mutex * * blocked_on relationships, but make sure we are not * clearing the relationship with a different lock. */ - WARN_ON_ONCE(m && p->blocked_on && p->blocked_on !=3D m); + WARN_ON_ONCE(m && p->blocked_on && p->blocked_on !=3D m && p->blocked_on = !=3D PROXY_WAKING); p->blocked_on =3D NULL; } =20 @@ -2220,6 +2230,35 @@ static inline void clear_task_blocked_on(struct task= _struct *p, struct mutex *m) guard(raw_spinlock_irqsave)(&p->blocked_lock); __clear_task_blocked_on(p, m); } + +static inline void __set_task_blocked_on_waking(struct task_struct *p, str= uct mutex *m) +{ + /* Currently we serialize blocked_on under the task::blocked_lock */ + lockdep_assert_held_once(&p->blocked_lock); + + if (!sched_proxy_exec()) { + __clear_task_blocked_on(p, m); + return; + } + + /* Don't set PROXY_WAKING if blocked_on was already cleared */ + if (!p->blocked_on) + return; + /* + * There may be cases where we set PROXY_WAKING on tasks that were + * already set to waking, but make sure we are not changing + * the relationship with a different lock. 
+ */ + WARN_ON_ONCE(m && p->blocked_on !=3D m && p->blocked_on !=3D PROXY_WAKING= ); + p->blocked_on =3D PROXY_WAKING; +} + +static inline void set_task_blocked_on_waking(struct task_struct *p, struc= t mutex *m) +{ + guard(raw_spinlock_irqsave)(&p->blocked_lock); + __set_task_blocked_on_waking(p, m); +} + #else static inline void __clear_task_blocked_on(struct task_struct *p, struct r= t_mutex *m) { @@ -2228,6 +2267,14 @@ static inline void __clear_task_blocked_on(struct ta= sk_struct *p, struct rt_mute static inline void clear_task_blocked_on(struct task_struct *p, struct rt_= mutex *m) { } + +static inline void __set_task_blocked_on_waking(struct task_struct *p, str= uct rt_mutex *m) +{ +} + +static inline void set_task_blocked_on_waking(struct task_struct *p, struc= t rt_mutex *m) +{ +} #endif /* !CONFIG_PREEMPT_RT */ =20 static __always_inline bool need_resched(void) diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 4aa79bcab08c7..7d359647156df 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -983,7 +983,7 @@ static noinline void __sched __mutex_unlock_slowpath(st= ruct mutex *lock, unsigne next =3D waiter->task; =20 debug_mutex_wake_waiter(lock, waiter); - clear_task_blocked_on(next, lock); + set_task_blocked_on_waking(next, lock); wake_q_add(&wake_q, next); } =20 diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h index e4a81790ea7dd..5cd9dfa4b31e6 100644 --- a/kernel/locking/ww_mutex.h +++ b/kernel/locking/ww_mutex.h @@ -285,11 +285,11 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITE= R *waiter, debug_mutex_wake_waiter(lock, waiter); #endif /* - * When waking up the task to die, be sure to clear the - * blocked_on pointer. Otherwise we can see circular - * blocked_on relationships that can't resolve. + * When waking up the task to die, be sure to set the + * blocked_on to PROXY_WAKING. Otherwise we can see + * circular blocked_on relationships that can't resolve. 
*/ - clear_task_blocked_on(waiter->task, lock); + set_task_blocked_on_waking(waiter->task, lock); wake_q_add(wake_q, waiter->task); } =20 @@ -339,15 +339,15 @@ static bool __ww_mutex_wound(struct MUTEX *lock, */ if (owner !=3D current) { /* - * When waking up the task to wound, be sure to clear the - * blocked_on pointer. Otherwise we can see circular - * blocked_on relationships that can't resolve. + * When waking up the task to wound, be sure to set the + * blocked_on to PROXY_WAKING. Otherwise we can see + * circular blocked_on relationships that can't resolve. * * NOTE: We pass NULL here instead of lock, because we * are waking the mutex owner, who may be currently * blocked on a different mutex. */ - clear_task_blocked_on(owner, NULL); + set_task_blocked_on_waking(owner, NULL); wake_q_add(wake_q, owner); } return true; diff --git a/kernel/sched/core.c b/kernel/sched/core.c index c43e7926fda51..aa2e7287235e3 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4242,6 +4242,13 @@ int try_to_wake_up(struct task_struct *p, unsigned i= nt state, int wake_flags) ttwu_queue(p, cpu, wake_flags); } out: + /* + * For now, if we've been woken up, clear the task->blocked_on + * regardless if it was set to a mutex or PROXY_WAKING so the + * task can run. We will need to be more careful later when + * properly handling proxy migration + */ + clear_task_blocked_on(p, NULL); if (success) ttwu_stat(p, task_cpu(p), wake_flags); =20 @@ -6603,6 +6610,10 @@ find_proxy_task(struct rq *rq, struct task_struct *d= onor, struct rq_flags *rf) =20 /* Follow blocked_on chain. */ for (p =3D donor; (mutex =3D p->blocked_on); p =3D owner) { + /* if its PROXY_WAKING, resched_idle so ttwu can complete */ + if (mutex =3D=3D PROXY_WAKING) + return proxy_resched_idle(rq); + /* * By taking mutex->wait_lock we hold off concurrent mutex_unlock() * and ensure @owner sticks around. 
@@ -6623,6 +6634,11 @@ find_proxy_task(struct rq *rq, struct task_struct *d= onor, struct rq_flags *rf) =20 owner =3D __mutex_owner(mutex); if (!owner) { + /* + * If there is no owner, clear blocked_on + * and return p so it can run and try to + * acquire the lock + */ __clear_task_blocked_on(p, mutex); return p; } --=20 2.53.0.1018.g2bb0e51243-goog From nobody Fri Apr 3 08:34:57 2026 Received: from mail-pl1-f201.google.com (mail-pl1-f201.google.com [209.85.214.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8A7A33BFE34 for ; Tue, 24 Mar 2026 19:13:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774379634; cv=none; b=ESxXtSTWCGjKIPMOaI2FDhFWvQKBFQxsDtbu04f91c503w0C1Y4rhsB58+T1M1YWhf2NnO6czdeEFq1Kj1fAUlMOPaHXrTAuyIEuv1fXcrKy/QoLs3GYjzGrSnVZjfcR158zlDQK9kTYIcnpsZ1VTQfDJIn+FHIzN31vM/02zGM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774379634; c=relaxed/simple; bh=/eZJFY5uhia92AI7Ho19mC2qaMCeVxXvpv/NzMKp2sA=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=OSNSwtn5PdIoCvC0PspEu7gA2IFKXBFNdW9IHl9BQ4hfwGmUAy/0PBS9jpn4orvSBrPVqGIOOWT045T+3jgAdt45kVoRbAu1C46XLwy2IMjQcPdn4oVf6Kfm2vplD2M6TjIpR3JB5I0yo0zfUcnaXaJ5tUikkMsirDSiG5FsLr4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=tDxx9Q9R; arc=none smtp.client-ip=209.85.214.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: 
Date: Tue, 24 Mar 2026 19:13:22 +0000
In-Reply-To: <20260324191337.1841376-1-jstultz@google.com>
References: <20260324191337.1841376-1-jstultz@google.com>
Message-ID: <20260324191337.1841376-8-jstultz@google.com>
Subject: [PATCH v26 07/10] sched: Add assert_balance_callbacks_empty helper
From: John Stultz
To: LKML
Cc: John Stultz, Peter Zijlstra, K Prateek Nayak, Joel Fernandes,
 Qais Yousef, Ingo Molnar, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall,
 Zimuzo Ezeozue, Mel Gorman, Will Deacon, Waiman Long, Boqun Feng,
 "Paul E. McKenney", Metin Kaya, Xuewen Yan, Thomas Gleixner,
 Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu,
 kernel-team@android.com

With proxy-exec utilizing pick-again logic, we can end up having
balance callbacks set by the previous pick_next_task() call left on
the list. So pull the warning out into a helper function, and make
sure we check it when we pick again.

Suggested-by: Peter Zijlstra
Reviewed-by: K Prateek Nayak
Signed-off-by: John Stultz
---
v24:
* Use IS_ENABLED() as suggested by K Prateek

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E.
McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 kernel/sched/core.c  | 1 +
 kernel/sched/sched.h | 9 ++++++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index aa2e7287235e3..b316b6015ffea 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6856,6 +6856,7 @@ static void __sched notrace __schedule(int sched_mode)
 	}

 pick_again:
+	assert_balance_callbacks_empty(rq);
 	next = pick_next_task(rq, rq->donor, &rf);
 	rq->next_class = next->sched_class;
 	if (sched_proxy_exec()) {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 43bbf0693cca4..2a0236d745832 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1853,6 +1853,13 @@ static inline void scx_rq_clock_update(struct rq *rq, u64 clock) {}
 static inline void scx_rq_clock_invalidate(struct rq *rq) {}
 #endif /* !CONFIG_SCHED_CLASS_EXT */

+static inline void assert_balance_callbacks_empty(struct rq *rq)
+{
+	WARN_ON_ONCE(IS_ENABLED(CONFIG_PROVE_LOCKING) &&
+		     rq->balance_callback &&
+		     rq->balance_callback != &balance_push_callback);
+}
+
 /*
  * Lockdep annotation that avoids accidental unlocks; it's like a
  * sticky/continuous lockdep_assert_held().
@@ -1869,7 +1876,7 @@ static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)

 	rq->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
 	rf->clock_update_flags = 0;
-	WARN_ON_ONCE(rq->balance_callback && rq->balance_callback != &balance_push_callback);
+	assert_balance_callbacks_empty(rq);
 }

 static inline void rq_unpin_lock(struct rq *rq, struct rq_flags *rf)
--
2.53.0.1018.g2bb0e51243-goog

From nobody Fri Apr 3 08:34:57 2026
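The helper above leans on IS_ENABLED() so the comparison compiles away entirely when CONFIG_PROVE_LOCKING is off. A rough userspace model of the same shape — the config macro, warn_on_once() counter, and rq layout here are all stand-ins for this sketch, not the kernel's — behaves like this:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for IS_ENABLED(CONFIG_PROVE_LOCKING); assumed on here. */
#define IS_ENABLED_PROVE_LOCKING 1

static int warn_count;

/* Simplified WARN_ON_ONCE(): records only the first hit. */
static bool warn_on_once(bool cond)
{
        static bool warned;

        if (cond && !warned) {
                warned = true;
                warn_count++;
        }
        return cond;
}

struct balance_callback { struct balance_callback *next; };
static struct balance_callback balance_push_callback;

struct rq { struct balance_callback *balance_callback; };

/*
 * Mirrors the patch's helper: warn (once) if any callback other than
 * the special balance_push_callback is still queued on the rq.
 */
static void assert_balance_callbacks_empty(struct rq *rq)
{
        warn_on_once(IS_ENABLED_PROVE_LOCKING &&
                     rq->balance_callback &&
                     rq->balance_callback != &balance_push_callback);
}
```

When the config stand-in is 0, the `&&` chain is constant-false and an optimizing compiler drops the whole check, which is the point of the IS_ENABLED() idiom.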
Date: Tue, 24 Mar 2026 19:13:23 +0000
In-Reply-To: <20260324191337.1841376-1-jstultz@google.com>
References: <20260324191337.1841376-1-jstultz@google.com>
Message-ID: <20260324191337.1841376-9-jstultz@google.com>
Subject: [PATCH v26 08/10] sched: Add logic to zap balance callbacks if we pick again
From: John Stultz
To: LKML
Cc: John Stultz, K Prateek Nayak, Joel Fernandes, Qais Yousef,
 Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall,
 Zimuzo Ezeozue, Mel Gorman, Will Deacon, Waiman Long, Boqun Feng,
 "Paul E. McKenney", Metin Kaya, Xuewen Yan, Thomas Gleixner,
 Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu,
 kernel-team@android.com

With proxy-exec, a task is selected to run via pick_next_task(), and
then, if it is a mutex-blocked task, we call find_proxy_task() to find
a runnable owner. If the runnable owner is on another cpu, we will
need to migrate the selected donor task away, after which the
pick_again path can call pick_next_task() to choose something else.

However, in the first call to pick_next_task(), we may have had a
balance_callback set up by the class scheduler. After we pick again,
it's possible pick_next_task_fair() will be called, which calls
sched_balance_newidle() and sched_balance_rq(). This will throw a
warning:

[    8.796467] rq->balance_callback && rq->balance_callback != &balance_push_callback
[    8.796467] WARNING: CPU: 32 PID: 458 at kernel/sched/sched.h:1750 sched_balance_rq+0xe92/0x1250
...
[    8.796467] Call Trace:
[    8.796467]  ? __warn.cold+0xb2/0x14e
[    8.796467]  ? sched_balance_rq+0xe92/0x1250
[    8.796467]  ? report_bug+0x107/0x1a0
[    8.796467]  ? handle_bug+0x54/0x90
[    8.796467]  ? exc_invalid_op+0x17/0x70
[    8.796467]  ? asm_exc_invalid_op+0x1a/0x20
[    8.796467]  ? sched_balance_rq+0xe92/0x1250
[    8.796467]  sched_balance_newidle+0x295/0x820
[    8.796467]  pick_next_task_fair+0x51/0x3f0
[    8.796467]  __schedule+0x23a/0x14b0
[    8.796467]  ? lock_release+0x16d/0x2e0
[    8.796467]  schedule+0x3d/0x150
[    8.796467]  worker_thread+0xb5/0x350
[    8.796467]  ? __pfx_worker_thread+0x10/0x10
[    8.796467]  kthread+0xee/0x120
[    8.796467]  ? __pfx_kthread+0x10/0x10
[    8.796467]  ret_from_fork+0x31/0x50
[    8.796467]  ? __pfx_kthread+0x10/0x10
[    8.796467]  ret_from_fork_asm+0x1a/0x30
[    8.796467]

This is because if an RT task was originally picked, it will set up
the rq->balance_callback with push_rt_tasks() via set_next_task_rt().
Once the task is migrated away and we pick again, we haven't processed
any balance callbacks, so rq->balance_callback is not in the same
state as it was the first time pick_next_task() was called.

To handle this, add a zap_balance_callbacks() helper function which
cleans up the balance callbacks without running them. This should be
ok, as we are effectively undoing the state set in the first call to
pick_next_task(), and when we pick again, the new callback can be
configured for the donor task actually selected.
Reviewed-by: K Prateek Nayak
Signed-off-by: John Stultz
---
v20:
* Tweaked to avoid build issues with different configs
v22:
* Spelling fix suggested by K Prateek
* Collapsed the stub implementation to one line as suggested by K Prateek
* Zap callbacks when we resched idle, as suggested by K Prateek
v24:
* Don't conditionalize function on CONFIG_SCHED_PROXY_EXEC as the
  callers will be optimized out if that is unset, and the dead
  function will be removed, as suggested by K Prateek

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 kernel/sched/core.c | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b316b6015ffea..4ed24ef590f73 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4920,6 +4920,34 @@ static inline void finish_task(struct task_struct *prev)
 	smp_store_release(&prev->on_cpu, 0);
 }

+/*
+ * Only called from __schedule context.
+ *
+ * There are some cases where we are going to re-do the action
+ * that added the balance callbacks. We may not be in a state
+ * where we can run them, so just zap them so they can be
+ * properly re-added on the next time around. This is similar
+ * handling to running the callbacks, except we just don't call
+ * them.
+ */
+static void zap_balance_callbacks(struct rq *rq)
+{
+	struct balance_callback *next, *head;
+	bool found = false;
+
+	lockdep_assert_rq_held(rq);
+
+	head = rq->balance_callback;
+	while (head) {
+		if (head == &balance_push_callback)
+			found = true;
+		next = head->next;
+		head->next = NULL;
+		head = next;
+	}
+	rq->balance_callback = found ? &balance_push_callback : NULL;
+}
+
 static void do_balance_callbacks(struct rq *rq, struct balance_callback *head)
 {
 	void (*func)(struct rq *rq);
@@ -6865,10 +6893,14 @@ static void __sched notrace __schedule(int sched_mode)
 		rq_set_donor(rq, next);
 		if (unlikely(next->blocked_on)) {
 			next = find_proxy_task(rq, next, &rf);
-			if (!next)
+			if (!next) {
+				zap_balance_callbacks(rq);
 				goto pick_again;
-			if (next == rq->idle)
+			}
+			if (next == rq->idle) {
+				zap_balance_callbacks(rq);
 				goto keep_resched;
+			}
 		}
 		if (rq->donor == prev_donor && prev != next) {
 			struct task_struct *donor = rq->donor;
--
2.53.0.1018.g2bb0e51243-goog

From nobody Fri Apr 3 08:34:57 2026
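The zap logic is a plain singly-linked-list unlink that preserves one special node. A standalone model of the same walk (toy types standing in for the kernel's; lockdep_assert_rq_held() omitted):

```c
#include <assert.h>
#include <stddef.h>

struct balance_callback { struct balance_callback *next; };

/* Stand-in for the kernel's special push callback that must survive. */
static struct balance_callback balance_push_callback;

struct rq { struct balance_callback *balance_callback; };

/*
 * Mirrors the patch: unlink every queued callback without running
 * it, but leave balance_push_callback installed if it was present.
 */
static void zap_balance_callbacks(struct rq *rq)
{
        struct balance_callback *next, *head = rq->balance_callback;
        int found = 0;

        while (head) {
                if (head == &balance_push_callback)
                        found = 1;
                next = head->next;
                head->next = NULL;   /* detach node without invoking it */
                head = next;
        }
        rq->balance_callback = found ? &balance_push_callback : NULL;
}
```

The `found` flag is the subtle part: balance_push_callback marks a CPU going down, so that one callback must stay queued even though everything else is discarded.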
Date: Tue, 24 Mar 2026 19:13:24 +0000
In-Reply-To: <20260324191337.1841376-1-jstultz@google.com>
References: <20260324191337.1841376-1-jstultz@google.com>
Message-ID: <20260324191337.1841376-10-jstultz@google.com>
Subject: [PATCH v26 09/10] sched: Move attach_one_task and attach_task helpers to sched.h
From: John Stultz
To: LKML
Cc: John Stultz, K Prateek Nayak, Joel Fernandes, Qais Yousef,
 Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall,
 Zimuzo Ezeozue, Mel Gorman, Will Deacon, Waiman Long, Boqun Feng,
 "Paul E.
McKenney", Metin Kaya, Xuewen Yan, Thomas Gleixner, Daniel Lezcano,
 Suleiman Souhlal, kuyo chang, hupu, kernel-team@android.com

The fair scheduler locally introduced attach_one_task() and
attach_task() helpers, but these could be generically useful, so move
this code to sched.h so we can use them elsewhere.

One minor tweak is made to utilize guard(rq_lock)(rq) to simplify the
function.

Suggested-by: K Prateek Nayak
Reviewed-by: K Prateek Nayak
Signed-off-by: John Stultz
---
v26:
* Folded in switch to use guard(rq_lock)(rq) as suggested by K Prateek

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 kernel/sched/fair.c  | 26 --------------------------
 kernel/sched/sched.h | 23 +++++++++++++++++++++++
 2 files changed, 23 insertions(+), 26 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bf948db905ed1..53da01a251487 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9784,32 +9784,6 @@ static int detach_tasks(struct lb_env *env)
 	return detached;
 }

-/*
- * attach_task() -- attach the task detached by detach_task() to its new rq.
- */
-static void attach_task(struct rq *rq, struct task_struct *p)
-{
-	lockdep_assert_rq_held(rq);
-
-	WARN_ON_ONCE(task_rq(p) != rq);
-	activate_task(rq, p, ENQUEUE_NOCLOCK);
-	wakeup_preempt(rq, p, 0);
-}
-
-/*
- * attach_one_task() -- attaches the task returned from detach_one_task() to
- * its new rq.
- */
-static void attach_one_task(struct rq *rq, struct task_struct *p)
-{
-	struct rq_flags rf;
-
-	rq_lock(rq, &rf);
-	update_rq_clock(rq);
-	attach_task(rq, p);
-	rq_unlock(rq, &rf);
-}
-
 /*
  * attach_tasks() -- attaches all tasks detached by detach_tasks() to their
  * new rq.
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2a0236d745832..d4def70df05a6 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3008,6 +3008,29 @@ extern void deactivate_task(struct rq *rq, struct task_struct *p, int flags);

 extern void wakeup_preempt(struct rq *rq, struct task_struct *p, int flags);

+/*
+ * attach_task() -- attach the task detached by detach_task() to its new rq.
+ */
+static inline void attach_task(struct rq *rq, struct task_struct *p)
+{
+	lockdep_assert_rq_held(rq);
+
+	WARN_ON_ONCE(task_rq(p) != rq);
+	activate_task(rq, p, ENQUEUE_NOCLOCK);
+	wakeup_preempt(rq, p, 0);
+}
+
+/*
+ * attach_one_task() -- attaches the task returned from detach_one_task() to
+ * its new rq.
+ */
+static inline void attach_one_task(struct rq *rq, struct task_struct *p)
+{
+	guard(rq_lock)(rq);
+	update_rq_clock(rq);
+	attach_task(rq, p);
+}
+
 #ifdef CONFIG_PREEMPT_RT
 # define SCHED_NR_MIGRATE_BREAK 8
 #else
--
2.53.0.1018.g2bb0e51243-goog

From nobody Fri Apr 3 08:34:57 2026
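guard(rq_lock)(rq) above comes from the kernel's scope-based cleanup machinery, so the rq lock is dropped automatically when attach_one_task() returns, which is what let the explicit rq_unlock() be deleted. A toy userspace analogue using the GCC/Clang `cleanup` attribute — the flag-based "lock" and all names here are illustrative stand-ins, not kernel code:

```c
#include <assert.h>

static int rq_locked; /* toy lock state for the sketch */

struct rq_guard { int *lock; };

static void rq_guard_release(struct rq_guard *g)
{
        *g->lock = 0; /* auto-"unlock" when the guard leaves scope */
}

/* Rough analogue of guard(rq_lock)(rq): acquire now, release at scope exit. */
#define RQ_GUARD(lockp)                                                  \
        struct rq_guard _g __attribute__((cleanup(rq_guard_release))) = \
                { .lock = (lockp) };                                     \
        *(lockp) = 1

static int attach_one_task_sketch(void)
{
        RQ_GUARD(&rq_locked);
        assert(rq_locked == 1); /* "rq lock" held for the whole body */
        return 0;               /* guard releases on any return path */
}
```

The design win is the same as in the patch: every early return releases the lock, so the body cannot leak it.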
Date: Tue, 24 Mar 2026 19:13:25 +0000
In-Reply-To: <20260324191337.1841376-1-jstultz@google.com>
References: <20260324191337.1841376-1-jstultz@google.com>
Message-ID: <20260324191337.1841376-11-jstultz@google.com>
Subject: [PATCH v26 10/10] sched: Handle blocked-waiter migration (and return migration)
From: John Stultz
To: LKML
Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar,
 Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
 Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue,
 Mel Gorman, Will Deacon, Waiman Long, Boqun Feng,
 "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak,
 Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
 hupu, kernel-team@android.com

Add logic to handle migrating a blocked waiter to a remote cpu where
the lock owner is runnable. Additionally, as the blocked task may not
be able to run on the remote cpu, add logic to handle return migration
once the waiting task is given the mutex.

Because tasks may get migrated to where they cannot run, also modify
the scheduling classes to avoid sched-class migrations on
mutex-blocked tasks, leaving find_proxy_task() and related logic to do
the migrations and return migrations.

This was split out from the larger proxy patch, and significantly
reworked.
Credits for the original patch go to:
  Peter Zijlstra (Intel)
  Juri Lelli
  Valentin Schneider
  Connor O'Brien

Signed-off-by: John Stultz
---
v6:
* Integrated sched_proxy_exec() check in proxy_return_migration()
* Minor cleanups to diff
* Unpin the rq before calling __balance_callbacks()
* Tweak proxy migrate to migrate deeper task in chain, to avoid tasks
  ping-ponging between rqs
v7:
* Fixup for unused function arguments
* Switch from that_rq -> target_rq, other minor tweaks, and typo fixes
  suggested by Metin Kaya
* Switch back to doing return migration in the ttwu path, which avoids
  nasty lock juggling and performance issues
* Fixes for UP builds
v8:
* More simplifications from Metin Kaya
* Fixes for null owner case, including doing return migration
* Cleanup proxy_needs_return logic
v9:
* Narrow logic in ttwu that sets BO_RUNNABLE, to avoid missed return
  migrations
* Switch to using zap_balance_callbacks rather than running them when
  we are dropping rq locks for proxy migration
* Drop task_is_blocked check in sched_submit_work as suggested by
  Metin (may re-add later if this causes trouble)
* Do return migration when we're not on wake_cpu. This avoids bad task
  placement caused by proxy migrations raised by Xuewen Yan
* Fix to call set_next_task(rq->curr) prior to dropping rq lock to
  avoid rq->curr getting migrated before we have actually switched
  from it
* Cleanup to re-use proxy_resched_idle() instead of open coding it in
  proxy_migrate_task()
* Fix return migration not to use DEQUEUE_SLEEP, so that we properly
  see the task as task_on_rq_migrating() after it is dequeued but
  before set_task_cpu() has been called on it
* Fix to broaden find_proxy_task() checks to avoid race where a task
  is dequeued off the rq due to return migration, but set_task_cpu()
  and the enqueue on another rq happened after we checked
  task_cpu(owner). This ensures we don't proxy using a task that is
  not actually on our runqueue.
* Cleanup to avoid the locked BO_WAKING->BO_RUNNABLE transition in
  try_to_wake_up() if proxy execution isn't enabled
* Cleanup to improve comment in proxy_migrate_task() explaining the
  set_next_task(rq->curr) logic
* Cleanup deadline.c change to stylistically match rt.c change
* Numerous cleanups suggested by Metin
v10:
* Drop WARN_ON(task_is_blocked(p)) in ttwu current case
v11:
* Include proxy_set_task_cpu from later in the series to this change
  so we can use it, rather than reworking logic later in the series
* Fix problem with return migration, where affinity was changed and
  wake_cpu was left outside the affinity mask
* Avoid reading the owner's cpu twice (as it might change in between)
  to avoid occasional migration-to-same-cpu edge cases
* Add extra WARN_ON checks for wake_cpu and return migration edge
  cases
* Typo fix from Metin
v13:
* As we set ret, return it, not just NULL (pulling this change in from
  later patch)
* Avoid deadlock between try_to_wake_up() and find_proxy_task() when
  blocked_on cycle with ww_mutex is trying a mid-chain wakeup.
* Tweaks to use new __set_blocked_on_runnable() helper
* Potential fix for incorrectly updated task->dl_server issues
* Minor comment improvements
* Add logic to handle missed wakeups, in that case doing return
  migration from the find_proxy_task() path
* Minor cleanups
v14:
* Improve edge cases where we wouldn't set the task as BO_RUNNABLE
v15:
* Added comment to better describe proxy_needs_return() as suggested
  by Qais
* Build fixes for !CONFIG_SMP reported by Maciej Żenczykowski
* Adds fix for re-evaluating proxy_needs_return when
  sched_proxy_exec() is disabled, reported and diagnosed by kuyo chang
v16:
* Larger rework of needs_return logic in find_proxy_task, in order to
  avoid problems with cpu hotplug
* Rework to use guard() as suggested by Peter
v18:
* Integrate optimization suggested by Suleiman to do the checks for
  sleeping owners before checking if the task_cpu is this_cpu, so that
  we can avoid needlessly proxy-migrating tasks to only then dequeue
  them. Also check if migrating last.
* Improve comments around guard locking
* Include tweak to ttwu_runnable() as suggested by hupu
* Rework the logic releasing the rq->donor reference before letting go
  of the rq lock. Just use rq->idle.
* Go back to doing return migration on BO_WAKING owners, as I was
  hitting some softlockups caused by running tasks not making it out
  of BO_WAKING.
v19:
* Fixed proxy_force_return() logic for !SMP cases
v21:
* Reworked donor deactivation for unhandled sleeping owners
* Commit message tweaks
v22:
* Add comments around zap_balance_callbacks in proxy_migration logic
* Rework logic to avoid gotos out of guard() scopes, and instead use
  break and switch() on the action value, as suggested by K Prateek
* K Prateek suggested simplifications around putting the donor and
  setting idle as the next task in the migration paths, which I
  further simplified to using proxy_resched_idle()
* Comment improvements
* Dropped curr != donor check in pick_next_task_fair(), suggested by
  K Prateek
v23:
* Rework to use the PROXY_WAKING approach suggested by Peter
* Drop unnecessarily setting wake_cpu when affinity changes, as
  noticed by Peter
* Split out the ttwu() logic changes into a later separate patch, as
  suggested by Peter
v24:
* Numerous fixes for rq clock handling, pointed out by K Prateek
* Slight tweak to where put_task() is called, suggested by K Prateek
v25:
* Use WF_TTWU in proxy_force_return(), suggested by K Prateek
* Drop get/put_task_struct() in proxy_force_return(), suggested by
  K Prateek
* Use attach_one_task() to reduce repetitive logic, as suggested by
  K Prateek
v26:
* Add context analysis fixups suggested by Peter
* Add proxy_release/reacquire_rq_lock helpers suggested by Peter
* Rework comments as suggested by Peter
* Rework logic to use scoped_guard (task_rq_lock, p), suggested by
  Peter
* Move the proxy_resched_idle() call up earlier, before the rq
  release in proxy_force_return(), as suggested by K Prateek
* If needed, mark the task PROXY_WAKING if try_to_block_task() fails
  due to a signal, as noted by K Prateek

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 kernel/sched/core.c | 225 ++++++++++++++++++++++++++++++++++++++------
 1 file changed, 197 insertions(+), 28 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4ed24ef590f73..49e4528450083 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3643,6 +3643,23 @@ void update_rq_avg_idle(struct rq *rq)
 		rq->idle_stamp = 0;
 }
 
+#ifdef CONFIG_SCHED_PROXY_EXEC
+static inline void proxy_set_task_cpu(struct task_struct *p, int cpu)
+{
+	unsigned int wake_cpu;
+
+	/*
+	 * Since we are enqueuing a blocked task on a cpu it may
+	 * not be able to run on, preserve wake_cpu when we
+	 * __set_task_cpu so we can return the task to where it
+	 * was previously runnable.
+	 */
+	wake_cpu = p->wake_cpu;
+	__set_task_cpu(p, cpu);
+	p->wake_cpu = wake_cpu;
+}
+#endif /* CONFIG_SCHED_PROXY_EXEC */
+
 static void
 ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
 		 struct rq_flags *rf)
@@ -4242,13 +4259,6 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 		ttwu_queue(p, cpu, wake_flags);
 	}
 out:
-	/*
-	 * For now, if we've been woken up, clear the task->blocked_on
-	 * regardless if it was set to a mutex or PROXY_WAKING so the
-	 * task can run. We will need to be more careful later when
-	 * properly handling proxy migration
-	 */
-	clear_task_blocked_on(p, NULL);
 	if (success)
 		ttwu_stat(p, task_cpu(p), wake_flags);
 
@@ -6533,6 +6543,8 @@ static bool try_to_block_task(struct rq *rq, struct task_struct *p,
 	if (signal_pending_state(task_state, p)) {
 		WRITE_ONCE(p->__state, TASK_RUNNING);
 		*task_state_p = TASK_RUNNING;
+		set_task_blocked_on_waking(p, NULL);
+
 		return false;
 	}
 
@@ -6578,7 +6590,7 @@ static inline struct task_struct *proxy_resched_idle(struct rq *rq)
 	return rq->idle;
 }
 
-static bool __proxy_deactivate(struct rq *rq, struct task_struct *donor)
+static bool proxy_deactivate(struct rq *rq, struct task_struct *donor)
 {
 	unsigned long state = READ_ONCE(donor->__state);
 
@@ -6598,17 +6610,140 @@ static bool __proxy_deactivate(struct rq *rq, struct task_struct *donor)
 	return try_to_block_task(rq, donor, &state, true);
 }
 
-static struct task_struct *proxy_deactivate(struct rq *rq, struct task_struct *donor)
+static inline void proxy_release_rq_lock(struct rq *rq, struct rq_flags *rf)
+	__releases(__rq_lockp(rq))
+{
+	/*
+	 * The class scheduler may have queued a balance callback
+	 * from pick_next_task() called earlier.
+	 *
+	 * So here we have to zap callbacks before unlocking the rq
+	 * as another CPU may jump in and call sched_balance_rq
+	 * which can trip the warning in rq_pin_lock() if we
+	 * leave callbacks set.
+	 *
+	 * After we later reacquire the rq lock, we will force __schedule()
+	 * to pick_again, so the callbacks will get re-established.
+	 */
+	zap_balance_callbacks(rq);
+	rq_unpin_lock(rq, rf);
+	raw_spin_rq_unlock(rq);
+}
+
+static inline void proxy_reacquire_rq_lock(struct rq *rq, struct rq_flags *rf)
+__acquires(__rq_lockp(rq))
+{
+	raw_spin_rq_lock(rq);
+	rq_repin_lock(rq, rf);
+	update_rq_clock(rq);
+}
+
+/*
+ * If the blocked-on relationship crosses CPUs, migrate @p to the
+ * owner's CPU.
+ *
+ * This is because we must respect the CPU affinity of execution
+ * contexts (owner) but we can ignore affinity for scheduling
+ * contexts (@p). So we have to move scheduling contexts towards
+ * potential execution contexts.
+ *
+ * Note: The owner can disappear, but simply migrate to @target_cpu
+ * and leave that CPU to sort things out.
+ */
+static void proxy_migrate_task(struct rq *rq, struct rq_flags *rf,
+			       struct task_struct *p, int target_cpu)
+	__must_hold(__rq_lockp(rq))
+{
+	struct rq *target_rq = cpu_rq(target_cpu);
+
+	lockdep_assert_rq_held(rq);
+	WARN_ON(p == rq->curr);
+	/*
+	 * Since we are migrating a blocked donor, it could be rq->donor,
+	 * and we want to make sure there aren't any references from this
+	 * rq to it before we drop the lock. This avoids another cpu
+	 * jumping in and grabbing the rq lock and referencing rq->donor
+	 * or cfs_rq->curr, etc after we have migrated it to another cpu,
+	 * and before we pick_again in __schedule.
+	 *
+	 * So call proxy_resched_idle() to drop the rq->donor references
+	 * before we release the lock.
+	 */
+	proxy_resched_idle(rq);
+
+	deactivate_task(rq, p, DEQUEUE_NOCLOCK);
+	proxy_set_task_cpu(p, target_cpu);
+
+	proxy_release_rq_lock(rq, rf);
+
+	attach_one_task(target_rq, p);
+
+	proxy_reacquire_rq_lock(rq, rf);
+}
+
+static void proxy_force_return(struct rq *rq, struct rq_flags *rf,
+			       struct task_struct *p)
+	__must_hold(__rq_lockp(rq))
 {
-	if (!__proxy_deactivate(rq, donor)) {
+	struct rq *task_rq, *target_rq = NULL;
+	int cpu, wake_flag = WF_TTWU;
+
+	lockdep_assert_rq_held(rq);
+	WARN_ON(p == rq->curr);
+
+	if (p == rq->donor)
+		proxy_resched_idle(rq);
+
+	proxy_release_rq_lock(rq, rf);
+	/*
+	 * We drop the rq lock, and re-grab task_rq_lock to get
+	 * the pi_lock (needed for select_task_rq) as well.
+	 */
+	scoped_guard (task_rq_lock, p) {
+		task_rq = scope.rq;
+
 		/*
-		 * XXX: For now, if deactivation failed, set donor
-		 * as unblocked, as we aren't doing proxy-migrations
-		 * yet (more logic will be needed then).
+		 * Since we let go of the rq lock, the task may have been
+		 * woken or migrated to another rq before we got the
+		 * task_rq_lock. So re-check we're on the same RQ. If
+		 * not, the task has already been migrated and that CPU
+		 * will handle any further migrations.
 		 */
-		clear_task_blocked_on(donor, NULL);
+		if (task_rq != rq)
+			break;
+
+		/*
+		 * Similarly, if we've been dequeued, someone else will
+		 * wake us
+		 */
+		if (!task_on_rq_queued(p))
+			break;
+
+		/*
+		 * Since we should only be calling here from __schedule()
+		 * -> find_proxy_task(), no one else should have
+		 * assigned current out from under us. But check and warn
+		 * if we see this, then bail.
+		 */
+		if (task_current(task_rq, p) || task_on_cpu(task_rq, p)) {
+			WARN_ONCE(1, "%s rq: %i current/on_cpu task %s %d on_cpu: %i\n",
+				  __func__, cpu_of(task_rq),
+				  p->comm, p->pid, p->on_cpu);
+			break;
+		}
+
+		update_rq_clock(task_rq);
+		deactivate_task(task_rq, p, DEQUEUE_NOCLOCK);
+		cpu = select_task_rq(p, p->wake_cpu, &wake_flag);
+		set_task_cpu(p, cpu);
+		target_rq = cpu_rq(cpu);
+		clear_task_blocked_on(p, NULL);
 	}
-	return NULL;
+
+	if (target_rq)
+		attach_one_task(target_rq, p);
+
+	proxy_reacquire_rq_lock(rq, rf);
 }
 
 /*
@@ -6629,18 +6764,27 @@ static struct task_struct *proxy_deactivate(struct rq *rq, struct task_struct *d
  */
 static struct task_struct *
 find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
+	__must_hold(__rq_lockp(rq))
 {
-	enum { FOUND, DEACTIVATE_DONOR } action = FOUND;
+	enum { FOUND, DEACTIVATE_DONOR, MIGRATE, NEEDS_RETURN } action = FOUND;
 	struct task_struct *owner = NULL;
+	bool curr_in_chain = false;
 	int this_cpu = cpu_of(rq);
 	struct task_struct *p;
 	struct mutex *mutex;
+	int owner_cpu;
 
 	/* Follow blocked_on chain. */
 	for (p = donor; (mutex = p->blocked_on); p = owner) {
-		/* if its PROXY_WAKING, resched_idle so ttwu can complete */
-		if (mutex == PROXY_WAKING)
-			return proxy_resched_idle(rq);
+		/* if it's PROXY_WAKING, do return migration or run if current */
+		if (mutex == PROXY_WAKING) {
+			if (task_current(rq, p)) {
+				clear_task_blocked_on(p, PROXY_WAKING);
+				return p;
+			}
+			action = NEEDS_RETURN;
+			break;
+		}
 
 		/*
		 * By taking mutex->wait_lock we hold off concurrent mutex_unlock()
@@ -6660,26 +6804,41 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
 			return NULL;
 		}
 
+		if (task_current(rq, p))
+			curr_in_chain = true;
+
 		owner = __mutex_owner(mutex);
 		if (!owner) {
 			/*
-			 * If there is no owner, clear blocked_on
-			 * and return p so it can run and try to
-			 * acquire the lock
+			 * If there is no owner, either clear blocked_on
+			 * and return p (if it is current and safe to
+			 * just run on this rq), or return-migrate the task.
 			 */
-			__clear_task_blocked_on(p, mutex);
-			return p;
+			if (task_current(rq, p)) {
+				__clear_task_blocked_on(p, NULL);
+				return p;
+			}
+			action = NEEDS_RETURN;
+			break;
 		}
 
 		if (!READ_ONCE(owner->on_rq) || owner->se.sched_delayed) {
 			/* XXX Don't handle blocked owners/delayed dequeue yet */
+			if (curr_in_chain)
+				return proxy_resched_idle(rq);
 			action = DEACTIVATE_DONOR;
 			break;
 		}
 
-		if (task_cpu(owner) != this_cpu) {
-			/* XXX Don't handle migrations yet */
-			action = DEACTIVATE_DONOR;
+		owner_cpu = task_cpu(owner);
+		if (owner_cpu != this_cpu) {
+			/*
+			 * @owner can disappear, simply migrate to @owner_cpu
+			 * and leave that CPU to sort things out.
+			 */
+			if (curr_in_chain)
+				return proxy_resched_idle(rq);
+			action = MIGRATE;
 			break;
 		}
 
@@ -6741,7 +6900,17 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
 	/* Handle actions we need to do outside of the guard() scope */
 	switch (action) {
 	case DEACTIVATE_DONOR:
-		return proxy_deactivate(rq, donor);
+		if (proxy_deactivate(rq, donor))
+			return NULL;
+		/* If deactivate fails, force return */
+		p = donor;
+		fallthrough;
+	case NEEDS_RETURN:
+		proxy_force_return(rq, rf, p);
+		return NULL;
+	case MIGRATE:
+		proxy_migrate_task(rq, rf, p, owner_cpu);
+		return NULL;
 	case FOUND:
 		/* fallthrough */;
 	}
-- 
2.53.0.1018.g2bb0e51243-goog