From nobody Wed Dec 31 06:36:33 2025
Date: Mon, 6 Nov 2023 19:34:44 +0000
In-Reply-To: <20231106193524.866104-1-jstultz@google.com>
References: <20231106193524.866104-1-jstultz@google.com>
Message-ID: <20231106193524.866104-2-jstultz@google.com>
Subject: [PATCH v6 01/20] sched: Unify runtime accounting across classes
From: John Stultz 
To: LKML 
Cc: Peter Zijlstra , Joel Fernandes , Qais Yousef , Ingo Molnar ,
 Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider ,
 Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat ,
 Mel Gorman , Daniel Bristot de Oliveira , Will Deacon ,
 Waiman Long , Boqun Feng , "Paul E . McKenney" , kernel-team@android.com,
 "Connor O'Brien" , John Stultz 

From: Peter Zijlstra 

All classes use sched_entity::exec_start to track runtime and have
copies of the exact same code around to compute runtime.

Collapse all that.

Cc: Joel Fernandes 
Cc: Qais Yousef 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Vincent Guittot 
Cc: Dietmar Eggemann 
Cc: Valentin Schneider 
Cc: Steven Rostedt 
Cc: Ben Segall 
Cc: Zimuzo Ezeozue 
Cc: Youssef Esmat 
Cc: Mel Gorman 
Cc: Daniel Bristot de Oliveira 
Cc: Will Deacon 
Cc: Waiman Long 
Cc: Boqun Feng 
Cc: "Paul E . McKenney" 
Cc: kernel-team@android.com
Signed-off-by: Peter Zijlstra (Intel) 
[fix conflicts, fold in update_current_exec_runtime]
Signed-off-by: Connor O'Brien 
[jstultz: rebased, resolving minor conflicts]
Signed-off-by: John Stultz 
---
NOTE: This patch is a general cleanup and if no one objects could be
merged at this point. If needed, I'll resend separately if it isn't
picked up on its own.
---
 include/linux/sched.h    |  2 +-
 kernel/sched/deadline.c  | 13 +++-------
 kernel/sched/fair.c      | 56 ++++++++++++++++++++++++++++++----------
 kernel/sched/rt.c        | 13 +++-------
 kernel/sched/sched.h     | 12 ++-------
 kernel/sched/stop_task.c | 13 +---------
 6 files changed, 52 insertions(+), 57 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 77f01ac385f7..4f5b0710c0f1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -520,7 +520,7 @@ struct sched_statistics {
 	u64			block_max;
 	s64			sum_block_runtime;
 
-	u64			exec_max;
+	s64			exec_max;
 	u64			slice_max;
 
 	u64			nr_migrations_cold;
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 58b542bf2893..9522e6607754 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1299,9 +1299,8 @@ static void update_curr_dl(struct rq *rq)
 {
 	struct task_struct *curr = rq->curr;
 	struct sched_dl_entity *dl_se = &curr->dl;
-	u64 delta_exec, scaled_delta_exec;
+	s64 delta_exec, scaled_delta_exec;
 	int cpu = cpu_of(rq);
-	u64 now;
 
 	if (!dl_task(curr) || !on_dl_rq(dl_se))
 		return;
@@ -1314,21 +1313,15 @@ static void update_curr_dl(struct rq *rq)
 	 * natural solution, but the full ramifications of this
 	 * approach need further study.
 	 */
-	now = rq_clock_task(rq);
-	delta_exec = now - curr->se.exec_start;
-	if (unlikely((s64)delta_exec <= 0)) {
+	delta_exec = update_curr_common(rq);
+	if (unlikely(delta_exec <= 0)) {
 		if (unlikely(dl_se->dl_yielded))
 			goto throttle;
 		return;
 	}
 
-	schedstat_set(curr->stats.exec_max,
-		      max(curr->stats.exec_max, delta_exec));
-	trace_sched_stat_runtime(curr, delta_exec, 0);
 
-	update_current_exec_runtime(curr, now, delta_exec);
-
 	if (dl_entity_is_special(dl_se))
 		return;
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index df348aa55d3c..c919633acd3d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1144,23 +1144,17 @@ static void update_tg_load_avg(struct cfs_rq *cfs_rq)
 }
 #endif /* CONFIG_SMP */
 
-/*
- * Update the current task's runtime statistics.
- */ -static void update_curr(struct cfs_rq *cfs_rq) +static s64 update_curr_se(struct rq *rq, struct sched_entity *curr) { - struct sched_entity *curr =3D cfs_rq->curr; - u64 now =3D rq_clock_task(rq_of(cfs_rq)); - u64 delta_exec; - - if (unlikely(!curr)) - return; + u64 now =3D rq_clock_task(rq); + s64 delta_exec; =20 delta_exec =3D now - curr->exec_start; - if (unlikely((s64)delta_exec <=3D 0)) - return; + if (unlikely(delta_exec <=3D 0)) + return delta_exec; =20 curr->exec_start =3D now; + curr->sum_exec_runtime +=3D delta_exec; =20 if (schedstat_enabled()) { struct sched_statistics *stats; @@ -1170,9 +1164,43 @@ static void update_curr(struct cfs_rq *cfs_rq) max(delta_exec, stats->exec_max)); } =20 - curr->sum_exec_runtime +=3D delta_exec; - schedstat_add(cfs_rq->exec_clock, delta_exec); + return delta_exec; +} + +/* + * Used by other classes to account runtime. + */ +s64 update_curr_common(struct rq *rq) +{ + struct task_struct *curr =3D rq->curr; + s64 delta_exec; =20 + delta_exec =3D update_curr_se(rq, &curr->se); + if (unlikely(delta_exec <=3D 0)) + return delta_exec; + + account_group_exec_runtime(curr, delta_exec); + cgroup_account_cputime(curr, delta_exec); + + return delta_exec; +} + +/* + * Update the current task's runtime statistics. + */ +static void update_curr(struct cfs_rq *cfs_rq) +{ + struct sched_entity *curr =3D cfs_rq->curr; + s64 delta_exec; + + if (unlikely(!curr)) + return; + + delta_exec =3D update_curr_se(rq_of(cfs_rq), curr); + if (unlikely(delta_exec <=3D 0)) + return; + + schedstat_add(cfs_rq->exec_clock, delta_exec); curr->vruntime +=3D calc_delta_fair(delta_exec, curr); update_deadline(cfs_rq, curr); update_min_vruntime(cfs_rq); diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 0597ba0f85ff..327ae4148aec 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -1046,24 +1046,17 @@ static void update_curr_rt(struct rq *rq) { struct task_struct *curr =3D rq->curr; struct sched_rt_entity *rt_se =3D &curr->rt; - u64 delta_exec; - u64 now; + s64 delta_exec; =20 if (curr->sched_class !=3D &rt_sched_class) return; =20 - now =3D rq_clock_task(rq); - delta_exec =3D now - curr->se.exec_start; - if (unlikely((s64)delta_exec <=3D 0)) + delta_exec =3D update_curr_common(rq); + if (unlikely(delta_exec < 0)) return; =20 - schedstat_set(curr->stats.exec_max, - max(curr->stats.exec_max, delta_exec)); - trace_sched_stat_runtime(curr, delta_exec, 0); =20 - update_current_exec_runtime(curr, now, delta_exec); - if (!rt_bandwidth_enabled()) return; =20 diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 04846272409c..1def5b7fa1df 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2228,6 +2228,8 @@ struct affinity_context { unsigned int flags; }; =20 +extern s64 update_curr_common(struct rq *rq); + struct sched_class { =20 #ifdef CONFIG_UCLAMP_TASK @@ -3280,16 +3282,6 @@ extern int sched_dynamic_mode(const char *str); extern void sched_dynamic_update(int mode); #endif =20 -static inline void update_current_exec_runtime(struct task_struct *curr, - u64 now, u64 delta_exec) -{ - curr->se.sum_exec_runtime +=3D delta_exec; - account_group_exec_runtime(curr, delta_exec); - - curr->se.exec_start =3D now; - cgroup_account_cputime(curr, delta_exec); -} - #ifdef CONFIG_SCHED_MM_CID =20 #define SCHED_MM_CID_PERIOD_NS (100ULL * 1000000) /* 100ms */ diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c index 85590599b4d6..7595494ceb6d 100644 --- a/kernel/sched/stop_task.c +++ b/kernel/sched/stop_task.c @@ -70,18 +70,7 @@ static void 
yield_task_stop(struct rq *rq)
 
 static void put_prev_task_stop(struct rq *rq, struct task_struct *prev)
 {
-	struct task_struct *curr = rq->curr;
-	u64 now, delta_exec;
-
-	now = rq_clock_task(rq);
-	delta_exec = now - curr->se.exec_start;
-	if (unlikely((s64)delta_exec < 0))
-		delta_exec = 0;
-
-	schedstat_set(curr->stats.exec_max,
-		      max(curr->stats.exec_max, delta_exec));
-
-	update_current_exec_runtime(curr, now, delta_exec);
+	update_curr_common(rq);
 }
 
 /*
-- 
2.42.0.869.gea05f2083d-goog

From nobody Wed Dec 31 06:36:33 2025
Date: Mon, 6 Nov 2023 19:34:45 +0000
In-Reply-To: <20231106193524.866104-1-jstultz@google.com>
References:
<20231106193524.866104-1-jstultz@google.com> X-Mailer: git-send-email 2.42.0.869.gea05f2083d-goog Message-ID: <20231106193524.866104-3-jstultz@google.com> Subject: [PATCH v6 02/20] locking/mutex: Removes wakeups from under mutex::wait_lock From: John Stultz To: LKML Cc: John Stultz , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Daniel Bristot de Oliveira , Will Deacon , Waiman Long , Boqun Feng , "Paul E . McKenney" , kernel-team@android.com Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In preparation to nest mutex::wait_lock under rq::lock we need to remove wakeups from under it. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E . McKenney" Cc: kernel-team@android.com Signed-off-by: Peter Zijlstra (Intel) [Heavily changed after 55f036ca7e74 ("locking: WW mutex cleanup") and 08295b3b5bee ("locking: Implement an algorithm choice for Wound-Wait mutexes")] Signed-off-by: Juri Lelli [jstultz: rebased to mainline, added extra wake_up_q & init to avoid hangs, similar to Connor's rework of this patch] Signed-off-by: John Stultz --- v5: * Reverted back to an earlier version of this patch to undo the change that kept the wake_q in the ctx structure, as that broke the rule that the wake_q must always be on the stack, as its not safe for concurrency. v6: * Made tweaks suggested by Waiman Long --- kernel/locking/mutex.c | 17 +++++++++++++---- kernel/locking/ww_mutex.h | 29 ++++++++++++++++++----------- 2 files changed, 31 insertions(+), 15 deletions(-) diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index d973fe6041bf..4ada158eb7ca 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -570,6 +570,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas struct lockdep_map *nest_lock, unsigned long ip, struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx) { + DEFINE_WAKE_Q(wake_q); struct mutex_waiter waiter; struct ww_mutex *ww; int ret; @@ -620,7 +621,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas */ if (__mutex_trylock(lock)) { if (ww_ctx) - __ww_mutex_check_waiters(lock, ww_ctx); + __ww_mutex_check_waiters(lock, ww_ctx, &wake_q); =20 goto skip_wait; } @@ -640,7 +641,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas * Add in stamp order, waking up waiters that must kill * themselves. 
*/ - ret =3D __ww_mutex_add_waiter(&waiter, lock, ww_ctx); + ret =3D __ww_mutex_add_waiter(&waiter, lock, ww_ctx, &wake_q); if (ret) goto err_early_kill; } @@ -676,6 +677,11 @@ __mutex_lock_common(struct mutex *lock, unsigned int s= tate, unsigned int subclas } =20 raw_spin_unlock(&lock->wait_lock); + /* Make sure we do wakeups before calling schedule */ + if (!wake_q_empty(&wake_q)) { + wake_up_q(&wake_q); + wake_q_init(&wake_q); + } schedule_preempt_disabled(); =20 first =3D __mutex_waiter_is_first(lock, &waiter); @@ -709,7 +715,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas */ if (!ww_ctx->is_wait_die && !__mutex_waiter_is_first(lock, &waiter)) - __ww_mutex_check_waiters(lock, ww_ctx); + __ww_mutex_check_waiters(lock, ww_ctx, &wake_q); } =20 __mutex_remove_waiter(lock, &waiter); @@ -725,6 +731,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas ww_mutex_lock_acquired(ww, ww_ctx); =20 raw_spin_unlock(&lock->wait_lock); + wake_up_q(&wake_q); preempt_enable(); return 0; =20 @@ -736,6 +743,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas raw_spin_unlock(&lock->wait_lock); debug_mutex_free_waiter(&waiter); mutex_release(&lock->dep_map, ip); + wake_up_q(&wake_q); preempt_enable(); return ret; } @@ -929,6 +937,7 @@ static noinline void __sched __mutex_unlock_slowpath(st= ruct mutex *lock, unsigne } } =20 + preempt_disable(); raw_spin_lock(&lock->wait_lock); debug_mutex_unlock(lock); if (!list_empty(&lock->wait_list)) { @@ -947,8 +956,8 @@ static noinline void __sched __mutex_unlock_slowpath(st= ruct mutex *lock, unsigne __mutex_handoff(lock, next); =20 raw_spin_unlock(&lock->wait_lock); - wake_up_q(&wake_q); + preempt_enable(); } =20 #ifndef CONFIG_DEBUG_LOCK_ALLOC diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h index 3ad2cc4823e5..7189c6631d90 100644 --- a/kernel/locking/ww_mutex.h +++ b/kernel/locking/ww_mutex.h @@ -275,7 +275,7 @@ __ww_ctx_less(struct ww_acquire_ctx *a, struct ww_acqui= re_ctx *b) */ static bool __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter, - struct ww_acquire_ctx *ww_ctx) + struct ww_acquire_ctx *ww_ctx, struct wake_q_head *wake_q) { if (!ww_ctx->is_wait_die) return false; @@ -284,7 +284,7 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER = *waiter, #ifndef WW_RT debug_mutex_wake_waiter(lock, waiter); #endif - wake_up_process(waiter->task); + wake_q_add(wake_q, waiter->task); } =20 return true; @@ -299,7 +299,8 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER = *waiter, */ static bool __ww_mutex_wound(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx, - struct ww_acquire_ctx *hold_ctx) + struct ww_acquire_ctx *hold_ctx, + struct wake_q_head *wake_q) { struct task_struct *owner =3D __ww_mutex_owner(lock); =20 @@ -331,7 +332,7 @@ static bool __ww_mutex_wound(struct MUTEX *lock, * wakeup pending to re-read the wounded state. */ if (owner !=3D current) - wake_up_process(owner); + wake_q_add(wake_q, owner); =20 return true; } @@ -352,7 +353,8 @@ static bool __ww_mutex_wound(struct MUTEX *lock, * The current task must not be on the wait list. 
*/ static void -__ww_mutex_check_waiters(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx) +__ww_mutex_check_waiters(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx, + struct wake_q_head *wake_q) { struct MUTEX_WAITER *cur; =20 @@ -364,8 +366,8 @@ __ww_mutex_check_waiters(struct MUTEX *lock, struct ww_= acquire_ctx *ww_ctx) if (!cur->ww_ctx) continue; =20 - if (__ww_mutex_die(lock, cur, ww_ctx) || - __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx)) + if (__ww_mutex_die(lock, cur, ww_ctx, wake_q) || + __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx, wake_q)) break; } } @@ -377,6 +379,8 @@ __ww_mutex_check_waiters(struct MUTEX *lock, struct ww_= acquire_ctx *ww_ctx) static __always_inline void ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx= *ctx) { + DEFINE_WAKE_Q(wake_q); + ww_mutex_lock_acquired(lock, ctx); =20 /* @@ -405,8 +409,10 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, s= truct ww_acquire_ctx *ctx) * die or wound us. */ lock_wait_lock(&lock->base); - __ww_mutex_check_waiters(&lock->base, ctx); + __ww_mutex_check_waiters(&lock->base, ctx, &wake_q); unlock_wait_lock(&lock->base); + + wake_up_q(&wake_q); } =20 static __always_inline int @@ -488,7 +494,8 @@ __ww_mutex_check_kill(struct MUTEX *lock, struct MUTEX_= WAITER *waiter, static inline int __ww_mutex_add_waiter(struct MUTEX_WAITER *waiter, struct MUTEX *lock, - struct ww_acquire_ctx *ww_ctx) + struct ww_acquire_ctx *ww_ctx, + struct wake_q_head *wake_q) { struct MUTEX_WAITER *cur, *pos =3D NULL; bool is_wait_die; @@ -532,7 +539,7 @@ __ww_mutex_add_waiter(struct MUTEX_WAITER *waiter, pos =3D cur; =20 /* Wait-Die: ensure younger waiters die. */ - __ww_mutex_die(lock, cur, ww_ctx); + __ww_mutex_die(lock, cur, ww_ctx, wake_q); } =20 __ww_waiter_add(lock, waiter, pos); @@ -550,7 +557,7 @@ __ww_mutex_add_waiter(struct MUTEX_WAITER *waiter, * such that either we or the fastpath will wound @ww->ctx. 
 	 */
 	smp_mb();
-	__ww_mutex_wound(lock, ww_ctx, ww->ctx);
+	__ww_mutex_wound(lock, ww_ctx, ww->ctx, wake_q);
 	}
 
 	return 0;
-- 
2.42.0.869.gea05f2083d-goog

From nobody Wed Dec 31 06:36:33 2025
Date: Mon, 6 Nov 2023 19:34:46 +0000
In-Reply-To: <20231106193524.866104-1-jstultz@google.com>
References: <20231106193524.866104-1-jstultz@google.com>
Message-ID: <20231106193524.866104-4-jstultz@google.com>
Subject: [PATCH v6 03/20] locking/mutex: make mutex::wait_lock irq safe
From: John Stultz 
To: LKML 
Cc: Juri Lelli , Joel Fernandes , Qais Yousef , Ingo Molnar ,
 Peter Zijlstra , Vincent Guittot , Dietmar
Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Daniel Bristot de Oliveira , Will Deacon , Waiman Long , Boqun Feng , "Paul E . McKenney" , kernel-team@android.com, "Connor O'Brien" , John Stultz Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Juri Lelli mutex::wait_lock might be nested under rq->lock. Make it irq safe then. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E . McKenney" Cc: kernel-team@android.com Signed-off-by: Juri Lelli Signed-off-by: Peter Zijlstra (Intel) [rebase & fix {un,}lock_wait_lock helpers in ww_mutex.h] Signed-off-by: Connor O'Brien Signed-off-by: John Stultz --- v3: * Re-added this patch after it was dropped in v2 which caused lockdep warnings to trip. --- kernel/locking/mutex.c | 18 ++++++++++-------- kernel/locking/ww_mutex.h | 22 +++++++++++----------- 2 files changed, 21 insertions(+), 19 deletions(-) diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 4ada158eb7ca..4c63a197f6fe 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -573,6 +573,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas DEFINE_WAKE_Q(wake_q); struct mutex_waiter waiter; struct ww_mutex *ww; + unsigned long flags; int ret; =20 if (!use_ww_ctx) @@ -615,7 +616,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas return 0; } =20 - raw_spin_lock(&lock->wait_lock); + raw_spin_lock_irqsave(&lock->wait_lock, flags); /* * After waiting to acquire the wait_lock, try again. 
*/ @@ -676,7 +677,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas goto err; } =20 - raw_spin_unlock(&lock->wait_lock); + raw_spin_unlock_irqrestore(&lock->wait_lock, flags); /* Make sure we do wakeups before calling schedule */ if (!wake_q_empty(&wake_q)) { wake_up_q(&wake_q); @@ -702,9 +703,9 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas trace_contention_begin(lock, LCB_F_MUTEX); } =20 - raw_spin_lock(&lock->wait_lock); + raw_spin_lock_irqsave(&lock->wait_lock, flags); } - raw_spin_lock(&lock->wait_lock); + raw_spin_lock_irqsave(&lock->wait_lock, flags); acquired: __set_current_state(TASK_RUNNING); =20 @@ -730,7 +731,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas if (ww_ctx) ww_mutex_lock_acquired(ww, ww_ctx); =20 - raw_spin_unlock(&lock->wait_lock); + raw_spin_unlock_irqrestore(&lock->wait_lock, flags); wake_up_q(&wake_q); preempt_enable(); return 0; @@ -740,7 +741,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas __mutex_remove_waiter(lock, &waiter); err_early_kill: trace_contention_end(lock, ret); - raw_spin_unlock(&lock->wait_lock); + raw_spin_unlock_irqrestore(&lock->wait_lock, flags); debug_mutex_free_waiter(&waiter); mutex_release(&lock->dep_map, ip); wake_up_q(&wake_q); @@ -911,6 +912,7 @@ static noinline void __sched __mutex_unlock_slowpath(st= ruct mutex *lock, unsigne struct task_struct *next =3D NULL; DEFINE_WAKE_Q(wake_q); unsigned long owner; + unsigned long flags; =20 mutex_release(&lock->dep_map, ip); =20 @@ -938,7 +940,7 @@ static noinline void __sched __mutex_unlock_slowpath(st= ruct mutex *lock, unsigne } =20 preempt_disable(); - raw_spin_lock(&lock->wait_lock); + raw_spin_lock_irqsave(&lock->wait_lock, flags); debug_mutex_unlock(lock); if (!list_empty(&lock->wait_list)) { /* get the first entry from the wait-list: */ @@ -955,7 +957,7 @@ static noinline void __sched __mutex_unlock_slowpath(st= ruct mutex *lock, unsigne if (owner & MUTEX_FLAG_HANDOFF) __mutex_handoff(lock, next); =20 - raw_spin_unlock(&lock->wait_lock); + raw_spin_unlock_irqrestore(&lock->wait_lock, flags); wake_up_q(&wake_q); preempt_enable(); } diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h index 7189c6631d90..8b94f4b89e74 100644 --- a/kernel/locking/ww_mutex.h +++ b/kernel/locking/ww_mutex.h @@ -70,14 +70,14 @@ __ww_mutex_has_waiters(struct mutex *lock) return atomic_long_read(&lock->owner) & MUTEX_FLAG_WAITERS; } =20 -static inline void lock_wait_lock(struct mutex *lock) +static inline void lock_wait_lock(struct mutex *lock, unsigned long *flags) { - raw_spin_lock(&lock->wait_lock); + raw_spin_lock_irqsave(&lock->wait_lock, *flags); } =20 -static inline void unlock_wait_lock(struct mutex *lock) +static inline void unlock_wait_lock(struct mutex *lock, unsigned long flag= s) { - raw_spin_unlock(&lock->wait_lock); + raw_spin_unlock_irqrestore(&lock->wait_lock, flags); } =20 static inline void lockdep_assert_wait_lock_held(struct mutex *lock) @@ -144,14 +144,14 @@ __ww_mutex_has_waiters(struct rt_mutex *lock) return rt_mutex_has_waiters(&lock->rtmutex); } =20 -static inline void lock_wait_lock(struct rt_mutex *lock) +static inline void lock_wait_lock(struct rt_mutex *lock, unsigned long *fl= ags) { - raw_spin_lock(&lock->rtmutex.wait_lock); + raw_spin_lock_irqsave(&lock->rtmutex.wait_lock, *flags); } =20 -static inline void unlock_wait_lock(struct rt_mutex *lock) +static inline void unlock_wait_lock(struct rt_mutex *lock, flags) 
 {
-	raw_spin_unlock(&lock->rtmutex.wait_lock);
+	raw_spin_unlock_irqrestore(&lock->rtmutex.wait_lock, flags);
 }
 
 static inline void lockdep_assert_wait_lock_held(struct rt_mutex *lock)
@@ -380,6 +380,7 @@ static __always_inline void
 ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 {
 	DEFINE_WAKE_Q(wake_q);
+	unsigned long flags;
 
 	ww_mutex_lock_acquired(lock, ctx);
 
@@ -408,10 +409,9 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 	 * Uh oh, we raced in fastpath, check if any of the waiters need to
 	 * die or wound us.
 	 */
-	lock_wait_lock(&lock->base);
+	lock_wait_lock(&lock->base, &flags);
 	__ww_mutex_check_waiters(&lock->base, ctx, &wake_q);
-	unlock_wait_lock(&lock->base);
-
+	unlock_wait_lock(&lock->base, flags);
 	wake_up_q(&wake_q);
 }
 
-- 
2.42.0.869.gea05f2083d-goog

From nobody Wed Dec 31 06:36:33 2025
Date: Mon, 6 Nov 2023 19:34:47 +0000
In-Reply-To: <20231106193524.866104-1-jstultz@google.com>
References: <20231106193524.866104-1-jstultz@google.com>
Message-ID: <20231106193524.866104-5-jstultz@google.com>
Subject: [PATCH v6 04/20] locking/mutex: Expose __mutex_owner()
From: John Stultz 
To: LKML 
Cc: Juri Lelli , Joel Fernandes , Qais Yousef , Ingo Molnar ,
 Peter Zijlstra , Vincent Guittot , Dietmar Eggemann ,
 Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue ,
 Youssef Esmat , Mel Gorman , Daniel Bristot de Oliveira ,
 Will Deacon , Waiman Long , Boqun Feng , "Paul E . McKenney" ,
 kernel-team@android.com, Valentin Schneider , "Connor O'Brien" ,
 John Stultz 

From: Juri Lelli 

Implementing proxy execution requires that scheduler code be able to
identify the current owner of a mutex. Expose __mutex_owner() for
this purpose (alone!).

Cc: Joel Fernandes 
Cc: Qais Yousef 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Vincent Guittot 
Cc: Dietmar Eggemann 
Cc: Valentin Schneider 
Cc: Steven Rostedt 
Cc: Ben Segall 
Cc: Zimuzo Ezeozue 
Cc: Youssef Esmat 
Cc: Mel Gorman 
Cc: Daniel Bristot de Oliveira 
Cc: Will Deacon 
Cc: Waiman Long 
Cc: Boqun Feng 
Cc: "Paul E . McKenney" 
Cc: kernel-team@android.com
Signed-off-by: Juri Lelli 
[Removed the EXPORT_SYMBOL]
Signed-off-by: Valentin Schneider 
Signed-off-by: Connor O'Brien 
[jstultz: Reworked per Peter's suggestions]
Signed-off-by: John Stultz 
---
v4:
* Move __mutex_owner() to kernel/locking/mutex.h instead of adding
  a new globally available accessor function to keep the exposure of
  this low, along with keeping it an inline function, as suggested
  by PeterZ
---
 kernel/locking/mutex.c | 25 -------------------------
 kernel/locking/mutex.h | 25 +++++++++++++++++++++++++
 2 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 4c63a197f6fe..2c5d1a9cf767 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -56,31 +56,6 @@ __mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key)
 }
 EXPORT_SYMBOL(__mutex_init);
 
-/*
- * @owner: contains: 'struct task_struct *' to the current lock owner,
- * NULL means not owned. Since task_struct pointers are aligned at
- * at least L1_CACHE_BYTES, we have low bits to store extra state.
- *
- * Bit0 indicates a non-empty waiter list; unlock must issue a wakeup.
- * Bit1 indicates unlock needs to hand the lock to the top-waiter
- * Bit2 indicates handoff has been done and we're waiting for pickup.
- */
-#define MUTEX_FLAG_WAITERS	0x01
-#define MUTEX_FLAG_HANDOFF	0x02
-#define MUTEX_FLAG_PICKUP	0x04
-
-#define MUTEX_FLAGS		0x07
-
-/*
- * Internal helper function; C doesn't allow us to hide it :/
- *
- * DO NOT USE (outside of mutex code).
- */ -static inline struct task_struct *__mutex_owner(struct mutex *lock) -{ - return (struct task_struct *)(atomic_long_read(&lock->owner) & ~MUTEX_FLA= GS); -} - static inline struct task_struct *__owner_task(unsigned long owner) { return (struct task_struct *)(owner & ~MUTEX_FLAGS); diff --git a/kernel/locking/mutex.h b/kernel/locking/mutex.h index 0b2a79c4013b..1c7d3d32def8 100644 --- a/kernel/locking/mutex.h +++ b/kernel/locking/mutex.h @@ -20,6 +20,31 @@ struct mutex_waiter { #endif }; =20 +/* + * @owner: contains: 'struct task_struct *' to the current lock owner, + * NULL means not owned. Since task_struct pointers are aligned at + * at least L1_CACHE_BYTES, we have low bits to store extra state. + * + * Bit0 indicates a non-empty waiter list; unlock must issue a wakeup. + * Bit1 indicates unlock needs to hand the lock to the top-waiter + * Bit2 indicates handoff has been done and we're waiting for pickup. + */ +#define MUTEX_FLAG_WAITERS 0x01 +#define MUTEX_FLAG_HANDOFF 0x02 +#define MUTEX_FLAG_PICKUP 0x04 + +#define MUTEX_FLAGS 0x07 + +/* + * Internal helper function; C doesn't allow us to hide it :/ + * + * DO NOT USE (outside of mutex & scheduler code). + */ +static inline struct task_struct *__mutex_owner(struct mutex *lock) +{ + return (struct task_struct *)(atomic_long_read(&lock->owner) & ~MUTEX_FLA= GS); +} + #ifdef CONFIG_DEBUG_MUTEXES extern void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter); --=20 2.42.0.869.gea05f2083d-goog From nobody Wed Dec 31 06:36:33 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1FF60C4167D for ; Mon, 6 Nov 2023 19:36:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233044AbjKFTgL (ORCPT ); Mon, 6 Nov 2023 14:36:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52300 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232939AbjKFTfz (ORCPT ); Mon, 6 Nov 2023 14:35:55 -0500 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9E67010CF for ; Mon, 6 Nov 2023 11:35:47 -0800 (PST) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-5b59662ff67so26976577b3.0 for ; Mon, 06 Nov 2023 11:35:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1699299347; x=1699904147; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=jkWqsamzbkDYVkOMHZ9o6Mm5vyOWDIPUHoqOezvBTm8=; b=n61RZnkXf160k8YCjVauYORtV1bmgd2KwyBt/E1hJRXs8M0Kbz137QgFTd1+99tPEl bkdHoQOc6/wRV7xTdN1o4NF1C2IJSEBhFR87CAyUjrpftjVXYqJ/xtltXMWXxJaY6/bk mPTlcqd+rGoYo41LZnTCDb4jZ2u3nXCZKeVoQswRGFua6cEJzcnQR9vbPdMoCUPic9Pa MJ0bQLEa8MWWJPRO7xz7O6gmWoRpUoFbkmW1m7BTS9D0iPHPP8GBjDuiHaoe80vef1Fa YY7xLEoG8HO1MuRQoWeciLp5h2ZjBJBM8mjzYycSHE1AneeL+Fe4AlurF0hPp1+ePBEh gGhQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699299347; x=1699904147; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=jkWqsamzbkDYVkOMHZ9o6Mm5vyOWDIPUHoqOezvBTm8=; b=UA2E2bnLrfAt7GK4glFglhYvkyklYOJ8n8jeQbr2ldPL5/w22X4TeNzbQIJRWWy7gQ 
Date: Mon, 6 Nov 2023 19:34:48 +0000
In-Reply-To: <20231106193524.866104-1-jstultz@google.com>
References: <20231106193524.866104-1-jstultz@google.com>
Message-ID: <20231106193524.866104-6-jstultz@google.com>
Subject: [PATCH v6 05/20] locking/mutex: Rework task_struct::blocked_on
From: John Stultz 
To: LKML 
Cc: Peter Zijlstra , Joel Fernandes , Qais Yousef , Ingo Molnar ,
 Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider ,
 Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat ,
 Mel Gorman , Daniel Bristot de Oliveira , Will Deacon , Waiman Long ,
 Boqun Feng , "Paul E . McKenney" , kernel-team@android.com,
 "Connor O'Brien" , John Stultz 

From: Peter Zijlstra 

Track the blocked-on relation for mutexes, this allows following this
relation at schedule time.

   task
     | blocked-on
     v
   mutex
     | owner
     v
   task

Cc: Joel Fernandes 
Cc: Qais Yousef 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Vincent Guittot 
Cc: Dietmar Eggemann 
Cc: Valentin Schneider 
Cc: Steven Rostedt 
Cc: Ben Segall 
Cc: Zimuzo Ezeozue 
Cc: Youssef Esmat 
Cc: Mel Gorman 
Cc: Daniel Bristot de Oliveira 
Cc: Will Deacon 
Cc: Waiman Long 
Cc: Boqun Feng 
Cc: "Paul E . McKenney" 
Cc: kernel-team@android.com
Signed-off-by: Peter Zijlstra (Intel) 
[minor changes while rebasing]
Signed-off-by: Juri Lelli 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Connor O'Brien 
[jstultz: Fix blocked_on tracking in __mutex_lock_common in error paths]
Signed-off-by: John Stultz 
---
v2:
* Fixed blocked_on tracking in error paths that was causing crashes
v4:
* Ensure we clear blocked_on when waking ww_mutexes to die or wound.
  This is critical so we don't get circular blocked_on relationships
  that can't be resolved.
v5: * Fix potential bug where the skip_wait path might clear blocked_on when that path never set it * Slight tweaks to where we set blocked_on to make it consistent, along with extra WARN_ON correctness checking * Minor comment changes --- include/linux/sched.h | 5 +---- kernel/fork.c | 3 +-- kernel/locking/mutex-debug.c | 9 +++++---- kernel/locking/mutex.c | 10 ++++++++++ kernel/locking/ww_mutex.h | 17 +++++++++++++++-- 5 files changed, 32 insertions(+), 12 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 4f5b0710c0f1..22a6ac47d5fb 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1145,10 +1145,7 @@ struct task_struct { struct rt_mutex_waiter *pi_blocked_on; #endif =20 -#ifdef CONFIG_DEBUG_MUTEXES - /* Mutex deadlock detection: */ - struct mutex_waiter *blocked_on; -#endif + struct mutex *blocked_on; /* lock we're blocked on */ =20 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP int non_block_count; diff --git a/kernel/fork.c b/kernel/fork.c index 3b6d20dfb9a8..1c3f7eaa9239 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2455,9 +2455,8 @@ __latent_entropy struct task_struct *copy_process( lockdep_init_task(p); #endif =20 -#ifdef CONFIG_DEBUG_MUTEXES p->blocked_on =3D NULL; /* not blocked yet */ -#endif + #ifdef CONFIG_BCACHE p->sequential_io =3D 0; p->sequential_io_avg =3D 0; diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c index bc8abb8549d2..7228909c3e62 100644 --- a/kernel/locking/mutex-debug.c +++ b/kernel/locking/mutex-debug.c @@ -52,17 +52,18 @@ void debug_mutex_add_waiter(struct mutex *lock, struct = mutex_waiter *waiter, { lockdep_assert_held(&lock->wait_lock); =20 - /* Mark the current thread as blocked on the lock: */ - task->blocked_on =3D waiter; + /* Current thread can't be already blocked (since it's executing!) */ + DEBUG_LOCKS_WARN_ON(task->blocked_on); } =20 void debug_mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *wa= iter, struct task_struct *task) { + struct mutex *blocked_on =3D READ_ONCE(task->blocked_on); + DEBUG_LOCKS_WARN_ON(list_empty(&waiter->list)); DEBUG_LOCKS_WARN_ON(waiter->task !=3D task); - DEBUG_LOCKS_WARN_ON(task->blocked_on !=3D waiter); - task->blocked_on =3D NULL; + DEBUG_LOCKS_WARN_ON(blocked_on && blocked_on !=3D lock); =20 INIT_LIST_HEAD(&waiter->list); waiter->task =3D NULL; diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 2c5d1a9cf767..73064e4865b7 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -622,6 +622,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas goto err_early_kill; } =20 + current->blocked_on =3D lock; set_current_state(state); trace_contention_begin(lock, LCB_F_MUTEX); for (;;) { @@ -662,6 +663,10 @@ __mutex_lock_common(struct mutex *lock, unsigned int s= tate, unsigned int subclas =20 first =3D __mutex_waiter_is_first(lock, &waiter); =20 + /* + * Gets reset by unlock path(). 
+ */ + current->blocked_on =3D lock; set_current_state(state); /* * Here we order against unlock; we must either see it change @@ -682,6 +687,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas } raw_spin_lock_irqsave(&lock->wait_lock, flags); acquired: + current->blocked_on =3D NULL; __set_current_state(TASK_RUNNING); =20 if (ww_ctx) { @@ -712,9 +718,11 @@ __mutex_lock_common(struct mutex *lock, unsigned int s= tate, unsigned int subclas return 0; =20 err: + current->blocked_on =3D NULL; __set_current_state(TASK_RUNNING); __mutex_remove_waiter(lock, &waiter); err_early_kill: + WARN_ON(current->blocked_on); trace_contention_end(lock, ret); raw_spin_unlock_irqrestore(&lock->wait_lock, flags); debug_mutex_free_waiter(&waiter); @@ -926,6 +934,8 @@ static noinline void __sched __mutex_unlock_slowpath(st= ruct mutex *lock, unsigne next =3D waiter->task; =20 debug_mutex_wake_waiter(lock, waiter); + WARN_ON(next->blocked_on !=3D lock); + next->blocked_on =3D NULL; wake_q_add(&wake_q, next); } =20 diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h index 8b94f4b89e74..8bb334491732 100644 --- a/kernel/locking/ww_mutex.h +++ b/kernel/locking/ww_mutex.h @@ -284,6 +284,13 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER= *waiter, #ifndef WW_RT debug_mutex_wake_waiter(lock, waiter); #endif + /* + * When waking up the task to die, be sure to clear the + * blocked_on pointer. Otherwise we can see circular + * blocked_on relationships that can't resolve. + */ + WARN_ON(waiter->task->blocked_on !=3D lock); + waiter->task->blocked_on =3D NULL; wake_q_add(wake_q, waiter->task); } =20 @@ -331,9 +338,15 @@ static bool __ww_mutex_wound(struct MUTEX *lock, * it's wounded in __ww_mutex_check_kill() or has a * wakeup pending to re-read the wounded state. */ - if (owner !=3D current) + if (owner !=3D current) { + /* + * When waking up the task to wound, be sure to clear the + * blocked_on pointer. Otherwise we can see circular + * blocked_on relationships that can't resolve. 
+		 */
+		owner->blocked_on = NULL;
 		wake_q_add(wake_q, owner);
-
+	}
 	return true;
 }
 
-- 
2.42.0.869.gea05f2083d-goog

From nobody Wed Dec 31 06:36:33 2025
Date: Mon, 6 Nov 2023 19:34:49 +0000
In-Reply-To: <20231106193524.866104-1-jstultz@google.com>
References: <20231106193524.866104-1-jstultz@google.com>
Message-ID: <20231106193524.866104-7-jstultz@google.com>
Subject: [PATCH v6 06/20] locking/mutex: Add task_struct::blocked_lock to serialize changes to the blocked_on state
From: John Stultz 
To: LKML 
Cc: Peter Zijlstra , Joel Fernandes , Qais Yousef , Ingo Molnar ,
 Juri Lelli , Vincent Guittot ,
Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Daniel Bristot de Oliveira , Will Deacon , Waiman Long , Boqun Feng , "Paul E . McKenney" , kernel-team@android.com, Valentin Schneider , "Connor O'Brien" , John Stultz Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Peter Zijlstra This patch was split out from the later "sched: Add proxy execution" patch. Adds blocked_lock to the task_struct so we can safely keep track of which tasks are blocked on us. This will be used for tracking blocked-task/mutex chains with the prox-execution patch in a similar fashion to how priority inheritence is done with rt_mutexes. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E . McKenney" Cc: kernel-team@android.com Signed-off-by: Peter Zijlstra (Intel) [rebased, added comments and changelog] Signed-off-by: Juri Lelli [Fixed rebase conflicts] [squashed sched: Ensure blocked_on is always guarded by blocked_lock] Signed-off-by: Valentin Schneider [fix rebase conflicts, various fixes & tweaks commented inline] [squashed sched: Use rq->curr vs rq->proxy checks] Signed-off-by: Connor O'Brien [jstultz: Split out from bigger patch] Signed-off-by: John Stultz --- v2: * Split out into its own patch v4: * Remove verbose comments/questions to avoid review distractions, as suggested by Dietmar * Fixed nested block_on locking for ww_mutex access --- include/linux/sched.h | 1 + init/init_task.c | 1 + kernel/fork.c | 1 + kernel/locking/mutex.c | 24 ++++++++++++++++++++---- kernel/locking/ww_mutex.h | 6 ++++++ 5 files changed, 29 insertions(+), 4 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 22a6ac47d5fb..a9258dae00e0 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1146,6 +1146,7 @@ struct task_struct { #endif =20 struct mutex *blocked_on; /* lock we're blocked on */ + raw_spinlock_t blocked_lock; =20 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP int non_block_count; diff --git a/init/init_task.c b/init/init_task.c index ff6c4b9bfe6b..189ce67e9704 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -130,6 +130,7 @@ struct task_struct init_task .journal_info =3D NULL, INIT_CPU_TIMERS(init_task) .pi_lock =3D __RAW_SPIN_LOCK_UNLOCKED(init_task.pi_lock), + .blocked_lock =3D __RAW_SPIN_LOCK_UNLOCKED(init_task.blocked_lock), .timer_slack_ns =3D 50000, /* 50 usec default slack */ .thread_pid =3D &init_struct_pid, .thread_group =3D LIST_HEAD_INIT(init_task.thread_group), diff --git a/kernel/fork.c b/kernel/fork.c index 1c3f7eaa9239..47b76ed5ddf6 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2353,6 +2353,7 @@ __latent_entropy struct task_struct *copy_process( ftrace_graph_init_task(p); =20 rt_mutex_init_task(p); + raw_spin_lock_init(&p->blocked_lock); =20 lockdep_assert_irqs_enabled(); #ifdef CONFIG_PROVE_LOCKING diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 73064e4865b7..df186c0bf4a9 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -592,6 +592,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas } =20 raw_spin_lock_irqsave(&lock->wait_lock, flags); + 
raw_spin_lock(¤t->blocked_lock); /* * After waiting to acquire the wait_lock, try again. */ @@ -653,6 +654,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas goto err; } =20 + raw_spin_unlock(¤t->blocked_lock); raw_spin_unlock_irqrestore(&lock->wait_lock, flags); /* Make sure we do wakeups before calling schedule */ if (!wake_q_empty(&wake_q)) { @@ -663,6 +665,8 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas =20 first =3D __mutex_waiter_is_first(lock, &waiter); =20 + raw_spin_lock_irqsave(&lock->wait_lock, flags); + raw_spin_lock(¤t->blocked_lock); /* * Gets reset by unlock path(). */ @@ -677,15 +681,23 @@ __mutex_lock_common(struct mutex *lock, unsigned int = state, unsigned int subclas break; =20 if (first) { + bool acquired; + + /* + * mutex_optimistic_spin() can schedule, so we need to + * release these locks before calling it. + */ + raw_spin_unlock(¤t->blocked_lock); + raw_spin_unlock_irqrestore(&lock->wait_lock, flags); trace_contention_begin(lock, LCB_F_MUTEX | LCB_F_SPIN); - if (mutex_optimistic_spin(lock, ww_ctx, &waiter)) + acquired =3D mutex_optimistic_spin(lock, ww_ctx, &waiter); + raw_spin_lock_irqsave(&lock->wait_lock, flags); + raw_spin_lock(¤t->blocked_lock); + if (acquired) break; trace_contention_begin(lock, LCB_F_MUTEX); } - - raw_spin_lock_irqsave(&lock->wait_lock, flags); } - raw_spin_lock_irqsave(&lock->wait_lock, flags); acquired: current->blocked_on =3D NULL; __set_current_state(TASK_RUNNING); @@ -712,6 +724,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas if (ww_ctx) ww_mutex_lock_acquired(ww, ww_ctx); =20 + raw_spin_unlock(¤t->blocked_lock); raw_spin_unlock_irqrestore(&lock->wait_lock, flags); wake_up_q(&wake_q); preempt_enable(); @@ -724,6 +737,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas err_early_kill: WARN_ON(current->blocked_on); trace_contention_end(lock, ret); + raw_spin_unlock(¤t->blocked_lock); raw_spin_unlock_irqrestore(&lock->wait_lock, flags); debug_mutex_free_waiter(&waiter); mutex_release(&lock->dep_map, ip); @@ -934,8 +948,10 @@ static noinline void __sched __mutex_unlock_slowpath(s= truct mutex *lock, unsigne next =3D waiter->task; =20 debug_mutex_wake_waiter(lock, waiter); + raw_spin_lock(&next->blocked_lock); WARN_ON(next->blocked_on !=3D lock); next->blocked_on =3D NULL; + raw_spin_unlock(&next->blocked_lock); wake_q_add(&wake_q, next); } =20 diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h index 8bb334491732..2929a95b4272 100644 --- a/kernel/locking/ww_mutex.h +++ b/kernel/locking/ww_mutex.h @@ -281,6 +281,8 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER = *waiter, return false; =20 if (waiter->ww_ctx->acquired > 0 && __ww_ctx_less(waiter->ww_ctx, ww_ctx)= ) { + /* nested as we should hold current->blocked_lock already */ + raw_spin_lock_nested(&waiter->task->blocked_lock, SINGLE_DEPTH_NESTING); #ifndef WW_RT debug_mutex_wake_waiter(lock, waiter); #endif @@ -292,6 +294,7 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER = *waiter, WARN_ON(waiter->task->blocked_on !=3D lock); waiter->task->blocked_on =3D NULL; wake_q_add(wake_q, waiter->task); + raw_spin_unlock(&waiter->task->blocked_lock); } =20 return true; @@ -339,6 +342,8 @@ static bool __ww_mutex_wound(struct MUTEX *lock, * wakeup pending to re-read the wounded state. 
*/ if (owner !=3D current) { + /* nested as we should hold current->blocked_lock already */ + raw_spin_lock_nested(&owner->blocked_lock, SINGLE_DEPTH_NESTING); /* * When waking up the task to wound, be sure to clear the * blocked_on pointer. Otherwise we can see circular @@ -346,6 +351,7 @@ static bool __ww_mutex_wound(struct MUTEX *lock, */ owner->blocked_on =3D NULL; wake_q_add(wake_q, owner); + raw_spin_unlock(&owner->blocked_lock); } return true; } --=20 2.42.0.869.gea05f2083d-goog From nobody Wed Dec 31 06:36:33 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C2ADCC4332F for ; Mon, 6 Nov 2023 19:36:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233055AbjKFTgR (ORCPT ); Mon, 6 Nov 2023 14:36:17 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52464 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232967AbjKFTf5 (ORCPT ); Mon, 6 Nov 2023 14:35:57 -0500 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8ADD2D47 for ; Mon, 6 Nov 2023 11:35:50 -0800 (PST) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-5af9b0850fdso65781507b3.1 for ; Mon, 06 Nov 2023 11:35:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1699299349; x=1699904149; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=E57J/i4cUWuAwfC7XG6AzrNtexVrU81CJufi+3vfKlM=; b=qXw7kJ5W0ifuJtSiQBRP27i2DJmvwdlsh9bncGmXbYBqDYYEhEuBOo5Cyjdowg7FC1 Jhb56lpXVAGKmF23W3Lr2b6LWUQVVfH/uiyDSURF+9b+F2lm1TVf6PI2gAlcZk28FjNA r2/TjPFMz9+xSTpX9OrFKN1XcDX6WAt7VMFMi0K/Xs+Bv8NpjV9yQBkIUSD66D3cpg2z pVf1RvV4cIPrt48yP88jt+byUsYVm6bMZY1W+lrQ7Cogj1riaUOL9PFj4ikkfH0OVMkM ZYgdKsdUZ5lK0V6A1uXFuot2Vlb3ON+ix7ymyjPEQymtDHtsFIGMNylBLM3/ky/eiiN7 PVQA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699299349; x=1699904149; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=E57J/i4cUWuAwfC7XG6AzrNtexVrU81CJufi+3vfKlM=; b=WswFjBlqhuztzGjE/M7PP+uFuT8tLpSDkoXusnGxZv7nNRknTT0IHtSj8iqTWPCnuU 9O7FPP92JiwqIPvj7dZPQFE5pX/oHZyU1nkfruEWcDVa/uE/qfhBB/jaBwB1Hhuwt8J3 O/F6z5JcIRjCbZNsQ6sRsMzNs+RhUW84hB5aA0YdlSVdKAFl6W47WsP9e5/jTZeUfbgJ J3mid+MZsE4twpJPkjnLsdMVpwiLqtT7a/m7zWSzk2LtRnS40N0km0Fo4lLbrV36t+Zv KJNP9A8qWKP+JwBpcZyODoBIZ87z3uIdaOJ2MxJAeB5qzQxsIGbkKxLeM7HE5f3SsqPn 5g7Q== X-Gm-Message-State: AOJu0YzGrf6INBJZe4vMZmLRK7cxn5YxR9xbzEVR9R+L85Ac+UzWvLAN wQevlU4j2UaeZxTMiWwb3zdK6GttrECeBhYnpMJm8Pb5h3KuCSL3VHeheRKrWEtxB94ruhTJmEj LG1t936qa61ks6H/ojzGbHCGkBX5XsQ2PvwqVOwUldfE1Pm2p3mDfoha3GnFG+Pd0rfTvJsg= X-Google-Smtp-Source: AGHT+IEp4vHO62WTZMDwX26JCgUwtSKYYeDVwtDwzdUlAENraqFn2qKaM1LQfz49GTp+jAgBqyhNrwEpfL/O X-Received: from jstultz-noogler2.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:600]) (user=jstultz job=sendgmr) by 2002:a05:6902:1083:b0:d9a:c3b8:4274 with SMTP id v3-20020a056902108300b00d9ac3b84274mr698783ybu.7.1699299349407; Mon, 06 Nov 2023 11:35:49 -0800 (PST) Date: Mon, 6 Nov 2023 19:34:50 +0000 In-Reply-To: <20231106193524.866104-1-jstultz@google.com> Mime-Version: 1.0 References: 
<20231106193524.866104-1-jstultz@google.com> X-Mailer: git-send-email 2.42.0.869.gea05f2083d-goog Message-ID: <20231106193524.866104-8-jstultz@google.com> Subject: [PATCH v6 07/20] locking/mutex: Add p->blocked_on wrappers for correctness checks From: John Stultz To: LKML Cc: Valentin Schneider , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Daniel Bristot de Oliveira , Will Deacon , Waiman Long , Boqun Feng , "Paul E . McKenney" , kernel-team@android.com, "Connor O'Brien" , John Stultz Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Valentin Schneider This lets us assert p->blocked_lock is held whenever we access p->blocked_on, as well as warn us for unexpected state changes. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E . McKenney" Cc: kernel-team@android.com Signed-off-by: Valentin Schneider [fix conflicts, call in more places] Signed-off-by: Connor O'Brien [jstultz: tweaked commit subject, added get_task_blocked_on() as well] Signed-off-by: John Stultz --- v2: * Added get_task_blocked_on() accessor v4: * Address READ_ONCE usage that was dropped in v2 * Reordered to be a later add on to the main patch series as Peter was unhappy with similar wrappers in other patches. v5: * Added some extra correctness checking in wrappers --- include/linux/sched.h | 22 ++++++++++++++++++++++ kernel/locking/mutex-debug.c | 4 ++-- kernel/locking/mutex.c | 10 +++++----- kernel/locking/ww_mutex.h | 4 ++-- 4 files changed, 31 insertions(+), 9 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index a9258dae00e0..81334677e008 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2248,6 +2248,28 @@ static inline int rwlock_needbreak(rwlock_t *lock) #endif } =20 +static inline void set_task_blocked_on(struct task_struct *p, struct mutex= *m) +{ + lockdep_assert_held(&p->blocked_lock); + + /* We should be setting values to NULL or NULL to values */ + WARN_ON((!m && !p->blocked_on) || (m && p->blocked_on)); + + p->blocked_on =3D m; +} + +static inline struct mutex *get_task_blocked_on(struct task_struct *p) +{ + lockdep_assert_held(&p->blocked_lock); + + return p->blocked_on; +} + +static inline struct mutex *get_task_blocked_on_once(struct task_struct *p) +{ + return READ_ONCE(p->blocked_on); +} + static __always_inline bool need_resched(void) { return unlikely(tif_need_resched()); diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c index 7228909c3e62..1eedf7c60c00 100644 --- a/kernel/locking/mutex-debug.c +++ b/kernel/locking/mutex-debug.c @@ -53,13 +53,13 @@ void debug_mutex_add_waiter(struct mutex *lock, struct = mutex_waiter *waiter, lockdep_assert_held(&lock->wait_lock); =20 /* Current thread can't be already blocked (since it's executing!) 
*/ - DEBUG_LOCKS_WARN_ON(task->blocked_on); + DEBUG_LOCKS_WARN_ON(get_task_blocked_on(task)); } =20 void debug_mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *wa= iter, struct task_struct *task) { - struct mutex *blocked_on =3D READ_ONCE(task->blocked_on); + struct mutex *blocked_on =3D get_task_blocked_on_once(task); =20 DEBUG_LOCKS_WARN_ON(list_empty(&waiter->list)); DEBUG_LOCKS_WARN_ON(waiter->task !=3D task); diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index df186c0bf4a9..36e563f69705 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -623,7 +623,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas goto err_early_kill; } =20 - current->blocked_on =3D lock; + set_task_blocked_on(current, lock); set_current_state(state); trace_contention_begin(lock, LCB_F_MUTEX); for (;;) { @@ -670,7 +670,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas /* * Gets reset by unlock path(). */ - current->blocked_on =3D lock; + set_task_blocked_on(current, lock); set_current_state(state); /* * Here we order against unlock; we must either see it change @@ -699,7 +699,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas } } acquired: - current->blocked_on =3D NULL; + set_task_blocked_on(current, NULL); __set_current_state(TASK_RUNNING); =20 if (ww_ctx) { @@ -731,7 +731,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas return 0; =20 err: - current->blocked_on =3D NULL; + set_task_blocked_on(current, NULL); __set_current_state(TASK_RUNNING); __mutex_remove_waiter(lock, &waiter); err_early_kill: @@ -950,7 +950,7 @@ static noinline void __sched __mutex_unlock_slowpath(st= ruct mutex *lock, unsigne debug_mutex_wake_waiter(lock, waiter); raw_spin_lock(&next->blocked_lock); WARN_ON(next->blocked_on !=3D lock); - next->blocked_on =3D NULL; + set_task_blocked_on(current, NULL); raw_spin_unlock(&next->blocked_lock); wake_q_add(&wake_q, next); } diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h index 2929a95b4272..44a532dda927 100644 --- a/kernel/locking/ww_mutex.h +++ b/kernel/locking/ww_mutex.h @@ -292,7 +292,7 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER = *waiter, * blocked_on relationships that can't resolve. */ WARN_ON(waiter->task->blocked_on !=3D lock); - waiter->task->blocked_on =3D NULL; + set_task_blocked_on(waiter->task, NULL); wake_q_add(wake_q, waiter->task); raw_spin_unlock(&waiter->task->blocked_lock); } @@ -349,7 +349,7 @@ static bool __ww_mutex_wound(struct MUTEX *lock, * blocked_on pointer. Otherwise we can see circular * blocked_on relationships that can't resolve. 
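To make the intended calling convention explicit, here is a condensed sketch of how the helpers are meant to be used. It is illustrative only (p and m are stand-in names) and mirrors the hunks above rather than adding anything new:

  /* blocked_on is only touched through the helpers, with blocked_lock held: */
  raw_spin_lock(&p->blocked_lock);
  set_task_blocked_on(p, m);        /* NULL -> m; warns if p was already blocked */
  ...
  set_task_blocked_on(p, NULL);     /* m -> NULL; warns if p was not blocked */
  WARN_ON(get_task_blocked_on(p));  /* lockdep-checked read */
  raw_spin_unlock(&p->blocked_lock);

  /* Lockless peek for the few places that cannot take blocked_lock: */
  struct mutex *blocked = get_task_blocked_on_once(p);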
*/ - owner->blocked_on =3D NULL; + set_task_blocked_on(owner, NULL); wake_q_add(wake_q, owner); raw_spin_unlock(&owner->blocked_lock); } --=20 2.42.0.869.gea05f2083d-goog From nobody Wed Dec 31 06:36:33 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2BE8BC4167D for ; Mon, 6 Nov 2023 19:36:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233061AbjKFTgU (ORCPT ); Mon, 6 Nov 2023 14:36:20 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52506 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232975AbjKFTf5 (ORCPT ); Mon, 6 Nov 2023 14:35:57 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3D410D42 for ; Mon, 6 Nov 2023 11:35:52 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-d9a5a3f2d4fso5597543276.3 for ; Mon, 06 Nov 2023 11:35:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1699299351; x=1699904151; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=JeaszBAoxCXKdkXYBf2TzRJcYXC3x0zPMGfP5WqEvsc=; b=4T34G51Vy24jzZYmXhLVBV33LpOwspeKDUmQx9HJQcv5TeDzByoGZmuh1Jx58QYspS GOo0ztpeLs7Zbct5/wX4dWIz8vkxkig9fJVuWYALs+xhjgKI3h1B0z6wlkn9ZKy6VHqN ARAA38YjWfq1heHQVGWuqG0nou7/iBtBHdH44evGq+gCm/8NpJvW9R1f+PPFPAOxkk6+ mnkA70x+yEFQj/MtNLsm9IzFE5aacYqnAlH11ATuNlXvbjbGdpnG2ShqGSMdseDseUVP /Nwl7zBP+3eB8WAK/ibp9/4PgeabDcsa21IMJw/cS1z//4pcojchVH/Vjg9TMbcvXIxd uNzA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699299351; x=1699904151; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=JeaszBAoxCXKdkXYBf2TzRJcYXC3x0zPMGfP5WqEvsc=; b=LqxsQkKrNbyX56umMmhVXarNhXUp+LxsOOLjIQNIbpM9NIZEZDWXaywKJliIf2dmGR 2rRNppWC5qgMUQt3O1YbMdz8cwMXRC6jDPO2lhek5rSD5uwYs8t7TRi0o32aUD4myeY0 JJV3dOq8tZTnJR7sEJu8GzS7qOW9cRqABiVvTGoL++Rzm/c7gUFKyX0+T1PCQW92aeqv wHgQjEsyXacv95jF1+V/JRJeLg7vf9wUha7n25bHU90hJSIolDQAiquVZ/p3JLFEiwsp S5q5z+NUls6FDRwGKyfM1MEj0sD+CUMofivcjUApvrA/cbPemJZVBNtX3y5RDP1jHweu acOQ== X-Gm-Message-State: AOJu0YxKTELPlymViI2aGbGQpmsGbYc0IHijfpBprdiXulnbcsd5TJzJ eW70ajV2eduGXBCqyL1TbbZq2F4sfZs3GqMVlxxIJO+bWxq7b8lrC9K9idgdOAxScq6X+xB6LJH ZbC8WY3C1AuJM10Uks59pcgsPF4T93W2qkVHK+bgbD4ph/A7ibQFB3Y/5AwgHfY2F8ZU304g= X-Google-Smtp-Source: AGHT+IFfAQBJwHKCUEtc1lCCAvLeb7ZhZV6fQoGJP3MV8a5CMDnMVK4WUbPUdMMs7gks2IbL4dSEWa3/hqNZ X-Received: from jstultz-noogler2.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:600]) (user=jstultz job=sendgmr) by 2002:a05:6902:1083:b0:d9a:c3b8:4274 with SMTP id v3-20020a056902108300b00d9ac3b84274mr698792ybu.7.1699299351218; Mon, 06 Nov 2023 11:35:51 -0800 (PST) Date: Mon, 6 Nov 2023 19:34:51 +0000 In-Reply-To: <20231106193524.866104-1-jstultz@google.com> Mime-Version: 1.0 References: <20231106193524.866104-1-jstultz@google.com> X-Mailer: git-send-email 2.42.0.869.gea05f2083d-goog Message-ID: <20231106193524.866104-9-jstultz@google.com> Subject: [PATCH v6 08/20] sched: Add CONFIG_PROXY_EXEC & boot argument to enable/disable From: John Stultz To: LKML Cc: John Stultz , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra 
, Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Daniel Bristot de Oliveira , Will Deacon , Waiman Long , Boqun Feng , "Paul E . McKenney" , kernel-team@android.com Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This patch adds the CONFIG_PROXY_EXEC option, along with a boot argument prox_exec=3D that can be used to disable the feature at boot time if CONFIG_PROXY_EXEC was enabled. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E . McKenney" Cc: kernel-team@android.com Signed-off-by: John Stultz --- .../admin-guide/kernel-parameters.txt | 4 +++ include/linux/sched.h | 13 ++++++++ init/Kconfig | 7 +++++ kernel/sched/core.c | 31 +++++++++++++++++++ 4 files changed, 55 insertions(+) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentatio= n/admin-guide/kernel-parameters.txt index 0a1731a0f0ef..199578ae1606 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -4648,6 +4648,10 @@ that). Format: =20 + proxy_exec=3D [KNL] Enable or disables "proxy execution" style + solution to mutex based priority inversion. + Format: "enable" or "disable" + psi=3D [KNL] Enable or disable pressure stall information tracking. Format: diff --git a/include/linux/sched.h b/include/linux/sched.h index 81334677e008..5f05d9a4cc3f 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1551,6 +1551,19 @@ struct task_struct { */ }; =20 +#ifdef CONFIG_PROXY_EXEC +DECLARE_STATIC_KEY_TRUE(__sched_proxy_exec); +static inline bool sched_proxy_exec(void) +{ + return static_branch_likely(&__sched_proxy_exec); +} +#else +static inline bool sched_proxy_exec(void) +{ + return false; +} +#endif + static inline struct pid *task_pid(struct task_struct *task) { return task->thread_pid; diff --git a/init/Kconfig b/init/Kconfig index 6d35728b94b2..884f94d8ee9e 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -908,6 +908,13 @@ config NUMA_BALANCING_DEFAULT_ENABLED If set, automatic NUMA balancing will be enabled if running on a NUMA machine. =20 +config PROXY_EXEC + bool "Proxy Execution" + default n + help + This option enables proxy execution, a mechanism for mutex owning + tasks to inherit the scheduling context of higher priority waiters. 
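For illustration, the expected usage pattern in later patches is to guard proxy-only paths with the new helper; the function below is hypothetical (not part of this patch) and only shows the shape of such a guard:

  /* Hypothetical consumer of sched_proxy_exec(): */
  static bool proxy_can_run_here(struct rq *rq, struct task_struct *p)
  {
          if (!sched_proxy_exec())
                  return false;   /* static branch: near-zero cost when disabled */
          /* ... proxy-execution specific checks would go here ... */
          return true;
  }

With CONFIG_PROXY_EXEC=y the feature defaults to on (the static key is declared true); booting with proxy_exec=disable turns it off and proxy_exec=enable is accepted as well. With CONFIG_PROXY_EXEC=n, sched_proxy_exec() is a constant false, so the compiler can drop the guarded code entirely.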
+ menuconfig CGROUPS bool "Control Group support" select KERNFS diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 802551e0009b..a38bf8ef5798 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -117,6 +117,37 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(sched_update_nr_running_t= p); =20 DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues); =20 +#ifdef CONFIG_PROXY_EXEC +DEFINE_STATIC_KEY_TRUE(__sched_proxy_exec); +static int __init setup_proxy_exec(char *str) +{ + int ret =3D 0; + + if (!str) + goto out; + + if (!strcmp(str, "enable")) { + static_branch_enable(&__sched_proxy_exec); + ret =3D 1; + } else if (!strcmp(str, "disable")) { + static_branch_disable(&__sched_proxy_exec); + ret =3D 1; + } +out: + if (!ret) + pr_warn("Unable to parse proxy_exec=3D\n"); + + return ret; +} +#else +static int __init setup_proxy_exec(char *str) +{ + pr_warn("CONFIG_PROXY_EXEC=3Dn, so it cannot be enabled or disabled at bo= ottime\n"); + return 0; +} +#endif +__setup("proxy_exec=3D", setup_proxy_exec); + #ifdef CONFIG_SCHED_DEBUG /* * Debugging: various feature bits --=20 2.42.0.869.gea05f2083d-goog From nobody Wed Dec 31 06:36:33 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0C9BDC4332F for ; Mon, 6 Nov 2023 19:36:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233070AbjKFTgY (ORCPT ); Mon, 6 Nov 2023 14:36:24 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52390 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232941AbjKFTf6 (ORCPT ); Mon, 6 Nov 2023 14:35:58 -0500 Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6CD3AD79 for ; Mon, 6 Nov 2023 11:35:54 -0800 (PST) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-5afbcffe454so98456937b3.3 for ; Mon, 06 Nov 2023 11:35:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1699299353; x=1699904153; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=QWj20OWbsgaFfuYnHKg9vRj05US4KuJfwuDjDU63DiY=; b=jiJ2t/S71PZg0fJ6lFp9DFGSi1BftrjUORhUUOoakAMaN32Jpgce5pm0zSYKlzp+To Hu7nxAqIC0f4C8ocf6J2i3hICZCC8xCmWTJiTpKCerNaEwm/APA5SgKshfchErI2tIZA M7NCwtZb4KUmjGN4xhOAQZXtDFXWl+d0/NqppjRWjoq31zqFSSKWU5kpC3hzrTDuImup Kqco33Wt+DghEFX+Y5GUv1FmK/1/rNtEj8s9YCOVpt407uko1UdPIEWqR1SMKe9RKlW1 Mai9jOLibhhU6e+ODWkXZT3a9eA9jdPpOhpnN6iFAuNvQCco2/VIBcKBFRcHtzylNYG5 WrjA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699299353; x=1699904153; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=QWj20OWbsgaFfuYnHKg9vRj05US4KuJfwuDjDU63DiY=; b=IgiHftonNFJKuN5lzyw8lp3BHkxl2zVAJ9S5yM9p2oWeiUfQMGRuC8zMK0NDIidqOa vX767SIStbLdm06u+BGnNbANx739XySk9CJu3QT8dIcHQQD0nbgtYplK91TJWiCRYQ7t I9OEebxlwxdGkyvC15kNkdqpegjUAtAs0ovBQwclQRNu9NinVBr2n+sa/b9vqS5bZisl A1oE51Sfmk1lHPEiHUgOSHpUFN3CkVy3m+5DocboCjYJJGAIZ/g7szyIR86O3ZrEQ1+m b/ts9SlVNJpDWuSk8ONhvrBZsy/UQ0jpcKWbXvcr8cItVDKdi6zlD9MDwXeWRCONfkcV 5pkg== X-Gm-Message-State: AOJu0YyY4ywVZ6p9UvlgIZ0lQPQGaDQZQtxFEhL0OvD9BhBL0XnfkTNT 
tVQoW4K520VDF2EmjD3GVNRCOBCKe8lMXfD4qteBx7w/pT4toUp1NaJVk5rup6z2oxAi5g4QcnZ NmES8SLV6cxGlMy113hEQt5liRWPpwBlpI+I45wSmV6/k/b/G3MlNf1hlePqJMXTzSt/BYaU= X-Google-Smtp-Source: AGHT+IE9x5bWgSFi4G8HfQl/9l7DD67dhpm9VoN0fFQvAH07er21wyHh46EoCLNwxJzlzIoQm6swCP0Vp7xi X-Received: from jstultz-noogler2.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:600]) (user=jstultz job=sendgmr) by 2002:a0d:d591:0:b0:583:4f82:b9d9 with SMTP id x139-20020a0dd591000000b005834f82b9d9mr225159ywd.5.1699299353337; Mon, 06 Nov 2023 11:35:53 -0800 (PST) Date: Mon, 6 Nov 2023 19:34:52 +0000 In-Reply-To: <20231106193524.866104-1-jstultz@google.com> Mime-Version: 1.0 References: <20231106193524.866104-1-jstultz@google.com> X-Mailer: git-send-email 2.42.0.869.gea05f2083d-goog Message-ID: <20231106193524.866104-10-jstultz@google.com> Subject: [PATCH v6 09/20] locking/mutex: Split blocked_on logic into two states (blocked_on and blocked_on_waking) From: John Stultz To: LKML Cc: John Stultz , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Daniel Bristot de Oliveira , Will Deacon , Waiman Long , Boqun Feng , "Paul E . McKenney" , kernel-team@android.com Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This patch adds blocked_on_waking so we can track separately if the task should be able to try to aquire the lock separately from the lock it is blocked on. This avoids some of the subtle magic where the blocked_on state gets cleared, only to have it re-added by the mutex_lock_slowpath call when it tries to aquire the lock on wakeup This should make dealing with the ww_mutex issue cleaner. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E . 
McKenney" Cc: kernel-team@android.com Signed-off-by: John Stultz --- include/linux/sched.h | 2 ++ kernel/fork.c | 1 + kernel/locking/mutex.c | 7 ++++--- kernel/locking/ww_mutex.h | 12 ++++++------ kernel/sched/sched.h | 12 ++++++++++++ 5 files changed, 25 insertions(+), 9 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 5f05d9a4cc3f..47c7095b918a 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1146,6 +1146,7 @@ struct task_struct { #endif =20 struct mutex *blocked_on; /* lock we're blocked on */ + bool blocked_on_waking; /* blocked on, but waking */ raw_spinlock_t blocked_lock; =20 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP @@ -2269,6 +2270,7 @@ static inline void set_task_blocked_on(struct task_st= ruct *p, struct mutex *m) WARN_ON((!m && !p->blocked_on) || (m && p->blocked_on)); =20 p->blocked_on =3D m; + p->blocked_on_waking =3D false; } =20 static inline struct mutex *get_task_blocked_on(struct task_struct *p) diff --git a/kernel/fork.c b/kernel/fork.c index 47b76ed5ddf6..930947bf4569 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2457,6 +2457,7 @@ __latent_entropy struct task_struct *copy_process( #endif =20 p->blocked_on =3D NULL; /* not blocked yet */ + p->blocked_on_waking =3D false; /* not blocked yet */ =20 #ifdef CONFIG_BCACHE p->sequential_io =3D 0; diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 36e563f69705..f37b7afe8aa5 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -667,10 +667,11 @@ __mutex_lock_common(struct mutex *lock, unsigned int = state, unsigned int subclas =20 raw_spin_lock_irqsave(&lock->wait_lock, flags); raw_spin_lock(¤t->blocked_lock); + /* - * Gets reset by unlock path(). + * Clear blocked_on_waking flag set by the unlock path(). */ - set_task_blocked_on(current, lock); + current->blocked_on_waking =3D false; set_current_state(state); /* * Here we order against unlock; we must either see it change @@ -950,7 +951,7 @@ static noinline void __sched __mutex_unlock_slowpath(st= ruct mutex *lock, unsigne debug_mutex_wake_waiter(lock, waiter); raw_spin_lock(&next->blocked_lock); WARN_ON(next->blocked_on !=3D lock); - set_task_blocked_on(current, NULL); + next->blocked_on_waking =3D true; raw_spin_unlock(&next->blocked_lock); wake_q_add(&wake_q, next); } diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h index 44a532dda927..3b0a68d7e308 100644 --- a/kernel/locking/ww_mutex.h +++ b/kernel/locking/ww_mutex.h @@ -287,12 +287,12 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITE= R *waiter, debug_mutex_wake_waiter(lock, waiter); #endif /* - * When waking up the task to die, be sure to clear the - * blocked_on pointer. Otherwise we can see circular + * When waking up the task to die, be sure to set the + * blocked_on_waking flag. Otherwise we can see circular * blocked_on relationships that can't resolve. */ WARN_ON(waiter->task->blocked_on !=3D lock); - set_task_blocked_on(waiter->task, NULL); + waiter->task->blocked_on_waking =3D true; wake_q_add(wake_q, waiter->task); raw_spin_unlock(&waiter->task->blocked_lock); } @@ -345,11 +345,11 @@ static bool __ww_mutex_wound(struct MUTEX *lock, /* nested as we should hold current->blocked_lock already */ raw_spin_lock_nested(&owner->blocked_lock, SINGLE_DEPTH_NESTING); /* - * When waking up the task to wound, be sure to clear the - * blocked_on pointer. Otherwise we can see circular + * When waking up the task to wound, be sure to set the + * blocked_on_waking flag. 
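Taking a step back, the two fields form a small state machine. The sketch below is illustrative only (p and m are stand-ins) and restates what the hunks in this patch already do:

  /* blocked_on == NULL                           not waiting on a mutex
   * blocked_on == m, blocked_on_waking == false  blocked on m; proxy code may
   *                                              follow the owner chain
   * blocked_on == m, blocked_on_waking == true   woken to retry the acquire,
   *                                              but still formally a waiter
   */
  raw_spin_lock(&p->blocked_lock);
  set_task_blocked_on(p, m);          /* also clears blocked_on_waking */
  ...
  p->blocked_on_waking = true;        /* unlock/die/wound paths: let p try the lock */
  ...
  set_task_blocked_on(p, NULL);       /* acquire succeeded */
  raw_spin_unlock(&p->blocked_lock);

  /* Scheduler code only consumes the combination: */
  if (task_is_blocked(p)) {
          /* sched_proxy_exec() && p->blocked_on && !p->blocked_on_waking */
  }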
Otherwise we can see circular * blocked_on relationships that can't resolve. */ - set_task_blocked_on(owner, NULL); + owner->blocked_on_waking =3D true; wake_q_add(wake_q, owner); raw_spin_unlock(&owner->blocked_lock); } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 1def5b7fa1df..7c37d478e0f8 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2133,6 +2133,18 @@ static inline int task_current(struct rq *rq, struct= task_struct *p) return rq->curr =3D=3D p; } =20 +#ifdef CONFIG_PROXY_EXEC +static inline bool task_is_blocked(struct task_struct *p) +{ + return sched_proxy_exec() && !!p->blocked_on && !p->blocked_on_waking; +} +#else /* !PROXY_EXEC */ +static inline bool task_is_blocked(struct task_struct *p) +{ + return false; +} +#endif /* PROXY_EXEC */ + static inline int task_on_cpu(struct rq *rq, struct task_struct *p) { #ifdef CONFIG_SMP --=20 2.42.0.869.gea05f2083d-goog From nobody Wed Dec 31 06:36:33 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6CF39C4332F for ; Mon, 6 Nov 2023 19:36:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233086AbjKFTg2 (ORCPT ); Mon, 6 Nov 2023 14:36:28 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52422 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232748AbjKFTf7 (ORCPT ); Mon, 6 Nov 2023 14:35:59 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 68B5E10C4 for ; Mon, 6 Nov 2023 11:35:56 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-da03ef6fc30so5576977276.0 for ; Mon, 06 Nov 2023 11:35:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1699299355; x=1699904155; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=SA9U3BYKpPLNMavyPlNHV0DBfe0X28VHPrgcDnlNWu4=; b=dtPbMDawfzBhwhQq3OO+tqSrn9Z5NXNBhaCYZ0iaiOKCRu5v+wr4ItugAqtwQwRmH+ uP0YW2zXibSwOHKrnn96JOSFiloFEbTMa5IYbVJ1QNWJbpq+ZUvN86/rQFa8hDCwHXtV Bffm/zFcrGUJSR6pNzTE7IvDneaj9JKB2fulmEv0JzTi06PUaBFDUof+mafYI1naHrPm GYsWi6iMQRQoqX+oLKixJlre6CVGg0Qi6paJCmLo0hVrm9pkP2rgBhWLs4q0zNH9c+xg 2tOT0E1Z6f2VWQUUoSxZxQ7TbmMJmUkKuIS2Uz2X9JdvzFSQv5TqRxaNg6YIwUn0bWUi 6YWw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699299355; x=1699904155; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=SA9U3BYKpPLNMavyPlNHV0DBfe0X28VHPrgcDnlNWu4=; b=c9z2kFU4GhWvTIHxYoDSTLvmIpNwpXC1mKfDODzIHg7yvcC8BP0G9FgQdZEsCAJxpC AXGPQf+HJgG2mff0BST+5wyingxk/PbTnyUu6g+yWyn0MivPWFFSnn8eW64D+Fa/ON8/ wYPkQdUhdXH2o/i5KL8aK8Gl07osWnssi3ADBTyv4eWulgqQMcYro4mD9zQ0n4+JFUuH Cn47fFdz1A7lqBAjqrXUEislUhZg2SkMvSwLJrjA/yjR4m30JRl9ygj2WFtIHitxuddq iq6i6gIPC1rH+SQWkUtpOhROrTclgclbYpo2B8jlvfPvlTC+YVyavNAhsSiOd0ppOeb0 PvVg== X-Gm-Message-State: AOJu0YzbuRBitWnpaBM5SDdrDgIeAFBFbkS/4sM6t5/u11nwzzgKEBWe sCWOwPapNHBJgCa2Ys/wTiUKgex/kJqEC6QkOeMZ6faW2QB8Oqu2cnTJXOQr7DUHBk1FqSTUlOC Y3wLnoEyuBOOXs8ieIhfKbMw8qUt55YaXOaY5BUKs35nhO+H7OfrJNawfMJB5vLBe0+UYl+Y= X-Google-Smtp-Source: 
AGHT+IF5RAlbcLYjtzUA5D89EB529oUqPyEcJ5pKnV8ZnWdz83UyUdJSJ9VqPvsLcCCnDWjLa0aBuLaZTMhp X-Received: from jstultz-noogler2.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:600]) (user=jstultz job=sendgmr) by 2002:a05:6902:1802:b0:da0:94e8:84d4 with SMTP id cf2-20020a056902180200b00da094e884d4mr554056ybb.12.1699299355240; Mon, 06 Nov 2023 11:35:55 -0800 (PST) Date: Mon, 6 Nov 2023 19:34:53 +0000 In-Reply-To: <20231106193524.866104-1-jstultz@google.com> Mime-Version: 1.0 References: <20231106193524.866104-1-jstultz@google.com> X-Mailer: git-send-email 2.42.0.869.gea05f2083d-goog Message-ID: <20231106193524.866104-11-jstultz@google.com> Subject: [PATCH v6 10/20] locking/mutex: Switch to mutex handoffs for CONFIG_PROXY_EXEC From: John Stultz To: LKML Cc: Peter Zijlstra , Joel Fernandes , Qais Yousef , Ingo Molnar , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Daniel Bristot de Oliveira , Will Deacon , Waiman Long , Boqun Feng , "Paul E . McKenney" , kernel-team@android.com, Valentin Schneider , "Connor O'Brien" , John Stultz Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Peter Zijlstra Since with PROXY_EXEC, we will want to hand off locks to the task's we are running on behalf of, switch to using mutex handoffs. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E . McKenney" Cc: kernel-team@android.com Signed-off-by: Peter Zijlstra (Intel) [rebased, added comments and changelog] Signed-off-by: Juri Lelli [Fixed rebase conflicts] [squashed sched: Ensure blocked_on is always guarded by blocked_lock] Signed-off-by: Valentin Schneider [fix rebase conflicts, various fixes & tweaks commented inline] [squashed sched: Use rq->curr vs rq->proxy checks] Signed-off-by: Connor O'Brien [jstultz: Split out only the very basic initial framework for proxy logic from a larger patch.] Signed-off-by: John Stultz --- v5: * Split out from core proxy patch v6: * Rework to use sched_proxy_exec() instead of #ifdef CONFIG_PROXY_EXEC --- kernel/Kconfig.locks | 2 +- kernel/locking/mutex.c | 39 ++++++++++++++++++++++----------------- 2 files changed, 23 insertions(+), 18 deletions(-) diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks index 4198f0273ecd..791c98f1d329 100644 --- a/kernel/Kconfig.locks +++ b/kernel/Kconfig.locks @@ -226,7 +226,7 @@ config ARCH_SUPPORTS_ATOMIC_RMW =20 config MUTEX_SPIN_ON_OWNER def_bool y - depends on SMP && ARCH_SUPPORTS_ATOMIC_RMW + depends on SMP && ARCH_SUPPORTS_ATOMIC_RMW && !PROXY_EXEC =20 config RWSEM_SPIN_ON_OWNER def_bool y diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index f37b7afe8aa5..5394a3c4b5d9 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -914,26 +914,31 @@ static noinline void __sched __mutex_unlock_slowpath(= struct mutex *lock, unsigne =20 mutex_release(&lock->dep_map, ip); =20 - /* - * Release the lock before (potentially) taking the spinlock such that - * other contenders can get on with things ASAP. - * - * Except when HANDOFF, in that case we must not clear the owner field, - * but instead set it to the top waiter. 
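For readers not steeped in the mutex code, a brief reminder of what "handoff" means may help. The flag values below come from kernel/locking/mutex.h as of this series (the comments here are paraphrased for explanation, not quoted, and nothing in this reference is changed by the patch):

  /* The owner word packs state flags into the low bits of the
   * owning task_struct pointer:
   */
  #define MUTEX_FLAG_WAITERS  0x01  /* non-empty waiter list */
  #define MUTEX_FLAG_HANDOFF  0x02  /* unlock must hand the lock to the top waiter */
  #define MUTEX_FLAG_PICKUP   0x04  /* handoff done, waiting for the waiter to pick it up */

Forcing owner to MUTEX_FLAG_HANDOFF in the unlock slowpath makes every contended unlock take the handoff branch, so the lock goes directly to the first waiter instead of being released and possibly stolen. That is also why this patch drops MUTEX_SPIN_ON_OWNER when PROXY_EXEC is enabled: an optimistic spinner could otherwise grab the lock ahead of the waiter the scheduler is proxying for.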
- */ - owner =3D atomic_long_read(&lock->owner); - for (;;) { - MUTEX_WARN_ON(__owner_task(owner) !=3D current); - MUTEX_WARN_ON(owner & MUTEX_FLAG_PICKUP); - - if (owner & MUTEX_FLAG_HANDOFF) - break; + if (sched_proxy_exec()) { + /* Always force HANDOFF for Proxy Exec for now. Revisit. */ + owner =3D MUTEX_FLAG_HANDOFF; + } else { + /* + * Release the lock before (potentially) taking the spinlock + * such that other contenders can get on with things ASAP. + * + * Except when HANDOFF, in that case we must not clear the + * owner field, but instead set it to the top waiter. + */ + owner =3D atomic_long_read(&lock->owner); + for (;;) { + MUTEX_WARN_ON(__owner_task(owner) !=3D current); + MUTEX_WARN_ON(owner & MUTEX_FLAG_PICKUP); =20 - if (atomic_long_try_cmpxchg_release(&lock->owner, &owner, __owner_flags(= owner))) { - if (owner & MUTEX_FLAG_WAITERS) + if (owner & MUTEX_FLAG_HANDOFF) break; =20 - return; + if (atomic_long_try_cmpxchg_release(&lock->owner, &owner, + __owner_flags(owner))) { + if (owner & MUTEX_FLAG_WAITERS) + break; + return; + } } } =20 --=20 2.42.0.869.gea05f2083d-goog From nobody Wed Dec 31 06:36:33 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EFC9AC4167B for ; Mon, 6 Nov 2023 19:36:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232984AbjKFTgq (ORCPT ); Mon, 6 Nov 2023 14:36:46 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57320 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233024AbjKFTgI (ORCPT ); Mon, 6 Nov 2023 14:36:08 -0500 Received: from mail-pf1-x44a.google.com (mail-pf1-x44a.google.com [IPv6:2607:f8b0:4864:20::44a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 308AA10D2 for ; Mon, 6 Nov 2023 11:35:59 -0800 (PST) Received: by mail-pf1-x44a.google.com with SMTP id d2e1a72fcca58-6b31cb3cc7eso3225270b3a.0 for ; Mon, 06 Nov 2023 11:35:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1699299358; x=1699904158; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=ClsCieoc65HaPuChloqB8WZHNaRbpIH283rOAN9AQVQ=; b=zv0CaL8fWlXFmEp9Ha6phT8k1R1y86ZLSsJT98IS1GRaNBeN4+JvwONvBIVSXROSwv /MNyR6QA6ZewhZJ/oToobTZ0HkgejG8DmabpVe+bfxL8cU89sEntbS7vgorbhiy+eDoF POztwXu5lc5gNzgo7845Kz9BzLqSEHrtjx4IY7HuxNyDrCBwksL+zVr1tbL0FzCbFxQO ZudRiGAxReOXxud+IDlg1N89RLybWsIAGvBC6uLAHrqpEHmB55hVKIPfm19Jqv0fSoKg mPgWeTSakpYT5lqbNQLgZ7sHAToulqHRraU3NFmFwXQHcK9zIbEVYu0z9SgrwlmJtJp3 1XhA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699299358; x=1699904158; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ClsCieoc65HaPuChloqB8WZHNaRbpIH283rOAN9AQVQ=; b=My7z/bCyyqfayLLV2sZOEAyxyEQdTXdVjtD0iqW2BrkdnoKW1X3h48yLTfOF9wzeLC 0+mMmkSu3PkTvvaAXJB+swd1kdGjsFD4NXZiwg9x9VFTiQoNatNrfCjJWYxf8sYQqk1L rxMhNfmtH3Xxf4NsW+2+fReuzRDO95TohdMlNeDJQIY+1sascw8gxFgrKtqtNqNIEW9x VOKcqOVJCoor2qB+DkfZOMn1Y/06KLbLm0mMhwxqQun5MQrUz+6P88pVkPx5vDt4MsC7 afCba1WKtf8ISz2PNrYWxFbbo6PHCBMxNq5t3O9+lAjbZP/3U32lE9UA/a6qtVhfDKfv OhPg== X-Gm-Message-State: AOJu0Yw+QV64ABFZMJiXzWJC0YoctoGha11rEazIIGMRIQQDhsTyy7S8 
LjO1P88VvgYvb/9r+a9eAguZbYy2HYm5k0Qy3XJ6SekPoVoJzLm/6saCZhxsTtYPWtiiSNPcbto JTOgtZe9haS0k48wr4nTarwifUFUonNyx6m2TerZ5cHRObnHKOI+eNV1Lbqq69h0eGcgCEdA= X-Google-Smtp-Source: AGHT+IHtI/o5pNeGz2cY7lq9IkPnYGSTCv2/jAujz0MnI+G/LRxESQ+2O6giZBrdCidFPtMXcGmuDiYAecPK X-Received: from jstultz-noogler2.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:600]) (user=jstultz job=sendgmr) by 2002:a05:6a00:26ea:b0:692:c216:8830 with SMTP id p42-20020a056a0026ea00b00692c2168830mr16132pfw.0.1699299357192; Mon, 06 Nov 2023 11:35:57 -0800 (PST) Date: Mon, 6 Nov 2023 19:34:54 +0000 In-Reply-To: <20231106193524.866104-1-jstultz@google.com> Mime-Version: 1.0 References: <20231106193524.866104-1-jstultz@google.com> X-Mailer: git-send-email 2.42.0.869.gea05f2083d-goog Message-ID: <20231106193524.866104-12-jstultz@google.com> Subject: [PATCH v6 11/20] sched: Split scheduler execution context From: John Stultz To: LKML Cc: Peter Zijlstra , Joel Fernandes , Qais Yousef , Ingo Molnar , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Daniel Bristot de Oliveira , Will Deacon , Waiman Long , Boqun Feng , "Paul E . McKenney" , kernel-team@android.com, "Connor O'Brien" , John Stultz Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Peter Zijlstra Lets define the scheduling context as all the scheduler state in task_struct and the execution context as all state required to run the task. Currently both are intertwined in task_struct. We want to logically split these such that we can run the execution context of one task with the scheduling context of another. To this purpose introduce rq_selected() macro to point to the task_struct used for scheduler state and preserve rq->curr to denote the execution context. NOTE: Peter previously mentioned he didn't like the name "rq_selected()", but I've not come up with a better alternative. I'm very open to other name proposals. Question for Peter: Dietmar suggested you'd prefer I drop the conditionalization of the scheduler context pointer on the rq (so rq_selected() would be open coded as rq->curr_sched or whatever we agree on for a name), but I'd think in the !CONFIG_PROXY_EXEC case we'd want to avoid the wasted pointer and its use (since it curr_sched would always be =3D=3D curr)? If I'm wrong I'm fine switching this, but would appreciate clarification. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E . 
McKenney" Cc: kernel-team@android.com Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Juri Lelli Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20181009092434.26221-5-juri.lelli@redhat.com [add additional comments and update more sched_class code to use rq::proxy] Signed-off-by: Connor O'Brien [jstultz: Rebased and resolved minor collisions, reworked to use accessors, tweaked update_curr_common to use rq_proxy fixing rt scheduling issues] Signed-off-by: John Stultz --- v2: * Reworked to use accessors * Fixed update_curr_common to use proxy instead of curr v3: * Tweaked wrapper names * Swapped proxy for selected for clarity v4: * Minor variable name tweaks for readability * Use a macro instead of a inline function and drop other helper functions as suggested by Peter. * Remove verbose comments/questions to avoid review distractions, as suggested by Dietmar v5: * Add CONFIG_PROXY_EXEC option to this patch so the new logic can be tested with this change * Minor fix to grab rq_selected when holding the rq lock --- kernel/sched/core.c | 38 +++++++++++++++++++++++++------------- kernel/sched/deadline.c | 35 ++++++++++++++++++----------------- kernel/sched/fair.c | 18 +++++++++--------- kernel/sched/rt.c | 40 ++++++++++++++++++++-------------------- kernel/sched/sched.h | 37 +++++++++++++++++++++++++++++++++++-- 5 files changed, 107 insertions(+), 61 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index a38bf8ef5798..9931940ba474 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -824,7 +824,7 @@ static enum hrtimer_restart hrtick(struct hrtimer *time= r) =20 rq_lock(rq, &rf); update_rq_clock(rq); - rq->curr->sched_class->task_tick(rq, rq->curr, 1); + rq_selected(rq)->sched_class->task_tick(rq, rq_selected(rq), 1); rq_unlock(rq, &rf); =20 return HRTIMER_NORESTART; @@ -2251,16 +2251,18 @@ static inline void check_class_changed(struct rq *r= q, struct task_struct *p, =20 void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags) { - if (p->sched_class =3D=3D rq->curr->sched_class) - rq->curr->sched_class->check_preempt_curr(rq, p, flags); - else if (sched_class_above(p->sched_class, rq->curr->sched_class)) + struct task_struct *curr =3D rq_selected(rq); + + if (p->sched_class =3D=3D curr->sched_class) + curr->sched_class->check_preempt_curr(rq, p, flags); + else if (sched_class_above(p->sched_class, curr->sched_class)) resched_curr(rq); =20 /* * A queue event has occurred, and we're going to schedule. In * this case, we can save a useless back to back clock update. */ - if (task_on_rq_queued(rq->curr) && test_tsk_need_resched(rq->curr)) + if (task_on_rq_queued(curr) && test_tsk_need_resched(rq->curr)) rq_clock_skip_update(rq); } =20 @@ -2799,7 +2801,7 @@ __do_set_cpus_allowed(struct task_struct *p, struct a= ffinity_context *ctx) lockdep_assert_held(&p->pi_lock); =20 queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); =20 if (queued) { /* @@ -5602,7 +5604,7 @@ unsigned long long task_sched_runtime(struct task_str= uct *p) * project cycles that may never be accounted to this * thread, breaking clock_gettime(). 
*/ - if (task_current(rq, p) && task_on_rq_queued(p)) { + if (task_current_selected(rq, p) && task_on_rq_queued(p)) { prefetch_curr_exec_start(p); update_rq_clock(rq); p->sched_class->update_curr(rq); @@ -5670,7 +5672,8 @@ void scheduler_tick(void) { int cpu =3D smp_processor_id(); struct rq *rq =3D cpu_rq(cpu); - struct task_struct *curr =3D rq->curr; + /* accounting goes to the selected task */ + struct task_struct *curr; struct rq_flags rf; unsigned long thermal_pressure; u64 resched_latency; @@ -5681,6 +5684,7 @@ void scheduler_tick(void) sched_clock_tick(); =20 rq_lock(rq, &rf); + curr =3D rq_selected(rq); =20 update_rq_clock(rq); thermal_pressure =3D arch_scale_thermal_pressure(cpu_of(rq)); @@ -5765,6 +5769,12 @@ static void sched_tick_remote(struct work_struct *wo= rk) struct task_struct *curr =3D rq->curr; =20 if (cpu_online(cpu)) { + /* + * Since this is a remote tick for full dynticks mode, + * we are always sure that there is no proxy (only a + * single task is running). + */ + SCHED_WARN_ON(rq->curr !=3D rq_selected(rq)); update_rq_clock(rq); =20 if (!is_idle_task(curr)) { @@ -6688,6 +6698,7 @@ static void __sched notrace __schedule(unsigned int s= ched_mode) } =20 next =3D pick_next_task(rq, prev, &rf); + rq_set_selected(rq, next); clear_tsk_need_resched(prev); clear_preempt_need_resched(); #ifdef CONFIG_SCHED_DEBUG @@ -7154,7 +7165,7 @@ void rt_mutex_setprio(struct task_struct *p, struct t= ask_struct *pi_task) =20 prev_class =3D p->sched_class; queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); if (queued) dequeue_task(rq, p, queue_flag); if (running) @@ -7242,7 +7253,7 @@ void set_user_nice(struct task_struct *p, long nice) goto out_unlock; } queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); if (queued) dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK); if (running) @@ -7826,7 +7837,7 @@ static int __sched_setscheduler(struct task_struct *p, } =20 queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); if (queued) dequeue_task(rq, p, queue_flags); if (running) @@ -9327,6 +9338,7 @@ void __init init_idle(struct task_struct *idle, int c= pu) rcu_read_unlock(); =20 rq->idle =3D idle; + rq_set_selected(rq, idle); rcu_assign_pointer(rq->curr, idle); idle->on_rq =3D TASK_ON_RQ_QUEUED; #ifdef CONFIG_SMP @@ -9416,7 +9428,7 @@ void sched_setnuma(struct task_struct *p, int nid) =20 rq =3D task_rq_lock(p, &rf); queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); =20 if (queued) dequeue_task(rq, p, DEQUEUE_SAVE); @@ -10543,7 +10555,7 @@ void sched_move_task(struct task_struct *tsk) =20 update_rq_clock(rq); =20 - running =3D task_current(rq, tsk); + running =3D task_current_selected(rq, tsk); queued =3D task_on_rq_queued(tsk); =20 if (queued) diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 9522e6607754..e8bca6b8da6f 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -1174,7 +1174,7 @@ static enum hrtimer_restart dl_task_timer(struct hrti= mer *timer) #endif =20 enqueue_task_dl(rq, p, ENQUEUE_REPLENISH); - if (dl_task(rq->curr)) + if (dl_task(rq_selected(rq))) check_preempt_curr_dl(rq, p, 0); else resched_curr(rq); @@ -1297,7 +1297,7 @@ static u64 grub_reclaim(u64 delta, struct rq *rq, str= uct sched_dl_entity *dl_se) */ static void update_curr_dl(struct rq *rq) { - struct task_struct *curr =3D rq->curr; + 
struct task_struct *curr =3D rq_selected(rq); struct sched_dl_entity *dl_se =3D &curr->dl; s64 delta_exec, scaled_delta_exec; int cpu =3D cpu_of(rq); @@ -1810,7 +1810,7 @@ static int find_later_rq(struct task_struct *task); static int select_task_rq_dl(struct task_struct *p, int cpu, int flags) { - struct task_struct *curr; + struct task_struct *curr, *selected; bool select_rq; struct rq *rq; =20 @@ -1821,6 +1821,7 @@ select_task_rq_dl(struct task_struct *p, int cpu, int= flags) =20 rcu_read_lock(); curr =3D READ_ONCE(rq->curr); /* unlocked access */ + selected =3D READ_ONCE(rq_selected(rq)); =20 /* * If we are dealing with a -deadline task, we must @@ -1831,9 +1832,9 @@ select_task_rq_dl(struct task_struct *p, int cpu, int= flags) * other hand, if it has a shorter deadline, we * try to make it stay here, it might be important. */ - select_rq =3D unlikely(dl_task(curr)) && + select_rq =3D unlikely(dl_task(selected)) && (curr->nr_cpus_allowed < 2 || - !dl_entity_preempt(&p->dl, &curr->dl)) && + !dl_entity_preempt(&p->dl, &selected->dl)) && p->nr_cpus_allowed > 1; =20 /* @@ -1896,7 +1897,7 @@ static void check_preempt_equal_dl(struct rq *rq, str= uct task_struct *p) * let's hope p can move out. */ if (rq->curr->nr_cpus_allowed =3D=3D 1 || - !cpudl_find(&rq->rd->cpudl, rq->curr, NULL)) + !cpudl_find(&rq->rd->cpudl, rq_selected(rq), NULL)) return; =20 /* @@ -1935,7 +1936,7 @@ static int balance_dl(struct rq *rq, struct task_stru= ct *p, struct rq_flags *rf) static void check_preempt_curr_dl(struct rq *rq, struct task_struct *p, int flags) { - if (dl_entity_preempt(&p->dl, &rq->curr->dl)) { + if (dl_entity_preempt(&p->dl, &rq_selected(rq)->dl)) { resched_curr(rq); return; } @@ -1945,7 +1946,7 @@ static void check_preempt_curr_dl(struct rq *rq, stru= ct task_struct *p, * In the unlikely case current and p have the same deadline * let us try to decide what's the best thing to do... */ - if ((p->dl.deadline =3D=3D rq->curr->dl.deadline) && + if ((p->dl.deadline =3D=3D rq_selected(rq)->dl.deadline) && !test_tsk_need_resched(rq->curr)) check_preempt_equal_dl(rq, p); #endif /* CONFIG_SMP */ @@ -1980,7 +1981,7 @@ static void set_next_task_dl(struct rq *rq, struct ta= sk_struct *p, bool first) if (hrtick_enabled_dl(rq)) start_hrtick_dl(rq, p); =20 - if (rq->curr->sched_class !=3D &dl_sched_class) + if (rq_selected(rq)->sched_class !=3D &dl_sched_class) update_dl_rq_load_avg(rq_clock_pelt(rq), rq, 0); =20 deadline_queue_push_tasks(rq); @@ -2297,8 +2298,8 @@ static int push_dl_task(struct rq *rq) * can move away, it makes sense to just reschedule * without going further in pushing next_task. */ - if (dl_task(rq->curr) && - dl_time_before(next_task->dl.deadline, rq->curr->dl.deadline) && + if (dl_task(rq_selected(rq)) && + dl_time_before(next_task->dl.deadline, rq_selected(rq)->dl.deadline) = && rq->curr->nr_cpus_allowed > 1) { resched_curr(rq); return 0; @@ -2423,7 +2424,7 @@ static void pull_dl_task(struct rq *this_rq) * deadline than the current task of its runqueue. 
*/ if (dl_time_before(p->dl.deadline, - src_rq->curr->dl.deadline)) + rq_selected(src_rq)->dl.deadline)) goto skip; =20 if (is_migration_disabled(p)) { @@ -2462,9 +2463,9 @@ static void task_woken_dl(struct rq *rq, struct task_= struct *p) if (!task_on_cpu(rq, p) && !test_tsk_need_resched(rq->curr) && p->nr_cpus_allowed > 1 && - dl_task(rq->curr) && + dl_task(rq_selected(rq)) && (rq->curr->nr_cpus_allowed < 2 || - !dl_entity_preempt(&p->dl, &rq->curr->dl))) { + !dl_entity_preempt(&p->dl, &rq_selected(rq)->dl))) { push_dl_tasks(rq); } } @@ -2639,12 +2640,12 @@ static void switched_to_dl(struct rq *rq, struct ta= sk_struct *p) return; } =20 - if (rq->curr !=3D p) { + if (rq_selected(rq) !=3D p) { #ifdef CONFIG_SMP if (p->nr_cpus_allowed > 1 && rq->dl.overloaded) deadline_queue_push_tasks(rq); #endif - if (dl_task(rq->curr)) + if (dl_task(rq_selected(rq))) check_preempt_curr_dl(rq, p, 0); else resched_curr(rq); @@ -2673,7 +2674,7 @@ static void prio_changed_dl(struct rq *rq, struct tas= k_struct *p, if (!rq->dl.overloaded) deadline_queue_pull_task(rq); =20 - if (task_current(rq, p)) { + if (task_current_selected(rq, p)) { /* * If we now have a earlier deadline task than p, * then reschedule, provided p is still on this diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index c919633acd3d..3d5c1ec34bf7 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1172,7 +1172,7 @@ static s64 update_curr_se(struct rq *rq, struct sched= _entity *curr) */ s64 update_curr_common(struct rq *rq) { - struct task_struct *curr =3D rq->curr; + struct task_struct *curr =3D rq_selected(rq); s64 delta_exec; =20 delta_exec =3D update_curr_se(rq, &curr->se); @@ -1218,7 +1218,7 @@ static void update_curr(struct cfs_rq *cfs_rq) =20 static void update_curr_fair(struct rq *rq) { - update_curr(cfs_rq_of(&rq->curr->se)); + update_curr(cfs_rq_of(&rq_selected(rq)->se)); } =20 static inline void @@ -6461,7 +6461,7 @@ static void hrtick_start_fair(struct rq *rq, struct t= ask_struct *p) s64 delta =3D slice - ran; =20 if (delta < 0) { - if (task_current(rq, p)) + if (task_current_selected(rq, p)) resched_curr(rq); return; } @@ -6476,7 +6476,7 @@ static void hrtick_start_fair(struct rq *rq, struct t= ask_struct *p) */ static void hrtick_update(struct rq *rq) { - struct task_struct *curr =3D rq->curr; + struct task_struct *curr =3D rq_selected(rq); =20 if (!hrtick_enabled_fair(rq) || curr->sched_class !=3D &fair_sched_class) return; @@ -8082,7 +8082,7 @@ static void set_next_buddy(struct sched_entity *se) */ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int= wake_flags) { - struct task_struct *curr =3D rq->curr; + struct task_struct *curr =3D rq_selected(rq); struct sched_entity *se =3D &curr->se, *pse =3D &p->se; struct cfs_rq *cfs_rq =3D task_cfs_rq(curr); int next_buddy_marked =3D 0; @@ -8115,7 +8115,7 @@ static void check_preempt_wakeup(struct rq *rq, struc= t task_struct *p, int wake_ * prevents us from potentially nominating it as a false LAST_BUDDY * below. */ - if (test_tsk_need_resched(curr)) + if (test_tsk_need_resched(rq->curr)) return; =20 /* Idle tasks are by definition preempted by non-idle tasks. */ @@ -9099,7 +9099,7 @@ static bool __update_blocked_others(struct rq *rq, bo= ol *done) * update_load_avg() can call cpufreq_update_util(). Make sure that RT, * DL and IRQ signals have been updated before updating CFS. 
*/ - curr_class =3D rq->curr->sched_class; + curr_class =3D rq_selected(rq)->sched_class; =20 thermal_pressure =3D arch_scale_thermal_pressure(cpu_of(rq)); =20 @@ -12471,7 +12471,7 @@ prio_changed_fair(struct rq *rq, struct task_struct= *p, int oldprio) * our priority decreased, or if we are not currently running on * this runqueue and our priority is higher than the current's */ - if (task_current(rq, p)) { + if (task_current_selected(rq, p)) { if (p->prio > oldprio) resched_curr(rq); } else @@ -12574,7 +12574,7 @@ static void switched_to_fair(struct rq *rq, struct = task_struct *p) * kick off the schedule if running, otherwise just see * if we can still preempt the current task. */ - if (task_current(rq, p)) + if (task_current_selected(rq, p)) resched_curr(rq); else check_preempt_curr(rq, p, 0); diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 327ae4148aec..bc243e70bc0e 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -574,7 +574,7 @@ static void dequeue_rt_entity(struct sched_rt_entity *r= t_se, unsigned int flags) =20 static void sched_rt_rq_enqueue(struct rt_rq *rt_rq) { - struct task_struct *curr =3D rq_of_rt_rq(rt_rq)->curr; + struct task_struct *curr =3D rq_selected(rq_of_rt_rq(rt_rq)); struct rq *rq =3D rq_of_rt_rq(rt_rq); struct sched_rt_entity *rt_se; =20 @@ -1044,7 +1044,7 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt= _rq) */ static void update_curr_rt(struct rq *rq) { - struct task_struct *curr =3D rq->curr; + struct task_struct *curr =3D rq_selected(rq); struct sched_rt_entity *rt_se =3D &curr->rt; s64 delta_exec; =20 @@ -1591,7 +1591,7 @@ static int find_lowest_rq(struct task_struct *task); static int select_task_rq_rt(struct task_struct *p, int cpu, int flags) { - struct task_struct *curr; + struct task_struct *curr, *selected; struct rq *rq; bool test; =20 @@ -1603,6 +1603,7 @@ select_task_rq_rt(struct task_struct *p, int cpu, int= flags) =20 rcu_read_lock(); curr =3D READ_ONCE(rq->curr); /* unlocked access */ + selected =3D READ_ONCE(rq_selected(rq)); =20 /* * If the current task on @p's runqueue is an RT task, then @@ -1631,8 +1632,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, int= flags) * systems like big.LITTLE. */ test =3D curr && - unlikely(rt_task(curr)) && - (curr->nr_cpus_allowed < 2 || curr->prio <=3D p->prio); + unlikely(rt_task(selected)) && + (curr->nr_cpus_allowed < 2 || selected->prio <=3D p->prio); =20 if (test || !rt_task_fits_capacity(p, cpu)) { int target =3D find_lowest_rq(p); @@ -1662,12 +1663,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, in= t flags) =20 static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p) { - /* - * Current can't be migrated, useless to reschedule, - * let's hope p can move out. - */ if (rq->curr->nr_cpus_allowed =3D=3D 1 || - !cpupri_find(&rq->rd->cpupri, rq->curr, NULL)) + !cpupri_find(&rq->rd->cpupri, rq_selected(rq), NULL)) return; =20 /* @@ -1710,7 +1707,9 @@ static int balance_rt(struct rq *rq, struct task_stru= ct *p, struct rq_flags *rf) */ static void check_preempt_curr_rt(struct rq *rq, struct task_struct *p, in= t flags) { - if (p->prio < rq->curr->prio) { + struct task_struct *curr =3D rq_selected(rq); + + if (p->prio < curr->prio) { resched_curr(rq); return; } @@ -1728,7 +1727,7 @@ static void check_preempt_curr_rt(struct rq *rq, stru= ct task_struct *p, int flag * to move current somewhere else, making room for our non-migratable * task. 
*/ - if (p->prio =3D=3D rq->curr->prio && !test_tsk_need_resched(rq->curr)) + if (p->prio =3D=3D curr->prio && !test_tsk_need_resched(rq->curr)) check_preempt_equal_prio(rq, p); #endif } @@ -1753,7 +1752,7 @@ static inline void set_next_task_rt(struct rq *rq, st= ruct task_struct *p, bool f * utilization. We only care of the case where we start to schedule a * rt task */ - if (rq->curr->sched_class !=3D &rt_sched_class) + if (rq_selected(rq)->sched_class !=3D &rt_sched_class) update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 0); =20 rt_queue_push_tasks(rq); @@ -2034,6 +2033,7 @@ static struct task_struct *pick_next_pushable_task(st= ruct rq *rq) =20 BUG_ON(rq->cpu !=3D task_cpu(p)); BUG_ON(task_current(rq, p)); + BUG_ON(task_current_selected(rq, p)); BUG_ON(p->nr_cpus_allowed <=3D 1); =20 BUG_ON(!task_on_rq_queued(p)); @@ -2066,7 +2066,7 @@ static int push_rt_task(struct rq *rq, bool pull) * higher priority than current. If that's the case * just reschedule current. */ - if (unlikely(next_task->prio < rq->curr->prio)) { + if (unlikely(next_task->prio < rq_selected(rq)->prio)) { resched_curr(rq); return 0; } @@ -2419,7 +2419,7 @@ static void pull_rt_task(struct rq *this_rq) * p if it is lower in priority than the * current task on the run queue */ - if (p->prio < src_rq->curr->prio) + if (p->prio < rq_selected(src_rq)->prio) goto skip; =20 if (is_migration_disabled(p)) { @@ -2461,9 +2461,9 @@ static void task_woken_rt(struct rq *rq, struct task_= struct *p) bool need_to_push =3D !task_on_cpu(rq, p) && !test_tsk_need_resched(rq->curr) && p->nr_cpus_allowed > 1 && - (dl_task(rq->curr) || rt_task(rq->curr)) && + (dl_task(rq_selected(rq)) || rt_task(rq_selected(rq))) && (rq->curr->nr_cpus_allowed < 2 || - rq->curr->prio <=3D p->prio); + rq_selected(rq)->prio <=3D p->prio); =20 if (need_to_push) push_rt_tasks(rq); @@ -2547,7 +2547,7 @@ static void switched_to_rt(struct rq *rq, struct task= _struct *p) if (p->nr_cpus_allowed > 1 && rq->rt.overloaded) rt_queue_push_tasks(rq); #endif /* CONFIG_SMP */ - if (p->prio < rq->curr->prio && cpu_online(cpu_of(rq))) + if (p->prio < rq_selected(rq)->prio && cpu_online(cpu_of(rq))) resched_curr(rq); } } @@ -2562,7 +2562,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p,= int oldprio) if (!task_on_rq_queued(p)) return; =20 - if (task_current(rq, p)) { + if (task_current_selected(rq, p)) { #ifdef CONFIG_SMP /* * If our priority decreases while running, we @@ -2588,7 +2588,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p,= int oldprio) * greater than the current running task * then reschedule. 
*/ - if (p->prio < rq->curr->prio) + if (p->prio < rq_selected(rq)->prio) resched_curr(rq); } } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 7c37d478e0f8..130e669cf5da 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1013,7 +1013,10 @@ struct rq { */ unsigned int nr_uninterruptible; =20 - struct task_struct __rcu *curr; + struct task_struct __rcu *curr; /* Execution context */ +#ifdef CONFIG_PROXY_EXEC + struct task_struct __rcu *curr_sched; /* Scheduling context (policy) */ +#endif struct task_struct *idle; struct task_struct *stop; unsigned long next_balance; @@ -1212,6 +1215,22 @@ DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues); #define cpu_curr(cpu) (cpu_rq(cpu)->curr) #define raw_rq() raw_cpu_ptr(&runqueues) =20 +#ifdef CONFIG_PROXY_EXEC +#define rq_selected(rq) ((rq)->curr_sched) +#define cpu_curr_selected(cpu) (cpu_rq(cpu)->curr_sched) +static inline void rq_set_selected(struct rq *rq, struct task_struct *t) +{ + rcu_assign_pointer(rq->curr_sched, t); +} +#else +#define rq_selected(rq) ((rq)->curr) +#define cpu_curr_selected(cpu) (cpu_rq(cpu)->curr) +static inline void rq_set_selected(struct rq *rq, struct task_struct *t) +{ + /* Do nothing */ +} +#endif + struct sched_group; #ifdef CONFIG_SCHED_CORE static inline struct cpumask *sched_group_span(struct sched_group *sg); @@ -2128,11 +2147,25 @@ static inline u64 global_rt_runtime(void) return (u64)sysctl_sched_rt_runtime * NSEC_PER_USEC; } =20 +/* + * Is p the current execution context? + */ static inline int task_current(struct rq *rq, struct task_struct *p) { return rq->curr =3D=3D p; } =20 +/* + * Is p the current scheduling context? + * + * Note that it might be the current execution context at the same time if + * rq->curr =3D=3D rq_selected() =3D=3D p. 
+ */ +static inline int task_current_selected(struct rq *rq, struct task_struct = *p) +{ + return rq_selected(rq) =3D=3D p; +} + #ifdef CONFIG_PROXY_EXEC static inline bool task_is_blocked(struct task_struct *p) { @@ -2308,7 +2341,7 @@ struct sched_class { =20 static inline void put_prev_task(struct rq *rq, struct task_struct *prev) { - WARN_ON_ONCE(rq->curr !=3D prev); + WARN_ON_ONCE(rq_selected(rq) !=3D prev); prev->sched_class->put_prev_task(rq, prev); } =20 --=20 2.42.0.869.gea05f2083d-goog From nobody Wed Dec 31 06:36:33 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CA92CC4167D for ; Mon, 6 Nov 2023 19:36:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232748AbjKFTgt (ORCPT ); Mon, 6 Nov 2023 14:36:49 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52396 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232721AbjKFTgJ (ORCPT ); Mon, 6 Nov 2023 14:36:09 -0500 Received: from mail-pj1-x1049.google.com (mail-pj1-x1049.google.com [IPv6:2607:f8b0:4864:20::1049]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 38DC21723 for ; Mon, 6 Nov 2023 11:36:01 -0800 (PST) Received: by mail-pj1-x1049.google.com with SMTP id 98e67ed59e1d1-280465be3c9so3227574a91.1 for ; Mon, 06 Nov 2023 11:36:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1699299360; x=1699904160; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=tlM0fUGrXkJ77vpxrL8Ua3abDkMWMrcPq7YvjnZzG1k=; b=RK/X8K9qOxJs3epYdFPaemyKI7J1R12sGYnQSKcCYMnAzRGStO1qECjUhuu9m/2no+ jT3qummZDFbo/w4tGzL4t3uD+f1kSEAjF0shzjvcNvhrGYhDDBMWe+l5OkjS78lmnZ7N KFJNHuwDZ2H6KvskuOnzyhWz19C6+2d+Un3DhxFKReFYi5W/wk+V722W8dP7tN7RHmri PSYbBBYITsCCD/jWtYamBpSH3yF2pY1YPGVHNDzX57eB3wMhRmhJwGjLDCvh2URAM+Ix AC3mm6rXaXvgCz315aIPTeoYwyZvWK1F6HKbUxefyNXigAhRudpk6ymPxlNeS93xQ13n kvZA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699299360; x=1699904160; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=tlM0fUGrXkJ77vpxrL8Ua3abDkMWMrcPq7YvjnZzG1k=; b=qyYMmo+e/hAVeJHtFFqwVDUK36INljF+89IkgTk8icVj+qVhoIzrma7YJ7QjGTjanf lxyFoeL2njJauHOswNCOk+5zGMOd/yRkQ3SHx/AxqB6aGwFgJeDr14Gd3lz/yXzBYDEB sHybPOmBDZZzk7zDbnJHp1l8nYyU8i/sZSRHZ3js3+InCQCY8H+tzEjlqQOub0v+l/9z RuCPCPj4aG80O+SaFoQA467ce8yjPExsCuyXfvkbrZpTv0aC5WWerYMAPfJgxyyeewFF hP37F5uHX7l7xqcf5+ud0VJtgK6yaJLfta7XHjhWTYOlmjlkAgXvR3TzPX2WJkC6DJHe jSJw== X-Gm-Message-State: AOJu0YyClRRnkL0/WV5xu2RsXyU547J+uq6cansDYl5BBQlgy4nNduDR FbKPlGB/6EsFYlz+X7CAbRn7UTQtPp0KfcNaUUs6IPnw/oGJlM8qqquMhx5eyCxOFLsjc9ZrC3p lLcV/JUwo7R3exEi/9viNmd2ANWphd1GrECz+06Ek6lyCM0fs7TGCtnwmn70jL7xVAPkYo1A= X-Google-Smtp-Source: AGHT+IFuSY4TKULEwQRlAhcK4DIy+tTFWR3TVxo3zbojlzz0Z6KY40zo8comijA7uSLG852mFFc+HVZbfhED X-Received: from jstultz-noogler2.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:600]) (user=jstultz job=sendgmr) by 2002:a17:90a:c708:b0:280:47ba:7685 with SMTP id o8-20020a17090ac70800b0028047ba7685mr14653pjt.0.1699299359325; Mon, 06 Nov 2023 11:35:59 -0800 (PST) Date: Mon, 6 Nov 2023 19:34:55 +0000 In-Reply-To: <20231106193524.866104-1-jstultz@google.com> Mime-Version: 1.0 References: 
<20231106193524.866104-1-jstultz@google.com> X-Mailer: git-send-email 2.42.0.869.gea05f2083d-goog Message-ID: <20231106193524.866104-13-jstultz@google.com> Subject: [PATCH v6 12/20] sched: Fix runtime accounting w/ split exec & sched contexts From: John Stultz To: LKML Cc: John Stultz , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Daniel Bristot de Oliveira , Will Deacon , Waiman Long , Boqun Feng , "Paul E . McKenney" , kernel-team@android.com Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The idea here is we want to charge the scheduler-context task's vruntime but charge the execution-context task's sum_exec_runtime. This way cputime accounting goes against the task actually running but vruntime accounting goes against the selected task so we get proper fairness. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E . McKenney" Cc: kernel-team@android.com Signed-off-by: John Stultz --- kernel/sched/fair.c | 24 +++++++++++++++++++----- 1 file changed, 19 insertions(+), 5 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 3d5c1ec34bf7..1aca675985b2 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1144,22 +1144,36 @@ static void update_tg_load_avg(struct cfs_rq *cfs_r= q) } #endif /* CONFIG_SMP */ =20 -static s64 update_curr_se(struct rq *rq, struct sched_entity *curr) +static s64 update_curr_se(struct rq *rq, struct sched_entity *se) { u64 now =3D rq_clock_task(rq); s64 delta_exec; =20 - delta_exec =3D now - curr->exec_start; + /* Calculate the delta from selected se */ + delta_exec =3D now - se->exec_start; if (unlikely(delta_exec <=3D 0)) return delta_exec; =20 - curr->exec_start =3D now; - curr->sum_exec_runtime +=3D delta_exec; + /* Update selected se's exec_start */ + se->exec_start =3D now; + if (entity_is_task(se)) { + struct task_struct *running =3D rq->curr; + /* + * If se is a task, we account the time + * against the running task, as w/ proxy-exec + * they may not be the same. 
+ */ + running->se.exec_start =3D now; + running->se.sum_exec_runtime +=3D delta_exec; + } else { + /* If not task, account the time against se */ + se->sum_exec_runtime +=3D delta_exec; + } =20 if (schedstat_enabled()) { struct sched_statistics *stats; =20 - stats =3D __schedstats_from_se(curr); + stats =3D __schedstats_from_se(se); __schedstat_set(stats->exec_max, max(delta_exec, stats->exec_max)); } --=20 2.42.0.869.gea05f2083d-goog From nobody Wed Dec 31 06:36:33 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F36B5C4167B for ; Mon, 6 Nov 2023 19:36:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233011AbjKFTgz (ORCPT ); Mon, 6 Nov 2023 14:36:55 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38774 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233033AbjKFTgJ (ORCPT ); Mon, 6 Nov 2023 14:36:09 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B7188173F for ; Mon, 6 Nov 2023 11:36:02 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-d86dac81f8fso5882315276.1 for ; Mon, 06 Nov 2023 11:36:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1699299361; x=1699904161; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=Q8WchLjhxzR8hLwmAM2b8K3LlHrmaAbxbI1VulbCNAg=; b=ng3201E88f0muoa/itwNYD0l/qp/0PZVu723T1YVJth0r44XVnz6xyJNcpzjn2Ounw 7SlVT0xm2+5eh+2RXgTGEj/GhLGe5x45UPPm+JT1n6LmItMQUNHtfyq+9d3pltLZRjvF /uN/O3S7LMUfIJ3LMb/bpVcDVwLnXk6MoTLHWRoIgis56FXzjg3HLT0ph+nFM8zzP7ft SZYg+J6J/HODFwShn98jWrV3dsoRnH4FffgaLqveuC2qH1pRDRI49AWIQUbt/yviW12o mJaXJd5pofJC17aPaWKp17Y31c/IYysIYPF55aO3gu5hWCX8nXw/zcFhDjxDiabQHgjM Wd8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699299361; x=1699904161; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Q8WchLjhxzR8hLwmAM2b8K3LlHrmaAbxbI1VulbCNAg=; b=EQHWJdTEJ2+nJCra4aZMl4t4fP8RqJtKbzISaGO/Hmt0luGjuEXqcTRhh61gZmv5Yn ZLMhf8PPTAQA88TEVDljwKnx3nDRC8HFsoFlpkC+9TnsgywP2wppF6IlFUgagN1PKrKM DTUzHDw53J8ER+aXr7REj5h2/sHhqy2ykNu3OOvizuxgAfeQRMhMMP1Q4PdzEYqOuy5e oeJAshFdCtmNV+Yr2rQ3MaehGRy/hPWWTiHw5ypvHKq8LVynAt0ECBcnvjS3JVMq5iRW mzesQUdNyKNXTyWRNWZV80IFz8Z2ncVr+FzUtZrxBS+qIGDbKUhlSJJXNaR7he1yN5Gw jsmw== X-Gm-Message-State: AOJu0Ywsg6amn+jwMqk170gaCZkDZIBkC4fljXWY8zMP3+SrN4UqkXzj UrEtQkZHpAOikQ6FqLgCfgXRX2djXqACnUtq8HjPRnVL1USPpouJDx0ayHR+L+5TPqtz7iberji NTo6dYks3bXA0mFtlzFUaisrpSV+7ieh1wQFtYnxmYQ4M6u/dI6IYysuTZHPLWzoj6ScfwN8= X-Google-Smtp-Source: AGHT+IE2LagPZOXLu2Dujz9sY/wkyg+vRcsLP1Mi7QOxbjwhBOs+j0G+W0ghI1VCLGiX1IJn3Kb0v07PkhGQ X-Received: from jstultz-noogler2.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:600]) (user=jstultz job=sendgmr) by 2002:a25:aae7:0:b0:da0:5a30:6887 with SMTP id t94-20020a25aae7000000b00da05a306887mr537996ybi.4.1699299361477; Mon, 06 Nov 2023 11:36:01 -0800 (PST) Date: Mon, 6 Nov 2023 19:34:56 +0000 In-Reply-To: <20231106193524.866104-1-jstultz@google.com> Mime-Version: 1.0 References: <20231106193524.866104-1-jstultz@google.com> X-Mailer: git-send-email 
2.42.0.869.gea05f2083d-goog Message-ID: <20231106193524.866104-14-jstultz@google.com> Subject: [PATCH v6 13/20] sched: Split out __sched() deactivate task logic into a helper From: John Stultz To: LKML Cc: John Stultz , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Daniel Bristot de Oliveira , Will Deacon , Waiman Long , Boqun Feng , "Paul E . McKenney" , kernel-team@android.com Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" As we're going to re-use the deactivation logic, split it into a helper. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E . McKenney" Cc: kernel-team@android.com Signed-off-by: John Stultz --- v6: * Define function as static to avoid "no previous prototype" warnings as Reported-by: kernel test robot --- kernel/sched/core.c | 65 +++++++++++++++++++++++++-------------------- 1 file changed, 36 insertions(+), 29 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 9931940ba474..1b38b34d3f64 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6575,6 +6575,41 @@ pick_next_task(struct rq *rq, struct task_struct *pr= ev, struct rq_flags *rf) # define SM_MASK_PREEMPT SM_PREEMPT #endif =20 +static bool try_to_deactivate_task(struct rq *rq, struct task_struct *p, u= nsigned long state) +{ + if (signal_pending_state(state, p)) { + WRITE_ONCE(p->__state, TASK_RUNNING); + } else { + p->sched_contributes_to_load =3D + (state & TASK_UNINTERRUPTIBLE) && + !(state & TASK_NOLOAD) && + !(state & TASK_FROZEN); + + if (p->sched_contributes_to_load) + rq->nr_uninterruptible++; + + /* + * __schedule() ttwu() + * prev_state =3D prev->state; if (p->on_rq && ...) + * if (prev_state) goto out; + * p->on_rq =3D 0; smp_acquire__after_ctrl_dep(); + * p->state =3D TASK_WAKING + * + * Where __schedule() and ttwu() have matching control dependencies. + * + * After this, schedule() must not care about p->state any more. + */ + deactivate_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK); + + if (p->in_iowait) { + atomic_inc(&rq->nr_iowait); + delayacct_blkio_start(); + } + return true; + } + return false; +} + /* * __schedule() is the main scheduler function. * @@ -6665,35 +6700,7 @@ static void __sched notrace __schedule(unsigned int = sched_mode) */ prev_state =3D READ_ONCE(prev->__state); if (!(sched_mode & SM_MASK_PREEMPT) && prev_state) { - if (signal_pending_state(prev_state, prev)) { - WRITE_ONCE(prev->__state, TASK_RUNNING); - } else { - prev->sched_contributes_to_load =3D - (prev_state & TASK_UNINTERRUPTIBLE) && - !(prev_state & TASK_NOLOAD) && - !(prev_state & TASK_FROZEN); - - if (prev->sched_contributes_to_load) - rq->nr_uninterruptible++; - - /* - * __schedule() ttwu() - * prev_state =3D prev->state; if (p->on_rq && ...) - * if (prev_state) goto out; - * p->on_rq =3D 0; smp_acquire__after_ctrl_dep(); - * p->state =3D TASK_WAKING - * - * Where __schedule() and ttwu() have matching control dependencies. - * - * After this, schedule() must not care about p->state any more. 
- */ - deactivate_task(rq, prev, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK); - - if (prev->in_iowait) { - atomic_inc(&rq->nr_iowait); - delayacct_blkio_start(); - } - } + try_to_deactivate_task(rq, prev, prev_state); switch_count =3D &prev->nvcsw; } =20 --=20 2.42.0.869.gea05f2083d-goog From nobody Wed Dec 31 06:36:33 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D369DC4332F for ; Mon, 6 Nov 2023 19:37:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233146AbjKFThD (ORCPT ); Mon, 6 Nov 2023 14:37:03 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52444 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232927AbjKFTgj (ORCPT ); Mon, 6 Nov 2023 14:36:39 -0500 Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 547E1D42 for ; Mon, 6 Nov 2023 11:36:06 -0800 (PST) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-5b064442464so66639117b3.3 for ; Mon, 06 Nov 2023 11:36:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1699299365; x=1699904165; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=wSuDJnvRlC5g6NjMbMGU+ddx8uvBAfxXgo2joCjm/cQ=; b=UsPss4Bc0ny3I6D0n8iVXhIqmBoX3q/V35zjhKAUkxvwKy01LuU9Qs8dzzqIEPG+tf IOzyuQ9phsFG4Ib+lh3Obgmpz69Q7LUgO4TIKqfHHl76kl3Vd5Tm8aUNVYCmlvISBFg9 qSAVT0wd44yaV3IEYQuP1WD989pJldj+s1tGfveGflPhmRpQYHIWsV/iF6d0Pd6sWl1X Ja54hE8III8lB889Yftiz+PRMaurrPXLkn7qABfheeMknRZN69lSmh94RDgIUx49/s5C ewqla6GB/E+tMj88GLeZR9w0Y9YmyEJNCVoaQXPgjy2Sx5h2kWyMroz10IbxdKar71rw rIMA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699299365; x=1699904165; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=wSuDJnvRlC5g6NjMbMGU+ddx8uvBAfxXgo2joCjm/cQ=; b=p/mJIwu1g8YY324YFDEcQesE1qDdHwS6XmRbTffWKsrLwWw7pdQ9RgoUiPwRgSD+up vmrcmTepTTXIxuTjBeXjrUAvKF5HlYREsngwk46eTeS38jo/xDL1514BWQO0iS7AiQNp mHzSSRA1f15+Y2axGudXgpAN9axaTQP3Kgme/jObhX+oCmfRVo8vcJ/SiyyODYDn+Rg9 X4yvjGgasVld9p9aYdVDDDgTt0Troye7eP0dmJpuo8KX2Tl5bxg3pNMa5EMuujdC6J1S mmqgI5aYIRJShADeB72vpf2zoxGvyTPNwx5AOf2lPXZInzJtPdRDZzw0JZ53E8kmLxLc +q9g== X-Gm-Message-State: AOJu0Yw83f79s3dSQ3GhGjyW/TWBzda5iUyUU+Rv5TNfngIH7cdBBwVw Cq0bo7t7z+lJvEE7RocTuj4+pPw90RUjjljTCp7s7XlFmzOQfS1xheVxzRNIGhAaUnFw/IPkjRG ixghEvkoZRxfPQDvRh6M14pzUgf8ZtcIBqRWAOEITJ/mhGwK3FTuX4tpcUCvIWsKxwQ4esHM= X-Google-Smtp-Source: AGHT+IHjvdfR5DfWZjvi2OasLUyS3W5istGDMVTaJehIhopNTGxK9u5ALFk264tV7pBZJsTqLE1uT+QAbd9U X-Received: from jstultz-noogler2.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:600]) (user=jstultz job=sendgmr) by 2002:a81:4958:0:b0:59f:3cde:b33a with SMTP id w85-20020a814958000000b0059f3cdeb33amr239418ywa.6.1699299363609; Mon, 06 Nov 2023 11:36:03 -0800 (PST) Date: Mon, 6 Nov 2023 19:34:57 +0000 In-Reply-To: <20231106193524.866104-1-jstultz@google.com> Mime-Version: 1.0 References: <20231106193524.866104-1-jstultz@google.com> X-Mailer: git-send-email 2.42.0.869.gea05f2083d-goog Message-ID: <20231106193524.866104-15-jstultz@google.com> Subject: [PATCH v6 14/20] sched: Add a very simple proxy() function From: John 
Stultz To: LKML Cc: John Stultz , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Daniel Bristot de Oliveira , Will Deacon , Waiman Long , Boqun Feng , "Paul E . McKenney" , kernel-team@android.com Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This adds a very simple proxy() function so if we select a blocked task to run, we will deactivate it and pick again. The exception being if it has become unblocked after proxy() was called. Greatly simplified from patch by: Peter Zijlstra (Intel) Juri Lelli Valentin Schneider Connor O'Brien Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E . McKenney" Cc: kernel-team@android.com [jstultz: Split out from larger proxy patch and simplified for review and testing.] Signed-off-by: John Stultz --- v5: * Split out from larger proxy patch --- kernel/sched/core.c | 89 +++++++++++++++++++++++++++++++++++++++++++-- kernel/sched/rt.c | 19 +++++++++- 2 files changed, 102 insertions(+), 6 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 1b38b34d3f64..5770656b898d 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6575,11 +6575,12 @@ pick_next_task(struct rq *rq, struct task_struct *p= rev, struct rq_flags *rf) # define SM_MASK_PREEMPT SM_PREEMPT #endif =20 -static bool try_to_deactivate_task(struct rq *rq, struct task_struct *p, u= nsigned long state) +static bool try_to_deactivate_task(struct rq *rq, struct task_struct *p, + unsigned long state, bool deactivate_cond) { if (signal_pending_state(state, p)) { WRITE_ONCE(p->__state, TASK_RUNNING); - } else { + } else if (deactivate_cond) { p->sched_contributes_to_load =3D (state & TASK_UNINTERRUPTIBLE) && !(state & TASK_NOLOAD) && @@ -6610,6 +6611,74 @@ static bool try_to_deactivate_task(struct rq *rq, st= ruct task_struct *p, unsigne return false; } =20 +#ifdef CONFIG_PROXY_EXEC +/* + * Initial simple proxy that just returns the task if its waking + * or deactivates the blocked task so we can pick something that + * isn't blocked. + */ +static struct task_struct * +proxy(struct rq *rq, struct task_struct *next, struct rq_flags *rf) +{ + struct task_struct *p =3D next; + struct mutex *mutex; + unsigned long state; + + mutex =3D p->blocked_on; + /* Something changed in the chain, pick_again */ + if (!mutex) + return NULL; + /* + * By taking mutex->wait_lock we hold off concurrent mutex_unlock() + * and ensure @owner sticks around. + */ + raw_spin_lock(&mutex->wait_lock); + raw_spin_lock(&p->blocked_lock); + + /* Check again that p is blocked with blocked_lock held */ + if (!task_is_blocked(p) || mutex !=3D p->blocked_on) { + /* + * Something changed in the blocked_on chain and + * we don't know if only at this level. So, let's + * just bail out completely and let __schedule + * figure things out (pick_again loop). 
+ */ + raw_spin_unlock(&p->blocked_lock); + raw_spin_unlock(&mutex->wait_lock); + return NULL; + } + + state =3D READ_ONCE(p->__state); + /* Don't deactivate if the state has been changed to TASK_RUNNING */ + if (!state) { + raw_spin_unlock(&p->blocked_lock); + raw_spin_unlock(&mutex->wait_lock); + return p; + } + + try_to_deactivate_task(rq, next, state, true); + + /* + * If next is the selected task, then remove lingering + * references to it from rq and sched_class structs after + * dequeueing. + */ + put_prev_task(rq, next); + rq_set_selected(rq, rq->idle); + resched_curr(rq); + raw_spin_unlock(&p->blocked_lock); + raw_spin_unlock(&mutex->wait_lock); + return NULL; +} +#else /* PROXY_EXEC */ +static struct task_struct * +proxy(struct rq *rq, struct task_struct *next, struct rq_flags *rf) +{ + BUG(); // This should never be called in the !PROXY case + return next; +} +#endif /* PROXY_EXEC */ + /* * __schedule() is the main scheduler function. * @@ -6700,12 +6769,24 @@ static void __sched notrace __schedule(unsigned int= sched_mode) */ prev_state =3D READ_ONCE(prev->__state); if (!(sched_mode & SM_MASK_PREEMPT) && prev_state) { - try_to_deactivate_task(rq, prev, prev_state); + try_to_deactivate_task(rq, prev, prev_state, + !task_is_blocked(prev)); switch_count =3D &prev->nvcsw; } =20 - next =3D pick_next_task(rq, prev, &rf); +pick_again: + next =3D pick_next_task(rq, rq_selected(rq), &rf); rq_set_selected(rq, next); + if (unlikely(task_is_blocked(next))) { + next =3D proxy(rq, next, &rf); + if (!next) { + rq_unpin_lock(rq, &rf); + __balance_callbacks(rq); + rq_repin_lock(rq, &rf); + goto pick_again; + } + } + clear_tsk_need_resched(prev); clear_preempt_need_resched(); #ifdef CONFIG_SCHED_DEBUG diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index bc243e70bc0e..0125a3ae5a7a 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -1537,8 +1537,19 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p= , int flags) =20 enqueue_rt_entity(rt_se, flags); =20 - if (!task_current(rq, p) && p->nr_cpus_allowed > 1) - enqueue_pushable_task(rq, p); + /* + * Current can't be pushed away. Selected is tied to current, + * so don't push it either. + */ + if (task_current(rq, p) || task_current_selected(rq, p)) + return; + /* + * Pinned tasks can't be pushed. 
+ */ + if (p->nr_cpus_allowed =3D=3D 1) + return; + + enqueue_pushable_task(rq, p); } =20 static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int flag= s) @@ -1825,6 +1836,10 @@ static void put_prev_task_rt(struct rq *rq, struct t= ask_struct *p) =20 update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 1); =20 + /* Avoid marking selected as pushable */ + if (task_current_selected(rq, p)) + return; + /* * The previous task needs to be made eligible for pushing * if it is still active --=20 2.42.0.869.gea05f2083d-goog From nobody Wed Dec 31 06:36:33 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6E8F8C4332F for ; Mon, 6 Nov 2023 19:37:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232932AbjKFThI (ORCPT ); Mon, 6 Nov 2023 14:37:08 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38844 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233005AbjKFTgk (ORCPT ); Mon, 6 Nov 2023 14:36:40 -0500 Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7AC371707 for ; Mon, 6 Nov 2023 11:36:07 -0800 (PST) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-5ae5b12227fso67172917b3.0 for ; Mon, 06 Nov 2023 11:36:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1699299366; x=1699904166; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=0t48YFx0UVkNi0jaFWpek94vC78jiuGmEf4m+wee/fQ=; b=C0tmMvW6EDloKdIqfq6GmYrjwV31rwd00BAc5YzsVqYfSReZHw0VOD2LNj6hDMaNR+ mcxx9ogkukKMM4MfNVh+FX2DQYlItetz2AzR0IsxlcfgsJEVHa5DQCe6D3wNiSWEcLHM 8JIU/jSTh4C3NHmwZETY97Y8UIQjhNAuTNbW9XppccP98o3j7sh5LBVqoD8Tyf/525d+ 6TcyjfG/8NYOfK4PYHDcMuPKkBDezK90h42vNgLEUBbJUS3ps+PiburhiPKf3xYFdY6X sFbzm4OtE2nImHOlnYHfeIA6C7fmOvwUT/5fAIiZ2jaHDnrHhRasrDBROmh8WDPI/jIo GP+g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699299366; x=1699904166; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=0t48YFx0UVkNi0jaFWpek94vC78jiuGmEf4m+wee/fQ=; b=eH0eOz2NkAVFbdemG5VLReZjubpgEllj835O4S+aaVIKUJNZUAqB+V69k91MGAFk4A ZIhuRUcHdq5tdweKVUYwzTFxQxIm5zA8duI27GRQfhHt5Brcdn3Wvd4Puz0V4P8Hk9TH zHJ/RY67IZikUCbkJTiv/bRSMwgLSLavbQ9CiPnppgXUdux3gmzj/rjaYShhv8AjZAnw kqjOfeoFcMQZycUzQzn+GzXxVelv0O+HkrY1tTCWhr2TyfCdExs+AtLR9vVCD+7liLRy rp/R7nIftTTwWww72FQ3Nn1e53fseNP4ljXA8LFsBfFhtFNtrArFOwQmSHtyG22siJ+W dkVw== X-Gm-Message-State: AOJu0YxplOcN1udXeds8mxl202HfvMh5MdSj2XzgUmHiQbEj2x5zPyZh 4k8qgSnEJnBNkDiXcj4wxHHvY2cMtZKNG+d+7Gz3uetMUeO+MPrbcvzJymqCY0jSLWNmRgkS8PH ItmmHFc4Pm2cwzXnFX2YA4Jgd58bzDNz08qMWl7rRVbxckSJbXDIkJ/RpC29LFlWlQgYNOlU= X-Google-Smtp-Source: AGHT+IGNLfe0+euHxIci+tIjggk76N+xFg2VyEIJgcQl313pCoDUTxW/eJMMRAm747JUNmEl2yiynWjX68CD X-Received: from jstultz-noogler2.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:600]) (user=jstultz job=sendgmr) by 2002:a81:4855:0:b0:592:83d2:1f86 with SMTP id v82-20020a814855000000b0059283d21f86mr221121ywa.4.1699299365819; Mon, 06 Nov 2023 11:36:05 -0800 (PST) Date: Mon, 6 Nov 2023 19:34:58 +0000 In-Reply-To: <20231106193524.866104-1-jstultz@google.com> Mime-Version: 1.0 
References: <20231106193524.866104-1-jstultz@google.com> X-Mailer: git-send-email 2.42.0.869.gea05f2083d-goog Message-ID: <20231106193524.866104-16-jstultz@google.com> Subject: [PATCH v6 15/20] sched: Add proxy deactivate helper From: John Stultz To: LKML Cc: John Stultz , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Daniel Bristot de Oliveira , Will Deacon , Waiman Long , Boqun Feng , "Paul E . McKenney" , kernel-team@android.com Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add small helper for deactivating the selected task Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E . McKenney" Cc: kernel-team@android.com Signed-off-by: John Stultz --- kernel/sched/core.c | 43 +++++++++++++++++++++---------------------- 1 file changed, 21 insertions(+), 22 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 5770656b898d..1b84d612332e 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6612,6 +6612,22 @@ static bool try_to_deactivate_task(struct rq *rq, st= ruct task_struct *p, } =20 #ifdef CONFIG_PROXY_EXEC + +bool proxy_deactivate(struct rq *rq, struct task_struct *next) +{ + unsigned long state =3D READ_ONCE(next->__state); + + /* Don't deactivate if the state has been changed to TASK_RUNNING */ + if (!state) + return false; + if (!try_to_deactivate_task(rq, next, state, true)) + return false; + put_prev_task(rq, next); + rq_set_selected(rq, rq->idle); + resched_curr(rq); + return true; +} + /* * Initial simple proxy that just returns the task if its waking * or deactivates the blocked task so we can pick something that @@ -6620,10 +6636,9 @@ static bool try_to_deactivate_task(struct rq *rq, st= ruct task_struct *p, static struct task_struct * proxy(struct rq *rq, struct task_struct *next, struct rq_flags *rf) { + struct task_struct *ret =3D NULL; struct task_struct *p =3D next; struct mutex *mutex; - unsigned long state; - mutex =3D p->blocked_on; /* Something changed in the chain, pick_again */ if (!mutex) @@ -6645,30 +6660,14 @@ proxy(struct rq *rq, struct task_struct *next, stru= ct rq_flags *rf) */ raw_spin_unlock(&p->blocked_lock); raw_spin_unlock(&mutex->wait_lock); - return NULL; - } - - state =3D READ_ONCE(p->__state); - /* Don't deactivate if the state has been changed to TASK_RUNNING */ - if (!state) { - raw_spin_unlock(&p->blocked_lock); - raw_spin_unlock(&mutex->wait_lock); - return p; + return ret; } =20 - try_to_deactivate_task(rq, next, state, true); - - /* - * If next is the selected task, then remove lingering - * references to it from rq and sched_class structs after - * dequeueing. 
- */ - put_prev_task(rq, next); - rq_set_selected(rq, rq->idle); - resched_curr(rq); + if (!proxy_deactivate(rq, next)) + ret =3D p; raw_spin_unlock(&p->blocked_lock); raw_spin_unlock(&mutex->wait_lock); - return NULL; + return ret; } #else /* PROXY_EXEC */ static struct task_struct * --=20 2.42.0.869.gea05f2083d-goog From nobody Wed Dec 31 06:36:33 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 100CDC4332F for ; Mon, 6 Nov 2023 19:37:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233117AbjKFThN (ORCPT ); Mon, 6 Nov 2023 14:37:13 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54808 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232983AbjKFTgm (ORCPT ); Mon, 6 Nov 2023 14:36:42 -0500 Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D1EAB198D for ; Mon, 6 Nov 2023 11:36:09 -0800 (PST) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-5b31e000e97so66741107b3.1 for ; Mon, 06 Nov 2023 11:36:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1699299368; x=1699904168; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=EvEjJWcD2rwHfLNLtqnw1VcwI8/iMYn+p32Qq0ErP5M=; b=hBNwZUJgExXdTjKGaNRrvKwIHWOo60MChEK7WMdG/2r+L4o4t0Bg1Njw9JfBLwvpmH QIWsbO0c+G4aB9WoDuDIUodTfYY0qXYA8iy4nhXBfA/26qq6ep6r4/gJgUMzOwVordxk cxzBvFoq2Iz1tIarlUwGKg5kkSsXBxiQ4zSoEFjMGm3fnmpLIAFVIYP1IrGyPeDFs0fb VS9Jhw3IdhTtZQV//aKiVdjUrdRODufnWradbJ+sn+Hp6OxZThesKguPSQ8qbLsipfMl vsOuCn25bUaI5vwmt0EBu5dX7sRZCBg6GZYBIsU14pMh3s8W6OXvswbYAXo3CgFfbNqL Fpow== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699299368; x=1699904168; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=EvEjJWcD2rwHfLNLtqnw1VcwI8/iMYn+p32Qq0ErP5M=; b=Oa4uJvzo+y5dege3s0QFRJCDL6Dm5Dw8CywamUGcFrZ5MCAQR+koOC4uiRIbLeiuwk eR4+B41QsdukKrte502bq3jD55i3nl3V0JwwixLFYPvURXJNMpSmv9n2KYCdGo2KbBTe FQCMv51AEt+2bYPn/vpPya02YbFqgNK5U+CeIvg5PL96RrcXD4pL+uMmpcP8rg9W4g0Y CFd+mTm30ObERGVtIIq1tJnLJPrDUEBL4jeYGESVFueTPZQDwn7DUXJ76jPPkNmZSnvV tRE8WWSyYXJg+J9I8SuTlfP3SNSMUZY3zVxtS4XOT89CvfeiT9OYR8++rQLnjLhT5FXr ZpEA== X-Gm-Message-State: AOJu0YysDUBXrp747K9vlsOkS4Jx2UwEjxv1YghZPKl/jdSnA3WYXBth xZZ6q87Xn7zLNZrfbd/wbJWRU4Ndz4epAo2gY4brGpQ8WnZuQWQdXfcyWU5ow7AAFgHmXKiAhfd 4F4sJTgYAWBlZxBb8GA/BgSCDVsQ71iUHPz2z+LGO93ACr1ayTIUnc8OikNb7nEUzZkYoSMo= X-Google-Smtp-Source: AGHT+IFWpn9/uxIES+a+vBpl6VRwj4UwUuGBp85gPDVfQBsopGfvAbPF3nY8SktB3xMf8rS6Kctbh3hDkuJ4 X-Received: from jstultz-noogler2.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:600]) (user=jstultz job=sendgmr) by 2002:a25:9385:0:b0:da0:59f7:3c97 with SMTP id a5-20020a259385000000b00da059f73c97mr546655ybm.12.1699299368307; Mon, 06 Nov 2023 11:36:08 -0800 (PST) Date: Mon, 6 Nov 2023 19:34:59 +0000 In-Reply-To: <20231106193524.866104-1-jstultz@google.com> Mime-Version: 1.0 References: <20231106193524.866104-1-jstultz@google.com> X-Mailer: git-send-email 2.42.0.869.gea05f2083d-goog Message-ID: <20231106193524.866104-17-jstultz@google.com> Subject: [PATCH v6 16/20] sched: Fix 
proxy/current (push,pull)ability From: John Stultz To: LKML Cc: Valentin Schneider , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Daniel Bristot de Oliveira , Will Deacon , Waiman Long , Boqun Feng , "Paul E . McKenney" , kernel-team@android.com, "Connor O'Brien" , John Stultz Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Valentin Schneider Proxy execution forms atomic pairs of tasks: The selected task (scheduling context) and an proxy (execution context). The selected task, along with the rest of the blocked chain, follows the proxy wrt CPU placement. They can be the same task, in which case push/pull doesn't need any modification. When they are different, however, FIFO1 & FIFO42: ,-> RT42 | | blocked-on | v blocked_donor | mutex | | owner | v `-- RT1 RT1 RT42 CPU0 CPU1 ^ ^ | | overloaded !overloaded rq prio =3D 42 rq prio =3D 0 RT1 is eligible to be pushed to CPU1, but should that happen it will "carry" RT42 along. Clearly here neither RT1 nor RT42 must be seen as push/pullable. Unfortunately, only the selected task is usually dequeued from the rq, and the proxy'ed execution context (rq->curr) remains on the rq. This can cause RT1 to be selected for migration from logic like the rt pushable_list. This patch adds a dequeue/enqueue cycle on the proxy task before __schedule returns, which allows the sched class logic to avoid adding the now current task to the pushable_list. Furthermore, tasks becoming blocked on a mutex don't need an explicit dequeue/enqueue cycle to be made (push/pull)able: they have to be running to block on a mutex, thus they will eventually hit put_prev_task(). XXX: pinned tasks becoming unblocked should be removed from the push/pull lists, but those don't get to see __schedule() straight away. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E . McKenney" Cc: kernel-team@android.com Signed-off-by: Valentin Schneider Signed-off-by: Connor O'Brien Signed-off-by: John Stultz --- v3: * Tweaked comments & commit message v5: * Minor simplifications to utilize the fix earlier in the patch series. * Rework the wording of the commit message to match selected/ proxy terminology and expand a bit to make it more clear how it works. v6: * Droped now-unused proxied value, to be re-added later in the series when it is used, as caught by Dietmar --- kernel/sched/core.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 1b84d612332e..c148ee5dcf7e 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6669,6 +6669,21 @@ proxy(struct rq *rq, struct task_struct *next, struc= t rq_flags *rf) raw_spin_unlock(&mutex->wait_lock); return ret; } + +static inline void proxy_tag_curr(struct rq *rq, struct task_struct *next) +{ + /* + * pick_next_task() calls set_next_task() on the selected task + * at some point, which ensures it is not push/pullable. + * However, the selected task *and* the ,mutex owner form an + * atomic pair wrt push/pull. 
+ * + * Make sure owner is not pushable. Unfortunately we can only + * deal with that by means of a dequeue/enqueue cycle. :-/ + */ + dequeue_task(rq, next, DEQUEUE_NOCLOCK | DEQUEUE_SAVE); + enqueue_task(rq, next, ENQUEUE_NOCLOCK | ENQUEUE_RESTORE); +} #else /* PROXY_EXEC */ static struct task_struct * proxy(struct rq *rq, struct task_struct *next, struct rq_flags *rf) @@ -6676,6 +6691,8 @@ proxy(struct rq *rq, struct task_struct *next, struct= rq_flags *rf) BUG(); // This should never be called in the !PROXY case return next; } + +static inline void proxy_tag_curr(struct rq *rq, struct task_struct *next)= { } #endif /* PROXY_EXEC */ =20 /* @@ -6799,6 +6816,10 @@ static void __sched notrace __schedule(unsigned int = sched_mode) * changes to task_struct made by pick_next_task(). */ RCU_INIT_POINTER(rq->curr, next); + + if (unlikely(!task_current_selected(rq, next))) + proxy_tag_curr(rq, next); + /* * The membarrier system call requires each architecture * to have a full memory barrier after updating @@ -6823,6 +6844,10 @@ static void __sched notrace __schedule(unsigned int = sched_mode) /* Also unlocks the rq: */ rq =3D context_switch(rq, prev, next, &rf); } else { + /* In case next was already curr but just got blocked_donor*/ + if (unlikely(!task_current_selected(rq, next))) + proxy_tag_curr(rq, next); + rq->clock_update_flags &=3D ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP); =20 rq_unpin_lock(rq, &rf); --=20 2.42.0.869.gea05f2083d-goog From nobody Wed Dec 31 06:36:33 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 96D9CC4332F for ; Mon, 6 Nov 2023 19:37:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233057AbjKFTh1 (ORCPT ); Mon, 6 Nov 2023 14:37:27 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52356 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233139AbjKFTgo (ORCPT ); Mon, 6 Nov 2023 14:36:44 -0500 Received: from mail-pj1-x104a.google.com (mail-pj1-x104a.google.com [IPv6:2607:f8b0:4864:20::104a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C4F4019A0 for ; Mon, 6 Nov 2023 11:36:11 -0800 (PST) Received: by mail-pj1-x104a.google.com with SMTP id 98e67ed59e1d1-28016806be2so4049409a91.1 for ; Mon, 06 Nov 2023 11:36:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1699299371; x=1699904171; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=l/34a7T65q/i93pZLMFxfmu9n9gfJaq30v2OjoL4QEA=; b=z02nA1bb4+zRrhBFV3BPFW1Wd7iMvx0jSVo1tEU1fBo5Am00eOf5ks2/6rF729YcNv oekUN2MFs9GPqZAtBJDplxzMxG60MsMjIhi14NVP8W8FpQaxMe84Vsy3oI3/Mx/64EUp 4fvkTc9j+7+/JvyVwOVMLkQGUWzQRBhOLMZkAVIAzOb+tEi29y6YOCQH2MzBxkHqZptJ hW/eSVXddjo950iL7QkkV8bSeqzhBkspZ6Iqemdqpb4OJY3H33elvMt1ZhjX3rw3lNJZ o8vhxJk3t1IZIB+wUSsQUpGXjBJHsxuNk2G9U0zq0yvvS8nKPeJcivjGtEDNUIsrZdC2 1ahQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699299371; x=1699904171; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=l/34a7T65q/i93pZLMFxfmu9n9gfJaq30v2OjoL4QEA=; b=qjMpw8rIcmCYMufWCSTQkxYZ3cUwsPkICyI7QXxc2od+5OPolOZ4uL/JeTJy9jpubA aY07YPyVqepM2LstdDzaI2wT5t43pMih/OUbhdQHULIYRQItnE7fpyd779DSZ9/EQV3Q 
jJVeoGnNu2VeoQbODDXm6DrhmGRGi/mks/ZlLSJKtxTjkhupyFVkqDit4jnRJTXDIduE oA4YdW6h2rBkryaVMl4CjgM+xiDi3yWfTFfU9eJv0lgIKISrY7U9DE9VyQj/NFshCyR+ /+6VTQWAuANMb7h8OVCj6n4l2sS7ld9r8LctS3WWwTcCaS3BJVs9qoLj3vtMY4rEabgL JIgw== X-Gm-Message-State: AOJu0YxjPFaHeKlS10QiBgGlA3HusNomDp2uWBFNtNC41BWLa4584SqL /SYCCodFuErjKZVvV62p/wy0K6NGSbLDaa5N0RjbdSb/nIwUynzVX9MK55bKteI0NPVFAfqiPzP 9LBzliUJ8rjACv7r2MIYugoLLfZ9ZfVF59VtZA00rx6u6c7t+yA9ZP56nTq/tlwezYI6QloQ= X-Google-Smtp-Source: AGHT+IGgzYmPIY2EvqG2W1YPDa6SUZJKDYVdWSS02IGuXugANtR8YZqg7XLXRSEtZRsXRNYcXmojkSHvd8MX X-Received: from jstultz-noogler2.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:600]) (user=jstultz job=sendgmr) by 2002:a17:90a:c095:b0:280:54e3:cc69 with SMTP id o21-20020a17090ac09500b0028054e3cc69mr12004pjs.3.1699299370218; Mon, 06 Nov 2023 11:36:10 -0800 (PST) Date: Mon, 6 Nov 2023 19:35:00 +0000 In-Reply-To: <20231106193524.866104-1-jstultz@google.com> Mime-Version: 1.0 References: <20231106193524.866104-1-jstultz@google.com> X-Mailer: git-send-email 2.42.0.869.gea05f2083d-goog Message-ID: <20231106193524.866104-18-jstultz@google.com> Subject: [PATCH v6 17/20] sched: Start blocked_on chain processing in proxy() From: John Stultz To: LKML Cc: Peter Zijlstra , Joel Fernandes , Qais Yousef , Ingo Molnar , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Daniel Bristot de Oliveira , Will Deacon , Waiman Long , Boqun Feng , "Paul E . McKenney" , kernel-team@android.com, Valentin Schneider , "Connor O'Brien" , John Stultz Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Peter Zijlstra Start to flesh out the real proxy() implementation, but avoid the migration cases for now, in those cases just deactivate the selected task and pick again. To ensure the selected task or other blocked tasks in the chain aren't migrated away while we're running the proxy, this patch also tweaks CFS logic to avoid migrating selected or mutex blocked tasks. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E . 
McKenney" Cc: kernel-team@android.com Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Juri Lelli Signed-off-by: Valentin Schneider Signed-off-by: Connor O'Brien [jstultz: This change was split out from the larger proxy patch] Signed-off-by: John Stultz --- v5: * Split this out from larger proxy patch --- kernel/sched/core.c | 162 ++++++++++++++++++++++++++++++++++++-------- kernel/sched/fair.c | 10 ++- 2 files changed, 143 insertions(+), 29 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index c148ee5dcf7e..c7b5cb5d8dc3 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -95,6 +95,7 @@ #include "../workqueue_internal.h" #include "../../io_uring/io-wq.h" #include "../smpboot.h" +#include "../locking/mutex.h" =20 EXPORT_TRACEPOINT_SYMBOL_GPL(ipi_send_cpu); EXPORT_TRACEPOINT_SYMBOL_GPL(ipi_send_cpumask); @@ -6613,6 +6614,15 @@ static bool try_to_deactivate_task(struct rq *rq, st= ruct task_struct *p, =20 #ifdef CONFIG_PROXY_EXEC =20 +static inline struct task_struct * +proxy_resched_idle(struct rq *rq, struct task_struct *next) +{ + put_prev_task(rq, next); + rq_set_selected(rq, rq->idle); + set_tsk_need_resched(rq->idle); + return rq->idle; +} + bool proxy_deactivate(struct rq *rq, struct task_struct *next) { unsigned long state =3D READ_ONCE(next->__state); @@ -6622,52 +6632,146 @@ bool proxy_deactivate(struct rq *rq, struct task_s= truct *next) return false; if (!try_to_deactivate_task(rq, next, state, true)) return false; - put_prev_task(rq, next); - rq_set_selected(rq, rq->idle); - resched_curr(rq); + proxy_resched_idle(rq, next); return true; } =20 /* - * Initial simple proxy that just returns the task if its waking - * or deactivates the blocked task so we can pick something that - * isn't blocked. + * Find who @next (currently blocked on a mutex) can proxy for. + * + * Follow the blocked-on relation: + * task->blocked_on -> mutex->owner -> task... + * + * Lock order: + * + * p->pi_lock + * rq->lock + * mutex->wait_lock + * p->blocked_lock + * + * Returns the task that is going to be used as execution context (the one + * that is actually going to be put to run on cpu_of(rq)). */ static struct task_struct * proxy(struct rq *rq, struct task_struct *next, struct rq_flags *rf) { struct task_struct *ret =3D NULL; struct task_struct *p =3D next; + struct task_struct *owner =3D NULL; + int this_cpu; struct mutex *mutex; - mutex =3D p->blocked_on; - /* Something changed in the chain, pick_again */ - if (!mutex) - return NULL; + + this_cpu =3D cpu_of(rq); + /* - * By taking mutex->wait_lock we hold off concurrent mutex_unlock() - * and ensure @owner sticks around. + * Follow blocked_on chain. + * + * TODO: deadlock detection */ - raw_spin_lock(&mutex->wait_lock); - raw_spin_lock(&p->blocked_lock); + for (p =3D next; task_is_blocked(p); p =3D owner) { + mutex =3D p->blocked_on; + /* Something changed in the chain, pick_again */ + if (!mutex) + return NULL; =20 - /* Check again that p is blocked with blocked_lock held */ - if (!task_is_blocked(p) || mutex !=3D p->blocked_on) { /* - * Something changed in the blocked_on chain and - * we don't know if only at this level. So, let's - * just bail out completely and let __schedule - * figure things out (pick_again loop). + * By taking mutex->wait_lock we hold off concurrent mutex_unlock() + * and ensure @owner sticks around. 
+ */ + raw_spin_lock(&mutex->wait_lock); + raw_spin_lock(&p->blocked_lock); + + /* Check again that p is blocked with blocked_lock held */ + if (mutex !=3D p->blocked_on) { + /* + * Something changed in the blocked_on chain and + * we don't know if only at this level. So, let's + * just bail out completely and let __schedule + * figure things out (pick_again loop). + */ + raw_spin_unlock(&p->blocked_lock); + raw_spin_unlock(&mutex->wait_lock); + return NULL; + } + + owner =3D __mutex_owner(mutex); + if (!owner) { + raw_spin_unlock(&p->blocked_lock); + raw_spin_unlock(&mutex->wait_lock); + return p; + } + + if (task_cpu(owner) !=3D this_cpu) { + /* XXX Don't handle migrations yet */ + if (!proxy_deactivate(rq, next)) + ret =3D next; + raw_spin_unlock(&p->blocked_lock); + raw_spin_unlock(&mutex->wait_lock); + return ret; + } + + if (task_on_rq_migrating(owner)) { + /* + * One of the chain of mutex owners is currently migrating to this + * CPU, but has not yet been enqueued because we are holding the + * rq lock. As a simple solution, just schedule rq->idle to give + * the migration a chance to complete. Much like the migrate_task + * case we should end up back in proxy(), this time hopefully with + * all relevant tasks already enqueued. + */ + raw_spin_unlock(&p->blocked_lock); + raw_spin_unlock(&mutex->wait_lock); + return proxy_resched_idle(rq, next); + } + + if (!owner->on_rq) { + /* XXX Don't handle blocked owners yet */ + if (!proxy_deactivate(rq, next)) + ret =3D next; + raw_spin_unlock(&p->blocked_lock); + raw_spin_unlock(&mutex->wait_lock); + return ret; + } + + if (owner =3D=3D p) { + /* + * Its possible we interleave with mutex_unlock like: + * + * lock(&rq->lock); + * proxy() + * mutex_unlock() + * lock(&wait_lock); + * next(owner) =3D current->blocked_donor; + * unlock(&wait_lock); + * + * wake_up_q(); + * ... + * ttwu_runnable() + * __task_rq_lock() + * lock(&wait_lock); + * owner =3D=3D p + * + * Which leaves us to finish the ttwu_runnable() and make it go. + * + * So schedule rq->idle so that ttwu_runnable can get the rq lock + * and mark owner as running. + */ + raw_spin_unlock(&p->blocked_lock); + raw_spin_unlock(&mutex->wait_lock); + return proxy_resched_idle(rq, next); + } + + /* + * OK, now we're absolutely sure @owner is not blocked _and_ + * on this rq, therefore holding @rq->lock is sufficient to + * guarantee its existence, as per ttwu_remote(). 
*/ raw_spin_unlock(&p->blocked_lock); raw_spin_unlock(&mutex->wait_lock); - return ret; } =20 - if (!proxy_deactivate(rq, next)) - ret =3D p; - raw_spin_unlock(&p->blocked_lock); - raw_spin_unlock(&mutex->wait_lock); - return ret; + WARN_ON_ONCE(owner && !owner->on_rq); + return owner; } =20 static inline void proxy_tag_curr(struct rq *rq, struct task_struct *next) @@ -6742,6 +6846,7 @@ static void __sched notrace __schedule(unsigned int s= ched_mode) struct rq_flags rf; struct rq *rq; int cpu; + bool preserve_need_resched =3D false; =20 cpu =3D smp_processor_id(); rq =3D cpu_rq(cpu); @@ -6801,9 +6906,12 @@ static void __sched notrace __schedule(unsigned int = sched_mode) rq_repin_lock(rq, &rf); goto pick_again; } + if (next =3D=3D rq->idle && prev =3D=3D rq->idle) + preserve_need_resched =3D true; } =20 - clear_tsk_need_resched(prev); + if (!preserve_need_resched) + clear_tsk_need_resched(prev); clear_preempt_need_resched(); #ifdef CONFIG_SCHED_DEBUG rq->last_seen_need_resched_ns =3D 0; diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 1aca675985b2..f334b129b269 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8752,7 +8752,8 @@ int can_migrate_task(struct task_struct *p, struct lb= _env *env) /* Disregard pcpu kthreads; they are where they need to be. */ if (kthread_is_per_cpu(p)) return 0; - + if (task_is_blocked(p)) + return 0; if (!cpumask_test_cpu(env->dst_cpu, p->cpus_ptr)) { int cpu; =20 @@ -8789,7 +8790,8 @@ int can_migrate_task(struct task_struct *p, struct lb= _env *env) /* Record that we found at least one task that could run on dst_cpu */ env->flags &=3D ~LBF_ALL_PINNED; =20 - if (task_on_cpu(env->src_rq, p)) { + if (task_on_cpu(env->src_rq, p) || + task_current_selected(env->src_rq, p)) { schedstat_inc(p->stats.nr_failed_migrations_running); return 0; } @@ -8828,6 +8830,10 @@ static void detach_task(struct task_struct *p, struc= t lb_env *env) { lockdep_assert_rq_held(env->src_rq); =20 + BUG_ON(task_is_blocked(p)); + BUG_ON(task_current(env->src_rq, p)); + BUG_ON(task_current_selected(env->src_rq, p)); + deactivate_task(env->src_rq, p, DEQUEUE_NOCLOCK); set_task_cpu(p, env->dst_cpu); } --=20 2.42.0.869.gea05f2083d-goog From nobody Wed Dec 31 06:36:33 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 41302C4167B for ; Mon, 6 Nov 2023 19:37:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232967AbjKFThb (ORCPT ); Mon, 6 Nov 2023 14:37:31 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57416 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232865AbjKFTgp (ORCPT ); Mon, 6 Nov 2023 14:36:45 -0500 Received: from mail-pg1-x54a.google.com (mail-pg1-x54a.google.com [IPv6:2607:f8b0:4864:20::54a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 83E261FCF for ; Mon, 6 Nov 2023 11:36:13 -0800 (PST) Received: by mail-pg1-x54a.google.com with SMTP id 41be03b00d2f7-5b806e55dd2so3985459a12.3 for ; Mon, 06 Nov 2023 11:36:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1699299372; x=1699904172; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=+DYYMf6B/BOOOjO8wB1W8MvY0mQJzjw6qeErHpWe9lQ=; 
b=0AewwAz3/yOAYQxEC99x6ZKQnvgKn3O7JiBVIm+Sg1ha4mF8dRDdRn+WnMDT+xqpPA vd7F8v0p+KqDLlWHkewD4BVX1ncsdz+EtWohDFP9f/6dJLJPBuE0SOrTvb2w2isbpaVO RjlQ26uF9nFc3QcvXwHd/msqMeekXU+HEP8gBz70yBYnMlp+/ZJqh08Rx0Jfy6FnMUxt grfUBbj97pseMB8UYuPKwzSwe4PKtNuc5JFnVBtVT77YUbCaxKjNfc7oHSEcLkpZ1Zeg yjrr4NZoT2yWvQNZWJL3m304PU/aRv3bXhZQ3bfonXtAU5uUibsGMYXAO03QlkiZSx6B CZ0A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699299372; x=1699904172; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=+DYYMf6B/BOOOjO8wB1W8MvY0mQJzjw6qeErHpWe9lQ=; b=JjU/EmLG6nY5W6Yawqm0WDLD/MVhxycehArMkazuG7r1byDNBQ7rh49YFyRU7Y133w TStQ/5v0alFXdtMOjgieNp5hOgXV0aLxRARl1IPdPF74tqM2/FMxuRnP29EB/ftSQhKe dHd2WyASLRciRljnq7AydGHna+E3ZU4c8zPjC4606KYdsSoJJiS4RBCHKyav5PYLwsgx H1ak+AwgBGAMiOPWVGYGlgODIZfz6O7CECncBsPIo97NzmjuFzD8OjIW4AX9o3aT02mX RAgbjrFYpycu8hF45nkgMoqXztmVNj5PP7IyDwkosJLygi7fiikRls44I8Zj5MOHNpd2 dxEA== X-Gm-Message-State: AOJu0YwRqCrfc2Ci+yOoyNoQJ1nFoZI/lcOBtvdIIgAgn7V091PK0KL8 YYFVpHWgPnvaIdjz5PR5zlB8swZcw+UD1n62xVCir4pOSTKfeajVp2vvixV04GjwfGZw3EhZfFb K4FvdkPgTsgCdur08gOki1eei1s0jTo8kb1ycm3eyv07oimeQU6DcovJSMnuxY5Y2VEoLYE8= X-Google-Smtp-Source: AGHT+IGdjfZwcTvITXfl+E+YXXt+S9v5v7Cw0MbAfVZk/9aHyf9v1d14/97VhvSIBSvP8zTMhHtBrxInVVG6 X-Received: from jstultz-noogler2.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:600]) (user=jstultz job=sendgmr) by 2002:a63:7985:0:b0:5b3:da50:ac28 with SMTP id u127-20020a637985000000b005b3da50ac28mr567674pgc.5.1699299372106; Mon, 06 Nov 2023 11:36:12 -0800 (PST) Date: Mon, 6 Nov 2023 19:35:01 +0000 In-Reply-To: <20231106193524.866104-1-jstultz@google.com> Mime-Version: 1.0 References: <20231106193524.866104-1-jstultz@google.com> X-Mailer: git-send-email 2.42.0.869.gea05f2083d-goog Message-ID: <20231106193524.866104-19-jstultz@google.com> Subject: [PATCH v6 18/20] sched: Handle blocked-waiter migration (and return migration) From: John Stultz To: LKML Cc: John Stultz , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Daniel Bristot de Oliveira , Will Deacon , Waiman Long , Boqun Feng , "Paul E . McKenney" , kernel-team@android.com Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add logic to handle migrating a blocked waiter to a remote cpu where the lock owner is runnable. Additionally, as the blocked task may not be able to run on the remote cpu, add logic to handle return migration once the waiting task is given the mutex. Because tasks may get migrated to where they cannot run, this patch also modifies the scheduling classes to avoid sched class migrations on mutex blocked tasks, leaving proxy() to do the migrations and return migrations. This was split out from the larger proxy patch, and significantly reworked to avoid changes to the try_to_wakeup() call path. Credits for the original patch go to: Peter Zijlstra (Intel) Juri Lelli Valentin Schneider Connor O'Brien NOTE: The return migration is further complicated in that we need to take the pi_lock in order to decide which cpu we should migrate back to. 
This requires dropping the current rq lock, grabbing the pi_lock re-taking the current rq lock, picking a cpu, deactivating the task, switching its cpu, dropping the current rq lock, grabbing the target rq, activating the task and then dropping the target rq and reaquiring the current rq. This seems overly complex, so suggestions for a better approach would be welcome! TODO: Seeing stalls/hangs after running for awhile with this patch, which suggests we're losing track of a task somewhere in the migrations. [ 880.032744] BUG: workqueue lockup - pool cpus=3D11 node=3D0 flags=3D0x0 = nice=3D0 stuck for 58s! ... [ 1443.185762] watchdog: BUG: soft lockup - CPU#11 stuck for 23s! [irqbalan= ce:1880] ... Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E . McKenney" Cc: kernel-team@android.com Signed-off-by: John Stultz --- v6: * Integrated sched_proxy_exec() check in proxy_return_migration() * Minor cleanups to diff * Unpin the rq before calling __balance_callbacks() * Tweak proxy migrate to migrate deeper task in chain, to avoid tasks pingponging between rqs --- kernel/sched/core.c | 183 ++++++++++++++++++++++++++++++++++++++-- kernel/sched/deadline.c | 2 +- kernel/sched/fair.c | 4 +- kernel/sched/rt.c | 9 +- 4 files changed, 187 insertions(+), 11 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index c7b5cb5d8dc3..760e2753a24c 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3000,8 +3000,15 @@ static int affine_move_task(struct rq *rq, struct ta= sk_struct *p, struct rq_flag struct set_affinity_pending my_pending =3D { }, *pending =3D NULL; bool stop_pending, complete =3D false; =20 - /* Can the task run on the task's current CPU? If so, we're done */ - if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask)) { + /* + * Can the task run on the task's current CPU? If so, we're done + * + * We are also done if the task is selected, boosting a lock- + * holding proxy, (and potentially has been migrated outside its + * current or previous affinity mask) + */ + if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask) || + (task_current_selected(rq, p) && !task_current(rq, p))) { struct task_struct *push_task =3D NULL; =20 if ((flags & SCA_MIGRATE_ENABLE) && @@ -6636,6 +6643,141 @@ bool proxy_deactivate(struct rq *rq, struct task_st= ruct *next) return true; } =20 +static struct task_struct * +proxy_migrate_task(struct rq *rq, struct rq_flags *rf, + struct task_struct *p, int that_cpu) +{ + struct rq *that_rq; + int wake_cpu; + + /* + * If the blocked-on relationship crosses CPUs, migrate @p to the + * @owner's CPU. + * + * This is because we must respect the CPU affinity of execution + * contexts (@owner) but we can ignore affinity for scheduling + * contexts (@p). So we have to move scheduling contexts towards + * potential execution contexts. + */ + that_rq =3D cpu_rq(that_cpu); + + /* + * @owner can disappear, but simply migrate to @that_cpu and leave + * that CPU to sort things out. + */ + + /* + * Since we're going to drop @rq, we have to put(@rq_selected) first, + * otherwise we have a reference that no longer belongs to us. Use + * @rq->idle to fill the void and make the next pick_next_task() + * invocation happy. 
+ * + * CPU0 CPU1 + * + * B mutex_lock(X) + * + * A mutex_lock(X) <- B + * A __schedule() + * A pick->A + * A proxy->B + * A migrate A to CPU1 + * B mutex_unlock(X) -> A + * B __schedule() + * B pick->A + * B switch_to (A) + * A ... does stuff + * A ... is still running here + * + * * BOOM * + */ + put_prev_task(rq, rq_selected(rq)); + rq_set_selected(rq, rq->idle); + set_next_task(rq, rq_selected(rq)); + WARN_ON(p =3D=3D rq->curr); + + wake_cpu =3D p->wake_cpu; + deactivate_task(rq, p, 0); + set_task_cpu(p, that_cpu); + /* + * Preserve p->wake_cpu, such that we can tell where it + * used to run later. + */ + p->wake_cpu =3D wake_cpu; + + rq_unpin_lock(rq, rf); + __balance_callbacks(rq); + + raw_spin_rq_unlock(rq); + raw_spin_rq_lock(that_rq); + + activate_task(that_rq, p, 0); + check_preempt_curr(that_rq, p, 0); + + raw_spin_rq_unlock(that_rq); + raw_spin_rq_lock(rq); + rq_repin_lock(rq, rf); + + return NULL; /* Retry task selection on _this_ CPU. */ +} + +static inline bool proxy_return_migration(struct rq *rq, struct rq_flags *= rf, + struct task_struct *next) +{ + if (!sched_proxy_exec()) + return false; + + if (next->blocked_on && next->blocked_on_waking) { + if (!is_cpu_allowed(next, cpu_of(rq))) { + struct rq *that_rq; + int cpu; + + if (next =3D=3D rq->curr) { + /* can't migrate curr, so return and let caller sort it */ + return true; + } + + put_prev_task(rq, rq_selected(rq)); + rq_set_selected(rq, rq->idle); + + /* First unpin & run balance callbacks */ + rq_unpin_lock(rq, rf); + __balance_callbacks(rq); + /* + * Drop the rq lock so we can get pi_lock, + * then reaquire it again to figure out + * where to send it. + */ + raw_spin_rq_unlock(rq); + raw_spin_lock(&next->pi_lock); + rq_lock(rq, rf); + + cpu =3D select_task_rq(next, next->wake_cpu, WF_TTWU); + + deactivate_task(rq, next, 0); + set_task_cpu(next, cpu); + that_rq =3D cpu_rq(cpu); + + /* drop this rq lock and grab that_rq's */ + rq_unpin_lock(rq, rf); + raw_spin_rq_unlock(rq); + raw_spin_rq_lock(that_rq); + + activate_task(that_rq, next, 0); + check_preempt_curr(that_rq, next, 0); + + /* drop that_rq's lock and re-grab this' */ + raw_spin_rq_unlock(that_rq); + raw_spin_rq_lock(rq); + rq_repin_lock(rq, rf); + + raw_spin_unlock(&next->pi_lock); + + return true; + } + } + return false; +} + /* * Find who @next (currently blocked on a mutex) can proxy for. * @@ -6658,7 +6800,8 @@ proxy(struct rq *rq, struct task_struct *next, struct= rq_flags *rf) struct task_struct *ret =3D NULL; struct task_struct *p =3D next; struct task_struct *owner =3D NULL; - int this_cpu; + bool curr_in_chain =3D false; + int this_cpu, that_cpu; struct mutex *mutex; =20 this_cpu =3D cpu_of(rq); @@ -6694,6 +6837,9 @@ proxy(struct rq *rq, struct task_struct *next, struct= rq_flags *rf) return NULL; } =20 + if (task_current(rq, p)) + curr_in_chain =3D true; + owner =3D __mutex_owner(mutex); if (!owner) { raw_spin_unlock(&p->blocked_lock); @@ -6702,12 +6848,17 @@ proxy(struct rq *rq, struct task_struct *next, stru= ct rq_flags *rf) } =20 if (task_cpu(owner) !=3D this_cpu) { - /* XXX Don't handle migrations yet */ - if (!proxy_deactivate(rq, next)) - ret =3D next; + that_cpu =3D task_cpu(owner); + /* + * @owner can disappear, simply migrate to @that_cpu and leave that CPU + * to sort things out. 
+ */ raw_spin_unlock(&p->blocked_lock); raw_spin_unlock(&mutex->wait_lock); - return ret; + if (curr_in_chain) + return proxy_resched_idle(rq, next); + + return proxy_migrate_task(rq, rf, p, that_cpu); } =20 if (task_on_rq_migrating(owner)) { @@ -6788,7 +6939,14 @@ static inline void proxy_tag_curr(struct rq *rq, str= uct task_struct *next) dequeue_task(rq, next, DEQUEUE_NOCLOCK | DEQUEUE_SAVE); enqueue_task(rq, next, ENQUEUE_NOCLOCK | ENQUEUE_RESTORE); } + #else /* PROXY_EXEC */ +static inline bool proxy_return_migration(struct rq *rq, struct rq_flags *= rf, + struct task_struct *next) +{ + return false; +} + static struct task_struct * proxy(struct rq *rq, struct task_struct *next, struct rq_flags *rf) { @@ -6909,6 +7067,14 @@ static void __sched notrace __schedule(unsigned int = sched_mode) if (next =3D=3D rq->idle && prev =3D=3D rq->idle) preserve_need_resched =3D true; } + if (unlikely(proxy_return_migration(rq, &rf, next))) { + if (next !=3D rq->curr) + goto pick_again; + + rq_set_selected(rq, rq->idle); + set_tsk_need_resched(rq->idle); + next =3D rq->idle; + } =20 if (!preserve_need_resched) clear_tsk_need_resched(prev); @@ -7006,6 +7172,9 @@ static inline void sched_submit_work(struct task_stru= ct *tsk) */ SCHED_WARN_ON(current->__state & TASK_RTLOCK_WAIT); =20 + if (task_is_blocked(tsk)) + return; + /* * If we are going to sleep and we have plugged IO queued, * make sure to submit it to avoid deadlocks. diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index e8bca6b8da6f..99788cfd8835 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -1731,7 +1731,7 @@ static void enqueue_task_dl(struct rq *rq, struct tas= k_struct *p, int flags) =20 enqueue_dl_entity(&p->dl, flags); =20 - if (!task_current(rq, p) && p->nr_cpus_allowed > 1) + if (!task_current(rq, p) && p->nr_cpus_allowed > 1 && !task_is_blocked(p)) enqueue_pushable_dl_task(rq, p); } =20 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index f334b129b269..f2dee89f475b 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8220,7 +8220,9 @@ pick_next_task_fair(struct rq *rq, struct task_struct= *prev, struct rq_flags *rf goto idle; =20 #ifdef CONFIG_FAIR_GROUP_SCHED - if (!prev || prev->sched_class !=3D &fair_sched_class) + if (!prev || + prev->sched_class !=3D &fair_sched_class || + rq->curr !=3D rq_selected(rq)) goto simple; =20 /* diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 0125a3ae5a7a..d4f5625e4433 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -1549,6 +1549,9 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p,= int flags) if (p->nr_cpus_allowed =3D=3D 1) return; =20 + if (task_is_blocked(p)) + return; + enqueue_pushable_task(rq, p); } =20 @@ -1836,10 +1839,12 @@ static void put_prev_task_rt(struct rq *rq, struct = task_struct *p) =20 update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 1); =20 - /* Avoid marking selected as pushable */ - if (task_current_selected(rq, p)) + /* Avoid marking current or selected as pushable */ + if (task_current(rq, p) || task_current_selected(rq, p)) return; =20 + if (task_is_blocked(p)) + return; /* * The previous task needs to be made eligible for pushing * if it is still active --=20 2.42.0.869.gea05f2083d-goog From nobody Wed Dec 31 06:36:33 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C9505C4167D for ; Mon, 6 Nov 2023 19:37:33 +0000 
(UTC)
Date: Mon, 6 Nov 2023 19:35:02 +0000
In-Reply-To: <20231106193524.866104-1-jstultz@google.com>
Mime-Version: 1.0
References: <20231106193524.866104-1-jstultz@google.com>
X-Mailer: git-send-email 2.42.0.869.gea05f2083d-goog
Message-ID: <20231106193524.866104-20-jstultz@google.com>
Subject: [PATCH v6 19/20] sched: Add blocked_donor link to task for smarter mutex handoffs
From: John Stultz
To: LKML
Cc: Peter Zijlstra , Joel Fernandes , Qais Yousef , Ingo Molnar , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Daniel Bristot de Oliveira , Will Deacon , Waiman Long , Boqun Feng , "Paul E .
McKenney" , kernel-team@android.com, Valentin Schneider , "Connor O'Brien" , John Stultz Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Peter Zijlstra Add link to the task this task is proxying for, and use it so we do intellegent hand-off of the owned mutex to the task we're running on behalf. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E . McKenney" Cc: kernel-team@android.com Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Juri Lelli Signed-off-by: Valentin Schneider Signed-off-by: Connor O'Brien [jstultz: This patch was split out from larger proxy patch] Signed-off-by: John Stultz --- v5: * Split out from larger proxy patch v6: * Moved proxied value from earlier patch to this one where it is actually used * Rework logic to check sched_proxy_exec() instead of using ifdefs * Moved comment change to this patch where it makes sense --- include/linux/sched.h | 1 + kernel/fork.c | 1 + kernel/locking/mutex.c | 35 ++++++++++++++++++++++++++++++++--- kernel/sched/core.c | 19 +++++++++++++++++-- 4 files changed, 51 insertions(+), 5 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 47c7095b918a..9bff2f123207 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1145,6 +1145,7 @@ struct task_struct { struct rt_mutex_waiter *pi_blocked_on; #endif =20 + struct task_struct *blocked_donor; /* task that is boosting us */ struct mutex *blocked_on; /* lock we're blocked on */ bool blocked_on_waking; /* blocked on, but waking */ raw_spinlock_t blocked_lock; diff --git a/kernel/fork.c b/kernel/fork.c index 930947bf4569..6604e0472da0 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2456,6 +2456,7 @@ __latent_entropy struct task_struct *copy_process( lockdep_init_task(p); #endif =20 + p->blocked_donor =3D NULL; /* nobody is boosting us yet */ p->blocked_on =3D NULL; /* not blocked yet */ p->blocked_on_waking =3D false; /* not blocked yet */ =20 diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 5394a3c4b5d9..f7187a247482 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -907,7 +907,7 @@ EXPORT_SYMBOL_GPL(ww_mutex_lock_interruptible); */ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, u= nsigned long ip) { - struct task_struct *next =3D NULL; + struct task_struct *donor, *next =3D NULL; DEFINE_WAKE_Q(wake_q); unsigned long owner; unsigned long flags; @@ -945,7 +945,34 @@ static noinline void __sched __mutex_unlock_slowpath(s= truct mutex *lock, unsigne preempt_disable(); raw_spin_lock_irqsave(&lock->wait_lock, flags); debug_mutex_unlock(lock); - if (!list_empty(&lock->wait_list)) { + + if (sched_proxy_exec()) { + raw_spin_lock(¤t->blocked_lock); + /* + * If we have a task boosting us, and that task was boosting us through + * this lock, hand the lock to that task, as that is the highest + * waiter, as selected by the scheduling function. 
+ */ + donor =3D current->blocked_donor; + if (donor) { + struct mutex *next_lock; + + raw_spin_lock_nested(&donor->blocked_lock, SINGLE_DEPTH_NESTING); + next_lock =3D get_task_blocked_on(donor); + if (next_lock =3D=3D lock) { + next =3D donor; + donor->blocked_on_waking =3D true; + wake_q_add(&wake_q, donor); + current->blocked_donor =3D NULL; + } + raw_spin_unlock(&donor->blocked_lock); + } + } + + /* + * Failing that, pick any on the wait list. + */ + if (!next && !list_empty(&lock->wait_list)) { /* get the first entry from the wait-list: */ struct mutex_waiter *waiter =3D list_first_entry(&lock->wait_list, @@ -954,7 +981,7 @@ static noinline void __sched __mutex_unlock_slowpath(st= ruct mutex *lock, unsigne next =3D waiter->task; =20 debug_mutex_wake_waiter(lock, waiter); - raw_spin_lock(&next->blocked_lock); + raw_spin_lock_nested(&next->blocked_lock, SINGLE_DEPTH_NESTING); WARN_ON(next->blocked_on !=3D lock); next->blocked_on_waking =3D true; raw_spin_unlock(&next->blocked_lock); @@ -964,6 +991,8 @@ static noinline void __sched __mutex_unlock_slowpath(st= ruct mutex *lock, unsigne if (owner & MUTEX_FLAG_HANDOFF) __mutex_handoff(lock, next); =20 + if (sched_proxy_exec()) + raw_spin_unlock(¤t->blocked_lock); raw_spin_unlock_irqrestore(&lock->wait_lock, flags); wake_up_q(&wake_q); preempt_enable(); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 760e2753a24c..6ac7a241dacc 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6782,7 +6782,17 @@ static inline bool proxy_return_migration(struct rq = *rq, struct rq_flags *rf, * Find who @next (currently blocked on a mutex) can proxy for. * * Follow the blocked-on relation: - * task->blocked_on -> mutex->owner -> task... + * + * ,-> task + * | | blocked-on + * | v + * blocked_donor | mutex + * | | owner + * | v + * `-- task + * + * and set the blocked_donor relation, this latter is used by the mutex + * code to find which (blocked) task to hand-off to. 
* * Lock order: * @@ -6919,6 +6929,8 @@ proxy(struct rq *rq, struct task_struct *next, struct= rq_flags *rf) */ raw_spin_unlock(&p->blocked_lock); raw_spin_unlock(&mutex->wait_lock); + + owner->blocked_donor =3D p; } =20 WARN_ON_ONCE(owner && !owner->on_rq); @@ -7003,6 +7015,7 @@ static void __sched notrace __schedule(unsigned int s= ched_mode) unsigned long prev_state; struct rq_flags rf; struct rq *rq; + bool proxied; int cpu; bool preserve_need_resched =3D false; =20 @@ -7053,9 +7066,11 @@ static void __sched notrace __schedule(unsigned int = sched_mode) switch_count =3D &prev->nvcsw; } =20 + proxied =3D !!prev->blocked_donor; pick_again: next =3D pick_next_task(rq, rq_selected(rq), &rf); rq_set_selected(rq, next); + next->blocked_donor =3D NULL; if (unlikely(task_is_blocked(next))) { next =3D proxy(rq, next, &rf); if (!next) { @@ -7119,7 +7134,7 @@ static void __sched notrace __schedule(unsigned int s= ched_mode) rq =3D context_switch(rq, prev, next, &rf); } else { /* In case next was already curr but just got blocked_donor*/ - if (unlikely(!task_current_selected(rq, next))) + if (unlikely(!proxied && next->blocked_donor)) proxy_tag_curr(rq, next); =20 rq->clock_update_flags &=3D ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP); --=20 2.42.0.869.gea05f2083d-goog From nobody Wed Dec 31 06:36:33 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 69CF2C4332F for ; Mon, 6 Nov 2023 19:37:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233155AbjKFThp (ORCPT ); Mon, 6 Nov 2023 14:37:45 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52300 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233145AbjKFThD (ORCPT ); Mon, 6 Nov 2023 14:37:03 -0500 Received: from mail-pg1-x54a.google.com (mail-pg1-x54a.google.com [IPv6:2607:f8b0:4864:20::54a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 65D051FF5 for ; Mon, 6 Nov 2023 11:36:18 -0800 (PST) Received: by mail-pg1-x54a.google.com with SMTP id 41be03b00d2f7-5b4128814ffso3549732a12.3 for ; Mon, 06 Nov 2023 11:36:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1699299377; x=1699904177; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=KgovNar/cQhKvhzYglsfqyNmg5J3OOfZL8S11R87w8M=; b=oE7v8GeoPN5SX5Gv8Rb/pX4lv163Qipmo3GXhX6MplIOtNyAQ62nPWhkGPGT9pxBZ3 JWPNjnl0QAWZCX8AdKVsHD8t48+dc0ShkFdmnawRaIKnaiwnf2VlIqb87EhKrQasxlqs Lw/XVjMtenb/mooG2qEeTy1lStncCTNIgWGUyNDOZkaM3F2z/rv96wbXRjBSC/6Z9VPD NDcawUSm8WIRDipmIZXz4JVA48z5BGtFb0RE34xg4yBBwTc6+NIPZwl+cOUWlPi6H4GX JnvqTN30m/yjLanLt99WPXlhQsENs/f21X00WktQUHX8LQATYq3OJIXURHqIg6W0TUUA 0uqg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699299377; x=1699904177; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=KgovNar/cQhKvhzYglsfqyNmg5J3OOfZL8S11R87w8M=; b=CEWez4TjgRaRZpL6RCpt9htTshMoJ10mxnKILvUHdDXa7F/d9Wq1aLg1nkvUM+LTSO PmxeCIf+FV9IArPz4p7l2DBCEnbUcXdqEhzc99MmZV+VMZCSdfMo1QAQ1OeUVAdpfLOp 43lTLLCFsGSty7iP143r+CUdXi03qXX9uAnp9yV+xUPaV8BumK7hAwImr4SCEGz8U5rE nooczCtjXbkFsh4RPGkPL0Ob2MYCNBwn8iSBbcO2LzWmbuSpyGLWd7i3w1Cvew/sXtPs 
Date: Mon, 6 Nov 2023 19:35:03 +0000
In-Reply-To: <20231106193524.866104-1-jstultz@google.com>
Mime-Version: 1.0
References: <20231106193524.866104-1-jstultz@google.com>
X-Mailer: git-send-email 2.42.0.869.gea05f2083d-goog
Message-ID: <20231106193524.866104-21-jstultz@google.com>
Subject: [PATCH v6 20/20] sched: Add deactivated (sleeping) owner handling to proxy()
From: John Stultz
To: LKML
Cc: Peter Zijlstra , Joel Fernandes , Qais Yousef , Ingo Molnar , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Daniel Bristot de Oliveira , Will Deacon , Waiman Long , Boqun Feng , "Paul E . McKenney" , kernel-team@android.com, Valentin Schneider , "Connor O'Brien" , John Stultz
Precedence: bulk
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Peter Zijlstra

Adds an implementation of (sleeping) deactivated owner handling, where we
queue the selected task on the deactivated owner task and deactivate it as
well, re-activating it later when the owner is woken up.

NOTE: This has been particularly challenging to get working properly, and
some of the locking is particularly awkward. I'd very much appreciate
review and feedback for ways to simplify this.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Youssef Esmat
Cc: Mel Gorman
Cc: Daniel Bristot de Oliveira
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E . McKenney"
Cc: kernel-team@android.com
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Juri Lelli
Signed-off-by: Valentin Schneider
Signed-off-by: Connor O'Brien
[jstultz: This was broken out from the larger proxy() patch]
Signed-off-by: John Stultz
---
v5:
* Split out from larger proxy patch
v6:
* Major rework, replacing the single list head per task with per-task list
  head and nodes, creating a tree structure so we only wake up descendants
  of the task woken.
* Reworked the locking to take the task->pi_lock, so we can avoid mid-chain
  wakeup races from try_to_wake_up() called by the ww_mutex logic.
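To make the structure easier to follow, here is a rough, userspace-only
sketch of the blocked_head/blocked_node bookkeeping described above: a task
that blocks behind a sleeping owner is queued on that owner, and waking the
owner recursively re-activates the tasks queued behind it and their
descendants. The struct and function names are invented for the sketch and
are not the kernel interfaces used by this patch.

#include <stdio.h>

struct toy_task {
	const char *name;
	int on_rq;                        /* 1 once (re)activated                */
	struct toy_task *sleeping_owner;  /* whose blocked_head we are queued on */
	struct toy_task *blocked_next;    /* next sibling on the owner's list    */
	struct toy_task *blocked_head;    /* tasks queued on us while we sleep   */
};

/* Queue @waiter on a sleeping @owner (roughly proxy_enqueue_on_owner()). */
static void toy_enqueue_on_owner(struct toy_task *owner, struct toy_task *waiter)
{
	waiter->on_rq = 0;
	waiter->sleeping_owner = owner;
	waiter->blocked_next = owner->blocked_head;
	owner->blocked_head = waiter;
}

/* Wake @owner and everything queued behind it (roughly activate_blocked_ents()). */
static void toy_wake_with_blocked_ents(struct toy_task *owner)
{
	struct toy_task *p;

	owner->on_rq = 1;
	printf("activated %s\n", owner->name);

	while ((p = owner->blocked_head) != NULL) {
		owner->blocked_head = p->blocked_next;
		p->blocked_next = NULL;
		p->sleeping_owner = NULL;
		toy_wake_with_blocked_ents(p);  /* recurse into descendants */
	}
}

int main(void)
{
	struct toy_task owner = { .name = "owner" };
	struct toy_task a = { .name = "A" };
	struct toy_task b = { .name = "B" };

	toy_enqueue_on_owner(&owner, &a);    /* A blocks behind the sleeping owner */
	toy_enqueue_on_owner(&a, &b);        /* B blocks behind A: a tree forms    */
	toy_wake_with_blocked_ents(&owner);  /* owner wakes: A, then B, reactivate */
	return 0;
}

The actual patch additionally has to serialize against ttwu() via
blocked_lock and pi_lock; the sketch ignores locking entirely.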
--- include/linux/sched.h | 3 + kernel/fork.c | 4 +- kernel/sched/core.c | 198 ++++++++++++++++++++++++++++++++++++++++-- 3 files changed, 196 insertions(+), 9 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 9bff2f123207..c5aa0208104f 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1148,6 +1148,9 @@ struct task_struct { struct task_struct *blocked_donor; /* task that is boosting us */ struct mutex *blocked_on; /* lock we're blocked on */ bool blocked_on_waking; /* blocked on, but waking */ + struct list_head blocked_head; /* tasks blocked on us */ + struct list_head blocked_node; /* our entry on someone elses blocked_he= ad */ + struct task_struct *sleeping_owner; /* task our blocked_node is enqueued= on */ raw_spinlock_t blocked_lock; =20 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP diff --git a/kernel/fork.c b/kernel/fork.c index 6604e0472da0..bbcf2697652f 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2459,7 +2459,9 @@ __latent_entropy struct task_struct *copy_process( p->blocked_donor =3D NULL; /* nobody is boosting us yet */ p->blocked_on =3D NULL; /* not blocked yet */ p->blocked_on_waking =3D false; /* not blocked yet */ - + INIT_LIST_HEAD(&p->blocked_head); + INIT_LIST_HEAD(&p->blocked_node); + p->sleeping_owner =3D NULL; #ifdef CONFIG_BCACHE p->sequential_io =3D 0; p->sequential_io_avg =3D 0; diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 6ac7a241dacc..8f87318784d0 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3804,6 +3804,119 @@ static inline void ttwu_do_wakeup(struct task_struc= t *p) trace_sched_wakeup(p); } =20 +#ifdef CONFIG_PROXY_EXEC +static void do_activate_task(struct rq *rq, struct task_struct *p, int en_= flags) +{ + lockdep_assert_rq_held(rq); + + if (!sched_proxy_exec()) { + activate_task(rq, p, en_flags); + return; + } + + if (p->sleeping_owner) { + struct task_struct *owner =3D p->sleeping_owner; + + raw_spin_lock(&owner->blocked_lock); + list_del_init(&p->blocked_node); + p->sleeping_owner =3D NULL; + raw_spin_unlock(&owner->blocked_lock); + } + + /* + * By calling activate_task with blocked_lock held, we order against + * the proxy() blocked_task case such that no more blocked tasks will + * be enqueued on p once we release p->blocked_lock. + */ + raw_spin_lock(&p->blocked_lock); + WARN_ON(task_cpu(p) !=3D cpu_of(rq)); + activate_task(rq, p, en_flags); + raw_spin_unlock(&p->blocked_lock); +} + +static void activate_blocked_ents(struct rq *target_rq, + struct task_struct *owner, + int wake_flags) +{ + unsigned long flags; + struct rq_flags rf; + int target_cpu =3D cpu_of(target_rq); + int en_flags =3D ENQUEUE_WAKEUP | ENQUEUE_NOCLOCK; + + if (wake_flags & WF_MIGRATED) + en_flags |=3D ENQUEUE_MIGRATED; + /* + * A whole bunch of 'proxy' tasks back this blocked task, wake + * them all up to give this task its 'fair' share. + */ + raw_spin_lock(&owner->blocked_lock); + while (!list_empty(&owner->blocked_head)) { + struct task_struct *pp; + unsigned int state; + + pp =3D list_first_entry(&owner->blocked_head, + struct task_struct, + blocked_node); + BUG_ON(pp =3D=3D owner); + list_del_init(&pp->blocked_node); + WARN_ON(!pp->sleeping_owner); + pp->sleeping_owner =3D NULL; + raw_spin_unlock(&owner->blocked_lock); + + /* Nested as ttwu holds the owner's pi_lock */ + /* XXX But how do we enforce ordering to avoid ABBA? 
*/ + raw_spin_lock_irqsave_nested(&pp->pi_lock, flags, SINGLE_DEPTH_NESTING); + smp_rmb(); + state =3D READ_ONCE(pp->__state); + /* Avoid racing with ttwu */ + if (state =3D=3D TASK_WAKING) { + raw_spin_unlock_irqrestore(&pp->pi_lock, flags); + raw_spin_lock(&owner->blocked_lock); + continue; + } + if (READ_ONCE(pp->on_rq)) { + /* + * We raced with a non mutex handoff activation of pp. + * That activation will also take care of activating + * all of the tasks after pp in the blocked_entry list, + * so we're done here. + */ + raw_spin_unlock_irqrestore(&pp->pi_lock, flags); + raw_spin_lock(&owner->blocked_lock); + continue; + } + + __set_task_cpu(pp, target_cpu); + + rq_lock_irqsave(target_rq, &rf); + update_rq_clock(target_rq); + do_activate_task(target_rq, pp, en_flags); + resched_curr(target_rq); + rq_unlock_irqrestore(target_rq, &rf); + raw_spin_unlock_irqrestore(&pp->pi_lock, flags); + + /* recurse */ + activate_blocked_ents(target_rq, pp, wake_flags); + + raw_spin_lock(&owner->blocked_lock); + } + raw_spin_unlock(&owner->blocked_lock); +} + +#else +static inline void do_activate_task(struct rq *rq, struct task_struct *p, + int en_flags) +{ + activate_task(rq, p, en_flags); +} + +static inline void activate_blocked_ents(struct rq *target_rq, + struct task_struct *owner, + int wake_flags) +{ +} +#endif + static void ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags, struct rq_flags *rf) @@ -3825,7 +3938,8 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p= , int wake_flags, atomic_dec(&task_rq(p)->nr_iowait); } =20 - activate_task(rq, p, en_flags); + do_activate_task(rq, p, en_flags); + check_preempt_curr(rq, p, wake_flags); =20 ttwu_do_wakeup(p); @@ -3922,13 +4036,19 @@ void sched_ttwu_pending(void *arg) update_rq_clock(rq); =20 llist_for_each_entry_safe(p, t, llist, wake_entry.llist) { + int wake_flags; if (WARN_ON_ONCE(p->on_cpu)) smp_cond_load_acquire(&p->on_cpu, !VAL); =20 if (WARN_ON_ONCE(task_cpu(p) !=3D cpu_of(rq))) set_task_cpu(p, cpu_of(rq)); =20 - ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0, &rf); + wake_flags =3D p->sched_remote_wakeup ? WF_MIGRATED : 0; + ttwu_do_activate(rq, p, wake_flags, &rf); + rq_unlock(rq, &rf); + activate_blocked_ents(rq, p, wake_flags); + rq_lock(rq, &rf); + update_rq_clock(rq); } =20 /* @@ -4069,6 +4189,15 @@ static void ttwu_queue(struct task_struct *p, int cp= u, int wake_flags) update_rq_clock(rq); ttwu_do_activate(rq, p, wake_flags, &rf); rq_unlock(rq, &rf); + + /* + * When activating blocked ents, we will take the entities + * pi_lock, so drop the owners. Would love suggestions for + * a better approach. + */ + raw_spin_unlock(&p->pi_lock); + activate_blocked_ents(rq, p, wake_flags); + raw_spin_lock(&p->pi_lock); } =20 /* @@ -6778,6 +6907,31 @@ static inline bool proxy_return_migration(struct rq = *rq, struct rq_flags *rf, return false; } =20 +static void proxy_enqueue_on_owner(struct rq *rq, struct task_struct *owne= r, + struct task_struct *next) +{ + /* + * ttwu_activate() will pick them up and place them on whatever rq + * @owner will run next. + */ + if (!owner->on_rq) { + BUG_ON(!next->on_rq); + deactivate_task(rq, next, DEQUEUE_SLEEP); + if (task_current_selected(rq, next)) { + put_prev_task(rq, next); + rq_set_selected(rq, rq->idle); + } + /* + * ttwu_do_activate must not have a chance to activate p + * elsewhere before it's fully extricated from its old rq. 
+ */ + WARN_ON(next->sleeping_owner); + next->sleeping_owner =3D owner; + smp_mb(); + list_add(&next->blocked_node, &owner->blocked_head); + } +} + /* * Find who @next (currently blocked on a mutex) can proxy for. * @@ -6807,7 +6961,6 @@ static inline bool proxy_return_migration(struct rq *= rq, struct rq_flags *rf, static struct task_struct * proxy(struct rq *rq, struct task_struct *next, struct rq_flags *rf) { - struct task_struct *ret =3D NULL; struct task_struct *p =3D next; struct task_struct *owner =3D NULL; bool curr_in_chain =3D false; @@ -6886,12 +7039,41 @@ proxy(struct rq *rq, struct task_struct *next, stru= ct rq_flags *rf) } =20 if (!owner->on_rq) { - /* XXX Don't handle blocked owners yet */ - if (!proxy_deactivate(rq, next)) - ret =3D next; - raw_spin_unlock(&p->blocked_lock); + /* + * rq->curr must not be added to the blocked_head list or else + * ttwu_do_activate could enqueue it elsewhere before it switches + * out here. The approach to avoiding this is the same as in the + * migrate_task case. + */ + if (curr_in_chain) { + raw_spin_unlock(&p->blocked_lock); + raw_spin_unlock(&mutex->wait_lock); + return proxy_resched_idle(rq, next); + } + + /* + * If !@owner->on_rq, holding @rq->lock will not pin the task, + * so we cannot drop @mutex->wait_lock until we're sure its a blocked + * task on this rq. + * + * We use @owner->blocked_lock to serialize against ttwu_activate(). + * Either we see its new owner->on_rq or it will see our list_add(). + */ + if (owner !=3D p) { + raw_spin_unlock(&p->blocked_lock); + raw_spin_lock(&owner->blocked_lock); + } + + proxy_enqueue_on_owner(rq, owner, next); + + if (task_current_selected(rq, next)) { + put_prev_task(rq, next); + rq_set_selected(rq, rq->idle); + } + raw_spin_unlock(&owner->blocked_lock); raw_spin_unlock(&mutex->wait_lock); - return ret; + + return NULL; /* retry task selection */ } =20 if (owner =3D=3D p) { --=20 2.42.0.869.gea05f2083d-goog
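Tying the series together, here is a similarly rough, userspace-only sketch
of the hand-off policy from patch 19/20: when releasing a mutex, prefer
handing it to the blocked_donor (the task whose scheduling context we have
been running on) if it is waiting on this lock, and only fall back to the
head of the wait list otherwise. All names below are invented for the
sketch and do not match the kernel code.

#include <stdio.h>

struct toy_mutex;

struct toy_task {
	const char *name;
	struct toy_mutex *blocked_on;    /* lock this task is waiting for   */
	struct toy_task *blocked_donor;  /* task donating its sched context */
};

struct toy_mutex {
	struct toy_task *owner;
	struct toy_task *waiters[8];     /* simplified FIFO wait list       */
	int nr_waiters;
};

/*
 * Pick who the lock is handed to on unlock, mirroring the policy added to
 * __mutex_unlock_slowpath(): prefer the donor when it is blocked on this
 * lock, otherwise take the first waiter (if any).
 */
static struct toy_task *toy_pick_next_owner(struct toy_mutex *lock,
					    struct toy_task *releaser)
{
	struct toy_task *donor = releaser->blocked_donor;

	if (donor && donor->blocked_on == lock) {
		releaser->blocked_donor = NULL;
		return donor;
	}
	return lock->nr_waiters ? lock->waiters[0] : NULL;
}

int main(void)
{
	struct toy_mutex lock = { 0 };
	struct toy_task owner = { .name = "owner" };
	struct toy_task a = { .name = "A", .blocked_on = &lock };
	struct toy_task b = { .name = "B", .blocked_on = &lock };

	lock.owner = &owner;
	lock.waiters[lock.nr_waiters++] = &a;  /* A queued first              */
	lock.waiters[lock.nr_waiters++] = &b;  /* B queued second             */
	owner.blocked_donor = &b;              /* but B was donating to owner */

	/* Prints "hand off to B": the donor wins over FIFO order. */
	printf("hand off to %s\n", toy_pick_next_owner(&lock, &owner)->name);
	return 0;
}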