From nobody Tue Nov 4 22:27:51 2025
Date: Tue, 9 Jul 2024 13:31:44 -0700
In-Reply-To: <20240709203213.799070-1-jstultz@google.com>
References: <20240709203213.799070-1-jstultz@google.com>
Message-ID: <20240709203213.799070-2-jstultz@google.com>
Subject: [PATCH v11 1/7] locking/mutex: Remove wakeups from under mutex::wait_lock
From: John Stultz
To: LKML
Cc: Peter Zijlstra , Joel Fernandes , Qais Yousef , Ingo Molnar , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , kernel-team@android.com, John Stultz , Metin Kaya , Davidlohr Bueso

From: Peter Zijlstra

In preparation for nesting mutex::wait_lock under rq::lock, we need to remove wakeups from under it.

Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: kernel-team@android.com
Signed-off-by: Peter Zijlstra (Intel)
[Heavily changed after 55f036ca7e74 ("locking: WW mutex cleanup") and 08295b3b5bee ("locking: Implement an algorithm choice for Wound-Wait mutexes")]
Signed-off-by: Juri Lelli
[jstultz: rebased to mainline, added extra wake_up_q & init to avoid hangs, similar to Connor's rework of this patch]
Signed-off-by: John Stultz
Tested-by: K Prateek Nayak
Tested-by: Metin Kaya
Acked-by: Davidlohr Bueso
Reviewed-by: Metin Kaya
---
v5:
* Reverted back to an earlier version of this patch to undo the change that kept the wake_q in the ctx structure, as that broke the rule that the wake_q must always be on the stack, as it's not safe for concurrency.
v6: * Made tweaks suggested by Waiman Long v7: * Fixups to pass wake_qs down for PREEMPT_RT logic v10: * Switched preempt_enable to be lower close to the unlock as suggested by Valentin * Added additional preempt_disable coverage around the wake_q calls as again noted by Valentin --- kernel/locking/mutex.c | 17 +++++++++++++---- kernel/locking/rtmutex.c | 30 +++++++++++++++++++++--------- kernel/locking/rwbase_rt.c | 8 +++++++- kernel/locking/rwsem.c | 4 ++-- kernel/locking/spinlock_rt.c | 3 ++- kernel/locking/ww_mutex.h | 29 ++++++++++++++++++----------- 6 files changed, 63 insertions(+), 28 deletions(-) diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index cbae8c0b89ab..4269da1f3ef5 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -575,6 +575,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas struct lockdep_map *nest_lock, unsigned long ip, struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx) { + DEFINE_WAKE_Q(wake_q); struct mutex_waiter waiter; struct ww_mutex *ww; int ret; @@ -625,7 +626,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas */ if (__mutex_trylock(lock)) { if (ww_ctx) - __ww_mutex_check_waiters(lock, ww_ctx); + __ww_mutex_check_waiters(lock, ww_ctx, &wake_q); =20 goto skip_wait; } @@ -645,7 +646,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas * Add in stamp order, waking up waiters that must kill * themselves. */ - ret =3D __ww_mutex_add_waiter(&waiter, lock, ww_ctx); + ret =3D __ww_mutex_add_waiter(&waiter, lock, ww_ctx, &wake_q); if (ret) goto err_early_kill; } @@ -681,6 +682,11 @@ __mutex_lock_common(struct mutex *lock, unsigned int s= tate, unsigned int subclas } =20 raw_spin_unlock(&lock->wait_lock); + /* Make sure we do wakeups before calling schedule */ + if (!wake_q_empty(&wake_q)) { + wake_up_q(&wake_q); + wake_q_init(&wake_q); + } schedule_preempt_disabled(); =20 first =3D __mutex_waiter_is_first(lock, &waiter); @@ -714,7 +720,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas */ if (!ww_ctx->is_wait_die && !__mutex_waiter_is_first(lock, &waiter)) - __ww_mutex_check_waiters(lock, ww_ctx); + __ww_mutex_check_waiters(lock, ww_ctx, &wake_q); } =20 __mutex_remove_waiter(lock, &waiter); @@ -730,6 +736,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas ww_mutex_lock_acquired(ww, ww_ctx); =20 raw_spin_unlock(&lock->wait_lock); + wake_up_q(&wake_q); preempt_enable(); return 0; =20 @@ -741,6 +748,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas raw_spin_unlock(&lock->wait_lock); debug_mutex_free_waiter(&waiter); mutex_release(&lock->dep_map, ip); + wake_up_q(&wake_q); preempt_enable(); return ret; } @@ -951,9 +959,10 @@ static noinline void __sched __mutex_unlock_slowpath(s= truct mutex *lock, unsigne if (owner & MUTEX_FLAG_HANDOFF) __mutex_handoff(lock, next); =20 + preempt_disable(); raw_spin_unlock(&lock->wait_lock); - wake_up_q(&wake_q); + preempt_enable(); } =20 #ifndef CONFIG_DEBUG_LOCK_ALLOC diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c index 88d08eeb8bc0..7a85d9bfa972 100644 --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -34,13 +34,15 @@ =20 static inline int __ww_mutex_add_waiter(struct rt_mutex_waiter *waiter, struct rt_mutex *lock, - struct ww_acquire_ctx *ww_ctx) + struct ww_acquire_ctx *ww_ctx, + struct wake_q_head *wake_q) { return 0; } =20 static inline void 
__ww_mutex_check_waiters(struct rt_mutex *lock, - struct ww_acquire_ctx *ww_ctx) + struct ww_acquire_ctx *ww_ctx, + struct wake_q_head *wake_q) { } =20 @@ -1207,6 +1209,7 @@ static int __sched task_blocks_on_rt_mutex(struct rt_= mutex_base *lock, struct rt_mutex_waiter *top_waiter =3D waiter; struct rt_mutex_base *next_lock; int chain_walk =3D 0, res; + DEFINE_WAKE_Q(wake_q); =20 lockdep_assert_held(&lock->wait_lock); =20 @@ -1245,7 +1248,10 @@ static int __sched task_blocks_on_rt_mutex(struct rt= _mutex_base *lock, =20 /* Check whether the waiter should back out immediately */ rtm =3D container_of(lock, struct rt_mutex, rtmutex); - res =3D __ww_mutex_add_waiter(waiter, rtm, ww_ctx); + preempt_disable(); + res =3D __ww_mutex_add_waiter(waiter, rtm, ww_ctx, &wake_q); + wake_up_q(&wake_q); + preempt_enable(); if (res) { raw_spin_lock(&task->pi_lock); rt_mutex_dequeue(lock, waiter); @@ -1678,7 +1684,8 @@ static int __sched __rt_mutex_slowlock(struct rt_mute= x_base *lock, struct ww_acquire_ctx *ww_ctx, unsigned int state, enum rtmutex_chainwalk chwalk, - struct rt_mutex_waiter *waiter) + struct rt_mutex_waiter *waiter, + struct wake_q_head *wake_q) { struct rt_mutex *rtm =3D container_of(lock, struct rt_mutex, rtmutex); struct ww_mutex *ww =3D ww_container_of(rtm); @@ -1689,7 +1696,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mute= x_base *lock, /* Try to acquire the lock again: */ if (try_to_take_rt_mutex(lock, current, NULL)) { if (build_ww_mutex() && ww_ctx) { - __ww_mutex_check_waiters(rtm, ww_ctx); + __ww_mutex_check_waiters(rtm, ww_ctx, wake_q); ww_mutex_lock_acquired(ww, ww_ctx); } return 0; @@ -1707,7 +1714,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mute= x_base *lock, /* acquired the lock */ if (build_ww_mutex() && ww_ctx) { if (!ww_ctx->is_wait_die) - __ww_mutex_check_waiters(rtm, ww_ctx); + __ww_mutex_check_waiters(rtm, ww_ctx, wake_q); ww_mutex_lock_acquired(ww, ww_ctx); } } else { @@ -1729,7 +1736,8 @@ static int __sched __rt_mutex_slowlock(struct rt_mute= x_base *lock, =20 static inline int __rt_mutex_slowlock_locked(struct rt_mutex_base *lock, struct ww_acquire_ctx *ww_ctx, - unsigned int state) + unsigned int state, + struct wake_q_head *wake_q) { struct rt_mutex_waiter waiter; int ret; @@ -1738,7 +1746,7 @@ static inline int __rt_mutex_slowlock_locked(struct r= t_mutex_base *lock, waiter.ww_ctx =3D ww_ctx; =20 ret =3D __rt_mutex_slowlock(lock, ww_ctx, state, RT_MUTEX_MIN_CHAINWALK, - &waiter); + &waiter, wake_q); =20 debug_rt_mutex_free_waiter(&waiter); return ret; @@ -1754,6 +1762,7 @@ static int __sched rt_mutex_slowlock(struct rt_mutex_= base *lock, struct ww_acquire_ctx *ww_ctx, unsigned int state) { + DEFINE_WAKE_Q(wake_q); unsigned long flags; int ret; =20 @@ -1775,8 +1784,11 @@ static int __sched rt_mutex_slowlock(struct rt_mutex= _base *lock, * irqsave/restore variants. 
*/ raw_spin_lock_irqsave(&lock->wait_lock, flags); - ret =3D __rt_mutex_slowlock_locked(lock, ww_ctx, state); + ret =3D __rt_mutex_slowlock_locked(lock, ww_ctx, state, &wake_q); + preempt_disable(); raw_spin_unlock_irqrestore(&lock->wait_lock, flags); + wake_up_q(&wake_q); + preempt_enable(); rt_mutex_post_schedule(); =20 return ret; diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c index 34a59569db6b..9f4322c07486 100644 --- a/kernel/locking/rwbase_rt.c +++ b/kernel/locking/rwbase_rt.c @@ -69,6 +69,7 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *r= wb, unsigned int state) { struct rt_mutex_base *rtm =3D &rwb->rtmutex; + DEFINE_WAKE_Q(wake_q); int ret; =20 rwbase_pre_schedule(); @@ -110,7 +111,7 @@ static int __sched __rwbase_read_lock(struct rwbase_rt = *rwb, * For rwlocks this returns 0 unconditionally, so the below * !ret conditionals are optimized out. */ - ret =3D rwbase_rtmutex_slowlock_locked(rtm, state); + ret =3D rwbase_rtmutex_slowlock_locked(rtm, state, &wake_q); =20 /* * On success the rtmutex is held, so there can't be a writer @@ -121,7 +122,12 @@ static int __sched __rwbase_read_lock(struct rwbase_rt= *rwb, */ if (!ret) atomic_inc(&rwb->readers); + + preempt_disable(); raw_spin_unlock_irq(&rtm->wait_lock); + wake_up_q(&wake_q); + preempt_enable(); + if (!ret) rwbase_rtmutex_unlock(rtm); =20 diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c index c6d17aee4209..79ab7b8df5c1 100644 --- a/kernel/locking/rwsem.c +++ b/kernel/locking/rwsem.c @@ -1415,8 +1415,8 @@ static inline void __downgrade_write(struct rw_semaph= ore *sem) #define rwbase_rtmutex_lock_state(rtm, state) \ __rt_mutex_lock(rtm, state) =20 -#define rwbase_rtmutex_slowlock_locked(rtm, state) \ - __rt_mutex_slowlock_locked(rtm, NULL, state) +#define rwbase_rtmutex_slowlock_locked(rtm, state, wq) \ + __rt_mutex_slowlock_locked(rtm, NULL, state, wq) =20 #define rwbase_rtmutex_unlock(rtm) \ __rt_mutex_unlock(rtm) diff --git a/kernel/locking/spinlock_rt.c b/kernel/locking/spinlock_rt.c index 38e292454fcc..fb1810a14c9d 100644 --- a/kernel/locking/spinlock_rt.c +++ b/kernel/locking/spinlock_rt.c @@ -162,7 +162,8 @@ rwbase_rtmutex_lock_state(struct rt_mutex_base *rtm, un= signed int state) } =20 static __always_inline int -rwbase_rtmutex_slowlock_locked(struct rt_mutex_base *rtm, unsigned int sta= te) +rwbase_rtmutex_slowlock_locked(struct rt_mutex_base *rtm, unsigned int sta= te, + struct wake_q_head *wake_q) { rtlock_slowlock_locked(rtm); return 0; diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h index 3ad2cc4823e5..7189c6631d90 100644 --- a/kernel/locking/ww_mutex.h +++ b/kernel/locking/ww_mutex.h @@ -275,7 +275,7 @@ __ww_ctx_less(struct ww_acquire_ctx *a, struct ww_acqui= re_ctx *b) */ static bool __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter, - struct ww_acquire_ctx *ww_ctx) + struct ww_acquire_ctx *ww_ctx, struct wake_q_head *wake_q) { if (!ww_ctx->is_wait_die) return false; @@ -284,7 +284,7 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER = *waiter, #ifndef WW_RT debug_mutex_wake_waiter(lock, waiter); #endif - wake_up_process(waiter->task); + wake_q_add(wake_q, waiter->task); } =20 return true; @@ -299,7 +299,8 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER = *waiter, */ static bool __ww_mutex_wound(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx, - struct ww_acquire_ctx *hold_ctx) + struct ww_acquire_ctx *hold_ctx, + struct wake_q_head *wake_q) { struct task_struct *owner =3D __ww_mutex_owner(lock); =20 @@ -331,7 
+332,7 @@ static bool __ww_mutex_wound(struct MUTEX *lock, * wakeup pending to re-read the wounded state. */ if (owner !=3D current) - wake_up_process(owner); + wake_q_add(wake_q, owner); =20 return true; } @@ -352,7 +353,8 @@ static bool __ww_mutex_wound(struct MUTEX *lock, * The current task must not be on the wait list. */ static void -__ww_mutex_check_waiters(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx) +__ww_mutex_check_waiters(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx, + struct wake_q_head *wake_q) { struct MUTEX_WAITER *cur; =20 @@ -364,8 +366,8 @@ __ww_mutex_check_waiters(struct MUTEX *lock, struct ww_= acquire_ctx *ww_ctx) if (!cur->ww_ctx) continue; =20 - if (__ww_mutex_die(lock, cur, ww_ctx) || - __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx)) + if (__ww_mutex_die(lock, cur, ww_ctx, wake_q) || + __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx, wake_q)) break; } } @@ -377,6 +379,8 @@ __ww_mutex_check_waiters(struct MUTEX *lock, struct ww_= acquire_ctx *ww_ctx) static __always_inline void ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx= *ctx) { + DEFINE_WAKE_Q(wake_q); + ww_mutex_lock_acquired(lock, ctx); =20 /* @@ -405,8 +409,10 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, s= truct ww_acquire_ctx *ctx) * die or wound us. */ lock_wait_lock(&lock->base); - __ww_mutex_check_waiters(&lock->base, ctx); + __ww_mutex_check_waiters(&lock->base, ctx, &wake_q); unlock_wait_lock(&lock->base); + + wake_up_q(&wake_q); } =20 static __always_inline int @@ -488,7 +494,8 @@ __ww_mutex_check_kill(struct MUTEX *lock, struct MUTEX_= WAITER *waiter, static inline int __ww_mutex_add_waiter(struct MUTEX_WAITER *waiter, struct MUTEX *lock, - struct ww_acquire_ctx *ww_ctx) + struct ww_acquire_ctx *ww_ctx, + struct wake_q_head *wake_q) { struct MUTEX_WAITER *cur, *pos =3D NULL; bool is_wait_die; @@ -532,7 +539,7 @@ __ww_mutex_add_waiter(struct MUTEX_WAITER *waiter, pos =3D cur; =20 /* Wait-Die: ensure younger waiters die. */ - __ww_mutex_die(lock, cur, ww_ctx); + __ww_mutex_die(lock, cur, ww_ctx, wake_q); } =20 __ww_waiter_add(lock, waiter, pos); @@ -550,7 +557,7 @@ __ww_mutex_add_waiter(struct MUTEX_WAITER *waiter, * such that either we or the fastpath will wound @ww->ctx. 
*/ smp_mb(); - __ww_mutex_wound(lock, ww_ctx, ww->ctx); + __ww_mutex_wound(lock, ww_ctx, ww->ctx, wake_q); } =20 return 0; --=20 2.45.2.993.g49e7a77208-goog From nobody Tue Nov 4 22:27:51 2025 Received: from mail-pg1-f202.google.com (mail-pg1-f202.google.com [209.85.215.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A4AB4185614 for ; Tue, 9 Jul 2024 20:32:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720557154; cv=none; b=d3NbQRuB91TSicd+Qbej+kgfmR1FyabAG43daPMqPY6UxLbrC6THZqYD1u2F95KjWD/F9fMOTSQl3rJ6N23t1PipcY39hxucypDNp/HDIUqt4HSFu0OPaflJ6zIrlQ9pQiWm9aNlsDoeEnePyqEyzKhs1PF6dRLfIin1R02rBqU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720557154; c=relaxed/simple; bh=8WwFP5CCf9stN1FiTOlKvBC5zqi6xiZP0UDWiU8fL0k=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=t5E24m11nQhScZI/e+oVaEUAsxkCdTJfWiK5PxYrAG2t6wyxAqBCoDrkT554sqmjYXWpzcqz6517uXs5QYYl+SNkfFyMniZqlzXnaItgxca3swqcMYNWQd3YvH/F3PHa6JRE4te6XQiRCsmJa9k2KxDiUlIivFReS3F+HVfca2A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=kRriDddW; arc=none smtp.client-ip=209.85.215.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="kRriDddW" Received: by mail-pg1-f202.google.com with SMTP id 41be03b00d2f7-5e4df21f22dso84516a12.0 for ; Tue, 09 Jul 2024 13:32:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1720557151; x=1721161951; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=dLvTAbNH73dBrl7bNnyR5edCKfFkbAVRdkATgKgoksk=; b=kRriDddWYlXlgFyqUDT6O5yMU+O7f1xMIClEigxPga0u5tVetcpX++Fzf+1h8JqwKP N9tM1fOmvduAZytY50/dsUiDsgpF24s824T5by2S3pCsYK6vb818TlFyLIw9YZ5svr/M nu30flUOuGbxaqVQU1EAoPYCt0zKUJnDEYMiI1M1nTZmqSkaUteWWMwYkxKfgBA3HziE WnyEM83I2Ab/0fb3if1Xmij1qaEkjbiOOZubR8NUqrMdANVhPzMkXa8k+TWJlqoBiwCV 8yr2YJc48Lb56KA7Ejs2W0qzViJ9ACKL89BNJMLfXjIdel8JIpyFPS/ocKOr/94R+qLK kF2w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1720557151; x=1721161951; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=dLvTAbNH73dBrl7bNnyR5edCKfFkbAVRdkATgKgoksk=; b=KHOiDssZOWURzFsslrVFyYV5ucLru5yk0aXCgo0SliyYrN1xzuoVNFfNaepBmhzOuX 25VuybLLVgDR5ZL/xHVGr6/MchOwnqiLVcuzdr5AYFwQ99NTLHAIYMKO3udwcpSkwha7 u7rq2XcA8IOGyJFnlAfyUXzGaDk8XO0aTES3jtlk3UegXrW/ClsaFp9PYKL9FoqT3qwN dgpGWAI6kLCPeE76fPrrLYD4jdb44v49/7z1N00xjB/AnFFAELnD/wScVpFxX0c5NQGV tS9b2kmEzW+4kDoZTlxwAKBgP7KGlrSNIfrDtzSYDgbU1sFoM5BpbOrfdFV3HNx7Wkg1 iu+Q== X-Gm-Message-State: AOJu0YwfOETly6t5c9XANuH2FcvEkKF6wxGR4eh18w05XLfnoouXVc7I KYhfWL0OsKN8lQDHgeVcYGFthpe+m2X87Q5j8EoFfyqZq1dlpyr0Qe5XjKJovs1rYENKtqwRReP 
uQSZaX478SLIM6qXfkzEQqkfqc778MVM6YLqPDjRiFReaOqwOf3cESgDxI9km/Ih6SdUC+JBoLT qXHqDfQTtVn/hfdbeom5ePVt2DcOflACiRvNX3Y31CsrbF X-Google-Smtp-Source: AGHT+IGMvCZToZeHj65S2vUnjRIIe0nCFMOuJ9QbdPNkwAiB09LaXZrmLtpISjk6zta7Lvx8LztCl8RRgrDp X-Received: from jstultz-noogler2.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:600]) (user=jstultz job=sendgmr) by 2002:a65:67da:0:b0:70e:1058:e16c with SMTP id 41be03b00d2f7-77dfededd53mr8536a12.1.1720557148168; Tue, 09 Jul 2024 13:32:28 -0700 (PDT) Date: Tue, 9 Jul 2024 13:31:45 -0700 In-Reply-To: <20240709203213.799070-1-jstultz@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240709203213.799070-1-jstultz@google.com> X-Mailer: git-send-email 2.45.2.993.g49e7a77208-goog Message-ID: <20240709203213.799070-3-jstultz@google.com> Subject: [PATCH v11 2/7] locking/mutex: Make mutex::wait_lock irq safe From: John Stultz To: LKML Cc: Juri Lelli , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , kernel-team@android.com, "Connor O'Brien" , John Stultz , Metin Kaya Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Juri Lelli mutex::wait_lock might be nested under rq->lock. Make it irq safe then. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: kernel-team@android.com Signed-off-by: Juri Lelli Signed-off-by: Peter Zijlstra (Intel) [rebase & fix {un,}lock_wait_lock helpers in ww_mutex.h] Signed-off-by: Connor O'Brien Signed-off-by: John Stultz Tested-by: K Prateek Nayak Tested-by: Metin Kaya Reviewed-by: Metin Kaya Reviewed-by: Valentin Schneider --- v3: * Re-added this patch after it was dropped in v2 which caused lockdep warnings to trip. v7: * Fix function definition for PREEMPT_RT case, as pointed out by Metin Kaya. * Fix incorrect flags handling in PREEMPT_RT case as found by Metin Kaya --- kernel/locking/mutex.c | 18 ++++++++++-------- kernel/locking/ww_mutex.h | 22 +++++++++++----------- 2 files changed, 21 insertions(+), 19 deletions(-) diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 4269da1f3ef5..6d843a0978a5 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -578,6 +578,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas DEFINE_WAKE_Q(wake_q); struct mutex_waiter waiter; struct ww_mutex *ww; + unsigned long flags; int ret; =20 if (!use_ww_ctx) @@ -620,7 +621,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas return 0; } =20 - raw_spin_lock(&lock->wait_lock); + raw_spin_lock_irqsave(&lock->wait_lock, flags); /* * After waiting to acquire the wait_lock, try again. 
*/ @@ -681,7 +682,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas goto err; } =20 - raw_spin_unlock(&lock->wait_lock); + raw_spin_unlock_irqrestore(&lock->wait_lock, flags); /* Make sure we do wakeups before calling schedule */ if (!wake_q_empty(&wake_q)) { wake_up_q(&wake_q); @@ -707,9 +708,9 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas trace_contention_begin(lock, LCB_F_MUTEX); } =20 - raw_spin_lock(&lock->wait_lock); + raw_spin_lock_irqsave(&lock->wait_lock, flags); } - raw_spin_lock(&lock->wait_lock); + raw_spin_lock_irqsave(&lock->wait_lock, flags); acquired: __set_current_state(TASK_RUNNING); =20 @@ -735,7 +736,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas if (ww_ctx) ww_mutex_lock_acquired(ww, ww_ctx); =20 - raw_spin_unlock(&lock->wait_lock); + raw_spin_unlock_irqrestore(&lock->wait_lock, flags); wake_up_q(&wake_q); preempt_enable(); return 0; @@ -745,7 +746,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas __mutex_remove_waiter(lock, &waiter); err_early_kill: trace_contention_end(lock, ret); - raw_spin_unlock(&lock->wait_lock); + raw_spin_unlock_irqrestore(&lock->wait_lock, flags); debug_mutex_free_waiter(&waiter); mutex_release(&lock->dep_map, ip); wake_up_q(&wake_q); @@ -916,6 +917,7 @@ static noinline void __sched __mutex_unlock_slowpath(st= ruct mutex *lock, unsigne struct task_struct *next =3D NULL; DEFINE_WAKE_Q(wake_q); unsigned long owner; + unsigned long flags; =20 mutex_release(&lock->dep_map, ip); =20 @@ -942,7 +944,7 @@ static noinline void __sched __mutex_unlock_slowpath(st= ruct mutex *lock, unsigne } } =20 - raw_spin_lock(&lock->wait_lock); + raw_spin_lock_irqsave(&lock->wait_lock, flags); debug_mutex_unlock(lock); if (!list_empty(&lock->wait_list)) { /* get the first entry from the wait-list: */ @@ -960,7 +962,7 @@ static noinline void __sched __mutex_unlock_slowpath(st= ruct mutex *lock, unsigne __mutex_handoff(lock, next); =20 preempt_disable(); - raw_spin_unlock(&lock->wait_lock); + raw_spin_unlock_irqrestore(&lock->wait_lock, flags); wake_up_q(&wake_q); preempt_enable(); } diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h index 7189c6631d90..9facc0ddfdd3 100644 --- a/kernel/locking/ww_mutex.h +++ b/kernel/locking/ww_mutex.h @@ -70,14 +70,14 @@ __ww_mutex_has_waiters(struct mutex *lock) return atomic_long_read(&lock->owner) & MUTEX_FLAG_WAITERS; } =20 -static inline void lock_wait_lock(struct mutex *lock) +static inline void lock_wait_lock(struct mutex *lock, unsigned long *flags) { - raw_spin_lock(&lock->wait_lock); + raw_spin_lock_irqsave(&lock->wait_lock, *flags); } =20 -static inline void unlock_wait_lock(struct mutex *lock) +static inline void unlock_wait_lock(struct mutex *lock, unsigned long *fla= gs) { - raw_spin_unlock(&lock->wait_lock); + raw_spin_unlock_irqrestore(&lock->wait_lock, *flags); } =20 static inline void lockdep_assert_wait_lock_held(struct mutex *lock) @@ -144,14 +144,14 @@ __ww_mutex_has_waiters(struct rt_mutex *lock) return rt_mutex_has_waiters(&lock->rtmutex); } =20 -static inline void lock_wait_lock(struct rt_mutex *lock) +static inline void lock_wait_lock(struct rt_mutex *lock, unsigned long *fl= ags) { - raw_spin_lock(&lock->rtmutex.wait_lock); + raw_spin_lock_irqsave(&lock->rtmutex.wait_lock, *flags); } =20 -static inline void unlock_wait_lock(struct rt_mutex *lock) +static inline void unlock_wait_lock(struct rt_mutex *lock, unsigned long *= flags) { - 
raw_spin_unlock(&lock->rtmutex.wait_lock); + raw_spin_unlock_irqrestore(&lock->rtmutex.wait_lock, *flags); } =20 static inline void lockdep_assert_wait_lock_held(struct rt_mutex *lock) @@ -380,6 +380,7 @@ static __always_inline void ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx= *ctx) { DEFINE_WAKE_Q(wake_q); + unsigned long flags; =20 ww_mutex_lock_acquired(lock, ctx); =20 @@ -408,10 +409,9 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, s= truct ww_acquire_ctx *ctx) * Uh oh, we raced in fastpath, check if any of the waiters need to * die or wound us. */ - lock_wait_lock(&lock->base); + lock_wait_lock(&lock->base, &flags); __ww_mutex_check_waiters(&lock->base, ctx, &wake_q); - unlock_wait_lock(&lock->base); - + unlock_wait_lock(&lock->base, &flags); wake_up_q(&wake_q); } =20 --=20 2.45.2.993.g49e7a77208-goog From nobody Tue Nov 4 22:27:51 2025 Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E81D1185616 for ; Tue, 9 Jul 2024 20:32:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720557154; cv=none; b=oLkSXWwihO9SCt2tU6OtVJ302qW/qRB+9rDXuZnU8H3Xt/98s2CVyVZR543Zr9T48cdmcDSGBRaQImzwbjt6HmoUoKgGCMWNmwr66fjT7DdCUJtadRWIqv9aXNYAIvaMC7ezlXTmXEtaom/AAxF9MecIwd7axVnZQhwAHGHN2Ac= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720557154; c=relaxed/simple; bh=X541dEl79Yn7xvidTxrBKmxXUU7GWVN/DNRykcpVjO8=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=Hz2KyYW2K/gn8pzMI5lCatLEJcnSxv6rKDCu7mfGPeYdDGLqUHmfU5clzcZre2u5o0jMqehMC6hfETEBQYC9/1KqtydKcnG5uHKpP3oOUCAUORUKgyshxDRAAsRfsGjT4qZQ8ZOg/xVpfWbRnZ41/HT2NwWEGhgIrA0+fy/5oMg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=J33I4bmP; arc=none smtp.client-ip=209.85.128.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="J33I4bmP" Received: by mail-yw1-f202.google.com with SMTP id 00721157ae682-652c6e92147so93851177b3.3 for ; Tue, 09 Jul 2024 13:32:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1720557152; x=1721161952; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=HO7ngMSkWjPHLXmo+rqra2Oo+7w/UwqwG9JY6wVCFY0=; b=J33I4bmPO55haCHEhl7ssnCPZDyCEQ+8oC2qPr9GVEhtRbXcj95dLEwq1p0sMOp6Cv 5JqaHxrGTEIqkwvDOXtV0JHfYT14QMCqKHz0k+/HAFAtyty9tzuE3V6G0H0dV2GEtSOC ni/h/CmZhGaVKVAow20LOLRxNZvrJJHdKhP3coQXlIjvUfl4hUHiX0MT8sh100Gc2za6 EQGhlIsuYMBWhMwIZX8qJrFI1PrC1tquyZITVZC8F+3yEfU5KRzTGIRHinJKNcdEQZRJ N8EBo5SvnxxUS6zTYe/beFqo7nJRyynYApx1VLG6+HAm7mWwg/Um04c48wKbi7M4LN8B dd/w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1720557152; x=1721161952; 
Date: Tue, 9 Jul 2024 13:31:46 -0700
In-Reply-To: <20240709203213.799070-1-jstultz@google.com>
References: <20240709203213.799070-1-jstultz@google.com>
Message-ID: <20240709203213.799070-4-jstultz@google.com>
Subject: [PATCH v11 3/7] locking/mutex: Expose __mutex_owner()
From: John Stultz
To: LKML
Cc: Juri Lelli , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , kernel-team@android.com, Valentin Schneider , "Connor O'Brien" , John Stultz , Metin Kaya

From: Juri Lelli

Implementing proxy execution requires that scheduler code be able to identify the current owner of a mutex. Expose __mutex_owner() for this purpose (alone!). Includes a null mutex check, so that users of the function can be simplified.

Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E.
McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: kernel-team@android.com Signed-off-by: Juri Lelli [Removed the EXPORT_SYMBOL] Signed-off-by: Valentin Schneider Signed-off-by: Connor O'Brien [jstultz: Reworked per Peter's suggestions] Signed-off-by: John Stultz Tested-by: K Prateek Nayak Tested-by: Metin Kaya Reviewed-by: Metin Kaya Reviewed-by: Valentin Schneider --- v4: * Move __mutex_owner() to kernel/locking/mutex.h instead of adding a new globally available accessor function to keep the exposure of this low, along with keeping it an inline function, as suggested by PeterZ v10: * Handle null lock ptr, to simplify later code, as suggested by Metin Kaya v11: * Tweak commit message suggested by Metin Kaya --- kernel/locking/mutex.c | 25 ------------------------- kernel/locking/mutex.h | 27 +++++++++++++++++++++++++++ 2 files changed, 27 insertions(+), 25 deletions(-) diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 6d843a0978a5..4b7193fd3be9 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -56,31 +56,6 @@ __mutex_init(struct mutex *lock, const char *name, struc= t lock_class_key *key) } EXPORT_SYMBOL(__mutex_init); =20 -/* - * @owner: contains: 'struct task_struct *' to the current lock owner, - * NULL means not owned. Since task_struct pointers are aligned at - * at least L1_CACHE_BYTES, we have low bits to store extra state. - * - * Bit0 indicates a non-empty waiter list; unlock must issue a wakeup. - * Bit1 indicates unlock needs to hand the lock to the top-waiter - * Bit2 indicates handoff has been done and we're waiting for pickup. - */ -#define MUTEX_FLAG_WAITERS 0x01 -#define MUTEX_FLAG_HANDOFF 0x02 -#define MUTEX_FLAG_PICKUP 0x04 - -#define MUTEX_FLAGS 0x07 - -/* - * Internal helper function; C doesn't allow us to hide it :/ - * - * DO NOT USE (outside of mutex code). - */ -static inline struct task_struct *__mutex_owner(struct mutex *lock) -{ - return (struct task_struct *)(atomic_long_read(&lock->owner) & ~MUTEX_FLA= GS); -} - static inline struct task_struct *__owner_task(unsigned long owner) { return (struct task_struct *)(owner & ~MUTEX_FLAGS); diff --git a/kernel/locking/mutex.h b/kernel/locking/mutex.h index 0b2a79c4013b..cbff35b9b7ae 100644 --- a/kernel/locking/mutex.h +++ b/kernel/locking/mutex.h @@ -20,6 +20,33 @@ struct mutex_waiter { #endif }; =20 +/* + * @owner: contains: 'struct task_struct *' to the current lock owner, + * NULL means not owned. Since task_struct pointers are aligned at + * at least L1_CACHE_BYTES, we have low bits to store extra state. + * + * Bit0 indicates a non-empty waiter list; unlock must issue a wakeup. + * Bit1 indicates unlock needs to hand the lock to the top-waiter + * Bit2 indicates handoff has been done and we're waiting for pickup. + */ +#define MUTEX_FLAG_WAITERS 0x01 +#define MUTEX_FLAG_HANDOFF 0x02 +#define MUTEX_FLAG_PICKUP 0x04 + +#define MUTEX_FLAGS 0x07 + +/* + * Internal helper function; C doesn't allow us to hide it :/ + * + * DO NOT USE (outside of mutex & scheduler code). 
+ */ +static inline struct task_struct *__mutex_owner(struct mutex *lock) +{ + if (!lock) + return NULL; + return (struct task_struct *)(atomic_long_read(&lock->owner) & ~MUTEX_FLA= GS); +} + #ifdef CONFIG_DEBUG_MUTEXES extern void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter); --=20 2.45.2.993.g49e7a77208-goog From nobody Tue Nov 4 22:27:51 2025 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C8B4B185639 for ; Tue, 9 Jul 2024 20:32:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720557157; cv=none; b=atBUWVONEb8cJ8cP/+z5/yAHhUusu5E8JBLfHE9m1j4bNbet9u9X2awhwxGLxC99Tj9Ee5WR98s2uAp30UwDT3/2dcbj8IoXqjIllaLR7Z/0hY2dRN1ttrUg2YblgEnQvQJr52wJShVSTVwbLevODJ7KdIVXs4e+bYNIRuuQLeI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720557157; c=relaxed/simple; bh=oQTA8x65GQzVtig5DcSmr8Ibwy1iNnjZMuGwkCWZtXU=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=p9I+hxd2ZGI/Y1N5J0SFbIaNq8b6C2M0bi3CRCyXmF8hZ8fEiMS4smwR2qmX5m/D74bncYs8fcx+V5muTGkAHZGfgGTvzpi26LjsHfLB0dAsnl4/bK4jgj4JyceM4ybVbNxvvQXxTna3Tpq2fzOzvzyZF5dX9B7C+6bG1FiDesY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=WRBJ/T71; arc=none smtp.client-ip=209.85.216.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="WRBJ/T71" Received: by mail-pj1-f74.google.com with SMTP id 98e67ed59e1d1-2c1a9e8d3b0so5116937a91.0 for ; Tue, 09 Jul 2024 13:32:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1720557155; x=1721161955; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=SK4jF2z7xl5uoUAgslE/9jLcJAWwxsMsuk73k7nd2+0=; b=WRBJ/T71/bmxhqzlCuBznydgPqvt2TJHsKCxdgkM2qdivlftAxd1eFPDDIGduugl9Q X7pG/R3zrhsdKOMnLjD6a63+FlZVOben5hGUDcrPf7kL8XH7JxcxDDRZiyDHGIby2U6q CbQlIuj5SZ3wzZQbSI8wORhH2tIEYyiADBHZQ7KNPEs73mY95PHoWMWaKkSuLE9AJrCd 2xZXSresuRsVqnfb0H9EqgEsOcYIbiZHNEFSp8EAAsnjOTKhTfDmELP2KyIM7b/iV6ub MZQnktuXmSNs6fafUF0jx0OISkJurkXsd3Lj9gGaWg9EWjzJVOgW6I7d2QFHL/lC0Ozr TMEg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1720557155; x=1721161955; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=SK4jF2z7xl5uoUAgslE/9jLcJAWwxsMsuk73k7nd2+0=; b=Kp/Uh3AfDr80Z3Ex9WSuZgGIEFM+fXupc/1iBEQmChomCkv+97OdyJfZBtkPQwImff huOogw5HcQuz/2j+Og/qME2BBfq1/9j5wMpQC+EMQ1L0PTbh/LInt572beP+ALvwYaWv ZuBrhdNL+n3d4Sc5eaXV34cFRCSeETYYaKy7PgPSD9tCB1Cm/NDzPb9fLNnnjRUuadd+ WPgYymQrPpD8DkDwGSl7F23zyZpOkMBYqBbj69kQRJedeCQ6+lMstFSQLbSoTwAytsfS fIH+LFSUreqYSdV9OHr/+aoMmduJ3yrGBVxzUMGKNpVlxS2mIBEz4lFM8gzJpEovV7bx ErrA== X-Gm-Message-State: 
Date: Tue, 9 Jul 2024 13:31:47 -0700
In-Reply-To: <20240709203213.799070-1-jstultz@google.com>
References: <20240709203213.799070-1-jstultz@google.com>
Message-ID: <20240709203213.799070-5-jstultz@google.com>
Subject: [PATCH v11 4/7] sched: Add move_queued_task_locked helper
From: John Stultz
To: LKML
Cc: "Connor O'Brien" , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , kernel-team@android.com, John Stultz , Metin Kaya

From: Connor O'Brien

Switch logic that deactivates, sets the task cpu, and reactivates a task on a different rq to use a helper that will be later extended to push entire blocked task chains.

This patch was broken out from a larger chain migration patch originally by Connor O'Brien.

Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E.
McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: kernel-team@android.com Signed-off-by: Connor O'Brien [jstultz: split out from larger chain migration patch] Signed-off-by: John Stultz Tested-by: K Prateek Nayak Tested-by: Metin Kaya Reviewed-by: Metin Kaya Reviewed-by: Valentin Schneider Reviewed-by: Qais Yousef --- v8: * Renamed from push_task_chain to do_push_task so it makes more sense without proxy-execution v10: * Changed name to move_queued_task_locked as suggested by Valentin v11: * Also use new helper in __migrate_swap_task() and try_steal_cookie() as suggested by Qais Yousef * Nit cleanups suggested by Metin --- kernel/sched/core.c | 13 +++---------- kernel/sched/deadline.c | 8 ++------ kernel/sched/rt.c | 8 ++------ kernel/sched/sched.h | 12 ++++++++++++ 4 files changed, 19 insertions(+), 22 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index bcf2c4cc0522..5e63dbcbc1f0 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2712,9 +2712,7 @@ int push_cpu_stop(void *arg) =20 // XXX validate p is still the highest prio task if (task_rq(p) =3D=3D rq) { - deactivate_task(rq, p, 0); - set_task_cpu(p, lowest_rq->cpu); - activate_task(lowest_rq, p, 0); + move_queued_task_locked(rq, lowest_rq, p); resched_curr(lowest_rq); } =20 @@ -3414,9 +3412,7 @@ static void __migrate_swap_task(struct task_struct *p= , int cpu) rq_pin_lock(src_rq, &srf); rq_pin_lock(dst_rq, &drf); =20 - deactivate_task(src_rq, p, 0); - set_task_cpu(p, cpu); - activate_task(dst_rq, p, 0); + move_queued_task_locked(src_rq, dst_rq, p); wakeup_preempt(dst_rq, p, 0); =20 rq_unpin_lock(dst_rq, &drf); @@ -6369,10 +6365,7 @@ static bool try_steal_cookie(int this, int that) if (sched_task_is_throttled(p, this)) goto next; =20 - deactivate_task(src, p, 0); - set_task_cpu(p, this); - activate_task(dst, p, 0); - + move_queued_task_locked(src, dst, p); resched_curr(dst); =20 success =3D true; diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index c75d1307d86d..3e05e239f5f6 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -2442,9 +2442,7 @@ static int push_dl_task(struct rq *rq) goto retry; } =20 - deactivate_task(rq, next_task, 0); - set_task_cpu(next_task, later_rq->cpu); - activate_task(later_rq, next_task, 0); + move_queued_task_locked(rq, later_rq, next_task); ret =3D 1; =20 resched_curr(later_rq); @@ -2530,9 +2528,7 @@ static void pull_dl_task(struct rq *this_rq) if (is_migration_disabled(p)) { push_task =3D get_push_task(src_rq); } else { - deactivate_task(src_rq, p, 0); - set_task_cpu(p, this_cpu); - activate_task(this_rq, p, 0); + move_queued_task_locked(src_rq, this_rq, p); dmin =3D p->dl.deadline; resched =3D true; } diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index aa4c1c874fa4..55bb33f367ad 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -2105,9 +2105,7 @@ static int push_rt_task(struct rq *rq, bool pull) goto retry; } =20 - deactivate_task(rq, next_task, 0); - set_task_cpu(next_task, lowest_rq->cpu); - activate_task(lowest_rq, next_task, 0); + move_queued_task_locked(rq, lowest_rq, next_task); resched_curr(lowest_rq); ret =3D 1; =20 @@ -2378,9 +2376,7 @@ static void pull_rt_task(struct rq *this_rq) if (is_migration_disabled(p)) { push_task =3D get_push_task(src_rq); } else { - deactivate_task(src_rq, p, 0); - set_task_cpu(p, this_cpu); - activate_task(this_rq, p, 0); + move_queued_task_locked(src_rq, this_rq, p); resched =3D true; } /* diff --git a/kernel/sched/sched.h 
b/kernel/sched/sched.h index a831af102070..a24286059a6a 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -3479,5 +3479,17 @@ static inline void init_sched_mm_cid(struct task_str= uct *t) { } =20 extern u64 avg_vruntime(struct cfs_rq *cfs_rq); extern int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se); +#ifdef CONFIG_SMP +static inline +void move_queued_task_locked(struct rq *src_rq, struct rq *dst_rq, struct = task_struct *task) +{ + lockdep_assert_rq_held(src_rq); + lockdep_assert_rq_held(dst_rq); + + deactivate_task(src_rq, task, 0); + set_task_cpu(task, dst_rq->cpu); + activate_task(dst_rq, task, 0); +} +#endif =20 #endif /* _KERNEL_SCHED_SCHED_H */ --=20 2.45.2.993.g49e7a77208-goog From nobody Tue Nov 4 22:27:52 2025 Received: from mail-pl1-f202.google.com (mail-pl1-f202.google.com [209.85.214.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C2BE718785B for ; Tue, 9 Jul 2024 20:32:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720557159; cv=none; b=soDFIXegG2WdQej12ms0ONm6oj39cPP1IjAbzQiL7qq6q1ILZZHXrdr5XtYigHwAbBK0QtBw444bqF45SX7gkU5uSq1VaIep23mk/bQj6nLk3YnwAx5XTbYf7wYwHXPdM7GOPvzzDLVApQC4SC4ACnn1oz23FkEceBvOvDYa7oY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720557159; c=relaxed/simple; bh=V1PF08pJ+tLWUnbiVFTp/QGQhy5yYg7gQLXFb/OQcQ4=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=uLeR5nDPS21p/QsBpwwrydhFBN4x+hsKfTACqoP71LLV8H9Qig6f62mAkZNLsJAYv4898Bq4jmDHaDxHLWL5plw77YDFRd58oaE8npKYdSEz3V1WOWgWZlL6yT1yzIaWhmv7zsIIJFxkb9KenJFNui9IttQeGBBaQR1ESy0dqI0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=H04tlgh8; arc=none smtp.client-ip=209.85.214.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="H04tlgh8" Received: by mail-pl1-f202.google.com with SMTP id d9443c01a7336-1fb2ee83169so42772435ad.3 for ; Tue, 09 Jul 2024 13:32:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1720557157; x=1721161957; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=q6xxvV2bWwwSWDlU+VJFmJ+e4u0BCl9LU2uMRtJJq7g=; b=H04tlgh8975QOJTdnCyA0433UcmoNXsf+8jlnpoW9oOmY0K8bU88MtdxPYwQ7LJ86I ///aCSorh93PC5EiDW890bECBIepHIngGk6mU3zygbeMrM6dsjEhBCxtena6msKIAnj0 lMf30+qT49MyqTEJBvmxMF16dxJXNIHF0AUk3Ne6uEJNTLiohGneiHs27tGCI8R/acje QAtItwSNKo7v+vuSuTTHwmD1Co6lQh9zfInZGBEKJ/36BIaYqqltNlGZr6ljZub2eAVp vDsfojcHffXRaWZV3qiv4mZUJhOPBShSEbgVw/TpluhV0Lc1tfx6IjTnnGEDbHmAPEuF XfIw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1720557157; x=1721161957; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; 
Date: Tue, 9 Jul 2024 13:31:48 -0700
In-Reply-To: <20240709203213.799070-1-jstultz@google.com>
References: <20240709203213.799070-1-jstultz@google.com>
Message-ID: <20240709203213.799070-6-jstultz@google.com>
Subject: [PATCH v11 5/7] sched: Consolidate pick_*_task to task_is_pushable helper
From: John Stultz
To: LKML
Cc: "Connor O'Brien" , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , kernel-team@android.com, John Stultz , Metin Kaya

From: Connor O'Brien

This patch consolidates the rt and deadline pick_*_task functions into a task_is_pushable() helper.

This patch was broken out from a larger chain migration patch originally by Connor O'Brien.

Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: kernel-team@android.com
Signed-off-by: Connor O'Brien
[jstultz: split out from larger chain migration patch, renamed helper function]
Signed-off-by: John Stultz
Tested-by: K Prateek Nayak
Tested-by: Metin Kaya
Reviewed-by: Metin Kaya
Reviewed-by: Valentin Schneider
---
v7:
* Split from chain migration patch
* Renamed function
v11:
* Switched to bool (though later in the series it goes to a tri-state return) for now to simplify review. Will add tri-state handling later in the series when it's needed. Suggested by Metin and others.
--- kernel/sched/deadline.c | 10 +--------- kernel/sched/rt.c | 11 +---------- kernel/sched/sched.h | 9 +++++++++ 3 files changed, 11 insertions(+), 19 deletions(-) diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 3e05e239f5f6..ef135776e068 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -2178,14 +2178,6 @@ static void task_fork_dl(struct task_struct *p) /* Only try algorithms three times */ #define DL_MAX_TRIES 3 =20 -static int pick_dl_task(struct rq *rq, struct task_struct *p, int cpu) -{ - if (!task_on_cpu(rq, p) && - cpumask_test_cpu(cpu, &p->cpus_mask)) - return 1; - return 0; -} - /* * Return the earliest pushable rq's task, which is suitable to be executed * on the CPU, NULL otherwise: @@ -2204,7 +2196,7 @@ static struct task_struct *pick_earliest_pushable_dl_= task(struct rq *rq, int cpu if (next_node) { p =3D __node_2_pdl(next_node); =20 - if (pick_dl_task(rq, p, cpu)) + if (task_is_pushable(rq, p, cpu)) return p; =20 next_node =3D rb_next(next_node); diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 55bb33f367ad..56363e18949a 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -1790,15 +1790,6 @@ static void put_prev_task_rt(struct rq *rq, struct t= ask_struct *p) /* Only try algorithms three times */ #define RT_MAX_TRIES 3 =20 -static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu) -{ - if (!task_on_cpu(rq, p) && - cpumask_test_cpu(cpu, &p->cpus_mask)) - return 1; - - return 0; -} - /* * Return the highest pushable rq's task, which is suitable to be executed * on the CPU, NULL otherwise @@ -1812,7 +1803,7 @@ static struct task_struct *pick_highest_pushable_task= (struct rq *rq, int cpu) return NULL; =20 plist_for_each_entry(p, head, pushable_tasks) { - if (pick_rt_task(rq, p, cpu)) + if (task_is_pushable(rq, p, cpu)) return p; } =20 diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index a24286059a6a..493de4cc320a 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -3490,6 +3490,15 @@ void move_queued_task_locked(struct rq *src_rq, stru= ct rq *dst_rq, struct task_s set_task_cpu(task, dst_rq->cpu); activate_task(dst_rq, task, 0); } + +static inline +bool task_is_pushable(struct rq *rq, struct task_struct *p, int cpu) +{ + if (!task_on_cpu(rq, p) && + cpumask_test_cpu(cpu, &p->cpus_mask)) + return true; + return false; +} #endif =20 #endif /* _KERNEL_SCHED_SCHED_H */ --=20 2.45.2.993.g49e7a77208-goog From nobody Tue Nov 4 22:27:52 2025 Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 08B2A18787A for ; Tue, 9 Jul 2024 20:32:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720557161; cv=none; b=sk8U1HU8Ll2YPKS/MIJ8NreAizH4hfP5WSmRiliTKfQk4vpr5+GC8pEpI5r1Tvnbx8WqVxndcqHUlXMZ8vmqcVlsKjzP+Ja8M5uqszmxMYAPLoDuRVR/u9FJPX/kzdoQxRrfaWwt//ncj0Mt/D7CWWRaIGbGWP/uyxBAW5swaIs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720557161; c=relaxed/simple; bh=OdboL8GRjZ9HaNPDORv6f99DqVqCLARn8M9yns8iW10=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=NZotO+NYPBwnVLgEQLtpkL8cv1S0W2VUEwZzhdANWDsKeGt0p1gNA1D7TNXI2NAZv0YmyinvDCM9ClJ0o/SYcTrj66oWj5pQVwY/jUHdvIs0Fz2o0H58ZgU0BqUYQKOlnVglZ9X/SlyuzKc1DRmNk+Js1lbpvILrDRNuUxQvWJ0= 
Date: Tue, 9 Jul 2024 13:31:49 -0700
In-Reply-To: <20240709203213.799070-1-jstultz@google.com>
References: <20240709203213.799070-1-jstultz@google.com>
Message-ID: <20240709203213.799070-7-jstultz@google.com>
Subject: [PATCH v11 6/7] sched: Split out __schedule() deactivate task logic into a helper
From: John Stultz
To: LKML
Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra,
 Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
 Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Youssef Esmat, Mel Gorman,
 Will Deacon, Waiman Long, Boqun Feng, "Paul E.
McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner,
 Daniel Lezcano, kernel-team@android.com, Metin Kaya
Content-Type: text/plain; charset="utf-8"

As we're going to re-use the deactivation logic, split it into a helper.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Youssef Esmat
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: kernel-team@android.com
Signed-off-by: John Stultz
Tested-by: K Prateek Nayak
Tested-by: Metin Kaya
Reviewed-by: Metin Kaya
Reviewed-by: Qais Yousef
---
v6:
 * Define the function as static to avoid "no previous prototype" warnings,
   as Reported-by: kernel test robot
v7:
 * Renamed the state argument to task_state to be clearer, as suggested by
   Metin Kaya
v11:
 * Return early to simplify indentation, and drop the unused bool return
   (it will be introduced later in the series when it's needed), as
   suggested by Qais.
---
 kernel/sched/core.c | 71 +++++++++++++++++++++++++++------------------
 1 file changed, 42 insertions(+), 29 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5e63dbcbc1f0..029e7ecf5ea9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6566,6 +6566,47 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 # define SM_MASK_PREEMPT	SM_PREEMPT
 #endif
 
+/*
+ * Helper function for __schedule()
+ *
+ * If a task does not have signals pending, deactivate it
+ * Otherwise marks the task's __state as RUNNING
+ */
+static void try_to_deactivate_task(struct rq *rq, struct task_struct *p,
+				   unsigned long task_state)
+{
+	if (signal_pending_state(task_state, p)) {
+		WRITE_ONCE(p->__state, TASK_RUNNING);
+		return;
+	}
+
+	p->sched_contributes_to_load =
+		(task_state & TASK_UNINTERRUPTIBLE) &&
+		!(task_state & TASK_NOLOAD) &&
+		!(task_state & TASK_FROZEN);
+
+	if (p->sched_contributes_to_load)
+		rq->nr_uninterruptible++;
+
+	/*
+	 * __schedule()			ttwu()
+	 *   prev_state = prev->state;    if (p->on_rq && ...)
+	 *   if (prev_state)		    goto out;
+	 *   p->on_rq = 0;		  smp_acquire__after_ctrl_dep();
+	 *				  p->state = TASK_WAKING
+	 *
+	 * Where __schedule() and ttwu() have matching control dependencies.
+	 *
+	 * After this, schedule() must not care about p->state any more.
+	 */
+	deactivate_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK);
+
+	if (p->in_iowait) {
+		atomic_inc(&rq->nr_iowait);
+		delayacct_blkio_start();
+	}
+}
+
 /*
  * __schedule() is the main scheduler function.
  *
@@ -6659,35 +6700,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 	 */
 	prev_state = READ_ONCE(prev->__state);
 	if (!(sched_mode & SM_MASK_PREEMPT) && prev_state) {
-		if (signal_pending_state(prev_state, prev)) {
-			WRITE_ONCE(prev->__state, TASK_RUNNING);
-		} else {
-			prev->sched_contributes_to_load =
-				(prev_state & TASK_UNINTERRUPTIBLE) &&
-				!(prev_state & TASK_NOLOAD) &&
-				!(prev_state & TASK_FROZEN);
-
-			if (prev->sched_contributes_to_load)
-				rq->nr_uninterruptible++;
-
-			/*
-			 * __schedule()			ttwu()
-			 *   prev_state = prev->state;    if (p->on_rq && ...)
-			 *   if (prev_state)		    goto out;
-			 *   p->on_rq = 0;		  smp_acquire__after_ctrl_dep();
-			 *				  p->state = TASK_WAKING
-			 *
-			 * Where __schedule() and ttwu() have matching control dependencies.
-			 *
-			 * After this, schedule() must not care about p->state any more.
-			 */
-			deactivate_task(rq, prev, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK);
-
-			if (prev->in_iowait) {
-				atomic_inc(&rq->nr_iowait);
-				delayacct_blkio_start();
-			}
-		}
+		try_to_deactivate_task(rq, prev, prev_state);
 		switch_count = &prev->nvcsw;
 	}
 
-- 
2.45.2.993.g49e7a77208-goog
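To see what the factored-out logic does in isolation, here is a rough
userspace model of the flow the new helper captures: a task that reaches
__schedule() with a signal pending is put back to TASK_RUNNING and stays
queued; otherwise it is dequeued and, if it was in an uninterruptible
sleep, counted toward the load average, with io-wait accounting started
when applicable. The structs, counters, and printf below are illustrative
stand-ins, not the kernel's runqueue, deactivate_task(), or delayacct
implementation.

  #include <stdbool.h>
  #include <stdio.h>

  enum { TASK_RUNNING = 0, TASK_INTERRUPTIBLE = 1, TASK_UNINTERRUPTIBLE = 2 };

  struct rq   { int nr_uninterruptible; int nr_queued; };
  struct task { int state; bool signal_pending; bool in_iowait; };

  static void try_to_deactivate_task(struct rq *rq, struct task *p, int task_state)
  {
          if (p->signal_pending) {
                  p->state = TASK_RUNNING;        /* keep it on the runqueue */
                  return;
          }
          if (task_state == TASK_UNINTERRUPTIBLE)
                  rq->nr_uninterruptible++;       /* contributes to loadavg */
          rq->nr_queued--;                        /* models deactivate_task() */
          if (p->in_iowait)
                  printf("start iowait accounting\n"); /* models delayacct_blkio_start() */
  }

  int main(void)
  {
          struct rq rq = { .nr_uninterruptible = 0, .nr_queued = 1 };
          struct task p = { .state = TASK_UNINTERRUPTIBLE, .in_iowait = true };

          try_to_deactivate_task(&rq, &p, p.state);
          printf("queued=%d uninterruptible=%d\n", rq.nr_queued, rq.nr_uninterruptible);
          return 0;
  }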
From nobody Tue Nov 4 22:27:52 2025
Date: Tue, 9 Jul 2024 13:31:50 -0700
In-Reply-To: <20240709203213.799070-1-jstultz@google.com>
References: <20240709203213.799070-1-jstultz@google.com>
Message-ID: <20240709203213.799070-8-jstultz@google.com>
Subject: [PATCH v11 7/7] sched: Split scheduler and execution contexts
From: John Stultz
To: LKML
Cc: Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli,
 Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt,
 Ben Segall, Zimuzo Ezeozue, Youssef Esmat, Mel Gorman, Will Deacon,
 Waiman Long, Boqun Feng, "Paul E. McKenney", Xuewen Yan, K Prateek Nayak,
 Metin Kaya, Thomas Gleixner, Daniel Lezcano, kernel-team@android.com,
 "Connor O'Brien", John Stultz, Metin Kaya
Content-Type: text/plain; charset="utf-8"

From: Peter Zijlstra

Let's define the scheduling context as all the scheduler state in
task_struct for the task selected to run, and the execution context as
all the state required to actually run that task.

Currently both are intertwined in task_struct. We want to logically
split these such that we can use the scheduling context of the task
selected to be scheduled while using the execution context of a
different task to actually be run.

To this end, introduce the rq_selected() macro to point to the
task_struct selected from the runqueue by the scheduler. It will be
used for scheduler state, while rq->curr is preserved to indicate the
execution context of the task that will actually be run.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Youssef Esmat
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Metin Kaya
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: kernel-team@android.com
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Juri Lelli
Signed-off-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20181009092434.26221-5-juri.lelli@redhat.com
[add additional comments and update more sched_class code to use rq::proxy]
Signed-off-by: Connor O'Brien
[jstultz: Rebased and resolved minor collisions, reworked to use accessors,
 tweaked update_curr_common to use rq_proxy fixing rt scheduling issues]
Signed-off-by: John Stultz
Tested-by: K Prateek Nayak
Tested-by: Metin Kaya
Reviewed-by: Metin Kaya
---
v2:
 * Reworked to use accessors
 * Fixed update_curr_common to use proxy instead of curr
v3:
 * Tweaked wrapper names
 * Swapped proxy for selected for clarity
v4:
 * Minor variable name tweaks for readability
 * Use a macro instead of an inline function and drop other helper functions,
   as suggested by Peter.
* Remove verbose comments/questions to avoid review distractions, as suggested by Dietmar v5: * Add CONFIG_PROXY_EXEC option to this patch so the new logic can be tested with this change * Minor fix to grab rq_selected when holding the rq lock v7: * Minor spelling fix and unused argument fixes suggested by Metin Kaya * Switch to curr_selected for consistency, and minor rewording of commit message for clarity * Rename variables selected instead of curr when we're using rq_selected() * Reduce macros in CONFIG_SCHED_PROXY_EXEC ifdef sections, as suggested by Metin Kaya v8: * Use rq->curr, not rq_selected with task_tick, as suggested by Valentin * Minor rework to reorder this with CONFIG_SCHED_PROXY_EXEC patch v10: * Use rq_selected in push_rt_task & get_push_task v11: * Rework to use selected instead of curr in a few cases we were previously assigning curr =3D rq_selected() to minimize lines of change. Suggested by Metin. --- kernel/sched/core.c | 46 ++++++++++++++++++++++++--------------- kernel/sched/deadline.c | 39 +++++++++++++++++---------------- kernel/sched/fair.c | 32 +++++++++++++-------------- kernel/sched/rt.c | 48 ++++++++++++++++++++--------------------- kernel/sched/sched.h | 27 ++++++++++++++++++++--- 5 files changed, 113 insertions(+), 79 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 029e7ecf5ea9..17036bae4a27 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -794,7 +794,7 @@ static enum hrtimer_restart hrtick(struct hrtimer *time= r) =20 rq_lock(rq, &rf); update_rq_clock(rq); - rq->curr->sched_class->task_tick(rq, rq->curr, 1); + rq_selected(rq)->sched_class->task_tick(rq, rq->curr, 1); rq_unlock(rq, &rf); =20 return HRTIMER_NORESTART; @@ -2236,16 +2236,18 @@ static inline void check_class_changed(struct rq *r= q, struct task_struct *p, =20 void wakeup_preempt(struct rq *rq, struct task_struct *p, int flags) { - if (p->sched_class =3D=3D rq->curr->sched_class) - rq->curr->sched_class->wakeup_preempt(rq, p, flags); - else if (sched_class_above(p->sched_class, rq->curr->sched_class)) + struct task_struct *selected =3D rq_selected(rq); + + if (p->sched_class =3D=3D selected->sched_class) + selected->sched_class->wakeup_preempt(rq, p, flags); + else if (sched_class_above(p->sched_class, selected->sched_class)) resched_curr(rq); =20 /* * A queue event has occurred, and we're going to schedule. In * this case, we can save a useless back to back clock update. */ - if (task_on_rq_queued(rq->curr) && test_tsk_need_resched(rq->curr)) + if (task_on_rq_queued(selected) && test_tsk_need_resched(rq->curr)) rq_clock_skip_update(rq); } =20 @@ -2772,7 +2774,7 @@ __do_set_cpus_allowed(struct task_struct *p, struct a= ffinity_context *ctx) lockdep_assert_held(&p->pi_lock); =20 queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); =20 if (queued) { /* @@ -5593,7 +5595,7 @@ unsigned long long task_sched_runtime(struct task_str= uct *p) * project cycles that may never be accounted to this * thread, breaking clock_gettime(). 
*/ - if (task_current(rq, p) && task_on_rq_queued(p)) { + if (task_current_selected(rq, p) && task_on_rq_queued(p)) { prefetch_curr_exec_start(p); update_rq_clock(rq); p->sched_class->update_curr(rq); @@ -5661,7 +5663,8 @@ void sched_tick(void) { int cpu =3D smp_processor_id(); struct rq *rq =3D cpu_rq(cpu); - struct task_struct *curr =3D rq->curr; + /* accounting goes to the selected task */ + struct task_struct *selected; struct rq_flags rf; unsigned long hw_pressure; u64 resched_latency; @@ -5672,16 +5675,17 @@ void sched_tick(void) sched_clock_tick(); =20 rq_lock(rq, &rf); + selected =3D rq_selected(rq); =20 update_rq_clock(rq); hw_pressure =3D arch_scale_hw_pressure(cpu_of(rq)); update_hw_load_avg(rq_clock_task(rq), rq, hw_pressure); - curr->sched_class->task_tick(rq, curr, 0); + selected->sched_class->task_tick(rq, selected, 0); if (sched_feat(LATENCY_WARN)) resched_latency =3D cpu_resched_latency(rq); calc_global_load_tick(rq); sched_core_tick(rq); - task_tick_mm_cid(rq, curr); + task_tick_mm_cid(rq, selected); =20 rq_unlock(rq, &rf); =20 @@ -5690,8 +5694,8 @@ void sched_tick(void) =20 perf_event_task_tick(); =20 - if (curr->flags & PF_WQ_WORKER) - wq_worker_tick(curr); + if (selected->flags & PF_WQ_WORKER) + wq_worker_tick(selected); =20 #ifdef CONFIG_SMP rq->idle_balance =3D idle_cpu(cpu); @@ -5756,6 +5760,12 @@ static void sched_tick_remote(struct work_struct *wo= rk) struct task_struct *curr =3D rq->curr; =20 if (cpu_online(cpu)) { + /* + * Since this is a remote tick for full dynticks mode, + * we are always sure that there is no proxy (only a + * single task is running). + */ + SCHED_WARN_ON(rq->curr !=3D rq_selected(rq)); update_rq_clock(rq); =20 if (!is_idle_task(curr)) { @@ -6705,6 +6715,7 @@ static void __sched notrace __schedule(unsigned int s= ched_mode) } =20 next =3D pick_next_task(rq, prev, &rf); + rq_set_selected(rq, next); clear_tsk_need_resched(prev); clear_preempt_need_resched(); #ifdef CONFIG_SCHED_DEBUG @@ -7215,7 +7226,7 @@ void rt_mutex_setprio(struct task_struct *p, struct t= ask_struct *pi_task) =20 prev_class =3D p->sched_class; queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); if (queued) dequeue_task(rq, p, queue_flag); if (running) @@ -7305,7 +7316,7 @@ void set_user_nice(struct task_struct *p, long nice) } =20 queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); if (queued) dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK); if (running) @@ -7884,7 +7895,7 @@ static int __sched_setscheduler(struct task_struct *p, } =20 queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); if (queued) dequeue_task(rq, p, queue_flags); if (running) @@ -9311,6 +9322,7 @@ void __init init_idle(struct task_struct *idle, int c= pu) rcu_read_unlock(); =20 rq->idle =3D idle; + rq_set_selected(rq, idle); rcu_assign_pointer(rq->curr, idle); idle->on_rq =3D TASK_ON_RQ_QUEUED; #ifdef CONFIG_SMP @@ -9400,7 +9412,7 @@ void sched_setnuma(struct task_struct *p, int nid) =20 rq =3D task_rq_lock(p, &rf); queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); =20 if (queued) dequeue_task(rq, p, DEQUEUE_SAVE); @@ -10505,7 +10517,7 @@ void sched_move_task(struct task_struct *tsk) =20 update_rq_clock(rq); =20 - running =3D task_current(rq, tsk); + running =3D task_current_selected(rq, tsk); queued =3D task_on_rq_queued(tsk); =20 if (queued) diff --git 
a/kernel/sched/deadline.c b/kernel/sched/deadline.c index ef135776e068..dbfa14ff16ed 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -1217,7 +1217,7 @@ static enum hrtimer_restart dl_task_timer(struct hrti= mer *timer) #endif =20 enqueue_task_dl(rq, p, ENQUEUE_REPLENISH); - if (dl_task(rq->curr)) + if (dl_task(rq_selected(rq))) wakeup_preempt_dl(rq, p, 0); else resched_curr(rq); @@ -1441,11 +1441,11 @@ void dl_server_init(struct sched_dl_entity *dl_se, = struct rq *rq, */ static void update_curr_dl(struct rq *rq) { - struct task_struct *curr =3D rq->curr; - struct sched_dl_entity *dl_se =3D &curr->dl; + struct task_struct *selected =3D rq_selected(rq); + struct sched_dl_entity *dl_se =3D &selected->dl; s64 delta_exec; =20 - if (!dl_task(curr) || !on_dl_rq(dl_se)) + if (!dl_task(selected) || !on_dl_rq(dl_se)) return; =20 /* @@ -1898,7 +1898,7 @@ static int find_later_rq(struct task_struct *task); static int select_task_rq_dl(struct task_struct *p, int cpu, int flags) { - struct task_struct *curr; + struct task_struct *curr, *selected; bool select_rq; struct rq *rq; =20 @@ -1909,6 +1909,7 @@ select_task_rq_dl(struct task_struct *p, int cpu, int= flags) =20 rcu_read_lock(); curr =3D READ_ONCE(rq->curr); /* unlocked access */ + selected =3D READ_ONCE(rq_selected(rq)); =20 /* * If we are dealing with a -deadline task, we must @@ -1919,9 +1920,9 @@ select_task_rq_dl(struct task_struct *p, int cpu, int= flags) * other hand, if it has a shorter deadline, we * try to make it stay here, it might be important. */ - select_rq =3D unlikely(dl_task(curr)) && + select_rq =3D unlikely(dl_task(selected)) && (curr->nr_cpus_allowed < 2 || - !dl_entity_preempt(&p->dl, &curr->dl)) && + !dl_entity_preempt(&p->dl, &selected->dl)) && p->nr_cpus_allowed > 1; =20 /* @@ -1984,7 +1985,7 @@ static void check_preempt_equal_dl(struct rq *rq, str= uct task_struct *p) * let's hope p can move out. */ if (rq->curr->nr_cpus_allowed =3D=3D 1 || - !cpudl_find(&rq->rd->cpudl, rq->curr, NULL)) + !cpudl_find(&rq->rd->cpudl, rq_selected(rq), NULL)) return; =20 /* @@ -2023,7 +2024,7 @@ static int balance_dl(struct rq *rq, struct task_stru= ct *p, struct rq_flags *rf) static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p, int flags) { - if (dl_entity_preempt(&p->dl, &rq->curr->dl)) { + if (dl_entity_preempt(&p->dl, &rq_selected(rq)->dl)) { resched_curr(rq); return; } @@ -2033,7 +2034,7 @@ static void wakeup_preempt_dl(struct rq *rq, struct t= ask_struct *p, * In the unlikely case current and p have the same deadline * let us try to decide what's the best thing to do... */ - if ((p->dl.deadline =3D=3D rq->curr->dl.deadline) && + if ((p->dl.deadline =3D=3D rq_selected(rq)->dl.deadline) && !test_tsk_need_resched(rq->curr)) check_preempt_equal_dl(rq, p); #endif /* CONFIG_SMP */ @@ -2065,7 +2066,7 @@ static void set_next_task_dl(struct rq *rq, struct ta= sk_struct *p, bool first) if (!first) return; =20 - if (rq->curr->sched_class !=3D &dl_sched_class) + if (rq_selected(rq)->sched_class !=3D &dl_sched_class) update_dl_rq_load_avg(rq_clock_pelt(rq), rq, 0); =20 deadline_queue_push_tasks(rq); @@ -2390,8 +2391,8 @@ static int push_dl_task(struct rq *rq) * can move away, it makes sense to just reschedule * without going further in pushing next_task. 
*/ - if (dl_task(rq->curr) && - dl_time_before(next_task->dl.deadline, rq->curr->dl.deadline) && + if (dl_task(rq_selected(rq)) && + dl_time_before(next_task->dl.deadline, rq_selected(rq)->dl.deadline) = && rq->curr->nr_cpus_allowed > 1) { resched_curr(rq); return 0; @@ -2514,7 +2515,7 @@ static void pull_dl_task(struct rq *this_rq) * deadline than the current task of its runqueue. */ if (dl_time_before(p->dl.deadline, - src_rq->curr->dl.deadline)) + rq_selected(src_rq)->dl.deadline)) goto skip; =20 if (is_migration_disabled(p)) { @@ -2553,9 +2554,9 @@ static void task_woken_dl(struct rq *rq, struct task_= struct *p) if (!task_on_cpu(rq, p) && !test_tsk_need_resched(rq->curr) && p->nr_cpus_allowed > 1 && - dl_task(rq->curr) && + dl_task(rq_selected(rq)) && (rq->curr->nr_cpus_allowed < 2 || - !dl_entity_preempt(&p->dl, &rq->curr->dl))) { + !dl_entity_preempt(&p->dl, &rq_selected(rq)->dl))) { push_dl_tasks(rq); } } @@ -2730,12 +2731,12 @@ static void switched_to_dl(struct rq *rq, struct ta= sk_struct *p) return; } =20 - if (rq->curr !=3D p) { + if (rq_selected(rq) !=3D p) { #ifdef CONFIG_SMP if (p->nr_cpus_allowed > 1 && rq->dl.overloaded) deadline_queue_push_tasks(rq); #endif - if (dl_task(rq->curr)) + if (dl_task(rq_selected(rq))) wakeup_preempt_dl(rq, p, 0); else resched_curr(rq); @@ -2764,7 +2765,7 @@ static void prio_changed_dl(struct rq *rq, struct tas= k_struct *p, if (!rq->dl.overloaded) deadline_queue_pull_task(rq); =20 - if (task_current(rq, p)) { + if (task_current_selected(rq, p)) { /* * If we now have a earlier deadline task than p, * then reschedule, provided p is still on this diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 8a5b1ae0aa55..4d0d3b423220 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1140,12 +1140,12 @@ static inline void update_curr_task(struct task_str= uct *p, s64 delta_exec) */ s64 update_curr_common(struct rq *rq) { - struct task_struct *curr =3D rq->curr; + struct task_struct *selected =3D rq_selected(rq); s64 delta_exec; =20 - delta_exec =3D update_curr_se(rq, &curr->se); + delta_exec =3D update_curr_se(rq, &selected->se); if (likely(delta_exec > 0)) - update_curr_task(curr, delta_exec); + update_curr_task(selected, delta_exec); =20 return delta_exec; } @@ -1177,7 +1177,7 @@ static void update_curr(struct cfs_rq *cfs_rq) =20 static void update_curr_fair(struct rq *rq) { - update_curr(cfs_rq_of(&rq->curr->se)); + update_curr(cfs_rq_of(&rq_selected(rq)->se)); } =20 static inline void @@ -6646,7 +6646,7 @@ static void hrtick_start_fair(struct rq *rq, struct t= ask_struct *p) s64 delta =3D slice - ran; =20 if (delta < 0) { - if (task_current(rq, p)) + if (task_current_selected(rq, p)) resched_curr(rq); return; } @@ -6661,12 +6661,12 @@ static void hrtick_start_fair(struct rq *rq, struct= task_struct *p) */ static void hrtick_update(struct rq *rq) { - struct task_struct *curr =3D rq->curr; + struct task_struct *selected =3D rq_selected(rq); =20 - if (!hrtick_enabled_fair(rq) || curr->sched_class !=3D &fair_sched_class) + if (!hrtick_enabled_fair(rq) || selected->sched_class !=3D &fair_sched_cl= ass) return; =20 - hrtick_start_fair(rq, curr); + hrtick_start_fair(rq, selected); } #else /* !CONFIG_SCHED_HRTICK */ static inline void @@ -8348,9 +8348,9 @@ static void set_next_buddy(struct sched_entity *se) */ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p= , int wake_flags) { - struct task_struct *curr =3D rq->curr; - struct sched_entity *se =3D &curr->se, *pse =3D &p->se; - struct cfs_rq *cfs_rq =3D 
task_cfs_rq(curr); + struct task_struct *selected =3D rq_selected(rq); + struct sched_entity *se =3D &selected->se, *pse =3D &p->se; + struct cfs_rq *cfs_rq =3D task_cfs_rq(selected); int cse_is_idle, pse_is_idle; =20 if (unlikely(se =3D=3D pse)) @@ -8379,11 +8379,11 @@ static void check_preempt_wakeup_fair(struct rq *rq= , struct task_struct *p, int * prevents us from potentially nominating it as a false LAST_BUDDY * below. */ - if (test_tsk_need_resched(curr)) + if (test_tsk_need_resched(rq->curr)) return; =20 /* Idle tasks are by definition preempted by non-idle tasks. */ - if (unlikely(task_has_idle_policy(curr)) && + if (unlikely(task_has_idle_policy(selected)) && likely(!task_has_idle_policy(p))) goto preempt; =20 @@ -9361,7 +9361,7 @@ static bool __update_blocked_others(struct rq *rq, bo= ol *done) * update_load_avg() can call cpufreq_update_util(). Make sure that RT, * DL and IRQ signals have been updated before updating CFS. */ - curr_class =3D rq->curr->sched_class; + curr_class =3D rq_selected(rq)->sched_class; =20 hw_pressure =3D arch_scale_hw_pressure(cpu_of(rq)); =20 @@ -12738,7 +12738,7 @@ prio_changed_fair(struct rq *rq, struct task_struct= *p, int oldprio) * our priority decreased, or if we are not currently running on * this runqueue and our priority is higher than the current's */ - if (task_current(rq, p)) { + if (task_current_selected(rq, p)) { if (p->prio > oldprio) resched_curr(rq); } else @@ -12843,7 +12843,7 @@ static void switched_to_fair(struct rq *rq, struct = task_struct *p) * kick off the schedule if running, otherwise just see * if we can still preempt the current task. */ - if (task_current(rq, p)) + if (task_current_selected(rq, p)) resched_curr(rq); else wakeup_preempt(rq, p, 0); diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 56363e18949a..da4cbd744fe6 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -529,7 +529,7 @@ static void dequeue_rt_entity(struct sched_rt_entity *r= t_se, unsigned int flags) =20 static void sched_rt_rq_enqueue(struct rt_rq *rt_rq) { - struct task_struct *curr =3D rq_of_rt_rq(rt_rq)->curr; + struct task_struct *selected =3D rq_selected(rq_of_rt_rq(rt_rq)); struct rq *rq =3D rq_of_rt_rq(rt_rq); struct sched_rt_entity *rt_se; =20 @@ -543,7 +543,7 @@ static void sched_rt_rq_enqueue(struct rt_rq *rt_rq) else if (!on_rt_rq(rt_se)) enqueue_rt_entity(rt_se, 0); =20 - if (rt_rq->highest_prio.curr < curr->prio) + if (rt_rq->highest_prio.curr < selected->prio) resched_curr(rq); } } @@ -999,11 +999,11 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt= _rq) */ static void update_curr_rt(struct rq *rq) { - struct task_struct *curr =3D rq->curr; - struct sched_rt_entity *rt_se =3D &curr->rt; + struct task_struct *selected =3D rq_selected(rq); + struct sched_rt_entity *rt_se =3D &selected->rt; s64 delta_exec; =20 - if (curr->sched_class !=3D &rt_sched_class) + if (selected->sched_class !=3D &rt_sched_class) return; =20 delta_exec =3D update_curr_common(rq); @@ -1542,7 +1542,7 @@ static int find_lowest_rq(struct task_struct *task); static int select_task_rq_rt(struct task_struct *p, int cpu, int flags) { - struct task_struct *curr; + struct task_struct *curr, *selected; struct rq *rq; bool test; =20 @@ -1554,6 +1554,7 @@ select_task_rq_rt(struct task_struct *p, int cpu, int= flags) =20 rcu_read_lock(); curr =3D READ_ONCE(rq->curr); /* unlocked access */ + selected =3D READ_ONCE(rq_selected(rq)); =20 /* * If the current task on @p's runqueue is an RT task, then @@ -1582,8 +1583,8 @@ select_task_rq_rt(struct task_struct *p, 
int cpu, int= flags) * systems like big.LITTLE. */ test =3D curr && - unlikely(rt_task(curr)) && - (curr->nr_cpus_allowed < 2 || curr->prio <=3D p->prio); + unlikely(rt_task(selected)) && + (curr->nr_cpus_allowed < 2 || selected->prio <=3D p->prio); =20 if (test || !rt_task_fits_capacity(p, cpu)) { int target =3D find_lowest_rq(p); @@ -1613,12 +1614,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, in= t flags) =20 static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p) { - /* - * Current can't be migrated, useless to reschedule, - * let's hope p can move out. - */ if (rq->curr->nr_cpus_allowed =3D=3D 1 || - !cpupri_find(&rq->rd->cpupri, rq->curr, NULL)) + !cpupri_find(&rq->rd->cpupri, rq_selected(rq), NULL)) return; =20 /* @@ -1661,7 +1658,9 @@ static int balance_rt(struct rq *rq, struct task_stru= ct *p, struct rq_flags *rf) */ static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int fl= ags) { - if (p->prio < rq->curr->prio) { + struct task_struct *selected =3D rq_selected(rq); + + if (p->prio < selected->prio) { resched_curr(rq); return; } @@ -1679,7 +1678,7 @@ static void wakeup_preempt_rt(struct rq *rq, struct t= ask_struct *p, int flags) * to move current somewhere else, making room for our non-migratable * task. */ - if (p->prio =3D=3D rq->curr->prio && !test_tsk_need_resched(rq->curr)) + if (p->prio =3D=3D selected->prio && !test_tsk_need_resched(rq->curr)) check_preempt_equal_prio(rq, p); #endif } @@ -1704,7 +1703,7 @@ static inline void set_next_task_rt(struct rq *rq, st= ruct task_struct *p, bool f * utilization. We only care of the case where we start to schedule a * rt task */ - if (rq->curr->sched_class !=3D &rt_sched_class) + if (rq_selected(rq)->sched_class !=3D &rt_sched_class) update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 0); =20 rt_queue_push_tasks(rq); @@ -1976,6 +1975,7 @@ static struct task_struct *pick_next_pushable_task(st= ruct rq *rq) =20 BUG_ON(rq->cpu !=3D task_cpu(p)); BUG_ON(task_current(rq, p)); + BUG_ON(task_current_selected(rq, p)); BUG_ON(p->nr_cpus_allowed <=3D 1); =20 BUG_ON(!task_on_rq_queued(p)); @@ -2008,7 +2008,7 @@ static int push_rt_task(struct rq *rq, bool pull) * higher priority than current. If that's the case * just reschedule current. */ - if (unlikely(next_task->prio < rq->curr->prio)) { + if (unlikely(next_task->prio < rq_selected(rq)->prio)) { resched_curr(rq); return 0; } @@ -2029,7 +2029,7 @@ static int push_rt_task(struct rq *rq, bool pull) * Note that the stoppers are masqueraded as SCHED_FIFO * (cf. sched_set_stop_task()), so we can't rely on rt_task(). 
*/ - if (rq->curr->sched_class !=3D &rt_sched_class) + if (rq_selected(rq)->sched_class !=3D &rt_sched_class) return 0; =20 cpu =3D find_lowest_rq(rq->curr); @@ -2361,7 +2361,7 @@ static void pull_rt_task(struct rq *this_rq) * p if it is lower in priority than the * current task on the run queue */ - if (p->prio < src_rq->curr->prio) + if (p->prio < rq_selected(src_rq)->prio) goto skip; =20 if (is_migration_disabled(p)) { @@ -2403,9 +2403,9 @@ static void task_woken_rt(struct rq *rq, struct task_= struct *p) bool need_to_push =3D !task_on_cpu(rq, p) && !test_tsk_need_resched(rq->curr) && p->nr_cpus_allowed > 1 && - (dl_task(rq->curr) || rt_task(rq->curr)) && + (dl_task(rq_selected(rq)) || rt_task(rq_selected(rq))) && (rq->curr->nr_cpus_allowed < 2 || - rq->curr->prio <=3D p->prio); + rq_selected(rq)->prio <=3D p->prio); =20 if (need_to_push) push_rt_tasks(rq); @@ -2489,7 +2489,7 @@ static void switched_to_rt(struct rq *rq, struct task= _struct *p) if (p->nr_cpus_allowed > 1 && rq->rt.overloaded) rt_queue_push_tasks(rq); #endif /* CONFIG_SMP */ - if (p->prio < rq->curr->prio && cpu_online(cpu_of(rq))) + if (p->prio < rq_selected(rq)->prio && cpu_online(cpu_of(rq))) resched_curr(rq); } } @@ -2504,7 +2504,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p,= int oldprio) if (!task_on_rq_queued(p)) return; =20 - if (task_current(rq, p)) { + if (task_current_selected(rq, p)) { #ifdef CONFIG_SMP /* * If our priority decreases while running, we @@ -2530,7 +2530,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p,= int oldprio) * greater than the current running task * then reschedule. */ - if (p->prio < rq->curr->prio) + if (p->prio < rq_selected(rq)->prio) resched_curr(rq); } } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 493de4cc320a..7ee8c7fa0ae8 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1051,7 +1051,7 @@ struct rq { */ unsigned int nr_uninterruptible; =20 - struct task_struct __rcu *curr; + struct task_struct __rcu *curr; /* Execution context */ struct task_struct *idle; struct task_struct *stop; unsigned long next_balance; @@ -1246,6 +1246,13 @@ DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues); #define cpu_curr(cpu) (cpu_rq(cpu)->curr) #define raw_rq() raw_cpu_ptr(&runqueues) =20 +/* For now, rq_selected =3D=3D rq->curr */ +#define rq_selected(rq) ((rq)->curr) +static inline void rq_set_selected(struct rq *rq, struct task_struct *t) +{ + /* Do nothing */ +} + struct sched_group; #ifdef CONFIG_SCHED_CORE static inline struct cpumask *sched_group_span(struct sched_group *sg); @@ -2151,11 +2158,25 @@ static inline u64 global_rt_runtime(void) return (u64)sysctl_sched_rt_runtime * NSEC_PER_USEC; } =20 +/* + * Is p the current execution context? + */ static inline int task_current(struct rq *rq, struct task_struct *p) { return rq->curr =3D=3D p; } =20 +/* + * Is p the current scheduling context? + * + * Note that it might be the current execution context at the same time if + * rq->curr =3D=3D rq_selected() =3D=3D p. 
+ */ +static inline int task_current_selected(struct rq *rq, struct task_struct = *p) +{ + return rq_selected(rq) =3D=3D p; +} + static inline int task_on_cpu(struct rq *rq, struct task_struct *p) { #ifdef CONFIG_SMP @@ -2325,7 +2346,7 @@ struct sched_class { =20 static inline void put_prev_task(struct rq *rq, struct task_struct *prev) { - WARN_ON_ONCE(rq->curr !=3D prev); + WARN_ON_ONCE(rq_selected(rq) !=3D prev); prev->sched_class->put_prev_task(rq, prev); } =20 @@ -2406,7 +2427,7 @@ extern void set_cpus_allowed_common(struct task_struc= t *p, struct affinity_conte =20 static inline struct task_struct *get_push_task(struct rq *rq) { - struct task_struct *p =3D rq->curr; + struct task_struct *p =3D rq_selected(rq); =20 lockdep_assert_rq_held(rq); =20 --=20 2.45.2.993.g49e7a77208-goog
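As a closing illustration of the split this last patch prepares: rq_selected()
names the scheduling context (the task whose policy, priority, and runtime get
charged), while rq->curr remains the execution context (the task whose code
actually runs). In this series the two still always point at the same task,
which is why rq_selected() can simply alias rq->curr for now; proxy execution
will later let them diverge. The sketch below is a rough userspace model with
made-up task names and fields, not the kernel's rq or its macros.

  #include <stdio.h>

  struct task { const char *name; int prio; };

  struct rq {
          struct task *curr;       /* execution context: whose code runs */
          struct task *selected;   /* scheduling context: who gets charged */
  };

  #define rq_selected(rq) ((rq)->selected)

  static void rq_set_selected(struct rq *rq, struct task *t)
  {
          rq->selected = t;
  }

  /* Accounting and preemption decisions follow the scheduling context. */
  static void account_tick(struct rq *rq)
  {
          printf("tick: running %s, charging %s (prio %d)\n",
                 rq->curr->name, rq_selected(rq)->name, rq_selected(rq)->prio);
  }

  int main(void)
  {
          struct task waiter = { "rt_waiter", 10 };
          struct task owner  = { "mutex_owner", 120 };
          struct rq rq;

          /* Today: the selected task and the running task are the same. */
          rq_set_selected(&rq, &waiter);
          rq.curr = &waiter;
          account_tick(&rq);

          /* Future proxy execution: run the lock owner while charging the
           * waiter's scheduling context (sketched here only). */
          rq.curr = &owner;
          account_tick(&rq);
          return 0;
  }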