From nobody Wed Nov 27 06:35:12 2024
Date: Fri, 11 Oct 2024 16:25:13 -0700
In-Reply-To: <20241011232525.2513424-1-jstultz@google.com>
Mime-Version: 1.0
References: <20241011232525.2513424-1-jstultz@google.com>
Message-ID: <20241011232525.2513424-2-jstultz@google.com>
Subject: [PATCH v13 1/7] locking/mutex: Remove wakeups from under mutex::wait_lock
From: John Stultz
To: LKML
Cc: Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner, Daniel Lezcano, kernel-team@android.com, Metin Kaya, Davidlohr Bueso, John Stultz
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Peter Zijlstra

In preparation to nest mutex::wait_lock under rq::lock, we need to remove wakeups from under it. Do this by utilizing wake_qs to defer the wakeup until after the lock is dropped.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: kernel-team@android.com
Tested-by: K Prateek Nayak
Tested-by: Metin Kaya
Acked-by: Davidlohr Bueso
Reviewed-by: Metin Kaya
Signed-off-by: Peter Zijlstra (Intel)
[Heavily changed after 55f036ca7e74 ("locking: WW mutex cleanup") and 08295b3b5bee ("locking: Implement an algorithm choice for Wound-Wait mutexes")]
Signed-off-by: Juri Lelli
[jstultz: rebased to mainline, added extra wake_up_q & init to avoid hangs, similar to Connor's rework of this patch]
Signed-off-by: John Stultz
---
v5:
* Reverted back to an earlier version of this patch to undo the change that kept the wake_q in the ctx structure, as that broke the rule that the wake_q must always be on the stack, as it's not safe for concurrency.
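For readers new to the pattern, the shape this series applies throughout the locking code looks roughly like the sketch below. This is illustrative only: demo_unlock_and_wake() and its lock argument are made up for the example; only the wake_q calls (linux/sched/wake_q.h) are the real kernel API.

#include <linux/sched.h>
#include <linux/sched/wake_q.h>
#include <linux/spinlock.h>

static void demo_unlock_and_wake(raw_spinlock_t *lock, struct task_struct *waiter)
{
	DEFINE_WAKE_Q(wake_q);		/* the wake_q must live on the stack */

	raw_spin_lock(lock);
	/* ... decide, under the lock, which task needs waking ... */
	wake_q_add(&wake_q, waiter);	/* queue the wakeup, do not issue it here */

	preempt_disable();		/* avoid being preempted by the woken task right at unlock */
	raw_spin_unlock(lock);
	wake_up_q(&wake_q);		/* wakeups happen only after the lock is dropped */
	preempt_enable();
}

In the patch itself this is done by threading a wake_q_head through the ww_mutex and rtmutex helpers, so that __ww_mutex_die()/__ww_mutex_wound() call wake_q_add() instead of wake_up_process() while wait_lock is held.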
v6: * Made tweaks suggested by Waiman Long v7: * Fixups to pass wake_qs down for PREEMPT_RT logic v10: * Switched preempt_enable to be lower close to the unlock as suggested by Valentin * Added additional preempt_disable coverage around the wake_q calls as again noted by Valentin v12: * Fixes and simplifications from K Prateek Nayak and Peter Zijlstra * Commit message tweak --- kernel/futex/pi.c | 6 +++- kernel/locking/mutex.c | 16 ++++++++--- kernel/locking/rtmutex.c | 49 +++++++++++++++++++++++---------- kernel/locking/rtmutex_api.c | 11 ++++++-- kernel/locking/rtmutex_common.h | 3 +- kernel/locking/rwbase_rt.c | 8 +++++- kernel/locking/rwsem.c | 4 +-- kernel/locking/spinlock_rt.c | 3 +- kernel/locking/ww_mutex.h | 30 ++++++++++++-------- 9 files changed, 92 insertions(+), 38 deletions(-) diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c index 5722467f2737..d62cca5ed8f4 100644 --- a/kernel/futex/pi.c +++ b/kernel/futex/pi.c @@ -922,6 +922,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags= , ktime_t *time, int tryl struct rt_mutex_waiter rt_waiter; struct futex_hash_bucket *hb; struct futex_q q =3D futex_q_init; + DEFINE_WAKE_Q(wake_q); int res, ret; =20 if (!IS_ENABLED(CONFIG_FUTEX_PI)) @@ -1018,8 +1019,11 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int fl= ags, ktime_t *time, int tryl * such that futex_unlock_pi() is guaranteed to observe the waiter when * it sees the futex_q::pi_state. */ - ret =3D __rt_mutex_start_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter, cu= rrent); + ret =3D __rt_mutex_start_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter, cu= rrent, &wake_q); + preempt_disable(); raw_spin_unlock_irq(&q.pi_state->pi_mutex.wait_lock); + wake_up_q(&wake_q); + preempt_enable(); =20 if (ret) { if (ret =3D=3D 1) diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index cbae8c0b89ab..6c94da061ec2 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -575,6 +575,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas struct lockdep_map *nest_lock, unsigned long ip, struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx) { + DEFINE_WAKE_Q(wake_q); struct mutex_waiter waiter; struct ww_mutex *ww; int ret; @@ -625,7 +626,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas */ if (__mutex_trylock(lock)) { if (ww_ctx) - __ww_mutex_check_waiters(lock, ww_ctx); + __ww_mutex_check_waiters(lock, ww_ctx, &wake_q); =20 goto skip_wait; } @@ -645,7 +646,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas * Add in stamp order, waking up waiters that must kill * themselves. 
*/ - ret =3D __ww_mutex_add_waiter(&waiter, lock, ww_ctx); + ret =3D __ww_mutex_add_waiter(&waiter, lock, ww_ctx, &wake_q); if (ret) goto err_early_kill; } @@ -681,6 +682,10 @@ __mutex_lock_common(struct mutex *lock, unsigned int s= tate, unsigned int subclas } =20 raw_spin_unlock(&lock->wait_lock); + /* Make sure we do wakeups before calling schedule */ + wake_up_q(&wake_q); + wake_q_init(&wake_q); + schedule_preempt_disabled(); =20 first =3D __mutex_waiter_is_first(lock, &waiter); @@ -714,7 +719,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas */ if (!ww_ctx->is_wait_die && !__mutex_waiter_is_first(lock, &waiter)) - __ww_mutex_check_waiters(lock, ww_ctx); + __ww_mutex_check_waiters(lock, ww_ctx, &wake_q); } =20 __mutex_remove_waiter(lock, &waiter); @@ -730,6 +735,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas ww_mutex_lock_acquired(ww, ww_ctx); =20 raw_spin_unlock(&lock->wait_lock); + wake_up_q(&wake_q); preempt_enable(); return 0; =20 @@ -741,6 +747,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas raw_spin_unlock(&lock->wait_lock); debug_mutex_free_waiter(&waiter); mutex_release(&lock->dep_map, ip); + wake_up_q(&wake_q); preempt_enable(); return ret; } @@ -951,9 +958,10 @@ static noinline void __sched __mutex_unlock_slowpath(s= truct mutex *lock, unsigne if (owner & MUTEX_FLAG_HANDOFF) __mutex_handoff(lock, next); =20 + preempt_disable(); raw_spin_unlock(&lock->wait_lock); - wake_up_q(&wake_q); + preempt_enable(); } =20 #ifndef CONFIG_DEBUG_LOCK_ALLOC diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c index ebebd0eec7f6..8ada6567a141 100644 --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -34,13 +34,15 @@ =20 static inline int __ww_mutex_add_waiter(struct rt_mutex_waiter *waiter, struct rt_mutex *lock, - struct ww_acquire_ctx *ww_ctx) + struct ww_acquire_ctx *ww_ctx, + struct wake_q_head *wake_q) { return 0; } =20 static inline void __ww_mutex_check_waiters(struct rt_mutex *lock, - struct ww_acquire_ctx *ww_ctx) + struct ww_acquire_ctx *ww_ctx, + struct wake_q_head *wake_q) { } =20 @@ -1201,7 +1203,8 @@ static int __sched task_blocks_on_rt_mutex(struct rt_= mutex_base *lock, struct rt_mutex_waiter *waiter, struct task_struct *task, struct ww_acquire_ctx *ww_ctx, - enum rtmutex_chainwalk chwalk) + enum rtmutex_chainwalk chwalk, + struct wake_q_head *wake_q) { struct task_struct *owner =3D rt_mutex_owner(lock); struct rt_mutex_waiter *top_waiter =3D waiter; @@ -1245,7 +1248,10 @@ static int __sched task_blocks_on_rt_mutex(struct rt= _mutex_base *lock, =20 /* Check whether the waiter should back out immediately */ rtm =3D container_of(lock, struct rt_mutex, rtmutex); - res =3D __ww_mutex_add_waiter(waiter, rtm, ww_ctx); + preempt_disable(); + res =3D __ww_mutex_add_waiter(waiter, rtm, ww_ctx, wake_q); + wake_up_q(wake_q); + preempt_enable(); if (res) { raw_spin_lock(&task->pi_lock); rt_mutex_dequeue(lock, waiter); @@ -1679,7 +1685,8 @@ static int __sched __rt_mutex_slowlock(struct rt_mute= x_base *lock, struct ww_acquire_ctx *ww_ctx, unsigned int state, enum rtmutex_chainwalk chwalk, - struct rt_mutex_waiter *waiter) + struct rt_mutex_waiter *waiter, + struct wake_q_head *wake_q) { struct rt_mutex *rtm =3D container_of(lock, struct rt_mutex, rtmutex); struct ww_mutex *ww =3D ww_container_of(rtm); @@ -1690,7 +1697,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mute= x_base *lock, /* Try to acquire the lock again: */ if 
(try_to_take_rt_mutex(lock, current, NULL)) { if (build_ww_mutex() && ww_ctx) { - __ww_mutex_check_waiters(rtm, ww_ctx); + __ww_mutex_check_waiters(rtm, ww_ctx, wake_q); ww_mutex_lock_acquired(ww, ww_ctx); } return 0; @@ -1700,7 +1707,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mute= x_base *lock, =20 trace_contention_begin(lock, LCB_F_RT); =20 - ret =3D task_blocks_on_rt_mutex(lock, waiter, current, ww_ctx, chwalk); + ret =3D task_blocks_on_rt_mutex(lock, waiter, current, ww_ctx, chwalk, wa= ke_q); if (likely(!ret)) ret =3D rt_mutex_slowlock_block(lock, ww_ctx, state, NULL, waiter); =20 @@ -1708,7 +1715,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mute= x_base *lock, /* acquired the lock */ if (build_ww_mutex() && ww_ctx) { if (!ww_ctx->is_wait_die) - __ww_mutex_check_waiters(rtm, ww_ctx); + __ww_mutex_check_waiters(rtm, ww_ctx, wake_q); ww_mutex_lock_acquired(ww, ww_ctx); } } else { @@ -1730,7 +1737,8 @@ static int __sched __rt_mutex_slowlock(struct rt_mute= x_base *lock, =20 static inline int __rt_mutex_slowlock_locked(struct rt_mutex_base *lock, struct ww_acquire_ctx *ww_ctx, - unsigned int state) + unsigned int state, + struct wake_q_head *wake_q) { struct rt_mutex_waiter waiter; int ret; @@ -1739,7 +1747,7 @@ static inline int __rt_mutex_slowlock_locked(struct r= t_mutex_base *lock, waiter.ww_ctx =3D ww_ctx; =20 ret =3D __rt_mutex_slowlock(lock, ww_ctx, state, RT_MUTEX_MIN_CHAINWALK, - &waiter); + &waiter, wake_q); =20 debug_rt_mutex_free_waiter(&waiter); return ret; @@ -1755,6 +1763,7 @@ static int __sched rt_mutex_slowlock(struct rt_mutex_= base *lock, struct ww_acquire_ctx *ww_ctx, unsigned int state) { + DEFINE_WAKE_Q(wake_q); unsigned long flags; int ret; =20 @@ -1776,8 +1785,11 @@ static int __sched rt_mutex_slowlock(struct rt_mutex= _base *lock, * irqsave/restore variants. 
*/ raw_spin_lock_irqsave(&lock->wait_lock, flags); - ret =3D __rt_mutex_slowlock_locked(lock, ww_ctx, state); + ret =3D __rt_mutex_slowlock_locked(lock, ww_ctx, state, &wake_q); + preempt_disable(); raw_spin_unlock_irqrestore(&lock->wait_lock, flags); + wake_up_q(&wake_q); + preempt_enable(); rt_mutex_post_schedule(); =20 return ret; @@ -1804,7 +1816,8 @@ static __always_inline int __rt_mutex_lock(struct rt_= mutex_base *lock, * rtlock_slowlock_locked - Slow path lock acquisition for RT locks * @lock: The underlying RT mutex */ -static void __sched rtlock_slowlock_locked(struct rt_mutex_base *lock) +static void __sched rtlock_slowlock_locked(struct rt_mutex_base *lock, + struct wake_q_head *wake_q) { struct rt_mutex_waiter waiter; struct task_struct *owner; @@ -1821,7 +1834,7 @@ static void __sched rtlock_slowlock_locked(struct rt_= mutex_base *lock) =20 trace_contention_begin(lock, LCB_F_RT); =20 - task_blocks_on_rt_mutex(lock, &waiter, current, NULL, RT_MUTEX_MIN_CHAINW= ALK); + task_blocks_on_rt_mutex(lock, &waiter, current, NULL, RT_MUTEX_MIN_CHAINW= ALK, wake_q); =20 for (;;) { /* Try to acquire the lock again */ @@ -1832,7 +1845,11 @@ static void __sched rtlock_slowlock_locked(struct rt= _mutex_base *lock) owner =3D rt_mutex_owner(lock); else owner =3D NULL; + preempt_disable(); raw_spin_unlock_irq(&lock->wait_lock); + wake_up_q(wake_q); + wake_q_init(wake_q); + preempt_enable(); =20 if (!owner || !rtmutex_spin_on_owner(lock, &waiter, owner)) schedule_rtlock(); @@ -1857,10 +1874,14 @@ static void __sched rtlock_slowlock_locked(struct r= t_mutex_base *lock) static __always_inline void __sched rtlock_slowlock(struct rt_mutex_base *= lock) { unsigned long flags; + DEFINE_WAKE_Q(wake_q); =20 raw_spin_lock_irqsave(&lock->wait_lock, flags); - rtlock_slowlock_locked(lock); + rtlock_slowlock_locked(lock, &wake_q); + preempt_disable(); raw_spin_unlock_irqrestore(&lock->wait_lock, flags); + wake_up_q(&wake_q); + preempt_enable(); } =20 #endif /* RT_MUTEX_BUILD_SPINLOCKS */ diff --git a/kernel/locking/rtmutex_api.c b/kernel/locking/rtmutex_api.c index a6974d044593..747f2da16037 100644 --- a/kernel/locking/rtmutex_api.c +++ b/kernel/locking/rtmutex_api.c @@ -291,7 +291,8 @@ void __sched rt_mutex_proxy_unlock(struct rt_mutex_base= *lock) */ int __sched __rt_mutex_start_proxy_lock(struct rt_mutex_base *lock, struct rt_mutex_waiter *waiter, - struct task_struct *task) + struct task_struct *task, + struct wake_q_head *wake_q) { int ret; =20 @@ -302,7 +303,7 @@ int __sched __rt_mutex_start_proxy_lock(struct rt_mutex= _base *lock, =20 /* We enforce deadlock detection for futexes */ ret =3D task_blocks_on_rt_mutex(lock, waiter, task, NULL, - RT_MUTEX_FULL_CHAINWALK); + RT_MUTEX_FULL_CHAINWALK, wake_q); =20 if (ret && !rt_mutex_owner(lock)) { /* @@ -341,12 +342,16 @@ int __sched rt_mutex_start_proxy_lock(struct rt_mutex= _base *lock, struct task_struct *task) { int ret; + DEFINE_WAKE_Q(wake_q); =20 raw_spin_lock_irq(&lock->wait_lock); - ret =3D __rt_mutex_start_proxy_lock(lock, waiter, task); + ret =3D __rt_mutex_start_proxy_lock(lock, waiter, task, &wake_q); if (unlikely(ret)) remove_waiter(lock, waiter); + preempt_disable(); raw_spin_unlock_irq(&lock->wait_lock); + wake_up_q(&wake_q); + preempt_enable(); =20 return ret; } diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_commo= n.h index 1162e07cdaea..c38a2d2d4a7e 100644 --- a/kernel/locking/rtmutex_common.h +++ b/kernel/locking/rtmutex_common.h @@ -83,7 +83,8 @@ extern void rt_mutex_init_proxy_locked(struct rt_mutex_ba= se *lock, 
extern void rt_mutex_proxy_unlock(struct rt_mutex_base *lock); extern int __rt_mutex_start_proxy_lock(struct rt_mutex_base *lock, struct rt_mutex_waiter *waiter, - struct task_struct *task); + struct task_struct *task, + struct wake_q_head *); extern int rt_mutex_start_proxy_lock(struct rt_mutex_base *lock, struct rt_mutex_waiter *waiter, struct task_struct *task); diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c index 34a59569db6b..9f4322c07486 100644 --- a/kernel/locking/rwbase_rt.c +++ b/kernel/locking/rwbase_rt.c @@ -69,6 +69,7 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *r= wb, unsigned int state) { struct rt_mutex_base *rtm =3D &rwb->rtmutex; + DEFINE_WAKE_Q(wake_q); int ret; =20 rwbase_pre_schedule(); @@ -110,7 +111,7 @@ static int __sched __rwbase_read_lock(struct rwbase_rt = *rwb, * For rwlocks this returns 0 unconditionally, so the below * !ret conditionals are optimized out. */ - ret =3D rwbase_rtmutex_slowlock_locked(rtm, state); + ret =3D rwbase_rtmutex_slowlock_locked(rtm, state, &wake_q); =20 /* * On success the rtmutex is held, so there can't be a writer @@ -121,7 +122,12 @@ static int __sched __rwbase_read_lock(struct rwbase_rt= *rwb, */ if (!ret) atomic_inc(&rwb->readers); + + preempt_disable(); raw_spin_unlock_irq(&rtm->wait_lock); + wake_up_q(&wake_q); + preempt_enable(); + if (!ret) rwbase_rtmutex_unlock(rtm); =20 diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c index 2bbb6eca5144..2ddb827e3bea 100644 --- a/kernel/locking/rwsem.c +++ b/kernel/locking/rwsem.c @@ -1413,8 +1413,8 @@ static inline void __downgrade_write(struct rw_semaph= ore *sem) #define rwbase_rtmutex_lock_state(rtm, state) \ __rt_mutex_lock(rtm, state) =20 -#define rwbase_rtmutex_slowlock_locked(rtm, state) \ - __rt_mutex_slowlock_locked(rtm, NULL, state) +#define rwbase_rtmutex_slowlock_locked(rtm, state, wq) \ + __rt_mutex_slowlock_locked(rtm, NULL, state, wq) =20 #define rwbase_rtmutex_unlock(rtm) \ __rt_mutex_unlock(rtm) diff --git a/kernel/locking/spinlock_rt.c b/kernel/locking/spinlock_rt.c index 38e292454fcc..fb1810a14c9d 100644 --- a/kernel/locking/spinlock_rt.c +++ b/kernel/locking/spinlock_rt.c @@ -162,7 +162,8 @@ rwbase_rtmutex_lock_state(struct rt_mutex_base *rtm, un= signed int state) } =20 static __always_inline int -rwbase_rtmutex_slowlock_locked(struct rt_mutex_base *rtm, unsigned int sta= te) +rwbase_rtmutex_slowlock_locked(struct rt_mutex_base *rtm, unsigned int sta= te, + struct wake_q_head *wake_q) { rtlock_slowlock_locked(rtm); return 0; diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h index 76d204b7d29c..a54bd16d0f17 100644 --- a/kernel/locking/ww_mutex.h +++ b/kernel/locking/ww_mutex.h @@ -275,7 +275,7 @@ __ww_ctx_less(struct ww_acquire_ctx *a, struct ww_acqui= re_ctx *b) */ static bool __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter, - struct ww_acquire_ctx *ww_ctx) + struct ww_acquire_ctx *ww_ctx, struct wake_q_head *wake_q) { if (!ww_ctx->is_wait_die) return false; @@ -284,7 +284,7 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER = *waiter, #ifndef WW_RT debug_mutex_wake_waiter(lock, waiter); #endif - wake_up_process(waiter->task); + wake_q_add(wake_q, waiter->task); } =20 return true; @@ -299,7 +299,8 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER = *waiter, */ static bool __ww_mutex_wound(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx, - struct ww_acquire_ctx *hold_ctx) + struct ww_acquire_ctx *hold_ctx, + struct wake_q_head *wake_q) { struct task_struct *owner =3D 
__ww_mutex_owner(lock); =20 @@ -331,7 +332,7 @@ static bool __ww_mutex_wound(struct MUTEX *lock, * wakeup pending to re-read the wounded state. */ if (owner !=3D current) - wake_up_process(owner); + wake_q_add(wake_q, owner); =20 return true; } @@ -352,7 +353,8 @@ static bool __ww_mutex_wound(struct MUTEX *lock, * The current task must not be on the wait list. */ static void -__ww_mutex_check_waiters(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx) +__ww_mutex_check_waiters(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx, + struct wake_q_head *wake_q) { struct MUTEX_WAITER *cur; =20 @@ -364,8 +366,8 @@ __ww_mutex_check_waiters(struct MUTEX *lock, struct ww_= acquire_ctx *ww_ctx) if (!cur->ww_ctx) continue; =20 - if (__ww_mutex_die(lock, cur, ww_ctx) || - __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx)) + if (__ww_mutex_die(lock, cur, ww_ctx, wake_q) || + __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx, wake_q)) break; } } @@ -377,6 +379,8 @@ __ww_mutex_check_waiters(struct MUTEX *lock, struct ww_= acquire_ctx *ww_ctx) static __always_inline void ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx= *ctx) { + DEFINE_WAKE_Q(wake_q); + ww_mutex_lock_acquired(lock, ctx); =20 /* @@ -405,8 +409,11 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, s= truct ww_acquire_ctx *ctx) * die or wound us. */ lock_wait_lock(&lock->base); - __ww_mutex_check_waiters(&lock->base, ctx); + __ww_mutex_check_waiters(&lock->base, ctx, &wake_q); + preempt_disable(); unlock_wait_lock(&lock->base); + wake_up_q(&wake_q); + preempt_enable(); } =20 static __always_inline int @@ -488,7 +495,8 @@ __ww_mutex_check_kill(struct MUTEX *lock, struct MUTEX_= WAITER *waiter, static inline int __ww_mutex_add_waiter(struct MUTEX_WAITER *waiter, struct MUTEX *lock, - struct ww_acquire_ctx *ww_ctx) + struct ww_acquire_ctx *ww_ctx, + struct wake_q_head *wake_q) { struct MUTEX_WAITER *cur, *pos =3D NULL; bool is_wait_die; @@ -532,7 +540,7 @@ __ww_mutex_add_waiter(struct MUTEX_WAITER *waiter, pos =3D cur; =20 /* Wait-Die: ensure younger waiters die. */ - __ww_mutex_die(lock, cur, ww_ctx); + __ww_mutex_die(lock, cur, ww_ctx, wake_q); } =20 __ww_waiter_add(lock, waiter, pos); @@ -550,7 +558,7 @@ __ww_mutex_add_waiter(struct MUTEX_WAITER *waiter, * such that either we or the fastpath will wound @ww->ctx. 
*/ smp_mb(); - __ww_mutex_wound(lock, ww_ctx, ww->ctx); + __ww_mutex_wound(lock, ww_ctx, ww->ctx, wake_q); } =20 return 0; --=20 2.47.0.rc1.288.g06298d1525-goog From nobody Wed Nov 27 06:35:12 2024 Received: from mail-yb1-f202.google.com (mail-yb1-f202.google.com [209.85.219.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7DB071D278A for ; Fri, 11 Oct 2024 23:25:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728689142; cv=none; b=QvjgP5ids7+LBLqIF4vqCkAfnuj888lXcN8+vF/zwECxfrhUbNK2V1vDNQNdZCzc3VOG0XXBpZx0q1C5MYWMyOMkpZW1fXZg6dKo2+CBVS3rSv5bDjP0TWJRpUjwQMYJog0xO55CiwB76I3xcauq32y9p8xL8mxbWnYKjPZwBxc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728689142; c=relaxed/simple; bh=yBUGVh250y9wcrlQI9aZ02GIIdZzuXcPE/vJ7ukaofY=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=jJhSGP1bH0qQPdGFPPZgnXM80+2xawbpS8mqEBC+IK/4JC4KtJScATnGuTOL9E2E9Ua/WArXOJ1d9Svle+g0zPVDNYSY3zez5q7i5EnUICtkEQjVJgwHnJKMylOjPvn1tgjm8yBf2IRkuk362szf0Wps01bjpGVAWSvucQkrKxA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=vOlheWsJ; arc=none smtp.client-ip=209.85.219.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="vOlheWsJ" Received: by mail-yb1-f202.google.com with SMTP id 3f1490d57ef6-e1159159528so5342948276.1 for ; Fri, 11 Oct 2024 16:25:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1728689139; x=1729293939; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=226t3UQeedvRaJce9J8TjMTLATm8KD11iEAlxcOhk4g=; b=vOlheWsJ65XcI9QPKGxZSstLRhw0Y1nu2U8shRrmBfFC/IUgVvaXnCCv860HDPH3vk laazT4fJHD0CHkFTT/DPtNJOWo0gqg56Jn3sof3CvssT63qYreONpnF6PRjXLFBCJV1S OI9zr0e4WvlaljXN2MMO6LITjldYEJTIuVCCsN8X64jthJJe8LiClEifhQdq8+ZA9ZzM UmgJswyt2EKNgCHFOkrLSYJoSXVtCbH0g+lcc0Sa3vgMyzAwcHSkHUnKhknS4WN4EAmd E7qLep6UOHlxD1qLsi9GhZgqK5ZX5NZ7O4aD1EUov0UaL7oa652RqCX9q762zp2dYbhL 8y2g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1728689139; x=1729293939; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=226t3UQeedvRaJce9J8TjMTLATm8KD11iEAlxcOhk4g=; b=Ll5EeqeO2cF205+eEct4zjg+8Wcy2IfKuDisF+qtdQMm/Q8L3ZVtqo9Rvah/CXBmFb S9s1FK8HqdIQ/7Q2D/Vc7dB10j0VSpE8azSMsVmrRmj6JL8zjTYTwHixQsMx1XTgKGpG W0OLCDaaDQr8pvxObMIgvApmxsoTqOzctyU1ORMauq0B9aQk0MqPYauCFWDFOt7buy3V N6RGK89wgT6fcS/q/Qj0p4ybYeHLcYXKhqsMxDgNamHWNzwUe9/iWtOmaixsi6NdMA3w C8Dxedb2Vfa6OpV7LYyqenzKvx4crH/zzLzYDjVlZfkqgK9xeWWKmrJG0D8IIbFOHaU7 RfIA== X-Gm-Message-State: AOJu0YxIQ8ty9DYZ1QUltOMvrSvmg6/mQYhhwsAOhKJ9U0Y0f2RRiTEE E59CqvrossyYhEcCVJb3guj2OcyODB6mBzIgJ8SZxtxmjdXAQJR1a+Y2XVOEExnctKvXL3NKZ3d 
LWU/RrtlGIFjJzh/EGA2MBnoO7OtVUQd/Ijb7f8mRlqRuLf1C2cDS2zJ6U5P08ePX8zlBJYXnEp wJCVcGlIXdM8R5d/pukGzIKYqfMD1Dj4sR3skwdkXseZts X-Google-Smtp-Source: AGHT+IETR74kUpYrWqMF7Zh86ROmJtWomKMdNac2TWgJeP88X6NSmqTayttQE/owikZWDzLxSvtFY/uv/tsI X-Received: from jstultz-noogler2.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:600]) (user=jstultz job=sendgmr) by 2002:a05:6902:1349:b0:e20:2da6:ed77 with SMTP id 3f1490d57ef6-e2918421de9mr28000276.5.1728689139199; Fri, 11 Oct 2024 16:25:39 -0700 (PDT) Date: Fri, 11 Oct 2024 16:25:14 -0700 In-Reply-To: <20241011232525.2513424-1-jstultz@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20241011232525.2513424-1-jstultz@google.com> X-Mailer: git-send-email 2.47.0.rc1.288.g06298d1525-goog Message-ID: <20241011232525.2513424-3-jstultz@google.com> Subject: [PATCH v13 2/7] locking/mutex: Make mutex::wait_lock irq safe From: John Stultz To: LKML Cc: Juri Lelli , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Mel Gorman , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , kernel-team@android.com, Metin Kaya , "Connor O'Brien" , John Stultz Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Juri Lelli With the proxy-execution series, we traverse the task->mutex->task blocked_on/owner chain in the scheduler core. We do this while holding the rq::lock to keep the structures in place while taking and releasing the alternating lock types. Since the mutex::wait_lock is one of the locks we will take in this way under the rq::lock in the scheduler core, we need to make sure that its usage elsewhere is irq safe. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Mel Gorman Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: kernel-team@android.com Tested-by: K Prateek Nayak Tested-by: Metin Kaya Reviewed-by: Metin Kaya Reviewed-by: Valentin Schneider Signed-off-by: Juri Lelli Signed-off-by: Peter Zijlstra (Intel) [rebase & fix {un,}lock_wait_lock helpers in ww_mutex.h] Signed-off-by: Connor O'Brien Signed-off-by: John Stultz --- v3: * Re-added this patch after it was dropped in v2 which caused lockdep warnings to trip. v7: * Fix function definition for PREEMPT_RT case, as pointed out by Metin Kaya. 
* Fix incorrect flags handling in PREEMPT_RT case as found by Metin Kaya v13: * Expand the commit message, as suggested by Steven Rostedt --- kernel/locking/mutex.c | 18 ++++++++++-------- kernel/locking/ww_mutex.h | 21 +++++++++++---------- 2 files changed, 21 insertions(+), 18 deletions(-) diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 6c94da061ec2..cd248d1767eb 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -578,6 +578,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas DEFINE_WAKE_Q(wake_q); struct mutex_waiter waiter; struct ww_mutex *ww; + unsigned long flags; int ret; =20 if (!use_ww_ctx) @@ -620,7 +621,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas return 0; } =20 - raw_spin_lock(&lock->wait_lock); + raw_spin_lock_irqsave(&lock->wait_lock, flags); /* * After waiting to acquire the wait_lock, try again. */ @@ -681,7 +682,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas goto err; } =20 - raw_spin_unlock(&lock->wait_lock); + raw_spin_unlock_irqrestore(&lock->wait_lock, flags); /* Make sure we do wakeups before calling schedule */ wake_up_q(&wake_q); wake_q_init(&wake_q); @@ -706,9 +707,9 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas trace_contention_begin(lock, LCB_F_MUTEX); } =20 - raw_spin_lock(&lock->wait_lock); + raw_spin_lock_irqsave(&lock->wait_lock, flags); } - raw_spin_lock(&lock->wait_lock); + raw_spin_lock_irqsave(&lock->wait_lock, flags); acquired: __set_current_state(TASK_RUNNING); =20 @@ -734,7 +735,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas if (ww_ctx) ww_mutex_lock_acquired(ww, ww_ctx); =20 - raw_spin_unlock(&lock->wait_lock); + raw_spin_unlock_irqrestore(&lock->wait_lock, flags); wake_up_q(&wake_q); preempt_enable(); return 0; @@ -744,7 +745,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas __mutex_remove_waiter(lock, &waiter); err_early_kill: trace_contention_end(lock, ret); - raw_spin_unlock(&lock->wait_lock); + raw_spin_unlock_irqrestore(&lock->wait_lock, flags); debug_mutex_free_waiter(&waiter); mutex_release(&lock->dep_map, ip); wake_up_q(&wake_q); @@ -915,6 +916,7 @@ static noinline void __sched __mutex_unlock_slowpath(st= ruct mutex *lock, unsigne struct task_struct *next =3D NULL; DEFINE_WAKE_Q(wake_q); unsigned long owner; + unsigned long flags; =20 mutex_release(&lock->dep_map, ip); =20 @@ -941,7 +943,7 @@ static noinline void __sched __mutex_unlock_slowpath(st= ruct mutex *lock, unsigne } } =20 - raw_spin_lock(&lock->wait_lock); + raw_spin_lock_irqsave(&lock->wait_lock, flags); debug_mutex_unlock(lock); if (!list_empty(&lock->wait_list)) { /* get the first entry from the wait-list: */ @@ -959,7 +961,7 @@ static noinline void __sched __mutex_unlock_slowpath(st= ruct mutex *lock, unsigne __mutex_handoff(lock, next); =20 preempt_disable(); - raw_spin_unlock(&lock->wait_lock); + raw_spin_unlock_irqrestore(&lock->wait_lock, flags); wake_up_q(&wake_q); preempt_enable(); } diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h index a54bd16d0f17..37f025a096c9 100644 --- a/kernel/locking/ww_mutex.h +++ b/kernel/locking/ww_mutex.h @@ -70,14 +70,14 @@ __ww_mutex_has_waiters(struct mutex *lock) return atomic_long_read(&lock->owner) & MUTEX_FLAG_WAITERS; } =20 -static inline void lock_wait_lock(struct mutex *lock) +static inline void lock_wait_lock(struct mutex *lock, unsigned long 
*flags) { - raw_spin_lock(&lock->wait_lock); + raw_spin_lock_irqsave(&lock->wait_lock, *flags); } =20 -static inline void unlock_wait_lock(struct mutex *lock) +static inline void unlock_wait_lock(struct mutex *lock, unsigned long *fla= gs) { - raw_spin_unlock(&lock->wait_lock); + raw_spin_unlock_irqrestore(&lock->wait_lock, *flags); } =20 static inline void lockdep_assert_wait_lock_held(struct mutex *lock) @@ -144,14 +144,14 @@ __ww_mutex_has_waiters(struct rt_mutex *lock) return rt_mutex_has_waiters(&lock->rtmutex); } =20 -static inline void lock_wait_lock(struct rt_mutex *lock) +static inline void lock_wait_lock(struct rt_mutex *lock, unsigned long *fl= ags) { - raw_spin_lock(&lock->rtmutex.wait_lock); + raw_spin_lock_irqsave(&lock->rtmutex.wait_lock, *flags); } =20 -static inline void unlock_wait_lock(struct rt_mutex *lock) +static inline void unlock_wait_lock(struct rt_mutex *lock, unsigned long *= flags) { - raw_spin_unlock(&lock->rtmutex.wait_lock); + raw_spin_unlock_irqrestore(&lock->rtmutex.wait_lock, *flags); } =20 static inline void lockdep_assert_wait_lock_held(struct rt_mutex *lock) @@ -380,6 +380,7 @@ static __always_inline void ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx= *ctx) { DEFINE_WAKE_Q(wake_q); + unsigned long flags; =20 ww_mutex_lock_acquired(lock, ctx); =20 @@ -408,10 +409,10 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, = struct ww_acquire_ctx *ctx) * Uh oh, we raced in fastpath, check if any of the waiters need to * die or wound us. */ - lock_wait_lock(&lock->base); + lock_wait_lock(&lock->base, &flags); __ww_mutex_check_waiters(&lock->base, ctx, &wake_q); preempt_disable(); - unlock_wait_lock(&lock->base); + unlock_wait_lock(&lock->base, &flags); wake_up_q(&wake_q); preempt_enable(); } --=20 2.47.0.rc1.288.g06298d1525-goog From nobody Wed Nov 27 06:35:12 2024 Received: from mail-pg1-f202.google.com (mail-pg1-f202.google.com [209.85.215.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 08AF81E0B6D for ; Fri, 11 Oct 2024 23:25:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728689144; cv=none; b=dy88fHHplxaesbd2kXX+qbegb5Lj6MkZvijCroHRQp/zTBvCoOfGBKIoKNiZcsPIRLgZ3yA8x9cMeYGUtNDuk2UeT0IjP08dV5zqF4A2mg8/VH+ZprIOgb50AK9e6zs2mgPPwqVogxkCyZhW/eS5dGcUjtlPem52cv4FoBOGu2A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728689144; c=relaxed/simple; bh=urjROjbvvzG3adzYbMqYoVN99qnivB8A7yniPBhNj20=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=frfIcjOFlRF/bq5iEbHp7b5k1leS44llzdKm+FgdzpmrA4MgEDfBfJyzhB68J9NkXNwL2Cc/RGNChC+edkVmjcMcqOgMYXInGQfmSRLuZP9yJS/ToygzPbPNmfgAQNhuQkZnFf1pWjsAYGXhE8v6TkAdKT9DyKYiU9yykCVdpaY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=RxFKRPAm; arc=none smtp.client-ip=209.85.215.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com 
Date: Fri, 11 Oct 2024 16:25:15 -0700
In-Reply-To: <20241011232525.2513424-1-jstultz@google.com>
Mime-Version: 1.0
References: <20241011232525.2513424-1-jstultz@google.com>
Message-ID: <20241011232525.2513424-4-jstultz@google.com>
Subject: [PATCH v13 3/7] locking/mutex: Expose __mutex_owner()
From: John Stultz
To: LKML
Cc: Juri Lelli, Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner, Daniel Lezcano, kernel-team@android.com, Metin Kaya, Valentin Schneider, "Connor O'Brien", John Stultz
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Juri Lelli

Implementing proxy execution requires that scheduler code be able to identify the current owner of a mutex. Expose __mutex_owner() for this purpose (alone!). Includes a null mutex check, so that users of the function can be simplified.
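As a concrete illustration of that simplification (a sketch only: the blocked_on pointer of type struct mutex * used here is an assumption about later patches in the proxy-execution series and is not added by this patch):

/* Hypothetical scheduler-side caller, for illustration only. */
static inline struct task_struct *demo_blocked_on_owner(struct task_struct *p)
{
	struct mutex *m = p->blocked_on;	/* assumed later-series field, may be NULL */

	/*
	 * __mutex_owner() now accepts a NULL mutex and returns NULL, so the
	 * caller needs no separate "is this task blocked at all?" check.
	 */
	return __mutex_owner(m);
}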
Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Mel Gorman Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: kernel-team@android.com Tested-by: K Prateek Nayak Tested-by: Metin Kaya Reviewed-by: Metin Kaya Reviewed-by: Valentin Schneider Signed-off-by: Juri Lelli [Removed the EXPORT_SYMBOL] Signed-off-by: Valentin Schneider Signed-off-by: Connor O'Brien [jstultz: Reworked per Peter's suggestions] Signed-off-by: John Stultz --- v4: * Move __mutex_owner() to kernel/locking/mutex.h instead of adding a new globally available accessor function to keep the exposure of this low, along with keeping it an inline function, as suggested by PeterZ v10: * Handle null lock ptr, to simplify later code, as suggested by Metin Kaya v11: * Tweak commit message suggested by Metin Kaya --- kernel/locking/mutex.c | 25 ------------------------- kernel/locking/mutex.h | 27 +++++++++++++++++++++++++++ 2 files changed, 27 insertions(+), 25 deletions(-) diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index cd248d1767eb..3302e52f0c96 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -56,31 +56,6 @@ __mutex_init(struct mutex *lock, const char *name, struc= t lock_class_key *key) } EXPORT_SYMBOL(__mutex_init); =20 -/* - * @owner: contains: 'struct task_struct *' to the current lock owner, - * NULL means not owned. Since task_struct pointers are aligned at - * at least L1_CACHE_BYTES, we have low bits to store extra state. - * - * Bit0 indicates a non-empty waiter list; unlock must issue a wakeup. - * Bit1 indicates unlock needs to hand the lock to the top-waiter - * Bit2 indicates handoff has been done and we're waiting for pickup. - */ -#define MUTEX_FLAG_WAITERS 0x01 -#define MUTEX_FLAG_HANDOFF 0x02 -#define MUTEX_FLAG_PICKUP 0x04 - -#define MUTEX_FLAGS 0x07 - -/* - * Internal helper function; C doesn't allow us to hide it :/ - * - * DO NOT USE (outside of mutex code). - */ -static inline struct task_struct *__mutex_owner(struct mutex *lock) -{ - return (struct task_struct *)(atomic_long_read(&lock->owner) & ~MUTEX_FLA= GS); -} - static inline struct task_struct *__owner_task(unsigned long owner) { return (struct task_struct *)(owner & ~MUTEX_FLAGS); diff --git a/kernel/locking/mutex.h b/kernel/locking/mutex.h index 0b2a79c4013b..cbff35b9b7ae 100644 --- a/kernel/locking/mutex.h +++ b/kernel/locking/mutex.h @@ -20,6 +20,33 @@ struct mutex_waiter { #endif }; =20 +/* + * @owner: contains: 'struct task_struct *' to the current lock owner, + * NULL means not owned. Since task_struct pointers are aligned at + * at least L1_CACHE_BYTES, we have low bits to store extra state. + * + * Bit0 indicates a non-empty waiter list; unlock must issue a wakeup. + * Bit1 indicates unlock needs to hand the lock to the top-waiter + * Bit2 indicates handoff has been done and we're waiting for pickup. + */ +#define MUTEX_FLAG_WAITERS 0x01 +#define MUTEX_FLAG_HANDOFF 0x02 +#define MUTEX_FLAG_PICKUP 0x04 + +#define MUTEX_FLAGS 0x07 + +/* + * Internal helper function; C doesn't allow us to hide it :/ + * + * DO NOT USE (outside of mutex & scheduler code). 
+ */ +static inline struct task_struct *__mutex_owner(struct mutex *lock) +{ + if (!lock) + return NULL; + return (struct task_struct *)(atomic_long_read(&lock->owner) & ~MUTEX_FLA= GS); +} + #ifdef CONFIG_DEBUG_MUTEXES extern void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter); --=20 2.47.0.rc1.288.g06298d1525-goog From nobody Wed Nov 27 06:35:12 2024 Received: from mail-pg1-f201.google.com (mail-pg1-f201.google.com [209.85.215.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 06C0F1E7670 for ; Fri, 11 Oct 2024 23:25:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728689147; cv=none; b=pwF5sEFm7JiQRw8aTlKb+MA6rvC1Q9XpcV3yP1K3aH8qZmgIAO83FJZVxmFdDkbmZC4dxLa93YTCgiYcAtuhUWH14gpEF4eEysFM+cB3nbpHCzjS3tvgwBHnRmI4JN4MX/YXW4Aj49jzi3kxEsngiuI/PbgJmAb6Qd7nNmaGH4c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728689147; c=relaxed/simple; bh=2MMOxSmQOaj45c8JI7ITrThZt6Tw6kmxdn7MPxX8Zj4=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=pit7BXNQkoyeu9dNWYObOJB+JCwC6inHG9GyUO5OPwX3XvsNYzDGUnzLnyWCUT+fM5TpPMQN9v6xUFRJCK/XtlZpAbX99F1iSj4G0iYnj7L7/rRhJl+bpqGYm4Y7PwfWphsqF3vsJffcbbGNJVGiaXlhJMirk5HM+TSupyELHV4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=VUIe64eR; arc=none smtp.client-ip=209.85.215.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="VUIe64eR" Received: by mail-pg1-f201.google.com with SMTP id 41be03b00d2f7-7ea0069a8b0so2476469a12.0 for ; Fri, 11 Oct 2024 16:25:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1728689144; x=1729293944; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=TmdfVKli6f+t2i8KGgDxePHnjVWNTK9utMAJ0OxuThg=; b=VUIe64eR1CHi4pTsbuiCZdZGQlVO7D8/m6eVKz0E6Sj5zTzCnbbcNfx0W/7aetyCpE kuhy0ZmMBCAIeUbZ3mMQc1E9wWLYdn78UQiAB48imUJa6VJ0Jm3EYqXUxP49nevRTQZM HTQnDWuCptsxL66yGSobea+n0mjk2DlHoqRvCWSkrEJy1fFWDbYIp0NyEdlBT2I9whBN 2w0DQFZeJx7R5kywMzDj4RS2SJu9b9AB050wrj51Oc6yBRQZYBLjPTKlaZ0UIEnjy1qr ZDuFmOWg/RO7GceF95o3Jkgq3XMwNm8wPKF1mkYwQVuKgFvW9g8/rM80qnMCGBod5EpL lr+A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1728689144; x=1729293944; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=TmdfVKli6f+t2i8KGgDxePHnjVWNTK9utMAJ0OxuThg=; b=EdVo+BgcV9st0I7QaOAgqC3vFd14pAIqlNdBT2/KZ7qPLcXCxPawchTOjDOTaj24hy /J80dR1D/rnsVARRjZmGA7gjlFpgOPN1WqIQlzheU/XsqgPD222nrVNRxXnQJZvipOQv 4kDpQ1iMvyuTilQi1WepTXGLEHyyudBX4Pd5I6+LOj7u4o6tjiNhjcSNUh26EGWMQNXN xLoqpPuijMPja60wp6AL7xn4euAmcYVz/sSi4BQsaXFZgYETeea39abQQcPrbn2Mxjgw xs5PU/qyMNNUZssYCFpG3a0mEFusDSHyARY3lcDmxKVn1MUIm+8nKnzIEz+x4FBYEJQj bjQQ== 
X-Gm-Message-State: AOJu0Ywp9/MXtOrat6ko8Fl2OYpNlH7BGlbFGiNO1EGwMbDEhDGzRZS0 hXdyi8adhgw+uSU7lvELjcXbhXSassYEC6FT306cD75D8Y/fKp05qxMjQHctXkxK4O/fCFSw0Jq JWcEuBhLY1BUXBpEOpJ1yLFs6lloYRU3iaFgV234OjiK0VcB0IV3WTffl5gr2E16H1KHQTsmlB7 sJBtG4nrRGdnD2TH/mbT0ryorHHETJDGM4ks/EmY2Ll/80 X-Google-Smtp-Source: AGHT+IEbAY3tUbeddT/n7uJQtatQtjVyFSsRZDwaIBPWNPSgL5WFmRrvAG+MDLJ1h8NYBadiOpOyoi+88I53 X-Received: from jstultz-noogler2.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:600]) (user=jstultz job=sendgmr) by 2002:a17:90a:9a94:b0:2e2:da81:40c1 with SMTP id 98e67ed59e1d1-2e2f09f2280mr6662a91.1.1728689142910; Fri, 11 Oct 2024 16:25:42 -0700 (PDT) Date: Fri, 11 Oct 2024 16:25:16 -0700 In-Reply-To: <20241011232525.2513424-1-jstultz@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20241011232525.2513424-1-jstultz@google.com> X-Mailer: git-send-email 2.47.0.rc1.288.g06298d1525-goog Message-ID: <20241011232525.2513424-5-jstultz@google.com> Subject: [PATCH v13 4/7] sched: Add move_queued_task_locked helper From: John Stultz To: LKML Cc: "Connor O'Brien" , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Mel Gorman , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , kernel-team@android.com, Metin Kaya , John Stultz Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Connor O'Brien Switch logic that deactivates, sets the task cpu, and reactivates a task on a different rq to use a helper that will be later extended to push entire blocked task chains. This patch was broken out from a larger chain migration patch originally by Connor O'Brien. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Mel Gorman Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. 
McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: kernel-team@android.com Tested-by: K Prateek Nayak Tested-by: Metin Kaya Reviewed-by: Metin Kaya Reviewed-by: Valentin Schneider Reviewed-by: Qais Yousef Signed-off-by: Connor O'Brien [jstultz: split out from larger chain migration patch] Signed-off-by: John Stultz --- v8: * Renamed from push_task_chain to do_push_task so it makes more sense without proxy-execution v10: * Changed name to move_queued_task_locked as suggested by Valentin v11: * Also use new helper in __migrate_swap_task() and try_steal_cookie() as suggested by Qais Yousef * Nit cleanups suggested by Metin --- kernel/sched/core.c | 13 +++---------- kernel/sched/deadline.c | 8 ++------ kernel/sched/rt.c | 8 ++------ kernel/sched/sched.h | 12 ++++++++++++ 4 files changed, 19 insertions(+), 22 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 43e453ab7e20..615fc7a7b17c 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2615,9 +2615,7 @@ int push_cpu_stop(void *arg) =20 // XXX validate p is still the highest prio task if (task_rq(p) =3D=3D rq) { - deactivate_task(rq, p, 0); - set_task_cpu(p, lowest_rq->cpu); - activate_task(lowest_rq, p, 0); + move_queued_task_locked(rq, lowest_rq, p); resched_curr(lowest_rq); } =20 @@ -3303,9 +3301,7 @@ static void __migrate_swap_task(struct task_struct *p= , int cpu) rq_pin_lock(src_rq, &srf); rq_pin_lock(dst_rq, &drf); =20 - deactivate_task(src_rq, p, 0); - set_task_cpu(p, cpu); - activate_task(dst_rq, p, 0); + move_queued_task_locked(src_rq, dst_rq, p); wakeup_preempt(dst_rq, p, 0); =20 rq_unpin_lock(dst_rq, &drf); @@ -6293,10 +6289,7 @@ static bool try_steal_cookie(int this, int that) if (sched_task_is_throttled(p, this)) goto next; =20 - deactivate_task(src, p, 0); - set_task_cpu(p, this); - activate_task(dst, p, 0); - + move_queued_task_locked(src, dst, p); resched_curr(dst); =20 success =3D true; diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 9ce93d0bf452..6c87d812efbe 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -2751,9 +2751,7 @@ static int push_dl_task(struct rq *rq) goto retry; } =20 - deactivate_task(rq, next_task, 0); - set_task_cpu(next_task, later_rq->cpu); - activate_task(later_rq, next_task, 0); + move_queued_task_locked(rq, later_rq, next_task); ret =3D 1; =20 resched_curr(later_rq); @@ -2839,9 +2837,7 @@ static void pull_dl_task(struct rq *this_rq) if (is_migration_disabled(p)) { push_task =3D get_push_task(src_rq); } else { - deactivate_task(src_rq, p, 0); - set_task_cpu(p, this_cpu); - activate_task(this_rq, p, 0); + move_queued_task_locked(src_rq, this_rq, p); dmin =3D p->dl.deadline; resched =3D true; } diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 172c588de542..e2506ab33c97 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -2088,9 +2088,7 @@ static int push_rt_task(struct rq *rq, bool pull) goto retry; } =20 - deactivate_task(rq, next_task, 0); - set_task_cpu(next_task, lowest_rq->cpu); - activate_task(lowest_rq, next_task, 0); + move_queued_task_locked(rq, lowest_rq, next_task); resched_curr(lowest_rq); ret =3D 1; =20 @@ -2361,9 +2359,7 @@ static void pull_rt_task(struct rq *this_rq) if (is_migration_disabled(p)) { push_task =3D get_push_task(src_rq); } else { - deactivate_task(src_rq, p, 0); - set_task_cpu(p, this_cpu); - activate_task(this_rq, p, 0); + move_queued_task_locked(src_rq, this_rq, p); resched =3D true; } /* diff --git a/kernel/sched/sched.h 
b/kernel/sched/sched.h index b1c3588a8f00..b904a5004eae 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -3768,6 +3768,18 @@ static inline void init_sched_mm_cid(struct task_str= uct *t) { } =20 extern u64 avg_vruntime(struct cfs_rq *cfs_rq); extern int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se); +#ifdef CONFIG_SMP +static inline +void move_queued_task_locked(struct rq *src_rq, struct rq *dst_rq, struct = task_struct *task) +{ + lockdep_assert_rq_held(src_rq); + lockdep_assert_rq_held(dst_rq); + + deactivate_task(src_rq, task, 0); + set_task_cpu(task, dst_rq->cpu); + activate_task(dst_rq, task, 0); +} +#endif =20 #ifdef CONFIG_RT_MUTEXES =20 --=20 2.47.0.rc1.288.g06298d1525-goog From nobody Wed Nov 27 06:35:12 2024 Received: from mail-pl1-f202.google.com (mail-pl1-f202.google.com [209.85.214.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6DE431D278A for ; Fri, 11 Oct 2024 23:25:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728689149; cv=none; b=k//0tvsSNK+uZFmgvHXcHLi61Y4zJ9ftZS4lMk275cpFS2pE6GufqZoMO06LXByGQePUTKdkuVbtUJKqBnv7+f2oEPHyEa7oPM2MMvUK+URMsXB6bxAuQnwRkjlwjmpnMBrwHhAyUbeLZm4rBpMATwe20+r5yS4IZxztOWJrktg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728689149; c=relaxed/simple; bh=vKVhXyUCQW3VrmxMgwIAKNIWXaRjutTkmiYmUdOL7y8=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=YA4uORysuDtxLLp2zWpafnPfXxhnRIfhJicG/DhpuKScwuFFUOCpRjDX4qinDR5PlW99Zw/rwUkZWL6kfYpSA7uCISd6nXJbsnrMMuY7LLuQAolkOW0Cxy0aihkjh8kHPF3w9eYy8fQ0i3PSRFECjqDVQLP6GX3q1PYHUTjDJrM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=Pb4BnJea; arc=none smtp.client-ip=209.85.214.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Pb4BnJea" Received: by mail-pl1-f202.google.com with SMTP id d9443c01a7336-20c98aa6b4bso16147685ad.0 for ; Fri, 11 Oct 2024 16:25:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1728689147; x=1729293947; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=JZKyKnHvG8nikVkXgW9xs5MrACmc1p4QGfsCdciCMo4=; b=Pb4BnJea98IaD5B0gKRVwf0d5wzJ4bPU8h5Ez95erQpL4AolgPM5zjF4B7sHBcrzDl 0f+4jlpLXjCDX+YyEh9FGcIbsSJvgNfTvi04aZh/ykfS//W3eqLB776VTj9mbu0EBYUY xU5ex9BnfesmpMgUTvpLw3kCjJvjE/wrvWs3KooV2FpHP7CSkjZetyWik6G5VputEMD/ CNjP0e0ZtxC/AYVF+J+EY0K1VPKPLWV9v7RNgh8VklOafWY72iG3WZDHNbjHZinW2tyk D4mPRX6hIqM/RMoEODxg2y2S4AhqHp0jz/95Rgs3+Ixnhuw7PaWArtdxMjvhWhYbptzF tG+w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1728689147; x=1729293947; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; 
bh=JZKyKnHvG8nikVkXgW9xs5MrACmc1p4QGfsCdciCMo4=; b=t9BjToq5HSaOdcAVyAPWRavOMbISmtqdBDgz+Y/k0viJ6t9hY2RTw/pkslyWRbtZTp vFmrvNi4svEGEi29pd29YX+qzgHziGvC5WoOZ80+cA2sWkESF6ovEX8+3oe1y+zGAu36 3BFDmP91vMMsz2Heuilue5N/mbzv26mtNUtM/JGR8ThISjfjazpUJcMyrfObFegios+3 mzVyH+Copj3s0QXH3uBV+6TUcN7R3692hF6BauZ+kK8MHuH156eNhBrJuXpQNm9iApKX PAqeYmZBYJ2gO2tO6u7gtNDnFysa8WrbEMAMINB3RBzyHeBNtGvk84msF/NAXC1muXjd Euww== X-Gm-Message-State: AOJu0Yw3bp+PSpahx5y9gMIlZctOMB+BCP/PhwIu7t7ed4o0TbT7KuVW bwzyoc2eKVNfN0WlT9IMoh/bMDsYOS2yQmW1qBEhDt+lh9cFLksfQXWtuQ2wrSlnAhN9Lp1Rff+ sU10dj8+f/c1Bhgxegnjz6q3o02dCEpMshyYfcx/44hyH+c7cFjdBs1pvCAtqPTvGPv7V98m91o 2JK3xnR4MydAX+hzBXVg2N8QTPH8DCF5FgmKyqAlvWQcBv X-Google-Smtp-Source: AGHT+IFnuEvQlaT8Q+rAgHUYmvhhqBT1iMhn0UnSzE2/kqSQ0d+r4yswJ7gwmgr3nI8NpsGnXcI/Qyig4jzH X-Received: from jstultz-noogler2.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:600]) (user=jstultz job=sendgmr) by 2002:a17:902:c94b:b0:20c:8b07:c66b with SMTP id d9443c01a7336-20ca17378efmr94865ad.11.1728689144888; Fri, 11 Oct 2024 16:25:44 -0700 (PDT) Date: Fri, 11 Oct 2024 16:25:17 -0700 In-Reply-To: <20241011232525.2513424-1-jstultz@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20241011232525.2513424-1-jstultz@google.com> X-Mailer: git-send-email 2.47.0.rc1.288.g06298d1525-goog Message-ID: <20241011232525.2513424-6-jstultz@google.com> Subject: [PATCH v13 5/7] sched: Consolidate pick_*_task to task_is_pushable helper From: John Stultz To: LKML Cc: "Connor O'Brien" , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Mel Gorman , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , kernel-team@android.com, Metin Kaya , Christian Loehle , John Stultz Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Connor O'Brien This patch consolidates rt and deadline pick_*_task functions to a task_is_pushable() helper This patch was broken out from a larger chain migration patch originally by Connor O'Brien. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Mel Gorman Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: kernel-team@android.com Tested-by: K Prateek Nayak Tested-by: Metin Kaya Reviewed-by: Metin Kaya Reviewed-by: Valentin Schneider Reviewed-by: Christian Loehle Signed-off-by: Connor O'Brien [jstultz: split out from larger chain migration patch, renamed helper function] Signed-off-by: John Stultz --- v7: * Split from chain migration patch * Renamed function v11: * Switched to bool (though later in the series it goes to a tri-state return) for now to simplify review. Will add tri-state handling later in the series when its needed. Suggested by Metin and others. 
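For context on how the helpers from patches 4 and 5 are meant to compose, a rough sketch of a push path follows. demo_push_one() is made up for illustration; finding dst_rq and taking both runqueue locks is assumed to have been done by the caller, as move_queued_task_locked() requires.

static bool demo_push_one(struct rq *src_rq, struct rq *dst_rq, struct task_struct *p)
{
	/* Both rq locks are held here; move_queued_task_locked() asserts this. */
	if (!task_is_pushable(src_rq, p, cpu_of(dst_rq)))
		return false;

	/* deactivate on src, set_task_cpu(), activate on dst */
	move_queued_task_locked(src_rq, dst_rq, p);
	resched_curr(dst_rq);	/* make the destination CPU notice the new task */
	return true;
}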
--- kernel/sched/deadline.c | 10 +--------- kernel/sched/rt.c | 11 +---------- kernel/sched/sched.h | 10 ++++++++++ 3 files changed, 12 insertions(+), 19 deletions(-)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 6c87d812efbe..56260a80a268 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -2487,14 +2487,6 @@ static void task_fork_dl(struct task_struct *p) /* Only try algorithms three times */ #define DL_MAX_TRIES 3 -static int pick_dl_task(struct rq *rq, struct task_struct *p, int cpu) -{ - if (!task_on_cpu(rq, p) && - cpumask_test_cpu(cpu, &p->cpus_mask)) - return 1; - return 0; -} - /* * Return the earliest pushable rq's task, which is suitable to be executed * on the CPU, NULL otherwise: @@ -2513,7 +2505,7 @@ static struct task_struct *pick_earliest_pushable_dl_task(struct rq *rq, int cpu if (next_node) { p = __node_2_pdl(next_node); - if (pick_dl_task(rq, p, cpu)) + if (task_is_pushable(rq, p, cpu)) return p; next_node = rb_next(next_node);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index e2506ab33c97..c5c22fc51824 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -1773,15 +1773,6 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p, struct task_s /* Only try algorithms three times */ #define RT_MAX_TRIES 3 -static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu) -{ - if (!task_on_cpu(rq, p) && - cpumask_test_cpu(cpu, &p->cpus_mask)) - return 1; - - return 0; -} - /* * Return the highest pushable rq's task, which is suitable to be executed * on the CPU, NULL otherwise @@ -1795,7 +1786,7 @@ static struct task_struct *pick_highest_pushable_task(struct rq *rq, int cpu) return NULL; plist_for_each_entry(p, head, pushable_tasks) { - if (pick_rt_task(rq, p, cpu)) + if (task_is_pushable(rq, p, cpu)) return p; }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index b904a5004eae..cb74a577c89d 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -3779,6 +3779,16 @@ void move_queued_task_locked(struct rq *src_rq, struct rq *dst_rq, struct task_s set_task_cpu(task, dst_rq->cpu); activate_task(dst_rq, task, 0); } + +static inline +bool task_is_pushable(struct rq *rq, struct task_struct *p, int cpu) +{ + if (!task_on_cpu(rq, p) && + cpumask_test_cpu(cpu, &p->cpus_mask)) + return true; + + return false; +} #endif #ifdef CONFIG_RT_MUTEXES
-- 2.47.0.rc1.288.g06298d1525-goog
From nobody Wed Nov 27 06:35:12 2024
Date: Fri, 11 Oct 2024 16:25:18 -0700 In-Reply-To: <20241011232525.2513424-1-jstultz@google.com> Message-ID: <20241011232525.2513424-7-jstultz@google.com> Subject: [PATCH v13 6/7] sched: Split out __schedule() deactivate task logic into a helper From: John Stultz To: LKML Cc: John Stultz , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider ,
Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Mel Gorman , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , kernel-team@android.com, Metin Kaya Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" As we're going to re-use the deactivation logic, split it into a helper. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Mel Gorman Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: kernel-team@android.com Tested-by: K Prateek Nayak Tested-by: Metin Kaya Reviewed-by: Metin Kaya Reviewed-by: Qais Yousef Signed-off-by: John Stultz --- v6: * Define function as static to avoid "no previous prototype" warnings as Reported-by: kernel test robot v7: * Rename state task_state to be more clear, as suggested by Metin Kaya v11: * Return early to simplify indentation, and drop unused bool return (will be introduced later when its needed) as sugggested by Qais. --- kernel/sched/core.c | 65 +++++++++++++++++++++++++++------------------ 1 file changed, 39 insertions(+), 26 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 615fc7a7b17c..7f949bb9e2a8 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6483,6 +6483,44 @@ pick_next_task(struct rq *rq, struct task_struct *pr= ev, struct rq_flags *rf) #define SM_PREEMPT 1 #define SM_RTLOCK_WAIT 2 =20 +/* + * Helper function for __schedule() + * + * If a task does not have signals pending, deactivate it + * Otherwise marks the task's __state as RUNNING + */ +static void try_to_deactivate_task(struct rq *rq, struct task_struct *p, + unsigned long task_state) +{ + int flags =3D DEQUEUE_NOCLOCK; + + if (signal_pending_state(task_state, p)) { + WRITE_ONCE(p->__state, TASK_RUNNING); + return; + } + + p->sched_contributes_to_load =3D + (task_state & TASK_UNINTERRUPTIBLE) && + !(task_state & TASK_NOLOAD) && + !(task_state & TASK_FROZEN); + + if (unlikely(is_special_task_state(task_state))) + flags |=3D DEQUEUE_SPECIAL; + + /* + * __schedule() ttwu() + * prev_state =3D prev->state; if (p->on_rq && ...) + * if (prev_state) goto out; + * p->on_rq =3D 0; smp_acquire__after_ctrl_dep(); + * p->state =3D TASK_WAKING + * + * Where __schedule() and ttwu() have matching control dependencies. + * + * After this, schedule() must not care about p->state any more. + */ + block_task(rq, p, flags); +} + /* * __schedule() is the main scheduler function. * @@ -6590,32 +6628,7 @@ static void __sched notrace __schedule(int sched_mod= e) goto picked; } } else if (!preempt && prev_state) { - if (signal_pending_state(prev_state, prev)) { - WRITE_ONCE(prev->__state, TASK_RUNNING); - } else { - int flags =3D DEQUEUE_NOCLOCK; - - prev->sched_contributes_to_load =3D - (prev_state & TASK_UNINTERRUPTIBLE) && - !(prev_state & TASK_NOLOAD) && - !(prev_state & TASK_FROZEN); - - if (unlikely(is_special_task_state(prev_state))) - flags |=3D DEQUEUE_SPECIAL; - - /* - * __schedule() ttwu() - * prev_state =3D prev->state; if (p->on_rq && ...) - * if (prev_state) goto out; - * p->on_rq =3D 0; smp_acquire__after_ctrl_dep(); - * p->state =3D TASK_WAKING - * - * Where __schedule() and ttwu() have matching control dependencies. 
- * - * After this, schedule() must not care about p->state any more. - */ - block_task(rq, prev, flags); - } + try_to_deactivate_task(rq, prev, prev_state); switch_count = &prev->nvcsw; }
-- 2.47.0.rc1.288.g06298d1525-goog
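For readability, the helper split out by the patch above decodes to the following (the code is what the diff adds to kernel/sched/core.c; the short comments are editorial and the original __schedule()/ttwu() ordering comment is elided):

static void try_to_deactivate_task(struct rq *rq, struct task_struct *p,
				   unsigned long task_state)
{
	int flags = DEQUEUE_NOCLOCK;

	/* A pending signal cancels the sleep: mark the task runnable again. */
	if (signal_pending_state(task_state, p)) {
		WRITE_ONCE(p->__state, TASK_RUNNING);
		return;
	}

	p->sched_contributes_to_load =
		(task_state & TASK_UNINTERRUPTIBLE) &&
		!(task_state & TASK_NOLOAD) &&
		!(task_state & TASK_FROZEN);

	if (unlikely(is_special_task_state(task_state)))
		flags |= DEQUEUE_SPECIAL;

	/* Otherwise dequeue (block) the task. */
	block_task(rq, p, flags);
}

__schedule() then simply calls try_to_deactivate_task(rq, prev, prev_state) where the open-coded block used to be.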
From nobody Wed Nov 27 06:35:12 2024
Date: Fri, 11 Oct 2024 16:25:19 -0700 In-Reply-To: <20241011232525.2513424-1-jstultz@google.com> Message-ID: <20241011232525.2513424-8-jstultz@google.com> Subject: [PATCH v13 7/7] sched: Split scheduler and execution contexts From: John Stultz To: LKML Cc: Peter Zijlstra , Joel Fernandes , Qais Yousef , Ingo Molnar , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Mel Gorman , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Xuewen Yan , K Prateek Nayak , Metin Kaya , Thomas Gleixner , Daniel Lezcano , kernel-team@android.com, Metin Kaya , "Connor O'Brien" , John Stultz

From: Peter Zijlstra

Let's define the "scheduling context" as all the scheduler state in task_struct for the task chosen to run, which we'll call the donor task, and the "execution context" as all state required to actually run the task. Currently both are intertwined in task_struct. We want to logically split these such that we can use the scheduling context of the donor task selected to be scheduled, but use the execution context of a different task to actually be run.

To this end, introduce an rq->donor field that points to the task_struct chosen from the runqueue by the scheduler and is used for scheduler state, and preserve rq->curr to indicate the execution context of the task that will actually be run. This patch introduces the donor field as a union with curr, so it doesn't cause the contexts to be split yet, but adds the logic to handle everything separately.

Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Mel Gorman Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E.
McKenney" Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Metin Kaya Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: kernel-team@android.com Tested-by: K Prateek Nayak Tested-by: Metin Kaya Reviewed-by: Metin Kaya Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Juri Lelli Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20181009092434.26221-5-juri.lelli@redhat.com [add additional comments and update more sched_class code to use rq::proxy] Signed-off-by: Connor O'Brien [jstultz: Rebased and resolved minor collisions, reworked to use accessors, tweaked update_curr_common to use rq_proxy fixing rt scheduling issues] Signed-off-by: John Stultz --- v2: * Reworked to use accessors * Fixed update_curr_common to use proxy instead of curr v3: * Tweaked wrapper names * Swapped proxy for selected for clarity v4: * Minor variable name tweaks for readability * Use a macro instead of a inline function and drop other helper functions as suggested by Peter. * Remove verbose comments/questions to avoid review distractions, as suggested by Dietmar v5: * Add CONFIG_PROXY_EXEC option to this patch so the new logic can be tested with this change * Minor fix to grab rq_selected when holding the rq lock v7: * Minor spelling fix and unused argument fixes suggested by Metin Kaya * Switch to curr_selected for consistency, and minor rewording of commit message for clarity * Rename variables selected instead of curr when we're using rq_selected() * Reduce macros in CONFIG_SCHED_PROXY_EXEC ifdef sections, as suggested by Metin Kaya v8: * Use rq->curr, not rq_selected with task_tick, as suggested by Valentin * Minor rework to reorder this with CONFIG_SCHED_PROXY_EXEC patch v10: * Use rq_selected in push_rt_task & get_push_task v11: * Rework to use selected instead of curr in a few cases we were previously assigning curr =3D rq_selected() to minimize lines of change. Suggested by Metin. v12: * Big rename to use rq->donor instead of rq_selected(), as suggested by Peter. --- kernel/sched/core.c | 45 +++++++++++++++++++++++--------------- kernel/sched/deadline.c | 39 +++++++++++++++++---------------- kernel/sched/fair.c | 28 ++++++++++++------------ kernel/sched/pelt.c | 2 +- kernel/sched/rt.c | 48 ++++++++++++++++++++--------------------- kernel/sched/sched.h | 28 +++++++++++++++++++++--- kernel/sched/syscalls.c | 4 ++-- 7 files changed, 114 insertions(+), 80 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 7f949bb9e2a8..74f01fbff726 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -827,7 +827,7 @@ static enum hrtimer_restart hrtick(struct hrtimer *time= r) =20 rq_lock(rq, &rf); update_rq_clock(rq); - rq->curr->sched_class->task_tick(rq, rq->curr, 1); + rq->donor->sched_class->task_tick(rq, rq->curr, 1); rq_unlock(rq, &rf); =20 return HRTIMER_NORESTART; @@ -2130,16 +2130,18 @@ void check_class_changed(struct rq *rq, struct task= _struct *p, =20 void wakeup_preempt(struct rq *rq, struct task_struct *p, int flags) { - if (p->sched_class =3D=3D rq->curr->sched_class) - rq->curr->sched_class->wakeup_preempt(rq, p, flags); - else if (sched_class_above(p->sched_class, rq->curr->sched_class)) + struct task_struct *donor =3D rq->donor; + + if (p->sched_class =3D=3D donor->sched_class) + donor->sched_class->wakeup_preempt(rq, p, flags); + else if (sched_class_above(p->sched_class, donor->sched_class)) resched_curr(rq); =20 /* * A queue event has occurred, and we're going to schedule. In * this case, we can save a useless back to back clock update. 
*/ - if (task_on_rq_queued(rq->curr) && test_tsk_need_resched(rq->curr)) + if (task_on_rq_queued(donor) && test_tsk_need_resched(rq->curr)) rq_clock_skip_update(rq); } =20 @@ -2675,7 +2677,7 @@ __do_set_cpus_allowed(struct task_struct *p, struct a= ffinity_context *ctx) lockdep_assert_held(&p->pi_lock); =20 queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_donor(rq, p); =20 if (queued) { /* @@ -5500,7 +5502,7 @@ unsigned long long task_sched_runtime(struct task_str= uct *p) * project cycles that may never be accounted to this * thread, breaking clock_gettime(). */ - if (task_current(rq, p) && task_on_rq_queued(p)) { + if (task_current_donor(rq, p) && task_on_rq_queued(p)) { prefetch_curr_exec_start(p); update_rq_clock(rq); p->sched_class->update_curr(rq); @@ -5568,7 +5570,8 @@ void sched_tick(void) { int cpu =3D smp_processor_id(); struct rq *rq =3D cpu_rq(cpu); - struct task_struct *curr; + /* accounting goes to the donor task */ + struct task_struct *donor; struct rq_flags rf; unsigned long hw_pressure; u64 resched_latency; @@ -5579,19 +5582,19 @@ void sched_tick(void) sched_clock_tick(); =20 rq_lock(rq, &rf); + donor =3D rq->donor; =20 - curr =3D rq->curr; - psi_account_irqtime(rq, curr, NULL); + psi_account_irqtime(rq, donor, NULL); =20 update_rq_clock(rq); hw_pressure =3D arch_scale_hw_pressure(cpu_of(rq)); update_hw_load_avg(rq_clock_task(rq), rq, hw_pressure); - curr->sched_class->task_tick(rq, curr, 0); + donor->sched_class->task_tick(rq, donor, 0); if (sched_feat(LATENCY_WARN)) resched_latency =3D cpu_resched_latency(rq); calc_global_load_tick(rq); sched_core_tick(rq); - task_tick_mm_cid(rq, curr); + task_tick_mm_cid(rq, donor); scx_tick(rq); =20 rq_unlock(rq, &rf); @@ -5601,8 +5604,8 @@ void sched_tick(void) =20 perf_event_task_tick(); =20 - if (curr->flags & PF_WQ_WORKER) - wq_worker_tick(curr); + if (donor->flags & PF_WQ_WORKER) + wq_worker_tick(donor); =20 #ifdef CONFIG_SMP if (!scx_switched_all()) { @@ -5669,6 +5672,12 @@ static void sched_tick_remote(struct work_struct *wo= rk) struct task_struct *curr =3D rq->curr; =20 if (cpu_online(cpu)) { + /* + * Since this is a remote tick for full dynticks mode, + * we are always sure that there is no proxy (only a + * single task is running). 
+ */ + SCHED_WARN_ON(rq->curr !=3D rq->donor); update_rq_clock(rq); =20 if (!is_idle_task(curr)) { @@ -6633,6 +6642,7 @@ static void __sched notrace __schedule(int sched_mode) } =20 next =3D pick_next_task(rq, prev, &rf); + rq_set_donor(rq, next); picked: clear_tsk_need_resched(prev); clear_preempt_need_resched(); @@ -7134,7 +7144,7 @@ void rt_mutex_setprio(struct task_struct *p, struct t= ask_struct *pi_task) =20 prev_class =3D p->sched_class; queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_donor(rq, p); if (queued) dequeue_task(rq, p, queue_flag); if (running) @@ -7702,6 +7712,7 @@ void __init init_idle(struct task_struct *idle, int c= pu) rcu_read_unlock(); =20 rq->idle =3D idle; + rq_set_donor(rq, idle); rcu_assign_pointer(rq->curr, idle); idle->on_rq =3D TASK_ON_RQ_QUEUED; #ifdef CONFIG_SMP @@ -7791,7 +7802,7 @@ void sched_setnuma(struct task_struct *p, int nid) =20 rq =3D task_rq_lock(p, &rf); queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_donor(rq, p); =20 if (queued) dequeue_task(rq, p, DEQUEUE_SAVE); @@ -8941,7 +8952,7 @@ void sched_move_task(struct task_struct *tsk) =20 update_rq_clock(rq); =20 - running =3D task_current(rq, tsk); + running =3D task_current_donor(rq, tsk); queued =3D task_on_rq_queued(tsk); =20 if (queued) diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 56260a80a268..90ddab256072 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -1339,7 +1339,7 @@ static enum hrtimer_restart dl_task_timer(struct hrti= mer *timer) #endif =20 enqueue_task_dl(rq, p, ENQUEUE_REPLENISH); - if (dl_task(rq->curr)) + if (dl_task(rq->donor)) wakeup_preempt_dl(rq, p, 0); else resched_curr(rq); @@ -1736,11 +1736,11 @@ int dl_server_apply_params(struct sched_dl_entity *= dl_se, u64 runtime, u64 perio */ static void update_curr_dl(struct rq *rq) { - struct task_struct *curr =3D rq->curr; - struct sched_dl_entity *dl_se =3D &curr->dl; + struct task_struct *donor =3D rq->donor; + struct sched_dl_entity *dl_se =3D &donor->dl; s64 delta_exec; =20 - if (!dl_task(curr) || !on_dl_rq(dl_se)) + if (!dl_task(donor) || !on_dl_rq(dl_se)) return; =20 /* @@ -2213,7 +2213,7 @@ static int find_later_rq(struct task_struct *task); static int select_task_rq_dl(struct task_struct *p, int cpu, int flags) { - struct task_struct *curr; + struct task_struct *curr, *donor; bool select_rq; struct rq *rq; =20 @@ -2224,6 +2224,7 @@ select_task_rq_dl(struct task_struct *p, int cpu, int= flags) =20 rcu_read_lock(); curr =3D READ_ONCE(rq->curr); /* unlocked access */ + donor =3D READ_ONCE(rq->donor); =20 /* * If we are dealing with a -deadline task, we must @@ -2234,9 +2235,9 @@ select_task_rq_dl(struct task_struct *p, int cpu, int= flags) * other hand, if it has a shorter deadline, we * try to make it stay here, it might be important. */ - select_rq =3D unlikely(dl_task(curr)) && + select_rq =3D unlikely(dl_task(donor)) && (curr->nr_cpus_allowed < 2 || - !dl_entity_preempt(&p->dl, &curr->dl)) && + !dl_entity_preempt(&p->dl, &donor->dl)) && p->nr_cpus_allowed > 1; =20 /* @@ -2299,7 +2300,7 @@ static void check_preempt_equal_dl(struct rq *rq, str= uct task_struct *p) * let's hope p can move out. 
*/ if (rq->curr->nr_cpus_allowed =3D=3D 1 || - !cpudl_find(&rq->rd->cpudl, rq->curr, NULL)) + !cpudl_find(&rq->rd->cpudl, rq->donor, NULL)) return; =20 /* @@ -2338,7 +2339,7 @@ static int balance_dl(struct rq *rq, struct task_stru= ct *p, struct rq_flags *rf) static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p, int flags) { - if (dl_entity_preempt(&p->dl, &rq->curr->dl)) { + if (dl_entity_preempt(&p->dl, &rq->donor->dl)) { resched_curr(rq); return; } @@ -2348,7 +2349,7 @@ static void wakeup_preempt_dl(struct rq *rq, struct t= ask_struct *p, * In the unlikely case current and p have the same deadline * let us try to decide what's the best thing to do... */ - if ((p->dl.deadline =3D=3D rq->curr->dl.deadline) && + if ((p->dl.deadline =3D=3D rq->donor->dl.deadline) && !test_tsk_need_resched(rq->curr)) check_preempt_equal_dl(rq, p); #endif /* CONFIG_SMP */ @@ -2380,7 +2381,7 @@ static void set_next_task_dl(struct rq *rq, struct ta= sk_struct *p, bool first) if (!first) return; =20 - if (rq->curr->sched_class !=3D &dl_sched_class) + if (rq->donor->sched_class !=3D &dl_sched_class) update_dl_rq_load_avg(rq_clock_pelt(rq), rq, 0); =20 deadline_queue_push_tasks(rq); @@ -2699,8 +2700,8 @@ static int push_dl_task(struct rq *rq) * can move away, it makes sense to just reschedule * without going further in pushing next_task. */ - if (dl_task(rq->curr) && - dl_time_before(next_task->dl.deadline, rq->curr->dl.deadline) && + if (dl_task(rq->donor) && + dl_time_before(next_task->dl.deadline, rq->donor->dl.deadline) && rq->curr->nr_cpus_allowed > 1) { resched_curr(rq); return 0; @@ -2823,7 +2824,7 @@ static void pull_dl_task(struct rq *this_rq) * deadline than the current task of its runqueue. */ if (dl_time_before(p->dl.deadline, - src_rq->curr->dl.deadline)) + src_rq->donor->dl.deadline)) goto skip; =20 if (is_migration_disabled(p)) { @@ -2862,9 +2863,9 @@ static void task_woken_dl(struct rq *rq, struct task_= struct *p) if (!task_on_cpu(rq, p) && !test_tsk_need_resched(rq->curr) && p->nr_cpus_allowed > 1 && - dl_task(rq->curr) && + dl_task(rq->donor) && (rq->curr->nr_cpus_allowed < 2 || - !dl_entity_preempt(&p->dl, &rq->curr->dl))) { + !dl_entity_preempt(&p->dl, &rq->donor->dl))) { push_dl_tasks(rq); } } @@ -3039,12 +3040,12 @@ static void switched_to_dl(struct rq *rq, struct ta= sk_struct *p) return; } =20 - if (rq->curr !=3D p) { + if (rq->donor !=3D p) { #ifdef CONFIG_SMP if (p->nr_cpus_allowed > 1 && rq->dl.overloaded) deadline_queue_push_tasks(rq); #endif - if (dl_task(rq->curr)) + if (dl_task(rq->donor)) wakeup_preempt_dl(rq, p, 0); else resched_curr(rq); @@ -3073,7 +3074,7 @@ static void prio_changed_dl(struct rq *rq, struct tas= k_struct *p, if (!rq->dl.overloaded) deadline_queue_pull_task(rq); =20 - if (task_current(rq, p)) { + if (task_current_donor(rq, p)) { /* * If we now have a earlier deadline task than p, * then reschedule, provided p is still on this diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 5a621210c9c1..078ef61dd1a0 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1200,12 +1200,12 @@ static inline bool do_preempt_short(struct cfs_rq *= cfs_rq, */ s64 update_curr_common(struct rq *rq) { - struct task_struct *curr =3D rq->curr; + struct task_struct *donor =3D rq->donor; s64 delta_exec; =20 - delta_exec =3D update_curr_se(rq, &curr->se); + delta_exec =3D update_curr_se(rq, &donor->se); if (likely(delta_exec > 0)) - update_curr_task(curr, delta_exec); + update_curr_task(donor, delta_exec); =20 return delta_exec; } @@ -1258,7 +1258,7 @@ static 
void update_curr(struct cfs_rq *cfs_rq) =20 static void update_curr_fair(struct rq *rq) { - update_curr(cfs_rq_of(&rq->curr->se)); + update_curr(cfs_rq_of(&rq->donor->se)); } =20 static inline void @@ -6812,7 +6812,7 @@ static void hrtick_start_fair(struct rq *rq, struct t= ask_struct *p) s64 delta =3D slice - ran; =20 if (delta < 0) { - if (task_current(rq, p)) + if (task_current_donor(rq, p)) resched_curr(rq); return; } @@ -6827,12 +6827,12 @@ static void hrtick_start_fair(struct rq *rq, struct= task_struct *p) */ static void hrtick_update(struct rq *rq) { - struct task_struct *curr =3D rq->curr; + struct task_struct *donor =3D rq->donor; =20 - if (!hrtick_enabled_fair(rq) || curr->sched_class !=3D &fair_sched_class) + if (!hrtick_enabled_fair(rq) || donor->sched_class !=3D &fair_sched_class) return; =20 - hrtick_start_fair(rq, curr); + hrtick_start_fair(rq, donor); } #else /* !CONFIG_SCHED_HRTICK */ static inline void @@ -8747,9 +8747,9 @@ static void set_next_buddy(struct sched_entity *se) */ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p= , int wake_flags) { - struct task_struct *curr =3D rq->curr; - struct sched_entity *se =3D &curr->se, *pse =3D &p->se; - struct cfs_rq *cfs_rq =3D task_cfs_rq(curr); + struct task_struct *donor =3D rq->donor; + struct sched_entity *se =3D &donor->se, *pse =3D &p->se; + struct cfs_rq *cfs_rq =3D task_cfs_rq(donor); int cse_is_idle, pse_is_idle; =20 if (unlikely(se =3D=3D pse)) @@ -8778,7 +8778,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, = struct task_struct *p, int * prevents us from potentially nominating it as a false LAST_BUDDY * below. */ - if (test_tsk_need_resched(curr)) + if (test_tsk_need_resched(rq->curr)) return; =20 if (!sched_feat(WAKEUP_PREEMPTION)) @@ -13077,7 +13077,7 @@ prio_changed_fair(struct rq *rq, struct task_struct= *p, int oldprio) * our priority decreased, or if we are not currently running on * this runqueue and our priority is higher than the current's */ - if (task_current(rq, p)) { + if (task_current_donor(rq, p)) { if (p->prio > oldprio) resched_curr(rq); } else @@ -13200,7 +13200,7 @@ static void switched_to_fair(struct rq *rq, struct = task_struct *p) * kick off the schedule if running, otherwise just see * if we can still preempt the current task. 
*/ - if (task_current(rq, p)) + if (task_current_donor(rq, p)) resched_curr(rq); else wakeup_preempt(rq, p, 0); diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c index a9c65d97b3ca..fc07382361a8 100644 --- a/kernel/sched/pelt.c +++ b/kernel/sched/pelt.c @@ -476,7 +476,7 @@ int update_irq_load_avg(struct rq *rq, u64 running) bool update_other_load_avgs(struct rq *rq) { u64 now =3D rq_clock_pelt(rq); - const struct sched_class *curr_class =3D rq->curr->sched_class; + const struct sched_class *curr_class =3D rq->donor->sched_class; unsigned long hw_pressure =3D arch_scale_hw_pressure(cpu_of(rq)); =20 lockdep_assert_rq_held(rq); diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index c5c22fc51824..bd66a46b06ac 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -528,7 +528,7 @@ static void dequeue_rt_entity(struct sched_rt_entity *r= t_se, unsigned int flags) =20 static void sched_rt_rq_enqueue(struct rt_rq *rt_rq) { - struct task_struct *curr =3D rq_of_rt_rq(rt_rq)->curr; + struct task_struct *donor =3D rq_of_rt_rq(rt_rq)->donor; struct rq *rq =3D rq_of_rt_rq(rt_rq); struct sched_rt_entity *rt_se; =20 @@ -542,7 +542,7 @@ static void sched_rt_rq_enqueue(struct rt_rq *rt_rq) else if (!on_rt_rq(rt_se)) enqueue_rt_entity(rt_se, 0); =20 - if (rt_rq->highest_prio.curr < curr->prio) + if (rt_rq->highest_prio.curr < donor->prio) resched_curr(rq); } } @@ -988,10 +988,10 @@ static inline int rt_se_prio(struct sched_rt_entity *= rt_se) */ static void update_curr_rt(struct rq *rq) { - struct task_struct *curr =3D rq->curr; + struct task_struct *donor =3D rq->donor; s64 delta_exec; =20 - if (curr->sched_class !=3D &rt_sched_class) + if (donor->sched_class !=3D &rt_sched_class) return; =20 delta_exec =3D update_curr_common(rq); @@ -999,7 +999,7 @@ static void update_curr_rt(struct rq *rq) return; =20 #ifdef CONFIG_RT_GROUP_SCHED - struct sched_rt_entity *rt_se =3D &curr->rt; + struct sched_rt_entity *rt_se =3D &donor->rt; =20 if (!rt_bandwidth_enabled()) return; @@ -1535,7 +1535,7 @@ static int find_lowest_rq(struct task_struct *task); static int select_task_rq_rt(struct task_struct *p, int cpu, int flags) { - struct task_struct *curr; + struct task_struct *curr, *donor; struct rq *rq; bool test; =20 @@ -1547,6 +1547,7 @@ select_task_rq_rt(struct task_struct *p, int cpu, int= flags) =20 rcu_read_lock(); curr =3D READ_ONCE(rq->curr); /* unlocked access */ + donor =3D READ_ONCE(rq->donor); =20 /* * If the current task on @p's runqueue is an RT task, then @@ -1575,8 +1576,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, int= flags) * systems like big.LITTLE. */ test =3D curr && - unlikely(rt_task(curr)) && - (curr->nr_cpus_allowed < 2 || curr->prio <=3D p->prio); + unlikely(rt_task(donor)) && + (curr->nr_cpus_allowed < 2 || donor->prio <=3D p->prio); =20 if (test || !rt_task_fits_capacity(p, cpu)) { int target =3D find_lowest_rq(p); @@ -1606,12 +1607,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, in= t flags) =20 static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p) { - /* - * Current can't be migrated, useless to reschedule, - * let's hope p can move out. 
- */ if (rq->curr->nr_cpus_allowed =3D=3D 1 || - !cpupri_find(&rq->rd->cpupri, rq->curr, NULL)) + !cpupri_find(&rq->rd->cpupri, rq->donor, NULL)) return; =20 /* @@ -1654,7 +1651,9 @@ static int balance_rt(struct rq *rq, struct task_stru= ct *p, struct rq_flags *rf) */ static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int fl= ags) { - if (p->prio < rq->curr->prio) { + struct task_struct *donor =3D rq->donor; + + if (p->prio < donor->prio) { resched_curr(rq); return; } @@ -1672,7 +1671,7 @@ static void wakeup_preempt_rt(struct rq *rq, struct t= ask_struct *p, int flags) * to move current somewhere else, making room for our non-migratable * task. */ - if (p->prio =3D=3D rq->curr->prio && !test_tsk_need_resched(rq->curr)) + if (p->prio =3D=3D donor->prio && !test_tsk_need_resched(rq->curr)) check_preempt_equal_prio(rq, p); #endif } @@ -1697,7 +1696,7 @@ static inline void set_next_task_rt(struct rq *rq, st= ruct task_struct *p, bool f * utilization. We only care of the case where we start to schedule a * rt task */ - if (rq->curr->sched_class !=3D &rt_sched_class) + if (rq->donor->sched_class !=3D &rt_sched_class) update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 0); =20 rt_queue_push_tasks(rq); @@ -1959,6 +1958,7 @@ static struct task_struct *pick_next_pushable_task(st= ruct rq *rq) =20 BUG_ON(rq->cpu !=3D task_cpu(p)); BUG_ON(task_current(rq, p)); + BUG_ON(task_current_donor(rq, p)); BUG_ON(p->nr_cpus_allowed <=3D 1); =20 BUG_ON(!task_on_rq_queued(p)); @@ -1991,7 +1991,7 @@ static int push_rt_task(struct rq *rq, bool pull) * higher priority than current. If that's the case * just reschedule current. */ - if (unlikely(next_task->prio < rq->curr->prio)) { + if (unlikely(next_task->prio < rq->donor->prio)) { resched_curr(rq); return 0; } @@ -2012,7 +2012,7 @@ static int push_rt_task(struct rq *rq, bool pull) * Note that the stoppers are masqueraded as SCHED_FIFO * (cf. sched_set_stop_task()), so we can't rely on rt_task(). */ - if (rq->curr->sched_class !=3D &rt_sched_class) + if (rq->donor->sched_class !=3D &rt_sched_class) return 0; =20 cpu =3D find_lowest_rq(rq->curr); @@ -2344,7 +2344,7 @@ static void pull_rt_task(struct rq *this_rq) * p if it is lower in priority than the * current task on the run queue */ - if (p->prio < src_rq->curr->prio) + if (p->prio < src_rq->donor->prio) goto skip; =20 if (is_migration_disabled(p)) { @@ -2386,9 +2386,9 @@ static void task_woken_rt(struct rq *rq, struct task_= struct *p) bool need_to_push =3D !task_on_cpu(rq, p) && !test_tsk_need_resched(rq->curr) && p->nr_cpus_allowed > 1 && - (dl_task(rq->curr) || rt_task(rq->curr)) && + (dl_task(rq->donor) || rt_task(rq->donor)) && (rq->curr->nr_cpus_allowed < 2 || - rq->curr->prio <=3D p->prio); + rq->donor->prio <=3D p->prio); =20 if (need_to_push) push_rt_tasks(rq); @@ -2472,7 +2472,7 @@ static void switched_to_rt(struct rq *rq, struct task= _struct *p) if (p->nr_cpus_allowed > 1 && rq->rt.overloaded) rt_queue_push_tasks(rq); #endif /* CONFIG_SMP */ - if (p->prio < rq->curr->prio && cpu_online(cpu_of(rq))) + if (p->prio < rq->donor->prio && cpu_online(cpu_of(rq))) resched_curr(rq); } } @@ -2487,7 +2487,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p,= int oldprio) if (!task_on_rq_queued(p)) return; =20 - if (task_current(rq, p)) { + if (task_current_donor(rq, p)) { #ifdef CONFIG_SMP /* * If our priority decreases while running, we @@ -2513,7 +2513,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p,= int oldprio) * greater than the current running task * then reschedule. 
*/ - if (p->prio < rq->curr->prio) + if (p->prio < rq->donor->prio) resched_curr(rq); } } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index cb74a577c89d..12651bb9410b 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1147,7 +1147,10 @@ struct rq { */ unsigned int nr_uninterruptible; =20 - struct task_struct __rcu *curr; + union { + struct task_struct __rcu *donor; /* Scheduler context */ + struct task_struct __rcu *curr; /* Execution context */ + }; struct sched_dl_entity *dl_server; struct task_struct *idle; struct task_struct *stop; @@ -1344,6 +1347,11 @@ DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues); #define cpu_curr(cpu) (cpu_rq(cpu)->curr) #define raw_rq() raw_cpu_ptr(&runqueues) =20 +static inline void rq_set_donor(struct rq *rq, struct task_struct *t) +{ + /* Do nothing */ +} + #ifdef CONFIG_SCHED_CORE static inline struct cpumask *sched_group_span(struct sched_group *sg); =20 @@ -2260,11 +2268,25 @@ static inline u64 global_rt_runtime(void) return (u64)sysctl_sched_rt_runtime * NSEC_PER_USEC; } =20 +/* + * Is p the current execution context? + */ static inline int task_current(struct rq *rq, struct task_struct *p) { return rq->curr =3D=3D p; } =20 +/* + * Is p the current scheduling context? + * + * Note that it might be the current execution context at the same time if + * rq->curr =3D=3D rq->donor =3D=3D p. + */ +static inline int task_current_donor(struct rq *rq, struct task_struct *p) +{ + return rq->donor =3D=3D p; +} + static inline int task_on_cpu(struct rq *rq, struct task_struct *p) { #ifdef CONFIG_SMP @@ -2448,7 +2470,7 @@ struct sched_class { =20 static inline void put_prev_task(struct rq *rq, struct task_struct *prev) { - WARN_ON_ONCE(rq->curr !=3D prev); + WARN_ON_ONCE(rq->donor !=3D prev); prev->sched_class->put_prev_task(rq, prev, NULL); } =20 @@ -2612,7 +2634,7 @@ static inline cpumask_t *alloc_user_cpus_ptr(int node) =20 static inline struct task_struct *get_push_task(struct rq *rq) { - struct task_struct *p =3D rq->curr; + struct task_struct *p =3D rq->donor; =20 lockdep_assert_rq_held(rq); =20 diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c index aa70beee9895..db6f3fcb8111 100644 --- a/kernel/sched/syscalls.c +++ b/kernel/sched/syscalls.c @@ -91,7 +91,7 @@ void set_user_nice(struct task_struct *p, long nice) } =20 queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_donor(rq, p); if (queued) dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK); if (running) @@ -707,7 +707,7 @@ int __sched_setscheduler(struct task_struct *p, } =20 queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_donor(rq, p); if (queued) dequeue_task(rq, p, queue_flags); if (running) --=20 2.47.0.rc1.288.g06298d1525-goog
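Taking the final patch as a whole, the pattern it establishes is: scheduler-state decisions (priorities, deadlines, sched_class, runtime accounting) read rq->donor, while execution-context operations (resched flags, on-CPU tests via task_current()) keep using rq->curr. While donor and curr share a union they always name the same task, so behaviour is unchanged until proxy execution actually splits them. A condensed, illustrative sketch of the pattern, adapted from the rt.c hunks above (not a verbatim excerpt from the series):

static void wakeup_preempt_rt_sketch(struct rq *rq, struct task_struct *p)
{
	struct task_struct *donor = rq->donor;	/* scheduling context */

	/* Compare priorities against the donor's scheduler state... */
	if (p->prio < donor->prio) {
		/* ...but ask the task that is actually running to reschedule. */
		resched_curr(rq);
		return;
	}
}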