Date: Mon, 6 May 2024 21:54:30 -0700
Subject: [PATCH v10 1/7] locking/mutex: Remove wakeups from under mutex::wait_lock
From: John Stultz
To: LKML
Cc: Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli,
 Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt,
 Ben Segall, Zimuzo Ezeozue, Youssef Esmat, Mel Gorman,
 Daniel Bristot de Oliveira, Will Deacon, Waiman Long, Boqun Feng,
 "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak,
 Thomas Gleixner, kernel-team@android.com, Metin Kaya, Davidlohr Bueso,
 John Stultz

From: Peter Zijlstra

In preparation to nest mutex::wait_lock under rq::lock we need to remove
wakeups from under it.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Youssef Esmat
Cc: Mel Gorman
Cc: Daniel Bristot de Oliveira
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: kernel-team@android.com
Tested-by: K Prateek Nayak
Tested-by: Metin Kaya
Acked-by: Davidlohr Bueso
Reviewed-by: Metin Kaya
Signed-off-by: Peter Zijlstra (Intel)
[Heavily changed after 55f036ca7e74 ("locking: WW mutex cleanup") and
 08295b3b5bee ("locking: Implement an algorithm choice for Wound-Wait
 mutexes")]
Signed-off-by: Juri Lelli
[jstultz: rebased to mainline, added extra wake_up_q & init to avoid hangs,
 similar to Connor's rework of this patch]
Signed-off-by: John Stultz
---
v5:
* Reverted back to an earlier version of this patch to undo the change
  that kept the wake_q in the ctx structure, as that broke the rule that
  the wake_q must always be on the stack, as it's not safe for concurrency.
v6:
* Made tweaks suggested by Waiman Long
v7:
* Fixups to pass wake_qs down for PREEMPT_RT logic
v10:
* Switched preempt_enable to be lower close to the unlock as suggested
  by Valentin
* Added additional preempt_disable coverage around the wake_q calls as
  again noted by Valentin
---
 kernel/locking/mutex.c       | 17 +++++++++++++----
 kernel/locking/rtmutex.c     | 30 +++++++++++++++++++++---------
 kernel/locking/rwbase_rt.c   |  8 +++++++-
 kernel/locking/rwsem.c       |  4 ++--
 kernel/locking/spinlock_rt.c |  3 ++-
 kernel/locking/ww_mutex.h    | 29 ++++++++++++++++++-----------
 6 files changed, 63 insertions(+), 28 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index cbae8c0b89ab..4269da1f3ef5 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -575,6 +575,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclass,
 		    struct lockdep_map *nest_lock, unsigned long ip,
 		    struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
 {
+	DEFINE_WAKE_Q(wake_q);
 	struct mutex_waiter waiter;
 	struct ww_mutex *ww;
 	int ret;
@@ -625,7 +626,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclass,
 	 */
 	if (__mutex_trylock(lock)) {
 		if (ww_ctx)
-			__ww_mutex_check_waiters(lock, ww_ctx);
+			__ww_mutex_check_waiters(lock, ww_ctx, &wake_q);
 
 		goto skip_wait;
 	}
@@ -645,7 +646,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclass,
 		 * Add in stamp order, waking up waiters that must kill
 		 * themselves.
 		 */
-		ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
+		ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx, &wake_q);
 		if (ret)
 			goto err_early_kill;
 	}
@@ -681,6 +682,11 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclass,
 		}
 
 		raw_spin_unlock(&lock->wait_lock);
+		/* Make sure we do wakeups before calling schedule */
+		if (!wake_q_empty(&wake_q)) {
+			wake_up_q(&wake_q);
+			wake_q_init(&wake_q);
+		}
 		schedule_preempt_disabled();
 
 		first = __mutex_waiter_is_first(lock, &waiter);
@@ -714,7 +720,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclass,
 		 */
 		if (!ww_ctx->is_wait_die &&
 		    !__mutex_waiter_is_first(lock, &waiter))
-			__ww_mutex_check_waiters(lock, ww_ctx);
+			__ww_mutex_check_waiters(lock, ww_ctx, &wake_q);
 	}
 
 	__mutex_remove_waiter(lock, &waiter);
@@ -730,6 +736,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclass,
 		ww_mutex_lock_acquired(ww, ww_ctx);
 
 	raw_spin_unlock(&lock->wait_lock);
+	wake_up_q(&wake_q);
 	preempt_enable();
 	return 0;
 
@@ -741,6 +748,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclass,
 	raw_spin_unlock(&lock->wait_lock);
 	debug_mutex_free_waiter(&waiter);
 	mutex_release(&lock->dep_map, ip);
+	wake_up_q(&wake_q);
 	preempt_enable();
 	return ret;
 }
@@ -951,9 +959,10 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigned long ip)
 	if (owner & MUTEX_FLAG_HANDOFF)
 		__mutex_handoff(lock, next);
 
+	preempt_disable();
 	raw_spin_unlock(&lock->wait_lock);
-
 	wake_up_q(&wake_q);
+	preempt_enable();
 }
 
 #ifndef CONFIG_DEBUG_LOCK_ALLOC
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 88d08eeb8bc0..7a85d9bfa972 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -34,13 +34,15 @@
 
 static inline int __ww_mutex_add_waiter(struct rt_mutex_waiter *waiter,
 					struct rt_mutex *lock,
-					struct ww_acquire_ctx *ww_ctx)
+					struct ww_acquire_ctx *ww_ctx,
+					struct wake_q_head *wake_q)
 {
 	return 0;
 }
 
 static inline void
 __ww_mutex_check_waiters(struct rt_mutex *lock,
-			 struct ww_acquire_ctx *ww_ctx)
+			 struct ww_acquire_ctx *ww_ctx,
+			 struct wake_q_head *wake_q)
 {
 }
 
@@ -1207,6 +1209,7 @@ static int __sched task_blocks_on_rt_mutex(struct rt_mutex_base *lock,
 	struct rt_mutex_waiter *top_waiter = waiter;
 	struct rt_mutex_base *next_lock;
 	int chain_walk = 0, res;
+	DEFINE_WAKE_Q(wake_q);
 
 	lockdep_assert_held(&lock->wait_lock);
 
@@ -1245,7 +1248,10 @@ static int __sched task_blocks_on_rt_mutex(struct rt_mutex_base *lock,
 
 	/* Check whether the waiter should back out immediately */
 	rtm = container_of(lock, struct rt_mutex, rtmutex);
-	res = __ww_mutex_add_waiter(waiter, rtm, ww_ctx);
+	preempt_disable();
+	res = __ww_mutex_add_waiter(waiter, rtm, ww_ctx, &wake_q);
+	wake_up_q(&wake_q);
+	preempt_enable();
 	if (res) {
 		raw_spin_lock(&task->pi_lock);
 		rt_mutex_dequeue(lock, waiter);
@@ -1678,7 +1684,8 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
 				       struct ww_acquire_ctx *ww_ctx,
 				       unsigned int state,
 				       enum rtmutex_chainwalk chwalk,
-				       struct rt_mutex_waiter *waiter)
+				       struct rt_mutex_waiter *waiter,
+				       struct wake_q_head *wake_q)
 {
 	struct rt_mutex *rtm = container_of(lock, struct rt_mutex, rtmutex);
 	struct ww_mutex *ww = ww_container_of(rtm);
@@ -1689,7 +1696,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
 	/* Try to acquire the lock again: */
 	if (try_to_take_rt_mutex(lock, current, NULL)) {
 		if (build_ww_mutex() && ww_ctx) {
-			__ww_mutex_check_waiters(rtm, ww_ctx);
+			__ww_mutex_check_waiters(rtm, ww_ctx, wake_q);
 			ww_mutex_lock_acquired(ww, ww_ctx);
 		}
 		return 0;
@@ -1707,7 +1714,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
 		/* acquired the lock */
 		if (build_ww_mutex() && ww_ctx) {
 			if (!ww_ctx->is_wait_die)
-				__ww_mutex_check_waiters(rtm, ww_ctx);
+				__ww_mutex_check_waiters(rtm, ww_ctx, wake_q);
 			ww_mutex_lock_acquired(ww, ww_ctx);
 		}
 	} else {
@@ -1729,7 +1736,8 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
 
 static inline int __rt_mutex_slowlock_locked(struct rt_mutex_base *lock,
 					     struct ww_acquire_ctx *ww_ctx,
-					     unsigned int state)
+					     unsigned int state,
+					     struct wake_q_head *wake_q)
 {
 	struct rt_mutex_waiter waiter;
 	int ret;
 
@@ -1738,7 +1746,7 @@ static inline int __rt_mutex_slowlock_locked(struct rt_mutex_base *lock,
 	waiter.ww_ctx = ww_ctx;
 
 	ret = __rt_mutex_slowlock(lock, ww_ctx, state, RT_MUTEX_MIN_CHAINWALK,
-				  &waiter);
+				  &waiter, wake_q);
 
 	debug_rt_mutex_free_waiter(&waiter);
 	return ret;
@@ -1754,6 +1762,7 @@ static int __sched rt_mutex_slowlock(struct rt_mutex_base *lock,
 				     struct ww_acquire_ctx *ww_ctx,
 				     unsigned int state)
 {
+	DEFINE_WAKE_Q(wake_q);
 	unsigned long flags;
 	int ret;
 
@@ -1775,8 +1784,11 @@ static int __sched rt_mutex_slowlock(struct rt_mutex_base *lock,
 	 * irqsave/restore variants.
 	 */
 	raw_spin_lock_irqsave(&lock->wait_lock, flags);
-	ret = __rt_mutex_slowlock_locked(lock, ww_ctx, state);
+	ret = __rt_mutex_slowlock_locked(lock, ww_ctx, state, &wake_q);
+	preempt_disable();
 	raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
+	wake_up_q(&wake_q);
+	preempt_enable();
 	rt_mutex_post_schedule();
 
 	return ret;
diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index 34a59569db6b..9f4322c07486 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -69,6 +69,7 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
 				      unsigned int state)
 {
 	struct rt_mutex_base *rtm = &rwb->rtmutex;
+	DEFINE_WAKE_Q(wake_q);
 	int ret;
 
 	rwbase_pre_schedule();
@@ -110,7 +111,7 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
 	 * For rwlocks this returns 0 unconditionally, so the below
 	 * !ret conditionals are optimized out.
 	 */
-	ret = rwbase_rtmutex_slowlock_locked(rtm, state);
+	ret = rwbase_rtmutex_slowlock_locked(rtm, state, &wake_q);
 
 	/*
 	 * On success the rtmutex is held, so there can't be a writer
@@ -121,7 +122,12 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
 	 */
 	if (!ret)
 		atomic_inc(&rwb->readers);
+
+	preempt_disable();
 	raw_spin_unlock_irq(&rtm->wait_lock);
+	wake_up_q(&wake_q);
+	preempt_enable();
+
 	if (!ret)
 		rwbase_rtmutex_unlock(rtm);
 
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index c6d17aee4209..79ab7b8df5c1 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1415,8 +1415,8 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 #define rwbase_rtmutex_lock_state(rtm, state)		\
 	__rt_mutex_lock(rtm, state)
 
-#define rwbase_rtmutex_slowlock_locked(rtm, state)	\
-	__rt_mutex_slowlock_locked(rtm, NULL, state)
+#define rwbase_rtmutex_slowlock_locked(rtm, state, wq)	\
+	__rt_mutex_slowlock_locked(rtm, NULL, state, wq)
 
 #define rwbase_rtmutex_unlock(rtm)			\
 	__rt_mutex_unlock(rtm)
diff --git a/kernel/locking/spinlock_rt.c b/kernel/locking/spinlock_rt.c
index 38e292454fcc..fb1810a14c9d 100644
--- a/kernel/locking/spinlock_rt.c
+++ b/kernel/locking/spinlock_rt.c
@@ -162,7 +162,8 @@ rwbase_rtmutex_lock_state(struct rt_mutex_base *rtm, unsigned int state)
 }
 
 static __always_inline int
-rwbase_rtmutex_slowlock_locked(struct rt_mutex_base *rtm, unsigned int state)
+rwbase_rtmutex_slowlock_locked(struct rt_mutex_base *rtm, unsigned int state,
+			       struct wake_q_head *wake_q)
 {
 	rtlock_slowlock_locked(rtm);
 	return 0;
diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index 3ad2cc4823e5..7189c6631d90 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -275,7 +275,7 @@ __ww_ctx_less(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
  */
 static bool
 __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
-	       struct ww_acquire_ctx *ww_ctx)
+	       struct ww_acquire_ctx *ww_ctx, struct wake_q_head *wake_q)
 {
 	if (!ww_ctx->is_wait_die)
 		return false;
@@ -284,7 +284,7 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
 #ifndef WW_RT
 		debug_mutex_wake_waiter(lock, waiter);
 #endif
-		wake_up_process(waiter->task);
+		wake_q_add(wake_q, waiter->task);
 	}
 
 	return true;
@@ -299,7 +299,8 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
  */
 static bool __ww_mutex_wound(struct MUTEX *lock,
 			     struct ww_acquire_ctx *ww_ctx,
-			     struct ww_acquire_ctx *hold_ctx)
+			     struct ww_acquire_ctx *hold_ctx,
+			     struct wake_q_head *wake_q)
 {
 	struct task_struct *owner = __ww_mutex_owner(lock);
 
@@ -331,7 +332,7 @@ static bool __ww_mutex_wound(struct MUTEX *lock,
 		 * wakeup pending to re-read the wounded state.
 		 */
 		if (owner != current)
-			wake_up_process(owner);
+			wake_q_add(wake_q, owner);
 
 		return true;
 	}
@@ -352,7 +353,8 @@ static bool __ww_mutex_wound(struct MUTEX *lock,
  * The current task must not be on the wait list.
 */
 static void
-__ww_mutex_check_waiters(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx)
+__ww_mutex_check_waiters(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx,
+			 struct wake_q_head *wake_q)
 {
 	struct MUTEX_WAITER *cur;
 
@@ -364,8 +366,8 @@ __ww_mutex_check_waiters(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx)
 		if (!cur->ww_ctx)
 			continue;
 
-		if (__ww_mutex_die(lock, cur, ww_ctx) ||
-		    __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx))
+		if (__ww_mutex_die(lock, cur, ww_ctx, wake_q) ||
+		    __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx, wake_q))
 			break;
 	}
 }
@@ -377,6 +379,8 @@ __ww_mutex_check_waiters(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx)
 static __always_inline void
 ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 {
+	DEFINE_WAKE_Q(wake_q);
+
 	ww_mutex_lock_acquired(lock, ctx);
 
 	/*
@@ -405,8 +409,10 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 	 * die or wound us.
 	 */
 	lock_wait_lock(&lock->base);
-	__ww_mutex_check_waiters(&lock->base, ctx);
+	__ww_mutex_check_waiters(&lock->base, ctx, &wake_q);
 	unlock_wait_lock(&lock->base);
+
+	wake_up_q(&wake_q);
 }
 
 static __always_inline int
@@ -488,7 +494,8 @@ __ww_mutex_check_kill(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
 static inline int
 __ww_mutex_add_waiter(struct MUTEX_WAITER *waiter,
 		      struct MUTEX *lock,
-		      struct ww_acquire_ctx *ww_ctx)
+		      struct ww_acquire_ctx *ww_ctx,
+		      struct wake_q_head *wake_q)
 {
 	struct MUTEX_WAITER *cur, *pos = NULL;
 	bool is_wait_die;
@@ -532,7 +539,7 @@ __ww_mutex_add_waiter(struct MUTEX_WAITER *waiter,
 			pos = cur;
 
 		/* Wait-Die: ensure younger waiters die. */
-		__ww_mutex_die(lock, cur, ww_ctx);
+		__ww_mutex_die(lock, cur, ww_ctx, wake_q);
 	}
 
 	__ww_waiter_add(lock, waiter, pos);
@@ -550,7 +557,7 @@ __ww_mutex_add_waiter(struct MUTEX_WAITER *waiter,
 		 * such that either we or the fastpath will wound @ww->ctx.
 		 */
 		smp_mb();
-		__ww_mutex_wound(lock, ww_ctx, ww->ctx);
+		__ww_mutex_wound(lock, ww_ctx, ww->ctx, wake_q);
 	}
 
 	return 0;
--
2.45.0.rc1.225.g2a3ae87e7f-goog
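[Illustrative sketch, not part of the patch: the deferred-wakeup pattern the
diff above converts the locking code to. DEFINE_WAKE_Q(), wake_q_add() and
wake_up_q() are the interfaces used in the patch; the lock type and the
pick_next_waiter() helper are hypothetical stand-ins.]

/* Sketch: defer wakeups until after wait_lock is dropped. */
static void release_waiters(struct my_lock *lock)	/* hypothetical type */
{
	DEFINE_WAKE_Q(wake_q);
	struct task_struct *t;

	raw_spin_lock(&lock->wait_lock);
	while ((t = pick_next_waiter(lock)))	/* hypothetical helper */
		wake_q_add(&wake_q, t);		/* queue it, do not wake yet */
	preempt_disable();
	raw_spin_unlock(&lock->wait_lock);

	wake_up_q(&wake_q);	/* actual wakeups happen outside wait_lock */
	preempt_enable();
}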
Date: Mon, 6 May 2024 21:54:31 -0700
Subject: [PATCH v10 2/7] locking/mutex: Make mutex::wait_lock irq safe
From: John Stultz
To: LKML
Cc: Juri Lelli, Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra,
 Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt,
 Ben Segall, Zimuzo Ezeozue, Youssef Esmat, Mel Gorman,
 Daniel Bristot de Oliveira, Will Deacon, Waiman Long, Boqun Feng,
 "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak,
 Thomas Gleixner, kernel-team@android.com, Metin Kaya, "Connor O'Brien",
 John Stultz

From: Juri Lelli

mutex::wait_lock might be nested under rq->lock. Make it irq safe then.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Youssef Esmat
Cc: Mel Gorman
Cc: Daniel Bristot de Oliveira
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: kernel-team@android.com
Tested-by: K Prateek Nayak
Tested-by: Metin Kaya
Reviewed-by: Metin Kaya
Reviewed-by: Valentin Schneider
Signed-off-by: Juri Lelli
Signed-off-by: Peter Zijlstra (Intel)
[rebase & fix {un,}lock_wait_lock helpers in ww_mutex.h]
Signed-off-by: Connor O'Brien
Signed-off-by: John Stultz
---
v3:
* Re-added this patch after it was dropped in v2 which caused
  lockdep warnings to trip.
v7:
* Fix function definition for PREEMPT_RT case, as pointed out
  by Metin Kaya.
* Fix incorrect flags handling in PREEMPT_RT case as found by
  Metin Kaya
---
 kernel/locking/mutex.c    | 18 ++++++++++--------
 kernel/locking/ww_mutex.h | 22 +++++++++++-----------
 2 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 4269da1f3ef5..6d843a0978a5 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -578,6 +578,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclass,
 	DEFINE_WAKE_Q(wake_q);
 	struct mutex_waiter waiter;
 	struct ww_mutex *ww;
+	unsigned long flags;
 	int ret;
 
 	if (!use_ww_ctx)
@@ -620,7 +621,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclass,
 		return 0;
 	}
 
-	raw_spin_lock(&lock->wait_lock);
+	raw_spin_lock_irqsave(&lock->wait_lock, flags);
 	/*
 	 * After waiting to acquire the wait_lock, try again.
 	 */
@@ -681,7 +682,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclass,
 			goto err;
 		}
 
-		raw_spin_unlock(&lock->wait_lock);
+		raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
 		/* Make sure we do wakeups before calling schedule */
 		if (!wake_q_empty(&wake_q)) {
 			wake_up_q(&wake_q);
@@ -707,9 +708,9 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclass,
 			trace_contention_begin(lock, LCB_F_MUTEX);
 		}
 
-		raw_spin_lock(&lock->wait_lock);
+		raw_spin_lock_irqsave(&lock->wait_lock, flags);
 	}
-	raw_spin_lock(&lock->wait_lock);
+	raw_spin_lock_irqsave(&lock->wait_lock, flags);
 acquired:
 	__set_current_state(TASK_RUNNING);
 
@@ -735,7 +736,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclass,
 	if (ww_ctx)
 		ww_mutex_lock_acquired(ww, ww_ctx);
 
-	raw_spin_unlock(&lock->wait_lock);
+	raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
 	wake_up_q(&wake_q);
 	preempt_enable();
 	return 0;
@@ -745,7 +746,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclass,
 	__mutex_remove_waiter(lock, &waiter);
 err_early_kill:
 	trace_contention_end(lock, ret);
-	raw_spin_unlock(&lock->wait_lock);
+	raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
 	debug_mutex_free_waiter(&waiter);
 	mutex_release(&lock->dep_map, ip);
 	wake_up_q(&wake_q);
@@ -916,6 +917,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigned long ip)
 	struct task_struct *next = NULL;
 	DEFINE_WAKE_Q(wake_q);
 	unsigned long owner;
+	unsigned long flags;
 
 	mutex_release(&lock->dep_map, ip);
 
@@ -942,7 +944,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigned long ip)
 		}
 	}
 
-	raw_spin_lock(&lock->wait_lock);
+	raw_spin_lock_irqsave(&lock->wait_lock, flags);
 	debug_mutex_unlock(lock);
 	if (!list_empty(&lock->wait_list)) {
 		/* get the first entry from the wait-list: */
@@ -960,7 +962,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigned long ip)
 		__mutex_handoff(lock, next);
 
 	preempt_disable();
-	raw_spin_unlock(&lock->wait_lock);
+	raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
 	wake_up_q(&wake_q);
 	preempt_enable();
 }
diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index 7189c6631d90..9facc0ddfdd3 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -70,14 +70,14 @@ __ww_mutex_has_waiters(struct mutex *lock)
 	return atomic_long_read(&lock->owner) & MUTEX_FLAG_WAITERS;
 }
 
-static inline void lock_wait_lock(struct mutex *lock)
+static inline void lock_wait_lock(struct mutex *lock, unsigned long *flags)
 {
-	raw_spin_lock(&lock->wait_lock);
+	raw_spin_lock_irqsave(&lock->wait_lock, *flags);
 }
 
-static inline void unlock_wait_lock(struct mutex *lock)
+static inline void unlock_wait_lock(struct mutex *lock, unsigned long *flags)
 {
-	raw_spin_unlock(&lock->wait_lock);
+	raw_spin_unlock_irqrestore(&lock->wait_lock, *flags);
 }
 
 static inline void lockdep_assert_wait_lock_held(struct mutex *lock)
@@ -144,14 +144,14 @@ __ww_mutex_has_waiters(struct rt_mutex *lock)
 	return rt_mutex_has_waiters(&lock->rtmutex);
 }
 
-static inline void lock_wait_lock(struct rt_mutex *lock)
+static inline void lock_wait_lock(struct rt_mutex *lock, unsigned long *flags)
 {
-	raw_spin_lock(&lock->rtmutex.wait_lock);
+	raw_spin_lock_irqsave(&lock->rtmutex.wait_lock, *flags);
 }
 
-static inline void unlock_wait_lock(struct rt_mutex *lock)
+static inline void unlock_wait_lock(struct rt_mutex *lock, unsigned long *flags)
 {
-	raw_spin_unlock(&lock->rtmutex.wait_lock);
+	raw_spin_unlock_irqrestore(&lock->rtmutex.wait_lock, *flags);
 }
 
 static inline void lockdep_assert_wait_lock_held(struct rt_mutex *lock)
@@ -380,6 +380,7 @@ static __always_inline void
 ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 {
 	DEFINE_WAKE_Q(wake_q);
+	unsigned long flags;
 
 	ww_mutex_lock_acquired(lock, ctx);
 
@@ -408,10 +409,9 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 	 * Uh oh, we raced in fastpath, check if any of the waiters need to
 	 * die or wound us.
 	 */
-	lock_wait_lock(&lock->base);
+	lock_wait_lock(&lock->base, &flags);
 	__ww_mutex_check_waiters(&lock->base, ctx, &wake_q);
-	unlock_wait_lock(&lock->base);
-
+	unlock_wait_lock(&lock->base, &flags);
 	wake_up_q(&wake_q);
 }
 
--
2.45.0.rc1.225.g2a3ae87e7f-goog
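[Illustrative sketch, not part of the patch: the shape every wait_lock
critical section takes after this change. The function is hypothetical;
raw_spin_lock_irqsave()/raw_spin_unlock_irqrestore() are the calls used
throughout the diff above.]

static void frob_wait_list(struct mutex *lock)	/* hypothetical */
{
	unsigned long flags;

	/* Disables local interrupts so wait_lock may nest inside rq->lock. */
	raw_spin_lock_irqsave(&lock->wait_lock, flags);
	/* ... wait-list manipulation ... */
	raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
}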
Date: Mon, 6 May 2024 21:54:32 -0700
Subject: [PATCH v10 3/7] locking/mutex: Expose __mutex_owner()
From: John Stultz
To: LKML
Cc: Juri Lelli, Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra,
 Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt,
 Ben Segall, Zimuzo Ezeozue, Youssef Esmat, Mel Gorman,
 Daniel Bristot de Oliveira, Will Deacon, Waiman Long, Boqun Feng,
 "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak,
 Thomas Gleixner, kernel-team@android.com, Metin Kaya, Valentin Schneider,
 "Connor O'Brien", John Stultz

From: Juri Lelli

Implementing proxy execution requires that scheduler code be able to
identify the current owner of a mutex. Expose __mutex_owner() for
this purpose (alone!).

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Youssef Esmat
Cc: Mel Gorman
Cc: Daniel Bristot de Oliveira
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: kernel-team@android.com
Tested-by: K Prateek Nayak
Tested-by: Metin Kaya
Reviewed-by: Metin Kaya
Reviewed-by: Valentin Schneider
Signed-off-by: Juri Lelli
[Removed the EXPORT_SYMBOL]
Signed-off-by: Valentin Schneider
Signed-off-by: Connor O'Brien
[jstultz: Reworked per Peter's suggestions]
Signed-off-by: John Stultz
---
v4:
* Move __mutex_owner() to kernel/locking/mutex.h instead of adding
  a new globally available accessor function to keep the exposure of
  this low, along with keeping it an inline function, as suggested
  by PeterZ
v10:
* Handle null lock ptr, to simplify later code, as suggested by
  Metin Kaya
---
 kernel/locking/mutex.c | 25 -------------------------
 kernel/locking/mutex.h | 27 +++++++++++++++++++++++++++
 2 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 6d843a0978a5..4b7193fd3be9 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -56,31 +56,6 @@ __mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key)
 }
 EXPORT_SYMBOL(__mutex_init);
 
-/*
- * @owner: contains: 'struct task_struct *' to the current lock owner,
- * NULL means not owned. Since task_struct pointers are aligned at
- * at least L1_CACHE_BYTES, we have low bits to store extra state.
- *
- * Bit0 indicates a non-empty waiter list; unlock must issue a wakeup.
- * Bit1 indicates unlock needs to hand the lock to the top-waiter
- * Bit2 indicates handoff has been done and we're waiting for pickup.
- */
-#define MUTEX_FLAG_WAITERS	0x01
-#define MUTEX_FLAG_HANDOFF	0x02
-#define MUTEX_FLAG_PICKUP	0x04
-
-#define MUTEX_FLAGS		0x07
-
-/*
- * Internal helper function; C doesn't allow us to hide it :/
- *
- * DO NOT USE (outside of mutex code).
- */
-static inline struct task_struct *__mutex_owner(struct mutex *lock)
-{
-	return (struct task_struct *)(atomic_long_read(&lock->owner) & ~MUTEX_FLAGS);
-}
-
 static inline struct task_struct *__owner_task(unsigned long owner)
 {
 	return (struct task_struct *)(owner & ~MUTEX_FLAGS);
diff --git a/kernel/locking/mutex.h b/kernel/locking/mutex.h
index 0b2a79c4013b..cbff35b9b7ae 100644
--- a/kernel/locking/mutex.h
+++ b/kernel/locking/mutex.h
@@ -20,6 +20,33 @@ struct mutex_waiter {
 #endif
 };
 
+/*
+ * @owner: contains: 'struct task_struct *' to the current lock owner,
+ * NULL means not owned. Since task_struct pointers are aligned at
+ * at least L1_CACHE_BYTES, we have low bits to store extra state.
+ *
+ * Bit0 indicates a non-empty waiter list; unlock must issue a wakeup.
+ * Bit1 indicates unlock needs to hand the lock to the top-waiter
+ * Bit2 indicates handoff has been done and we're waiting for pickup.
+ */
+#define MUTEX_FLAG_WAITERS	0x01
+#define MUTEX_FLAG_HANDOFF	0x02
+#define MUTEX_FLAG_PICKUP	0x04
+
+#define MUTEX_FLAGS		0x07
+
+/*
+ * Internal helper function; C doesn't allow us to hide it :/
+ *
+ * DO NOT USE (outside of mutex & scheduler code).
+ */
+static inline struct task_struct *__mutex_owner(struct mutex *lock)
+{
+	if (!lock)
+		return NULL;
+	return (struct task_struct *)(atomic_long_read(&lock->owner) & ~MUTEX_FLAGS);
+}
+
 #ifdef CONFIG_DEBUG_MUTEXES
 extern void debug_mutex_lock_common(struct mutex *lock,
 				    struct mutex_waiter *waiter);
--
2.45.0.rc1.225.g2a3ae87e7f-goog
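[Illustrative sketch, not part of the patch: a hypothetical scheduler-side
caller, only to show the intended calling convention. The blocked-on
accessor is a made-up stand-in for state introduced by the later
proxy-execution patches; __mutex_owner() and its NULL-lock behaviour are
what this patch provides.]

static struct task_struct *blocked_task_owner(struct task_struct *p)
{
	struct mutex *m = get_task_blocked_on(p);	/* hypothetical accessor */

	/* __mutex_owner() now tolerates a NULL lock and returns NULL. */
	return __mutex_owner(m);
}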
Date: Mon, 6 May 2024 21:54:33 -0700
Subject: [PATCH v10 4/7] sched: Add move_queued_task_locked helper
From: John Stultz
To: LKML
Cc: "Connor O'Brien", Joel Fernandes, Qais Yousef, Ingo Molnar,
 Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
 Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue,
 Youssef Esmat, Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
 Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan,
 K Prateek Nayak, Thomas Gleixner, kernel-team@android.com, Metin Kaya,
 John Stultz

From: Connor O'Brien

Switch logic that deactivates, sets the task cpu, and reactivates a
task on a different rq to use a helper that will be later extended to
push entire blocked task chains.

This patch was broken out from a larger chain migration patch
originally by Connor O'Brien.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Youssef Esmat
Cc: Mel Gorman
Cc: Daniel Bristot de Oliveira
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: kernel-team@android.com
Tested-by: K Prateek Nayak
Tested-by: Metin Kaya
Reviewed-by: Metin Kaya
Reviewed-by: Valentin Schneider
Signed-off-by: Connor O'Brien
[jstultz: split out from larger chain migration patch]
Signed-off-by: John Stultz
Reviewed-by: Qais Yousef
---
v8:
* Renamed from push_task_chain to do_push_task so it makes more
  sense without proxy-execution
v10:
* Changed name to move_queued_task_locked as suggested by Valentin
---
 kernel/sched/core.c     |  4 +---
 kernel/sched/deadline.c |  8 ++------
 kernel/sched/rt.c       |  8 ++------
 kernel/sched/sched.h    | 11 +++++++++++
 4 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7019a40457a6..48f0d4b381d5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2712,9 +2712,7 @@ int push_cpu_stop(void *arg)
 
 	// XXX validate p is still the highest prio task
 	if (task_rq(p) == rq) {
-		deactivate_task(rq, p, 0);
-		set_task_cpu(p, lowest_rq->cpu);
-		activate_task(lowest_rq, p, 0);
+		move_queued_task_locked(rq, lowest_rq, p);
 		resched_curr(lowest_rq);
 	}
 
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index a04a436af8cc..eaedc69c5e30 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2443,9 +2443,7 @@ static int push_dl_task(struct rq *rq)
 		goto retry;
 	}
 
-	deactivate_task(rq, next_task, 0);
-	set_task_cpu(next_task, later_rq->cpu);
-	activate_task(later_rq, next_task, 0);
+	move_queued_task_locked(rq, later_rq, next_task);
 	ret = 1;
 
 	resched_curr(later_rq);
@@ -2531,9 +2529,7 @@ static void pull_dl_task(struct rq *this_rq)
 			if (is_migration_disabled(p)) {
 				push_task = get_push_task(src_rq);
 			} else {
-				deactivate_task(src_rq, p, 0);
-				set_task_cpu(p, this_cpu);
-				activate_task(this_rq, p, 0);
+				move_queued_task_locked(src_rq, this_rq, p);
 				dmin = p->dl.deadline;
 				resched = true;
 			}
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 3261b067b67e..975cb49a64dc 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2106,9 +2106,7 @@ static int push_rt_task(struct rq *rq, bool pull)
 		goto retry;
 	}
 
-	deactivate_task(rq, next_task, 0);
-	set_task_cpu(next_task, lowest_rq->cpu);
-	activate_task(lowest_rq, next_task, 0);
+	move_queued_task_locked(rq, lowest_rq, next_task);
 	resched_curr(lowest_rq);
 	ret = 1;
 
@@ -2379,9 +2377,7 @@ static void pull_rt_task(struct rq *this_rq)
 			if (is_migration_disabled(p)) {
 				push_task = get_push_task(src_rq);
 			} else {
-				deactivate_task(src_rq, p, 0);
-				set_task_cpu(p, this_cpu);
-				activate_task(this_rq, p, 0);
+				move_queued_task_locked(src_rq, this_rq, p);
 				resched = true;
 			}
 			/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ae50f212775e..2d41ebe200c7 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3480,5 +3480,16 @@ static inline void init_sched_mm_cid(struct task_struct *t) { }
 
 extern u64 avg_vruntime(struct cfs_rq *cfs_rq);
 extern int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se);
+#ifdef CONFIG_SMP
+static inline
+void move_queued_task_locked(struct rq *rq, struct rq *dst_rq, struct task_struct *task)
+{
+	lockdep_assert_rq_held(rq);
+	lockdep_assert_rq_held(dst_rq);
+	deactivate_task(rq, task, 0);
+	set_task_cpu(task, dst_rq->cpu);
+	activate_task(dst_rq, task, 0);
+}
+#endif
 
 #endif /* _KERNEL_SCHED_SCHED_H */
--
2.45.0.rc1.225.g2a3ae87e7f-goog
Date: Mon, 6 May 2024 21:54:34 -0700
Subject: [PATCH v10 5/7] sched: Consolidate pick_*_task to task_is_pushable helper
From: John Stultz
To: LKML
Cc: "Connor O'Brien", Joel Fernandes, Qais Yousef, Ingo Molnar,
 Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
 Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue,
 Youssef Esmat, Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
 Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan,
 K Prateek Nayak, Thomas Gleixner, kernel-team@android.com, Metin Kaya,
 John Stultz

From: Connor O'Brien

This patch consolidates the rt and deadline pick_*_task() functions
into a task_is_pushable() helper.

This patch was broken out from a larger chain migration patch
originally by Connor O'Brien.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Youssef Esmat
Cc: Mel Gorman
Cc: Daniel Bristot de Oliveira
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: kernel-team@android.com
Tested-by: K Prateek Nayak
Tested-by: Metin Kaya
Reviewed-by: Metin Kaya
Reviewed-by: Valentin Schneider
Signed-off-by: Connor O'Brien
[jstultz: split out from larger chain migration patch, renamed helper function]
Signed-off-by: John Stultz
Reviewed-by: Qais Yousef
---
v7:
* Split from chain migration patch
* Renamed function
---
 kernel/sched/deadline.c | 10 +---------
 kernel/sched/rt.c       | 11 +----------
 kernel/sched/sched.h    | 10 ++++++++++
 3 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index eaedc69c5e30..ae583a427539 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2179,14 +2179,6 @@ static void task_fork_dl(struct task_struct *p)
 /* Only try algorithms three times */
 #define DL_MAX_TRIES 3
 
-static int pick_dl_task(struct rq *rq, struct task_struct *p, int cpu)
-{
-	if (!task_on_cpu(rq, p) &&
-	    cpumask_test_cpu(cpu, &p->cpus_mask))
-		return 1;
-	return 0;
-}
-
 /*
  * Return the earliest pushable rq's task, which is suitable to be executed
  * on the CPU, NULL otherwise:
@@ -2205,7 +2197,7 @@ static struct task_struct *pick_earliest_pushable_dl_task(struct rq *rq, int cpu)
 	if (next_node) {
 		p = __node_2_pdl(next_node);
 
-		if (pick_dl_task(rq, p, cpu))
+		if (task_is_pushable(rq, p, cpu) == 1)
 			return p;
 
 		next_node = rb_next(next_node);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 975cb49a64dc..8b6fb77e095b 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1791,15 +1791,6 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
 /* Only try algorithms three times */
 #define RT_MAX_TRIES 3
 
-static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
-{
-	if (!task_on_cpu(rq, p) &&
-	    cpumask_test_cpu(cpu, &p->cpus_mask))
-		return 1;
-
-	return 0;
-}
-
 /*
  * Return the highest pushable rq's task, which is suitable to be executed
  * on the CPU, NULL otherwise
@@ -1813,7 +1804,7 @@ static struct task_struct *pick_highest_pushable_task(struct rq *rq, int cpu)
 		return NULL;
 
 	plist_for_each_entry(p, head, pushable_tasks) {
-		if (pick_rt_task(rq, p, cpu))
+		if (task_is_pushable(rq, p, cpu) == 1)
 			return p;
 	}
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2d41ebe200c7..e46f69ba9ba2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3490,6 +3490,16 @@ void move_queued_task_locked(struct rq *rq, struct rq *dst_rq, struct task_struct *task)
 	set_task_cpu(task, dst_rq->cpu);
 	activate_task(dst_rq, task, 0);
 }
+
+static inline
+int task_is_pushable(struct rq *rq, struct task_struct *p, int cpu)
+{
+	if (!task_on_cpu(rq, p) &&
+	    cpumask_test_cpu(cpu, &p->cpus_mask))
+		return 1;
+
+	return 0;
+}
 #endif
 
 #endif /* _KERNEL_SCHED_SCHED_H */
--
2.45.0.rc1.225.g2a3ae87e7f-goog
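[Illustrative sketch, not part of the patch: the consolidated call sites
all reduce to a scan of this shape. It mirrors the rt.c hunk above;
@head is assumed to be the rq's pushable-tasks plist, and
task_is_pushable() is the helper introduced here.]

static struct task_struct *first_pushable(struct rq *rq, struct plist_head *head, int cpu)
{
	struct task_struct *p;

	/* Return the first queued task that may run on @cpu, if any. */
	plist_for_each_entry(p, head, pushable_tasks)
		if (task_is_pushable(rq, p, cpu) == 1)
			return p;

	return NULL;
}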
KH+F4MhtYeMZAYooGmffsMA+T0IfcD42l6+AgXtUhhsD6gnxo0xNaChV5ZLTvv69HPsF 1scMik7z2LfIz0haXUVt/gS5jkGmVBUOfudI+CzT00Ayy3JJ4bwzWJq1xQdj+tELurlC +Jgg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1715057708; x=1715662508; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=frDfJWnzxEJXtKKN1swtCrc9MKAkukKvokYtsbi9wt8=; b=UO6pXvaTSodG0nSnKkp2Zu/22j4WdzyxkxjFmvdZlK9Dm7t+BdiPixBLHOIrYgZvij W9/RadtQCmVycQZOaFp0xVYy94XYOmrxDbfugby26PypA3WzlPh2BrgKvQwC+2QDU+/B CELJCHGmOfEYRpKwceTf6tOfDAf6GiM+Alc2krO6JaqhY1Knn+vbFcP23cNr1YWLVStu aL3w20id7Ax4HMoaRgWs6qat1kOBtR/O9Xm1hS/GDKjmr48OYCokLdwzxlCPXmdYcX7i PQa0TFnCZvgDTY8N76SAu5OqEgPQKsCg0/qczhAOs3RPCX45WUTN9q/a9OWtDM6DAj5b u/XQ== X-Gm-Message-State: AOJu0Yy5wfkzaEMRO461oLtpB9cfNGLdr12tZB1+vreywgMc1JbX5yUh BhWNYmn0WkM8R5uUgCEmA3Pp/Aag2AiIpAQvMeiexiChEM2RpWZPCLJsD7BWNjSZw58uETb+cpP yKAtOwfQLtBIJK4k2YFLGmlzv9vRdSZe5W0dyRKAxp/SQP/Y/xsAqow02sEmeQVSkla8VPZnCZ4 AnUr0N1TZP/FnEc0BW9qFiuV91NLXkAl22YbBMofhQ13WT X-Google-Smtp-Source: AGHT+IFvlQYUY1S0c6R25r61u9MoSdtllzzBEFfXLANFAfot55vN9dlCHOU8FfzFirWkKwAzNUTX/WNRVXNB X-Received: from jstultz-noogler2.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:600]) (user=jstultz job=sendgmr) by 2002:a17:90a:aa17:b0:2ad:5fb9:16e9 with SMTP id k23-20020a17090aaa1700b002ad5fb916e9mr38005pjq.2.1715057707178; Mon, 06 May 2024 21:55:07 -0700 (PDT) Date: Mon, 6 May 2024 21:54:35 -0700 In-Reply-To: <20240507045450.895430-1-jstultz@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240507045450.895430-1-jstultz@google.com> X-Mailer: git-send-email 2.45.0.rc1.225.g2a3ae87e7f-goog Message-ID: <20240507045450.895430-7-jstultz@google.com> Subject: [PATCH v10 6/7] sched: Split out __schedule() deactivate task logic into a helper From: John Stultz To: LKML Cc: John Stultz , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Daniel Bristot de Oliveira , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , kernel-team@android.com, Metin Kaya Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" As we're going to re-use the deactivation logic, split it into a helper. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. 
McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: kernel-team@android.com Tested-by: K Prateek Nayak Tested-by: Metin Kaya Reviewed-by: Metin Kaya Signed-off-by: John Stultz Reviewed-by: Qais Yousef --- v6: * Define function as static to avoid "no previous prototype" warnings as Reported-by: kernel test robot v7: * Rename state task_state to be more clear, as suggested by Metin Kaya --- kernel/sched/core.c | 72 +++++++++++++++++++++++++++------------------ 1 file changed, 43 insertions(+), 29 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 48f0d4b381d5..8bc5844ebab9 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6572,6 +6572,48 @@ pick_next_task(struct rq *rq, struct task_struct *pr= ev, struct rq_flags *rf) # define SM_MASK_PREEMPT SM_PREEMPT #endif =20 +/* + * Helper function for __schedule() + * + * If a task does not have signals pending, deactivate it and return true + * Otherwise marks the task's __state as RUNNING and returns false + */ +static bool try_to_deactivate_task(struct rq *rq, struct task_struct *p, + unsigned long task_state) +{ + if (signal_pending_state(task_state, p)) { + WRITE_ONCE(p->__state, TASK_RUNNING); + } else { + p->sched_contributes_to_load =3D + (task_state & TASK_UNINTERRUPTIBLE) && + !(task_state & TASK_NOLOAD) && + !(task_state & TASK_FROZEN); + + if (p->sched_contributes_to_load) + rq->nr_uninterruptible++; + + /* + * __schedule() ttwu() + * prev_state =3D prev->state; if (p->on_rq && ...) + * if (prev_state) goto out; + * p->on_rq =3D 0; smp_acquire__after_ctrl_dep(); + * p->state =3D TASK_WAKING + * + * Where __schedule() and ttwu() have matching control dependencies. + * + * After this, schedule() must not care about p->state any more. + */ + deactivate_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK); + + if (p->in_iowait) { + atomic_inc(&rq->nr_iowait); + delayacct_blkio_start(); + } + return true; + } + return false; +} + /* * __schedule() is the main scheduler function. * @@ -6665,35 +6707,7 @@ static void __sched notrace __schedule(unsigned int = sched_mode) */ prev_state =3D READ_ONCE(prev->__state); if (!(sched_mode & SM_MASK_PREEMPT) && prev_state) { - if (signal_pending_state(prev_state, prev)) { - WRITE_ONCE(prev->__state, TASK_RUNNING); - } else { - prev->sched_contributes_to_load =3D - (prev_state & TASK_UNINTERRUPTIBLE) && - !(prev_state & TASK_NOLOAD) && - !(prev_state & TASK_FROZEN); - - if (prev->sched_contributes_to_load) - rq->nr_uninterruptible++; - - /* - * __schedule() ttwu() - * prev_state =3D prev->state; if (p->on_rq && ...) - * if (prev_state) goto out; - * p->on_rq =3D 0; smp_acquire__after_ctrl_dep(); - * p->state =3D TASK_WAKING - * - * Where __schedule() and ttwu() have matching control dependencies. - * - * After this, schedule() must not care about p->state any more. 
- */ - deactivate_task(rq, prev, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK); - - if (prev->in_iowait) { - atomic_inc(&rq->nr_iowait); - delayacct_blkio_start(); - } - } + try_to_deactivate_task(rq, prev, prev_state); switch_count =3D &prev->nvcsw; } =20 --=20 2.45.0.rc1.225.g2a3ae87e7f-goog From nobody Sat Feb 7 15:10:45 2026 Received: from mail-pf1-f201.google.com (mail-pf1-f201.google.com [209.85.210.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 10B1371757 for ; Tue, 7 May 2024 04:55:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715057712; cv=none; b=K6scXKoZhwiHaW/nXRaizPwSRhBF5XWwiEWYzeYMs2bz7XE6E9rEfMcMG+rWFCbHFQ0DgAr+5GGzVfnfSDp2wC6En0SdIPuABf9w6shCHHtsbCQYogzDD8NW7BDy8grp3wuTCGo7Q4rIMnFihJMxoJbq6aKgFLbT9zSmW8RD064= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715057712; c=relaxed/simple; bh=c0CTlNWIT2VmkUCh6P94jxABJG5lfMRKmxS1WBdA+zM=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=dj1jzGaRpbMbHGgbB7ympOg3Glj688oIWAknUZ02ypncSFR+oiHhNTLlEULfKxWZeYmbdNOygNsCee1n8LdWPYcig+svemhBKDIK5UlHVSEezYzaihRZrQ8O6RAMzvJOcYQDfrjhqGRIfuxUq9gchzpQ9SnJjAaddA/vjDVL9d8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=ARBXxHwQ; arc=none smtp.client-ip=209.85.210.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="ARBXxHwQ" Received: by mail-pf1-f201.google.com with SMTP id d2e1a72fcca58-6ef9edf9910so3524807b3a.1 for ; Mon, 06 May 2024 21:55:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1715057709; x=1715662509; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=Z81Liy4u6PT+n9ZgAyfJktjHXoOW7bLB/k6KjtnCoO0=; b=ARBXxHwQkenVdE25uVCX45Uu95rW48Z0MIgVTXdxaBDPmbs38LJTFOCw0iCxjD0IAz zi0YYFYOs/Mjxh1yb4/CaoCwBhxoRpw4VgmFFDfzEFU5I/p1JjgagiLTDhr108VOTz/W Ni2aZtcsI4YrFqK8UFT2NPQjZeyx3tuTOhmZpUIX+ob4Ty8yUEFBJzCYks3eHcBMnwmM iL5mC7arzV4qwuOgXhwcKY68NASjuuXVEk0Oc6saVi4aYN/p13Cz0GMjcnzt81lKL2dO upHRO1iTzSAPHp9y00xmMa3GxZKb6h/wrqLEp8XB89Z9cYGef+09TlB4yknTaJkram9u ctCQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1715057709; x=1715662509; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Z81Liy4u6PT+n9ZgAyfJktjHXoOW7bLB/k6KjtnCoO0=; b=AChrjeam/zHW8hs5cawvcva77x/jd/ZcLxVPmULM7sNz8rTw6jcCSIUv4Yp+Y2dHc2 tQm3JSb/W/HaE1wrdpLkmL6xWn0qP9CwvbQ2OJTLMxyuzbDGCoTOfBU3WeDW99A2v6Ie dqtyOI7WuPDGkXMt3FDXWnQ8JKcxz4ahq0ybHx7JZv4R8NfgUeMqv1ab8OkpMVmuwE+/ JdoRbjeoZFdG6jqrEJWU0QWBiZHNXlqJ78L9tzEkVS98pSyqACvagBRcwHXh1v2W7Gnq QqhAl0ySovfBnwZH/tMtcR3SZr4PIcEwzDTZN6D7F+lPQ8PttYNCgCKUBGCA0yMtb4wf j9MA== X-Gm-Message-State: AOJu0YwEtO5yjtibjnCiRu2Li/Exa3dnXEcELFnaAURtoSas0uN5Rt1Y 
fh5P8u9s5ItiZgKWd7n1s+ql6m0jNub46Jedw0Y/iTs2rkFqzl1yElKC1lAl1NxgOHJVceZoN3M XaTDhYYl2UGytTQjxSjWrXTNzRfBicdrw52OWo5mlsa2Vs70o49RfFHiYs/BuE/lmmFNJPxQWek sgbZSfHsGQC248BQZ0QTlz+pw4/Y2cKSJ55CUdgGW9vY9s X-Google-Smtp-Source: AGHT+IFbiOTBRpUY+fKzB/py8nOYBkzA+8Xge7AMh2xZEzZuGmV0+KXUbMo0Wg29OmbJohXfDWFp0lapBEuS X-Received: from jstultz-noogler2.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:600]) (user=jstultz job=sendgmr) by 2002:a05:6a00:3a01:b0:6f3:8468:432f with SMTP id fj1-20020a056a003a0100b006f38468432fmr320614pfb.3.1715057709034; Mon, 06 May 2024 21:55:09 -0700 (PDT) Date: Mon, 6 May 2024 21:54:36 -0700 In-Reply-To: <20240507045450.895430-1-jstultz@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240507045450.895430-1-jstultz@google.com> X-Mailer: git-send-email 2.45.0.rc1.225.g2a3ae87e7f-goog Message-ID: <20240507045450.895430-8-jstultz@google.com> Subject: [PATCH v10 7/7] sched: Split scheduler and execution contexts From: John Stultz To: LKML Cc: Peter Zijlstra , Joel Fernandes , Qais Yousef , Ingo Molnar , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Daniel Bristot de Oliveira , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Xuewen Yan , K Prateek Nayak , Metin Kaya , Thomas Gleixner , kernel-team@android.com, Metin Kaya , "Connor O'Brien" , John Stultz Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Peter Zijlstra Let's define the scheduling context as all the scheduler state in task_struct for the task selected to run, and the execution context as all state required to actually run the task. Currently both are intertwined in task_struct. We want to logically split these such that we can use the scheduling context of the task selected to be scheduled, but use the execution context of a different task to actually be run. To this purpose, introduce rq_selected() macro to point to the task_struct selected from the runqueue by the scheduler, and will be used for scheduler state, and preserve rq->curr to indicate the execution context of the task that will actually be run. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney" Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Metin Kaya Cc: Thomas Gleixner Cc: kernel-team@android.com Tested-by: K Prateek Nayak Tested-by: Metin Kaya Reviewed-by: Metin Kaya Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Juri Lelli Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20181009092434.26221-5-juri.lelli@redhat.com [add additional comments and update more sched_class code to use rq::proxy] Signed-off-by: Connor O'Brien [jstultz: Rebased and resolved minor collisions, reworked to use accessors, tweaked update_curr_common to use rq_proxy fixing rt scheduling issues] Signed-off-by: John Stultz --- v2: * Reworked to use accessors * Fixed update_curr_common to use proxy instead of curr v3: * Tweaked wrapper names * Swapped proxy for selected for clarity v4: * Minor variable name tweaks for readability * Use a macro instead of a inline function and drop other helper functions as suggested by Peter. 
* Remove verbose comments/questions to avoid review distractions, as suggested by Dietmar v5: * Add CONFIG_PROXY_EXEC option to this patch so the new logic can be tested with this change * Minor fix to grab rq_selected when holding the rq lock v7: * Minor spelling fix and unused argument fixes suggested by Metin Kaya * Switch to curr_selected for consistency, and minor rewording of commit message for clarity * Rename variables selected instead of curr when we're using rq_selected() * Reduce macros in CONFIG_SCHED_PROXY_EXEC ifdef sections, as suggested by Metin Kaya v8: * Use rq->curr, not rq_selected with task_tick, as suggested by Valentin * Minor rework to reorder this with CONFIG_SCHED_PROXY_EXEC patch v10: * Use rq_selected in push_rt_task & get_push_task --- kernel/sched/core.c | 46 ++++++++++++++++++++++++++--------------- kernel/sched/deadline.c | 35 ++++++++++++++++--------------- kernel/sched/fair.c | 18 ++++++++-------- kernel/sched/rt.c | 42 ++++++++++++++++++------------------- kernel/sched/sched.h | 27 +++++++++++++++++++++--- 5 files changed, 101 insertions(+), 67 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 8bc5844ebab9..30af17648f8c 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -794,7 +794,7 @@ static enum hrtimer_restart hrtick(struct hrtimer *time= r) =20 rq_lock(rq, &rf); update_rq_clock(rq); - rq->curr->sched_class->task_tick(rq, rq->curr, 1); + rq_selected(rq)->sched_class->task_tick(rq, rq->curr, 1); rq_unlock(rq, &rf); =20 return HRTIMER_NORESTART; @@ -2236,16 +2236,18 @@ static inline void check_class_changed(struct rq *r= q, struct task_struct *p, =20 void wakeup_preempt(struct rq *rq, struct task_struct *p, int flags) { - if (p->sched_class =3D=3D rq->curr->sched_class) - rq->curr->sched_class->wakeup_preempt(rq, p, flags); - else if (sched_class_above(p->sched_class, rq->curr->sched_class)) + struct task_struct *selected =3D rq_selected(rq); + + if (p->sched_class =3D=3D selected->sched_class) + selected->sched_class->wakeup_preempt(rq, p, flags); + else if (sched_class_above(p->sched_class, selected->sched_class)) resched_curr(rq); =20 /* * A queue event has occurred, and we're going to schedule. In * this case, we can save a useless back to back clock update. */ - if (task_on_rq_queued(rq->curr) && test_tsk_need_resched(rq->curr)) + if (task_on_rq_queued(selected) && test_tsk_need_resched(rq->curr)) rq_clock_skip_update(rq); } =20 @@ -2772,7 +2774,7 @@ __do_set_cpus_allowed(struct task_struct *p, struct a= ffinity_context *ctx) lockdep_assert_held(&p->pi_lock); =20 queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); =20 if (queued) { /* @@ -5596,7 +5598,7 @@ unsigned long long task_sched_runtime(struct task_str= uct *p) * project cycles that may never be accounted to this * thread, breaking clock_gettime(). 
*/ - if (task_current(rq, p) && task_on_rq_queued(p)) { + if (task_current_selected(rq, p) && task_on_rq_queued(p)) { prefetch_curr_exec_start(p); update_rq_clock(rq); p->sched_class->update_curr(rq); @@ -5664,7 +5666,8 @@ void scheduler_tick(void) { int cpu =3D smp_processor_id(); struct rq *rq =3D cpu_rq(cpu); - struct task_struct *curr =3D rq->curr; + /* accounting goes to the selected task */ + struct task_struct *selected; struct rq_flags rf; unsigned long thermal_pressure; u64 resched_latency; @@ -5675,16 +5678,17 @@ void scheduler_tick(void) sched_clock_tick(); =20 rq_lock(rq, &rf); + selected =3D rq_selected(rq); =20 update_rq_clock(rq); thermal_pressure =3D arch_scale_thermal_pressure(cpu_of(rq)); update_thermal_load_avg(rq_clock_thermal(rq), rq, thermal_pressure); - curr->sched_class->task_tick(rq, curr, 0); + selected->sched_class->task_tick(rq, selected, 0); if (sched_feat(LATENCY_WARN)) resched_latency =3D cpu_resched_latency(rq); calc_global_load_tick(rq); sched_core_tick(rq); - task_tick_mm_cid(rq, curr); + task_tick_mm_cid(rq, selected); =20 rq_unlock(rq, &rf); =20 @@ -5693,8 +5697,8 @@ void scheduler_tick(void) =20 perf_event_task_tick(); =20 - if (curr->flags & PF_WQ_WORKER) - wq_worker_tick(curr); + if (selected->flags & PF_WQ_WORKER) + wq_worker_tick(selected); =20 #ifdef CONFIG_SMP rq->idle_balance =3D idle_cpu(cpu); @@ -5759,6 +5763,12 @@ static void sched_tick_remote(struct work_struct *wo= rk) struct task_struct *curr =3D rq->curr; =20 if (cpu_online(cpu)) { + /* + * Since this is a remote tick for full dynticks mode, + * we are always sure that there is no proxy (only a + * single task is running). + */ + SCHED_WARN_ON(rq->curr !=3D rq_selected(rq)); update_rq_clock(rq); =20 if (!is_idle_task(curr)) { @@ -6712,6 +6722,7 @@ static void __sched notrace __schedule(unsigned int s= ched_mode) } =20 next =3D pick_next_task(rq, prev, &rf); + rq_set_selected(rq, next); clear_tsk_need_resched(prev); clear_preempt_need_resched(); #ifdef CONFIG_SCHED_DEBUG @@ -7222,7 +7233,7 @@ void rt_mutex_setprio(struct task_struct *p, struct t= ask_struct *pi_task) =20 prev_class =3D p->sched_class; queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); if (queued) dequeue_task(rq, p, queue_flag); if (running) @@ -7312,7 +7323,7 @@ void set_user_nice(struct task_struct *p, long nice) } =20 queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); if (queued) dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK); if (running) @@ -7891,7 +7902,7 @@ static int __sched_setscheduler(struct task_struct *p, } =20 queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); if (queued) dequeue_task(rq, p, queue_flags); if (running) @@ -9318,6 +9329,7 @@ void __init init_idle(struct task_struct *idle, int c= pu) rcu_read_unlock(); =20 rq->idle =3D idle; + rq_set_selected(rq, idle); rcu_assign_pointer(rq->curr, idle); idle->on_rq =3D TASK_ON_RQ_QUEUED; #ifdef CONFIG_SMP @@ -9407,7 +9419,7 @@ void sched_setnuma(struct task_struct *p, int nid) =20 rq =3D task_rq_lock(p, &rf); queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); =20 if (queued) dequeue_task(rq, p, DEQUEUE_SAVE); @@ -10512,7 +10524,7 @@ void sched_move_task(struct task_struct *tsk) =20 update_rq_clock(rq); =20 - running =3D task_current(rq, tsk); + running =3D task_current_selected(rq, tsk); queued =3D 
task_on_rq_queued(tsk); =20 if (queued) diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index ae583a427539..6b49f9229414 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -1218,7 +1218,7 @@ static enum hrtimer_restart dl_task_timer(struct hrti= mer *timer) #endif =20 enqueue_task_dl(rq, p, ENQUEUE_REPLENISH); - if (dl_task(rq->curr)) + if (dl_task(rq_selected(rq))) wakeup_preempt_dl(rq, p, 0); else resched_curr(rq); @@ -1442,7 +1442,7 @@ void dl_server_init(struct sched_dl_entity *dl_se, st= ruct rq *rq, */ static void update_curr_dl(struct rq *rq) { - struct task_struct *curr =3D rq->curr; + struct task_struct *curr =3D rq_selected(rq); struct sched_dl_entity *dl_se =3D &curr->dl; s64 delta_exec; =20 @@ -1899,7 +1899,7 @@ static int find_later_rq(struct task_struct *task); static int select_task_rq_dl(struct task_struct *p, int cpu, int flags) { - struct task_struct *curr; + struct task_struct *curr, *selected; bool select_rq; struct rq *rq; =20 @@ -1910,6 +1910,7 @@ select_task_rq_dl(struct task_struct *p, int cpu, int= flags) =20 rcu_read_lock(); curr =3D READ_ONCE(rq->curr); /* unlocked access */ + selected =3D READ_ONCE(rq_selected(rq)); =20 /* * If we are dealing with a -deadline task, we must @@ -1920,9 +1921,9 @@ select_task_rq_dl(struct task_struct *p, int cpu, int= flags) * other hand, if it has a shorter deadline, we * try to make it stay here, it might be important. */ - select_rq =3D unlikely(dl_task(curr)) && + select_rq =3D unlikely(dl_task(selected)) && (curr->nr_cpus_allowed < 2 || - !dl_entity_preempt(&p->dl, &curr->dl)) && + !dl_entity_preempt(&p->dl, &selected->dl)) && p->nr_cpus_allowed > 1; =20 /* @@ -1985,7 +1986,7 @@ static void check_preempt_equal_dl(struct rq *rq, str= uct task_struct *p) * let's hope p can move out. */ if (rq->curr->nr_cpus_allowed =3D=3D 1 || - !cpudl_find(&rq->rd->cpudl, rq->curr, NULL)) + !cpudl_find(&rq->rd->cpudl, rq_selected(rq), NULL)) return; =20 /* @@ -2024,7 +2025,7 @@ static int balance_dl(struct rq *rq, struct task_stru= ct *p, struct rq_flags *rf) static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p, int flags) { - if (dl_entity_preempt(&p->dl, &rq->curr->dl)) { + if (dl_entity_preempt(&p->dl, &rq_selected(rq)->dl)) { resched_curr(rq); return; } @@ -2034,7 +2035,7 @@ static void wakeup_preempt_dl(struct rq *rq, struct t= ask_struct *p, * In the unlikely case current and p have the same deadline * let us try to decide what's the best thing to do... */ - if ((p->dl.deadline =3D=3D rq->curr->dl.deadline) && + if ((p->dl.deadline =3D=3D rq_selected(rq)->dl.deadline) && !test_tsk_need_resched(rq->curr)) check_preempt_equal_dl(rq, p); #endif /* CONFIG_SMP */ @@ -2066,7 +2067,7 @@ static void set_next_task_dl(struct rq *rq, struct ta= sk_struct *p, bool first) if (!first) return; =20 - if (rq->curr->sched_class !=3D &dl_sched_class) + if (rq_selected(rq)->sched_class !=3D &dl_sched_class) update_dl_rq_load_avg(rq_clock_pelt(rq), rq, 0); =20 deadline_queue_push_tasks(rq); @@ -2391,8 +2392,8 @@ static int push_dl_task(struct rq *rq) * can move away, it makes sense to just reschedule * without going further in pushing next_task. 
*/ - if (dl_task(rq->curr) && - dl_time_before(next_task->dl.deadline, rq->curr->dl.deadline) && + if (dl_task(rq_selected(rq)) && + dl_time_before(next_task->dl.deadline, rq_selected(rq)->dl.deadline) = && rq->curr->nr_cpus_allowed > 1) { resched_curr(rq); return 0; @@ -2515,7 +2516,7 @@ static void pull_dl_task(struct rq *this_rq) * deadline than the current task of its runqueue. */ if (dl_time_before(p->dl.deadline, - src_rq->curr->dl.deadline)) + rq_selected(src_rq)->dl.deadline)) goto skip; =20 if (is_migration_disabled(p)) { @@ -2554,9 +2555,9 @@ static void task_woken_dl(struct rq *rq, struct task_= struct *p) if (!task_on_cpu(rq, p) && !test_tsk_need_resched(rq->curr) && p->nr_cpus_allowed > 1 && - dl_task(rq->curr) && + dl_task(rq_selected(rq)) && (rq->curr->nr_cpus_allowed < 2 || - !dl_entity_preempt(&p->dl, &rq->curr->dl))) { + !dl_entity_preempt(&p->dl, &rq_selected(rq)->dl))) { push_dl_tasks(rq); } } @@ -2731,12 +2732,12 @@ static void switched_to_dl(struct rq *rq, struct ta= sk_struct *p) return; } =20 - if (rq->curr !=3D p) { + if (rq_selected(rq) !=3D p) { #ifdef CONFIG_SMP if (p->nr_cpus_allowed > 1 && rq->dl.overloaded) deadline_queue_push_tasks(rq); #endif - if (dl_task(rq->curr)) + if (dl_task(rq_selected(rq))) wakeup_preempt_dl(rq, p, 0); else resched_curr(rq); @@ -2765,7 +2766,7 @@ static void prio_changed_dl(struct rq *rq, struct tas= k_struct *p, if (!rq->dl.overloaded) deadline_queue_pull_task(rq); =20 - if (task_current(rq, p)) { + if (task_current_selected(rq, p)) { /* * If we now have a earlier deadline task than p, * then reschedule, provided p is still on this diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index c62805dbd608..4c0018ba7ea3 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1146,7 +1146,7 @@ static inline void update_curr_task(struct task_struc= t *p, s64 delta_exec) */ s64 update_curr_common(struct rq *rq) { - struct task_struct *curr =3D rq->curr; + struct task_struct *curr =3D rq_selected(rq); s64 delta_exec; =20 delta_exec =3D update_curr_se(rq, &curr->se); @@ -1183,7 +1183,7 @@ static void update_curr(struct cfs_rq *cfs_rq) =20 static void update_curr_fair(struct rq *rq) { - update_curr(cfs_rq_of(&rq->curr->se)); + update_curr(cfs_rq_of(&rq_selected(rq)->se)); } =20 static inline void @@ -6639,7 +6639,7 @@ static void hrtick_start_fair(struct rq *rq, struct t= ask_struct *p) s64 delta =3D slice - ran; =20 if (delta < 0) { - if (task_current(rq, p)) + if (task_current_selected(rq, p)) resched_curr(rq); return; } @@ -6654,7 +6654,7 @@ static void hrtick_start_fair(struct rq *rq, struct t= ask_struct *p) */ static void hrtick_update(struct rq *rq) { - struct task_struct *curr =3D rq->curr; + struct task_struct *curr =3D rq_selected(rq); =20 if (!hrtick_enabled_fair(rq) || curr->sched_class !=3D &fair_sched_class) return; @@ -8285,7 +8285,7 @@ static void set_next_buddy(struct sched_entity *se) */ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p= , int wake_flags) { - struct task_struct *curr =3D rq->curr; + struct task_struct *curr =3D rq_selected(rq); struct sched_entity *se =3D &curr->se, *pse =3D &p->se; struct cfs_rq *cfs_rq =3D task_cfs_rq(curr); int cse_is_idle, pse_is_idle; @@ -8316,7 +8316,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, = struct task_struct *p, int * prevents us from potentially nominating it as a false LAST_BUDDY * below. 
*/ - if (test_tsk_need_resched(curr)) + if (test_tsk_need_resched(rq->curr)) return; =20 /* Idle tasks are by definition preempted by non-idle tasks. */ @@ -9298,7 +9298,7 @@ static bool __update_blocked_others(struct rq *rq, bo= ol *done) * update_load_avg() can call cpufreq_update_util(). Make sure that RT, * DL and IRQ signals have been updated before updating CFS. */ - curr_class =3D rq->curr->sched_class; + curr_class =3D rq_selected(rq)->sched_class; =20 thermal_pressure =3D arch_scale_thermal_pressure(cpu_of(rq)); =20 @@ -12667,7 +12667,7 @@ prio_changed_fair(struct rq *rq, struct task_struct= *p, int oldprio) * our priority decreased, or if we are not currently running on * this runqueue and our priority is higher than the current's */ - if (task_current(rq, p)) { + if (task_current_selected(rq, p)) { if (p->prio > oldprio) resched_curr(rq); } else @@ -12770,7 +12770,7 @@ static void switched_to_fair(struct rq *rq, struct = task_struct *p) * kick off the schedule if running, otherwise just see * if we can still preempt the current task. */ - if (task_current(rq, p)) + if (task_current_selected(rq, p)) resched_curr(rq); else wakeup_preempt(rq, p, 0); diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 8b6fb77e095b..b02e8aad9b86 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -530,7 +530,7 @@ static void dequeue_rt_entity(struct sched_rt_entity *r= t_se, unsigned int flags) =20 static void sched_rt_rq_enqueue(struct rt_rq *rt_rq) { - struct task_struct *curr =3D rq_of_rt_rq(rt_rq)->curr; + struct task_struct *curr =3D rq_selected(rq_of_rt_rq(rt_rq)); struct rq *rq =3D rq_of_rt_rq(rt_rq); struct sched_rt_entity *rt_se; =20 @@ -1000,7 +1000,7 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt= _rq) */ static void update_curr_rt(struct rq *rq) { - struct task_struct *curr =3D rq->curr; + struct task_struct *curr =3D rq_selected(rq); struct sched_rt_entity *rt_se =3D &curr->rt; s64 delta_exec; =20 @@ -1543,7 +1543,7 @@ static int find_lowest_rq(struct task_struct *task); static int select_task_rq_rt(struct task_struct *p, int cpu, int flags) { - struct task_struct *curr; + struct task_struct *curr, *selected; struct rq *rq; bool test; =20 @@ -1555,6 +1555,7 @@ select_task_rq_rt(struct task_struct *p, int cpu, int= flags) =20 rcu_read_lock(); curr =3D READ_ONCE(rq->curr); /* unlocked access */ + selected =3D READ_ONCE(rq_selected(rq)); =20 /* * If the current task on @p's runqueue is an RT task, then @@ -1583,8 +1584,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, int= flags) * systems like big.LITTLE. */ test =3D curr && - unlikely(rt_task(curr)) && - (curr->nr_cpus_allowed < 2 || curr->prio <=3D p->prio); + unlikely(rt_task(selected)) && + (curr->nr_cpus_allowed < 2 || selected->prio <=3D p->prio); =20 if (test || !rt_task_fits_capacity(p, cpu)) { int target =3D find_lowest_rq(p); @@ -1614,12 +1615,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, in= t flags) =20 static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p) { - /* - * Current can't be migrated, useless to reschedule, - * let's hope p can move out. 
- */ if (rq->curr->nr_cpus_allowed =3D=3D 1 || - !cpupri_find(&rq->rd->cpupri, rq->curr, NULL)) + !cpupri_find(&rq->rd->cpupri, rq_selected(rq), NULL)) return; =20 /* @@ -1662,7 +1659,9 @@ static int balance_rt(struct rq *rq, struct task_stru= ct *p, struct rq_flags *rf) */ static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int fl= ags) { - if (p->prio < rq->curr->prio) { + struct task_struct *curr =3D rq_selected(rq); + + if (p->prio < curr->prio) { resched_curr(rq); return; } @@ -1680,7 +1679,7 @@ static void wakeup_preempt_rt(struct rq *rq, struct t= ask_struct *p, int flags) * to move current somewhere else, making room for our non-migratable * task. */ - if (p->prio =3D=3D rq->curr->prio && !test_tsk_need_resched(rq->curr)) + if (p->prio =3D=3D curr->prio && !test_tsk_need_resched(rq->curr)) check_preempt_equal_prio(rq, p); #endif } @@ -1705,7 +1704,7 @@ static inline void set_next_task_rt(struct rq *rq, st= ruct task_struct *p, bool f * utilization. We only care of the case where we start to schedule a * rt task */ - if (rq->curr->sched_class !=3D &rt_sched_class) + if (rq_selected(rq)->sched_class !=3D &rt_sched_class) update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 0); =20 rt_queue_push_tasks(rq); @@ -1977,6 +1976,7 @@ static struct task_struct *pick_next_pushable_task(st= ruct rq *rq) =20 BUG_ON(rq->cpu !=3D task_cpu(p)); BUG_ON(task_current(rq, p)); + BUG_ON(task_current_selected(rq, p)); BUG_ON(p->nr_cpus_allowed <=3D 1); =20 BUG_ON(!task_on_rq_queued(p)); @@ -2009,7 +2009,7 @@ static int push_rt_task(struct rq *rq, bool pull) * higher priority than current. If that's the case * just reschedule current. */ - if (unlikely(next_task->prio < rq->curr->prio)) { + if (unlikely(next_task->prio < rq_selected(rq)->prio)) { resched_curr(rq); return 0; } @@ -2030,7 +2030,7 @@ static int push_rt_task(struct rq *rq, bool pull) * Note that the stoppers are masqueraded as SCHED_FIFO * (cf. sched_set_stop_task()), so we can't rely on rt_task(). 
*/ - if (rq->curr->sched_class !=3D &rt_sched_class) + if (rq_selected(rq)->sched_class !=3D &rt_sched_class) return 0; =20 cpu =3D find_lowest_rq(rq->curr); @@ -2362,7 +2362,7 @@ static void pull_rt_task(struct rq *this_rq) * p if it is lower in priority than the * current task on the run queue */ - if (p->prio < src_rq->curr->prio) + if (p->prio < rq_selected(src_rq)->prio) goto skip; =20 if (is_migration_disabled(p)) { @@ -2404,9 +2404,9 @@ static void task_woken_rt(struct rq *rq, struct task_= struct *p) bool need_to_push =3D !task_on_cpu(rq, p) && !test_tsk_need_resched(rq->curr) && p->nr_cpus_allowed > 1 && - (dl_task(rq->curr) || rt_task(rq->curr)) && + (dl_task(rq_selected(rq)) || rt_task(rq_selected(rq))) && (rq->curr->nr_cpus_allowed < 2 || - rq->curr->prio <=3D p->prio); + rq_selected(rq)->prio <=3D p->prio); =20 if (need_to_push) push_rt_tasks(rq); @@ -2490,7 +2490,7 @@ static void switched_to_rt(struct rq *rq, struct task= _struct *p) if (p->nr_cpus_allowed > 1 && rq->rt.overloaded) rt_queue_push_tasks(rq); #endif /* CONFIG_SMP */ - if (p->prio < rq->curr->prio && cpu_online(cpu_of(rq))) + if (p->prio < rq_selected(rq)->prio && cpu_online(cpu_of(rq))) resched_curr(rq); } } @@ -2505,7 +2505,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p,= int oldprio) if (!task_on_rq_queued(p)) return; =20 - if (task_current(rq, p)) { + if (task_current_selected(rq, p)) { #ifdef CONFIG_SMP /* * If our priority decreases while running, we @@ -2531,7 +2531,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p,= int oldprio) * greater than the current running task * then reschedule. */ - if (p->prio < rq->curr->prio) + if (p->prio < rq_selected(rq)->prio) resched_curr(rq); } } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index e46f69ba9ba2..747233cf1116 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1032,7 +1032,7 @@ struct rq { */ unsigned int nr_uninterruptible; =20 - struct task_struct __rcu *curr; + struct task_struct __rcu *curr; /* Execution context */ struct task_struct *idle; struct task_struct *stop; unsigned long next_balance; @@ -1227,6 +1227,13 @@ DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues); #define cpu_curr(cpu) (cpu_rq(cpu)->curr) #define raw_rq() raw_cpu_ptr(&runqueues) =20 +/* For now, rq_selected =3D=3D rq->curr */ +#define rq_selected(rq) ((rq)->curr) +static inline void rq_set_selected(struct rq *rq, struct task_struct *t) +{ + /* Do nothing */ +} + struct sched_group; #ifdef CONFIG_SCHED_CORE static inline struct cpumask *sched_group_span(struct sched_group *sg); @@ -2150,11 +2157,25 @@ static inline u64 global_rt_runtime(void) return (u64)sysctl_sched_rt_runtime * NSEC_PER_USEC; } =20 +/* + * Is p the current execution context? + */ static inline int task_current(struct rq *rq, struct task_struct *p) { return rq->curr =3D=3D p; } =20 +/* + * Is p the current scheduling context? + * + * Note that it might be the current execution context at the same time if + * rq->curr =3D=3D rq_selected() =3D=3D p. 
+ */ +static inline int task_current_selected(struct rq *rq, struct task_struct = *p) +{ + return rq_selected(rq) =3D=3D p; +} + static inline int task_on_cpu(struct rq *rq, struct task_struct *p) { #ifdef CONFIG_SMP @@ -2324,7 +2345,7 @@ struct sched_class { =20 static inline void put_prev_task(struct rq *rq, struct task_struct *prev) { - WARN_ON_ONCE(rq->curr !=3D prev); + WARN_ON_ONCE(rq_selected(rq) !=3D prev); prev->sched_class->put_prev_task(rq, prev); } =20 @@ -2405,7 +2426,7 @@ extern void set_cpus_allowed_common(struct task_struc= t *p, struct affinity_conte =20 static inline struct task_struct *get_push_task(struct rq *rq) { - struct task_struct *p =3D rq->curr; + struct task_struct *p =3D rq_selected(rq); =20 lockdep_assert_rq_held(rq); =20 --=20 2.45.0.rc1.225.g2a3ae87e7f-goog
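For illustration, here is a minimal standalone C sketch of the scheduling/execution-context split that patch 7/7 introduces. The types below (struct task, a toy struct rq with a separate "selected" field) are simplified stand-ins rather than the kernel's own structures; in the patch as posted, rq_selected() still aliases rq->curr and rq_set_selected() is a no-op, so the two contexts only diverge once proxy execution later builds on these accessors.

#include <stdio.h>
#include <stdbool.h>

struct task { const char *comm; };

struct rq {
	struct task *curr;     /* execution context: task actually running */
	struct task *selected; /* scheduling context: task picked by the scheduler */
};

/* Mirrors the accessors added to sched.h, simplified for userspace. */
#define rq_selected(rq) ((rq)->selected)

static void rq_set_selected(struct rq *rq, struct task *t)
{
	rq->selected = t;
}

static bool task_current(struct rq *rq, struct task *p)
{
	return rq->curr == p;          /* is p the execution context? */
}

static bool task_current_selected(struct rq *rq, struct task *p)
{
	return rq_selected(rq) == p;   /* is p the scheduling context? */
}

int main(void)
{
	struct task a = { "A" }, b = { "B" };
	struct rq rq = { .curr = &a, .selected = &a };

	/*
	 * The scheduler picks B: scheduler-state decisions follow
	 * rq_selected(), while rq->curr keeps naming the task whose
	 * execution context is currently on the CPU.
	 */
	rq_set_selected(&rq, &b);

	printf("curr=%s selected=%s\n", rq.curr->comm, rq_selected(&rq)->comm);
	printf("task_current(A)=%d task_current_selected(B)=%d\n",
	       task_current(&rq, &a), task_current_selected(&rq, &b));
	return 0;
}

In the series as posted the macro keeps the old behaviour (rq_selected(rq) == rq->curr and an empty rq_set_selected()), which is what lets the sched_class call sites be converted incrementally with no functional change.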