From nobody Tue Nov 4 22:27:51 2025
Date: Tue, 9 Jul 2024 13:31:44 -0700
In-Reply-To: <20240709203213.799070-1-jstultz@google.com>
References: <20240709203213.799070-1-jstultz@google.com>
Message-ID: <20240709203213.799070-2-jstultz@google.com>
Subject: [PATCH v11 1/7] locking/mutex: Remove wakeups from under mutex::wait_lock
From: John Stultz
To: LKML
Cc: Peter Zijlstra , Joel Fernandes , Qais Yousef , Ingo Molnar , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , kernel-team@android.com, John Stultz , Metin Kaya , Davidlohr Bueso

From: Peter Zijlstra

In preparation for nesting mutex::wait_lock under rq::lock, we need to remove wakeups from under it.

Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: kernel-team@android.com
Signed-off-by: Peter Zijlstra (Intel)
[Heavily changed after 55f036ca7e74 ("locking: WW mutex cleanup") and 08295b3b5bee ("locking: Implement an algorithm choice for Wound-Wait mutexes")]
Signed-off-by: Juri Lelli
[jstultz: rebased to mainline, added extra wake_up_q & init to avoid hangs, similar to Connor's rework of this patch]
Signed-off-by: John Stultz
Tested-by: K Prateek Nayak
Tested-by: Metin Kaya
Acked-by: Davidlohr Bueso
Reviewed-by: Metin Kaya
---
v5:
* Reverted back to an earlier version of this patch to undo the change that kept the wake_q in the ctx structure, as that broke the rule that the wake_q must always be on the stack, as it's not safe for concurrency.
v6: * Made tweaks suggested by Waiman Long v7: * Fixups to pass wake_qs down for PREEMPT_RT logic v10: * Switched preempt_enable to be lower close to the unlock as suggested by Valentin * Added additional preempt_disable coverage around the wake_q calls as again noted by Valentin --- kernel/locking/mutex.c | 17 +++++++++++++---- kernel/locking/rtmutex.c | 30 +++++++++++++++++++++--------- kernel/locking/rwbase_rt.c | 8 +++++++- kernel/locking/rwsem.c | 4 ++-- kernel/locking/spinlock_rt.c | 3 ++- kernel/locking/ww_mutex.h | 29 ++++++++++++++++++----------- 6 files changed, 63 insertions(+), 28 deletions(-) diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index cbae8c0b89ab..4269da1f3ef5 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -575,6 +575,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas struct lockdep_map *nest_lock, unsigned long ip, struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx) { + DEFINE_WAKE_Q(wake_q); struct mutex_waiter waiter; struct ww_mutex *ww; int ret; @@ -625,7 +626,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas */ if (__mutex_trylock(lock)) { if (ww_ctx) - __ww_mutex_check_waiters(lock, ww_ctx); + __ww_mutex_check_waiters(lock, ww_ctx, &wake_q); =20 goto skip_wait; } @@ -645,7 +646,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas * Add in stamp order, waking up waiters that must kill * themselves. */ - ret =3D __ww_mutex_add_waiter(&waiter, lock, ww_ctx); + ret =3D __ww_mutex_add_waiter(&waiter, lock, ww_ctx, &wake_q); if (ret) goto err_early_kill; } @@ -681,6 +682,11 @@ __mutex_lock_common(struct mutex *lock, unsigned int s= tate, unsigned int subclas } =20 raw_spin_unlock(&lock->wait_lock); + /* Make sure we do wakeups before calling schedule */ + if (!wake_q_empty(&wake_q)) { + wake_up_q(&wake_q); + wake_q_init(&wake_q); + } schedule_preempt_disabled(); =20 first =3D __mutex_waiter_is_first(lock, &waiter); @@ -714,7 +720,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas */ if (!ww_ctx->is_wait_die && !__mutex_waiter_is_first(lock, &waiter)) - __ww_mutex_check_waiters(lock, ww_ctx); + __ww_mutex_check_waiters(lock, ww_ctx, &wake_q); } =20 __mutex_remove_waiter(lock, &waiter); @@ -730,6 +736,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas ww_mutex_lock_acquired(ww, ww_ctx); =20 raw_spin_unlock(&lock->wait_lock); + wake_up_q(&wake_q); preempt_enable(); return 0; =20 @@ -741,6 +748,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas raw_spin_unlock(&lock->wait_lock); debug_mutex_free_waiter(&waiter); mutex_release(&lock->dep_map, ip); + wake_up_q(&wake_q); preempt_enable(); return ret; } @@ -951,9 +959,10 @@ static noinline void __sched __mutex_unlock_slowpath(s= truct mutex *lock, unsigne if (owner & MUTEX_FLAG_HANDOFF) __mutex_handoff(lock, next); =20 + preempt_disable(); raw_spin_unlock(&lock->wait_lock); - wake_up_q(&wake_q); + preempt_enable(); } =20 #ifndef CONFIG_DEBUG_LOCK_ALLOC diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c index 88d08eeb8bc0..7a85d9bfa972 100644 --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -34,13 +34,15 @@ =20 static inline int __ww_mutex_add_waiter(struct rt_mutex_waiter *waiter, struct rt_mutex *lock, - struct ww_acquire_ctx *ww_ctx) + struct ww_acquire_ctx *ww_ctx, + struct wake_q_head *wake_q) { return 0; } =20 static inline void 
__ww_mutex_check_waiters(struct rt_mutex *lock, - struct ww_acquire_ctx *ww_ctx) + struct ww_acquire_ctx *ww_ctx, + struct wake_q_head *wake_q) { } =20 @@ -1207,6 +1209,7 @@ static int __sched task_blocks_on_rt_mutex(struct rt_= mutex_base *lock, struct rt_mutex_waiter *top_waiter =3D waiter; struct rt_mutex_base *next_lock; int chain_walk =3D 0, res; + DEFINE_WAKE_Q(wake_q); =20 lockdep_assert_held(&lock->wait_lock); =20 @@ -1245,7 +1248,10 @@ static int __sched task_blocks_on_rt_mutex(struct rt= _mutex_base *lock, =20 /* Check whether the waiter should back out immediately */ rtm =3D container_of(lock, struct rt_mutex, rtmutex); - res =3D __ww_mutex_add_waiter(waiter, rtm, ww_ctx); + preempt_disable(); + res =3D __ww_mutex_add_waiter(waiter, rtm, ww_ctx, &wake_q); + wake_up_q(&wake_q); + preempt_enable(); if (res) { raw_spin_lock(&task->pi_lock); rt_mutex_dequeue(lock, waiter); @@ -1678,7 +1684,8 @@ static int __sched __rt_mutex_slowlock(struct rt_mute= x_base *lock, struct ww_acquire_ctx *ww_ctx, unsigned int state, enum rtmutex_chainwalk chwalk, - struct rt_mutex_waiter *waiter) + struct rt_mutex_waiter *waiter, + struct wake_q_head *wake_q) { struct rt_mutex *rtm =3D container_of(lock, struct rt_mutex, rtmutex); struct ww_mutex *ww =3D ww_container_of(rtm); @@ -1689,7 +1696,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mute= x_base *lock, /* Try to acquire the lock again: */ if (try_to_take_rt_mutex(lock, current, NULL)) { if (build_ww_mutex() && ww_ctx) { - __ww_mutex_check_waiters(rtm, ww_ctx); + __ww_mutex_check_waiters(rtm, ww_ctx, wake_q); ww_mutex_lock_acquired(ww, ww_ctx); } return 0; @@ -1707,7 +1714,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mute= x_base *lock, /* acquired the lock */ if (build_ww_mutex() && ww_ctx) { if (!ww_ctx->is_wait_die) - __ww_mutex_check_waiters(rtm, ww_ctx); + __ww_mutex_check_waiters(rtm, ww_ctx, wake_q); ww_mutex_lock_acquired(ww, ww_ctx); } } else { @@ -1729,7 +1736,8 @@ static int __sched __rt_mutex_slowlock(struct rt_mute= x_base *lock, =20 static inline int __rt_mutex_slowlock_locked(struct rt_mutex_base *lock, struct ww_acquire_ctx *ww_ctx, - unsigned int state) + unsigned int state, + struct wake_q_head *wake_q) { struct rt_mutex_waiter waiter; int ret; @@ -1738,7 +1746,7 @@ static inline int __rt_mutex_slowlock_locked(struct r= t_mutex_base *lock, waiter.ww_ctx =3D ww_ctx; =20 ret =3D __rt_mutex_slowlock(lock, ww_ctx, state, RT_MUTEX_MIN_CHAINWALK, - &waiter); + &waiter, wake_q); =20 debug_rt_mutex_free_waiter(&waiter); return ret; @@ -1754,6 +1762,7 @@ static int __sched rt_mutex_slowlock(struct rt_mutex_= base *lock, struct ww_acquire_ctx *ww_ctx, unsigned int state) { + DEFINE_WAKE_Q(wake_q); unsigned long flags; int ret; =20 @@ -1775,8 +1784,11 @@ static int __sched rt_mutex_slowlock(struct rt_mutex= _base *lock, * irqsave/restore variants. 
*/ raw_spin_lock_irqsave(&lock->wait_lock, flags); - ret =3D __rt_mutex_slowlock_locked(lock, ww_ctx, state); + ret =3D __rt_mutex_slowlock_locked(lock, ww_ctx, state, &wake_q); + preempt_disable(); raw_spin_unlock_irqrestore(&lock->wait_lock, flags); + wake_up_q(&wake_q); + preempt_enable(); rt_mutex_post_schedule(); =20 return ret; diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c index 34a59569db6b..9f4322c07486 100644 --- a/kernel/locking/rwbase_rt.c +++ b/kernel/locking/rwbase_rt.c @@ -69,6 +69,7 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *r= wb, unsigned int state) { struct rt_mutex_base *rtm =3D &rwb->rtmutex; + DEFINE_WAKE_Q(wake_q); int ret; =20 rwbase_pre_schedule(); @@ -110,7 +111,7 @@ static int __sched __rwbase_read_lock(struct rwbase_rt = *rwb, * For rwlocks this returns 0 unconditionally, so the below * !ret conditionals are optimized out. */ - ret =3D rwbase_rtmutex_slowlock_locked(rtm, state); + ret =3D rwbase_rtmutex_slowlock_locked(rtm, state, &wake_q); =20 /* * On success the rtmutex is held, so there can't be a writer @@ -121,7 +122,12 @@ static int __sched __rwbase_read_lock(struct rwbase_rt= *rwb, */ if (!ret) atomic_inc(&rwb->readers); + + preempt_disable(); raw_spin_unlock_irq(&rtm->wait_lock); + wake_up_q(&wake_q); + preempt_enable(); + if (!ret) rwbase_rtmutex_unlock(rtm); =20 diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c index c6d17aee4209..79ab7b8df5c1 100644 --- a/kernel/locking/rwsem.c +++ b/kernel/locking/rwsem.c @@ -1415,8 +1415,8 @@ static inline void __downgrade_write(struct rw_semaph= ore *sem) #define rwbase_rtmutex_lock_state(rtm, state) \ __rt_mutex_lock(rtm, state) =20 -#define rwbase_rtmutex_slowlock_locked(rtm, state) \ - __rt_mutex_slowlock_locked(rtm, NULL, state) +#define rwbase_rtmutex_slowlock_locked(rtm, state, wq) \ + __rt_mutex_slowlock_locked(rtm, NULL, state, wq) =20 #define rwbase_rtmutex_unlock(rtm) \ __rt_mutex_unlock(rtm) diff --git a/kernel/locking/spinlock_rt.c b/kernel/locking/spinlock_rt.c index 38e292454fcc..fb1810a14c9d 100644 --- a/kernel/locking/spinlock_rt.c +++ b/kernel/locking/spinlock_rt.c @@ -162,7 +162,8 @@ rwbase_rtmutex_lock_state(struct rt_mutex_base *rtm, un= signed int state) } =20 static __always_inline int -rwbase_rtmutex_slowlock_locked(struct rt_mutex_base *rtm, unsigned int sta= te) +rwbase_rtmutex_slowlock_locked(struct rt_mutex_base *rtm, unsigned int sta= te, + struct wake_q_head *wake_q) { rtlock_slowlock_locked(rtm); return 0; diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h index 3ad2cc4823e5..7189c6631d90 100644 --- a/kernel/locking/ww_mutex.h +++ b/kernel/locking/ww_mutex.h @@ -275,7 +275,7 @@ __ww_ctx_less(struct ww_acquire_ctx *a, struct ww_acqui= re_ctx *b) */ static bool __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter, - struct ww_acquire_ctx *ww_ctx) + struct ww_acquire_ctx *ww_ctx, struct wake_q_head *wake_q) { if (!ww_ctx->is_wait_die) return false; @@ -284,7 +284,7 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER = *waiter, #ifndef WW_RT debug_mutex_wake_waiter(lock, waiter); #endif - wake_up_process(waiter->task); + wake_q_add(wake_q, waiter->task); } =20 return true; @@ -299,7 +299,8 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER = *waiter, */ static bool __ww_mutex_wound(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx, - struct ww_acquire_ctx *hold_ctx) + struct ww_acquire_ctx *hold_ctx, + struct wake_q_head *wake_q) { struct task_struct *owner =3D __ww_mutex_owner(lock); =20 @@ -331,7 
+332,7 @@ static bool __ww_mutex_wound(struct MUTEX *lock, * wakeup pending to re-read the wounded state. */ if (owner !=3D current) - wake_up_process(owner); + wake_q_add(wake_q, owner); =20 return true; } @@ -352,7 +353,8 @@ static bool __ww_mutex_wound(struct MUTEX *lock, * The current task must not be on the wait list. */ static void -__ww_mutex_check_waiters(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx) +__ww_mutex_check_waiters(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx, + struct wake_q_head *wake_q) { struct MUTEX_WAITER *cur; =20 @@ -364,8 +366,8 @@ __ww_mutex_check_waiters(struct MUTEX *lock, struct ww_= acquire_ctx *ww_ctx) if (!cur->ww_ctx) continue; =20 - if (__ww_mutex_die(lock, cur, ww_ctx) || - __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx)) + if (__ww_mutex_die(lock, cur, ww_ctx, wake_q) || + __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx, wake_q)) break; } } @@ -377,6 +379,8 @@ __ww_mutex_check_waiters(struct MUTEX *lock, struct ww_= acquire_ctx *ww_ctx) static __always_inline void ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx= *ctx) { + DEFINE_WAKE_Q(wake_q); + ww_mutex_lock_acquired(lock, ctx); =20 /* @@ -405,8 +409,10 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, s= truct ww_acquire_ctx *ctx) * die or wound us. */ lock_wait_lock(&lock->base); - __ww_mutex_check_waiters(&lock->base, ctx); + __ww_mutex_check_waiters(&lock->base, ctx, &wake_q); unlock_wait_lock(&lock->base); + + wake_up_q(&wake_q); } =20 static __always_inline int @@ -488,7 +494,8 @@ __ww_mutex_check_kill(struct MUTEX *lock, struct MUTEX_= WAITER *waiter, static inline int __ww_mutex_add_waiter(struct MUTEX_WAITER *waiter, struct MUTEX *lock, - struct ww_acquire_ctx *ww_ctx) + struct ww_acquire_ctx *ww_ctx, + struct wake_q_head *wake_q) { struct MUTEX_WAITER *cur, *pos =3D NULL; bool is_wait_die; @@ -532,7 +539,7 @@ __ww_mutex_add_waiter(struct MUTEX_WAITER *waiter, pos =3D cur; =20 /* Wait-Die: ensure younger waiters die. */ - __ww_mutex_die(lock, cur, ww_ctx); + __ww_mutex_die(lock, cur, ww_ctx, wake_q); } =20 __ww_waiter_add(lock, waiter, pos); @@ -550,7 +557,7 @@ __ww_mutex_add_waiter(struct MUTEX_WAITER *waiter, * such that either we or the fastpath will wound @ww->ctx. 
*/ smp_mb(); - __ww_mutex_wound(lock, ww_ctx, ww->ctx); + __ww_mutex_wound(lock, ww_ctx, ww->ctx, wake_q); } =20 return 0; --=20 2.45.2.993.g49e7a77208-goog From nobody Tue Nov 4 22:27:51 2025 Received: from mail-pg1-f202.google.com (mail-pg1-f202.google.com [209.85.215.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A4AB4185614 for ; Tue, 9 Jul 2024 20:32:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720557154; cv=none; b=d3NbQRuB91TSicd+Qbej+kgfmR1FyabAG43daPMqPY6UxLbrC6THZqYD1u2F95KjWD/F9fMOTSQl3rJ6N23t1PipcY39hxucypDNp/HDIUqt4HSFu0OPaflJ6zIrlQ9pQiWm9aNlsDoeEnePyqEyzKhs1PF6dRLfIin1R02rBqU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720557154; c=relaxed/simple; bh=8WwFP5CCf9stN1FiTOlKvBC5zqi6xiZP0UDWiU8fL0k=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=t5E24m11nQhScZI/e+oVaEUAsxkCdTJfWiK5PxYrAG2t6wyxAqBCoDrkT554sqmjYXWpzcqz6517uXs5QYYl+SNkfFyMniZqlzXnaItgxca3swqcMYNWQd3YvH/F3PHa6JRE4te6XQiRCsmJa9k2KxDiUlIivFReS3F+HVfca2A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=kRriDddW; arc=none smtp.client-ip=209.85.215.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="kRriDddW" Received: by mail-pg1-f202.google.com with SMTP id 41be03b00d2f7-5e4df21f22dso84516a12.0 for ; Tue, 09 Jul 2024 13:32:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1720557151; x=1721161951; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=dLvTAbNH73dBrl7bNnyR5edCKfFkbAVRdkATgKgoksk=; b=kRriDddWYlXlgFyqUDT6O5yMU+O7f1xMIClEigxPga0u5tVetcpX++Fzf+1h8JqwKP N9tM1fOmvduAZytY50/dsUiDsgpF24s824T5by2S3pCsYK6vb818TlFyLIw9YZ5svr/M nu30flUOuGbxaqVQU1EAoPYCt0zKUJnDEYMiI1M1nTZmqSkaUteWWMwYkxKfgBA3HziE WnyEM83I2Ab/0fb3if1Xmij1qaEkjbiOOZubR8NUqrMdANVhPzMkXa8k+TWJlqoBiwCV 8yr2YJc48Lb56KA7Ejs2W0qzViJ9ACKL89BNJMLfXjIdel8JIpyFPS/ocKOr/94R+qLK kF2w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1720557151; x=1721161951; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=dLvTAbNH73dBrl7bNnyR5edCKfFkbAVRdkATgKgoksk=; b=KHOiDssZOWURzFsslrVFyYV5ucLru5yk0aXCgo0SliyYrN1xzuoVNFfNaepBmhzOuX 25VuybLLVgDR5ZL/xHVGr6/MchOwnqiLVcuzdr5AYFwQ99NTLHAIYMKO3udwcpSkwha7 u7rq2XcA8IOGyJFnlAfyUXzGaDk8XO0aTES3jtlk3UegXrW/ClsaFp9PYKL9FoqT3qwN dgpGWAI6kLCPeE76fPrrLYD4jdb44v49/7z1N00xjB/AnFFAELnD/wScVpFxX0c5NQGV tS9b2kmEzW+4kDoZTlxwAKBgP7KGlrSNIfrDtzSYDgbU1sFoM5BpbOrfdFV3HNx7Wkg1 iu+Q== X-Gm-Message-State: AOJu0YwfOETly6t5c9XANuH2FcvEkKF6wxGR4eh18w05XLfnoouXVc7I KYhfWL0OsKN8lQDHgeVcYGFthpe+m2X87Q5j8EoFfyqZq1dlpyr0Qe5XjKJovs1rYENKtqwRReP 
uQSZaX478SLIM6qXfkzEQqkfqc778MVM6YLqPDjRiFReaOqwOf3cESgDxI9km/Ih6SdUC+JBoLT qXHqDfQTtVn/hfdbeom5ePVt2DcOflACiRvNX3Y31CsrbF X-Google-Smtp-Source: AGHT+IGMvCZToZeHj65S2vUnjRIIe0nCFMOuJ9QbdPNkwAiB09LaXZrmLtpISjk6zta7Lvx8LztCl8RRgrDp X-Received: from jstultz-noogler2.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:600]) (user=jstultz job=sendgmr) by 2002:a65:67da:0:b0:70e:1058:e16c with SMTP id 41be03b00d2f7-77dfededd53mr8536a12.1.1720557148168; Tue, 09 Jul 2024 13:32:28 -0700 (PDT) Date: Tue, 9 Jul 2024 13:31:45 -0700 In-Reply-To: <20240709203213.799070-1-jstultz@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240709203213.799070-1-jstultz@google.com> X-Mailer: git-send-email 2.45.2.993.g49e7a77208-goog Message-ID: <20240709203213.799070-3-jstultz@google.com> Subject: [PATCH v11 2/7] locking/mutex: Make mutex::wait_lock irq safe From: John Stultz To: LKML Cc: Juri Lelli , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , kernel-team@android.com, "Connor O'Brien" , John Stultz , Metin Kaya Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Juri Lelli mutex::wait_lock might be nested under rq->lock. Make it irq safe then. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: kernel-team@android.com Signed-off-by: Juri Lelli Signed-off-by: Peter Zijlstra (Intel) [rebase & fix {un,}lock_wait_lock helpers in ww_mutex.h] Signed-off-by: Connor O'Brien Signed-off-by: John Stultz Tested-by: K Prateek Nayak Tested-by: Metin Kaya Reviewed-by: Metin Kaya Reviewed-by: Valentin Schneider --- v3: * Re-added this patch after it was dropped in v2 which caused lockdep warnings to trip. v7: * Fix function definition for PREEMPT_RT case, as pointed out by Metin Kaya. * Fix incorrect flags handling in PREEMPT_RT case as found by Metin Kaya --- kernel/locking/mutex.c | 18 ++++++++++-------- kernel/locking/ww_mutex.h | 22 +++++++++++----------- 2 files changed, 21 insertions(+), 19 deletions(-) diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 4269da1f3ef5..6d843a0978a5 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -578,6 +578,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas DEFINE_WAKE_Q(wake_q); struct mutex_waiter waiter; struct ww_mutex *ww; + unsigned long flags; int ret; =20 if (!use_ww_ctx) @@ -620,7 +621,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas return 0; } =20 - raw_spin_lock(&lock->wait_lock); + raw_spin_lock_irqsave(&lock->wait_lock, flags); /* * After waiting to acquire the wait_lock, try again. 
*/ @@ -681,7 +682,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas goto err; } =20 - raw_spin_unlock(&lock->wait_lock); + raw_spin_unlock_irqrestore(&lock->wait_lock, flags); /* Make sure we do wakeups before calling schedule */ if (!wake_q_empty(&wake_q)) { wake_up_q(&wake_q); @@ -707,9 +708,9 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas trace_contention_begin(lock, LCB_F_MUTEX); } =20 - raw_spin_lock(&lock->wait_lock); + raw_spin_lock_irqsave(&lock->wait_lock, flags); } - raw_spin_lock(&lock->wait_lock); + raw_spin_lock_irqsave(&lock->wait_lock, flags); acquired: __set_current_state(TASK_RUNNING); =20 @@ -735,7 +736,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas if (ww_ctx) ww_mutex_lock_acquired(ww, ww_ctx); =20 - raw_spin_unlock(&lock->wait_lock); + raw_spin_unlock_irqrestore(&lock->wait_lock, flags); wake_up_q(&wake_q); preempt_enable(); return 0; @@ -745,7 +746,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int st= ate, unsigned int subclas __mutex_remove_waiter(lock, &waiter); err_early_kill: trace_contention_end(lock, ret); - raw_spin_unlock(&lock->wait_lock); + raw_spin_unlock_irqrestore(&lock->wait_lock, flags); debug_mutex_free_waiter(&waiter); mutex_release(&lock->dep_map, ip); wake_up_q(&wake_q); @@ -916,6 +917,7 @@ static noinline void __sched __mutex_unlock_slowpath(st= ruct mutex *lock, unsigne struct task_struct *next =3D NULL; DEFINE_WAKE_Q(wake_q); unsigned long owner; + unsigned long flags; =20 mutex_release(&lock->dep_map, ip); =20 @@ -942,7 +944,7 @@ static noinline void __sched __mutex_unlock_slowpath(st= ruct mutex *lock, unsigne } } =20 - raw_spin_lock(&lock->wait_lock); + raw_spin_lock_irqsave(&lock->wait_lock, flags); debug_mutex_unlock(lock); if (!list_empty(&lock->wait_list)) { /* get the first entry from the wait-list: */ @@ -960,7 +962,7 @@ static noinline void __sched __mutex_unlock_slowpath(st= ruct mutex *lock, unsigne __mutex_handoff(lock, next); =20 preempt_disable(); - raw_spin_unlock(&lock->wait_lock); + raw_spin_unlock_irqrestore(&lock->wait_lock, flags); wake_up_q(&wake_q); preempt_enable(); } diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h index 7189c6631d90..9facc0ddfdd3 100644 --- a/kernel/locking/ww_mutex.h +++ b/kernel/locking/ww_mutex.h @@ -70,14 +70,14 @@ __ww_mutex_has_waiters(struct mutex *lock) return atomic_long_read(&lock->owner) & MUTEX_FLAG_WAITERS; } =20 -static inline void lock_wait_lock(struct mutex *lock) +static inline void lock_wait_lock(struct mutex *lock, unsigned long *flags) { - raw_spin_lock(&lock->wait_lock); + raw_spin_lock_irqsave(&lock->wait_lock, *flags); } =20 -static inline void unlock_wait_lock(struct mutex *lock) +static inline void unlock_wait_lock(struct mutex *lock, unsigned long *fla= gs) { - raw_spin_unlock(&lock->wait_lock); + raw_spin_unlock_irqrestore(&lock->wait_lock, *flags); } =20 static inline void lockdep_assert_wait_lock_held(struct mutex *lock) @@ -144,14 +144,14 @@ __ww_mutex_has_waiters(struct rt_mutex *lock) return rt_mutex_has_waiters(&lock->rtmutex); } =20 -static inline void lock_wait_lock(struct rt_mutex *lock) +static inline void lock_wait_lock(struct rt_mutex *lock, unsigned long *fl= ags) { - raw_spin_lock(&lock->rtmutex.wait_lock); + raw_spin_lock_irqsave(&lock->rtmutex.wait_lock, *flags); } =20 -static inline void unlock_wait_lock(struct rt_mutex *lock) +static inline void unlock_wait_lock(struct rt_mutex *lock, unsigned long *= flags) { - 
raw_spin_unlock(&lock->rtmutex.wait_lock); + raw_spin_unlock_irqrestore(&lock->rtmutex.wait_lock, *flags); } =20 static inline void lockdep_assert_wait_lock_held(struct rt_mutex *lock) @@ -380,6 +380,7 @@ static __always_inline void ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx= *ctx) { DEFINE_WAKE_Q(wake_q); + unsigned long flags; =20 ww_mutex_lock_acquired(lock, ctx); =20 @@ -408,10 +409,9 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, s= truct ww_acquire_ctx *ctx) * Uh oh, we raced in fastpath, check if any of the waiters need to * die or wound us. */ - lock_wait_lock(&lock->base); + lock_wait_lock(&lock->base, &flags); __ww_mutex_check_waiters(&lock->base, ctx, &wake_q); - unlock_wait_lock(&lock->base); - + unlock_wait_lock(&lock->base, &flags); wake_up_q(&wake_q); } =20 --=20 2.45.2.993.g49e7a77208-goog From nobody Tue Nov 4 22:27:51 2025 Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E81D1185616 for ; Tue, 9 Jul 2024 20:32:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720557154; cv=none; b=oLkSXWwihO9SCt2tU6OtVJ302qW/qRB+9rDXuZnU8H3Xt/98s2CVyVZR543Zr9T48cdmcDSGBRaQImzwbjt6HmoUoKgGCMWNmwr66fjT7DdCUJtadRWIqv9aXNYAIvaMC7ezlXTmXEtaom/AAxF9MecIwd7axVnZQhwAHGHN2Ac= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720557154; c=relaxed/simple; bh=X541dEl79Yn7xvidTxrBKmxXUU7GWVN/DNRykcpVjO8=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=Hz2KyYW2K/gn8pzMI5lCatLEJcnSxv6rKDCu7mfGPeYdDGLqUHmfU5clzcZre2u5o0jMqehMC6hfETEBQYC9/1KqtydKcnG5uHKpP3oOUCAUORUKgyshxDRAAsRfsGjT4qZQ8ZOg/xVpfWbRnZ41/HT2NwWEGhgIrA0+fy/5oMg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=J33I4bmP; arc=none smtp.client-ip=209.85.128.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="J33I4bmP" Received: by mail-yw1-f202.google.com with SMTP id 00721157ae682-652c6e92147so93851177b3.3 for ; Tue, 09 Jul 2024 13:32:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1720557152; x=1721161952; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=HO7ngMSkWjPHLXmo+rqra2Oo+7w/UwqwG9JY6wVCFY0=; b=J33I4bmPO55haCHEhl7ssnCPZDyCEQ+8oC2qPr9GVEhtRbXcj95dLEwq1p0sMOp6Cv 5JqaHxrGTEIqkwvDOXtV0JHfYT14QMCqKHz0k+/HAFAtyty9tzuE3V6G0H0dV2GEtSOC ni/h/CmZhGaVKVAow20LOLRxNZvrJJHdKhP3coQXlIjvUfl4hUHiX0MT8sh100Gc2za6 EQGhlIsuYMBWhMwIZX8qJrFI1PrC1tquyZITVZC8F+3yEfU5KRzTGIRHinJKNcdEQZRJ N8EBo5SvnxxUS6zTYe/beFqo7nJRyynYApx1VLG6+HAm7mWwg/Um04c48wKbi7M4LN8B dd/w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1720557152; x=1721161952; 
Date: Tue, 9 Jul 2024 13:31:46 -0700
In-Reply-To: <20240709203213.799070-1-jstultz@google.com>
References: <20240709203213.799070-1-jstultz@google.com>
Message-ID: <20240709203213.799070-4-jstultz@google.com>
Subject: [PATCH v11 3/7] locking/mutex: Expose __mutex_owner()
From: John Stultz
To: LKML
Cc: Juri Lelli , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , kernel-team@android.com, Valentin Schneider , "Connor O'Brien" , John Stultz , Metin Kaya

From: Juri Lelli

Implementing proxy execution requires that scheduler code be able to identify the current owner of a mutex. Expose __mutex_owner() for this purpose (alone!). Includes a null mutex check, so that users of the function can be simplified.

Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E.
McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: kernel-team@android.com Signed-off-by: Juri Lelli [Removed the EXPORT_SYMBOL] Signed-off-by: Valentin Schneider Signed-off-by: Connor O'Brien [jstultz: Reworked per Peter's suggestions] Signed-off-by: John Stultz Tested-by: K Prateek Nayak Tested-by: Metin Kaya Reviewed-by: Metin Kaya Reviewed-by: Valentin Schneider --- v4: * Move __mutex_owner() to kernel/locking/mutex.h instead of adding a new globally available accessor function to keep the exposure of this low, along with keeping it an inline function, as suggested by PeterZ v10: * Handle null lock ptr, to simplify later code, as suggested by Metin Kaya v11: * Tweak commit message suggested by Metin Kaya --- kernel/locking/mutex.c | 25 ------------------------- kernel/locking/mutex.h | 27 +++++++++++++++++++++++++++ 2 files changed, 27 insertions(+), 25 deletions(-) diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 6d843a0978a5..4b7193fd3be9 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -56,31 +56,6 @@ __mutex_init(struct mutex *lock, const char *name, struc= t lock_class_key *key) } EXPORT_SYMBOL(__mutex_init); =20 -/* - * @owner: contains: 'struct task_struct *' to the current lock owner, - * NULL means not owned. Since task_struct pointers are aligned at - * at least L1_CACHE_BYTES, we have low bits to store extra state. - * - * Bit0 indicates a non-empty waiter list; unlock must issue a wakeup. - * Bit1 indicates unlock needs to hand the lock to the top-waiter - * Bit2 indicates handoff has been done and we're waiting for pickup. - */ -#define MUTEX_FLAG_WAITERS 0x01 -#define MUTEX_FLAG_HANDOFF 0x02 -#define MUTEX_FLAG_PICKUP 0x04 - -#define MUTEX_FLAGS 0x07 - -/* - * Internal helper function; C doesn't allow us to hide it :/ - * - * DO NOT USE (outside of mutex code). - */ -static inline struct task_struct *__mutex_owner(struct mutex *lock) -{ - return (struct task_struct *)(atomic_long_read(&lock->owner) & ~MUTEX_FLA= GS); -} - static inline struct task_struct *__owner_task(unsigned long owner) { return (struct task_struct *)(owner & ~MUTEX_FLAGS); diff --git a/kernel/locking/mutex.h b/kernel/locking/mutex.h index 0b2a79c4013b..cbff35b9b7ae 100644 --- a/kernel/locking/mutex.h +++ b/kernel/locking/mutex.h @@ -20,6 +20,33 @@ struct mutex_waiter { #endif }; =20 +/* + * @owner: contains: 'struct task_struct *' to the current lock owner, + * NULL means not owned. Since task_struct pointers are aligned at + * at least L1_CACHE_BYTES, we have low bits to store extra state. + * + * Bit0 indicates a non-empty waiter list; unlock must issue a wakeup. + * Bit1 indicates unlock needs to hand the lock to the top-waiter + * Bit2 indicates handoff has been done and we're waiting for pickup. + */ +#define MUTEX_FLAG_WAITERS 0x01 +#define MUTEX_FLAG_HANDOFF 0x02 +#define MUTEX_FLAG_PICKUP 0x04 + +#define MUTEX_FLAGS 0x07 + +/* + * Internal helper function; C doesn't allow us to hide it :/ + * + * DO NOT USE (outside of mutex & scheduler code). 
+ */ +static inline struct task_struct *__mutex_owner(struct mutex *lock) +{ + if (!lock) + return NULL; + return (struct task_struct *)(atomic_long_read(&lock->owner) & ~MUTEX_FLA= GS); +} + #ifdef CONFIG_DEBUG_MUTEXES extern void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter); --=20 2.45.2.993.g49e7a77208-goog From nobody Tue Nov 4 22:27:51 2025 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C8B4B185639 for ; Tue, 9 Jul 2024 20:32:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720557157; cv=none; b=atBUWVONEb8cJ8cP/+z5/yAHhUusu5E8JBLfHE9m1j4bNbet9u9X2awhwxGLxC99Tj9Ee5WR98s2uAp30UwDT3/2dcbj8IoXqjIllaLR7Z/0hY2dRN1ttrUg2YblgEnQvQJr52wJShVSTVwbLevODJ7KdIVXs4e+bYNIRuuQLeI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720557157; c=relaxed/simple; bh=oQTA8x65GQzVtig5DcSmr8Ibwy1iNnjZMuGwkCWZtXU=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=p9I+hxd2ZGI/Y1N5J0SFbIaNq8b6C2M0bi3CRCyXmF8hZ8fEiMS4smwR2qmX5m/D74bncYs8fcx+V5muTGkAHZGfgGTvzpi26LjsHfLB0dAsnl4/bK4jgj4JyceM4ybVbNxvvQXxTna3Tpq2fzOzvzyZF5dX9B7C+6bG1FiDesY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=WRBJ/T71; arc=none smtp.client-ip=209.85.216.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="WRBJ/T71" Received: by mail-pj1-f74.google.com with SMTP id 98e67ed59e1d1-2c1a9e8d3b0so5116937a91.0 for ; Tue, 09 Jul 2024 13:32:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1720557155; x=1721161955; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=SK4jF2z7xl5uoUAgslE/9jLcJAWwxsMsuk73k7nd2+0=; b=WRBJ/T71/bmxhqzlCuBznydgPqvt2TJHsKCxdgkM2qdivlftAxd1eFPDDIGduugl9Q X7pG/R3zrhsdKOMnLjD6a63+FlZVOben5hGUDcrPf7kL8XH7JxcxDDRZiyDHGIby2U6q CbQlIuj5SZ3wzZQbSI8wORhH2tIEYyiADBHZQ7KNPEs73mY95PHoWMWaKkSuLE9AJrCd 2xZXSresuRsVqnfb0H9EqgEsOcYIbiZHNEFSp8EAAsnjOTKhTfDmELP2KyIM7b/iV6ub MZQnktuXmSNs6fafUF0jx0OISkJurkXsd3Lj9gGaWg9EWjzJVOgW6I7d2QFHL/lC0Ozr TMEg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1720557155; x=1721161955; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=SK4jF2z7xl5uoUAgslE/9jLcJAWwxsMsuk73k7nd2+0=; b=Kp/Uh3AfDr80Z3Ex9WSuZgGIEFM+fXupc/1iBEQmChomCkv+97OdyJfZBtkPQwImff huOogw5HcQuz/2j+Og/qME2BBfq1/9j5wMpQC+EMQ1L0PTbh/LInt572beP+ALvwYaWv ZuBrhdNL+n3d4Sc5eaXV34cFRCSeETYYaKy7PgPSD9tCB1Cm/NDzPb9fLNnnjRUuadd+ WPgYymQrPpD8DkDwGSl7F23zyZpOkMBYqBbj69kQRJedeCQ6+lMstFSQLbSoTwAytsfS fIH+LFSUreqYSdV9OHr/+aoMmduJ3yrGBVxzUMGKNpVlxS2mIBEz4lFM8gzJpEovV7bx ErrA== X-Gm-Message-State: 
Date: Tue, 9 Jul 2024 13:31:47 -0700
In-Reply-To: <20240709203213.799070-1-jstultz@google.com>
References: <20240709203213.799070-1-jstultz@google.com>
Message-ID: <20240709203213.799070-5-jstultz@google.com>
Subject: [PATCH v11 4/7] sched: Add move_queued_task_locked helper
From: John Stultz
To: LKML
Cc: "Connor O'Brien" , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , kernel-team@android.com, John Stultz , Metin Kaya

From: Connor O'Brien

Switch logic that deactivates, sets the task cpu, and reactivates a task on a different rq to use a helper that will be later extended to push entire blocked task chains.

This patch was broken out from a larger chain migration patch originally by Connor O'Brien.

Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E.
McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: kernel-team@android.com Signed-off-by: Connor O'Brien [jstultz: split out from larger chain migration patch] Signed-off-by: John Stultz Tested-by: K Prateek Nayak Tested-by: Metin Kaya Reviewed-by: Metin Kaya Reviewed-by: Valentin Schneider Reviewed-by: Qais Yousef --- v8: * Renamed from push_task_chain to do_push_task so it makes more sense without proxy-execution v10: * Changed name to move_queued_task_locked as suggested by Valentin v11: * Also use new helper in __migrate_swap_task() and try_steal_cookie() as suggested by Qais Yousef * Nit cleanups suggested by Metin --- kernel/sched/core.c | 13 +++---------- kernel/sched/deadline.c | 8 ++------ kernel/sched/rt.c | 8 ++------ kernel/sched/sched.h | 12 ++++++++++++ 4 files changed, 19 insertions(+), 22 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index bcf2c4cc0522..5e63dbcbc1f0 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2712,9 +2712,7 @@ int push_cpu_stop(void *arg) =20 // XXX validate p is still the highest prio task if (task_rq(p) =3D=3D rq) { - deactivate_task(rq, p, 0); - set_task_cpu(p, lowest_rq->cpu); - activate_task(lowest_rq, p, 0); + move_queued_task_locked(rq, lowest_rq, p); resched_curr(lowest_rq); } =20 @@ -3414,9 +3412,7 @@ static void __migrate_swap_task(struct task_struct *p= , int cpu) rq_pin_lock(src_rq, &srf); rq_pin_lock(dst_rq, &drf); =20 - deactivate_task(src_rq, p, 0); - set_task_cpu(p, cpu); - activate_task(dst_rq, p, 0); + move_queued_task_locked(src_rq, dst_rq, p); wakeup_preempt(dst_rq, p, 0); =20 rq_unpin_lock(dst_rq, &drf); @@ -6369,10 +6365,7 @@ static bool try_steal_cookie(int this, int that) if (sched_task_is_throttled(p, this)) goto next; =20 - deactivate_task(src, p, 0); - set_task_cpu(p, this); - activate_task(dst, p, 0); - + move_queued_task_locked(src, dst, p); resched_curr(dst); =20 success =3D true; diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index c75d1307d86d..3e05e239f5f6 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -2442,9 +2442,7 @@ static int push_dl_task(struct rq *rq) goto retry; } =20 - deactivate_task(rq, next_task, 0); - set_task_cpu(next_task, later_rq->cpu); - activate_task(later_rq, next_task, 0); + move_queued_task_locked(rq, later_rq, next_task); ret =3D 1; =20 resched_curr(later_rq); @@ -2530,9 +2528,7 @@ static void pull_dl_task(struct rq *this_rq) if (is_migration_disabled(p)) { push_task =3D get_push_task(src_rq); } else { - deactivate_task(src_rq, p, 0); - set_task_cpu(p, this_cpu); - activate_task(this_rq, p, 0); + move_queued_task_locked(src_rq, this_rq, p); dmin =3D p->dl.deadline; resched =3D true; } diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index aa4c1c874fa4..55bb33f367ad 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -2105,9 +2105,7 @@ static int push_rt_task(struct rq *rq, bool pull) goto retry; } =20 - deactivate_task(rq, next_task, 0); - set_task_cpu(next_task, lowest_rq->cpu); - activate_task(lowest_rq, next_task, 0); + move_queued_task_locked(rq, lowest_rq, next_task); resched_curr(lowest_rq); ret =3D 1; =20 @@ -2378,9 +2376,7 @@ static void pull_rt_task(struct rq *this_rq) if (is_migration_disabled(p)) { push_task =3D get_push_task(src_rq); } else { - deactivate_task(src_rq, p, 0); - set_task_cpu(p, this_cpu); - activate_task(this_rq, p, 0); + move_queued_task_locked(src_rq, this_rq, p); resched =3D true; } /* diff --git a/kernel/sched/sched.h 
b/kernel/sched/sched.h index a831af102070..a24286059a6a 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -3479,5 +3479,17 @@ static inline void init_sched_mm_cid(struct task_str= uct *t) { } =20 extern u64 avg_vruntime(struct cfs_rq *cfs_rq); extern int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se); +#ifdef CONFIG_SMP +static inline +void move_queued_task_locked(struct rq *src_rq, struct rq *dst_rq, struct = task_struct *task) +{ + lockdep_assert_rq_held(src_rq); + lockdep_assert_rq_held(dst_rq); + + deactivate_task(src_rq, task, 0); + set_task_cpu(task, dst_rq->cpu); + activate_task(dst_rq, task, 0); +} +#endif =20 #endif /* _KERNEL_SCHED_SCHED_H */ --=20 2.45.2.993.g49e7a77208-goog From nobody Tue Nov 4 22:27:52 2025 Received: from mail-pl1-f202.google.com (mail-pl1-f202.google.com [209.85.214.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C2BE718785B for ; Tue, 9 Jul 2024 20:32:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720557159; cv=none; b=soDFIXegG2WdQej12ms0ONm6oj39cPP1IjAbzQiL7qq6q1ILZZHXrdr5XtYigHwAbBK0QtBw444bqF45SX7gkU5uSq1VaIep23mk/bQj6nLk3YnwAx5XTbYf7wYwHXPdM7GOPvzzDLVApQC4SC4ACnn1oz23FkEceBvOvDYa7oY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720557159; c=relaxed/simple; bh=V1PF08pJ+tLWUnbiVFTp/QGQhy5yYg7gQLXFb/OQcQ4=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=uLeR5nDPS21p/QsBpwwrydhFBN4x+hsKfTACqoP71LLV8H9Qig6f62mAkZNLsJAYv4898Bq4jmDHaDxHLWL5plw77YDFRd58oaE8npKYdSEz3V1WOWgWZlL6yT1yzIaWhmv7zsIIJFxkb9KenJFNui9IttQeGBBaQR1ESy0dqI0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=H04tlgh8; arc=none smtp.client-ip=209.85.214.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="H04tlgh8" Received: by mail-pl1-f202.google.com with SMTP id d9443c01a7336-1fb2ee83169so42772435ad.3 for ; Tue, 09 Jul 2024 13:32:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1720557157; x=1721161957; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=q6xxvV2bWwwSWDlU+VJFmJ+e4u0BCl9LU2uMRtJJq7g=; b=H04tlgh8975QOJTdnCyA0433UcmoNXsf+8jlnpoW9oOmY0K8bU88MtdxPYwQ7LJ86I ///aCSorh93PC5EiDW890bECBIepHIngGk6mU3zygbeMrM6dsjEhBCxtena6msKIAnj0 lMf30+qT49MyqTEJBvmxMF16dxJXNIHF0AUk3Ne6uEJNTLiohGneiHs27tGCI8R/acje QAtItwSNKo7v+vuSuTTHwmD1Co6lQh9zfInZGBEKJ/36BIaYqqltNlGZr6ljZub2eAVp vDsfojcHffXRaWZV3qiv4mZUJhOPBShSEbgVw/TpluhV0Lc1tfx6IjTnnGEDbHmAPEuF XfIw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1720557157; x=1721161957; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; 
Date: Tue, 9 Jul 2024 13:31:48 -0700
In-Reply-To: <20240709203213.799070-1-jstultz@google.com>
References: <20240709203213.799070-1-jstultz@google.com>
Message-ID: <20240709203213.799070-6-jstultz@google.com>
Subject: [PATCH v11 5/7] sched: Consolidate pick_*_task to task_is_pushable helper
From: John Stultz
To: LKML
Cc: "Connor O'Brien" , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Will Deacon , Waiman Long , Boqun Feng , "Paul E. McKenney" , Metin Kaya , Xuewen Yan , K Prateek Nayak , Thomas Gleixner , Daniel Lezcano , kernel-team@android.com, John Stultz , Metin Kaya

From: Connor O'Brien

This patch consolidates the rt and deadline pick_*_task functions into a task_is_pushable() helper.

This patch was broken out from a larger chain migration patch originally by Connor O'Brien.

Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: kernel-team@android.com
Signed-off-by: Connor O'Brien
[jstultz: split out from larger chain migration patch, renamed helper function]
Signed-off-by: John Stultz
Tested-by: K Prateek Nayak
Tested-by: Metin Kaya
Reviewed-by: Metin Kaya
Reviewed-by: Valentin Schneider
---
v7:
* Split from chain migration patch
* Renamed function
v11:
* Switched to bool (though later in the series it goes to a tri-state return) for now to simplify review. Will add tri-state handling later in the series when it's needed. Suggested by Metin and others.
--- kernel/sched/deadline.c | 10 +--------- kernel/sched/rt.c | 11 +---------- kernel/sched/sched.h | 9 +++++++++ 3 files changed, 11 insertions(+), 19 deletions(-) diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 3e05e239f5f6..ef135776e068 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -2178,14 +2178,6 @@ static void task_fork_dl(struct task_struct *p) /* Only try algorithms three times */ #define DL_MAX_TRIES 3 =20 -static int pick_dl_task(struct rq *rq, struct task_struct *p, int cpu) -{ - if (!task_on_cpu(rq, p) && - cpumask_test_cpu(cpu, &p->cpus_mask)) - return 1; - return 0; -} - /* * Return the earliest pushable rq's task, which is suitable to be executed * on the CPU, NULL otherwise: @@ -2204,7 +2196,7 @@ static struct task_struct *pick_earliest_pushable_dl_= task(struct rq *rq, int cpu if (next_node) { p =3D __node_2_pdl(next_node); =20 - if (pick_dl_task(rq, p, cpu)) + if (task_is_pushable(rq, p, cpu)) return p; =20 next_node =3D rb_next(next_node); diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 55bb33f367ad..56363e18949a 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -1790,15 +1790,6 @@ static void put_prev_task_rt(struct rq *rq, struct t= ask_struct *p) /* Only try algorithms three times */ #define RT_MAX_TRIES 3 =20 -static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu) -{ - if (!task_on_cpu(rq, p) && - cpumask_test_cpu(cpu, &p->cpus_mask)) - return 1; - - return 0; -} - /* * Return the highest pushable rq's task, which is suitable to be executed * on the CPU, NULL otherwise @@ -1812,7 +1803,7 @@ static struct task_struct *pick_highest_pushable_task= (struct rq *rq, int cpu) return NULL; =20 plist_for_each_entry(p, head, pushable_tasks) { - if (pick_rt_task(rq, p, cpu)) + if (task_is_pushable(rq, p, cpu)) return p; } =20 diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index a24286059a6a..493de4cc320a 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -3490,6 +3490,15 @@ void move_queued_task_locked(struct rq *src_rq, stru= ct rq *dst_rq, struct task_s set_task_cpu(task, dst_rq->cpu); activate_task(dst_rq, task, 0); } + +static inline +bool task_is_pushable(struct rq *rq, struct task_struct *p, int cpu) +{ + if (!task_on_cpu(rq, p) && + cpumask_test_cpu(cpu, &p->cpus_mask)) + return true; + return false; +} #endif =20 #endif /* _KERNEL_SCHED_SCHED_H */ --=20 2.45.2.993.g49e7a77208-goog From nobody Tue Nov 4 22:27:52 2025 Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 08B2A18787A for ; Tue, 9 Jul 2024 20:32:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720557161; cv=none; b=sk8U1HU8Ll2YPKS/MIJ8NreAizH4hfP5WSmRiliTKfQk4vpr5+GC8pEpI5r1Tvnbx8WqVxndcqHUlXMZ8vmqcVlsKjzP+Ja8M5uqszmxMYAPLoDuRVR/u9FJPX/kzdoQxRrfaWwt//ncj0Mt/D7CWWRaIGbGWP/uyxBAW5swaIs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720557161; c=relaxed/simple; bh=OdboL8GRjZ9HaNPDORv6f99DqVqCLARn8M9yns8iW10=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=NZotO+NYPBwnVLgEQLtpkL8cv1S0W2VUEwZzhdANWDsKeGt0p1gNA1D7TNXI2NAZv0YmyinvDCM9ClJ0o/SYcTrj66oWj5pQVwY/jUHdvIs0Fz2o0H58ZgU0BqUYQKOlnVglZ9X/SlyuzKc1DRmNk+Js1lbpvILrDRNuUxQvWJ0= 
Date: Tue, 9 Jul 2024 13:31:49 -0700
In-Reply-To: <20240709203213.799070-1-jstultz@google.com>
References: <20240709203213.799070-1-jstultz@google.com>
Message-ID: <20240709203213.799070-7-jstultz@google.com>
Subject: [PATCH v11 6/7] sched: Split out __schedule() deactivate task logic into a helper
From: John Stultz
To: LKML
Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra,
 Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
 Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Youssef Esmat, Mel Gorman,
 Will Deacon, Waiman Long, Boqun Feng, "Paul E.
McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner,
 Daniel Lezcano, kernel-team@android.com, Metin Kaya
Content-Type: text/plain; charset="utf-8"

As we're going to re-use the deactivation logic, split it into a helper.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Youssef Esmat
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: kernel-team@android.com
Signed-off-by: John Stultz
Tested-by: K Prateek Nayak
Tested-by: Metin Kaya
Reviewed-by: Metin Kaya
Reviewed-by: Qais Yousef
---
v6:
 * Define the function as static to avoid "no previous prototype" warnings,
   as Reported-by: kernel test robot
v7:
 * Renamed the state argument to task_state to be clearer, as suggested by
   Metin Kaya
v11:
 * Return early to simplify indentation, and drop the unused bool return
   (it will be introduced later in the series when it's needed), as
   suggested by Qais.
---
 kernel/sched/core.c | 71 +++++++++++++++++++++++++++------------------
 1 file changed, 42 insertions(+), 29 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5e63dbcbc1f0..029e7ecf5ea9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6566,6 +6566,47 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 # define SM_MASK_PREEMPT	SM_PREEMPT
 #endif
 
+/*
+ * Helper function for __schedule()
+ *
+ * If a task does not have signals pending, deactivate it
+ * Otherwise marks the task's __state as RUNNING
+ */
+static void try_to_deactivate_task(struct rq *rq, struct task_struct *p,
+				   unsigned long task_state)
+{
+	if (signal_pending_state(task_state, p)) {
+		WRITE_ONCE(p->__state, TASK_RUNNING);
+		return;
+	}
+
+	p->sched_contributes_to_load =
+		(task_state & TASK_UNINTERRUPTIBLE) &&
+		!(task_state & TASK_NOLOAD) &&
+		!(task_state & TASK_FROZEN);
+
+	if (p->sched_contributes_to_load)
+		rq->nr_uninterruptible++;
+
+	/*
+	 * __schedule()			ttwu()
+	 *   prev_state = prev->state;    if (p->on_rq && ...)
+	 *   if (prev_state)		    goto out;
+	 *   p->on_rq = 0;		  smp_acquire__after_ctrl_dep();
+	 *				  p->state = TASK_WAKING
+	 *
+	 * Where __schedule() and ttwu() have matching control dependencies.
+	 *
+	 * After this, schedule() must not care about p->state any more.
+	 */
+	deactivate_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK);
+
+	if (p->in_iowait) {
+		atomic_inc(&rq->nr_iowait);
+		delayacct_blkio_start();
+	}
+}
+
 /*
  * __schedule() is the main scheduler function.
  *
@@ -6659,35 +6700,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 	 */
 	prev_state = READ_ONCE(prev->__state);
 	if (!(sched_mode & SM_MASK_PREEMPT) && prev_state) {
-		if (signal_pending_state(prev_state, prev)) {
-			WRITE_ONCE(prev->__state, TASK_RUNNING);
-		} else {
-			prev->sched_contributes_to_load =
-				(prev_state & TASK_UNINTERRUPTIBLE) &&
-				!(prev_state & TASK_NOLOAD) &&
-				!(prev_state & TASK_FROZEN);
-
-			if (prev->sched_contributes_to_load)
-				rq->nr_uninterruptible++;
-
-			/*
-			 * __schedule()			ttwu()
-			 *   prev_state = prev->state;    if (p->on_rq && ...)
-			 *   if (prev_state)		    goto out;
-			 *   p->on_rq = 0;		  smp_acquire__after_ctrl_dep();
-			 *				  p->state = TASK_WAKING
-			 *
-			 * Where __schedule() and ttwu() have matching control dependencies.
-			 *
-			 * After this, schedule() must not care about p->state any more.
-			 */
-			deactivate_task(rq, prev, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK);
-
-			if (prev->in_iowait) {
-				atomic_inc(&rq->nr_iowait);
-				delayacct_blkio_start();
-			}
-		}
+		try_to_deactivate_task(rq, prev, prev_state);
 		switch_count = &prev->nvcsw;
 	}
 
-- 
2.45.2.993.g49e7a77208-goog
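To see what the factored-out logic does in isolation, here is a rough
userspace model of the flow the new helper captures: a task that reaches
__schedule() with a signal pending is put back to TASK_RUNNING and stays
queued; otherwise it is dequeued and, if it was in an uninterruptible
sleep, counted toward the load average, with io-wait accounting started
when applicable. The structs, counters, and printf below are illustrative
stand-ins, not the kernel's runqueue, deactivate_task(), or delayacct
implementation.

  #include <stdbool.h>
  #include <stdio.h>

  enum { TASK_RUNNING = 0, TASK_INTERRUPTIBLE = 1, TASK_UNINTERRUPTIBLE = 2 };

  struct rq   { int nr_uninterruptible; int nr_queued; };
  struct task { int state; bool signal_pending; bool in_iowait; };

  static void try_to_deactivate_task(struct rq *rq, struct task *p, int task_state)
  {
          if (p->signal_pending) {
                  p->state = TASK_RUNNING;        /* keep it on the runqueue */
                  return;
          }
          if (task_state == TASK_UNINTERRUPTIBLE)
                  rq->nr_uninterruptible++;       /* contributes to loadavg */
          rq->nr_queued--;                        /* models deactivate_task() */
          if (p->in_iowait)
                  printf("start iowait accounting\n"); /* models delayacct_blkio_start() */
  }

  int main(void)
  {
          struct rq rq = { .nr_uninterruptible = 0, .nr_queued = 1 };
          struct task p = { .state = TASK_UNINTERRUPTIBLE, .in_iowait = true };

          try_to_deactivate_task(&rq, &p, p.state);
          printf("queued=%d uninterruptible=%d\n", rq.nr_queued, rq.nr_uninterruptible);
          return 0;
  }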
From nobody Tue Nov 4 22:27:52 2025
Date: Tue, 9 Jul 2024 13:31:50 -0700
In-Reply-To: <20240709203213.799070-1-jstultz@google.com>
References: <20240709203213.799070-1-jstultz@google.com>
Message-ID: <20240709203213.799070-8-jstultz@google.com>
Subject: [PATCH v11 7/7] sched: Split scheduler and execution contexts
From: John Stultz
To: LKML
Cc: Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli,
 Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt,
 Ben Segall, Zimuzo Ezeozue, Youssef Esmat, Mel Gorman, Will Deacon,
 Waiman Long, Boqun Feng, "Paul E. McKenney", Xuewen Yan, K Prateek Nayak,
 Metin Kaya, Thomas Gleixner, Daniel Lezcano, kernel-team@android.com,
 "Connor O'Brien", John Stultz, Metin Kaya
Content-Type: text/plain; charset="utf-8"

From: Peter Zijlstra

Let's define the scheduling context as all the scheduler state in
task_struct for the task selected to run, and the execution context as
all the state required to actually run that task.

Currently both are intertwined in task_struct. We want to logically
split these such that we can use the scheduling context of the task
selected to be scheduled while using the execution context of a
different task to actually be run.

To this end, introduce the rq_selected() macro to point to the
task_struct selected from the runqueue by the scheduler. It will be
used for scheduler state, while rq->curr is preserved to indicate the
execution context of the task that will actually be run.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Youssef Esmat
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Metin Kaya
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: kernel-team@android.com
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Juri Lelli
Signed-off-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20181009092434.26221-5-juri.lelli@redhat.com
[add additional comments and update more sched_class code to use rq::proxy]
Signed-off-by: Connor O'Brien
[jstultz: Rebased and resolved minor collisions, reworked to use accessors,
 tweaked update_curr_common to use rq_proxy fixing rt scheduling issues]
Signed-off-by: John Stultz
Tested-by: K Prateek Nayak
Tested-by: Metin Kaya
Reviewed-by: Metin Kaya
---
v2:
 * Reworked to use accessors
 * Fixed update_curr_common to use proxy instead of curr
v3:
 * Tweaked wrapper names
 * Swapped proxy for selected for clarity
v4:
 * Minor variable name tweaks for readability
 * Use a macro instead of an inline function and drop other helper functions,
   as suggested by Peter.
* Remove verbose comments/questions to avoid review distractions, as suggested by Dietmar v5: * Add CONFIG_PROXY_EXEC option to this patch so the new logic can be tested with this change * Minor fix to grab rq_selected when holding the rq lock v7: * Minor spelling fix and unused argument fixes suggested by Metin Kaya * Switch to curr_selected for consistency, and minor rewording of commit message for clarity * Rename variables selected instead of curr when we're using rq_selected() * Reduce macros in CONFIG_SCHED_PROXY_EXEC ifdef sections, as suggested by Metin Kaya v8: * Use rq->curr, not rq_selected with task_tick, as suggested by Valentin * Minor rework to reorder this with CONFIG_SCHED_PROXY_EXEC patch v10: * Use rq_selected in push_rt_task & get_push_task v11: * Rework to use selected instead of curr in a few cases we were previously assigning curr =3D rq_selected() to minimize lines of change. Suggested by Metin. --- kernel/sched/core.c | 46 ++++++++++++++++++++++++--------------- kernel/sched/deadline.c | 39 +++++++++++++++++---------------- kernel/sched/fair.c | 32 +++++++++++++-------------- kernel/sched/rt.c | 48 ++++++++++++++++++++--------------------- kernel/sched/sched.h | 27 ++++++++++++++++++++--- 5 files changed, 113 insertions(+), 79 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 029e7ecf5ea9..17036bae4a27 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -794,7 +794,7 @@ static enum hrtimer_restart hrtick(struct hrtimer *time= r) =20 rq_lock(rq, &rf); update_rq_clock(rq); - rq->curr->sched_class->task_tick(rq, rq->curr, 1); + rq_selected(rq)->sched_class->task_tick(rq, rq->curr, 1); rq_unlock(rq, &rf); =20 return HRTIMER_NORESTART; @@ -2236,16 +2236,18 @@ static inline void check_class_changed(struct rq *r= q, struct task_struct *p, =20 void wakeup_preempt(struct rq *rq, struct task_struct *p, int flags) { - if (p->sched_class =3D=3D rq->curr->sched_class) - rq->curr->sched_class->wakeup_preempt(rq, p, flags); - else if (sched_class_above(p->sched_class, rq->curr->sched_class)) + struct task_struct *selected =3D rq_selected(rq); + + if (p->sched_class =3D=3D selected->sched_class) + selected->sched_class->wakeup_preempt(rq, p, flags); + else if (sched_class_above(p->sched_class, selected->sched_class)) resched_curr(rq); =20 /* * A queue event has occurred, and we're going to schedule. In * this case, we can save a useless back to back clock update. */ - if (task_on_rq_queued(rq->curr) && test_tsk_need_resched(rq->curr)) + if (task_on_rq_queued(selected) && test_tsk_need_resched(rq->curr)) rq_clock_skip_update(rq); } =20 @@ -2772,7 +2774,7 @@ __do_set_cpus_allowed(struct task_struct *p, struct a= ffinity_context *ctx) lockdep_assert_held(&p->pi_lock); =20 queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); =20 if (queued) { /* @@ -5593,7 +5595,7 @@ unsigned long long task_sched_runtime(struct task_str= uct *p) * project cycles that may never be accounted to this * thread, breaking clock_gettime(). 
*/ - if (task_current(rq, p) && task_on_rq_queued(p)) { + if (task_current_selected(rq, p) && task_on_rq_queued(p)) { prefetch_curr_exec_start(p); update_rq_clock(rq); p->sched_class->update_curr(rq); @@ -5661,7 +5663,8 @@ void sched_tick(void) { int cpu =3D smp_processor_id(); struct rq *rq =3D cpu_rq(cpu); - struct task_struct *curr =3D rq->curr; + /* accounting goes to the selected task */ + struct task_struct *selected; struct rq_flags rf; unsigned long hw_pressure; u64 resched_latency; @@ -5672,16 +5675,17 @@ void sched_tick(void) sched_clock_tick(); =20 rq_lock(rq, &rf); + selected =3D rq_selected(rq); =20 update_rq_clock(rq); hw_pressure =3D arch_scale_hw_pressure(cpu_of(rq)); update_hw_load_avg(rq_clock_task(rq), rq, hw_pressure); - curr->sched_class->task_tick(rq, curr, 0); + selected->sched_class->task_tick(rq, selected, 0); if (sched_feat(LATENCY_WARN)) resched_latency =3D cpu_resched_latency(rq); calc_global_load_tick(rq); sched_core_tick(rq); - task_tick_mm_cid(rq, curr); + task_tick_mm_cid(rq, selected); =20 rq_unlock(rq, &rf); =20 @@ -5690,8 +5694,8 @@ void sched_tick(void) =20 perf_event_task_tick(); =20 - if (curr->flags & PF_WQ_WORKER) - wq_worker_tick(curr); + if (selected->flags & PF_WQ_WORKER) + wq_worker_tick(selected); =20 #ifdef CONFIG_SMP rq->idle_balance =3D idle_cpu(cpu); @@ -5756,6 +5760,12 @@ static void sched_tick_remote(struct work_struct *wo= rk) struct task_struct *curr =3D rq->curr; =20 if (cpu_online(cpu)) { + /* + * Since this is a remote tick for full dynticks mode, + * we are always sure that there is no proxy (only a + * single task is running). + */ + SCHED_WARN_ON(rq->curr !=3D rq_selected(rq)); update_rq_clock(rq); =20 if (!is_idle_task(curr)) { @@ -6705,6 +6715,7 @@ static void __sched notrace __schedule(unsigned int s= ched_mode) } =20 next =3D pick_next_task(rq, prev, &rf); + rq_set_selected(rq, next); clear_tsk_need_resched(prev); clear_preempt_need_resched(); #ifdef CONFIG_SCHED_DEBUG @@ -7215,7 +7226,7 @@ void rt_mutex_setprio(struct task_struct *p, struct t= ask_struct *pi_task) =20 prev_class =3D p->sched_class; queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); if (queued) dequeue_task(rq, p, queue_flag); if (running) @@ -7305,7 +7316,7 @@ void set_user_nice(struct task_struct *p, long nice) } =20 queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); if (queued) dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK); if (running) @@ -7884,7 +7895,7 @@ static int __sched_setscheduler(struct task_struct *p, } =20 queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); if (queued) dequeue_task(rq, p, queue_flags); if (running) @@ -9311,6 +9322,7 @@ void __init init_idle(struct task_struct *idle, int c= pu) rcu_read_unlock(); =20 rq->idle =3D idle; + rq_set_selected(rq, idle); rcu_assign_pointer(rq->curr, idle); idle->on_rq =3D TASK_ON_RQ_QUEUED; #ifdef CONFIG_SMP @@ -9400,7 +9412,7 @@ void sched_setnuma(struct task_struct *p, int nid) =20 rq =3D task_rq_lock(p, &rf); queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); =20 if (queued) dequeue_task(rq, p, DEQUEUE_SAVE); @@ -10505,7 +10517,7 @@ void sched_move_task(struct task_struct *tsk) =20 update_rq_clock(rq); =20 - running =3D task_current(rq, tsk); + running =3D task_current_selected(rq, tsk); queued =3D task_on_rq_queued(tsk); =20 if (queued) diff --git 
a/kernel/sched/deadline.c b/kernel/sched/deadline.c index ef135776e068..dbfa14ff16ed 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -1217,7 +1217,7 @@ static enum hrtimer_restart dl_task_timer(struct hrti= mer *timer) #endif =20 enqueue_task_dl(rq, p, ENQUEUE_REPLENISH); - if (dl_task(rq->curr)) + if (dl_task(rq_selected(rq))) wakeup_preempt_dl(rq, p, 0); else resched_curr(rq); @@ -1441,11 +1441,11 @@ void dl_server_init(struct sched_dl_entity *dl_se, = struct rq *rq, */ static void update_curr_dl(struct rq *rq) { - struct task_struct *curr =3D rq->curr; - struct sched_dl_entity *dl_se =3D &curr->dl; + struct task_struct *selected =3D rq_selected(rq); + struct sched_dl_entity *dl_se =3D &selected->dl; s64 delta_exec; =20 - if (!dl_task(curr) || !on_dl_rq(dl_se)) + if (!dl_task(selected) || !on_dl_rq(dl_se)) return; =20 /* @@ -1898,7 +1898,7 @@ static int find_later_rq(struct task_struct *task); static int select_task_rq_dl(struct task_struct *p, int cpu, int flags) { - struct task_struct *curr; + struct task_struct *curr, *selected; bool select_rq; struct rq *rq; =20 @@ -1909,6 +1909,7 @@ select_task_rq_dl(struct task_struct *p, int cpu, int= flags) =20 rcu_read_lock(); curr =3D READ_ONCE(rq->curr); /* unlocked access */ + selected =3D READ_ONCE(rq_selected(rq)); =20 /* * If we are dealing with a -deadline task, we must @@ -1919,9 +1920,9 @@ select_task_rq_dl(struct task_struct *p, int cpu, int= flags) * other hand, if it has a shorter deadline, we * try to make it stay here, it might be important. */ - select_rq =3D unlikely(dl_task(curr)) && + select_rq =3D unlikely(dl_task(selected)) && (curr->nr_cpus_allowed < 2 || - !dl_entity_preempt(&p->dl, &curr->dl)) && + !dl_entity_preempt(&p->dl, &selected->dl)) && p->nr_cpus_allowed > 1; =20 /* @@ -1984,7 +1985,7 @@ static void check_preempt_equal_dl(struct rq *rq, str= uct task_struct *p) * let's hope p can move out. */ if (rq->curr->nr_cpus_allowed =3D=3D 1 || - !cpudl_find(&rq->rd->cpudl, rq->curr, NULL)) + !cpudl_find(&rq->rd->cpudl, rq_selected(rq), NULL)) return; =20 /* @@ -2023,7 +2024,7 @@ static int balance_dl(struct rq *rq, struct task_stru= ct *p, struct rq_flags *rf) static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p, int flags) { - if (dl_entity_preempt(&p->dl, &rq->curr->dl)) { + if (dl_entity_preempt(&p->dl, &rq_selected(rq)->dl)) { resched_curr(rq); return; } @@ -2033,7 +2034,7 @@ static void wakeup_preempt_dl(struct rq *rq, struct t= ask_struct *p, * In the unlikely case current and p have the same deadline * let us try to decide what's the best thing to do... */ - if ((p->dl.deadline =3D=3D rq->curr->dl.deadline) && + if ((p->dl.deadline =3D=3D rq_selected(rq)->dl.deadline) && !test_tsk_need_resched(rq->curr)) check_preempt_equal_dl(rq, p); #endif /* CONFIG_SMP */ @@ -2065,7 +2066,7 @@ static void set_next_task_dl(struct rq *rq, struct ta= sk_struct *p, bool first) if (!first) return; =20 - if (rq->curr->sched_class !=3D &dl_sched_class) + if (rq_selected(rq)->sched_class !=3D &dl_sched_class) update_dl_rq_load_avg(rq_clock_pelt(rq), rq, 0); =20 deadline_queue_push_tasks(rq); @@ -2390,8 +2391,8 @@ static int push_dl_task(struct rq *rq) * can move away, it makes sense to just reschedule * without going further in pushing next_task. 
*/ - if (dl_task(rq->curr) && - dl_time_before(next_task->dl.deadline, rq->curr->dl.deadline) && + if (dl_task(rq_selected(rq)) && + dl_time_before(next_task->dl.deadline, rq_selected(rq)->dl.deadline) = && rq->curr->nr_cpus_allowed > 1) { resched_curr(rq); return 0; @@ -2514,7 +2515,7 @@ static void pull_dl_task(struct rq *this_rq) * deadline than the current task of its runqueue. */ if (dl_time_before(p->dl.deadline, - src_rq->curr->dl.deadline)) + rq_selected(src_rq)->dl.deadline)) goto skip; =20 if (is_migration_disabled(p)) { @@ -2553,9 +2554,9 @@ static void task_woken_dl(struct rq *rq, struct task_= struct *p) if (!task_on_cpu(rq, p) && !test_tsk_need_resched(rq->curr) && p->nr_cpus_allowed > 1 && - dl_task(rq->curr) && + dl_task(rq_selected(rq)) && (rq->curr->nr_cpus_allowed < 2 || - !dl_entity_preempt(&p->dl, &rq->curr->dl))) { + !dl_entity_preempt(&p->dl, &rq_selected(rq)->dl))) { push_dl_tasks(rq); } } @@ -2730,12 +2731,12 @@ static void switched_to_dl(struct rq *rq, struct ta= sk_struct *p) return; } =20 - if (rq->curr !=3D p) { + if (rq_selected(rq) !=3D p) { #ifdef CONFIG_SMP if (p->nr_cpus_allowed > 1 && rq->dl.overloaded) deadline_queue_push_tasks(rq); #endif - if (dl_task(rq->curr)) + if (dl_task(rq_selected(rq))) wakeup_preempt_dl(rq, p, 0); else resched_curr(rq); @@ -2764,7 +2765,7 @@ static void prio_changed_dl(struct rq *rq, struct tas= k_struct *p, if (!rq->dl.overloaded) deadline_queue_pull_task(rq); =20 - if (task_current(rq, p)) { + if (task_current_selected(rq, p)) { /* * If we now have a earlier deadline task than p, * then reschedule, provided p is still on this diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 8a5b1ae0aa55..4d0d3b423220 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1140,12 +1140,12 @@ static inline void update_curr_task(struct task_str= uct *p, s64 delta_exec) */ s64 update_curr_common(struct rq *rq) { - struct task_struct *curr =3D rq->curr; + struct task_struct *selected =3D rq_selected(rq); s64 delta_exec; =20 - delta_exec =3D update_curr_se(rq, &curr->se); + delta_exec =3D update_curr_se(rq, &selected->se); if (likely(delta_exec > 0)) - update_curr_task(curr, delta_exec); + update_curr_task(selected, delta_exec); =20 return delta_exec; } @@ -1177,7 +1177,7 @@ static void update_curr(struct cfs_rq *cfs_rq) =20 static void update_curr_fair(struct rq *rq) { - update_curr(cfs_rq_of(&rq->curr->se)); + update_curr(cfs_rq_of(&rq_selected(rq)->se)); } =20 static inline void @@ -6646,7 +6646,7 @@ static void hrtick_start_fair(struct rq *rq, struct t= ask_struct *p) s64 delta =3D slice - ran; =20 if (delta < 0) { - if (task_current(rq, p)) + if (task_current_selected(rq, p)) resched_curr(rq); return; } @@ -6661,12 +6661,12 @@ static void hrtick_start_fair(struct rq *rq, struct= task_struct *p) */ static void hrtick_update(struct rq *rq) { - struct task_struct *curr =3D rq->curr; + struct task_struct *selected =3D rq_selected(rq); =20 - if (!hrtick_enabled_fair(rq) || curr->sched_class !=3D &fair_sched_class) + if (!hrtick_enabled_fair(rq) || selected->sched_class !=3D &fair_sched_cl= ass) return; =20 - hrtick_start_fair(rq, curr); + hrtick_start_fair(rq, selected); } #else /* !CONFIG_SCHED_HRTICK */ static inline void @@ -8348,9 +8348,9 @@ static void set_next_buddy(struct sched_entity *se) */ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p= , int wake_flags) { - struct task_struct *curr =3D rq->curr; - struct sched_entity *se =3D &curr->se, *pse =3D &p->se; - struct cfs_rq *cfs_rq =3D 
task_cfs_rq(curr); + struct task_struct *selected =3D rq_selected(rq); + struct sched_entity *se =3D &selected->se, *pse =3D &p->se; + struct cfs_rq *cfs_rq =3D task_cfs_rq(selected); int cse_is_idle, pse_is_idle; =20 if (unlikely(se =3D=3D pse)) @@ -8379,11 +8379,11 @@ static void check_preempt_wakeup_fair(struct rq *rq= , struct task_struct *p, int * prevents us from potentially nominating it as a false LAST_BUDDY * below. */ - if (test_tsk_need_resched(curr)) + if (test_tsk_need_resched(rq->curr)) return; =20 /* Idle tasks are by definition preempted by non-idle tasks. */ - if (unlikely(task_has_idle_policy(curr)) && + if (unlikely(task_has_idle_policy(selected)) && likely(!task_has_idle_policy(p))) goto preempt; =20 @@ -9361,7 +9361,7 @@ static bool __update_blocked_others(struct rq *rq, bo= ol *done) * update_load_avg() can call cpufreq_update_util(). Make sure that RT, * DL and IRQ signals have been updated before updating CFS. */ - curr_class =3D rq->curr->sched_class; + curr_class =3D rq_selected(rq)->sched_class; =20 hw_pressure =3D arch_scale_hw_pressure(cpu_of(rq)); =20 @@ -12738,7 +12738,7 @@ prio_changed_fair(struct rq *rq, struct task_struct= *p, int oldprio) * our priority decreased, or if we are not currently running on * this runqueue and our priority is higher than the current's */ - if (task_current(rq, p)) { + if (task_current_selected(rq, p)) { if (p->prio > oldprio) resched_curr(rq); } else @@ -12843,7 +12843,7 @@ static void switched_to_fair(struct rq *rq, struct = task_struct *p) * kick off the schedule if running, otherwise just see * if we can still preempt the current task. */ - if (task_current(rq, p)) + if (task_current_selected(rq, p)) resched_curr(rq); else wakeup_preempt(rq, p, 0); diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 56363e18949a..da4cbd744fe6 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -529,7 +529,7 @@ static void dequeue_rt_entity(struct sched_rt_entity *r= t_se, unsigned int flags) =20 static void sched_rt_rq_enqueue(struct rt_rq *rt_rq) { - struct task_struct *curr =3D rq_of_rt_rq(rt_rq)->curr; + struct task_struct *selected =3D rq_selected(rq_of_rt_rq(rt_rq)); struct rq *rq =3D rq_of_rt_rq(rt_rq); struct sched_rt_entity *rt_se; =20 @@ -543,7 +543,7 @@ static void sched_rt_rq_enqueue(struct rt_rq *rt_rq) else if (!on_rt_rq(rt_se)) enqueue_rt_entity(rt_se, 0); =20 - if (rt_rq->highest_prio.curr < curr->prio) + if (rt_rq->highest_prio.curr < selected->prio) resched_curr(rq); } } @@ -999,11 +999,11 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt= _rq) */ static void update_curr_rt(struct rq *rq) { - struct task_struct *curr =3D rq->curr; - struct sched_rt_entity *rt_se =3D &curr->rt; + struct task_struct *selected =3D rq_selected(rq); + struct sched_rt_entity *rt_se =3D &selected->rt; s64 delta_exec; =20 - if (curr->sched_class !=3D &rt_sched_class) + if (selected->sched_class !=3D &rt_sched_class) return; =20 delta_exec =3D update_curr_common(rq); @@ -1542,7 +1542,7 @@ static int find_lowest_rq(struct task_struct *task); static int select_task_rq_rt(struct task_struct *p, int cpu, int flags) { - struct task_struct *curr; + struct task_struct *curr, *selected; struct rq *rq; bool test; =20 @@ -1554,6 +1554,7 @@ select_task_rq_rt(struct task_struct *p, int cpu, int= flags) =20 rcu_read_lock(); curr =3D READ_ONCE(rq->curr); /* unlocked access */ + selected =3D READ_ONCE(rq_selected(rq)); =20 /* * If the current task on @p's runqueue is an RT task, then @@ -1582,8 +1583,8 @@ select_task_rq_rt(struct task_struct *p, 
int cpu, int= flags) * systems like big.LITTLE. */ test =3D curr && - unlikely(rt_task(curr)) && - (curr->nr_cpus_allowed < 2 || curr->prio <=3D p->prio); + unlikely(rt_task(selected)) && + (curr->nr_cpus_allowed < 2 || selected->prio <=3D p->prio); =20 if (test || !rt_task_fits_capacity(p, cpu)) { int target =3D find_lowest_rq(p); @@ -1613,12 +1614,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, in= t flags) =20 static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p) { - /* - * Current can't be migrated, useless to reschedule, - * let's hope p can move out. - */ if (rq->curr->nr_cpus_allowed =3D=3D 1 || - !cpupri_find(&rq->rd->cpupri, rq->curr, NULL)) + !cpupri_find(&rq->rd->cpupri, rq_selected(rq), NULL)) return; =20 /* @@ -1661,7 +1658,9 @@ static int balance_rt(struct rq *rq, struct task_stru= ct *p, struct rq_flags *rf) */ static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int fl= ags) { - if (p->prio < rq->curr->prio) { + struct task_struct *selected =3D rq_selected(rq); + + if (p->prio < selected->prio) { resched_curr(rq); return; } @@ -1679,7 +1678,7 @@ static void wakeup_preempt_rt(struct rq *rq, struct t= ask_struct *p, int flags) * to move current somewhere else, making room for our non-migratable * task. */ - if (p->prio =3D=3D rq->curr->prio && !test_tsk_need_resched(rq->curr)) + if (p->prio =3D=3D selected->prio && !test_tsk_need_resched(rq->curr)) check_preempt_equal_prio(rq, p); #endif } @@ -1704,7 +1703,7 @@ static inline void set_next_task_rt(struct rq *rq, st= ruct task_struct *p, bool f * utilization. We only care of the case where we start to schedule a * rt task */ - if (rq->curr->sched_class !=3D &rt_sched_class) + if (rq_selected(rq)->sched_class !=3D &rt_sched_class) update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 0); =20 rt_queue_push_tasks(rq); @@ -1976,6 +1975,7 @@ static struct task_struct *pick_next_pushable_task(st= ruct rq *rq) =20 BUG_ON(rq->cpu !=3D task_cpu(p)); BUG_ON(task_current(rq, p)); + BUG_ON(task_current_selected(rq, p)); BUG_ON(p->nr_cpus_allowed <=3D 1); =20 BUG_ON(!task_on_rq_queued(p)); @@ -2008,7 +2008,7 @@ static int push_rt_task(struct rq *rq, bool pull) * higher priority than current. If that's the case * just reschedule current. */ - if (unlikely(next_task->prio < rq->curr->prio)) { + if (unlikely(next_task->prio < rq_selected(rq)->prio)) { resched_curr(rq); return 0; } @@ -2029,7 +2029,7 @@ static int push_rt_task(struct rq *rq, bool pull) * Note that the stoppers are masqueraded as SCHED_FIFO * (cf. sched_set_stop_task()), so we can't rely on rt_task(). 
*/ - if (rq->curr->sched_class !=3D &rt_sched_class) + if (rq_selected(rq)->sched_class !=3D &rt_sched_class) return 0; =20 cpu =3D find_lowest_rq(rq->curr); @@ -2361,7 +2361,7 @@ static void pull_rt_task(struct rq *this_rq) * p if it is lower in priority than the * current task on the run queue */ - if (p->prio < src_rq->curr->prio) + if (p->prio < rq_selected(src_rq)->prio) goto skip; =20 if (is_migration_disabled(p)) { @@ -2403,9 +2403,9 @@ static void task_woken_rt(struct rq *rq, struct task_= struct *p) bool need_to_push =3D !task_on_cpu(rq, p) && !test_tsk_need_resched(rq->curr) && p->nr_cpus_allowed > 1 && - (dl_task(rq->curr) || rt_task(rq->curr)) && + (dl_task(rq_selected(rq)) || rt_task(rq_selected(rq))) && (rq->curr->nr_cpus_allowed < 2 || - rq->curr->prio <=3D p->prio); + rq_selected(rq)->prio <=3D p->prio); =20 if (need_to_push) push_rt_tasks(rq); @@ -2489,7 +2489,7 @@ static void switched_to_rt(struct rq *rq, struct task= _struct *p) if (p->nr_cpus_allowed > 1 && rq->rt.overloaded) rt_queue_push_tasks(rq); #endif /* CONFIG_SMP */ - if (p->prio < rq->curr->prio && cpu_online(cpu_of(rq))) + if (p->prio < rq_selected(rq)->prio && cpu_online(cpu_of(rq))) resched_curr(rq); } } @@ -2504,7 +2504,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p,= int oldprio) if (!task_on_rq_queued(p)) return; =20 - if (task_current(rq, p)) { + if (task_current_selected(rq, p)) { #ifdef CONFIG_SMP /* * If our priority decreases while running, we @@ -2530,7 +2530,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p,= int oldprio) * greater than the current running task * then reschedule. */ - if (p->prio < rq->curr->prio) + if (p->prio < rq_selected(rq)->prio) resched_curr(rq); } } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 493de4cc320a..7ee8c7fa0ae8 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1051,7 +1051,7 @@ struct rq { */ unsigned int nr_uninterruptible; =20 - struct task_struct __rcu *curr; + struct task_struct __rcu *curr; /* Execution context */ struct task_struct *idle; struct task_struct *stop; unsigned long next_balance; @@ -1246,6 +1246,13 @@ DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues); #define cpu_curr(cpu) (cpu_rq(cpu)->curr) #define raw_rq() raw_cpu_ptr(&runqueues) =20 +/* For now, rq_selected =3D=3D rq->curr */ +#define rq_selected(rq) ((rq)->curr) +static inline void rq_set_selected(struct rq *rq, struct task_struct *t) +{ + /* Do nothing */ +} + struct sched_group; #ifdef CONFIG_SCHED_CORE static inline struct cpumask *sched_group_span(struct sched_group *sg); @@ -2151,11 +2158,25 @@ static inline u64 global_rt_runtime(void) return (u64)sysctl_sched_rt_runtime * NSEC_PER_USEC; } =20 +/* + * Is p the current execution context? + */ static inline int task_current(struct rq *rq, struct task_struct *p) { return rq->curr =3D=3D p; } =20 +/* + * Is p the current scheduling context? + * + * Note that it might be the current execution context at the same time if + * rq->curr =3D=3D rq_selected() =3D=3D p. 
+ */ +static inline int task_current_selected(struct rq *rq, struct task_struct = *p) +{ + return rq_selected(rq) =3D=3D p; +} + static inline int task_on_cpu(struct rq *rq, struct task_struct *p) { #ifdef CONFIG_SMP @@ -2325,7 +2346,7 @@ struct sched_class { =20 static inline void put_prev_task(struct rq *rq, struct task_struct *prev) { - WARN_ON_ONCE(rq->curr !=3D prev); + WARN_ON_ONCE(rq_selected(rq) !=3D prev); prev->sched_class->put_prev_task(rq, prev); } =20 @@ -2406,7 +2427,7 @@ extern void set_cpus_allowed_common(struct task_struc= t *p, struct affinity_conte =20 static inline struct task_struct *get_push_task(struct rq *rq) { - struct task_struct *p =3D rq->curr; + struct task_struct *p =3D rq_selected(rq); =20 lockdep_assert_rq_held(rq); =20 --=20 2.45.2.993.g49e7a77208-goog
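As a closing illustration of the split this last patch prepares: rq_selected()
names the scheduling context (the task whose policy, priority, and runtime get
charged), while rq->curr remains the execution context (the task whose code
actually runs). In this series the two still always point at the same task,
which is why rq_selected() can simply alias rq->curr for now; proxy execution
will later let them diverge. The sketch below is a rough userspace model with
made-up task names and fields, not the kernel's rq or its macros.

  #include <stdio.h>

  struct task { const char *name; int prio; };

  struct rq {
          struct task *curr;       /* execution context: whose code runs */
          struct task *selected;   /* scheduling context: who gets charged */
  };

  #define rq_selected(rq) ((rq)->selected)

  static void rq_set_selected(struct rq *rq, struct task *t)
  {
          rq->selected = t;
  }

  /* Accounting and preemption decisions follow the scheduling context. */
  static void account_tick(struct rq *rq)
  {
          printf("tick: running %s, charging %s (prio %d)\n",
                 rq->curr->name, rq_selected(rq)->name, rq_selected(rq)->prio);
  }

  int main(void)
  {
          struct task waiter = { "rt_waiter", 10 };
          struct task owner  = { "mutex_owner", 120 };
          struct rq rq;

          /* Today: the selected task and the running task are the same. */
          rq_set_selected(&rq, &waiter);
          rq.curr = &waiter;
          account_tick(&rq);

          /* Future proxy execution: run the lock owner while charging the
           * waiter's scheduling context (sketched here only). */
          rq.curr = &owner;
          account_tick(&rq);
          return 0;
  }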