From nobody Sat Nov 30 07:44:02 2024
Date: Tue, 10 Sep 2024 15:12:07 -0700
In-Reply-To: <20240910221235.2132138-1-jstultz@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20240910221235.2132138-1-jstultz@google.com>
X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog
Message-ID: <20240910221235.2132138-2-jstultz@google.com>
Subject: [RESEND x2][PATCH v12 1/7] locking/mutex: Remove wakeups from under mutex::wait_lock
From: John Stultz
To: LKML
Cc: Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli,
 Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt,
 Ben Segall, Zimuzo Ezeozue, Mel Gorman, Will Deacon, Waiman Long,
 Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak,
 Thomas Gleixner, Daniel Lezcano, kernel-team@android.com, Metin Kaya,
 Davidlohr Bueso, John Stultz
Content-Type: text/plain; charset="utf-8"

From: Peter Zijlstra

In preparation to nest mutex::wait_lock under rq::lock we need to remove
wakeups from under it. Do this by utilizing wake_qs to defer the wakeup
until after the lock is dropped.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: kernel-team@android.com
Tested-by: K Prateek Nayak
Tested-by: Metin Kaya
Acked-by: Davidlohr Bueso
Reviewed-by: Metin Kaya
Signed-off-by: Peter Zijlstra (Intel)
[Heavily changed after 55f036ca7e74 ("locking: WW mutex cleanup") and
 08295b3b5bee ("locking: Implement an algorithm choice for Wound-Wait
 mutexes")]
Signed-off-by: Juri Lelli
[jstultz: rebased to mainline, added extra wake_up_q & init to avoid hangs,
 similar to Connor's rework of this patch]
Signed-off-by: John Stultz
---
v5:
* Reverted back to an earlier version of this patch to undo the change
  that kept the wake_q in the ctx structure, as that broke the rule that
  the wake_q must always be on the stack, as it's not safe for
  concurrency.
v6:
* Made tweaks suggested by Waiman Long
v7:
* Fixups to pass wake_qs down for PREEMPT_RT logic
v10:
* Switched preempt_enable to be lower, close to the unlock, as suggested
  by Valentin
* Added additional preempt_disable coverage around the wake_q calls as
  again noted by Valentin
v12:
* Fixes and simplifications from K Prateek Nayak and Peter Zijlstra
* Commit message tweak
---
 kernel/futex/pi.c               |  6 +++-
 kernel/locking/mutex.c          | 16 ++++++++---
 kernel/locking/rtmutex.c        | 49 +++++++++++++++++++++++----------
 kernel/locking/rtmutex_api.c    | 11 ++++++--
 kernel/locking/rtmutex_common.h |  3 +-
 kernel/locking/rwbase_rt.c      |  8 +++++-
 kernel/locking/rwsem.c          |  4 +--
 kernel/locking/spinlock_rt.c    |  3 +-
 kernel/locking/ww_mutex.h       | 30 ++++++++++++--------
 9 files changed, 92 insertions(+), 38 deletions(-)

diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c
index 5722467f27379..d62cca5ed8f4c 100644
--- a/kernel/futex/pi.c
+++ b/kernel/futex/pi.c
@@ -922,6 +922,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int tryl
 	struct rt_mutex_waiter rt_waiter;
 	struct futex_hash_bucket *hb;
 	struct futex_q q = futex_q_init;
+	DEFINE_WAKE_Q(wake_q);
 	int res, ret;
 
 	if (!IS_ENABLED(CONFIG_FUTEX_PI))
@@ -1018,8 +1019,11 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int tryl
 	 * such that futex_unlock_pi() is guaranteed to observe the waiter when
 	 * it sees the futex_q::pi_state.
 	 */
-	ret = __rt_mutex_start_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter, current);
+	ret = __rt_mutex_start_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter, current, &wake_q);
+	preempt_disable();
 	raw_spin_unlock_irq(&q.pi_state->pi_mutex.wait_lock);
+	wake_up_q(&wake_q);
+	preempt_enable();
 
 	if (ret) {
 		if (ret == 1)
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index cbae8c0b89ab2..6c94da061ec25 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -575,6 +575,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 		    struct lockdep_map *nest_lock, unsigned long ip,
 		    struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
 {
+	DEFINE_WAKE_Q(wake_q);
 	struct mutex_waiter waiter;
 	struct ww_mutex *ww;
 	int ret;
@@ -625,7 +626,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 	 */
 	if (__mutex_trylock(lock)) {
 		if (ww_ctx)
-			__ww_mutex_check_waiters(lock, ww_ctx);
+			__ww_mutex_check_waiters(lock, ww_ctx, &wake_q);
 
 		goto skip_wait;
 	}
@@ -645,7 +646,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 		 * Add in stamp order, waking up waiters that must kill
 		 * themselves.
 		 */
-		ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
+		ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx, &wake_q);
 		if (ret)
 			goto err_early_kill;
 	}
@@ -681,6 +682,10 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 		}
 
 		raw_spin_unlock(&lock->wait_lock);
+		/* Make sure we do wakeups before calling schedule */
+		wake_up_q(&wake_q);
+		wake_q_init(&wake_q);
+
 		schedule_preempt_disabled();
 
 		first = __mutex_waiter_is_first(lock, &waiter);
@@ -714,7 +719,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 		 */
 		if (!ww_ctx->is_wait_die &&
 		    !__mutex_waiter_is_first(lock, &waiter))
-			__ww_mutex_check_waiters(lock, ww_ctx);
+			__ww_mutex_check_waiters(lock, ww_ctx, &wake_q);
 	}
 
 	__mutex_remove_waiter(lock, &waiter);
@@ -730,6 +735,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 		ww_mutex_lock_acquired(ww, ww_ctx);
 
 	raw_spin_unlock(&lock->wait_lock);
+	wake_up_q(&wake_q);
 	preempt_enable();
 	return 0;
 
@@ -741,6 +747,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 	raw_spin_unlock(&lock->wait_lock);
 	debug_mutex_free_waiter(&waiter);
 	mutex_release(&lock->dep_map, ip);
+	wake_up_q(&wake_q);
 	preempt_enable();
 	return ret;
 }
@@ -951,9 +958,10 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
 	if (owner & MUTEX_FLAG_HANDOFF)
 		__mutex_handoff(lock, next);
 
+	preempt_disable();
 	raw_spin_unlock(&lock->wait_lock);
-
 	wake_up_q(&wake_q);
+	preempt_enable();
 }
 
 #ifndef CONFIG_DEBUG_LOCK_ALLOC
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 88d08eeb8bc03..e5a6dc82508c7 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -34,13 +34,15 @@
 
 static inline int __ww_mutex_add_waiter(struct rt_mutex_waiter *waiter,
 					struct rt_mutex *lock,
-					struct ww_acquire_ctx *ww_ctx)
+					struct ww_acquire_ctx *ww_ctx,
+					struct wake_q_head *wake_q)
 {
 	return 0;
 }
 
 static inline void __ww_mutex_check_waiters(struct rt_mutex *lock,
-					    struct ww_acquire_ctx *ww_ctx)
+					    struct ww_acquire_ctx *ww_ctx,
+					    struct wake_q_head *wake_q)
 {
 }
 
@@ -1201,7 +1203,8 @@ static int __sched task_blocks_on_rt_mutex(struct rt_mutex_base *lock,
 					   struct rt_mutex_waiter *waiter,
 					   struct task_struct *task,
 					   struct ww_acquire_ctx *ww_ctx,
-					   enum rtmutex_chainwalk chwalk)
+					   enum rtmutex_chainwalk chwalk,
+					   struct wake_q_head *wake_q)
 {
 	struct task_struct *owner = rt_mutex_owner(lock);
 	struct rt_mutex_waiter *top_waiter = waiter;
@@ -1245,7 +1248,10 @@ static int __sched task_blocks_on_rt_mutex(struct rt_mutex_base *lock,
 
 	/* Check whether the waiter should back out immediately */
 	rtm = container_of(lock, struct rt_mutex, rtmutex);
-	res = __ww_mutex_add_waiter(waiter, rtm, ww_ctx);
+	preempt_disable();
+	res = __ww_mutex_add_waiter(waiter, rtm, ww_ctx, wake_q);
+	wake_up_q(wake_q);
+	preempt_enable();
 	if (res) {
 		raw_spin_lock(&task->pi_lock);
 		rt_mutex_dequeue(lock, waiter);
@@ -1678,7 +1684,8 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
 				       struct ww_acquire_ctx *ww_ctx,
 				       unsigned int state,
 				       enum rtmutex_chainwalk chwalk,
-				       struct rt_mutex_waiter *waiter)
+				       struct rt_mutex_waiter *waiter,
+				       struct wake_q_head *wake_q)
 {
 	struct rt_mutex *rtm = container_of(lock, struct rt_mutex, rtmutex);
 	struct ww_mutex *ww = ww_container_of(rtm);
@@ -1689,7 +1696,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
 	/* Try to acquire the lock again: */
 	if (try_to_take_rt_mutex(lock, current, NULL)) {
 		if (build_ww_mutex() && ww_ctx) {
-			__ww_mutex_check_waiters(rtm, ww_ctx);
+			__ww_mutex_check_waiters(rtm, ww_ctx, wake_q);
 			ww_mutex_lock_acquired(ww, ww_ctx);
 		}
 		return 0;
@@ -1699,7 +1706,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
 
 	trace_contention_begin(lock, LCB_F_RT);
 
-	ret = task_blocks_on_rt_mutex(lock, waiter, current, ww_ctx, chwalk);
+	ret = task_blocks_on_rt_mutex(lock, waiter, current, ww_ctx, chwalk, wake_q);
 	if (likely(!ret))
 		ret = rt_mutex_slowlock_block(lock, ww_ctx, state, NULL, waiter);
 
@@ -1707,7 +1714,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
 		/* acquired the lock */
 		if (build_ww_mutex() && ww_ctx) {
 			if (!ww_ctx->is_wait_die)
-				__ww_mutex_check_waiters(rtm, ww_ctx);
+				__ww_mutex_check_waiters(rtm, ww_ctx, wake_q);
 			ww_mutex_lock_acquired(ww, ww_ctx);
 		}
 	} else {
@@ -1729,7 +1736,8 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
 
 static inline int __rt_mutex_slowlock_locked(struct rt_mutex_base *lock,
 					     struct ww_acquire_ctx *ww_ctx,
-					     unsigned int state)
+					     unsigned int state,
+					     struct wake_q_head *wake_q)
 {
 	struct rt_mutex_waiter waiter;
 	int ret;
@@ -1738,7 +1746,7 @@ static inline int __rt_mutex_slowlock_locked(struct rt_mutex_base *lock,
 	waiter.ww_ctx = ww_ctx;
 
 	ret = __rt_mutex_slowlock(lock, ww_ctx, state, RT_MUTEX_MIN_CHAINWALK,
-				  &waiter);
+				  &waiter, wake_q);
 
 	debug_rt_mutex_free_waiter(&waiter);
 	return ret;
@@ -1754,6 +1762,7 @@ static int __sched rt_mutex_slowlock(struct rt_mutex_base *lock,
 				     struct ww_acquire_ctx *ww_ctx,
 				     unsigned int state)
 {
+	DEFINE_WAKE_Q(wake_q);
 	unsigned long flags;
 	int ret;
 
@@ -1775,8 +1784,11 @@ static int __sched rt_mutex_slowlock(struct rt_mutex_base *lock,
 	 * irqsave/restore variants.
 	 */
 	raw_spin_lock_irqsave(&lock->wait_lock, flags);
-	ret = __rt_mutex_slowlock_locked(lock, ww_ctx, state);
+	ret = __rt_mutex_slowlock_locked(lock, ww_ctx, state, &wake_q);
+	preempt_disable();
 	raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
+	wake_up_q(&wake_q);
+	preempt_enable();
 	rt_mutex_post_schedule();
 
 	return ret;
@@ -1803,7 +1815,8 @@ static __always_inline int __rt_mutex_lock(struct rt_mutex_base *lock,
  * rtlock_slowlock_locked - Slow path lock acquisition for RT locks
  * @lock:	The underlying RT mutex
  */
-static void __sched rtlock_slowlock_locked(struct rt_mutex_base *lock)
+static void __sched rtlock_slowlock_locked(struct rt_mutex_base *lock,
+					   struct wake_q_head *wake_q)
 {
 	struct rt_mutex_waiter waiter;
 	struct task_struct *owner;
@@ -1820,7 +1833,7 @@ static void __sched rtlock_slowlock_locked(struct rt_mutex_base *lock)
 
 	trace_contention_begin(lock, LCB_F_RT);
 
-	task_blocks_on_rt_mutex(lock, &waiter, current, NULL, RT_MUTEX_MIN_CHAINWALK);
+	task_blocks_on_rt_mutex(lock, &waiter, current, NULL, RT_MUTEX_MIN_CHAINWALK, wake_q);
 
 	for (;;) {
 		/* Try to acquire the lock again */
@@ -1831,7 +1844,11 @@ static void __sched rtlock_slowlock_locked(struct rt_mutex_base *lock)
 			owner = rt_mutex_owner(lock);
 		else
 			owner = NULL;
+		preempt_disable();
 		raw_spin_unlock_irq(&lock->wait_lock);
+		wake_up_q(wake_q);
+		wake_q_init(wake_q);
+		preempt_enable();
 
 		if (!owner || !rtmutex_spin_on_owner(lock, &waiter, owner))
 			schedule_rtlock();
@@ -1856,10 +1873,14 @@ static void __sched rtlock_slowlock_locked(struct rt_mutex_base *lock)
 static __always_inline void __sched rtlock_slowlock(struct rt_mutex_base *lock)
 {
 	unsigned long flags;
+	DEFINE_WAKE_Q(wake_q);
 
 	raw_spin_lock_irqsave(&lock->wait_lock, flags);
-	rtlock_slowlock_locked(lock);
+	rtlock_slowlock_locked(lock, &wake_q);
+	preempt_disable();
 	raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
+	wake_up_q(&wake_q);
+	preempt_enable();
 }
 
 #endif /* RT_MUTEX_BUILD_SPINLOCKS */
diff --git a/kernel/locking/rtmutex_api.c b/kernel/locking/rtmutex_api.c
index a6974d0445930..747f2da160372 100644
--- a/kernel/locking/rtmutex_api.c
+++ b/kernel/locking/rtmutex_api.c
@@ -291,7 +291,8 @@ void __sched rt_mutex_proxy_unlock(struct rt_mutex_base *lock)
  */
 int __sched __rt_mutex_start_proxy_lock(struct rt_mutex_base *lock,
 					struct rt_mutex_waiter *waiter,
-					struct task_struct *task)
+					struct task_struct *task,
+					struct wake_q_head *wake_q)
 {
 	int ret;
 
@@ -302,7 +303,7 @@ int __sched __rt_mutex_start_proxy_lock(struct rt_mutex_base *lock,
 
 	/* We enforce deadlock detection for futexes */
 	ret = task_blocks_on_rt_mutex(lock, waiter, task, NULL,
-				      RT_MUTEX_FULL_CHAINWALK);
+				      RT_MUTEX_FULL_CHAINWALK, wake_q);
 
 	if (ret && !rt_mutex_owner(lock)) {
 		/*
@@ -341,12 +342,16 @@ int __sched rt_mutex_start_proxy_lock(struct rt_mutex_base *lock,
 				     struct task_struct *task)
 {
 	int ret;
+	DEFINE_WAKE_Q(wake_q);
 
 	raw_spin_lock_irq(&lock->wait_lock);
-	ret = __rt_mutex_start_proxy_lock(lock, waiter, task);
+	ret = __rt_mutex_start_proxy_lock(lock, waiter, task, &wake_q);
 	if (unlikely(ret))
 		remove_waiter(lock, waiter);
+	preempt_disable();
 	raw_spin_unlock_irq(&lock->wait_lock);
+	wake_up_q(&wake_q);
+	preempt_enable();
 
 	return ret;
 }
diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h
index 1162e07cdaea1..c38a2d2d4a7ee 100644
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -83,7 +83,8 @@ extern void rt_mutex_init_proxy_locked(struct rt_mutex_base *lock,
 extern void rt_mutex_proxy_unlock(struct rt_mutex_base *lock);
 extern int __rt_mutex_start_proxy_lock(struct rt_mutex_base *lock,
 				       struct rt_mutex_waiter *waiter,
-				       struct task_struct *task);
+				       struct task_struct *task,
+				       struct wake_q_head *);
 extern int rt_mutex_start_proxy_lock(struct rt_mutex_base *lock,
 				     struct rt_mutex_waiter *waiter,
 				     struct task_struct *task);
diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index 34a59569db6be..9f4322c074862 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -69,6 +69,7 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
 				      unsigned int state)
 {
 	struct rt_mutex_base *rtm = &rwb->rtmutex;
+	DEFINE_WAKE_Q(wake_q);
 	int ret;
 
 	rwbase_pre_schedule();
@@ -110,7 +111,7 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
 	 * For rwlocks this returns 0 unconditionally, so the below
 	 * !ret conditionals are optimized out.
 	 */
-	ret = rwbase_rtmutex_slowlock_locked(rtm, state);
+	ret = rwbase_rtmutex_slowlock_locked(rtm, state, &wake_q);
 
 	/*
 	 * On success the rtmutex is held, so there can't be a writer
@@ -121,7 +122,12 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
 	 */
 	if (!ret)
 		atomic_inc(&rwb->readers);
+
+	preempt_disable();
 	raw_spin_unlock_irq(&rtm->wait_lock);
+	wake_up_q(&wake_q);
+	preempt_enable();
+
 	if (!ret)
 		rwbase_rtmutex_unlock(rtm);
 
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 33cac79e39946..cdb149135f93a 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1415,8 +1415,8 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 #define rwbase_rtmutex_lock_state(rtm, state)		\
 	__rt_mutex_lock(rtm, state)
 
-#define rwbase_rtmutex_slowlock_locked(rtm, state)	\
-	__rt_mutex_slowlock_locked(rtm, NULL, state)
+#define rwbase_rtmutex_slowlock_locked(rtm, state, wq)	\
+	__rt_mutex_slowlock_locked(rtm, NULL, state, wq)
 
 #define rwbase_rtmutex_unlock(rtm)			\
 	__rt_mutex_unlock(rtm)
diff --git a/kernel/locking/spinlock_rt.c b/kernel/locking/spinlock_rt.c
index 38e292454fccb..fb1810a14c9dd 100644
--- a/kernel/locking/spinlock_rt.c
+++ b/kernel/locking/spinlock_rt.c
@@ -162,7 +162,8 @@ rwbase_rtmutex_lock_state(struct rt_mutex_base *rtm, unsigned int state)
 }
 
 static __always_inline int
-rwbase_rtmutex_slowlock_locked(struct rt_mutex_base *rtm, unsigned int state)
+rwbase_rtmutex_slowlock_locked(struct rt_mutex_base *rtm, unsigned int state,
+			       struct wake_q_head *wake_q)
 {
 	rtlock_slowlock_locked(rtm);
 	return 0;
diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index 3ad2cc4823e59..0a05de8728411 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -275,7 +275,7 @@ __ww_ctx_less(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
 */
 static bool
 __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
-	       struct ww_acquire_ctx *ww_ctx)
+	       struct ww_acquire_ctx *ww_ctx, struct wake_q_head *wake_q)
 {
 	if (!ww_ctx->is_wait_die)
 		return false;
@@ -284,7 +284,7 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
 #ifndef WW_RT
 		debug_mutex_wake_waiter(lock, waiter);
 #endif
-		wake_up_process(waiter->task);
+		wake_q_add(wake_q, waiter->task);
 	}
 
 	return true;
@@ -299,7 +299,8 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
 */
 static bool __ww_mutex_wound(struct MUTEX *lock,
 			     struct ww_acquire_ctx *ww_ctx,
-			     struct ww_acquire_ctx *hold_ctx)
+			     struct ww_acquire_ctx *hold_ctx,
+			     struct wake_q_head *wake_q)
 {
 	struct task_struct *owner = __ww_mutex_owner(lock);
 
@@ -331,7 +332,7 @@ static bool __ww_mutex_wound(struct MUTEX *lock,
 		 * wakeup pending to re-read the wounded state.
 		 */
 		if (owner != current)
-			wake_up_process(owner);
+			wake_q_add(wake_q, owner);
 
 		return true;
 	}
@@ -352,7 +353,8 @@ static bool __ww_mutex_wound(struct MUTEX *lock,
 * The current task must not be on the wait list.
 */
 static void
-__ww_mutex_check_waiters(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx)
+__ww_mutex_check_waiters(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx,
+			 struct wake_q_head *wake_q)
 {
 	struct MUTEX_WAITER *cur;
 
@@ -364,8 +366,8 @@ __ww_mutex_check_waiters(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx)
 		if (!cur->ww_ctx)
 			continue;
 
-		if (__ww_mutex_die(lock, cur, ww_ctx) ||
-		    __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx))
+		if (__ww_mutex_die(lock, cur, ww_ctx, wake_q) ||
+		    __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx, wake_q))
 			break;
 	}
 }
@@ -377,6 +379,8 @@ __ww_mutex_check_waiters(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx)
 static __always_inline void
 ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 {
+	DEFINE_WAKE_Q(wake_q);
+
 	ww_mutex_lock_acquired(lock, ctx);
 
 	/*
@@ -405,8 +409,11 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 	 * die or wound us.
 	 */
 	lock_wait_lock(&lock->base);
-	__ww_mutex_check_waiters(&lock->base, ctx);
+	__ww_mutex_check_waiters(&lock->base, ctx, &wake_q);
+	preempt_disable();
 	unlock_wait_lock(&lock->base);
+	wake_up_q(&wake_q);
+	preempt_enable();
 }
 
 static __always_inline int
@@ -488,7 +495,8 @@ __ww_mutex_check_kill(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
 static inline int
 __ww_mutex_add_waiter(struct MUTEX_WAITER *waiter,
 		      struct MUTEX *lock,
-		      struct ww_acquire_ctx *ww_ctx)
+		      struct ww_acquire_ctx *ww_ctx,
+		      struct wake_q_head *wake_q)
 {
 	struct MUTEX_WAITER *cur, *pos = NULL;
 	bool is_wait_die;
@@ -532,7 +540,7 @@ __ww_mutex_add_waiter(struct MUTEX_WAITER *waiter,
 			pos = cur;
 
 		/* Wait-Die: ensure younger waiters die. */
-		__ww_mutex_die(lock, cur, ww_ctx);
+		__ww_mutex_die(lock, cur, ww_ctx, wake_q);
 	}
 
 	__ww_waiter_add(lock, waiter, pos);
@@ -550,7 +558,7 @@ __ww_mutex_add_waiter(struct MUTEX_WAITER *waiter,
 		 * such that either we or the fastpath will wound @ww->ctx.
 		 */
 		smp_mb();
-		__ww_mutex_wound(lock, ww_ctx, ww->ctx);
+		__ww_mutex_wound(lock, ww_ctx, ww->ctx, wake_q);
 	}
 
 	return 0;
-- 
2.46.0.598.g6f2099f65c-goog

From nobody Sat Nov 30 07:44:02 2024
Date: Tue, 10 Sep 2024 15:12:08 -0700
In-Reply-To: <20240910221235.2132138-1-jstultz@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20240910221235.2132138-1-jstultz@google.com>
X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog
Message-ID: <20240910221235.2132138-3-jstultz@google.com>
Subject: [RESEND x2][PATCH v12 2/7] locking/mutex: Make mutex::wait_lock irq safe
From: John Stultz
To: LKML
Cc: Juri Lelli, Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra,
 Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt,
 Ben Segall, Zimuzo Ezeozue, Mel Gorman, Will Deacon, Waiman Long,
 Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak,
 Thomas Gleixner, Daniel Lezcano, kernel-team@android.com, Metin Kaya,
 "Connor O'Brien", John Stultz
Content-Type: text/plain; charset="utf-8"

From: Juri Lelli

mutex::wait_lock might be nested under rq->lock.

Make it irq safe then.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: kernel-team@android.com
Tested-by: K Prateek Nayak
Tested-by: Metin Kaya
Reviewed-by: Metin Kaya
Reviewed-by: Valentin Schneider
Signed-off-by: Juri Lelli
Signed-off-by: Peter Zijlstra (Intel)
[rebase & fix {un,}lock_wait_lock helpers in ww_mutex.h]
Signed-off-by: Connor O'Brien
Signed-off-by: John Stultz
---
v3:
* Re-added this patch after it was dropped in v2 which caused
  lockdep warnings to trip.
v7:
* Fix function definition for PREEMPT_RT case, as pointed out
  by Metin Kaya.
* Fix incorrect flags handling in PREEMPT_RT case as found by
  Metin Kaya
---
 kernel/locking/mutex.c    | 18 ++++++++++--------
 kernel/locking/ww_mutex.h | 21 +++++++++++----------
 2 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 6c94da061ec25..cd248d1767eb3 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -578,6 +578,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 	DEFINE_WAKE_Q(wake_q);
 	struct mutex_waiter waiter;
 	struct ww_mutex *ww;
+	unsigned long flags;
 	int ret;
 
 	if (!use_ww_ctx)
@@ -620,7 +621,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 		return 0;
 	}
 
-	raw_spin_lock(&lock->wait_lock);
+	raw_spin_lock_irqsave(&lock->wait_lock, flags);
 	/*
 	 * After waiting to acquire the wait_lock, try again.
 	 */
@@ -681,7 +682,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 			goto err;
 		}
 
-		raw_spin_unlock(&lock->wait_lock);
+		raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
 		/* Make sure we do wakeups before calling schedule */
 		wake_up_q(&wake_q);
 		wake_q_init(&wake_q);
@@ -706,9 +707,9 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 			trace_contention_begin(lock, LCB_F_MUTEX);
 		}
 
-		raw_spin_lock(&lock->wait_lock);
+		raw_spin_lock_irqsave(&lock->wait_lock, flags);
 	}
-	raw_spin_lock(&lock->wait_lock);
+	raw_spin_lock_irqsave(&lock->wait_lock, flags);
 acquired:
 	__set_current_state(TASK_RUNNING);
 
@@ -734,7 +735,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 	if (ww_ctx)
 		ww_mutex_lock_acquired(ww, ww_ctx);
 
-	raw_spin_unlock(&lock->wait_lock);
+	raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
 	wake_up_q(&wake_q);
 	preempt_enable();
 	return 0;
@@ -744,7 +745,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 	__mutex_remove_waiter(lock, &waiter);
 err_early_kill:
 	trace_contention_end(lock, ret);
-	raw_spin_unlock(&lock->wait_lock);
+	raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
 	debug_mutex_free_waiter(&waiter);
 	mutex_release(&lock->dep_map, ip);
 	wake_up_q(&wake_q);
@@ -915,6 +916,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
 	struct task_struct *next = NULL;
 	DEFINE_WAKE_Q(wake_q);
 	unsigned long owner;
+	unsigned long flags;
 
 	mutex_release(&lock->dep_map, ip);
 
@@ -941,7 +943,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
 		}
 	}
 
-	raw_spin_lock(&lock->wait_lock);
+	raw_spin_lock_irqsave(&lock->wait_lock, flags);
 	debug_mutex_unlock(lock);
 	if (!list_empty(&lock->wait_list)) {
 		/* get the first entry from the wait-list: */
@@ -959,7 +961,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
 		__mutex_handoff(lock, next);
 
 	preempt_disable();
-	raw_spin_unlock(&lock->wait_lock);
+	raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
 	wake_up_q(&wake_q);
 	preempt_enable();
 }
diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index 0a05de8728411..517a971f6a8be 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -70,14 +70,14 @@ __ww_mutex_has_waiters(struct mutex *lock)
 	return atomic_long_read(&lock->owner) & MUTEX_FLAG_WAITERS;
 }
 
-static inline void lock_wait_lock(struct mutex *lock)
+static inline void lock_wait_lock(struct mutex *lock, unsigned long *flags)
 {
-	raw_spin_lock(&lock->wait_lock);
+	raw_spin_lock_irqsave(&lock->wait_lock, *flags);
 }
 
-static inline void unlock_wait_lock(struct mutex *lock)
+static inline void unlock_wait_lock(struct mutex *lock, unsigned long *flags)
 {
-	raw_spin_unlock(&lock->wait_lock);
+	raw_spin_unlock_irqrestore(&lock->wait_lock, *flags);
 }
 
 static inline void lockdep_assert_wait_lock_held(struct mutex *lock)
@@ -144,14 +144,14 @@ __ww_mutex_has_waiters(struct rt_mutex *lock)
 	return rt_mutex_has_waiters(&lock->rtmutex);
 }
 
-static inline void lock_wait_lock(struct rt_mutex *lock)
+static inline void lock_wait_lock(struct rt_mutex *lock, unsigned long *flags)
 {
-	raw_spin_lock(&lock->rtmutex.wait_lock);
+	raw_spin_lock_irqsave(&lock->rtmutex.wait_lock, *flags);
 }
 
-static inline void unlock_wait_lock(struct rt_mutex *lock)
+static inline void unlock_wait_lock(struct rt_mutex *lock, unsigned long *flags)
 {
-	raw_spin_unlock(&lock->rtmutex.wait_lock);
+	raw_spin_unlock_irqrestore(&lock->rtmutex.wait_lock, *flags);
 }
 
 static inline void lockdep_assert_wait_lock_held(struct rt_mutex *lock)
@@ -380,6 +380,7 @@ static __always_inline void
 ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 {
 	DEFINE_WAKE_Q(wake_q);
+	unsigned long flags;
 
 	ww_mutex_lock_acquired(lock, ctx);
 
@@ -408,10 +409,10 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 	 * Uh oh, we raced in fastpath, check if any of the waiters need to
 	 * die or wound us.
 	 */
-	lock_wait_lock(&lock->base);
+	lock_wait_lock(&lock->base, &flags);
 	__ww_mutex_check_waiters(&lock->base, ctx, &wake_q);
 	preempt_disable();
-	unlock_wait_lock(&lock->base);
+	unlock_wait_lock(&lock->base, &flags);
 	wake_up_q(&wake_q);
 	preempt_enable();
 }
-- 
2.46.0.598.g6f2099f65c-goog
From nobody Sat Nov 30 07:44:02 2024
Date: Tue, 10 Sep 2024 15:12:09 -0700
In-Reply-To: <20240910221235.2132138-1-jstultz@google.com>
References: <20240910221235.2132138-1-jstultz@google.com>
Message-ID: <20240910221235.2132138-4-jstultz@google.com>
Subject: [RESEND x2][PATCH v12 3/7] locking/mutex: Expose __mutex_owner()
From: John Stultz
To: LKML

From: Juri Lelli

Implementing proxy execution requires that scheduler code be able to
identify the current owner of a mutex. Expose __mutex_owner() for this
purpose (alone!). Includes a null mutex check, so that users of the
function can be simplified.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: kernel-team@android.com
Tested-by: K Prateek Nayak
Tested-by: Metin Kaya
Reviewed-by: Metin Kaya
Reviewed-by: Valentin Schneider
Signed-off-by: Juri Lelli
[Removed the EXPORT_SYMBOL]
Signed-off-by: Valentin Schneider
Signed-off-by: Connor O'Brien
[jstultz: Reworked per Peter's suggestions]
Signed-off-by: John Stultz
---
v4:
* Move __mutex_owner() to kernel/locking/mutex.h instead of adding
  a new globally available accessor function to keep the exposure of
  this low, along with keeping it an inline function, as suggested
  by PeterZ
v10:
* Handle null lock ptr, to simplify later code, as suggested by
  Metin Kaya
v11:
* Tweak commit message as suggested by Metin Kaya
---
 kernel/locking/mutex.c | 25 -------------------------
 kernel/locking/mutex.h | 27 +++++++++++++++++++++++++++
 2 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index cd248d1767eb3..3302e52f0c967 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -56,31 +56,6 @@ __mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key)
 }
 EXPORT_SYMBOL(__mutex_init);

-/*
- * @owner: contains: 'struct task_struct *' to the current lock owner,
- * NULL means not owned. Since task_struct pointers are aligned at
- * at least L1_CACHE_BYTES, we have low bits to store extra state.
- *
- * Bit0 indicates a non-empty waiter list; unlock must issue a wakeup.
- * Bit1 indicates unlock needs to hand the lock to the top-waiter
- * Bit2 indicates handoff has been done and we're waiting for pickup.
- */
-#define MUTEX_FLAG_WAITERS	0x01
-#define MUTEX_FLAG_HANDOFF	0x02
-#define MUTEX_FLAG_PICKUP	0x04
-
-#define MUTEX_FLAGS		0x07
-
-/*
- * Internal helper function; C doesn't allow us to hide it :/
- *
- * DO NOT USE (outside of mutex code).
- */
-static inline struct task_struct *__mutex_owner(struct mutex *lock)
-{
-	return (struct task_struct *)(atomic_long_read(&lock->owner) & ~MUTEX_FLAGS);
-}
-
 static inline struct task_struct *__owner_task(unsigned long owner)
 {
 	return (struct task_struct *)(owner & ~MUTEX_FLAGS);
diff --git a/kernel/locking/mutex.h b/kernel/locking/mutex.h
index 0b2a79c4013bd..cbff35b9b7ae3 100644
--- a/kernel/locking/mutex.h
+++ b/kernel/locking/mutex.h
@@ -20,6 +20,33 @@ struct mutex_waiter {
 #endif
 };

+/*
+ * @owner: contains: 'struct task_struct *' to the current lock owner,
+ * NULL means not owned. Since task_struct pointers are aligned at
+ * at least L1_CACHE_BYTES, we have low bits to store extra state.
+ *
+ * Bit0 indicates a non-empty waiter list; unlock must issue a wakeup.
+ * Bit1 indicates unlock needs to hand the lock to the top-waiter
+ * Bit2 indicates handoff has been done and we're waiting for pickup.
+ */
+#define MUTEX_FLAG_WAITERS	0x01
+#define MUTEX_FLAG_HANDOFF	0x02
+#define MUTEX_FLAG_PICKUP	0x04
+
+#define MUTEX_FLAGS		0x07
+
+/*
+ * Internal helper function; C doesn't allow us to hide it :/
+ *
+ * DO NOT USE (outside of mutex & scheduler code).
+ */
+static inline struct task_struct *__mutex_owner(struct mutex *lock)
+{
+	if (!lock)
+		return NULL;
+	return (struct task_struct *)(atomic_long_read(&lock->owner) & ~MUTEX_FLAGS);
+}
+
 #ifdef CONFIG_DEBUG_MUTEXES
 extern void debug_mutex_lock_common(struct mutex *lock,
 				    struct mutex_waiter *waiter);
-- 
2.46.0.598.g6f2099f65c-goog
From nobody Sat Nov 30 07:44:02 2024
Date: Tue, 10 Sep 2024 15:12:10 -0700
In-Reply-To: <20240910221235.2132138-1-jstultz@google.com>
References: <20240910221235.2132138-1-jstultz@google.com>
Message-ID: <20240910221235.2132138-5-jstultz@google.com>
Subject: [RESEND x2][PATCH v12 4/7] sched: Add move_queued_task_locked helper
From: John Stultz
To: LKML

From: Connor O'Brien

Switch logic that deactivates, sets the task cpu, and reactivates a
task on a different rq to use a helper that will be later extended to
push entire blocked task chains.

This patch was broken out from a larger chain migration patch
originally by Connor O'Brien.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: kernel-team@android.com
Tested-by: K Prateek Nayak
Tested-by: Metin Kaya
Reviewed-by: Metin Kaya
Reviewed-by: Valentin Schneider
Reviewed-by: Qais Yousef
Signed-off-by: Connor O'Brien
[jstultz: split out from larger chain migration patch]
Signed-off-by: John Stultz
---
v8:
* Renamed from push_task_chain to do_push_task so it makes more sense
  without proxy-execution
v10:
* Changed name to move_queued_task_locked as suggested by Valentin
v11:
* Also use new helper in __migrate_swap_task() and try_steal_cookie()
  as suggested by Qais Yousef
* Nit cleanups suggested by Metin
---
 kernel/sched/core.c     | 13 +++----------
 kernel/sched/deadline.c |  8 ++------
 kernel/sched/rt.c       |  8 ++------
 kernel/sched/sched.h    | 12 ++++++++++++
 4 files changed, 19 insertions(+), 22 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f3951e4a55e5b..b2cc20ed14de6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2526,9 +2526,7 @@ int push_cpu_stop(void *arg)

 	// XXX validate p is still the highest prio task
 	if (task_rq(p) == rq) {
-		deactivate_task(rq, p, 0);
-		set_task_cpu(p, lowest_rq->cpu);
-		activate_task(lowest_rq, p, 0);
+		move_queued_task_locked(rq, lowest_rq, p);
 		resched_curr(lowest_rq);
 	}

@@ -3214,9 +3212,7 @@ static void __migrate_swap_task(struct task_struct *p, int cpu)
 		rq_pin_lock(src_rq, &srf);
 		rq_pin_lock(dst_rq, &drf);

-		deactivate_task(src_rq, p, 0);
-		set_task_cpu(p, cpu);
-		activate_task(dst_rq, p, 0);
+		move_queued_task_locked(src_rq, dst_rq, p);
 		wakeup_preempt(dst_rq, p, 0);

 		rq_unpin_lock(dst_rq, &drf);
@@ -6152,10 +6148,7 @@ static bool try_steal_cookie(int this, int that)
 		if (sched_task_is_throttled(p, this))
 			goto next;

-		deactivate_task(src, p, 0);
-		set_task_cpu(p, this);
-		activate_task(dst, p, 0);
-
+		move_queued_task_locked(src, dst, p);
 		resched_curr(dst);

 		success = true;
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index f59e5c19d9445..ac60d2819bd1d 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2447,9 +2447,7 @@ static int push_dl_task(struct rq *rq)
 		goto retry;
 	}

-	deactivate_task(rq, next_task, 0);
-	set_task_cpu(next_task, later_rq->cpu);
-	activate_task(later_rq, next_task, 0);
+	move_queued_task_locked(rq, later_rq, next_task);
 	ret = 1;

 	resched_curr(later_rq);
@@ -2535,9 +2533,7 @@ static void pull_dl_task(struct rq *this_rq)
 			if (is_migration_disabled(p)) {
 				push_task = get_push_task(src_rq);
 			} else {
-				deactivate_task(src_rq, p, 0);
-				set_task_cpu(p, this_cpu);
-				activate_task(this_rq, p, 0);
+				move_queued_task_locked(src_rq, this_rq, p);
 				dmin = p->dl.deadline;
 				resched = true;
 			}
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 310523c1b9e3f..496d3e6ab57d2 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2105,9 +2105,7 @@ static int push_rt_task(struct rq *rq, bool pull)
 		goto retry;
 	}

-	deactivate_task(rq, next_task, 0);
-	set_task_cpu(next_task, lowest_rq->cpu);
-	activate_task(lowest_rq, next_task, 0);
+	move_queued_task_locked(rq, lowest_rq, next_task);
 	resched_curr(lowest_rq);
 	ret = 1;

@@ -2378,9 +2376,7 @@ static void pull_rt_task(struct rq *this_rq)
 			if (is_migration_disabled(p)) {
 				push_task = get_push_task(src_rq);
 			} else {
-				deactivate_task(src_rq, p, 0);
-				set_task_cpu(p, this_cpu);
-				activate_task(this_rq, p, 0);
+				move_queued_task_locked(src_rq, this_rq, p);
 				resched = true;
 			}
 			/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 4c36cc6803617..d2cc31c2457e5 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3575,6 +3575,18 @@ static inline void init_sched_mm_cid(struct task_struct *t) { }

 extern u64 avg_vruntime(struct cfs_rq *cfs_rq);
 extern int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se);
+#ifdef CONFIG_SMP
+static inline
+void move_queued_task_locked(struct rq *src_rq, struct rq *dst_rq, struct task_struct *task)
+{
+	lockdep_assert_rq_held(src_rq);
+	lockdep_assert_rq_held(dst_rq);
+
+	deactivate_task(src_rq, task, 0);
+	set_task_cpu(task, dst_rq->cpu);
+	activate_task(dst_rq, task, 0);
+}
+#endif

 #ifdef CONFIG_RT_MUTEXES
-- 
2.46.0.598.g6f2099f65c-goog
From nobody Sat Nov 30 07:44:02 2024
Date: Tue, 10 Sep 2024 15:12:11 -0700
In-Reply-To: <20240910221235.2132138-1-jstultz@google.com>
References: <20240910221235.2132138-1-jstultz@google.com>
Message-ID: <20240910221235.2132138-6-jstultz@google.com>
Subject: [RESEND x2][PATCH v12 5/7] sched: Consolidate pick_*_task to task_is_pushable helper
From: John Stultz
To: LKML

From: Connor O'Brien

This patch consolidates the rt and deadline pick_*_task() functions
into a task_is_pushable() helper.

This patch was broken out from a larger chain migration patch
originally by Connor O'Brien.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: kernel-team@android.com
Tested-by: K Prateek Nayak
Tested-by: Metin Kaya
Reviewed-by: Metin Kaya
Reviewed-by: Valentin Schneider
Signed-off-by: Connor O'Brien
[jstultz: split out from larger chain migration patch, renamed helper
 function]
Signed-off-by: John Stultz
---
v7:
* Split from chain migration patch
* Renamed function
v11:
* Switched to bool (though later in the series it goes to a tri-state
  return) for now to simplify review. Will add tri-state handling
  later in the series when it's needed. Suggested by Metin and others.
---
 kernel/sched/deadline.c | 10 +---------
 kernel/sched/rt.c       | 11 +----------
 kernel/sched/sched.h    | 10 ++++++++++
 3 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index ac60d2819bd1d..d3050f6c2958d 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2183,14 +2183,6 @@ static void task_fork_dl(struct task_struct *p)
 /* Only try algorithms three times */
 #define DL_MAX_TRIES 3

-static int pick_dl_task(struct rq *rq, struct task_struct *p, int cpu)
-{
-	if (!task_on_cpu(rq, p) &&
-	    cpumask_test_cpu(cpu, &p->cpus_mask))
-		return 1;
-	return 0;
-}
-
 /*
  * Return the earliest pushable rq's task, which is suitable to be executed
  * on the CPU, NULL otherwise:
@@ -2209,7 +2201,7 @@ static struct task_struct *pick_earliest_pushable_dl_task(struct rq *rq, int cpu
 	if (next_node) {
 		p = __node_2_pdl(next_node);

-		if (pick_dl_task(rq, p, cpu))
+		if (task_is_pushable(rq, p, cpu))
 			return p;

 		next_node = rb_next(next_node);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 496d3e6ab57d2..9f07f09641f7a 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1790,15 +1790,6 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
 /* Only try algorithms three times */
 #define RT_MAX_TRIES 3

-static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
-{
-	if (!task_on_cpu(rq, p) &&
-	    cpumask_test_cpu(cpu, &p->cpus_mask))
-		return 1;
-
-	return 0;
-}
-
 /*
  * Return the highest pushable rq's task, which is suitable to be executed
  * on the CPU, NULL otherwise
@@ -1812,7 +1803,7 @@ static struct task_struct *pick_highest_pushable_task(struct rq *rq, int cpu)
 		return NULL;

 	plist_for_each_entry(p, head, pushable_tasks) {
-		if (pick_rt_task(rq, p, cpu))
+		if (task_is_pushable(rq, p, cpu))
 			return p;
 	}

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d2cc31c2457e5..10ef612c078f9 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3586,6 +3586,16 @@ void move_queued_task_locked(struct rq *src_rq, struct rq *dst_rq, struct task_s
 	set_task_cpu(task, dst_rq->cpu);
 	activate_task(dst_rq, task, 0);
 }
+
+static inline
+bool task_is_pushable(struct rq *rq, struct task_struct *p, int cpu)
+{
+	if (!task_on_cpu(rq, p) &&
+	    cpumask_test_cpu(cpu, &p->cpus_mask))
+		return true;
+
+	return false;
+}
 #endif

 #ifdef CONFIG_RT_MUTEXES
-- 
2.46.0.598.g6f2099f65c-goog
h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=O3RIYNySAuMK98eE+Qz0IwqjesXEpqITj3gouZLg1ygY+NpYfRfACfgAbVJMiq1PhkwQazr7KBZVaZg3ypDF1H5t3kV/hoBM6Mb3POAt0uaqkRSS9m9tQnjLurtCvIEK7t/rAq9KtUT7rUebNh6mPp4iEDdCOztnSRzIpYdBLmE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=Xez6njsp; arc=none smtp.client-ip=209.85.219.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jstultz.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Xez6njsp" Received: by mail-yb1-f201.google.com with SMTP id 3f1490d57ef6-e1a74f824f9so2714764276.1 for ; Tue, 10 Sep 2024 15:13:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1726006390; x=1726611190; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=TgWPkJOgDCTgQuyFh0W8ok44kERRCzXuvotXrcKd8BI=; b=Xez6njspBPd0w6pSE/PFeyzn9O5FXx/DCH67ysypgWzL8d3r7abeeugTzeL8kO52je DaHzQxiUmmVGD9SK+5yy/dSWD/+0B33ozyekfS5VXmllZOhf5iyXou7lxR87e8It/1nr Isermu1t2FgaBZi+mfCx0SH3l6a2P0AItG54OGY4LMVG2V62+QCARbZsLvnA5EPNMUvt iRzZI2MX7YUujAx5X/NWpW+gynhrAUFsDZfzWEzd53OAB0Zwf3xMgmgMEqnYub2102SO tZIhQwGFbepBccx093sEpznj77Dbgf4Tp31pjc6s7UZrjYNxFDahgK47pQfFYvSeeuUT 1cWQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1726006390; x=1726611190; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=TgWPkJOgDCTgQuyFh0W8ok44kERRCzXuvotXrcKd8BI=; 
b=Ab0loqFljkvcjZ+P2N7No/qzpa7VlRPnP56fx2g91bMtWXxTYjlNLoDawVIqporVl4 YZe+VuxPF40PDwvlT1KFaL+fRKnECr7yj+BxSF6CBLjeQpQUBynDC78eV1p+N421fbQR rNd01Rb7/wglni3YwhGu2zx1kOIFi/UXy2SItYU9LSRzpqNf9hGCfgX7fwdaJ6BBcyeS kSVTsQln3y/vsD3mpgqqtRNymWl8A/2JswuCfWUquf1ZgiDm7AHqPbugg0MLmbht3ou4 qaSj7emyhEmFRy1avRjsezygqnEHMdJs/OyeDW2fLFF4i7MbNtD08iTO5WUbX8AIkDpr sycA== X-Gm-Message-State: AOJu0Yy+oVRDkFGVKoVlVElThk5YVWYgnb9A4Wy4hbHp0Wda/QEyheLc w7c+lvwedhCyx57WPDkxGqJcMFdgGcZFbLPwXIDXDusZ/idpe6q+tep0t+dONxgbXh+CmRvXcUK yrMAAXlO2D4H0Vwx3jKLsUAIAVNj9USp3goodTvgkbv3nIWudZHqFSxqfsPJC+BP//e74TvMvZ8 ne9WTelrv8uDwEqgYnJcXbyYFaF82sKWKYAJfSSgE62Uyj X-Google-Smtp-Source: AGHT+IFZpIB1bRgxwr+GsjEt5pT4U4ZebwKSRo8cZjRcQzgaWa9DzF5r7vck8rwK32SLTSXkOQqqNO5dx4HP X-Received: from jstultz-noogler2.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:600]) (user=jstultz job=sendgmr) by 2002:a25:d607:0:b0:e1a:44fa:f09 with SMTP id 3f1490d57ef6-e1d8c260cc1mr1251276.2.1726006389351; Tue, 10 Sep 2024 15:13:09 -0700 (PDT) Date: Tue, 10 Sep 2024 15:12:12 -0700 In-Reply-To: <20240910221235.2132138-1-jstultz@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240910221235.2132138-1-jstultz@google.com> X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog Message-ID: <20240910221235.2132138-7-jstultz@google.com> Subject: [RESEND x2][PATCH v12 6/7] sched: Split out __schedule() deactivate task logic into a helper From: John Stultz To: LKML Cc: John Stultz , Joel Fernandes , Qais Yousef , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Mel Gorman , Will Deacon , Waiman Long , Boqun Feng , "Paul E. 
McKenney", Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner,
 Daniel Lezcano, kernel-team@android.com, Metin Kaya
Content-Type: text/plain; charset="utf-8"

As we're going to re-use the deactivation logic, split it into a helper.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: kernel-team@android.com
Tested-by: K Prateek Nayak
Tested-by: Metin Kaya
Reviewed-by: Metin Kaya
Reviewed-by: Qais Yousef
Signed-off-by: John Stultz
---
v6:
* Define the function as static to avoid "no previous prototype"
  warnings, as Reported-by: kernel test robot
v7:
* Rename state to task_state to be more clear, as suggested by Metin Kaya
v11:
* Return early to simplify indentation, and drop the unused bool return
  (it will be introduced later when it's needed), as suggested by Qais.
---
 kernel/sched/core.c | 71 +++++++++++++++++++++++++++------------------
 1 file changed, 42 insertions(+), 29 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b2cc20ed14de6..2cc84f1549b7b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6349,6 +6349,47 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 # define SM_MASK_PREEMPT	SM_PREEMPT
 #endif
 
+/*
+ * Helper function for __schedule()
+ *
+ * If a task does not have signals pending, deactivate it
+ * Otherwise marks the task's __state as RUNNING
+ */
+static void try_to_deactivate_task(struct rq *rq, struct task_struct *p,
+				   unsigned long task_state)
+{
+	if (signal_pending_state(task_state, p)) {
+		WRITE_ONCE(p->__state, TASK_RUNNING);
+		return;
+	}
+
+	p->sched_contributes_to_load =
+		(task_state & TASK_UNINTERRUPTIBLE) &&
+		!(task_state & TASK_NOLOAD) &&
+		!(task_state & TASK_FROZEN);
+
+	if (p->sched_contributes_to_load)
+		rq->nr_uninterruptible++;
+
+	/*
+	 * __schedule()			ttwu()
+	 *   prev_state = prev->state;	  if (p->on_rq && ...)
+	 *   if (prev_state)		    goto out;
+	 *     p->on_rq = 0;		  smp_acquire__after_ctrl_dep();
+	 *				  p->state = TASK_WAKING
+	 *
+	 * Where __schedule() and ttwu() have matching control dependencies.
+	 *
+	 * After this, schedule() must not care about p->state any more.
+	 */
+	deactivate_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK);
+
+	if (p->in_iowait) {
+		atomic_inc(&rq->nr_iowait);
+		delayacct_blkio_start();
+	}
+}
+
 /*
  * __schedule() is the main scheduler function.
  *
@@ -6442,35 +6483,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 	 */
 	prev_state = READ_ONCE(prev->__state);
 	if (!(sched_mode & SM_MASK_PREEMPT) && prev_state) {
-		if (signal_pending_state(prev_state, prev)) {
-			WRITE_ONCE(prev->__state, TASK_RUNNING);
-		} else {
-			prev->sched_contributes_to_load =
-				(prev_state & TASK_UNINTERRUPTIBLE) &&
-				!(prev_state & TASK_NOLOAD) &&
-				!(prev_state & TASK_FROZEN);
-
-			if (prev->sched_contributes_to_load)
-				rq->nr_uninterruptible++;
-
-			/*
-			 * __schedule()			ttwu()
-			 *   prev_state = prev->state;	  if (p->on_rq && ...)
-			 *   if (prev_state)		    goto out;
-			 *     p->on_rq = 0;		  smp_acquire__after_ctrl_dep();
-			 *				  p->state = TASK_WAKING
-			 *
-			 * Where __schedule() and ttwu() have matching control dependencies.
-			 *
-			 * After this, schedule() must not care about p->state any more.
-			 */
-			deactivate_task(rq, prev, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK);
-
-			if (prev->in_iowait) {
-				atomic_inc(&rq->nr_iowait);
-				delayacct_blkio_start();
-			}
-		}
+		try_to_deactivate_task(rq, prev, prev_state);
 		switch_count = &prev->nvcsw;
 	}
 
-- 
2.46.0.598.g6f2099f65c-goog

From nobody Sat Nov 30 07:44:02 2024
Date: Tue, 10 Sep 2024 15:12:13 -0700
In-Reply-To: <20240910221235.2132138-1-jstultz@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20240910221235.2132138-1-jstultz@google.com>
X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog
Message-ID: <20240910221235.2132138-8-jstultz@google.com>
Subject: [RESEND x2][PATCH v12 7/7] sched: Split scheduler and execution contexts
From: John Stultz
To: LKML
Cc: Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli,
 Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt,
 Ben Segall, Zimuzo Ezeozue, Mel Gorman, Will Deacon, Waiman Long,
 Boqun Feng, "Paul E.
McKenney", Xuewen Yan, K Prateek Nayak, Metin Kaya, Thomas Gleixner,
 Daniel Lezcano, kernel-team@android.com, Metin Kaya, "Connor O'Brien",
 John Stultz
Content-Type: text/plain; charset="utf-8"

From: Peter Zijlstra

Let's define the "scheduling context" as all the scheduler state in
task_struct for the task chosen to run, which we'll call the donor task,
and the "execution context" as all state required to actually run the
task.

Currently both are intertwined in task_struct. We want to logically
split these such that we can use the scheduling context of the donor
task selected to be scheduled, but use the execution context of a
different task to actually be run.

To this purpose, introduce an rq->donor field to point to the
task_struct chosen from the runqueue by the scheduler; it will be used
for scheduler state, while rq->curr is preserved to indicate the
execution context of the task that will actually be run.

This patch introduces the donor field as a union with curr, so it
doesn't cause the contexts to be split yet, but adds the logic to
handle everything separately.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E.
McKenney"
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Metin Kaya
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: kernel-team@android.com
Tested-by: K Prateek Nayak
Tested-by: Metin Kaya
Reviewed-by: Metin Kaya
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Juri Lelli
Signed-off-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20181009092434.26221-5-juri.lelli@redhat.com
[add additional comments and update more sched_class code to use rq::proxy]
Signed-off-by: Connor O'Brien
[jstultz: Rebased and resolved minor collisions, reworked to use accessors,
 tweaked update_curr_common to use rq_proxy fixing rt scheduling issues]
Signed-off-by: John Stultz
---
v2:
* Reworked to use accessors
* Fixed update_curr_common to use proxy instead of curr
v3:
* Tweaked wrapper names
* Swapped proxy for selected for clarity
v4:
* Minor variable name tweaks for readability
* Use a macro instead of an inline function and drop other helper
  functions, as suggested by Peter.
* Remove verbose comments/questions to avoid review distractions, as
  suggested by Dietmar
v5:
* Add CONFIG_PROXY_EXEC option to this patch so the new logic can be
  tested with this change
* Minor fix to grab rq_selected when holding the rq lock
v7:
* Minor spelling fix and unused argument fixes suggested by Metin Kaya
* Switch to curr_selected for consistency, and minor rewording of
  commit message for clarity
* Rename variables to selected instead of curr when we're using
  rq_selected()
* Reduce macros in CONFIG_SCHED_PROXY_EXEC ifdef sections, as suggested
  by Metin Kaya
v8:
* Use rq->curr, not rq_selected, with task_tick, as suggested by Valentin
* Minor rework to reorder this with the CONFIG_SCHED_PROXY_EXEC patch
v10:
* Use rq_selected in push_rt_task & get_push_task
v11:
* Rework to use selected instead of curr in a few cases where we were
  previously assigning curr = rq_selected(), to minimize lines of change.
  Suggested by Metin.
v12:
* Big rename to use rq->donor instead of rq_selected(), as suggested by
  Peter.
---
 kernel/sched/core.c     | 45 +++++++++++++++++++++++---------------
 kernel/sched/deadline.c | 39 +++++++++++++++++----------------
 kernel/sched/fair.c     | 32 +++++++++++++--------------
 kernel/sched/rt.c       | 48 ++++++++++++++++++++---------------------
 kernel/sched/sched.h    | 28 +++++++++++++++++++++---
 kernel/sched/syscalls.c |  4 ++--
 6 files changed, 115 insertions(+), 81 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2cc84f1549b7b..80a70b9bd27c2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -794,7 +794,7 @@ static enum hrtimer_restart hrtick(struct hrtimer *timer)
 
 	rq_lock(rq, &rf);
 	update_rq_clock(rq);
-	rq->curr->sched_class->task_tick(rq, rq->curr, 1);
+	rq->donor->sched_class->task_tick(rq, rq->curr, 1);
 	rq_unlock(rq, &rf);
 
 	return HRTIMER_NORESTART;
@@ -2053,16 +2053,18 @@ void check_class_changed(struct rq *rq, struct task_struct *p,
 
 void wakeup_preempt(struct rq *rq, struct task_struct *p, int flags)
 {
-	if (p->sched_class == rq->curr->sched_class)
-		rq->curr->sched_class->wakeup_preempt(rq, p, flags);
-	else if (sched_class_above(p->sched_class, rq->curr->sched_class))
+	struct task_struct *donor = rq->donor;
+
+	if (p->sched_class == donor->sched_class)
+		donor->sched_class->wakeup_preempt(rq, p, flags);
+	else if (sched_class_above(p->sched_class, donor->sched_class))
 		resched_curr(rq);
 
 	/*
 	 * A queue event has occurred, and we're going to schedule. In
 	 * this case, we can save a useless back to back clock update.
 	 */
-	if (task_on_rq_queued(rq->curr) && test_tsk_need_resched(rq->curr))
+	if (task_on_rq_queued(donor) && test_tsk_need_resched(rq->curr))
 		rq_clock_skip_update(rq);
 }
 
@@ -2586,7 +2588,7 @@ __do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx)
 	lockdep_assert_held(&p->pi_lock);
 
 	queued = task_on_rq_queued(p);
-	running = task_current(rq, p);
+	running = task_current_donor(rq, p);
 
 	if (queued) {
 		/*
@@ -5373,7 +5375,7 @@ unsigned long long task_sched_runtime(struct task_struct *p)
 	 * project cycles that may never be accounted to this
 	 * thread, breaking clock_gettime().
 	 */
-	if (task_current(rq, p) && task_on_rq_queued(p)) {
+	if (task_current_donor(rq, p) && task_on_rq_queued(p)) {
 		prefetch_curr_exec_start(p);
 		update_rq_clock(rq);
 		p->sched_class->update_curr(rq);
@@ -5441,7 +5443,8 @@ void sched_tick(void)
 {
 	int cpu = smp_processor_id();
 	struct rq *rq = cpu_rq(cpu);
-	struct task_struct *curr;
+	/* accounting goes to the donor task */
+	struct task_struct *donor;
 	struct rq_flags rf;
 	unsigned long hw_pressure;
 	u64 resched_latency;
@@ -5452,19 +5455,19 @@ void sched_tick(void)
 	sched_clock_tick();
 
 	rq_lock(rq, &rf);
+	donor = rq->donor;
 
-	curr = rq->curr;
-	psi_account_irqtime(rq, curr, NULL);
+	psi_account_irqtime(rq, donor, NULL);
 
 	update_rq_clock(rq);
 	hw_pressure = arch_scale_hw_pressure(cpu_of(rq));
 	update_hw_load_avg(rq_clock_task(rq), rq, hw_pressure);
-	curr->sched_class->task_tick(rq, curr, 0);
+	donor->sched_class->task_tick(rq, donor, 0);
 	if (sched_feat(LATENCY_WARN))
 		resched_latency = cpu_resched_latency(rq);
 	calc_global_load_tick(rq);
 	sched_core_tick(rq);
-	task_tick_mm_cid(rq, curr);
+	task_tick_mm_cid(rq, donor);
 
 	rq_unlock(rq, &rf);
 
@@ -5473,8 +5476,8 @@ void sched_tick(void)
 
 	perf_event_task_tick();
 
-	if (curr->flags & PF_WQ_WORKER)
-		wq_worker_tick(curr);
+	if (donor->flags & PF_WQ_WORKER)
+		wq_worker_tick(donor);
 
 #ifdef CONFIG_SMP
 	rq->idle_balance = idle_cpu(cpu);
@@ -5539,6 +5542,12 @@ static void sched_tick_remote(struct work_struct *work)
 		struct task_struct *curr = rq->curr;
 
 		if (cpu_online(cpu)) {
+			/*
+			 * Since this is a remote tick for full dynticks mode,
+			 * we are always sure that there is no proxy (only a
+			 * single task is running).
+			 */
+			SCHED_WARN_ON(rq->curr != rq->donor);
 			update_rq_clock(rq);
 
 			if (!is_idle_task(curr)) {
@@ -6488,6 +6497,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 	}
 
 	next = pick_next_task(rq, prev, &rf);
+	rq_set_donor(rq, next);
 	clear_tsk_need_resched(prev);
 	clear_preempt_need_resched();
 #ifdef CONFIG_SCHED_DEBUG
@@ -6984,7 +6994,7 @@ void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task)
 
 	prev_class = p->sched_class;
 	queued = task_on_rq_queued(p);
-	running = task_current(rq, p);
+	running = task_current_donor(rq, p);
 	if (queued)
 		dequeue_task(rq, p, queue_flag);
 	if (running)
@@ -7552,6 +7562,7 @@ void __init init_idle(struct task_struct *idle, int cpu)
 	rcu_read_unlock();
 
 	rq->idle = idle;
+	rq_set_donor(rq, idle);
 	rcu_assign_pointer(rq->curr, idle);
 	idle->on_rq = TASK_ON_RQ_QUEUED;
 #ifdef CONFIG_SMP
@@ -7641,7 +7652,7 @@ void sched_setnuma(struct task_struct *p, int nid)
 
 	rq = task_rq_lock(p, &rf);
 	queued = task_on_rq_queued(p);
-	running = task_current(rq, p);
+	running = task_current_donor(rq, p);
 
 	if (queued)
 		dequeue_task(rq, p, DEQUEUE_SAVE);
@@ -8772,7 +8783,7 @@ void sched_move_task(struct task_struct *tsk)
 
 	update_rq_clock(rq);
 
-	running = task_current(rq, tsk);
+	running = task_current_donor(rq, tsk);
 	queued = task_on_rq_queued(tsk);
 
 	if (queued)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index d3050f6c2958d..c284585f23cba 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1217,7 +1217,7 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 #endif
 
 	enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
-	if (dl_task(rq->curr))
+	if (dl_task(rq->donor))
 		wakeup_preempt_dl(rq, p, 0);
 	else
 		resched_curr(rq);
@@ -1441,11 +1441,11 @@ void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
  */
 static void update_curr_dl(struct rq *rq)
 {
-	struct task_struct *curr = rq->curr;
-	struct sched_dl_entity *dl_se = &curr->dl;
+	struct task_struct *donor = rq->donor;
+	struct sched_dl_entity *dl_se = &donor->dl;
 	s64 delta_exec;
 
-	if (!dl_task(curr) || !on_dl_rq(dl_se))
+	if (!dl_task(donor) || !on_dl_rq(dl_se))
 		return;
 
 	/*
@@ -1903,7 +1903,7 @@ static int find_later_rq(struct task_struct *task);
 static int
 select_task_rq_dl(struct task_struct *p, int cpu, int flags)
 {
-	struct task_struct *curr;
+	struct task_struct *curr, *donor;
 	bool select_rq;
 	struct rq *rq;
 
@@ -1914,6 +1914,7 @@ select_task_rq_dl(struct task_struct *p, int cpu, int flags)
 
 	rcu_read_lock();
 	curr = READ_ONCE(rq->curr); /* unlocked access */
+	donor = READ_ONCE(rq->donor);
 
 	/*
 	 * If we are dealing with a -deadline task, we must
@@ -1924,9 +1925,9 @@ select_task_rq_dl(struct task_struct *p, int cpu, int flags)
 	 * other hand, if it has a shorter deadline, we
 	 * try to make it stay here, it might be important.
 	 */
-	select_rq = unlikely(dl_task(curr)) &&
+	select_rq = unlikely(dl_task(donor)) &&
 		    (curr->nr_cpus_allowed < 2 ||
-		     !dl_entity_preempt(&p->dl, &curr->dl)) &&
+		     !dl_entity_preempt(&p->dl, &donor->dl)) &&
 		    p->nr_cpus_allowed > 1;
 
 	/*
@@ -1989,7 +1990,7 @@ static void check_preempt_equal_dl(struct rq *rq, struct task_struct *p)
 	 * let's hope p can move out.
 	 */
 	if (rq->curr->nr_cpus_allowed == 1 ||
-	    !cpudl_find(&rq->rd->cpudl, rq->curr, NULL))
+	    !cpudl_find(&rq->rd->cpudl, rq->donor, NULL))
 		return;
 
 	/*
@@ -2028,7 +2029,7 @@ static int balance_dl(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
 static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p, int flags)
 {
-	if (dl_entity_preempt(&p->dl, &rq->curr->dl)) {
+	if (dl_entity_preempt(&p->dl, &rq->donor->dl)) {
 		resched_curr(rq);
 		return;
 	}
@@ -2038,7 +2039,7 @@ static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p,
 	 * In the unlikely case current and p have the same deadline
 	 * let us try to decide what's the best thing to do...
 	 */
-	if ((p->dl.deadline == rq->curr->dl.deadline) &&
+	if ((p->dl.deadline == rq->donor->dl.deadline) &&
 	    !test_tsk_need_resched(rq->curr))
 		check_preempt_equal_dl(rq, p);
 #endif /* CONFIG_SMP */
@@ -2070,7 +2071,7 @@ static void set_next_task_dl(struct rq *rq, struct task_struct *p, bool first)
 	if (!first)
 		return;
 
-	if (rq->curr->sched_class != &dl_sched_class)
+	if (rq->donor->sched_class != &dl_sched_class)
 		update_dl_rq_load_avg(rq_clock_pelt(rq), rq, 0);
 
 	deadline_queue_push_tasks(rq);
@@ -2395,8 +2396,8 @@ static int push_dl_task(struct rq *rq)
 	 * can move away, it makes sense to just reschedule
 	 * without going further in pushing next_task.
 	 */
-	if (dl_task(rq->curr) &&
-	    dl_time_before(next_task->dl.deadline, rq->curr->dl.deadline) &&
+	if (dl_task(rq->donor) &&
+	    dl_time_before(next_task->dl.deadline, rq->donor->dl.deadline) &&
 	    rq->curr->nr_cpus_allowed > 1) {
 		resched_curr(rq);
 		return 0;
@@ -2519,7 +2520,7 @@ static void pull_dl_task(struct rq *this_rq)
 			 * deadline than the current task of its runqueue.
 			 */
 			if (dl_time_before(p->dl.deadline,
-					   src_rq->curr->dl.deadline))
+					   src_rq->donor->dl.deadline))
 				goto skip;
 
 			if (is_migration_disabled(p)) {
@@ -2558,9 +2559,9 @@ static void task_woken_dl(struct rq *rq, struct task_struct *p)
 	if (!task_on_cpu(rq, p) &&
 	    !test_tsk_need_resched(rq->curr) &&
 	    p->nr_cpus_allowed > 1 &&
-	    dl_task(rq->curr) &&
+	    dl_task(rq->donor) &&
 	    (rq->curr->nr_cpus_allowed < 2 ||
-	     !dl_entity_preempt(&p->dl, &rq->curr->dl))) {
+	     !dl_entity_preempt(&p->dl, &rq->donor->dl))) {
 		push_dl_tasks(rq);
 	}
 }
@@ -2735,12 +2736,12 @@ static void switched_to_dl(struct rq *rq, struct task_struct *p)
 		return;
 	}
 
-	if (rq->curr != p) {
+	if (rq->donor != p) {
 #ifdef CONFIG_SMP
 		if (p->nr_cpus_allowed > 1 && rq->dl.overloaded)
 			deadline_queue_push_tasks(rq);
 #endif
-		if (dl_task(rq->curr))
+		if (dl_task(rq->donor))
 			wakeup_preempt_dl(rq, p, 0);
 		else
 			resched_curr(rq);
@@ -2769,7 +2770,7 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
 	if (!rq->dl.overloaded)
 		deadline_queue_pull_task(rq);
 
-	if (task_current(rq, p)) {
+	if (task_current_donor(rq, p)) {
 		/*
 		 * If we now have a earlier deadline task than p,
 		 * then reschedule, provided p is still on this
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9057584ec06de..9b8734f861e03 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1140,12 +1140,12 @@ static inline void update_curr_task(struct task_struct *p, s64 delta_exec)
  */
 s64 update_curr_common(struct rq *rq)
 {
-	struct task_struct *curr = rq->curr;
+	struct task_struct *donor = rq->donor;
 	s64 delta_exec;
 
-	delta_exec = update_curr_se(rq, &curr->se);
+	delta_exec = update_curr_se(rq, &donor->se);
 	if (likely(delta_exec > 0))
-		update_curr_task(curr, delta_exec);
+		update_curr_task(donor, delta_exec);
 
 	return delta_exec;
 }
@@ -1177,7 +1177,7 @@ static void update_curr(struct cfs_rq *cfs_rq)
 
 static void update_curr_fair(struct rq *rq)
 {
-	update_curr(cfs_rq_of(&rq->curr->se));
+	update_curr(cfs_rq_of(&rq->donor->se));
 }
 
 static inline void
@@ -6645,7 +6645,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
 		s64 delta = slice - ran;
 
 		if (delta < 0) {
-			if (task_current(rq, p))
+			if (task_current_donor(rq, p))
 				resched_curr(rq);
 			return;
 		}
@@ -6660,12 +6660,12 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
  */
 static void hrtick_update(struct rq *rq)
 {
-	struct task_struct *curr = rq->curr;
+	struct task_struct *donor = rq->donor;
 
-	if (!hrtick_enabled_fair(rq) || curr->sched_class != &fair_sched_class)
+	if (!hrtick_enabled_fair(rq) || donor->sched_class != &fair_sched_class)
 		return;
 
-	hrtick_start_fair(rq, curr);
+	hrtick_start_fair(rq, donor);
 }
 #else /* !CONFIG_SCHED_HRTICK */
 static inline void
@@ -8347,9 +8347,9 @@ static void set_next_buddy(struct sched_entity *se)
  */
 static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int wake_flags)
 {
-	struct task_struct *curr = rq->curr;
-	struct sched_entity *se = &curr->se, *pse = &p->se;
-	struct cfs_rq *cfs_rq = task_cfs_rq(curr);
+	struct task_struct *donor = rq->donor;
+	struct sched_entity *se = &donor->se, *pse = &p->se;
+	struct cfs_rq *cfs_rq = task_cfs_rq(donor);
 	int cse_is_idle, pse_is_idle;
 
 	if (unlikely(se == pse))
@@ -8378,11 +8378,11 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
 	 * prevents us from potentially nominating it as a false LAST_BUDDY
 	 * below.
 	 */
-	if (test_tsk_need_resched(curr))
+	if (test_tsk_need_resched(rq->curr))
 		return;
 
 	/* Idle tasks are by definition preempted by non-idle tasks. */
-	if (unlikely(task_has_idle_policy(curr)) &&
+	if (unlikely(task_has_idle_policy(donor)) &&
 	    likely(!task_has_idle_policy(p)))
 		goto preempt;
 
@@ -9356,7 +9356,7 @@ static bool __update_blocked_others(struct rq *rq, bool *done)
 	 * update_load_avg() can call cpufreq_update_util(). Make sure that RT,
 	 * DL and IRQ signals have been updated before updating CFS.
 	 */
-	curr_class = rq->curr->sched_class;
+	curr_class = rq->donor->sched_class;
 
 	hw_pressure = arch_scale_hw_pressure(cpu_of(rq));
 
@@ -12738,7 +12738,7 @@ prio_changed_fair(struct rq *rq, struct task_struct *p, int oldprio)
 	 * our priority decreased, or if we are not currently running on
 	 * this runqueue and our priority is higher than the current's
 	 */
-	if (task_current(rq, p)) {
+	if (task_current_donor(rq, p)) {
 		if (p->prio > oldprio)
 			resched_curr(rq);
 	} else
@@ -12843,7 +12843,7 @@ static void switched_to_fair(struct rq *rq, struct task_struct *p)
 		 * kick off the schedule if running, otherwise just see
 		 * if we can still preempt the current task.
 		 */
-		if (task_current(rq, p))
+		if (task_current_donor(rq, p))
 			resched_curr(rq);
 		else
 			wakeup_preempt(rq, p, 0);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 9f07f09641f7a..f8b3cb92d6776 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -529,7 +529,7 @@ static void dequeue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
 
 static void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
 {
-	struct task_struct *curr = rq_of_rt_rq(rt_rq)->curr;
+	struct task_struct *donor = rq_of_rt_rq(rt_rq)->donor;
 	struct rq *rq = rq_of_rt_rq(rt_rq);
 	struct sched_rt_entity *rt_se;
 
@@ -543,7 +543,7 @@ static void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
 		else if (!on_rt_rq(rt_se))
 			enqueue_rt_entity(rt_se, 0);
 
-		if (rt_rq->highest_prio.curr < curr->prio)
+		if (rt_rq->highest_prio.curr < donor->prio)
 			resched_curr(rq);
 	}
 }
@@ -999,11 +999,11 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
  */
 static void update_curr_rt(struct rq *rq)
 {
-	struct task_struct *curr = rq->curr;
-	struct sched_rt_entity *rt_se = &curr->rt;
+	struct task_struct *donor = rq->donor;
+	struct sched_rt_entity *rt_se = &donor->rt;
 	s64 delta_exec;
 
-	if (curr->sched_class != &rt_sched_class)
+	if (donor->sched_class != &rt_sched_class)
 		return;
 
 	delta_exec = update_curr_common(rq);
@@ -1542,7 +1542,7 @@ static int find_lowest_rq(struct task_struct *task);
 static int
 select_task_rq_rt(struct task_struct *p, int cpu, int flags)
 {
-	struct task_struct *curr;
+	struct task_struct *curr, *donor;
 	struct rq *rq;
 	bool test;
 
@@ -1554,6 +1554,7 @@ select_task_rq_rt(struct task_struct *p, int cpu, int flags)
 
 	rcu_read_lock();
 	curr = READ_ONCE(rq->curr); /* unlocked access */
+	donor = READ_ONCE(rq->donor);
 
 	/*
 	 * If the current task on @p's runqueue is an RT task, then
@@ -1582,8 +1583,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, int flags)
 	 * systems like big.LITTLE.
 	 */
 	test = curr &&
-	       unlikely(rt_task(curr)) &&
-	       (curr->nr_cpus_allowed < 2 || curr->prio <= p->prio);
+	       unlikely(rt_task(donor)) &&
+	       (curr->nr_cpus_allowed < 2 || donor->prio <= p->prio);
 
 	if (test || !rt_task_fits_capacity(p, cpu)) {
 		int target = find_lowest_rq(p);
@@ -1613,12 +1614,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, int flags)
 
 static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p)
 {
-	/*
-	 * Current can't be migrated, useless to reschedule,
-	 * let's hope p can move out.
-	 */
 	if (rq->curr->nr_cpus_allowed == 1 ||
-	    !cpupri_find(&rq->rd->cpupri, rq->curr, NULL))
+	    !cpupri_find(&rq->rd->cpupri, rq->donor, NULL))
 		return;
 
 	/*
@@ -1661,7 +1658,9 @@ static int balance_rt(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
  */
 static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
 {
-	if (p->prio < rq->curr->prio) {
+	struct task_struct *donor = rq->donor;
+
+	if (p->prio < donor->prio) {
 		resched_curr(rq);
 		return;
 	}
@@ -1679,7 +1678,7 @@ static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
 	 * to move current somewhere else, making room for our non-migratable
 	 * task.
 	 */
-	if (p->prio == rq->curr->prio && !test_tsk_need_resched(rq->curr))
+	if (p->prio == donor->prio && !test_tsk_need_resched(rq->curr))
 		check_preempt_equal_prio(rq, p);
 #endif
 }
@@ -1704,7 +1703,7 @@ static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, bool f
 	 * utilization. We only care of the case where we start to schedule a
 	 * rt task
 	 */
-	if (rq->curr->sched_class != &rt_sched_class)
+	if (rq->donor->sched_class != &rt_sched_class)
 		update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 0);
 
 	rt_queue_push_tasks(rq);
@@ -1976,6 +1975,7 @@ static struct task_struct *pick_next_pushable_task(struct rq *rq)
 
 	BUG_ON(rq->cpu != task_cpu(p));
 	BUG_ON(task_current(rq, p));
+	BUG_ON(task_current_donor(rq, p));
 	BUG_ON(p->nr_cpus_allowed <= 1);
 
 	BUG_ON(!task_on_rq_queued(p));
@@ -2008,7 +2008,7 @@ static int push_rt_task(struct rq *rq, bool pull)
 	 * higher priority than current. If that's the case
 	 * just reschedule current.
 	 */
-	if (unlikely(next_task->prio < rq->curr->prio)) {
+	if (unlikely(next_task->prio < rq->donor->prio)) {
 		resched_curr(rq);
 		return 0;
 	}
@@ -2029,7 +2029,7 @@ static int push_rt_task(struct rq *rq, bool pull)
 		 * Note that the stoppers are masqueraded as SCHED_FIFO
 		 * (cf. sched_set_stop_task()), so we can't rely on rt_task().
 		 */
-		if (rq->curr->sched_class != &rt_sched_class)
+		if (rq->donor->sched_class != &rt_sched_class)
 			return 0;
 
 		cpu = find_lowest_rq(rq->curr);
@@ -2361,7 +2361,7 @@ static void pull_rt_task(struct rq *this_rq)
 			 * p if it is lower in priority than the
 			 * current task on the run queue
 			 */
-			if (p->prio < src_rq->curr->prio)
+			if (p->prio < src_rq->donor->prio)
 				goto skip;
 
 			if (is_migration_disabled(p)) {
@@ -2403,9 +2403,9 @@ static void task_woken_rt(struct rq *rq, struct task_struct *p)
 	bool need_to_push = !task_on_cpu(rq, p) &&
 			    !test_tsk_need_resched(rq->curr) &&
 			    p->nr_cpus_allowed > 1 &&
-			    (dl_task(rq->curr) || rt_task(rq->curr)) &&
+			    (dl_task(rq->donor) || rt_task(rq->donor)) &&
 			    (rq->curr->nr_cpus_allowed < 2 ||
-			     rq->curr->prio <= p->prio);
+			     rq->donor->prio <= p->prio);
 
 	if (need_to_push)
 		push_rt_tasks(rq);
@@ -2489,7 +2489,7 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
 		if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
 			rt_queue_push_tasks(rq);
 #endif /* CONFIG_SMP */
-		if (p->prio < rq->curr->prio && cpu_online(cpu_of(rq)))
+		if (p->prio < rq->donor->prio && cpu_online(cpu_of(rq)))
 			resched_curr(rq);
 	}
 }
@@ -2504,7 +2504,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
 	if (!task_on_rq_queued(p))
 		return;
 
-	if (task_current(rq, p)) {
+	if (task_current_donor(rq, p)) {
 #ifdef CONFIG_SMP
 		/*
 		 * If our priority decreases while running, we
@@ -2530,7 +2530,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
 		 * greater than the current running task
 		 * then reschedule.
 		 */
-		if (p->prio < rq->curr->prio)
+		if (p->prio < rq->donor->prio)
 			resched_curr(rq);
 	}
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 10ef612c078f9..06bca59c2bce5 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1058,7 +1058,10 @@ struct rq {
 	 */
 	unsigned int		nr_uninterruptible;
 
-	struct task_struct __rcu	*curr;
+	union {
+		struct task_struct __rcu *donor; /* Scheduler context */
+		struct task_struct __rcu *curr;  /* Execution context */
+	};
 	struct task_struct	*idle;
 	struct task_struct	*stop;
 	unsigned long		next_balance;
@@ -1254,6 +1257,11 @@ DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
 #define cpu_curr(cpu)		(cpu_rq(cpu)->curr)
 #define raw_rq()		raw_cpu_ptr(&runqueues)
 
+static inline void rq_set_donor(struct rq *rq, struct task_struct *t)
+{
+	/* Do nothing */
+}
+
 #ifdef CONFIG_SCHED_CORE
 static inline struct cpumask *sched_group_span(struct sched_group *sg);
 
@@ -2170,11 +2178,25 @@ static inline u64 global_rt_runtime(void)
 	return (u64)sysctl_sched_rt_runtime * NSEC_PER_USEC;
 }
 
+/*
+ * Is p the current execution context?
+ */
 static inline int task_current(struct rq *rq, struct task_struct *p)
 {
 	return rq->curr == p;
 }
 
+/*
+ * Is p the current scheduling context?
+ *
+ * Note that it might be the current execution context at the same time if
+ * rq->curr == rq->donor == p.
+ */
+static inline int task_current_donor(struct rq *rq, struct task_struct *p)
+{
+	return rq->donor == p;
+}
+
 static inline int task_on_cpu(struct rq *rq, struct task_struct *p)
 {
 #ifdef CONFIG_SMP
@@ -2344,7 +2366,7 @@ struct sched_class {
 
 static inline void put_prev_task(struct rq *rq, struct task_struct *prev)
 {
-	WARN_ON_ONCE(rq->curr != prev);
+	WARN_ON_ONCE(rq->donor != prev);
 	prev->sched_class->put_prev_task(rq, prev);
 }
 
@@ -2436,7 +2458,7 @@ static inline cpumask_t *alloc_user_cpus_ptr(int node)
 
 static inline struct task_struct *get_push_task(struct rq *rq)
 {
-	struct task_struct *p = rq->curr;
+	struct task_struct *p = rq->donor;
 
 	lockdep_assert_rq_held(rq);
 
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index ae1b42775ef95..f0b25e810cbae 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -91,7 +91,7 @@ void set_user_nice(struct task_struct *p, long nice)
 	}
 
 	queued = task_on_rq_queued(p);
-	running = task_current(rq, p);
+	running = task_current_donor(rq, p);
 	if (queued)
 		dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK);
 	if (running)
@@ -784,7 +784,7 @@ int __sched_setscheduler(struct task_struct *p,
 	}
 
 	queued = task_on_rq_queued(p);
-	running = task_current(rq, p);
+	running = task_current_donor(rq, p);
 	if (queued)
 		dequeue_task(rq, p, queue_flags);
 	if (running)
-- 
2.46.0.598.g6f2099f65c-goog
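[Archive note] The donor/curr aliasing that this patch adds to sched.h can be sketched as a tiny userspace model. This is a hypothetical, stripped-down illustration (the names mirror the kernel's, but the struct bodies and the standalone compilation are invented here): while donor and curr share storage in an anonymous union, task_current() and task_current_donor() agree by construction, and rq_set_donor() is deliberately a no-op until a later change actually splits the two fields.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for the kernel's task_struct. */
struct task_struct { int prio; };

/* Mirrors the union introduced in struct rq: the scheduling context
 * (donor) and the execution context (curr) overlay the same storage,
 * so they always alias in this patch. */
struct rq {
	union {
		struct task_struct *donor;	/* scheduler context */
		struct task_struct *curr;	/* execution context */
	};
};

/* Is p the current execution context? */
static int task_current(struct rq *rq, struct task_struct *p)
{
	return rq->curr == p;
}

/* Is p the current scheduling context? While the union is in place
 * this is equivalent to task_current() by construction. */
static int task_current_donor(struct rq *rq, struct task_struct *p)
{
	return rq->donor == p;
}

/* No-op while donor and curr share storage; it only marks the call
 * sites that will matter once the contexts are really split. */
static void rq_set_donor(struct rq *rq, struct task_struct *t)
{
	(void)rq;
	(void)t;
}
```

The design point the model makes explicit: call sites converted to rq->donor or task_current_donor() keep today's behavior bit-for-bit (the union guarantees rq->donor == rq->curr), so the split into separate fields can land later without re-auditing every caller.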