From: Sebastian Andrzej Siewior
To: Peter Zijlstra, linux-kernel@vger.kernel.org
Cc: bigeasy@linutronix.de, tglx@linutronix.de, boqun.feng@gmail.com,
 bristot@redhat.com, bsegall@google.com, dietmar.eggemann@arm.com,
 jstultz@google.com, juri.lelli@redhat.com, longman@redhat.com,
 mgorman@suse.de, mingo@redhat.com, rostedt@goodmis.org, swood@redhat.com,
 vincent.guittot@linaro.org, vschneid@redhat.com, will@kernel.org
Subject: [PATCH v3 1/7] sched: Constrain locks in sched_submit_work()
Date: Fri, 8 Sep 2023 18:22:48 +0200
Message-Id: <20230908162254.999499-2-bigeasy@linutronix.de>
In-Reply-To: <20230908162254.999499-1-bigeasy@linutronix.de>
References: <20230908162254.999499-1-bigeasy@linutronix.de>

From: Peter Zijlstra

Even though sched_submit_work() is run from preemptible context, it is
discouraged to have it use blocking locks due to the recursion
potential. Enforce this.
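
As an illustration only (a minimal userspace sketch with hypothetical
names -- the kernel does this with lockdep's wait-type checks, not a
thread-local flag), the override map marks a region in which taking a
blocking lock becomes a reportable error:

  #include <assert.h>
  #include <pthread.h>
  #include <stdbool.h>

  static __thread bool no_block_zone;

  /* Instrumented blocking lock: trips when used inside the marked
   * region, roughly what lockdep reports on a wait-type violation. */
  static void blocking_lock(pthread_mutex_t *m)
  {
          assert(!no_block_zone);
          pthread_mutex_lock(m);
  }

  static void submit_work_analogue(void)
  {
          no_block_zone = true;   /* lock_map_acquire_try(&sched_map) */
          /* ... nothing called from here may use blocking_lock() ... */
          no_block_zone = false;  /* lock_map_release(&sched_map) */
  }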
Signed-off-by: "Peter Zijlstra (Intel)"
Link: https://lore.kernel.org/r/20230815111430.154558666@infradead.org
Link: https://lore.kernel.org/r/20230825181033.504534-2-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/sched/core.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2299a5cfbfb9e..d55564097bd86 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6720,11 +6720,18 @@ void __noreturn do_task_dead(void)
 
 static inline void sched_submit_work(struct task_struct *tsk)
 {
+	static DEFINE_WAIT_OVERRIDE_MAP(sched_map, LD_WAIT_CONFIG);
 	unsigned int task_flags;
 
 	if (task_is_running(tsk))
 		return;
 
+	/*
+	 * Establish LD_WAIT_CONFIG context to ensure none of the code called
+	 * will use a blocking primitive -- which would lead to recursion.
+	 */
+	lock_map_acquire_try(&sched_map);
+
 	task_flags = tsk->flags;
 	/*
 	 * If a worker goes to sleep, notify and ask workqueue whether it
@@ -6749,6 +6756,8 @@ static inline void sched_submit_work(struct task_struct *tsk)
 	 * make sure to submit it to avoid deadlocks.
 	 */
 	blk_flush_plug(tsk->plug, true);
+
+	lock_map_release(&sched_map);
 }
 
 static void sched_update_worker(struct task_struct *tsk)
-- 
2.40.1

From: Sebastian Andrzej Siewior
To: Peter Zijlstra, linux-kernel@vger.kernel.org
Cc: bigeasy@linutronix.de, tglx@linutronix.de, boqun.feng@gmail.com,
 bristot@redhat.com, bsegall@google.com, dietmar.eggemann@arm.com,
 jstultz@google.com, juri.lelli@redhat.com, longman@redhat.com,
 mgorman@suse.de, mingo@redhat.com, rostedt@goodmis.org, swood@redhat.com,
 vincent.guittot@linaro.org, vschneid@redhat.com, will@kernel.org
Subject: [PATCH v3 2/7] locking/rtmutex: Avoid unconditional slowpath for
 DEBUG_RT_MUTEXES
Date: Fri, 8 Sep 2023 18:22:49 +0200
Message-Id: <20230908162254.999499-3-bigeasy@linutronix.de>
In-Reply-To: <20230908162254.999499-1-bigeasy@linutronix.de>
References: <20230908162254.999499-1-bigeasy@linutronix.de>

With DEBUG_RT_MUTEXES enabled the fast-path rt_mutex_cmpxchg_acquire()
always fails and all lock operations take the slow path.

Provide a new helper, rt_mutex_try_acquire(), which maps to
rt_mutex_cmpxchg_acquire() in the non-debug case. For the debug case
it invokes rt_mutex_slowtrylock(), which can acquire a non-contended
rtmutex under full debug coverage.
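
For reference, the fast path this helper preserves in the non-debug
case has the following shape -- a self-contained userspace sketch using
C11 atomics and a stand-in task type (the kernel's try_cmpxchg_acquire()
is the arch-optimized equivalent):

  #include <stdatomic.h>
  #include <stdbool.h>
  #include <stddef.h>

  struct task;                            /* stand-in for task_struct */

  struct rtm {
          _Atomic(struct task *) owner;   /* NULL == unlocked */
  };

  /* Claim an uncontended lock by installing ourselves as owner;
   * acquire ordering on success pairs with the release on unlock. */
  static bool try_acquire(struct rtm *lock, struct task *me)
  {
          struct task *expected = NULL;

          return atomic_compare_exchange_strong_explicit(
                          &lock->owner, &expected, me,
                          memory_order_acquire, memory_order_relaxed);
  }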
Signed-off-by: Thomas Gleixner
Signed-off-by: "Peter Zijlstra (Intel)"
Link: https://lkml.kernel.org/r/20230427111937.2745231-4-bigeasy@linutronix.de
Link: https://lore.kernel.org/r/20230815111430.220899937@infradead.org
Link: https://lore.kernel.org/r/20230825181033.504534-3-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/locking/rtmutex.c     | 21 ++++++++++++++++++++-
 kernel/locking/ww_rt_mutex.c |  2 +-
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 21db0df0eb000..bcec0533a0cc0 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -218,6 +218,11 @@ static __always_inline bool rt_mutex_cmpxchg_acquire(struct rt_mutex_base *lock,
 	return try_cmpxchg_acquire(&lock->owner, &old, new);
 }
 
+static __always_inline bool rt_mutex_try_acquire(struct rt_mutex_base *lock)
+{
+	return rt_mutex_cmpxchg_acquire(lock, NULL, current);
+}
+
 static __always_inline bool rt_mutex_cmpxchg_release(struct rt_mutex_base *lock,
 						     struct task_struct *old,
 						     struct task_struct *new)
@@ -297,6 +302,20 @@ static __always_inline bool rt_mutex_cmpxchg_acquire(struct rt_mutex_base *lock,
 
 }
 
+static int __sched rt_mutex_slowtrylock(struct rt_mutex_base *lock);
+
+static __always_inline bool rt_mutex_try_acquire(struct rt_mutex_base *lock)
+{
+	/*
+	 * With debug enabled rt_mutex_cmpxchg_acquire() will always fail.
+	 *
+	 * Avoid unconditionally taking the slow path by using
+	 * rt_mutex_slowtrylock() which is covered by the debug code and can
+	 * acquire a non-contended rtmutex.
+	 */
+	return rt_mutex_slowtrylock(lock);
+}
+
 static __always_inline bool rt_mutex_cmpxchg_release(struct rt_mutex_base *lock,
 						     struct task_struct *old,
 						     struct task_struct *new)
@@ -1755,7 +1774,7 @@ static int __sched rt_mutex_slowlock(struct rt_mutex_base *lock,
 static __always_inline int __rt_mutex_lock(struct rt_mutex_base *lock,
 					   unsigned int state)
 {
-	if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
+	if (likely(rt_mutex_try_acquire(lock)))
 		return 0;
 
 	return rt_mutex_slowlock(lock, NULL, state);
diff --git a/kernel/locking/ww_rt_mutex.c b/kernel/locking/ww_rt_mutex.c
index d1473c624105c..c7196de838edc 100644
--- a/kernel/locking/ww_rt_mutex.c
+++ b/kernel/locking/ww_rt_mutex.c
@@ -62,7 +62,7 @@ __ww_rt_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ww_ctx,
 	}
 	mutex_acquire_nest(&rtm->dep_map, 0, 0, nest_lock, ip);
 
-	if (likely(rt_mutex_cmpxchg_acquire(&rtm->rtmutex, NULL, current))) {
+	if (likely(rt_mutex_try_acquire(&rtm->rtmutex))) {
 		if (ww_ctx)
 			ww_mutex_set_context_fastpath(lock, ww_ctx);
 		return 0;
-- 
2.40.1
From: Sebastian Andrzej Siewior
To: Peter Zijlstra, linux-kernel@vger.kernel.org
Cc: bigeasy@linutronix.de, tglx@linutronix.de, boqun.feng@gmail.com,
 bristot@redhat.com, bsegall@google.com, dietmar.eggemann@arm.com,
 jstultz@google.com, juri.lelli@redhat.com, longman@redhat.com,
 mgorman@suse.de, mingo@redhat.com, rostedt@goodmis.org, swood@redhat.com,
 vincent.guittot@linaro.org, vschneid@redhat.com, will@kernel.org
Subject: [PATCH v3 3/7] sched: Extract __schedule_loop()
Date: Fri, 8 Sep 2023 18:22:50 +0200
Message-Id: <20230908162254.999499-4-bigeasy@linutronix.de>
In-Reply-To: <20230908162254.999499-1-bigeasy@linutronix.de>
References: <20230908162254.999499-1-bigeasy@linutronix.de>

From: Thomas Gleixner

There are currently two implementations of this basic __schedule()
loop, and there is soon to be a third.

Signed-off-by: Thomas Gleixner
Signed-off-by: "Peter Zijlstra (Intel)"
Link: https://lkml.kernel.org/r/20230427111937.2745231-2-bigeasy@linutronix.de
Link: https://lore.kernel.org/r/20230815111430.288063671@infradead.org
Link: https://lore.kernel.org/r/20230825181033.504534-4-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/sched/core.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d55564097bd86..1ea7ba53aad24 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6770,16 +6770,21 @@ static void sched_update_worker(struct task_struct *tsk)
 	}
 }
 
+static __always_inline void __schedule_loop(unsigned int sched_mode)
+{
+	do {
+		preempt_disable();
+		__schedule(sched_mode);
+		sched_preempt_enable_no_resched();
+	} while (need_resched());
+}
+
 asmlinkage __visible void __sched schedule(void)
 {
 	struct task_struct *tsk = current;
 
 	sched_submit_work(tsk);
-	do {
-		preempt_disable();
-		__schedule(SM_NONE);
-		sched_preempt_enable_no_resched();
-	} while (need_resched());
+	__schedule_loop(SM_NONE);
 	sched_update_worker(tsk);
 }
 EXPORT_SYMBOL(schedule);
@@ -6843,11 +6848,7 @@ void __sched schedule_preempt_disabled(void)
 #ifdef CONFIG_PREEMPT_RT
 void __sched notrace schedule_rtlock(void)
 {
-	do {
-		preempt_disable();
-		__schedule(SM_RTLOCK_WAIT);
-		sched_preempt_enable_no_resched();
-	} while (need_resched());
+	__schedule_loop(SM_RTLOCK_WAIT);
 }
 NOKPROBE_SYMBOL(schedule_rtlock);
 #endif
-- 
2.40.1

From: Sebastian Andrzej Siewior
To: Peter Zijlstra, linux-kernel@vger.kernel.org
Cc: bigeasy@linutronix.de, tglx@linutronix.de, boqun.feng@gmail.com,
 bristot@redhat.com, bsegall@google.com, dietmar.eggemann@arm.com,
 jstultz@google.com, juri.lelli@redhat.com, longman@redhat.com,
 mgorman@suse.de, mingo@redhat.com, rostedt@goodmis.org, swood@redhat.com,
 vincent.guittot@linaro.org, vschneid@redhat.com, will@kernel.org
Subject: [PATCH v3 4/7] sched: Provide rt_mutex specific scheduler helpers
Date: Fri, 8 Sep 2023 18:22:51 +0200
Message-Id: <20230908162254.999499-5-bigeasy@linutronix.de>
In-Reply-To: <20230908162254.999499-1-bigeasy@linutronix.de>
References: <20230908162254.999499-1-bigeasy@linutronix.de>

From: Peter Zijlstra

With PREEMPT_RT there is a rt_mutex recursion problem where
sched_submit_work() can use an rtlock (aka spinlock_t). More
specifically what happens is:

  mutex_lock() /* really rt_mutex */
    ...
      __rt_mutex_slowlock_locked()
        task_blocks_on_rt_mutex()
          // enqueue current task as waiter
          // do PI chain walk
        rt_mutex_slowlock_block()
          schedule()
            sched_submit_work()
              ...
              spin_lock() /* really rtlock */
                ...
                  __rt_mutex_slowlock_locked()
                    task_blocks_on_rt_mutex()
                      // enqueue current task as waiter *AGAIN*
                      // *CONFUSION*

Fix this by making rt_mutex do the sched_submit_work() early, before
it enqueues itself as a waiter -- before it even knows *if* it will
wait.

[[ basically Thomas' patch but with different naming and a few asserts
   added ]]
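
The contract the three new helpers establish can be sketched as
self-contained userspace code (hypothetical names throughout; a
thread-local flag stands in for tsk->sched_rt_mutex and plain asserts
for lockdep_assert()):

  #include <assert.h>
  #include <stdbool.h>

  static __thread bool sched_rt_mutex_flag;

  /* Swap in a new value, return the old one -- same idea as the
   * fetch_and_set() macro the patch adds (thread-local here, so no
   * atomicity needed). */
  static bool fetch_and_set(bool *x, bool v)
  {
          bool old = *x;

          *x = v;
          return old;
  }

  void rt_mutex_pre_schedule_sketch(void)
  {
          assert(!fetch_and_set(&sched_rt_mutex_flag, true)); /* no nesting */
          /* sched_submit_work(current): flush work *before* blocking */
  }

  void rt_mutex_schedule_sketch(void)
  {
          assert(sched_rt_mutex_flag);  /* only inside pre/post section */
          /* __schedule_loop(SM_NONE) */
  }

  void rt_mutex_post_schedule_sketch(void)
  {
          /* sched_update_worker(current) */
          assert(fetch_and_set(&sched_rt_mutex_flag, false)); /* was set */
  }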
Originally-by: Thomas Gleixner
Signed-off-by: "Peter Zijlstra (Intel)"
Link: https://lore.kernel.org/r/20230815111430.355375399@infradead.org
Link: https://lore.kernel.org/r/20230825181033.504534-5-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior
---
 include/linux/sched.h    |  3 +++
 include/linux/sched/rt.h |  4 ++++
 kernel/sched/core.c      | 36 ++++++++++++++++++++++++++++++++----
 3 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 77f01ac385f7a..67623ffd4a8ea 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -911,6 +911,9 @@ struct task_struct {
 	 * ->sched_remote_wakeup gets used, so it can be in this word.
 	 */
 	unsigned			sched_remote_wakeup:1;
+#ifdef CONFIG_RT_MUTEXES
+	unsigned			sched_rt_mutex:1;
+#endif
 
 	/* Bit to tell LSMs we're in execve(): */
 	unsigned			in_execve:1;
diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index 994c25640e156..b2b9e6eb96830 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -30,6 +30,10 @@ static inline bool task_is_realtime(struct task_struct *tsk)
 }
 
 #ifdef CONFIG_RT_MUTEXES
+extern void rt_mutex_pre_schedule(void);
+extern void rt_mutex_schedule(void);
+extern void rt_mutex_post_schedule(void);
+
 /*
  * Must hold either p->pi_lock or task_rq(p)->lock.
  */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1ea7ba53aad24..58d0346d1bb3b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6723,9 +6723,6 @@ static inline void sched_submit_work(struct task_struct *tsk)
 	static DEFINE_WAIT_OVERRIDE_MAP(sched_map, LD_WAIT_CONFIG);
 	unsigned int task_flags;
 
-	if (task_is_running(tsk))
-		return;
-
 	/*
 	 * Establish LD_WAIT_CONFIG context to ensure none of the code called
 	 * will use a blocking primitive -- which would lead to recursion.
@@ -6783,7 +6780,12 @@ asmlinkage __visible void __sched schedule(void)
 {
 	struct task_struct *tsk = current;
 
-	sched_submit_work(tsk);
+#ifdef CONFIG_RT_MUTEXES
+	lockdep_assert(!tsk->sched_rt_mutex);
+#endif
+
+	if (!task_is_running(tsk))
+		sched_submit_work(tsk);
 	__schedule_loop(SM_NONE);
 	sched_update_worker(tsk);
 }
@@ -7044,6 +7046,32 @@ static void __setscheduler_prio(struct task_struct *p, int prio)
 
 #ifdef CONFIG_RT_MUTEXES
 
+/*
+ * Would be more useful with typeof()/auto_type but they don't mix with
+ * bit-fields. Since it's a local thing, use int. Keep the generic sounding
+ * name such that if someone were to implement this function we get to compare
+ * notes.
+ */
+#define fetch_and_set(x, v) ({ int _x = (x); (x) = (v); _x; })
+
+void rt_mutex_pre_schedule(void)
+{
+	lockdep_assert(!fetch_and_set(current->sched_rt_mutex, 1));
+	sched_submit_work(current);
+}
+
+void rt_mutex_schedule(void)
+{
+	lockdep_assert(current->sched_rt_mutex);
+	__schedule_loop(SM_NONE);
+}
+
+void rt_mutex_post_schedule(void)
+{
+	sched_update_worker(current);
+	lockdep_assert(fetch_and_set(current->sched_rt_mutex, 0));
+}
+
 static inline int __rt_effective_prio(struct task_struct *pi_task, int prio)
 {
 	if (pi_task)
-- 
2.40.1

From: Sebastian Andrzej Siewior
To: Peter Zijlstra, linux-kernel@vger.kernel.org
Cc: bigeasy@linutronix.de, tglx@linutronix.de, boqun.feng@gmail.com,
 bristot@redhat.com, bsegall@google.com, dietmar.eggemann@arm.com,
 jstultz@google.com, juri.lelli@redhat.com, longman@redhat.com,
 mgorman@suse.de, mingo@redhat.com, rostedt@goodmis.org, swood@redhat.com,
 vincent.guittot@linaro.org, vschneid@redhat.com, will@kernel.org
Subject: [PATCH v3 5/7] locking/rtmutex: Use rt_mutex specific scheduler
 helpers
Date: Fri, 8 Sep 2023 18:22:52 +0200
Message-Id: <20230908162254.999499-6-bigeasy@linutronix.de>
In-Reply-To: <20230908162254.999499-1-bigeasy@linutronix.de>
References: <20230908162254.999499-1-bigeasy@linutronix.de>

Have rt_mutex use the rt_mutex specific scheduler helpers to avoid
recursion vs rtlock on the PI state.

[[ peterz: adapted to new names ]]
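
Why the ordering matters, as a minimal sketch (hypothetical names; a
thread-local flag stands in for the pi_blocked_on waiter state):

  #include <assert.h>
  #include <stdbool.h>

  static __thread bool queued_as_waiter;  /* ~ current->pi_blocked_on */

  /* Enqueueing twice is the failure mode being avoided: the second
   * enqueue would clobber the PI state of the first. */
  static void enqueue_as_waiter(void)
  {
          assert(!queued_as_waiter);      /* the *CONFUSION* of patch 4 */
          queued_as_waiter = true;
  }

  static void submit_work(void)
  {
          /* May take an rtlock; with the old ordering that rtlock's
           * slow path would call enqueue_as_waiter() again. */
  }

  static void slowlock_shape(void)
  {
          submit_work();          /* rt_mutex_pre_schedule(): done early */
          enqueue_as_waiter();    /* only now may we block on the lock */
          /* ... rt_mutex_schedule(), acquire, rt_mutex_post_schedule() */
          queued_as_waiter = false;
  }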
Reported-by: Crystal Wood
Signed-off-by: "Peter Zijlstra (Intel)"
Link: https://lore.kernel.org/r/20230815111430.421408298@infradead.org
Link: https://lore.kernel.org/r/20230825181033.504534-6-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/futex/pi.c            | 11 +++++++++++
 kernel/locking/rtmutex.c     | 14 ++++++++++++--
 kernel/locking/rwbase_rt.c   |  6 ++++++
 kernel/locking/rwsem.c       |  8 +++++++-
 kernel/locking/spinlock_rt.c |  4 ++++
 5 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c
index ce2889f123755..f8e65b27d9d6b 100644
--- a/kernel/futex/pi.c
+++ b/kernel/futex/pi.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
 
 #include <linux/slab.h>
+#include <linux/sched/rt.h>
 #include <linux/sched/task.h>
 
 #include "futex.h"
@@ -1002,6 +1003,12 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int trylock,
 		goto no_block;
 	}
 
+	/*
+	 * Must be done before we enqueue the waiter, here is unfortunately
+	 * under the hb lock, but that *should* work because it does nothing.
+	 */
+	rt_mutex_pre_schedule();
+
 	rt_mutex_init_waiter(&rt_waiter);
 
 	/*
@@ -1052,6 +1059,10 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int trylock,
 	if (ret && !rt_mutex_cleanup_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter))
 		ret = 0;
 
+	/*
+	 * Waiter is unqueued.
+	 */
+	rt_mutex_post_schedule();
 no_block:
 	/*
 	 * Fixup the pi_state owner and possibly acquire the lock if we
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index bcec0533a0cc0..a3fe05dfd0d8f 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1632,7 +1632,7 @@ static int __sched rt_mutex_slowlock_block(struct rt_mutex_base *lock,
 		raw_spin_unlock_irq(&lock->wait_lock);
 
 		if (!owner || !rtmutex_spin_on_owner(lock, waiter, owner))
-			schedule();
+			rt_mutex_schedule();
 
 		raw_spin_lock_irq(&lock->wait_lock);
 		set_current_state(state);
@@ -1661,7 +1661,7 @@ static void __sched rt_mutex_handle_deadlock(int res, int detect_deadlock,
 	WARN(1, "rtmutex deadlock detected\n");
 	while (1) {
 		set_current_state(TASK_INTERRUPTIBLE);
-		schedule();
+		rt_mutex_schedule();
 	}
 }
 
@@ -1756,6 +1756,15 @@ static int __sched rt_mutex_slowlock(struct rt_mutex_base *lock,
 	unsigned long flags;
 	int ret;
 
+	/*
+	 * Do all pre-schedule work here, before we queue a waiter and invoke
+	 * PI -- any such work that trips on rtlock (PREEMPT_RT spinlock) would
+	 * otherwise recurse back into task_blocks_on_rt_mutex() through
+	 * rtlock_slowlock() and will then enqueue a second waiter for this
+	 * same task and things get really confusing real fast.
+	 */
+	rt_mutex_pre_schedule();
+
 	/*
 	 * Technically we could use raw_spin_[un]lock_irq() here, but this can
 	 * be called in early boot if the cmpxchg() fast path is disabled
@@ -1767,6 +1776,7 @@ static int __sched rt_mutex_slowlock(struct rt_mutex_base *lock,
 	raw_spin_lock_irqsave(&lock->wait_lock, flags);
 	ret = __rt_mutex_slowlock_locked(lock, ww_ctx, state);
 	raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
+	rt_mutex_post_schedule();
 
 	return ret;
 }
diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index 25ec0239477c2..c7258cb32d91b 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -71,6 +71,7 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
 	struct rt_mutex_base *rtm = &rwb->rtmutex;
 	int ret;
 
+	rwbase_pre_schedule();
 	raw_spin_lock_irq(&rtm->wait_lock);
 
 	/*
@@ -125,6 +126,7 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
 	rwbase_rtmutex_unlock(rtm);
 
 	trace_contention_end(rwb, ret);
+	rwbase_post_schedule();
 	return ret;
 }
 
@@ -237,6 +239,8 @@ static int __sched rwbase_write_lock(struct rwbase_rt *rwb,
 	/* Force readers into slow path */
 	atomic_sub(READER_BIAS, &rwb->readers);
 
+	rwbase_pre_schedule();
+
 	raw_spin_lock_irqsave(&rtm->wait_lock, flags);
 	if (__rwbase_write_trylock(rwb))
 		goto out_unlock;
@@ -248,6 +252,7 @@ static int __sched rwbase_write_lock(struct rwbase_rt *rwb,
 		if (rwbase_signal_pending_state(state, current)) {
 			rwbase_restore_current_state();
 			__rwbase_write_unlock(rwb, 0, flags);
+			rwbase_post_schedule();
 			trace_contention_end(rwb, -EINTR);
 			return -EINTR;
 		}
@@ -266,6 +271,7 @@ static int __sched rwbase_write_lock(struct rwbase_rt *rwb,
 
 out_unlock:
 	raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
+	rwbase_post_schedule();
 	return 0;
 }
 
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 9eabd585ce7af..2340b6d90ec6f 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1427,8 +1427,14 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 #define rwbase_signal_pending_state(state, current)	\
 	signal_pending_state(state, current)
 
+#define rwbase_pre_schedule()				\
+	rt_mutex_pre_schedule()
+
 #define rwbase_schedule()				\
-	schedule()
+	rt_mutex_schedule()
+
+#define rwbase_post_schedule()				\
+	rt_mutex_post_schedule()
 
 #include "rwbase_rt.c"
 
diff --git a/kernel/locking/spinlock_rt.c b/kernel/locking/spinlock_rt.c
index 48a19ed8486d8..842037b2ba548 100644
--- a/kernel/locking/spinlock_rt.c
+++ b/kernel/locking/spinlock_rt.c
@@ -184,9 +184,13 @@ static __always_inline int rwbase_rtmutex_trylock(struct rt_mutex_base *rtm)
 
 #define rwbase_signal_pending_state(state, current)	(0)
 
+#define rwbase_pre_schedule()
+
 #define rwbase_schedule()				\
 	schedule_rtlock()
 
+#define rwbase_post_schedule()
+
 #include "rwbase_rt.c"
 /*
  * The common functions which get wrapped into the rwlock API.
-- 
2.40.1

From: Sebastian Andrzej Siewior
To: Peter Zijlstra, linux-kernel@vger.kernel.org
Cc: bigeasy@linutronix.de, tglx@linutronix.de, boqun.feng@gmail.com,
 bristot@redhat.com, bsegall@google.com, dietmar.eggemann@arm.com,
 jstultz@google.com, juri.lelli@redhat.com, longman@redhat.com,
 mgorman@suse.de, mingo@redhat.com, rostedt@goodmis.org, swood@redhat.com,
 vincent.guittot@linaro.org, vschneid@redhat.com, will@kernel.org
Subject: [PATCH v3 6/7] locking/rtmutex: Add a lockdep assert to catch
 potential nested blocking
Date: Fri, 8 Sep 2023 18:22:53 +0200
Message-Id: <20230908162254.999499-7-bigeasy@linutronix.de>
In-Reply-To: <20230908162254.999499-1-bigeasy@linutronix.de>
References: <20230908162254.999499-1-bigeasy@linutronix.de>

From: Thomas Gleixner

There used to be a BUG_ON(current->pi_blocked_on) in the lock
acquisition functions, but that vanished in one of the rtmutex
overhauls.

Bring it back in form of a lockdep assert to catch code paths which
take rtmutex based locks with current::pi_blocked_on != NULL.

Reported-by: Crystal Wood
Signed-off-by: Thomas Gleixner
Signed-off-by: "Peter Zijlstra (Intel)"
Link: https://lkml.kernel.org/r/20230427111937.2745231-5-bigeasy@linutronix.de
Link: https://lore.kernel.org/r/20230815111430.488430699@infradead.org
Link: https://lore.kernel.org/r/20230825181033.504534-7-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/locking/rtmutex.c     | 2 ++
 kernel/locking/rwbase_rt.c   | 2 ++
 kernel/locking/spinlock_rt.c | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index a3fe05dfd0d8f..4a10e8c16fd2b 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1784,6 +1784,8 @@ static int __sched rt_mutex_slowlock(struct rt_mutex_base *lock,
 static __always_inline int __rt_mutex_lock(struct rt_mutex_base *lock,
 					   unsigned int state)
 {
+	lockdep_assert(!current->pi_blocked_on);
+
 	if (likely(rt_mutex_try_acquire(lock)))
 		return 0;
 
diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index c7258cb32d91b..34a59569db6be 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -133,6 +133,8 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
 static __always_inline int rwbase_read_lock(struct rwbase_rt *rwb,
 					    unsigned int state)
 {
+	lockdep_assert(!current->pi_blocked_on);
+
 	if (rwbase_read_trylock(rwb))
 		return 0;
 
diff --git a/kernel/locking/spinlock_rt.c b/kernel/locking/spinlock_rt.c
index 842037b2ba548..38e292454fccb 100644
--- a/kernel/locking/spinlock_rt.c
+++ b/kernel/locking/spinlock_rt.c
@@ -37,6 +37,8 @@
 
 static __always_inline void rtlock_lock(struct rt_mutex_base *rtm)
 {
+	lockdep_assert(!current->pi_blocked_on);
+
 	if (unlikely(!rt_mutex_cmpxchg_acquire(rtm, NULL, current)))
 		rtlock_slowlock(rtm);
 }
-- 
2.40.1
From: Sebastian Andrzej Siewior
To: Peter Zijlstra, linux-kernel@vger.kernel.org
Cc: bigeasy@linutronix.de, tglx@linutronix.de, boqun.feng@gmail.com,
 bristot@redhat.com, bsegall@google.com, dietmar.eggemann@arm.com,
 jstultz@google.com, juri.lelli@redhat.com, longman@redhat.com,
 mgorman@suse.de, mingo@redhat.com, rostedt@goodmis.org, swood@redhat.com,
 vincent.guittot@linaro.org, vschneid@redhat.com, will@kernel.org
Subject: [PATCH v3 7/7] locking/rtmutex: Acquire the hb lock via trylock
 after wait-proxylock.
Date: Fri, 8 Sep 2023 18:22:54 +0200
Message-Id: <20230908162254.999499-8-bigeasy@linutronix.de>
In-Reply-To: <20230908162254.999499-1-bigeasy@linutronix.de>
References: <20230908162254.999499-1-bigeasy@linutronix.de>

After rt_mutex_wait_proxy_lock() task_struct::pi_blocked_on is cleared
if current owns the lock. If the operation has been interrupted by a
signal or timeout then pi_blocked_on can be set. This means spin_lock()
*can* overwrite pi_blocked_on on PREEMPT_RT. This has been noticed by
the recently added lockdep asserts…

The rt_mutex_cleanup_proxy_lock() operation will clear pi_blocked_on
(and update pending waiters as expected) but it must happen under the
hb lock to ensure the same state in rtmutex and userland.

Given all the possibilities it is probably the simplest option to
try-lock the hb lock. In case the lock is occupied a quick nap is
needed. A busy loop can lock up the system if performed by a task with
high priority, preventing the owner from running.

The rt_mutex_post_schedule() needs to be put before the try-lock loop
because otherwise the schedule() in schedule_hrtimeout() would trip
over the !sched_rt_mutex assert.

Introduce futex_trylock_hblock() to try-lock the hb lock and sleep
until the try-lock operation succeeds. Use it after
rt_mutex_wait_proxy_lock() to acquire the lock.
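
The try-lock-and-nap pattern itself, as a self-contained userspace
analogue (pthreads and nanosleep standing in for spin_trylock() and
schedule_hrtimeout(); hypothetical function name):

  #include <pthread.h>
  #include <time.h>

  /* Sleep-and-retry trylock: never busy-spins, so even the highest
   * priority task cannot starve the lock owner -- the same reasoning
   * as the 1ms schedule_hrtimeout() nap in futex_trylock_hblock(). */
  static void trylock_hblock_analogue(pthread_mutex_t *lock)
  {
          /* 1ms, mirroring ktime_set(0, NSEC_PER_MSEC) */
          const struct timespec chill = { .tv_sec = 0, .tv_nsec = 1000000 };

          while (pthread_mutex_trylock(lock) != 0)
                  nanosleep(&chill, NULL);  /* leave the CPU, then retry */
  }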
Suggested-by: Thomas Gleixner
Link: https://lore.kernel.org/r/20230831095314.fTliy0Bh@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior
Reviewed-by: Sebastian Andrzej Siewior
Reviewed-by: Thomas Gleixner
Tested-by: Sebastian Andrzej Siewior
---
 kernel/futex/futex.h     | 23 +++++++++++++++++++++++
 kernel/futex/pi.c        | 10 ++++------
 kernel/futex/requeue.c   |  4 ++--
 kernel/locking/rtmutex.c |  7 +++++--
 4 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index b5379c0e6d6d1..58fbc34a59811 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -254,6 +254,29 @@ double_unlock_hb(struct futex_hash_bucket *hb1, struct futex_hash_bucket *hb2)
 	spin_unlock(&hb2->lock);
 }
 
+static inline void futex_trylock_hblock(spinlock_t *lock)
+{
+	do {
+		ktime_t chill_time;
+
+		/*
+		 * Current is no longer pi_blocked_on if it owns the lock. It
+		 * can still have pi_blocked_on set if the lock acquiring was
+		 * interrupted by signal or timeout. The trylock operation does
+		 * not clobber pi_blocked_on so it is the only option.
+		 * Should the try-lock operation fail then it needs to leave
+		 * the CPU to avoid a busy loop in case it is the task with the
+		 * highest priority.
+		 */
+		if (spin_trylock(lock))
+			return;
+
+		chill_time = ktime_set(0, NSEC_PER_MSEC);
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		schedule_hrtimeout(&chill_time, HRTIMER_MODE_REL_HARD);
+	} while (1);
+}
+
 /* syscalls */
 
 extern int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags, u32
diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c
index f8e65b27d9d6b..1440fdcdbfd8c 100644
--- a/kernel/futex/pi.c
+++ b/kernel/futex/pi.c
@@ -1046,7 +1046,10 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int trylock,
 	ret = rt_mutex_wait_proxy_lock(&q.pi_state->pi_mutex, to, &rt_waiter);
 
 cleanup:
-	spin_lock(q.lock_ptr);
+	rt_mutex_post_schedule();
+
+	futex_trylock_hblock(q.lock_ptr);
+
 	/*
 	 * If we failed to acquire the lock (deadlock/signal/timeout), we must
 	 * first acquire the hb->lock before removing the lock from the
@@ -1058,11 +1061,6 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int trylock,
 	 */
 	if (ret && !rt_mutex_cleanup_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter))
 		ret = 0;
-
-	/*
-	 * Waiter is unqueued.
-	 */
-	rt_mutex_post_schedule();
 no_block:
 	/*
 	 * Fixup the pi_state owner and possibly acquire the lock if we
diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c
index cba8b1a6a4cc2..26888cfa74449 100644
--- a/kernel/futex/requeue.c
+++ b/kernel/futex/requeue.c
@@ -850,8 +850,8 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
 		pi_mutex = &q.pi_state->pi_mutex;
 		ret = rt_mutex_wait_proxy_lock(pi_mutex, to, &rt_waiter);
 
-		/* Current is not longer pi_blocked_on */
-		spin_lock(q.lock_ptr);
+		futex_trylock_hblock(q.lock_ptr);
+
 		if (ret && !rt_mutex_cleanup_proxy_lock(pi_mutex, &rt_waiter))
 			ret = 0;
 
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 4a10e8c16fd2b..e56585ef489c8 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1166,10 +1166,13 @@ try_to_take_rt_mutex(struct rt_mutex_base *lock, struct task_struct *task,
 	 * Clear @task->pi_blocked_on. Requires protection by
 	 * @task->pi_lock. Redundant operation for the @waiter == NULL
 	 * case, but conditionals are more expensive than a redundant
-	 * store.
+	 * store. But then there is FUTEX and if rt_mutex_wait_proxy_lock()
+	 * did not acquire the lock it try-locks another lock before it clears
+	 * @task->pi_blocked_on so we must not clear it here prematurely.
 	 */
 	raw_spin_lock(&task->pi_lock);
-	task->pi_blocked_on = NULL;
+	if (waiter)
+		task->pi_blocked_on = NULL;
 	/*
 	 * Finish the lock acquisition. @task is the new owner. If
 	 * other waiters exist we have to insert the highest priority
-- 
2.40.1