From: Sebastian Andrzej Siewior
To: Peter Zijlstra, linux-kernel@vger.kernel.org
Cc: bigeasy@linutronix.de, tglx@linutronix.de, boqun.feng@gmail.com,
 bristot@redhat.com, bsegall@google.com, dietmar.eggemann@arm.com,
 jstultz@google.com, juri.lelli@redhat.com, longman@redhat.com,
 mgorman@suse.de, mingo@redhat.com, rostedt@goodmis.org, swood@redhat.com,
 vincent.guittot@linaro.org, vschneid@redhat.com, will@kernel.org
Subject: [PATCH v2 1/6] sched: Constrain locks in sched_submit_work()
Date: Fri, 25 Aug 2023 20:10:28 +0200
Message-Id: <20230825181033.504534-2-bigeasy@linutronix.de>
In-Reply-To: <20230825181033.504534-1-bigeasy@linutronix.de>
References: <20230825181033.504534-1-bigeasy@linutronix.de>

From: Peter Zijlstra

Even though sched_submit_work() is run from preemptible context, it is
discouraged to have it use blocking locks due to the recursion
potential. Enforce this.

Signed-off-by: "Peter Zijlstra (Intel)"
Link: https://lore.kernel.org/r/20230815111430.154558666@infradead.org
Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/sched/core.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c52c2eba7c739..bca53ff9d8182 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6735,11 +6735,18 @@ void __noreturn do_task_dead(void)
 
 static inline void sched_submit_work(struct task_struct *tsk)
 {
+	static DEFINE_WAIT_OVERRIDE_MAP(sched_map, LD_WAIT_CONFIG);
 	unsigned int task_flags;
 
 	if (task_is_running(tsk))
 		return;
 
+	/*
+	 * Establish LD_WAIT_CONFIG context to ensure none of the code called
+	 * will use a blocking primitive -- which would lead to recursion.
+	 */
+	lock_map_acquire_try(&sched_map);
+
 	task_flags = tsk->flags;
 	/*
 	 * If a worker goes to sleep, notify and ask workqueue whether it
@@ -6764,6 +6771,8 @@ static inline void sched_submit_work(struct task_struct *tsk)
 	 * make sure to submit it to avoid deadlocks.
 	 */
 	blk_flush_plug(tsk->plug, true);
+
+	lock_map_release(&sched_map);
 }
 
 static void sched_update_worker(struct task_struct *tsk)
-- 
2.40.1
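For readers without a lockdep-instrumented kernel at hand, the enforcement
idea above can be sketched in plain C: mark a region as "no blocking
allowed" in thread-local state and assert in a blocking-lock wrapper. This
is a minimal userspace analogue, not the lockdep wait-type mechanism
itself; the names no_block_depth and checked_lock() are invented for
illustration.

	#include <assert.h>
	#include <pthread.h>
	#include <stdio.h>

	/* Thread-local "no blocking allowed" depth, loosely analogous to
	 * holding the LD_WAIT_CONFIG override map in sched_submit_work(). */
	static _Thread_local int no_block_depth;

	static void no_block_enter(void) { no_block_depth++; } /* ~ lock_map_acquire_try() */
	static void no_block_exit(void)  { no_block_depth--; } /* ~ lock_map_release() */

	/* Wrapper around a blocking lock; trips when used inside the region. */
	static void checked_lock(pthread_mutex_t *m)
	{
		assert(no_block_depth == 0 && "blocking lock in no-block region");
		pthread_mutex_lock(m);
	}

	static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

	int main(void)
	{
		checked_lock(&m);          /* fine: outside the region */
		pthread_mutex_unlock(&m);

		no_block_enter();
		/* checked_lock(&m); here would abort, like the lockdep splat */
		no_block_exit();
		puts("ok");
		return 0;
	}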
From: Sebastian Andrzej Siewior
To: Peter Zijlstra, linux-kernel@vger.kernel.org
Cc: bigeasy@linutronix.de, tglx@linutronix.de, boqun.feng@gmail.com,
 bristot@redhat.com, bsegall@google.com, dietmar.eggemann@arm.com,
 jstultz@google.com, juri.lelli@redhat.com, longman@redhat.com,
 mgorman@suse.de, mingo@redhat.com, rostedt@goodmis.org, swood@redhat.com,
 vincent.guittot@linaro.org, vschneid@redhat.com, will@kernel.org
Subject: [PATCH v2 2/6] locking/rtmutex: Avoid unconditional slowpath for DEBUG_RT_MUTEXES
Date: Fri, 25 Aug 2023 20:10:29 +0200
Message-Id: <20230825181033.504534-3-bigeasy@linutronix.de>
In-Reply-To: <20230825181033.504534-1-bigeasy@linutronix.de>
References: <20230825181033.504534-1-bigeasy@linutronix.de>

With DEBUG_RT_MUTEXES enabled the fast-path rt_mutex_cmpxchg_acquire()
always fails and all lock operations take the slow path.

Provide a new helper inline rt_mutex_try_acquire() which maps to
rt_mutex_cmpxchg_acquire() in the non-debug case. For the debug case it
invokes rt_mutex_slowtrylock() which can acquire a non-contended rtmutex
under full debug coverage.

Signed-off-by: Thomas Gleixner
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: "Peter Zijlstra (Intel)"
Link: https://lkml.kernel.org/r/20230427111937.2745231-4-bigeasy@linutronix.de
Link: https://lore.kernel.org/r/20230815111430.220899937@infradead.org
Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/locking/rtmutex.c     | 21 ++++++++++++++++++++-
 kernel/locking/ww_rt_mutex.c |  2 +-
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 21db0df0eb000..bcec0533a0cc0 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -218,6 +218,11 @@ static __always_inline bool rt_mutex_cmpxchg_acquire(struct rt_mutex_base *lock,
 	return try_cmpxchg_acquire(&lock->owner, &old, new);
 }
 
+static __always_inline bool rt_mutex_try_acquire(struct rt_mutex_base *lock)
+{
+	return rt_mutex_cmpxchg_acquire(lock, NULL, current);
+}
+
 static __always_inline bool rt_mutex_cmpxchg_release(struct rt_mutex_base *lock,
 						     struct task_struct *old,
 						     struct task_struct *new)
@@ -297,6 +302,20 @@ static __always_inline bool rt_mutex_cmpxchg_acquire(struct rt_mutex_base *lock,
 
 }
 
+static int __sched rt_mutex_slowtrylock(struct rt_mutex_base *lock);
+
+static __always_inline bool rt_mutex_try_acquire(struct rt_mutex_base *lock)
+{
+	/*
+	 * With debug enabled rt_mutex_cmpxchg_acquire() will always fail.
+	 *
+	 * Avoid unconditionally taking the slow path by using
+	 * rt_mutex_slowtrylock() which is covered by the debug code and can
+	 * acquire a non-contended rtmutex.
+	 */
+	return rt_mutex_slowtrylock(lock);
+}
+
 static __always_inline bool rt_mutex_cmpxchg_release(struct rt_mutex_base *lock,
 						     struct task_struct *old,
 						     struct task_struct *new)
@@ -1755,7 +1774,7 @@ static int __sched rt_mutex_slowlock(struct rt_mutex_base *lock,
 static __always_inline int __rt_mutex_lock(struct rt_mutex_base *lock,
 					   unsigned int state)
 {
-	if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
+	if (likely(rt_mutex_try_acquire(lock)))
 		return 0;
 
 	return rt_mutex_slowlock(lock, NULL, state);
diff --git a/kernel/locking/ww_rt_mutex.c b/kernel/locking/ww_rt_mutex.c
index d1473c624105c..c7196de838edc 100644
--- a/kernel/locking/ww_rt_mutex.c
+++ b/kernel/locking/ww_rt_mutex.c
@@ -62,7 +62,7 @@ __ww_rt_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ww_ctx,
 	}
 	mutex_acquire_nest(&rtm->dep_map, 0, 0, nest_lock, ip);
 
-	if (likely(rt_mutex_cmpxchg_acquire(&rtm->rtmutex, NULL, current))) {
+	if (likely(rt_mutex_try_acquire(&rtm->rtmutex))) {
 		if (ww_ctx)
 			ww_mutex_set_context_fastpath(lock, ww_ctx);
 		return 0;
-- 
2.40.1
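The fast path being wrapped here is a single compare-and-exchange on the
owner field. A minimal userspace sketch of that shape using C11 atomics
(rtm_owner, rtm_try_acquire and rtm_release are illustrative names, not
the kernel API):

	#include <pthread.h>
	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>

	/* Owner-based lock word, analogous to rt_mutex_base::owner. */
	static _Atomic(pthread_t *) rtm_owner;

	/* Fast-path trylock: swing the owner from NULL to us with acquire
	 * ordering, the same shape as
	 * rt_mutex_cmpxchg_acquire(lock, NULL, current). */
	static bool rtm_try_acquire(pthread_t *self)
	{
		pthread_t *expected = NULL;

		return atomic_compare_exchange_strong_explicit(&rtm_owner,
				&expected, self,
				memory_order_acquire, memory_order_relaxed);
	}

	static void rtm_release(void)
	{
		atomic_store_explicit(&rtm_owner, NULL, memory_order_release);
	}

	int main(void)
	{
		pthread_t self = pthread_self();

		if (rtm_try_acquire(&self))
			puts("fast path: acquired uncontended lock");
		if (!rtm_try_acquire(&self))
			puts("second attempt fails; would take the slow path");
		rtm_release();
		return 0;
	}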
From: Sebastian Andrzej Siewior
To: Peter Zijlstra, linux-kernel@vger.kernel.org
Cc: bigeasy@linutronix.de, tglx@linutronix.de, boqun.feng@gmail.com,
 bristot@redhat.com, bsegall@google.com, dietmar.eggemann@arm.com,
 jstultz@google.com, juri.lelli@redhat.com, longman@redhat.com,
 mgorman@suse.de, mingo@redhat.com, rostedt@goodmis.org, swood@redhat.com,
 vincent.guittot@linaro.org, vschneid@redhat.com, will@kernel.org
Subject: [PATCH v2 3/6] sched: Extract __schedule_loop()
Date: Fri, 25 Aug 2023 20:10:30 +0200
Message-Id: <20230825181033.504534-4-bigeasy@linutronix.de>
In-Reply-To: <20230825181033.504534-1-bigeasy@linutronix.de>
References: <20230825181033.504534-1-bigeasy@linutronix.de>

From: Thomas Gleixner

There are currently two implementations of this basic __schedule()
loop, and there is soon to be a third.

Signed-off-by: Thomas Gleixner
Signed-off-by: "Peter Zijlstra (Intel)"
Link: https://lkml.kernel.org/r/20230427111937.2745231-2-bigeasy@linutronix.de
Link: https://lore.kernel.org/r/20230815111430.288063671@infradead.org
Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/sched/core.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bca53ff9d8182..901766a88afc3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6785,16 +6785,21 @@ static void sched_update_worker(struct task_struct *tsk)
 	}
 }
 
+static __always_inline void __schedule_loop(unsigned int sched_mode)
+{
+	do {
+		preempt_disable();
+		__schedule(sched_mode);
+		sched_preempt_enable_no_resched();
+	} while (need_resched());
+}
+
 asmlinkage __visible void __sched schedule(void)
 {
 	struct task_struct *tsk = current;
 
 	sched_submit_work(tsk);
-	do {
-		preempt_disable();
-		__schedule(SM_NONE);
-		sched_preempt_enable_no_resched();
-	} while (need_resched());
+	__schedule_loop(SM_NONE);
 	sched_update_worker(tsk);
 }
 EXPORT_SYMBOL(schedule);
@@ -6858,11 +6863,7 @@ void __sched schedule_preempt_disabled(void)
 #ifdef CONFIG_PREEMPT_RT
 void __sched notrace schedule_rtlock(void)
 {
-	do {
-		preempt_disable();
-		__schedule(SM_RTLOCK_WAIT);
-		sched_preempt_enable_no_resched();
-	} while (need_resched());
+	__schedule_loop(SM_RTLOCK_WAIT);
 }
 NOKPROBE_SYMBOL(schedule_rtlock);
 #endif
-- 
2.40.1
From: Sebastian Andrzej Siewior
To: Peter Zijlstra, linux-kernel@vger.kernel.org
Cc: bigeasy@linutronix.de, tglx@linutronix.de, boqun.feng@gmail.com,
 bristot@redhat.com, bsegall@google.com, dietmar.eggemann@arm.com,
 jstultz@google.com, juri.lelli@redhat.com, longman@redhat.com,
 mgorman@suse.de, mingo@redhat.com, rostedt@goodmis.org, swood@redhat.com,
 vincent.guittot@linaro.org, vschneid@redhat.com, will@kernel.org
Subject: [PATCH v2 4/6] sched: Provide rt_mutex specific scheduler helpers
Date: Fri, 25 Aug 2023 20:10:31 +0200
Message-Id: <20230825181033.504534-5-bigeasy@linutronix.de>
In-Reply-To: <20230825181033.504534-1-bigeasy@linutronix.de>
References: <20230825181033.504534-1-bigeasy@linutronix.de>

From: Peter Zijlstra

With PREEMPT_RT there is a rt_mutex recursion problem where
sched_submit_work() can use an rtlock (aka spinlock_t). More
specifically what happens is:

  mutex_lock() /* really rt_mutex */
    ...
      __rt_mutex_slowlock_locked()
	task_blocks_on_rt_mutex()
	  // enqueue current task as waiter
	  // do PI chain walk
	rt_mutex_slowlock_block()
	  schedule()
	    sched_submit_work()
	      ...
	      spin_lock() /* really rtlock */
		...
		  __rt_mutex_slowlock_locked()
		    task_blocks_on_rt_mutex()
		      // enqueue current task as waiter *AGAIN*
		      // *CONFUSION*

Fix this by making rt_mutex do the sched_submit_work() early, before
it enqueues itself as a waiter -- before it even knows *if* it will
wait.

[[ basically Thomas' patch but with different naming and a few asserts
   added ]]

Originally-by: Thomas Gleixner
Signed-off-by: "Peter Zijlstra (Intel)"
Link: https://lore.kernel.org/r/20230815111430.355375399@infradead.org
Signed-off-by: Sebastian Andrzej Siewior
---
 include/linux/sched.h    |  3 +++
 include/linux/sched/rt.h |  4 ++++
 kernel/sched/core.c      | 36 ++++++++++++++++++++++++++++++++----
 3 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 609bde814cb06..0ea7a023c6c73 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -906,6 +906,9 @@ struct task_struct {
 	 * ->sched_remote_wakeup gets used, so it can be in this word.
 	 */
 	unsigned			sched_remote_wakeup:1;
+#ifdef CONFIG_RT_MUTEXES
+	unsigned			sched_rt_mutex:1;
+#endif
 
 	/* Bit to tell LSMs we're in execve(): */
 	unsigned			in_execve:1;
diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index 994c25640e156..b2b9e6eb96830 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -30,6 +30,10 @@ static inline bool task_is_realtime(struct task_struct *tsk)
 }
 
 #ifdef CONFIG_RT_MUTEXES
+extern void rt_mutex_pre_schedule(void);
+extern void rt_mutex_schedule(void);
+extern void rt_mutex_post_schedule(void);
+
 /*
  * Must hold either p->pi_lock or task_rq(p)->lock.
  */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 901766a88afc3..bba1ed28608ab 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6738,9 +6738,6 @@ static inline void sched_submit_work(struct task_struct *tsk)
 	static DEFINE_WAIT_OVERRIDE_MAP(sched_map, LD_WAIT_CONFIG);
 	unsigned int task_flags;
 
-	if (task_is_running(tsk))
-		return;
-
 	/*
 	 * Establish LD_WAIT_CONFIG context to ensure none of the code called
 	 * will use a blocking primitive -- which would lead to recursion.
@@ -6798,7 +6795,12 @@ asmlinkage __visible void __sched schedule(void)
 {
 	struct task_struct *tsk = current;
 
-	sched_submit_work(tsk);
+#ifdef CONFIG_RT_MUTEXES
+	lockdep_assert(!tsk->sched_rt_mutex);
+#endif
+
+	if (!task_is_running(tsk))
+		sched_submit_work(tsk);
 	__schedule_loop(SM_NONE);
 	sched_update_worker(tsk);
 }
@@ -7059,6 +7061,32 @@ static void __setscheduler_prio(struct task_struct *p, int prio)
 
 #ifdef CONFIG_RT_MUTEXES
 
+/*
+ * Would be more useful with typeof()/auto_type but they don't mix with
+ * bit-fields. Since it's a local thing, use int. Keep the generic sounding
+ * name such that if someone were to implement this function we get to compare
+ * notes.
+ */
+#define fetch_and_set(x, v) ({ int _x = (x); (x) = (v); _x; })
+
+void rt_mutex_pre_schedule(void)
+{
+	lockdep_assert(!fetch_and_set(current->sched_rt_mutex, 1));
+	sched_submit_work(current);
+}
+
+void rt_mutex_schedule(void)
+{
+	lockdep_assert(current->sched_rt_mutex);
+	__schedule_loop(SM_NONE);
+}
+
+void rt_mutex_post_schedule(void)
+{
+	sched_update_worker(current);
+	lockdep_assert(fetch_and_set(current->sched_rt_mutex, 0));
+}
+
 static inline int __rt_effective_prio(struct task_struct *pi_task, int prio)
 {
 	if (pi_task)
-- 
2.40.1
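fetch_and_set() is just swap-and-return-old for a local flag, and the
asserts use it to enforce a strictly balanced pre/schedule/post bracket.
A standalone sketch of those semantics, compilable with GCC/Clang
(statement expressions); the thread-local sched_rt_mutex stands in for the
task_struct bit and the bodies are reduced to comments:

	#include <assert.h>
	#include <stdio.h>

	/* Same shape as the kernel macro: return the old value, store the new. */
	#define fetch_and_set(x, v) ({ int _x = (x); (x) = (v); _x; })

	/* Stand-in for current->sched_rt_mutex. */
	static _Thread_local int sched_rt_mutex;

	static void rt_mutex_pre_schedule(void)
	{
		/* Must not nest: the old value has to be 0. */
		assert(!fetch_and_set(sched_rt_mutex, 1));
		/* sched_submit_work() runs here, before any waiter is queued */
	}

	static void rt_mutex_schedule(void)
	{
		assert(sched_rt_mutex); /* only valid inside the bracket */
		/* __schedule_loop(SM_NONE) runs here */
	}

	static void rt_mutex_post_schedule(void)
	{
		/* sched_update_worker() runs here */
		assert(fetch_and_set(sched_rt_mutex, 0));
	}

	int main(void)
	{
		rt_mutex_pre_schedule();
		rt_mutex_schedule();
		rt_mutex_post_schedule();
		puts("balanced pre/schedule/post bracket");
		return 0;
	}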
From: Sebastian Andrzej Siewior
To: Peter Zijlstra, linux-kernel@vger.kernel.org
Cc: bigeasy@linutronix.de, tglx@linutronix.de, boqun.feng@gmail.com,
 bristot@redhat.com, bsegall@google.com, dietmar.eggemann@arm.com,
 jstultz@google.com, juri.lelli@redhat.com, longman@redhat.com,
 mgorman@suse.de, mingo@redhat.com, rostedt@goodmis.org, swood@redhat.com,
 vincent.guittot@linaro.org, vschneid@redhat.com, will@kernel.org
Subject: [PATCH v2 5/6] locking/rtmutex: Use rt_mutex specific scheduler helpers
Date: Fri, 25 Aug 2023 20:10:32 +0200
Message-Id: <20230825181033.504534-6-bigeasy@linutronix.de>
In-Reply-To: <20230825181033.504534-1-bigeasy@linutronix.de>
References: <20230825181033.504534-1-bigeasy@linutronix.de>

Have rt_mutex use the rt_mutex specific scheduler helpers to avoid
recursion vs rtlock on the PI state.

[[ peterz: adapted to new names ]]

Reported-by: Crystal Wood
Signed-off-by: "Peter Zijlstra (Intel)"
Link: https://lore.kernel.org/r/20230815111430.421408298@infradead.org
Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/futex/pi.c            | 11 +++++++++++
 kernel/locking/rtmutex.c     | 14 ++++++++++++--
 kernel/locking/rwbase_rt.c   |  6 ++++++
 kernel/locking/rwsem.c       |  8 +++++++-
 kernel/locking/spinlock_rt.c |  4 ++++
 5 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c
index ce2889f123755..f8e65b27d9d6b 100644
--- a/kernel/futex/pi.c
+++ b/kernel/futex/pi.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
 
 #include <linux/slab.h>
+#include <linux/sched/rt.h>
 #include <linux/sched/task.h>
 
 #include "futex.h"
@@ -1002,6 +1003,12 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int trylock,
 		goto no_block;
 	}
 
+	/*
+	 * Must be done before we enqueue the waiter, here is unfortunately
+	 * under the hb lock, but that *should* work because it does nothing.
+	 */
+	rt_mutex_pre_schedule();
+
 	rt_mutex_init_waiter(&rt_waiter);
 
 	/*
@@ -1052,6 +1059,10 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int trylock,
 	if (ret && !rt_mutex_cleanup_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter))
 		ret = 0;
 
+	/*
+	 * Waiter is unqueued.
+	 */
+	rt_mutex_post_schedule();
 no_block:
 	/*
 	 * Fixup the pi_state owner and possibly acquire the lock if we
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index bcec0533a0cc0..a3fe05dfd0d8f 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1632,7 +1632,7 @@ static int __sched rt_mutex_slowlock_block(struct rt_mutex_base *lock,
 			raw_spin_unlock_irq(&lock->wait_lock);
 
 			if (!owner || !rtmutex_spin_on_owner(lock, waiter, owner))
-				schedule();
+				rt_mutex_schedule();
 
 			raw_spin_lock_irq(&lock->wait_lock);
 			set_current_state(state);
@@ -1661,7 +1661,7 @@ static void __sched rt_mutex_handle_deadlock(int res, int detect_deadlock,
 	WARN(1, "rtmutex deadlock detected\n");
 	while (1) {
 		set_current_state(TASK_INTERRUPTIBLE);
-		schedule();
+		rt_mutex_schedule();
 	}
 }
 
@@ -1756,6 +1756,15 @@ static int __sched rt_mutex_slowlock(struct rt_mutex_base *lock,
 	unsigned long flags;
 	int ret;
 
+	/*
+	 * Do all pre-schedule work here, before we queue a waiter and invoke
+	 * PI -- any such work that trips on rtlock (PREEMPT_RT spinlock) would
+	 * otherwise recurse back into task_blocks_on_rt_mutex() through
+	 * rtlock_slowlock() and will then enqueue a second waiter for this
+	 * same task and things get really confusing real fast.
+	 */
+	rt_mutex_pre_schedule();
+
 	/*
 	 * Technically we could use raw_spin_[un]lock_irq() here, but this can
 	 * be called in early boot if the cmpxchg() fast path is disabled
@@ -1767,6 +1776,7 @@ static int __sched rt_mutex_slowlock(struct rt_mutex_base *lock,
 	raw_spin_lock_irqsave(&lock->wait_lock, flags);
 	ret = __rt_mutex_slowlock_locked(lock, ww_ctx, state);
 	raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
+	rt_mutex_post_schedule();
 
 	return ret;
 }
diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index 25ec0239477c2..7d57bfb909001 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -71,6 +71,7 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
 	struct rt_mutex_base *rtm = &rwb->rtmutex;
 	int ret;
 
+	rwbase_pre_schedule();
 	raw_spin_lock_irq(&rtm->wait_lock);
 
 	/*
@@ -125,6 +126,7 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
 	rwbase_rtmutex_unlock(rtm);
 
 	trace_contention_end(rwb, ret);
+	rwbase_post_schedule();
 	return ret;
 }
 
@@ -237,6 +239,8 @@ static int __sched rwbase_write_lock(struct rwbase_rt *rwb,
 	/* Force readers into slow path */
 	atomic_sub(READER_BIAS, &rwb->readers);
 
+	rt_mutex_pre_schedule();
+
 	raw_spin_lock_irqsave(&rtm->wait_lock, flags);
 	if (__rwbase_write_trylock(rwb))
 		goto out_unlock;
@@ -248,6 +252,7 @@ static int __sched rwbase_write_lock(struct rwbase_rt *rwb,
 		if (rwbase_signal_pending_state(state, current)) {
 			rwbase_restore_current_state();
 			__rwbase_write_unlock(rwb, 0, flags);
+			rt_mutex_post_schedule();
 			trace_contention_end(rwb, -EINTR);
 			return -EINTR;
 		}
@@ -266,6 +271,7 @@ static int __sched rwbase_write_lock(struct rwbase_rt *rwb,
 
 out_unlock:
 	raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
+	rt_mutex_post_schedule();
 	return 0;
 }
 
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 9eabd585ce7af..2340b6d90ec6f 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1427,8 +1427,14 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 #define rwbase_signal_pending_state(state, current)	\
 	signal_pending_state(state, current)
 
+#define rwbase_pre_schedule()				\
+	rt_mutex_pre_schedule()
+
 #define rwbase_schedule()				\
-	schedule()
+	rt_mutex_schedule()
+
+#define rwbase_post_schedule()				\
+	rt_mutex_post_schedule()
 
 #include "rwbase_rt.c"
 
diff --git a/kernel/locking/spinlock_rt.c b/kernel/locking/spinlock_rt.c
index 48a19ed8486d8..842037b2ba548 100644
--- a/kernel/locking/spinlock_rt.c
+++ b/kernel/locking/spinlock_rt.c
@@ -184,9 +184,13 @@ static __always_inline int rwbase_rtmutex_trylock(struct rt_mutex_base *rtm)
 
 #define rwbase_signal_pending_state(state, current)	(0)
 
+#define rwbase_pre_schedule()
+
 #define rwbase_schedule()				\
 	schedule_rtlock()
 
+#define rwbase_post_schedule()
+
 #include "rwbase_rt.c"
 /*
  * The common functions which get wrapped into the rwlock API.
-- 
2.40.1
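The rwsem.c and spinlock_rt.c hunks work because rwbase_rt.c is compiled
twice, once per includer, with the rwbase_* macros bound differently each
time -- a poor man's template. A self-contained sketch of that pattern;
the loop, hooks and names below are invented purely to show the
instantiation trick, not taken from the kernel:

	#include <stdio.h>

	/* The "template body": a retry loop whose schedule hook is
	 * macro-bound, like rwbase_schedule() in rwbase_rt.c. */
	#define DEFINE_LOCK_LOOP(name, schedule_hook)			\
		static void name(int attempts)				\
		{							\
			for (int i = 0; i < attempts; i++)		\
				schedule_hook();			\
		}

	static void task_schedule(void)   { puts("schedule()        /* rwsem flavour */"); }
	static void rtlock_schedule(void) { puts("schedule_rtlock() /* rtlock flavour */"); }

	/* Two "translation units" worth of instantiation, one per flavour. */
	DEFINE_LOCK_LOOP(rwsem_lock_loop, task_schedule)
	DEFINE_LOCK_LOOP(rtlock_lock_loop, rtlock_schedule)

	int main(void)
	{
		rwsem_lock_loop(2);
		rtlock_lock_loop(2);
		return 0;
	}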
From: Sebastian Andrzej Siewior
To: Peter Zijlstra, linux-kernel@vger.kernel.org
Cc: bigeasy@linutronix.de, tglx@linutronix.de, boqun.feng@gmail.com,
 bristot@redhat.com, bsegall@google.com, dietmar.eggemann@arm.com,
 jstultz@google.com, juri.lelli@redhat.com, longman@redhat.com,
 mgorman@suse.de, mingo@redhat.com, rostedt@goodmis.org, swood@redhat.com,
 vincent.guittot@linaro.org, vschneid@redhat.com, will@kernel.org
Subject: [PATCH v2 6/6] locking/rtmutex: Add a lockdep assert to catch potential nested blocking
Date: Fri, 25 Aug 2023 20:10:33 +0200
Message-Id: <20230825181033.504534-7-bigeasy@linutronix.de>
In-Reply-To: <20230825181033.504534-1-bigeasy@linutronix.de>
References: <20230825181033.504534-1-bigeasy@linutronix.de>

From: Thomas Gleixner

There used to be a BUG_ON(current->pi_blocked_on) in the lock
acquisition functions, but that vanished in one of the rtmutex
overhauls.

Bring it back in the form of a lockdep assert to catch code paths which
take rtmutex based locks with current::pi_blocked_on != NULL.

Reported-by: Crystal Wood
Signed-off-by: Thomas Gleixner
Signed-off-by: "Peter Zijlstra (Intel)"
Link: https://lkml.kernel.org/r/20230427111937.2745231-5-bigeasy@linutronix.de
Link: https://lore.kernel.org/r/20230815111430.488430699@infradead.org
Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/locking/rtmutex.c     | 2 ++
 kernel/locking/rwbase_rt.c   | 2 ++
 kernel/locking/spinlock_rt.c | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index a3fe05dfd0d8f..4a10e8c16fd2b 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1784,6 +1784,8 @@ static int __sched rt_mutex_slowlock(struct rt_mutex_base *lock,
 static __always_inline int __rt_mutex_lock(struct rt_mutex_base *lock,
 					   unsigned int state)
 {
+	lockdep_assert(!current->pi_blocked_on);
+
 	if (likely(rt_mutex_try_acquire(lock)))
 		return 0;
 
diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index 7d57bfb909001..b5e881250fec5 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -133,6 +133,8 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
 static __always_inline int rwbase_read_lock(struct rwbase_rt *rwb,
 					    unsigned int state)
 {
+	lockdep_assert(!current->pi_blocked_on);
+
 	if (rwbase_read_trylock(rwb))
 		return 0;
 
diff --git a/kernel/locking/spinlock_rt.c b/kernel/locking/spinlock_rt.c
index 842037b2ba548..38e292454fccb 100644
--- a/kernel/locking/spinlock_rt.c
+++ b/kernel/locking/spinlock_rt.c
@@ -37,6 +37,8 @@
 
 static __always_inline void rtlock_lock(struct rt_mutex_base *rtm)
 {
+	lockdep_assert(!current->pi_blocked_on);
+
 	if (unlikely(!rt_mutex_cmpxchg_acquire(rtm, NULL, current)))
 		rtlock_slowlock(rtm);
 }
-- 
2.40.1
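The assert encodes the invariant that a task may be blocked on at most one
rtmutex at a time. A tiny userspace analogue with a thread-local
blocked-on pointer (struct waiter, pi_blocked_on and lock_slowpath are
illustrative stand-ins, not kernel code):

	#include <assert.h>
	#include <stddef.h>
	#include <stdio.h>

	struct waiter { int dummy; };

	/* Stand-in for current->pi_blocked_on: the one waiter this thread
	 * is currently blocked on, or NULL. */
	static _Thread_local struct waiter *pi_blocked_on;

	static void lock_slowpath(struct waiter *w)
	{
		assert(pi_blocked_on == NULL && "nested blocking on rtmutex");
		pi_blocked_on = w;
		/* The schedule step runs here; taking another blocking lock
		 * from this context would re-enter lock_slowpath() and trip
		 * the assert -- exactly what the lockdep_assert() catches. */
		pi_blocked_on = NULL;	/* lock acquired, unblocked */
	}

	int main(void)
	{
		struct waiter w;

		lock_slowpath(&w);
		puts("invariant held");
		return 0;
	}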
Date: Thu, 31 Aug 2023 11:53:14 +0200
From: Sebastian Andrzej Siewior
To: Peter Zijlstra, linux-kernel@vger.kernel.org
Cc: tglx@linutronix.de, boqun.feng@gmail.com, bristot@redhat.com,
 bsegall@google.com, dietmar.eggemann@arm.com, jstultz@google.com,
 juri.lelli@redhat.com, longman@redhat.com, mgorman@suse.de,
 mingo@redhat.com, rostedt@goodmis.org, swood@redhat.com,
 vincent.guittot@linaro.org, vschneid@redhat.com, will@kernel.org
Subject: [PATCH v2 7/6] locking/rtmutex: Acquire the hb lock via trylock after wait-proxylock.
Message-ID: <20230831095314.fTliy0Bh@linutronix.de>
In-Reply-To: <20230825181033.504534-1-bigeasy@linutronix.de>
References: <20230825181033.504534-1-bigeasy@linutronix.de>

After rt_mutex_wait_proxy_lock(), task_struct::pi_blocked_on is cleared
if current owns the lock. If the operation has been interrupted by a
signal or timeout then pi_blocked_on can be set. This means spin_lock()
*can* overwrite pi_blocked_on on PREEMPT_RT. This has been noticed by
the recently added lockdep asserts…

The rt_mutex_cleanup_proxy_lock() operation will clear pi_blocked_on
(and update pending waiters as expected) but it must happen under the hb
lock to ensure the same state in rtmutex and userland.

Given all the possibilities it is probably the simplest option to
try-lock the hb lock. In case the lock is occupied a quick nap is
needed. A busy loop can lock up the system if performed by a task with
high priority, preventing the owner from running.

The rt_mutex_post_schedule() needs to be put before the try-lock loop
because otherwise the schedule() in schedule_hrtimeout() will trip over
the !sched_rt_mutex assert.

Introduce futex_trylock_hblock() to try-lock the hb lock and sleep until
the try-lock operation succeeds. Use it after rt_mutex_wait_proxy_lock()
to acquire the lock.

Suggested-by: Thomas Gleixner
Signed-off-by: Sebastian Andrzej Siewior
---
On 2023-08-25 20:10:27 [+0200], To Peter Zijlstra wrote:
> but then why over complicate things. Speaking of over complicating, this
> triggered:
> | ------------[ cut here ]------------
> | WARNING: CPU: 6 PID: 2155 at kernel/locking/spinlock_rt.c:40 rt_spin_lock+0x5a/0xf0
…
> Here, rt_mutex_wait_proxy_lock() returned with -EINTR, didn't acquire
> the lock. Later rt_mutex_cleanup_proxy_lock() would clean
> ->pi_blocked_on but that happens after
> | /* Current is not longer pi_blocked_on */
> | spin_lock(q.lock_ptr);

So this seems to do the job. Gross but …

 kernel/futex/futex.h     | 23 +++++++++++++++++++++++
 kernel/futex/pi.c        | 10 ++++------
 kernel/futex/requeue.c   |  4 ++--
 kernel/locking/rtmutex.c |  7 +++++--
 4 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index b5379c0e6d6d1..b0b2a5b35ae57 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -254,6 +254,29 @@ double_unlock_hb(struct futex_hash_bucket *hb1, struct futex_hash_bucket *hb2)
 		spin_unlock(&hb2->lock);
 }
 
+static inline void futex_trylock_hblock(spinlock_t *lock)
+{
+	do {
+		ktime_t chill_time;
+
+		/*
+		 * Current is no longer pi_blocked_on if it owns the lock. It
+		 * can still have pi_blocked_on set if the lock acquiring was
+		 * interrupted by signal or timeout. The trylock operation does
+		 * not clobber pi_blocked_on so it is the only option.
+		 * Should the try-lock operation fail then it needs to leave
+		 * the CPU to avoid a busy loop in case it is the task with the
+		 * highest priority.
+		 */
+		if (spin_trylock(lock))
+			return;
+
+		chill_time = ktime_set(0, NSEC_PER_MSEC);
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		schedule_hrtimeout(&chill_time, HRTIMER_MODE_REL_HARD);
+	} while (1);
+}
+
 /* syscalls */
 
 extern int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags, u32
diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c
index f8e65b27d9d6b..1440fdcdbfd8c 100644
--- a/kernel/futex/pi.c
+++ b/kernel/futex/pi.c
@@ -1046,7 +1046,10 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int trylock,
 		ret = rt_mutex_wait_proxy_lock(&q.pi_state->pi_mutex, to, &rt_waiter);
 
 cleanup:
-	spin_lock(q.lock_ptr);
+	rt_mutex_post_schedule();
+
+	futex_trylock_hblock(q.lock_ptr);
+
 	/*
 	 * If we failed to acquire the lock (deadlock/signal/timeout), we must
 	 * first acquire the hb->lock before removing the lock from the
@@ -1058,11 +1061,6 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int trylock,
 	 */
 	if (ret && !rt_mutex_cleanup_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter))
 		ret = 0;
-
-	/*
-	 * Waiter is unqueued.
-	 */
-	rt_mutex_post_schedule();
 no_block:
 	/*
 	 * Fixup the pi_state owner and possibly acquire the lock if we
diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c
index cba8b1a6a4cc2..26888cfa74449 100644
--- a/kernel/futex/requeue.c
+++ b/kernel/futex/requeue.c
@@ -850,8 +850,8 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
 		pi_mutex = &q.pi_state->pi_mutex;
 		ret = rt_mutex_wait_proxy_lock(pi_mutex, to, &rt_waiter);
 
-		/* Current is not longer pi_blocked_on */
-		spin_lock(q.lock_ptr);
+		futex_trylock_hblock(q.lock_ptr);
+
 		if (ret && !rt_mutex_cleanup_proxy_lock(pi_mutex, &rt_waiter))
 			ret = 0;
 
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 4a10e8c16fd2b..e56585ef489c8 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1166,10 +1166,13 @@ try_to_take_rt_mutex(struct rt_mutex_base *lock, struct task_struct *task,
 	 * Clear @task->pi_blocked_on. Requires protection by
 	 * @task->pi_lock. Redundant operation for the @waiter == NULL
 	 * case, but conditionals are more expensive than a redundant
-	 * store.
+	 * store. But then there is FUTEX and if rt_mutex_wait_proxy_lock()
+	 * did not acquire the lock it try-locks another lock before it clears
+	 * @task->pi_blocked_on so we mustn't clear it here prematurely.
 	 */
 	raw_spin_lock(&task->pi_lock);
-	task->pi_blocked_on = NULL;
+	if (waiter)
+		task->pi_blocked_on = NULL;
 	/*
 	 * Finish the lock acquisition. @task is the new owner. If
 	 * other waiters exist we have to insert the highest priority
-- 
2.40.1
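The shape of futex_trylock_hblock() -- try-lock, and on failure sleep for
a fixed interval instead of blocking or spinning -- is easy to reproduce
in userspace. A hedged pthread sketch (trylock_with_backoff and hb_lock
are invented names; the kernel version additionally relies on
HRTIMER_MODE_REL_HARD so that the nap itself cannot block on an rtlock):

	#include <pthread.h>
	#include <stdio.h>
	#include <time.h>

	/* Try-lock with a 1 ms nap on failure, mirroring
	 * futex_trylock_hblock(): never block on the lock itself, so the
	 * caller's blocked-on state is not clobbered, and yield the CPU so
	 * a high-priority caller cannot starve the lock owner. */
	static void trylock_with_backoff(pthread_mutex_t *lock)
	{
		const struct timespec chill_time = {
			.tv_sec = 0, .tv_nsec = 1000000 };

		while (pthread_mutex_trylock(lock) != 0)
			nanosleep(&chill_time, NULL);
	}

	static pthread_mutex_t hb_lock = PTHREAD_MUTEX_INITIALIZER;

	int main(void)
	{
		trylock_with_backoff(&hb_lock);
		puts("hb lock taken via trylock");
		pthread_mutex_unlock(&hb_lock);
		return 0;
	}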