With PREEMPT_RT enabled, some of the calls to put_task_struct() coming
from rt_mutex_adjust_prio_chain() could happen in preemptible context and
with a mutex enqueued. That could lead to this sequence:
rt_mutex_adjust_prio_chain()
put_task_struct()
__put_task_struct()
sched_ext_free()
spin_lock_irqsave()
rtlock_lock() ---> TRIGGERS
lockdep_assert(!current->pi_blocked_on);
This is not a SCHED_EXT bug. The first cleanup function called by
__put_task_struct() is sched_ext_free() and it happens to take a
(RT) spin_lock, which in the scenario described above, would trigger
the lockdep assertion of "!current->pi_blocked_on".
Crystal Wood was able to identify the problem as __put_task_struct()
being called during rt_mutex_adjust_prio_chain(), in the context of
a process with a mutex enqueued.
Instead of adding more complex conditions to decide when to directly
call __put_task_struct() and when to defer the call, unconditionally
resort to the deferred call on PREEMPT_RT to simplify the code.
Suggested-by: Crystal Wood <crwood@redhat.com>
Reviewed-by: Wander Lairson Costa <wander@redhat.com>
Fixes: 893cdaaa3977 ("sched: avoid false lockdep splat in put_task_struct()")
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
---
v6: (Sebastian) rework patch description with the note from Crystal Wood.
v5: Add the "Fixes:" tag.
v4: Fix the implementation of what was requested on v3.
v3: (Sebastian, PeterZ) always call the deferred __put_task_struct() on RT.
v2: (Rostedt) remove the #ifdef from put_task_struct() and create
tsk_is_pi_blocked_on() in sched.h to make the change cleaner.
include/linux/sched/task.h | 27 ++++++++++-----------------
1 file changed, 10 insertions(+), 17 deletions(-)
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 0f2aeb37bbb0..5873de8804d4 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -135,24 +135,17 @@ static inline void put_task_struct(struct task_struct *t)
return;
/*
- * In !RT, it is always safe to call __put_task_struct().
- * Under RT, we can only call it in preemptible context.
- */
- if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible()) {
- static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
-
- lock_map_acquire_try(&put_task_map);
- __put_task_struct(t);
- lock_map_release(&put_task_map);
- return;
- }
-
- /*
- * under PREEMPT_RT, we can't call put_task_struct
+ * Under PREEMPT_RT, we can't call __put_task_struct
* in atomic context because it will indirectly
- * acquire sleeping locks.
+ * acquire sleeping locks. The same is true if the
+ * current process has a mutex enqueued (blocked on
+ * a PI chain).
+ *
+ * In !RT, it is always safe to call __put_task_struct().
+ * Though, in order to simplify the code, resort to the
+ * deferred call too.
*
- * call_rcu() will schedule delayed_put_task_struct_rcu()
+ * call_rcu() will schedule __put_task_struct_rcu_cb()
* to be called in process context.
*
* __put_task_struct() is called when
@@ -165,7 +158,7 @@ static inline void put_task_struct(struct task_struct *t)
*
* delayed_free_task() also uses ->rcu, but it is only called
* when it fails to fork a process. Therefore, there is no
- * way it can conflict with put_task_struct().
+ * way it can conflict with __put_task_struct().
*/
call_rcu(&t->rcu, __put_task_struct_rcu_cb);
}
--
2.50.0
On 07/07, Luis Claudio R. Goncalves wrote: > > Instead of adding more complex conditions to decide when to directly > call __put_task_struct() and when to defer the call, unconditionally > resort to the deferred call on PREEMPT_RT to simplify the code. ^^^^^^^^^^^^^ Confused... with this patch put_task_struct() always uses the deferred call, regardless of PREEMPT_RT? Oleg. > --- a/include/linux/sched/task.h > +++ b/include/linux/sched/task.h > @@ -135,24 +135,17 @@ static inline void put_task_struct(struct task_struct *t) > return; > > /* > - * In !RT, it is always safe to call __put_task_struct(). > - * Under RT, we can only call it in preemptible context. > - */ > - if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible()) { > - static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP); > - > - lock_map_acquire_try(&put_task_map); > - __put_task_struct(t); > - lock_map_release(&put_task_map); > - return; > - } > - > - /* > - * under PREEMPT_RT, we can't call put_task_struct > + * Under PREEMPT_RT, we can't call __put_task_struct > * in atomic context because it will indirectly > - * acquire sleeping locks. > + * acquire sleeping locks. The same is true if the > + * current process has a mutex enqueued (blocked on > + * a PI chain). > + * > + * In !RT, it is always safe to call __put_task_struct(). > + * Though, in order to simplify the code, resort to the > + * deferred call too. > * > - * call_rcu() will schedule delayed_put_task_struct_rcu() > + * call_rcu() will schedule __put_task_struct_rcu_cb() > * to be called in process context. > * > * __put_task_struct() is called when > @@ -165,7 +158,7 @@ static inline void put_task_struct(struct task_struct *t) > * > * delayed_free_task() also uses ->rcu, but it is only called > * when it fails to fork a process. Therefore, there is no > - * way it can conflict with put_task_struct(). > + * way it can conflict with __put_task_struct(). > */ > call_rcu(&t->rcu, __put_task_struct_rcu_cb); > } > -- > 2.50.0 >
On Mon, Jul 28, 2025 at 10:14:41PM +0200, Oleg Nesterov wrote: > On 07/07, Luis Claudio R. Goncalves wrote: > > > > Instead of adding more complex conditions to decide when to directly > > call __put_task_struct() and when to defer the call, unconditionally > > resort to the deferred call on PREEMPT_RT to simplify the code. > ^^^^^^^^^^^^^ > Confused... with this patch put_task_struct() always uses the deferred > call, regardless of PREEMPT_RT? > > Oleg. You are correct. I mistakenly sent the patch from v3, with the updated description. I had been working on that patch in parallel for a future RFC and mixed them up when updating the description for v6. The changes from v4 onward were updates to the patch description only. The fact that I had discussed this patch (the one submitted) with some people and was asking for tests to assess robustness and dependability may have further composed the confusion. I posted v6 (with the wrong patch) a couple of hours before leaving for a 2-week vacation. This is also why I didn't notice the wrong submission before. That was an unfortunate mistake on my part, no bad intent. Not sure how to proceed here, if I should resend the correct patch or a follow-up fix like this: ====== From: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Subject: sched: restore the behavior of put_task_struct() for non-rt Commit 8671bad873eb ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set") changed the behavior of put_task_struct() unconditionally, even when PREEMPT_RT was not enabled, in clear mismatch with the commit description. Restore the previous behavior of put_task_struct() for the PREEMPT_RT disabled case. Fixes: 8671bad873eb ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set") Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> --- diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h index ea41795a352b..51678a541477 100644 --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -130,6 +133,16 @@ static inline void put_task_struct(struct task_struct *t) if (!refcount_dec_and_test(&t->usage)) return; + /* In !RT, it is always safe to call __put_task_struct(). */ + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) { + static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP); + + lock_map_acquire_try(&put_task_map); + __put_task_struct(t); + lock_map_release(&put_task_map); + return; + } + /* * Under PREEMPT_RT, we can't call __put_task_struct * in atomic context because it will indirectly @@ -137,10 +150,6 @@ static inline void put_task_struct(struct task_struct *t) * current process has a mutex enqueued (blocked on * a PI chain). * - * In !RT, it is always safe to call __put_task_struct(). - * Though, in order to simplify the code, resort to the - * deferred call too. - * * call_rcu() will schedule __put_task_struct_rcu_cb() * to be called in process context. *
On 07/29, Luis Claudio R. Goncalves wrote: > > From: Luis Claudio R. Goncalves <lgoncalv@redhat.com> > Subject: sched: restore the behavior of put_task_struct() for non-rt > > Commit 8671bad873eb ("sched: Do not call __put_task_struct() on rt > if pi_blocked_on is set") changed the behavior of put_task_struct() > unconditionally, even when PREEMPT_RT was not enabled, in clear mismatch > with the commit description. > > Restore the previous behavior of put_task_struct() for the PREEMPT_RT > disabled case. > > Fixes: 8671bad873eb ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set") > Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> > --- > diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h > index ea41795a352b..51678a541477 100644 > --- a/include/linux/sched/task.h > +++ b/include/linux/sched/task.h > @@ -130,6 +133,16 @@ static inline void put_task_struct(struct task_struct *t) > if (!refcount_dec_and_test(&t->usage)) > return; > > + /* In !RT, it is always safe to call __put_task_struct(). */ > + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) { > + static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP); > + > + lock_map_acquire_try(&put_task_map); > + __put_task_struct(t); > + lock_map_release(&put_task_map); > + return; > + } FWIW: Acked-by: Oleg Nesterov <oleg@redhat.com> At the same time... I don't understand this DEFINE_WAIT_OVERRIDE_MAP(). IIUC, we need to shut up lockdep when put_task_struct() is called under raw_spinlock_t and __put_task_struct() paths take spinlock_t, right? Perhaps this deserves a comment... But if I am right, why LD_WAIT_SLEEP? LD_WAIT_CONFIG should equally work, no? LD_WAIT_SLEEP can fool lockdep more than we need, suppose that __put_task_struct() does mutex_lock(). Not really a problem, might_sleep/etc will complain in this case, but still. Or I am totally confused? Oleg.
On Tue, Jul 29, 2025 at 01:47:03PM +0200, Oleg Nesterov wrote: > On 07/29, Luis Claudio R. Goncalves wrote: > > > > From: Luis Claudio R. Goncalves <lgoncalv@redhat.com> > > Subject: sched: restore the behavior of put_task_struct() for non-rt > > > > Commit 8671bad873eb ("sched: Do not call __put_task_struct() on rt > > if pi_blocked_on is set") changed the behavior of put_task_struct() > > unconditionally, even when PREEMPT_RT was not enabled, in clear mismatch > > with the commit description. > > > > Restore the previous behavior of put_task_struct() for the PREEMPT_RT > > disabled case. > > > > Fixes: 8671bad873eb ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set") > > Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> > > --- > > diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h > > index ea41795a352b..51678a541477 100644 > > --- a/include/linux/sched/task.h > > +++ b/include/linux/sched/task.h > > @@ -130,6 +133,16 @@ static inline void put_task_struct(struct task_struct *t) > > if (!refcount_dec_and_test(&t->usage)) > > return; > > > > + /* In !RT, it is always safe to call __put_task_struct(). */ > > + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) { > > + static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP); > > + > > + lock_map_acquire_try(&put_task_map); > > + __put_task_struct(t); > > + lock_map_release(&put_task_map); > > + return; > > + } > > FWIW: > > Acked-by: Oleg Nesterov <oleg@redhat.com> > > > At the same time... I don't understand this DEFINE_WAIT_OVERRIDE_MAP(). > IIUC, we need to shut up lockdep when put_task_struct() is called under > raw_spinlock_t and __put_task_struct() paths take spinlock_t, right? > Perhaps this deserves a comment... I reverted that code to the previous state, commit 893cdaaa3977 ("sched: avoid false lockdep splat in put_task_struct()") and simplified the "if" statement. In the original code, PREEMPT_RT could call __put_task_struct() if the context was preemptible. But in the proposed code __put_task_struct() is only called if PREEMPT_RT is disabled. In this case I believe we could simply do: + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) { + __put_task_struct(t); + return; + } Does that make sense? Luis > But if I am right, why LD_WAIT_SLEEP? LD_WAIT_CONFIG should equally work, no? > > LD_WAIT_SLEEP can fool lockdep more than we need, suppose that __put_task_struct() > does mutex_lock(). Not really a problem, might_sleep/etc will complain in this > case, but still. > > Or I am totally confused? > > Oleg. > ---end quoted text---
On 07/29, Luis Claudio R. Goncalves wrote: > > On Tue, Jul 29, 2025 at 01:47:03PM +0200, Oleg Nesterov wrote: > > On 07/29, Luis Claudio R. Goncalves wrote: > > > > > > + /* In !RT, it is always safe to call __put_task_struct(). */ > > > + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) { > > > + static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP); > > > + > > > + lock_map_acquire_try(&put_task_map); > > > + __put_task_struct(t); > > > + lock_map_release(&put_task_map); > > > + return; > > > + } > > > > FWIW: > > > > Acked-by: Oleg Nesterov <oleg@redhat.com> > > > > > > At the same time... I don't understand this DEFINE_WAIT_OVERRIDE_MAP(). > > IIUC, we need to shut up lockdep when put_task_struct() is called under > > raw_spinlock_t and __put_task_struct() paths take spinlock_t, right? > > Perhaps this deserves a comment... > > I reverted that code to the previous state, commit 893cdaaa3977 ("sched: > avoid false lockdep splat in put_task_struct()") and simplified the "if" > statement. Yes, yes, I see and I have already acked your patch. > In the original code, PREEMPT_RT could call __put_task_struct() > if the context was preemptible. But in the proposed code __put_task_struct() > is only called if PREEMPT_RT is disabled. In this case I believe we could > simply do: > > + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) { > + __put_task_struct(t); > + return; > + } > > Does that make sense? Hmm... But, again unless I am totally confused, we do need the DEFINE_WAIT_OVERRIDE_MAP() trick even if !PREEMPT_RT ? Looking at lockdep_wait_type, I think that with CONFIG_PROVE_RAW_LOCK_NESTING=y lockdep enforces the PREEMPT_RT locking rules even if PREEMPT_RT is not set? But: > > But if I am right, why LD_WAIT_SLEEP? LD_WAIT_CONFIG should equally work, no? > > > > LD_WAIT_SLEEP can fool lockdep more than we need, suppose that __put_task_struct() > > does mutex_lock(). Not really a problem, might_sleep/etc will complain in this > > case, but still. I still think LD_WAIT_CONFIG makes more sense. Oleg.
On 07/29, Oleg Nesterov wrote: > > On 07/29, Luis Claudio R. Goncalves wrote: > > > > On Tue, Jul 29, 2025 at 01:47:03PM +0200, Oleg Nesterov wrote: > > > On 07/29, Luis Claudio R. Goncalves wrote: > > > > > > > > + /* In !RT, it is always safe to call __put_task_struct(). */ > > > > + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) { > > > > + static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP); > > > > + > > > > + lock_map_acquire_try(&put_task_map); > > > > + __put_task_struct(t); > > > > + lock_map_release(&put_task_map); > > > > + return; > > > > + } > > > > > > FWIW: > > > > > > Acked-by: Oleg Nesterov <oleg@redhat.com> > > > > > > > > > At the same time... I don't understand this DEFINE_WAIT_OVERRIDE_MAP(). > > > IIUC, we need to shut up lockdep when put_task_struct() is called under > > > raw_spinlock_t and __put_task_struct() paths take spinlock_t, right? > > > Perhaps this deserves a comment... > > > > I reverted that code to the previous state, commit 893cdaaa3977 ("sched: > > avoid false lockdep splat in put_task_struct()") and simplified the "if" > > statement. > > Yes, yes, I see and I have already acked your patch. So I think you should just resend it. s/LD_WAIT_SLEEP/LD_WAIT_CONFIG/ needs another discussion even if I am right, sorry for the confusion. Oleg.
On Fri, Aug 01, 2025 at 12:24:29PM +0200, Oleg Nesterov wrote: > On 07/29, Oleg Nesterov wrote: > > > > On 07/29, Luis Claudio R. Goncalves wrote: > > > > > > On Tue, Jul 29, 2025 at 01:47:03PM +0200, Oleg Nesterov wrote: > > > > On 07/29, Luis Claudio R. Goncalves wrote: > > > > > > > > > > + /* In !RT, it is always safe to call __put_task_struct(). */ > > > > > + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) { > > > > > + static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP); > > > > > + > > > > > + lock_map_acquire_try(&put_task_map); > > > > > + __put_task_struct(t); > > > > > + lock_map_release(&put_task_map); > > > > > + return; > > > > > + } > > > > > > > > FWIW: > > > > > > > > Acked-by: Oleg Nesterov <oleg@redhat.com> > > > > > > > > > > > > At the same time... I don't understand this DEFINE_WAIT_OVERRIDE_MAP(). > > > > IIUC, we need to shut up lockdep when put_task_struct() is called under > > > > raw_spinlock_t and __put_task_struct() paths take spinlock_t, right? > > > > Perhaps this deserves a comment... > > > > > > I reverted that code to the previous state, commit 893cdaaa3977 ("sched: > > > avoid false lockdep splat in put_task_struct()") and simplified the "if" > > > statement. > > > > Yes, yes, I see and I have already acked your patch. > > So I think you should just resend it. Thank you! I was a bit unsure on how to proceed :) > s/LD_WAIT_SLEEP/LD_WAIT_CONFIG/ needs another discussion even if I am right, > sorry for the confusion. For what is worth, I tested the change with LD_WAIT_CONFIG and it worked as expected. Luis > Oleg. > ---end quoted text---
On 2025-08-01 12:24:29 [+0200], Oleg Nesterov wrote: > s/LD_WAIT_SLEEP/LD_WAIT_CONFIG/ needs another discussion even if I am right, > sorry for the confusion. You are correct Oleg. I've been just verifying it and yes: LD_WAIT_SLEEP suppresses also mutex while the intention is to only suppress spinlock_t. We have four users in tree, based on quick check all four should use CONFIG, three of them do use SLEEP. > Oleg. Sebastian
On 08/11, Sebastian Andrzej Siewior wrote: > > On 2025-08-01 12:24:29 [+0200], Oleg Nesterov wrote: > > s/LD_WAIT_SLEEP/LD_WAIT_CONFIG/ needs another discussion even if I am right, > > sorry for the confusion. > > You are correct Oleg. I've been just verifying it and yes: LD_WAIT_SLEEP > suppresses also mutex while the intention is to only suppress > spinlock_t. Good, thanks. > We have four users in tree, based on quick check all four should use > CONFIG, three of them do use SLEEP. Yes. I'll send the simple patch when this patch from Luis is merged. Oleg
On 2025-08-11 13:06:18 [+0200], Oleg Nesterov wrote: > On 08/11, Sebastian Andrzej Siewior wrote: > > > > On 2025-08-01 12:24:29 [+0200], Oleg Nesterov wrote: > > > s/LD_WAIT_SLEEP/LD_WAIT_CONFIG/ needs another discussion even if I am right, > > > sorry for the confusion. > > > > You are correct Oleg. I've been just verifying it and yes: LD_WAIT_SLEEP > > suppresses also mutex while the intention is to only suppress > > spinlock_t. > > Good, thanks. > > > We have four users in tree, based on quick check all four should use > > CONFIG, three of them do use SLEEP. > > Yes. I'll send the simple patch when this patch from Luis is merged. Okay. Let me cover then the other instances. > Oleg Sebastian
On 08/11, Sebastian Andrzej Siewior wrote: > > On 2025-08-11 13:06:18 [+0200], Oleg Nesterov wrote: > > On 08/11, Sebastian Andrzej Siewior wrote: > > > > > > On 2025-08-01 12:24:29 [+0200], Oleg Nesterov wrote: > > > > s/LD_WAIT_SLEEP/LD_WAIT_CONFIG/ needs another discussion even if I am right, > > > > sorry for the confusion. > > > > > > You are correct Oleg. I've been just verifying it and yes: LD_WAIT_SLEEP > > > suppresses also mutex while the intention is to only suppress > > > spinlock_t. > > > > Good, thanks. > > > > > We have four users in tree, based on quick check all four should use > > > CONFIG, three of them do use SLEEP. > > > > Yes. I'll send the simple patch when this patch from Luis is merged. > > Okay. Let me cover then the other instances. I was going to update them all, but feel free to do. Oleg.
On 2025-08-11 14:19:51 [+0200], Oleg Nesterov wrote: > > > Yes. I'll send the simple patch when this patch from Luis is merged. > > > > Okay. Let me cover then the other instances. > > I was going to update them all, but feel free to do. Oh no. Please do cover them all and I review then. I understood that you make a single patch for a single instance. > Oleg. Sebastian
On 2025-07-07 11:03:59 [-0300], Luis Claudio R. Goncalves wrote: > With PREEMPT_RT enabled, some of the calls to put_task_struct() coming > from rt_mutex_adjust_prio_chain() could happen in preemptible context and > with a mutex enqueued. That could lead to this sequence: > > rt_mutex_adjust_prio_chain() > put_task_struct() > __put_task_struct() > sched_ext_free() > spin_lock_irqsave() > rtlock_lock() ---> TRIGGERS > lockdep_assert(!current->pi_blocked_on); > > This is not a SCHED_EXT bug. The first cleanup function called by > __put_task_struct() is sched_ext_free() and it happens to take a > (RT) spin_lock, which in the scenario described above, would trigger > the lockdep assertion of "!current->pi_blocked_on". > > Crystal Wood was able to identify the problem as __put_task_struct() > being called during rt_mutex_adjust_prio_chain(), in the context of > a process with a mutex enqueued. > > Instead of adding more complex conditions to decide when to directly > call __put_task_struct() and when to defer the call, unconditionally > resort to the deferred call on PREEMPT_RT to simplify the code. > > Suggested-by: Crystal Wood <crwood@redhat.com> > Reviewed-by: Wander Lairson Costa <wander@redhat.com> > Fixes: 893cdaaa3977 ("sched: avoid false lockdep splat in put_task_struct()") > Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Side note: This simplifies the call chain to always free the task struct via RCU. This also means that the stack is not immediately available for recycle (CONFIG_VMAP_STACK) but after the grace period. Based on my testing the new task was also deployed on a remote CPU therefore I wouldn't worry too much here. Just wanted to point the only difference I could come up with. Sebastian
On 07/07/25 11:03, Luis Claudio R. Goncalves wrote: > With PREEMPT_RT enabled, some of the calls to put_task_struct() coming > from rt_mutex_adjust_prio_chain() could happen in preemptible context and > with a mutex enqueued. That could lead to this sequence: > > rt_mutex_adjust_prio_chain() > put_task_struct() > __put_task_struct() > sched_ext_free() > spin_lock_irqsave() > rtlock_lock() ---> TRIGGERS > lockdep_assert(!current->pi_blocked_on); > > This is not a SCHED_EXT bug. The first cleanup function called by > __put_task_struct() is sched_ext_free() and it happens to take a > (RT) spin_lock, which in the scenario described above, would trigger > the lockdep assertion of "!current->pi_blocked_on". > > Crystal Wood was able to identify the problem as __put_task_struct() > being called during rt_mutex_adjust_prio_chain(), in the context of > a process with a mutex enqueued. > > Instead of adding more complex conditions to decide when to directly > call __put_task_struct() and when to defer the call, unconditionally > resort to the deferred call on PREEMPT_RT to simplify the code. > > Suggested-by: Crystal Wood <crwood@redhat.com> > Reviewed-by: Wander Lairson Costa <wander@redhat.com> > Fixes: 893cdaaa3977 ("sched: avoid false lockdep splat in put_task_struct()") > Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> FWIW: Reviewed-by: Valentin Schneider <vschneid@redhat.com>
The following commit has been merged into the sched/core branch of tip:
Commit-ID: 8671bad873ebeb082afcf7b4501395c374da6023
Gitweb: https://git.kernel.org/tip/8671bad873ebeb082afcf7b4501395c374da6023
Author: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
AuthorDate: Mon, 07 Jul 2025 11:03:59 -03:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 14 Jul 2025 17:16:33 +02:00
sched: Do not call __put_task_struct() on rt if pi_blocked_on is set
With PREEMPT_RT enabled, some of the calls to put_task_struct() coming
from rt_mutex_adjust_prio_chain() could happen in preemptible context and
with a mutex enqueued. That could lead to this sequence:
rt_mutex_adjust_prio_chain()
put_task_struct()
__put_task_struct()
sched_ext_free()
spin_lock_irqsave()
rtlock_lock() ---> TRIGGERS
lockdep_assert(!current->pi_blocked_on);
This is not a SCHED_EXT bug. The first cleanup function called by
__put_task_struct() is sched_ext_free() and it happens to take a
(RT) spin_lock, which in the scenario described above, would trigger
the lockdep assertion of "!current->pi_blocked_on".
Crystal Wood was able to identify the problem as __put_task_struct()
being called during rt_mutex_adjust_prio_chain(), in the context of
a process with a mutex enqueued.
Instead of adding more complex conditions to decide when to directly
call __put_task_struct() and when to defer the call, unconditionally
resort to the deferred call on PREEMPT_RT to simplify the code.
Fixes: 893cdaaa3977 ("sched: avoid false lockdep splat in put_task_struct()")
Suggested-by: Crystal Wood <crwood@redhat.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Wander Lairson Costa <wander@redhat.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/aGvTz5VaPFyj0pBV@uudg.org
---
include/linux/sched/task.h | 27 ++++++++++-----------------
1 file changed, 10 insertions(+), 17 deletions(-)
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index c517dbc..ea41795 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -131,24 +131,17 @@ static inline void put_task_struct(struct task_struct *t)
return;
/*
- * In !RT, it is always safe to call __put_task_struct().
- * Under RT, we can only call it in preemptible context.
- */
- if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible()) {
- static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
-
- lock_map_acquire_try(&put_task_map);
- __put_task_struct(t);
- lock_map_release(&put_task_map);
- return;
- }
-
- /*
- * under PREEMPT_RT, we can't call put_task_struct
+ * Under PREEMPT_RT, we can't call __put_task_struct
* in atomic context because it will indirectly
- * acquire sleeping locks.
+ * acquire sleeping locks. The same is true if the
+ * current process has a mutex enqueued (blocked on
+ * a PI chain).
+ *
+ * In !RT, it is always safe to call __put_task_struct().
+ * Though, in order to simplify the code, resort to the
+ * deferred call too.
*
- * call_rcu() will schedule delayed_put_task_struct_rcu()
+ * call_rcu() will schedule __put_task_struct_rcu_cb()
* to be called in process context.
*
* __put_task_struct() is called when
@@ -161,7 +154,7 @@ static inline void put_task_struct(struct task_struct *t)
*
* delayed_free_task() also uses ->rcu, but it is only called
* when it fails to fork a process. Therefore, there is no
- * way it can conflict with put_task_struct().
+ * way it can conflict with __put_task_struct().
*/
call_rcu(&t->rcu, __put_task_struct_rcu_cb);
}
© 2016 - 2025 Red Hat, Inc.