[v1] sched: restore the behavior of put_task_struct() for non-rt

[RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt

Posted by Luis Claudio R. Goncalves 6 months ago

Commit 8671bad873eb ("sched: Do not call __put_task_struct() on rt
if pi_blocked_on is set") changed the behavior of put_task_struct()
unconditionally, even when PREEMPT_RT was not enabled, in clear mismatch
with the commit description.

Restore the previous behavior of put_task_struct() for the PREEMPT_RT
disabled case.

Fixes: 8671bad873eb ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set")
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
---
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index ea41795a352b..51678a541477 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -130,6 +133,16 @@ static inline void put_task_struct(struct task_struct *t)
 	if (!refcount_dec_and_test(&t->usage))
 		return;
 
+	/* In !RT, it is always safe to call __put_task_struct(). */
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
+		static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
+
+		lock_map_acquire_try(&put_task_map);
+		__put_task_struct(t);
+		lock_map_release(&put_task_map);
+		return;
+	}
+
 	/*
 	 * Under PREEMPT_RT, we can't call __put_task_struct
 	 * in atomic context because it will indirectly
@@ -137,10 +150,6 @@ static inline void put_task_struct(struct task_struct *t)
 	 * current process has a mutex enqueued (blocked on
 	 * a PI chain).
 	 *
-	 * In !RT, it is always safe to call __put_task_struct().
-	 * Though, in order to simplify the code, resort to the
-	 * deferred call too.
-	 *
 	 * call_rcu() will schedule __put_task_struct_rcu_cb()
 	 * to be called in process context.
 	 *

Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt

Posted by Sebastian Andrzej Siewior 6 months ago

On 2025-08-06 16:43:55 [-0300], Luis Claudio R. Goncalves wrote:
> Commit 8671bad873eb ("sched: Do not call __put_task_struct() on rt
> if pi_blocked_on is set") changed the behavior of put_task_struct()
> unconditionally, even when PREEMPT_RT was not enabled, in clear mismatch
> with the commit description.
> 
> Restore the previous behavior of put_task_struct() for the PREEMPT_RT
> disabled case.
> 
> Fixes: 8671bad873eb ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set")
> Acked-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
> ---
> diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
> index ea41795a352b..51678a541477 100644
> --- a/include/linux/sched/task.h
> +++ b/include/linux/sched/task.h
> @@ -130,6 +133,16 @@ static inline void put_task_struct(struct task_struct *t)
>  	if (!refcount_dec_and_test(&t->usage))
>  		return;
>  
> +	/* In !RT, it is always safe to call __put_task_struct(). */

I don't want to drag this but this comment is obvious for anyone who is
fluent in C. It is just a statement with no explanation.
An important note would be that the atomic context restriction only
apply to PREEMPT_RT and therefore we have this context override for
lockdep below. The other question would be why don't we do this
unconditionally regardless of PREEMPT_RT. The only reason I could find
is that releasing the task here from the "exit path" makes the vmap
stack "earlier" available for reuse.

> +	if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
> +		static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
> +
> +		lock_map_acquire_try(&put_task_map);
> +		__put_task_struct(t);
> +		lock_map_release(&put_task_map);
> +		return;
> +	}
> +
>  	/*
>  	 * Under PREEMPT_RT, we can't call __put_task_struct
>  	 * in atomic context because it will indirectly
> @@ -137,10 +150,6 @@ static inline void put_task_struct(struct task_struct *t)
>  	 * current process has a mutex enqueued (blocked on
>  	 * a PI chain).
>  	 *
> -	 * In !RT, it is always safe to call __put_task_struct().
> -	 * Though, in order to simplify the code, resort to the
> -	 * deferred call too.
> -	 *
>  	 * call_rcu() will schedule __put_task_struct_rcu_cb()
>  	 * to be called in process context.
>  	 *
> 

Sebastian

Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt

Posted by Oleg Nesterov 6 months ago

On 08/11, Sebastian Andrzej Siewior wrote:
>
> I don't want to drag this but this comment is obvious for anyone who is
> fluent in C. It is just a statement with no explanation.
> An important note would be that the atomic context restriction only
> apply to PREEMPT_RT and therefore we have this context override for
> lockdep below. The other question would be why don't we do this
> unconditionally regardless of PREEMPT_RT. The only reason I could find
> is that releasing the task here from the "exit path" makes the vmap
> stack "earlier" available for reuse.

Sorry, could you clarify your "other" question?

What exactly do you think we could do regardless of PREEMPT_RT?

Oleg.

> 
> > +	if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
> > +		static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
> > +
> > +		lock_map_acquire_try(&put_task_map);
> > +		__put_task_struct(t);
> > +		lock_map_release(&put_task_map);
> > +		return;
> > +	}
> > +
> >  	/*
> >  	 * Under PREEMPT_RT, we can't call __put_task_struct
> >  	 * in atomic context because it will indirectly
> > @@ -137,10 +150,6 @@ static inline void put_task_struct(struct task_struct *t)
> >  	 * current process has a mutex enqueued (blocked on
> >  	 * a PI chain).
> >  	 *
> > -	 * In !RT, it is always safe to call __put_task_struct().
> > -	 * Though, in order to simplify the code, resort to the
> > -	 * deferred call too.
> > -	 *
> >  	 * call_rcu() will schedule __put_task_struct_rcu_cb()
> >  	 * to be called in process context.
> >  	 *
> > 
> 
> Sebastian
>

Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt

Posted by Sebastian Andrzej Siewior 6 months ago

On 2025-08-11 12:40:34 [+0200], Oleg Nesterov wrote:
> On 08/11, Sebastian Andrzej Siewior wrote:
> >
> > I don't want to drag this but this comment is obvious for anyone who is
> > fluent in C. It is just a statement with no explanation.
> > An important note would be that the atomic context restriction only
> > apply to PREEMPT_RT and therefore we have this context override for
> > lockdep below. The other question would be why don't we do this
> > unconditionally regardless of PREEMPT_RT. The only reason I could find
> > is that releasing the task here from the "exit path" makes the vmap
> > stack "earlier" available for reuse.
> 
> Sorry, could you clarify your "other" question?
> 
> What exactly do you think we could do regardless of PREEMPT_RT?

Do what we do now and have one free path for task_struct regardless if
PREEMPT_RT is enabled or not. The one via RCU which delays the freeing
until after the grace period.

> Oleg.

Sebastian

Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt

Posted by Oleg Nesterov 6 months ago

On 08/11, Sebastian Andrzej Siewior wrote:
>
> On 2025-08-11 12:40:34 [+0200], Oleg Nesterov wrote:
> >
> > What exactly do you think we could do regardless of PREEMPT_RT?
>
> Do what we do now and have one free path for task_struct regardless if
> PREEMPT_RT is enabled or not. The one via RCU which delays the freeing
> until after the grace period.

Ah, got it. I won't really argue, but...

call_rcu() is not free, it obviously delays free_task/etc. To me this
!PREEMPT_RT optimization makes sense.

But lets forget it for the moment. This patch

	https://lore.kernel.org/all/aGvTz5VaPFyj0pBV@uudg.org
	[PATCH v6] sched: do not call __put_task_struct() on rt if pi_blocked_on is set

removed that optimization by mistake, this doesn't even match the changelog.
I think it should be restored, and this is what the new patch from Luis does.

Then we can discuss this all again and possibly remove it, but this
should be explicitly documented in the changelog.

Oleg.

Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt

Posted by Sebastian Andrzej Siewior 6 months ago

On 2025-08-11 13:21:20 [+0200], Oleg Nesterov wrote:
> On 08/11, Sebastian Andrzej Siewior wrote:
> >
> > On 2025-08-11 12:40:34 [+0200], Oleg Nesterov wrote:
> > >
> > > What exactly do you think we could do regardless of PREEMPT_RT?
> >
> > Do what we do now and have one free path for task_struct regardless if
> > PREEMPT_RT is enabled or not. The one via RCU which delays the freeing
> > until after the grace period.
> 
> Ah, got it. I won't really argue, but...
> 
> call_rcu() is not free, it obviously delays free_task/etc. To me this
> !PREEMPT_RT optimization makes sense.
> 
> But lets forget it for the moment. This patch
> 
> 	https://lore.kernel.org/all/aGvTz5VaPFyj0pBV@uudg.org
> 	[PATCH v6] sched: do not call __put_task_struct() on rt if pi_blocked_on is set
> 
> removed that optimization by mistake, this doesn't even match the changelog.
> I think it should be restored, and this is what the new patch from Luis does.
> 
> Then we can discuss this all again and possibly remove it, but this
> should be explicitly documented in the changelog.

Certainly. I am not saying we should keep it as is. The added comment
appears wrong but I am all for getting this merged and then sorting out
later.
Thank you for spotting this ;)

> Oleg.

Sebastian