[v1] sched: restore the behavior of put_task_struct() for non-rt

[RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt

Posted by Luis Claudio R. Goncalves 4 months, 3 weeks ago

Commit 8671bad873eb ("sched: Do not call __put_task_struct() on rt
if pi_blocked_on is set") changed the behavior of put_task_struct()
unconditionally, even when PREEMPT_RT was not enabled, in clear mismatch
with the commit description.

Restore the previous behavior of put_task_struct() for the PREEMPT_RT
disabled case.

Fixes: 8671bad873eb ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set")
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
---

  Note: This patch is a fix motivated by Oleg Nesterov's question at
  https://lore.kernel.org/linux-rt-devel/20250728201441.GA4690@redhat.com/

  Also, the kernel test robot reported a problem found in a x86 (32 bit) VM
  test that was bisected to the original fix being amended here:

    https://lkml.org/lkml/2025/9/5/147

  Though I was not able the reproduce the reported problem, this patch here
  would minimize the problem by isolating the behavior to PREEMPT_RT-enabled
  kernels.

diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index ea41795a352b..51678a541477 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -130,6 +133,16 @@ static inline void put_task_struct(struct task_struct *t)
 	if (!refcount_dec_and_test(&t->usage))
 		return;
 
+	/* In !RT, it is always safe to call __put_task_struct(). */
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
+		static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
+
+		lock_map_acquire_try(&put_task_map);
+		__put_task_struct(t);
+		lock_map_release(&put_task_map);
+		return;
+	}
+
 	/*
 	 * Under PREEMPT_RT, we can't call __put_task_struct
 	 * in atomic context because it will indirectly
@@ -137,10 +150,6 @@ static inline void put_task_struct(struct task_struct *t)
 	 * current process has a mutex enqueued (blocked on
 	 * a PI chain).
 	 *
-	 * In !RT, it is always safe to call __put_task_struct().
-	 * Though, in order to simplify the code, resort to the
-	 * deferred call too.
-	 *
 	 * call_rcu() will schedule __put_task_struct_rcu_cb()
 	 * to be called in process context.
 	 *

Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt

Posted by Peter Zijlstra 4 months, 3 weeks ago

On Mon, Sep 15, 2025 at 08:15:19AM -0300, Luis Claudio R. Goncalves wrote:
> Commit 8671bad873eb ("sched: Do not call __put_task_struct() on rt
> if pi_blocked_on is set") changed the behavior of put_task_struct()
> unconditionally, even when PREEMPT_RT was not enabled, in clear mismatch
> with the commit description.
> 
> Restore the previous behavior of put_task_struct() for the PREEMPT_RT
> disabled case.
> 
> Fixes: 8671bad873eb ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set")
> Acked-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
> ---
> 
>   Note: This patch is a fix motivated by Oleg Nesterov's question at
>   https://lore.kernel.org/linux-rt-devel/20250728201441.GA4690@redhat.com/

Right, but I thought we did want to make this behaviour consistent.

Why have !RT behave differently? That is, why isn't this simply a
'buggy' comment/changelog issue?

>   Also, the kernel test robot reported a problem found in a x86 (32 bit) VM
>   test that was bisected to the original fix being amended here:
> 
>     https://lkml.org/lkml/2025/9/5/147
> 
>   Though I was not able the reproduce the reported problem, this patch here
>   would minimize the problem by isolating the behavior to PREEMPT_RT-enabled
>   kernels.

Yeah, robot also had trouble reproducing that if I read that
%reproduction thing right.

And IIRC RT has been running with this for ages, and never seen a
problem.

Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt

Posted by Oleg Nesterov 4 months, 3 weeks ago

On 09/15, Peter Zijlstra wrote:
>
> On Mon, Sep 15, 2025 at 08:15:19AM -0300, Luis Claudio R. Goncalves wrote:
> > Commit 8671bad873eb ("sched: Do not call __put_task_struct() on rt
> > if pi_blocked_on is set") changed the behavior of put_task_struct()
> > unconditionally, even when PREEMPT_RT was not enabled, in clear mismatch
> > with the commit description.
> >
> > Restore the previous behavior of put_task_struct() for the PREEMPT_RT
> > disabled case.
> >
> > Fixes: 8671bad873eb ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set")
> > Acked-by: Oleg Nesterov <oleg@redhat.com>
> > Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
> > ---
> >
> >   Note: This patch is a fix motivated by Oleg Nesterov's question at
> >   https://lore.kernel.org/linux-rt-devel/20250728201441.GA4690@redhat.com/
>
> Right, but I thought we did want to make this behaviour consistent.
>
> Why have !RT behave differently? That is, why isn't this simply a
> 'buggy' comment/changelog issue?

Well, this was discussed several times, in particular see
https://lore.kernel.org/all/CAHk-=whtj+aSYftniMRG4xvFE8dmmYyrqcJyPmzStsfj5w9r=w@mail.gmail.com/

And task_struct->rcu_users was introduced to avoid RCU call in put_task_struct() ...

But I won't really argue if you decide to remove this !RT optimization, although
I think it would be better to do this in a separate patch.

Oleg.

Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt

Posted by Peter Zijlstra 4 months, 3 weeks ago

On Mon, Sep 15, 2025 at 02:35:40PM +0200, Oleg Nesterov wrote:
> On 09/15, Peter Zijlstra wrote:
> >
> > On Mon, Sep 15, 2025 at 08:15:19AM -0300, Luis Claudio R. Goncalves wrote:
> > > Commit 8671bad873eb ("sched: Do not call __put_task_struct() on rt
> > > if pi_blocked_on is set") changed the behavior of put_task_struct()
> > > unconditionally, even when PREEMPT_RT was not enabled, in clear mismatch
> > > with the commit description.
> > >
> > > Restore the previous behavior of put_task_struct() for the PREEMPT_RT
> > > disabled case.
> > >
> > > Fixes: 8671bad873eb ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set")
> > > Acked-by: Oleg Nesterov <oleg@redhat.com>
> > > Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
> > > ---
> > >
> > >   Note: This patch is a fix motivated by Oleg Nesterov's question at
> > >   https://lore.kernel.org/linux-rt-devel/20250728201441.GA4690@redhat.com/
> >
> > Right, but I thought we did want to make this behaviour consistent.
> >
> > Why have !RT behave differently? That is, why isn't this simply a
> > 'buggy' comment/changelog issue?
> 
> Well, this was discussed several times, in particular see
> https://lore.kernel.org/all/CAHk-=whtj+aSYftniMRG4xvFE8dmmYyrqcJyPmzStsfj5w9r=w@mail.gmail.com/
> 
> And task_struct->rcu_users was introduced to avoid RCU call in put_task_struct() ...

Ah, I forgot about that thing.. Although I had vague memories of that
argument on rcu_assign_pointer() vs RCU_INIT_POINTER().

> But I won't really argue if you decide to remove this !RT optimization, although
> I think it would be better to do this in a separate patch.

Right. So when they wanted to remove that preemptible() clause, I was
like why again do we have this !RT exception at all, and can't we get
rid of that.

If git isn't confusing me again, this got merged in this cycle. But so
far no benchmark came and told us this was a bad idea.

So what do we do now... do we restore the !RT exception (so far there
aren't any numbers to suggest this mattered) or do we let it be for a
bit and then go and clean things up?

---
 include/linux/sched.h      |  1 -
 include/linux/sched/task.h | 37 +++----------------------------------
 kernel/bpf/helpers.c       |  6 +++---
 kernel/exit.c              | 21 ++-------------------
 kernel/fork.c              | 14 +++++++++-----
 kernel/sched/core.c        |  3 +--
 6 files changed, 18 insertions(+), 64 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 644a01bdae70..fd6586c04ed3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1576,7 +1576,6 @@ struct task_struct {
 # endif
 #endif
 	struct rcu_head			rcu;
-	refcount_t			rcu_users;
 	int				pagefault_disabled;
 #ifdef CONFIG_MMU
 	struct task_struct		*oom_reaper_list;
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index ea41795a352b..1125c44b205a 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -122,41 +122,12 @@ static inline struct task_struct *tryget_task_struct(struct task_struct *t)
 	return refcount_inc_not_zero(&t->usage) ? t : NULL;
 }
 
-extern void __put_task_struct(struct task_struct *t);
 extern void __put_task_struct_rcu_cb(struct rcu_head *rhp);
 
 static inline void put_task_struct(struct task_struct *t)
 {
-	if (!refcount_dec_and_test(&t->usage))
-		return;
-
-	/*
-	 * Under PREEMPT_RT, we can't call __put_task_struct
-	 * in atomic context because it will indirectly
-	 * acquire sleeping locks. The same is true if the
-	 * current process has a mutex enqueued (blocked on
-	 * a PI chain).
-	 *
-	 * In !RT, it is always safe to call __put_task_struct().
-	 * Though, in order to simplify the code, resort to the
-	 * deferred call too.
-	 *
-	 * call_rcu() will schedule __put_task_struct_rcu_cb()
-	 * to be called in process context.
-	 *
-	 * __put_task_struct() is called when
-	 * refcount_dec_and_test(&t->usage) succeeds.
-	 *
-	 * This means that it can't "conflict" with
-	 * put_task_struct_rcu_user() which abuses ->rcu the same
-	 * way; rcu_users has a reference so task->usage can't be
-	 * zero after rcu_users 1 -> 0 transition.
-	 *
-	 * delayed_free_task() also uses ->rcu, but it is only called
-	 * when it fails to fork a process. Therefore, there is no
-	 * way it can conflict with __put_task_struct().
-	 */
-	call_rcu(&t->rcu, __put_task_struct_rcu_cb);
+	if (refcount_dec_and_test(&t->usage))
+		call_rcu(&t->rcu, __put_task_struct_rcu_cb);
 }
 
 DEFINE_FREE(put_task, struct task_struct *, if (_T) put_task_struct(_T))
@@ -164,11 +135,9 @@ DEFINE_FREE(put_task, struct task_struct *, if (_T) put_task_struct(_T))
 static inline void put_task_struct_many(struct task_struct *t, int nr)
 {
 	if (refcount_sub_and_test(nr, &t->usage))
-		__put_task_struct(t);
+		call_rcu(&t->rcu, __put_task_struct_rcu_cb);
 }
 
-void put_task_struct_rcu_user(struct task_struct *task);
-
 /* Free all architecture-specific resources held by a thread. */
 void release_thread(struct task_struct *dead_task);
 
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 8af62cb243d9..bdc4e65bca5c 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2459,7 +2459,7 @@ __bpf_kfunc struct bpf_rb_node *bpf_rbtree_right(struct bpf_rb_root *root, struc
  */
 __bpf_kfunc struct task_struct *bpf_task_acquire(struct task_struct *p)
 {
-	if (refcount_inc_not_zero(&p->rcu_users))
+	if (refcount_inc_not_zero(&p->usage))
 		return p;
 	return NULL;
 }
@@ -2470,12 +2470,12 @@ __bpf_kfunc struct task_struct *bpf_task_acquire(struct task_struct *p)
  */
 __bpf_kfunc void bpf_task_release(struct task_struct *p)
 {
-	put_task_struct_rcu_user(p);
+	put_task_struct(p);
 }
 
 __bpf_kfunc void bpf_task_release_dtor(void *p)
 {
-	put_task_struct_rcu_user(p);
+	put_task_struct(p);
 }
 CFI_NOSEAL(bpf_task_release_dtor);
 
diff --git a/kernel/exit.c b/kernel/exit.c
index 343eb97543d5..240a6faa0e26 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -220,23 +220,6 @@ static void __exit_signal(struct release_task_post *post, struct task_struct *ts
 		tty_kref_put(tty);
 }
 
-static void delayed_put_task_struct(struct rcu_head *rhp)
-{
-	struct task_struct *tsk = container_of(rhp, struct task_struct, rcu);
-
-	kprobe_flush_task(tsk);
-	rethook_flush_task(tsk);
-	perf_event_delayed_put(tsk);
-	trace_sched_process_free(tsk);
-	put_task_struct(tsk);
-}
-
-void put_task_struct_rcu_user(struct task_struct *task)
-{
-	if (refcount_dec_and_test(&task->rcu_users))
-		call_rcu(&task->rcu, delayed_put_task_struct);
-}
-
 void __weak release_thread(struct task_struct *dead_task)
 {
 }
@@ -305,7 +288,7 @@ void release_task(struct task_struct *p)
 	if (thread_group_leader(p))
 		flush_sigqueue(&p->signal->shared_pending);
 
-	put_task_struct_rcu_user(p);
+	put_task_struct(p);
 
 	p = leader;
 	if (unlikely(zap_leader))
@@ -1057,7 +1040,7 @@ void __noreturn make_task_dead(int signr)
 		pr_alert("Fixing recursive fault but reboot is needed!\n");
 		futex_exit_recursive(tsk);
 		tsk->exit_state = EXIT_DEAD;
-		refcount_inc(&tsk->rcu_users);
+		get_task_struct(tsk);
 		do_task_dead();
 	}
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 33dfb82af25b..383c811626a3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -106,6 +106,7 @@
 #include <linux/pidfs.h>
 #include <linux/tick.h>
 #include <linux/unwind_deferred.h>
+#include <linux/kprobes.h>
 
 #include <asm/pgalloc.h>
 #include <linux/uaccess.h>
@@ -729,12 +730,17 @@ static inline void put_signal_struct(struct signal_struct *sig)
 		free_signal_struct(sig);
 }
 
-void __put_task_struct(struct task_struct *tsk)
+static void __put_task_struct(struct task_struct *tsk)
 {
 	WARN_ON(!tsk->exit_state);
 	WARN_ON(refcount_read(&tsk->usage));
 	WARN_ON(tsk == current);
 
+	kprobe_flush_task(tsk);
+	rethook_flush_task(tsk);
+	perf_event_delayed_put(tsk);
+	trace_sched_process_free(tsk);
+
 	unwind_task_free(tsk);
 	sched_ext_free(tsk);
 	io_uring_free(tsk);
@@ -915,12 +921,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 	dup_user_cpus_ptr(tsk, orig, node);
 
 	/*
-	 * One for the user space visible state that goes away when reaped.
+	 * One for user space visible state that goes away when reaped.
 	 * One for the scheduler.
 	 */
-	refcount_set(&tsk->rcu_users, 2);
-	/* One for the rcu users */
-	refcount_set(&tsk->usage, 1);
+	refcount_set(&tsk->usage, 2);
 #ifdef CONFIG_BLK_DEV_IO_TRACE
 	tsk->btrace_seq = 0;
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index da2062de97a2..ad3c0e9f3eb4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5259,8 +5259,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 
 		/* Task is done with its stack. */
 		put_task_stack(prev);
-
-		put_task_struct_rcu_user(prev);
+		put_task_struct(prev);
 	}
 
 	return rq;

Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt

Posted by Oleg Nesterov 4 months, 3 weeks ago

On 09/16, Peter Zijlstra wrote:
>
> On Mon, Sep 15, 2025 at 02:35:40PM +0200, Oleg Nesterov wrote:
> > On 09/15, Peter Zijlstra wrote:
> > >
> > > Why have !RT behave differently? That is, why isn't this simply a
> > > 'buggy' comment/changelog issue?
> >
> > Well, this was discussed several times, in particular see
> > https://lore.kernel.org/all/CAHk-=whtj+aSYftniMRG4xvFE8dmmYyrqcJyPmzStsfj5w9r=w@mail.gmail.com/
> >
> > And task_struct->rcu_users was introduced to avoid RCU call in put_task_struct() ...
>
> Ah, I forgot about that thing.. Although I had vague memories of that
> argument on rcu_assign_pointer() vs RCU_INIT_POINTER().
>
> > But I won't really argue if you decide to remove this !RT optimization, although
> > I think it would be better to do this in a separate patch.
>
> Right. So when they wanted to remove that preemptible() clause, I was
> like why again do we have this !RT exception at all, and can't we get
> rid of that.
>
> If git isn't confusing me again, this got merged in this cycle. But so
> far no benchmark came and told us this was a bad idea.

I still think it would be safer to merge this patch from Luis before
v6.17, then possibly remove this special case in a separate patch...

> So what do we do now... do we restore the !RT exception (so far there
> aren't any numbers to suggest this mattered) or do we let it be for a
> bit and then go and clean things up?

It is not that simple. Please note that put_task_struct_rcu_user()
delays put(tsk->usage), not free(tsk).

So for example with this change

> @@ -305,7 +288,7 @@ void release_task(struct task_struct *p)
>  	if (thread_group_leader(p))
>  		flush_sigqueue(&p->signal->shared_pending);
>
> -	put_task_struct_rcu_user(p);
> +	put_task_struct(p);
>
>  	p = leader;
>  	if (unlikely(zap_leader))

This code

	rcu_read_lock();
	tsk = find_task_by_vpid(...);
	if (tsk)
		get_task_struct(tsk);
	rcu_read_unlock();

becomes wrong, get_task_struct(tsk) can increment tsk->usage == 0.

Oleg.

Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt

Posted by Oleg Nesterov 3 months, 3 weeks ago

Peter,

So I think that delayed_put_task_struct() has to stay and this means
that we have 2 RCU grace periods in the "typical" case when the exiting
task without extra references is reaped (or autoreaped if a subthread).

As I said I won't really argue, but what is your final thought on this?
Should we forget this patch from Luis? I'd prefer to restore the !RT
exception...

And another question. Sasha and Greg are going to add 8671bad873eb
("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set")
to -stable. Do you think it is fine to backport this commit as is?

Oleg.

On 09/16, Oleg Nesterov wrote:
>
> On 09/16, Peter Zijlstra wrote:
> >
> > On Mon, Sep 15, 2025 at 02:35:40PM +0200, Oleg Nesterov wrote:
> > > On 09/15, Peter Zijlstra wrote:
> > > >
> > > > Why have !RT behave differently? That is, why isn't this simply a
> > > > 'buggy' comment/changelog issue?
> > >
> > > Well, this was discussed several times, in particular see
> > > https://lore.kernel.org/all/CAHk-=whtj+aSYftniMRG4xvFE8dmmYyrqcJyPmzStsfj5w9r=w@mail.gmail.com/
> > >
> > > And task_struct->rcu_users was introduced to avoid RCU call in put_task_struct() ...
> >
> > Ah, I forgot about that thing.. Although I had vague memories of that
> > argument on rcu_assign_pointer() vs RCU_INIT_POINTER().
> >
> > > But I won't really argue if you decide to remove this !RT optimization, although
> > > I think it would be better to do this in a separate patch.
> >
> > Right. So when they wanted to remove that preemptible() clause, I was
> > like why again do we have this !RT exception at all, and can't we get
> > rid of that.
> >
> > If git isn't confusing me again, this got merged in this cycle. But so
> > far no benchmark came and told us this was a bad idea.
>
> I still think it would be safer to merge this patch from Luis before
> v6.17, then possibly remove this special case in a separate patch...
>
> > So what do we do now... do we restore the !RT exception (so far there
> > aren't any numbers to suggest this mattered) or do we let it be for a
> > bit and then go and clean things up?
>
> It is not that simple. Please note that put_task_struct_rcu_user()
> delays put(tsk->usage), not free(tsk).
>
> So for example with this change
>
> > @@ -305,7 +288,7 @@ void release_task(struct task_struct *p)
> >  	if (thread_group_leader(p))
> >  		flush_sigqueue(&p->signal->shared_pending);
> >
> > -	put_task_struct_rcu_user(p);
> > +	put_task_struct(p);
> >
> >  	p = leader;
> >  	if (unlikely(zap_leader))
>
> This code
>
> 	rcu_read_lock();
> 	tsk = find_task_by_vpid(...);
> 	if (tsk)
> 		get_task_struct(tsk);
> 	rcu_read_unlock();
>
> becomes wrong, get_task_struct(tsk) can increment tsk->usage == 0.
>
> Oleg.

Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt

Posted by Luis Claudio R. Goncalves 3 months, 3 weeks ago

On Fri, Oct 17, 2025 at 04:39:45PM +0200, Oleg Nesterov wrote:
> Peter,
> 
> So I think that delayed_put_task_struct() has to stay and this means
> that we have 2 RCU grace periods in the "typical" case when the exiting
> task without extra references is reaped (or autoreaped if a subthread).
> 
> As I said I won't really argue, but what is your final thought on this?
> Should we forget this patch from Luis? I'd prefer to restore the !RT
> exception...
> 
> And another question. Sasha and Greg are going to add 8671bad873eb
> ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set")
> to -stable. Do you think it is fine to backport this commit as is?

I am not aware of any specific RCU fixes that need to be present in order
for commit 8671bad873eb to work properly on older kernels. That was my initial
fear when Sasha and Greg mentioned the backport to Stable in August. The
reference I have is that the deferred __put_task_struct() calls have been
working for RT on older kernels for quite a while.

Anyway, I asked them to hold because this discussion here was still happening.

If that helps to get the ball moving and sounds reasonable to you all, I could
propose a STABLE-only patch that isolates the change to RT. Does that make
sense?

Luis

> Oleg.
> 
> On 09/16, Oleg Nesterov wrote:
> >
> > On 09/16, Peter Zijlstra wrote:
> > >
> > > On Mon, Sep 15, 2025 at 02:35:40PM +0200, Oleg Nesterov wrote:
> > > > On 09/15, Peter Zijlstra wrote:
> > > > >
> > > > > Why have !RT behave differently? That is, why isn't this simply a
> > > > > 'buggy' comment/changelog issue?
> > > >
> > > > Well, this was discussed several times, in particular see
> > > > https://lore.kernel.org/all/CAHk-=whtj+aSYftniMRG4xvFE8dmmYyrqcJyPmzStsfj5w9r=w@mail.gmail.com/
> > > >
> > > > And task_struct->rcu_users was introduced to avoid RCU call in put_task_struct() ...
> > >
> > > Ah, I forgot about that thing.. Although I had vague memories of that
> > > argument on rcu_assign_pointer() vs RCU_INIT_POINTER().
> > >
> > > > But I won't really argue if you decide to remove this !RT optimization, although
> > > > I think it would be better to do this in a separate patch.
> > >
> > > Right. So when they wanted to remove that preemptible() clause, I was
> > > like why again do we have this !RT exception at all, and can't we get
> > > rid of that.
> > >
> > > If git isn't confusing me again, this got merged in this cycle. But so
> > > far no benchmark came and told us this was a bad idea.
> >
> > I still think it would be safer to merge this patch from Luis before
> > v6.17, then possibly remove this special case in a separate patch...
> >
> > > So what do we do now... do we restore the !RT exception (so far there
> > > aren't any numbers to suggest this mattered) or do we let it be for a
> > > bit and then go and clean things up?
> >
> > It is not that simple. Please note that put_task_struct_rcu_user()
> > delays put(tsk->usage), not free(tsk).
> >
> > So for example with this change
> >
> > > @@ -305,7 +288,7 @@ void release_task(struct task_struct *p)
> > >  	if (thread_group_leader(p))
> > >  		flush_sigqueue(&p->signal->shared_pending);
> > >
> > > -	put_task_struct_rcu_user(p);
> > > +	put_task_struct(p);
> > >
> > >  	p = leader;
> > >  	if (unlikely(zap_leader))
> >
> > This code
> >
> > 	rcu_read_lock();
> > 	tsk = find_task_by_vpid(...);
> > 	if (tsk)
> > 		get_task_struct(tsk);
> > 	rcu_read_unlock();
> >
> > becomes wrong, get_task_struct(tsk) can increment tsk->usage == 0.
> >
> > Oleg.
> 
---end quoted text---

Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt

Posted by Sebastian Andrzej Siewior 4 months, 3 weeks ago

On 2025-09-15 13:38:12 [+0200], Peter Zijlstra wrote:
> Right, but I thought we did want to make this behaviour consistent.

That is correct, that is what I asked for (either consistent or a
compelling reason why not).
Oleg pointed out that the patch description does not match the change.
That is the only complaint.

> And IIRC RT has been running with this for ages, and never seen a
> problem.

The syz-bot picked up RT recently. Other than that, yes, RT had it for
ages.

Sebastian

Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt

Posted by Luis Claudio R. Goncalves 4 months, 3 weeks ago

On Mon, Sep 15, 2025 at 02:24:44PM +0200, Sebastian Andrzej Siewior wrote:
> On 2025-09-15 13:38:12 [+0200], Peter Zijlstra wrote:
> > Right, but I thought we did want to make this behaviour consistent.
> 
> That is correct, that is what I asked for (either consistent or a
> compelling reason why not).

I received a ping from colleagues investigating a problem possible caused
by excessive pressure on RCU and this change could make the problem worse.
But last I heard from them, sounds like the problem they are investigating
lies elsewhere.

That said, I don't have problems either way, limiting the change to RT or
keeping it general. I will follow whatever guidance I get from you all.

Luis

> Oleg pointed out that the patch description does not match the change.
> That is the only complaint.
> 
> > And IIRC RT has been running with this for ages, and never seen a
> > problem.
> 
> The syz-bot picked up RT recently. Other than that, yes, RT had it for
> ages.
> 
> Sebastian
> 
---end quoted text---

Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt

Posted by Sebastian Andrzej Siewior 4 months, 3 weeks ago

On 2025-09-15 11:49:37 [-0300], Luis Claudio R. Goncalves wrote:
> On Mon, Sep 15, 2025 at 02:24:44PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2025-09-15 13:38:12 [+0200], Peter Zijlstra wrote:
> > > Right, but I thought we did want to make this behaviour consistent.
> > 
> > That is correct, that is what I asked for (either consistent or a
> > compelling reason why not).
> 
> I received a ping from colleagues investigating a problem possible caused
> by excessive pressure on RCU and this change could make the problem worse.
> But last I heard from them, sounds like the problem they are investigating
> lies elsewhere.

Either way, if this change introduces a problem then it affects RT as
today. So we should deal with this. If it is not a problem and it just
put pressure on an existing problem then it shouldn't be an issue.

> Luis

Sebastian