[PATCH 2/3] rcu/exp: Remove confusing needless full barrier on task unblock

Frederic Weisbecker posted 3 patches 10 months, 1 week ago
There is a newer version of this series
[PATCH 2/3] rcu/exp: Remove confusing needless full barrier on task unblock
Posted by Frederic Weisbecker 10 months, 1 week ago
A full memory barrier in the RCU-PREEMPT task unblock path advertises
ordering of the context switch (or rather the accesses prior to
rcu_read_unlock()) against the expedited grace period fastpath.

However the grace period cannot complete without this path calling into
rcu_report_exp_rnp() with the rnp locked. This reports the quiescent
state in a fully ordered fashion against the updater's accesses thanks to:

1) The READ-SIDE smp_mb__after_unlock_lock() barrier across the rnp
   locking while propagating the QS up to the root.

2) The UPDATE-SIDE smp_mb__after_unlock_lock() barrier while holding the
   root rnp lock to wait for/check the GP completion.

3) The (perhaps redundant given steps 1) and 2)) smp_mb() in rcu_seq_end()
   before the grace period completes.

This makes the explicit barrier in this place superfluous. Therefore
remove it, as it is confusing.
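
For illustration, the ordering chain described in 1)-3) boils down to
roughly the following (a simplified sketch, not the literal kernel code;
x, y and r1 are just stand-in variables, x being an access prior to
rcu_read_unlock() and y an updater access after the expedited GP):

	/* Unblocked reader reporting its expedited QS: */
	WRITE_ONCE(x, 1);				 /* before rcu_read_unlock() */
	raw_spin_lock_irqsave_rcu_node(rnp, flags);	 /* lock + smp_mb__after_unlock_lock() */
	/* rcu_report_exp_rnp(): propagate the QS up to the root */
	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);

	/* Expedited grace-period waiter: */
	raw_spin_lock_irqsave_rcu_node(rnp_root, flags); /* lock + smp_mb__after_unlock_lock() */
	/* observe sync_rcu_exp_done(rnp_root) */
	raw_spin_unlock_irqrestore_rcu_node(rnp_root, flags);
	rcu_seq_end(&rcu_state.expedited_sequence);	 /* implies smp_mb() */
	r1 = READ_ONCE(y);				 /* after synchronize_rcu_expedited() */

Each UNLOCK+LOCK pair in this chain is promoted to a full barrier by
smp_mb__after_unlock_lock(), so the access to x is ordered before the
access to y even without the smp_mb() removed here.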

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree_plugin.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 3c0bbbbb686f..d51cc7a5dfc7 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -534,7 +534,6 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
 		WARN_ON_ONCE(rnp->completedqs == rnp->gp_seq &&
 			     (!empty_norm || rnp->qsmask));
 		empty_exp = sync_rcu_exp_done(rnp);
-		smp_mb(); /* ensure expedited fastpath sees end of RCU c-s. */
 		np = rcu_next_node_entry(t, rnp);
 		list_del_init(&t->rcu_node_entry);
 		t->rcu_blocked_node = NULL;
-- 
2.46.0
Re: [PATCH 2/3] rcu/exp: Remove confusing needless full barrier on task unblock
Posted by Joel Fernandes 9 months, 3 weeks ago
On Fri, Feb 14, 2025 at 12:25:58AM +0100, Frederic Weisbecker wrote:
> A full memory barrier in the RCU-PREEMPT task unblock path advertises
> ordering of the context switch (or rather the accesses prior to
> rcu_read_unlock()) against the expedited grace period fastpath.
> 
> However the grace period cannot complete without this path calling into
> rcu_report_exp_rnp() with the rnp locked. This reports the quiescent
> state in a fully ordered fashion against the updater's accesses thanks to:
> 
> 1) The READ-SIDE smp_mb__after_unlock_lock() barrier across the rnp
>    locking while propagating the QS up to the root.
> 
> 2) The UPDATE-SIDE smp_mb__after_unlock_lock() barrier while holding the
>    root rnp lock to wait for/check the GP completion.
> 
> 3) The (perhaps redundant given steps 1) and 2)) smp_mb() in rcu_seq_end()
>    before the grace period completes.
> 
> This makes the explicit barrier in this place superfluous. Therefore
> remove it, as it is confusing.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  kernel/rcu/tree_plugin.h | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 3c0bbbbb686f..d51cc7a5dfc7 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -534,7 +534,6 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
>  		WARN_ON_ONCE(rnp->completedqs == rnp->gp_seq &&
>  			     (!empty_norm || rnp->qsmask));
>  		empty_exp = sync_rcu_exp_done(rnp);
> -		smp_mb(); /* ensure expedited fastpath sees end of RCU c-s. */

I was wondering though, this is a slow path and the smp_mb() has been there
since 2009 or so. Not sure if it is super valuable to remove it at this
point. But we/I should definitely understand it.

I was wondering if you could also point to the fastpath that this is racing
with, it is not immediately clear (to me) what this smp_mb() is pairing with :(

thanks,

 - Joel





>  		np = rcu_next_node_entry(t, rnp);
>  		list_del_init(&t->rcu_node_entry);
>  		t->rcu_blocked_node = NULL;
> -- 
> 2.46.0
>
Re: [PATCH 2/3] rcu/exp: Remove confusing needless full barrier on task unblock
Posted by Frederic Weisbecker 9 months, 3 weeks ago
On Tue, Feb 25, 2025 at 04:59:08PM -0500, Joel Fernandes wrote:
> On Fri, Feb 14, 2025 at 12:25:58AM +0100, Frederic Weisbecker wrote:
> > A full memory barrier in the RCU-PREEMPT task unblock path advertises
> > ordering of the context switch (or rather the accesses prior to
> > rcu_read_unlock()) against the expedited grace period fastpath.
> > 
> > However the grace period cannot complete without this path calling into
> > rcu_report_exp_rnp() with the rnp locked. This reports the quiescent
> > state in a fully ordered fashion against the updater's accesses thanks to:
> > 
> > 1) The READ-SIDE smp_mb__after_unlock_lock() barrier across the rnp
> >    locking while propagating the QS up to the root.
> > 
> > 2) The UPDATE-SIDE smp_mb__after_unlock_lock() barrier while holding the
> >    root rnp lock to wait for/check the GP completion.
> > 
> > 3) The (perhaps redundant given steps 1) and 2)) smp_mb() in rcu_seq_end()
> >    before the grace period completes.
> > 
> > This makes the explicit barrier in this place superfluous. Therefore
> > remove it, as it is confusing.
> > 
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > ---
> >  kernel/rcu/tree_plugin.h | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 3c0bbbbb686f..d51cc7a5dfc7 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -534,7 +534,6 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
> >  		WARN_ON_ONCE(rnp->completedqs == rnp->gp_seq &&
> >  			     (!empty_norm || rnp->qsmask));
> >  		empty_exp = sync_rcu_exp_done(rnp);
> > -		smp_mb(); /* ensure expedited fastpath sees end of RCU c-s. */
> 
> I was wondering though, this is a slow path and the smp_mb() has been there
> since 2009 or so. Not sure if it is super valuable to remove it at this
> point. But we/I should definitely understand it.

The point is indeed not to improve performance, because this is a slowpath
(although...). The main goal is to maintain a clear picture of the ordering
without needless barriers that leave reviewers in doubt.

> I was wondering if you could also point to the fastpath that this is racing
> with, it is not immediately clear (to me) what this smp_mb() is pairing with
> :(

It is supposed to pair with the barrier in sync_exp_work_done() but then again
this is already enforced by the smp_mb__after_unlock_lock() chained through
rnp locking.
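
In sketch form, roughly (simplified; reader_accesses() is just a placeholder
standing for the surrounding code):

	/* Task unblock path (reader side): */
	reader_accesses();		/* accesses from the just-ended RCU read-side c-s */
	rcu_report_exp_rnp(rnp, true);	/* rnp lock + smp_mb__after_unlock_lock() at each
					 * level while propagating the QS up to the root */

	/* Updater side: */
	if (sync_exp_work_done(s))	/* checks the expedited gp_seq, then smp_mb() */
		return;			/* post-GP accesses happen after that smp_mb() */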

Thanks.

> 
> thanks,
> 
>  - Joel
> 
> 
> 
> 
> 
> >  		np = rcu_next_node_entry(t, rnp);
> >  		list_del_init(&t->rcu_node_entry);
> >  		t->rcu_blocked_node = NULL;
> > -- 
> > 2.46.0
> > 
Re: [PATCH 2/3] rcu/exp: Remove confusing needless full barrier on task unblock
Posted by Paul E. McKenney 9 months, 3 weeks ago
On Wed, Feb 26, 2025 at 01:52:09PM +0100, Frederic Weisbecker wrote:
> On Tue, Feb 25, 2025 at 04:59:08PM -0500, Joel Fernandes wrote:
> > On Fri, Feb 14, 2025 at 12:25:58AM +0100, Frederic Weisbecker wrote:
> > > A full memory barrier in the RCU-PREEMPT task unblock path advertises
> > > ordering of the context switch (or rather the accesses prior to
> > > rcu_read_unlock()) against the expedited grace period fastpath.
> > > 
> > > However the grace period cannot complete without this path calling into
> > > rcu_report_exp_rnp() with the rnp locked. This reports the quiescent
> > > state in a fully ordered fashion against the updater's accesses thanks to:
> > > 
> > > 1) The READ-SIDE smp_mb__after_unlock_lock() barrier across the rnp
> > >    locking while propagating the QS up to the root.
> > > 
> > > 2) The UPDATE-SIDE smp_mb__after_unlock_lock() barrier while holding the
> > >    root rnp lock to wait for/check the GP completion.
> > > 
> > > 3) The (perhaps redundant given steps 1) and 2)) smp_mb() in rcu_seq_end()
> > >    before the grace period completes.
> > > 
> > > This makes the explicit barrier in this place superfluous. Therefore
> > > remove it, as it is confusing.
> > > 
> > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > > ---
> > >  kernel/rcu/tree_plugin.h | 1 -
> > >  1 file changed, 1 deletion(-)
> > > 
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index 3c0bbbbb686f..d51cc7a5dfc7 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -534,7 +534,6 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
> > >  		WARN_ON_ONCE(rnp->completedqs == rnp->gp_seq &&
> > >  			     (!empty_norm || rnp->qsmask));
> > >  		empty_exp = sync_rcu_exp_done(rnp);
> > > -		smp_mb(); /* ensure expedited fastpath sees end of RCU c-s. */
> > 
> > I was wondering though, this is a slow path and the smp_mb() has been there
> > since 2009 or so. Not sure if it is super valuable to remove it at this
> > point. But we/I should definitely understand it.
> 
> The point is indeed not to improve performance, because this is a slowpath
> (although...). The main goal is to maintain a clear picture of the ordering
> without needless barriers that leave reviewers in doubt.

Agreed!

> > I was wondering if you could also point to the fastpath that this is racing
> > with, it is not immediately clear (to me) what this smp_mb() is pairing with
> > :(
> 
> It is supposed to pair with the barrier in sync_exp_work_done() but then again
> this is already enforced by the smp_mb__after_unlock_lock() chained through
> rnp locking.

You could interpret that "Order GP completion with preceding accesses"
to include preceding readers, which to your point sounds plausible.
And in that case, again as you say, the raw_spin_lock_irqsave_rcu_node()
in rcu_report_exp_rnp() provides the needed ordering.
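
And for reference, raw_spin_lock_irqsave_rcu_node() is (modulo the
ACCESS_PRIVATE() wrapping) nothing more than:

	raw_spin_lock_irqsave(&rnp->lock, flags);
	smp_mb__after_unlock_lock(); /* promote the prior unlock + this lock to a full barrier */

So every level of expedited QS reporting under the rnp lock already
supplies the ordering that the removed smp_mb() was advertising.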

I think.  ;-)

							Thanx, Paul

> Thanks.
> 
> > 
> > thanks,
> > 
> >  - Joel
> > 
> > 
> > 
> > 
> > 
> > >  		np = rcu_next_node_entry(t, rnp);
> > >  		list_del_init(&t->rcu_node_entry);
> > >  		t->rcu_blocked_node = NULL;
> > > -- 
> > > 2.46.0
> > > 
Re: [PATCH 2/3] rcu/exp: Remove confusing needless full barrier on task unblock
Posted by Joel Fernandes 9 months, 3 weeks ago

On 2/26/2025 10:04 AM, Paul E. McKenney wrote:
>>> I was wondering if you could also point to the fastpath that this is racing
>>> with, it is not immediately clear (to me) what this smp_mb() is pairing with
>>> :(
>> It is supposed to pair with the barrier in sync_exp_work_done() but then again
>> this is already enforced by the smp_mb__after_unlock_lock() chained through
>> rnp locking.
> You could interpret that "Order GP completion with preceding accesses"
> to include preceding readers, which to your point sounds plausible.
> And in that case, again as you say, the raw_spin_lock_irqsave_rcu_node()
> in rcu_report_exp_rnp() provides the needed ordering.
> 
> I think.  ;-)

This is for the case where readers are blocked. For the case where readers were
not blocked, and we exited the RSCS, we just expect the regular QS reporting
paths to call into rcu_report_exp_rnp() and similarly provide the full memory
barrier (smp_mb) on the now-reader-unlocked-CPU right?

Just wanted to check my understanding was correct :)

Also if I may paraphrase the ordering requirement here, we do not want RCU
readers to observe any modifications happening to data after the GP has ended
(i.e. synchronize_rcu_expedited() has returned). Similarly, we want updates
made in the pre-existing readers to be visible to accesses after the GP has
ended. Right?
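
Or in sketch form (just restating the above, with a and b standing for the
data, and assuming the reader is one the expedited GP had to wait for):

	/* Pre-existing reader */
	rcu_read_lock();
	WRITE_ONCE(b, 1);	/* update inside the reader... */
	r0 = READ_ONCE(a);	/* ...and the reader must not see a == 1 */
	rcu_read_unlock();

	/* Updater */
	synchronize_rcu_expedited();
	WRITE_ONCE(a, 1);	/* modification after the GP has ended */
	r1 = READ_ONCE(b);	/* must observe the reader's b == 1 */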

thanks,

 - Joel
Re: [PATCH 2/3] rcu/exp: Remove confusing needless full barrier on task unblock
Posted by Frederic Weisbecker 9 months, 3 weeks ago
On Wed, Feb 26, 2025 at 10:26:34AM -0500, Joel Fernandes wrote:
> 
> 
> On 2/26/2025 10:04 AM, Paul E. McKenney wrote:
> >>> I was wondering if you could also point to the fastpath that this is racing
> >>> with, it is not immediately clear (to me) what this smp_mb() is pairing with
> >>> :(
> >> It is supposed to pair with the barrier in sync_exp_work_done() but then again
> >> this is already enforced by the smp_mb__after_unlock_lock() chained through
> >> rnp locking.
> > You could interpret that "Order GP completion with preceding accesses"
> > to include preceding readers, which to your point sounds plausible.
> > And in that case, again as you say, the raw_spin_lock_irqsave_rcu_node()
> > in rcu_report_exp_rnp() provides the needed ordering.
> > 
> > I think.  ;-)
> 
> This is for the case where readers are blocked. For the case where readers were
> not blocked, and we exited the RSCS, we just expect the regular QS reporting
> paths to call into rcu_report_exp_rnp() and similarly provide the full memory
> barrier (smp_mb) on the now-reader-unlocked-CPU right?

Right, again through rnp locking and smp_mb__after_unlock_lock().
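
That is, for a reader that did not block, roughly (the exact call chain
depends on how the QS ends up being reported):

	rcu_read_unlock();		/* end of the read-side critical section */
	/* ... */
	rcu_report_exp_rdp(rdp);	/* ends up taking the rnp lock via
					 * raw_spin_lock_irqsave_rcu_node(), hence
					 * smp_mb__after_unlock_lock(), then propagates
					 * the expedited QS up to the root */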

> 
> Just wanted to check my understanding was correct :)
> 
> Also if I may paraphrase the ordering requirement here, we do not want RCU
> readers to observe any modifications happening to data after the GP has ended
> (i.e. synchronize_rcu_expedited() has returned). Similarly, we want updates
> made in the pre-existing readers to be visible to accesses after the GP has
> ended. Right?

Exactly!

Thanks.

> 
> thanks,
> 
>  - Joel
Re: [PATCH 2/3] rcu/exp: Remove confusing needless full barrier on task unblock
Posted by Paul E. McKenney 9 months, 3 weeks ago
On Tue, Feb 25, 2025 at 04:59:08PM -0500, Joel Fernandes wrote:
> On Fri, Feb 14, 2025 at 12:25:58AM +0100, Frederic Weisbecker wrote:
> > A full memory barrier in the RCU-PREEMPT task unblock path advertises
> > ordering of the context switch (or rather the accesses prior to
> > rcu_read_unlock()) against the expedited grace period fastpath.
> > 
> > However the grace period cannot complete without this path calling into
> > rcu_report_exp_rnp() with the rnp locked. This reports the quiescent
> > state in a fully ordered fashion against the updater's accesses thanks to:
> > 
> > 1) The READ-SIDE smp_mb__after_unlock_lock() barrier across the rnp
> >    locking while propagating the QS up to the root.
> > 
> > 2) The UPDATE-SIDE smp_mb__after_unlock_lock() barrier while holding the
> >    root rnp lock to wait for/check the GP completion.
> > 
> > 3) The (perhaps redundant given steps 1) and 2)) smp_mb() in rcu_seq_end()
> >    before the grace period completes.
> > 
> > This makes the explicit barrier in this place superfluous. Therefore
> > remove it, as it is confusing.
> > 
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > ---
> >  kernel/rcu/tree_plugin.h | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 3c0bbbbb686f..d51cc7a5dfc7 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -534,7 +534,6 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
> >  		WARN_ON_ONCE(rnp->completedqs == rnp->gp_seq &&
> >  			     (!empty_norm || rnp->qsmask));
> >  		empty_exp = sync_rcu_exp_done(rnp);
> > -		smp_mb(); /* ensure expedited fastpath sees end of RCU c-s. */
> 
> I was wondering though, this is a slow path and the smp_mb() has been there
> since 2009 or so. Not sure if it is super valuable to remove it at this
> point. But we/I should definitely understand it.
> 
> I was wondering if you could also point to the fastpath that this is racing
> with, it is not immediately clear (to me) what this smp_mb() is pairing with :(

My guess was one of the lock acquisitions or dyntick checks in
__sync_rcu_exp_select_node_cpus(), but I am not seeing anything there.
In this context, "fastpath" would be one of the early exits, for example,
the "continue" statements in the second for_each_leaf_node_cpu_mask()
loop.
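
Roughly (paraphrasing from memory, not the exact code; the condition is just
a placeholder), that second loop looks like:

	for_each_leaf_node_cpu_mask(rnp, cpu, rnp->expmask) {
		/* placeholder: snapshot shows the CPU is already idle/quiescent */
		if (cpu_already_quiescent)
			continue;	/* the "fastpath": no IPI needed */
		/* otherwise, IPI the CPU so that it reports its expedited QS */
	}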

But again, I am not seeing anything that appears to need that smp_mb().
As you say, that smp_mb() is not on a fastpath, so we need to check
carefully before removing it.

						Thanx, Paul

> thanks,
> 
>  - Joel
> 
> 
> 
> 
> 
> >  		np = rcu_next_node_entry(t, rnp);
> >  		list_del_init(&t->rcu_node_entry);
> >  		t->rcu_blocked_node = NULL;
> > -- 
> > 2.46.0
> >