[PATCH 5/6] rcu: Remove full memory barrier on RCU stall printout

Frederic Weisbecker posted 6 patches 11 months, 3 weeks ago
[PATCH 5/6] rcu: Remove full memory barrier on RCU stall printout
Posted by Frederic Weisbecker 11 months, 3 weeks ago
RCU stall printout fetches the EQS state of a CPU with a preceding full
memory barrier. However there is nothing to order this read against at
this debugging stage. It is inherently racy when performed remotely.

Do a plain read instead.

This was the last user of rcu_dynticks_snap().

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree.c       | 10 ----------
 kernel/rcu/tree_stall.h |  4 ++--
 2 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 02f6f3483482..04dde7473613 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -299,16 +299,6 @@ static void rcu_dynticks_eqs_online(void)
 	ct_state_inc(RCU_DYNTICKS_IDX);
 }
 
-/*
- * Snapshot the ->dynticks counter with full ordering so as to allow
- * stable comparison of this counter with past and future snapshots.
- */
-static int rcu_dynticks_snap(int cpu)
-{
-	smp_mb();  // Fundamental RCU ordering guarantee.
-	return ct_dynticks_cpu_acquire(cpu);
-}
-
 /*
  * Return true if the snapshot returned from rcu_dynticks_snap()
  * indicates that RCU is in an extended quiescent state.
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 460efecd077b..4b0e9d7c4c68 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -501,7 +501,7 @@ static void print_cpu_stall_info(int cpu)
 	}
 	delta = rcu_seq_ctr(rdp->mynode->gp_seq - rdp->rcu_iw_gp_seq);
 	falsepositive = rcu_is_gp_kthread_starving(NULL) &&
-			rcu_dynticks_in_eqs(rcu_dynticks_snap(cpu));
+			rcu_dynticks_in_eqs(ct_dynticks_cpu(cpu));
 	rcuc_starved = rcu_is_rcuc_kthread_starving(rdp, &j);
 	if (rcuc_starved)
 		// Print signed value, as negative values indicate a probable bug.
@@ -515,7 +515,7 @@ static void print_cpu_stall_info(int cpu)
 			rdp->rcu_iw_pending ? (int)min(delta, 9UL) + '0' :
 				"!."[!delta],
 	       ticks_value, ticks_title,
-	       rcu_dynticks_snap(cpu) & 0xffff,
+	       ct_dynticks_cpu(cpu) & 0xffff,
 	       ct_dynticks_nesting_cpu(cpu), ct_dynticks_nmi_nesting_cpu(cpu),
 	       rdp->softirq_snap, kstat_softirqs_cpu(RCU_SOFTIRQ, cpu),
 	       data_race(rcu_state.n_force_qs) - rcu_state.n_force_qs_gpstart,
-- 
2.44.0
Re: [PATCH 5/6] rcu: Remove full memory barrier on RCU stall printout
Posted by Paul E. McKenney 11 months, 1 week ago
On Wed, May 15, 2024 at 02:53:31PM +0200, Frederic Weisbecker wrote:
> RCU stall printout fetches the EQS state of a CPU with a preceding full
> memory barrier. However there is nothing to order this read against at
> this debugging stage. It is inherently racy when performed remotely.
> 
> Do a plain read instead.
> 
> This was the last user of rcu_dynticks_snap().
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

I went through all of these, and the look good.  Though I am a bit
nervous about this one.  The RCU CPU stall warning code used to be
completely unordered, but the hardware taught me better.  I did not
add these in response to a problem (just lazily used the existing fully
ordered primitive), but you never know.  Me, I would have kept the extra
memory barriers in all six patches because they are not on a fastpath,
but you are quite correct that they are redundant.

So I have queued these, and intend to send them into the next merge
window.  However, you now own vanilla RCU grace-period memory ordering,
both normal and expedited.  As in if someone else breaks it, you already
bought it.  ;-)

							Thanx, Paul

> ---
>  kernel/rcu/tree.c       | 10 ----------
>  kernel/rcu/tree_stall.h |  4 ++--
>  2 files changed, 2 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 02f6f3483482..04dde7473613 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -299,16 +299,6 @@ static void rcu_dynticks_eqs_online(void)
>  	ct_state_inc(RCU_DYNTICKS_IDX);
>  }
>  
> -/*
> - * Snapshot the ->dynticks counter with full ordering so as to allow
> - * stable comparison of this counter with past and future snapshots.
> - */
> -static int rcu_dynticks_snap(int cpu)
> -{
> -	smp_mb();  // Fundamental RCU ordering guarantee.
> -	return ct_dynticks_cpu_acquire(cpu);
> -}
> -
>  /*
>   * Return true if the snapshot returned from rcu_dynticks_snap()
>   * indicates that RCU is in an extended quiescent state.
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index 460efecd077b..4b0e9d7c4c68 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -501,7 +501,7 @@ static void print_cpu_stall_info(int cpu)
>  	}
>  	delta = rcu_seq_ctr(rdp->mynode->gp_seq - rdp->rcu_iw_gp_seq);
>  	falsepositive = rcu_is_gp_kthread_starving(NULL) &&
> -			rcu_dynticks_in_eqs(rcu_dynticks_snap(cpu));
> +			rcu_dynticks_in_eqs(ct_dynticks_cpu(cpu));
>  	rcuc_starved = rcu_is_rcuc_kthread_starving(rdp, &j);
>  	if (rcuc_starved)
>  		// Print signed value, as negative values indicate a probable bug.
> @@ -515,7 +515,7 @@ static void print_cpu_stall_info(int cpu)
>  			rdp->rcu_iw_pending ? (int)min(delta, 9UL) + '0' :
>  				"!."[!delta],
>  	       ticks_value, ticks_title,
> -	       rcu_dynticks_snap(cpu) & 0xffff,
> +	       ct_dynticks_cpu(cpu) & 0xffff,
>  	       ct_dynticks_nesting_cpu(cpu), ct_dynticks_nmi_nesting_cpu(cpu),
>  	       rdp->softirq_snap, kstat_softirqs_cpu(RCU_SOFTIRQ, cpu),
>  	       data_race(rcu_state.n_force_qs) - rcu_state.n_force_qs_gpstart,
> -- 
> 2.44.0
>
Re: [PATCH 5/6] rcu: Remove full memory barrier on RCU stall printout
Posted by Frederic Weisbecker 11 months, 1 week ago
Le Mon, Jun 03, 2024 at 05:10:54PM -0700, Paul E. McKenney a écrit :
> On Wed, May 15, 2024 at 02:53:31PM +0200, Frederic Weisbecker wrote:
> > RCU stall printout fetches the EQS state of a CPU with a preceding full
> > memory barrier. However there is nothing to order this read against at
> > this debugging stage. It is inherently racy when performed remotely.
> > 
> > Do a plain read instead.
> > 
> > This was the last user of rcu_dynticks_snap().
> > 
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> 
> I went through all of these, and the look good.  Though I am a bit
> nervous about this one.  The RCU CPU stall warning code used to be
> completely unordered, but the hardware taught me better.  I did not
> add these in response to a problem (just lazily used the existing fully
> ordered primitive), but you never know.

At least I haven't found against what it is ordering the dynticks counter here.

> Me, I would have kept the extra
> memory barriers in all six patches because they are not on a fastpath,

It is still time to discard the patches :-)

> but you are quite correct that they are redundant.

Yes and it's not so much for optimization purpose, like you said it's
not a fast-path, although in the case of fqs round scan it _might_ be
debatable in the presence of hurry callbacks, but I use those changes
more for documentation purpose. My opinion on that being that having
memory barriers when they are not necessary doesn't help reviewers and
doesn't bring the incentive to actually verify that the ordering is
correct when it is really required, since there is so much of it
everywhere anyway. I'd rather have a clear, well visible and precise
picture. But that's just personal belief.

> 
> So I have queued these, and intend to send them into the next merge
> window.  However, you now own vanilla RCU grace-period memory ordering,
> both normal and expedited.  As in if someone else breaks it, you already
> bought it.  ;-)

Sure, but it's a bet. That one day a younger person will buy it from me
double the price ;-)

Thanks.
Re: [PATCH 5/6] rcu: Remove full memory barrier on RCU stall printout
Posted by Paul E. McKenney 11 months, 1 week ago
On Tue, Jun 04, 2024 at 01:13:25PM +0200, Frederic Weisbecker wrote:
> Le Mon, Jun 03, 2024 at 05:10:54PM -0700, Paul E. McKenney a écrit :
> > On Wed, May 15, 2024 at 02:53:31PM +0200, Frederic Weisbecker wrote:
> > > RCU stall printout fetches the EQS state of a CPU with a preceding full
> > > memory barrier. However there is nothing to order this read against at
> > > this debugging stage. It is inherently racy when performed remotely.
> > > 
> > > Do a plain read instead.
> > > 
> > > This was the last user of rcu_dynticks_snap().
> > > 
> > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > 
> > I went through all of these, and the look good.  Though I am a bit
> > nervous about this one.  The RCU CPU stall warning code used to be
> > completely unordered, but the hardware taught me better.  I did not
> > add these in response to a problem (just lazily used the existing fully
> > ordered primitive), but you never know.
> 
> At least I haven't found against what it is ordering the dynticks counter here.
> 
> > Me, I would have kept the extra
> > memory barriers in all six patches because they are not on a fastpath,
> 
> It is still time to discard the patches :-)

And there is also still time for you to add comments.  ;-)

> > but you are quite correct that they are redundant.
> 
> Yes and it's not so much for optimization purpose, like you said it's
> not a fast-path, although in the case of fqs round scan it _might_ be
> debatable in the presence of hurry callbacks, but I use those changes
> more for documentation purpose. My opinion on that being that having
> memory barriers when they are not necessary doesn't help reviewers and
> doesn't bring the incentive to actually verify that the ordering is
> correct when it is really required, since there is so much of it
> everywhere anyway. I'd rather have a clear, well visible and precise
> picture. But that's just personal belief.

Redundant memory barriers can be OK, but only if they make the algorithm
easier to understand, as we found in SRCU.  It is not clear that these
fit that bill, or, alternatively, that appropriate comments wouldn't be
an improvement over the redundant memory barrier.

> > So I have queued these, and intend to send them into the next merge
> > window.  However, you now own vanilla RCU grace-period memory ordering,
> > both normal and expedited.  As in if someone else breaks it, you already
> > bought it.  ;-)
> 
> Sure, but it's a bet. That one day a younger person will buy it from me
> double the price ;-)

;-) ;-) ;-)

							Thanx, Paul
Re: [PATCH 5/6] rcu: Remove full memory barrier on RCU stall printout
Posted by Valentin Schneider 11 months, 3 weeks ago
On 15/05/24 14:53, Frederic Weisbecker wrote:
> RCU stall printout fetches the EQS state of a CPU with a preceding full
> memory barrier. However there is nothing to order this read against at
> this debugging stage. It is inherently racy when performed remotely.
>
> Do a plain read instead.
>
> This was the last user of rcu_dynticks_snap().
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Reviewed-by: Valentin Schneider <vschneid@redhat.com>