RCU stall printout fetches the EQS state of a CPU with a preceding full
memory barrier. However there is nothing to order this read against at
this debugging stage. It is inherently racy when performed remotely.
Do a plain read instead.
This was the last user of rcu_dynticks_snap().
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
kernel/rcu/tree.c | 10 ----------
kernel/rcu/tree_stall.h | 4 ++--
2 files changed, 2 insertions(+), 12 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 02f6f3483482..04dde7473613 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -299,16 +299,6 @@ static void rcu_dynticks_eqs_online(void)
ct_state_inc(RCU_DYNTICKS_IDX);
}
-/*
- * Snapshot the ->dynticks counter with full ordering so as to allow
- * stable comparison of this counter with past and future snapshots.
- */
-static int rcu_dynticks_snap(int cpu)
-{
- smp_mb(); // Fundamental RCU ordering guarantee.
- return ct_dynticks_cpu_acquire(cpu);
-}
-
/*
* Return true if the snapshot returned from rcu_dynticks_snap()
* indicates that RCU is in an extended quiescent state.
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 460efecd077b..4b0e9d7c4c68 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -501,7 +501,7 @@ static void print_cpu_stall_info(int cpu)
}
delta = rcu_seq_ctr(rdp->mynode->gp_seq - rdp->rcu_iw_gp_seq);
falsepositive = rcu_is_gp_kthread_starving(NULL) &&
- rcu_dynticks_in_eqs(rcu_dynticks_snap(cpu));
+ rcu_dynticks_in_eqs(ct_dynticks_cpu(cpu));
rcuc_starved = rcu_is_rcuc_kthread_starving(rdp, &j);
if (rcuc_starved)
// Print signed value, as negative values indicate a probable bug.
@@ -515,7 +515,7 @@ static void print_cpu_stall_info(int cpu)
rdp->rcu_iw_pending ? (int)min(delta, 9UL) + '0' :
"!."[!delta],
ticks_value, ticks_title,
- rcu_dynticks_snap(cpu) & 0xffff,
+ ct_dynticks_cpu(cpu) & 0xffff,
ct_dynticks_nesting_cpu(cpu), ct_dynticks_nmi_nesting_cpu(cpu),
rdp->softirq_snap, kstat_softirqs_cpu(RCU_SOFTIRQ, cpu),
data_race(rcu_state.n_force_qs) - rcu_state.n_force_qs_gpstart,
--
2.44.0
On Wed, May 15, 2024 at 02:53:31PM +0200, Frederic Weisbecker wrote: > RCU stall printout fetches the EQS state of a CPU with a preceding full > memory barrier. However there is nothing to order this read against at > this debugging stage. It is inherently racy when performed remotely. > > Do a plain read instead. > > This was the last user of rcu_dynticks_snap(). > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org> I went through all of these, and the look good. Though I am a bit nervous about this one. The RCU CPU stall warning code used to be completely unordered, but the hardware taught me better. I did not add these in response to a problem (just lazily used the existing fully ordered primitive), but you never know. Me, I would have kept the extra memory barriers in all six patches because they are not on a fastpath, but you are quite correct that they are redundant. So I have queued these, and intend to send them into the next merge window. However, you now own vanilla RCU grace-period memory ordering, both normal and expedited. As in if someone else breaks it, you already bought it. ;-) Thanx, Paul > --- > kernel/rcu/tree.c | 10 ---------- > kernel/rcu/tree_stall.h | 4 ++-- > 2 files changed, 2 insertions(+), 12 deletions(-) > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 02f6f3483482..04dde7473613 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -299,16 +299,6 @@ static void rcu_dynticks_eqs_online(void) > ct_state_inc(RCU_DYNTICKS_IDX); > } > > -/* > - * Snapshot the ->dynticks counter with full ordering so as to allow > - * stable comparison of this counter with past and future snapshots. > - */ > -static int rcu_dynticks_snap(int cpu) > -{ > - smp_mb(); // Fundamental RCU ordering guarantee. > - return ct_dynticks_cpu_acquire(cpu); > -} > - > /* > * Return true if the snapshot returned from rcu_dynticks_snap() > * indicates that RCU is in an extended quiescent state. > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h > index 460efecd077b..4b0e9d7c4c68 100644 > --- a/kernel/rcu/tree_stall.h > +++ b/kernel/rcu/tree_stall.h > @@ -501,7 +501,7 @@ static void print_cpu_stall_info(int cpu) > } > delta = rcu_seq_ctr(rdp->mynode->gp_seq - rdp->rcu_iw_gp_seq); > falsepositive = rcu_is_gp_kthread_starving(NULL) && > - rcu_dynticks_in_eqs(rcu_dynticks_snap(cpu)); > + rcu_dynticks_in_eqs(ct_dynticks_cpu(cpu)); > rcuc_starved = rcu_is_rcuc_kthread_starving(rdp, &j); > if (rcuc_starved) > // Print signed value, as negative values indicate a probable bug. > @@ -515,7 +515,7 @@ static void print_cpu_stall_info(int cpu) > rdp->rcu_iw_pending ? (int)min(delta, 9UL) + '0' : > "!."[!delta], > ticks_value, ticks_title, > - rcu_dynticks_snap(cpu) & 0xffff, > + ct_dynticks_cpu(cpu) & 0xffff, > ct_dynticks_nesting_cpu(cpu), ct_dynticks_nmi_nesting_cpu(cpu), > rdp->softirq_snap, kstat_softirqs_cpu(RCU_SOFTIRQ, cpu), > data_race(rcu_state.n_force_qs) - rcu_state.n_force_qs_gpstart, > -- > 2.44.0 >
Le Mon, Jun 03, 2024 at 05:10:54PM -0700, Paul E. McKenney a écrit : > On Wed, May 15, 2024 at 02:53:31PM +0200, Frederic Weisbecker wrote: > > RCU stall printout fetches the EQS state of a CPU with a preceding full > > memory barrier. However there is nothing to order this read against at > > this debugging stage. It is inherently racy when performed remotely. > > > > Do a plain read instead. > > > > This was the last user of rcu_dynticks_snap(). > > > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org> > > I went through all of these, and the look good. Though I am a bit > nervous about this one. The RCU CPU stall warning code used to be > completely unordered, but the hardware taught me better. I did not > add these in response to a problem (just lazily used the existing fully > ordered primitive), but you never know. At least I haven't found against what it is ordering the dynticks counter here. > Me, I would have kept the extra > memory barriers in all six patches because they are not on a fastpath, It is still time to discard the patches :-) > but you are quite correct that they are redundant. Yes and it's not so much for optimization purpose, like you said it's not a fast-path, although in the case of fqs round scan it _might_ be debatable in the presence of hurry callbacks, but I use those changes more for documentation purpose. My opinion on that being that having memory barriers when they are not necessary doesn't help reviewers and doesn't bring the incentive to actually verify that the ordering is correct when it is really required, since there is so much of it everywhere anyway. I'd rather have a clear, well visible and precise picture. But that's just personal belief. > > So I have queued these, and intend to send them into the next merge > window. However, you now own vanilla RCU grace-period memory ordering, > both normal and expedited. As in if someone else breaks it, you already > bought it. ;-) Sure, but it's a bet. That one day a younger person will buy it from me double the price ;-) Thanks.
On Tue, Jun 04, 2024 at 01:13:25PM +0200, Frederic Weisbecker wrote: > Le Mon, Jun 03, 2024 at 05:10:54PM -0700, Paul E. McKenney a écrit : > > On Wed, May 15, 2024 at 02:53:31PM +0200, Frederic Weisbecker wrote: > > > RCU stall printout fetches the EQS state of a CPU with a preceding full > > > memory barrier. However there is nothing to order this read against at > > > this debugging stage. It is inherently racy when performed remotely. > > > > > > Do a plain read instead. > > > > > > This was the last user of rcu_dynticks_snap(). > > > > > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org> > > > > I went through all of these, and the look good. Though I am a bit > > nervous about this one. The RCU CPU stall warning code used to be > > completely unordered, but the hardware taught me better. I did not > > add these in response to a problem (just lazily used the existing fully > > ordered primitive), but you never know. > > At least I haven't found against what it is ordering the dynticks counter here. > > > Me, I would have kept the extra > > memory barriers in all six patches because they are not on a fastpath, > > It is still time to discard the patches :-) And there is also still time for you to add comments. ;-) > > but you are quite correct that they are redundant. > > Yes and it's not so much for optimization purpose, like you said it's > not a fast-path, although in the case of fqs round scan it _might_ be > debatable in the presence of hurry callbacks, but I use those changes > more for documentation purpose. My opinion on that being that having > memory barriers when they are not necessary doesn't help reviewers and > doesn't bring the incentive to actually verify that the ordering is > correct when it is really required, since there is so much of it > everywhere anyway. I'd rather have a clear, well visible and precise > picture. But that's just personal belief. Redundant memory barriers can be OK, but only if they make the algorithm easier to understand, as we found in SRCU. It is not clear that these fit that bill, or, alternatively, that appropriate comments wouldn't be an improvement over the redundant memory barrier. > > So I have queued these, and intend to send them into the next merge > > window. However, you now own vanilla RCU grace-period memory ordering, > > both normal and expedited. As in if someone else breaks it, you already > > bought it. ;-) > > Sure, but it's a bet. That one day a younger person will buy it from me > double the price ;-) ;-) ;-) ;-) Thanx, Paul
On 15/05/24 14:53, Frederic Weisbecker wrote: > RCU stall printout fetches the EQS state of a CPU with a preceding full > memory barrier. However there is nothing to order this read against at > this debugging stage. It is inherently racy when performed remotely. > > Do a plain read instead. > > This was the last user of rcu_dynticks_snap(). > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com>
© 2016 - 2025 Red Hat, Inc.