[PATCH 25/48] rcu: Mark writes to rcu_sync ->gp_count field

Uladzislau Rezki (Sony) posted 48 patches 1 year, 7 months ago
[PATCH 25/48] rcu: Mark writes to rcu_sync ->gp_count field
Posted by Uladzislau Rezki (Sony) 1 year, 7 months ago
From: "Paul E. McKenney" <paulmck@kernel.org>

The rcu_sync structure's ->gp_count field is updated under the protection
of ->rss_lock, but read locklessly, and KCSAN noted the data race.
This commit therefore uses WRITE_ONCE() to do this update to clearly
document its racy nature.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/sync.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
index 86df878a2fee..6c2bd9001adc 100644
--- a/kernel/rcu/sync.c
+++ b/kernel/rcu/sync.c
@@ -122,7 +122,7 @@ void rcu_sync_enter(struct rcu_sync *rsp)
 		 * we are called at early boot time but this shouldn't happen.
 		 */
 	}
-	rsp->gp_count++;
+	WRITE_ONCE(rsp->gp_count, rsp->gp_count + 1);
 	spin_unlock_irq(&rsp->rss_lock);
 
 	if (gp_state == GP_IDLE) {
@@ -151,11 +151,15 @@ void rcu_sync_enter(struct rcu_sync *rsp)
  */
 void rcu_sync_exit(struct rcu_sync *rsp)
 {
+	int gpc;
+
 	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_IDLE);
 	WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0);
 
 	spin_lock_irq(&rsp->rss_lock);
-	if (!--rsp->gp_count) {
+	gpc = rsp->gp_count - 1;
+	WRITE_ONCE(rsp->gp_count, gpc);
+	if (!gpc) {
 		if (rsp->gp_state == GP_PASSED) {
 			WRITE_ONCE(rsp->gp_state, GP_EXIT);
 			rcu_sync_call(rsp);
-- 
2.39.2
Re: [PATCH 25/48] rcu: Mark writes to rcu_sync ->gp_count field
Posted by Oleg Nesterov 1 year, 7 months ago
Hello,

I feel I don't really like this patch but I am travelling without my working
laptop, can't read the source code ;) Quite possibly I am wrong, I'll return
to this when I get back on May 10.

Oleg.

On 05/07, Uladzislau Rezki (Sony) wrote:
>
> From: "Paul E. McKenney" <paulmck@kernel.org>
>
> The rcu_sync structure's ->gp_count field is updated under the protection
> of ->rss_lock, but read locklessly, and KCSAN noted the data race.
> This commit therefore uses WRITE_ONCE() to do this update to clearly
> document its racy nature.
>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
>  kernel/rcu/sync.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
> index 86df878a2fee..6c2bd9001adc 100644
> --- a/kernel/rcu/sync.c
> +++ b/kernel/rcu/sync.c
> @@ -122,7 +122,7 @@ void rcu_sync_enter(struct rcu_sync *rsp)
>  		 * we are called at early boot time but this shouldn't happen.
>  		 */
>  	}
> -	rsp->gp_count++;
> +	WRITE_ONCE(rsp->gp_count, rsp->gp_count + 1);
>  	spin_unlock_irq(&rsp->rss_lock);
>
>  	if (gp_state == GP_IDLE) {
> @@ -151,11 +151,15 @@ void rcu_sync_enter(struct rcu_sync *rsp)
>   */
>  void rcu_sync_exit(struct rcu_sync *rsp)
>  {
> +	int gpc;
> +
>  	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_IDLE);
>  	WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0);
>
>  	spin_lock_irq(&rsp->rss_lock);
> -	if (!--rsp->gp_count) {
> +	gpc = rsp->gp_count - 1;
> +	WRITE_ONCE(rsp->gp_count, gpc);
> +	if (!gpc) {
>  		if (rsp->gp_state == GP_PASSED) {
>  			WRITE_ONCE(rsp->gp_state, GP_EXIT);
>  			rcu_sync_call(rsp);
> --
> 2.39.2
>
Re: [PATCH 25/48] rcu: Mark writes to rcu_sync ->gp_count field
Posted by Paul E. McKenney 1 year, 7 months ago
On Tue, May 07, 2024 at 10:54:41AM -0400, Oleg Nesterov wrote:
> Hello,
> 
> I feel I don't really like this patch but I am travelling without my working
> laptop, can't read the source code ;) Quite possibly I am wrong, I'll return
> to this when I get back on May 10.

By the stricter data-race rules used in RCU code [1], this is a bug that
needs to be fixed.  This code is updating ->gp_count, which is read
locklessly, which in turn results in a data race.  The fix is to mark
the updates (as below) with WRITE_ONCE().

Or is there something in one or the other of these updates to ->gp_count
that excludes lockless readers?  (I am not seeing it, but you know this
code way better than I do!)

							Thanx, Paul

[1] https://docs.google.com/document/d/1FwZaXSg3A55ivVoWffA9iMuhJ3_Gmj_E494dLYjjyLQ/edit?usp=sharing

> Oleg.
> 
> On 05/07, Uladzislau Rezki (Sony) wrote:
> >
> > From: "Paul E. McKenney" <paulmck@kernel.org>
> >
> > The rcu_sync structure's ->gp_count field is updated under the protection
> > of ->rss_lock, but read locklessly, and KCSAN noted the data race.
> > This commit therefore uses WRITE_ONCE() to do this update to clearly
> > document its racy nature.
> >
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > Cc: Oleg Nesterov <oleg@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > ---
> >  kernel/rcu/sync.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
> > index 86df878a2fee..6c2bd9001adc 100644
> > --- a/kernel/rcu/sync.c
> > +++ b/kernel/rcu/sync.c
> > @@ -122,7 +122,7 @@ void rcu_sync_enter(struct rcu_sync *rsp)
> >  		 * we are called at early boot time but this shouldn't happen.
> >  		 */
> >  	}
> > -	rsp->gp_count++;
> > +	WRITE_ONCE(rsp->gp_count, rsp->gp_count + 1);
> >  	spin_unlock_irq(&rsp->rss_lock);
> >
> >  	if (gp_state == GP_IDLE) {
> > @@ -151,11 +151,15 @@ void rcu_sync_enter(struct rcu_sync *rsp)
> >   */
> >  void rcu_sync_exit(struct rcu_sync *rsp)
> >  {
> > +	int gpc;
> > +
> >  	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_IDLE);
> >  	WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0);
> >
> >  	spin_lock_irq(&rsp->rss_lock);
> > -	if (!--rsp->gp_count) {
> > +	gpc = rsp->gp_count - 1;
> > +	WRITE_ONCE(rsp->gp_count, gpc);
> > +	if (!gpc) {
> >  		if (rsp->gp_state == GP_PASSED) {
> >  			WRITE_ONCE(rsp->gp_state, GP_EXIT);
> >  			rcu_sync_call(rsp);
> > --
> > 2.39.2
> >
>
Re: [PATCH 25/48] rcu: Mark writes to rcu_sync ->gp_count field
Posted by Oleg Nesterov 1 year, 7 months ago
On 05/07, Paul E. McKenney wrote:
>
> By the stricter data-race rules used in RCU code [1],
...
> [1] https://docs.google.com/document/d/1FwZaXSg3A55ivVoWffA9iMuhJ3_Gmj_E494dLYjjyLQ/edit?usp=sharing

I am getting more and more confused...

Does this mean that KCSAN/etc treats the files in kernel/rcu/
differently than the "Rest of Kernel"? Or what?

And how is it enforced?

Oleg.
Re: [PATCH 25/48] rcu: Mark writes to rcu_sync ->gp_count field
Posted by Paul E. McKenney 1 year, 7 months ago
On Fri, May 10, 2024 at 03:18:50PM +0200, Oleg Nesterov wrote:
> On 05/07, Paul E. McKenney wrote:
> >
> > By the stricter data-race rules used in RCU code [1],
> ...
> > [1] https://docs.google.com/document/d/1FwZaXSg3A55ivVoWffA9iMuhJ3_Gmj_E494dLYjjyLQ/edit?usp=sharing
> 
> I am getting more and more confused...
> 
> Does this mean that KCSAN/etc treats the files in kernel/rcu/
> differently than the "Rest of Kernel"? Or what?

Yes.

> And how is it enforced?

By me running rcutorture with KCSAN with the Kconfig options listed in
that docusment.

							Thanx, Paul
Re: [PATCH 25/48] rcu: Mark writes to rcu_sync ->gp_count field
Posted by Oleg Nesterov 1 year, 7 months ago
On 05/10, Oleg Nesterov wrote:
>
> On 05/07, Paul E. McKenney wrote:
> >
> > By the stricter data-race rules used in RCU code [1],
> ...
> > [1] https://docs.google.com/document/d/1FwZaXSg3A55ivVoWffA9iMuhJ3_Gmj_E494dLYjjyLQ/edit?usp=sharing
>
> I am getting more and more confused...
>
> Does this mean that KCSAN/etc treats the files in kernel/rcu/
> differently than the "Rest of Kernel"? Or what?
>
> And how is it enforced?

I can only find the strnstr(buf, "rcu") checks in skip_report(),
but they only cover the KCSAN_REPORT_VALUE_CHANGE_ONLY case...

Oleg.
Re: [PATCH 25/48] rcu: Mark writes to rcu_sync ->gp_count field
Posted by Paul E. McKenney 1 year, 7 months ago
On Fri, May 10, 2024 at 03:50:57PM +0200, Oleg Nesterov wrote:
> On 05/10, Oleg Nesterov wrote:
> >
> > On 05/07, Paul E. McKenney wrote:
> > >
> > > By the stricter data-race rules used in RCU code [1],
> > ...
> > > [1] https://docs.google.com/document/d/1FwZaXSg3A55ivVoWffA9iMuhJ3_Gmj_E494dLYjjyLQ/edit?usp=sharing
> >
> > I am getting more and more confused...
> >
> > Does this mean that KCSAN/etc treats the files in kernel/rcu/
> > differently than the "Rest of Kernel"? Or what?
> >
> > And how is it enforced?
> 
> I can only find the strnstr(buf, "rcu") checks in skip_report(),
> but they only cover the KCSAN_REPORT_VALUE_CHANGE_ONLY case...

Huh, new one on me!  When I run KCSAN, I set CONFIG_KCSAN_STRICT=y,
which implies CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n, which should
prevent skip_report() from even being invoked.

Which suggests that in the rest of the kernel, including "rcu_"
in your function name gets you stricter KCSAN checking.  ;-)

							Thanx, Paul
Re: [PATCH 25/48] rcu: Mark writes to rcu_sync ->gp_count field
Posted by Marco Elver 1 year, 7 months ago
On Fri, 10 May 2024 at 16:11, Paul E. McKenney <paulmck@kernel.org> wrote:
[...]
> > > Does this mean that KCSAN/etc treats the files in kernel/rcu/
> > > differently than the "Rest of Kernel"? Or what?
> > >
> > > And how is it enforced?
> >
> > I can only find the strnstr(buf, "rcu") checks in skip_report(),
> > but they only cover the KCSAN_REPORT_VALUE_CHANGE_ONLY case...
>
> Huh, new one on me!  When I run KCSAN, I set CONFIG_KCSAN_STRICT=y,
> which implies CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n, which should
> prevent skip_report() from even being invoked.

The strnstr hack goes back to the first version of KCSAN released in
v5.8 [1]. It was added in response to Paul wanting the "stricter"
treatment for RCU even while KCSAN was still in development, and back
then syzbot was the first test system using KCSAN. Shortly after Paul
added a KCSAN config for rcutorture with a laundry list of options to
make KCSAN "strict" (before we eventually added CONFIG_KCSAN_STRICT
which greatly simplified that).

While the strnstr(.., "rcu") rules are redundant when using the
stricter rules (at least CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n is
set), we're keeping the "rcu" special case around because there are
test robots and fuzzers that use the default KCSAN config (unlike
rcutorture). And because we know that RCU likes the stricter rules,
the "value change only" filter is ignored in all KCSAN configs for
rcu-related functions.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/kcsan/report.c?id=v5.8

Back then syzbot occasionally reported RCU data races, but these days
rcutorture probably finds all of them before syzbot ever gets its
hands on new code.

Thanks,
-- Marco
Re: [PATCH 25/48] rcu: Mark writes to rcu_sync ->gp_count field
Posted by Paul E. McKenney 1 year, 7 months ago
On Mon, May 13, 2024 at 04:13:35PM +0200, Marco Elver wrote:
> On Fri, 10 May 2024 at 16:11, Paul E. McKenney <paulmck@kernel.org> wrote:
> [...]
> > > > Does this mean that KCSAN/etc treats the files in kernel/rcu/
> > > > differently than the "Rest of Kernel"? Or what?
> > > >
> > > > And how is it enforced?
> > >
> > > I can only find the strnstr(buf, "rcu") checks in skip_report(),
> > > but they only cover the KCSAN_REPORT_VALUE_CHANGE_ONLY case...
> >
> > Huh, new one on me!  When I run KCSAN, I set CONFIG_KCSAN_STRICT=y,
> > which implies CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n, which should
> > prevent skip_report() from even being invoked.
> 
> The strnstr hack goes back to the first version of KCSAN released in
> v5.8 [1]. It was added in response to Paul wanting the "stricter"
> treatment for RCU even while KCSAN was still in development, and back
> then syzbot was the first test system using KCSAN. Shortly after Paul
> added a KCSAN config for rcutorture with a laundry list of options to
> make KCSAN "strict" (before we eventually added CONFIG_KCSAN_STRICT
> which greatly simplified that).
> 
> While the strnstr(.., "rcu") rules are redundant when using the
> stricter rules (at least CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n is
> set), we're keeping the "rcu" special case around because there are
> test robots and fuzzers that use the default KCSAN config (unlike
> rcutorture). And because we know that RCU likes the stricter rules,
> the "value change only" filter is ignored in all KCSAN configs for
> rcu-related functions.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/kcsan/report.c?id=v5.8

Thank you for the background information!

> Back then syzbot occasionally reported RCU data races, but these days
> rcutorture probably finds all of them before syzbot ever gets its
> hands on new code.

I do try my best.  ;-)

							Thanx, Paul
Re: [PATCH 25/48] rcu: Mark writes to rcu_sync ->gp_count field
Posted by Oleg Nesterov 1 year, 7 months ago
Sorry for another delay...

On 05/10, Paul E. McKenney wrote:
>
> On Fri, May 10, 2024 at 03:50:57PM +0200, Oleg Nesterov wrote:
> >
> > I can only find the strnstr(buf, "rcu") checks in skip_report(),
> > but they only cover the KCSAN_REPORT_VALUE_CHANGE_ONLY case...
>
> Huh, new one on me!  When I run KCSAN, I set CONFIG_KCSAN_STRICT=y,
> which implies CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n, which should
> prevent skip_report() from even being invoked.
>
> Which suggests that in the rest of the kernel, including "rcu_"
> in your function name gets you stricter KCSAN checking.  ;-)

Yes.

And that is why I was very confused. I misinterpreted the "stricter
data-race rules used in RCU code" as if there must be more "rcu-only"
hacks in the kernel/kcsan/ code which I can't find ;)

Oleg.
Re: [PATCH 25/48] rcu: Mark writes to rcu_sync ->gp_count field
Posted by Oleg Nesterov 1 year, 7 months ago
On 05/07, Paul E. McKenney wrote:
>
> On Tue, May 07, 2024 at 10:54:41AM -0400, Oleg Nesterov wrote:
> > Hello,
> >
> > I feel I don't really like this patch but I am travelling without my working
> > laptop, can't read the source code ;) Quite possibly I am wrong, I'll return
> > to this when I get back on May 10.
>
> By the stricter data-race rules used in RCU code [1], this is a bug that
> needs to be fixed.

Now that I can read the code... Sorry, still can't understand.

> which is read locklessly,

Where???

OK, OK, we have

	// rcu_sync_exit()
	WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0)

and

	// rcu_sync_dtor()
	WARN_ON_ONCE(READ_ONCE(rsp->gp_count));

other than that ->gp_count is always accessed under ->rss_lock.

And yes, at least WARN_ON_ONCE() in rcu_sync_exit() can obviously race with
rcu_sync_enter/exit which update gp_count. I think this is fine correctness-wise.

But OK, we need to please KCSAN (or is there another problem I missed ???)

We can move these WARN_ON()'s into the ->rss_lock protected section.

Or perhaps we can use data_race(rsp->gp_count) ? To be honest I thought
that READ_ONCE() should be enough...

Or we can simply kill these WARN_ON_ONCE()'s.

I don't understand why should we add more WRITE_ONCE()'s into the critical
section protected by ->rss_lock.

Help! ;)

Oleg.


which in turn results in a data race.  The fix is to mark
> the updates (as below) with WRITE_ONCE().
>
> Or is there something in one or the other of these updates to ->gp_count
> that excludes lockless readers?  (I am not seeing it, but you know this
> code way better than I do!)
>
> 							Thanx, Paul
>
> [1] https://docs.google.com/document/d/1FwZaXSg3A55ivVoWffA9iMuhJ3_Gmj_E494dLYjjyLQ/edit?usp=sharing
>
> > Oleg.
> >
> > On 05/07, Uladzislau Rezki (Sony) wrote:
> > >
> > > From: "Paul E. McKenney" <paulmck@kernel.org>
> > >
> > > The rcu_sync structure's ->gp_count field is updated under the protection
> > > of ->rss_lock, but read locklessly, and KCSAN noted the data race.
> > > This commit therefore uses WRITE_ONCE() to do this update to clearly
> > > document its racy nature.
> > >
> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > Cc: Oleg Nesterov <oleg@redhat.com>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > ---
> > >  kernel/rcu/sync.c | 8 ++++++--
> > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
> > > index 86df878a2fee..6c2bd9001adc 100644
> > > --- a/kernel/rcu/sync.c
> > > +++ b/kernel/rcu/sync.c
> > > @@ -122,7 +122,7 @@ void rcu_sync_enter(struct rcu_sync *rsp)
> > >  		 * we are called at early boot time but this shouldn't happen.
> > >  		 */
> > >  	}
> > > -	rsp->gp_count++;
> > > +	WRITE_ONCE(rsp->gp_count, rsp->gp_count + 1);
> > >  	spin_unlock_irq(&rsp->rss_lock);
> > >
> > >  	if (gp_state == GP_IDLE) {
> > > @@ -151,11 +151,15 @@ void rcu_sync_enter(struct rcu_sync *rsp)
> > >   */
> > >  void rcu_sync_exit(struct rcu_sync *rsp)
> > >  {
> > > +	int gpc;
> > > +
> > >  	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_IDLE);
> > >  	WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0);
> > >
> > >  	spin_lock_irq(&rsp->rss_lock);
> > > -	if (!--rsp->gp_count) {
> > > +	gpc = rsp->gp_count - 1;
> > > +	WRITE_ONCE(rsp->gp_count, gpc);
> > > +	if (!gpc) {
> > >  		if (rsp->gp_state == GP_PASSED) {
> > >  			WRITE_ONCE(rsp->gp_state, GP_EXIT);
> > >  			rcu_sync_call(rsp);
> > > --
> > > 2.39.2
> > >
> >
>
Re: [PATCH 25/48] rcu: Mark writes to rcu_sync ->gp_count field
Posted by Paul E. McKenney 1 year, 7 months ago
On Thu, May 09, 2024 at 05:13:12PM +0200, Oleg Nesterov wrote:
> On 05/07, Paul E. McKenney wrote:
> >
> > On Tue, May 07, 2024 at 10:54:41AM -0400, Oleg Nesterov wrote:
> > > Hello,
> > >
> > > I feel I don't really like this patch but I am travelling without my working
> > > laptop, can't read the source code ;) Quite possibly I am wrong, I'll return
> > > to this when I get back on May 10.
> >
> > By the stricter data-race rules used in RCU code [1], this is a bug that
> > needs to be fixed.
> 
> Now that I can read the code... Sorry, still can't understand.
> 
> > which is read locklessly,
> 
> Where???
> 
> OK, OK, we have
> 
> 	// rcu_sync_exit()
> 	WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0)
> 
> and
> 
> 	// rcu_sync_dtor()
> 	WARN_ON_ONCE(READ_ONCE(rsp->gp_count));
> 
> other than that ->gp_count is always accessed under ->rss_lock.
> 
> And yes, at least WARN_ON_ONCE() in rcu_sync_exit() can obviously race with
> rcu_sync_enter/exit which update gp_count. I think this is fine correctness-wise.
> 
> But OK, we need to please KCSAN (or is there another problem I missed ???)
> 
> We can move these WARN_ON()'s into the ->rss_lock protected section.
> 
> Or perhaps we can use data_race(rsp->gp_count) ? To be honest I thought
> that READ_ONCE() should be enough...
> 
> Or we can simply kill these WARN_ON_ONCE()'s.

Or we could move those WARN_ON_ONCE() under the lock.  If this would
be a lock-contention issue, we could condition them with something like
IS_ENABLED(CONFIG_PROVE_RCU).  Then all accesses to those variables would
always be protected by the lock, and the WRITE_ONCE() and READ_ONCE()
calls could be dropped.  (Or am I missing another lockless access?)

Which would have the further advantage that if anyone accessed these
without holding the lock, KCSAN would complain.

> I don't understand why should we add more WRITE_ONCE()'s into the critical
> section protected by ->rss_lock.

There are indeed several ways to fix this.  Which would you prefer?

> Help! ;)

;-) ;-) ;-)

							Thanx, Paul

> Oleg.
> 
> 
> which in turn results in a data race.  The fix is to mark
> > the updates (as below) with WRITE_ONCE().
> >
> > Or is there something in one or the other of these updates to ->gp_count
> > that excludes lockless readers?  (I am not seeing it, but you know this
> > code way better than I do!)
> >
> > 							Thanx, Paul
> >
> > [1] https://docs.google.com/document/d/1FwZaXSg3A55ivVoWffA9iMuhJ3_Gmj_E494dLYjjyLQ/edit?usp=sharing
> >
> > > Oleg.
> > >
> > > On 05/07, Uladzislau Rezki (Sony) wrote:
> > > >
> > > > From: "Paul E. McKenney" <paulmck@kernel.org>
> > > >
> > > > The rcu_sync structure's ->gp_count field is updated under the protection
> > > > of ->rss_lock, but read locklessly, and KCSAN noted the data race.
> > > > This commit therefore uses WRITE_ONCE() to do this update to clearly
> > > > document its racy nature.
> > > >
> > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > > Cc: Oleg Nesterov <oleg@redhat.com>
> > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > ---
> > > >  kernel/rcu/sync.c | 8 ++++++--
> > > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
> > > > index 86df878a2fee..6c2bd9001adc 100644
> > > > --- a/kernel/rcu/sync.c
> > > > +++ b/kernel/rcu/sync.c
> > > > @@ -122,7 +122,7 @@ void rcu_sync_enter(struct rcu_sync *rsp)
> > > >  		 * we are called at early boot time but this shouldn't happen.
> > > >  		 */
> > > >  	}
> > > > -	rsp->gp_count++;
> > > > +	WRITE_ONCE(rsp->gp_count, rsp->gp_count + 1);
> > > >  	spin_unlock_irq(&rsp->rss_lock);
> > > >
> > > >  	if (gp_state == GP_IDLE) {
> > > > @@ -151,11 +151,15 @@ void rcu_sync_enter(struct rcu_sync *rsp)
> > > >   */
> > > >  void rcu_sync_exit(struct rcu_sync *rsp)
> > > >  {
> > > > +	int gpc;
> > > > +
> > > >  	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_IDLE);
> > > >  	WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0);
> > > >
> > > >  	spin_lock_irq(&rsp->rss_lock);
> > > > -	if (!--rsp->gp_count) {
> > > > +	gpc = rsp->gp_count - 1;
> > > > +	WRITE_ONCE(rsp->gp_count, gpc);
> > > > +	if (!gpc) {
> > > >  		if (rsp->gp_state == GP_PASSED) {
> > > >  			WRITE_ONCE(rsp->gp_state, GP_EXIT);
> > > >  			rcu_sync_call(rsp);
> > > > --
> > > > 2.39.2
> > > >
> > >
> >
>
Re: [PATCH 25/48] rcu: Mark writes to rcu_sync ->gp_count field
Posted by Oleg Nesterov 1 year, 7 months ago
On 05/09, Paul E. McKenney wrote:
>
> On Thu, May 09, 2024 at 05:13:12PM +0200, Oleg Nesterov wrote:
> >
> > We can move these WARN_ON()'s into the ->rss_lock protected section.
> >
> > Or perhaps we can use data_race(rsp->gp_count) ? To be honest I thought
> > that READ_ONCE() should be enough...
> >
> > Or we can simply kill these WARN_ON_ONCE()'s.
>
> Or we could move those WARN_ON_ONCE() under the lock.

Sure, see above.

But could you help me to understand this magic? I naively thought
that READ_ONCE() is always "safe"...

So, unless I am totally confused it turns out that, say,

	CPU 0			CPU 1
	-----			-----

	spin_lock(LOCK);
	++X;			READ_ONCE(X); // data race
	spin_unlock(LOCK);

is data-racy accoring to KCSAN, while

	CPU 0			CPU 1
	-----			-----

	spin_lock(LOCK);
	WRITE_ONCE(X, X+1);	READ_ONCE(X); // no data race
	spin_unlock(LOCK);

is not.

Why is that?

Trying to read Documentation/dev-tools/kcsan.rst... it says

	KCSAN is aware of *marked atomic operations* (``READ_ONCE``, WRITE_ONCE``,

	...

	if all accesses to a variable that is accessed concurrently are properly
	marked, KCSAN will never trigger a watchpoint

but how can KCSAN detect that all accesses to X are properly marked? I see nothing
KCSAN-related in the definition of WRITE_ONCE() or READ_ONCE().

And what does the "all accesses" above actually mean? The 2nd version does

	WRITE_ONCE(X, X+1);

but "X + 1" is the plain/unmarked access?

Thanks,

Oleg.
Re: [PATCH 25/48] rcu: Mark writes to rcu_sync ->gp_count field
Posted by Paul E. McKenney 1 year, 7 months ago
On Fri, May 10, 2024 at 01:31:49PM +0200, Oleg Nesterov wrote:
> On 05/09, Paul E. McKenney wrote:
> >
> > On Thu, May 09, 2024 at 05:13:12PM +0200, Oleg Nesterov wrote:
> > >
> > > We can move these WARN_ON()'s into the ->rss_lock protected section.
> > >
> > > Or perhaps we can use data_race(rsp->gp_count) ? To be honest I thought
> > > that READ_ONCE() should be enough...
> > >
> > > Or we can simply kill these WARN_ON_ONCE()'s.
> >
> > Or we could move those WARN_ON_ONCE() under the lock.
> 
> Sure, see above.
> 
> But could you help me to understand this magic? I naively thought
> that READ_ONCE() is always "safe"...
> 
> So, unless I am totally confused it turns out that, say,
> 
> 	CPU 0			CPU 1
> 	-----			-----
> 
> 	spin_lock(LOCK);
> 	++X;			READ_ONCE(X); // data race
> 	spin_unlock(LOCK);
> 
> is data-racy accoring to KCSAN, while
> 
> 	CPU 0			CPU 1
> 	-----			-----
> 
> 	spin_lock(LOCK);
> 	WRITE_ONCE(X, X+1);	READ_ONCE(X); // no data race
> 	spin_unlock(LOCK);
> 
> is not.

Agreed, in RCU code.

> Why is that?

Because I run KCSAN on RCU using Kconfig options that cause KCSAN
to be more strict.

> Trying to read Documentation/dev-tools/kcsan.rst... it says
> 
> 	KCSAN is aware of *marked atomic operations* (``READ_ONCE``, WRITE_ONCE``,
> 
> 	...
> 
> 	if all accesses to a variable that is accessed concurrently are properly
> 	marked, KCSAN will never trigger a watchpoint
> 
> but how can KCSAN detect that all accesses to X are properly marked? I see nothing
> KCSAN-related in the definition of WRITE_ONCE() or READ_ONCE().

The trick is that KCSAN sees the volatile cast that both READ_ONCE()
and WRITE_ONCE() use.

> And what does the "all accesses" above actually mean? The 2nd version does
> 
> 	WRITE_ONCE(X, X+1);
> 
> but "X + 1" is the plain/unmarked access?

That would be the correct usage in RCU code if there were lockless
accesses to X, which would use READ_ONCE(), but a lock was held across
that WRITE_ONCE() such that there would be no concurrent updates of X.
In that case, the "X+1" cannot be involved in a data race, so KCSAN
won't complain.

But if all accesses to X were protected by an exclusive lock, then there
would be no data races involving X, and thus no marking of any accesses
to X.  Which would allow KCSAN to detect buggy lockless accesses to X.

							Thanx, Paul
Re: [PATCH 25/48] rcu: Mark writes to rcu_sync ->gp_count field
Posted by Oleg Nesterov 1 year, 7 months ago
On 05/10, Paul E. McKenney wrote:
>
> On Fri, May 10, 2024 at 01:31:49PM +0200, Oleg Nesterov wrote:
>
> > Why is that?
>
> Because I run KCSAN on RCU using Kconfig options that cause KCSAN
> to be more strict.

Yes, I see now.

> > but how can KCSAN detect that all accesses to X are properly marked? I see nothing
> > KCSAN-related in the definition of WRITE_ONCE() or READ_ONCE().
>
> The trick is that KCSAN sees the volatile cast that both READ_ONCE()
> and WRITE_ONCE() use.

Hmm. grep-grep-grep... I seem to understand, DEFINE_TSAN_VOLATILE_READ_WRITE.
So __tsan_volatile_readX() will use KCSAN_ACCESS_ATOMIC.

Thanks!

> > And what does the "all accesses" above actually mean? The 2nd version does
> >
> > 	WRITE_ONCE(X, X+1);
> >
> > but "X + 1" is the plain/unmarked access?
>
> ...
>
> In that case, the "X+1" cannot be involved in a data race, so KCSAN
> won't complain.

Yes, yes, I understand now.

Paul, thanks for your explanations! and sorry for wasting your time by
provoking the unnecessarily long discussion.

I am going to send the trivial patch which moves these WARN_ON()'s under
spin_lock(), this looks more clean to me. But I won't argue if you prefer
your original patch.

Oleg.
Re: [PATCH 25/48] rcu: Mark writes to rcu_sync ->gp_count field
Posted by Paul E. McKenney 1 year, 7 months ago
On Sun, May 12, 2024 at 12:53:06PM +0200, Oleg Nesterov wrote:
> On 05/10, Paul E. McKenney wrote:
> >
> > On Fri, May 10, 2024 at 01:31:49PM +0200, Oleg Nesterov wrote:
> >
> > > Why is that?
> >
> > Because I run KCSAN on RCU using Kconfig options that cause KCSAN
> > to be more strict.
> 
> Yes, I see now.
> 
> > > but how can KCSAN detect that all accesses to X are properly marked? I see nothing
> > > KCSAN-related in the definition of WRITE_ONCE() or READ_ONCE().
> >
> > The trick is that KCSAN sees the volatile cast that both READ_ONCE()
> > and WRITE_ONCE() use.
> 
> Hmm. grep-grep-grep... I seem to understand, DEFINE_TSAN_VOLATILE_READ_WRITE.
> So __tsan_volatile_readX() will use KCSAN_ACCESS_ATOMIC.
> 
> Thanks!

You got it!!!

> > > And what does the "all accesses" above actually mean? The 2nd version does
> > >
> > > 	WRITE_ONCE(X, X+1);
> > >
> > > but "X + 1" is the plain/unmarked access?
> >
> > ...
> >
> > In that case, the "X+1" cannot be involved in a data race, so KCSAN
> > won't complain.
> 
> Yes, yes, I understand now.
> 
> Paul, thanks for your explanations! and sorry for wasting your time by
> provoking the unnecessarily long discussion.

Not a problem and absolutely no need to apologize!  Of course, please do
pass this information on to anyone else needing it.

> I am going to send the trivial patch which moves these WARN_ON()'s under
> spin_lock(), this looks more clean to me. But I won't argue if you prefer
> your original patch.

Actually, I like your patch quite a bit better than I do my original.
In fact, I feel quite foolish that I did not think of this to begin with.
With your way, we have strict locking for that field and can therefore
just use plain C-language accesses for all accesses to it.  KCSAN will
then warn us of any buggy lockless access to that field, even if that
buggy access uses READ_ONCE().  Much much better your way!!!

So thank you for persisting on this!

							Thanx, Paul
Re: [PATCH 25/48] rcu: Mark writes to rcu_sync ->gp_count field
Posted by Alan Huang 1 year, 7 months ago
> 2024年5月10日 19:31,Oleg Nesterov <oleg@redhat.com> wrote:
> 
> On 05/09, Paul E. McKenney wrote:
>> 
>> On Thu, May 09, 2024 at 05:13:12PM +0200, Oleg Nesterov wrote:
>>> 
>>> We can move these WARN_ON()'s into the ->rss_lock protected section.
>>> 
>>> Or perhaps we can use data_race(rsp->gp_count) ? To be honest I thought
>>> that READ_ONCE() should be enough...
>>> 
>>> Or we can simply kill these WARN_ON_ONCE()'s.
>> 
>> Or we could move those WARN_ON_ONCE() under the lock.
> 
> Sure, see above.
> 
> But could you help me to understand this magic? I naively thought
> that READ_ONCE() is always "safe"...
> 
> So, unless I am totally confused it turns out that, say,
> 
> CPU 0 CPU 1
> ----- -----
> 
> spin_lock(LOCK);
> ++X; READ_ONCE(X); // data race
> spin_unlock(LOCK);
> 
> is data-racy accoring to KCSAN, while
> 
> CPU 0 CPU 1
> ----- -----
> 
> spin_lock(LOCK);
> WRITE_ONCE(X, X+1); READ_ONCE(X); // no data race
> spin_unlock(LOCK);
> 
> is not.
> 
> Why is that?
> 
> Trying to read Documentation/dev-tools/kcsan.rst... it says
> 
> KCSAN is aware of *marked atomic operations* (``READ_ONCE``, WRITE_ONCE``,
> 
> ...
> 
> if all accesses to a variable that is accessed concurrently are properly
> marked, KCSAN will never trigger a watchpoint
> 
> but how can KCSAN detect that all accesses to X are properly marked? I see nothing
> KCSAN-related in the definition of WRITE_ONCE() or READ_ONCE().
> 
> And what does the "all accesses" above actually mean? The 2nd version does
> 
> WRITE_ONCE(X, X+1);
> 
> but "X + 1" is the plain/unmarked access?

X + 1 and READ_ONCE(X) are two read.

> 
> Thanks,
> 
> Oleg.
> 
> 
[PATCH] rcu/sync: don't read rcu_sync->gp_count lockless
Posted by Oleg Nesterov 1 year, 7 months ago
rcu_sync->gp_count is updated under the protection of ->rss_lock but read
locklessly by the WARN_ON() checks, and KCSAN noted the data race.

Move these WARN_ON_ONCE()'s under the lock and remove the no longer needed
READ_ONCE().

Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/rcu/sync.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
index 86df878a2fee..b50fde082198 100644
--- a/kernel/rcu/sync.c
+++ b/kernel/rcu/sync.c
@@ -152,9 +152,9 @@ void rcu_sync_enter(struct rcu_sync *rsp)
 void rcu_sync_exit(struct rcu_sync *rsp)
 {
 	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_IDLE);
-	WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0);
 
 	spin_lock_irq(&rsp->rss_lock);
+	WARN_ON_ONCE(rsp->gp_count == 0);
 	if (!--rsp->gp_count) {
 		if (rsp->gp_state == GP_PASSED) {
 			WRITE_ONCE(rsp->gp_state, GP_EXIT);
@@ -174,10 +174,10 @@ void rcu_sync_dtor(struct rcu_sync *rsp)
 {
 	int gp_state;
 
-	WARN_ON_ONCE(READ_ONCE(rsp->gp_count));
 	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED);
 
 	spin_lock_irq(&rsp->rss_lock);
+	WARN_ON_ONCE(rsp->gp_count != 0);
 	if (rsp->gp_state == GP_REPLAY)
 		WRITE_ONCE(rsp->gp_state, GP_EXIT);
 	gp_state = rsp->gp_state;
-- 
2.25.1.362.g51ebf55
Re: [PATCH] rcu/sync: don't read rcu_sync->gp_count lockless
Posted by Paul E. McKenney 1 year, 7 months ago
On Sun, May 12, 2024 at 01:19:48PM +0200, Oleg Nesterov wrote:
> rcu_sync->gp_count is updated under the protection of ->rss_lock but read
> locklessly by the WARN_ON() checks, and KCSAN noted the data race.
> 
> Move these WARN_ON_ONCE()'s under the lock and remove the no longer needed
> READ_ONCE().
> 
> Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Very good, thank you!

Due to inattention on my part, the patches were sent late, so the patch
you are (rightly) complaining about is on its way in.  So what I did was
to port your patch on top of that one as shown below.  Left to myself,
I would be thinking in terms of the v6.11 merge window.  Please let me
know if this is more urgent than that.

And as always, please let me know if I messed anything on in the port.

							Thanx, Paul

------------------------------------------------------------------------

commit 8d75fb302aaa97693c2294ded48a472e4956d615
Author: Oleg Nesterov <oleg@redhat.com>
Date:   Sun May 12 08:02:07 2024 -0700

    rcu: Eliminate lockless accesses to rcu_sync->gp_count
    
    The rcu_sync structure's ->gp_count field is always accessed under the
    protection of that same structure's ->rss_lock field, with the exception
    of a pair of WARN_ON_ONCE() calls just prior to acquiring that lock in
    functions rcu_sync_exit() and rcu_sync_dtor().  These lockless accesses
    are unnecessary and impair KCSAN's ability to catch bugs that might be
    inserted via other lockless accesses.
    
    This commit therefore moves those WARN_ON_ONCE() calls under the lock.
    
    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
index 6c2bd9001adcd..05bfe69fdb0bb 100644
--- a/kernel/rcu/sync.c
+++ b/kernel/rcu/sync.c
@@ -151,15 +151,11 @@ void rcu_sync_enter(struct rcu_sync *rsp)
  */
 void rcu_sync_exit(struct rcu_sync *rsp)
 {
-	int gpc;
-
 	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_IDLE);
-	WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0);
 
 	spin_lock_irq(&rsp->rss_lock);
-	gpc = rsp->gp_count - 1;
-	WRITE_ONCE(rsp->gp_count, gpc);
-	if (!gpc) {
+	WARN_ON_ONCE(rsp->gp_count == 0);
+	if (!--rsp->gp_count) {
 		if (rsp->gp_state == GP_PASSED) {
 			WRITE_ONCE(rsp->gp_state, GP_EXIT);
 			rcu_sync_call(rsp);
@@ -178,10 +174,10 @@ void rcu_sync_dtor(struct rcu_sync *rsp)
 {
 	int gp_state;
 
-	WARN_ON_ONCE(READ_ONCE(rsp->gp_count));
 	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED);
 
 	spin_lock_irq(&rsp->rss_lock);
+	WARN_ON_ONCE(rsp->gp_count);
 	if (rsp->gp_state == GP_REPLAY)
 		WRITE_ONCE(rsp->gp_state, GP_EXIT);
 	gp_state = rsp->gp_state;
Re: [PATCH] rcu/sync: don't read rcu_sync->gp_count lockless
Posted by Oleg Nesterov 1 year, 7 months ago
On 05/12, Paul E. McKenney wrote:
>
> --- a/kernel/rcu/sync.c
> +++ b/kernel/rcu/sync.c
> @@ -151,15 +151,11 @@ void rcu_sync_enter(struct rcu_sync *rsp)
>   */
>  void rcu_sync_exit(struct rcu_sync *rsp)
>  {
> -	int gpc;
> -
>  	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_IDLE);
> -	WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0);
>
>  	spin_lock_irq(&rsp->rss_lock);
> -	gpc = rsp->gp_count - 1;
> -	WRITE_ONCE(rsp->gp_count, gpc);
> -	if (!gpc) {
> +	WARN_ON_ONCE(rsp->gp_count == 0);
> +	if (!--rsp->gp_count) {
>  		if (rsp->gp_state == GP_PASSED) {
>  			WRITE_ONCE(rsp->gp_state, GP_EXIT);
>  			rcu_sync_call(rsp);
> @@ -178,10 +174,10 @@ void rcu_sync_dtor(struct rcu_sync *rsp)
>  {
>  	int gp_state;
>
> -	WARN_ON_ONCE(READ_ONCE(rsp->gp_count));
>  	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED);
>
>  	spin_lock_irq(&rsp->rss_lock);
> +	WARN_ON_ONCE(rsp->gp_count);
>  	if (rsp->gp_state == GP_REPLAY)
>  		WRITE_ONCE(rsp->gp_state, GP_EXIT);
>  	gp_state = rsp->gp_state;

Thanks Paul!

But then I think this change can also revert this chunk from the previous
patch:

	@@ -122,7 +122,7 @@ void rcu_sync_enter(struct rcu_sync *rsp)
			 * we are called at early boot time but this shouldn't happen.
			 */
		}
	-	rsp->gp_count++;
	+	WRITE_ONCE(rsp->gp_count, rsp->gp_count + 1);
		spin_unlock_irq(&rsp->rss_lock);
	 
		if (gp_state == GP_IDLE) {


Thanks again,

Oleg.
Re: [PATCH] rcu/sync: don't read rcu_sync->gp_count lockless
Posted by Paul E. McKenney 1 year, 7 months ago
On Sun, May 12, 2024 at 06:55:29PM +0200, Oleg Nesterov wrote:
> On 05/12, Paul E. McKenney wrote:
> >
> > --- a/kernel/rcu/sync.c
> > +++ b/kernel/rcu/sync.c
> > @@ -151,15 +151,11 @@ void rcu_sync_enter(struct rcu_sync *rsp)
> >   */
> >  void rcu_sync_exit(struct rcu_sync *rsp)
> >  {
> > -	int gpc;
> > -
> >  	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_IDLE);
> > -	WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0);
> >
> >  	spin_lock_irq(&rsp->rss_lock);
> > -	gpc = rsp->gp_count - 1;
> > -	WRITE_ONCE(rsp->gp_count, gpc);
> > -	if (!gpc) {
> > +	WARN_ON_ONCE(rsp->gp_count == 0);
> > +	if (!--rsp->gp_count) {
> >  		if (rsp->gp_state == GP_PASSED) {
> >  			WRITE_ONCE(rsp->gp_state, GP_EXIT);
> >  			rcu_sync_call(rsp);
> > @@ -178,10 +174,10 @@ void rcu_sync_dtor(struct rcu_sync *rsp)
> >  {
> >  	int gp_state;
> >
> > -	WARN_ON_ONCE(READ_ONCE(rsp->gp_count));
> >  	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED);
> >
> >  	spin_lock_irq(&rsp->rss_lock);
> > +	WARN_ON_ONCE(rsp->gp_count);
> >  	if (rsp->gp_state == GP_REPLAY)
> >  		WRITE_ONCE(rsp->gp_state, GP_EXIT);
> >  	gp_state = rsp->gp_state;
> 
> Thanks Paul!
> 
> But then I think this change can also revert this chunk from the previous
> patch:
> 
> 	@@ -122,7 +122,7 @@ void rcu_sync_enter(struct rcu_sync *rsp)
> 			 * we are called at early boot time but this shouldn't happen.
> 			 */
> 		}
> 	-	rsp->gp_count++;
> 	+	WRITE_ONCE(rsp->gp_count, rsp->gp_count + 1);
> 		spin_unlock_irq(&rsp->rss_lock);
> 	 
> 		if (gp_state == GP_IDLE) {

Good catch, thank you!  How about like this?

							Thanx, Paul

------------------------------------------------------------------------

commit 3e75ce9876396a770a0fcd8eecd83b9f6469f49c
Author: Oleg Nesterov <oleg@redhat.com>
Date:   Sun May 12 08:02:07 2024 -0700

    rcu: Eliminate lockless accesses to rcu_sync->gp_count
    
    The rcu_sync structure's ->gp_count field is always accessed under the
    protection of that same structure's ->rss_lock field, with the exception
    of a pair of WARN_ON_ONCE() calls just prior to acquiring that lock in
    functions rcu_sync_exit() and rcu_sync_dtor().  These lockless accesses
    are unnecessary and impair KCSAN's ability to catch bugs that might be
    inserted via other lockless accesses.
    
    This commit therefore moves those WARN_ON_ONCE() calls under the lock.
    
    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
index 6c2bd9001adcd..da60a9947c005 100644
--- a/kernel/rcu/sync.c
+++ b/kernel/rcu/sync.c
@@ -122,7 +122,7 @@ void rcu_sync_enter(struct rcu_sync *rsp)
 		 * we are called at early boot time but this shouldn't happen.
 		 */
 	}
-	WRITE_ONCE(rsp->gp_count, rsp->gp_count + 1);
+	rsp->gp_count++;
 	spin_unlock_irq(&rsp->rss_lock);
 
 	if (gp_state == GP_IDLE) {
@@ -151,15 +151,11 @@ void rcu_sync_enter(struct rcu_sync *rsp)
  */
 void rcu_sync_exit(struct rcu_sync *rsp)
 {
-	int gpc;
-
 	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_IDLE);
-	WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0);
 
 	spin_lock_irq(&rsp->rss_lock);
-	gpc = rsp->gp_count - 1;
-	WRITE_ONCE(rsp->gp_count, gpc);
-	if (!gpc) {
+	WARN_ON_ONCE(rsp->gp_count == 0);
+	if (!--rsp->gp_count) {
 		if (rsp->gp_state == GP_PASSED) {
 			WRITE_ONCE(rsp->gp_state, GP_EXIT);
 			rcu_sync_call(rsp);
@@ -178,10 +174,10 @@ void rcu_sync_dtor(struct rcu_sync *rsp)
 {
 	int gp_state;
 
-	WARN_ON_ONCE(READ_ONCE(rsp->gp_count));
 	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED);
 
 	spin_lock_irq(&rsp->rss_lock);
+	WARN_ON_ONCE(rsp->gp_count);
 	if (rsp->gp_state == GP_REPLAY)
 		WRITE_ONCE(rsp->gp_state, GP_EXIT);
 	gp_state = rsp->gp_state;
Re: [PATCH] rcu/sync: don't read rcu_sync->gp_count lockless
Posted by Oleg Nesterov 1 year, 7 months ago
On 05/12, Paul E. McKenney wrote:
>
> How about like this?

LGTM ;)

Oleg.