[PATCH 2/2] rcu/nocb: Fix missed RCU barrier on deoffloading

Frederic Weisbecker posted 2 patches 2 weeks, 3 days ago
[PATCH 2/2] rcu/nocb: Fix missed RCU barrier on deoffloading
Posted by Frederic Weisbecker 2 weeks, 3 days ago
From: Zqiang <qiang.zhang1211@gmail.com>

Currently, running rcutorture test with torture_type=rcu fwd_progress=8
n_barrier_cbs=8 nocbs_nthreads=8 nocbs_toggle=100 onoff_interval=60
test_boost=2, will trigger the following warning:

	WARNING: CPU: 19 PID: 100 at kernel/rcu/tree_nocb.h:1061 rcu_nocb_rdp_deoffload+0x292/0x2a0
	RIP: 0010:rcu_nocb_rdp_deoffload+0x292/0x2a0
	 Call Trace:
	  <TASK>
	  ? __warn+0x7e/0x120
	  ? rcu_nocb_rdp_deoffload+0x292/0x2a0
	  ? report_bug+0x18e/0x1a0
	  ? handle_bug+0x3d/0x70
	  ? exc_invalid_op+0x18/0x70
	  ? asm_exc_invalid_op+0x1a/0x20
	  ? rcu_nocb_rdp_deoffload+0x292/0x2a0
	  rcu_nocb_cpu_deoffload+0x70/0xa0
	  rcu_nocb_toggle+0x136/0x1c0
	  ? __pfx_rcu_nocb_toggle+0x10/0x10
	  kthread+0xd1/0x100
	  ? __pfx_kthread+0x10/0x10
	  ret_from_fork+0x2f/0x50
	  ? __pfx_kthread+0x10/0x10
	  ret_from_fork_asm+0x1a/0x30
	  </TASK>

CPU0                               CPU2                          CPU3
//rcu_nocb_toggle             //nocb_cb_wait                   //rcutorture

// deoffload CPU1             // process CPU1's rdp
rcu_barrier()
    rcu_segcblist_entrain()
        rcu_segcblist_add_len(1);
        // len == 2
        // enqueue barrier
        // callback to CPU1's
        // rdp->cblist
                             rcu_do_batch()
                                 // invoke CPU1's rdp->cblist
                                 // callback
                                 rcu_barrier_callback()
                                                             rcu_barrier()
                                                               mutex_lock(&rcu_state.barrier_mutex);
                                                               // still see len == 2
                                                               // enqueue barrier callback
                                                               // to CPU1's rdp->cblist
                                                               rcu_segcblist_entrain()
                                                                   rcu_segcblist_add_len(1);
                                                                   // len == 3
                                 // decrement len
                                 rcu_segcblist_add_len(-2);
                             kthread_parkme()

// CPU1's rdp->cblist len == 1
// Warn because there is
// still a pending barrier
// trigger warning
WARN_ON_ONCE(rcu_segcblist_n_cbs(&rdp->cblist));
cpus_read_unlock();

                                                                // wait CPU1 to comes online and
                                                                // invoke barrier callback on
                                                                // CPU1 rdp's->cblist
                                                                wait_for_completion(&rcu_state.barrier_completion);
// deoffload CPU4
cpus_read_lock()
  rcu_barrier()
    mutex_lock(&rcu_state.barrier_mutex);
    // block on barrier_mutex
    // wait rcu_barrier() on
    // CPU3 to unlock barrier_mutex
    // but CPU3 unlock barrier_mutex
    // need to wait CPU1 comes online
    // when CPU1 going online will block on cpus_write_lock

The above scenario will not only trigger a WARN_ON_ONCE(), but also
trigger a deadlock.

Thanks to nocb locking, a second racing rcu_barrier() on an offline CPU
will either observe the decremented callback counter down to 0 and spare
the callback enqueue, or rcuo will observe the new callback and keep
rdp->nocb_cb_sleep to false.

Therefore check rdp->nocb_cb_sleep before parking to make sure no
further rcu_barrier() is waiting on the rdp.

Fixes: 1fcb932c8b5c ("rcu/nocb: Simplify (de-)offloading state machine")
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree_nocb.h | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 16865475120b..2605dd234a13 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -891,7 +891,18 @@ static void nocb_cb_wait(struct rcu_data *rdp)
 	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
 					    nocb_cb_wait_cond(rdp));
 	if (kthread_should_park()) {
-		kthread_parkme();
+		/*
+		 * kthread_park() must be preceded by an rcu_barrier().
+		 * But yet another rcu_barrier() might have sneaked in between
+		 * the barrier callback execution and the callbacks counter
+		 * decrement.
+		 */
+		if (rdp->nocb_cb_sleep) {
+			rcu_nocb_lock_irqsave(rdp, flags);
+			WARN_ON_ONCE(rcu_segcblist_n_cbs(&rdp->cblist));
+			rcu_nocb_unlock_irqrestore(rdp, flags);
+			kthread_parkme();
+		}
 	} else if (READ_ONCE(rdp->nocb_cb_sleep)) {
 		WARN_ON(signal_pending(current));
 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
-- 
2.46.0
Re: [PATCH 2/2] rcu/nocb: Fix missed RCU barrier on deoffloading
Posted by Neeraj Upadhyay 1 week, 5 days ago
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 16865475120b..2605dd234a13 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -891,7 +891,18 @@ static void nocb_cb_wait(struct rcu_data *rdp)
>  	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
>  					    nocb_cb_wait_cond(rdp));
>  	if (kthread_should_park()) {
> -		kthread_parkme();
> +		/*
> +		 * kthread_park() must be preceded by an rcu_barrier().
> +		 * But yet another rcu_barrier() might have sneaked in between
> +		 * the barrier callback execution and the callbacks counter
> +		 * decrement.
> +		 */
> +		if (rdp->nocb_cb_sleep) {

Is READ_ONCE() not required here?


- Neeraj

> +			rcu_nocb_lock_irqsave(rdp, flags);
> +			WARN_ON_ONCE(rcu_segcblist_n_cbs(&rdp->cblist));
> +			rcu_nocb_unlock_irqrestore(rdp, flags);
> +			kthread_parkme();
> +		}
>  	} else if (READ_ONCE(rdp->nocb_cb_sleep)) {
>  		WARN_ON(signal_pending(current));
>  		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
Re: [PATCH 2/2] rcu/nocb: Fix missed RCU barrier on deoffloading
Posted by Frederic Weisbecker 1 week, 4 days ago
Le Mon, Nov 11, 2024 at 01:07:16PM +0530, Neeraj Upadhyay a écrit :
> 
> > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > index 16865475120b..2605dd234a13 100644
> > --- a/kernel/rcu/tree_nocb.h
> > +++ b/kernel/rcu/tree_nocb.h
> > @@ -891,7 +891,18 @@ static void nocb_cb_wait(struct rcu_data *rdp)
> >  	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
> >  					    nocb_cb_wait_cond(rdp));
> >  	if (kthread_should_park()) {
> > -		kthread_parkme();
> > +		/*
> > +		 * kthread_park() must be preceded by an rcu_barrier().
> > +		 * But yet another rcu_barrier() might have sneaked in between
> > +		 * the barrier callback execution and the callbacks counter
> > +		 * decrement.
> > +		 */
> > +		if (rdp->nocb_cb_sleep) {
> 
> Is READ_ONCE() not required here?

No because it can't be written concurrently at this point. The value observed
here if kthread_should_park() must have been written locally on the previous
call to nocb_cb_wait().

Thanks.


> 
> 
> - Neeraj
> 
> > +			rcu_nocb_lock_irqsave(rdp, flags);
> > +			WARN_ON_ONCE(rcu_segcblist_n_cbs(&rdp->cblist));
> > +			rcu_nocb_unlock_irqrestore(rdp, flags);
> > +			kthread_parkme();
> > +		}
> >  	} else if (READ_ONCE(rdp->nocb_cb_sleep)) {
> >  		WARN_ON(signal_pending(current));
> >  		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
> 
Re: [PATCH 2/2] rcu/nocb: Fix missed RCU barrier on deoffloading
Posted by Neeraj Upadhyay 1 week, 4 days ago

On 11/13/2024 3:07 AM, Frederic Weisbecker wrote:
> Le Mon, Nov 11, 2024 at 01:07:16PM +0530, Neeraj Upadhyay a écrit :
>>
>>> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
>>> index 16865475120b..2605dd234a13 100644
>>> --- a/kernel/rcu/tree_nocb.h
>>> +++ b/kernel/rcu/tree_nocb.h
>>> @@ -891,7 +891,18 @@ static void nocb_cb_wait(struct rcu_data *rdp)
>>>  	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
>>>  					    nocb_cb_wait_cond(rdp));
>>>  	if (kthread_should_park()) {
>>> -		kthread_parkme();
>>> +		/*
>>> +		 * kthread_park() must be preceded by an rcu_barrier().
>>> +		 * But yet another rcu_barrier() might have sneaked in between
>>> +		 * the barrier callback execution and the callbacks counter
>>> +		 * decrement.
>>> +		 */
>>> +		if (rdp->nocb_cb_sleep) {
>>
>> Is READ_ONCE() not required here?
> 
> No because it can't be written concurrently at this point. The value observed
> here if kthread_should_park() must have been written locally on the previous
> call to nocb_cb_wait().
> 

Ok, got it. I was not aware of any other flow (other than the one described in
this fix) which can race with it. So, asked.



- Neeraj