[PATCH 1/3] rcu/exp: Protect against early QS report

Frederic Weisbecker posted 3 patches 10 months, 1 week ago
There is a newer version of this series
[PATCH 1/3] rcu/exp: Protect against early QS report
Posted by Frederic Weisbecker 10 months, 1 week ago
When a grace period is started, the ->expmask of each node is set up
from sync_exp_reset_tree(). Then later on, each leaf node also
initializes its ->exp_tasks pointer.

This means that the initialization of a node's quiescent state mask and
the initialization of its blocked tasks list happen with an unlocked
node gap in between.

It happens to be fine because nothing is expected to report an exp
quiescent state within this gap, since no IPIs have been issued yet and
every rdp's ->cpu_no_qs.b.exp should be false.

However, if it were to happen by accident, the quiescent state could be
reported and propagated while ignoring tasks that blocked _before_ the
start of the grace period.

Prevent such trouble from happening in the future by initializing both
the quiescent state mask to report and the blocked tasks list head
within the same node-locked block.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree_exp.h | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 8d4895c854c5..caff16e441d1 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -141,6 +141,13 @@ static void __maybe_unused sync_exp_reset_tree(void)
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		WARN_ON_ONCE(rnp->expmask);
 		WRITE_ONCE(rnp->expmask, rnp->expmaskinit);
+		/*
+		 * Need to wait for any blocked tasks as well.	Note that
+		 * additional blocking tasks will also block the expedited GP
+		 * until such time as the ->expmask bits are cleared.
+		 */
+		if (rcu_is_leaf_node(rnp) && rcu_preempt_has_tasks(rnp))
+			WRITE_ONCE(rnp->exp_tasks, rnp->blkd_tasks.next);
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	}
 }
@@ -393,13 +400,6 @@ static void __sync_rcu_exp_select_node_cpus(struct rcu_exp_work *rewp)
 	}
 	mask_ofl_ipi = rnp->expmask & ~mask_ofl_test;
 
-	/*
-	 * Need to wait for any blocked tasks as well.	Note that
-	 * additional blocking tasks will also block the expedited GP
-	 * until such time as the ->expmask bits are cleared.
-	 */
-	if (rcu_preempt_has_tasks(rnp))
-		WRITE_ONCE(rnp->exp_tasks, rnp->blkd_tasks.next);
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 
 	/* IPI the remaining CPUs for expedited quiescent state. */
-- 
2.46.0
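
For readers less familiar with the RCU internals, the window described in
the changelog can be modelled with a small user-space sketch (plain C with
pthreads; every name below is invented and this is not kernel code). A
writer publishes a mask and a blocked-tasks pointer under two separate lock
acquisitions; a reader that takes the lock in the gap sees the mask set but
no blocked tasks, which is the kind of premature quiescent-state view the
patch rules out by doing both updates in a single locked section:

#include <pthread.h>
#include <stdio.h>
#include <stdbool.h>

struct node {
	pthread_mutex_t lock;
	unsigned long expmask;		/* CPUs/tasks still to report a QS */
	void *exp_tasks;		/* first blocked task, if any */
	bool has_blocked_tasks;		/* stand-in for rcu_preempt_has_tasks() */
};

static struct node n = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.has_blocked_tasks = true,	/* a task blocked before the GP started */
};

/* Racy variant: two locked sections with an unlocked gap in between. */
static void *gp_start_racy(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&n.lock);
	n.expmask = 0x1;		/* step 1: publish the mask */
	pthread_mutex_unlock(&n.lock);
	/* <-- unlocked gap: a reporter may run here */
	pthread_mutex_lock(&n.lock);
	if (n.has_blocked_tasks)
		n.exp_tasks = &n;	/* step 2: publish the blocked tasks */
	pthread_mutex_unlock(&n.lock);
	return NULL;
}

/* A reporter running in the gap sees the mask set but no blocked tasks. */
static void *report_qs(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&n.lock);
	if (n.expmask && !n.exp_tasks) {
		n.expmask = 0;	/* "report" the QS, ignoring the earlier blocker */
		printf("node reported quiescent despite an earlier blocked task\n");
	}
	pthread_mutex_unlock(&n.lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, gp_start_racy, NULL);
	pthread_create(&b, NULL, report_qs, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	/* Whether the message prints depends on scheduling; the patch's fix
	 * corresponds to doing step 1 and step 2 in one locked section. */
	return 0;
}
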
Re: [PATCH 1/3] rcu/exp: Protect against early QS report
Posted by Paul E. McKenney 10 months, 1 week ago
On Fri, Feb 14, 2025 at 12:25:57AM +0100, Frederic Weisbecker wrote:
> When a grace period is started, the ->expmask of each node is set up
> from sync_exp_reset_tree(). Then later on, each leaf node also
> initializes its ->exp_tasks pointer.
> 
> This means that the initialization of a node's quiescent state mask and
> the initialization of its blocked tasks list happen with an unlocked
> node gap in between.
> 
> It happens to be fine because nothing is expected to report an exp
> quiescent state within this gap, since no IPIs have been issued yet and
> every rdp's ->cpu_no_qs.b.exp should be false.
> 
> However, if it were to happen by accident, the quiescent state could be
> reported and propagated while ignoring tasks that blocked _before_ the
> start of the grace period.
> 
> Prevent such trouble from happening in the future by initializing both
> the quiescent state mask to report and the blocked tasks list head
> within the same node-locked block.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Thank you for looking into this!

One question:  What happens if a CPU has tasks pending during the
call to sync_exp_reset_tree(), but all of these tasks complete
their RCU read-side critical sections before execution reaches
__sync_rcu_exp_select_node_cpus()?

(My guess is that all is well, but even if so, it would be good to record
why in the commit log.)

						Thanx, Paul

Re: [PATCH 1/3] rcu/exp: Protect against early QS report
Posted by Frederic Weisbecker 9 months, 1 week ago
On Fri, Feb 14, 2025 at 01:10:43AM -0800, Paul E. McKenney wrote:
> 
> Thank you for looking into this!
> 
> One question:  What happens if a CPU has tasks pending during the
> call to sync_exp_reset_tree(), but all of these tasks complete
> their RCU read-side critical sections before execution reaches
> __sync_rcu_exp_select_node_cpus()?
> 
> (My guess is that all is well, but even if so, it would be good to record
> why in the commit log.)

All is (expected to be) well because the QS won't be reported yet:
rdp->cpu_no_qs.b.exp is still false, therefore rnp->expmask will
still have the rdps' masks set.

!PREEMPT_RCU is different because sync_sched_exp_online_cleanup()
can report the QS earlier. But this patch is a PREEMPT_RCU concern
only.
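
For illustration, a minimal user-space model of this point (every name
below is invented, this is not the kernel's actual code): even if all
blocked tasks resume in the gap, the CPU bits stay set in the node mask
because nothing has raised the per-CPU "no QS" flag yet, so the node
cannot be reported quiescent early:

#include <stdbool.h>
#include <stdio.h>

struct cpu_data {
	bool cpu_no_qs_exp;	/* stand-in for rdp->cpu_no_qs.b.exp */
	unsigned long grpmask;	/* this CPU's bit in the node's mask */
};

struct node_data {
	unsigned long expmask;	/* CPUs the expedited GP still waits on */
};

/* A CPU's QS is only reported once its flag has been raised (by an IPI). */
static void maybe_report_cpu_qs(struct node_data *np, struct cpu_data *cp)
{
	if (cp->cpu_no_qs_exp)
		np->expmask &= ~cp->grpmask;
}

int main(void)
{
	struct node_data rnp = { .expmask = 0x1 };
	struct cpu_data rdp = { .cpu_no_qs_exp = false, .grpmask = 0x1 };

	/* Blocked tasks may have all resumed here, but no IPI was sent... */
	maybe_report_cpu_qs(&rnp, &rdp);

	/* ...so the CPU's bit is still set and the GP keeps waiting. */
	printf("expmask = %#lx\n", rnp.expmask);	/* prints 0x1 */
	return 0;
}
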

I'll add a note to the changelog.

Thanks.


Re: [PATCH 1/3] rcu/exp: Protect against early QS report
Posted by Paul E. McKenney 9 months, 1 week ago
On Thu, Mar 13, 2025 at 05:40:19PM +0100, Frederic Weisbecker wrote:
> On Fri, Feb 14, 2025 at 01:10:43AM -0800, Paul E. McKenney wrote:
> > 
> > Thank you for looking into this!
> > 
> > One question:  What happens if a CPU has tasks pending during the
> > call to sync_exp_reset_tree(), but all of these tasks complete
> > their RCU read-side critical sections before execution reaches
> > __sync_rcu_exp_select_node_cpus()?
> > 
> > (My guess is that all is well, but even if so, it would be good to record
> > why in the commit log.)
> 
> All is (expected to be) well because the QS won't be reported yet:
> rdp->cpu_no_qs.b.exp is still false, therefore rnp->expmask will
> still have the rdps' masks set.
> 
> !PREEMPT_RCU is different because sync_sched_exp_online_cleanup()
> can report the QS earlier. But this patch is a PREEMPT_RCU concern
> only.
> 
> I'll add a note to the changelog.

Thank you!

							Thanx, Paul
