[PATCH 15/34] rcu: Add noinstr-fast rcu_read_{,un}lock_tasks_trace() APIs

Paul E. McKenney posted 34 patches 1 week, 1 day ago
Posted by Paul E. McKenney 1 week, 1 day ago
When expressing RCU Tasks Trace in terms of SRCU-fast, it was
necessary to keep a nesting count and per-CPU srcu_ctr structure
pointer in the task_struct structure, which is slow to access.
An alternative is to instead provide rcu_read_lock_tasks_trace() and
rcu_read_unlock_tasks_trace() APIs that match the underlying SRCU-fast
semantics, thus avoiding those task_struct accesses.

When all callers have switched to the new API, the previous
rcu_read_lock_trace() and rcu_read_unlock_trace() APIs will be removed.
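
For illustration only (not part of the patch), a hypothetical caller of
the new API might look like the following. Note that, unlike the old
counter-based rcu_read_lock_trace(), the cookie returned by the lock
must be threaded through to the matching unlock; "struct foo" and its
"->val" field are made up for this sketch:

	/* Hypothetical reader of a Tasks-Trace-RCU-protected pointer. */
	static int read_protected_value(struct foo __rcu *fp)
	{
		struct srcu_ctr __percpu *scp;
		struct foo *p;
		int val;

		scp = rcu_read_lock_tasks_trace();
		p = rcu_dereference(fp);
		val = p ? p->val : -1;
		rcu_read_unlock_tasks_trace(scp);
		return val;
	}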

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <bpf@vger.kernel.org>
---
 include/linux/rcupdate_trace.h | 37 ++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/include/linux/rcupdate_trace.h b/include/linux/rcupdate_trace.h
index 0bd47f12ecd17b..b87151e6b23881 100644
--- a/include/linux/rcupdate_trace.h
+++ b/include/linux/rcupdate_trace.h
@@ -34,6 +34,43 @@ static inline int rcu_read_lock_trace_held(void)
 
 #ifdef CONFIG_TASKS_TRACE_RCU
 
+/**
+ * rcu_read_lock_tasks_trace - mark beginning of RCU-trace read-side critical section
+ *
+ * When synchronize_rcu_tasks_trace() is invoked by one task, then that
+ * task is guaranteed to block until all other tasks exit their read-side
+ * critical sections.  Similarly, if call_rcu_tasks_trace() is invoked
+ * on one task while other tasks are within RCU read-side critical
+ * sections, invocation of the corresponding RCU callback is deferred
+ * until after all the other tasks exit their critical sections.
+ *
+ * For more details, please see the documentation for srcu_read_lock_fast().
+ */
+static inline struct srcu_ctr __percpu *rcu_read_lock_tasks_trace(void)
+{
+	struct srcu_ctr __percpu *ret = srcu_read_lock_fast(&rcu_tasks_trace_srcu_struct);
+
+	if (IS_ENABLED(CONFIG_ARCH_WANTS_NO_INSTR))
+		smp_mb();
+	return ret;
+}
+
+/**
+ * rcu_read_unlock_tasks_trace - mark end of RCU-trace read-side critical section
+ * @scp: return value from corresponding rcu_read_lock_tasks_trace().
+ *
+ * Pairs with the preceding call to rcu_read_lock_tasks_trace() that
+ * returned the value passed in via scp.
+ *
+ * For more details, please see the documentation for rcu_read_unlock().
+ */
+static inline void rcu_read_unlock_tasks_trace(struct srcu_ctr __percpu *scp)
+{
+	if (!IS_ENABLED(CONFIG_ARCH_WANTS_NO_INSTR))
+		smp_mb();
+	srcu_read_unlock_fast(&rcu_tasks_trace_srcu_struct, scp);
+}
+
 /**
  * rcu_read_lock_trace - mark beginning of RCU-trace read-side critical section
  *
-- 
2.40.1
Re: [PATCH 15/34] rcu: Add noinstr-fast rcu_read_{,un}lock_tasks_trace() APIs
Posted by Peter Zijlstra 1 week, 1 day ago
On Tue, Sep 23, 2025 at 07:20:17AM -0700, Paul E. McKenney wrote:
> When expressing RCU Tasks Trace in terms of SRCU-fast, it was
> necessary to keep a nesting count and per-CPU srcu_ctr structure
> pointer in the task_struct structure, which is slow to access.
> But an alternative is to instead make rcu_read_lock_tasks_trace() and
> rcu_read_unlock_tasks_trace(), which match the underlying SRCU-fast
> semantics, avoiding the task_struct accesses.
> 
> When all callers have switched to the new API, the previous
> rcu_read_lock_trace() and rcu_read_unlock_trace() APIs will be removed.
> 
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Cc: Andrii Nakryiko <andrii@kernel.org>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: <bpf@vger.kernel.org>
> ---
>  include/linux/rcupdate_trace.h | 37 ++++++++++++++++++++++++++++++++++
>  1 file changed, 37 insertions(+)
> 
> diff --git a/include/linux/rcupdate_trace.h b/include/linux/rcupdate_trace.h
> index 0bd47f12ecd17b..b87151e6b23881 100644
> --- a/include/linux/rcupdate_trace.h
> +++ b/include/linux/rcupdate_trace.h
> @@ -34,6 +34,43 @@ static inline int rcu_read_lock_trace_held(void)
>  
>  #ifdef CONFIG_TASKS_TRACE_RCU
>  
> +/**
> + * rcu_read_lock_tasks_trace - mark beginning of RCU-trace read-side critical section
> + *
> + * When synchronize_rcu_tasks_trace() is invoked by one task, then that
> + * task is guaranteed to block until all other tasks exit their read-side
> + * critical sections.  Similarly, if call_rcu_tasks_trace() is invoked
> + * on one task while other tasks are within RCU read-side critical
> + * sections, invocation of the corresponding RCU callback is deferred
> + * until after all the other tasks exit their critical sections.
> + *
> + * For more details, please see the documentation for srcu_read_lock_fast().
> + */
> +static inline struct srcu_ctr __percpu *rcu_read_lock_tasks_trace(void)
> +{
> +	struct srcu_ctr __percpu *ret = srcu_read_lock_fast(&rcu_tasks_trace_srcu_struct);
> +
> +	if (IS_ENABLED(CONFIG_ARCH_WANTS_NO_INSTR))
> +		smp_mb();

I am somewhat confused by the relation between noinstr and smp_mb()
here. The Subject mentions it, but the Changelog is awfully silent again.

Furthermore I note that this is a positive while unlock is a negative
relation between the two. Which adds even more confusion.

> +	return ret;
> +}
> +
> +/**
> + * rcu_read_unlock_tasks_trace - mark end of RCU-trace read-side critical section
> + * @scp: return value from corresponding rcu_read_lock_tasks_trace().
> + *
> + * Pairs with the preceding call to rcu_read_lock_tasks_trace() that
> + * returned the value passed in via scp.
> + *
> + * For more details, please see the documentation for rcu_read_unlock().
> + */
> +static inline void rcu_read_unlock_tasks_trace(struct srcu_ctr __percpu *scp)
> +{
> +	if (!IS_ENABLED(CONFIG_ARCH_WANTS_NO_INSTR))
> +		smp_mb();
> +	srcu_read_unlock_fast(&rcu_tasks_trace_srcu_struct, scp);
> +}
> +
>  /**
>   * rcu_read_lock_trace - mark beginning of RCU-trace read-side critical section
>   *
> -- 
> 2.40.1
>
Re: [PATCH 15/34] rcu: Add noinstr-fast rcu_read_{,un}lock_tasks_trace() APIs
Posted by Paul E. McKenney 1 week ago
On Tue, Sep 23, 2025 at 07:32:16PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 23, 2025 at 07:20:17AM -0700, Paul E. McKenney wrote:
> > When expressing RCU Tasks Trace in terms of SRCU-fast, it was
> > necessary to keep a nesting count and per-CPU srcu_ctr structure
> > pointer in the task_struct structure, which is slow to access.
> > But an alternative is to instead make rcu_read_lock_tasks_trace() and
> > rcu_read_unlock_tasks_trace(), which match the underlying SRCU-fast
> > semantics, avoiding the task_struct accesses.
> > 
> > When all callers have switched to the new API, the previous
> > rcu_read_lock_trace() and rcu_read_unlock_trace() APIs will be removed.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > Cc: Andrii Nakryiko <andrii@kernel.org>
> > Cc: Alexei Starovoitov <ast@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: <bpf@vger.kernel.org>
> > ---
> >  include/linux/rcupdate_trace.h | 37 ++++++++++++++++++++++++++++++++++
> >  1 file changed, 37 insertions(+)
> > 
> > diff --git a/include/linux/rcupdate_trace.h b/include/linux/rcupdate_trace.h
> > index 0bd47f12ecd17b..b87151e6b23881 100644
> > --- a/include/linux/rcupdate_trace.h
> > +++ b/include/linux/rcupdate_trace.h
> > @@ -34,6 +34,43 @@ static inline int rcu_read_lock_trace_held(void)
> >  
> >  #ifdef CONFIG_TASKS_TRACE_RCU
> >  
> > +/**
> > + * rcu_read_lock_tasks_trace - mark beginning of RCU-trace read-side critical section
> > + *
> > + * When synchronize_rcu_tasks_trace() is invoked by one task, then that
> > + * task is guaranteed to block until all other tasks exit their read-side
> > + * critical sections.  Similarly, if call_rcu_tasks_trace() is invoked
> > + * on one task while other tasks are within RCU read-side critical
> > + * sections, invocation of the corresponding RCU callback is deferred
> > + * until after all the other tasks exit their critical sections.
> > + *
> > + * For more details, please see the documentation for srcu_read_lock_fast().
> > + */
> > +static inline struct srcu_ctr __percpu *rcu_read_lock_tasks_trace(void)
> > +{
> > +	struct srcu_ctr __percpu *ret = srcu_read_lock_fast(&rcu_tasks_trace_srcu_struct);
> > +
> > +	if (IS_ENABLED(CONFIG_ARCH_WANTS_NO_INSTR))
> > +		smp_mb();
> 
> I am somewhat confused by the relation between noinstr and smp_mb()
> here. The Subject mentions it, but the Changelog is awfully silent again.

Thank you for looking this over!

To Alexei's point, this commit should be merged with 18/34.

> Furthermore I note that this is a positive while unlock is a negative
> relation between the two. Which adds even more confusion.

You are right, at most one of these two conditions can be correct.  ;-)

I believe that the one above needs a "!".

The point of this is that architectures that set ARCH_WANTS_NO_INSTR
have promised that any point in the entry/exit code that RCU is not
watching has been marked noinstr.  For those architectures, SRCU-fast
can rely on the fact that the key updates in __srcu_read_lock_fast()
and __srcu_read_unlock_fast() are either interrupt-disabled regions or
atomic operations, depending on the architecture.  This means that
the synchronize_rcu{,_expedited}() calls in the SRCU-fast grace-period
code will be properly ordered with those accesses.

But for !ARCH_WANTS_NO_INSTR architectures, it is possible to attach
various forms of tracing to entry/exit code that RCU is not watching,
which means that those synchronize_rcu{,_expedited}() calls won't have
the needed ordering properties.  So we use smp_mb() on the read side
to force the needed ordering.

Does that help, or am I missing the point of your question?

							Thanx, Paul

> > +	return ret;
> > +}
> > +
> > +/**
> > + * rcu_read_unlock_tasks_trace - mark end of RCU-trace read-side critical section
> > + * @scp: return value from corresponding rcu_read_lock_tasks_trace().
> > + *
> > + * Pairs with the preceding call to rcu_read_lock_tasks_trace() that
> > + * returned the value passed in via scp.
> > + *
> > + * For more details, please see the documentation for rcu_read_unlock().
> > + */
> > +static inline void rcu_read_unlock_tasks_trace(struct srcu_ctr __percpu *scp)
> > +{
> > +	if (!IS_ENABLED(CONFIG_ARCH_WANTS_NO_INSTR))
> > +		smp_mb();
> > +	srcu_read_unlock_fast(&rcu_tasks_trace_srcu_struct, scp);
> > +}
> > +
> >  /**
> >   * rcu_read_lock_trace - mark beginning of RCU-trace read-side critical section
> >   *
> > -- 
> > 2.40.1
> >
Re: [PATCH 15/34] rcu: Add noinstr-fast rcu_read_{,un}lock_tasks_trace() APIs
Posted by Peter Zijlstra 1 week ago
On Wed, Sep 24, 2025 at 01:44:09AM -0700, Paul E. McKenney wrote:
> On Tue, Sep 23, 2025 at 07:32:16PM +0200, Peter Zijlstra wrote:
> > On Tue, Sep 23, 2025 at 07:20:17AM -0700, Paul E. McKenney wrote:
> > > When expressing RCU Tasks Trace in terms of SRCU-fast, it was
> > > necessary to keep a nesting count and per-CPU srcu_ctr structure
> > > pointer in the task_struct structure, which is slow to access.
> > > But an alternative is to instead make rcu_read_lock_tasks_trace() and
> > > rcu_read_unlock_tasks_trace(), which match the underlying SRCU-fast
> > > semantics, avoiding the task_struct accesses.
> > > 
> > > When all callers have switched to the new API, the previous
> > > rcu_read_lock_trace() and rcu_read_unlock_trace() APIs will be removed.
> > > 
> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > Cc: Andrii Nakryiko <andrii@kernel.org>
> > > Cc: Alexei Starovoitov <ast@kernel.org>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: <bpf@vger.kernel.org>
> > > ---
> > >  include/linux/rcupdate_trace.h | 37 ++++++++++++++++++++++++++++++++++
> > >  1 file changed, 37 insertions(+)
> > > 
> > > diff --git a/include/linux/rcupdate_trace.h b/include/linux/rcupdate_trace.h
> > > index 0bd47f12ecd17b..b87151e6b23881 100644
> > > --- a/include/linux/rcupdate_trace.h
> > > +++ b/include/linux/rcupdate_trace.h
> > > @@ -34,6 +34,43 @@ static inline int rcu_read_lock_trace_held(void)
> > >  
> > >  #ifdef CONFIG_TASKS_TRACE_RCU
> > >  
> > > +/**
> > > + * rcu_read_lock_tasks_trace - mark beginning of RCU-trace read-side critical section
> > > + *
> > > + * When synchronize_rcu_tasks_trace() is invoked by one task, then that
> > > + * task is guaranteed to block until all other tasks exit their read-side
> > > + * critical sections.  Similarly, if call_rcu_tasks_trace() is invoked
> > > + * on one task while other tasks are within RCU read-side critical
> > > + * sections, invocation of the corresponding RCU callback is deferred
> > > + * until after all the other tasks exit their critical sections.
> > > + *
> > > + * For more details, please see the documentation for srcu_read_lock_fast().
> > > + */
> > > +static inline struct srcu_ctr __percpu *rcu_read_lock_tasks_trace(void)
> > > +{
> > > +	struct srcu_ctr __percpu *ret = srcu_read_lock_fast(&rcu_tasks_trace_srcu_struct);
> > > +
> > > +	if (IS_ENABLED(CONFIG_ARCH_WANTS_NO_INSTR))
> > > +		smp_mb();
> > 
> > I am somewhat confused by the relation between noinstr and smp_mb()
> > here. The Subject mentions it, but the Changelog is awfully silent again.
> 
> Thank you for looking this over!
> 
> To Alexei's point, this commit should be merged with 18/34.
> 
> > Furthermore I note that this is a positive while unlock is a negative
> > relation between the two. Which adds even more confusion.
> 
> You are right, at most one of these two conditions can be correct.  ;-)
> 
> I believe that the one above needs a "!".

Whew :-)

> The point of this is that architectures that set ARCH_WANTS_NO_INSTR
> have promised that any point in the entry/exit code that RCU is not
> watching has been marked noinstr.  For those architectures, SRCU-fast
> can rely on the fact that the key updates in __srcu_read_lock_fast()
> and __srcu_read_unlock_fast() are either interrupt-disabled regions or
> atomic operations, depending on the architecture.  This means that
> the synchronize_rcu{,_expedited}() calls in the SRCU-fast grace-period
> code will be properly ordered with those accesses.
> 
> But for !ARCH_WANTS_NO_INSTR architectures, it is possible to attach
> various forms of tracing to entry/exit code that RCU is not watching,
> which means that those synchronize_rcu{,_expedited}() calls won't have
> the needed ordering properties.  So we use smp_mb() on the read side
> to force the needed ordering.
> 
> Does that help, or am I missing the point of your question?

Yes, that was indeed what I was asking. This might make a good comment
to go along with that IS_ENABLED(CONFIG_ARCH_WANTS_NO_INSTR) thing.

Thanks!
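
Putting the two points of this exchange together, i.e., the "!" fix Paul
mentions plus Peter's suggestion of a comment on the IS_ENABLED() test,
the lock-side helper would presumably end up looking something like the
following sketch (not the final committed code):

	static inline struct srcu_ctr __percpu *rcu_read_lock_tasks_trace(void)
	{
		struct srcu_ctr __percpu *ret = srcu_read_lock_fast(&rcu_tasks_trace_srcu_struct);

		/*
		 * ARCH_WANTS_NO_INSTR architectures guarantee that all
		 * entry/exit code where RCU is not watching is marked
		 * noinstr, so the counter update in __srcu_read_lock_fast()
		 * is already ordered against the synchronize_rcu() calls
		 * in the SRCU-fast grace-period code.  Other architectures
		 * can attach tracing to code where RCU is not watching,
		 * so supply the needed ordering explicitly.
		 */
		if (!IS_ENABLED(CONFIG_ARCH_WANTS_NO_INSTR))
			smp_mb();
		return ret;
	}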