[PATCH v6 0/3] tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast

Steven Rostedt posted 3 patches 1 week, 4 days ago
include/linux/tracepoint.h   |  9 +++++----
include/trace/perf.h         |  4 ++--
include/trace/trace_events.h |  4 ++--
kernel/trace/bpf_trace.c     |  5 ++---
kernel/tracepoint.c          | 18 ++++++++++++++----
5 files changed, 25 insertions(+), 15 deletions(-)
Posted by Steven Rostedt 1 week, 4 days ago
The current use of guard(preempt_notrace)() within __DECLARE_TRACE()
to protect invocation of __DO_TRACE_CALL() means that BPF programs
attached to tracepoints are non-preemptible.  This is unhelpful in
real-time systems, whose users apparently wish to use BPF while also
achieving low latencies.

Change the protection of tracepoints to use SRCU-fast instead.
This allows the callbacks to be preempted, which also means that the
callbacks themselves must be able to handle being preempted.

For perf, add a guard(preempt) inside its handler to keep the old behavior
of perf events being called with preemption disabled.

For BPF, disable migration in its handler: replace the rcu_read_lock()
with rcu_read_lock_dont_migrate() and make it cover more of the BPF
callback handler.
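
As an illustration of the change described above, here is a minimal userspace sketch of moving a scope-based guard out of the tracepoint wrapper and into the one callback that still needs it. Everything prefixed with toy_ is hypothetical and only models the idea; it is not the kernel's guard() implementation, it relies on the GCC/Clang cleanup attribute, and it leaves out the SRCU-fast read-side protection that the new wrapper takes instead.

```c
#include <assert.h>

/* Toy stand-in for the preemption-disable nesting depth. */
static int preempt_depth;

static void toy_preempt_disable(void) { preempt_depth++; }

/* Cleanup handler: called automatically when the guard variable goes out of scope. */
static void toy_preempt_enable(int *unused) { (void)unused; preempt_depth--; }

/*
 * Toy guard: disables "preemption" at the point of declaration and
 * re-enables it when the enclosing scope ends.
 */
#define toy_guard_preempt() \
	int toy_guard_var __attribute__((cleanup(toy_preempt_enable), unused)) = \
		(toy_preempt_disable(), 0)

/* Depth observed by the most recent callback invocation. */
static int depth_seen_by_callback;

static void toy_callback(void) { depth_seen_by_callback = preempt_depth; }

/* Old scheme: the wrapper holds the guard around every callback. */
static void toy_trace_old(void (*cb)(void))
{
	toy_guard_preempt();
	cb();
}

/* New scheme: the wrapper no longer disables preemption (SRCU part omitted). */
static void toy_trace_new(void (*cb)(void))
{
	cb();
}

/* A callback that still needs preemption disabled takes the guard itself. */
static void toy_perf_callback(void)
{
	toy_guard_preempt();
	toy_callback();
}

/* Walk through the three cases and check the observed depth each time. */
static void toy_demo(void)
{
	toy_trace_old(toy_callback);
	assert(depth_seen_by_callback == 1);	/* wrapper disabled preemption */

	toy_trace_new(toy_callback);
	assert(depth_seen_by_callback == 0);	/* callback is now preemptible */

	toy_trace_new(toy_perf_callback);
	assert(depth_seen_by_callback == 1);	/* callback opted back in */

	assert(preempt_depth == 0);		/* every guard was released */
}
```

Calling toy_demo() exercises all three paths. The point of the sketch is only that the decision to disable preemption moves from the common wrapper to the individual handler.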

[ I would have sent this out earlier, but had a death in the family
  which caused everything to be postponed ]

Changes since v5: https://patch.msgid.link/20260108220550.2f6638f3@fedora

- Add a separate patch for perf to call preempt_disable().

- Add a patch that has BPF call migrate_disable() directly.

- Always change from preempt_disable() to SRCU-fast; do not do
  anything different for PREEMPT_RT. Now that BPF disables migration
  directly, do not have tracepoints disable migration in their code.

Steven Rostedt (3):
      tracing: perf: Have perf tracepoint callbacks always disable preemption
      bpf: Have __bpf_trace_run() use rcu_read_lock_dont_migrate()
      tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast

Re: [PATCH v6 0/3] tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast
Posted by Steven Rostedt 1 week, 4 days ago
On Mon, 26 Jan 2026 18:11:45 -0500
Steven Rostedt <rostedt@kernel.org> wrote:

> The current use of guard(preempt_notrace)() within __DECLARE_TRACE()
> to protect invocation of __DO_TRACE_CALL() means that BPF programs
> attached to tracepoints are non-preemptible.  This is unhelpful in
> real-time systems, whose users apparently wish to use BPF while also
> achieving low latencies.
> 
> Change the protection of tracepoints to use SRCU-fast instead.
> This allows the callbacks to be preempted, which also means that the
> callbacks themselves must be able to handle being preempted.
> 
> For perf, add a guard(preempt) inside its handler to keep the old behavior
> of perf events being called with preemption disabled.
> 
> For BPF, disable migration in its handler: replace the rcu_read_lock()
> with rcu_read_lock_dont_migrate() and make it cover more of the BPF
> callback handler.

My tests just triggered the warning below, so I'm removing these patches
from my queue for now.

-- Steve


[  204.194772] ------------[ cut here ]------------
[  204.194789] WARNING: kernel/rcu/srcutree.c:792 at __srcu_check_read_flavor+0x5c/0xb0, CPU#1: swapper/1/0
[  204.194800] Modules linked in:
[  204.194817] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.19.0-rc7-test-00018-g2c774d6ad074-dirty #32 PREEMPT(voluntary) 
[  204.194821] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
[  204.194824] RIP: 0010:__srcu_check_read_flavor+0x5c/0xb0
[  204.194829] Code: 84 c9 74 19 39 f1 74 45 0f 0b 85 c0 74 2e 39 c1 74 45 0f 0b 39 f0 75 3f c3 cc cc cc cc 85 c0 74 16 83 fe 04 75 ee 0f 0b eb ea <0f> 0b 8d 46 ff 85 f0 74 ba 0f 0b eb b6 83 fe 04 74 3a 31 c0 f0 0f
[  204.194832] RSP: 0018:fffffe4c48325b50 EFLAGS: 00010002
[  204.194835] RAX: 0000000000000001 RBX: ffffffff8791e5a0 RCX: 0000000000000000
[  204.194836] RDX: 00000000ffffffff RSI: 0000000000000004 RDI: ffffffff879f1180
[  204.194838] RBP: ffff8e6453fd2000 R08: 0000000000000001 R09: 0000000000000000
[  204.194839] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001
[  204.194840] R13: fffffe4c48325ef8 R14: ffffffff85eeae93 R15: ffff8e6453906900
[  204.194842] FS:  0000000000000000(0000) GS:ffff8e6533593000(0000) knlGS:0000000000000000
[  204.194844] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  204.194845] CR2: 000055d1e7cf8cc0 CR3: 000000010b0cc004 CR4: 0000000000172ef0
[  204.194850] Call Trace:
[  204.194866]  <NMI>
[  204.194868]  lock_release+0x215/0x320
[  204.194886]  ? arch_perf_update_userpage+0x6c/0xf0
[  204.195214]  perf_event_update_userpage+0x158/0x2e0
[  204.195538]  x86_perf_event_set_period+0xc1/0x180
[  204.195811]  handle_pmi_common+0x1ac/0x450
[  204.198605]  ? __get_next_timer_interrupt+0x185/0x370
[  204.198914]  intel_pmu_handle_irq+0x10e/0x510
[  204.199032]  ? nmi_handle.part.0+0x30/0x270
[  204.199197]  ? __get_next_timer_interrupt+0x185/0x370
[  204.199404]  perf_event_nmi_handler+0x34/0x60
[  204.199523]  nmi_handle.part.0+0xc9/0x270
Re: [PATCH v6 0/3] tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast
Posted by Paul E. McKenney 1 week, 3 days ago
On Mon, Jan 26, 2026 at 09:39:22PM -0500, Steven Rostedt wrote:
> My tests just triggered this, so I'm removing them from my queue for now.

Huh.  "Works for me."

Ah, I get it.  I think.  NMIs, right?

In your source tree, line 792 of kernel/rcu/srcutree.c is this line of
code, correct?

	WARN_ON_ONCE((read_flavor != SRCU_READ_FLAVOR_NMI) && in_nmi());

If so, could you please try this test with the patch shown at the end
of this email?

> -- Steve
> 
> 
> [  204.194772] ------------[ cut here ]------------
> [  204.194789] WARNING: kernel/rcu/srcutree.c:792 at __srcu_check_read_flavor+0x5c/0xb0, CPU#1: swapper/1/0

------------------------------------------------------------------------

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index c469c708fdd6a..66ba6a2f83d3a 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -789,7 +789,8 @@ void __srcu_check_read_flavor(struct srcu_struct *ssp, int read_flavor)
 	struct srcu_data *sdp;
 
 	/* NMI-unsafe use in NMI is a bad sign, as is multi-bit read_flavor values. */
-	WARN_ON_ONCE((read_flavor != SRCU_READ_FLAVOR_NMI) && in_nmi());
+	WARN_ON_ONCE(read_flavor != SRCU_READ_FLAVOR_NMI &&
+		     read_flavor != SRCU_READ_FLAVOR_FAST && in_nmi());
 	WARN_ON_ONCE(read_flavor & (read_flavor - 1));
 
 	sdp = raw_cpu_ptr(ssp->sda);
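
To make the effect of the change above concrete, here is a small userspace model of the warning condition before and after the patch. The toy_ names and the numeric flavor values are assumptions for illustration only (the model just assumes one bit per flavor), and in_nmi is passed as a plain parameter rather than queried from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed one-bit-per-flavor encoding; the values are illustrative. */
enum toy_read_flavor {
	TOY_FLAVOR_NORMAL = 0x1,
	TOY_FLAVOR_NMI    = 0x2,
	TOY_FLAVOR_FAST   = 0x4,
};

/* Old condition: anything other than the NMI flavor warns in NMI context. */
static bool toy_warns_old(int flavor, bool in_nmi)
{
	return flavor != TOY_FLAVOR_NMI && in_nmi;
}

/* New condition: the fast flavor is also accepted in NMI context. */
static bool toy_warns_new(int flavor, bool in_nmi)
{
	return flavor != TOY_FLAVOR_NMI &&
	       flavor != TOY_FLAVOR_FAST && in_nmi;
}

/* Check the cases that matter for the report in this thread. */
static void toy_check_flavor_demo(void)
{
	/* The reported case: a fast reader inside an NMI handler. */
	assert(toy_warns_old(TOY_FLAVOR_FAST, true));
	assert(!toy_warns_new(TOY_FLAVOR_FAST, true));

	/* A normal reader in NMI context still warns after the change. */
	assert(toy_warns_new(TOY_FLAVOR_NORMAL, true));

	/* Outside NMI context neither version warns. */
	assert(!toy_warns_old(TOY_FLAVOR_NORMAL, false));
	assert(!toy_warns_new(TOY_FLAVOR_FAST, false));
}
```

Only the fast-flavor-in-NMI row changes between the two versions, which matches the intent stated in the patch.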
Re: [PATCH v6 0/3] tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast
Posted by Steven Rostedt 1 week, 1 day ago
On Tue, 27 Jan 2026 15:18:05 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> Ah, I get it.  I think.  NMIs, right?
> 
> In your source tree, line 792 of kernel/rcu/srcutree.c is this line of
> code, correct?
> 
> 	WARN_ON_ONCE((read_flavor != SRCU_READ_FLAVOR_NMI) && in_nmi());
> 
> If so, could you please try this test with the patch shown at the end
> of this email?
>
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index c469c708fdd6a..66ba6a2f83d3a 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -789,7 +789,8 @@ void __srcu_check_read_flavor(struct srcu_struct *ssp, int read_flavor)
>  	struct srcu_data *sdp;
>  
>  	/* NMI-unsafe use in NMI is a bad sign, as is multi-bit read_flavor values. */
> -	WARN_ON_ONCE((read_flavor != SRCU_READ_FLAVOR_NMI) && in_nmi());
> +	WARN_ON_ONCE(read_flavor != SRCU_READ_FLAVOR_NMI &&
> +		     read_flavor != SRCU_READ_FLAVOR_FAST && in_nmi());
>  	WARN_ON_ONCE(read_flavor & (read_flavor - 1));
>  
>  	sdp = raw_cpu_ptr(ssp->sda);

It appears to fix the issue.

Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>

Care to send a formal patch? I'll add it before the patch that causes
the issue.

Thanks,

-- Steve
Re: [PATCH v6 0/3] tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast
Posted by Paul E. McKenney 1 week, 1 day ago
On Thu, Jan 29, 2026 at 07:33:59PM -0500, Steven Rostedt wrote:
> It appears to fix the issue.
> 
> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> 
> Care to send a formal patch? I'll add it before the patch that causes
> the issue.

Thank you, done, and apologies for the hassle!  This should show up
here in a bit:

https://lore.kernel.org/all/8232efe8-a7a3-446c-af0b-19f9b523b4f7@paulmck-laptop/

And I have it below, just in case.

							Thanx, Paul

------------------------------------------------------------------------

commit 0bf3a51bef3c33ea528c96720ab6d6211d9009cf
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Tue Jan 27 15:20:02 2026 -0800

    srcu: Fix warning to permit SRCU-fast readers in NMI handlers
    
    SRCU-fast is designed to be used in NMI handlers, even going so far
    as to use atomic operations for architectures supporting NMIs but not
    providing NMI-safe per-CPU atomic operations.  However, the WARN_ON_ONCE()
    in __srcu_check_read_flavor() complains if SRCU-fast is used in an NMI
    handler.  This commit therefore modifies that WARN_ON_ONCE() to avoid
    such complaints.
    
    Reported-by: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Tested-by: Steven Rostedt <rostedt@goodmis.org>
    Cc: Andrii Nakryiko <andrii@kernel.org>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: bpf@vger.kernel.org

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index c469c708fdd6a..66ba6a2f83d3a 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -789,7 +789,8 @@ void __srcu_check_read_flavor(struct srcu_struct *ssp, int read_flavor)
 	struct srcu_data *sdp;
 
 	/* NMI-unsafe use in NMI is a bad sign, as is multi-bit read_flavor values. */
-	WARN_ON_ONCE((read_flavor != SRCU_READ_FLAVOR_NMI) && in_nmi());
+	WARN_ON_ONCE(read_flavor != SRCU_READ_FLAVOR_NMI &&
+		     read_flavor != SRCU_READ_FLAVOR_FAST && in_nmi());
 	WARN_ON_ONCE(read_flavor & (read_flavor - 1));
 
 	sdp = raw_cpu_ptr(ssp->sda);
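
The second WARN_ON_ONCE() in the hunk above, left unchanged by the patch, uses the expression read_flavor & (read_flavor - 1) to detect multi-bit values. A minimal userspace sketch of that bit trick follows; the helper name is hypothetical and the inputs are arbitrary example values.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Subtracting 1 flips the lowest set bit and every bit below it, so
 * ANDing with the original value clears the lowest set bit. The result
 * is non-zero only if some other bit was also set.
 */
static bool toy_is_multi_bit(unsigned int flavor)
{
	return (flavor & (flavor - 1)) != 0;
}

/* Single-bit and zero inputs pass; inputs with two or more bits do not. */
static void toy_multi_bit_demo(void)
{
	assert(!toy_is_multi_bit(0x1));
	assert(!toy_is_multi_bit(0x4));
	assert(!toy_is_multi_bit(0x0));
	assert(toy_is_multi_bit(0x3));
	assert(toy_is_multi_bit(0x6));
}
```

Note that zero also passes this particular test, so the expression only rejects values that combine two or more flavors.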
Re: [PATCH v6 0/3] tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast
Posted by Steven Rostedt 1 week, 1 day ago
On Thu, 29 Jan 2026 17:32:24 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> Thank you, done, and apologies for the hassle!  This should show up
> here in a bit:
> 
> https://lore.kernel.org/all/8232efe8-a7a3-446c-af0b-19f9b523b4f7@paulmck-laptop/

I can get it from there. Also, if you Cc
linux-trace-kernel@vger.kernel.org, I can pull it from patchwork.

-- Steve