From nobody Sat Feb 7 17:55:30 2026
Message-ID: <20260126231256.174621257@kernel.org>
Date: Mon, 26 Jan 2026 18:11:46 -0500
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
 bpf@vger.kernel.org
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
 "Paul E. McKenney", Sebastian Andrzej Siewior, Alexei Starovoitov
Subject: [PATCH v6 1/3] tracing: perf: Have perf tracepoint callbacks always
 disable preemption
References: <20260126231145.728172709@kernel.org>

From: Steven Rostedt

In preparation for converting tracepoint protection from a
preempt-disabled section to SRCU, have all the perf callbacks disable
preemption themselves, as perf expects preemption to be disabled while
it processes tracepoints. While at it, convert the perf system call
callback's preempt_disable() to a guard(preempt).
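
[Editor's note -- not part of the patch] For readers unfamiliar with the
kernel's scope-based guards: guard(preempt_notrace)() arranges for
preempt_enable_notrace() to run automatically at every scope exit, which
is what makes the one-line conversion in this patch safe even for early
returns. Below is a minimal userspace sketch of the mechanism using the
compiler's cleanup attribute and stand-in functions; the real guard is
generated by the kernel's <linux/cleanup.h> helpers, not defined this
way literally.

#include <stdio.h>

static void fake_preempt_disable(void) { puts("preempt_disable_notrace()"); }
static void fake_preempt_enable(void)  { puts("preempt_enable_notrace()"); }

typedef struct { int unused; } guard_token;

static inline guard_token guard_ctor(void)
{
        fake_preempt_disable();
        return (guard_token){ 0 };
}

static inline void guard_dtor(guard_token *t)
{
        (void)t;
        fake_preempt_enable();  /* runs on every scope exit */
}

#define guard_preempt() \
        guard_token __g __attribute__((cleanup(guard_dtor))) = guard_ctor()

int main(void)
{
        guard_preempt();
        puts("callback body runs with 'preemption' disabled");
        return 0;       /* guard_dtor() fires here */
}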
Link: https://lore.kernel.org/all/20250613152218.1924093-1-bigeasy@linutronix.de/
Link: https://patch.msgid.link/20260108220550.2f6638f3@fedora
Signed-off-by: Steven Rostedt (Google)
Tested-by: Steven Rostedt (Google)
---
 include/trace/perf.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/trace/perf.h b/include/trace/perf.h
index a1754b73a8f5..348ad1d9b556 100644
--- a/include/trace/perf.h
+++ b/include/trace/perf.h
@@ -71,6 +71,7 @@ perf_trace_##call(void *__data, proto)                  \
        u64 __count __attribute__((unused));                    \
        struct task_struct *__task __attribute__((unused));     \
                                                                \
+       guard(preempt_notrace)();                               \
        do_perf_trace_##call(__data, args);                     \
 }
 
@@ -85,9 +86,8 @@ perf_trace_##call(void *__data, proto)                   \
        struct task_struct *__task __attribute__((unused));     \
                                                                \
        might_fault();                                          \
-       preempt_disable_notrace();                              \
+       guard(preempt_notrace)();                               \
        do_perf_trace_##call(__data, args);                     \
-       preempt_enable_notrace();                               \
 }
 
 /*
-- 
2.51.0

From nobody Sat Feb 7 17:55:30 2026
Message-ID: <20260126231256.335034877@kernel.org>
Date: Mon, 26 Jan 2026 18:11:47 -0500
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
 bpf@vger.kernel.org
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
 "Paul E. McKenney", Sebastian Andrzej Siewior, Alexei Starovoitov
Subject: [PATCH v6 2/3] bpf: Have __bpf_trace_run() use
 rcu_read_lock_dont_migrate()
References: <20260126231145.728172709@kernel.org>

From: Steven Rostedt

In order to switch the protection of tracepoint callbacks from
preempt_disable() to srcu_read_lock_fast(), the BPF callback attached to
tracepoints needs migration prevention, as BPF programs expect to stay
on the same CPU while they execute. Combine the RCU protection with
migration prevention by using rcu_read_lock_dont_migrate() in
__bpf_trace_run(). This will allow tracepoint callbacks to be
preemptible.
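
[Editor's note -- not part of the patch] The pairing the commit relies
on can be pictured as below. This is a sketch of the intent only; the
kernel's actual rcu_read_lock_dont_migrate() may be defined differently
(for instance, skipping migrate_disable() in configurations where RCU
readers cannot migrate anyway).

/* Sketch: what the tracepoint path needs from the combined primitive. */
static inline void sketch_rcu_read_lock_dont_migrate(void)
{
        migrate_disable();      /* keep the BPF program on this CPU */
        rcu_read_lock();        /* open the RCU read-side critical section */
}

static inline void sketch_rcu_read_unlock_migrate(void)
{
        rcu_read_unlock();      /* release in reverse order of acquisition */
        migrate_enable();
}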
McKenney" , Sebastian Andrzej Siewior , Alexei Starovoitov , Alexei Starovoitov Subject: [PATCH v6 2/3] bpf: Have __bpf_trace_run() use rcu_read_lock_dont_migrate() References: <20260126231145.728172709@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt In order to switch the protection of tracepoint callbacks from preempt_disable() to srcu_read_lock_fast() the BPF callback from tracepoints needs to have migration prevention as the BPF programs expect to stay on the same CPU as they execute. Put together the RCU protection with migration prevention and use rcu_read_lock_dont_migrate() in __bpf_trace_run(). This will allow tracepoints callbacks to be preemptible. Link: https://lore.kernel.org/all/CAADnVQKvY026HSFGOsavJppm3-Ajm-VsLzY-OeFU= e+BaKMRnDg@mail.gmail.com/ Suggested-by: Alexei Starovoitov Signed-off-by: Steven Rostedt (Google) Tested-by: Steven Rostedt (Google) --- kernel/trace/bpf_trace.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index fe28d86f7c35..abbf0177ad20 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -2062,7 +2062,7 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, u6= 4 *args) struct bpf_run_ctx *old_run_ctx; struct bpf_trace_run_ctx run_ctx; =20 - cant_sleep(); + rcu_read_lock_dont_migrate(); if (unlikely(this_cpu_inc_return(*(prog->active)) !=3D 1)) { bpf_prog_inc_misses_counter(prog); goto out; @@ -2071,13 +2071,12 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, = u64 *args) run_ctx.bpf_cookie =3D link->cookie; old_run_ctx =3D bpf_set_run_ctx(&run_ctx.run_ctx); =20 - rcu_read_lock(); (void) bpf_prog_run(prog, args); - rcu_read_unlock(); =20 bpf_reset_run_ctx(old_run_ctx); out: this_cpu_dec(*(prog->active)); + rcu_read_unlock_migrate(); } =20 #define UNPACK(...) 
From nobody Sat Feb 7 17:55:30 2026
Message-ID: <20260126231256.499701982@kernel.org>
Date: Mon, 26 Jan 2026 18:11:48 -0500
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
 bpf@vger.kernel.org
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
 "Paul E. McKenney", Sebastian Andrzej Siewior, Alexei Starovoitov
Subject: [PATCH v6 3/3] tracing: Guard __DECLARE_TRACE() use of
 __DO_TRACE_CALL() with SRCU-fast
References: <20260126231145.728172709@kernel.org>

From: Steven Rostedt

The current use of guard(preempt_notrace)() within __DECLARE_TRACE() to
protect invocation of __DO_TRACE_CALL() means that BPF programs attached
to tracepoints are non-preemptible. This is unhelpful in real-time
systems, whose users apparently wish to use BPF while also achieving low
latencies. (Who knew?)

One option would be to use preemptible RCU, but this introduces many
opportunities for infinite recursion, which many consider to be
counterproductive, especially given the relatively small stacks provided
by the Linux kernel. These opportunities could be shut down by
sufficiently energetic duplication of code, but that sort of thing is
considered impolite in some circles. Therefore, use the shiny new
SRCU-fast API, which provides somewhat faster readers than those of
preemptible RCU, at least on Paul E. McKenney's laptop, where
task_struct access is more expensive than access to per-CPU variables.
And SRCU-fast provides way faster readers than does plain SRCU, courtesy
of being able to avoid the read-side use of smp_mb(). Also, it is quite
straightforward to create srcu_read_{,un}lock_fast_notrace() functions.
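
[Editor's note -- not part of the patch] Written out without the guard
sugar, the reader-side conversion amounts to the shape below.
srcu_read_lock_fast() returns a per-CPU cookie that must be handed back
to the matching unlock; the _notrace variants this series introduces are
assumed to behave identically apart from instrumentation, so treat the
exact signatures here as illustrative.

/* Sketch of the open-coded form of guard(srcu_fast_notrace)(). */
static void sketch_do_trace_call(void (*cb)(void *), void *data)
{
        struct srcu_ctr __percpu *scp;

        scp = srcu_read_lock_fast_notrace(&tracepoint_srcu);
        cb(data);       /* the tracepoint callback, now preemptible */
        srcu_read_unlock_fast_notrace(&tracepoint_srcu, scp);
}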
Link: https://lore.kernel.org/all/20250613152218.1924093-1-bigeasy@linutronix.de/
Co-developed-by: Paul E. McKenney
Signed-off-by: Paul E. McKenney
Signed-off-by: Steven Rostedt (Google)
Tested-by: Steven Rostedt (Google)
---
Changes since v5: https://patch.msgid.link/20260108220550.2f6638f3@fedora

- Just change from preempt_disable() to srcu_fast() always.
  Do not do anything different for PREEMPT_RT. Now that BPF disables
  migration directly, do not have the tracepoint code disable migration
  itself.

 include/linux/tracepoint.h   |  9 +++++----
 include/trace/trace_events.h |  4 ++--
 kernel/tracepoint.c          | 18 ++++++++++++++----
 3 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 8a56f3278b1b..22ca1c8b54f3 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -108,14 +108,15 @@ void for_each_tracepoint_in_module(struct module *mod,
  * An alternative is to use the following for batch reclaim associated
  * with a given tracepoint:
  *
- * - tracepoint_is_faultable() == false: call_rcu()
+ * - tracepoint_is_faultable() == false: call_srcu()
  * - tracepoint_is_faultable() == true: call_rcu_tasks_trace()
  */
 #ifdef CONFIG_TRACEPOINTS
+extern struct srcu_struct tracepoint_srcu;
 static inline void tracepoint_synchronize_unregister(void)
 {
        synchronize_rcu_tasks_trace();
-       synchronize_rcu();
+       synchronize_srcu(&tracepoint_srcu);
 }
 static inline bool tracepoint_is_faultable(struct tracepoint *tp)
 {
@@ -275,13 +276,13 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
        return static_branch_unlikely(&__tracepoint_##name.key);\
 }
 
-#define __DECLARE_TRACE(name, proto, args, cond, data_proto)   \
+#define __DECLARE_TRACE(name, proto, args, cond, data_proto)           \
        __DECLARE_TRACE_COMMON(name, PARAMS(proto), PARAMS(args), PARAMS(data_proto)) \
        static inline void __do_trace_##name(proto)                     \
        {                                                               \
                TRACEPOINT_CHECK(name)                                  \
                if (cond) {                                             \
-                       guard(preempt_notrace)();                       \
+                       guard(srcu_fast_notrace)(&tracepoint_srcu);     \
                        __DO_TRACE_CALL(name, TP_ARGS(args));           \
                }                                                       \
        }                                                               \
diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
index 4f22136fd465..fbc07d353be6 100644
--- a/include/trace/trace_events.h
+++ b/include/trace/trace_events.h
@@ -436,6 +436,7 @@ __DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct), \
 static notrace void                                                    \
 trace_event_raw_event_##call(void *__data, proto)                      \
 {                                                                      \
+       guard(preempt_notrace)();                                       \
        do_trace_event_raw_event_##call(__data, args);                  \
 }
 
@@ -447,9 +448,8 @@ static notrace void                                 \
 trace_event_raw_event_##call(void *__data, proto)                      \
 {                                                                      \
        might_fault();                                                  \
-       preempt_disable_notrace();                                      \
+       guard(preempt_notrace)();                                       \
        do_trace_event_raw_event_##call(__data, args);                  \
-       preempt_enable_notrace();                                       \
 }
 
 /*
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 62719d2941c9..fd2ee879815c 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -34,9 +34,13 @@ enum tp_transition_sync {
 
 struct tp_transition_snapshot {
        unsigned long rcu;
+       unsigned long srcu_gp;
        bool ongoing;
 };
 
+DEFINE_SRCU_FAST(tracepoint_srcu);
+EXPORT_SYMBOL_GPL(tracepoint_srcu);
+
 /* Protected by tracepoints_mutex */
 static struct tp_transition_snapshot tp_transition_snapshot[_NR_TP_TRANSITION_SYNC];
 
@@ -46,6 +50,7 @@ static void tp_rcu_get_state(enum tp_transition_sync sync)
 
        /* Keep the latest get_state snapshot. */
        snapshot->rcu = get_state_synchronize_rcu();
+       snapshot->srcu_gp = start_poll_synchronize_srcu(&tracepoint_srcu);
        snapshot->ongoing = true;
 }
 
@@ -56,6 +61,8 @@ static void tp_rcu_cond_sync(enum tp_transition_sync sync)
        if (!snapshot->ongoing)
                return;
        cond_synchronize_rcu(snapshot->rcu);
+       if (!poll_state_synchronize_srcu(&tracepoint_srcu, snapshot->srcu_gp))
+               synchronize_srcu(&tracepoint_srcu);
        snapshot->ongoing = false;
 }
 
@@ -112,10 +119,13 @@ static inline void release_probes(struct tracepoint *tp, struct tracepoint_func
                struct tp_probes *tp_probes = container_of(old,
                        struct tp_probes, probes[0]);
 
-               if (tracepoint_is_faultable(tp))
-                       call_rcu_tasks_trace(&tp_probes->rcu, rcu_free_old_probes);
-               else
-                       call_rcu(&tp_probes->rcu, rcu_free_old_probes);
+               if (tracepoint_is_faultable(tp)) {
+                       call_rcu_tasks_trace(&tp_probes->rcu,
+                                            rcu_free_old_probes);
+               } else {
+                       call_srcu(&tracepoint_srcu, &tp_probes->rcu,
+                                 rcu_free_old_probes);
+               }
        }
 }
 
-- 
2.51.0
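
[Editor's note -- not part of the series] The tp_rcu_get_state() /
tp_rcu_cond_sync() hunks above use SRCU's polling interface so that
batched unregistrations only block when a grace period has not already
elapsed on its own. The shape of that pattern, as a kernel-style sketch
(tracepoint_srcu as defined by the patch; the cookie variable is a
stand-in for the snapshot field):

/* Sketch: batched SRCU synchronization via the polling API. */
static unsigned long srcu_cookie;

static void sketch_get_state(void)
{
        /* Record "now" and kick off a grace period without blocking. */
        srcu_cookie = start_poll_synchronize_srcu(&tracepoint_srcu);
}

static void sketch_cond_sync(void)
{
        /* Block only if that grace period has not already completed. */
        if (!poll_state_synchronize_srcu(&tracepoint_srcu, srcu_cookie))
                synchronize_srcu(&tracepoint_srcu);
}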