Hi everyone,

This patchset introduces a new BPF program type that allows overriding
a tracepoint probe function registered via register_trace_*.

Motivation
----------
Tracepoint probe functions registered via register_trace_* in the kernel
cannot be modified dynamically; changing a probe function requires
recompiling the kernel and rebooting. Nor can BPF programs change an
existing probe function.

Overriding a tracepoint probe provides a way to apply patches to the
kernel quickly (such as security fixes), through predefined static
tracepoints, without waiting for upstream integration.

This patchset demonstrates how to override probe functions with a BPF
program.

Overview
--------
This patchset adds the BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE program type.
When a BPF program of this type attaches, it overrides the target
tracepoint probe function.

It also introduces a new struct type, "tracepoint_func_snapshot", which
extends the tracepoint structure. It records the original probe function
registered by the kernel when the BPF program is attached, so that the
original can be restored after detachment.

Critical steps
--------------
1. Attach: Attach programs via the raw_tracepoint_open syscall.
2. Override:
   (a) Locate the target probe by `probe_name`.
   (b) Override the target probe with the BPF program.
   (c) Save the BPF program and the target probe function into
       "tracepoint_func_snapshot".
3. Restore: When the BPF program is detached, automatically restore the
   original probe function from the previously saved snapshot.

Future work
-----------
This patchset is intended as a first step toward supporting BPF programs
that can override tracepoint probes. The current implementation may not
yet cover all use cases or handle every corner case.

I welcome feedback and suggestions from the community, and will continue
to refine and improve the design based on comments and real-world
requirements.

Thanks!
Fuyu

Fuyu Zhao (3):
  bpf: Introduce BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE
  libbpf: Add support for BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE
  selftests/bpf: Add selftest for "raw_tp.o"

 include/linux/bpf_types.h                     |   2 +
 include/linux/trace_events.h                  |   9 +
 include/linux/tracepoint-defs.h               |   6 +
 include/linux/tracepoint.h                    |   3 +
 include/uapi/linux/bpf.h                      |   2 +
 kernel/bpf/syscall.c                          |  35 +++-
 kernel/trace/bpf_trace.c                      |  31 +++
 kernel/tracepoint.c                           | 190 +++++++++++++++++-
 tools/include/uapi/linux/bpf.h                |   2 +
 tools/lib/bpf/bpf.c                           |   1 +
 tools/lib/bpf/bpf.h                           |   3 +-
 tools/lib/bpf/libbpf.c                        |  27 ++-
 tools/lib/bpf/libbpf.h                        |   3 +-
 .../bpf/prog_tests/raw_tp_override_test_run.c |  23 +++
 .../bpf/progs/test_raw_tp_override_test_run.c |  20 ++
 .../selftests/bpf/test_kmods/bpf_testmod.c    |   7 +
 16 files changed, 352 insertions(+), 12 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_tp_override_test_run.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_raw_tp_override_test_run.c

--
2.43.0
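The override and restore steps listed in the cover letter can be sketched in plain user-space C. This is only an illustration under stated assumptions: `probe_slot`, `probe_snapshot`, `probe_override()` and `probe_restore()` are hypothetical names invented here, a plain function pointer stands in for the BPF program, and nothing below reflects the actual patch or the kernel's tracepoint code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the kernel's definitions. */
typedef int (*probe_fn)(int arg);

struct probe_slot {
    const char *probe_name; /* name used to locate the target probe */
    probe_fn    func;       /* callback currently invoked at the hook */
};

struct probe_snapshot {
    struct probe_slot *slot; /* slot that was overridden */
    probe_fn           orig; /* original callback, kept for restore */
    probe_fn           repl; /* replacement (stands in for the BPF program) */
};

/* Step 2: locate by name, save the original, install the replacement. */
static int probe_override(struct probe_slot *slots, size_t n,
                          const char *probe_name, probe_fn repl,
                          struct probe_snapshot *snap)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(slots[i].probe_name, probe_name) == 0) {
            snap->slot = &slots[i];
            snap->orig = slots[i].func;
            snap->repl = repl;
            slots[i].func = repl;
            return 0;
        }
    }
    return -1; /* no probe with that name */
}

/* Step 3: put the original callback back from the saved snapshot. */
static void probe_restore(struct probe_snapshot *snap)
{
    assert(snap->slot != NULL); /* restore only after a successful override */
    snap->slot->func = snap->orig;
    snap->slot = NULL;
}

/* Two toy callbacks so the effect of an override is observable. */
static int orig_probe(int arg) { return arg + 1; }
static int repl_probe(int arg) { return arg * 10; }
```

A caller would fill an array of `probe_slot`, call `probe_override()` with the target name, and later call `probe_restore()` with the same snapshot; slots whose names do not match are left untouched.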
On Wed, Sep 17, 2025 at 12:23 AM Fuyu Zhao <zhaofuyu@vivo.com> wrote:
>
> Hi everyone,
>
> This patchset introduces a new BPF program type that allows overriding
> a tracepoint probe function registered via register_trace_*.
>
> Motivation
> ----------
> Tracepoint probe functions registered via register_trace_* in the kernel
> cannot be dynamically modified, changing a probe function requires recompiling
> the kernel and rebooting. Nor can BPF programs change an existing
> probe function.
>
> Overriding tracepoint supports a way to apply patches into kernel quickly
> (such as applying security ones), through predefined static tracepoints,
> without waiting for upstream integration.

IIUC, this work solves the same problem as raw tracepoint (raw_tp) or raw
tracepoint with BTF (tp_btf).

Did I miss something?

Thanks,
Song
On 9/18/2025 4:02 AM, Song Liu wrote:
> On Wed, Sep 17, 2025 at 12:23 AM Fuyu Zhao <zhaofuyu@vivo.com> wrote:
>>
>> Hi everyone,
>>
>> This patchset introduces a new BPF program type that allows overriding
>> a tracepoint probe function registered via register_trace_*.
>>
>> Motivation
>> ----------
>> Tracepoint probe functions registered via register_trace_* in the kernel
>> cannot be dynamically modified, changing a probe function requires recompiling
>> the kernel and rebooting. Nor can BPF programs change an existing
>> probe function.
>>
>> Overriding tracepoint supports a way to apply patches into kernel quickly
>> (such as applying security ones), through predefined static tracepoints,
>> without waiting for upstream integration.
>
> IIUC, this work solves the same problem as raw tracepoint (raw_tp) or raw
> tracepoint with btf (tp_btf).
>
> Did I miss something?
>
> Thanks,
> Song

As I understand it, raw tracepoints (raw_tp) and raw tracepoints with BTF
(tp_btf) are designed mainly for tracing the kernel. The goal of this work
is to provide a way to override the tracepoint callback, so that kernel
behavior can be adjusted dynamically.

Thanks,
Fuyu
On Thu, Sep 18, 2025 at 04:05:51PM +0800, Fuyu Zhao wrote:
>
>
> On 9/18/2025 4:02 AM, Song Liu wrote:
> > On Wed, Sep 17, 2025 at 12:23 AM Fuyu Zhao <zhaofuyu@vivo.com> wrote:
> >>
> >> Hi everyone,
> >>
> >> This patchset introduces a new BPF program type that allows overriding
> >> a tracepoint probe function registered via register_trace_*.
> >>
> >> Motivation
> >> ----------
> >> Tracepoint probe functions registered via register_trace_* in the kernel
> >> cannot be dynamically modified, changing a probe function requires recompiling
> >> the kernel and rebooting. Nor can BPF programs change an existing
> >> probe function.
> >>
> >> Overriding tracepoint supports a way to apply patches into kernel quickly
> >> (such as applying security ones), through predefined static tracepoints,
> >> without waiting for upstream integration.
> >
> > IIUC, this work solves the same problem as raw tracepoint (raw_tp) or raw
> > tracepoint with btf (tp_btf).
> >
> > Did I miss something?
> >
> > Thanks,
> > Song
>
> As I understand it, raw tracepoints (raw_tp) and raw tracepoints with BTF
> (tp_btf) are designed mainly for tracing the kernel. The goal of this work is to
> provide a way to override the tracepoint callback, so that kernel behavior
> can be adjusted dynamically.

hi,
what's the use case for this? also I'd think you can do that just by
unregister the callback you want to override and register new one?

thanks,
jirka
On 9/18/2025 4:47 PM, Jiri Olsa wrote:
> On Thu, Sep 18, 2025 at 04:05:51PM +0800, Fuyu Zhao wrote:
>>
>>
>> On 9/18/2025 4:02 AM, Song Liu wrote:
>>> On Wed, Sep 17, 2025 at 12:23 AM Fuyu Zhao <zhaofuyu@vivo.com> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> This patchset introduces a new BPF program type that allows overriding
>>>> a tracepoint probe function registered via register_trace_*.
>>>>
>>>> Motivation
>>>> ----------
>>>> Tracepoint probe functions registered via register_trace_* in the kernel
>>>> cannot be dynamically modified, changing a probe function requires recompiling
>>>> the kernel and rebooting. Nor can BPF programs change an existing
>>>> probe function.
>>>>
>>>> Overriding tracepoint supports a way to apply patches into kernel quickly
>>>> (such as applying security ones), through predefined static tracepoints,
>>>> without waiting for upstream integration.
>>>
>>> IIUC, this work solves the same problem as raw tracepoint (raw_tp) or raw
>>> tracepoint with btf (tp_btf).
>>>
>>> Did I miss something?
>>>
>>> Thanks,
>>> Song
>>
>> As I understand it, raw tracepoints (raw_tp) and raw tracepoints with BTF
>> (tp_btf) are designed mainly for tracing the kernel. The goal of this work is to
>> provide a way to override the tracepoint callback, so that kernel behavior
>> can be adjusted dynamically.
>
> hi,
> what's the use case for this? also I'd think you can do that just by
> unregister the callback you want to override and register new one?
>
> thanks,
> jirka

At this moment, I don't have a real-world example. However, I mentioned
one possible use case in my reply to Steven:

> One possible use case is CPU core selection under certain scenarios. For example,
> developers may want to experiment with alternative strategies for deciding
> which CPU a task should run on to improve performance.
>
> If a tracepoint is added as a hook point in this path, then overriding its
> function callback could make it possible to dynamically adjust the
> cpu-selection logic without rebuilding and rebooting the kernel.

As for the reason not to unregister and register a new callback:
callbacks registered directly inside the kernel cannot be unregistered from
user space. From user space, we can only attach additional callbacks
with BPF programs, but cannot remove or replace the ones already
registered in the kernel. Therefore, an override mechanism is needed.

Thanks,
Fuyu
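The distinction drawn above, between attaching an additional callback and replacing an existing one, can be illustrated with a toy callback list. This is a rough user-space sketch, assuming a fixed-size array of function pointers; `hook`, `hook_attach()` and `hook_fire()` are hypothetical names and do not correspond to the kernel's tracepoint implementation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLBACKS 4

/* Hypothetical hook: every registered callback runs when the hook fires. */
typedef void (*tp_callback)(int *counter);

struct hook {
    tp_callback cbs[MAX_CALLBACKS];
    size_t      n;
};

/* Attaching appends; it never removes or replaces an earlier callback. */
static int hook_attach(struct hook *h, tp_callback cb)
{
    if (h->n == MAX_CALLBACKS)
        return -1;
    h->cbs[h->n++] = cb;
    return 0;
}

/* Firing the hook invokes all callbacks in registration order. */
static void hook_fire(const struct hook *h, int *counter)
{
    assert(h->n <= MAX_CALLBACKS);
    for (size_t i = 0; i < h->n; i++)
        h->cbs[i](counter);
}

/* Stand-in for a callback registered inside the kernel. */
static void kernel_cb(int *counter) { *counter += 1; }

/* Stand-in for a callback attached later from user space. */
static void bpf_cb(int *counter) { *counter += 100; }
```

In this model, attaching `bpf_cb` leaves `kernel_cb` in place, so both run on every fire. The proposal in this thread differs in that it would swap the existing entry instead of appending a new one.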
On Thu, 18 Sep 2025 21:15:57 +0800
Fuyu Zhao <zhaofuyu@vivo.com> wrote:

> As for the reason not to unregister and register a new callback:
> callbacks registered directly inside the kernel cannot be unregistered from
> user space. From user space, we can only attach additional callbacks
> with BPF programs, but can not remove or replace the ones already
> registered in the kernel. Therefore, an override mechanism is needed.

The fact that user space cannot unregister or override the current
callbacks, to me is a feature and not a bug.

-- Steve
On 9/18/2025 11:32 PM, Steven Rostedt wrote:
> On Thu, 18 Sep 2025 21:15:57 +0800
> Fuyu Zhao <zhaofuyu@vivo.com> wrote:
>
>> As for the reason not to unregister and register a new callback:
>> callbacks registered directly inside the kernel cannot be unregistered from
>> user space. From user space, we can only attach additional callbacks
>> with BPF programs, but can not remove or replace the ones already
>> registered in the kernel. Therefore, an override mechanism is needed.
>
> The fact that user space cannot unregister or override the current
> callbacks, to me is a feature and not a bug.
>
> -- Steve

I see, thank you for sharing your view. I’ll keep it in mind.

Sincerely,
Fuyu
On Wed, 17 Sep 2025 15:22:39 +0800
Fuyu Zhao <zhaofuyu@vivo.com> wrote:

> Hi everyone,
>
> This patchset introduces a new BPF program type that allows overriding
> a tracepoint probe function registered via register_trace_*.
>
> Motivation
> ----------
> Tracepoint probe functions registered via register_trace_* in the kernel
> cannot be dynamically modified, changing a probe function requires recompiling
> the kernel and rebooting. Nor can BPF programs change an existing
> probe function.

I'm confused by what you mean by "tracepoint probe function"?

You mean the function callback that gets called via the "register_trace_*()"?

>
> Overriding tracepoint supports a way to apply patches into kernel quickly
> (such as applying security ones), through predefined static tracepoints,
> without waiting for upstream integration.

This sounds way out of scope for tracepoints. Please provide a solid
example for this.

>
> This patchset demonstrates the way to override probe functions by BPF program.
>
> Overview
> --------
> This patchset adds BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE program type.
> When this type of BPF program attaches, it overrides the target tracepoint
> probe function.
>
> And it also extends a new struct type "tracepoint_func_snapshot", which extends
> the tracepoint structure. It is used to record the original probe function
> registered by kernel after BPF program being attached and restore from it
> after detachment.

The tracepoint structure exists for every tracepoint in the kernel. By
adding a pointer to it, you just increased the size of the tracepoint. I'm
already complaining that each tracepoint causes around 5K of memory
overhead, and I'd like to make it smaller.

-- Steve
Sorry, I just realized that I forgot to include the CC list in my first
reply. Resending with CCs. Apologies to Steven for the extra noise.

On 9/18/2025 3:30 AM, Steven Rostedt wrote:
> On Wed, 17 Sep 2025 15:22:39 +0800
> Fuyu Zhao <zhaofuyu@vivo.com> wrote:
>
>> Hi everyone,
>>
>> This patchset introduces a new BPF program type that allows overriding
>> a tracepoint probe function registered via register_trace_*.
>>
>> Motivation
>> ----------
>> Tracepoint probe functions registered via register_trace_* in the kernel
>> cannot be dynamically modified, changing a probe function requires recompiling
>> the kernel and rebooting. Nor can BPF programs change an existing
>> probe function.
>
> I'm confused by what you mean by "tracepoint probe function"?
>
> You mean the function callback that gets called via the "register_trace_*()"?
>

Yes, that’s correct. My earlier wording was not very precise; thanks for
pointing that out.

>>
>> Overriding tracepoint supports a way to apply patches into kernel quickly
>> (such as applying security ones), through predefined static tracepoints,
>> without waiting for upstream integration.
>
> This sounds way out of scope for tracepoints. Please provide a solid
> example for this.
>

I appreciate your comment. The example I gave about security patches
probably wasn’t a good one here. I just meant to show the idea of
changing kernel behavior at runtime. Sorry for the confusion.

At the moment, I don’t have a solid real-world example to provide.
This work is still in an exploratory stage.

One possible use case is CPU core selection under certain scenarios. For
example, developers may want to experiment with alternative strategies
for deciding which CPU a task should run on to improve performance.

If a tracepoint is added as a hook point in this path, then overriding its
function callback could make it possible to dynamically adjust the
cpu-selection logic without rebuilding and rebooting the kernel.

The same mechanism could also be applied in other kernel paths where
developers want to make quick changes from user space.

>>
>> This patchset demonstrates the way to override probe functions by BPF program.
>>
>> Overview
>> --------
>> This patchset adds BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE program type.
>> When this type of BPF program attaches, it overrides the target tracepoint
>> probe function.
>>
>> And it also extends a new struct type "tracepoint_func_snapshot", which extends
>> the tracepoint structure. It is used to record the original probe function
>> registered by kernel after BPF program being attached and restore from it
>> after detachment.
>
> The tracepoint structure exists for every tracepoint in the kernel. By
> adding a pointer to it, you just increased the size of the tracepoint. I'm
> already complaining that each tracepoint causes around 5K of memory
> overhead, and I'd like to make it smaller.
>
> -- Steve
>

It is true that adding a pointer to the tracepoint structure increases
memory overhead. However, the memory that the "snapshot" pointer refers to
is only allocated after a BPF program is attached, and is freed once the
program is detached.

I am also considering whether it is possible to reuse existing structures
to reduce memory usage. I'd be very grateful for any suggestions or
guidance you might have.

Thanks,
Fuyu
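The two costs discussed in this exchange can be separated with a small user-space sketch: the pointer field is paid for by every tracepoint, while the snapshot it refers to is only allocated on attach. The struct layouts below are deliberately simplified assumptions (`tp_base`, `tp_with_snapshot` and `snapshot` are hypothetical and much smaller than the kernel's real tracepoint structure), so only the relative difference is meaningful.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical snapshot payload, allocated only while a program is attached. */
struct snapshot {
    void *orig_func;
    void *bpf_prog;
};

/* Simplified stand-ins for a tracepoint without and with the extra field. */
struct tp_base {
    const char *name;
    void       *funcs;
};

struct tp_with_snapshot {
    const char      *name;
    void            *funcs;
    struct snapshot *snap; /* present in every instance, attached or not */
};

/* Fixed cost: one pointer per tracepoint, independent of any attachment. */
static size_t fixed_cost(size_t n_tracepoints)
{
    return n_tracepoints *
           (sizeof(struct tp_with_snapshot) - sizeof(struct tp_base));
}

/* Variable cost: the snapshot itself exists only between attach and detach. */
static int tp_attach(struct tp_with_snapshot *tp)
{
    assert(tp->snap == NULL); /* one snapshot per tracepoint in this sketch */
    tp->snap = calloc(1, sizeof(*tp->snap));
    return tp->snap ? 0 : -1;
}

static void tp_detach(struct tp_with_snapshot *tp)
{
    free(tp->snap);
    tp->snap = NULL;
}
```

On common ABIs where all data pointers share one size, `fixed_cost(n)` comes to `n * sizeof(void *)`, which is the per-tracepoint growth Steven objects to; lazy allocation in `tp_attach()` only avoids the additional `sizeof(struct snapshot)` for tracepoints that are never overridden.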
On Thu, 18 Sep 2025 20:33:22 +0800
Fuyu Zhao <zhaofuyu@vivo.com> wrote:

> At the moment, I don’t have a solid real-world example to provide.
> This work is still in an exploratory stage.

We shouldn't be in the business of "if you build it, they will come".
Unless there is a concrete use case now, I would not be adding anything.

My entire workflow for what I created in the tracing system was "I have a
need, I will implement it". The "need" came first. I then wrote code to
satisfy that need. It should not be the other way around.

-- Steve
On Thu Sep 18, 2025 at 3:29 PM UTC, Steven Rostedt wrote:
>
> My entire workflow for what I created in the tracing system was "I have a
> need, I will implement it". The "need" came first. I then wrote code to
> satisfy that need. It should not be the other way around.

Tagging on to this sentiment - the kernel's design is emergent and will
always remain so. Speculative features have a very low probability of
reflecting the required design language.

On the other hand, if someone needs a thing, the need will drive the use
of conformal design language.

..Ch:W..
On 9/18/2025 11:24 PM, Steven Rostedt wrote:
> On Thu, 18 Sep 2025 20:33:22 +0800
> Fuyu Zhao <zhaofuyu@vivo.com> wrote:
>
>> At the moment, I don’t have a solid real-world example to provide.
>> This work is still in an exploratory stage.
>
> We shouldn't be in the business of "if you build it, they will come".
> Unless there is a concrete use case now, I would not be adding anything.
>
> My entire workflow for what I created in the tracing system was "I have a
> need, I will implement it". The "need" came first. I then wrote code to
> satisfy that need. It should not be the other way around.
>
> -- Steve

Thanks a lot for the feedback and guidance.

I understand your point that new functionality should be driven by real
needs rather than exploratory ideas.

I’ll keep looking into this. If I find a concrete use case that
demonstrates clear value, I’ll bring it back for discussion.

Thanks again.