This series extends Perf for fine-grained tracing by using a BPF program
to pause and resume AUX tracing. The BPF program can be attached to
tracepoints (including ftrace tracepoints and dynamic tracepoints such as
kprobe, kretprobe, uprobe and uretprobe).

The first two patches are kernel changes: they add a BPF kfunc that can
be invoked from a BPF program.

Patches 03 to 05 implement a BPF skeleton program in the perf tool and
hook the BPF program into a perf record session.

Patch 06 documents the usage of the newly introduced option
'--bpf-aux-pause'.

This series has been tested on the Hikey960 platform with these commands:

  perf record -e cs_etm/aux-action=start-paused/ \
    --bpf-aux-pause="kretprobe:p:__arm64_sys_openat,kprobe:r:__arm64_sys_openat,tp:r:sched:sched_switch" \
    -a -- ls

  perf record -e cs_etm/aux-action=start-paused/ \
    --bpf-aux-pause="kretprobe:p:__arm64_sys_openat,kprobe:r:__arm64_sys_openat,tp:r:sched:sched_switch" \
    -i -- ls

  perf record -e cs_etm/aux-action=start-paused/ \
    --bpf-aux-pause="uretprobe:p:/mnt/sort:bubble_sort,uprobe:r:/mnt/sort:bubble_sort" \
    --per-thread -- /mnt/sort

Note: because the AUX pause operation cannot be inherited by child tasks,
the '-i' option must be specified in the default mode. Otherwise, the
tool reports an error reminding the user to disable inherit mode:

  Failed to update BPF map for auxtrace: Operation not supported.
  Try to disable inherit mode with option '-i'.

Changes in v3:
- Added check "map->type" (Eduard)
- Fixed kfunc with guard(irqsave).
- Link to v2: https://lore.kernel.org/r/20250718-perf_aux_pause_resume_bpf_rebase-v2-0-992557b8fb16@arm.com

Changes in v2:
- Changed to use BPF kfunc and dropped uAPI (Yonghong).
- Added support for uprobe/uretprobe.
- Refined the syntax for trigger points (mainly for trigger action {p:r}).
- Fixed a bug in the BPF program that passed the wrong flag.
- Rebased on bpf-next branch.
- Link to v1: https://lore.kernel.org/linux-perf-users/20241215193436.275278-1-leo.yan@arm.com/T/#m10ea3e66bca7418db07c141a14217934f36e3bc8

---
Leo Yan (6):
      perf/core: Make perf_event_aux_pause() as external function
      bpf: Add bpf_perf_event_aux_pause kfunc
      perf: auxtrace: Control AUX pause and resume with BPF
      perf: auxtrace: Add BPF userspace program for AUX pause and resume
      perf record: Support AUX pause and resume with BPF
      perf docs: Document AUX pause and resume with BPF

 include/linux/perf_event.h                    |   1 +
 kernel/events/core.c                          |   2 +-
 kernel/trace/bpf_trace.c                      |  55 ++++
 tools/perf/Documentation/perf-record.txt      |  51 ++++
 tools/perf/Makefile.perf                      |   1 +
 tools/perf/builtin-record.c                   |  20 +-
 tools/perf/util/Build                         |   4 +
 tools/perf/util/auxtrace.h                    |  43 +++
 tools/perf/util/bpf_auxtrace_pause.c          | 408 ++++++++++++++++++++++++++
 tools/perf/util/bpf_skel/auxtrace_pause.bpf.c | 156 ++++++++++
 tools/perf/util/evsel.c                       |   6 +
 tools/perf/util/record.h                      |   1 +
 12 files changed, 746 insertions(+), 2 deletions(-)
---
base-commit: 95993dc3039e29dabb9a50d074145d4cb757b08b
change-id: 20250717-perf_aux_pause_resume_bpf_rebase-174c79b0bab5

Best regards,
-- 
Leo Yan <leo.yan@arm.com>
On 25/07/2025 12:59, Leo Yan wrote:
> This series extends Perf for fine-grained tracing by using BPF program
> to pause and resume AUX tracing. The BPF program can be attached to
> tracepoints (including ftrace tracepoints and dynamic tracepoints, like
> kprobe, kretprobe, uprobe and uretprobe).

Using eBPF to pause/resume AUX tracing seems like a great idea.

AFAICT with this patch set, there is just support for pause/resume
much like what could be done directly without eBPF, so I wonder if you
could share a bit more on how you see this evolving, and what your
future plans are?
Hi Adrian,

On Mon, Jul 28, 2025 at 08:02:51PM +0300, Adrian Hunter wrote:
> On 25/07/2025 12:59, Leo Yan wrote:
> > This series extends Perf for fine-grained tracing by using BPF program
> > to pause and resume AUX tracing. The BPF program can be attached to
> > tracepoints (including ftrace tracepoints and dynamic tracepoints, like
> > kprobe, kretprobe, uprobe and uretprobe).
>
> Using eBPF to pause/resume AUX tracing seems like a great idea.
>
> AFAICT with this patch set, there is just support for pause/resume
> much like what could be done directly without eBPF, so I wonder if you
> could share a bit more on how you see this evolving, and what your
> future plans are?

IIUC, you mean the tool can use `perf probe` to first create probes,
then enable the tracepoints as PMU events for AUX pause and resume.

I would say one benefit of this series is that users can use a single
command to create probes and bind the eBPF program for AUX pause and
resume in one go.

To be honest, at the current stage I don't have a clear idea for
expanding this feature. But one requirement is clear: AUX trace data is
usually huge, so after an initial analysis, developers might want to
focus on profiling specific functions (based on function entry and exit)
or a specific period (e.g., start tracing when one tracepoint is hit and
stop when another tracepoint is hit).

eBPF programs are powerful. Basically, we can extend this in two
directions. One is to attach the eBPF program to more kernel subsystems,
such as networking and storage. The other is to improve the eBPF program
itself as a filter for finer-grained tracing: so far we only support
limited filtering based on CPU ID or PID, but we could also filter based
on time, event types, etc.

Thanks,
Leo
On 30/07/2025 21:26, Leo Yan wrote:
> Hi Adrian,
>
> On Mon, Jul 28, 2025 at 08:02:51PM +0300, Adrian Hunter wrote:
>> On 25/07/2025 12:59, Leo Yan wrote:
>>> This series extends Perf for fine-grained tracing by using BPF program
>>> to pause and resume AUX tracing. The BPF program can be attached to
>>> tracepoints (including ftrace tracepoints and dynamic tracepoints, like
>>> kprobe, kretprobe, uprobe and uretprobe).
>>
>> Using eBPF to pause/resume AUX tracing seems like a great idea.
>>
>> AFAICT with this patch set, there is just support for pause/resume
>> much like what could be done directly without eBPF, so I wonder if you
>> could share a bit more on how you see this evolving, and what your
>> future plans are?
>
> IIUC, here you mean the tool can use `perf probe` to firstly create
> probes, then enable tracepoints as PMU event for AUX pause and resume.

Yes, like:

$ sudo perf probe 'do_sys_openat2 how->flags how->mode'
Added new event:
  probe:do_sys_openat2 (on do_sys_openat2 with flags=how->flags mode=how->mode)

You can now use it in all perf tools, such as:

        perf record -e probe:do_sys_openat2 -aR sleep 1

$ sudo perf probe do_sys_openat2%return
Added new event:
  probe:do_sys_openat2__return (on do_sys_openat2%return)

You can now use it in all perf tools, such as:

        perf record -e probe:do_sys_openat2__return -aR sleep 1

$ sudo perf record --kcore -e intel_pt/aux-action=start-paused/k -e probe:do_sys_openat2/aux-action=resume/ --filter='flags==0x98800' -e probe:do_sys_openat2__return/aux-action=pause/ -- ls
arch certs CREDITS cscope.out drivers fs include io_uring Kbuild kernel LICENSES Makefile mm perf.data README samples security tools virt
block COPYING crypto Documentation init ipc Kconfig lib MAINTAINERS net rust scripts sound usr
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.067 MB perf.data ]
$ sudo perf script --itrace=qi | grep -B1 instructions
ls 37607 [003] 36109.137560: probe:do_sys_openat2: (ffffffff9d2276a0) flags=0x98800 mode=0x0
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9cdc3834 native_write_msr+0x4 ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9cdc3836 native_write_msr+0x6 ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9cd26728 pt_config_start+0x58 ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9cd27727 pt_event_start+0x107 ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9d0d5a04 perf_event_aux_pause+0x114 ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9d0d80f7 __perf_event_overflow+0x197 ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9d0d844d perf_swevent_event+0x12d ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9d0d8738 perf_tp_event+0x188 ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9d00fad6 kprobe_perf_func+0x256 ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9d00fbbd kprobe_dispatcher+0x6d ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9cf80582 aggr_pre_handler+0x42 ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9cdbcbb2 kprobe_ftrace_handler+0x152 ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffffc12440f5 ftrace_trampoline+0xf5 ([kernel.kallsyms])
ls 37607 [003] 36109.137562: 1 instructions:k: ffffffff9d2276a5 do_sys_openat2+0x5 ([kernel.kallsyms])
ls 37607 [003] 36109.137563: 1 instructions:k: ffffffff9d4c3d60 hook_file_alloc_security+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137564: 1 instructions:k: ffffffff9d4a5050 apparmor_file_alloc_security+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137565: 1 instructions:k: ffffffff9d42d400 cap_capable+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137565: 1 instructions:k: ffffffff9d4a4b70 apparmor_capable+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137566: 1 instructions:k: ffffffff9d42d400 cap_capable+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137566: 1 instructions:k: ffffffff9d4a4b70 apparmor_capable+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137567: 1 instructions:k: ffffffff9d4c4e80 hook_file_open+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137567: 1 instructions:k: ffffffff9d4a5aa0 apparmor_file_open+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137567: 1 instructions:k: ffffffff9d31fb10 ext4_dir_open+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137567: 1 instructions:k: ffffffff9d4cc740 ima_file_check+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137567: 1 instructions:k: ffffffff9d4a5960 apparmor_current_getlsmprop_subj+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137568: 1 instructions:k: ffffffff9cdb76c0 arch_rethook_trampoline+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137568: 1 instructions:k: ffffffff9cf80670 kretprobe_rethook_handler+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137568: 1 instructions:k: ffffffff9d00fe90 kretprobe_dispatcher+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137568: 1 instructions:k: ffffffff9cd282c0 pt_event_stop+0x0 ([kernel.kallsyms])
ls 37607 [003] 36109.137569: 1 instructions:k: ffffffff9cdc3834 native_write_msr+0x4 ([kernel.kallsyms])
On Tue, Aug 05, 2025 at 10:16:29PM +0300, Adrian Hunter wrote:
> On 30/07/2025 21:26, Leo Yan wrote:
[...]
> > IIUC, here you mean the tool can use `perf probe` to firstly create
> > probes, then enable tracepoints as PMU event for AUX pause and resume.
>
> Yes, like:
>
> $ sudo perf probe 'do_sys_openat2 how->flags how->mode'
> Added new event:
>   probe:do_sys_openat2 (on do_sys_openat2 with flags=how->flags mode=how->mode)
>
> You can now use it in all perf tools, such as:
>
>         perf record -e probe:do_sys_openat2 -aR sleep 1
>
> $ sudo perf probe do_sys_openat2%return
> Added new event:
>   probe:do_sys_openat2__return (on do_sys_openat2%return)
>
> You can now use it in all perf tools, such as:
>
>         perf record -e probe:do_sys_openat2__return -aR sleep 1
>
> $ sudo perf record --kcore -e intel_pt/aux-action=start-paused/k -e probe:do_sys_openat2/aux-action=resume/ --filter='flags==0x98800' -e probe:do_sys_openat2__return/aux-action=pause/ -- ls

Thanks a lot for sharing the commands. I was able to replicate them
using CoreSight.

Given that we can achieve the same result without using BPF, I am not
sure how useful this series is. It may give us a base for exploring
profiling that combines AUX trace and BPF, but I am fine with holding
on until we have clear requirements for it.

I would like to get suggestions from you and the maintainers before
proceeding further.

Thanks,
Leo
On Fri, Aug 08, 2025 at 12:47:34PM +0100, Leo Yan wrote:
> On Tue, Aug 05, 2025 at 10:16:29PM +0300, Adrian Hunter wrote:
[...]
> > $ sudo perf record --kcore -e intel_pt/aux-action=start-paused/k -e probe:do_sys_openat2/aux-action=resume/ --filter='flags==0x98800' -e probe:do_sys_openat2__return/aux-action=pause/ -- ls
> Thanks a lot for sharing the commands. I was able to replicate them
> using CoreSight.
> Given that we can achieve the same result without using BPF, I am not
> sure how useful this series is. It may give us a base for exploring
> profiling that combines AUX trace and BPF, but I am fine with holding
> on until we have clear requirements for it.
> I would get suggestion from you and maintainers before proceeding
> further.

Maybe retrofit this for starting/stopping profiling of non-HW-tracing
sections? We now have:

⬢ [acme@toolbx perf-tools-next]$ perf record -h switch

 Usage: perf record [<options>] [<command>]
    or: perf record [<options>] -- <command> [<options>]

        --switch-events   Record context switch events
        --switch-max-files <n>
                          Limit number of switch output generated files
        --switch-output[=<signal or size[BKMG] or time[smhd]>]
                          Switch output when receiving SIGUSR2 (signal) or cross a size or time threshold
        --switch-output-event <switch output event>
                          switch output event selector. use 'perf list' to list available events

⬢ [acme@toolbx perf-tools-next]$

That will dump a snapshot when some event takes place, but that is done
with a sideband thread. From 'man perf-record':

--switch-output-event::
Events that will cause the switch of the perf.data file, auto-selecting
--switch-output=signal, the results are similar as internally the side band
thread will also send a SIGUSR2 to the main one.

Uses the same syntax as --event, it will just not be recorded, serving only to
switch the perf.data file as soon as the --switch-output event is processed by
a separate sideband thread.

This sideband thread is also used to other purposes, like processing the
PERF_RECORD_BPF_EVENT records as they happen, asking the kernel for extra BPF
information, etc.

----------------------

And in perf-report we have:

----
--switch-on EVENT_NAME::
        Only consider events after this event is found.

        This may be interesting to measure a workload only after some initialization
        phase is over, i.e. insert a perf probe at that point and then using this
        option with that probe.

--switch-off EVENT_NAME::
        Stop considering events after this event is found.

--show-on-off-events::
        Show the --switch-on/off events too. This has no effect in 'perf report' now
        but probably we'll make the default not to show the switch-on/off events
        on the --group mode and if there is only one event besides the off/on ones,
        go straight to the histogram browser, just like 'perf report' with no events
        explicitly specified does.
----

If we had it in 'perf record' then we would have reduced perf.data files.

I.e. we would have something like:

  -e '{cycles,instructions}/action=start-paused/k' -e probe:do_sys_openat2/action=resume/ --filter='flags==0x98800' -e probe:do_sys_openat2__return/action=pause/

- Arnaldo
Hi all,

On Fri, Jul 25, 2025 at 10:59:09AM +0100, Leo Yan wrote:
> This series extends Perf for fine-grained tracing by using BPF program
> to pause and resume AUX tracing. The BPF program can be attached to
> tracepoints (including ftrace tracepoints and dynamic tracepoints, like
> kprobe, kretprobe, uprobe and uretprobe).

Due to stale prefixes in my local b4 configuration, the version number
in this series is a mess. I have resent v3 [1]; please kindly review the
resent version instead. Sorry for the inconvenience.

Leo

[1] https://lore.kernel.org/linux-perf-users/20250725-perf_aux_pause_resume_bpf_rebase-v3-0-ae21deb49d1a@arm.com/T/#mb884408efc23dc7013b925e65fe4eeb051b97c7f