On linux-next
commit b4c658d4d63d61 ("perf target: Remove uid from target")
introduces a regression on s390. In fact the regression exists
on all platforms when the event supports auxiliary data gathering.
The command
# ./perf record -u 0 -aB --synth=no -- ./perf test -w thloop
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.011 MB perf.data ]
# ./perf report --stats | grep SAMPLE
#
does not generate samples in the perf.data file.
On x86 the command
# sudo perf record -e intel_pt// -u 0 ls
is broken too.
Looking at the sequence of calls in 'perf record' reveals this
behavior:
1. The event 'cycles' is created and enabled:
   record__open()
   +-> evlist__apply_filters()
       +-> perf_bpf_filter__prepare()
           +-> bpf_program__attach_perf_event()
               +-> bpf_program__attach_perf_event_opts()
                   +-> __GI___ioctl(..., PERF_EVENT_IOC_ENABLE, ...)
The event 'cycles' is enabled and active now. However the event's
ring-buffer to store the samples generated by hardware is not
allocated yet. This happens now after enabling the event:
2. The event's fd is mmap()ed to create the ring buffer:
   record__open()
   +-> record__mmap()
       +-> record__mmap_evlist()
           +-> evlist__mmap_ex()
               +-> perf_evlist__mmap_ops()
                   +-> mmap_per_cpu()
                       +-> mmap_per_evsel()
                           +-> mmap__mmap()
                               +-> perf_mmap__mmap()
                                   +-> mmap()
This allocates the ring-buffer for the event 'cycles'. With mmap()
the kernel creates the ring buffer:
   perf_mmap(): kernel function to create the event's ring
   |            buffer to save the sampled data.
   |
   +-> ring_buffer_attach(): Allocates memory for ring buffer.
   |        The PMU has an auxiliary data setup function. The
   |        has_aux(event) condition is true and the PMU's
   |        stop() is called to stop sampling. It is not
   |        restarted:
   |            if (has_aux(event))
   |                    perf_event_stop(event, 0);
   |
   +-> cpumsf_pmu_stop():
Hardware sampling is stopped. No samples are generated and saved
anymore.
3. After the event 'cycles' has been mapped, the event is enabled a
second time in:
   __cmd_record()
   +-> evlist__enable()
       +-> __evlist__enable()
           +-> evsel__enable_cpu()
               +-> perf_evsel__enable_cpu()
                   +-> perf_evsel__run_ioctl()
                       +-> perf_evsel__ioctl()
                           +-> __GI___ioctl(., PERF_EVENT_IOC_ENABLE, .)
The second
ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
is just a NOP in this case. The first invocation in (1.) sets the
event::state to PERF_EVENT_STATE_ACTIVE. The kernel functions
   perf_ioctl()
   +-> _perf_ioctl()
       +-> _perf_event_enable()
           +-> __perf_event_enable() returns immediately because
               event::state is already set to
               PERF_EVENT_STATE_ACTIVE.
This happens on s390 because the event 'cycles' offers the possibility
to save auxiliary data. The PMU callbacks setup_aux() and
free_aux() are defined. If these callbacks were not defined,
cpumsf_pmu_stop() would not be invoked and sampling would continue.
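
The interaction of steps (1.) to (3.) can be illustrated with a small
user-space model. This is only a sketch of the behavior described above,
not kernel code: every toy_* name is hypothetical, and the model reduces
the event to the few fields that matter for this bug.

```c
#include <stdbool.h>

/* Toy model only. All toy_* names are hypothetical, not kernel symbols. */
enum toy_state { TOY_OFF, TOY_ACTIVE };

struct toy_event {
	enum toy_state state;	/* stands in for event::state */
	bool has_aux;		/* PMU defines setup_aux()/free_aux() */
	bool rb_mapped;		/* ring buffer has been mmap()ed */
	bool hw_sampling;	/* hardware is currently sampling */
};

/* Models ioctl(fd, PERF_EVENT_IOC_ENABLE, 0). */
void toy_enable(struct toy_event *ev)
{
	/* Already active: return immediately, sampling is not restarted. */
	if (ev->state == TOY_ACTIVE)
		return;
	ev->state = TOY_ACTIVE;
	ev->hw_sampling = true;
}

/* Models mmap() of the event fd, which attaches the ring buffer. */
void toy_mmap(struct toy_event *ev)
{
	ev->rb_mapped = true;
	/* With auxiliary data support the PMU is stopped, not restarted. */
	if (ev->has_aux)
		ev->hw_sampling = false;
}

/*
 * Runs steps (1.) to (3.) and reports whether the hardware is still
 * sampling afterwards. enable_on_attach selects whether the attach
 * path in step (1.) enables the event.
 */
bool toy_run(bool has_aux, bool enable_on_attach)
{
	struct toy_event ev = { .state = TOY_OFF, .has_aux = has_aux };

	if (enable_on_attach)
		toy_enable(&ev);	/* step (1.) */
	toy_mmap(&ev);			/* step (2.) */
	toy_enable(&ev);		/* step (3.) */
	return ev.hw_sampling;
}
```

In this model toy_run(true, true) leaves sampling stopped, while
toy_run(true, false) and both has_aux == false variants leave it running.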
To remedy this, remove the first invocation of
ioctl(..., PERF_EVENT_IOC_ENABLE, ...)
in step (1.). Create the event in step (1.) and enable it in step (3.)
after the ring buffer has been mapped.
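
With the enable call gone from the attach path, a caller that wants the
event running has to issue the ioctl itself once the ring buffer is
mapped. A minimal sketch of such a helper follows; the name
enable_perf_event_fd() is hypothetical, and the error convention
(0 or -errno) simply mirrors the removed hunk.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper, not part of libbpf or perf: enable the perf
 * event behind pfd. Returns 0 on success or a negative errno value.
 */
int enable_perf_event_fd(int pfd)
{
	if (ioctl(pfd, PERF_EVENT_IOC_ENABLE, 0) < 0)
		return -errno;
	return 0;
}
```

A caller would invoke this after mmap() on the event fd has succeeded,
which corresponds to step (3.).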
Output after:
# ./perf record -aB --synth=no -u 0 -- ./perf test -w thloop 2
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 0.876 MB perf.data ]
# ./perf report --stats | grep SAMPLE
SAMPLE events: 16200 (99.5%)
SAMPLE events: 16200
#
The software event succeeded before and after the patch:
# ./perf record -e cpu-clock -aB --synth=no -u 0 -- ./perf test -w thloop 2
[ perf record: Woken up 7 times to write data ]
[ perf record: Captured and wrote 2.870 MB perf.data ]
# ./perf report --stats | grep SAMPLE
SAMPLE events: 53506 (99.8%)
SAMPLE events: 53506
#
Fixes: 63f2f5ee856ba ("libbpf: add ability to attach/detach BPF program to perf event")
To: Andrii Nakryiko <andriin@fb.com>
To: Ian Rogers <irogers@google.com>
To: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
tools/lib/bpf/libbpf.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 162ebd16a59f..5973412a1031 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -10960,12 +10960,6 @@ struct bpf_link *bpf_program__attach_perf_event_opts(const struct bpf_program *p
 		}
 		link->link.fd = pfd;
 	}
-	if (ioctl(pfd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
-		err = -errno;
-		pr_warn("prog '%s': failed to enable perf_event FD %d: %s\n",
-			prog->name, pfd, errstr(err));
-		goto err_out;
-	}
 
 	return &link->link;
 err_out:
--
2.50.0

On Fri, Jul 25, 2025 at 11:34:05AM +0200, Thomas Richter wrote:
> On linux-next
> commit b4c658d4d63d61 ("perf target: Remove uid from target")
> introduces a regression on s390. In fact the regression exists
> on all platforms when the event supports auxiliary data gathering.

[...]

> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 162ebd16a59f..5973412a1031 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -10960,12 +10960,6 @@ struct bpf_link *bpf_program__attach_perf_event_opts(const struct bpf_program *p
>  		}
>  		link->link.fd = pfd;
>  	}
> -	if (ioctl(pfd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
> -		err = -errno;
> -		pr_warn("prog '%s': failed to enable perf_event FD %d: %s\n",
> -			prog->name, pfd, errstr(err));
> -		goto err_out;
> -	}

I think this might break existing users depending on this

could we instead add some 'enable' flag to bpf_perf_event_opts and perf
would use bpf_program__attach_perf_event_opts function instead?

jirka

On 7/25/25 16:06, Jiri Olsa wrote:
> On Fri, Jul 25, 2025 at 11:34:05AM +0200, Thomas Richter wrote:
>> On linux-next
>> commit b4c658d4d63d61 ("perf target: Remove uid from target")
>> introduces a regression on s390. In fact the regression exists
>> on all platforms when the event supports auxiliary data gathering.
>>
>> Command
>> # ./perf record -u 0 -aB --synth=no -- ./perf test -w thloop
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.011 MB perf.data ]
>> # ./perf report --stats | grep SAMPLE
>> #
....
>> - goto err_out;
>> - }
>
> I think this might break existing users depending on this
>
> could we instead add some 'enable' flag to bpf_perf_event_opts and perf
> would use bpf_program__attach_perf_event_opts function instead?
>
> jirka
>

Hi Jiri, Ilya and Ian,

Jiri recommended a more flexible approach and I submitted version 2:
https://lore.kernel.org/all/20250728144340.711196-1-tmricht@linux.ibm.com

This version has now also passed the eBPF CI test suite.
I am not familiar with eBPF and do not have enough knowledge to dig
deeper into it.

Please note that this is not an s390-specific issue. It happens on other
platforms when the event supports auxiliary data gathering, i.e. when
has_aux() returns true.

I would like to ask someone with more eBPF knowledge for some help to
drive this further.

Thanks a lot.
--
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Wolfgang Wendt
Management: David Faller
Registered office: Böblingen / Registration court: Amtsgericht Stuttgart, HRB 243294