[PATCH v3 2/2] perf bpf-filter: Enable events manually

Ilya Leoshkevich posted 2 patches 2 months ago
[PATCH v3 2/2] perf bpf-filter: Enable events manually
Posted by Ilya Leoshkevich 2 months ago
On linux-next
commit b4c658d4d63d61 ("perf target: Remove uid from target")
introduces a regression on s390. In fact the regression exists
on all platforms when the event supports auxiliary data gathering.

Command

   # ./perf record -u 0 -aB --synth=no -- ./perf test -w thloop
   [ perf record: Woken up 1 times to write data ]
   [ perf record: Captured and wrote 0.011 MB perf.data ]
   # ./perf report --stats | grep SAMPLE
   #

does not generate samples in the perf.data file.
On x86 command

  # sudo perf record -e intel_pt// -u 0 ls

is broken too.

Looking at the sequence of calls in 'perf record' reveals this
behavior:

1. The event 'cycles' is created and enabled:

   record__open()
   +-> evlist__apply_filters()
       +-> perf_bpf_filter__prepare()
	   +-> bpf_program.attach_perf_event()
	       +-> bpf_program.attach_perf_event_opts()
	           +-> __GI___ioctl(..., PERF_EVENT_IOC_ENABLE, ...)

   The event 'cycles' is enabled and active now. However the event's
   ring-buffer to store the samples generated by hardware is not
   allocated yet.

2. The event's fd is mmap()ed to create the ring buffer:

   record__open()
   +-> record__mmap()
       +-> record__mmap_evlist()
	   +-> evlist__mmap_ex()
	       +-> perf_evlist__mmap_ops()
	           +-> mmap_per_cpu()
	               +-> mmap_per_evsel()
	                   +-> mmap__mmap()
	                       +-> perf_mmap__mmap()
	                           +-> mmap()

   This allocates the ring buffer for the event 'cycles'. With mmap()
   the kernel creates the ring buffer:

   perf_mmap(): kernel function to create the event's ring
   |            buffer to save the sampled data.
   |
   +-> ring_buffer_attach(): Allocates memory for ring buffer.
       |        The PMU has auxiliary data setup function. The
       |        has_aux(event) condition is true and the PMU's
       |        stop() is called to stop sampling. It is not
       |        restarted:
       |
       |        if (has_aux(event))
       |                perf_event_stop(event, 0);
       |
       +-> cpumsf_pmu_stop():

   Hardware sampling is stopped. No samples are generated and saved
   anymore.

3. After the event 'cycles' has been mapped, the event is enabled a
   second time in:

   __cmd_record()
   +-> evlist__enable()
       +-> __evlist__enable()
	   +-> evsel__enable_cpu()
	       +-> perf_evsel__enable_cpu()
	           +-> perf_evsel__run_ioctl()
	               +-> perf_evsel__ioctl()
	                   +-> __GI___ioctl(., PERF_EVENT_IOC_ENABLE, .)

   The second

      ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

   is just a NOP in this case. The first invocation in (1.) sets the
   event::state to PERF_EVENT_STATE_ACTIVE. The kernel functions

   perf_ioctl()
   +-> _perf_ioctl()
       +-> _perf_event_enable()
           +-> __perf_event_enable()

   return immediately because event::state is already set to
   PERF_EVENT_STATE_ACTIVE.

This happens on s390, because the event 'cycles' offers the possibility
to save auxilary data. The PMU callbacks setup_aux() and free_aux() are
defined. Without both callback functions, cpumsf_pmu_stop() is not
invoked and sampling continues.

To remedy this, remove the first invocation of

   ioctl(..., PERF_EVENT_IOC_ENABLE, ...).

in step (1.) Create the event in step (1.) and enable it in step (3.)
after the ring buffer has been mapped.

Output after:

 # ./perf record -aB --synth=no -u 0 -- ./perf test -w thloop 2
 [ perf record: Woken up 3 times to write data ]
 [ perf record: Captured and wrote 0.876 MB perf.data ]
 # ./perf  report --stats | grep SAMPLE
              SAMPLE events:      16200  (99.5%)
              SAMPLE events:      16200
 #

The software event succeeded both before and after the patch:

 # ./perf record -e cpu-clock -aB --synth=no -u 0 -- \
					  ./perf test -w thloop 2
 [ perf record: Woken up 7 times to write data ]
 [ perf record: Captured and wrote 2.870 MB perf.data ]
 # ./perf  report --stats | grep SAMPLE
              SAMPLE events:      53506  (99.8%)
              SAMPLE events:      53506
 #

Fixes: 63f2f5ee856ba ("libbpf: add ability to attach/detach BPF program to perf event")
Suggested-by: Jiri Olsa <jolsa@kernel.org>
Co-developed-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 tools/perf/util/bpf-filter.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/bpf-filter.c b/tools/perf/util/bpf-filter.c
index d0e013eeb0f7..d480ccaf3662 100644
--- a/tools/perf/util/bpf-filter.c
+++ b/tools/perf/util/bpf-filter.c
@@ -451,6 +451,8 @@ int perf_bpf_filter__prepare(struct evsel *evsel, struct target *target)
 	struct bpf_link *link;
 	struct perf_bpf_filter_entry *entry;
 	bool needs_idx_hash = !target__has_cpu(target);
+	DECLARE_LIBBPF_OPTS(bpf_perf_event_opts, pe_opts,
+			    .no_ioctl_enable = true);
 
 	entry = calloc(MAX_FILTERS, sizeof(*entry));
 	if (entry == NULL)
@@ -522,7 +524,8 @@ int perf_bpf_filter__prepare(struct evsel *evsel, struct target *target)
 	prog = skel->progs.perf_sample_filter;
 	for (x = 0; x < xyarray__max_x(evsel->core.fd); x++) {
 		for (y = 0; y < xyarray__max_y(evsel->core.fd); y++) {
-			link = bpf_program__attach_perf_event(prog, FD(evsel, x, y));
+			link = bpf_program__attach_perf_event_opts(prog, FD(evsel, x, y),
+								   &pe_opts);
 			if (IS_ERR(link)) {
 				pr_err("Failed to attach perf sample-filter program\n");
 				ret = PTR_ERR(link);
-- 
2.50.1
Re: [PATCH v3 2/2] perf bpf-filter: Enable events manually
Posted by Alexander Gordeev 2 months ago
On Tue, Aug 05, 2025 at 02:54:05PM +0200, Ilya Leoshkevich wrote:

Hi Thomas,

The below comments date to the initial version, so the question is
rather to you:

> On linux-next

This line is extra.

> commit b4c658d4d63d61 ("perf target: Remove uid from target")
> introduces a regression on s390. In fact the regression exists
> on all platforms when the event supports auxiliary data gathering.

So which commit it actually fixes: the above, the below or the both?

> Fixes: 63f2f5ee856ba ("libbpf: add ability to attach/detach BPF program to perf event")

Thanks!
Re: [PATCH v3 2/2] perf bpf-filter: Enable events manually
Posted by Thomas Richter 2 months ago
On 8/5/25 16:14, Alexander Gordeev wrote:
> On Tue, Aug 05, 2025 at 02:54:05PM +0200, Ilya Leoshkevich wrote:
> 
> Hi Thomas,
> 
> The below comments date to the initial version, so the question is
> rather to you:
> 
>> On linux-next
> 
> This line is extra.

I just wanted to let readers know which repo to look at.

> 
>> commit b4c658d4d63d61 ("perf target: Remove uid from target")
>> introduces a regression on s390. In fact the regression exists
>> on all platforms when the event supports auxiliary data gathering.
> 
> So which commit it actually fixes: the above, the below or the both?
> 
>> Fixes: 63f2f5ee856ba ("libbpf: add ability to attach/detach BPF program to perf event")
> 
> Thanks!
> 

Good question!  Pick what you like... :-)

The issue in question originates from a patch set of 10 patches.
The patch set rebuilds event sample with filtering and migrates
from perf tool's selective process picking to more generic eBPF
filtering using eBPF programs hooked to perf events.

To be precise, the issue Ilya's  patch fixes is this:
Fixes: 63f2f5ee856ba ("libbpf: add ability to attach/detach BPF program to perf event")

However the issue (perf failure) does *NOT* show up until this patch is applied:
commit b4c658d4d63d61 ("perf target: Remove uid from target")

There are some patches in between the two (when you look at the complete patch set),
but they do not affect the result.

Hope that helps.
-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH

Vorsitzender des Aufsichtsrats: Wolfgang Wendt

Geschäftsführung: David Faller

Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
Re: [PATCH v3 2/2] perf bpf-filter: Enable events manually
Posted by Ilya Leoshkevich 1 month, 4 weeks ago
On Wed, 2025-08-06 at 11:29 +0200, Thomas Richter wrote:
> On 8/5/25 16:14, Alexander Gordeev wrote:
> > On Tue, Aug 05, 2025 at 02:54:05PM +0200, Ilya Leoshkevich wrote:
> > 
> > Hi Thomas,
> > 
> > The below comments date to the initial version, so the question is
> > rather to you:
> > 
> > > On linux-next
> > 
> > This line is extra.
> 
> I just wanted to let readers know which repo to look at.
> 
> > 
> > > commit b4c658d4d63d61 ("perf target: Remove uid from target")
> > > introduces a regression on s390. In fact the regression exists
> > > on all platforms when the event supports auxiliary data
> > > gathering.
> > 
> > So which commit it actually fixes: the above, the below or the
> > both?
> > 
> > > Fixes: 63f2f5ee856ba ("libbpf: add ability to attach/detach BPF
> > > program to perf event")
> > 
> > Thanks!
> > 
> 
> Good question!  Pick what you like... :-)
> 
> The issue in question originates from a patch set of 10 patches.
> The patch set rebuilds event sample with filtering and migrates
> from perf tool's selective process picking to more generic eBPF
> filtering using eBPF programs hooked to perf events.
> 
> To be precise, the issue Ilya's  patch fixes is this:
> Fixes: 63f2f5ee856ba ("libbpf: add ability to attach/detach BPF
> program to perf event")
> 
> However the issue (perf failure) does *NOT* show up until this patch
> is applied:
> commit b4c658d4d63d61 ("perf target: Remove uid from target")

I think I will switch the Fixes: tag to b4c658d4d63d61 then, because
IIUC it is one of the factors that drives backporting decisions, and
it does not make too much sense to backport it to earlier kernels.

> There are some patches in between the two (when you look at the
> complete patch set),
> but they do not affect the result.
> 
> Hope that helps.