[PATCH 0/1] Support PERF_SAMPLE_READ with inherit_stat

Ben Gainey posted 1 patch 1 year, 11 months ago
kernel/events/core.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
[PATCH 0/1] Support PERF_SAMPLE_READ with inherit_stat
Posted by Ben Gainey 1 year, 11 months ago
This change allows events to use PERF_SAMPLE READ with inherit so long 
as both inherit_stat and PERF_SAMPLE_TID are set.

Currently it is not possible to use PERF_SAMPLE_READ with inherit. This 
restriction assumes the user is interested in collecting aggregate 
statistics as per `perf stat`. It prevents a user from collecting 
per-thread samples using counter groups from a multi-threaded or 
multi-process application, as with `perf record -e '{....}:S'`. Instead 
users must use system-wide mode, or forgo the ability to sample counter 
groups. System-wide mode is often problematic as it requires specific 
permissions (no CAP_PERFMON / root access), or may lead to capture of 
significant amounts of extra data from other processes running on the 
system. 

Perf already supports the ability to collect per-thread counts with 
`inherit` via the `inherit_stat` flag. This patch changes 
`perf_event_alloc` relaxing the restriction to combine `inherit` with 
`PERF_SAMPLE_READ` so that the combination will be allowed so long as 
`inherit_stat` and `PERF_SAMPLE_TID` are enabled.

In this configuration stream ids (such as may appear in the read_format 
field of a PERF_RECORD_SAMPLE) are no longer globally unique, rather 
the pair of (stream id, tid) uniquely identify each event. Tools that 
rely on this, for example to calculate a delta between samples, would 
need updating to take this into account. Previously valid event 
configurations (system-wide, no-inherit and so on) where each stream id 
is the identifier are unaffected.

This patch has been tested on aarch64 both my manual inspection of the 
output of `perf script -D` and through a modified version of Arm's 
commercial profiling tools and the numbers appear to line up as one 
would expect, but some further validation across other architectures 
and/or edge cases would be welcome.

This patch was developed and tested on top of v6.7.


Ben Gainey (1):
  perf: Support PERF_SAMPLE_READ with inherit_stat

 kernel/events/core.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

-- 
2.43.0
Re: [PATCH 0/1] Support PERF_SAMPLE_READ with inherit_stat
Posted by Namhyung Kim 1 year, 11 months ago
Hello,

On Fri, Jan 19, 2024 at 8:39 AM Ben Gainey <ben.gainey@arm.com> wrote:
>
> This change allows events to use PERF_SAMPLE READ with inherit so long
> as both inherit_stat and PERF_SAMPLE_TID are set.
>
> Currently it is not possible to use PERF_SAMPLE_READ with inherit. This
> restriction assumes the user is interested in collecting aggregate
> statistics as per `perf stat`. It prevents a user from collecting
> per-thread samples using counter groups from a multi-threaded or
> multi-process application, as with `perf record -e '{....}:S'`. Instead
> users must use system-wide mode, or forgo the ability to sample counter
> groups. System-wide mode is often problematic as it requires specific
> permissions (no CAP_PERFMON / root access), or may lead to capture of
> significant amounts of extra data from other processes running on the
> system.
>
> Perf already supports the ability to collect per-thread counts with
> `inherit` via the `inherit_stat` flag. This patch changes
> `perf_event_alloc` relaxing the restriction to combine `inherit` with
> `PERF_SAMPLE_READ` so that the combination will be allowed so long as
> `inherit_stat` and `PERF_SAMPLE_TID` are enabled.

I'm not sure if it's correct.  Maybe I misunderstand inherit_stat but
AFAIK it's just to use prev_task's events when next_task has the
compatible event context.  So the event values it sees in samples
would depend on the timing or scheduler behavior.

Also event counts and time values PERF_SAMPLE_READ sees
include child event's so the values of the parent event can be
updated even if it's inactive.  And the values will vary for the
next_task whether prev_task is the parent or not.  I think it
would return consistent values only if it iterates all child events
and sums up the values like it does for read(2).  But it cannot
do that in the NMI handler.

Frankly I don't understand how inherit_stat supports per-thread
counts properly.  Also it doesn't seem to be used by default in
the perf tools.  IIUC per-thread count is supported when you
don't set the inherit bit and open separate events for each
thread but I guess that's not what you want.

Anyway, I'm ok with the idea of using PERF_SAMPLE_READ to
improve per-thread profiling especially with event groups.
But I think it should not use inherit_stat and it needs a way to
not include child stats in the samples.

What do you think?

Thanks,
Namhyung

>
> In this configuration stream ids (such as may appear in the read_format
> field of a PERF_RECORD_SAMPLE) are no longer globally unique, rather
> the pair of (stream id, tid) uniquely identify each event. Tools that
> rely on this, for example to calculate a delta between samples, would
> need updating to take this into account. Previously valid event
> configurations (system-wide, no-inherit and so on) where each stream id
> is the identifier are unaffected.
>
> This patch has been tested on aarch64 both my manual inspection of the
> output of `perf script -D` and through a modified version of Arm's
> commercial profiling tools and the numbers appear to line up as one
> would expect, but some further validation across other architectures
> and/or edge cases would be welcome.
>
> This patch was developed and tested on top of v6.7.
>
>
> Ben Gainey (1):
>   perf: Support PERF_SAMPLE_READ with inherit_stat
>
>  kernel/events/core.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> --
> 2.43.0
>
Re: [PATCH 0/1] Support PERF_SAMPLE_READ with inherit_stat
Posted by Andi Kleen 1 year, 11 months ago
Ben Gainey <ben.gainey@arm.com> writes:

> In this configuration stream ids (such as may appear in the read_format 
> field of a PERF_RECORD_SAMPLE) are no longer globally unique, rather 
> the pair of (stream id, tid) uniquely identify each event. Tools that 
> rely on this, for example to calculate a delta between samples, would 
> need updating to take this into account. Previously valid event 
> configurations (system-wide, no-inherit and so on) where each stream id 
> is the identifier are unaffected.

So is this an ABI break? It might need an optin, if it breaks anything,
which wouldn't surprise me. We do have a lot of different perf stream
parsers around these days and we cannot break them.

-Andi
Re: [PATCH 0/1] Support PERF_SAMPLE_READ with inherit_stat
Posted by Ben Gainey 1 year, 11 months ago
On Fri, 2024-01-19 at 09:45 -0800, Andi Kleen wrote:
> Ben Gainey <ben.gainey@arm.com> writes:
>
> > In this configuration stream ids (such as may appear in the
> > read_format
> > field of a PERF_RECORD_SAMPLE) are no longer globally unique,
> > rather
> > the pair of (stream id, tid) uniquely identify each event. Tools
> > that
> > rely on this, for example to calculate a delta between samples,
> > would
> > need updating to take this into account. Previously valid event
> > configurations (system-wide, no-inherit and so on) where each
> > stream id
> > is the identifier are unaffected.
>
> So is this an ABI break? It might need an optin, if it breaks
> anything,
> which wouldn't surprise me. We do have a lot of different perf stream
> parsers around these days and we cannot break them.
>
> -Andi

I had considered that, but given currently this perf_event_attr
configuration is not allowed, I assumed that it would require existing
tools to add support which would in effect be an opt-in. Of course,
adding a new flag to be explicit would be trivial enough if required.

That said, the binary format for the mmap records / read() etc does not
change so using an unmodified tool to parse the data file will give bad
results. Therefore any workflow where "modified recording tool" can be
combined with "older / unmodified parsing tool" will break. Not sure of
the best way to handle this... presumably whenever a change is made to
the perf record format, any workflow that allows old parsers to read
new format data without version checks could fail? Admittedly this is a
"looks the same but isn't" change so harder for tools devs to spot. Any
suggestions?

For the perf tools, is there a means to record in perf.data a minimum
supported tool version / feature incompatibility flags?

Regards
Ben
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Re: [PATCH 0/1] Support PERF_SAMPLE_READ with inherit_stat
Posted by Andi Kleen 1 year, 11 months ago
> I had considered that, but given currently this perf_event_attr
> configuration is not allowed, I assumed that it would require existing
> tools to add support which would in effect be an opt-in. Of course,
> adding a new flag to be explicit would be trivial enough if required.

That's fair. Makes sense.

> That said, the binary format for the mmap records / read() etc does not
> change so using an unmodified tool to parse the data file will give bad
> results. Therefore any workflow where "modified recording tool" can be
> combined with "older / unmodified parsing tool" will break. Not sure of
> the best way to handle this... presumably whenever a change is made to
> the perf record format, any workflow that allows old parsers to read
> new format data without version checks could fail? Admittedly this is a
> "looks the same but isn't" change so harder for tools devs to spot. Any
> suggestions?

For perf itself we can find something. It does a couple of checks, like
reserved bits in the perf_event_attr. For the general case of other
parsers it's unclear. I suppose could increment the magic identifier
to PERFILE3

-Andi