kernel/events/core.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
This change allows events to use PERF_SAMPLE READ with inherit so long
as both inherit_stat and PERF_SAMPLE_TID are set.
Currently it is not possible to use PERF_SAMPLE_READ with inherit. This
restriction assumes the user is interested in collecting aggregate
statistics as per `perf stat`. It prevents a user from collecting
per-thread samples using counter groups from a multi-threaded or
multi-process application, as with `perf record -e '{....}:S'`. Instead
users must use system-wide mode, or forgo the ability to sample counter
groups. System-wide mode is often problematic as it requires specific
permissions (no CAP_PERFMON / root access), or may lead to capture of
significant amounts of extra data from other processes running on the
system.
Perf already supports the ability to collect per-thread counts with
`inherit` via the `inherit_stat` flag. This patch changes
`perf_event_alloc` relaxing the restriction to combine `inherit` with
`PERF_SAMPLE_READ` so that the combination will be allowed so long as
`inherit_stat` and `PERF_SAMPLE_TID` are enabled.
In this configuration stream ids (such as may appear in the read_format
field of a PERF_RECORD_SAMPLE) are no longer globally unique, rather
the pair of (stream id, tid) uniquely identify each event. Tools that
rely on this, for example to calculate a delta between samples, would
need updating to take this into account. Previously valid event
configurations (system-wide, no-inherit and so on) where each stream id
is the identifier are unaffected.
This patch has been tested on aarch64 both my manual inspection of the
output of `perf script -D` and through a modified version of Arm's
commercial profiling tools and the numbers appear to line up as one
would expect, but some further validation across other architectures
and/or edge cases would be welcome.
This patch was developed and tested on top of v6.7.
Ben Gainey (1):
perf: Support PERF_SAMPLE_READ with inherit_stat
kernel/events/core.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
--
2.43.0
Hello,
On Fri, Jan 19, 2024 at 8:39 AM Ben Gainey <ben.gainey@arm.com> wrote:
>
> This change allows events to use PERF_SAMPLE READ with inherit so long
> as both inherit_stat and PERF_SAMPLE_TID are set.
>
> Currently it is not possible to use PERF_SAMPLE_READ with inherit. This
> restriction assumes the user is interested in collecting aggregate
> statistics as per `perf stat`. It prevents a user from collecting
> per-thread samples using counter groups from a multi-threaded or
> multi-process application, as with `perf record -e '{....}:S'`. Instead
> users must use system-wide mode, or forgo the ability to sample counter
> groups. System-wide mode is often problematic as it requires specific
> permissions (no CAP_PERFMON / root access), or may lead to capture of
> significant amounts of extra data from other processes running on the
> system.
>
> Perf already supports the ability to collect per-thread counts with
> `inherit` via the `inherit_stat` flag. This patch changes
> `perf_event_alloc` relaxing the restriction to combine `inherit` with
> `PERF_SAMPLE_READ` so that the combination will be allowed so long as
> `inherit_stat` and `PERF_SAMPLE_TID` are enabled.
I'm not sure if it's correct. Maybe I misunderstand inherit_stat but
AFAIK it's just to use prev_task's events when next_task has the
compatible event context. So the event values it sees in samples
would depend on the timing or scheduler behavior.
Also event counts and time values PERF_SAMPLE_READ sees
include child event's so the values of the parent event can be
updated even if it's inactive. And the values will vary for the
next_task whether prev_task is the parent or not. I think it
would return consistent values only if it iterates all child events
and sums up the values like it does for read(2). But it cannot
do that in the NMI handler.
Frankly I don't understand how inherit_stat supports per-thread
counts properly. Also it doesn't seem to be used by default in
the perf tools. IIUC per-thread count is supported when you
don't set the inherit bit and open separate events for each
thread but I guess that's not what you want.
Anyway, I'm ok with the idea of using PERF_SAMPLE_READ to
improve per-thread profiling especially with event groups.
But I think it should not use inherit_stat and it needs a way to
not include child stats in the samples.
What do you think?
Thanks,
Namhyung
>
> In this configuration stream ids (such as may appear in the read_format
> field of a PERF_RECORD_SAMPLE) are no longer globally unique, rather
> the pair of (stream id, tid) uniquely identify each event. Tools that
> rely on this, for example to calculate a delta between samples, would
> need updating to take this into account. Previously valid event
> configurations (system-wide, no-inherit and so on) where each stream id
> is the identifier are unaffected.
>
> This patch has been tested on aarch64 both my manual inspection of the
> output of `perf script -D` and through a modified version of Arm's
> commercial profiling tools and the numbers appear to line up as one
> would expect, but some further validation across other architectures
> and/or edge cases would be welcome.
>
> This patch was developed and tested on top of v6.7.
>
>
> Ben Gainey (1):
> perf: Support PERF_SAMPLE_READ with inherit_stat
>
> kernel/events/core.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> --
> 2.43.0
>
Ben Gainey <ben.gainey@arm.com> writes: > In this configuration stream ids (such as may appear in the read_format > field of a PERF_RECORD_SAMPLE) are no longer globally unique, rather > the pair of (stream id, tid) uniquely identify each event. Tools that > rely on this, for example to calculate a delta between samples, would > need updating to take this into account. Previously valid event > configurations (system-wide, no-inherit and so on) where each stream id > is the identifier are unaffected. So is this an ABI break? It might need an optin, if it breaks anything, which wouldn't surprise me. We do have a lot of different perf stream parsers around these days and we cannot break them. -Andi
On Fri, 2024-01-19 at 09:45 -0800, Andi Kleen wrote: > Ben Gainey <ben.gainey@arm.com> writes: > > > In this configuration stream ids (such as may appear in the > > read_format > > field of a PERF_RECORD_SAMPLE) are no longer globally unique, > > rather > > the pair of (stream id, tid) uniquely identify each event. Tools > > that > > rely on this, for example to calculate a delta between samples, > > would > > need updating to take this into account. Previously valid event > > configurations (system-wide, no-inherit and so on) where each > > stream id > > is the identifier are unaffected. > > So is this an ABI break? It might need an optin, if it breaks > anything, > which wouldn't surprise me. We do have a lot of different perf stream > parsers around these days and we cannot break them. > > -Andi I had considered that, but given currently this perf_event_attr configuration is not allowed, I assumed that it would require existing tools to add support which would in effect be an opt-in. Of course, adding a new flag to be explicit would be trivial enough if required. That said, the binary format for the mmap records / read() etc does not change so using an unmodified tool to parse the data file will give bad results. Therefore any workflow where "modified recording tool" can be combined with "older / unmodified parsing tool" will break. Not sure of the best way to handle this... presumably whenever a change is made to the perf record format, any workflow that allows old parsers to read new format data without version checks could fail? Admittedly this is a "looks the same but isn't" change so harder for tools devs to spot. Any suggestions? For the perf tools, is there a means to record in perf.data a minimum supported tool version / feature incompatibility flags? Regards Ben IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
> I had considered that, but given currently this perf_event_attr > configuration is not allowed, I assumed that it would require existing > tools to add support which would in effect be an opt-in. Of course, > adding a new flag to be explicit would be trivial enough if required. That's fair. Makes sense. > That said, the binary format for the mmap records / read() etc does not > change so using an unmodified tool to parse the data file will give bad > results. Therefore any workflow where "modified recording tool" can be > combined with "older / unmodified parsing tool" will break. Not sure of > the best way to handle this... presumably whenever a change is made to > the perf record format, any workflow that allows old parsers to read > new format data without version checks could fail? Admittedly this is a > "looks the same but isn't" change so harder for tools devs to spot. Any > suggestions? For perf itself we can find something. It does a couple of checks, like reserved bits in the perf_event_attr. For the general case of other parsers it's unclear. I suppose could increment the magic identifier to PERFILE3 -Andi
© 2016 - 2025 Red Hat, Inc.