[PATCH v3 00/15] Legacy hardware/cache events as json

Ian Rogers posted 15 patches 1 month ago
There is a newer version of this series
[PATCH v3 00/15] Legacy hardware/cache events as json
Posted by Ian Rogers 1 month ago
Mirroring similar work for software events in commit 6e9fa4131abb
("perf parse-events: Remove non-json software events"), these changes
migrate the legacy hardware and cache events to json. With no hard
coded legacy hardware or cache events, wildcard matching, case
insensitivity, etc. are consistent across events. This does, however, mean
events like cycles will wildcard against all PMUs. A change doing the
same was originally posted and merged from:
https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
and reverted by Linus in commit 4f1b067359ac ("Revert "perf
parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
his dislike for the cycles behavior on ARM with perf record. Earlier
patches in this series make perf record event opening failures
non-fatal and hide the cycles event's failure to open on ARM in perf
record, so it is expected the behavior will now be transparent in perf
record on ARM. perf stat with a cycles event will wildcard open the
event on all PMUs.
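
As a rough illustration of the behavior described above (a sketch only, not perf's implementation): the PMU name "core_pmu", the event tables and the helper names below are hypothetical. The model matches an event name case-insensitively against every PMU, keeps every match as perf stat would, and skips matches that cannot be opened for sampling as perf record would.

```python
# Hypothetical model of wildcard event matching; not perf's actual code.
from dataclasses import dataclass, field


@dataclass
class Pmu:
    name: str
    events: set = field(default_factory=set)  # lower-case event names
    can_sample: bool = True  # assumption: the uncore PMU here cannot sample


def wildcard_match(pmus, event):
    """Return PMU-qualified events for every PMU exposing `event`."""
    wanted = event.lower()  # event names are matched case-insensitively
    return [f"{p.name}/{wanted}/" for p in pmus if wanted in p.events]


def open_for_record(pmus, event):
    """Model perf record: skip, rather than fail on, events that cannot sample."""
    by_name = {p.name: p for p in pmus}
    opened, skipped = [], []
    for qualified in wildcard_match(pmus, event):
        pmu = by_name[qualified.split("/")[0]]
        (opened if pmu.can_sample else skipped).append(qualified)
    return opened, skipped


# Hypothetical ARM system: one core PMU plus an uncore PMU with its own "cycles".
pmus = [
    Pmu("core_pmu", {"cycles", "cpu-cycles", "instructions"}),
    Pmu("arm_cmn_0", {"cycles"}, can_sample=False),
]

stat_events = wildcard_match(pmus, "Cycles")
record_opened, record_skipped = open_for_record(pmus, "cycles")
print(stat_events, record_opened, record_skipped)
```

In this model "cycles" expands to one event per matching PMU, while "cpu-cycles" only matches the core PMU because the uncore PMU does not expose an event with that name.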

The change to support legacy events with PMUs was done to clean up
Intel's hybrid PMU implementation. Giving sysfs/json events higher
priority than legacy events was requested by Mark Rutland
<mark.rutland@arm.com> to fix Apple-M PMU issues caused by broken legacy
events on that PMU. It is believed the PMU driver is now fixed, but
this has only been confirmed on ARM Juno boards. It was requested that
RISC-V be able to add events to the perf tool json so the PMU driver
didn't need to map legacy events to config encodings:
https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
This patch series achieves this.

A previous series of patches decreasing legacy hardware event
priorities was posted in:
https://lore.kernel.org/lkml/20250416045117.876775-1-irogers@google.com/
Namhyung Kim <namhyung@kernel.org> mentioned that hardware and
software events can be implemented similarly:
https://lore.kernel.org/lkml/aIJmJns2lopxf3EK@google.com/
and this patch series achieves this.

Note, patch 1 (perf parse-events: Fix legacy cache events if event is
duplicated in a PMU) fixes a function deleted by patch 15 (perf
parse-events: Remove hard coded legacy hardware and cache
parsing). Adding the json exposed an issue when legacy cache (not
legacy hardware) and sysfs/json events exist. The fix is necessary to
keep tests passing through the series. It is also posted for backports
to stable trees.

The perf list output now includes a lot more information and events. The
previous output on a hybrid Alder Lake system is:
```
$ perf list hw

List of pre-defined events (to be used in -e or -M):

  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  bus-cycles                                         [Hardware event]
  cache-misses                                       [Hardware event]
  cache-references                                   [Hardware event]
  cpu-cycles OR cycles                               [Hardware event]
  instructions                                       [Hardware event]
  ref-cycles                                         [Hardware event]
$ perf list hwcache

List of pre-defined events (to be used in -e or -M):


cache:
  L1-dcache-loads OR cpu_atom/L1-dcache-loads/
  L1-dcache-stores OR cpu_atom/L1-dcache-stores/
  L1-icache-loads OR cpu_atom/L1-icache-loads/
  L1-icache-load-misses OR cpu_atom/L1-icache-load-misses/
  LLC-loads OR cpu_atom/LLC-loads/
  LLC-load-misses OR cpu_atom/LLC-load-misses/
  LLC-stores OR cpu_atom/LLC-stores/
  LLC-store-misses OR cpu_atom/LLC-store-misses/
  dTLB-loads OR cpu_atom/dTLB-loads/
  dTLB-load-misses OR cpu_atom/dTLB-load-misses/
  dTLB-stores OR cpu_atom/dTLB-stores/
  dTLB-store-misses OR cpu_atom/dTLB-store-misses/
  iTLB-load-misses OR cpu_atom/iTLB-load-misses/
  branch-loads OR cpu_atom/branch-loads/
  branch-load-misses OR cpu_atom/branch-load-misses/
  L1-dcache-loads OR cpu_core/L1-dcache-loads/
  L1-dcache-load-misses OR cpu_core/L1-dcache-load-misses/
  L1-dcache-stores OR cpu_core/L1-dcache-stores/
  L1-icache-load-misses OR cpu_core/L1-icache-load-misses/
  LLC-loads OR cpu_core/LLC-loads/
  LLC-load-misses OR cpu_core/LLC-load-misses/
  LLC-stores OR cpu_core/LLC-stores/
  LLC-store-misses OR cpu_core/LLC-store-misses/
  dTLB-loads OR cpu_core/dTLB-loads/
  dTLB-load-misses OR cpu_core/dTLB-load-misses/
  dTLB-stores OR cpu_core/dTLB-stores/
  dTLB-store-misses OR cpu_core/dTLB-store-misses/
  iTLB-load-misses OR cpu_core/iTLB-load-misses/
  branch-loads OR cpu_core/branch-loads/
  branch-load-misses OR cpu_core/branch-load-misses/
  node-loads OR cpu_core/node-loads/
  node-load-misses OR cpu_core/node-load-misses/
```
and after it is:
```
$ perf list hw

legacy hardware:
  branch-instructions
       [Retired branch instructions [This event is an alias of branches].
        Unit: cpu_atom]
  branch-misses
       [Mispredicted branch instructions. Unit: cpu_atom]
  branches
       [Retired branch instructions [This event is an alias of
        branch-instructions]. Unit: cpu_atom]
  bus-cycles
       [Bus cycles,which can be different from total cycles. Unit: cpu_atom]
  cache-misses
       [Cache misses. Usually this indicates Last Level Cache misses; this is
        intended to be used in conjunction with the
        PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates.
        Unit: cpu_atom]
  cache-references
       [Cache accesses. Usually this indicates Last Level Cache accesses but
        this may vary depending on your CPU. This may include prefetches and
        coherency messages; again this depends on the design of your CPU.
        Unit: cpu_atom]
  cpu-cycles
       [Total cycles. Be wary of what happens during CPU frequency scaling
        [This event is an alias of cycles]. Unit: cpu_atom]
  cycles
       [Total cycles. Be wary of what happens during CPU frequency scaling
        [This event is an alias of cpu-cycles]. Unit: cpu_atom]
  instructions
       [Retired instructions. Be careful,these can be affected by various
        issues,most notably hardware interrupt counts. Unit: cpu_atom]
  ref-cycles
       [Total cycles; not affected by CPU frequency scaling. Unit: cpu_atom]
  branch-instructions
       [Retired branch instructions [This event is an alias of branches].
        Unit: cpu_core]
  branch-misses
       [Mispredicted branch instructions. Unit: cpu_core]
  branches
       [Retired branch instructions [This event is an alias of
        branch-instructions]. Unit: cpu_core]
  bus-cycles
       [Bus cycles,which can be different from total cycles. Unit: cpu_core]
  cache-misses
       [Cache misses. Usually this indicates Last Level Cache misses; this is
        intended to be used in conjunction with the
        PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates.
        Unit: cpu_core]
  cache-references
       [Cache accesses. Usually this indicates Last Level Cache accesses but
        this may vary depending on your CPU. This may include prefetches and
        coherency messages; again this depends on the design of your CPU.
        Unit: cpu_core]
  cpu-cycles
       [Total cycles. Be wary of what happens during CPU frequency scaling
        [This event is an alias of cycles]. Unit: cpu_core]
  cycles
       [Total cycles. Be wary of what happens during CPU frequency scaling
        [This event is an alias of cpu-cycles]. Unit: cpu_core]
  instructions
       [Retired instructions. Be careful,these can be affected by various
        issues,most notably hardware interrupt counts. Unit: cpu_core]
  ref-cycles
       [Total cycles; not affected by CPU frequency scaling. Unit: cpu_core]
$ perf list hwcache

legacy cache:
  branch-load-misses
       [Branch prediction unit read misses. Unit: cpu_atom]
  branch-loads
       [Branch prediction unit read accesses. Unit: cpu_atom]
  dtlb-load-misses
       [Data TLB read misses. Unit: cpu_atom]
  dtlb-loads
       [Data TLB read accesses. Unit: cpu_atom]
  dtlb-store-misses
       [Data TLB write misses. Unit: cpu_atom]
  dtlb-stores
       [Data TLB write accesses. Unit: cpu_atom]
  itlb-load-misses
       [Instruction TLB read misses. Unit: cpu_atom]
  l1-dcache-loads
       [Level 1 data cache read accesses. Unit: cpu_atom]
  l1-dcache-stores
       [Level 1 data cache write accesses. Unit: cpu_atom]
  l1-icache-load-misses
       [Level 1 instruction cache read misses. Unit: cpu_atom]
  l1-icache-loads
       [Level 1 instruction cache read accesses. Unit: cpu_atom]
  llc-load-misses
       [Last level cache read misses. Unit: cpu_atom]
  llc-loads
       [Last level cache read accesses. Unit: cpu_atom]
  llc-store-misses
       [Last level cache write misses. Unit: cpu_atom]
  llc-stores
       [Last level cache write accesses. Unit: cpu_atom]
  branch-load-misses
       [Branch prediction unit read misses. Unit: cpu_core]
  branch-loads
       [Branch prediction unit read accesses. Unit: cpu_core]
  dtlb-load-misses
       [Data TLB read misses. Unit: cpu_core]
  dtlb-loads
       [Data TLB read accesses. Unit: cpu_core]
  dtlb-store-misses
       [Data TLB write misses. Unit: cpu_core]
  dtlb-stores
       [Data TLB write accesses. Unit: cpu_core]
  itlb-load-misses
       [Instruction TLB read misses. Unit: cpu_core]
  l1-dcache-load-misses
       [Level 1 data cache read misses. Unit: cpu_core]
  l1-dcache-loads
       [Level 1 data cache read accesses. Unit: cpu_core]
  l1-dcache-stores
       [Level 1 data cache write accesses. Unit: cpu_core]
  l1-icache-load-misses
       [Level 1 instruction cache read misses. Unit: cpu_core]
  llc-load-misses
       [Last level cache read misses. Unit: cpu_core]
  llc-loads
       [Last level cache read accesses. Unit: cpu_core]
  llc-store-misses
       [Last level cache write misses. Unit: cpu_core]
  llc-stores
       [Last level cache write accesses. Unit: cpu_core]
  node-load-misses
       [Local memory read misses. Unit: cpu_core]
  node-loads
       [Local memory read accesses. Unit: cpu_core]
```
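
For readers unfamiliar with how a legacy cache event maps to a config value, the sketch below packs the cache id, operation and result the way the perf_event_open ABI defines for PERF_TYPE_HW_CACHE events, and builds names in the lower-case style of the listing above. It is an illustration under stated assumptions, not the series' make_legacy_cache.py; the function names and the name-building rule are hypothetical.

```python
# Sketch only: not the series' make_legacy_cache.py.
# Enum values follow the perf_event_open ABI (include/uapi/linux/perf_event.h).
CACHES = {"l1-dcache": 0, "l1-icache": 1, "llc": 2, "dtlb": 3,
          "itlb": 4, "branch": 5, "node": 6}
OPS = {"load": 0, "store": 1, "prefetch": 2}
RESULTS = {"access": 0, "miss": 1}


def legacy_cache_config(cache, op, result):
    """Pack the config as cache_id | (op_id << 8) | (result_id << 16)."""
    return CACHES[cache] | (OPS[op] << 8) | (RESULTS[result] << 16)


def event_name(cache, op, result):
    """Build a name such as 'llc-loads' (access) or 'llc-load-misses' (miss)."""
    if result == "miss":
        return f"{cache}-{op}-misses"
    return f"{cache}-{op}s"


for cache, op, result in [("l1-dcache", "load", "miss"),
                          ("llc", "store", "miss"),
                          ("dtlb", "load", "access")]:
    name = event_name(cache, op, result)
    print(f"{name}: config={legacy_cache_config(cache, op, result):#x}")
```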

v3: Deprecate the legacy cache events that aren't shown in the
    previous perf list to avoid the perf list output being too verbose.

v2: Additional details to the cover letter. Credit to Vince Weaver
    added to the commit message for the event details. Additional
    patches to clean up perf_pmu new_alias by removing an unused term
    scanner argument and avoid stdio usage.
    https://lore.kernel.org/lkml/20250828163225.3839073-1-irogers@google.com/

v1: https://lore.kernel.org/lkml/20250828064231.1762997-1-irogers@google.com/

Ian Rogers (15):
  perf parse-events: Fix legacy cache events if event is duplicated in a
    PMU
  perf perf_api_probe: Avoid scanning all PMUs, try software PMU first
  perf record: Skip don't fail for events that don't open
  perf jevents: Support copying the source json files to OUTPUT
  perf pmu: Don't eagerly parse event terms
  perf parse-events: Remove unused FILE input argument to scanner
  perf pmu: Use fd rather than FILE from new_alias
  perf pmu: Factor term parsing into a perf_event_attr into a helper
  perf parse-events: Add terms for legacy hardware and cache config
    values
  perf jevents: Add legacy json terms and default_core event table
    helper
  perf pmu: Add and use legacy_terms in alias information
  perf jevents: Add legacy-hardware and legacy-cache json
  perf print-events: Remove print_hwcache_events
  perf print-events: Remove print_symbol_events
  perf parse-events: Remove hard coded legacy hardware and cache parsing

 tools/perf/Makefile.perf                      |   21 +-
 tools/perf/arch/x86/util/intel-pt.c           |    2 +-
 tools/perf/builtin-list.c                     |   34 +-
 tools/perf/builtin-record.c                   |   89 +-
 tools/perf/pmu-events/Build                   |   24 +-
 .../arch/common/common/legacy-hardware.json   |   72 +
 tools/perf/pmu-events/empty-pmu-events.c      | 2763 ++++++++++++++++-
 tools/perf/pmu-events/jevents.py              |   24 +
 tools/perf/pmu-events/make_legacy_cache.py    |  129 +
 tools/perf/pmu-events/pmu-events.h            |    1 +
 tools/perf/tests/parse-events.c               |    2 +-
 tools/perf/tests/pmu-events.c                 |   24 +-
 tools/perf/tests/pmu.c                        |    3 +-
 tools/perf/util/parse-events.c                |  283 +-
 tools/perf/util/parse-events.h                |   16 +-
 tools/perf/util/parse-events.l                |   54 +-
 tools/perf/util/parse-events.y                |  114 +-
 tools/perf/util/perf_api_probe.c              |   27 +-
 tools/perf/util/pmu.c                         |  302 +-
 tools/perf/util/print-events.c                |  112 -
 tools/perf/util/print-events.h                |    4 -
 21 files changed, 3330 insertions(+), 770 deletions(-)
 create mode 100644 tools/perf/pmu-events/arch/common/common/legacy-hardware.json
 create mode 100755 tools/perf/pmu-events/make_legacy_cache.py

-- 
2.51.0.318.gd7df087d1a-goog
Re: [PATCH v3 00/15] Legacy hardware/cache events as json
Posted by James Clark 3 weeks, 2 days ago

On 28/08/2025 9:59 pm, Ian Rogers wrote:
> Mirroring similar work for software events in commit 6e9fa4131abb
> ("perf parse-events: Remove non-json software events"). These changes
> migrate the legacy hardware and cache events to json.  With no hard
> coded legacy hardware or cache events the wild card, case
> insensitivity, etc. is consistent for events. This does, however, mean
> events like cycles will wild card against all PMUs. A change doing the
> same was originally posted and merged from:
> https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
> and reverted by Linus in commit 4f1b067359ac ("Revert "perf
> parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
> his dislike for the cycles behavior on ARM with perf record. Earlier
> patches in this series make perf record event opening failures
> non-fatal and hide the cycles event's failure to open on ARM in perf
> record, so it is expected the behavior will now be transparent in perf
> record on ARM. perf stat with a cycles event will wildcard open the
> event on all PMUs.

Hi Ian,

From brief testing, perf record and perf stat seem to work now, i.e. "perf 
record -e cycles" doesn't fail and just skips the uncore cycles event. 
And "perf stat" now includes the uncore cycles event, which I think is 
harmless.

But there are a few perf test failures. For example "test event parsing":

   evlist after sorting/fixing: 'arm_cmn_0/cycles/,{cycles,cache-misses,branch-misses}'
   FAILED tests/parse-events.c:1589 wrong number of entries
   Event test failure: test 57 '{cycles,cache-misses,branch-misses}:e'running test 58 'cycles/name=name/'

The tests "Perf time to TSC" and "Use a dummy software event to keep 
tracking" use libperf to open the cycles event as a sampling event, 
which now fails. It seems we've fixed perf record to ignore this 
failure, but we didn't think about libperf until now.

Thanks
James
Re: [PATCH v3 00/15] Legacy hardware/cache events as json
Posted by Ian Rogers 3 weeks, 1 day ago
On Wed, Sep 10, 2025 at 4:14 AM James Clark <james.clark@linaro.org> wrote:
>
> On 28/08/2025 9:59 pm, Ian Rogers wrote:
> > Mirroring similar work for software events in commit 6e9fa4131abb
> > ("perf parse-events: Remove non-json software events"). These changes
> > migrate the legacy hardware and cache events to json.  With no hard
> > coded legacy hardware or cache events the wild card, case
> > insensitivity, etc. is consistent for events. This does, however, mean
> > events like cycles will wild card against all PMUs. A change doing the
> > same was originally posted and merged from:
> > https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
> > and reverted by Linus in commit 4f1b067359ac ("Revert "perf
> > parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
> > his dislike for the cycles behavior on ARM with perf record. Earlier
> > patches in this series make perf record event opening failures
> > non-fatal and hide the cycles event's failure to open on ARM in perf
> > record, so it is expected the behavior will now be transparent in perf
> > record on ARM. perf stat with a cycles event will wildcard open the
> > event on all PMUs.
>
> Hi Ian,
>
> Briefly testing perf record and perf stat seem to work now. i.e "perf
> record -e cycles" doesn't fail and just skips the uncore cycles event.
> And "perf stat" now includes the uncore cycles event which I think is
> harmless.

Thanks for confirming this.

> But there are a few perf test failures. For example "test event parsing":
>
>    evlist after sorting/fixing: 'arm_cmn_0/cycles/,{cycles,cache-
>      misses,branch-misses}'
>    FAILED tests/parse-events.c:1589 wrong number of entries
>    Event test failure: test 57 '{cycles,cache-misses,branch-
>      misses}:e'running test 58 'cycles/name=name/'

I suspect the easiest fix for this is to change "cycles" to the
"cpu-cycles" legacy hardware event for this test. The test has always
had issues on ARM due to hardcoded expectations of the core PMU being
"cpu".

> The tests "Perf time to TSC" and "Use a dummy software event to keep
> tracking" are using libperf to open the cycles event as a sampling event
> which now fails. It seems like we've fixed Perf record to ignore this
> failure, but we didn't think about libperf until now.

I'm not clear on the connection here. libperf doesn't do event parsing
and so there are no changes in tools/lib/perf. If a test has an
expectation that "cycles" is a core event, again we can change it to
"cpu-cycles" as a workaround for ARM. As "cycles" will wildcard now,
we don't want that behavior in say API probing as we'll end up never
lazily processing the PMUs. That code has been altered in these
changes to specify the core PMU. For tests it is less of an issue and
so the changes are more limited.

Thanks,
Ian
Re: [PATCH v3 00/15] Legacy hardware/cache events as json
Posted by James Clark 3 weeks, 1 day ago

On 10/09/2025 4:00 pm, Ian Rogers wrote:
> On Wed, Sep 10, 2025 at 4:14 AM James Clark <james.clark@linaro.org> wrote:
>>
>> On 28/08/2025 9:59 pm, Ian Rogers wrote:
>>> Mirroring similar work for software events in commit 6e9fa4131abb
>>> ("perf parse-events: Remove non-json software events"). These changes
>>> migrate the legacy hardware and cache events to json.  With no hard
>>> coded legacy hardware or cache events the wild card, case
>>> insensitivity, etc. is consistent for events. This does, however, mean
>>> events like cycles will wild card against all PMUs. A change doing the
>>> same was originally posted and merged from:
>>> https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
>>> and reverted by Linus in commit 4f1b067359ac ("Revert "perf
>>> parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
>>> his dislike for the cycles behavior on ARM with perf record. Earlier
>>> patches in this series make perf record event opening failures
>>> non-fatal and hide the cycles event's failure to open on ARM in perf
>>> record, so it is expected the behavior will now be transparent in perf
>>> record on ARM. perf stat with a cycles event will wildcard open the
>>> event on all PMUs.
>>
>> Hi Ian,
>>
>> Briefly testing perf record and perf stat seem to work now. i.e "perf
>> record -e cycles" doesn't fail and just skips the uncore cycles event.
>> And "perf stat" now includes the uncore cycles event which I think is
>> harmless.
> 
> Thanks for confirming this.
> 
>> But there are a few perf test failures. For example "test event parsing":
>>
>>     evlist after sorting/fixing: 'arm_cmn_0/cycles/,{cycles,cache-
>>       misses,branch-misses}'
>>     FAILED tests/parse-events.c:1589 wrong number of entries
>>     Event test failure: test 57 '{cycles,cache-misses,branch-
>>       misses}:e'running test 58 'cycles/name=name/'
> 
> I suspect the easiest fix for this is to change "cycles" to the
> "cpu-cycles" legacy hardware event for this test. The test has always
> had issues on ARM due to hardcoded expectations of the core PMU being
> "cpu".
> 
>> The tests "Perf time to TSC" and "Use a dummy software event to keep
>> tracking" are using libperf to open the cycles event as a sampling event
>> which now fails. It seems like we've fixed Perf record to ignore this
>> failure, but we didn't think about libperf until now.
> 
> I'm not clear on the connection here. libperf doesn't do event parsing
> and so there are no changes in tools/lib/perf. If a test has an
> expectation that "cycles" is a core event, again we can change it to
> "cpu-cycles" as a workaround for ARM. As "cycles" will wildcard now,
> we don't want that behavior in say API probing as we'll end up never
> lazily processing the PMUs. That code has been altered in these
> changes to specify the core PMU. For tests it is less of an issue and
> so the changes are more limited.
> 
> Thanks,
> Ian

Sure, that makes sense. If there's an easy fix for the tests, we can do 
that. I suppose the main reason I mentioned it was that the tests might 
be highlighting breakage that genuine non-perf, non-test users would 
also see.



Re: [PATCH v3 00/15] Legacy hardware/cache events as json
Posted by Ian Rogers 3 weeks ago
On Thu, Sep 11, 2025 at 6:00 AM James Clark <james.clark@linaro.org> wrote:
>
>
>
> On 10/09/2025 4:00 pm, Ian Rogers wrote:
> > On Wed, Sep 10, 2025 at 4:14 AM James Clark <james.clark@linaro.org> wrote:
> >>
> >> On 28/08/2025 9:59 pm, Ian Rogers wrote:
> >>> Mirroring similar work for software events in commit 6e9fa4131abb
> >>> ("perf parse-events: Remove non-json software events"). These changes
> >>> migrate the legacy hardware and cache events to json.  With no hard
> >>> coded legacy hardware or cache events the wild card, case
> >>> insensitivity, etc. is consistent for events. This does, however, mean
> >>> events like cycles will wild card against all PMUs. A change doing the
> >>> same was originally posted and merged from:
> >>> https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
> >>> and reverted by Linus in commit 4f1b067359ac ("Revert "perf
> >>> parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
> >>> his dislike for the cycles behavior on ARM with perf record. Earlier
> >>> patches in this series make perf record event opening failures
> >>> non-fatal and hide the cycles event's failure to open on ARM in perf
> >>> record, so it is expected the behavior will now be transparent in perf
> >>> record on ARM. perf stat with a cycles event will wildcard open the
> >>> event on all PMUs.
> >>
> >> Hi Ian,
> >>
> >> Briefly testing perf record and perf stat seem to work now. i.e "perf
> >> record -e cycles" doesn't fail and just skips the uncore cycles event.
> >> And "perf stat" now includes the uncore cycles event which I think is
> >> harmless.
> >
> > Thanks for confirming this.
> >
> >> But there are a few perf test failures. For example "test event parsing":
> >>
> >>     evlist after sorting/fixing: 'arm_cmn_0/cycles/,{cycles,cache-
> >>       misses,branch-misses}'
> >>     FAILED tests/parse-events.c:1589 wrong number of entries
> >>     Event test failure: test 57 '{cycles,cache-misses,branch-
> >>       misses}:e'running test 58 'cycles/name=name/'
> >
> > I suspect the easiest fix for this is to change "cycles" to the
> > "cpu-cycles" legacy hardware event for this test. The test has always
> > had issues on ARM due to hardcoded expectations of the core PMU being
> > "cpu".
> >
> >> The tests "Perf time to TSC" and "Use a dummy software event to keep
> >> tracking" are using libperf to open the cycles event as a sampling event
> >> which now fails. It seems like we've fixed Perf record to ignore this
> >> failure, but we didn't think about libperf until now.
> >
> > I'm not clear on the connection here. libperf doesn't do event parsing
> > and so there are no changes in tools/lib/perf. If a test has an
> > expectation that "cycles" is a core event, again we can change it to
> > "cpu-cycles" as a workaround for ARM. As "cycles" will wildcard now,
> > we don't want that behavior in say API probing as we'll end up never
> > lazily processing the PMUs. That code has been altered in these
> > changes to specify the core PMU. For tests it is less of an issue and
> > so the changes are more limited.
> >
> > Thanks,
> > Ian
>
> Sure makes sense if there's an easy fix for the tests, we can do that. I
> suppose the main reason I mentioned it was that the tests might be
> highlighting that other genuine non-Perf and non-test users would see
> the same breakage though.

A non-perf user only sees a perf change if they transitively depend
on perf. I think the complaint is that we've gone from 1 event
(ignoring big.LITTLE/hybrid) to possibly many, particularly on ARM.
What I'm thinking is we should have something like:

#if defined(__aarch64__) || defined(__arm__)
#define HW_CYCLES_STR "cpu-cycles"
#else
#define HW_CYCLES_STR "cycles"
#endif

and replace every use of raw "cycles" in the code with this
#define. This should avoid the >1 event issue on ARM in things like
tests. It does cause a new problem if the evsel->name is assumed to be
cycles, which is something that can happen a lot in shell scripts.
Perhaps all those use-cases should switch to specifying a PMU, which
would also be good performance-wise as it avoids scanning lots of PMUs.
I'll add some things to v4 to do a mix of this.
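
For script-side users, the same idea could be sketched as below. This is a hypothetical helper, not part of the series: the function names are assumptions, and it only mirrors the per-architecture choice of the #define above plus the option of qualifying the event with a PMU.

```python
# Hypothetical helpers mirroring the proposed HW_CYCLES_STR macro.
import platform


def hw_cycles_str(machine=None):
    """Pick the cycles event name for an architecture string such as 'aarch64'."""
    machine = (machine or platform.machine()).lower()
    is_arm = machine.startswith(("aarch64", "arm"))
    return "cpu-cycles" if is_arm else "cycles"


def with_pmu(pmu, event):
    """Qualify an event with a PMU name so no wildcard scan of all PMUs is needed."""
    return f"{pmu}/{event}/"


print(hw_cycles_str("aarch64"), hw_cycles_str("x86_64"))
print(with_pmu("cpu_core", hw_cycles_str("x86_64")))
```

Code that compares against the event name would still need to expect whichever string the helper returns, which is the evsel->name concern raised above.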

Thanks,
Ian
Re: [PATCH v3 00/15] Legacy hardware/cache events as json
Posted by Thomas Richter 1 month ago
On 8/28/25 22:59, Ian Rogers wrote:
> Mirroring similar work for software events in commit 6e9fa4131abb
> ("perf parse-events: Remove non-json software events"). These changes
> migrate the legacy hardware and cache events to json.  With no hard
> coded legacy hardware or cache events the wild card, case
> insensitivity, etc. is consistent for events. This does, however, mean
> events like cycles will wild card against all PMUs. A change doing the
> same was originally posted and merged from:
> https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
> and reverted by Linus in commit 4f1b067359ac ("Revert "perf
> parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
> his dislike for the cycles behavior on ARM with perf record. Earlier
> patches in this series make perf record event opening failures
> non-fatal and hide the cycles event's failure to open on ARM in perf
> record, so it is expected the behavior will now be transparent in perf
> record on ARM. perf stat with a cycles event will wildcard open the
> event on all PMUs.
> 
> The change to support legacy events with PMUs was done to clean up
> Intel's hybrid PMU implementation. Having sysfs/json events with
> increased priority to legacy was requested by Mark Rutland
>  <mark.rutland@arm.com> to fix Apple-M PMU issues wrt broken legacy
> events on that PMU. It is believed the PMU driver is now fixed, but
> this has only been confirmed on ARM Juno boards. It was requested that
> RISC-V be able to add events to the perf tool json so the PMU driver
> didn't need to map legacy events to config encodings:
> https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
> This patch series achieves this.
> 
> A previous series of patches decreasing legacy hardware event
> priorities was posted in:
> https://lore.kernel.org/lkml/20250416045117.876775-1-irogers@google.com/
> Namhyung Kim <namhyung@kernel.org> mentioned that hardware and
> software events can be implemented similarly:
> https://lore.kernel.org/lkml/aIJmJns2lopxf3EK@google.com/
> and this patch series achieves this.
> 
> Note, patch 1 (perf parse-events: Fix legacy cache events if event is
> duplicated in a PMU) fixes a function deleted by patch 15 (perf
> parse-events: Remove hard coded legacy hardware and cache
> parsing). Adding the json exposed an issue when legacy cache (not
> legacy hardware) and sysfs/json events exist. The fix is necessary to
> keep tests passing through the series. It is also posted for backports
> to stable trees.
> 
> The perf list behavior includes a lot more information and events. The
> before behavior on a hybrid alderlake is:


.....

For s390 the whole series:

Tested-by: Thomas Richter <tmricht@linux.ibm.com>
-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH

Chairman of the Supervisory Board: Wolfgang Wendt

Management: David Faller

Registered office: Böblingen / Commercial register: Amtsgericht Stuttgart, HRB 243294
Re: [PATCH v3 00/15] Legacy hardware/cache events as json
Posted by Namhyung Kim 3 weeks, 1 day ago
Hi Ian,

On Thu, Aug 28, 2025 at 01:59:15PM -0700, Ian Rogers wrote:
> Mirroring similar work for software events in commit 6e9fa4131abb
> ("perf parse-events: Remove non-json software events"). These changes
> migrate the legacy hardware and cache events to json.  With no hard
> coded legacy hardware or cache events the wild card, case
> insensitivity, etc. is consistent for events. This does, however, mean
> events like cycles will wild card against all PMUs. A change doing the
> same was originally posted and merged from:
> https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
> and reverted by Linus in commit 4f1b067359ac ("Revert "perf
> parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
> his dislike for the cycles behavior on ARM with perf record. Earlier
> patches in this series make perf record event opening failures
> non-fatal and hide the cycles event's failure to open on ARM in perf
> record, so it is expected the behavior will now be transparent in perf
> record on ARM. perf stat with a cycles event will wildcard open the
> event on all PMUs.
> 
> The change to support legacy events with PMUs was done to clean up
> Intel's hybrid PMU implementation. Having sysfs/json events with
> increased priority to legacy was requested by Mark Rutland
>  <mark.rutland@arm.com> to fix Apple-M PMU issues wrt broken legacy
> events on that PMU. It is believed the PMU driver is now fixed, but
> this has only been confirmed on ARM Juno boards. It was requested that
> RISC-V be able to add events to the perf tool json so the PMU driver
> didn't need to map legacy events to config encodings:
> https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
> This patch series achieves this.
> 
> A previous series of patches decreasing legacy hardware event
> priorities was posted in:
> https://lore.kernel.org/lkml/20250416045117.876775-1-irogers@google.com/
> Namhyung Kim <namhyung@kernel.org> mentioned that hardware and
> software events can be implemented similarly:
> https://lore.kernel.org/lkml/aIJmJns2lopxf3EK@google.com/
> and this patch series achieves this.

Thanks for working on this.  Yeah, I think it'd be easier to handle all
events consistently with JSON.  I expect the sysfs encoding will be used
with higher priority if it comes in <PMU>/<EVENT>/ format.

> 
> Note, patch 1 (perf parse-events: Fix legacy cache events if event is
> duplicated in a PMU) fixes a function deleted by patch 15 (perf
> parse-events: Remove hard coded legacy hardware and cache
> parsing). Adding the json exposed an issue when legacy cache (not
> legacy hardware) and sysfs/json events exist. The fix is necessary to
> keep tests passing through the series. It is also posted for backports
> to stable trees.

Sounds ok.

> 
> The perf list behavior includes a lot more information and events. The
> before behavior on a hybrid alderlake is:
> ```
> $ perf list hw
> 
> List of pre-defined events (to be used in -e or -M):
> 
>   branch-instructions OR branches                    [Hardware event]
>   branch-misses                                      [Hardware event]
>   bus-cycles                                         [Hardware event]
>   cache-misses                                       [Hardware event]
>   cache-references                                   [Hardware event]
>   cpu-cycles OR cycles                               [Hardware event]
>   instructions                                       [Hardware event]
>   ref-cycles                                         [Hardware event]
> $ perf list hwcache
> 
> List of pre-defined events (to be used in -e or -M):
> 
> 
> cache:
>   L1-dcache-loads OR cpu_atom/L1-dcache-loads/
>   L1-dcache-stores OR cpu_atom/L1-dcache-stores/
>   L1-icache-loads OR cpu_atom/L1-icache-loads/
>   L1-icache-load-misses OR cpu_atom/L1-icache-load-misses/
>   LLC-loads OR cpu_atom/LLC-loads/
>   LLC-load-misses OR cpu_atom/LLC-load-misses/
>   LLC-stores OR cpu_atom/LLC-stores/
>   LLC-store-misses OR cpu_atom/LLC-store-misses/
>   dTLB-loads OR cpu_atom/dTLB-loads/
>   dTLB-load-misses OR cpu_atom/dTLB-load-misses/
>   dTLB-stores OR cpu_atom/dTLB-stores/
>   dTLB-store-misses OR cpu_atom/dTLB-store-misses/
>   iTLB-load-misses OR cpu_atom/iTLB-load-misses/
>   branch-loads OR cpu_atom/branch-loads/
>   branch-load-misses OR cpu_atom/branch-load-misses/
>   L1-dcache-loads OR cpu_core/L1-dcache-loads/
>   L1-dcache-load-misses OR cpu_core/L1-dcache-load-misses/
>   L1-dcache-stores OR cpu_core/L1-dcache-stores/
>   L1-icache-load-misses OR cpu_core/L1-icache-load-misses/
>   LLC-loads OR cpu_core/LLC-loads/
>   LLC-load-misses OR cpu_core/LLC-load-misses/
>   LLC-stores OR cpu_core/LLC-stores/
>   LLC-store-misses OR cpu_core/LLC-store-misses/
>   dTLB-loads OR cpu_core/dTLB-loads/
>   dTLB-load-misses OR cpu_core/dTLB-load-misses/
>   dTLB-stores OR cpu_core/dTLB-stores/
>   dTLB-store-misses OR cpu_core/dTLB-store-misses/
>   iTLB-load-misses OR cpu_core/iTLB-load-misses/
>   branch-loads OR cpu_core/branch-loads/
>   branch-load-misses OR cpu_core/branch-load-misses/
>   node-loads OR cpu_core/node-loads/
>   node-load-misses OR cpu_core/node-load-misses/
> ```
> and after it is:
> ```
> $ perf list hw
> 
> legacy hardware:
>   branch-instructions
>        [Retired branch instructions [This event is an alias of branches].
>         Unit: cpu_atom]
>   branch-misses
>        [Mispredicted branch instructions. Unit: cpu_atom]
>   branches
>        [Retired branch instructions [This event is an alias of
>         branch-instructions]. Unit: cpu_atom]

A nit.  Can we have one actual event and an alias of it?

I think 'branch-instructions' will be the actual event and 'branches'
will be the alias.  Then the description will be like

  branch-instructions
      [Retired branch instructions.  Unit: cpu_atom]
  ...

  branches
      [This event is an alias of branch-instructions.]

The same goes for 'cycles' and 'cpu-cycles'.

Thanks,
Namhyung


>   bus-cycles
>        [Bus cycles,which can be different from total cycles. Unit: cpu_atom]
>   cache-misses
>        [Cache misses. Usually this indicates Last Level Cache misses; this is
>         intended to be used in conjunction with the
>         PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates.
>         Unit: cpu_atom]
>   cache-references
>        [Cache accesses. Usually this indicates Last Level Cache accesses but
>         this may vary depending on your CPU. This may include prefetches and
>         coherency messages; again this depends on the design of your CPU.
>         Unit: cpu_atom]
>   cpu-cycles
>        [Total cycles. Be wary of what happens during CPU frequency scaling
>         [This event is an alias of cycles]. Unit: cpu_atom]
>   cycles
>        [Total cycles. Be wary of what happens during CPU frequency scaling
>         [This event is an alias of cpu-cycles]. Unit: cpu_atom]
>   instructions
>        [Retired instructions. Be careful,these can be affected by various
>         issues,most notably hardware interrupt counts. Unit: cpu_atom]
>   ref-cycles
>        [Total cycles; not affected by CPU frequency scaling. Unit: cpu_atom]
>   branch-instructions
>        [Retired branch instructions [This event is an alias of branches].
>         Unit: cpu_core]
>   branch-misses
>        [Mispredicted branch instructions. Unit: cpu_core]
>   branches
>        [Retired branch instructions [This event is an alias of
>         branch-instructions]. Unit: cpu_core]
>   bus-cycles
>        [Bus cycles,which can be different from total cycles. Unit: cpu_core]
>   cache-misses
>        [Cache misses. Usually this indicates Last Level Cache misses; this is
>         intended to be used in conjunction with the
>         PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates.
>         Unit: cpu_core]
>   cache-references
>        [Cache accesses. Usually this indicates Last Level Cache accesses but
>         this may vary depending on your CPU. This may include prefetches and
>         coherency messages; again this depends on the design of your CPU.
>         Unit: cpu_core]
>   cpu-cycles
>        [Total cycles. Be wary of what happens during CPU frequency scaling
>         [This event is an alias of cycles]. Unit: cpu_core]
>   cycles
>        [Total cycles. Be wary of what happens during CPU frequency scaling
>         [This event is an alias of cpu-cycles]. Unit: cpu_core]
>   instructions
>        [Retired instructions. Be careful,these can be affected by various
>         issues,most notably hardware interrupt counts. Unit: cpu_core]
>   ref-cycles
>        [Total cycles; not affected by CPU frequency scaling. Unit: cpu_core]
> $ perf list hwcache
> 
> legacy cache:
>   branch-load-misses
>        [Branch prediction unit read misses. Unit: cpu_atom]
>   branch-loads
>        [Branch prediction unit read accesses. Unit: cpu_atom]
>   dtlb-load-misses
>        [Data TLB read misses. Unit: cpu_atom]
>   dtlb-loads
>        [Data TLB read accesses. Unit: cpu_atom]
>   dtlb-store-misses
>        [Data TLB write misses. Unit: cpu_atom]
>   dtlb-stores
>        [Data TLB write accesses. Unit: cpu_atom]
>   itlb-load-misses
>        [Instruction TLB read misses. Unit: cpu_atom]
>   l1-dcache-loads
>        [Level 1 data cache read accesses. Unit: cpu_atom]
>   l1-dcache-stores
>        [Level 1 data cache write accesses. Unit: cpu_atom]
>   l1-icache-load-misses
>        [Level 1 instruction cache read misses. Unit: cpu_atom]
>   l1-icache-loads
>        [Level 1 instruction cache read accesses. Unit: cpu_atom]
>   llc-load-misses
>        [Last level cache read misses. Unit: cpu_atom]
>   llc-loads
>        [Last level cache read accesses. Unit: cpu_atom]
>   llc-store-misses
>        [Last level cache write misses. Unit: cpu_atom]
>   llc-stores
>        [Last level cache write accesses. Unit: cpu_atom]
>   branch-load-misses
>        [Branch prediction unit read misses. Unit: cpu_core]
>   branch-loads
>        [Branch prediction unit read accesses. Unit: cpu_core]
>   dtlb-load-misses
>        [Data TLB read misses. Unit: cpu_core]
>   dtlb-loads
>        [Data TLB read accesses. Unit: cpu_core]
>   dtlb-store-misses
>        [Data TLB write misses. Unit: cpu_core]
>   dtlb-stores
>        [Data TLB write accesses. Unit: cpu_core]
>   itlb-load-misses
>        [Instruction TLB read misses. Unit: cpu_core]
>   l1-dcache-load-misses
>        [Level 1 data cache read misses. Unit: cpu_core]
>   l1-dcache-loads
>        [Level 1 data cache read accesses. Unit: cpu_core]
>   l1-dcache-stores
>        [Level 1 data cache write accesses. Unit: cpu_core]
>   l1-icache-load-misses
>        [Level 1 instruction cache read misses. Unit: cpu_core]
>   llc-load-misses
>        [Last level cache read misses. Unit: cpu_core]
>   llc-loads
>        [Last level cache read accesses. Unit: cpu_core]
>   llc-store-misses
>        [Last level cache write misses. Unit: cpu_core]
>   llc-stores
>        [Last level cache write accesses. Unit: cpu_core]
>   node-load-misses
>        [Local memory read misses. Unit: cpu_core]
>   node-loads
>        [Local memory read accesses. Unit: cpu_core]
> ```
> 
> v3: Deprecate the legacy cache events that aren't shown in the
>     previous perf list to avoid the perf list output being too verbose.
> 
> v2: Additional details to the cover letter. Credit to Vince Weaver
>     added to the commit message for the event details. Additional
>     patches to clean up perf_pmu new_alias by removing an unused term
>     scanner argument and avoid stdio usage.
>     https://lore.kernel.org/lkml/20250828163225.3839073-1-irogers@google.com/
> 
> v1: https://lore.kernel.org/lkml/20250828064231.1762997-1-irogers@google.com/
> 
> Ian Rogers (15):
>   perf parse-events: Fix legacy cache events if event is duplicated in a
>     PMU
>   perf perf_api_probe: Avoid scanning all PMUs, try software PMU first
>   perf record: Skip don't fail for events that don't open
>   perf jevents: Support copying the source json files to OUTPUT
>   perf pmu: Don't eagerly parse event terms
>   perf parse-events: Remove unused FILE input argument to scanner
>   perf pmu: Use fd rather than FILE from new_alias
>   perf pmu: Factor term parsing into a perf_event_attr into a helper
>   perf parse-events: Add terms for legacy hardware and cache config
>     values
>   perf jevents: Add legacy json terms and default_core event table
>     helper
>   perf pmu: Add and use legacy_terms in alias information
>   perf jevents: Add legacy-hardware and legacy-cache json
>   perf print-events: Remove print_hwcache_events
>   perf print-events: Remove print_symbol_events
>   perf parse-events: Remove hard coded legacy hardware and cache parsing
> 
>  tools/perf/Makefile.perf                      |   21 +-
>  tools/perf/arch/x86/util/intel-pt.c           |    2 +-
>  tools/perf/builtin-list.c                     |   34 +-
>  tools/perf/builtin-record.c                   |   89 +-
>  tools/perf/pmu-events/Build                   |   24 +-
>  .../arch/common/common/legacy-hardware.json   |   72 +
>  tools/perf/pmu-events/empty-pmu-events.c      | 2763 ++++++++++++++++-
>  tools/perf/pmu-events/jevents.py              |   24 +
>  tools/perf/pmu-events/make_legacy_cache.py    |  129 +
>  tools/perf/pmu-events/pmu-events.h            |    1 +
>  tools/perf/tests/parse-events.c               |    2 +-
>  tools/perf/tests/pmu-events.c                 |   24 +-
>  tools/perf/tests/pmu.c                        |    3 +-
>  tools/perf/util/parse-events.c                |  283 +-
>  tools/perf/util/parse-events.h                |   16 +-
>  tools/perf/util/parse-events.l                |   54 +-
>  tools/perf/util/parse-events.y                |  114 +-
>  tools/perf/util/perf_api_probe.c              |   27 +-
>  tools/perf/util/pmu.c                         |  302 +-
>  tools/perf/util/print-events.c                |  112 -
>  tools/perf/util/print-events.h                |    4 -
>  21 files changed, 3330 insertions(+), 770 deletions(-)
>  create mode 100644 tools/perf/pmu-events/arch/common/common/legacy-hardware.json
>  create mode 100755 tools/perf/pmu-events/make_legacy_cache.py
> 
> -- 
> 2.51.0.318.gd7df087d1a-goog
>
Re: [PATCH v3 00/15] Legacy hardware/cache events as json
Posted by Ian Rogers 3 weeks, 1 day ago
On Wed, Sep 10, 2025 at 1:10 PM Namhyung Kim <namhyung@kernel.org> wrote:
> A nit.  Can we have one actual event and an alias of it?
>
> I think 'branch-instructions' will be the actual event and 'branches'
> will be the alias.  Then the description will be like
>
>   branch-instructions
>       [Retired branch instructions.  Unit: cpu_atom]
>   ...
>
>   branches
>       [This event is an alias of branch-instructions.]
>
> The same goes for 'cycles' and 'cpu-cycles'.

Similarly for 'cs' and 'context-switches' in
tools/perf/pmu-events/arch/common/common/software.json.

So there are a few different ways to do this:

1) In perf list, detect that two events have the same encoding and list
them together.
2) In the json, add a new aliases list that either:
2.1) gets expanded in jevents.py as part of the build, or
2.2) gets passed into pmu-events.c, with the C code updated to use an
alias list associated with each event.

Option (1) will have something like quadratic complexity, but a fast
perf list isn't a particular goal I've heard of.
Option (2.2) will mean the existing binary searches for events will
become a binary search for an event and then linear searches through
the aliases. To make this not a slowdown we'd likely need more lookup
tables to avoid the linear searches.
Option (2.1) feels the most plausible. I was hoping the json and the
sysfs layout would kind of match, and this would be true after
jevents.py expands the aliases. This option is kind of already done in
the legacy cache case, as tools/perf/pmu-events/make_legacy_cache.py is
doing this. We'd still need option (1) with this.
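
As a rough illustration of option (2.1), the build-time expansion could
look something like the sketch below. The "Aliases" key and the
expand_aliases helper are hypothetical and not part of jevents.py or the
existing json. The other field names follow the json event format as I
understand it, and the encoding value is only a placeholder.

```python
import copy


def expand_aliases(events):
    """Expand a hypothetical "Aliases" list into separate events.

    Each alias becomes its own event with the same encoding as the
    event it was listed under, so the generated tables look the same
    as if the alias had been written out by hand.
    """
    expanded = []
    for event in events:
        base = copy.deepcopy(event)
        # "Aliases" is an assumed key, it does not exist in the json today.
        aliases = base.pop("Aliases", [])
        expanded.append(base)
        for alias in aliases:
            dup = copy.deepcopy(base)
            dup["EventName"] = alias
            dup["BriefDescription"] = (
                f"This event is an alias of {base['EventName']}."
            )
            expanded.append(dup)
    # Sorted by name on the assumption that later lookups binary
    # search on the event name.
    return sorted(expanded, key=lambda e: e["EventName"])


if __name__ == "__main__":
    sample = [
        {
            "EventName": "branch-instructions",
            "BriefDescription": "Retired branch instructions.",
            "ConfigCode": "4",  # placeholder encoding for the example
            "Aliases": ["branches"],
        }
    ]
    for ev in expand_aliases(sample):
        print(ev["EventName"], "-", ev["BriefDescription"])
```

With this shape the C side would not need to know about aliases at all,
which is the main attraction of doing the expansion in the build.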

Anyway, I'm not sure these downsides are countered by a slightly
smaller hardware.json and software.json, or maybe we should just go
with option 1 if the perf list output is all you care about. Let me
know if you see a different way of making it happen. I don't think the
vendors will be particularly happy for their upstream formats to
change given other tools will rely on them.
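
For reference, option 1 could be sketched roughly as below. This is
Python for brevity rather than the C that perf list is written in, and
the (pmu, name, config) tuples, the config values and the
group_by_encoding helper are all made up for the example. If a table
keyed on the encoding is acceptable, the grouping need not compare every
pair of events.

```python
from collections import defaultdict


def group_by_encoding(events):
    """Group event names that share a PMU and an encoding.

    `events` is an iterable of (pmu, name, config) tuples; this layout
    is an assumption for the example, not perf's internal format.
    """
    groups = defaultdict(list)
    for pmu, name, config in events:
        groups[(pmu, config)].append(name)
    # Sort the names so the output order is stable.
    return {key: sorted(names) for key, names in groups.items()}


if __name__ == "__main__":
    sample = [
        ("cpu_atom", "branch-instructions", 4),
        ("cpu_atom", "branches", 4),
        ("cpu_atom", "cycles", 0),
        ("cpu_atom", "cpu-cycles", 0),
        ("cpu_atom", "instructions", 1),
    ]
    for (pmu, _config), names in sorted(group_by_encoding(sample).items()):
        # Mirrors the "A OR B" style of the old perf list output.
        print(f"{' OR '.join(names)}  [Unit: {pmu}]")
```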

Thanks,
Ian
Re: [PATCH v3 00/15] Legacy hardware/cache events as json
Posted by Namhyung Kim 3 weeks ago
On Wed, Sep 10, 2025 at 02:58:05PM -0700, Ian Rogers wrote:
> On Wed, Sep 10, 2025 at 1:10 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > A nit.  Can we have one actual event and an alias of it?
> >
> > I think 'branch-instructions' will be the actual event and 'branches'
> > will be the alias.  Then the description will be like
> >
> >   branch-instructions
> >       [Retired branch instructions.  Unit: cpu_atom]
> >   ...
> >
> >   branches
> >       [This event is an alias of branch-instructions.]
> >
> > The same goes for 'cycles' and 'cpu-cycles'.
> 
> Similarly for 'cs' and 'context-switches' in
> tools/perf/pmu-events/arch/common/common/software.json.
> 
> So there are a few different ways to do this:
> 
> 1) In perf list, detect that two events have the same encoding and list
> them together.
> 2) In the json, add a new aliases list that either:
> 2.1) gets expanded in jevents.py as part of the build, or
> 2.2) gets passed into pmu-events.c, with the C code updated to use an
> alias list associated with each event.
> 
> Option (1) will have something like quadratic complexity, but a fast
> perf list isn't a particular goal I've heard of.
> Option (2.2) will mean the existing binary searches for events will
> become a binary search for an event and then linear searches through
> the aliases. To make this not a slowdown we'd likely need more lookup
> tables to avoid the linear searches.
> Option (2.1) feels the most plausible. I was hoping the json and the
> sysfs layout would kind of match, and this would be true after
> jevents.py expands the aliases. This option is kind of already done in
> the legacy cache case, as tools/perf/pmu-events/make_legacy_cache.py is
> doing this. We'd still need option (1) with this.
> 
> Anyway, I'm not sure these downsides are countered by a slightly
> smaller hardware.json and software.json, or maybe we should just go
> with option 1 if the perf list output is all you care about. Let me
> know if you see a different way of making it happen. I don't think the
> vendors will be particularly happy for their upstream formats to
> change given other tools will rely on them.

Well, I was asking just to update the description in JSON.  I'm not sure
if it's a common problem we need to solve.  Updating a few known aliases
in the hardware and software description would be fine.
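
To show what I mean, the change is only to the description text, along
the lines of the sketch below. The KNOWN_ALIASES table and the
describe_aliases helper are hypothetical and just for illustration; in
practice the descriptions would simply be edited in the JSON by hand.
'cycles' and 'cpu-cycles' would be handled the same way once one of them
is picked as the actual event.

```python
# Hypothetical table of alias -> actual event, for illustration only.
KNOWN_ALIASES = {
    "branches": "branch-instructions",
    "cs": "context-switches",
}


def describe_aliases(events, known_aliases=KNOWN_ALIASES):
    """Return copies of the events with alias descriptions shortened.

    Only the description changes; names and encodings are untouched.
    """
    updated = []
    for event in events:
        event = dict(event)  # shallow copy, do not modify the input
        target = known_aliases.get(event["EventName"])
        if target is not None:
            event["BriefDescription"] = (
                f"This event is an alias of {target}."
            )
        updated.append(event)
    return updated


if __name__ == "__main__":
    sample = [
        {"EventName": "branch-instructions",
         "BriefDescription": "Retired branch instructions."},
        {"EventName": "branches",
         "BriefDescription": "Retired branch instructions."},
    ]
    for ev in describe_aliases(sample):
        print(f"{ev['EventName']}\n      [{ev['BriefDescription']}]")
```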

Thanks,
Namhyung