[PATCH v6 00/28] Legacy hardware/cache events as json

Ian Rogers posted 28 patches 1 week, 1 day ago
tools/perf/Makefile.perf                      |   21 +-
tools/perf/arch/x86/util/intel-pt.c           |    2 +-
tools/perf/builtin-list.c                     |   34 +-
tools/perf/builtin-record.c                   |   97 +-
tools/perf/builtin-stat.c                     |  171 +-
tools/perf/builtin-top.c                      |    8 +-
tools/perf/pmu-events/Build                   |   24 +-
.../arch/common/common/legacy-hardware.json   |   72 +
tools/perf/pmu-events/empty-pmu-events.c      | 2771 ++++++++++++++++-
tools/perf/pmu-events/jevents.py              |   32 +
tools/perf/pmu-events/make_legacy_cache.py    |  129 +
tools/perf/pmu-events/pmu-events.h            |    1 +
tools/perf/tests/code-reading.c               |    2 +-
tools/perf/tests/keep-tracking.c              |    2 +-
tools/perf/tests/parse-events.c               | 2010 ++++++------
tools/perf/tests/perf-time-to-tsc.c           |    4 +-
tools/perf/tests/pmu-events.c                 |   24 +-
tools/perf/tests/pmu.c                        |    3 +-
tools/perf/tests/switch-tracking.c            |    2 +-
tools/perf/util/evlist.c                      |   18 +-
tools/perf/util/evsel.c                       |   21 +-
tools/perf/util/parse-events.c                |  282 +-
tools/perf/util/parse-events.h                |   22 +-
tools/perf/util/parse-events.l                |   54 +-
tools/perf/util/parse-events.y                |  114 +-
tools/perf/util/perf_api_probe.c              |   27 +-
tools/perf/util/pmu.c                         |  309 +-
tools/perf/util/print-events.c                |  112 -
tools/perf/util/print-events.h                |    4 -
29 files changed, 4523 insertions(+), 1849 deletions(-)
create mode 100644 tools/perf/pmu-events/arch/common/common/legacy-hardware.json
create mode 100755 tools/perf/pmu-events/make_legacy_cache.py
[PATCH v6 00/28] Legacy hardware/cache events as json
Posted by Ian Rogers 1 week, 1 day ago
Mirroring similar work for software events in commit 6e9fa4131abb
("perf parse-events: Remove non-json software events"). These changes
migrate the legacy hardware and cache events to json.  With no hard
coded legacy hardware or cache events the wild card, case
insensitivity, etc. is consistent for events. This does, however, mean
events like cycles will wild card against all PMUs. A change doing the
same was originally posted and merged from:
https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
and reverted by Linus in commit 4f1b067359ac ("Revert "perf
parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
his dislike for the cycles behavior on ARM with perf record. Earlier
patches in this series make perf record event opening failures
non-fatal and hide the cycles event's failure to open on ARM in perf
record, so it is expected the behavior will now be transparent in perf
record on ARM. perf stat with a cycles event will wildcard open the
event on all PMUs, however, with default events the cycles event will
only be opened on core PMUs.

The key motivation for these patches is so that if, for example, you
run `perf stat -e cpu-cycles ...` on a hybrid x86 and the results are
printed out with "cpu_core/cpu-cycles/" and "cpu_atom/cpu-cycles/",
the perf_event_attr for cpu-cycles and cpu_core/cpu-cycles/ be the
same, similarly for the cpu_atom event. Prior to these patches the
event with a PMU prefers sysfs/json over legacy encodings while with
no PMU legacy encodings are preferred - these are different encodings
on x86.

The change to support legacy events with PMUs was done to clean up
Intel's hybrid PMU implementation. Having sysfs/json events with
increased priority to legacy was requested by Mark Rutland
 <mark.rutland@arm.com> to fix Apple-M PMU issues wrt broken legacy
events on that PMU. It is believed the PMU driver is now fixed, but
this has only been confirmed on ARM Juno boards. It was requested that
RISC-V be able to add events to the perf tool json so the PMU driver
didn't need to map legacy events to config encodings:
https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
This patch series achieves this.

A previous series of patches decreasing legacy hardware event
priorities was posted in:
https://lore.kernel.org/lkml/20250416045117.876775-1-irogers@google.com/
Namhyung Kim <namhyung@kernel.org> mentioned that hardware and
software events can be implemented similarly:
https://lore.kernel.org/lkml/aIJmJns2lopxf3EK@google.com/
and this patch series achieves this.

Note, patch 2 (perf parse-events: Fix legacy cache events if event is
duplicated in a PMU) fixes a function deleted by patch 17 (perf
parse-events: Remove hard coded legacy hardware and cache
parsing). Adding the json exposed an issue when legacy cache (not
legacy hardware) and sysfs/json events exist. The fix is necessary to
keep tests passing through the series. It is also posted for backports
to stable trees.

The perf list behavior includes a lot more information and events. The
before behavior on a hybrid alderlake is:
```
$ perf list hw

List of pre-defined events (to be used in -e or -M):

  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  bus-cycles                                         [Hardware event]
  cache-misses                                       [Hardware event]
  cache-references                                   [Hardware event]
  cpu-cycles OR cycles                               [Hardware event]
  instructions                                       [Hardware event]
  ref-cycles                                         [Hardware event]
$ perf list hwcache

List of pre-defined events (to be used in -e or -M):


cache:
  L1-dcache-loads OR cpu_atom/L1-dcache-loads/
  L1-dcache-stores OR cpu_atom/L1-dcache-stores/
  L1-icache-loads OR cpu_atom/L1-icache-loads/
  L1-icache-load-misses OR cpu_atom/L1-icache-load-misses/
  LLC-loads OR cpu_atom/LLC-loads/
  LLC-load-misses OR cpu_atom/LLC-load-misses/
  LLC-stores OR cpu_atom/LLC-stores/
  LLC-store-misses OR cpu_atom/LLC-store-misses/
  dTLB-loads OR cpu_atom/dTLB-loads/
  dTLB-load-misses OR cpu_atom/dTLB-load-misses/
  dTLB-stores OR cpu_atom/dTLB-stores/
  dTLB-store-misses OR cpu_atom/dTLB-store-misses/
  iTLB-load-misses OR cpu_atom/iTLB-load-misses/
  branch-loads OR cpu_atom/branch-loads/
  branch-load-misses OR cpu_atom/branch-load-misses/
  L1-dcache-loads OR cpu_core/L1-dcache-loads/
  L1-dcache-load-misses OR cpu_core/L1-dcache-load-misses/
  L1-dcache-stores OR cpu_core/L1-dcache-stores/
  L1-icache-load-misses OR cpu_core/L1-icache-load-misses/
  LLC-loads OR cpu_core/LLC-loads/
  LLC-load-misses OR cpu_core/LLC-load-misses/
  LLC-stores OR cpu_core/LLC-stores/
  LLC-store-misses OR cpu_core/LLC-store-misses/
  dTLB-loads OR cpu_core/dTLB-loads/
  dTLB-load-misses OR cpu_core/dTLB-load-misses/
  dTLB-stores OR cpu_core/dTLB-stores/
  dTLB-store-misses OR cpu_core/dTLB-store-misses/
  iTLB-load-misses OR cpu_core/iTLB-load-misses/
  branch-loads OR cpu_core/branch-loads/
  branch-load-misses OR cpu_core/branch-load-misses/
  node-loads OR cpu_core/node-loads/
  node-load-misses OR cpu_core/node-load-misses/
```
and after it is:
```
$ perf list hw

legacy hardware:
  branch-instructions
       [Retired branch instructions [This event is an alias of branches].
        Unit: cpu_atom]
  branch-misses
       [Mispredicted branch instructions. Unit: cpu_atom]
  branches
       [Retired branch instructions [This event is an alias of
        branch-instructions]. Unit: cpu_atom]
  bus-cycles
       [Bus cycles,which can be different from total cycles. Unit: cpu_atom]
  cache-misses
       [Cache misses. Usually this indicates Last Level Cache misses; this is
        intended to be used in conjunction with the
        PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates.
        Unit: cpu_atom]
  cache-references
       [Cache accesses. Usually this indicates Last Level Cache accesses but
        this may vary depending on your CPU. This may include prefetches and
        coherency messages; again this depends on the design of your CPU.
        Unit: cpu_atom]
  cpu-cycles
       [Total cycles. Be wary of what happens during CPU frequency scaling
        [This event is an alias of cycles]. Unit: cpu_atom]
  cycles
       [Total cycles. Be wary of what happens during CPU frequency scaling
        [This event is an alias of cpu-cycles]. Unit: cpu_atom]
  instructions
       [Retired instructions. Be careful,these can be affected by various
        issues,most notably hardware interrupt counts. Unit: cpu_atom]
  ref-cycles
       [Total cycles; not affected by CPU frequency scaling. Unit: cpu_atom]
  branch-instructions
       [Retired branch instructions [This event is an alias of branches].
        Unit: cpu_core]
  branch-misses
       [Mispredicted branch instructions. Unit: cpu_core]
  branches
       [Retired branch instructions [This event is an alias of
        branch-instructions]. Unit: cpu_core]
  bus-cycles
       [Bus cycles,which can be different from total cycles. Unit: cpu_core]
  cache-misses
       [Cache misses. Usually this indicates Last Level Cache misses; this is
        intended to be used in conjunction with the
        PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates.
        Unit: cpu_core]
  cache-references
       [Cache accesses. Usually this indicates Last Level Cache accesses but
        this may vary depending on your CPU. This may include prefetches and
        coherency messages; again this depends on the design of your CPU.
        Unit: cpu_core]
  cpu-cycles
       [Total cycles. Be wary of what happens during CPU frequency scaling
        [This event is an alias of cycles]. Unit: cpu_core]
  cycles
       [Total cycles. Be wary of what happens during CPU frequency scaling
        [This event is an alias of cpu-cycles]. Unit: cpu_core]
  instructions
       [Retired instructions. Be careful,these can be affected by various
        issues,most notably hardware interrupt counts. Unit: cpu_core]
  ref-cycles
       [Total cycles; not affected by CPU frequency scaling. Unit: cpu_core]
$ perf list hwcache

legacy cache:
  branch-load-misses
       [Branch prediction unit read misses. Unit: cpu_atom]
  branch-loads
       [Branch prediction unit read accesses. Unit: cpu_atom]
  dtlb-load-misses
       [Data TLB read misses. Unit: cpu_atom]
  dtlb-loads
       [Data TLB read accesses. Unit: cpu_atom]
  dtlb-store-misses
       [Data TLB write misses. Unit: cpu_atom]
  dtlb-stores
       [Data TLB write accesses. Unit: cpu_atom]
  itlb-load-misses
       [Instruction TLB read misses. Unit: cpu_atom]
  l1-dcache-loads
       [Level 1 data cache read accesses. Unit: cpu_atom]
  l1-dcache-stores
       [Level 1 data cache write accesses. Unit: cpu_atom]
  l1-icache-load-misses
       [Level 1 instruction cache read misses. Unit: cpu_atom]
  l1-icache-loads
       [Level 1 instruction cache read accesses. Unit: cpu_atom]
  llc-load-misses
       [Last level cache read misses. Unit: cpu_atom]
  llc-loads
       [Last level cache read accesses. Unit: cpu_atom]
  llc-store-misses
       [Last level cache write misses. Unit: cpu_atom]
  llc-stores
       [Last level cache write accesses. Unit: cpu_atom]
  branch-load-misses
       [Branch prediction unit read misses. Unit: cpu_core]
  branch-loads
       [Branch prediction unit read accesses. Unit: cpu_core]
  dtlb-load-misses
       [Data TLB read misses. Unit: cpu_core]
  dtlb-loads
       [Data TLB read accesses. Unit: cpu_core]
  dtlb-store-misses
       [Data TLB write misses. Unit: cpu_core]
  dtlb-stores
       [Data TLB write accesses. Unit: cpu_core]
  itlb-load-misses
       [Instruction TLB read misses. Unit: cpu_core]
  l1-dcache-load-misses
       [Level 1 data cache read misses. Unit: cpu_core]
  l1-dcache-loads
       [Level 1 data cache read accesses. Unit: cpu_core]
  l1-dcache-stores
       [Level 1 data cache write accesses. Unit: cpu_core]
  l1-icache-load-misses
       [Level 1 instruction cache read misses. Unit: cpu_core]
  llc-load-misses
       [Last level cache read misses. Unit: cpu_core]
  llc-loads
       [Last level cache read accesses. Unit: cpu_core]
  llc-store-misses
       [Last level cache write misses. Unit: cpu_core]
  llc-stores
       [Last level cache write accesses. Unit: cpu_core]
  node-load-misses
       [Local memory read misses. Unit: cpu_core]
  node-loads
       [Local memory read accesses. Unit: cpu_core]
```

v6. Fix x86 hybrid mismatched number of evsels for the case a PMU is
    specified. Add patches to make failures in the parse-events test
    easier to diagnose. Reorder the perf stat default events patch to
    come earlier.

v5. Add patch for retrying default events, fixing regression when
    non-root and paranoid. Make cycles to cpu-cycles test event change
    (to avoid non-core ARM events) the default on all architectures
    (suggested by Namhyung). Switch all non-test cases to specifying a
    PMU. Improvements to the parse-events test including core PMU
    parsing support for architectures without a "cpu" PMU.
    https://lore.kernel.org/lkml/20250923041844.400164-1-irogers@google.com/

v4: Fixes for matching hard coded metrics in stat-shadow. Make the
    default "cycles" event string on ARM "cpu-cycles" which is the
    same legacy event but avoids name collisions on ARM PMUs. To
    support this, use evlist__new_default for the no command line
    event case in `perf record` and `perf top`. Make
    evlist__new_default only scan core PMUs.
    https://lore.kernel.org/lkml/20250914181121.1952748-1-irogers@google.com/#t

v3: Deprecate the legacy cache events that aren't shown in the
    previous perf list to avoid the perf list output being too verbose.
    https://lore.kernel.org/lkml/20250828205930.4007284-1-irogers@google.com/

v2: Additional details to the cover letter. Credit to Vince Weaver
    added to the commit message for the event details. Additional
    patches to clean up perf_pmu new_alias by removing an unused term
    scanner argument and avoid stdio usage.
    https://lore.kernel.org/lkml/20250828163225.3839073-1-irogers@google.com/

v1: https://lore.kernel.org/lkml/20250828064231.1762997-1-irogers@google.com/

Ian Rogers (28):
  perf stat: Allow retry for default events
  perf parse-events: Fix legacy cache events if event is duplicated in a
    PMU
  perf perf_api_probe: Avoid scanning all PMUs, try software PMU first
  perf stat: Avoid wildcarding PMUs for default events
  perf record: Skip don't fail for events that don't open
  perf jevents: Support copying the source json files to OUTPUT
  perf pmu: Don't eagerly parse event terms
  perf parse-events: Remove unused FILE input argument to scanner
  perf pmu: Use fd rather than FILE from new_alias
  perf pmu: Factor term parsing into a perf_event_attr into a helper
  perf parse-events: Add terms for legacy hardware and cache config
    values
  perf jevents: Add legacy json terms and default_core event table
    helper
  perf pmu: Add and use legacy_terms in alias information
  perf jevents: Add legacy-hardware and legacy-cache json
  perf print-events: Remove print_hwcache_events
  perf print-events: Remove print_symbol_events
  perf parse-events: Remove hard coded legacy hardware and cache parsing
  perf record: Use evlist__new_default when no events specified
  perf top: Use evlist__new_default when no events specified
  perf evlist: Avoid scanning all PMUs for evlist__new_default
  perf evsel: Improvements to __evsel__match
  perf test parse-events: Use evsel__match for legacy events
  perf test parse-events: Without a PMU use cpu-cycles rather than
    cycles
  perf test parse-events: Remove cpu PMU requirement
  perf test: Switch cycles event to cpu-cycles
  perf test: Clean up test_..config helpers
  perf test parse-events: Add evlist test helper
  perf test parse-events: Add evsel test helper

 tools/perf/Makefile.perf                      |   21 +-
 tools/perf/arch/x86/util/intel-pt.c           |    2 +-
 tools/perf/builtin-list.c                     |   34 +-
 tools/perf/builtin-record.c                   |   97 +-
 tools/perf/builtin-stat.c                     |  171 +-
 tools/perf/builtin-top.c                      |    8 +-
 tools/perf/pmu-events/Build                   |   24 +-
 .../arch/common/common/legacy-hardware.json   |   72 +
 tools/perf/pmu-events/empty-pmu-events.c      | 2771 ++++++++++++++++-
 tools/perf/pmu-events/jevents.py              |   32 +
 tools/perf/pmu-events/make_legacy_cache.py    |  129 +
 tools/perf/pmu-events/pmu-events.h            |    1 +
 tools/perf/tests/code-reading.c               |    2 +-
 tools/perf/tests/keep-tracking.c              |    2 +-
 tools/perf/tests/parse-events.c               | 2010 ++++++------
 tools/perf/tests/perf-time-to-tsc.c           |    4 +-
 tools/perf/tests/pmu-events.c                 |   24 +-
 tools/perf/tests/pmu.c                        |    3 +-
 tools/perf/tests/switch-tracking.c            |    2 +-
 tools/perf/util/evlist.c                      |   18 +-
 tools/perf/util/evsel.c                       |   21 +-
 tools/perf/util/parse-events.c                |  282 +-
 tools/perf/util/parse-events.h                |   22 +-
 tools/perf/util/parse-events.l                |   54 +-
 tools/perf/util/parse-events.y                |  114 +-
 tools/perf/util/perf_api_probe.c              |   27 +-
 tools/perf/util/pmu.c                         |  309 +-
 tools/perf/util/print-events.c                |  112 -
 tools/perf/util/print-events.h                |    4 -
 29 files changed, 4523 insertions(+), 1849 deletions(-)
 create mode 100644 tools/perf/pmu-events/arch/common/common/legacy-hardware.json
 create mode 100755 tools/perf/pmu-events/make_legacy_cache.py

-- 
2.51.0.534.gc79095c0ca-goog
Re: [PATCH v6 00/28] Legacy hardware/cache events as json
Posted by James Clark 11 hours ago

On 23/09/2025 11:32 pm, Ian Rogers wrote:
> Mirroring similar work for software events in commit 6e9fa4131abb
> ("perf parse-events: Remove non-json software events"). These changes
> migrate the legacy hardware and cache events to json.  With no hard
> coded legacy hardware or cache events the wild card, case
> insensitivity, etc. is consistent for events. This does, however, mean
> events like cycles will wild card against all PMUs. A change doing the
> same was originally posted and merged from:
> https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
> and reverted by Linus in commit 4f1b067359ac ("Revert "perf
> parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
> his dislike for the cycles behavior on ARM with perf record. Earlier
> patches in this series make perf record event opening failures
> non-fatal and hide the cycles event's failure to open on ARM in perf
> record, so it is expected the behavior will now be transparent in perf
> record on ARM. perf stat with a cycles event will wildcard open the
> event on all PMUs, however, with default events the cycles event will
> only be opened on core PMUs.
> 
> The key motivation for these patches is so that if, for example, you
> run `perf stat -e cpu-cycles ...` on a hybrid x86 and the results are
> printed out with "cpu_core/cpu-cycles/" and "cpu_atom/cpu-cycles/",
> the perf_event_attr for cpu-cycles and cpu_core/cpu-cycles/ be the
> same, similarly for the cpu_atom event. Prior to these patches the
> event with a PMU prefers sysfs/json over legacy encodings while with
> no PMU legacy encodings are preferred - these are different encodings
> on x86.
> 
> The change to support legacy events with PMUs was done to clean up
> Intel's hybrid PMU implementation. Having sysfs/json events with
> increased priority to legacy was requested by Mark Rutland
>   <mark.rutland@arm.com> to fix Apple-M PMU issues wrt broken legacy
> events on that PMU. It is believed the PMU driver is now fixed, but
> this has only been confirmed on ARM Juno boards. It was requested that
> RISC-V be able to add events to the perf tool json so the PMU driver
> didn't need to map legacy events to config encodings:
> https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
> This patch series achieves this.
> 
> A previous series of patches decreasing legacy hardware event
> priorities was posted in:
> https://lore.kernel.org/lkml/20250416045117.876775-1-irogers@google.com/
> Namhyung Kim <namhyung@kernel.org> mentioned that hardware and
> software events can be implemented similarly:
> https://lore.kernel.org/lkml/aIJmJns2lopxf3EK@google.com/
> and this patch series achieves this.
> 
> Note, patch 2 (perf parse-events: Fix legacy cache events if event is
> duplicated in a PMU) fixes a function deleted by patch 17 (perf
> parse-events: Remove hard coded legacy hardware and cache
> parsing). Adding the json exposed an issue when legacy cache (not
> legacy hardware) and sysfs/json events exist. The fix is necessary to
> keep tests passing through the series. It is also posted for backports
> to stable trees.
> 
> The perf list behavior includes a lot more information and events. The
> before behavior on a hybrid alderlake is:
> ```
> $ perf list hw
> 
> List of pre-defined events (to be used in -e or -M):
> 
>    branch-instructions OR branches                    [Hardware event]
>    branch-misses                                      [Hardware event]
>    bus-cycles                                         [Hardware event]
>    cache-misses                                       [Hardware event]
>    cache-references                                   [Hardware event]
>    cpu-cycles OR cycles                               [Hardware event]
>    instructions                                       [Hardware event]
>    ref-cycles                                         [Hardware event]
> $ perf list hwcache
> 
> List of pre-defined events (to be used in -e or -M):
> 
> 
> cache:
>    L1-dcache-loads OR cpu_atom/L1-dcache-loads/
>    L1-dcache-stores OR cpu_atom/L1-dcache-stores/
>    L1-icache-loads OR cpu_atom/L1-icache-loads/
>    L1-icache-load-misses OR cpu_atom/L1-icache-load-misses/
>    LLC-loads OR cpu_atom/LLC-loads/
>    LLC-load-misses OR cpu_atom/LLC-load-misses/
>    LLC-stores OR cpu_atom/LLC-stores/
>    LLC-store-misses OR cpu_atom/LLC-store-misses/
>    dTLB-loads OR cpu_atom/dTLB-loads/
>    dTLB-load-misses OR cpu_atom/dTLB-load-misses/
>    dTLB-stores OR cpu_atom/dTLB-stores/
>    dTLB-store-misses OR cpu_atom/dTLB-store-misses/
>    iTLB-load-misses OR cpu_atom/iTLB-load-misses/
>    branch-loads OR cpu_atom/branch-loads/
>    branch-load-misses OR cpu_atom/branch-load-misses/
>    L1-dcache-loads OR cpu_core/L1-dcache-loads/
>    L1-dcache-load-misses OR cpu_core/L1-dcache-load-misses/
>    L1-dcache-stores OR cpu_core/L1-dcache-stores/
>    L1-icache-load-misses OR cpu_core/L1-icache-load-misses/
>    LLC-loads OR cpu_core/LLC-loads/
>    LLC-load-misses OR cpu_core/LLC-load-misses/
>    LLC-stores OR cpu_core/LLC-stores/
>    LLC-store-misses OR cpu_core/LLC-store-misses/
>    dTLB-loads OR cpu_core/dTLB-loads/
>    dTLB-load-misses OR cpu_core/dTLB-load-misses/
>    dTLB-stores OR cpu_core/dTLB-stores/
>    dTLB-store-misses OR cpu_core/dTLB-store-misses/
>    iTLB-load-misses OR cpu_core/iTLB-load-misses/
>    branch-loads OR cpu_core/branch-loads/
>    branch-load-misses OR cpu_core/branch-load-misses/
>    node-loads OR cpu_core/node-loads/
>    node-load-misses OR cpu_core/node-load-misses/
> ```
> and after it is:
> ```
> $ perf list hw
> 
> legacy hardware:
>    branch-instructions
>         [Retired branch instructions [This event is an alias of branches].
>          Unit: cpu_atom]
>    branch-misses
>         [Mispredicted branch instructions. Unit: cpu_atom]
>    branches
>         [Retired branch instructions [This event is an alias of
>          branch-instructions]. Unit: cpu_atom]
>    bus-cycles
>         [Bus cycles,which can be different from total cycles. Unit: cpu_atom]
>    cache-misses
>         [Cache misses. Usually this indicates Last Level Cache misses; this is
>          intended to be used in conjunction with the
>          PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates.
>          Unit: cpu_atom]
>    cache-references
>         [Cache accesses. Usually this indicates Last Level Cache accesses but
>          this may vary depending on your CPU. This may include prefetches and
>          coherency messages; again this depends on the design of your CPU.
>          Unit: cpu_atom]
>    cpu-cycles
>         [Total cycles. Be wary of what happens during CPU frequency scaling
>          [This event is an alias of cycles]. Unit: cpu_atom]
>    cycles
>         [Total cycles. Be wary of what happens during CPU frequency scaling
>          [This event is an alias of cpu-cycles]. Unit: cpu_atom]
>    instructions
>         [Retired instructions. Be careful,these can be affected by various
>          issues,most notably hardware interrupt counts. Unit: cpu_atom]
>    ref-cycles
>         [Total cycles; not affected by CPU frequency scaling. Unit: cpu_atom]
>    branch-instructions
>         [Retired branch instructions [This event is an alias of branches].
>          Unit: cpu_core]
>    branch-misses
>         [Mispredicted branch instructions. Unit: cpu_core]
>    branches
>         [Retired branch instructions [This event is an alias of
>          branch-instructions]. Unit: cpu_core]
>    bus-cycles
>         [Bus cycles,which can be different from total cycles. Unit: cpu_core]
>    cache-misses
>         [Cache misses. Usually this indicates Last Level Cache misses; this is
>          intended to be used in conjunction with the
>          PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates.
>          Unit: cpu_core]
>    cache-references
>         [Cache accesses. Usually this indicates Last Level Cache accesses but
>          this may vary depending on your CPU. This may include prefetches and
>          coherency messages; again this depends on the design of your CPU.
>          Unit: cpu_core]
>    cpu-cycles
>         [Total cycles. Be wary of what happens during CPU frequency scaling
>          [This event is an alias of cycles]. Unit: cpu_core]
>    cycles
>         [Total cycles. Be wary of what happens during CPU frequency scaling
>          [This event is an alias of cpu-cycles]. Unit: cpu_core]
>    instructions
>         [Retired instructions. Be careful,these can be affected by various
>          issues,most notably hardware interrupt counts. Unit: cpu_core]
>    ref-cycles
>         [Total cycles; not affected by CPU frequency scaling. Unit: cpu_core]
> $ perf list hwcache
> 
> legacy cache:
>    branch-load-misses
>         [Branch prediction unit read misses. Unit: cpu_atom]
>    branch-loads
>         [Branch prediction unit read accesses. Unit: cpu_atom]
>    dtlb-load-misses
>         [Data TLB read misses. Unit: cpu_atom]
>    dtlb-loads
>         [Data TLB read accesses. Unit: cpu_atom]
>    dtlb-store-misses
>         [Data TLB write misses. Unit: cpu_atom]
>    dtlb-stores
>         [Data TLB write accesses. Unit: cpu_atom]
>    itlb-load-misses
>         [Instruction TLB read misses. Unit: cpu_atom]
>    l1-dcache-loads
>         [Level 1 data cache read accesses. Unit: cpu_atom]
>    l1-dcache-stores
>         [Level 1 data cache write accesses. Unit: cpu_atom]
>    l1-icache-load-misses
>         [Level 1 instruction cache read misses. Unit: cpu_atom]
>    l1-icache-loads
>         [Level 1 instruction cache read accesses. Unit: cpu_atom]
>    llc-load-misses
>         [Last level cache read misses. Unit: cpu_atom]
>    llc-loads
>         [Last level cache read accesses. Unit: cpu_atom]
>    llc-store-misses
>         [Last level cache write misses. Unit: cpu_atom]
>    llc-stores
>         [Last level cache write accesses. Unit: cpu_atom]
>    branch-load-misses
>         [Branch prediction unit read misses. Unit: cpu_core]
>    branch-loads
>         [Branch prediction unit read accesses. Unit: cpu_core]
>    dtlb-load-misses
>         [Data TLB read misses. Unit: cpu_core]
>    dtlb-loads
>         [Data TLB read accesses. Unit: cpu_core]
>    dtlb-store-misses
>         [Data TLB write misses. Unit: cpu_core]
>    dtlb-stores
>         [Data TLB write accesses. Unit: cpu_core]
>    itlb-load-misses
>         [Instruction TLB read misses. Unit: cpu_core]
>    l1-dcache-load-misses
>         [Level 1 data cache read misses. Unit: cpu_core]
>    l1-dcache-loads
>         [Level 1 data cache read accesses. Unit: cpu_core]
>    l1-dcache-stores
>         [Level 1 data cache write accesses. Unit: cpu_core]
>    l1-icache-load-misses
>         [Level 1 instruction cache read misses. Unit: cpu_core]
>    llc-load-misses
>         [Last level cache read misses. Unit: cpu_core]
>    llc-loads
>         [Last level cache read accesses. Unit: cpu_core]
>    llc-store-misses
>         [Last level cache write misses. Unit: cpu_core]
>    llc-stores
>         [Last level cache write accesses. Unit: cpu_core]
>    node-load-misses
>         [Local memory read misses. Unit: cpu_core]
>    node-loads
>         [Local memory read accesses. Unit: cpu_core]
> ```
> 
> v6. Fix x86 hybrid mismatched number of evsels for the case a PMU is
>      specified. Add patches to make failures in the parse-events test
>      easier to diagnose. Reorder the perf stat default events patch to
>      come earlier.
> 
> v5. Add patch for retrying default events, fixing regression when
>      non-root and paranoid. Make cycles to cpu-cycles test event change
>      (to avoid non-core ARM events) the default on all architectures
>      (suggested by Namhyung). Switch all non-test cases to specifying a
>      PMU. Improvements to the parse-events test including core PMU
>      parsing support for architectures without a "cpu" PMU.
>      https://lore.kernel.org/lkml/20250923041844.400164-1-irogers@google.com/
> 
> v4: Fixes for matching hard coded metrics in stat-shadow. Make the
>      default "cycles" event string on ARM "cpu-cycles" which is the
>      same legacy event but avoids name collisions on ARM PMUs. To
>      support this, use evlist__new_default for the no command line
>      event case in `perf record` and `perf top`. Make
>      evlist__new_default only scan core PMUs.
>      https://lore.kernel.org/lkml/20250914181121.1952748-1-irogers@google.com/#t
> 
> v3: Deprecate the legacy cache events that aren't shown in the
>      previous perf list to avoid the perf list output being too verbose.
>      https://lore.kernel.org/lkml/20250828205930.4007284-1-irogers@google.com/
> 

Hi Ian,

Did you drop the change to ignore failures to open events in favour of 
switching the default from "cycles" to "cpu-cycles" instead? I'm trying 
to follow the changelog but couldn't see it.

In v3 I got <not supported> for the uncore cycles event, but in v6 I get 
a complete failure:

  -> sudo perf-v3 stat -e cycles -- true

  Performance counter stats for 'true':

            1732478      cycles 

    <not supported>      arm_cmn_0/cycles/


  -> sudo perf-v6 stat -e cycles -- true
  Error:
  Invalid event (cycles) in per-thread mode, enable system wide with '-
  a'.

The verbose output shows that it tries both, but doesn't ignore the 
error on arm_cmn_0 anymore:

-> sudo perf-v6 stat -e cycles -vvv -- true
Control descriptor is not initialized
Opening: cycles
------------------------------------------------------------
perf_event_attr:
   type                             0 (PERF_TYPE_HARDWARE)
   size                             136
   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
   sample_type                      IDENTIFIER
   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
   disabled                         1
   inherit                          1
   enable_on_exec                   1
------------------------------------------------------------
sys_perf_event_open: pid 9646  cpu -1  group_fd -1  flags 0x8 = 3
Opening: cycles
------------------------------------------------------------
perf_event_attr:
   type                             11 (arm_cmn_0)
   size                             136
Required parameter 'wp_dev_sel' not specified
Required parameter 'wp_dev_sel' not specified
   config                           0x3 (cycles)
   sample_type                      IDENTIFIER
   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
   disabled                         1
   inherit                          1
   enable_on_exec                   1
------------------------------------------------------------
sys_perf_event_open: pid 9646  cpu 0  group_fd -1  flags 0x8
sys_perf_event_open failed, error -22
switching off exclude_guest for PMU arm_cmn_0
Using PERF_SAMPLE_READ / :S modifier is not compatible with inherit, 
falling back to no-inherit.
Warning:
cycles event is not supported by the kernel.
Error:
Invalid event (cycles) in per-thread mode, enable system wide with '-a'.


> v2: Additional details to the cover letter. Credit to Vince Weaver
>      added to the commit message for the event details. Additional
>      patches to clean up perf_pmu new_alias by removing an unused term
>      scanner argument and avoid stdio usage.
>      https://lore.kernel.org/lkml/20250828163225.3839073-1-irogers@google.com/
> 
> v1: https://lore.kernel.org/lkml/20250828064231.1762997-1-irogers@google.com/
> 
> Ian Rogers (28):
>    perf stat: Allow retry for default events
>    perf parse-events: Fix legacy cache events if event is duplicated in a
>      PMU
>    perf perf_api_probe: Avoid scanning all PMUs, try software PMU first
>    perf stat: Avoid wildcarding PMUs for default events
>    perf record: Skip don't fail for events that don't open
>    perf jevents: Support copying the source json files to OUTPUT
>    perf pmu: Don't eagerly parse event terms
>    perf parse-events: Remove unused FILE input argument to scanner
>    perf pmu: Use fd rather than FILE from new_alias
>    perf pmu: Factor term parsing into a perf_event_attr into a helper
>    perf parse-events: Add terms for legacy hardware and cache config
>      values
>    perf jevents: Add legacy json terms and default_core event table
>      helper
>    perf pmu: Add and use legacy_terms in alias information
>    perf jevents: Add legacy-hardware and legacy-cache json
>    perf print-events: Remove print_hwcache_events
>    perf print-events: Remove print_symbol_events
>    perf parse-events: Remove hard coded legacy hardware and cache parsing
>    perf record: Use evlist__new_default when no events specified
>    perf top: Use evlist__new_default when no events specified
>    perf evlist: Avoid scanning all PMUs for evlist__new_default
>    perf evsel: Improvements to __evsel__match
>    perf test parse-events: Use evsel__match for legacy events
>    perf test parse-events: Without a PMU use cpu-cycles rather than
>      cycles
>    perf test parse-events: Remove cpu PMU requirement
>    perf test: Switch cycles event to cpu-cycles
>    perf test: Clean up test_..config helpers
>    perf test parse-events: Add evlist test helper
>    perf test parse-events: Add evsel test helper
> 
>   tools/perf/Makefile.perf                      |   21 +-
>   tools/perf/arch/x86/util/intel-pt.c           |    2 +-
>   tools/perf/builtin-list.c                     |   34 +-
>   tools/perf/builtin-record.c                   |   97 +-
>   tools/perf/builtin-stat.c                     |  171 +-
>   tools/perf/builtin-top.c                      |    8 +-
>   tools/perf/pmu-events/Build                   |   24 +-
>   .../arch/common/common/legacy-hardware.json   |   72 +
>   tools/perf/pmu-events/empty-pmu-events.c      | 2771 ++++++++++++++++-
>   tools/perf/pmu-events/jevents.py              |   32 +
>   tools/perf/pmu-events/make_legacy_cache.py    |  129 +
>   tools/perf/pmu-events/pmu-events.h            |    1 +
>   tools/perf/tests/code-reading.c               |    2 +-
>   tools/perf/tests/keep-tracking.c              |    2 +-
>   tools/perf/tests/parse-events.c               | 2010 ++++++------
>   tools/perf/tests/perf-time-to-tsc.c           |    4 +-
>   tools/perf/tests/pmu-events.c                 |   24 +-
>   tools/perf/tests/pmu.c                        |    3 +-
>   tools/perf/tests/switch-tracking.c            |    2 +-
>   tools/perf/util/evlist.c                      |   18 +-
>   tools/perf/util/evsel.c                       |   21 +-
>   tools/perf/util/parse-events.c                |  282 +-
>   tools/perf/util/parse-events.h                |   22 +-
>   tools/perf/util/parse-events.l                |   54 +-
>   tools/perf/util/parse-events.y                |  114 +-
>   tools/perf/util/perf_api_probe.c              |   27 +-
>   tools/perf/util/pmu.c                         |  309 +-
>   tools/perf/util/print-events.c                |  112 -
>   tools/perf/util/print-events.h                |    4 -
>   29 files changed, 4523 insertions(+), 1849 deletions(-)
>   create mode 100644 tools/perf/pmu-events/arch/common/common/legacy-hardware.json
>   create mode 100755 tools/perf/pmu-events/make_legacy_cache.py
>
Re: [PATCH v6 00/28] Legacy hardware/cache events as json
Posted by Ian Rogers 9 hours ago
On Wed, Oct 1, 2025 at 6:38 AM James Clark <james.clark@linaro.org> wrote:
>
>
>
> On 23/09/2025 11:32 pm, Ian Rogers wrote:
> > Mirroring similar work for software events in commit 6e9fa4131abb
> > ("perf parse-events: Remove non-json software events"). These changes
> > migrate the legacy hardware and cache events to json.  With no hard
> > coded legacy hardware or cache events the wild card, case
> > insensitivity, etc. is consistent for events. This does, however, mean
> > events like cycles will wild card against all PMUs. A change doing the
> > same was originally posted and merged from:
> > https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
> > and reverted by Linus in commit 4f1b067359ac ("Revert "perf
> > parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
> > his dislike for the cycles behavior on ARM with perf record. Earlier
> > patches in this series make perf record event opening failures
> > non-fatal and hide the cycles event's failure to open on ARM in perf
> > record, so it is expected the behavior will now be transparent in perf
> > record on ARM. perf stat with a cycles event will wildcard open the
> > event on all PMUs, however, with default events the cycles event will
> > only be opened on core PMUs.
> >
> > The key motivation for these patches is so that if, for example, you
> > run `perf stat -e cpu-cycles ...` on a hybrid x86 and the results are
> > printed out with "cpu_core/cpu-cycles/" and "cpu_atom/cpu-cycles/",
> > the perf_event_attr for cpu-cycles and cpu_core/cpu-cycles/ be the
> > same, similarly for the cpu_atom event. Prior to these patches the
> > event with a PMU prefers sysfs/json over legacy encodings while with
> > no PMU legacy encodings are preferred - these are different encodings
> > on x86.
> >
> > The change to support legacy events with PMUs was done to clean up
> > Intel's hybrid PMU implementation. Having sysfs/json events with
> > increased priority to legacy was requested by Mark Rutland
> >   <mark.rutland@arm.com> to fix Apple-M PMU issues wrt broken legacy
> > events on that PMU. It is believed the PMU driver is now fixed, but
> > this has only been confirmed on ARM Juno boards. It was requested that
> > RISC-V be able to add events to the perf tool json so the PMU driver
> > didn't need to map legacy events to config encodings:
> > https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
> > This patch series achieves this.
> >
> > A previous series of patches decreasing legacy hardware event
> > priorities was posted in:
> > https://lore.kernel.org/lkml/20250416045117.876775-1-irogers@google.com/
> > Namhyung Kim <namhyung@kernel.org> mentioned that hardware and
> > software events can be implemented similarly:
> > https://lore.kernel.org/lkml/aIJmJns2lopxf3EK@google.com/
> > and this patch series achieves this.
> >
> > Note, patch 2 (perf parse-events: Fix legacy cache events if event is
> > duplicated in a PMU) fixes a function deleted by patch 17 (perf
> > parse-events: Remove hard coded legacy hardware and cache
> > parsing). Adding the json exposed an issue when legacy cache (not
> > legacy hardware) and sysfs/json events exist. The fix is necessary to
> > keep tests passing through the series. It is also posted for backports
> > to stable trees.
> >
> > The perf list behavior includes a lot more information and events. The
> > before behavior on a hybrid alderlake is:
> > ```
> > $ perf list hw
> >
> > List of pre-defined events (to be used in -e or -M):
> >
> >    branch-instructions OR branches                    [Hardware event]
> >    branch-misses                                      [Hardware event]
> >    bus-cycles                                         [Hardware event]
> >    cache-misses                                       [Hardware event]
> >    cache-references                                   [Hardware event]
> >    cpu-cycles OR cycles                               [Hardware event]
> >    instructions                                       [Hardware event]
> >    ref-cycles                                         [Hardware event]
> > $ perf list hwcache
> >
> > List of pre-defined events (to be used in -e or -M):
> >
> >
> > cache:
> >    L1-dcache-loads OR cpu_atom/L1-dcache-loads/
> >    L1-dcache-stores OR cpu_atom/L1-dcache-stores/
> >    L1-icache-loads OR cpu_atom/L1-icache-loads/
> >    L1-icache-load-misses OR cpu_atom/L1-icache-load-misses/
> >    LLC-loads OR cpu_atom/LLC-loads/
> >    LLC-load-misses OR cpu_atom/LLC-load-misses/
> >    LLC-stores OR cpu_atom/LLC-stores/
> >    LLC-store-misses OR cpu_atom/LLC-store-misses/
> >    dTLB-loads OR cpu_atom/dTLB-loads/
> >    dTLB-load-misses OR cpu_atom/dTLB-load-misses/
> >    dTLB-stores OR cpu_atom/dTLB-stores/
> >    dTLB-store-misses OR cpu_atom/dTLB-store-misses/
> >    iTLB-load-misses OR cpu_atom/iTLB-load-misses/
> >    branch-loads OR cpu_atom/branch-loads/
> >    branch-load-misses OR cpu_atom/branch-load-misses/
> >    L1-dcache-loads OR cpu_core/L1-dcache-loads/
> >    L1-dcache-load-misses OR cpu_core/L1-dcache-load-misses/
> >    L1-dcache-stores OR cpu_core/L1-dcache-stores/
> >    L1-icache-load-misses OR cpu_core/L1-icache-load-misses/
> >    LLC-loads OR cpu_core/LLC-loads/
> >    LLC-load-misses OR cpu_core/LLC-load-misses/
> >    LLC-stores OR cpu_core/LLC-stores/
> >    LLC-store-misses OR cpu_core/LLC-store-misses/
> >    dTLB-loads OR cpu_core/dTLB-loads/
> >    dTLB-load-misses OR cpu_core/dTLB-load-misses/
> >    dTLB-stores OR cpu_core/dTLB-stores/
> >    dTLB-store-misses OR cpu_core/dTLB-store-misses/
> >    iTLB-load-misses OR cpu_core/iTLB-load-misses/
> >    branch-loads OR cpu_core/branch-loads/
> >    branch-load-misses OR cpu_core/branch-load-misses/
> >    node-loads OR cpu_core/node-loads/
> >    node-load-misses OR cpu_core/node-load-misses/
> > ```
> > and after it is:
> > ```
> > $ perf list hw
> >
> > legacy hardware:
> >    branch-instructions
> >         [Retired branch instructions [This event is an alias of branches].
> >          Unit: cpu_atom]
> >    branch-misses
> >         [Mispredicted branch instructions. Unit: cpu_atom]
> >    branches
> >         [Retired branch instructions [This event is an alias of
> >          branch-instructions]. Unit: cpu_atom]
> >    bus-cycles
> >         [Bus cycles,which can be different from total cycles. Unit: cpu_atom]
> >    cache-misses
> >         [Cache misses. Usually this indicates Last Level Cache misses; this is
> >          intended to be used in conjunction with the
> >          PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates.
> >          Unit: cpu_atom]
> >    cache-references
> >         [Cache accesses. Usually this indicates Last Level Cache accesses but
> >          this may vary depending on your CPU. This may include prefetches and
> >          coherency messages; again this depends on the design of your CPU.
> >          Unit: cpu_atom]
> >    cpu-cycles
> >         [Total cycles. Be wary of what happens during CPU frequency scaling
> >          [This event is an alias of cycles]. Unit: cpu_atom]
> >    cycles
> >         [Total cycles. Be wary of what happens during CPU frequency scaling
> >          [This event is an alias of cpu-cycles]. Unit: cpu_atom]
> >    instructions
> >         [Retired instructions. Be careful,these can be affected by various
> >          issues,most notably hardware interrupt counts. Unit: cpu_atom]
> >    ref-cycles
> >         [Total cycles; not affected by CPU frequency scaling. Unit: cpu_atom]
> >    branch-instructions
> >         [Retired branch instructions [This event is an alias of branches].
> >          Unit: cpu_core]
> >    branch-misses
> >         [Mispredicted branch instructions. Unit: cpu_core]
> >    branches
> >         [Retired branch instructions [This event is an alias of
> >          branch-instructions]. Unit: cpu_core]
> >    bus-cycles
> >         [Bus cycles,which can be different from total cycles. Unit: cpu_core]
> >    cache-misses
> >         [Cache misses. Usually this indicates Last Level Cache misses; this is
> >          intended to be used in conjunction with the
> >          PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates.
> >          Unit: cpu_core]
> >    cache-references
> >         [Cache accesses. Usually this indicates Last Level Cache accesses but
> >          this may vary depending on your CPU. This may include prefetches and
> >          coherency messages; again this depends on the design of your CPU.
> >          Unit: cpu_core]
> >    cpu-cycles
> >         [Total cycles. Be wary of what happens during CPU frequency scaling
> >          [This event is an alias of cycles]. Unit: cpu_core]
> >    cycles
> >         [Total cycles. Be wary of what happens during CPU frequency scaling
> >          [This event is an alias of cpu-cycles]. Unit: cpu_core]
> >    instructions
> >         [Retired instructions. Be careful,these can be affected by various
> >          issues,most notably hardware interrupt counts. Unit: cpu_core]
> >    ref-cycles
> >         [Total cycles; not affected by CPU frequency scaling. Unit: cpu_core]
> > $ perf list hwcache
> >
> > legacy cache:
> >    branch-load-misses
> >         [Branch prediction unit read misses. Unit: cpu_atom]
> >    branch-loads
> >         [Branch prediction unit read accesses. Unit: cpu_atom]
> >    dtlb-load-misses
> >         [Data TLB read misses. Unit: cpu_atom]
> >    dtlb-loads
> >         [Data TLB read accesses. Unit: cpu_atom]
> >    dtlb-store-misses
> >         [Data TLB write misses. Unit: cpu_atom]
> >    dtlb-stores
> >         [Data TLB write accesses. Unit: cpu_atom]
> >    itlb-load-misses
> >         [Instruction TLB read misses. Unit: cpu_atom]
> >    l1-dcache-loads
> >         [Level 1 data cache read accesses. Unit: cpu_atom]
> >    l1-dcache-stores
> >         [Level 1 data cache write accesses. Unit: cpu_atom]
> >    l1-icache-load-misses
> >         [Level 1 instruction cache read misses. Unit: cpu_atom]
> >    l1-icache-loads
> >         [Level 1 instruction cache read accesses. Unit: cpu_atom]
> >    llc-load-misses
> >         [Last level cache read misses. Unit: cpu_atom]
> >    llc-loads
> >         [Last level cache read accesses. Unit: cpu_atom]
> >    llc-store-misses
> >         [Last level cache write misses. Unit: cpu_atom]
> >    llc-stores
> >         [Last level cache write accesses. Unit: cpu_atom]
> >    branch-load-misses
> >         [Branch prediction unit read misses. Unit: cpu_core]
> >    branch-loads
> >         [Branch prediction unit read accesses. Unit: cpu_core]
> >    dtlb-load-misses
> >         [Data TLB read misses. Unit: cpu_core]
> >    dtlb-loads
> >         [Data TLB read accesses. Unit: cpu_core]
> >    dtlb-store-misses
> >         [Data TLB write misses. Unit: cpu_core]
> >    dtlb-stores
> >         [Data TLB write accesses. Unit: cpu_core]
> >    itlb-load-misses
> >         [Instruction TLB read misses. Unit: cpu_core]
> >    l1-dcache-load-misses
> >         [Level 1 data cache read misses. Unit: cpu_core]
> >    l1-dcache-loads
> >         [Level 1 data cache read accesses. Unit: cpu_core]
> >    l1-dcache-stores
> >         [Level 1 data cache write accesses. Unit: cpu_core]
> >    l1-icache-load-misses
> >         [Level 1 instruction cache read misses. Unit: cpu_core]
> >    llc-load-misses
> >         [Last level cache read misses. Unit: cpu_core]
> >    llc-loads
> >         [Last level cache read accesses. Unit: cpu_core]
> >    llc-store-misses
> >         [Last level cache write misses. Unit: cpu_core]
> >    llc-stores
> >         [Last level cache write accesses. Unit: cpu_core]
> >    node-load-misses
> >         [Local memory read misses. Unit: cpu_core]
> >    node-loads
> >         [Local memory read accesses. Unit: cpu_core]
> > ```
> >
> > v6. Fix x86 hybrid mismatched number of evsels for the case a PMU is
> >      specified. Add patches to make failures in the parse-events test
> >      easier to diagnose. Reorder the perf stat default events patch to
> >      come earlier.
> >
> > v5. Add patch for retrying default events, fixing regression when
> >      non-root and paranoid. Make cycles to cpu-cycles test event change
> >      (to avoid non-core ARM events) the default on all architectures
> >      (suggested by Namhyung). Switch all non-test cases to specifying a
> >      PMU. Improvements to the parse-events test including core PMU
> >      parsing support for architectures without a "cpu" PMU.
> >      https://lore.kernel.org/lkml/20250923041844.400164-1-irogers@google.com/
> >
> > v4: Fixes for matching hard coded metrics in stat-shadow. Make the
> >      default "cycles" event string on ARM "cpu-cycles" which is the
> >      same legacy event but avoids name collisions on ARM PMUs. To
> >      support this, use evlist__new_default for the no command line
> >      event case in `perf record` and `perf top`. Make
> >      evlist__new_default only scan core PMUs.
> >      https://lore.kernel.org/lkml/20250914181121.1952748-1-irogers@google.com/#t
> >
> > v3: Deprecate the legacy cache events that aren't shown in the
> >      previous perf list to avoid the perf list output being too verbose.
> >      https://lore.kernel.org/lkml/20250828205930.4007284-1-irogers@google.com/
> >
>
> Hi Ian,
>
> Did you drop the change to ignore failures to open events in favour of
> switching the default from "cycles" to "cpu-cycles" instead? I'm trying
> to follow the changelog but couldn't see it.

Hi James,

your example is using `perf stat` whilst the behavior to ignore
failing to open events is for `perf record` and added in:
https://lore.kernel.org/lkml/20250923223312.238185-6-irogers@google.com/
Changing the behavior for `perf stat` hasn't been in scope for making
the behavior of legacy events consistent in any patch series. The
whole '<not counted>' vs '<not supported>' vs an early exit is
something where not changing behavior has been the name of the game
for many years due to potential reliance on the behavior.

Fwiw, this series migrates legacy events to json we should be able to
do the same with the legacy hard coded metrics in `perf stat` once it
lands. Removing the hard coded metrics will give better metrics as we
can use event groups, share events between metrics, .. and only
display requested metrics when metrics are requested. My point is that
there is further `perf stat` clean up that this work moves us toward
but at 28 patches already I'd like not to start work on making perf
stat a better place. Getting consensus on what that better place is
given the existing fragmentic behavior will be a challenge, but at
least these days we have some form of tests, albeit they tend to flake
all the time. To solve your problems below where you're specifically
picking an event that can wildcard, give the PMU or use cpu-cycles, or
one of the many other names ARM already has to mean this. Overloading
the event 'cycles' wasn't something introduced in this series and the
fact changing the name hasn't happened in the drivers is really very
frustrating.

Thanks,
Ian


> In v3 I got <not supported> for the uncore cycles event, but in v6 I get
> a complete failure:
>
>   -> sudo perf-v3 stat -e cycles -- true
>
>   Performance counter stats for 'true':
>
>             1732478      cycles
>
>     <not supported>      arm_cmn_0/cycles/
>
>
>   -> sudo perf-v6 stat -e cycles -- true
>   Error:
>   Invalid event (cycles) in per-thread mode, enable system wide with '-
>   a'.
>
> The verbose output shows that it tries both, but doesn't ignore the
> error on arm_cmn_0 anymore:
>
> -> sudo perf-v6 stat -e cycles -vvv -- true
> Control descriptor is not initialized
> Opening: cycles
> ------------------------------------------------------------
> perf_event_attr:
>    type                             0 (PERF_TYPE_HARDWARE)
>    size                             136
>    config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>    sample_type                      IDENTIFIER
>    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>    disabled                         1
>    inherit                          1
>    enable_on_exec                   1
> ------------------------------------------------------------
> sys_perf_event_open: pid 9646  cpu -1  group_fd -1  flags 0x8 = 3
> Opening: cycles
> ------------------------------------------------------------
> perf_event_attr:
>    type                             11 (arm_cmn_0)
>    size                             136
> Required parameter 'wp_dev_sel' not specified
> Required parameter 'wp_dev_sel' not specified
>    config                           0x3 (cycles)
>    sample_type                      IDENTIFIER
>    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>    disabled                         1
>    inherit                          1
>    enable_on_exec                   1
> ------------------------------------------------------------
> sys_perf_event_open: pid 9646  cpu 0  group_fd -1  flags 0x8
> sys_perf_event_open failed, error -22
> switching off exclude_guest for PMU arm_cmn_0
> Using PERF_SAMPLE_READ / :S modifier is not compatible with inherit,
> falling back to no-inherit.
> Warning:
> cycles event is not supported by the kernel.
> Error:
> Invalid event (cycles) in per-thread mode, enable system wide with '-a'.
>
>
> > v2: Additional details to the cover letter. Credit to Vince Weaver
> >      added to the commit message for the event details. Additional
> >      patches to clean up perf_pmu new_alias by removing an unused term
> >      scanner argument and avoid stdio usage.
> >      https://lore.kernel.org/lkml/20250828163225.3839073-1-irogers@google.com/
> >
> > v1: https://lore.kernel.org/lkml/20250828064231.1762997-1-irogers@google.com/
> >
> > Ian Rogers (28):
> >    perf stat: Allow retry for default events
> >    perf parse-events: Fix legacy cache events if event is duplicated in a
> >      PMU
> >    perf perf_api_probe: Avoid scanning all PMUs, try software PMU first
> >    perf stat: Avoid wildcarding PMUs for default events
> >    perf record: Skip don't fail for events that don't open
> >    perf jevents: Support copying the source json files to OUTPUT
> >    perf pmu: Don't eagerly parse event terms
> >    perf parse-events: Remove unused FILE input argument to scanner
> >    perf pmu: Use fd rather than FILE from new_alias
> >    perf pmu: Factor term parsing into a perf_event_attr into a helper
> >    perf parse-events: Add terms for legacy hardware and cache config
> >      values
> >    perf jevents: Add legacy json terms and default_core event table
> >      helper
> >    perf pmu: Add and use legacy_terms in alias information
> >    perf jevents: Add legacy-hardware and legacy-cache json
> >    perf print-events: Remove print_hwcache_events
> >    perf print-events: Remove print_symbol_events
> >    perf parse-events: Remove hard coded legacy hardware and cache parsing
> >    perf record: Use evlist__new_default when no events specified
> >    perf top: Use evlist__new_default when no events specified
> >    perf evlist: Avoid scanning all PMUs for evlist__new_default
> >    perf evsel: Improvements to __evsel__match
> >    perf test parse-events: Use evsel__match for legacy events
> >    perf test parse-events: Without a PMU use cpu-cycles rather than
> >      cycles
> >    perf test parse-events: Remove cpu PMU requirement
> >    perf test: Switch cycles event to cpu-cycles
> >    perf test: Clean up test_..config helpers
> >    perf test parse-events: Add evlist test helper
> >    perf test parse-events: Add evsel test helper
> >
> >   tools/perf/Makefile.perf                      |   21 +-
> >   tools/perf/arch/x86/util/intel-pt.c           |    2 +-
> >   tools/perf/builtin-list.c                     |   34 +-
> >   tools/perf/builtin-record.c                   |   97 +-
> >   tools/perf/builtin-stat.c                     |  171 +-
> >   tools/perf/builtin-top.c                      |    8 +-
> >   tools/perf/pmu-events/Build                   |   24 +-
> >   .../arch/common/common/legacy-hardware.json   |   72 +
> >   tools/perf/pmu-events/empty-pmu-events.c      | 2771 ++++++++++++++++-
> >   tools/perf/pmu-events/jevents.py              |   32 +
> >   tools/perf/pmu-events/make_legacy_cache.py    |  129 +
> >   tools/perf/pmu-events/pmu-events.h            |    1 +
> >   tools/perf/tests/code-reading.c               |    2 +-
> >   tools/perf/tests/keep-tracking.c              |    2 +-
> >   tools/perf/tests/parse-events.c               | 2010 ++++++------
> >   tools/perf/tests/perf-time-to-tsc.c           |    4 +-
> >   tools/perf/tests/pmu-events.c                 |   24 +-
> >   tools/perf/tests/pmu.c                        |    3 +-
> >   tools/perf/tests/switch-tracking.c            |    2 +-
> >   tools/perf/util/evlist.c                      |   18 +-
> >   tools/perf/util/evsel.c                       |   21 +-
> >   tools/perf/util/parse-events.c                |  282 +-
> >   tools/perf/util/parse-events.h                |   22 +-
> >   tools/perf/util/parse-events.l                |   54 +-
> >   tools/perf/util/parse-events.y                |  114 +-
> >   tools/perf/util/perf_api_probe.c              |   27 +-
> >   tools/perf/util/pmu.c                         |  309 +-
> >   tools/perf/util/print-events.c                |  112 -
> >   tools/perf/util/print-events.h                |    4 -
> >   29 files changed, 4523 insertions(+), 1849 deletions(-)
> >   create mode 100644 tools/perf/pmu-events/arch/common/common/legacy-hardware.json
> >   create mode 100755 tools/perf/pmu-events/make_legacy_cache.py
> >
>
Re: [PATCH v6 00/28] Legacy hardware/cache events as json
Posted by Ian Rogers 4 hours ago
On Wed, Oct 1, 2025 at 8:12 AM Ian Rogers <irogers@google.com> wrote:
>
> On Wed, Oct 1, 2025 at 6:38 AM James Clark <james.clark@linaro.org> wrote:
> >
> >
> >
> > On 23/09/2025 11:32 pm, Ian Rogers wrote:
> > > Mirroring similar work for software events in commit 6e9fa4131abb
> > > ("perf parse-events: Remove non-json software events"). These changes
> > > migrate the legacy hardware and cache events to json.  With no hard
> > > coded legacy hardware or cache events the wild card, case
> > > insensitivity, etc. is consistent for events. This does, however, mean
> > > events like cycles will wild card against all PMUs. A change doing the
> > > same was originally posted and merged from:
> > > https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
> > > and reverted by Linus in commit 4f1b067359ac ("Revert "perf
> > > parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
> > > his dislike for the cycles behavior on ARM with perf record. Earlier
> > > patches in this series make perf record event opening failures
> > > non-fatal and hide the cycles event's failure to open on ARM in perf
> > > record, so it is expected the behavior will now be transparent in perf
> > > record on ARM. perf stat with a cycles event will wildcard open the
> > > event on all PMUs, however, with default events the cycles event will
> > > only be opened on core PMUs.
> > >
> > > The key motivation for these patches is so that if, for example, you
> > > run `perf stat -e cpu-cycles ...` on a hybrid x86 and the results are
> > > printed out with "cpu_core/cpu-cycles/" and "cpu_atom/cpu-cycles/",
> > > the perf_event_attr for cpu-cycles and cpu_core/cpu-cycles/ be the
> > > same, similarly for the cpu_atom event. Prior to these patches the
> > > event with a PMU prefers sysfs/json over legacy encodings while with
> > > no PMU legacy encodings are preferred - these are different encodings
> > > on x86.
> > >
> > > The change to support legacy events with PMUs was done to clean up
> > > Intel's hybrid PMU implementation. Having sysfs/json events with
> > > increased priority to legacy was requested by Mark Rutland
> > >   <mark.rutland@arm.com> to fix Apple-M PMU issues wrt broken legacy
> > > events on that PMU. It is believed the PMU driver is now fixed, but
> > > this has only been confirmed on ARM Juno boards. It was requested that
> > > RISC-V be able to add events to the perf tool json so the PMU driver
> > > didn't need to map legacy events to config encodings:
> > > https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
> > > This patch series achieves this.
> > >
> > > A previous series of patches decreasing legacy hardware event
> > > priorities was posted in:
> > > https://lore.kernel.org/lkml/20250416045117.876775-1-irogers@google.com/
> > > Namhyung Kim <namhyung@kernel.org> mentioned that hardware and
> > > software events can be implemented similarly:
> > > https://lore.kernel.org/lkml/aIJmJns2lopxf3EK@google.com/
> > > and this patch series achieves this.
> > >
> > > Note, patch 2 (perf parse-events: Fix legacy cache events if event is
> > > duplicated in a PMU) fixes a function deleted by patch 17 (perf
> > > parse-events: Remove hard coded legacy hardware and cache
> > > parsing). Adding the json exposed an issue when legacy cache (not
> > > legacy hardware) and sysfs/json events exist. The fix is necessary to
> > > keep tests passing through the series. It is also posted for backports
> > > to stable trees.
> > >
> > > The perf list behavior includes a lot more information and events. The
> > > before behavior on a hybrid alderlake is:
> > > ```
> > > $ perf list hw
> > >
> > > List of pre-defined events (to be used in -e or -M):
> > >
> > >    branch-instructions OR branches                    [Hardware event]
> > >    branch-misses                                      [Hardware event]
> > >    bus-cycles                                         [Hardware event]
> > >    cache-misses                                       [Hardware event]
> > >    cache-references                                   [Hardware event]
> > >    cpu-cycles OR cycles                               [Hardware event]
> > >    instructions                                       [Hardware event]
> > >    ref-cycles                                         [Hardware event]
> > > $ perf list hwcache
> > >
> > > List of pre-defined events (to be used in -e or -M):
> > >
> > >
> > > cache:
> > >    L1-dcache-loads OR cpu_atom/L1-dcache-loads/
> > >    L1-dcache-stores OR cpu_atom/L1-dcache-stores/
> > >    L1-icache-loads OR cpu_atom/L1-icache-loads/
> > >    L1-icache-load-misses OR cpu_atom/L1-icache-load-misses/
> > >    LLC-loads OR cpu_atom/LLC-loads/
> > >    LLC-load-misses OR cpu_atom/LLC-load-misses/
> > >    LLC-stores OR cpu_atom/LLC-stores/
> > >    LLC-store-misses OR cpu_atom/LLC-store-misses/
> > >    dTLB-loads OR cpu_atom/dTLB-loads/
> > >    dTLB-load-misses OR cpu_atom/dTLB-load-misses/
> > >    dTLB-stores OR cpu_atom/dTLB-stores/
> > >    dTLB-store-misses OR cpu_atom/dTLB-store-misses/
> > >    iTLB-load-misses OR cpu_atom/iTLB-load-misses/
> > >    branch-loads OR cpu_atom/branch-loads/
> > >    branch-load-misses OR cpu_atom/branch-load-misses/
> > >    L1-dcache-loads OR cpu_core/L1-dcache-loads/
> > >    L1-dcache-load-misses OR cpu_core/L1-dcache-load-misses/
> > >    L1-dcache-stores OR cpu_core/L1-dcache-stores/
> > >    L1-icache-load-misses OR cpu_core/L1-icache-load-misses/
> > >    LLC-loads OR cpu_core/LLC-loads/
> > >    LLC-load-misses OR cpu_core/LLC-load-misses/
> > >    LLC-stores OR cpu_core/LLC-stores/
> > >    LLC-store-misses OR cpu_core/LLC-store-misses/
> > >    dTLB-loads OR cpu_core/dTLB-loads/
> > >    dTLB-load-misses OR cpu_core/dTLB-load-misses/
> > >    dTLB-stores OR cpu_core/dTLB-stores/
> > >    dTLB-store-misses OR cpu_core/dTLB-store-misses/
> > >    iTLB-load-misses OR cpu_core/iTLB-load-misses/
> > >    branch-loads OR cpu_core/branch-loads/
> > >    branch-load-misses OR cpu_core/branch-load-misses/
> > >    node-loads OR cpu_core/node-loads/
> > >    node-load-misses OR cpu_core/node-load-misses/
> > > ```
> > > and after it is:
> > > ```
> > > $ perf list hw
> > >
> > > legacy hardware:
> > >    branch-instructions
> > >         [Retired branch instructions [This event is an alias of branches].
> > >          Unit: cpu_atom]
> > >    branch-misses
> > >         [Mispredicted branch instructions. Unit: cpu_atom]
> > >    branches
> > >         [Retired branch instructions [This event is an alias of
> > >          branch-instructions]. Unit: cpu_atom]
> > >    bus-cycles
> > >         [Bus cycles,which can be different from total cycles. Unit: cpu_atom]
> > >    cache-misses
> > >         [Cache misses. Usually this indicates Last Level Cache misses; this is
> > >          intended to be used in conjunction with the
> > >          PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates.
> > >          Unit: cpu_atom]
> > >    cache-references
> > >         [Cache accesses. Usually this indicates Last Level Cache accesses but
> > >          this may vary depending on your CPU. This may include prefetches and
> > >          coherency messages; again this depends on the design of your CPU.
> > >          Unit: cpu_atom]
> > >    cpu-cycles
> > >         [Total cycles. Be wary of what happens during CPU frequency scaling
> > >          [This event is an alias of cycles]. Unit: cpu_atom]
> > >    cycles
> > >         [Total cycles. Be wary of what happens during CPU frequency scaling
> > >          [This event is an alias of cpu-cycles]. Unit: cpu_atom]
> > >    instructions
> > >         [Retired instructions. Be careful,these can be affected by various
> > >          issues,most notably hardware interrupt counts. Unit: cpu_atom]
> > >    ref-cycles
> > >         [Total cycles; not affected by CPU frequency scaling. Unit: cpu_atom]
> > >    branch-instructions
> > >         [Retired branch instructions [This event is an alias of branches].
> > >          Unit: cpu_core]
> > >    branch-misses
> > >         [Mispredicted branch instructions. Unit: cpu_core]
> > >    branches
> > >         [Retired branch instructions [This event is an alias of
> > >          branch-instructions]. Unit: cpu_core]
> > >    bus-cycles
> > >         [Bus cycles,which can be different from total cycles. Unit: cpu_core]
> > >    cache-misses
> > >         [Cache misses. Usually this indicates Last Level Cache misses; this is
> > >          intended to be used in conjunction with the
> > >          PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates.
> > >          Unit: cpu_core]
> > >    cache-references
> > >         [Cache accesses. Usually this indicates Last Level Cache accesses but
> > >          this may vary depending on your CPU. This may include prefetches and
> > >          coherency messages; again this depends on the design of your CPU.
> > >          Unit: cpu_core]
> > >    cpu-cycles
> > >         [Total cycles. Be wary of what happens during CPU frequency scaling
> > >          [This event is an alias of cycles]. Unit: cpu_core]
> > >    cycles
> > >         [Total cycles. Be wary of what happens during CPU frequency scaling
> > >          [This event is an alias of cpu-cycles]. Unit: cpu_core]
> > >    instructions
> > >         [Retired instructions. Be careful,these can be affected by various
> > >          issues,most notably hardware interrupt counts. Unit: cpu_core]
> > >    ref-cycles
> > >         [Total cycles; not affected by CPU frequency scaling. Unit: cpu_core]
> > > $ perf list hwcache
> > >
> > > legacy cache:
> > >    branch-load-misses
> > >         [Branch prediction unit read misses. Unit: cpu_atom]
> > >    branch-loads
> > >         [Branch prediction unit read accesses. Unit: cpu_atom]
> > >    dtlb-load-misses
> > >         [Data TLB read misses. Unit: cpu_atom]
> > >    dtlb-loads
> > >         [Data TLB read accesses. Unit: cpu_atom]
> > >    dtlb-store-misses
> > >         [Data TLB write misses. Unit: cpu_atom]
> > >    dtlb-stores
> > >         [Data TLB write accesses. Unit: cpu_atom]
> > >    itlb-load-misses
> > >         [Instruction TLB read misses. Unit: cpu_atom]
> > >    l1-dcache-loads
> > >         [Level 1 data cache read accesses. Unit: cpu_atom]
> > >    l1-dcache-stores
> > >         [Level 1 data cache write accesses. Unit: cpu_atom]
> > >    l1-icache-load-misses
> > >         [Level 1 instruction cache read misses. Unit: cpu_atom]
> > >    l1-icache-loads
> > >         [Level 1 instruction cache read accesses. Unit: cpu_atom]
> > >    llc-load-misses
> > >         [Last level cache read misses. Unit: cpu_atom]
> > >    llc-loads
> > >         [Last level cache read accesses. Unit: cpu_atom]
> > >    llc-store-misses
> > >         [Last level cache write misses. Unit: cpu_atom]
> > >    llc-stores
> > >         [Last level cache write accesses. Unit: cpu_atom]
> > >    branch-load-misses
> > >         [Branch prediction unit read misses. Unit: cpu_core]
> > >    branch-loads
> > >         [Branch prediction unit read accesses. Unit: cpu_core]
> > >    dtlb-load-misses
> > >         [Data TLB read misses. Unit: cpu_core]
> > >    dtlb-loads
> > >         [Data TLB read accesses. Unit: cpu_core]
> > >    dtlb-store-misses
> > >         [Data TLB write misses. Unit: cpu_core]
> > >    dtlb-stores
> > >         [Data TLB write accesses. Unit: cpu_core]
> > >    itlb-load-misses
> > >         [Instruction TLB read misses. Unit: cpu_core]
> > >    l1-dcache-load-misses
> > >         [Level 1 data cache read misses. Unit: cpu_core]
> > >    l1-dcache-loads
> > >         [Level 1 data cache read accesses. Unit: cpu_core]
> > >    l1-dcache-stores
> > >         [Level 1 data cache write accesses. Unit: cpu_core]
> > >    l1-icache-load-misses
> > >         [Level 1 instruction cache read misses. Unit: cpu_core]
> > >    llc-load-misses
> > >         [Last level cache read misses. Unit: cpu_core]
> > >    llc-loads
> > >         [Last level cache read accesses. Unit: cpu_core]
> > >    llc-store-misses
> > >         [Last level cache write misses. Unit: cpu_core]
> > >    llc-stores
> > >         [Last level cache write accesses. Unit: cpu_core]
> > >    node-load-misses
> > >         [Local memory read misses. Unit: cpu_core]
> > >    node-loads
> > >         [Local memory read accesses. Unit: cpu_core]
> > > ```
> > >
> > > v6. Fix x86 hybrid mismatched number of evsels for the case a PMU is
> > >      specified. Add patches to make failures in the parse-events test
> > >      easier to diagnose. Reorder the perf stat default events patch to
> > >      come earlier.
> > >
> > > v5. Add patch for retrying default events, fixing regression when
> > >      non-root and paranoid. Make cycles to cpu-cycles test event change
> > >      (to avoid non-core ARM events) the default on all architectures
> > >      (suggested by Namhyung). Switch all non-test cases to specifying a
> > >      PMU. Improvements to the parse-events test including core PMU
> > >      parsing support for architectures without a "cpu" PMU.
> > >      https://lore.kernel.org/lkml/20250923041844.400164-1-irogers@google.com/
> > >
> > > v4: Fixes for matching hard coded metrics in stat-shadow. Make the
> > >      default "cycles" event string on ARM "cpu-cycles" which is the
> > >      same legacy event but avoids name collisions on ARM PMUs. To
> > >      support this, use evlist__new_default for the no command line
> > >      event case in `perf record` and `perf top`. Make
> > >      evlist__new_default only scan core PMUs.
> > >      https://lore.kernel.org/lkml/20250914181121.1952748-1-irogers@google.com/#t
> > >
> > > v3: Deprecate the legacy cache events that aren't shown in the
> > >      previous perf list to avoid the perf list output being too verbose.
> > >      https://lore.kernel.org/lkml/20250828205930.4007284-1-irogers@google.com/
> > >
> >
> > Hi Ian,
> >
> > Did you drop the change to ignore failures to open events in favour of
> > switching the default from "cycles" to "cpu-cycles" instead? I'm trying
> > to follow the changelog but couldn't see it.
>
> Hi James,
>
> your example is using `perf stat` whilst the behavior to ignore
> failing to open events is for `perf record` and added in:
> https://lore.kernel.org/lkml/20250923223312.238185-6-irogers@google.com/
> Changing the behavior for `perf stat` hasn't been in scope for making
> the behavior of legacy events consistent in any patch series. The
> whole '<not counted>' vs '<not supported>' vs an early exit is
> something where not changing behavior has been the name of the game
> for many years due to potential reliance on the behavior.
>
> Fwiw, this series migrates legacy events to json we should be able to
> do the same with the legacy hard coded metrics in `perf stat` once it
> lands. Removing the hard coded metrics will give better metrics as we
> can use event groups, share events between metrics, .. and only
> display requested metrics when metrics are requested. My point is that
> there is further `perf stat` clean up that this work moves us toward
> but at 28 patches already I'd like not to start work on making perf
> stat a better place. Getting consensus on what that better place is
> given the existing fragmentic behavior will be a challenge, but at
> least these days we have some form of tests, albeit they tend to flake
> all the time. To solve your problems below where you're specifically
> picking an event that can wildcard, give the PMU or use cpu-cycles, or
> one of the many other names ARM already has to mean this. Overloading
> the event 'cycles' wasn't something introduced in this series and the
> fact changing the name hasn't happened in the drivers is really very
> frustrating.
>
> Thanks,
> Ian
>
>
> > In v3 I got <not supported> for the uncore cycles event, but in v6 I get
> > a complete failure:
> >
> >   -> sudo perf-v3 stat -e cycles -- true
> >
> >   Performance counter stats for 'true':
> >
> >             1732478      cycles
> >
> >     <not supported>      arm_cmn_0/cycles/
> >
> >
> >   -> sudo perf-v6 stat -e cycles -- true
> >   Error:
> >   Invalid event (cycles) in per-thread mode, enable system wide with '-
> >   a'.
> >
> > The verbose output shows that it tries both, but doesn't ignore the
> > error on arm_cmn_0 anymore:
> >
> > -> sudo perf-v6 stat -e cycles -vvv -- true
> > Control descriptor is not initialized
> > Opening: cycles
> > ------------------------------------------------------------
> > perf_event_attr:
> >    type                             0 (PERF_TYPE_HARDWARE)
> >    size                             136
> >    config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> >    sample_type                      IDENTIFIER
> >    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >    disabled                         1
> >    inherit                          1
> >    enable_on_exec                   1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 9646  cpu -1  group_fd -1  flags 0x8 = 3
> > Opening: cycles
> > ------------------------------------------------------------
> > perf_event_attr:
> >    type                             11 (arm_cmn_0)
> >    size                             136
> > Required parameter 'wp_dev_sel' not specified
> > Required parameter 'wp_dev_sel' not specified
> >    config                           0x3 (cycles)
> >    sample_type                      IDENTIFIER
> >    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >    disabled                         1
> >    inherit                          1
> >    enable_on_exec                   1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 9646  cpu 0  group_fd -1  flags 0x8
> > sys_perf_event_open failed, error -22
> > switching off exclude_guest for PMU arm_cmn_0
> > Using PERF_SAMPLE_READ / :S modifier is not compatible with inherit,
> > falling back to no-inherit.
> > Warning:
> > cycles event is not supported by the kernel.
> > Error:
> > Invalid event (cycles) in per-thread mode, enable system wide with '-a'.

Re v3 vs v6:

v3 series here has no changes to perf stat:
https://lore.kernel.org/lkml/20250828205930.4007284-1-irogers@google.com/
the v6 does change perf stat in the "add_default_events" function:
https://lore.kernel.org/lkml/20250923223312.238185-5-irogers@google.com/
but you are reporting an issue with an event specified, so not using
default events. The evsel changes in v6 are for the evsel__match
function that isn't used during event opening.

Thanks,
Ian

> >
> > > v2: Additional details to the cover letter. Credit to Vince Weaver
> > >      added to the commit message for the event details. Additional
> > >      patches to clean up perf_pmu new_alias by removing an unused term
> > >      scanner argument and avoid stdio usage.
> > >      https://lore.kernel.org/lkml/20250828163225.3839073-1-irogers@google.com/
> > >
> > > v1: https://lore.kernel.org/lkml/20250828064231.1762997-1-irogers@google.com/
> > >
> > > Ian Rogers (28):
> > >    perf stat: Allow retry for default events
> > >    perf parse-events: Fix legacy cache events if event is duplicated in a
> > >      PMU
> > >    perf perf_api_probe: Avoid scanning all PMUs, try software PMU first
> > >    perf stat: Avoid wildcarding PMUs for default events
> > >    perf record: Skip don't fail for events that don't open
> > >    perf jevents: Support copying the source json files to OUTPUT
> > >    perf pmu: Don't eagerly parse event terms
> > >    perf parse-events: Remove unused FILE input argument to scanner
> > >    perf pmu: Use fd rather than FILE from new_alias
> > >    perf pmu: Factor term parsing into a perf_event_attr into a helper
> > >    perf parse-events: Add terms for legacy hardware and cache config
> > >      values
> > >    perf jevents: Add legacy json terms and default_core event table
> > >      helper
> > >    perf pmu: Add and use legacy_terms in alias information
> > >    perf jevents: Add legacy-hardware and legacy-cache json
> > >    perf print-events: Remove print_hwcache_events
> > >    perf print-events: Remove print_symbol_events
> > >    perf parse-events: Remove hard coded legacy hardware and cache parsing
> > >    perf record: Use evlist__new_default when no events specified
> > >    perf top: Use evlist__new_default when no events specified
> > >    perf evlist: Avoid scanning all PMUs for evlist__new_default
> > >    perf evsel: Improvements to __evsel__match
> > >    perf test parse-events: Use evsel__match for legacy events
> > >    perf test parse-events: Without a PMU use cpu-cycles rather than
> > >      cycles
> > >    perf test parse-events: Remove cpu PMU requirement
> > >    perf test: Switch cycles event to cpu-cycles
> > >    perf test: Clean up test_..config helpers
> > >    perf test parse-events: Add evlist test helper
> > >    perf test parse-events: Add evsel test helper
> > >
> > >   tools/perf/Makefile.perf                      |   21 +-
> > >   tools/perf/arch/x86/util/intel-pt.c           |    2 +-
> > >   tools/perf/builtin-list.c                     |   34 +-
> > >   tools/perf/builtin-record.c                   |   97 +-
> > >   tools/perf/builtin-stat.c                     |  171 +-
> > >   tools/perf/builtin-top.c                      |    8 +-
> > >   tools/perf/pmu-events/Build                   |   24 +-
> > >   .../arch/common/common/legacy-hardware.json   |   72 +
> > >   tools/perf/pmu-events/empty-pmu-events.c      | 2771 ++++++++++++++++-
> > >   tools/perf/pmu-events/jevents.py              |   32 +
> > >   tools/perf/pmu-events/make_legacy_cache.py    |  129 +
> > >   tools/perf/pmu-events/pmu-events.h            |    1 +
> > >   tools/perf/tests/code-reading.c               |    2 +-
> > >   tools/perf/tests/keep-tracking.c              |    2 +-
> > >   tools/perf/tests/parse-events.c               | 2010 ++++++------
> > >   tools/perf/tests/perf-time-to-tsc.c           |    4 +-
> > >   tools/perf/tests/pmu-events.c                 |   24 +-
> > >   tools/perf/tests/pmu.c                        |    3 +-
> > >   tools/perf/tests/switch-tracking.c            |    2 +-
> > >   tools/perf/util/evlist.c                      |   18 +-
> > >   tools/perf/util/evsel.c                       |   21 +-
> > >   tools/perf/util/parse-events.c                |  282 +-
> > >   tools/perf/util/parse-events.h                |   22 +-
> > >   tools/perf/util/parse-events.l                |   54 +-
> > >   tools/perf/util/parse-events.y                |  114 +-
> > >   tools/perf/util/perf_api_probe.c              |   27 +-
> > >   tools/perf/util/pmu.c                         |  309 +-
> > >   tools/perf/util/print-events.c                |  112 -
> > >   tools/perf/util/print-events.h                |    4 -
> > >   29 files changed, 4523 insertions(+), 1849 deletions(-)
> > >   create mode 100644 tools/perf/pmu-events/arch/common/common/legacy-hardware.json
> > >   create mode 100755 tools/perf/pmu-events/make_legacy_cache.py
> > >
> >