Mirroring similar work for software events in commit 6e9fa4131abb
("perf parse-events: Remove non-json software events"). These changes
migrate the legacy hardware and cache events to json. With no hard
coded legacy hardware or cache events the wild card, case
insensitivity, etc. is consistent for events. This does, however, mean
events like cycles will wild card against all PMUs. A change doing the
same was originally posted and merged from:
https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
and reverted by Linus in commit 4f1b067359ac ("Revert "perf
parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
his dislike for the cycles behavior on ARM with perf record. Earlier
patches in this series make perf record event opening failures
non-fatal and hide the cycles event's failure to open on ARM in perf
record, so it is expected the behavior will now be transparent in perf
record on ARM. perf stat with a cycles event will wildcard open the
event on all PMUs.

The change to support legacy events with PMUs was done to clean up
Intel's hybrid PMU implementation. Having sysfs/json events with
increased priority to legacy was requested by Mark Rutland
<mark.rutland@arm.com> to fix Apple-M PMU issues wrt broken legacy
events on that PMU. It is believed the PMU driver is now fixed, but
this has only been confirmed on ARM Juno boards. It was requested that
RISC-V be able to add events to the perf tool json so the PMU driver
didn't need to map legacy events to config encodings:
https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
This patch series achieves this.

A previous series of patches decreasing legacy hardware event
priorities was posted in:
https://lore.kernel.org/lkml/20250416045117.876775-1-irogers@google.com/
Namhyung Kim <namhyung@kernel.org> mentioned that hardware and
software events can be implemented similarly:
https://lore.kernel.org/lkml/aIJmJns2lopxf3EK@google.com/
and this patch series achieves this.
Note, patch 1 (perf parse-events: Fix legacy cache events if event is duplicated in a PMU) fixes a function deleted by patch 15 (perf parse-events: Remove hard coded legacy hardware and cache parsing). Adding the json exposed an issue when legacy cache (not legacy hardware) and sysfs/json events exist. The fix is necessary to keep tests passing through the series. It is also posted for backports to stable trees. The perf list behavior includes a lot more information and events. The before behavior on a hybrid alderlake is: ``` $ perf list hw List of pre-defined events (to be used in -e or -M): branch-instructions OR branches [Hardware event] branch-misses [Hardware event] bus-cycles [Hardware event] cache-misses [Hardware event] cache-references [Hardware event] cpu-cycles OR cycles [Hardware event] instructions [Hardware event] ref-cycles [Hardware event] $ perf list hwcache List of pre-defined events (to be used in -e or -M): cache: L1-dcache-loads OR cpu_atom/L1-dcache-loads/ L1-dcache-stores OR cpu_atom/L1-dcache-stores/ L1-icache-loads OR cpu_atom/L1-icache-loads/ L1-icache-load-misses OR cpu_atom/L1-icache-load-misses/ LLC-loads OR cpu_atom/LLC-loads/ LLC-load-misses OR cpu_atom/LLC-load-misses/ LLC-stores OR cpu_atom/LLC-stores/ LLC-store-misses OR cpu_atom/LLC-store-misses/ dTLB-loads OR cpu_atom/dTLB-loads/ dTLB-load-misses OR cpu_atom/dTLB-load-misses/ dTLB-stores OR cpu_atom/dTLB-stores/ dTLB-store-misses OR cpu_atom/dTLB-store-misses/ iTLB-load-misses OR cpu_atom/iTLB-load-misses/ branch-loads OR cpu_atom/branch-loads/ branch-load-misses OR cpu_atom/branch-load-misses/ L1-dcache-loads OR cpu_core/L1-dcache-loads/ L1-dcache-load-misses OR cpu_core/L1-dcache-load-misses/ L1-dcache-stores OR cpu_core/L1-dcache-stores/ L1-icache-load-misses OR cpu_core/L1-icache-load-misses/ LLC-loads OR cpu_core/LLC-loads/ LLC-load-misses OR cpu_core/LLC-load-misses/ LLC-stores OR cpu_core/LLC-stores/ LLC-store-misses OR cpu_core/LLC-store-misses/ dTLB-loads OR 
cpu_core/dTLB-loads/ dTLB-load-misses OR cpu_core/dTLB-load-misses/ dTLB-stores OR cpu_core/dTLB-stores/ dTLB-store-misses OR cpu_core/dTLB-store-misses/ iTLB-load-misses OR cpu_core/iTLB-load-misses/ branch-loads OR cpu_core/branch-loads/ branch-load-misses OR cpu_core/branch-load-misses/ node-loads OR cpu_core/node-loads/ node-load-misses OR cpu_core/node-load-misses/ ``` and after it is: ``` $ perf list hw legacy hardware: branch-instructions [Retired branch instructions [This event is an alias of branches]. Unit: cpu_atom] branch-misses [Mispredicted branch instructions. Unit: cpu_atom] branches [Retired branch instructions [This event is an alias of branch-instructions]. Unit: cpu_atom] bus-cycles [Bus cycles,which can be different from total cycles. Unit: cpu_atom] cache-misses [Cache misses. Usually this indicates Last Level Cache misses; this is intended to be used in conjunction with the PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates. Unit: cpu_atom] cache-references [Cache accesses. Usually this indicates Last Level Cache accesses but this may vary depending on your CPU. This may include prefetches and coherency messages; again this depends on the design of your CPU. Unit: cpu_atom] cpu-cycles [Total cycles. Be wary of what happens during CPU frequency scaling [This event is an alias of cycles]. Unit: cpu_atom] cycles [Total cycles. Be wary of what happens during CPU frequency scaling [This event is an alias of cpu-cycles]. Unit: cpu_atom] instructions [Retired instructions. Be careful,these can be affected by various issues,most notably hardware interrupt counts. Unit: cpu_atom] ref-cycles [Total cycles; not affected by CPU frequency scaling. Unit: cpu_atom] branch-instructions [Retired branch instructions [This event is an alias of branches]. Unit: cpu_core] branch-misses [Mispredicted branch instructions. Unit: cpu_core] branches [Retired branch instructions [This event is an alias of branch-instructions]. 
Unit: cpu_core] bus-cycles [Bus cycles,which can be different from total cycles. Unit: cpu_core] cache-misses [Cache misses. Usually this indicates Last Level Cache misses; this is intended to be used in conjunction with the PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates. Unit: cpu_core] cache-references [Cache accesses. Usually this indicates Last Level Cache accesses but this may vary depending on your CPU. This may include prefetches and coherency messages; again this depends on the design of your CPU. Unit: cpu_core] cpu-cycles [Total cycles. Be wary of what happens during CPU frequency scaling [This event is an alias of cycles]. Unit: cpu_core] cycles [Total cycles. Be wary of what happens during CPU frequency scaling [This event is an alias of cpu-cycles]. Unit: cpu_core] instructions [Retired instructions. Be careful,these can be affected by various issues,most notably hardware interrupt counts. Unit: cpu_core] ref-cycles [Total cycles; not affected by CPU frequency scaling. Unit: cpu_core] $ perf list hwcache legacy cache: branch-load-misses [Branch prediction unit read misses. Unit: cpu_atom] branch-loads [Branch prediction unit read accesses. Unit: cpu_atom] dtlb-load-misses [Data TLB read misses. Unit: cpu_atom] dtlb-loads [Data TLB read accesses. Unit: cpu_atom] dtlb-store-misses [Data TLB write misses. Unit: cpu_atom] dtlb-stores [Data TLB write accesses. Unit: cpu_atom] itlb-load-misses [Instruction TLB read misses. Unit: cpu_atom] l1-dcache-loads [Level 1 data cache read accesses. Unit: cpu_atom] l1-dcache-stores [Level 1 data cache write accesses. Unit: cpu_atom] l1-icache-load-misses [Level 1 instruction cache read misses. Unit: cpu_atom] l1-icache-loads [Level 1 instruction cache read accesses. Unit: cpu_atom] llc-load-misses [Last level cache read misses. Unit: cpu_atom] llc-loads [Last level cache read accesses. Unit: cpu_atom] llc-store-misses [Last level cache write misses. 
Unit: cpu_atom] llc-stores [Last level cache write accesses. Unit: cpu_atom] branch-load-misses [Branch prediction unit read misses. Unit: cpu_core] branch-loads [Branch prediction unit read accesses. Unit: cpu_core] dtlb-load-misses [Data TLB read misses. Unit: cpu_core] dtlb-loads [Data TLB read accesses. Unit: cpu_core] dtlb-store-misses [Data TLB write misses. Unit: cpu_core] dtlb-stores [Data TLB write accesses. Unit: cpu_core] itlb-load-misses [Instruction TLB read misses. Unit: cpu_core] l1-dcache-load-misses [Level 1 data cache read misses. Unit: cpu_core] l1-dcache-loads [Level 1 data cache read accesses. Unit: cpu_core] l1-dcache-stores [Level 1 data cache write accesses. Unit: cpu_core] l1-icache-load-misses [Level 1 instruction cache read misses. Unit: cpu_core] llc-load-misses [Last level cache read misses. Unit: cpu_core] llc-loads [Last level cache read accesses. Unit: cpu_core] llc-store-misses [Last level cache write misses. Unit: cpu_core] llc-stores [Last level cache write accesses. Unit: cpu_core] node-load-misses [Local memory read misses. Unit: cpu_core] node-loads [Local memory read accesses. Unit: cpu_core] ``` v3: Deprecate the legacy cache events that aren't shown in the previous perf list to avoid the perf list output being too verbose. v2: Additional details to the cover letter. Credit to Vince Weaver added to the commit message for the event details. Additional patches to clean up perf_pmu new_alias by removing an unused term scanner argument and avoid stdio usage. 
https://lore.kernel.org/lkml/20250828163225.3839073-1-irogers@google.com/
v1: https://lore.kernel.org/lkml/20250828064231.1762997-1-irogers@google.com/

Ian Rogers (15):
  perf parse-events: Fix legacy cache events if event is duplicated in a PMU
  perf perf_api_probe: Avoid scanning all PMUs, try software PMU first
  perf record: Skip don't fail for events that don't open
  perf jevents: Support copying the source json files to OUTPUT
  perf pmu: Don't eagerly parse event terms
  perf parse-events: Remove unused FILE input argument to scanner
  perf pmu: Use fd rather than FILE from new_alias
  perf pmu: Factor term parsing into a perf_event_attr into a helper
  perf parse-events: Add terms for legacy hardware and cache config values
  perf jevents: Add legacy json terms and default_core event table helper
  perf pmu: Add and use legacy_terms in alias information
  perf jevents: Add legacy-hardware and legacy-cache json
  perf print-events: Remove print_hwcache_events
  perf print-events: Remove print_symbol_events
  perf parse-events: Remove hard coded legacy hardware and cache parsing

 tools/perf/Makefile.perf                    |   21 +-
 tools/perf/arch/x86/util/intel-pt.c         |    2 +-
 tools/perf/builtin-list.c                   |   34 +-
 tools/perf/builtin-record.c                 |   89 +-
 tools/perf/pmu-events/Build                 |   24 +-
 .../arch/common/common/legacy-hardware.json |   72 +
 tools/perf/pmu-events/empty-pmu-events.c    | 2763 ++++++++++++++++-
 tools/perf/pmu-events/jevents.py            |   24 +
 tools/perf/pmu-events/make_legacy_cache.py  |  129 +
 tools/perf/pmu-events/pmu-events.h          |    1 +
 tools/perf/tests/parse-events.c             |    2 +-
 tools/perf/tests/pmu-events.c               |   24 +-
 tools/perf/tests/pmu.c                      |    3 +-
 tools/perf/util/parse-events.c              |  283 +-
 tools/perf/util/parse-events.h              |   16 +-
 tools/perf/util/parse-events.l              |   54 +-
 tools/perf/util/parse-events.y              |  114 +-
 tools/perf/util/perf_api_probe.c            |   27 +-
 tools/perf/util/pmu.c                       |  302 +-
 tools/perf/util/print-events.c              |  112 -
 tools/perf/util/print-events.h              |    4 -
 21 files changed, 3330 insertions(+), 770 deletions(-)
 create mode 100644 tools/perf/pmu-events/arch/common/common/legacy-hardware.json
 create mode 100755 tools/perf/pmu-events/make_legacy_cache.py

-- 
2.51.0.318.gd7df087d1a-goog
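
For readers of the thread, the wildcard behavior described in the cover letter can be pictured with a toy model. The sketch below is only an illustration under stated assumptions: `struct toy_pmu`, `count_wildcard_matches()` and the sample PMU tables are hypothetical names invented here and are not perf's data structures or parsing code. It shows why a bare, case-insensitive event name that exists on several PMUs yields more than one event, while a name present on only one PMU yields a single event.

```c
#include <stddef.h>
#include <strings.h> /* strcasecmp (POSIX) */

/* Hypothetical, simplified model of a PMU; this is NOT perf's struct perf_pmu. */
struct toy_pmu {
	const char *name;
	const char *const *events; /* NULL-terminated list of event names */
};

/*
 * Count how many PMUs expose an event whose name matches `event`
 * case-insensitively. Each PMU is counted at most once.
 */
static size_t count_wildcard_matches(const struct toy_pmu *pmus, size_t nr,
				     const char *event)
{
	size_t matches = 0;

	for (size_t i = 0; i < nr; i++) {
		for (const char *const *e = pmus[i].events; *e; e++) {
			if (strcasecmp(*e, event) == 0) {
				matches++;
				break;
			}
		}
	}
	return matches;
}

/* Illustrative data only: a core PMU and an uncore PMU that both name "cycles". */
static const char *const toy_core_events[] = {
	"cycles", "cpu-cycles", "instructions", NULL
};
static const char *const toy_uncore_events[] = { "cycles", NULL };

static const struct toy_pmu toy_pmus[] = {
	{ "core", toy_core_events },        /* hypothetical core PMU name */
	{ "arm_cmn_0", toy_uncore_events }, /* uncore PMU name as seen in this thread */
};
```

In this toy table, `count_wildcard_matches(toy_pmus, 2, "cycles")` counts both PMUs, whereas `"cpu-cycles"` is found on the core PMU only.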
On 28/08/2025 9:59 pm, Ian Rogers wrote:
> Mirroring similar work for software events in commit 6e9fa4131abb
> ("perf parse-events: Remove non-json software events"). These changes
> migrate the legacy hardware and cache events to json. With no hard
> coded legacy hardware or cache events the wild card, case
> insensitivity, etc. is consistent for events. This does, however, mean
> events like cycles will wild card against all PMUs. A change doing the
> same was originally posted and merged from:
> https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
> and reverted by Linus in commit 4f1b067359ac ("Revert "perf
> parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
> his dislike for the cycles behavior on ARM with perf record. Earlier
> patches in this series make perf record event opening failures
> non-fatal and hide the cycles event's failure to open on ARM in perf
> record, so it is expected the behavior will now be transparent in perf
> record on ARM. perf stat with a cycles event will wildcard open the
> event on all PMUs.

Hi Ian,

Briefly testing perf record and perf stat seem to work now. i.e "perf
record -e cycles" doesn't fail and just skips the uncore cycles event.
And "perf stat" now includes the uncore cycles event which I think is
harmless.

But there are a few perf test failures. For example "test event parsing":

  evlist after sorting/fixing: 'arm_cmn_0/cycles/,{cycles,cache-misses,branch-misses}'
  FAILED tests/parse-events.c:1589 wrong number of entries
  Event test failure: test 57 '{cycles,cache-misses,branch-misses}:e'
  running test 58 'cycles/name=name/'

The tests "Perf time to TSC" and "Use a dummy software event to keep
tracking" are using libperf to open the cycles event as a sampling event
which now fails. It seems like we've fixed Perf record to ignore this
failure, but we didn't think about libperf until now.

Thanks
James
On Wed, Sep 10, 2025 at 4:14 AM James Clark <james.clark@linaro.org> wrote:
>
> On 28/08/2025 9:59 pm, Ian Rogers wrote:
> > Mirroring similar work for software events in commit 6e9fa4131abb
> > ("perf parse-events: Remove non-json software events"). These changes
> > migrate the legacy hardware and cache events to json. With no hard
> > coded legacy hardware or cache events the wild card, case
> > insensitivity, etc. is consistent for events. This does, however, mean
> > events like cycles will wild card against all PMUs. A change doing the
> > same was originally posted and merged from:
> > https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
> > and reverted by Linus in commit 4f1b067359ac ("Revert "perf
> > parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
> > his dislike for the cycles behavior on ARM with perf record. Earlier
> > patches in this series make perf record event opening failures
> > non-fatal and hide the cycles event's failure to open on ARM in perf
> > record, so it is expected the behavior will now be transparent in perf
> > record on ARM. perf stat with a cycles event will wildcard open the
> > event on all PMUs.
>
> Hi Ian,
>
> Briefly testing perf record and perf stat seem to work now. i.e "perf
> record -e cycles" doesn't fail and just skips the uncore cycles event.
> And "perf stat" now includes the uncore cycles event which I think is
> harmless.

Thanks for confirming this.

> But there are a few perf test failures. For example "test event parsing":
>
>   evlist after sorting/fixing: 'arm_cmn_0/cycles/,{cycles,cache-misses,branch-misses}'
>   FAILED tests/parse-events.c:1589 wrong number of entries
>   Event test failure: test 57 '{cycles,cache-misses,branch-misses}:e'
>   running test 58 'cycles/name=name/'

I suspect the easiest fix for this is to change "cycles" to the
"cpu-cycles" legacy hardware event for this test. The test has always
had issues on ARM due to hardcoded expectations of the core PMU being
"cpu".

> The tests "Perf time to TSC" and "Use a dummy software event to keep
> tracking" are using libperf to open the cycles event as a sampling event
> which now fails. It seems like we've fixed Perf record to ignore this
> failure, but we didn't think about libperf until now.

I'm not clear on the connection here. libperf doesn't do event parsing
and so there are no changes in tools/lib/perf. If a test has an
expectation that "cycles" is a core event, again we can change it to
"cpu-cycles" as a workaround for ARM. As "cycles" will wildcard now,
we don't want that behavior in say API probing as we'll end up never
lazily processing the PMUs. That code has been altered in these
changes to specify the core PMU. For tests it is less of an issue and
so the changes are more limited.

Thanks,
Ian
On 10/09/2025 4:00 pm, Ian Rogers wrote:
> On Wed, Sep 10, 2025 at 4:14 AM James Clark <james.clark@linaro.org> wrote:
>>
>> On 28/08/2025 9:59 pm, Ian Rogers wrote:
>>> Mirroring similar work for software events in commit 6e9fa4131abb
>>> ("perf parse-events: Remove non-json software events"). These changes
>>> migrate the legacy hardware and cache events to json. With no hard
>>> coded legacy hardware or cache events the wild card, case
>>> insensitivity, etc. is consistent for events. This does, however, mean
>>> events like cycles will wild card against all PMUs. A change doing the
>>> same was originally posted and merged from:
>>> https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
>>> and reverted by Linus in commit 4f1b067359ac ("Revert "perf
>>> parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
>>> his dislike for the cycles behavior on ARM with perf record. Earlier
>>> patches in this series make perf record event opening failures
>>> non-fatal and hide the cycles event's failure to open on ARM in perf
>>> record, so it is expected the behavior will now be transparent in perf
>>> record on ARM. perf stat with a cycles event will wildcard open the
>>> event on all PMUs.
>>
>> Hi Ian,
>>
>> Briefly testing perf record and perf stat seem to work now. i.e "perf
>> record -e cycles" doesn't fail and just skips the uncore cycles event.
>> And "perf stat" now includes the uncore cycles event which I think is
>> harmless.
>
> Thanks for confirming this.
>
>> But there are a few perf test failures. For example "test event parsing":
>>
>>   evlist after sorting/fixing: 'arm_cmn_0/cycles/,{cycles,cache-misses,branch-misses}'
>>   FAILED tests/parse-events.c:1589 wrong number of entries
>>   Event test failure: test 57 '{cycles,cache-misses,branch-misses}:e'
>>   running test 58 'cycles/name=name/'
>
> I suspect the easiest fix for this is to change "cycles" to the
> "cpu-cycles" legacy hardware event for this test. The test has always
> had issues on ARM due to hardcoded expectations of the core PMU being
> "cpu".
>
>> The tests "Perf time to TSC" and "Use a dummy software event to keep
>> tracking" are using libperf to open the cycles event as a sampling event
>> which now fails. It seems like we've fixed Perf record to ignore this
>> failure, but we didn't think about libperf until now.
>
> I'm not clear on the connection here. libperf doesn't do event parsing
> and so there are no changes in tools/lib/perf. If a test has an
> expectation that "cycles" is a core event, again we can change it to
> "cpu-cycles" as a workaround for ARM. As "cycles" will wildcard now,
> we don't want that behavior in say API probing as we'll end up never
> lazily processing the PMUs. That code has been altered in these
> changes to specify the core PMU. For tests it is less of an issue and
> so the changes are more limited.
>
> Thanks,
> Ian

Sure makes sense if there's an easy fix for the tests, we can do that. I
suppose the main reason I mentioned it was that the tests might be
highlighting that other genuine non-Perf and non-test users would see
the same breakage though.
On Thu, Sep 11, 2025 at 6:00 AM James Clark <james.clark@linaro.org> wrote:
>
> On 10/09/2025 4:00 pm, Ian Rogers wrote:
> > On Wed, Sep 10, 2025 at 4:14 AM James Clark <james.clark@linaro.org> wrote:
> >>
> >> On 28/08/2025 9:59 pm, Ian Rogers wrote:
> >>> Mirroring similar work for software events in commit 6e9fa4131abb
> >>> ("perf parse-events: Remove non-json software events"). These changes
> >>> migrate the legacy hardware and cache events to json. With no hard
> >>> coded legacy hardware or cache events the wild card, case
> >>> insensitivity, etc. is consistent for events. This does, however, mean
> >>> events like cycles will wild card against all PMUs. A change doing the
> >>> same was originally posted and merged from:
> >>> https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
> >>> and reverted by Linus in commit 4f1b067359ac ("Revert "perf
> >>> parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
> >>> his dislike for the cycles behavior on ARM with perf record. Earlier
> >>> patches in this series make perf record event opening failures
> >>> non-fatal and hide the cycles event's failure to open on ARM in perf
> >>> record, so it is expected the behavior will now be transparent in perf
> >>> record on ARM. perf stat with a cycles event will wildcard open the
> >>> event on all PMUs.
> >>
> >> Hi Ian,
> >>
> >> Briefly testing perf record and perf stat seem to work now. i.e "perf
> >> record -e cycles" doesn't fail and just skips the uncore cycles event.
> >> And "perf stat" now includes the uncore cycles event which I think is
> >> harmless.
> >
> > Thanks for confirming this.
> >
> >> But there are a few perf test failures. For example "test event parsing":
> >>
> >>   evlist after sorting/fixing: 'arm_cmn_0/cycles/,{cycles,cache-misses,branch-misses}'
> >>   FAILED tests/parse-events.c:1589 wrong number of entries
> >>   Event test failure: test 57 '{cycles,cache-misses,branch-misses}:e'
> >>   running test 58 'cycles/name=name/'
> >
> > I suspect the easiest fix for this is to change "cycles" to the
> > "cpu-cycles" legacy hardware event for this test. The test has always
> > had issues on ARM due to hardcoded expectations of the core PMU being
> > "cpu".
> >
> >> The tests "Perf time to TSC" and "Use a dummy software event to keep
> >> tracking" are using libperf to open the cycles event as a sampling event
> >> which now fails. It seems like we've fixed Perf record to ignore this
> >> failure, but we didn't think about libperf until now.
> >
> > I'm not clear on the connection here. libperf doesn't do event parsing
> > and so there are no changes in tools/lib/perf. If a test has an
> > expectation that "cycles" is a core event, again we can change it to
> > "cpu-cycles" as a workaround for ARM. As "cycles" will wildcard now,
> > we don't want that behavior in say API probing as we'll end up never
> > lazily processing the PMUs. That code has been altered in these
> > changes to specify the core PMU. For tests it is less of an issue and
> > so the changes are more limited.
> >
> > Thanks,
> > Ian
>
> Sure makes sense if there's an easy fix for the tests, we can do that. I
> suppose the main reason I mentioned it was that the tests might be
> highlighting that other genuine non-Perf and non-test users would see
> the same breakage though.

For a non-perf user to see a perf change they must transitively depend
on perf to care. I think the complaint is that we've gone from 1 event
(ignoring big.LITTLE/hybrid) to possibly many, particularly on ARM.
What I'm thinking is we should have something like:

#if defined(__aarch64__) || defined(__arm__)
#define HW_CYCLES_STR "cpu-cycles"
#else
#define HW_CYCLES_STR "cycles"
#endif

and remove all use of just raw "cycles" in the code to use this
#define. This should avoid the >1 event issue on ARM in things like
tests. It does cause a new problem if the evsel->name is assumed to be
cycles, which is something that can happen a lot in shell scripts.
Perhaps all those use-cases should switch to specifying a PMU, which
would be a good thing performance wise to avoid scanning lots of PMUs.
I'll add some things to v4 to do a mix of this.

Thanks,
Ian
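
To make the proposal above concrete, here is a minimal sketch of how a test might consume such a macro. The `HW_CYCLES_STR` definition is copied from the message; `build_cycles_group()` is a hypothetical helper written for this illustration and is not part of perf or of the posted series. The group string mirrors the one from the failing test output earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Macro as proposed in the message above. */
#if defined(__aarch64__) || defined(__arm__)
#define HW_CYCLES_STR "cpu-cycles"
#else
#define HW_CYCLES_STR "cycles"
#endif

/*
 * Hypothetical helper (not a perf API): format an event group such as
 * "{cycles,cache-misses,branch-misses}:e" using the per-architecture name.
 * Returns 0 on success, -1 if the buffer is too small or formatting fails.
 */
static int build_cycles_group(char *buf, size_t len)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "{%s,cache-misses,branch-misses}:e",
		     HW_CYCLES_STR);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

As the message notes, anything that compares against a literal "cycles" name afterwards (for example in shell scripts) would still need attention; the sketch only covers building the event string.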
On 8/28/25 22:59, Ian Rogers wrote:
> Mirroring similar work for software events in commit 6e9fa4131abb
> ("perf parse-events: Remove non-json software events"). These changes
> migrate the legacy hardware and cache events to json. With no hard
> coded legacy hardware or cache events the wild card, case
> insensitivity, etc. is consistent for events. This does, however, mean
> events like cycles will wild card against all PMUs. A change doing the
> same was originally posted and merged from:
> https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
> and reverted by Linus in commit 4f1b067359ac ("Revert "perf
> parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
> his dislike for the cycles behavior on ARM with perf record. Earlier
> patches in this series make perf record event opening failures
> non-fatal and hide the cycles event's failure to open on ARM in perf
> record, so it is expected the behavior will now be transparent in perf
> record on ARM. perf stat with a cycles event will wildcard open the
> event on all PMUs.
>
> The change to support legacy events with PMUs was done to clean up
> Intel's hybrid PMU implementation. Having sysfs/json events with
> increased priority to legacy was requested by Mark Rutland
> <mark.rutland@arm.com> to fix Apple-M PMU issues wrt broken legacy
> events on that PMU. It is believed the PMU driver is now fixed, but
> this has only been confirmed on ARM Juno boards. It was requested that
> RISC-V be able to add events to the perf tool json so the PMU driver
> didn't need to map legacy events to config encodings:
> https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
> This patch series achieves this.
>
> A previous series of patches decreasing legacy hardware event
> priorities was posted in:
> https://lore.kernel.org/lkml/20250416045117.876775-1-irogers@google.com/
> Namhyung Kim <namhyung@kernel.org> mentioned that hardware and
> software events can be implemented similarly:
> https://lore.kernel.org/lkml/aIJmJns2lopxf3EK@google.com/
> and this patch series achieves this.
>
> Note, patch 1 (perf parse-events: Fix legacy cache events if event is
> duplicated in a PMU) fixes a function deleted by patch 15 (perf
> parse-events: Remove hard coded legacy hardware and cache
> parsing). Adding the json exposed an issue when legacy cache (not
> legacy hardware) and sysfs/json events exist. The fix is necessary to
> keep tests passing through the series. It is also posted for backports
> to stable trees.
>
> The perf list behavior includes a lot more information and events. The
> before behavior on a hybrid alderlake is:

.....

For s390 the whole series:

Tested-by: Thomas Richter <tmricht@linux.ibm.com>
-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH

Chairman of the Supervisory Board: Wolfgang Wendt

Management: David Faller

Registered office: Böblingen / Court of registration: Amtsgericht Stuttgart, HRB 243294
Hi Ian,

On Thu, Aug 28, 2025 at 01:59:15PM -0700, Ian Rogers wrote:
> Mirroring similar work for software events in commit 6e9fa4131abb
> ("perf parse-events: Remove non-json software events"). These changes
> migrate the legacy hardware and cache events to json. With no hard
> coded legacy hardware or cache events the wild card, case
> insensitivity, etc. is consistent for events. This does, however, mean
> events like cycles will wild card against all PMUs. A change doing the
> same was originally posted and merged from:
> https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
> and reverted by Linus in commit 4f1b067359ac ("Revert "perf
> parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
> his dislike for the cycles behavior on ARM with perf record. Earlier
> patches in this series make perf record event opening failures
> non-fatal and hide the cycles event's failure to open on ARM in perf
> record, so it is expected the behavior will now be transparent in perf
> record on ARM. perf stat with a cycles event will wildcard open the
> event on all PMUs.
>
> The change to support legacy events with PMUs was done to clean up
> Intel's hybrid PMU implementation. Having sysfs/json events with
> increased priority to legacy was requested by Mark Rutland
> <mark.rutland@arm.com> to fix Apple-M PMU issues wrt broken legacy
> events on that PMU. It is believed the PMU driver is now fixed, but
> this has only been confirmed on ARM Juno boards. It was requested that
> RISC-V be able to add events to the perf tool json so the PMU driver
> didn't need to map legacy events to config encodings:
> https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
> This patch series achieves this.
>
> A previous series of patches decreasing legacy hardware event
> priorities was posted in:
> https://lore.kernel.org/lkml/20250416045117.876775-1-irogers@google.com/
> Namhyung Kim <namhyung@kernel.org> mentioned that hardware and
> software events can be implemented similarly:
> https://lore.kernel.org/lkml/aIJmJns2lopxf3EK@google.com/
> and this patch series achieves this.

Thanks for working on this. Yeah, I think it'd be easier to handle all
events consistently with JSON. I expect the sysfs encoding will be used
with a higher priority if it comes in the <PMU>/<EVENT>/ format.

>
> Note, patch 1 (perf parse-events: Fix legacy cache events if event is
> duplicated in a PMU) fixes a function deleted by patch 15 (perf
> parse-events: Remove hard coded legacy hardware and cache
> parsing). Adding the json exposed an issue when legacy cache (not
> legacy hardware) and sysfs/json events exist. The fix is necessary to
> keep tests passing through the series. It is also posted for backports
> to stable trees.

Sounds ok.

>
> The perf list behavior includes a lot more information and events.
The > before behavior on a hybrid alderlake is: > ``` > $ perf list hw > > List of pre-defined events (to be used in -e or -M): > > branch-instructions OR branches [Hardware event] > branch-misses [Hardware event] > bus-cycles [Hardware event] > cache-misses [Hardware event] > cache-references [Hardware event] > cpu-cycles OR cycles [Hardware event] > instructions [Hardware event] > ref-cycles [Hardware event] > $ perf list hwcache > > List of pre-defined events (to be used in -e or -M): > > > cache: > L1-dcache-loads OR cpu_atom/L1-dcache-loads/ > L1-dcache-stores OR cpu_atom/L1-dcache-stores/ > L1-icache-loads OR cpu_atom/L1-icache-loads/ > L1-icache-load-misses OR cpu_atom/L1-icache-load-misses/ > LLC-loads OR cpu_atom/LLC-loads/ > LLC-load-misses OR cpu_atom/LLC-load-misses/ > LLC-stores OR cpu_atom/LLC-stores/ > LLC-store-misses OR cpu_atom/LLC-store-misses/ > dTLB-loads OR cpu_atom/dTLB-loads/ > dTLB-load-misses OR cpu_atom/dTLB-load-misses/ > dTLB-stores OR cpu_atom/dTLB-stores/ > dTLB-store-misses OR cpu_atom/dTLB-store-misses/ > iTLB-load-misses OR cpu_atom/iTLB-load-misses/ > branch-loads OR cpu_atom/branch-loads/ > branch-load-misses OR cpu_atom/branch-load-misses/ > L1-dcache-loads OR cpu_core/L1-dcache-loads/ > L1-dcache-load-misses OR cpu_core/L1-dcache-load-misses/ > L1-dcache-stores OR cpu_core/L1-dcache-stores/ > L1-icache-load-misses OR cpu_core/L1-icache-load-misses/ > LLC-loads OR cpu_core/LLC-loads/ > LLC-load-misses OR cpu_core/LLC-load-misses/ > LLC-stores OR cpu_core/LLC-stores/ > LLC-store-misses OR cpu_core/LLC-store-misses/ > dTLB-loads OR cpu_core/dTLB-loads/ > dTLB-load-misses OR cpu_core/dTLB-load-misses/ > dTLB-stores OR cpu_core/dTLB-stores/ > dTLB-store-misses OR cpu_core/dTLB-store-misses/ > iTLB-load-misses OR cpu_core/iTLB-load-misses/ > branch-loads OR cpu_core/branch-loads/ > branch-load-misses OR cpu_core/branch-load-misses/ > node-loads OR cpu_core/node-loads/ > node-load-misses OR cpu_core/node-load-misses/ > ``` > and after 
it is:
> ```
> $ perf list hw
>
> legacy hardware:
>   branch-instructions
>        [Retired branch instructions [This event is an alias of branches].
>         Unit: cpu_atom]
>   branch-misses
>        [Mispredicted branch instructions. Unit: cpu_atom]
>   branches
>        [Retired branch instructions [This event is an alias of
>         branch-instructions]. Unit: cpu_atom]

A nit. Can we have one actual event and an alias of it?

I think 'branch-instructions' will be the actual event and 'branches'
will be the alias. Then the description will be like

  branch-instructions
       [Retired branch instructions. Unit: cpu_atom]
  ...

  branches
       [This event is an alias of branch-instructions.]

The same goes to 'cycles' and 'cpu-cycles'.

Thanks,
Namhyung

>   bus-cycles
>        [Bus cycles,which can be different from total cycles. Unit: cpu_atom]
>   cache-misses
>        [Cache misses. Usually this indicates Last Level Cache misses; this is
>         intended to be used in conjunction with the
>         PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates.
>         Unit: cpu_atom]
>   cache-references
>        [Cache accesses. Usually this indicates Last Level Cache accesses but
>         this may vary depending on your CPU. This may include prefetches and
>         coherency messages; again this depends on the design of your CPU.
>         Unit: cpu_atom]
>   cpu-cycles
>        [Total cycles. Be wary of what happens during CPU frequency scaling
>         [This event is an alias of cycles]. Unit: cpu_atom]
>   cycles
>        [Total cycles. Be wary of what happens during CPU frequency scaling
>         [This event is an alias of cpu-cycles]. Unit: cpu_atom]
>   instructions
>        [Retired instructions. Be careful,these can be affected by various
>         issues,most notably hardware interrupt counts. Unit: cpu_atom]
>   ref-cycles
>        [Total cycles; not affected by CPU frequency scaling. Unit: cpu_atom]
>   branch-instructions
>        [Retired branch instructions [This event is an alias of branches].
>         Unit: cpu_core]
>   branch-misses
>        [Mispredicted branch instructions.
Unit: cpu_core]
>   branches
>        [Retired branch instructions [This event is an alias of
>         branch-instructions]. Unit: cpu_core]
>   bus-cycles
>        [Bus cycles,which can be different from total cycles. Unit: cpu_core]
>   cache-misses
>        [Cache misses. Usually this indicates Last Level Cache misses; this is
>         intended to be used in conjunction with the
>         PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates.
>         Unit: cpu_core]
>   cache-references
>        [Cache accesses. Usually this indicates Last Level Cache accesses but
>         this may vary depending on your CPU. This may include prefetches and
>         coherency messages; again this depends on the design of your CPU.
>         Unit: cpu_core]
>   cpu-cycles
>        [Total cycles. Be wary of what happens during CPU frequency scaling
>         [This event is an alias of cycles]. Unit: cpu_core]
>   cycles
>        [Total cycles. Be wary of what happens during CPU frequency scaling
>         [This event is an alias of cpu-cycles]. Unit: cpu_core]
>   instructions
>        [Retired instructions. Be careful,these can be affected by various
>         issues,most notably hardware interrupt counts. Unit: cpu_core]
>   ref-cycles
>        [Total cycles; not affected by CPU frequency scaling. Unit: cpu_core]
> $ perf list hwcache
>
> legacy cache:
>   branch-load-misses
>        [Branch prediction unit read misses. Unit: cpu_atom]
>   branch-loads
>        [Branch prediction unit read accesses. Unit: cpu_atom]
>   dtlb-load-misses
>        [Data TLB read misses. Unit: cpu_atom]
>   dtlb-loads
>        [Data TLB read accesses. Unit: cpu_atom]
>   dtlb-store-misses
>        [Data TLB write misses. Unit: cpu_atom]
>   dtlb-stores
>        [Data TLB write accesses. Unit: cpu_atom]
>   itlb-load-misses
>        [Instruction TLB read misses. Unit: cpu_atom]
>   l1-dcache-loads
>        [Level 1 data cache read accesses. Unit: cpu_atom]
>   l1-dcache-stores
>        [Level 1 data cache write accesses. Unit: cpu_atom]
>   l1-icache-load-misses
>        [Level 1 instruction cache read misses. Unit: cpu_atom]
>   l1-icache-loads
>        [Level 1 instruction cache read accesses.
Unit: cpu_atom]
>   llc-load-misses
>        [Last level cache read misses. Unit: cpu_atom]
>   llc-loads
>        [Last level cache read accesses. Unit: cpu_atom]
>   llc-store-misses
>        [Last level cache write misses. Unit: cpu_atom]
>   llc-stores
>        [Last level cache write accesses. Unit: cpu_atom]
>   branch-load-misses
>        [Branch prediction unit read misses. Unit: cpu_core]
>   branch-loads
>        [Branch prediction unit read accesses. Unit: cpu_core]
>   dtlb-load-misses
>        [Data TLB read misses. Unit: cpu_core]
>   dtlb-loads
>        [Data TLB read accesses. Unit: cpu_core]
>   dtlb-store-misses
>        [Data TLB write misses. Unit: cpu_core]
>   dtlb-stores
>        [Data TLB write accesses. Unit: cpu_core]
>   itlb-load-misses
>        [Instruction TLB read misses. Unit: cpu_core]
>   l1-dcache-load-misses
>        [Level 1 data cache read misses. Unit: cpu_core]
>   l1-dcache-loads
>        [Level 1 data cache read accesses. Unit: cpu_core]
>   l1-dcache-stores
>        [Level 1 data cache write accesses. Unit: cpu_core]
>   l1-icache-load-misses
>        [Level 1 instruction cache read misses. Unit: cpu_core]
>   llc-load-misses
>        [Last level cache read misses. Unit: cpu_core]
>   llc-loads
>        [Last level cache read accesses. Unit: cpu_core]
>   llc-store-misses
>        [Last level cache write misses. Unit: cpu_core]
>   llc-stores
>        [Last level cache write accesses. Unit: cpu_core]
>   node-load-misses
>        [Local memory read misses. Unit: cpu_core]
>   node-loads
>        [Local memory read accesses. Unit: cpu_core]
> ```
>
> v3: Deprecate the legacy cache events that aren't shown in the
>     previous perf list to avoid the perf list output being too verbose.
>
> v2: Additional details to the cover letter. Credit to Vince Weaver
>     added to the commit message for the event details. Additional
>     patches to clean up perf_pmu new_alias by removing an unused term
>     scanner argument and avoid stdio usage.
>     https://lore.kernel.org/lkml/20250828163225.3839073-1-irogers@google.com/
>
> v1: https://lore.kernel.org/lkml/20250828064231.1762997-1-irogers@google.com/
>
> Ian Rogers (15):
>   perf parse-events: Fix legacy cache events if event is duplicated in a
>     PMU
>   perf perf_api_probe: Avoid scanning all PMUs, try software PMU first
>   perf record: Skip don't fail for events that don't open
>   perf jevents: Support copying the source json files to OUTPUT
>   perf pmu: Don't eagerly parse event terms
>   perf parse-events: Remove unused FILE input argument to scanner
>   perf pmu: Use fd rather than FILE from new_alias
>   perf pmu: Factor term parsing into a perf_event_attr into a helper
>   perf parse-events: Add terms for legacy hardware and cache config
>     values
>   perf jevents: Add legacy json terms and default_core event table
>     helper
>   perf pmu: Add and use legacy_terms in alias information
>   perf jevents: Add legacy-hardware and legacy-cache json
>   perf print-events: Remove print_hwcache_events
>   perf print-events: Remove print_symbol_events
>   perf parse-events: Remove hard coded legacy hardware and cache parsing
>
>  tools/perf/Makefile.perf | 21 +-
>  tools/perf/arch/x86/util/intel-pt.c | 2 +-
>  tools/perf/builtin-list.c | 34 +-
>  tools/perf/builtin-record.c | 89 +-
>  tools/perf/pmu-events/Build | 24 +-
>  .../arch/common/common/legacy-hardware.json | 72 +
>  tools/perf/pmu-events/empty-pmu-events.c | 2763 ++++++++++++++++-
>  tools/perf/pmu-events/jevents.py | 24 +
>  tools/perf/pmu-events/make_legacy_cache.py | 129 +
>  tools/perf/pmu-events/pmu-events.h | 1 +
>  tools/perf/tests/parse-events.c | 2 +-
>  tools/perf/tests/pmu-events.c | 24 +-
>  tools/perf/tests/pmu.c | 3 +-
>  tools/perf/util/parse-events.c | 283 +-
>  tools/perf/util/parse-events.h | 16 +-
>  tools/perf/util/parse-events.l | 54 +-
>  tools/perf/util/parse-events.y | 114 +-
>  tools/perf/util/perf_api_probe.c | 27 +-
>  tools/perf/util/pmu.c | 302 +-
>  tools/perf/util/print-events.c | 112 -
>  tools/perf/util/print-events.h | 4 -
>  21 files changed, 3330 insertions(+), 770 deletions(-)
>  create mode 100644 tools/perf/pmu-events/arch/common/common/legacy-hardware.json
>  create mode 100755 tools/perf/pmu-events/make_legacy_cache.py
>
> --
> 2.51.0.318.gd7df087d1a-goog
>
On Wed, Sep 10, 2025 at 1:10 PM Namhyung Kim <namhyung@kernel.org> wrote:
> A nit. Can we have one actual event and an alias of it?
>
> I think 'branch-instructions' will be the actual event and 'branches'
> will be the alias. Then the description will be like
>
>   branch-instructions
>        [Retired branch instructions. Unit: cpu_atom]
>   ...
>
>   branches
>        [This event is an alias of branch-instructions.]
>
> The same goes to 'cycles' and 'cpu-cycles'.

Similarly for 'cs' and 'context-switches' in
tools/perf/pmu-events/arch/common/common/software.json.

So there are a few different ways to do this:

1) In perf list, detect that two events have the same encoding and list
   them together.
2) In the json, have a new aliases list that then either:
 2.1) gets expanded in jevents.py as part of the build, or
 2.2) is passed into pmu-events.c, with the C code updated to use an
      alias list associated with each event.

Option (1) will have something like quadratic complexity, but a fast
perf list isn't a particular goal I've heard of.
Option (2.2) will mean the existing binary searches for events will
become a binary search for an event and then linear searches through
the aliases. To make this not a slowdown we'd likely need more lookup
tables to avoid the linear searches.
Option (2.1) feels the most plausible. I was hoping the json and the
sysfs layout would kind of match; this would be true after jevents.py
expands the aliases. This option is kind of already done in the legacy
cache case, as tools/perf/pmu-events/make_legacy_cache.py already does
this. We'd still need option (1) with this.

Anyway, I'm not sure these downsides are countered by a slightly
smaller hardware.json and software.json, or maybe we should just go
with option 1 if the perf list output is all you care about. Let me
know if you see a different way of making it happen. I don't think the
vendors will be particularly happy for their upstream formats to
change given other tools will rely on them.

Thanks,
Ian
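
For illustration only, the build-time expansion of option (2.1) and the
same-encoding grouping of option (1) could be sketched roughly as below.
This is not code from the series: the "Aliases" and "Config" keys are
hypothetical names made up for the sketch, and only "EventName" and
"BriefDescription" are taken from the existing perf json. The grouping
uses a dictionary, which works here only because the sketch has a single
hashable config value per event.

```python
# Hypothetical sketch: the "Aliases" and "Config" keys are illustrative
# assumptions, not fields that jevents.py is known to accept.


def expand_aliases(events):
    """Option (2.1): emit one entry per name, including each alias."""
    expanded = []
    for event in events:
        # Copy everything except the build-time-only alias list.
        primary = {k: v for k, v in event.items() if k != "Aliases"}
        expanded.append(primary)
        for alias in event.get("Aliases", []):
            entry = dict(primary)
            entry["EventName"] = alias
            entry["BriefDescription"] = (
                f"This event is an alias of {primary['EventName']}"
            )
            expanded.append(entry)
    # Keep the result sorted by name so name lookups can stay a binary search.
    return sorted(expanded, key=lambda e: e["EventName"])


def group_by_encoding(events):
    """Option (1): collect the names that share an encoding."""
    groups = {}
    for event in events:
        groups.setdefault(event["Config"], []).append(event["EventName"])
    return groups


source = [
    {"EventName": "branch-instructions",
     "BriefDescription": "Retired branch instructions",
     "Config": 4,  # PERF_COUNT_HW_BRANCH_INSTRUCTIONS
     "Aliases": ["branches"]},
    {"EventName": "branch-misses",
     "BriefDescription": "Mispredicted branch instructions",
     "Config": 5},  # PERF_COUNT_HW_BRANCH_MISSES
]

expanded = expand_aliases(source)
for names in group_by_encoding(expanded).values():
    print(" OR ".join(names))
```

After expansion every name is an ordinary entry, so the json and sysfs
layouts line up, at the cost of still needing something like
group_by_encoding() when listing.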
On Wed, Sep 10, 2025 at 02:58:05PM -0700, Ian Rogers wrote:
> On Wed, Sep 10, 2025 at 1:10 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > A nit. Can we have one actual event and an alias of it?
> >
> > I think 'branch-instructions' will be the actual event and 'branches'
> > will be the alias. Then the description will be like
> >
> >   branch-instructions
> >        [Retired branch instructions. Unit: cpu_atom]
> >   ...
> >
> >   branches
> >        [This event is an alias of branch-instructions.]
> >
> > The same goes to 'cycles' and 'cpu-cycles'.
>
> Similarly for 'cs' and 'context-switches' in
> tools/perf/pmu-events/arch/common/common/software.json.
>
> So there are a few different ways to do this:
>
> 1) In perf list, detect that two events have the same encoding and list
>    them together.
> 2) In the json, have a new aliases list that then either:
>  2.1) gets expanded in jevents.py as part of the build, or
>  2.2) is passed into pmu-events.c, with the C code updated to use an
>       alias list associated with each event.
>
> Option (1) will have something like quadratic complexity, but a fast
> perf list isn't a particular goal I've heard of.
> Option (2.2) will mean the existing binary searches for events will
> become a binary search for an event and then linear searches through
> the aliases. To make this not a slowdown we'd likely need more lookup
> tables to avoid the linear searches.
> Option (2.1) feels the most plausible. I was hoping the json and the
> sysfs layout would kind of match; this would be true after jevents.py
> expands the aliases. This option is kind of already done in the legacy
> cache case, as tools/perf/pmu-events/make_legacy_cache.py already does
> this. We'd still need option (1) with this.
>
> Anyway, I'm not sure these downsides are countered by a slightly
> smaller hardware.json and software.json, or maybe we should just go
> with option 1 if the perf list output is all you care about. Let me
> know if you see a different way of making it happen. I don't think the
> vendors will be particularly happy for their upstream formats to
> change given other tools will rely on them.

Well, I was asking just to update the description in JSON. I'm not sure
if it's a common problem we need to solve. Updating a few known aliases
in the hardware and software descriptions would be fine.

Thanks,
Namhyung
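
As a rough illustration of the description-only update, a small script
could rewrite the descriptions of a few known aliases without touching
the json schema. This is a sketch under assumptions: the alias table is
hand-written, and the description strings are guessed from the perf list
output quoted earlier in the thread rather than copied from
legacy-hardware.json.

```python
import re

# Matches the trailing alias note seen in the perf list output above.
ALIAS_NOTE = re.compile(r"\s*\[This event is an alias of [^\]]+\]")


def simplify_alias_descriptions(events, alias_of):
    """Describe aliases only as aliases and drop the note from the real event.

    alias_of is a hand-maintained (hypothetical) map of alias name to the
    name of the actual event.
    """
    updated = []
    for event in events:
        entry = dict(event)
        name = entry["EventName"]
        if name in alias_of:
            entry["BriefDescription"] = (
                f"This event is an alias of {alias_of[name]}"
            )
        else:
            entry["BriefDescription"] = ALIAS_NOTE.sub(
                "", entry["BriefDescription"]
            )
        updated.append(entry)
    return updated


events = [
    {"EventName": "branch-instructions",
     "BriefDescription":
         "Retired branch instructions [This event is an alias of branches]"},
    {"EventName": "branches",
     "BriefDescription":
         "Retired branch instructions "
         "[This event is an alias of branch-instructions]"},
]

result = simplify_alias_descriptions(events, {"branches": "branch-instructions"})
for entry in result:
    print(f"{entry['EventName']}: [{entry['BriefDescription']}]")
```

Both names stay as separate entries, so parsing and lookup are unchanged
and only the text shown by perf list differs.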