[PATCH v9 46/48] perf jevents: Add collection of topdown like metrics for arm64

Ian Rogers posted 48 patches 1 week, 1 day ago
Posted by Ian Rogers 1 week, 1 day ago
Metrics are created using legacy, common and recommended events. As
events may be missing, a TryEvent function returns None for any event
that is absent. To work around missing JSON events for cortex-a53,
sysfs encodings are used.

Signed-off-by: Ian Rogers <irogers@google.com>
---
An earlier review of this patch by Leo Yan is here:
https://lore.kernel.org/lkml/8168c713-005c-4fd9-a928-66763dab746a@arm.com/
Hopefully all corrections were made.
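
For readers unfamiliar with the pattern, the TryEvent idea from the
commit message can be sketched standalone as below. This is a toy
illustration only: KNOWN_EVENTS, this Event class, try_event and
build_metrics are hypothetical stand-ins, not the helpers in metric.py.

```python
from typing import List, Optional

# Hypothetical stand-in for the events loaded from a model's JSON files.
KNOWN_EVENTS = {"STALL_BACKEND", "BR_RETIRED"}


class Event:
    """Toy event that raises when its name is unknown (assumed behaviour)."""

    def __init__(self, name: str) -> None:
        if name not in KNOWN_EVENTS:
            raise ValueError(f"no such event: {name}")
        self.name = name


def try_event(name: str) -> Optional[Event]:
    """Return the event, or None when the model does not provide it."""
    try:
        return Event(name)
    except ValueError:
        return None


def build_metrics() -> List[str]:
    """Describe only the metrics whose event exists on this toy model."""
    stall_fe = try_event("STALL_FRONTEND")  # missing in this toy model
    stall_be = try_event("STALL_BACKEND")
    candidates = [
        ("stall_fe_rate", stall_fe),
        ("stall_be_rate", stall_be),
    ]
    # Drop any metric whose event lookup returned None.
    return [f"{name} = {ev.name} / cycles" for name, ev in candidates if ev]
```

Each metric is made conditional on its events, so a model lacking an
event ends up with fewer metrics rather than a build failure.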
---
 tools/perf/pmu-events/arm64_metrics.py | 145 ++++++++++++++++++++++++-
 1 file changed, 142 insertions(+), 3 deletions(-)

diff --git a/tools/perf/pmu-events/arm64_metrics.py b/tools/perf/pmu-events/arm64_metrics.py
index ac717ca3513a..9678253e2e0e 100755
--- a/tools/perf/pmu-events/arm64_metrics.py
+++ b/tools/perf/pmu-events/arm64_metrics.py
@@ -2,13 +2,150 @@
 # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
 import argparse
 import os
-from metric import (JsonEncodeMetric, JsonEncodeMetricGroupDescriptions, LoadEvents,
-                    MetricGroup)
+from typing import Optional
+from metric import (d_ratio, Event, JsonEncodeMetric, JsonEncodeMetricGroupDescriptions,
+                    LoadEvents, Metric, MetricGroup)
 
 # Global command line arguments.
 _args = None
 
 
+def Arm64Topdown() -> MetricGroup:
+    """Returns a MetricGroup representing ARM64 topdown like metrics."""
+    def TryEvent(name: str) -> Optional[Event]:
+        # Skip an event if not in the json files.
+        try:
+            return Event(name)
+            except Exception:
+            return None
+    # ARM models like a53 lack JSON for INST_RETIRED but have the
+    # architectural standard event in sysfs. Use the PMU name to identify
+    # the sysfs event.
+    pmu_name = f'armv8_{_args.model.replace("-", "_")}'
+    ins = Event("instructions")
+    ins_ret = Event("INST_RETIRED", f"{pmu_name}/inst_retired/")
+    cycles = Event("cpu\\-cycles")
+    stall_fe = TryEvent("STALL_FRONTEND")
+    stall_be = TryEvent("STALL_BACKEND")
+    br_ret = TryEvent("BR_RETIRED")
+    br_mp_ret = TryEvent("BR_MIS_PRED_RETIRED")
+    dtlb_walk = TryEvent("DTLB_WALK")
+    itlb_walk = TryEvent("ITLB_WALK")
+    l1d_tlb = TryEvent("L1D_TLB")
+    l1i_tlb = TryEvent("L1I_TLB")
+    l1d_refill = Event("L1D_CACHE_REFILL", f"{pmu_name}/l1d_cache_refill/")
+    l2d_refill = Event("L2D_CACHE_REFILL", f"{pmu_name}/l2d_cache_refill/")
+    l1i_refill = Event("L1I_CACHE_REFILL", f"{pmu_name}/l1i_cache_refill/")
+    l1d_access = Event("L1D_CACHE", f"{pmu_name}/l1d_cache/")
+    l2d_access = Event("L2D_CACHE", f"{pmu_name}/l2d_cache/")
+    llc_access = TryEvent("LL_CACHE_RD")
+    l1i_access = Event("L1I_CACHE", f"{pmu_name}/l1i_cache/")
+    llc_miss_rd = TryEvent("LL_CACHE_MISS_RD")
+    ase_spec = TryEvent("ASE_SPEC")
+    ld_spec = TryEvent("LD_SPEC")
+    st_spec = TryEvent("ST_SPEC")
+    vfp_spec = TryEvent("VFP_SPEC")
+    dp_spec = TryEvent("DP_SPEC")
+    br_immed_spec = TryEvent("BR_IMMED_SPEC")
+    br_indirect_spec = TryEvent("BR_INDIRECT_SPEC")
+    br_ret_spec = TryEvent("BR_RETURN_SPEC")
+    crypto_spec = TryEvent("CRYPTO_SPEC")
+    inst_spec = TryEvent("INST_SPEC")
+    return MetricGroup("lpm_topdown", [
+        MetricGroup("lpm_topdown_tl", [
+            Metric("lpm_topdown_tl_ipc", "Instructions per cycle", d_ratio(
+                ins, cycles), "insn/cycle"),
+            Metric("lpm_topdown_tl_stall_fe_rate", "Frontend stalls to all cycles",
+                   d_ratio(stall_fe, cycles), "100%") if stall_fe else None,
+            Metric("lpm_topdown_tl_stall_be_rate", "Backend stalls to all cycles",
+                   d_ratio(stall_be, cycles), "100%") if stall_be else None,
+        ]),
+        MetricGroup("lpm_topdown_fe_bound", [
+            MetricGroup("lpm_topdown_fe_br", [
+                Metric("lpm_topdown_fe_br_mp_per_insn",
+                       "Branch mispredicts per instruction retired",
+                       d_ratio(br_mp_ret, ins_ret), "br/insn") if br_mp_ret else None,
+                Metric("lpm_topdown_fe_br_ins_rate",
+                       "Branches per instruction retired", d_ratio(
+                           br_ret, ins_ret), "100%") if br_ret else None,
+                Metric("lpm_topdown_fe_br_mispredict",
+                       "Branch mispredicts per branch instruction",
+                       d_ratio(br_mp_ret, br_ret), "100%") if (br_mp_ret and br_ret) else None,
+            ]),
+            MetricGroup("lpm_topdown_fe_itlb", [
+                Metric("lpm_topdown_fe_itlb_walks", "Itlb walks per insn",
+                       d_ratio(itlb_walk, ins_ret), "walk/insn"),
+                Metric("lpm_topdown_fe_itlb_walk_rate", "Itlb walks per L1I TLB access",
+                       d_ratio(itlb_walk, l1i_tlb), "100%") if l1i_tlb else None,
+            ]) if itlb_walk else None,
+            MetricGroup("lpm_topdown_fe_icache", [
+                Metric("lpm_topdown_fe_icache_l1i_per_insn",
+                       "L1I cache refills per instruction",
+                       d_ratio(l1i_refill, ins_ret), "l1i/insn"),
+                Metric("lpm_topdown_fe_icache_l1i_miss_rate",
+                       "L1I cache refills per L1I cache access",
+                       d_ratio(l1i_refill, l1i_access), "100%"),
+            ]),
+        ]),
+        MetricGroup("lpm_topdown_be_bound", [
+            MetricGroup("lpm_topdown_be_dtlb", [
+                Metric("lpm_topdown_be_dtlb_walks", "Dtlb walks per instruction",
+                       d_ratio(dtlb_walk, ins_ret), "walk/insn"),
+                Metric("lpm_topdown_be_dtlb_walk_rate", "Dtlb walks per L1D TLB access",
+                       d_ratio(dtlb_walk, l1d_tlb), "100%") if l1d_tlb else None,
+            ]) if dtlb_walk else None,
+            MetricGroup("lpm_topdown_be_mix", [
+                Metric("lpm_topdown_be_mix_ld", "Percentage of load instructions",
+                       d_ratio(ld_spec, inst_spec), "100%") if ld_spec else None,
+                Metric("lpm_topdown_be_mix_st", "Percentage of store instructions",
+                       d_ratio(st_spec, inst_spec), "100%") if st_spec else None,
+                Metric("lpm_topdown_be_mix_simd", "Percentage of SIMD instructions",
+                       d_ratio(ase_spec, inst_spec), "100%") if ase_spec else None,
+                Metric("lpm_topdown_be_mix_fp",
+                       "Percentage of floating point instructions",
+                       d_ratio(vfp_spec, inst_spec), "100%") if vfp_spec else None,
+                Metric("lpm_topdown_be_mix_dp",
+                       "Percentage of data processing instructions",
+                       d_ratio(dp_spec, inst_spec), "100%") if dp_spec else None,
+                Metric("lpm_topdown_be_mix_crypto",
+                       "Percentage of crypto instructions",
+                       d_ratio(crypto_spec, inst_spec), "100%") if crypto_spec else None,
+                Metric(
+                    "lpm_topdown_be_mix_br", "Percentage of branch instructions",
+                    d_ratio(br_immed_spec + br_indirect_spec + br_ret_spec,
+                            inst_spec), "100%") if br_immed_spec and br_indirect_spec and br_ret_spec else None,
+            ], description="Breakdown of instructions by type. Counts include both useful and wasted speculative instructions"
+            ) if inst_spec else None,
+            MetricGroup("lpm_topdown_be_dcache", [
+                MetricGroup("lpm_topdown_be_dcache_l1", [
+                    Metric("lpm_topdown_be_dcache_l1_per_insn",
+                           "L1D cache refills per instruction",
+                           d_ratio(l1d_refill, ins_ret), "refills/insn"),
+                    Metric("lpm_topdown_be_dcache_l1_miss_rate",
+                           "L1D cache refills per L1D cache access",
+                           d_ratio(l1d_refill, l1d_access), "100%")
+                ]),
+                MetricGroup("lpm_topdown_be_dcache_l2", [
+                    Metric("lpm_topdown_be_dcache_l2_per_insn",
+                           "L2D cache refills per instruction",
+                           d_ratio(l2d_refill, ins_ret), "refills/insn"),
+                    Metric("lpm_topdown_be_dcache_l2_miss_rate",
+                           "L2D cache refills per L2D cache access",
+                           d_ratio(l2d_refill, l2d_access), "100%")
+                ]),
+                MetricGroup("lpm_topdown_be_dcache_llc", [
+                    Metric("lpm_topdown_be_dcache_llc_per_insn",
+                           "Last level cache misses per instruction",
+                           d_ratio(llc_miss_rd, ins_ret), "miss/insn"),
+                    Metric("lpm_topdown_be_dcache_llc_miss_rate",
+                           "Last level cache misses per last level cache access",
+                           d_ratio(llc_miss_rd, llc_access), "100%")
+                ]) if llc_miss_rd and llc_access else None,
+            ]),
+        ]),
+    ])
+
+
 def main() -> None:
     global _args
 
@@ -34,7 +171,9 @@ def main() -> None:
     directory = f"{_args.events_path}/arm64/{_args.vendor}/{_args.model}/"
     LoadEvents(directory)
 
-    all_metrics = MetricGroup("", [])
+    all_metrics = MetricGroup("", [
+        Arm64Topdown(),
+    ])
 
     if _args.metricgroups:
         print(JsonEncodeMetricGroupDescriptions(all_metrics))
-- 
2.52.0.158.g65b55ccf14-goog
Re: [PATCH v9 46/48] perf jevents: Add collection of topdown like metrics for arm64
Posted by James Clark 1 day, 8 hours ago

On 02/12/2025 5:50 pm, Ian Rogers wrote:
> Metrics are created using legacy, common and recommended events. As
> events may be missing a TryEvent function will give None if an event
> is missing. To workaround missing JSON events for cortex-a53, sysfs
> encodings are used.
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
> An earlier review of this patch by Leo Yan is here:
> https://lore.kernel.org/lkml/8168c713-005c-4fd9-a928-66763dab746a@arm.com/
> Hopefully all corrections were made.
> ---
>   tools/perf/pmu-events/arm64_metrics.py | 145 ++++++++++++++++++++++++-
>   1 file changed, 142 insertions(+), 3 deletions(-)
> 
[...]
> +        MetricGroup("lpm_topdown_be_bound", [
> +            MetricGroup("lpm_topdown_be_dtlb", [
> +                Metric("lpm_topdown_be_dtlb_walks", "Dtlb walks per instruction",
> +                       d_ratio(dtlb_walk, ins_ret), "walk/insn"),
> +                Metric("lpm_topdown_be_dtlb_walk_rate", "Dtlb walks per L1D TLB access",
> +                       d_ratio(dtlb_walk, l1d_tlb) if l1d_tlb else None, "100%"),
> +            ]) if dtlb_walk else None,
> +            MetricGroup("lpm_topdown_be_mix", [
> +                Metric("lpm_topdown_be_mix_ld", "Percentage of load instructions",
> +                       d_ratio(ld_spec, inst_spec), "100%") if ld_spec else None,
> +                Metric("lpm_topdown_be_mix_st", "Percentage of store instructions",
> +                       d_ratio(st_spec, inst_spec), "100%") if st_spec else None,
> +                Metric("lpm_topdown_be_mix_simd", "Percentage of SIMD instructions",
> +                       d_ratio(ase_spec, inst_spec), "100%") if ase_spec else None,
> +                Metric("lpm_topdown_be_mix_fp",
> +                       "Percentage of floating point instructions",
> +                       d_ratio(vfp_spec, inst_spec), "100%") if vfp_spec else None,
> +                Metric("lpm_topdown_be_mix_dp",
> +                       "Percentage of data processing instructions",
> +                       d_ratio(dp_spec, inst_spec), "100%") if dp_spec else None,
> +                Metric("lpm_topdown_be_mix_crypto",
> +                       "Percentage of data processing instructions",
> +                       d_ratio(crypto_spec, inst_spec), "100%") if crypto_spec else None,
> +                Metric(
> +                    "lpm_topdown_be_mix_br", "Percentage of branch instructions",
> +                    d_ratio(br_immed_spec + br_indirect_spec + br_ret_spec,
> +                            inst_spec), "100%") if br_immed_spec and br_indirect_spec and br_ret_spec else None,

Hi Ian,

I've been trying to engage with the team that's publishing the metrics 
in Arm [1] to see if there was any chance of getting some unity between 
these new metrics and their existing json ones. The feedback from them 
was that the decision to only publish metrics for certain cores is 
deliberate and there is no plan to change anything. The metrics there 
are well tested, known to be working, and usually contain workarounds 
for specific issues. They don't want to do "Arm wide" common metrics for 
existing cores as they believe it has more potential to mislead people 
than help.

I'm commenting on "lpm_topdown_be_mix_br" as one example: the
equivalent Arm metric "branch_percentage" excludes br_ret_spec because
br_indirect_spec also counts returns. On neoverse-n3 it's instead
"PC_WRITE_SPEC / INST_SPEC".
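
To illustrate the double counting with made-up numbers, here is a rough
sketch. The branch_mix function and all of the counts are hypothetical
and are not taken from either metric set; it only assumes, as described
above, that the indirect branch count already includes returns.

```python
def branch_mix(br_immed: int, br_indirect: int, br_return: int,
               inst_spec: int, add_returns: bool) -> float:
    """Fraction of speculated instructions that are branches."""
    branches = br_immed + br_indirect
    if add_returns:
        # If the indirect count already includes returns, adding the
        # return count again counts every return twice.
        branches += br_return
    return branches / inst_spec


# Hypothetical counts: 100 immediate branches, 40 indirect branches of
# which 30 are returns, out of 1000 speculated instructions.
double_counted = branch_mix(100, 40, 30, 1000, add_returns=True)
returns_once = branch_mix(100, 40, 30, 1000, add_returns=False)
```

With these numbers the first form reports 17% branches while the second
reports 14%, so the gap grows with the share of returns in the workload.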

I see that you've prefixed all the metrics so the names won't clash, following 
Kan's feedback [2]. But it makes me wonder if at some point some kind of 
alias list could be implemented to override the generated metrics with 
hand written json ones. But by that point why not just use the same 
names? The Arm metric team's feedback was that there isn't really an 
industry standard for naming, and that differences between architectures 
would make it almost impossible to standardise anyway in their opinion. 
But here we're adding duplicate metrics with different names, where the 
new ones are known to have issues. It's not a great user experience IMO, 
but at the same time missing old cores from the Arm metrics isn't a 
great user experience either. I actually don't have a solution, other 
than to say I tried to get them to consider more unified naming.

I also have to say that I do still agree with Andi's old feedback [3] 
that the existing json was good enough, and maybe this isn't the right 
direction, although it's not very useful feedback at this point. I 
thought I had replied to that thread long ago, but must not have pressed 
send, sorry about that.

[1]: 
https://gitlab.arm.com/telemetry-solution/telemetry-solution/-/tree/main/data
[2]: 
https://lore.kernel.org/lkml/43548903-b7c8-47c4-b1da-0258293ecbd4@linux.intel.com
[3]: https://lore.kernel.org/lkml/ZeJJyCmXO9GxpDiF@tassilo/
Re: [PATCH v9 46/48] perf jevents: Add collection of topdown like metrics for arm64
Posted by Ian Rogers 22 hours ago
On Tue, Dec 9, 2025 at 3:31 AM James Clark <james.clark@linaro.org> wrote:
>
> On 02/12/2025 5:50 pm, Ian Rogers wrote:
> > Metrics are created using legacy, common and recommended events. As
> > events may be missing a TryEvent function will give None if an event
> > is missing. To workaround missing JSON events for cortex-a53, sysfs
> > encodings are used.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> > An earlier review of this patch by Leo Yan is here:
> > https://lore.kernel.org/lkml/8168c713-005c-4fd9-a928-66763dab746a@arm.com/
> > Hopefully all corrections were made.
> > ---
> >   tools/perf/pmu-events/arm64_metrics.py | 145 ++++++++++++++++++++++++-
> >   1 file changed, 142 insertions(+), 3 deletions(-)
> >
> [...]
> > +        MetricGroup("lpm_topdown_be_bound", [
> > +            MetricGroup("lpm_topdown_be_dtlb", [
> > +                Metric("lpm_topdown_be_dtlb_walks", "Dtlb walks per instruction",
> > +                       d_ratio(dtlb_walk, ins_ret), "walk/insn"),
> > +                Metric("lpm_topdown_be_dtlb_walk_rate", "Dtlb walks per L1D TLB access",
> > +                       d_ratio(dtlb_walk, l1d_tlb) if l1d_tlb else None, "100%"),
> > +            ]) if dtlb_walk else None,
> > +            MetricGroup("lpm_topdown_be_mix", [
> > +                Metric("lpm_topdown_be_mix_ld", "Percentage of load instructions",
> > +                       d_ratio(ld_spec, inst_spec), "100%") if ld_spec else None,
> > +                Metric("lpm_topdown_be_mix_st", "Percentage of store instructions",
> > +                       d_ratio(st_spec, inst_spec), "100%") if st_spec else None,
> > +                Metric("lpm_topdown_be_mix_simd", "Percentage of SIMD instructions",
> > +                       d_ratio(ase_spec, inst_spec), "100%") if ase_spec else None,
> > +                Metric("lpm_topdown_be_mix_fp",
> > +                       "Percentage of floating point instructions",
> > +                       d_ratio(vfp_spec, inst_spec), "100%") if vfp_spec else None,
> > +                Metric("lpm_topdown_be_mix_dp",
> > +                       "Percentage of data processing instructions",
> > +                       d_ratio(dp_spec, inst_spec), "100%") if dp_spec else None,
> > +                Metric("lpm_topdown_be_mix_crypto",
> > +                       "Percentage of data processing instructions",
> > +                       d_ratio(crypto_spec, inst_spec), "100%") if crypto_spec else None,
> > +                Metric(
> > +                    "lpm_topdown_be_mix_br", "Percentage of branch instructions",
> > +                    d_ratio(br_immed_spec + br_indirect_spec + br_ret_spec,
> > +                            inst_spec), "100%") if br_immed_spec and br_indirect_spec and br_ret_spec else None,
>
> Hi Ian,
>
> I've been trying to engage with the team that's publishing the metrics
> in Arm [1] to see if there was any chance in getting some unity between
> these new metrics and their existing json ones. The feedback from them
> was that the decision to only publish metrics for certain cores is
> deliberate and there is no plan to change anything. The metrics there
> are well tested, known to be working, and usually contain workarounds
> for specific issues. They don't want to do "Arm wide" common metrics for
> existing cores as they believe it has more potential to mislead people
> than help.

So this is sad, but I'll drop the patch from the series so as not to
delay things and keep carrying it in Google's tree. Just looking in
tools/perf/pmu-events/arch/arm64/arm there are 20 ARM models of which
only neoverse models (5 of the 20) have metrics. Could ARM's metric
people step up to fill the void? Models like cortex-a76 are actively
sold in the Raspberry Pi 5 and yet lack metrics.

I think there has to be a rule at some point of "don't let perfect be
the enemy of good." There's no implication that ARM should maintain
these metrics or that they must be perfect, just as there is no
implication that ARM should maintain the legacy metrics like
"stalled_cycles_per_instruction":
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common/metrics.json?h=perf-tools-next#n49

I'm guessing the cycles breakdown:
https://lore.kernel.org/lkml/20251202175043.623597-48-irogers@google.com/
is okay and will keep that for ARM.

> I'm commenting on this "lpm_topdown_be_mix_br" as one example, that the
> equivalent Arm metric "branch_percentage" excludes br_ret_spec because
> br_indirect_spec also counts returns. Or on neoverse-n3 it's
> "PC_WRITE_SPEC / INST_SPEC".

This is the value of upstreaming metrics like this: bugs get found and
fixed. This is what has happened with the AMD and Intel metrics. I'm
happy we can deliver more metrics to users on those CPUs.

> I see that you've prefixed all the metrics so the names won't clash from
> Kan's feedback [2]. But it makes me wonder if at some point some kind of
> alias list could be implemented to override the generated metrics with
> hand written json ones. But by that point why not just use the same
> names? The Arm metric team's feedback was that there isn't really an
> industry standard for naming, and that differences between architectures
> would make it almost impossible to standardise anyway in their opinion.

So naming is always a challenge. One solution here is the ilist
application. When doing the legacy event reorganization I remember you
arguing that legacy events should be the norm and not the exception,
but all the model quirks are precisely why that doesn't work. Working
through the quirks with cross platform metrics provides value to users
and isn't misleading. If a metric does mislead then that's a bug, let's
fix it. Presenting users with no data is neither a fix nor
particularly helpful.

> But here we're adding duplicate metrics with different names, where the
> new ones are known to have issues. It's not a great user experience IMO,
> but at the same time missing old cores from the Arm metrics isn't a
> great user experience either. I actually don't have a solution, other
> than to say I tried to get them to consider more unified naming.

So the lpm_ metrics are on top of whatever a vendor wants to add.
There is often more than one way to compute a metric, such as memory
controller counters vs l3 cache; on Intel an lpm_ metric may use
uncore counters while a tma_ metric uses the cache. I don't know if
sticking "ARM doesn't support this" in all the ARM lpm_ metric
descriptions would mitigate your metric creators' concerns; it is
something implied by Linux's licensing. We do highlight that metrics
containing experimental events, such as on Intel, should be considered
similarly experimental.

> I also have to say that I do still agree with Andi's old feedback [3]
> that the existing json was good enough, and maybe this isn't the right
> direction, although it's not very useful feedback at this point. I
> thought I had replied to that thread long ago, but must not have pressed
> send, sorry about that.

So writing long metrics by hand in json is horrid. Having been
there, I wouldn't want to be doing more of it. No comments, no line
breaks, huge potential for typos, peculiar rules on when commas are
allowed (so removing a line breaks parsing), ... This is why we have
make_legacy_cache.py
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/make_legacy_cache.py?h=perf-tools-next
which writes 1216 legacy cache event descriptions (7266 lines of json)
from 129 lines of python. I'm going to be on team python all day long.
In terms of the Linux build, I don't think there's a reasonable
alternative language.
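
As a rough illustration of the trade-off, a few lines of Python can
emit a family of near-identical json entries from a small table. This
is only a sketch: ratio_metric and MIX are made up for the example, the
expression string is simplified, and it is not how metric.py or
make_legacy_cache.py are implemented.

```python
import json


def ratio_metric(name: str, desc: str, num: str, den: str,
                 unit: str) -> dict:
    """Build one metric entry using field names from perf's metric json."""
    return {
        "MetricName": name,
        "BriefDescription": desc,
        # Simplified expression; a real metric would likely guard
        # against a zero denominator.
        "MetricExpr": f"{num} / {den}",
        "ScaleUnit": unit,
    }


# Instruction mix table: (name suffix, description, event name).
MIX = [
    ("ld", "Percentage of load instructions", "LD_SPEC"),
    ("st", "Percentage of store instructions", "ST_SPEC"),
    ("fp", "Percentage of floating point instructions", "VFP_SPEC"),
]

metrics = [ratio_metric(f"lpm_topdown_be_mix_{suffix}", desc, event,
                        "INST_SPEC", "100%")
           for suffix, desc, event in MIX]
# json.dumps takes care of commas, quoting and line breaks.
text = json.dumps(metrics, indent=4)
```

Adding a new instruction class is then a one line change to the table,
with comments allowed next to it.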

Thanks,
Ian


> [1]:
> https://gitlab.arm.com/telemetry-solution/telemetry-solution/-/tree/main/data
> [2]:
> https://lore.kernel.org/lkml/43548903-b7c8-47c4-b1da-0258293ecbd4@linux.intel.com
> [3]: https://lore.kernel.org/lkml/ZeJJyCmXO9GxpDiF@tassilo/
>