[PATCH v3 0/2] perf stat: Fix uncore metric scaling across aggregation modes

Chun-Tse Shao posted 2 patches 2 days, 22 hours ago
tools/perf/pmu-events/intel_metrics.py |  6 +++---
tools/perf/pmu-events/metric.py        |  9 +++++++--
tools/perf/util/expr.c                 | 26 ++++++++++++++++++++++----
tools/perf/util/expr.h                 |  6 +++++-
tools/perf/util/expr.l                 |  1 +
tools/perf/util/expr.y                 | 24 +++++++++++++++++-------
tools/perf/util/stat-shadow.c          |  6 +++++-
7 files changed, 60 insertions(+), 18 deletions(-)
Posted by Chun-Tse Shao 2 days, 22 hours ago
This series fixes a scaling issue for uncore metrics (such as
lpm_miss_lat) across different runtime aggregation modes.

Uncore metrics currently use `source_count` to scale events. However,
`source_count` returns the total uncore unit count regardless of the
selected aggregation mode. When a metric is evaluated in an aggregation
mode other than `--per-socket`, the aggregated uncore events are
therefore divided by the total uncore count rather than by the number
of uncore units belonging to the aggregation group, which produces
wrong metric results.

To fix this, we:
1. Introduce the aggr_nr() keyword to the metric parser, which
   dynamically resolves to the number of active units in the current
   aggregation group (`gr->nr`).

2. Update the Python metrics to use `aggr_nr` instead of `source_count`,
   ensuring correct scaling across all runtime aggregation modes.
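
The effect of the divisor can be illustrated with a toy model. This is
only a sketch: the socket layout, tick values, and every name below are
hypothetical and are not read from perf. `FIXED_COUNT` stands in for a
divisor that does not follow the aggregation mode, and its value is
assumed to match one socket so that only the per-socket rows come out
right. The per-group divisor plays the role that `aggr_nr` is described
as playing above.

```python
# Toy model only: hypothetical names and numbers, not perf internals.
SOCKETS = {"S0": 4, "S1": 4}   # hypothetical uncore units per socket
TICKS_PER_UNIT = 1_000         # hypothetical clockticks seen by each unit
FIXED_COUNT = 4                # assumed divisor that ignores the aggregation mode

# Aggregation groups: one global group, plus one group per socket.
GROUPS = {"global": ["S0", "S1"], "S0": ["S0"], "S1": ["S1"]}


def aggregated_ticks(members):
    """Sum of clockticks over every unit in the aggregation group."""
    return sum(SOCKETS[s] * TICKS_PER_UNIT for s in members)


def units_in_group(members):
    """Number of units in the group (the aggr_nr-like divisor)."""
    return sum(SOCKETS[s] for s in members)


# Dividing by a mode-independent count: only correct when the group
# happens to contain exactly FIXED_COUNT units.
fixed = {name: aggregated_ticks(m) / FIXED_COUNT for name, m in GROUPS.items()}

# Dividing by the units that actually belong to the group.
per_group = {name: aggregated_ticks(m) / units_in_group(m) for name, m in GROUPS.items()}

for name in GROUPS:
    print(f"{name:6s} fixed divisor: {fixed[name]:7.1f}  per-group divisor: {per_group[name]:7.1f}")
```

In this model the fixed divisor doubles the per-unit tick count in the
global group, and a metric that divides by ticks per unit would then be
roughly halved.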

Before the fix (incorrectly low latency in global mode):
  $ perf stat -M lpm_miss_lat --metric-only -a -j -- sleep 1
  {"ns  lpm_miss_lat_rem" : "122.8", "ns  lpm_miss_lat_loc" : "114.5"}
  $ perf stat -M lpm_miss_lat --per-socket --metric-only -a -j -- sleep 1
  {"socket" : "S0", "ns  lpm_miss_lat_rem" : "232.1", "ns  lpm_miss_lat_loc" : "278.2"}
  {"socket" : "S1", "ns  lpm_miss_lat_rem" : "233.9", "ns  lpm_miss_lat_loc" : "257.5"}

After the fix (correctly scaled latency in all aggregation modes):
  $ perf stat -M lpm_miss_lat --metric-only -a -j -- sleep 1
  {"ns  lpm_miss_lat_rem" : "231.7", "ns  lpm_miss_lat_loc" : "245.0"}
  $ perf stat -M lpm_miss_lat --per-socket --metric-only -a -j -- sleep 1
  {"socket" : "S0", "ns  lpm_miss_lat_rem" : "238.3", "ns  lpm_miss_lat_loc" : "249.4"}
  {"socket" : "S1", "ns  lpm_miss_lat_rem" : "259.1", "ns  lpm_miss_lat_loc" : "253.1"}
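
One rough way to compare the two modes is to parse the JSON lines above
and relate the global value to the mean of the per-socket values. The
sketch below does this for the output quoted in this cover letter. The
helper name is hypothetical, the runs are separate measurements, and the
per-socket mean is only a convenient reference point, not how perf
aggregates.

```python
import json

# Output lines copied from the "before" and "after" runs quoted above.
BEFORE = [
    '{"ns  lpm_miss_lat_rem" : "122.8", "ns  lpm_miss_lat_loc" : "114.5"}',
    '{"socket" : "S0", "ns  lpm_miss_lat_rem" : "232.1", "ns  lpm_miss_lat_loc" : "278.2"}',
    '{"socket" : "S1", "ns  lpm_miss_lat_rem" : "233.9", "ns  lpm_miss_lat_loc" : "257.5"}',
]
AFTER = [
    '{"ns  lpm_miss_lat_rem" : "231.7", "ns  lpm_miss_lat_loc" : "245.0"}',
    '{"socket" : "S0", "ns  lpm_miss_lat_rem" : "238.3", "ns  lpm_miss_lat_loc" : "249.4"}',
    '{"socket" : "S1", "ns  lpm_miss_lat_rem" : "259.1", "ns  lpm_miss_lat_loc" : "253.1"}',
]


def global_to_socket_ratio(lines, key):
    """Global value divided by the mean of the per-socket values (hypothetical helper)."""
    rows = [json.loads(line) for line in lines]
    global_vals = [float(r[key]) for r in rows if "socket" not in r]
    socket_vals = [float(r[key]) for r in rows if "socket" in r]
    return global_vals[0] / (sum(socket_vals) / len(socket_vals))


for key in ("ns  lpm_miss_lat_rem", "ns  lpm_miss_lat_loc"):
    before = global_to_socket_ratio(BEFORE, key)
    after = global_to_socket_ratio(AFTER, key)
    print(f"{key}: before {before:.2f}, after {after:.2f}")
```

A ratio near 1 means the global figure is in line with the per-socket
figures, while a ratio near 0.5 on this two-socket output matches the
halved latency described above.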

v3:
  Fixed based on Sashiko review:
  - Removed the unnecessary `redefined-builtin` pylint-disable comment,
    which had been copied onto the `aggr_nr` definition in `metric.py`.

v2: lore.kernel.org/20260521035941.3860145-1-ctshao@google.com
  Fixed based on Sashiko review:
  - Fixed `aggr_nr` setting when an uncore event fails to run
    (counts.run == 0) to explicitly set it to 0 instead of defaulting to
    1.
  - Accumulated `aggr_nr` when multiple unmerged PMU events are
    associated with the same metric ID to prevent incorrect scaling
    across active sockets.
  - Removed unused `List` import from `typing` in `intel_metrics.py`.

v1: lore.kernel.org/20260520180032.3045144-1-ctshao@google.com

Chun-Tse Shao (2):
  perf stat: Add aggr_nr metric parser support
  perf stat: Use aggr_nr scaling for Intel uncore miss latency metrics

--
2.54.0.746.g67dd491aae-goog