[PATCH v4 00/10] perf stat fixes and improvements

Posted by Ian Rogers 2 months, 3 weeks ago
A collection of fixes that aim to stabilize measurements/metrics such
as memory bandwidth and make them more reasonable.

Tool events previously always got a PMU cpu map of all online CPUs;
they now get either CPU 0 or all online CPUs. This avoids iterating
over useless CPUs for events, in particular `duration_time`. Fix a bug
where duration_time didn't correctly use the previous raw counts and
would skip values in interval mode.
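
To illustrate the interval-mode point above, here is a minimal sketch of
computing a time event's per-interval value as the difference from the
previous raw read. The struct and function names are hypothetical and
are not taken from tool_pmu.c; the only idea borrowed from the series is
"use the old count".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a time counter; not a perf structure. */
struct time_counter {
	uint64_t prev_raw_ns;	/* raw value seen at the previous read */
};

/*
 * Return the time elapsed since the previous read and remember the
 * new raw value, so that every interval reports only its own share.
 */
static uint64_t interval_delta_ns(struct time_counter *c, uint64_t now_raw_ns)
{
	uint64_t delta;

	/* Assumes a monotonic time source. */
	assert(now_raw_ns >= c->prev_raw_ns);

	delta = now_raw_ns - c->prev_raw_ns;
	c->prev_raw_ns = now_raw_ns;
	return delta;
}
```

If the previous raw value were not updated on every read, later
intervals would report time that earlier intervals had already
counted.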

Change how JSON metrics handle tool events: use the counter value
rather than state shared with perf stat. A later patch changes it so
that tool events are read last, so that when reading, say, memory
bandwidth counters you don't divide by an earlier read time and exceed
the theoretical maximum memory bandwidth.
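
The read-order argument above can be checked with a toy model. This is
only a sketch under stated assumptions: a hypothetical device that moves
a constant PEAK_BYTES_PER_NS bytes every nanosecond, and a fixed lag
between the two reads. None of the names below exist in perf.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed peak rate of the hypothetical device. */
#define PEAK_BYTES_PER_NS 10ULL

/* Bytes the device has moved by time t_ns, at a constant peak rate. */
static uint64_t bytes_moved_at(uint64_t t_ns)
{
	return PEAK_BYTES_PER_NS * t_ns;
}

static double bytes_per_ns(uint64_t bytes, uint64_t duration_ns)
{
	assert(duration_ns > 0);
	return (double)bytes / (double)duration_ns;
}

/* Time is read at t_ns, the byte counter lag_ns later. */
static double rate_time_read_first(uint64_t t_ns, uint64_t lag_ns)
{
	return bytes_per_ns(bytes_moved_at(t_ns + lag_ns), t_ns);
}

/* The byte counter is read at t_ns, time lag_ns later. */
static double rate_time_read_last(uint64_t t_ns, uint64_t lag_ns)
{
	return bytes_per_ns(bytes_moved_at(t_ns), t_ns + lag_ns);
}
```

In this model, reading time first divides a larger byte count by a
shorter duration and so can report more than the peak rate, while
reading time last can only under-report.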

Do some cleanup around the shared state in stat-shadow that's no
longer used.

Change how affinities work with evlist__for_each_cpu. Move the
affinity code into the iterator to simplify setting it up. Detect when
affinities will and won't be profitable, for example a tool event and
a regular perf event (or read group) may face less delay from a single
IPI for the event read than from a call to sched_setaffinity. Add a
--no-affinity flag to perf stat to allow affinities to be disabled.
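
As a rough illustration of the profitability idea above, the sketch
below decides whether to change affinity by counting how many reads
would each need a cross-CPU IPI. The rule "profitable only when more
than one IPI is avoided" is an assumption made for this example, not the
heuristic from the patch, and all type and function names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event description; not a perf structure. */
struct sketch_event {
	bool is_tool_event;	/* read in-process, so no IPI */
	bool is_group_member;	/* non-leader, read via its group leader */
};

/* Count the reads that would each trigger a cross-CPU IPI. */
static size_t ipi_reads(const struct sketch_event *evs, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (!evs[i].is_tool_event && !evs[i].is_group_member)
			count++;
	}
	return count;
}

/*
 * Assumed rule: moving to the target CPU pays off only when it
 * replaces more than one IPI, and never when the user opted out.
 */
static bool affinity_profitable(const struct sketch_event *evs, size_t n,
				bool no_affinity)
{
	if (no_affinity)
		return false;
	return ipi_reads(evs, n) > 1;
}
```

With one tool event and one regular event there is a single IPI to
avoid, so this rule leaves affinity alone; with two independent regular
events it opts for the affinity change.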

v4: Rebase. Add patch to reduce scope of walltime_nsecs_stats now that
    the legacy metric code is no more. Minor tweak to the ru_stats
    cleanup.

v3: Add affinity clean ups and read tool events last.
https://lore.kernel.org/lkml/20251106071241.141234-1-irogers@google.com/

v2: Fixed an aggregation index issue:
https://lore.kernel.org/lkml/20251104234148.3103176-2-irogers@google.com/

v1:
https://lore.kernel.org/lkml/20251104053449.1208800-1-irogers@google.com/

Ian Rogers (10):
  libperf cpumap: Reduce allocations and sorting in intersect
  perf pmu: perf_cpu_map__new_int to avoid parsing a string
  perf tool_pmu: Use old_count when computing count values for time
    events
  perf stat-shadow: Read tool events directly
  perf stat: Reduce scope of ru_stats
  perf stat: Reduce scope of walltime_nsecs_stats
  perf tool_pmu: More accurately set the cpus for tool events
  perf evlist: Reduce affinity use and move into iterator, fix no
    affinity
  perf stat: Read tool events last
  perf stat: Add no-affinity flag

 tools/lib/perf/cpumap.c                |  29 ++--
 tools/perf/Documentation/perf-stat.txt |   4 +
 tools/perf/builtin-stat.c              | 210 +++++++++++++++----------
 tools/perf/tests/parse-metric.c        |   2 -
 tools/perf/tests/pmu-events.c          |   2 -
 tools/perf/util/config.c               |   3 +-
 tools/perf/util/drm_pmu.c              |   2 +-
 tools/perf/util/evlist.c               | 156 +++++++++++-------
 tools/perf/util/evlist.h               |  27 +++-
 tools/perf/util/hwmon_pmu.c            |   2 +-
 tools/perf/util/parse-events.c         |   9 +-
 tools/perf/util/pmu.c                  |  12 ++
 tools/perf/util/pmu.h                  |   1 +
 tools/perf/util/stat-shadow.c          | 152 ++++++++----------
 tools/perf/util/stat.h                 |  23 ---
 tools/perf/util/tool_pmu.c             |  78 ++++++---
 tools/perf/util/tool_pmu.h             |   1 +
 17 files changed, 407 insertions(+), 306 deletions(-)

-- 
2.51.2.1041.gc1ab5b90ca-goog
Re: [PATCH v4 00/10] perf stat fixes and improvements
Posted by Namhyung Kim 2 months, 2 weeks ago
On Thu, Nov 13, 2025 at 10:05:06AM -0800, Ian Rogers wrote:
> A collection of fixes aiming to stabilize and make more reasonable
> measurements/metrics such as memory bandwidth.
> 
> Tool events are changed from getting a PMU cpu map of all online CPUs
> to either CPU 0 or all online CPUs. This avoids iterating over useless
> CPUs for events in particular `duration_time`. Fix a bug where
> duration_time didn't correctly use the previous raw counts and would
> skip values in interval mode.
> 
> Change how json metrics handle tool events. Use the counter value
> rather than using shared state with perf stat. A later patch changes
> it so that tool events are read last, so that if reading say memory
> bandwidth counters you don't divide by an earlier read time and exceed
> the theoretical maximum memory bandwidth.

[...]

Applied the first 7 patches to perf-tools-next, thanks!

Best regards,
Namhyung