This RFC patch series introduces a complete architectural refactoring
to decouple and modularize the event and metric output printing
engine inside 'perf stat'.
======================
Background and Motivation
======================
Historically, 'perf stat' output printing was tightly coupled with
data collection, aggregation math, and shadow metrics calculation.
Formatting logic (Standard Console, CSV, and JSON) was scattered
across util/stat-display.c, featuring massive, complex switch-cases,
temporal adjacency assumptions, and duplicated layout logic. Adding
new metrics, uncore PMUs, or topology-aware CPU aggregation modes
frequently resulted in accidental layout regressions, broken field
counts in CSV linters, or parsing crashes.
This patch series decouples the data traversal and shadow-metric
calculations from the visual layout rendering, introducing a
modular, type-safe, callback-driven print architecture.
======================
Decoupled Printing Strategy
======================
1. Format-Agnostic Traversal Driver (util/stat-print.c)
The core display logic is abstracted into a generic traversal
driver, perf_stat__print_cb(). This driver manages the complex
CPU/thread/topology aggregation loops, resolves hybrid wildcard
merges, filters default skipped uncore metrics, and calculates
raw shadow metrics. Once the data points are prepared, the driver
streams them cleanly to formatting callbacks.
- Safety: `calculate_and_print_metric` exits early when a format
leaves the `print_metric` callback NULL, so formats may omit it.
2. Type-Safe Callbacks Interface (struct perf_stat_print_callbacks)
Output formats communicate with the driver using a clean
streaming interface:
- print_start(): Initializes format-private DOM states.
- print_event(): Buffers or prints raw counter event details.
- print_metric(): Buffers or prints calculated shadow metrics.
- print_end(): Finalizes rendering and cleans up structures.
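
To make the start/event/metric/end flow concrete, here is a minimal sketch of a
callback table and a driver that tolerates a NULL print_metric. Every name
prefixed with sketch_ or buf_, and every signature, is an assumption made for
illustration; the real struct perf_stat_print_callbacks in stat-print.h may be
shaped differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a counter event. */
struct sketch_event {
	const char *name;
	double count;
};

/* Hypothetical callback table; the real struct's signatures may differ. */
struct sketch_print_callbacks {
	void (*print_start)(void *priv);
	void (*print_event)(void *priv, const struct sketch_event *ev);
	void (*print_metric)(void *priv, const struct sketch_event *ev,
			     const char *unit, double value);
	void (*print_end)(void *priv);
};

/*
 * Driver: walks the events once and streams them to the callbacks.
 * Returns the number of metrics handed to print_metric().
 */
static int sketch_print_cb(const struct sketch_print_callbacks *cb, void *priv,
			   const struct sketch_event *evs, size_t nr,
			   double secs)
{
	int emitted = 0;
	size_t i;

	assert(cb != NULL);
	if (cb->print_start)
		cb->print_start(priv);
	for (i = 0; i < nr; i++) {
		if (cb->print_event)
			cb->print_event(priv, &evs[i]);
		/* Skip metric work when the format left print_metric NULL. */
		if (!cb->print_metric || secs <= 0.0)
			continue;
		cb->print_metric(priv, &evs[i], "/sec", evs[i].count / secs);
		emitted++;
	}
	if (cb->print_end)
		cb->print_end(priv);
	return emitted;
}

/* Example format: appends text to a fixed buffer passed as priv. */
struct sketch_buf {
	char text[256];
	size_t len;
};

static void buf_append(struct sketch_buf *b, int n)
{
	/* snprintf() returns the untruncated length; advance only if it fit. */
	if (n > 0 && (size_t)n < sizeof(b->text) - b->len)
		b->len += (size_t)n;
}

static void buf_event(void *priv, const struct sketch_event *ev)
{
	struct sketch_buf *b = priv;

	buf_append(b, snprintf(b->text + b->len, sizeof(b->text) - b->len,
			       "%s=%.0f;", ev->name, ev->count));
}

static void buf_metric(void *priv, const struct sketch_event *ev,
		       const char *unit, double value)
{
	struct sketch_buf *b = priv;

	buf_append(b, snprintf(b->text + b->len, sizeof(b->text) - b->len,
			       "%s%s=%.1f;", ev->name, unit, value));
}
```

A format that only sets print_event still works with this driver: the metric
step is skipped instead of calling through a NULL pointer.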
3. Format-Specific Rendering Engines:
- Standard Console (util/stat-print-std.c):
Buffers events and metrics into standard-private DOM lists.
It resolves default metric-group skipped headers, prepends
formatted interval timestamps, aligns rows dynamically using
aggr_header_lens, and prints them cleanly in print_end().
- Refinement: Handles the global `aggr_idx == -1` case by
initializing the tracked index to a `-2` sentinel, and
bounds-checks lookups so a negative index never reads outside
the mapping array. The active event pointer is reset when a zero
counter is skipped, so the temporal coupling check does not
report a false positive.
- CSV Printing (util/stat-print-csv.c):
Buffers events and metrics into format-private queues,
formatting rows separated by config->csv_sep. Metric
continuation rows are padded with exactly 4 separators so that
column counts stay valid.
- Refinement: Decoupled CSV headers now output static
structural labels (e.g. "cpu,", "die,") instead of live
hardware IDs. Header state persists across intervals, so the
header row is printed only once in interval mode.
- Streaming JSON Printing (util/stat-print-json.c):
Implements a streaming print engine that performs no heap
allocation and bypasses the dynamic queues and metric buffering
used by the other formats. JSON objects and interval keys are
formatted and written directly to the output file descriptor.
- Refinement: `json_metric_only_print_metric` streams strings
directly, with no `asprintf` or `strdup` calls on its fast path.
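
As an illustration of the allocation-free approach, the sketch below lets
fprintf() assemble a combined "event unit" key directly on the output stream
instead of building it first with asprintf(). The function names, the key
layout, and the value formatting are assumptions for this example, not the
code in stat-print-json.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write one JSON member whose key combines an event
 * name and a metric unit. The key is assembled by fprintf() itself, so no
 * temporary heap string is needed. Assumes the strings contain no
 * characters that need JSON escaping.
 */
static int sketch_json_member(FILE *out, const char *event, const char *unit,
			      double value, int first)
{
	return fprintf(out, "%s\"%s %s\" : \"%.2f\"",
		       first ? "" : ", ", event, unit, value);
}

/* Stream one object holding nr metrics; returns bytes written or -1. */
static int sketch_json_metric_line(FILE *out, const char *event,
				   const char *const *units,
				   const double *values, size_t nr)
{
	int total, n;
	size_t i;

	assert(out != NULL);
	total = fprintf(out, "{");
	if (total < 0)
		return -1;
	for (i = 0; i < nr; i++) {
		n = sketch_json_member(out, event, units[i], values[i], i == 0);
		if (n < 0)
			return -1;
		total += n;
	}
	n = fprintf(out, "}\n");
	return n < 0 ? -1 : total + n;
}
```

The trade-off is that nothing can be reordered or aligned after the fact,
which suits JSON but not the column-aligned console output.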
4. Centralized Aggregation Prefix Formatting
Duplicated CPU/thread aggregation prefix rendering is removed by
exposing the arrays globally and introducing shared generic
helpers in stat-print.c:
- perf_stat__get_aggr_key(): Resolves the JSON key name.
- perf_stat__get_aggr_id_char(): Resolves the unified prefix.
Because every format calls the same helpers, aggregation prefixes
stay structurally and visually consistent across all formats.
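
The idea behind the shared helpers can be sketched as a single name table that
both a CSV label and a JSON key are derived from. The enum, the table, and the
function names below are hypothetical and cover only a few modes; they are not
the signatures of perf_stat__get_aggr_key() or perf_stat__get_aggr_id_char().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical subset of aggregation modes, for illustration only. */
enum sketch_aggr_mode {
	SKETCH_AGGR_GLOBAL,
	SKETCH_AGGR_SOCKET,
	SKETCH_AGGR_DIE,
	SKETCH_AGGR_CORE,
	SKETCH_AGGR_CPU,
	SKETCH_AGGR_MAX,
};

/* Single source of truth for the names; GLOBAL has no prefix (NULL). */
static const char *const sketch_aggr_keys[SKETCH_AGGR_MAX] = {
	[SKETCH_AGGR_SOCKET] = "socket",
	[SKETCH_AGGR_DIE]    = "die",
	[SKETCH_AGGR_CORE]   = "core",
	[SKETCH_AGGR_CPU]    = "cpu",
};

static const char *sketch_get_aggr_key(enum sketch_aggr_mode mode)
{
	if ((unsigned int)mode >= (unsigned int)SKETCH_AGGR_MAX)
		return NULL;
	return sketch_aggr_keys[mode];
}

/* CSV header label such as "die,"; returns its length, or 0 if none. */
static int sketch_csv_label(char *buf, size_t sz,
			    enum sketch_aggr_mode mode, char sep)
{
	const char *key = sketch_get_aggr_key(mode);

	assert(buf != NULL && sz > 0);
	buf[0] = '\0';
	return key ? snprintf(buf, sz, "%s%c", key, sep) : 0;
}

/* JSON key prefix such as "\"die\" : "; returns its length, or 0 if none. */
static int sketch_json_key(char *buf, size_t sz, enum sketch_aggr_mode mode)
{
	const char *key = sketch_get_aggr_key(mode);

	assert(buf != NULL && sz > 0);
	buf[0] = '\0';
	return key ? snprintf(buf, sz, "\"%s\" : ", key) : 0;
}
```

With one table, renaming a mode changes every format at once, so the formats
cannot drift apart.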
5. Temporal Coupling Sanity Checks
The buffering engines rely on a strict temporal coupling
constraint: the traversal driver always invokes print_metric()
synchronously and consecutively for the same PMU/event node,
immediately after that node's print_event() callback. Both the STD
and CSV engines enforce this with a runtime evsel matching check:

    if (evsel != ps->current_event->evsel) abort_print();
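
A minimal sketch of how such a check can sit alongside the "reset on skipped
zero counter" behaviour follows. The types and function names are
hypothetical, and treating a metric that arrives after a skipped event as
silently dropped is an assumption of this sketch, not something the cover
letter states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the engine's private state. */
struct sketch_evsel {
	const char *name;
};

struct sketch_pending_event {
	const struct sketch_evsel *evsel;
	int nr_metrics;
};

struct sketch_print_state {
	struct sketch_pending_event *current_event;
	bool aborted;
};

/* Record the event that following metrics must belong to. */
static void sketch_print_event(struct sketch_print_state *ps,
			       struct sketch_pending_event *slot,
			       const struct sketch_evsel *evsel, bool skipped)
{
	assert(ps != NULL && slot != NULL);
	if (skipped) {
		/* Nothing was buffered, so no event is active any more. */
		ps->current_event = NULL;
		return;
	}
	slot->evsel = evsel;
	slot->nr_metrics = 0;
	ps->current_event = slot;
}

/* Returns 1 if attached, 0 if dropped (no active event), -1 on violation. */
static int sketch_print_metric(struct sketch_print_state *ps,
			       const struct sketch_evsel *evsel)
{
	if (!ps->current_event)
		return 0;
	if (evsel != ps->current_event->evsel) {
		ps->aborted = true; /* stands in for abort_print() */
		return -1;
	}
	ps->current_event->nr_metrics++;
	return 1;
}
```

Without the reset in the skipped path, a metric for the skipped event would be
compared against the previous event and flagged as a violation.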
======================
Verification and Testing
======================
All automated shell linters (stat+std_output.sh, stat+csv_output.sh,
stat+json_output.sh) have been extended to run their entire
aggregation suites a second time under the new printer flag
(--new), and all pass. The PMU metrics value Python validation
script and stat_metrics_values.sh have also been extended with
--new testing to validate the calculated metric values.
- Test Quality: JSON linter checks define an `api_label` so that
log output distinguishes the legacy pass from the `--new` pass.
======================
Changes since v1:
======================
- calculate_and_print_metric: added safe print_metric NULL callback check.
- should_skip_zero_counter: added safe aggr_idx bounds check to avoid
out-of-bounds mapping array access when aggr_idx is negative.
- std_print_event: reset ps->current_event pointer on skipped zero counters
to avoid temporal coupling mismatch violations.
- std_metric_only_print_end: only print metric headers once in
interval mode, and print dynamic spacing padding to perfectly
align columns.
- csv_metric_only_print_end: only print CSV headers once in
interval mode, print static aggregation labels instead of live
hardware IDs, and fix column misalignment under AGGR_GLOBAL by
initializing current_aggr to -2 sentinel.
- json_metric_only_print_metric: stream combined keys directly so the
fast path performs no heap string allocation, and resolve AGGR_GLOBAL
indices by initializing last_aggr_idx to -2.
- stat+json_output.sh: define a dynamic api_label so that output logs
distinguish the legacy pass from the --new pass.
- merged duplicate skip_test block structures inside linter shell scripts.
- documented -2 sentinel choices as C comments inside standard, CSV,
and JSON print engines.
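
The reason for a -2 sentinel can be shown with a small sketch: because -1 is a
valid aggr_idx under global aggregation, a tracker initialized to -1 would
treat the first global row as already seen. The names below are hypothetical;
only the -2 value and the negative-index bounds check come from the changes
listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* -1 is a valid index (global aggregation), so "unset" needs another value. */
#define SKETCH_AGGR_IDX_UNSET (-2)

struct sketch_row_tracker {
	int last_aggr_idx;
	int rows;
};

static void sketch_tracker_init(struct sketch_row_tracker *t)
{
	assert(t != NULL);
	t->last_aggr_idx = SKETCH_AGGR_IDX_UNSET;
	t->rows = 0;
}

/* Returns true when aggr_idx starts a new output row. */
static bool sketch_tracker_advance(struct sketch_row_tracker *t, int aggr_idx)
{
	if (aggr_idx == t->last_aggr_idx)
		return false;
	t->last_aggr_idx = aggr_idx;
	t->rows++;
	return true;
}

/* Bounds-checked lookup: negative or too-large indices yield fallback. */
static int sketch_map_lookup(const int *map, int nr, int aggr_idx,
			     int fallback)
{
	if (aggr_idx < 0 || aggr_idx >= nr)
		return fallback;
	return map[aggr_idx];
}
```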
We would greatly appreciate reviews, comments, and feedback on this
decoupled output printing strategy.
Assisted-by: Antigravity:gemini-3.5-flash
***
Ian Rogers (14):
perf stat: Introduce core generic print traversal engine and header
stubs
perf stat: Implement standard console (STD) formatting callbacks
perf stat: Extend STD output linter to test basic New API checks
perf stat: Extend STD output linter to test core aggregation checks
perf stat: Extend STD output linter to test advanced PMU checks
perf stat: Extend STD output linter to test metric-only checks
perf stat: Implement CSV formatting callbacks
perf stat: Extend CSV output linter to test core aggregation checks
perf stat: Extend CSV output linter to test advanced PMU and
metric-only checks
perf stat: Implement streaming JSON formatting callbacks
perf stat: Extend JSON output linter to test core aggregation checks
perf stat: Extend JSON output linter to test advanced PMU and
metric-only checks
perf stat: Add --new support to PMU metrics Python validator
perf stat: Extend PMU metrics value linter to validate --new outputs
tools/perf/builtin-stat.c | 261 +++---
.../tests/shell/lib/perf_metric_validation.py | 12 +-
tools/perf/tests/shell/stat+csv_output.sh | 19 +
tools/perf/tests/shell/stat+json_output.sh | 74 +-
tools/perf/tests/shell/stat+std_output.sh | 18 +
tools/perf/tests/shell/stat_metrics_values.sh | 13 +-
tools/perf/util/Build | 4 +
tools/perf/util/stat-display.c | 28 +-
tools/perf/util/stat-print-csv.c | 534 ++++++++++++
tools/perf/util/stat-print-json.c | 330 ++++++++
tools/perf/util/stat-print-std.c | 773 ++++++++++++++++++
tools/perf/util/stat-print.c | 490 +++++++++++
tools/perf/util/stat-print.h | 133 +++
tools/perf/util/stat.h | 2 +
14 files changed, 2519 insertions(+), 172 deletions(-)
create mode 100644 tools/perf/util/stat-print-csv.c
create mode 100644 tools/perf/util/stat-print-json.c
create mode 100644 tools/perf/util/stat-print-std.c
create mode 100644 tools/perf/util/stat-print.c
create mode 100644 tools/perf/util/stat-print.h
--
2.54.0.794.g4f17f83d09-goog
Hi Ian,

On Mon, May 25, 2026 at 4:19 PM Ian Rogers <irogers@google.com> wrote:
>
> This RFC patch series introduces a complete architectural refactoring
> to decouple and modularize the event and metric output printing
> engine inside 'perf stat'.

I really like this change. Historically, fixing printing format issues
in perf stat has been painful due to the tight coupling of printing
logic with aggregation and math in util/stat-display.c. Decoupling
these logics makes the codebase much easier to maintain and simplifies
future changes to the print format.

Acked-by: Chun-Tse Shao <ctshao@google.com>