[RFC PATCH v1 00/14] perf stat: Decouple and modularize metrics/events output printing API

Ian Rogers posted 14 patches 1 day, 20 hours ago
tools/perf/builtin-stat.c                     | 261 ++++---
.../tests/shell/lib/perf_metric_validation.py |  12 +-
tools/perf/tests/shell/stat+csv_output.sh     |  22 +
tools/perf/tests/shell/stat+json_output.sh    |  49 +-
tools/perf/tests/shell/stat+std_output.sh     |  21 +
tools/perf/tests/shell/stat_metrics_values.sh |  13 +-
tools/perf/util/Build                         |   4 +
tools/perf/util/stat-print-csv.c              | 527 +++++++++++++
tools/perf/util/stat-print-json.c             | 338 ++++++++
tools/perf/util/stat-print-std.c              | 739 ++++++++++++++++++
tools/perf/util/stat-print.c                  | 487 ++++++++++++
tools/perf/util/stat-print.h                  | 129 +++
12 files changed, 2456 insertions(+), 146 deletions(-)
create mode 100644 tools/perf/util/stat-print-csv.c
create mode 100644 tools/perf/util/stat-print-json.c
create mode 100644 tools/perf/util/stat-print-std.c
create mode 100644 tools/perf/util/stat-print.c
create mode 100644 tools/perf/util/stat-print.h
This RFC patch series introduces an architectural refactoring that decouples
and modularizes the event and metric output printing engine inside 'perf stat'.

======================
Background and Motivation
======================
Historically, 'perf stat' output printing was tightly coupled with data collection,
aggregation math, and metrics calculation. Formatting logic (Standard Console,
CSV, and JSON) was scattered across util/stat-display.c, featuring complex
switch-cases, temporal adjacency assumptions, and duplicated layout logic. Adding new
metrics, uncore PMUs, or topology-aware CPU aggregation modes could result in
accidental layout regressions. A recently reported regression was:
https://lore.kernel.org/linux-perf-users/20260513144906.557896-1-ak@linux.intel.com/

This patch series decouples the data traversal and shadow metric calculations from
the visual layout rendering, introducing a modular callback-driven print architecture.

======================
Decoupled Printing Strategy
======================
1. Format-Agnostic Traversal Driver (util/stat-print.c)
   The core display logic is abstracted into a generic traversal driver,
   perf_stat__print_cb(). This driver manages the CPU/thread/topology
   aggregation loops, resolves hybrid wildcard merges, filters default skipped uncore
   metrics, and calculates raw metrics. Once the data points are prepared,
   the driver streams them cleanly to the output formatting callbacks.

2. Type-Safe Callbacks Interface (struct perf_stat_print_callbacks)
   Output formats communicate with the driver using a streaming interface:
   - print_start(): Initializes format-private DOM states or buffers.
   - print_event(): Buffers or prints raw counter event details (val, ena, run).
   - print_metric(): Buffers or prints calculated shadow metric values and thresholds.
   - print_end(): Finalizes rendering, formats padding, and cleans up state structures.

3. Format-Specific Rendering Engines:
   - Standard Console (util/stat-print-std.c):
     Buffers events and nested metrics into standard-private DOM lists. It resolves
     default metric-group skipped headers, prepends formatted interval timestamps,
     aligns continuation rows dynamically using aggr_header_lens, and prints them
     cleanly in print_end().
   - CSV Printing (util/stat-print-csv.c):
     Buffers events and metrics into format-private queues, formatting rows
     separated by config->csv_sep. Corrects metric continuation padding to print
     exactly 4 separators, so that continuation rows keep a consistent column
     count.
   - Streaming JSON Printing (util/stat-print-json.c):
     Implements a fully streaming print engine that performs no heap allocation
     and bypasses the dynamic queues and metric buffering used by the other
     formats. JSON objects and interval keys are formatted and written directly
     to the output file descriptor.

4. Centralized Aggregation Prefix Formatting
   Duplication in CPU/thread aggregation prefix rendering is eliminated by
   introducing shared generic helpers in stat-print.c:
   - perf_stat__get_aggr_key(): Resolves the JSON key name.
   - perf_stat__get_aggr_id_char(): Resolves the unified aggregation ID string
     (e.g. "S0-D0-C0", "comm-pid").
   Because every format derives its prefix from the same helpers, the aggregation
   prefix stays consistent across all formats (the standard console pads
   dynamically, CSV splits by separator, JSON injects key-value pairs).

5. Temporal Coupling Sanity Checks
   The traversal driver always invokes the print_metric() callbacks synchronously
   and consecutively for a PMU/event node, immediately after that node's
   print_event() callback. This temporal coupling constraint is enforced by a
   runtime evsel matching check in both the STD and CSV print engines:
   if (evsel != ps->current_event->evsel) abort_print();
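
As a rough illustration of items 2 and 5 above, the following standalone sketch
shows how a callback table and the evsel matching check could fit together. It
is not code from the series: every demo_* name, all callback signatures, and
the row layout are assumptions made for this example, and struct demo_evsel is
a stand-in for perf's struct evsel.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for perf's struct evsel; only the name is used. */
struct demo_evsel {
	const char *name;
};

/* Assumed shape of the callback table; not the series' real signatures. */
struct demo_print_callbacks {
	void (*print_start)(void *priv);
	void (*print_event)(void *priv, const struct demo_evsel *evsel,
			    double val, double ena, double run);
	void (*print_metric)(void *priv, const struct demo_evsel *evsel,
			     const char *name, double value);
	void (*print_end)(void *priv);
};

/* Format-private state for a toy CSV-like printer. */
struct demo_csv_state {
	const struct demo_evsel *current_evsel; /* event the metrics belong to */
	const char *sep;
	int rows;
	bool aborted;
};

static void demo_csv_start(void *priv)
{
	struct demo_csv_state *ps = priv;

	ps->current_evsel = NULL;
	ps->rows = 0;
	ps->aborted = false;
}

static void demo_csv_event(void *priv, const struct demo_evsel *evsel,
			   double val, double ena, double run)
{
	struct demo_csv_state *ps = priv;

	if (ps->aborted)
		return;
	ps->current_evsel = evsel;
	/* Illustrative row layout: value, event name, running percentage. */
	printf("%.0f%s%s%s%.2f\n", val, ps->sep, evsel->name, ps->sep,
	       ena > 0 ? 100.0 * run / ena : 0.0);
	ps->rows++;
}

static void demo_csv_metric(void *priv, const struct demo_evsel *evsel,
			    const char *name, double value)
{
	struct demo_csv_state *ps = priv;

	/* Temporal coupling check: a metric must follow its own event. */
	if (ps->aborted || evsel != ps->current_evsel) {
		ps->aborted = true;
		return;
	}
	/* Illustrative continuation row: empty leading fields, then the metric. */
	printf("%s%s%.2f%s%s\n", ps->sep, ps->sep, value, ps->sep, name);
	ps->rows++;
}

static void demo_csv_end(void *priv)
{
	struct demo_csv_state *ps = priv;

	ps->current_evsel = NULL;
}

static const struct demo_print_callbacks demo_csv_callbacks = {
	.print_start  = demo_csv_start,
	.print_event  = demo_csv_event,
	.print_metric = demo_csv_metric,
	.print_end    = demo_csv_end,
};

/* Toy format-agnostic driver: streams one event followed by one metric. */
static void demo_print_cb(const struct demo_print_callbacks *cb, void *priv,
			  bool break_coupling)
{
	static const struct demo_evsel cycles = { "cycles" };
	static const struct demo_evsel instructions = { "instructions" };

	cb->print_start(priv);
	cb->print_event(priv, &instructions, 2000.0, 100.0, 100.0);
	/* Passing a different evsel simulates a driver that breaks the rule. */
	cb->print_metric(priv, break_coupling ? &cycles : &instructions,
			 "insn per cycle", 2.0);
	cb->print_end(priv);
}

/* Returns the number of rows printed, or -1 if the check aborted printing. */
int demo_run(bool break_coupling)
{
	struct demo_csv_state state = { .sep = "," };

	demo_print_cb(&demo_csv_callbacks, &state, break_coupling);
	return state.aborted ? -1 : state.rows;
}
```

Calling demo_run(false) from a main() prints one event row and one metric
continuation row and returns 2; demo_run(true) prints only the event row and
returns -1 because the metric arrives with a mismatched evsel. The driver only
sees the callback table and an opaque private pointer, which is the property
that lets each format keep its own buffering strategy.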

======================
Verification and Testing
======================
All automated shell linters (stat+std_output.sh, stat+csv_output.sh,
stat+json_output.sh) have been extended to run their entire aggregation suites a
second time under the new printer flag (--new), and all of them pass. The PMU
metrics value Python validation script and stat_metrics_values.sh have also been
extended with --new flag testing, to check that the calculated metric values
are correct.

All C and header files in the series pass scripts/checkpatch.pl with 0 errors.

======================
Next Steps
======================
Reviews, comments, and feedback on this decoupled output printing strategy
would be highly appreciated. Because tools are sensitive to output formatting,
the "--new" option can become the default only after sufficient use. We can
then look to remove stat-display.c and stat-shadow.c.

***

Ian Rogers (14):
  perf stat: Introduce core generic print traversal engine and header
    stubs
  perf stat: Implement standard console (STD) formatting callbacks
  perf stat: Extend STD output linter to test basic New API checks
  perf stat: Extend STD output linter to test core aggregation checks
  perf stat: Extend STD output linter to test advanced PMU checks
  perf stat: Extend STD output linter to test metric-only checks
  perf stat: Implement CSV formatting callbacks
  perf stat: Extend CSV output linter to test core aggregation checks
  perf stat: Extend CSV output linter to test advanced PMU and
    metric-only checks
  perf stat: Implement streaming JSON formatting callbacks
  perf stat: Extend JSON output linter to test core aggregation checks
  perf stat: Extend JSON output linter to test advanced PMU and
    metric-only checks
  perf stat: Add --new support to PMU metrics Python validator
  perf stat: Extend PMU metrics value linter to validate --new outputs

-- 
2.54.0.794.g4f17f83d09-goog