[RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1)

Namhyung Kim posted 19 patches 3 years, 5 months ago
Hello,

Current perf stat code is somewhat hard to follow since it handles
many combinations of PMUs/events for the given display and aggregation
options.  This is my attempt to clean it up a little. ;-)

My first concern is that the aggregation and display routines are
intermixed and processed differently depending on the aggregation mode.
I'd like to separate them and make the logic clearer.

To do that, I added struct perf_stat_aggr to save the aggregated counter
values and other info.  It is allocated and filled according to the
aggr_mode, and the display logic then reads from it.

The code is available at 'perf/stat-aggr-v1' branch in

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Thanks,
Namhyung


Namhyung Kim (19):
  perf tools: Save evsel->pmu in parse_events()
  perf tools: Use pmu info in evsel__is_hybrid()
  perf stat: Use evsel__is_hybrid() more
  perf stat: Add aggr id for global mode
  perf stat: Add cpu aggr id for no aggregation mode
  perf stat: Add 'needs_sort' argument to cpu_aggr_map__new()
  perf stat: Add struct perf_stat_aggr to perf_stat_evsel
  perf stat: Allocate evsel->stats->aggr properly
  perf stat: Aggregate events using evsel->stats->aggr
  perf stat: Aggregate per-thread stats using evsel->stats->aggr
  perf stat: Allocate aggr counts for recorded data
  perf stat: Reset aggr counts for each interval
  perf stat: Split process_counters()
  perf stat: Add perf_stat_merge_counters()
  perf stat: Add perf_stat_process_percore()
  perf stat: Add perf_stat_process_shadow_stats()
  perf stat: Display event stats using aggr counts
  perf stat: Display percore events properly
  perf stat: Remove unused perf_counts.aggr field

 tools/perf/builtin-script.c                   |   4 +-
 tools/perf/builtin-stat.c                     | 177 +++++--
 tools/perf/tests/parse-metric.c               |   2 +-
 tools/perf/tests/pmu-events.c                 |   2 +-
 tools/perf/util/counts.c                      |   1 -
 tools/perf/util/counts.h                      |   1 -
 tools/perf/util/cpumap.c                      |  16 +-
 tools/perf/util/cpumap.h                      |   8 +-
 tools/perf/util/evsel.c                       |  13 +-
 tools/perf/util/parse-events.c                |   1 +
 tools/perf/util/pmu.c                         |   4 +
 .../scripting-engines/trace-event-python.c    |   6 -
 tools/perf/util/stat-display.c                | 472 +++---------------
 tools/perf/util/stat.c                        | 383 +++++++++++---
 tools/perf/util/stat.h                        |  29 +-
 15 files changed, 590 insertions(+), 529 deletions(-)


base-commit: d79310700590b8b40d8c867012d6c899ea6fd505
-- 
2.38.0.rc1.362.ged0d419d3c-goog
Re: [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1)
Posted by Andi Kleen 3 years, 5 months ago
On 10/10/2022 10:35 PM, Namhyung Kim wrote:
> Hello,
>
> Current perf stat code is somewhat hard to follow since it handles
> many combinations of PMUs/events for the given display and aggregation
> options.  This is my attempt to clean it up a little. ;-)


My main concern would be subtle regressions, since there are so many 
different combinations and ways to travel through the code, and a lot of 
things are not covered by unit tests. When I worked on the code it was 
difficult to keep it all working. I assume you have some way to 
enumerate them all and test that the output is identical?

-Andi
Re: [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1)
Posted by Namhyung Kim 3 years, 5 months ago
Hi Andi,

On Mon, Oct 10, 2022 at 5:25 PM Andi Kleen <ak@linux.intel.com> wrote:
>
>
> On 10/10/2022 10:35 PM, Namhyung Kim wrote:
> > Hello,
> >
> > Current perf stat code is somewhat hard to follow since it handles
> > many combinations of PMUs/events for the given display and aggregation
> > options.  This is my attempt to clean it up a little. ;-)
>
>
> My main concern would be subtle regressions, since there are so many
> different combinations and ways to travel through the code, and a lot of
> things are not covered by unit tests. When I worked on the code it was
> difficult to keep it all working. I assume you have some way to
> enumerate them all and test that the output is identical?

Right, that's my concern too.

I have tested many combinations manually and checked if they
produced similar results.  But the problem is that I cannot test
all hardware, and more importantly it's hard to check
programmatically if the output is the same or not.  The numbers
vary on each run and sometimes fluctuate a lot.  I don't have
good test workloads whose results hold for every combination.

Any suggestions?

Thanks,
Namhyung
Re: [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1)
Posted by Andi Kleen 3 years, 5 months ago
>> My main concern would be subtle regressions, since there are so many
>> different combinations and ways to travel through the code, and a lot of
>> things are not covered by unit tests. When I worked on the code it was
>> difficult to keep it all working. I assume you have some way to
>> enumerate them all and test that the output is identical?
> Right, that's my concern too.
>
> I have tested many combinations manually and checked if they
> produced similar results.

I had a script to test many combinations, but I had to check the output 
manually.


> But the problem is that I cannot test
> all hardware, and more importantly it's hard to check
> programmatically if the output is the same or not.

You can use "dummy" or some software event (e.g. a probe on some syscall) to 
get stable numbers. I don't think we need to cover all hardware for the 
output options, since the different events should behave similarly, but we 
do need some coverage for the different aggregation modes. Or we could add 
some more tool events just for testing purposes; that would allow covering 
different core scopes etc. and would easily allow generating known counts.

-Andi
Re: [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1)
Posted by Namhyung Kim 3 years, 5 months ago
On Tue, Oct 11, 2022 at 4:57 AM Andi Kleen <ak@linux.intel.com> wrote:
>
>
> >> My main concern would be subtle regressions, since there are so many
> >> different combinations and ways to travel through the code, and a lot of
> >> things are not covered by unit tests. When I worked on the code it was
> >> difficult to keep it all working. I assume you have some way to
> >> enumerate them all and test that the output is identical?
> > Right, that's my concern too.
> >
> > I have tested many combinations manually and checked if they
> > produced similar results.
>
> I had a script to test many combinations, but had to check the output
> manually
>
>
> > But the problem is that I cannot test
> > all hardware, and more importantly it's hard to check
> > programmatically if the output is the same or not.
>
> You can use "dummy" or some software event (e.g. a probe on some syscall) to
> get stable numbers. I don't think we need to cover all hardware for the
> output options, since the different events should behave similarly, but we
> do need some coverage for the different aggregation modes. Or we could add
> some more tool events just for testing purposes; that would allow covering
> different core scopes etc. and would easily allow generating known counts.

Even if we can get a stable number, verifying the count for the different
aggregation modes still requires knowing the CPU topology.  Also I'm afraid
that CPU hotplug can affect the aggregation.

Thanks,
Namhyung
Re: [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1)
Posted by Ian Rogers 3 years, 5 months ago
On Mon, Oct 10, 2022 at 10:38 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hi Andi,
>
> On Mon, Oct 10, 2022 at 5:25 PM Andi Kleen <ak@linux.intel.com> wrote:
> >
> >
> > On 10/10/2022 10:35 PM, Namhyung Kim wrote:
> > > Hello,
> > >
> > > Current perf stat code is somewhat hard to follow since it handles
> > > many combinations of PMUs/events for the given display and aggregation
> > > options.  This is my attempt to clean it up a little. ;-)
> >
> >
> > My main concern would be subtle regressions, since there are so many
> > different combinations and ways to travel through the code, and a lot of
> > things are not covered by unit tests. When I worked on the code it was
> > difficult to keep it all working. I assume you have some way to
> > enumerate them all and test that the output is identical?
>
> Right, that's my concern too.
>
> I have tested many combinations manually and checked if they
> produced similar results.  But the problem is that I cannot test
> all hardware, and more importantly it's hard to check
> programmatically if the output is the same or not.  The numbers
> vary on each run and sometimes fluctuate a lot.  I don't have
> good test workloads whose results hold for every combination.
>
> Any suggestions?

I don't think there is anything clever we can do here. A few releases
ago summary mode was enabled by default. For CSV output this meant a
summary was printed at the bottom of perf stat, and importantly the
summary printout added a column to the left of all the other columns.
This caused some tool issues for us. We now have a test that CSV
output has a fixed number of columns. We added the CSV test because
the JSON output work reformatted the display code and it would be easy
to introduce a regression (in fact I did :-/ ). So my point is that
stat output can change and break things, and we've been doing this by
accident for a while now. This isn't a reason not to merge this
change.

I think the real fix here is for tools to stop using text or CSV
output and switch to the JSON output; that way the output isn't as
brittle, except for the keys we use. It isn't feasible for the perf tool
to stand still in case there is a script somewhere; we'll just accumulate
bugs and baggage. However, if someone has a script and they want to
enforce an output, all they need to do is stick a test on it (the
Beyonce principle, except s/ring/test/).

Thanks,
Ian
Re: [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1)
Posted by Namhyung Kim 3 years, 5 months ago
On Mon, Oct 10, 2022 at 11:14 PM Ian Rogers <irogers@google.com> wrote:
>
> On Mon, Oct 10, 2022 at 10:38 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hi Andi,
> >
> > On Mon, Oct 10, 2022 at 5:25 PM Andi Kleen <ak@linux.intel.com> wrote:
> > >
> > >
> > > On 10/10/2022 10:35 PM, Namhyung Kim wrote:
> > > > Hello,
> > > >
> > > > Current perf stat code is somewhat hard to follow since it handles
> > > > many combinations of PMUs/events for the given display and aggregation
> > > > options.  This is my attempt to clean it up a little. ;-)
> > >
> > >
> > > My main concern would be subtle regressions, since there are so many
> > > different combinations and ways to travel through the code, and a lot of
> > > things are not covered by unit tests. When I worked on the code it was
> > > difficult to keep it all working. I assume you have some way to
> > > enumerate them all and test that the output is identical?
> >
> > Right, that's my concern too.
> >
> > I have tested many combinations manually and checked if they
> > produced similar results.  But the problem is that I cannot test
> > all hardware, and more importantly it's hard to check
> > programmatically if the output is the same or not.  The numbers
> > vary on each run and sometimes fluctuate a lot.  I don't have
> > good test workloads whose results hold for every combination.
> >
> > Any suggestions?
>
> I don't think there is anything clever we can do here. A few releases
> ago summary mode was enabled by default. For CSV output this meant a
> summary was printed at the bottom of perf stat, and importantly the
> summary printout added a column to the left of all the other columns.
> This caused some tool issues for us. We now have a test that CSV
> output has a fixed number of columns. We added the CSV test because
> the JSON output work reformatted the display code and it would be easy
> to introduce a regression (in fact I did :-/ ). So my point is that
> stat output can change and break things, and we've been doing this by
> accident for a while now. This isn't a reason not to merge this
> change.
>
> I think the real fix here is for tools to stop using text or CSV
> output and switch to the JSON output; that way the output isn't as
> brittle, except for the keys we use. It isn't feasible for the perf tool
> to stand still in case there is a script somewhere; we'll just accumulate
> bugs and baggage. However, if someone has a script and they want to
> enforce an output, all they need to do is stick a test on it (the
> Beyonce principle, except s/ring/test/).

Thanks for your opinion.

I agree that it'd be better to use JSON output for machine processing.
Although there are records of historic perf stat breakages, it'd be nice
if we could avoid them for the default output mode. :)
Let me think about whether there's a better way.

Thanks,
Namhyung