[v4] Fix perf on Intel hybrid CPUs

[PATCH v4 00/44] Fix perf on Intel hybrid CPUs

Posted by Ian Rogers 2 years, 9 months ago

TL;DR: hybrid doesn't crash, json metrics work on hybrid on both PMUs
or individually, event parsing doesn't always scan all PMUs, more and
new tests that also run without hybrid, less code.

The first 4 patches are aimed at Linux 6.4 to address issues raised,
in particular by Kan, on the existing perf stat behavior with json
metrics. They avoid duplicated events by removing groups. They don't
hide events and metrics to make event multiplexing obvious. They avoid
terminating perf when paranoia is higher due to certain events that
always fail. They avoid rearranging events by PMUs when the events
aren't in a group.

The next 5 patches avoid grouping events for metrics where they could
never succeed and were previously posted as:
"perf vendor events intel: Add xxx metric constraints"
https://lore.kernel.org/all/20230419005423.343862-1-irogers@google.com/
In general the generated json is coming from:
https://github.com/intel/perfmon/pull/73

Next are some general and test improvements.

Next event parsing is rewritten to not scan all PMUs for the benefit
of raw and legacy cache parsing, instead these are handled by the
lexer and a new term type. This ultimately removes the need for the
event parser for hybrid to be recursive as legacy cache can be just a
term. Tests are re-enabled for events with hyphens, so AMD's
branch-brs event is now parsable.

The cputype option is made a generic pmu filter flag and is tested
even on non-hybrid systems.

The final patches address specific json metric issues on hybrid, in
both the json metrics and the metric code.

The patches add slightly more code than they remove, in areas like
better json metric constraints and tests, but in the core util code,
the removal of hybrid is a net reduction:
 22 files changed, 711 insertions(+), 1016 deletions(-)

Sample output is contained in the v1 patch set:
https://lore.kernel.org/lkml/bff481ba-e60a-763f-0aa0-3ee53302c480@linux.intel.com/

Tested on Tigerlake, Skylake and Alderlake CPUs.

The v4 patch set:
 - rebase, 1 of the Linux 6.4 recommended patches are merged leaving:
   1) perf metric: Change divide by zero and !support events behavior
   2) perf stat: Introduce skippable evsels
   3) perf metric: Json flag to not group events if gathering a metric group
   4) perf parse-events: Don't reorder ungrouped events by pmu
   whose diffstat is:
    30 files changed, 326 insertions(+), 33 deletions(-)
   but without the vendor event updates (the tend to be large as they
   repeat something per architecture per metric) is just:
    10 files changed, 90 insertions(+), 32 deletions(-)
 - Address review comments from Ravi Bangoria <ravi.bangoria@amd.com>
   that are only in the patches targeting Linux 6.5.

The v3 patch set:
 - for Linux 6.4 the first 5 patches are recommended:
   - changes skippable evsels to always print in case short running
     benchmarks meet the 0 enable and 0 count condition (suggested by
     Stephane).
   - changes metrics to show "nan" so that it is more obvious they
     couldn't be computed (suggested by Stephane).
   - fixes a reordering issue that reordered events that lacked a
     group, especially when the core PMU isn't "cpu".
 - for Linux 6.5 added extra hybrid type tests in the x86 hybrid test
   as suggested by Kan.
 - note, the patches aren't rebased against the tmp.perf-tools branch
   meaning patches 1 and 11 should be dropped there.
 
The v2 patch set:
 - restructures the 3 Linux 6.4 patches first;
 - makes it so that is_event_supported isn't called during core event parsing;
 - displays skippable 0 count events that were enabled;
 - addresses output formatting review comments;
 - addresses some test issues and an uninitialized variable use in strchr;
 - addresses checkpatch.pl reported issues;
 - adds reviewed-by to some patches.

Ian Rogers (44):
  perf metric: Change divide by zero and !support events behavior
  perf stat: Introduce skippable evsels
  perf metric: Json flag to not group events if gathering a metric group
  perf parse-events: Don't reorder ungrouped events by pmu
  perf vendor events intel: Add alderlake metric constraints
  perf vendor events intel: Add icelake metric constraints
  perf vendor events intel: Add icelakex metric constraints
  perf vendor events intel: Add sapphirerapids metric constraints
  perf vendor events intel: Add tigerlake metric constraints
  perf test: Test more sysfs events
  perf test: Use valid for PMU tests
  perf test: Mask configs with extended types then test
  perf test: Test more with config_cache
  perf test: Roundtrip name, don't assume 1 event per name
  perf parse-events: Set attr.type to PMU type early
  perf parse-events: Set pmu_name whenever a pmu is given
  perf print-events: Avoid unnecessary strlist
  perf parse-events: Avoid scanning PMUs before parsing
  perf evsel: Modify group pmu name for software events
  perf test: Move x86 hybrid tests to arch/x86
  perf test x86 hybrid: Update test expectations
  perf test x86 hybrid: Add hybrid extended type checks
  perf parse-events: Support PMUs for legacy cache events
  perf parse-events: Wildcard legacy cache events
  perf print-events: Print legacy cache events for each PMU
  perf parse-events: Support wildcards on raw events
  perf parse-events: Remove now unused hybrid logic
  perf parse-events: Minor type safety cleanup
  perf parse-events: Add pmu filter
  perf stat: Make cputype filter generic
  perf test: Add cputype testing to perf stat
  perf test: Fix parse-events tests for >1 core PMU
  perf parse-events: Support hardware events as terms
  perf parse-events: Avoid error when assigning a term
  perf parse-events: Avoid error when assigning a legacy cache term
  perf parse-events: Don't auto merge hybrid wildcard events
  perf parse-events: Don't reorder atom cpu events
  perf metrics: Be PMU specific for referenced metrics.
  perf stat: Command line PMU metric filtering
  perf vendor events intel: Correct alderlake metrics
  perf jevents: Don't rewrite metrics across PMUs
  perf metrics: Be PMU specific in event match
  perf stat: Don't disable TopdownL1 metric on hybrid
  perf parse-events: Reduce scope of is_event_supported

 tools/perf/arch/x86/include/arch-tests.h      |   1 +
 tools/perf/arch/x86/tests/Build               |   1 +
 tools/perf/arch/x86/tests/arch-tests.c        |  10 +
 tools/perf/arch/x86/tests/hybrid.c            | 287 ++++++
 tools/perf/arch/x86/util/evlist.c             |   4 +-
 tools/perf/builtin-list.c                     |  21 +-
 tools/perf/builtin-record.c                   |  13 +-
 tools/perf/builtin-stat.c                     |  77 +-
 tools/perf/builtin-top.c                      |   5 +-
 tools/perf/builtin-trace.c                    |   5 +-
 .../arch/x86/alderlake/adl-metrics.json       | 275 +++---
 .../arch/x86/alderlaken/adln-metrics.json     |  20 +-
 .../arch/x86/broadwell/bdw-metrics.json       |  12 +
 .../arch/x86/broadwellde/bdwde-metrics.json   |  12 +
 .../arch/x86/broadwellx/bdx-metrics.json      |  12 +
 .../arch/x86/cascadelakex/clx-metrics.json    |  12 +
 .../arch/x86/haswell/hsw-metrics.json         |  12 +
 .../arch/x86/haswellx/hsx-metrics.json        |  12 +
 .../arch/x86/icelake/icl-metrics.json         |  23 +
 .../arch/x86/icelakex/icx-metrics.json        |  23 +
 .../arch/x86/ivybridge/ivb-metrics.json       |  12 +
 .../arch/x86/ivytown/ivt-metrics.json         |  12 +
 .../arch/x86/jaketown/jkt-metrics.json        |  12 +
 .../arch/x86/sandybridge/snb-metrics.json     |  12 +
 .../arch/x86/sapphirerapids/spr-metrics.json  |  23 +
 .../arch/x86/skylake/skl-metrics.json         |  12 +
 .../arch/x86/skylakex/skx-metrics.json        |  12 +
 .../arch/x86/tigerlake/tgl-metrics.json       |  23 +
 tools/perf/pmu-events/jevents.py              |  10 +-
 tools/perf/pmu-events/metric.py               |  28 +-
 tools/perf/pmu-events/metric_test.py          |   6 +-
 tools/perf/pmu-events/pmu-events.h            |   2 +
 tools/perf/tests/evsel-roundtrip-name.c       | 119 +--
 tools/perf/tests/expr.c                       |   3 +-
 tools/perf/tests/parse-events.c               | 858 +++++++++---------
 tools/perf/tests/parse-metric.c               |   1 +
 tools/perf/tests/pmu-events.c                 |  12 +-
 tools/perf/tests/shell/stat.sh                |  44 +
 tools/perf/util/Build                         |   1 -
 tools/perf/util/evlist.h                      |   1 -
 tools/perf/util/evsel.c                       |  30 +-
 tools/perf/util/evsel.h                       |   1 +
 tools/perf/util/expr.y                        |   6 +-
 tools/perf/util/metricgroup.c                 | 111 ++-
 tools/perf/util/metricgroup.h                 |   3 +-
 tools/perf/util/parse-events-hybrid.c         | 214 -----
 tools/perf/util/parse-events-hybrid.h         |  25 -
 tools/perf/util/parse-events.c                | 720 ++++++---------
 tools/perf/util/parse-events.h                |  63 +-
 tools/perf/util/parse-events.l                | 108 +--
 tools/perf/util/parse-events.y                | 222 ++---
 tools/perf/util/pmu-hybrid.c                  |  20 -
 tools/perf/util/pmu-hybrid.h                  |   1 -
 tools/perf/util/pmu.c                         |  16 +-
 tools/perf/util/pmu.h                         |   3 +
 tools/perf/util/pmus.c                        |  25 +-
 tools/perf/util/pmus.h                        |   3 +
 tools/perf/util/print-events.c                | 127 ++-
 tools/perf/util/stat-display.c                |   2 +-
 tools/perf/util/stat-shadow.c                 |  25 +-
 60 files changed, 2066 insertions(+), 1699 deletions(-)
 create mode 100644 tools/perf/arch/x86/tests/hybrid.c
 delete mode 100644 tools/perf/util/parse-events-hybrid.c
 delete mode 100644 tools/perf/util/parse-events-hybrid.h

-- 
2.40.1.495.gc816e09b53d-goog

Re: [PATCH v4 00/44] Fix perf on Intel hybrid CPUs

Posted by Arnaldo Carvalho de Melo 2 years, 9 months ago

Em Tue, May 02, 2023 at 03:38:07PM -0700, Ian Rogers escreveu:
> TL;DR: hybrid doesn't crash, json metrics work on hybrid on both PMUs
> or individually, event parsing doesn't always scan all PMUs, more and
> new tests that also run without hybrid, less code.
> 
> The first 4 patches are aimed at Linux 6.4 to address issues raised,
> in particular by Kan, on the existing perf stat behavior with json
> metrics. They avoid duplicated events by removing groups. They don't
> hide events and metrics to make event multiplexing obvious. They avoid
> terminating perf when paranoia is higher due to certain events that
> always fail. They avoid rearranging events by PMUs when the events
> aren't in a group.

Applied the first 4, soon I'll go over the rest of the patchkit.

- Arnaldo
 
> The next 5 patches avoid grouping events for metrics where they could
> never succeed and were previously posted as:
> "perf vendor events intel: Add xxx metric constraints"
> https://lore.kernel.org/all/20230419005423.343862-1-irogers@google.com/
> In general the generated json is coming from:
> https://github.com/intel/perfmon/pull/73
> 
> Next are some general and test improvements.
> 
> Next event parsing is rewritten to not scan all PMUs for the benefit
> of raw and legacy cache parsing, instead these are handled by the
> lexer and a new term type. This ultimately removes the need for the
> event parser for hybrid to be recursive as legacy cache can be just a
> term. Tests are re-enabled for events with hyphens, so AMD's
> branch-brs event is now parsable.
> 
> The cputype option is made a generic pmu filter flag and is tested
> even on non-hybrid systems.
> 
> The final patches address specific json metric issues on hybrid, in
> both the json metrics and the metric code.
> 
> The patches add slightly more code than they remove, in areas like
> better json metric constraints and tests, but in the core util code,
> the removal of hybrid is a net reduction:
>  22 files changed, 711 insertions(+), 1016 deletions(-)
> 
> Sample output is contained in the v1 patch set:
> https://lore.kernel.org/lkml/bff481ba-e60a-763f-0aa0-3ee53302c480@linux.intel.com/
> 
> Tested on Tigerlake, Skylake and Alderlake CPUs.
> 
> The v4 patch set:
>  - rebase, 1 of the Linux 6.4 recommended patches are merged leaving:
>    1) perf metric: Change divide by zero and !support events behavior
>    2) perf stat: Introduce skippable evsels
>    3) perf metric: Json flag to not group events if gathering a metric group
>    4) perf parse-events: Don't reorder ungrouped events by pmu
>    whose diffstat is:
>     30 files changed, 326 insertions(+), 33 deletions(-)
>    but without the vendor event updates (the tend to be large as they
>    repeat something per architecture per metric) is just:
>     10 files changed, 90 insertions(+), 32 deletions(-)
>  - Address review comments from Ravi Bangoria <ravi.bangoria@amd.com>
>    that are only in the patches targeting Linux 6.5.
> 
> The v3 patch set:
>  - for Linux 6.4 the first 5 patches are recommended:
>    - changes skippable evsels to always print in case short running
>      benchmarks meet the 0 enable and 0 count condition (suggested by
>      Stephane).
>    - changes metrics to show "nan" so that it is more obvious they
>      couldn't be computed (suggested by Stephane).
>    - fixes a reordering issue that reordered events that lacked a
>      group, especially when the core PMU isn't "cpu".
>  - for Linux 6.5 added extra hybrid type tests in the x86 hybrid test
>    as suggested by Kan.
>  - note, the patches aren't rebased against the tmp.perf-tools branch
>    meaning patches 1 and 11 should be dropped there.
>  
> The v2 patch set:
>  - restructures the 3 Linux 6.4 patches first;
>  - makes it so that is_event_supported isn't called during core event parsing;
>  - displays skippable 0 count events that were enabled;
>  - addresses output formatting review comments;
>  - addresses some test issues and an uninitialized variable use in strchr;
>  - addresses checkpatch.pl reported issues;
>  - adds reviewed-by to some patches.
> 
> Ian Rogers (44):
>   perf metric: Change divide by zero and !support events behavior
>   perf stat: Introduce skippable evsels
>   perf metric: Json flag to not group events if gathering a metric group
>   perf parse-events: Don't reorder ungrouped events by pmu
>   perf vendor events intel: Add alderlake metric constraints
>   perf vendor events intel: Add icelake metric constraints
>   perf vendor events intel: Add icelakex metric constraints
>   perf vendor events intel: Add sapphirerapids metric constraints
>   perf vendor events intel: Add tigerlake metric constraints
>   perf test: Test more sysfs events
>   perf test: Use valid for PMU tests
>   perf test: Mask configs with extended types then test
>   perf test: Test more with config_cache
>   perf test: Roundtrip name, don't assume 1 event per name
>   perf parse-events: Set attr.type to PMU type early
>   perf parse-events: Set pmu_name whenever a pmu is given
>   perf print-events: Avoid unnecessary strlist
>   perf parse-events: Avoid scanning PMUs before parsing
>   perf evsel: Modify group pmu name for software events
>   perf test: Move x86 hybrid tests to arch/x86
>   perf test x86 hybrid: Update test expectations
>   perf test x86 hybrid: Add hybrid extended type checks
>   perf parse-events: Support PMUs for legacy cache events
>   perf parse-events: Wildcard legacy cache events
>   perf print-events: Print legacy cache events for each PMU
>   perf parse-events: Support wildcards on raw events
>   perf parse-events: Remove now unused hybrid logic
>   perf parse-events: Minor type safety cleanup
>   perf parse-events: Add pmu filter
>   perf stat: Make cputype filter generic
>   perf test: Add cputype testing to perf stat
>   perf test: Fix parse-events tests for >1 core PMU
>   perf parse-events: Support hardware events as terms
>   perf parse-events: Avoid error when assigning a term
>   perf parse-events: Avoid error when assigning a legacy cache term
>   perf parse-events: Don't auto merge hybrid wildcard events
>   perf parse-events: Don't reorder atom cpu events
>   perf metrics: Be PMU specific for referenced metrics.
>   perf stat: Command line PMU metric filtering
>   perf vendor events intel: Correct alderlake metrics
>   perf jevents: Don't rewrite metrics across PMUs
>   perf metrics: Be PMU specific in event match
>   perf stat: Don't disable TopdownL1 metric on hybrid
>   perf parse-events: Reduce scope of is_event_supported
> 
>  tools/perf/arch/x86/include/arch-tests.h      |   1 +
>  tools/perf/arch/x86/tests/Build               |   1 +
>  tools/perf/arch/x86/tests/arch-tests.c        |  10 +
>  tools/perf/arch/x86/tests/hybrid.c            | 287 ++++++
>  tools/perf/arch/x86/util/evlist.c             |   4 +-
>  tools/perf/builtin-list.c                     |  21 +-
>  tools/perf/builtin-record.c                   |  13 +-
>  tools/perf/builtin-stat.c                     |  77 +-
>  tools/perf/builtin-top.c                      |   5 +-
>  tools/perf/builtin-trace.c                    |   5 +-
>  .../arch/x86/alderlake/adl-metrics.json       | 275 +++---
>  .../arch/x86/alderlaken/adln-metrics.json     |  20 +-
>  .../arch/x86/broadwell/bdw-metrics.json       |  12 +
>  .../arch/x86/broadwellde/bdwde-metrics.json   |  12 +
>  .../arch/x86/broadwellx/bdx-metrics.json      |  12 +
>  .../arch/x86/cascadelakex/clx-metrics.json    |  12 +
>  .../arch/x86/haswell/hsw-metrics.json         |  12 +
>  .../arch/x86/haswellx/hsx-metrics.json        |  12 +
>  .../arch/x86/icelake/icl-metrics.json         |  23 +
>  .../arch/x86/icelakex/icx-metrics.json        |  23 +
>  .../arch/x86/ivybridge/ivb-metrics.json       |  12 +
>  .../arch/x86/ivytown/ivt-metrics.json         |  12 +
>  .../arch/x86/jaketown/jkt-metrics.json        |  12 +
>  .../arch/x86/sandybridge/snb-metrics.json     |  12 +
>  .../arch/x86/sapphirerapids/spr-metrics.json  |  23 +
>  .../arch/x86/skylake/skl-metrics.json         |  12 +
>  .../arch/x86/skylakex/skx-metrics.json        |  12 +
>  .../arch/x86/tigerlake/tgl-metrics.json       |  23 +
>  tools/perf/pmu-events/jevents.py              |  10 +-
>  tools/perf/pmu-events/metric.py               |  28 +-
>  tools/perf/pmu-events/metric_test.py          |   6 +-
>  tools/perf/pmu-events/pmu-events.h            |   2 +
>  tools/perf/tests/evsel-roundtrip-name.c       | 119 +--
>  tools/perf/tests/expr.c                       |   3 +-
>  tools/perf/tests/parse-events.c               | 858 +++++++++---------
>  tools/perf/tests/parse-metric.c               |   1 +
>  tools/perf/tests/pmu-events.c                 |  12 +-
>  tools/perf/tests/shell/stat.sh                |  44 +
>  tools/perf/util/Build                         |   1 -
>  tools/perf/util/evlist.h                      |   1 -
>  tools/perf/util/evsel.c                       |  30 +-
>  tools/perf/util/evsel.h                       |   1 +
>  tools/perf/util/expr.y                        |   6 +-
>  tools/perf/util/metricgroup.c                 | 111 ++-
>  tools/perf/util/metricgroup.h                 |   3 +-
>  tools/perf/util/parse-events-hybrid.c         | 214 -----
>  tools/perf/util/parse-events-hybrid.h         |  25 -
>  tools/perf/util/parse-events.c                | 720 ++++++---------
>  tools/perf/util/parse-events.h                |  63 +-
>  tools/perf/util/parse-events.l                | 108 +--
>  tools/perf/util/parse-events.y                | 222 ++---
>  tools/perf/util/pmu-hybrid.c                  |  20 -
>  tools/perf/util/pmu-hybrid.h                  |   1 -
>  tools/perf/util/pmu.c                         |  16 +-
>  tools/perf/util/pmu.h                         |   3 +
>  tools/perf/util/pmus.c                        |  25 +-
>  tools/perf/util/pmus.h                        |   3 +
>  tools/perf/util/print-events.c                | 127 ++-
>  tools/perf/util/stat-display.c                |   2 +-
>  tools/perf/util/stat-shadow.c                 |  25 +-
>  60 files changed, 2066 insertions(+), 1699 deletions(-)
>  create mode 100644 tools/perf/arch/x86/tests/hybrid.c
>  delete mode 100644 tools/perf/util/parse-events-hybrid.c
>  delete mode 100644 tools/perf/util/parse-events-hybrid.h
> 
> -- 
> 2.40.1.495.gc816e09b53d-goog
> 

-- 

- Arnaldo

Re: [PATCH v4 00/44] Fix perf on Intel hybrid CPUs

Posted by Arnaldo Carvalho de Melo 2 years, 9 months ago

Em Tue, May 02, 2023 at 03:38:07PM -0700, Ian Rogers escreveu:
> TL;DR: hybrid doesn't crash, json metrics work on hybrid on both PMUs
> or individually, event parsing doesn't always scan all PMUs, more and
> new tests that also run without hybrid, less code.
> 
> The first 4 patches are aimed at Linux 6.4 to address issues raised,
> in particular by Kan, on the existing perf stat behavior with json
> metrics. They avoid duplicated events by removing groups. They don't
> hide events and metrics to make event multiplexing obvious. They avoid
> terminating perf when paranoia is higher due to certain events that
> always fail. They avoid rearranging events by PMUs when the events
> aren't in a group.

Took the first 4, skipped patch 19 that Adrian asked to be in 6.4 and
then went all the way to:

Applying: perf test: Add cputype testing to perf stat
error: patch failed: tools/perf/tests/shell/stat.sh:91
error: tools/perf/tests/shell/stat.sh: patch does not apply
Patch failed at 0012 perf test: Add cputype testing to perf stat
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
⬢[acme@toolbox perf-tools-next]$ git am --abort
⬢[acme@toolbox perf-tools-next]$

Will test with it and put what I have in tmp.perf-tools-next, testing
now.

- Arnaldo

 
> The next 5 patches avoid grouping events for metrics where they could
> never succeed and were previously posted as:
> "perf vendor events intel: Add xxx metric constraints"
> https://lore.kernel.org/all/20230419005423.343862-1-irogers@google.com/
> In general the generated json is coming from:
> https://github.com/intel/perfmon/pull/73
> 
> Next are some general and test improvements.
> 
> Next event parsing is rewritten to not scan all PMUs for the benefit
> of raw and legacy cache parsing, instead these are handled by the
> lexer and a new term type. This ultimately removes the need for the
> event parser for hybrid to be recursive as legacy cache can be just a
> term. Tests are re-enabled for events with hyphens, so AMD's
> branch-brs event is now parsable.
> 
> The cputype option is made a generic pmu filter flag and is tested
> even on non-hybrid systems.
> 
> The final patches address specific json metric issues on hybrid, in
> both the json metrics and the metric code.
> 
> The patches add slightly more code than they remove, in areas like
> better json metric constraints and tests, but in the core util code,
> the removal of hybrid is a net reduction:
>  22 files changed, 711 insertions(+), 1016 deletions(-)
> 
> Sample output is contained in the v1 patch set:
> https://lore.kernel.org/lkml/bff481ba-e60a-763f-0aa0-3ee53302c480@linux.intel.com/
> 
> Tested on Tigerlake, Skylake and Alderlake CPUs.
> 
> The v4 patch set:
>  - rebase, 1 of the Linux 6.4 recommended patches are merged leaving:
>    1) perf metric: Change divide by zero and !support events behavior
>    2) perf stat: Introduce skippable evsels
>    3) perf metric: Json flag to not group events if gathering a metric group
>    4) perf parse-events: Don't reorder ungrouped events by pmu
>    whose diffstat is:
>     30 files changed, 326 insertions(+), 33 deletions(-)
>    but without the vendor event updates (the tend to be large as they
>    repeat something per architecture per metric) is just:
>     10 files changed, 90 insertions(+), 32 deletions(-)
>  - Address review comments from Ravi Bangoria <ravi.bangoria@amd.com>
>    that are only in the patches targeting Linux 6.5.
> 
> The v3 patch set:
>  - for Linux 6.4 the first 5 patches are recommended:
>    - changes skippable evsels to always print in case short running
>      benchmarks meet the 0 enable and 0 count condition (suggested by
>      Stephane).
>    - changes metrics to show "nan" so that it is more obvious they
>      couldn't be computed (suggested by Stephane).
>    - fixes a reordering issue that reordered events that lacked a
>      group, especially when the core PMU isn't "cpu".
>  - for Linux 6.5 added extra hybrid type tests in the x86 hybrid test
>    as suggested by Kan.
>  - note, the patches aren't rebased against the tmp.perf-tools branch
>    meaning patches 1 and 11 should be dropped there.
>  
> The v2 patch set:
>  - restructures the 3 Linux 6.4 patches first;
>  - makes it so that is_event_supported isn't called during core event parsing;
>  - displays skippable 0 count events that were enabled;
>  - addresses output formatting review comments;
>  - addresses some test issues and an uninitialized variable use in strchr;
>  - addresses checkpatch.pl reported issues;
>  - adds reviewed-by to some patches.
> 
> Ian Rogers (44):
>   perf metric: Change divide by zero and !support events behavior
>   perf stat: Introduce skippable evsels
>   perf metric: Json flag to not group events if gathering a metric group
>   perf parse-events: Don't reorder ungrouped events by pmu
>   perf vendor events intel: Add alderlake metric constraints
>   perf vendor events intel: Add icelake metric constraints
>   perf vendor events intel: Add icelakex metric constraints
>   perf vendor events intel: Add sapphirerapids metric constraints
>   perf vendor events intel: Add tigerlake metric constraints
>   perf test: Test more sysfs events
>   perf test: Use valid for PMU tests
>   perf test: Mask configs with extended types then test
>   perf test: Test more with config_cache
>   perf test: Roundtrip name, don't assume 1 event per name
>   perf parse-events: Set attr.type to PMU type early
>   perf parse-events: Set pmu_name whenever a pmu is given
>   perf print-events: Avoid unnecessary strlist
>   perf parse-events: Avoid scanning PMUs before parsing
>   perf evsel: Modify group pmu name for software events
>   perf test: Move x86 hybrid tests to arch/x86
>   perf test x86 hybrid: Update test expectations
>   perf test x86 hybrid: Add hybrid extended type checks
>   perf parse-events: Support PMUs for legacy cache events
>   perf parse-events: Wildcard legacy cache events
>   perf print-events: Print legacy cache events for each PMU
>   perf parse-events: Support wildcards on raw events
>   perf parse-events: Remove now unused hybrid logic
>   perf parse-events: Minor type safety cleanup
>   perf parse-events: Add pmu filter
>   perf stat: Make cputype filter generic
>   perf test: Add cputype testing to perf stat
>   perf test: Fix parse-events tests for >1 core PMU
>   perf parse-events: Support hardware events as terms
>   perf parse-events: Avoid error when assigning a term
>   perf parse-events: Avoid error when assigning a legacy cache term
>   perf parse-events: Don't auto merge hybrid wildcard events
>   perf parse-events: Don't reorder atom cpu events
>   perf metrics: Be PMU specific for referenced metrics.
>   perf stat: Command line PMU metric filtering
>   perf vendor events intel: Correct alderlake metrics
>   perf jevents: Don't rewrite metrics across PMUs
>   perf metrics: Be PMU specific in event match
>   perf stat: Don't disable TopdownL1 metric on hybrid
>   perf parse-events: Reduce scope of is_event_supported
> 
>  tools/perf/arch/x86/include/arch-tests.h      |   1 +
>  tools/perf/arch/x86/tests/Build               |   1 +
>  tools/perf/arch/x86/tests/arch-tests.c        |  10 +
>  tools/perf/arch/x86/tests/hybrid.c            | 287 ++++++
>  tools/perf/arch/x86/util/evlist.c             |   4 +-
>  tools/perf/builtin-list.c                     |  21 +-
>  tools/perf/builtin-record.c                   |  13 +-
>  tools/perf/builtin-stat.c                     |  77 +-
>  tools/perf/builtin-top.c                      |   5 +-
>  tools/perf/builtin-trace.c                    |   5 +-
>  .../arch/x86/alderlake/adl-metrics.json       | 275 +++---
>  .../arch/x86/alderlaken/adln-metrics.json     |  20 +-
>  .../arch/x86/broadwell/bdw-metrics.json       |  12 +
>  .../arch/x86/broadwellde/bdwde-metrics.json   |  12 +
>  .../arch/x86/broadwellx/bdx-metrics.json      |  12 +
>  .../arch/x86/cascadelakex/clx-metrics.json    |  12 +
>  .../arch/x86/haswell/hsw-metrics.json         |  12 +
>  .../arch/x86/haswellx/hsx-metrics.json        |  12 +
>  .../arch/x86/icelake/icl-metrics.json         |  23 +
>  .../arch/x86/icelakex/icx-metrics.json        |  23 +
>  .../arch/x86/ivybridge/ivb-metrics.json       |  12 +
>  .../arch/x86/ivytown/ivt-metrics.json         |  12 +
>  .../arch/x86/jaketown/jkt-metrics.json        |  12 +
>  .../arch/x86/sandybridge/snb-metrics.json     |  12 +
>  .../arch/x86/sapphirerapids/spr-metrics.json  |  23 +
>  .../arch/x86/skylake/skl-metrics.json         |  12 +
>  .../arch/x86/skylakex/skx-metrics.json        |  12 +
>  .../arch/x86/tigerlake/tgl-metrics.json       |  23 +
>  tools/perf/pmu-events/jevents.py              |  10 +-
>  tools/perf/pmu-events/metric.py               |  28 +-
>  tools/perf/pmu-events/metric_test.py          |   6 +-
>  tools/perf/pmu-events/pmu-events.h            |   2 +
>  tools/perf/tests/evsel-roundtrip-name.c       | 119 +--
>  tools/perf/tests/expr.c                       |   3 +-
>  tools/perf/tests/parse-events.c               | 858 +++++++++---------
>  tools/perf/tests/parse-metric.c               |   1 +
>  tools/perf/tests/pmu-events.c                 |  12 +-
>  tools/perf/tests/shell/stat.sh                |  44 +
>  tools/perf/util/Build                         |   1 -
>  tools/perf/util/evlist.h                      |   1 -
>  tools/perf/util/evsel.c                       |  30 +-
>  tools/perf/util/evsel.h                       |   1 +
>  tools/perf/util/expr.y                        |   6 +-
>  tools/perf/util/metricgroup.c                 | 111 ++-
>  tools/perf/util/metricgroup.h                 |   3 +-
>  tools/perf/util/parse-events-hybrid.c         | 214 -----
>  tools/perf/util/parse-events-hybrid.h         |  25 -
>  tools/perf/util/parse-events.c                | 720 ++++++---------
>  tools/perf/util/parse-events.h                |  63 +-
>  tools/perf/util/parse-events.l                | 108 +--
>  tools/perf/util/parse-events.y                | 222 ++---
>  tools/perf/util/pmu-hybrid.c                  |  20 -
>  tools/perf/util/pmu-hybrid.h                  |   1 -
>  tools/perf/util/pmu.c                         |  16 +-
>  tools/perf/util/pmu.h                         |   3 +
>  tools/perf/util/pmus.c                        |  25 +-
>  tools/perf/util/pmus.h                        |   3 +
>  tools/perf/util/print-events.c                | 127 ++-
>  tools/perf/util/stat-display.c                |   2 +-
>  tools/perf/util/stat-shadow.c                 |  25 +-
>  60 files changed, 2066 insertions(+), 1699 deletions(-)
>  create mode 100644 tools/perf/arch/x86/tests/hybrid.c
>  delete mode 100644 tools/perf/util/parse-events-hybrid.c
>  delete mode 100644 tools/perf/util/parse-events-hybrid.h
> 
> -- 
> 2.40.1.495.gc816e09b53d-goog
> 

-- 

- Arnaldo

Re: [PATCH v4 00/44] Fix perf on Intel hybrid CPUs

Posted by Ian Rogers 2 years, 9 months ago

On Fri, May 12, 2023 at 11:28 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Tue, May 02, 2023 at 03:38:07PM -0700, Ian Rogers escreveu:
> > TL;DR: hybrid doesn't crash, json metrics work on hybrid on both PMUs
> > or individually, event parsing doesn't always scan all PMUs, more and
> > new tests that also run without hybrid, less code.
> >
> > The first 4 patches are aimed at Linux 6.4 to address issues raised,
> > in particular by Kan, on the existing perf stat behavior with json
> > metrics. They avoid duplicated events by removing groups. They don't
> > hide events and metrics to make event multiplexing obvious. They avoid
> > terminating perf when paranoia is higher due to certain events that
> > always fail. They avoid rearranging events by PMUs when the events
> > aren't in a group.
>
> Took the first 4, skipped patch 19 that Adrian asked to be in 6.4 and
> then went all the way to:
>
> Applying: perf test: Add cputype testing to perf stat
> error: patch failed: tools/perf/tests/shell/stat.sh:91
> error: tools/perf/tests/shell/stat.sh: patch does not apply
> Patch failed at 0012 perf test: Add cputype testing to perf stat
> hint: Use 'git am --show-current-patch=diff' to see the failed patch
> When you have resolved this problem, run "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".
> ⬢[acme@toolbox perf-tools-next]$ git am --abort
> ⬢[acme@toolbox perf-tools-next]$
>
> Will test with it and put what I have in tmp.perf-tools-next, testing
> now.
>
> - Arnaldo

Thanks Arnaldo, I checked that all the patches are there
(tmp.perf-tools-next) and tested on a non-hybrid system. Everything
looks good. I resent the missing test patch:
https://lore.kernel.org/lkml/20230513063447.464691-1-irogers@google.com/
It applied cleanly for me, so I'm not sure what the issue was. Getting
the other cherry-picked was no small thing, so I'll not complain :-)

Thanks,
Ian

> > The next 5 patches avoid grouping events for metrics where they could
> > never succeed and were previously posted as:
> > "perf vendor events intel: Add xxx metric constraints"
> > https://lore.kernel.org/all/20230419005423.343862-1-irogers@google.com/
> > In general the generated json is coming from:
> > https://github.com/intel/perfmon/pull/73
> >
> > Next are some general and test improvements.
> >
> > Next event parsing is rewritten to not scan all PMUs for the benefit
> > of raw and legacy cache parsing, instead these are handled by the
> > lexer and a new term type. This ultimately removes the need for the
> > event parser for hybrid to be recursive as legacy cache can be just a
> > term. Tests are re-enabled for events with hyphens, so AMD's
> > branch-brs event is now parsable.
> >
> > The cputype option is made a generic pmu filter flag and is tested
> > even on non-hybrid systems.
> >
> > The final patches address specific json metric issues on hybrid, in
> > both the json metrics and the metric code.
> >
> > The patches add slightly more code than they remove, in areas like
> > better json metric constraints and tests, but in the core util code,
> > the removal of hybrid is a net reduction:
> >  22 files changed, 711 insertions(+), 1016 deletions(-)
> >
> > Sample output is contained in the v1 patch set:
> > https://lore.kernel.org/lkml/bff481ba-e60a-763f-0aa0-3ee53302c480@linux.intel.com/
> >
> > Tested on Tigerlake, Skylake and Alderlake CPUs.
> >
> > The v4 patch set:
> >  - rebase, 1 of the Linux 6.4 recommended patches are merged leaving:
> >    1) perf metric: Change divide by zero and !support events behavior
> >    2) perf stat: Introduce skippable evsels
> >    3) perf metric: Json flag to not group events if gathering a metric group
> >    4) perf parse-events: Don't reorder ungrouped events by pmu
> >    whose diffstat is:
> >     30 files changed, 326 insertions(+), 33 deletions(-)
> >    but without the vendor event updates (the tend to be large as they
> >    repeat something per architecture per metric) is just:
> >     10 files changed, 90 insertions(+), 32 deletions(-)
> >  - Address review comments from Ravi Bangoria <ravi.bangoria@amd.com>
> >    that are only in the patches targeting Linux 6.5.
> >
> > The v3 patch set:
> >  - for Linux 6.4 the first 5 patches are recommended:
> >    - changes skippable evsels to always print in case short running
> >      benchmarks meet the 0 enable and 0 count condition (suggested by
> >      Stephane).
> >    - changes metrics to show "nan" so that it is more obvious they
> >      couldn't be computed (suggested by Stephane).
> >    - fixes a reordering issue that reordered events that lacked a
> >      group, especially when the core PMU isn't "cpu".
> >  - for Linux 6.5 added extra hybrid type tests in the x86 hybrid test
> >    as suggested by Kan.
> >  - note, the patches aren't rebased against the tmp.perf-tools branch
> >    meaning patches 1 and 11 should be dropped there.
> >
> > The v2 patch set:
> >  - restructures the 3 Linux 6.4 patches first;
> >  - makes it so that is_event_supported isn't called during core event parsing;
> >  - displays skippable 0 count events that were enabled;
> >  - addresses output formatting review comments;
> >  - addresses some test issues and an uninitialized variable use in strchr;
> >  - addresses checkpatch.pl reported issues;
> >  - adds reviewed-by to some patches.
> >
> > Ian Rogers (44):
> >   perf metric: Change divide by zero and !support events behavior
> >   perf stat: Introduce skippable evsels
> >   perf metric: Json flag to not group events if gathering a metric group
> >   perf parse-events: Don't reorder ungrouped events by pmu
> >   perf vendor events intel: Add alderlake metric constraints
> >   perf vendor events intel: Add icelake metric constraints
> >   perf vendor events intel: Add icelakex metric constraints
> >   perf vendor events intel: Add sapphirerapids metric constraints
> >   perf vendor events intel: Add tigerlake metric constraints
> >   perf test: Test more sysfs events
> >   perf test: Use valid for PMU tests
> >   perf test: Mask configs with extended types then test
> >   perf test: Test more with config_cache
> >   perf test: Roundtrip name, don't assume 1 event per name
> >   perf parse-events: Set attr.type to PMU type early
> >   perf parse-events: Set pmu_name whenever a pmu is given
> >   perf print-events: Avoid unnecessary strlist
> >   perf parse-events: Avoid scanning PMUs before parsing
> >   perf evsel: Modify group pmu name for software events
> >   perf test: Move x86 hybrid tests to arch/x86
> >   perf test x86 hybrid: Update test expectations
> >   perf test x86 hybrid: Add hybrid extended type checks
> >   perf parse-events: Support PMUs for legacy cache events
> >   perf parse-events: Wildcard legacy cache events
> >   perf print-events: Print legacy cache events for each PMU
> >   perf parse-events: Support wildcards on raw events
> >   perf parse-events: Remove now unused hybrid logic
> >   perf parse-events: Minor type safety cleanup
> >   perf parse-events: Add pmu filter
> >   perf stat: Make cputype filter generic
> >   perf test: Add cputype testing to perf stat
> >   perf test: Fix parse-events tests for >1 core PMU
> >   perf parse-events: Support hardware events as terms
> >   perf parse-events: Avoid error when assigning a term
> >   perf parse-events: Avoid error when assigning a legacy cache term
> >   perf parse-events: Don't auto merge hybrid wildcard events
> >   perf parse-events: Don't reorder atom cpu events
> >   perf metrics: Be PMU specific for referenced metrics.
> >   perf stat: Command line PMU metric filtering
> >   perf vendor events intel: Correct alderlake metrics
> >   perf jevents: Don't rewrite metrics across PMUs
> >   perf metrics: Be PMU specific in event match
> >   perf stat: Don't disable TopdownL1 metric on hybrid
> >   perf parse-events: Reduce scope of is_event_supported
> >
> >  tools/perf/arch/x86/include/arch-tests.h      |   1 +
> >  tools/perf/arch/x86/tests/Build               |   1 +
> >  tools/perf/arch/x86/tests/arch-tests.c        |  10 +
> >  tools/perf/arch/x86/tests/hybrid.c            | 287 ++++++
> >  tools/perf/arch/x86/util/evlist.c             |   4 +-
> >  tools/perf/builtin-list.c                     |  21 +-
> >  tools/perf/builtin-record.c                   |  13 +-
> >  tools/perf/builtin-stat.c                     |  77 +-
> >  tools/perf/builtin-top.c                      |   5 +-
> >  tools/perf/builtin-trace.c                    |   5 +-
> >  .../arch/x86/alderlake/adl-metrics.json       | 275 +++---
> >  .../arch/x86/alderlaken/adln-metrics.json     |  20 +-
> >  .../arch/x86/broadwell/bdw-metrics.json       |  12 +
> >  .../arch/x86/broadwellde/bdwde-metrics.json   |  12 +
> >  .../arch/x86/broadwellx/bdx-metrics.json      |  12 +
> >  .../arch/x86/cascadelakex/clx-metrics.json    |  12 +
> >  .../arch/x86/haswell/hsw-metrics.json         |  12 +
> >  .../arch/x86/haswellx/hsx-metrics.json        |  12 +
> >  .../arch/x86/icelake/icl-metrics.json         |  23 +
> >  .../arch/x86/icelakex/icx-metrics.json        |  23 +
> >  .../arch/x86/ivybridge/ivb-metrics.json       |  12 +
> >  .../arch/x86/ivytown/ivt-metrics.json         |  12 +
> >  .../arch/x86/jaketown/jkt-metrics.json        |  12 +
> >  .../arch/x86/sandybridge/snb-metrics.json     |  12 +
> >  .../arch/x86/sapphirerapids/spr-metrics.json  |  23 +
> >  .../arch/x86/skylake/skl-metrics.json         |  12 +
> >  .../arch/x86/skylakex/skx-metrics.json        |  12 +
> >  .../arch/x86/tigerlake/tgl-metrics.json       |  23 +
> >  tools/perf/pmu-events/jevents.py              |  10 +-
> >  tools/perf/pmu-events/metric.py               |  28 +-
> >  tools/perf/pmu-events/metric_test.py          |   6 +-
> >  tools/perf/pmu-events/pmu-events.h            |   2 +
> >  tools/perf/tests/evsel-roundtrip-name.c       | 119 +--
> >  tools/perf/tests/expr.c                       |   3 +-
> >  tools/perf/tests/parse-events.c               | 858 +++++++++---------
> >  tools/perf/tests/parse-metric.c               |   1 +
> >  tools/perf/tests/pmu-events.c                 |  12 +-
> >  tools/perf/tests/shell/stat.sh                |  44 +
> >  tools/perf/util/Build                         |   1 -
> >  tools/perf/util/evlist.h                      |   1 -
> >  tools/perf/util/evsel.c                       |  30 +-
> >  tools/perf/util/evsel.h                       |   1 +
> >  tools/perf/util/expr.y                        |   6 +-
> >  tools/perf/util/metricgroup.c                 | 111 ++-
> >  tools/perf/util/metricgroup.h                 |   3 +-
> >  tools/perf/util/parse-events-hybrid.c         | 214 -----
> >  tools/perf/util/parse-events-hybrid.h         |  25 -
> >  tools/perf/util/parse-events.c                | 720 ++++++---------
> >  tools/perf/util/parse-events.h                |  63 +-
> >  tools/perf/util/parse-events.l                | 108 +--
> >  tools/perf/util/parse-events.y                | 222 ++---
> >  tools/perf/util/pmu-hybrid.c                  |  20 -
> >  tools/perf/util/pmu-hybrid.h                  |   1 -
> >  tools/perf/util/pmu.c                         |  16 +-
> >  tools/perf/util/pmu.h                         |   3 +
> >  tools/perf/util/pmus.c                        |  25 +-
> >  tools/perf/util/pmus.h                        |   3 +
> >  tools/perf/util/print-events.c                | 127 ++-
> >  tools/perf/util/stat-display.c                |   2 +-
> >  tools/perf/util/stat-shadow.c                 |  25 +-
> >  60 files changed, 2066 insertions(+), 1699 deletions(-)
> >  create mode 100644 tools/perf/arch/x86/tests/hybrid.c
> >  delete mode 100644 tools/perf/util/parse-events-hybrid.c
> >  delete mode 100644 tools/perf/util/parse-events-hybrid.h
> >
> > --
> > 2.40.1.495.gc816e09b53d-goog
> >
>
> --
>
> - Arnaldo

Re: [PATCH v4 00/44] Fix perf on Intel hybrid CPUs

Posted by Liang, Kan 2 years, 9 months ago


On 2023-05-02 6:38 p.m., Ian Rogers wrote:
> TL;DR: hybrid doesn't crash, json metrics work on hybrid on both PMUs
> or individually, event parsing doesn't always scan all PMUs, more and
> new tests that also run without hybrid, less code.
> 
> The first 4 patches are aimed at Linux 6.4 to address issues raised,
> in particular by Kan, on the existing perf stat behavior with json
> metrics. They avoid duplicated events by removing groups. They don't
> hide events and metrics to make event multiplexing obvious. They avoid
> terminating perf when paranoia is higher due to certain events that
> always fail. They avoid rearranging events by PMUs when the events
> aren't in a group.
> 
> The next 5 patches avoid grouping events for metrics where they could
> never succeed and were previously posted as:
> "perf vendor events intel: Add xxx metric constraints"
> https://lore.kernel.org/all/20230419005423.343862-1-irogers@google.com/
> In general the generated json is coming from:
> https://github.com/intel/perfmon/pull/73
> 
> Next are some general and test improvements.
> 
> Next event parsing is rewritten to not scan all PMUs for the benefit
> of raw and legacy cache parsing, instead these are handled by the
> lexer and a new term type. This ultimately removes the need for the
> event parser for hybrid to be recursive as legacy cache can be just a
> term. Tests are re-enabled for events with hyphens, so AMD's
> branch-brs event is now parsable.
> 
> The cputype option is made a generic pmu filter flag and is tested
> even on non-hybrid systems.
> 
> The final patches address specific json metric issues on hybrid, in
> both the json metrics and the metric code.
> 
> The patches add slightly more code than they remove, in areas like
> better json metric constraints and tests, but in the core util code,
> the removal of hybrid is a net reduction:
>  22 files changed, 711 insertions(+), 1016 deletions(-)
> 
> Sample output is contained in the v1 patch set:
> https://lore.kernel.org/lkml/bff481ba-e60a-763f-0aa0-3ee53302c480@linux.intel.com/
> 
> Tested on Tigerlake, Skylake and Alderlake CPUs.
> 
> The v4 patch set:
>  - rebase, 1 of the Linux 6.4 recommended patches are merged leaving:
>    1) perf metric: Change divide by zero and !support events behavior
>    2) perf stat: Introduce skippable evsels
>    3) perf metric: Json flag to not group events if gathering a metric group
>    4) perf parse-events: Don't reorder ungrouped events by pmu
>    whose diffstat is:
>     30 files changed, 326 insertions(+), 33 deletions(-)
>    but without the vendor event updates (the tend to be large as they
>    repeat something per architecture per metric) is just:
>     10 files changed, 90 insertions(+), 32 deletions(-)

I have tested the 4 patches on top of the perf-tools-next branch on both
Cascade Lake and Raptor Lake. The result looks good to me.

They address the permission error found in the default mode of perf stat
on the Cascade Lake. Thanks Ian for the fix.

Arnaldo, could you please consider to back port them for the 6.4?

Thanks,
Kan

Re: [PATCH v4 00/44] Fix perf on Intel hybrid CPUs

Posted by Arnaldo Carvalho de Melo 2 years, 9 months ago

Em Wed, May 03, 2023 at 04:56:36PM -0400, Liang, Kan escreveu:
> 
> 
> On 2023-05-02 6:38 p.m., Ian Rogers wrote:
> > TL;DR: hybrid doesn't crash, json metrics work on hybrid on both PMUs
> > or individually, event parsing doesn't always scan all PMUs, more and
> > new tests that also run without hybrid, less code.
> > 
> > The first 4 patches are aimed at Linux 6.4 to address issues raised,
> > in particular by Kan, on the existing perf stat behavior with json
> > metrics. They avoid duplicated events by removing groups. They don't
> > hide events and metrics to make event multiplexing obvious. They avoid
> > terminating perf when paranoia is higher due to certain events that
> > always fail. They avoid rearranging events by PMUs when the events
> > aren't in a group.
> > 
> > The next 5 patches avoid grouping events for metrics where they could
> > never succeed and were previously posted as:
> > "perf vendor events intel: Add xxx metric constraints"
> > https://lore.kernel.org/all/20230419005423.343862-1-irogers@google.com/
> > In general the generated json is coming from:
> > https://github.com/intel/perfmon/pull/73
> > 
> > Next are some general and test improvements.
> > 
> > Next event parsing is rewritten to not scan all PMUs for the benefit
> > of raw and legacy cache parsing, instead these are handled by the
> > lexer and a new term type. This ultimately removes the need for the
> > event parser for hybrid to be recursive as legacy cache can be just a
> > term. Tests are re-enabled for events with hyphens, so AMD's
> > branch-brs event is now parsable.
> > 
> > The cputype option is made a generic pmu filter flag and is tested
> > even on non-hybrid systems.
> > 
> > The final patches address specific json metric issues on hybrid, in
> > both the json metrics and the metric code.
> > 
> > The patches add slightly more code than they remove, in areas like
> > better json metric constraints and tests, but in the core util code,
> > the removal of hybrid is a net reduction:
> >  22 files changed, 711 insertions(+), 1016 deletions(-)
> > 
> > Sample output is contained in the v1 patch set:
> > https://lore.kernel.org/lkml/bff481ba-e60a-763f-0aa0-3ee53302c480@linux.intel.com/
> > 
> > Tested on Tigerlake, Skylake and Alderlake CPUs.
> > 
> > The v4 patch set:
> >  - rebase, 1 of the Linux 6.4 recommended patches are merged leaving:
> >    1) perf metric: Change divide by zero and !support events behavior
> >    2) perf stat: Introduce skippable evsels
> >    3) perf metric: Json flag to not group events if gathering a metric group
> >    4) perf parse-events: Don't reorder ungrouped events by pmu
> >    whose diffstat is:
> >     30 files changed, 326 insertions(+), 33 deletions(-)
> >    but without the vendor event updates (the tend to be large as they
> >    repeat something per architecture per metric) is just:
> >     10 files changed, 90 insertions(+), 32 deletions(-)
> 
> I have tested the 4 patches on top of the perf-tools-next branch on both
> Cascade Lake and Raptor Lake. The result looks good to me.
> 
> They address the permission error found in the default mode of perf stat
> on the Cascade Lake. Thanks Ian for the fix.
> 
> Arnaldo, could you please consider to back port them for the 6.4?

Yes, its in perf-tools now, will go to Linus next week.

What about the other patches? I saw some you provided your review, what
about the others, are you ok with them?

- Arnaldo

Re: [PATCH v4 00/44] Fix perf on Intel hybrid CPUs

Posted by Liang, Kan 2 years, 8 months ago


On 2023-05-12 2:33 p.m., Arnaldo Carvalho de Melo wrote:
> Em Wed, May 03, 2023 at 04:56:36PM -0400, Liang, Kan escreveu:
>>
>>
>> On 2023-05-02 6:38 p.m., Ian Rogers wrote:
>>> TL;DR: hybrid doesn't crash, json metrics work on hybrid on both PMUs
>>> or individually, event parsing doesn't always scan all PMUs, more and
>>> new tests that also run without hybrid, less code.
>>>
>>> The first 4 patches are aimed at Linux 6.4 to address issues raised,
>>> in particular by Kan, on the existing perf stat behavior with json
>>> metrics. They avoid duplicated events by removing groups. They don't
>>> hide events and metrics to make event multiplexing obvious. They avoid
>>> terminating perf when paranoia is higher due to certain events that
>>> always fail. They avoid rearranging events by PMUs when the events
>>> aren't in a group.
>>>
>>> The next 5 patches avoid grouping events for metrics where they could
>>> never succeed and were previously posted as:
>>> "perf vendor events intel: Add xxx metric constraints"
>>> https://lore.kernel.org/all/20230419005423.343862-1-irogers@google.com/
>>> In general the generated json is coming from:
>>> https://github.com/intel/perfmon/pull/73
>>>
>>> Next are some general and test improvements.
>>>
>>> Next event parsing is rewritten to not scan all PMUs for the benefit
>>> of raw and legacy cache parsing, instead these are handled by the
>>> lexer and a new term type. This ultimately removes the need for the
>>> event parser for hybrid to be recursive as legacy cache can be just a
>>> term. Tests are re-enabled for events with hyphens, so AMD's
>>> branch-brs event is now parsable.
>>>
>>> The cputype option is made a generic pmu filter flag and is tested
>>> even on non-hybrid systems.
>>>
>>> The final patches address specific json metric issues on hybrid, in
>>> both the json metrics and the metric code.
>>>
>>> The patches add slightly more code than they remove, in areas like
>>> better json metric constraints and tests, but in the core util code,
>>> the removal of hybrid is a net reduction:
>>>  22 files changed, 711 insertions(+), 1016 deletions(-)
>>>
>>> Sample output is contained in the v1 patch set:
>>> https://lore.kernel.org/lkml/bff481ba-e60a-763f-0aa0-3ee53302c480@linux.intel.com/
>>>
>>> Tested on Tigerlake, Skylake and Alderlake CPUs.
>>>
>>> The v4 patch set:
>>>  - rebase, 1 of the Linux 6.4 recommended patches are merged leaving:
>>>    1) perf metric: Change divide by zero and !support events behavior
>>>    2) perf stat: Introduce skippable evsels
>>>    3) perf metric: Json flag to not group events if gathering a metric group
>>>    4) perf parse-events: Don't reorder ungrouped events by pmu
>>>    whose diffstat is:
>>>     30 files changed, 326 insertions(+), 33 deletions(-)
>>>    but without the vendor event updates (the tend to be large as they
>>>    repeat something per architecture per metric) is just:
>>>     10 files changed, 90 insertions(+), 32 deletions(-)
>>
>> I have tested the 4 patches on top of the perf-tools-next branch on both
>> Cascade Lake and Raptor Lake. The result looks good to me.
>>
>> They address the permission error found in the default mode of perf stat
>> on the Cascade Lake. Thanks Ian for the fix.
>>
>> Arnaldo, could you please consider to back port them for the 6.4?
> 
> Yes, its in perf-tools now, will go to Linus next week.

Thanks Arnaldo!

> 
> What about the other patches? I saw some you provided your review, what
> about the others, are you ok with them?
> 

Yes, I'm OK with the patch set. It fixes many issues. Thanks Ian.
(My tests mainly focus on the area in which the patch set may touch. I
did the tests on various platforms, ADL (hybrid), Cascade Lake, SPR.)

Tested-by: Kan Liang <kan.liang@linux.intel.com>

But there are still some issues. I don't think they are introduced by
this patch set. We may fix them later separately.

- Segmentation fault with perf stat --topdown on ADL (hybrid) and
Cascade Lake.
  It looks like a legacy issue, may not be introduced by this patch set.
  Here is the backtrace. It looks like there is a NULL metric_group.

(gdb) backtrace
#0  0x00007ffff73035d1 in __strstr_sse2_unaligned () from /lib64/libc.so.6
#1  0x00000000004f9019 in metricgroup__topdown_max_level_callback
(pm=<optimized out>, table=<optimized out>,
    data=0x7fffffff92f4) at util/metricgroup.c:1722
#2  0x00000000005e8a31 in pmu_metrics_table_for_each_metric
(table=0xcb74d0 <pmu_events_map+368>,
    fn=fn@entry=0x4f8ff0 <metricgroup__topdown_max_level_callback>,
data=data@entry=0x7fffffff92f4)
    at pmu-events/pmu-events.c:61123
#3  0x00000000004fbc3b in metricgroups__topdown_max_level () at
util/metricgroup.c:1742
#4  0x000000000042c135 in add_default_attributes () at builtin-stat.c:1845
#5  cmd_stat (argc=0, argv=0x7fffffffe3e0) at builtin-stat.c:2446
#6  0x00000000004b922b in run_builtin (p=p@entry=0xd5c530
<commands+336>, argc=argc@entry=2,
    argv=argv@entry=0x7fffffffe3e0) at perf.c:323
#7  0x000000000040e373 in handle_internal_command (argv=0x7fffffffe3e0,
argc=2) at perf.c:377
#8  run_argv (argv=<synthetic pointer>, argcp=<synthetic pointer>) at
perf.c:421
#9  main (argc=2, argv=0x7fffffffe3e0) at perf.c:537
(gdb)

Also, the return type is unsigned int, but a bool is given.

unsigned int metricgroups__topdown_max_level(void)
{
        unsigned int max_level = 0;
        const struct pmu_metrics_table *table = pmu_metrics_table__find();

        if (!table)
                return false;



- The perf metric and metricgroups fail on different platforms.
  Ian and I have discussed it. We agree to address it later separately.

  102: perf all metricgroups test
  ADL (hybrid)
  103: perf all metrics test
  ADL (hybrid), Cascade Lake, SPR

- perf list: The [Kernel PMU event] is missed for all the hardware cache
events.
  It impacts both hybrid and non-hybrid platforms.
  It's a user-visible change introduced by the patch set.
  I don't know if anyone cares whether it's a kernel event or a regular
event. It doesn't bother me. So I'm OK with it.

cpu:
  L1-dcache-loads OR cpu/L1-dcache-loads/
  L1-dcache-load-misses OR cpu/L1-dcache-load-misses/
  L1-dcache-stores OR cpu/L1-dcache-stores/
  L1-icache-load-misses OR cpu/L1-icache-load-misses/
  LLC-loads OR cpu/LLC-loads/
  LLC-load-misses OR cpu/LLC-load-misses/
  LLC-stores OR cpu/LLC-stores/
  LLC-store-misses OR cpu/LLC-store-misses/
  dTLB-loads OR cpu/dTLB-loads/
  dTLB-load-misses OR cpu/dTLB-load-misses/
  dTLB-stores OR cpu/dTLB-stores/
  dTLB-store-misses OR cpu/dTLB-store-misses/
  iTLB-load-misses OR cpu/iTLB-load-misses/
  branch-loads OR cpu/branch-loads/
  branch-load-misses OR cpu/branch-load-misses/
  node-loads OR cpu/node-loads/
  node-load-misses OR cpu/node-load-misses/
  branch-instructions OR cpu/branch-instructions/    [Kernel PMU event]
  branch-misses OR cpu/branch-misses/                [Kernel PMU event]
  bus-cycles OR cpu/bus-cycles/                      [Kernel PMU event]
  cache-misses OR cpu/cache-misses/                  [Kernel PMU event]
  cache-references OR cpu/cache-references/          [Kernel PMU event]
  cpu-cycles OR cpu/cpu-cycles/                      [Kernel PMU event]
  instructions OR cpu/instructions/                  [Kernel PMU event]


- The --cputype only works for the metric in the default mode.
  I can still see the cpu_atom events with --cputype core
  It may be something we can improve later.

# ./perf stat --cputype core sleep 2

 Performance counter stats for 'sleep 2':

              0.52 msec task-clock                       #    0.000 CPUs
utilized
                 1      context-switches                 #    1.939 K/sec
                 0      cpu-migrations                   #    0.000 /sec
                69      page-faults                      #  133.770 K/sec
         2,569,423      cpu_core/cycles/                 #    4.981 G/sec
     <not counted>      cpu_atom/cycles/
                       (0.00%)
         3,287,691      cpu_core/instructions/           #    6.374 G/sec
     <not counted>      cpu_atom/instructions/
                       (0.00%)
           555,848      cpu_core/branches/               #    1.078 G/sec
     <not counted>      cpu_atom/branches/
                       (0.00%)
             8,398      cpu_core/branch-misses/          #   16.281 M/sec
     <not counted>      cpu_atom/branch-misses/
                       (0.00%)
        15,416,538      cpu_core/TOPDOWN.SLOTS/          #     36.1 %
tma_backend_bound
                                                  #     23.9 %  tma_retiring
                                                  #      5.6 %
tma_bad_speculation
                                                  #     34.4 %
tma_frontend_bound
         3,687,877      cpu_core/topdown-retiring/
           846,398      cpu_core/topdown-bad-spec/
         5,320,217      cpu_core/topdown-fe-bound/
         5,562,045      cpu_core/topdown-be-bound/
            14,149      cpu_core/INT_MISC.UOP_DROPPING/  #   27.431 M/sec


Thanks,
Kan

Re: [PATCH v4 00/44] Fix perf on Intel hybrid CPUs

Posted by Ian Rogers 2 years, 8 months ago

On Sun, May 14, 2023 at 5:03 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2023-05-12 2:33 p.m., Arnaldo Carvalho de Melo wrote:
> > Em Wed, May 03, 2023 at 04:56:36PM -0400, Liang, Kan escreveu:
> >>
> >>
> >> On 2023-05-02 6:38 p.m., Ian Rogers wrote:
> >>> TL;DR: hybrid doesn't crash, json metrics work on hybrid on both PMUs
> >>> or individually, event parsing doesn't always scan all PMUs, more and
> >>> new tests that also run without hybrid, less code.
> >>>
> >>> The first 4 patches are aimed at Linux 6.4 to address issues raised,
> >>> in particular by Kan, on the existing perf stat behavior with json
> >>> metrics. They avoid duplicated events by removing groups. They don't
> >>> hide events and metrics to make event multiplexing obvious. They avoid
> >>> terminating perf when paranoia is higher due to certain events that
> >>> always fail. They avoid rearranging events by PMUs when the events
> >>> aren't in a group.
> >>>
> >>> The next 5 patches avoid grouping events for metrics where they could
> >>> never succeed and were previously posted as:
> >>> "perf vendor events intel: Add xxx metric constraints"
> >>> https://lore.kernel.org/all/20230419005423.343862-1-irogers@google.com/
> >>> In general the generated json is coming from:
> >>> https://github.com/intel/perfmon/pull/73
> >>>
> >>> Next are some general and test improvements.
> >>>
> >>> Next event parsing is rewritten to not scan all PMUs for the benefit
> >>> of raw and legacy cache parsing, instead these are handled by the
> >>> lexer and a new term type. This ultimately removes the need for the
> >>> event parser for hybrid to be recursive as legacy cache can be just a
> >>> term. Tests are re-enabled for events with hyphens, so AMD's
> >>> branch-brs event is now parsable.
> >>>
> >>> The cputype option is made a generic pmu filter flag and is tested
> >>> even on non-hybrid systems.
> >>>
> >>> The final patches address specific json metric issues on hybrid, in
> >>> both the json metrics and the metric code.
> >>>
> >>> The patches add slightly more code than they remove, in areas like
> >>> better json metric constraints and tests, but in the core util code,
> >>> the removal of hybrid is a net reduction:
> >>>  22 files changed, 711 insertions(+), 1016 deletions(-)
> >>>
> >>> Sample output is contained in the v1 patch set:
> >>> https://lore.kernel.org/lkml/bff481ba-e60a-763f-0aa0-3ee53302c480@linux.intel.com/
> >>>
> >>> Tested on Tigerlake, Skylake and Alderlake CPUs.
> >>>
> >>> The v4 patch set:
> >>>  - rebase, 1 of the Linux 6.4 recommended patches are merged leaving:
> >>>    1) perf metric: Change divide by zero and !support events behavior
> >>>    2) perf stat: Introduce skippable evsels
> >>>    3) perf metric: Json flag to not group events if gathering a metric group
> >>>    4) perf parse-events: Don't reorder ungrouped events by pmu
> >>>    whose diffstat is:
> >>>     30 files changed, 326 insertions(+), 33 deletions(-)
> >>>    but without the vendor event updates (the tend to be large as they
> >>>    repeat something per architecture per metric) is just:
> >>>     10 files changed, 90 insertions(+), 32 deletions(-)
> >>
> >> I have tested the 4 patches on top of the perf-tools-next branch on both
> >> Cascade Lake and Raptor Lake. The result looks good to me.
> >>
> >> They address the permission error found in the default mode of perf stat
> >> on the Cascade Lake. Thanks Ian for the fix.
> >>
> >> Arnaldo, could you please consider to back port them for the 6.4?
> >
> > Yes, its in perf-tools now, will go to Linus next week.
>
> Thanks Arnaldo!
>
> >
> > What about the other patches? I saw some you provided your review, what
> > about the others, are you ok with them?
> >
>
> Yes, I'm OK with the patch set. It fixes many issues. Thanks Ian.
> (My tests mainly focus on the area in which the patch set may touch. I
> did the tests on various platforms, ADL (hybrid), Cascade Lake, SPR.)
>
> Tested-by: Kan Liang <kan.liang@linux.intel.com>
>
> But there are still some issues. I don't think they are introduced by
> this patch set. We may fix them later separately.
>
> - Segmentation fault with perf stat --topdown on ADL (hybrid) and
> Cascade Lake.
>   It looks like a legacy issue, may not be introduced by this patch set.
>   Here is the backtrace. It looks like there is a NULL metric_group.
>
> (gdb) backtrace
> #0  0x00007ffff73035d1 in __strstr_sse2_unaligned () from /lib64/libc.so.6
> #1  0x00000000004f9019 in metricgroup__topdown_max_level_callback
> (pm=<optimized out>, table=<optimized out>,
>     data=0x7fffffff92f4) at util/metricgroup.c:1722
> #2  0x00000000005e8a31 in pmu_metrics_table_for_each_metric
> (table=0xcb74d0 <pmu_events_map+368>,
>     fn=fn@entry=0x4f8ff0 <metricgroup__topdown_max_level_callback>,
> data=data@entry=0x7fffffff92f4)
>     at pmu-events/pmu-events.c:61123
> #3  0x00000000004fbc3b in metricgroups__topdown_max_level () at
> util/metricgroup.c:1742
> #4  0x000000000042c135 in add_default_attributes () at builtin-stat.c:1845
> #5  cmd_stat (argc=0, argv=0x7fffffffe3e0) at builtin-stat.c:2446
> #6  0x00000000004b922b in run_builtin (p=p@entry=0xd5c530
> <commands+336>, argc=argc@entry=2,
>     argv=argv@entry=0x7fffffffe3e0) at perf.c:323
> #7  0x000000000040e373 in handle_internal_command (argv=0x7fffffffe3e0,
> argc=2) at perf.c:377
> #8  run_argv (argv=<synthetic pointer>, argcp=<synthetic pointer>) at
> perf.c:421
> #9  main (argc=2, argv=0x7fffffffe3e0) at perf.c:537
> (gdb)

Thanks, there's a 1-liner for the segv:
https://lore.kernel.org/lkml/20230515224530.671331-1-irogers@google.com/
Arnaldo, can we make sure this goes toward the next 6.4 release candidate?

Thanks,
Ian

> Also, the return type is unsigned int, but a bool is given.
>
> unsigned int metricgroups__topdown_max_level(void)
> {
>         unsigned int max_level = 0;
>         const struct pmu_metrics_table *table = pmu_metrics_table__find();
>
>         if (!table)
>                 return false;
>
>
>
> - The perf metric and metricgroups fail on different platforms.
>   Ian and I have discussed it. We agree to address it later separately.
>
>   102: perf all metricgroups test
>   ADL (hybrid)
>   103: perf all metrics test
>   ADL (hybrid), Cascade Lake, SPR
>
> - perf list: The [Kernel PMU event] is missed for all the hardware cache
> events.
>   It impacts both hybrid and non-hybrid platforms.
>   It's a user-visible change introduced by the patch set.
>   I don't know if anyone cares whether it's a kernel event or a regular
> event. It doesn't bother me. So I'm OK with it.
>
> cpu:
>   L1-dcache-loads OR cpu/L1-dcache-loads/
>   L1-dcache-load-misses OR cpu/L1-dcache-load-misses/
>   L1-dcache-stores OR cpu/L1-dcache-stores/
>   L1-icache-load-misses OR cpu/L1-icache-load-misses/
>   LLC-loads OR cpu/LLC-loads/
>   LLC-load-misses OR cpu/LLC-load-misses/
>   LLC-stores OR cpu/LLC-stores/
>   LLC-store-misses OR cpu/LLC-store-misses/
>   dTLB-loads OR cpu/dTLB-loads/
>   dTLB-load-misses OR cpu/dTLB-load-misses/
>   dTLB-stores OR cpu/dTLB-stores/
>   dTLB-store-misses OR cpu/dTLB-store-misses/
>   iTLB-load-misses OR cpu/iTLB-load-misses/
>   branch-loads OR cpu/branch-loads/
>   branch-load-misses OR cpu/branch-load-misses/
>   node-loads OR cpu/node-loads/
>   node-load-misses OR cpu/node-load-misses/
>   branch-instructions OR cpu/branch-instructions/    [Kernel PMU event]
>   branch-misses OR cpu/branch-misses/                [Kernel PMU event]
>   bus-cycles OR cpu/bus-cycles/                      [Kernel PMU event]
>   cache-misses OR cpu/cache-misses/                  [Kernel PMU event]
>   cache-references OR cpu/cache-references/          [Kernel PMU event]
>   cpu-cycles OR cpu/cpu-cycles/                      [Kernel PMU event]
>   instructions OR cpu/instructions/                  [Kernel PMU event]
>
>
> - The --cputype only works for the metric in the default mode.
>   I can still see the cpu_atom events with --cputype core
>   It may be something we can improve later.
>
> # ./perf stat --cputype core sleep 2
>
>  Performance counter stats for 'sleep 2':
>
>               0.52 msec task-clock                       #    0.000 CPUs
> utilized
>                  1      context-switches                 #    1.939 K/sec
>                  0      cpu-migrations                   #    0.000 /sec
>                 69      page-faults                      #  133.770 K/sec
>          2,569,423      cpu_core/cycles/                 #    4.981 G/sec
>      <not counted>      cpu_atom/cycles/
>                        (0.00%)
>          3,287,691      cpu_core/instructions/           #    6.374 G/sec
>      <not counted>      cpu_atom/instructions/
>                        (0.00%)
>            555,848      cpu_core/branches/               #    1.078 G/sec
>      <not counted>      cpu_atom/branches/
>                        (0.00%)
>              8,398      cpu_core/branch-misses/          #   16.281 M/sec
>      <not counted>      cpu_atom/branch-misses/
>                        (0.00%)
>         15,416,538      cpu_core/TOPDOWN.SLOTS/          #     36.1 %
> tma_backend_bound
>                                                   #     23.9 %  tma_retiring
>                                                   #      5.6 %
> tma_bad_speculation
>                                                   #     34.4 %
> tma_frontend_bound
>          3,687,877      cpu_core/topdown-retiring/
>            846,398      cpu_core/topdown-bad-spec/
>          5,320,217      cpu_core/topdown-fe-bound/
>          5,562,045      cpu_core/topdown-be-bound/
>             14,149      cpu_core/INT_MISC.UOP_DROPPING/  #   27.431 M/sec
>
>
> Thanks,
> Kan

Re: [PATCH v4 00/44] Fix perf on Intel hybrid CPUs

Posted by Arnaldo Carvalho de Melo 2 years, 8 months ago

Em Mon, May 15, 2023 at 03:49:45PM -0700, Ian Rogers escreveu:
> On Sun, May 14, 2023 at 5:03 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
> > - Segmentation fault with perf stat --topdown on ADL (hybrid) and
> > Cascade Lake.
> >   It looks like a legacy issue, may not be introduced by this patch set.
> >   Here is the backtrace. It looks like there is a NULL metric_group.
> >
> > (gdb) backtrace
> > #0  0x00007ffff73035d1 in __strstr_sse2_unaligned () from /lib64/libc.so.6
> > #1  0x00000000004f9019 in metricgroup__topdown_max_level_callback
> > (pm=<optimized out>, table=<optimized out>,
> >     data=0x7fffffff92f4) at util/metricgroup.c:1722
> > #2  0x00000000005e8a31 in pmu_metrics_table_for_each_metric
> > (table=0xcb74d0 <pmu_events_map+368>,
> >     fn=fn@entry=0x4f8ff0 <metricgroup__topdown_max_level_callback>,
> > data=data@entry=0x7fffffff92f4)
> >     at pmu-events/pmu-events.c:61123
> > #3  0x00000000004fbc3b in metricgroups__topdown_max_level () at
> > util/metricgroup.c:1742
> > #4  0x000000000042c135 in add_default_attributes () at builtin-stat.c:1845
> > #5  cmd_stat (argc=0, argv=0x7fffffffe3e0) at builtin-stat.c:2446
> > #6  0x00000000004b922b in run_builtin (p=p@entry=0xd5c530
> > <commands+336>, argc=argc@entry=2,
> >     argv=argv@entry=0x7fffffffe3e0) at perf.c:323
> > #7  0x000000000040e373 in handle_internal_command (argv=0x7fffffffe3e0,
> > argc=2) at perf.c:377
> > #8  run_argv (argv=<synthetic pointer>, argcp=<synthetic pointer>) at
> > perf.c:421
> > #9  main (argc=2, argv=0x7fffffffe3e0) at perf.c:537
> > (gdb)
> 
> Thanks, there's a 1-liner for the segv:
> https://lore.kernel.org/lkml/20230515224530.671331-1-irogers@google.com/
> Arnaldo, can we make sure this goes toward the next 6.4 release candidate?

Sure.

- Arnaldo

Re: [PATCH v4 00/44] Fix perf on Intel hybrid CPUs

Posted by Arnaldo Carvalho de Melo 2 years, 8 months ago

Em Sun, May 14, 2023 at 08:03:38AM -0400, Liang, Kan escreveu:
> On 2023-05-12 2:33 p.m., Arnaldo Carvalho de Melo wrote:
> > Em Wed, May 03, 2023 at 04:56:36PM -0400, Liang, Kan escreveu:
> >> On 2023-05-02 6:38 p.m., Ian Rogers wrote:
> >>> TL;DR: hybrid doesn't crash, json metrics work on hybrid on both PMUs
> >>> or individually, event parsing doesn't always scan all PMUs, more and
> >>> new tests that also run without hybrid, less code.
> >>>
> >>> The first 4 patches are aimed at Linux 6.4 to address issues raised,
> >>> in particular by Kan, on the existing perf stat behavior with json
> >>> metrics. They avoid duplicated events by removing groups. They don't
> >>> hide events and metrics to make event multiplexing obvious. They avoid
> >>> terminating perf when paranoia is higher due to certain events that
> >>> always fail. They avoid rearranging events by PMUs when the events
> >>> aren't in a group.
> >>>
> >>> The next 5 patches avoid grouping events for metrics where they could
> >>> never succeed and were previously posted as:
> >>> "perf vendor events intel: Add xxx metric constraints"
> >>> https://lore.kernel.org/all/20230419005423.343862-1-irogers@google.com/
> >>> In general the generated json is coming from:
> >>> https://github.com/intel/perfmon/pull/73
> >>>
> >>> Next are some general and test improvements.
> >>>
> >>> Next event parsing is rewritten to not scan all PMUs for the benefit
> >>> of raw and legacy cache parsing, instead these are handled by the
> >>> lexer and a new term type. This ultimately removes the need for the
> >>> event parser for hybrid to be recursive as legacy cache can be just a
> >>> term. Tests are re-enabled for events with hyphens, so AMD's
> >>> branch-brs event is now parsable.
> >>>
> >>> The cputype option is made a generic pmu filter flag and is tested
> >>> even on non-hybrid systems.
> >>>
> >>> The final patches address specific json metric issues on hybrid, in
> >>> both the json metrics and the metric code.
> >>>
> >>> The patches add slightly more code than they remove, in areas like
> >>> better json metric constraints and tests, but in the core util code,
> >>> the removal of hybrid is a net reduction:
> >>>  22 files changed, 711 insertions(+), 1016 deletions(-)
> >>>
> >>> Sample output is contained in the v1 patch set:
> >>> https://lore.kernel.org/lkml/bff481ba-e60a-763f-0aa0-3ee53302c480@linux.intel.com/
> >>>
> >>> Tested on Tigerlake, Skylake and Alderlake CPUs.
> >>>
> >>> The v4 patch set:
> >>>  - rebase, 1 of the Linux 6.4 recommended patches are merged leaving:
> >>>    1) perf metric: Change divide by zero and !support events behavior
> >>>    2) perf stat: Introduce skippable evsels
> >>>    3) perf metric: Json flag to not group events if gathering a metric group
> >>>    4) perf parse-events: Don't reorder ungrouped events by pmu
> >>>    whose diffstat is:
> >>>     30 files changed, 326 insertions(+), 33 deletions(-)
> >>>    but without the vendor event updates (the tend to be large as they
> >>>    repeat something per architecture per metric) is just:
> >>>     10 files changed, 90 insertions(+), 32 deletions(-)
> >>
> >> I have tested the 4 patches on top of the perf-tools-next branch on both
> >> Cascade Lake and Raptor Lake. The result looks good to me.
> >>
> >> They address the permission error found in the default mode of perf stat
> >> on the Cascade Lake. Thanks Ian for the fix.
> >>
> >> Arnaldo, could you please consider to back port them for the 6.4?
> > 
> > Yes, its in perf-tools now, will go to Linus next week.
> 
> Thanks Arnaldo!
> 
> > 
> > What about the other patches? I saw some you provided your review, what
> > about the others, are you ok with them?
> > 
> 
> Yes, I'm OK with the patch set. It fixes many issues. Thanks Ian.
> (My tests mainly focus on the area in which the patch set may touch. I
> did the tests on various platforms, ADL (hybrid), Cascade Lake, SPR.)
> 
> Tested-by: Kan Liang <kan.liang@linux.intel.com>

Thanks!
 
> But there are still some issues. I don't think they are introduced by
> this patch set. We may fix them later separately.

Yeah, look forward to receiving the patches :-)

- Arnaldo
 
> - Segmentation fault with perf stat --topdown on ADL (hybrid) and
> Cascade Lake.
>   It looks like a legacy issue, may not be introduced by this patch set.
>   Here is the backtrace. It looks like there is a NULL metric_group.
> 
> (gdb) backtrace
> #0  0x00007ffff73035d1 in __strstr_sse2_unaligned () from /lib64/libc.so.6
> #1  0x00000000004f9019 in metricgroup__topdown_max_level_callback
> (pm=<optimized out>, table=<optimized out>,
>     data=0x7fffffff92f4) at util/metricgroup.c:1722
> #2  0x00000000005e8a31 in pmu_metrics_table_for_each_metric
> (table=0xcb74d0 <pmu_events_map+368>,
>     fn=fn@entry=0x4f8ff0 <metricgroup__topdown_max_level_callback>,
> data=data@entry=0x7fffffff92f4)
>     at pmu-events/pmu-events.c:61123
> #3  0x00000000004fbc3b in metricgroups__topdown_max_level () at
> util/metricgroup.c:1742
> #4  0x000000000042c135 in add_default_attributes () at builtin-stat.c:1845
> #5  cmd_stat (argc=0, argv=0x7fffffffe3e0) at builtin-stat.c:2446
> #6  0x00000000004b922b in run_builtin (p=p@entry=0xd5c530
> <commands+336>, argc=argc@entry=2,
>     argv=argv@entry=0x7fffffffe3e0) at perf.c:323
> #7  0x000000000040e373 in handle_internal_command (argv=0x7fffffffe3e0,
> argc=2) at perf.c:377
> #8  run_argv (argv=<synthetic pointer>, argcp=<synthetic pointer>) at
> perf.c:421
> #9  main (argc=2, argv=0x7fffffffe3e0) at perf.c:537
> (gdb)
> 
> Also, the return type is unsigned int, but a bool is given.
> 
> unsigned int metricgroups__topdown_max_level(void)
> {
>         unsigned int max_level = 0;
>         const struct pmu_metrics_table *table = pmu_metrics_table__find();
> 
>         if (!table)
>                 return false;
> 
> 
> 
> - The perf metric and metricgroups fail on different platforms.
>   Ian and I have discussed it. We agree to address it later separately.
> 
>   102: perf all metricgroups test
>   ADL (hybrid)
>   103: perf all metrics test
>   ADL (hybrid), Cascade Lake, SPR
> 
> - perf list: The [Kernel PMU event] is missed for all the hardware cache
> events.
>   It impacts both hybrid and non-hybrid platforms.
>   It's a user-visible change introduced by the patch set.
>   I don't know if anyone cares whether it's a kernel event or a regular
> event. It doesn't bother me. So I'm OK with it.
> 
> cpu:
>   L1-dcache-loads OR cpu/L1-dcache-loads/
>   L1-dcache-load-misses OR cpu/L1-dcache-load-misses/
>   L1-dcache-stores OR cpu/L1-dcache-stores/
>   L1-icache-load-misses OR cpu/L1-icache-load-misses/
>   LLC-loads OR cpu/LLC-loads/
>   LLC-load-misses OR cpu/LLC-load-misses/
>   LLC-stores OR cpu/LLC-stores/
>   LLC-store-misses OR cpu/LLC-store-misses/
>   dTLB-loads OR cpu/dTLB-loads/
>   dTLB-load-misses OR cpu/dTLB-load-misses/
>   dTLB-stores OR cpu/dTLB-stores/
>   dTLB-store-misses OR cpu/dTLB-store-misses/
>   iTLB-load-misses OR cpu/iTLB-load-misses/
>   branch-loads OR cpu/branch-loads/
>   branch-load-misses OR cpu/branch-load-misses/
>   node-loads OR cpu/node-loads/
>   node-load-misses OR cpu/node-load-misses/
>   branch-instructions OR cpu/branch-instructions/    [Kernel PMU event]
>   branch-misses OR cpu/branch-misses/                [Kernel PMU event]
>   bus-cycles OR cpu/bus-cycles/                      [Kernel PMU event]
>   cache-misses OR cpu/cache-misses/                  [Kernel PMU event]
>   cache-references OR cpu/cache-references/          [Kernel PMU event]
>   cpu-cycles OR cpu/cpu-cycles/                      [Kernel PMU event]
>   instructions OR cpu/instructions/                  [Kernel PMU event]
> 
> 
> - The --cputype only works for the metric in the default mode.
>   I can still see the cpu_atom events with --cputype core
>   It may be something we can improve later.
> 
> # ./perf stat --cputype core sleep 2
> 
>  Performance counter stats for 'sleep 2':
> 
>               0.52 msec task-clock                       #    0.000 CPUs
> utilized
>                  1      context-switches                 #    1.939 K/sec
>                  0      cpu-migrations                   #    0.000 /sec
>                 69      page-faults                      #  133.770 K/sec
>          2,569,423      cpu_core/cycles/                 #    4.981 G/sec
>      <not counted>      cpu_atom/cycles/
>                        (0.00%)
>          3,287,691      cpu_core/instructions/           #    6.374 G/sec
>      <not counted>      cpu_atom/instructions/
>                        (0.00%)
>            555,848      cpu_core/branches/               #    1.078 G/sec
>      <not counted>      cpu_atom/branches/
>                        (0.00%)
>              8,398      cpu_core/branch-misses/          #   16.281 M/sec
>      <not counted>      cpu_atom/branch-misses/
>                        (0.00%)
>         15,416,538      cpu_core/TOPDOWN.SLOTS/          #     36.1 %
> tma_backend_bound
>                                                   #     23.9 %  tma_retiring
>                                                   #      5.6 %
> tma_bad_speculation
>                                                   #     34.4 %
> tma_frontend_bound
>          3,687,877      cpu_core/topdown-retiring/
>            846,398      cpu_core/topdown-bad-spec/
>          5,320,217      cpu_core/topdown-fe-bound/
>          5,562,045      cpu_core/topdown-be-bound/
>             14,149      cpu_core/INT_MISC.UOP_DROPPING/  #   27.431 M/sec
> 
> 
> Thanks,
> Kan

-- 

- Arnaldo