[PATCH v3 00/23] Improvements to Intel perf metrics

Ian Rogers posted 23 patches 1 year, 6 months ago
[PATCH v3 00/23] Improvements to Intel perf metrics
Posted by Ian Rogers 1 year, 6 months ago
For consistency with:
https://github.com/intel/perfmon-metrics
rename the topdown TMA metrics, e.g. from Frontend_Bound to tma_frontend_bound.
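For illustration only: the level 1 renames mentioned in this thread (Frontend_Bound, Bad_Speculation, Backend_Bound and Retiring becoming tma_frontend_bound, tma_bad_speculation, tma_backend_bound and tma_retiring) follow a simple pattern. The hypothetical helper below captures it; whether deeper TMA levels follow exactly the same rule is an assumption.

```python
def new_tma_name(old_name: str) -> str:
    """Hypothetical: old TMA metric name -> tma_-prefixed lower-case name.

    The rule is inferred from the level 1 names only.
    """
    return "tma_" + old_name.lower()


for old in ("Frontend_Bound", "Bad_Speculation", "Backend_Bound", "Retiring"):
    print(f"{old:16s} -> {new_tma_name(old)}")
```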

Metrics with an _SMT suffix are dropped, as #SMT_On and #EBS_Mode
are now correctly expanded in the single main metric. Fix perf expr to
allow a double (nested) if expression to be correctly processed.
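A double if is taken here to mean a nested conditional such as "1 if #SMT_on else 2 if #EBS_Mode else 3". The stand-alone sketch below is not perf's expr.y grammar; it is a small recursive-descent evaluator written under the assumption that a nested if groups to the right, as in Python, so the example reads as "1 if #SMT_on else (2 if #EBS_Mode else 3)". As a simplification, this sketch evaluates both branches eagerly.

```python
import re

# Numbers, identifiers (event names may contain '.', literals such as
# #SMT_on start with '#') and single-character operators. The words
# "if" and "else" lex as identifiers and are recognised by the parser.
TOKEN_RE = re.compile(r"\s*(\d+(?:\.\d+)?|[A-Za-z_#][\w.#]*|[-+*/()])")


def tokenize(text):
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"cannot tokenize {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Evaluator:
    """Recursive-descent evaluator; names are looked up in env."""

    def __init__(self, text, env):
        self.toks, self.i, self.env = tokenize(text), 0, env

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"unexpected token {tok!r}")
        self.i += 1
        return tok

    def if_expr(self):
        # if_expr := additive [ 'if' additive 'else' if_expr ]
        # Recursing into if_expr for the else branch makes nested ifs
        # group to the right.
        then_val = self.additive()
        if self.peek() != "if":
            return then_val
        self.take("if")
        cond = self.additive()
        self.take("else")
        else_val = self.if_expr()
        return then_val if cond else else_val

    def additive(self):
        val = self.term()
        while self.peek() in ("+", "-"):
            op, rhs = self.take(), self.term()
            val = val + rhs if op == "+" else val - rhs
        return val

    def term(self):
        val = self.primary()
        while self.peek() in ("*", "/"):
            op, rhs = self.take(), self.primary()
            val = val * rhs if op == "*" else val / rhs
        return val

    def primary(self):
        tok = self.take()
        if tok == "(":
            val = self.if_expr()
            self.take(")")
            return val
        if tok == "-":
            return -self.primary()
        if tok[0].isdigit():
            return float(tok)
        if tok not in self.env:
            raise KeyError(f"unknown name {tok!r}")
        return float(self.env[tok])


def evaluate(text, env):
    ev = Evaluator(text, env)
    val = ev.if_expr()
    if ev.peek() is not None:
        raise SyntaxError(f"trailing token {ev.peek()!r}")
    return val


print(evaluate("1 if #SMT_on else 2 if #EBS_Mode else 3",
               {"#SMT_on": 0, "#EBS_Mode": 1}))  # 2.0
```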

Add all 6 levels of TMA metrics. Child metrics are placed in a group
named after their parent, allowing the children of a metric to be
easily measured using the metric name with a _group suffix.
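A sketch of how the _group naming lets a tool find a metric's children, assuming MetricGroup holds a ';'-separated list of group names as in the tma_backend_bound record quoted later in this thread. Apart from that record's MetricGroup value, the records are illustrative and not copied from the patches.

```python
# Illustrative records shaped like the pmu-events JSON metric entries.
METRICS = [
    {"MetricName": "tma_frontend_bound", "MetricGroup": "TopdownL1;tma_L1_group"},
    {"MetricName": "tma_backend_bound", "MetricGroup": "TopdownL1;tma_L1_group"},
    {"MetricName": "tma_memory_bound", "MetricGroup": "tma_backend_bound_group"},
    {"MetricName": "tma_core_bound", "MetricGroup": "tma_backend_bound_group"},
]


def members(group, metrics):
    """Names of metrics whose ';'-separated MetricGroup list contains group."""
    return [m["MetricName"] for m in metrics
            if group in m.get("MetricGroup", "").split(";")]


def children_of(metric_name, metrics):
    """Children of a metric are placed in the group <metric_name>_group."""
    return members(metric_name + "_group", metrics)


print(children_of("tma_backend_bound", METRICS))
```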

Don't drop TMA metrics if they contain topdown events.

The ## and ##? operators are correctly expanded.

The locate-with column, which describes a sampling event, is added to
the long description.

Metrics are written in terms of other metrics to reduce the expression
size and increase readability.
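Writing metrics in terms of other metrics means evaluating one metric may first require evaluating another. A minimal sketch with made-up counts, assuming SMT is off and SLOTS is itself a metric equal to 4 * CPU_CLK_UNHALTED.THREAD; the tma_backend_bound formula is the SMT-off branch of the MetricExpr quoted later in this thread. Lookups are exact and case sensitive, which is the behaviour the v3 note refers to. make_resolver is a hypothetical helper.

```python
# Made-up event counts, for illustration only.
EVENTS = {
    "CPU_CLK_UNHALTED.THREAD": 1000,
    "IDQ_UOPS_NOT_DELIVERED.CORE": 400,
    "UOPS_ISSUED.ANY": 3000,
    "INT_MISC.RECOVERY_CYCLES": 50,
}

# Each metric is a function of a resolver m(name) that returns an event
# count or another metric's value (assumption: SMT off).
METRICS = {
    "SLOTS": lambda m: 4 * m("CPU_CLK_UNHALTED.THREAD"),
    "tma_frontend_bound": lambda m: m("IDQ_UOPS_NOT_DELIVERED.CORE") / m("SLOTS"),
    "tma_backend_bound": lambda m: (
        1 - m("tma_frontend_bound")
        - (m("UOPS_ISSUED.ANY") + 4 * m("INT_MISC.RECOVERY_CYCLES")) / m("SLOTS")
    ),
}


def make_resolver(events, metrics):
    cache = {}

    def resolve(name):
        # Exact, case-sensitive lookup: "tma_Frontend_Bound" is not found.
        if name in events:
            return events[name]
        if name not in metrics:
            raise KeyError(f"unknown event or metric: {name}")
        if name not in cache:
            cache[name] = metrics[name](resolve)  # each metric computed once
        return cache[name]

    return resolve


resolve = make_resolver(EVENTS, METRICS)
print(round(resolve("tma_backend_bound"), 4))
```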

Following this, the pmu-events/arch/x86 directories match those created
by the script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py
with updates at:
https://github.com/captain5050/event-converter-for-linux-perf


v3. Fix a parse-metrics test failure caused by making references
    between metrics case sensitive: make the case of the names in the
    test metrics match.
v2. Fix the commit message regarding missing mapfile.csv updates, as
    noted by Zhengjun Xing <zhengjun.xing@linux.intel.com>. ScaleUnit
    is added for TMA metrics. Metrics with topdown events have a
    missing slots event added if necessary. The latest metrics at
    https://github.com/intel/perfmon-metrics are used; however, the
    event-converter-for-linux-perf scripts now prefer their own
    metrics in case of mismatched units when a metric is written in
    terms of another. Additional testing was performed on broadwell,
    broadwellde, cascadelakex, haswellx, sapphirerapids and tigerlake
    CPUs.
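ScaleUnit pairs a scale factor with a unit, so "100%" turns a fraction such as 0.32 into 32.00%. A minimal sketch of that idea, assuming the value is a leading number followed by a unit string; parse_scale_unit and format_metric are hypothetical helpers, not perf's own code.

```python
import re


def parse_scale_unit(scale_unit):
    """Split a ScaleUnit string such as "100%" into (100.0, "%")."""
    m = re.fullmatch(r"\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)\s*(.*)", scale_unit)
    if not m:
        raise ValueError(f"bad ScaleUnit: {scale_unit!r}")
    return float(m.group(1)), m.group(2)


def format_metric(ratio, scale_unit):
    """Apply the scale factor to a raw metric value and append the unit."""
    scale, unit = parse_scale_unit(scale_unit)
    return f"{ratio * scale:.2f}{unit}"


print(format_metric(0.32, "100%"))
```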

Ian Rogers (23):
  perf expr: Allow a double if expression
  perf test: Adjust case of test metrics
  perf expr: Remove jevents case workaround
  perf metrics: Don't scale counts going into metrics
  perf vendor events: Update Intel skylakex
  perf vendor events: Update Intel alderlake
  perf vendor events: Update Intel broadwell
  perf vendor events: Update Intel broadwellx
  perf vendor events: Update Intel cascadelakex
  perf vendor events: Update elkhartlake cpuids
  perf vendor events: Update Intel haswell
  perf vendor events: Update Intel haswellx
  perf vendor events: Update Intel icelake
  perf vendor events: Update Intel icelakex
  perf vendor events: Update Intel ivybridge
  perf vendor events: Update Intel ivytown
  perf vendor events: Update Intel jaketown
  perf vendor events: Update Intel sandybridge
  perf vendor events: Update Intel sapphirerapids
  perf vendor events: Update silvermont cpuids
  perf vendor events: Update Intel skylake
  perf vendor events: Update Intel tigerlake
  perf vendor events: Update Intel broadwellde

 .../arch/test/test_soc/cpu/metrics.json       |    6 +-
 .../arch/x86/alderlake/adl-metrics.json       | 1353 ++++++++++++++++-
 .../pmu-events/arch/x86/alderlake/cache.json  |  129 +-
 .../arch/x86/alderlake/frontend.json          |   12 +
 .../pmu-events/arch/x86/alderlake/memory.json |   22 +
 .../pmu-events/arch/x86/alderlake/other.json  |   22 +
 .../arch/x86/alderlake/pipeline.json          |   14 +-
 .../arch/x86/broadwell/bdw-metrics.json       |  679 +++++++--
 .../arch/x86/broadwellde/bdwde-metrics.json   |  711 +++++++--
 .../arch/x86/broadwellx/bdx-metrics.json      |  965 +++++++-----
 .../arch/x86/broadwellx/uncore-cache.json     |   10 +-
 .../x86/broadwellx/uncore-interconnect.json   |   18 +-
 .../arch/x86/broadwellx/uncore-memory.json    |   18 +-
 .../arch/x86/cascadelakex/clx-metrics.json    | 1285 ++++++++++------
 .../arch/x86/cascadelakex/uncore-memory.json  |   18 +-
 .../arch/x86/cascadelakex/uncore-other.json   |   10 +-
 .../pmu-events/arch/x86/haswell/cache.json    |    4 +-
 .../pmu-events/arch/x86/haswell/frontend.json |   12 +-
 .../arch/x86/haswell/hsw-metrics.json         |  570 ++++++-
 .../pmu-events/arch/x86/haswellx/cache.json   |    2 +-
 .../arch/x86/haswellx/frontend.json           |   12 +-
 .../arch/x86/haswellx/hsx-metrics.json        |  919 +++++++----
 .../x86/haswellx/uncore-interconnect.json     |   18 +-
 .../arch/x86/haswellx/uncore-memory.json      |   18 +-
 .../pmu-events/arch/x86/icelake/cache.json    |    6 +-
 .../arch/x86/icelake/icl-metrics.json         |  808 +++++++++-
 .../pmu-events/arch/x86/icelake/pipeline.json |    2 +-
 .../pmu-events/arch/x86/icelakex/cache.json   |    6 +-
 .../arch/x86/icelakex/icx-metrics.json        | 1155 ++++++++++----
 .../arch/x86/icelakex/pipeline.json           |    2 +-
 .../arch/x86/icelakex/uncore-other.json       |    2 +-
 .../arch/x86/ivybridge/ivb-metrics.json       |  594 ++++++--
 .../pmu-events/arch/x86/ivytown/cache.json    |    4 +-
 .../arch/x86/ivytown/floating-point.json      |    2 +-
 .../pmu-events/arch/x86/ivytown/frontend.json |   18 +-
 .../arch/x86/ivytown/ivt-metrics.json         |  630 ++++++--
 .../arch/x86/ivytown/uncore-cache.json        |   58 +-
 .../arch/x86/ivytown/uncore-interconnect.json |   84 +-
 .../arch/x86/ivytown/uncore-memory.json       |    2 +-
 .../arch/x86/ivytown/uncore-other.json        |    6 +-
 .../arch/x86/ivytown/uncore-power.json        |    8 +-
 .../arch/x86/jaketown/jkt-metrics.json        |  327 +++-
 tools/perf/pmu-events/arch/x86/mapfile.csv    |   18 +-
 .../arch/x86/sandybridge/snb-metrics.json     |  315 +++-
 .../arch/x86/sapphirerapids/cache.json        |    4 +-
 .../arch/x86/sapphirerapids/frontend.json     |   11 +
 .../arch/x86/sapphirerapids/pipeline.json     |    4 +-
 .../arch/x86/sapphirerapids/spr-metrics.json  | 1249 ++++++++++-----
 .../arch/x86/skylake/skl-metrics.json         |  861 ++++++++---
 .../arch/x86/skylakex/skx-metrics.json        | 1262 +++++++++------
 .../arch/x86/skylakex/uncore-memory.json      |   18 +-
 .../arch/x86/skylakex/uncore-other.json       |   19 +-
 .../arch/x86/tigerlake/tgl-metrics.json       |  810 +++++++++-
 tools/perf/pmu-events/empty-pmu-events.c      |    6 +-
 tools/perf/tests/expr.c                       |    4 +
 tools/perf/util/expr.c                        |   11 +-
 tools/perf/util/expr.y                        |    2 +-
 tools/perf/util/stat-shadow.c                 |    9 +-
 58 files changed, 11514 insertions(+), 3630 deletions(-)

-- 
2.38.0.rc1.362.ged0d419d3c-goog
Re: [PATCH v3 00/23] Improvements to Intel perf metrics
Posted by Ian Rogers 1 year, 6 months ago
On Mon, Oct 3, 2022 at 7:16 PM Ian Rogers <irogers@google.com> wrote:
>
> For consistency with:
> https://github.com/intel/perfmon-metrics
> rename the topdown TMA metrics, e.g. from Frontend_Bound to tma_frontend_bound.
>
> Metrics with an _SMT suffix are dropped, as #SMT_On and #EBS_Mode
> are now correctly expanded in the single main metric. Fix perf expr to
> allow a double (nested) if expression to be correctly processed.
>
> Add all 6 levels of TMA metrics. Child metrics are placed in a group
> named after their parent, allowing the children of a metric to be
> easily measured using the metric name with a _group suffix.
>
> Don't drop TMA metrics if they contain topdown events.
>
> The ## and ##? operators are correctly expanded.
>
> The locate-with column, which describes a sampling event, is added to
> the long description.
>
> Metrics are written in terms of other metrics to reduce the expression
> size and increase readability.
>
> Following this, the pmu-events/arch/x86 directories match those created
> by the script at:
> https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py
> with updates at:
> https://github.com/captain5050/event-converter-for-linux-perf
>
>
> v3. Fix a parse-metrics test failure caused by making references
>     between metrics case sensitive: make the case of the names in the
>     test metrics match.
> v2. Fix the commit message regarding missing mapfile.csv updates, as
>     noted by Zhengjun Xing <zhengjun.xing@linux.intel.com>. ScaleUnit
>     is added for TMA metrics. Metrics with topdown events have a
>     missing slots event added if necessary. The latest metrics at
>     https://github.com/intel/perfmon-metrics are used; however, the
>     event-converter-for-linux-perf scripts now prefer their own
>     metrics in case of mismatched units when a metric is written in
>     terms of another. Additional testing was performed on broadwell,
>     broadwellde, cascadelakex, haswellx, sapphirerapids and tigerlake
>     CPUs.

I wrote up a little example of performing a top-down analysis for the
perf wiki here:
https://perf.wiki.kernel.org/index.php/Top-Down_Analysis

Thanks,
Ian

Re: [PATCH v3 00/23] Improvements to Intel perf metrics
Posted by Andi Kleen 1 year, 6 months ago
[cutting down cc list]


On 10/3/2022 8:43 PM, Ian Rogers wrote:
> On Mon, Oct 3, 2022 at 7:16 PM Ian Rogers <irogers@google.com> wrote:
>> For consistency with:
>> https://github.com/intel/perfmon-metrics
>> rename the topdown TMA metrics, e.g. from Frontend_Bound to tma_frontend_bound.
>>
>> Metrics with an _SMT suffix are dropped, as #SMT_On and #EBS_Mode
>> are now correctly expanded in the single main metric. Fix perf expr to
>> allow a double (nested) if expression to be correctly processed.
>>
>> Add all 6 levels of TMA metrics. Child metrics are placed in a group
>> named after their parent, allowing the children of a metric to be
>> easily measured using the metric name with a _group suffix.
>>
>> Don't drop TMA metrics if they contain topdown events.
>>
>> The ## and ##? operators are correctly expanded.
>>
>> The locate-with column, which describes a sampling event, is added to
>> the long description.
>>
>> Metrics are written in terms of other metrics to reduce the expression
>> size and increase readability.
>>
>> Following this, the pmu-events/arch/x86 directories match those created
>> by the script at:
>> https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py
>> with updates at:
>> https://github.com/captain5050/event-converter-for-linux-perf
>>
>>
>> v3. Fix a parse-metrics test failure caused by making references
>>      between metrics case sensitive: make the case of the names in the
>>      test metrics match.
>> v2. Fix the commit message regarding missing mapfile.csv updates, as
>>      noted by Zhengjun Xing <zhengjun.xing@linux.intel.com>. ScaleUnit
>>      is added for TMA metrics. Metrics with topdown events have a
>>      missing slots event added if necessary. The latest metrics at
>>      https://github.com/intel/perfmon-metrics are used; however, the
>>      event-converter-for-linux-perf scripts now prefer their own
>>      metrics in case of mismatched units when a metric is written in
>>      terms of another. Additional testing was performed on broadwell,
>>      broadwellde, cascadelakex, haswellx, sapphirerapids and tigerlake
>>      CPUs.
> I wrote up a little example of performing a top-down analysis for the
> perf wiki here:
> https://perf.wiki.kernel.org/index.php/Top-Down_Analysis


I did some quick testing.

On Skylake the output of L1 isn't scaled to percent:

$ ./perf stat -M TopdownL1 ~/pmu/pmu-tools/workloads/BC1s

  Performance counter stats for '/home/ak/pmu/pmu-tools/workloads/BC1s':

        608,066,701      INT_MISC.RECOVERY_CYCLES         #     0.32 Bad_Speculation          (50.02%)
      5,364,230,382      CPU_CLK_UNHALTED.THREAD          #     0.48 Retiring                 (50.02%)
     10,194,062,626      UOPS_RETIRED.RETIRE_SLOTS                                            (50.02%)
     14,613,100,390      UOPS_ISSUED.ANY                                                      (50.02%)
      2,928,793,077      IDQ_UOPS_NOT_DELIVERED.CORE      #     0.14 Frontend_Bound
                                                          #     0.07 Backend_Bound            (50.02%)
        604,850,703      INT_MISC.RECOVERY_CYCLES                                             (50.02%)
      5,357,291,185      CPU_CLK_UNHALTED.THREAD                                              (50.02%)
     14,618,285,580      UOPS_ISSUED.ANY                                                      (50.02%)
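As a cross-check, the level 1 values above can be recomputed from the first set of raw counts (the events were multiplexed, so the later duplicate counts differ slightly). This sketch assumes the classic Skylake TMA level 1 formulas with SMT off and 4 issue slots per cycle, i.e. SLOTS = 4 * CPU_CLK_UNHALTED.THREAD. Rounded to two places the results match the 0.32, 0.48, 0.14 and 0.07 printed above, which are unscaled fractions rather than percentages.

```python
# Counts copied from the first occurrence of each event in the output above.
counts = {
    "INT_MISC.RECOVERY_CYCLES": 608_066_701,
    "CPU_CLK_UNHALTED.THREAD": 5_364_230_382,
    "UOPS_RETIRED.RETIRE_SLOTS": 10_194_062_626,
    "UOPS_ISSUED.ANY": 14_613_100_390,
    "IDQ_UOPS_NOT_DELIVERED.CORE": 2_928_793_077,
}

# Assumption: Skylake, SMT off, 4 issue slots per cycle.
slots = 4 * counts["CPU_CLK_UNHALTED.THREAD"]
frontend = counts["IDQ_UOPS_NOT_DELIVERED.CORE"] / slots
bad_spec = (counts["UOPS_ISSUED.ANY"] - counts["UOPS_RETIRED.RETIRE_SLOTS"]
            + 4 * counts["INT_MISC.RECOVERY_CYCLES"]) / slots
retiring = counts["UOPS_RETIRED.RETIRE_SLOTS"] / slots
backend = 1 - frontend - bad_spec - retiring  # the remainder of all slots

for name, value in [("Bad_Speculation", bad_spec), ("Retiring", retiring),
                    ("Frontend_Bound", frontend), ("Backend_Bound", backend)]:
    # Unscaled fraction as printed by perf above, then the same as a percentage.
    print(f"{name:16s} {value:.2f}  ({value * 100:.1f}%)")
```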

Then if I follow the wiki example here I would expect I need to do

$ ./perf stat -M tma_backend_bound_group ~/pmu/pmu-tools/workloads/BC1s

Cannot find metric or group `tma_backend_bound_group'

but tma_retiring_group doesn't exist. So it seems the methodology isn't 
fully consistent everywhere? Perhaps the wiki needs to document the 
supported CPUs and also what part of the hierarchy is supported.

Another problem I noticed in the example is that the sample event didn't 
specify PEBS, even though it probably should, at least on Icelake+ where
every event can use PEBS, with less overhead.

Also, with all these groups needing to be specified by hand, some bash
completion support for groups would be really useful.

-Andi


Re: [PATCH v3 00/23] Improvements to Intel perf metrics
Posted by Ian Rogers 1 year, 6 months ago
On Tue, Oct 4, 2022 at 10:29 AM Andi Kleen <ak@linux.intel.com> wrote:
>
> [cutting down cc list]
>
>
> On 10/3/2022 8:43 PM, Ian Rogers wrote:
> > On Mon, Oct 3, 2022 at 7:16 PM Ian Rogers <irogers@google.com> wrote:
> >> For consistency with:
> >> https://github.com/intel/perfmon-metrics
> >> rename the topdown TMA metrics, e.g. from Frontend_Bound to tma_frontend_bound.
> >>
> >> Metrics with an _SMT suffix are dropped, as #SMT_On and #EBS_Mode
> >> are now correctly expanded in the single main metric. Fix perf expr to
> >> allow a double (nested) if expression to be correctly processed.
> >>
> >> Add all 6 levels of TMA metrics. Child metrics are placed in a group
> >> named after their parent, allowing the children of a metric to be
> >> easily measured using the metric name with a _group suffix.
> >>
> >> Don't drop TMA metrics if they contain topdown events.
> >>
> >> The ## and ##? operators are correctly expanded.
> >>
> >> The locate-with column, which describes a sampling event, is added to
> >> the long description.
> >>
> >> Metrics are written in terms of other metrics to reduce the expression
> >> size and increase readability.
> >>
> >> Following this, the pmu-events/arch/x86 directories match those created
> >> by the script at:
> >> https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py
> >> with updates at:
> >> https://github.com/captain5050/event-converter-for-linux-perf
> >>
> >>
> >> v3. Fix a parse-metrics test failure caused by making references
> >>      between metrics case sensitive: make the case of the names in the
> >>      test metrics match.
> >> v2. Fix the commit message regarding missing mapfile.csv updates, as
> >>      noted by Zhengjun Xing <zhengjun.xing@linux.intel.com>. ScaleUnit
> >>      is added for TMA metrics. Metrics with topdown events have a
> >>      missing slots event added if necessary. The latest metrics at
> >>      https://github.com/intel/perfmon-metrics are used; however, the
> >>      event-converter-for-linux-perf scripts now prefer their own
> >>      metrics in case of mismatched units when a metric is written in
> >>      terms of another. Additional testing was performed on broadwell,
> >>      broadwellde, cascadelakex, haswellx, sapphirerapids and tigerlake
> >>      CPUs.
> > I wrote up a little example of performing a top-down analysis for the
> > perf wiki here:
> > https://perf.wiki.kernel.org/index.php/Top-Down_Analysis
>
>
> I did some quick testing.
>
> On Skylake the output of L1 isn't scaled to percent:
>
> $ ./perf stat -M TopdownL1 ~/pmu/pmu-tools/workloads/BC1s
>
>   Performance counter stats for '/home/ak/pmu/pmu-tools/workloads/BC1s':
>
>         608,066,701      INT_MISC.RECOVERY_CYCLES         #     0.32 Bad_Speculation          (50.02%)
>       5,364,230,382      CPU_CLK_UNHALTED.THREAD          #     0.48 Retiring                 (50.02%)
>      10,194,062,626      UOPS_RETIRED.RETIRE_SLOTS                                            (50.02%)
>      14,613,100,390      UOPS_ISSUED.ANY                                                      (50.02%)
>       2,928,793,077      IDQ_UOPS_NOT_DELIVERED.CORE      #     0.14 Frontend_Bound
>                                                           #     0.07 Backend_Bound            (50.02%)
>         604,850,703      INT_MISC.RECOVERY_CYCLES                                             (50.02%)
>       5,357,291,185      CPU_CLK_UNHALTED.THREAD                                              (50.02%)
>      14,618,285,580      UOPS_ISSUED.ANY                                                      (50.02%)

Did you build Arnaldo's perf/core branch with the changes applied? The
metric names here should be tma_bad_speculation, tma_retiring,
tma_frontend_bound and tma_backend_bound.

Looking at:
https://lore.kernel.org/lkml/20221004021612.325521-22-irogers@google.com/

+        "MetricExpr": "1 - tma_frontend_bound - (UOPS_ISSUED.ANY + 4 * ((INT_MISC.RECOVERY_CYCLES_ANY / 2) if #SMT_on else INT_MISC.RECOVERY_CYCLES)) / SLOTS",
+        "MetricGroup": "TopdownL1;tma_L1_group",
+        "MetricName": "tma_backend_bound",
+        "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.",
+        "ScaleUnit": "100%"

So it wouldn't make sense to me that the scale was missing. Fwiw, I
did test on SkylakeX but used Tigerlake for the wiki due to potential
clock domain issues with SLOTS.

> Then if I follow the wiki example here I would expect I need to do
>
> $ ./perf stat -M tma_backend_bound_group ~/pmu/pmu-tools/workloads/BC1s
>
> Cannot find metric or group `tma_backend_bound_group'
>
> but tma_retiring_group doesn't exist. So it seems the methodology isn't
> fully consistent everywhere? Perhaps the wiki needs to document the
> supported CPUs and also what part of the hierarchy is supported.

So I think you've not got Arnaldo's branch with the changes applied.
Unfortunately the instructions around '_group' are only going to apply
to Linux 6.1.

> Another problem I noticed in the example is that the sample event didn't
> specify PEBS, even though it probably should, at least on Icelake+ where
> every event can use PEBS, with less overhead.

The 'Sample with' text is just text for a description. We can change
it or put something on the wiki; what would you suggest?

> Also, with all these groups needing to be specified by hand, some bash
> completion support for groups would be really useful.

Ack. My expectation is that everyone starts with TopdownL1 and goes
from there, adding '_group' to the metric they want to drill into.
There are 104 topdown metrics and I'm not sure how useful expanding
all of these would be. On Icelake+ this becomes muddy due to the
unconditional printing of topdown metrics in the midst of the
regularly computed metrics; this can be seen on the wiki:
https://perf.wiki.kernel.org/index.php/Top-Down_Analysis
For example, when the level 2 metric group tma_backend_bound_group is
requested, the level 1 metrics Retiring, Frontend Bound, Backend Bound
and Bad Speculation are also displayed.
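The drill-down loop described above can be sketched as follows. The level 1 fractions are the ones from the TopdownL1 output earlier in this thread, keyed by the new tma_ names. The rule of skipping Retiring and picking the largest remaining category, the helper name next_drilldown and the ./workload placeholder are all assumptions for illustration.

```python
# Level 1 fractions from the TopdownL1 output earlier in this thread.
LEVEL1 = {
    "tma_bad_speculation": 0.32,
    "tma_retiring": 0.48,
    "tma_frontend_bound": 0.14,
    "tma_backend_bound": 0.07,
}


def next_drilldown(results, workload, skip=("tma_retiring",)):
    """Build the next perf stat command for the largest remaining category.

    Assumption: Retiring is skipped because it measures useful work rather
    than a stall; pass skip=() to consider every category.
    """
    candidates = {name: v for name, v in results.items() if name not in skip}
    worst = max(candidates, key=candidates.get)
    return ["perf", "stat", "-M", worst + "_group", workload]


print(" ".join(next_drilldown(LEVEL1, "./workload")))
```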

Thanks,
Ian

> -Andi
>
>
Re: [PATCH v3 00/23] Improvements to Intel perf metrics
Posted by Arnaldo Carvalho de Melo 1 year, 6 months ago
On Tue, Oct 04, 2022 at 10:55:56AM -0700, Ian Rogers wrote:
> On Tue, Oct 4, 2022 at 10:29 AM Andi Kleen <ak@linux.intel.com> wrote:
> > Then if I follow the wiki example here I would expect I need to do

> > $ ./perf stat -M tma_backend_bound_group ~/pmu/pmu-tools/workloads/BC1s

> > Cannot find metric or group `tma_backend_bound_group'

> > but tma_retiring_group doesn't exist. So it seems the methodology isn't
> > fully consistent everywhere? Perhaps the wiki needs to document the
> > supported CPUs and also what part of the hierarchy is supported.
 
> So I think you've not got Arnaldo's branch with the changes applied.
> Unfortunately the instructions around '_group' are only going to apply
> to Linux 6.1.

I just pushed perf/core with Ian's v3 series, please check with that
one.

- Arnaldo
Re: [PATCH v3 00/23] Improvements to Intel perf metrics
Posted by Andi Kleen 1 year, 6 months ago
>   
>> So I think you've not got Arnaldo's branch with the changes applied.
>> Unfortunately the instructions around '_group' are only going to apply
>> to Linux 6.1.
> I just pushed perf/core with Ian's v3 series, please check with that
> one.


Yes, it works with the latest branch, thanks.


-Andi
Re: [PATCH v3 00/23] Improvements to Intel perf metrics
Posted by Arnaldo Carvalho de Melo 1 year, 6 months ago
On Tue, Oct 04, 2022 at 11:15:34AM -0700, Andi Kleen wrote:
> 
> > > So I think you've not got Arnaldo's branch with the changes applied.
> > > Unfortunately the instructions around '_group' are only going to apply
> > > to Linux 6.1.
> > I just pushed perf/core with Ian's v3 series, please check with that
> > one.
> 
> 
> Yes, it works with the latest branch, thanks.

Thanks for checking,

- Arnaldo
Re: [PATCH v3 00/23] Improvements to Intel perf metrics
Posted by Arnaldo Carvalho de Melo 1 year, 6 months ago
On Mon, Oct 03, 2022 at 07:15:49PM -0700, Ian Rogers wrote:
> For consistency with:
> https://github.com/intel/perfmon-metrics
> rename the topdown TMA metrics, e.g. from Frontend_Bound to tma_frontend_bound.
>
> Metrics with an _SMT suffix are dropped, as #SMT_On and #EBS_Mode
> are now correctly expanded in the single main metric. Fix perf expr to
> allow a double (nested) if expression to be correctly processed.
>
> Add all 6 levels of TMA metrics. Child metrics are placed in a group
> named after their parent, allowing the children of a metric to be
> easily measured using the metric name with a _group suffix.
> 
> Don't drop TMA metrics if they contain topdown events.
> 
> The ## and ##? operators are correctly expanded.
> 
> The locate-with column, which describes a sampling event, is added to
> the long description.
>
> Metrics are written in terms of other metrics to reduce the expression
> size and increase readability.
>
> Following this, the pmu-events/arch/x86 directories match those created
> by the script at:
> https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py
> with updates at:
> https://github.com/captain5050/event-converter-for-linux-perf
> 
> 
> v3. Fix a parse-metrics test failure caused by making references
>     between metrics case sensitive: make the case of the names in the
>     test metrics match.



Thanks, applied.

- Arnaldo


> v2. Fix the commit message regarding missing mapfile.csv updates, as
>     noted by Zhengjun Xing <zhengjun.xing@linux.intel.com>. ScaleUnit
>     is added for TMA metrics. Metrics with topdown events have a
>     missing slots event added if necessary. The latest metrics at
>     https://github.com/intel/perfmon-metrics are used; however, the
>     event-converter-for-linux-perf scripts now prefer their own
>     metrics in case of mismatched units when a metric is written in
>     terms of another. Additional testing was performed on broadwell,
>     broadwellde, cascadelakex, haswellx, sapphirerapids and tigerlake
>     CPUs.
> 
> Ian Rogers (23):
>   perf expr: Allow a double if expression
>   perf test: Adjust case of test metrics
>   perf expr: Remove jevents case workaround
>   perf metrics: Don't scale counts going into metrics
>   perf vendor events: Update Intel skylakex
>   perf vendor events: Update Intel alderlake
>   perf vendor events: Update Intel broadwell
>   perf vendor events: Update Intel broadwellx
>   perf vendor events: Update Intel cascadelakex
>   perf vendor events: Update elkhartlake cpuids
>   perf vendor events: Update Intel haswell
>   perf vendor events: Update Intel haswellx
>   perf vendor events: Update Intel icelake
>   perf vendor events: Update Intel icelakex
>   perf vendor events: Update Intel ivybridge
>   perf vendor events: Update Intel ivytown
>   perf vendor events: Update Intel jaketown
>   perf vendor events: Update Intel sandybridge
>   perf vendor events: Update Intel sapphirerapids
>   perf vendor events: Update silvermont cpuids
>   perf vendor events: Update Intel skylake
>   perf vendor events: Update Intel tigerlake
>   perf vendor events: Update Intel broadwellde
> 
>  .../arch/test/test_soc/cpu/metrics.json       |    6 +-
>  .../arch/x86/alderlake/adl-metrics.json       | 1353 ++++++++++++++++-
>  .../pmu-events/arch/x86/alderlake/cache.json  |  129 +-
>  .../arch/x86/alderlake/frontend.json          |   12 +
>  .../pmu-events/arch/x86/alderlake/memory.json |   22 +
>  .../pmu-events/arch/x86/alderlake/other.json  |   22 +
>  .../arch/x86/alderlake/pipeline.json          |   14 +-
>  .../arch/x86/broadwell/bdw-metrics.json       |  679 +++++++--
>  .../arch/x86/broadwellde/bdwde-metrics.json   |  711 +++++++--
>  .../arch/x86/broadwellx/bdx-metrics.json      |  965 +++++++-----
>  .../arch/x86/broadwellx/uncore-cache.json     |   10 +-
>  .../x86/broadwellx/uncore-interconnect.json   |   18 +-
>  .../arch/x86/broadwellx/uncore-memory.json    |   18 +-
>  .../arch/x86/cascadelakex/clx-metrics.json    | 1285 ++++++++++------
>  .../arch/x86/cascadelakex/uncore-memory.json  |   18 +-
>  .../arch/x86/cascadelakex/uncore-other.json   |   10 +-
>  .../pmu-events/arch/x86/haswell/cache.json    |    4 +-
>  .../pmu-events/arch/x86/haswell/frontend.json |   12 +-
>  .../arch/x86/haswell/hsw-metrics.json         |  570 ++++++-
>  .../pmu-events/arch/x86/haswellx/cache.json   |    2 +-
>  .../arch/x86/haswellx/frontend.json           |   12 +-
>  .../arch/x86/haswellx/hsx-metrics.json        |  919 +++++++----
>  .../x86/haswellx/uncore-interconnect.json     |   18 +-
>  .../arch/x86/haswellx/uncore-memory.json      |   18 +-
>  .../pmu-events/arch/x86/icelake/cache.json    |    6 +-
>  .../arch/x86/icelake/icl-metrics.json         |  808 +++++++++-
>  .../pmu-events/arch/x86/icelake/pipeline.json |    2 +-
>  .../pmu-events/arch/x86/icelakex/cache.json   |    6 +-
>  .../arch/x86/icelakex/icx-metrics.json        | 1155 ++++++++++----
>  .../arch/x86/icelakex/pipeline.json           |    2 +-
>  .../arch/x86/icelakex/uncore-other.json       |    2 +-
>  .../arch/x86/ivybridge/ivb-metrics.json       |  594 ++++++--
>  .../pmu-events/arch/x86/ivytown/cache.json    |    4 +-
>  .../arch/x86/ivytown/floating-point.json      |    2 +-
>  .../pmu-events/arch/x86/ivytown/frontend.json |   18 +-
>  .../arch/x86/ivytown/ivt-metrics.json         |  630 ++++++--
>  .../arch/x86/ivytown/uncore-cache.json        |   58 +-
>  .../arch/x86/ivytown/uncore-interconnect.json |   84 +-
>  .../arch/x86/ivytown/uncore-memory.json       |    2 +-
>  .../arch/x86/ivytown/uncore-other.json        |    6 +-
>  .../arch/x86/ivytown/uncore-power.json        |    8 +-
>  .../arch/x86/jaketown/jkt-metrics.json        |  327 +++-
>  tools/perf/pmu-events/arch/x86/mapfile.csv    |   18 +-
>  .../arch/x86/sandybridge/snb-metrics.json     |  315 +++-
>  .../arch/x86/sapphirerapids/cache.json        |    4 +-
>  .../arch/x86/sapphirerapids/frontend.json     |   11 +
>  .../arch/x86/sapphirerapids/pipeline.json     |    4 +-
>  .../arch/x86/sapphirerapids/spr-metrics.json  | 1249 ++++++++++-----
>  .../arch/x86/skylake/skl-metrics.json         |  861 ++++++++---
>  .../arch/x86/skylakex/skx-metrics.json        | 1262 +++++++++------
>  .../arch/x86/skylakex/uncore-memory.json      |   18 +-
>  .../arch/x86/skylakex/uncore-other.json       |   19 +-
>  .../arch/x86/tigerlake/tgl-metrics.json       |  810 +++++++++-
>  tools/perf/pmu-events/empty-pmu-events.c      |    6 +-
>  tools/perf/tests/expr.c                       |    4 +
>  tools/perf/util/expr.c                        |   11 +-
>  tools/perf/util/expr.y                        |    2 +-
>  tools/perf/util/stat-shadow.c                 |    9 +-
>  58 files changed, 11514 insertions(+), 3630 deletions(-)
> 
> -- 
> 2.38.0.rc1.362.ged0d419d3c-goog
