[PATCH v5 0/8] perf/amd: Zen4 IBS extensions support (tool changes)

Ravi Bangoria posted 8 patches 3 years, 11 months ago
There is a newer version of this series
[PATCH v5 0/8] perf/amd: Zen4 IBS extensions support (tool changes)
Posted by Ravi Bangoria 3 years, 11 months ago
The kernel-side changes have already been applied to linus/master
(except the amd-ibs.h header). This series contains the perf tool
changes.

Kan, I don't have a machine with heterogeneous CPUs. It would be
helpful if you could check HEADER_PMU_CAPS on an Intel ADL machine.

v4: https://lore.kernel.org/lkml/20220523033945.1612-1-ravi.bangoria@amd.com
v4->v5:
 - Replace HEADER_HYBRID_CPU_PMU_CAPS with HEADER_PMU_CAPS instead of
   adding a new header HEADER_PMU_CAPS. Special care is taken to write
   the hybrid cpu pmu caps first in the header so that old perf tools
   do not break.
 - Store HEADER_CPU_PMU_CAPS capabilities in an array instead of a
   single NUL-separated string.
 - Include "cpu" pmu while searching for capabilities in perf_env.
 - Rebase on acme/perf/core (9dde6cadb92b5)
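
To illustrate the array-of-strings change listed above, here is a minimal
sketch that converts a buffer of back-to-back NUL-terminated "name=value"
strings into an array of separately allocated strings. split_caps() and
free_caps() are hypothetical helpers written for this example; they are
not the functions used in the series.

```c
#include <stdlib.h>
#include <string.h>

/* Free an array produced by split_caps(). Hypothetical helper. */
static void free_caps(char **caps, int nr)
{
	int i;

	if (!caps)
		return;
	for (i = 0; i < nr; i++)
		free(caps[i]);
	free(caps);
}

/*
 * Split a buffer holding nr NUL-terminated strings laid out back to
 * back into a newly allocated array of copies. Returns NULL if nr is
 * not positive, on allocation failure, or if the buffer does not
 * contain nr terminated strings. Hypothetical helper.
 */
static char **split_caps(const char *buf, size_t len, int nr)
{
	char **caps;
	size_t off = 0;
	int i;

	if (nr <= 0)
		return NULL;

	caps = calloc((size_t)nr, sizeof(*caps));
	if (!caps)
		return NULL;

	for (i = 0; i < nr; i++) {
		const char *end;
		size_t n;

		if (off >= len)
			goto err;

		/* Look for the terminator only inside the buffer. */
		end = memchr(buf + off, '\0', len - off);
		if (!end)
			goto err;

		n = (size_t)(end - (buf + off)) + 1; /* include the NUL */
		caps[i] = malloc(n);
		if (!caps[i])
			goto err;

		memcpy(caps[i], buf + off, n);
		off += n;
	}
	return caps;

err:
	free_caps(caps, nr);
	return NULL;
}
```

With an array, each capability can be indexed or compared directly
instead of walking the buffer with repeated strlen() calls.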

Original cover letter:

IBS support has been enhanced with two new features in the upcoming
uarch: 1. the DataSrc extension and 2. the L3 Miss Filtering capability.
Both are indicated by CPUID_Fn8000001B_EAX bit 11.
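
As a rough sketch of the feature check described above, the helper below
tests bit 11 of an EAX value read from CPUID Fn8000_001B. The bit number
comes from this cover letter; the macro and function names are
hypothetical and are not taken from the kernel or perf sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 11 of CPUID Fn8000_001B EAX, per the cover letter (name is hypothetical). */
#define IBS_CAPS_ZEN4_EXT_BIT 11

/* Returns true when the given EAX value advertises the Zen4 IBS extensions. */
static bool ibs_has_zen4_ext(uint32_t eax)
{
	return (eax >> IBS_CAPS_ZEN4_EXT_BIT) & 1u;
}
```

On x86 with GCC or Clang, the EAX value could be obtained with
__get_cpuid(0x8000001b, &eax, &ebx, &ecx, &edx) from <cpuid.h>, which
returns 0 when the leaf is not supported.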

The DataSrc extension provides additional data source details for
tagged load/store operations. Add support for these new bits in perf
report/script raw-dump.

IBS L3 miss filtering works by tagging an instruction on IBS counter
overflow and generating an NMI if the tagged instruction causes an L3
miss. Samples without an L3 miss are discarded and the counter is reset
with a random value (between 1-15 for the fetch pmu and 1-127 for the
op pmu). This reduces sampling overhead when the user is interested
only in L3-miss samples. One use case for such filtered samples is
feeding data to a page-migration daemon in tiered memory systems.

Add support for L3 miss filtering in the IBS driver via a new pmu
attribute "l3missonly". Example usage:

  # perf record -a -e ibs_op/l3missonly=1/ --raw-samples sleep 5
  # perf report -D

Some important points to keep in mind while using L3 miss filtering:
1. Hw internally resets the sampling period when the tagged instruction
   does not cause an L3 miss, and there is no way to reconstruct the
   aggregated sampling period when this happens.
2. L3 miss is not the actual event being counted. Rather, IBS will
   count fetch, cycles or uOps depending on the configuration. Thus
   the sampling period has no direct connection to L3 misses.
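
The arithmetic behind point 1 can be sketched with a toy model. It
assumes that the events elapsed between two delivered samples are the
configured period plus every interval the hardware ran after discarding a
tag. The interval lengths are passed in only for illustration; software
cannot observe them, which is exactly why the aggregate period cannot be
reconstructed. The function name and the model itself are assumptions,
not a description of the hardware.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model (assumption): events elapsed between two delivered samples.
 *
 * period - sampling period configured by the user (e.g. -c 10000)
 * rearm  - length of each interval the hardware ran after discarding a
 *          tag that did not miss L3; invisible to software
 * n      - number of discarded tags
 */
static uint64_t events_between_samples(uint64_t period,
				       const uint64_t *rearm, size_t n)
{
	uint64_t total = period;
	size_t i;

	for (i = 0; i < n; i++)
		total += rearm[i];

	return total;
}
```

With no discarded tags the result equals the configured period. Every
discarded tag adds events that the recorded period does not account for.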

The first point causes sampling period skew. Thus, I've added a warning
message to perf record:

  # perf record -c 10000 -C 0 -e ibs_op/l3missonly=1/
  WARNING: Hw internally resets sampling period when L3 Miss Filtering is enabled
  and tagged operation does not cause L3 Miss. This causes sampling period skew.

Users can configure a smaller sampling period to get more samples while
using l3missonly.


Ravi Bangoria (8):
  perf record ibs: Warn about sampling period skew
  perf tool: Parse pmu caps sysfs only once
  perf headers: Pass "cpu" pmu name while printing caps
  perf headers: Store pmu caps in an array of strings
  perf headers: Record non-cpu pmu capabilities
  perf/x86/ibs: Add new IBS register bits into header
  perf tool ibs: Sync amd ibs header file
  perf script ibs: Support new IBS bits in raw trace dump

 arch/x86/include/asm/amd-ibs.h                |  16 +-
 tools/arch/x86/include/asm/amd-ibs.h          |  16 +-
 .../Documentation/perf.data-file-format.txt   |  10 +-
 tools/perf/arch/x86/util/evsel.c              |  49 +++++
 tools/perf/builtin-inject.c                   |   2 +-
 tools/perf/util/amd-sample-raw.c              |  68 +++++-
 tools/perf/util/env.c                         |  62 +++++-
 tools/perf/util/env.h                         |  14 +-
 tools/perf/util/evsel.c                       |   7 +
 tools/perf/util/evsel.h                       |   1 +
 tools/perf/util/header.c                      | 196 ++++++++++--------
 tools/perf/util/header.h                      |   2 +-
 tools/perf/util/pmu.c                         |  15 +-
 tools/perf/util/pmu.h                         |   2 +
 14 files changed, 333 insertions(+), 127 deletions(-)

-- 
2.31.1
Re: [PATCH v5 0/8] perf/amd: Zen4 IBS extensions support (tool changes)
Posted by Liang, Kan 3 years, 11 months ago

On 5/31/2022 11:26 PM, Ravi Bangoria wrote:
> The kernel-side changes have already been applied to linus/master
> (except the amd-ibs.h header). This series contains the perf tool
> changes.
> 
> Kan, I don't have a machine with heterogeneous CPUs. It would be
> helpful if you could check HEADER_PMU_CAPS on an Intel ADL machine.
>

I tried patches 2-5 on a hybrid machine. I didn't see any regression
with the perf report --header-only option.

Without patches 2-5,
# perf report --header-only | grep capabilities
# cpu_core pmu capabilities: branches=32, max_precise=3, pmu_name=alderlake_hybrid
# cpu_atom pmu capabilities: branches=32, max_precise=3, pmu_name=alderlake_hybrid

With patches 2-5,
# ./perf report --header-only | grep capabilities
# cpu_core pmu capabilities: branches=32, max_precise=3, pmu_name=alderlake_hybrid
# cpu_atom pmu capabilities: branches=32, max_precise=3, pmu_name=alderlake_hybrid
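
For readers who want to check a single capability rather than grep the
whole line, a minimal sketch of a lookup over "name=value" strings (the
form shown in the output above) might look like this. find_cap() is a
hypothetical helper, not a perf function.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the value of capability `name` in an array of
 * "name=value" strings, or NULL if it is not present. The name must
 * match in full, so "max" does not match "max_precise=3".
 * Hypothetical helper.
 */
static const char *find_cap(const char *const *caps, int nr,
			    const char *name)
{
	size_t len = strlen(name);
	int i;

	for (i = 0; i < nr; i++) {
		if (!strncmp(caps[i], name, len) && caps[i][len] == '=')
			return caps[i] + len + 1;
	}
	return NULL;
}
```

Given the three capabilities printed for cpu_core, find_cap(caps, 3,
"max_precise") would return a pointer to "3".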


Thanks,
Kan


Re: [PATCH v5 0/8] perf/amd: Zen4 IBS extensions support (tool changes)
Posted by Ravi Bangoria 3 years, 11 months ago
On 01-Jun-22 7:34 PM, Liang, Kan wrote:
> 
> 
> On 5/31/2022 11:26 PM, Ravi Bangoria wrote:
>> The kernel-side changes have already been applied to linus/master
>> (except the amd-ibs.h header). This series contains the perf tool
>> changes.
>>
>> Kan, I don't have a machine with heterogeneous CPUs. It would be
>> helpful if you could check HEADER_PMU_CAPS on an Intel ADL machine.
>>
> 
> I tried patches 2-5 on a hybrid machine. I didn't see any regression with the perf report --header-only option.
> 
> Without patches 2-5,
> # perf report --header-only | grep capabilities
> # cpu_core pmu capabilities: branches=32, max_precise=3, pmu_name=alderlake_hybrid
> # cpu_atom pmu capabilities: branches=32, max_precise=3, pmu_name=alderlake_hybrid
> 
> With patches 2-5,
> # ./perf report --header-only | grep capabilities
> # cpu_core pmu capabilities: branches=32, max_precise=3, pmu_name=alderlake_hybrid
> # cpu_atom pmu capabilities: branches=32, max_precise=3, pmu_name=alderlake_hybrid
> 

Perfect! Thanks for testing, Kan.

Arnaldo, since the kernel patches are already applied to Linus' tree
for -rc1, would you be able to include this series in your -rc1 PR?

Thanks,
Ravi