[PATCH v2 00/14] perf mem/c2c: Add support for AMD

Ravi Bangoria posted 14 patches 3 years, 10 months ago
There is a newer version of this series
[PATCH v2 00/14] perf mem/c2c: Add support for AMD
Posted by Ravi Bangoria 3 years, 10 months ago
Perf mem and c2c tools are wrappers around perf record with mem load/
store events. IBS tagged load/store samples provide most of the
information needed for these tools. Enable support for these tools on
AMD Zen processors based on the IBS Op pmu.

There are some limitations though: Only load/store instructions provide
mem/c2c information. However, IBS does not provide a way to choose a
particular type of instruction to tag. This results in many non-LS
instructions being tagged, which appear as N/A. IBS, being an uncore pmu
from the kernel's point of view[1], does not support per-process
monitoring. Thus, perf mem/c2c on AMD is currently supported in per-cpu
mode only.
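
As a rough illustration of what per-cpu mode means at the syscall level,
here is a minimal sketch, not code from this series. The helper names
init_ibs_op_attr() and open_per_cpu() are made up for the example, and
the caller is assumed to have read the PMU type number from
/sys/bus/event_source/devices/ibs_op/type. Whether a given kernel accepts
this sample_type on ibs_op depends on the patches in this series.

```c
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill a perf_event_attr for a sampling event on a
 * dynamic PMU such as ibs_op. pmu_type is assumed to come from
 * /sys/bus/event_source/devices/ibs_op/type.
 */
static void init_ibs_op_attr(struct perf_event_attr *attr,
			     unsigned int pmu_type,
			     unsigned long long period)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = pmu_type;
	attr->sample_period = period;
	/* Sample fields that perf mem/c2c reports are built from. */
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_ADDR |
			    PERF_SAMPLE_DATA_SRC;
}

/*
 * Hypothetical helper: pid == -1 with cpu >= 0 asks for all tasks on
 * one CPU, i.e. per-cpu mode. This normally needs elevated privileges.
 */
static int open_per_cpu(struct perf_event_attr *attr, int cpu)
{
	return (int)syscall(__NR_perf_event_open, attr, -1, cpu, -1, 0);
}
```

A per-process open would pass a real pid instead of -1, which is the case
described above as unsupported for IBS.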

Example:
  $ sudo ./perf mem record -- -c 10000
  ^C[ perf record: Woken up 227 times to write data ]
  [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]

  $ sudo ./perf mem report -F mem,sample,snoop
  Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
  Memory access                  Samples  Snoop
  N/A                             700620  N/A
  L1 hit                          126675  N/A
  L2 hit                             424  N/A
  L3 hit                             664  HitM
  L3 hit                              10  N/A
  Local RAM hit                        2  N/A
  Remote RAM (1 hop) hit            8558  N/A
  Remote Cache (1 hop) hit             3  N/A
  Remote Cache (1 hop) hit             2  HitM
  Remote Cache (2 hops) hit            10  HitM
  Remote Cache (2 hops) hit             6  N/A
  Uncached hit                         4  N/A

Prepared on amd/perf/core (9886142c7a22) + IBS Zen4 enhancement patches[2]

[1]: https://lore.kernel.org/lkml/20220113134743.1292-1-ravi.bangoria@amd.com
[2]: https://lore.kernel.org/lkml/20220604044519.594-1-ravi.bangoria@amd.com

v1: https://lore.kernel.org/lkml/20220525093938.4101-1-ravi.bangoria@amd.com
v1->v2:
 - Instead of defining macros to extract IBS register bits, use existing
   bitfield definitions. Zen4 has introduced an additional set of bits in
   IBS registers which this series also exploits, and thus this series
   now depends on the IBS Zen4 enhancement patchset.
 - Add support for PERF_SAMPLE_WEIGHT_STRUCT. While opening a new event,
   the perf tool starts with a set of attributes and goes on reverting
   some attributes in a predefined order until it succeeds or runs out of
   attempts. Here, the 1st attempt includes WEIGHT_STRUCT and
   exclude_guest, which always fails because IBS does not support guest
   filtering. The problem, however, is that perf reverts WEIGHT_STRUCT but
   keeps trying with exclude_guest. Thus, although this series enables
   WEIGHT_STRUCT support in the kernel, using it from the perf tool needs
   more changes. I'll try to address this bug later.
 - Introduce __PERF_SAMPLE_CALLCHAIN_EARLY to hint to the generic perf
   driver that the physical address is set by the arch pmu driver and
   should not be overwritten.
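
The WEIGHT_STRUCT fallback problem described above can be modelled with a
toy sketch. This is not the perf tool's code: struct toy_attr, toy_open()
and toy_open_with_fallback() are invented for illustration, and the
assumption that exclude_guest is dropped at a later step goes beyond what
the changelog states.

```c
#include <stdbool.h>

/* Toy stand-in for the two attributes discussed in the changelog. */
struct toy_attr {
	bool weight_struct;
	bool exclude_guest;
};

/* Toy stand-in for the kernel: IBS rejects guest filtering. */
static bool toy_open(const struct toy_attr *a)
{
	return !a->exclude_guest;
}

/*
 * Revert attributes in a fixed order until the open succeeds.
 * WEIGHT_STRUCT is reverted first even though exclude_guest caused the
 * failure, so the event ends up opened without WEIGHT_STRUCT.
 * Returns the number of attempts made, or -1 if nothing is left to drop.
 */
static int toy_open_with_fallback(struct toy_attr *a)
{
	int attempts = 0;

	for (;;) {
		attempts++;
		if (toy_open(a))
			return attempts;
		if (a->weight_struct)
			a->weight_struct = false;
		else if (a->exclude_guest)
			a->exclude_guest = false;
		else
			return -1;
	}
}
```

Starting from both attributes set, the model needs three attempts and
finishes with weight_struct cleared, which mirrors why kernel-side
support alone is not enough for the tool to use WEIGHT_STRUCT.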


Ravi Bangoria (14):
  perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
  perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
  perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
  perf/x86/amd: Support PERF_SAMPLE_ADDR
  perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
  perf tool: Sync include/uapi/linux/perf_event.h header
  perf tool: Sync arch/x86/include/asm/amd-ibs.h header
  perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
  perf mem/c2c: Add load store event mappings for AMD
  perf mem/c2c: Avoid printing empty lines for unsupported events
  perf mem: Use more generic term for LFB
  perf script: Add missing fields in usage hint

 arch/x86/events/amd/ibs.c                | 372 ++++++++++++++++++++++-
 arch/x86/include/asm/amd-ibs.h           |  16 +
 include/uapi/linux/perf_event.h          |   5 +-
 kernel/events/core.c                     |   4 +-
 tools/arch/x86/include/asm/amd-ibs.h     |  16 +
 tools/include/uapi/linux/perf_event.h    |   5 +-
 tools/perf/Documentation/perf-c2c.txt    |  14 +-
 tools/perf/Documentation/perf-mem.txt    |   3 +-
 tools/perf/Documentation/perf-record.txt |   1 +
 tools/perf/arch/x86/util/mem-events.c    |  31 +-
 tools/perf/builtin-c2c.c                 |   1 +
 tools/perf/builtin-mem.c                 |   1 +
 tools/perf/builtin-script.c              |   7 +-
 tools/perf/util/mem-events.c             |  17 +-
 14 files changed, 467 insertions(+), 26 deletions(-)

-- 
2.31.1
Re: [PATCH v2 00/14] perf mem/c2c: Add support for AMD
Posted by Jiri Olsa 3 years, 9 months ago
On Thu, Jun 16, 2022 at 05:06:23PM +0530, Ravi Bangoria wrote:
> Perf mem and c2c tools are wrappers around perf record with mem load/
> store events. IBS tagged load/store samples provide most of the
> information needed for these tools. Enable support for these tools on
> AMD Zen processors based on the IBS Op pmu.
> 
> There are some limitations though: Only load/store instructions provide
> mem/c2c information. However, IBS does not provide a way to choose a
> particular type of instruction to tag. This results in many non-LS
> instructions being tagged, which appear as N/A. IBS, being an uncore pmu
> from the kernel's point of view[1], does not support per-process
> monitoring. Thus, perf mem/c2c on AMD is currently supported in per-cpu
> mode only.
> 
> Example:
>   $ sudo ./perf mem record -- -c 10000
>   ^C[ perf record: Woken up 227 times to write data ]
>   [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]
> 
>   $ sudo ./perf mem report -F mem,sample,snoop
>   Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
>   Memory access                  Samples  Snoop
>   N/A                             700620  N/A
>   L1 hit                          126675  N/A
>   L2 hit                             424  N/A
>   L3 hit                             664  HitM
>   L3 hit                              10  N/A
>   Local RAM hit                        2  N/A
>   Remote RAM (1 hop) hit            8558  N/A
>   Remote Cache (1 hop) hit             3  N/A
>   Remote Cache (1 hop) hit             2  HitM
>   Remote Cache (2 hops) hit            10  HitM
>   Remote Cache (2 hops) hit             6  N/A
>   Uncached hit                         4  N/A
> 
> Prepared on amd/perf/core (9886142c7a22) + IBS Zen4 enhancement patches[2]
> 
> [1]: https://lore.kernel.org/lkml/20220113134743.1292-1-ravi.bangoria@amd.com
> [2]: https://lore.kernel.org/lkml/20220604044519.594-1-ravi.bangoria@amd.com
> 
> v1: https://lore.kernel.org/lkml/20220525093938.4101-1-ravi.bangoria@amd.com
> v1->v2:
>  - Instead of defining macros to extract IBS register bits, use existing
>    bitfield definitions. Zen4 has introduced an additional set of bits in
>    IBS registers which this series also exploits, and thus this series
>    now depends on the IBS Zen4 enhancement patchset.
>  - Add support for PERF_SAMPLE_WEIGHT_STRUCT. While opening a new event,
>    the perf tool starts with a set of attributes and goes on reverting
>    some attributes in a predefined order until it succeeds or runs out of
>    attempts. Here, the 1st attempt includes WEIGHT_STRUCT and
>    exclude_guest, which always fails because IBS does not support guest
>    filtering. The problem, however, is that perf reverts WEIGHT_STRUCT but
>    keeps trying with exclude_guest. Thus, although this series enables
>    WEIGHT_STRUCT support in the kernel, using it from the perf tool needs
>    more changes. I'll try to address this bug later.
>  - Introduce __PERF_SAMPLE_CALLCHAIN_EARLY to hint to the generic perf
>    driver that the physical address is set by the arch pmu driver and
>    should not be overwritten.
> 
> 
> Ravi Bangoria (14):
>   perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
>   perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
>   perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
>   perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
>   perf/x86/amd: Support PERF_SAMPLE_ADDR
>   perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
>   perf tool: Sync include/uapi/linux/perf_event.h header
>   perf tool: Sync arch/x86/include/asm/amd-ibs.h header
>   perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO}
>   perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
>   perf mem/c2c: Add load store event mappings for AMD
>   perf mem/c2c: Avoid printing empty lines for unsupported events
>   perf mem: Use more generic term for LFB
>   perf script: Add missing fields in usage hint

tools part looks good to me

Acked-by: Jiri Olsa <jolsa@kernel.org>

thanks,
jirka

> 
>  arch/x86/events/amd/ibs.c                | 372 ++++++++++++++++++++++-
>  arch/x86/include/asm/amd-ibs.h           |  16 +
>  include/uapi/linux/perf_event.h          |   5 +-
>  kernel/events/core.c                     |   4 +-
>  tools/arch/x86/include/asm/amd-ibs.h     |  16 +
>  tools/include/uapi/linux/perf_event.h    |   5 +-
>  tools/perf/Documentation/perf-c2c.txt    |  14 +-
>  tools/perf/Documentation/perf-mem.txt    |   3 +-
>  tools/perf/Documentation/perf-record.txt |   1 +
>  tools/perf/arch/x86/util/mem-events.c    |  31 +-
>  tools/perf/builtin-c2c.c                 |   1 +
>  tools/perf/builtin-mem.c                 |   1 +
>  tools/perf/builtin-script.c              |   7 +-
>  tools/perf/util/mem-events.c             |  17 +-
>  14 files changed, 467 insertions(+), 26 deletions(-)
> 
> -- 
> 2.31.1
>
Re: [PATCH v2 00/14] perf mem/c2c: Add support for AMD
Posted by Arnaldo Carvalho de Melo 3 years, 9 months ago
On Tue, Jul 12, 2022 at 01:35:25PM +0200, Jiri Olsa wrote:
> On Thu, Jun 16, 2022 at 05:06:23PM +0530, Ravi Bangoria wrote:
> > Perf mem and c2c tools are wrappers around perf record with mem load/
> > store events. IBS tagged load/store samples provide most of the
> > information needed for these tools. Enable support for these tools on
> > AMD Zen processors based on the IBS Op pmu.
> > 
> > There are some limitations though: Only load/store instructions provide
> > mem/c2c information. However, IBS does not provide a way to choose a
> > particular type of instruction to tag. This results in many non-LS
> > instructions being tagged, which appear as N/A. IBS, being an uncore pmu
> > from the kernel's point of view[1], does not support per-process
> > monitoring. Thus, perf mem/c2c on AMD is currently supported in per-cpu
> > mode only.
> > 
> > Example:
> >   $ sudo ./perf mem record -- -c 10000
> >   ^C[ perf record: Woken up 227 times to write data ]
> >   [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]
> > 
> >   $ sudo ./perf mem report -F mem,sample,snoop
> >   Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
> >   Memory access                  Samples  Snoop
> >   N/A                             700620  N/A
> >   L1 hit                          126675  N/A
> >   L2 hit                             424  N/A
> >   L3 hit                             664  HitM
> >   L3 hit                              10  N/A
> >   Local RAM hit                        2  N/A
> >   Remote RAM (1 hop) hit            8558  N/A
> >   Remote Cache (1 hop) hit             3  N/A
> >   Remote Cache (1 hop) hit             2  HitM
> >   Remote Cache (2 hops) hit            10  HitM
> >   Remote Cache (2 hops) hit             6  N/A
> >   Uncached hit                         4  N/A
> > 
> > Prepared on amd/perf/core (9886142c7a22) + IBS Zen4 enhancement patches[2]
> > 
> > [1]: https://lore.kernel.org/lkml/20220113134743.1292-1-ravi.bangoria@amd.com
> > [2]: https://lore.kernel.org/lkml/20220604044519.594-1-ravi.bangoria@amd.com
> > 
> > v1: https://lore.kernel.org/lkml/20220525093938.4101-1-ravi.bangoria@amd.com
> > v1->v2:
> >  - Instead of defining macros to extract IBS register bits, use existing
> >    bitfield definitions. Zen4 has introduced an additional set of bits in
> >    IBS registers which this series also exploits, and thus this series
> >    now depends on the IBS Zen4 enhancement patchset.
> >  - Add support for PERF_SAMPLE_WEIGHT_STRUCT. While opening a new event,
> >    the perf tool starts with a set of attributes and goes on reverting
> >    some attributes in a predefined order until it succeeds or runs out of
> >    attempts. Here, the 1st attempt includes WEIGHT_STRUCT and
> >    exclude_guest, which always fails because IBS does not support guest
> >    filtering. The problem, however, is that perf reverts WEIGHT_STRUCT but
> >    keeps trying with exclude_guest. Thus, although this series enables
> >    WEIGHT_STRUCT support in the kernel, using it from the perf tool needs
> >    more changes. I'll try to address this bug later.
> >  - Introduce __PERF_SAMPLE_CALLCHAIN_EARLY to hint to the generic perf
> >    driver that the physical address is set by the arch pmu driver and
> >    should not be overwritten.
> > 
> > 
> > Ravi Bangoria (14):
> >   perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
> >   perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
> >   perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
> >   perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
> >   perf/x86/amd: Support PERF_SAMPLE_ADDR
> >   perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
> >   perf tool: Sync include/uapi/linux/perf_event.h header
> >   perf tool: Sync arch/x86/include/asm/amd-ibs.h header
> >   perf mem: Add support for printing PERF_MEM_LVLNUM_{EXTN_MEM|IO}
> >   perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
> >   perf mem/c2c: Add load store event mappings for AMD
> >   perf mem/c2c: Avoid printing empty lines for unsupported events
> >   perf mem: Use more generic term for LFB
> >   perf script: Add missing fields in usage hint
> 
> tools part looks good to me
> 
> Acked-by: Jiri Olsa <jolsa@kernel.org>

What about the kernel bits? PeterZ? Is this in some tip branch?

- Arnaldo
Re: [PATCH v2 00/14] perf mem/c2c: Add support for AMD
Posted by Ravi Bangoria 3 years, 9 months ago
On 16-Jun-22 5:06 PM, Ravi Bangoria wrote:
> Perf mem and c2c tools are wrappers around perf record with mem load/
> store events. IBS tagged load/store samples provide most of the
> information needed for these tools. Enable support for these tools on
> AMD Zen processors based on the IBS Op pmu.

Gentle ping!

Thanks,
Ravi