[PATCH v6 00/15] perf c2c: Support data source and display for Arm64

Leo Yan posted 15 patches 3 years, 8 months ago
Posted by Leo Yan 3 years, 8 months ago
Arm64 Neoverse CPUs support data source information in Arm SPE traces, which
allows us to detect cache line contention and transfers.
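Before trusting the SPE data source field, a tool presumably has to recognise which cores provide it. The following is a minimal sketch of that check. It assumes the cores of interest are Neoverse N1, V1 and N2, identified by the implementer and part number fields of MIDR_EL1. The helper `midr_is_neoverse_with_data_src()` and the macro names are hypothetical and are not the code in arm-spe.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MIDR_EL1 fields: Implementer is bits [31:24], PartNum is bits [15:4]. */
#define MIDR_IMPLEMENTER(midr)	(((midr) >> 24) & 0xffu)
#define MIDR_PARTNUM(midr)	(((midr) >> 4) & 0xfffu)

#define IMPLEMENTER_ARM		0x41u	/* Arm Ltd. ('A') */
#define PART_NEOVERSE_N1	0xd0cu
#define PART_NEOVERSE_V1	0xd40u
#define PART_NEOVERSE_N2	0xd49u

/* Hypothetical helper: true if 'midr' names one of the Neoverse cores
 * above.  Variant and revision are ignored, so all versions match. */
bool midr_is_neoverse_with_data_src(uint64_t midr)
{
	static const unsigned int parts[] = {
		PART_NEOVERSE_N1, PART_NEOVERSE_V1, PART_NEOVERSE_N2,
	};

	if (MIDR_IMPLEMENTER(midr) != IMPLEMENTER_ARM)
		return false;
	for (size_t i = 0; i < sizeof(parts) / sizeof(parts[0]); i++)
		if (MIDR_PARTNUM(midr) == parts[i])
			return true;
	return false;
}
```

For example, the MIDR value 0x413FD0C1 (implementer 0x41, part 0xD0C, i.e. Neoverse N1 r3p1) is accepted. A Cortex-A72 value such as 0x410FD083 is rejected.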

This patch set has been rebased on the acme/perf/core branch, whose latest
commit is b39c9e1b101d ("perf machine: Fix missing free of
machine->kallsyms_filename").

To make the build succeed, a compilation fix [1] has been sent to LKML;
this patch set depends on it.  This patch set has been verified with
both x86 perf memory events and Arm SPE events.

[1] https://lore.kernel.org/lkml/20220811044341.426796-1-leo.yan@linaro.org/

Changes from v5:
* Removed the patch "perf: Add SNOOP_PEER flag to perf mem data struct"
  (Arnaldo);
* Removed the patch "perf arm-spe: Don't set data source if it's not a
  memory operation", which has already been merged into the mainline
  kernel, to avoid a merge conflict;
* Rebased on the latest acme perf/core branch, with no code changes
  compared to the previous version.

Changes from v4:
* Included Ali's patch set for adding data source in Arm SPE samples;
* Added Ian's ACK and Ali's review and test tags;
* Updated the documentation for the default 'peer' display on Arm64 (Ali).
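One way to pick the Arm64 default is to key it on the architecture recorded in the perf data, not the host running `perf c2c report`. That way, a file recorded on Arm64 and reported on x86_64 still defaults to 'peer'. The sketch below works under that assumption. The display names are modelled on the values of the `--display` option of `perf c2c report`. The enum, `parse_display()` and `default_display()` are hypothetical, and the "arm64" spelling of the recorded architecture is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical display modes: total, remote or local HITM, or peer. */
enum c2c_display {
	DISPLAY_TOT_HITM,
	DISPLAY_RMT_HITM,
	DISPLAY_LCL_HITM,
	DISPLAY_SNP_PEER,
};

/* Map a display name to its mode; returns 0 on success, -1 if unknown. */
int parse_display(const char *name, enum c2c_display *out)
{
	static const struct {
		const char *name;
		enum c2c_display mode;
	} table[] = {
		{ "tot",  DISPLAY_TOT_HITM },
		{ "rmt",  DISPLAY_RMT_HITM },
		{ "lcl",  DISPLAY_LCL_HITM },
		{ "peer", DISPLAY_SNP_PEER },
	};

	assert(name != NULL && out != NULL);
	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (strcmp(name, table[i].name) == 0) {
			*out = table[i].mode;
			return 0;
		}
	}
	return -1;
}

/* Default when no display is given: 'peer' for data recorded on arm64
 * (assuming SPE marks peer transfers rather than HITM), else total HITM. */
enum c2c_display default_display(const char *recorded_arch)
{
	assert(recorded_arch != NULL);
	return strcmp(recorded_arch, "arm64") == 0 ? DISPLAY_SNP_PEER
						   : DISPLAY_TOT_HITM;
}
```

An explicit user choice would go through `parse_display()` first, and `default_display()` would only apply when no display was given.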

Changes from v3:
* Changed to display remote and local peer accesses (Joe);
* Fixed the usage info for display types (Joe);
* Do not display HITM dimensions when using the 'peer' display, and do
  not show any 'peer' dimensions in the HITM display (James);
* Split into smaller patches for adding dimensions of peer operations;
* Updated documentation to reflect the latest GUI and stdio.
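The HITM/'peer' split and the local/remote split both come down to a few bits in each sample's data source. The sketch below illustrates this under two assumptions. First, the bit positions of `union perf_mem_data_src` in the uapi `perf_event.h` are mirrored under local `MEM_*` names, so the sketch builds without a recent kernel header; check them against your tree. Second, "remote" is simplified to the single `mem_remote` bit, whereas perf also looks at the memory level. `classify_load()` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Encodings mirrored from union perf_mem_data_src (uapi perf_event.h). */
#define MEM_OP_LOAD		0x02ULL	/* mem_op, bits 0..4 */
#define MEM_SNOOP_SHIFT		19
#define MEM_SNOOP_HITM		0x10ULL
#define MEM_REMOTE_SHIFT	37
#define MEM_REMOTE_REMOTE	0x01ULL
#define MEM_SNOOPX_SHIFT	38
#define MEM_SNOOPX_PEER		0x02ULL

static_assert(MEM_SNOOPX_SHIFT == MEM_REMOTE_SHIFT + 1,
	      "mem_snoopx follows the 1-bit mem_remote field");

enum costly_snoop {
	SNOOP_OTHER,
	SNOOP_LCL_HITM,
	SNOOP_RMT_HITM,
	SNOOP_LCL_PEER,
	SNOOP_RMT_PEER,
};

/* Classify one sample: only loads count; HITM is checked before peer. */
enum costly_snoop classify_load(uint64_t data_src)
{
	int remote = (data_src >> MEM_REMOTE_SHIFT) & MEM_REMOTE_REMOTE;

	if (!(data_src & MEM_OP_LOAD))
		return SNOOP_OTHER;
	if ((data_src >> MEM_SNOOP_SHIFT) & MEM_SNOOP_HITM)
		return remote ? SNOOP_RMT_HITM : SNOOP_LCL_HITM;
	if ((data_src >> MEM_SNOOPX_SHIFT) & MEM_SNOOPX_PEER)
		return remote ? SNOOP_RMT_PEER : SNOOP_LCL_PEER;
	return SNOOP_OTHER;
}
```

A HITM display would then count only the two HITM results, and a 'peer' display only the two peer results, so neither shows the other's dimensions.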


Ali Saidi (2):
  perf tools: sync addition of PERF_MEM_SNOOPX_PEER
  perf arm-spe: Use SPE data source for neoverse cores

Leo Yan (13):
  perf mem: Print snoop peer flag
  perf mem: Add statistics for peer snooping
  perf c2c: Output statistics for peer snooping
  perf c2c: Add dimensions for peer load operations
  perf c2c: Add dimensions of peer metrics for cache line view
  perf c2c: Add mean dimensions for peer operations
  perf c2c: Use explicit names for display macros
  perf c2c: Rename dimension from 'percent_hitm' to
    'percent_costly_snoop'
  perf c2c: Refactor node header
  perf c2c: Refactor display string
  perf c2c: Sort on peer snooping for load operations
  perf c2c: Use 'peer' as default display for Arm64
  perf c2c: Update documentation for new display option 'peer'

 tools/include/uapi/linux/perf_event.h         |   2 +-
 tools/perf/Documentation/perf-c2c.txt         |  31 +-
 tools/perf/builtin-c2c.c                      | 454 ++++++++++++++----
 .../util/arm-spe-decoder/arm-spe-decoder.c    |   1 +
 .../util/arm-spe-decoder/arm-spe-decoder.h    |  12 +
 tools/perf/util/arm-spe.c                     | 130 ++++-
 tools/perf/util/mem-events.c                  |  46 +-
 tools/perf/util/mem-events.h                  |   3 +
 8 files changed, 547 insertions(+), 132 deletions(-)

-- 
2.34.1
Re: [PATCH v6 00/15] perf c2c: Support data source and display for Arm64
Posted by Arnaldo Carvalho de Melo 3 years, 8 months ago
On Thu, Aug 11, 2022 at 02:24:36PM +0800, Leo Yan wrote:
> Arm64 Neoverse CPUs supports data source in Arm SPE trace, this allows
> us to detect cache line contention and transfers.
> 
> This patch set has been rebased on the acme/perf/core branch with the latest
> commit b39c9e1b101d ("perf machine: Fix missing free of
> machine->kallsyms_filename").
> 
> To make building success, a compilation fixing commit [1] has been sent
> to LKML, this patch set is dependent on it.  This patch set has been verified
> for both x86 perf memory events and Arm SPE events.
> 
> [1] https://lore.kernel.org/lkml/20220811044341.426796-1-leo.yan@linaro.org/

So, I tentatively applied this set after applying the patch for
<asm/sysreg.h>, and it's all now out in tmp.perf/core in my git tree;
please check.

I'm doing the usual set of container build tests, but any additional
checking, including of the committer note I added to the first patch in
this series clarifying that it is not really a "sync" with the kernel
headers, is more than welcome.

- Arnaldo
Re: [PATCH v6 00/15] perf c2c: Support data source and display for Arm64
Posted by Leo Yan 3 years, 8 months ago
Hi Arnaldo,

On Thu, Aug 11, 2022 at 07:25:35PM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, Aug 11, 2022 at 02:24:36PM +0800, Leo Yan wrote:
> > Arm64 Neoverse CPUs supports data source in Arm SPE trace, this allows
> > us to detect cache line contention and transfers.
> > 
> > This patch set has been rebased on the acme/perf/core branch with the latest
> > commit b39c9e1b101d ("perf machine: Fix missing free of
> > machine->kallsyms_filename").
> > 
> > To make building success, a compilation fixing commit [1] has been sent
> > to LKML, this patch set is dependent on it.  This patch set has been verified
> > for both x86 perf memory events and Arm SPE events.
> > 
> > [1] https://lore.kernel.org/lkml/20220811044341.426796-1-leo.yan@linaro.org/
> 
> So, I tentatively applied this set after applying the patch for
> <asm/sysreg.h>, and its all now out in tmp.perf/core in my git tree,
> please check.

While discussing this with Suzuki, he pointed out that adding the asm
include path in that way is not ideal.  With the patch on the
tmp.perf/core branch, two include paths are added to CFLAGS for
arm-spe.c:

  -I$(srctree)/tools/arch/$(SRCARCH)/include/
  -I$(srctree)/tools/arch/arm64/include/

When we build perf on x86_64, $(srctree)/tools/arch/x86/include/ takes
precedence over $(srctree)/tools/arch/arm64/include/, because it comes
first in the search order.  If C code includes a header without a
relative path, like "#include <asm/cputype.h>", the compiler may pick up
a same-named file from x86's asm folder rather than arm64's asm folder.
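The precedence problem can be modelled in a few lines. For `#include <...>`, the compiler searches the `-I` directories in the order they appear on the command line, and the first directory that contains the header wins. This toy model assumes, purely for illustration, that both the x86 and the arm64 include trees carry an `asm/cputype.h`. The file table and `resolve_include()` are hypothetical; the real lookup happens inside the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Made-up file table standing in for the source tree. */
struct fake_file {
	const char *dir;
	const char *header;
};

static const struct fake_file tree[] = {
	{ "tools/arch/x86/include/",   "asm/cputype.h" },	/* assumed clash */
	{ "tools/arch/arm64/include/", "asm/cputype.h" },
};

static int exists(const char *dir, const char *header)
{
	for (size_t i = 0; i < sizeof(tree) / sizeof(tree[0]); i++)
		if (!strcmp(tree[i].dir, dir) && !strcmp(tree[i].header, header))
			return 1;
	return 0;
}

/* Search -I directories left to right; return the first that has 'header'. */
const char *resolve_include(const char *const dirs[], size_t ndirs,
			    const char *header)
{
	assert(header != NULL);
	for (size_t i = 0; i < ndirs; i++)
		if (exists(dirs[i], header))
			return dirs[i];
	return NULL;
}
```

With the x86 tree listed first, as on an x86_64 build, the model returns the x86 copy even though arm-spe.c wants the arm64 one.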

Yesterday I spent a couple of hours trying other methods in the Makefile
(like filter-out, CFLAGS_REMOVE, etc.), but had no luck giving
precedence to $(srctree)/tools/arch/arm64/include/.

So the current patches on the tmp.perf/core branch build successfully,
but if there is a better method to resolve the header path precedence
issue, I would prefer to improve this so that we don't need to worry
about it later.  Any suggestions?

> I'm doing the usual set of container build tests, but any additional
> checking, including on the committer note I added to the first patch in
> this series, claryfing it is not really a "sync" with the kernel
> headers, is more than welcome.

I'm fine with adding my Signed-off-by to the signature chain.  Thanks
for the amendment.

Thanks,
Leo