[Patch v3 00/22] Arch-PEBS and PMU supports for Clearwater Forest and Panther Lake

Dapeng Mi posted 22 patches 8 months, 1 week ago
arch/arm/kernel/perf_regs.c                   |   6 +
arch/arm64/kernel/perf_regs.c                 |   6 +
arch/csky/kernel/perf_regs.c                  |   5 +
arch/loongarch/kernel/perf_regs.c             |   5 +
arch/mips/kernel/perf_regs.c                  |   5 +
arch/powerpc/perf/perf_regs.c                 |   5 +
arch/riscv/kernel/perf_regs.c                 |   5 +
arch/s390/kernel/perf_regs.c                  |   5 +
arch/x86/events/core.c                        | 136 +++-
arch/x86/events/intel/bts.c                   |   6 +-
arch/x86/events/intel/core.c                  | 329 +++++++-
arch/x86/events/intel/ds.c                    | 714 ++++++++++++++----
arch/x86/events/perf_event.h                  |  60 +-
arch/x86/include/asm/intel_ds.h               |  10 +-
arch/x86/include/asm/msr-index.h              |  26 +
arch/x86/include/asm/perf_event.h             | 145 +++-
arch/x86/include/uapi/asm/perf_regs.h         |  83 +-
arch/x86/kernel/perf_regs.c                   |  71 +-
include/linux/perf_event.h                    |   4 +
include/linux/perf_regs.h                     |  10 +
include/uapi/linux/perf_event.h               |  11 +
kernel/events/core.c                          |  98 ++-
tools/arch/x86/include/uapi/asm/perf_regs.h   |  86 ++-
tools/include/uapi/linux/perf_event.h         |  14 +
tools/perf/arch/arm/util/perf_regs.c          |   8 +-
tools/perf/arch/arm64/util/perf_regs.c        |  11 +-
tools/perf/arch/csky/util/perf_regs.c         |   8 +-
tools/perf/arch/loongarch/util/perf_regs.c    |   8 +-
tools/perf/arch/mips/util/perf_regs.c         |   8 +-
tools/perf/arch/powerpc/util/perf_regs.c      |  17 +-
tools/perf/arch/riscv/util/perf_regs.c        |   8 +-
tools/perf/arch/s390/util/perf_regs.c         |   8 +-
tools/perf/arch/x86/util/perf_regs.c          | 138 +++-
tools/perf/builtin-script.c                   |  23 +-
tools/perf/tests/shell/record.sh              |  55 ++
tools/perf/util/evsel.c                       |  36 +-
tools/perf/util/intel-pt.c                    |   2 +-
tools/perf/util/parse-regs-options.c          |  23 +-
.../perf/util/perf-regs-arch/perf_regs_x86.c  |  84 +++
tools/perf/util/perf_regs.c                   |   8 +-
tools/perf/util/perf_regs.h                   |  20 +-
tools/perf/util/record.h                      |   4 +-
tools/perf/util/sample.h                      |   6 +-
tools/perf/util/session.c                     |  29 +-
tools/perf/util/synthetic-events.c            |  12 +-
45 files changed, 2075 insertions(+), 286 deletions(-)
[Patch v3 00/22] Arch-PEBS and PMU supports for Clearwater Forest and Panther Lake
Posted by Dapeng Mi 8 months, 1 week ago
This v3 patch series is based on the latest perf/core tree "5c3627b6f059
 perf/x86/intel/bts: Replace offsetof() with struct_size()" plus the 2 extra
patches in the patchset "perf/x86/intel: Don't clear perf metrics overflow
 bit unconditionally"[1].

Changes:
  v2 -> v3:
  * Rebase patches to 6.15-rc1 code base.
  * Refactor arch-PEBS buffer allocation/release code, decoupling it from
    the legacy PEBS buffer allocation/release code.
  * Support capturing SSP/XMM/YMM/ZMM registers for user-space register
    sampling (--user-regs option) with PEBS events.
  * Fix incorrect sampling frequency issue in frequency sampling mode.
  * Misc changes to address other v2 comments.

Tests:
  Ran the tests below on Clearwater Forest and Panther Lake; no issues
  were found.
  
  1. Basic perf counting case.
    perf stat -e '{branches,branches,branches,branches,branches,branches,branches,branches,cycles,instructions,ref-cycles}' sleep 1

  2. Basic PMI based perf sampling case.
    perf record -e '{branches,branches,branches,branches,branches,branches,branches,branches,cycles,instructions,ref-cycles}' sleep 1

  3. Basic PEBS based perf sampling case.
    perf record -e '{branches,branches,branches,branches,branches,branches,branches,branches,cycles,instructions,ref-cycles}:p' sleep 1

  4. PEBS sampling case with basic, GPRs, vector-registers and LBR groups
    perf record -e branches:p -Iax,bx,ip,ssp,xmm0,ymm0 -b -c 10000 sleep 1

  5. User space PEBS sampling case with basic, GPRs, vector-registers and LBR groups
    perf record -e branches:pu --user-regs=ax,bx,ip,ssp,xmm0,ymm0 -b -c 10000 sleep 1

  6. PEBS sampling case with auxiliary (memory info) group
    perf mem record sleep 1

  7. PEBS sampling case with counter group
    perf record -e '{branches:p,branches,cycles}:S' -c 10000 sleep 1

  8. Perf stat and record test
    perf test 92; perf test 120

  9. perf-fuzzer test
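
  Tests 1-3 share one event group and differ only in the perf subcommand
  and the ":p" modifier. A minimal shell sketch of how that group string
  could be generated rather than typed out; make_group is a hypothetical
  helper (not part of perf), and the commands are only printed because
  running them needs perf and suitable hardware:

```shell
# Build the event group used by tests 1-3: N copies of "branches" followed
# by cycles, instructions and ref-cycles.
# make_group is a hypothetical helper name, not a perf feature.
make_group() {
    n=$1
    modifier=$2          # "" for counting/PMI sampling, ":p" for PEBS
    list=""
    i=0
    while [ "$i" -lt "$n" ]; do
        list="${list}branches,"
        i=$((i + 1))
    done
    printf '{%scycles,instructions,ref-cycles}%s\n' "$list" "$modifier"
}

group=$(make_group 8 "")
pebs_group=$(make_group 8 ":p")

# Dry run: print the three command lines instead of executing them.
echo "perf stat -e '$group' sleep 1"
echo "perf record -e '$group' sleep 1"
echo "perf record -e '$pebs_group' sleep 1"
```

  Changing the first argument of make_group adjusts how many "branches"
  events are placed in the group.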


History:
  v2: https://lore.kernel.org/all/20250218152818.158614-1-dapeng1.mi@linux.intel.com/
  v1: https://lore.kernel.org/all/20250123140721.2496639-1-dapeng1.mi@linux.intel.com/

Ref:
  [1]: https://lore.kernel.org/all/20250415104135.318169-1-dapeng1.mi@linux.intel.com/


Dapeng Mi (21):
  perf/x86/intel: Add PMU support for Clearwater Forest
  perf/x86/intel: Parse CPUID archPerfmonExt leaves for non-hybrid CPUs
  perf/x86/intel: Decouple BTS initialization from PEBS initialization
  perf/x86/intel: Rename x86_pmu.pebs to x86_pmu.ds_pebs
  perf/x86/intel: Introduce pairs of PEBS static calls
  perf/x86/intel: Initialize architectural PEBS
  perf/x86/intel/ds: Factor out PEBS record processing code to functions
  perf/x86/intel/ds: Factor out PEBS group processing code to functions
  perf/x86/intel: Process arch-PEBS records or record fragments
  perf/x86/intel: Allocate arch-PEBS buffer and initialize PEBS_BASE MSR
  perf/x86/intel: Update dyn_constranit base on PEBS event precise level
  perf/x86/intel: Setup PEBS data configuration and enable legacy groups
  perf/x86/intel: Add counter group support for arch-PEBS
  perf/x86/intel: Support SSP register capturing for arch-PEBS
  perf/core: Support to capture higher width vector registers
  perf/x86/intel: Support arch-PEBS vector registers group capturing
  perf tools: Support to show SSP register
  perf tools: Enhance arch__intr/user_reg_mask() helpers
  perf tools: Enhance sample_regs_user/intr to capture more registers
  perf tools: Support to capture more vector registers (x86/Intel)
  perf tools/tests: Add vector registers PEBS sampling test

Kan Liang (1):
  perf/x86/intel: Add Panther Lake support


base-commit: 538f1f04b5bfeaff4cd681b2567a0fde2335be38
-- 
2.40.1
Re: [Patch v3 00/22] Arch-PEBS and PMU supports for Clearwater Forest and Panther Lake
Posted by Liang, Kan 8 months, 1 week ago
Hi Peter,

On 2025-04-15 7:44 a.m., Dapeng Mi wrote:
> Dapeng Mi (21):
>   perf/x86/intel: Add PMU support for Clearwater Forest
> 
> Kan Liang (1):
>   perf/x86/intel: Add Panther Lake support

Could you please take a look and pick up the above two patches if they
look good to you?

The two patches provide generic support for Panther Lake and Clearwater
Forest. With them, at least non-PEBS events and topdown can work.
Arch-PEBS will be temporarily disabled until this big patch set is
merged.

 # dmesg | grep PMU
[    0.095162] Performance Events: XSAVE Architectural LBR,  AnyThread deprecated, Pantherlake Hybrid events, 32-deep LBR, full-width counters, Intel PMU driver.

 # perf stat -e "{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound}" -a
WARNING: events were regrouped to match PMUs
^C
 Performance counter stats for 'system wide':

         2,212,401      cpu_atom/topdown-retiring/
         8,121,982      cpu_atom/topdown-bad-spec/
        42,119,870      cpu_atom/topdown-fe-bound/
        27,667,678      cpu_atom/topdown-be-bound/
       496,377,056      cpu_core/slots/
         2,058,926      cpu_core/topdown-retiring/
         6,008,255      cpu_core/topdown-bad-spec/
       265,352,356      cpu_core/topdown-fe-bound/
       222,957,516      cpu_core/topdown-be-bound/
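
As a quick sanity check on counts like the ones above, the four topdown-*
values of each PMU can be summed and, for cpu_core, compared with the slots
count. A minimal shell sketch, assuming the perf stat text is fed on stdin;
topdown_sum is a hypothetical helper, not a perf feature:

```shell
# Sum the topdown-* counts of one PMU from `perf stat` text on stdin.
# topdown_sum is a hypothetical helper name, not part of perf.
topdown_sum() {
    pmu=$1
    total=0
    while read -r count event rest; do
        case $event in
            "$pmu"/topdown-*)
                # Drop the thousands separators before doing arithmetic.
                count=$(printf '%s' "$count" | tr -d ',')
                total=$((total + count))
                ;;
        esac
    done
    printf '%s\n' "$total"
}

# Counter lines copied from the output above.
sample='         2,212,401      cpu_atom/topdown-retiring/
         8,121,982      cpu_atom/topdown-bad-spec/
        42,119,870      cpu_atom/topdown-fe-bound/
        27,667,678      cpu_atom/topdown-be-bound/
       496,377,056      cpu_core/slots/
         2,058,926      cpu_core/topdown-retiring/
         6,008,255      cpu_core/topdown-bad-spec/
       265,352,356      cpu_core/topdown-fe-bound/
       222,957,516      cpu_core/topdown-be-bound/'

printf '%s\n' "$sample" | topdown_sum cpu_atom
printf '%s\n' "$sample" | topdown_sum cpu_core
```

For this sample the cpu_core sum is 496377053, which is within 3 of the
496,377,056 slots reported.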

 # perf record -e cycles:p sleep 1
Error:
cpu_atom/cycles/pH: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'

Thanks,
Kan
Re: [Patch v3 00/22] Arch-PEBS and PMU supports for Clearwater Forest and Panther Lake
Posted by Peter Zijlstra 8 months, 1 week ago
On Tue, Apr 15, 2025 at 11:21:30AM -0400, Liang, Kan wrote:
> Hi Peter,
> 
> On 2025-04-15 7:44 a.m., Dapeng Mi wrote:
> > Dapeng Mi (21):
> >   perf/x86/intel: Add PMU support for Clearwater Forest
> > 
> > Kan Liang (1):
> >   perf/x86/intel: Add Panther Lake support
> 
> Could you please take a look and pick up the above two patches if they
> look good to you?

Yes, I've picked up that earlier 2 patch series and will pick up the
first 6 patches from this series.

Thanks!