[Patch v7 00/12] arch-PEBS enabling for Intel platforms

Dapeng Mi posted 12 patches 1 month ago
Changes:
v6 -> v7:
  * Rebase code to the latest tip perf/core tree.
  * Opportunistically remove the redundant is_x86_event() prototype.
    (Patch 01/12)
  * Fix the NULL event access and record loss issues in the PEBS
    handler. (Patch 02/12)
  * Reset MSR_IA32_PEBS_INDEX at the head of _drain_arch_pebs() instead
    of at the end. This prevents already-processed PEBS records from
    being processed again in some corner cases such as event throttling.
    (Patch 08/12)

v5 -> v6:
  * Rebase code to the latest tip perf/core tree + "x86 perf bug fixes
    and optimization" patchset
 
v4 -> v5:
  * Rebase code to 6.16-rc3
  * Allocate/free arch-PEBS buffer in callbacks *prepare_cpu/*dead_cpu
    (patch 07/10, Peter)
  * Refine code and comments (patch 09/10, Peter)


This patchset introduces architectural PEBS (arch-PEBS) support for Intel
platforms such as Clearwater Forest (CWF) and Panther Lake (PTL). Details
about arch-PEBS can be found in chapter 11, "architectural PEBS", of
"Intel Architecture Instruction Set Extensions and Future Features".

This patch set doesn't include SSP and SIMD regs (OPMASK/YMM/ZMM)
sampling support for arch-PEBS, to avoid a dependency on the basic
SIMD regs sampling support patch series [1]. Once basic SIMD regs
sampling is supported, arch-PEBS based SSP and SIMD regs
(OPMASK/YMM/ZMM) sampling will be added in a later patch set.

Tests:
  Ran the tests below on Clearwater Forest and Panther Lake; no issues
  were found.

  1. Basic perf counting case.
    perf stat -e '{branches,branches,branches,branches,branches,branches,branches,branches,cycles,instructions,ref-cycles}' sleep 1

  2. Basic PMI based perf sampling case.
    perf record -e '{branches,branches,branches,branches,branches,branches,branches,branches,cycles,instructions,ref-cycles}' sleep 1

  3. Basic PEBS based perf sampling case.
    perf record -e '{branches,branches,branches,branches,branches,branches,branches,branches,cycles,instructions,ref-cycles}:p' sleep 1

  4. PEBS sampling case with basic, GPRs, vector-registers and LBR groups
    perf record -e branches:p -Iax,bx,ip,xmm0 -b -c 10000 sleep 1

  5. User space PEBS sampling case with basic, GPRs and LBR groups
    perf record -e branches:p --user-regs=ax,bx,ip -b -c 10000 sleep 1

  6. PEBS sampling case with auxiliary (memory info) group
    perf mem record sleep 1

  7. PEBS sampling case with counter group
    perf record -e '{branches:p,branches,cycles}:S' -c 10000 sleep 1

  8. Perf stat and record test
    perf test 100; perf test 131


History:
  v6: https://lore.kernel.org/all/20250821035805.159494-1-dapeng1.mi@linux.intel.com/ 
  v5: https://lore.kernel.org/all/20250623223546.112465-1-dapeng1.mi@linux.intel.com/
  v4: https://lore.kernel.org/all/20250620103909.1586595-1-dapeng1.mi@linux.intel.com/
  v3: https://lore.kernel.org/all/20250415114428.341182-1-dapeng1.mi@linux.intel.com/
  v2: https://lore.kernel.org/all/20250218152818.158614-1-dapeng1.mi@linux.intel.com/
  v1: https://lore.kernel.org/all/20250123140721.2496639-1-dapeng1.mi@linux.intel.com/

Ref:
  [1]: https://lore.kernel.org/all/20250815213435.1702022-1-kan.liang@linux.intel.com/

Dapeng Mi (12):
  perf/x86: Remove redundant is_x86_event() prototype
  perf/x86/intel: Fix NULL event access and potential PEBS record loss
  perf/x86/intel: Replace x86_pmu.drain_pebs calling with static call
  perf/x86/intel: Correct large PEBS flag check
  perf/x86/intel: Initialize architectural PEBS
  perf/x86/intel/ds: Factor out PEBS record processing code to functions
  perf/x86/intel/ds: Factor out PEBS group processing code to functions
  perf/x86/intel: Process arch-PEBS records or record fragments
  perf/x86/intel: Allocate arch-PEBS buffer and initialize PEBS_BASE MSR
  perf/x86/intel: Update dyn_constraint base on PEBS event precise level
  perf/x86/intel: Setup PEBS data configuration and enable legacy groups
  perf/x86/intel: Add counter group support for arch-PEBS

 arch/x86/events/core.c            |  21 +-
 arch/x86/events/intel/core.c      | 268 ++++++++++++-
 arch/x86/events/intel/ds.c        | 621 +++++++++++++++++++++++++-----
 arch/x86/events/perf_event.h      |  41 +-
 arch/x86/include/asm/intel_ds.h   |  10 +-
 arch/x86/include/asm/msr-index.h  |  20 +
 arch/x86/include/asm/perf_event.h | 116 +++++-
 7 files changed, 955 insertions(+), 142 deletions(-)


base-commit: f49e1be19542487921e82b29004908966cb99d7c
-- 
2.34.1
Re: [Patch v7 00/12] arch-PEBS enabling for Intel platforms
Posted by Mi, Dapeng 2 weeks ago
On 8/28/2025 9:34 AM, Dapeng Mi wrote:
> Changes:
> v6 -> v7:
>   * Rebase code to the latest tip perf/core tree.
>   * Opportunistically remove the redundant is_x86_event() prototype.
>     (Patch 01/12)
>   * Fix the NULL event access and record loss issues in the PEBS
>     handler. (Patch 02/12)
>   * Reset MSR_IA32_PEBS_INDEX at the head of _drain_arch_pebs() instead
>     of at the end. This prevents already-processed PEBS records from
>     being processed again in some corner cases such as event throttling.
>     (Patch 08/12)
>
> v5 -> v6:
>   * Rebase code to the latest tip perf/core tree + "x86 perf bug fixes
>     and optimization" patchset
>  
> v4 -> v5:
>   * Rebase code to 6.16-rc3
>   * Allocate/free arch-PEBS buffer in callbacks *prepare_cpu/*dead_cpu
>     (patch 07/10, Peter)
>   * Refine code and comments (patch 09/10, Peter)
>
>
> This patchset introduces architectural PEBS (arch-PEBS) support for Intel
> platforms such as Clearwater Forest (CWF) and Panther Lake (PTL). Details
> about arch-PEBS can be found in chapter 11, "architectural PEBS", of
> "Intel Architecture Instruction Set Extensions and Future Features".
>
> This patch set doesn't include SSP and SIMD regs (OPMASK/YMM/ZMM)
> sampling support for arch-PEBS, to avoid a dependency on the basic
> SIMD regs sampling support patch series [1]. Once basic SIMD regs
> sampling is supported, arch-PEBS based SSP and SIMD regs
> (OPMASK/YMM/ZMM) sampling will be added in a later patch set.

Kindly ping.

Hi Peter,

Do you have any comments on this patch series? Would it be OK to merge this
basic arch-PEBS enabling series first (without SIMD regs sampling support)?
Are there any other new comments on this patch set? Thanks.


