[PATCH v2 0/5] Fix PMU kselftests errors on GNR/SRF/CWF

Dapeng Mi posted 5 patches 2 months, 2 weeks ago
Posted by Dapeng Mi 2 months, 2 weeks ago
This patch series fixes KVM PMU kselftests errors encountered on Granite
Rapids (GNR), Sierra Forest (SRF) and Clearwater Forest (CWF).

GNR and SRF add support for timed PEBS. Timed PEBS adds a new "retired
latency" field to the PEBS basic info group to report timing
information, and a new bit, PERF_CAPABILITIES[17] ("PEBS_TIMING_INFO"),
indicates whether timed PEBS is supported. KVM itself needs no specific
change to support timed PEBS beyond a perf change that adds the
PERF_CAP_PEBS_TIMING_INFO flag to PERF_CAP_PEBS_MASK[1]. Patch 2/5 adds
timed PEBS support to vmx_pmu_caps_test and fixes the failure caused by
the PEBS capability field mismatch.
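
A minimal C sketch of the capability check, for illustration only: the
bit positions are assumed to mirror the kernel's msr-index.h
definitions, and has_timed_pebs(), unknown_pebs_bits() and the two
PEBS_MASK_* macros are hypothetical names, not the actual
vmx_pmu_caps_test code. It shows how a test whose known-bit mask
predates PERF_CAP_PEBS_TIMING_INFO sees bit 17 as an unexpected PEBS
capability.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))

/* PEBS-related PERF_CAPABILITIES bits (assumed to match msr-index.h). */
#define PERF_CAP_PEBS_TRAP        BIT_ULL(6)
#define PERF_CAP_ARCH_REG         BIT_ULL(7)
#define PERF_CAP_PEBS_FORMAT      0xf00ULL
#define PERF_CAP_PEBS_BASELINE    BIT_ULL(14)
#define PERF_CAP_PEBS_TIMING_INFO BIT_ULL(17)

/* Hypothetical masks: without and with the timing-info bit from [1]. */
#define PEBS_MASK_NO_TIMING \
	(PERF_CAP_PEBS_TRAP | PERF_CAP_ARCH_REG | \
	 PERF_CAP_PEBS_FORMAT | PERF_CAP_PEBS_BASELINE)
#define PEBS_MASK_WITH_TIMING (PEBS_MASK_NO_TIMING | PERF_CAP_PEBS_TIMING_INFO)

/* True if the host advertises timed PEBS via PERF_CAPABILITIES[17]. */
static bool has_timed_pebs(uint64_t perf_caps)
{
	return perf_caps & PERF_CAP_PEBS_TIMING_INFO;
}

/*
 * PEBS bits reported by the host that a test's known-bit mask does not
 * cover. A non-zero result models the field mismatch described above.
 */
static uint64_t unknown_pebs_bits(uint64_t host_caps, uint64_t test_mask)
{
	return host_caps & PEBS_MASK_WITH_TIMING & ~test_mask;
}
```

For a host reporting baseline PEBS, PEBS format 4 and timing info,
unknown_pebs_bits() returns only bit 17 against the old mask and 0
against the extended one.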

CWF introduces 5 new architectural events (4 level-1 topdown metrics
events and the LBR inserts event). Patch 3/5 adds support for these 5
arch-events and fixes the failure caused by a mismatch between the
number of arch-events the hardware actually supports and
NR_INTEL_ARCH_EVENTS.
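
A simplified C model of the count mismatch, assuming the CPUID leaf 0xA
layout from the Intel SDM (EAX[31:24] holds the length of the EBX event
bit vector, and a set EBX bit marks an event as unavailable). The event
names and their order are written from the SDM as an assumption, and
arch_event_names[], arch_events_len(), arch_event_available() and
table_covers_hw() are hypothetical helpers rather than the selftest's
own code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural event table, indexed by EBX bit (order assumed). */
static const char *const arch_event_names[] = {
	"core cycles",                  /* 0 */
	"instructions retired",         /* 1 */
	"reference cycles",             /* 2 */
	"LLC references",               /* 3 */
	"LLC misses",                   /* 4 */
	"branch instructions retired",  /* 5 */
	"branch misses retired",        /* 6 */
	"topdown slots",                /* 7 */
	"topdown backend bound",        /* 8: new on CWF */
	"topdown bad speculation",      /* 9: new on CWF */
	"topdown frontend bound",       /* 10: new on CWF */
	"topdown retiring",             /* 11: new on CWF */
	"LBR inserts",                  /* 12: new on CWF */
};
#define NR_KNOWN_ARCH_EVENTS (sizeof(arch_event_names) / sizeof(arch_event_names[0]))

/* CPUID.0AH:EAX[31:24]: number of valid bits in the EBX event vector. */
static unsigned int arch_events_len(uint32_t eax)
{
	return (eax >> 24) & 0xff;
}

/* An event exists if it is within the vector and its EBX bit is clear. */
static bool arch_event_available(uint32_t eax, uint32_t ebx, unsigned int idx)
{
	return idx < arch_events_len(eax) && idx < 32 && !(ebx & (1u << idx));
}

/* The mismatch above: hardware enumerates more events than the table. */
static bool table_covers_hw(uint32_t eax, unsigned int nr_known)
{
	return arch_events_len(eax) <= nr_known;
}
```

On a CPU enumerating 13 arch-events, a table holding only the first 8
entries fails table_covers_hw(), while the 13-entry table above passes.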

On Intel Atom platforms, the PMU events "Instruction Retired" or
"Branch Instruction Retired" may be overcounted for certain
instructions, such as FAR CALL/JMP, RETF, IRET, VMENTRY/VMEXIT/VMPTRLD
and complex SGX/SMX/CSTATE instructions/flows[2].

Specifically, on Atom platforms up to and including Sierra Forest, both
the "Instruction Retired" and "Branch Instruction Retired" events are
overcounted on these instructions, whereas on Clearwater Forest only
the "Instruction Retired" event is overcounted.
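
The per-platform behavior described above can be captured as a small
lookup. This is a sketch only: enum pmu_platform, enum retired_event
and may_overcount() are hypothetical names, and a real test would have
to derive the platform from CPUID, which is not shown here.

```c
#include <stdbool.h>

/* Hypothetical labels for the platform groups described above. */
enum pmu_platform {
	PLATFORM_ATOM_UP_TO_SRF,  /* Atom up to and including Sierra Forest */
	PLATFORM_ATOM_CWF,        /* Clearwater Forest */
	PLATFORM_OTHER,           /* no overcount erratum assumed */
};

enum retired_event {
	EVT_INSTRUCTIONS_RETIRED,
	EVT_BRANCHES_RETIRED,
};

/* Whether @e may be overcounted on @p according to the errata above. */
static bool may_overcount(enum pmu_platform p, enum retired_event e)
{
	switch (p) {
	case PLATFORM_ATOM_UP_TO_SRF:
		return true;  /* both events are affected */
	case PLATFORM_ATOM_CWF:
		return e == EVT_INSTRUCTIONS_RETIRED;
	default:
		return false;
	}
}
```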

Because of this overcount issue, pmu_counters_test and
pmu_event_filter_test fail the precise event count validation for these
2 events on Atom platforms.

To work around this Atom overcount issue, patches 4/5 and 5/5 relax the
precise count validation separately in pmu_counters_test and
pmu_event_filter_test.
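
One plausible shape for the relaxed check, assuming the workaround only
needs to tolerate overcounting and should still catch undercounting;
event_count_ok() is a hypothetical helper, not the code in patches 4/5
or 5/5.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Validate a retired-event count measured over a known guest code
 * sequence. If the platform may overcount the event, accept any count
 * at or above the expected value; otherwise require an exact match.
 */
static bool event_count_ok(uint64_t count, uint64_t expected, bool may_overcount)
{
	if (may_overcount)
		return count >= expected;
	return count == expected;
}
```

Keeping the exact match where no erratum applies preserves the test's
precision on unaffected platforms; only the lower bound survives where
overcounting is expected.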

BTW, this series doesn't depend on mediated vPMU support.

Changes since v1:
  * Add error fix for vmx_pmu_caps_test on GNR/SRF (patch 2/5).
  * Opportunistically fix a typo (patch 1/5).

Tests:
  * PMU kselftests (pmu_counters_test/pmu_event_filter_test/
    vmx_pmu_caps_test) passed on Intel SPR/GNR/SRF/CWF platforms.

History:
  * v1: https://lore.kernel.org/all/20250712172522.187414-1-dapeng1.mi@linux.intel.com/

Ref:
  [1] https://lore.kernel.org/all/20250717090302.11316-1-dapeng1.mi@linux.intel.com/
  [2] https://edc.intel.com/content/www/us/en/design/products-and-solutions/processors-and-chipsets/sierra-forest/xeon-6700-series-processor-with-e-cores-specification-update/errata-details

Dapeng Mi (4):
  KVM: x86/pmu: Correct typo "_COUTNERS" to "_COUNTERS"
  KVM: selftests: Add timing_info bit support in vmx_pmu_caps_test
  KVM: Selftests: Validate more arch-events in pmu_counters_test
  KVM: selftests: Relax branches event count check for event_filter test

dongsheng (1):
  KVM: selftests: Relax precise event count validation as overcount
    issue

 arch/x86/include/asm/kvm_host.h               |  8 ++--
 arch/x86/kvm/vmx/pmu_intel.c                  |  6 +--
 tools/testing/selftests/kvm/include/x86/pmu.h | 19 ++++++++
 .../selftests/kvm/include/x86/processor.h     |  7 ++-
 tools/testing/selftests/kvm/lib/x86/pmu.c     | 43 +++++++++++++++++++
 .../selftests/kvm/x86/pmu_counters_test.c     | 39 ++++++++++++++---
 .../selftests/kvm/x86/pmu_event_filter_test.c |  9 +++-
 .../selftests/kvm/x86/vmx_pmu_caps_test.c     |  3 +-
 8 files changed, 117 insertions(+), 17 deletions(-)


base-commit: 772d50d9b87bec08b56ecee0a880d6b2ee5c7da5
-- 
2.34.1
Re: [PATCH v2 0/5] Fix PMU kselftests errors on GNR/SRF/CWF
Posted by Sean Christopherson 3 weeks, 4 days ago
On Fri, Jul 18, 2025, Dapeng Mi wrote:
> [...]

Overall looks good, I just want to take a more infrastructure-oriented approach
for the errata.  I'll post a v3 tomorrow.  All coding is done and the tests pass,
but I want to take a second look with fresh eyes before posting it :-)
Re: [PATCH v2 0/5] Fix PMU kselftests errors on GNR/SRF/CWF
Posted by Mi, Dapeng 3 weeks, 4 days ago
On 9/11/2025 7:59 AM, Sean Christopherson wrote:
> On Fri, Jul 18, 2025, Dapeng Mi wrote:
>> [...]
> Overall looks good, I just want to take a more infrastructure-oriented approach
> for the errata.  I'll post a v3 tomorrow.  All coding is done and the tests pass,
> but I want to take a second look with fresh eyes before posting it :-)

Thanks! :-)
