[PATCH v7 0/9] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup

Dongli Zhang posted 9 patches 2 months, 4 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20251111061532.36702-1-dongli.zhang@oracle.com
Maintainers: Paolo Bonzini <pbonzini@redhat.com>, Zhao Liu <zhao1.liu@intel.com>, Marcelo Tosatti <mtosatti@redhat.com>
There is a newer version of this series
target/i386/cpu.c     |   8 +
target/i386/cpu.h     |  16 ++
target/i386/kvm/kvm.c | 355 +++++++++++++++++++++++++++++++++++++++------
3 files changed, 332 insertions(+), 47 deletions(-)
[PATCH v7 0/9] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup
Posted by Dongli Zhang 2 months, 4 weeks ago
This patchset addresses four bugs related to AMD PMU virtualization.

1. The PerfMonV2 is still available if PERCORE if disabled via
"-cpu host,-perfctr-core".

2. The VM 'cpuid' command still returns PERFCORE although "-pmu" is
configured.

3. The third issue is that using "-cpu host,-pmu" does not disable AMD PMU
virtualization. When using "-cpu EPYC" or "-cpu host,-pmu", AMD PMU
virtualization remains enabled. On the VM's Linux side, you might still
see:

[    0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.

instead of:

[    0.596381] Performance Events: PMU not available due to virtualization, using software events only.
[    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled

To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE
when "-pmu" is configured.

4. The fourth issue is that unreclaimed performance events (after a QEMU
system_reset) in KVM may cause random, unwanted, or unknown NMIs to be
injected into the VM.

The AMD PMU registers are not reset during QEMU system_reset.

(1) If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
running "perf top", the PMU registers are not disabled properly.

(2) Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
does not handle AMD PMU registers, causing some PMU events to remain
enabled in KVM.

(3) The KVM kvm_pmc_speculative_in_use() function consistently returns true,
preventing the reclamation of these events. Consequently, the
kvm_pmc->perf_event remains active.

(4) After a reboot, the VM kernel may report the following error:

[    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
[    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)

(5) In the worst case, the active kvm_pmc->perf_event may inject unknown
NMIs randomly into the VM kernel:

[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.

To resolve these issues, we propose resetting AMD PMU registers during the
VM reset process


Changed since v1:
  - Use feature_dependencies for CPUID_EXT3_PERFCORE and
    CPUID_8000_0022_EAX_PERFMON_V2.
  - Remove CPUID_EXT3_PERFCORE when !cpu->enable_pmu.
  - Pick kvm_arch_pre_create_vcpu() patch from Xiaoyao Li.
  - Use "-pmu" but not a global "pmu-cap-disabled" for KVM_PMU_CAP_DISABLE.
  - Also use sysfs kvm.enable_pmu=N to determine if PMU is supported.
  - Some changes to PMU register limit calculation.
Changed since v2:
  - Change has_pmu_cap to pmu_cap.
  - Use cpuid_find_entry() instead of cpu_x86_cpuid().
  - Rework the code flow of PATCH 07 related to kvm.enable_pmu=N following
    Zhao's suggestion.
  - Use object_property_get_int() to get CPU family.
  - Add support to Zhaoxin.
Changed since v3:
  - Re-base on top of Zhao's queued patch.
  - Use host_cpu_vendor_fms() from Zhao's patch.
  - Pick new version of kvm_arch_pre_create_vcpu() patch from Xiaoyao.
  - Re-split the cases into enable_pmu and !enable_pmu, following Zhao's
    suggestion.
  - Check AMD directly makes the "compat" rule clear.
  - Some changes on commit message and comment.
  - Bring back global static variable 'kvm_pmu_disabled' read from
    /sys/module/kvm/parameters/enable_pmu.
Changed since v4:
  - Re-base on top of most recent mainline QEMU.
  - Add more Reviewed-by.
  - All patches are reviewed.
Changed since v5:
  - Re-base on top of most recent mainline QEMU.
  - Remove patch "kvm: Introduce kvm_arch_pre_create_vcpu()" as it is
    already merged.
  - To resolve conflicts in new [PATCH v6 3/9] , move the PMU related code
    before the call site of is_tdx_vm().
Changed since v6:
  - Re-base on top of most recent mainline QEMU (staging branch).
  - Add more Reviewed-by from Dapeng and Sandipan.


Dongli Zhang (9):
  target/i386: disable PerfMonV2 when PERFCORE unavailable
  target/i386: disable PERFCORE when "-pmu" is configured
  target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured
  target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid()
  target/i386/kvm: rename architectural PMU variables
  target/i386/kvm: query kvm.enable_pmu parameter
  target/i386/kvm: reset AMD PMU registers during VM reset
  target/i386/kvm: support perfmon-v2 for reset
  target/i386/kvm: don't stop Intel PMU counters

 target/i386/cpu.c     |   8 +
 target/i386/cpu.h     |  16 ++
 target/i386/kvm/kvm.c | 355 +++++++++++++++++++++++++++++++++++++++------
 3 files changed, 332 insertions(+), 47 deletions(-)

branch: remotes/origin/staging
base-commit: 593aee5df98b4a862ff8841a57ea3dbf22131a5f

Thank you very much!

Dongli Zhang
Re: [PATCH v7 0/9] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup
Posted by Zhao Liu 1 month, 3 weeks ago
On Mon, Nov 10, 2025 at 10:14:49PM -0800, Dongli Zhang wrote:
> Date: Mon, 10 Nov 2025 22:14:49 -0800
> From: Dongli Zhang <dongli.zhang@oracle.com>
> Subject: [PATCH v7 0/9] target/i386/kvm/pmu: PMU Enhancement, Bugfix and
>  Cleanup
> X-Mailer: git-send-email 2.43.5
> 
> This patchset addresses four bugs related to AMD PMU virtualization.
> 
> 1. The PerfMonV2 is still available if PERCORE if disabled via
> "-cpu host,-perfctr-core".
> 
> 2. The VM 'cpuid' command still returns PERFCORE although "-pmu" is
> configured.
> 
> 3. The third issue is that using "-cpu host,-pmu" does not disable AMD PMU
> virtualization. When using "-cpu EPYC" or "-cpu host,-pmu", AMD PMU
> virtualization remains enabled. On the VM's Linux side, you might still
> see:
> 
> [    0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
> 
> instead of:
> 
> [    0.596381] Performance Events: PMU not available due to virtualization, using software events only.
> [    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
> 
> To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE
> when "-pmu" is configured.
> 
> 4. The fourth issue is that unreclaimed performance events (after a QEMU
> system_reset) in KVM may cause random, unwanted, or unknown NMIs to be
> injected into the VM.
> 
> The AMD PMU registers are not reset during QEMU system_reset.
> 
> (1) If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
> running "perf top", the PMU registers are not disabled properly.
> 
> (2) Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
> does not handle AMD PMU registers, causing some PMU events to remain
> enabled in KVM.
> 
> (3) The KVM kvm_pmc_speculative_in_use() function consistently returns true,
> preventing the reclamation of these events. Consequently, the
> kvm_pmc->perf_event remains active.
> 
> (4) After a reboot, the VM kernel may report the following error:
> 
> [    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
> [    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
> 
> (5) In the worst case, the active kvm_pmc->perf_event may inject unknown
> NMIs randomly into the VM kernel:
> 
> [...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
> 
> To resolve these issues, we propose resetting AMD PMU registers during the
> VM reset process
 
Hi Dongli,

Except for Patch 1 & 2 which need compatibility options (if you think
it's okay, I could help take these 2 and fix them when v11.0's compat
array is ready).

The other patches still LGTM. Maybe it's better to have a v8 excluding
patch 1 & 2?

Regards,
Zhao
Re: [PATCH v7 0/9] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup
Posted by Dongli Zhang 1 month, 3 weeks ago

On 12/16/25 12:25 AM, Zhao Liu wrote:
> On Mon, Nov 10, 2025 at 10:14:49PM -0800, Dongli Zhang wrote:
>> Date: Mon, 10 Nov 2025 22:14:49 -0800
>> From: Dongli Zhang <dongli.zhang@oracle.com>
>> Subject: [PATCH v7 0/9] target/i386/kvm/pmu: PMU Enhancement, Bugfix and
>>  Cleanup
>> X-Mailer: git-send-email 2.43.5
>>

[snip]

>  
> Hi Dongli,
> 
> Except for Patch 1 & 2 which need compatibility options (if you think
> it's okay, I could help take these 2 and fix them when v11.0's compat
> array is ready).
> 
> The other patches still LGTM. Maybe it's better to have a v8 excluding
> patch 1 & 2?
> 

Hi Zhao,

Sure. I will rebase on top of the staging branch, excluding patch 1 & 2.

Feel free to take 1 & 2.

Thank you very much!

Dongli Zhang
Re: [PATCH v7 0/9] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup
Posted by Dongli Zhang 2 months, 3 weeks ago
Hi Paolo,

Apologies if this causes any inconvenience.

Would you consider this patchset? It resolves several QEMU AMD vPMU issues, and
each patch has received a Reviewed-by from at least two reviewers.

Thank you very much!

Dongli Zhang

On 11/10/25 10:14 PM, Dongli Zhang wrote:
> This patchset addresses four bugs related to AMD PMU virtualization.
> 
> 1. The PerfMonV2 is still available if PERCORE if disabled via
> "-cpu host,-perfctr-core".
> 
> 2. The VM 'cpuid' command still returns PERFCORE although "-pmu" is
> configured.
> 
> 3. The third issue is that using "-cpu host,-pmu" does not disable AMD PMU
> virtualization. When using "-cpu EPYC" or "-cpu host,-pmu", AMD PMU
> virtualization remains enabled. On the VM's Linux side, you might still
> see:
> 
> [    0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
> 
> instead of:
> 
> [    0.596381] Performance Events: PMU not available due to virtualization, using software events only.
> [    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
> 
> To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE
> when "-pmu" is configured.
> 
> 4. The fourth issue is that unreclaimed performance events (after a QEMU
> system_reset) in KVM may cause random, unwanted, or unknown NMIs to be
> injected into the VM.
> 
> The AMD PMU registers are not reset during QEMU system_reset.
> 
> (1) If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
> running "perf top", the PMU registers are not disabled properly.
> 
> (2) Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
> does not handle AMD PMU registers, causing some PMU events to remain
> enabled in KVM.
> 
> (3) The KVM kvm_pmc_speculative_in_use() function consistently returns true,
> preventing the reclamation of these events. Consequently, the
> kvm_pmc->perf_event remains active.
> 
> (4) After a reboot, the VM kernel may report the following error:
> 
> [    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
> [    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
> 
> (5) In the worst case, the active kvm_pmc->perf_event may inject unknown
> NMIs randomly into the VM kernel:
> 
> [...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
> 
> To resolve these issues, we propose resetting AMD PMU registers during the
> VM reset process
> 
> 
> Changed since v1:
>   - Use feature_dependencies for CPUID_EXT3_PERFCORE and
>     CPUID_8000_0022_EAX_PERFMON_V2.
>   - Remove CPUID_EXT3_PERFCORE when !cpu->enable_pmu.
>   - Pick kvm_arch_pre_create_vcpu() patch from Xiaoyao Li.
>   - Use "-pmu" but not a global "pmu-cap-disabled" for KVM_PMU_CAP_DISABLE.
>   - Also use sysfs kvm.enable_pmu=N to determine if PMU is supported.
>   - Some changes to PMU register limit calculation.
> Changed since v2:
>   - Change has_pmu_cap to pmu_cap.
>   - Use cpuid_find_entry() instead of cpu_x86_cpuid().
>   - Rework the code flow of PATCH 07 related to kvm.enable_pmu=N following
>     Zhao's suggestion.
>   - Use object_property_get_int() to get CPU family.
>   - Add support to Zhaoxin.
> Changed since v3:
>   - Re-base on top of Zhao's queued patch.
>   - Use host_cpu_vendor_fms() from Zhao's patch.
>   - Pick new version of kvm_arch_pre_create_vcpu() patch from Xiaoyao.
>   - Re-split the cases into enable_pmu and !enable_pmu, following Zhao's
>     suggestion.
>   - Check AMD directly makes the "compat" rule clear.
>   - Some changes on commit message and comment.
>   - Bring back global static variable 'kvm_pmu_disabled' read from
>     /sys/module/kvm/parameters/enable_pmu.
> Changed since v4:
>   - Re-base on top of most recent mainline QEMU.
>   - Add more Reviewed-by.
>   - All patches are reviewed.
> Changed since v5:
>   - Re-base on top of most recent mainline QEMU.
>   - Remove patch "kvm: Introduce kvm_arch_pre_create_vcpu()" as it is
>     already merged.
>   - To resolve conflicts in new [PATCH v6 3/9] , move the PMU related code
>     before the call site of is_tdx_vm().
> Changed since v6:
>   - Re-base on top of most recent mainline QEMU (staging branch).
>   - Add more Reviewed-by from Dapeng and Sandipan.
> 
> 
> Dongli Zhang (9):
>   target/i386: disable PerfMonV2 when PERFCORE unavailable
>   target/i386: disable PERFCORE when "-pmu" is configured
>   target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured
>   target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid()
>   target/i386/kvm: rename architectural PMU variables
>   target/i386/kvm: query kvm.enable_pmu parameter
>   target/i386/kvm: reset AMD PMU registers during VM reset
>   target/i386/kvm: support perfmon-v2 for reset
>   target/i386/kvm: don't stop Intel PMU counters
> 
>  target/i386/cpu.c     |   8 +
>  target/i386/cpu.h     |  16 ++
>  target/i386/kvm/kvm.c | 355 +++++++++++++++++++++++++++++++++++++++------
>  3 files changed, 332 insertions(+), 47 deletions(-)
> 
> branch: remotes/origin/staging
> base-commit: 593aee5df98b4a862ff8841a57ea3dbf22131a5f
> 
> Thank you very much!
> 
> Dongli Zhang
> 
>