[v8] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup

[PATCH v8 4/7] target/i386/kvm: query kvm.enable_pmu parameter

Posted by Dongli Zhang 1 month, 1 week ago

When PMU is enabled in QEMU, there is a chance that PMU virtualization is
completely disabled by the KVM module parameter kvm.enable_pmu=N.

The kvm.enable_pmu parameter is introduced since Linux v5.17.
Its permission is 0444. It does not change until a reload of the KVM
module.

Read the kvm.enable_pmu value from the module sysfs to give a chance to
provide more information about vPMU enablement.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
Changed since v2:
  - Rework the code flow following Zhao's suggestion.
  - Return error when:
    (*kvm_enable_pmu == 'N' && X86_CPU(cpu)->enable_pmu)
Changed since v3:
  - Re-split the cases into enable_pmu and !enable_pmu, following Zhao's
    suggestion.
  - Rework the commit messages.
  - Bring back global static variable 'kvm_pmu_disabled' from v2.
Changed since v4:
  - Add Reviewed-by from Zhao.
Changed since v5:
  - Rebase on top of most recent QEMU.
Changed since v6:
  - Add Reviewed-by from Dapeng Mi.

 target/i386/kvm/kvm.c | 61 +++++++++++++++++++++++++++++++------------
 1 file changed, 44 insertions(+), 17 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 3b803c662d..338b9558e4 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -187,6 +187,10 @@ static int has_triple_fault_event;
 static bool has_msr_mcg_ext_ctl;
 
 static int pmu_cap;
+/*
+ * Read from /sys/module/kvm/parameters/enable_pmu.
+ */
+static bool kvm_pmu_disabled;
 
 static struct kvm_cpuid2 *cpuid_cache;
 static struct kvm_cpuid2 *hv_cpuid_cache;
@@ -2068,23 +2072,30 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
     if (first) {
         first = false;
 
-        /*
-         * Since Linux v5.18, KVM provides a VM-level capability to easily
-         * disable PMUs; however, QEMU has been providing PMU property per
-         * CPU since v1.6. In order to accommodate both, have to configure
-         * the VM-level capability here.
-         *
-         * KVM_PMU_CAP_DISABLE doesn't change the PMU
-         * behavior on Intel platform because current "pmu" property works
-         * as expected.
-         */
-        if ((pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu) {
-            ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
-                                    KVM_PMU_CAP_DISABLE);
-            if (ret < 0) {
-                error_setg_errno(errp, -ret,
-                                 "Failed to set KVM_PMU_CAP_DISABLE");
-                return ret;
+        if (X86_CPU(cpu)->enable_pmu) {
+            if (kvm_pmu_disabled) {
+                warn_report("Failed to enable PMU since "
+                            "KVM's enable_pmu parameter is disabled");
+            }
+        } else {
+            /*
+             * Since Linux v5.18, KVM provides a VM-level capability to easily
+             * disable PMUs; however, QEMU has been providing PMU property per
+             * CPU since v1.6. In order to accommodate both, have to configure
+             * the VM-level capability here.
+             *
+             * KVM_PMU_CAP_DISABLE doesn't change the PMU
+             * behavior on Intel platform because current "pmu" property works
+             * as expected.
+             */
+            if (pmu_cap & KVM_PMU_CAP_DISABLE) {
+                ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
+                                        KVM_PMU_CAP_DISABLE);
+                if (ret < 0) {
+                    error_setg_errno(errp, -ret,
+                                     "Failed to set KVM_PMU_CAP_DISABLE");
+                    return ret;
+                }
             }
         }
     }
@@ -3302,6 +3313,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     int ret;
     struct utsname utsname;
     Error *local_err = NULL;
+    g_autofree char *kvm_enable_pmu;
 
     /*
      * Initialize confidential guest (SEV/TDX) context, if required
@@ -3437,6 +3449,21 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 
     pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
 
+    /*
+     * The enable_pmu parameter is introduced since Linux v5.17,
+     * give a chance to provide more information about vPMU
+     * enablement.
+     *
+     * The kvm.enable_pmu's permission is 0444. It does not change
+     * until a reload of the KVM module.
+     */
+    if (g_file_get_contents("/sys/module/kvm/parameters/enable_pmu",
+                            &kvm_enable_pmu, NULL, NULL)) {
+        if (*kvm_enable_pmu == 'N') {
+            kvm_pmu_disabled = true;
+        }
+    }
+
     return 0;
 }
 
-- 
2.39.3

Re: [PATCH v8 4/7] target/i386/kvm: query kvm.enable_pmu parameter

Posted by Chen, Zide 1 month, 1 week ago


On 12/29/2025 11:42 PM, Dongli Zhang wrote:
> When PMU is enabled in QEMU, there is a chance that PMU virtualization is
> completely disabled by the KVM module parameter kvm.enable_pmu=N.
> 
> The kvm.enable_pmu parameter is introduced since Linux v5.17.
> Its permission is 0444. It does not change until a reload of the KVM
> module.
> 
> Read the kvm.enable_pmu value from the module sysfs to give a chance to
> provide more information about vPMU enablement.> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> ---
> Changed since v2:
>   - Rework the code flow following Zhao's suggestion.
>   - Return error when:
>     (*kvm_enable_pmu == 'N' && X86_CPU(cpu)->enable_pmu)
> Changed since v3:
>   - Re-split the cases into enable_pmu and !enable_pmu, following Zhao's
>     suggestion.
>   - Rework the commit messages.
>   - Bring back global static variable 'kvm_pmu_disabled' from v2.
> Changed since v4:
>   - Add Reviewed-by from Zhao.
> Changed since v5:
>   - Rebase on top of most recent QEMU.
> Changed since v6:
>   - Add Reviewed-by from Dapeng Mi.
> 
>  target/i386/kvm/kvm.c | 61 +++++++++++++++++++++++++++++++------------
>  1 file changed, 44 insertions(+), 17 deletions(-)
> 
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 3b803c662d..338b9558e4 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -187,6 +187,10 @@ static int has_triple_fault_event;
>  static bool has_msr_mcg_ext_ctl;
>  
>  static int pmu_cap;
> +/*
> + * Read from /sys/module/kvm/parameters/enable_pmu.
> + */
> +static bool kvm_pmu_disabled;
>  
>  static struct kvm_cpuid2 *cpuid_cache;
>  static struct kvm_cpuid2 *hv_cpuid_cache;
> @@ -2068,23 +2072,30 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
>      if (first) {
>          first = false;
>  
> -        /*
> -         * Since Linux v5.18, KVM provides a VM-level capability to easily
> -         * disable PMUs; however, QEMU has been providing PMU property per
> -         * CPU since v1.6. In order to accommodate both, have to configure
> -         * the VM-level capability here.
> -         *
> -         * KVM_PMU_CAP_DISABLE doesn't change the PMU
> -         * behavior on Intel platform because current "pmu" property works
> -         * as expected.
> -         */
> -        if ((pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu) {
> -            ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
> -                                    KVM_PMU_CAP_DISABLE);
> -            if (ret < 0) {
> -                error_setg_errno(errp, -ret,
> -                                 "Failed to set KVM_PMU_CAP_DISABLE");
> -                return ret;
> +        if (X86_CPU(cpu)->enable_pmu) {
> +            if (kvm_pmu_disabled) {
> +                warn_report("Failed to enable PMU since "
> +                            "KVM's enable_pmu parameter is disabled");

I'm wondering about the intended value of this patch?

If enable_pmu is true in QEMU but the corresponding KVM parameter is
false, then KVM_GET_SUPPORTED_CPUID or KVM_GET_MSRS should be able to
tell that the PMU feature is not supported by host.

The logic implemented in this patch seems somewhat redundant.

Additionally, in this scenario — where the user intends to enable a
feature but the host cannot support it — normally no warning is emitted
by QEMU.

> +            }
> +        } else {
> +            /*
> +             * Since Linux v5.18, KVM provides a VM-level capability to easily
> +             * disable PMUs; however, QEMU has been providing PMU property per
> +             * CPU since v1.6. In order to accommodate both, have to configure
> +             * the VM-level capability here.
> +             *
> +             * KVM_PMU_CAP_DISABLE doesn't change the PMU
> +             * behavior on Intel platform because current "pmu" property works
> +             * as expected.
> +             */
> +            if (pmu_cap & KVM_PMU_CAP_DISABLE) {
> +                ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
> +                                        KVM_PMU_CAP_DISABLE);
> +                if (ret < 0) {
> +                    error_setg_errno(errp, -ret,
> +                                     "Failed to set KVM_PMU_CAP_DISABLE");
> +                    return ret;
> +                }
>              }
>          }
>      }
> @@ -3302,6 +3313,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>      int ret;
>      struct utsname utsname;
>      Error *local_err = NULL;
> +    g_autofree char *kvm_enable_pmu;
>  
>      /*
>       * Initialize confidential guest (SEV/TDX) context, if required
> @@ -3437,6 +3449,21 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>  
>      pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
>  
> +    /*
> +     * The enable_pmu parameter is introduced since Linux v5.17,
> +     * give a chance to provide more information about vPMU
> +     * enablement.
> +     *
> +     * The kvm.enable_pmu's permission is 0444. It does not change
> +     * until a reload of the KVM module.
> +     */
> +    if (g_file_get_contents("/sys/module/kvm/parameters/enable_pmu",
> +                            &kvm_enable_pmu, NULL, NULL)) {
> +        if (*kvm_enable_pmu == 'N') {
> +            kvm_pmu_disabled = true;

It’s generally better not to rely on KVM’s internal implementation
unless really necessary.

For example, in the new mediated vPMU framework, even if the KVM module
parameter enable_pmu is set, the per-guest kvm->arch.enable_pmu could
still be cleared.

In such a case, the logic here might not be correct.

> +        }
> +    }
> +
>      return 0;
>  }
>

Re: [PATCH v8 4/7] target/i386/kvm: query kvm.enable_pmu parameter

Posted by Dongli Zhang 1 month ago

Hi Zide,

On 1/2/26 2:59 PM, Chen, Zide wrote:
> 
> 
> On 12/29/2025 11:42 PM, Dongli Zhang wrote:

[snip]

>>  
>>  static struct kvm_cpuid2 *cpuid_cache;
>>  static struct kvm_cpuid2 *hv_cpuid_cache;
>> @@ -2068,23 +2072,30 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
>>      if (first) {
>>          first = false;
>>  
>> -        /*
>> -         * Since Linux v5.18, KVM provides a VM-level capability to easily
>> -         * disable PMUs; however, QEMU has been providing PMU property per
>> -         * CPU since v1.6. In order to accommodate both, have to configure
>> -         * the VM-level capability here.
>> -         *
>> -         * KVM_PMU_CAP_DISABLE doesn't change the PMU
>> -         * behavior on Intel platform because current "pmu" property works
>> -         * as expected.
>> -         */
>> -        if ((pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu) {
>> -            ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
>> -                                    KVM_PMU_CAP_DISABLE);
>> -            if (ret < 0) {
>> -                error_setg_errno(errp, -ret,
>> -                                 "Failed to set KVM_PMU_CAP_DISABLE");
>> -                return ret;
>> +        if (X86_CPU(cpu)->enable_pmu) {
>> +            if (kvm_pmu_disabled) {
>> +                warn_report("Failed to enable PMU since "
>> +                            "KVM's enable_pmu parameter is disabled");
> 
> I'm wondering about the intended value of this patch?
> 
> If enable_pmu is true in QEMU but the corresponding KVM parameter is
> false, then KVM_GET_SUPPORTED_CPUID or KVM_GET_MSRS should be able to
> tell that the PMU feature is not supported by host.
> 
> The logic implemented in this patch seems somewhat redundant.

For Intel, the QEMU userspace can determine if the vPMU is disabled by KVM
through the use of KVM_GET_SUPPORTED_CPUID.

However, this approach does not apply to AMD. Unlike Intel, AMD does not rely on
CPUID to detect whether PMU is supported. By default, we can assume that PMU is
always available, except for the recent PerfMonV2 feature.

The main objective of this PATCH 4/7 is to introduce the variable
'kvm_pmu_disabled', which will be reused in PATCH 5/7 to skip any PMU
initialization if the parameter is set to 'N'.

+static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
+{
+    CPUX86State *env = &cpu->env;
+
+    /*
+     * The PMU virtualization is disabled by kvm.enable_pmu=N.
+     */
+    if (kvm_pmu_disabled) {
+        return;
+    }

The 'kvm_pmu_disabled' variable is used to differentiate between the following
two scenarios on AMD:

(1) A newer KVM with KVM_PMU_CAP_DISABLE support, but explicitly disabled via
the KVM parameter ('N').

(2) An older KVM without KVM_CAP_PMU_CAPABILITY support.

In both cases, the call to KVM_CAP_PMU_CAPABILITY extension support check may
return 0.

By reading the file "/sys/module/kvm/parameters/enable_pmu", we can distinguish
between these two scenarios.

As you mentioned, another approach would be to use KVM_GET_MSRS to specifically
probe for AMD during QEMU initialization. In this case, we can set
'kvm_pmu_disabled' to true if reading the AMD PMU MSR registers fails.

To implement this, we may need to:

1. Turn this patch to be AMD specific by probing the AMD PMU registers during
initialization. We may need go create a new function in QEMU to use KVM_GET_MSRS
for probing only, or we may re-use kvm_arch_get_supported_msr_feature() or
kvm_get_one_msr(). I may change in the next version.

2. Limit the usage of 'kvm_pmu_disabled' to be AMD specific in PATCH 5/7.

> 
> Additionally, in this scenario — where the user intends to enable a
> feature but the host cannot support it — normally no warning is emitted
> by QEMU.

According to the usage of QEMU, may I assume QEMU already prints warning logs
for unsupported features? The below is an example.

QEMU 10.2.50 monitor - type 'help' for more information
qemu-system-x86_64: warning: host doesn't support requested feature:
CPUID[eax=07h,ecx=00h].EBX.hle [bit 4]
qemu-system-x86_64: warning: host doesn't support requested feature:
CPUID[eax=07h,ecx=00h].EBX.rtm [bit 11]

> 
>> +            }
>> +        } else {
>> +            /*
>> +             * Since Linux v5.18, KVM provides a VM-level capability to easily
>> +             * disable PMUs; however, QEMU has been providing PMU property per
>> +             * CPU since v1.6. In order to accommodate both, have to configure
>> +             * the VM-level capability here.
>> +             *
>> +             * KVM_PMU_CAP_DISABLE doesn't change the PMU
>> +             * behavior on Intel platform because current "pmu" property works
>> +             * as expected.
>> +             */
>> +            if (pmu_cap & KVM_PMU_CAP_DISABLE) {
>> +                ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
>> +                                        KVM_PMU_CAP_DISABLE);
>> +                if (ret < 0) {
>> +                    error_setg_errno(errp, -ret,
>> +                                     "Failed to set KVM_PMU_CAP_DISABLE");
>> +                    return ret;
>> +                }
>>              }
>>          }
>>      }
>> @@ -3302,6 +3313,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>      int ret;
>>      struct utsname utsname;
>>      Error *local_err = NULL;
>> +    g_autofree char *kvm_enable_pmu;
>>  
>>      /*
>>       * Initialize confidential guest (SEV/TDX) context, if required
>> @@ -3437,6 +3449,21 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>  
>>      pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
>>  
>> +    /*
>> +     * The enable_pmu parameter is introduced since Linux v5.17,
>> +     * give a chance to provide more information about vPMU
>> +     * enablement.
>> +     *
>> +     * The kvm.enable_pmu's permission is 0444. It does not change
>> +     * until a reload of the KVM module.
>> +     */
>> +    if (g_file_get_contents("/sys/module/kvm/parameters/enable_pmu",
>> +                            &kvm_enable_pmu, NULL, NULL)) {
>> +        if (*kvm_enable_pmu == 'N') {
>> +            kvm_pmu_disabled = true;
> 
> It’s generally better not to rely on KVM’s internal implementation
> unless really necessary.
> 
> For example, in the new mediated vPMU framework, even if the KVM module
> parameter enable_pmu is set, the per-guest kvm->arch.enable_pmu could
> still be cleared.
> 
> In such a case, the logic here might not be correct.

Would the Mediated vPMU set KVM_PMU_CAP_DISABLE to clear per-VM enable_pmu even
when the global KVM parameter enable_pmu=N is set?

In this scenario, we plan to rely on KVM_PMU_CAP_DISABLE only when the value of
"/sys/module/kvm/parameters/enable_pmu" is not equal to N.

Can I assume that this will work with Mediated vPMU?

Is there any possibility to follow the current approach before Mediated vPMU is
finalized for mainline, and later introduce an incremental change using
KVM_GET_MSRS probing? The current approach is straightforward and can work with
existing Linux kernel source code.

For quite some time, QEMU has lacked support for disabling or resetting AMD PMU
registers. If we could add this feature before Mediated vPMU is finalized, it
would benefit many existing kernel versions. This patchset solves production bugs.

Feel free to let me know your thought, while I would starting working on next
version now.

Thank you very much!

Dongli Zhang

Re: [PATCH v8 4/7] target/i386/kvm: query kvm.enable_pmu parameter

Posted by Chen, Zide 1 month ago


On 1/5/2026 12:21 PM, Dongli Zhang wrote:
> Hi Zide,
> 
> On 1/2/26 2:59 PM, Chen, Zide wrote:
>>
>>
>> On 12/29/2025 11:42 PM, Dongli Zhang wrote:
> 
> [snip]
> 
>>>  
>>>  static struct kvm_cpuid2 *cpuid_cache;
>>>  static struct kvm_cpuid2 *hv_cpuid_cache;
>>> @@ -2068,23 +2072,30 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
>>>      if (first) {
>>>          first = false;
>>>  
>>> -        /*
>>> -         * Since Linux v5.18, KVM provides a VM-level capability to easily
>>> -         * disable PMUs; however, QEMU has been providing PMU property per
>>> -         * CPU since v1.6. In order to accommodate both, have to configure
>>> -         * the VM-level capability here.
>>> -         *
>>> -         * KVM_PMU_CAP_DISABLE doesn't change the PMU
>>> -         * behavior on Intel platform because current "pmu" property works
>>> -         * as expected.
>>> -         */
>>> -        if ((pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu) {
>>> -            ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
>>> -                                    KVM_PMU_CAP_DISABLE);
>>> -            if (ret < 0) {
>>> -                error_setg_errno(errp, -ret,
>>> -                                 "Failed to set KVM_PMU_CAP_DISABLE");
>>> -                return ret;
>>> +        if (X86_CPU(cpu)->enable_pmu) {
>>> +            if (kvm_pmu_disabled) {
>>> +                warn_report("Failed to enable PMU since "
>>> +                            "KVM's enable_pmu parameter is disabled");
>>
>> I'm wondering about the intended value of this patch?
>>
>> If enable_pmu is true in QEMU but the corresponding KVM parameter is
>> false, then KVM_GET_SUPPORTED_CPUID or KVM_GET_MSRS should be able to
>> tell that the PMU feature is not supported by host.
>>
>> The logic implemented in this patch seems somewhat redundant.
> 
> For Intel, the QEMU userspace can determine if the vPMU is disabled by KVM
> through the use of KVM_GET_SUPPORTED_CPUID.
> 
> However, this approach does not apply to AMD. Unlike Intel, AMD does not rely on
> CPUID to detect whether PMU is supported. By default, we can assume that PMU is
> always available, except for the recent PerfMonV2 feature.
> 
> The main objective of this PATCH 4/7 is to introduce the variable
> 'kvm_pmu_disabled', which will be reused in PATCH 5/7 to skip any PMU
> initialization if the parameter is set to 'N'.
> 
> +static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
> +{
> +    CPUX86State *env = &cpu->env;
> +
> +    /*
> +     * The PMU virtualization is disabled by kvm.enable_pmu=N.
> +     */
> +    if (kvm_pmu_disabled) {
> +        return;
> +    }

Thanks for explanation.

> The 'kvm_pmu_disabled' variable is used to differentiate between the following
> two scenarios on AMD:
> 
> (1) A newer KVM with KVM_PMU_CAP_DISABLE support, but explicitly disabled via
> the KVM parameter ('N').
> 
> (2) An older KVM without KVM_CAP_PMU_CAPABILITY support.
> 
> In both cases, the call to KVM_CAP_PMU_CAPABILITY extension support check may
> return 0.
> 
> By reading the file "/sys/module/kvm/parameters/enable_pmu", we can distinguish
> between these two scenarios.

As described in PATCH 1/7, without issuing KVM_PMU_CAP_DISABLE, KVM has
no way to know that userspace does not intend to enable vPMU in AMD
platforms, and therefore does not fault guest accesses to PMU MSRs.

My understanding is that the issue being addressed here is basically the
opposite: QEMU does not know that vPMU is disabled by KVM.

IIUC, one difference between Intel and AMD is that AMD lacks a CPUID
leaf to indicate the availability of PMU version 1. But Intel
potentially could be in the same situation that KVM advertises PMU
availability but it's not actually supported. (e.g. kvm->arch.enable_pmu
is false while modules parameter enable_pmu is true).

From the guest’s point of view, it probes PMU MSRs to determine whether
PMU support is present and it's fine in this situation.

In userspace, QEMU may issue KVM_SET_MSRS / KVM_GET_MSRS to KVM without
knowing that vPMU has been disabled by KVM.  I think these IOCTLs should
not fail, since KVM states that “Userspace is allowed to read MSRs, and
write ‘0’ to MSRs, that KVM advertises to userspace, even if an MSR
isn’t fully supported.”

My current understanding is that AMD should be fine even without
kvm_pmu_disabled, but I may be missing some context here.

The bottom line is this patch doesn't handle the cases that KVM still
could disable vPMU support even if enable_pmu is true.


> As you mentioned, another approach would be to use KVM_GET_MSRS to specifically
> probe for AMD during QEMU initialization. In this case, we can set
> 'kvm_pmu_disabled' to true if reading the AMD PMU MSR registers fails.
> 
> To implement this, we may need to:
> 
> 1. Turn this patch to be AMD specific by probing the AMD PMU registers during
> initialization. We may need go create a new function in QEMU to use KVM_GET_MSRS
> for probing only, or we may re-use kvm_arch_get_supported_msr_feature() or
> kvm_get_one_msr(). I may change in the next version.
> 
> 2. Limit the usage of 'kvm_pmu_disabled' to be AMD specific in PATCH 5/7.

I guess this might make things more complicated.

>>
>> Additionally, in this scenario — where the user intends to enable a
>> feature but the host cannot support it — normally no warning is emitted
>> by QEMU.
> 
> According to the usage of QEMU, may I assume QEMU already prints warning logs
> for unsupported features? The below is an example.
> 
> QEMU 10.2.50 monitor - type 'help' for more information
> qemu-system-x86_64: warning: host doesn't support requested feature:
> CPUID[eax=07h,ecx=00h].EBX.hle [bit 4]
> qemu-system-x86_64: warning: host doesn't support requested feature:
> CPUID[eax=07h,ecx=00h].EBX.rtm [bit 11]
> 
>>
>>> +            }
>>> +        } else {
>>> +            /*
>>> +             * Since Linux v5.18, KVM provides a VM-level capability to easily
>>> +             * disable PMUs; however, QEMU has been providing PMU property per
>>> +             * CPU since v1.6. In order to accommodate both, have to configure
>>> +             * the VM-level capability here.
>>> +             *
>>> +             * KVM_PMU_CAP_DISABLE doesn't change the PMU
>>> +             * behavior on Intel platform because current "pmu" property works
>>> +             * as expected.
>>> +             */
>>> +            if (pmu_cap & KVM_PMU_CAP_DISABLE) {
>>> +                ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
>>> +                                        KVM_PMU_CAP_DISABLE);
>>> +                if (ret < 0) {
>>> +                    error_setg_errno(errp, -ret,
>>> +                                     "Failed to set KVM_PMU_CAP_DISABLE");
>>> +                    return ret;
>>> +                }
>>>              }
>>>          }
>>>      }
>>> @@ -3302,6 +3313,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>>      int ret;
>>>      struct utsname utsname;
>>>      Error *local_err = NULL;
>>> +    g_autofree char *kvm_enable_pmu;
>>>  
>>>      /*
>>>       * Initialize confidential guest (SEV/TDX) context, if required
>>> @@ -3437,6 +3449,21 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>>  
>>>      pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
>>>  
>>> +    /*
>>> +     * The enable_pmu parameter is introduced since Linux v5.17,
>>> +     * give a chance to provide more information about vPMU
>>> +     * enablement.
>>> +     *
>>> +     * The kvm.enable_pmu's permission is 0444. It does not change
>>> +     * until a reload of the KVM module.
>>> +     */
>>> +    if (g_file_get_contents("/sys/module/kvm/parameters/enable_pmu",
>>> +                            &kvm_enable_pmu, NULL, NULL)) {
>>> +        if (*kvm_enable_pmu == 'N') {
>>> +            kvm_pmu_disabled = true;
>>
>> It’s generally better not to rely on KVM’s internal implementation
>> unless really necessary.
>>
>> For example, in the new mediated vPMU framework, even if the KVM module
>> parameter enable_pmu is set, the per-guest kvm->arch.enable_pmu could
>> still be cleared.
>>
>> In such a case, the logic here might not be correct.
> 
> Would the Mediated vPMU set KVM_PMU_CAP_DISABLE to clear per-VM enable_pmu even
> when the global KVM parameter enable_pmu=N is set?
> 
> In this scenario, we plan to rely on KVM_PMU_CAP_DISABLE only when the value of
> "/sys/module/kvm/parameters/enable_pmu" is not equal to N.
> 
> Can I assume that this will work with Mediated vPMU?
> 
> 
> Is there any possibility to follow the current approach before Mediated vPMU is
> finalized for mainline, and later introduce an incremental change using
> KVM_GET_MSRS probing? The current approach is straightforward and can work with
> existing Linux kernel source code.

Apologies for the incorrect statement I made earlier regarding mediated
vPMU.

According to the mediated vPMU v6, the only behavior specific to
mediated vPMU is that kvm->arch.enable_pmu may be cleared when
irqchip_in_kernel() is not true:
https://lore.kernel.org/all/20251206001720.468579-17-seanjc@google.com/

However, this does not imply that mediated vPMU requires any special
handling here. In theory, KVM could clear kvm->arch.enable_pmu in the
future for other reasons.


> For quite some time, QEMU has lacked support for disabling or resetting AMD PMU
> registers. If we could add this feature before Mediated vPMU is finalized, it
> would benefit many existing kernel versions. This patchset solves production bugs.
> 
> 
> Feel free to let me know your thought, while I would starting working on next
> version now.
> 
> Thank you very much!
> 
> Dongli Zhang
> 
>

Re: [PATCH v8 4/7] target/i386/kvm: query kvm.enable_pmu parameter

Posted by Dongli Zhang 1 month ago

Hi Zide,

On 1/6/26 1:03 PM, Chen, Zide wrote:
> 
> 
> On 1/5/2026 12:21 PM, Dongli Zhang wrote:
>> Hi Zide,
>>
>> On 1/2/26 2:59 PM, Chen, Zide wrote:
>>>
>>>
>>> On 12/29/2025 11:42 PM, Dongli Zhang wrote:
>>
>> [snip]
>>
>>>>  
>>>>  static struct kvm_cpuid2 *cpuid_cache;
>>>>  static struct kvm_cpuid2 *hv_cpuid_cache;
>>>> @@ -2068,23 +2072,30 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
>>>>      if (first) {
>>>>          first = false;
>>>>  
>>>> -        /*
>>>> -         * Since Linux v5.18, KVM provides a VM-level capability to easily
>>>> -         * disable PMUs; however, QEMU has been providing PMU property per
>>>> -         * CPU since v1.6. In order to accommodate both, have to configure
>>>> -         * the VM-level capability here.
>>>> -         *
>>>> -         * KVM_PMU_CAP_DISABLE doesn't change the PMU
>>>> -         * behavior on Intel platform because current "pmu" property works
>>>> -         * as expected.
>>>> -         */
>>>> -        if ((pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu) {
>>>> -            ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
>>>> -                                    KVM_PMU_CAP_DISABLE);
>>>> -            if (ret < 0) {
>>>> -                error_setg_errno(errp, -ret,
>>>> -                                 "Failed to set KVM_PMU_CAP_DISABLE");
>>>> -                return ret;
>>>> +        if (X86_CPU(cpu)->enable_pmu) {
>>>> +            if (kvm_pmu_disabled) {
>>>> +                warn_report("Failed to enable PMU since "
>>>> +                            "KVM's enable_pmu parameter is disabled");
>>>
>>> I'm wondering about the intended value of this patch?
>>>
>>> If enable_pmu is true in QEMU but the corresponding KVM parameter is
>>> false, then KVM_GET_SUPPORTED_CPUID or KVM_GET_MSRS should be able to
>>> tell that the PMU feature is not supported by host.
>>>
>>> The logic implemented in this patch seems somewhat redundant.
>>
>> For Intel, the QEMU userspace can determine if the vPMU is disabled by KVM
>> through the use of KVM_GET_SUPPORTED_CPUID.
>>
>> However, this approach does not apply to AMD. Unlike Intel, AMD does not rely on
>> CPUID to detect whether PMU is supported. By default, we can assume that PMU is
>> always available, except for the recent PerfMonV2 feature.
>>
>> The main objective of this PATCH 4/7 is to introduce the variable
>> 'kvm_pmu_disabled', which will be reused in PATCH 5/7 to skip any PMU
>> initialization if the parameter is set to 'N'.
>>
>> +static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
>> +{
>> +    CPUX86State *env = &cpu->env;
>> +
>> +    /*
>> +     * The PMU virtualization is disabled by kvm.enable_pmu=N.
>> +     */
>> +    if (kvm_pmu_disabled) {
>> +        return;
>> +    }
> 
> Thanks for explanation.
> 
>> The 'kvm_pmu_disabled' variable is used to differentiate between the following
>> two scenarios on AMD:
>>
>> (1) A newer KVM with KVM_PMU_CAP_DISABLE support, but explicitly disabled via
>> the KVM parameter ('N').
>>
>> (2) An older KVM without KVM_CAP_PMU_CAPABILITY support.
>>
>> In both cases, the call to KVM_CAP_PMU_CAPABILITY extension support check may
>> return 0.
>>
>> By reading the file "/sys/module/kvm/parameters/enable_pmu", we can distinguish
>> between these two scenarios.
> 
> As described in PATCH 1/7, without issuing KVM_PMU_CAP_DISABLE, KVM has
> no way to know that userspace does not intend to enable vPMU in AMD
> platforms, and therefore does not fault guest accesses to PMU MSRs.
> 
> My understanding is that the issue being addressed here is basically the
> opposite: QEMU does not know that vPMU is disabled by KVM.

Exactly.

Otherwise, QEMU issues unwanted MSR writes for every vCPU during QEMU reset.

> 
> IIUC, one difference between Intel and AMD is that AMD lacks a CPUID
> leaf to indicate the availability of PMU version 1. But Intel
> potentially could be in the same situation that KVM advertises PMU
> availability but it's not actually supported. (e.g. kvm->arch.enable_pmu
> is false while modules parameter enable_pmu is true).
> 
> From the guest’s point of view, it probes PMU MSRs to determine whether
> PMU support is present and it's fine in this situation.
> 
> In userspace, QEMU may issue KVM_SET_MSRS / KVM_GET_MSRS to KVM without
> knowing that vPMU has been disabled by KVM.  I think these IOCTLs should
> not fail, since KVM states that “Userspace is allowed to read MSRs, and
> write ‘0’ to MSRs, that KVM advertises to userspace, even if an MSR
> isn’t fully supported.”
> 
> My current understanding is that AMD should be fine even without
> kvm_pmu_disabled, but I may be missing some context here.
> 
> The bottom line is this patch doesn't handle the cases that KVM still
> could disable vPMU support even if enable_pmu is true.

Yes. There are still unwanted PMU MSR writes from QEMU. This just seems odd.

The concern with unwanted MSR writes was initially raised by Maksim Davydov:

https://lore.kernel.org/qemu-devel/a7f9c3c9-09af-4941-b137-2cb83ef8ceb3@yandex-team.ru/

As shown below on the v6.0 KVM hypervisor (AMD), while there are no errors from
QEMU, numerous annoying warnings are generated. (If I recall correctly, this can
also be triggered from the VM itself.)

However, here the logs are not only due to vcpu0, but indeed every vcpu.

[  280.802976] kvm_set_msr_common: 1910 callbacks suppressed
[  280.802981] kvm [18411]: vcpu0, guest rIP: 0xffffffffa4c97844 disabled
perfctr wrmsr: 0xc0010007 data 0xffff
[  295.345747] kvm [18411]: vcpu0, guest rIP: 0xfff0 disabled perfctr wrmsr:
0xc0010004 data 0x0
[  295.355379] kvm [18411]: vcpu0, guest rIP: 0xfff0 disabled perfctr wrmsr:
0xc0010005 data 0x0
[  295.364997] kvm [18411]: vcpu0, guest rIP: 0xfff0 disabled perfctr wrmsr:
0xc0010006 data 0x0
[  295.374618] kvm [18411]: vcpu0, guest rIP: 0xfff0 disabled perfctr wrmsr:
0xc0010007 data 0x0
[  295.385048] kvm [18411]: vcpu1, guest rIP: 0xfff0 disabled perfctr wrmsr:
0xc0010004 data 0x0
[  295.394694] kvm [18411]: vcpu1, guest rIP: 0xfff0 disabled perfctr wrmsr:
0xc0010005 data 0x0
[  295.404317] kvm [18411]: vcpu1, guest rIP: 0xfff0 disabled perfctr wrmsr:
0xc0010006 data 0x0
[  295.413928] kvm [18411]: vcpu1, guest rIP: 0xfff0 disabled perfctr wrmsr:
0xc0010007 data 0x0
[  295.424319] kvm [18411]: vcpu2, guest rIP: 0xfff0 disabled perfctr wrmsr:
0xc0010004 data 0x0
[  295.433963] kvm [18411]: vcpu2, guest rIP: 0xfff0 disabled perfctr wrmsr:
0xc0010005 data 0x0
[  301.966571] kvm_set_msr_common: 1910 callbacks suppressed
[  301.966577] kvm [18411]: vcpu0, guest rIP: 0xffffffff8ac97844 disabled
perfctr wrmsr: 0xc0010007 data 0xffff

> 
> 
>> As you mentioned, another approach would be to use KVM_GET_MSRS to specifically
>> probe for AMD during QEMU initialization. In this case, we can set
>> 'kvm_pmu_disabled' to true if reading the AMD PMU MSR registers fails.
>>
>> To implement this, we may need to:
>>
>> 1. Turn this patch to be AMD specific by probing the AMD PMU registers during
>> initialization. We may need go create a new function in QEMU to use KVM_GET_MSRS
>> for probing only, or we may re-use kvm_arch_get_supported_msr_feature() or
>> kvm_get_one_msr(). I may change in the next version.
>>
>> 2. Limit the usage of 'kvm_pmu_disabled' to be AMD specific in PATCH 5/7.
> 
> I guess this might make things more complicated.
> 
>>>
>>> Additionally, in this scenario — where the user intends to enable a
>>> feature but the host cannot support it — normally no warning is emitted
>>> by QEMU.
>>
>> According to the usage of QEMU, may I assume QEMU already prints warning logs
>> for unsupported features? The below is an example.
>>
>> QEMU 10.2.50 monitor - type 'help' for more information
>> qemu-system-x86_64: warning: host doesn't support requested feature:
>> CPUID[eax=07h,ecx=00h].EBX.hle [bit 4]
>> qemu-system-x86_64: warning: host doesn't support requested feature:
>> CPUID[eax=07h,ecx=00h].EBX.rtm [bit 11]
>>
>>>
>>>> +            }
>>>> +        } else {
>>>> +            /*
>>>> +             * Since Linux v5.18, KVM provides a VM-level capability to easily
>>>> +             * disable PMUs; however, QEMU has been providing PMU property per
>>>> +             * CPU since v1.6. In order to accommodate both, have to configure
>>>> +             * the VM-level capability here.
>>>> +             *
>>>> +             * KVM_PMU_CAP_DISABLE doesn't change the PMU
>>>> +             * behavior on Intel platform because current "pmu" property works
>>>> +             * as expected.
>>>> +             */
>>>> +            if (pmu_cap & KVM_PMU_CAP_DISABLE) {
>>>> +                ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
>>>> +                                        KVM_PMU_CAP_DISABLE);
>>>> +                if (ret < 0) {
>>>> +                    error_setg_errno(errp, -ret,
>>>> +                                     "Failed to set KVM_PMU_CAP_DISABLE");
>>>> +                    return ret;
>>>> +                }
>>>>              }
>>>>          }
>>>>      }
>>>> @@ -3302,6 +3313,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>>>      int ret;
>>>>      struct utsname utsname;
>>>>      Error *local_err = NULL;
>>>> +    g_autofree char *kvm_enable_pmu;
>>>>  
>>>>      /*
>>>>       * Initialize confidential guest (SEV/TDX) context, if required
>>>> @@ -3437,6 +3449,21 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>>>  
>>>>      pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
>>>>  
>>>> +    /*
>>>> +     * The enable_pmu parameter is introduced since Linux v5.17,
>>>> +     * give a chance to provide more information about vPMU
>>>> +     * enablement.
>>>> +     *
>>>> +     * The kvm.enable_pmu's permission is 0444. It does not change
>>>> +     * until a reload of the KVM module.
>>>> +     */
>>>> +    if (g_file_get_contents("/sys/module/kvm/parameters/enable_pmu",
>>>> +                            &kvm_enable_pmu, NULL, NULL)) {
>>>> +        if (*kvm_enable_pmu == 'N') {
>>>> +            kvm_pmu_disabled = true;
>>>
>>> It’s generally better not to rely on KVM’s internal implementation
>>> unless really necessary.
>>>
>>> For example, in the new mediated vPMU framework, even if the KVM module
>>> parameter enable_pmu is set, the per-guest kvm->arch.enable_pmu could
>>> still be cleared.
>>>
>>> In such a case, the logic here might not be correct.
>>
>> Would the Mediated vPMU set KVM_PMU_CAP_DISABLE to clear per-VM enable_pmu even
>> when the global KVM parameter enable_pmu=N is set?
>>
>> In this scenario, we plan to rely on KVM_PMU_CAP_DISABLE only when the value of
>> "/sys/module/kvm/parameters/enable_pmu" is not equal to N.
>>
>> Can I assume that this will work with Mediated vPMU?
>>
>>
>> Is there any possibility to follow the current approach before Mediated vPMU is
>> finalized for mainline, and later introduce an incremental change using
>> KVM_GET_MSRS probing? The current approach is straightforward and can work with
>> existing Linux kernel source code.
> 
> Apologies for the incorrect statement I made earlier regarding mediated
> vPMU.
> 
> According to the mediated vPMU v6, the only behavior specific to
> mediated vPMU is that kvm->arch.enable_pmu may be cleared when
> irqchip_in_kernel() is not true:
> https://urldefense.com/v3/__https://lore.kernel.org/all/20251206001720.468579-17-seanjc@google.com/__;!!ACWV5N9M2RV99hQ!IS8XQG3Zx84utP2QScNlxp-0H5JAgr89lBb1j2oGVJJop3WMyK6X2I5mlerMPA06wkJy8VFd1x6XGEHc6kWn$ 
> 
> However, this does not imply that mediated vPMU requires any special
> handling here. In theory, KVM could clear kvm->arch.enable_pmu in the
> future for other reasons.
> 

While "KVM could clear kvm->arch.enable_pmu in the future," I don't think KVM
may set kvm->arch.enable_pmu if the global enable_pmu is set to 'N'.

Taking Intel VMX EPT as an example, once "/sys/module/kvm_intel/parameters/ept"
is globally disabled, there's no way within KVM software to enable it for any
guest VM **after** 'ept' is set to N.

Similarly, "/sys/module/kvm/parameters/enable_pmu=N" indicates that this KVM
host will not support PMU virtualization in any way. Therefore, there should be
no way to enable vPMU for any guest VM if the global parameter is set to 'N'.
Here we read from ths parameter only during QEMU initialization.

That's why I believe it's reliable to trust the setting when
"/sys/module/kvm/parameters/enable_pmu=N".

In this way, we can avoid many unnecessary MSR writes, especially in cases where
a VM has 300+ vCPUs, even though these may be equivalent to NOPs with
optimizations in more recent KVM versions.

The objective isn't to improve performance. Minimizing the number of unwanted
MSR writes from QEMU reduces the chances of failure (e.g., due to any QEMU
software bug). We can simply avoid those unwanted MSR/NOPs by reading from a KVM
parameter.

From a user's perspective, this just seems odd.
"/sys/module/kvm/parameters/enable_pmu=N" is a reliable setting. If there's a
configuration mismatch between QEMU and KVM, a warning could alert the user.

I can remove this patch, along with the 'kvm_pmu_disabled' variable.

Thank you very much!

Dongli Zhang

Re: [PATCH v8 4/7] target/i386/kvm: query kvm.enable_pmu parameter

Posted by Chen, Zide 1 month ago


On 1/7/2026 12:05 AM, Dongli Zhang wrote:
> Hi Zide,
> 
> On 1/6/26 1:03 PM, Chen, Zide wrote:
>>
>>
>> On 1/5/2026 12:21 PM, Dongli Zhang wrote:
>>> Hi Zide,
>>>
>>> On 1/2/26 2:59 PM, Chen, Zide wrote:
>>>>
>>>>
>>>> On 12/29/2025 11:42 PM, Dongli Zhang wrote:
>>>
>>> [snip]
>>>
>>>>>  
>>>>>  static struct kvm_cpuid2 *cpuid_cache;
>>>>>  static struct kvm_cpuid2 *hv_cpuid_cache;
>>>>> @@ -2068,23 +2072,30 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
>>>>>      if (first) {
>>>>>          first = false;
>>>>>  
>>>>> -        /*
>>>>> -         * Since Linux v5.18, KVM provides a VM-level capability to easily
>>>>> -         * disable PMUs; however, QEMU has been providing PMU property per
>>>>> -         * CPU since v1.6. In order to accommodate both, have to configure
>>>>> -         * the VM-level capability here.
>>>>> -         *
>>>>> -         * KVM_PMU_CAP_DISABLE doesn't change the PMU
>>>>> -         * behavior on Intel platform because current "pmu" property works
>>>>> -         * as expected.
>>>>> -         */
>>>>> -        if ((pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu) {
>>>>> -            ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
>>>>> -                                    KVM_PMU_CAP_DISABLE);
>>>>> -            if (ret < 0) {
>>>>> -                error_setg_errno(errp, -ret,
>>>>> -                                 "Failed to set KVM_PMU_CAP_DISABLE");
>>>>> -                return ret;
>>>>> +        if (X86_CPU(cpu)->enable_pmu) {
>>>>> +            if (kvm_pmu_disabled) {
>>>>> +                warn_report("Failed to enable PMU since "
>>>>> +                            "KVM's enable_pmu parameter is disabled");
>>>>
>>>> I'm wondering about the intended value of this patch?
>>>>
>>>> If enable_pmu is true in QEMU but the corresponding KVM parameter is
>>>> false, then KVM_GET_SUPPORTED_CPUID or KVM_GET_MSRS should be able to
>>>> tell that the PMU feature is not supported by host.
>>>>
>>>> The logic implemented in this patch seems somewhat redundant.
>>>
>>> For Intel, the QEMU userspace can determine if the vPMU is disabled by KVM
>>> through the use of KVM_GET_SUPPORTED_CPUID.
>>>
>>> However, this approach does not apply to AMD. Unlike Intel, AMD does not rely on
>>> CPUID to detect whether PMU is supported. By default, we can assume that PMU is
>>> always available, except for the recent PerfMonV2 feature.
>>>
>>> The main objective of this PATCH 4/7 is to introduce the variable
>>> 'kvm_pmu_disabled', which will be reused in PATCH 5/7 to skip any PMU
>>> initialization if the parameter is set to 'N'.
>>>
>>> +static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
>>> +{
>>> +    CPUX86State *env = &cpu->env;
>>> +
>>> +    /*
>>> +     * The PMU virtualization is disabled by kvm.enable_pmu=N.
>>> +     */
>>> +    if (kvm_pmu_disabled) {
>>> +        return;
>>> +    }
>>
>> Thanks for explanation.
>>
>>> The 'kvm_pmu_disabled' variable is used to differentiate between the following
>>> two scenarios on AMD:
>>>
>>> (1) A newer KVM with KVM_PMU_CAP_DISABLE support, but explicitly disabled via
>>> the KVM parameter ('N').
>>>
>>> (2) An older KVM without KVM_CAP_PMU_CAPABILITY support.
>>>
>>> In both cases, the call to KVM_CAP_PMU_CAPABILITY extension support check may
>>> return 0.
>>>
>>> By reading the file "/sys/module/kvm/parameters/enable_pmu", we can distinguish
>>> between these two scenarios.
>>
>> As described in PATCH 1/7, without issuing KVM_PMU_CAP_DISABLE, KVM has
>> no way to know that userspace does not intend to enable vPMU in AMD
>> platforms, and therefore does not fault guest accesses to PMU MSRs.
>>
>> My understanding is that the issue being addressed here is basically the
>> opposite: QEMU does not know that vPMU is disabled by KVM.
> 
> Exactly.
> 
> Otherwise, QEMU issues unwanted MSR writes for every vCPU during QEMU reset.
> 
>>
>> IIUC, one difference between Intel and AMD is that AMD lacks a CPUID
>> leaf to indicate the availability of PMU version 1. But Intel
>> potentially could be in the same situation that KVM advertises PMU
>> availability but it's not actually supported. (e.g. kvm->arch.enable_pmu
>> is false while modules parameter enable_pmu is true).
>>
>> From the guest’s point of view, it probes PMU MSRs to determine whether
>> PMU support is present and it's fine in this situation.
>>
>> In userspace, QEMU may issue KVM_SET_MSRS / KVM_GET_MSRS to KVM without
>> knowing that vPMU has been disabled by KVM.  I think these IOCTLs should
>> not fail, since KVM states that “Userspace is allowed to read MSRs, and
>> write ‘0’ to MSRs, that KVM advertises to userspace, even if an MSR
>> isn’t fully supported.”
>>
>> My current understanding is that AMD should be fine even without
>> kvm_pmu_disabled, but I may be missing some context here.
>>
>> The bottom line is this patch doesn't handle the cases that KVM still
>> could disable vPMU support even if enable_pmu is true.
> 
> Yes. There are still unwanted PMU MSR writes from QEMU. This just seems odd.
> 
> The concern with unwanted MSR writes was initially raised by Maksim Davydov:
> 
> https://lore.kernel.org/qemu-devel/a7f9c3c9-09af-4941-b137-2cb83ef8ceb3@yandex-team.ru/
> 
> As shown below on the v6.0 KVM hypervisor (AMD), while there are no errors from
> QEMU, numerous annoying warnings are generated. (If I recall correctly, this can
> also be triggered from the VM itself.)
> 
> However, here the logs are not only due to vcpu0, but indeed every vcpu.
> 
> [  280.802976] kvm_set_msr_common: 1910 callbacks suppressed
> [  280.802981] kvm [18411]: vcpu0, guest rIP: 0xffffffffa4c97844 disabled
> perfctr wrmsr: 0xc0010007 data 0xffff
> [  295.345747] kvm [18411]: vcpu0, guest rIP: 0xfff0 disabled perfctr wrmsr:
> 0xc0010004 data 0x0
> [  295.355379] kvm [18411]: vcpu0, guest rIP: 0xfff0 disabled perfctr wrmsr:
> 0xc0010005 data 0x0
> [  295.364997] kvm [18411]: vcpu0, guest rIP: 0xfff0 disabled perfctr wrmsr:
> 0xc0010006 data 0x0
> [  295.374618] kvm [18411]: vcpu0, guest rIP: 0xfff0 disabled perfctr wrmsr:
> 0xc0010007 data 0x0
> [  295.385048] kvm [18411]: vcpu1, guest rIP: 0xfff0 disabled perfctr wrmsr:
> 0xc0010004 data 0x0
> [  295.394694] kvm [18411]: vcpu1, guest rIP: 0xfff0 disabled perfctr wrmsr:
> 0xc0010005 data 0x0
> [  295.404317] kvm [18411]: vcpu1, guest rIP: 0xfff0 disabled perfctr wrmsr:
> 0xc0010006 data 0x0
> [  295.413928] kvm [18411]: vcpu1, guest rIP: 0xfff0 disabled perfctr wrmsr:
> 0xc0010007 data 0x0
> [  295.424319] kvm [18411]: vcpu2, guest rIP: 0xfff0 disabled perfctr wrmsr:
> 0xc0010004 data 0x0
> [  295.433963] kvm [18411]: vcpu2, guest rIP: 0xfff0 disabled perfctr wrmsr:
> 0xc0010005 data 0x0
> [  301.966571] kvm_set_msr_common: 1910 callbacks suppressed
> [  301.966577] kvm [18411]: vcpu0, guest rIP: 0xffffffff8ac97844 disabled
> perfctr wrmsr: 0xc0010007 data 0xffff

In e76ae52747a8 ("KVM: x86/pmu: Gate all "unimplemented MSR" prints on
report_ignored_msrs"), in "disabled perfctr wrmsr" case, vcpu_unimpl()
is no longer forced for counter MSRs, so most of the above warnings go away.

For the remaining warnings, vcpu_unimpl() is ratelimitedm and plus all
these logs can be removed by setting report_ignored_msrs=false.

So, it should be not that bad now.

>>
>>
>>> As you mentioned, another approach would be to use KVM_GET_MSRS to specifically
>>> probe for AMD during QEMU initialization. In this case, we can set
>>> 'kvm_pmu_disabled' to true if reading the AMD PMU MSR registers fails.
>>>
>>> To implement this, we may need to:
>>>
>>> 1. Turn this patch to be AMD specific by probing the AMD PMU registers during
>>> initialization. We may need go create a new function in QEMU to use KVM_GET_MSRS
>>> for probing only, or we may re-use kvm_arch_get_supported_msr_feature() or
>>> kvm_get_one_msr(). I may change in the next version.
>>>
>>> 2. Limit the usage of 'kvm_pmu_disabled' to be AMD specific in PATCH 5/7.
>>
>> I guess this might make things more complicated.
>>
>>>>
>>>> Additionally, in this scenario — where the user intends to enable a
>>>> feature but the host cannot support it — normally no warning is emitted
>>>> by QEMU.
>>>
>>> According to the usage of QEMU, may I assume QEMU already prints warning logs
>>> for unsupported features? The below is an example.
>>>
>>> QEMU 10.2.50 monitor - type 'help' for more information
>>> qemu-system-x86_64: warning: host doesn't support requested feature:
>>> CPUID[eax=07h,ecx=00h].EBX.hle [bit 4]
>>> qemu-system-x86_64: warning: host doesn't support requested feature:
>>> CPUID[eax=07h,ecx=00h].EBX.rtm [bit 11]
>>>
>>>>
>>>>> +            }
>>>>> +        } else {
>>>>> +            /*
>>>>> +             * Since Linux v5.18, KVM provides a VM-level capability to easily
>>>>> +             * disable PMUs; however, QEMU has been providing PMU property per
>>>>> +             * CPU since v1.6. In order to accommodate both, have to configure
>>>>> +             * the VM-level capability here.
>>>>> +             *
>>>>> +             * KVM_PMU_CAP_DISABLE doesn't change the PMU
>>>>> +             * behavior on Intel platform because current "pmu" property works
>>>>> +             * as expected.
>>>>> +             */
>>>>> +            if (pmu_cap & KVM_PMU_CAP_DISABLE) {
>>>>> +                ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
>>>>> +                                        KVM_PMU_CAP_DISABLE);
>>>>> +                if (ret < 0) {
>>>>> +                    error_setg_errno(errp, -ret,
>>>>> +                                     "Failed to set KVM_PMU_CAP_DISABLE");
>>>>> +                    return ret;
>>>>> +                }
>>>>>              }
>>>>>          }
>>>>>      }
>>>>> @@ -3302,6 +3313,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>>>>      int ret;
>>>>>      struct utsname utsname;
>>>>>      Error *local_err = NULL;
>>>>> +    g_autofree char *kvm_enable_pmu;
>>>>>  
>>>>>      /*
>>>>>       * Initialize confidential guest (SEV/TDX) context, if required
>>>>> @@ -3437,6 +3449,21 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>>>>  
>>>>>      pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
>>>>>  
>>>>> +    /*
>>>>> +     * The enable_pmu parameter is introduced since Linux v5.17,
>>>>> +     * give a chance to provide more information about vPMU
>>>>> +     * enablement.
>>>>> +     *
>>>>> +     * The kvm.enable_pmu's permission is 0444. It does not change
>>>>> +     * until a reload of the KVM module.
>>>>> +     */
>>>>> +    if (g_file_get_contents("/sys/module/kvm/parameters/enable_pmu",
>>>>> +                            &kvm_enable_pmu, NULL, NULL)) {
>>>>> +        if (*kvm_enable_pmu == 'N') {
>>>>> +            kvm_pmu_disabled = true;
>>>>
>>>> It’s generally better not to rely on KVM’s internal implementation
>>>> unless really necessary.
>>>>
>>>> For example, in the new mediated vPMU framework, even if the KVM module
>>>> parameter enable_pmu is set, the per-guest kvm->arch.enable_pmu could
>>>> still be cleared.
>>>>
>>>> In such a case, the logic here might not be correct.
>>>
>>> Would the Mediated vPMU set KVM_PMU_CAP_DISABLE to clear per-VM enable_pmu even
>>> when the global KVM parameter enable_pmu=N is set?
>>>
>>> In this scenario, we plan to rely on KVM_PMU_CAP_DISABLE only when the value of
>>> "/sys/module/kvm/parameters/enable_pmu" is not equal to N.
>>>
>>> Can I assume that this will work with Mediated vPMU?
>>>
>>>
>>> Is there any possibility to follow the current approach before Mediated vPMU is
>>> finalized for mainline, and later introduce an incremental change using
>>> KVM_GET_MSRS probing? The current approach is straightforward and can work with
>>> existing Linux kernel source code.
>>
>> Apologies for the incorrect statement I made earlier regarding mediated
>> vPMU.
>>
>> According to the mediated vPMU v6, the only behavior specific to
>> mediated vPMU is that kvm->arch.enable_pmu may be cleared when
>> irqchip_in_kernel() is not true:
>> https://urldefense.com/v3/__https://lore.kernel.org/all/20251206001720.468579-17-seanjc@google.com/__;!!ACWV5N9M2RV99hQ!IS8XQG3Zx84utP2QScNlxp-0H5JAgr89lBb1j2oGVJJop3WMyK6X2I5mlerMPA06wkJy8VFd1x6XGEHc6kWn$ 
>>
>> However, this does not imply that mediated vPMU requires any special
>> handling here. In theory, KVM could clear kvm->arch.enable_pmu in the
>> future for other reasons.
>>
> 
> While "KVM could clear kvm->arch.enable_pmu in the future," I don't think KVM
> may set kvm->arch.enable_pmu if the global enable_pmu is set to 'N'.
> 
> Taking Intel VMX EPT as an example, once "/sys/module/kvm_intel/parameters/ept"
> is globally disabled, there's no way within KVM software to enable it for any
> guest VM **after** 'ept' is set to N.
> 
> Similarly, "/sys/module/kvm/parameters/enable_pmu=N" indicates that this KVM
> host will not support PMU virtualization in any way. Therefore, there should be
> no way to enable vPMU for any guest VM if the global parameter is set to 'N'.
> Here we read from ths parameter only during QEMU initialization.
> 
> That's why I believe it's reliable to trust the setting when
> "/sys/module/kvm/parameters/enable_pmu=N".
> 
> In this way, we can avoid many unnecessary MSR writes, especially in cases where
> a VM has 300+ vCPUs, even though these may be equivalent to NOPs with
> optimizations in more recent KVM versions.
> 
> The objective isn't to improve performance. Minimizing the number of unwanted
> MSR writes from QEMU reduces the chances of failure (e.g., due to any QEMU
> software bug). We can simply avoid those unwanted MSR/NOPs by reading from a KVM
> parameter.
> 
> From a user's perspective, this just seems odd.
> "/sys/module/kvm/parameters/enable_pmu=N" is a reliable setting. If there's a
> configuration mismatch between QEMU and KVM, a warning could alert the user.
> 
> I can remove this patch, along with the 'kvm_pmu_disabled' variable.


Even with /sys/module/kvm/parameters/enable_pmu=Y, theoretically it's
possible for kvm->arch.enable_pmu to be false. In such a case, vPMU
could still be advertised, and QEMU doens't know that vPMU is not
supported by KVM, on either Intel or AMD platforms.

Anyway, this is likely only a theoretical scenario and may not actually
happens in practice.


> Thank you very much!
> 
> Dongli Zhang