[PATCH v2 0/2] target/i386/kvm: fix two svm pmu virtualization bugs

Dongli Zhang posted 2 patches 1 year, 4 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20221202002256.39243-1-dongli.zhang@oracle.com
Maintainers: Paolo Bonzini <pbonzini@redhat.com>, Marcelo Tosatti <mtosatti@redhat.com>
There is a newer version of this series
[PATCH v2 0/2] target/i386/kvm: fix two svm pmu virtualization bugs
Posted by Dongli Zhang 1 year, 4 months ago
This patchset fixes two svm pmu virtualization bugs (x86 only).

version 1:
https://lore.kernel.org/all/20221119122901.2469-1-dongli.zhang@oracle.com/

1. The 1st bug is that "-cpu host,-pmu" cannot disable svm pmu virtualization.

Neither "-cpu EPYC" nor "-cpu host,-pmu" disables the pmu
virtualization. The Linux guest still prints the line below ...

[    0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.

... although we expect something like the following.

[    0.596381] Performance Events: PMU not available due to virtualization, using software events only.
[    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled

The 1st patch introduces a new x86-only accel/kvm property
"pmu-cap-disabled=true" to disable the pmu virtualization via
KVM_PMU_CAP_DISABLE.

I initially considered 'KVM_X86_SET_MSR_FILTER' before patchset v1.
Both KVM_X86_SET_MSR_FILTER and KVM_PMU_CAP_DISABLE are set through VM
ioctls, so I finally chose the latter because it is easier to use.
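
For reference, a minimal sketch of how userspace can request
KVM_PMU_CAP_DISABLE. This is not the code of the 1st patch: the function
names fill_pmu_disable_cap() and vm_disable_pmu() are hypothetical, and the
sketch assumes the KVM_CAP_PMU_CAPABILITY interface described in the KVM API
documentation, where the capability is enabled on the VM fd and, as far as I
understand, has to be set before any vCPU is created.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Fallbacks for older kernel headers (values as in the upstream uapi). */
#ifndef KVM_CAP_PMU_CAPABILITY
#define KVM_CAP_PMU_CAPABILITY 212
#endif
#ifndef KVM_PMU_CAP_DISABLE
#define KVM_PMU_CAP_DISABLE (1 << 0)
#endif

/* Hypothetical helper: build the KVM_ENABLE_CAP argument. */
void fill_pmu_disable_cap(struct kvm_enable_cap *cap)
{
    assert(cap != NULL);
    memset(cap, 0, sizeof(*cap));
    cap->cap = KVM_CAP_PMU_CAPABILITY;
    cap->args[0] = KVM_PMU_CAP_DISABLE;
}

/*
 * Hypothetical helper: vm_fd is the fd returned by KVM_CREATE_VM.
 * Returns 0 on success, -1 with errno set on failure.
 */
int vm_disable_pmu(int vm_fd)
{
    struct kvm_enable_cap cap;

    fill_pmu_disable_cap(&cap);
    return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}
```

A caller would typically check KVM_CHECK_EXTENSION for
KVM_CAP_PMU_CAPABILITY first and only then call vm_disable_pmu().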


2. The 2nd bug is that un-reclaimed perf events (after QEMU system_reset)
on the KVM side may inject random unwanted/unknown NMIs into the VM.

The svm pmu registers are not reset during QEMU system_reset.

(1). The VM resets (e.g., via QEMU system_reset or VM kdump/kexec) while it
is running "perf top". The pmu registers are not disabled gracefully.

(2). Although x86_cpu_reset() resets many registers to zero,
kvm_put_msrs() does not put the AMD pmu registers to the KVM side. As a
result, some pmu events are still enabled on the KVM side.

(3). The KVM pmc_speculative_in_use() always returns true, so the events
are never reclaimed. The kvm_pmc->perf_event is still active.

(4). After the reboot, the VM kernel reports the error below:

[    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
[    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)

(5). In the worst case, the active kvm_pmc->perf_event is still able to
inject unknown NMIs randomly into the VM kernel.

[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
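
To make the "[Firmware Bug]" value in step (4) easier to read, here is a
small decoding sketch. The bit positions follow the AMD64 PERF_CTL layout as
I understand it (the same positions Linux uses for ARCH_PERFMON_EVENTSEL_*);
the struct and function names are made up for illustration only.

```c
#include <stdint.h>

/* AMD64 PERF_CTL bit fields. */
#define EVTSEL_USR (1ULL << 16)  /* count in user mode */
#define EVTSEL_OS  (1ULL << 17)  /* count in kernel mode */
#define EVTSEL_INT (1ULL << 20)  /* raise an interrupt on overflow */
#define EVTSEL_EN  (1ULL << 22)  /* counter enabled */

struct evtsel_fields {
    unsigned event;   /* EventSelect[11:0] */
    unsigned umask;   /* UnitMask[7:0] */
    int usr, os, intr, enabled;
};

/* Hypothetical helper: split a raw event-select value into its fields. */
struct evtsel_fields decode_evtsel(uint64_t v)
{
    struct evtsel_fields f;

    /* Low 8 event bits are at [7:0], the high 4 bits are at [35:32]. */
    f.event = (unsigned)(v & 0xff) | ((unsigned)((v >> 32) & 0xf) << 8);
    f.umask = (unsigned)((v >> 8) & 0xff);
    f.usr = (v & EVTSEL_USR) != 0;
    f.os = (v & EVTSEL_OS) != 0;
    f.intr = (v & EVTSEL_INT) != 0;
    f.enabled = (v & EVTSEL_EN) != 0;
    return f;
}
```

Decoding 0x530076 this way gives event 0x76 (the event perf uses for
cpu-cycles on AMD) with the USR, OS, INT and EN bits all set, that is, a
counter that is still enabled and still allowed to raise interrupts, which
fits a sampling event left over from "perf top".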

The 2nd patch fixes the issue by resetting the AMD pmu registers, as is
already done for the Intel registers.
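
As a rough illustration of which registers are involved (this is not the
code of the 2nd patch), the sketch below lists the AMD event-select/counter
MSR pairs that would have to be written back as zero on reset. The MSR
addresses are the K7 legacy range and the Fam15h+ core range as defined in
the Linux msr-index.h; the struct and function names are hypothetical, and
PerfMonV2 is deliberately not covered.

```c
#include <stddef.h>
#include <stdint.h>

#define MSR_K7_EVNTSEL0     0xc0010000u  /* legacy: 4 selects, contiguous */
#define MSR_K7_PERFCTR0     0xc0010004u  /* legacy: 4 counters, contiguous */
#define MSR_F15H_PERF_CTL0  0xc0010200u  /* core: select/counter interleaved */
#define MSR_F15H_PERF_CTR0  0xc0010201u

#define AMD64_NUM_COUNTERS       4
#define AMD64_NUM_COUNTERS_CORE  6

struct amd_pmc_msrs {
    uint32_t evtsel;   /* event-select MSR address */
    uint32_t counter;  /* counter MSR address */
};

/*
 * Hypothetical helper: fill 'out' with the MSR pairs of each counter.
 * has_perfctr_core mirrors the PERFCTR_CORE CPUID feature bit.
 * Returns the number of entries written (at most 'cap').
 */
size_t amd_pmu_msr_list(int has_perfctr_core, struct amd_pmc_msrs *out,
                        size_t cap)
{
    size_t n = has_perfctr_core ? AMD64_NUM_COUNTERS_CORE
                                : AMD64_NUM_COUNTERS;

    if (n > cap) {
        n = cap;
    }
    for (size_t i = 0; i < n; i++) {
        if (has_perfctr_core) {
            out[i].evtsel = MSR_F15H_PERF_CTL0 + 2 * (uint32_t)i;
            out[i].counter = MSR_F15H_PERF_CTR0 + 2 * (uint32_t)i;
        } else {
            out[i].evtsel = MSR_K7_EVNTSEL0 + (uint32_t)i;
            out[i].counter = MSR_K7_PERFCTR0 + (uint32_t)i;
        }
    }
    return n;
}
```

With has_perfctr_core set, the first entry is c0010200/c0010201, which is the
MSR named in the "[Firmware Bug]" message above.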


This patchset does not cover PerfMonV2 until the patchset below is merged
on the KVM side.

[PATCH v3 0/8] KVM: x86: Add AMD Guest PerfMonV2 PMU support
https://lore.kernel.org/all/20221111102645.82001-1-likexu@tencent.com/


Dongli Zhang (2):
      target/i386/kvm: introduce 'pmu-cap-disabled' to set KVM_PMU_CAP_DISABLE
      target/i386/kvm: get and put AMD pmu registers

 accel/kvm/kvm-all.c      |   1 +
 include/sysemu/kvm_int.h |   1 +
 qemu-options.hx          |   7 +++
 target/i386/cpu.h        |   5 ++
 target/i386/kvm/kvm.c    | 129 +++++++++++++++++++++++++++++++++++++++++-
 5 files changed, 141 insertions(+), 2 deletions(-)

Thank you very much!

Dongli Zhang
Re: [PATCH v2 0/2] target/i386/kvm: fix two svm pmu virtualization bugs
Posted by Dongli Zhang 1 year, 4 months ago
Can I get feedback on this patchset, especially [PATCH v2 2/2]?

Regarding [PATCH v2 2/2], the issue currently impacts the usage of PMUs on AMD
VMs, especially in the case below:

1. Enable panic on NMI.
2. Use perf to monitor the performance of the VM. Although I have not tested
it, I think the NMI watchdog has the same effect.
3. A sudden system reset or a kernel panic (kdump/kexec) happens.
4. After reboot, there will be random unknown NMIs.
5. Unfortunately, "panic on nmi" may then panic the VM randomly at any time.

Thank you very much!

Dongli Zhang

On 12/1/22 16:22, Dongli Zhang wrote:
> [...]
Re: [PATCH v2 0/2] target/i386/kvm: fix two svm pmu virtualization bugs
Posted by Dongli Zhang 1 year, 3 months ago
Ping?

Regarding [PATCH v2 2/2], the bad thing is that the customer will not
immediately notice the issue, that is, the "Broken BIOS detected" message in
dmesg.

As a result, the customer VM may panic randomly at any time in the future (once
the issue is encountered) if "/proc/sys/kernel/unknown_nmi_panic" is enabled.

Thank you very much!

Dongli Zhang

On 12/19/22 06:45, Dongli Zhang wrote:
> [...]
Re: [PATCH v2 0/2] target/i386/kvm: fix two svm pmu virtualization bugs
Posted by Like Xu 10 months ago
I think we've been stuck here too long. Sorry Dongli.

+zhenyu, could you get someone to follow up on this? Otherwise I will start
working on it.

On 9/1/2023 9:19 am, Dongli Zhang wrote:
> [...]
Re: [PATCH v2 0/2] target/i386/kvm: fix two svm pmu virtualization bugs
Posted by Dongli Zhang 10 months ago
Hi Like and zhenyu,

Thank you very much! That will be very helpful.

In order to help the review, I will rebase the patchset on top of the most
recent QEMU.

Thank you very much!

Dongli Zhang

On 6/19/23 01:52, Like Xu wrote:
> [...]