[PATCH 0/7] target/i386/kvm/pmu: Enhancement, Bugfix and Cleanup

Dongli Zhang posted 7 patches 2 weeks, 5 days ago
This patchset addresses three bugs related to AMD PMU virtualization.

1. PerfMonV2 is still available when PERFCORE is disabled via
"-cpu host,-perfctr-core".

2. The second issue is that neither "-cpu host,-pmu" nor "-cpu EPYC"
disables AMD PMU virtualization. In both cases the Linux guest may still
report:

[    0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.

instead of:

[    0.596381] Performance Events: PMU not available due to virtualization, using software events only.
[    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled

To address this, we have introduced a new x86-specific accel/kvm property,
"pmu-cap-disabled=true", which disables PMU virtualization via
KVM_PMU_CAP_DISABLE.
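
For readers unfamiliar with the KVM side, the following is a minimal sketch
of how a userspace VMM can request KVM_PMU_CAP_DISABLE on a VM file
descriptor. It is not the code from this series. The helper names
pmu_disable_cap_init() and vm_disable_pmu() are hypothetical, and the
fallback #defines assume the values used by recent Linux uapi headers. As
far as I understand the KVM API, the capability has to be enabled before any
vCPU is created.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Fallbacks for older uapi headers (assumed values from recent kernels). */
#ifndef KVM_CAP_PMU_CAPABILITY
#define KVM_CAP_PMU_CAPABILITY 212
#endif
#ifndef KVM_PMU_CAP_DISABLE
#define KVM_PMU_CAP_DISABLE (1 << 0)
#endif

/* Hypothetical helper: fill in the KVM_ENABLE_CAP argument. */
static void pmu_disable_cap_init(struct kvm_enable_cap *cap)
{
    memset(cap, 0, sizeof(*cap));
    cap->cap = KVM_CAP_PMU_CAPABILITY;
    cap->args[0] = KVM_PMU_CAP_DISABLE;
}

/*
 * Hypothetical helper: disable PMU virtualization for the whole VM.
 * vm_fd is the descriptor returned by KVM_CREATE_VM.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int vm_disable_pmu(int vm_fd)
{
    struct kvm_enable_cap cap;

    pmu_disable_cap_init(&cap);
    return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}
```

Because the setting is per VM rather than per vCPU, it fits an accel/kvm
property better than a "-cpu" feature flag.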

An earlier approach, which re-used '-cpu host,-pmu' instead:
https://lore.kernel.org/all/20221119122901.2469-1-dongli.zhang@oracle.com/


3. The third issue is that unreclaimed performance events (after a QEMU
system_reset) in KVM may cause random, unwanted, or unknown NMIs to be
injected into the VM.

The AMD PMU registers are not reset during QEMU system_reset.

(1) If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
running "perf top", the PMU registers are not disabled properly.

(2) Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
does not handle AMD PMU registers, causing some PMU events to remain
enabled in KVM.

(3) The KVM kvm_pmc_speculative_in_use() function consistently returns true,
preventing the reclamation of these events. Consequently, the
kvm_pmc->perf_event remains active.

(4) After a reboot, the VM kernel may report the following error:

[    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
[    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)

(5) In the worst case, the active kvm_pmc->perf_event may inject unknown
NMIs randomly into the VM kernel:

[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
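
To make the value in the "[Firmware Bug]" line easier to read, here is a
small decoding sketch. It assumes the AMD PERF_CTL field layout as I
understand it from the AMD APM (event select in bits 7:0 and 35:32, unit
mask in bits 15:8, USR bit 16, OS bit 17, INT bit 20, EN bit 22). The struct
and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the fields of an AMD PERF_CTL (event select) MSR. */
struct amd_evtsel {
    uint16_t event;   /* EventSelect[11:0]: bits 35:32 and 7:0 */
    uint8_t  umask;   /* UnitMask: bits 15:8 */
    bool     usr;     /* bit 16: count in user mode */
    bool     os;      /* bit 17: count in kernel mode */
    bool     intr;    /* bit 20: raise an interrupt on overflow */
    bool     enabled; /* bit 22: counter enabled */
};

/* Split a raw PERF_CTL value into its fields. */
static struct amd_evtsel amd_evtsel_decode(uint64_t v)
{
    struct amd_evtsel d;

    d.event   = (uint16_t)((v & 0xff) | (((v >> 32) & 0xf) << 8));
    d.umask   = (uint8_t)((v >> 8) & 0xff);
    d.usr     = (v >> 16) & 1;
    d.os      = (v >> 17) & 1;
    d.intr    = (v >> 20) & 1;
    d.enabled = (v >> 22) & 1;
    return d;
}
```

Decoding 0x530076 this way gives event 0x76 with USR, OS, INT and EN all
set. In other words, the counter programmed before the reset is still
enabled and still configured to interrupt on overflow, which matches both
the "Broken BIOS" message and the stray NMIs.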

To resolve these issues, we propose resetting the AMD PMU registers during
the VM reset process.
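
As a rough illustration of what "resetting AMD PMU registers" covers, the
sketch below builds the list of MSR writes that would zero every event
select and counter. It is not the code from this series. The MSR bases and
counter counts are the commonly documented ones (four legacy K7 counters, or
six interleaved counters starting at c0010200 when PERFCORE is available),
all identifiers are hypothetical, and the additional PerfMonV2 registers are
left out.

```c
#include <stddef.h>
#include <stdint.h>

#define AMD_K7_EVNTSEL0      0xc0010000u /* legacy: select n at base + n */
#define AMD_K7_PERFCTR0      0xc0010004u /* legacy: counter n at base + n */
#define AMD_F15H_PERF_CTL0   0xc0010200u /* core: select n at base + 2n */
#define AMD_F15H_PERF_CTR0   0xc0010201u /* core: counter n at base + 2n */
#define AMD_LEGACY_COUNTERS  4
#define AMD_CORE_COUNTERS    6

/* Hypothetical (index, value) pair, similar in spirit to a KVM MSR entry. */
struct msr_write {
    uint32_t index;
    uint64_t value;
};

/*
 * Fill 'out' with writes that clear every AMD PMU event select and counter.
 * 'out' must have room for at least 12 entries. Returns the entry count.
 */
static size_t amd_pmu_reset_list(int has_perfctr_core, struct msr_write *out)
{
    uint32_t sel    = has_perfctr_core ? AMD_F15H_PERF_CTL0 : AMD_K7_EVNTSEL0;
    uint32_t ctr    = has_perfctr_core ? AMD_F15H_PERF_CTR0 : AMD_K7_PERFCTR0;
    uint32_t stride = has_perfctr_core ? 2 : 1;
    uint32_t count  = has_perfctr_core ? AMD_CORE_COUNTERS : AMD_LEGACY_COUNTERS;
    size_t n = 0;

    for (uint32_t i = 0; i < count; i++) {
        /* Clear the event select first so the counter stops counting. */
        out[n].index = sel + i * stride;
        out[n++].value = 0;
        out[n].index = ctr + i * stride;
        out[n++].value = 0;
    }
    return n;
}
```

With has_perfctr_core set, the first entry is MSR c0010200, the register
named in the "[Firmware Bug]" message above.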


Dongli Zhang (7):
  target/i386: disable PerfMonV2 when PERFCORE unavailable
  target/i386/kvm: introduce 'pmu-cap-disabled' to set KVM_PMU_CAP_DISABLE
  target/i386/kvm: init PMU information only once
  target/i386/kvm: rename architectural PMU variables
  target/i386/kvm: reset AMD PMU registers during VM reset
  target/i386/kvm: support perfmon-v2 for reset
  target/i386/kvm: don't stop Intel PMU counters

 accel/kvm/kvm-all.c        |   1 +
 include/sysemu/kvm_int.h   |   1 +
 qemu-options.hx            |   9 +-
 target/i386/cpu.c          |   3 +-
 target/i386/cpu.h          |  12 ++
 target/i386/kvm/kvm.c      | 340 ++++++++++++++++++++++++++++++++++------
 target/i386/kvm/kvm_i386.h |   2 +
 7 files changed, 319 insertions(+), 49 deletions(-)

base-commit: c94bee4cd6693c1c65ba43bb8970cf909dec378b

Thank you very much!

Dongli Zhang