[PATCH v5 44/44] KVM: x86/pmu: Elide WRMSRs when loading guest PMCs if values already match

Sean Christopherson posted 44 patches 6 months ago
There is a newer version of this series
Posted by Sean Christopherson 6 months ago
When loading a mediated PMU state, elide the WRMSRs to load PMCs with the
guest's value if the value in hardware already matches the guest's value.
For the relatively common case where neither the guest nor the host is
actively using the PMU, i.e. when all/many counters are '0', eliding the
WRMSRs reduces the latency of handling VM-Exit by a measurable amount
(WRMSR is significantly more expensive than RDPMC).

As measured by KVM-Unit-Tests' CPUID VM-Exit testcase, this provides a
~25% reduction in latency (4k => 3k cycles) on Intel Emerald Rapids,
and a ~13% reduction (6.2k => 5.3k cycles) on AMD Turin.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/pmu.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index ddab1630a978..0e5048ae86fa 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -1299,13 +1299,15 @@ static void kvm_pmu_load_guest_pmcs(struct kvm_vcpu *vcpu)
 	for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
 		pmc = &pmu->gp_counters[i];
 
-		wrmsrl(gp_counter_msr(i), pmc->counter);
+		if (pmc->counter != rdpmc(i))
+			wrmsrl(gp_counter_msr(i), pmc->counter);
 		wrmsrl(gp_eventsel_msr(i), pmc->eventsel_hw);
 	}
 	for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
 		pmc = &pmu->fixed_counters[i];
 
-		wrmsrl(fixed_counter_msr(i), pmc->counter);
+		if (pmc->counter != rdpmc(INTEL_PMC_FIXED_RDPMC_BASE | i))
+			wrmsrl(fixed_counter_msr(i), pmc->counter);
 	}
 }
 
-- 
2.50.1.565.gc32cd1483b-goog
Re: [PATCH v5 44/44] KVM: x86/pmu: Elide WRMSRs when loading guest PMCs if values already match
Posted by Manali Shukla 2 months, 3 weeks ago
On 8/7/2025 1:27 AM, Sean Christopherson wrote:
> When loading a mediated PMU state, elide the WRMSRs to load PMCs with the
> guest's value if the value in hardware already matches the guest's value.
> For the relatively common case where neither the guest nor the host is
> actively using the PMU, i.e. when all/many counters are '0', eliding the
> WRMSRs reduces the latency of handling VM-Exit by a measurable amount
> (WRMSR is significantly more expensive than RDPMC).
> 
> As measured by KVM-Unit-Tests' CPUID VM-Exit testcase, this provides a
> a ~25% reduction in latency (4k => 3k cycles) on Intel Emerald Rapids,
> and a ~13% reduction (6.2k => 5.3k cycles) on AMD Turing.

Nit. s/AMD Turing/AMD Turin

-Manali
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>