Date: Wed, 6 Aug 2025 12:56:54 -0700
From: Sean Christopherson
To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
 Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
 "H. Peter Anvin", Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
 Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson, Paolo Bonzini
Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
 kvm@vger.kernel.org, loongarch@lists.linux.dev, kvm-riscv@lists.infradead.org,
 linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-perf-users@vger.kernel.org, Kan Liang, Yongwei Ma, Mingwei Zhang,
 Xiong Zhang, Sandipan Das, Dapeng Mi
Subject: [PATCH v5 32/44] KVM: x86/pmu: Disable interception of select PMU MSRs for mediated vPMUs
Message-ID: <20250806195706.1650976-33-seanjc@google.com>
In-Reply-To: <20250806195706.1650976-1-seanjc@google.com>
References: <20250806195706.1650976-1-seanjc@google.com>
X-Mailer: git-send-email 2.50.1.565.gc32cd1483b-goog

From: Dapeng Mi

For vCPUs with a mediated vPMU, disable interception of counter MSRs for
PMCs that are exposed to the guest, and of GLOBAL_CTRL and related MSRs
when they are fully supported according to the vCPU model, i.e. when the
MSRs and all bits supported by hardware exist from the guest's point of
view.

Do NOT pass through event selector or fixed counter control MSRs, so that
KVM can enforce userspace-defined event filters, e.g. to prevent use of
AnyThread events (which is unfortunately a setting in the fixed counter
control MSR).

Defer support for nested passthrough of mediated PMU MSRs to the future,
as the logic for nested MSR interception is unfortunately vendor specific.
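
As a rough illustration (not code from this patch; the set_*() helpers
below are hypothetical stand-ins for the vendor-specific bitmap updates),
the resulting interception policy for a mediated vPMU boils down to:

	/* Counter MSRs owned by the guest are passed through... */
	for (i = 0; i < pmu->nr_arch_gp_counters; i++)
		set_counter_msr_intercept(vcpu, i, false);

	/* ...event selectors stay intercepted to honor event filters... */
	for (i = 0; i < pmu->nr_arch_gp_counters; i++)
		set_eventsel_msr_intercept(vcpu, i, true);

	/* ...and GLOBAL_CTRL et al. are passed through only if the guest
	 * sees every bit that hardware supports.
	 */
	set_global_ctrl_msr_intercept(vcpu,
				      kvm_need_perf_global_ctrl_intercept(vcpu));
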
Suggested-by: Sean Christopherson
Co-developed-by: Mingwei Zhang
Signed-off-by: Mingwei Zhang
Co-developed-by: Sandipan Das
Signed-off-by: Sandipan Das
Signed-off-by: Dapeng Mi
[sean: squash patches, massage changelog, refresh VMX MSRs on filter change]
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/msr-index.h |  1 +
 arch/x86/kvm/pmu.c               | 34 ++++++++++++------
 arch/x86/kvm/pmu.h               |  1 +
 arch/x86/kvm/svm/svm.c           | 36 +++++++++++++++++++
 arch/x86/kvm/vmx/pmu_intel.c     | 13 -------
 arch/x86/kvm/vmx/pmu_intel.h     | 15 ++++++++
 arch/x86/kvm/vmx/vmx.c           | 59 ++++++++++++++++++++++++++------
 7 files changed, 124 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index f19d1ee9a396..06bd7aaae931 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -736,6 +736,7 @@
 #define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS	0xc0000300
 #define MSR_AMD64_PERF_CNTR_GLOBAL_CTL		0xc0000301
 #define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR	0xc0000302
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET	0xc0000303
 
 /* AMD Last Branch Record MSRs */
 #define MSR_AMD64_LBR_SELECT			0xc000010e

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 4246e1d2cfcc..817ef852bdf9 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -715,18 +715,14 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
 	return 0;
 }
 
-bool kvm_need_rdpmc_intercept(struct kvm_vcpu *vcpu)
+bool kvm_need_perf_global_ctrl_intercept(struct kvm_vcpu *vcpu)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
 
 	if (!kvm_vcpu_has_mediated_pmu(vcpu))
 		return true;
 
-	/*
-	 * VMware allows access to these Pseduo-PMCs even when read via RDPMC
-	 * in Ring3 when CR4.PCE=0.
-	 */
-	if (enable_vmware_backdoor)
+	if (!kvm_pmu_has_perf_global_ctrl(pmu))
 		return true;
 
 	/*
@@ -735,7 +731,22 @@ bool kvm_need_rdpmc_intercept(struct kvm_vcpu *vcpu)
 	 * capabilities themselves may be a subset of hardware capabilities.
 	 */
 	return pmu->nr_arch_gp_counters != kvm_host_pmu.num_counters_gp ||
-	       pmu->nr_arch_fixed_counters != kvm_host_pmu.num_counters_fixed ||
+	       pmu->nr_arch_fixed_counters != kvm_host_pmu.num_counters_fixed;
+}
+EXPORT_SYMBOL_GPL(kvm_need_perf_global_ctrl_intercept);
+
+bool kvm_need_rdpmc_intercept(struct kvm_vcpu *vcpu)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
+	/*
+	 * VMware allows access to these Pseduo-PMCs even when read via RDPMC
+	 * in Ring3 when CR4.PCE=0.
+	 */
+	if (enable_vmware_backdoor)
+		return true;
+
+	return kvm_need_perf_global_ctrl_intercept(vcpu) ||
 	       pmu->counter_bitmask[KVM_PMC_GP] != (BIT_ULL(kvm_host_pmu.bit_width_gp) - 1) ||
 	       pmu->counter_bitmask[KVM_PMC_FIXED] != (BIT_ULL(kvm_host_pmu.bit_width_fixed) - 1);
 }
@@ -927,11 +938,12 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
 	 * in the global controls). Emulate that behavior when refreshing the
 	 * PMU so that userspace doesn't need to manually set PERF_GLOBAL_CTRL.
 	 */
-	if (kvm_pmu_has_perf_global_ctrl(pmu) && pmu->nr_arch_gp_counters) {
+	if (pmu->nr_arch_gp_counters &&
+	    (kvm_pmu_has_perf_global_ctrl(pmu) || kvm_vcpu_has_mediated_pmu(vcpu)))
 		pmu->global_ctrl = GENMASK_ULL(pmu->nr_arch_gp_counters - 1, 0);
-		if (kvm_vcpu_has_mediated_pmu(vcpu))
-			kvm_pmu_call(write_global_ctrl)(pmu->global_ctrl);
-	}
+
+	if (kvm_vcpu_has_mediated_pmu(vcpu))
+		kvm_pmu_call(write_global_ctrl)(pmu->global_ctrl);
 
 	bitmap_set(pmu->all_valid_pmc_idx, 0, pmu->nr_arch_gp_counters);
 	bitmap_set(pmu->all_valid_pmc_idx, KVM_FIXED_PMC_BASE_IDX,

diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index dcf4e2253875..51963a3a167a 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -239,6 +239,7 @@ void kvm_pmu_instruction_retired(struct kvm_vcpu *vcpu);
 void kvm_pmu_branch_retired(struct kvm_vcpu *vcpu);
 
 bool is_vmware_backdoor_pmc(u32 pmc_idx);
+bool kvm_need_perf_global_ctrl_intercept(struct kvm_vcpu *vcpu);
 bool kvm_need_rdpmc_intercept(struct kvm_vcpu *vcpu);
 
 extern struct kvm_pmu_ops intel_pmu_ops;

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2d42962b47aa..add50b64256c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -790,6 +790,40 @@ void svm_vcpu_free_msrpm(void *msrpm)
 	__free_pages(virt_to_page(msrpm), get_order(MSRPM_SIZE));
 }
 
+static void svm_recalc_pmu_msr_intercepts(struct kvm_vcpu *vcpu)
+{
+	bool intercept = !kvm_vcpu_has_mediated_pmu(vcpu);
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	int i;
+
+	if (!enable_mediated_pmu)
+		return;
+
+	/* Legacy counters are always available for AMD CPUs with a PMU. */
+	for (i = 0; i < min(pmu->nr_arch_gp_counters, AMD64_NUM_COUNTERS); i++)
+		svm_set_intercept_for_msr(vcpu, MSR_K7_PERFCTR0 + i,
+					  MSR_TYPE_RW, intercept);
+
+	intercept |= !guest_cpu_cap_has(vcpu, X86_FEATURE_PERFCTR_CORE);
+	for (i = 0; i < pmu->nr_arch_gp_counters; i++)
+		svm_set_intercept_for_msr(vcpu, MSR_F15H_PERF_CTR + 2 * i,
+					  MSR_TYPE_RW, intercept);
+
+	for ( ; i < kvm_pmu_cap.num_counters_gp; i++)
+		svm_enable_intercept_for_msr(vcpu, MSR_F15H_PERF_CTR + 2 * i,
+					     MSR_TYPE_RW);
+
+	intercept = kvm_need_perf_global_ctrl_intercept(vcpu);
+	svm_set_intercept_for_msr(vcpu, MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
+				  MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(vcpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
+				  MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(vcpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
+				  MSR_TYPE_RW, intercept);
+	svm_set_intercept_for_msr(vcpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET,
+				  MSR_TYPE_RW, intercept);
+}
+
 static void svm_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -847,6 +881,8 @@ static void svm_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 	if (sev_es_guest(vcpu->kvm))
 		sev_es_recalc_msr_intercepts(vcpu);
 
+	svm_recalc_pmu_msr_intercepts(vcpu);
+
 	/*
 	 * x2APIC intercepts are modified on-demand and cannot be filtered by
 	 * userspace.
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 779b4e64acac..2bdddb95816e 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -128,19 +128,6 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
 	return &counters[array_index_nospec(idx, num_counters)];
 }
 
-static inline u64 vcpu_get_perf_capabilities(struct kvm_vcpu *vcpu)
-{
-	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_PDCM))
-		return 0;
-
-	return vcpu->arch.perf_capabilities;
-}
-
-static inline bool fw_writes_is_enabled(struct kvm_vcpu *vcpu)
-{
-	return (vcpu_get_perf_capabilities(vcpu) & PERF_CAP_FW_WRITES) != 0;
-}
-
 static inline struct kvm_pmc *get_fw_gp_pmc(struct kvm_pmu *pmu, u32 msr)
 {
 	if (!fw_writes_is_enabled(pmu_to_vcpu(pmu)))

diff --git a/arch/x86/kvm/vmx/pmu_intel.h b/arch/x86/kvm/vmx/pmu_intel.h
index 5620d0882cdc..5d9357640aa1 100644
--- a/arch/x86/kvm/vmx/pmu_intel.h
+++ b/arch/x86/kvm/vmx/pmu_intel.h
@@ -4,6 +4,21 @@
 
 #include <linux/kvm_host.h>
 
+#include "cpuid.h"
+
+static inline u64 vcpu_get_perf_capabilities(struct kvm_vcpu *vcpu)
+{
+	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_PDCM))
+		return 0;
+
+	return vcpu->arch.perf_capabilities;
+}
+
+static inline bool fw_writes_is_enabled(struct kvm_vcpu *vcpu)
+{
+	return (vcpu_get_perf_capabilities(vcpu) & PERF_CAP_FW_WRITES) != 0;
+}
+
 bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu);
 int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu);
 

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1233a0afb31e..85bd82d41f94 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4068,6 +4068,53 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
 	}
 }
 
+static void vmx_recalc_pmu_msr_intercepts(struct kvm_vcpu *vcpu)
+{
+	bool has_mediated_pmu = kvm_vcpu_has_mediated_pmu(vcpu);
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	bool intercept = !has_mediated_pmu;
+	int i;
+
+	if (!enable_mediated_pmu)
+		return;
+
+	vm_entry_controls_changebit(vmx, VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL,
+				    has_mediated_pmu);
+
+	vm_exit_controls_changebit(vmx, VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
+				   VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL,
+				   has_mediated_pmu);
+
+	for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PERFCTR0 + i,
+					  MSR_TYPE_RW, intercept);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PMC0 + i, MSR_TYPE_RW,
+					  intercept || !fw_writes_is_enabled(vcpu));
+	}
+	for ( ; i < kvm_pmu_cap.num_counters_gp; i++) {
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PERFCTR0 + i,
+					  MSR_TYPE_RW, true);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PMC0 + i,
+					  MSR_TYPE_RW, true);
+	}
+
+	for (i = 0; i < pmu->nr_arch_fixed_counters; i++)
+		vmx_set_intercept_for_msr(vcpu, MSR_CORE_PERF_FIXED_CTR0 + i,
+					  MSR_TYPE_RW, intercept);
+	for ( ; i < kvm_pmu_cap.num_counters_fixed; i++)
+		vmx_set_intercept_for_msr(vcpu, MSR_CORE_PERF_FIXED_CTR0 + i,
+					  MSR_TYPE_RW, true);
+
+	intercept = kvm_need_perf_global_ctrl_intercept(vcpu);
+	vmx_set_intercept_for_msr(vcpu, MSR_CORE_PERF_GLOBAL_STATUS,
+				  MSR_TYPE_RW, intercept);
+	vmx_set_intercept_for_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
+				  MSR_TYPE_RW, intercept);
+	vmx_set_intercept_for_msr(vcpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
+				  MSR_TYPE_RW, intercept);
+}
+
 static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 {
 	if (!cpu_has_vmx_msr_bitmap())
@@ -4115,17 +4162,7 @@ static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
 				  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
 
-	if (enable_mediated_pmu) {
-		bool is_mediated_pmu = kvm_vcpu_has_mediated_pmu(vcpu);
-		struct vcpu_vmx *vmx = to_vmx(vcpu);
-
-		vm_entry_controls_changebit(vmx,
-			VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL, is_mediated_pmu);
-
-		vm_exit_controls_changebit(vmx,
-			VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
-			VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL, is_mediated_pmu);
-	}
+	vmx_recalc_pmu_msr_intercepts(vcpu);
 
 	/*
 	 * x2APIC and LBR MSR intercepts are modified on-demand and cannot be
-- 
2.50.1.565.gc32cd1483b-goog