Reply-To: Sean Christopherson
Date: Fri, 5 Dec 2025 16:17:12 -0800
In-Reply-To:
 <20251206001720.468579-1-seanjc@google.com>
Mime-Version: 1.0
References: <20251206001720.468579-1-seanjc@google.com>
X-Mailer: git-send-email 2.52.0.223.gf5cc29aaa4-goog
Message-ID: <20251206001720.468579-37-seanjc@google.com>
Subject: [PATCH v6 36/44] KVM: nVMX: Don't update msr_autostore count when saving TSC for vmcs12
From: Sean Christopherson
To: Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao, Huacai Chen,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Xin Li,
	"H. Peter Anvin", Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
	Paolo Bonzini
Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	kvm@vger.kernel.org, loongarch@lists.linux.dev,
	kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Mingwei Zhang, Xudong Hao, Sandipan Das, Dapeng Mi, Xiong Zhang,
	Manali Shukla, Jim Mattson
Content-Type: text/plain; charset="utf-8"

Rework nVMX's use of the MSR auto-store list for snapshotting TSC so that
MSR_IA32_TSC is sneaked into the list _without_ updating KVM's software
tracking, and drop the generic add/remove functionality so that future
usage of the store list for nested-specific logic needs to consider the
implications of modifying the list.

Updating the list only for vmcs02, and only on nested VM-Enter, is a
disaster waiting to happen, as it means vmcs01 is stale relative to the
software tracking, and KVM could unintentionally leave an MSR in the
store list in perpetuity while running L1, e.g. if KVM addressed the
first issue and updated vmcs01 on nested VM-Exit without removing TSC
from the list.

Furthermore, mixing KVM's desire to save an MSR with L1's desire to save
an MSR results in KVM clobbering/ignoring the needs of vmcs01 or vmcs02.
E.g.
if KVM added MSR_IA32_TSC to the store list for its own purposes, and
then _removed_ MSR_IA32_TSC from the list after emulating nested
VM-Enter, then KVM would remove MSR_IA32_TSC from the list even though
saving TSC on VM-Exit from L2 is still desirable (to provide L1 with an
accurate TSC).  Similarly, removing an MSR from the list based on
vmcs12's settings could drop an MSR that KVM wants to save for its own
purposes.

In practice, the issues are currently benign, because KVM doesn't use the
store list for vmcs01.  But that will change with upcoming mediated PMU
support.

Alternatively, a "full" solution would be to track MSR list entries for
vmcs12 separately from KVM's standard lists, but MSR_IA32_TSC is likely
the only MSR that KVM would ever want to save on _every_ VM-Exit purely
based on vmcs12.  I.e. the added complexity isn't remotely justified at
this time.

Opportunistically escalate from a pr_warn_ratelimited() to a full WARN,
as KVM reserves eight entries in each MSR list, and as above, KVM uses
at most one entry.

Opportunistically make vmx_find_loadstore_msr_slot() local to vmx.c, as
using it directly from nested code is unsafe due to the potential for
mixing vmcs01 and vmcs02 state (see above).

Cc: Jim Mattson
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/vmx/nested.c | 71 ++++++++++++---------------------------
 arch/x86/kvm/vmx/vmx.c    |  2 +-
 arch/x86/kvm/vmx/vmx.h    |  2 +-
 3 files changed, 24 insertions(+), 51 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 486789dac515..614b789ecf16 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1075,16 +1075,12 @@ static bool nested_vmx_get_vmexit_msr_value(struct kvm_vcpu *vcpu,
	 * does not include the time taken for emulation of the L2->L1
	 * VM-exit in L0, use the more accurate value.
	 */
-	if (msr_index == MSR_IA32_TSC) {
-		int i = vmx_find_loadstore_msr_slot(&vmx->msr_autostore,
-						    MSR_IA32_TSC);
+	if (msr_index == MSR_IA32_TSC && vmx->nested.tsc_autostore_slot >= 0) {
+		int slot = vmx->nested.tsc_autostore_slot;
+		u64 host_tsc = vmx->msr_autostore.val[slot].value;
 
-		if (i >= 0) {
-			u64 val = vmx->msr_autostore.val[i].value;
-
-			*data = kvm_read_l1_tsc(vcpu, val);
-			return true;
-		}
+		*data = kvm_read_l1_tsc(vcpu, host_tsc);
+		return true;
 	}
 
 	if (kvm_emulate_msr_read(vcpu, msr_index, data)) {
@@ -1163,42 +1159,6 @@ static bool nested_msr_store_list_has_msr(struct kvm_vcpu *vcpu, u32 msr_index)
 	return false;
 }
 
-static void prepare_vmx_msr_autostore_list(struct kvm_vcpu *vcpu,
-					   u32 msr_index)
-{
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	struct vmx_msrs *autostore = &vmx->msr_autostore;
-	bool in_vmcs12_store_list;
-	int msr_autostore_slot;
-	bool in_autostore_list;
-	int last;
-
-	msr_autostore_slot = vmx_find_loadstore_msr_slot(autostore, msr_index);
-	in_autostore_list = msr_autostore_slot >= 0;
-	in_vmcs12_store_list = nested_msr_store_list_has_msr(vcpu, msr_index);
-
-	if (in_vmcs12_store_list && !in_autostore_list) {
-		if (autostore->nr == MAX_NR_LOADSTORE_MSRS) {
-			/*
-			 * Emulated VMEntry does not fail here. Instead a less
-			 * accurate value will be returned by
-			 * nested_vmx_get_vmexit_msr_value() by reading KVM's
-			 * internal MSR state instead of reading the value from
-			 * the vmcs02 VMExit MSR-store area.
-			 */
-			pr_warn_ratelimited(
-				"Not enough msr entries in msr_autostore. Can't add msr %x\n",
-				msr_index);
-			return;
-		}
-		last = autostore->nr++;
-		autostore->val[last].index = msr_index;
-	} else if (!in_vmcs12_store_list && in_autostore_list) {
-		last = --autostore->nr;
-		autostore->val[msr_autostore_slot] = autostore->val[last];
-	}
-}
-
 /*
  * Load guest's/host's cr3 at nested entry/exit.  @nested_ept is true if we are
  * emulating VM-Entry into a guest with EPT enabled.  On failure, the expected
@@ -2699,12 +2659,25 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 	}
 
 	/*
-	 * Make sure the msr_autostore list is up to date before we set the
-	 * count in the vmcs02.
+	 * If vmcs12 is configured to save TSC on exit via the auto-store list,
+	 * append the MSR to vmcs02's auto-store list so that KVM effectively
+	 * reads TSC at the time of VM-Exit from L2.  The saved value will be
+	 * propagated to vmcs12's list on nested VM-Exit.
+	 *
+	 * Don't increment the number of MSRs in the vCPU structure, as saving
+	 * TSC is specific to this particular incarnation of vmcs02, i.e. must
+	 * not bleed into vmcs01.
 	 */
-	prepare_vmx_msr_autostore_list(&vmx->vcpu, MSR_IA32_TSC);
+	if (nested_msr_store_list_has_msr(&vmx->vcpu, MSR_IA32_TSC) &&
+	    !WARN_ON_ONCE(vmx->msr_autostore.nr >= ARRAY_SIZE(vmx->msr_autostore.val))) {
+		vmx->nested.tsc_autostore_slot = vmx->msr_autostore.nr;
+		vmx->msr_autostore.val[vmx->msr_autostore.nr].index = MSR_IA32_TSC;
 
-	vmcs_write32(VM_EXIT_MSR_STORE_COUNT, vmx->msr_autostore.nr);
+		vmcs_write32(VM_EXIT_MSR_STORE_COUNT, vmx->msr_autostore.nr + 1);
+	} else {
+		vmx->nested.tsc_autostore_slot = -1;
+		vmcs_write32(VM_EXIT_MSR_STORE_COUNT, vmx->msr_autostore.nr);
+	}
 	vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.host.nr);
 	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.guest.nr);
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 23c92c41fd83..52bcb817cc15 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1029,7 +1029,7 @@ static __always_inline void clear_atomic_switch_msr_special(struct vcpu_vmx *vmx,
 	vm_exit_controls_clearbit(vmx, exit);
 }
 
-int vmx_find_loadstore_msr_slot(struct vmx_msrs *m, u32 msr)
+static int vmx_find_loadstore_msr_slot(struct vmx_msrs *m, u32 msr)
 {
 	unsigned int i;
 
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 4ce653d729ca..3175fedb5a4d 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -191,6 +191,7 @@ struct nested_vmx {
 	u16 vpid02;
 	u16 last_vpid;
 
+	int tsc_autostore_slot;
 	struct nested_vmx_msrs msrs;
 
 	/* SMM related state */
@@ -383,7 +384,6 @@ void vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx, unsigned int flags);
 unsigned int __vmx_vcpu_run_flags(struct vcpu_vmx *vmx);
 bool __vmx_vcpu_run(struct vcpu_vmx *vmx, unsigned long *regs,
 		    unsigned int flags);
-int vmx_find_loadstore_msr_slot(struct vmx_msrs *m, u32 msr);
 void vmx_ept_load_pdptrs(struct kvm_vcpu *vcpu);
 
 void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type, bool set);
-- 
2.52.0.223.gf5cc29aaa4-goog