From nobody Thu Apr 2 06:31:33 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C5B52366570; Tue, 3 Mar 2026 00:34:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772498079; cv=none; b=G37UgX1MfmARWwhCzRYm817U9wxdcUR9HgS5g03jsvUi1GRjl3kWhPymWhfr/zcV9+rI9sLQl4eZYHFKNF2JgPwlOqDttqJJI8IAxVw/ES0AZ9kU04Wjf2G6d3nEbM+9g+zWinDxKfU9q1e2/mqxgQo5gZWtVWJs2YCYGeNZyK4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772498079; c=relaxed/simple; bh=iVkzc2Qb7bTJIvKcSCArEB8mpEt69UegMebDNgIwCYE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=bBEVKacY4ycHuJSYdnyPmtuw32Ym3OtBjhuYzfu2Ij2YCs6OPwaoWYKKWvsk4HeW1ehDHjbbqaeyuJHkRM34qXCbGi6IRqZ+2X1mTEy2wBP/LYSuUMHus2kamlK4bMoz52hVnu2fO3DQ+m0UzLrG8Z4IOx5Z8oPX3sXy3M6VJTE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=awMYoRgp; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="awMYoRgp" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7674EC2BC86; Tue, 3 Mar 2026 00:34:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1772498079; bh=iVkzc2Qb7bTJIvKcSCArEB8mpEt69UegMebDNgIwCYE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=awMYoRgpDF3eEds0iqiGlCTaw6WYmpAU6tAPKUgSUs0TkrjDFzpwo2nnMIdXIGJsu nTdgR2cnCKmq7I7z9SyZWMVHsU4H4H4/IFq3PXMlZPdFeFWjpgcAYKe/IuUKVMAok3 fEy6MSDZ3W0Jbn7juz0nOX6vzgtM95UIwd4Iq4vuzW0JdQNqZQqOPn8VqDYCwIJB8b gHIzOUhBM+0wcR3vcZB8TsfAzGhfJjTZWkbO6z28BAZwQUldUGObobEXIEdhviXp6p 6s8uU/4Ilyt+F2uME2oO2K5qryXF/SJDAB+LDIZA3gwbWJBjgEzR78Y0xtFsryi1Fq 8FhhW1Ht5n1bQ== From: Yosry Ahmed To: Sean Christopherson Cc: Paolo Bonzini , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Yosry Ahmed Subject: [PATCH v7 20/26] KVM: nSVM: Cache all used fields from VMCB12 Date: Tue, 3 Mar 2026 00:34:14 +0000 Message-ID: <20260303003421.2185681-21-yosry@kernel.org> X-Mailer: git-send-email 2.53.0.473.g4a7958ca14-goog In-Reply-To: <20260303003421.2185681-1-yosry@kernel.org> References: <20260303003421.2185681-1-yosry@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Currently, most fields used from VMCB12 are cached in svm->nested.{ctl/save}. This is mainly to avoid TOC-TOU bugs. However, for the save area, only the fields used in the consistency checks (i.e. nested_vmcb_check_save()) were being cached. Other fields are read directly from guest memory in nested_vmcb02_prepare_save(). While probably benign, this still makes it possible for TOC-TOU bugs to happen. For example, RAX, RSP, and RIP are read twice, once to store in VMCB02, and once to store in vcpu->arch.regs. It is possible for the guest to modify the value between both reads, potentially causing nasty bugs. Harden against such bugs by caching everything in svm->nested.save. Cache all the needed fields, and keep all accesses to the VMCB12 strictly in nested_svm_vmrun() for caching and early error injection. Following changes will further limit the access to the VMCB12 in the nested VMRUN path. Introduce vmcb12_is_dirty() to use with the cached control fields instead of vmcb_is_dirty(), similar to vmcb12_is_intercept(). Opportunistically order the copies in __nested_copy_vmcb_save_to_cache() by the order in which the fields are defined in struct vmcb_save_area. Signed-off-by: Yosry Ahmed --- arch/x86/kvm/svm/nested.c | 118 ++++++++++++++++++++++---------------- arch/x86/kvm/svm/svm.c | 2 +- arch/x86/kvm/svm/svm.h | 27 ++++++++- 3 files changed, 94 insertions(+), 53 deletions(-) diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index 5994e309831d0..f89040a467d4a 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -518,11 +518,11 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_v= cpu *vcpu, to->asid =3D from->asid; to->msrpm_base_pa &=3D ~0x0fffULL; to->iopm_base_pa &=3D ~0x0fffULL; + to->clean =3D from->clean; =20 #ifdef CONFIG_KVM_HYPERV /* Hyper-V extensions (Enlightened VMCB) */ if (kvm_hv_hypercall_enabled(vcpu)) { - to->clean =3D from->clean; memcpy(&to->hv_enlightenments, &from->hv_enlightenments, sizeof(to->hv_enlightenments)); } @@ -538,19 +538,34 @@ void nested_copy_vmcb_control_to_cache(struct vcpu_sv= m *svm, static void __nested_copy_vmcb_save_to_cache(struct vmcb_save_area_cached = *to, struct vmcb_save_area *from) { - /* - * Copy only fields that are validated, as we need them - * to avoid TOC/TOU races. - */ + to->es =3D from->es; to->cs =3D from->cs; + to->ss =3D from->ss; + to->ds =3D from->ds; + to->gdtr =3D from->gdtr; + to->idtr =3D from->idtr; + + to->cpl =3D from->cpl; =20 to->efer =3D from->efer; - to->cr0 =3D from->cr0; - to->cr3 =3D from->cr3; to->cr4 =3D from->cr4; - - to->dr6 =3D from->dr6; + to->cr3 =3D from->cr3; + to->cr0 =3D from->cr0; to->dr7 =3D from->dr7; + to->dr6 =3D from->dr6; + + to->rflags =3D from->rflags; + to->rip =3D from->rip; + to->rsp =3D from->rsp; + + to->s_cet =3D from->s_cet; + to->ssp =3D from->ssp; + to->isst_addr =3D from->isst_addr; + + to->rax =3D from->rax; + to->cr2 =3D from->cr2; + + svm_copy_lbrs(to, from); } =20 void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm, @@ -692,8 +707,10 @@ static bool nested_vmcb12_has_lbrv(struct kvm_vcpu *vc= pu) (to_svm(vcpu)->nested.ctl.misc_ctl2 & SVM_MISC2_ENABLE_V_LBR); } =20 -static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *= vmcb12) +static void nested_vmcb02_prepare_save(struct vcpu_svm *svm) { + struct vmcb_ctrl_area_cached *control =3D &svm->nested.ctl; + struct vmcb_save_area_cached *save =3D &svm->nested.save; bool new_vmcb12 =3D false; struct vmcb *vmcb01 =3D svm->vmcb01.ptr; struct vmcb *vmcb02 =3D svm->nested.vmcb02.ptr; @@ -709,48 +726,48 @@ static void nested_vmcb02_prepare_save(struct vcpu_sv= m *svm, struct vmcb *vmcb12 svm->nested.force_msr_bitmap_recalc =3D true; } =20 - if (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_SEG))) { - vmcb02->save.es =3D vmcb12->save.es; - vmcb02->save.cs =3D vmcb12->save.cs; - vmcb02->save.ss =3D vmcb12->save.ss; - vmcb02->save.ds =3D vmcb12->save.ds; - vmcb02->save.cpl =3D vmcb12->save.cpl; + if (unlikely(new_vmcb12 || vmcb12_is_dirty(control, VMCB_SEG))) { + vmcb02->save.es =3D save->es; + vmcb02->save.cs =3D save->cs; + vmcb02->save.ss =3D save->ss; + vmcb02->save.ds =3D save->ds; + vmcb02->save.cpl =3D save->cpl; vmcb_mark_dirty(vmcb02, VMCB_SEG); } =20 - if (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_DT))) { - vmcb02->save.gdtr =3D vmcb12->save.gdtr; - vmcb02->save.idtr =3D vmcb12->save.idtr; + if (unlikely(new_vmcb12 || vmcb12_is_dirty(control, VMCB_DT))) { + vmcb02->save.gdtr =3D save->gdtr; + vmcb02->save.idtr =3D save->idtr; vmcb_mark_dirty(vmcb02, VMCB_DT); } =20 if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) && - (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_CET)))) { - vmcb02->save.s_cet =3D vmcb12->save.s_cet; - vmcb02->save.isst_addr =3D vmcb12->save.isst_addr; - vmcb02->save.ssp =3D vmcb12->save.ssp; + (unlikely(new_vmcb12 || vmcb12_is_dirty(control, VMCB_CET)))) { + vmcb02->save.s_cet =3D save->s_cet; + vmcb02->save.isst_addr =3D save->isst_addr; + vmcb02->save.ssp =3D save->ssp; vmcb_mark_dirty(vmcb02, VMCB_CET); } =20 - kvm_set_rflags(vcpu, vmcb12->save.rflags | X86_EFLAGS_FIXED); + kvm_set_rflags(vcpu, save->rflags | X86_EFLAGS_FIXED); =20 svm_set_efer(vcpu, svm->nested.save.efer); =20 svm_set_cr0(vcpu, svm->nested.save.cr0); svm_set_cr4(vcpu, svm->nested.save.cr4); =20 - svm->vcpu.arch.cr2 =3D vmcb12->save.cr2; + svm->vcpu.arch.cr2 =3D save->cr2; =20 - kvm_rax_write(vcpu, vmcb12->save.rax); - kvm_rsp_write(vcpu, vmcb12->save.rsp); - kvm_rip_write(vcpu, vmcb12->save.rip); + kvm_rax_write(vcpu, save->rax); + kvm_rsp_write(vcpu, save->rsp); + kvm_rip_write(vcpu, save->rip); =20 /* In case we don't even reach vcpu_run, the fields are not updated */ - vmcb02->save.rax =3D vmcb12->save.rax; - vmcb02->save.rsp =3D vmcb12->save.rsp; - vmcb02->save.rip =3D vmcb12->save.rip; + vmcb02->save.rax =3D save->rax; + vmcb02->save.rsp =3D save->rsp; + vmcb02->save.rip =3D save->rip; =20 - if (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_DR))) { + if (unlikely(new_vmcb12 || vmcb12_is_dirty(control, VMCB_DR))) { vmcb02->save.dr7 =3D svm->nested.save.dr7 | DR7_FIXED_1; svm->vcpu.arch.dr6 =3D svm->nested.save.dr6 | DR6_ACTIVE_LOW; vmcb_mark_dirty(vmcb02, VMCB_DR); @@ -761,7 +778,7 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm = *svm, struct vmcb *vmcb12 * Reserved bits of DEBUGCTL are ignored. Be consistent with * svm_set_msr's definition of reserved bits. */ - svm_copy_lbrs(&vmcb02->save, &vmcb12->save); + svm_copy_lbrs(&vmcb02->save, save); vmcb02->save.dbgctl &=3D ~DEBUGCTL_RESERVED_BITS; } else { svm_copy_lbrs(&vmcb02->save, &vmcb01->save); @@ -984,28 +1001,29 @@ static void nested_svm_copy_common_state(struct vmcb= *from_vmcb, struct vmcb *to to_vmcb->save.spec_ctrl =3D from_vmcb->save.spec_ctrl; } =20 -int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa, - struct vmcb *vmcb12, bool from_vmrun) +int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa, bool from_= vmrun) { struct vcpu_svm *svm =3D to_svm(vcpu); + struct vmcb_ctrl_area_cached *control =3D &svm->nested.ctl; + struct vmcb_save_area_cached *save =3D &svm->nested.save; int ret; =20 trace_kvm_nested_vmenter(svm->vmcb->save.rip, vmcb12_gpa, - vmcb12->save.rip, - vmcb12->control.int_ctl, - vmcb12->control.event_inj, - vmcb12->control.misc_ctl, - vmcb12->control.nested_cr3, - vmcb12->save.cr3, + save->rip, + control->int_ctl, + control->event_inj, + control->misc_ctl, + control->nested_cr3, + save->cr3, KVM_ISA_SVM); =20 - trace_kvm_nested_intercepts(vmcb12->control.intercepts[INTERCEPT_CR] & 0x= ffff, - vmcb12->control.intercepts[INTERCEPT_CR] >> 16, - vmcb12->control.intercepts[INTERCEPT_EXCEPTION], - vmcb12->control.intercepts[INTERCEPT_WORD3], - vmcb12->control.intercepts[INTERCEPT_WORD4], - vmcb12->control.intercepts[INTERCEPT_WORD5]); + trace_kvm_nested_intercepts(control->intercepts[INTERCEPT_CR] & 0xffff, + control->intercepts[INTERCEPT_CR] >> 16, + control->intercepts[INTERCEPT_EXCEPTION], + control->intercepts[INTERCEPT_WORD3], + control->intercepts[INTERCEPT_WORD4], + control->intercepts[INTERCEPT_WORD5]); =20 =20 svm->nested.vmcb12_gpa =3D vmcb12_gpa; @@ -1015,8 +1033,8 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 v= mcb12_gpa, nested_svm_copy_common_state(svm->vmcb01.ptr, svm->nested.vmcb02.ptr); =20 svm_switch_vmcb(svm, &svm->nested.vmcb02); - nested_vmcb02_prepare_control(svm, vmcb12->save.rip, vmcb12->save.cs.base= ); - nested_vmcb02_prepare_save(svm, vmcb12); + nested_vmcb02_prepare_control(svm, save->rip, save->cs.base); + nested_vmcb02_prepare_save(svm); =20 ret =3D nested_svm_load_cr3(&svm->vcpu, svm->nested.save.cr3, nested_npt_enabled(svm), from_vmrun); @@ -1104,7 +1122,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu) =20 svm->nested.nested_run_pending =3D 1; =20 - if (enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12, true)) + if (enter_svm_guest_mode(vcpu, vmcb12_gpa, true)) goto out_exit_err; =20 if (nested_svm_merge_msrpm(vcpu)) diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 94e14badddfa2..19112ec48c0f7 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4885,7 +4885,7 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const= union kvm_smram *smram) vmcb12 =3D map.hva; nested_copy_vmcb_control_to_cache(svm, &vmcb12->control); nested_copy_vmcb_save_to_cache(svm, &vmcb12->save); - ret =3D enter_svm_guest_mode(vcpu, smram64->svm_guest_vmcb_gpa, vmcb12, f= alse); + ret =3D enter_svm_guest_mode(vcpu, smram64->svm_guest_vmcb_gpa, false); =20 if (ret) goto unmap_save; diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index 304328c33e960..388aaa5d63d29 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -140,13 +140,32 @@ struct kvm_vmcb_info { }; =20 struct vmcb_save_area_cached { + struct vmcb_seg es; struct vmcb_seg cs; + struct vmcb_seg ss; + struct vmcb_seg ds; + struct vmcb_seg gdtr; + struct vmcb_seg idtr; + u8 cpl; u64 efer; u64 cr4; u64 cr3; u64 cr0; u64 dr7; u64 dr6; + u64 rflags; + u64 rip; + u64 rsp; + u64 s_cet; + u64 ssp; + u64 isst_addr; + u64 rax; + u64 cr2; + u64 dbgctl; + u64 br_from; + u64 br_to; + u64 last_excp_from; + u64 last_excp_to; }; =20 struct vmcb_ctrl_area_cached { @@ -421,6 +440,11 @@ static inline bool vmcb_is_dirty(struct vmcb *vmcb, in= t bit) return !test_bit(bit, (unsigned long *)&vmcb->control.clean); } =20 +static inline bool vmcb12_is_dirty(struct vmcb_ctrl_area_cached *control, = int bit) +{ + return !test_bit(bit, (unsigned long *)&control->clean); +} + static __always_inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu) { return container_of(vcpu, struct vcpu_svm, vcpu); @@ -785,8 +809,7 @@ static inline bool nested_exit_on_nmi(struct vcpu_svm *= svm) =20 int __init nested_svm_init_msrpm_merge_offsets(void); =20 -int enter_svm_guest_mode(struct kvm_vcpu *vcpu, - u64 vmcb_gpa, struct vmcb *vmcb12, bool from_vmrun); +int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb_gpa, bool from_vm= run); void svm_leave_nested(struct kvm_vcpu *vcpu); void svm_free_nested(struct vcpu_svm *svm); int svm_allocate_nested(struct vcpu_svm *svm); --=20 2.53.0.473.g4a7958ca14-goog