From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: Sean Christopherson
Cc: Paolo Bonzini, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Yosry Ahmed, stable@vger.kernel.org
Subject: [PATCH v5 14/26] KVM: nSVM: Unify handling of VMRUN failures with proper cleanup
Date: Fri, 6 Feb 2026 19:08:39 +0000
Message-ID: <20260206190851.860662-15-yosry.ahmed@linux.dev>
In-Reply-To: <20260206190851.860662-1-yosry.ahmed@linux.dev>
References: <20260206190851.860662-1-yosry.ahmed@linux.dev>

There are currently two possible causes of VMRUN failures emulated by
KVM:

1) Consistency check failures. In this case, KVM updates the exit code
   in the mapped VMCB12 and exits early in nested_svm_vmrun(). This
   causes a few problems:

   A) KVM does not clear the GIF if the early consistency checks fail
      (because nested_svm_vmexit() is not called). Nothing requires
      GIF=0 before a VMRUN; from the APM:

        It is assumed that VMM software cleared GIF some time before
        executing the VMRUN instruction, to ensure an atomic state
        switch.

      So an early #VMEXIT from the consistency checks could leave the
      GIF set.

   B) svm_leave_smm() is missing consistency checks on the newly
      loaded guest state, because the checks aren't performed by
      enter_svm_guest_mode().

2) Failure to load L2's CR3 or merge the MSR bitmaps. In this case, a
   fully-fledged #VMEXIT injection is performed, as VMCB02 is already
   prepared.

Arguably, all VMRUN failures should be handled before VMCB02 is
prepared, but with proper cleanup (e.g. clearing the GIF).

Move all the potential failure checks inside enter_svm_guest_mode(),
before switching to VMCB02. On failure of any of these checks,
nested_svm_vmrun() synthesizes a minimal #VMEXIT through the new
nested_svm_vmrun_error_vmexit() helper. __nested_svm_vmexit() already
performs the necessary cleanup for a failed VMRUN, including
uninitializing the nested MMU and reloading L1's CR3.

This ensures that consistency check failures perform the necessary
cleanup, while other failures do not do too much cleanup. It also
leaves a unified path for handling VMRUN failures.

Cc: stable@vger.kernel.org
Fixes: 52c65a30a5c6 ("KVM: SVM: Check for nested vmrun intercept before emulating vmrun")
Suggested-by: Sean Christopherson
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
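To make the resulting flow concrete, below is a stand-alone sketch, not
kernel code: every function is a trivial stand-in for the KVM helper of
the same name, the "failure" is simulated, and the MSR-bitmap helper's
name is assumed from the commit message. It only shows the shape of the
unified error path: all checks run before VMCB02 is touched, and any
failure funnels into one cleanup helper.

	/* Compilable model of the post-patch VMRUN failure handling. */
	#include <stdbool.h>
	#include <stdio.h>

	static bool nested_vmcb_check_save(void)     { return true; }
	static bool nested_vmcb_check_controls(void) { return false; } /* simulated failure */
	static int nested_svm_load_cr3(void)         { return 0; }
	static int nested_svm_merge_msrpm(void)      { return 0; } /* assumed name */

	/* Every failure path returns before VMCB02 is prepared or switched to. */
	static int enter_svm_guest_mode(void)
	{
		if (!nested_vmcb_check_save() || !nested_vmcb_check_controls())
			return -1;
		if (nested_svm_load_cr3())
			return -1;
		if (nested_svm_merge_msrpm())
			return -1;

		puts("prepare VMCB02 and switch to it");
		return 0;
	}

	/* The single cleanup path for any emulated VMRUN failure. */
	static void nested_svm_vmrun_error_vmexit(void)
	{
		puts("leave guest mode, set SVM_EXIT_ERR in VMCB12, __nested_svm_vmexit()");
	}

	int main(void)
	{
		if (enter_svm_guest_mode())
			nested_svm_vmrun_error_vmexit();
		return 0;
	}

In the real code, enter_svm_guest_mode() returns the error to
nested_svm_vmrun(), which invokes nested_svm_vmrun_error_vmexit(); the
model above only mirrors that structure.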
 arch/x86/kvm/svm/nested.c | 66 +++++++++++++++++++++++++--------------
 1 file changed, 42 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index a852508d7419..918f6a6eaf56 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -934,22 +934,19 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
 				    vmcb12->control.intercepts[INTERCEPT_WORD4],
 				    vmcb12->control.intercepts[INTERCEPT_WORD5]);
 
-	svm->nested.vmcb12_gpa = vmcb12_gpa;
 
 	WARN_ON(svm->vmcb == svm->nested.vmcb02.ptr);
 
 	enter_guest_mode(vcpu);
 
+	if (!nested_vmcb_check_save(vcpu, &svm->nested.save) ||
+	    !nested_vmcb_check_controls(vcpu, &svm->nested.ctl))
+		return -EINVAL;
+
 	if (nested_npt_enabled(svm))
 		nested_svm_init_mmu_context(vcpu);
 
-	nested_svm_copy_common_state(svm->vmcb01.ptr, svm->nested.vmcb02.ptr);
-
-	svm_switch_vmcb(svm, &svm->nested.vmcb02);
-	nested_vmcb02_prepare_control(svm, vmcb12->save.rip, vmcb12->save.cs.base);
-	nested_vmcb02_prepare_save(svm, vmcb12);
-
 	ret = nested_svm_load_cr3(&svm->vcpu, svm->nested.save.cr3,
 				  nested_npt_enabled(svm), from_vmrun);
 	if (ret)
@@ -961,6 +958,17 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
 		return ret;
 	}
 
+	/*
+	 * Any VMRUN failure needs to happen before this point, such that the
+	 * nested #VMEXIT is injected properly by nested_svm_vmrun_error_vmexit().
+	 */
+
+	nested_svm_copy_common_state(svm->vmcb01.ptr, svm->nested.vmcb02.ptr);
+
+	svm_switch_vmcb(svm, &svm->nested.vmcb02);
+	nested_vmcb02_prepare_control(svm, vmcb12->save.rip, vmcb12->save.cs.base);
+	nested_vmcb02_prepare_save(svm, vmcb12);
+
 	if (!from_vmrun)
 		kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
 
@@ -979,6 +987,8 @@ static void __nested_svm_vmexit(struct vcpu_svm *svm)
 	struct vmcb *vmcb01 = svm->vmcb01.ptr;
 	struct kvm_vcpu *vcpu = &svm->vcpu;
 
+	WARN_ON_ONCE(is_guest_mode(vcpu));
+
 	svm->nested.vmcb12_gpa = 0;
 	svm->nested.ctl.nested_cr3 = 0;
 
@@ -1002,6 +1012,20 @@ static void __nested_svm_vmexit(struct vcpu_svm *svm)
 		kvm_queue_exception(vcpu, DB_VECTOR);
 }
 
+static void nested_svm_vmrun_error_vmexit(struct kvm_vcpu *vcpu, struct vmcb *vmcb12)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	WARN_ON_ONCE(svm->vmcb == svm->nested.vmcb02.ptr);
+
+	leave_guest_mode(vcpu);
+
+	vmcb12->control.exit_code = SVM_EXIT_ERR;
+	vmcb12->control.exit_info_1 = 0;
+	vmcb12->control.exit_info_2 = 0;
+	__nested_svm_vmexit(svm);
+}
+
 int nested_svm_vmrun(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -1044,14 +1068,6 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
 	nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
 	nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
 
-	if (!nested_vmcb_check_save(vcpu, &svm->nested.save) ||
-	    !nested_vmcb_check_controls(vcpu, &svm->nested.ctl)) {
-		vmcb12->control.exit_code = SVM_EXIT_ERR;
-		vmcb12->control.exit_info_1 = 0;
-		vmcb12->control.exit_info_2 = 0;
-		goto out;
-	}
-
 	/*
 	 * Since vmcb01 is not in use, we can use it to store some of the L1
 	 * state.
@@ -1072,14 +1088,9 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
 		svm->nmi_l1_to_l2 = false;
 		svm->soft_int_injected = false;
 
-		svm->vmcb->control.exit_code = SVM_EXIT_ERR;
-		svm->vmcb->control.exit_info_1 = 0;
-		svm->vmcb->control.exit_info_2 = 0;
-
-		nested_svm_vmexit(svm);
+		nested_svm_vmrun_error_vmexit(vcpu, vmcb12);
 	}
 
-out:
 	kvm_vcpu_unmap(vcpu, &map);
 
 	return ret;
@@ -1217,6 +1228,13 @@ void nested_svm_vmexit(struct vcpu_svm *svm)
 	if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS))
 		vmcb01->control.erap_ctl |= ERAP_CONTROL_CLEAR_RAP;
 
+	/*
+	 * nested_svm_vmexit() is intended for use only when KVM is synthesizing
+	 * a #VMEXIT after a successful nested VMRUN. All VMRUN consistency
+	 * checks must be performed before loading guest state, and so should
+	 * use __nested_svm_vmexit().
+	 */
+	WARN_ON_ONCE(svm->vmcb != svm->nested.vmcb02.ptr);
 	svm_switch_vmcb(svm, &svm->vmcb01);
 
 	/*
@@ -1903,9 +1921,6 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
 	if (nested_npt_enabled(svm))
 		nested_svm_init_mmu_context(vcpu);
 
-	svm_switch_vmcb(svm, &svm->nested.vmcb02);
-	nested_vmcb02_prepare_control(svm, svm->vmcb->save.rip, svm->vmcb->save.cs.base);
-
 	/*
 	 * While the nested guest CR3 is already checked and set by
 	 * KVM_SET_SREGS, it was set when nested state was yet loaded,
@@ -1917,6 +1932,9 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
 	if (ret)
 		goto out_free;
 
+	svm_switch_vmcb(svm, &svm->nested.vmcb02);
+	nested_vmcb02_prepare_control(svm, svm->vmcb->save.rip, svm->vmcb->save.cs.base);
+
 	svm->nested.force_msr_bitmap_recalc = true;
 
 	kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
-- 
2.53.0.rc2.204.g2597b5adb4-goog