From nobody Thu Apr 2 20:28:00 2026 Received: from out-178.mta0.migadu.com (out-178.mta0.migadu.com [91.218.175.178]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C85F9335541 for ; Thu, 12 Feb 2026 23:08:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.178 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770937701; cv=none; b=TLVY5nTVm8lOHjZqOHCtPvKC/x9FY+IS0zi5nkEParmGDppBUYUPD2ThpgFZYqcJ/w8lMaPzpqsap5AIyLZDuZTROHvYpeaazItXGnU1mcPqV4U8hZiYMDbHqaAb5qW84jQHNMlewFbXnaOVU7fawpRYOGh6487/kEgsJn6Tt+U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1770937701; c=relaxed/simple; bh=MaFLxnPmIAi2D81bbPTiKq+h4zOElAFRVTBuKq1e8AE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=fkX6hGjEqXpQej0+1WE3dkICMG4AYunfg8YyMaTT5eHdZhMPP0Te6o4Zw1m1YS5fSZuuU1yYtKCdkjEmKriFQ4oW7aCxV9Ukw1SYubUqZXOxRrG4kLvBlNEWVov1Tq3WaKOHVe3iOrlKRZ+/ggrwawhuzBuTyPE2YgHVhmomBZo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=uuM7iAAI; arc=none smtp.client-ip=91.218.175.178 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="uuM7iAAI" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1;
	t=1770937697;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=DVqs+uZzYtA2VPzEWogopVZL7ayy3RSndHPHR6Lrphk=;
	b=uuM7iAAI+AkWVPDkTlGU4sZwrqvRWHBMqk9lVi8p2ITpr3ifJKVZ/mJIoJhDmMIXO5YZmw
	 9Y6cCQezV6uaPh8aW6JAXvGjmLkvhLqKVf5A/XvUOYvk06c5qSLiVAeIiAZ4B1bzbbf15m
	 owkfnhF4o6zcIOV7YQAPjUpNN0MJ2mc=
From: Yosry Ahmed
To: Sean Christopherson
Cc: Paolo Bonzini, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Yosry Ahmed, stable@vger.kernel.org
Subject: [RFC PATCH 1/5] KVM: nSVM: Do not use L2's RIP for vmcb02's NextRIP after first L2 VMRUN
Date: Thu, 12 Feb 2026 23:07:47 +0000
Message-ID: <20260212230751.1871720-2-yosry.ahmed@linux.dev>
In-Reply-To: <20260212230751.1871720-1-yosry.ahmed@linux.dev>
References: <20260212230751.1871720-1-yosry.ahmed@linux.dev>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-Migadu-Flow: FLOW_OUT
Content-Type: text/plain; charset="utf-8"

For guests with NRIPS disabled, L1 does not provide NextRIP when running
an L2 with an injected soft interrupt; instead, it advances L2's RIP
before running it. KVM uses L2's RIP as the NextRIP in vmcb02 to emulate
a CPU without NRIPS.

However, after L2 runs for the first time, NextRIP is updated by the CPU
and/or KVM, and L2's RIP is no longer the correct value to use in vmcb02.
Hence, after save/restore, do not use L2's RIP if a nested run is not
pending (i.e. L2 has run at least once); use the NextRIP value instead.

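As a reading aid, the selection rule described above can be modeled in
plain user-space C. This is a simplified sketch and not kernel code:
`struct nrips_inputs` and `pick_next_rip()` are hypothetical names, and
the booleans merely stand in for guest_cpu_cap_has(vcpu,
X86_FEATURE_NRIPS), boot_cpu_has(X86_FEATURE_NRIPS) and
svm->nested.nested_run_pending.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the state consulted when picking NextRIP. */
struct nrips_inputs {
	bool guest_has_nrips;     /* NRIPS is exposed to L1 */
	bool host_has_nrips;      /* NRIPS is supported by the hardware */
	bool nested_run_pending;  /* L2 has not run since L1's VMRUN */
	uint64_t cached_next_rip; /* models svm->nested.ctl.next_rip */
	uint64_t l2_rip;          /* models L2's current RIP */
	uint64_t old_next_rip;    /* value already stored in vmcb02 */
};

/* Returns the NextRIP value the modeled rule would leave in vmcb02. */
static uint64_t pick_next_rip(const struct nrips_inputs *in)
{
	/* NRIPS exposed to L1, or L2 already ran: use the cached NextRIP. */
	if (in->guest_has_nrips || !in->nested_run_pending)
		return in->cached_next_rip;

	/* First run with NRIPS hidden from L1: emulate nrips=0 with L2's RIP. */
	if (in->host_has_nrips)
		return in->l2_rip;

	/* Neither branch applies: the field is left untouched. */
	return in->old_next_rip;
}
```

The only case in which L2's RIP is chosen is the first run of L2 with
NRIPS hidden from L1, which is the narrowing this patch makes.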
Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE")
CC: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed
---
 arch/x86/kvm/svm/nested.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index de90b104a0dd..eebbe00714e3 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -844,14 +844,18 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 	vmcb02->control.event_inj_err = svm->nested.ctl.event_inj_err;

 	/*
-	 * next_rip is consumed on VMRUN as the return address pushed on the
+	 * NextRIP is consumed on VMRUN as the return address pushed on the
 	 * stack for injected soft exceptions/interrupts. If nrips is exposed
-	 * to L1, take it verbatim from vmcb12. If nrips is supported in
-	 * hardware but not exposed to L1, stuff the actual L2 RIP to emulate
-	 * what a nrips=0 CPU would do (L1 is responsible for advancing RIP
-	 * prior to injecting the event).
+	 * to L1, take it verbatim from vmcb12.
+	 *
+	 * If nrips is supported in hardware but not exposed to L1, stuff the
+	 * actual L2 RIP to emulate what a nrips=0 CPU would do (L1 is
+	 * responsible for advancing RIP prior to injecting the event). This is
+	 * only the case for the first L2 run after VMRUN. After that (e.g.
+	 * during save/restore), NextRIP is updated by the CPU and/or KVM, and
+	 * the value of the L2 RIP from vmcb12 should not be used.
 	 */
-	if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) || !svm->nested.nested_run_pending)
 		vmcb02->control.next_rip = svm->nested.ctl.next_rip;
 	else if (boot_cpu_has(X86_FEATURE_NRIPS))
 		vmcb02->control.next_rip = vmcb12_rip;
-- 
2.53.0.273.g2a3d683680-goog

From nobody Thu Apr 2 20:28:00 2026
From: Yosry Ahmed
To: Sean Christopherson
Cc: Paolo Bonzini, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Yosry Ahmed, stable@vger.kernel.org
Subject: [RFC PATCH 2/5] KVM: nSVM: Use the correct RIP when restoring vmcb02's control area
Date: Thu, 12 Feb 2026 23:07:48 +0000
Message-ID: <20260212230751.1871720-3-yosry.ahmed@linux.dev>
In-Reply-To: <20260212230751.1871720-1-yosry.ahmed@linux.dev>
References: <20260212230751.1871720-1-yosry.ahmed@linux.dev>

In svm_set_nested_state(), the value of RIP from vmcb02 is passed into
nested_vmcb02_prepare_control(). However, even if RIP is restored with
KVM_SET_REGS prior to KVM_SET_NESTED_STATE, its value is not reflected
into the VMCB until the vCPU is run. Use the value from KVM's cache
instead, which is what KVM_SET_REGS updates.

Note that the passed RIP is still incorrect if KVM_SET_REGS is not called
prior to KVM_SET_NESTED_STATE; this will be fixed separately.

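The stale-value problem can be illustrated with a toy model in user-space
C. It is only a sketch under simplifying assumptions: `struct vcpu_model`
and the `model_*`/`rip_from_*` helpers are hypothetical names, and the
"run" step is reduced to copying the cached RIP into the VMCB save area.

```c
#include <stdint.h>

/* Hypothetical model: a register cache plus the VMCB save area. */
struct vcpu_model {
	uint64_t cached_rip; /* what KVM_SET_REGS updates */
	uint64_t vmcb_rip;   /* only refreshed when the vCPU is run */
};

/* Models KVM_SET_REGS: only the cache is written. */
static void model_set_regs(struct vcpu_model *v, uint64_t rip)
{
	v->cached_rip = rip;
}

/* Models running the vCPU: the cached value reaches the VMCB. */
static void model_vcpu_run(struct vcpu_model *v)
{
	v->vmcb_rip = v->cached_rip;
}

/* Old behaviour: RIP is read straight from the VMCB save area. */
static uint64_t rip_from_vmcb(const struct vcpu_model *v)
{
	return v->vmcb_rip;
}

/* New behaviour: RIP is read from the cache. */
static uint64_t rip_from_cache(const struct vcpu_model *v)
{
	return v->cached_rip;
}
```

In this model, calling model_set_regs() and then rip_from_vmcb() without
an intervening model_vcpu_run() returns the old RIP, while
rip_from_cache() returns the value that was just restored.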
Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE")
CC: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed
---
 arch/x86/kvm/svm/nested.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index eebbe00714e3..aec17c80ed73 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1911,7 +1911,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
 	nested_copy_vmcb_control_to_cache(svm, ctl);

 	svm_switch_vmcb(svm, &svm->nested.vmcb02);
-	nested_vmcb02_prepare_control(svm, svm->vmcb->save.rip, svm->vmcb->save.cs.base);
+	nested_vmcb02_prepare_control(svm, kvm_rip_read(vcpu), svm->vmcb->save.cs.base);

 	/*
 	 * While the nested guest CR3 is already checked and set by
-- 
2.53.0.273.g2a3d683680-goog

From nobody Thu Apr 2 20:28:00 2026
From: Yosry Ahmed
To: Sean Christopherson
Cc: Paolo Bonzini, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Yosry Ahmed, stable@vger.kernel.org
Subject: [RFC PATCH 3/5] KVM: nSVM: Move updating NextRIP and soft IRQ RIPs into a helper
Date: Thu, 12 Feb 2026 23:07:49 +0000
Message-ID: <20260212230751.1871720-4-yosry.ahmed@linux.dev>
In-Reply-To: <20260212230751.1871720-1-yosry.ahmed@linux.dev>
References: <20260212230751.1871720-1-yosry.ahmed@linux.dev>

Move the logic for updating NextRIP and soft interrupt tracking fields
out of nested_vmcb02_prepare_control() into a helper, in preparation for
re-using the
same logic to fix up the RIPs during save/restore. No functional change
intended.

CC: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed
---
 arch/x86/kvm/svm/nested.c | 64 +++++++++++++++++++++++----------------
 arch/x86/kvm/svm/svm.h    |  2 ++
 2 files changed, 40 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index aec17c80ed73..af7a0113f269 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -741,6 +741,43 @@ static bool is_evtinj_nmi(u32 evtinj)
 	return type == SVM_EVTINJ_TYPE_NMI;
 }

+void nested_vmcb02_prepare_rips(struct kvm_vcpu *vcpu, unsigned long csbase,
+				unsigned long rip)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (WARN_ON_ONCE(svm->vmcb != svm->nested.vmcb02.ptr))
+		return;
+
+	/*
+	 * NextRIP is consumed on VMRUN as the return address pushed on the
+	 * stack for injected soft exceptions/interrupts. If nrips is exposed
+	 * to L1, take it verbatim.
+	 *
+	 * If nrips is supported in hardware but not exposed to L1, stuff the
+	 * actual L2 RIP to emulate what a nrips=0 CPU would do (L1 is
+	 * responsible for advancing RIP prior to injecting the event). This is
+	 * only the case for the first L2 run after VMRUN. After that (e.g.
+	 * during save/restore), NextRIP is updated by the CPU and/or KVM, and
+	 * the value of the L2 RIP should not be used.
+	 */
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) || !svm->nested.nested_run_pending)
+		svm->vmcb->control.next_rip = svm->nested.ctl.next_rip;
+	else if (boot_cpu_has(X86_FEATURE_NRIPS))
+		svm->vmcb->control.next_rip = rip;
+
+	if (!is_evtinj_soft(svm->nested.ctl.event_inj))
+		return;
+
+	svm->soft_int_injected = true;
+	svm->soft_int_csbase = csbase;
+	svm->soft_int_old_rip = rip;
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
+		svm->soft_int_next_rip = svm->nested.ctl.next_rip;
+	else
+		svm->soft_int_next_rip = rip;
+}
+
 static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 					  unsigned long vmcb12_rip,
 					  unsigned long vmcb12_csbase)
@@ -843,33 +880,8 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 	vmcb02->control.event_inj = svm->nested.ctl.event_inj;
 	vmcb02->control.event_inj_err = svm->nested.ctl.event_inj_err;

-	/*
-	 * NextRIP is consumed on VMRUN as the return address pushed on the
-	 * stack for injected soft exceptions/interrupts. If nrips is exposed
-	 * to L1, take it verbatim from vmcb12.
-	 *
-	 * If nrips is supported in hardware but not exposed to L1, stuff the
-	 * actual L2 RIP to emulate what a nrips=0 CPU would do (L1 is
-	 * responsible for advancing RIP prior to injecting the event). This is
-	 * only the case for the first L2 run after VMRUN. After that (e.g.
-	 * during save/restore), NextRIP is updated by the CPU and/or KVM, and
-	 * the value of the L2 RIP from vmcb12 should not be used.
-	 */
-	if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) || !svm->nested.nested_run_pending)
-		vmcb02->control.next_rip = svm->nested.ctl.next_rip;
-	else if (boot_cpu_has(X86_FEATURE_NRIPS))
-		vmcb02->control.next_rip = vmcb12_rip;
-
 	svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj);
-	if (is_evtinj_soft(vmcb02->control.event_inj)) {
-		svm->soft_int_injected = true;
-		svm->soft_int_csbase = vmcb12_csbase;
-		svm->soft_int_old_rip = vmcb12_rip;
-		if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
-			svm->soft_int_next_rip = svm->nested.ctl.next_rip;
-		else
-			svm->soft_int_next_rip = vmcb12_rip;
-	}
+	nested_vmcb02_prepare_rips(vcpu, vmcb12_csbase, vmcb12_rip);

 	/* LBR_CTL_ENABLE_MASK is controlled by svm_update_lbrv() */

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index ebd7b36b1ceb..057281dda487 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -809,6 +809,8 @@ void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
 void nested_sync_control_from_vmcb02(struct vcpu_svm *svm);
 void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm);
 void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb);
+void nested_vmcb02_prepare_rips(struct kvm_vcpu *vcpu, unsigned long csbase,
+				unsigned long rip);

 extern struct kvm_x86_nested_ops svm_nested_ops;

-- 
2.53.0.273.g2a3d683680-goog

From nobody Thu Apr 2 20:28:00 2026
From: Yosry Ahmed
To: Sean Christopherson
Cc: Paolo Bonzini, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Yosry Ahmed, stable@vger.kernel.org
Subject: [RFC PATCH 4/5] KVM: SVM: Recalculate nested RIPs after restoring REGS/SREGS
Date: Thu, 12 Feb 2026 23:07:50 +0000
Message-ID: <20260212230751.1871720-5-yosry.ahmed@linux.dev>
In-Reply-To: <20260212230751.1871720-1-yosry.ahmed@linux.dev>
References: <20260212230751.1871720-1-yosry.ahmed@linux.dev>

In the save/restore path, if KVM_SET_NESTED_STATE is performed before
restoring REGS and/or SREGS, the values of CS and RIP used to initialize
the vmcb02's NextRIP and soft interrupt tracking RIPs are incorrect.
Recalculate them after CS is set or REGS are restored. This is only
needed when a nested run is pending during restore. After L2 runs for the
first time, any soft interrupts injected by L1 are already delivered or
tracked by KVM separately for re-injection, so the CS and RIP values are
no longer relevant.

If KVM_SET_NESTED_STATE is performed after both REGS and SREGS are
restored, it will just overwrite the fields.

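The ordering issue can be sketched with a small user-space model. This is
an illustration under stated assumptions, not the kernel implementation:
`struct restore_model`, `restore_nested_state()` and `restore_regs()` are
hypothetical names, the model only covers the case of NRIPS hidden from
L1 with a nested run pending, and the `with_fixup` flag stands in for the
post_user_set_regs hook added by this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the restore-time state touched by this patch. */
struct restore_model {
	bool nested_run_pending; /* L2 has not run since L1's VMRUN */
	uint64_t rip;            /* cached L2 RIP */
	uint64_t next_rip;       /* models vmcb02's NextRIP */
};

/* Models the helper for a nrips=0 guest: NextRIP follows L2's RIP. */
static void model_prepare_rips(struct restore_model *m)
{
	m->next_rip = m->rip;
}

/* Models KVM_SET_NESTED_STATE with a nested run pending. */
static void restore_nested_state(struct restore_model *m)
{
	m->nested_run_pending = true;
	model_prepare_rips(m);
}

/* Models KVM_SET_REGS, optionally followed by the fixup hook. */
static void restore_regs(struct restore_model *m, uint64_t rip, bool with_fixup)
{
	m->rip = rip;
	if (with_fixup && m->nested_run_pending)
		model_prepare_rips(m);
}
```

With the fixup enabled, both restore orders end with NextRIP equal to the
restored RIP. Without it, restoring nested state first leaves NextRIP at
whatever RIP was cached before the registers were restored.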
Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE")
CC: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/svm/nested.c          |  4 +++-
 arch/x86/kvm/svm/svm.c             | 21 +++++++++++++++++++++
 arch/x86/kvm/x86.c                 |  2 ++
 5 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index de709fb5bd76..7221517ea3e6 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -54,6 +54,7 @@ KVM_X86_OP(cache_reg)
 KVM_X86_OP(get_rflags)
 KVM_X86_OP(set_rflags)
 KVM_X86_OP(get_if_flag)
+KVM_X86_OP_OPTIONAL(post_user_set_regs)
 KVM_X86_OP(flush_tlb_all)
 KVM_X86_OP(flush_tlb_current)
 #if IS_ENABLED(CONFIG_HYPERV)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ff07c45e3c73..feadd9579159 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1789,6 +1789,7 @@ struct kvm_x86_ops {
 	unsigned long (*get_rflags)(struct kvm_vcpu *vcpu);
 	void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
 	bool (*get_if_flag)(struct kvm_vcpu *vcpu);
+	void (*post_user_set_regs)(struct kvm_vcpu *vcpu);

 	void (*flush_tlb_all)(struct kvm_vcpu *vcpu);
 	void (*flush_tlb_current)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index af7a0113f269..22680aa31c28 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -766,7 +766,9 @@ void nested_vmcb02_prepare_rips(struct kvm_vcpu *vcpu, unsigned long csbase,
 	else if (boot_cpu_has(X86_FEATURE_NRIPS))
 		svm->vmcb->control.next_rip = rip;

-	if (!is_evtinj_soft(svm->nested.ctl.event_inj))
+	/* L1's injected events should be cleared after the first run of L2 */
+	if (!is_evtinj_soft(svm->nested.ctl.event_inj) ||
+	    WARN_ON_ONCE(!svm->nested.nested_run_pending))
 		return;

 	svm->soft_int_injected = true;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8f8bc863e214..5729da2b300d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1477,6 +1477,24 @@ static bool svm_get_if_flag(struct kvm_vcpu *vcpu)
 		: kvm_get_rflags(vcpu) & X86_EFLAGS_IF;
 }

+static void svm_fixup_nested_rips(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	/*
+	 * In the save/restore path, if nested state is restored before
+	 * RIP or CS, then fixing up the vmcb02 (and soft IRQ tracking) is
+	 * needed. This is only the case if a nested run is pending (i.e. L2
+	 * is yet to run after L1's VMRUN). Otherwise, any soft IRQ injected by
+	 * L1 should have been delivered to L2 or is being tracked separately by
+	 * KVM for re-injection. Similarly, NextRIP would have already been
+	 * updated by the CPU and/or KVM.
+	 */
+	if (svm->nested.nested_run_pending)
+		nested_vmcb02_prepare_rips(vcpu, svm->vmcb->save.cs.base,
+					   kvm_rip_read(vcpu));
+}
+
 static void svm_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
 {
 	kvm_register_mark_available(vcpu, reg);
@@ -1826,6 +1844,8 @@ static void svm_set_segment(struct kvm_vcpu *vcpu,
 	if (seg == VCPU_SREG_SS)
 		/* This is symmetric with svm_get_segment() */
 		svm->vmcb->save.cpl = (var->dpl & 3);
+	else if (seg == VCPU_SREG_CS)
+		svm_fixup_nested_rips(vcpu);

 	vmcb_mark_dirty(svm->vmcb, VMCB_SEG);
 }
@@ -5172,6 +5192,7 @@ struct kvm_x86_ops svm_x86_ops __initdata = {
 	.get_rflags = svm_get_rflags,
 	.set_rflags = svm_set_rflags,
 	.get_if_flag = svm_get_if_flag,
+	.post_user_set_regs = svm_fixup_nested_rips,

 	.flush_tlb_all = svm_flush_tlb_all,
 	.flush_tlb_current = svm_flush_tlb_current,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index db3f393192d9..35fe1d337273 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12112,6 +12112,8 @@ static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 	kvm_rip_write(vcpu, regs->rip);
 	kvm_set_rflags(vcpu, regs->rflags | X86_EFLAGS_FIXED);

+	kvm_x86_call(post_user_set_regs)(vcpu);
+
 	vcpu->arch.exception.pending = false;
 	vcpu->arch.exception_vmexit.pending = false;

-- 
2.53.0.273.g2a3d683680-goog

From nobody Thu Apr 2 20:28:00 2026
From: Yosry Ahmed
To: Sean Christopherson
Cc: Paolo Bonzini, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Yosry Ahmed
Subject: [RFC PATCH 5/5] DO NOT MERGE: KVM: selftests: Reproduce nested RIP restore bug
Date: Thu, 12 Feb 2026 23:07:51 +0000
Message-ID: <20260212230751.1871720-6-yosry.ahmed@linux.dev>
In-Reply-To: <20260212230751.1871720-1-yosry.ahmed@linux.dev>
References: <20260212230751.1871720-1-yosry.ahmed@linux.dev>

Update svm_nested_soft_inject_test such that L1 syncs to userspace before
running L2. The test then enables single-stepping and steps through guest
code until VMRUN is executed, and saves/restores the VM immediately after
(before L2 runs).

This reproduces a bug in save/restore where L2's RIP is not used
correctly to construct the vmcb02 at the destination.

Signed-off-by: Yosry Ahmed
---
 .../testing/selftests/kvm/lib/x86/processor.c |  3 +
 .../kvm/x86/svm_nested_soft_inject_test.c     | 74 +++++++++++++++----
 2 files changed, 61 insertions(+), 16 deletions(-)

diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index fab18e9be66c..3e8d516ec8d3 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -1291,6 +1291,9 @@ void vcpu_load_state(struct kvm_vcpu *vcpu, struct kvm_x86_state *state)

 	if (state->nested.size)
 		vcpu_nested_state_set(vcpu, &state->nested);
+
+	/* Switch between this and the call above */
+	// vcpu_regs_set(vcpu, &state->regs);
 }

 void kvm_x86_state_cleanup(struct kvm_x86_state *state)
diff --git a/tools/testing/selftests/kvm/x86/svm_nested_soft_inject_test.c b/tools/testing/selftests/kvm/x86/svm_nested_soft_inject_test.c
index 4bd1655f9e6d..dfefd8eed392 100644
--- a/tools/testing/selftests/kvm/x86/svm_nested_soft_inject_test.c
+++ b/tools/testing/selftests/kvm/x86/svm_nested_soft_inject_test.c
@@ -101,6 +101,7 @@ static void l1_guest_code(struct svm_test_data *svm, uint64_t is_nmi, uint64_t i
 		vmcb->control.next_rip = vmcb->save.rip;
 	}

+	GUEST_SYNC(true);
 	run_guest(vmcb, svm->vmcb_gpa);
 	__GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_VMMCALL,
 		       "Expected VMMCAL #VMEXIT, got '0x%lx', info1 = '0x%lx, info2 = '0x%lx'",
@@ -131,6 +132,7 @@ static void l1_guest_code(struct svm_test_data *svm, uint64_t is_nmi, uint64_t i
 	/* The return address pushed on stack, skip over UD2 */
 	vmcb->control.next_rip = vmcb->save.rip + 2;

+	GUEST_SYNC(true);
 	run_guest(vmcb, svm->vmcb_gpa);
 	__GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_HLT,
 		       "Expected HLT #VMEXIT, got '0x%lx', info1 = '0x%lx, info2 = '0x%lx'",
@@ -140,6 +142,24 @@ static void l1_guest_code(struct svm_test_data *svm, uint64_t is_nmi, uint64_t i
 	GUEST_DONE();
 }

+static struct kvm_vcpu *save_and_restore_vm(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
+{
+	struct kvm_x86_state *state = vcpu_save_state(vcpu);
+
+	kvm_vm_release(vm);
+	vcpu = vm_recreate_with_one_vcpu(vm);
+	vcpu_load_state(vcpu, state);
+	kvm_x86_state_cleanup(state);
+	return vcpu;
+}
+
+static bool is_nested_run_pending(struct kvm_vcpu *vcpu)
+{
+	struct kvm_x86_state *state = vcpu_save_state(vcpu);
+
+	return state->nested.size && (state->nested.flags & KVM_STATE_NESTED_RUN_PENDING);
+}
+
 static void run_test(bool is_nmi)
 {
 	struct kvm_vcpu *vcpu;
@@ -173,22 +193,44 @@ static void run_test(bool is_nmi)
 	memset(&debug, 0, sizeof(debug));
 	vcpu_guest_debug_set(vcpu, &debug);

-	struct ucall uc;
-
-	alarm(2);
-	vcpu_run(vcpu);
-	alarm(0);
-	TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
-
-	switch (get_ucall(vcpu, &uc)) {
-	case UCALL_ABORT:
-		REPORT_GUEST_ASSERT(uc);
-		break;
-		/* NOT REACHED */
-	case UCALL_DONE:
-		goto done;
-	default:
-		TEST_FAIL("Unknown ucall 0x%lx.", uc.cmd);
+	for (;;) {
+		struct kvm_guest_debug debug;
+		struct ucall uc;
+
+		alarm(2);
+		vcpu_run(vcpu);
+		alarm(0);
+		TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+
+		switch (get_ucall(vcpu, &uc)) {
+		case UCALL_SYNC:
+			/*
+			 * L1 syncs before calling run_guest(), single-step over
+			 * all instructions until VMRUN, and save+restore right
+			 * after it (before L2 actually runs).
+			 */
+			debug.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP;
+			vcpu_guest_debug_set(vcpu, &debug);
+
+			do {
+				vcpu_run(vcpu);
+				TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_DEBUG);
+			} while (!is_nested_run_pending(vcpu));
+
+			memset(&debug, 0, sizeof(debug));
+			vcpu_guest_debug_set(vcpu, &debug);
+			vcpu = save_and_restore_vm(vm, vcpu);
+			break;
+
+		case UCALL_ABORT:
+			REPORT_GUEST_ASSERT(uc);
+			break;
+			/* NOT REACHED */
+		case UCALL_DONE:
+			goto done;
+		default:
+			TEST_FAIL("Unknown ucall 0x%lx.", uc.cmd);
+		}
 	}
 done:
 	kvm_vm_free(vm);
-- 
2.53.0.273.g2a3d683680-goog