From nobody Tue Jun 23 03:45:39 2026
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Fri, 11 Mar 2022 03:27:41 +0000
In-Reply-To: <20220311032801.3467418-1-seanjc@google.com>
Message-Id: <20220311032801.3467418-2-seanjc@google.com>
Mime-Version: 1.0
References: <20220311032801.3467418-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.723.g4982287a31-goog
Subject: [PATCH 01/21] KVM: x86: Return immediately from
 x86_emulate_instruction()
 on code #DB
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Oliver Upton <oupton@google.com>,
        Peter Shier <pshier@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Return immediately if a code #DB is encountered during instruction
emulation.  Code #DBs have fault-like behavior, are higher priority than
any exceptions that occur on the code fetch itself, and obviously should
prevent decode/execution.
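
For illustration only, the ordering can be modeled in standalone C.  Everything below is hypothetical and heavily simplified (enum emul_result, check_code_bp() and emulate_insn() are not KVM code); the only hardware assumption is the architectural DR7 layout: enable bits L<i>/G<i> at bits 2i and 2i+1, and an R/W+LEN nibble at bit 16 + 4i, where 0 means a 1-byte instruction breakpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical return codes for this sketch only. */
enum emul_result { EMUL_OK, EMUL_CODE_DB, EMUL_FETCH_FAULT };

/*
 * Return a DR6-style mask (B0..B3) of the enabled instruction breakpoints
 * in dr[0..3] that match rip, assuming the architectural DR7 layout.
 */
static uint32_t check_code_bp(uint64_t rip, uint32_t dr7, const uint64_t dr[4])
{
	uint32_t dr6 = 0;

	for (int i = 0; i < 4; i++) {
		bool enabled = (dr7 >> (2 * i)) & 3;
		uint32_t rwlen = (dr7 >> (16 + 4 * i)) & 0xf;

		if (enabled && rwlen == 0 && dr[i] == rip)
			dr6 |= 1u << i;
	}
	return dr6;
}

/*
 * Toy emulation entry point: the code #DB check runs before the fetch,
 * because a code breakpoint is fault-like and outranks any fault on the
 * fetch itself.  fetch_faults stands in for e.g. a #PF on the code fetch.
 */
static enum emul_result emulate_insn(uint64_t rip, uint32_t dr7,
				     const uint64_t dr[4], bool fetch_faults,
				     uint32_t *dr6)
{
	*dr6 = check_code_bp(rip, dr7, dr);
	if (*dr6)
		return EMUL_CODE_DB;	/* return immediately, no decode */
	if (fetch_faults)
		return EMUL_FETCH_FAULT;
	return EMUL_OK;
}
```

With DR0 = 0x1000 and only DR7.L0 set, emulate_insn(0x1000, ...) reports the code #DB (B0 set in the mask) even when the fetch would also fault, which is the priority this patch enforces.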

Fixes: 4aa2691dcbd3 ("KVM: x86: Factor out x86 instruction emulation with decoding")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 31 +++++++++++++++++++------------
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4fa4d8269e5b..feacc0901c24 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8212,7 +8212,7 @@ int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_skip_emulated_instruction);
 
-static bool kvm_vcpu_check_breakpoint(struct kvm_vcpu *vcpu, int *r)
+static bool kvm_vcpu_check_code_breakpoint(struct kvm_vcpu *vcpu, int *r)
 {
 	if (unlikely(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP) &&
 	    (vcpu->arch.guest_debug_dr7 & DR7_BP_EN_MASK)) {
@@ -8281,25 +8281,23 @@ static bool is_vmware_backdoor_opcode(struct x86_emulate_ctxt *ctxt)
 }
 
 /*
- * Decode to be emulated instruction. Return EMULATION_OK if success.
+ * Decode an instruction for emulation.  The caller is responsible for handling
+ * code breakpoints.  Note, manually detecting code breakpoints is unnecessary
+ * (and wrong) when emulating on an intercepted fault-like exception[*], as
+ * code breakpoints have higher priority and thus have already been checked by
+ * hardware.
+ *
+ * [*] Except #MC, which is higher priority, but KVM should never emulate in
+ *     response to a machine check.
  */
 int x86_decode_emulated_instruction(struct kvm_vcpu *vcpu, int emulation_type,
 				    void *insn, int insn_len)
 {
-	int r = EMULATION_OK;
 	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
+	int r;
 
 	init_emulate_ctxt(vcpu);
 
-	/*
-	 * We will reenter on the same instruction since we do not set
-	 * complete_userspace_io. This does not handle watchpoints yet,
-	 * those would be handled in the emulate_ops.
-	 */
-	if (!(emulation_type & EMULTYPE_SKIP) &&
-	    kvm_vcpu_check_breakpoint(vcpu, &r))
-		return r;
-
 	r = x86_decode_insn(ctxt, insn, insn_len, emulation_type);
 
 	trace_kvm_emulate_insn_start(vcpu);
@@ -8332,6 +8330,15 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	if (!(emulation_type & EMULTYPE_NO_DECODE)) {
 		kvm_clear_exception_queue(vcpu);
 
+		/*
+		 * Return immediately if RIP hits a code breakpoint; such #DBs
+		 * are fault-like and are higher priority than any faults on
+		 * the code fetch itself.
+		 */
+		if (!(emulation_type & EMULTYPE_SKIP) &&
+		    kvm_vcpu_check_code_breakpoint(vcpu, &r))
+			return r;
+
 		r =3D x86_decode_emulated_instruction(vcpu, emulation_type,
 						    insn, insn_len);
 		if (r != EMULATION_OK)  {
-- 
2.35.1.723.g4982287a31-goog
From nobody Tue Jun 23 03:45:39 2026
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Fri, 11 Mar 2022 03:27:42 +0000
In-Reply-To: <20220311032801.3467418-1-seanjc@google.com>
Message-Id: <20220311032801.3467418-3-seanjc@google.com>
Mime-Version: 1.0
References: <20220311032801.3467418-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.723.g4982287a31-goog
Subject: [PATCH 02/21] KVM: nVMX: Unconditionally purge queued/injected events
 on nested "exit"
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Oliver Upton <oupton@google.com>,
        Peter Shier <pshier@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Drop pending exceptions and events queued for re-injection when leaving
nested guest mode, even if the "exit" is due to VM-Fail, SMI, or forced
by host userspace.  Failure to purge events could result in an event
belonging to L2 being injected into L1.

This _should_ never happen for VM-Fail as all events should be blocked by
nested_run_pending, but it's possible if KVM, not the L1 hypervisor, is
the source of VM-Fail when running vmcs02.

SMI is a nop (barring unknown bugs) as recognition of SMI and thus entry
to SMM is blocked by pending exceptions and re-injected events.

Forced exit is definitely buggy, but has likely gone unnoticed because
userspace probably follows the forced exit with KVM_SET_VCPU_EVENTS (or
some other ioctl() that purges the queue).
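
A toy model of the required ordering, assuming a drastically simplified event state (struct toy_vcpu, struct toy_vmcs12 and the toy_* helpers are hypothetical, not KVM's types): events that were mid-injection are recorded in vmcs12 first, and the purge then runs on every exit path, including the ones that skip the vmcs12 update in this model (VM-Fail, SMI, forced exit).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified vCPU event state. */
struct toy_vcpu {
	bool nmi_injected;
	bool exception_pending;
	bool exception_injected;
	bool interrupt_injected;
	uint8_t vector;
};

/* Hypothetical stand-in for vmcs12's IDT-vectoring information. */
struct toy_vmcs12 {
	bool idt_vectoring_valid;
	uint8_t idt_vectoring_vector;
};

/* Record an event that was being (re)injected into L2. */
static void toy_save_pending_event(const struct toy_vcpu *v,
				   struct toy_vmcs12 *v12)
{
	v12->idt_vectoring_valid = v->nmi_injected || v->exception_injected ||
				   v->interrupt_injected;
	v12->idt_vectoring_vector = v12->idt_vectoring_valid ? v->vector : 0;
}

/*
 * Toy nested "exit".  update_vmcs12 is false for the paths that skip the
 * vmcs12 update in this model.  The purge runs unconditionally, and only
 * after vmcs12 has been filled in.
 */
static void toy_nested_exit(struct toy_vcpu *v, struct toy_vmcs12 *v12,
			    bool update_vmcs12)
{
	if (update_vmcs12)
		toy_save_pending_event(v, v12);

	v->nmi_injected = false;
	v->exception_pending = false;
	v->exception_injected = false;
	v->interrupt_injected = false;
}
```

Swapping the two steps in toy_nested_exit() would leave idt_vectoring_valid false for an injected interrupt, i.e. the event would be lost instead of being reflected to L1.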

Fixes: 4f350c6dbcb9 ("kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f18744f7ff82..f09c6eff7af9 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4233,14 +4233,6 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 			nested_vmx_abort(vcpu,
 					 VMX_ABORT_SAVE_GUEST_MSR_FAIL);
 	}
-
-	/*
-	 * Drop what we picked up for L2 via vmx_complete_interrupts. It is
-	 * preserved above and would only end up incorrectly in L1.
-	 */
-	vcpu->arch.nmi_injected = false;
-	kvm_clear_exception_queue(vcpu);
-	kvm_clear_interrupt_queue(vcpu);
 }
 
 /*
@@ -4582,6 +4574,17 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 		WARN_ON_ONCE(nested_early_check);
 	}
 
+	/*
+	 * Drop events/exceptions that were queued for re-injection to L2
+	 * (picked up via vmx_complete_interrupts()), as well as exceptions
+	 * that were pending for L2.  Note, this must NOT be hoisted above
+	 * prepare_vmcs12(); events/exceptions queued for re-injection need to
+	 * be captured in vmcs12 (see vmcs12_save_pending_event()).
+	 */
+	vcpu->arch.nmi_injected = false;
+	kvm_clear_exception_queue(vcpu);
+	kvm_clear_interrupt_queue(vcpu);
+
 	vmx_switch_vmcs(vcpu, &vmx->vmcs01);
 
 	/* Update any VMCS fields that might have changed while L2 ran */
-- 
2.35.1.723.g4982287a31-goog
From nobody Tue Jun 23 03:45:39 2026
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Fri, 11 Mar 2022 03:27:43 +0000
In-Reply-To: <20220311032801.3467418-1-seanjc@google.com>
Message-Id: <20220311032801.3467418-4-seanjc@google.com>
Mime-Version: 1.0
References: <20220311032801.3467418-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.723.g4982287a31-goog
Subject: [PATCH 03/21] KVM: VMX: Drop bits 31:16 when shoving exception error
 code into VMCS
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Oliver Upton <oupton@google.com>,
        Peter Shier <pshier@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Deliberately truncate the exception error code when shoving it into the
VMCS (VM-Entry field for vmcs01 and vmcs02, VM-Exit field for vmcs12).
Intel CPUs are incapable of handling 32-bit error codes and will never
generate an error code with bits 31:16 set, but userspace can provide an
arbitrary error code via KVM_SET_VCPU_EVENTS.  Failure to drop the bits
on exception injection results in failed VM-Entry, as VMX disallows
setting bits 31:16.  Setting the bits on VM-Exit would at best confuse
L1, and at worst induce a nested VM-Entry failure, e.g. if L1 decided to
reinject the exception back into L2.
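
As a worked example of the truncation: if userspace passes 0x12345678 via KVM_SET_VCPU_EVENTS, only 0x5678 reaches the VMCS.  A trivial sketch of the (u16) cast, with hypothetical helper names (vmcs_error_code() and error_code_would_fail_entry() are not KVM functions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Keep only bits 15:0, mirroring the (u16) cast added by this patch. */
static uint32_t vmcs_error_code(uint32_t error_code)
{
	return (uint16_t)error_code;
}

/* True if writing error_code as-is would set the disallowed bits 31:16. */
static bool error_code_would_fail_entry(uint32_t error_code)
{
	return (error_code & 0xffff0000u) != 0;
}
```

The cast is lossy by design: bits 31:16 can never be delivered, so dropping them trades information that hardware could not have produced for a VM-Entry that succeeds.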

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c |  9 ++++++++-
 arch/x86/kvm/vmx/vmx.c    | 11 ++++++++++-
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f09c6eff7af9..7bdda9ef2828 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3808,7 +3808,14 @@ static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
 	u32 intr_info = nr | INTR_INFO_VALID_MASK;
 
 	if (vcpu->arch.exception.has_error_code) {
-		vmcs12->vm_exit_intr_error_code = vcpu->arch.exception.error_code;
+		/*
+		 * Intel CPUs will never generate an error code with bits 31:16
+		 * set, and more importantly VMX disallows setting bits 31:16
+		 * in the injected error code for VM-Entry.  Drop the bits to
+		 * mimic hardware and avoid inducing failure on nested VM-Entry
+		 * if L1 chooses to inject the exception back to L2.
+		 */
+		vmcs12->vm_exit_intr_error_code = (u16)vcpu->arch.exception.error_code;
 		intr_info |= INTR_INFO_DELIVER_CODE_MASK;
 	}
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e8963f5af618..a8ebe91fe9a5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1613,7 +1613,16 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
 	kvm_deliver_exception_payload(vcpu);
 
 	if (has_error_code) {
-		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, error_code);
+		/*
+		 * Despite the error code being architecturally defined as 32
+		 * bits, and the VMCS field being 32 bits, Intel CPUs and thus
+		 * VMX don't actually support setting bits 31:16.  Hardware
+		 * will (should) never provide a bogus error code, but KVM's
+		 * ABI lets userspace shove in arbitrary 32-bit values.  Drop
+		 * the upper bits to avoid VM-Fail; losing information that
+		 * doesn't really exist is preferable to killing the VM.
+		 */
+		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, (u16)error_code);
 		intr_info |= INTR_INFO_DELIVER_CODE_MASK;
 	}
 
-- 
2.35.1.723.g4982287a31-goog
From nobody Tue Jun 23 03:45:39 2026
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Fri, 11 Mar 2022 03:27:44 +0000
In-Reply-To: <20220311032801.3467418-1-seanjc@google.com>
Message-Id: <20220311032801.3467418-5-seanjc@google.com>
Mime-Version: 1.0
References: <20220311032801.3467418-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.723.g4982287a31-goog
Subject: [PATCH 04/21] KVM: x86: Don't check for code breakpoints when
 emulating on exception
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Oliver Upton <oupton@google.com>,
        Peter Shier <pshier@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Don't check for code breakpoints during instruction emulation if the
emulation was triggered by exception interception.  Code breakpoints are
the highest priority fault-like exception, and KVM only emulates on
exceptions that are fault-like.  Thus, if hardware signaled a different
exception, then the vCPU is already past the stage of checking for
code breakpoints.
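
The filtering added to kvm_vcpu_check_code_breakpoint() boils down to a mask test on the emulation type.  A standalone sketch; the EMULTYPE_* names come from this patch, but the bit values, EMULTYPE_CODE_BP_DONE and need_code_bp_check() are placeholders for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names from the patch; bit values are placeholders for this sketch. */
#define EMULTYPE_NO_DECODE	(1 << 0)
#define EMULTYPE_TRAP_UD	(1 << 1)
#define EMULTYPE_SKIP		(1 << 2)
#define EMULTYPE_TRAP_UD_FORCED	(1 << 4)
#define EMULTYPE_VMWARE_GP	(1 << 5)
#define EMULTYPE_PF		(1 << 6)

/*
 * Hypothetical mask of emulation types for which hardware is already past
 * the code breakpoint check: the instruction was skipped or already
 * decoded, or a lower-priority fault-like exception (#UD, #GP, #PF) was
 * intercepted.
 */
#define EMULTYPE_CODE_BP_DONE	(EMULTYPE_NO_DECODE | EMULTYPE_SKIP |	\
				 EMULTYPE_TRAP_UD | EMULTYPE_TRAP_UD_FORCED | \
				 EMULTYPE_VMWARE_GP | EMULTYPE_PF)

/* Should the emulator look for code #DBs itself? */
static bool need_code_bp_check(int emulation_type)
{
	return !(emulation_type & EMULTYPE_CODE_BP_DONE);
}
```

Only emulation types with none of these bits set, e.g. 0 for emulation that was not triggered by an intercepted exception, still perform the check.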

This is likely a glorified nop in terms of functionality; it is more for
clarification and is technically an optimization.  Intel's SDM explicitly
states that vmcs.GUEST_RFLAGS.RF on exception interception is the same as
the value that would have been saved on the stack had the exception not
been intercepted, i.e. will be '1' due to all fault-like exceptions
setting RF to '1'.  AMD says "guest state saved ... is the processor state
as of the moment the intercept triggers", but that raises the question,
"when does the intercept trigger?".
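
The RF argument can be expressed as a small model.  It assumes only the architectural RFLAGS.RF bit (bit 16), the behavior described above (fault-like exceptions save RF=1), and the architectural rule that RF=1 suppresses instruction breakpoints; rflags_saved_on_fault() and code_bp_can_fire() are hypothetical helpers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_RF	(1ull << 16)	/* RFLAGS.RF (resume flag) */

/* RFLAGS image saved for a fault-like exception: RF is set to 1. */
static uint64_t rflags_saved_on_fault(uint64_t rflags)
{
	return rflags | X86_EFLAGS_RF;
}

/* An instruction breakpoint fires only if it matches and RF is clear. */
static bool code_bp_can_fire(uint64_t rflags, bool bp_matches)
{
	return bp_matches && !(rflags & X86_EFLAGS_RF);
}
```

In this model, guest RFLAGS captured on a #PF/#GP/#UD intercept already carries RF=1, so a code breakpoint at that RIP could not fire anyway, which is why skipping the check is expected to be a functional nop.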

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index feacc0901c24..3636206ed3e4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8212,8 +8212,24 @@ int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_skip_emulated_instruction);
 
-static bool kvm_vcpu_check_code_breakpoint(struct kvm_vcpu *vcpu, int *r)
+static bool kvm_vcpu_check_code_breakpoint(struct kvm_vcpu *vcpu,
+					   int emulation_type, int *r)
 {
+	WARN_ON_ONCE(emulation_type & EMULTYPE_NO_DECODE);
+
+	/*
+	 * Do not check for code breakpoints if hardware has already done the
+	 * checks, as inferred from the emulation type.  On NO_DECODE and SKIP,
+	 * the instruction has passed all exception checks, and all intercepted
+	 * exceptions that trigger emulation have lower priority than code
+	 * breakpoints, i.e. the fact that the intercepted exception occurred
+	 * means any code breakpoints have already been serviced.
+	 */
+	if (emulation_type & (EMULTYPE_NO_DECODE | EMULTYPE_SKIP |
+			      EMULTYPE_TRAP_UD | EMULTYPE_TRAP_UD_FORCED |
+			      EMULTYPE_VMWARE_GP | EMULTYPE_PF))
+		return false;
+
 	if (unlikely(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP) &&
 	    (vcpu->arch.guest_debug_dr7 & DR7_BP_EN_MASK)) {
 		struct kvm_run *kvm_run = vcpu->run;
@@ -8335,8 +8351,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		 * are fault-like and are higher priority than any faults on
 		 * the code fetch itself.
 		 */
-		if (!(emulation_type & EMULTYPE_SKIP) &&
-		    kvm_vcpu_check_code_breakpoint(vcpu, &r))
+		if (kvm_vcpu_check_code_breakpoint(vcpu, emulation_type, &r))
 			return r;
 
 		r = x86_decode_emulated_instruction(vcpu, emulation_type,
-- 
2.35.1.723.g4982287a31-goog
From nobody Tue Jun 23 03:45:39 2026
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Fri, 11 Mar 2022 03:27:45 +0000
In-Reply-To: <20220311032801.3467418-1-seanjc@google.com>
Message-Id: <20220311032801.3467418-6-seanjc@google.com>
Mime-Version: 1.0
References: <20220311032801.3467418-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.723.g4982287a31-goog
Subject: [PATCH 05/21] KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as
 fault-like
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Oliver Upton <oupton@google.com>,
        Peter Shier <pshier@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Exclude General Detect #DBs, which have fault-like behavior but also have
a non-zero payload (DR6.BD=1), from nVMX's handling of pending debug
traps.  Opportunistically rewrite the comment to better document what is
being checked, i.e. "has a non-zero payload" vs. "has a payload", and to
call out the many caveats surrounding #DBs that KVM dodges one way or
another.
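
A minimal userspace model of the reworked helper may make the masking
clearer.  It is only a sketch: the vCPU's pending-exception state is reduced
to three hypothetical parameters, and the DR6 bit definitions are assumed to
mirror the architectural positions (B0 in bit 0, BD in bit 13, BS in bit 14,
BT in bit 15).  A General Detect #DB reports DR6.BD, so masking BD off makes
such a #DB look like it carries no trap payload:

```c
#include <assert.h>
#include <stdbool.h>

/* DR6 status bits (architectural positions), mirrored for a standalone sketch. */
#define DR6_B0	(1ul << 0)	/* breakpoint 0 condition detected */
#define DR6_BD	(1ul << 13)	/* General Detect (DR7.GD=1) */
#define DR6_BS	(1ul << 14)	/* single-step */
#define DR6_BT	(1ul << 15)	/* task switch (TSS T flag) */

#define DB_VECTOR	1

static_assert(DR6_BD == 0x2000, "BD is bit 13 of DR6");

/*
 * Hypothetical stand-in for vmx_get_pending_dbg_trap(): returns the
 * to-be-DR6 payload of a pending #DB with BD removed, or 0 if no #DB is
 * pending.
 */
static unsigned long pending_dbg_trap_sketch(bool pending, int nr,
					     unsigned long payload)
{
	if (!pending || nr != DB_VECTOR)
		return 0;

	/* General Detect #DBs are always fault-like, so drop BD. */
	return payload & ~DR6_BD;
}
```

In this model a pending General Detect #DB (payload DR6_BD) yields 0 and is
processed like any other fault-like exception, whereas a pending single-step
#DB (payload DR6_BS) still yields DR6_BS and is treated as a debug trap.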

Cc: Oliver Upton <oupton@google.com>
Cc: Peter Shier <pshier@google.com>
Fixes: 684c0422da71 ("KVM: nVMX: Handle pending #DB when injecting INIT VM-exit")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 7bdda9ef2828..298a58eaac32 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3832,16 +3832,29 @@ static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
 }
 
 /*
- * Returns true if a debug trap is pending delivery.
+ * Returns true if a debug trap is (likely) pending delivery.  Infer the class
+ * of a #DB (trap-like vs. fault-like) from the exception payload (to-be-DR6).
+ * Using the payload is flawed because code breakpoints (fault-like) and data
+ * breakpoints (trap-like) set the same bits in DR6 (breakpoint detected), i.e.
+ * this will return false positives if a to-be-injected code breakpoint #DB is
+ * pending (from KVM's perspective, but not "pending" across an instruction
+ * boundary).  ICEBP, a.k.a. INT1, is also not reflected here even though it
+ * too is trap-like.
  *
- * In KVM, debug traps bear an exception payload. As such, the class of a #DB
- * exception may be inferred from the presence of an exception payload.
+ * KVM "works" despite these flaws as ICEBP isn't currently supported by the
+ * emulator, Monitor Trap Flag is not marked pending on intercepted #DBs (the
+ * #DB has already happened), and MTF isn't marked pending on code breakpoints
+ * from the emulator (because such #DBs are fault-like and thus don't trigger
+ * actions that fire on instruction retire).
  */
-static inline bool vmx_pending_dbg_trap(struct kvm_vcpu *vcpu)
+static inline unsigned long vmx_get_pending_dbg_trap(struct kvm_vcpu *vcpu)
 {
-	return vcpu->arch.exception.pending &&
-			vcpu->arch.exception.nr == DB_VECTOR &&
-			vcpu->arch.exception.payload;
+	if (!vcpu->arch.exception.pending ||
+	    vcpu->arch.exception.nr != DB_VECTOR)
+		return 0;
+
+	/* General Detect #DBs are always fault-like. */
+	return vcpu->arch.exception.payload & ~DR6_BD;
 }
 
 /*
@@ -3853,9 +3866,10 @@ static inline bool vmx_pending_dbg_trap(struct kvm_vcpu *vcpu)
  */
 static void nested_vmx_update_pending_dbg(struct kvm_vcpu *vcpu)
 {
-	if (vmx_pending_dbg_trap(vcpu))
-		vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
-			    vcpu->arch.exception.payload);
+	unsigned long pending_dbg = vmx_get_pending_dbg_trap(vcpu);
+
+	if (pending_dbg)
+		vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, pending_dbg);
 }
 
 static bool nested_vmx_preemption_timer_pending(struct kvm_vcpu *vcpu)
@@ -3912,7 +3926,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 	 * while delivering the pending exception.
 	 */
 
-	if (vcpu->arch.exception.pending && !vmx_pending_dbg_trap(vcpu)) {
+	if (vcpu->arch.exception.pending && !vmx_get_pending_dbg_trap(vcpu)) {
 		if (vmx->nested.nested_run_pending)
 			return -EBUSY;
 		if (!nested_vmx_check_exception(vcpu, &exit_qual))
-- 
2.35.1.723.g4982287a31-goog
From nobody Tue Jun 23 03:45:39 2026
Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 7253CC433FE
	for <linux-kernel@archiver.kernel.org>; Fri, 11 Mar 2022 03:28:34 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S238525AbiCKD3e (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Thu, 10 Mar 2022 22:29:34 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41440 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1346026AbiCKD3Q (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 10 Mar 2022 22:29:16 -0500
Received: from mail-pj1-x1049.google.com (mail-pj1-x1049.google.com
 [IPv6:2607:f8b0:4864:20::1049])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 08A94EB16A
        for <linux-kernel@vger.kernel.org>;
 Thu, 10 Mar 2022 19:28:14 -0800 (PST)
Received: by mail-pj1-x1049.google.com with SMTP id
 x6-20020a17090aa38600b001c227fbfbc5so1268452pjp.0
        for <linux-kernel@vger.kernel.org>;
 Thu, 10 Mar 2022 19:28:14 -0800 (PST)
X-Received: from seanjc.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:3e5])
 (user=seanjc job=sendgmr) by 2002:a17:90b:4b4a:b0:1bf:83d:6805 with SMTP id
 mi10-20020a17090b4b4a00b001bf083d6805mr20091923pjb.174.1646969293511; Thu, 10
 Mar 2022 19:28:13 -0800 (PST)
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Fri, 11 Mar 2022 03:27:46 +0000
In-Reply-To: <20220311032801.3467418-1-seanjc@google.com>
Message-Id: <20220311032801.3467418-7-seanjc@google.com>
Mime-Version: 1.0
References: <20220311032801.3467418-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.723.g4982287a31-goog
Subject: [PATCH 06/21] KVM: nVMX: Prioritize TSS T-flag #DBs over Monitor Trap
 Flag
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Oliver Upton <oupton@google.com>,
        Peter Shier <pshier@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Service TSS T-flag #DBs prior to pending MTFs, as such #DBs are higher
priority than MTF.  KVM itself doesn't emulate TSS #DBs, and any such
exceptions injected from L1 will be handled by hardware (or morphed to
a fault-like exception if injection fails), but theoretically userspace
could pend a TSS T-flag #DB in conjunction with a pending MTF.

Note, there's no known use case this fixes, it's purely to be technically
correct with respect to Intel's SDM.
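
The new priority check can be modelled in isolation as below.  This is an
illustrative sketch, not KVM code: the helper names and the three-parameter
view of the pending exception are hypothetical, and the DR6 bit positions
are assumed to match the architecture.  A pending exception is serviced
before MTF unless it is a debug trap whose payload has bits other than BT
(and BD, which the earlier helper already strips):

```c
#include <assert.h>
#include <stdbool.h>

/* DR6 status bits (architectural positions), mirrored for a standalone sketch. */
#define DR6_BD	(1ul << 13)	/* General Detect */
#define DR6_BS	(1ul << 14)	/* single-step */
#define DR6_BT	(1ul << 15)	/* task switch (TSS T flag) */

#define DB_VECTOR	1

static_assert(DR6_BT == 0x8000, "BT is bit 15 of DR6");

/* Hypothetical stand-in for vmx_get_pending_dbg_trap(). */
static unsigned long pending_dbg_trap_sketch(bool pending, int nr,
					     unsigned long payload)
{
	if (!pending || nr != DB_VECTOR)
		return 0;
	return payload & ~DR6_BD;
}

/*
 * Hypothetical model of the condition in vmx_check_nested_events(): true if
 * the pending exception is handled before a pending Monitor Trap Flag exit.
 * A TSS T-flag #DB (BT only) now qualifies; other debug traps do not.
 */
static bool exception_beats_mtf(bool pending, int nr, unsigned long payload)
{
	return pending &&
	       !(pending_dbg_trap_sketch(pending, nr, payload) & ~DR6_BT);
}
```

With this model, a #DB whose payload is only DR6_BT is serviced ahead of
MTF, a single-step #DB (DR6_BS) is still deferred behind MTF, and a
non-#DB exception such as #GP (vector 13) always goes first.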

Cc: Oliver Upton <oupton@google.com>
Cc: Peter Shier <pshier@google.com>
Fixes: 5ef8acbdd687 ("KVM: nVMX: Emulate MTF when performing instruction emulation")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 298a58eaac32..53ac6ebb3b3b 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3918,15 +3918,17 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 	}
 
 	/*
-	 * Process any exceptions that are not debug traps before MTF.
+	 * Process exceptions that are higher priority than Monitor Trap Flag:
+	 * fault-like exceptions, TSS T flag #DB (not emulated by KVM, but
+	 * could theoretically come in from userspace), and ICEBP (INT1).
 	 *
 	 * Note that only a pending nested run can block a pending exception.
 	 * Otherwise an injected NMI/interrupt should either be
 	 * lost or delivered to the nested hypervisor in the IDT_VECTORING_INFO,
 	 * while delivering the pending exception.
 	 */
-
-	if (vcpu->arch.exception.pending && !vmx_get_pending_dbg_trap(vcpu)) {
+	if (vcpu->arch.exception.pending &&
+	    !(vmx_get_pending_dbg_trap(vcpu) & ~DR6_BT)) {
 		if (vmx->nested.nested_run_pending)
 			return -EBUSY;
 		if (!nested_vmx_check_exception(vcpu, &exit_qual))
-- 
2.35.1.723.g4982287a31-goog
From nobody Tue Jun 23 03:45:39 2026
Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 9AB21C433F5
	for <linux-kernel@archiver.kernel.org>; Fri, 11 Mar 2022 03:28:42 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S242495AbiCKD3l (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Thu, 10 Mar 2022 22:29:41 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41444 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1346053AbiCKD3S (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 10 Mar 2022 22:29:18 -0500
Received: from mail-pg1-x549.google.com (mail-pg1-x549.google.com
 [IPv6:2607:f8b0:4864:20::549])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B19B9EBAFF
        for <linux-kernel@vger.kernel.org>;
 Thu, 10 Mar 2022 19:28:15 -0800 (PST)
Received: by mail-pg1-x549.google.com with SMTP id
 f21-20020a633815000000b0038105768c61so843560pga.21
        for <linux-kernel@vger.kernel.org>;
 Thu, 10 Mar 2022 19:28:15 -0800 (PST)
X-Received: from seanjc.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:3e5])
 (user=seanjc job=sendgmr) by 2002:a17:903:2348:b0:151:ff4f:e6b2 with SMTP id
 c8-20020a170903234800b00151ff4fe6b2mr8464100plh.51.1646969295202; Thu, 10 Mar
 2022 19:28:15 -0800 (PST)
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Fri, 11 Mar 2022 03:27:47 +0000
In-Reply-To: <20220311032801.3467418-1-seanjc@google.com>
Message-Id: <20220311032801.3467418-8-seanjc@google.com>
Mime-Version: 1.0
References: <20220311032801.3467418-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.723.g4982287a31-goog
Subject: [PATCH 07/21] KVM: x86: Treat #DBs from the emulator as fault-like
 (code and DR7.GD=1)
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Oliver Upton <oupton@google.com>,
        Peter Shier <pshier@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Add a dedicated "exception type" for #DBs, as #DBs can be fault-like or
trap-like depending on the sub-type of #DB, and effectively defer the
decision of what to do with the #DB to the caller.

For the emulator's two calls to exception_type(), treat the #DB as
fault-like, as the emulator handles only code breakpoint and general
detect #DBs, both of which are fault-like.
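
The classification and the emulator's use of it can be sketched as a small
standalone program.  Assumptions: EXCPT_FAULT is taken to be 0 (the hunk
only shows values 1-4), the vector numbers are the architectural ones, and
emulated_insn_retired() is a hypothetical condensation of the emulator's
"!have_exception || exception_type() == EXCPT_TRAP" check, which gates
instruction-completion work such as the PERF_COUNT_HW_INSTRUCTIONS event:

```c
#include <assert.h>
#include <stdbool.h>

enum { DB_VECTOR = 1, NMI_VECTOR = 2, BP_VECTOR = 3, OF_VECTOR = 4,
       DF_VECTOR = 8, GP_VECTOR = 13, MC_VECTOR = 18 };

#define EXCPT_FAULT	0	/* assumed value; not shown in the hunk */
#define EXCPT_TRAP	1
#define EXCPT_ABORT	2
#define EXCPT_INTERRUPT	3
#define EXCPT_DB	4

static_assert(EXCPT_DB != EXCPT_TRAP && EXCPT_DB != EXCPT_FAULT,
	      "#DB needs its own class");

static int exception_type_sketch(int vector)
{
	unsigned int mask;

	/* Not an exception vector in this model. */
	if (vector < 0 || vector > 31 || vector == NMI_VECTOR)
		return EXCPT_INTERRUPT;

	mask = 1u << vector;

	/* #DB: the caller must consult other state (e.g. DR6) to classify it. */
	if (mask & (1u << DB_VECTOR))
		return EXCPT_DB;

	if (mask & ((1u << BP_VECTOR) | (1u << OF_VECTOR)))
		return EXCPT_TRAP;

	if (mask & ((1u << DF_VECTOR) | (1u << MC_VECTOR)))
		return EXCPT_ABORT;

	return EXCPT_FAULT;
}

/*
 * Hypothetical: the instruction is treated as completed only if no
 * exception occurred or the exception is a trap.  EXCPT_DB fails the
 * EXCPT_TRAP comparison, so emulator #DBs behave as fault-like.
 */
static bool emulated_insn_retired(bool have_exception, int vector)
{
	return !have_exception || exception_type_sketch(vector) == EXCPT_TRAP;
}
```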

For event injection, which uses exception_type() to determine whether to
set EFLAGS.RF=1 on the stack, keep the current behavior of not setting
RF=1 for #DBs.  Intel and AMD explicitly state RF isn't set on code #DBs,
so exempting them by failing the "== EXCPT_FAULT" check is correct.  The
only other fault-like #DB is General Detect, and despite Intel and AMD both
strongly implying (through omission) that General Detect #DBs should set
RF=1, hardware (multiple generations of both Intel and AMD) in fact does
not.  Through insider knowledge, extreme foresight, sheer dumb luck, or
some combination thereof, KVM correctly handled RF for General Detect #DBs.
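
The injection-side RF decision can likewise be modelled on its own.  This is
a sketch under stated assumptions: EXCPT_FAULT is assumed to be 0, RF is the
architectural bit 16 of RFLAGS, and rflags_to_push() is a hypothetical name
for the "set RF only for EXCPT_FAULT" logic in inject_pending_event():

```c
#include <assert.h>

#define X86_EFLAGS_RF	(1ul << 16)	/* Resume Flag */

enum { DB_VECTOR = 1, NMI_VECTOR = 2, BP_VECTOR = 3, OF_VECTOR = 4,
       DF_VECTOR = 8, GP_VECTOR = 13, MC_VECTOR = 18 };

#define EXCPT_FAULT	0	/* assumed value; not shown in the hunk */
#define EXCPT_TRAP	1
#define EXCPT_ABORT	2
#define EXCPT_INTERRUPT	3
#define EXCPT_DB	4

static_assert(X86_EFLAGS_RF == 0x10000, "RF is bit 16 of RFLAGS");

static int exception_type_sketch(int vector)
{
	unsigned int mask;

	if (vector < 0 || vector > 31 || vector == NMI_VECTOR)
		return EXCPT_INTERRUPT;

	mask = 1u << vector;
	if (mask & (1u << DB_VECTOR))
		return EXCPT_DB;
	if (mask & ((1u << BP_VECTOR) | (1u << OF_VECTOR)))
		return EXCPT_TRAP;
	if (mask & ((1u << DF_VECTOR) | (1u << MC_VECTOR)))
		return EXCPT_ABORT;
	return EXCPT_FAULT;
}

/*
 * Hypothetical: only EXCPT_FAULT sets RF in the RFLAGS value that ends up
 * on the stack.  All #DBs (EXCPT_DB), General Detect included, fail the
 * comparison and leave RF unchanged, as do traps.
 */
static unsigned long rflags_to_push(int vector, unsigned long rflags)
{
	if (exception_type_sketch(vector) == EXCPT_FAULT)
		rflags |= X86_EFLAGS_RF;
	return rflags;
}
```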

Fixes: 38827dbd3fb8 ("KVM: x86: Do not update EFLAGS on faulting emulation")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3636206ed3e4..507e5f26ebbf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -535,6 +535,7 @@ static int exception_class(int vector)
 #define EXCPT_TRAP		1
 #define EXCPT_ABORT		2
 #define EXCPT_INTERRUPT		3
+#define EXCPT_DB		4
 
 static int exception_type(int vector)
 {
@@ -545,8 +546,14 @@ static int exception_type(int vector)
 
 	mask = 1 << vector;
 
-	/* #DB is trap, as instruction watchpoints are handled elsewhere */
-	if (mask & ((1 << DB_VECTOR) | (1 << BP_VECTOR) | (1 << OF_VECTOR)))
+	/*
+	 * #DBs can be trap-like or fault-like, the caller must check other CPU
+	 * state, e.g. DR6, to determine whether a #DB is a trap or fault.
+	 */
+	if (mask & (1 << DB_VECTOR))
+		return EXCPT_DB;
+
+	if (mask & ((1 << BP_VECTOR) | (1 << OF_VECTOR)))
 		return EXCPT_TRAP;
 
 	if (mask & ((1 << DF_VECTOR) | (1 << MC_VECTOR)))
@@ -8480,6 +8487,12 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		unsigned long rflags = static_call(kvm_x86_get_rflags)(vcpu);
 		toggle_interruptibility(vcpu, ctxt->interruptibility);
 		vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
+
+		/*
+		 * Note, EXCPT_DB is assumed to be fault-like as the emulator
+		 * only supports code breakpoints and general detect #DB, both
+		 * of which are fault-like.
+		 */
 		if (!ctxt->have_exception ||
 		    exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
 			kvm_pmu_trigger_event(vcpu, PERF_COUNT_HW_INSTRUCTIONS);
@@ -9361,6 +9374,16 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 		vcpu->arch.exception.pending = false;
 		vcpu->arch.exception.injected = true;
 
+		/*
+		 * Fault-class exceptions, except #DBs, set RF=1 in the RFLAGS
+		 * value pushed on the stack.  Trap-like exceptions and all #DBs
+		 * leave RF as-is (KVM follows Intel's behavior in this regard;
+		 * AMD states that code breakpoint #DBs explicitly clear RF).
+		 *
+		 * Note, most versions of Intel's SDM and AMD's APM incorrectly
+		 * describe the behavior of General Detect #DBs, which are
+		 * fault-like.  They do _not_ set RF, a la code breakpoints.
+		 */
 		if (exception_type(vcpu->arch.exception.nr) == EXCPT_FAULT)
 			__kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) |
 					     X86_EFLAGS_RF);
-- 
2.35.1.723.g4982287a31-goog
From nobody Tue Jun 23 03:45:39 2026
Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 89497C433EF
	for <linux-kernel@archiver.kernel.org>; Fri, 11 Mar 2022 03:28:50 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1346163AbiCKD3t (ORCPT
        <rfc822;linux-kernel@archiver.kernel.org>);
        Thu, 10 Mar 2022 22:29:49 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42390 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1346071AbiCKD3Z (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 10 Mar 2022 22:29:25 -0500
Received: from mail-pg1-x549.google.com (mail-pg1-x549.google.com
 [IPv6:2607:f8b0:4864:20::549])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7D92CECB14
        for <linux-kernel@vger.kernel.org>;
 Thu, 10 Mar 2022 19:28:17 -0800 (PST)
Received: by mail-pg1-x549.google.com with SMTP id
 1-20020a630c41000000b00378d9d6bd91so4046326pgm.17
        for <linux-kernel@vger.kernel.org>;
 Thu, 10 Mar 2022 19:28:17 -0800 (PST)
X-Received: from seanjc.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:3e5])
 (user=seanjc job=sendgmr) by 2002:a17:90a:e2cc:b0:1bf:711f:11e7 with SMTP id
 fr12-20020a17090ae2cc00b001bf711f11e7mr19466024pjb.40.1646969297010; Thu, 10
 Mar 2022 19:28:17 -0800 (PST)
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Fri, 11 Mar 2022 03:27:48 +0000
In-Reply-To: <20220311032801.3467418-1-seanjc@google.com>
Message-Id: <20220311032801.3467418-9-seanjc@google.com>
Mime-Version: 1.0
References: <20220311032801.3467418-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.723.g4982287a31-goog
Subject: [PATCH 08/21] KVM: x86: Use DR7_GD macro instead of open coding check
 in emulator
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Oliver Upton <oupton@google.com>,
        Peter Shier <pshier@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Use DR7_GD in the emulator instead of open coding the check, and drop a
comically wrong comment: bit 13 of DR7 is General Detect, not a global
breakpoint enable.
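
For reference, a tiny standalone sketch of the check.  It assumes the
architectural DR7 layout (G0 is bit 1; the global enables G0-G3 occupy bits
1, 3, 5 and 7; GD is bit 13), and dr7_gd_set() is a hypothetical stand-in
for check_dr7_gd() that takes DR7 directly instead of reading it through
ctxt->ops->get_dr():

```c
#include <assert.h>
#include <stdbool.h>

/* DR7 bit positions (architectural), mirrored for a standalone sketch. */
#define DR7_G0	(1ul << 1)	/* global enable for breakpoint 0 */
#define DR7_GD	(1ul << 13)	/* General Detect enable */

static_assert(DR7_GD == 0x2000, "GD is bit 13 of DR7");
static_assert((DR7_GD & DR7_G0) == 0, "GD is not a global enable bit");

/* Hypothetical stand-in for check_dr7_gd(): tests GD only. */
static bool dr7_gd_set(unsigned long dr7)
{
	return dr7 & DR7_GD;
}
```

Setting only a global enable bit such as G0 leaves the check false; only GD
makes DR accesses fault with a General Detect #DB.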

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/emulate.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index a60b4d20b309..648a0687120d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4148,8 +4148,7 @@ static int check_dr7_gd(struct x86_emulate_ctxt *ctxt)
 
 	ctxt->ops->get_dr(ctxt, 7, &dr7);
 
-	/* Check if DR7.Global_Enable is set */
-	return dr7 & (1 << 13);
+	return dr7 & DR7_GD;
 }
 
 static int check_dr_read(struct x86_emulate_ctxt *ctxt)
-- 
2.35.1.723.g4982287a31-goog
From nobody Tue Jun 23 03:45:39 2026
Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 25B91C433F5
	for <linux-kernel@archiver.kernel.org>; Fri, 11 Mar 2022 03:28:57 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1346057AbiCKD34 (ORCPT
        <rfc822;linux-kernel@archiver.kernel.org>);
        Thu, 10 Mar 2022 22:29:56 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41444 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1346060AbiCKD3n (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 10 Mar 2022 22:29:43 -0500
Received: from mail-pg1-x549.google.com (mail-pg1-x549.google.com
 [IPv6:2607:f8b0:4864:20::549])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B5965EBB88
        for <linux-kernel@vger.kernel.org>;
 Thu, 10 Mar 2022 19:28:19 -0800 (PST)
Received: by mail-pg1-x549.google.com with SMTP id
 p21-20020a631e55000000b00372d919267cso4068576pgm.1
        for <linux-kernel@vger.kernel.org>;
 Thu, 10 Mar 2022 19:28:19 -0800 (PST)
X-Received: from seanjc.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:3e5])
 (user=seanjc job=sendgmr) by 2002:a05:6a00:a8b:b0:4e1:52db:9e5c with SMTP id
 b11-20020a056a000a8b00b004e152db9e5cmr7945372pfl.38.1646969298739; Thu, 10
 Mar 2022 19:28:18 -0800 (PST)
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Fri, 11 Mar 2022 03:27:49 +0000
In-Reply-To: <20220311032801.3467418-1-seanjc@google.com>
Message-Id: <20220311032801.3467418-10-seanjc@google.com>
Mime-Version: 1.0
References: <20220311032801.3467418-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.723.g4982287a31-goog
Subject: [PATCH 09/21] KVM: nVMX: Ignore SIPI that arrives in L2 when vCPU is
 not in WFS
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Oliver Upton <oupton@google.com>,
        Peter Shier <pshier@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Fall through to handling other pending exceptions/events for L2 if a SIPI
is pending while the CPU is not in Wait-for-SIPI.  KVM correctly ignores
the SIPI, but incorrectly returns immediately, e.g. a SIPI coincident
with another event could lead to KVM incorrectly routing the event to L1
instead of L2.
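
The intended control flow can be sketched with a reduced, entirely
hypothetical vCPU model (the struct, enum and function names below are
illustrative and do not exist in KVM).  The SIPI is always consumed, but it
ends event processing only when it produces a SIPI VM-Exit to L1:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduced vCPU model; names are illustrative only. */
enum mp_state { MP_RUNNABLE, MP_INIT_RECEIVED /* Wait-for-SIPI */ };

struct vcpu_model {
	enum mp_state mp_state;
	bool sipi_pending;
	bool sipi_vmexit;	/* a SIPI VM-Exit to L1 was synthesized */
};

/*
 * Returns true if event processing is done (the SIPI caused a VM-Exit to
 * L1), false if the caller should fall through and consider other pending
 * events for L2.
 */
static bool handle_sipi_sketch(struct vcpu_model *v)
{
	if (!v->sipi_pending)
		return false;

	v->sipi_pending = false;
	if (v->mp_state == MP_INIT_RECEIVED) {
		v->sipi_vmexit = true;
		return true;
	}

	/* Not in WFS: the SIPI is ignored, keep processing other events. */
	return false;
}
```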

Fixes: bf0cd88ce363 ("KVM: x86: emulate wait-for-SIPI and SIPI-VMExit")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 53ac6ebb3b3b..b22089ebfe76 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3911,10 +3911,12 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 			return -EBUSY;
 
 		clear_bit(KVM_APIC_SIPI, &apic->pending_events);
-		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED)
+		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
 			nested_vmx_vmexit(vcpu, EXIT_REASON_SIPI_SIGNAL, 0,
 						apic->sipi_vector & 0xFFUL);
-		return 0;
+			return 0;
+		}
+		/* Fallthrough, the SIPI is completely ignored. */
 	}
 
 	/*
-- 
2.35.1.723.g4982287a31-goog
From nobody Tue Jun 23 03:45:39 2026
Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id A03F8C433F5
	for <linux-kernel@archiver.kernel.org>; Fri, 11 Mar 2022 03:29:02 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1346109AbiCKDaC (ORCPT
        <rfc822;linux-kernel@archiver.kernel.org>);
        Thu, 10 Mar 2022 22:30:02 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43938 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1346098AbiCKD3m (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 10 Mar 2022 22:29:42 -0500
Received: from mail-pj1-x1049.google.com (mail-pj1-x1049.google.com
 [IPv6:2607:f8b0:4864:20::1049])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E3954ECC7C
        for <linux-kernel@vger.kernel.org>;
 Thu, 10 Mar 2022 19:28:20 -0800 (PST)
Received: by mail-pj1-x1049.google.com with SMTP id
 s12-20020a17090a13cc00b001bee1e1677fso4543862pjf.0
        for <linux-kernel@vger.kernel.org>;
 Thu, 10 Mar 2022 19:28:20 -0800 (PST)
X-Received: from seanjc.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:3e5])
 (user=seanjc job=sendgmr) by 2002:a17:90b:4c42:b0:1bf:c572:cc45 with SMTP id
 np2-20020a17090b4c4200b001bfc572cc45mr8597614pjb.238.1646969300408; Thu, 10
 Mar 2022 19:28:20 -0800 (PST)
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Fri, 11 Mar 2022 03:27:50 +0000
In-Reply-To: <20220311032801.3467418-1-seanjc@google.com>
Message-Id: <20220311032801.3467418-11-seanjc@google.com>
Mime-Version: 1.0
References: <20220311032801.3467418-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.723.g4982287a31-goog
Subject: [PATCH 10/21] KVM: nVMX: Unconditionally clear mtf_pending on nested
 VM-Exit
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Oliver Upton <oupton@google.com>,
        Peter Shier <pshier@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Clear mtf_pending on nested VM-Exit instead of handling the clear on a
case-by-case basis in vmx_check_nested_events().  A pending MTF should
never survive nested VM-Exit, as it is a property of KVM's run of the
current L2, i.e. it should never affect the next L2 run by L1.  In
practice, this is likely a nop, as getting to L1 with nested_run_pending
set is impossible, and KVM doesn't correctly handle morphing a pending
exception that occurs on a prior injected exception (the need to
re-inject an exception being the other case where MTF isn't cleared).
However, KVM will hopefully soon correctly deal with a pending exception
on top of an injected exception.
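
A rough userspace model of the new rule, assuming a vCPU that tracks only
nested_run_pending and mtf_pending (every model_* name below is
hypothetical, not KVM code): because the clear lives in the single
VM-Exit path, an MTF left pending by L2 is dropped no matter which exit
reason ends the run.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Basic exit reasons, as encoded in the Intel SDM. */
#define MODEL_EXIT_INIT_SIGNAL 3u
#define MODEL_EXIT_MTF        37u

/* Hypothetical, heavily simplified stand-in for struct vcpu_vmx. */
struct model_vcpu {
	bool nested_run_pending; /* L1's VMLAUNCH/VMRESUME not yet completed */
	bool mtf_pending;        /* MTF trap owed to the current L2 run */
	uint32_t last_exit_reason;
};

/*
 * Models nested_vmx_vmexit(): a pending MTF belongs to the L2 run that is
 * ending, so it is dropped unconditionally, whatever the exit reason.
 */
static void model_nested_vmexit(struct model_vcpu *v, uint32_t exit_reason)
{
	v->mtf_pending = false;
	v->last_exit_reason = exit_reason;
}

/*
 * Models only the MTF step of vmx_check_nested_events(): synthesize an MTF
 * VM-Exit if one is pending, unless nested events are currently blocked.
 */
static int model_check_mtf(struct model_vcpu *v)
{
	if (!v->mtf_pending)
		return 0;
	if (v->nested_run_pending)
		return -EBUSY;
	model_nested_vmexit(v, MODEL_EXIT_MTF);
	return 0;
}
```

In the model, an INIT-induced exit leaves nothing for the later MTF check
to find, while a blocked check (-EBUSY) keeps the MTF pending for the
next attempt.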

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index b22089ebfe76..82b2d9dde611 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3884,16 +3884,8 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 	unsigned long exit_qual;
 	bool block_nested_events =
 	    vmx->nested.nested_run_pending || kvm_event_needs_reinjection(vcpu);
-	bool mtf_pending = vmx->nested.mtf_pending;
 	struct kvm_lapic *apic = vcpu->arch.apic;
 
-	/*
-	 * Clear the MTF state. If a higher priority VM-exit is delivered first,
-	 * this state is discarded.
-	 */
-	if (!block_nested_events)
-		vmx->nested.mtf_pending = false;
-
 	if (lapic_in_kernel(vcpu) &&
 		test_bit(KVM_APIC_INIT, &apic->pending_events)) {
 		if (block_nested_events)
@@ -3902,6 +3894,9 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 		clear_bit(KVM_APIC_INIT, &apic->pending_events);
 		if (vcpu->arch.mp_state != KVM_MP_STATE_INIT_RECEIVED)
 			nested_vmx_vmexit(vcpu, EXIT_REASON_INIT_SIGNAL, 0, 0);
+
+		/* MTF is discarded if the vCPU is in WFS. */
+		vmx->nested.mtf_pending = false;
 		return 0;
 	}
 
@@ -3939,7 +3934,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 		return 0;
 	}
 
-	if (mtf_pending) {
+	if (vmx->nested.mtf_pending) {
 		if (block_nested_events)
 			return -EBUSY;
 		nested_vmx_update_pending_dbg(vcpu);
@@ -4532,6 +4527,9 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
 
+	/* Pending MTF traps are discarded on VM-Exit. */
+	vmx->nested.mtf_pending = false;
+
 	/* trying to cancel vmlaunch/vmresume is a bug */
 	WARN_ON_ONCE(vmx->nested.nested_run_pending);
 
-- 
2.35.1.723.g4982287a31-goog
From nobody Tue Jun 23 03:45:39 2026
Return-Path: <linux-kernel-owner@kernel.org>
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Fri, 11 Mar 2022 03:27:51 +0000
In-Reply-To: <20220311032801.3467418-1-seanjc@google.com>
Message-Id: <20220311032801.3467418-12-seanjc@google.com>
Mime-Version: 1.0
References: <20220311032801.3467418-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.723.g4982287a31-goog
Subject: [PATCH 11/21] KVM: VMX: Inject #PF on ENCLS as "emulated" #PF
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Oliver Upton <oupton@google.com>,
        Peter Shier <pshier@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Treat #PFs that occur during emulation of ENCLS as, wait for it, emulated
page faults.  Practically speaking, this is a glorified nop as the
exception is never of the nested flavor, and it's extremely unlikely the
guest is relying on the side effect of an implicit INVLPG on the faulting
address.
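
Roughly, the difference between the two helpers can be modelled as
below: the "emulated" path picks the MMU that owns the fault
(nested_page_fault selects it) and, for a present, non-reserved fault,
invalidates the faulting GVA before delivering the #PF, i.e. the
implicit INVLPG mentioned above.  This is a sketch under simplifying
assumptions: the one-entry TLB and all model_* types and functions are
hypothetical, and only the #PF error-code bit positions are
architectural.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural #PF error-code bits: P (bit 0), W/R (bit 1), RSVD (bit 3). */
#define PF_ERR_PRESENT (1u << 0)
#define PF_ERR_WRITE   (1u << 1)
#define PF_ERR_RSVD    (1u << 3)

/* Hypothetical MMU with a one-entry "TLB", just enough to show the flush. */
struct model_mmu {
	bool tlb_valid;
	uint64_t tlb_gva;
	unsigned int faults_injected;
};

/* Trimmed-down analogue of struct x86_exception for a #PF. */
struct model_fault {
	uint64_t address;
	uint32_t error_code;
	bool nested_page_fault;
};

struct model_vcpu {
	struct model_mmu mmu;      /* MMU that owns nested (L1-level) faults */
	struct model_mmu walk_mmu; /* MMU that walks the faulting guest's tables */
};

static void model_invalidate_gva(struct model_mmu *mmu, uint64_t gva)
{
	if (mmu->tlb_valid && mmu->tlb_gva == gva)
		mmu->tlb_valid = false;
}

/* Plain injection only delivers the fault. */
static void model_inject_page_fault(struct model_mmu *mmu)
{
	mmu->faults_injected++;
}

/*
 * Emulated injection: choose the MMU that owns the fault, flush the stale
 * translation for a present, non-reserved fault, then deliver it.
 * Returns whether the fault was a nested page fault.
 */
static bool model_inject_emulated_page_fault(struct model_vcpu *vcpu,
					     const struct model_fault *f)
{
	struct model_mmu *mmu = f->nested_page_fault ? &vcpu->mmu : &vcpu->walk_mmu;

	if ((f->error_code & PF_ERR_PRESENT) && !(f->error_code & PF_ERR_RSVD))
		model_invalidate_gva(mmu, f->address);

	model_inject_page_fault(mmu);
	return f->nested_page_fault;
}
```

sgx_inject_fault() always sets nested_page_fault = false, so in this
model it is the walk MMU's entry that gets flushed.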

Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/sgx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index 35e7ec91ae86..966cfa228f2a 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -129,7 +129,7 @@ static int sgx_inject_fault(struct kvm_vcpu *vcpu, gva_t gva, int trapnr)
 		ex.address = gva;
 		ex.error_code_valid = true;
 		ex.nested_page_fault = false;
-		kvm_inject_page_fault(vcpu, &ex);
+		kvm_inject_emulated_page_fault(vcpu, &ex);
 	} else {
 		kvm_inject_gp(vcpu, 0);
 	}
-- 
2.35.1.723.g4982287a31-goog
From nobody Tue Jun 23 03:45:39 2026
Return-Path: <linux-kernel-owner@kernel.org>
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Fri, 11 Mar 2022 03:27:52 +0000
In-Reply-To: <20220311032801.3467418-1-seanjc@google.com>
Message-Id: <20220311032801.3467418-13-seanjc@google.com>
Mime-Version: 1.0
References: <20220311032801.3467418-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.723.g4982287a31-goog
Subject: [PATCH 12/21] KVM: x86: Rename kvm_x86_ops.queue_exception to
 inject_exception
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Oliver Upton <oupton@google.com>,
        Peter Shier <pshier@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Rename the kvm_x86_ops hook for exception injection to better reflect
reality, and to align with pretty much every other related function name
in KVM.
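
For context on why the rename touches both headers: kvm-x86-ops.h is an
X-macro list that KVM expands several times with different KVM_X86_OP()
definitions (for example to declare and update the static calls), while
struct kvm_x86_ops in kvm_host.h is written out by hand, so the two must
stay in sync.  A minimal sketch of that pattern, assuming hypothetical
model_* names and plain function pointers standing in for static_call():

```c
/* Hypothetical vCPU type; the real hooks take struct kvm_vcpu *. */
struct model_vcpu {
	int exceptions_injected;
};

/* Hand-written ops table, like struct kvm_x86_ops in kvm_host.h. */
struct model_x86_ops {
	void (*inject_irq)(struct model_vcpu *vcpu);
	void (*inject_nmi)(struct model_vcpu *vcpu);
	void (*inject_exception)(struct model_vcpu *vcpu);
};

/* Name list, like kvm-x86-ops.h; each entry must match a member above. */
#define MODEL_X86_OPS(OP) \
	OP(inject_irq)        \
	OP(inject_nmi)        \
	OP(inject_exception)

/* Expansion 1: one call target per hook (stand-in for DECLARE_STATIC_CALL). */
#define MODEL_DECLARE(name) static void (*model_call_##name)(struct model_vcpu *);
MODEL_X86_OPS(MODEL_DECLARE)
#undef MODEL_DECLARE

/* Expansion 2: point every call target at the vendor's hook at init time. */
static void model_ops_update(const struct model_x86_ops *ops)
{
#define MODEL_UPDATE(name) model_call_##name = ops->name;
	MODEL_X86_OPS(MODEL_UPDATE)
#undef MODEL_UPDATE
}

/* Vendor implementation, like vmx_inject_exception(). */
static void model_vmx_inject_exception(struct model_vcpu *vcpu)
{
	vcpu->exceptions_injected++;
}

static const struct model_x86_ops model_vmx_ops = {
	.inject_exception = model_vmx_inject_exception,
};
```

In this model, leaving a stale entry such as queue_exception in the list
fails to compile against the struct, which is what keeps the two in
step.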

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm-x86-ops.h | 2 +-
 arch/x86/include/asm/kvm_host.h    | 2 +-
 arch/x86/kvm/svm/svm.c             | 4 ++--
 arch/x86/kvm/vmx/vmx.c             | 4 ++--
 arch/x86/kvm/x86.c                 | 2 +-
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 29affccb353c..656fa1626dc1 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -66,7 +66,7 @@ KVM_X86_OP(get_interrupt_shadow)
 KVM_X86_OP(patch_hypercall)
 KVM_X86_OP(inject_irq)
 KVM_X86_OP(inject_nmi)
-KVM_X86_OP(queue_exception)
+KVM_X86_OP(inject_exception)
 KVM_X86_OP(cancel_injection)
 KVM_X86_OP(interrupt_allowed)
 KVM_X86_OP(nmi_allowed)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3a2c855f04e3..4f891fe00767 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1391,7 +1391,7 @@ struct kvm_x86_ops {
 				unsigned char *hypercall_addr);
 	void (*inject_irq)(struct kvm_vcpu *vcpu);
 	void (*inject_nmi)(struct kvm_vcpu *vcpu);
-	void (*queue_exception)(struct kvm_vcpu *vcpu);
+	void (*inject_exception)(struct kvm_vcpu *vcpu);
 	void (*cancel_injection)(struct kvm_vcpu *vcpu);
 	int (*interrupt_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
 	int (*nmi_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index fc5222a0f506..8b7f3c4e383f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -382,7 +382,7 @@ static int svm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
-static void svm_queue_exception(struct kvm_vcpu *vcpu)
+static void svm_inject_exception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	unsigned nr = vcpu->arch.exception.nr;
@@ -4580,7 +4580,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.patch_hypercall = svm_patch_hypercall,
 	.inject_irq = svm_inject_irq,
 	.inject_nmi = svm_inject_nmi,
-	.queue_exception = svm_queue_exception,
+	.inject_exception = svm_inject_exception,
 	.cancel_injection = svm_cancel_injection,
 	.interrupt_allowed = svm_interrupt_allowed,
 	.nmi_allowed = svm_nmi_allowed,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a8ebe91fe9a5..f3f16271fa2c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1602,7 +1602,7 @@ static void vmx_clear_hlt(struct kvm_vcpu *vcpu)
 		vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
 }
 
-static void vmx_queue_exception(struct kvm_vcpu *vcpu)
+static void vmx_inject_exception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	unsigned nr = vcpu->arch.exception.nr;
@@ -7783,7 +7783,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.patch_hypercall = vmx_patch_hypercall,
 	.inject_irq = vmx_inject_irq,
 	.inject_nmi = vmx_inject_nmi,
-	.queue_exception = vmx_queue_exception,
+	.inject_exception = vmx_inject_exception,
 	.cancel_injection = vmx_cancel_injection,
 	.interrupt_allowed = vmx_interrupt_allowed,
 	.nmi_allowed = vmx_nmi_allowed,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 507e5f26ebbf..452fbb55d9d2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9312,7 +9312,7 @@ static void kvm_inject_exception(struct kvm_vcpu *vcpu)
 {
 	if (vcpu->arch.exception.error_code && !is_protmode(vcpu))
 		vcpu->arch.exception.error_code = false;
-	static_call(kvm_x86_queue_exception)(vcpu);
+	static_call(kvm_x86_inject_exception)(vcpu);
 }
 
 static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
-- 
2.35.1.723.g4982287a31-goog
From nobody Tue Jun 23 03:45:39 2026
Return-Path: <linux-kernel-owner@kernel.org>
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Fri, 11 Mar 2022 03:27:53 +0000
In-Reply-To: <20220311032801.3467418-1-seanjc@google.com>
Message-Id: <20220311032801.3467418-14-seanjc@google.com>
Mime-Version: 1.0
References: <20220311032801.3467418-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.723.g4982287a31-goog
Subject: [PATCH 13/21] KVM: x86: Make kvm_queued_exception a properly named,
 visible struct
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Oliver Upton <oupton@google.com>,
        Peter Shier <pshier@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Move the definition of "struct kvm_queued_exception" out of kvm_vcpu_arch
in anticipation of adding a second instance in kvm_vcpu_arch to handle
exceptions that occur when vectoring an injected exception and are
morphed to VM-Exit instead of leading to #DF.
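
A reduced model of what the named struct enables, assuming hypothetical
model_* names (exception_vmexit in particular is only a placeholder for
the second instance anticipated above): because the payload helper now
takes the exception by pointer, the same code can deliver the payload of
either instance.  Only the DR6 updates visible in the x86.c hunk are
modelled for #DB, the real function does more, and MODEL_DR6_ACTIVE_LOW
is assumed to equal KVM's DR6_ACTIVE_LOW.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_DB_VECTOR 1
#define MODEL_PF_VECTOR 14
/* Assumed value of DR6_ACTIVE_LOW: DR6 bits that read as 1 when inactive. */
#define MODEL_DR6_ACTIVE_LOW 0xffff0ff0ul

/* Mirrors the fields of the now-standalone struct kvm_queued_exception. */
struct model_queued_exception {
	bool pending;
	bool injected;
	bool has_error_code;
	uint8_t vector;
	uint32_t error_code;
	unsigned long payload;
	bool has_payload;
	uint8_t nested_apf;
};

struct model_vcpu {
	unsigned long cr2;
	unsigned long dr6;
	struct model_queued_exception exception;        /* as in kvm_vcpu_arch */
	struct model_queued_exception exception_vmexit; /* hypothetical 2nd instance */
};

/* Same shape as the new kvm_deliver_exception_payload(vcpu, ex). */
static void model_deliver_exception_payload(struct model_vcpu *vcpu,
					    struct model_queued_exception *ex)
{
	if (!ex->has_payload)
		return;

	switch (ex->vector) {
	case MODEL_DB_VECTOR:
		/* Payload bits are active-high; flip the active-low DR6 bits. */
		vcpu->dr6 |= MODEL_DR6_ACTIVE_LOW;
		vcpu->dr6 |= ex->payload;
		vcpu->dr6 ^= ex->payload & MODEL_DR6_ACTIVE_LOW;
		vcpu->dr6 &= ~(1ul << 12);
		break;
	case MODEL_PF_VECTOR:
		vcpu->cr2 = ex->payload;
		break;
	}

	ex->has_payload = false;
	ex->payload = 0;
}
```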

Opportunistically take advantage of the churn to rename "nr" to "vector".

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 23 +++++-----
 arch/x86/kvm/svm/nested.c       | 45 ++++++++++---------
 arch/x86/kvm/svm/svm.c          | 14 +++---
 arch/x86/kvm/vmx/nested.c       | 44 +++++++++---------
 arch/x86/kvm/vmx/vmx.c          | 20 ++++-----
 arch/x86/kvm/x86.c              | 80 ++++++++++++++++-----------------
 arch/x86/kvm/x86.h              |  3 +-
 7 files changed, 112 insertions(+), 117 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4f891fe00767..478e2fef0062 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -615,6 +615,17 @@ struct kvm_vcpu_xen {
 	unsigned long evtchn_pending_sel;
 };
 
+struct kvm_queued_exception {
+	bool pending;
+	bool injected;
+	bool has_error_code;
+	u8 vector;
+	u32 error_code;
+	unsigned long payload;
+	bool has_payload;
+	u8 nested_apf;
+};
+
 struct kvm_vcpu_arch {
 	/*
 	 * rip and regs accesses must go through
@@ -713,16 +724,8 @@ struct kvm_vcpu_arch {
 
 	u8 event_exit_inst_len;
 
-	struct kvm_queued_exception {
-		bool pending;
-		bool injected;
-		bool has_error_code;
-		u8 nr;
-		u32 error_code;
-		unsigned long payload;
-		bool has_payload;
-		u8 nested_apf;
-	} exception;
+	/* Exceptions to be injected to the guest. */
+	struct kvm_queued_exception exception;
 
 	struct kvm_queued_interrupt {
 		bool injected;
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 96bab464967f..bef5a93166a8 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -429,7 +429,7 @@ static void nested_save_pending_event_to_vmcb12(struct vcpu_svm *svm,
 	unsigned int nr;
 
 	if (vcpu->arch.exception.injected) {
-		nr = vcpu->arch.exception.nr;
+		nr = vcpu->arch.exception.vector;
 		exit_int_info = nr | SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_EXEPT;
 
 		if (vcpu->arch.exception.has_error_code) {
@@ -1154,41 +1154,42 @@ int nested_svm_check_permissions(struct kvm_vcpu *vcpu)
 
 static bool nested_exit_on_exception(struct vcpu_svm *svm)
 {
-	unsigned int nr = svm->vcpu.arch.exception.nr;
+	unsigned int vector = svm->vcpu.arch.exception.vector;
 
-	return (svm->nested.ctl.intercepts[INTERCEPT_EXCEPTION] & BIT(nr));
+	return (svm->nested.ctl.intercepts[INTERCEPT_EXCEPTION] & BIT(vector));
 }
 
-static void nested_svm_inject_exception_vmexit(struct vcpu_svm *svm)
+static void nested_svm_inject_exception_vmexit(struct kvm_vcpu *vcpu)
 {
-	unsigned int nr = svm->vcpu.arch.exception.nr;
+	struct kvm_queued_exception *ex = &vcpu->arch.exception;
+	struct vcpu_svm *svm = to_svm(vcpu);
 
-	svm->vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + nr;
+	svm->vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + ex->vector;
 	svm->vmcb->control.exit_code_hi = 0;
 
-	if (svm->vcpu.arch.exception.has_error_code)
-		svm->vmcb->control.exit_info_1 = svm->vcpu.arch.exception.error_code;
+	if (ex->has_error_code)
+		svm->vmcb->control.exit_info_1 = ex->error_code;
 
 	/*
 	 * EXITINFO2 is undefined for all exception intercepts other
 	 * than #PF.
 	 */
-	if (nr == PF_VECTOR) {
-		if (svm->vcpu.arch.exception.nested_apf)
-			svm->vmcb->control.exit_info_2 = svm->vcpu.arch.apf.nested_apf_token;
-		else if (svm->vcpu.arch.exception.has_payload)
-			svm->vmcb->control.exit_info_2 = svm->vcpu.arch.exception.payload;
+	if (ex->vector == PF_VECTOR) {
+		if (ex->has_payload)
+			svm->vmcb->control.exit_info_2 = ex->payload;
 		else
-			svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;
-	} else if (nr == DB_VECTOR) {
+			svm->vmcb->control.exit_info_2 = vcpu->arch.cr2;
+	} else if (ex->vector == DB_VECTOR) {
 		/* See inject_pending_event.  */
-		kvm_deliver_exception_payload(&svm->vcpu);
-		if (svm->vcpu.arch.dr7 & DR7_GD) {
-			svm->vcpu.arch.dr7 &= ~DR7_GD;
-			kvm_update_dr7(&svm->vcpu);
+		kvm_deliver_exception_payload(vcpu, ex);
+
+		if (vcpu->arch.dr7 & DR7_GD) {
+			vcpu->arch.dr7 &= ~DR7_GD;
+			kvm_update_dr7(vcpu);
 		}
-	} else
-		WARN_ON(svm->vcpu.arch.exception.has_payload);
+	} else {
+		WARN_ON(ex->has_payload);
+	}
 
 	nested_svm_vmexit(svm);
 }
@@ -1226,7 +1227,7 @@ static int svm_check_nested_events(struct kvm_vcpu *vcpu)
                         return -EBUSY;
 		if (!nested_exit_on_exception(svm))
 			return 0;
-		nested_svm_inject_exception_vmexit(svm);
+		nested_svm_inject_exception_vmexit(vcpu);
 		return 0;
 	}
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8b7f3c4e383f..5bd6ecb31387 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -384,14 +384,12 @@ static int svm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
 
 static void svm_inject_exception(struct kvm_vcpu *vcpu)
 {
+	struct kvm_queued_exception *ex = &vcpu->arch.exception;
 	struct vcpu_svm *svm = to_svm(vcpu);
-	unsigned nr = vcpu->arch.exception.nr;
-	bool has_error_code = vcpu->arch.exception.has_error_code;
-	u32 error_code = vcpu->arch.exception.error_code;
 
-	kvm_deliver_exception_payload(vcpu);
+	kvm_deliver_exception_payload(vcpu, ex);
 
-	if (nr == BP_VECTOR && !nrips) {
+	if (ex->vector == BP_VECTOR && !nrips) {
 		unsigned long rip, old_rip = kvm_rip_read(vcpu);
 
 		/*
@@ -407,11 +405,11 @@ static void svm_inject_exception(struct kvm_vcpu *vcpu)
 		svm->int3_injected = rip - old_rip;
 	}
 
-	svm->vmcb->control.event_inj = nr
+	svm->vmcb->control.event_inj = ex->vector
 		| SVM_EVTINJ_VALID
-		| (has_error_code ? SVM_EVTINJ_VALID_ERR : 0)
+		| (ex->has_error_code ? SVM_EVTINJ_VALID_ERR : 0)
 		| SVM_EVTINJ_TYPE_EXEPT;
-	svm->vmcb->control.event_inj_err = error_code;
+	svm->vmcb->control.event_inj_err = ex->error_code;
 }
 
 static void svm_init_erratum_383(void)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 82b2d9dde611..c88031d44148 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -445,29 +445,27 @@ static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
  */
 static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned long *exit_qual)
 {
+	struct kvm_queued_exception *ex = &vcpu->arch.exception;
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-	unsigned int nr = vcpu->arch.exception.nr;
-	bool has_payload = vcpu->arch.exception.has_payload;
-	unsigned long payload = vcpu->arch.exception.payload;
 
-	if (nr == PF_VECTOR) {
-		if (vcpu->arch.exception.nested_apf) {
+	if (ex->vector == PF_VECTOR) {
+		if (ex->nested_apf) {
 			*exit_qual = vcpu->arch.apf.nested_apf_token;
 			return 1;
 		}
-		if (nested_vmx_is_page_fault_vmexit(vmcs12,
-						    vcpu->arch.exception.error_code)) {
-			*exit_qual = has_payload ? payload : vcpu->arch.cr2;
+		if (nested_vmx_is_page_fault_vmexit(vmcs12, ex->error_code)) {
+			*exit_qual = ex->has_payload ? ex->payload : vcpu->arch.cr2;
 			return 1;
 		}
-	} else if (vmcs12->exception_bitmap & (1u << nr)) {
-		if (nr == DB_VECTOR) {
-			if (!has_payload) {
-				payload = vcpu->arch.dr6;
-				payload &= ~DR6_BT;
-				payload ^= DR6_ACTIVE_LOW;
+	} else if (vmcs12->exception_bitmap & (1u << ex->vector)) {
+		if (ex->vector == DB_VECTOR) {
+			if (ex->has_payload) {
+				*exit_qual = ex->payload;
+			} else {
+				*exit_qual = vcpu->arch.dr6;
+				*exit_qual &= ~DR6_BT;
+				*exit_qual ^= DR6_ACTIVE_LOW;
 			}
-			*exit_qual = payload;
 		} else
 			*exit_qual = 0;
 		return 1;
@@ -3701,7 +3699,7 @@ static void vmcs12_save_pending_event(struct kvm_vcpu *vcpu,
 	unsigned int nr;
 
 	if (vcpu->arch.exception.injected) {
-		nr = vcpu->arch.exception.nr;
+		nr = vcpu->arch.exception.vector;
 		idt_vectoring = nr | VECTORING_INFO_VALID_MASK;
 
 		if (kvm_exception_is_soft(nr)) {
@@ -3803,23 +3801,23 @@ static int vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
 static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
 					       unsigned long exit_qual)
 {
+	struct kvm_queued_exception *ex = &vcpu->arch.exception;
+	u32 intr_info = ex->vector | INTR_INFO_VALID_MASK;
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-	unsigned int nr = vcpu->arch.exception.nr;
-	u32 intr_info = nr | INTR_INFO_VALID_MASK;
 
-	if (vcpu->arch.exception.has_error_code) {
-		/*
+	if (ex->has_error_code) {
+		/*
 		 * Intel CPUs will never generate an error code with bits 31:16
 		 * set, and more importantly VMX disallows setting bits 31:16
 		 * in the injected error code for VM-Entry.  Drop the bits to
 		 * mimic hardware and avoid inducing failure on nested VM-Entry
 		 * if L1 chooses to inject the exception back to L2.
 		 */
-		vmcs12->vm_exit_intr_error_code = (u16)vcpu->arch.exception.error_code;
+		vmcs12->vm_exit_intr_error_code = (u16)ex->error_code;
 		intr_info |= INTR_INFO_DELIVER_CODE_MASK;
 	}
 
-	if (kvm_exception_is_soft(nr))
+	if (kvm_exception_is_soft(ex->vector))
 		intr_info |= INTR_TYPE_SOFT_EXCEPTION;
 	else
 		intr_info |= INTR_TYPE_HARD_EXCEPTION;
@@ -3850,7 +3848,7 @@ static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
 static inline unsigned long vmx_get_pending_dbg_trap(struct kvm_vcpu *vcpu)
 {
 	if (!vcpu->arch.exception.pending ||
-	    vcpu->arch.exception.nr != DB_VECTOR)
+	    vcpu->arch.exception.vector != DB_VECTOR)
 		return 0;
 
 	/* General Detect #DBs are always fault-like. */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f3f16271fa2c..9f9b601fd6f6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1577,7 +1577,7 @@ static void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu)
 	 */
 	if (nested_cpu_has_mtf(vmcs12) &&
 	    (!vcpu->arch.exception.pending ||
-	     vcpu->arch.exception.nr == DB_VECTOR))
+	     vcpu->arch.exception.vector == DB_VECTOR))
 		vmx->nested.mtf_pending = true;
 	else
 		vmx->nested.mtf_pending = false;
@@ -1604,15 +1604,13 @@ static void vmx_clear_hlt(struct kvm_vcpu *vcpu)
 
 static void vmx_inject_exception(struct kvm_vcpu *vcpu)
 {
+	struct kvm_queued_exception *ex = &vcpu->arch.exception;
+	u32 intr_info = ex->vector | INTR_INFO_VALID_MASK;
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	unsigned nr = vcpu->arch.exception.nr;
-	bool has_error_code = vcpu->arch.exception.has_error_code;
-	u32 error_code = vcpu->arch.exception.error_code;
-	u32 intr_info = nr | INTR_INFO_VALID_MASK;
 
-	kvm_deliver_exception_payload(vcpu);
+	kvm_deliver_exception_payload(vcpu, ex);
 
-	if (has_error_code) {
+	if (ex->has_error_code) {
 		/*
 		 * Despite the error code being architecturally defined as 32
 		 * bits, and the VMCS field being 32 bits, Intel CPUs and thus
@@ -1622,21 +1620,21 @@ static void vmx_inject_exception(struct kvm_vcpu *vcpu)
 		 * the upper bits to avoid VM-Fail, losing information that
 		 * does't really exist is preferable to killing the VM.
 		 */
-		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, (u16)error_code);
+		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, (u16)ex->error_code);
 		intr_info |= INTR_INFO_DELIVER_CODE_MASK;
 	}
 
 	if (vmx->rmode.vm86_active) {
 		int inc_eip = 0;
-		if (kvm_exception_is_soft(nr))
+		if (kvm_exception_is_soft(ex->vector))
 			inc_eip = vcpu->arch.event_exit_inst_len;
-		kvm_inject_realmode_interrupt(vcpu, nr, inc_eip);
+		kvm_inject_realmode_interrupt(vcpu, ex->vector, inc_eip);
 		return;
 	}
 
 	WARN_ON_ONCE(vmx->emulation_required);
 
-	if (kvm_exception_is_soft(nr)) {
+	if (kvm_exception_is_soft(ex->vector)) {
 		vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
 			     vmx->vcpu.arch.event_exit_inst_len);
 		intr_info |= INTR_TYPE_SOFT_EXCEPTION;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 452fbb55d9d2..74843767383c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -563,16 +563,13 @@ static int exception_type(int vector)
 	return EXCPT_FAULT;
 }
 
-void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu)
+void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu,
+				   struct kvm_queued_exception *ex)
 {
-	unsigned nr = vcpu->arch.exception.nr;
-	bool has_payload = vcpu->arch.exception.has_payload;
-	unsigned long payload = vcpu->arch.exception.payload;
-
-	if (!has_payload)
+	if (!ex->has_payload)
 		return;
 
-	switch (nr) {
+	switch (ex->vector) {
 	case DB_VECTOR:
 		/*
 		 * "Certain debug exceptions may clear bit 0-3.  The
@@ -597,8 +594,8 @@ void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu)
 		 * So they need to be flipped for DR6.
 		 */
 		vcpu->arch.dr6 |= DR6_ACTIVE_LOW;
-		vcpu->arch.dr6 |= payload;
-		vcpu->arch.dr6 ^= payload & DR6_ACTIVE_LOW;
+		vcpu->arch.dr6 |= ex->payload;
+		vcpu->arch.dr6 ^= ex->payload & DR6_ACTIVE_LOW;
 
 		/*
 		 * The #DB payload is defined as compatible with the 'pending
@@ -609,12 +606,12 @@ void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu)
 		vcpu->arch.dr6 &= ~BIT(12);
 		break;
 	case PF_VECTOR:
-		vcpu->arch.cr2 = payload;
+		vcpu->arch.cr2 = ex->payload;
 		break;
 	}
 
-	vcpu->arch.exception.has_payload = false;
-	vcpu->arch.exception.payload = 0;
+	ex->has_payload = false;
+	ex->payload = 0;
 }
 EXPORT_SYMBOL_GPL(kvm_deliver_exception_payload);
 
@@ -653,17 +650,18 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 			vcpu->arch.exception.injected = false;
 		}
 		vcpu->arch.exception.has_error_code = has_error;
-		vcpu->arch.exception.nr = nr;
+		vcpu->arch.exception.vector = nr;
 		vcpu->arch.exception.error_code = error_code;
 		vcpu->arch.exception.has_payload = has_payload;
 		vcpu->arch.exception.payload = payload;
 		if (!is_guest_mode(vcpu))
-			kvm_deliver_exception_payload(vcpu);
+			kvm_deliver_exception_payload(vcpu,
+						      &vcpu->arch.exception);
 		return;
 	}
 
 	/* to check exception */
-	prev_nr = vcpu->arch.exception.nr;
+	prev_nr = vcpu->arch.exception.vector;
 	if (prev_nr == DF_VECTOR) {
 		/* triple fault -> shutdown */
 		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
@@ -681,7 +679,7 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 		vcpu->arch.exception.pending = true;
 		vcpu->arch.exception.injected = false;
 		vcpu->arch.exception.has_error_code = true;
-		vcpu->arch.exception.nr = DF_VECTOR;
+		vcpu->arch.exception.vector = DF_VECTOR;
 		vcpu->arch.exception.error_code = 0;
 		vcpu->arch.exception.has_payload = false;
 		vcpu->arch.exception.payload = 0;
@@ -4826,25 +4824,24 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
 static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
 					       struct kvm_vcpu_events *events)
 {
+	struct kvm_queued_exception *ex = &vcpu->arch.exception;
+
 	process_nmi(vcpu);
 
 	if (kvm_check_request(KVM_REQ_SMI, vcpu))
 		process_smi(vcpu);
 
 	/*
-	 * In guest mode, payload delivery should be deferred,
-	 * so that the L1 hypervisor can intercept #PF before
-	 * CR2 is modified (or intercept #DB before DR6 is
-	 * modified under nVMX). Unless the per-VM capability,
-	 * KVM_CAP_EXCEPTION_PAYLOAD, is set, we may not defer the delivery of
-	 * an exception payload and handle after a KVM_GET_VCPU_EVENTS. Since we
-	 * opportunistically defer the exception payload, deliver it if the
-	 * capability hasn't been requested before processing a
-	 * KVM_GET_VCPU_EVENTS.
+	 * In guest mode, payload delivery should be deferred if the exception
+	 * will be intercepted by L1, e.g. KVM should not modify CR2 if L1
+	 * intercepts #PF, ditto for DR6 and #DBs.  If the per-VM capability,
+	 * KVM_CAP_EXCEPTION_PAYLOAD, is not set, userspace may or may not
+	 * propagate the payload and so it cannot be safely deferred.  Deliver
+	 * the payload if the capability hasn't been requested.
 	 */
 	if (!vcpu->kvm->arch.exception_payload_enabled &&
-	    vcpu->arch.exception.pending && vcpu->arch.exception.has_payload)
-		kvm_deliver_exception_payload(vcpu);
+	    ex->pending && ex->has_payload)
+		kvm_deliver_exception_payload(vcpu, ex);
 
 	/*
 	 * The API doesn't provide the instruction length for software
@@ -4852,26 +4849,25 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
 	 * isn't advanced, we should expect to encounter the exception
 	 * again.
 	 */
-	if (kvm_exception_is_soft(vcpu->arch.exception.nr)) {
+	if (kvm_exception_is_soft(ex->vector)) {
 		events->exception.injected = 0;
 		events->exception.pending = 0;
 	} else {
-		events->exception.injected = vcpu->arch.exception.injected;
-		events->exception.pending = vcpu->arch.exception.pending;
+		events->exception.injected = ex->injected;
+		events->exception.pending = ex->pending;
 		/*
 		 * For ABI compatibility, deliberately conflate
 		 * pending and injected exceptions when
 		 * KVM_CAP_EXCEPTION_PAYLOAD isn't enabled.
 		 */
 		if (!vcpu->kvm->arch.exception_payload_enabled)
-			events->exception.injected |=
-				vcpu->arch.exception.pending;
+			events->exception.injected |= ex->pending;
 	}
-	events->exception.nr = vcpu->arch.exception.nr;
-	events->exception.has_error_code = vcpu->arch.exception.has_error_code;
-	events->exception.error_code = vcpu->arch.exception.error_code;
-	events->exception_has_payload = vcpu->arch.exception.has_payload;
-	events->exception_payload = vcpu->arch.exception.payload;
+	events->exception.nr = ex->vector;
+	events->exception.has_error_code = ex->has_error_code;
+	events->exception.error_code = ex->error_code;
+	events->exception_has_payload = ex->has_payload;
+	events->exception_payload = ex->payload;
 
 	events->interrupt.injected =
 		vcpu->arch.interrupt.injected && !vcpu->arch.interrupt.soft;
@@ -4938,7 +4934,7 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
 	process_nmi(vcpu);
 	vcpu->arch.exception.injected = events->exception.injected;
 	vcpu->arch.exception.pending = events->exception.pending;
-	vcpu->arch.exception.nr = events->exception.nr;
+	vcpu->arch.exception.vector = events->exception.nr;
 	vcpu->arch.exception.has_error_code = events->exception.has_error_code;
 	vcpu->arch.exception.error_code = events->exception.error_code;
 	vcpu->arch.exception.has_payload = events->exception_has_payload;
@@ -9367,7 +9363,7 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 
 	/* try to inject new event if pending */
 	if (vcpu->arch.exception.pending) {
-		trace_kvm_inj_exception(vcpu->arch.exception.nr,
+		trace_kvm_inj_exception(vcpu->arch.exception.vector,
 					vcpu->arch.exception.has_error_code,
 					vcpu->arch.exception.error_code);
 
@@ -9384,12 +9380,12 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 		 * describe the behavior of General Detect #DBs, which are
 		 * fault-like.  They do _not_ set RF, a la code breakpoints.
 		 */
-		if (exception_type(vcpu->arch.exception.nr) == EXCPT_FAULT)
+		if (exception_type(vcpu->arch.exception.vector) == EXCPT_FAULT)
 			__kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) |
 					     X86_EFLAGS_RF);
 
-		if (vcpu->arch.exception.nr == DB_VECTOR) {
-			kvm_deliver_exception_payload(vcpu);
+		if (vcpu->arch.exception.vector == DB_VECTOR) {
+			kvm_deliver_exception_payload(vcpu, &vcpu->arch.exception);
 			if (vcpu->arch.dr7 & DR7_GD) {
 				vcpu->arch.dr7 &= ~DR7_GD;
 				kvm_update_dr7(vcpu);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index aa86abad914d..d8b44913aa62 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -270,7 +270,8 @@ int kvm_write_guest_virt_system(struct kvm_vcpu *vcpu,
 
 int handle_ud(struct kvm_vcpu *vcpu);
 
-void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu);
+void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu,
+				   struct kvm_queued_exception *ex);
 
 void kvm_vcpu_mtrr_init(struct kvm_vcpu *vcpu);
 u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn);
-- 
2.35.1.723.g4982287a31-goog
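
The #DB branch of kvm_deliver_exception_payload() merges a payload that follows the VMX "pending debug exceptions" polarity (active-low DR6 bits reported as 1) into DR6. The standalone sketch below reproduces that arithmetic so it can be checked with concrete values. Assumptions: DR6_ACTIVE_LOW is taken as 0xffff0ff0 and DR6_BS/DR6_RTM as bits 14/16, which is our reading of arch/x86/include/asm/debugreg.h at the time of this series; the bits 0-3 clear reflects the SDM quote whose code is elided from the hunk; merge_db_payload() is a hypothetical name, not a KVM function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; verify against asm/debugreg.h in your tree. */
#define DR_TRAP_BITS   0xfULL        /* B0-B3 */
#define DR6_ACTIVE_LOW 0xffff0ff0ULL /* fixed-1 bits plus active-low bits */
#define DR6_BS         (1ULL << 14)  /* single-step, active-high */
#define DR6_RTM        (1ULL << 16)  /* active-low: 0 means #DB inside RTM */

/* Hypothetical model of the DB_VECTOR case of the payload delivery. */
static uint64_t merge_db_payload(uint64_t dr6, uint64_t payload)
{
	/* DR6 bits 63:32 are reserved and must be zero. */
	assert((dr6 >> 32) == 0);

	dr6 &= ~DR_TRAP_BITS;            /* B0-B3 may be cleared by a #DB */
	dr6 |= DR6_ACTIVE_LOW;           /* fixed-1 set, active-low bits idle */
	dr6 |= payload;                  /* set every bit the payload reports */
	dr6 ^= payload & DR6_ACTIVE_LOW; /* flip reported active-low bits to 0 */
	dr6 &= ~(1ULL << 12);            /* bit 12 is reserved (zero) in DR6 */
	return dr6;
}
```

For example, merging a single-step payload (DR6_BS) into 0xffff0ff0 yields 0xffff4ff0, whereas an RTM payload clears bit 16 instead of setting it and yields 0xfffe0ff0.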
From nobody Tue Jun 23 03:45:39 2026
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Fri, 11 Mar 2022 03:27:54 +0000
In-Reply-To: <20220311032801.3467418-1-seanjc@google.com>
Message-Id: <20220311032801.3467418-15-seanjc@google.com>
Mime-Version: 1.0
References: <20220311032801.3467418-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.723.g4982287a31-goog
Subject: [PATCH 14/21] KVM: x86: Formalize blocking of nested pending
 exceptions
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Oliver Upton <oupton@google.com>,
        Peter Shier <pshier@google.com>

Capture nested_run_pending as block_nested_exceptions so that the logic
of why exceptions are blocked only needs to be documented once instead of
at every place that employs the logic.

No functional change intended.
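
A minimal sketch of the split this patch formalizes, under a simplified model: struct nested_state, compute_blocking() and check_pending_exception() are hypothetical stand-ins rather than KVM code, and needs_reinjection stands in for kvm_event_needs_reinjection(). It encodes the rule from the new comment: only a pending nested run blocks a pending exception, while other nested events are additionally blocked by an event awaiting re-injection.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical inputs; not KVM structures. */
struct nested_state {
	bool nested_run_pending; /* nested VM-Enter has not completed yet */
	bool needs_reinjection;  /* stand-in for kvm_event_needs_reinjection() */
};

struct nested_blocking {
	bool exceptions; /* mirrors block_nested_exceptions */
	bool events;     /* mirrors block_nested_events */
};

static struct nested_blocking compute_blocking(const struct nested_state *s)
{
	struct nested_blocking b;

	/*
	 * Only a pending nested run blocks a pending exception; an exception
	 * raised while delivering a previously injected event must still be
	 * handled.
	 */
	b.exceptions = s->nested_run_pending;
	/* Other nested events are also blocked by a to-be-reinjected event. */
	b.events = b.exceptions || s->needs_reinjection;

	/* Invariant: blocking exceptions implies blocking events. */
	assert(!b.exceptions || b.events);
	return b;
}

/* Models the "if (block_nested_exceptions) return -EBUSY;" checks. */
static int check_pending_exception(const struct nested_state *s, bool pending)
{
	if (pending && compute_blocking(s).exceptions)
		return -EBUSY;
	return 0;
}
```

With only an injected event awaiting re-injection, compute_blocking() reports events blocked but exceptions unblocked, so check_pending_exception() lets the pending exception through (returns 0); with a nested run pending it returns -EBUSY.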

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/nested.c | 20 ++++++++++----------
 arch/x86/kvm/vmx/nested.c | 23 ++++++++++++-----------
 2 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index bef5a93166a8..60df9d4d19b5 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1201,10 +1201,16 @@ static inline bool nested_exit_on_init(struct vcpu_svm *svm)
 
 static int svm_check_nested_events(struct kvm_vcpu *vcpu)
 {
-	struct vcpu_svm *svm = to_svm(vcpu);
-	bool block_nested_events =
-		kvm_event_needs_reinjection(vcpu) || svm->nested.nested_run_pending;
 	struct kvm_lapic *apic = vcpu->arch.apic;
+	struct vcpu_svm *svm = to_svm(vcpu);
+	/*
+	 * Only a pending nested run blocks a pending exception.  If there is a
+	 * previously injected event, the pending exception occurred while said
+	 * event was being delivered and thus needs to be handled.
+	 */
+	bool block_nested_exceptions = svm->nested.nested_run_pending;
+	bool block_nested_events = block_nested_exceptions ||
+				   kvm_event_needs_reinjection(vcpu);
 
 	if (lapic_in_kernel(vcpu) &&
 	    test_bit(KVM_APIC_INIT, &apic->pending_events)) {
@@ -1217,13 +1223,7 @@ static int svm_check_nested_events(struct kvm_vcpu *vcpu)
 	}
 
 	if (vcpu->arch.exception.pending) {
-		/*
-		 * Only a pending nested run can block a pending exception.
-		 * Otherwise an injected NMI/interrupt should either be
-		 * lost or delivered to the nested hypervisor in the EXITINTINFO
-		 * vmcb field, while delivering the pending exception.
-		 */
-		if (svm->nested.nested_run_pending)
+		if (block_nested_exceptions)
                         return -EBUSY;
 		if (!nested_exit_on_exception(svm))
 			return 0;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index c88031d44148..01cf579c0260 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3878,11 +3878,17 @@ static bool nested_vmx_preemption_timer_pending(struct kvm_vcpu *vcpu)
 
 static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 {
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	unsigned long exit_qual;
-	bool block_nested_events =
-	    vmx->nested.nested_run_pending || kvm_event_needs_reinjection(vcpu);
 	struct kvm_lapic *apic = vcpu->arch.apic;
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	unsigned long exit_qual;
+	/*
+	 * Only a pending nested run blocks a pending exception.  If there is a
+	 * previously injected event, the pending exception occurred while said
+	 * event was being delivered and thus needs to be handled.
+	 */
+	bool block_nested_exceptions = vmx->nested.nested_run_pending;
+	bool block_nested_events = block_nested_exceptions ||
+				   kvm_event_needs_reinjection(vcpu);
 
 	if (lapic_in_kernel(vcpu) &&
 		test_bit(KVM_APIC_INIT, &apic->pending_events)) {
@@ -3916,15 +3922,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 	 * Process exceptions that are higher priority than Monitor Trap Flag:
 	 * fault-like exceptions, TSS T flag #DB (not emulated by KVM, but
 	 * could theoretically come in from userspace), and ICEBP (INT1).
-	 *
-	 * Note that only a pending nested run can block a pending exception.
-	 * Otherwise an injected NMI/interrupt should either be
-	 * lost or delivered to the nested hypervisor in the IDT_VECTORING_INFO,
-	 * while delivering the pending exception.
 	 */
 	if (vcpu->arch.exception.pending &&
 	    !(vmx_get_pending_dbg_trap(vcpu) & ~DR6_BT)) {
-		if (vmx->nested.nested_run_pending)
+		if (block_nested_exceptions)
 			return -EBUSY;
 		if (!nested_vmx_check_exception(vcpu, &exit_qual))
 			goto no_vmexit;
@@ -3941,7 +3942,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 	}
 
 	if (vcpu->arch.exception.pending) {
-		if (vmx->nested.nested_run_pending)
+		if (block_nested_exceptions)
 			return -EBUSY;
 		if (!nested_vmx_check_exception(vcpu, &exit_qual))
 			goto no_vmexit;
-- 
2.35.1.723.g4982287a31-goog
From nobody Tue Jun 23 03:45:39 2026
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Fri, 11 Mar 2022 03:27:55 +0000
In-Reply-To: <20220311032801.3467418-1-seanjc@google.com>
Message-Id: <20220311032801.3467418-16-seanjc@google.com>
Mime-Version: 1.0
References: <20220311032801.3467418-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.723.g4982287a31-goog
Subject: [PATCH 15/21] KVM: x86: Use kvm_queue_exception_e() to queue #DF
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Oliver Upton <oupton@google.com>,
        Peter Shier <pshier@google.com>

Queue #DF by recursing on kvm_multiple_exception() by way of
kvm_queue_exception_e() instead of open coding the behavior.  This will
allow KVM to Just Work when a future commit moves exception interception
checks (for L2 => L1) into kvm_multiple_exception().

No functional change intended.
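
The decision kvm_multiple_exception() makes before recursing can be sketched as a standalone function. This is a model, not the kernel code: enum merge_result and merge_exception() are hypothetical, while exception_class() follows the benign/contributory/page-fault classification KVM uses (#DE, #TS, #NP, #SS and #GP contributory, #PF its own class, everything else benign).

```c
#include <assert.h>

/* Architectural exception vector numbers. */
#define DE_VECTOR 0
#define DB_VECTOR 1
#define DF_VECTOR 8
#define TS_VECTOR 10
#define NP_VECTOR 11
#define SS_VECTOR 12
#define GP_VECTOR 13
#define PF_VECTOR 14

enum excpt_class { EXCPT_BENIGN, EXCPT_CONTRIBUTORY, EXCPT_PF };

/* Hypothetical outcome of raising @next while @prev is pending/injected. */
enum merge_result {
	MERGE_REPLACE,      /* drop @prev and queue @next ("goto queue") */
	MERGE_DOUBLE_FAULT, /* queue #DF with error code 0 */
	MERGE_TRIPLE_FAULT, /* @prev was #DF: shutdown */
};

static enum excpt_class exception_class(int vector)
{
	/* Exceptions use vectors 0-31. */
	assert(vector >= 0 && vector < 32);

	switch (vector) {
	case PF_VECTOR:
		return EXCPT_PF;
	case DE_VECTOR:
	case TS_VECTOR:
	case NP_VECTOR:
	case SS_VECTOR:
	case GP_VECTOR:
		return EXCPT_CONTRIBUTORY;
	default:
		return EXCPT_BENIGN;
	}
}

static enum merge_result merge_exception(int prev, int next)
{
	enum excpt_class class1, class2;

	if (prev == DF_VECTOR)
		return MERGE_TRIPLE_FAULT;

	class1 = exception_class(prev);
	class2 = exception_class(next);
	if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY) ||
	    (class1 == EXCPT_PF && class2 != EXCPT_BENIGN))
		return MERGE_DOUBLE_FAULT;

	return MERGE_REPLACE;
}
```

In the MERGE_DOUBLE_FAULT case the patch clears the old exception's injected and pending flags before calling kvm_queue_exception_e(vcpu, DF_VECTOR, 0), so the recursive call sees no exception in flight and takes the plain queueing path. Note the asymmetry: #GP followed by #PF is MERGE_REPLACE, while #PF followed by #GP is MERGE_DOUBLE_FAULT.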

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 74843767383c..d6fbff896263 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -669,25 +669,22 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 	}
 	class1 = exception_class(prev_nr);
 	class2 = exception_class(nr);
-	if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY)
-		|| (class1 == EXCPT_PF && class2 != EXCPT_BENIGN)) {
+	if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY) ||
+	    (class1 == EXCPT_PF && class2 != EXCPT_BENIGN)) {
 		/*
-		 * Generate double fault per SDM Table 5-5.  Set
-		 * exception.pending = true so that the double fault
-		 * can trigger a nested vmexit.
+		 * Synthesize #DF.  Clear the previously injected or pending
+		 * exception so as not to incorrectly trigger shutdown.
 		 */
-		vcpu->arch.exception.pending = true;
 		vcpu->arch.exception.injected = false;
-		vcpu->arch.exception.has_error_code = true;
-		vcpu->arch.exception.vector = DF_VECTOR;
-		vcpu->arch.exception.error_code = 0;
-		vcpu->arch.exception.has_payload = false;
-		vcpu->arch.exception.payload = 0;
-	} else
+		vcpu->arch.exception.pending = false;
+
+		kvm_queue_exception_e(vcpu, DF_VECTOR, 0);
+	} else {
 		/* replace previous exception with a new one in a hope
 		   that instruction re-execution will regenerate lost
 		   exception */
 		goto queue;
+	}
 }
 
 void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr)
-- 
2.35.1.723.g4982287a31-goog
From nobody Tue Jun 23 03:45:39 2026
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Fri, 11 Mar 2022 03:27:56 +0000
In-Reply-To: <20220311032801.3467418-1-seanjc@google.com>
Message-Id: <20220311032801.3467418-17-seanjc@google.com>
Mime-Version: 1.0
References: <20220311032801.3467418-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.723.g4982287a31-goog
Subject: [PATCH 16/21] KVM: x86: Hoist nested event checks above event
 injection logic
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Oliver Upton <oupton@google.com>,
        Peter Shier <pshier@google.com>

Perform nested event checks before re-injecting exceptions/events into
L2.  If a pending exception causes VM-Exit to L1, re-injecting events
into vmcs02 is premature and wasted effort.  Take care to ensure events
that need to be re-injected are still re-injected if checking for nested
events "fails", i.e. if KVM needs to force an immediate entry+exit to
complete the to-be-re-injected event.
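
The reordering described above can be sketched as a small control-flow model. All names here (struct vcpu_model, check_nested_events(), inject_pending_event_model()) are hypothetical, and a nested VM-Exit is modeled simply as moving the injected event into vmc{b,s}12; the real code paths are considerably more involved. The model shows the two properties the paragraph relies on: a VM-Exit to L1 makes re-injection into L2 unnecessary, and a failed nested check (e.g. -EBUSY) still re-injects before bailing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical vCPU model; field names are illustrative only. */
struct vcpu_model {
	bool guest_mode;      /* running L2 */
	bool exit_to_l1;      /* nested check would synthesize a VM-Exit */
	int nested_err;       /* 0, or a negative errno such as -EBUSY */
	bool event_injected;  /* event queued for re-injection */
	bool saved_in_vmcs12; /* event handed to L1 via vmc{b,s}12 */
	int reinjections;     /* re-injections actually performed */
};

/* Stand-in for kvm_check_nested_events(). */
static int check_nested_events(struct vcpu_model *v)
{
	assert(v->nested_err <= 0); /* errors are negative errno values */
	if (v->nested_err)
		return v->nested_err;
	if (v->exit_to_l1) {
		/* Nested VM-Exit: the injected event is recorded for L1. */
		if (v->event_injected) {
			v->event_injected = false;
			v->saved_in_vmcs12 = true;
		}
		v->guest_mode = false;
	}
	return 0;
}

/* Models the new ordering at the top of inject_pending_event(). */
static int inject_pending_event_model(struct vcpu_model *v)
{
	int r = 0;

	/* 1. Nested events first: nested VM-Exit supersedes re-injection. */
	if (v->guest_mode)
		r = check_nested_events(v);

	/* 2. Re-inject even if step 1 failed, so the event can complete. */
	if (v->event_injected)
		v->reinjections++;

	/* 3. Only now bail, letting the caller force immediate entry+exit. */
	if (r < 0)
		return r;

	/* 4. New pending events would be considered from here on. */
	return 0;
}
```

In this model a nested VM-Exit leaves reinjections at 0 and marks the event as saved for L1, while -EBUSY still performs the re-injection before returning the error.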

Keep the "can_inject" logic the same for now; it too can be pushed below
the nested checks, but is a slightly riskier change (see past bugs about
events not being properly purged on nested VM-Exit).

Add and/or modify comments to better document the various interactions.
Of note is the comment regarding "blocking" previously injected NMIs and
IRQs if an exception is pending.  The old comment isn't wrong strictly
speaking, but it failed to capture the reason why the logic even exists.
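
The re-injection precedence described by the new comment can be sketched as a decision function. Everything below is a hypothetical model, not KVM code: struct event_state stands in for the relevant vcpu->arch flags, and can_inject() assumes kvm_event_needs_reinjection() tests exactly the injected exception, NMI and IRQ flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the relevant vcpu->arch event flags. */
struct event_state {
	bool exception_injected;
	bool exception_pending;
	bool nmi_injected;
	bool irq_injected;
};

enum reinject { REINJECT_NONE, REINJECT_EXCEPTION, REINJECT_NMI, REINJECT_IRQ };

/* Mirrors the if/else-if chain that replaces the nested conditionals. */
static enum reinject pick_reinjection(const struct event_state *s)
{
	/* Same condition the patch guards with WARN_ON_ONCE(). */
	assert(!(s->exception_injected && s->exception_pending));

	if (s->exception_injected)
		return REINJECT_EXCEPTION;
	if (s->exception_pending)
		return REINJECT_NONE; /* the pending exception is serviced first */
	if (s->nmi_injected)
		return REINJECT_NMI;
	if (s->irq_injected)
		return REINJECT_IRQ;
	return REINJECT_NONE;
}

/* can_inject is now computed up front from the "injected" flags. */
static bool can_inject(const struct event_state *s)
{
	return !(s->exception_injected || s->nmi_injected || s->irq_injected);
}
```

Note the collision case: with an exception pending and an NMI marked injected, nothing is re-injected, yet can_inject() is still false because an event remains queued for re-injection.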

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 89 +++++++++++++++++++++++++++-------------------
 1 file changed, 53 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d6fbff896263..c1cd2166fe22 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9310,53 +9310,70 @@ static void kvm_inject_exception(struct kvm_vcpu *vcpu)
 
 static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 {
+	bool can_inject = !kvm_event_needs_reinjection(vcpu);
 	int r;
-	bool can_inject = true;
 
-	/* try to reinject previous events if any */
+	/*
+	 * Process nested events first, as nested VM-Exit supersedes event
+	 * re-injection.  If there's an event queued for re-injection, it will
+	 * be saved into the appropriate vmc{b,s}12 fields on nested VM-Exit.
+	 */
+	if (is_guest_mode(vcpu))
+		r = kvm_check_nested_events(vcpu);
+	else
+		r = 0;
 
-	if (vcpu->arch.exception.injected) {
+	/*
+	 * Re-inject exceptions and events *especially* if immediate entry+exit
+	 * to/from L2 is needed, as any event that has already been injected
+	 * into L2 needs to complete its lifecycle before injecting a new event.
+	 *
+	 * Don't re-inject an NMI or interrupt if there is a pending exception.
+	 * This collision arises if an exception occurred while vectoring the
+	 * injected event, KVM intercepted said exception, and KVM ultimately
+	 * determined the fault belongs to the guest and queued the exception
+	 * for injection back into the guest.
+	 *
+	 * "Injected" interrupts can also collide with pending exceptions if
+	 * userspace ignores the "ready for injection" flag and blindly queues
+	 * an interrupt.  In that case, prioritizing the exception is correct,
+	 * as the exception "occurred" before the exit to userspace.  Trap-like
+	 * exceptions, e.g. most #DBs, have higher priority than interrupts.
+	 * And while fault-like exceptions, e.g. #GP and #PF, are the lowest
+	 * priority, they're only generated (pended) during instruction
+	 * execution, and interrupts are recognized at instruction boundaries.
+	 * Thus a pending fault-like exception means the fault occurred on the
+	 * *previous* instruction and must be serviced prior to recognizing any
+	 * new events in order to fully complete the previous instruction.
+	 */
+	if (vcpu->arch.exception.injected)
 		kvm_inject_exception(vcpu);
-		can_inject = false;
-	}
+	else if (vcpu->arch.exception.pending)
+		; /* see above */
+	else if (vcpu->arch.nmi_injected)
+		static_call(kvm_x86_inject_nmi)(vcpu);
+	else if (vcpu->arch.interrupt.injected)
+		static_call(kvm_x86_inject_irq)(vcpu);
+
 	/*
-	 * Do not inject an NMI or interrupt if there is a pending
-	 * exception.  Exceptions and interrupts are recognized at
-	 * instruction boundaries, i.e. the start of an instruction.
-	 * Trap-like exceptions, e.g. #DB, have higher priority than
-	 * NMIs and interrupts, i.e. traps are recognized before an
-	 * NMI/interrupt that's pending on the same instruction.
-	 * Fault-like exceptions, e.g. #GP and #PF, are the lowest
-	 * priority, but are only generated (pended) during instruction
-	 * execution, i.e. a pending fault-like exception means the
-	 * fault occurred on the *previous* instruction and must be
-	 * serviced prior to recognizing any new events in order to
-	 * fully complete the previous instruction.
+	 * Exceptions that morph to VM-Exits are handled above, and pending
+	 * exceptions on top of injected exceptions that do not VM-Exit should
+	 * either morph to #DF or, sadly, override the injected exception.
 	 */
-	else if (!vcpu->arch.exception.pending) {
-		if (vcpu->arch.nmi_injected) {
-			static_call(kvm_x86_inject_nmi)(vcpu);
-			can_inject = false;
-		} else if (vcpu->arch.interrupt.injected) {
-			static_call(kvm_x86_inject_irq)(vcpu);
-			can_inject = false;
-		}
-	}
-
 	WARN_ON_ONCE(vcpu->arch.exception.injected &&
 		     vcpu->arch.exception.pending);
 
 	/*
-	 * Call check_nested_events() even if we reinjected a previous event
-	 * in order for caller to determine if it should require immediate-exit
-	 * from L2 to L1 due to pending L1 events which require exit
-	 * from L2 to L1.
+	 * Bail if immediate entry+exit to/from the guest is needed to complete
+	 * nested VM-Enter or event re-injection so that a different pending
+	 * event can be serviced (or if KVM needs to exit to userspace).
+	 *
+	 * Otherwise, continue processing events even if VM-Exit occurred.  The
+	 * VM-Exit will have cleared exceptions that were meant for L2, but
+	 * there may now be events that can be injected into L1.
 	 */
-	if (is_guest_mode(vcpu)) {
-		r = kvm_check_nested_events(vcpu);
-		if (r < 0)
-			goto out;
-	}
+	if (r < 0)
+		goto out;
 
 	/* try to inject new event if pending */
 	if (vcpu->arch.exception.pending) {
-- 
2.35.1.723.g4982287a31-goog
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Fri, 11 Mar 2022 03:27:57 +0000
In-Reply-To: <20220311032801.3467418-1-seanjc@google.com>
Message-Id: <20220311032801.3467418-18-seanjc@google.com>
Mime-Version: 1.0
References: <20220311032801.3467418-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.723.g4982287a31-goog
Subject: [PATCH 17/21] KVM: x86: Evaluate ability to inject SMI/NMI/IRQ after
 potential VM-Exit
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Oliver Upton <oupton@google.com>,
        Peter Shier <pshier@google.com>

Determine whether or not new events can be injected after checking nested
events.  If a VM-Exit occurred during nested event handling, any previous
event that needed re-injection is gone from KVM's perspective; the event
is captured in the vmc*12 VM-Exit information, but doesn't exist in terms
of what needs to be done for entry to L1.
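
To make the ordering concrete, here is a minimal C model. Everything in it is hypothetical (the struct, its fields and the helpers are invented stand-ins loosely named after kvm_event_needs_reinjection() and nested event checking); it only shows why sampling can_inject before nested events are checked goes wrong when a VM-Exit consumes the event that needed re-injection.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified per-vCPU event state (not KVM's structs). */
struct vcpu_model {
	bool guest_mode;	/* currently running L2 */
	bool exc_injected;	/* events that would need re-injection */
	bool nmi_injected;
	bool irq_injected;
	bool l1_wants_exit;	/* stand-in for an event forcing a VM-Exit to L1 */
};

/* Stand-in for kvm_event_needs_reinjection(). */
static bool needs_reinjection(const struct vcpu_model *v)
{
	return v->exc_injected || v->nmi_injected || v->irq_injected;
}

/*
 * Stand-in for nested event checking: a synthesized VM-Exit moves any
 * to-be-reinjected event into the vmc*12 exit information, so from KVM's
 * perspective nothing is left to re-inject.
 */
static void check_nested_events(struct vcpu_model *v)
{
	if (v->guest_mode && v->l1_wants_exit) {
		v->guest_mode = false;
		v->l1_wants_exit = false;
		v->exc_injected = v->nmi_injected = v->irq_injected = false;
		assert(!needs_reinjection(v));
	}
}

/* Old ordering: can_inject is sampled before a potential VM-Exit. */
static bool can_inject_old(struct vcpu_model v)
{
	bool can_inject = !needs_reinjection(&v);

	check_nested_events(&v);
	return can_inject;
}

/* New ordering: can_inject is evaluated after a potential VM-Exit. */
static bool can_inject_new(struct vcpu_model v)
{
	check_nested_events(&v);
	return !needs_reinjection(&v);
}
```

With an NMI awaiting re-injection and a VM-Exit to L1 forced during nested event checking, the old ordering refuses to inject anything new even though L1 is now the target, while the new ordering allows it.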

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c1cd2166fe22..327a935712fb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9310,7 +9310,7 @@ static void kvm_inject_exception(struct kvm_vcpu *vcpu)
 
 static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 {
-	bool can_inject = !kvm_event_needs_reinjection(vcpu);
+	bool can_inject;
 	int r;
 
 	/*
@@ -9375,7 +9375,13 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 	if (r < 0)
 		goto out;
 
-	/* try to inject new event if pending */
+	/*
+	 * New events, other than exceptions, cannot be injected if KVM needs
+	 * to re-inject a previous event.  See above comments on re-injecting
+	 * for why pending exceptions get priority.
+	 */
+	can_inject = !kvm_event_needs_reinjection(vcpu);
+
 	if (vcpu->arch.exception.pending) {
 		trace_kvm_inj_exception(vcpu->arch.exception.vector,
 					vcpu->arch.exception.has_error_code,
-- 
2.35.1.723.g4982287a31-goog
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Fri, 11 Mar 2022 03:27:58 +0000
In-Reply-To: <20220311032801.3467418-1-seanjc@google.com>
Message-Id: <20220311032801.3467418-19-seanjc@google.com>
Mime-Version: 1.0
References: <20220311032801.3467418-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.723.g4982287a31-goog
Subject: [PATCH 18/21] KVM: x86: Morph pending exceptions to pending VM-Exits
 at queue time
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Oliver Upton <oupton@google.com>,
        Peter Shier <pshier@google.com>

Morph pending exceptions to pending VM-Exits (due to interception) when
the exception is queued instead of waiting until nested events are
checked at VM-Entry.  This fixes a longstanding bug where KVM fails to
handle an exception that occurs during delivery of a previous exception
when KVM (L0) and L1 both want to intercept the exception (e.g. #PF for
shadow paging) and KVM determines that the exception is in the guest's
domain, i.e. queues the new exception for L2.  Deferring the interception
check causes KVM to escalate various combinations of injected+pending
exceptions to double fault (#DF) without consulting L1's interception
desires, and ends up injecting a spurious #DF into L2.
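
A simplified sketch of the queue-time check, under loudly stated assumptions: the structs and l1_exception_bitmap are hypothetical, and the #DF escalation models only the #PF-during-#PF case rather than the full benign/contributory rules. The point is that L1's intercepts are consulted before any escalation, so an intercepted exception becomes a pending VM-Exit instead of a spurious #DF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DF_VECTOR 8
#define PF_VECTOR 14

struct queued_exception {
	bool pending;
	bool injected;
	uint8_t vector;
	uint32_t error_code;
};

/* Hypothetical model; l1_exception_bitmap stands in for L1's intercepts. */
struct vcpu_model {
	bool guest_mode;
	uint32_t l1_exception_bitmap;
	struct queued_exception exception;		/* for the guest */
	struct queued_exception exception_vmexit;	/* VM-Exit for L1 */
};

static bool l1_intercepts(const struct vcpu_model *v, uint8_t vector)
{
	return v->l1_exception_bitmap & (1u << vector);
}

static void queue_exception(struct vcpu_model *v, uint8_t vector,
			    uint32_t error_code)
{
	assert(!(v->exception.pending && v->exception.injected));

	/* Consult L1 at queue time, before any #DF escalation. */
	if (v->guest_mode && l1_intercepts(v, vector)) {
		v->exception_vmexit = (struct queued_exception){
			.pending = true, .vector = vector, .error_code = error_code,
		};
		return;
	}

	/*
	 * Simplified escalation: only #PF during #PF delivery is modeled, and
	 * it becomes #DF (error code 0).  Other combinations simply replace
	 * the queued exception here, which is NOT the full architectural rule.
	 */
	if (v->exception.injected && v->exception.vector == PF_VECTOR &&
	    vector == PF_VECTOR) {
		v->exception = (struct queued_exception){
			.pending = true, .vector = DF_VECTOR,
		};
		return;
	}

	v->exception = (struct queued_exception){
		.pending = true, .vector = vector, .error_code = error_code,
	};
}
```

If L1 intercepts #PF, the second #PF turns into a pending VM-Exit and the injected #PF is left untouched; only when L1 does not intercept does the model escalate to #DF.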

KVM has fudged around the issue for #PF by special-casing emulated #PF
injection for shadow paging, but the underlying issue is not unique to
shadow paging in L0, e.g. if KVM is intercepting #PF because the guest
has a smaller maxphyaddr and L1 (but not L0) is using shadow paging.
Other exceptions are affected as well, e.g. if KVM is intercepting #GP
for one of SVM's workarounds or for VMware backdoor emulation.  The
other cases have gone unnoticed because the #DF is spurious if and only
if L1 resolves the exception, e.g. KVM's goofs go unnoticed if L1 would
have injected #DF anyway.
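
For reference, "consulting L1's interception desires" on VMX is more than a bitmap test for #PF: the exception bitmap bit is combined with the page-fault error-code mask/match fields, which is what nested_vmx_is_exception_vmexit() in this patch delegates to. A standalone sketch of that decision, with a hypothetical vmcs12_model struct holding only the three relevant fields:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GP_VECTOR 13
#define PF_VECTOR 14

/* Hypothetical subset of vmcs12 fields relevant to exception interception. */
struct vmcs12_model {
	uint32_t exception_bitmap;
	uint32_t page_fault_error_code_mask;
	uint32_t page_fault_error_code_match;
};

static bool is_page_fault_vmexit(const struct vmcs12_model *vmcs12,
				 uint16_t error_code)
{
	bool bit = vmcs12->exception_bitmap & (1u << PF_VECTOR);
	bool inequality = (error_code & vmcs12->page_fault_error_code_mask) !=
			  vmcs12->page_fault_error_code_match;

	/* Bitmap bit set: exit on match.  Bitmap bit clear: exit on mismatch. */
	return inequality ^ bit;
}

static bool is_exception_vmexit(const struct vmcs12_model *vmcs12,
				uint8_t vector, uint32_t error_code)
{
	assert(vector < 32);	/* exception vectors are 0-31 */

	/* Bits 31:16 of the error code are dropped for the #PF mask+match. */
	if (vector == PF_VECTOR)
		return is_page_fault_vmexit(vmcs12, (uint16_t)error_code);

	return vmcs12->exception_bitmap & (1u << vector);
}
```

The XOR encodes the rule that the #PF bitmap bit flips the sense of the mask/match comparison; every other vector is a plain bitmap lookup.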

The hack-a-fix has also led to ugly code, e.g. bailing from the emulator
if #PF injection forced a nested VM-Exit and the emulator finds itself
back in L1.  Allowing for direct-to-VM-Exit queueing also neatly solves
the async #PF in L2 mess; no need to set a magic flag and token, simply
queue a #PF nested VM-Exit.
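
Roughly how the async #PF case collapses into ordinary queueing, as a sketch with hypothetical names: the token travels as the exception payload, and the exit qualification is derived from the payload first, then CR2 for #PF, mirroring the order in this patch's nested_vmx_inject_exception_vmexit() (its #DB/DR6 branch is omitted here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PF_VECTOR 14

struct queued_exception {
	bool pending;
	uint8_t vector;
	bool has_error_code;
	uint32_t error_code;
	bool has_payload;
	unsigned long payload;
};

/* Hypothetical model of the state needed to build the VM-Exit. */
struct vcpu_model {
	unsigned long cr2;
	struct queued_exception exception_vmexit;
};

/* Async #PF in L2: the token simply rides along as the payload. */
static void queue_async_pf_vmexit(struct vcpu_model *v, uint32_t error_code,
				  unsigned long token)
{
	v->exception_vmexit = (struct queued_exception){
		.pending = true,
		.vector = PF_VECTOR,
		.has_error_code = true,
		.error_code = error_code,
		.has_payload = true,
		.payload = token,
	};
}

/* Exit qualification: payload first, then CR2 for #PF, else 0 (#DB omitted). */
static unsigned long exception_exit_qual(const struct vcpu_model *v)
{
	const struct queued_exception *ex = &v->exception_vmexit;

	assert(ex->pending);
	if (ex->has_payload)
		return ex->payload;
	if (ex->vector == PF_VECTOR)
		return v->cr2;
	return 0;
}
```

No separate "nested APF" flag or token field is needed: the generic payload path already delivers the token as the exit qualification.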

Deal with event migration by flagging that a pending exception was queued
by userspace and checking for interception at the next KVM_RUN, so that
KVM does the right thing regardless of the order in which userspace
restores nested state vs. event state.
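
A hypothetical sketch of the deferred check.  Only the exception_from_userspace name comes from this patch; the rest of the model, including what the "run" step inspects, is an assumption made for illustration.  Because the interception decision is postponed until the next run, it gives the same answer whether nested state is restored before or after event state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct queued_exception {
	bool pending;
	uint8_t vector;
	uint32_t error_code;
};

/* Hypothetical model; only exception_from_userspace mirrors a real field. */
struct vcpu_model {
	bool guest_mode;
	uint32_t l1_exception_bitmap;
	bool exception_from_userspace;
	struct queued_exception exception;
	struct queued_exception exception_vmexit;
};

/* "Set events": nested state may not be restored yet, so don't decide now. */
static void set_events(struct vcpu_model *v, uint8_t vector,
		       uint32_t error_code)
{
	assert(vector < 32);
	v->exception = (struct queued_exception){
		.pending = true, .vector = vector, .error_code = error_code,
	};
	v->exception_from_userspace = true;
}

/* Next "KVM_RUN": all state is in place, so check interception now. */
static void run_prologue(struct vcpu_model *v)
{
	if (!v->exception_from_userspace)
		return;
	v->exception_from_userspace = false;

	if (v->guest_mode && v->exception.pending &&
	    (v->l1_exception_bitmap & (1u << v->exception.vector))) {
		v->exception_vmexit = v->exception;
		v->exception = (struct queued_exception){ 0 };
	}
}
```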

When userspace "gets" events (KVM_GET_VCPU_EVENTS), simply drop any
pending exception that is destined to be intercepted if there is also an
injected exception to be migrated.  Ideally, KVM would migrate both
events, but that would require new ABI, and practically speaking losing
the event is unlikely to be noticed, let alone fatal.  The injected
exception is captured, RIP still points at the original faulting
instruction, etc.  So either the
injection on the target will trigger the same intercepted exception, or
the source of the intercepted exception was transient and/or
non-deterministic, thus dropping it is ok-ish.
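
The "get" side can be sketched as choosing which of the two queued exceptions fills the single exception slot the existing ABI provides.  The structs are hypothetical, and this is one plausible reading of the rule described above, not the actual ioctl code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct queued_exception {
	bool pending;
	bool injected;
	uint8_t vector;
};

struct vcpu_model {
	struct queued_exception exception;		/* for the guest */
	struct queued_exception exception_vmexit;	/* VM-Exit for L1 */
};

/* Hypothetical stand-in for the single exception slot in the ABI. */
struct exception_abi {
	bool injected;
	bool pending;
	uint8_t nr;
};

static struct exception_abi
get_exception_for_userspace(const struct vcpu_model *v)
{
	const struct queued_exception *ex = &v->exception;
	struct exception_abi out;

	/*
	 * Report the to-be-intercepted exception only if the regular slot is
	 * empty; otherwise it is dropped (not migrated) in favor of the
	 * injected exception, which re-triggers it if it is deterministic.
	 */
	if (v->exception_vmexit.pending && !v->exception.pending &&
	    !v->exception.injected)
		ex = &v->exception_vmexit;

	assert(!(ex->pending && ex->injected));
	out.injected = ex->injected;
	out.pending = ex->pending;
	out.nr = ex->vector;
	return out;
}
```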

Opportunistically add a gigantic comment above vmx_check_nested_events()
to document the priorities of all known events on Intel CPUs.  Kudos to
Jim Mattson for doing the hard work of collecting and interpreting the
priorities from various locations throughout the SDM (because putting
them all in one place in the SDM would be too easy).
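
The part of the table that vmx_check_nested_events() has to get right in this patch reduces to an ordered list of checks.  A hypothetical model (invented names; INIT/SIPI, SMI, NMI, IRQ and the -EBUSY blocking paths are left out) of the order implemented by the VMX diff:

```c
#include <assert.h>
#include <stdbool.h>

enum nested_event {
	EV_NONE,
	EV_EXCEPTION_VMEXIT,	/* synthesize an exception VM-Exit to L1 */
	EV_EXCEPTION_TO_L2,	/* no VM-Exit, inject the exception into L2 */
	EV_MTF_VMEXIT,
	EV_PREEMPTION_TIMER_VMEXIT,
};

/* Hypothetical flattening of the state vmx_check_nested_events() inspects. */
struct pending_state {
	bool exception_vmexit;
	bool exception_vmexit_is_db_trap;	/* trap-like #DB */
	bool exception;
	bool exception_is_db_trap;
	bool mtf;
	bool preemption_timer;
};

static enum nested_event pick_event(const struct pending_state *s)
{
	assert(s);

	/*
	 * Pending exceptions other than trap-like #DBs (fault-like exceptions
	 * from the previous instruction, TSS T-flag #DBs) come before MTF.
	 */
	if (s->exception_vmexit && !s->exception_vmexit_is_db_trap)
		return EV_EXCEPTION_VMEXIT;
	if (s->exception && !s->exception_is_db_trap)
		return EV_EXCEPTION_TO_L2;

	/* 3.5: MTF VM-Exit. */
	if (s->mtf)
		return EV_MTF_VMEXIT;

	/* 4: trap-like #DBs, ahead of the 4.3 VMX-preemption timer. */
	if (s->exception_vmexit)
		return EV_EXCEPTION_VMEXIT;
	if (s->exception)
		return EV_EXCEPTION_TO_L2;

	/* 4.3: VMX-preemption timer expired. */
	if (s->preemption_timer)
		return EV_PREEMPTION_TIMER_VMEXIT;

	return EV_NONE;
}
```

Splitting each exception check into a "before MTF" and "after MTF" half is what lets a single pending #DB land on either side of MTF depending on whether it is trap-like.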

Fixes: a04aead144fd ("KVM: nSVM: fix running nested guests when npt=0")
Fixes: feaf0c7dc473 ("KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2")
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  10 +-
 arch/x86/kvm/svm/nested.c       |  39 ++----
 arch/x86/kvm/vmx/nested.c       | 223 +++++++++++++++++++++-----------
 arch/x86/kvm/vmx/vmx.c          |   6 +-
 arch/x86/kvm/x86.c              | 142 +++++++++++++++-----
 arch/x86/kvm/x86.h              |   7 +
 6 files changed, 288 insertions(+), 139 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos=
t.h
index 478e2fef0062..b5a9c0cbb21c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -623,7 +623,6 @@ struct kvm_queued_exception {
 	u32 error_code;
 	unsigned long payload;
 	bool has_payload;
-	u8 nested_apf;
 };
 
 struct kvm_vcpu_arch {
@@ -724,8 +723,12 @@ struct kvm_vcpu_arch {
 
 	u8 event_exit_inst_len;
 
+	bool exception_from_userspace;
+
 	/* Exceptions to be injected to the guest. */
 	struct kvm_queued_exception exception;
+	/* Exception VM-Exits to be synthesized to L1. */
+	struct kvm_queued_exception exception_vmexit;
 
 	struct kvm_queued_interrupt {
 		bool injected;
@@ -836,7 +839,6 @@ struct kvm_vcpu_arch {
 		u32 id;
 		bool send_user_only;
 		u32 host_apf_flags;
-		unsigned long nested_apf_token;
 		bool delivery_as_pf_vmexit;
 		bool pageready_pending;
 	} apf;
@@ -1500,6 +1502,8 @@ struct kvm_x86_ops {
 
 struct kvm_x86_nested_ops {
 	void (*leave_nested)(struct kvm_vcpu *vcpu);
+	bool (*is_exception_vmexit)(struct kvm_vcpu *vcpu, u8 vector,
+				    u32 error_code);
 	int (*check_events)(struct kvm_vcpu *vcpu);
 	bool (*hv_timer_pending)(struct kvm_vcpu *vcpu);
 	void (*triple_fault)(struct kvm_vcpu *vcpu);
@@ -1754,7 +1758,7 @@ void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr, unsigned long pay
 void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr);
 void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
-bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
+void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
 				    struct x86_exception *fault);
 bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl);
 bool kvm_require_dr(struct kvm_vcpu *vcpu, int dr);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 60df9d4d19b5..ddef8fd8a9e6 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -54,24 +54,6 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
 	nested_svm_vmexit(svm);
 }
 
-static void svm_inject_page_fault_nested(struct kvm_vcpu *vcpu, struct x86_exception *fault)
-{
-       struct vcpu_svm *svm = to_svm(vcpu);
-       WARN_ON(!is_guest_mode(vcpu));
-
-	if (vmcb12_is_intercept(&svm->nested.ctl,
-				INTERCEPT_EXCEPTION_OFFSET + PF_VECTOR) &&
-	    !svm->nested.nested_run_pending) {
-               svm->vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + PF_VECTOR;
-               svm->vmcb->control.exit_code_hi = 0;
-               svm->vmcb->control.exit_info_1 = fault->error_code;
-               svm->vmcb->control.exit_info_2 = fault->address;
-               nested_svm_vmexit(svm);
-       } else {
-               kvm_inject_page_fault(vcpu, fault);
-       }
-}
-
 static u64 nested_svm_get_tdp_pdptr(struct kvm_vcpu *vcpu, int index)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -680,9 +662,6 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
 	if (ret)
 		return ret;
 
-	if (!npt_enabled)
-		vcpu->arch.mmu->inject_page_fault = svm_inject_page_fault_nested;
-
 	if (!from_vmrun)
 		kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
 
@@ -1152,16 +1131,17 @@ int nested_svm_check_permissions(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-static bool nested_exit_on_exception(struct vcpu_svm *svm)
+static bool nested_svm_is_exception_vmexit(struct kvm_vcpu *vcpu, u8 vector,
+					   u32 error_code)
 {
-	unsigned int vector = svm->vcpu.arch.exception.vector;
+	struct vcpu_svm *svm = to_svm(vcpu);
 
 	return (svm->nested.ctl.intercepts[INTERCEPT_EXCEPTION] & BIT(vector));
 }
 
 static void nested_svm_inject_exception_vmexit(struct kvm_vcpu *vcpu)
 {
-	struct kvm_queued_exception *ex = &vcpu->arch.exception;
+	struct kvm_queued_exception *ex = &vcpu->arch.exception_vmexit;
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	svm->vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + ex->vector;
@@ -1222,15 +1202,19 @@ static int svm_check_nested_events(struct kvm_vcpu *vcpu)
 		return 0;
 	}
 
-	if (vcpu->arch.exception.pending) {
+	if (vcpu->arch.exception_vmexit.pending) {
 		if (block_nested_exceptions)
                         return -EBUSY;
-		if (!nested_exit_on_exception(svm))
-			return 0;
 		nested_svm_inject_exception_vmexit(vcpu);
 		return 0;
 	}
 
+	if (vcpu->arch.exception.pending) {
+		if (block_nested_exceptions)
+			return -EBUSY;
+		return 0;
+	}
+
 	if (vcpu->arch.smi_pending && !svm_smi_blocked(vcpu)) {
 		if (block_nested_events)
 			return -EBUSY;
@@ -1567,6 +1551,7 @@ static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
 
 struct kvm_x86_nested_ops svm_nested_ops = {
 	.leave_nested = svm_leave_nested,
+	.is_exception_vmexit = nested_svm_is_exception_vmexit,
 	.check_events = svm_check_nested_events,
 	.triple_fault = nested_svm_triple_fault,
 	.get_nested_state_pages = svm_get_nested_state_pages,
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 01cf579c0260..99ee0a1c3a4b 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -438,60 +438,22 @@ static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
 	return inequality ^ bit;
 }
 
-
-/*
- * KVM wants to inject page-faults which it got to the guest. This function
- * checks whether in a nested guest, we need to inject them to L1 or L2.
- */
-static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned long *exit_qual)
-{
-	struct kvm_queued_exception *ex = &vcpu->arch.exception;
-	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-
-	if (ex->vector == PF_VECTOR) {
-		if (ex->nested_apf) {
-			*exit_qual = vcpu->arch.apf.nested_apf_token;
-			return 1;
-		}
-		if (nested_vmx_is_page_fault_vmexit(vmcs12, ex->error_code)) {
-			*exit_qual = ex->has_payload ? ex->payload : vcpu->arch.cr2;
-			return 1;
-		}
-	} else if (vmcs12->exception_bitmap & (1u << ex->vector)) {
-		if (ex->vector == DB_VECTOR) {
-			if (ex->has_payload) {
-				*exit_qual = ex->payload;
-			} else {
-				*exit_qual = vcpu->arch.dr6;
-				*exit_qual &= ~DR6_BT;
-				*exit_qual ^= DR6_ACTIVE_LOW;
-			}
-		} else
-			*exit_qual = 0;
-		return 1;
-	}
-
-	return 0;
-}
-
-
-static void vmx_inject_page_fault_nested(struct kvm_vcpu *vcpu,
-		struct x86_exception *fault)
+static bool nested_vmx_is_exception_vmexit(struct kvm_vcpu *vcpu, u8 vector,
+					   u32 error_code)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
 
-	WARN_ON(!is_guest_mode(vcpu));
+	/*
+	 * Drop bits 31:16 of the error code when performing the #PF mask+match
+	 * check.  All VMCS fields involved are 32 bits, but Intel CPUs never
+	 * set bits 31:16 and VMX disallows setting bits 31:16 in the injected
+	 * error code.  Including the to-be-dropped bits in the check might
+	 * result in an "impossible" or missed exit from L1's perspective.
+	 */
+	if (vector == PF_VECTOR)
+		return nested_vmx_is_page_fault_vmexit(vmcs12, (u16)error_code);
 
-	if (nested_vmx_is_page_fault_vmexit(vmcs12, fault->error_code) &&
-		!to_vmx(vcpu)->nested.nested_run_pending) {
-		vmcs12->vm_exit_intr_error_code = fault->error_code;
-		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
-				  PF_VECTOR | INTR_TYPE_HARD_EXCEPTION |
-				  INTR_INFO_DELIVER_CODE_MASK | INTR_INFO_VALID_MASK,
-				  fault->address);
-	} else {
-		kvm_inject_page_fault(vcpu, fault);
-	}
+	return (vmcs12->exception_bitmap & (1u << vector));
 }
 
 static int nested_vmx_check_io_bitmap_controls(struct kvm_vcpu *vcpu,
@@ -2612,9 +2574,6 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 		vmcs_write64(GUEST_PDPTR3, vmcs12->guest_pdptr3);
 	}
 
-	if (!enable_ept)
-		vcpu->arch.walk_mmu->inject_page_fault = vmx_inject_page_fault_nested;
-
 	if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) &&
 	    WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
 				     vmcs12->guest_ia32_perf_global_ctrl))) {
@@ -3798,12 +3757,24 @@ static int vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
 	return -ENXIO;
 }
 
-static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
-					       unsigned long exit_qual)
+static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu)
 {
-	struct kvm_queued_exception *ex = &vcpu->arch.exception;
+	struct kvm_queued_exception *ex = &vcpu->arch.exception_vmexit;
 	u32 intr_info = ex->vector | INTR_INFO_VALID_MASK;
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+	unsigned long exit_qual;
+
+	if (ex->has_payload) {
+		exit_qual = ex->payload;
+	} else if (ex->vector == PF_VECTOR) {
+		exit_qual = vcpu->arch.cr2;
+	} else if (ex->vector == DB_VECTOR) {
+		exit_qual = vcpu->arch.dr6;
+		exit_qual &= ~DR6_BT;
+		exit_qual ^= DR6_ACTIVE_LOW;
+	} else {
+		exit_qual = 0;
+	}
 
 	if (ex->has_error_code) {
 				/*
@@ -3845,14 +3816,24 @@ static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
  * from the emulator (because such #DBs are fault-like and thus don't trigger
  * actions that fire on instruction retire).
  */
-static inline unsigned long vmx_get_pending_dbg_trap(struct kvm_vcpu *vcpu)
+static unsigned long vmx_get_pending_dbg_trap(struct kvm_queued_exception *ex)
 {
-	if (!vcpu->arch.exception.pending ||
-	    vcpu->arch.exception.vector != DB_VECTOR)
+	if (!ex->pending || ex->vector != DB_VECTOR)
 		return 0;
 
 	/* General Detect #DBs are always fault-like. */
-	return vcpu->arch.exception.payload & ~DR6_BD;
+	return ex->payload & ~DR6_BD;
+}
+
+/*
+ * Returns true if there's a pending #DB exception that is lower priority than
+ * a pending Monitor Trap Flag VM-Exit.  TSS T-flag #DBs are not emulated by
+ * KVM, but could theoretically be injected by userspace.  Note, this code is
+ * imperfect, see above.
+ */
+static bool vmx_is_low_priority_db_trap(struct kvm_queued_exception *ex)
+{
+	return vmx_get_pending_dbg_trap(ex) & ~DR6_BT;
 }
 
 /*
@@ -3864,8 +3845,9 @@ static inline unsigned long vmx_get_pending_dbg_trap(struct kvm_vcpu *vcpu)
  */
 static void nested_vmx_update_pending_dbg(struct kvm_vcpu *vcpu)
 {
-	unsigned long pending_dbg = vmx_get_pending_dbg_trap(vcpu);
+	unsigned long pending_dbg;
 
+	pending_dbg = vmx_get_pending_dbg_trap(&vcpu->arch.exception);
 	if (pending_dbg)
 		vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, pending_dbg);
 }
@@ -3876,11 +3858,93 @@ static bool nested_vmx_preemption_timer_pending(struct kvm_vcpu *vcpu)
 	       to_vmx(vcpu)->nested.preemption_timer_expired;
 }
 
+/*
+ * Per the Intel SDM's table "Priority Among Concurrent Events", with minor
+ * edits to fill in missing examples, e.g. #DB due to split-lock accesses,
+ * and less minor edits to splice in the priority of VMX Non-Root specific
+ * events, e.g. MTF and NMI/INTR-window exiting.
+ *
+ * 1 Hardware Reset and Machine Checks
+ *	- RESET
+ *	- Machine Check
+ *
+ * 2 Trap on Task Switch
+ *	- T flag in TSS is set (on task switch)
+ *
+ * 3 External Hardware Interventions
+ *	- FLUSH
+ *	- STOPCLK
+ *	- SMI
+ *	- INIT
+ *
+ * 3.5 Monitor Trap Flag (MTF) VM-exit[1]
+ *
+ * 4 Traps on Previous Instruction
+ *	- Breakpoints
+ *	- Trap-class Debug Exceptions (#DB due to TF flag set, data/I-O
+ *	  breakpoint, or #DB due to a split-lock access)
+ *
+ * 4.3	VMX-preemption timer expired VM-exit[2]
+ *
+ * 4.6	NMI-window exiting VM-exit[3]
+ *
+ * 5 Nonmaskable Interrupts (NMI)
+ *
+ * 5.5 Interrupt-window exiting VM-exit and Virtual-interrupt delivery[4]
+ *
+ * 6 Maskable Hardware Interrupts
+ *
+ * 7 Code Breakpoint Fault
+ *
+ * 8 Faults from Fetching Next Instruction
+ *	- Code-Segment Limit Violation
+ *	- Code Page Fault
+ *	- Control protection exception (missing ENDBRANCH at target of indirect
+ *					call or jump)
+ *
+ * 9 Faults from Decoding Next Instruction
+ *	- Instruction length > 15 bytes
+ *	- Invalid Opcode
+ *	- Coprocessor Not Available
+ *
+ *10 Faults on Executing Instruction
+ *	- Overflow
+ *	- Bound error
+ *	- Invalid TSS
+ *	- Segment Not Present
+ *	- Stack fault
+ *	- General Protection
+ *	- Data Page Fault
+ *	- Alignment Check
+ *	- x86 FPU Floating-point exception
+ *	- SIMD floating-point exception
+ *	- Virtualization exception
+ *	- Control protection exception
+ *
+ * [1] Per the "Monitor Trap Flag" section: System-management interrupts (SMIs),
+ *     INIT signals, and higher priority events take priority over MTF VM exits.
+ *     MTF VM exits take priority over debug-trap exceptions and lower priority
+ *     events.
+ *
+ * [2] Debug-trap exceptions and higher priority events take priority over VM exits
+ *     caused by the VMX-preemption timer.  VM exits caused by the VMX-preemption
+ *     timer take priority over VM exits caused by the "NMI-window exiting"
+ *     VM-execution control and lower priority events.
+ *
+ * [3] Debug-trap exceptions and higher priority events take priority over VM exits
+ *     caused by "NMI-window exiting".  VM exits caused by this control take
+ *     priority over non-maskable interrupts (NMIs) and lower priority events.
+ *
+ * [4] Virtual-interrupt delivery has the same priority as that of VM exits due to
+ *     the 1-setting of the "interrupt-window exiting" VM-execution control.  Thus,
+ *     non-maskable interrupts (NMIs) and higher priority events take priority over
+ *     delivery of a virtual interrupt; delivery of a virtual interrupt takes
+ *     priority over external interrupts and lower priority events.
+ */
 static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	unsigned long exit_qual;
 	/*
 	 * Only a pending nested run blocks a pending exception.  If there is a
 	 * previously injected event, the pending exception occurred while said
@@ -3918,19 +3982,20 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 		/* Fallthrough, the SIPI is completely ignored. */
 	}
 
-	/*
-	 * Process exceptions that are higher priority than Monitor Trap Flag:
-	 * fault-like exceptions, TSS T flag #DB (not emulated by KVM, but
-	 * could theoretically come in from userspace), and ICEBP (INT1).
-	 */
+	if (vcpu->arch.exception_vmexit.pending &&
+	    !vmx_is_low_priority_db_trap(&vcpu->arch.exception_vmexit)) {
+		if (block_nested_exceptions)
+			return -EBUSY;
+
+		nested_vmx_inject_exception_vmexit(vcpu);
+		return 0;
+	}
+
 	if (vcpu->arch.exception.pending &&
-	    !(vmx_get_pending_dbg_trap(vcpu) & ~DR6_BT)) {
+	    !vmx_is_low_priority_db_trap(&vcpu->arch.exception)) {
 		if (block_nested_exceptions)
 			return -EBUSY;
-		if (!nested_vmx_check_exception(vcpu, &exit_qual))
-			goto no_vmexit;
-		nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
-		return 0;
+		goto no_vmexit;
 	}
 
 	if (vmx->nested.mtf_pending) {
@@ -3941,13 +4006,18 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 		return 0;
 	}
 
+	if (vcpu->arch.exception_vmexit.pending) {
+		if (block_nested_exceptions)
+			return -EBUSY;
+
+		nested_vmx_inject_exception_vmexit(vcpu);
+		return 0;
+	}
+
 	if (vcpu->arch.exception.pending) {
 		if (block_nested_exceptions)
 			return -EBUSY;
-		if (!nested_vmx_check_exception(vcpu, &exit_qual))
-			goto no_vmexit;
-		nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
-		return 0;
+		goto no_vmexit;
 	}
 
 	if (nested_vmx_preemption_timer_pending(vcpu)) {
@@ -6828,6 +6898,7 @@ __init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *))
 
 struct kvm_x86_nested_ops vmx_nested_ops = {
 	.leave_nested = vmx_leave_nested,
+	.is_exception_vmexit = nested_vmx_is_exception_vmexit,
 	.check_events = vmx_check_nested_events,
 	.hv_timer_pending = nested_vmx_preemption_timer_pending,
 	.triple_fault = nested_vmx_triple_fault,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9f9b601fd6f6..0420bc6d418a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1577,7 +1577,9 @@ static void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu)
 	 */
 	if (nested_cpu_has_mtf(vmcs12) &&
 	    (!vcpu->arch.exception.pending ||
-	     vcpu->arch.exception.vector == DB_VECTOR))
+	     vcpu->arch.exception.vector == DB_VECTOR) &&
+	    (!vcpu->arch.exception_vmexit.pending ||
+	     vcpu->arch.exception_vmexit.vector == DB_VECTOR))
 		vmx->nested.mtf_pending = true;
 	else
 		vmx->nested.mtf_pending = false;
@@ -5479,7 +5481,7 @@ static bool vmx_emulation_required_with_pending_exception(struct kvm_vcpu *vcpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
 	return vmx->emulation_required && !vmx->rmode.vm86_active &&
-	       vcpu->arch.exception.pending;
+	       kvm_is_exception_pending(vcpu);
 }
=20
 static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 327a935712fb..bb1bd332d535 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -615,6 +615,21 @@ void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu,
 }
 EXPORT_SYMBOL_GPL(kvm_deliver_exception_payload);
 
+static void kvm_queue_exception_vmexit(struct kvm_vcpu *vcpu, unsigned int vector,
+				       bool has_error_code, u32 error_code,
+				       bool has_payload, unsigned long payload)
+{
+	struct kvm_queued_exception *ex = &vcpu->arch.exception_vmexit;
+
+	ex->vector = vector;
+	ex->injected = false;
+	ex->pending = true;
+	ex->has_error_code = has_error_code;
+	ex->error_code = error_code;
+	ex->has_payload = has_payload;
+	ex->payload = payload;
+}
+
 static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 		unsigned nr, bool has_error, u32 error_code,
 	        bool has_payload, unsigned long payload, bool reinject)
@@ -624,18 +639,31 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 
+	/*
+	 * If the exception is destined for L2 and isn't being reinjected,
+	 * morph it to a VM-Exit if L1 wants to intercept the exception.  A
+	 * previously injected exception is not checked because it was checked
+	 * when it was originally queued, and re-checking is incorrect if _L1_
+	 * injected the exception, in which case it's exempt from interception.
+	 */
+	if (!reinject && is_guest_mode(vcpu) &&
+	    kvm_x86_ops.nested_ops->is_exception_vmexit(vcpu, nr, error_code)) {
+		kvm_queue_exception_vmexit(vcpu, nr, has_error, error_code,
+					   has_payload, payload);
+		return;
+	}
+
 	if (!vcpu->arch.exception.pending && !vcpu->arch.exception.injected) {
 	queue:
 		if (reinject) {
 			/*
-			 * On vmentry, vcpu->arch.exception.pending is only
-			 * true if an event injection was blocked by
-			 * nested_run_pending.  In that case, however,
-			 * vcpu_enter_guest requests an immediate exit,
-			 * and the guest shouldn't proceed far enough to
-			 * need reinjection.
+			 * On VM-Entry, an exception can be pending if and only
+			 * if event injection was blocked by nested_run_pending.
+			 * In that case, however, vcpu_enter_guest() requests an
+			 * immediate exit, and the guest shouldn't proceed far
+			 * enough to need reinjection.
 			 */
-			WARN_ON_ONCE(vcpu->arch.exception.pending);
+			WARN_ON_ONCE(kvm_is_exception_pending(vcpu));
 			vcpu->arch.exception.injected = true;
 			if (WARN_ON_ONCE(has_payload)) {
 				/*
@@ -738,19 +766,22 @@ static int complete_emulated_insn_gp(struct kvm_vcpu *vcpu, int err)
 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
 {
 	++vcpu->stat.pf_guest;
-	vcpu->arch.exception.nested_apf =
-		is_guest_mode(vcpu) && fault->async_page_fault;
-	if (vcpu->arch.exception.nested_apf) {
-		vcpu->arch.apf.nested_apf_token =3D fault->address;
-		kvm_queue_exception_e(vcpu, PF_VECTOR, fault->error_code);
-	} else {
+
+	/*
+	 * Async #PF in L2 is always forwarded to L1 as a VM-Exit regardless of
+	 * whether or not L1 wants to intercept "regular" #PF.
+	 */
+	if (is_guest_mode(vcpu) && fault->async_page_fault)
+		kvm_queue_exception_vmexit(vcpu, PF_VECTOR,
+					   true, fault->error_code,
+					   true, fault->address);
+	else
 		kvm_queue_exception_e_p(vcpu, PF_VECTOR, fault->error_code,
 					fault->address);
-	}
 }
 EXPORT_SYMBOL_GPL(kvm_inject_page_fault);
 
-bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
+void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
 				    struct x86_exception *fault)
 {
 	struct kvm_mmu *fault_mmu;
@@ -769,7 +800,6 @@ bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
 				       fault_mmu->root.hpa);
 
 	fault_mmu->inject_page_fault(vcpu, fault);
-	return fault->nested_page_fault;
 }
 EXPORT_SYMBOL_GPL(kvm_inject_emulated_page_fault);
 
@@ -4692,7 +4722,7 @@ static int kvm_vcpu_ready_for_interrupt_injection(struct kvm_vcpu *vcpu)
 	return (kvm_arch_interrupt_allowed(vcpu) &&
 		kvm_cpu_accept_dm_intr(vcpu) &&
 		!kvm_event_needs_reinjection(vcpu) &&
-		!vcpu->arch.exception.pending);
+		!kvm_is_exception_pending(vcpu));
 }
 
 static int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu,
@@ -4821,13 +4851,27 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
 static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
 					       struct kvm_vcpu_events *events)
 {
-	struct kvm_queued_exception *ex = &vcpu->arch.exception;
+	struct kvm_queued_exception *ex;
 
 	process_nmi(vcpu);
 
 	if (kvm_check_request(KVM_REQ_SMI, vcpu))
 		process_smi(vcpu);
 
+	/*
+	 * KVM's ABI only allows for one exception to be migrated.  Luckily,
+	 * the only time there can be two queued exceptions is if there's a
+	 * non-exiting _injected_ exception, and a pending exiting exception.
+	 * In that case, ignore the VM-Exiting exception as it's an extension
+	 * of the injected exception.
+	 */
+	if (vcpu->arch.exception_vmexit.pending &&
+	    !vcpu->arch.exception.pending &&
+	    !vcpu->arch.exception.injected)
+		ex = &vcpu->arch.exception_vmexit;
+	else
+		ex = &vcpu->arch.exception;
+
 	/*
 	 * In guest mode, payload delivery should be deferred if the exception
 	 * will be intercepted by L1, e.g. KVM should not modifying CR2 if L1
@@ -4929,6 +4973,19 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
 		return -EINVAL;
 
 	process_nmi(vcpu);
+
+	/*
+	 * Flag that userspace is stuffing an exception; the next KVM_RUN will
+	 * morph the exception to a VM-Exit if appropriate.  Do this only for
+	 * pending exceptions; already-injected exceptions are not subject to
+	 * interception.  Note, userspace that conflates pending and injected
+	 * is hosed, and will incorrectly convert an injected exception into a
+	 * pending exception, which in turn may cause a spurious VM-Exit.
+	 */
+	vcpu->arch.exception_from_userspace = events->exception.pending;
+
+	vcpu->arch.exception_vmexit.pending = false;
+
 	vcpu->arch.exception.injected = events->exception.injected;
 	vcpu->arch.exception.pending = events->exception.pending;
 	vcpu->arch.exception.vector = events->exception.nr;
@@ -7825,18 +7882,17 @@ static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
 	}
 }
 
-static bool inject_emulated_exception(struct kvm_vcpu *vcpu)
+static void inject_emulated_exception(struct kvm_vcpu *vcpu)
 {
 	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
+
 	if (ctxt->exception.vector == PF_VECTOR)
-		return kvm_inject_emulated_page_fault(vcpu, &ctxt->exception);
-
-	if (ctxt->exception.error_code_valid)
+		kvm_inject_emulated_page_fault(vcpu, &ctxt->exception);
+	else if (ctxt->exception.error_code_valid)
 		kvm_queue_exception_e(vcpu, ctxt->exception.vector,
 				      ctxt->exception.error_code);
 	else
 		kvm_queue_exception(vcpu, ctxt->exception.vector);
-	return false;
 }
 
 static struct x86_emulate_ctxt *alloc_emulate_ctxt(struct kvm_vcpu *vcpu)
@@ -8449,8 +8505,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 
 	if (ctxt->have_exception) {
 		r = 1;
-		if (inject_emulated_exception(vcpu))
-			return r;
+		inject_emulated_exception(vcpu);
 	} else if (vcpu->arch.pio.count) {
 		if (!vcpu->arch.pio.in) {
 			/* FIXME: return into emulator if single-stepping.  */
@@ -9348,7 +9403,7 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 	 */
 	if (vcpu->arch.exception.injected)
 		kvm_inject_exception(vcpu);
-	else if (vcpu->arch.exception.pending)
+	else if (kvm_is_exception_pending(vcpu))
 		; /* see above */
 	else if (vcpu->arch.nmi_injected)
 		static_call(kvm_x86_inject_nmi)(vcpu);
@@ -9375,6 +9430,14 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 	if (r < 0)
 		goto out;
 
+	/*
+	 * A pending exception VM-Exit should either result in nested VM-Exit
+	 * or force an immediate re-entry and exit to/from L2, and exception
+	 * VM-Exits cannot be injected (flag should _never_ be set).
+	 */
+	WARN_ON_ONCE(vcpu->arch.exception_vmexit.injected ||
+		     vcpu->arch.exception_vmexit.pending);
+
 	/*
 	 * New events, other than exceptions, cannot be injected if KVM needs
 	 * to re-inject a previous event.  See above comments on re-injecting
@@ -9477,7 +9540,7 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 	    kvm_x86_ops.nested_ops->hv_timer_pending(vcpu))
 		*req_immediate_exit =3D true;
 
-	WARN_ON(vcpu->arch.exception.pending);
+	WARN_ON(kvm_is_exception_pending(vcpu));
 	return 0;
 
 out:
@@ -10349,8 +10412,8 @@ static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu)
 /* Called within kvm->srcu read side.  */
 static int vcpu_run(struct kvm_vcpu *vcpu)
 {
-	int r;
 	struct kvm *kvm = vcpu->kvm;
+	int r;
 
 	vcpu->arch.l1tf_flush_l1d = true;
 
@@ -10486,6 +10549,7 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 {
+	struct kvm_queued_exception *ex = &vcpu->arch.exception;
 	struct kvm_run *kvm_run = vcpu->run;
 	struct kvm *kvm = vcpu->kvm;
 	int r;
@@ -10545,6 +10609,21 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		}
 	}
 
+	/*
+	 * If userspace set a pending exception and L2 is active, convert it to
+	 * a pending VM-Exit if L1 wants to intercept the exception.
+	 */
+	if (vcpu->arch.exception_from_userspace && is_guest_mode(vcpu) &&
+	    kvm_x86_ops.nested_ops->is_exception_vmexit(vcpu, ex->vector,
+							ex->error_code)) {
+		kvm_queue_exception_vmexit(vcpu, ex->vector,
+					   ex->has_error_code, ex->error_code,
+					   ex->has_payload, ex->payload);
+		ex->injected = false;
+		ex->pending = false;
+	}
+	vcpu->arch.exception_from_userspace = false;
+
 	if (unlikely(vcpu->arch.complete_userspace_io)) {
 		int (*cui)(struct kvm_vcpu *) = vcpu->arch.complete_userspace_io;
 		vcpu->arch.complete_userspace_io = NULL;
@@ -10649,6 +10728,7 @@ static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 	kvm_set_rflags(vcpu, regs->rflags | X86_EFLAGS_FIXED);
 
 	vcpu->arch.exception.pending = false;
+	vcpu->arch.exception_vmexit.pending = false;
 
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 }
@@ -11013,7 +11093,7 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
 
 	if (dbg->control & (KVM_GUESTDBG_INJECT_DB | KVM_GUESTDBG_INJECT_BP)) {
 		r = -EBUSY;
-		if (vcpu->arch.exception.pending)
+		if (kvm_is_exception_pending(vcpu))
 			goto out;
 		if (dbg->control & KVM_GUESTDBG_INJECT_DB)
 			kvm_queue_exception(vcpu, DB_VECTOR);
@@ -12147,7 +12227,7 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.pv.pv_unhalted)
 		return true;
 
-	if (vcpu->arch.exception.pending)
+	if (kvm_is_exception_pending(vcpu))
 		return true;
=20
 	if (kvm_test_request(KVM_REQ_NMI, vcpu) ||
@@ -12414,7 +12494,7 @@ bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu)
 {
 	if (unlikely(!lapic_in_kernel(vcpu) ||
 		     kvm_event_needs_reinjection(vcpu) ||
-		     vcpu->arch.exception.pending))
+		     kvm_is_exception_pending(vcpu)))
 		return false;
 
 	if (kvm_hlt_in_guest(vcpu->kvm) && !kvm_can_deliver_async_pf(vcpu))
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index d8b44913aa62..3be9ccd6492e 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -61,10 +61,17 @@ static inline unsigned int __shrink_ple_window(unsigned int val,
 void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu);
 int kvm_check_nested_events(struct kvm_vcpu *vcpu);
 
+static inline bool kvm_is_exception_pending(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.exception.pending ||
+	       vcpu->arch.exception_vmexit.pending;
+}
+
 static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.exception.pending = false;
 	vcpu->arch.exception.injected = false;
+	vcpu->arch.exception_vmexit.pending = false;
 }
 
 static inline void kvm_queue_interrupt(struct kvm_vcpu *vcpu, u8 vector,
-- 
2.35.1.723.g4982287a31-goog
From nobody Tue Jun 23 03:45:39 2026
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Fri, 11 Mar 2022 03:27:59 +0000
In-Reply-To: <20220311032801.3467418-1-seanjc@google.com>
Message-Id: <20220311032801.3467418-20-seanjc@google.com>
Mime-Version: 1.0
References: <20220311032801.3467418-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.723.g4982287a31-goog
Subject: [PATCH 19/21] KVM: VMX: Update MTF and ICEBP comments to document
 KVM's subtle behavior
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Oliver Upton <oupton@google.com>,
        Peter Shier <pshier@google.com>
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Document the oddities of ICEBP interception (trap-like #DB is intercepted
as a fault-like exception), and how using VMX's inner "skip" helper
deliberately bypasses the pending MTF and single-step #DB logic.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/vmx.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0420bc6d418a..ae88d42289ce 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1570,9 +1570,13 @@ static void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu)
 
 	/*
 	 * Per the SDM, MTF takes priority over debug-trap exceptions besides
-	 * T-bit traps. As instruction emulation is completed (i.e. at the
-	 * instruction boundary), any #DB exception pending delivery must be a
-	 * debug-trap. Record the pending MTF state to be delivered in
+	 * TSS T-bit traps and ICEBP (INT1).  KVM doesn't emulate T-bit traps
+	 * or ICEBP (in the emulator proper), and skipping of ICEBP after an
+	 * intercepted #DB deliberately avoids single-step #DB and MTF updates
+	 * as ICEBP is higher priority than both.  As instruction emulation is
+	 * completed at this point (i.e. KVM is at the instruction boundary),
+	 * any #DB exception pending delivery must be a debug-trap of lower
+	 * priority than MTF.  Record the pending MTF state to be delivered in
 	 * vmx_check_nested_events().
 	 */
 	if (nested_cpu_has_mtf(vmcs12) &&
@@ -4924,8 +4928,10 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 			 * instruction.  ICEBP generates a trap-like #DB, but
 			 * despite its interception control being tied to #DB,
 			 * is an instruction intercept, i.e. the VM-Exit occurs
-			 * on the ICEBP itself.  Note, skipping ICEBP also
-			 * clears STI and MOVSS blocking.
+			 * on the ICEBP itself.  Use the inner "skip" helper to
+			 * avoid single-step #DB and MTF updates, as ICEBP is
+			 * higher priority.  Note, skipping ICEBP still clears
+			 * STI and MOVSS blocking.
 			 *
 			 * For all other #DBs, set vmcs.PENDING_DBG_EXCEPTIONS.BS
 			 * if single-step is enabled in RFLAGS and STI or MOVSS
-- 
2.35.1.723.g4982287a31-goog
From nobody Tue Jun 23 03:45:39 2026
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Fri, 11 Mar 2022 03:28:00 +0000
In-Reply-To: <20220311032801.3467418-1-seanjc@google.com>
Message-Id: <20220311032801.3467418-21-seanjc@google.com>
Mime-Version: 1.0
References: <20220311032801.3467418-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.723.g4982287a31-goog
Subject: [PATCH 20/21] KVM: selftests: Use uapi header to get VMX and SVM exit
 reasons/codes
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Oliver Upton <oupton@google.com>,
        Peter Shier <pshier@google.com>
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Include the vmx.h and svm.h uapi headers that KVM so kindly provides
instead of manually defining all the same exit reasons/codes.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 .../selftests/kvm/include/x86_64/svm_util.h   |  5 +-
 .../selftests/kvm/include/x86_64/vmx.h        | 51 +------------------
 2 files changed, 4 insertions(+), 52 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/svm_util.h b/tools/testing/selftests/kvm/include/x86_64/svm_util.h
index a25aabd8f5e7..2bc9b48a0a01 100644
--- a/tools/testing/selftests/kvm/include/x86_64/svm_util.h
+++ b/tools/testing/selftests/kvm/include/x86_64/svm_util.h
@@ -9,6 +9,8 @@
 #ifndef SELFTEST_KVM_SVM_UTILS_H
 #define SELFTEST_KVM_SVM_UTILS_H
 
+#include <asm/svm.h>
+
 #include <stdint.h>
 #include "svm.h"
 #include "processor.h"
@@ -16,9 +18,6 @@
 #define CPUID_SVM_BIT		2
 #define CPUID_SVM		BIT_ULL(CPUID_SVM_BIT)
 
-#define SVM_EXIT_MSR		0x07c
-#define SVM_EXIT_VMMCALL	0x081
-
 struct svm_test_data {
 	/* VMCB */
 	struct vmcb *vmcb; /* gva */
diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
index 583ceb0d1457..9b7641b5bca8 100644
--- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
@@ -8,6 +8,8 @@
 #ifndef SELFTEST_KVM_VMX_H
 #define SELFTEST_KVM_VMX_H
 
+#include <asm/vmx.h>
+
 #include <stdint.h>
 #include "processor.h"
 #include "apic.h"
@@ -97,55 +99,6 @@
 #define VMX_MISC_SAVE_EFER_LMA			0x00000020
 
 #define EXIT_REASON_FAILED_VMENTRY	0x80000000
-#define EXIT_REASON_EXCEPTION_NMI	0
-#define EXIT_REASON_EXTERNAL_INTERRUPT	1
-#define EXIT_REASON_TRIPLE_FAULT	2
-#define EXIT_REASON_INTERRUPT_WINDOW	7
-#define EXIT_REASON_NMI_WINDOW		8
-#define EXIT_REASON_TASK_SWITCH		9
-#define EXIT_REASON_CPUID		10
-#define EXIT_REASON_HLT			12
-#define EXIT_REASON_INVD		13
-#define EXIT_REASON_INVLPG		14
-#define EXIT_REASON_RDPMC		15
-#define EXIT_REASON_RDTSC		16
-#define EXIT_REASON_VMCALL		18
-#define EXIT_REASON_VMCLEAR		19
-#define EXIT_REASON_VMLAUNCH		20
-#define EXIT_REASON_VMPTRLD		21
-#define EXIT_REASON_VMPTRST		22
-#define EXIT_REASON_VMREAD		23
-#define EXIT_REASON_VMRESUME		24
-#define EXIT_REASON_VMWRITE		25
-#define EXIT_REASON_VMOFF		26
-#define EXIT_REASON_VMON		27
-#define EXIT_REASON_CR_ACCESS		28
-#define EXIT_REASON_DR_ACCESS		29
-#define EXIT_REASON_IO_INSTRUCTION	30
-#define EXIT_REASON_MSR_READ		31
-#define EXIT_REASON_MSR_WRITE		32
-#define EXIT_REASON_INVALID_STATE	33
-#define EXIT_REASON_MWAIT_INSTRUCTION	36
-#define EXIT_REASON_MONITOR_INSTRUCTION 39
-#define EXIT_REASON_PAUSE_INSTRUCTION	40
-#define EXIT_REASON_MCE_DURING_VMENTRY	41
-#define EXIT_REASON_TPR_BELOW_THRESHOLD 43
-#define EXIT_REASON_APIC_ACCESS		44
-#define EXIT_REASON_EOI_INDUCED		45
-#define EXIT_REASON_EPT_VIOLATION	48
-#define EXIT_REASON_EPT_MISCONFIG	49
-#define EXIT_REASON_INVEPT		50
-#define EXIT_REASON_RDTSCP		51
-#define EXIT_REASON_PREEMPTION_TIMER	52
-#define EXIT_REASON_INVVPID		53
-#define EXIT_REASON_WBINVD		54
-#define EXIT_REASON_XSETBV		55
-#define EXIT_REASON_APIC_WRITE		56
-#define EXIT_REASON_INVPCID		58
-#define EXIT_REASON_PML_FULL		62
-#define EXIT_REASON_XSAVES		63
-#define EXIT_REASON_XRSTORS		64
-#define LAST_EXIT_REASON		64
 
 enum vmcs_field {
 	VIRTUAL_PROCESSOR_ID		= 0x00000000,
-- 
2.35.1.723.g4982287a31-goog
From nobody Tue Jun 23 03:45:39 2026
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Fri, 11 Mar 2022 03:28:01 +0000
In-Reply-To: <20220311032801.3467418-1-seanjc@google.com>
Message-Id: <20220311032801.3467418-22-seanjc@google.com>
Mime-Version: 1.0
References: <20220311032801.3467418-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.723.g4982287a31-goog
Subject: [PATCH 21/21] KVM: selftests: Add an x86-only test to verify nested
 exception queueing
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, Oliver Upton <oupton@google.com>,
        Peter Shier <pshier@google.com>
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Add a test to verify that KVM_{G,S}ET_VCPU_EVENTS play nicely with pending
vs. injected exceptions when an exception is being queued for L2, and that
KVM correctly honors L1's exception intercepts.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../kvm/x86_64/nested_exceptions_test.c       | 307 ++++++++++++++++++
 3 files changed, 309 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/nested_exceptions_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 9b67343dc4ab..c8b8203ca867 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -23,6 +23,7 @@
 /x86_64/hyperv_features
 /x86_64/mmio_warning_test
 /x86_64/mmu_role_test
+/x86_64/nested_exceptions_test
 /x86_64/platform_info_test
 /x86_64/pmu_event_filter_test
 /x86_64/set_boot_cpu_id
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 04099f453b59..5679d1a79a83 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -56,6 +56,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/kvm_clock_test
 TEST_GEN_PROGS_x86_64 += x86_64/kvm_pv_test
 TEST_GEN_PROGS_x86_64 += x86_64/mmio_warning_test
 TEST_GEN_PROGS_x86_64 += x86_64/mmu_role_test
+TEST_GEN_PROGS_x86_64 += x86_64/nested_exceptions_test
 TEST_GEN_PROGS_x86_64 += x86_64/platform_info_test
 TEST_GEN_PROGS_x86_64 += x86_64/pmu_event_filter_test
 TEST_GEN_PROGS_x86_64 += x86_64/set_boot_cpu_id
diff --git a/tools/testing/selftests/kvm/x86_64/nested_exceptions_test.c b/tools/testing/selftests/kvm/x86_64/nested_exceptions_test.c
new file mode 100644
index 000000000000..1a2b2010e8f3
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/nested_exceptions_test.c
@@ -0,0 +1,307 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#define _GNU_SOURCE /* for program_invocation_short_name */
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+#include "vmx.h"
+#include "svm_util.h"
+
+#define VCPU_ID	0
+#define L2_GUEST_STACK_SIZE 256
+
+/*
+ * Arbitrary, never shoved into KVM/hardware, just need to avoid conflict with
+ * the "real" exceptions used, #SS/#GP/#DF (12/13/8).
+ */
+#define FAKE_TRIPLE_FAULT_VECTOR	0xaa
+
+/* Arbitrary 32-bit error code injected by this test. */
+#define SS_ERROR_CODE 0xdeadbeef
+
+/*
+ * Bit '0' is set on Intel if the exception occurs while delivering a previous
+ * event/exception.  AMD's wording is ambiguous, but presumably the bit is set
+ * if the exception occurs while delivering an external event, e.g. NMI or INTR,
+ * but not for exceptions that occur when delivering other exceptions or
+ * software interrupts.
+ *
+ * Note, Intel's name for it, "External event", is misleading and much more
+ * aligned with AMD's behavior, but the SDM is quite clear on its behavior.
+ */
+#define ERROR_CODE_EXT_FLAG	BIT(0)
+
+/*
+ * Bit '1' is set if the fault occurred when looking up a descriptor in the
+ * IDT, which is the case here as the IDT is empty/NULL.
+ */
+#define ERROR_CODE_IDT_FLAG	BIT(1)
+
+/*
+ * The #GP that occurs when vectoring #SS should show the index into the IDT
+ * for #SS, plus have the "IDT flag" set.
+ */
+#define GP_ERROR_CODE_AMD ((SS_VECTOR * 8) | ERROR_CODE_IDT_FLAG)
+#define GP_ERROR_CODE_INTEL ((SS_VECTOR * 8) | ERROR_CODE_IDT_FLAG | ERROR_CODE_EXT_FLAG)
+
+/*
+ * Intel and AMD both shove '0' into the error code on #DF, regardless of what
+ * led to the double fault.
+ */
+#define DF_ERROR_CODE 0
+
+#define INTERCEPT_SS		(BIT_ULL(SS_VECTOR))
+#define INTERCEPT_SS_DF		(INTERCEPT_SS | BIT_ULL(DF_VECTOR))
+#define INTERCEPT_SS_GP_DF	(INTERCEPT_SS_DF | BIT_ULL(GP_VECTOR))
+
+static void l2_ss_pending_test(void)
+{
+	GUEST_SYNC(SS_VECTOR);
+}
+
+static void l2_ss_injected_gp_test(void)
+{
+	GUEST_SYNC(GP_VECTOR);
+}
+
+static void l2_ss_injected_df_test(void)
+{
+	GUEST_SYNC(DF_VECTOR);
+}
+
+static void l2_ss_injected_tf_test(void)
+{
+	GUEST_SYNC(FAKE_TRIPLE_FAULT_VECTOR);
+}
+
+static void svm_run_l2(struct svm_test_data *svm, void *l2_code, int vector,
+		       uint32_t error_code)
+{
+	struct vmcb *vmcb = svm->vmcb;
+	struct vmcb_control_area *ctrl = &vmcb->control;
+
+	vmcb->save.rip = (u64)l2_code;
+	run_guest(vmcb, svm->vmcb_gpa);
+
+	if (vector == FAKE_TRIPLE_FAULT_VECTOR)
+		return;
+
+	GUEST_ASSERT_EQ(ctrl->exit_code, (SVM_EXIT_EXCP_BASE + vector));
+	GUEST_ASSERT_EQ(ctrl->exit_info_1, error_code);
+}
+
+static void l1_svm_code(struct svm_test_data *svm)
+{
+	struct vmcb_control_area *ctrl = &svm->vmcb->control;
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+
+	generic_svm_setup(svm, NULL, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+	svm->vmcb->save.idtr.limit = 0;
+	ctrl->intercept |= BIT_ULL(INTERCEPT_SHUTDOWN);
+
+	ctrl->intercept_exceptions = INTERCEPT_SS_GP_DF;
+	svm_run_l2(svm, l2_ss_pending_test, SS_VECTOR, SS_ERROR_CODE);
+	svm_run_l2(svm, l2_ss_injected_gp_test, GP_VECTOR, GP_ERROR_CODE_AMD);
+
+	ctrl->intercept_exceptions = INTERCEPT_SS_DF;
+	svm_run_l2(svm, l2_ss_injected_df_test, DF_VECTOR, DF_ERROR_CODE);
+
+	ctrl->intercept_exceptions = INTERCEPT_SS;
+	svm_run_l2(svm, l2_ss_injected_tf_test, FAKE_TRIPLE_FAULT_VECTOR, 0);
+	GUEST_ASSERT_EQ(ctrl->exit_code, SVM_EXIT_SHUTDOWN);
+
+	GUEST_DONE();
+}
+
+static void vmx_run_l2(void *l2_code, int vector, uint32_t error_code)
+{
+	GUEST_ASSERT(!vmwrite(GUEST_RIP, (u64)l2_code));
+
+	GUEST_ASSERT_EQ(vector == SS_VECTOR ? vmlaunch() : vmresume(), 0);
+
+	if (vector == FAKE_TRIPLE_FAULT_VECTOR)
+		return;
+
+	GUEST_ASSERT_EQ(vmreadz(VM_EXIT_REASON), EXIT_REASON_EXCEPTION_NMI);
+	GUEST_ASSERT_EQ((vmreadz(VM_EXIT_INTR_INFO) & 0xff), vector);
+	GUEST_ASSERT_EQ(vmreadz(VM_EXIT_INTR_ERROR_CODE), error_code);
+}
+
+static void l1_vmx_code(struct vmx_pages *vmx)
+{
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+
+	GUEST_ASSERT_EQ(prepare_for_vmx_operation(vmx), true);
+
+	GUEST_ASSERT_EQ(load_vmcs(vmx), true);
+
+	prepare_vmcs(vmx, NULL, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+	GUEST_ASSERT_EQ(vmwrite(GUEST_IDTR_LIMIT, 0), 0);
+
+	/*
+	 * VMX disallows injecting an exception with error_code[31:16] != 0,
+	 * and hardware will never generate a VM-Exit with bits 31:16 set.
+	 * KVM should likewise truncate the "bad" userspace value.
+	 */
+	GUEST_ASSERT_EQ(vmwrite(EXCEPTION_BITMAP, INTERCEPT_SS_GP_DF), 0);
+	vmx_run_l2(l2_ss_pending_test, SS_VECTOR, (u16)SS_ERROR_CODE);
+	vmx_run_l2(l2_ss_injected_gp_test, GP_VECTOR, GP_ERROR_CODE_INTEL);
+
+	GUEST_ASSERT_EQ(vmwrite(EXCEPTION_BITMAP, INTERCEPT_SS_DF), 0);
+	vmx_run_l2(l2_ss_injected_df_test, DF_VECTOR, DF_ERROR_CODE);
+
+	GUEST_ASSERT_EQ(vmwrite(EXCEPTION_BITMAP, INTERCEPT_SS), 0);
+	vmx_run_l2(l2_ss_injected_tf_test, FAKE_TRIPLE_FAULT_VECTOR, 0);
+	GUEST_ASSERT_EQ(vmreadz(VM_EXIT_REASON), EXIT_REASON_TRIPLE_FAULT);
+
+	GUEST_DONE();
+}
+
+static void __attribute__((__flatten__)) l1_guest_code(void *test_data)
+{
+	if (cpu_has_svm())
+		l1_svm_code(test_data);
+	else
+		l1_vmx_code(test_data);
+}
+
+static void assert_ucall_vector(struct kvm_vm *vm, int vector)
+{
+	struct kvm_run *run = vcpu_state(vm, VCPU_ID);
+	struct ucall uc;
+
+	TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+		    "Unexpected exit reason: %u (%s),\n",
+		    run->exit_reason, exit_reason_str(run->exit_reason));
+
+	switch (get_ucall(vm, VCPU_ID, &uc)) {
+	case UCALL_SYNC:
+		TEST_ASSERT(vector == uc.args[1],
+			    "Expected L2 to ask for %d, got %ld", vector, uc.args[1]);
+		break;
+	case UCALL_DONE:
+		TEST_ASSERT(vector == -1,
+			    "Expected L2 to ask for %d, L2 says it's done", vector);
+		break;
+	case UCALL_ABORT:
+		TEST_FAIL("%s at %s:%ld (0x%lx != 0x%lx)",
+			  (const char *)uc.args[0], __FILE__, uc.args[1],
+			  uc.args[2], uc.args[3]);
+		break;
+	default:
+		TEST_FAIL("Expected L2 to ask for %d, got unexpected ucall %lu", vector, uc.cmd);
+	}
+}
+
+static void queue_ss_exception(struct kvm_vm *vm, bool inject)
+{
+	struct kvm_vcpu_events events;
+
+	vcpu_events_get(vm, VCPU_ID, &events);
+
+	TEST_ASSERT(!events.exception.pending,
+		    "Vector %d unexpectedly pending", events.exception.nr);
+	TEST_ASSERT(!events.exception.injected,
+		    "Vector %d unexpectedly injected", events.exception.nr);
+
+	events.flags = KVM_VCPUEVENT_VALID_PAYLOAD;
+	events.exception.pending = !inject;
+	events.exception.injected = inject;
+	events.exception.nr = SS_VECTOR;
+	events.exception.has_error_code = true;
+	events.exception.error_code = SS_ERROR_CODE;
+	vcpu_events_set(vm, VCPU_ID, &events);
+}
+
+/*
+ * Verify KVM_{G,S}ET_VCPU_EVENTS play nice with pending vs. injected exceptions
+ * when an exception is being queued for L2.  Specifically, verify that KVM
+ * honors L1 exception intercept controls when a #SS is pending/injected,
+ * triggers a #GP on vectoring the #SS, morphs to #DF if #GP isn't intercepted
+ * by L1, and finally causes (nested) SHUTDOWN if #DF isn't intercepted by L1.
+ */
+int main(int argc, char *argv[])
+{
+	struct kvm_enable_cap cap_exception_payload = {
+		.cap = KVM_CAP_EXCEPTION_PAYLOAD,
+		.args[0] = -2ul,
+	};
+	vm_vaddr_t nested_test_data_gva;
+	struct kvm_vcpu_events events;
+	struct kvm_run *run;
+	struct kvm_vm *vm;
+
+	if (!kvm_check_cap(KVM_CAP_EXCEPTION_PAYLOAD)) {
+		pr_info("KVM_CAP_EXCEPTION_PAYLOAD not supported, skipping\n");
+		exit(KSFT_SKIP);
+	}
+
+	vm = vm_create_default(VCPU_ID, 0, l1_guest_code);
+	vm_enable_cap(vm, &cap_exception_payload);
+
+	if (nested_svm_supported()) {
+		vcpu_alloc_svm(vm, &nested_test_data_gva);
+	} else if (nested_vmx_supported()) {
+		vcpu_alloc_vmx(vm, &nested_test_data_gva);
+	} else {
+		pr_info("Nested virtualization not supported, skipping\n");
+		exit(KSFT_SKIP);
+	}
+
+	vcpu_args_set(vm, VCPU_ID, 1, nested_test_data_gva);
+	run = vcpu_state(vm, VCPU_ID);
+
+	/* Run L1 => L2.  L2 should sync and request #SS. */
+	vcpu_run(vm, VCPU_ID);
+	assert_ucall_vector(vm, SS_VECTOR);
+
+	/* Pend #SS and request immediate exit.  #SS should still be pending. */
+	queue_ss_exception(vm, false);
+	run->immediate_exit = true;
+	vcpu_run_complete_io(vm, VCPU_ID);
+
+	/* Verify the pending event comes back out the same as it went in. */
+	vcpu_events_get(vm, VCPU_ID, &events);
+	ASSERT_EQ(events.flags & KVM_VCPUEVENT_VALID_PAYLOAD,
+		  KVM_VCPUEVENT_VALID_PAYLOAD);
+	ASSERT_EQ(events.exception.pending, true);
+	ASSERT_EQ(events.exception.nr, SS_VECTOR);
+	ASSERT_EQ(events.exception.has_error_code, true);
+	ASSERT_EQ(events.exception.error_code, SS_ERROR_CODE);
+
+	/*
+	 * Run for real with the pending #SS.  L1 should get a VM-Exit due to
+	 * #SS interception and re-enter L2 to request #GP (via injected #SS).
+	 */
+	run->immediate_exit = false;
+	vcpu_run(vm, VCPU_ID);
+	assert_ucall_vector(vm, GP_VECTOR);
+
+	/*
+	 * Inject #SS.  The #SS should bypass interception and cause #GP, which
+	 * L1 should intercept before KVM morphs it to #DF.  L1 should then
+	 * disable #GP interception and run L2 to request #DF (via #SS => #GP).
+	 */
+	queue_ss_exception(vm, true);
+	vcpu_run(vm, VCPU_ID);
+	assert_ucall_vector(vm, DF_VECTOR);
+
+	/*
+	 * Inject #SS.  The #SS should bypass interception and cause #GP, which
+	 * L1 is no longer intercepting, and so L1 should see a #DF VM-Exit.  L1
+	 * should then disable #DF interception and run L2 to request a triple fault.
+	 */
+	queue_ss_exception(vm, true);
+	vcpu_run(vm, VCPU_ID);
+	assert_ucall_vector(vm, FAKE_TRIPLE_FAULT_VECTOR);
+
+	/*
+	 * Inject #SS yet again.  L1 is not intercepting #GP or #DF, and so
+	 * should see nested TRIPLE_FAULT / SHUTDOWN and then signal it is done.
+	 */
+	queue_ss_exception(vm, true);
+	vcpu_run(vm, VCPU_ID);
+	assert_ucall_vector(vm, -1);
+
+	kvm_vm_free(vm);
+}
-- 
2.35.1.723.g4982287a31-goog