From nobody Mon Apr 27 09:13:37 2026
Reply-To: Sean Christopherson
Date: Tue, 14 Jun 2022 20:47:10 +0000
In-Reply-To: <20220614204730.3359543-1-seanjc@google.com>
Message-Id: <20220614204730.3359543-2-seanjc@google.com>
References: <20220614204730.3359543-1-seanjc@google.com>
Subject: [PATCH v2 01/21] KVM: nVMX: Unconditionally purge queued/injected events on nested "exit"
From: Sean Christopherson
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Oliver Upton, Peter Shier

Drop pending exceptions and events queued for re-injection when leaving
nested guest mode, even if the "exit" is due to VM-Fail, SMI, or forced
by host userspace.  Failure to purge events could result in an event
belonging to L2 being injected into L1.  This _should_ never happen for
VM-Fail as all events should be blocked by nested_run_pending, but it's
possible if KVM, not the L1 hypervisor, is the source of VM-Fail when
running vmcs02.
SMI is a nop (barring unknown bugs) as recognition of SMI and thus entry
to SMM is blocked by pending exceptions and re-injected events.

Forced exit is definitely buggy, but has likely gone unnoticed because
userspace probably follows the forced exit with KVM_SET_VCPU_EVENTS (or
some other ioctl() that purges the queue).

Fixes: 4f350c6dbcb9 ("kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson
Reviewed-by: Jim Mattson
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/vmx/nested.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 7d8cd0ebcc75..ee6f27dffdba 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4263,14 +4263,6 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 			nested_vmx_abort(vcpu, VMX_ABORT_SAVE_GUEST_MSR_FAIL);
 	}
-
-	/*
-	 * Drop what we picked up for L2 via vmx_complete_interrupts. It is
-	 * preserved above and would only end up incorrectly in L1.
-	 */
-	vcpu->arch.nmi_injected = false;
-	kvm_clear_exception_queue(vcpu);
-	kvm_clear_interrupt_queue(vcpu);
 }
 
 /*
@@ -4609,6 +4601,17 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 		WARN_ON_ONCE(nested_early_check);
 	}
 
+	/*
+	 * Drop events/exceptions that were queued for re-injection to L2
+	 * (picked up via vmx_complete_interrupts()), as well as exceptions
+	 * that were pending for L2.  Note, this must NOT be hoisted above
+	 * prepare_vmcs12(), events/exceptions queued for re-injection need to
+	 * be captured in vmcs12 (see vmcs12_save_pending_event()).
+	 */
+	vcpu->arch.nmi_injected = false;
+	kvm_clear_exception_queue(vcpu);
+	kvm_clear_interrupt_queue(vcpu);
+
 	vmx_switch_vmcs(vcpu, &vmx->vmcs01);
 
 	/* Update any VMCS fields that might have changed while L2 ran */
-- 
2.36.1.476.g0c4daa206d-goog

From nobody Mon Apr 27 09:13:37 2026
Reply-To: Sean Christopherson
Date: Tue, 14 Jun 2022 20:47:11 +0000
In-Reply-To: <20220614204730.3359543-1-seanjc@google.com>
Message-Id: <20220614204730.3359543-3-seanjc@google.com>
References: <20220614204730.3359543-1-seanjc@google.com>
Subject: [PATCH v2 02/21] KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS
From: Sean Christopherson
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Oliver Upton, Peter Shier

Deliberately truncate the exception error code when shoving it into the
VMCS (VM-Entry field for vmcs01 and vmcs02, VM-Exit field for vmcs12).
Intel CPUs are incapable of handling 32-bit error codes and will never
generate an error code with bits 31:16 set, but userspace can provide an
arbitrary error code via KVM_SET_VCPU_EVENTS.  Failure to drop the bits
on exception injection results in failed VM-Entry, as VMX disallows
setting bits 31:16.  Setting the bits on VM-Exit would at best confuse
L1, and at worst induce a nested VM-Entry failure, e.g. if L1 decided to
reinject the exception back into L2.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson
Reviewed-by: Jim Mattson
---
 arch/x86/kvm/vmx/nested.c |  9 ++++++++-
 arch/x86/kvm/vmx/vmx.c    | 11 ++++++++++-
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index ee6f27dffdba..33ffc8bcf9cd 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3833,7 +3833,14 @@ static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
 	u32 intr_info = nr | INTR_INFO_VALID_MASK;
 
 	if (vcpu->arch.exception.has_error_code) {
-		vmcs12->vm_exit_intr_error_code = vcpu->arch.exception.error_code;
+		/*
+		 * Intel CPUs will never generate an error code with bits 31:16
+		 * set, and more importantly VMX disallows setting bits 31:16
+		 * in the injected error code for VM-Entry.  Drop the bits to
+		 * mimic hardware and avoid inducing failure on nested VM-Entry
+		 * if L1 chooses to inject the exception back to L2.
+		 */
+		vmcs12->vm_exit_intr_error_code = (u16)vcpu->arch.exception.error_code;
 		intr_info |= INTR_INFO_DELIVER_CODE_MASK;
 	}
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5e14e4c40007..ec98992024e2 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1621,7 +1621,16 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
 	kvm_deliver_exception_payload(vcpu);
 
 	if (has_error_code) {
-		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, error_code);
+		/*
+		 * Despite the error code being architecturally defined as 32
+		 * bits, and the VMCS field being 32 bits, Intel CPUs and thus
+		 * VMX don't actually support setting bits 31:16.  Hardware
+		 * will (should) never provide a bogus error code, but KVM's
+		 * ABI lets userspace shove in arbitrary 32-bit values.  Drop
+		 * the upper bits to avoid VM-Fail; losing information that
+		 * doesn't really exist is preferable to killing the VM.
+		 */
+		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, (u16)error_code);
 		intr_info |= INTR_INFO_DELIVER_CODE_MASK;
 	}
 
-- 
2.36.1.476.g0c4daa206d-goog

From nobody Mon Apr 27 09:13:37 2026
Reply-To: Sean Christopherson
Date: Tue, 14 Jun 2022 20:47:12 +0000
In-Reply-To: <20220614204730.3359543-1-seanjc@google.com>
Message-Id: <20220614204730.3359543-4-seanjc@google.com>
References: <20220614204730.3359543-1-seanjc@google.com>
Subject: [PATCH v2 03/21] KVM: x86: Don't check for code breakpoints when emulating on exception
From: Sean Christopherson
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Oliver Upton, Peter Shier

Don't check for code breakpoints during instruction emulation if the
emulation was triggered by exception interception.  Code breakpoints are
the highest priority fault-like exception, and KVM only emulates on
exceptions that are fault-like.  Thus, if hardware signaled a different
exception, then the vCPU has already passed the stage of checking for
hardware breakpoints.

This is likely a glorified nop in terms of functionality, and is more
for clarification, though it is technically an optimization.

Intel's SDM explicitly states vmcs.GUEST_RFLAGS.RF on exception
interception is the same as the value that would have been saved on the
stack had the exception not been intercepted, i.e. will be '1' due to
all fault-like exceptions setting RF to '1'.  AMD says "guest state
saved ... is the processor state as of the moment the intercept
triggers", but that begs the question, "when does the intercept
trigger?".
Signed-off-by: Sean Christopherson
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/x86.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2318a99139fa..c5db31b4bd6f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8364,8 +8364,24 @@ int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_skip_emulated_instruction);
 
-static bool kvm_vcpu_check_code_breakpoint(struct kvm_vcpu *vcpu, int *r)
+static bool kvm_vcpu_check_code_breakpoint(struct kvm_vcpu *vcpu,
+					   int emulation_type, int *r)
 {
+	WARN_ON_ONCE(emulation_type & EMULTYPE_NO_DECODE);
+
+	/*
+	 * Do not check for code breakpoints if hardware has already done the
+	 * checks, as inferred from the emulation type.  On NO_DECODE and SKIP,
+	 * the instruction has passed all exception checks, and all intercepted
+	 * exceptions that trigger emulation have lower priority than code
+	 * breakpoints, i.e. the fact that the intercepted exception occurred
+	 * means any code breakpoints have already been serviced.
+	 */
+	if (emulation_type & (EMULTYPE_NO_DECODE | EMULTYPE_SKIP |
+			      EMULTYPE_TRAP_UD | EMULTYPE_TRAP_UD_FORCED |
+			      EMULTYPE_VMWARE_GP | EMULTYPE_PF))
+		return false;
+
 	if (unlikely(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP) &&
 	    (vcpu->arch.guest_debug_dr7 & DR7_BP_EN_MASK)) {
 		struct kvm_run *kvm_run = vcpu->run;
@@ -8487,8 +8503,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	 * are fault-like and are higher priority than any faults on
 	 * the code fetch itself.
	 */
-	if (!(emulation_type & EMULTYPE_SKIP) &&
-	    kvm_vcpu_check_code_breakpoint(vcpu, &r))
+	if (kvm_vcpu_check_code_breakpoint(vcpu, emulation_type, &r))
 		return r;
 
 	r = x86_decode_emulated_instruction(vcpu, emulation_type,
-- 
2.36.1.476.g0c4daa206d-goog

From nobody Mon Apr 27 09:13:37 2026
Reply-To: Sean Christopherson
Date: Tue, 14 Jun 2022 20:47:13 +0000
In-Reply-To: <20220614204730.3359543-1-seanjc@google.com>
Message-Id: <20220614204730.3359543-5-seanjc@google.com>
References: <20220614204730.3359543-1-seanjc@google.com>
Subject: [PATCH v2 04/21] KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like
From: Sean Christopherson
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Oliver Upton, Peter Shier

Exclude General Detect #DBs, which have fault-like behavior but also
have a non-zero payload (DR6.BD=1), from nVMX's handling of pending
debug traps.  Opportunistically rewrite the comment to better document
what is being checked, i.e. "has a non-zero payload" vs.
"has a payload", and to call out the many caveats surrounding #DBs that
KVM dodges one way or another.

Cc: Oliver Upton
Cc: Peter Shier
Fixes: 684c0422da71 ("KVM: nVMX: Handle pending #DB when injecting INIT VM-exit")
Signed-off-by: Sean Christopherson
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/vmx/nested.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 33ffc8bcf9cd..61bc80fc4cfa 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3857,16 +3857,29 @@ static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
 }
 
 /*
- * Returns true if a debug trap is pending delivery.
+ * Returns true if a debug trap is (likely) pending delivery.  Infer the class
+ * of a #DB (trap-like vs. fault-like) from the exception payload (to-be-DR6).
+ * Using the payload is flawed because code breakpoints (fault-like) and data
+ * breakpoints (trap-like) set the same bits in DR6 (breakpoint detected), i.e.
+ * this will return false positives if a to-be-injected code breakpoint #DB is
+ * pending (from KVM's perspective, but not "pending" across an instruction
+ * boundary).  ICEBP, a.k.a. INT1, is also not reflected here even though it
+ * too is trap-like.
 *
- * In KVM, debug traps bear an exception payload. As such, the class of a #DB
- * exception may be inferred from the presence of an exception payload.
+ * KVM "works" despite these flaws as ICEBP isn't currently supported by the
+ * emulator, Monitor Trap Flag is not marked pending on intercepted #DBs (the
+ * #DB has already happened), and MTF isn't marked pending on code breakpoints
+ * from the emulator (because such #DBs are fault-like and thus don't trigger
+ * actions that fire on instruction retire).
 */
-static inline bool vmx_pending_dbg_trap(struct kvm_vcpu *vcpu)
+static inline unsigned long vmx_get_pending_dbg_trap(struct kvm_vcpu *vcpu)
 {
-	return vcpu->arch.exception.pending &&
-	       vcpu->arch.exception.nr == DB_VECTOR &&
-	       vcpu->arch.exception.payload;
+	if (!vcpu->arch.exception.pending ||
+	    vcpu->arch.exception.nr != DB_VECTOR)
+		return 0;
+
+	/* General Detect #DBs are always fault-like. */
+	return vcpu->arch.exception.payload & ~DR6_BD;
 }
 
 /*
@@ -3878,9 +3891,10 @@ static inline bool vmx_pending_dbg_trap(struct kvm_vcpu *vcpu)
 */
 static void nested_vmx_update_pending_dbg(struct kvm_vcpu *vcpu)
 {
-	if (vmx_pending_dbg_trap(vcpu))
-		vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
-			    vcpu->arch.exception.payload);
+	unsigned long pending_dbg = vmx_get_pending_dbg_trap(vcpu);
+
+	if (pending_dbg)
+		vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, pending_dbg);
 }
 
 static bool nested_vmx_preemption_timer_pending(struct kvm_vcpu *vcpu)
@@ -3937,7 +3951,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 	 * while delivering the pending exception.
	 */
 
-	if (vcpu->arch.exception.pending && !vmx_pending_dbg_trap(vcpu)) {
+	if (vcpu->arch.exception.pending && !vmx_get_pending_dbg_trap(vcpu)) {
 		if (vmx->nested.nested_run_pending)
 			return -EBUSY;
 		if (!nested_vmx_check_exception(vcpu, &exit_qual))
-- 
2.36.1.476.g0c4daa206d-goog

From nobody Mon Apr 27 09:13:37 2026
Reply-To: Sean Christopherson
Date: Tue, 14 Jun 2022 20:47:14 +0000
In-Reply-To: <20220614204730.3359543-1-seanjc@google.com>
Message-Id: <20220614204730.3359543-6-seanjc@google.com>
References: <20220614204730.3359543-1-seanjc@google.com>
Subject: [PATCH v2 05/21] KVM: nVMX: Prioritize TSS T-flag #DBs over Monitor Trap Flag
From: Sean Christopherson
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Oliver Upton, Peter Shier

Service TSS T-flag #DBs prior to pending MTFs, as such #DBs are higher
priority than MTF.
KVM itself doesn't emulate TSS #DBs, and any such exceptions injected
from L1 will be handled by hardware (or morphed to a fault-like
exception if injection fails), but theoretically userspace could pend a
TSS T-flag #DB in conjunction with a pending MTF.

Note, there's no known use case this fixes, it's purely to be
technically correct with respect to Intel's SDM.

Cc: Oliver Upton
Cc: Peter Shier
Fixes: 5ef8acbdd687 ("KVM: nVMX: Emulate MTF when performing instruction emulation")
Signed-off-by: Sean Christopherson
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/vmx/nested.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 61bc80fc4cfa..e794791a6bdd 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3943,15 +3943,17 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 	}
 
 	/*
-	 * Process any exceptions that are not debug traps before MTF.
+	 * Process exceptions that are higher priority than Monitor Trap Flag:
+	 * fault-like exceptions, TSS T flag #DB (not emulated by KVM, but
+	 * could theoretically come in from userspace), and ICEBP (INT1).
 	 *
 	 * Note that only a pending nested run can block a pending exception.
 	 * Otherwise an injected NMI/interrupt should either be
 	 * lost or delivered to the nested hypervisor in the IDT_VECTORING_INFO,
 	 * while delivering the pending exception.
	 */
-
-	if (vcpu->arch.exception.pending && !vmx_get_pending_dbg_trap(vcpu)) {
+	if (vcpu->arch.exception.pending &&
+	    !(vmx_get_pending_dbg_trap(vcpu) & ~DR6_BT)) {
 		if (vmx->nested.nested_run_pending)
 			return -EBUSY;
 		if (!nested_vmx_check_exception(vcpu, &exit_qual))
-- 
2.36.1.476.g0c4daa206d-goog

From nobody Mon Apr 27 09:13:37 2026
Reply-To: Sean Christopherson
Date: Tue, 14 Jun 2022 20:47:15 +0000
In-Reply-To: <20220614204730.3359543-1-seanjc@google.com>
Message-Id: <20220614204730.3359543-7-seanjc@google.com>
References: <20220614204730.3359543-1-seanjc@google.com>
Subject: [PATCH v2 06/21] KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)
From: Sean Christopherson
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Oliver Upton, Peter Shier

Add a dedicated "exception type" for #DBs, as #DBs can be fault-like or
trap-like depending on the sub-type of #DB, and effectively defer the
decision of what to do with the #DB to the caller.
For the emulator's two calls to exception_type(), treat the #DB as
fault-like, as the emulator handles only code breakpoint and general
detect #DBs, both of which are fault-like.

For event injection, which uses exception_type() to determine whether to
set EFLAGS.RF=1 on the stack, keep the current behavior of not setting
RF=1 for #DBs.  Intel and AMD explicitly state RF isn't set on code #DBs,
so exempting them by failing the "== EXCPT_FAULT" check is correct.  The
only other fault-like #DB is General Detect, and despite Intel and AMD
both strongly implying (through omission) that General Detect #DBs should
set RF=1, hardware (multiple generations of both Intel and AMD) in fact
does not.  Through insider knowledge, extreme foresight, sheer dumb luck,
or some combination thereof, KVM correctly handled RF for General Detect
#DBs.

Fixes: 38827dbd3fb8 ("KVM: x86: Do not update EFLAGS on faulting emulation")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/x86.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c5db31b4bd6f..7c3ce601bdcc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -529,6 +529,7 @@ static int exception_class(int vector)
 #define EXCPT_TRAP		1
 #define EXCPT_ABORT		2
 #define EXCPT_INTERRUPT	3
+#define EXCPT_DB		4
 
 static int exception_type(int vector)
 {
@@ -539,8 +540,14 @@ static int exception_type(int vector)
 
 	mask = 1 << vector;
 
-	/* #DB is trap, as instruction watchpoints are handled elsewhere */
-	if (mask & ((1 << DB_VECTOR) | (1 << BP_VECTOR) | (1 << OF_VECTOR)))
+	/*
+	 * #DBs can be trap-like or fault-like, the caller must check other CPU
+	 * state, e.g. DR6, to determine whether a #DB is a trap or fault.
+	 */
+	if (mask & (1 << DB_VECTOR))
+		return EXCPT_DB;
+
+	if (mask & ((1 << BP_VECTOR) | (1 << OF_VECTOR)))
 		return EXCPT_TRAP;
 
 	if (mask & ((1 << DF_VECTOR) | (1 << MC_VECTOR)))
@@ -8632,6 +8639,12 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		unsigned long rflags = static_call(kvm_x86_get_rflags)(vcpu);
 		toggle_interruptibility(vcpu, ctxt->interruptibility);
 		vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
+
+		/*
+		 * Note, EXCPT_DB is assumed to be fault-like as the emulator
+		 * only supports code breakpoints and general detect #DB, both
+		 * of which are fault-like.
+		 */
 		if (!ctxt->have_exception ||
 		    exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
 			kvm_pmu_trigger_event(vcpu, PERF_COUNT_HW_INSTRUCTIONS);
@@ -9546,6 +9559,16 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 
 	/* try to inject new event if pending */
 	if (vcpu->arch.exception.pending) {
+		/*
+		 * Fault-class exceptions, except #DBs, set RF=1 in the RFLAGS
+		 * value pushed on the stack.  Trap-like exceptions and all #DBs
+		 * leave RF as-is (KVM follows Intel's behavior in this regard;
+		 * AMD states that code breakpoint #DBs explicitly clear RF=0).
+		 *
+		 * Note, most versions of Intel's SDM and AMD's APM incorrectly
+		 * describe the behavior of General Detect #DBs, which are
+		 * fault-like.  They do _not_ set RF, a la code breakpoints.
+		 */
 		if (exception_type(vcpu->arch.exception.nr) == EXCPT_FAULT)
 			__kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) |
 					     X86_EFLAGS_RF);
-- 
2.36.1.476.g0c4daa206d-goog

From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
    Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    Oliver Upton, Peter Shier
Date: Tue, 14 Jun 2022 20:47:16 +0000
Message-Id: <20220614204730.3359543-8-seanjc@google.com>
In-Reply-To: <20220614204730.3359543-1-seanjc@google.com>
Subject: [PATCH v2 07/21] KVM: x86: Use DR7_GD macro instead of open coding check in emulator

Use DR7_GD in the emulator instead of open coding the check, and drop a
comically wrong comment.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/emulate.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 39ea9138224c..bf499716d9d3 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4182,8 +4182,7 @@ static int check_dr7_gd(struct x86_emulate_ctxt *ctxt)
 
 	ctxt->ops->get_dr(ctxt, 7, &dr7);
 
-	/* Check if DR7.Global_Enable is set */
-	return dr7 & (1 << 13);
+	return dr7 & DR7_GD;
 }
 
 static int check_dr_read(struct x86_emulate_ctxt *ctxt)
-- 
2.36.1.476.g0c4daa206d-goog

From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
    Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    Oliver Upton, Peter Shier
Date: Tue, 14 Jun 2022 20:47:17 +0000
Message-Id: <20220614204730.3359543-9-seanjc@google.com>
In-Reply-To: <20220614204730.3359543-1-seanjc@google.com>
Subject: [PATCH v2 08/21] KVM: nVMX: Ignore SIPI that arrives in L2 when vCPU is not in WFS

Fall through to handling other pending exceptions/events for L2 if a SIPI
is pending while the vCPU is not in Wait-for-SIPI.  KVM correctly ignores
the event, but incorrectly returns immediately, e.g. a SIPI coincident
with another event could lead to KVM incorrectly routing the event to L1
instead of L2.

Fixes: bf0cd88ce363 ("KVM: x86: emulate wait-for-SIPI and SIPI-VMExit")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/vmx/nested.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index e794791a6bdd..d080bfca16ef 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3936,10 +3936,12 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 			return -EBUSY;
 
 		clear_bit(KVM_APIC_SIPI, &apic->pending_events);
-		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED)
+		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
 			nested_vmx_vmexit(vcpu, EXIT_REASON_SIPI_SIGNAL, 0,
 					  apic->sipi_vector & 0xFFUL);
-		return 0;
+			return 0;
+		}
+		/* Fallthrough, the SIPI is completely ignored. */
 	}
 
 	/*
-- 
2.36.1.476.g0c4daa206d-goog

From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
    Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    Oliver Upton, Peter Shier
Date: Tue, 14 Jun 2022 20:47:18 +0000
Message-Id: <20220614204730.3359543-10-seanjc@google.com>
In-Reply-To: <20220614204730.3359543-1-seanjc@google.com>
Subject: [PATCH v2 09/21] KVM: nVMX: Unconditionally clear mtf_pending on nested VM-Exit

Clear mtf_pending on nested VM-Exit instead of handling the clear on a
case-by-case basis in vmx_check_nested_events().  The pending MTF should
never survive nested VM-Exit, as it is a property of KVM's run of the
current L2, i.e. should never affect the next L2 run by L1.
In practice, this is likely a nop, as getting to L1 with nested_run_pending
is impossible, and KVM doesn't correctly handle morphing a pending
exception that occurs on a prior injected exception (the need to re-inject
the exception being the other case where MTF isn't cleared).  However, KVM
will hopefully soon correctly deal with a pending exception on top of an
injected exception.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/vmx/nested.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index d080bfca16ef..7b644513c82b 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3909,16 +3909,8 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 	unsigned long exit_qual;
 	bool block_nested_events = vmx->nested.nested_run_pending ||
 				   kvm_event_needs_reinjection(vcpu);
-	bool mtf_pending = vmx->nested.mtf_pending;
 	struct kvm_lapic *apic = vcpu->arch.apic;
 
-	/*
-	 * Clear the MTF state. If a higher priority VM-exit is delivered first,
-	 * this state is discarded.
-	 */
-	if (!block_nested_events)
-		vmx->nested.mtf_pending = false;
-
 	if (lapic_in_kernel(vcpu) &&
 	    test_bit(KVM_APIC_INIT, &apic->pending_events)) {
 		if (block_nested_events)
@@ -3927,6 +3919,9 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 		clear_bit(KVM_APIC_INIT, &apic->pending_events);
 		if (vcpu->arch.mp_state != KVM_MP_STATE_INIT_RECEIVED)
 			nested_vmx_vmexit(vcpu, EXIT_REASON_INIT_SIGNAL, 0, 0);
+
+		/* MTF is discarded if the vCPU is in WFS. */
+		vmx->nested.mtf_pending = false;
 		return 0;
 	}
 
@@ -3964,7 +3959,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 		return 0;
 	}
 
-	if (mtf_pending) {
+	if (vmx->nested.mtf_pending) {
 		if (block_nested_events)
 			return -EBUSY;
 		nested_vmx_update_pending_dbg(vcpu);
@@ -4562,6 +4557,9 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
 
+	/* Pending MTF traps are discarded on VM-Exit. */
+	vmx->nested.mtf_pending = false;
+
 	/* trying to cancel vmlaunch/vmresume is a bug */
 	WARN_ON_ONCE(vmx->nested.nested_run_pending);
 
-- 
2.36.1.476.g0c4daa206d-goog

From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
    Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    Oliver Upton, Peter Shier
Date: Tue, 14 Jun 2022 20:47:19 +0000
Message-Id: <20220614204730.3359543-11-seanjc@google.com>
In-Reply-To: <20220614204730.3359543-1-seanjc@google.com>
Subject: [PATCH v2 10/21] KVM: VMX: Inject #PF on ENCLS as "emulated" #PF

Treat #PFs that occur during emulation of ENCLS as, wait for it, emulated
page faults.  Practically speaking, this is a glorified nop as the
exception is never of the nested flavor, and it's extremely unlikely the
guest is relying on the side effect of an implicit INVLPG on the faulting
address.

Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/vmx/sgx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index 35e7ec91ae86..966cfa228f2a 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -129,7 +129,7 @@ static int sgx_inject_fault(struct kvm_vcpu *vcpu, gva_t gva, int trapnr)
 		ex.address = gva;
 		ex.error_code_valid = true;
 		ex.nested_page_fault = false;
-		kvm_inject_page_fault(vcpu, &ex);
+		kvm_inject_emulated_page_fault(vcpu, &ex);
 	} else {
 		kvm_inject_gp(vcpu, 0);
 	}
-- 
2.36.1.476.g0c4daa206d-goog

From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
    Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    Oliver Upton, Peter Shier
Date: Tue, 14 Jun 2022 20:47:20 +0000
Message-Id: <20220614204730.3359543-12-seanjc@google.com>
In-Reply-To: <20220614204730.3359543-1-seanjc@google.com>
Subject: [PATCH v2 11/21] KVM: x86: Rename kvm_x86_ops.queue_exception to inject_exception

Rename the kvm_x86_ops hook for exception injection to better reflect
reality, and to align with pretty much every other related function name
in KVM.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky
---
 arch/x86/include/asm/kvm-x86-ops.h | 2 +-
 arch/x86/include/asm/kvm_host.h    | 2 +-
 arch/x86/kvm/svm/svm.c             | 4 ++--
 arch/x86/kvm/vmx/vmx.c             | 4 ++--
 arch/x86/kvm/x86.c                 | 2 +-
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 6f2f1affbb78..a42e2d9b04fe 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -67,7 +67,7 @@ KVM_X86_OP(get_interrupt_shadow)
 KVM_X86_OP(patch_hypercall)
 KVM_X86_OP(inject_irq)
 KVM_X86_OP(inject_nmi)
-KVM_X86_OP(queue_exception)
+KVM_X86_OP(inject_exception)
 KVM_X86_OP(cancel_injection)
 KVM_X86_OP(interrupt_allowed)
 KVM_X86_OP(nmi_allowed)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7e98b2876380..16a7f91cdf75 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1505,7 +1505,7 @@ struct kvm_x86_ops {
 				 unsigned char *hypercall_addr);
 	void (*inject_irq)(struct kvm_vcpu *vcpu, bool reinjected);
 	void (*inject_nmi)(struct kvm_vcpu *vcpu);
-	void (*queue_exception)(struct kvm_vcpu *vcpu);
+	void (*inject_exception)(struct kvm_vcpu *vcpu);
 	void (*cancel_injection)(struct kvm_vcpu *vcpu);
 	int (*interrupt_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
 	int (*nmi_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c6cca0ce127b..ca39f76ca44b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -430,7 +430,7 @@ static int svm_update_soft_interrupt_rip(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-static void svm_queue_exception(struct kvm_vcpu *vcpu)
+static void svm_inject_exception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	unsigned nr = vcpu->arch.exception.nr;
@@ -4761,7 +4761,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.patch_hypercall = svm_patch_hypercall,
 	.inject_irq = svm_inject_irq,
 	.inject_nmi = svm_inject_nmi,
-	.queue_exception = svm_queue_exception,
+	.inject_exception = svm_inject_exception,
 	.cancel_injection = svm_cancel_injection,
 	.interrupt_allowed = svm_interrupt_allowed,
 	.nmi_allowed = svm_nmi_allowed,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ec98992024e2..26b863c78a9f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1610,7 +1610,7 @@ static void vmx_clear_hlt(struct kvm_vcpu *vcpu)
 		vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
 }
 
-static void vmx_queue_exception(struct kvm_vcpu *vcpu)
+static void vmx_inject_exception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	unsigned nr = vcpu->arch.exception.nr;
@@ -7993,7 +7993,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.patch_hypercall = vmx_patch_hypercall,
 	.inject_irq = vmx_inject_irq,
 	.inject_nmi = vmx_inject_nmi,
-	.queue_exception = vmx_queue_exception,
+	.inject_exception = vmx_inject_exception,
 	.cancel_injection = vmx_cancel_injection,
 	.interrupt_allowed = vmx_interrupt_allowed,
 	.nmi_allowed = vmx_nmi_allowed,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7c3ce601bdcc..b63421d511c5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9504,7 +9504,7 @@ static void kvm_inject_exception(struct kvm_vcpu *vcpu)
 
 	if (vcpu->arch.exception.error_code && !is_protmode(vcpu))
 		vcpu->arch.exception.error_code = false;
-	static_call(kvm_x86_queue_exception)(vcpu);
+	static_call(kvm_x86_inject_exception)(vcpu);
 }
 
 static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
-- 
2.36.1.476.g0c4daa206d-goog
H3nCDvqK6XXtgd81j7FD5T66y1g/H9l/xvxSUlpUhvHU6CGEIPYdgKJ9BQHgTKbUk82m 28yg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:reply-to:date:in-reply-to:message-id :mime-version:references:subject:from:to:cc; bh=Rl4B4t3I4WEPVLC6AhWdnPN1Z8+TXE+l93DxSlE3FxA=; b=n/QQqPyBm1Nu6fCyMjTev3XPzHu3cWxGdV0VfqFuVRmtqgetU+sF63tPXwE9Qso/FU Ae3aak/nqEMCeccv7Gt60oJIDlJc51f9fg5+nnL5opVTs5mFag557jcfAO/sgf2l51jI 7/fhcFHTIuq46PAoVCke6NfjehI3CoWmS/6szq5fFqJRXv0GuOBU64Nzp45hHHCqrnGD H2hLqIKWzu0rvpgiB71cizKIrlqGlNNzDScrgfMyuro7pTN94zdvfEYeiIb4YuSMPP38 vkDfxwA2J4jGWaRSCdvftqpB0TJnxDolWveXRWgx++q/xHBiUA17y3EaJGNy3iBGbLIv owcg== X-Gm-Message-State: AOAM532e10VxVOwI3W6Ad51rk5xvBiiPYuPjCsj6g0ZBiYfwTdpiJ8pS l/vumwOl+jFIzPA4Je80GKteYuSGDRg= X-Google-Smtp-Source: ABdhPJzYPX+FAXhI2jiN3Zz0jr3hYKkcsfVuL9rN2OHtLJ4dTU3XG1iB9hanqDuvJ8fEo6KXRtH71s0Jbyc= X-Received: from seanjc.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:3e5]) (user=seanjc job=sendgmr) by 2002:a05:6a00:1902:b0:4f7:8813:b2cb with SMTP id y2-20020a056a00190200b004f78813b2cbmr6387325pfi.54.1655239678104; Tue, 14 Jun 2022 13:47:58 -0700 (PDT) Reply-To: Sean Christopherson Date: Tue, 14 Jun 2022 20:47:21 +0000 In-Reply-To: <20220614204730.3359543-1-seanjc@google.com> Message-Id: <20220614204730.3359543-13-seanjc@google.com> Mime-Version: 1.0 References: <20220614204730.3359543-1-seanjc@google.com> X-Mailer: git-send-email 2.36.1.476.g0c4daa206d-goog Subject: [PATCH v2 12/21] KVM: x86: Make kvm_queued_exception a properly named, visible struct From: Sean Christopherson To: Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Oliver Upton , Peter Shier Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Move the definition of "struct kvm_queued_exception" out of kvm_vcpu_arch in 
anticipation of adding a second instance in kvm_vcpu_arch to handle
exceptions that occur when vectoring an injected exception and are morphed
to VM-Exit instead of leading to #DF.

Opportunistically take advantage of the churn to rename "nr" to "vector".

No functional change intended.

Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm_host.h | 23 +++++-----
 arch/x86/kvm/svm/nested.c       | 45 ++++++++++---------
 arch/x86/kvm/svm/svm.c          | 14 +++---
 arch/x86/kvm/vmx/nested.c       | 42 +++++++++--------
 arch/x86/kvm/vmx/vmx.c          | 20 ++++-----
 arch/x86/kvm/x86.c              | 80 ++++++++++++++++-----------------
 arch/x86/kvm/x86.h              |  3 +-
 7 files changed, 111 insertions(+), 116 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 16a7f91cdf75..7f321d53a7e9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -640,6 +640,17 @@ struct kvm_vcpu_xen {
 	struct timer_list poll_timer;
 };
 
+struct kvm_queued_exception {
+	bool pending;
+	bool injected;
+	bool has_error_code;
+	u8 vector;
+	u32 error_code;
+	unsigned long payload;
+	bool has_payload;
+	u8 nested_apf;
+};
+
 struct kvm_vcpu_arch {
 	/*
 	 * rip and regs accesses must go through
@@ -739,16 +750,8 @@ struct kvm_vcpu_arch {
 
 	u8 event_exit_inst_len;
 
-	struct kvm_queued_exception {
-		bool pending;
-		bool injected;
-		bool has_error_code;
-		u8 nr;
-		u32 error_code;
-		unsigned long payload;
-		bool has_payload;
-		u8 nested_apf;
-	} exception;
+	/* Exceptions to be injected to the guest. */
+	struct kvm_queued_exception exception;
 
 	struct kvm_queued_interrupt {
 		bool injected;
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 83bae1f2eeb8..471d40e97890 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -462,7 +462,7 @@ static void nested_save_pending_event_to_vmcb12(struct vcpu_svm *svm,
 	unsigned int nr;
 
 	if (vcpu->arch.exception.injected) {
-		nr = vcpu->arch.exception.nr;
+		nr = vcpu->arch.exception.vector;
 		exit_int_info = nr | SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_EXEPT;
 
 		if (vcpu->arch.exception.has_error_code) {
@@ -1299,42 +1299,43 @@ int nested_svm_check_permissions(struct kvm_vcpu *vcpu)
 
 static bool nested_exit_on_exception(struct vcpu_svm *svm)
 {
-	unsigned int nr = svm->vcpu.arch.exception.nr;
+	unsigned int vector = svm->vcpu.arch.exception.vector;
 
-	return (svm->nested.ctl.intercepts[INTERCEPT_EXCEPTION] & BIT(nr));
+	return (svm->nested.ctl.intercepts[INTERCEPT_EXCEPTION] & BIT(vector));
 }
 
-static void nested_svm_inject_exception_vmexit(struct vcpu_svm *svm)
+static void nested_svm_inject_exception_vmexit(struct kvm_vcpu *vcpu)
 {
-	unsigned int nr = svm->vcpu.arch.exception.nr;
+	struct kvm_queued_exception *ex = &vcpu->arch.exception;
+	struct vcpu_svm *svm = to_svm(vcpu);
 	struct vmcb *vmcb = svm->vmcb;
 
-	vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + nr;
+	vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + ex->vector;
 	vmcb->control.exit_code_hi = 0;
 
-	if (svm->vcpu.arch.exception.has_error_code)
-		vmcb->control.exit_info_1 = svm->vcpu.arch.exception.error_code;
+	if (ex->has_error_code)
+		vmcb->control.exit_info_1 = ex->error_code;
 
 	/*
 	 * EXITINFO2 is undefined for all exception intercepts other
 	 * than #PF.
 	 */
-	if (nr == PF_VECTOR) {
-		if (svm->vcpu.arch.exception.nested_apf)
-			vmcb->control.exit_info_2 = svm->vcpu.arch.apf.nested_apf_token;
-		else if (svm->vcpu.arch.exception.has_payload)
-			vmcb->control.exit_info_2 = svm->vcpu.arch.exception.payload;
+	if (ex->vector == PF_VECTOR) {
+		if (ex->has_payload)
+			vmcb->control.exit_info_2 = ex->payload;
 		else
-			vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;
-	} else if (nr == DB_VECTOR) {
+			vmcb->control.exit_info_2 = vcpu->arch.cr2;
+	} else if (ex->vector == DB_VECTOR) {
 		/* See inject_pending_event.  */
-		kvm_deliver_exception_payload(&svm->vcpu);
-		if (svm->vcpu.arch.dr7 & DR7_GD) {
-			svm->vcpu.arch.dr7 &= ~DR7_GD;
-			kvm_update_dr7(&svm->vcpu);
+		kvm_deliver_exception_payload(vcpu, ex);
+
+		if (vcpu->arch.dr7 & DR7_GD) {
+			vcpu->arch.dr7 &= ~DR7_GD;
+			kvm_update_dr7(vcpu);
 		}
-	} else
-		WARN_ON(svm->vcpu.arch.exception.has_payload);
+	} else {
+		WARN_ON(ex->has_payload);
+	}
 
 	nested_svm_vmexit(svm);
 }
@@ -1372,7 +1373,7 @@ static int svm_check_nested_events(struct kvm_vcpu *vcpu)
 			return -EBUSY;
 		if (!nested_exit_on_exception(svm))
 			return 0;
-		nested_svm_inject_exception_vmexit(svm);
+		nested_svm_inject_exception_vmexit(vcpu);
 		return 0;
 	}
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ca39f76ca44b..6b80046a014f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -432,22 +432,20 @@ static int svm_update_soft_interrupt_rip(struct kvm_vcpu *vcpu)
 
 static void svm_inject_exception(struct kvm_vcpu *vcpu)
 {
+	struct kvm_queued_exception *ex = &vcpu->arch.exception;
 	struct vcpu_svm *svm = to_svm(vcpu);
-	unsigned nr = vcpu->arch.exception.nr;
-	bool has_error_code = vcpu->arch.exception.has_error_code;
-	u32 error_code = vcpu->arch.exception.error_code;
 
-	kvm_deliver_exception_payload(vcpu);
+	kvm_deliver_exception_payload(vcpu, ex);
 
-	if (kvm_exception_is_soft(nr) &&
+	if (kvm_exception_is_soft(ex->vector) &&
 	    svm_update_soft_interrupt_rip(vcpu))
 		return;
 
-	svm->vmcb->control.event_inj = nr
+	svm->vmcb->control.event_inj = ex->vector
 				       | SVM_EVTINJ_VALID
-				       | (has_error_code ? SVM_EVTINJ_VALID_ERR : 0)
+				       | (ex->has_error_code ? SVM_EVTINJ_VALID_ERR : 0)
 				       | SVM_EVTINJ_TYPE_EXEPT;
-	svm->vmcb->control.event_inj_err = error_code;
+	svm->vmcb->control.event_inj_err = ex->error_code;
 }
 
 static void svm_init_erratum_383(void)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 7b644513c82b..fafdcbfeca1f 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -445,29 +445,27 @@ static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
  */
 static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned long *exit_qual)
 {
+	struct kvm_queued_exception *ex = &vcpu->arch.exception;
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-	unsigned int nr = vcpu->arch.exception.nr;
-	bool has_payload = vcpu->arch.exception.has_payload;
-	unsigned long payload = vcpu->arch.exception.payload;
 
-	if (nr == PF_VECTOR) {
-		if (vcpu->arch.exception.nested_apf) {
+	if (ex->vector == PF_VECTOR) {
+		if (ex->nested_apf) {
 			*exit_qual = vcpu->arch.apf.nested_apf_token;
 			return 1;
 		}
-		if (nested_vmx_is_page_fault_vmexit(vmcs12,
-						    vcpu->arch.exception.error_code)) {
-			*exit_qual = has_payload ? payload : vcpu->arch.cr2;
+		if (nested_vmx_is_page_fault_vmexit(vmcs12, ex->error_code)) {
+			*exit_qual = ex->has_payload ? ex->payload : vcpu->arch.cr2;
 			return 1;
 		}
-	} else if (vmcs12->exception_bitmap & (1u << nr)) {
-		if (nr == DB_VECTOR) {
-			if (!has_payload) {
-				payload = vcpu->arch.dr6;
-				payload &= ~DR6_BT;
-				payload ^= DR6_ACTIVE_LOW;
+	} else if (vmcs12->exception_bitmap & (1u << ex->vector)) {
+		if (ex->vector == DB_VECTOR) {
+			if (ex->has_payload) {
+				*exit_qual = ex->payload;
+			} else {
+				*exit_qual = vcpu->arch.dr6;
+				*exit_qual &= ~DR6_BT;
+				*exit_qual ^= DR6_ACTIVE_LOW;
 			}
-			*exit_qual = payload;
 		} else
 			*exit_qual = 0;
 		return 1;
@@ -3724,7 +3722,7 @@ static void vmcs12_save_pending_event(struct kvm_vcpu *vcpu,
 		     is_double_fault(exit_intr_info))) {
 		vmcs12->idt_vectoring_info_field = 0;
 	} else if (vcpu->arch.exception.injected) {
-		nr = vcpu->arch.exception.nr;
+		nr = vcpu->arch.exception.vector;
 		idt_vectoring = nr | VECTORING_INFO_VALID_MASK;
 
 		if (kvm_exception_is_soft(nr)) {
@@ -3828,11 +3826,11 @@ static int vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
 static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
 					       unsigned long exit_qual)
 {
+	struct kvm_queued_exception *ex = &vcpu->arch.exception;
+	u32 intr_info = ex->vector | INTR_INFO_VALID_MASK;
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-	unsigned int nr = vcpu->arch.exception.nr;
-	u32 intr_info = nr | INTR_INFO_VALID_MASK;
 
-	if (vcpu->arch.exception.has_error_code) {
+	if (ex->has_error_code) {
 		/*
 		 * Intel CPUs will never generate an error code with bits 31:16
 		 * set, and more importantly VMX disallows setting bits 31:16
@@ -3840,11 +3838,11 @@ static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
 		 * mimic hardware and avoid inducing failure on nested VM-Entry
 		 * if L1 chooses to inject the exception back to L2.
 		 */
-		vmcs12->vm_exit_intr_error_code = (u16)vcpu->arch.exception.error_code;
+		vmcs12->vm_exit_intr_error_code = (u16)ex->error_code;
 		intr_info |= INTR_INFO_DELIVER_CODE_MASK;
 	}
 
-	if (kvm_exception_is_soft(nr))
+	if (kvm_exception_is_soft(ex->vector))
 		intr_info |= INTR_TYPE_SOFT_EXCEPTION;
 	else
 		intr_info |= INTR_TYPE_HARD_EXCEPTION;
@@ -3875,7 +3873,7 @@ static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
 static inline unsigned long vmx_get_pending_dbg_trap(struct kvm_vcpu *vcpu)
 {
 	if (!vcpu->arch.exception.pending ||
-	    vcpu->arch.exception.nr != DB_VECTOR)
+	    vcpu->arch.exception.vector != DB_VECTOR)
 		return 0;
 
 	/* General Detect #DBs are always fault-like. */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 26b863c78a9f..7ef5659a1bbd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1585,7 +1585,7 @@ static void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu)
 	 */
 	if (nested_cpu_has_mtf(vmcs12) &&
 	    (!vcpu->arch.exception.pending ||
-	     vcpu->arch.exception.nr == DB_VECTOR))
+	     vcpu->arch.exception.vector == DB_VECTOR))
 		vmx->nested.mtf_pending = true;
 	else
 		vmx->nested.mtf_pending = false;
@@ -1612,15 +1612,13 @@ static void vmx_clear_hlt(struct kvm_vcpu *vcpu)
 
 static void vmx_inject_exception(struct kvm_vcpu *vcpu)
 {
+	struct kvm_queued_exception *ex = &vcpu->arch.exception;
+	u32 intr_info = ex->vector | INTR_INFO_VALID_MASK;
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	unsigned nr = vcpu->arch.exception.nr;
-	bool has_error_code = vcpu->arch.exception.has_error_code;
-	u32 error_code = vcpu->arch.exception.error_code;
-	u32 intr_info = nr | INTR_INFO_VALID_MASK;
 
-	kvm_deliver_exception_payload(vcpu);
+	kvm_deliver_exception_payload(vcpu, ex);
 
-	if (has_error_code) {
+	if (ex->has_error_code) {
 		/*
 		 * Despite the error code being architecturally defined as 32
 		 * bits, and the VMCS field being 32 bits, Intel CPUs and thus
@@ -1630,21 +1628,21 @@ static void vmx_inject_exception(struct kvm_vcpu *vcpu)
 		 * the upper bits to avoid VM-Fail, losing information that
 		 * doesn't really exist is preferable to killing the VM.
 		 */
-		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, (u16)error_code);
+		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, (u16)ex->error_code);
 		intr_info |= INTR_INFO_DELIVER_CODE_MASK;
 	}
 
 	if (vmx->rmode.vm86_active) {
 		int inc_eip = 0;
-		if (kvm_exception_is_soft(nr))
+		if (kvm_exception_is_soft(ex->vector))
 			inc_eip = vcpu->arch.event_exit_inst_len;
-		kvm_inject_realmode_interrupt(vcpu, nr, inc_eip);
+		kvm_inject_realmode_interrupt(vcpu, ex->vector, inc_eip);
 		return;
 	}
 
 	WARN_ON_ONCE(vmx->emulation_required);
 
-	if (kvm_exception_is_soft(nr)) {
+	if (kvm_exception_is_soft(ex->vector)) {
 		vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
 			     vmx->vcpu.arch.event_exit_inst_len);
 		intr_info |= INTR_TYPE_SOFT_EXCEPTION;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b63421d511c5..511c0c8af80e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -557,16 +557,13 @@ static int exception_type(int vector)
 	return EXCPT_FAULT;
 }
 
-void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu)
+void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu,
+				   struct kvm_queued_exception *ex)
 {
-	unsigned nr = vcpu->arch.exception.nr;
-	bool has_payload = vcpu->arch.exception.has_payload;
-	unsigned long payload = vcpu->arch.exception.payload;
-
-	if (!has_payload)
+	if (!ex->has_payload)
 		return;
 
-	switch (nr) {
+	switch (ex->vector) {
 	case DB_VECTOR:
 		/*
 		 * "Certain debug exceptions may clear bit 0-3.  The
@@ -591,8 +588,8 @@ void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu)
 		 * So they need to be flipped for DR6.
 		 */
 		vcpu->arch.dr6 |= DR6_ACTIVE_LOW;
-		vcpu->arch.dr6 |= payload;
-		vcpu->arch.dr6 ^= payload & DR6_ACTIVE_LOW;
+		vcpu->arch.dr6 |= ex->payload;
+		vcpu->arch.dr6 ^= ex->payload & DR6_ACTIVE_LOW;
 
 		/*
 		 * The #DB payload is defined as compatible with the 'pending
@@ -603,12 +600,12 @@ void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu)
 		vcpu->arch.dr6 &= ~BIT(12);
 		break;
 	case PF_VECTOR:
-		vcpu->arch.cr2 = payload;
+		vcpu->arch.cr2 = ex->payload;
 		break;
 	}
 
-	vcpu->arch.exception.has_payload = false;
-	vcpu->arch.exception.payload = 0;
+	ex->has_payload = false;
+	ex->payload = 0;
 }
 EXPORT_SYMBOL_GPL(kvm_deliver_exception_payload);
 
@@ -647,17 +644,18 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 			vcpu->arch.exception.injected = false;
 		}
 		vcpu->arch.exception.has_error_code = has_error;
-		vcpu->arch.exception.nr = nr;
+		vcpu->arch.exception.vector = nr;
 		vcpu->arch.exception.error_code = error_code;
 		vcpu->arch.exception.has_payload = has_payload;
 		vcpu->arch.exception.payload = payload;
 		if (!is_guest_mode(vcpu))
-			kvm_deliver_exception_payload(vcpu);
+			kvm_deliver_exception_payload(vcpu,
+						      &vcpu->arch.exception);
 		return;
 	}
 
 	/* to check exception */
-	prev_nr = vcpu->arch.exception.nr;
+	prev_nr = vcpu->arch.exception.vector;
 	if (prev_nr == DF_VECTOR) {
 		/* triple fault -> shutdown */
 		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
@@ -675,7 +673,7 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 		vcpu->arch.exception.pending = true;
 		vcpu->arch.exception.injected = false;
 		vcpu->arch.exception.has_error_code = true;
-		vcpu->arch.exception.nr = DF_VECTOR;
+		vcpu->arch.exception.vector = DF_VECTOR;
 		vcpu->arch.exception.error_code = 0;
 		vcpu->arch.exception.has_payload = false;
 		vcpu->arch.exception.payload = 0;
@@ -4886,25 +4884,24 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
 static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
 					       struct kvm_vcpu_events *events)
 {
+	struct kvm_queued_exception *ex = &vcpu->arch.exception;
+
 	process_nmi(vcpu);
 
 	if (kvm_check_request(KVM_REQ_SMI, vcpu))
 		process_smi(vcpu);
 
 	/*
-	 * In guest mode, payload delivery should be deferred,
-	 * so that the L1 hypervisor can intercept #PF before
-	 * CR2 is modified (or intercept #DB before DR6 is
-	 * modified under nVMX). Unless the per-VM capability,
-	 * KVM_CAP_EXCEPTION_PAYLOAD, is set, we may not defer the delivery of
-	 * an exception payload and handle after a KVM_GET_VCPU_EVENTS. Since we
-	 * opportunistically defer the exception payload, deliver it if the
-	 * capability hasn't been requested before processing a
-	 * KVM_GET_VCPU_EVENTS.
+	 * In guest mode, payload delivery should be deferred if the exception
+	 * will be intercepted by L1, e.g. KVM should not modify CR2 if L1
+	 * intercepts #PF, ditto for DR6 and #DBs.  If the per-VM capability,
+	 * KVM_CAP_EXCEPTION_PAYLOAD, is not set, userspace may or may not
+	 * propagate the payload and so it cannot be safely deferred.  Deliver
+	 * the payload if the capability hasn't been requested.
	 */
 	if (!vcpu->kvm->arch.exception_payload_enabled &&
-	    vcpu->arch.exception.pending && vcpu->arch.exception.has_payload)
-		kvm_deliver_exception_payload(vcpu);
+	    ex->pending && ex->has_payload)
+		kvm_deliver_exception_payload(vcpu, ex);
 
 	/*
 	 * The API doesn't provide the instruction length for software
@@ -4912,26 +4909,25 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
 	 * isn't advanced, we should expect to encounter the exception
 	 * again.
 	 */
-	if (kvm_exception_is_soft(vcpu->arch.exception.nr)) {
+	if (kvm_exception_is_soft(ex->vector)) {
 		events->exception.injected = 0;
 		events->exception.pending = 0;
 	} else {
-		events->exception.injected = vcpu->arch.exception.injected;
-		events->exception.pending = vcpu->arch.exception.pending;
+		events->exception.injected = ex->injected;
+		events->exception.pending = ex->pending;
 		/*
 		 * For ABI compatibility, deliberately conflate
 		 * pending and injected exceptions when
 		 * KVM_CAP_EXCEPTION_PAYLOAD isn't enabled.
 		 */
 		if (!vcpu->kvm->arch.exception_payload_enabled)
-			events->exception.injected |=
-				vcpu->arch.exception.pending;
+			events->exception.injected |= ex->pending;
 	}
-	events->exception.nr = vcpu->arch.exception.nr;
-	events->exception.has_error_code = vcpu->arch.exception.has_error_code;
-	events->exception.error_code = vcpu->arch.exception.error_code;
-	events->exception_has_payload = vcpu->arch.exception.has_payload;
-	events->exception_payload = vcpu->arch.exception.payload;
+	events->exception.nr = ex->vector;
+	events->exception.has_error_code = ex->has_error_code;
+	events->exception.error_code = ex->error_code;
+	events->exception_has_payload = ex->has_payload;
+	events->exception_payload = ex->payload;
 
 	events->interrupt.injected =
 		vcpu->arch.interrupt.injected && !vcpu->arch.interrupt.soft;
@@ -5003,7 +4999,7 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
 	process_nmi(vcpu);
 	vcpu->arch.exception.injected = events->exception.injected;
 	vcpu->arch.exception.pending = events->exception.pending;
-	vcpu->arch.exception.nr = events->exception.nr;
+	vcpu->arch.exception.vector = events->exception.nr;
 	vcpu->arch.exception.has_error_code = events->exception.has_error_code;
 	vcpu->arch.exception.error_code = events->exception.error_code;
 	vcpu->arch.exception.has_payload = events->exception_has_payload;
@@ -9497,7 +9493,7 @@ int kvm_check_nested_events(struct kvm_vcpu *vcpu)
 
 static
void kvm_inject_exception(struct kvm_vcpu *vcpu)
{
-	trace_kvm_inj_exception(vcpu->arch.exception.nr,
+	trace_kvm_inj_exception(vcpu->arch.exception.vector,
 				vcpu->arch.exception.has_error_code,
 				vcpu->arch.exception.error_code,
 				vcpu->arch.exception.injected);
@@ -9569,12 +9565,12 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 		 * describe the behavior of General Detect #DBs, which are
 		 * fault-like.  They do _not_ set RF, a la code breakpoints.
 		 */
-		if (exception_type(vcpu->arch.exception.nr) == EXCPT_FAULT)
+		if (exception_type(vcpu->arch.exception.vector) == EXCPT_FAULT)
 			__kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) |
 					     X86_EFLAGS_RF);
 
-		if (vcpu->arch.exception.nr == DB_VECTOR) {
-			kvm_deliver_exception_payload(vcpu);
+		if (vcpu->arch.exception.vector == DB_VECTOR) {
+			kvm_deliver_exception_payload(vcpu, &vcpu->arch.exception);
 			if (vcpu->arch.dr7 & DR7_GD) {
 				vcpu->arch.dr7 &= ~DR7_GD;
 				kvm_update_dr7(vcpu);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 501b884b8cc4..dc2af0146220 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -286,7 +286,8 @@ int kvm_write_guest_virt_system(struct kvm_vcpu *vcpu,
 
 int handle_ud(struct kvm_vcpu *vcpu);
 
-void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu);
+void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu,
+				   struct kvm_queued_exception *ex);
 
 void kvm_vcpu_mtrr_init(struct kvm_vcpu *vcpu);
 u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn);
-- 
2.36.1.476.g0c4daa206d-goog

From nobody Mon Apr 27 09:13:37 2026
Reply-To: Sean Christopherson
Date: Tue, 14 Jun 2022 20:47:22 +0000
In-Reply-To: <20220614204730.3359543-1-seanjc@google.com>
Message-Id: <20220614204730.3359543-14-seanjc@google.com>
References: <20220614204730.3359543-1-seanjc@google.com>
Subject: [PATCH v2 13/21] KVM: x86: Formalize blocking of nested pending exceptions
From: Sean Christopherson
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
 Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 Oliver Upton, Peter Shier

Capture nested_run_pending as block_pending_exceptions so that the logic
of why exceptions are blocked only needs to be documented once instead of
at every place that employs the logic.

No functional change intended.
Signed-off-by: Sean Christopherson
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/svm/nested.c | 20 ++++++++++----------
 arch/x86/kvm/vmx/nested.c | 23 ++++++++++++-----------
 2 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 471d40e97890..460161e67ce5 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1347,10 +1347,16 @@ static inline bool nested_exit_on_init(struct vcpu_svm *svm)
 
 static int svm_check_nested_events(struct kvm_vcpu *vcpu)
 {
-	struct vcpu_svm *svm = to_svm(vcpu);
-	bool block_nested_events =
-		kvm_event_needs_reinjection(vcpu) || svm->nested.nested_run_pending;
 	struct kvm_lapic *apic = vcpu->arch.apic;
+	struct vcpu_svm *svm = to_svm(vcpu);
+	/*
+	 * Only a pending nested run blocks a pending exception.  If there is a
+	 * previously injected event, the pending exception occurred while said
+	 * event was being delivered and thus needs to be handled.
+	 */
+	bool block_nested_exceptions = svm->nested.nested_run_pending;
+	bool block_nested_events = block_nested_exceptions ||
+				   kvm_event_needs_reinjection(vcpu);
 
 	if (lapic_in_kernel(vcpu) &&
	    test_bit(KVM_APIC_INIT, &apic->pending_events)) {
@@ -1363,13 +1369,7 @@ static int svm_check_nested_events(struct kvm_vcpu *vcpu)
 	}
 
 	if (vcpu->arch.exception.pending) {
-		/*
-		 * Only a pending nested run can block a pending exception.
-		 * Otherwise an injected NMI/interrupt should either be
-		 * lost or delivered to the nested hypervisor in the EXITINTINFO
-		 * vmcb field, while delivering the pending exception.
-		 */
-		if (svm->nested.nested_run_pending)
+		if (block_nested_exceptions)
 			return -EBUSY;
 		if (!nested_exit_on_exception(svm))
 			return 0;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index fafdcbfeca1f..50fe66f0cc1b 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3903,11 +3903,17 @@ static bool nested_vmx_preemption_timer_pending(struct kvm_vcpu *vcpu)
 
 static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 {
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	unsigned long exit_qual;
-	bool block_nested_events =
-		vmx->nested.nested_run_pending || kvm_event_needs_reinjection(vcpu);
 	struct kvm_lapic *apic = vcpu->arch.apic;
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	unsigned long exit_qual;
+	/*
+	 * Only a pending nested run blocks a pending exception.  If there is a
+	 * previously injected event, the pending exception occurred while said
+	 * event was being delivered and thus needs to be handled.
+	 */
+	bool block_nested_exceptions = vmx->nested.nested_run_pending;
+	bool block_nested_events = block_nested_exceptions ||
+				   kvm_event_needs_reinjection(vcpu);
 
 	if (lapic_in_kernel(vcpu) &&
	    test_bit(KVM_APIC_INIT, &apic->pending_events)) {
@@ -3941,15 +3947,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 	 * Process exceptions that are higher priority than Monitor Trap Flag:
 	 * fault-like exceptions, TSS T flag #DB (not emulated by KVM, but
 	 * could theoretically come in from userspace), and ICEBP (INT1).
-	 *
-	 * Note that only a pending nested run can block a pending exception.
-	 * Otherwise an injected NMI/interrupt should either be
-	 * lost or delivered to the nested hypervisor in the IDT_VECTORING_INFO,
-	 * while delivering the pending exception.
	 */
 	if (vcpu->arch.exception.pending &&
	    !(vmx_get_pending_dbg_trap(vcpu) & ~DR6_BT)) {
-		if (vmx->nested.nested_run_pending)
+		if (block_nested_exceptions)
 			return -EBUSY;
 		if (!nested_vmx_check_exception(vcpu, &exit_qual))
 			goto no_vmexit;
@@ -3966,7 +3967,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 	}
 
 	if (vcpu->arch.exception.pending) {
-		if (vmx->nested.nested_run_pending)
+		if (block_nested_exceptions)
 			return -EBUSY;
 		if (!nested_vmx_check_exception(vcpu, &exit_qual))
 			goto no_vmexit;
-- 
2.36.1.476.g0c4daa206d-goog

From nobody Mon Apr 27 09:13:37 2026
Reply-To: Sean Christopherson
Date: Tue, 14 Jun 2022 20:47:23 +0000
In-Reply-To: <20220614204730.3359543-1-seanjc@google.com>
Message-Id: <20220614204730.3359543-15-seanjc@google.com>
References: <20220614204730.3359543-1-seanjc@google.com>
Subject: [PATCH v2 14/21] KVM: x86: Use kvm_queue_exception_e() to queue #DF
From: Sean Christopherson
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
 Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 Oliver Upton, Peter Shier

Queue #DF by recursing on
kvm_multiple_exception() by way of kvm_queue_exception_e() instead of open coding the behavior. This will allow KVM to Just Work when a future commit moves exception interception checks (for L2 =3D> L1) into kvm_multiple_exception(). No functional change intended. Signed-off-by: Sean Christopherson Reviewed-by: Maxim Levitsky --- arch/x86/kvm/x86.c | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 511c0c8af80e..e45465075005 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -663,25 +663,22 @@ static void kvm_multiple_exception(struct kvm_vcpu *v= cpu, } class1 =3D exception_class(prev_nr); class2 =3D exception_class(nr); - if ((class1 =3D=3D EXCPT_CONTRIBUTORY && class2 =3D=3D EXCPT_CONTRIBUTORY) - || (class1 =3D=3D EXCPT_PF && class2 !=3D EXCPT_BENIGN)) { + if ((class1 =3D=3D EXCPT_CONTRIBUTORY && class2 =3D=3D EXCPT_CONTRIBUTORY= ) || + (class1 =3D=3D EXCPT_PF && class2 !=3D EXCPT_BENIGN)) { /* - * Generate double fault per SDM Table 5-5. Set - * exception.pending =3D true so that the double fault - * can trigger a nested vmexit. + * Synthesize #DF. Clear the previously injected or pending + * exception so as not to incorrectly trigger shutdown. 
		 */
-		vcpu->arch.exception.pending = true;
		vcpu->arch.exception.injected = false;
-		vcpu->arch.exception.has_error_code = true;
-		vcpu->arch.exception.vector = DF_VECTOR;
-		vcpu->arch.exception.error_code = 0;
-		vcpu->arch.exception.has_payload = false;
-		vcpu->arch.exception.payload = 0;
-	} else
+		vcpu->arch.exception.pending = false;
+
+		kvm_queue_exception_e(vcpu, DF_VECTOR, 0);
+	} else {
		/* replace previous exception with a new one in a hope
		   that instruction re-execution will regenerate lost
		   exception */
		goto queue;
+	}
 }
 
 void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr)
-- 
2.36.1.476.g0c4daa206d-goog

From nobody Mon Apr 27 09:13:37 2026
Reply-To: Sean Christopherson
Date: Tue, 14 Jun 2022 20:47:24 +0000
In-Reply-To: <20220614204730.3359543-1-seanjc@google.com>
Message-Id: <20220614204730.3359543-16-seanjc@google.com>
Mime-Version: 1.0
References: <20220614204730.3359543-1-seanjc@google.com>
X-Mailer: git-send-email 2.36.1.476.g0c4daa206d-goog
Subject: [PATCH v2 15/21] KVM: x86: Hoist nested event checks above event injection logic
From: Sean Christopherson
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Oliver Upton, Peter Shier
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Perform nested event checks before re-injecting exceptions/events into
L2.  If a pending exception causes VM-Exit to L1, re-injecting events
into vmcs02 is premature and wasted effort.  Take care to ensure events
that need to be re-injected are still re-injected if checking for nested
events "fails", i.e. if KVM needs to force an immediate entry+exit to
complete the to-be-re-injected event.

Keep the "can_inject" logic the same for now; it too can be pushed below
the nested checks, but is a slightly riskier change (see past bugs about
events not being properly purged on nested VM-Exit).

Add and/or modify comments to better document the various interactions.
Of note is the comment regarding "blocking" previously injected NMIs and
IRQs if an exception is pending.  The old comment isn't wrong strictly
speaking, but it failed to capture the reason why the logic even exists.

Signed-off-by: Sean Christopherson
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/x86.c | 89 +++++++++++++++++++++++++++-------------------
 1 file changed, 53 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e45465075005..930de833aa2b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9502,53 +9502,70 @@ static void kvm_inject_exception(struct kvm_vcpu *vcpu)
 
 static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 {
+	bool can_inject = !kvm_event_needs_reinjection(vcpu);
	int r;
-	bool can_inject = true;
 
-	/* try to reinject previous events if any */
+	/*
+	 * Process nested events first, as nested VM-Exit supersedes event
+	 * re-injection.  If there's an event queued for re-injection, it will
+	 * be saved into the appropriate vmc{b,s}12 fields on nested VM-Exit.
+	 */
+	if (is_guest_mode(vcpu))
+		r = kvm_check_nested_events(vcpu);
+	else
+		r = 0;
 
-	if (vcpu->arch.exception.injected) {
+	/*
+	 * Re-inject exceptions and events *especially* if immediate entry+exit
+	 * to/from L2 is needed, as any event that has already been injected
+	 * into L2 needs to complete its lifecycle before injecting a new event.
+	 *
+	 * Don't re-inject an NMI or interrupt if there is a pending exception.
+	 * This collision arises if an exception occurred while vectoring the
+	 * injected event, KVM intercepted said exception, and KVM ultimately
+	 * determined the fault belongs to the guest and queues the exception
+	 * for injection back into the guest.
+	 *
+	 * "Injected" interrupts can also collide with pending exceptions if
+	 * userspace ignores the "ready for injection" flag and blindly queues
+	 * an interrupt.  In that case, prioritizing the exception is correct,
+	 * as the exception "occurred" before the exit to userspace.  Trap-like
+	 * exceptions, e.g. most #DBs, have higher priority than interrupts.
+	 * And while fault-like exceptions, e.g. #GP and #PF, are the lowest
+	 * priority, they're only generated (pended) during instruction
+	 * execution, and interrupts are recognized at instruction boundaries.
+	 * Thus a pending fault-like exception means the fault occurred on the
+	 * *previous* instruction and must be serviced prior to recognizing any
+	 * new events in order to fully complete the previous instruction.
+	 */
+	if (vcpu->arch.exception.injected)
		kvm_inject_exception(vcpu);
-		can_inject = false;
-	}
+	else if (vcpu->arch.exception.pending)
+		; /* see above */
+	else if (vcpu->arch.nmi_injected)
+		static_call(kvm_x86_inject_nmi)(vcpu);
+	else if (vcpu->arch.interrupt.injected)
+		static_call(kvm_x86_inject_irq)(vcpu, true);
+
	/*
-	 * Do not inject an NMI or interrupt if there is a pending
-	 * exception.  Exceptions and interrupts are recognized at
-	 * instruction boundaries, i.e. the start of an instruction.
-	 * Trap-like exceptions, e.g. #DB, have higher priority than
-	 * NMIs and interrupts, i.e. traps are recognized before an
-	 * NMI/interrupt that's pending on the same instruction.
-	 * Fault-like exceptions, e.g. #GP and #PF, are the lowest
-	 * priority, but are only generated (pended) during instruction
-	 * execution, i.e. a pending fault-like exception means the
-	 * fault occurred on the *previous* instruction and must be
-	 * serviced prior to recognizing any new events in order to
-	 * fully complete the previous instruction.
+	 * Exceptions that morph to VM-Exits are handled above, and pending
+	 * exceptions on top of injected exceptions that do not VM-Exit should
+	 * either morph to #DF or, sadly, override the injected exception.
	 */
-	else if (!vcpu->arch.exception.pending) {
-		if (vcpu->arch.nmi_injected) {
-			static_call(kvm_x86_inject_nmi)(vcpu);
-			can_inject = false;
-		} else if (vcpu->arch.interrupt.injected) {
-			static_call(kvm_x86_inject_irq)(vcpu, true);
-			can_inject = false;
-		}
-	}
-
	WARN_ON_ONCE(vcpu->arch.exception.injected &&
		     vcpu->arch.exception.pending);
 
	/*
-	 * Call check_nested_events() even if we reinjected a previous event
-	 * in order for caller to determine if it should require immediate-exit
-	 * from L2 to L1 due to pending L1 events which require exit
-	 * from L2 to L1.
+	 * Bail if immediate entry+exit to/from the guest is needed to complete
+	 * nested VM-Enter or event re-injection so that a different pending
+	 * event can be serviced (or if KVM needs to exit to userspace).
+	 *
+	 * Otherwise, continue processing events even if VM-Exit occurred.  The
+	 * VM-Exit will have cleared exceptions that were meant for L2, but
+	 * there may now be events that can be injected into L1.
	 */
-	if (is_guest_mode(vcpu)) {
-		r = kvm_check_nested_events(vcpu);
-		if (r < 0)
-			goto out;
-	}
+	if (r < 0)
+		goto out;
 
	/* try to inject new event if pending */
	if (vcpu->arch.exception.pending) {
-- 
2.36.1.476.g0c4daa206d-goog

From nobody Mon Apr 27 09:13:37 2026
Reply-To: Sean Christopherson
Date: Tue, 14 Jun 2022 20:47:25 +0000
In-Reply-To: <20220614204730.3359543-1-seanjc@google.com>
Message-Id: <20220614204730.3359543-17-seanjc@google.com>
Mime-Version: 1.0
References: <20220614204730.3359543-1-seanjc@google.com>
X-Mailer: git-send-email 2.36.1.476.g0c4daa206d-goog
Subject: [PATCH v2 16/21] KVM: x86: Evaluate ability to inject SMI/NMI/IRQ after potential VM-Exit
From: Sean Christopherson
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Oliver Upton, Peter Shier
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Determine whether or not new events can be injected after checking
nested events.
If a VM-Exit occurred during nested event handling, any
previous event that needed re-injection is gone from KVM's perspective;
the event is captured in the vmc*12 VM-Exit information, but doesn't
exist in terms of what needs to be done for entry to L1.

Signed-off-by: Sean Christopherson
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/x86.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 930de833aa2b..1a301a1730a5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9502,7 +9502,7 @@ static void kvm_inject_exception(struct kvm_vcpu *vcpu)
 
 static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 {
-	bool can_inject = !kvm_event_needs_reinjection(vcpu);
+	bool can_inject;
	int r;
 
	/*
@@ -9567,7 +9567,13 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
	if (r < 0)
		goto out;
 
-	/* try to inject new event if pending */
+	/*
+	 * New events, other than exceptions, cannot be injected if KVM needs
+	 * to re-inject a previous event.  See above comments on re-injecting
+	 * for why pending exceptions get priority.
+	 */
+	can_inject = !kvm_event_needs_reinjection(vcpu);
+
	if (vcpu->arch.exception.pending) {
		/*
		 * Fault-class exceptions, except #DBs, set RF=1 in the RFLAGS
-- 
2.36.1.476.g0c4daa206d-goog

From nobody Mon Apr 27 09:13:37 2026
Reply-To: Sean Christopherson
Date: Tue, 14 Jun 2022 20:47:26 +0000
In-Reply-To: <20220614204730.3359543-1-seanjc@google.com>
Message-Id: <20220614204730.3359543-18-seanjc@google.com>
Mime-Version: 1.0
References: <20220614204730.3359543-1-seanjc@google.com>
X-Mailer: git-send-email 2.36.1.476.g0c4daa206d-goog
Subject: [PATCH v2 17/21] KVM: x86: Morph pending exceptions to pending VM-Exits at queue time
From: Sean Christopherson
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Oliver Upton, Peter Shier
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Morph pending exceptions to pending VM-Exits (due to interception) when
the exception is queued instead of waiting until nested events are
checked at VM-Entry.  This fixes a longstanding bug where KVM fails to
handle an exception that occurs during delivery of a previous exception,
KVM (L0) and L1 both want to intercept the exception (e.g.
#PF for shadow paging), and KVM determines that the exception is in the
guest's domain, i.e. queues the new exception for L2.  Deferring the
interception check causes KVM to escalate various combinations of
injected+pending exceptions to double fault (#DF) without consulting
L1's interception desires, and ends up injecting a spurious #DF into L2.

KVM has fudged around the issue for #PF by special casing emulated #PF
injection for shadow paging, but the underlying issue is not unique to
shadow paging in L0, e.g. if KVM is intercepting #PF because the guest
has a smaller maxphyaddr and L1 (but not L0) is using shadow paging.
Other exceptions are affected as well, e.g. if KVM is intercepting #GP
for one of SVM's workarounds or for the VMware backdoor emulation stuff.
The other cases have gone unnoticed because the #DF is spurious if and
only if L1 resolves the exception, e.g. KVM's goofs go unnoticed if L1
would have injected #DF anyways.

The hack-a-fix has also led to ugly code, e.g. bailing from the emulator
if #PF injection forced a nested VM-Exit and the emulator finds itself
back in L1.  Allowing for direct-to-VM-Exit queueing also neatly solves
the async #PF in L2 mess; no need to set a magic flag and token, simply
queue a #PF nested VM-Exit.

Deal with event migration by flagging that a pending exception was
queued by userspace and check for interception at the next KVM_RUN, e.g.
so that KVM does the right thing regardless of the order in which
userspace restores nested state vs. event state.

When "getting" events from userspace, simply drop any pending exception
that is destined to be intercepted if there is also an injected
exception to be migrated.  Ideally, KVM would migrate both events, but
that would require new ABI, and practically speaking losing the event is
unlikely to be noticed, let alone fatal.  The injected exception is
captured, RIP still points at the original faulting instruction, etc...
So either the injection on the target will trigger the same intercepted
exception, or the source of the intercepted exception was transient
and/or non-deterministic, thus dropping it is ok-ish.

Opportunistically add a gigantic comment above vmx_check_nested_events()
to document the priorities of all known events on Intel CPUs.  Kudos to
Jim Mattson for doing the hard work of collecting and interpreting the
priorities from various locations throughout the SDM (because putting
them all in one place in the SDM would be too easy).

Fixes: a04aead144fd ("KVM: nSVM: fix running nested guests when npt=0")
Fixes: feaf0c7dc473 ("KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2")
Cc: Jim Mattson
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm_host.h |  12 +-
 arch/x86/kvm/svm/nested.c       |  41 ++----
 arch/x86/kvm/vmx/nested.c       | 220 +++++++++++++++++++++-----------
 arch/x86/kvm/vmx/vmx.c          |   6 +-
 arch/x86/kvm/x86.c              | 159 ++++++++++++++++-------
 arch/x86/kvm/x86.h              |   7 +
 6 files changed, 287 insertions(+), 158 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7f321d53a7e9..3bf7fdeeb25c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -648,7 +648,6 @@ struct kvm_queued_exception {
	u32 error_code;
	unsigned long payload;
	bool has_payload;
-	u8 nested_apf;
 };
 
 struct kvm_vcpu_arch {
@@ -750,8 +749,12 @@ struct kvm_vcpu_arch {
 
	u8 event_exit_inst_len;
 
+	bool exception_from_userspace;
+
	/* Exceptions to be injected to the guest. */
	struct kvm_queued_exception exception;
+	/* Exception VM-Exits to be synthesized to L1.
 */
+	struct kvm_queued_exception exception_vmexit;
 
	struct kvm_queued_interrupt {
		bool injected;
@@ -861,7 +864,6 @@ struct kvm_vcpu_arch {
		u32 id;
		bool send_user_only;
		u32 host_apf_flags;
-		unsigned long nested_apf_token;
		bool delivery_as_pf_vmexit;
		bool pageready_pending;
	} apf;
@@ -1618,9 +1620,9 @@ struct kvm_x86_ops {
 
 struct kvm_x86_nested_ops {
	void (*leave_nested)(struct kvm_vcpu *vcpu);
+	bool (*is_exception_vmexit)(struct kvm_vcpu *vcpu, u8 vector,
+				    u32 error_code);
	int (*check_events)(struct kvm_vcpu *vcpu);
-	bool (*handle_page_fault_workaround)(struct kvm_vcpu *vcpu,
-					     struct x86_exception *fault);
	bool (*hv_timer_pending)(struct kvm_vcpu *vcpu);
	void (*triple_fault)(struct kvm_vcpu *vcpu);
	int (*get_state)(struct kvm_vcpu *vcpu,
@@ -1847,7 +1849,7 @@ void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr, unsigned long pay
 void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr);
 void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
-bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
+void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
				    struct x86_exception *fault);
 bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl);
 bool kvm_require_dr(struct kvm_vcpu *vcpu, int dr);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 460161e67ce5..4075deefd132 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -55,28 +55,6 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
	nested_svm_vmexit(svm);
 }
 
-static bool nested_svm_handle_page_fault_workaround(struct kvm_vcpu *vcpu,
-						    struct x86_exception *fault)
-{
-	struct vcpu_svm *svm = to_svm(vcpu);
-	struct vmcb *vmcb = svm->vmcb;
-
-	WARN_ON(!is_guest_mode(vcpu));
-
-	if (vmcb12_is_intercept(&svm->nested.ctl,
-				INTERCEPT_EXCEPTION_OFFSET + PF_VECTOR) &&
-	    !WARN_ON_ONCE(svm->nested.nested_run_pending)) {
-		vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + PF_VECTOR;
-		vmcb->control.exit_code_hi = 0;
-		vmcb->control.exit_info_1 = fault->error_code;
-		vmcb->control.exit_info_2 = fault->address;
-		nested_svm_vmexit(svm);
-		return true;
-	}
-
-	return false;
-}
-
 static u64 nested_svm_get_tdp_pdptr(struct kvm_vcpu *vcpu, int index)
 {
	struct vcpu_svm *svm = to_svm(vcpu);
@@ -1297,16 +1275,17 @@ int nested_svm_check_permissions(struct kvm_vcpu *vcpu)
	return 0;
 }
 
-static bool nested_exit_on_exception(struct vcpu_svm *svm)
+static bool nested_svm_is_exception_vmexit(struct kvm_vcpu *vcpu, u8 vector,
+					   u32 error_code)
 {
-	unsigned int vector = svm->vcpu.arch.exception.vector;
+	struct vcpu_svm *svm = to_svm(vcpu);
 
	return (svm->nested.ctl.intercepts[INTERCEPT_EXCEPTION] & BIT(vector));
 }
 
 static void nested_svm_inject_exception_vmexit(struct kvm_vcpu *vcpu)
 {
-	struct kvm_queued_exception *ex = &vcpu->arch.exception;
+	struct kvm_queued_exception *ex = &vcpu->arch.exception_vmexit;
	struct vcpu_svm *svm = to_svm(vcpu);
	struct vmcb *vmcb = svm->vmcb;
 
@@ -1368,15 +1347,19 @@ static int svm_check_nested_events(struct kvm_vcpu *vcpu)
		return 0;
	}
 
-	if (vcpu->arch.exception.pending) {
+	if (vcpu->arch.exception_vmexit.pending) {
		if (block_nested_exceptions)
			return -EBUSY;
-		if (!nested_exit_on_exception(svm))
-			return 0;
		nested_svm_inject_exception_vmexit(vcpu);
		return 0;
	}
 
+	if (vcpu->arch.exception.pending) {
+		if (block_nested_exceptions)
+			return -EBUSY;
+		return 0;
+	}
+
	if (vcpu->arch.smi_pending && !svm_smi_blocked(vcpu)) {
		if (block_nested_events)
			return -EBUSY;
@@ -1714,8 +1697,8 @@ static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
 
 struct kvm_x86_nested_ops svm_nested_ops = {
	.leave_nested = svm_leave_nested,
+	.is_exception_vmexit = nested_svm_is_exception_vmexit,
	.check_events = svm_check_nested_events,
-	.handle_page_fault_workaround =
		nested_svm_handle_page_fault_workaround,
	.triple_fault = nested_svm_triple_fault,
	.get_nested_state_pages = svm_get_nested_state_pages,
	.get_state = svm_get_nested_state,
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 50fe66f0cc1b..53f6ea15081d 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -438,59 +438,22 @@ static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
	return inequality ^ bit;
 }
 
-
-/*
- * KVM wants to inject page-faults which it got to the guest. This function
- * checks whether in a nested guest, we need to inject them to L1 or L2.
- */
-static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned long *exit_qual)
-{
-	struct kvm_queued_exception *ex = &vcpu->arch.exception;
-	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-
-	if (ex->vector == PF_VECTOR) {
-		if (ex->nested_apf) {
-			*exit_qual = vcpu->arch.apf.nested_apf_token;
-			return 1;
-		}
-		if (nested_vmx_is_page_fault_vmexit(vmcs12, ex->error_code)) {
-			*exit_qual = ex->has_payload ? ex->payload : vcpu->arch.cr2;
-			return 1;
-		}
-	} else if (vmcs12->exception_bitmap & (1u << ex->vector)) {
-		if (ex->vector == DB_VECTOR) {
-			if (ex->has_payload) {
-				*exit_qual = ex->payload;
-			} else {
-				*exit_qual = vcpu->arch.dr6;
-				*exit_qual &= ~DR6_BT;
-				*exit_qual ^= DR6_ACTIVE_LOW;
-			}
-		} else
-			*exit_qual = 0;
-		return 1;
-	}
-
-	return 0;
-}
-
-static bool nested_vmx_handle_page_fault_workaround(struct kvm_vcpu *vcpu,
-						    struct x86_exception *fault)
+static bool nested_vmx_is_exception_vmexit(struct kvm_vcpu *vcpu, u8 vector,
+					   u32 error_code)
 {
	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
 
-	WARN_ON(!is_guest_mode(vcpu));
+	/*
+	 * Drop bits 31:16 of the error code when performing the #PF mask+match
+	 * check.  All VMCS fields involved are 32 bits, but Intel CPUs never
+	 * set bits 31:16 and VMX disallows setting bits 31:16 in the injected
+	 * error code.
+	 * Including the to-be-dropped bits in the check might
+	 * result in an "impossible" or missed exit from L1's perspective.
+	 */
+	if (vector == PF_VECTOR)
+		return nested_vmx_is_page_fault_vmexit(vmcs12, (u16)error_code);
 
-	if (nested_vmx_is_page_fault_vmexit(vmcs12, fault->error_code) &&
-	    !WARN_ON_ONCE(to_vmx(vcpu)->nested.nested_run_pending)) {
-		vmcs12->vm_exit_intr_error_code = fault->error_code;
-		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
-				  PF_VECTOR | INTR_TYPE_HARD_EXCEPTION |
-				  INTR_INFO_DELIVER_CODE_MASK | INTR_INFO_VALID_MASK,
-				  fault->address);
-		return true;
-	}
-	return false;
+	return (vmcs12->exception_bitmap & (1u << vector));
 }
 
 static int nested_vmx_check_io_bitmap_controls(struct kvm_vcpu *vcpu,
@@ -3823,12 +3786,24 @@ static int vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
	return -ENXIO;
 }
 
-static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
-					       unsigned long exit_qual)
+static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu)
 {
-	struct kvm_queued_exception *ex = &vcpu->arch.exception;
+	struct kvm_queued_exception *ex = &vcpu->arch.exception_vmexit;
	u32 intr_info = ex->vector | INTR_INFO_VALID_MASK;
	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+	unsigned long exit_qual;
+
+	if (ex->has_payload) {
+		exit_qual = ex->payload;
+	} else if (ex->vector == PF_VECTOR) {
+		exit_qual = vcpu->arch.cr2;
+	} else if (ex->vector == DB_VECTOR) {
+		exit_qual = vcpu->arch.dr6;
+		exit_qual &= ~DR6_BT;
+		exit_qual ^= DR6_ACTIVE_LOW;
+	} else {
+		exit_qual = 0;
+	}
 
	if (ex->has_error_code) {
		/*
@@ -3870,14 +3845,24 @@ static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
 * from the emulator (because such #DBs are fault-like and thus don't trigger
 * actions that fire on instruction retire.
 */
-static inline unsigned long vmx_get_pending_dbg_trap(struct kvm_vcpu *vcpu)
+static unsigned long vmx_get_pending_dbg_trap(struct kvm_queued_exception *ex)
 {
-	if (!vcpu->arch.exception.pending ||
-	    vcpu->arch.exception.vector != DB_VECTOR)
+	if (!ex->pending || ex->vector != DB_VECTOR)
		return 0;
 
	/* General Detect #DBs are always fault-like. */
-	return vcpu->arch.exception.payload & ~DR6_BD;
+	return ex->payload & ~DR6_BD;
+}
+
+/*
+ * Returns true if there's a pending #DB exception that is lower priority than
+ * a pending Monitor Trap Flag VM-Exit.  TSS T-flag #DBs are not emulated by
+ * KVM, but could theoretically be injected by userspace.  Note, this code is
+ * imperfect, see above.
+ */
+static bool vmx_is_low_priority_db_trap(struct kvm_queued_exception *ex)
+{
+	return vmx_get_pending_dbg_trap(ex) & ~DR6_BT;
 }
 
 /*
@@ -3889,8 +3874,9 @@ static inline unsigned long vmx_get_pending_dbg_trap(struct kvm_vcpu *vcpu)
  */
 static void nested_vmx_update_pending_dbg(struct kvm_vcpu *vcpu)
 {
-	unsigned long pending_dbg = vmx_get_pending_dbg_trap(vcpu);
+	unsigned long pending_dbg;
 
+	pending_dbg = vmx_get_pending_dbg_trap(&vcpu->arch.exception);
	if (pending_dbg)
		vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, pending_dbg);
 }
@@ -3901,11 +3887,93 @@ static bool nested_vmx_preemption_timer_pending(struct kvm_vcpu *vcpu)
	       to_vmx(vcpu)->nested.preemption_timer_expired;
 }
 
+/*
+ * Per the Intel SDM's table "Priority Among Concurrent Events", with minor
+ * edits to fill in missing examples, e.g. #DB due to split-lock accesses,
+ * and less minor edits to splice in the priority of VMX Non-Root specific
+ * events, e.g. MTF and NMI/INTR-window exiting.
+ *
+ * 1 Hardware Reset and Machine Checks
+ *	- RESET
+ *	- Machine Check
+ *
+ * 2 Trap on Task Switch
+ *	- T flag in TSS is set (on task switch)
+ *
+ * 3 External Hardware Interventions
+ *	- FLUSH
+ *	- STOPCLK
+ *	- SMI
+ *	- INIT
+ *
+ * 3.5 Monitor Trap Flag (MTF) VM-exit[1]
+ *
+ * 4 Traps on Previous Instruction
+ *	- Breakpoints
+ *	- Trap-class Debug Exceptions (#DB due to TF flag set, data/I-O
+ *	  breakpoint, or #DB due to a split-lock access)
+ *
+ * 4.3	VMX-preemption timer expired VM-exit
+ *
+ * 4.6	NMI-window exiting VM-exit[2]
+ *
+ * 5 Nonmaskable Interrupts (NMI)
+ *
+ * 5.5 Interrupt-window exiting VM-exit and Virtual-interrupt delivery
+ *
+ * 6 Maskable Hardware Interrupts
+ *
+ * 7 Code Breakpoint Fault
+ *
+ * 8 Faults from Fetching Next Instruction
+ *	- Code-Segment Limit Violation
+ *	- Code Page Fault
+ *	- Control protection exception (missing ENDBRANCH at target of indirect
+ *	  call or jump)
+ *
+ * 9 Faults from Decoding Next Instruction
+ *	- Instruction length > 15 bytes
+ *	- Invalid Opcode
+ *	- Coprocessor Not Available
+ *
+ *10 Faults on Executing Instruction
+ *	- Overflow
+ *	- Bound error
+ *	- Invalid TSS
+ *	- Segment Not Present
+ *	- Stack fault
+ *	- General Protection
+ *	- Data Page Fault
+ *	- Alignment Check
+ *	- x86 FPU Floating-point exception
+ *	- SIMD floating-point exception
+ *	- Virtualization exception
+ *	- Control protection exception
+ *
+ * [1] Per the "Monitor Trap Flag" section: System-management interrupts (SMIs),
+ *     INIT signals, and higher priority events take priority over MTF VM exits.
+ *     MTF VM exits take priority over debug-trap exceptions and lower priority
+ *     events.
+ *
+ * [2] Debug-trap exceptions and higher priority events take priority over VM exits
+ *     caused by the VMX-preemption timer.  VM exits caused by the VMX-preemption
+ *     timer take priority over VM exits caused by the "NMI-window exiting"
+ *     VM-execution control and lower priority events.
+ *
+ * [3] Debug-trap exceptions and higher priority events take priority over VM exits
+ *     caused by "NMI-window exiting".  VM exits caused by this control take
+ *     priority over non-maskable interrupts (NMIs) and lower priority events.
+ *
+ * [4] Virtual-interrupt delivery has the same priority as that of VM exits due to
+ *     the 1-setting of the "interrupt-window exiting" VM-execution control.  Thus,
+ *     non-maskable interrupts (NMIs) and higher priority events take priority over
+ *     delivery of a virtual interrupt; delivery of a virtual interrupt takes
+ *     priority over external interrupts and lower priority events.
+ */
 static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 {
	struct kvm_lapic *apic = vcpu->arch.apic;
	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	unsigned long exit_qual;
	/*
	 * Only a pending nested run blocks a pending exception.  If there is a
	 * previously injected event, the pending exception occurred while said
@@ -3943,19 +4011,20 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
		/* Fallthrough, the SIPI is completely ignored. */
	}
 
-	/*
-	 * Process exceptions that are higher priority than Monitor Trap Flag:
-	 * fault-like exceptions, TSS T flag #DB (not emulated by KVM, but
-	 * could theoretically come in from userspace), and ICEBP (INT1).
- */ + if (vcpu->arch.exception_vmexit.pending && + !vmx_is_low_priority_db_trap(&vcpu->arch.exception_vmexit)) { + if (block_nested_exceptions) + return -EBUSY; + + nested_vmx_inject_exception_vmexit(vcpu); + return 0; + } + if (vcpu->arch.exception.pending && - !(vmx_get_pending_dbg_trap(vcpu) & ~DR6_BT)) { + !vmx_is_low_priority_db_trap(&vcpu->arch.exception)) { if (block_nested_exceptions) return -EBUSY; - if (!nested_vmx_check_exception(vcpu, &exit_qual)) - goto no_vmexit; - nested_vmx_inject_exception_vmexit(vcpu, exit_qual); - return 0; + goto no_vmexit; } =20 if (vmx->nested.mtf_pending) { @@ -3966,13 +4035,18 @@ static int vmx_check_nested_events(struct kvm_vcpu = *vcpu) return 0; } =20 + if (vcpu->arch.exception_vmexit.pending) { + if (block_nested_exceptions) + return -EBUSY; + + nested_vmx_inject_exception_vmexit(vcpu); + return 0; + } + if (vcpu->arch.exception.pending) { if (block_nested_exceptions) return -EBUSY; - if (!nested_vmx_check_exception(vcpu, &exit_qual)) - goto no_vmexit; - nested_vmx_inject_exception_vmexit(vcpu, exit_qual); - return 0; + goto no_vmexit; } =20 if (nested_vmx_preemption_timer_pending(vcpu)) { @@ -6863,8 +6937,8 @@ __init int nested_vmx_hardware_setup(int (*exit_handl= ers[])(struct kvm_vcpu *)) =20 struct kvm_x86_nested_ops vmx_nested_ops =3D { .leave_nested =3D vmx_leave_nested, + .is_exception_vmexit =3D nested_vmx_is_exception_vmexit, .check_events =3D vmx_check_nested_events, - .handle_page_fault_workaround =3D nested_vmx_handle_page_fault_workaround, .hv_timer_pending =3D nested_vmx_preemption_timer_pending, .triple_fault =3D nested_vmx_triple_fault, .get_state =3D vmx_get_nested_state, diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 7ef5659a1bbd..3591fdf7ecf9 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -1585,7 +1585,9 @@ static void vmx_update_emulated_instruction(struct kv= m_vcpu *vcpu) */ if (nested_cpu_has_mtf(vmcs12) && (!vcpu->arch.exception.pending || - 
vcpu->arch.exception.vector =3D=3D DB_VECTOR)) + vcpu->arch.exception.vector =3D=3D DB_VECTOR) && + (!vcpu->arch.exception_vmexit.pending || + vcpu->arch.exception_vmexit.vector =3D=3D DB_VECTOR)) vmx->nested.mtf_pending =3D true; else vmx->nested.mtf_pending =3D false; @@ -5624,7 +5626,7 @@ static bool vmx_emulation_required_with_pending_excep= tion(struct kvm_vcpu *vcpu) struct vcpu_vmx *vmx =3D to_vmx(vcpu); =20 return vmx->emulation_required && !vmx->rmode.vm86_active && - (vcpu->arch.exception.pending || vcpu->arch.exception.injected); + (kvm_is_exception_pending(vcpu) || vcpu->arch.exception.injected); } =20 static int handle_invalid_guest_state(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 1a301a1730a5..63ee79da50df 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -609,6 +609,21 @@ void kvm_deliver_exception_payload(struct kvm_vcpu *vc= pu, } EXPORT_SYMBOL_GPL(kvm_deliver_exception_payload); =20 +static void kvm_queue_exception_vmexit(struct kvm_vcpu *vcpu, unsigned int= vector, + bool has_error_code, u32 error_code, + bool has_payload, unsigned long payload) +{ + struct kvm_queued_exception *ex =3D &vcpu->arch.exception_vmexit; + + ex->vector =3D vector; + ex->injected =3D false; + ex->pending =3D true; + ex->has_error_code =3D has_error_code; + ex->error_code =3D error_code; + ex->has_payload =3D has_payload; + ex->payload =3D payload; +} + static void kvm_multiple_exception(struct kvm_vcpu *vcpu, unsigned nr, bool has_error, u32 error_code, bool has_payload, unsigned long payload, bool reinject) @@ -618,18 +633,31 @@ static void kvm_multiple_exception(struct kvm_vcpu *v= cpu, =20 kvm_make_request(KVM_REQ_EVENT, vcpu); =20 + /* + * If the exception is destined for L2 and isn't being reinjected, + * morph it to a VM-Exit if L1 wants to intercept the exception. 
A + * previously injected exception is not checked because it was checked + * when it was originally queued, and re-checking is incorrect if _L1_ + * injected the exception, in which case it's exempt from interception. + */ + if (!reinject && is_guest_mode(vcpu) && + kvm_x86_ops.nested_ops->is_exception_vmexit(vcpu, nr, error_code)) { + kvm_queue_exception_vmexit(vcpu, nr, has_error, error_code, + has_payload, payload); + return; + } + if (!vcpu->arch.exception.pending && !vcpu->arch.exception.injected) { queue: if (reinject) { /* - * On vmentry, vcpu->arch.exception.pending is only - * true if an event injection was blocked by - * nested_run_pending. In that case, however, - * vcpu_enter_guest requests an immediate exit, - * and the guest shouldn't proceed far enough to - * need reinjection. + * On VM-Entry, an exception can be pending if and only + * if event injection was blocked by nested_run_pending. + * In that case, however, vcpu_enter_guest() requests an + * immediate exit, and the guest shouldn't proceed far + * enough to need reinjection. */ - WARN_ON_ONCE(vcpu->arch.exception.pending); + WARN_ON_ONCE(kvm_is_exception_pending(vcpu)); vcpu->arch.exception.injected =3D true; if (WARN_ON_ONCE(has_payload)) { /* @@ -732,20 +760,22 @@ static int complete_emulated_insn_gp(struct kvm_vcpu *vcpu, int err) void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault) { ++vcpu->stat.pf_guest; - vcpu->arch.exception.nested_apf =3D - is_guest_mode(vcpu) && fault->async_page_fault; - if (vcpu->arch.exception.nested_apf) { - vcpu->arch.apf.nested_apf_token =3D fault->address; - kvm_queue_exception_e(vcpu, PF_VECTOR, fault->error_code); - } else { + + /* + * Async #PF in L2 is always forwarded to L1 as a VM-Exit regardless of + * whether or not L1 wants to intercept "regular" #PF.
+ */ + if (is_guest_mode(vcpu) && fault->async_page_fault) + kvm_queue_exception_vmexit(vcpu, PF_VECTOR, + true, fault->error_code, + true, fault->address); + else kvm_queue_exception_e_p(vcpu, PF_VECTOR, fault->error_code, fault->address); - } } EXPORT_SYMBOL_GPL(kvm_inject_page_fault); =20 -/* Returns true if the page fault was immediately morphed into a VM-Exit. = */ -bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu, +void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault) { struct kvm_mmu *fault_mmu; @@ -763,26 +793,7 @@ bool kvm_inject_emulated_page_fault(struct kvm_vcpu *v= cpu, kvm_mmu_invalidate_gva(vcpu, fault_mmu, fault->address, fault_mmu->root.hpa); =20 - /* - * A workaround for KVM's bad exception handling. If KVM injected an - * exception into L2, and L2 encountered a #PF while vectoring the - * injected exception, manually check to see if L1 wants to intercept - * #PF, otherwise queuing the #PF will lead to #DF or a lost exception. - * In all other cases, defer the check to nested_ops->check_events(), - * which will correctly handle priority (this does not). Note, other - * exceptions, e.g. #GP, are theoretically affected, #PF is simply the - * most problematic, e.g. when L0 and L1 are both intercepting #PF for - * shadow paging. - * - * TODO: Rewrite exception handling to track injected and pending - * (VM-Exit) exceptions separately. 
- */ - if (unlikely(vcpu->arch.exception.injected && is_guest_mode(vcpu)) && - kvm_x86_ops.nested_ops->handle_page_fault_workaround(vcpu, fault)) - return true; - fault_mmu->inject_page_fault(vcpu, fault); - return false; } EXPORT_SYMBOL_GPL(kvm_inject_emulated_page_fault); =20 @@ -4752,7 +4763,7 @@ static int kvm_vcpu_ready_for_interrupt_injection(str= uct kvm_vcpu *vcpu) return (kvm_arch_interrupt_allowed(vcpu) && kvm_cpu_accept_dm_intr(vcpu) && !kvm_event_needs_reinjection(vcpu) && - !vcpu->arch.exception.pending); + !kvm_is_exception_pending(vcpu)); } =20 static int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, @@ -4881,13 +4892,27 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vc= pu *vcpu, static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu, struct kvm_vcpu_events *events) { - struct kvm_queued_exception *ex =3D &vcpu->arch.exception; + struct kvm_queued_exception *ex; =20 process_nmi(vcpu); =20 if (kvm_check_request(KVM_REQ_SMI, vcpu)) process_smi(vcpu); =20 + /* + * KVM's ABI only allows for one exception to be migrated. Luckily, + * the only time there can be two queued exceptions is if there's a + * non-exiting _injected_ exception, and a pending exiting exception. + * In that case, ignore the VM-Exiting exception as it's an extension + * of the injected exception. + */ + if (vcpu->arch.exception_vmexit.pending && + !vcpu->arch.exception.pending && + !vcpu->arch.exception.injected) + ex =3D &vcpu->arch.exception_vmexit; + else + ex =3D &vcpu->arch.exception; + /* * In guest mode, payload delivery should be deferred if the exception * will be intercepted by L1, e.g. KVM should not modifying CR2 if L1 @@ -4994,6 +5019,19 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct= kvm_vcpu *vcpu, return -EINVAL; =20 process_nmi(vcpu); + + /* + * Flag that userspace is stuffing an exception, the next KVM_RUN will + * morph the exception to a VM-Exit if appropriate. 
Do this only for + * pending exceptions, already-injected exceptions are not subject to + * interception. Note, userspace that conflates pending and injected + * is hosed, and will incorrectly convert an injected exception into a + * pending exception, which in turn may cause a spurious VM-Exit. + */ + vcpu->arch.exception_from_userspace =3D events->exception.pending; + + vcpu->arch.exception_vmexit.pending =3D false; + vcpu->arch.exception.injected =3D events->exception.injected; vcpu->arch.exception.pending =3D events->exception.pending; vcpu->arch.exception.vector =3D events->exception.nr; @@ -7977,18 +8015,17 @@ static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask) } } =20 -static bool inject_emulated_exception(struct kvm_vcpu *vcpu) +static void inject_emulated_exception(struct kvm_vcpu *vcpu) { struct x86_emulate_ctxt *ctxt =3D vcpu->arch.emulate_ctxt; + if (ctxt->exception.vector =3D=3D PF_VECTOR) - return kvm_inject_emulated_page_fault(vcpu, &ctxt->exception); - - if (ctxt->exception.error_code_valid) + kvm_inject_emulated_page_fault(vcpu, &ctxt->exception); + else if (ctxt->exception.error_code_valid) kvm_queue_exception_e(vcpu, ctxt->exception.vector, ctxt->exception.error_code); else kvm_queue_exception(vcpu, ctxt->exception.vector); - return false; } =20 static struct x86_emulate_ctxt *alloc_emulate_ctxt(struct kvm_vcpu *vcpu) @@ -8601,8 +8638,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, =20 if (ctxt->have_exception) { r =3D 1; - if (inject_emulated_exception(vcpu)) - return r; + inject_emulated_exception(vcpu); } else if (vcpu->arch.pio.count) { if (!vcpu->arch.pio.in) { /* FIXME: return into emulator if single-stepping.
*/ @@ -9540,7 +9576,7 @@ static int inject_pending_event(struct kvm_vcpu *vcpu= , bool *req_immediate_exit) */ if (vcpu->arch.exception.injected) kvm_inject_exception(vcpu); - else if (vcpu->arch.exception.pending) + else if (kvm_is_exception_pending(vcpu)) ; /* see above */ else if (vcpu->arch.nmi_injected) static_call(kvm_x86_inject_nmi)(vcpu); @@ -9567,6 +9603,14 @@ static int inject_pending_event(struct kvm_vcpu *vcp= u, bool *req_immediate_exit) if (r < 0) goto out; =20 + /* + * A pending exception VM-Exit should either result in nested VM-Exit + * or force an immediate re-entry and exit to/from L2, and exception + * VM-Exits cannot be injected (flag should _never_ be set). + */ + WARN_ON_ONCE(vcpu->arch.exception_vmexit.injected || + vcpu->arch.exception_vmexit.pending); + /* * New events, other than exceptions, cannot be injected if KVM needs * to re-inject a previous event. See above comments on re-injecting @@ -9666,7 +9710,7 @@ static int inject_pending_event(struct kvm_vcpu *vcpu= , bool *req_immediate_exit) kvm_x86_ops.nested_ops->hv_timer_pending(vcpu)) *req_immediate_exit =3D true; =20 - WARN_ON(vcpu->arch.exception.pending); + WARN_ON(kvm_is_exception_pending(vcpu)); return 0; =20 out: @@ -10680,6 +10724,7 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu) =20 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) { + struct kvm_queued_exception *ex =3D &vcpu->arch.exception; struct kvm_run *kvm_run =3D vcpu->run; int r; =20 @@ -10738,6 +10783,21 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) } } =20 + /* + * If userspace set a pending exception and L2 is active, convert it to + * a pending VM-Exit if L1 wants to intercept the exception. 
+ */ + if (vcpu->arch.exception_from_userspace && is_guest_mode(vcpu) && + kvm_x86_ops.nested_ops->is_exception_vmexit(vcpu, ex->vector, + ex->error_code)) { + kvm_queue_exception_vmexit(vcpu, ex->vector, + ex->has_error_code, ex->error_code, + ex->has_payload, ex->payload); + ex->injected =3D false; + ex->pending =3D false; + } + vcpu->arch.exception_from_userspace =3D false; + if (unlikely(vcpu->arch.complete_userspace_io)) { int (*cui)(struct kvm_vcpu *) =3D vcpu->arch.complete_userspace_io; vcpu->arch.complete_userspace_io =3D NULL; @@ -10842,6 +10902,7 @@ static void __set_regs(struct kvm_vcpu *vcpu, struc= t kvm_regs *regs) kvm_set_rflags(vcpu, regs->rflags | X86_EFLAGS_FIXED); =20 vcpu->arch.exception.pending =3D false; + vcpu->arch.exception_vmexit.pending =3D false; =20 kvm_make_request(KVM_REQ_EVENT, vcpu); } @@ -11209,7 +11270,7 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_= vcpu *vcpu, =20 if (dbg->control & (KVM_GUESTDBG_INJECT_DB | KVM_GUESTDBG_INJECT_BP)) { r =3D -EBUSY; - if (vcpu->arch.exception.pending) + if (kvm_is_exception_pending(vcpu)) goto out; if (dbg->control & KVM_GUESTDBG_INJECT_DB) kvm_queue_exception(vcpu, DB_VECTOR); @@ -12387,7 +12448,7 @@ static inline bool kvm_vcpu_has_events(struct kvm_v= cpu *vcpu) if (vcpu->arch.pv.pv_unhalted) return true; =20 - if (vcpu->arch.exception.pending) + if (kvm_is_exception_pending(vcpu)) return true; =20 if (kvm_test_request(KVM_REQ_NMI, vcpu) || @@ -12641,7 +12702,7 @@ bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu) { if (unlikely(!lapic_in_kernel(vcpu) || kvm_event_needs_reinjection(vcpu) || - vcpu->arch.exception.pending)) + kvm_is_exception_pending(vcpu))) return false; =20 if (kvm_hlt_in_guest(vcpu->kvm) && !kvm_can_deliver_async_pf(vcpu)) diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index dc2af0146220..eee259e387d3 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -82,10 +82,17 @@ static inline unsigned int __shrink_ple_window(unsigned= int val, void 
kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu); int kvm_check_nested_events(struct kvm_vcpu *vcpu); =20 +static inline bool kvm_is_exception_pending(struct kvm_vcpu *vcpu) +{ + return vcpu->arch.exception.pending || + vcpu->arch.exception_vmexit.pending; +} + static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu) { vcpu->arch.exception.pending =3D false; vcpu->arch.exception.injected =3D false; + vcpu->arch.exception_vmexit.pending =3D false; } =20 static inline void kvm_queue_interrupt(struct kvm_vcpu *vcpu, u8 vector, --=20 2.36.1.476.g0c4daa206d-goog From nobody Mon Apr 27 09:13:37 2026
Reply-To: Sean Christopherson Date: Tue, 14 Jun 2022 20:47:27 +0000 In-Reply-To: <20220614204730.3359543-1-seanjc@google.com> Message-Id: <20220614204730.3359543-19-seanjc@google.com> Mime-Version: 1.0 References: <20220614204730.3359543-1-seanjc@google.com> X-Mailer: git-send-email 2.36.1.476.g0c4daa206d-goog Subject: [PATCH v2 18/21] KVM: x86: Treat pending TRIPLE_FAULT requests as pending exceptions From: Sean Christopherson To: Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Oliver Upton , Peter Shier Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding:
quoted-printable Content-Type: text/plain; charset="utf-8" Treat pending TRIPLE_FAULTS as pending exceptions. A triple fault is an exception for all intents and purposes, it's just not tracked as such because there's no vector associated with the exception. E.g. if userspace were to set vcpu->request_interrupt_window while running L2 and L2 hit a triple fault, a triple fault nested VM-Exit should be synthesized to L1 before exiting to userspace with KVM_EXIT_IRQ_WINDOW_OPEN. Link: https://lore.kernel.org/all/YoVHAIGcFgJit1qp@google.com Signed-off-by: Sean Christopherson Reviewed-by: Maxim Levitsky --- arch/x86/kvm/x86.c | 3 --- arch/x86/kvm/x86.h | 3 ++- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 63ee79da50df..8e54a074b7ff 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -12477,9 +12477,6 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu) if (kvm_xen_has_pending_events(vcpu)) return true; =20 - if (kvm_test_request(KVM_REQ_TRIPLE_FAULT, vcpu)) - return true; - return false; } =20 diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index eee259e387d3..078765287ec6 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -85,7 +85,8 @@ int kvm_check_nested_events(struct kvm_vcpu *vcpu); static inline bool kvm_is_exception_pending(struct kvm_vcpu *vcpu) { return vcpu->arch.exception.pending || - vcpu->arch.exception_vmexit.pending; + vcpu->arch.exception_vmexit.pending || + kvm_test_request(KVM_REQ_TRIPLE_FAULT, vcpu); } =20 static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu) --=20 2.36.1.476.g0c4daa206d-goog From nobody Mon Apr 27 09:13:37 2026
Reply-To: Sean Christopherson Date: Tue, 14 Jun 2022 20:47:28 +0000 In-Reply-To: <20220614204730.3359543-1-seanjc@google.com> Message-Id: <20220614204730.3359543-20-seanjc@google.com> Mime-Version: 1.0 References: <20220614204730.3359543-1-seanjc@google.com> X-Mailer: git-send-email 2.36.1.476.g0c4daa206d-goog Subject: [PATCH v2 19/21] KVM: VMX: Update MTF and ICEBP comments to document KVM's subtle behavior From: Sean Christopherson To: Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Oliver Upton , Peter Shier Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Document the oddities of ICEBP interception (trap-like #DB is intercepted as a fault-like exception), and how using VMX's inner "skip" helper deliberately bypasses the pending MTF and single-step #DB logic. No functional change intended. Signed-off-by: Sean Christopherson --- arch/x86/kvm/vmx/vmx.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 3591fdf7ecf9..91b8e171f232 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -1578,9 +1578,13 @@ static void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu) =20 /* * Per the SDM, MTF takes priority over debug-trap exceptions besides - * T-bit traps. As instruction emulation is completed (i.e. at the - * instruction boundary), any #DB exception pending delivery must be a - * debug-trap.
Record the pending MTF state to be delivered in + * TSS T-bit traps and ICEBP (INT1). KVM doesn't emulate T-bit traps + * or ICEBP (in the emulator proper), and skipping of ICEBP after an + * intercepted #DB deliberately avoids single-step #DB and MTF updates + * as ICEBP is higher priority than both. As instruction emulation is + * completed at this point (i.e. KVM is at the instruction boundary), + * any #DB exception pending delivery must be a debug-trap of lower + * priority than MTF. Record the pending MTF state to be delivered in * vmx_check_nested_events(). */ if (nested_cpu_has_mtf(vmcs12) && @@ -5071,8 +5075,10 @@ static int handle_exception_nmi(struct kvm_vcpu *vcp= u) * instruction. ICEBP generates a trap-like #DB, but * despite its interception control being tied to #DB, * is an instruction intercept, i.e. the VM-Exit occurs - * on the ICEBP itself. Note, skipping ICEBP also - * clears STI and MOVSS blocking. + * on the ICEBP itself. Use the inner "skip" helper to + * avoid single-step #DB and MTF updates, as ICEBP is + * higher priority. Note, skipping ICEBP still clears + * STI and MOVSS blocking. 
* * For all other #DBs, set vmcs.PENDING_DBG_EXCEPTIONS.BS * if single-step is enabled in RFLAGS and STI or MOVSS --=20 2.36.1.476.g0c4daa206d-goog From nobody Mon Apr 27 09:13:37 2026
Reply-To: Sean Christopherson Date: Tue, 14 Jun 2022 20:47:29 +0000 In-Reply-To: <20220614204730.3359543-1-seanjc@google.com> Message-Id: <20220614204730.3359543-21-seanjc@google.com> Mime-Version: 1.0 References: <20220614204730.3359543-1-seanjc@google.com> X-Mailer: git-send-email 2.36.1.476.g0c4daa206d-goog Subject: [PATCH v2 20/21] KVM: selftests: Use uapi header to get VMX and SVM exit reasons/codes From: Sean Christopherson To: Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Oliver Upton , Peter Shier Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Include the vmx.h and svm.h uapi headers that KVM so kindly provides instead of manually defining all the same exit reasons/code.
Signed-off-by: Sean Christopherson Reviewed-by: Maxim Levitsky --- .../selftests/kvm/include/x86_64/svm_util.h | 7 +-- .../selftests/kvm/include/x86_64/vmx.h | 51 +------------------ 2 files changed, 4 insertions(+), 54 deletions(-) diff --git a/tools/testing/selftests/kvm/include/x86_64/svm_util.h b/tools/= testing/selftests/kvm/include/x86_64/svm_util.h index a339b537a575..7aee6244ab6a 100644 --- a/tools/testing/selftests/kvm/include/x86_64/svm_util.h +++ b/tools/testing/selftests/kvm/include/x86_64/svm_util.h @@ -9,15 +9,12 @@ #ifndef SELFTEST_KVM_SVM_UTILS_H #define SELFTEST_KVM_SVM_UTILS_H =20 +#include + #include #include "svm.h" #include "processor.h" =20 -#define SVM_EXIT_EXCP_BASE 0x040 -#define SVM_EXIT_HLT 0x078 -#define SVM_EXIT_MSR 0x07c -#define SVM_EXIT_VMMCALL 0x081 - struct svm_test_data { /* VMCB */ struct vmcb *vmcb; /* gva */ diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testi= ng/selftests/kvm/include/x86_64/vmx.h index 99fa1410964c..e4206f69b716 100644 --- a/tools/testing/selftests/kvm/include/x86_64/vmx.h +++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h @@ -8,6 +8,8 @@ #ifndef SELFTEST_KVM_VMX_H #define SELFTEST_KVM_VMX_H =20 +#include + #include #include "processor.h" #include "apic.h" @@ -100,55 +102,6 @@ #define VMX_EPT_VPID_CAP_AD_BITS 0x00200000 =20 #define EXIT_REASON_FAILED_VMENTRY 0x80000000 -#define EXIT_REASON_EXCEPTION_NMI 0 -#define EXIT_REASON_EXTERNAL_INTERRUPT 1 -#define EXIT_REASON_TRIPLE_FAULT 2 -#define EXIT_REASON_INTERRUPT_WINDOW 7 -#define EXIT_REASON_NMI_WINDOW 8 -#define EXIT_REASON_TASK_SWITCH 9 -#define EXIT_REASON_CPUID 10 -#define EXIT_REASON_HLT 12 -#define EXIT_REASON_INVD 13 -#define EXIT_REASON_INVLPG 14 -#define EXIT_REASON_RDPMC 15 -#define EXIT_REASON_RDTSC 16 -#define EXIT_REASON_VMCALL 18 -#define EXIT_REASON_VMCLEAR 19 -#define EXIT_REASON_VMLAUNCH 20 -#define EXIT_REASON_VMPTRLD 21 -#define EXIT_REASON_VMPTRST 22 -#define EXIT_REASON_VMREAD 23 -#define EXIT_REASON_VMRESUME 24 
-#define EXIT_REASON_VMWRITE 25 -#define EXIT_REASON_VMOFF 26 -#define EXIT_REASON_VMON 27 -#define EXIT_REASON_CR_ACCESS 28 -#define EXIT_REASON_DR_ACCESS 29 -#define EXIT_REASON_IO_INSTRUCTION 30 -#define EXIT_REASON_MSR_READ 31 -#define EXIT_REASON_MSR_WRITE 32 -#define EXIT_REASON_INVALID_STATE 33 -#define EXIT_REASON_MWAIT_INSTRUCTION 36 -#define EXIT_REASON_MONITOR_INSTRUCTION 39 -#define EXIT_REASON_PAUSE_INSTRUCTION 40 -#define EXIT_REASON_MCE_DURING_VMENTRY 41 -#define EXIT_REASON_TPR_BELOW_THRESHOLD 43 -#define EXIT_REASON_APIC_ACCESS 44 -#define EXIT_REASON_EOI_INDUCED 45 -#define EXIT_REASON_EPT_VIOLATION 48 -#define EXIT_REASON_EPT_MISCONFIG 49 -#define EXIT_REASON_INVEPT 50 -#define EXIT_REASON_RDTSCP 51 -#define EXIT_REASON_PREEMPTION_TIMER 52 -#define EXIT_REASON_INVVPID 53 -#define EXIT_REASON_WBINVD 54 -#define EXIT_REASON_XSETBV 55 -#define EXIT_REASON_APIC_WRITE 56 -#define EXIT_REASON_INVPCID 58 -#define EXIT_REASON_PML_FULL 62 -#define EXIT_REASON_XSAVES 63 -#define EXIT_REASON_XRSTORS 64 -#define LAST_EXIT_REASON 64 =20 enum vmcs_field { VIRTUAL_PROCESSOR_ID =3D 0x00000000, --=20 2.36.1.476.g0c4daa206d-goog From nobody Mon Apr 27 09:13:37 2026
2022 13:48:23 -0700 (PDT) Received: by mail-pj1-x104a.google.com with SMTP id nn9-20020a17090b38c900b001e82e1ec1deso1208277pjb.0 for ; Tue, 14 Jun 2022 13:48:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=reply-to:date:in-reply-to:message-id:mime-version:references :subject:from:to:cc; bh=HVrWphNJm9ODVqQWlSo7gw++y6zZnMkQugM6MxdMyIo=; b=n+7P2v7NKs8n+8JRX3W3Dia0TK84riQ5E+jJ86CNtA9ZWSqsX6l3sZutvDJU81hfJG u2YkkGwRnHfiv0A1eA4ml+hbKNLd7SYCvi0p/fC6ZgmLtjVxUEuSL3836u/L1FpsXdFo mfEDoxczyqb90IMDDVZemD+6vPPfNHZOKaKHEAaIiNn+7duG4xjwPCQf/8kd8ybL7Cbz F1LHvcTEOuu0zFpg7jnAdWzsLaiEH/YILmZq28//pxBukUlAaZJAinEA6PHrt6lX1BrS Mr+wb3Ejn2OEzk6k5N4zrMw+UYi8NiRZKZgDoXcMgGQ5Asi6kURSCMlv42ZhXZe6Ab09 qyew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:reply-to:date:in-reply-to:message-id :mime-version:references:subject:from:to:cc; bh=HVrWphNJm9ODVqQWlSo7gw++y6zZnMkQugM6MxdMyIo=; b=3oGDED9IdAiqhoVD1UWxdCu9Feh+fzRLHy4d/nlnIQ6MZn7Q3LQ0P3tECBO0ap7K7w G//jLdPqfUZ62O/d5WR4S1LmOFPak5jhJokoCVLGvbngUeo2Ac4uEfLJlO2J8M2w/DxX i4ZkDXJ46+mIS57KR4jCqM8XuugAV+RrxWHnmYXcSz7m4j6nP+j2k1MoZfyFp/qoUrwp P3YWHGAwd4P3PLkRXoWw26XWcUy1WOX5G2/NsDta3TI26Ax1O3vGuLrtYUf7uAy0p97U Se+mDjQTCttu1ng8GpmvbQKQfaafMTOzxUuE7YLWqynO5/guXj2/d5HMjxcOGEUYvqfn CE6w== X-Gm-Message-State: AOAM531PtDmfgcERb8v/JgdLPKun4CEe/j0EVdF3cTScSKZl8SAADYmh y4SNnugC4MfCE9+mDXFQ5tT49FAU+7I= X-Google-Smtp-Source: ABdhPJwBTG5GtOAoenn7/v08HElHG5WIh1QOed6kt0p9lJiIuSouSOtPxmc9DBzfZPhOIldBUvPnV12qGh4= X-Received: from seanjc.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:3e5]) (user=seanjc job=sendgmr) by 2002:a63:3183:0:b0:3fd:6797:70a8 with SMTP id x125-20020a633183000000b003fd679770a8mr6010940pgx.206.1655239695826; Tue, 14 Jun 2022 13:48:15 -0700 (PDT) Reply-To: Sean Christopherson Date: Tue, 14 Jun 2022 20:47:30 +0000 In-Reply-To: <20220614204730.3359543-1-seanjc@google.com> Message-Id: <20220614204730.3359543-22-seanjc@google.com> 
Mime-Version: 1.0 References: <20220614204730.3359543-1-seanjc@google.com> X-Mailer: git-send-email 2.36.1.476.g0c4daa206d-goog Subject: [PATCH v2 21/21] KVM: selftests: Add an x86-only test to verify nested exception queueing From: Sean Christopherson To: Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Oliver Upton , Peter Shier Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add a test to verify that KVM_{G,S}ET_EVENTS play nice with pending vs. injected exceptions when an exception is being queued for L2, and that KVM correctly handles L1's exception intercept wants. Signed-off-by: Sean Christopherson Reviewed-by: Maxim Levitsky --- tools/testing/selftests/kvm/.gitignore | 1 + tools/testing/selftests/kvm/Makefile | 1 + .../kvm/x86_64/nested_exceptions_test.c | 295 ++++++++++++++++++ 3 files changed, 297 insertions(+) create mode 100644 tools/testing/selftests/kvm/x86_64/nested_exceptions_te= st.c diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftes= ts/kvm/.gitignore index 0ab0e255d292..7c8adb8cff83 100644 --- a/tools/testing/selftests/kvm/.gitignore +++ b/tools/testing/selftests/kvm/.gitignore @@ -27,6 +27,7 @@ /x86_64/hyperv_svm_test /x86_64/max_vcpuid_cap_test /x86_64/mmio_warning_test +/x86_64/nested_exceptions_test /x86_64/platform_info_test /x86_64/pmu_event_filter_test /x86_64/set_boot_cpu_id diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests= /kvm/Makefile index 2ca5400220b9..6db2dd5eca96 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -83,6 +83,7 @@ TEST_GEN_PROGS_x86_64 +=3D x86_64/hyperv_svm_test TEST_GEN_PROGS_x86_64 +=3D x86_64/kvm_clock_test TEST_GEN_PROGS_x86_64 +=3D x86_64/kvm_pv_test TEST_GEN_PROGS_x86_64 +=3D x86_64/mmio_warning_test 
+TEST_GEN_PROGS_x86_64 += x86_64/nested_exceptions_test
 TEST_GEN_PROGS_x86_64 += x86_64/platform_info_test
 TEST_GEN_PROGS_x86_64 += x86_64/pmu_event_filter_test
 TEST_GEN_PROGS_x86_64 += x86_64/set_boot_cpu_id
diff --git a/tools/testing/selftests/kvm/x86_64/nested_exceptions_test.c b/tools/testing/selftests/kvm/x86_64/nested_exceptions_test.c
new file mode 100644
index 000000000000..ac33835f78f4
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/nested_exceptions_test.c
@@ -0,0 +1,295 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#define _GNU_SOURCE /* for program_invocation_short_name */
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+#include "vmx.h"
+#include "svm_util.h"
+
+#define L2_GUEST_STACK_SIZE 256
+
+/*
+ * Arbitrary, never shoved into KVM/hardware, just need to avoid conflict with
+ * the "real" exceptions used, #SS/#GP/#DF (12/13/8).
+ */
+#define FAKE_TRIPLE_FAULT_VECTOR	0xaa
+
+/* Arbitrary 32-bit error code injected by this test. */
+#define SS_ERROR_CODE 0xdeadbeef
+
+/*
+ * Bit '0' is set on Intel if the exception occurs while delivering a previous
+ * event/exception.  AMD's wording is ambiguous, but presumably the bit is set
+ * if the exception occurs while delivering an external event, e.g. NMI or INTR,
+ * but not for exceptions that occur when delivering other exceptions or
+ * software interrupts.
+ *
+ * Note, Intel's name for it, "External event", is misleading and much more
+ * aligned with AMD's behavior, but the SDM is quite clear on its behavior.
+ */
+#define ERROR_CODE_EXT_FLAG	BIT(0)
+
+/*
+ * Bit '1' is set if the fault occurred when looking up a descriptor in the
+ * IDT, which is the case here as the IDT is empty/NULL.
+ */
+#define ERROR_CODE_IDT_FLAG	BIT(1)
+
+/*
+ * The #GP that occurs when vectoring #SS should show the index into the IDT
+ * for #SS, plus have the "IDT flag" set.
+ */
+#define GP_ERROR_CODE_AMD ((SS_VECTOR * 8) | ERROR_CODE_IDT_FLAG)
+#define GP_ERROR_CODE_INTEL ((SS_VECTOR * 8) | ERROR_CODE_IDT_FLAG | ERROR_CODE_EXT_FLAG)
+
+/*
+ * Intel and AMD both shove '0' into the error code on #DF, regardless of what
+ * led to the double fault.
+ */
+#define DF_ERROR_CODE 0
+
+#define INTERCEPT_SS		(BIT_ULL(SS_VECTOR))
+#define INTERCEPT_SS_DF		(INTERCEPT_SS | BIT_ULL(DF_VECTOR))
+#define INTERCEPT_SS_GP_DF	(INTERCEPT_SS_DF | BIT_ULL(GP_VECTOR))
+
+static void l2_ss_pending_test(void)
+{
+	GUEST_SYNC(SS_VECTOR);
+}
+
+static void l2_ss_injected_gp_test(void)
+{
+	GUEST_SYNC(GP_VECTOR);
+}
+
+static void l2_ss_injected_df_test(void)
+{
+	GUEST_SYNC(DF_VECTOR);
+}
+
+static void l2_ss_injected_tf_test(void)
+{
+	GUEST_SYNC(FAKE_TRIPLE_FAULT_VECTOR);
+}
+
+static void svm_run_l2(struct svm_test_data *svm, void *l2_code, int vector,
+		       uint32_t error_code)
+{
+	struct vmcb *vmcb = svm->vmcb;
+	struct vmcb_control_area *ctrl = &vmcb->control;
+
+	vmcb->save.rip = (u64)l2_code;
+	run_guest(vmcb, svm->vmcb_gpa);
+
+	if (vector == FAKE_TRIPLE_FAULT_VECTOR)
+		return;
+
+	GUEST_ASSERT_EQ(ctrl->exit_code, (SVM_EXIT_EXCP_BASE + vector));
+	GUEST_ASSERT_EQ(ctrl->exit_info_1, error_code);
+}
+
+static void l1_svm_code(struct svm_test_data *svm)
+{
+	struct vmcb_control_area *ctrl = &svm->vmcb->control;
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+
+	generic_svm_setup(svm, NULL, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+	svm->vmcb->save.idtr.limit = 0;
+	ctrl->intercept |= BIT_ULL(INTERCEPT_SHUTDOWN);
+
+	ctrl->intercept_exceptions = INTERCEPT_SS_GP_DF;
+	svm_run_l2(svm, l2_ss_pending_test, SS_VECTOR, SS_ERROR_CODE);
+	svm_run_l2(svm, l2_ss_injected_gp_test, GP_VECTOR, GP_ERROR_CODE_AMD);
+
+	ctrl->intercept_exceptions = INTERCEPT_SS_DF;
+	svm_run_l2(svm, l2_ss_injected_df_test, DF_VECTOR, DF_ERROR_CODE);
+
+	ctrl->intercept_exceptions = INTERCEPT_SS;
+	svm_run_l2(svm, l2_ss_injected_tf_test, FAKE_TRIPLE_FAULT_VECTOR, 0);
+	GUEST_ASSERT_EQ(ctrl->exit_code, SVM_EXIT_SHUTDOWN);
+
+	GUEST_DONE();
+}
+
+static void vmx_run_l2(void *l2_code, int vector, uint32_t error_code)
+{
+	GUEST_ASSERT(!vmwrite(GUEST_RIP, (u64)l2_code));
+
+	GUEST_ASSERT_EQ(vector == SS_VECTOR ? vmlaunch() : vmresume(), 0);
+
+	if (vector == FAKE_TRIPLE_FAULT_VECTOR)
+		return;
+
+	GUEST_ASSERT_EQ(vmreadz(VM_EXIT_REASON), EXIT_REASON_EXCEPTION_NMI);
+	GUEST_ASSERT_EQ((vmreadz(VM_EXIT_INTR_INFO) & 0xff), vector);
+	GUEST_ASSERT_EQ(vmreadz(VM_EXIT_INTR_ERROR_CODE), error_code);
+}
+
+static void l1_vmx_code(struct vmx_pages *vmx)
+{
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+
+	GUEST_ASSERT_EQ(prepare_for_vmx_operation(vmx), true);
+
+	GUEST_ASSERT_EQ(load_vmcs(vmx), true);
+
+	prepare_vmcs(vmx, NULL, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+	GUEST_ASSERT_EQ(vmwrite(GUEST_IDTR_LIMIT, 0), 0);
+
+	/*
+	 * VMX disallows injecting an exception with error_code[31:16] != 0,
+	 * and hardware will never generate a VM-Exit with bits 31:16 set.
+	 * KVM should likewise truncate the "bad" userspace value.
+	 */
+	GUEST_ASSERT_EQ(vmwrite(EXCEPTION_BITMAP, INTERCEPT_SS_GP_DF), 0);
+	vmx_run_l2(l2_ss_pending_test, SS_VECTOR, (u16)SS_ERROR_CODE);
+	vmx_run_l2(l2_ss_injected_gp_test, GP_VECTOR, GP_ERROR_CODE_INTEL);
+
+	GUEST_ASSERT_EQ(vmwrite(EXCEPTION_BITMAP, INTERCEPT_SS_DF), 0);
+	vmx_run_l2(l2_ss_injected_df_test, DF_VECTOR, DF_ERROR_CODE);
+
+	GUEST_ASSERT_EQ(vmwrite(EXCEPTION_BITMAP, INTERCEPT_SS), 0);
+	vmx_run_l2(l2_ss_injected_tf_test, FAKE_TRIPLE_FAULT_VECTOR, 0);
+	GUEST_ASSERT_EQ(vmreadz(VM_EXIT_REASON), EXIT_REASON_TRIPLE_FAULT);
+
+	GUEST_DONE();
+}
+
+static void __attribute__((__flatten__)) l1_guest_code(void *test_data)
+{
+	if (this_cpu_has(X86_FEATURE_SVM))
+		l1_svm_code(test_data);
+	else
+		l1_vmx_code(test_data);
+}
+
+static void assert_ucall_vector(struct kvm_vcpu *vcpu, int vector)
+{
+	struct kvm_run *run = vcpu->run;
+	struct ucall uc;
+
+	TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+		    "Unexpected exit reason: %u (%s)\n",
+		    run->exit_reason, exit_reason_str(run->exit_reason));
+
+	switch (get_ucall(vcpu, &uc)) {
+	case UCALL_SYNC:
+		TEST_ASSERT(vector == uc.args[1],
+			    "Expected L2 to ask for %d, got %ld", vector, uc.args[1]);
+		break;
+	case UCALL_DONE:
+		TEST_ASSERT(vector == -1,
+			    "Expected L2 to ask for %d, L2 says it's done", vector);
+		break;
+	case UCALL_ABORT:
+		TEST_FAIL("%s at %s:%ld (0x%lx != 0x%lx)",
+			  (const char *)uc.args[0], __FILE__, uc.args[1],
+			  uc.args[2], uc.args[3]);
+		break;
+	default:
+		TEST_FAIL("Expected L2 to ask for %d, got unexpected ucall %lu", vector, uc.cmd);
+	}
+}
+
+static void queue_ss_exception(struct kvm_vcpu *vcpu, bool inject)
+{
+	struct kvm_vcpu_events events;
+
+	vcpu_events_get(vcpu, &events);
+
+	TEST_ASSERT(!events.exception.pending,
+		    "Vector %d unexpectedly pending", events.exception.nr);
+	TEST_ASSERT(!events.exception.injected,
+		    "Vector %d unexpectedly injected", events.exception.nr);
+
+	events.flags = KVM_VCPUEVENT_VALID_PAYLOAD;
+	events.exception.pending = !inject;
+	events.exception.injected = inject;
+	events.exception.nr = SS_VECTOR;
+	events.exception.has_error_code = true;
+	events.exception.error_code = SS_ERROR_CODE;
+	vcpu_events_set(vcpu, &events);
+}
+
+/*
+ * Verify KVM_{G,S}ET_EVENTS play nice with pending vs. injected exceptions
+ * when an exception is being queued for L2.  Specifically, verify that KVM
+ * honors L1 exception intercept controls when a #SS is pending/injected,
+ * triggers a #GP on vectoring the #SS, morphs to #DF if #GP isn't intercepted
+ * by L1, and finally causes (nested) SHUTDOWN if #DF isn't intercepted by L1.
+ */
+int main(int argc, char *argv[])
+{
+	vm_vaddr_t nested_test_data_gva;
+	struct kvm_vcpu_events events;
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+
+	TEST_REQUIRE(kvm_has_cap(KVM_CAP_EXCEPTION_PAYLOAD));
+	TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_SVM) || kvm_cpu_has(X86_FEATURE_VMX));
+
+	vm = vm_create_with_one_vcpu(&vcpu, l1_guest_code);
+	vm_enable_cap(vm, KVM_CAP_EXCEPTION_PAYLOAD, -2ul);
+
+	if (kvm_cpu_has(X86_FEATURE_SVM))
+		vcpu_alloc_svm(vm, &nested_test_data_gva);
+	else
+		vcpu_alloc_vmx(vm, &nested_test_data_gva);
+
+	vcpu_args_set(vcpu, 1, nested_test_data_gva);
+
+	/* Run L1 => L2.  L2 should sync and request #SS. */
+	vcpu_run(vcpu);
+	assert_ucall_vector(vcpu, SS_VECTOR);
+
+	/* Pend #SS and request immediate exit.  #SS should still be pending. */
+	queue_ss_exception(vcpu, false);
+	vcpu->run->immediate_exit = true;
+	vcpu_run_complete_io(vcpu);
+
+	/* Verify the pending event comes back out the same as it went in. */
+	vcpu_events_get(vcpu, &events);
+	ASSERT_EQ(events.flags & KVM_VCPUEVENT_VALID_PAYLOAD,
+		  KVM_VCPUEVENT_VALID_PAYLOAD);
+	ASSERT_EQ(events.exception.pending, true);
+	ASSERT_EQ(events.exception.nr, SS_VECTOR);
+	ASSERT_EQ(events.exception.has_error_code, true);
+	ASSERT_EQ(events.exception.error_code, SS_ERROR_CODE);
+
+	/*
+	 * Run for real with the pending #SS, L1 should get a VM-Exit due to
+	 * #SS interception and re-enter L2 to request #GP (via injected #SS).
+	 */
+	vcpu->run->immediate_exit = false;
+	vcpu_run(vcpu);
+	assert_ucall_vector(vcpu, GP_VECTOR);
+
+	/*
+	 * Inject #SS, the #SS should bypass interception and cause #GP, which
+	 * L1 should intercept before KVM morphs it to #DF.  L1 should then
+	 * disable #GP interception and run L2 to request #DF (via #SS => #GP).
+	 */
+	queue_ss_exception(vcpu, true);
+	vcpu_run(vcpu);
+	assert_ucall_vector(vcpu, DF_VECTOR);
+
+	/*
+	 * Inject #SS, the #SS should bypass interception and cause #GP, which
+	 * L1 is no longer intercepting, and so morphs to #DF, which L1 does
+	 * intercept.  L1 should see a #DF VM-Exit, then disable #DF
+	 * interception and run L2 to request a (fake) TRIPLE_FAULT.
+	 */
+	queue_ss_exception(vcpu, true);
+	vcpu_run(vcpu);
+	assert_ucall_vector(vcpu, FAKE_TRIPLE_FAULT_VECTOR);
+
+	/*
+	 * Inject #SS yet again.  L1 is not intercepting #GP or #DF, and so
+	 * should see nested TRIPLE_FAULT / SHUTDOWN, at which point L1
+	 * signals that it is done.
+	 */
+	queue_ss_exception(vcpu, true);
+	vcpu_run(vcpu);
+	assert_ucall_vector(vcpu, -1);
+
+	kvm_vm_free(vm);
+}
-- 
2.36.1.476.g0c4daa206d-goog