From nobody Tue Apr 7 05:23:32 2026
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson, Maxim Levitsky, Oliver Upton, Peter Shier
Subject: [PATCH v5 01/27] KVM: nVMX: Unconditionally purge queued/injected events on nested "exit"
Date: Tue, 30 Aug 2022 23:15:48 +0000
Message-ID: <20220830231614.3580124-2-seanjc@google.com>
In-Reply-To: <20220830231614.3580124-1-seanjc@google.com>
References: <20220830231614.3580124-1-seanjc@google.com>

Drop pending exceptions and events queued for re-injection when leaving
nested guest mode, even if the "exit" is due to VM-Fail, SMI, or forced
by host userspace.  Failure to purge events could result in an event
belonging to L2 being injected into L1.  This _should_ never happen for
VM-Fail as all events should be blocked by nested_run_pending, but it's
possible if KVM, not the L1 hypervisor, is the source of VM-Fail when
running vmcs02.
SMI is a nop (barring unknown bugs) as recognition of SMI and thus entry
to SMM is blocked by pending exceptions and re-injected events.

Forced exit is definitely buggy, but has likely gone unnoticed because
userspace probably follows the forced exit with KVM_SET_VCPU_EVENTS (or
some other ioctl() that purges the queue).

Fixes: 4f350c6dbcb9 ("kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson
Reviewed-by: Jim Mattson
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/vmx/nested.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index ddd4367d4826..ca07d4ce4383 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4255,14 +4255,6 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 		nested_vmx_abort(vcpu, VMX_ABORT_SAVE_GUEST_MSR_FAIL);
 	}
-
-	/*
-	 * Drop what we picked up for L2 via vmx_complete_interrupts. It is
-	 * preserved above and would only end up incorrectly in L1.
-	 */
-	vcpu->arch.nmi_injected = false;
-	kvm_clear_exception_queue(vcpu);
-	kvm_clear_interrupt_queue(vcpu);
 }
 
 /*
@@ -4602,6 +4594,17 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 		WARN_ON_ONCE(nested_early_check);
 	}
 
+	/*
+	 * Drop events/exceptions that were queued for re-injection to L2
+	 * (picked up via vmx_complete_interrupts()), as well as exceptions
+	 * that were pending for L2.  Note, this must NOT be hoisted above
+	 * prepare_vmcs12(), events/exceptions queued for re-injection need to
+	 * be captured in vmcs12 (see vmcs12_save_pending_event()).
+	 */
+	vcpu->arch.nmi_injected = false;
+	kvm_clear_exception_queue(vcpu);
+	kvm_clear_interrupt_queue(vcpu);
+
 	vmx_switch_vmcs(vcpu, &vmx->vmcs01);
 
 	/* Update any VMCS fields that might have changed while L2 ran */
-- 
2.37.2.672.g94769d06f0-goog
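[Editor's note] The ordering constraint the new comment spells out — snapshot events into vmcs12 via prepare_vmcs12() *before* purging the vCPU-level queues — can be sketched with a toy model. The struct and helpers below are invented for illustration and are not KVM code:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the vCPU event state and the vmcs12 snapshot. */
struct toy_vcpu {
	bool nmi_injected;
	int pending_exception;	/* -1 == no exception pending */
	int vmcs12_saved_event;	/* what L1 will observe in vmcs12 */
};

/* Model of prepare_vmcs12(): capture the event for L1's consumption. */
static void toy_prepare_vmcs12(struct toy_vcpu *v)
{
	v->vmcs12_saved_event = v->pending_exception;
}

/* Model of the purge added by this patch; it must run AFTER the snapshot
 * so the event reaches vmcs12 but is never injected into L1 directly. */
static void toy_purge_events(struct toy_vcpu *v)
{
	v->nmi_injected = false;
	v->pending_exception = -1;
}
```

With that ordering, L1 sees the L2 event in vmcs12, while KVM's own queues are empty and cannot leak the event into L1.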
From nobody Tue Apr 7 05:23:32 2026
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson, Maxim Levitsky, Oliver Upton, Peter Shier
Subject: [PATCH v5 02/27] KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS
Date: Tue, 30 Aug 2022 23:15:49 +0000
Message-ID: <20220830231614.3580124-3-seanjc@google.com>
In-Reply-To: <20220830231614.3580124-1-seanjc@google.com>
References: <20220830231614.3580124-1-seanjc@google.com>

Deliberately truncate the exception error code when shoving it into the
VMCS (VM-Entry field for vmcs01 and vmcs02, VM-Exit field for vmcs12).
Intel CPUs are incapable of handling 32-bit error codes and will never
generate an error code with bits 31:16, but userspace can provide an
arbitrary error code via KVM_SET_VCPU_EVENTS.
Failure to drop the bits on exception injection results in failed
VM-Entry, as VMX disallows setting bits 31:16.  Setting the bits on
VM-Exit would at best confuse L1, and at worst induce a nested VM-Entry
failure, e.g. if L1 decided to reinject the exception back into L2.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson
Reviewed-by: Jim Mattson
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/vmx/nested.c | 11 ++++++++++-
 arch/x86/kvm/vmx/vmx.c    | 12 +++++++++++-
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index ca07d4ce4383..2ca8f1ad9c59 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3827,7 +3827,16 @@ static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
 	u32 intr_info = nr | INTR_INFO_VALID_MASK;
 
 	if (vcpu->arch.exception.has_error_code) {
-		vmcs12->vm_exit_intr_error_code = vcpu->arch.exception.error_code;
+		/*
+		 * Intel CPUs do not generate error codes with bits 31:16 set,
+		 * and more importantly VMX disallows setting bits 31:16 in the
+		 * injected error code for VM-Entry.  Drop the bits to mimic
+		 * hardware and avoid inducing failure on nested VM-Entry if L1
+		 * chooses to inject the exception back to L2.  AMD CPUs _do_
+		 * generate "full" 32-bit error codes, so KVM allows userspace
+		 * to inject exception error codes with bits 31:16 set.
+		 */
+		vmcs12->vm_exit_intr_error_code = (u16)vcpu->arch.exception.error_code;
 		intr_info |= INTR_INFO_DELIVER_CODE_MASK;
 	}
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c9b49a09e6b5..7f3581960eb5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1695,7 +1695,17 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
 	kvm_deliver_exception_payload(vcpu);
 
 	if (has_error_code) {
-		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, error_code);
+		/*
+		 * Despite the error code being architecturally defined as 32
+		 * bits, and the VMCS field being 32 bits, Intel CPUs and thus
+		 * VMX don't actually support setting bits 31:16.  Hardware
+		 * will (should) never provide a bogus error code, but AMD CPUs
+		 * do generate error codes with bits 31:16 set, and so KVM's
+		 * ABI lets userspace shove in arbitrary 32-bit values.  Drop
+		 * the upper bits to avoid VM-Fail; losing information that
+		 * doesn't really exist is preferable to killing the VM.
+		 */
+		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, (u16)error_code);
 		intr_info |= INTR_INFO_DELIVER_CODE_MASK;
 	}
 
-- 
2.37.2.672.g94769d06f0-goog
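[Editor's note] The truncation both hunks rely on is plain C integer conversion: a cast to u16 keeps only bits 15:0, and assigning the result to a 32-bit field zero-extends it with bits 31:16 clear. A minimal standalone sketch (uint16_t standing in for the kernel's u16):

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of the patch's truncation: the (u16) cast drops error-code
 * bits 31:16, and the return value zero-extends back to 32 bits. */
static uint32_t truncate_error_code(uint32_t error_code)
{
	return (uint16_t)error_code;
}
/* e.g. truncate_error_code(0xDEAD0002) == 0x0002, while hardware-style
 * error codes with bits 31:16 already clear pass through unchanged. */
```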
From nobody Tue Apr 7 05:23:32 2026
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson, Maxim Levitsky, Oliver Upton, Peter Shier
Subject: [PATCH v5 03/27] KVM: x86: Don't check for code breakpoints when emulating on exception
Date: Tue, 30 Aug 2022 23:15:50 +0000
Message-ID: <20220830231614.3580124-4-seanjc@google.com>
In-Reply-To: <20220830231614.3580124-1-seanjc@google.com>
References: <20220830231614.3580124-1-seanjc@google.com>

Don't check for code breakpoints during instruction emulation if the
emulation was triggered by exception interception.  Code breakpoints
are the highest priority fault-like exception, and KVM only emulates on
exceptions that are fault-like.  Thus, if hardware signaled a different
exception, then the vCPU has already passed the stage of checking for
hardware breakpoints.

This is likely a glorified nop in terms of functionality, and is more
for clarification, though it is technically an optimization.  Intel's
SDM explicitly states that vmcs.GUEST_RFLAGS.RF on exception
interception is the same as the value that would have been saved on the
stack had the exception not been intercepted, i.e. will be '1' due to
all fault-like exceptions setting RF to '1'.  AMD says "guest state
saved ... is the processor state as of the moment the intercept
triggers", but that begs the question, "when does the intercept
trigger?".
Signed-off-by: Sean Christopherson
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/x86.c | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d7374d768296..fead0e8cd3e3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8535,8 +8535,29 @@ int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_skip_emulated_instruction);
 
-static bool kvm_vcpu_check_code_breakpoint(struct kvm_vcpu *vcpu, int *r)
+static bool kvm_vcpu_check_code_breakpoint(struct kvm_vcpu *vcpu,
+					   int emulation_type, int *r)
 {
+	WARN_ON_ONCE(emulation_type & EMULTYPE_NO_DECODE);
+
+	/*
+	 * Do not check for code breakpoints if hardware has already done the
+	 * checks, as inferred from the emulation type.  On NO_DECODE and SKIP,
+	 * the instruction has passed all exception checks, and all intercepted
+	 * exceptions that trigger emulation have lower priority than code
+	 * breakpoints, i.e. the fact that the intercepted exception occurred
+	 * means any code breakpoints have already been serviced.
+	 *
+	 * Note, KVM needs to check for code #DBs on EMULTYPE_TRAP_UD_FORCED as
+	 * hardware has checked the RIP of the magic prefix, but not the RIP of
+	 * the instruction being emulated.  The intent of forced emulation is
+	 * to behave as if KVM intercepted the instruction without an exception
+	 * and without a prefix.
+	 */
+	if (emulation_type & (EMULTYPE_NO_DECODE | EMULTYPE_SKIP |
+			      EMULTYPE_TRAP_UD | EMULTYPE_VMWARE_GP | EMULTYPE_PF))
+		return false;
+
 	if (unlikely(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP) &&
 	    (vcpu->arch.guest_debug_dr7 & DR7_BP_EN_MASK)) {
 		struct kvm_run *kvm_run = vcpu->run;
@@ -8658,8 +8679,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	 * are fault-like and are higher priority than any faults on
 	 * the code fetch itself.
 	 */
-	if (!(emulation_type & EMULTYPE_SKIP) &&
-	    kvm_vcpu_check_code_breakpoint(vcpu, &r))
+	if (kvm_vcpu_check_code_breakpoint(vcpu, emulation_type, &r))
 		return r;
 
 	r = x86_decode_emulated_instruction(vcpu, emulation_type,
-- 
2.37.2.672.g94769d06f0-goog
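[Editor's note] The early-out added above is a simple flag-mask test. A standalone sketch follows; the EMULTYPE_* values here are stand-ins chosen for illustration, the authoritative definitions live in arch/x86/include/asm/kvm_host.h:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for KVM's EMULTYPE_* flags (values assumed). */
#define EMULTYPE_NO_DECODE	(1 << 0)
#define EMULTYPE_TRAP_UD	(1 << 1)
#define EMULTYPE_SKIP		(1 << 2)
#define EMULTYPE_TRAP_UD_FORCED	(1 << 4)
#define EMULTYPE_VMWARE_GP	(1 << 5)
#define EMULTYPE_PF		(1 << 6)

/* Sketch of the early-out: skip code-#DB checks when hardware already
 * performed them, as inferred from the emulation type.  TRAP_UD_FORCED
 * is deliberately absent from the mask, as hardware only checked the
 * RIP of the magic prefix, not the RIP of the emulated instruction. */
static bool skip_code_breakpoint_checks(int emulation_type)
{
	return emulation_type & (EMULTYPE_NO_DECODE | EMULTYPE_SKIP |
				 EMULTYPE_TRAP_UD | EMULTYPE_VMWARE_GP |
				 EMULTYPE_PF);
}
```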
From nobody Tue Apr 7 05:23:32 2026
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson, Maxim Levitsky, Oliver Upton, Peter Shier
Subject: [PATCH v5 04/27] KVM: x86: Allow clearing RFLAGS.RF on forced emulation to test code #DBs
Date: Tue, 30 Aug 2022 23:15:51 +0000
Message-ID: <20220830231614.3580124-5-seanjc@google.com>
In-Reply-To: <20220830231614.3580124-1-seanjc@google.com>
References: <20220830231614.3580124-1-seanjc@google.com>

Extend force_emulation_prefix to an 'int' and use bit 1 as a flag to
indicate that KVM should clear RFLAGS.RF before emulating, e.g. to
allow tests to force emulation of code breakpoints in conjunction with
MOV/POP SS blocking, which is impossible without KVM intervention as
VMX unconditionally sets RFLAGS.RF on intercepted #UD.  Make the
behavior controllable so that tests can also test RFLAGS.RF=1 (again in
conjunction with code #DBs).

Note, clearing RFLAGS.RF won't create an infinite #DB loop as the
guest's IRET from the #DB handler will return to the instruction and
not the prefix, i.e. the restart won't force emulation.

Opportunistically convert the permissions to the preferred octal format.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fead0e8cd3e3..7403221c868e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -173,8 +173,13 @@ bool __read_mostly enable_vmware_backdoor = false;
 module_param(enable_vmware_backdoor, bool, S_IRUGO);
 EXPORT_SYMBOL_GPL(enable_vmware_backdoor);
 
-static bool __read_mostly force_emulation_prefix = false;
-module_param(force_emulation_prefix, bool, S_IRUGO);
+/*
+ * Flags to manipulate forced emulation behavior (any non-zero value will
+ * enable forced emulation).
+ */
+#define KVM_FEP_CLEAR_RFLAGS_RF	BIT(1)
+static int __read_mostly force_emulation_prefix;
+module_param(force_emulation_prefix, int, 0444);
 
 int __read_mostly pi_inject_timer = -1;
 module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
@@ -7255,6 +7260,8 @@ int handle_ud(struct kvm_vcpu *vcpu)
 	    kvm_read_guest_virt(vcpu, kvm_get_linear_rip(vcpu), sig,
 				sizeof(sig), &e) == 0 &&
 	    memcmp(sig, kvm_emulate_prefix, sizeof(sig)) == 0) {
+		if (force_emulation_prefix & KVM_FEP_CLEAR_RFLAGS_RF)
+			kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) & ~X86_EFLAGS_RF);
 		kvm_rip_write(vcpu, kvm_rip_read(vcpu) + sizeof(sig));
 		emul_type = EMULTYPE_TRAP_UD_FORCED;
 	}
-- 
2.37.2.672.g94769d06f0-goog
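[Editor's note] The module param now doubles as a bitmask: any non-zero value enables forced emulation, and bit 1 (KVM_FEP_CLEAR_RFLAGS_RF) additionally clears RFLAGS.RF. A sketch of that decision, with X86_EFLAGS_RF at its architectural bit position 16:

```c
#include <assert.h>
#include <stdint.h>

#define KVM_FEP_CLEAR_RFLAGS_RF	(1u << 1)	/* BIT(1), as in the patch */
#define X86_EFLAGS_RF		(1ull << 16)	/* RFLAGS.RF is bit 16 */

/* Sketch of the conditional clear performed in handle_ud(): bit 1 of
 * force_emulation_prefix clears RFLAGS.RF before emulating, any other
 * non-zero value leaves RF untouched. */
static uint64_t maybe_clear_rf(unsigned int force_emulation_prefix,
			       uint64_t rflags)
{
	if (force_emulation_prefix & KVM_FEP_CLEAR_RFLAGS_RF)
		rflags &= ~X86_EFLAGS_RF;
	return rflags;
}
```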
From nobody Tue Apr 7 05:23:32 2026
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson, Maxim Levitsky, Oliver Upton, Peter Shier
Subject: [PATCH v5 05/27] KVM: x86: Suppress code #DBs on Intel if MOV/POP SS blocking is active
Date: Tue, 30 Aug 2022 23:15:52 +0000
Message-ID: <20220830231614.3580124-6-seanjc@google.com>
In-Reply-To: <20220830231614.3580124-1-seanjc@google.com>
References: <20220830231614.3580124-1-seanjc@google.com>

Suppress code breakpoints if MOV/POP SS blocking is active and the
guest CPU is Intel, i.e. if the guest thinks it's running on an Intel
CPU.  Intel CPUs inhibit code #DBs when MOV/POP SS blocking is active,
whereas AMD (and its descendants) do not.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7403221c868e..013580c355d7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8542,6 +8542,23 @@ int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_skip_emulated_instruction);
 
+static bool kvm_is_code_breakpoint_inhibited(struct kvm_vcpu *vcpu)
+{
+	u32 shadow;
+
+	if (kvm_get_rflags(vcpu) & X86_EFLAGS_RF)
+		return true;
+
+	/*
+	 * Intel CPUs inhibit code #DBs when MOV/POP SS blocking is active,
+	 * but AMD CPUs do not.  MOV/POP SS blocking is rare, check that first
+	 * to avoid the relatively expensive CPUID lookup.
+	 */
+	shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
+	return (shadow & KVM_X86_SHADOW_INT_MOV_SS) &&
+	       guest_cpuid_is_intel(vcpu);
+}
+
 static bool kvm_vcpu_check_code_breakpoint(struct kvm_vcpu *vcpu,
 					   int emulation_type, int *r)
 {
@@ -8584,7 +8601,7 @@ static bool kvm_vcpu_check_code_breakpoint(struct kvm_vcpu *vcpu,
 	}
 
 	if (unlikely(vcpu->arch.dr7 & DR7_BP_EN_MASK) &&
-	    !(kvm_get_rflags(vcpu) & X86_EFLAGS_RF)) {
+	    !kvm_is_code_breakpoint_inhibited(vcpu)) {
 		unsigned long eip = kvm_get_linear_rip(vcpu);
 		u32 dr6 = kvm_vcpu_check_hw_bp(eip, 0, vcpu->arch.dr7,
-- 
2.37.2.672.g94769d06f0-goog

From nobody Tue Apr 7 05:23:32 2026
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson, Maxim Levitsky, Oliver Upton, Peter Shier
Subject: [PATCH v5 06/27] KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like
Date: Tue, 30 Aug 2022 23:15:53 +0000
Message-ID: <20220830231614.3580124-7-seanjc@google.com>
In-Reply-To: <20220830231614.3580124-1-seanjc@google.com>
References: <20220830231614.3580124-1-seanjc@google.com>
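[Editor's note] The new predicate combines two inhibition sources. A vendor-neutral sketch (the flag encodings below are illustrative stand-ins, not KVM's UAPI values):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins; KVM_X86_SHADOW_INT_MOV_SS and X86_EFLAGS_RF
 * have authoritative definitions in KVM's UAPI and x86 headers. */
#define SHADOW_INT_MOV_SS	(1u << 0)
#define RFLAGS_RF		(1ull << 16)

/* Sketch of kvm_is_code_breakpoint_inhibited(): RFLAGS.RF inhibits code
 * #DBs on all vendors, while MOV/POP SS blocking inhibits them only
 * when the guest believes it is running on an Intel CPU. */
static bool code_bp_inhibited(uint64_t rflags, uint32_t shadow,
			      bool guest_is_intel)
{
	if (rflags & RFLAGS_RF)
		return true;
	return (shadow & SHADOW_INT_MOV_SS) && guest_is_intel;
}
```

The real code checks the interrupt shadow before the vendor, as the shadow check is much cheaper than the CPUID-based vendor lookup.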
linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Exclude General Detect #DBs, which have fault-like behavior but also have a non-zero payload (DR6.BD=3D1), from nVMX's handling of pending debug traps. Opportunistically rewrite the comment to better document what is being checked, i.e. "has a non-zero payload" vs. "has a payload", and to call out the many caveats surrounding #DBs that KVM dodges one way or another. Cc: Oliver Upton Cc: Peter Shier Fixes: 684c0422da71 ("KVM: nVMX: Handle pending #DB when injecting INIT VM-= exit") Signed-off-by: Sean Christopherson Reviewed-by: Maxim Levitsky --- arch/x86/kvm/vmx/nested.c | 36 +++++++++++++++++++++++++----------- 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 2ca8f1ad9c59..b540c7bf4753 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -3853,16 +3853,29 @@ static void nested_vmx_inject_exception_vmexit(stru= ct kvm_vcpu *vcpu, } =20 /* - * Returns true if a debug trap is pending delivery. + * Returns true if a debug trap is (likely) pending delivery. Infer the c= lass + * of a #DB (trap-like vs. fault-like) from the exception payload (to-be-D= R6). + * Using the payload is flawed because code breakpoints (fault-like) and d= ata + * breakpoints (trap-like) set the same bits in DR6 (breakpoint detected),= i.e. + * this will return false positives if a to-be-injected code breakpoint #D= B is + * pending (from KVM's perspective, but not "pending" across an instruction + * boundary). ICEBP, a.k.a. INT1, is also not reflected here even though = it + * too is trap-like. * - * In KVM, debug traps bear an exception payload. As such, the class of a = #DB - * exception may be inferred from the presence of an exception payload. 
+ * KVM "works" despite these flaws as ICEBP isn't currently supported by the
+ * emulator, Monitor Trap Flag is not marked pending on intercepted #DBs (the
+ * #DB has already happened), and MTF isn't marked pending on code breakpoints
+ * from the emulator (because such #DBs are fault-like and thus don't trigger
+ * actions that fire on instruction retire).
  */
-static inline bool vmx_pending_dbg_trap(struct kvm_vcpu *vcpu)
+static inline unsigned long vmx_get_pending_dbg_trap(struct kvm_vcpu *vcpu)
 {
-	return vcpu->arch.exception.pending &&
-	       vcpu->arch.exception.nr == DB_VECTOR &&
-	       vcpu->arch.exception.payload;
+	if (!vcpu->arch.exception.pending ||
+	    vcpu->arch.exception.nr != DB_VECTOR)
+		return 0;
+
+	/* General Detect #DBs are always fault-like. */
+	return vcpu->arch.exception.payload & ~DR6_BD;
 }
 
 /*
@@ -3874,9 +3887,10 @@ static inline bool vmx_pending_dbg_trap(struct kvm_vcpu *vcpu)
  */
 static void nested_vmx_update_pending_dbg(struct kvm_vcpu *vcpu)
 {
-	if (vmx_pending_dbg_trap(vcpu))
-		vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
-			    vcpu->arch.exception.payload);
+	unsigned long pending_dbg = vmx_get_pending_dbg_trap(vcpu);
+
+	if (pending_dbg)
+		vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, pending_dbg);
 }
 
 static bool nested_vmx_preemption_timer_pending(struct kvm_vcpu *vcpu)
@@ -3933,7 +3947,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
	 * while delivering the pending exception.
	 */
 
-	if (vcpu->arch.exception.pending && !vmx_pending_dbg_trap(vcpu)) {
+	if (vcpu->arch.exception.pending && !vmx_get_pending_dbg_trap(vcpu)) {
 		if (vmx->nested.nested_run_pending)
 			return -EBUSY;
 		if (!nested_vmx_check_exception(vcpu, &exit_qual))
-- 
2.37.2.672.g94769d06f0-goog

From nobody Tue Apr 7 05:23:32 2026
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson, Maxim Levitsky, Oliver Upton, Peter Shier
Date: Tue, 30 Aug 2022 23:15:54 +0000
Message-ID: <20220830231614.3580124-8-seanjc@google.com>
In-Reply-To: <20220830231614.3580124-1-seanjc@google.com>
Subject: [PATCH v5 07/27] KVM: nVMX: Prioritize TSS T-flag #DBs over Monitor Trap Flag

Service TSS T-flag #DBs prior to pending MTFs, as such #DBs are higher
priority than MTF.
KVM itself doesn't emulate TSS #DBs, and any such exceptions injected
from L1 will be handled by hardware (or morphed to a fault-like exception
if injection fails), but theoretically userspace could pend a TSS T-flag
#DB in conjunction with a pending MTF.

Note, there's no known use case this fixes; it's purely to be technically
correct with respect to Intel's SDM.

Cc: Oliver Upton
Cc: Peter Shier
Fixes: 5ef8acbdd687 ("KVM: nVMX: Emulate MTF when performing instruction emulation")
Signed-off-by: Sean Christopherson
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/vmx/nested.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index b540c7bf4753..5298457b3a1f 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3939,15 +3939,17 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 	}
 
 	/*
-	 * Process any exceptions that are not debug traps before MTF.
+	 * Process exceptions that are higher priority than Monitor Trap Flag:
+	 * fault-like exceptions, TSS T flag #DB (not emulated by KVM, but
+	 * could theoretically come in from userspace), and ICEBP (INT1).
	 *
	 * Note that only a pending nested run can block a pending exception.
	 * Otherwise an injected NMI/interrupt should either be
	 * lost or delivered to the nested hypervisor in the IDT_VECTORING_INFO,
	 * while delivering the pending exception.
	 */
-
-	if (vcpu->arch.exception.pending && !vmx_get_pending_dbg_trap(vcpu)) {
+	if (vcpu->arch.exception.pending &&
+	    !(vmx_get_pending_dbg_trap(vcpu) & ~DR6_BT)) {
 		if (vmx->nested.nested_run_pending)
 			return -EBUSY;
 		if (!nested_vmx_check_exception(vcpu, &exit_qual))
-- 
2.37.2.672.g94769d06f0-goog

From nobody Tue Apr 7 05:23:32 2026
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson, Maxim Levitsky, Oliver Upton, Peter Shier
Date: Tue, 30 Aug 2022 23:15:55 +0000
Message-ID: <20220830231614.3580124-9-seanjc@google.com>
In-Reply-To: <20220830231614.3580124-1-seanjc@google.com>
Subject: [PATCH v5 08/27] KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)

Add a dedicated "exception type" for #DBs, as #DBs can be fault-like or
trap-like depending on the sub-type of #DB, and effectively defer the
decision of what to do with the #DB to the caller.
For the emulator's two calls to exception_type(), treat the #DB as
fault-like, as the emulator handles only code breakpoint and general
detect #DBs, both of which are fault-like.

For event injection, which uses exception_type() to determine whether to
set EFLAGS.RF=1 on the stack, keep the current behavior of not setting
RF=1 for #DBs.  Intel and AMD explicitly state RF isn't set on code #DBs,
so exempting them by failing the "== EXCPT_FAULT" check is correct.  The
only other fault-like #DB is General Detect, and despite Intel and AMD
both strongly implying (through omission) that General Detect #DBs should
set RF=1, hardware (multiple generations of both Intel and AMD) in fact
does not.  Through insider knowledge, extreme foresight, sheer dumb luck,
or some combination thereof, KVM correctly handled RF for General Detect
#DBs.

Fixes: 38827dbd3fb8 ("KVM: x86: Do not update EFLAGS on faulting emulation")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/x86.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 013580c355d7..39d3eadc43a2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -533,6 +533,7 @@ static int exception_class(int vector)
 #define EXCPT_TRAP		1
 #define EXCPT_ABORT		2
 #define EXCPT_INTERRUPT	3
+#define EXCPT_DB		4
 
 static int exception_type(int vector)
 {
@@ -543,8 +544,14 @@ static int exception_type(int vector)
 
 	mask = 1 << vector;
 
-	/* #DB is trap, as instruction watchpoints are handled elsewhere */
-	if (mask & ((1 << DB_VECTOR) | (1 << BP_VECTOR) | (1 << OF_VECTOR)))
+	/*
+	 * #DBs can be trap-like or fault-like, the caller must check other CPU
+	 * state, e.g. DR6, to determine whether a #DB is a trap or fault.
+	 */
+	if (mask & (1 << DB_VECTOR))
+		return EXCPT_DB;
+
+	if (mask & ((1 << BP_VECTOR) | (1 << OF_VECTOR)))
 		return EXCPT_TRAP;
 
 	if (mask & ((1 << DF_VECTOR) | (1 << MC_VECTOR)))
@@ -8832,6 +8839,12 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		unsigned long rflags = static_call(kvm_x86_get_rflags)(vcpu);
 		toggle_interruptibility(vcpu, ctxt->interruptibility);
 		vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
+
+		/*
+		 * Note, EXCPT_DB is assumed to be fault-like as the emulator
+		 * only supports code breakpoints and general detect #DB, both
+		 * of which are fault-like.
+		 */
 		if (!ctxt->have_exception ||
 		    exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
 			kvm_pmu_trigger_event(vcpu, PERF_COUNT_HW_INSTRUCTIONS);
@@ -9755,6 +9768,16 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 
 	/* try to inject new event if pending */
 	if (vcpu->arch.exception.pending) {
+		/*
+		 * Fault-class exceptions, except #DBs, set RF=1 in the RFLAGS
+		 * value pushed on the stack.  Trap-like exceptions and all #DBs
+		 * leave RF as-is (KVM follows Intel's behavior in this regard;
+		 * AMD states that code breakpoint #DBs explicitly clear RF).
+		 *
+		 * Note, most versions of Intel's SDM and AMD's APM incorrectly
+		 * describe the behavior of General Detect #DBs, which are
+		 * fault-like.  They do _not_ set RF, a la code breakpoints.
+		 */
 		if (exception_type(vcpu->arch.exception.nr) == EXCPT_FAULT)
 			__kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) |
					     X86_EFLAGS_RF);
-- 
2.37.2.672.g94769d06f0-goog

From nobody Tue Apr 7 05:23:32 2026
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson, Maxim Levitsky, Oliver Upton, Peter Shier
Date: Tue, 30 Aug 2022 23:15:56 +0000
Message-ID: <20220830231614.3580124-10-seanjc@google.com>
In-Reply-To: <20220830231614.3580124-1-seanjc@google.com>
Subject: [PATCH v5 09/27] KVM: x86: Use DR7_GD macro instead of open coding check in emulator

Use DR7_GD in the emulator instead of open coding the check, and drop a
comically wrong comment.
Signed-off-by: Sean Christopherson
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/emulate.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f092c54d1a2f..59b61a41125a 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4168,8 +4168,7 @@ static int check_dr7_gd(struct x86_emulate_ctxt *ctxt)
 
 	ctxt->ops->get_dr(ctxt, 7, &dr7);
 
-	/* Check if DR7.Global_Enable is set */
-	return dr7 & (1 << 13);
+	return dr7 & DR7_GD;
 }
 
 static int check_dr_read(struct x86_emulate_ctxt *ctxt)
-- 
2.37.2.672.g94769d06f0-goog

From nobody Tue Apr 7 05:23:32 2026
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson, Maxim Levitsky, Oliver Upton, Peter Shier
Date: Tue, 30 Aug 2022 23:15:57 +0000
Message-ID: <20220830231614.3580124-11-seanjc@google.com>
In-Reply-To: <20220830231614.3580124-1-seanjc@google.com>
Subject: [PATCH v5 10/27] KVM: nVMX: Ignore SIPI that arrives in L2 when vCPU is not in WFS
Fall through to handling other pending exception/events for L2 if SIPI
is pending while the CPU is not in Wait-for-SIPI.  KVM correctly ignores
the event, but incorrectly returns immediately, e.g. a SIPI coincident
with another event could lead to KVM incorrectly routing the event to L1
instead of L2.

Fixes: bf0cd88ce363 ("KVM: x86: emulate wait-for-SIPI and SIPI-VMExit")
Signed-off-by: Sean Christopherson
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/vmx/nested.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 5298457b3a1f..d11c785b2c1c 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3932,10 +3932,12 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 			return -EBUSY;
 
 		clear_bit(KVM_APIC_SIPI, &apic->pending_events);
-		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED)
+		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
 			nested_vmx_vmexit(vcpu, EXIT_REASON_SIPI_SIGNAL, 0,
					  apic->sipi_vector & 0xFFUL);
-		return 0;
+			return 0;
+		}
+		/* Fallthrough, the SIPI is completely ignored. */
 	}
 
 	/*
-- 
2.37.2.672.g94769d06f0-goog

From nobody Tue Apr 7 05:23:32 2026
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson, Maxim Levitsky, Oliver Upton, Peter Shier
Date: Tue, 30 Aug 2022 23:15:58 +0000
Message-ID: <20220830231614.3580124-12-seanjc@google.com>
In-Reply-To: <20220830231614.3580124-1-seanjc@google.com>
Subject: [PATCH v5 11/27] KVM: nVMX: Unconditionally clear mtf_pending on nested VM-Exit

Clear mtf_pending on nested VM-Exit instead of handling the clear on a
case-by-case basis in vmx_check_nested_events().  The pending MTF should
never survive nested VM-Exit, as it is a property of KVM's run of the
current L2, i.e. should never affect the next L2 run by L1.
In practice, this is likely a nop as getting to L1 with nested_run_pending
is impossible, and KVM doesn't correctly handle morphing a pending
exception that occurs on a prior injected exception (the need to re-inject
an exception being the other case where MTF isn't cleared).  However, KVM
will hopefully soon correctly deal with a pending exception on top of an
injected exception.

Add a TODO to document that KVM has a priority inversion bug between SMIs
and MTF (and trap-like #DBs), and that KVM also doesn't properly
save/restore MTF across SMI/RSM.

Signed-off-by: Sean Christopherson
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/vmx/nested.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index d11c785b2c1c..51005fef0148 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3905,16 +3905,8 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 	unsigned long exit_qual;
 	bool block_nested_events = vmx->nested.nested_run_pending ||
				   kvm_event_needs_reinjection(vcpu);
-	bool mtf_pending = vmx->nested.mtf_pending;
 	struct kvm_lapic *apic = vcpu->arch.apic;
 
-	/*
-	 * Clear the MTF state. If a higher priority VM-exit is delivered first,
-	 * this state is discarded.
-	 */
-	if (!block_nested_events)
-		vmx->nested.mtf_pending = false;
-
 	if (lapic_in_kernel(vcpu) &&
		test_bit(KVM_APIC_INIT, &apic->pending_events)) {
 		if (block_nested_events)
@@ -3923,6 +3915,9 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 		clear_bit(KVM_APIC_INIT, &apic->pending_events);
 		if (vcpu->arch.mp_state != KVM_MP_STATE_INIT_RECEIVED)
 			nested_vmx_vmexit(vcpu, EXIT_REASON_INIT_SIGNAL, 0, 0);
+
+		/* MTF is discarded if the vCPU is in WFS. */
+		vmx->nested.mtf_pending = false;
 		return 0;
 	}
 
@@ -3945,6 +3940,11 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
	 * fault-like exceptions, TSS T flag #DB (not emulated by KVM, but
	 * could theoretically come in from userspace), and ICEBP (INT1).
	 *
+	 * TODO: SMIs have higher priority than MTF and trap-like #DBs (except
+	 * for TSS T flag #DBs).  KVM also doesn't save/restore pending MTF
+	 * across SMI/RSM as it should; that needs to be addressed in order to
+	 * prioritize SMI over MTF and trap-like #DBs.
+	 *
	 * Note that only a pending nested run can block a pending exception.
	 * Otherwise an injected NMI/interrupt should either be
	 * lost or delivered to the nested hypervisor in the IDT_VECTORING_INFO,
@@ -3960,7 +3960,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 		return 0;
 	}
 
-	if (mtf_pending) {
+	if (vmx->nested.mtf_pending) {
 		if (block_nested_events)
 			return -EBUSY;
 		nested_vmx_update_pending_dbg(vcpu);
@@ -4557,6 +4557,9 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
 
+	/* Pending MTF traps are discarded on VM-Exit. */
+	vmx->nested.mtf_pending = false;
+
 	/* trying to cancel vmlaunch/vmresume is a bug */
 	WARN_ON_ONCE(vmx->nested.nested_run_pending);
 
-- 
2.37.2.672.g94769d06f0-goog

From nobody Tue Apr 7 05:23:32 2026
bh=er0rn6CeFv2sFug+64NWnb5+gXtVZZy3ObF582jX2a4=; b=qVEKc7CXrHtJGcxdgYM+oktV1KXlyC/qUYNFBXiR1miozrUZsLsWPyvq6JJcqjZPpX IEp/6fN7B5xYrvBVO8dAG4fs27eRFJx5uvM/izx1HU07+38ZwbT+n63buMsyaTeY8e/K ss/SIIf4HG8R4E7NJvKG93wFDkJ+spAl3O7d0ktXMdutnF/LFMu5bIXhvhxxwfCQ6EZo +V2YDTl80LuFzJizuaefSVk/wEnpnfBquQHCJAnBYiijVjCxlD+Qj4N7raoZQa4qYHL1 gBDBthCuYqSG5DH8tT7ny5GMQYXpY2zsfm0BptMce7He5T7wLzXWbQK9loP7+FD6qt3o wT4g== X-Gm-Message-State: ACgBeo1lRcHSceXXoZ/FCJSWLB9/QOG25GpHA2A8ERuu5G0rHoNBGCB6 oZIy8mpPQUGLsnGAjwx8+QEhYSEpY+8= X-Google-Smtp-Source: AA6agR7t5PHHGecbEoijbNb/kN8TA45y5UzQ9XctwWfmoTT3LW+rvJ+HP0dXBYP+FTPo4Cm8X7zNucFS+EY= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a17:90a:249:b0:1e0:a8a3:3c6c with SMTP id t9-20020a17090a024900b001e0a8a33c6cmr11251pje.0.1661901397020; Tue, 30 Aug 2022 16:16:37 -0700 (PDT) Reply-To: Sean Christopherson Date: Tue, 30 Aug 2022 23:15:59 +0000 In-Reply-To: <20220830231614.3580124-1-seanjc@google.com> Mime-Version: 1.0 References: <20220830231614.3580124-1-seanjc@google.com> X-Mailer: git-send-email 2.37.2.672.g94769d06f0-goog Message-ID: <20220830231614.3580124-13-seanjc@google.com> Subject: [PATCH v5 12/27] KVM: VMX: Inject #PF on ENCLS as "emulated" #PF From: Sean Christopherson To: Sean Christopherson , Paolo Bonzini Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson , Maxim Levitsky , Oliver Upton , Peter Shier Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Treat #PFs that occur during emulation of ENCLS as, wait for it, emulated page faults. Practically speaking, this is a glorified nop as the exception is never of the nested flavor, and it's extremely unlikely the guest is relying on the side effect of an implicit INVLPG on the faulting address. 
Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions")
Signed-off-by: Sean Christopherson
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/vmx/sgx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index aba8cebdc587..8f95c7c01433 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -129,7 +129,7 @@ static int sgx_inject_fault(struct kvm_vcpu *vcpu, gva_t gva, int trapnr)
 		ex.address = gva;
 		ex.error_code_valid = true;
 		ex.nested_page_fault = false;
-		kvm_inject_page_fault(vcpu, &ex);
+		kvm_inject_emulated_page_fault(vcpu, &ex);
 	} else {
 		kvm_inject_gp(vcpu, 0);
 	}
-- 
2.37.2.672.g94769d06f0-goog

From nobody Tue Apr 7 05:23:32 2026
Reply-To: Sean Christopherson
Date: Tue, 30 Aug 2022 23:16:00 +0000
In-Reply-To: <20220830231614.3580124-1-seanjc@google.com>
References: <20220830231614.3580124-1-seanjc@google.com>
Message-ID: <20220830231614.3580124-14-seanjc@google.com>
Subject: [PATCH v5 13/27] KVM: x86: Rename kvm_x86_ops.queue_exception to inject_exception
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson,
    Maxim Levitsky, Oliver Upton, Peter Shier

Rename the kvm_x86_ops hook for exception injection to better reflect
reality, and to align with pretty much every other related function name
in KVM.

No functional change intended.

Signed-off-by: Sean Christopherson
Reviewed-by: Maxim Levitsky
---
 arch/x86/include/asm/kvm-x86-ops.h | 2 +-
 arch/x86/include/asm/kvm_host.h    | 2 +-
 arch/x86/kvm/svm/svm.c             | 4 ++--
 arch/x86/kvm/vmx/vmx.c             | 4 ++--
 arch/x86/kvm/x86.c                 | 2 +-
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 51f777071584..82ba4a564e58 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -67,7 +67,7 @@ KVM_X86_OP(get_interrupt_shadow)
 KVM_X86_OP(patch_hypercall)
 KVM_X86_OP(inject_irq)
 KVM_X86_OP(inject_nmi)
-KVM_X86_OP(queue_exception)
+KVM_X86_OP(inject_exception)
 KVM_X86_OP(cancel_injection)
 KVM_X86_OP(interrupt_allowed)
 KVM_X86_OP(nmi_allowed)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2c96c43c313a..71b65b8bb8cc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1523,7 +1523,7 @@ struct kvm_x86_ops {
 				unsigned char *hypercall_addr);
 	void (*inject_irq)(struct kvm_vcpu *vcpu, bool reinjected);
 	void (*inject_nmi)(struct kvm_vcpu *vcpu);
-	void (*queue_exception)(struct kvm_vcpu *vcpu);
+	void (*inject_exception)(struct kvm_vcpu *vcpu);
 	void (*cancel_injection)(struct kvm_vcpu *vcpu);
 	int (*interrupt_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
 	int (*nmi_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f3813dbacb9f..a9d3d5a5137f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -461,7 +461,7 @@ static int svm_update_soft_interrupt_rip(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-static void svm_queue_exception(struct kvm_vcpu *vcpu)
+static void svm_inject_exception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	unsigned nr = vcpu->arch.exception.nr;
@@ -4798,7 +4798,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.patch_hypercall = svm_patch_hypercall,
 	.inject_irq = svm_inject_irq,
 	.inject_nmi = svm_inject_nmi,
-	.queue_exception = svm_queue_exception,
+	.inject_exception = svm_inject_exception,
 	.cancel_injection = svm_cancel_injection,
 	.interrupt_allowed = svm_interrupt_allowed,
 	.nmi_allowed = svm_nmi_allowed,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 7f3581960eb5..be4348fa176c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1684,7 +1684,7 @@ static void vmx_clear_hlt(struct kvm_vcpu *vcpu)
 		vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
 }
 
-static void vmx_queue_exception(struct kvm_vcpu *vcpu)
+static void vmx_inject_exception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	unsigned nr = vcpu->arch.exception.nr;
@@ -8080,7 +8080,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.patch_hypercall = vmx_patch_hypercall,
 	.inject_irq = vmx_inject_irq,
 	.inject_nmi = vmx_inject_nmi,
-	.queue_exception = vmx_queue_exception,
+	.inject_exception = vmx_inject_exception,
 	.cancel_injection = vmx_cancel_injection,
 	.interrupt_allowed = vmx_interrupt_allowed,
 	.nmi_allowed = vmx_nmi_allowed,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 39d3eadc43a2..24b538b8b0ee 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9713,7 +9713,7 @@ static void kvm_inject_exception(struct kvm_vcpu *vcpu)
 
 	if (vcpu->arch.exception.error_code && !is_protmode(vcpu))
 		vcpu->arch.exception.error_code = false;
-	static_call(kvm_x86_queue_exception)(vcpu);
+	static_call(kvm_x86_inject_exception)(vcpu);
 }
 
 static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
-- 
2.37.2.672.g94769d06f0-goog

From nobody Tue Apr 7 05:23:32 2026
Reply-To: Sean Christopherson
Date: Tue, 30 Aug 2022 23:16:01 +0000
In-Reply-To: <20220830231614.3580124-1-seanjc@google.com>
References: <20220830231614.3580124-1-seanjc@google.com>
Message-ID: <20220830231614.3580124-15-seanjc@google.com>
Subject: [PATCH v5 14/27] KVM: x86: Make kvm_queued_exception a properly named, visible struct
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson,
    Maxim Levitsky, Oliver Upton, Peter Shier

Move the definition of "struct kvm_queued_exception" out of kvm_vcpu_arch
in anticipation of adding a second instance in kvm_vcpu_arch to handle
exceptions that occur when vectoring an injected exception and are morphed
to VM-Exit instead of leading to #DF.

Opportunistically take advantage of the churn to rename "nr" to "vector".

No functional change intended.

Signed-off-by: Sean Christopherson
Reviewed-by: Maxim Levitsky
---
 arch/x86/include/asm/kvm_host.h | 23 +++++-----
 arch/x86/kvm/svm/nested.c       | 47 ++++++++++---------
 arch/x86/kvm/svm/svm.c          | 14 +++---
 arch/x86/kvm/vmx/nested.c       | 42 +++++++++--------
 arch/x86/kvm/vmx/vmx.c          | 20 ++++-----
 arch/x86/kvm/x86.c              | 80 ++++++++++++++++-----------------
 arch/x86/kvm/x86.h              |  3 +-
 7 files changed, 113 insertions(+), 116 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 71b65b8bb8cc..624a0676a8f9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -639,6 +639,17 @@ struct kvm_vcpu_xen {
 	struct timer_list poll_timer;
 };
 
+struct kvm_queued_exception {
+	bool pending;
+	bool injected;
+	bool has_error_code;
+	u8 vector;
+	u32 error_code;
+	unsigned long payload;
+	bool has_payload;
+	u8 nested_apf;
+};
+
 struct kvm_vcpu_arch {
 	/*
 	 * rip and regs accesses must go through
@@ -737,16 +748,8 @@ struct kvm_vcpu_arch {
 
 	u8 event_exit_inst_len;
 
-	struct kvm_queued_exception {
-		bool pending;
-		bool injected;
-		bool has_error_code;
-		u8 nr;
-		u32 error_code;
-		unsigned long payload;
-		bool has_payload;
-		u8 nested_apf;
-	} exception;
+	/* Exceptions to be injected to the guest. */
+	struct kvm_queued_exception exception;
 
 	struct kvm_queued_interrupt {
 		bool injected;
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 76dcc8a3e849..8f991592d277 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -468,7 +468,7 @@ static void nested_save_pending_event_to_vmcb12(struct vcpu_svm *svm,
 	unsigned int nr;
 
 	if (vcpu->arch.exception.injected) {
-		nr = vcpu->arch.exception.nr;
+		nr = vcpu->arch.exception.vector;
 		exit_int_info = nr | SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_EXEPT;
 
 		if (vcpu->arch.exception.has_error_code) {
@@ -1306,42 +1306,45 @@ int nested_svm_check_permissions(struct kvm_vcpu *vcpu)
 
 static bool nested_exit_on_exception(struct vcpu_svm *svm)
 {
-	unsigned int nr = svm->vcpu.arch.exception.nr;
+	unsigned int vector = svm->vcpu.arch.exception.vector;
 
-	return (svm->nested.ctl.intercepts[INTERCEPT_EXCEPTION] & BIT(nr));
+	return (svm->nested.ctl.intercepts[INTERCEPT_EXCEPTION] & BIT(vector));
 }
 
-static void nested_svm_inject_exception_vmexit(struct vcpu_svm *svm)
+static void nested_svm_inject_exception_vmexit(struct kvm_vcpu *vcpu)
 {
-	unsigned int nr = svm->vcpu.arch.exception.nr;
+	struct kvm_queued_exception *ex = &vcpu->arch.exception;
+	struct vcpu_svm *svm = to_svm(vcpu);
 	struct vmcb *vmcb = svm->vmcb;
 
-	vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + nr;
+	vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + ex->vector;
 	vmcb->control.exit_code_hi = 0;
 
-	if (svm->vcpu.arch.exception.has_error_code)
-		vmcb->control.exit_info_1 = svm->vcpu.arch.exception.error_code;
+	if (ex->has_error_code)
+		vmcb->control.exit_info_1 = ex->error_code;
 
 	/*
 	 * EXITINFO2 is undefined for all exception intercepts other
 	 * than #PF.
 	 */
-	if (nr == PF_VECTOR) {
-		if (svm->vcpu.arch.exception.nested_apf)
-			vmcb->control.exit_info_2 = svm->vcpu.arch.apf.nested_apf_token;
-		else if (svm->vcpu.arch.exception.has_payload)
-			vmcb->control.exit_info_2 = svm->vcpu.arch.exception.payload;
+	if (ex->vector == PF_VECTOR) {
+		if (ex->nested_apf)
+			vmcb->control.exit_info_2 = vcpu->arch.apf.nested_apf_token;
+		else if (ex->has_payload)
+			vmcb->control.exit_info_2 = ex->payload;
 		else
-			vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;
-	} else if (nr == DB_VECTOR) {
+			vmcb->control.exit_info_2 = vcpu->arch.cr2;
+	} else if (ex->vector == DB_VECTOR) {
 		/* See inject_pending_event. */
-		kvm_deliver_exception_payload(&svm->vcpu);
-		if (svm->vcpu.arch.dr7 & DR7_GD) {
-			svm->vcpu.arch.dr7 &= ~DR7_GD;
-			kvm_update_dr7(&svm->vcpu);
+		kvm_deliver_exception_payload(vcpu, ex);
+
+		if (vcpu->arch.dr7 & DR7_GD) {
+			vcpu->arch.dr7 &= ~DR7_GD;
+			kvm_update_dr7(vcpu);
 		}
-	} else
-		WARN_ON(svm->vcpu.arch.exception.has_payload);
+	} else {
+		WARN_ON(ex->has_payload);
+	}
 
 	nested_svm_vmexit(svm);
 }
@@ -1379,7 +1382,7 @@ static int svm_check_nested_events(struct kvm_vcpu *vcpu)
 			return -EBUSY;
 		if (!nested_exit_on_exception(svm))
 			return 0;
-		nested_svm_inject_exception_vmexit(svm);
+		nested_svm_inject_exception_vmexit(vcpu);
 		return 0;
 	}
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a9d3d5a5137f..dbd10d61f29d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -463,22 +463,20 @@ static int svm_update_soft_interrupt_rip(struct kvm_vcpu *vcpu)
 
 static void svm_inject_exception(struct kvm_vcpu *vcpu)
 {
+	struct kvm_queued_exception *ex = &vcpu->arch.exception;
 	struct vcpu_svm *svm = to_svm(vcpu);
-	unsigned nr = vcpu->arch.exception.nr;
-	bool has_error_code = vcpu->arch.exception.has_error_code;
-	u32 error_code = vcpu->arch.exception.error_code;
 
-	kvm_deliver_exception_payload(vcpu);
+	kvm_deliver_exception_payload(vcpu, ex);
 
-	if (kvm_exception_is_soft(nr) &&
+	if (kvm_exception_is_soft(ex->vector) &&
 	    svm_update_soft_interrupt_rip(vcpu))
 		return;
 
-	svm->vmcb->control.event_inj = nr
+	svm->vmcb->control.event_inj = ex->vector
 		| SVM_EVTINJ_VALID
-		| (has_error_code ? SVM_EVTINJ_VALID_ERR : 0)
+		| (ex->has_error_code ? SVM_EVTINJ_VALID_ERR : 0)
 		| SVM_EVTINJ_TYPE_EXEPT;
-	svm->vmcb->control.event_inj_err = error_code;
+	svm->vmcb->control.event_inj_err = ex->error_code;
 }
 
 static void svm_init_erratum_383(void)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 51005fef0148..cbbe62a84493 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -446,29 +446,27 @@ static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
  */
 static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned long *exit_qual)
 {
+	struct kvm_queued_exception *ex = &vcpu->arch.exception;
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-	unsigned int nr = vcpu->arch.exception.nr;
-	bool has_payload = vcpu->arch.exception.has_payload;
-	unsigned long payload = vcpu->arch.exception.payload;
 
-	if (nr == PF_VECTOR) {
-		if (vcpu->arch.exception.nested_apf) {
+	if (ex->vector == PF_VECTOR) {
+		if (ex->nested_apf) {
 			*exit_qual = vcpu->arch.apf.nested_apf_token;
 			return 1;
 		}
-		if (nested_vmx_is_page_fault_vmexit(vmcs12,
-						    vcpu->arch.exception.error_code)) {
-			*exit_qual = has_payload ? payload : vcpu->arch.cr2;
+		if (nested_vmx_is_page_fault_vmexit(vmcs12, ex->error_code)) {
+			*exit_qual = ex->has_payload ? ex->payload : vcpu->arch.cr2;
 			return 1;
 		}
-	} else if (vmcs12->exception_bitmap & (1u << nr)) {
-		if (nr == DB_VECTOR) {
-			if (!has_payload) {
-				payload = vcpu->arch.dr6;
-				payload &= ~DR6_BT;
-				payload ^= DR6_ACTIVE_LOW;
+	} else if (vmcs12->exception_bitmap & (1u << ex->vector)) {
+		if (ex->vector == DB_VECTOR) {
+			if (ex->has_payload) {
+				*exit_qual = ex->payload;
+			} else {
+				*exit_qual = vcpu->arch.dr6;
+				*exit_qual &= ~DR6_BT;
+				*exit_qual ^= DR6_ACTIVE_LOW;
 			}
-			*exit_qual = payload;
 		} else
 			*exit_qual = 0;
 		return 1;
@@ -3718,7 +3716,7 @@ static void vmcs12_save_pending_event(struct kvm_vcpu *vcpu,
 		     is_double_fault(exit_intr_info))) {
 		vmcs12->idt_vectoring_info_field = 0;
 	} else if (vcpu->arch.exception.injected) {
-		nr = vcpu->arch.exception.nr;
+		nr = vcpu->arch.exception.vector;
 		idt_vectoring = nr | VECTORING_INFO_VALID_MASK;
 
 		if (kvm_exception_is_soft(nr)) {
@@ -3822,11 +3820,11 @@ static int vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
 static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
 					       unsigned long exit_qual)
 {
+	struct kvm_queued_exception *ex = &vcpu->arch.exception;
+	u32 intr_info = ex->vector | INTR_INFO_VALID_MASK;
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-	unsigned int nr = vcpu->arch.exception.nr;
-	u32 intr_info = nr | INTR_INFO_VALID_MASK;
 
-	if (vcpu->arch.exception.has_error_code) {
+	if (ex->has_error_code) {
 		/*
 		 * Intel CPUs do not generate error codes with bits 31:16 set,
 		 * and more importantly VMX disallows setting bits 31:16 in the
@@ -3836,11 +3834,11 @@ static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
 		 * generate "full" 32-bit error codes, so KVM allows userspace
 		 * to inject exception error codes with bits 31:16 set.
 		 */
-		vmcs12->vm_exit_intr_error_code = (u16)vcpu->arch.exception.error_code;
+		vmcs12->vm_exit_intr_error_code = (u16)ex->error_code;
 		intr_info |= INTR_INFO_DELIVER_CODE_MASK;
 	}
 
-	if (kvm_exception_is_soft(nr))
+	if (kvm_exception_is_soft(ex->vector))
 		intr_info |= INTR_TYPE_SOFT_EXCEPTION;
 	else
 		intr_info |= INTR_TYPE_HARD_EXCEPTION;
@@ -3871,7 +3869,7 @@ static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
 static inline unsigned long vmx_get_pending_dbg_trap(struct kvm_vcpu *vcpu)
 {
 	if (!vcpu->arch.exception.pending ||
-	    vcpu->arch.exception.nr != DB_VECTOR)
+	    vcpu->arch.exception.vector != DB_VECTOR)
 		return 0;
 
 	/* General Detect #DBs are always fault-like. */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index be4348fa176c..07c4246415e9 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1659,7 +1659,7 @@ static void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu)
 	 */
 	if (nested_cpu_has_mtf(vmcs12) &&
 	    (!vcpu->arch.exception.pending ||
-	     vcpu->arch.exception.nr == DB_VECTOR))
+	     vcpu->arch.exception.vector == DB_VECTOR))
 		vmx->nested.mtf_pending = true;
 	else
 		vmx->nested.mtf_pending = false;
@@ -1686,15 +1686,13 @@ static void vmx_clear_hlt(struct kvm_vcpu *vcpu)
 
 static void vmx_inject_exception(struct kvm_vcpu *vcpu)
 {
+	struct kvm_queued_exception *ex = &vcpu->arch.exception;
+	u32 intr_info = ex->vector | INTR_INFO_VALID_MASK;
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	unsigned nr = vcpu->arch.exception.nr;
-	bool has_error_code = vcpu->arch.exception.has_error_code;
-	u32 error_code = vcpu->arch.exception.error_code;
-	u32 intr_info = nr | INTR_INFO_VALID_MASK;
 
-	kvm_deliver_exception_payload(vcpu);
+	kvm_deliver_exception_payload(vcpu, ex);
 
-	if (has_error_code) {
+	if (ex->has_error_code) {
 		/*
 		 * Despite the error code being architecturally defined as 32
 		 * bits, and the VMCS field being 32 bits, Intel CPUs and thus
@@ -1705,21 +1703,21 @@ static void vmx_inject_exception(struct kvm_vcpu *vcpu)
 		 * the upper bits to avoid VM-Fail, losing information that
 		 * does't really exist is preferable to killing the VM.
 		 */
-		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, (u16)error_code);
+		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, (u16)ex->error_code);
 		intr_info |= INTR_INFO_DELIVER_CODE_MASK;
 	}
 
 	if (vmx->rmode.vm86_active) {
 		int inc_eip = 0;
-		if (kvm_exception_is_soft(nr))
+		if (kvm_exception_is_soft(ex->vector))
 			inc_eip = vcpu->arch.event_exit_inst_len;
-		kvm_inject_realmode_interrupt(vcpu, nr, inc_eip);
+		kvm_inject_realmode_interrupt(vcpu, ex->vector, inc_eip);
 		return;
 	}
 
 	WARN_ON_ONCE(vmx->emulation_required);
 
-	if (kvm_exception_is_soft(nr)) {
+	if (kvm_exception_is_soft(ex->vector)) {
 		vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
 			     vmx->vcpu.arch.event_exit_inst_len);
 		intr_info |= INTR_TYPE_SOFT_EXCEPTION;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 24b538b8b0ee..bed42a75b515 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -561,16 +561,13 @@ static int exception_type(int vector)
 	return EXCPT_FAULT;
 }
 
-void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu)
+void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu,
+				   struct kvm_queued_exception *ex)
 {
-	unsigned nr = vcpu->arch.exception.nr;
-	bool has_payload = vcpu->arch.exception.has_payload;
-	unsigned long payload = vcpu->arch.exception.payload;
-
-	if (!has_payload)
+	if (!ex->has_payload)
 		return;
 
-	switch (nr) {
+	switch (ex->vector) {
 	case DB_VECTOR:
 		/*
 		 * "Certain debug exceptions may clear bit 0-3.  The
@@ -595,8 +592,8 @@ void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu)
 		 * So they need to be flipped for DR6.
 		 */
 		vcpu->arch.dr6 |= DR6_ACTIVE_LOW;
-		vcpu->arch.dr6 |= payload;
-		vcpu->arch.dr6 ^= payload & DR6_ACTIVE_LOW;
+		vcpu->arch.dr6 |= ex->payload;
+		vcpu->arch.dr6 ^= ex->payload & DR6_ACTIVE_LOW;
 
 		/*
 		 * The #DB payload is defined as compatible with the 'pending
@@ -607,12 +604,12 @@ void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu)
 		vcpu->arch.dr6 &= ~BIT(12);
 		break;
 	case PF_VECTOR:
-		vcpu->arch.cr2 = payload;
+		vcpu->arch.cr2 = ex->payload;
 		break;
 	}
 
-	vcpu->arch.exception.has_payload = false;
-	vcpu->arch.exception.payload = 0;
+	ex->has_payload = false;
+	ex->payload = 0;
 }
 EXPORT_SYMBOL_GPL(kvm_deliver_exception_payload);
 
@@ -651,17 +648,18 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 			vcpu->arch.exception.injected = false;
 		}
 		vcpu->arch.exception.has_error_code = has_error;
-		vcpu->arch.exception.nr = nr;
+		vcpu->arch.exception.vector = nr;
 		vcpu->arch.exception.error_code = error_code;
 		vcpu->arch.exception.has_payload = has_payload;
 		vcpu->arch.exception.payload = payload;
 		if (!is_guest_mode(vcpu))
-			kvm_deliver_exception_payload(vcpu);
+			kvm_deliver_exception_payload(vcpu,
+						      &vcpu->arch.exception);
 		return;
 	}
 
 	/* to check exception */
-	prev_nr = vcpu->arch.exception.nr;
+	prev_nr = vcpu->arch.exception.vector;
 	if (prev_nr == DF_VECTOR) {
 		/* triple fault -> shutdown */
 		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
@@ -679,7 +677,7 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 		vcpu->arch.exception.pending = true;
 		vcpu->arch.exception.injected = false;
 		vcpu->arch.exception.has_error_code = true;
-		vcpu->arch.exception.nr = DF_VECTOR;
+		vcpu->arch.exception.vector = DF_VECTOR;
 		vcpu->arch.exception.error_code = 0;
 		vcpu->arch.exception.has_payload = false;
 		vcpu->arch.exception.payload = 0;
@@ -5015,25 +5013,24 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
 static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
 					       struct kvm_vcpu_events *events)
 {
+	struct kvm_queued_exception *ex = &vcpu->arch.exception;
+
 	process_nmi(vcpu);
 
 	if (kvm_check_request(KVM_REQ_SMI, vcpu))
 		process_smi(vcpu);
 
 	/*
-	 * In guest mode, payload delivery should be deferred,
-	 * so that the L1 hypervisor can intercept #PF before
-	 * CR2 is modified (or intercept #DB before DR6 is
-	 * modified under nVMX). Unless the per-VM capability,
-	 * KVM_CAP_EXCEPTION_PAYLOAD, is set, we may not defer the delivery of
-	 * an exception payload and handle after a KVM_GET_VCPU_EVENTS. Since we
-	 * opportunistically defer the exception payload, deliver it if the
-	 * capability hasn't been requested before processing a
-	 * KVM_GET_VCPU_EVENTS.
+	 * In guest mode, payload delivery should be deferred if the exception
+	 * will be intercepted by L1, e.g. KVM should not modifying CR2 if L1
+	 * intercepts #PF, ditto for DR6 and #DBs.  If the per-VM capability,
+	 * KVM_CAP_EXCEPTION_PAYLOAD, is not set, userspace may or may not
+	 * propagate the payload and so it cannot be safely deferred.  Deliver
+	 * the payload if the capability hasn't been requested.
 	 */
 	if (!vcpu->kvm->arch.exception_payload_enabled &&
-	    vcpu->arch.exception.pending && vcpu->arch.exception.has_payload)
-		kvm_deliver_exception_payload(vcpu);
+	    ex->pending && ex->has_payload)
+		kvm_deliver_exception_payload(vcpu, ex);
 
 	/*
 	 * The API doesn't provide the instruction length for software
@@ -5041,26 +5038,25 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
 	 * isn't advanced, we should expect to encounter the exception
 	 * again.
 	 */
-	if (kvm_exception_is_soft(vcpu->arch.exception.nr)) {
+	if (kvm_exception_is_soft(ex->vector)) {
 		events->exception.injected = 0;
 		events->exception.pending = 0;
 	} else {
-		events->exception.injected = vcpu->arch.exception.injected;
-		events->exception.pending = vcpu->arch.exception.pending;
+		events->exception.injected = ex->injected;
+		events->exception.pending = ex->pending;
 		/*
 		 * For ABI compatibility, deliberately conflate
 		 * pending and injected exceptions when
 		 * KVM_CAP_EXCEPTION_PAYLOAD isn't enabled.
 		 */
 		if (!vcpu->kvm->arch.exception_payload_enabled)
-			events->exception.injected |=
-				vcpu->arch.exception.pending;
+			events->exception.injected |= ex->pending;
 	}
-	events->exception.nr = vcpu->arch.exception.nr;
-	events->exception.has_error_code = vcpu->arch.exception.has_error_code;
-	events->exception.error_code = vcpu->arch.exception.error_code;
-	events->exception_has_payload = vcpu->arch.exception.has_payload;
-	events->exception_payload = vcpu->arch.exception.payload;
+	events->exception.nr = ex->vector;
+	events->exception.has_error_code = ex->has_error_code;
+	events->exception.error_code = ex->error_code;
+	events->exception_has_payload = ex->has_payload;
+	events->exception_payload = ex->payload;
 
 	events->interrupt.injected =
 		vcpu->arch.interrupt.injected && !vcpu->arch.interrupt.soft;
@@ -5132,7 +5128,7 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
 	process_nmi(vcpu);
 	vcpu->arch.exception.injected = events->exception.injected;
 	vcpu->arch.exception.pending = events->exception.pending;
-	vcpu->arch.exception.nr = events->exception.nr;
+	vcpu->arch.exception.vector = events->exception.nr;
 	vcpu->arch.exception.has_error_code = events->exception.has_error_code;
 	vcpu->arch.exception.error_code = events->exception.error_code;
 	vcpu->arch.exception.has_payload = events->exception_has_payload;
@@ -9706,7 +9702,7 @@ int kvm_check_nested_events(struct kvm_vcpu *vcpu)
 
 static void kvm_inject_exception(struct kvm_vcpu *vcpu)
 {
-	trace_kvm_inj_exception(vcpu->arch.exception.nr,
+	trace_kvm_inj_exception(vcpu->arch.exception.vector,
 				vcpu->arch.exception.has_error_code,
 				vcpu->arch.exception.error_code,
 				vcpu->arch.exception.injected);
@@ -9778,12 +9774,12 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 		 * describe the behavior of General Detect #DBs, which are
 		 * fault-like.  They do _not_ set RF, a la code breakpoints.
 		 */
-		if (exception_type(vcpu->arch.exception.nr) == EXCPT_FAULT)
+		if (exception_type(vcpu->arch.exception.vector) == EXCPT_FAULT)
 			__kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) |
 					     X86_EFLAGS_RF);
 
-		if (vcpu->arch.exception.nr == DB_VECTOR) {
-			kvm_deliver_exception_payload(vcpu);
+		if (vcpu->arch.exception.vector == DB_VECTOR) {
+			kvm_deliver_exception_payload(vcpu, &vcpu->arch.exception);
 			if (vcpu->arch.dr7 & DR7_GD) {
 				vcpu->arch.dr7 &= ~DR7_GD;
 				kvm_update_dr7(vcpu);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 1926d2cb8e79..4147d27f9fbc 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -286,7 +286,8 @@ int kvm_write_guest_virt_system(struct kvm_vcpu *vcpu,
 
 int handle_ud(struct kvm_vcpu *vcpu);
 
-void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu);
+void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu,
+				   struct kvm_queued_exception *ex);
 
 void kvm_vcpu_mtrr_init(struct kvm_vcpu *vcpu);
 u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn);
-- 
2.37.2.672.g94769d06f0-goog

From nobody Tue Apr 7 05:23:32 2026
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson, Maxim Levitsky, Oliver Upton, Peter Shier
Subject: [PATCH v5 15/27] KVM: x86: Formalize blocking of nested pending exceptions
Date: Tue, 30 Aug 2022 23:16:02 +0000
Message-ID: <20220830231614.3580124-16-seanjc@google.com>
In-Reply-To: <20220830231614.3580124-1-seanjc@google.com>
References: <20220830231614.3580124-1-seanjc@google.com>

Capture nested_run_pending as block_nested_exceptions so that the logic
of why exceptions are blocked only needs to be documented once instead
of at every place that employs the logic.

No functional change intended.
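The consolidation described in the changelog — capturing nested_run_pending as one predicate that the broader event-blocking predicate then builds on — can be sketched in isolation. The struct and function names below are illustrative stand-ins for the vCPU/nested state that KVM's svm_check_nested_events()/vmx_check_nested_events() consult, not KVM's actual types:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the nested-virtualization state;
 * field names are illustrative, not KVM's real structures. */
struct nested_state {
	bool nested_run_pending;      /* VMRUN/VMLAUNCH not yet completed */
	bool event_needs_reinjection; /* a previously injected event must finish */
};

/*
 * Only a pending nested run blocks a pending exception: if an event was
 * already injected, a newly pending exception occurred while delivering
 * that event and therefore must be handled, not blocked.
 */
bool block_nested_exceptions(const struct nested_state *s)
{
	return s->nested_run_pending;
}

/*
 * New (non-exception) events are recognized only at instruction
 * boundaries, so they are additionally blocked while a previously
 * injected event still awaits re-injection.
 */
bool block_nested_events(const struct nested_state *s)
{
	return block_nested_exceptions(s) || s->event_needs_reinjection;
}
```

Documenting the rationale once, on the two predicates, is what lets the patch delete the duplicated comments from the SVM and VMX copies of this logic.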
Signed-off-by: Sean Christopherson
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/svm/nested.c | 26 ++++++++++++++++----------
 arch/x86/kvm/vmx/nested.c | 29 ++++++++++++++++++-----------
 2 files changed, 34 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 8f991592d277..a6111392985c 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1356,10 +1356,22 @@ static inline bool nested_exit_on_init(struct vcpu_svm *svm)

 static int svm_check_nested_events(struct kvm_vcpu *vcpu)
 {
-	struct vcpu_svm *svm = to_svm(vcpu);
-	bool block_nested_events =
-		kvm_event_needs_reinjection(vcpu) || svm->nested.nested_run_pending;
 	struct kvm_lapic *apic = vcpu->arch.apic;
+	struct vcpu_svm *svm = to_svm(vcpu);
+	/*
+	 * Only a pending nested run blocks a pending exception.  If there is a
+	 * previously injected event, the pending exception occurred while said
+	 * event was being delivered and thus needs to be handled.
+	 */
+	bool block_nested_exceptions = svm->nested.nested_run_pending;
+	/*
+	 * New events (not exceptions) are only recognized at instruction
+	 * boundaries.  If an event needs reinjection, then KVM is handling a
+	 * VM-Exit that occurred _during_ instruction execution; new events are
+	 * blocked until the instruction completes.
+	 */
+	bool block_nested_events = block_nested_exceptions ||
+				   kvm_event_needs_reinjection(vcpu);

 	if (lapic_in_kernel(vcpu) &&
	    test_bit(KVM_APIC_INIT, &apic->pending_events)) {
@@ -1372,13 +1384,7 @@ static int svm_check_nested_events(struct kvm_vcpu *vcpu)
 	}

 	if (vcpu->arch.exception.pending) {
-		/*
-		 * Only a pending nested run can block a pending exception.
-		 * Otherwise an injected NMI/interrupt should either be
-		 * lost or delivered to the nested hypervisor in the EXITINTINFO
-		 * vmcb field, while delivering the pending exception.
-		 */
-		if (svm->nested.nested_run_pending)
+		if (block_nested_exceptions)
 			return -EBUSY;
 		if (!nested_exit_on_exception(svm))
 			return 0;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index cbbe62a84493..4bc2250502ea 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3899,11 +3899,23 @@ static bool nested_vmx_preemption_timer_pending(struct kvm_vcpu *vcpu)

 static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 {
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	unsigned long exit_qual;
-	bool block_nested_events =
-		vmx->nested.nested_run_pending || kvm_event_needs_reinjection(vcpu);
 	struct kvm_lapic *apic = vcpu->arch.apic;
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	unsigned long exit_qual;
+	/*
+	 * Only a pending nested run blocks a pending exception.  If there is a
+	 * previously injected event, the pending exception occurred while said
+	 * event was being delivered and thus needs to be handled.
+	 */
+	bool block_nested_exceptions = vmx->nested.nested_run_pending;
+	/*
+	 * New events (not exceptions) are only recognized at instruction
+	 * boundaries.  If an event needs reinjection, then KVM is handling a
+	 * VM-Exit that occurred _during_ instruction execution; new events are
+	 * blocked until the instruction completes.
+	 */
+	bool block_nested_events = block_nested_exceptions ||
+				   kvm_event_needs_reinjection(vcpu);

 	if (lapic_in_kernel(vcpu) &&
	    test_bit(KVM_APIC_INIT, &apic->pending_events)) {
@@ -3942,15 +3954,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 	 * for TSS T flag #DBs).  KVM also doesn't save/restore pending MTF
 	 * across SMI/RSM as it should; that needs to be addressed in order to
 	 * prioritize SMI over MTF and trap-like #DBs.
-	 *
-	 * Note that only a pending nested run can block a pending exception.
-	 * Otherwise an injected NMI/interrupt should either be
-	 * lost or delivered to the nested hypervisor in the IDT_VECTORING_INFO,
-	 * while delivering the pending exception.
 	 */
 	if (vcpu->arch.exception.pending &&
	    !(vmx_get_pending_dbg_trap(vcpu) & ~DR6_BT)) {
-		if (vmx->nested.nested_run_pending)
+		if (block_nested_exceptions)
 			return -EBUSY;
 		if (!nested_vmx_check_exception(vcpu, &exit_qual))
 			goto no_vmexit;
@@ -3967,7 +3974,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 	}

 	if (vcpu->arch.exception.pending) {
-		if (vmx->nested.nested_run_pending)
+		if (block_nested_exceptions)
 			return -EBUSY;
 		if (!nested_vmx_check_exception(vcpu, &exit_qual))
 			goto no_vmexit;
--
2.37.2.672.g94769d06f0-goog

From nobody Tue Apr 7 05:23:32 2026
From: Sean Christopherson
Subject: [PATCH v5 16/27] KVM: x86: Use kvm_queue_exception_e() to queue #DF
Date: Tue, 30 Aug 2022 23:16:03 +0000
Message-ID: <20220830231614.3580124-17-seanjc@google.com>
In-Reply-To: <20220830231614.3580124-1-seanjc@google.com>

Queue #DF by recursing on
kvm_multiple_exception() by way of kvm_queue_exception_e() instead of
open coding the behavior.  This will allow KVM to Just Work when a
future commit moves exception interception checks (for L2 => L1) into
kvm_multiple_exception().

No functional change intended.

Signed-off-by: Sean Christopherson
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/x86.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bed42a75b515..c19658b7be23 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -667,25 +667,22 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 	}
 	class1 = exception_class(prev_nr);
 	class2 = exception_class(nr);
-	if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY)
-		|| (class1 == EXCPT_PF && class2 != EXCPT_BENIGN)) {
+	if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY) ||
+	    (class1 == EXCPT_PF && class2 != EXCPT_BENIGN)) {
 		/*
-		 * Generate double fault per SDM Table 5-5.  Set
-		 * exception.pending = true so that the double fault
-		 * can trigger a nested vmexit.
+		 * Synthesize #DF.  Clear the previously injected or pending
+		 * exception so as not to incorrectly trigger shutdown.
 		 */
-		vcpu->arch.exception.pending = true;
 		vcpu->arch.exception.injected = false;
-		vcpu->arch.exception.has_error_code = true;
-		vcpu->arch.exception.vector = DF_VECTOR;
-		vcpu->arch.exception.error_code = 0;
-		vcpu->arch.exception.has_payload = false;
-		vcpu->arch.exception.payload = 0;
-	} else
+		vcpu->arch.exception.pending = false;
+
+		kvm_queue_exception_e(vcpu, DF_VECTOR, 0);
+	} else {
 		/* replace previous exception with a new one in a hope
 		   that instruction re-execution will regenerate lost
 		   exception */
 		goto queue;
+	}
 }

 void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr)
--
2.37.2.672.g94769d06f0-goog

From nobody Tue Apr 7 05:23:32 2026
From: Sean Christopherson
Subject: [PATCH v5 17/27] KVM: x86: Hoist nested event checks above event injection logic
Date: Tue, 30 Aug 2022 23:16:04 +0000
Message-ID: <20220830231614.3580124-18-seanjc@google.com>
In-Reply-To: <20220830231614.3580124-1-seanjc@google.com>
Perform nested event checks before re-injecting exceptions/events into
L2.  If a pending exception causes VM-Exit to L1, re-injecting events
into vmcs02 is premature and wasted effort.  Take care to ensure events
that need to be re-injected are still re-injected if checking for nested
events "fails", i.e. if KVM needs to force an immediate entry+exit to
complete the to-be-re-injected event.

Keep the "can_inject" logic the same for now; it too can be pushed below
the nested checks, but is a slightly riskier change (see past bugs about
events not being properly purged on nested VM-Exit).

Add and/or modify comments to better document the various interactions.
Of note is the comment regarding "blocking" previously injected NMIs and
IRQs if an exception is pending.  The old comment isn't wrong strictly
speaking, but it failed to capture the reason why the logic even exists.

Signed-off-by: Sean Christopherson
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/x86.c | 89 +++++++++++++++++++++++++++-------------------
 1 file changed, 53 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c19658b7be23..534484318d52 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9711,53 +9711,70 @@ static void kvm_inject_exception(struct kvm_vcpu *vcpu)

 static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 {
+	bool can_inject = !kvm_event_needs_reinjection(vcpu);
 	int r;
-	bool can_inject = true;

-	/* try to reinject previous events if any */
+	/*
+	 * Process nested events first, as nested VM-Exit supersedes event
+	 * re-injection.  If there's an event queued for re-injection, it will
+	 * be saved into the appropriate vmc{b,s}12 fields on nested VM-Exit.
+	 */
+	if (is_guest_mode(vcpu))
+		r = kvm_check_nested_events(vcpu);
+	else
+		r = 0;

-	if (vcpu->arch.exception.injected) {
+	/*
+	 * Re-inject exceptions and events *especially* if immediate entry+exit
+	 * to/from L2 is needed, as any event that has already been injected
+	 * into L2 needs to complete its lifecycle before injecting a new event.
+	 *
+	 * Don't re-inject an NMI or interrupt if there is a pending exception.
+	 * This collision arises if an exception occurred while vectoring the
+	 * injected event, KVM intercepted said exception, and KVM ultimately
+	 * determined the fault belongs to the guest and queues the exception
+	 * for injection back into the guest.
+	 *
+	 * "Injected" interrupts can also collide with pending exceptions if
+	 * userspace ignores the "ready for injection" flag and blindly queues
+	 * an interrupt.  In that case, prioritizing the exception is correct,
+	 * as the exception "occurred" before the exit to userspace.  Trap-like
+	 * exceptions, e.g. most #DBs, have higher priority than interrupts.
+	 * And while fault-like exceptions, e.g. #GP and #PF, are the lowest
+	 * priority, they're only generated (pended) during instruction
+	 * execution, and interrupts are recognized at instruction boundaries.
+	 * Thus a pending fault-like exception means the fault occurred on the
+	 * *previous* instruction and must be serviced prior to recognizing any
+	 * new events in order to fully complete the previous instruction.
+	 */
+	if (vcpu->arch.exception.injected)
 		kvm_inject_exception(vcpu);
-		can_inject = false;
-	}
+	else if (vcpu->arch.exception.pending)
+		; /* see above */
+	else if (vcpu->arch.nmi_injected)
+		static_call(kvm_x86_inject_nmi)(vcpu);
+	else if (vcpu->arch.interrupt.injected)
+		static_call(kvm_x86_inject_irq)(vcpu, true);
+
 	/*
-	 * Do not inject an NMI or interrupt if there is a pending
-	 * exception.  Exceptions and interrupts are recognized at
-	 * instruction boundaries, i.e. the start of an instruction.
-	 * Trap-like exceptions, e.g. #DB, have higher priority than
-	 * NMIs and interrupts, i.e. traps are recognized before an
-	 * NMI/interrupt that's pending on the same instruction.
-	 * Fault-like exceptions, e.g. #GP and #PF, are the lowest
-	 * priority, but are only generated (pended) during instruction
-	 * execution, i.e. a pending fault-like exception means the
-	 * fault occurred on the *previous* instruction and must be
-	 * serviced prior to recognizing any new events in order to
-	 * fully complete the previous instruction.
+	 * Exceptions that morph to VM-Exits are handled above, and pending
+	 * exceptions on top of injected exceptions that do not VM-Exit should
+	 * either morph to #DF or, sadly, override the injected exception.
 	 */
-	else if (!vcpu->arch.exception.pending) {
-		if (vcpu->arch.nmi_injected) {
-			static_call(kvm_x86_inject_nmi)(vcpu);
-			can_inject = false;
-		} else if (vcpu->arch.interrupt.injected) {
-			static_call(kvm_x86_inject_irq)(vcpu, true);
-			can_inject = false;
-		}
-	}
-
 	WARN_ON_ONCE(vcpu->arch.exception.injected &&
 		     vcpu->arch.exception.pending);

 	/*
-	 * Call check_nested_events() even if we reinjected a previous event
-	 * in order for caller to determine if it should require immediate-exit
-	 * from L2 to L1 due to pending L1 events which require exit
-	 * from L2 to L1.
+	 * Bail if immediate entry+exit to/from the guest is needed to complete
+	 * nested VM-Enter or event re-injection so that a different pending
+	 * event can be serviced (or if KVM needs to exit to userspace).
+	 *
+	 * Otherwise, continue processing events even if VM-Exit occurred.  The
+	 * VM-Exit will have cleared exceptions that were meant for L2, but
+	 * there may now be events that can be injected into L1.
 	 */
-	if (is_guest_mode(vcpu)) {
-		r = kvm_check_nested_events(vcpu);
-		if (r < 0)
-			goto out;
-	}
+	if (r < 0)
+		goto out;

 	/* try to inject new event if pending */
 	if (vcpu->arch.exception.pending) {
--
2.37.2.672.g94769d06f0-goog

From nobody Tue Apr 7 05:23:32 2026
From: Sean Christopherson
Subject: [PATCH v5 18/27] KVM: x86: Evaluate ability to inject SMI/NMI/IRQ after potential VM-Exit
Date: Tue, 30 Aug 2022 23:16:05 +0000
Message-ID: <20220830231614.3580124-19-seanjc@google.com>
In-Reply-To: <20220830231614.3580124-1-seanjc@google.com>

Determine whether or not new events can be injected after checking
nested events.
If a VM-Exit occurred during nested event handling, any
previous event that needed re-injection is gone from KVM's perspective;
the event is captured in the vmc*12 VM-Exit information, but doesn't
exist in terms of what needs to be done for entry to L1.

Signed-off-by: Sean Christopherson
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/x86.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 534484318d52..57f10bfcb90d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9711,7 +9711,7 @@ static void kvm_inject_exception(struct kvm_vcpu *vcpu)

 static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 {
-	bool can_inject = !kvm_event_needs_reinjection(vcpu);
+	bool can_inject;
 	int r;

 	/*
@@ -9776,7 +9776,13 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
 	if (r < 0)
 		goto out;

-	/* try to inject new event if pending */
+	/*
+	 * New events, other than exceptions, cannot be injected if KVM needs
+	 * to re-inject a previous event.  See above comments on re-injecting
+	 * for why pending exceptions get priority.
+	 */
+	can_inject = !kvm_event_needs_reinjection(vcpu);
+
 	if (vcpu->arch.exception.pending) {
 		/*
 		 * Fault-class exceptions, except #DBs, set RF=1 in the RFLAGS
--
2.37.2.672.g94769d06f0-goog

From nobody Tue Apr 7 05:23:32 2026
From: Sean Christopherson
Subject: [PATCH v5 19/27] KVM: nVMX: Add a helper to identify low-priority #DB traps
Date: Tue, 30 Aug 2022 23:16:06 +0000
Message-ID: <20220830231614.3580124-20-seanjc@google.com>
In-Reply-To: <20220830231614.3580124-1-seanjc@google.com>

Add a helper to identify "low"-priority #DB traps, i.e. trap-like #DBs
that aren't TSS T flag #DBs, and tweak the related code to operate on
any queued exception.  A future commit will separate exceptions that
are intercepted by L1, i.e. cause nested VM-Exit, from those that do
NOT trigger nested VM-Exit.  I.e.
there will be multiple exception
structs and multiple invocations of the helpers.

No functional change intended.

Signed-off-by: Sean Christopherson
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/vmx/nested.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 4bc2250502ea..b76c69c50649 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3866,14 +3866,24 @@ static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu,
  * from the emulator (because such #DBs are fault-like and thus don't trigger
  * actions that fire on instruction retire).
  */
-static inline unsigned long vmx_get_pending_dbg_trap(struct kvm_vcpu *vcpu)
+static unsigned long vmx_get_pending_dbg_trap(struct kvm_queued_exception *ex)
 {
-	if (!vcpu->arch.exception.pending ||
-	    vcpu->arch.exception.vector != DB_VECTOR)
+	if (!ex->pending || ex->vector != DB_VECTOR)
 		return 0;

 	/* General Detect #DBs are always fault-like. */
-	return vcpu->arch.exception.payload & ~DR6_BD;
+	return ex->payload & ~DR6_BD;
+}
+
+/*
+ * Returns true if there's a pending #DB exception that is lower priority than
+ * a pending Monitor Trap Flag VM-Exit.  TSS T-flag #DBs are not emulated by
+ * KVM, but could theoretically be injected by userspace.  Note, this code is
+ * imperfect, see above.
+ */ +static bool vmx_is_low_priority_db_trap(struct kvm_queued_exception *ex) +{ + return vmx_get_pending_dbg_trap(ex) & ~DR6_BT; } /* @@ -3885,8 +3895,9 @@ static inline unsigned long vmx_get_pending_dbg_trap(struct kvm_vcpu *vcpu) */ static void nested_vmx_update_pending_dbg(struct kvm_vcpu *vcpu) { - unsigned long pending_dbg = vmx_get_pending_dbg_trap(vcpu); + unsigned long pending_dbg; + pending_dbg = vmx_get_pending_dbg_trap(&vcpu->arch.exception); if (pending_dbg) vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, pending_dbg); } @@ -3956,7 +3967,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu) * prioritize SMI over MTF and trap-like #DBs. */ if (vcpu->arch.exception.pending && - !(vmx_get_pending_dbg_trap(vcpu) & ~DR6_BT)) { + !vmx_is_low_priority_db_trap(&vcpu->arch.exception)) { if (block_nested_exceptions) return -EBUSY; if (!nested_vmx_check_exception(vcpu, &exit_qual)) -- 2.37.2.672.g94769d06f0-goog

From nobody Tue Apr 7 05:23:32 2026
Reply-To: Sean Christopherson
Date: Tue, 30 Aug 2022 23:16:07 +0000
In-Reply-To: <20220830231614.3580124-1-seanjc@google.com>
References: <20220830231614.3580124-1-seanjc@google.com>
Message-ID: <20220830231614.3580124-21-seanjc@google.com>
Subject: [PATCH v5 20/27] KVM: nVMX: Document
priority of all known events on Intel CPUs From: Sean Christopherson To: Sean Christopherson, Paolo Bonzini Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson, Maxim Levitsky, Oliver Upton, Peter Shier

Add a gigantic comment above vmx_check_nested_events() to document the priorities of all known events on Intel CPUs. Intel's SDM doesn't include VMX-specific events in its "Priority Among Concurrent Events" table, which makes it painfully difficult to suss out the correct priority between things like Monitor Trap Flag VM-Exits and pending #DBs. Kudos to Jim Mattson for doing the hard work of collecting and interpreting the priorities from various locations throughout the SDM (because putting them all in one place in the SDM would be too easy). Cc: Jim Mattson Signed-off-by: Sean Christopherson Reviewed-by: Maxim Levitsky --- arch/x86/kvm/vmx/nested.c | 83 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 83 insertions(+) diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index b76c69c50649..ec954ca8a0e3 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -3908,6 +3908,89 @@ static bool nested_vmx_preemption_timer_pending(struct kvm_vcpu *vcpu) to_vmx(vcpu)->nested.preemption_timer_expired; } +/* + * Per the Intel SDM's table "Priority Among Concurrent Events", with minor + * edits to fill in missing examples, e.g. #DB due to split-lock accesses, + * and less minor edits to splice in the priority of VMX Non-Root specific + * events, e.g. MTF and NMI/INTR-window exiting.
+ * + * 1 Hardware Reset and Machine Checks + * - RESET + * - Machine Check + * + * 2 Trap on Task Switch + * - T flag in TSS is set (on task switch) + * + * 3 External Hardware Interventions + * - FLUSH + * - STOPCLK + * - SMI + * - INIT + * + * 3.5 Monitor Trap Flag (MTF) VM-exit[1] + * + * 4 Traps on Previous Instruction + * - Breakpoints + * - Trap-class Debug Exceptions (#DB due to TF flag set, data/I-O + * breakpoint, or #DB due to a split-lock access) + * + * 4.3 VMX-preemption timer expired VM-exit + * + * 4.6 NMI-window exiting VM-exit[2] + * + * 5 Nonmaskable Interrupts (NMI) + * + * 5.5 Interrupt-window exiting VM-exit and Virtual-interrupt delivery + * + * 6 Maskable Hardware Interrupts + * + * 7 Code Breakpoint Fault + * + * 8 Faults from Fetching Next Instruction + * - Code-Segment Limit Violation + * - Code Page Fault + * - Control protection exception (missing ENDBRANCH at target of indirect + * call or jump) + * + * 9 Faults from Decoding Next Instruction + * - Instruction length > 15 bytes + * - Invalid Opcode + * - Coprocessor Not Available + * + *10 Faults on Executing Instruction + * - Overflow + * - Bound error + * - Invalid TSS + * - Segment Not Present + * - Stack fault + * - General Protection + * - Data Page Fault + * - Alignment Check + * - x86 FPU Floating-point exception + * - SIMD floating-point exception + * - Virtualization exception + * - Control protection exception + * + * [1] Per the "Monitor Trap Flag" section: System-management interrupts (SMIs), + * INIT signals, and higher priority events take priority over MTF VM exits. + * MTF VM exits take priority over debug-trap exceptions and lower priority + * events. + * + * [2] Debug-trap exceptions and higher priority events take priority over VM exits + * caused by the VMX-preemption timer. VM exits caused by the VMX-preemption + * timer take priority over VM exits caused by the "NMI-window exiting" + * VM-execution control and lower priority events.
+ * [3] Debug-trap exceptions and higher priority events take priority over VM exits + * caused by "NMI-window exiting". VM exits caused by this control take + * priority over non-maskable interrupts (NMIs) and lower priority events. + * + * [4] Virtual-interrupt delivery has the same priority as that of VM exits due to + * the 1-setting of the "interrupt-window exiting" VM-execution control. Thus, + * non-maskable interrupts (NMIs) and higher priority events take priority over + * delivery of a virtual interrupt; delivery of a virtual interrupt takes + * priority over external interrupts and lower priority events. + */ static int vmx_check_nested_events(struct kvm_vcpu *vcpu) { struct kvm_lapic *apic = vcpu->arch.apic; -- 2.37.2.672.g94769d06f0-goog

From nobody Tue Apr 7 05:23:32 2026
Reply-To: Sean Christopherson
Date: Tue, 30 Aug 2022 23:16:08 +0000
In-Reply-To: <20220830231614.3580124-1-seanjc@google.com>
References: <20220830231614.3580124-1-seanjc@google.com>
Message-ID: <20220830231614.3580124-22-seanjc@google.com>
Subject: [PATCH v5 21/27] KVM: x86: Morph pending exceptions to pending VM-Exits at queue time
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, Jim Mattson, Maxim Levitsky, Oliver Upton, Peter Shier

Morph pending exceptions to pending VM-Exits (due to interception) when the exception is queued instead of waiting until nested events are checked at VM-Entry. This fixes a longstanding bug where KVM fails to handle an exception that occurs during delivery of a previous exception, KVM (L0) and L1 both want to intercept the exception (e.g. #PF for shadow paging), and KVM determines that the exception is in the guest's domain, i.e. queues the new exception for L2. Deferring the interception check causes KVM to escalate various combinations of injected+pending exceptions to double fault (#DF) without consulting L1's interception desires, and ends up injecting a spurious #DF into L2. KVM has fudged around the issue for #PF by special casing emulated #PF injection for shadow paging, but the underlying issue is not unique to shadow paging in L0, e.g. if KVM is intercepting #PF because the guest has a smaller maxphyaddr and L1 (but not L0) is using shadow paging. Other exceptions are affected as well, e.g. if KVM is intercepting #GP for one of SVM's workarounds or for the VMware backdoor emulation stuff. The other cases have gone unnoticed because the #DF is spurious if and only if L1 resolves the exception, e.g. KVM's goofs go unnoticed if L1 would have injected #DF anyways. The hack-a-fix has also led to ugly code, e.g. bailing from the emulator if #PF injection forced a nested VM-Exit and the emulator finds itself back in L1. Allowing for direct-to-VM-Exit queueing also neatly solves the async #PF in L2 mess; no need to set a magic flag and token, simply queue a #PF nested VM-Exit. Deal with event migration by flagging that a pending exception was queued by userspace and checking for interception at the next KVM_RUN, e.g.
so that KVM does the right thing regardless of the order in which userspace restores nested state vs. event state. When "getting" events from userspace, simply drop any pending exception that is destined to be intercepted if there is also an injected exception to be migrated. Ideally, KVM would migrate both events, but that would require new ABI, and practically speaking losing the event is unlikely to be noticed, let alone fatal. The injected exception is captured, RIP still points at the original faulting instruction, etc... So either the injection on the target will trigger the same intercepted exception, or the source of the intercepted exception was transient and/or non-deterministic, thus dropping it is ok-ish. Fixes: a04aead144fd ("KVM: nSVM: fix running nested guests when npt=0") Fixes: feaf0c7dc473 ("KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2") Cc: Jim Mattson Signed-off-by: Sean Christopherson Reviewed-by: Maxim Levitsky --- arch/x86/include/asm/kvm_host.h | 12 ++- arch/x86/kvm/svm/nested.c | 45 +++------ arch/x86/kvm/vmx/nested.c | 109 ++++++++++------------ arch/x86/kvm/vmx/vmx.c | 6 +- arch/x86/kvm/x86.c | 159 ++++++++++++++++++++++---------- arch/x86/kvm/x86.h | 7 ++ 6 files changed, 188 insertions(+), 150 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 624a0676a8f9..1065e86ed21a 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -647,7 +647,6 @@ struct kvm_queued_exception { u32 error_code; unsigned long payload; bool has_payload; - u8 nested_apf; }; struct kvm_vcpu_arch { @@ -748,8 +747,12 @@ struct kvm_vcpu_arch { u8 event_exit_inst_len; + bool exception_from_userspace; + /* Exceptions to be injected to the guest. */ struct kvm_queued_exception exception; + /* Exception VM-Exits to be synthesized to L1.
*/ + struct kvm_queued_exception exception_vmexit; struct kvm_queued_interrupt { bool injected; @@ -860,7 +863,6 @@ struct kvm_vcpu_arch { u32 id; bool send_user_only; u32 host_apf_flags; - unsigned long nested_apf_token; bool delivery_as_pf_vmexit; bool pageready_pending; } apf; @@ -1636,9 +1638,9 @@ struct kvm_x86_ops { struct kvm_x86_nested_ops { void (*leave_nested)(struct kvm_vcpu *vcpu); + bool (*is_exception_vmexit)(struct kvm_vcpu *vcpu, u8 vector, + u32 error_code); int (*check_events)(struct kvm_vcpu *vcpu); - bool (*handle_page_fault_workaround)(struct kvm_vcpu *vcpu, - struct x86_exception *fault); bool (*hv_timer_pending)(struct kvm_vcpu *vcpu); void (*triple_fault)(struct kvm_vcpu *vcpu); int (*get_state)(struct kvm_vcpu *vcpu, @@ -1865,7 +1867,7 @@ void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr, unsigned long pay void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr); void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code); void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault); -bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu, +void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault); bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl); bool kvm_require_dr(struct kvm_vcpu *vcpu, int dr); diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index a6111392985c..405075286965 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -55,28 +55,6 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu, nested_svm_vmexit(svm); } -static bool nested_svm_handle_page_fault_workaround(struct kvm_vcpu *vcpu, - struct x86_exception *fault) -{ - struct vcpu_svm *svm = to_svm(vcpu); - struct vmcb *vmcb = svm->vmcb; - - WARN_ON(!is_guest_mode(vcpu)); - - if (vmcb12_is_intercept(&svm->nested.ctl, - INTERCEPT_EXCEPTION_OFFSET + PF_VECTOR) && -
!WARN_ON_ONCE(svm->nested.nested_run_pending)) { - vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + PF_VECTOR; - vmcb->control.exit_code_hi = 0; - vmcb->control.exit_info_1 = fault->error_code; - vmcb->control.exit_info_2 = fault->address; - nested_svm_vmexit(svm); - return true; - } - - return false; -} - static u64 nested_svm_get_tdp_pdptr(struct kvm_vcpu *vcpu, int index) { struct vcpu_svm *svm = to_svm(vcpu); @@ -1304,16 +1282,17 @@ int nested_svm_check_permissions(struct kvm_vcpu *vcpu) return 0; } -static bool nested_exit_on_exception(struct vcpu_svm *svm) +static bool nested_svm_is_exception_vmexit(struct kvm_vcpu *vcpu, u8 vector, + u32 error_code) { - unsigned int vector = svm->vcpu.arch.exception.vector; + struct vcpu_svm *svm = to_svm(vcpu); return (svm->nested.ctl.intercepts[INTERCEPT_EXCEPTION] & BIT(vector)); } static void nested_svm_inject_exception_vmexit(struct kvm_vcpu *vcpu) { - struct kvm_queued_exception *ex = &vcpu->arch.exception; + struct kvm_queued_exception *ex = &vcpu->arch.exception_vmexit; struct vcpu_svm *svm = to_svm(vcpu); struct vmcb *vmcb = svm->vmcb; @@ -1328,9 +1307,7 @@ static void nested_svm_inject_exception_vmexit(struct kvm_vcpu *vcpu) * than #PF.
*/ if (ex->vector == PF_VECTOR) { - if (ex->nested_apf) - vmcb->control.exit_info_2 = vcpu->arch.apf.nested_apf_token; - else if (ex->has_payload) + if (ex->has_payload) vmcb->control.exit_info_2 = ex->payload; else vmcb->control.exit_info_2 = vcpu->arch.cr2; @@ -1383,15 +1360,19 @@ static int svm_check_nested_events(struct kvm_vcpu *vcpu) return 0; } - if (vcpu->arch.exception.pending) { + if (vcpu->arch.exception_vmexit.pending) { if (block_nested_exceptions) return -EBUSY; - if (!nested_exit_on_exception(svm)) - return 0; nested_svm_inject_exception_vmexit(vcpu); return 0; } + if (vcpu->arch.exception.pending) { + if (block_nested_exceptions) + return -EBUSY; + return 0; + } + if (vcpu->arch.smi_pending && !svm_smi_blocked(vcpu)) { if (block_nested_events) return -EBUSY; @@ -1729,8 +1710,8 @@ static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu) struct kvm_x86_nested_ops svm_nested_ops = { .leave_nested = svm_leave_nested, + .is_exception_vmexit = nested_svm_is_exception_vmexit, .check_events = svm_check_nested_events, - .handle_page_fault_workaround = nested_svm_handle_page_fault_workaround, .triple_fault = nested_svm_triple_fault, .get_nested_state_pages = svm_get_nested_state_pages, .get_state = svm_get_nested_state, diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index ec954ca8a0e3..dfa6cf173f2b 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -439,59 +439,22 @@ static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12, return inequality ^ bit; } - -/* - * KVM wants to inject page-faults which it got to the guest. This function - * checks whether in a nested guest, we need to inject them to L1 or L2.
- */ -static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned long *exit_qual) -{ - struct kvm_queued_exception *ex = &vcpu->arch.exception; - struct vmcs12 *vmcs12 = get_vmcs12(vcpu); - - if (ex->vector == PF_VECTOR) { - if (ex->nested_apf) { - *exit_qual = vcpu->arch.apf.nested_apf_token; - return 1; - } - if (nested_vmx_is_page_fault_vmexit(vmcs12, ex->error_code)) { - *exit_qual = ex->has_payload ? ex->payload : vcpu->arch.cr2; - return 1; - } - } else if (vmcs12->exception_bitmap & (1u << ex->vector)) { - if (ex->vector == DB_VECTOR) { - if (ex->has_payload) { - *exit_qual = ex->payload; - } else { - *exit_qual = vcpu->arch.dr6; - *exit_qual &= ~DR6_BT; - *exit_qual ^= DR6_ACTIVE_LOW; - } - } else - *exit_qual = 0; - return 1; - } - - return 0; -} - -static bool nested_vmx_handle_page_fault_workaround(struct kvm_vcpu *vcpu, - struct x86_exception *fault) +static bool nested_vmx_is_exception_vmexit(struct kvm_vcpu *vcpu, u8 vector, + u32 error_code) { struct vmcs12 *vmcs12 = get_vmcs12(vcpu); - WARN_ON(!is_guest_mode(vcpu)); + /* + * Drop bits 31:16 of the error code when performing the #PF mask+match + * check. All VMCS fields involved are 32 bits, but Intel CPUs never + * set bits 31:16 and VMX disallows setting bits 31:16 in the injected + * error code. Including the to-be-dropped bits in the check might + * result in an "impossible" or missed exit from L1's perspective.
+ */ + if (vector == PF_VECTOR) + return nested_vmx_is_page_fault_vmexit(vmcs12, (u16)error_code); - if (nested_vmx_is_page_fault_vmexit(vmcs12, fault->error_code) && - !WARN_ON_ONCE(to_vmx(vcpu)->nested.nested_run_pending)) { - vmcs12->vm_exit_intr_error_code = fault->error_code; - nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI, - PF_VECTOR | INTR_TYPE_HARD_EXCEPTION | - INTR_INFO_DELIVER_CODE_MASK | INTR_INFO_VALID_MASK, - fault->address); - return true; - } - return false; + return (vmcs12->exception_bitmap & (1u << vector)); } static int nested_vmx_check_io_bitmap_controls(struct kvm_vcpu *vcpu, @@ -3817,12 +3780,24 @@ static int vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu) return -ENXIO; } -static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu, - unsigned long exit_qual) +static void nested_vmx_inject_exception_vmexit(struct kvm_vcpu *vcpu) { - struct kvm_queued_exception *ex = &vcpu->arch.exception; + struct kvm_queued_exception *ex = &vcpu->arch.exception_vmexit; u32 intr_info = ex->vector | INTR_INFO_VALID_MASK; struct vmcs12 *vmcs12 = get_vmcs12(vcpu); + unsigned long exit_qual; + + if (ex->has_payload) { + exit_qual = ex->payload; + } else if (ex->vector == PF_VECTOR) { + exit_qual = vcpu->arch.cr2; + } else if (ex->vector == DB_VECTOR) { + exit_qual = vcpu->arch.dr6; + exit_qual &= ~DR6_BT; + exit_qual ^= DR6_ACTIVE_LOW; + } else { + exit_qual = 0; + } if (ex->has_error_code) { /* @@ -3995,7 +3970,6 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu) { struct kvm_lapic *apic = vcpu->arch.apic; struct vcpu_vmx *vmx = to_vmx(vcpu); - unsigned long exit_qual; /* * Only a pending nested run blocks a pending exception.
If there is a * previously injected event, the pending exception occurred while said @@ -4049,14 +4023,20 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu) * across SMI/RSM as it should; that needs to be addressed in order to * prioritize SMI over MTF and trap-like #DBs. */ + if (vcpu->arch.exception_vmexit.pending && + !vmx_is_low_priority_db_trap(&vcpu->arch.exception_vmexit)) { + if (block_nested_exceptions) + return -EBUSY; + + nested_vmx_inject_exception_vmexit(vcpu); + return 0; + } + if (vcpu->arch.exception.pending && !vmx_is_low_priority_db_trap(&vcpu->arch.exception)) { if (block_nested_exceptions) return -EBUSY; - if (!nested_vmx_check_exception(vcpu, &exit_qual)) - goto no_vmexit; - nested_vmx_inject_exception_vmexit(vcpu, exit_qual); - return 0; + goto no_vmexit; } if (vmx->nested.mtf_pending) { @@ -4067,13 +4047,18 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu) return 0; } + if (vcpu->arch.exception_vmexit.pending) { + if (block_nested_exceptions) + return -EBUSY; + + nested_vmx_inject_exception_vmexit(vcpu); + return 0; + } + if (vcpu->arch.exception.pending) { if (block_nested_exceptions) return -EBUSY; - if (!nested_vmx_check_exception(vcpu, &exit_qual)) - goto no_vmexit; - nested_vmx_inject_exception_vmexit(vcpu, exit_qual); - return 0; + goto no_vmexit; } if (nested_vmx_preemption_timer_pending(vcpu)) { @@ -6946,8 +6931,8 @@ __init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *)) struct kvm_x86_nested_ops vmx_nested_ops = { .leave_nested = vmx_leave_nested, + .is_exception_vmexit = nested_vmx_is_exception_vmexit, .check_events = vmx_check_nested_events, - .handle_page_fault_workaround = nested_vmx_handle_page_fault_workaround, .hv_timer_pending = nested_vmx_preemption_timer_pending, .triple_fault = nested_vmx_triple_fault, .get_state = vmx_get_nested_state, diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 07c4246415e9..c6fc37415fc6 100644 ---
a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -1659,7 +1659,9 @@ static void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu) */ if (nested_cpu_has_mtf(vmcs12) && (!vcpu->arch.exception.pending || - vcpu->arch.exception.vector == DB_VECTOR)) + vcpu->arch.exception.vector == DB_VECTOR) && + (!vcpu->arch.exception_vmexit.pending || + vcpu->arch.exception_vmexit.vector == DB_VECTOR)) vmx->nested.mtf_pending = true; else vmx->nested.mtf_pending = false; @@ -5718,7 +5720,7 @@ static bool vmx_emulation_required_with_pending_exception(struct kvm_vcpu *vcpu) struct vcpu_vmx *vmx = to_vmx(vcpu); return vmx->emulation_required && !vmx->rmode.vm86_active && - (vcpu->arch.exception.pending || vcpu->arch.exception.injected); + (kvm_is_exception_pending(vcpu) || vcpu->arch.exception.injected); } static int handle_invalid_guest_state(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 57f10bfcb90d..17cf43ca42c3 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -613,6 +613,21 @@ void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu, } EXPORT_SYMBOL_GPL(kvm_deliver_exception_payload); +static void kvm_queue_exception_vmexit(struct kvm_vcpu *vcpu, unsigned int vector, + bool has_error_code, u32 error_code, + bool has_payload, unsigned long payload) +{ + struct kvm_queued_exception *ex = &vcpu->arch.exception_vmexit; + + ex->vector = vector; + ex->injected = false; + ex->pending = true; + ex->has_error_code = has_error_code; + ex->error_code = error_code; + ex->has_payload = has_payload; + ex->payload = payload; +} + static void kvm_multiple_exception(struct kvm_vcpu *vcpu, unsigned nr, bool has_error, u32 error_code, bool has_payload, unsigned long payload, bool reinject) @@ -622,18 +637,31 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu, kvm_make_request(KVM_REQ_EVENT, vcpu); + /* + * If the exception is destined for L2 and isn't being reinjected, +
* morph it to a VM-Exit if L1 wants to intercept the exception. A + * previously injected exception is not checked because it was checked + * when it was originally queued, and re-checking is incorrect if _L1_ + * injected the exception, in which case it's exempt from interception. + */ + if (!reinject && is_guest_mode(vcpu) && + kvm_x86_ops.nested_ops->is_exception_vmexit(vcpu, nr, error_code)) { + kvm_queue_exception_vmexit(vcpu, nr, has_error, error_code, + has_payload, payload); + return; + } + if (!vcpu->arch.exception.pending && !vcpu->arch.exception.injected) { queue: if (reinject) { /* - * On vmentry, vcpu->arch.exception.pending is only - * true if an event injection was blocked by - * nested_run_pending. In that case, however, - * vcpu_enter_guest requests an immediate exit, - * and the guest shouldn't proceed far enough to - * need reinjection. + * On VM-Entry, an exception can be pending if and only + * if event injection was blocked by nested_run_pending. + * In that case, however, vcpu_enter_guest() requests an + * immediate exit, and the guest shouldn't proceed far + * enough to need reinjection. */ - WARN_ON_ONCE(vcpu->arch.exception.pending); + WARN_ON_ONCE(kvm_is_exception_pending(vcpu)); vcpu->arch.exception.injected = true; if (WARN_ON_ONCE(has_payload)) { /* @@ -736,20 +764,22 @@ static int complete_emulated_insn_gp(struct kvm_vcpu *vcpu, int err) void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault) { ++vcpu->stat.pf_guest; - vcpu->arch.exception.nested_apf = - is_guest_mode(vcpu) && fault->async_page_fault; - if (vcpu->arch.exception.nested_apf) { - vcpu->arch.apf.nested_apf_token = fault->address; - kvm_queue_exception_e(vcpu, PF_VECTOR, fault->error_code); - } else { + + /* + * Async #PF in L2 is always forwarded to L1 as a VM-Exit regardless of + * whether or not L1 wants to intercept "regular" #PF.
+ */ + if (is_guest_mode(vcpu) && fault->async_page_fault) + kvm_queue_exception_vmexit(vcpu, PF_VECTOR, + true, fault->error_code, + true, fault->address); + else kvm_queue_exception_e_p(vcpu, PF_VECTOR, fault->error_code, fault->address); - } } EXPORT_SYMBOL_GPL(kvm_inject_page_fault); -/* Returns true if the page fault was immediately morphed into a VM-Exit. */ -bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu, +void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault) { struct kvm_mmu *fault_mmu; @@ -767,26 +797,7 @@ bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu, kvm_mmu_invalidate_gva(vcpu, fault_mmu, fault->address, fault_mmu->root.hpa); - /* - * A workaround for KVM's bad exception handling. If KVM injected an - * exception into L2, and L2 encountered a #PF while vectoring the - * injected exception, manually check to see if L1 wants to intercept - * #PF, otherwise queuing the #PF will lead to #DF or a lost exception. - * In all other cases, defer the check to nested_ops->check_events(), - * which will correctly handle priority (this does not). Note, other - * exceptions, e.g. #GP, are theoretically affected, #PF is simply the - * most problematic, e.g. when L0 and L1 are both intercepting #PF for - * shadow paging. - * - * TODO: Rewrite exception handling to track injected and pending - * (VM-Exit) exceptions separately.
- */ - if (unlikely(vcpu->arch.exception.injected && is_guest_mode(vcpu)) && - kvm_x86_ops.nested_ops->handle_page_fault_workaround(vcpu, fault)) - return true; - fault_mmu->inject_page_fault(vcpu, fault); - return false; } EXPORT_SYMBOL_GPL(kvm_inject_emulated_page_fault); @@ -4835,7 +4846,7 @@ static int kvm_vcpu_ready_for_interrupt_injection(struct kvm_vcpu *vcpu) return (kvm_arch_interrupt_allowed(vcpu) && kvm_cpu_accept_dm_intr(vcpu) && !kvm_event_needs_reinjection(vcpu) && - !vcpu->arch.exception.pending); + !kvm_is_exception_pending(vcpu)); } static int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, @@ -5010,13 +5021,27 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu, static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu, struct kvm_vcpu_events *events) { - struct kvm_queued_exception *ex = &vcpu->arch.exception; + struct kvm_queued_exception *ex; process_nmi(vcpu); if (kvm_check_request(KVM_REQ_SMI, vcpu)) process_smi(vcpu); + /* + * KVM's ABI only allows for one exception to be migrated. Luckily, + * the only time there can be two queued exceptions is if there's a + * non-exiting _injected_ exception, and a pending exiting exception. + * In that case, ignore the VM-Exiting exception as it's an extension + * of the injected exception. + */ + if (vcpu->arch.exception_vmexit.pending && + !vcpu->arch.exception.pending && + !vcpu->arch.exception.injected) + ex = &vcpu->arch.exception_vmexit; + else + ex = &vcpu->arch.exception; + /* * In guest mode, payload delivery should be deferred if the exception * will be intercepted by L1, e.g. KVM should not modify CR2 if L1 @@ -5123,6 +5148,19 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu, return -EINVAL; process_nmi(vcpu); + + /* + * Flag that userspace is stuffing an exception, the next KVM_RUN will + * morph the exception to a VM-Exit if appropriate.
Do this only for + * pending exceptions; already-injected exceptions are not subject to + * interception. Note, userspace that conflates pending and injected + * is hosed, and will incorrectly convert an injected exception into a + * pending exception, which in turn may cause a spurious VM-Exit. + */ + vcpu->arch.exception_from_userspace = events->exception.pending; + + vcpu->arch.exception_vmexit.pending = false; + vcpu->arch.exception.injected = events->exception.injected; vcpu->arch.exception.pending = events->exception.pending; vcpu->arch.exception.vector = events->exception.nr; @@ -8155,18 +8193,17 @@ static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask) } } -static bool inject_emulated_exception(struct kvm_vcpu *vcpu) +static void inject_emulated_exception(struct kvm_vcpu *vcpu) { struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt; + if (ctxt->exception.vector == PF_VECTOR) - return kvm_inject_emulated_page_fault(vcpu, &ctxt->exception); - - if (ctxt->exception.error_code_valid) + kvm_inject_emulated_page_fault(vcpu, &ctxt->exception); + else if (ctxt->exception.error_code_valid) kvm_queue_exception_e(vcpu, ctxt->exception.vector, ctxt->exception.error_code); else kvm_queue_exception(vcpu, ctxt->exception.vector); - return false; } static struct x86_emulate_ctxt *alloc_emulate_ctxt(struct kvm_vcpu *vcpu) @@ -8801,8 +8838,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, if (ctxt->have_exception) { r = 1; - if (inject_emulated_exception(vcpu)) - return r; + inject_emulated_exception(vcpu); } else if (vcpu->arch.pio.count) { if (!vcpu->arch.pio.in) { /* FIXME: return into emulator if single-stepping.
*/ @@ -9749,7 +9785,7 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit) */ if (vcpu->arch.exception.injected) kvm_inject_exception(vcpu); - else if (vcpu->arch.exception.pending) + else if (kvm_is_exception_pending(vcpu)) ; /* see above */ else if (vcpu->arch.nmi_injected) static_call(kvm_x86_inject_nmi)(vcpu); @@ -9776,6 +9812,14 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit) if (r < 0) goto out; + /* + * A pending exception VM-Exit should either result in nested VM-Exit + * or force an immediate re-entry and exit to/from L2, and exception + * VM-Exits cannot be injected (flag should _never_ be set). + */ + WARN_ON_ONCE(vcpu->arch.exception_vmexit.injected || + vcpu->arch.exception_vmexit.pending); + /* * New events, other than exceptions, cannot be injected if KVM needs * to re-inject a previous event. See above comments on re-injecting @@ -9875,7 +9919,7 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit) kvm_x86_ops.nested_ops->hv_timer_pending(vcpu)) *req_immediate_exit = true; - WARN_ON(vcpu->arch.exception.pending); + WARN_ON(kvm_is_exception_pending(vcpu)); return 0; out: @@ -10893,6 +10937,7 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu) int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) { + struct kvm_queued_exception *ex = &vcpu->arch.exception; struct kvm_run *kvm_run = vcpu->run; int r; @@ -10951,6 +10996,21 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) } } + /* + * If userspace set a pending exception and L2 is active, convert it to + * a pending VM-Exit if L1 wants to intercept the exception.
+ */ + if (vcpu->arch.exception_from_userspace && is_guest_mode(vcpu) && + kvm_x86_ops.nested_ops->is_exception_vmexit(vcpu, ex->vector, + ex->error_code)) { + kvm_queue_exception_vmexit(vcpu, ex->vector, + ex->has_error_code, ex->error_code, + ex->has_payload, ex->payload); + ex->injected = false; + ex->pending = false; + } + vcpu->arch.exception_from_userspace = false; + if (unlikely(vcpu->arch.complete_userspace_io)) { int (*cui)(struct kvm_vcpu *) = vcpu->arch.complete_userspace_io; vcpu->arch.complete_userspace_io = NULL; @@ -11057,6 +11117,7 @@ static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) kvm_set_rflags(vcpu, regs->rflags | X86_EFLAGS_FIXED); vcpu->arch.exception.pending = false; + vcpu->arch.exception_vmexit.pending = false; kvm_make_request(KVM_REQ_EVENT, vcpu); } @@ -11424,7 +11485,7 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, if (dbg->control & (KVM_GUESTDBG_INJECT_DB | KVM_GUESTDBG_INJECT_BP)) { r = -EBUSY; - if (vcpu->arch.exception.pending) + if (kvm_is_exception_pending(vcpu)) goto out; if (dbg->control & KVM_GUESTDBG_INJECT_DB) kvm_queue_exception(vcpu, DB_VECTOR); @@ -12652,7 +12713,7 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu) if (vcpu->arch.pv.pv_unhalted) return true; - if (vcpu->arch.exception.pending) + if (kvm_is_exception_pending(vcpu)) return true; if (kvm_test_request(KVM_REQ_NMI, vcpu) || @@ -12907,7 +12968,7 @@ bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu) { if (unlikely(!lapic_in_kernel(vcpu) || kvm_event_needs_reinjection(vcpu) || - vcpu->arch.exception.pending)) + kvm_is_exception_pending(vcpu))) return false; if (kvm_hlt_in_guest(vcpu->kvm) && !kvm_can_deliver_async_pf(vcpu)) diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 4147d27f9fbc..256745d1a2c3 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -82,10 +82,17 @@ static inline unsigned int __shrink_ple_window(unsigned int val, void
kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu); int kvm_check_nested_events(struct kvm_vcpu *vcpu); +static inline bool kvm_is_exception_pending(struct kvm_vcpu *vcpu) +{ + return vcpu->arch.exception.pending || + vcpu->arch.exception_vmexit.pending; +} + static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu) { vcpu->arch.exception.pending = false; vcpu->arch.exception.injected = false; + vcpu->arch.exception_vmexit.pending = false; } static inline void kvm_queue_interrupt(struct kvm_vcpu *vcpu, u8 vector, -- 2.37.2.672.g94769d06f0-goog From nobody Tue Apr 7 05:23:32 2026
Reply-To: Sean Christopherson Date: Tue, 30 Aug 2022 23:16:09 +0000 In-Reply-To: <20220830231614.3580124-1-seanjc@google.com> Mime-Version: 1.0 References: <20220830231614.3580124-1-seanjc@google.com> X-Mailer: git-send-email 2.37.2.672.g94769d06f0-goog Message-ID: <20220830231614.3580124-23-seanjc@google.com> Subject: [PATCH v5 22/27] KVM: x86: Treat pending TRIPLE_FAULT requests as pending exceptions From: Sean Christopherson To: Sean Christopherson , Paolo Bonzini Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson , Maxim Levitsky , Oliver Upton , Peter Shier Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8" Treat pending TRIPLE_FAULTS as pending exceptions. A triple fault is an exception for all intents and purposes; it's just not tracked as such because there's no vector associated with the exception. E.g. if userspace were to set vcpu->request_interrupt_window while running L2 and L2 hit a triple fault, a triple fault nested VM-Exit should be synthesized to L1 before exiting to userspace with KVM_EXIT_IRQ_WINDOW_OPEN. Link: https://lore.kernel.org/all/YoVHAIGcFgJit1qp@google.com Signed-off-by: Sean Christopherson Reviewed-by: Maxim Levitsky --- arch/x86/kvm/x86.c | 3 --- arch/x86/kvm/x86.h | 3 ++- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 17cf43ca42c3..d004e18c7cdb 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -12742,9 +12742,6 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu) if (kvm_xen_has_pending_events(vcpu)) return true; - if (kvm_test_request(KVM_REQ_TRIPLE_FAULT, vcpu)) - return true; - return false; } diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 256745d1a2c3..a784ff90740b 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -85,7 +85,8 @@ int kvm_check_nested_events(struct kvm_vcpu *vcpu); static inline bool kvm_is_exception_pending(struct kvm_vcpu *vcpu) { return vcpu->arch.exception.pending || - vcpu->arch.exception_vmexit.pending; + vcpu->arch.exception_vmexit.pending || + kvm_test_request(KVM_REQ_TRIPLE_FAULT, vcpu); } static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu) -- 2.37.2.672.g94769d06f0-goog From nobody Tue Apr 7 05:23:32 2026
Reply-To: Sean Christopherson Date: Tue, 30 Aug 2022 23:16:10 +0000 In-Reply-To: <20220830231614.3580124-1-seanjc@google.com> Mime-Version: 1.0 References: <20220830231614.3580124-1-seanjc@google.com> X-Mailer: git-send-email 2.37.2.672.g94769d06f0-goog Message-ID: <20220830231614.3580124-24-seanjc@google.com> Subject: [PATCH v5 23/27] KVM: VMX: Update MTF and ICEBP comments to document KVM's subtle behavior From: Sean Christopherson To: Sean Christopherson , Paolo Bonzini Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson , Maxim Levitsky , Oliver Upton , Peter Shier Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Document the oddities of ICEBP interception (trap-like #DB is intercepted as a fault-like exception), and how using VMX's inner "skip" helper deliberately bypasses the pending MTF and single-step #DB logic. No functional change intended. Signed-off-by: Sean Christopherson Reviewed-by: Maxim Levitsky --- arch/x86/kvm/vmx/vmx.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index c6fc37415fc6..8875d9c448c2 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -1652,9 +1652,13 @@ static void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu) /* * Per the SDM, MTF takes priority over debug-trap exceptions besides - * T-bit traps. As instruction emulation is completed (i.e. at the - * instruction boundary), any #DB exception pending delivery must be a - * debug-trap.
Record the pending MTF state to be delivered in + * TSS T-bit traps and ICEBP (INT1). KVM doesn't emulate T-bit traps + * or ICEBP (in the emulator proper), and skipping of ICEBP after an + * intercepted #DB deliberately avoids single-step #DB and MTF updates + * as ICEBP is higher priority than both. As instruction emulation is + * completed at this point (i.e. KVM is at the instruction boundary), + * any #DB exception pending delivery must be a debug-trap of lower + * priority than MTF. Record the pending MTF state to be delivered in * vmx_check_nested_events(). */ if (nested_cpu_has_mtf(vmcs12) && @@ -5165,8 +5169,10 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu) * instruction. ICEBP generates a trap-like #DB, but * despite its interception control being tied to #DB, * is an instruction intercept, i.e. the VM-Exit occurs - * on the ICEBP itself. Note, skipping ICEBP also - * clears STI and MOVSS blocking. + * on the ICEBP itself. Use the inner "skip" helper to + * avoid single-step #DB and MTF updates, as ICEBP is + * higher priority. Note, skipping ICEBP still clears + * STI and MOVSS blocking.
* * For all other #DBs, set vmcs.PENDING_DBG_EXCEPTIONS.BS * if single-step is enabled in RFLAGS and STI or MOVSS -- 2.37.2.672.g94769d06f0-goog From nobody Tue Apr 7 05:23:32 2026
Reply-To: Sean Christopherson Date: Tue, 30 Aug 2022 23:16:11 +0000 In-Reply-To: <20220830231614.3580124-1-seanjc@google.com> Mime-Version: 1.0 References: <20220830231614.3580124-1-seanjc@google.com> X-Mailer: git-send-email 2.37.2.672.g94769d06f0-goog Message-ID: <20220830231614.3580124-25-seanjc@google.com> Subject: [PATCH v5 24/27] KVM: x86: Rename inject_pending_events() to kvm_check_and_inject_events() From: Sean Christopherson To: Sean Christopherson , Paolo Bonzini Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson , Maxim Levitsky , Oliver Upton , Peter Shier Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Rename inject_pending_events() to kvm_check_and_inject_events() in order to capture the fact that it handles more than just pending events, and to (mostly) align with kvm_check_nested_events(), which omits the "inject" for brevity.
Add a comment above kvm_check_and_inject_events() to provide a high-level synopsis, and to document a virtualization hole (KVM erratum if you will) that exists due to KVM not strictly tracking instruction boundaries with respect to coincident instruction restarts and asynchronous events. No functional change intended. Signed-off-by: Sean Christopherson Reviewed-by: Maxim Levitsky --- arch/x86/kvm/svm/nested.c | 2 +- arch/x86/kvm/svm/svm.c | 2 +- arch/x86/kvm/x86.c | 46 ++++++++++++++++++++++++++++++++++++--- 3 files changed, 45 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index 405075286965..6b3b18404533 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -1312,7 +1312,7 @@ static void nested_svm_inject_exception_vmexit(struct kvm_vcpu *vcpu) else vmcb->control.exit_info_2 = vcpu->arch.cr2; } else if (ex->vector == DB_VECTOR) { - /* See inject_pending_event. */ + /* See kvm_check_and_inject_events(). */ kvm_deliver_exception_payload(vcpu, ex); if (vcpu->arch.dr7 & DR7_GD) { diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index dbd10d61f29d..fc6eae94aa61 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -3520,7 +3520,7 @@ void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode, /* Note, this is called iff the local APIC is in-kernel. */ if (!READ_ONCE(vcpu->arch.apic->apicv_active)) { - /* Process the interrupt via inject_pending_event */ + /* Process the interrupt via kvm_check_and_inject_events().
*/ kvm_make_request(KVM_REQ_EVENT, vcpu); kvm_vcpu_kick(vcpu); return; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index d004e18c7cdb..45f295d35cc9 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9745,7 +9745,47 @@ static void kvm_inject_exception(struct kvm_vcpu *vcpu) static_call(kvm_x86_inject_exception)(vcpu); } -static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit) +/* + * Check for any event (interrupt or exception) that is ready to be injected, + * and if there is at least one event, inject the event with the highest + * priority. This handles both "pending" events, i.e. events that have never + * been injected into the guest, and "injected" events, i.e. events that were + * injected as part of a previous VM-Enter, but weren't successfully delivered + * and need to be re-injected. + * + * Note, this is not guaranteed to be invoked on a guest instruction boundary, + * i.e. doesn't guarantee that there's an event window in the guest. KVM must + * be able to inject exceptions in the "middle" of an instruction, and so must + * also be able to re-inject NMIs and IRQs in the middle of an instruction. + * I.e. for exceptions and re-injected events, NOT invoking this on instruction + * boundaries is necessary and correct. + * + * For simplicity, KVM uses a single path to inject all events (except events + * that are injected directly from L1 to L2) and doesn't explicitly track + * instruction boundaries for asynchronous events. However, because VM-Exits + * that can occur during instruction execution typically result in KVM skipping + * the instruction or injecting an exception, e.g. instruction and exception + * intercepts, and because pending exceptions have higher priority than pending + * interrupts, KVM still honors instruction boundaries in most scenarios.
+ * + * But, if a VM-Exit occurs during instruction execution, and KVM does NOT skip + * the instruction or inject an exception, then KVM can incorrectly inject a new + * asynchronous event if the event became pending after the CPU fetched the + * instruction (in the guest). E.g. if a page fault (#PF, #NPF, EPT violation) + * occurs and is resolved by KVM, a coincident NMI, SMI, IRQ, etc... can be + * injected on the restarted instruction instead of being deferred until the + * instruction completes. + * + * In practice, this virtualization hole is unlikely to be observed by the + * guest, and even less likely to cause functional problems. To detect the + * hole, the guest would have to trigger an event on a side effect of an early + * phase of instruction execution, e.g. on the instruction fetch from memory. + * And for it to be a functional problem, the guest would need to depend on the + * ordering between that side effect, the instruction completing, _and_ the + * delivery of the asynchronous event. + */ +static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu, + bool *req_immediate_exit) { bool can_inject; int r; @@ -10224,7 +10264,7 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu) * When APICv gets disabled, we may still have injected interrupts * pending. At the same time, KVM_REQ_EVENT may not be set as APICv was * still active when the interrupt got accepted. Make sure - * inject_pending_event() is called to check for that. + * kvm_check_and_inject_events() is called to check for that.
*/ if (!apic->apicv_active) kvm_make_request(KVM_REQ_EVENT, vcpu); @@ -10521,7 +10561,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) goto out; } - r = inject_pending_event(vcpu, &req_immediate_exit); + r = kvm_check_and_inject_events(vcpu, &req_immediate_exit); if (r < 0) { r = 0; goto out; -- 2.37.2.672.g94769d06f0-goog From nobody Tue Apr 7 05:23:32 2026
Reply-To: Sean Christopherson Date: Tue, 30 Aug 2022 23:16:12 +0000 In-Reply-To: <20220830231614.3580124-1-seanjc@google.com> Mime-Version: 1.0 References: <20220830231614.3580124-1-seanjc@google.com> X-Mailer: git-send-email 2.37.2.672.g94769d06f0-goog Message-ID: <20220830231614.3580124-26-seanjc@google.com> Subject: [PATCH v5 25/27] KVM: selftests: Use uapi header to get VMX and SVM exit reasons/codes From: Sean Christopherson To: Sean Christopherson , Paolo Bonzini Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson , Maxim Levitsky , Oliver Upton , Peter Shier Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Include the vmx.h and svm.h uapi headers that KVM so kindly provides instead of manually defining all the same exit reasons/codes.
Signed-off-by: Sean Christopherson Reviewed-by: Maxim Levitsky --- .../selftests/kvm/include/x86_64/svm_util.h | 7 +-- .../selftests/kvm/include/x86_64/vmx.h | 51 +------------------ 2 files changed, 4 insertions(+), 54 deletions(-) diff --git a/tools/testing/selftests/kvm/include/x86_64/svm_util.h b/tools/testing/selftests/kvm/include/x86_64/svm_util.h index a339b537a575..7aee6244ab6a 100644 --- a/tools/testing/selftests/kvm/include/x86_64/svm_util.h +++ b/tools/testing/selftests/kvm/include/x86_64/svm_util.h @@ -9,15 +9,12 @@ #ifndef SELFTEST_KVM_SVM_UTILS_H #define SELFTEST_KVM_SVM_UTILS_H +#include + #include #include "svm.h" #include "processor.h" -#define SVM_EXIT_EXCP_BASE 0x040 -#define SVM_EXIT_HLT 0x078 -#define SVM_EXIT_MSR 0x07c -#define SVM_EXIT_VMMCALL 0x081 - struct svm_test_data { /* VMCB */ struct vmcb *vmcb; /* gva */ diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h index 99fa1410964c..e4206f69b716 100644 --- a/tools/testing/selftests/kvm/include/x86_64/vmx.h +++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h @@ -8,6 +8,8 @@ #ifndef SELFTEST_KVM_VMX_H #define SELFTEST_KVM_VMX_H +#include + #include #include "processor.h" #include "apic.h" @@ -100,55 +102,6 @@ #define VMX_EPT_VPID_CAP_AD_BITS 0x00200000 #define EXIT_REASON_FAILED_VMENTRY 0x80000000 -#define EXIT_REASON_EXCEPTION_NMI 0 -#define EXIT_REASON_EXTERNAL_INTERRUPT 1 -#define EXIT_REASON_TRIPLE_FAULT 2 -#define EXIT_REASON_INTERRUPT_WINDOW 7 -#define EXIT_REASON_NMI_WINDOW 8 -#define EXIT_REASON_TASK_SWITCH 9 -#define EXIT_REASON_CPUID 10 -#define EXIT_REASON_HLT 12 -#define EXIT_REASON_INVD 13 -#define EXIT_REASON_INVLPG 14 -#define EXIT_REASON_RDPMC 15 -#define EXIT_REASON_RDTSC 16 -#define EXIT_REASON_VMCALL 18 -#define EXIT_REASON_VMCLEAR 19 -#define EXIT_REASON_VMLAUNCH 20 -#define EXIT_REASON_VMPTRLD 21 -#define EXIT_REASON_VMPTRST 22 -#define EXIT_REASON_VMREAD 23 -#define EXIT_REASON_VMRESUME 24
-#define EXIT_REASON_VMWRITE 25 -#define EXIT_REASON_VMOFF 26 -#define EXIT_REASON_VMON 27 -#define EXIT_REASON_CR_ACCESS 28 -#define EXIT_REASON_DR_ACCESS 29 -#define EXIT_REASON_IO_INSTRUCTION 30 -#define EXIT_REASON_MSR_READ 31 -#define EXIT_REASON_MSR_WRITE 32 -#define EXIT_REASON_INVALID_STATE 33 -#define EXIT_REASON_MWAIT_INSTRUCTION 36 -#define EXIT_REASON_MONITOR_INSTRUCTION 39 -#define EXIT_REASON_PAUSE_INSTRUCTION 40 -#define EXIT_REASON_MCE_DURING_VMENTRY 41 -#define EXIT_REASON_TPR_BELOW_THRESHOLD 43 -#define EXIT_REASON_APIC_ACCESS 44 -#define EXIT_REASON_EOI_INDUCED 45 -#define EXIT_REASON_EPT_VIOLATION 48 -#define EXIT_REASON_EPT_MISCONFIG 49 -#define EXIT_REASON_INVEPT 50 -#define EXIT_REASON_RDTSCP 51 -#define EXIT_REASON_PREEMPTION_TIMER 52 -#define EXIT_REASON_INVVPID 53 -#define EXIT_REASON_WBINVD 54 -#define EXIT_REASON_XSETBV 55 -#define EXIT_REASON_APIC_WRITE 56 -#define EXIT_REASON_INVPCID 58 -#define EXIT_REASON_PML_FULL 62 -#define EXIT_REASON_XSAVES 63 -#define EXIT_REASON_XRSTORS 64 -#define LAST_EXIT_REASON 64 enum vmcs_field { VIRTUAL_PROCESSOR_ID = 0x00000000, -- 2.37.2.672.g94769d06f0-goog From nobody Tue Apr 7 05:23:32 2026
Reply-To: Sean Christopherson
Date: Tue, 30 Aug 2022 23:16:13 +0000
In-Reply-To: <20220830231614.3580124-1-seanjc@google.com>
Mime-Version: 1.0
References:
<20220830231614.3580124-1-seanjc@google.com>
X-Mailer: git-send-email 2.37.2.672.g94769d06f0-goog
Message-ID: <20220830231614.3580124-27-seanjc@google.com>
Subject: [PATCH v5 26/27] KVM: selftests: Add an x86-only test to verify nested exception queueing
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson, Maxim Levitsky, Oliver Upton, Peter Shier

Add a test to verify that KVM_{G,S}ET_EVENTS play nice with pending vs.
injected exceptions when an exception is being queued for L2, and that KVM
correctly handles L1's exception intercept wants.

Signed-off-by: Sean Christopherson
Reviewed-by: Maxim Levitsky
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../kvm/x86_64/nested_exceptions_test.c       | 295 ++++++++++++++++++
 3 files changed, 297 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/nested_exceptions_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index d625a3f83780..45d9aee1c0d8 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -28,6 +28,7 @@
 /x86_64/max_vcpuid_cap_test
 /x86_64/mmio_warning_test
 /x86_64/monitor_mwait_test
+/x86_64/nested_exceptions_test
 /x86_64/nx_huge_pages_test
 /x86_64/platform_info_test
 /x86_64/pmu_event_filter_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 4c122f1b1737..8b1b32628ac8 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -89,6 +89,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/kvm_clock_test
 TEST_GEN_PROGS_x86_64 += x86_64/kvm_pv_test
 TEST_GEN_PROGS_x86_64 += x86_64/mmio_warning_test
 TEST_GEN_PROGS_x86_64 += x86_64/monitor_mwait_test
+TEST_GEN_PROGS_x86_64 += x86_64/nested_exceptions_test
 TEST_GEN_PROGS_x86_64 += x86_64/platform_info_test
 TEST_GEN_PROGS_x86_64 += x86_64/pmu_event_filter_test
 TEST_GEN_PROGS_x86_64 += x86_64/set_boot_cpu_id
diff --git a/tools/testing/selftests/kvm/x86_64/nested_exceptions_test.c b/tools/testing/selftests/kvm/x86_64/nested_exceptions_test.c
new file mode 100644
index 000000000000..ac33835f78f4
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/nested_exceptions_test.c
@@ -0,0 +1,295 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#define _GNU_SOURCE /* for program_invocation_short_name */
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+#include "vmx.h"
+#include "svm_util.h"
+
+#define L2_GUEST_STACK_SIZE 256
+
+/*
+ * Arbitrary, never shoved into KVM/hardware, just need to avoid conflict with
+ * the "real" exceptions used, #SS/#GP/#DF (12/13/8).
+ */
+#define FAKE_TRIPLE_FAULT_VECTOR	0xaa
+
+/* Arbitrary 32-bit error code injected by this test. */
+#define SS_ERROR_CODE 0xdeadbeef
+
+/*
+ * Bit '0' is set on Intel if the exception occurs while delivering a previous
+ * event/exception.  AMD's wording is ambiguous, but presumably the bit is set
+ * if the exception occurs while delivering an external event, e.g. NMI or INTR,
+ * but not for exceptions that occur when delivering other exceptions or
+ * software interrupts.
+ *
+ * Note, Intel's name for it, "External event", is misleading and much more
+ * aligned with AMD's behavior, but the SDM is quite clear on its behavior.
+ */
+#define ERROR_CODE_EXT_FLAG	BIT(0)
+
+/*
+ * Bit '1' is set if the fault occurred when looking up a descriptor in the
+ * IDT, which is the case here as the IDT is empty/NULL.
+ */
+#define ERROR_CODE_IDT_FLAG	BIT(1)
+
+/*
+ * The #GP that occurs when vectoring #SS should show the index into the IDT
+ * for #SS, plus have the "IDT flag" set.
+ */
+#define GP_ERROR_CODE_AMD   ((SS_VECTOR * 8) | ERROR_CODE_IDT_FLAG)
+#define GP_ERROR_CODE_INTEL ((SS_VECTOR * 8) | ERROR_CODE_IDT_FLAG | ERROR_CODE_EXT_FLAG)
+
+/*
+ * Intel and AMD both shove '0' into the error code on #DF, regardless of what
+ * led to the double fault.
+ */
+#define DF_ERROR_CODE 0
+
+#define INTERCEPT_SS		(BIT_ULL(SS_VECTOR))
+#define INTERCEPT_SS_DF		(INTERCEPT_SS | BIT_ULL(DF_VECTOR))
+#define INTERCEPT_SS_GP_DF	(INTERCEPT_SS_DF | BIT_ULL(GP_VECTOR))
+
+static void l2_ss_pending_test(void)
+{
+	GUEST_SYNC(SS_VECTOR);
+}
+
+static void l2_ss_injected_gp_test(void)
+{
+	GUEST_SYNC(GP_VECTOR);
+}
+
+static void l2_ss_injected_df_test(void)
+{
+	GUEST_SYNC(DF_VECTOR);
+}
+
+static void l2_ss_injected_tf_test(void)
+{
+	GUEST_SYNC(FAKE_TRIPLE_FAULT_VECTOR);
+}
+
+static void svm_run_l2(struct svm_test_data *svm, void *l2_code, int vector,
+		       uint32_t error_code)
+{
+	struct vmcb *vmcb = svm->vmcb;
+	struct vmcb_control_area *ctrl = &vmcb->control;
+
+	vmcb->save.rip = (u64)l2_code;
+	run_guest(vmcb, svm->vmcb_gpa);
+
+	if (vector == FAKE_TRIPLE_FAULT_VECTOR)
+		return;
+
+	GUEST_ASSERT_EQ(ctrl->exit_code, (SVM_EXIT_EXCP_BASE + vector));
+	GUEST_ASSERT_EQ(ctrl->exit_info_1, error_code);
+}
+
+static void l1_svm_code(struct svm_test_data *svm)
+{
+	struct vmcb_control_area *ctrl = &svm->vmcb->control;
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+
+	generic_svm_setup(svm, NULL, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+	svm->vmcb->save.idtr.limit = 0;
+	ctrl->intercept |= BIT_ULL(INTERCEPT_SHUTDOWN);
+
+	ctrl->intercept_exceptions = INTERCEPT_SS_GP_DF;
+	svm_run_l2(svm, l2_ss_pending_test, SS_VECTOR, SS_ERROR_CODE);
+	svm_run_l2(svm, l2_ss_injected_gp_test, GP_VECTOR, GP_ERROR_CODE_AMD);
+
+	ctrl->intercept_exceptions = INTERCEPT_SS_DF;
+	svm_run_l2(svm, l2_ss_injected_df_test, DF_VECTOR, DF_ERROR_CODE);
+
+	ctrl->intercept_exceptions = INTERCEPT_SS;
+	svm_run_l2(svm, l2_ss_injected_tf_test,
+		   FAKE_TRIPLE_FAULT_VECTOR, 0);
+	GUEST_ASSERT_EQ(ctrl->exit_code, SVM_EXIT_SHUTDOWN);
+
+	GUEST_DONE();
+}
+
+static void vmx_run_l2(void *l2_code, int vector, uint32_t error_code)
+{
+	GUEST_ASSERT(!vmwrite(GUEST_RIP, (u64)l2_code));
+
+	GUEST_ASSERT_EQ(vector == SS_VECTOR ? vmlaunch() : vmresume(), 0);
+
+	if (vector == FAKE_TRIPLE_FAULT_VECTOR)
+		return;
+
+	GUEST_ASSERT_EQ(vmreadz(VM_EXIT_REASON), EXIT_REASON_EXCEPTION_NMI);
+	GUEST_ASSERT_EQ((vmreadz(VM_EXIT_INTR_INFO) & 0xff), vector);
+	GUEST_ASSERT_EQ(vmreadz(VM_EXIT_INTR_ERROR_CODE), error_code);
+}
+
+static void l1_vmx_code(struct vmx_pages *vmx)
+{
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+
+	GUEST_ASSERT_EQ(prepare_for_vmx_operation(vmx), true);
+
+	GUEST_ASSERT_EQ(load_vmcs(vmx), true);
+
+	prepare_vmcs(vmx, NULL, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+	GUEST_ASSERT_EQ(vmwrite(GUEST_IDTR_LIMIT, 0), 0);
+
+	/*
+	 * VMX disallows injecting an exception with error_code[31:16] != 0,
+	 * and hardware will never generate a VM-Exit with bits 31:16 set.
+	 * KVM should likewise truncate the "bad" userspace value.
+	 */
+	GUEST_ASSERT_EQ(vmwrite(EXCEPTION_BITMAP, INTERCEPT_SS_GP_DF), 0);
+	vmx_run_l2(l2_ss_pending_test, SS_VECTOR, (u16)SS_ERROR_CODE);
+	vmx_run_l2(l2_ss_injected_gp_test, GP_VECTOR, GP_ERROR_CODE_INTEL);
+
+	GUEST_ASSERT_EQ(vmwrite(EXCEPTION_BITMAP, INTERCEPT_SS_DF), 0);
+	vmx_run_l2(l2_ss_injected_df_test, DF_VECTOR, DF_ERROR_CODE);
+
+	GUEST_ASSERT_EQ(vmwrite(EXCEPTION_BITMAP, INTERCEPT_SS), 0);
+	vmx_run_l2(l2_ss_injected_tf_test, FAKE_TRIPLE_FAULT_VECTOR, 0);
+	GUEST_ASSERT_EQ(vmreadz(VM_EXIT_REASON), EXIT_REASON_TRIPLE_FAULT);
+
+	GUEST_DONE();
+}
+
+static void __attribute__((__flatten__)) l1_guest_code(void *test_data)
+{
+	if (this_cpu_has(X86_FEATURE_SVM))
+		l1_svm_code(test_data);
+	else
+		l1_vmx_code(test_data);
+}
+
+static void assert_ucall_vector(struct kvm_vcpu *vcpu, int vector)
+{
+	struct kvm_run *run = vcpu->run;
+	struct ucall uc;
+
+	TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+		    "Unexpected exit reason: %u (%s),\n",
+		    run->exit_reason, exit_reason_str(run->exit_reason));
+
+	switch (get_ucall(vcpu, &uc)) {
+	case UCALL_SYNC:
+		TEST_ASSERT(vector == uc.args[1],
+			    "Expected L2 to ask for %d, got %ld", vector, uc.args[1]);
+		break;
+	case UCALL_DONE:
+		TEST_ASSERT(vector == -1,
+			    "Expected L2 to ask for %d, L2 says it's done", vector);
+		break;
+	case UCALL_ABORT:
+		TEST_FAIL("%s at %s:%ld (0x%lx != 0x%lx)",
+			  (const char *)uc.args[0], __FILE__, uc.args[1],
+			  uc.args[2], uc.args[3]);
+		break;
+	default:
+		TEST_FAIL("Expected L2 to ask for %d, got unexpected ucall %lu", vector, uc.cmd);
+	}
+}
+
+static void queue_ss_exception(struct kvm_vcpu *vcpu, bool inject)
+{
+	struct kvm_vcpu_events events;
+
+	vcpu_events_get(vcpu, &events);
+
+	TEST_ASSERT(!events.exception.pending,
+		    "Vector %d unexpectedly pending", events.exception.nr);
+	TEST_ASSERT(!events.exception.injected,
+		    "Vector %d unexpectedly injected", events.exception.nr);
+
+	events.flags = KVM_VCPUEVENT_VALID_PAYLOAD;
+	events.exception.pending = !inject;
+	events.exception.injected = inject;
+	events.exception.nr = SS_VECTOR;
+	events.exception.has_error_code = true;
+	events.exception.error_code = SS_ERROR_CODE;
+	vcpu_events_set(vcpu, &events);
+}
+
+/*
+ * Verify KVM_{G,S}ET_EVENTS play nice with pending vs. injected exceptions
+ * when an exception is being queued for L2.  Specifically, verify that KVM
+ * honors L1 exception intercept controls when a #SS is pending/injected,
+ * triggers a #GP on vectoring the #SS, morphs to #DF if #GP isn't intercepted
+ * by L1, and finally causes (nested) SHUTDOWN if #DF isn't intercepted by L1.
+ */
+int main(int argc, char *argv[])
+{
+	vm_vaddr_t nested_test_data_gva;
+	struct kvm_vcpu_events events;
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+
+	TEST_REQUIRE(kvm_has_cap(KVM_CAP_EXCEPTION_PAYLOAD));
+	TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_SVM) || kvm_cpu_has(X86_FEATURE_VMX));
+
+	vm = vm_create_with_one_vcpu(&vcpu, l1_guest_code);
+	vm_enable_cap(vm, KVM_CAP_EXCEPTION_PAYLOAD, -2ul);
+
+	if (kvm_cpu_has(X86_FEATURE_SVM))
+		vcpu_alloc_svm(vm, &nested_test_data_gva);
+	else
+		vcpu_alloc_vmx(vm, &nested_test_data_gva);
+
+	vcpu_args_set(vcpu, 1, nested_test_data_gva);
+
+	/* Run L1 => L2.  L2 should sync and request #SS. */
+	vcpu_run(vcpu);
+	assert_ucall_vector(vcpu, SS_VECTOR);
+
+	/* Pend #SS and request immediate exit.  #SS should still be pending. */
+	queue_ss_exception(vcpu, false);
+	vcpu->run->immediate_exit = true;
+	vcpu_run_complete_io(vcpu);
+
+	/* Verify the pending event comes back out the same as it went in.
+	 */
+	vcpu_events_get(vcpu, &events);
+	ASSERT_EQ(events.flags & KVM_VCPUEVENT_VALID_PAYLOAD,
+		  KVM_VCPUEVENT_VALID_PAYLOAD);
+	ASSERT_EQ(events.exception.pending, true);
+	ASSERT_EQ(events.exception.nr, SS_VECTOR);
+	ASSERT_EQ(events.exception.has_error_code, true);
+	ASSERT_EQ(events.exception.error_code, SS_ERROR_CODE);
+
+	/*
+	 * Run for real with the pending #SS, L1 should get a VM-Exit due to
+	 * #SS interception and re-enter L2 to request #GP (via injected #SS).
+	 */
+	vcpu->run->immediate_exit = false;
+	vcpu_run(vcpu);
+	assert_ucall_vector(vcpu, GP_VECTOR);
+
+	/*
+	 * Inject #SS, the #SS should bypass interception and cause #GP, which
+	 * L1 should intercept before KVM morphs it to #DF.  L1 should then
+	 * disable #GP interception and run L2 to request #DF (via #SS => #GP).
+	 */
+	queue_ss_exception(vcpu, true);
+	vcpu_run(vcpu);
+	assert_ucall_vector(vcpu, DF_VECTOR);
+
+	/*
+	 * Inject #SS, the #SS should bypass interception and cause #GP, which
+	 * L1 is no longer intercepting, and so should see a #DF VM-Exit.  L1
+	 * should then run L2 to request a (fake) triple fault.
+	 */
+	queue_ss_exception(vcpu, true);
+	vcpu_run(vcpu);
+	assert_ucall_vector(vcpu, FAKE_TRIPLE_FAULT_VECTOR);
+
+	/*
+	 * Inject #SS yet again.  L1 is not intercepting #GP or #DF, and so
+	 * should see nested TRIPLE_FAULT / SHUTDOWN.
+	 */
+	queue_ss_exception(vcpu, true);
+	vcpu_run(vcpu);
+	assert_ucall_vector(vcpu, -1);
+
+	kvm_vm_free(vm);
+}
-- 
2.37.2.672.g94769d06f0-goog

From nobody Tue Apr 7 05:23:32 2026
Reply-To: Sean Christopherson
Date: Tue, 30 Aug 2022 23:16:14 +0000
In-Reply-To: <20220830231614.3580124-1-seanjc@google.com>
Mime-Version: 1.0
References: <20220830231614.3580124-1-seanjc@google.com>
X-Mailer: git-send-email 2.37.2.672.g94769d06f0-goog
Message-ID: <20220830231614.3580124-28-seanjc@google.com>
Subject: [PATCH v5 27/27] KVM: x86: Allow force_emulation_prefix to be written without a reload
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jim Mattson, Maxim Levitsky, Oliver Upton, Peter Shier

Allow force_emulation_prefix to be written by privileged userspace without
reloading KVM.  The param does not have any persistent effects and is
trivial to snapshot.
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 45f295d35cc9..329998e9ee7a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -179,7 +179,7 @@ EXPORT_SYMBOL_GPL(enable_vmware_backdoor);
  */
 #define KVM_FEP_CLEAR_RFLAGS_RF	BIT(1)
 static int __read_mostly force_emulation_prefix;
-module_param(force_emulation_prefix, int, 0444);
+module_param(force_emulation_prefix, int, 0644);
 
 int __read_mostly pi_inject_timer = -1;
 module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
@@ -7287,6 +7287,7 @@ static int kvm_can_emulate_insn(struct kvm_vcpu *vcpu, int emul_type,
 int handle_ud(struct kvm_vcpu *vcpu)
 {
 	static const char kvm_emulate_prefix[] = { __KVM_EMULATE_PREFIX };
+	int fep_flags = READ_ONCE(force_emulation_prefix);
 	int emul_type = EMULTYPE_TRAP_UD;
 	char sig[5]; /* ud2; .ascii "kvm" */
 	struct x86_exception e;
@@ -7294,11 +7295,11 @@ int handle_ud(struct kvm_vcpu *vcpu)
 	if (unlikely(!kvm_can_emulate_insn(vcpu, emul_type, NULL, 0)))
 		return 1;
 
-	if (force_emulation_prefix &&
+	if (fep_flags &&
 	    kvm_read_guest_virt(vcpu, kvm_get_linear_rip(vcpu), sig,
 				sizeof(sig), &e) == 0 &&
 	    memcmp(sig, kvm_emulate_prefix, sizeof(sig)) == 0) {
-		if (force_emulation_prefix & KVM_FEP_CLEAR_RFLAGS_RF)
+		if (fep_flags & KVM_FEP_CLEAR_RFLAGS_RF)
 			kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) & ~X86_EFLAGS_RF);
 		kvm_rip_write(vcpu, kvm_rip_read(vcpu) + sizeof(sig));
 		emul_type = EMULTYPE_TRAP_UD_FORCED;
-- 
2.37.2.672.g94769d06f0-goog