From: Maxim Levitsky
To: kvm@vger.kernel.org
Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra, Dave Hansen,
    Borislav Petkov, Pawan Gupta, Thomas Gleixner, Ingo Molnar,
    Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen, Babu Moger,
    linux-kernel@vger.kernel.org, Jing Liu, Wyes Karny, x86@kernel.org,
    "H. Peter Anvin", Sean Christopherson, Maxim Levitsky
Subject: [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface
Date: Tue, 29 Nov 2022 21:37:13 +0200
Message-Id: <20221129193717.513824-8-mlevitsk@redhat.com>
In-Reply-To: <20221129193717.513824-1-mlevitsk@redhat.com>
References: <20221129193717.513824-1-mlevitsk@redhat.com>

This patch adds two new vendor callbacks:

- kvm_x86_get_hw_nmi_pending()
- kvm_x86_set_hw_nmi_pending()

Using these callbacks, KVM can take advantage of the hardware's
accelerated delayed NMI delivery (currently vNMI on SVM). Once an NMI is
made pending via this interface, the hardware is expected to deliver the
NMI to the guest on its own, once all of the x86 conditions for NMI
delivery are met.

Note that the kvm_x86_set_hw_nmi_pending() callback is allowed to fail,
in which case a normal NMI injection will be attempted when the NMI can
be delivered (possibly by opening an NMI window). With vNMI this can
happen either when a vNMI is already pending or when a nested guest is
running.

When vNMI injection fails due to the 'vNMI is already pending'
condition, the new NMI will be dropped unless it can be injected
immediately, so no NMI window is requested.
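For context, the control flow that the two callbacks create can be sketched as a standalone model. This is not kernel code: the struct and function names below are invented for illustration, and the real KVM plumbing goes through static_call() wrappers on the vendor ops.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the vendor callbacks; in the patch these
 * correspond to get_hw_nmi_pending() / set_hw_nmi_pending(). */
struct vcpu_model {
	bool hw_nmi_pending;  /* NMI latched in hardware (e.g. vNMI) */
	bool hw_busy;         /* e.g. vNMI already pending, or nested guest */
	int  sw_nmi_pending;  /* software-queued NMIs (nmi_pending) */
};

static bool get_hw_nmi_pending(struct vcpu_model *v)
{
	return v->hw_nmi_pending;
}

/* May fail; on failure the caller falls back to a normal software
 * injection, possibly via an NMI window. */
static bool set_hw_nmi_pending(struct vcpu_model *v)
{
	if (v->hw_busy)
		return false;
	v->hw_nmi_pending = true;
	return true;
}

/* Deliver one NMI: prefer the hardware path, else queue in software. */
static void queue_nmi(struct vcpu_model *v)
{
	if (set_hw_nmi_pending(v))
		return;              /* hardware delivers it on its own */
	v->sw_nmi_pending++;         /* fall back to normal injection */
}
```

The point of the split is that a successful set_hw_nmi_pending() removes KVM from the delivery path entirely, while a failure keeps the existing software bookkeeping intact.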
Signed-off-by: Maxim Levitsky
---
 arch/x86/include/asm/kvm-x86-ops.h |  2 ++
 arch/x86/include/asm/kvm_host.h    | 15 ++++++++++++-
 arch/x86/kvm/x86.c                 | 36 ++++++++++++++++++++++++++----
 3 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index abccd51dcfca1b..9e2db6cf7cc041 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -67,6 +67,8 @@ KVM_X86_OP(get_interrupt_shadow)
 KVM_X86_OP(patch_hypercall)
 KVM_X86_OP(inject_irq)
 KVM_X86_OP(inject_nmi)
+KVM_X86_OP_OPTIONAL_RET0(get_hw_nmi_pending)
+KVM_X86_OP_OPTIONAL_RET0(set_hw_nmi_pending)
 KVM_X86_OP(inject_exception)
 KVM_X86_OP(cancel_injection)
 KVM_X86_OP(interrupt_allowed)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 684a5519812fb2..46993ce61c92db 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -871,8 +871,13 @@ struct kvm_vcpu_arch {
 	u64 tsc_scaling_ratio; /* current scaling ratio */
 
 	atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
-	unsigned nmi_pending; /* NMI queued after currently running handler */
+
+	unsigned int nmi_pending; /*
+				   * NMI queued after currently running handler
+				   * (not including a hardware pending NMI (e.g. vNMI))
+				   */
 	bool nmi_injected;    /* Trying to inject an NMI this entry */
+
 	bool smi_pending;     /* SMI queued after currently running handler */
 	u8 handling_intr_from_guest;
 
@@ -1602,6 +1607,13 @@ struct kvm_x86_ops {
 	int (*nmi_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
 	bool (*get_nmi_mask)(struct kvm_vcpu *vcpu);
 	void (*set_nmi_mask)(struct kvm_vcpu *vcpu, bool masked);
+
+	/* returns true if an NMI is pending injection at the hardware level (e.g. vNMI) */
+	bool (*get_hw_nmi_pending)(struct kvm_vcpu *vcpu);
+
+	/* attempts to make an NMI pending via the hardware interface (e.g. vNMI) */
+	bool (*set_hw_nmi_pending)(struct kvm_vcpu *vcpu);
+
 	void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
 	void (*enable_irq_window)(struct kvm_vcpu *vcpu);
 	void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
@@ -1964,6 +1976,7 @@ int kvm_pic_set_irq(struct kvm_pic *pic, int irq, int irq_source_id, int level);
 void kvm_pic_clear_all(struct kvm_pic *pic, int irq_source_id);
 
 void kvm_inject_nmi(struct kvm_vcpu *vcpu);
+int kvm_get_total_nmi_pending(struct kvm_vcpu *vcpu);
 
 void kvm_update_dr7(struct kvm_vcpu *vcpu);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 85d2a12c214dda..3c30e3f1106f79 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5103,7 +5103,7 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
 	events->interrupt.shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
 
 	events->nmi.injected = vcpu->arch.nmi_injected;
-	events->nmi.pending = vcpu->arch.nmi_pending != 0;
+	events->nmi.pending = kvm_get_total_nmi_pending(vcpu) != 0;
 	events->nmi.masked = static_call(kvm_x86_get_nmi_mask)(vcpu);
 
 	/* events->sipi_vector is never valid when reporting to user space */
@@ -5191,9 +5191,12 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
 
 	vcpu->arch.nmi_injected = events->nmi.injected;
 	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING)
-		vcpu->arch.nmi_pending = events->nmi.pending;
+		atomic_add(events->nmi.pending, &vcpu->arch.nmi_queued);
+
 	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
 
+	process_nmi(vcpu);
+
 	if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR &&
 	    lapic_in_kernel(vcpu))
 		vcpu->arch.apic->sipi_vector = events->sipi_vector;
@@ -10008,6 +10011,10 @@ static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
 static void process_nmi(struct kvm_vcpu *vcpu)
 {
 	unsigned limit = 2;
+	int nmi_to_queue = atomic_xchg(&vcpu->arch.nmi_queued, 0);
+
+	if (!nmi_to_queue)
+		return;
 
 	/*
 	 * x86 is limited to one NMI running, and one NMI pending after it.
@@ -10015,13 +10022,34 @@ static void process_nmi(struct kvm_vcpu *vcpu)
 	 * Otherwise, allow two (and we'll inject the first one immediately).
 	 */
 	if (static_call(kvm_x86_get_nmi_mask)(vcpu) || vcpu->arch.nmi_injected)
-		limit = 1;
+		limit--;
+
+	/*
+	 * Also if there is already an NMI queued in hardware for injection,
+	 * decrease the limit again.
+	 */
+	if (static_call(kvm_x86_get_hw_nmi_pending)(vcpu))
+		limit--;
 
-	vcpu->arch.nmi_pending += atomic_xchg(&vcpu->arch.nmi_queued, 0);
+	if (limit <= 0)
+		return;
+
+	/* Attempt to use hardware NMI queueing */
+	if (static_call(kvm_x86_set_hw_nmi_pending)(vcpu)) {
+		limit--;
+		nmi_to_queue--;
+	}
+
+	vcpu->arch.nmi_pending += nmi_to_queue;
 	vcpu->arch.nmi_pending = min(vcpu->arch.nmi_pending, limit);
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 }
 
+/* Return the total number of NMIs pending injection to the VM */
+int kvm_get_total_nmi_pending(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.nmi_pending + static_call(kvm_x86_get_hw_nmi_pending)(vcpu);
+}
+
 void kvm_make_scan_ioapic_request_mask(struct kvm *kvm,
 				       unsigned long *vcpu_bitmap)
 {
-- 
2.26.3
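As a footnote, the limit arithmetic that the patch introduces in process_nmi() can be modeled in isolation. The sketch below is a hypothetical standalone model, not kernel code: plain parameters stand in for the vcpu state and for the results of the static_call() wrappers, purely to show the bookkeeping (including that queued NMIs are dropped, not re-queued, when the limit reaches zero).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Standalone model of the process_nmi() limit bookkeeping.
 * All names here are illustrative, not kernel symbols.
 *
 *   nmi_queued         - NMIs taken from the nmi_queued atomic
 *   masked_or_injected - NMI masked, or one already being injected
 *   hw_nmi_pending     - result of get_hw_nmi_pending()
 *   hw_accepts         - whether set_hw_nmi_pending() would succeed
 *   nmi_pending        - current software-pending count
 *
 * Returns the new software-pending count.
 */
static int model_process_nmi(int nmi_queued, bool masked_or_injected,
			     bool hw_nmi_pending, bool hw_accepts,
			     int nmi_pending)
{
	int limit = 2;
	int to_queue = nmi_queued;

	if (!to_queue)
		return nmi_pending;

	if (masked_or_injected)
		limit--;	/* one NMI is already running/injected */

	if (hw_nmi_pending)
		limit--;	/* hardware already holds a pending NMI */

	if (limit <= 0)
		return nmi_pending;	/* queued NMIs are dropped here */

	/* Try hardware queueing first; it consumes one slot and one NMI */
	if (hw_accepts) {
		limit--;
		to_queue--;
	}

	nmi_pending += to_queue;
	if (nmi_pending > limit)
		nmi_pending = limit;	/* the min(..., limit) clamp */
	return nmi_pending;
}
```

With no hardware assistance the model collapses to the old behavior (at most two pending, one if masked/injected); with hardware queueing available, one of the two slots is served by the hardware path instead.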