From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, Xiaoyao Li
Subject: [PATCH v9 001/105] KVM: VMX: Move out vmx_x86_ops to 'main.c' to wrap VMX and TDX
Date: Fri, 30 Sep 2022 03:16:55 -0700
Message-Id: <5e767ece0348c4ee20015aea8b15c66e4ce7a70d.1664530907.git.isaku.yamahata@intel.com>

From: Sean Christopherson

KVM accesses the Virtual Machine Control Structure (VMCS) with VMX
instructions to operate on a VM.  TDX doesn't allow the VMM to operate on
the VMCS directly; instead, TDX has its own data structures and SEAMCALL
APIs through which the VMM operates on those structures indirectly.  This
means we must have a TDX version of kvm_x86_ops.

The existing global struct kvm_x86_ops already defines an interface that
fits TDX, but kvm_x86_ops is system-wide, not a per-VM structure.  To allow
VMX to coexist with TDs, the kvm_x86_ops callbacks will get wrappers of the
form "if (tdx) tdx_op() else vmx_op()" to switch between VMX and TDX at run
time.
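For example, such a wrapped callback would take roughly the following
shape (an illustrative sketch only, not code from this patch;
is_td_vcpu() and tdx_vcpu_reset() stand in for the TDX-guest predicate
and TDX helper that later patches in the series introduce):

  static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
  {
          if (is_td_vcpu(vcpu))
                  /* TD: operate via the SEAMCALL-backed TDX helpers */
                  tdx_vcpu_reset(vcpu, init_event);
          else
                  /* regular VMX guest: operate on the VMCS directly */
                  vmx_vcpu_reset(vcpu, init_event);
  }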
To split the runtime switch, the VMX implementation, and the TDX
implementation, add main.c, and move out the vmx_x86_ops hooks in
preparation for adding TDX, which can coexist with VMX, i.e. KVM can run
both VMs and TDs.  Use 'vt' for the naming scheme as a nod to VT-x and as a
concatenation of VmxTdx.

The current code looks as follows.  In vmx.c:

  static vmx_op() { ... }
  static struct kvm_x86_ops vmx_x86_ops = {
          .op = vmx_op,
  };
  /* initialization code */

The eventually converted code will look like the following.  In vmx.c, keep
the VMX operations:

  vmx_op() { ... }
  /* VMX initialization */

In tdx.c, define the TDX operations:

  tdx_op() { ... }
  /* TDX initialization */

In x86_ops.h, declare the VMX and TDX operations:

  vmx_op();
  tdx_op();

In main.c, define common wrappers for VMX and TDX:

  static vt_op() { if (tdx) tdx_op() else vmx_op() }
  static struct kvm_x86_ops vt_x86_ops = {
          .op = vt_op,
  };
  /* initialization that calls the VMX and TDX initialization */

Opportunistically, fix the name inconsistency: rename vmx_create_vcpu() and
vmx_free_vcpu() to vmx_vcpu_create() and vmx_vcpu_free().

Co-developed-by: Xiaoyao Li
Signed-off-by: Xiaoyao Li
Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/Makefile      |   2 +-
 arch/x86/kvm/vmx/main.c    | 155 ++++++++++++++++
 arch/x86/kvm/vmx/vmx.c     | 363 +++++++++++--------------------------
 arch/x86/kvm/vmx/x86_ops.h | 125 +++++++++++++
 4 files changed, 386 insertions(+), 259 deletions(-)
 create mode 100644 arch/x86/kvm/vmx/main.c
 create mode 100644 arch/x86/kvm/vmx/x86_ops.h

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 30f244b64523..ee4d0999f20f 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -22,7 +22,7 @@ kvm-$(CONFIG_X86_64) += mmu/tdp_iter.o mmu/tdp_mmu.o
 kvm-$(CONFIG_KVM_XEN) += xen.o

 kvm-intel-y += vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
-               vmx/evmcs.o vmx/nested.o vmx/posted_intr.o
+               vmx/evmcs.o vmx/nested.o vmx/posted_intr.o vmx/main.o
 kvm-intel-$(CONFIG_X86_SGX_KVM) += vmx/sgx.o

 kvm-amd-y += svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o svm/sev.o

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
new file mode 100644
index 000000000000..636768f5b985
--- /dev/null
+++ b/arch/x86/kvm/vmx/main.c
@@ -0,0 +1,155 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/moduleparam.h>
+
+#include "x86_ops.h"
+#include "vmx.h"
+#include "nested.h"
+#include "pmu.h"
+
+struct kvm_x86_ops vt_x86_ops __initdata = {
+        .name = "kvm_intel",
+
+        .hardware_unsetup = vmx_hardware_unsetup,
+        .check_processor_compatibility = vmx_check_processor_compatibility,
+
+        .hardware_enable = vmx_hardware_enable,
+        .hardware_disable = vmx_hardware_disable,
+        .has_emulated_msr = vmx_has_emulated_msr,
+
+        .vm_size = sizeof(struct kvm_vmx),
+        .vm_init = vmx_vm_init,
+        .vm_destroy = vmx_vm_destroy,
+
+        .vcpu_precreate = vmx_vcpu_precreate,
+        .vcpu_create = vmx_vcpu_create,
+        .vcpu_free = vmx_vcpu_free,
+        .vcpu_reset = vmx_vcpu_reset,
+
+        .prepare_switch_to_guest = vmx_prepare_switch_to_guest,
+        .vcpu_load = vmx_vcpu_load,
+        .vcpu_put = vmx_vcpu_put,
+
+        .update_exception_bitmap = vmx_update_exception_bitmap,
+        .get_msr_feature = vmx_get_msr_feature,
+        .get_msr = vmx_get_msr,
+        .set_msr = vmx_set_msr,
+        .get_segment_base = vmx_get_segment_base,
+        .get_segment = vmx_get_segment,
+        .set_segment = vmx_set_segment,
+        .get_cpl = vmx_get_cpl,
+        .get_cs_db_l_bits = vmx_get_cs_db_l_bits,
+        .set_cr0 = vmx_set_cr0,
+        .is_valid_cr4 = vmx_is_valid_cr4,
+        .set_cr4 = vmx_set_cr4,
+        .set_efer = vmx_set_efer,
+        .get_idt = vmx_get_idt,
+        .set_idt = vmx_set_idt,
+        .get_gdt = vmx_get_gdt,
+        .set_gdt = vmx_set_gdt,
+        .set_dr7 = vmx_set_dr7,
+        .sync_dirty_debug_regs = vmx_sync_dirty_debug_regs,
+        .cache_reg = vmx_cache_reg,
+        .get_rflags = vmx_get_rflags,
+        .set_rflags = vmx_set_rflags,
+        .get_if_flag = vmx_get_if_flag,
+
+        .flush_tlb_all = vmx_flush_tlb_all,
+        .flush_tlb_current = vmx_flush_tlb_current,
+        .flush_tlb_gva = vmx_flush_tlb_gva,
+        .flush_tlb_guest = vmx_flush_tlb_guest,
+
+        .vcpu_pre_run = vmx_vcpu_pre_run,
+        .vcpu_run = vmx_vcpu_run,
+        .handle_exit = vmx_handle_exit,
+        .skip_emulated_instruction = vmx_skip_emulated_instruction,
+        .update_emulated_instruction = vmx_update_emulated_instruction,
+        .set_interrupt_shadow = vmx_set_interrupt_shadow,
+        .get_interrupt_shadow = vmx_get_interrupt_shadow,
+        .patch_hypercall = vmx_patch_hypercall,
+        .inject_irq = vmx_inject_irq,
+        .inject_nmi = vmx_inject_nmi,
+        .queue_exception = vmx_queue_exception,
+        .cancel_injection = vmx_cancel_injection,
+        .interrupt_allowed = vmx_interrupt_allowed,
+        .nmi_allowed = vmx_nmi_allowed,
+        .get_nmi_mask = vmx_get_nmi_mask,
+        .set_nmi_mask = vmx_set_nmi_mask,
+        .enable_nmi_window = vmx_enable_nmi_window,
+        .enable_irq_window = vmx_enable_irq_window,
+        .update_cr8_intercept = vmx_update_cr8_intercept,
+        .set_virtual_apic_mode = vmx_set_virtual_apic_mode,
+        .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
+        .refresh_apicv_exec_ctrl = vmx_refresh_apicv_exec_ctrl,
+        .load_eoi_exitmap = vmx_load_eoi_exitmap,
+        .apicv_post_state_restore = vmx_apicv_post_state_restore,
+        .check_apicv_inhibit_reasons = vmx_check_apicv_inhibit_reasons,
+        .hwapic_irr_update = vmx_hwapic_irr_update,
+        .hwapic_isr_update = vmx_hwapic_isr_update,
+        .guest_apic_has_interrupt = vmx_guest_apic_has_interrupt,
+        .sync_pir_to_irr = vmx_sync_pir_to_irr,
+        .deliver_interrupt = vmx_deliver_interrupt,
+        .dy_apicv_has_pending_interrupt = pi_has_pending_interrupt,
+
+        .set_tss_addr = vmx_set_tss_addr,
+        .set_identity_map_addr = vmx_set_identity_map_addr,
+        .get_mt_mask = vmx_get_mt_mask,
+
+        .get_exit_info = vmx_get_exit_info,
+
+        .vcpu_after_set_cpuid = vmx_vcpu_after_set_cpuid,
+
+        .has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,
+
+        .get_l2_tsc_offset = vmx_get_l2_tsc_offset,
+        .get_l2_tsc_multiplier = vmx_get_l2_tsc_multiplier,
+        .write_tsc_offset = vmx_write_tsc_offset,
+        .write_tsc_multiplier = vmx_write_tsc_multiplier,
+
+        .load_mmu_pgd = vmx_load_mmu_pgd,
+
+        .check_intercept = vmx_check_intercept,
+        .handle_exit_irqoff = vmx_handle_exit_irqoff,
+
+        .request_immediate_exit = vmx_request_immediate_exit,
+
+        .sched_in = vmx_sched_in,
+
+        .cpu_dirty_log_size = PML_ENTITY_NUM,
+        .update_cpu_dirty_logging = vmx_update_cpu_dirty_logging,
+
+        .nested_ops = &vmx_nested_ops,
+
+        .pi_update_irte = vmx_pi_update_irte,
+        .pi_start_assignment = vmx_pi_start_assignment,
+
+#ifdef CONFIG_X86_64
+        .set_hv_timer = vmx_set_hv_timer,
+        .cancel_hv_timer = vmx_cancel_hv_timer,
+#endif
+
+        .setup_mce = vmx_setup_mce,
+
+        .smi_allowed = vmx_smi_allowed,
+        .enter_smm = vmx_enter_smm,
+        .leave_smm = vmx_leave_smm,
+        .enable_smi_window = vmx_enable_smi_window,
+
+        .can_emulate_instruction = vmx_can_emulate_instruction,
+        .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
+        .migrate_timers = vmx_migrate_timers,
+
+        .msr_filter_changed = vmx_msr_filter_changed,
+        .complete_emulated_msr = kvm_complete_insn_gp,
+
+        .vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
+};
+
+struct kvm_x86_init_ops vt_init_ops __initdata = {
+        .cpu_has_kvm_support = vmx_cpu_has_kvm_support,
+        .disabled_by_bios = vmx_disabled_by_bios,
+        .hardware_setup = vmx_hardware_setup,
+        .handle_intel_pt_intr = NULL,
+
+        .runtime_ops = &vt_x86_ops,
+        .pmu_ops = &intel_pmu_ops,
+};

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 26f16e310869..6d5eb74fedfb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -66,6 +66,7 @@
 #include "vmcs12.h"
 #include "vmx.h"
 #include "x86.h"
+#include "x86_ops.h"

 MODULE_AUTHOR("Qumranet");
 MODULE_LICENSE("GPL");
@@ -1386,7 +1387,7 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
  * Switches to specified vcpu, until a matching vcpu_put(), but assumes
  * vcpu mutex is already taken.
  */
-static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
         struct vcpu_vmx *vmx = to_vmx(vcpu);

@@ -1397,7 +1398,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
         vmx->host_debugctlmsr = get_debugctlmsr();
 }

-static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
+void vmx_vcpu_put(struct kvm_vcpu *vcpu)
 {
         vmx_vcpu_pi_put(vcpu);

@@ -1451,7 +1452,7 @@ void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
         vmx->emulation_required = vmx_emulation_required(vcpu);
 }

-static bool vmx_get_if_flag(struct kvm_vcpu *vcpu)
+bool vmx_get_if_flag(struct kvm_vcpu *vcpu)
 {
         return vmx_get_rflags(vcpu) & X86_EFLAGS_IF;
 }
@@ -1557,8 +1558,8 @@ static int vmx_rtit_ctl_check(struct kvm_vcpu *vcpu, u64 data)
         return 0;
 }

-static bool vmx_can_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type,
-                                        void *insn, int insn_len)
+bool vmx_can_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type,
+                                 void *insn, int insn_len)
 {
         /*
          * Emulation of instructions in SGX enclaves is impossible as RIP does
@@ -1642,7 +1643,7 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
  * Recognizes a pending MTF VM-exit and records the nested state for later
  * delivery.
  */
-static void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu)
+void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu)
 {
         struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
         struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -1665,7 +1666,7 @@ static void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu)
                 vmx->nested.mtf_pending = false;
 }

-static int vmx_skip_emulated_instruction(struct kvm_vcpu *vcpu)
+int vmx_skip_emulated_instruction(struct kvm_vcpu *vcpu)
 {
         vmx_update_emulated_instruction(vcpu);
         return skip_emulated_instruction(vcpu);
@@ -1684,7 +1685,7 @@ static void vmx_clear_hlt(struct kvm_vcpu *vcpu)
                 vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
 }

-static void vmx_queue_exception(struct kvm_vcpu *vcpu)
+void vmx_queue_exception(struct kvm_vcpu *vcpu)
 {
         struct vcpu_vmx *vmx = to_vmx(vcpu);
         unsigned nr = vcpu->arch.exception.nr;
@@ -1797,12 +1798,12 @@ u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
         return kvm_caps.default_tsc_scaling_ratio;
 }

-static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
+void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
         vmcs_write64(TSC_OFFSET, offset);
 }

-static void vmx_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 multiplier)
+void vmx_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 multiplier)
 {
         vmcs_write64(TSC_MULTIPLIER, multiplier);
 }
@@ -1826,7 +1827,7 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
         return !(val & ~valid_bits);
 }

-static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
+int vmx_get_msr_feature(struct kvm_msr_entry *msr)
 {
         switch (msr->index) {
         case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
@@ -1846,7 +1847,7 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
  * Returns 0 on success, non-0 otherwise.
  * Assumes vcpu_load() was already called.
  */
-static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
         struct vcpu_vmx *vmx = to_vmx(vcpu);
         struct vmx_uret_msr *msr;
@@ -2024,7 +2025,7 @@ static u64 vcpu_supported_debugctl(struct kvm_vcpu *vcpu)
  * Returns 0 on success, non-0 otherwise.
  * Assumes vcpu_load() was already called.
  */
-static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
         struct vcpu_vmx *vmx = to_vmx(vcpu);
         struct vmx_uret_msr *msr;
@@ -2358,7 +2359,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
         return ret;
 }

-static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
+void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
 {
         unsigned long guest_owned_bits;

@@ -2401,12 +2402,12 @@ static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
         }
 }

-static __init int cpu_has_kvm_support(void)
+__init int vmx_cpu_has_kvm_support(void)
 {
         return cpu_has_vmx();
 }

-static __init int vmx_disabled_by_bios(void)
+__init int vmx_disabled_by_bios(void)
 {
         return !boot_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
                !boot_cpu_has(X86_FEATURE_VMX);
@@ -2432,7 +2433,7 @@ static int kvm_cpu_vmxon(u64 vmxon_pointer)
         return -EFAULT;
 }

-static int vmx_hardware_enable(void)
+int vmx_hardware_enable(void)
 {
         int cpu = raw_smp_processor_id();
         u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
@@ -2473,7 +2474,7 @@ static void vmclear_local_loaded_vmcss(void)
                 __loaded_vmcs_clear(v);
 }

-static void vmx_hardware_disable(void)
+void vmx_hardware_disable(void)
 {
         vmclear_local_loaded_vmcss();

@@ -3072,7 +3073,7 @@ static void exit_lmode(struct kvm_vcpu *vcpu)

 #endif

-static void vmx_flush_tlb_all(struct kvm_vcpu *vcpu)
+void vmx_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
         struct vcpu_vmx *vmx = to_vmx(vcpu);

@@ -3102,7 +3103,7 @@ static inline int vmx_get_current_vpid(struct kvm_vcpu *vcpu)
         return to_vmx(vcpu)->vpid;
 }

-static void vmx_flush_tlb_current(struct kvm_vcpu *vcpu)
+void vmx_flush_tlb_current(struct kvm_vcpu *vcpu)
 {
         struct kvm_mmu *mmu = vcpu->arch.mmu;
         u64 root_hpa = mmu->root.hpa;
@@ -3118,7 +3119,7 @@ static void vmx_flush_tlb_current(struct kvm_vcpu *vcpu)
                 vpid_sync_context(vmx_get_current_vpid(vcpu));
 }

-static void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr)
+void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr)
 {
         /*
          * vpid_sync_vcpu_addr() is a nop if vpid==0, see the comment in
@@ -3127,7 +3128,7 @@ static void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr)
         vpid_sync_vcpu_addr(vmx_get_current_vpid(vcpu), addr);
 }

-static void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu)
+void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu)
 {
         /*
          * vpid_sync_context() is a nop if vpid==0, e.g. if enable_vpid==0 or a
@@ -3282,8 +3283,7 @@ u64 construct_eptp(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level)
         return eptp;
 }

-static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
-                             int root_level)
+void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level)
 {
         struct kvm *kvm = vcpu->kvm;
         bool update_guest_cr3 = true;
@@ -3311,8 +3311,7 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
                 vmcs_writel(GUEST_CR3, guest_cr3);
 }

-
-static bool vmx_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+bool vmx_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
         /*
          * We operate under the default treatment of SMM, so VMX cannot be
@@ -3428,7 +3427,7 @@ void vmx_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg)
         var->g = (ar >> 15) & 1;
 }

-static u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg)
+u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg)
 {
         struct kvm_segment s;

@@ -3508,14 +3507,14 @@ void __vmx_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg)
         vmcs_write32(sf->ar_bytes, vmx_segment_access_rights(var));
 }

-static void vmx_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg)
+void vmx_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg)
 {
         __vmx_set_segment(vcpu, var, seg);

         to_vmx(vcpu)->emulation_required = vmx_emulation_required(vcpu);
 }

-static void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
+void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
 {
         u32 ar = vmx_read_guest_seg_ar(to_vmx(vcpu), VCPU_SREG_CS);

@@ -3523,25 +3522,25 @@ static void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
         *l = (ar >> 13) & 1;
 }

-static void vmx_get_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
+void vmx_get_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
 {
         dt->size = vmcs_read32(GUEST_IDTR_LIMIT);
         dt->address = vmcs_readl(GUEST_IDTR_BASE);
 }

-static void vmx_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
+void vmx_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
 {
         vmcs_write32(GUEST_IDTR_LIMIT, dt->size);
         vmcs_writel(GUEST_IDTR_BASE, dt->address);
 }

-static void vmx_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
+void vmx_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
 {
         dt->size = vmcs_read32(GUEST_GDTR_LIMIT);
         dt->address = vmcs_readl(GUEST_GDTR_BASE);
 }

-static void vmx_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
+void vmx_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
 {
         vmcs_write32(GUEST_GDTR_LIMIT, dt->size);
         vmcs_writel(GUEST_GDTR_BASE, dt->address);
@@ -4039,7 +4038,7 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
         }
 }

-static bool vmx_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
+bool vmx_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
 {
         struct vcpu_vmx *vmx = to_vmx(vcpu);
         void *vapic_page;
@@ -4059,7 +4058,7 @@ static bool vmx_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
         return ((rvi & 0xf0) > (vppr & 0xf0));
 }

-static void vmx_msr_filter_changed(struct kvm_vcpu *vcpu)
+void vmx_msr_filter_changed(struct kvm_vcpu *vcpu)
 {
         struct vcpu_vmx *vmx = to_vmx(vcpu);
         u32 i;
@@ -4200,8 +4199,8 @@ static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
         return 0;
 }

-static void vmx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
-                                  int trig_mode, int vector)
+void vmx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
+                           int trig_mode, int vector)
 {
         struct kvm_vcpu *vcpu = apic->vcpu;

@@ -4344,7 +4343,7 @@ static u32 vmx_vmexit_ctrl(void)
                 ~(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | VM_EXIT_LOAD_IA32_EFER);
 }

-static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
+void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
 {
         struct vcpu_vmx *vmx = to_vmx(vcpu);

@@ -4584,7 +4583,7 @@ static int vmx_alloc_ipiv_pid_table(struct kvm *kvm)
         return 0;
 }

-static int vmx_vcpu_precreate(struct kvm *kvm)
+int vmx_vcpu_precreate(struct kvm *kvm)
 {
         return vmx_alloc_ipiv_pid_table(kvm);
 }
@@ -4736,7 +4735,7 @@ static void __vmx_vcpu_reset(struct kvm_vcpu *vcpu)
         vmx->pi_desc.sn = 1;
 }

-static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
+void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
         struct vcpu_vmx *vmx = to_vmx(vcpu);

@@ -4795,12 +4794,12 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
         vmx_update_fb_clear_dis(vcpu, vmx);
 }

-static void vmx_enable_irq_window(struct kvm_vcpu *vcpu)
+void vmx_enable_irq_window(struct kvm_vcpu *vcpu)
 {
         exec_controls_setbit(to_vmx(vcpu), CPU_BASED_INTR_WINDOW_EXITING);
 }

-static void vmx_enable_nmi_window(struct kvm_vcpu *vcpu)
+void vmx_enable_nmi_window(struct kvm_vcpu *vcpu)
 {
         if (!enable_vnmi ||
             vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & GUEST_INTR_STATE_STI) {
@@ -4811,7 +4810,7 @@ static void vmx_enable_nmi_window(struct kvm_vcpu *vcpu)
         exec_controls_setbit(to_vmx(vcpu), CPU_BASED_NMI_WINDOW_EXITING);
 }

-static void vmx_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
+void vmx_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
 {
         struct vcpu_vmx *vmx = to_vmx(vcpu);
         uint32_t intr;
@@ -4839,7 +4838,7 @@ static void vmx_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
         vmx_clear_hlt(vcpu);
 }

-static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
+void vmx_inject_nmi(struct kvm_vcpu *vcpu)
 {
         struct vcpu_vmx *vmx = to_vmx(vcpu);

@@ -4917,7 +4916,7 @@ bool vmx_nmi_blocked(struct kvm_vcpu *vcpu)
                  GUEST_INTR_STATE_NMI));
 }

-static int vmx_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+int vmx_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 {
         if (to_vmx(vcpu)->nested.nested_run_pending)
                 return -EBUSY;
@@ -4939,7 +4938,7 @@ bool vmx_interrupt_blocked(struct kvm_vcpu *vcpu)
                 (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
 }

-static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+int vmx_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 {
         if (to_vmx(vcpu)->nested.nested_run_pending)
                 return -EBUSY;
@@ -4954,7 +4953,7 @@ static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
         return !vmx_interrupt_blocked(vcpu);
 }

-static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
+int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
 {
         void __user *ret;

@@ -4974,7 +4973,7 @@ static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
         return init_rmode_tss(kvm, ret);
 }

-static int vmx_set_identity_map_addr(struct kvm *kvm, u64 ident_addr)
+int vmx_set_identity_map_addr(struct kvm *kvm, u64 ident_addr)
 {
         to_kvm_vmx(kvm)->ept_identity_map_addr = ident_addr;
         return 0;
@@ -5253,8 +5252,7 @@ static int handle_io(struct kvm_vcpu *vcpu)
         return kvm_fast_pio(vcpu, size, port, in);
 }

-static void
-vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
+void vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
 {
         /*
          * Patch in the VMCALL instruction:
@@ -5464,7 +5462,7 @@ static int handle_dr(struct kvm_vcpu *vcpu)
         return kvm_complete_insn_gp(vcpu, err);
 }

-static void vmx_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
+void vmx_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
 {
         get_debugreg(vcpu->arch.db[0], 0);
         get_debugreg(vcpu->arch.db[1], 1);
@@ -5483,7 +5481,7 @@ static void vmx_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
         set_debugreg(DR6_RESERVED, 6);
 }

-static void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val)
+void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val)
 {
         vmcs_writel(GUEST_DR7, val);
 }
@@ -5754,7 +5752,7 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
         return 1;
 }

-static int vmx_vcpu_pre_run(struct kvm_vcpu *vcpu)
+int vmx_vcpu_pre_run(struct kvm_vcpu *vcpu)
 {
         if (vmx_emulation_required_with_pending_exception(vcpu)) {
                 kvm_prepare_emulation_failure_exit(vcpu);
@@ -6018,9 +6016,8 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 static const int kvm_vmx_max_exit_handlers =
         ARRAY_SIZE(kvm_vmx_exit_handlers);

-static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
-                              u64 *info1, u64 *info2,
-                              u32 *intr_info, u32 *error_code)
+void vmx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
+                       u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code)
 {
         struct vcpu_vmx *vmx = to_vmx(vcpu);

@@ -6463,7 +6460,7 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
         return 0;
 }

-static int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
+int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 {
         int ret = __vmx_handle_exit(vcpu, exit_fastpath);

@@ -6551,7 +6548,7 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
                 : "eax", "ebx", "ecx", "edx");
 }

-static void vmx_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
+void vmx_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 {
         struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
         int tpr_threshold;
@@ -6621,7 +6618,7 @@ void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
         vmx_update_msr_bitmap_x2apic(vcpu);
 }

-static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
+void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
 {
         struct page *page;

@@ -6649,7 +6646,7 @@ static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
         put_page(page);
 }

-static void vmx_hwapic_isr_update(int max_isr)
+void vmx_hwapic_isr_update(int max_isr)
 {
         u16 status;
         u8 old;
@@ -6683,7 +6680,7 @@ static void vmx_set_rvi(int vector)
         }
 }

-static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
+void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
 {
         /*
          * When running L2, updating RVI is only relevant when
@@ -6697,7 +6694,7 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
                 vmx_set_rvi(max_irr);
 }

-static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
+int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
 {
         struct vcpu_vmx *vmx = to_vmx(vcpu);
         int max_irr;
@@ -6743,7 +6740,7 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
         return max_irr;
 }

-static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
+void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
 {
         if (!kvm_vcpu_apicv_active(vcpu))
                 return;
@@ -6754,7 +6751,7 @@ static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
         vmcs_write64(EOI_EXIT_BITMAP3, eoi_exit_bitmap[3]);
 }

-static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
+void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
 {
         struct vcpu_vmx *vmx = to_vmx(vcpu);

@@ -6827,7 +6824,7 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
         vcpu->arch.at_instruction_boundary = true;
 }

-static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
+void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
 {
         struct vcpu_vmx *vmx = to_vmx(vcpu);

@@ -6844,7 +6841,7 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
  * The kvm parameter can be NULL (module initialization, or invocation before
  * VM creation). Be sure to check the kvm parameter before using it.
  */
-static bool vmx_has_emulated_msr(struct kvm *kvm, u32 index)
+bool vmx_has_emulated_msr(struct kvm *kvm, u32 index)
 {
         switch (index) {
         case MSR_IA32_SMBASE:
@@ -6965,7 +6962,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
                                   IDT_VECTORING_ERROR_CODE);
 }

-static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
+void vmx_cancel_injection(struct kvm_vcpu *vcpu)
 {
         __vmx_complete_interrupts(vcpu,
                                   vmcs_read32(VM_ENTRY_INTR_INFO_FIELD),
@@ -7099,7 +7096,7 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
         guest_state_exit_irqoff();
 }

-static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
+fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
 {
         struct vcpu_vmx *vmx = to_vmx(vcpu);
         unsigned long cr3, cr4;
@@ -7265,7 +7262,7 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
         return vmx_exit_handlers_fastpath(vcpu);
 }

-static void vmx_vcpu_free(struct kvm_vcpu *vcpu)
+void vmx_vcpu_free(struct kvm_vcpu *vcpu)
 {
         struct vcpu_vmx *vmx = to_vmx(vcpu);

@@ -7276,7 +7273,7 @@ static void vmx_vcpu_free(struct kvm_vcpu *vcpu)
         free_loaded_vmcs(vmx->loaded_vmcs);
 }

-static int vmx_vcpu_create(struct kvm_vcpu *vcpu)
+int vmx_vcpu_create(struct kvm_vcpu *vcpu)
 {
         struct vmx_uret_msr *tsx_ctrl;
         struct vcpu_vmx *vmx;
@@ -7385,7 +7382,7 @@ static int vmx_vcpu_create(struct kvm_vcpu *vcpu)
 #define L1TF_MSG_SMT "L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n"
 #define L1TF_MSG_L1D "L1TF CPU bug present and virtualization mitigation disabled, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n"

-static int vmx_vm_init(struct kvm *kvm)
+int vmx_vm_init(struct kvm *kvm)
 {
         if (!ple_gap)
                 kvm->arch.pause_in_guest = true;
@@ -7416,7 +7413,7 @@ static int vmx_vm_init(struct kvm *kvm)
         return 0;
 }

-static int vmx_check_processor_compatibility(void)
+int vmx_check_processor_compatibility(void)
 {
         struct vmcs_config vmcs_conf;
         struct vmx_capability vmx_cap;
@@ -7441,7 +7438,7 @@ static int vmx_check_processor_compatibility(void)
         return 0;
 }

-static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
+u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
         u8 cache;

@@ -7613,7 +7610,7 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
                 vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
 }

-static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
+void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
         struct vcpu_vmx *vmx = to_vmx(vcpu);

@@ -7723,7 +7720,7 @@ static __init void vmx_set_cpu_caps(void)
                 kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG);
 }

-static void vmx_request_immediate_exit(struct kvm_vcpu *vcpu)
+void vmx_request_immediate_exit(struct kvm_vcpu *vcpu)
 {
         to_vmx(vcpu)->req_immediate_exit = true;
 }
@@ -7762,10 +7759,10 @@ static int vmx_check_intercept_io(struct kvm_vcpu *vcpu,
         return intercept ? X86EMUL_UNHANDLEABLE : X86EMUL_CONTINUE;
 }

-static int vmx_check_intercept(struct kvm_vcpu *vcpu,
-                               struct x86_instruction_info *info,
-                               enum x86_intercept_stage stage,
-                               struct x86_exception *exception)
+int vmx_check_intercept(struct kvm_vcpu *vcpu,
+                        struct x86_instruction_info *info,
+                        enum x86_intercept_stage stage,
+                        struct x86_exception *exception)
 {
         struct vmcs12 *vmcs12 = get_vmcs12(vcpu);

@@ -7830,8 +7827,8 @@ static inline int u64_shl_div_u64(u64 a, unsigned int shift,
         return 0;
 }

-static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
-                            bool *expired)
+int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
+                     bool *expired)
 {
         struct vcpu_vmx *vmx;
         u64 tscl, guest_tscl, delta_tsc, lapic_timer_advance_cycles;
@@ -7870,13 +7867,13 @@ static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
         return 0;
 }

-static void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu)
+void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu)
 {
         to_vmx(vcpu)->hv_deadline_tsc = -1;
 }
 #endif

-static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
+void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
 {
         if (!kvm_pause_in_guest(vcpu->kvm))
                 shrink_ple_window(vcpu);
@@ -7902,7 +7899,7 @@ void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu)
                 secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_ENABLE_PML);
 }

-static void vmx_setup_mce(struct kvm_vcpu *vcpu)
+void vmx_setup_mce(struct kvm_vcpu *vcpu)
 {
         if (vcpu->arch.mcg_cap & MCG_LMCE_P)
                 to_vmx(vcpu)->msr_ia32_feature_control_valid_bits |=
@@ -7912,7 +7909,7 @@ static void vmx_setup_mce(struct kvm_vcpu *vcpu)
                         ~FEAT_CTL_LMCE_ENABLED;
 }

-static int vmx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+int vmx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 {
         /* we need a nested vmexit to enter SMM, postpone if run is pending */
         if (to_vmx(vcpu)->nested.nested_run_pending)
@@ -7920,7 +7917,7 @@ static int vmx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
         return !is_smm(vcpu);
 }

-static int vmx_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
+int vmx_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
 {
         struct vcpu_vmx *vmx = to_vmx(vcpu);

@@ -7941,7 +7938,7 @@ static int vmx_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
         return 0;
 }

-static int vmx_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
+int vmx_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
 {
         struct vcpu_vmx *vmx = to_vmx(vcpu);
         int ret;
@@ -7962,17 +7959,17 @@ static int vmx_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
         return 0;
 }

-static void vmx_enable_smi_window(struct kvm_vcpu *vcpu)
+void vmx_enable_smi_window(struct kvm_vcpu *vcpu)
 {
         /* RSM will cause a vmexit anyway.  */
 }

-static bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
+bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
 {
         return to_vmx(vcpu)->nested.vmxon && !is_guest_mode(vcpu);
 }

-static void vmx_migrate_timers(struct kvm_vcpu *vcpu)
+void vmx_migrate_timers(struct kvm_vcpu *vcpu)
 {
         if (is_guest_mode(vcpu)) {
                 struct hrtimer *timer = &to_vmx(vcpu)->nested.preemption_timer;
@@ -7982,7 +7979,7 @@ static void vmx_migrate_timers(struct kvm_vcpu *vcpu)
         }
 }

-static void vmx_hardware_unsetup(void)
+void vmx_hardware_unsetup(void)
 {
         kvm_set_posted_intr_wakeup_handler(NULL);

@@ -7992,7 +7989,7 @@ static void vmx_hardware_unsetup(void)
         free_kvm_area();
 }

-static bool vmx_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
+bool vmx_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
 {
         ulong supported = BIT(APICV_INHIBIT_REASON_DISABLE) |
                           BIT(APICV_INHIBIT_REASON_ABSENT) |
@@ -8004,151 +8001,13 @@ static bool vmx_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
         return supported & BIT(reason);
 }

-static void vmx_vm_destroy(struct kvm *kvm)
+void vmx_vm_destroy(struct kvm *kvm)
 {
         struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);

         free_pages((unsigned long)kvm_vmx->pid_table, vmx_get_pid_table_order(kvm));
 }

-static struct kvm_x86_ops vmx_x86_ops __initdata = {
-        .name = "kvm_intel",
-
-        .hardware_unsetup = vmx_hardware_unsetup,
-
-        .check_processor_compatibility = vmx_check_processor_compatibility,
-        .hardware_enable = vmx_hardware_enable,
-        .hardware_disable = vmx_hardware_disable,
-        .has_emulated_msr = vmx_has_emulated_msr,
-
-        .vm_size = sizeof(struct kvm_vmx),
-        .vm_init = vmx_vm_init,
-        .vm_destroy = vmx_vm_destroy,
-
-        .vcpu_precreate = vmx_vcpu_precreate,
-        .vcpu_create = vmx_vcpu_create,
-        .vcpu_free = vmx_vcpu_free,
-        .vcpu_reset = vmx_vcpu_reset,
-
-        .prepare_switch_to_guest = vmx_prepare_switch_to_guest,
-        .vcpu_load = vmx_vcpu_load,
-        .vcpu_put = vmx_vcpu_put,
-
-        .update_exception_bitmap = vmx_update_exception_bitmap,
-        .get_msr_feature = vmx_get_msr_feature,
-        .get_msr = vmx_get_msr,
-        .set_msr = vmx_set_msr,
-        .get_segment_base = vmx_get_segment_base,
-        .get_segment = vmx_get_segment,
-        .set_segment = vmx_set_segment,
-        .get_cpl = vmx_get_cpl,
-        .get_cs_db_l_bits = vmx_get_cs_db_l_bits,
-        .set_cr0 = vmx_set_cr0,
-        .is_valid_cr4 = vmx_is_valid_cr4,
-        .set_cr4 = vmx_set_cr4,
-        .set_efer = vmx_set_efer,
-        .get_idt = vmx_get_idt,
-        .set_idt = vmx_set_idt,
-        .get_gdt = vmx_get_gdt,
-        .set_gdt = vmx_set_gdt,
-        .set_dr7 = vmx_set_dr7,
-        .sync_dirty_debug_regs = vmx_sync_dirty_debug_regs,
-        .cache_reg = vmx_cache_reg,
-        .get_rflags = vmx_get_rflags,
-        .set_rflags = vmx_set_rflags,
-        .get_if_flag = vmx_get_if_flag,
-
-        .flush_tlb_all = vmx_flush_tlb_all,
-        .flush_tlb_current = vmx_flush_tlb_current,
-        .flush_tlb_gva = vmx_flush_tlb_gva,
-        .flush_tlb_guest = vmx_flush_tlb_guest,
-
-        .vcpu_pre_run = vmx_vcpu_pre_run,
-        .vcpu_run = vmx_vcpu_run,
-        .handle_exit = vmx_handle_exit,
-        .skip_emulated_instruction = vmx_skip_emulated_instruction,
-        .update_emulated_instruction = vmx_update_emulated_instruction,
-        .set_interrupt_shadow = vmx_set_interrupt_shadow,
-        .get_interrupt_shadow = vmx_get_interrupt_shadow,
-        .patch_hypercall = vmx_patch_hypercall,
-        .inject_irq = vmx_inject_irq,
-        .inject_nmi = vmx_inject_nmi,
-        .queue_exception = vmx_queue_exception,
-        .cancel_injection = vmx_cancel_injection,
-        .interrupt_allowed = vmx_interrupt_allowed,
-        .nmi_allowed = vmx_nmi_allowed,
-        .get_nmi_mask = vmx_get_nmi_mask,
-        .set_nmi_mask = vmx_set_nmi_mask,
-        .enable_nmi_window = vmx_enable_nmi_window,
-        .enable_irq_window = vmx_enable_irq_window,
-        .update_cr8_intercept = vmx_update_cr8_intercept,
-        .set_virtual_apic_mode = vmx_set_virtual_apic_mode,
-        .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
-        .refresh_apicv_exec_ctrl = vmx_refresh_apicv_exec_ctrl,
-        .load_eoi_exitmap = vmx_load_eoi_exitmap,
-        .apicv_post_state_restore = vmx_apicv_post_state_restore,
-        .check_apicv_inhibit_reasons = vmx_check_apicv_inhibit_reasons,
-        .hwapic_irr_update = vmx_hwapic_irr_update,
-        .hwapic_isr_update = vmx_hwapic_isr_update,
-        .guest_apic_has_interrupt = vmx_guest_apic_has_interrupt,
-        .sync_pir_to_irr = vmx_sync_pir_to_irr,
-        .deliver_interrupt = vmx_deliver_interrupt,
-        .dy_apicv_has_pending_interrupt = pi_has_pending_interrupt,
-
-        .set_tss_addr = vmx_set_tss_addr,
-        .set_identity_map_addr = vmx_set_identity_map_addr,
-        .get_mt_mask = vmx_get_mt_mask,
-
-        .get_exit_info = vmx_get_exit_info,
-
-        .vcpu_after_set_cpuid = vmx_vcpu_after_set_cpuid,
-
-        .has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,
-
-        .get_l2_tsc_offset = vmx_get_l2_tsc_offset,
-        .get_l2_tsc_multiplier = vmx_get_l2_tsc_multiplier,
-        .write_tsc_offset = vmx_write_tsc_offset,
-        .write_tsc_multiplier = vmx_write_tsc_multiplier,
-
-        .load_mmu_pgd = vmx_load_mmu_pgd,
-
-        .check_intercept = vmx_check_intercept,
-        .handle_exit_irqoff = vmx_handle_exit_irqoff,
-
-        .request_immediate_exit = vmx_request_immediate_exit,
-
-        .sched_in = vmx_sched_in,
-
-        .cpu_dirty_log_size = PML_ENTITY_NUM,
-        .update_cpu_dirty_logging = vmx_update_cpu_dirty_logging,
-
-        .nested_ops = &vmx_nested_ops,
-
-        .pi_update_irte = vmx_pi_update_irte,
-        .pi_start_assignment = vmx_pi_start_assignment,
-
-#ifdef CONFIG_X86_64
-        .set_hv_timer = vmx_set_hv_timer,
-        .cancel_hv_timer = vmx_cancel_hv_timer,
-#endif
-
-        .setup_mce = vmx_setup_mce,
-
-        .smi_allowed = vmx_smi_allowed,
-        .enter_smm = vmx_enter_smm,
-        .leave_smm = vmx_leave_smm,
-        .enable_smi_window = vmx_enable_smi_window,
-
-        .can_emulate_instruction = vmx_can_emulate_instruction,
-        .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
-        .migrate_timers = vmx_migrate_timers,
-
-        .msr_filter_changed = vmx_msr_filter_changed,
-        .complete_emulated_msr = kvm_complete_insn_gp,
-
-        .vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
-};
-
 static unsigned int vmx_handle_intel_pt_intr(void)
 {
         struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
@@ -8214,9 +8073,7 @@ static void __init vmx_setup_me_spte_mask(void)
         kvm_mmu_set_me_spte_mask(0, me_mask);
 }

-static struct kvm_x86_init_ops vmx_init_ops __initdata;
-
-static __init int hardware_setup(void)
+__init int vmx_hardware_setup(void)
 {
         unsigned long host_bndcfgs;
         struct desc_ptr dt;
@@ -8276,16 +8133,16 @@ static __init int hardware_setup(void)
          * using the APIC_ACCESS_ADDR VMCS field.
          */
         if (!flexpriority_enabled)
-                vmx_x86_ops.set_apic_access_page_addr = NULL;
+                vt_x86_ops.set_apic_access_page_addr = NULL;

         if (!cpu_has_vmx_tpr_shadow())
-                vmx_x86_ops.update_cr8_intercept = NULL;
+                vt_x86_ops.update_cr8_intercept = NULL;

 #if IS_ENABLED(CONFIG_HYPERV)
         if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
             && enable_ept) {
-                vmx_x86_ops.tlb_remote_flush = hv_remote_flush_tlb;
-                vmx_x86_ops.tlb_remote_flush_with_range =
+                vt_x86_ops.tlb_remote_flush = hv_remote_flush_tlb;
+                vt_x86_ops.tlb_remote_flush_with_range =
                                 hv_remote_flush_tlb_with_range;
         }
 #endif
@@ -8301,7 +8158,7 @@ static __init int hardware_setup(void)
         if (!cpu_has_vmx_apicv())
                 enable_apicv = 0;
         if (!enable_apicv)
-                vmx_x86_ops.sync_pir_to_irr = NULL;
+                vt_x86_ops.sync_pir_to_irr = NULL;

         if (!enable_apicv || !cpu_has_vmx_ipiv())
                 enable_ipiv = false;
@@ -8337,7 +8194,7 @@ static __init int hardware_setup(void)
                 enable_pml = 0;

         if (!enable_pml)
-                vmx_x86_ops.cpu_dirty_log_size = 0;
+                vt_x86_ops.cpu_dirty_log_size = 0;

         if (!cpu_has_vmx_preemption_timer())
                 enable_preemption_timer = false;
@@ -8364,9 +8221,9 @@ static __init int hardware_setup(void)
         }

         if (!enable_preemption_timer) {
-                vmx_x86_ops.set_hv_timer = NULL;
-                vmx_x86_ops.cancel_hv_timer = NULL;
-                vmx_x86_ops.request_immediate_exit = __kvm_request_immediate_exit;
+                vt_x86_ops.set_hv_timer = NULL;
+                vt_x86_ops.cancel_hv_timer = NULL;
+                vt_x86_ops.request_immediate_exit = __kvm_request_immediate_exit;
         }

         kvm_caps.supported_mce_cap |= MCG_LMCE_P;
@@ -8377,9 +8234,9 @@ static __init int hardware_setup(void)
         if (!enable_ept || !enable_pmu || !cpu_has_vmx_intel_pt())
                 pt_mode = PT_MODE_SYSTEM;
         if (pt_mode == PT_MODE_HOST_GUEST)
-                vmx_init_ops.handle_intel_pt_intr = vmx_handle_intel_pt_intr;
+                vt_init_ops.handle_intel_pt_intr = vmx_handle_intel_pt_intr;
         else
-                vmx_init_ops.handle_intel_pt_intr = NULL;
+                vt_init_ops.handle_intel_pt_intr = NULL;

         setup_default_sgx_lepubkeyhash();

@@ -8403,16 +8260,6 @@ static __init int hardware_setup(void)
         return r;
 }

-static struct kvm_x86_init_ops vmx_init_ops __initdata = {
-        .cpu_has_kvm_support = cpu_has_kvm_support,
-        .disabled_by_bios = vmx_disabled_by_bios,
-        .hardware_setup = hardware_setup,
-        .handle_intel_pt_intr = NULL,
-
-        .runtime_ops = &vmx_x86_ops,
-        .pmu_ops = &intel_pmu_ops,
-};
-
 static void vmx_cleanup_l1d_flush(void)
 {
         if (vmx_l1d_flush_pages) {
@@ -8490,7 +8337,7 @@ static int __init vmx_init(void)
                 }

                 if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH)
-                        vmx_x86_ops.enable_direct_tlbflush
+                        vt_x86_ops.enable_direct_tlbflush
                                 = hv_enable_direct_tlbflush;

         } else {
@@ -8498,8 +8345,8 @@ static int __init vmx_init(void)
         }
 #endif

-        r = kvm_init(&vmx_init_ops, sizeof(struct vcpu_vmx),
-                     __alignof__(struct vcpu_vmx), THIS_MODULE);
+        r = kvm_init(&vt_init_ops, sizeof(struct vcpu_vmx),
+                     __alignof__(struct vcpu_vmx), THIS_MODULE);
         if (r)
                 return r;

diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
new file mode 100644
index 000000000000..85da24ecb25f
--- /dev/null
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -0,0 +1,125 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_X86_VMX_X86_OPS_H
+#define __KVM_X86_VMX_X86_OPS_H
+
+#include <linux/kvm_host.h>
+
+#include <asm/virtext.h>
+
+#include "x86.h"
+
+__init int vmx_cpu_has_kvm_support(void);
+__init int vmx_disabled_by_bios(void);
+__init int vmx_hardware_setup(void);
+
+extern struct kvm_x86_ops vt_x86_ops __initdata;
+extern struct kvm_x86_init_ops vt_init_ops __initdata;
+
+void vmx_hardware_unsetup(void);
+int vmx_check_processor_compatibility(void);
+int vmx_hardware_enable(void);
+void vmx_hardware_disable(void);
+int vmx_vm_init(struct kvm *kvm);
+void vmx_vm_destroy(struct kvm *kvm);
+int vmx_vcpu_precreate(struct kvm *kvm);
+int vmx_vcpu_create(struct kvm_vcpu *vcpu);
+int vmx_vcpu_pre_run(struct kvm_vcpu *vcpu);
+fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu);
+void vmx_vcpu_free(struct kvm_vcpu *vcpu);
+void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
+void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
+void vmx_vcpu_put(struct kvm_vcpu *vcpu);
+int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath);
+void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu);
+int vmx_skip_emulated_instruction(struct kvm_vcpu *vcpu);
+void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu);
+int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
+int vmx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection);
+int vmx_enter_smm(struct kvm_vcpu *vcpu, char *smstate);
+int vmx_leave_smm(struct kvm_vcpu *vcpu, const char *smstate);
+void vmx_enable_smi_window(struct kvm_vcpu *vcpu);
+bool vmx_can_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type,
+                                 void *insn, int insn_len);
+int vmx_check_intercept(struct kvm_vcpu *vcpu,
+                        struct x86_instruction_info *info,
+                        enum x86_intercept_stage stage,
+                        struct x86_exception *exception);
+bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu);
+void vmx_migrate_timers(struct kvm_vcpu *vcpu);
+void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu);
+void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu);
+bool vmx_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason);
+void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr);
+void vmx_hwapic_isr_update(int max_isr);
+bool vmx_guest_apic_has_interrupt(struct kvm_vcpu *vcpu);
+int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu);
+void vmx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
+                           int trig_mode, int vector);
+void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu);
+bool vmx_has_emulated_msr(struct kvm *kvm, u32 index);
+void vmx_msr_filter_changed(struct kvm_vcpu *vcpu);
+void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
+void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu);
+int vmx_get_msr_feature(struct kvm_msr_entry *msr);
+int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
+u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg);
+void vmx_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
+void vmx_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
+int vmx_get_cpl(struct kvm_vcpu *vcpu);
+void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l);
+void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
+void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level);
+void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
+bool vmx_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
+int vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer);
+void vmx_get_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
+void vmx_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
+void vmx_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
+void vmx_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
+void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val);
+void vmx_sync_dirty_debug_regs(struct kvm_vcpu *vcpu);
+void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg);
+unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu);
+void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
+bool vmx_get_if_flag(struct kvm_vcpu *vcpu);
+void vmx_flush_tlb_all(struct kvm_vcpu *vcpu);
+void vmx_flush_tlb_current(struct kvm_vcpu *vcpu);
+void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr);
+void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu);
+void vmx_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask);
+u32 vmx_get_interrupt_shadow(struct kvm_vcpu *vcpu);
+void vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall);
+void vmx_inject_irq(struct kvm_vcpu *vcpu, bool reinjected);
+void vmx_inject_nmi(struct kvm_vcpu *vcpu);
+void vmx_queue_exception(struct kvm_vcpu *vcpu);
+void vmx_cancel_injection(struct kvm_vcpu *vcpu);
+int vmx_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection);
+int vmx_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection);
+bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu);
+void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked);
+void vmx_enable_nmi_window(struct kvm_vcpu *vcpu);
+void vmx_enable_irq_window(struct kvm_vcpu *vcpu);
+void vmx_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr);
+void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu);
+void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu);
+void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
+int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr);
+int vmx_set_identity_map_addr(struct kvm *kvm, u64 ident_addr);
+u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
+void vmx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
+                       u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code);
+u64 vmx_get_l2_tsc_offset(struct kvm_vcpu *vcpu);
+u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu);
+void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset);
+void vmx_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 multiplier);
+void vmx_request_immediate_exit(struct kvm_vcpu *vcpu);
+void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu);
+void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu);
+#ifdef CONFIG_X86_64
+int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
+                     bool *expired);
+void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu);
+#endif
+void vmx_setup_mce(struct kvm_vcpu *vcpu);
+
+#endif /* __KVM_X86_VMX_X86_OPS_H */
--
2.25.1

From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [PATCH v9 002/105] KVM: x86: Refactor KVM VMX module init/exit functions
Date: Fri, 30 Sep 2022 03:16:56 -0700
Message-Id: <686a3b29589bbbe0d1d3ebee434008eab8e26599.1664530907.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

Currently, the KVM VMX module initialization and exit functions are each a
single function.  Refactor the module initialization into a KVM-common part
and a VMX part so that the TDX-specific part can be added cleanly.
Opportunistically refactor the module exit function as well.

The current module initialization flow is:

  1.) calculate the sizes of the VMX kvm structure and the VMX vcpu
      structure,
  2.) hyper-v specific initialization,
  3.) report those sizes to the KVM common layer and perform KVM common
      initialization, and
  4.) VMX-specific system-wide initialization.

Refactor the KVM VMX module initialization function into smaller functions
behind a wrapper, so that the VMX-only logic stays in vmx.c while main.c
holds the code common to VMX and TDX.  The wrapper is "vt_init() { vmx
kvm/vcpu size calculation; hv_vp_assist_page_init(); kvm_init();
vmx_init(); }" in main.c, with hv_vp_assist_page_init() and vmx_init() in
vmx.c.  hv_vp_assist_page_init() initializes the hyper-v specific assist
pages, kvm_init() does the system-wide initialization of the KVM common
layer, and vmx_init() does the system-wide VMX initialization.

The KVM architecture-common layer allocates struct kvm with the reported
size for the architecture-specific code.  The KVM VMX module defines its
structure as struct kvm_vmx { struct kvm; /* VMX specific members */ } and
uses it as struct kvm_vmx.  The same scheme applies to the vcpu structure.
The TDX KVM patches will define TDX-specific kvm and vcpu structures and
add tdx_pre_kvm_init() to report their sizes to the KVM common layer.

The current module exit function is likewise a single function, a
combination of VMX-specific logic and common KVM logic.  Refactor it into
VMX-specific logic and KVM-common logic.  This is pure refactoring to keep
the VMX-specific logic in vmx.c, out of main.c.
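As a simplified sketch of the resulting flow (condensed from the diff
below, with error handling abbreviated; the numbered comments refer to the
steps listed above):

  static int __init vt_init(void)
  {
          int r;
                                          /* 1.) size calculation      */
          vt_x86_ops.vm_size = sizeof(struct kvm_vmx);
          hv_vp_assist_page_init();       /* 2.) hyper-v specific init */
          r = kvm_init(&vt_init_ops, sizeof(struct vcpu_vmx),
                       __alignof__(struct vcpu_vmx), THIS_MODULE);
          if (r)                          /* 3.) KVM common init       */
                  goto err_hv;
          r = vmx_init();                 /* 4.) VMX system-wide init  */
          if (r)
                  goto err_kvm;
          return 0;
  err_kvm:
          kvm_exit();
  err_hv:
          hv_vp_assist_page_exit();
          return r;
  }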
Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/main.c    | 37 +++++++++++++++
 arch/x86/kvm/vmx/vmx.c     | 95 ++++++++++++++++++--------------------
 arch/x86/kvm/vmx/x86_ops.h |  5 ++
 3 files changed, 88 insertions(+), 49 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 636768f5b985..5b4aff7a31b6 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -153,3 +153,40 @@ struct kvm_x86_init_ops vt_init_ops __initdata = {
         .runtime_ops = &vt_x86_ops,
         .pmu_ops = &intel_pmu_ops,
 };
+
+static int __init vt_init(void)
+{
+        unsigned int vcpu_size, vcpu_align;
+        int r;
+
+        vt_x86_ops.vm_size = sizeof(struct kvm_vmx);
+        vcpu_size = sizeof(struct vcpu_vmx);
+        vcpu_align = __alignof__(struct vcpu_vmx);
+
+        hv_vp_assist_page_init();
+
+        r = kvm_init(&vt_init_ops, vcpu_size, vcpu_align, THIS_MODULE);
+        if (r)
+                goto err_vmx_post_exit;
+
+        r = vmx_init();
+        if (r)
+                goto err_kvm_exit;
+
+        return 0;
+
+err_kvm_exit:
+        kvm_exit();
+err_vmx_post_exit:
+        hv_vp_assist_page_exit();
+        return r;
+}
+module_init(vt_init);
+
+static void vt_exit(void)
+{
+        vmx_exit();
+        kvm_exit();
+        hv_vp_assist_page_exit();
+}
+module_exit(vt_exit);

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6d5eb74fedfb..7bad73d5822e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8270,48 +8270,8 @@ static void vmx_cleanup_l1d_flush(void)
         l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO;
 }

-static void vmx_exit(void)
+void __init hv_vp_assist_page_init(void)
 {
-#ifdef CONFIG_KEXEC_CORE
-        RCU_INIT_POINTER(crash_vmclear_loaded_vmcss, NULL);
-        synchronize_rcu();
-#endif
-
-        kvm_exit();
-
-#if IS_ENABLED(CONFIG_HYPERV)
-        if (static_branch_unlikely(&enable_evmcs)) {
-                int cpu;
-                struct hv_vp_assist_page *vp_ap;
-                /*
-                 * Reset everything to support using non-enlightened VMCS
-                 * access later (e.g. when we reload the module with
-                 * enlightened_vmcs=0)
-                 */
-                for_each_online_cpu(cpu) {
-                        vp_ap = hv_get_vp_assist_page(cpu);
-
-                        if (!vp_ap)
-                                continue;
-
-                        vp_ap->nested_control.features.directhypercall = 0;
-                        vp_ap->current_nested_vmcs = 0;
-                        vp_ap->enlighten_vmentry = 0;
-                }
-
-                static_branch_disable(&enable_evmcs);
-        }
-#endif
-        vmx_cleanup_l1d_flush();
-
-        allow_smaller_maxphyaddr = false;
-}
-module_exit(vmx_exit);
-
-static int __init vmx_init(void)
-{
-        int r, cpu;
-
 #if IS_ENABLED(CONFIG_HYPERV)
         /*
          * Enlightened VMCS usage should be recommended and the host needs
@@ -8322,6 +8282,7 @@ static int __init vmx_init(void)
             ms_hyperv.hints & HV_X64_ENLIGHTENED_VMCS_RECOMMENDED &&
             (ms_hyperv.nested_features & HV_X64_ENLIGHTENED_VMCS_VERSION) >=
             KVM_EVMCS_VERSION) {
+                int cpu;

                 /* Check that we have assist pages on all online CPUs */
                 for_each_online_cpu(cpu) {
@@ -8344,11 +8305,38 @@ static int __init vmx_init(void)
                 enlightened_vmcs = false;
         }
 #endif
+}

-        r = kvm_init(&vt_init_ops, sizeof(struct vcpu_vmx),
-                     __alignof__(struct vcpu_vmx), THIS_MODULE);
-        if (r)
-                return r;
+void hv_vp_assist_page_exit(void)
+{
+#if IS_ENABLED(CONFIG_HYPERV)
+        if (static_branch_unlikely(&enable_evmcs)) {
+                int cpu;
+                struct hv_vp_assist_page *vp_ap;
+                /*
+                 * Reset everything to support using non-enlightened VMCS
+                 * access later (e.g. when we reload the module with
when we reload the module with + * enlightened_vmcs=3D0) + */ + for_each_online_cpu(cpu) { + vp_ap =3D hv_get_vp_assist_page(cpu); + + if (!vp_ap) + continue; + + vp_ap->nested_control.features.directhypercall =3D 0; + vp_ap->current_nested_vmcs =3D 0; + vp_ap->enlighten_vmentry =3D 0; + } + + static_branch_disable(&enable_evmcs); + } +#endif +} + +int __init vmx_init(void) +{ + int r, cpu; =20 /* * Must be called after kvm_init() so enable_ept is properly set @@ -8358,10 +8346,8 @@ static int __init vmx_init(void) * mitigation mode. */ r =3D vmx_setup_l1d_flush(vmentry_l1d_flush_param); - if (r) { - vmx_exit(); + if (r) return r; - } =20 vmx_setup_fb_clear_ctrl(); =20 @@ -8387,4 +8373,15 @@ static int __init vmx_init(void) =20 return 0; } -module_init(vmx_init); + +void vmx_exit(void) +{ +#ifdef CONFIG_KEXEC_CORE + RCU_INIT_POINTER(crash_vmclear_loaded_vmcss, NULL); + synchronize_rcu(); +#endif + + vmx_cleanup_l1d_flush(); + + allow_smaller_maxphyaddr =3D false; +} diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 85da24ecb25f..c0ff4b88e8f9 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -8,6 +8,11 @@ =20 #include "x86.h" =20 +void __init hv_vp_assist_page_init(void); +void hv_vp_assist_page_exit(void); +int __init vmx_init(void); +void vmx_exit(void); + __init int vmx_cpu_has_kvm_support(void); __init int vmx_disabled_by_bios(void); __init int vmx_hardware_setup(void); --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 28BE5C43217 for ; Fri, 30 Sep 2022 10:19:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231665AbiI3KTX (ORCPT ); Fri, 30 Sep 2022 06:19:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33566 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231293AbiI3KS4 (ORCPT ); Fri, 30 Sep 2022 06:18:56 -0400 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 22DAF15ED32; Fri, 30 Sep 2022 03:18:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533135; x=1696069135; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ykbD6skEMkTMVQqZNywDYvJqulhVuR9nM0nBEyHFShc=; b=BFlk0Ey/8Fe1P/D2FwiB8aon5QJJqpY+yCrL3cGR3/q3l6eRoSCAIaNQ OKSRREHysBQ+/ljkSNlGzinG8B0W1AFqxsydWvB7U+OvOCcsUPpMEgjGv hmyN2mPG3RWUYcIBUNQInjhOHsU/TBMZ1B7cps4YyjGi59sSnK9bJtjfZ 2DX6sq0e5JZwXNgC1D4ZS2mgojRVBX2DpEN1Eb/GsmXvFN47oNQR9I5SA C+qCO25kAzyZy25DSFdCVXPWGtty21cSL6EEvAUy7ezrXwYcamKhFmIGu 9rO65oo2YeFdN1rmtAPQtfcwkwxpxWc3YxSkIGBilcE5DEA66Dqg1wRNB A==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="366207471" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="366207471" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:51 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807512" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807512" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:51 -0700 From: isaku.yamahata@intel.com 
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar
Subject: [PATCH v9 003/105] KVM: TDX: Add placeholders for TDX VM/vcpu structure
Date: Fri, 30 Sep 2022 03:16:57 -0700
Message-Id: <89c43c69e4b84889faff9db18eed2e760b3720c0.1664530907.git.isaku.yamahata@intel.com>
Content-Type: text/plain; charset="utf-8"

From: Isaku Yamahata

Add placeholder TDX VM/vcpu structures that overlay the VMX VM/vcpu structures. Initialize the VM structure size and the vcpu size/alignment so that the x86 KVM common code knows those sizes regardless of whether the VM is VMX or TDX. Those structures will be populated as the guest creation logic develops. Add helper functions to check whether a VM is a guest TD, and conversion functions between KVM VM/vcpu and TDX VM/vcpu.

Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/main.c | 8 +++--- arch/x86/kvm/vmx/tdx.h | 54 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 59 insertions(+), 3 deletions(-) create mode 100644 arch/x86/kvm/vmx/tdx.h diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 5b4aff7a31b6..c8e8b0212a2a 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -5,6 +5,7 @@ #include "vmx.h" #include "nested.h" #include "pmu.h" +#include "tdx.h" struct kvm_x86_ops vt_x86_ops __initdata = { .name = "kvm_intel", @@ -159,9 +160,10 @@ static int __init vt_init(void) unsigned int vcpu_size, vcpu_align; int r; - vt_x86_ops.vm_size = sizeof(struct kvm_vmx); - vcpu_size = sizeof(struct vcpu_vmx); - vcpu_align = __alignof__(struct vcpu_vmx); + vt_x86_ops.vm_size = max(sizeof(struct kvm_vmx), sizeof(struct kvm_tdx)); + vcpu_size = max(sizeof(struct vcpu_vmx), sizeof(struct vcpu_tdx)); + vcpu_align = max(__alignof__(struct vcpu_vmx), + __alignof__(struct vcpu_tdx)); hv_vp_assist_page_init(); diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h new file mode 100644 index 000000000000..060bf48ec3d6 --- /dev/null +++ b/arch/x86/kvm/vmx/tdx.h @@ -0,0 +1,54 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __KVM_X86_TDX_H +#define __KVM_X86_TDX_H + +#ifdef CONFIG_INTEL_TDX_HOST +struct kvm_tdx { + struct kvm kvm; + /* TDX specific members follow. */ +}; + +struct vcpu_tdx { + struct kvm_vcpu vcpu; + /* TDX specific members follow. */ +}; + +static inline bool is_td(struct kvm *kvm) +{ + /* + * TDX VM type isn't defined yet.
+ * return kvm->arch.vm_type == KVM_X86_TDX_VM; + */ + return false; +} + +static inline bool is_td_vcpu(struct kvm_vcpu *vcpu) +{ + return is_td(vcpu->kvm); +} + +static inline struct kvm_tdx *to_kvm_tdx(struct kvm *kvm) +{ + return container_of(kvm, struct kvm_tdx, kvm); +} + +static inline struct vcpu_tdx *to_tdx(struct kvm_vcpu *vcpu) +{ + return container_of(vcpu, struct vcpu_tdx, vcpu); +} +#else +struct kvm_tdx { + struct kvm kvm; +}; + +struct vcpu_tdx { + struct kvm_vcpu vcpu; +}; + +static inline bool is_td(struct kvm *kvm) { return false; } +static inline bool is_td_vcpu(struct kvm_vcpu *vcpu) { return false; } +static inline struct kvm_tdx *to_kvm_tdx(struct kvm *kvm) { return NULL; } +static inline struct vcpu_tdx *to_tdx(struct kvm_vcpu *vcpu) { return NULL; } +#endif /* CONFIG_INTEL_TDX_HOST */ + +#endif /* __KVM_X86_TDX_H */ -- 2.25.1

From nobody Thu Apr 25 13:29:40 2024
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar
Subject: [PATCH v9 004/105] x86/virt/tdx: Add a helper function to return system wide info about TDX module
Date: Fri, 30 Sep 2022 03:16:58 -0700
Message-Id: <0d48c2abb189ba01b17eeb31363a2ddb852bb259.1664530907.git.isaku.yamahata@intel.com>
Content-Type: text/plain; charset="utf-8"

From: Isaku Yamahata

TDX KVM needs system-wide information about the TDX module, struct tdsysinfo_struct. Add a helper function, tdx_get_sysinfo(), to return it, rather than KVM retrieving it itself with various error checks. Move the struct definition to the common header arch/x86/include/asm/tdx.h.

Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/tdx.h | 55 +++++++++++++++++++++++++++++++++++++ arch/x86/virt/vmx/tdx/tdx.c | 20 +++++++++--- arch/x86/virt/vmx/tdx/tdx.h | 52 ----------------------------------- 3 files changed, 71 insertions(+), 56 deletions(-) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 801f6e10b2db..dfea0dd71bc1 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -89,11 +89,66 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1, #endif /* CONFIG_INTEL_TDX_GUEST && CONFIG_KVM_GUEST */ #ifdef CONFIG_INTEL_TDX_HOST +struct tdx_cpuid_config { + u32 leaf; + u32 sub_leaf; + u32 eax; + u32 ebx; + u32 ecx; + u32 edx; +} __packed; + +#define TDSYSINFO_STRUCT_SIZE 1024 +#define TDSYSINFO_STRUCT_ALIGNMENT 1024 + +struct tdsysinfo_struct { + /* TDX-SEAM Module Info */ + u32 attributes; + u32 vendor_id; + u32 build_date; + u16 build_num; + u16 minor_version; + u16 major_version; + u8 reserved0[14]; + /* Memory Info */ + u16 max_tdmrs; + u16 max_reserved_per_tdmr; + u16 pamt_entry_size; + u8 reserved1[10]; + /* Control Struct Info */ + u16 tdcs_base_size; + u8 reserved2[2]; + u16 tdvps_base_size; + u8 tdvps_xfam_dependent_size; + u8 reserved3[9]; + /* TD Capabilities */ + u64 attributes_fixed0; + u64 attributes_fixed1; + u64 xfam_fixed0; + u64 xfam_fixed1; + u8 reserved4[32]; + u32 num_cpuid_config; + /* + * The actual number of CPUID_CONFIG depends on above + * 'num_cpuid_config'. The size of 'struct tdsysinfo_struct' + * is 1024B defined by TDX architecture. Use a union with + * specific padding to make 'sizeof(struct tdsysinfo_struct)' + * equal to 1024.
+ */ + union { + struct tdx_cpuid_config cpuid_configs[0]; + u8 reserved5[892]; + }; +} __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT); + bool platform_tdx_enabled(void); int tdx_init(void); +const struct tdsysinfo_struct *tdx_get_sysinfo(void); #else /* !CONFIG_INTEL_TDX_HOST */ static inline bool platform_tdx_enabled(void) { return false; } static inline int tdx_init(void) { return -ENODEV; } +struct tdsysinfo_struct; +static inline const struct tdsysinfo_struct *tdx_get_sysinfo(void) { return NULL; } #endif /* CONFIG_INTEL_TDX_HOST */ #endif /* !__ASSEMBLY__ */ diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 787b26de8f53..4054a917ca97 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -354,9 +354,9 @@ static int check_cmrs(struct cmr_info *cmr_array, int *actual_cmr_num) return 0; } -static int tdx_get_sysinfo(struct tdsysinfo_struct *tdsysinfo, - struct cmr_info *cmr_array, - int *actual_cmr_num) +static int __tdx_get_sysinfo(struct tdsysinfo_struct *tdsysinfo, + struct cmr_info *cmr_array, + int *actual_cmr_num) { struct tdx_module_output out; u64 ret; @@ -383,6 +383,18 @@ static int tdx_get_sysinfo(struct tdsysinfo_struct *tdsysinfo, return check_cmrs(cmr_array, actual_cmr_num); } +const struct tdsysinfo_struct *tdx_get_sysinfo(void) +{ + const struct tdsysinfo_struct *r = NULL; + + mutex_lock(&tdx_module_lock); + if (tdx_module_status == TDX_MODULE_INITIALIZED) + r = &tdx_sysinfo; + mutex_unlock(&tdx_module_lock); + return r; +} +EXPORT_SYMBOL_GPL(tdx_get_sysinfo); + /* * Skip the memory region below 1MB. Return true if the entire * region is skipped. Otherwise, the updated range is returned. @@ -1106,7 +1118,7 @@ static int init_tdx_module(void) if (ret) goto out; - ret = tdx_get_sysinfo(&tdx_sysinfo, tdx_cmr_array, &tdx_cmr_num); + ret = __tdx_get_sysinfo(&tdx_sysinfo, tdx_cmr_array, &tdx_cmr_num); if (ret) goto out; diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h index e0309558be13..c08e4ee2d0bf 100644 --- a/arch/x86/virt/vmx/tdx/tdx.h +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -65,58 +65,6 @@ struct cmr_info { #define MAX_CMRS 32 #define CMR_INFO_ARRAY_ALIGNMENT 512 -struct cpuid_config { - u32 leaf; - u32 sub_leaf; - u32 eax; - u32 ebx; - u32 ecx; - u32 edx; -} __packed; - -#define TDSYSINFO_STRUCT_SIZE 1024 -#define TDSYSINFO_STRUCT_ALIGNMENT 1024 - -struct tdsysinfo_struct { - /* TDX-SEAM Module Info */ - u32 attributes; - u32 vendor_id; - u32 build_date; - u16 build_num; - u16 minor_version; - u16 major_version; - u8 reserved0[14]; - /* Memory Info */ - u16 max_tdmrs; - u16 max_reserved_per_tdmr; - u16 pamt_entry_size; - u8 reserved1[10]; - /* Control Struct Info */ - u16 tdcs_base_size; - u8 reserved2[2]; - u16 tdvps_base_size; - u8 tdvps_xfam_dependent_size; - u8 reserved3[9]; - /* TD Capabilities */ - u64 attributes_fixed0; - u64 attributes_fixed1; - u64 xfam_fixed0; - u64 xfam_fixed1; - u8 reserved4[32]; - u32 num_cpuid_config; - /* - * The actual number of CPUID_CONFIG depends on above - * 'num_cpuid_config'. The size of 'struct tdsysinfo_struct' - * is 1024B defined by TDX architecture. Use a union with - * specific padding to make 'sizeof(struct tdsysinfo_struct)' - * equal to 1024.
- */ - union { - struct cpuid_config cpuid_configs[0]; - u8 reserved5[892]; - }; -} __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT); - struct tdmr_reserved_area { u64 offset; u64 size; -- 2.25.1

From nobody Thu Apr 25 13:29:40 2024
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar
Subject: [PATCH v9 005/105] KVM: TDX: Initialize the TDX module when loading the KVM intel kernel module
Date: Fri, 30 Sep 2022 03:16:59 -0700
Message-Id: <9b5afc53b5ba35d04a666eaa31939bdb2dc3ec66.1664530907.git.isaku.yamahata@intel.com>
Content-Type: text/plain; charset="utf-8"

From: Isaku Yamahata

TDX requires several initialization steps for KVM to create guest TDs: detect the CPU feature, enable VMX (TDX is based on VMX), detect the TDX module availability, and initialize the module. This patch implements those steps.

There are two options for when to initialize the TDX module: A) at kernel module load time, or B) at the first guest TD creation. A) was chosen. With B), a user may hit a TDX initialization error when trying to create the first guest TD, and a machine that fails to initialize the TDX module cannot boot any guest TD thereafter. Such a failure is undesirable and surprising, because the user expects the machine to be able to accommodate guest TDs when it actually cannot. So A) is better than B).

Introduce a module parameter, enable_tdx, to explicitly enable TDX KVM support. It is off by default, to keep the behavior unchanged for those who don't use TDX. Implement the hardware_setup method to detect the TDX feature of the CPU. Because TDX requires all present CPUs to enable VMX (VMXON), the x86-specific kvm_arch_post_hardware_enable_setup() overrides the existing weak symbol of the same name, which is called at KVM module initialization.

Suggested-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/kvm/Makefile | 1 + arch/x86/kvm/vmx/main.c | 18 ++++++- arch/x86/kvm/vmx/tdx.c | 99 ++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/vmx.c | 39 +++++++++++++++ arch/x86/kvm/vmx/x86_ops.h | 9 ++++ arch/x86/kvm/x86.c | 32 +++++++----- 6 files changed, 186 insertions(+), 12 deletions(-) create mode 100644 arch/x86/kvm/vmx/tdx.c diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index ee4d0999f20f..e2c05195cb95 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -24,6 +24,7 @@ kvm-$(CONFIG_KVM_XEN) += xen.o kvm-intel-y += vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \ vmx/evmcs.o vmx/nested.o vmx/posted_intr.o vmx/main.o kvm-intel-$(CONFIG_X86_SGX_KVM) += vmx/sgx.o +kvm-intel-$(CONFIG_INTEL_TDX_HOST) += vmx/tdx.o kvm-amd-y += svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o svm/sev.o diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index c8e8b0212a2a..1535f55fc312 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -7,6 +7,22 @@ #include "pmu.h" #include "tdx.h" +static bool __read_mostly enable_tdx = IS_ENABLED(CONFIG_INTEL_TDX_HOST); +module_param_named(tdx, enable_tdx, bool, 0444); + +static __init int vt_hardware_setup(void) +{ + int ret; + + ret = vmx_hardware_setup(); + if (ret) + return ret; + + enable_tdx = enable_tdx && !tdx_hardware_setup(&vt_x86_ops); + + return 0; +} + struct kvm_x86_ops vt_x86_ops __initdata = { .name = "kvm_intel", @@ -148,7 +164,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { struct kvm_x86_init_ops vt_init_ops __initdata = { .cpu_has_kvm_support = vmx_cpu_has_kvm_support, .disabled_by_bios = vmx_disabled_by_bios, - .hardware_setup = vmx_hardware_setup, + .hardware_setup = vt_hardware_setup, .handle_intel_pt_intr = NULL, .runtime_ops = &vt_x86_ops, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c new file mode 100644 index 000000000000..6f8451ff8980 --- /dev/null +++ b/arch/x86/kvm/vmx/tdx.c @@ -0,0 +1,99 @@ +// SPDX-License-Identifier: GPL-2.0 +#include + +#include + +#include "capabilities.h" +#include "x86_ops.h" +#include "tdx.h" +#include "x86.h" + +#undef pr_fmt +#define pr_fmt(fmt) "tdx: " fmt + +#define TDX_MAX_NR_CPUID_CONFIGS \ + ((sizeof(struct tdsysinfo_struct) - \ + offsetof(struct tdsysinfo_struct, cpuid_configs)) \ + / sizeof(struct tdx_cpuid_config)) + +struct tdx_capabilities { + u8 tdcs_nr_pages; + u8 tdvpx_nr_pages; + + u64 attrs_fixed0; + u64 attrs_fixed1; + u64 xfam_fixed0; + u64 xfam_fixed1; + + u32 nr_cpuid_configs; + struct tdx_cpuid_config cpuid_configs[TDX_MAX_NR_CPUID_CONFIGS]; +}; + +/* Capabilities of KVM + the TDX module.
*/ +static struct tdx_capabilities tdx_caps; + +static int __init tdx_module_setup(void) +{ + const struct tdsysinfo_struct *tdsysinfo; + int ret =3D 0; + + BUILD_BUG_ON(sizeof(*tdsysinfo) !=3D 1024); + BUILD_BUG_ON(TDX_MAX_NR_CPUID_CONFIGS !=3D 37); + + ret =3D tdx_init(); + if (ret) { + pr_info("Failed to initialize TDX module.\n"); + return ret; + } + + tdsysinfo =3D tdx_get_sysinfo(); + if (tdsysinfo->num_cpuid_config > TDX_MAX_NR_CPUID_CONFIGS) + return -EIO; + + tdx_caps =3D (struct tdx_capabilities) { + .tdcs_nr_pages =3D tdsysinfo->tdcs_base_size / PAGE_SIZE, + /* + * TDVPS =3D TDVPR(4K page) + TDVPX(multiple 4K pages). + * -1 for TDVPR. + */ + .tdvpx_nr_pages =3D tdsysinfo->tdvps_base_size / PAGE_SIZE - 1, + .attrs_fixed0 =3D tdsysinfo->attributes_fixed0, + .attrs_fixed1 =3D tdsysinfo->attributes_fixed1, + .xfam_fixed0 =3D tdsysinfo->xfam_fixed0, + .xfam_fixed1 =3D tdsysinfo->xfam_fixed1, + .nr_cpuid_configs =3D tdsysinfo->num_cpuid_config, + }; + if (!memcpy(tdx_caps.cpuid_configs, tdsysinfo->cpuid_configs, + tdsysinfo->num_cpuid_config * + sizeof(struct tdx_cpuid_config))) + return -EIO; + + pr_info("kvm: TDX is supported. x86 phys bits %d\n", + boot_cpu_data.x86_phys_bits); + + return 0; +} + +int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops) +{ + int r; + + if (!enable_ept) { + pr_warn("Cannot enable TDX with EPT disabled\n"); + return -EINVAL; + } + + /* MOVDIR64B instruction is needed. */ + if (!static_cpu_has(X86_FEATURE_MOVDIR64B)) { + pr_warn("Cannot enable TDX with MOVDIR64B supported "); + return -ENODEV; + } + + /* TDX requires VMX. */ + r =3D vmxon_all(); + if (!r) + r =3D tdx_module_setup(); + vmxoff_all(); + + return r; +} diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 7bad73d5822e..fb626adc347d 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2464,6 +2464,35 @@ int vmx_hardware_enable(void) return 0; } =20 +static void __init vmxon(void *arg) +{ + int cpu =3D raw_smp_processor_id(); + u64 phys_addr =3D __pa(per_cpu(vmxarea, cpu)); + atomic_t *failed =3D arg; + int r; + + if (cr4_read_shadow() & X86_CR4_VMXE) { + r =3D -EBUSY; + goto out; + } + + r =3D kvm_cpu_vmxon(phys_addr); +out: + if (r) + atomic_inc(failed); +} + +int __init vmxon_all(void) +{ + atomic_t failed =3D ATOMIC_INIT(0); + + on_each_cpu(vmxon, &failed, 1); + + if (atomic_read(&failed)) + return -EBUSY; + return 0; +} + static void vmclear_local_loaded_vmcss(void) { int cpu =3D raw_smp_processor_id(); @@ -2484,6 +2513,16 @@ void vmx_hardware_disable(void) intel_pt_handle_vmx(0); } =20 +static void __init vmxoff(void *junk) +{ + cpu_vmxoff(); +} + +void __init vmxoff_all(void) +{ + on_each_cpu(vmxoff, NULL, 1); +} + /* * There is no X86_FEATURE for SGX yet, but anyway we need to query CPUID * directly instead of going through cpu_has(), to ensure KVM is trapping diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index c0ff4b88e8f9..1f46fbae346c 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -17,6 +17,9 @@ __init int vmx_cpu_has_kvm_support(void); __init int vmx_disabled_by_bios(void); __init int vmx_hardware_setup(void); =20 +int __init vmxon_all(void); +void __init vmxoff_all(void); + extern struct kvm_x86_ops vt_x86_ops __initdata; extern struct kvm_x86_init_ops vt_init_ops __initdata; =20 @@ -127,4 +130,10 @@ void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu); #endif void vmx_setup_mce(struct kvm_vcpu *vcpu); =20 +#ifdef CONFIG_INTEL_TDX_HOST +int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops); 
+#else +static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return 0; } +#endif + #endif /* __KVM_X86_VMX_X86_OPS_H */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 694c25d1381b..8d0df3527944 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -11975,6 +11975,16 @@ static void hardware_enable(void *arg) atomic_inc(failed); } +static int kvm_hardware_enable_all(void) +{ + atomic_t failed = ATOMIC_INIT(0); + + on_each_cpu(hardware_enable, &failed, 1); + if (atomic_read(&failed)) + return -EBUSY; + return 0; +} + static void hardware_disable(void *junk) { WARN_ON_ONCE(preemptible()); @@ -11982,29 +11992,29 @@ static void hardware_disable(void *junk) drop_user_return_notifiers(); } +static void kvm_hardware_disable_all(void) +{ + on_each_cpu(hardware_disable, NULL, 1); +} + /* * Called after the VM is otherwise initialized, but just before adding it to * the vm_list. */ int kvm_arch_add_vm(struct kvm *kvm, int usage_count) { - atomic_t failed = ATOMIC_INIT(0); - int r = 0; + int r; if (usage_count != 1) return kvm_mmu_post_init_vm(kvm); - on_each_cpu(hardware_enable, &failed, 1); - - if (atomic_read(&failed)) { - r = -EBUSY; - goto err; - } + r = kvm_hardware_enable_all(); + if (r) + return r; r = kvm_mmu_post_init_vm(kvm); -err: if (r) - on_each_cpu(hardware_disable, NULL, 1); + kvm_hardware_disable_all(); return r; } @@ -12013,7 +12023,7 @@ int kvm_arch_del_vm(int usage_count) if (usage_count) return 0; - on_each_cpu(hardware_disable, NULL, 1); + kvm_hardware_disable_all(); return 0; } -- 2.25.1

From nobody Thu Apr 25 13:29:40 2024
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , Sean Christopherson , Xiaoyao Li
Subject: [PATCH v9 006/105] KVM: x86: Introduce vm_type to differentiate default VMs from confidential VMs
Date: Fri, 30 Sep 2022 03:17:00 -0700
Message-Id: <6b5302e1a394f668890fc6ffe914f45a39ea3638.1664530907.git.isaku.yamahata@intel.com>
Content-Type: text/plain; charset="utf-8"

From: Sean Christopherson

Unlike default VMs, confidential VMs (Intel TDX and AMD SEV-ES) don't allow some operations (e.g., memory read/write, register state access, etc.). Introduce vm_type to x86 KVM to track the type of the VM. Other arch KVMs already use a vm_type: KVM_CREATE_VM accepts a vm_type and the x86 KVM callback vm_init accepts vm_type, so follow them. Further, a different policy can then be applied based on vm_type.

Define KVM_X86_DEFAULT_VM for default VMs, and KVM_X86_TDX_VM for Intel TDX VMs. The wrapper function will be defined as "bool is_td(kvm) { return vm_type == KVM_X86_TDX_VM; }".

Add a capability, KVM_CAP_VM_TYPES, to let the device model (e.g., qemu) query which VM types KVM supports. Introducing a new capability together with vm_type was chosen to align with other arch KVMs that already have VM types; they use different names to query their supported VM types and there is no common name for it, so a new name was chosen.

Co-developed-by: Xiaoyao Li Signed-off-by: Xiaoyao Li Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- Documentation/virt/kvm/api.rst | 21 +++++++++++++++++++ arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/include/uapi/asm/kvm.h | 3 +++ arch/x86/kvm/svm/svm.c | 6 ++++++ arch/x86/kvm/vmx/main.c | 1 + arch/x86/kvm/vmx/tdx.h | 6 +----- arch/x86/kvm/vmx/vmx.c | 5 +++++ arch/x86/kvm/vmx/x86_ops.h | 1 + arch/x86/kvm/x86.c | 9 ++++++++- include/uapi/linux/kvm.h | 1 + tools/arch/x86/include/uapi/asm/kvm.h | 3 +++ tools/include/uapi/linux/kvm.h | 1 + 13 files changed, 54 insertions(+), 6 deletions(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index c0f800d04ffc..ebf5d2177933 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -147,10 +147,31 @@ described as 'basic' will be available. The new VM has no virtual cpus and no memory. You probably want to use 0 as machine type. +X86: +^^^^ + +Supported vm type can be queried from KVM_CAP_VM_TYPES, which returns the +bitmap of supported vm types. The 1-setting of bit @n means vm type with +value @n is supported. + +S390: +^^^^^ + In order to create user controlled virtual machines on S390, check KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as privileged user (CAP_SYS_ADMIN). +MIPS: +^^^^^ + +To use hardware assisted virtualization on MIPS (VZ ASE) rather than +the default trap & emulate implementation (which changes the virtual +memory layout to fit in user mode), check KVM_CAP_MIPS_VZ and use the +flag KVM_VM_MIPS_VZ.
+ +ARM64: +^^^^^^ + On arm64, the physical address size for a VM (IPA Size limit) is limited to 40bits by default. The limit can be configured if the host supports the extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index 3bc45932e2d1..3857bff6949c 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -19,6 +19,7 @@ KVM_X86_OP(hardware_disable) KVM_X86_OP(hardware_unsetup) KVM_X86_OP(has_emulated_msr) KVM_X86_OP(vcpu_after_set_cpuid) +KVM_X86_OP(is_vm_type_supported) KVM_X86_OP(vm_init) KVM_X86_OP_OPTIONAL(vm_destroy) KVM_X86_OP_OPTIONAL_RET0(vcpu_precreate) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index f6074d135886..e9f4bff8e3a9 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1145,6 +1145,7 @@ enum kvm_apicv_inhibit { }; =20 struct kvm_arch { + unsigned long vm_type; unsigned long n_used_mmu_pages; unsigned long n_requested_mmu_pages; unsigned long n_max_mmu_pages; @@ -1462,6 +1463,7 @@ struct kvm_x86_ops { bool (*has_emulated_msr)(struct kvm *kvm, u32 index); void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu); =20 + bool (*is_vm_type_supported)(unsigned long vm_type); unsigned int vm_size; int (*vm_init)(struct kvm *kvm); void (*vm_destroy)(struct kvm *kvm); diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kv= m.h index 46de10a809ec..54b08789c402 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -532,4 +532,7 @@ struct kvm_pmu_event_filter { #define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TS= C) */ #define KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */ =20 +#define KVM_X86_DEFAULT_VM 0 +#define KVM_X86_TDX_VM 1 + #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 371300f03f55..37c0db89a1a4 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4722,6 +4722,11 @@ static void svm_vm_destroy(struct kvm *kvm) sev_vm_destroy(kvm); } =20 +static bool svm_is_vm_type_supported(unsigned long type) +{ + return type =3D=3D KVM_X86_DEFAULT_VM; +} + static int svm_vm_init(struct kvm *kvm) { if (!pause_filter_count || !pause_filter_thresh) @@ -4749,6 +4754,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata =3D { .vcpu_free =3D svm_vcpu_free, .vcpu_reset =3D svm_vcpu_reset, =20 + .is_vm_type_supported =3D svm_is_vm_type_supported, .vm_size =3D sizeof(struct kvm_svm), .vm_init =3D svm_vm_init, .vm_destroy =3D svm_vm_destroy, diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 1535f55fc312..03e3bb127837 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -33,6 +33,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .hardware_disable =3D vmx_hardware_disable, .has_emulated_msr =3D vmx_has_emulated_msr, =20 + .is_vm_type_supported =3D vmx_is_vm_type_supported, .vm_size =3D sizeof(struct kvm_vmx), .vm_init =3D vmx_vm_init, .vm_destroy =3D vmx_vm_destroy, diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 060bf48ec3d6..473013265bd8 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -15,11 +15,7 @@ struct vcpu_tdx { =20 static inline bool is_td(struct kvm *kvm) { - /* - * TDX VM type isn't defined yet. 
- * return kvm->arch.vm_type =3D=3D KVM_X86_TDX_VM; - */ - return false; + return kvm->arch.vm_type =3D=3D KVM_X86_TDX_VM; } =20 static inline bool is_td_vcpu(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index fb626adc347d..a1d0631d5fa8 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7418,6 +7418,11 @@ int vmx_vcpu_create(struct kvm_vcpu *vcpu) return err; } =20 +bool vmx_is_vm_type_supported(unsigned long type) +{ + return type =3D=3D KVM_X86_DEFAULT_VM; +} + #define L1TF_MSG_SMT "L1TF CPU bug present and SMT on, data leak possible.= See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/h= w-vuln/l1tf.html for details.\n" #define L1TF_MSG_L1D "L1TF CPU bug present and virtualization mitigation d= isabled, data leak possible. See CVE-2018-3646 and https://www.kernel.org/d= oc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n" =20 diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 1f46fbae346c..901b37636080 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -27,6 +27,7 @@ void vmx_hardware_unsetup(void); int vmx_check_processor_compatibility(void); int vmx_hardware_enable(void); void vmx_hardware_disable(void); +bool vmx_is_vm_type_supported(unsigned long type); int vmx_vm_init(struct kvm *kvm); void vmx_vm_destroy(struct kvm *kvm); int vmx_vcpu_precreate(struct kvm *kvm); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8d0df3527944..1df0dac476bc 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4502,6 +4502,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, lo= ng ext) case KVM_CAP_X86_NOTIFY_VMEXIT: r =3D kvm_caps.has_notify_vmexit; break; + case KVM_CAP_VM_TYPES: + r =3D BIT(KVM_X86_DEFAULT_VM); + if (static_call(kvm_x86_is_vm_type_supported)(KVM_X86_TDX_VM)) + r |=3D BIT(KVM_X86_TDX_VM); + break; default: break; } @@ -12232,9 +12237,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned lon= g type) int ret; unsigned long flags; =20 - if (type) + if (!static_call(kvm_x86_is_vm_type_supported)(type)) return -EINVAL; =20 + kvm->arch.vm_type =3D type; + ret =3D kvm_page_track_init(kvm); if (ret) goto out; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 0c8db7b7c138..47621588a792 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1214,6 +1214,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_VM_DISABLE_NX_HUGE_PAGES 220 #define KVM_CAP_S390_ZPCI_OP 221 #define KVM_CAP_S390_CPU_TOPOLOGY 222 +#define KVM_CAP_VM_TYPES 223 =20 #ifdef KVM_CAP_IRQ_ROUTING =20 diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include= /uapi/asm/kvm.h index 46de10a809ec..54b08789c402 100644 --- a/tools/arch/x86/include/uapi/asm/kvm.h +++ b/tools/arch/x86/include/uapi/asm/kvm.h @@ -532,4 +532,7 @@ struct kvm_pmu_event_filter { #define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TS= C) */ #define KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */ =20 +#define KVM_X86_DEFAULT_VM 0 +#define KVM_X86_TDX_VM 1 + #endif /* _ASM_X86_KVM_H */ diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h index eed0315a77a6..e396070ce568 100644 --- a/tools/include/uapi/linux/kvm.h +++ b/tools/include/uapi/linux/kvm.h @@ -1177,6 +1177,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_VM_DISABLE_NX_HUGE_PAGES 220 #define KVM_CAP_S390_ZPCI_OP 221 #define KVM_CAP_S390_CPU_TOPOLOGY 222 +#define KVM_CAP_VM_TYPES 223 =20 #ifdef KVM_CAP_IRQ_ROUTING =20 --=20 2.25.1 From nobody 
Thu Apr 25 13:29:40 2024
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar
Subject: [PATCH v9 007/105] KVM: TDX: Make TDX VM type supported
Date: Fri, 30 Sep 2022 03:17:01 -0700
Message-Id: <40488ce068d7c97507489b364613d4ece586abae.1664530907.git.isaku.yamahata@intel.com>
Content-Type: text/plain; charset="utf-8"

From: Isaku Yamahata

NOTE: This patch is placed at this point in the series so that developers can test the code mid-series, even though the series provides no functional features until all of its patches are applied. When the series is merged, this patch can be moved to the end.

As a first step toward TDX VM support, report to the device model (e.g., qemu) that the TDX VM type is supported. The callback to create a guest TD is the vm_init callback for KVM_CREATE_VM.
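For context, this is how a device model would consume the capability; a hypothetical userspace sketch (not taken from qemu), using only standard KVM ioctls plus the KVM_CAP_VM_TYPES and KVM_X86_TDX_VM constants this series introduces, with error handling trimmed:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Probe the supported VM types, then ask KVM for a TD. */
int try_create_td(void)
{
	int kvm_fd = open("/dev/kvm", O_RDWR);
	long types = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_VM_TYPES);

	if (types <= 0 || !(types & (1UL << KVM_X86_TDX_VM)))
		return -1;	/* TDX VM type not reported as supported */

	/* At this point in the series, vm_init fails this with -EOPNOTSUPP. */
	return ioctl(kvm_fd, KVM_CREATE_VM, KVM_X86_TDX_VM);
}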
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/main.c | 18 ++++++++++++++++-- arch/x86/kvm/vmx/tdx.c | 6 ++++++ arch/x86/kvm/vmx/vmx.c | 5 ----- arch/x86/kvm/vmx/x86_ops.h | 3 ++- 4 files changed, 24 insertions(+), 8 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 03e3bb127837..477c14b64879 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -10,6 +10,12 @@ static bool __read_mostly enable_tdx =3D IS_ENABLED(CONFIG_INTEL_TDX_HOST); module_param_named(tdx, enable_tdx, bool, 0444); =20 +static bool vt_is_vm_type_supported(unsigned long type) +{ + return type =3D=3D KVM_X86_DEFAULT_VM || + (enable_tdx && tdx_is_vm_type_supported(type)); +} + static __init int vt_hardware_setup(void) { int ret; @@ -23,6 +29,14 @@ static __init int vt_hardware_setup(void) return 0; } =20 +static int vt_vm_init(struct kvm *kvm) +{ + if (is_td(kvm)) + return -EOPNOTSUPP; /* Not ready to create guest TD yet. */ + + return vmx_vm_init(kvm); +} + struct kvm_x86_ops vt_x86_ops __initdata =3D { .name =3D "kvm_intel", =20 @@ -33,9 +47,9 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .hardware_disable =3D vmx_hardware_disable, .has_emulated_msr =3D vmx_has_emulated_msr, =20 - .is_vm_type_supported =3D vmx_is_vm_type_supported, + .is_vm_type_supported =3D vt_is_vm_type_supported, .vm_size =3D sizeof(struct kvm_vmx), - .vm_init =3D vmx_vm_init, + .vm_init =3D vt_vm_init, .vm_destroy =3D vmx_vm_destroy, =20 .vcpu_precreate =3D vmx_vcpu_precreate, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 6f8451ff8980..c4a318efbed5 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -74,6 +74,12 @@ static int __init tdx_module_setup(void) return 0; } =20 +bool tdx_is_vm_type_supported(unsigned long type) +{ + /* enable_tdx check is done by the caller. */ + return type =3D=3D KVM_X86_TDX_VM; +} + int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { int r; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index a1d0631d5fa8..fb626adc347d 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7418,11 +7418,6 @@ int vmx_vcpu_create(struct kvm_vcpu *vcpu) return err; } =20 -bool vmx_is_vm_type_supported(unsigned long type) -{ - return type =3D=3D KVM_X86_DEFAULT_VM; -} - #define L1TF_MSG_SMT "L1TF CPU bug present and SMT on, data leak possible.= See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/h= w-vuln/l1tf.html for details.\n" #define L1TF_MSG_L1D "L1TF CPU bug present and virtualization mitigation d= isabled, data leak possible. 
See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n" diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 901b37636080..2a870202fbf6 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -27,7 +27,6 @@ void vmx_hardware_unsetup(void); int vmx_check_processor_compatibility(void); int vmx_hardware_enable(void); void vmx_hardware_disable(void); -bool vmx_is_vm_type_supported(unsigned long type); int vmx_vm_init(struct kvm *kvm); void vmx_vm_destroy(struct kvm *kvm); int vmx_vcpu_precreate(struct kvm *kvm); @@ -133,8 +132,10 @@ void vmx_setup_mce(struct kvm_vcpu *vcpu); #ifdef CONFIG_INTEL_TDX_HOST int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops); +bool tdx_is_vm_type_supported(unsigned long type); #else static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return 0; } +static inline bool tdx_is_vm_type_supported(unsigned long type) { return false; } #endif #endif /* __KVM_X86_VMX_X86_OPS_H */ -- 2.25.1

From nobody Thu Apr 25 13:29:40 2024
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar
Subject: [PATCH v9 008/105] [MARKER] The start of TDX KVM patch series: TDX architectural definitions
Date: Fri, 30 Sep 2022 03:17:02 -0700
Message-Id:
Content-Type: text/plain; charset="utf-8"

From: Isaku Yamahata

This empty commit marks the start of the patch sub-series for the TDX architectural definitions.

Signed-off-by: Isaku Yamahata --- .../virt/kvm/intel-tdx-layer-status.rst | 29 +++++++++++++++++++ 1 file changed, 29 insertions(+) create mode 100644 Documentation/virt/kvm/intel-tdx-layer-status.rst diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentation/virt/kvm/intel-tdx-layer-status.rst new file mode 100644 index 000000000000..b7a14bc73853 --- /dev/null +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -0,0 +1,29 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=================================== +Intel Trust Domain Extensions (TDX) +=================================== + +Layer status +============ +What qemu can do +---------------- +- TDX VM TYPE is exposed to Qemu. +- Qemu can try to create a VM of the TDX VM type and then fails. + +Patch Layer status +------------------ + Patch layer Status +* TDX, VMX coexistence: Applied +* TDX architectural definitions: Applying +* TD VM creation/destruction: Not yet +* TD vcpu creation/destruction: Not yet +* TDX EPT violation: Not yet +* TD finalization: Not yet +* TD vcpu enter/exit: Not yet +* TD vcpu interrupts/exit/hypercall: Not yet + +* KVM MMU GPA shared bits: Not yet +* KVM TDP refactoring for TDX: Not yet +* KVM TDP MMU hooks: Not yet +* KVM TDP MMU MapGPA: Not yet -- 2.25.1

From nobody Thu Apr 25 13:29:40 2024
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , Sean Christopherson
Subject: [PATCH v9 009/105] KVM: TDX: Define TDX architectural definitions
Date: Fri, 30 Sep 2022 03:17:03 -0700
Message-Id: <16426bf0e0e2309d421f718289cf8efffabaecf8.1664530907.git.isaku.yamahata@intel.com>
Content-Type: text/plain; charset="utf-8"

From: Isaku Yamahata

Define the architectural definitions that KVM needs to issue the TDX SEAMCALLs. The structures and values here are architecturally defined in the ABI Reference chapter of the TDX module specification.

Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/vmx/tdx_arch.h | 166 ++++++++++++++++++++++++++++++++++++ 1 file changed, 166 insertions(+) create mode 100644 arch/x86/kvm/vmx/tdx_arch.h diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h new file mode 100644 index 000000000000..18604734fb14 --- /dev/null +++ b/arch/x86/kvm/vmx/tdx_arch.h @@ -0,0 +1,166 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* architectural constants/data definitions for TDX SEAMCALLs */ + +#ifndef __KVM_X86_TDX_ARCH_H +#define __KVM_X86_TDX_ARCH_H + +#include + +/* + * TDX SEAMCALL API function leaves + */ +#define TDH_VP_ENTER 0 +#define TDH_MNG_ADDCX 1 +#define TDH_MEM_PAGE_ADD 2 +#define TDH_MEM_SEPT_ADD 3 +#define TDH_VP_ADDCX 4 +#define TDH_MEM_PAGE_RELOCATE 5 +#define TDH_MEM_PAGE_AUG 6 +#define TDH_MEM_RANGE_BLOCK 7 +#define TDH_MNG_KEY_CONFIG 8 +#define TDH_MNG_CREATE 9 +#define TDH_VP_CREATE 10 +#define TDH_MNG_RD 11 +#define TDH_MR_EXTEND 16 +#define TDH_MR_FINALIZE 17 +#define TDH_VP_FLUSH 18 +#define TDH_MNG_VPFLUSHDONE 19 +#define TDH_MNG_KEY_FREEID 20 +#define TDH_MNG_INIT 21 +#define TDH_VP_INIT 22 +#define TDH_VP_RD 26 +#define TDH_MNG_KEY_RECLAIMID 27 +#define TDH_PHYMEM_PAGE_RECLAIM 28 +#define TDH_MEM_PAGE_REMOVE 29 +#define TDH_MEM_SEPT_REMOVE 30 +#define TDH_MEM_TRACK 38 +#define TDH_MEM_RANGE_UNBLOCK 39 +#define TDH_PHYMEM_CACHE_WB 40 +#define TDH_PHYMEM_PAGE_WBINVD 41 +#define TDH_VP_WR 43 +#define TDH_SYS_LP_SHUTDOWN 44 + +#define TDG_VP_VMCALL_GET_TD_VM_CALL_INFO 0x10000 +#define TDG_VP_VMCALL_MAP_GPA 0x10001 +#define TDG_VP_VMCALL_GET_QUOTE 0x10002 +#define TDG_VP_VMCALL_REPORT_FATAL_ERROR 0x10003 +#define TDG_VP_VMCALL_SETUP_EVENT_NOTIFY_INTERRUPT 0x10004 + +/* TDX control structure (TDR/TDCS/TDVPS) field access codes */ +#define TDX_NON_ARCH BIT_ULL(63) +#define TDX_CLASS_SHIFT 56 +#define TDX_FIELD_MASK GENMASK_ULL(31, 0) + +#define __BUILD_TDX_FIELD(non_arch, class, field) \ + (((non_arch) ?
TDX_NON_ARCH : 0) | \ + ((u64)(class) << TDX_CLASS_SHIFT) | \ + ((u64)(field) & TDX_FIELD_MASK)) + +#define BUILD_TDX_FIELD(class, field) \ + __BUILD_TDX_FIELD(false, (class), (field)) + +#define BUILD_TDX_FIELD_NON_ARCH(class, field) \ + __BUILD_TDX_FIELD(true, (class), (field)) + + +/* Class code for TD */ +#define TD_CLASS_EXECUTION_CONTROLS 17ULL + +/* Class code for TDVPS */ +#define TDVPS_CLASS_VMCS 0ULL +#define TDVPS_CLASS_GUEST_GPR 16ULL +#define TDVPS_CLASS_OTHER_GUEST 17ULL +#define TDVPS_CLASS_MANAGEMENT 32ULL + +enum tdx_tdcs_execution_control { + TD_TDCS_EXEC_TSC_OFFSET =3D 10, +}; + +/* @field is any of enum tdx_tdcs_execution_control */ +#define TDCS_EXEC(field) BUILD_TDX_FIELD(TD_CLASS_EXECUTION_CONTROLS, (fi= eld)) + +/* @field is the VMCS field encoding */ +#define TDVPS_VMCS(field) BUILD_TDX_FIELD(TDVPS_CLASS_VMCS, (field)) + +enum tdx_vcpu_guest_other_state { + TD_VCPU_STATE_DETAILS_NON_ARCH =3D 0x100, +}; + +union tdx_vcpu_state_details { + struct { + u64 vmxip : 1; + u64 reserved : 63; + }; + u64 full; +}; + +/* @field is any of enum tdx_guest_other_state */ +#define TDVPS_STATE(field) BUILD_TDX_FIELD(TDVPS_CLASS_OTHER_GUEST, (fiel= d)) +#define TDVPS_STATE_NON_ARCH(field) BUILD_TDX_FIELD_NON_ARCH(TDVPS_CLASS_O= THER_GUEST, (field)) + +/* Management class fields */ +enum tdx_vcpu_guest_management { + TD_VCPU_PEND_NMI =3D 11, +}; + +/* @field is any of enum tdx_vcpu_guest_management */ +#define TDVPS_MANAGEMENT(field) BUILD_TDX_FIELD(TDVPS_CLASS_MANAGEMENT, (= field)) + +#define TDX_EXTENDMR_CHUNKSIZE 256 + +struct tdx_cpuid_value { + u32 eax; + u32 ebx; + u32 ecx; + u32 edx; +} __packed; + +#define TDX_TD_ATTRIBUTE_DEBUG BIT_ULL(0) +#define TDX_TD_ATTRIBUTE_PKS BIT_ULL(30) +#define TDX_TD_ATTRIBUTE_KL BIT_ULL(31) +#define TDX_TD_ATTRIBUTE_PERFMON BIT_ULL(63) + +/* + * TD_PARAMS is provided as an input to TDH_MNG_INIT, the size of which is= 1024B. + */ +struct td_params { + u64 attributes; + u64 xfam; + u32 max_vcpus; + u32 reserved0; + + u64 eptp_controls; + u64 exec_controls; + u16 tsc_frequency; + u8 reserved1[38]; + + u64 mrconfigid[6]; + u64 mrowner[6]; + u64 mrownerconfig[6]; + u64 reserved2[4]; + + union { + struct tdx_cpuid_value cpuid_values[0]; + u8 reserved3[768]; + }; +} __packed __aligned(1024); + +/* + * Guest uses MAX_PA for GPAW when set. + * 0: GPA.SHARED bit is GPA[47] + * 1: GPA.SHARED bit is GPA[51] + */ +#define TDX_EXEC_CONTROL_MAX_GPAW BIT_ULL(0) + +/* + * TDX requires the frequency to be defined in units of 25MHz, which is the + * frequency of the core crystal clock on TDX-capable platforms, i.e. the = TDX + * module can only program frequencies that are multiples of 25MHz. The + * frequency must be between 100mhz and 10ghz (inclusive). 
+ */ +#define TDX_TSC_KHZ_TO_25MHZ(tsc_in_khz) ((tsc_in_khz) / (25 * 1000)) +#define TDX_TSC_25MHZ_TO_KHZ(tsc_in_25mhz) ((tsc_in_25mhz) * (25 * 1000)) +#define TDX_MIN_TSC_FREQUENCY_KHZ (100 * 1000) +#define TDX_MAX_TSC_FREQUENCY_KHZ (10 * 1000 * 1000) + +#endif /* __KVM_X86_TDX_ARCH_H */ --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 067E3C433FE for ; Fri, 30 Sep 2022 10:19:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231736AbiI3KTs (ORCPT ); Fri, 30 Sep 2022 06:19:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33560 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231305AbiI3KS5 (ORCPT ); Fri, 30 Sep 2022 06:18:57 -0400 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CF674166F02; Fri, 30 Sep 2022 03:18:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533136; x=1696069136; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=KEIAjHg+U4wWqFG4QgHIE8RVytQUcfrlVjc5d4iCVrU=; b=ngCuTqeFwTLU5vdA+nrEn35QaV9DSklH78Fjog3qJ0PdbBXv789JjRWK wuWKcuMZrHwP7av1A7Jcc7b7DwsJHwZH3mQg1z07FM7xFQHqhLGQNsB22 fah7KiYwuNm/3c69C4f2CS0JtsU045ZsVonDikRDqCRopzwZbJZwhnghj LvpoSWiUQTN5ruBTNazketyIsiY3wBGbKm29md/zHoKz+D1Ui63YXt31v ueOLfXbAKjTAJ0eWXl8RU+B5XLDNGuCu9LTvaEfIa2K8cg3h+hVViiUnl x4aLCUDBbsQNDVi5JJl/2NxTvVqXsh5ZDlRl4Tfze3z+VS2UjHuGzV/08 g==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="289320435" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="289320435" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:53 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807535" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807535" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:52 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , Sean Christopherson Subject: [PATCH v9 010/105] KVM: TDX: Add TDX "architectural" error codes Date: Fri, 30 Sep 2022 03:17:04 -0700 Message-Id: <474651ea784f029cf61a087e870dcc134b189f9e.1664530907.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Add error codes for the TDX SEAMCALLs, both for the TDX VMM side (TDH SEAMCALLs) and for the TDX guest side (TDG.VP.VMCALL). KVM issues the TDX SEAMCALLs and checks their error codes. KVM also handles hypercalls from the TDX guest and may return an error, so error codes for the TDX guest side are needed as well. TDX SEAMCALL uses bits 31:0 to return more information, so these error codes will only exactly match RAX[63:32]. Error codes for TDG.VP.VMCALL are defined by the TDX Guest-Host-Communication Interface spec.
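Callers are therefore expected to mask off the low 32 bits before comparing a SEAMCALL return value against these codes. A minimal sketch of such a check, using only the mask and codes defined in this patch (illustration, not part of the header):

	/* Compare only the status part of RAX; bits 31:0 hold operand info. */
	static inline bool tdx_operand_busy(u64 err)
	{
		return (err & TDX_SEAMCALL_STATUS_MASK) == TDX_OPERAND_BUSY;
	}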
Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/vmx/tdx_errno.h | 37 ++++++++++++++++++++++++++++++++++++ 1 file changed, 37 insertions(+) create mode 100644 arch/x86/kvm/vmx/tdx_errno.h diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h new file mode 100644 index 000000000000..f2b1c4cc516f --- /dev/null +++ b/arch/x86/kvm/vmx/tdx_errno.h @@ -0,0 +1,37 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* architectural status code for SEAMCALL */ + +#ifndef __KVM_X86_TDX_ERRNO_H +#define __KVM_X86_TDX_ERRNO_H + +#define TDX_SEAMCALL_STATUS_MASK 0xFFFFFFFF00000000ULL + +/* + * TDX SEAMCALL Status Codes (returned in RAX) + */ +#define TDX_SUCCESS 0x0000000000000000ULL +#define TDX_NON_RECOVERABLE_VCPU 0x4000000100000000ULL +#define TDX_INTERRUPTED_RESUMABLE 0x8000000300000000ULL +#define TDX_OPERAND_BUSY 0x8000020000000000ULL +#define TDX_VCPU_NOT_ASSOCIATED 0x8000070200000000ULL +#define TDX_KEY_GENERATION_FAILED 0x8000080000000000ULL +#define TDX_KEY_STATE_INCORRECT 0xC000081100000000ULL +#define TDX_KEY_CONFIGURED 0x0000081500000000ULL +#define TDX_NO_HKID_READY_TO_WBCACHE 0x0000082100000000ULL +#define TDX_EPT_WALK_FAILED 0xC0000B0000000000ULL + +/* + * TDG.VP.VMCALL Status Codes (returned in R10) + */ +#define TDG_VP_VMCALL_SUCCESS 0x0000000000000000ULL +#define TDG_VP_VMCALL_RETRY 0x0000000000000001ULL +#define TDG_VP_VMCALL_INVALID_OPERAND 0x8000000000000000ULL +#define TDG_VP_VMCALL_TDREPORT_FAILED 0x8000000000000001ULL + +/* + * TDX module operand ID, appears in 31:0 part of error code as + * detail information + */ +#define TDX_OPERAND_ID_SEPT 0x92 + +#endif /* __KVM_X86_TDX_ERRNO_H */ --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A61ACC433F5 for ; Fri, 30 Sep 2022 10:21:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231952AbiI3KVJ (ORCPT ); Fri, 30 Sep 2022 06:21:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33562 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231389AbiI3KTA (ORCPT ); Fri, 30 Sep 2022 06:19:00 -0400 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 544621664AC; Fri, 30 Sep 2022 03:18:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533138; x=1696069138; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=8GfDMTsHwwGryg292MoTgp8xb4T2/mTWqYyqehU6Iqs=; b=NrvIgos6iRa2Gyd/bGunZm7m2WKPzRAp7PyRiYwSb+Uieh1ReNv1mmjm zzHlHZAljQIdhLc2FKtRj/c5qmyKdkJ+HlHzSdo4aquQB4Eu8s+E0UaDC GpqvD5953R8BRifTBgZ4kRJY77W/QIGiLiS/1NIpYk3F9IbWPpnpqDXLg 19b4mpAdUIROgoIMcuU+k3shm+laa1Tro0DOzflh0Tg5aclzhDksnMf12 RJ6BUKvcV4tUfsX4gIyXx7sISTTmqHqZbUEXkCFTHC+8NUfeIqPSFyDIP HxlVmX0vyhlZ9KfLVD5d1qF9cWaFa/bfTJRhDNmUkIko11i9Tc+wz69+D w==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="366207481" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="366207481" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:53 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807538" X-IronPort-AV: 
E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807538" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:53 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , Sean Christopherson Subject: [PATCH v9 011/105] KVM: TDX: Add C wrapper functions for SEAMCALLs to the TDX module Date: Fri, 30 Sep 2022 03:17:05 -0700 Message-Id: <168324cf75e978e86fc987e980e48fcb0a9cefb5.1664530907.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata A VMM interacts with the TDX module using a new instruction (SEAMCALL). A TDX VMM uses SEAMCALLs where a VMX VMM would have directly interacted with VMX instructions. For instance, a TDX VMM does not have full access to the VM control structure corresponding to the VMX VMCS. Instead, the VMM induces the TDX module to act on its behalf via SEAMCALLs. Export __seamcall and define C wrapper functions for SEAMCALLs for readability. Some SEAMCALL APIs donate pages to the TDX module or the guest TD. Those pages are encrypted with a TDX private host key id set in the high bits of the physical address. If modified cache lines may exist for these pages, flush them to memory with clflush_cache_range(). Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/tdx.h | 2 + arch/x86/kvm/vmx/tdx_ops.h | 185 +++++++++++++++++++++++++++++++ arch/x86/virt/vmx/tdx/seamcall.S | 2 + 3 files changed, 189 insertions(+) create mode 100644 arch/x86/kvm/vmx/tdx_ops.h diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index dfea0dd71bc1..c887618e3cec 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -144,6 +144,8 @@ struct tdsysinfo_struct { bool platform_tdx_enabled(void); int tdx_init(void); const struct tdsysinfo_struct *tdx_get_sysinfo(void); +u64 __seamcall(u64 op, u64 rcx, u64 rdx, u64 r8, u64 r9, + struct tdx_module_output *out); #else /* !CONFIG_INTEL_TDX_HOST */ static inline bool platform_tdx_enabled(void) { return false; } static inline int tdx_init(void) { return -ENODEV; } diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h new file mode 100644 index 000000000000..85adbf49c277 --- /dev/null +++ b/arch/x86/kvm/vmx/tdx_ops.h @@ -0,0 +1,185 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* constants/data definitions for TDX SEAMCALLs */ + +#ifndef __KVM_X86_TDX_OPS_H +#define __KVM_X86_TDX_OPS_H + +#include + +#include +#include +#include + +#include "tdx_errno.h" +#include "tdx_arch.h" + +#ifdef CONFIG_INTEL_TDX_HOST + +static inline u64 tdh_mng_addcx(hpa_t tdr, hpa_t addr) +{ + clflush_cache_range(__va(addr), PAGE_SIZE); + return __seamcall(TDH_MNG_ADDCX, addr, tdr, 0, 0, NULL); +} + +static inline u64 tdh_mem_page_add(hpa_t tdr, gpa_t gpa, hpa_t hpa, hpa_t = source, + struct tdx_module_output *out) +{ + clflush_cache_range(__va(hpa), PAGE_SIZE); + return __seamcall(TDH_MEM_PAGE_ADD, gpa, tdr, hpa, source, out); +} + +static inline u64 tdh_mem_sept_add(hpa_t tdr, gpa_t gpa, int level, hpa_t = page, + struct tdx_module_output *out) +{ + clflush_cache_range(__va(page), PAGE_SIZE); + return __seamcall(TDH_MEM_SEPT_ADD,
gpa | level, tdr, page, 0, out); +} + +static inline u64 tdh_mem_sept_remove(hpa_t tdr, gpa_t gpa, int level, + struct tdx_module_output *out) +{ + return __seamcall(TDH_MEM_SEPT_REMOVE, gpa | level, tdr, 0, 0, out); +} + +static inline u64 tdh_vp_addcx(hpa_t tdvpr, hpa_t addr) +{ + clflush_cache_range(__va(addr), PAGE_SIZE); + return __seamcall(TDH_VP_ADDCX, addr, tdvpr, 0, 0, NULL); +} + +static inline u64 tdh_mem_page_relocate(hpa_t tdr, gpa_t gpa, hpa_t hpa, + struct tdx_module_output *out) +{ + clflush_cache_range(__va(hpa), PAGE_SIZE); + return __seamcall(TDH_MEM_PAGE_RELOCATE, gpa, tdr, hpa, 0, out); +} + +static inline u64 tdh_mem_page_aug(hpa_t tdr, gpa_t gpa, hpa_t hpa, + struct tdx_module_output *out) +{ + clflush_cache_range(__va(hpa), PAGE_SIZE); + return __seamcall(TDH_MEM_PAGE_AUG, gpa, tdr, hpa, 0, out); +} + +static inline u64 tdh_mem_range_block(hpa_t tdr, gpa_t gpa, int level, + struct tdx_module_output *out) +{ + return __seamcall(TDH_MEM_RANGE_BLOCK, gpa | level, tdr, 0, 0, out); +} + +static inline u64 tdh_mng_key_config(hpa_t tdr) +{ + return __seamcall(TDH_MNG_KEY_CONFIG, tdr, 0, 0, 0, NULL); +} + +static inline u64 tdh_mng_create(hpa_t tdr, int hkid) +{ + clflush_cache_range(__va(tdr), PAGE_SIZE); + return __seamcall(TDH_MNG_CREATE, tdr, hkid, 0, 0, NULL); +} + +static inline u64 tdh_vp_create(hpa_t tdr, hpa_t tdvpr) +{ + clflush_cache_range(__va(tdvpr), PAGE_SIZE); + return __seamcall(TDH_VP_CREATE, tdvpr, tdr, 0, 0, NULL); +} + +static inline u64 tdh_mng_rd(hpa_t tdr, u64 field, struct tdx_module_outpu= t *out) +{ + return __seamcall(TDH_MNG_RD, tdr, field, 0, 0, out); +} + +static inline u64 tdh_mr_extend(hpa_t tdr, gpa_t gpa, + struct tdx_module_output *out) +{ + return __seamcall(TDH_MR_EXTEND, gpa, tdr, 0, 0, out); +} + +static inline u64 tdh_mr_finalize(hpa_t tdr) +{ + return __seamcall(TDH_MR_FINALIZE, tdr, 0, 0, 0, NULL); +} + +static inline u64 tdh_vp_flush(hpa_t tdvpr) +{ + return __seamcall(TDH_VP_FLUSH, tdvpr, 0, 0, 0, NULL); +} + +static inline u64 tdh_mng_vpflushdone(hpa_t tdr) +{ + return __seamcall(TDH_MNG_VPFLUSHDONE, tdr, 0, 0, 0, NULL); +} + +static inline u64 tdh_mng_key_freeid(hpa_t tdr) +{ + return __seamcall(TDH_MNG_KEY_FREEID, tdr, 0, 0, 0, NULL); +} + +static inline u64 tdh_mng_init(hpa_t tdr, hpa_t td_params, + struct tdx_module_output *out) +{ + return __seamcall(TDH_MNG_INIT, tdr, td_params, 0, 0, out); +} + +static inline u64 tdh_vp_init(hpa_t tdvpr, u64 rcx) +{ + return __seamcall(TDH_VP_INIT, tdvpr, rcx, 0, 0, NULL); +} + +static inline u64 tdh_vp_rd(hpa_t tdvpr, u64 field, + struct tdx_module_output *out) +{ + return __seamcall(TDH_VP_RD, tdvpr, field, 0, 0, out); +} + +static inline u64 tdh_mng_key_reclaimid(hpa_t tdr) +{ + return __seamcall(TDH_MNG_KEY_RECLAIMID, tdr, 0, 0, 0, NULL); +} + +static inline u64 tdh_phymem_page_reclaim(hpa_t page, + struct tdx_module_output *out) +{ + return __seamcall(TDH_PHYMEM_PAGE_RECLAIM, page, 0, 0, 0, out); +} + +static inline u64 tdh_mem_page_remove(hpa_t tdr, gpa_t gpa, int level, + struct tdx_module_output *out) +{ + return __seamcall(TDH_MEM_PAGE_REMOVE, gpa | level, tdr, 0, 0, out); +} + +static inline u64 tdh_sys_lp_shutdown(void) +{ + return __seamcall(TDH_SYS_LP_SHUTDOWN, 0, 0, 0, 0, NULL); +} + +static inline u64 tdh_mem_track(hpa_t tdr) +{ + return __seamcall(TDH_MEM_TRACK, tdr, 0, 0, 0, NULL); +} + +static inline u64 tdh_mem_range_unblock(hpa_t tdr, gpa_t gpa, int level, + struct tdx_module_output *out) +{ + return __seamcall(TDH_MEM_RANGE_UNBLOCK, gpa | level, tdr, 0, 0, out); +} + 
+static inline u64 tdh_phymem_cache_wb(bool resume) +{ + return __seamcall(TDH_PHYMEM_CACHE_WB, resume ? 1 : 0, 0, 0, 0, NULL); +} + +static inline u64 tdh_phymem_page_wbinvd(hpa_t page) +{ + return __seamcall(TDH_PHYMEM_PAGE_WBINVD, page, 0, 0, 0, NULL); +} + +static inline u64 tdh_vp_wr(hpa_t tdvpr, u64 field, u64 val, u64 mask, + struct tdx_module_output *out) +{ + return __seamcall(TDH_VP_WR, tdvpr, field, val, mask, out); +} +#endif /* CONFIG_INTEL_TDX_HOST */ + +#endif /* __KVM_X86_TDX_OPS_H */ diff --git a/arch/x86/virt/vmx/tdx/seamcall.S b/arch/x86/virt/vmx/tdx/seamc= all.S index f322427e48c3..aced0ed9b76a 100644 --- a/arch/x86/virt/vmx/tdx/seamcall.S +++ b/arch/x86/virt/vmx/tdx/seamcall.S @@ -1,5 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0 */ #include +#include #include =20 #include "tdxcall.S" @@ -50,3 +51,4 @@ SYM_FUNC_START(__seamcall) FRAME_END RET SYM_FUNC_END(__seamcall) +EXPORT_SYMBOL_GPL(__seamcall) --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C4AC6C433FE for ; Fri, 30 Sep 2022 10:20:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231807AbiI3KU1 (ORCPT ); Fri, 30 Sep 2022 06:20:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33602 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231329AbiI3KS5 (ORCPT ); Fri, 30 Sep 2022 06:18:57 -0400 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6EE30152653; Fri, 30 Sep 2022 03:18:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533136; x=1696069136; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=rRN5kbq0TqI2b4uCt5j6yH0GqnOcoNEW+tyjQehDDAc=; b=QkuGWqrLPeY0nH6J0d/2YizzTcL8F/3BCloEBSNcWBGNCYuR3Q70it7b 2ZsssJsx+Q19bb3sOa9KDyVYeoq4wJbaAaxUhLPMaIBkRlWxpTQVuve6e TS+4Yu37ta7BQiNDiHodpuiOj+DoR+m4glvNt0l2AeTnn+kmimnNYlDtL 37EcjZZDSgVeVvSfVPQHDoCm/Rv9o5mG17yJTQo0+zMmOeqtUQzl+E/HI 1s2GlPdVKgG0v4dM/s5D9mnDZHjVGEm2GPzEEpRu6WfsEYbijvBIXnSsc 9EYR/2BK+ZZsiGL1eZNo3DJIg+lakqhZPNaeuxphUKwyJbeB73hc+j6lA Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="289320436" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="289320436" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:53 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807542" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807542" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:53 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 012/105] KVM: TDX: Add helper functions to print TDX SEAMCALL error Date: Fri, 30 Sep 2022 03:17:06 -0700 Message-Id: <82b961c8a9a390188f11b50ce33c62f5d5e6822e.1664530907.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 
quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Add helper functions to print out errors from the TDX module in a uniform manner. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/Makefile | 2 +- arch/x86/kvm/vmx/tdx_error.c | 21 +++++++++++++++++++++ arch/x86/kvm/vmx/tdx_ops.h | 3 +++ 3 files changed, 25 insertions(+), 1 deletion(-) create mode 100644 arch/x86/kvm/vmx/tdx_error.c diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index e2c05195cb95..f1ad445df505 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -24,7 +24,7 @@ kvm-$(CONFIG_KVM_XEN) +=3D xen.o kvm-intel-y +=3D vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \ vmx/evmcs.o vmx/nested.o vmx/posted_intr.o vmx/main.o kvm-intel-$(CONFIG_X86_SGX_KVM) +=3D vmx/sgx.o -kvm-intel-$(CONFIG_INTEL_TDX_HOST) +=3D vmx/tdx.o +kvm-intel-$(CONFIG_INTEL_TDX_HOST) +=3D vmx/tdx.o vmx/tdx_error.o =20 kvm-amd-y +=3D svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o = svm/sev.o =20 diff --git a/arch/x86/kvm/vmx/tdx_error.c b/arch/x86/kvm/vmx/tdx_error.c new file mode 100644 index 000000000000..574b72d34e1e --- /dev/null +++ b/arch/x86/kvm/vmx/tdx_error.c @@ -0,0 +1,21 @@ +// SPDX-License-Identifier: GPL-2.0 +/* functions to record TDX SEAMCALL error */ + +#include +#include + +#include "tdx_ops.h" + +void pr_tdx_error(u64 op, u64 error_code, const struct tdx_module_output *= out) +{ + if (!out) { + pr_err_ratelimited("SEAMCALL[%lld] failed: 0x%llx\n", + op, error_code); + return; + } + + pr_err_ratelimited("SEAMCALL[%lld] failed: 0x%llx RCX 0x%llx, RDX 0x%llx," + " R8 0x%llx, R9 0x%llx, R10 0x%llx, R11 0x%llx\n", + op, error_code, + out->rcx, out->rdx, out->r8, out->r9, out->r10, out->r11); +} diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h index 85adbf49c277..8cc2f01c509b 100644 --- a/arch/x86/kvm/vmx/tdx_ops.h +++ b/arch/x86/kvm/vmx/tdx_ops.h @@ -9,12 +9,15 @@ #include #include #include +#include =20 #include "tdx_errno.h" #include "tdx_arch.h" =20 #ifdef CONFIG_INTEL_TDX_HOST =20 +void pr_tdx_error(u64 op, u64 error_code, const struct tdx_module_output *= out); + static inline u64 tdh_mng_addcx(hpa_t tdr, hpa_t addr) { clflush_cache_range(__va(addr), PAGE_SIZE); --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B1B2C4332F for ; Fri, 30 Sep 2022 10:19:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231404AbiI3KTA (ORCPT ); Fri, 30 Sep 2022 06:19:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33528 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231246AbiI3KSz (ORCPT ); Fri, 30 Sep 2022 06:18:55 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DFB5A15ED22; Fri, 30 Sep 2022 03:18:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533133; x=1696069133; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=wbnROJIV0sPi9VRc1Kym0LCeIa3DweI4YQjfi8+X354=; b=A3ymsemWVvTrFIRmc9oGI7Yfy7sde4UK8nPLbQvfjaY3dJMfM6gS2BYm +GucV7q/UdyP+84vWxzc2JyB2xTm21kQbjRNsTBw6aaj9aW4ajHM0pdxb 
eOcoW5BRFysvhDw2lJoUw4KziBM7CrXk6F9lgQSURWeogRmRzs0V7Db3J w5XoOvp6EyK8TNHnfHFdqn379gjHEREnfqgZhDltucQwc3gy8bH7/YuZL TF6PCxFr2R67KA0Hd54y9nVbsZzdQUUAEgG4MXIlawbc2pApCEa+ROXqV eHGB9cOXe1BypfSrkhJrdalGBt/Rqbgo3Bh77l4xc9qEnULRQjrWqON15 Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870062" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870062" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:53 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807547" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807547" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:53 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 013/105] [MARKER] The start of TDX KVM patch series: TD VM creation/destruction Date: Fri, 30 Sep 2022 03:17:07 -0700 Message-Id: <6f518b2e03b8b8fb1d84fe29096b240b32b65064.1664530907.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This empty commit is to mark the start of patch series of TD VM creation/destruction. Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/intel-tdx-layer-status.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentat= ion/virt/kvm/intel-tdx-layer-status.rst index b7a14bc73853..5e0deaebf843 100644 --- a/Documentation/virt/kvm/intel-tdx-layer-status.rst +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -15,8 +15,8 @@ Patch Layer status ------------------ Patch layer Status * TDX, VMX coexistence: Applied -* TDX architectural definitions: Applying -* TD VM creation/destruction: Not yet +* TDX architectural definitions: Applied +* TD VM creation/destruction: Applying * TD vcpu creation/destruction: Not yet * TDX EPT violation: Not yet * TD finalization: Not yet --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8BB65C433F5 for ; Fri, 30 Sep 2022 10:19:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231653AbiI3KTS (ORCPT ); Fri, 30 Sep 2022 06:19:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33546 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231263AbiI3KSz (ORCPT ); Fri, 30 Sep 2022 06:18:55 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C725315348B; Fri, 30 Sep 2022 03:18:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533134; x=1696069134; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=5k5aCmwyIFSsGRCiOhHxI/sKSJLDmnsaLGB+lq67gZM=; 
b=TbOlp1Ww9R/Qy7aGsQGxCbrJZV9leZ0zczn/0sehZFZlE5m/ZjOJKaG5 x3lTjfCAU3beX3BsD2TvqCeuQqmkXE3yofiYWjILrLpBHts0sFKTWDdEl GvQl5479fSzesBYRcDaGVPOwGW/vbPwUwPKJORC3b8966j7FicPtAwcLr Vuar9DNNqsqm0PnAjqg2nUVtm52kpgXotQ2+9qskAbMDEQTKrJuDQQYoP vfk0r9neCg7gJe33FFdjKfy9UuNxz4zWSir51X8LjylzsMVUQH23Zag57 Sd42QR6XIVmilxCxxLIUZeHvnZMHrGx+DY7s37oZc/FDfxpSf4m8giMDz A==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870063" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870063" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:53 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807550" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807550" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:53 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , Sean Christopherson Subject: [PATCH v9 014/105] KVM: TDX: Stub in tdx.h with structs, accessors, and VMCS helpers Date: Fri, 30 Sep 2022 03:17:08 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Stub in kvm_tdx, vcpu_tdx, and their various accessors. TDX defines SEAMCALL APIs to access TDX control structures corresponding to the VMX VMCS. Introduce helper accessors to hide its SEAMCALL ABI details. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/tdx.h | 118 ++++++++++++++++++++++++++++++++++++++++- 1 file changed, 116 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 473013265bd8..98999bf3f188 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -3,14 +3,27 @@ #define __KVM_X86_TDX_H =20 #ifdef CONFIG_INTEL_TDX_HOST + +#include "tdx_ops.h" + +struct tdx_td_page { + unsigned long va; + hpa_t pa; + bool added; +}; + struct kvm_tdx { struct kvm kvm; - /* TDX specific members follow. */ + + struct tdx_td_page tdr; + struct tdx_td_page *tdcs; }; =20 struct vcpu_tdx { struct kvm_vcpu vcpu; - /* TDX specific members follow. */ + + struct tdx_td_page tdvpr; + struct tdx_td_page *tdvpx; }; =20 static inline bool is_td(struct kvm *kvm) @@ -32,6 +45,107 @@ static inline struct vcpu_tdx *to_tdx(struct kvm_vcpu *= vcpu) { return container_of(vcpu, struct vcpu_tdx, vcpu); } + +static __always_inline void tdvps_vmcs_check(u32 field, u8 bits) +{ +#define VMCS_ENC_ACCESS_TYPE_MASK 0x1UL +#define VMCS_ENC_ACCESS_TYPE_FULL 0x0UL +#define VMCS_ENC_ACCESS_TYPE_HIGH 0x1UL +#define VMCS_ENC_ACCESS_TYPE(field) ((field) & VMCS_ENC_ACCESS_TYPE_MASK) + + /* TDX is 64bit only. HIGH field isn't supported. 
*/ + BUILD_BUG_ON_MSG(__builtin_constant_p(field) && + VMCS_ENC_ACCESS_TYPE(field) =3D=3D VMCS_ENC_ACCESS_TYPE_HIGH, + "Read/Write to TD VMCS *_HIGH fields not supported"); + + BUILD_BUG_ON(bits !=3D 16 && bits !=3D 32 && bits !=3D 64); + +#define VMCS_ENC_WIDTH_MASK GENMASK(14, 13) +#define VMCS_ENC_WIDTH_16BIT (0UL << 13) +#define VMCS_ENC_WIDTH_64BIT (1UL << 13) +#define VMCS_ENC_WIDTH_32BIT (2UL << 13) +#define VMCS_ENC_WIDTH_NATURAL (3UL << 13) +#define VMCS_ENC_WIDTH(field) ((field) & VMCS_ENC_WIDTH_MASK) + + /* TDX is 64bit only. i.e. natural width =3D 64bit. */ + BUILD_BUG_ON_MSG(bits !=3D 64 && __builtin_constant_p(field) && + (VMCS_ENC_WIDTH(field) =3D=3D VMCS_ENC_WIDTH_64BIT || + VMCS_ENC_WIDTH(field) =3D=3D VMCS_ENC_WIDTH_NATURAL), + "Invalid TD VMCS access for 64-bit field"); + BUILD_BUG_ON_MSG(bits !=3D 32 && __builtin_constant_p(field) && + VMCS_ENC_WIDTH(field) =3D=3D VMCS_ENC_WIDTH_32BIT, + "Invalid TD VMCS access for 32-bit field"); + BUILD_BUG_ON_MSG(bits !=3D 16 && __builtin_constant_p(field) && + VMCS_ENC_WIDTH(field) =3D=3D VMCS_ENC_WIDTH_16BIT, + "Invalid TD VMCS access for 16-bit field"); +} + +static __always_inline void tdvps_state_non_arch_check(u64 field, u8 bits)= {} +static __always_inline void tdvps_management_check(u64 field, u8 bits) {} + +#define TDX_BUILD_TDVPS_ACCESSORS(bits, uclass, lclass) \ +static __always_inline u##bits td_##lclass##_read##bits(struct vcpu_tdx *t= dx, \ + u32 field) \ +{ \ + struct tdx_module_output out; \ + u64 err; \ + \ + tdvps_##lclass##_check(field, bits); \ + err =3D tdh_vp_rd(tdx->tdvpr.pa, TDVPS_##uclass(field), &out); \ + if (unlikely(err)) { \ + pr_err("TDH_VP_RD["#uclass".0x%x] failed: 0x%llx\n", \ + field, err); \ + return 0; \ + } \ + return (u##bits)out.r8; \ +} \ +static __always_inline void td_##lclass##_write##bits(struct vcpu_tdx *tdx= , \ + u32 field, u##bits val) \ +{ \ + struct tdx_module_output out; \ + u64 err; \ + \ + tdvps_##lclass##_check(field, bits); \ + err =3D tdh_vp_wr(tdx->tdvpr.pa, TDVPS_##uclass(field), val, \ + GENMASK_ULL(bits - 1, 0), &out); \ + if (unlikely(err)) \ + pr_err("TDH_VP_WR["#uclass".0x%x] =3D 0x%llx failed: 0x%llx\n", \ + field, (u64)val, err); \ +} \ +static __always_inline void td_##lclass##_setbit##bits(struct vcpu_tdx *td= x, \ + u32 field, u64 bit) \ +{ \ + struct tdx_module_output out; \ + u64 err; \ + \ + tdvps_##lclass##_check(field, bits); \ + err =3D tdh_vp_wr(tdx->tdvpr.pa, TDVPS_##uclass(field), bit, bit, \ + &out); \ + if (unlikely(err)) \ + pr_err("TDH_VP_WR["#uclass".0x%x] |=3D 0x%llx failed: 0x%llx\n", \ + field, bit, err); \ +} \ +static __always_inline void td_##lclass##_clearbit##bits(struct vcpu_tdx *= tdx, \ + u32 field, u64 bit) \ +{ \ + struct tdx_module_output out; \ + u64 err; \ + \ + tdvps_##lclass##_check(field, bits); \ + err =3D tdh_vp_wr(tdx->tdvpr.pa, TDVPS_##uclass(field), 0, bit, \ + &out); \ + if (unlikely(err)) \ + pr_err("TDH_VP_WR["#uclass".0x%x] &=3D ~0x%llx failed: 0x%llx\n", \ + field, bit, err); \ +} + +TDX_BUILD_TDVPS_ACCESSORS(16, VMCS, vmcs); +TDX_BUILD_TDVPS_ACCESSORS(32, VMCS, vmcs); +TDX_BUILD_TDVPS_ACCESSORS(64, VMCS, vmcs); + +TDX_BUILD_TDVPS_ACCESSORS(64, STATE_NON_ARCH, state_non_arch); +TDX_BUILD_TDVPS_ACCESSORS(8, MANAGEMENT, management); + #else struct kvm_tdx { struct kvm kvm; --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org 
(Postfix) with ESMTP id 83164C433F5 for ; Fri, 30 Sep 2022 10:19:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231676AbiI3KT2 (ORCPT ); Fri, 30 Sep 2022 06:19:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33562 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229875AbiI3KS4 (ORCPT ); Fri, 30 Sep 2022 06:18:56 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2479D15ED3C; Fri, 30 Sep 2022 03:18:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533135; x=1696069135; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=CyEf0ivv3u22DSevTnSw0YDP9x61vVNbcPFc3xYqGIU=; b=QtA6+Ryw44jpj5CuEkVvLv40SuMRcoeHG7wMWTbezpaiH1R9pdoEoQwd NZKJDAi9FZc/SxM98zzYk8qbqfi6iOIBan6fTOR4w03wvhKoOB8WM3HO2 Q12/Mh/mPtZombBeDBgytthe7QQqeEROM2Bcoh6bSiIgod8XpeZYT2wBD oIxsZUhr/OEyjtfX3n5kozM6BcsJpMJ1FLH9GP35CAGqB3+x3bogmO8Zv /1GtzFW+XriLgEb9fTEBvRuvnlyvhPHG+uJSTtyAlaM0PbLeefCH0hgF7 jtCLLCZUyZpkEfXxINvORpORPCLLquglxUfhMb7kgJxlM9joutAsKFuTt Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870064" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870064" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:53 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807553" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807553" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:53 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 015/105] x86/cpu: Add helper functions to allocate/free TDX private host key id Date: Fri, 30 Sep 2022 03:17:09 -0700 Message-Id: <33de3a9482c64959d4cad159d8688859cd3e518c.1664530907.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata A TDX private host key id is assigned to each guest TD. The memory controller encrypts guest TD memory with the assigned TDX private host key id (HKID). Add helper functions to allocate/free TDX private host key ids so that TDX KVM can manage them. Also export the global TDX private host key id that is used to encrypt the TDX module, its memory, and some dynamic data (TDR). When the VMM releases an encrypted page for reuse, the page needs to be flushed with the host key id it was encrypted with. The VMM needs the global TDX private host key id to flush pages that the TDX module accesses with that key id.
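A minimal usage sketch of the new helpers (illustration only; the surrounding flow is hypothetical):

	/* Hypothetical caller: take a private HKID for a new TD, release it on teardown. */
	int hkid = tdx_keyid_alloc();
	if (hkid < 0)
		return hkid;	/* no free key id, or TDX is uninitialized */
	/* ... create the TD and program the memory controller with hkid ... */
	tdx_keyid_free(hkid);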
Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/tdx.h | 12 ++++++++++++ arch/x86/virt/vmx/tdx/tdx.c | 28 +++++++++++++++++++++++++++- 2 files changed, 39 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index c887618e3cec..a32e8881e758 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -144,6 +144,16 @@ struct tdsysinfo_struct { bool platform_tdx_enabled(void); int tdx_init(void); const struct tdsysinfo_struct *tdx_get_sysinfo(void); +/* + * Key id globally used by TDX module: TDX module maps TDR with this TDX g= lobal + * key id. TDR includes key id assigned to the TD. Then TDX module maps = other + * TD-related pages with the assigned key id. TDR requires this TDX globa= l key + * id for cache flush unlike other TD-related pages. + */ +extern u32 tdx_global_keyid __read_mostly; +int tdx_keyid_alloc(void); +void tdx_keyid_free(int keyid); + u64 __seamcall(u64 op, u64 rcx, u64 rdx, u64 r8, u64 r9, struct tdx_module_output *out); #else /* !CONFIG_INTEL_TDX_HOST */ @@ -151,6 +161,8 @@ static inline bool platform_tdx_enabled(void) { return = false; } static inline int tdx_init(void) { return -ENODEV; } struct tdsysinfo_struct; static inline const struct tdsysinfo_struct *tdx_get_sysinfo(void) { retur= n NULL; } +static inline int tdx_keyid_alloc(void) { return -EOPNOTSUPP; } +static inline void tdx_keyid_free(int keyid) { } #endif /* CONFIG_INTEL_TDX_HOST */ =20 #endif /* !__ASSEMBLY__ */ diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 4054a917ca97..391091427ed4 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -56,7 +56,8 @@ static struct cmr_info tdx_cmr_array[MAX_CMRS] __aligned(= CMR_INFO_ARRAY_ALIGNMEN static int tdx_cmr_num; =20 /* TDX module global KeyID. Used in TDH.SYS.CONFIG ABI. */ -static u32 tdx_global_keyid; +u32 tdx_global_keyid __read_mostly; +EXPORT_SYMBOL_GPL(tdx_global_keyid); =20 /* Detect whether CPU supports SEAM */ static int detect_seam(void) @@ -80,6 +81,31 @@ static int detect_seam(void) return 0; } =20 +/* TDX KeyID pool */ +static DEFINE_IDA(tdx_keyid_pool); + +int tdx_keyid_alloc(void) +{ + if (WARN_ON_ONCE(!tdx_keyid_start || !tdx_keyid_num)) + return -EINVAL; + + /* The first keyID is reserved for the global key. */ + return ida_alloc_range(&tdx_keyid_pool, tdx_keyid_start + 1, + tdx_keyid_start + tdx_keyid_num - 1, + GFP_KERNEL); +} +EXPORT_SYMBOL_GPL(tdx_keyid_alloc); + +void tdx_keyid_free(int keyid) +{ + /* keyid =3D 0 is reserved. 
*/ + if (keyid <=3D 0) + return; + + ida_free(&tdx_keyid_pool, keyid); +} +EXPORT_SYMBOL_GPL(tdx_keyid_free); + static int detect_tdx_keyids(void) { u64 keyid_part; --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 83931C4332F for ; Fri, 30 Sep 2022 10:19:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231755AbiI3KTx (ORCPT ); Fri, 30 Sep 2022 06:19:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33566 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230344AbiI3KS5 (ORCPT ); Fri, 30 Sep 2022 06:18:57 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CF4D5163B60; Fri, 30 Sep 2022 03:18:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533135; x=1696069135; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=VqVQRLI+FFCzT6RTshXMRnC6EZeRpTOcBecAEo2Z4Bw=; b=F6eSBIykiS7qXEqn53RuXXFsgoPn5Uf+LNYuk90e4ng7P6kKR1Ccrm55 BivYx03ZZLtW8CF4tdDULoV7juguf+vw1KSc+6WGN2CXowZDGMWS0gRrd jnC604NybYW1z6YnDU5sKIzqZuVHp5trOn6y0iQTw0peXMgORk6OSjVPS Rk0BLeALnW0Z4d/lYKkKU+C7AEJWu9UHJvVxe2SUjAIRALMTpMD65mnkO MqGOBYFLU6nGoal5meUrADk6ulcy2rW96fSNmX63kLXKPlxV/Es62BTkR 34XNs/IBwcv68x/y8yedwnXKpkzSjzAqKa90qKkG2eCB2VlxiR5w2lHMl A==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870066" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870066" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:54 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807556" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807556" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:53 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , Sean Christopherson , Kai Huang Subject: [PATCH v9 016/105] KVM: TDX: create/destroy VM structure Date: Fri, 30 Sep 2022 03:17:10 -0700 Message-Id: <07bf749357bbf1acd20b09d7ab1fac940082632c.1664530907.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson As the first step to create a TDX guest, create/destroy the VM structure. Assign a TDX private Host Key ID (HKID) to the TDX guest for memory encryption and allocate extra pages for the TDX guest. On destruction, free the allocated pages and the HKID. Before tearing down private page tables, TDX requires some resources of the guest TD to be destroyed (i.e. the keyID must have been reclaimed, etc.). Add a flush_shadow_all_private callback, invoked before tearing down private page tables, for this purpose.
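Combined with the vm_free hook described next, the intended teardown ordering is roughly the following (condensed illustration of this patch's x86.c changes, not new code):

	/* 1. Reclaim the HKID and put the TD into a teardown state. */
	static_call_cond(kvm_x86_flush_shadow_all_private)(kvm);
	/* 2. Only then is it safe to zap the private page tables. */
	kvm_mmu_zap_all(kvm);
	/* ... per-vCPU resources are freed ... */
	/* 3. Finally, reclaim and free the per-VM pages (TDR/TDCS). */
	static_call_cond(kvm_x86_vm_free)(kvm);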
Add a second kvm_x86_ops hook in kvm_arch_destroy_vm() to support TDX's destruction path, which needs to first put the VM into a teardown state, then free per-vCPU resources, and finally free per-VM resources. Co-developed-by: Kai Huang Signed-off-by: Kai Huang Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm-x86-ops.h | 2 + arch/x86/include/asm/kvm_host.h | 2 + arch/x86/kvm/vmx/main.c | 34 ++- arch/x86/kvm/vmx/tdx.c | 409 +++++++++++++++++++++++++++++ arch/x86/kvm/vmx/tdx.h | 2 + arch/x86/kvm/vmx/x86_ops.h | 11 + arch/x86/kvm/x86.c | 8 + 7 files changed, 465 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index 3857bff6949c..968e5ba1e4e6 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -21,7 +21,9 @@ KVM_X86_OP(has_emulated_msr) KVM_X86_OP(vcpu_after_set_cpuid) KVM_X86_OP(is_vm_type_supported) KVM_X86_OP(vm_init) +KVM_X86_OP_OPTIONAL(flush_shadow_all_private) KVM_X86_OP_OPTIONAL(vm_destroy) +KVM_X86_OP_OPTIONAL(vm_free) KVM_X86_OP_OPTIONAL_RET0(vcpu_precreate) KVM_X86_OP(vcpu_create) KVM_X86_OP(vcpu_free) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index e9f4bff8e3a9..f3d16e5730ac 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1466,7 +1466,9 @@ struct kvm_x86_ops { bool (*is_vm_type_supported)(unsigned long vm_type); unsigned int vm_size; int (*vm_init)(struct kvm *kvm); + void (*flush_shadow_all_private)(struct kvm *kvm); void (*vm_destroy)(struct kvm *kvm); + void (*vm_free)(struct kvm *kvm); =20 /* Create, but do not attach this VCPU */ int (*vcpu_precreate)(struct kvm *kvm); diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 477c14b64879..408afa691bad 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -29,18 +29,44 @@ static __init int vt_hardware_setup(void) return 0; } =20 +static void vt_hardware_unsetup(void) +{ + tdx_hardware_unsetup(); + vmx_hardware_unsetup(); +} + static int vt_vm_init(struct kvm *kvm) { if (is_td(kvm)) - return -EOPNOTSUPP; /* Not ready to create guest TD yet. */ + return tdx_vm_init(kvm); =20 return vmx_vm_init(kvm); } =20 +static void vt_flush_shadow_all_private(struct kvm *kvm) +{ + if (is_td(kvm)) + return tdx_mmu_release_hkid(kvm); +} + +static void vt_vm_destroy(struct kvm *kvm) +{ + if (is_td(kvm)) + return; + + vmx_vm_destroy(kvm); +} + +static void vt_vm_free(struct kvm *kvm) +{ + if (is_td(kvm)) + return tdx_vm_free(kvm); +} + struct kvm_x86_ops vt_x86_ops __initdata =3D { .name =3D "kvm_intel", =20 - .hardware_unsetup =3D vmx_hardware_unsetup, + .hardware_unsetup =3D vt_hardware_unsetup, .check_processor_compatibility =3D vmx_check_processor_compatibility, =20 .hardware_enable =3D vmx_hardware_enable, @@ -50,7 +76,9 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .is_vm_type_supported =3D vt_is_vm_type_supported, .vm_size =3D sizeof(struct kvm_vmx), .vm_init =3D vt_vm_init, - .vm_destroy =3D vmx_vm_destroy, + .flush_shadow_all_private =3D vt_flush_shadow_all_private, + .vm_destroy =3D vt_vm_destroy, + .vm_free =3D vt_vm_free, =20 .vcpu_precreate =3D vmx_vcpu_precreate, .vcpu_create =3D vmx_vcpu_create, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index c4a318efbed5..93174b10e1ea 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -32,6 +32,399 @@ struct tdx_capabilities { /* Capabilities of KVM + the TDX module. 
*/ static struct tdx_capabilities tdx_caps; =20 +/* + * Some TDX SEAMCALLs (TDH.MNG.CREATE, TDH.PHYMEM.CACHE.WB, + * TDH.MNG.KEY.RECLAIMID, TDH.MNG.KEY.FREEID etc) tries to acquire a globa= l lock + * internally in TDX module. If failed, TDX_OPERAND_BUSY is returned with= out + * spinning or waiting due to a constraint on execution time. It's caller= 's + * responsibility to avoid race (or retry on TDX_OPERAND_BUSY). Use this = mutex + * to avoid race in TDX module because the kernel knows better about sched= uling. + */ +static DEFINE_MUTEX(tdx_lock); +static struct mutex *tdx_mng_key_config_lock; + +static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid) +{ + return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits); +} + +static inline bool is_td_created(struct kvm_tdx *kvm_tdx) +{ + return kvm_tdx->tdr.added; +} + +static inline void tdx_hkid_free(struct kvm_tdx *kvm_tdx) +{ + tdx_keyid_free(kvm_tdx->hkid); + kvm_tdx->hkid =3D -1; +} + +static inline bool is_hkid_assigned(struct kvm_tdx *kvm_tdx) +{ + return kvm_tdx->hkid > 0; +} + +static void tdx_clear_page(unsigned long page) +{ + const void *zero_page =3D (const void *) __va(page_to_phys(ZERO_PAGE(0))); + unsigned long i; + + /* + * Zeroing the page is only necessary for systems with MKTME-i: + * when re-assign one page from old keyid to a new keyid, MOVDIR64B is + * required to clear/write the page with new keyid to prevent integrity + * error when read on the page with new keyid. + * + * The cache line could be poisoned (even without MKTME-i), clear the + * poison bit. + */ + for (i =3D 0; i < PAGE_SIZE; i +=3D 64) + movdir64b((void *)(page + i), zero_page); + /* + * MOVDIR64B store uses WC buffer. Prevent following memory reads + * from seeing potentially poisoned cache. + */ + __mb(); +} + +static int tdx_reclaim_page(unsigned long va, hpa_t pa, bool do_wb, u16 hk= id) +{ + struct tdx_module_output out; + u64 err; + + err =3D tdh_phymem_page_reclaim(pa, &out); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_PHYMEM_PAGE_RECLAIM, err, &out); + return -EIO; + } + + if (do_wb) { + err =3D tdh_phymem_page_wbinvd(set_hkid_to_hpa(pa, hkid)); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err, NULL); + return -EIO; + } + } + + tdx_clear_page(va); + return 0; +} + +static int tdx_alloc_td_page(struct tdx_td_page *page) +{ + page->va =3D __get_free_page(GFP_KERNEL_ACCOUNT); + if (!page->va) + return -ENOMEM; + + page->pa =3D __pa(page->va); + return 0; +} + +static inline void tdx_mark_td_page_added(struct tdx_td_page *page) +{ + WARN_ON_ONCE(page->added); + page->added =3D true; +} + +static void tdx_reclaim_td_page(struct tdx_td_page *page) +{ + if (page->added) { + /* + * TDCX are being reclaimed. TDX module maps TDCX with HKID + * assigned to the TD. Here the cache associated to the TD + * was already flushed by TDH.PHYMEM.CACHE.WB before here, So + * cache doesn't need to be flushed again. + */ + if (tdx_reclaim_page(page->va, page->pa, false, 0)) + return; + + page->added =3D false; + } + if (page->va) { + free_page(page->va); + page->va =3D 0; + } +} + +static int tdx_do_tdh_phymem_cache_wb(void *param) +{ + u64 err =3D 0; + + do { + err =3D tdh_phymem_cache_wb(!!err); + } while (err =3D=3D TDX_INTERRUPTED_RESUMABLE); + + /* Other thread may have done for us. 
*/ + if (err =3D=3D TDX_NO_HKID_READY_TO_WBCACHE) + err =3D TDX_SUCCESS; + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_PHYMEM_CACHE_WB, err, NULL); + return -EIO; + } + + return 0; +} + +void tdx_mmu_release_hkid(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + cpumask_var_t packages; + bool cpumask_allocated; + u64 err; + int ret; + int i; + + if (!is_hkid_assigned(kvm_tdx)) + return; + + if (!is_td_created(kvm_tdx)) + goto free_hkid; + + cpumask_allocated =3D zalloc_cpumask_var(&packages, GFP_KERNEL); + cpus_read_lock(); + for_each_online_cpu(i) { + if (cpumask_allocated && + cpumask_test_and_set_cpu(topology_physical_package_id(i), + packages)) + continue; + + /* + * We can destroy multiple the guest TDs simultaneously. + * Prevent tdh_phymem_cache_wb from returning TDX_BUSY by + * serialization. + */ + mutex_lock(&tdx_lock); + ret =3D smp_call_on_cpu(i, tdx_do_tdh_phymem_cache_wb, NULL, 1); + mutex_unlock(&tdx_lock); + if (ret) + break; + } + cpus_read_unlock(); + free_cpumask_var(packages); + + mutex_lock(&tdx_lock); + err =3D tdh_mng_key_freeid(kvm_tdx->tdr.pa); + mutex_unlock(&tdx_lock); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MNG_KEY_FREEID, err, NULL); + pr_err("tdh_mng_key_freeid failed. HKID %d is leaked.\n", + kvm_tdx->hkid); + return; + } + +free_hkid: + tdx_hkid_free(kvm_tdx); +} + +void tdx_vm_free(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + int i; + + /* Can't reclaim or free TD pages if teardown failed. */ + if (is_hkid_assigned(kvm_tdx)) + return; + + if (kvm_tdx->tdcs) { + for (i =3D 0; i < tdx_caps.tdcs_nr_pages; i++) + tdx_reclaim_td_page(&kvm_tdx->tdcs[i]); + kfree(kvm_tdx->tdcs); + } + + /* + * TDX module maps TDR with TDX global HKID. TDX module may access TDR + * while operating on TD (Especially reclaiming TDCS). Cache flush with + * TDX global HKID is needed. + */ + if (kvm_tdx->tdr.added && + tdx_reclaim_page(kvm_tdx->tdr.va, kvm_tdx->tdr.pa, true, + tdx_global_keyid)) + return; + + free_page(kvm_tdx->tdr.va); +} + +static int tdx_do_tdh_mng_key_config(void *param) +{ + hpa_t *tdr_p =3D param; + u64 err; + + do { + err =3D tdh_mng_key_config(*tdr_p); + + /* + * If it failed to generate a random key, retry it because this + * is typically caused by an entropy error of the CPU's random + * number generator. + */ + } while (err =3D=3D TDX_KEY_GENERATION_FAILED); + + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MNG_KEY_CONFIG, err, NULL); + return -EIO; + } + + return 0; +} + +int tdx_vm_init(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + cpumask_var_t packages; + int ret, i; + u64 err; + + /* vCPUs can't be created until after KVM_TDX_INIT_VM. */ + kvm->max_vcpus =3D 0; + + kvm_tdx->hkid =3D tdx_keyid_alloc(); + if (kvm_tdx->hkid < 0) + return -EBUSY; + + ret =3D tdx_alloc_td_page(&kvm_tdx->tdr); + if (ret) + goto free_hkid; + + kvm_tdx->tdcs =3D kcalloc(tdx_caps.tdcs_nr_pages, sizeof(*kvm_tdx->tdcs), + GFP_KERNEL_ACCOUNT | __GFP_ZERO); + if (!kvm_tdx->tdcs) + goto free_tdr; + for (i =3D 0; i < tdx_caps.tdcs_nr_pages; i++) { + ret =3D tdx_alloc_td_page(&kvm_tdx->tdcs[i]); + if (ret) + goto free_tdcs; + } + + /* + * Acquire global lock to avoid TDX_OPERAND_BUSY: + * TDH.MNG.CREATE and other APIs try to lock the global Key Owner + * Table (KOT) to track the assigned TDX private HKID. It doesn't spin + * to acquire the lock, returns TDX_OPERAND_BUSY instead, and let the + * caller to handle the contention. 
This is because of time limitation + * usable inside the TDX module and OS/VMM knows better about process + * scheduling. + * + * APIs to acquire the lock of KOT: + * TDH.MNG.CREATE, TDH.MNG.KEY.FREEID, TDH.MNG.VPFLUSHDONE, and + * TDH.PHYMEM.CACHE.WB. + */ + mutex_lock(&tdx_lock); + err =3D tdh_mng_create(kvm_tdx->tdr.pa, kvm_tdx->hkid); + mutex_unlock(&tdx_lock); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MNG_CREATE, err, NULL); + ret =3D -EIO; + goto free_tdcs; + } + tdx_mark_td_page_added(&kvm_tdx->tdr); + + if (!zalloc_cpumask_var(&packages, GFP_KERNEL)) { + ret =3D -ENOMEM; + goto key_freeid; + } + cpus_read_lock(); + /* + * Need at least one CPU of the package to be online in order to + * program all packages for host key id. Check it. + */ + for_each_present_cpu(i) + cpumask_set_cpu(topology_physical_package_id(i), packages); + for_each_online_cpu(i) + cpumask_clear_cpu(topology_physical_package_id(i), packages); + if (!cpumask_empty(packages)) { + ret =3D -EIO; + /* + * Because it's hard for human operator to figure out the + * reason, warn it. + */ + pr_warn("All packages need to have online CPU to create TD. Online CPU a= nd retry.\n"); + cpus_read_unlock(); + free_cpumask_var(packages); + goto key_freeid; + } + for_each_online_cpu(i) { + int pkg =3D topology_physical_package_id(i); + + if (cpumask_test_and_set_cpu(pkg, packages)) + continue; + + /* + * Program the memory controller in the package with an + * encryption key associated to a TDX private host key id + * assigned to this TDR. Concurrent operations on same memory + * controller results in TDX_OPERAND_BUSY. Avoid this race by + * mutex. + */ + mutex_lock(&tdx_mng_key_config_lock[pkg]); + ret =3D smp_call_on_cpu(i, tdx_do_tdh_mng_key_config, + &kvm_tdx->tdr.pa, true); + mutex_unlock(&tdx_mng_key_config_lock[pkg]); + if (ret) + break; + } + cpus_read_unlock(); + free_cpumask_var(packages); + if (ret) + goto teardown; + + for (i =3D 0; i < tdx_caps.tdcs_nr_pages; i++) { + err =3D tdh_mng_addcx(kvm_tdx->tdr.pa, kvm_tdx->tdcs[i].pa); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MNG_ADDCX, err, NULL); + ret =3D -EIO; + goto teardown; + } + tdx_mark_td_page_added(&kvm_tdx->tdcs[i]); + } + + /* + * Note, TDH_MNG_INIT cannot be invoked here. TDH_MNG_INIT requires a de= dicated + * ioctl() to define the configure CPUID values for the TD. + */ + return 0; + + /* + * The sequence for freeing resources from a partially initialized TD + * varies based on where in the initialization flow failure occurred. + * Simply use the full teardown and destroy, which naturally play nice + * with partial initialization. + */ +teardown: + tdx_mmu_release_hkid(kvm); + tdx_vm_free(kvm); + return ret; + +key_freeid: + mutex_lock(&tdx_lock); + err =3D tdh_mng_key_freeid(kvm_tdx->tdr.pa); + mutex_unlock(&tdx_lock); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MNG_KEY_FREEID, err, NULL); + pr_err("tdh_mng_key_freeid failed. 
HKID %d is leaked.\n", + kvm_tdx->hkid); + kvm_tdx->hkid =3D -1; + } +free_tdcs: + for (i =3D 0; i < tdx_caps.tdcs_nr_pages; i++) { + if (!kvm_tdx->tdcs[i].va) + continue; + free_page(kvm_tdx->tdcs[i].va); + } + kfree(kvm_tdx->tdcs); + kvm_tdx->tdcs =3D NULL; +free_tdr: + if (kvm_tdx->tdr.va) { + free_page(kvm_tdx->tdr.va); + kvm_tdx->tdr.added =3D false; + kvm_tdx->tdr.va =3D 0; + kvm_tdx->tdr.pa =3D 0; + } +free_hkid: + if (kvm_tdx->hkid !=3D -1) + tdx_hkid_free(kvm_tdx); + return ret; +} + static int __init tdx_module_setup(void) { const struct tdsysinfo_struct *tdsysinfo; @@ -82,6 +475,8 @@ bool tdx_is_vm_type_supported(unsigned long type) =20 int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { + int max_pkgs; + int i; int r; =20 if (!enable_ept) { @@ -95,6 +490,14 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x86_o= ps) return -ENODEV; } =20 + max_pkgs =3D topology_max_packages(); + tdx_mng_key_config_lock =3D kcalloc(max_pkgs, sizeof(*tdx_mng_key_config_= lock), + GFP_KERNEL); + if (!tdx_mng_key_config_lock) + return -ENOMEM; + for (i =3D 0; i < max_pkgs; i++) + mutex_init(&tdx_mng_key_config_lock[i]); + /* TDX requires VMX. */ r =3D vmxon_all(); if (!r) @@ -103,3 +506,9 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x86_o= ps) =20 return r; } + +void tdx_hardware_unsetup(void) +{ + /* kfree accepts NULL. */ + kfree(tdx_mng_key_config_lock); +} diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 98999bf3f188..938314635b47 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -17,6 +17,8 @@ struct kvm_tdx { =20 struct tdx_td_page tdr; struct tdx_td_page *tdcs; + + int hkid; }; =20 struct vcpu_tdx { diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 2a870202fbf6..ce50ddef84bf 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -133,9 +133,20 @@ void vmx_setup_mce(struct kvm_vcpu *vcpu); #ifdef CONFIG_INTEL_TDX_HOST int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops); bool tdx_is_vm_type_supported(unsigned long type); +void tdx_hardware_unsetup(void); + +int tdx_vm_init(struct kvm *kvm); +void tdx_mmu_release_hkid(struct kvm *kvm); +void tdx_vm_free(struct kvm *kvm); #else static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return= 0; } static inline bool tdx_is_vm_type_supported(unsigned long type) { return f= alse; } +static inline void tdx_hardware_unsetup(void) {} + +static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; } +static inline void tdx_mmu_release_hkid(struct kvm *kvm) {} +static inline void tdx_flush_shadow_all_private(struct kvm *kvm) {} +static inline void tdx_vm_free(struct kvm *kvm) {} #endif =20 #endif /* __KVM_X86_VMX_X86_OPS_H */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 1df0dac476bc..82e6f54b35fb 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -12426,6 +12426,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm) kvm_page_track_cleanup(kvm); kvm_xen_destroy_vm(kvm); kvm_hv_destroy_vm(kvm); + static_call_cond(kvm_x86_vm_free)(kvm); } =20 static void memslot_rmap_free(struct kvm_memory_slot *slot) @@ -12736,6 +12737,13 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, =20 void kvm_arch_flush_shadow_all(struct kvm *kvm) { + /* + * kvm_mmu_zap_all() zaps both private and shared page tables. Before + * tearing down private page tables, TDX requires some TD resources to + * be destroyed (i.e. keyID must have been reclaimed, etc). Invoke + * kvm_x86_flush_shadow_all_private() for this. 
+ */ + static_call_cond(kvm_x86_flush_shadow_all_private)(kvm); kvm_mmu_zap_all(kvm); } -- 2.25.1 From nobody Thu Apr 25 13:29:40 2024 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 017/105] KVM: TDX: Refuse to unplug the last cpu on the package Date: Fri, 30 Sep 2022 03:17:11 -0700 From: Isaku Yamahata In order to reclaim the TDX host key id when deleting a guest TD, KVM needs to call TDH.PHYMEM.PAGE.WBINVD on all packages. Therefore, while any TDX host key id is in use, refuse to offline the last online cpu of a package.
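[Illustration, not part of the patch: a minimal sketch of the package check that tdx_offline_cpu() below performs, using only the kernel's topology helpers. The helper name is made up for this note.]

	static bool tdx_is_last_cpu_in_package(int cpu)
	{
		int pkg = topology_physical_package_id(cpu);
		int i;

		for_each_online_cpu(i) {
			/* Another online CPU keeps this package programmable. */
			if (i != cpu && topology_physical_package_id(i) == pkg)
				return false;
		}
		/* Offlining @cpu would leave the package with no online CPU. */
		return true;
	}

With such a check, CPU offline is refused (-EBUSY) only while at least one TDX host key id is configured, since TDH.PHYMEM.PAGE.WBINVD must remain runnable on every package until the last TD is gone.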
Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/vmx/main.c | 1 + arch/x86/kvm/vmx/tdx.c | 40 +++++++++++++++++++++++++++++- arch/x86/kvm/vmx/x86_ops.h | 2 ++ arch/x86/kvm/x86.c | 27 ++++++++++++-------- 6 files changed, 61 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h index 968e5ba1e4e6..cc7df6fccbb2 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -17,6 +17,7 @@ BUILD_BUG_ON(1) KVM_X86_OP(hardware_enable) KVM_X86_OP(hardware_disable) KVM_X86_OP(hardware_unsetup) +KVM_X86_OP_OPTIONAL_RET0(offline_cpu) KVM_X86_OP(has_emulated_msr) KVM_X86_OP(vcpu_after_set_cpuid) KVM_X86_OP(is_vm_type_supported) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index f3d16e5730ac..7cd5e5917bab 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1460,6 +1460,7 @@ struct kvm_x86_ops { int (*hardware_enable)(void); void (*hardware_disable)(void); void (*hardware_unsetup)(void); + int (*offline_cpu)(void); bool (*has_emulated_msr)(struct kvm *kvm, u32 index); void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 408afa691bad..43cc3af7d0ec 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -67,6 +67,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .name = "kvm_intel", .hardware_unsetup = vt_hardware_unsetup, + .offline_cpu = tdx_offline_cpu, .check_processor_compatibility = vmx_check_processor_compatibility, .hardware_enable = vmx_hardware_enable, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 93174b10e1ea..6b07451d6661 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -42,6 +42,7 @@ static struct tdx_capabilities tdx_caps; */ static DEFINE_MUTEX(tdx_lock); static struct mutex *tdx_mng_key_config_lock; +static atomic_t nr_configured_hkid; static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid) { @@ -210,7 +211,8 @@ void tdx_mmu_release_hkid(struct kvm *kvm) pr_err("tdh_mng_key_freeid failed. HKID %d is leaked.\n", kvm_tdx->hkid); return; - } + } else + atomic_dec(&nr_configured_hkid); free_hkid: tdx_hkid_free(kvm_tdx); @@ -362,6 +364,8 @@ int tdx_vm_init(struct kvm *kvm) if (ret) break; } + if (!ret) + atomic_inc(&nr_configured_hkid); cpus_read_unlock(); free_cpumask_var(packages); if (ret) @@ -512,3 +516,37 @@ void tdx_hardware_unsetup(void) /* kfree() accepts NULL. */ kfree(tdx_mng_key_config_lock); } + +int tdx_offline_cpu(void) +{ + int curr_cpu = smp_processor_id(); + cpumask_var_t packages; + int ret = 0; + int i; + + if (!atomic_read(&nr_configured_hkid)) + return 0; + + /* + * To reclaim an hkid, TDH.PHYMEM.PAGE.WBINVD needs to be called on + * all packages. If this is the last online cpu of its package, + * refuse to offline it. + */ + if (!zalloc_cpumask_var(&packages, GFP_KERNEL)) + return -ENOMEM; + + for_each_online_cpu(i) { + if (i != curr_cpu) + cpumask_set_cpu(topology_physical_package_id(i), packages); + } + if (!cpumask_test_cpu(topology_physical_package_id(curr_cpu), packages)) + ret = -EBUSY; + free_cpumask_var(packages); + if (ret) + /* + * Because it's hard for a human operator to understand the + * reason, print a warning. + */ + pr_warn("TDX requires all packages to have an online CPU. 
" + "Delete all TDs in order to offline all CPUs of a package.\n"); + return ret; +} diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index ce50ddef84bf..5ab78e044895 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -134,6 +134,7 @@ void vmx_setup_mce(struct kvm_vcpu *vcpu); int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops); bool tdx_is_vm_type_supported(unsigned long type); void tdx_hardware_unsetup(void); +int tdx_offline_cpu(void); =20 int tdx_vm_init(struct kvm *kvm); void tdx_mmu_release_hkid(struct kvm *kvm); @@ -142,6 +143,7 @@ void tdx_vm_free(struct kvm *kvm); static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return= 0; } static inline bool tdx_is_vm_type_supported(unsigned long type) { return f= alse; } static inline void tdx_hardware_unsetup(void) {} +static inline int tdx_offline_cpu(void) { return 0; } =20 static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; } static inline void tdx_mmu_release_hkid(struct kvm *kvm) {} diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 82e6f54b35fb..351803fdc944 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -12062,16 +12062,23 @@ int kvm_arch_online_cpu(unsigned int cpu, int usa= ge_count) =20 int kvm_arch_offline_cpu(unsigned int cpu, int usage_count) { - if (usage_count) { - /* - * arch callback kvm_arch_hardware_disable() assumes that - * preemption is disabled for historical reason. Disable - * preemption until all arch callbacks are fixed. - */ - preempt_disable(); - hardware_disable(NULL); - preempt_enable(); - } + int ret; + + if (!usage_count) + return 0; + + ret =3D static_call(kvm_x86_offline_cpu)(); + if (ret) + return ret; + + /* + * arch callback kvm_arch_hardware_disable() assumes that preemption is + * disabled for historical reason. Disable preemption until all arch + * callbacks are fixed. 
+ */ + preempt_disable(); + hardware_disable(NULL); + preempt_enable(); return 0; } -- 2.25.1 From nobody Thu Apr 25 13:29:40 2024 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , Sean Christopherson Subject: [PATCH v9 018/105] KVM: TDX: x86: Add ioctl to get TDX systemwide parameters Date: Fri, 30 Sep 2022 03:17:12 -0700 From: Sean Christopherson Implement a system-scoped ioctl, invoked via KVM_MEMORY_ENCRYPT_OP on the /dev/kvm file descriptor, to get the system-wide parameters for TDX.
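[Illustration, not part of the patch: a sketch of the intended userspace flow using the structures added below. Error handling is mostly elided, and 64 is an arbitrary guess at the number of CPUID configs.]

	#include <fcntl.h>
	#include <stdlib.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	static struct kvm_tdx_capabilities *get_tdx_caps(void)
	{
		int kvm_fd = open("/dev/kvm", O_RDWR);
		struct kvm_tdx_capabilities *caps;
		struct kvm_tdx_cmd cmd = { .id = KVM_TDX_CAPABILITIES };

		/* Reserve room for the trailing cpuid_configs[] array; KVM
		 * returns -E2BIG if nr_cpuid_configs is too small. */
		caps = calloc(1, sizeof(*caps) +
				 64 * sizeof(struct kvm_tdx_cpuid_config));
		caps->nr_cpuid_configs = 64;
		cmd.data = (__u64)(unsigned long)caps;

		if (ioctl(kvm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd) < 0)
			return NULL;
		return caps;
	}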
Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 1 + arch/x86/include/uapi/asm/kvm.h | 48 +++++++++++++++++++++++++++ arch/x86/kvm/vmx/main.c | 2 ++ arch/x86/kvm/vmx/tdx.c | 46 +++++++++++++++++++++++++ arch/x86/kvm/vmx/x86_ops.h | 2 ++ arch/x86/kvm/x86.c | 6 ++++ tools/arch/x86/include/uapi/asm/kvm.h | 48 +++++++++++++++++++++++++++ 8 files changed, 154 insertions(+) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h index cc7df6fccbb2..78b6b2c4d596 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -118,6 +118,7 @@ KVM_X86_OP(smi_allowed) KVM_X86_OP(enter_smm) KVM_X86_OP(leave_smm) KVM_X86_OP(enable_smi_window) +KVM_X86_OP_OPTIONAL(dev_mem_enc_ioctl) KVM_X86_OP_OPTIONAL(mem_enc_ioctl) KVM_X86_OP_OPTIONAL(mem_enc_register_region) KVM_X86_OP_OPTIONAL(mem_enc_unregister_region) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 7cd5e5917bab..aa43733746c6 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1620,6 +1620,7 @@ struct kvm_x86_ops { int (*leave_smm)(struct kvm_vcpu *vcpu, const char *smstate); void (*enable_smi_window)(struct kvm_vcpu *vcpu); + int (*dev_mem_enc_ioctl)(void __user *argp); int (*mem_enc_ioctl)(struct kvm *kvm, void __user *argp); int (*mem_enc_register_region)(struct kvm *kvm, struct kvm_enc_region *argp); int (*mem_enc_unregister_region)(struct kvm *kvm, struct kvm_enc_region *argp); diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index 54b08789c402..2ad9666e02a5 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -535,4 +535,52 @@ struct kvm_pmu_event_filter { #define KVM_X86_DEFAULT_VM 0 #define KVM_X86_TDX_VM 1 +/* Trust Domain eXtension sub-ioctl() commands. */ +enum kvm_tdx_cmd_id { + KVM_TDX_CAPABILITIES = 0, + + KVM_TDX_CMD_NR_MAX, +}; + +struct kvm_tdx_cmd { + /* enum kvm_tdx_cmd_id */ + __u32 id; + /* Flags for the sub-command. If the sub-command doesn't use this, set it to zero. */ + __u32 flags; + /* + * Data for each sub-command: an immediate value or a pointer (a + * process virtual address) to the actual data. If the sub-command + * doesn't use it, set it to zero. + */ + __u64 data; + /* + * Auxiliary error code. The sub-command may return a TDX SEAMCALL + * status code in addition to -Exxx. + * Defined for consistency with struct kvm_sev_cmd. + */ + __u64 error; + /* Reserved: Defined for consistency with struct kvm_sev_cmd. 
*/ + __u64 unused; +}; + +struct kvm_tdx_cpuid_config { + __u32 leaf; + __u32 sub_leaf; + __u32 eax; + __u32 ebx; + __u32 ecx; + __u32 edx; +}; + +struct kvm_tdx_capabilities { + __u64 attrs_fixed0; + __u64 attrs_fixed1; + __u64 xfam_fixed0; + __u64 xfam_fixed1; + + __u32 nr_cpuid_configs; + __u32 padding; + struct kvm_tdx_cpuid_config cpuid_configs[0]; +}; + #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 43cc3af7d0ec..071e7d148cb7 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -203,6 +203,8 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .complete_emulated_msr =3D kvm_complete_insn_gp, =20 .vcpu_deliver_sipi_vector =3D kvm_vcpu_deliver_sipi_vector, + + .dev_mem_enc_ioctl =3D tdx_dev_ioctl, }; =20 struct kvm_x86_init_ops vt_init_ops __initdata =3D { diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 6b07451d6661..60c5c08593c3 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -429,6 +429,52 @@ int tdx_vm_init(struct kvm *kvm) return ret; } =20 +int tdx_dev_ioctl(void __user *argp) +{ + struct kvm_tdx_capabilities __user *user_caps; + struct kvm_tdx_capabilities caps; + struct kvm_tdx_cmd cmd; + + BUILD_BUG_ON(sizeof(struct kvm_tdx_cpuid_config) !=3D + sizeof(struct tdx_cpuid_config)); + + if (copy_from_user(&cmd, argp, sizeof(cmd))) + return -EFAULT; + if (cmd.flags || cmd.error || cmd.unused) + return -EINVAL; + /* + * Currently only KVM_TDX_CAPABILITIES is defined for system-scoped + * mem_enc_ioctl(). + */ + if (cmd.id !=3D KVM_TDX_CAPABILITIES) + return -EINVAL; + + user_caps =3D (void __user *)cmd.data; + if (copy_from_user(&caps, user_caps, sizeof(caps))) + return -EFAULT; + + if (caps.nr_cpuid_configs < tdx_caps.nr_cpuid_configs) + return -E2BIG; + + caps =3D (struct kvm_tdx_capabilities) { + .attrs_fixed0 =3D tdx_caps.attrs_fixed0, + .attrs_fixed1 =3D tdx_caps.attrs_fixed1, + .xfam_fixed0 =3D tdx_caps.xfam_fixed0, + .xfam_fixed1 =3D tdx_caps.xfam_fixed1, + .nr_cpuid_configs =3D tdx_caps.nr_cpuid_configs, + .padding =3D 0, + }; + + if (copy_to_user(user_caps, &caps, sizeof(caps))) + return -EFAULT; + if (copy_to_user(user_caps->cpuid_configs, &tdx_caps.cpuid_configs, + tdx_caps.nr_cpuid_configs * + sizeof(struct tdx_cpuid_config))) + return -EFAULT; + + return 0; +} + static int __init tdx_module_setup(void) { const struct tdsysinfo_struct *tdsysinfo; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 5ab78e044895..a831dd9ee1a3 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -135,6 +135,7 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x86_o= ps); bool tdx_is_vm_type_supported(unsigned long type); void tdx_hardware_unsetup(void); int tdx_offline_cpu(void); +int tdx_dev_ioctl(void __user *argp); =20 int tdx_vm_init(struct kvm *kvm); void tdx_mmu_release_hkid(struct kvm *kvm); @@ -144,6 +145,7 @@ static inline int tdx_hardware_setup(struct kvm_x86_ops= *x86_ops) { return 0; } static inline bool tdx_is_vm_type_supported(unsigned long type) { return f= alse; } static inline void tdx_hardware_unsetup(void) {} static inline int tdx_offline_cpu(void) { return 0; } +static inline int tdx_dev_ioctl(void __user *argp) { return -EOPNOTSUPP; }; =20 static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; } static inline void tdx_mmu_release_hkid(struct kvm *kvm) {} diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 351803fdc944..e2cbeeec9d6a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4660,6 
+4660,12 @@ long kvm_arch_dev_ioctl(struct file *filp, r = kvm_x86_dev_has_attr(&attr); break; } + case KVM_MEMORY_ENCRYPT_OP: + r = -EINVAL; + if (!kvm_x86_ops.dev_mem_enc_ioctl) + goto out; + r = static_call(kvm_x86_dev_mem_enc_ioctl)(argp); + break; default: r = -EINVAL; break; diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include/uapi/asm/kvm.h index 54b08789c402..2ad9666e02a5 100644 --- a/tools/arch/x86/include/uapi/asm/kvm.h +++ b/tools/arch/x86/include/uapi/asm/kvm.h @@ -535,4 +535,52 @@ struct kvm_pmu_event_filter { #define KVM_X86_DEFAULT_VM 0 #define KVM_X86_TDX_VM 1 +/* Trust Domain eXtension sub-ioctl() commands. */ +enum kvm_tdx_cmd_id { + KVM_TDX_CAPABILITIES = 0, + + KVM_TDX_CMD_NR_MAX, +}; + +struct kvm_tdx_cmd { + /* enum kvm_tdx_cmd_id */ + __u32 id; + /* Flags for the sub-command. If the sub-command doesn't use this, set it to zero. */ + __u32 flags; + /* + * Data for each sub-command: an immediate value or a pointer (a + * process virtual address) to the actual data. If the sub-command + * doesn't use it, set it to zero. + */ + __u64 data; + /* + * Auxiliary error code. The sub-command may return a TDX SEAMCALL + * status code in addition to -Exxx. + * Defined for consistency with struct kvm_sev_cmd. + */ + __u64 error; + /* Reserved: Defined for consistency with struct kvm_sev_cmd. */ + __u64 unused; +}; + +struct kvm_tdx_cpuid_config { + __u32 leaf; + __u32 sub_leaf; + __u32 eax; + __u32 ebx; + __u32 ecx; + __u32 edx; +}; + +struct kvm_tdx_capabilities { + __u64 attrs_fixed0; + __u64 attrs_fixed1; + __u64 xfam_fixed0; + __u64 xfam_fixed1; + + __u32 nr_cpuid_configs; + __u32 padding; + struct kvm_tdx_cpuid_config cpuid_configs[0]; +}; + #endif /* _ASM_X86_KVM_H */ -- 2.25.1 From nobody Thu Apr 25 13:29:40 2024 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 019/105] KVM: TDX: Add place holder for TDX VM specific mem_enc_op ioctl Date: Fri, 30 Sep 2022 03:17:13 -0700 From: Isaku Yamahata Add a placeholder function for the TDX-specific, VM-scoped ioctl as mem_enc_op. TDX-specific sub-commands will be added later to retrieve and pass TDX-specific parameters. KVM_MEMORY_ENCRYPT_OP was introduced for VM-scoped operations specific to guest state-protected VMs, and defines per-technology subcommands. Despite its name, the subcommands are not limited to memory encryption; various technology-specific operations are defined under it. It's therefore natural to repurpose KVM_MEMORY_ENCRYPT_OP for TDX-specific operations as well and define subcommands there. TDX requires such VM-scoped, TDX-specific operations for the device model (for example, qemu): getting system-wide parameters and TDX-specific VM initialization.
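[Illustration, not part of the patch: how the VM-scoped dispatch behaves from userspace at this point in the series. The command id shown is the only one defined so far, and it is still rejected because no VM-scoped sub-command exists yet; the fd names are placeholders.]

	struct kvm_tdx_cmd cmd = { .id = KVM_TDX_CAPABILITIES };

	/* On a VM created with the KVM_X86_TDX_VM type: no sub-command is
	 * implemented yet, so tdx_vm_ioctl() below returns -EINVAL. */
	ioctl(tdx_vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);

	/* On a KVM_X86_DEFAULT_VM, vt_mem_enc_ioctl() returns -ENOTTY. */
	ioctl(default_vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);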
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/main.c | 9 +++++++++ arch/x86/kvm/vmx/tdx.c | 26 ++++++++++++++++++++++ arch/x86/kvm/vmx/x86_ops.h | 4 ++++ 3 files changed, 39 insertions(+) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 071e7d148cb7..42b1243a89e5 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -63,6 +63,14 @@ static void vt_vm_free(struct kvm *kvm) return tdx_vm_free(kvm); } +static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) +{ + if (!is_td(kvm)) + return -ENOTTY; + + return tdx_vm_ioctl(kvm, argp); +} + struct kvm_x86_ops vt_x86_ops __initdata = { .name = "kvm_intel", @@ -205,6 +213,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector, .dev_mem_enc_ioctl = tdx_dev_ioctl, + .mem_enc_ioctl = vt_mem_enc_ioctl, }; struct kvm_x86_init_ops vt_init_ops __initdata = { diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 60c5c08593c3..76e00e9bfe91 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -475,6 +475,32 @@ int tdx_dev_ioctl(void __user *argp) return 0; } +int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) +{ + struct kvm_tdx_cmd tdx_cmd; + int r; + + if (copy_from_user(&tdx_cmd, argp, sizeof(struct kvm_tdx_cmd))) + return -EFAULT; + if (tdx_cmd.error || tdx_cmd.unused) + return -EINVAL; + + mutex_lock(&kvm->lock); + + switch (tdx_cmd.id) { + default: + r = -EINVAL; + goto out; + } + + if (copy_to_user(argp, &tdx_cmd, sizeof(struct kvm_tdx_cmd))) + r = -EFAULT; + +out: + mutex_unlock(&kvm->lock); + return r; +} + static int __init tdx_module_setup(void) { const struct tdsysinfo_struct *tdsysinfo; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index a831dd9ee1a3..3576b5c7238d 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ 
b/arch/x86/kvm/vmx/x86_ops.h @@ -140,6 +140,8 @@ int tdx_dev_ioctl(void __user *argp); int tdx_vm_init(struct kvm *kvm); void tdx_mmu_release_hkid(struct kvm *kvm); void tdx_vm_free(struct kvm *kvm); + +int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); #else static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return 0; } static inline bool tdx_is_vm_type_supported(unsigned long type) { return false; } @@ -151,6 +153,8 @@ static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; } static inline void tdx_mmu_release_hkid(struct kvm *kvm) {} static inline void tdx_flush_shadow_all_private(struct kvm *kvm) {} static inline void tdx_vm_free(struct kvm *kvm) {} + +static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; } #endif #endif /* __KVM_X86_VMX_X86_OPS_H */ -- 2.25.1 From nobody Thu Apr 25 13:29:40 2024 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , Xiaoyao Li Subject: [PATCH v9 020/105] KVM: TDX: initialize VM with TDX specific parameters Date: Fri, 30 Sep 2022 03:17:14 -0700 From: Xiaoyao Li TDX requires additional parameters for a TDX VM for confidential execution, in order to protect the confidentiality of its memory contents and its CPU state from any other software, including the VMM. When creating a guest TD VM, before any vcpu is created, the device model passes per-VM parameters for the TDX guest: the maximum number of vcpus; the TSC frequency (which is fixed VM-wide, the same for all vcpus, and cannot be changed afterwards); the attributes (production or debug); the available extended features (which are reflected into the guest XCR0 and IA32_XSS MSR); the CPUIDs emulated by the TDX module (which the guest can therefore trust); and the sha384 measurement values. Add a new subcommand, KVM_TDX_INIT_VM, to pass these parameters for the TDX guest. It also assigns the encryption key to the TDX guest for memory encryption, as TDX encrypts memory on a per-guest basis. This subcommand is called before creating any vcpu and therefore before KVM_SET_CPUID2, i.e. the CPUID configuration isn't available yet, so the CPUID configuration values need to be passed in struct kvm_tdx_init_vm. It is the device model's responsibility to make the cpuid configuration for KVM_TDX_INIT_VM consistent with the one later given to KVM_SET_CPUID2.
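[Illustration, not part of the patch: a sketch of how a device model might invoke KVM_TDX_INIT_VM with the structures added below. Here nent and entries stand in for a CPUID list obtained from KVM_GET_SUPPORTED_CPUID and filtered by the device model.]

	struct kvm_tdx_init_vm *init_vm;
	struct kvm_tdx_cmd cmd = { .id = KVM_TDX_INIT_VM };

	init_vm = calloc(1, sizeof(*init_vm));	/* 16KB; unused tail must stay zero */
	init_vm->max_vcpus = 1;
	init_vm->attributes = 0;		/* production TD, no TDX_TD_ATTRIBUTE_DEBUG */
	init_vm->cpuid.nent = nent;
	memcpy(init_vm->entries, entries, nent * sizeof(init_vm->entries[0]));

	cmd.data = (__u64)(unsigned long)init_vm;
	/* Must be issued before KVM_CREATE_VCPU, and kept consistent with the
	 * CPUID later set via KVM_SET_CPUID2. */
	ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);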
Signed-off-by: Xiaoyao Li Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/tdx.h | 3 + arch/x86/include/uapi/asm/kvm.h | 33 +++ arch/x86/kvm/vmx/tdx.c | 303 ++++++++++++++++++++++---- arch/x86/kvm/vmx/tdx.h | 22 ++ tools/arch/x86/include/uapi/asm/kvm.h | 33 +++ 5 files changed, 353 insertions(+), 41 deletions(-) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index a32e8881e758..8a1905ae3ad6 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -89,6 +89,9 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1, #endif /* CONFIG_INTEL_TDX_GUEST && CONFIG_KVM_GUEST */ #ifdef CONFIG_INTEL_TDX_HOST + +/* -1 indicates a CPUID leaf with no sub-leaves. */ +#define TDX_CPUID_NO_SUBLEAF ((u32)-1) struct tdx_cpuid_config { u32 leaf; u32 sub_leaf; diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index 2ad9666e02a5..e231ba752788 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -538,6 +538,7 @@ struct kvm_pmu_event_filter { /* Trust Domain eXtension sub-ioctl() commands. */ enum kvm_tdx_cmd_id { KVM_TDX_CAPABILITIES = 0, + KVM_TDX_INIT_VM, KVM_TDX_CMD_NR_MAX, }; @@ -583,4 +584,36 @@ struct kvm_tdx_capabilities { struct kvm_tdx_cpuid_config cpuid_configs[0]; }; +struct kvm_tdx_init_vm { + __u64 attributes; + __u32 max_vcpus; + __u32 padding; + __u64 mrconfigid[6]; /* sha384 digest */ + __u64 mrowner[6]; /* sha384 digest */ + __u64 mrownerconfig[6]; /* sha384 digest */ + union { + /* + * KVM_TDX_INIT_VM is called before vcpu creation, thus before + * KVM_SET_CPUID2. The CPUID configuration needs to be passed here. + * + * This configuration supersedes KVM_SET_CPUID{,2}. + * The user space VMM, e.g. qemu, should make them consistent + * with these values. + * sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES(256) + * = 8KB. + */ + struct { + struct kvm_cpuid2 cpuid; + /* 8KB with KVM_MAX_CPUID_ENTRIES. */ + struct kvm_cpuid_entry2 entries[]; + }; + /* + * For future extensibility. + * The size of struct kvm_tdx_init_vm is 16KB. 
+ * This should be enough given sizeof(TD_PARAMS) =3D 1024 + */ + __u64 reserved[2028]; + }; +}; + #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 76e00e9bfe91..3c16f2d535b1 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -272,12 +272,209 @@ static int tdx_do_tdh_mng_key_config(void *param) int tdx_vm_init(struct kvm *kvm) { struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); - cpumask_var_t packages; - int ret, i; - u64 err; =20 /* vCPUs can't be created until after KVM_TDX_INIT_VM. */ kvm->max_vcpus =3D 0; + kvm_tdx->hkid =3D -1; + + /* + * This function initializes only KVM software construct. It doesn't + * initialize TDX stuff, e.g. TDCS, TDR, TDCX, HKID etc. + * It is handled by KVM_TDX_INIT_VM, __tdx_td_init(). + */ + + return 0; +} + +int tdx_dev_ioctl(void __user *argp) +{ + struct kvm_tdx_capabilities __user *user_caps; + struct kvm_tdx_capabilities caps; + struct kvm_tdx_cmd cmd; + + BUILD_BUG_ON(sizeof(struct kvm_tdx_cpuid_config) !=3D + sizeof(struct tdx_cpuid_config)); + + if (copy_from_user(&cmd, argp, sizeof(cmd))) + return -EFAULT; + if (cmd.flags || cmd.error || cmd.unused) + return -EINVAL; + /* + * Currently only KVM_TDX_CAPABILITIES is defined for system-scoped + * mem_enc_ioctl(). + */ + if (cmd.id !=3D KVM_TDX_CAPABILITIES) + return -EINVAL; + + user_caps =3D (void __user *)cmd.data; + if (copy_from_user(&caps, user_caps, sizeof(caps))) + return -EFAULT; + + if (caps.nr_cpuid_configs < tdx_caps.nr_cpuid_configs) + return -E2BIG; + + caps =3D (struct kvm_tdx_capabilities) { + .attrs_fixed0 =3D tdx_caps.attrs_fixed0, + .attrs_fixed1 =3D tdx_caps.attrs_fixed1, + .xfam_fixed0 =3D tdx_caps.xfam_fixed0, + .xfam_fixed1 =3D tdx_caps.xfam_fixed1, + .nr_cpuid_configs =3D tdx_caps.nr_cpuid_configs, + .padding =3D 0, + }; + + if (copy_to_user(user_caps, &caps, sizeof(caps))) + return -EFAULT; + if (copy_to_user(user_caps->cpuid_configs, &tdx_caps.cpuid_configs, + tdx_caps.nr_cpuid_configs * + sizeof(struct tdx_cpuid_config))) + return -EFAULT; + + return 0; +} + +/* + * cpuid entry lookup in TDX cpuid config way. + * The difference is how to specify index(subleaves). + * Specify index to TDX_CPUID_NO_SUBLEAF for CPUID leaf with no-subleaves. + */ +static const struct kvm_cpuid_entry2 *tdx_find_cpuid_entry(const struct kv= m_cpuid2 *cpuid, + u32 function, u32 index) +{ + int i; + + /* In TDX CPU CONFIG, TDX_CPUID_NO_SUBLEAF means index =3D 0. */ + if (index =3D=3D TDX_CPUID_NO_SUBLEAF) + index =3D 0; + + for (i =3D 0; i < cpuid->nent; i++) { + const struct kvm_cpuid_entry2 *e =3D &cpuid->entries[i]; + + if (e->function =3D=3D function && + (e->index =3D=3D index || + !(e->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX))) + return e; + } + return NULL; +} + +static int setup_tdparams(struct kvm *kvm, struct td_params *td_params, + struct kvm_tdx_init_vm *init_vm) +{ + const struct kvm_cpuid2 *cpuid =3D &init_vm->cpuid; + const struct kvm_cpuid_entry2 *entry; + u64 guest_supported_xcr0; + u64 guest_supported_xss; + int max_pa; + int i; + + td_params->max_vcpus =3D init_vm->max_vcpus; + + td_params->attributes =3D init_vm->attributes; + if (td_params->attributes & TDX_TD_ATTRIBUTE_PERFMON) { + /* + * TODO: save/restore PMU related registers around TDENTER. + * Once it's done, remove this guard. + */ + pr_warn("TD doesn't support perfmon yet. 
KVM needs to save/restore " + "host perf registers properly.\n"); + return -EOPNOTSUPP; + } + + for (i =3D 0; i < tdx_caps.nr_cpuid_configs; i++) { + const struct tdx_cpuid_config *config =3D &tdx_caps.cpuid_configs[i]; + const struct kvm_cpuid_entry2 *entry =3D + tdx_find_cpuid_entry(cpuid, config->leaf, config->sub_leaf); + struct tdx_cpuid_value *value =3D &td_params->cpuid_values[i]; + + if (!entry) + continue; + + value->eax =3D entry->eax & config->eax; + value->ebx =3D entry->ebx & config->ebx; + value->ecx =3D entry->ecx & config->ecx; + value->edx =3D entry->edx & config->edx; + } + + max_pa =3D 36; + entry =3D tdx_find_cpuid_entry(cpuid, 0x80000008, 0); + if (entry) + max_pa =3D entry->eax & 0xff; + + td_params->eptp_controls =3D VMX_EPTP_MT_WB; + /* + * No CPU supports 4-level && max_pa > 48. + * "5-level paging and 5-level EPT" section 4.1 4-level EPT + * "4-level EPT is limited to translating 48-bit guest-physical + * addresses." + * cpu_has_vmx_ept_5levels() check is just in case. + */ + if (cpu_has_vmx_ept_5levels() && max_pa > 48) { + td_params->eptp_controls |=3D VMX_EPTP_PWL_5; + td_params->exec_controls |=3D TDX_EXEC_CONTROL_MAX_GPAW; + } else { + td_params->eptp_controls |=3D VMX_EPTP_PWL_4; + } + + /* Setup td_params.xfam */ + entry =3D tdx_find_cpuid_entry(cpuid, 0xd, 0); + if (entry) + guest_supported_xcr0 =3D (entry->eax | ((u64)entry->edx << 32)); + else + guest_supported_xcr0 =3D 0; + guest_supported_xcr0 &=3D kvm_caps.supported_xcr0; + + entry =3D tdx_find_cpuid_entry(cpuid, 0xd, 1); + if (entry) + guest_supported_xss =3D (entry->ecx | ((u64)entry->edx << 32)); + else + guest_supported_xss =3D 0; + /* PT can be exposed to TD guest regardless of KVM's XSS support */ + guest_supported_xss &=3D (kvm_caps.supported_xss | XFEATURE_MASK_PT); + + td_params->xfam =3D guest_supported_xcr0 | guest_supported_xss; + if (td_params->xfam & XFEATURE_MASK_LBR) { + /* + * TODO: once KVM supports LBR(save/restore LBR related + * registers around TDENTER), remove this guard. + */ + pr_warn("TD doesn't support LBR yet. KVM needs to save/restore " + "IA32_LBR_DEPTH properly.\n"); + return -EOPNOTSUPP; + } + + if (td_params->xfam & XFEATURE_MASK_XTILE) { + /* + * TODO: once KVM supports AMX(save/restore AMX related + * registers around TDENTER), remove this guard. + */ + pr_warn("TD doesn't support AMX yet. KVM needs to save/restore " + "IA32_XFD, IA32_XFD_ERR properly.\n"); + return -EOPNOTSUPP; + } + + td_params->tsc_frequency =3D + TDX_TSC_KHZ_TO_25MHZ(kvm->arch.default_tsc_khz); + +#define MEMCPY_SAME_SIZE(dst, src) \ + do { \ + BUILD_BUG_ON(sizeof(dst) !=3D sizeof(src)); \ + memcpy((dst), (src), sizeof(dst)); \ + } while (0) + + MEMCPY_SAME_SIZE(td_params->mrconfigid, init_vm->mrconfigid); + MEMCPY_SAME_SIZE(td_params->mrowner, init_vm->mrowner); + MEMCPY_SAME_SIZE(td_params->mrownerconfig, init_vm->mrownerconfig); + + return 0; +} + +static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + struct tdx_module_output out; + cpumask_var_t packages; + int ret, i; + u64 err; =20 kvm_tdx->hkid =3D tdx_keyid_alloc(); if (kvm_tdx->hkid < 0) @@ -381,10 +578,13 @@ int tdx_vm_init(struct kvm *kvm) tdx_mark_td_page_added(&kvm_tdx->tdcs[i]); } =20 - /* - * Note, TDH_MNG_INIT cannot be invoked here. TDH_MNG_INIT requires a de= dicated - * ioctl() to define the configure CPUID values for the TD. 
- */ + err =3D tdh_mng_init(kvm_tdx->tdr.pa, __pa(td_params), &out); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MNG_INIT, err, &out); + ret =3D -EIO; + goto teardown; + } + return 0; =20 /* @@ -429,50 +629,68 @@ int tdx_vm_init(struct kvm *kvm) return ret; } =20 -int tdx_dev_ioctl(void __user *argp) +static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd) { - struct kvm_tdx_capabilities __user *user_caps; - struct kvm_tdx_capabilities caps; - struct kvm_tdx_cmd cmd; + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + struct kvm_tdx_init_vm *init_vm =3D NULL; + struct td_params *td_params =3D NULL; + void *entries_end; + int ret; =20 - BUILD_BUG_ON(sizeof(struct kvm_tdx_cpuid_config) !=3D - sizeof(struct tdx_cpuid_config)); + BUILD_BUG_ON(sizeof(*init_vm) !=3D 16 * 1024); + BUILD_BUG_ON((sizeof(*init_vm) - offsetof(typeof(*init_vm), entries)) / + sizeof(init_vm->entries[0]) < KVM_MAX_CPUID_ENTRIES); + BUILD_BUG_ON(sizeof(struct td_params) !=3D 1024); =20 - if (copy_from_user(&cmd, argp, sizeof(cmd))) - return -EFAULT; - if (cmd.flags || cmd.error || cmd.unused) + if (is_td_initialized(kvm)) return -EINVAL; - /* - * Currently only KVM_TDX_CAPABILITIES is defined for system-scoped - * mem_enc_ioctl(). - */ - if (cmd.id !=3D KVM_TDX_CAPABILITIES) + + if (cmd->flags) return -EINVAL; =20 - user_caps =3D (void __user *)cmd.data; - if (copy_from_user(&caps, user_caps, sizeof(caps))) - return -EFAULT; + init_vm =3D kzalloc(sizeof(*init_vm), GFP_KERNEL); + if (copy_from_user(init_vm, (void __user *)cmd->data, sizeof(*init_vm))) { + ret =3D -EFAULT; + goto out; + } =20 - if (caps.nr_cpuid_configs < tdx_caps.nr_cpuid_configs) - return -E2BIG; + ret =3D -EINVAL; + if (init_vm->cpuid.padding) + goto out; + /* init_vm->entries shouldn't overrun. */ + entries_end =3D init_vm->entries + init_vm->cpuid.nent; + if (entries_end > (void *)(init_vm + 1)) + goto out; + /* Unused part must be zero. */ + if (memchr_inv(entries_end, 0, (void *)(init_vm + 1) - entries_end)) + goto out; + if (init_vm->max_vcpus > KVM_MAX_VCPUS) + goto out; =20 - caps =3D (struct kvm_tdx_capabilities) { - .attrs_fixed0 =3D tdx_caps.attrs_fixed0, - .attrs_fixed1 =3D tdx_caps.attrs_fixed1, - .xfam_fixed0 =3D tdx_caps.xfam_fixed0, - .xfam_fixed1 =3D tdx_caps.xfam_fixed1, - .nr_cpuid_configs =3D tdx_caps.nr_cpuid_configs, - .padding =3D 0, - }; + td_params =3D kzalloc(sizeof(struct td_params), GFP_KERNEL); + if (!td_params) { + ret =3D -ENOMEM; + goto out; + } =20 - if (copy_to_user(user_caps, &caps, sizeof(caps))) - return -EFAULT; - if (copy_to_user(user_caps->cpuid_configs, &tdx_caps.cpuid_configs, - tdx_caps.nr_cpuid_configs * - sizeof(struct tdx_cpuid_config))) - return -EFAULT; + ret =3D setup_tdparams(kvm, td_params, init_vm); + if (ret) + goto out; =20 - return 0; + ret =3D __tdx_td_init(kvm, td_params); + if (ret) + goto out; + + kvm_tdx->tsc_offset =3D td_tdcs_exec_read64(kvm_tdx, TD_TDCS_EXEC_TSC_OFF= SET); + kvm_tdx->attributes =3D td_params->attributes; + kvm_tdx->xfam =3D td_params->xfam; + kvm->max_vcpus =3D td_params->max_vcpus; + +out: + /* kfree() accepts NULL. 
*/ + kfree(init_vm); + kfree(td_params); + return ret; } =20 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) @@ -488,6 +706,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) mutex_lock(&kvm->lock); =20 switch (tdx_cmd.id) { + case KVM_TDX_INIT_VM: + r =3D tdx_td_init(kvm, &tdx_cmd); + break; default: r =3D -EINVAL; goto out; diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 938314635b47..b87b62ba2575 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -18,7 +18,11 @@ struct kvm_tdx { struct tdx_td_page tdr; struct tdx_td_page *tdcs; =20 + u64 attributes; + u64 xfam; int hkid; + + u64 tsc_offset; }; =20 struct vcpu_tdx { @@ -48,6 +52,11 @@ static inline struct vcpu_tdx *to_tdx(struct kvm_vcpu *v= cpu) return container_of(vcpu, struct vcpu_tdx, vcpu); } =20 +static inline bool is_td_initialized(struct kvm *kvm) +{ + return !!kvm->max_vcpus; +} + static __always_inline void tdvps_vmcs_check(u32 field, u8 bits) { #define VMCS_ENC_ACCESS_TYPE_MASK 0x1UL @@ -148,6 +157,19 @@ TDX_BUILD_TDVPS_ACCESSORS(64, VMCS, vmcs); TDX_BUILD_TDVPS_ACCESSORS(64, STATE_NON_ARCH, state_non_arch); TDX_BUILD_TDVPS_ACCESSORS(8, MANAGEMENT, management); =20 +static __always_inline u64 td_tdcs_exec_read64(struct kvm_tdx *kvm_tdx, u3= 2 field) +{ + struct tdx_module_output out; + u64 err; + + err =3D tdh_mng_rd(kvm_tdx->tdr.pa, TDCS_EXEC(field), &out); + if (unlikely(err)) { + pr_err("TDH_MNG_RD[EXEC.0x%x] failed: 0x%llx\n", field, err); + return 0; + } + return out.r8; +} + #else struct kvm_tdx { struct kvm kvm; diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include= /uapi/asm/kvm.h index 2ad9666e02a5..531a0033e530 100644 --- a/tools/arch/x86/include/uapi/asm/kvm.h +++ b/tools/arch/x86/include/uapi/asm/kvm.h @@ -538,6 +538,7 @@ struct kvm_pmu_event_filter { /* Trust Domain eXtension sub-ioctl() commands. */ enum kvm_tdx_cmd_id { KVM_TDX_CAPABILITIES =3D 0, + KVM_TDX_INIT_VM, =20 KVM_TDX_CMD_NR_MAX, }; @@ -583,4 +584,36 @@ struct kvm_tdx_capabilities { struct kvm_tdx_cpuid_config cpuid_configs[0]; }; =20 +struct kvm_tdx_init_vm { + __u64 attributes; + __u32 max_vcpus; + __u32 padding; + __u64 mrconfigid[6]; /* sha384 digest */ + __u64 mrowner[6]; /* sha384 digest */ + __u64 mrownerconfig[6]; /* sha348 digest */ + union { + /* + * KVM_TDX_INIT_VM is called before vcpu creation, thus before + * KVM_SET_CPUID2. CPUID configurations needs to be passed. + * + * This configuration supersedes KVM_SET_CPUID{,2}. + * The user space VMM, e.g. qemu, should make them consistent + * with this values. + * sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES(256) + * =3D 8KB. + */ + struct { + struct kvm_cpuid2 cpuid; + /* 8KB with KVM_MAX_CPUID_ENTRIES. */ + struct kvm_cpuid_entry2 entries[]; + }; + /* + * For future extensibility. + * The size(struct kvm_tdx_init_vm) =3D 16KB. 
+ * This should be enough given sizeof(TD_PARAMS) = 1024. + */ + __u64 reserved[2028]; + }; +}; + #endif /* _ASM_X86_KVM_H */ -- 2.25.1 From nobody Thu Apr 25 13:29:40 2024 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 021/105] KVM: TDX: Make pmu_intel.c ignore guest TD case Date: Fri, 30 Sep 2022 03:17:15 -0700 From: Isaku Yamahata Because TDX KVM doesn't support the PMU yet (supporting it is future work, planned as a separate patch series), and because pmu_intel.c touches VMX-specific structures during vcpu initialization, add a dummy structure to struct vcpu_tdx as a workaround so that pmu_intel.c can ignore the TDX case.
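[Illustration, not part of the patch: the guard pattern the hunks below apply, condensed into one function. Common PMU code either bails out early for a TD vcpu or goes through an accessor that returns the dummy lbr_desc embedded in struct vcpu_tdx.]

	bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu)
	{
		struct x86_pmu_lbr *lbr = vcpu_to_lbr_records(vcpu);

		/* TDs have no PMU/LBR support yet; never report LBR as enabled. */
		if (is_td_vcpu(vcpu))
			return false;

		return lbr->nr && (vcpu_get_perf_capabilities(vcpu) & PMU_CAP_LBR_FMT);
	}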
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/pmu_intel.c | 46 +++++++++++++++++++++++++++++++++++- arch/x86/kvm/vmx/pmu_intel.h | 28 ++++++++++++++++++++++ arch/x86/kvm/vmx/tdx.h | 7 ++++++ arch/x86/kvm/vmx/vmx.c | 2 +- arch/x86/kvm/vmx/vmx.h | 32 +------------------------ 5 files changed, 82 insertions(+), 33 deletions(-) create mode 100644 arch/x86/kvm/vmx/pmu_intel.h diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index c399637a3a79..28db23380c5b 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -17,6 +17,7 @@ #include "lapic.h" #include "nested.h" #include "pmu.h" +#include "tdx.h" =20 #define MSR_PMC_FULL_WIDTH_BIT (MSR_IA32_PMC0 - MSR_IA32_PERFCTR0) =20 @@ -35,6 +36,26 @@ static struct kvm_event_hw_type_mapping intel_arch_event= s[] =3D { /* mapping between fixed pmc index and intel_arch_events array */ static int fixed_pmc_events[] =3D {1, 0, 7}; =20 +struct lbr_desc *vcpu_to_lbr_desc(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_INTEL_TDX_HOST + if (is_td_vcpu(vcpu)) + return &to_tdx(vcpu)->lbr_desc; +#endif + + return &to_vmx(vcpu)->lbr_desc; +} + +struct x86_pmu_lbr *vcpu_to_lbr_records(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_INTEL_TDX_HOST + if (is_td_vcpu(vcpu)) + return &to_tdx(vcpu)->lbr_desc.records; +#endif + + return &to_vmx(vcpu)->lbr_desc.records; +} + static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data) { struct kvm_pmc *pmc; @@ -171,6 +192,23 @@ static inline struct kvm_pmc *get_fw_gp_pmc(struct kvm= _pmu *pmu, u32 msr) return get_gp_pmc(pmu, msr, MSR_IA32_PMC0); } =20 +bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return false; + return cpuid_model_is_consistent(vcpu); +} + +bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu) +{ + struct x86_pmu_lbr *lbr =3D vcpu_to_lbr_records(vcpu); + + if (is_td_vcpu(vcpu)) + return false; + + return lbr->nr && (vcpu_get_perf_capabilities(vcpu) & PMU_CAP_LBR_FMT); +} + static bool intel_pmu_is_valid_lbr_msr(struct kvm_vcpu *vcpu, u32 index) { struct x86_pmu_lbr *records =3D vcpu_to_lbr_records(vcpu); @@ -281,6 +319,9 @@ int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *v= cpu) PERF_SAMPLE_BRANCH_USER, }; =20 + if (WARN_ON_ONCE(is_td_vcpu(vcpu))) + return 0; + if (unlikely(lbr_desc->event)) { __set_bit(INTEL_PMC_IDX_FIXED_VLBR, pmu->pmc_in_use); return 0; @@ -586,7 +627,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu) INTEL_PMC_MAX_GENERIC, pmu->nr_arch_fixed_counters); =20 perf_capabilities =3D vcpu_get_perf_capabilities(vcpu); - if (cpuid_model_is_consistent(vcpu) && + if (intel_pmu_lbr_is_compatible(vcpu) && (perf_capabilities & PMU_CAP_LBR_FMT)) x86_perf_get_lbr(&lbr_desc->records); else @@ -643,6 +684,9 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu) struct kvm_pmc *pmc =3D NULL; int i; =20 + if (is_td_vcpu(vcpu)) + return; + for (i =3D 0; i < INTEL_PMC_MAX_GENERIC; i++) { pmc =3D &pmu->gp_counters[i]; =20 diff --git a/arch/x86/kvm/vmx/pmu_intel.h b/arch/x86/kvm/vmx/pmu_intel.h new file mode 100644 index 000000000000..66bba47c1269 --- /dev/null +++ b/arch/x86/kvm/vmx/pmu_intel.h @@ -0,0 +1,28 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __KVM_X86_VMX_PMU_INTEL_H +#define __KVM_X86_VMX_PMU_INTEL_H + +struct lbr_desc *vcpu_to_lbr_desc(struct kvm_vcpu *vcpu); +struct x86_pmu_lbr *vcpu_to_lbr_records(struct kvm_vcpu *vcpu); + +bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu); +bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu); +int intel_pmu_create_guest_lbr_event(struct kvm_vcpu 
*vcpu); + +struct lbr_desc { + /* Basic info about guest LBR records. */ + struct x86_pmu_lbr records; + + /* + * Emulate LBR feature via passthrough LBR registers when the + * per-vcpu guest LBR event is scheduled on the current pcpu. + * + * The records may be inaccurate if the host reclaims the LBR. + */ + struct perf_event *event; + + /* True if LBRs are marked as not intercepted in the MSR bitmap */ + bool msr_passthrough; +}; + +#endif /* __KVM_X86_VMX_PMU_INTEL_H */ diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index b87b62ba2575..b1906dc2f0f9 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -4,6 +4,7 @@ =20 #ifdef CONFIG_INTEL_TDX_HOST =20 +#include "pmu_intel.h" #include "tdx_ops.h" =20 struct tdx_td_page { @@ -30,6 +31,12 @@ struct vcpu_tdx { =20 struct tdx_td_page tdvpr; struct tdx_td_page *tdvpx; + + /* + * Dummy to make pmu_intel not corrupt memory. + * TODO: Support PMU for TDX. Future work. + */ + struct lbr_desc lbr_desc; }; =20 static inline bool is_td(struct kvm *kvm) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index fb626adc347d..b53ffd367f51 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2326,7 +2326,7 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_dat= a *msr_info) if ((data & PMU_CAP_LBR_FMT) !=3D (vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT)) return 1; - if (!cpuid_model_is_consistent(vcpu)) + if (!intel_pmu_lbr_is_compatible(vcpu)) return 1; } if (data & PERF_CAP_PEBS_FORMAT) { diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 24d58c2ffaa3..c9fb46e570b0 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -11,6 +11,7 @@ #include "capabilities.h" #include "../kvm_cache_regs.h" #include "posted_intr.h" +#include "pmu_intel.h" #include "vmcs.h" #include "vmx_ops.h" #include "../cpuid.h" @@ -105,22 +106,6 @@ static inline bool intel_pmu_has_perf_global_ctrl(stru= ct kvm_pmu *pmu) return pmu->version > 1; } =20 -struct lbr_desc { - /* Basic info about guest LBR records. */ - struct x86_pmu_lbr records; - - /* - * Emulate LBR feature via passthrough LBR registers when the - * per-vcpu guest LBR event is scheduled on the current pcpu. - * - * The records may be inaccurate if the host reclaims the LBR. - */ - struct perf_event *event; - - /* True if LBRs are marked as not intercepted in the MSR bitmap */ - bool msr_passthrough; -}; - /* * The nested_vmx structure is part of vcpu_vmx, and holds information we = need * for correct emulation of VMX (i.e., nested VMX) on this vcpu. 
@@ -534,21 +519,6 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu) return container_of(vcpu, struct vcpu_vmx, vcpu); } -static inline struct lbr_desc *vcpu_to_lbr_desc(struct kvm_vcpu *vcpu) -{ - return &to_vmx(vcpu)->lbr_desc; -} - -static inline struct x86_pmu_lbr *vcpu_to_lbr_records(struct kvm_vcpu *vcpu) -{ - return &vcpu_to_lbr_desc(vcpu)->records; -} - -static inline bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu) -{ - return !!vcpu_to_lbr_records(vcpu)->nr; -} - void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu); int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu); void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu); -- 2.25.1 From nobody Thu Apr 25 13:29:40 2024 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 022/105] [MARKER] The start of TDX KVM patch series: TD vcpu creation/destruction Date: Fri, 30 Sep 2022 03:17:16 -0700 From: Isaku Yamahata This empty commit marks the start of the patch series for TD vcpu creation/destruction.
Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/intel-tdx-layer-status.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentation/virt/kvm/intel-tdx-layer-status.rst index 5e0deaebf843..3e8efde3e3f3 100644 --- a/Documentation/virt/kvm/intel-tdx-layer-status.rst +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -9,15 +9,15 @@ Layer status What qemu can do ---------------- - TDX VM TYPE is exposed to Qemu. -- Qemu can try to create VM of TDX VM type and then fails. +- Qemu can create/destroy a guest of TDX vm type. Patch Layer status ------------------ Patch layer Status * TDX, VMX coexistence: Applied * TDX architectural definitions: Applied -* TD VM creation/destruction: Applying -* TD vcpu creation/destruction: Not yet +* TD VM creation/destruction: Applied +* TD vcpu creation/destruction: Applying * TDX EPT violation: Not yet * TD finalization: Not yet * TD vcpu enter/exit: Not yet -- 2.25.1 From nobody Thu Apr 25 13:29:40 2024 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 023/105] KVM: TDX: allocate/free TDX vcpu structure Date: Fri, 30 Sep 2022 03:17:17 -0700 From: Isaku Yamahata The next step of TDX guest creation is to create the vcpu: allocate the TDX vcpu structures and initialize them, and allocate the pages the TDX module needs for the vcpu. In the conventional case, cpuid is empty at vcpu initialization and is configured afterwards. Because TDX supports only x2APIC mode, cpuid is forcibly initialized to support x2APIC as part of vcpu initialization.
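[Illustration, not part of the patch: the userspace ordering that the max_vcpus gating enforces; init_vm_cmd stands for a struct kvm_tdx_cmd prepared as in the previous patch. Until KVM_TDX_INIT_VM raises kvm->max_vcpus from 0, vcpu creation is refused.]

	vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, KVM_X86_TDX_VM);

	/* Fails: tdx_vm_init() left kvm->max_vcpus at 0. */
	ioctl(vm_fd, KVM_CREATE_VCPU, 0);

	/* KVM_TDX_INIT_VM sets max_vcpus along with the other TD parameters. */
	ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &init_vm_cmd);

	/* Now succeeds: allocates the TDVPR/TDVPX pages added below. */
	vcpu_fd = ioctl(vm_fd, KVM_CREATE_VCPU, 0);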
X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata The next step of TDX guest creation is vcpu creation: allocate the TDX vcpu structure and initialize it, and allocate pages of the TDX vcpu for the TDX module. In the conventional case, cpuid is empty at initialization and is configured after vcpu initialization. Because TDX supports only X2APIC mode, cpuid is forcibly initialized to support X2APIC at vcpu initialization. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/main.c | 40 +++++++++-- arch/x86/kvm/vmx/tdx.c | 138 +++++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/x86_ops.h | 8 +++ 3 files changed, 182 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 42b1243a89e5..b49d3f58dc4f 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -63,6 +63,38 @@ static void vt_vm_free(struct kvm *kvm) return tdx_vm_free(kvm); } =20 +static int vt_vcpu_precreate(struct kvm *kvm) +{ + if (is_td(kvm)) + return 0; + + return vmx_vcpu_precreate(kvm); +} + +static int vt_vcpu_create(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return tdx_vcpu_create(vcpu); + + return vmx_vcpu_create(vcpu); +} + +static void vt_vcpu_free(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return tdx_vcpu_free(vcpu); + + return vmx_vcpu_free(vcpu); +} + +static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) +{ + if (is_td_vcpu(vcpu)) + return tdx_vcpu_reset(vcpu, init_event); + + return vmx_vcpu_reset(vcpu, init_event); +} + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { if (!is_td(kvm)) @@ -89,10 +121,10 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .vm_destroy =3D vt_vm_destroy, .vm_free =3D vt_vm_free, =20 - .vcpu_precreate =3D vmx_vcpu_precreate, - .vcpu_create =3D vmx_vcpu_create, - .vcpu_free =3D vmx_vcpu_free, - .vcpu_reset =3D vmx_vcpu_reset, + .vcpu_precreate =3D vt_vcpu_precreate, + .vcpu_create =3D vt_vcpu_create, + .vcpu_free =3D vt_vcpu_free, + .vcpu_reset =3D vt_vcpu_reset, =20 .prepare_switch_to_guest =3D vmx_prepare_switch_to_guest, .vcpu_load =3D vmx_vcpu_load, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 3c16f2d535b1..0fa4746f0450 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -49,6 +49,11 @@ static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u= 16 hkid) return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits); } =20 +static inline bool is_td_vcpu_created(struct vcpu_tdx *tdx) +{ + return tdx->tdvpr.added; +} + static inline bool is_td_created(struct kvm_tdx *kvm_tdx) { return kvm_tdx->tdr.added; @@ -286,6 +291,139 @@ int tdx_vm_init(struct kvm *kvm) return 0; } =20 +int tdx_vcpu_create(struct kvm_vcpu *vcpu) +{ + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + int ret, i; + + /* TDX only supports x2APIC, which requires an in-kernel local APIC.
*/ + if (!vcpu->arch.apic) + return -EINVAL; + + fpstate_set_confidential(&vcpu->arch.guest_fpu); + + ret =3D tdx_alloc_td_page(&tdx->tdvpr); + if (ret) + return ret; + + tdx->tdvpx =3D kcalloc(tdx_caps.tdvpx_nr_pages, sizeof(*tdx->tdvpx), + GFP_KERNEL_ACCOUNT); + if (!tdx->tdvpx) { + ret =3D -ENOMEM; + goto free_tdvpr; + } + for (i =3D 0; i < tdx_caps.tdvpx_nr_pages; i++) { + ret =3D tdx_alloc_td_page(&tdx->tdvpx[i]); + if (ret) + goto free_tdvpx; + } + + vcpu->arch.efer =3D EFER_SCE | EFER_LME | EFER_LMA | EFER_NX; + + vcpu->arch.cr0_guest_owned_bits =3D -1ul; + vcpu->arch.cr4_guest_owned_bits =3D -1ul; + + vcpu->arch.tsc_offset =3D to_kvm_tdx(vcpu->kvm)->tsc_offset; + vcpu->arch.l1_tsc_offset =3D vcpu->arch.tsc_offset; + vcpu->arch.guest_state_protected =3D + !(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTRIBUTE_DEBUG); + + return 0; + +free_tdvpx: + /* @i points at the TDVPX page that failed allocation. */ + for (--i; i >=3D 0; i--) + free_page(tdx->tdvpx[i].va); + kfree(tdx->tdvpx); + tdx->tdvpx =3D NULL; +free_tdvpr: + free_page(tdx->tdvpr.va); + + return ret; +} + +void tdx_vcpu_free(struct kvm_vcpu *vcpu) +{ + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + int i; + + /* Can't reclaim or free pages if teardown failed. */ + if (is_hkid_assigned(to_kvm_tdx(vcpu->kvm))) + return; + + if (tdx->tdvpx) { + for (i =3D 0; i < tdx_caps.tdvpx_nr_pages; i++) + tdx_reclaim_td_page(&tdx->tdvpx[i]); + kfree(tdx->tdvpx); + tdx->tdvpx =3D NULL; + } + tdx_reclaim_td_page(&tdx->tdvpr); +} + +void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(vcpu->kvm); + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + struct msr_data apic_base_msr; + u64 err; + int i; + + /* TDX doesn't support INIT event. */ + if (WARN_ON_ONCE(init_event)) + goto td_bugged; + if (WARN_ON_ONCE(is_td_vcpu_created(tdx))) + goto td_bugged; + + err =3D tdh_vp_create(kvm_tdx->tdr.pa, tdx->tdvpr.pa); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_VP_CREATE, err, NULL); + goto td_bugged; + } + tdx_mark_td_page_added(&tdx->tdvpr); + + for (i =3D 0; i < tdx_caps.tdvpx_nr_pages; i++) { + err =3D tdh_vp_addcx(tdx->tdvpr.pa, tdx->tdvpx[i].pa); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_VP_ADDCX, err, NULL); + goto td_bugged; + } + tdx_mark_td_page_added(&tdx->tdvpx[i]); + } + + if (!vcpu->arch.cpuid_entries) { + /* + * On cpu creation, cpuid entry is blank. Forcibly enable + * X2APIC feature to allow X2APIC. 
+ */ + struct kvm_cpuid_entry2 *e; + + e =3D kvmalloc_array(1, sizeof(*e), GFP_KERNEL_ACCOUNT); + *e =3D (struct kvm_cpuid_entry2) { + .function =3D 1, /* Features for X2APIC */ + .index =3D 0, + .eax =3D 0, + .ebx =3D 0, + .ecx =3D 1ULL << 21, /* X2APIC */ + .edx =3D 0, + }; + vcpu->arch.cpuid_entries =3D e; + vcpu->arch.cpuid_nent =3D 1; + } + apic_base_msr.data =3D APIC_DEFAULT_PHYS_BASE | LAPIC_MODE_X2APIC; + if (kvm_vcpu_is_reset_bsp(vcpu)) + apic_base_msr.data |=3D MSR_IA32_APICBASE_BSP; + apic_base_msr.host_initiated =3D true; + if (WARN_ON_ONCE(kvm_set_apic_base(vcpu, &apic_base_msr))) + goto td_bugged; + + vcpu->arch.mp_state =3D KVM_MP_STATE_RUNNABLE; + + return; + +td_bugged: + vcpu->kvm->vm_bugged =3D true; +} + int tdx_dev_ioctl(void __user *argp) { struct kvm_tdx_capabilities __user *user_caps; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 3576b5c7238d..1febdc8dfe9f 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -141,6 +141,10 @@ int tdx_vm_init(struct kvm *kvm); void tdx_mmu_release_hkid(struct kvm *kvm); void tdx_vm_free(struct kvm *kvm); =20 +int tdx_vcpu_create(struct kvm_vcpu *vcpu); +void tdx_vcpu_free(struct kvm_vcpu *vcpu); +void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event); + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); #else static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return= 0; } @@ -154,6 +158,10 @@ static inline void tdx_mmu_release_hkid(struct kvm *kv= m) {} static inline void tdx_flush_shadow_all_private(struct kvm *kvm) {} static inline void tdx_vm_free(struct kvm *kvm) {} =20 +static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTS= UPP; } +static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {} +static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) = {} + static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { retur= n -EOPNOTSUPP; } #endif =20 --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9E4F1C43217 for ; Fri, 30 Sep 2022 10:21:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232032AbiI3KVi (ORCPT ); Fri, 30 Sep 2022 06:21:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33706 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231431AbiI3KTB (ORCPT ); Fri, 30 Sep 2022 06:19:01 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6BF64166F10; Fri, 30 Sep 2022 03:18:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533139; x=1696069139; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=34CR5xIh6D3mP24sKs3zCz+r1oDmGsCAoUfohFLjao0=; b=ZDNgFx11pcpEzZWw97AJfxO0iA6wt3VZSlCS8LO04TdnCvrBjrb3uCwO 4VvDOulfCCOcibdziwF8GuovdT5R55uA2Qs5LM0YYOmYX9BCFsdbYwp3l LJVHKbFpoZychJOlg6OfxU7h/Slz2fXgZI4eUvFTSlTBvlBTqa+0SiRev lW7ETfRuod3OyOdjyxCLda4qgsHiYjiphEn3HuPjGFwQ1IAO5z9GDX+oD dE8UvFBrgVjqOt4klRrjSVIqSHR5GGdN5qW7YlIkAWFJ5rtwKLRomqavr HyBMwidgjLnHp9xWX6AjHCW+sdo7NKXmamKp0rCbP8ZD8DoVHZ2zgXhSc g==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870078" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; 
d="scan'208";a="281870078" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:55 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807580" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807580" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:55 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , Sean Christopherson Subject: [PATCH v9 024/105] KVM: TDX: Do TDX specific vcpu initialization Date: Fri, 30 Sep 2022 03:17:18 -0700 Message-Id: <944c214b72337343c7b1a87e816fdc4c3b077a99.1664530907.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson TD guest vcpu need to be configured before ready to run which requests addtional information from Device model (e.g. qemu), one 64bit value is passed to vcpu's RCX as an initial value. Repurpose KVM_MEMORY_ENCRYPT_OP to vcpu-scope and add new sub-commands KVM_TDX_INIT_VCPU under it for such additional vcpu configuration. Add callback for kvm vCPU-scoped operations of KVM_MEMORY_ENCRYPT_OP and add a new subcommand, KVM_TDX_INIT_VCPU, for further vcpu initialization. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 1 + arch/x86/include/uapi/asm/kvm.h | 1 + arch/x86/kvm/vmx/main.c | 9 ++ arch/x86/kvm/vmx/tdx.c | 166 ++++++++++++++++++-------- arch/x86/kvm/vmx/tdx.h | 4 + arch/x86/kvm/vmx/x86_ops.h | 2 + arch/x86/kvm/x86.c | 6 + tools/arch/x86/include/uapi/asm/kvm.h | 1 + 9 files changed, 139 insertions(+), 52 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index 78b6b2c4d596..104a34b44e94 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -120,6 +120,7 @@ KVM_X86_OP(leave_smm) KVM_X86_OP(enable_smi_window) KVM_X86_OP_OPTIONAL(dev_mem_enc_ioctl) KVM_X86_OP_OPTIONAL(mem_enc_ioctl) +KVM_X86_OP_OPTIONAL(vcpu_mem_enc_ioctl) KVM_X86_OP_OPTIONAL(mem_enc_register_region) KVM_X86_OP_OPTIONAL(mem_enc_unregister_region) KVM_X86_OP_OPTIONAL(vm_copy_enc_context_from) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index aa43733746c6..531f04e36904 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1622,6 +1622,7 @@ struct kvm_x86_ops { =20 int (*dev_mem_enc_ioctl)(void __user *argp); int (*mem_enc_ioctl)(struct kvm *kvm, void __user *argp); + int (*vcpu_mem_enc_ioctl)(struct kvm_vcpu *vcpu, void __user *argp); int (*mem_enc_register_region)(struct kvm *kvm, struct kvm_enc_region *ar= gp); int (*mem_enc_unregister_region)(struct kvm *kvm, struct kvm_enc_region *= argp); int (*vm_copy_enc_context_from)(struct kvm *kvm, unsigned int source_fd); diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kv= m.h index e231ba752788..801b78b957fa 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -539,6 +539,7 @@ struct kvm_pmu_event_filter { enum 
kvm_tdx_cmd_id { KVM_TDX_CAPABILITIES =3D 0, KVM_TDX_INIT_VM, + KVM_TDX_INIT_VCPU, =20 KVM_TDX_CMD_NR_MAX, }; diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index b49d3f58dc4f..fe927aaee114 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -103,6 +103,14 @@ static int vt_mem_enc_ioctl(struct kvm *kvm, void __us= er *argp) return tdx_vm_ioctl(kvm, argp); } =20 +static int vt_vcpu_mem_enc_ioctl(struct kvm_vcpu *vcpu, void __user *argp) +{ + if (!is_td_vcpu(vcpu)) + return -EINVAL; + + return tdx_vcpu_ioctl(vcpu, argp); +} + struct kvm_x86_ops vt_x86_ops __initdata =3D { .name =3D "kvm_intel", =20 @@ -246,6 +254,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { =20 .dev_mem_enc_ioctl =3D tdx_dev_ioctl, .mem_enc_ioctl =3D vt_mem_enc_ioctl, + .vcpu_mem_enc_ioctl =3D vt_vcpu_mem_enc_ioctl, }; =20 struct kvm_x86_init_ops vt_init_ops __initdata =3D { diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 0fa4746f0450..10b0ac09bd00 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -70,6 +70,11 @@ static inline bool is_hkid_assigned(struct kvm_tdx *kvm_= tdx) return kvm_tdx->hkid > 0; } =20 +static inline bool is_td_finalized(struct kvm_tdx *kvm_tdx) +{ + return kvm_tdx->finalized; +} + static void tdx_clear_page(unsigned long page) { const void *zero_page =3D (const void *) __va(page_to_phys(ZERO_PAGE(0))); @@ -293,31 +298,12 @@ int tdx_vm_init(struct kvm *kvm) =20 int tdx_vcpu_create(struct kvm_vcpu *vcpu) { - struct vcpu_tdx *tdx =3D to_tdx(vcpu); - int ret, i; - /* TDX only supports x2APIC, which requires an in-kernel local APIC. */ if (!vcpu->arch.apic) return -EINVAL; =20 fpstate_set_confidential(&vcpu->arch.guest_fpu); =20 - ret =3D tdx_alloc_td_page(&tdx->tdvpr); - if (ret) - return ret; - - tdx->tdvpx =3D kcalloc(tdx_caps.tdvpx_nr_pages, sizeof(*tdx->tdvpx), - GFP_KERNEL_ACCOUNT); - if (!tdx->tdvpx) { - ret =3D -ENOMEM; - goto free_tdvpr; - } - for (i =3D 0; i < tdx_caps.tdvpx_nr_pages; i++) { - ret =3D tdx_alloc_td_page(&tdx->tdvpx[i]); - if (ret) - goto free_tdvpx; - } - vcpu->arch.efer =3D EFER_SCE | EFER_LME | EFER_LMA | EFER_NX; =20 vcpu->arch.cr0_guest_owned_bits =3D -1ul; @@ -329,17 +315,6 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) !(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTRIBUTE_DEBUG); =20 return 0; - -free_tdvpx: - /* @i points at the TDVPX page that failed allocation. */ - for (--i; i >=3D 0; i--) - free_page(tdx->tdvpx[i].va); - kfree(tdx->tdvpx); - tdx->tdvpx =3D NULL; -free_tdvpr: - free_page(tdx->tdvpr.va); - - return ret; } =20 void tdx_vcpu_free(struct kvm_vcpu *vcpu) @@ -362,34 +337,14 @@ void tdx_vcpu_free(struct kvm_vcpu *vcpu) =20 void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) { - struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(vcpu->kvm); - struct vcpu_tdx *tdx =3D to_tdx(vcpu); struct msr_data apic_base_msr; - u64 err; - int i; =20 /* TDX doesn't support INIT event. 
*/ if (WARN_ON_ONCE(init_event)) goto td_bugged; - if (WARN_ON_ONCE(is_td_vcpu_created(tdx))) + if (WARN_ON_ONCE(is_td_vcpu_created(to_tdx(vcpu)))) goto td_bugged; =20 - err =3D tdh_vp_create(kvm_tdx->tdr.pa, tdx->tdvpr.pa); - if (WARN_ON_ONCE(err)) { - pr_tdx_error(TDH_VP_CREATE, err, NULL); - goto td_bugged; - } - tdx_mark_td_page_added(&tdx->tdvpr); - - for (i =3D 0; i < tdx_caps.tdvpx_nr_pages; i++) { - err =3D tdh_vp_addcx(tdx->tdvpr.pa, tdx->tdvpx[i].pa); - if (WARN_ON_ONCE(err)) { - pr_tdx_error(TDH_VP_ADDCX, err, NULL); - goto td_bugged; - } - tdx_mark_td_page_added(&tdx->tdvpx[i]); - } - if (!vcpu->arch.cpuid_entries) { /* * On cpu creation, cpuid entry is blank. Forcibly enable @@ -409,6 +364,8 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_ev= ent) vcpu->arch.cpuid_entries =3D e; vcpu->arch.cpuid_nent =3D 1; } + + /* TDX requires X2APIC. */ apic_base_msr.data =3D APIC_DEFAULT_PHYS_BASE | LAPIC_MODE_X2APIC; if (kvm_vcpu_is_reset_bsp(vcpu)) apic_base_msr.data |=3D MSR_IA32_APICBASE_BSP; @@ -416,7 +373,10 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_e= vent) if (WARN_ON_ONCE(kvm_set_apic_base(vcpu, &apic_base_msr))) goto td_bugged; =20 - vcpu->arch.mp_state =3D KVM_MP_STATE_RUNNABLE; + /* + * Don't update mp_state to runnable because more initialization + * is needed by KVM_TDX_INIT_VCPU. + */ =20 return; =20 @@ -860,6 +820,108 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) return r; } =20 +static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(vcpu->kvm); + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + int ret, i; + u64 err; + + if (is_td_vcpu_created(tdx)) + return -EINVAL; + + ret =3D tdx_alloc_td_page(&tdx->tdvpr); + if (ret) + return ret; + + tdx->tdvpx =3D kcalloc(tdx_caps.tdvpx_nr_pages, sizeof(*tdx->tdvpx), + GFP_KERNEL_ACCOUNT); + if (!tdx->tdvpx) { + ret =3D -ENOMEM; + goto free_tdvpr; + } + for (i =3D 0; i < tdx_caps.tdvpx_nr_pages; i++) { + ret =3D tdx_alloc_td_page(&tdx->tdvpx[i]); + if (ret) + goto free_tdvpx; + } + + err =3D tdh_vp_create(kvm_tdx->tdr.pa, tdx->tdvpr.pa); + if (WARN_ON_ONCE(err)) { + ret =3D -EIO; + pr_tdx_error(TDH_VP_CREATE, err, NULL); + goto td_bugged; + } + tdx_mark_td_page_added(&tdx->tdvpr); + + for (i =3D 0; i < tdx_caps.tdvpx_nr_pages; i++) { + err =3D tdh_vp_addcx(tdx->tdvpr.pa, tdx->tdvpx[i].pa); + if (WARN_ON_ONCE(err)) { + ret =3D -EIO; + pr_tdx_error(TDH_VP_ADDCX, err, NULL); + goto td_bugged; + } + tdx_mark_td_page_added(&tdx->tdvpx[i]); + } + + err =3D tdh_vp_init(tdx->tdvpr.pa, vcpu_rcx); + if (WARN_ON_ONCE(err)) { + ret =3D -EIO; + pr_tdx_error(TDH_VP_INIT, err, NULL); + goto td_bugged; + } + + vcpu->arch.mp_state =3D KVM_MP_STATE_RUNNABLE; + + return 0; + +td_bugged: + vcpu->kvm->vm_bugged =3D true; + return ret; + +free_tdvpx: + /* @i points at the TDVPX page that failed allocation. */ + for (--i; i >=3D 0; i--) + free_page(tdx->tdvpx[i].va); + kfree(tdx->tdvpx); + tdx->tdvpx =3D NULL; +free_tdvpr: + free_page(tdx->tdvpr.va); + + return ret; +} + +int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(vcpu->kvm); + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + struct kvm_tdx_cmd cmd; + int ret; + + if (tdx->vcpu_initialized) + return -EINVAL; + + if (!is_td_initialized(vcpu->kvm) || is_td_finalized(kvm_tdx)) + return -EINVAL; + + if (copy_from_user(&cmd, argp, sizeof(cmd))) + return -EFAULT; + + if (cmd.error || cmd.unused) + return -EINVAL; + + /* Currently only KVM_TDX_INIT_VCPU is defined for vcpu operation.
*/ + if (cmd.flags || cmd.id !=3D KVM_TDX_INIT_VCPU) + return -EINVAL; + + ret =3D tdx_td_vcpu_init(vcpu, (u64)cmd.data); + if (ret) + return ret; + + tdx->vcpu_initialized =3D true; + return 0; +} + static int __init tdx_module_setup(void) { const struct tdsysinfo_struct *tdsysinfo; diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index b1906dc2f0f9..4ce236a0cab2 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -23,6 +23,8 @@ struct kvm_tdx { u64 xfam; int hkid; =20 + bool finalized; + u64 tsc_offset; }; =20 @@ -32,6 +34,8 @@ struct vcpu_tdx { struct tdx_td_page tdvpr; struct tdx_td_page *tdvpx; =20 + bool vcpu_initialized; + /* * Dummy to make pmu_intel not corrupt memory. * TODO: Support PMU for TDX. Future work. diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 1febdc8dfe9f..37c74f325b97 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -146,6 +146,7 @@ void tdx_vcpu_free(struct kvm_vcpu *vcpu); void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event); =20 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); +int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); #else static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return= 0; } static inline bool tdx_is_vm_type_supported(unsigned long type) { return f= alse; } @@ -163,6 +164,7 @@ static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu)= {} static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) = {} =20 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { retur= n -EOPNOTSUPP; } +static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)= { return -EOPNOTSUPP; } #endif =20 #endif /* __KVM_X86_VMX_X86_OPS_H */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index e2cbeeec9d6a..e1c35cbe0b77 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5920,6 +5920,12 @@ long kvm_arch_vcpu_ioctl(struct file *filp, case KVM_SET_DEVICE_ATTR: r =3D kvm_vcpu_ioctl_device_attr(vcpu, ioctl, argp); break; + case KVM_MEMORY_ENCRYPT_OP: + r =3D -ENOTTY; + if (!kvm_x86_ops.vcpu_mem_enc_ioctl) + goto out; + r =3D kvm_x86_ops.vcpu_mem_enc_ioctl(vcpu, argp); + break; default: r =3D -EINVAL; } diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include= /uapi/asm/kvm.h index 531a0033e530..35e3b4aa2e96 100644 --- a/tools/arch/x86/include/uapi/asm/kvm.h +++ b/tools/arch/x86/include/uapi/asm/kvm.h @@ -539,6 +539,7 @@ struct kvm_pmu_event_filter { enum kvm_tdx_cmd_id { KVM_TDX_CAPABILITIES =3D 0, KVM_TDX_INIT_VM, + KVM_TDX_INIT_VCPU, =20 KVM_TDX_CMD_NR_MAX, }; --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E380DC4332F for ; Fri, 30 Sep 2022 10:21:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231989AbiI3KVW (ORCPT ); Fri, 30 Sep 2022 06:21:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33566 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231407AbiI3KTB (ORCPT ); Fri, 30 Sep 2022 06:19:01 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6C19D166F14; Fri, 30 Sep 2022 03:18:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; 
q=dns/txt; s=Intel; t=1664533139; x=1696069139; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=4vSZXqKi+VzUQ8Oss4q4ps8WZefdkIn/jIGxOEvEYF4=; b=gAeI49NO4KxURWbnIFS82Yjh305BEjOtRIJgcso/e83/zoH6uNiRUB6B 21X8RxnY+qoO/y0lWSIUsxnYWsTbEIe9aH5fZQKkfl0kzU3tVIP3daAZF yZ4PQPUXu6Yh3jmD9oqAG1dCje/d5CGRHnlZruHFPAIRDHiOkch3+lTXp y4ihjdQNLVWPznGtqgn8pbjoERrxtjjA2GVnpcyvueWb/gtu1A+SNUR10 GxjdMUm5TjG+nNhlBv3WEzonh8aeWwhqhw4QclbISXZ7G8lc83C/V6CiD 0IrwEPORvZmQKwYdR40+MXVnWaVXF8NOg5HtVn0OzNxxsboMej4yuJcc7 w==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870079" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870079" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:55 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807583" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807583" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:55 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , Chao Peng Subject: [PATCH v9 025/105] KVM: TDX: Use private memory for TDX Date: Fri, 30 Sep 2022 03:17:19 -0700 Message-Id: <380563c808afef2260e601e3f761e050bd0ad964.1664530907.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Chao Peng Override kvm_arch_has_private_mem() to use fd-based private memory. Return true when a VM has a type of KVM_X86_TDX_VM. 
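For background, the fd-based private memory series that this patch builds on has generic KVM code consult kvm_arch_has_private_mem() before taking private-memory paths, with a per-architecture override like the one added below. The generic-side default is not part of this patch, so the following is only a sketch of its likely shape (the weak-symbol form is an assumption for illustration):

	/*
	 * Sketch of the generic-side fallback this patch overrides; the
	 * fd-based private memory series may express it as a weak symbol,
	 * a static inline, or an #ifdef.
	 */
	bool __weak kvm_arch_has_private_mem(struct kvm *kvm)
	{
		/* Overridden by architectures that support private memory. */
		return false;
	}

With the x86 override, generic code sees private memory only for VMs of type KVM_X86_TDX_VM.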
Signed-off-by: Chao Peng Signed-off-by: Isaku Yamahata --- arch/x86/kvm/x86.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index e1c35cbe0b77..5006ff5d9f5e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -13588,6 +13588,11 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, un= signed int size, } EXPORT_SYMBOL_GPL(kvm_sev_es_string_io); =20 +bool kvm_arch_has_private_mem(struct kvm *kvm) +{ + return kvm->arch.vm_type =3D=3D KVM_X86_TDX_VM; +} + EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio); --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4469EC433F5 for ; Fri, 30 Sep 2022 10:21:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232003AbiI3KV2 (ORCPT ); Fri, 30 Sep 2022 06:21:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33580 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231423AbiI3KTB (ORCPT ); Fri, 30 Sep 2022 06:19:01 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6C368166F1C; Fri, 30 Sep 2022 03:18:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533139; x=1696069139; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=GrRrgKMz4riAK/ebZ/sI81D2KhEbTTaxZ+pXBobqoJ4=; b=EwJ/OsVy0HooDgyUCTHyUBtBEGpUc5LMgL3J3YUjqEVzbgN+HCUSiWQm dHMbbuHR76p7AVp0sl2ts/Y59Uh2N21/PQrtiHv6VEC9KpTIyfxe4g3Cu +9BDpiDZJ5IaSS+/q6H7Bcb8/ZleKuNVYHM7UHy1GtBNdsRTQBEw5yZ8h WGSUqcGumwXaGmqLJbTZsqVrgOwfks41gn/jJMKPEU96xMyIyWjjEXg7y nzfpJSv6sMYcdveHgmANjkaHSlREqmSINBqSgcVBwXmCGbmvWisyu8sfz jSVtcDSGzTeIZFp7Zbg3Umcu3U072d9sIQtm0zh0oLLOceF2Ceq3whs2T w==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870080" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870080" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:55 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807586" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807586" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:55 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 026/105] [MARKER] The start of TDX KVM patch series: KVM MMU GPA shared bits Date: Fri, 30 Sep 2022 03:17:20 -0700 Message-Id: <566197cb9c348b3bf40ac69ac319b96f64bab3f5.1664530907.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This empty commit is to mark the start of patch series of KVM MMU GPA shared bits. 
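As a preview of the convention this sub-series establishes, consider the helpers added by patch 028 later in the series. With TDX_EXEC_CONTROL_MAX_GPAW set, the shared bit is GPA bit 51, i.e. gfn bit 39 once the 12-bit page offset is shifted out. A worked example (the concrete values are illustrative only):

	gfn_t gfn = gpa_to_gfn(0x1000);			/* gfn 0x1, shared bit clear */
	gfn_t shared = kvm_gfn_shared(kvm, gfn);	/* 0x1 | BIT_ULL(39) */
	gfn_t private = kvm_gfn_private(kvm, shared);	/* back to gfn 0x1 */
	bool priv = kvm_is_private_gpa(kvm, 0x1000);	/* true: shared bit clear */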
Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/intel-tdx-layer-status.rst | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentat= ion/virt/kvm/intel-tdx-layer-status.rst index 3e8efde3e3f3..6e3f71ab6b59 100644 --- a/Documentation/virt/kvm/intel-tdx-layer-status.rst +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -10,6 +10,7 @@ What qemu can do ---------------- - TDX VM TYPE is exposed to Qemu. - Qemu can create/destroy guest of TDX vm type. +- Qemu can create/destroy vcpu of TDX vm type. =20 Patch Layer status ------------------ @@ -17,13 +18,13 @@ Patch Layer status * TDX, VMX coexistence: Applied * TDX architectural definitions: Applied * TD VM creation/destruction: Applied -* TD vcpu creation/destruction: Applying +* TD vcpu creation/destruction: Applied * TDX EPT violation: Not yet * TD finalization: Not yet * TD vcpu enter/exit: Not yet * TD vcpu interrupts/exit/hypercall: Not yet =20 -* KVM MMU GPA shared bits: Not yet +* KVM MMU GPA shared bits: Applying * KVM TDP refactoring for TDX: Not yet * KVM TDP MMU hooks: Not yet * KVM TDP MMU MapGPA: Not yet --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9BA90C4332F for ; Fri, 30 Sep 2022 10:21:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232017AbiI3KVc (ORCPT ); Fri, 30 Sep 2022 06:21:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33702 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231430AbiI3KTB (ORCPT ); Fri, 30 Sep 2022 06:19:01 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5E9B815ED1E; Fri, 30 Sep 2022 03:19:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533140; x=1696069140; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Ofl4tu1tk6jjezt6ZEuGxsaoew3+Xuq9m3NoTHM2FO8=; b=eDNBDfd2r5gSu3o3RF2CoKUih9VIktLGECcZh4ZC6GWkFUyuSd5wOBIs 3TRa0ruOuy5Iu2+l/a5+WTKJOpremx6k1yiF9p8Q1KH0lu2mSWmp1Cmyy xWtv7Dc90DIA8khkdz+sDAbbzTI8mKqvIWs+TIQLy/yM45WMNVfOsJOwU EqhxVOK2We65lSZydxcfq6TtQloEmze7gljB12hNJXM0D+ZAwnP50ZlUB Jfk7RMyFOpWM9tQlerRyWtYguHdVY1CkGyckXFUovdiek3YIHE2rKe4NM 7WyQMWEuK3vBv27JiCqSkFnPTmI5FoDnuqNvcLowKGD77h0JrgRTsFwGv w==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870082" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870082" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:55 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807589" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807589" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:55 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 027/105] KVM: x86/mmu: introduce config for PRIVATE KVM MMU Date: Fri, 30 
Sep 2022 03:17:21 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata To keep the non-TDX case intact, introduce a new config option for private KVM MMU support. At the moment, it is a synonym for CONFIG_INTEL_TDX_HOST && CONFIG_KVM_INTEL. A dedicated config option makes it clear that this support is only for the x86 KVM MMU. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/Kconfig | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 25a24909375d..350a921b15cb 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -131,4 +131,8 @@ config KVM_XEN config KVM_EXTERNAL_WRITE_TRACKING bool =20 +config KVM_MMU_PRIVATE + def_bool y + depends on INTEL_TDX_HOST && KVM_INTEL + endif # VIRTUALIZATION --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D5670C433FE for ; Fri, 30 Sep 2022 10:21:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232059AbiI3KVr (ORCPT ); Fri, 30 Sep 2022 06:21:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33722 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231440AbiI3KTC (ORCPT ); Fri, 30 Sep 2022 06:19:02 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EA08F15ED3C; Fri, 30 Sep 2022 03:19:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533140; x=1696069140; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=dbcgdh/LKzhol7ZEIFT7uC8aHwK+z6qgswQmkF6mbW4=; b=PRyF3d2zHeNvY9HTqKp52Ncf+Hh4BbGoxMdWTNtIjg/H9OYIc7H2psxV m9CxAeL8gHvkecRiyenhoKjQzJxGgdLWveR791TjhfIvWN/ANI1/hBgtK T0Gidv6wmNDgKjoXHkwdgmRjyL8jRFUoIiPSnRnCE3JGqffx92uBvxdW8 tmoH+VJQkZJso+hyGcH+D5BLPQi9vYVI5rtsX/k9dprwSk9iqbaNBKw26 +PP4Wj+e8ETIcudCLalLUX5/9TFJT81r40DuCJPeiyND/b9W0Rt5kKbPW m+OwUDL2D+MzVWZN+8i4GQyhvVaYZiXzXUVd5YJcS8+tOkJJ54opJdI2t g==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870083" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870083" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:56 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807593" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807593" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:55 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , Rick Edgecombe Subject: [PATCH v9 028/105] KVM: x86/mmu: Add address conversion functions for TDX shared bit of GPA Date: Fri, 30 Sep 2022 03:17:22 -0700 Message-Id: <98006089a38c9521d666bd074f7b99c68604a934.1664530907.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References:
MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata TDX repurposes one GPA bit (bit 51 or bit 47, depending on configuration) to indicate whether the GPA is private (if cleared) or shared (if set) with the VMM. If the GPA.shared bit is set, the GPA is covered by the existing conventional EPT pointed to by the EPTP. If the GPA.shared bit is cleared, the GPA is covered by the TDX module and the VMM has to issue SEAMCALLs to operate on it. Add a member to remember the GPA shared bit for each guest TD, add address conversion functions between private GPA and shared GPA, and add a test for whether a GPA is private. Because struct kvm_arch (or struct kvm, which includes struct kvm_arch; see kvm_arch_alloc_vm(), which passes __GFP_ZERO) is zero-cleared when allocated, the new member that remembers the GPA shared bit is guaranteed to be zero with this patch unless it's initialized explicitly. Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm_host.h | 4 ++++ arch/x86/kvm/mmu.h | 32 ++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/tdx.c | 5 +++++ 3 files changed, 41 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 531f04e36904..fc28bf9c0552 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1366,6 +1366,10 @@ struct kvm_arch { */ #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1) struct kvm_mmu_memory_cache split_desc_cache; + +#ifdef CONFIG_KVM_MMU_PRIVATE + gfn_t gfn_shared_mask; +#endif }; =20 struct kvm_vm_stat { diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index c94b620bf94b..000a0a6ac815 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -276,4 +276,36 @@ static inline gpa_t kvm_translate_gpa(struct kvm_vcpu = *vcpu, return gpa; return translate_nested_gpa(vcpu, gpa, access, exception); } + +static inline gfn_t kvm_gfn_shared_mask(const struct kvm *kvm) +{ +#ifdef CONFIG_KVM_MMU_PRIVATE + return kvm->arch.gfn_shared_mask; +#else + return 0; +#endif +} + +static inline gfn_t kvm_gfn_shared(const struct kvm *kvm, gfn_t gfn) +{ + return gfn | kvm_gfn_shared_mask(kvm); +} + +static inline gfn_t kvm_gfn_private(const struct kvm *kvm, gfn_t gfn) +{ + return gfn & ~kvm_gfn_shared_mask(kvm); +} + +static inline gpa_t kvm_gpa_private(const struct kvm *kvm, gpa_t gpa) +{ + return gpa & ~gfn_to_gpa(kvm_gfn_shared_mask(kvm)); +} + +static inline bool kvm_is_private_gpa(const struct kvm *kvm, gpa_t gpa) +{ + gfn_t mask =3D kvm_gfn_shared_mask(kvm); + + return mask && !(gpa_to_gfn(gpa) & mask); +} + #endif diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 10b0ac09bd00..af99a46d1e75 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -784,6 +784,11 @@ static int tdx_td_init(struct kvm *kvm, struct kvm_tdx= _cmd *cmd) kvm_tdx->xfam =3D td_params->xfam; kvm->max_vcpus =3D td_params->max_vcpus; =20 + if (td_params->exec_controls & TDX_EXEC_CONTROL_MAX_GPAW) + kvm->arch.gfn_shared_mask =3D gpa_to_gfn(BIT_ULL(51)); + else + kvm->arch.gfn_shared_mask =3D gpa_to_gfn(BIT_ULL(47)); + out: /* kfree() accepts NULL.
*/ kfree(init_vm); --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 09B11C43217 for ; Fri, 30 Sep 2022 10:21:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231815AbiI3KVl (ORCPT ); Fri, 30 Sep 2022 06:21:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33724 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231443AbiI3KTC (ORCPT ); Fri, 30 Sep 2022 06:19:02 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 17EBA166F0B; Fri, 30 Sep 2022 03:19:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533141; x=1696069141; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=rWTnsxMrIzKafHc5a68BzRinBeuJZkWqHiP4eLXCUc4=; b=m7Wr0JpW+OprGkHOke+vw/SAIwiUCEMCZS1jCFzEubnbPOAw8w57ctm2 4bJ6azWbY3u0nxvVjYUaILi/amfdfBbNfUHggJSwQwbVpFMoDA7jXkZXb rfXoJeM+CR2HEY51uiLl+owCDp6T1RL+HcVZTuoKhKHhLBS4b3Um3bpbI zbKT8ZG0LZkyjpmsMBQscko/6UExQ83UQGkJqqahm7TwvVv5jjbFq/zy4 tdcA4U2TyAOTwFAlSb2b/lChhpfIcovKJ94j4ftZVB/7gN4ck2dWbMmaX NRAw1hYmN6KITsnndlhnQ4d9QfgJJN3fwXH/aXsgGU42ZGyLhcpDDA4+4 g==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870084" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870084" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:56 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807596" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807596" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:56 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 029/105] [MARKER] The start of TDX KVM patch series: KVM TDP refactoring for TDX Date: Fri, 30 Sep 2022 03:17:23 -0700 Message-Id: <44e355a94bcc76bcc52ac2aad2be5d823155df3a.1664530907.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This empty commit is to mark the start of patch series of KVM TDP refactoring for TDX. 
Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/intel-tdx-layer-status.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentat= ion/virt/kvm/intel-tdx-layer-status.rst index 6e3f71ab6b59..df003d2ed89e 100644 --- a/Documentation/virt/kvm/intel-tdx-layer-status.rst +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -24,7 +24,7 @@ Patch Layer status * TD vcpu enter/exit: Not yet * TD vcpu interrupts/exit/hypercall: Not yet =20 -* KVM MMU GPA shared bits: Applying -* KVM TDP refactoring for TDX: Not yet +* KVM MMU GPA shared bits: Applied +* KVM TDP refactoring for TDX: Applying * KVM TDP MMU hooks: Not yet * KVM TDP MMU MapGPA: Not yet --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7F21CC4332F for ; Fri, 30 Sep 2022 10:22:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232084AbiI3KV6 (ORCPT ); Fri, 30 Sep 2022 06:21:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33594 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231472AbiI3KTC (ORCPT ); Fri, 30 Sep 2022 06:19:02 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 42BFF166F1E; Fri, 30 Sep 2022 03:19:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533141; x=1696069141; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=AFzhBiuFJUt3qy79nbYUvU/h8A5+9FGA9IY4AWUm4SU=; b=OTaW7FcwCgVY5r6YIG0OYRGH3q22cRrurJkpfZhwlH1P+KddNZuF9mAP xblyTWkmLi9Z/qfHwNfYE8lyQp+EYKAWed1YENXFR7wvMguGsCmL939TW fnrfuBsIuppJzELdtYhJyPQyF6babZ9rE8tY/m1M5h4KTOmdN91vY7ddA 9ijDc0jVJ6StvVQEfSZ071aTSM+5PcY433gyURoi/V/VyhbKGTBd16M80 /TMa+LFiO0GH7chpZwzjN0lemH2QPbQRdUoSdb6iy6btD+jrtnocAtaWY JMriFxnjtea6mW+BWA8/AVZLCo28nlLl4yJUqIT3Xckpgkyz3mqzAJFqQ Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870085" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870085" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:56 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807600" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807600" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:56 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , Sean Christopherson Subject: [PATCH v9 030/105] KVM: x86/mmu: Replace hardcoded value 0 for the initial value for SPTE Date: Fri, 30 Sep 2022 03:17:24 -0700 Message-Id: <1d1459082c4ff5316fa14860f4023c23a210c94e.1664530907.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata The TDX support will 
need the "suppress #VE" bit (bit 63) set as the initial value for SPTE. To reduce code change size, introduce a new macro SHADOW_NONPRESENT_VALUE for the initial value for the shadow page table entry (SPTE) and replace hard-coded value 0 for it. Initialize shadow page tables with their value. The plan is to unconditionally set the "suppress #VE" bit for both AMD and Intel as: 1) AMD hardware doesn't use this bit; 2) for conventional VMX guests, KVM never enables the "EPT-violation #VE" in VMCS control and "suppress #VE" bit is ignored by hardware. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu/mmu.c | 50 +++++++++++++++++++++++++++++++++----- arch/x86/kvm/mmu/spte.h | 2 ++ arch/x86/kvm/mmu/tdp_mmu.c | 15 ++++++------ 3 files changed, 54 insertions(+), 13 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index c9f60acfc322..ff8de361067c 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -538,9 +538,9 @@ static u64 mmu_spte_clear_track_bits(struct kvm *kvm, u= 64 *sptep) =20 if (!is_shadow_present_pte(old_spte) || !spte_has_volatile_bits(old_spte)) - __update_clear_spte_fast(sptep, 0ull); + __update_clear_spte_fast(sptep, SHADOW_NONPRESENT_VALUE); else - old_spte =3D __update_clear_spte_slow(sptep, 0ull); + old_spte =3D __update_clear_spte_slow(sptep, SHADOW_NONPRESENT_VALUE); =20 if (!is_shadow_present_pte(old_spte)) return old_spte; @@ -574,7 +574,7 @@ static u64 mmu_spte_clear_track_bits(struct kvm *kvm, u= 64 *sptep) */ static void mmu_spte_clear_no_track(u64 *sptep) { - __update_clear_spte_fast(sptep, 0ull); + __update_clear_spte_fast(sptep, SHADOW_NONPRESENT_VALUE); } =20 static u64 mmu_spte_get_lockless(u64 *sptep) @@ -642,6 +642,39 @@ static void walk_shadow_page_lockless_end(struct kvm_v= cpu *vcpu) } } =20 +#ifdef CONFIG_X86_64 +static inline void kvm_init_shadow_page(void *page) +{ + memset64(page, SHADOW_NONPRESENT_VALUE, 4096 / 8); +} + +static int mmu_topup_shadow_page_cache(struct kvm_vcpu *vcpu) +{ + struct kvm_mmu_memory_cache *mc =3D &vcpu->arch.mmu_shadow_page_cache; + int start, end, i, r; + + start =3D kvm_mmu_memory_cache_nr_free_objects(mc); + r =3D kvm_mmu_topup_memory_cache(mc, PT64_ROOT_MAX_LEVEL); + + /* + * Note, topup may have allocated objects even if it failed to allocate + * the minimum number of objects required to make forward progress _at + * this time_. Initialize newly allocated objects even on failure, as + * userspace can free memory and rerun the vCPU in response to -ENOMEM. 
+ */ + end =3D kvm_mmu_memory_cache_nr_free_objects(mc); + for (i =3D start; i < end; i++) + kvm_init_shadow_page(mc->objects[i]); + return r; +} +#else +static int mmu_topup_shadow_page_cache(struct kvm_vcpu *vcpu) +{ + return kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache, + PT64_ROOT_MAX_LEVEL); +} +#endif /* CONFIG_X86_64 */ + static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indir= ect) { int r; @@ -651,8 +684,7 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcp= u, bool maybe_indirect) 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM); if (r) return r; - r =3D kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache, - PT64_ROOT_MAX_LEVEL); + r =3D mmu_topup_shadow_page_cache(vcpu); if (r) return r; if (maybe_indirect) { @@ -5861,7 +5893,13 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu) vcpu->arch.mmu_page_header_cache.kmem_cache =3D mmu_page_header_cache; vcpu->arch.mmu_page_header_cache.gfp_zero =3D __GFP_ZERO; =20 - vcpu->arch.mmu_shadow_page_cache.gfp_zero =3D __GFP_ZERO; + /* + * When X86_64, initial SEPT entries are initialized with + * SHADOW_NONPRESENT_VALUE. Otherwise zeroed. See + * mmu_topup_shadow_page_cache(). + */ + if (!IS_ENABLED(CONFIG_X86_64)) + vcpu->arch.mmu_shadow_page_cache.gfp_zero =3D __GFP_ZERO; =20 vcpu->arch.mmu =3D &vcpu->arch.root_mmu; vcpu->arch.walk_mmu =3D &vcpu->arch.root_mmu; diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 7670c13ce251..42ecaa75da15 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -148,6 +148,8 @@ static_assert(MMIO_SPTE_GEN_LOW_BITS =3D=3D 8 && MMIO_S= PTE_GEN_HIGH_BITS =3D=3D 11); =20 #define MMIO_SPTE_GEN_MASK GENMASK_ULL(MMIO_SPTE_GEN_LOW_BITS + MMIO_SPTE= _GEN_HIGH_BITS - 1, 0) =20 +#define SHADOW_NONPRESENT_VALUE 0ULL + extern u64 __read_mostly shadow_host_writable_mask; extern u64 __read_mostly shadow_mmu_writable_mask; extern u64 __read_mostly shadow_nx_mask; diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index bf2ccf9debca..0a7edea7488e 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -683,7 +683,7 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *k= vm, * here since the SPTE is going from non-present to non-present. Use * the raw write helper to avoid an unnecessary check on volatile bits. 
*/ - __kvm_tdp_mmu_write_spte(iter->sptep, 0); + __kvm_tdp_mmu_write_spte(iter->sptep, SHADOW_NONPRESENT_VALUE); =20 return 0; } @@ -860,8 +860,8 @@ static void __tdp_mmu_zap_root(struct kvm *kvm, struct = kvm_mmu_page *root, continue; =20 if (!shared) - tdp_mmu_set_spte(kvm, &iter, 0); - else if (tdp_mmu_set_spte_atomic(kvm, &iter, 0)) + tdp_mmu_set_spte(kvm, &iter, SHADOW_NONPRESENT_VALUE); + else if (tdp_mmu_set_spte_atomic(kvm, &iter, SHADOW_NONPRESENT_VALUE)) goto retry; } } @@ -917,8 +917,9 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu= _page *sp) if (WARN_ON_ONCE(!is_shadow_present_pte(old_spte))) return false; =20 - __tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, 0, - sp->gfn, sp->role.level + 1, true, true); + __tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, + SHADOW_NONPRESENT_VALUE, sp->gfn, sp->role.level + 1, + true, true); =20 return true; } @@ -952,7 +953,7 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct k= vm_mmu_page *root, !is_last_spte(iter.old_spte, iter.level)) continue; =20 - tdp_mmu_set_spte(kvm, &iter, 0); + tdp_mmu_set_spte(kvm, &iter, SHADOW_NONPRESENT_VALUE); flush =3D true; } =20 @@ -1316,7 +1317,7 @@ static bool set_spte_gfn(struct kvm *kvm, struct tdp_= iter *iter, * invariant that the PFN of a present * leaf SPTE can never change. * See __handle_changed_spte(). */ - tdp_mmu_set_spte(kvm, iter, 0); + tdp_mmu_set_spte(kvm, iter, SHADOW_NONPRESENT_VALUE); =20 if (!pte_write(range->pte)) { new_spte =3D kvm_mmu_changed_pte_notifier_make_spte(iter->old_spte, --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 63BFBC4332F for ; Fri, 30 Sep 2022 10:21:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232070AbiI3KVu (ORCPT ); Fri, 30 Sep 2022 06:21:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33728 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231459AbiI3KTC (ORCPT ); Fri, 30 Sep 2022 06:19:02 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4EAE915348B; Fri, 30 Sep 2022 03:19:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533141; x=1696069141; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=fKKqQjm5zfL/HnoTp2AqWyipBAMJR5631mTIwmgWhU8=; b=Ef2VwB7btbQ1w2tdap38oieQAHDnMmwCXfTFWVnNxxkvNLOtHepF0Zmx t7VtYI43VGtJQjX8QAxDje9GDKGkIIWWHguS/P5dV2fYcYbsMFfUAxJ/H kvV8+2BpQ1FQqeUABddzUFCIIOk8gRAui02F9cWOiW6+3AjXYJnnZYw+F OmW1MWIwGXLoguHG3LovMrsBgf7nML1mtxDL+uteGBzFVpmpPNjuidcYa hS4kECcZSbM4+ktKB2rCFTXkujsqWAcCvlUUy1f4FdjUgP5twss8mGmCl BEfhG1KtMqJsytlgPUoUya84EFJ1KP33WUVEnoHNCt6H5OUy4m84nuIVC w==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870086" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870086" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:56 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807603" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807603" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by 
fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:56 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 031/105] KVM: x86/mmu: Make sync_page not use hard-coded 0 as the initial SPTE value Date: Fri, 30 Sep 2022 03:17:25 -0700 Message-Id: <79ca03ac97aa28df839f8818a4486407bc71c605.1664530907.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata FNAME(sync_page) in arch/x86/kvm/mmu/paging_tmpl.h assumes that the initial shadow page table entry (SPTE) is zero. Remove the assumption by using SHADOW_NONPRESENT_VALUE, which will be updated from 0 to a non-zero value by a later patch. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu/paging_tmpl.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 39e0205e7300..4586de2cfe57 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -1036,7 +1036,8 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, st= ruct kvm_mmu_page *sp) gpa_t pte_gpa; gfn_t gfn; =20 - if (!sp->spt[i]) + /* spt[i] still has the initial value from shadow page table allocation */ + if (sp->spt[i] =3D=3D SHADOW_NONPRESENT_VALUE) continue; =20 pte_gpa =3D first_pte_gpa + i * sizeof(pt_element_t); --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E149FC433FE for ; Fri, 30 Sep 2022 10:22:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232095AbiI3KWC (ORCPT ); Fri, 30 Sep 2022 06:22:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33604 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231494AbiI3KTD (ORCPT ); Fri, 30 Sep 2022 06:19:03 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 119C8166F22; Fri, 30 Sep 2022 03:19:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533142; x=1696069142; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=qVhCkIkTPxs7McjR4Yz/H0BYiUKA/fAH3z4d7Nm/1as=; b=MvvoEo4ibjkmVNveXeRIbSI/VF6azu4Uoy83vegrg7FL9Yb0fAPoPZx4 iwJVww/RfOTfSuQ/SYgf8coT4FCYYc2be0SvrBEQIW+pvdDzbp+xKhy/B EbXE+0yN3Wq079Jb5klLYmCKZ7scRlLFvShdYQyZs0WUuzAwwIZz/dVXf EuGHV1cq26Hiw/OY/fHxjFo6XCOzd8CJFDCLEfsY05Y0LkVWq4t+uRKfb qOcXW5NR2oLoLsdR2UsS6UCsOBOa2fEJLDwf9ak6tPPY5JBbhYOzoeZB+ aa2+TZydEs+bx84u0S4QB0Xx6oOgR94K6MdW/zAlc7sHkP4pFqhEOHlVm Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870088" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200";
d="scan'208";a="726807606" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:56 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , Sean Christopherson Subject: [PATCH v9 032/105] KVM: x86/mmu: Allow non-zero value for non-present SPTE and removed SPTE Date: Fri, 30 Sep 2022 03:17:26 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson For TD guest, the current way to emulate MMIO doesn't work any more, as KVM is not able to access the private memory of TD guest and do the emulation. Instead, TD guest expects to receive #VE when it accesses the MMIO and then it can explicitly makes hypercall to KVM to get the expected information. To achieve this, the TDX module always enables "EPT-violation #VE" in the VMCS control. And accordingly, KVM needs to configure the MMIO spte to trigger EPT violation (instead of misconfiguration) and at the same time, also clear the "suppress #VE" bit so the TD guest can get the #VE instead of causing actual EPT violation to KVM. In order for KVM to be able to have chance to set up the correct SPTE for MMIO for TD guest, the default non-present SPTE must have the "suppress guest accesses the MMIO. Also, when TD guest accesses the actual shared memory, it should continue to trigger EPT violation to the KVM instead of receiving the #VE (the TDX module guarantees KVM will receive EPT violation for private memory access). This means for the shared memory, the SPTE also must have the "suppress #VE" bit set for the non-present SPTE. Add "suppress VE" bit (bit 63) to SHADOW_NONPRESENT_VALUE and REMOVED_SPTE. Unconditionally set the "suppress #VE" bit (which is bit 63) for both AMD and Intel as: 1) AMD hardware doesn't use this bit when present bit is off; 2) for normal VMX guest, KVM never enables the "EPT-violation #VE" in VMCS control and "suppress #VE" bit is ignored by hardware. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/vmx.h | 1 + arch/x86/kvm/mmu/spte.c | 4 +++- arch/x86/kvm/mmu/spte.h | 22 +++++++++++++++++++++- arch/x86/kvm/mmu/tdp_mmu.c | 8 ++++++++ 4 files changed, 33 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index c371ef695fcc..6231ef005a50 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -511,6 +511,7 @@ enum vmcs_field { #define VMX_EPT_IPAT_BIT (1ull << 6) #define VMX_EPT_ACCESS_BIT (1ull << 8) #define VMX_EPT_DIRTY_BIT (1ull << 9) +#define VMX_EPT_SUPPRESS_VE_BIT (1ull << 63) #define VMX_EPT_RWX_MASK (VMX_EPT_READABLE_MASK | = \ VMX_EPT_WRITABLE_MASK | \ VMX_EPT_EXECUTABLE_MASK) diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 2e08b2a45361..0b97a045c5f0 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -419,7 +419,9 @@ void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_e= xec_only) shadow_dirty_mask =3D has_ad_bits ? VMX_EPT_DIRTY_BIT : 0ull; shadow_nx_mask =3D 0ull; shadow_x_mask =3D VMX_EPT_EXECUTABLE_MASK; - shadow_present_mask =3D has_exec_only ? 
0ull : VMX_EPT_READABLE_MASK; + /* VMX_EPT_SUPPRESS_VE_BIT is needed for W or X violation. */ + shadow_present_mask =3D + (has_exec_only ? 0ull : VMX_EPT_READABLE_MASK) | VMX_EPT_SUPPRESS_VE_BIT; /* * EPT overrides the host MTRRs, and so KVM must program the desired * memtype directly into the SPTEs. Note, this mask is just the mask diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 42ecaa75da15..7e0f79e8f45b 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -148,7 +148,22 @@ static_assert(MMIO_SPTE_GEN_LOW_BITS =3D=3D 8 && MMIO_= SPTE_GEN_HIGH_BITS =3D=3D 11); =20 #define MMIO_SPTE_GEN_MASK GENMASK_ULL(MMIO_SPTE_GEN_LOW_BITS + MMIO_SPTE= _GEN_HIGH_BITS - 1, 0) =20 +/* + * non-present SPTE value for both VMX and SVM for TDP MMU. + * For SVM NPT, for non-present spte (bit 0 =3D 0), other bits are ignored. + * For VMX EPT, bit 63 is ignored if #VE is disabled. (EPT_VIOLATION_VE=3D= 0) + * bit 63 is #VE suppress if #VE is enabled. (EPT_VIOLATION_V= E=3D1) + * For TDX: + * Secure-EPT: TDX module sets EPT_VIOLATION_VE for Secure-EPT + * private EPT: "suppress #VE" bit is ignored. CPU doesn't walk it. + * conventional EPT: "suppress #VE" bit must be set to get EPT violation + */ +#ifdef CONFIG_X86_64 +#define SHADOW_NONPRESENT_VALUE BIT_ULL(63) +static_assert(!(SHADOW_NONPRESENT_VALUE & SPTE_MMU_PRESENT_MASK)); +#else #define SHADOW_NONPRESENT_VALUE 0ULL +#endif =20 extern u64 __read_mostly shadow_host_writable_mask; extern u64 __read_mostly shadow_mmu_writable_mask; @@ -189,13 +204,18 @@ extern u64 __read_mostly shadow_nonpresent_or_rsvd_ma= sk; * non-present intermediate value. Other threads which encounter this value * should not modify the SPTE. * + * For X86_64 case, SHADOW_NONPRESENT_VALUE, "suppress #VE" bit, is set be= cause + * "EPT violation #VE" in the secondary VM execution control may be enable= d. + * Because TDX module sets "EPT violation #VE" for TD, "suppress #VE" bit = for + * the conventional EPT needs to be set. + * * Use a semi-arbitrary value that doesn't set RWX bits, i.e. is not-prese= nt on * bot AMD and Intel CPUs, and doesn't set PFN bits, i.e. doesn't create a= L1TF * vulnerability. Use only low bits to avoid 64-bit immediates. * * Only used by the TDP MMU. */ -#define REMOVED_SPTE 0x5a0ULL +#define REMOVED_SPTE (SHADOW_NONPRESENT_VALUE | 0x5a0ULL) =20 /* Removed SPTEs must not be misconstrued as shadow present PTEs. */ static_assert(!(REMOVED_SPTE & SPTE_MMU_PRESENT_MASK)); diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 0a7edea7488e..af510dd31ebc 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -682,6 +682,14 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *= kvm, * overwrite the special removed SPTE value. No bookkeeping is needed * here since the SPTE is going from non-present to non-present. Use * the raw write helper to avoid an unnecessary check on volatile bits. + * + * Set non-present value to SHADOW_NONPRESENT_VALUE, rather than 0. + * It is because when TDX is enabled, TDX module always + * enables "EPT-violation #VE", so KVM needs to set + * "suppress #VE" bit in EPT table entries, in order to get + * real EPT violation, rather than TDVMCALL. KVM sets + * SHADOW_NONPRESENT_VALUE (which sets "suppress #VE" bit) so it + * can be set when EPT table entries are zapped. 
*/ __kvm_tdp_mmu_write_spte(iter->sptep, SHADOW_NONPRESENT_VALUE); =20 --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D1EBEC433FE for ; Fri, 30 Sep 2022 10:22:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232108AbiI3KWJ (ORCPT ); Fri, 30 Sep 2022 06:22:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33602 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231527AbiI3KTD (ORCPT ); Fri, 30 Sep 2022 06:19:03 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 38792166F02; Fri, 30 Sep 2022 03:19:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533142; x=1696069142; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=KuHqjRVUBksdUjeiSc9eKEgyTsBPP0Eliqw02r0hyTk=; b=CjXIXk4RKQkvyVl966rPfJ4uFaAYBr/4jhX7F9I1TPWhisSxkdTQbHrZ qJN0NIpiAIBJxgokHudJqv+8gvqmdYU1RgeUyZV9KAEiNZLOzGUqZTE3Z /arLy5cXMJ6oKIAAXXkmvLk8hlyjZa1xFXfutBMWLDU9a2RPIlG0Ht1+k sJ3n7+TWeR270Lk+mUQeIeLrz7wUwABRvCOq5X2eLlVpPzOb3Jn5qMMwA lX0L/BF54WsOfoCOlcDDnww8ADk2+gMm7iXRnIR6gcQXdjWCKRBOVz3f6 yjxYrAa+0KNIAbatXyvfrkBu0EU1rY5Yo2sVjylgBS0xPxwA8W059TE6N w==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870089" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870089" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:57 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807609" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807609" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:56 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 033/105] KVM: x86/mmu: Add Suppress VE bit to shadow_mmio_{value, mask} Date: Fri, 30 Sep 2022 03:17:27 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Because TDX will need shadow_mmio_mask to be VMX_SUPPRESS_VE | RWX and shadow_mmio_value to be 0, make the VMX EPT case use the same value as TDX for shadow_mmio_mask. For VMX, VMX_SUPPRESS_VE doesn't matter; adding the bit to shadow_mmio_{value, mask} doesn't affect the VMX logic. Note that shadow_mmio_value will become a per-VM value.
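To illustrate why this is safe for VMX (a minimal sketch for this changelog, not code from the diff; spte_matches_mmio() is a hypothetical helper): KVM recognizes an MMIO SPTE by a mask/value match, so setting VMX_EPT_SUPPRESS_VE_BIT in both shadow_mmio_mask and shadow_mmio_value leaves the VMX match unchanged, while TDX can later use shadow_mmio_mask =3D SUPPRESS_VE | RWX with shadow_mmio_value =3D 0.

	static inline bool spte_matches_mmio(u64 spte, u64 mmio_mask, u64 mmio_value)
	{
		/* An SPTE is treated as an MMIO SPTE iff the masked bits match. */
		return (spte & mmio_mask) =3D=3D mmio_value;
	}

Because the new bit is set in both the mask and the value, a VMX MMIO SPTE matches exactly when it did before.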
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu/spte.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 0b97a045c5f0..5d5c06d4fd89 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -437,8 +437,8 @@ void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_e= xec_only) * EPT Misconfigurations are generated if the value of bits 2:0 * of an EPT paging-structure entry is 110b (write/execute). */ - kvm_mmu_set_mmio_spte_mask(VMX_EPT_MISCONFIG_WX_VALUE, - VMX_EPT_RWX_MASK, 0); + kvm_mmu_set_mmio_spte_mask(VMX_EPT_MISCONFIG_WX_VALUE | VMX_EPT_SUPPRESS_= VE_BIT, + VMX_EPT_RWX_MASK | VMX_EPT_SUPPRESS_VE_BIT, 0); } EXPORT_SYMBOL_GPL(kvm_mmu_set_ept_masks); =20 --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D7EE7C433F5 for ; Fri, 30 Sep 2022 10:22:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232131AbiI3KWT (ORCPT ); Fri, 30 Sep 2022 06:22:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33560 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231417AbiI3KTE (ORCPT ); Fri, 30 Sep 2022 06:19:04 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 38855166F3C; Fri, 30 Sep 2022 03:19:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533142; x=1696069142; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ReTvh59xSNWb7+bHPDJ2zE4tCXa/zPkwj0z2c8vZTnk=; b=JSY9bj8q8eDp0eyh1IC1cAua6kvoRFI963X+C8TSTasHhxzSqdDAejMK TlRspiOr6criNlCeqKs0gYfVPsr8RGl0gBhI/FB02d4ESL8uNWw+TelgF 1n4st8vMxnZGicpuA3kaExh0NIs8A/ckmi0SjVwS7xg9+boPI3V0wl6Az VC+1eNTAuqcXrozX5vZQKU7KXXbUjKsv4utYHw/5MkBZeRyRnHeIZChJu 4r6A6WE4v4SPYW0uZFIUZmBSNaQRTnsAdK2styFPigtFalied8cTSRlF2 yBN8eKxj1ISiL1hNp4RMxURayk8pQfPHrZ1SJcpHelzDwL3nWr50gR90n w==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870090" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870090" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:57 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807613" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807613" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:57 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , Sean Christopherson Subject: [PATCH v9 034/105] KVM: x86/mmu: Track shadow MMIO value on a per-VM basis Date: Fri, 30 Sep 2022 03:17:28 -0700 Message-Id: <733b346ad4772d0dd7e64ccb451a1f57b8a2ce80.1664530907.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata TDX will use a 
different shadow PTE entry value for MMIO from VMX. Add members to kvm_arch and track value for MMIO per-VM instead of global variables. By using the per-VM EPT entry value for MMIO, the existing VMX logic is kept working. To untangle the logic to initialize shadow_mmio_access_mask, introduce a separate setter function. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/mmu.h | 1 + arch/x86/kvm/mmu/mmu.c | 7 ++++--- arch/x86/kvm/mmu/spte.c | 11 +++++++++-- arch/x86/kvm/mmu/spte.h | 4 ++-- arch/x86/kvm/mmu/tdp_mmu.c | 6 +++--- 6 files changed, 21 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index fc28bf9c0552..4d4794789c42 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1165,6 +1165,8 @@ struct kvm_arch { */ spinlock_t mmu_unsync_pages_lock; =20 + u64 shadow_mmio_value; + struct list_head assigned_dev_head; struct iommu_domain *iommu_domain; bool iommu_noncoherent; diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 000a0a6ac815..f86fb04fb7d7 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -101,6 +101,7 @@ static inline u8 kvm_get_shadow_phys_bits(void) } =20 void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_= mask); +void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value); void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask); void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only); =20 diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index ff8de361067c..fdd773ef9400 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -2409,7 +2409,7 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct k= vm_mmu_page *sp, return kvm_mmu_prepare_zap_page(kvm, child, invalid_list); } - } else if (is_mmio_spte(pte)) { + } else if (is_mmio_spte(kvm, pte)) { mmu_spte_clear_no_track(spte); } return 0; @@ -4068,7 +4068,7 @@ static int handle_mmio_page_fault(struct kvm_vcpu *vc= pu, u64 addr, bool direct) if (WARN_ON(reserved)) return -EINVAL; =20 - if (is_mmio_spte(spte)) { + if (is_mmio_spte(vcpu->kvm, spte)) { gfn_t gfn =3D get_mmio_spte_gfn(spte); unsigned int access =3D get_mmio_spte_access(spte); =20 @@ -4569,7 +4569,7 @@ static unsigned long get_cr3(struct kvm_vcpu *vcpu) static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn, unsigned int access) { - if (unlikely(is_mmio_spte(*sptep))) { + if (unlikely(is_mmio_spte(vcpu->kvm, *sptep))) { if (gfn !=3D get_mmio_spte_gfn(*sptep)) { mmu_spte_clear_no_track(sptep); return true; @@ -6052,6 +6052,7 @@ int kvm_mmu_init_vm(struct kvm *kvm) struct kvm_page_track_notifier_node *node =3D &kvm->arch.mmu_sp_tracker; int r; =20 + kvm->arch.shadow_mmio_value =3D shadow_mmio_value; INIT_LIST_HEAD(&kvm->arch.active_mmu_pages); INIT_LIST_HEAD(&kvm->arch.zapped_obsolete_pages); INIT_LIST_HEAD(&kvm->arch.lpage_disallowed_mmu_pages); diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 5d5c06d4fd89..8f468ee2b985 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -74,10 +74,10 @@ u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsi= gned int access) u64 spte =3D generation_mmio_spte_mask(gen); u64 gpa =3D gfn << PAGE_SHIFT; =20 - WARN_ON_ONCE(!shadow_mmio_value); + WARN_ON_ONCE(!vcpu->kvm->arch.shadow_mmio_value); =20 access &=3D shadow_mmio_access_mask; - spte |=3D shadow_mmio_value | access; + spte |=3D vcpu->kvm->arch.shadow_mmio_value | access; spte |=3D 
gpa | shadow_nonpresent_or_rsvd_mask; spte |=3D (gpa & shadow_nonpresent_or_rsvd_mask) << SHADOW_NONPRESENT_OR_RSVD_MASK_LEN; @@ -352,6 +352,7 @@ u64 mark_spte_for_access_track(u64 spte) void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_= mask) { BUG_ON((u64)(unsigned)access_mask !=3D access_mask); + WARN_ON(mmio_value & shadow_nonpresent_or_rsvd_lower_gfn_mask); =20 /* @@ -401,6 +402,12 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mm= io_mask, u64 access_mask) } EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask); =20 +void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value) +{ + kvm->arch.shadow_mmio_value =3D mmio_value; +} +EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_value); + void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask) { /* shadow_me_value must be a subset of shadow_me_mask */ diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 7e0f79e8f45b..82f0d5c08b77 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -241,9 +241,9 @@ static inline int spte_index(u64 *sptep) */ extern u64 __read_mostly shadow_nonpresent_or_rsvd_lower_gfn_mask; =20 -static inline bool is_mmio_spte(u64 spte) +static inline bool is_mmio_spte(struct kvm *kvm, u64 spte) { - return (spte & shadow_mmio_mask) =3D=3D shadow_mmio_value && + return (spte & shadow_mmio_mask) =3D=3D kvm->arch.shadow_mmio_value && likely(enable_mmio_caching); } =20 diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index af510dd31ebc..b80422ea798d 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -569,8 +569,8 @@ static void __handle_changed_spte(struct kvm *kvm, int = as_id, gfn_t gfn, * impact the guest since both the former and current SPTEs * are nonpresent. */ - if (WARN_ON(!is_mmio_spte(old_spte) && - !is_mmio_spte(new_spte) && + if (WARN_ON(!is_mmio_spte(kvm, old_spte) && + !is_mmio_spte(kvm, new_spte) && !is_removed_spte(new_spte))) pr_err("Unexpected SPTE change! Nonpresent SPTEs\n" "should not be replaced with another,\n" @@ -1094,7 +1094,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm= _vcpu *vcpu, } =20 /* If a MMIO SPTE is installed, the MMIO will need to be emulated. 
*/ - if (unlikely(is_mmio_spte(new_spte))) { + if (unlikely(is_mmio_spte(vcpu->kvm, new_spte))) { vcpu->stat.pf_mmio_spte_created++; trace_mark_mmio_spte(rcu_dereference(iter->sptep), iter->gfn, new_spte); --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5D139C433FE for ; Fri, 30 Sep 2022 10:22:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232119AbiI3KWP (ORCPT ); Fri, 30 Sep 2022 06:22:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33582 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231530AbiI3KTE (ORCPT ); Fri, 30 Sep 2022 06:19:04 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 38814166F2D; Fri, 30 Sep 2022 03:19:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533142; x=1696069142; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=egSdmBDWUZIcewFDKHSSDYx6qc0YZSUEykKZr8M2EK4=; b=fb+syh55GvAKKxqwqeD61sa2ZikpXmh/Zg34gCuVZrH/sZ6S4hnCtmHU viyd7rRGg+hBZrt38mzKP0ssyc7aA4ocGdp/Mz52BaN+Z1PFWzLpoQiMc tLzc+//c6THSTA2J8kk1XjBrBSqz95b12HkGhB/2uenVqYgsy3dYejFru NDHN1zLs3XgazphscCnaqdbBNnUGzzhZ4WtiIhe+sByV9HChC+YlqPDCj x2szuyfzRs5lcThcWb7A2dG3Py4aDdhP2LK267xH5a5CbRg98AocLpdcZ X63wILTxFp1j+i0zVoEePK4fWMN8pBOBPadctIbIbqRDKXv0hCmLbA48S w==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870091" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870091" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:57 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807616" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807616" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:57 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 035/105] KVM: TDX: Enable mmio spte caching always for TDX Date: Fri, 30 Sep 2022 03:17:29 -0700 Message-Id: <8eac5ca057eb23b851dadcbae39f267edf50d8d3.1664530907.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata TDX needs to set the shared SPTE for an MMIO GFN to !SUPPRESS_VE_BIT | !RWX so that the guest TD can get #VE and then issue TDG.VP.VMCALL. Always enable MMIO caching for TDX, regardless of the module parameter enable_mmio_caching.
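As a sketch of the intended encoding (an assumption spelled out from the description above; the TDX_SHARED_MMIO_* names are hypothetical and not added by this series): with both RWX and the "suppress #VE" bit clear in the shared-GPA MMIO SPTE, a guest access takes #VE inside the TD instead of exiting to KVM, and the TD then issues TDG.VP.VMCALL.

	/* Hypothetical names for the per-VM MMIO value/mask described above. */
	#define TDX_SHARED_MMIO_SPTE_VALUE	0ULL	/* !SUPPRESS_VE, !RWX: deliver #VE to the TD */
	#define TDX_SHARED_MMIO_SPTE_MASK	(VMX_EPT_SUPPRESS_VE_BIT | VMX_EPT_RWX_MASK)

This is why MMIO caching must stay enabled for TDX regardless of enable_mmio_caching: the #VE-based MMIO flow depends on installing such an MMIO SPTE.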
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu/mmu.c | 3 ++- arch/x86/kvm/mmu/spte.h | 2 +- arch/x86/kvm/mmu/tdp_mmu.c | 7 +++++++ 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index fdd773ef9400..f4d7432cd9fc 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3216,7 +3216,8 @@ static int handle_abnormal_pfn(struct kvm_vcpu *vcpu,= struct kvm_page_fault *fau * and only if L1's MAXPHYADDR is inaccurate with respect to * the hardware's). */ - if (unlikely(!enable_mmio_caching) || + if (unlikely(!enable_mmio_caching && + !kvm_gfn_shared_mask(vcpu->kvm)) || unlikely(fault->gfn > kvm_mmu_max_gfn())) return RET_PF_EMULATE; } diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 82f0d5c08b77..fecfdcb5f321 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -244,7 +244,7 @@ extern u64 __read_mostly shadow_nonpresent_or_rsvd_lowe= r_gfn_mask; static inline bool is_mmio_spte(struct kvm *kvm, u64 spte) { return (spte & shadow_mmio_mask) =3D=3D kvm->arch.shadow_mmio_value && - likely(enable_mmio_caching); + likely(enable_mmio_caching || kvm_gfn_shared_mask(kvm)); } =20 static inline bool is_shadow_present_pte(u64 pte) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index b80422ea798d..5ecb976ed954 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1863,6 +1863,13 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 = addr, u64 *sptes, =20 *root_level =3D vcpu->arch.mmu->root_role.level; =20 + /* + * mmio page fault isn't supported for protected guest because + * instructions in protected guest memory can't be parsed by VMM. + */ + if (WARN_ON_ONCE(kvm_gfn_shared_mask(vcpu->kvm))) + return leaf; + tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) { leaf =3D iter.level; sptes[leaf] =3D iter.old_spte; --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E3721C433F5 for ; Fri, 30 Sep 2022 10:22:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232142AbiI3KWW (ORCPT ); Fri, 30 Sep 2022 06:22:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33796 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231536AbiI3KTE (ORCPT ); Fri, 30 Sep 2022 06:19:04 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 637B315ED32; Fri, 30 Sep 2022 03:19:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533142; x=1696069142; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=kRqGXwvT6ecA6jYMlkVRsGMRjPs2V/ytrmrzFD9hsU0=; b=n2790kdtTG2QAjtg4hD2bJ2XWQnx4nNzcfUmtDo6sZ+NH7Sp4oVPZt2v BqMAfEOJf/EOHJomHLroU7w3MzXfuZmkV5hsBQllPYhw4Z48AFxlI1xwk w7Ikl2uQYbRTrxKoRuQElCKYZZgKpF1ijrIWnMJCrNK8/34cU4ZojDHhL d6FOOL26/SBDbi+4tDqZpNNlVuAxTfdjnUj54c2xvZkhq4rnjMivDuZzl kI7eFGtnbQVSsEM+FZOyMmPPPYBl/kMJvOmD+2P91xTsZuTSHmRSCMn0t OJECKcDNCHv/MpuILN5TmZ85CxWViM1YJkBaW+puOF4fmSLY7IbbJpEY/ A==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870093" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870093" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) 
by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:57 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807619" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807619" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:57 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 036/105] KVM: x86/mmu: Disallow fast page fault on private GPA Date: Fri, 30 Sep 2022 03:17:30 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata TDX requires TDX SEAMCALL to operate Secure EPT instead of direct memory access and TDX SEAMCALL is heavy operation. Fast page fault on private GPA doesn't make sense. Disallow fast page fault on private GPA. Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/mmu/mmu.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index f4d7432cd9fc..2fd70876d346 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3225,8 +3225,16 @@ static int handle_abnormal_pfn(struct kvm_vcpu *vcpu= , struct kvm_page_fault *fau return RET_PF_CONTINUE; } =20 -static bool page_fault_can_be_fast(struct kvm_page_fault *fault) +static bool page_fault_can_be_fast(struct kvm *kvm, struct kvm_page_fault = *fault) { + /* + * TDX private mapping doesn't support fast page fault because the EPT + * entry is read/written with TDX SEAMCALLs instead of direct memory + * access. + */ + if (kvm_is_private_gpa(kvm, fault->addr)) + return false; + /* * Page faults with reserved bits set, i.e. 
faults on MMIO SPTEs, only * reach the common page fault handler if the SPTE has an invalid MMIO @@ -3336,7 +3344,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, str= uct kvm_page_fault *fault) u64 *sptep =3D NULL; uint retry_count =3D 0; =20 - if (!page_fault_can_be_fast(fault)) + if (!page_fault_can_be_fast(vcpu->kvm, fault)) return ret; =20 walk_shadow_page_lockless_begin(vcpu); --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B4B47C433F5 for ; Fri, 30 Sep 2022 10:22:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232173AbiI3KWa (ORCPT ); Fri, 30 Sep 2022 06:22:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33818 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231542AbiI3KTE (ORCPT ); Fri, 30 Sep 2022 06:19:04 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 94120166F04; Fri, 30 Sep 2022 03:19:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533143; x=1696069143; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=X+8KgY5TeqLFzYI/x6X9VtsO1xS9mi4pv83tPlKvJjE=; b=PnrZUJmq/oB0Pt8YfngUI7+aGtF2ELl9uHucGrP4GDzK4HM+qmkDkn8Y fjc4NGbG+6tZ++xduvnq6xwlxVghst9XAqslQT4vLEw9ZP78xE6dPojXO OyHDJshGhYl45DJ4LFAl70cPnRAx0ATxU97jx83d9U5CwwRIUln0uswbR C6tpJ5hLctuwcwaIhb/cjcd3JmpMPR/pH701fhQK8hOj7RznVm0PArv21 hUDKE/jHM1KZsf8vk4W7xp9+1QbGBQSxu6K5HeAcwYHkJGo+PhmS0dCQ4 3MvsswfNG31k2HhHqXMgl1gB+0F0n7NEQIqaOQ869iwuOmc9Cn0OLOkRy A==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870094" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870094" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:57 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807622" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807622" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:57 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , Sean Christopherson Subject: [PATCH v9 037/105] KVM: x86/mmu: Allow per-VM override of the TDP max page level Date: Fri, 30 Sep 2022 03:17:31 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson TDX requires special handling to support large private pages. For simplicity, only support 4K pages for the TD guest for now. Add per-VM maximum page level support to allow different maximum page sizes for TD guests and conventional VMX guests.
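For example, a later TDX patch could clamp the new per-VM limit at VM creation, while VMX VMs keep the KVM_MAX_HUGEPAGE_LEVEL default from kvm_mmu_init_vm(); a minimal sketch (tdx_vm_init() is an assumed hook name, not part of this patch):

	static int tdx_vm_init(struct kvm *kvm)
	{
		/* Only 4K pages are supported for the TD guest for now. */
		kvm->arch.tdp_max_page_level =3D PG_LEVEL_4K;
		return 0;
	}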
Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/mmu/mmu.c | 1 + arch/x86/kvm/mmu/mmu_internal.h | 2 +- 3 files changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 4d4794789c42..122e1baef012 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1150,6 +1150,7 @@ struct kvm_arch { unsigned long n_requested_mmu_pages; unsigned long n_max_mmu_pages; unsigned int indirect_shadow_pages; + int tdp_max_page_level; u8 mmu_valid_gen; struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES]; struct list_head active_mmu_pages; diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 2fd70876d346..97d575f787cc 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -6083,6 +6083,7 @@ int kvm_mmu_init_vm(struct kvm *kvm) kvm->arch.split_desc_cache.kmem_cache =3D pte_list_desc_cache; kvm->arch.split_desc_cache.gfp_zero =3D __GFP_ZERO; =20 + kvm->arch.tdp_max_page_level =3D KVM_MAX_HUGEPAGE_LEVEL; return 0; } =20 diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_interna= l.h index b27d5ae01cd8..486d719ca2e1 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -277,7 +277,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu= *vcpu, gpa_t cr2_or_gpa, .nx_huge_page_workaround_enabled =3D is_nx_huge_page_enabled(vcpu->kvm), =20 - .max_level =3D KVM_MAX_HUGEPAGE_LEVEL, + .max_level =3D vcpu->kvm->arch.tdp_max_page_level, .req_level =3D PG_LEVEL_4K, .goal_level =3D PG_LEVEL_4K, }; --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4C083C433F5 for ; Fri, 30 Sep 2022 10:22:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232226AbiI3KWo (ORCPT ); Fri, 30 Sep 2022 06:22:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33596 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231546AbiI3KTG (ORCPT ); Fri, 30 Sep 2022 06:19:06 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0F34816DDFB; Fri, 30 Sep 2022 03:19:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533144; x=1696069144; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=VNsEassfauCiOCjoXu6Lbz/ViiveLKwAKldsb4tWDIs=; b=QnBX2ifrAVR4DAaJPdsYUHB1VzIk8SRlBnqhUN7E36PMjfu3+wc4hRdL LXyrL8R249yWGlpG9ecndiRm9q2rWvfaJTUkDT/t/rMfTOOAYBlyjMI6H u3fLyKbVloacu7HHITm3GuRsudluYmQh6ibnJWnFxfBALapnduow/Nzij 2yKmROvPspPXLpm1Pm8QI8a3cAoRE48788KM5XNEcTO0f/epl1rvpVn1n xG/yszMFk/XSD6LQzdrpdkuvzxmuqzEAkC0NhX5dVQWBPHl6QPPt1KODr rDNmLWES3UvbQ3cQFDQ0OPkqqEot6sTOo5CCF6e2SSrzKi+19D/ekxs2l w==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870095" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870095" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:57 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807625" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807625" Received: 
from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:57 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 038/105] KVM: VMX: Introduce test mode related to EPT violation VE Date: Fri, 30 Sep 2022 03:17:32 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata To support TDX, KVM is enhanced to operate with #VE. For TDX, KVM programs to inject #VE conditionally and set #VE suppress bit in EPT entry. For VMX case, #VE isn't used. If #VE happens for VMX, it's a bug. To be defensive (test that VMX case isn't broken), introduce option ept_violation_ve_test and when it's set, set error. Suggested-by: Paolo Bonzini Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/vmx.h | 12 +++++++ arch/x86/kvm/vmx/vmcs.h | 5 +++ arch/x86/kvm/vmx/vmx.c | 68 +++++++++++++++++++++++++++++++++++++- arch/x86/kvm/vmx/vmx.h | 3 ++ 4 files changed, 87 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 6231ef005a50..f0f8eecf55ac 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -68,6 +68,7 @@ #define SECONDARY_EXEC_ENCLS_EXITING VMCS_CONTROL_BIT(ENCLS_EXITING) #define SECONDARY_EXEC_RDSEED_EXITING VMCS_CONTROL_BIT(RDSEED_EXITING) #define SECONDARY_EXEC_ENABLE_PML VMCS_CONTROL_BIT(PAGE_MOD_= LOGGING) +#define SECONDARY_EXEC_EPT_VIOLATION_VE VMCS_CONTROL_BIT(EPT_VIOLATION_VE) #define SECONDARY_EXEC_PT_CONCEAL_VMX VMCS_CONTROL_BIT(PT_CONCEAL_VMX) #define SECONDARY_EXEC_XSAVES VMCS_CONTROL_BIT(XSAVES) #define SECONDARY_EXEC_MODE_BASED_EPT_EXEC VMCS_CONTROL_BIT(MODE_BASED_EPT= _EXEC) @@ -223,6 +224,8 @@ enum vmcs_field { VMREAD_BITMAP_HIGH =3D 0x00002027, VMWRITE_BITMAP =3D 0x00002028, VMWRITE_BITMAP_HIGH =3D 0x00002029, + VE_INFORMATION_ADDRESS =3D 0x0000202A, + VE_INFORMATION_ADDRESS_HIGH =3D 0x0000202B, XSS_EXIT_BITMAP =3D 0x0000202C, XSS_EXIT_BITMAP_HIGH =3D 0x0000202D, ENCLS_EXITING_BITMAP =3D 0x0000202E, @@ -628,4 +631,13 @@ enum vmx_l1d_flush_state { =20 extern enum vmx_l1d_flush_state l1tf_vmx_mitigation; =20 +struct vmx_ve_information { + u32 exit_reason; + u32 delivery; + u64 exit_qualification; + u64 guest_linear_address; + u64 guest_physical_address; + u16 eptp_index; +}; + #endif diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h index ac290a44a693..9277676057a7 100644 --- a/arch/x86/kvm/vmx/vmcs.h +++ b/arch/x86/kvm/vmx/vmcs.h @@ -140,6 +140,11 @@ static inline bool is_nm_fault(u32 intr_info) return is_exception_n(intr_info, NM_VECTOR); } =20 +static inline bool is_ve_fault(u32 intr_info) +{ + return is_exception_n(intr_info, VE_VECTOR); +} + /* Undocumented: icebp/int1 */ static inline bool is_icebp(u32 intr_info) { diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index b53ffd367f51..f1e25e4097e1 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -126,6 +126,9 @@ module_param(error_on_inconsistent_vmcs_config, bool, 0= 444); static bool __read_mostly dump_invalid_vmcs =3D 0; module_param(dump_invalid_vmcs, bool, 0644); =20 +static bool __read_mostly ept_violation_ve_test; 
+module_param(ept_violation_ve_test, bool, 0444); + #define MSR_BITMAP_MODE_X2APIC 1 #define MSR_BITMAP_MODE_X2APIC_APICV 2 =20 @@ -783,6 +786,13 @@ void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu) =20 eb =3D (1u << PF_VECTOR) | (1u << UD_VECTOR) | (1u << MC_VECTOR) | (1u << DB_VECTOR) | (1u << AC_VECTOR); + /* + * #VE isn't used for VMX, but for TDX. To test against unexpected + * change related to #VE for VMX, intercept unexpected #VE and warn on + * it. + */ + if (ept_violation_ve_test) + eb |=3D 1u << VE_VECTOR; /* * Guest access to VMware backdoor ports could legitimately * trigger #GP because of TSS I/O permission bitmap. @@ -2647,6 +2657,8 @@ static int setup_vmcs_config(struct vmcs_config *vmcs= _conf, SECONDARY_EXEC_NOTIFY_VM_EXITING; if (cpu_has_sgx()) opt2 |=3D SECONDARY_EXEC_ENCLS_EXITING; + if (ept_violation_ve_test) + opt2 |=3D SECONDARY_EXEC_EPT_VIOLATION_VE; if (adjust_vmx_controls(min2, opt2, MSR_IA32_VMX_PROCBASED_CTLS2, &_cpu_based_2nd_exec_control) < 0) @@ -2681,6 +2693,7 @@ static int setup_vmcs_config(struct vmcs_config *vmcs= _conf, return -EIO; =20 vmx_cap->ept =3D 0; + _cpu_based_2nd_exec_control &=3D ~SECONDARY_EXEC_EPT_VIOLATION_VE; } if (!(_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_VPID) && vmx_cap->vpid) { @@ -4520,6 +4533,7 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx= *vmx) exec_control &=3D ~SECONDARY_EXEC_ENABLE_VPID; if (!enable_ept) { exec_control &=3D ~SECONDARY_EXEC_ENABLE_EPT; + exec_control &=3D ~SECONDARY_EXEC_EPT_VIOLATION_VE; enable_unrestricted_guest =3D 0; } if (!enable_unrestricted_guest) @@ -4647,8 +4661,40 @@ static void init_vmcs(struct vcpu_vmx *vmx) =20 exec_controls_set(vmx, vmx_exec_control(vmx)); =20 - if (cpu_has_secondary_exec_ctrls()) + if (cpu_has_secondary_exec_ctrls()) { secondary_exec_controls_set(vmx, vmx_secondary_exec_control(vmx)); + if (secondary_exec_controls_get(vmx) & + SECONDARY_EXEC_EPT_VIOLATION_VE) { + if (!vmx->ve_info) { + /* ve_info must be page aligned. */ + struct page *page; + + BUILD_BUG_ON(sizeof(*vmx->ve_info) > PAGE_SIZE); + page =3D alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); + if (page) + vmx->ve_info =3D page_to_virt(page); + } + if (vmx->ve_info) { + /* + * Allow #VE delivery. CPU sets this field to + * 0xFFFFFFFF on #VE delivery. Another #VE can + * occur only if software clears the field. + */ + vmx->ve_info->delivery =3D 0; + vmcs_write64(VE_INFORMATION_ADDRESS, + __pa(vmx->ve_info)); + } else { + /* + * Because SECONDARY_EXEC_EPT_VIOLATION_VE is + * used only when ept_violation_ve_test is true, + * it's okay to go with the bit disabled. + */ + pr_err("Failed to allocate ve_info. disabling EPT_VIOLATION_VE.\n"); + secondary_exec_controls_clearbit(vmx, + SECONDARY_EXEC_EPT_VIOLATION_VE); + } + } + } =20 if (cpu_has_tertiary_exec_ctrls()) tertiary_exec_controls_set(vmx, vmx_tertiary_exec_control(vmx)); @@ -5128,6 +5174,12 @@ static int handle_exception_nmi(struct kvm_vcpu *vcp= u) if (is_invalid_opcode(intr_info)) return handle_ud(vcpu); =20 + /* + * #VE isn't supposed to happen. 
Warn and bug the VM if one is encountered, as it indicates a KVM or CPU bug. + */ + if (KVM_BUG_ON(is_ve_fault(intr_info), vcpu->kvm)) + return -EIO; + error_code =3D 0; if (intr_info & INTR_INFO_DELIVER_CODE_MASK) error_code =3D vmcs_read32(VM_EXIT_INTR_ERROR_CODE); @@ -6314,6 +6366,18 @@ void dump_vmcs(struct kvm_vcpu *vcpu) if (secondary_exec_control & SECONDARY_EXEC_ENABLE_VPID) pr_err("Virtual processor ID =3D 0x%04x\n", vmcs_read16(VIRTUAL_PROCESSOR_ID)); + if (secondary_exec_control & SECONDARY_EXEC_EPT_VIOLATION_VE) { + struct vmx_ve_information *ve_info; + + pr_err("VE info address =3D 0x%016llx\n", + vmcs_read64(VE_INFORMATION_ADDRESS)); + ve_info =3D __va(vmcs_read64(VE_INFORMATION_ADDRESS)); + pr_err("ve_info: 0x%08x 0x%08x 0x%016llx 0x%016llx 0x%016llx 0x%04x\n", + ve_info->exit_reason, ve_info->delivery, + ve_info->exit_qualification, + ve_info->guest_linear_address, + ve_info->guest_physical_address, ve_info->eptp_index); + } } =20 /* @@ -7310,6 +7374,8 @@ void vmx_vcpu_free(struct kvm_vcpu *vcpu) free_vpid(vmx->vpid); nested_vmx_free_vcpu(vcpu); free_loaded_vmcs(vmx->loaded_vmcs); + if (vmx->ve_info) + free_page((unsigned long)vmx->ve_info); } =20 int vmx_vcpu_create(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index c9fb46e570b0..47240671535a 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -359,6 +359,9 @@ struct vcpu_vmx { DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS); DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS); } shadow_msr_intercept; + + /* ve_info must be page aligned. */ + struct vmx_ve_information *ve_info; }; =20 struct kvm_vmx { --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1901EC4332F for ; Fri, 30 Sep 2022 10:22:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231883AbiI3KWs (ORCPT ); Fri, 30 Sep 2022 06:22:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33836 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231405AbiI3KTF (ORCPT ); Fri, 30 Sep 2022 06:19:05 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0F62B16FFB5; Fri, 30 Sep 2022 03:19:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533144; x=1696069144; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=9+zynqxUnbt7L2tmrykvNHmz+Wt/xr8Fj4lA+9l+hKk=; b=kue4fVw1SbVSH/wT1duuB6t7p1UD6eHJ7eZUdUeG25GEz0YGnIO4sC3n aoumRjLY87LFkw6bITU41PJZ88DgfZtw8WFvVytlbQnbJ68CIXHoBspBL eEEBgccnqeaPEEzpoC6yG9HVEjoXc6LeIYUNtETpMB5iOc9WnNwWedyJp BWvZJAauFuvGVuV5S3uCwFwz5VBlWBt0isYBOTT2/T9LQrMzLQR9OWiPc +4e5hE8K/44GMDuWfcCB5cIt8NcAL2vyjbTNna8VeQfl0u8AEnZ3D2LQ+ WDr539xtduec3fjXb4QDENKMIXsX8xriM7OrzYR798XgF12FguThCDSTV Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870096" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870096" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:58 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807628" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807628" Received: from ls.sc.intel.com (HELO localhost)
([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:57 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 039/105] [MARKER] The start of TDX KVM patch series: KVM TDP MMU hooks Date: Fri, 30 Sep 2022 03:17:33 -0700 Message-Id: <8f5c1cefd2d6f0446e0e21b8a43f775d322d9414.1664530907.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This empty commit is to mark the start of patch series of KVM TDP MMU hooks. Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/intel-tdx-layer-status.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentat= ion/virt/kvm/intel-tdx-layer-status.rst index df003d2ed89e..d5cace00c433 100644 --- a/Documentation/virt/kvm/intel-tdx-layer-status.rst +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -25,6 +25,6 @@ Patch Layer status * TD vcpu interrupts/exit/hypercall: Not yet =20 * KVM MMU GPA shared bits: Applied -* KVM TDP refactoring for TDX: Applying -* KVM TDP MMU hooks: Not yet +* KVM TDP refactoring for TDX: Applied +* KVM TDP MMU hooks: Applying * KVM TDP MMU MapGPA: Not yet --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3C863C433F5 for ; Fri, 30 Sep 2022 10:22:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232194AbiI3KWf (ORCPT ); Fri, 30 Sep 2022 06:22:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33580 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231543AbiI3KTF (ORCPT ); Fri, 30 Sep 2022 06:19:05 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0F4C916EABD; Fri, 30 Sep 2022 03:19:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533144; x=1696069144; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=wamxAOh4thSrU2qmpuymJBKWtZQH6fNwm0ilxwILaF4=; b=h1ZvlFpS2Q2jqVnndUlKB5m3bkyjUj+vXFfutP2sN/DnDYapf6XCviH5 RvbZYipdk+2p2P6LPu2ATUNTc1Pblo5dBNcYyjz008f8k3yL5FamMXpV2 ExJ7K5NjvYsylVtTJqgggb7wqsc3qPmEOvfIzmpMXPfUhoeEZBWFY+aKw FFCL2LxOY+nbUqQwniEmVzDNj3HiP4ABjEgL2C9ignG4fwyTHqM2/3v03 9LSp1aas6ZRDlBlWKDYZJPVX1VhvEHRCeKo7guCrunNIOuVPI6WcTTmIj GisVQ2QjaQTlE6yC91QLDIGnAr0UZgHOimmYrj0T7H/XrOBimO6hEMnqD Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870097" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870097" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:58 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807631" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807631" Received: from ls.sc.intel.com (HELO localhost) 
([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:58 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 040/105] KVM: x86/tdp_mmu: refactor kvm_tdp_mmu_map() Date: Fri, 30 Sep 2022 03:17:34 -0700 Message-Id: <6dbf687ab23f696fa5591818769ee90b0201fe3f.1664530907.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Factor out non-leaf SPTE population logic from kvm_tdp_mmu_map(). MapGPA hypercall needs to populate non-leaf SPTE to record which GPA, private or shared, is allowed in the leaf EPT entry. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu/tdp_mmu.c | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 5ecb976ed954..9e7b18c3f3e3 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1145,6 +1145,24 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct t= dp_iter *iter, return 0; } =20 +static int tdp_mmu_populate_nonleaf(struct kvm_vcpu *vcpu, struct tdp_iter= *iter, + bool account_nx) +{ + struct kvm_mmu_page *sp; + int ret; + + KVM_BUG_ON(is_shadow_present_pte(iter->old_spte), vcpu->kvm); + KVM_BUG_ON(is_removed_spte(iter->old_spte), vcpu->kvm); + + sp =3D tdp_mmu_alloc_sp(vcpu); + tdp_mmu_init_child_sp(sp, iter); + + ret =3D tdp_mmu_link_sp(vcpu->kvm, iter, sp, account_nx, true); + if (ret) + tdp_mmu_free_sp(sp); + return ret; +} + /* * Handle a TDP page fault (NPT/EPT violation/misconfiguration) by install= ing * page tables and SPTEs to translate the faulting guest physical address. 
@@ -1153,7 +1171,6 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm= _page_fault *fault) { struct kvm_mmu *mmu =3D vcpu->arch.mmu; struct tdp_iter iter; - struct kvm_mmu_page *sp; int ret; =20 kvm_mmu_hugepage_adjust(vcpu, fault); @@ -1199,13 +1216,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kv= m_page_fault *fault) if (is_removed_spte(iter.old_spte)) break; =20 - sp =3D tdp_mmu_alloc_sp(vcpu); - tdp_mmu_init_child_sp(sp, &iter); - - if (tdp_mmu_link_sp(vcpu->kvm, &iter, sp, account_nx, true)) { - tdp_mmu_free_sp(sp); + if (tdp_mmu_populate_nonleaf(vcpu, &iter, account_nx)) break; - } } } =20 --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1625DC433FE for ; Fri, 30 Sep 2022 10:22:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232208AbiI3KWj (ORCPT ); Fri, 30 Sep 2022 06:22:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33702 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231547AbiI3KTG (ORCPT ); Fri, 30 Sep 2022 06:19:06 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1D60017DC09; Fri, 30 Sep 2022 03:19:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533144; x=1696069144; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=fQn1sUg1PJnp/f7Hv13ki6PqnSGj8iHcZFdWesoCdig=; b=NqowM/LSAmpaoV806wfXGfUixL0AVaGYCuC640U6sLvCD4HmiEmI6omb QIdCqdcHATZL2negftjW8olytSMIxG0WPmeKDDbSU3vT7VLSbCBPY+9Xc UKD1/G/ukTlPS7hkflGoHtvq8hp3BE3pAerAR5+c+WwUWieVUmcuouxqX G6T7urs42jO1GUlXXjchGDOdSoyHp5NvXX7QrCxt/s0RQKYOpmslHVKOo sIYzX2+gexKBhHgzIz2+Jk+5FSiaxxhchckkTIMtIHK5nrOUCdxIi/8ps tXhFu2RdVLp4BkmTpDVCm2r6qTzPBdcpH/6x32xKadUh6YdvmjFzWEN1A g==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870098" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870098" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:58 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807634" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807634" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:58 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 041/105] KVM: x86/tdp_mmu: Init role member of struct kvm_mmu_page at allocation Date: Fri, 30 Sep 2022 03:17:35 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Refactor tdp_mmu_alloc_sp() and tdp_mmu_init_sp and eliminate tdp_mmu_init_child_sp(). 
Currently, tdp_mmu_init_sp() (or tdp_mmu_init_child_sp()) sets kvm_mmu_page.role after tdp_mmu_alloc_sp() has allocated struct kvm_mmu_page and its page table page. This patch makes tdp_mmu_alloc_sp() initialize kvm_mmu_page.role instead of tdp_mmu_init_sp(). To handle private page tables, an is_private argument needs to be passed down. Given that the page level is already passed down, it would be cumbersome to add one more parameter about the sp. Instead, replace the level argument with union kvm_mmu_page_role. Thus the number of arguments doesn't increase, and more info about the sp can be passed down. For a private sp, a secure page table will also be allocated in addition to struct kvm_mmu_page and the page table (spt member). The allocation functions (tdp_mmu_alloc_sp() and __tdp_mmu_alloc_sp_for_split()) need to know if the allocation is for a conventional page table or a private page table. Pass union kvm_mmu_page_role to those functions and initialize the role member of struct kvm_mmu_page. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu/tdp_iter.h | 12 ++++++++++ arch/x86/kvm/mmu/tdp_mmu.c | 44 ++++++++++++++++--------------------- 2 files changed, 31 insertions(+), 25 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h index f0af385c56e0..9e56a5b1024c 100644 --- a/arch/x86/kvm/mmu/tdp_iter.h +++ b/arch/x86/kvm/mmu/tdp_iter.h @@ -115,4 +115,16 @@ void tdp_iter_start(struct tdp_iter *iter, struct kvm_= mmu_page *root, void tdp_iter_next(struct tdp_iter *iter); void tdp_iter_restart(struct tdp_iter *iter); =20 +static inline union kvm_mmu_page_role tdp_iter_child_role(struct tdp_iter = *iter) +{ + union kvm_mmu_page_role child_role; + struct kvm_mmu_page *parent_sp; + + parent_sp =3D sptep_to_sp(rcu_dereference(iter->sptep)); + + child_role =3D parent_sp->role; + child_role.level--; + return child_role; +} + #endif /* __KVM_X86_MMU_TDP_ITER_H */ diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 9e7b18c3f3e3..ef8b0c929944 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -271,22 +271,28 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct = kvm *kvm, kvm_mmu_page_as_id(_root) !=3D _as_id) { \ } else =20 -static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu) +static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu, + union kvm_mmu_page_role role) { struct kvm_mmu_page *sp; =20 sp =3D kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache); sp->spt =3D kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache); + sp->role =3D role; =20 return sp; } =20 static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, tdp_ptep_t sptep, - gfn_t gfn, union kvm_mmu_page_role role) + gfn_t gfn) { set_page_private(virt_to_page(sp->spt), (unsigned long)sp); =20 - sp->role =3D role; + /* + * role must be set before calling this function. At least role.level + * is not 0 (PG_LEVEL_NONE).
+ */ + WARN_ON_ONCE(!sp->role.word); sp->gfn =3D gfn; sp->ptep =3D sptep; sp->tdp_mmu_page =3D true; @@ -294,20 +300,6 @@ static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, t= dp_ptep_t sptep, trace_kvm_mmu_get_page(sp, true); } =20 -static void tdp_mmu_init_child_sp(struct kvm_mmu_page *child_sp, - struct tdp_iter *iter) -{ - struct kvm_mmu_page *parent_sp; - union kvm_mmu_page_role role; - - parent_sp =3D sptep_to_sp(rcu_dereference(iter->sptep)); - - role =3D parent_sp->role; - role.level--; - - tdp_mmu_init_sp(child_sp, iter->sptep, iter->gfn, role); -} - hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu) { union kvm_mmu_page_role role =3D vcpu->arch.mmu->root_role; @@ -326,8 +318,8 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vc= pu) goto out; } =20 - root =3D tdp_mmu_alloc_sp(vcpu); - tdp_mmu_init_sp(root, NULL, 0, role); + root =3D tdp_mmu_alloc_sp(vcpu, role); + tdp_mmu_init_sp(root, NULL, 0); =20 refcount_set(&root->tdp_mmu_root_count, 1); =20 @@ -1154,8 +1146,8 @@ static int tdp_mmu_populate_nonleaf(struct kvm_vcpu *= vcpu, struct tdp_iter *iter KVM_BUG_ON(is_shadow_present_pte(iter->old_spte), vcpu->kvm); KVM_BUG_ON(is_removed_spte(iter->old_spte), vcpu->kvm); =20 - sp =3D tdp_mmu_alloc_sp(vcpu); - tdp_mmu_init_child_sp(sp, iter); + sp =3D tdp_mmu_alloc_sp(vcpu, tdp_iter_child_role(iter)); + tdp_mmu_init_sp(sp, iter->sptep, iter->gfn); =20 ret =3D tdp_mmu_link_sp(vcpu->kvm, iter, sp, account_nx, true); if (ret) @@ -1423,7 +1415,7 @@ bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm, return spte_set; } =20 -static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(gfp_t gfp) +static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(gfp_t gfp, union = kvm_mmu_page_role role) { struct kvm_mmu_page *sp; =20 @@ -1433,6 +1425,7 @@ static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_sp= lit(gfp_t gfp) if (!sp) return NULL; =20 + sp->role =3D role; sp->spt =3D (void *)__get_free_page(gfp); if (!sp->spt) { kmem_cache_free(mmu_page_header_cache, sp); @@ -1446,6 +1439,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_spli= t(struct kvm *kvm, struct tdp_iter *iter, bool shared) { + union kvm_mmu_page_role role =3D tdp_iter_child_role(iter); struct kvm_mmu_page *sp; =20 /* @@ -1457,7 +1451,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_spli= t(struct kvm *kvm, * If this allocation fails we drop the lock and retry with reclaim * allowed. 
*/ - sp =3D __tdp_mmu_alloc_sp_for_split(GFP_NOWAIT | __GFP_ACCOUNT); + sp =3D __tdp_mmu_alloc_sp_for_split(GFP_NOWAIT | __GFP_ACCOUNT, role); if (sp) return sp; =20 @@ -1469,7 +1463,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_spli= t(struct kvm *kvm, write_unlock(&kvm->mmu_lock); =20 iter->yielded =3D true; - sp =3D __tdp_mmu_alloc_sp_for_split(GFP_KERNEL_ACCOUNT); + sp =3D __tdp_mmu_alloc_sp_for_split(GFP_KERNEL_ACCOUNT, role); =20 if (shared) read_lock(&kvm->mmu_lock); @@ -1488,7 +1482,7 @@ static int tdp_mmu_split_huge_page(struct kvm *kvm, s= truct tdp_iter *iter, const int level =3D iter->level; int ret, i; =20 - tdp_mmu_init_child_sp(sp, iter); + tdp_mmu_init_sp(sp, iter->sptep, iter->gfn); =20 /* * No need for atomics when writing to sp->spt since the page table has --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 84086C433F5 for ; Fri, 30 Sep 2022 10:23:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232310AbiI3KXM (ORCPT ); Fri, 30 Sep 2022 06:23:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33720 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231573AbiI3KTH (ORCPT ); Fri, 30 Sep 2022 06:19:07 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9109F1822E9; Fri, 30 Sep 2022 03:19:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533145; x=1696069145; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ETYhmV4ggDyJkx64nnGBESEDJmnFJiUzMUdZ17hT8mE=; b=F2SUyS7foBvFRIQp4hjuPZsnEAZ7q98WCVkHjQc4DlybZcE47jTTlRZd GkH5PuYqvkF1PuLCLrt2ls9kI+dReGLAWIoMZwMaaM6vH0V6WbwHyNVe0 WZTZu9l1L6pXIbE+rd7DzmRYf/APlntoPxlAJRoKMTUH4R6JBZcOu23d+ t+rBjoE6EkRH2Q/7c0dYJkbAgjx7AdnmZNLY05ub+0ZzL2pjUTRENXVvc PUYbBtDAjCKT/9qrP1Y0IgLkrQLosWQQmclDa4zjcwUuVNjdX3FHUxRdf X12zGRK8E4XIjVoe/oOc0LolLSh5nr0N1sf0p/mhXMbnWvMVaM6amlaDG w==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870099" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870099" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:58 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807637" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807637" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:58 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 042/105] KVM: x86/mmu: Require TDP MMU for TDX Date: Fri, 30 Sep 2022 03:17:36 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Require the TDP MMU for guest TDs, the so called "shadow" MMU does not support mapping guest private 
memory, i.e. does not support Secure-EPT. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu/tdp_mmu.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index ef8b0c929944..8f20c3857397 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -18,8 +18,12 @@ int kvm_mmu_init_tdp_mmu(struct kvm *kvm) { struct workqueue_struct *wq; =20 + /* + * Because only the TDP MMU supports TDX, require the TDP MMU for guest + * TDs. + */ if (!tdp_enabled || !READ_ONCE(tdp_mmu_enabled)) - return 0; + return kvm->arch.vm_type =3D=3D KVM_X86_TDX_VM ? -EOPNOTSUPP : 0; =20 wq =3D alloc_workqueue("kvm", WQ_UNBOUND|WQ_MEM_RECLAIM|WQ_CPU_INTENSIVE,= 0); if (!wq) --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AFD46C4332F for ; Fri, 30 Sep 2022 10:23:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232326AbiI3KXU (ORCPT ); Fri, 30 Sep 2022 06:23:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33722 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231584AbiI3KTJ (ORCPT ); Fri, 30 Sep 2022 06:19:09 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9115E166F0C; Fri, 30 Sep 2022 03:19:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533146; x=1696069146; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ghoRFIhfKliM5M6jnyrFU37Y6bcUJpDFK4TlqDcCIaw=; b=EBxRkM3H9DwtgqSkM/SEsr7CV687RcfoajPFGqbDWtOGOAL/6LeSK9iS /3goIP8u4B6ntypI+gLY9wYwCU4sh/HskA06xP/4Bdluq3JkAjO5U7xFb hpUg+Wfx63KcBoFcfN6v0dsUvdWMtrh7MN8ZPDvEwrTKVH/76R7J9+sJ7 L0XNb9HpYkSEb5wrcZ3G0s2dGZEZVXiASS1qitKOkda1Gu8LR28YI7ExV o449mnNxoCrNahMDNBGCXUbCX/Ti7RRHXAwnqdp/0QOw+asvTG3woTLyz 1vUUu4yL5izSIGz2yALh/D07cu99sIwpfNoe6y4Ezp7qMeXf+NcLAicqT A==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870100" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870100" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:58 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807640" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807640" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:58 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 043/105] KVM: x86/mmu: Add a new is_private member for union kvm_mmu_page_role Date: Fri, 30 Sep 2022 03:17:37 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Because TDX support introduces private mapping, add a new member in union kvm_mmu_page_role with access functions to check 
the member. Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm_host.h | 27 +++++++++++++++++++++++++++ arch/x86/kvm/mmu/mmu_internal.h | 11 +++++++++++ 2 files changed, 38 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 122e1baef012..5f18a6c16715 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -336,7 +336,12 @@ union kvm_mmu_page_role { unsigned ad_disabled:1; unsigned guest_mode:1; unsigned passthrough:1; +#ifdef CONFIG_KVM_MMU_PRIVATE + unsigned is_private:1; + unsigned :4; +#else unsigned :5; +#endif =20 /* * This is left at the top of the word so that @@ -348,6 +353,28 @@ union kvm_mmu_page_role { }; }; =20 +#ifdef CONFIG_KVM_MMU_PRIVATE +static inline bool kvm_mmu_page_role_is_private(union kvm_mmu_page_role ro= le) +{ + return !!role.is_private; +} + +static inline void kvm_mmu_page_role_set_private(union kvm_mmu_page_role *= role) +{ + role->is_private =3D 1; +} +#else +static inline bool kvm_mmu_page_role_is_private(union kvm_mmu_page_role ro= le) +{ + return false; +} + +static inline void kvm_mmu_page_role_set_private(union kvm_mmu_page_role *= role) +{ + WARN_ON_ONCE(1); +} +#endif + /* * kvm_mmu_extended_role complements kvm_mmu_page_role, tracking properties * relevant to the current MMU configuration. When loading CR0, CR4, or = EFER, diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_interna= l.h index 486d719ca2e1..222ee61a415a 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -142,6 +142,17 @@ static inline int kvm_mmu_page_as_id(struct kvm_mmu_pa= ge *sp) return kvm_mmu_role_as_id(sp->role); } =20 +static inline bool is_private_sp(const struct kvm_mmu_page *sp) +{ + return kvm_mmu_page_role_is_private(sp->role); +} + +static inline bool is_private_sptep(u64 *sptep) +{ + WARN_ON_ONCE(!sptep); + return is_private_sp(sptep_to_sp(sptep)); +} + bool kvm_mem_attr_is_mixed(struct kvm_memory_slot *slot, gfn_t gfn, int le= vel); =20 static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page = *sp) --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B096AC433FE for ; Fri, 30 Sep 2022 10:23:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231716AbiI3KXa (ORCPT ); Fri, 30 Sep 2022 06:23:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33974 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231591AbiI3KTJ (ORCPT ); Fri, 30 Sep 2022 06:19:09 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 91465184818; Fri, 30 Sep 2022 03:19:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533146; x=1696069146; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=eMAYrSW/5qYSJlJLH5ZEJvn4Gdu38EaJ747W91OgwZg=; b=F9VaNnk4DfY2wCdjBLgyo0Q5s8SVsq1bIGOpYiVA4AqeRzY6KqpqQjZD DAGnkJs4G42ZzTX/dOm0U2eXiOqV1yoVqcY8eSyWLqL958ohXLAH7O6Dm S35n2iHFgPO1rbclj/lbEfoh+OwgyXBiN6xsePftRNw330NlakarNMALy FFfgWEZzXuftyLvG3tgBwTmqYTdjanxWhGuqHpbxTWzpgviFH4NmG1v48 IDrw2HUVmwwHq6asEm3kAvxwMgCB3/5+qD3iSRf+dmdBHSreaCRLwTJ1R 
rrYoIU5visaebnR9MVVTaCmh8GJZe1FZuHqqnVkcD366bPEqTR2EqJgds g==;
X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870101"
X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870101"
Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:58 -0700
X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807643"
X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807643"
Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:58 -0700
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar
Subject: [PATCH v9 044/105] KVM: x86/mmu: Add a private pointer to struct kvm_mmu_page
Date: Fri, 30 Sep 2022 03:17:38 -0700
Message-Id: <09c731bc5c12ff3c0bb2929305206a8d8f87d68c.1664530907.git.isaku.yamahata@intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To:
References:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Isaku Yamahata

For a private GPA, the CPU refers to a private page table whose contents
are encrypted.  That table can only be operated on (e.g. updating/reading
a PTE entry) through dedicated APIs, and those APIs are expensive.

When KVM resolves a KVM page fault, it walks the page tables.  To reuse
the existing KVM MMU code and mitigate the heavy cost of directly walking
the protected (encrypted) page table, allocate one more page that mirrors
the protected page table so that the KVM MMU code can walk it directly.
Resolve the KVM page fault with the existing code, then do the additional
operations necessary for the protected page table.  To distinguish the
cases, the existing KVM page table is called a shared page table (i.e. not
associated with a protected page table), and a page table that is
associated with a protected page table is called a private page table.
The relationship is depicted below.

Add a private pointer to struct kvm_mmu_page for the protected page table,
and add helper functions to allocate/initialize/free a protected page
table page.

              KVM page fault                     |
                     |                           |
                     V                           |
        -------------+----------                 |
        |                      |                 |
        V                      V                 |
    shared GPA            private GPA            |
        |                      |                 |
        V                      V                 |
  shared PT root        private PT root          |  protected PT root
        |                      |                 |          |
        V                      V                 |          V
    shared PT             private PT ----propagate---->protected PT
        |                      |                 |          |
        |                      \-----------------+------\   |
        |                                        |      |   |
        V                                        |      V   V
  shared guest page                              |  private guest page
                                                 |
    non-encrypted memory                         |    encrypted memory
                                                 |

PT: page table
- Shared PT is visible to KVM and used by the CPU.
- Protected PT is used by the CPU but invisible to KVM.
- Private PT is visible to KVM but not used by the CPU.  It is used to
  propagate PT changes to the actual protected PT, which is used by the
  CPU.
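To make the mirroring scheme above concrete, here is a minimal stand-alone
C sketch (illustration only, not code from this series; the struct and
function names are invented, and the real KVM hooks appear in a later
patch):

  #include <stdint.h>
  #include <stddef.h>

  /*
   * The "private PT" is a plain-memory mirror that KVM can walk cheaply;
   * "protected PT" stands in for the encrypted Secure-EPT page owned by
   * the TDX module, reachable only through dedicated (expensive) APIs.
   */
  struct mirrored_pt {
          uint64_t kvm_pt[512];   /* private PT: walked by KVM, unused by the CPU */
          void    *protected_pt;  /* protected PT: used by the CPU, opaque to KVM */
  };

  /* Placeholder for the dedicated update API (a SEAMCALL in reality). */
  static void propagate_to_protected(void *protected_pt, size_t idx,
                                     uint64_t spte)
  {
          (void)protected_pt; (void)idx; (void)spte;
  }

  /* Update the walkable mirror, then propagate to the CPU-used copy. */
  static void mirror_set_spte(struct mirrored_pt *pt, size_t idx,
                              uint64_t spte)
  {
          pt->kvm_pt[idx] = spte;
          propagate_to_protected(pt->protected_pt, idx, spte);
  }

  int main(void)
  {
          static struct mirrored_pt pt;
          mirror_set_spte(&pt, 0, 0x1234);
          return 0;
  }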
Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm_host.h | 7 +++ arch/x86/kvm/mmu/mmu.c | 8 +++ arch/x86/kvm/mmu/mmu_internal.h | 90 +++++++++++++++++++++++++++++++-- arch/x86/kvm/mmu/tdp_mmu.c | 1 + 4 files changed, 102 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 5f18a6c16715..789a8de4028a 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -743,6 +743,13 @@ struct kvm_vcpu_arch { struct kvm_mmu_memory_cache mmu_shadow_page_cache; struct kvm_mmu_memory_cache mmu_shadowed_info_cache; struct kvm_mmu_memory_cache mmu_page_header_cache; + /* + * This cache is to allocate private page table. E.g. Secure-EPT used + * by the TDX module. Because the TDX module doesn't trust VMM and + * initializes the pages itself, KVM doesn't initialize them. Allocate + * pages with garbage and give them to the TDX module. + */ + struct kvm_mmu_memory_cache mmu_private_spt_cache; =20 /* * QEMU userspace and the guest each have their own FPU state. diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 97d575f787cc..7b38d9f4c457 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -653,6 +653,13 @@ static int mmu_topup_shadow_page_cache(struct kvm_vcpu= *vcpu) struct kvm_mmu_memory_cache *mc =3D &vcpu->arch.mmu_shadow_page_cache; int start, end, i, r; =20 + if (kvm_gfn_shared_mask(vcpu->kvm)) { + r =3D kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_private_spt_cache, + PT64_ROOT_MAX_LEVEL); + if (r) + return r; + } + start =3D kvm_mmu_memory_cache_nr_free_objects(mc); r =3D kvm_mmu_topup_memory_cache(mc, PT64_ROOT_MAX_LEVEL); =20 @@ -702,6 +709,7 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcp= u) kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache); + kvm_mmu_free_memory_cache(&vcpu->arch.mmu_private_spt_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache); } =20 diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_interna= l.h index 222ee61a415a..fb867270829e 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -87,7 +87,23 @@ struct kvm_mmu_page { int root_count; refcount_t tdp_mmu_root_count; }; - unsigned int unsync_children; + union { + struct { + unsigned int unsync_children; + /* + * Number of writes since the last time traversal + * visited this page. + */ + atomic_t write_flooding_count; + }; +#ifdef CONFIG_KVM_MMU_PRIVATE + /* + * Associated private shadow page table, e.g. Secure-EPT page + * passed to the TDX module. + */ + void *private_spt; +#endif + }; union { struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */ tdp_ptep_t ptep; @@ -109,9 +125,6 @@ struct kvm_mmu_page { int clear_spte_count; #endif =20 - /* Number of writes since the last time traversal visited this page. */ - atomic_t write_flooding_count; - #ifdef CONFIG_X86_64 /* Used for freeing the page asynchronously if it is a TDP MMU page. 
*/ struct rcu_head rcu_head; @@ -153,6 +166,75 @@ static inline bool is_private_sptep(u64 *sptep) return is_private_sp(sptep_to_sp(sptep)); } =20 +#ifdef CONFIG_KVM_MMU_PRIVATE +static inline void *kvm_mmu_private_spt(struct kvm_mmu_page *sp) +{ + return sp->private_spt; +} + +static inline void kvm_mmu_init_private_spt(struct kvm_mmu_page *sp, void = *private_spt) +{ + sp->private_spt =3D private_spt; +} + +static inline void kvm_mmu_alloc_private_spt(struct kvm_vcpu *vcpu, + struct kvm_mmu_memory_cache *private_spt_cache, + struct kvm_mmu_page *sp) +{ + /* + * vcpu =3D=3D NULL means non-root SPT: + * vcpu =3D=3D NULL is used to split a large SPT into smaller SPT. Root = SPT + * is not a large SPT. + */ + bool is_root =3D vcpu && + vcpu->arch.root_mmu.root_role.level =3D=3D sp->role.level; + + if (vcpu) + private_spt_cache =3D &vcpu->arch.mmu_private_spt_cache; + KVM_BUG_ON(!kvm_mmu_page_role_is_private(sp->role), vcpu->kvm); + if (is_root) + /* + * Because TDX module assigns root Secure-EPT page and set it to + * Secure-EPTP when TD vcpu is created, secure page table for + * root isn't needed. + */ + sp->private_spt =3D NULL; + else { + sp->private_spt =3D kvm_mmu_memory_cache_alloc(private_spt_cache); + /* + * Because mmu_private_spt_cache is topped up before staring kvm + * page fault resolving, the allocation above shouldn't fail. + */ + WARN_ON_ONCE(!sp->private_spt); + } +} + +static inline void kvm_mmu_free_private_spt(struct kvm_mmu_page *sp) +{ + if (sp->private_spt) + free_page((unsigned long)sp->private_spt); +} +#else +static inline void *kvm_mmu_private_spt(struct kvm_mmu_page *sp) +{ + return NULL; +} + +static inline void kvm_mmu_init_private_spt(struct kvm_mmu_page *sp, void = *private_spt) +{ +} + +static inline void kvm_mmu_alloc_private_spt(struct kvm_vcpu *vcpu, + struct kvm_mmu_memory_cache *private_spt_cache, + struct kvm_mmu_page *sp) +{ +} + +static inline void kvm_mmu_free_private_spt(struct kvm_mmu_page *sp) +{ +} +#endif + bool kvm_mem_attr_is_mixed(struct kvm_memory_slot *slot, gfn_t gfn, int le= vel); =20 static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page = *sp) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 8f20c3857397..9327a77d7434 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -71,6 +71,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) =20 static void tdp_mmu_free_sp(struct kvm_mmu_page *sp) { + kvm_mmu_free_private_spt(sp); free_page((unsigned long)sp->spt); kmem_cache_free(mmu_page_header_cache, sp); } --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EF799C433F5 for ; Fri, 30 Sep 2022 10:23:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232315AbiI3KXR (ORCPT ); Fri, 30 Sep 2022 06:23:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33912 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231579AbiI3KTI (ORCPT ); Fri, 30 Sep 2022 06:19:08 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 167FB188BE9; Fri, 30 Sep 2022 03:19:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533146; x=1696069146; 
h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding;
 bh=OnPzfJJFwhJZoW4FbnFPyVFYaq7RNXfLquHtT83+l1M=;
 b=jQv0rJi8297tGxV2Qkn0hnKljdl7R490rF6jvqRoUHqn75CLbP3NXNnm fK5uGt+8ilUc/qqVE1VBRYarXfw3Lji8tQriO7epbutc3CU3RFdgR7Z4s O+MwomWSsyc2fdVziKeVCHk/pKKTzMy45oZfCfE2xCUVFaErpniWPxR0S bZMwxCdB0g6WH/HOUHE34Y0uGRAw+uICPqrTpPqn0iqoR7HWyW9ZxkKae 9ZXaYKrCkbsvYRuMOoaC4xtmTv6ZaYWDr0jFR/aROHodRHytaV7nKj5S6 aT0/K8jeDe2o7m0isXJglFJ4e6LjIiSfJyO45LUiILbp1G7gLQZ9rfnHK Q==;
X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870105"
X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870105"
Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:59 -0700
X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807646"
X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807646"
Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:58 -0700
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , Sean Christopherson
Subject: [PATCH v9 045/105] KVM: x86/tdp_mmu: Don't zap private pages for unsupported cases
Date: Fri, 30 Sep 2022 03:17:39 -0700
Message-Id:
X-Mailer: git-send-email 2.25.1
In-Reply-To:
References:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Sean Christopherson

Architecturally, TDX supports only the write-back (WB) memory type for
private memory, so a (virtualized) memory type change is meaningless for
private memory.  Page migration is also not yet supported for TDX (TDX
architecturally supports page migration; this is a KVM and kernel
implementation limitation).

For memory type changes (MTRR virtualization and LAPIC page mapping
changes), pages are zapped by kvm_zap_gfn_range(); on the next KVM page
fault, the SPTE for the page is repopulated with the new memory type.
For page migration, pages are zapped by the MMU notifier; on the next KVM
page fault, the newly migrated page is populated.  Don't zap private
pages on unmapping in those two cases.

When deleting/moving a KVM memory slot, zap private pages; this typically
happens when tearing down the VM.  Don't invalidate private page tables,
i.e. zap only leaf SPTEs for a KVM MMU that has a shared bit mask.  The
existing kvm_tdp_mmu_invalidate_all_roots() relies on role.invalid under
a read lock of mmu_lock so that other vCPUs can operate on the KVM MMU
concurrently: it marks the root page table invalid and zaps the SPTEs of
the root page tables.  However, the TDX module doesn't allow unlinking a
protected root page table from the hardware and then allocating a new one
for it, i.e. it doesn't allow replacing a protected root page table.
Instead, zap only leaf SPTEs for a KVM MMU with the shared bit mask set.
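To summarize the zapping policy above, a minimal illustrative sketch (the
enum and helper are invented for illustration; the actual patch threads a
zap_private flag through the zap paths instead):

  #include <stdbool.h>

  enum zap_cause {
          ZAP_MMU_NOTIFIER,    /* e.g. page migration: not yet supported for TDX */
          ZAP_MEMTYPE_CHANGE,  /* MTRR/LAPIC remap: private memory is always WB */
          ZAP_MEMSLOT_DELETE,  /* deleting/moving a memslot, typically VM teardown */
  };

  /*
   * Only memslot deletion may zap private pages.  Private roots are
   * never invalidated, because the TDX module does not allow a protected
   * root page table to be unlinked and replaced.
   */
  static bool zap_private_pages(enum zap_cause cause)
  {
          return cause == ZAP_MEMSLOT_DELETE;
  }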
Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu/mmu.c | 58 ++++++++++++++++++++++++++++++++++++-- arch/x86/kvm/mmu/tdp_mmu.c | 24 ++++++++++++---- arch/x86/kvm/mmu/tdp_mmu.h | 5 ++-- 3 files changed, 77 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 7b38d9f4c457..84a08aa180b7 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1578,7 +1578,12 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm= _gfn_range *range) flush =3D kvm_handle_gfn_range(kvm, range, kvm_zap_rmap); =20 if (is_tdp_mmu_enabled(kvm)) - flush =3D kvm_tdp_mmu_unmap_gfn_range(kvm, range, flush); + /* + * kvm_unmap_gfn_range() is called via mmu notifier. + * For now page migration for private page isn't supported yet, + * don't zap private pages. + */ + flush =3D kvm_tdp_mmu_unmap_gfn_range(kvm, range, flush, false); =20 return flush; } @@ -6057,11 +6062,48 @@ static bool kvm_has_zapped_obsolete_pages(struct kv= m *kvm) return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages)); } =20 +static void kvm_mmu_zap_memslot(struct kvm *kvm, struct kvm_memory_slot *s= lot) +{ + bool flush =3D false; + + write_lock(&kvm->mmu_lock); + + /* + * Zapping non-leaf SPTEs, a.k.a. not-last SPTEs, isn't required, worst + * case scenario we'll have unused shadow pages lying around until they + * are recycled due to age or when the VM is destroyed. + */ + if (is_tdp_mmu_enabled(kvm)) { + struct kvm_gfn_range range =3D { + .slot =3D slot, + .start =3D slot->base_gfn, + .end =3D slot->base_gfn + slot->npages, + .may_block =3D false, + }; + + /* + * this handles both private gfn and shared gfn. + * All private page should be zapped on memslot deletion. + */ + flush =3D kvm_tdp_mmu_unmap_gfn_range(kvm, &range, flush, true); + } else { + flush =3D slot_handle_level(kvm, slot, __kvm_zap_rmap, PG_LEVEL_4K, + KVM_MAX_HUGEPAGE_LEVEL, true); + } + if (flush) + kvm_flush_remote_tlbs(kvm); + + write_unlock(&kvm->mmu_lock); +} + static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, struct kvm_page_track_notifier_node *node) { - kvm_mmu_zap_all_fast(kvm); + if (kvm_gfn_shared_mask(kvm)) + kvm_mmu_zap_memslot(kvm, slot); + else + kvm_mmu_zap_all_fast(kvm); } =20 int kvm_mmu_init_vm(struct kvm *kvm) @@ -6164,8 +6206,18 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_st= art, gfn_t gfn_end) =20 if (is_tdp_mmu_enabled(kvm)) { for (i =3D 0; i < KVM_ADDRESS_SPACE_NUM; i++) + /* + * zap_private =3D true. Zap both private/shared pages. + * + * kvm_zap_gfn_range() is used when PAT memory type was + * changed. Later on the next kvm page fault, populate + * it with updated spte entry. + * Because only WB is supported for private pages, don't + * care of private pages. + */ flush =3D kvm_tdp_mmu_zap_leafs(kvm, i, gfn_start, - gfn_end, true, flush); + gfn_end, true, flush, + true); } =20 if (flush) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 9327a77d7434..542643b43162 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -937,7 +937,8 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu= _page *sp) * operation can cause a soft lockup. 
*/ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root, - gfn_t start, gfn_t end, bool can_yield, bool flush) + gfn_t start, gfn_t end, bool can_yield, bool flush, + bool zap_private) { struct tdp_iter iter; =20 @@ -945,6 +946,10 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct = kvm_mmu_page *root, =20 lockdep_assert_held_write(&kvm->mmu_lock); =20 + WARN_ON_ONCE(zap_private && !is_private_sp(root)); + if (!zap_private && is_private_sp(root)) + return false; + rcu_read_lock(); =20 for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end) { @@ -977,12 +982,13 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct= kvm_mmu_page *root, * more SPTEs were zapped since the MMU lock was last acquired. */ bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, gfn_t = end, - bool can_yield, bool flush) + bool can_yield, bool flush, bool zap_private) { struct kvm_mmu_page *root; =20 for_each_tdp_mmu_root_yield_safe(kvm, root, as_id) - flush =3D tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, flush); + flush =3D tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, flush, + zap_private && is_private_sp(root)); =20 return flush; } @@ -1042,6 +1048,12 @@ void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kv= m) =20 lockdep_assert_held_write(&kvm->mmu_lock); list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) { + /* + * Skip private root since private page table + * is only torn down when VM is destroyed. + */ + if (is_private_sp(root)) + continue; if (!root->role.invalid && !WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root))) { root->role.invalid =3D true; @@ -1233,11 +1245,13 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct k= vm_page_fault *fault) return ret; } =20 +/* Used by mmu notifier via kvm_unmap_gfn_range() */ bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *ra= nge, - bool flush) + bool flush, bool zap_private) { return kvm_tdp_mmu_zap_leafs(kvm, range->slot->as_id, range->start, - range->end, range->may_block, flush); + range->end, range->may_block, flush, + zap_private); } =20 typedef bool (*tdp_handler_t)(struct kvm *kvm, struct tdp_iter *iter, diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index c163f7cc23ca..c98c7df449a8 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -16,7 +16,8 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu= _page *root, bool shared); =20 bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, - gfn_t end, bool can_yield, bool flush); + gfn_t end, bool can_yield, bool flush, + bool zap_private); bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp); void kvm_tdp_mmu_zap_all(struct kvm *kvm); void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm); @@ -25,7 +26,7 @@ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm); int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault); =20 bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *ra= nge, - bool flush); + bool flush, bool zap_private); bool kvm_tdp_mmu_age_gfn_range(struct kvm *kvm, struct kvm_gfn_range *rang= e); bool kvm_tdp_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range= ); bool kvm_tdp_mmu_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range= ); --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org 
(Postfix) with ESMTP id CAB67C433F5 for ; Fri, 30 Sep 2022 10:24:01 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232181AbiI3KX7 (ORCPT ); Fri, 30 Sep 2022 06:23:59 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34194 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231643AbiI3KTR (ORCPT ); Fri, 30 Sep 2022 06:19:17 -0400
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 16A591893AB; Fri, 30 Sep 2022 03:19:05 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533146; x=1696069146; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=SmrDC7LkT2pOV82ObFM3LLr9crX/KxPGTCQnZqzp2+g=; b=FSu7vODs0QoGa3yVrqjXTo2AFb5htrSmIqjGjHTLIXzbHfymAxIo3Nnp AQXRcL3WF6MkYXBu0pRlUx7erO6fiF4/K6cNn/Ibfgy0W2TBJFxC8On0M TwDVYEYHN1O5vEbL7BFbg7b1j61ZfIKcP0XPyUBG4mgumCWlynIJATAGY zq+pSkSDRXBw2f5CNR/uISHF3yNlzn14eJq3nOcrTGwYyUXzyH2KqdEx5 jprR/UacxeBJZijZFhpdg+4TYOVr7hffudLopaT50lunxlXfc5Fh0WFEL /1zEfoJDtlYuRgvbEEer5HKD8/Jcz7pZNO85pmPh57N9nyoW0e4u8+Vw4 Q==;
X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870106"
X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870106"
Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:59 -0700
X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807649"
X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807649"
Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:59 -0700
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , Kai Huang
Subject: [PATCH v9 046/105] KVM: x86/tdp_mmu: Support TDX private mapping for TDP MMU
Date: Fri, 30 Sep 2022 03:17:40 -0700
Message-Id: <0ac18bef9f51e535bbcb0162882478987a41c595.1664530907.git.isaku.yamahata@intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To:
References:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Isaku Yamahata

Allocate protected page tables for private page tables, and add hooks to
operate on the protected page tables.  This patch adds allocation/free of
protected page tables and the hooks.  When calling hooks to update an
SPTE entry, freeze the entry, call the hooks, and unfreeze the entry to
allow concurrent updates on the page tables, which is the advantage of
the TDP MMU.  As kvm_gfn_shared_mask() always returns false, those hooks
aren't called yet with this patch.

When the faulting GPA is private, the KVM page fault is called private.
When resolving a private KVM page fault, allocate a protected page table
and call the hooks to operate on it.  On a change of a private PTE entry,
invoke the kvm_x86_ops hook in __handle_changed_spte() to propagate the
change to the protected page table.  The following depicts the
relationship.

          private KVM page fault       |
                    |                  |
                    V                  |
              private GPA              |     CPU protected EPTP
                    |                  |            |
                    V                  |            V
            private PT root            |     protected PT root
                    |                  |            |
                    V                  |            V
              private PT --hook to propagate-->protected PT
                    |                  |            |
                    \------------------+------\     |
                                       |      |     |
                                       |      V     V
                                       |  private guest page
                                       |
          non-encrypted memory         |     encrypted memory
                                       |

PT: page table

The existing KVM TDP MMU code uses atomic updates of the SPTE.  On
populating an EPT entry, the entry is set atomically.  Zapping an SPTE,
however, requires a TLB shootdown: the entry is frozen with a special
SPTE value that clears the present bit, and after the TLB shootdown the
entry is set to its eventual value (unfreeze).  For the protected page
table, hooks are called to update the protected page table in addition to
the direct access to the private SPTE.  For the zapping case, freezing
the SPTE works as-is; the hooks can be called in addition to the TLB
shootdown.  For populating a private SPTE entry, there can be a race
condition without further protection:

  vcpu 1: populating 2M private SPTE
  vcpu 2: populating 4K private SPTE
  vcpu 2: TDX SEAMCALL to update 4K protected SPTE => error
  vcpu 1: TDX SEAMCALL to update 2M protected SPTE

To avoid the race, the frozen SPTE is utilized: instead of an atomic
update of the private entry, freeze the entry, call the hook that updates
the protected SPTE, then set the entry to the final value.

Support 4K pages only at this stage.  2M page support can be done in
future patches.

Co-developed-by: Kai Huang
Signed-off-by: Kai Huang
Signed-off-by: Isaku Yamahata
---
 arch/x86/include/asm/kvm-x86-ops.h |   5 +
 arch/x86/include/asm/kvm_host.h    |  11 ++
 arch/x86/kvm/mmu/mmu.c             |  15 +-
 arch/x86/kvm/mmu/mmu_internal.h    |  32 ++++
 arch/x86/kvm/mmu/tdp_iter.h        |   2 +-
 arch/x86/kvm/mmu/tdp_mmu.c         | 237 +++++++++++++++++++++++++----
 arch/x86/kvm/mmu/tdp_mmu.h         |   2 +-
 virt/kvm/kvm_main.c                |   1 +
 8 files changed, 269 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 104a34b44e94..757952f186f8 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -94,6 +94,11 @@ KVM_X86_OP_OPTIONAL_RET0(set_tss_addr)
 KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
 KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
 KVM_X86_OP(load_mmu_pgd)
+KVM_X86_OP_OPTIONAL(link_private_spt)
+KVM_X86_OP_OPTIONAL(free_private_spt)
+KVM_X86_OP_OPTIONAL(set_private_spte)
+KVM_X86_OP_OPTIONAL(remove_private_spte)
+KVM_X86_OP_OPTIONAL(zap_private_spte)
 KVM_X86_OP(has_wbinvd_exit)
 KVM_X86_OP(get_l2_tsc_offset)
 KVM_X86_OP(get_l2_tsc_multiplier)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 789a8de4028a..b9ebe82a4c37 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -468,6 +468,7 @@ struct kvm_mmu {
 			 struct kvm_mmu_page *sp);
 	void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa);
 	struct kvm_mmu_root_info root;
+	hpa_t private_root_hpa;
 	union kvm_cpu_role cpu_role;
 	union kvm_mmu_page_role root_role;

@@ -1607,6 +1608,16 @@ struct kvm_x86_ops {
 	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 			     int root_level);

+	int (*link_private_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
+				void *private_spt);
+	int (*free_private_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
+				void *private_spt);
+	void (*set_private_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
+				 kvm_pfn_t pfn);
+	void (*remove_private_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
+				    kvm_pfn_t pfn);
+	void (*zap_private_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level
level); + bool (*has_wbinvd_exit)(void); =20 u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 84a08aa180b7..c9013213641e 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3608,7 +3608,12 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *v= cpu) goto out_unlock; =20 if (is_tdp_mmu_enabled(vcpu->kvm)) { - root =3D kvm_tdp_mmu_get_vcpu_root_hpa(vcpu); + if (kvm_gfn_shared_mask(vcpu->kvm) && + !VALID_PAGE(mmu->private_root_hpa)) { + root =3D kvm_tdp_mmu_get_vcpu_root_hpa(vcpu, true); + mmu->private_root_hpa =3D root; + } + root =3D kvm_tdp_mmu_get_vcpu_root_hpa(vcpu, false); mmu->root.hpa =3D root; } else if (shadow_root_level >=3D PT64_ROOT_4LEVEL) { root =3D mmu_alloc_root(vcpu, 0, 0, shadow_root_level); @@ -4319,7 +4324,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, s= truct kvm_page_fault *fault unsigned long mmu_seq; int r; =20 - fault->gfn =3D fault->addr >> PAGE_SHIFT; + fault->gfn =3D gpa_to_gfn(fault->addr) & ~kvm_gfn_shared_mask(vcpu->kvm); fault->slot =3D kvm_vcpu_gfn_to_memslot(vcpu, fault->gfn); =20 if (page_fault_handle_page_track(vcpu, fault)) @@ -5859,6 +5864,7 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, st= ruct kvm_mmu *mmu) =20 mmu->root.hpa =3D INVALID_PAGE; mmu->root.pgd =3D 0; + mmu->private_root_hpa =3D INVALID_PAGE; for (i =3D 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) mmu->prev_roots[i] =3D KVM_MMU_ROOT_INFO_INVALID; =20 @@ -6082,7 +6088,7 @@ static void kvm_mmu_zap_memslot(struct kvm *kvm, stru= ct kvm_memory_slot *slot) }; =20 /* - * this handles both private gfn and shared gfn. + * This handles both private gfn and shared gfn. * All private page should be zapped on memslot deletion. */ flush =3D kvm_tdp_mmu_unmap_gfn_range(kvm, &range, flush, true); @@ -6882,6 +6888,9 @@ int kvm_mmu_vendor_module_init(void) void kvm_mmu_destroy(struct kvm_vcpu *vcpu) { kvm_mmu_unload(vcpu); + if (is_tdp_mmu_enabled(vcpu->kvm)) + mmu_free_root_page(vcpu->kvm, &vcpu->arch.mmu->private_root_hpa, + NULL); free_mmu_pages(&vcpu->arch.root_mmu); free_mmu_pages(&vcpu->arch.guest_mmu); mmu_free_memory_caches(vcpu); diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_interna= l.h index fb867270829e..e98f32b8eec2 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -6,6 +6,8 @@ #include #include =20 +#include "mmu.h" + #undef MMU_DEBUG =20 #ifdef MMU_DEBUG @@ -209,11 +211,29 @@ static inline void kvm_mmu_alloc_private_spt(struct k= vm_vcpu *vcpu, } } =20 +static inline int kvm_alloc_private_spt_for_split(struct kvm_mmu_page *sp,= gfp_t gfp) +{ + gfp &=3D ~__GFP_ZERO; + sp->private_spt =3D (void *)__get_free_page(gfp); + if (!sp->private_spt) + return -ENOMEM; + return 0; +} + static inline void kvm_mmu_free_private_spt(struct kvm_mmu_page *sp) { if (sp->private_spt) free_page((unsigned long)sp->private_spt); } + +static inline gfn_t kvm_gfn_for_root(struct kvm *kvm, struct kvm_mmu_page = *root, + gfn_t gfn) +{ + if (is_private_sp(root)) + return kvm_gfn_private(kvm, gfn); + else + return kvm_gfn_shared(kvm, gfn); +} #else static inline void *kvm_mmu_private_spt(struct kvm_mmu_page *sp) { @@ -230,9 +250,20 @@ static inline void kvm_mmu_alloc_private_spt(struct kv= m_vcpu *vcpu, { } =20 +static inline int kvm_alloc_private_spt_for_split(struct kvm_mmu_page *sp,= gfp_t gfp) +{ + return -ENOMEM; +} + static inline void kvm_mmu_free_private_spt(struct kvm_mmu_page *sp) { } + +static inline gfn_t kvm_gfn_for_root(struct kvm *kvm, struct kvm_mmu_page 
= *root, + gfn_t gfn) +{ + return gfn; +} #endif =20 bool kvm_mem_attr_is_mixed(struct kvm_memory_slot *slot, gfn_t gfn, int le= vel); @@ -369,6 +400,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu= *vcpu, gpa_t cr2_or_gpa, .is_tdp =3D likely(vcpu->arch.mmu->page_fault =3D=3D kvm_tdp_page_fault), .nx_huge_page_workaround_enabled =3D is_nx_huge_page_enabled(vcpu->kvm), + .is_private =3D kvm_is_private_gpa(vcpu->kvm, cr2_or_gpa), =20 .max_level =3D vcpu->kvm->arch.tdp_max_page_level, .req_level =3D PG_LEVEL_4K, diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h index 9e56a5b1024c..eab62baf8549 100644 --- a/arch/x86/kvm/mmu/tdp_iter.h +++ b/arch/x86/kvm/mmu/tdp_iter.h @@ -71,7 +71,7 @@ struct tdp_iter { tdp_ptep_t pt_path[PT64_ROOT_MAX_LEVEL]; /* A pointer to the current SPTE */ tdp_ptep_t sptep; - /* The lowest GFN mapped by the current SPTE */ + /* The lowest GFN (shared bits included) mapped by the current SPTE */ gfn_t gfn; /* The level of the root page given to the iterator */ int root_level; diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 542643b43162..3c89f0aa776c 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -285,6 +285,9 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm= _vcpu *vcpu, sp->spt =3D kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache); sp->role =3D role; =20 + if (kvm_mmu_page_role_is_private(role)) + kvm_mmu_alloc_private_spt(vcpu, NULL, sp); + return sp; } =20 @@ -305,7 +308,8 @@ static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, td= p_ptep_t sptep, trace_kvm_mmu_get_page(sp, true); } =20 -hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu) +static struct kvm_mmu_page *kvm_tdp_mmu_get_vcpu_root(struct kvm_vcpu *vcp= u, + bool private) { union kvm_mmu_page_role role =3D vcpu->arch.mmu->root_role; struct kvm *kvm =3D vcpu->kvm; @@ -317,6 +321,8 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vc= pu) * Check for an existing root before allocating a new one. Note, the * role check prevents consuming an invalid root. 
*/ + if (private) + kvm_mmu_page_role_set_private(&role); for_each_tdp_mmu_root(kvm, root, kvm_mmu_role_as_id(role)) { if (root->role.word =3D=3D role.word && kvm_tdp_mmu_get_root(root)) @@ -333,12 +339,17 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *= vcpu) spin_unlock(&kvm->arch.tdp_mmu_pages_lock); =20 out: - return __pa(root->spt); + return root; +} + +hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu, bool private) +{ + return __pa(kvm_tdp_mmu_get_vcpu_root(vcpu, private)->spt); } =20 static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, - u64 old_spte, u64 new_spte, int level, - bool shared); + u64 old_spte, u64 new_spte, + union kvm_mmu_page_role role, bool shared); =20 static void handle_changed_spte_acc_track(u64 old_spte, u64 new_spte, int = level) { @@ -364,6 +375,8 @@ static void handle_changed_spte_dirty_log(struct kvm *k= vm, int as_id, gfn_t gfn, =20 if ((!is_writable_pte(old_spte) || pfn_changed) && is_writable_pte(new_spte)) { + /* For memory slot operations, use GFN without aliasing */ + gfn =3D gfn & ~kvm_gfn_shared_mask(kvm); slot =3D __gfn_to_memslot(__kvm_memslots(kvm, as_id), gfn); mark_page_dirty_in_slot(kvm, slot, gfn); } @@ -488,12 +501,76 @@ static void handle_removed_pt(struct kvm *kvm, tdp_pt= ep_t pt, bool shared) REMOVED_SPTE, level); } handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn, - old_spte, REMOVED_SPTE, level, shared); + old_spte, REMOVED_SPTE, sp->role, shared); + } + + if (is_private_sp(sp) && + WARN_ON(static_call(kvm_x86_free_private_spt)(kvm, sp->gfn, sp->role.= level, + kvm_mmu_private_spt(sp)))) { + /* + * Failed to unlink Secure EPT page and there is nothing to do + * further. Intentionally leak the page to prevent the kernel + * from accessing the encrypted page. + */ + kvm_mmu_init_private_spt(sp, NULL); } =20 call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback); } =20 +static void *get_private_spt(gfn_t gfn, u64 new_spte, int level) +{ + if (is_shadow_present_pte(new_spte) && !is_last_spte(new_spte, level)) { + struct kvm_mmu_page *sp =3D to_shadow_page(pfn_to_hpa(spte_to_pfn(new_sp= te))); + void *private_spt =3D kvm_mmu_private_spt(sp); + + WARN_ON_ONCE(!private_spt); + WARN_ON_ONCE(sp->role.level + 1 !=3D level); + WARN_ON_ONCE(sp->gfn !=3D gfn); + return private_spt; + } + + return NULL; +} + +static void handle_changed_private_spte(struct kvm *kvm, gfn_t gfn, + u64 old_spte, u64 new_spte, + int level) +{ + bool was_present =3D is_shadow_present_pte(old_spte); + bool is_present =3D is_shadow_present_pte(new_spte); + bool was_leaf =3D was_present && is_last_spte(old_spte, level); + bool is_leaf =3D is_present && is_last_spte(new_spte, level); + kvm_pfn_t old_pfn =3D spte_to_pfn(old_spte); + kvm_pfn_t new_pfn =3D spte_to_pfn(new_spte); + + lockdep_assert_held(&kvm->mmu_lock); + if (is_present) { + /* TDP MMU doesn't change present -> present */ + KVM_BUG_ON(was_present, kvm); + + /* + * Use different call to either set up middle level + * private page table, or leaf. + */ + if (is_leaf) + static_call(kvm_x86_set_private_spte)(kvm, gfn, level, new_pfn); + else { + void *private_spt =3D get_private_spt(gfn, new_spte, level); + + KVM_BUG_ON(!private_spt, kvm); + if (static_call(kvm_x86_link_private_spt)(kvm, gfn, level, private_spt)) + /* failed to update Secure-EPT. */ + WARN_ON(1); + } + } else if (was_leaf) { + /* non-present -> non-present doesn't make sense. 
*/ + KVM_BUG_ON(!was_present, kvm); + static_call(kvm_x86_zap_private_spte)(kvm, gfn, level); + static_call(kvm_x86_remove_private_spte)(kvm, gfn, level, old_pfn); + } +} + /** * __handle_changed_spte - handle bookkeeping associated with an SPTE chan= ge * @kvm: kvm instance @@ -501,7 +578,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep= _t pt, bool shared) * @gfn: the base GFN that was mapped by the SPTE * @old_spte: The value of the SPTE before the change * @new_spte: The value of the SPTE after the change - * @level: the level of the PT the SPTE is part of in the paging structure + * @role: the role of the PT the SPTE is part of in the paging structure * @shared: This operation may not be running under the exclusive use of * the MMU lock and the operation must synchronize with other * threads that might be modifying SPTEs. @@ -510,14 +587,18 @@ static void handle_removed_pt(struct kvm *kvm, tdp_pt= ep_t pt, bool shared) * This function must be called for all TDP SPTE modifications. */ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, - u64 old_spte, u64 new_spte, int level, - bool shared) + u64 old_spte, u64 new_spte, + union kvm_mmu_page_role role, bool shared) { + bool is_private =3D kvm_mmu_page_role_is_private(role); + int level =3D role.level; bool was_present =3D is_shadow_present_pte(old_spte); bool is_present =3D is_shadow_present_pte(new_spte); bool was_leaf =3D was_present && is_last_spte(old_spte, level); bool is_leaf =3D is_present && is_last_spte(new_spte, level); - bool pfn_changed =3D spte_to_pfn(old_spte) !=3D spte_to_pfn(new_spte); + kvm_pfn_t old_pfn =3D spte_to_pfn(old_spte); + kvm_pfn_t new_pfn =3D spte_to_pfn(new_spte); + bool pfn_changed =3D old_pfn !=3D new_pfn; =20 WARN_ON(level > PT64_ROOT_MAX_LEVEL); WARN_ON(level < PG_LEVEL_4K); @@ -584,7 +665,7 @@ static void __handle_changed_spte(struct kvm *kvm, int = as_id, gfn_t gfn, =20 if (was_leaf && is_dirty_spte(old_spte) && (!is_present || !is_dirty_spte(new_spte) || pfn_changed)) - kvm_set_pfn_dirty(spte_to_pfn(old_spte)); + kvm_set_pfn_dirty(old_pfn); =20 /* * Recursively handle child PTs if the change removed a subtree from @@ -593,19 +674,36 @@ static void __handle_changed_spte(struct kvm *kvm, in= t as_id, gfn_t gfn, * pages are kernel allocations and should never be migrated. */ if (was_present && !was_leaf && - (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed))) + (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed))) { + KVM_BUG_ON(is_private !=3D is_private_sptep(spte_to_child_pt(old_spte, l= evel)), + kvm); handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared); + } + + /* + * Special handling for the private mapping. We are either + * setting up new mapping at middle level page table, or leaf, + * or tearing down existing mapping. + * + * This is after handling lower page table by above + * handle_remove_tdp_mmu_page(). Secure-EPT requires to remove + * Secure-EPT tables after removing children. + */ + if (is_private && + /* Ignore change of software only bits. e.g. 
host_writable */ + (was_leaf !=3D is_leaf || was_present !=3D is_present || pfn_changed)= ) { + handle_changed_private_spte(kvm, gfn, old_spte, new_spte, role.level); + } } =20 static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, - u64 old_spte, u64 new_spte, int level, - bool shared) + u64 old_spte, u64 new_spte, + union kvm_mmu_page_role role, bool shared) { - __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, - shared); - handle_changed_spte_acc_track(old_spte, new_spte, level); + __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, role, shared); + handle_changed_spte_acc_track(old_spte, new_spte, role.level); handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte, - new_spte, level); + new_spte, role.level); } =20 /* @@ -629,6 +727,24 @@ static inline int tdp_mmu_set_spte_atomic(struct kvm *= kvm, struct tdp_iter *iter, u64 new_spte) { + /* + * For conventional page table, the update flow is + * - update STPE with atomic operation + * - handle changed SPTE. __handle_changed_spte() + * NOTE: __handle_changed_spte() (and functions) must be safe against + * concurrent update. It is an exception to zap SPTE. See + * tdp_mmu_zap_spte_atomic(). + * + * For private page table, callbacks are needed to propagate SPTE + * change into the protected page table. In order to atomically update + * both the SPTE and the protected page tables with callbacks, utilize + * freezing SPTE. + * - Freeze the SPTE. Set entry to REMOVED_SPTE. + * - Trigger callbacks for protected page tables. __handle_changed_spte() + * - Unfreeze the SPTE. Set the entry to new_spte. + */ + bool freeze_spte =3D is_private_sptep(iter->sptep) && !is_removed_spte(ne= w_spte); + u64 tmp_spte =3D freeze_spte ? REMOVED_SPTE : new_spte; u64 *sptep =3D rcu_dereference(iter->sptep); =20 /* @@ -645,13 +761,16 @@ static inline int tdp_mmu_set_spte_atomic(struct kvm = *kvm, * Note, fast_pf_fix_direct_spte() can also modify TDP MMU SPTEs and * does not hold the mmu_lock. */ - if (!try_cmpxchg64(sptep, &iter->old_spte, new_spte)) + if (!try_cmpxchg64(sptep, &iter->old_spte, tmp_spte)) return -EBUSY; =20 __handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte, - new_spte, iter->level, true); + new_spte, sptep_to_sp(sptep)->role, true); handle_changed_spte_acc_track(iter->old_spte, new_spte, iter->level); =20 + if (freeze_spte) + __kvm_tdp_mmu_write_spte(sptep, new_spte); + return 0; } =20 @@ -718,9 +837,11 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *= kvm, * SPTE had voldatile bits. 
*/ static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep, - u64 old_spte, u64 new_spte, gfn_t gfn, int level, - bool record_acc_track, bool record_dirty_log) + u64 old_spte, u64 new_spte, gfn_t gfn, int level, + bool record_acc_track, bool record_dirty_log) { + union kvm_mmu_page_role role; + lockdep_assert_held_write(&kvm->mmu_lock); =20 /* @@ -734,7 +855,9 @@ static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_i= d, tdp_ptep_t sptep, =20 old_spte =3D kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level); =20 - __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false); + role =3D sptep_to_sp(sptep)->role; + role.level =3D level; + __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, role, false); =20 if (record_acc_track) handle_changed_spte_acc_track(old_spte, new_spte, level); @@ -786,8 +909,11 @@ static inline void tdp_mmu_set_spte_no_dirty_log(struc= t kvm *kvm, continue; \ else =20 -#define tdp_mmu_for_each_pte(_iter, _mmu, _start, _end) \ - for_each_tdp_pte(_iter, to_shadow_page(_mmu->root.hpa), _start, _end) +#define tdp_mmu_for_each_pte(_iter, _mmu, _private, _start, _end) \ + for_each_tdp_pte(_iter, \ + to_shadow_page((_private) ? _mmu->private_root_hpa : \ + _mmu->root.hpa), \ + _start, _end) =20 /* * Yield if the MMU lock is contended or this thread needs to return contr= ol @@ -950,6 +1076,14 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct= kvm_mmu_page *root, if (!zap_private && is_private_sp(root)) return false; =20 + /* + * start and end doesn't have GFN shared bit. This function zaps + * a region including alias. Adjust shared bit of [start, end) if the + * root is shared. + */ + start =3D kvm_gfn_for_root(kvm, root, start); + end =3D kvm_gfn_for_root(kvm, root, end); + rcu_read_lock(); =20 for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end) { @@ -1078,10 +1212,19 @@ static int tdp_mmu_map_handle_target_level(struct k= vm_vcpu *vcpu, WARN_ON(sp->role.level !=3D fault->goal_level); if (unlikely(!fault->slot)) new_spte =3D make_mmio_spte(vcpu, iter->gfn, ACC_ALL); - else - wrprot =3D make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn, - fault->pfn, iter->old_spte, fault->prefetch, true, - fault->map_writable, &new_spte); + else { + unsigned long pte_access =3D ACC_ALL; + gfn_t gfn_unalias =3D iter->gfn & ~kvm_gfn_shared_mask(vcpu->kvm); + + /* TDX shared GPAs are no executable, enforce this for the SDV. 
*/ + if (kvm_gfn_shared_mask(vcpu->kvm) && !fault->is_private) + pte_access &=3D ~ACC_EXEC_MASK; + + wrprot =3D make_spte(vcpu, sp, fault->slot, pte_access, + gfn_unalias, fault->pfn, iter->old_spte, + fault->prefetch, true, fault->map_writable, + &new_spte); + } =20 if (new_spte =3D=3D iter->old_spte) ret =3D RET_PF_SPURIOUS; @@ -1180,6 +1323,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm= _page_fault *fault) { struct kvm_mmu *mmu =3D vcpu->arch.mmu; struct tdp_iter iter; + gfn_t raw_gfn; + bool is_private =3D fault->is_private; int ret; =20 kvm_mmu_hugepage_adjust(vcpu, fault); @@ -1188,7 +1333,17 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kv= m_page_fault *fault) =20 rcu_read_lock(); =20 - tdp_mmu_for_each_pte(iter, mmu, fault->gfn, fault->gfn + 1) { + raw_gfn =3D gpa_to_gfn(fault->addr); + + if (is_error_noslot_pfn(fault->pfn) || + !kvm_pfn_to_refcounted_page(fault->pfn)) { + if (is_private) { + rcu_read_unlock(); + return -EFAULT; + } + } + + tdp_mmu_for_each_pte(iter, mmu, is_private, raw_gfn, raw_gfn + 1) { if (fault->nx_huge_page_workaround_enabled) disallowed_hugepage_adjust(fault, iter.old_spte, iter.level); =20 @@ -1204,6 +1359,12 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kv= m_page_fault *fault) is_large_pte(iter.old_spte)) { if (tdp_mmu_zap_spte_atomic(vcpu->kvm, &iter)) break; + /* + * TODO: large page support. + * Doesn't support large page for TDX now + */ + KVM_BUG_ON(is_private_sptep(iter.sptep), vcpu->kvm); + =20 /* * The iter must explicitly re-read the spte here @@ -1446,6 +1607,12 @@ static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_s= plit(gfp_t gfp, union kvm_mm =20 sp->role =3D role; sp->spt =3D (void *)__get_free_page(gfp); + if (kvm_mmu_page_role_is_private(role)) { + if (kvm_alloc_private_spt_for_split(sp, gfp)) { + free_page((unsigned long)sp->spt); + sp->spt =3D NULL; + } + } if (!sp->spt) { kmem_cache_free(mmu_page_header_cache, sp); return NULL; @@ -1461,6 +1628,11 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_spl= it(struct kvm *kvm, union kvm_mmu_page_role role =3D tdp_iter_child_role(iter); struct kvm_mmu_page *sp; =20 + KVM_BUG_ON(kvm_mmu_page_role_is_private(role) !=3D + is_private_sptep(iter->sptep), kvm); + /* TODO: Large page isn't supported for private SPTE yet. */ + KVM_BUG_ON(kvm_mmu_page_role_is_private(role), kvm); + /* * Since we are allocating while under the MMU lock we have to be * careful about GFP flags. Use GFP_NOWAIT to avoid blocking on direct @@ -1895,7 +2067,7 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 a= ddr, u64 *sptes, if (WARN_ON_ONCE(kvm_gfn_shared_mask(vcpu->kvm))) return leaf; =20 - tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) { + tdp_mmu_for_each_pte(iter, mmu, false, gfn, gfn + 1) { leaf =3D iter.level; sptes[leaf] =3D iter.old_spte; } @@ -1922,7 +2094,10 @@ u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_v= cpu *vcpu, u64 addr, gfn_t gfn =3D addr >> PAGE_SHIFT; tdp_ptep_t sptep =3D NULL; =20 - tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) { + /* fast page fault for private GPA isn't supported. 
*/ + WARN_ON_ONCE(kvm_is_private_gpa(vcpu->kvm, addr)); + + tdp_mmu_for_each_pte(iter, mmu, false, gfn, gfn + 1) { *spte =3D iter.old_spte; sptep =3D iter.sptep; } diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index c98c7df449a8..695175c921a5 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -5,7 +5,7 @@ =20 #include =20 -hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu); +hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu, bool private); =20 __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *= root) { diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 5d0e8fbca345..263885cd97c1 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -209,6 +209,7 @@ struct page *kvm_pfn_to_refcounted_page(kvm_pfn_t pfn) =20 return NULL; } +EXPORT_SYMBOL_GPL(kvm_pfn_to_refcounted_page); =20 /* * Switches to specified vcpu, until a matching vcpu_put() --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EF393C433F5 for ; Fri, 30 Sep 2022 10:23:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232159AbiI3KXz (ORCPT ); Fri, 30 Sep 2022 06:23:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33602 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231641AbiI3KTR (ORCPT ); Fri, 30 Sep 2022 06:19:17 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C61C915ED2D; Fri, 30 Sep 2022 03:19:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533147; x=1696069147; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Fb3dBDV4drTH1Irv7AqwIGDFRaog5goehiwWHDo/z9Q=; b=hcSG7u3zmxVTj6Owe/PNSLAzQScwpx1bnJliZem7ZNzUzhGl+yiw5uXG XF9zB6uQ19Pwx2nY8gyxTZsShgiYMY69lqWqTgMFQP2V6aYsVD3CKifCr UqzewugdJcsHc+ruZfCBK2sXF0A6nHuag45Bjye7rwE5SkyRf3akW1M7x CfxtZwhOmK6vxnfgeA1MT6dlGsWWaTy1yAypN9tdUKSXI4JFuL0lSnzNx n4rnad9xH23fmUjvriLOdE4/bxZDX980vxsg/TeE3SuvqM04k8KRHaPxB gJ5L/Hk53f7xFkbmavXXQdTQiA3ZdV0tYQdxbOD7pXwS8Sdr+ga+vj+iv A==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870107" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870107" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:59 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807652" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807652" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:18:59 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 047/105] [MARKER] The start of TDX KVM patch series: TDX EPT violation Date: Fri, 30 Sep 2022 03:17:41 -0700 Message-Id: <3f4be83cfdc3c4d4701dfe20efe385bb466e7929.1664530907.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 
From nobody Thu Apr 25 13:29:40 2024
From: isaku.yamahata@intel.com
Subject: [PATCH v9 047/105] [MARKER] The start of TDX KVM patch series: TDX EPT violation
Date: Fri, 30 Sep 2022 03:17:41 -0700
Message-Id: <3f4be83cfdc3c4d4701dfe20efe385bb466e7929.1664530907.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

This empty commit is to mark the start of the patch series for TDX EPT
violation.

Signed-off-by: Isaku Yamahata
---
 Documentation/virt/kvm/intel-tdx-layer-status.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentation/virt/kvm/intel-tdx-layer-status.rst
index d5cace00c433..c3e675bea802 100644
--- a/Documentation/virt/kvm/intel-tdx-layer-status.rst
+++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst
@@ -19,12 +19,12 @@ Patch Layer status
 * TDX architectural definitions:        Applied
 * TD VM creation/destruction:           Applied
 * TD vcpu creation/destruction:         Applied
-* TDX EPT violation:                    Not yet
+* TDX EPT violation:                    Applying
 * TD finalization:                      Not yet
 * TD vcpu enter/exit:                   Not yet
 * TD vcpu interrupts/exit/hypercall:    Not yet
 
 * KVM MMU GPA shared bits:              Applied
 * KVM TDP refactoring for TDX:          Applied
-* KVM TDP MMU hooks:                    Applying
+* KVM TDP MMU hooks:                    Applied
 * KVM TDP MMU MapGPA:                   Not yet
-- 
2.25.1

From nobody Thu Apr 25 13:29:40 2024
From: isaku.yamahata@intel.com
Subject: [PATCH v9 048/105] KVM: x86/mmu: Disallow dirty logging for x86 TDX
Date: Fri, 30 Sep 2022 03:17:42 -0700
Message-Id: <0dc5022c7e7c5f55f3ed490acfc855776fa662b2.1664530907.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

TDX doesn't support dirty logging.  Report that dirty logging isn't
supported so that the device model, for example QEMU, can handle it
properly.

Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/x86.c       |  5 +++++
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c      | 10 +++++++++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5006ff5d9f5e..c8b129cb772e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13588,6 +13588,11 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
 }
 EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
 
+bool kvm_arch_dirty_log_supported(struct kvm *kvm)
+{
+	return kvm->arch.vm_type != KVM_X86_TDX_VM;
+}
+
 bool kvm_arch_has_private_mem(struct kvm *kvm)
 {
 	return kvm->arch.vm_type == KVM_X86_TDX_VM;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f5df5f97b477..eca3ca116412 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1477,6 +1477,7 @@ int kvm_arch_del_vm(int usage_count);
 void kvm_arch_pre_destroy_vm(struct kvm *kvm);
 int kvm_arch_create_vm_debugfs(struct kvm *kvm);
 bool kvm_arch_has_private_mem(struct kvm *kvm);
+bool kvm_arch_dirty_log_supported(struct kvm *kvm);
 
 #ifndef __KVM_HAVE_ARCH_VM_ALLOC
 /*
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 263885cd97c1..0dbd1734a246 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1654,10 +1654,18 @@ bool __weak kvm_arch_has_private_mem(struct kvm *kvm)
 	return false;
 }
 
+bool __weak kvm_arch_dirty_log_supported(struct kvm *kvm)
+{
+	return true;
+}
+
 static int check_memory_region_flags(struct kvm *kvm,
				     const struct kvm_user_mem_region *mem)
 {
-	u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES;
+	u32 valid_flags = 0;
+
+	if (kvm_arch_dirty_log_supported(kvm))
+		valid_flags |= KVM_MEM_LOG_DIRTY_PAGES;
 
 #ifdef CONFIG_HAVE_KVM_PRIVATE_MEM
 	if (kvm_arch_has_private_mem(kvm))
-- 
2.25.1
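For context, the observable effect on userspace: setting
KVM_MEM_LOG_DIRTY_PAGES on a memslot of a TD now fails flag validation.
A minimal sketch of the caller side, assuming the standard
KVM_SET_USER_MEMORY_REGION ioctl; vm_fd, size, hva and the fallback
handler are placeholders, not part of this series:

	struct kvm_userspace_memory_region region = {
		.slot = 0,
		.flags = KVM_MEM_LOG_DIRTY_PAGES,
		.guest_phys_addr = 0,
		.memory_size = size,
		.userspace_addr = (__u64)hva,
	};

	/* For a TDX VM this returns -1 with errno == EINVAL because
	 * KVM_MEM_LOG_DIRTY_PAGES is no longer a valid flag there. */
	if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region) < 0 &&
	    errno == EINVAL)
		disable_dirty_tracking();	/* placeholder fallback */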
From nobody Thu Apr 25 13:29:40 2024
From: isaku.yamahata@intel.com
Subject: [PATCH v9 049/105] KVM: x86/tdp_mmu: Ignore unsupported mmu operation on private GFNs
Date: Fri, 30 Sep 2022 03:17:43 -0700
Message-Id: <077438108fab27758a9afcb65f6d6be718c46f15.1664530907.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

Some KVM MMU operations (dirty page logging, page migration, and page
aging) aren't supported for private GFNs (yet) with the first generation
of TDX.  Silently return on unsupported TDX KVM MMU operations.

Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/mmu/mmu.c     |  3 ++
 arch/x86/kvm/mmu/tdp_mmu.c | 73 +++++++++++++++++++++++++++++++++++---
 arch/x86/kvm/x86.c         |  3 ++
 3 files changed, 74 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c9013213641e..8b41a73c8264 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6554,6 +6554,9 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 	for_each_rmap_spte(rmap_head, &iter, sptep) {
 		sp = sptep_to_sp(sptep);
 
+		/* Private page dirty logging is not supported yet. */
+		KVM_BUG_ON(is_private_sptep(sptep), kvm);
+
 		/*
 		 * We cannot do huge page mapping for indirect shadow pages,
 		 * which are found on the last rmap (level = 1) when not using
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 3c89f0aa776c..cb25168dbbd5 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1420,7 +1420,8 @@ typedef bool (*tdp_handler_t)(struct kvm *kvm, struct tdp_iter *iter,
 
 static __always_inline bool kvm_tdp_mmu_handle_gfn(struct kvm *kvm,
						   struct kvm_gfn_range *range,
-						   tdp_handler_t handler)
+						   tdp_handler_t handler,
+						   bool only_shared)
 {
 	struct kvm_mmu_page *root;
 	struct tdp_iter iter;
@@ -1431,9 +1432,23 @@ static __always_inline bool kvm_tdp_mmu_handle_gfn(struct kvm *kvm,
 	 * into this helper allow blocking; it'd be dead, wasteful code.
 	 */
 	for_each_tdp_mmu_root(kvm, root, range->slot->as_id) {
+		gfn_t start;
+		gfn_t end;
+
+		if (only_shared && is_private_sp(root))
+			continue;
+
 		rcu_read_lock();
 
-		tdp_root_for_each_leaf_pte(iter, root, range->start, range->end)
+		/*
+		 * For TDX shared mappings, set the GFN shared bit on the
+		 * range, so that handler() doesn't need to set it, avoiding
+		 * duplicated code in multiple handler()s.
+		 */
+		start = kvm_gfn_for_root(kvm, root, range->start);
+		end = kvm_gfn_for_root(kvm, root, range->end);
+
+		tdp_root_for_each_leaf_pte(iter, root, start, end)
 			ret |= handler(kvm, &iter, range);
 
 		rcu_read_unlock();
@@ -1477,7 +1492,12 @@ static bool age_gfn_range(struct kvm *kvm, struct tdp_iter *iter,
 
 bool kvm_tdp_mmu_age_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 {
-	return kvm_tdp_mmu_handle_gfn(kvm, range, age_gfn_range);
+	/*
+	 * The first TDX generation doesn't support clearing the A bit for
+	 * private mappings, since there's no secure EPT API to support it.
+	 * However, it's a legitimate request for a TDX guest.
+	 */
+	return kvm_tdp_mmu_handle_gfn(kvm, range, age_gfn_range, true);
 }
 
 static bool test_age_gfn(struct kvm *kvm, struct tdp_iter *iter,
@@ -1488,7 +1508,8 @@ static bool test_age_gfn(struct kvm *kvm, struct tdp_iter *iter,
 
 bool kvm_tdp_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
-	return kvm_tdp_mmu_handle_gfn(kvm, range, test_age_gfn);
+	/* The first TDX generation doesn't support the A bit. */
+	return kvm_tdp_mmu_handle_gfn(kvm, range, test_age_gfn, true);
 }
 
 static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter,
@@ -1533,8 +1554,11 @@ bool kvm_tdp_mmu_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 	 * No need to handle the remote TLB flush under RCU protection, the
 	 * target SPTE _must_ be a leaf SPTE, i.e. cannot result in freeing a
 	 * shadow page.  See the WARN on pfn_changed in __handle_changed_spte().
+	 *
+	 * The .change_pte() callback should not happen for private pages,
+	 * because for now TDX private pages are pinned during the VM's
+	 * lifetime.
 	 */
-	return kvm_tdp_mmu_handle_gfn(kvm, range, set_spte_gfn);
+	return kvm_tdp_mmu_handle_gfn(kvm, range, set_spte_gfn, true);
 }
 
 /*
@@ -1588,6 +1612,14 @@ bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm,
 
 	lockdep_assert_held_read(&kvm->mmu_lock);
 
+	/*
+	 * Because the first TDX generation doesn't support write protecting
+	 * private mappings and kvm_arch_dirty_log_supported(kvm) == false,
+	 * it's a bug to reach here for a guest TD.
+	 */
+	if (WARN_ON_ONCE(!kvm_arch_dirty_log_supported(kvm)))
+		return false;
+
 	for_each_valid_tdp_mmu_root_yield_safe(kvm, root, slot->as_id, true)
 		spte_set |= wrprot_gfn_range(kvm, root, slot->base_gfn,
			     slot->base_gfn + slot->npages, min_level);
@@ -1853,6 +1885,14 @@ bool kvm_tdp_mmu_clear_dirty_slot(struct kvm *kvm,
 
 	lockdep_assert_held_read(&kvm->mmu_lock);
 
+	/*
+	 * The first TDX generation doesn't support clearing the dirty bit,
+	 * since there's no secure EPT API to support it.  It is a bug to
+	 * reach here for a TDX guest.
+	 */
+	if (WARN_ON_ONCE(!kvm_arch_dirty_log_supported(kvm)))
+		return false;
+
 	for_each_valid_tdp_mmu_root_yield_safe(kvm, root, slot->as_id, true)
 		spte_set |= clear_dirty_gfn_range(kvm, root, slot->base_gfn,
				slot->base_gfn + slot->npages);
@@ -1919,6 +1959,13 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm,
 	struct kvm_mmu_page *root;
 
 	lockdep_assert_held_write(&kvm->mmu_lock);
+	/*
+	 * The first TDX generation doesn't support clearing the dirty bit,
+	 * since there's no secure EPT API to support it.  For now silently
+	 * ignore KVM_CLEAR_DIRTY_LOG.
+	 */
+	if (!kvm_arch_dirty_log_supported(kvm))
+		return;
 	for_each_tdp_mmu_root(kvm, root, slot->as_id)
 		clear_dirty_pt_masked(kvm, root, gfn, mask, wrprot);
 }
@@ -1985,6 +2032,13 @@ void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
 
 	lockdep_assert_held_read(&kvm->mmu_lock);
 
+	/*
+	 * This should only be reachable when dirty logging is supported;
+	 * otherwise it's a bug to reach here.
+	 */
+	if (WARN_ON_ONCE(!kvm_arch_dirty_log_supported(kvm)))
+		return;
+
 	for_each_valid_tdp_mmu_root_yield_safe(kvm, root, slot->as_id, true)
 		zap_collapsible_spte_range(kvm, root, slot);
 }
@@ -2038,6 +2092,15 @@ bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm,
 	bool spte_set = false;
 
 	lockdep_assert_held_write(&kvm->mmu_lock);
+
+	/*
+	 * The first TDX generation doesn't support write protecting private
+	 * mappings; silently ignore the request.  KVM_GET_DIRTY_LOG etc.
+	 * can reach here, so no warning.
+	 */
+	if (!kvm_arch_dirty_log_supported(kvm))
+		return false;
+
 	for_each_tdp_mmu_root(kvm, root, slot->as_id)
 		spte_set |= write_protect_gfn(kvm, root, gfn, min_level);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c8b129cb772e..9060ca2b19ee 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12621,6 +12621,9 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 	u32 new_flags = new ? new->flags : 0;
 	bool log_dirty_pages = new_flags & KVM_MEM_LOG_DIRTY_PAGES;
 
+	if (!kvm_arch_dirty_log_supported(kvm) && log_dirty_pages)
+		return;
+
 	/*
 	 * Update CPU dirty logging if dirty logging is being toggled.  This
 	 * applies to all operations.
-- 
2.25.1
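To make the GFN adjustment in kvm_tdp_mmu_handle_gfn() concrete: range
GFNs arrive unaliased, and each root needs them in its own address
space.  A hedged sketch of what kvm_gfn_for_root() is assumed to do
(the real helper is defined elsewhere in this series; this is only an
illustration):

	static inline gfn_t kvm_gfn_for_root(struct kvm *kvm,
					     struct kvm_mmu_page *root, gfn_t gfn)
	{
		/* Private roots use the plain GFN; shared roots carry the
		 * shared bit so leaf SPTEs match the raw GPA layout. */
		gfn &= ~kvm_gfn_shared_mask(kvm);
		return is_private_sp(root) ? gfn : gfn | kvm_gfn_shared_mask(kvm);
	}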
From nobody Thu Apr 25 13:29:40 2024
From: isaku.yamahata@intel.com
Subject: [PATCH v9 050/105] KVM: VMX: Split out guts of EPT violation to common/exposed function
Date: Fri, 30 Sep 2022 03:17:44 -0700

From: Sean Christopherson

A TDX EPT violation differs only in how the information, the GPA and
the exit qualification, is retrieved.  To share the code that handles
EPT violations, split out the guts of the EPT violation handler so that
the VMX/TDX exit handlers can call it after retrieving the GPA and exit
qualification.

Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
Reviewed-by: Kai Huang
---
 arch/x86/kvm/vmx/common.h | 33 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.c    | 29 +++++------------------------
 2 files changed, 38 insertions(+), 24 deletions(-)
 create mode 100644 arch/x86/kvm/vmx/common.h

diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h
new file mode 100644
index 000000000000..235908f3e044
--- /dev/null
+++ b/arch/x86/kvm/vmx/common.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __KVM_X86_VMX_COMMON_H
+#define __KVM_X86_VMX_COMMON_H
+
+#include <linux/kvm_host.h>
+
+#include "mmu.h"
+
+static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
+					     unsigned long exit_qualification)
+{
+	u64 error_code;
+
+	/* Is it a read fault? */
+	error_code = (exit_qualification & EPT_VIOLATION_ACC_READ)
+		     ? PFERR_USER_MASK : 0;
+	/* Is it a write fault? */
+	error_code |= (exit_qualification & EPT_VIOLATION_ACC_WRITE)
+		      ? PFERR_WRITE_MASK : 0;
+	/* Is it a fetch fault? */
+	error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR)
+		      ? PFERR_FETCH_MASK : 0;
+	/* Is the EPT page table entry present? */
+	error_code |= (exit_qualification & EPT_VIOLATION_RWX_MASK)
+		      ? PFERR_PRESENT_MASK : 0;
+
+	error_code |= (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) != 0 ?
+		      PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;
+
+	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
+}
+
+#endif /* __KVM_X86_VMX_COMMON_H */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f1e25e4097e1..ec1570b151f5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -50,6 +50,7 @@
 #include <asm/vmx.h>
 
 #include "capabilities.h"
+#include "common.h"
 #include "cpuid.h"
 #include "evmcs.h"
 #include "hyperv.h"
@@ -5709,11 +5710,10 @@ static int handle_task_switch(struct kvm_vcpu *vcpu)
 
 static int handle_ept_violation(struct kvm_vcpu *vcpu)
 {
-	unsigned long exit_qualification;
-	gpa_t gpa;
-	u64 error_code;
+	unsigned long exit_qualification = vmx_get_exit_qual(vcpu);
+	gpa_t gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
 
-	exit_qualification = vmx_get_exit_qual(vcpu);
+	trace_kvm_page_fault(gpa, exit_qualification);
 
 	/*
 	 * EPT violation happened while executing iret from NMI,
@@ -5726,25 +5726,6 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	    (exit_qualification & INTR_INFO_UNBLOCK_NMI))
 		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI);
 
-	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
-	trace_kvm_page_fault(gpa, exit_qualification);
-
-	/* Is it a read fault? */
-	error_code = (exit_qualification & EPT_VIOLATION_ACC_READ)
-		     ? PFERR_USER_MASK : 0;
-	/* Is it a write fault? */
-	error_code |= (exit_qualification & EPT_VIOLATION_ACC_WRITE)
-		      ? PFERR_WRITE_MASK : 0;
-	/* Is it a fetch fault? */
-	error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR)
-		      ? PFERR_FETCH_MASK : 0;
-	/* ept page table entry is present? */
-	error_code |= (exit_qualification & EPT_VIOLATION_RWX_MASK)
-		      ? PFERR_PRESENT_MASK : 0;
-
-	error_code |= (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) != 0 ?
-		      PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;
-
 	vcpu->arch.exit_qualification = exit_qualification;
 
 	/*
@@ -5758,7 +5739,7 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	if (unlikely(allow_smaller_maxphyaddr && kvm_vcpu_is_illegal_gpa(vcpu, gpa)))
 		return kvm_emulate_instruction(vcpu, 0);
 
-	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
+	return __vmx_handle_ept_violation(vcpu, gpa, exit_qualification);
 }
 
 static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
-- 
2.25.1
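As a worked example of the translation above: a guest write to a
present, read-only page exits with EPT_VIOLATION_ACC_WRITE and an
EPT_VIOLATION_RWX_MASK bit set in the exit qualification, so the helper
produces PFERR_WRITE_MASK | PFERR_PRESENT_MASK (plus
PFERR_GUEST_FINAL_MASK when the GVA was fully translated).  A sketch of
the intended call shape from a TDX exit handler, with the two fetches
left as hypothetical placeholders for the TDX-specific retrieval added
later in the series:

	/* TDX side: the GPA and exit qualification come from the TDX
	 * module rather than the VMCS; the fetch helpers below are
	 * placeholders, not APIs defined by this patch. */
	gpa_t gpa = tdexit_gpa(vcpu);				/* placeholder */
	unsigned long exit_qual = tdexit_exit_qual(vcpu);	/* placeholder */

	return __vmx_handle_ept_violation(vcpu, gpa, exit_qual);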
From nobody Thu Apr 25 13:29:40 2024
From: isaku.yamahata@intel.com
Subject: [PATCH v9 051/105] KVM: VMX: Move setting of EPT MMU masks to common VT-x code
Date: Fri, 30 Sep 2022 03:17:45 -0700
Message-Id: <4de99b84bf43fef3d2c117ab464e5f62d630ac91.1664530907.git.isaku.yamahata@intel.com>

From: Sean Christopherson

The EPT MMU masks are used commonly for VMX and TDX.  The value needs
to be initialized in common code, before both the VMX- and TDX-specific
initialization code runs.

Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/main.c | 5 +++++
 arch/x86/kvm/vmx/vmx.c  | 4 ----
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index fe927aaee114..03fc1986227b 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -4,6 +4,7 @@
 #include "x86_ops.h"
 #include "vmx.h"
 #include "nested.h"
+#include "mmu.h"
 #include "pmu.h"
 #include "tdx.h"
 
@@ -26,6 +27,10 @@ static __init int vt_hardware_setup(void)
 
 	enable_tdx = enable_tdx && !tdx_hardware_setup(&vt_x86_ops);
 
+	if (enable_ept)
+		kvm_mmu_set_ept_masks(enable_ept_ad_bits,
+				      cpu_has_vmx_ept_execute_only());
+
 	return 0;
 }
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ec1570b151f5..d78f37e2e2af 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8259,10 +8259,6 @@ __init int vmx_hardware_setup(void)
 
 	set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
 
-	if (enable_ept)
-		kvm_mmu_set_ept_masks(enable_ept_ad_bits,
-				      cpu_has_vmx_ept_execute_only());
-
 	/*
 	 * Setup shadow_me_value/shadow_me_mask to include MKTME KeyID
 	 * bits to shadow_zero_check.
-- 
2.25.1
From nobody Thu Apr 25 13:29:40 2024
From: isaku.yamahata@intel.com
Subject: [PATCH v9 052/105] KVM: TDX: Add load_mmu_pgd method for TDX
Date: Fri, 30 Sep 2022 03:17:46 -0700

From: Sean Christopherson

For virtual I/O, the guest TD shares guest pages with the VMM without
encryption.  The shared EPT is used to map those guest pages in an
unprotected way.

Add the VMCS field encoding for the shared EPTP, which will be used by
TDX to have separate EPT walks for private GPAs (existing EPTP) versus
shared GPAs (new shared EPTP).

Set the shared EPT pointer value for the TDX guest to initialize the
TDX MMU.

Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/include/asm/vmx.h |  1 +
 arch/x86/kvm/vmx/main.c    | 11 ++++++++++-
 arch/x86/kvm/vmx/tdx.c     |  5 +++++
 arch/x86/kvm/vmx/x86_ops.h |  4 ++++
 4 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index f0f8eecf55ac..e169ace97e83 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -234,6 +234,7 @@ enum vmcs_field {
 	TSC_MULTIPLIER_HIGH             = 0x00002033,
 	TERTIARY_VM_EXEC_CONTROL        = 0x00002034,
 	TERTIARY_VM_EXEC_CONTROL_HIGH   = 0x00002035,
+	SHARED_EPT_POINTER              = 0x0000203C,
 	PID_POINTER_TABLE               = 0x00002042,
 	PID_POINTER_TABLE_HIGH          = 0x00002043,
 	GUEST_PHYSICAL_ADDRESS          = 0x00002400,
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 03fc1986227b..4d2717c64c62 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -100,6 +100,15 @@ static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	return vmx_vcpu_reset(vcpu, init_event);
 }
 
+static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
+			    int pgd_level)
+{
+	if (is_td_vcpu(vcpu))
+		return tdx_load_mmu_pgd(vcpu, root_hpa, pgd_level);
+
+	vmx_load_mmu_pgd(vcpu, root_hpa, pgd_level);
+}
+
 static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	if (!is_td(kvm))
@@ -219,7 +228,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.write_tsc_offset = vmx_write_tsc_offset,
 	.write_tsc_multiplier = vmx_write_tsc_multiplier,
 
-	.load_mmu_pgd = vmx_load_mmu_pgd,
+	.load_mmu_pgd = vt_load_mmu_pgd,
 
 	.check_intercept = vmx_check_intercept,
 	.handle_exit_irqoff = vmx_handle_exit_irqoff,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index af99a46d1e75..0312172c98cb 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -384,6 +384,11 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vcpu->kvm->vm_bugged = true;
 }
 
+void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
+{
+	td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa & PAGE_MASK);
+}
+
 int tdx_dev_ioctl(void __user *argp)
 {
 	struct kvm_tdx_capabilities __user *user_caps;
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 37c74f325b97..d4e8fefc37d1 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -147,6 +147,8 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
 
 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
+
+void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level);
 #else
 static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return 0; }
 static inline bool tdx_is_vm_type_supported(unsigned long type) { return false; }
@@ -165,6 +167,8 @@ static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {}
 
 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
+
+static inline void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level) {}
 #endif
 
 #endif /* __KVM_X86_VMX_X86_OPS_H */
-- 
2.25.1
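The effect of the new field can be pictured as two parallel page-table
walks keyed off the shared bit in the GPA.  A hedged illustration; the
predicate below is hypothetical shorthand, not an API added by this
patch:

	static bool is_shared_gpa(struct kvm *kvm, gpa_t gpa)
	{
		/* GPAs with the shared bit set are translated through the
		 * VMM-controlled SHARED_EPT_POINTER; all other GPAs go
		 * through the Secure EPT owned by the TDX module. */
		return gpa & gfn_to_gpa(kvm_gfn_shared_mask(kvm));
	}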
From nobody Thu Apr 25 13:29:40 2024
From: isaku.yamahata@intel.com
Subject: [PATCH v9 053/105] KVM: TDX: don't request KVM_REQ_APIC_PAGE_RELOAD
Date: Fri, 30 Sep 2022 03:17:47 -0700
Message-Id: <6160295eb8f008fe5cd123f60c40a0dbf3ea9ecb.1664530907.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

TDX doesn't need the APIC page, which depends on vAPIC, and its
callback is WARN_ON_ONCE(is_tdx).  To avoid unnecessary overhead and
the WARN_ON_ONCE() below, skip requesting KVM_REQ_APIC_PAGE_RELOAD for
a TD.

  WARNING: arch/x86/kvm/vmx/main.c:696 vt_set_apic_access_page_addr+0x3c/0x50 [kvm_intel]
  RIP: 0010:vt_set_apic_access_page_addr+0x3c/0x50 [kvm_intel]
  Call Trace:
   vcpu_enter_guest+0x145d/0x24d0 [kvm]
   kvm_arch_vcpu_ioctl_run+0x25d/0xcc0 [kvm]
   kvm_vcpu_ioctl+0x414/0xa30 [kvm]
   __x64_sys_ioctl+0xc0/0x100
   do_syscall_64+0x39/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/x86.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9060ca2b19ee..f6f0a4b56263 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10248,7 +10248,9 @@ void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
 	 * Update it when it becomes invalid.
 	 */
 	apic_address = gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
-	if (start <= apic_address && apic_address < end)
+	/* TDX doesn't need the APIC page. */
+	if (kvm->arch.vm_type != KVM_X86_TDX_VM &&
+	    start <= apic_address && apic_address < end)
 		kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
 }
 
-- 
2.25.1
From nobody Thu Apr 25 13:29:40 2024
From: isaku.yamahata@intel.com
Subject: [PATCH v9 054/105] KVM: x86/VMX: introduce vmx tlb_remote_flush and tlb_remote_flush_with_range
Date: Fri, 30 Sep 2022 03:17:48 -0700
Message-Id: <8cf9b216985bebe427635f963cd216e7e05c35f0.1664530908.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

This is preparation for TDX to define its own tlb_remote_flush and
tlb_remote_flush_with_range.  Currently the VMX code leaves
tlb_remote_flush and tlb_remote_flush_with_range as NULL by default and
sets them to non-NULL methods only in the nested Hyper-V guest case.

To let the TDX code override those two methods consistently with the
other methods, define vmx_tlb_remote_flush and
vmx_tlb_remote_flush_with_range as nops and call the Hyper-V code only
in the nested Hyper-V guest case.

Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/kvm_onhyperv.c     |  5 ++++-
 arch/x86/kvm/kvm_onhyperv.h     |  1 +
 arch/x86/kvm/mmu/mmu.c          |  2 +-
 arch/x86/kvm/svm/svm_onhyperv.h |  1 +
 arch/x86/kvm/vmx/main.c         |  2 ++
 arch/x86/kvm/vmx/vmx.c          | 34 ++++++++++++++++++++++++++++-----
 arch/x86/kvm/vmx/x86_ops.h      |  3 +++
 7 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/kvm_onhyperv.c b/arch/x86/kvm/kvm_onhyperv.c
index ee4f696a0782..d43518da1c0e 100644
--- a/arch/x86/kvm/kvm_onhyperv.c
+++ b/arch/x86/kvm/kvm_onhyperv.c
@@ -93,11 +93,14 @@ int hv_remote_flush_tlb(struct kvm *kvm)
 }
 EXPORT_SYMBOL_GPL(hv_remote_flush_tlb);
 
+bool hv_use_remote_flush_tlb __ro_after_init;
+EXPORT_SYMBOL_GPL(hv_use_remote_flush_tlb);
+
 void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp)
 {
 	struct kvm_arch *kvm_arch = &vcpu->kvm->arch;
 
-	if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb) {
+	if (hv_use_remote_flush_tlb) {
 		spin_lock(&kvm_arch->hv_root_tdp_lock);
 		vcpu->arch.hv_root_tdp = root_tdp;
 		if (root_tdp != kvm_arch->hv_root_tdp)
diff --git a/arch/x86/kvm/kvm_onhyperv.h b/arch/x86/kvm/kvm_onhyperv.h
index 287e98ef9df3..9a07a34666fb 100644
--- a/arch/x86/kvm/kvm_onhyperv.h
+++ b/arch/x86/kvm/kvm_onhyperv.h
@@ -10,6 +10,7 @@ int hv_remote_flush_tlb_with_range(struct kvm *kvm,
		struct kvm_tlb_range *range);
 int hv_remote_flush_tlb(struct kvm *kvm);
+extern bool hv_use_remote_flush_tlb __ro_after_init;
 void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp);
 #else /* !CONFIG_HYPERV */
 static inline void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8b41a73c8264..6bbfaa24d06c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -242,7 +242,7 @@ static void kvm_flush_remote_tlbs_with_range(struct kvm *kvm,
 {
 	int ret = -ENOTSUPP;
 
-	if (range && kvm_x86_ops.tlb_remote_flush_with_range)
+	if (range && kvm_available_flush_tlb_with_range())
 		ret = static_call(kvm_x86_tlb_remote_flush_with_range)(kvm, range);
 
 	if (ret)
diff --git a/arch/x86/kvm/svm/svm_onhyperv.h b/arch/x86/kvm/svm/svm_onhyperv.h
index e2fc59380465..b3cd61c62305 100644
--- a/arch/x86/kvm/svm/svm_onhyperv.h
+++ b/arch/x86/kvm/svm/svm_onhyperv.h
@@ -36,6 +36,7 @@ static inline void svm_hv_hardware_setup(void)
 		svm_x86_ops.tlb_remote_flush = hv_remote_flush_tlb;
 		svm_x86_ops.tlb_remote_flush_with_range =
				hv_remote_flush_tlb_with_range;
+		hv_use_remote_flush_tlb = true;
 	}
 
 	if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH) {
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 4d2717c64c62..025315e0934b 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -178,6 +178,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 
 	.flush_tlb_all = vmx_flush_tlb_all,
 	.flush_tlb_current = vmx_flush_tlb_current,
+	.tlb_remote_flush = vmx_tlb_remote_flush,
+	.tlb_remote_flush_with_range = vmx_tlb_remote_flush_with_range,
 	.flush_tlb_gva = vmx_flush_tlb_gva,
 	.flush_tlb_guest = vmx_flush_tlb_guest,
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d78f37e2e2af..3c0587f26f2b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3172,6 +3172,33 @@ void vmx_flush_tlb_current(struct kvm_vcpu *vcpu)
 		vpid_sync_context(vmx_get_current_vpid(vcpu));
 }
 
+int vmx_tlb_remote_flush(struct kvm *kvm)
+{
+#if IS_ENABLED(CONFIG_HYPERV)
+	if (hv_use_remote_flush_tlb)
+		return hv_remote_flush_tlb(kvm);
+#endif
+	/*
+	 * Fall back to KVM_REQ_TLB_FLUSH.
+	 * See kvm_arch_flush_remote_tlb() and kvm_flush_remote_tlbs().
+	 */
+	return -EOPNOTSUPP;
+}
+
+int vmx_tlb_remote_flush_with_range(struct kvm *kvm,
+				    struct kvm_tlb_range *range)
+{
+#if IS_ENABLED(CONFIG_HYPERV)
+	if (hv_use_remote_flush_tlb)
+		return hv_remote_flush_tlb_with_range(kvm, range);
+#endif
+	/*
+	 * Fall back to tlb_remote_flush.  See
+	 * kvm_flush_remote_tlbs_with_range().
+	 */
+	return -EOPNOTSUPP;
+}
+
 void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr)
 {
 	/*
@@ -8226,11 +8253,8 @@ __init int vmx_hardware_setup(void)
 
 #if IS_ENABLED(CONFIG_HYPERV)
 	if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
-	    && enable_ept) {
-		vt_x86_ops.tlb_remote_flush = hv_remote_flush_tlb;
-		vt_x86_ops.tlb_remote_flush_with_range =
-			hv_remote_flush_tlb_with_range;
-	}
+	    && enable_ept)
+		hv_use_remote_flush_tlb = true;
 #endif
 
 	if (!cpu_has_vmx_ple()) {
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index d4e8fefc37d1..f3c6ab7e517f 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -92,6 +92,9 @@ void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
 bool vmx_get_if_flag(struct kvm_vcpu *vcpu);
 void vmx_flush_tlb_all(struct kvm_vcpu *vcpu);
 void vmx_flush_tlb_current(struct kvm_vcpu *vcpu);
+int vmx_tlb_remote_flush(struct kvm *kvm);
+int vmx_tlb_remote_flush_with_range(struct kvm *kvm,
+				    struct kvm_tlb_range *range);
 void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr);
 void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu);
 void vmx_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask);
-- 
2.25.1
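The -EOPNOTSUPP returns above are not error paths: generic code treats
a nonzero return from the hook as "no remote flush was performed" and
falls back to an IPI-based flush of every vCPU.  A simplified sketch of
the caller side, modeled on kvm_flush_remote_tlbs() (condensed, not
verbatim):

	void flush_remote_tlbs_sketch(struct kvm *kvm)
	{
		/* Try the optimized per-arch hook first; on failure,
		 * kick every vCPU with KVM_REQ_TLB_FLUSH instead. */
		if (static_call(kvm_x86_tlb_remote_flush)(kvm))
			kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH);
	}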
b=Al7Kt5LPVQba+JR7wrfKi4GzTlg+TBRaqWndgbWbr6Ui7mD265pMGQb/ /pdS+8T5JmMzOP8WbnVRHUjF2cEHvzS9xpBGWPiT5He6PoZDCQZbdddMj igg13jgrq9eAtERRbk8We4CadMNhSWv2TMCCZBRcfbokWercZeW8yo+sK NEeoUIDsjRftvy2CREyYVyxaEP5TCePs1DS4Ace2e+zFNM2T6rUgf+H6G Jb4Ks9NgWpNnFafbv2yvPCPhnGefFqgcxg+4BPB7y9OWebaJieHozzEWo KUfUQfwqstBeNo/R7Wvj2OrVbPL+CcOhnJ9cRD7VKX/fy5m4IMH2LXMqL A==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870117" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870117" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:00 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807682" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807682" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:00 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 055/105] KVM: TDX: TDP MMU TDX support Date: Fri, 30 Sep 2022 03:17:49 -0700 Message-Id: <41aa5461e9df28a45826389bb319d30c2b26560d.1664530908.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Implement hooks of TDP MMU for TDX backend. TLB flush, TLB shootdown, propagating the change private EPT entry to Secure EPT and freeing Secure EPT page. TLB flush handles both shared EPT and private EPT. It flushes shared EPT same as VMX. It also waits for the TDX TLB shootdown. For the hook to free Secure EPT page, unlinks the Secure EPT page from the Secure EPT so that the page can be freed to OS. Propagate the entry change to Secure EPT. The possible entry changes are present -> non-present(zapping) and non-present -> present(population). On population just link the Secure EPT page or the private guest page to the Secure EPT by TDX SEAMCALL. Because TDP MMU allows concurrent zapping/population, zapping requires synchronous TLB shoot down with the frozen EPT entry. It zaps the secure entry, increments TLB counter, sends IPI to remote vcpus to trigger TLB flush, and then unlinks the private guest page from the Secure EPT. For simplicity, batched zapping with exclude lock is handled as concurrent zapping. Although it's inefficient, it can be optimized in the future. For MMIO SPTE, the spte value changes as follows. initial value (suppress VE bit is set) -> Guest issues MMIO and triggers EPT violation -> KVM updates SPTE value to MMIO value (suppress VE bit is cleared) -> Guest MMIO resumes. 
It triggers VE exception in guest TD -> Guest VE handler issues TDG.VP.VMCALL -> KVM handles MMIO -> Guest VE handler resumes its execution after MMIO instruction Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu/spte.c | 3 +- arch/x86/kvm/mmu/tdp_mmu.c | 4 + arch/x86/kvm/vmx/main.c | 61 +++++++- arch/x86/kvm/vmx/tdx.c | 289 ++++++++++++++++++++++++++++++++++++- arch/x86/kvm/vmx/tdx.h | 21 +++ arch/x86/kvm/vmx/x86_ops.h | 4 + 6 files changed, 374 insertions(+), 8 deletions(-) diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 8f468ee2b985..3167c12d9c74 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -74,7 +74,8 @@ u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsign= ed int access) u64 spte =3D generation_mmio_spte_mask(gen); u64 gpa =3D gfn << PAGE_SHIFT; =20 - WARN_ON_ONCE(!vcpu->kvm->arch.shadow_mmio_value); + WARN_ON_ONCE(!vcpu->kvm->arch.shadow_mmio_value && + !kvm_gfn_shared_mask(vcpu->kvm)); =20 access &=3D shadow_mmio_access_mask; spte |=3D vcpu->kvm->arch.shadow_mmio_value | access; diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index cb25168dbbd5..784dcfaed505 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -566,6 +566,10 @@ static void handle_changed_private_spte(struct kvm *kv= m, gfn_t gfn, } else if (was_leaf) { /* non-present -> non-present doesn't make sense. */ KVM_BUG_ON(!was_present, kvm); + /* + * Zap private leaf SPTE. Zapping private table is done + * below in handle_removed_tdp_mmu_page(). + */ static_call(kvm_x86_zap_private_spte)(kvm, gfn, level); static_call(kvm_x86_remove_private_spte)(kvm, gfn, level, old_pfn); } diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 025315e0934b..10aacde3a40a 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -100,6 +100,55 @@ static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool = init_event) return vmx_vcpu_reset(vcpu, init_event); } =20 +static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return tdx_flush_tlb(vcpu); + + vmx_flush_tlb_all(vcpu); +} + +static void vt_flush_tlb_current(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return tdx_flush_tlb(vcpu); + + vmx_flush_tlb_current(vcpu); +} + +static int vt_tlb_remote_flush(struct kvm *kvm) +{ + if (is_td(kvm)) + return tdx_sept_tlb_remote_flush(kvm); + + return vmx_tlb_remote_flush(kvm); +} + +static int vt_tlb_remote_flush_with_range(struct kvm *kvm, + struct kvm_tlb_range *range) +{ + if (is_td(kvm)) + return -EOPNOTSUPP; /* fall back to tlb_remote_flush */ + + return vmx_tlb_remote_flush_with_range(kvm, range); +} + +static void vt_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_flush_tlb_gva(vcpu, addr); +} + +static void vt_flush_tlb_guest(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_flush_tlb_guest(vcpu); +} + static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) { @@ -176,12 +225,12 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .set_rflags =3D vmx_set_rflags, .get_if_flag =3D vmx_get_if_flag, =20 - .flush_tlb_all =3D vmx_flush_tlb_all, - .flush_tlb_current =3D vmx_flush_tlb_current, - .tlb_remote_flush =3D vmx_tlb_remote_flush, - .tlb_remote_flush_with_range =3D vmx_tlb_remote_flush_with_range, - .flush_tlb_gva =3D vmx_flush_tlb_gva, - .flush_tlb_guest =3D vmx_flush_tlb_guest, + .flush_tlb_all =3D vt_flush_tlb_all, + .flush_tlb_current =3D vt_flush_tlb_current, + .tlb_remote_flush =3D 
vt_tlb_remote_flush, + .tlb_remote_flush_with_range =3D vt_tlb_remote_flush_with_range, + .flush_tlb_gva =3D vt_flush_tlb_gva, + .flush_tlb_guest =3D vt_flush_tlb_guest, =20 .vcpu_pre_run =3D vmx_vcpu_pre_run, .vcpu_run =3D vmx_vcpu_run, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 0312172c98cb..e08ead40c964 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -6,7 +6,9 @@ #include "capabilities.h" #include "x86_ops.h" #include "tdx.h" +#include "vmx.h" #include "x86.h" +#include "mmu.h" =20 #undef pr_fmt #define pr_fmt(fmt) "tdx: " fmt @@ -283,10 +285,28 @@ int tdx_vm_init(struct kvm *kvm) { struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); =20 + /* + * Because guest TD is protected, VMM can't parse the instruction in TD. + * Instead, guest uses MMIO hypercall. For unmodified device driver, + * #VE needs to be injected for MMIO and #VE handler in TD converts MMIO + * instruction into MMIO hypercall. + * + * SPTE value for MMIO needs to be setup so that #VE is injected into + * TD instead of triggering EPT MISCONFIG. + * - RWX=3D0 so that EPT violation is triggered. + * - suppress #VE bit is cleared to inject #VE. + */ + kvm_mmu_set_mmio_spte_value(kvm, 0); + + /* TODO: Enable 2mb and 1gb large page support. */ + kvm->arch.tdp_max_page_level =3D PG_LEVEL_4K; + /* vCPUs can't be created until after KVM_TDX_INIT_VM. */ kvm->max_vcpus =3D 0; kvm_tdx->hkid =3D -1; =20 + spin_lock_init(&kvm_tdx->seamcall_lock); + /* * This function initializes only KVM software construct. It doesn't * initialize TDX stuff, e.g. TDCS, TDR, TDCX, HKID etc. @@ -389,6 +409,246 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t ro= ot_hpa, int pgd_level) td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa & PAGE_MASK); } =20 +static void tdx_unpin_pfn(struct kvm *kvm, kvm_pfn_t pfn) +{ + struct page *page =3D pfn_to_page(pfn); + + put_page(page); +} + +static void __tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + hpa_t hpa =3D pfn_to_hpa(pfn); + gpa_t gpa =3D gfn_to_gpa(gfn); + struct tdx_module_output out; + u64 err; + + if (WARN_ON_ONCE(is_error_noslot_pfn(pfn) || + !kvm_pfn_to_refcounted_page(pfn))) + return; + + /* TODO: handle large pages. */ + if (KVM_BUG_ON(level !=3D PG_LEVEL_4K, kvm)) + return; + + /* To prevent page migration, do nothing on mmu notifier. */ + get_page(pfn_to_page(pfn)); + + if (likely(is_td_finalized(kvm_tdx))) { + err =3D tdh_mem_page_aug(kvm_tdx->tdr.pa, gpa, hpa, &out); + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error(TDH_MEM_PAGE_AUG, err, &out); + put_page(pfn_to_page(pfn)); + } + return; + } +} + +static void tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + + spin_lock(&kvm_tdx->seamcall_lock); + __tdx_sept_set_private_spte(kvm, gfn, level, pfn); + spin_unlock(&kvm_tdx->seamcall_lock); +} + +static void tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn) +{ + int tdx_level =3D pg_level_to_tdx_sept_level(level); + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + gpa_t gpa =3D gfn_to_gpa(gfn); + hpa_t hpa =3D pfn_to_hpa(pfn); + hpa_t hpa_with_hkid; + struct tdx_module_output out; + u64 err =3D 0; + + /* TODO: handle large pages. 
*/ + if (KVM_BUG_ON(level !=3D PG_LEVEL_4K, kvm)) + return; + + spin_lock(&kvm_tdx->seamcall_lock); + if (is_hkid_assigned(kvm_tdx)) { + err =3D tdh_mem_page_remove(kvm_tdx->tdr.pa, gpa, tdx_level, &out); + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error(TDH_MEM_PAGE_REMOVE, err, &out); + goto unlock; + } + + hpa_with_hkid =3D set_hkid_to_hpa(hpa, (u16)kvm_tdx->hkid); + err =3D tdh_phymem_page_wbinvd(hpa_with_hkid); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err, NULL); + goto unlock; + } + } else + /* + * The HKID assigned to this TD was already freed and cache + * was already flushed. We don't have to flush again. + */ + err =3D tdx_reclaim_page((unsigned long)__va(hpa), hpa, false, 0); + +unlock: + spin_unlock(&kvm_tdx->seamcall_lock); + + if (!err) + tdx_unpin_pfn(kvm, pfn); +} + +static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, void *private_spt) +{ + int tdx_level =3D pg_level_to_tdx_sept_level(level); + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + gpa_t gpa =3D gfn_to_gpa(gfn); + hpa_t hpa =3D __pa(private_spt); + struct tdx_module_output out; + u64 err; + + spin_lock(&kvm_tdx->seamcall_lock); + err =3D tdh_mem_sept_add(kvm_tdx->tdr.pa, gpa, tdx_level, hpa, &out); + spin_unlock(&kvm_tdx->seamcall_lock); + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error(TDH_MEM_SEPT_ADD, err, &out); + return -EIO; + } + + return 0; +} + +static void tdx_sept_zap_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level) +{ + int tdx_level =3D pg_level_to_tdx_sept_level(level); + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + gpa_t gpa =3D gfn_to_gpa(gfn); + struct tdx_module_output out; + u64 err; + + /* For now large page isn't supported yet. */ + WARN_ON_ONCE(level !=3D PG_LEVEL_4K); + spin_lock(&kvm_tdx->seamcall_lock); + err =3D tdh_mem_range_block(kvm_tdx->tdr.pa, gpa, tdx_level, &out); + spin_unlock(&kvm_tdx->seamcall_lock); + if (KVM_BUG_ON(err, kvm)) + pr_tdx_error(TDH_MEM_RANGE_BLOCK, err, &out); +} + +/* + * TLB shoot down procedure: + * There is a global epoch counter and each vcpu has local epoch counter. + * - TDH.MEM.RANGE.BLOCK(TDR. level, range) on one vcpu + * This blocks the subsequenct creation of TLB translation on that range. + * This corresponds to clear the present bit(all RXW) in EPT entry + * - TDH.MEM.TRACK(TDR): advances the epoch counter which is global. + * - IPI to remote vcpus + * - TDExit and re-entry with TDH.VP.ENTER on remote vcpus + * - On re-entry, TDX module compares the local epoch counter with the glo= bal + * epoch counter. If the local epoch counter is older than the global e= poch + * counter, update the local epoch counter and flushes TLB. + */ +static void tdx_track(struct kvm_tdx *kvm_tdx) +{ + u64 err; + + KVM_BUG_ON(!is_hkid_assigned(kvm_tdx), &kvm_tdx->kvm); + /* If TD isn't finalized, it's before any vcpu running. */ + if (unlikely(!is_td_finalized(kvm_tdx))) + return; + + /* + * tdx_flush_tlb() waits for this function to issue TDH.MEM.TRACK() by + * the counter. The counter is used instead of bool because multiple + * TDH_MEM_TRACK() can be issued concurrently by multiple vcpus. + */ + atomic_inc(&kvm_tdx->tdh_mem_track); + /* + * KVM_REQ_TLB_FLUSH waits for the empty IPI handler, ack_flush(), with + * KVM_REQUEST_WAIT. + */ + kvm_make_all_cpus_request(&kvm_tdx->kvm, KVM_REQ_TLB_FLUSH); + + spin_lock(&kvm_tdx->seamcall_lock); + err =3D tdh_mem_track(kvm_tdx->tdr.pa); + spin_unlock(&kvm_tdx->seamcall_lock); + + /* Release remote vcpu waiting for TDH.MEM.TRACK in tdx_flush_tlb(). 
*/ + atomic_dec(&kvm_tdx->tdh_mem_track); + + if (KVM_BUG_ON(err, &kvm_tdx->kvm)) + pr_tdx_error(TDH_MEM_TRACK, err, NULL); + +} + +static int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, void *private_spt) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + int ret; + + /* + * free_private_spt() is (obviously) called when a shadow page is being + * zapped. KVM doesn't (yet) zap private SPs while the TD is active. + * Note: this function is for private shadow pages, not for private + * guest pages. A private guest page can be zapped while the TD is + * active, e.g. on shared <-> private conversion and slot move/deletion. + * + * TODO: large page support. Once large pages are supported, the S-EPT + * page can be freed when a 4K page is promoted to a 2M/1G page while the + * TD is running. In that case, flush the cache and TDH.PAGE.RECLAIM. + */ + if (KVM_BUG_ON(is_hkid_assigned(to_kvm_tdx(kvm)), kvm)) + return -EINVAL; + + /* + * The HKID assigned to this TD was already freed and the cache was + * already flushed. We don't have to flush again. + */ + spin_lock(&kvm_tdx->seamcall_lock); + ret =3D tdx_reclaim_page((unsigned long)private_spt, __pa(private_spt), false, 0); + spin_unlock(&kvm_tdx->seamcall_lock); + + return ret; +} + +int tdx_sept_tlb_remote_flush(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx; + + if (!is_td(kvm)) + return -EOPNOTSUPP; + + kvm_tdx =3D to_kvm_tdx(kvm); + if (is_hkid_assigned(kvm_tdx)) + tdx_track(kvm_tdx); + + return 0; +} + +static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn) +{ + /* + * TDX requires TLB tracking before dropping a private page. Do + * it here, although it is also done later. + * If the HKID isn't assigned, the guest is being destroyed and no vcpu + * runs further. TLB shootdown isn't needed. + * + * TODO: implement a with_range version for optimization. + * kvm_flush_remote_tlbs_with_address(kvm, gfn, 1); + * =3D> tdx_sept_tlb_remote_flush_with_range(kvm, gfn, + * KVM_PAGES_PER_HPAGE(level)); + */ + if (is_hkid_assigned(to_kvm_tdx(kvm))) + kvm_flush_remote_tlbs(kvm); + + tdx_sept_drop_private_spte(kvm, gfn, level, pfn); +} + int tdx_dev_ioctl(void __user *argp) { struct kvm_tdx_capabilities __user *user_caps; @@ -801,6 +1061,25 @@ static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd) return ret; } =20 +void tdx_flush_tlb(struct kvm_vcpu *vcpu) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(vcpu->kvm); + struct kvm_mmu *mmu =3D vcpu->arch.mmu; + u64 root_hpa =3D mmu->root.hpa; + + /* Flush the shared EPTP, if it's valid. */ + if (VALID_PAGE(root_hpa)) + ept_sync_context(construct_eptp(vcpu, root_hpa, + mmu->root_role.level)); + + /* + * See tdx_track(). Wait for the TLB shootdown initiator to finish + * TDH_MEM_TRACK() so that the TLB is flushed on the next TDENTER. 
+ */ + while (atomic_read(&kvm_tdx->tdh_mem_track)) + cpu_relax(); +} + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { struct kvm_tdx_cmd tdx_cmd; @@ -1010,8 +1289,16 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x8= 6_ops) if (!r) r =3D tdx_module_setup(); vmxoff_all(); + if (r) + return r; =20 - return r; + x86_ops->link_private_spt =3D tdx_sept_link_private_spt; + x86_ops->free_private_spt =3D tdx_sept_free_private_spt; + x86_ops->set_private_spte =3D tdx_sept_set_private_spte; + x86_ops->remove_private_spte =3D tdx_sept_remove_private_spte; + x86_ops->zap_private_spte =3D tdx_sept_zap_private_spte; + + return 0; } =20 void tdx_hardware_unsetup(void) diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 4ce236a0cab2..5f25f866291e 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -24,8 +24,23 @@ struct kvm_tdx { int hkid; =20 bool finalized; + atomic_t tdh_mem_track; =20 u64 tsc_offset; + + /* + * Some SEAMCALLs try to lock TD resources (e.g. Secure-EPT) they use or + * update. If TDX module fails to obtain the lock, it returns + * TDX_OPERAND_BUSY error without spinning. It's VMM/OS responsibility + * to retry or guarantee no contention because TDX module has the + * restriction on cpu cycles it can spend and VMM/OS knows better + * vcpu scheduling. + * + * TDP MMU uses read lock of kvm.arch.mmu_lock so TDP MMU code can be + * run concurrently with multiple vCPUs. Lock to prevent seamcalls from + * running concurrently when TDP MMU is enabled. + */ + spinlock_t seamcall_lock; }; =20 struct vcpu_tdx { @@ -181,6 +196,12 @@ static __always_inline u64 td_tdcs_exec_read64(struct = kvm_tdx *kvm_tdx, u32 fiel return out.r8; } =20 +static __always_inline int pg_level_to_tdx_sept_level(enum pg_level level) +{ + WARN_ON_ONCE(level =3D=3D PG_LEVEL_NONE); + return level - 1; +} + #else struct kvm_tdx { struct kvm kvm; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index f3c6ab7e517f..7d6d9a6c2562 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -151,6 +151,8 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_ev= ent); int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); =20 +void tdx_flush_tlb(struct kvm_vcpu *vcpu); +int tdx_sept_tlb_remote_flush(struct kvm *kvm); void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_leve= l); #else static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return= 0; } @@ -171,6 +173,8 @@ static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu= , bool init_event) {} static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { retur= n -EOPNOTSUPP; } static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)= { return -EOPNOTSUPP; } =20 +static inline void tdx_flush_tlb(struct kvm_vcpu *vcpu) {} +static inline int tdx_sept_tlb_remote_flush(struct kvm *kvm) { return 0; } static inline void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,= int root_level) {} #endif =20 --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BF6B4C433FE for ; Fri, 30 Sep 2022 10:25:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232561AbiI3KZ4 (ORCPT ); Fri, 30 Sep 2022 06:25:56 -0400 Received: from 
lindbergh.monkeyblade.net ([23.128.96.19]:40536 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231605AbiI3KUk (ORCPT ); Fri, 30 Sep 2022 06:20:40 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3710C20856B; Fri, 30 Sep 2022 03:19:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533158; x=1696069158; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=A805Zxruh6yYNtylIzj9f+DsWSAMkPS9vWZxb71Yd8s=; b=fK7hIKNGBr84y5QUFJMYXERkj7iqE+i3PWBE/d0nHTT9tiN5UfRfOD/s ujtCnM8y1H4s/b9qpeTRY7LT4SmlIVx6mwhs8z7Hkl7YL66TdIxPDu0kA feIIijFXMzqbiJhrtYQ5Dq7VIG0DCsoyIujQ+8RAwox8oG0MGJ8NJ58ub wUNAz7M4smHLHeyqTs282nAz8W6aGCpzbm91l1i7z2ZyxRPAZI6JC0YDj beBqPc09fJILs0oaGGtvrzWN0DuF+J3zP9qRlGC5kkqP5o/JESbxUmPAM EuB6+/CNEC3sG6AUriul/KJiiGXtX7xYnzCpynnJKoDlpbTmp42cC9Usa g==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870118" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870118" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:00 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807686" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807686" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:00 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 056/105] [MARKER] The start of TDX KVM patch series: KVM TDP MMU MapGPA Date: Fri, 30 Sep 2022 03:17:50 -0700 Message-Id: <66be3394bafbbe603b7f637c6edc3c12992679c4.1664530908.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This empty commit is to mark the start of patch series of KVM TDP MMU MapGPA. Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/intel-tdx-layer-status.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentat= ion/virt/kvm/intel-tdx-layer-status.rst index c3e675bea802..5797d172176d 100644 --- a/Documentation/virt/kvm/intel-tdx-layer-status.rst +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -11,6 +11,7 @@ What qemu can do - TDX VM TYPE is exposed to Qemu. - Qemu can create/destroy guest of TDX vm type. - Qemu can create/destroy vcpu of TDX vm type. +- Qemu can populate initial guest memory image. 
=20 Patch Layer status ------------------ @@ -19,7 +20,7 @@ Patch Layer status * TDX architectural definitions: Applied * TD VM creation/destruction: Applied * TD vcpu creation/destruction: Applied -* TDX EPT violation: Applying +* TDX EPT violation: Applied * TD finalization: Not yet * TD vcpu enter/exit: Not yet * TD vcpu interrupts/exit/hypercall: Not yet --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 56EACC433FE for ; Fri, 30 Sep 2022 10:26:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232585AbiI3K0V (ORCPT ); Fri, 30 Sep 2022 06:26:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42276 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231932AbiI3KVE (ORCPT ); Fri, 30 Sep 2022 06:21:04 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 36E021E05C9; Fri, 30 Sep 2022 03:19:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533158; x=1696069158; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=HLHgtHqsIjWYwcn8V6phxCpOor1CcaO2v8wpXvHA2G0=; b=EdzkZTqqxndF6dltWZKZaA5YIOn3rX099LateAFGlwhBIC6vYs6D30k/ jB/RjW7uZM5Z6AOdw4Q61PkDIgQN0ovfO4AxXC9PjNYTW5aGri1jG4keh RHpw2AwgcRsi9mtPYQ4t0mAjJgJ010f0Uvk5gqPdmgN1wITgORwlQPlse hrMxHAuy6ZhN/prfICve5DRuWKqiOpGQDLdAN6OBlq63lwt5oIhe9cBey 0z/WmixXmLbASdwLxgLtuJZ++SaF8pgI0AoeZPogEGMRX8DevjndmLGQB RtEFxM9cyc0E3P6h5dVbqIlo40VuSUcmE2QGORhJ4g8RVVyuxdVRlk0r+ A==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870121" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870121" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:01 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807689" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807689" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:01 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 057/105] KVM: Add functions to set GFN to private or shared Date: Fri, 30 Sep 2022 03:17:51 -0700 Message-Id: <60ecd050492a8dc393e7e2be3a2e8ad8b7856f99.1664530908.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata TDX KVM support needs to track whether GFN is private or shared. Introduce functions to set whether GFN is private or shared and pre-allocate memory for xarray. 
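For illustration, the intended calling pattern is to reserve first and then set. The wrapper function below is hypothetical and not part of this patch; a later patch in this series uses the same pattern for the MapGPA path:

	/*
	 * Hypothetical caller (example only): mark [start, end) as shared.
	 * Everything except the two new helpers and KVM_MEM_ATTR_SHARED is
	 * made up for illustration.
	 */
	static int example_set_range_shared(struct kvm *kvm, gfn_t start, gfn_t end)
	{
		int r;

		/* Pre-allocate xarray entries so the set step can't hit -ENOMEM. */
		r =3D kvm_vm_reserve_mem_attr(kvm, start, end);
		if (r)
			return r;

		/* Record the attribute; the arch hook is notified for the range. */
		return kvm_vm_set_mem_attr(kvm, KVM_MEM_ATTR_SHARED, start, end);
	}
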
Suggested-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/kvm/Kconfig | 2 +- include/linux/kvm_host.h | 17 +++++++++++- virt/kvm/kvm_main.c | 56 ++++++++++++++++++++++++++++++++++++++++ 3 files changed, 73 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 350a921b15cb..e968ecab4d0a 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -49,7 +49,7 @@ config KVM select SRCU select INTERVAL_TREE select HAVE_KVM_PM_NOTIFIER if PM - select HAVE_KVM_PRIVATE_MEM if X86_64 + select HAVE_KVM_PRIVATE_MEM if KVM_MMU_PRIVATE help Support hosting fully virtualized guest machines using hardware virtualization extensions. You will need a fairly recent diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index eca3ca116412..5e11ccbc23af 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -2294,9 +2294,20 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu) #define KVM_MEM_ATTR_PRIVATE 0x0002 =20 #ifdef __KVM_HAVE_ARCH_UPDATE_MEM_ATTR +/* memory attr on [start, end) */ +int kvm_vm_reserve_mem_attr(struct kvm *kvm, gfn_t start, gfn_t end); +int kvm_vm_set_mem_attr(struct kvm *kvm, int attr, gfn_t start, gfn_t end); void kvm_arch_update_mem_attr(struct kvm *kvm, unsigned int attr, gfn_t start, gfn_t end); #else +static inline int kvm_vm_reserve_mem_attr(struct kvm *kvm, gfn_t start, gfn_t end) +{ + return -EOPNOTSUPP; +} +static inline int kvm_vm_set_mem_attr(struct kvm *kvm, int attr, gfn_t start, gfn_t end) +{ + return -EOPNOTSUPP; +} static inline void kvm_arch_update_mem_attr(struct kvm *kvm, unsigned int attr, gfn_t start, gfn_t end) { @@ -2326,7 +2337,11 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn) { return !xa_load(&kvm->mem_attr_array, gfn); } - +#else +static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn) +{ + return false; +} #endif /* CONFIG_HAVE_KVM_PRIVATE_MEM */ =20 #endif diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 0dbd1734a246..20c46f26691d 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1019,6 +1019,62 @@ static inline void kvm_private_mem_unregister(struct kvm_memory_slot *slot) inaccessible_unregister_notifier(slot->private_file, &slot->notifier); } =20 +/* + * Reserve memory for [start, end) so that the next set operation won't fail + * with -ENOMEM. + */ +int kvm_vm_reserve_mem_attr(struct kvm *kvm, gfn_t start, gfn_t end) +{ + int r =3D 0; + gfn_t gfn; + + xa_lock(&kvm->mem_attr_array); + for (gfn =3D start; gfn < end; gfn++) { + r =3D __xa_insert(&kvm->mem_attr_array, gfn, NULL, GFP_KERNEL_ACCOUNT); + if (r =3D=3D -EBUSY) + r =3D 0; + if (r) + break; + } + xa_unlock(&kvm->mem_attr_array); + + return r; +} +EXPORT_SYMBOL_GPL(kvm_vm_reserve_mem_attr); + +/* Set memory attr for [start, end) */ +int kvm_vm_set_mem_attr(struct kvm *kvm, int attr, gfn_t start, gfn_t end) +{ + void *entry; + gfn_t gfn; + int r; + + /* By default, the entry is private. 
*/ + switch (attr) { + case KVM_MEM_ATTR_PRIVATE: + entry =3D NULL; + break; + case KVM_MEM_ATTR_SHARED: + entry =3D xa_mk_value(KVM_MEM_ATTR_SHARED); + break; + default: + WARN_ON_ONCE(1); + return -EINVAL; + } + + WARN_ON_ONCE(start >=3D end); + for (gfn =3D start; gfn < end; gfn++) { + r =3D xa_err(xa_store(&kvm->mem_attr_array, gfn, entry, + GFP_KERNEL_ACCOUNT)); + if (r) + break; + } + if (start < gfn) + kvm_arch_update_mem_attr(kvm, attr, start, gfn); + return r; +} +EXPORT_SYMBOL_GPL(kvm_vm_set_mem_attr); + #else /* !CONFIG_HAVE_KVM_PRIVATE_MEM */ =20 static inline void kvm_private_mem_register(struct kvm_memory_slot *slot) --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 650E3C433FE for ; Fri, 30 Sep 2022 10:26:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232613AbiI3K0m (ORCPT ); Fri, 30 Sep 2022 06:26:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34792 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231981AbiI3KVS (ORCPT ); Fri, 30 Sep 2022 06:21:18 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 199B715AB49; Fri, 30 Sep 2022 03:19:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533160; x=1696069160; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=51xqS3whQ2fAWcwxbOtIISmE29u7kwNJM4kAauqE20E=; b=Czatodoi3Mc03LtYH3f7ddbnikmtEbIumsR04T58lUoP1wjRQ2r6+L7m QvyPaAHiArbrr/4hi6brcMyX6h4XIpJJ9TQCyoluc+h/JKV5juUxWZVpk 0c8kypsv17VVAzhanscyFy5fWSYj/oBSHjN+Mj/yY+t5zQz/0zKA2O6o9 397nfbz6I6kP4nkt3jtJ/dkM5H55N1tFwOgXp2RCB9NIQFv3lHNi+m23g N/8jfVfhH5FNVkkTamV9ouNftyT1nTYPJAZj3J1lqj8ixJu4LJP4/2dyJ MIFjjPkQ5EcjLmRbwpBAluiCLvXiUIfZL4dZ1VJWAErgCUedKgpsEQVvl w==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870122" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870122" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:01 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807693" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807693" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:01 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , Sean Christopherson Subject: [PATCH v9 058/105] KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX Date: Fri, 30 Sep 2022 03:17:52 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Introduce a helper to directly (pun intended) fault-in a TDP page without having to go through the full page fault path. 
This allows TDX to get the resulting pfn and also allows the RET_PF_* enums to stay in mmu.c where they belong. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu.h | 3 +++ arch/x86/kvm/mmu/mmu.c | 39 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 42 insertions(+) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index f86fb04fb7d7..3cd969bf5b69 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -154,6 +154,9 @@ static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vc= pu) vcpu->arch.mmu->root_role.level); } =20 +kvm_pfn_t kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa, + u32 error_code, int max_level); + /* * Check if a given access (described through the I/D, W/R and U/S bits of= a * page fault error code pfec) causes a permission fault with the given PTE diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 6bbfaa24d06c..d463bc13c094 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4451,6 +4451,45 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct= kvm_page_fault *fault) return direct_page_fault(vcpu, fault); } =20 +kvm_pfn_t kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa, + u32 error_code, int max_level) +{ + int r; + struct kvm_page_fault fault =3D (struct kvm_page_fault) { + .addr =3D gpa, + .error_code =3D error_code, + .exec =3D error_code & PFERR_FETCH_MASK, + .write =3D error_code & PFERR_WRITE_MASK, + .present =3D error_code & PFERR_PRESENT_MASK, + .rsvd =3D error_code & PFERR_RSVD_MASK, + .user =3D error_code & PFERR_USER_MASK, + .prefetch =3D false, + .is_tdp =3D true, + .nx_huge_page_workaround_enabled =3D is_nx_huge_page_enabled(vcpu->kvm), + .is_private =3D kvm_is_private_gpa(vcpu->kvm, gpa), + }; + + if (mmu_topup_memory_caches(vcpu, false)) + return KVM_PFN_ERR_FAULT; + + /* + * Loop on the page fault path to handle the case where an mmu_notifier + * invalidation triggers RET_PF_RETRY. In the normal page fault path, + * KVM needs to resume the guest in case the invalidation changed any + * of the page fault properties, i.e. the gpa or error code. For this + * path, the gpa and error code are fixed by the caller, and the caller + * expects failure if and only if the page fault can't be fixed. 
+ */ + do { + fault.max_level =3D max_level; + fault.req_level =3D PG_LEVEL_4K; + fault.goal_level =3D PG_LEVEL_4K; + r =3D direct_page_fault(vcpu, &fault); + } while (r =3D=3D RET_PF_RETRY && !is_error_noslot_pfn(fault.pfn)); + return fault.pfn; +} +EXPORT_SYMBOL_GPL(kvm_mmu_map_tdp_page); + static void nonpaging_init_context(struct kvm_mmu *context) { context->page_fault =3D nonpaging_page_fault; --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8050EC433F5 for ; Fri, 30 Sep 2022 10:26:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232438AbiI3K00 (ORCPT ); Fri, 30 Sep 2022 06:26:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33796 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231964AbiI3KVM (ORCPT ); Fri, 30 Sep 2022 06:21:12 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 497D515AB52; Fri, 30 Sep 2022 03:19:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533161; x=1696069161; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=/Veqw9EW8tUuQFbI7sS3UqOsMHilx9sG0zWvby86DKg=; b=bC+992OB2UZhTwqLpcwc+k4kcH/sJGOFHxAEm+YJU4ZUPXJljVA7TY/b 68u5NB2+NtG0UcfmhC73ZPUHnLsQTg4D7KolsCaaMwBD7uj2zIIkxRiaa xd1AKQAUsoPKXaew1rXhiIcdOBmMBbUFJUz86VW63wxpaIqKf0LiF5a5f kJR9rnChyXHNFu7P7sFdYFpD1TIjirjYPrihmBLna7UrO4cJ+GSlZ6uuF zodDIauSSG/qQ2Vz/S0QeXZxFdOqLa0s1iSX0ilIHlZiclpVmVTLHULPB 2afLoIl/7SAkol08w3++34EBKCBNB+ExHj6PH+boGxebZiTHBVnhtnWGb A==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870123" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870123" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:01 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807696" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807696" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:01 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 059/105] KVM: x86/tdp_mmu: implement MapGPA hypercall for TDX Date: Fri, 30 Sep 2022 03:17:53 -0700 Message-Id: <79cfc130e36a7754977af7460dc752188073d103.1664530908.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata The TDX Guest-Hypervisor communication interface(GHCI) specification defines MapGPA hypercall for guest TD to request the host VMM to map given GPA range as private or shared. It means the guest TD uses the GPA as shared (or private). The GPA won't be used as private (or shared). VMM should enforce GPA usage. VMM doesn't have to map the GPA on the hypercall request. - Zap the aliased region. 
If shared (or private) GPA is requested, zap private (or shared) GPA (modulo shared bit). - Record the request GPA is shared (or private) by kvm.mem_attr_array. - Don't map GPA. The GPA is mapped on the next EPT violation. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu.h | 5 ++++ arch/x86/kvm/mmu/mmu.c | 60 ++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/mmu/tdp_mmu.c | 35 ++++++++++++++++++++++ arch/x86/kvm/mmu/tdp_mmu.h | 3 ++ 4 files changed, 103 insertions(+) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 3cd969bf5b69..d67ca298983c 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -217,6 +217,11 @@ static inline u8 permission_fault(struct kvm_vcpu *vcp= u, struct kvm_mmu *mmu, =20 int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu); =20 +int __kvm_mmu_map_gpa(struct kvm *kvm, gfn_t *startp, gfn_t end, + bool map_private); +int kvm_mmu_map_gpa(struct kvm_vcpu *vcpu, gfn_t *startp, gfn_t end, + bool map_private); + int kvm_mmu_post_init_vm(struct kvm *kvm); void kvm_mmu_pre_destroy_vm(struct kvm *kvm); =20 diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index d463bc13c094..9a4acf2ad694 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -6743,6 +6743,66 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, = u64 gen) } } =20 +int __kvm_mmu_map_gpa(struct kvm *kvm, gfn_t *startp, gfn_t end, + bool map_private) +{ + gfn_t start =3D *startp; + int attr; + int ret; + + if (!kvm_gfn_shared_mask(kvm)) + return -EOPNOTSUPP; + + attr =3D map_private ? KVM_MEM_ATTR_PRIVATE : KVM_MEM_ATTR_SHARED; + start =3D start & ~kvm_gfn_shared_mask(kvm); + end =3D end & ~kvm_gfn_shared_mask(kvm); + + /* + * To make the following kvm_vm_set_mem_attr() success within spinlock + * without memory allocation. + */ + ret =3D kvm_vm_reserve_mem_attr(kvm, start, end); + if (ret) + return ret; + + write_lock(&kvm->mmu_lock); + if (is_tdp_mmu_enabled(kvm)) { + gfn_t s =3D start; + + ret =3D kvm_tdp_mmu_map_gpa(kvm, &s, end, map_private); + if (!ret) { + KVM_BUG_ON(kvm_vm_set_mem_attr(kvm, attr, start, end), kvm); + } else if (ret =3D=3D -EAGAIN) { + KVM_BUG_ON(kvm_vm_set_mem_attr(kvm, attr, start, s), kvm); + start =3D s; + } + } else { + ret =3D -EOPNOTSUPP; + } + write_unlock(&kvm->mmu_lock); + + if (ret =3D=3D -EAGAIN) { + if (map_private) + *startp =3D kvm_gfn_private(kvm, start); + else + *startp =3D kvm_gfn_shared(kvm, start); + } + return ret; +} +EXPORT_SYMBOL_GPL(__kvm_mmu_map_gpa); + +int kvm_mmu_map_gpa(struct kvm_vcpu *vcpu, gfn_t *startp, gfn_t end, + bool map_private) +{ + struct kvm_mmu *mmu =3D vcpu->arch.mmu; + + if (!VALID_PAGE(mmu->root.hpa) || !VALID_PAGE(mmu->private_root_hpa)) + return -EINVAL; + + return __kvm_mmu_map_gpa(vcpu->kvm, startp, end, map_private); +} +EXPORT_SYMBOL_GPL(kvm_mmu_map_gpa); + static unsigned long mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) { diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 784dcfaed505..7078c75d7103 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -2111,6 +2111,41 @@ bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm, return spte_set; } =20 +int kvm_tdp_mmu_map_gpa(struct kvm *kvm, + gfn_t *startp, gfn_t end, bool map_private) +{ + struct kvm_mmu_page *root; + gfn_t start =3D *startp; + bool flush =3D false; + int i; + + lockdep_assert_held_write(&kvm->mmu_lock); + KVM_BUG_ON(start & kvm_gfn_shared_mask(kvm), kvm); + KVM_BUG_ON(end & kvm_gfn_shared_mask(kvm), kvm); + + kvm_mmu_invalidate_begin(kvm, start, end); + for (i 
=3D 0; i < KVM_ADDRESS_SPACE_NUM; i++) { + for_each_tdp_mmu_root_yield_safe(kvm, root, i) { + if (is_private_sp(root) =3D=3D map_private) + continue; + + /* + * TODO: If necessary, return to the caller with -EAGAIN + * instead of yield-and-resume within + * tdp_mmu_zap_leafs(). + */ + flush =3D tdp_mmu_zap_leafs(kvm, root, start, end, + /*can_yield=3D*/true, flush, + /*zap_private=3D*/is_private_sp(root)); + } + } + if (flush) + kvm_flush_remote_tlbs_with_address(kvm, start, end - start); + kvm_mmu_invalidate_end(kvm, start, end); + + return 0; +} + /* * Return the level of the lowest level SPTE added to sptes. * That SPTE may be non-present. diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index 695175c921a5..cb13bc1c3679 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -51,6 +51,9 @@ void kvm_tdp_mmu_try_split_huge_pages(struct kvm *kvm, gfn_t start, gfn_t end, int target_level, bool shared); =20 +int kvm_tdp_mmu_map_gpa(struct kvm *kvm, + gfn_t *startp, gfn_t end, bool map_private); + static inline void kvm_tdp_mmu_walk_lockless_begin(void) { rcu_read_lock(); --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 68F03C433F5 for ; Fri, 30 Sep 2022 10:27:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232463AbiI3K07 (ORCPT ); Fri, 30 Sep 2022 06:26:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33896 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231440AbiI3KVs (ORCPT ); Fri, 30 Sep 2022 06:21:48 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1D9B715AB57; Fri, 30 Sep 2022 03:19:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533162; x=1696069162; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=yzsW3HP34uHnzDykzPATauL2DDgMbwFkjBwFhJu1DX0=; b=bGCWtDFok15B4/ln6H3HOf5eT8cKCfFp6CTc6OJXgV2LGPPIvda3SXlE hj203x7aqLp5KesFvEbI9sXyJSTVf2U3lLyVZz4SIjICIU2i6ckGuc4HC mjYzjGiWzcBts/CS8+kEBdaOReNumNZYIfkcXPZy/oSXO8MFW1gXYBvg+ LsaLqXHFciD0hzhRhZ5dWPj3jRnggpAQH8gFYM9eR5YZzswvcQVinCIMo FjjKUkbinixEIDa2TYbwPsPlr0sHVapnZmyAPKJnD/oGgtVpgXmSdooUu lRBOYp1F8sMmMOffoCpQ/ncpKMIe2o6eABwlh0YHwyPnIUzScZhy4z2B6 Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870125" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870125" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:01 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807699" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807699" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:01 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 060/105] [MARKER] The start of TDX KVM patch series: TD finalization Date: Fri, 30 Sep 2022 03:17:54 -0700 Message-Id: 
X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This empty commit is to mark the start of patch series of TD finalization. Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/intel-tdx-layer-status.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentat= ion/virt/kvm/intel-tdx-layer-status.rst index 5797d172176d..53897312699f 100644 --- a/Documentation/virt/kvm/intel-tdx-layer-status.rst +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -21,11 +21,11 @@ Patch Layer status * TD VM creation/destruction: Applied * TD vcpu creation/destruction: Applied * TDX EPT violation: Applied -* TD finalization: Not yet +* TD finalization: Applying * TD vcpu enter/exit: Not yet * TD vcpu interrupts/exit/hypercall: Not yet =20 * KVM MMU GPA shared bits: Applied * KVM TDP refactoring for TDX: Applied * KVM TDP MMU hooks: Applied -* KVM TDP MMU MapGPA: Not yet +* KVM TDP MMU MapGPA: Applied --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D7AF2C433F5 for ; Fri, 30 Sep 2022 10:26:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231698AbiI3K0c (ORCPT ); Fri, 30 Sep 2022 06:26:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33818 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231974AbiI3KVN (ORCPT ); Fri, 30 Sep 2022 06:21:13 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C602615AB5E; Fri, 30 Sep 2022 03:19:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533163; x=1696069163; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ii8Icoumwvf9eUytTukPSYgAMbCQomA4LexB5FKVM7k=; b=cH+MYT2G9vO2gnSjyJiPANBaKfCkS+sIoztbH0uQfzI1sQy61D3MOXLk Dyjl3AkFuQlV+oA3PuE7c5PMFpCybyRTRsM4/gYhlU7512ailCJt9F9uN AOT1OI7sAMK4klKohbPwj84OvGJSD8kaLMUDBXxuvLOtykInz8D4byf7X LXLRNxSQqnDifBy91SsAwwuVm4yRNOaM71T0TZa5xQ9ermwDWwpFs523P 1UEVbiNJID1uN3RVHCfoSlRapbWXufgiGnnLD+SjkzURvSRBuEeNAt+S5 mLDSq6O6gmzaWeKK8do6ZdJKZJA2VDrvW0t/WFXBr03TXW6nX5Wak5XQC g==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870126" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870126" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:01 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807702" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807702" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:01 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 061/105] KVM: TDX: Create initial guest memory Date: Fri, 30 Sep 
2022 03:17:55 -0700 Message-Id: <880ebf1403ca96d56d435e2cca4d8465b7591a79.1664530908.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Because the guest memory is protected in TDX, the creation of the initial guest memory requires a dedicated TDX module API, tdh_mem_page_add, instead of directly copying the memory contents into the guest memory in the case of the default VM type. KVM MMU page fault handler callback, private_page_add, handles it. Define new subcommand, KVM_TDX_INIT_MEM_REGION, of VM-scoped KVM_MEMORY_ENCRYPT_OP. It assigns the guest page, copies the initial memory contents into the guest memory, encrypts the guest memory. At the same time, optionally it extends memory measurement of the TDX guest. It calls the KVM MMU page fault(EPT-violation) handler to trigger the callbacks for it. Signed-off-by: Isaku Yamahata --- arch/x86/include/uapi/asm/kvm.h | 9 ++ arch/x86/kvm/mmu/mmu.c | 1 + arch/x86/kvm/vmx/tdx.c | 147 +++++++++++++++++++++++++- arch/x86/kvm/vmx/tdx.h | 2 + tools/arch/x86/include/uapi/asm/kvm.h | 9 ++ 5 files changed, 163 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kv= m.h index 801b78b957fa..af91a7d27bd2 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -540,6 +540,7 @@ enum kvm_tdx_cmd_id { KVM_TDX_CAPABILITIES =3D 0, KVM_TDX_INIT_VM, KVM_TDX_INIT_VCPU, + KVM_TDX_INIT_MEM_REGION, =20 KVM_TDX_CMD_NR_MAX, }; @@ -617,4 +618,12 @@ struct kvm_tdx_init_vm { }; }; =20 +#define KVM_TDX_MEASURE_MEMORY_REGION (1UL << 0) + +struct kvm_tdx_init_mem_region { + __u64 source_addr; + __u64 gpa; + __u64 nr_pages; +}; + #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 9a4acf2ad694..c2ca85d4dd6e 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5458,6 +5458,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu) out: return r; } +EXPORT_SYMBOL(kvm_mmu_load); =20 void kvm_mmu_unload(struct kvm_vcpu *vcpu) { diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index e08ead40c964..0bd0f5945788 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -409,6 +409,21 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t roo= t_hpa, int pgd_level) td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa & PAGE_MASK); } =20 +static void tdx_measure_page(struct kvm_tdx *kvm_tdx, hpa_t gpa) +{ + struct tdx_module_output out; + u64 err; + int i; + + for (i =3D 0; i < PAGE_SIZE; i +=3D TDX_EXTENDMR_CHUNKSIZE) { + err =3D tdh_mr_extend(kvm_tdx->tdr.pa, gpa + i, &out); + if (KVM_BUG_ON(err, &kvm_tdx->kvm)) { + pr_tdx_error(TDH_MR_EXTEND, err, &out); + break; + } + } +} + static void tdx_unpin_pfn(struct kvm *kvm, kvm_pfn_t pfn) { struct page *page =3D pfn_to_page(pfn); @@ -423,27 +438,58 @@ static void __tdx_sept_set_private_spte(struct kvm *k= vm, gfn_t gfn, hpa_t hpa =3D pfn_to_hpa(pfn); gpa_t gpa =3D gfn_to_gpa(gfn); struct tdx_module_output out; + hpa_t source_pa; u64 err; =20 if (WARN_ON_ONCE(is_error_noslot_pfn(pfn) || !kvm_pfn_to_refcounted_page(pfn))) return; =20 - /* TODO: handle large pages. */ - if (KVM_BUG_ON(level !=3D PG_LEVEL_4K, kvm)) - return; - /* To prevent page migration, do nothing on mmu notifier. 
*/ get_page(pfn_to_page(pfn)); =20 + /* Build-time faults are induced and handled via TDH_MEM_PAGE_ADD. */ if (likely(is_td_finalized(kvm_tdx))) { + /* TODO: handle large pages. */ + if (KVM_BUG_ON(level !=3D PG_LEVEL_4K, kvm)) + return; + err =3D tdh_mem_page_aug(kvm_tdx->tdr.pa, gpa, hpa, &out); if (KVM_BUG_ON(err, kvm)) { pr_tdx_error(TDH_MEM_PAGE_AUG, err, &out); - put_page(pfn_to_page(pfn)); + tdx_unpin_pfn(kvm, pfn); } return; } + + /* KVM_INIT_MEM_REGION, tdx_init_mem_region(), supports only 4K page. */ + if (KVM_BUG_ON(level !=3D PG_LEVEL_4K, kvm)) + return; + + /* + * In case of TDP MMU, fault handler can run concurrently. Note + * 'source_pa' is a TD scope variable, meaning if there are multiple + * threads reaching here with all needing to access 'source_pa', it + * will break. However fortunately this won't happen, because below + * TDH_MEM_PAGE_ADD code path is only used when VM is being created + * before it is running, using KVM_TDX_INIT_MEM_REGION ioctl (which + * always uses vcpu 0's page table and protected by vcpu->mutex). + */ + if (KVM_BUG_ON(kvm_tdx->source_pa =3D=3D INVALID_PAGE, kvm)) { + tdx_unpin_pfn(kvm, pfn); + return; + } + + source_pa =3D kvm_tdx->source_pa & ~KVM_TDX_MEASURE_MEMORY_REGION; + + err =3D tdh_mem_page_add(kvm_tdx->tdr.pa, gpa, hpa, source_pa, &out); + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error(TDH_MEM_PAGE_ADD, err, &out); + tdx_unpin_pfn(kvm, pfn); + } else if ((kvm_tdx->source_pa & KVM_TDX_MEASURE_MEMORY_REGION)) + tdx_measure_page(kvm_tdx, gpa); + + kvm_tdx->source_pa =3D INVALID_PAGE; } =20 static void tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, @@ -1080,6 +1126,94 @@ void tdx_flush_tlb(struct kvm_vcpu *vcpu) cpu_relax(); } =20 +#define TDX_SEPT_PFERR PFERR_WRITE_MASK + +static int tdx_init_mem_region(struct kvm *kvm, struct kvm_tdx_cmd *cmd) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + struct kvm_tdx_init_mem_region region; + struct kvm_vcpu *vcpu; + struct page *page; + kvm_pfn_t pfn; + int idx, ret =3D 0; + + /* The BSP vCPU must be created before initializing memory regions. */ + if (!atomic_read(&kvm->online_vcpus)) + return -EINVAL; + + if (cmd->flags & ~KVM_TDX_MEASURE_MEMORY_REGION) + return -EINVAL; + + if (copy_from_user(®ion, (void __user *)cmd->data, sizeof(region))) + return -EFAULT; + + /* Sanity check */ + if (!IS_ALIGNED(region.source_addr, PAGE_SIZE) || + !IS_ALIGNED(region.gpa, PAGE_SIZE) || + !region.nr_pages || + region.gpa + (region.nr_pages << PAGE_SHIFT) <=3D region.gpa || + !kvm_is_private_gpa(kvm, region.gpa) || + !kvm_is_private_gpa(kvm, region.gpa + (region.nr_pages << PAGE_SHIFT)= )) + return -EINVAL; + + vcpu =3D kvm_get_vcpu(kvm, 0); + if (mutex_lock_killable(&vcpu->mutex)) + return -EINTR; + + vcpu_load(vcpu); + idx =3D srcu_read_lock(&kvm->srcu); + + kvm_mmu_reload(vcpu); + + while (region.nr_pages) { + if (signal_pending(current)) { + ret =3D -ERESTARTSYS; + break; + } + + if (need_resched()) + cond_resched(); + + + /* Pin the source page. 
*/ + ret =3D get_user_pages_fast(region.source_addr, 1, 0, &page); + if (ret < 0) + break; + if (ret !=3D 1) { + ret =3D -ENOMEM; + break; + } + + kvm_tdx->source_pa =3D pfn_to_hpa(page_to_pfn(page)) | + (cmd->flags & KVM_TDX_MEASURE_MEMORY_REGION); + + pfn =3D kvm_mmu_map_tdp_page(vcpu, region.gpa, TDX_SEPT_PFERR, + PG_LEVEL_4K); + if (is_error_noslot_pfn(pfn) || kvm->vm_bugged) + ret =3D -EFAULT; + else + ret =3D 0; + + put_page(page); + if (ret) + break; + + region.source_addr +=3D PAGE_SIZE; + region.gpa +=3D PAGE_SIZE; + region.nr_pages--; + } + + srcu_read_unlock(&kvm->srcu, idx); + vcpu_put(vcpu); + + mutex_unlock(&vcpu->mutex); + + if (copy_to_user((void __user *)cmd->data, ®ion, sizeof(region))) + ret =3D -EFAULT; + + return ret; +} + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { struct kvm_tdx_cmd tdx_cmd; @@ -1096,6 +1230,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) case KVM_TDX_INIT_VM: r =3D tdx_td_init(kvm, &tdx_cmd); break; + case KVM_TDX_INIT_MEM_REGION: + r =3D tdx_init_mem_region(kvm, &tdx_cmd); + break; default: r =3D -EINVAL; goto out; diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 5f25f866291e..8a2ad0b980e6 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -23,6 +23,8 @@ struct kvm_tdx { u64 xfam; int hkid; =20 + hpa_t source_pa; + bool finalized; atomic_t tdh_mem_track; =20 diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include= /uapi/asm/kvm.h index 35e3b4aa2e96..37e713ffab72 100644 --- a/tools/arch/x86/include/uapi/asm/kvm.h +++ b/tools/arch/x86/include/uapi/asm/kvm.h @@ -540,6 +540,7 @@ enum kvm_tdx_cmd_id { KVM_TDX_CAPABILITIES =3D 0, KVM_TDX_INIT_VM, KVM_TDX_INIT_VCPU, + KVM_TDX_INIT_MEM_REGION, =20 KVM_TDX_CMD_NR_MAX, }; @@ -617,4 +618,12 @@ struct kvm_tdx_init_vm { }; }; =20 +#define KVM_TDX_MEASURE_MEMORY_REGION (1UL << 0) + +struct kvm_tdx_init_mem_region { + __u64 source_addr; + __u64 gpa; + __u64 nr_pages; +}; + #endif /* _ASM_X86_KVM_H */ --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B4DDC433F5 for ; Fri, 30 Sep 2022 10:26:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232624AbiI3K0y (ORCPT ); Fri, 30 Sep 2022 06:26:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33724 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232046AbiI3KVl (ORCPT ); Fri, 30 Sep 2022 06:21:41 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ABE2F15AB6E; Fri, 30 Sep 2022 03:19:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533165; x=1696069165; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=L4Y4+HfgPTiRbb6pNrZHfFkDdcGSHG+6n0iBguwwcdQ=; b=GXDzskt+R1ODqDKwoqhfZpLib1AE0g12RQuoRYubao31BD8eKP0eszyW AxuvGPQzKG/+EO8o0MDomAfdHh9FFoTl2Mg10W4SGrgMuPMRKtq2iHt96 DkCmnp1tPCw862XmClafutLpBENfg/3fFwf1dqJqRqah18DnKXdxLITEV jsy8NmWnKig2Np845n5Z4/UM/P3NV7R25BfZPgTTpAYRlJ7fctn9kjd5Y GjYrH8/qarChkzkmxhi49z0DtELRHZIsgb+ZHO0iGmLMI4YRZx31zJgVM oAdRx7Ser+wwJ3B05/rUuhc/5R6N7zk+s1F3vYkiuId9N9960DqIMWtd+ g==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="281870127" X-IronPort-AV: 
E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="281870127" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:02 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807705" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807705" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:01 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 062/105] KVM: TDX: Finalize VM initialization Date: Fri, 30 Sep 2022 03:17:56 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata To protect the initial contents of the guest TD, the TDX module measures the guest TD during the build process as SHA-384 measurement. The measurement of the guest TD contents needs to be completed to make the guest TD ready to run. Add a new subcommand, KVM_TDX_FINALIZE_VM, for VM-scoped KVM_MEMORY_ENCRYPT_OP to finalize the measurement and mark the TDX VM ready to run. Signed-off-by: Isaku Yamahata --- arch/x86/include/uapi/asm/kvm.h | 1 + arch/x86/kvm/vmx/tdx.c | 21 +++++++++++++++++++++ tools/arch/x86/include/uapi/asm/kvm.h | 1 + 3 files changed, 23 insertions(+) diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kv= m.h index af91a7d27bd2..e409247739a4 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -541,6 +541,7 @@ enum kvm_tdx_cmd_id { KVM_TDX_INIT_VM, KVM_TDX_INIT_VCPU, KVM_TDX_INIT_MEM_REGION, + KVM_TDX_FINALIZE_VM, =20 KVM_TDX_CMD_NR_MAX, }; diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 0bd0f5945788..6c1730443497 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1214,6 +1214,24 @@ static int tdx_init_mem_region(struct kvm *kvm, stru= ct kvm_tdx_cmd *cmd) return ret; } =20 +static int tdx_td_finalizemr(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + u64 err; + + if (!is_td_initialized(kvm) || is_td_finalized(kvm_tdx)) + return -EINVAL; + + err =3D tdh_mr_finalize(kvm_tdx->tdr.pa); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MR_FINALIZE, err, NULL); + return -EIO; + } + + kvm_tdx->finalized =3D true; + return 0; +} + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { struct kvm_tdx_cmd tdx_cmd; @@ -1233,6 +1251,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) case KVM_TDX_INIT_MEM_REGION: r =3D tdx_init_mem_region(kvm, &tdx_cmd); break; + case KVM_TDX_FINALIZE_VM: + r =3D tdx_td_finalizemr(kvm); + break; default: r =3D -EINVAL; goto out; diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include= /uapi/asm/kvm.h index 37e713ffab72..0aeb4639be89 100644 --- a/tools/arch/x86/include/uapi/asm/kvm.h +++ b/tools/arch/x86/include/uapi/asm/kvm.h @@ -541,6 +541,7 @@ enum kvm_tdx_cmd_id { KVM_TDX_INIT_VM, KVM_TDX_INIT_VCPU, KVM_TDX_INIT_MEM_REGION, + KVM_TDX_FINALIZE_VM, =20 KVM_TDX_CMD_NR_MAX, }; --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org 
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CF335C433F5 for ; Fri, 30 Sep 2022 10:22:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232163AbiI3KW1 (ORCPT ); Fri, 30 Sep 2022 06:22:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33802 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231538AbiI3KTE (ORCPT ); Fri, 30 Sep 2022 06:19:04 -0400 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 393E0166F06; Fri, 30 Sep 2022 03:19:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533143; x=1696069143; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=aIudZd3USQowTJaovq/Lv5Tlgdhp2EQL5ZZ1g16g3i0=; b=hCgr4LElulrol6Po9KoOY0YjOuSYj9in15x04MQ8tgVEBGeRwNCscgLz EqJ5FJyPMCjIAIzgrrTHE65nH9GbEZAPs89+7JMU7G+PY+ZZLzVkrEd/d m6WHTtjt1vUqDxLEgM9fTb0vRafCKyyQk69kt+FzLMStx0FrNJ1Iz0xh9 JPE81qaYkpiwXDZLtwibmHVDiAlQnlDdrn5pRzw25oh6OMUZcMunH/NQC XVbnfABwFgzUCHIClmZnzhbWA99nNa4xDuYnWtZCzvJIGX+3QifSKWO4K tJ4RscIhjggjbXQAa0bU+Tb2DxyrVs0bb7lnu+xDao52F0uy9SNE/8B5h w==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="285294794" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="285294794" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:02 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807708" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807708" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:02 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 063/105] [MARKER] The start of TDX KVM patch series: TD vcpu enter/exit Date: Fri, 30 Sep 2022 03:17:57 -0700 Message-Id: <9044a72104cf9e8c0c7817a973e008136bef1358.1664530908.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This empty commit is to mark the start of patch series of TD vcpu enter/exit. Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/intel-tdx-layer-status.rst | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentat= ion/virt/kvm/intel-tdx-layer-status.rst index 53897312699f..b51e8e6b1541 100644 --- a/Documentation/virt/kvm/intel-tdx-layer-status.rst +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -12,6 +12,7 @@ What qemu can do - Qemu can create/destroy guest of TDX vm type. - Qemu can create/destroy vcpu of TDX vm type. - Qemu can populate initial guest memory image. +- Qemu can finalize guest TD. 
=20 Patch Layer status ------------------ @@ -21,8 +22,8 @@ Patch Layer status * TD VM creation/destruction: Applied * TD vcpu creation/destruction: Applied * TDX EPT violation: Applied -* TD finalization: Applying -* TD vcpu enter/exit: Not yet +* TD finalization: Applied +* TD vcpu enter/exit: Applying * TD vcpu interrupts/exit/hypercall: Not yet =20 * KVM MMU GPA shared bits: Applied --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0E820C433FE for ; Fri, 30 Sep 2022 10:22:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232272AbiI3KW4 (ORCPT ); Fri, 30 Sep 2022 06:22:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33856 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230491AbiI3KTG (ORCPT ); Fri, 30 Sep 2022 06:19:06 -0400 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 79F0E166F14; Fri, 30 Sep 2022 03:19:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533144; x=1696069144; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Rdj8i/x6WV4gTaBhgFiEgUILrvKPKY9MlLj3JObbtPM=; b=mlN1MPWCXff4rmdyS77acapS5ZitgRJMW9HhyRD6U18P6F+wlQDQSe6C 41p17eDFGFJ4ThgcFfFoF/dXETiBiKh5rf44DXbs8ZTo2SotZJW1A7Pdn swp3yXyjZ2PmzJ9DX5DSkeBWHuDxApcJO2bskmjT4mZyjxNMaMNRDNCiY YTpLXRcGbNxMQB5CDwtyIkDGlcQzr/SmRkOJjJNJxVJvAuqiamtYljTMw GY24wqnyM0s93KfCc5/pCp1qIrfUeIOA7jBSvcRKNMPmJDk6LIjyB/kg3 7LSWWH7LkZBOIXzxftVy6fdfzqpkMOeuQLt40dOPczAwPznL3YYiDHh6M w==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="285294795" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="285294795" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:02 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807711" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807711" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:02 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 064/105] KVM: TDX: Add helper assembly function to TDX vcpu Date: Fri, 30 Sep 2022 03:17:58 -0700 Message-Id: <2fedaedc09669f03c510248320709b964db11959.1664530908.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata TDX defines an API to run TDX vcpu with its own ABI. Define an assembly helper function to run TDX vcpu to hide the special ABI so that C code can call it with function call ABI. 
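From C, the helper is then just an ordinary function. As a rough sketch, based on the declaration and call site that the next patch in this series adds:

	/* Declaration with a plain function-call ABI, hiding SEAMCALL(TDENTER). */
	u64 __tdx_vcpu_run(hpa_t tdvpr, void *regs, u32 regs_mask);

	/* Run the TDX vcpu and capture the 64-bit TD-exit reason. */
	tdx->exit_reason.full =3D __tdx_vcpu_run(tdx->tdvpr.pa, vcpu->arch.regs, 0);
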
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/vmenter.S | 146 +++++++++++++++++++++++++++++++++++++ 1 file changed, 146 insertions(+) diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S index 6de96b943804..edc05c8e61a8 100644 --- a/arch/x86/kvm/vmx/vmenter.S +++ b/arch/x86/kvm/vmx/vmenter.S @@ -3,6 +3,7 @@ #include #include #include +#include #include #include #include @@ -31,6 +32,13 @@ #define VCPU_R15 __VCPU_REGS_R15 * WORD_SIZE #endif =20 +#ifdef CONFIG_INTEL_TDX_HOST +#define TDENTER 0 +#define EXIT_REASON_TDCALL 77 +#define TDENTER_ERROR_BIT 63 +#define seamcall .byte 0x66,0x0f,0x01,0xcf +#endif + .section .noinstr.text, "ax" =20 /** @@ -360,3 +368,141 @@ SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff) pop %_ASM_BP RET SYM_FUNC_END(vmx_do_interrupt_nmi_irqoff) + +#ifdef CONFIG_INTEL_TDX_HOST + +.pushsection .noinstr.text, "ax" + +/** + * __tdx_vcpu_run - Call SEAMCALL(TDENTER) to run a TD vcpu + * @tdvpr: physical address of TDVPR + * @regs: void * (to registers of TDVCPU) + * @gpr_mask: non-zero if guest registers need to be loaded prior to TDENT= ER + * + * Returns: + * TD-Exit Reason + * + * Note: KVM doesn't support using XMM in its hypercalls, it's the HyperV + * code's responsibility to save/restore XMM registers on TDVMCALL. + */ +SYM_FUNC_START(__tdx_vcpu_run) + push %rbp + mov %rsp, %rbp + + push %r15 + push %r14 + push %r13 + push %r12 + push %rbx + + /* Save @regs, which is needed after TDENTER to capture output. */ + push %rsi + + /* Load @tdvpr to RCX */ + mov %rdi, %rcx + + /* No need to load guest GPRs if the last exit wasn't a TDVMCALL. */ + test %dx, %dx + je 1f + + /* Load @regs to RAX, which will be clobbered with $TDENTER anyways. */ + mov %rsi, %rax + + mov VCPU_RBX(%rax), %rbx + mov VCPU_RDX(%rax), %rdx + mov VCPU_RBP(%rax), %rbp + mov VCPU_RSI(%rax), %rsi + mov VCPU_RDI(%rax), %rdi + + mov VCPU_R8 (%rax), %r8 + mov VCPU_R9 (%rax), %r9 + mov VCPU_R10(%rax), %r10 + mov VCPU_R11(%rax), %r11 + mov VCPU_R12(%rax), %r12 + mov VCPU_R13(%rax), %r13 + mov VCPU_R14(%rax), %r14 + mov VCPU_R15(%rax), %r15 + + /* Load TDENTER to RAX. This kills the @regs pointer! */ +1: mov $TDENTER, %rax + +2: seamcall + + /* Skip to the exit path if TDENTER failed. */ + bt $TDENTER_ERROR_BIT, %rax + jc 4f + + /* Temporarily save the TD-Exit reason. */ + push %rax + + /* check if TD-exit due to TDVMCALL */ + cmp $EXIT_REASON_TDCALL, %ax + + /* Reload @regs to RAX. */ + mov 8(%rsp), %rax + + /* Jump on non-TDVMCALL */ + jne 3f + + /* Save all output from SEAMCALL(TDENTER) */ + mov %rbx, VCPU_RBX(%rax) + mov %rbp, VCPU_RBP(%rax) + mov %rsi, VCPU_RSI(%rax) + mov %rdi, VCPU_RDI(%rax) + mov %r10, VCPU_R10(%rax) + mov %r11, VCPU_R11(%rax) + mov %r12, VCPU_R12(%rax) + mov %r13, VCPU_R13(%rax) + mov %r14, VCPU_R14(%rax) + mov %r15, VCPU_R15(%rax) + +3: mov %rcx, VCPU_RCX(%rax) + mov %rdx, VCPU_RDX(%rax) + mov %r8, VCPU_R8 (%rax) + mov %r9, VCPU_R9 (%rax) + + /* + * Clear all general purpose registers except RSP and RAX to prevent + * speculative use of the guest's values. + */ + xor %rbx, %rbx + xor %rcx, %rcx + xor %rdx, %rdx + xor %rsi, %rsi + xor %rdi, %rdi + xor %rbp, %rbp + xor %r8, %r8 + xor %r9, %r9 + xor %r10, %r10 + xor %r11, %r11 + xor %r12, %r12 + xor %r13, %r13 + xor %r14, %r14 + xor %r15, %r15 + + /* Restore the TD-Exit reason to RAX for return. */ + pop %rax + + /* "POP" @regs. 
+4:	add $8, %rsp
+	pop %rbx
+	pop %r12
+	pop %r13
+	pop %r14
+	pop %r15
+
+	pop %rbp
+	RET
+
+5:	cmpb $0, kvm_rebooting
+	je 6f
+	mov $-EFAULT, %rax
+	jmp 4b
+6:	ud2
+	_ASM_EXTABLE(2b, 5b)
+
+SYM_FUNC_END(__tdx_vcpu_run)
+
+.popsection
+
+#endif
-- 
2.25.1

From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v9 065/105] KVM: TDX: Implement TDX vcpu enter/exit path
Date: Fri, 30 Sep 2022 03:17:59 -0700

From: Isaku Yamahata

This patch implements running a TDX vcpu. Once a vcpu runs on a logical
processor (LP), the TDX vcpu is associated with it. When the TDX vcpu
moves to another LP, its state must be flushed from the previous LP.
When a TDX vcpu is destroyed, that flush must be completed and the CPU
memory cache flushed as well. Track which LP each TDX vcpu runs on and
flush as necessary.

Do nothing on the sched_in event, as TDX doesn't support pause-loop
exiting.

TDX vcpu execution requires restoring the PMU debug store after
returning to KVM, because the TDX module unconditionally resets the
value. To reuse the existing code, export perf_restore_debug_store.
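[Editor's note: a minimal, hypothetical caller for illustration only;
the real call site is tdx_vcpu_enter_exit() in the diff below. The
point is that the SEAMCALL details stay hidden behind a plain C call.]

	static u64 example_run_td_vcpu(struct vcpu_tdx *tdx, struct kvm_vcpu *vcpu)
	{
		/*
		 * A non-zero mask would tell the helper to reload guest
		 * GPRs before TDENTER, which is only needed after a
		 * TDVMCALL exit.
		 */
		u32 gpr_mask = 0;

		return __tdx_vcpu_run(tdx->tdvpr.pa, vcpu->arch.regs, gpr_mask);
	}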
Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/main.c    | 21 +++++++++++++++++++--
 arch/x86/kvm/vmx/tdx.c     | 32 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/tdx.h     | 33 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/x86_ops.h |  2 ++
 arch/x86/kvm/x86.c         |  1 +
 5 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 10aacde3a40a..de01b3c79eca 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -100,6 +100,23 @@ static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	return vmx_vcpu_reset(vcpu, init_event);
 }
 
+static int vt_vcpu_pre_run(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		/* Unconditionally continue to vcpu_run(). */
+		return 1;
+
+	return vmx_vcpu_pre_run(vcpu);
+}
+
+static fastpath_t vt_vcpu_run(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return tdx_vcpu_run(vcpu);
+
+	return vmx_vcpu_run(vcpu);
+}
+
 static void vt_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
 	if (is_td_vcpu(vcpu))
@@ -232,8 +249,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.flush_tlb_gva = vt_flush_tlb_gva,
 	.flush_tlb_guest = vt_flush_tlb_guest,
 
-	.vcpu_pre_run = vmx_vcpu_pre_run,
-	.vcpu_run = vmx_vcpu_run,
+	.vcpu_pre_run = vt_vcpu_pre_run,
+	.vcpu_run = vt_vcpu_run,
 	.handle_exit = vmx_handle_exit,
 	.skip_emulated_instruction = vmx_skip_emulated_instruction,
 	.update_emulated_instruction = vmx_update_emulated_instruction,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 6c1730443497..e5545608aea5 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -10,6 +10,9 @@
 #include "x86.h"
 #include "mmu.h"
 
+#include
+#include "trace.h"
+
 #undef pr_fmt
 #define pr_fmt(fmt) "tdx: " fmt
 
@@ -404,6 +407,35 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vcpu->kvm->vm_bugged = true;
 }
 
+u64 __tdx_vcpu_run(hpa_t tdvpr, void *regs, u32 regs_mask);
+
+static noinstr void tdx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
+					struct vcpu_tdx *tdx)
+{
+	guest_enter_irqoff();
+	tdx->exit_reason.full = __tdx_vcpu_run(tdx->tdvpr.pa, vcpu->arch.regs, 0);
+	guest_exit_irqoff();
+}
+
+fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+	if (unlikely(vcpu->kvm->vm_bugged)) {
+		tdx->exit_reason.full = TDX_NON_RECOVERABLE_VCPU;
+		return EXIT_FASTPATH_NONE;
+	}
+
+	trace_kvm_entry(vcpu);
+
+	tdx_vcpu_enter_exit(vcpu, tdx);
+
+	vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
+	trace_kvm_exit(vcpu, KVM_ISA_VMX);
+
+	return EXIT_FASTPATH_NONE;
+}
+
 void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
 {
 	td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa & PAGE_MASK);
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 8a2ad0b980e6..2c850297e8b2 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -45,12 +45,45 @@ struct kvm_tdx {
 	spinlock_t seamcall_lock;
 };
 
+union tdx_exit_reason {
+	struct {
+		/* 31:0 mirror the VMX Exit Reason format */
+		u64 basic		: 16;
+		u64 reserved16		: 1;
+		u64 reserved17		: 1;
+		u64 reserved18		: 1;
+		u64 reserved19		: 1;
+		u64 reserved20		: 1;
+		u64 reserved21		: 1;
+		u64 reserved22		: 1;
+		u64 reserved23		: 1;
+		u64 reserved24		: 1;
+		u64 reserved25		: 1;
+		u64 bus_lock_detected	: 1;
+		u64 enclave_mode	: 1;
+		u64 smi_pending_mtf	: 1;
+		u64 smi_from_vmx_root	: 1;
+		u64 reserved30		: 1;
+		u64 failed_vmentry	: 1;
+
+		/* 63:32 are TDX specific */
+		u64 details_l1		: 8;
+		u64 class		: 8;
+		u64 reserved61_48	: 14;
+		u64 non_recoverable	: 1;
+		u64 error		: 1;
+	};
+	u64 full;
+};
+
 struct vcpu_tdx {
 	struct kvm_vcpu vcpu;
 
 	struct tdx_td_page tdvpr;
 	struct tdx_td_page *tdvpx;
 
+	union tdx_exit_reason exit_reason;
+
 	bool vcpu_initialized;
 
 	/*
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 7d6d9a6c2562..f28812b7bf98 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -147,6 +147,7 @@ void tdx_vm_free(struct kvm *kvm);
 int tdx_vcpu_create(struct kvm_vcpu *vcpu);
 void tdx_vcpu_free(struct kvm_vcpu *vcpu);
 void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
+fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu);
 
 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
@@ -169,6 +170,7 @@ static inline void tdx_vm_free(struct kvm *kvm) {}
 static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTSUPP; }
 static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {}
 static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {}
+static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) { return EXIT_FASTPATH_NONE; }
 
 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f6f0a4b56263..7046bb601225 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -302,6 +302,7 @@ const struct kvm_stats_header kvm_vcpu_stats_header = {
 };
 
 u64 __read_mostly host_xcr0;
+EXPORT_SYMBOL_GPL(host_xcr0);
 
 static struct kmem_cache *x86_emulator_cache;
 
-- 
2.25.1
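[Editor's note: a minimal sketch of how the exit-reason union added in
patch 065 can be consumed; illustrative only, the function name is
hypothetical and the constant 77 mirrors EXIT_REASON_TDCALL from
vmenter.S.]

	static bool example_exit_was_tdvmcall(struct vcpu_tdx *tdx)
	{
		if (tdx->exit_reason.error || tdx->exit_reason.non_recoverable)
			return false;

		/* Bits 31:0 mirror the VMX basic exit reason format. */
		return tdx->exit_reason.basic == 77;	/* EXIT_REASON_TDCALL */
	}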
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v9 066/105] KVM: TDX: vcpu_run: save/restore host state(host kernel gs)
Date: Fri, 30 Sep 2022 03:18:00 -0700

From: Isaku Yamahata

On entering/exiting a TDX vcpu, the preserved or clobbered CPU state
differs from the VMX case. Add TDX hooks to save/restore host/guest CPU
state, starting with the kernel GS base MSR.

Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/vmx/main.c         | 28 ++++++++++++++++++++--
 arch/x86/kvm/vmx/tdx.c          | 42 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/tdx.h          |  4 ++++
 arch/x86/kvm/vmx/x86_ops.h      |  4 ++++
 arch/x86/kvm/x86.c              | 10 ++++++--
 6 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b9ebe82a4c37..18224a3b59a5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2074,6 +2074,7 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
 
 int kvm_add_user_return_msr(u32 msr);
 int kvm_find_user_return_msr(u32 msr);
+void kvm_user_return_msr_init_cpu(void);
 int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask);
 
 static inline bool kvm_is_supported_user_return_msr(u32 msr)
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index de01b3c79eca..44f9fc9e987b 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -100,6 +100,30 @@ static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	return vmx_vcpu_reset(vcpu, init_event);
 }
 
+static void vt_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * All host state is saved/restored across SEAMCALL/SEAMRET, and the
+	 * guest state of a TD is obviously off limits.  Deferring MSRs and DRs
+	 * is pointless because the TDX module needs to load *something* so as
+	 * not to expose guest state.
+	 */
+	if (is_td_vcpu(vcpu)) {
+		tdx_prepare_switch_to_guest(vcpu);
+		return;
+	}
+
+	vmx_prepare_switch_to_guest(vcpu);
+}
+
+static void vt_vcpu_put(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return tdx_vcpu_put(vcpu);
+
+	return vmx_vcpu_put(vcpu);
+}
+
 static int vt_vcpu_pre_run(struct kvm_vcpu *vcpu)
 {
 	if (is_td_vcpu(vcpu))
@@ -214,9 +238,9 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.vcpu_free = vt_vcpu_free,
 	.vcpu_reset = vt_vcpu_reset,
 
-	.prepare_switch_to_guest = vmx_prepare_switch_to_guest,
+	.prepare_switch_to_guest = vt_prepare_switch_to_guest,
 	.vcpu_load = vmx_vcpu_load,
-	.vcpu_put = vmx_vcpu_put,
+	.vcpu_put = vt_vcpu_put,
 
 	.update_exception_bitmap = vmx_update_exception_bitmap,
 	.get_msr_feature = vmx_get_msr_feature,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index e5545608aea5..869cb9952773 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include
+#include
 
 #include
 
@@ -321,6 +322,8 @@ int tdx_vm_init(struct kvm *kvm)
 
 int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
 	/* TDX only supports x2APIC, which requires an in-kernel local APIC. */
 	if (!vcpu->arch.apic)
 		return -EINVAL;
@@ -337,9 +340,46 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 	vcpu->arch.guest_state_protected =
 		!(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTRIBUTE_DEBUG);
 
+	tdx->host_state_need_save = true;
+	tdx->host_state_need_restore = false;
+
 	return 0;
 }
 
+void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+	kvm_user_return_msr_init_cpu();
+	if (!tdx->host_state_need_save)
+		return;
+
+	if (likely(is_64bit_mm(current->mm)))
+		tdx->msr_host_kernel_gs_base = current->thread.gsbase;
+	else
+		tdx->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
+
+	tdx->host_state_need_save = false;
+}
+
+static void tdx_prepare_switch_to_host(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+	tdx->host_state_need_save = true;
+	if (!tdx->host_state_need_restore)
+		return;
+
+	wrmsrl(MSR_KERNEL_GS_BASE, tdx->msr_host_kernel_gs_base);
+	tdx->host_state_need_restore = false;
+}
+
+void tdx_vcpu_put(struct kvm_vcpu *vcpu)
+{
+	vmx_vcpu_pi_put(vcpu);
+	tdx_prepare_switch_to_host(vcpu);
+}
+
 void tdx_vcpu_free(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_tdx *tdx = to_tdx(vcpu);
@@ -430,6 +470,8 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu)
 
 	tdx_vcpu_enter_exit(vcpu, tdx);
 
+	tdx->host_state_need_restore = true;
+
 	vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
 	trace_kvm_exit(vcpu, KVM_ISA_VMX);
 
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 2c850297e8b2..caf837d7f64d 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -86,6 +86,10 @@ struct vcpu_tdx {
 
 	bool vcpu_initialized;
 
+	bool host_state_need_save;
+	bool host_state_need_restore;
+	u64 msr_host_kernel_gs_base;
+
 	/*
 	 * Dummy to make pmu_intel not corrupt memory.
 	 * TODO: Support PMU for TDX.  Future work.
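[Editor's aside: the resulting lifecycle of the two lazy flags, shown as
an illustrative pseudo-flow using only the helpers added above; this is
not a real KVM call site.]

	static void example_td_vcpu_cycle(struct kvm_vcpu *vcpu)
	{
		tdx_prepare_switch_to_guest(vcpu); /* vcpu_load: snapshot host GS base once */
		tdx_vcpu_run(vcpu);                /* TD entry; sets host_state_need_restore */
		tdx_vcpu_put(vcpu);                /* restore host GS base, re-arm need_save */
	}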
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index f28812b7bf98..de94aa189268 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -148,6 +148,8 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu);
 void tdx_vcpu_free(struct kvm_vcpu *vcpu);
 void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
 fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu);
+void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
+void tdx_vcpu_put(struct kvm_vcpu *vcpu);
 
 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
@@ -171,6 +173,8 @@ static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTSUPP; }
 static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {}
 static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {}
 static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) { return EXIT_FASTPATH_NONE; }
+static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {}
+static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {}
 
 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7046bb601225..9254d2e72c56 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -413,7 +413,7 @@ int kvm_find_user_return_msr(u32 msr)
 }
 EXPORT_SYMBOL_GPL(kvm_find_user_return_msr);
 
-static void kvm_user_return_msr_init_cpu(struct kvm_user_return_msrs *msrs)
+static void __kvm_user_return_msr_init_cpu(struct kvm_user_return_msrs *msrs)
 {
 	u64 value;
 	int i;
@@ -429,12 +429,18 @@ static void __kvm_user_return_msr_init_cpu(struct kvm_user_return_msrs *msrs)
 	msrs->initialized = true;
 }
 
+void kvm_user_return_msr_init_cpu(void)
+{
+	__kvm_user_return_msr_init_cpu(this_cpu_ptr(user_return_msrs));
+}
+EXPORT_SYMBOL_GPL(kvm_user_return_msr_init_cpu);
+
 int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
 {
 	struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);
 	int err;
 
-	kvm_user_return_msr_init_cpu(msrs);
+	__kvm_user_return_msr_init_cpu(msrs);
 
 	value = (value & mask) | (msrs->values[slot].host & ~mask);
 	if (value == msrs->values[slot].curr)
-- 
2.25.1
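[Editor's aside: why the init step is split out and exported, as an
illustrative sketch; this reflects the call added in
tdx_prepare_switch_to_guest() above, and the function name here is
hypothetical. VMX primes the per-CPU host values as a side effect of
its first kvm_set_user_return_msr(), but TDX never writes those MSRs
itself before entry, so it must prime the cache explicitly.]

	static void example_tdx_entry_prep(void)
	{
		kvm_user_return_msr_init_cpu(); /* record host MSR values once per CPU */
		/* ... SEAMCALL(TDENTER); the TDX module clobbers the MSRs ... */
	}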
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v9 067/105] KVM: TDX: restore host xsave state when exit from the guest TD
Date: Fri, 30 Sep 2022 03:18:01 -0700

From: Isaku Yamahata

On exit from the guest TD, the xsave state is clobbered; restore it on
TD exit.

Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/tdx.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 869cb9952773..9261da8e6236 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -2,6 +2,7 @@
 #include
 #include
 
+#include
 #include
 
 #include "capabilities.h"
@@ -447,6 +448,22 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vcpu->kvm->vm_bugged = true;
 }
 
+static void tdx_restore_host_xsave_state(struct kvm_vcpu *vcpu)
+{
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
+
+	if (static_cpu_has(X86_FEATURE_XSAVE) &&
+	    host_xcr0 != (kvm_tdx->xfam & kvm_caps.supported_xcr0))
+		xsetbv(XCR_XFEATURE_ENABLED_MASK, host_xcr0);
+	if (static_cpu_has(X86_FEATURE_XSAVES) &&
+	    /* PT can be exposed to TD guest regardless of KVM's XSS support */
+	    host_xss != (kvm_tdx->xfam & (kvm_caps.supported_xss | XFEATURE_MASK_PT)))
+		wrmsrl(MSR_IA32_XSS, host_xss);
+	if (static_cpu_has(X86_FEATURE_PKU) &&
+	    (kvm_tdx->xfam & XFEATURE_MASK_PKRU))
+		write_pkru(vcpu->arch.host_pkru);
+}
+
 u64 __tdx_vcpu_run(hpa_t tdvpr, void *regs, u32 regs_mask);
 
 static noinstr void tdx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
@@ -470,6 +487,7 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu)
 
 	tdx_vcpu_enter_exit(vcpu, tdx);
 
+	tdx_restore_host_xsave_state(vcpu);
 	tdx->host_state_need_restore = true;
 
 	vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
-- 
2.25.1
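[Editor's aside: an illustrative predicate restating the XCR0 branch
above; the function name is hypothetical. The restore is skipped when
the value the guest could have been running with, i.e. the TD's xfam
clamped to what KVM exposes, already equals the host's XCR0.]

	static bool example_need_xcr0_restore(struct kvm_tdx *kvm_tdx)
	{
		return static_cpu_has(X86_FEATURE_XSAVE) &&
		       host_xcr0 != (kvm_tdx->xfam & kvm_caps.supported_xcr0);
	}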
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v9 068/105] KVM: x86: Allow to update cached values in kvm_user_return_msrs w/o wrmsr
Date: Fri, 30 Sep 2022 03:18:02 -0700

From: Chao Gao

Several MSRs are constant and only used in userspace (ring 3), but VMs
may have different values. KVM uses kvm_set_user_return_msr() to switch
to the guest's values and leverages the user-return notifier to restore
them when the kernel returns to userspace. To eliminate unnecessary
wrmsr, KVM also caches the value it last wrote to an MSR.

The TDX module unconditionally resets some of these MSRs to
architectural INIT state on TD exit, which makes the cached values in
kvm_user_return_msrs inconsistent with the values in hardware. This
inconsistency must be fixed; otherwise it may mislead
kvm_on_user_return() into skipping the restoration of some MSRs to the
host's values. kvm_set_user_return_msr() can correct this case, but it
is not optimal because it always does a wrmsr. So, introduce a variation
of kvm_set_user_return_msr() that updates the cached values and skips
the wrmsr.
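[Editor's note: a sketch of the failure mode just described, modeled on
the existing kvm_on_user_return() loop; the function name is
hypothetical and the loop bound is simplified.]

	static void example_on_user_return(struct kvm_user_return_msrs *msrs)
	{
		unsigned int slot;

		for (slot = 0; slot < KVM_MAX_NR_USER_RETURN_MSRS; slot++) {
			/*
			 * If ->curr went stale because the TDX module reset
			 * the MSR behind KVM's back, curr == host here, and
			 * the host value is never rewritten even though
			 * hardware holds the INIT value.
			 */
			if (msrs->values[slot].curr != msrs->values[slot].host)
				wrmsrl(kvm_uret_msrs_list[slot],
				       msrs->values[slot].host);
		}
	}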
Signed-off-by: Chao Gao
Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/x86.c              | 26 +++++++++++++++++++++-----
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 18224a3b59a5..e772798684ae 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2076,6 +2076,7 @@ int kvm_add_user_return_msr(u32 msr);
 int kvm_find_user_return_msr(u32 msr);
 void kvm_user_return_msr_init_cpu(void);
 int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask);
+void kvm_user_return_update_cache(unsigned int index, u64 val);
 
 static inline bool kvm_is_supported_user_return_msr(u32 msr)
 {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9254d2e72c56..8160f51bbb92 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -435,6 +435,15 @@ void kvm_user_return_msr_init_cpu(void)
 }
 EXPORT_SYMBOL_GPL(kvm_user_return_msr_init_cpu);
 
+static void kvm_user_return_register_notifier(struct kvm_user_return_msrs *msrs)
+{
+	if (!msrs->registered) {
+		msrs->urn.on_user_return = kvm_on_user_return;
+		user_return_notifier_register(&msrs->urn);
+		msrs->registered = true;
+	}
+}
+
 int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
 {
 	struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);
@@ -450,15 +459,22 @@ int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
 		return 1;
 
 	msrs->values[slot].curr = value;
-	if (!msrs->registered) {
-		msrs->urn.on_user_return = kvm_on_user_return;
-		user_return_notifier_register(&msrs->urn);
-		msrs->registered = true;
-	}
+	kvm_user_return_register_notifier(msrs);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(kvm_set_user_return_msr);
 
+/* Update the cache, "curr", and register the notifier */
+void kvm_user_return_update_cache(unsigned int slot, u64 value)
+{
+	struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);
+
+	WARN_ON_ONCE(!msrs->initialized);
+	msrs->values[slot].curr = value;
+	kvm_user_return_register_notifier(msrs);
+}
+EXPORT_SYMBOL_GPL(kvm_user_return_update_cache);
+
 static void drop_user_return_notifiers(void)
 {
 	struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);
-- 
2.25.1
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v9 069/105] KVM: TDX: restore user ret MSRs
Date: Fri, 30 Sep 2022 03:18:03 -0700

From: Isaku Yamahata

Several user-return MSRs are clobbered on TD exit. Restore their values
after TD exit, before returning to ring 3.

Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/tdx.c | 43 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 9261da8e6236..b6fdfc5135e6 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -448,6 +448,28 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vcpu->kvm->vm_bugged = true;
 }
 
+struct tdx_uret_msr {
+	u32 msr;
+	unsigned int slot;
+	u64 defval;
+};
+
+static struct tdx_uret_msr tdx_uret_msrs[] = {
+	{.msr = MSR_SYSCALL_MASK,},
+	{.msr = MSR_STAR,},
+	{.msr = MSR_LSTAR,},
+	{.msr = MSR_TSC_AUX,},
+};
+
+static void tdx_user_return_update_cache(void)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(tdx_uret_msrs); i++)
+		kvm_user_return_update_cache(tdx_uret_msrs[i].slot,
+					     tdx_uret_msrs[i].defval);
+}
+
 static void tdx_restore_host_xsave_state(struct kvm_vcpu *vcpu)
 {
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
@@ -487,6 +509,7 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu)
 
 	tdx_vcpu_enter_exit(vcpu, tdx);
 
+	tdx_user_return_update_cache();
 	tdx_restore_host_xsave_state(vcpu);
 	tdx->host_state_need_restore = true;
 
@@ -1526,6 +1549,26 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops)
 		return -ENODEV;
 	}
 
+	for (i = 0; i < ARRAY_SIZE(tdx_uret_msrs); i++) {
+		/*
+		 * Check that the MSRs (tdx_uret_msrs) can be saved/restored
+		 * before returning to user space.
+		 *
+		 * this_cpu_ptr(user_return_msrs)->registered isn't checked
+		 * because the registration is done at vcpu runtime by
+		 * kvm_set_user_return_msr().  The CPU feature is set up here,
+		 * before any vcpu runs, so registered is always false.
+		 */
+		tdx_uret_msrs[i].slot = kvm_find_user_return_msr(tdx_uret_msrs[i].msr);
+		if (tdx_uret_msrs[i].slot == -1) {
+			/* If any MSR isn't supported, it is a KVM bug */
+			pr_err("MSR %x isn't included by kvm_find_user_return_msr\n",
+			       tdx_uret_msrs[i].msr);
+			return -EIO;
+		}
+	}
+
 	max_pkgs = topology_max_packages();
 	tdx_mng_key_config_lock = kcalloc(max_pkgs, sizeof(*tdx_mng_key_config_lock),
 					  GFP_KERNEL);
-- 
2.25.1

From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v9 070/105] [MARKER] The start of TDX KVM patch series: TD vcpu exits/interrupts/hypercalls
Date: Fri, 30 Sep 2022 03:18:04 -0700

From: Isaku Yamahata

This empty commit marks the start of the patch series for TD vcpu
exits, interrupts, and hypercalls.
Signed-off-by: Isaku Yamahata
---
 Documentation/virt/kvm/intel-tdx-layer-status.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentation/virt/kvm/intel-tdx-layer-status.rst
index b51e8e6b1541..1cec14213f69 100644
--- a/Documentation/virt/kvm/intel-tdx-layer-status.rst
+++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst
@@ -13,6 +13,7 @@ What qemu can do
 - Qemu can create/destroy vcpu of TDX vm type.
 - Qemu can populate initial guest memory image.
 - Qemu can finalize guest TD.
+- Qemu can start to run vcpu. But vcpu can not make progress yet.
 
 Patch Layer status
 ------------------
@@ -23,7 +24,7 @@ Patch Layer status
 * TD vcpu creation/destruction: Applied
 * TDX EPT violation: Applied
 * TD finalization: Applied
-* TD vcpu enter/exit: Applying
+* TD vcpu enter/exit: Applied
 * TD vcpu interrupts/exit/hypercall: Not yet
 
 * KVM MMU GPA shared bits: Applied
-- 
2.25.1

From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v9 071/105] KVM: TDX: complete interrupts after tdexit
Date: Fri, 30 Sep 2022 03:18:05 -0700
From: Isaku Yamahata

This corresponds to VMX __vmx_complete_interrupts(). Because TDX
virtualizes the vAPIC, KVM only needs to care about NMI injection.

Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/tdx.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index b6fdfc5135e6..44c8bdb5b1d0 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -448,6 +448,14 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vcpu->kvm->vm_bugged = true;
 }
 
+static void tdx_complete_interrupts(struct kvm_vcpu *vcpu)
+{
+	/* Avoid costly SEAMCALL if no NMI was injected. */
+	if (vcpu->arch.nmi_injected)
+		vcpu->arch.nmi_injected = td_management_read8(to_tdx(vcpu),
+							      TD_VCPU_PEND_NMI);
+}
+
 struct tdx_uret_msr {
 	u32 msr;
 	unsigned int slot;
@@ -516,6 +524,8 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu)
 	vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
 	trace_kvm_exit(vcpu, KVM_ISA_VMX);
 
+	tdx_complete_interrupts(vcpu);
+
 	return EXIT_FASTPATH_NONE;
 }
 
-- 
2.25.1
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v9 072/105] KVM: TDX: restore debug store when TD exit
Date: Fri, 30 Sep 2022 03:18:06 -0700

From: Isaku Yamahata

Because the debug store is clobbered on TD exit, restore it afterwards.

Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/events/intel/ds.c | 1 +
 arch/x86/kvm/vmx/tdx.c     | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index de1f55d51784..e80d25ab4bcf 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2308,3 +2308,4 @@ void perf_restore_debug_store(void)
 
 	wrmsrl(MSR_IA32_DS_AREA, (unsigned long)ds);
 }
+EXPORT_SYMBOL_GPL(perf_restore_debug_store);
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 44c8bdb5b1d0..407216512729 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -518,6 +518,7 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu)
 	tdx_vcpu_enter_exit(vcpu, tdx);
 
 	tdx_user_return_update_cache();
+	perf_restore_debug_store();
 	tdx_restore_host_xsave_state(vcpu);
 	tdx->host_state_need_restore = true;
 
-- 
2.25.1
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v9 073/105] KVM: TDX: handle vcpu migration over logical processor
Date: Fri, 30 Sep 2022 03:18:07 -0700

From: Isaku Yamahata

For vcpu migration in the VMX case, the VMCS is flushed on the source
pcpu and loaded on the target pcpu. There are corresponding TDX SEAMCALL
APIs; call them on vcpu migration. The logic is mostly the same as VMX,
except that the TDX SEAMCALLs are used.

When shutting down the machine, vcpus (VMX or TDX) need to be shut down
on each pcpu. Do the same for TDX with the TDX SEAMCALL APIs.

Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/main.c    |  43 +++++++++++--
 arch/x86/kvm/vmx/tdx.c     | 121 +++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/tdx.h     |   2 +
 arch/x86/kvm/vmx/x86_ops.h |   6 ++
 4 files changed, 168 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 44f9fc9e987b..b60e113696c0 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -17,6 +17,25 @@ static bool vt_is_vm_type_supported(unsigned long type)
 	       (enable_tdx && tdx_is_vm_type_supported(type));
 }
 
+static int vt_hardware_enable(void)
+{
+	int ret;
+
+	ret = vmx_hardware_enable();
+	if (ret)
+		return ret;
+
+	tdx_hardware_enable();
+	return 0;
+}
+
+static void vt_hardware_disable(void)
+{
+	/* Note, TDX *and* VMX need to be disabled if TDX is enabled. */
+	tdx_hardware_disable();
+	vmx_hardware_disable();
+}
+
 static __init int vt_hardware_setup(void)
 {
 	int ret;
@@ -141,6 +160,14 @@ static fastpath_t vt_vcpu_run(struct kvm_vcpu *vcpu)
 	return vmx_vcpu_run(vcpu);
 }
 
+static void vt_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+	if (is_td_vcpu(vcpu))
+		return tdx_vcpu_load(vcpu, cpu);
+
+	return vmx_vcpu_load(vcpu, cpu);
+}
+
 static void vt_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
 	if (is_td_vcpu(vcpu))
@@ -199,6 +226,14 @@ static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 	vmx_load_mmu_pgd(vcpu, root_hpa, pgd_level);
 }
 
+static void vt_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_sched_in(vcpu, cpu);
+}
+
 static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	if (!is_td(kvm))
@@ -222,8 +257,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.offline_cpu = tdx_offline_cpu,
 	.check_processor_compatibility = vmx_check_processor_compatibility,
 
-	.hardware_enable = vmx_hardware_enable,
-	.hardware_disable = vmx_hardware_disable,
+	.hardware_enable = vt_hardware_enable,
+	.hardware_disable = vt_hardware_disable,
 	.has_emulated_msr = vmx_has_emulated_msr,
 
 	.is_vm_type_supported = vt_is_vm_type_supported,
@@ -239,7 +274,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.vcpu_reset = vt_vcpu_reset,
 
 	.prepare_switch_to_guest = vt_prepare_switch_to_guest,
-	.vcpu_load = vmx_vcpu_load,
+	.vcpu_load = vt_vcpu_load,
 	.vcpu_put = vt_vcpu_put,
 
 	.update_exception_bitmap = vmx_update_exception_bitmap,
@@ -327,7 +362,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 
 	.request_immediate_exit = vmx_request_immediate_exit,
 
-	.sched_in = vmx_sched_in,
+	.sched_in = vt_sched_in,
 
 	.cpu_dirty_log_size = PML_ENTITY_NUM,
 	.update_cpu_dirty_logging = vmx_update_cpu_dirty_logging,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 407216512729..51de41fbe098 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -51,6 +51,14 @@ static DEFINE_MUTEX(tdx_lock);
 static struct mutex *tdx_mng_key_config_lock;
 static atomic_t nr_configured_hkid;
 
+/*
+ * A per-CPU list of TD vCPUs associated with a given CPU.  Used when a CPU
+ * is brought down to invoke TDH_VP_FLUSH on the appropriate TD vCPUs.
+ * Protected by interrupt mask.  This list is manipulated in process context
+ * of vcpu and IPI callback.  See tdx_flush_vp_on_cpu().
+ */
+static DEFINE_PER_CPU(struct list_head, associated_tdvcpus);
+
 static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid)
 {
 	return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits);
@@ -82,6 +90,36 @@ static inline bool is_td_finalized(struct kvm_tdx *kvm_tdx)
 	return kvm_tdx->finalized;
 }
 
+static inline void tdx_disassociate_vp(struct kvm_vcpu *vcpu)
+{
+	list_del(&to_tdx(vcpu)->cpu_list);
+
+	/*
+	 * Ensure tdx->cpu_list is updated before setting vcpu->cpu to -1;
+	 * otherwise, a different CPU can see vcpu->cpu == -1 and add the vCPU
+	 * to its list before it's deleted from this CPU's list.
+	 */
+	smp_wmb();
+
+	vcpu->cpu = -1;
+}
+
+void tdx_hardware_enable(void)
+{
+	INIT_LIST_HEAD(&per_cpu(associated_tdvcpus, raw_smp_processor_id()));
+}
+
+void tdx_hardware_disable(void)
+{
+	int cpu = raw_smp_processor_id();
+	struct list_head *tdvcpus = &per_cpu(associated_tdvcpus, cpu);
+	struct vcpu_tdx *tdx, *tmp;
+
+	/* Safe variant needed as tdx_disassociate_vp() deletes the entry. */
+	list_for_each_entry_safe(tdx, tmp, tdvcpus, cpu_list)
+		tdx_disassociate_vp(&tdx->vcpu);
+}
+
 static void tdx_clear_page(unsigned long page)
 {
 	const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0)));
@@ -164,6 +202,41 @@ static void tdx_reclaim_td_page(struct tdx_td_page *page)
 	}
 }
 
+static void tdx_flush_vp(void *arg)
+{
+	struct kvm_vcpu *vcpu = arg;
+	u64 err;
+
+	lockdep_assert_irqs_disabled();
+
+	/* Task migration can race with CPU offlining. */
+	if (vcpu->cpu != raw_smp_processor_id())
+		return;
+
+	/*
+	 * No need to do TDH_VP_FLUSH if the vCPU hasn't been initialized.  The
+	 * list tracking still needs to be updated so that it's correct if/when
+	 * the vCPU does get initialized.
+	 */
+	if (is_td_vcpu_created(to_tdx(vcpu))) {
+		err = tdh_vp_flush(to_tdx(vcpu)->tdvpr.pa);
+		if (unlikely(err && err != TDX_VCPU_NOT_ASSOCIATED)) {
+			if (WARN_ON_ONCE(err))
+				pr_tdx_error(TDH_VP_FLUSH, err, NULL);
+		}
+	}
+
+	tdx_disassociate_vp(vcpu);
+}
+
+static void tdx_flush_vp_on_cpu(struct kvm_vcpu *vcpu)
+{
+	if (unlikely(vcpu->cpu == -1))
+		return;
+
+	smp_call_function_single(vcpu->cpu, tdx_flush_vp, vcpu, 1);
+}
+
 static int tdx_do_tdh_phymem_cache_wb(void *param)
 {
 	u64 err = 0;
@@ -188,6 +261,8 @@ void tdx_mmu_release_hkid(struct kvm *kvm)
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
 	cpumask_var_t packages;
 	bool cpumask_allocated;
+	struct kvm_vcpu *vcpu;
+	unsigned long j;
 	u64 err;
 	int ret;
 	int i;
@@ -198,6 +273,19 @@ void tdx_mmu_release_hkid(struct kvm *kvm)
 	if (!is_td_created(kvm_tdx))
 		goto free_hkid;
 
+	kvm_for_each_vcpu(j, vcpu, kvm)
+		tdx_flush_vp_on_cpu(vcpu);
+
+	mutex_lock(&tdx_lock);
+	err = tdh_mng_vpflushdone(kvm_tdx->tdr.pa);
+	mutex_unlock(&tdx_lock);
+	if (WARN_ON_ONCE(err)) {
+		pr_tdx_error(TDH_MNG_VPFLUSHDONE, err, NULL);
+		pr_err("tdh_mng_vpflushdone failed. HKID %d is leaked.\n",
+		       kvm_tdx->hkid);
+		return;
+	}
+
 	cpumask_allocated = zalloc_cpumask_var(&packages, GFP_KERNEL);
 	cpus_read_lock();
 	for_each_online_cpu(i) {
@@ -347,6 +435,26 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+	if (vcpu->cpu == cpu)
+		return;
+
+	tdx_flush_vp_on_cpu(vcpu);
+
+	local_irq_disable();
+	/*
+	 * Pairs with the smp_wmb() in tdx_disassociate_vp() to ensure
+	 * vcpu->cpu is read before tdx->cpu_list.
+	 */
+	smp_rmb();
+
+	list_add(&tdx->cpu_list, &per_cpu(associated_tdvcpus, cpu));
+	local_irq_enable();
+}
+
 void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_tdx *tdx = to_tdx(vcpu);
@@ -397,6 +505,19 @@ void tdx_vcpu_free(struct kvm_vcpu *vcpu)
 		tdx->tdvpx = NULL;
 	}
 	tdx_reclaim_td_page(&tdx->tdvpr);
+
+	/*
+	 * kvm_free_vcpus()
+	 *    -> kvm_unload_vcpu_mmu()
+	 *
+	 * does vcpu_load() for every vcpu after they have already been
+	 * disassociated from the per-CPU list by tdx_vm_teardown().  So
+	 * disassociate them again; otherwise the freed vcpu data would be
+	 * accessed when list_{del,add}() is later done on the
+	 * associated_tdvcpus list.
+	 */
+	tdx_flush_vp_on_cpu(vcpu);
+	WARN_ON_ONCE(vcpu->cpu != -1);
 }
 
 void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index caf837d7f64d..8b425e7415c2 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -82,6 +82,8 @@ struct vcpu_tdx {
 	struct tdx_td_page tdvpr;
 	struct tdx_td_page *tdvpx;
 
+	struct list_head cpu_list;
+
 	union tdx_exit_reason exit_reason;
 
 	bool vcpu_initialized;
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index de94aa189268..36a2956ba677 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -138,6 +138,8 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops);
 bool tdx_is_vm_type_supported(unsigned long type);
 void tdx_hardware_unsetup(void);
 int tdx_offline_cpu(void);
+void tdx_hardware_enable(void);
+void tdx_hardware_disable(void);
 int tdx_dev_ioctl(void __user *argp);
 
 int tdx_vm_init(struct kvm *kvm);
@@ -150,6 +152,7 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
 fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu);
 void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
 void tdx_vcpu_put(struct kvm_vcpu *vcpu);
+void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 
 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
@@ -162,6 +165,8 @@ static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return 0; }
 static inline bool tdx_is_vm_type_supported(unsigned long type) { return false; }
 static inline void tdx_hardware_unsetup(void) {}
 static inline int tdx_offline_cpu(void) { return 0; }
+static inline void tdx_hardware_enable(void) {}
+static inline void tdx_hardware_disable(void) {}
 static inline int tdx_dev_ioctl(void __user *argp) { return -EOPNOTSUPP; };
 
 static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; }
@@ -175,6 +180,7 @@ static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {}
 static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) { return EXIT_FASTPATH_NONE; }
 static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {}
 static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {}
+static inline void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {}
 
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index caf837d7f64d..8b425e7415c2 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -82,6 +82,8 @@ struct vcpu_tdx {
 	struct tdx_td_page tdvpr;
 	struct tdx_td_page *tdvpx;
 
+	struct list_head cpu_list;
+
 	union tdx_exit_reason exit_reason;
 
 	bool vcpu_initialized;
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index de94aa189268..36a2956ba677 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -138,6 +138,8 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops);
 bool tdx_is_vm_type_supported(unsigned long type);
 void tdx_hardware_unsetup(void);
 int tdx_offline_cpu(void);
+void tdx_hardware_enable(void);
+void tdx_hardware_disable(void);
 int tdx_dev_ioctl(void __user *argp);
 
 int tdx_vm_init(struct kvm *kvm);
@@ -150,6 +152,7 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
 fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu);
 void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
 void tdx_vcpu_put(struct kvm_vcpu *vcpu);
+void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 
 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
@@ -162,6 +165,8 @@ static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return 0; }
 static inline bool tdx_is_vm_type_supported(unsigned long type) { return false; }
 static inline void tdx_hardware_unsetup(void) {}
 static inline int tdx_offline_cpu(void) { return 0; }
+static inline void tdx_hardware_enable(void) {}
+static inline void tdx_hardware_disable(void) {}
 static inline int tdx_dev_ioctl(void __user *argp) { return -EOPNOTSUPP; };
 
 static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; }
@@ -175,6 +180,7 @@ static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {}
 static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) { return EXIT_FASTPATH_NONE; }
 static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {}
 static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {}
+static inline void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {}
 
 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
--
2.25.1

From: isaku.yamahata@intel.com
Subject: [PATCH v9 074/105] KVM: x86: Add a switch_db_regs flag to handle TDX's auto-switched behavior
Date: Fri, 30 Sep 2022 03:18:08 -0700

From: Isaku Yamahata

Add a flag, KVM_DEBUGREG_AUTO_SWITCH, to skip saving/restoring DRs
irrespective of any other flags.  TDX-SEAM unconditionally saves and
restores guest DRs and resets them to the architectural INIT state on TD
exit, so KVM needs to save host DRs before TD entry without restoring
guest DRs, and restore host DRs after TD exit.

Opportunistically convert the KVM_DEBUGREG_* definitions to use BIT().
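The flag arithmetic the patch relies on can be exercised in isolation.  A
minimal userspace sketch (the enum values mirror the patch; BIT() is
redefined locally and everything else is illustrative, not kernel code):

#include <stdio.h>

#define BIT(n) (1u << (n))

enum {
	KVM_DEBUGREG_BP_ENABLED  = BIT(0),
	KVM_DEBUGREG_WONT_EXIT   = BIT(1),
	KVM_DEBUGREG_AUTO_SWITCH = BIT(2),
};

static int needs_manual_dr_switch(unsigned int switch_db_regs)
{
	/* Mask off AUTO_SWITCH: if only that bit is set, hardware (the TDX
	 * module) switches the DRs and KVM can skip the reloads entirely. */
	return (switch_db_regs & ~KVM_DEBUGREG_AUTO_SWITCH) != 0;
}

int main(void)
{
	printf("%d\n", needs_manual_dr_switch(KVM_DEBUGREG_AUTO_SWITCH));  /* 0 */
	printf("%d\n", needs_manual_dr_switch(KVM_DEBUGREG_AUTO_SWITCH |
					      KVM_DEBUGREG_BP_ENABLED));   /* 1 */
	return 0;
}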
Reported-by: Xiaoyao Li
Signed-off-by: Sean Christopherson
Co-developed-by: Chao Gao
Signed-off-by: Chao Gao
Signed-off-by: Isaku Yamahata
---
 arch/x86/include/asm/kvm_host.h |  9 +++++++--
 arch/x86/kvm/vmx/tdx.c          |  1 +
 arch/x86/kvm/x86.c              | 11 ++++++++---
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e772798684ae..e29a93973ad8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -584,8 +584,13 @@ struct kvm_pmu {
 struct kvm_pmu_ops;
 
 enum {
-	KVM_DEBUGREG_BP_ENABLED = 1,
-	KVM_DEBUGREG_WONT_EXIT = 2,
+	KVM_DEBUGREG_BP_ENABLED = BIT(0),
+	KVM_DEBUGREG_WONT_EXIT = BIT(1),
+	/*
+	 * Guest debug registers are saved/restored by hardware on VM exit
+	 * and VM entry.  KVM doesn't need to switch them.
+	 */
+	KVM_DEBUGREG_AUTO_SWITCH = BIT(2),
 };
 
 struct kvm_mtrr_range {
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 51de41fbe098..78322bd6037e 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -421,6 +421,7 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.efer = EFER_SCE | EFER_LME | EFER_LMA | EFER_NX;
 
+	vcpu->arch.switch_db_regs = KVM_DEBUGREG_AUTO_SWITCH;
 	vcpu->arch.cr0_guest_owned_bits = -1ul;
 	vcpu->arch.cr4_guest_owned_bits = -1ul;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8160f51bbb92..fda72bef6c90 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10540,7 +10540,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.guest_fpu.xfd_err)
 		wrmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
 
-	if (unlikely(vcpu->arch.switch_db_regs)) {
+	if (unlikely(vcpu->arch.switch_db_regs & ~KVM_DEBUGREG_AUTO_SWITCH)) {
 		set_debugreg(0, 7);
 		set_debugreg(vcpu->arch.eff_db[0], 0);
 		set_debugreg(vcpu->arch.eff_db[1], 1);
@@ -10583,6 +10583,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	 */
 	if (unlikely(vcpu->arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT)) {
 		WARN_ON(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP);
+		WARN_ON(vcpu->arch.switch_db_regs & KVM_DEBUGREG_AUTO_SWITCH);
 		static_call(kvm_x86_sync_dirty_debug_regs)(vcpu);
 		kvm_update_dr0123(vcpu);
 		kvm_update_dr7(vcpu);
@@ -10595,8 +10596,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	 * care about the messed up debug address registers. But if
 	 * we have some of them active, restore the old state.
 	 */
-	if (hw_breakpoint_active())
-		hw_breakpoint_restore();
+	if (hw_breakpoint_active()) {
+		if (!(vcpu->arch.switch_db_regs & KVM_DEBUGREG_AUTO_SWITCH))
+			hw_breakpoint_restore();
+		else
+			set_debugreg(__this_cpu_read(cpu_dr7), 7);
+	}
 
 	vcpu->arch.last_vmentry_cpu = vcpu->cpu;
 	vcpu->arch.last_guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
--
2.25.1
From: isaku.yamahata@intel.com
Subject: [PATCH v9 075/105] KVM: TDX: Add support for finding a pending IRQ in a protected local APIC
Date: Fri, 30 Sep 2022 03:18:09 -0700

From: Sean Christopherson

Add a flag and a hook to KVM's local APIC management to support
determining whether or not a TDX guest has a pending IRQ.  For TDX vCPUs,
the virtual APIC page is owned by the TDX module and cannot be accessed by
KVM.  As a result, registers that are virtualized by the CPU, e.g. PPR,
cannot be read or written by KVM.  To deliver interrupts for TDX guests,
KVM must send an IRQ to the CPU on the posted interrupt notification
vector.
And to determine if a TDX vCPU has a pending interrupt, KVM must check
whether there is an outstanding notification.

Return "no interrupt" in kvm_apic_has_interrupt() if the guest APIC is
protected, to short-circuit the various other flows that try to pull an
IRQ out of the vAPIC; the only valid operation is querying _if_ an IRQ is
pending, as KVM can't do anything based on _which_ IRQ is pending.

Intentionally omit sanity checks from other flows, e.g. PPR update, so as
not to degrade non-TDX guests with unnecessary checks.  A well-behaved KVM
and userspace will never reach those flows for TDX guests, but reaching
them is not fatal if something does go awry.

Note, this doesn't handle interrupts that have been delivered to the vCPU
but not yet recognized by the core, i.e. interrupts that are sitting in
vmcs.GUEST_INTR_STATUS.  Querying that state requires a SEAMCALL and will
be supported in a future patch.

Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/irq.c                 |  3 +++
 arch/x86/kvm/lapic.c               |  3 +++
 arch/x86/kvm/lapic.h               |  2 ++
 arch/x86/kvm/vmx/main.c            | 11 +++++++++++
 arch/x86/kvm/vmx/tdx.c             |  6 ++++++
 arch/x86/kvm/vmx/x86_ops.h         |  2 ++
 8 files changed, 29 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 757952f186f8..0ecaebcf8a18 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -116,6 +116,7 @@ KVM_X86_OP_OPTIONAL(pi_update_irte)
 KVM_X86_OP_OPTIONAL(pi_start_assignment)
 KVM_X86_OP_OPTIONAL(apicv_post_state_restore)
 KVM_X86_OP_OPTIONAL_RET0(dy_apicv_has_pending_interrupt)
+KVM_X86_OP_OPTIONAL(protected_apic_has_interrupt)
 KVM_X86_OP_OPTIONAL(set_hv_timer)
 KVM_X86_OP_OPTIONAL(cancel_hv_timer)
 KVM_X86_OP(setup_mce)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e29a93973ad8..96fd23392c1e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1665,6 +1665,7 @@ struct kvm_x86_ops {
 	void (*pi_start_assignment)(struct kvm *kvm);
 	void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu);
 	bool (*dy_apicv_has_pending_interrupt)(struct kvm_vcpu *vcpu);
+	bool (*protected_apic_has_interrupt)(struct kvm_vcpu *vcpu);
 
 	int (*set_hv_timer)(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
 			    bool *expired);
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index f371f1292ca3..56e52eef0269 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -100,6 +100,9 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
 	if (kvm_cpu_has_extint(v))
 		return 1;
 
+	if (lapic_in_kernel(v) && v->arch.apic->guest_apic_protected)
+		return static_call(kvm_x86_protected_apic_has_interrupt)(v);
+
 	return kvm_apic_has_interrupt(v) != -1; /* LAPIC */
 }
 EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9dda989a1cf0..78452a26ad33 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2624,6 +2624,9 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
 	if (!kvm_apic_present(vcpu))
 		return -1;
 
+	if (apic->guest_apic_protected)
+		return -1;
+
 	__apic_update_ppr(apic, &ppr);
 	return apic_has_interrupt_for_ppr(apic, ppr);
 }
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 117a46df5cc1..6264cf52663e 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -65,6 +65,8 @@ struct kvm_lapic {
 	bool sw_enabled;
 	bool irr_pending;
 	bool lvt0_in_nmi_mode;
+	/* Select registers in the vAPIC cannot be read/written. */
+	bool guest_apic_protected;
 	/* Number of bits set in ISR. */
 	s16 isr_count;
 	/* The highest vector set in ISR; if -1 - invalid, must scan ISR. */
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index b60e113696c0..b26f660dd728 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -46,6 +46,9 @@ static __init int vt_hardware_setup(void)
 
 	enable_tdx = enable_tdx && !tdx_hardware_setup(&vt_x86_ops);
 
+	if (!enable_tdx)
+		vt_x86_ops.protected_apic_has_interrupt = NULL;
+
 	if (enable_ept)
 		kvm_mmu_set_ept_masks(enable_ept_ad_bits,
 				      cpu_has_vmx_ept_execute_only());
@@ -168,6 +171,13 @@ static void vt_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	return vmx_vcpu_load(vcpu, cpu);
 }
 
+static bool vt_protected_apic_has_interrupt(struct kvm_vcpu *vcpu)
+{
+	KVM_BUG_ON(!is_td_vcpu(vcpu), vcpu->kvm);
+
+	return tdx_protected_apic_has_interrupt(vcpu);
+}
+
 static void vt_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
 	if (is_td_vcpu(vcpu))
@@ -339,6 +349,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.sync_pir_to_irr = vmx_sync_pir_to_irr,
 	.deliver_interrupt = vmx_deliver_interrupt,
 	.dy_apicv_has_pending_interrupt = pi_has_pending_interrupt,
+	.protected_apic_has_interrupt = vt_protected_apic_has_interrupt,
 
 	.set_tss_addr = vmx_set_tss_addr,
 	.set_identity_map_addr = vmx_set_identity_map_addr,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 78322bd6037e..43d11d6bb6f5 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -418,6 +418,7 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 		return -EINVAL;
 
 	fpstate_set_confidential(&vcpu->arch.guest_fpu);
+	vcpu->arch.apic->guest_apic_protected = true;
 
 	vcpu->arch.efer = EFER_SCE | EFER_LME | EFER_LMA | EFER_NX;
 
@@ -456,6 +457,11 @@ void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	local_irq_enable();
 }
 
+bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu)
+{
+	return pi_has_pending_interrupt(vcpu);
+}
+
 void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_tdx *tdx = to_tdx(vcpu);
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 36a2956ba677..ff9eb6a37f8e 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -153,6 +153,7 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu);
 void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
 void tdx_vcpu_put(struct kvm_vcpu *vcpu);
 void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
+bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu);
 
 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
@@ -181,6 +182,7 @@ static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) { return EXIT_FASTPATH_NONE; }
 static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {}
 static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {}
 static inline void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {}
+static inline bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu) { return false; }
 
 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
--
2.25.1
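Conceptually, tdx_protected_apic_has_interrupt() reduces to scanning the
posted-interrupt descriptor instead of the inaccessible vIRR.  A
self-contained sketch of that check; the descriptor here is simplified to a
256-bit PIR plus an outstanding-notification flag, not the real struct
pi_desc:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct mock_pi_desc {
	uint64_t pir[4];   /* 256 posted-interrupt request bits */
	bool     on;       /* outstanding notification */
};

/* A TD's vIRR is unreadable, so "has interrupt" can only mean "is a
 * notification outstanding, or is any PIR bit set". */
static bool mock_pi_has_pending_interrupt(const struct mock_pi_desc *pi)
{
	if (pi->on)
		return true;
	for (int i = 0; i < 4; i++)
		if (pi->pir[i])
			return true;
	return false;
}

int main(void)
{
	struct mock_pi_desc pi = { .pir = { 0, 1ull << 32, 0, 0 }, .on = false };
	printf("pending=%d\n", mock_pi_has_pending_interrupt(&pi));   /* 1 */
	return 0;
}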
From: isaku.yamahata@intel.com
Subject: [PATCH v9 076/105] KVM: x86: Assume timer IRQ was injected if APIC state is protected
Date: Fri, 30 Sep 2022 03:18:10 -0700

From: Sean Christopherson

If APIC state is protected, i.e. the vCPU is a TDX guest, assume a timer
IRQ was injected when deciding whether or not to busy wait in the "timer
advanced" path.  The "real" vIRR is not readable/writable, so trying to
query for a pending timer IRQ will return garbage.

Note, TDX can scour the PIR if it wants to be more precise and skip the
"wait" call entirely.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/lapic.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 78452a26ad33..4e506084e8ed 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1606,8 +1606,17 @@ static void apic_update_lvtt(struct kvm_lapic *apic)
 static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
-	u32 reg = kvm_lapic_get_reg(apic, APIC_LVTT);
+	u32 reg;
 
+	/*
+	 * Assume a timer IRQ was "injected" if the APIC is protected.  KVM's
+	 * copy of the vIRR is bogus; it's the responsibility of the caller to
+	 * precisely check whether or not a timer IRQ is pending.
+	 */
+	if (apic->guest_apic_protected)
+		return true;
+
+	reg = kvm_lapic_get_reg(apic, APIC_LVTT);
 	if (kvm_apic_hw_enabled(apic)) {
 		int vec = reg & APIC_VECTOR_MASK;
 		void *bitmap = apic->regs + APIC_ISR;
--
2.25.1
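The pattern above — bail out of any flow that would read the vIRR when the
guest APIC is protected — can be exercised in isolation.  A hedged sketch
with mock types (not KVM's structures):

#include <stdbool.h>
#include <stdio.h>

struct mock_apic {
	bool guest_apic_protected;
	bool irr_has_timer_vector;   /* stand-in for the real vIRR scan */
};

static bool timer_int_injected(const struct mock_apic *apic)
{
	/* KVM's copy of the vIRR is bogus for a protected APIC; assume
	 * injected so the "timer advance" busy-wait still runs. */
	if (apic->guest_apic_protected)
		return true;

	return apic->irr_has_timer_vector;
}

int main(void)
{
	struct mock_apic td = { .guest_apic_protected = true };
	struct mock_apic vm = { .irr_has_timer_vector = false };
	printf("td=%d vm=%d\n", timer_int_injected(&td), timer_int_injected(&vm));
	return 0;
}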
From: isaku.yamahata@intel.com
Subject: [PATCH v9 077/105] KVM: TDX: remove use of struct vcpu_vmx from posted_interrupt.c
Date: Fri, 30 Sep 2022 03:18:11 -0700

From: Isaku Yamahata

As TDX will use posted_interrupt.c, the use of struct vcpu_vmx is a
blocker.  Because the members struct pi_desc pi_desc and struct list_head
pi_wakeup_list are only used in posted_interrupt.c, introduce a common
structure, struct vcpu_pi, and give vcpu_vmx and vcpu_tdx the same layout
at the top of the structure.

To minimize the diff size, avoid code conversion like
vmx->pi_desc => vmx->common->pi_desc.  Instead, add a compile-time check
that the layout is as expected.
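The compile-time layout check hinges on offsetof() agreement between the
structures.  A standalone sketch of the same technique, with simplified
field types (the real structures differ):

#include <stddef.h>
#include <assert.h>

struct base      { int vcpu; long pi_desc; void *pi_wakeup_list; };
struct derived_a { int vcpu; long pi_desc; void *pi_wakeup_list; int fail; };
struct derived_b { int vcpu; long pi_desc; void *pi_wakeup_list; long tdvpr; };

/* If a common field moves in either "derived" struct, the build breaks
 * here, which is what makes a (struct base *) cast of either one safe. */
static_assert(offsetof(struct base, pi_desc) ==
	      offsetof(struct derived_a, pi_desc), "layout a");
static_assert(offsetof(struct base, pi_desc) ==
	      offsetof(struct derived_b, pi_desc), "layout b");
static_assert(offsetof(struct base, pi_wakeup_list) ==
	      offsetof(struct derived_a, pi_wakeup_list), "layout a");
static_assert(offsetof(struct base, pi_wakeup_list) ==
	      offsetof(struct derived_b, pi_wakeup_list), "layout b");

int main(void) { return 0; }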
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/posted_intr.c | 41 ++++++++++++++++++++++++++-------- arch/x86/kvm/vmx/posted_intr.h | 11 +++++++++ arch/x86/kvm/vmx/tdx.c | 1 + arch/x86/kvm/vmx/tdx.h | 8 +++++++ arch/x86/kvm/vmx/vmx.h | 14 +++++++----- 5 files changed, 60 insertions(+), 15 deletions(-) diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c index 1b56c5e5c9fb..62caf74753bc 100644 --- a/arch/x86/kvm/vmx/posted_intr.c +++ b/arch/x86/kvm/vmx/posted_intr.c @@ -9,6 +9,7 @@ #include "posted_intr.h" #include "trace.h" #include "vmx.h" +#include "tdx.h" =20 /* * Maintain a per-CPU list of vCPUs that need to be awakened by wakeup_han= dler() @@ -29,9 +30,29 @@ static DEFINE_PER_CPU(struct list_head, wakeup_vcpus_on_= cpu); */ static DEFINE_PER_CPU(raw_spinlock_t, wakeup_vcpus_on_cpu_lock); =20 +/* + * The layout of the head of struct vcpu_vmx and struct vcpu_tdx must matc= h with + * struct vcpu_pi. + */ +static_assert(offsetof(struct vcpu_pi, pi_desc) =3D=3D + offsetof(struct vcpu_vmx, pi_desc)); +static_assert(offsetof(struct vcpu_pi, pi_wakeup_list) =3D=3D + offsetof(struct vcpu_vmx, pi_wakeup_list)); +#ifdef CONFIG_INTEL_TDX_HOST +static_assert(offsetof(struct vcpu_pi, pi_desc) =3D=3D + offsetof(struct vcpu_tdx, pi_desc)); +static_assert(offsetof(struct vcpu_pi, pi_wakeup_list) =3D=3D + offsetof(struct vcpu_tdx, pi_wakeup_list)); +#endif + +static inline struct vcpu_pi *vcpu_to_pi(struct kvm_vcpu *vcpu) +{ + return (struct vcpu_pi *)vcpu; +} + static inline struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu) { - return &(to_vmx(vcpu)->pi_desc); + return &vcpu_to_pi(vcpu)->pi_desc; } =20 static int pi_try_set_control(struct pi_desc *pi_desc, u64 *pold, u64 new) @@ -50,8 +71,8 @@ static int pi_try_set_control(struct pi_desc *pi_desc, u6= 4 *pold, u64 new) =20 void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu) { - struct pi_desc *pi_desc =3D vcpu_to_pi_desc(vcpu); - struct vcpu_vmx *vmx =3D to_vmx(vcpu); + struct vcpu_pi *vcpu_pi =3D vcpu_to_pi(vcpu); + struct pi_desc *pi_desc =3D &vcpu_pi->pi_desc; struct pi_desc old, new; unsigned long flags; unsigned int dest; @@ -88,7 +109,7 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu) */ if (pi_desc->nv =3D=3D POSTED_INTR_WAKEUP_VECTOR) { raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu)); - list_del(&vmx->pi_wakeup_list); + list_del(&vcpu_pi->pi_wakeup_list); raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu)); } =20 @@ -143,15 +164,15 @@ static bool vmx_can_use_vtd_pi(struct kvm *kvm) */ static void pi_enable_wakeup_handler(struct kvm_vcpu *vcpu) { - struct pi_desc *pi_desc =3D vcpu_to_pi_desc(vcpu); - struct vcpu_vmx *vmx =3D to_vmx(vcpu); + struct vcpu_pi *vcpu_pi =3D vcpu_to_pi(vcpu); + struct pi_desc *pi_desc =3D &vcpu_pi->pi_desc; struct pi_desc old, new; unsigned long flags; =20 local_irq_save(flags); =20 raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu)); - list_add_tail(&vmx->pi_wakeup_list, + list_add_tail(&vcpu_pi->pi_wakeup_list, &per_cpu(wakeup_vcpus_on_cpu, vcpu->cpu)); raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu)); =20 @@ -188,7 +209,8 @@ static bool vmx_needs_pi_wakeup(struct kvm_vcpu *vcpu) * notification vector is switched to the one that calls * back to the pi_wakeup_handler() function. 
 	 */
-	return vmx_can_use_ipiv(vcpu) || vmx_can_use_vtd_pi(vcpu->kvm);
+	return (vmx_can_use_ipiv(vcpu) && !is_td_vcpu(vcpu)) ||
+		vmx_can_use_vtd_pi(vcpu->kvm);
 }
 
 void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
@@ -198,7 +220,8 @@ void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
 	if (!vmx_needs_pi_wakeup(vcpu))
 		return;
 
-	if (kvm_vcpu_is_blocking(vcpu) && !vmx_interrupt_blocked(vcpu))
+	if (kvm_vcpu_is_blocking(vcpu) &&
+	    (is_td_vcpu(vcpu) || !vmx_interrupt_blocked(vcpu)))
 		pi_enable_wakeup_handler(vcpu);
 
 	/*
diff --git a/arch/x86/kvm/vmx/posted_intr.h b/arch/x86/kvm/vmx/posted_intr.h
index 26992076552e..2fe8222308b2 100644
--- a/arch/x86/kvm/vmx/posted_intr.h
+++ b/arch/x86/kvm/vmx/posted_intr.h
@@ -94,6 +94,17 @@ static inline bool pi_test_sn(struct pi_desc *pi_desc)
 			(unsigned long *)&pi_desc->control);
 }
 
+struct vcpu_pi {
+	struct kvm_vcpu	vcpu;
+
+	/* Posted interrupt descriptor */
+	struct pi_desc pi_desc;
+
+	/* Used if this vCPU is waiting for PI notification wakeup. */
+	struct list_head pi_wakeup_list;
+	/* Until here, common layout between vcpu_vmx and vcpu_tdx. */
+};
+
 void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu);
 void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu);
 void pi_wakeup_handler(void);
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 43d11d6bb6f5..114c10cab019 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -419,6 +419,7 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 
 	fpstate_set_confidential(&vcpu->arch.guest_fpu);
 	vcpu->arch.apic->guest_apic_protected = true;
+	INIT_LIST_HEAD(&tdx->pi_wakeup_list);
 
 	vcpu->arch.efer = EFER_SCE | EFER_LME | EFER_LMA | EFER_NX;
 
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 8b425e7415c2..003544ead8fb 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -4,6 +4,7 @@
 
 #ifdef CONFIG_INTEL_TDX_HOST
 
+#include "posted_intr.h"
 #include "pmu_intel.h"
 #include "tdx_ops.h"
 
@@ -79,6 +80,13 @@ union tdx_exit_reason {
 struct vcpu_tdx {
 	struct kvm_vcpu	vcpu;
 
+	/* Posted interrupt descriptor */
+	struct pi_desc pi_desc;
+
+	/* Used if this vCPU is waiting for PI notification wakeup. */
+	struct list_head pi_wakeup_list;
+	/* Until here, same layout as struct vcpu_pi. */
+
 	struct tdx_td_page tdvpr;
 	struct tdx_td_page *tdvpx;
 
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 47240671535a..7d88790f6e68 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -245,6 +245,14 @@ struct nested_vmx {
 
 struct vcpu_vmx {
 	struct kvm_vcpu	vcpu;
+
+	/* Posted interrupt descriptor */
+	struct pi_desc pi_desc;
+
+	/* Used if this vCPU is waiting for PI notification wakeup. */
+	struct list_head pi_wakeup_list;
+	/* Until here, same layout as struct vcpu_pi. */
+
 	u8 fail;
 	u8 x2apic_msr_bitmap_mode;
 
@@ -314,12 +322,6 @@ struct vcpu_vmx {
 
 	union vmx_exit_reason exit_reason;
 
-	/* Posted interrupt descriptor */
-	struct pi_desc pi_desc;
-
-	/* Used if this vCPU is waiting for PI notification wakeup. */
-	struct list_head pi_wakeup_list;
-
 	/* Support for a guest hypervisor (nested VMX) */
 	struct nested_vmx nested;
 
--
2.25.1
From: isaku.yamahata@intel.com
Subject: [PATCH v9 078/105] KVM: TDX: Implement interrupt injection
Date: Fri, 30 Sep 2022 03:18:12 -0700

From: Isaku Yamahata

TDX supports interrupt injection into a vCPU with posted interrupts.  Wire
up the corresponding kvm x86 operations to posted interrupts.  Move
kvm_vcpu_trigger_posted_interrupt() from vmx.c to common.h to share the
code.

VMX can inject an interrupt by setting the interrupt information field,
VM_ENTRY_INTR_INFO_FIELD, of the VMCS.  TDX supports interrupt injection
only by posted interrupt, so ignore the execution path that accesses
VM_ENTRY_INTR_INFO_FIELD.  As the CPU state is protected and APICv is
enabled for TDX guests, the VMM can inject an interrupt by updating the
posted interrupt descriptor.  Treat interrupts as always injectable.
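The delivery fast path factored out below is essentially two atomic
test-and-sets.  A userspace sketch of the idea, using GCC/Clang atomic
builtins; the "notify" step stands in for the real IPI, and the descriptor
layout is simplified:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t pir[4];      /* posted-interrupt request bits */
static uint64_t control;     /* bit 0 = ON (outstanding notification) */

static bool test_and_set_bit64(int nr, uint64_t *addr)
{
	uint64_t mask = 1ull << (nr & 63);
	uint64_t old = __atomic_fetch_or(&addr[nr >> 6], mask, __ATOMIC_SEQ_CST);
	return old & mask;
}

static void deliver_posted_interrupt(int vector)
{
	/* Record the vector; if it was already pending, nothing to do. */
	if (test_and_set_bit64(vector, pir))
		return;
	/* If a previous notification already set ON, the IPI was sent. */
	if (test_and_set_bit64(0, &control))
		return;
	printf("send notification IPI\n");   /* stand-in for the real IPI */
}

int main(void)
{
	deliver_posted_interrupt(0x20);
	deliver_posted_interrupt(0x21);   /* ON already set: no second IPI */
	return 0;
}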
Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/vmx/common.h | 71 ++++++++++++++++++++++++++ arch/x86/kvm/vmx/main.c | 93 ++++++++++++++++++++++++++++++---- arch/x86/kvm/vmx/posted_intr.c | 2 +- arch/x86/kvm/vmx/posted_intr.h | 2 + arch/x86/kvm/vmx/tdx.c | 25 +++++++++ arch/x86/kvm/vmx/vmx.c | 67 +----------------------- arch/x86/kvm/vmx/x86_ops.h | 7 ++- 7 files changed, 190 insertions(+), 77 deletions(-) diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h index 235908f3e044..747f993cf7de 100644 --- a/arch/x86/kvm/vmx/common.h +++ b/arch/x86/kvm/vmx/common.h @@ -4,6 +4,7 @@ =20 #include =20 +#include "posted_intr.h" #include "mmu.h" =20 static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t = gpa, @@ -30,4 +31,74 @@ static inline int __vmx_handle_ept_violation(struct kvm_= vcpu *vcpu, gpa_t gpa, return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0); } =20 +static inline void kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu, + int pi_vec) +{ +#ifdef CONFIG_SMP + if (vcpu->mode =3D=3D IN_GUEST_MODE) { + /* + * The vector of the virtual has already been set in the PIR. + * Send a notification event to deliver the virtual interrupt + * unless the vCPU is the currently running vCPU, i.e. the + * event is being sent from a fastpath VM-Exit handler, in + * which case the PIR will be synced to the vIRR before + * re-entering the guest. + * + * When the target is not the running vCPU, the following + * possibilities emerge: + * + * Case 1: vCPU stays in non-root mode. Sending a notification + * event posts the interrupt to the vCPU. + * + * Case 2: vCPU exits to root mode and is still runnable. The + * PIR will be synced to the vIRR before re-entering the guest. + * Sending a notification event is ok as the host IRQ handler + * will ignore the spurious event. + * + * Case 3: vCPU exits to root mode and is blocked. vcpu_block() + * has already synced PIR to vIRR and never blocks the vCPU if + * the vIRR is not empty. Therefore, a blocked vCPU here does + * not wait for any requested interrupts in PIR, and sending a + * notification event also results in a benign, spurious event. + */ + + if (vcpu !=3D kvm_get_running_vcpu()) + apic->send_IPI_mask(get_cpu_mask(vcpu->cpu), pi_vec); + return; + } +#endif + /* + * The vCPU isn't in the guest; wake the vCPU in case it is blocking, + * otherwise do nothing as KVM will grab the highest priority pending + * IRQ via ->sync_pir_to_irr() in vcpu_enter_guest(). + */ + kvm_vcpu_wake_up(vcpu); +} + +/* + * Send interrupt to vcpu via posted interrupt way. + * 1. If target vcpu is running(non-root mode), send posted interrupt + * notification to vcpu and hardware will sync PIR to vIRR atomically. + * 2. If target vcpu isn't running(root mode), kick it to pick up the + * interrupt from PIR in next vmentry. + */ +static inline void __vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, + struct pi_desc *pi_desc, int vector) +{ + if (pi_test_and_set_pir(vector, pi_desc)) + return; + + /* If a previous notification has sent the IPI, nothing to do. */ + if (pi_test_and_set_on(pi_desc)) + return; + + /* + * The implied barrier in pi_test_and_set_on() pairs with the smp_mb_*() + * after setting vcpu->mode in vcpu_enter_guest(), thus the vCPU is + * guaranteed to see PID.ON=3D1 and sync the PIR to IRR if triggering a + * posted interrupt "fails" because vcpu->mode !=3D IN_GUEST_MODE. 
+ */ + kvm_vcpu_trigger_posted_interrupt(vcpu, POSTED_INTR_VECTOR); +} + #endif /* __KVM_X86_VMX_COMMON_H */ diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index b26f660dd728..6b232fb8f981 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -178,6 +178,34 @@ static bool vt_protected_apic_has_interrupt(struct kvm= _vcpu *vcpu) return tdx_protected_apic_has_interrupt(vcpu); } =20 +static void vt_apicv_post_state_restore(struct kvm_vcpu *vcpu) +{ + struct pi_desc *pi =3D vcpu_to_pi_desc(vcpu); + + pi_clear_on(pi); + memset(pi->pir, 0, sizeof(pi->pir)); +} + +static int vt_sync_pir_to_irr(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return -1; + + return vmx_sync_pir_to_irr(vcpu); +} + +static void vt_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, + int trig_mode, int vector) +{ + if (is_td_vcpu(apic->vcpu)) { + tdx_deliver_interrupt(apic, delivery_mode, trig_mode, + vector); + return; + } + + vmx_deliver_interrupt(apic, delivery_mode, trig_mode, vector); +} + static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) @@ -244,6 +272,53 @@ static void vt_sched_in(struct kvm_vcpu *vcpu, int cpu) vmx_sched_in(vcpu, cpu); } =20 +static void vt_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask) +{ + if (is_td_vcpu(vcpu)) + return; + vmx_set_interrupt_shadow(vcpu, mask); +} + +static u32 vt_get_interrupt_shadow(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return 0; + + return vmx_get_interrupt_shadow(vcpu); +} + +static void vt_inject_irq(struct kvm_vcpu *vcpu, bool reinjected) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_inject_irq(vcpu, reinjected); +} + +static void vt_cancel_injection(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_cancel_injection(vcpu); +} + +static int vt_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection) +{ + if (is_td_vcpu(vcpu)) + return true; + + return vmx_interrupt_allowed(vcpu, for_injection); +} + +static void vt_enable_irq_window(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_enable_irq_window(vcpu); +} + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { if (!is_td(kvm)) @@ -323,31 +398,31 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .handle_exit =3D vmx_handle_exit, .skip_emulated_instruction =3D vmx_skip_emulated_instruction, .update_emulated_instruction =3D vmx_update_emulated_instruction, - .set_interrupt_shadow =3D vmx_set_interrupt_shadow, - .get_interrupt_shadow =3D vmx_get_interrupt_shadow, + .set_interrupt_shadow =3D vt_set_interrupt_shadow, + .get_interrupt_shadow =3D vt_get_interrupt_shadow, .patch_hypercall =3D vmx_patch_hypercall, - .inject_irq =3D vmx_inject_irq, + .inject_irq =3D vt_inject_irq, .inject_nmi =3D vmx_inject_nmi, .queue_exception =3D vmx_queue_exception, - .cancel_injection =3D vmx_cancel_injection, - .interrupt_allowed =3D vmx_interrupt_allowed, + .cancel_injection =3D vt_cancel_injection, + .interrupt_allowed =3D vt_interrupt_allowed, .nmi_allowed =3D vmx_nmi_allowed, .get_nmi_mask =3D vmx_get_nmi_mask, .set_nmi_mask =3D vmx_set_nmi_mask, .enable_nmi_window =3D vmx_enable_nmi_window, - .enable_irq_window =3D vmx_enable_irq_window, + .enable_irq_window =3D vt_enable_irq_window, .update_cr8_intercept =3D vmx_update_cr8_intercept, .set_virtual_apic_mode =3D vmx_set_virtual_apic_mode, .set_apic_access_page_addr =3D vmx_set_apic_access_page_addr, .refresh_apicv_exec_ctrl =3D vmx_refresh_apicv_exec_ctrl, .load_eoi_exitmap =3D vmx_load_eoi_exitmap, - .apicv_post_state_restore =3D 
vmx_apicv_post_state_restore, + .apicv_post_state_restore =3D vt_apicv_post_state_restore, .check_apicv_inhibit_reasons =3D vmx_check_apicv_inhibit_reasons, .hwapic_irr_update =3D vmx_hwapic_irr_update, .hwapic_isr_update =3D vmx_hwapic_isr_update, .guest_apic_has_interrupt =3D vmx_guest_apic_has_interrupt, - .sync_pir_to_irr =3D vmx_sync_pir_to_irr, - .deliver_interrupt =3D vmx_deliver_interrupt, + .sync_pir_to_irr =3D vt_sync_pir_to_irr, + .deliver_interrupt =3D vt_deliver_interrupt, .dy_apicv_has_pending_interrupt =3D pi_has_pending_interrupt, .protected_apic_has_interrupt =3D vt_protected_apic_has_interrupt, =20 diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c index 62caf74753bc..91ea3396463d 100644 --- a/arch/x86/kvm/vmx/posted_intr.c +++ b/arch/x86/kvm/vmx/posted_intr.c @@ -50,7 +50,7 @@ static inline struct vcpu_pi *vcpu_to_pi(struct kvm_vcpu = *vcpu) return (struct vcpu_pi *)vcpu; } =20 -static inline struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu) +struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu) { return &vcpu_to_pi(vcpu)->pi_desc; } diff --git a/arch/x86/kvm/vmx/posted_intr.h b/arch/x86/kvm/vmx/posted_intr.h index 2fe8222308b2..0f9983b6910b 100644 --- a/arch/x86/kvm/vmx/posted_intr.h +++ b/arch/x86/kvm/vmx/posted_intr.h @@ -105,6 +105,8 @@ struct vcpu_pi { /* Until here common layout betwwn vcpu_vmx and vcpu_tdx. */ }; =20 +struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu); + void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu); void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu); void pi_wakeup_handler(void); diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 114c10cab019..fa309acf05de 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -7,6 +7,7 @@ =20 #include "capabilities.h" #include "x86_ops.h" +#include "common.h" #include "tdx.h" #include "vmx.h" #include "x86.h" @@ -432,6 +433,9 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) vcpu->arch.guest_state_protected =3D !(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTRIBUTE_DEBUG); =20 + tdx->pi_desc.nv =3D POSTED_INTR_VECTOR; + tdx->pi_desc.sn =3D 1; + tdx->host_state_need_save =3D true; tdx->host_state_need_restore =3D false; =20 @@ -442,6 +446,7 @@ void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) { struct vcpu_tdx *tdx =3D to_tdx(vcpu); =20 + vmx_vcpu_pi_load(vcpu, cpu); if (vcpu->cpu =3D=3D cpu) return; =20 @@ -644,6 +649,12 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) =20 trace_kvm_entry(vcpu); =20 + if (pi_test_on(&tdx->pi_desc)) { + apic->send_IPI_self(POSTED_INTR_VECTOR); + + kvm_wait_lapic_expire(vcpu); + } + tdx_vcpu_enter_exit(vcpu, tdx); =20 tdx_user_return_update_cache(); @@ -950,6 +961,16 @@ static void tdx_sept_remove_private_spte(struct kvm *k= vm, gfn_t gfn, tdx_sept_drop_private_spte(kvm, gfn, level, pfn); } =20 +void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, + int trig_mode, int vector) +{ + struct kvm_vcpu *vcpu =3D apic->vcpu; + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + + /* TDX supports only posted interrupt. No lapic emulation. 
*/ + __vmx_deliver_posted_interrupt(vcpu, &tdx->pi_desc, vector); +} + int tdx_dev_ioctl(void __user *argp) { struct kvm_tdx_capabilities __user *user_caps; @@ -1620,6 +1641,10 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __use= r *argp) if (ret) return ret; =20 + td_vmcs_write16(tdx, POSTED_INTR_NV, POSTED_INTR_VECTOR); + td_vmcs_write64(tdx, POSTED_INTR_DESC_ADDR, __pa(&tdx->pi_desc)); + td_vmcs_setbit32(tdx, PIN_BASED_VM_EXEC_CONTROL, PIN_BASED_POSTED_INTR); + tdx->vcpu_initialized =3D true; return 0; } diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 3c0587f26f2b..4a8c398bb69a 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -4166,50 +4166,6 @@ void vmx_msr_filter_changed(struct kvm_vcpu *vcpu) pt_update_intercept_for_msr(vcpu); } =20 -static inline void kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu, - int pi_vec) -{ -#ifdef CONFIG_SMP - if (vcpu->mode =3D=3D IN_GUEST_MODE) { - /* - * The vector of the virtual has already been set in the PIR. - * Send a notification event to deliver the virtual interrupt - * unless the vCPU is the currently running vCPU, i.e. the - * event is being sent from a fastpath VM-Exit handler, in - * which case the PIR will be synced to the vIRR before - * re-entering the guest. - * - * When the target is not the running vCPU, the following - * possibilities emerge: - * - * Case 1: vCPU stays in non-root mode. Sending a notification - * event posts the interrupt to the vCPU. - * - * Case 2: vCPU exits to root mode and is still runnable. The - * PIR will be synced to the vIRR before re-entering the guest. - * Sending a notification event is ok as the host IRQ handler - * will ignore the spurious event. - * - * Case 3: vCPU exits to root mode and is blocked. vcpu_block() - * has already synced PIR to vIRR and never blocks the vCPU if - * the vIRR is not empty. Therefore, a blocked vCPU here does - * not wait for any requested interrupts in PIR, and sending a - * notification event also results in a benign, spurious event. - */ - - if (vcpu !=3D kvm_get_running_vcpu()) - apic->send_IPI_mask(get_cpu_mask(vcpu->cpu), pi_vec); - return; - } -#endif - /* - * The vCPU isn't in the guest; wake the vCPU in case it is blocking, - * otherwise do nothing as KVM will grab the highest priority pending - * IRQ via ->sync_pir_to_irr() in vcpu_enter_guest(). - */ - kvm_vcpu_wake_up(vcpu); -} - static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu, int vector) { @@ -4262,20 +4218,7 @@ static int vmx_deliver_posted_interrupt(struct kvm_v= cpu *vcpu, int vector) if (!vcpu->arch.apic->apicv_active) return -1; =20 - if (pi_test_and_set_pir(vector, &vmx->pi_desc)) - return 0; - - /* If a previous notification has sent the IPI, nothing to do. */ - if (pi_test_and_set_on(&vmx->pi_desc)) - return 0; - - /* - * The implied barrier in pi_test_and_set_on() pairs with the smp_mb_*() - * after setting vcpu->mode in vcpu_enter_guest(), thus the vCPU is - * guaranteed to see PID.ON=3D1 and sync the PIR to IRR if triggering a - * posted interrupt "fails" because vcpu->mode !=3D IN_GUEST_MODE. 
-	 */
-	kvm_vcpu_trigger_posted_interrupt(vcpu, POSTED_INTR_VECTOR);
+	__vmx_deliver_posted_interrupt(vcpu, &vmx->pi_desc, vector);
 	return 0;
 }
 
@@ -6862,14 +6805,6 @@ void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
 	vmcs_write64(EOI_EXIT_BITMAP3, eoi_exit_bitmap[3]);
 }
 
-void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
-{
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-
-	pi_clear_on(&vmx->pi_desc);
-	memset(vmx->pi_desc.pir, 0, sizeof(vmx->pi_desc.pir));
-}
-
 void vmx_do_interrupt_nmi_irqoff(unsigned long entry);
 
 static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index ff9eb6a37f8e..c6dda5f6acda 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -55,7 +55,6 @@ int vmx_check_intercept(struct kvm_vcpu *vcpu,
 bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu);
 void vmx_migrate_timers(struct kvm_vcpu *vcpu);
 void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu);
-void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu);
 bool vmx_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason);
 void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr);
 void vmx_hwapic_isr_update(int max_isr);
@@ -155,6 +154,9 @@ void tdx_vcpu_put(struct kvm_vcpu *vcpu);
 void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu);
 
+void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
+			   int trig_mode, int vector);
+
 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
 
@@ -184,6 +186,9 @@ static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {}
 static inline void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {}
 static inline bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu) { return false; }
 
+static inline void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
+					 int trig_mode, int vector) {}
+
 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
--
2.25.1
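Each tdx_*() hook added in x86_ops.h comes in two flavors: a real declaration
under CONFIG_INTEL_TDX_HOST and a static inline no-op stub otherwise, so
callers compile unchanged either way.  A toy version of the convention,
standalone, with a local macro standing in for the Kconfig symbol:

#include <stdio.h>

#define CONFIG_MOCK_TDX_HOST 0   /* flip to 1 when the real tdx.c is linked */

#if CONFIG_MOCK_TDX_HOST
void tdx_example_hook(int vector);               /* real symbol elsewhere */
#else
/* Stub keeps callers compiling when TDX support is configured out. */
static inline void tdx_example_hook(int vector) { (void)vector; }
#endif

int main(void)
{
	tdx_example_hook(0x20);
	printf("ok\n");
	return 0;
}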
From: isaku.yamahata@intel.com
Subject: [PATCH v9 079/105] KVM: TDX: Implement vcpu request_immediate_exit
Date: Fri, 30 Sep 2022 03:18:13 -0700

From: Isaku Yamahata

Now that interrupts can be injected into a TDX vCPU, the vCPU is ready to
be blocked.  Wire up the kvm x86 methods for blocking/unblocking a vCPU
for TDX.  To unblock on pending events, the request-immediate-exit method
is also needed.

Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/main.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 6b232fb8f981..7dae3a1999eb 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -319,6 +319,14 @@ static void vt_enable_irq_window(struct kvm_vcpu *vcpu)
 	vmx_enable_irq_window(vcpu);
 }
 
+static void vt_request_immediate_exit(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return __kvm_request_immediate_exit(vcpu);
+
+	vmx_request_immediate_exit(vcpu);
+}
+
 static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	if (!is_td(kvm))
@@ -446,7 +454,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.check_intercept = vmx_check_intercept,
 	.handle_exit_irqoff = vmx_handle_exit_irqoff,
 
-	.request_immediate_exit = vmx_request_immediate_exit,
+	.request_immediate_exit = vt_request_immediate_exit,
 
 	.sched_in = vt_sched_in,
 
--
2.25.1
From: isaku.yamahata@intel.com
Subject: [PATCH v9 080/105] KVM: TDX: Implement methods to inject NMI
Date: Fri, 30 Sep 2022 03:18:14 -0700

From: Isaku Yamahata

The TDX vCPU control structure defines one bit for a pending NMI, so the
VMM can inject an NMI by setting the bit without knowing the TDX vCPU's
NMI state.  Because the vCPU state is protected, the VMM can't know the
NMI state of a TDX vCPU; the TDX module handles the actual injection and
the NMI state transitions.

Add methods for NMI and treat NMIs as always injectable.

Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/main.c    | 62 +++++++++++++++++++++++++++++++++++---
 arch/x86/kvm/vmx/tdx.c     |  5 +++
 arch/x86/kvm/vmx/x86_ops.h |  2 ++
 3 files changed, 64 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 7dae3a1999eb..a2417c7f7ad7 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -255,6 +255,58 @@ static void vt_flush_tlb_guest(struct kvm_vcpu *vcpu)
 	vmx_flush_tlb_guest(vcpu);
 }
 
+static void vt_inject_nmi(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return tdx_inject_nmi(vcpu);
+
+	vmx_inject_nmi(vcpu);
+}
+
+static int vt_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+{
+	/*
+	 * The TDX module manages NMI windows and NMI reinjection, and hides
+	 * NMI blocking; all KVM can do is throw an NMI over the wall.
+	 */
+	if (is_td_vcpu(vcpu))
+		return true;
+
+	return vmx_nmi_allowed(vcpu, for_injection);
+}
+
+static bool vt_get_nmi_mask(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Assume NMIs are always unmasked.  KVM could query PEND_NMI and treat
+	 * NMIs as masked if a previous NMI is still pending, but SEAMCALLs are
+	 * expensive and the end result is unchanged as the only relevant usage
+	 * of get_nmi_mask() is to limit the number of pending NMIs, i.e. it
+	 * only changes whether KVM or the TDX module drops an NMI.
+	 */
+	if (is_td_vcpu(vcpu))
+		return false;
+
+	return vmx_get_nmi_mask(vcpu);
+}
+
+static void vt_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_set_nmi_mask(vcpu, masked);
+}
+
+static void vt_enable_nmi_window(struct kvm_vcpu *vcpu)
+{
+	/* Refer to the comment in vt_get_nmi_mask(). */
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_enable_nmi_window(vcpu);
+}
+
 static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 			    int pgd_level)
 {
@@ -410,14 +462,14 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.get_interrupt_shadow = vt_get_interrupt_shadow,
 	.patch_hypercall = vmx_patch_hypercall,
 	.inject_irq = vt_inject_irq,
-	.inject_nmi = vmx_inject_nmi,
+	.inject_nmi = vt_inject_nmi,
 	.queue_exception = vmx_queue_exception,
 	.cancel_injection = vt_cancel_injection,
 	.interrupt_allowed = vt_interrupt_allowed,
-	.nmi_allowed = vmx_nmi_allowed,
-	.get_nmi_mask = vmx_get_nmi_mask,
-	.set_nmi_mask = vmx_set_nmi_mask,
-	.enable_nmi_window = vmx_enable_nmi_window,
+	.nmi_allowed = vt_nmi_allowed,
+	.get_nmi_mask = vt_get_nmi_mask,
+	.set_nmi_mask = vt_set_nmi_mask,
+	.enable_nmi_window = vt_enable_nmi_window,
 	.enable_irq_window = vt_enable_irq_window,
 	.update_cr8_intercept = vmx_update_cr8_intercept,
 	.set_virtual_apic_mode = vmx_set_virtual_apic_mode,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index fa309acf05de..5e994d581d87 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -670,6 +670,11 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu)
 	return EXIT_FASTPATH_NONE;
 }
 
+void tdx_inject_nmi(struct kvm_vcpu *vcpu)
+{
+	td_management_write8(to_tdx(vcpu), TD_VCPU_PEND_NMI, 1);
+}
+
 void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
 {
 	td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa & PAGE_MASK);
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index c6dda5f6acda..fb630d17ccd1 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -156,6 +156,7 @@ bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu);
 
 void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
 			   int trig_mode, int vector);
+void tdx_inject_nmi(struct kvm_vcpu *vcpu);
 
 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
@@ -188,6 +189,7 @@ static inline bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu) { return false; }
 
 static inline void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
 					 int trig_mode, int vector) {}
+static inline void tdx_inject_nmi(struct kvm_vcpu *vcpu) {}
 
 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
--
2.25.1
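tdx_inject_nmi() reduces to setting a single byte in the TD's management
structure; the TDX module notices the bit on TD entry, injects the NMI, and
clears the request.  A mock of that contract — purely illustrative userspace
code in which only the "pending NMI byte" idea comes from the patch, and the
module-side behavior is an assumption stated in the comments:

#include <stdint.h>
#include <stdio.h>

struct mock_td_mgmt { uint8_t pend_nmi; };

/* VMM side: fire-and-forget; no visibility into guest NMI blocking. */
static void mock_inject_nmi(struct mock_td_mgmt *m)
{
	m->pend_nmi = 1;
}

/* "TDX module" side: on TD entry, deliver and clear the request. */
static void mock_td_entry(struct mock_td_mgmt *m)
{
	if (m->pend_nmi) {
		printf("module injects NMI when the guest NMI window opens\n");
		m->pend_nmi = 0;
	}
}

int main(void)
{
	struct mock_td_mgmt m = { 0 };
	mock_inject_nmi(&m);
	mock_td_entry(&m);
	return 0;
}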
ESMTP id S231609AbiI3KTK (ORCPT ); Fri, 30 Sep 2022 06:19:10 -0400 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2951418B5E0; Fri, 30 Sep 2022 03:19:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533146; x=1696069146; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=kZLhfSmzVTvaaeQgEQOarnShUJcRUFoMTeqFZRT01qQ=; b=JVOs6PWQLX+TFstpIywrt0UxDdYfO3LnRy4+xbWlY6G3TcSEdcmlBQKm +Fgf9sHgFl3CWUmxxCI9wvJhYIyK3VguPaQu49QdaHJREVUKE1GZQ8MVs F2QYHcSyax3AR90MvqrYWllpH5VYcbEyuBddgjC1elTPiOy1BdEUzKJ// 7jaYMrRiUeNniIhZfy13AlFCJWZzBL8ivzLK5DHyA7QjzaaOwskwu41+f hZaZ8/3mzwOHT2vc/7oce3XnVlmskuThovftwgH+S9uXnl9OJKvhxs7cU 4DKW8pii7KEbcFgu3FhHPEWtSAPCBqdGzEx/oo0ucV5sASbPubP+KieA8 g==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="300875877" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="300875877" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:05 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807767" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807767" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:05 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , Sean Christopherson Subject: [PATCH v9 081/105] KVM: VMX: Modify NMI and INTR handlers to take intr_info as function argument Date: Fri, 30 Sep 2022 03:18:15 -0700 Message-Id: <6792a020d525fe3eedf814ece75265f44c82def2.1664530908.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson TDX uses different ABI to get information about VM exit. Pass intr_info to the NMI and INTR handlers instead of pulling it from vcpu_vmx in preparation for sharing the bulk of the handlers with TDX. When the guest TD exits to VMM, RAX holds status and exit reason, RCX holds exit qualification etc rather than the VMCS fields because VMM doesn't have access to the VMCS. 
The eventual code will be VMX: - get exit reason, intr_info, exit_qualification, and etc from VMCS - call NMI/INTR handlers (common code) TDX: - get exit reason, intr_info, exit_qualification, and etc from guest registers - call NMI/INTR handlers (common code) Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/vmx/vmx.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 4a8c398bb69a..ab0f2b87b5b4 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6837,28 +6837,27 @@ static void handle_nm_fault_irqoff(struct kvm_vcpu = *vcpu) rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err); } =20 -static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx) +static void handle_exception_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_in= fo) { const unsigned long nmi_entry =3D (unsigned long)asm_exc_nmi_noist; - u32 intr_info =3D vmx_get_intr_info(&vmx->vcpu); =20 /* if exit due to PF check for async PF */ if (is_page_fault(intr_info)) - vmx->vcpu.arch.apf.host_apf_flags =3D kvm_read_and_reset_apf_flags(); + vcpu->arch.apf.host_apf_flags =3D kvm_read_and_reset_apf_flags(); /* if exit due to NM, handle before interrupts are enabled */ else if (is_nm_fault(intr_info)) - handle_nm_fault_irqoff(&vmx->vcpu); + handle_nm_fault_irqoff(vcpu); /* Handle machine checks before interrupts are enabled */ else if (is_machine_check(intr_info)) kvm_machine_check(); /* We need to handle NMIs before interrupts are enabled */ else if (is_nmi(intr_info)) - handle_interrupt_nmi_irqoff(&vmx->vcpu, nmi_entry); + handle_interrupt_nmi_irqoff(vcpu, nmi_entry); } =20 -static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu) +static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu, + u32 intr_info) { - u32 intr_info =3D vmx_get_intr_info(vcpu); unsigned int vector =3D intr_info & INTR_INFO_VECTOR_MASK; gate_desc *desc =3D (gate_desc *)host_idt_base + vector; =20 @@ -6878,9 +6877,9 @@ void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu) return; =20 if (vmx->exit_reason.basic =3D=3D EXIT_REASON_EXTERNAL_INTERRUPT) - handle_external_interrupt_irqoff(vcpu); + handle_external_interrupt_irqoff(vcpu, vmx_get_intr_info(vcpu)); else if (vmx->exit_reason.basic =3D=3D EXIT_REASON_EXCEPTION_NMI) - handle_exception_nmi_irqoff(vmx); + handle_exception_nmi_irqoff(vcpu, vmx_get_intr_info(vcpu)); } =20 /* --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 81829C433FE for ; Fri, 30 Sep 2022 10:24:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232412AbiI3KYM (ORCPT ); Fri, 30 Sep 2022 06:24:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34214 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231651AbiI3KTS (ORCPT ); Fri, 30 Sep 2022 06:19:18 -0400 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AEB24166F2C; Fri, 30 Sep 2022 03:19:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533146; x=1696069146; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; 
From nobody Thu Apr 25 13:29:40 2024
From: isaku.yamahata@intel.com
Subject: [PATCH v9 082/105] KVM: VMX: Move NMI/exception handler to common helper
Date: Fri, 30 Sep 2022 03:18:16 -0700

From: Sean Christopherson

TDX handles an NMI/exception exit mostly the same as the VMX case; the
difference is how the exit qualification is retrieved.  To share the
code with TDX, move the NMI/exception handlers to a common header,
common.h.

Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/common.h | 70 ++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.c    | 79 ++++-----------------------------------
 2 files changed, 78 insertions(+), 71 deletions(-)

diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h
index 747f993cf7de..04836d88baaa 100644
--- a/arch/x86/kvm/vmx/common.h
+++ b/arch/x86/kvm/vmx/common.h
@@ -4,8 +4,78 @@

 #include

+#include
+
 #include "posted_intr.h"
 #include "mmu.h"
+#include "vmcs.h"
+#include "x86.h"
+
+extern unsigned long vmx_host_idt_base;
+void vmx_do_interrupt_nmi_irqoff(unsigned long entry);
+
+static inline void vmx_handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
+						   unsigned long entry)
+{
+	bool is_nmi = entry == (unsigned long)asm_exc_nmi_noist;
+
+	kvm_before_interrupt(vcpu, is_nmi ? KVM_HANDLING_NMI : KVM_HANDLING_IRQ);
+	vmx_do_interrupt_nmi_irqoff(entry);
+	kvm_after_interrupt(vcpu);
+}
+
+static inline void vmx_handle_nm_fault_irqoff(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Save xfd_err to guest_fpu before interrupt is enabled, so the
+	 * MSR value is not clobbered by the host activity before the guest
+	 * has chance to consume it.
+	 *
+	 * Do not blindly read xfd_err here, since this exception might
+	 * be caused by L1 interception on a platform which doesn't
+	 * support xfd at all.
+	 *
+	 * Do it conditionally upon guest_fpu::xfd. xfd_err matters
+	 * only when xfd contains a non-zero value.
+	 *
+	 * Queuing exception is done in vmx_handle_exit. See comment there.
+	 */
+	if (vcpu->arch.guest_fpu.fpstate->xfd)
+		rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
+}
+
+static inline void vmx_handle_exception_nmi_irqoff(struct kvm_vcpu *vcpu,
+						   u32 intr_info)
+{
+	const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist;
+
+	/* if exit due to PF check for async PF */
+	if (is_page_fault(intr_info))
+		vcpu->arch.apf.host_apf_flags = kvm_read_and_reset_apf_flags();
+	/* if exit due to NM, handle before interrupts are enabled */
+	else if (is_nm_fault(intr_info))
+		vmx_handle_nm_fault_irqoff(vcpu);
+	/* Handle machine checks before interrupts are enabled */
+	else if (is_machine_check(intr_info))
+		kvm_machine_check();
+	/* We need to handle NMIs before interrupts are enabled */
+	else if (is_nmi(intr_info))
+		vmx_handle_interrupt_nmi_irqoff(vcpu, nmi_entry);
+}
+
+static inline void vmx_handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu,
+							 u32 intr_info)
+{
+	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
+	gate_desc *desc = (gate_desc *)vmx_host_idt_base + vector;
+
+	if (KVM_BUG(!is_external_intr(intr_info), vcpu->kvm,
+		    "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
+		return;
+
+	vmx_handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
+	vcpu->arch.at_instruction_boundary = true;
+}

 static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
					     unsigned long exit_qualification)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ab0f2b87b5b4..64e5a7c9951a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -525,7 +525,7 @@ static inline void vmx_segment_cache_clear(struct vcpu_vmx *vmx)
 	vmx->segment_cache.bitmask = 0;
 }

-static unsigned long host_idt_base;
+unsigned long vmx_host_idt_base;

 #if IS_ENABLED(CONFIG_HYPERV)
 static bool __read_mostly enlightened_vmcs = true;
@@ -4282,7 +4282,7 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
 	vmcs_write16(HOST_SS_SELECTOR, __KERNEL_DS);  /* 22.2.4 */
 	vmcs_write16(HOST_TR_SELECTOR, GDT_ENTRY_TSS*8);  /* 22.2.4 */

-	vmcs_writel(HOST_IDTR_BASE, host_idt_base);   /* 22.2.4 */
+	vmcs_writel(HOST_IDTR_BASE, vmx_host_idt_base);   /* 22.2.4 */

 	vmcs_writel(HOST_RIP, (unsigned long)vmx_vmexit); /* 22.2.5 */

@@ -5129,10 +5129,10 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 	intr_info = vmx_get_intr_info(vcpu);

 	if (is_machine_check(intr_info) || is_nmi(intr_info))
-		return 1; /* handled by handle_exception_nmi_irqoff() */
+		return 1; /* handled by vmx_handle_exception_nmi_irqoff() */

 	/*
-	 * Queue the exception here instead of in handle_nm_fault_irqoff().
+	 * Queue the exception here instead of in vmx_handle_nm_fault_irqoff().
	 * This ensures the nested_vmx check is not skipped so vmexit can
	 * be reflected to L1 (when it intercepts #NM) before reaching this
	 * point.
@@ -6805,70 +6805,6 @@ void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
 	vmcs_write64(EOI_EXIT_BITMAP3, eoi_exit_bitmap[3]);
 }

-void vmx_do_interrupt_nmi_irqoff(unsigned long entry);
-
-static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
-					unsigned long entry)
-{
-	bool is_nmi = entry == (unsigned long)asm_exc_nmi_noist;
-
-	kvm_before_interrupt(vcpu, is_nmi ? KVM_HANDLING_NMI : KVM_HANDLING_IRQ);
-	vmx_do_interrupt_nmi_irqoff(entry);
-	kvm_after_interrupt(vcpu);
-}
-
-static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu)
-{
-	/*
-	 * Save xfd_err to guest_fpu before interrupt is enabled, so the
-	 * MSR value is not clobbered by the host activity before the guest
-	 * has chance to consume it.
-	 *
-	 * Do not blindly read xfd_err here, since this exception might
-	 * be caused by L1 interception on a platform which doesn't
-	 * support xfd at all.
-	 *
-	 * Do it conditionally upon guest_fpu::xfd. xfd_err matters
-	 * only when xfd contains a non-zero value.
-	 *
-	 * Queuing exception is done in vmx_handle_exit. See comment there.
-	 */
-	if (vcpu->arch.guest_fpu.fpstate->xfd)
-		rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
-}
-
-static void handle_exception_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info)
-{
-	const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist;
-
-	/* if exit due to PF check for async PF */
-	if (is_page_fault(intr_info))
-		vcpu->arch.apf.host_apf_flags = kvm_read_and_reset_apf_flags();
-	/* if exit due to NM, handle before interrupts are enabled */
-	else if (is_nm_fault(intr_info))
-		handle_nm_fault_irqoff(vcpu);
-	/* Handle machine checks before interrupts are enabled */
-	else if (is_machine_check(intr_info))
-		kvm_machine_check();
-	/* We need to handle NMIs before interrupts are enabled */
-	else if (is_nmi(intr_info))
-		handle_interrupt_nmi_irqoff(vcpu, nmi_entry);
-}
-
-static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu,
-					     u32 intr_info)
-{
-	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
-	gate_desc *desc = (gate_desc *)host_idt_base + vector;
-
-	if (KVM_BUG(!is_external_intr(intr_info), vcpu->kvm,
-		    "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
-		return;
-
-	handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
-	vcpu->arch.at_instruction_boundary = true;
-}
-
 void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -6877,9 +6813,9 @@ void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
 		return;

 	if (vmx->exit_reason.basic == EXIT_REASON_EXTERNAL_INTERRUPT)
-		handle_external_interrupt_irqoff(vcpu, vmx_get_intr_info(vcpu));
+		vmx_handle_external_interrupt_irqoff(vcpu,
+						     vmx_get_intr_info(vcpu));
 	else if (vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI)
-		handle_exception_nmi_irqoff(vcpu, vmx_get_intr_info(vcpu));
+		vmx_handle_exception_nmi_irqoff(vcpu, vmx_get_intr_info(vcpu));
 }

 /*
@@ -8127,7 +8064,7 @@ __init int vmx_hardware_setup(void)
 	int r;

 	store_idt(&dt);
-	host_idt_base = dt.address;
+	vmx_host_idt_base = dt.address;

 	vmx_setup_user_return_msrs();

--
2.25.1
From nobody Thu Apr 25 13:29:40 2024
From: isaku.yamahata@intel.com
Subject: [PATCH v9 083/105] KVM: x86: Split core of hypercall emulation to helper function
Date: Fri, 30 Sep 2022 03:18:17 -0700

From: Sean Christopherson

By necessity, TDX will use a different register ABI for hypercalls.
Break out the core functionality so that it may be reused for TDX.
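To illustrate the intended reuse, here is a minimal sketch (not part of
this patch) of a caller that feeds a non-VMX register ABI into the shared
helper.  The register layout (R10 = hypercall number, R11-R14 = arguments)
is the TDVMCALL ABI described later in this series; the function name is
hypothetical and error handling is elided.

static int example_tdx_emulate_vmcall(struct kvm_vcpu *vcpu)
{
	/* Hypercall number and arguments come from the TDVMCALL GPRs. */
	unsigned long nr = kvm_r10_read(vcpu);
	unsigned long a0 = kvm_r11_read(vcpu);
	unsigned long a1 = kvm_r12_read(vcpu);
	unsigned long a2 = kvm_r13_read(vcpu);
	unsigned long a3 = kvm_r14_read(vcpu);
	unsigned long ret;

	/* TDX guests are 64-bit only, so op_64_bit is always 1. */
	ret = __kvm_emulate_hypercall(vcpu, nr, a0, a1, a2, a3, 1);

	/* The status goes back to the guest in R10, per the same ABI. */
	kvm_r10_write(vcpu, ret);
	return 1;
}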
Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
---
 arch/x86/include/asm/kvm_host.h |  4 +++
 arch/x86/kvm/x86.c              | 54 ++++++++++++++++++++-------------
 2 files changed, 37 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 96fd23392c1e..de7a716e774c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1998,6 +1998,10 @@ static inline void kvm_clear_apicv_inhibit(struct kvm *kvm,
 	kvm_set_or_clear_apicv_inhibit(kvm, reason, false);
 }

+unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
+				      unsigned long a0, unsigned long a1,
+				      unsigned long a2, unsigned long a3,
+				      int op_64_bit);
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);

 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fda72bef6c90..43b1681a0952 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9526,26 +9526,15 @@ static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
 	return kvm_skip_emulated_instruction(vcpu);
 }

-int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
+unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
+				      unsigned long a0, unsigned long a1,
+				      unsigned long a2, unsigned long a3,
+				      int op_64_bit)
 {
-	unsigned long nr, a0, a1, a2, a3, ret;
-	int op_64_bit;
-
-	if (kvm_xen_hypercall_enabled(vcpu->kvm))
-		return kvm_xen_hypercall(vcpu);
-
-	if (kvm_hv_hypercall_enabled(vcpu))
-		return kvm_hv_hypercall(vcpu);
-
-	nr = kvm_rax_read(vcpu);
-	a0 = kvm_rbx_read(vcpu);
-	a1 = kvm_rcx_read(vcpu);
-	a2 = kvm_rdx_read(vcpu);
-	a3 = kvm_rsi_read(vcpu);
+	unsigned long ret;

 	trace_kvm_hypercall(nr, a0, a1, a2, a3);

-	op_64_bit = is_64_bit_hypercall(vcpu);
 	if (!op_64_bit) {
 		nr &= 0xFFFFFFFF;
 		a0 &= 0xFFFFFFFF;
@@ -9554,11 +9543,6 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 		a3 &= 0xFFFFFFFF;
 	}

-	if (static_call(kvm_x86_get_cpl)(vcpu) != 0) {
-		ret = -KVM_EPERM;
-		goto out;
-	}
-
 	ret = -KVM_ENOSYS;

 	switch (nr) {
@@ -9617,6 +9601,34 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 		ret = -KVM_ENOSYS;
 		break;
 	}
+	return ret;
+}
+EXPORT_SYMBOL_GPL(__kvm_emulate_hypercall);
+
+int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
+{
+	unsigned long nr, a0, a1, a2, a3, ret;
+	int op_64_bit;
+
+	if (kvm_xen_hypercall_enabled(vcpu->kvm))
+		return kvm_xen_hypercall(vcpu);
+
+	if (kvm_hv_hypercall_enabled(vcpu))
+		return kvm_hv_hypercall(vcpu);
+
+	nr = kvm_rax_read(vcpu);
+	a0 = kvm_rbx_read(vcpu);
+	a1 = kvm_rcx_read(vcpu);
+	a2 = kvm_rdx_read(vcpu);
+	a3 = kvm_rsi_read(vcpu);
+	op_64_bit = is_64_bit_hypercall(vcpu);
+
+	if (static_call(kvm_x86_get_cpl)(vcpu) != 0) {
+		ret = -KVM_EPERM;
+		goto out;
+	}
+
+	ret = __kvm_emulate_hypercall(vcpu, nr, a0, a1, a2, a3, op_64_bit);
 out:
 	if (!op_64_bit)
 		ret = (u32)ret;
--
2.25.1
From nobody Thu Apr 25 13:29:40 2024
From: isaku.yamahata@intel.com
Subject: [PATCH v9 084/105] KVM: TDX: Add a place holder to handle TDX VM exit
Date: Fri, 30 Sep 2022 03:18:18 -0700

From: Isaku Yamahata

Wire up the handle_exit and handle_exit_irqoff methods, and add a
placeholder to handle TDX VM exits.  Add helper functions to get the
exit info, exit qualification, etc.
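For context, a minimal sketch (not part of this patch) of how common x86
code consumes this hook: once .get_exit_info points at vt_get_exit_info(),
the same call site transparently reports VMX exits (read from the VMCS)
and TDX exits (read from the GPRs saved at TD exit).  The debug helper
below is hypothetical.

static void example_dump_exit_info(struct kvm_vcpu *vcpu)
{
	u64 info1, info2;
	u32 reason, intr_info, error_code;

	/* Dispatches to vt_get_exit_info() via the kvm_x86_ops static call. */
	static_call(kvm_x86_get_exit_info)(vcpu, &reason, &info1, &info2,
					   &intr_info, &error_code);
	pr_debug("exit reason 0x%x qual 0x%llx intr_info 0x%x error_code 0x%x\n",
		 reason, info1, intr_info, error_code);
}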
Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/main.c    | 33 ++++++++++++++--
 arch/x86/kvm/vmx/tdx.c     | 81 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/x86_ops.h | 10 +++++
 3 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index a2417c7f7ad7..ec322f4bcec5 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -178,6 +178,23 @@ static bool vt_protected_apic_has_interrupt(struct kvm_vcpu *vcpu)
 	return tdx_protected_apic_has_interrupt(vcpu);
 }

+static int vt_handle_exit(struct kvm_vcpu *vcpu,
+			  enum exit_fastpath_completion fastpath)
+{
+	if (is_td_vcpu(vcpu))
+		return tdx_handle_exit(vcpu, fastpath);
+
+	return vmx_handle_exit(vcpu, fastpath);
+}
+
+static void vt_handle_exit_irqoff(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return tdx_handle_exit_irqoff(vcpu);
+
+	vmx_handle_exit_irqoff(vcpu);
+}
+
 static void vt_apicv_post_state_restore(struct kvm_vcpu *vcpu)
 {
 	struct pi_desc *pi = vcpu_to_pi_desc(vcpu);
@@ -379,6 +396,16 @@ static void vt_request_immediate_exit(struct kvm_vcpu *vcpu)
 	vmx_request_immediate_exit(vcpu);
 }

+static void vt_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
+			     u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code)
+{
+	if (is_td_vcpu(vcpu))
+		return tdx_get_exit_info(vcpu, reason, info1, info2, intr_info,
+					 error_code);
+
+	return vmx_get_exit_info(vcpu, reason, info1, info2, intr_info, error_code);
+}
+
 static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	if (!is_td(kvm))
@@ -455,7 +482,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {

 	.vcpu_pre_run = vt_vcpu_pre_run,
 	.vcpu_run = vt_vcpu_run,
-	.handle_exit = vmx_handle_exit,
+	.handle_exit = vt_handle_exit,
 	.skip_emulated_instruction = vmx_skip_emulated_instruction,
 	.update_emulated_instruction = vmx_update_emulated_instruction,
 	.set_interrupt_shadow = vt_set_interrupt_shadow,
@@ -490,7 +517,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.set_identity_map_addr = vmx_set_identity_map_addr,
 	.get_mt_mask = vmx_get_mt_mask,

-	.get_exit_info = vmx_get_exit_info,
+	.get_exit_info = vt_get_exit_info,

 	.vcpu_after_set_cpuid = vmx_vcpu_after_set_cpuid,

@@ -504,7 +531,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.load_mmu_pgd = vt_load_mmu_pgd,

 	.check_intercept = vmx_check_intercept,
-	.handle_exit_irqoff = vmx_handle_exit_irqoff,
+	.handle_exit_irqoff = vt_handle_exit_irqoff,

 	.request_immediate_exit = vt_request_immediate_exit,

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 5e994d581d87..cd62a9f42ed0 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -65,6 +65,26 @@ static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid)
 	return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits);
 }

+static __always_inline unsigned long tdexit_exit_qual(struct kvm_vcpu *vcpu)
+{
+	return kvm_rcx_read(vcpu);
+}
+
+static __always_inline unsigned long tdexit_ext_exit_qual(struct kvm_vcpu *vcpu)
+{
+	return kvm_rdx_read(vcpu);
+}
+
+static __always_inline unsigned long tdexit_gpa(struct kvm_vcpu *vcpu)
+{
+	return kvm_r8_read(vcpu);
+}
+
+static __always_inline unsigned long tdexit_intr_info(struct kvm_vcpu *vcpu)
+{
+	return kvm_r9_read(vcpu);
+}
+
 static inline bool is_td_vcpu_created(struct vcpu_tdx *tdx)
 {
 	return tdx->tdvpr.added;
@@ -675,6 +695,25 @@ void tdx_inject_nmi(struct kvm_vcpu *vcpu)
 	td_management_write8(to_tdx(vcpu), TD_VCPU_PEND_NMI, 1);
 }

+void tdx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+	u16 exit_reason = tdx->exit_reason.basic;
+
+	if (exit_reason == EXIT_REASON_EXCEPTION_NMI)
+		vmx_handle_exception_nmi_irqoff(vcpu, tdexit_intr_info(vcpu));
+	else if (exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
+		vmx_handle_external_interrupt_irqoff(vcpu,
+						     tdexit_intr_info(vcpu));
+}
+
+static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu)
+{
+	vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
+	vcpu->mmio_needed = 0;
+	return 0;
+}
+
 void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
 {
 	td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa & PAGE_MASK);
@@ -976,6 +1015,48 @@ void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
 	__vmx_deliver_posted_interrupt(vcpu, &tdx->pi_desc, vector);
 }

+int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
+{
+	union tdx_exit_reason exit_reason = to_tdx(vcpu)->exit_reason;
+
+	if (unlikely(exit_reason.non_recoverable || exit_reason.error)) {
+		if (exit_reason.basic == EXIT_REASON_TRIPLE_FAULT)
+			return tdx_handle_triple_fault(vcpu);
+
+		kvm_pr_unimpl("TD exit 0x%llx, %d hkid 0x%x hkid pa 0x%llx\n",
+			      exit_reason.full, exit_reason.basic,
+			      to_kvm_tdx(vcpu->kvm)->hkid,
+			      set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid));
+		goto unhandled_exit;
+	}
+
+	WARN_ON_ONCE(fastpath != EXIT_FASTPATH_NONE);
+
+	switch (exit_reason.basic) {
+	default:
+		break;
+	}
+
+unhandled_exit:
+	vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
+	vcpu->run->hw.hardware_exit_reason = exit_reason.full;
+	return 0;
+}
+
+void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
+		       u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+	*reason = tdx->exit_reason.full;
+
+	*info1 = tdexit_exit_qual(vcpu);
+	*info2 = tdexit_ext_exit_qual(vcpu);
+
+	*intr_info = tdexit_intr_info(vcpu);
+	*error_code = 0;
+}
+
 int tdx_dev_ioctl(void __user *argp)
 {
 	struct kvm_tdx_capabilities __user *user_caps;
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index fb630d17ccd1..b8dc1fb7ccb3 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -153,10 +153,15 @@ void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
 void tdx_vcpu_put(struct kvm_vcpu *vcpu);
 void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu);
+void tdx_handle_exit_irqoff(struct kvm_vcpu *vcpu);
+int tdx_handle_exit(struct kvm_vcpu *vcpu,
+		    enum exit_fastpath_completion fastpath);

 void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
			   int trig_mode, int vector);
 void tdx_inject_nmi(struct kvm_vcpu *vcpu);
+void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
+		       u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code);

 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
@@ -186,10 +191,15 @@ static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {}
 static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {}
 static inline void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {}
 static inline bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu) { return false; }
+static inline void tdx_handle_exit_irqoff(struct kvm_vcpu *vcpu) {}
+static inline int tdx_handle_exit(struct kvm_vcpu *vcpu,
+				  enum exit_fastpath_completion fastpath) { return 0; }
 static inline void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
					 int trig_mode, int vector) {}
 static inline void tdx_inject_nmi(struct kvm_vcpu *vcpu) {}
+static inline void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1,
+				     u64 *info2, u32 *intr_info, u32 *error_code) {}

 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
--
2.25.1
From nobody Thu Apr 25 13:29:40 2024
From: isaku.yamahata@intel.com
Subject: [PATCH v9 085/105] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT
Date: Fri, 30 Sep 2022 03:18:19 -0700

From: Yuan Yao

The TDX module internally uses locks to protect internal resources.  It
tries to acquire the locks; if it fails to obtain a lock, it returns a
TDX_OPERAND_BUSY error without spinning, because of its execution time
limitation.

The TDX SEAMCALL API reference describes what resources are used.
It is known which TDX SEAMCALLs can cause contention on which resources,
so the VMM can avoid contention inside the TDX module by serializing
contentious TDX SEAMCALLs with, for example, a spinlock.  Because the OS
knows its process scheduling and scalability better, a lock at the
OS/VMM layer works better than simply retrying TDX SEAMCALLs.

The TDH.MEM.* APIs, except for TDH.MEM.TRACK, operate on a secure EPT
tree, and the TDX module internally tries to acquire the lock of that
tree.  They return TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT on failure to
get the lock.  TDX KVM uses the kvm_tdx::seamcall_lock spinlock at the
OS/VMM layer to avoid such contention inside the TDX module.

TDH.VP.ENTER is an exception, due to the zero-step attack mitigation.
Normally TDH.VP.ENTER uses only TD vcpu resources and doesn't cause
contention.  When a zero-step attack is suspected, it obtains the secure
EPT tree lock and tracks the GPAs causing a secure EPT fault.  Thus
TDH.VP.ENTER may result in TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT.
Also, TDH.MEM.* SEAMCALLs may result in TDX_OPERAND_BUSY |
TDX_OPERAND_ID_SEPT, because TDH.VP.ENTER is not protected by
seamcall_lock.

Retry the TDX TDH.MEM.* APIs and TDH.VP.ENTER on this error: it is a
rare event caused by the zero-step attack mitigation, and a spinlock
cannot be used for TDH.VP.ENTER due to its indefinite execution time.

Signed-off-by: Yuan Yao
Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/tdx.c     |  4 ++++
 arch/x86/kvm/vmx/tdx_ops.h | 36 ++++++++++++++++++++++++++++++------
 2 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index cd62a9f42ed0..977f0cc56ab8 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1019,6 +1019,10 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 {
 	union tdx_exit_reason exit_reason = to_tdx(vcpu)->exit_reason;

+	/* See the comment of seamcall_sept_retry(). */
+	if (unlikely(exit_reason.full == (TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT)))
+		return 1;
+
 	if (unlikely(exit_reason.non_recoverable || exit_reason.error)) {
 		if (exit_reason.basic == EXIT_REASON_TRIPLE_FAULT)
 			return tdx_handle_triple_fault(vcpu);
diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h
index 8cc2f01c509b..a50bc1445cc2 100644
--- a/arch/x86/kvm/vmx/tdx_ops.h
+++ b/arch/x86/kvm/vmx/tdx_ops.h
@@ -18,6 +18,26 @@

 void pr_tdx_error(u64 op, u64 error_code, const struct tdx_module_output *out);

+/*
+ * Although seamcall_lock protects seamcalls to avoid contention inside the TDX
+ * module, it doesn't protect TDH.VP.ENTER.  With the zero-step attack
+ * mitigation, TDH.VP.ENTER may rarely acquire the SEPT lock and release it
+ * when a zero-step attack is suspected.  That results in TDX_OPERAND_BUSY |
+ * TDX_OPERAND_ID_SEPT for a TDH.MEM.* operation.  (TDH.MEM.TRACK is an
+ * exception.)  Because such an error is a rare event, just retry on those
+ * TDH.MEM operations and TDH.VP.ENTER.
+ */
+static inline u64 seamcall_sept_retry(u64 op, u64 rcx, u64 rdx, u64 r8, u64 r9,
+				      struct tdx_module_output *out)
+{
+	u64 ret;
+
+	do {
+		ret = __seamcall(op, rcx, rdx, r8, r9, out);
+	} while (unlikely(ret == (TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT)));
+
+	return ret;
+}
+
 static inline u64 tdh_mng_addcx(hpa_t tdr, hpa_t addr)
 {
 	clflush_cache_range(__va(addr), PAGE_SIZE);
@@ -28,14 +48,15 @@ static inline u64 tdh_mem_page_add(hpa_t tdr, gpa_t gpa, hpa_t hpa, hpa_t source
				   struct tdx_module_output *out)
 {
 	clflush_cache_range(__va(hpa), PAGE_SIZE);
-	return __seamcall(TDH_MEM_PAGE_ADD, gpa, tdr, hpa, source, out);
+	return seamcall_sept_retry(TDH_MEM_PAGE_ADD, gpa, tdr, hpa, source, out);
 }

 static inline u64 tdh_mem_sept_add(hpa_t tdr, gpa_t gpa, int level, hpa_t page,
				   struct tdx_module_output *out)
 {
 	clflush_cache_range(__va(page), PAGE_SIZE);
-	return __seamcall(TDH_MEM_SEPT_ADD, gpa | level, tdr, page, 0, out);
+	return seamcall_sept_retry(TDH_MEM_SEPT_ADD, gpa | level, tdr, page, 0,
+				   out);
 }

 static inline u64 tdh_mem_sept_remove(hpa_t tdr, gpa_t gpa, int level,
@@ -61,13 +82,14 @@ static inline u64 tdh_mem_page_aug(hpa_t tdr, gpa_t gpa, hpa_t hpa,
				   struct tdx_module_output *out)
 {
 	clflush_cache_range(__va(hpa), PAGE_SIZE);
-	return __seamcall(TDH_MEM_PAGE_AUG, gpa, tdr, hpa, 0, out);
+	return seamcall_sept_retry(TDH_MEM_PAGE_AUG, gpa, tdr, hpa, 0, out);
 }

 static inline u64 tdh_mem_range_block(hpa_t tdr, gpa_t gpa, int level,
				      struct tdx_module_output *out)
 {
-	return __seamcall(TDH_MEM_RANGE_BLOCK, gpa | level, tdr, 0, 0, out);
+	return seamcall_sept_retry(TDH_MEM_RANGE_BLOCK, gpa | level, tdr, 0, 0,
+				   out);
 }

 static inline u64 tdh_mng_key_config(hpa_t tdr)
@@ -149,7 +171,8 @@ static inline u64 tdh_phymem_page_reclaim(hpa_t page,
 static inline u64 tdh_mem_page_remove(hpa_t tdr, gpa_t gpa, int level,
				      struct tdx_module_output *out)
 {
-	return __seamcall(TDH_MEM_PAGE_REMOVE, gpa | level, tdr, 0, 0, out);
+	return seamcall_sept_retry(TDH_MEM_PAGE_REMOVE, gpa | level, tdr, 0, 0,
+				   out);
 }

 static inline u64 tdh_sys_lp_shutdown(void)
@@ -165,7 +188,8 @@ static inline u64 tdh_mem_track(hpa_t tdr)
 static inline u64 tdh_mem_range_unblock(hpa_t tdr, gpa_t gpa, int level,
					struct tdx_module_output *out)
 {
-	return __seamcall(TDH_MEM_RANGE_UNBLOCK, gpa | level, tdr, 0, 0, out);
+	return seamcall_sept_retry(TDH_MEM_RANGE_UNBLOCK, gpa | level, tdr, 0, 0,
+				   out);
 }

 static inline u64 tdh_phymem_cache_wb(bool resume)
--
2.25.1
references:mime-version:content-transfer-encoding; bh=AljX8FFgH0+mYqTEgK3uK5bmoc3/YW38RsQiiehMTOQ=; b=hZpd8pEzJAeKmKGGtez9hYybm8zJH00qOLnufK1GYCadKV/PlAz2M1lU /nq75AnX8jlnPL7VQwDKQ/aIrwWnsl4JQRQuDP6LM86U99xL7+8poxJz8 M0DFq/HcdFQtRF4QwPBN9Hczl/n+3ODbf+/l0vRKpuejxtPvjhFBUSdON zJSv7g2ptBQds0AVjciITSRjEIbLBGYV0rFZLRCq1plbR3FGia0wwmZiX BUcR78qDVoqzab5iwy2vGeZBATZTT0GY+rewVu9ZW9XAcHFfbWuTBtWDS VjBRulWqrKmiODqLMMyL4iszTG/Kojj2Ht1qajgwkbTVou2cqgOxfocTl A==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="328540157" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="328540157" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:06 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807785" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807785" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:06 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 086/105] KVM: TDX: handle EXIT_REASON_OTHER_SMI Date: Fri, 30 Sep 2022 03:18:20 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata If the control reaches EXIT_REASON_OTHER_SMI, #SMI is delivered and handled right after returning from the TDX module to KVM nothing needs to be done in KVM. Continue TDX vcpu execution. Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/include/uapi/asm/vmx.h | 1 + arch/x86/kvm/vmx/tdx.c | 7 +++++++ 2 files changed, 8 insertions(+) diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vm= x.h index a5faf6d88f1b..b3a30ef3efdd 100644 --- a/arch/x86/include/uapi/asm/vmx.h +++ b/arch/x86/include/uapi/asm/vmx.h @@ -34,6 +34,7 @@ #define EXIT_REASON_TRIPLE_FAULT 2 #define EXIT_REASON_INIT_SIGNAL 3 #define EXIT_REASON_SIPI_SIGNAL 4 +#define EXIT_REASON_OTHER_SMI 6 =20 #define EXIT_REASON_INTERRUPT_WINDOW 7 #define EXIT_REASON_NMI_WINDOW 8 diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 977f0cc56ab8..fbf7968bf718 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1037,6 +1037,13 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_= t fastpath) WARN_ON_ONCE(fastpath !=3D EXIT_FASTPATH_NONE); =20 switch (exit_reason.basic) { + case EXIT_REASON_OTHER_SMI: + /* + * If reach here, it's not a Machine Check System Management + * Interrupt(MSMI). #SMI is delivered and handled right after + * SEAMRET, nothing needs to be done in KVM. 
+		 */
+		return 1;
 	default:
 		break;
 	}
--
2.25.1
From nobody Thu Apr 25 13:29:40 2024
From: isaku.yamahata@intel.com
Subject: [PATCH v9 087/105] KVM: TDX: handle ept violation/misconfig exit
Date: Fri, 30 Sep 2022 03:18:21 -0700

From: Isaku Yamahata

On EPT violation, call the common function __vmx_handle_ept_violation()
to trigger the x86 MMU code.  On EPT misconfiguration, exit to ring 3
with KVM_EXIT_UNKNOWN, because EPT misconfiguration can't happen for a
TD: MMIO is triggered by TDG.VP.VMCALL instead.  There is no point in
setting a misconfiguration value for the fast path.
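For context, a minimal sketch (not part of this patch) of the
shared-vs-private test that kvm_is_private_gpa() is assumed to perform:
a TD guest marks a GPA as shared via a GPAW-derived "shared" bit, and a
GPA without that bit is private, hence always mapped RWX in the SEPT.
kvm_gfn_shared_mask() is an assumed helper (returning zero for non-TD
VMs).

static bool example_is_private_gpa(struct kvm *kvm, gpa_t gpa)
{
	gfn_t shared_mask = kvm_gfn_shared_mask(kvm);	/* assumed helper */

	/* Private iff the VM is a TD and the shared bit is clear. */
	return shared_mask && !(gpa_to_gfn(gpa) & shared_mask);
}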
Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/tdx.c | 46 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index fbf7968bf718..def8eb3df75f 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1015,6 +1015,48 @@ void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
 	__vmx_deliver_posted_interrupt(vcpu, &tdx->pi_desc, vector);
 }

+static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
+{
+	unsigned long exit_qual;
+
+	if (kvm_is_private_gpa(vcpu->kvm, tdexit_gpa(vcpu))) {
+		/*
+		 * Always treat SEPT violations as write faults.  Ignore the
+		 * EXIT_QUALIFICATION reported by TDX-SEAM for SEPT violations.
+		 * TD private pages are always RWX in the SEPT tables,
+		 * i.e. they're always mapped writable.  Just as importantly,
+		 * treating SEPT violations as write faults is necessary to
+		 * avoid COW allocations, which will cause TDAUGPAGE failures
+		 * due to aliasing a single HPA to multiple GPAs.
+		 */
+#define TDX_SEPT_VIOLATION_EXIT_QUAL	EPT_VIOLATION_ACC_WRITE
+		exit_qual = TDX_SEPT_VIOLATION_EXIT_QUAL;
+	} else {
+		exit_qual = tdexit_exit_qual(vcpu);
+		if (exit_qual & EPT_VIOLATION_ACC_INSTR) {
+			pr_warn("kvm: TDX instr fetch to shared GPA = 0x%lx @ RIP = 0x%lx\n",
+				tdexit_gpa(vcpu), kvm_rip_read(vcpu));
+			vcpu->run->exit_reason = KVM_EXIT_EXCEPTION;
+			vcpu->run->ex.exception = PF_VECTOR;
+			vcpu->run->ex.error_code = exit_qual;
+			return 0;
+		}
+	}
+
+	trace_kvm_page_fault(tdexit_gpa(vcpu), exit_qual);
+	return __vmx_handle_ept_violation(vcpu, tdexit_gpa(vcpu), exit_qual);
+}
+
+static int tdx_handle_ept_misconfig(struct kvm_vcpu *vcpu)
+{
+	WARN_ON_ONCE(1);
+
+	vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
+	vcpu->run->hw.hardware_exit_reason = EXIT_REASON_EPT_MISCONFIG;
+
+	return 0;
+}
+
 int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 {
 	union tdx_exit_reason exit_reason = to_tdx(vcpu)->exit_reason;
@@ -1037,6 +1079,10 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 	WARN_ON_ONCE(fastpath != EXIT_FASTPATH_NONE);

 	switch (exit_reason.basic) {
+	case EXIT_REASON_EPT_VIOLATION:
+		return tdx_handle_ept_violation(vcpu);
+	case EXIT_REASON_EPT_MISCONFIG:
+		return tdx_handle_ept_misconfig(vcpu);
 	case EXIT_REASON_OTHER_SMI:
 		/*
 		 * If reach here, it's not a Machine Check System Management
--
2.25.1
From nobody Thu Apr 25 13:29:40 2024
From: isaku.yamahata@intel.com
Subject: [PATCH v9 088/105] KVM: TDX: handle EXCEPTION_NMI and EXTERNAL_INTERRUPT
Date: Fri, 30 Sep 2022 03:18:22 -0700

From: Isaku Yamahata

Because guest TD state is protected, exceptions in guest TDs can't be
intercepted, so the TDX VMM doesn't need to handle exceptions.
tdx_handle_exit_irqoff() handles NMIs and machine checks; ignore NMIs
and machine checks here and continue guest TD execution.

For external interrupts, increment the stats, the same as in the VMX
case.
Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/tdx.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index def8eb3df75f..7edf28a36d83 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -707,6 +707,25 @@ void tdx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
					     tdexit_intr_info(vcpu));
 }

+static int tdx_handle_exception(struct kvm_vcpu *vcpu)
+{
+	u32 intr_info = tdexit_intr_info(vcpu);
+
+	if (is_nmi(intr_info) || is_machine_check(intr_info))
+		return 1;
+
+	kvm_pr_unimpl("unexpected exception 0x%x(exit_reason 0x%llx qual 0x%lx)\n",
+		      intr_info,
+		      to_tdx(vcpu)->exit_reason.full, tdexit_exit_qual(vcpu));
+	return -EFAULT;
+}
+
+static int tdx_handle_external_interrupt(struct kvm_vcpu *vcpu)
+{
+	++vcpu->stat.irq_exits;
+	return 1;
+}
+
 static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu)
 {
 	vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
@@ -1079,6 +1098,10 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 	WARN_ON_ONCE(fastpath != EXIT_FASTPATH_NONE);

 	switch (exit_reason.basic) {
+	case EXIT_REASON_EXCEPTION_NMI:
+		return tdx_handle_exception(vcpu);
+	case EXIT_REASON_EXTERNAL_INTERRUPT:
+		return tdx_handle_external_interrupt(vcpu);
 	case EXIT_REASON_EPT_VIOLATION:
 		return tdx_handle_ept_violation(vcpu);
 	case EXIT_REASON_EPT_MISCONFIG:
--
2.25.1
From nobody Thu Apr 25 13:29:40 2024
From: isaku.yamahata@intel.com
Subject: [PATCH v9 089/105] KVM: TDX: Add a place holder for handler of TDX hypercalls (TDG.VP.VMCALL)
Date: Fri, 30 Sep 2022 03:18:23 -0700

From: Isaku Yamahata

The TDX module specification defines the TDG.VP.VMCALL API (TDVMCALL
for short) for the guest TD to invoke a hypercall to the VMM.  When the
guest TD issues TDG.VP.VMCALL, the guest TD exits to the VMM with a new
exit reason, TDVMCALL.  The arguments from the guest TD and the values
returned from the VMM are passed in the guest registers.  The guest RCX
register indicates which registers are used.  Define helper functions
to access those registers per the ABI.

Define the TDVMCALL exit reason, which is carved out from the VMX exit
reason namespace, as a TDVMCALL exit from a TDX guest to TDX-SEAM is
really just a VM-Exit.  Add a placeholder to handle the TDVMCALL exit.

Co-developed-by: Xiaoyao Li
Signed-off-by: Xiaoyao Li
Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
---
 arch/x86/include/uapi/asm/vmx.h |  4 ++-
 arch/x86/kvm/vmx/tdx.c          | 56 ++++++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/tdx.h          | 13 ++++++++
 3 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index b3a30ef3efdd..f0f4a4cf84a7 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -93,6 +93,7 @@
 #define EXIT_REASON_TPAUSE	68
 #define EXIT_REASON_BUS_LOCK	74
 #define EXIT_REASON_NOTIFY	75
+#define EXIT_REASON_TDCALL	77

 #define VMX_EXIT_REASONS \
	{ EXIT_REASON_EXCEPTION_NMI,	"EXCEPTION_NMI" }, \
@@ -156,7 +157,8 @@
	{ EXIT_REASON_UMWAIT,	"UMWAIT" }, \
	{ EXIT_REASON_TPAUSE,	"TPAUSE" }, \
	{ EXIT_REASON_BUS_LOCK,	"BUS_LOCK" }, \
-	{ EXIT_REASON_NOTIFY,	"NOTIFY" }
+	{ EXIT_REASON_NOTIFY,	"NOTIFY" }, \
+	{ EXIT_REASON_TDCALL,	"TDCALL" }

 #define VMX_EXIT_REASON_FLAGS \
	{ VMX_EXIT_REASONS_FAILED_VMENTRY,	"FAILED_VMENTRY" }
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 7edf28a36d83..3392da81ef14 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -85,6 +85,41 @@ static __always_inline unsigned long tdexit_intr_info(struct kvm_vcpu *vcpu)
 	return kvm_r9_read(vcpu);
 }

+#define BUILD_TDVMCALL_ACCESSORS(param, gpr)				\
+static __always_inline							\
+unsigned long tdvmcall_##param##_read(struct kvm_vcpu *vcpu)		\
+{									\
+	return kvm_##gpr##_read(vcpu);					\
+}									\
+static __always_inline void tdvmcall_##param##_write(struct kvm_vcpu *vcpu, \
+						     unsigned long val)	\
+{									\
+	kvm_##gpr##_write(vcpu, val);					\
+}
+BUILD_TDVMCALL_ACCESSORS(a0, r12);
+BUILD_TDVMCALL_ACCESSORS(a1, r13);
+BUILD_TDVMCALL_ACCESSORS(a2, r14);
+BUILD_TDVMCALL_ACCESSORS(a3, r15);
+
+static __always_inline unsigned long tdvmcall_exit_type(struct kvm_vcpu *vcpu)
+{
+	return kvm_r10_read(vcpu);
+}
+static __always_inline unsigned long tdvmcall_leaf(struct kvm_vcpu *vcpu)
+{
+	return kvm_r11_read(vcpu);
+}
+static __always_inline void tdvmcall_set_return_code(struct kvm_vcpu *vcpu,
+						     long val)
+{
+	kvm_r10_write(vcpu, val);
+}
+static __always_inline void tdvmcall_set_return_val(struct kvm_vcpu *vcpu,
+						    unsigned long val)
+{
+	kvm_r11_write(vcpu, val);
+}
+
 static inline bool is_td_vcpu_created(struct vcpu_tdx *tdx)
 {
 	return tdx->tdvpr.added;
@@ -654,7 +689,8 @@ static noinstr void tdx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 					struct vcpu_tdx *tdx)
 {
 	guest_enter_irqoff();
-	tdx->exit_reason.full = __tdx_vcpu_run(tdx->tdvpr.pa, vcpu->arch.regs, 0);
+	tdx->exit_reason.full = __tdx_vcpu_run(tdx->tdvpr.pa, vcpu->arch.regs,
+					       tdx->tdvmcall.regs_mask);
 	guest_exit_irqoff();
 }
 
@@ -687,6 +723,11 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu)
 
 	tdx_complete_interrupts(vcpu);
 
+	if (tdx->exit_reason.basic == EXIT_REASON_TDCALL)
+		tdx->tdvmcall.rcx = vcpu->arch.regs[VCPU_REGS_RCX];
+	else
+		tdx->tdvmcall.rcx = 0;
+
 	return EXIT_FASTPATH_NONE;
 }
 
@@ -733,6 +774,17 @@ static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static int handle_tdvmcall(struct kvm_vcpu *vcpu)
+{
+	switch (tdvmcall_leaf(vcpu)) {
+	default:
+		break;
+	}
+
+	tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_INVALID_OPERAND);
+	return 1;
+}
+
 void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
 {
 	td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa & PAGE_MASK);
@@ -1102,6 +1154,8 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 		return tdx_handle_exception(vcpu);
 	case EXIT_REASON_EXTERNAL_INTERRUPT:
 		return tdx_handle_external_interrupt(vcpu);
+	case EXIT_REASON_TDCALL:
+		return handle_tdvmcall(vcpu);
 	case EXIT_REASON_EPT_VIOLATION:
 		return tdx_handle_ept_violation(vcpu);
 	case EXIT_REASON_EPT_MISCONFIG:

diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 003544ead8fb..ce2f49e15243 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -92,6 +92,19 @@ struct vcpu_tdx {
 
 	struct list_head cpu_list;
 
+	union {
+		struct {
+			union {
+				struct {
+					u16 gpr_mask;
+					u16 xmm_mask;
+				};
+				u32 regs_mask;
+			};
+			u32 reserved;
+		};
+		u64 rcx;
+	} tdvmcall;
 	union tdx_exit_reason exit_reason;
 
 	bool vcpu_initialized;
-- 
2.25.1
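The register convention that the accessors above wrap can be pictured with a small self-contained sketch: R10 carries the exit type, R11 the leaf, R12-R15 the arguments a0-a3, and R10/R11 carry the status code and return value back to the guest. The frame type and helpers below are hypothetical, for illustration only.

#include <stdint.h>
#include <stdio.h>

enum { R10, R11, R12, R13, R14, R15, NGPRS };

struct tdvmcall_frame { uint64_t gpr[NGPRS]; };

static uint64_t frame_exit_type(const struct tdvmcall_frame *f) { return f->gpr[R10]; }
static uint64_t frame_leaf(const struct tdvmcall_frame *f)      { return f->gpr[R11]; }
static uint64_t frame_a0(const struct tdvmcall_frame *f)        { return f->gpr[R12]; }

static void frame_set_return(struct tdvmcall_frame *f,
			     uint64_t code, uint64_t val)
{
	f->gpr[R10] = code;	/* status code back to the guest  */
	f->gpr[R11] = val;	/* return value back to the guest */
}

int main(void)
{
	struct tdvmcall_frame f = { .gpr = { [R11] = 10, [R12] = 3 } };

	printf("exit_type=%llu leaf=%llu a0=%llu\n",
	       (unsigned long long)frame_exit_type(&f),
	       (unsigned long long)frame_leaf(&f),
	       (unsigned long long)frame_a0(&f));
	frame_set_return(&f, 0 /* success */, 0);
	return 0;
}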
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [PATCH v9 090/105] KVM: TDX: handle KVM hypercall with TDG.VP.VMCALL
Date: Fri, 30 Sep 2022 03:18:24 -0700
Message-Id: <50aea5742fbf45d17248903ab3811a4cf32b7fc6.1664530908.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

The TDX Guest-Host communication interface (GHCI) specification defines the ABI for the guest TD to issue hypercalls. It reserves vendor-specific arguments for VMM-specific use. Use them for the KVM hypercall and handle it.

Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/tdx.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 3392da81ef14..4e3c45e3b24d 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -774,8 +774,39 @@ static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static int tdx_emulate_vmcall(struct kvm_vcpu *vcpu)
+{
+	unsigned long nr, a0, a1, a2, a3, ret;
+
+	/*
+	 * ABI for the KVM tdvmcall argument:
+	 * In the Guest-Hypervisor Communication Interface (GHCI)
+	 * specification, a non-zero leaf number (R10 != 0) indicates a
+	 * vendor-specific call.  KVM uses this for the KVM hypercall.
+	 * NOTE: KVM hypercall numbers start from one; zero isn't used as
+	 * a KVM hypercall number.
+	 *
+	 * R10: KVM hypercall number
+	 * arguments: R11, R12, R13, R14.
+	 */
+	nr = kvm_r10_read(vcpu);
+	a0 = kvm_r11_read(vcpu);
+	a1 = kvm_r12_read(vcpu);
+	a2 = kvm_r13_read(vcpu);
+	a3 = kvm_r14_read(vcpu);
+
+	ret = __kvm_emulate_hypercall(vcpu, nr, a0, a1, a2, a3, true);
+
+	tdvmcall_set_return_code(vcpu, ret);
+
+	return 1;
+}
+
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
+	if (tdvmcall_exit_type(vcpu))
+		return tdx_emulate_vmcall(vcpu);
+
 	switch (tdvmcall_leaf(vcpu)) {
 	default:
 		break;
-- 
2.25.1
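A compact sketch of the dispatch rule implemented above: a non-zero R10 marks a vendor (KVM) hypercall whose number is R10 itself, with arguments in R11-R14. The status constants quoted below follow the GHCI but are reproduced here only for illustration; this is not kernel code.

#include <stdint.h>
#include <stdio.h>

#define SKETCH_VMCALL_SUCCESS          0x0000000000000000ULL
#define SKETCH_VMCALL_INVALID_OPERAND  0x8000000000000000ULL /* assumed value */

struct gprs { uint64_t r10, r11, r12, r13, r14; };

static void sketch_handle_tdvmcall(struct gprs *g)
{
	if (g->r10) {
		/* Vendor leaf: R10 is the KVM hypercall number (>= 1). */
		printf("kvm hypercall nr=%llu args=%llu,%llu,%llu,%llu\n",
		       (unsigned long long)g->r10,
		       (unsigned long long)g->r11, (unsigned long long)g->r12,
		       (unsigned long long)g->r13, (unsigned long long)g->r14);
		g->r10 = SKETCH_VMCALL_SUCCESS;
		return;
	}
	/* TDX-defined leaf in R11; nothing further wired up in this sketch. */
	g->r10 = SKETCH_VMCALL_INVALID_OPERAND;
}

int main(void)
{
	struct gprs g = { .r10 = 1, .r11 = 42 };

	sketch_handle_tdvmcall(&g);
	return 0;
}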
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [PATCH v9 091/105] KVM: TDX: Handle TDX PV CPUID hypercall
Date: Fri, 30 Sep 2022 03:18:25 -0700

From: Isaku Yamahata

Wire up the TDX PV CPUID hypercall to the KVM backend function.

Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/tdx.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 4e3c45e3b24d..16bee3b38bf4 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -802,12 +802,34 @@ static int tdx_emulate_vmcall(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static int tdx_emulate_cpuid(struct kvm_vcpu *vcpu)
+{
+	u32 eax, ebx, ecx, edx;
+
+	/* EAX and ECX for cpuid are stored in R12 and R13. */
+	eax = tdvmcall_a0_read(vcpu);
+	ecx = tdvmcall_a1_read(vcpu);
+
+	kvm_cpuid(vcpu, &eax, &ebx, &ecx, &edx, false);
+
+	tdvmcall_a0_write(vcpu, eax);
+	tdvmcall_a1_write(vcpu, ebx);
+	tdvmcall_a2_write(vcpu, ecx);
+	tdvmcall_a3_write(vcpu, edx);
+
+	tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS);
+
+	return 1;
+}
+
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
 	if (tdvmcall_exit_type(vcpu))
 		return tdx_emulate_vmcall(vcpu);
 
 	switch (tdvmcall_leaf(vcpu)) {
+	case EXIT_REASON_CPUID:
+		return tdx_emulate_cpuid(vcpu);
 	default:
 		break;
 	}
-- 
2.25.1
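The CPUID register remapping is easy to get wrong, so here is an illustrative, self-contained sketch of it: the guest passes CPUID's EAX/ECX inputs in R12/R13 and receives EAX/EBX/ECX/EDX back in R12-R15. host_cpuid() is a hypothetical stand-in for kvm_cpuid(), not a real API.

#include <stdint.h>
#include <stdio.h>

struct cpuid_result { uint32_t eax, ebx, ecx, edx; };

/* Stand-in for the host's CPUID service (dummy values). */
static struct cpuid_result host_cpuid(uint32_t leaf, uint32_t subleaf)
{
	return (struct cpuid_result){ leaf, subleaf, 0, 0 };
}

static void sketch_emulate_cpuid(uint64_t r[16])
{
	struct cpuid_result c = host_cpuid((uint32_t)r[12], (uint32_t)r[13]);

	r[12] = c.eax;   /* a0 <- EAX */
	r[13] = c.ebx;   /* a1 <- EBX */
	r[14] = c.ecx;   /* a2 <- ECX */
	r[15] = c.edx;   /* a3 <- EDX */
}

int main(void)
{
	uint64_t regs[16] = { [12] = 0x1, [13] = 0x0 };

	sketch_emulate_cpuid(regs);
	printf("eax=0x%llx\n", (unsigned long long)regs[12]);
	return 0;
}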
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [PATCH v9 092/105] KVM: TDX: Handle TDX PV HLT hypercall
Date: Fri, 30 Sep 2022 03:18:26 -0700
Message-Id: <1a481884ea0b0bf832c7d01977706692052245b4.1664530908.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

Wire up the TDX PV HLT hypercall to the KVM backend function.

Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/tdx.c | 42 +++++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/tdx.h |  3 +++
 2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 16bee3b38bf4..73dba86f9341 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -520,7 +520,32 @@ void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu)
 {
-	return pi_has_pending_interrupt(vcpu);
+	bool ret = pi_has_pending_interrupt(vcpu);
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+	if (ret || vcpu->arch.mp_state != KVM_MP_STATE_HALTED)
+		return true;
+
+	if (tdx->interrupt_disabled_hlt)
+		return false;
+
+	/*
+	 * This is for the case where the virtual interrupt is recognized,
+	 * i.e. set in vmcs.RVI, between the STI and "HLT".  KVM doesn't have
+	 * access to RVI and the interrupt is no longer in the PID (because it
+	 * was "recognized").  It doesn't get delivered in the guest because
+	 * the TDCALL completes before interrupts are enabled.
+	 *
+	 * The TDX module sets RVI while in an STI interrupt shadow.
+	 * - TDExit (typically TDG.VP.VMCALL) from the guest to the TDX
+	 *   module.  The interrupt shadow at this point is gone.
+	 * - It knows that there is an interrupt that can be delivered
+	 *   (RVI > PPR && EFLAGS.IF=1; the other conditions of 29.2.2 don't
+	 *   matter).
+	 * - It forwards the TDExit nevertheless, to a clueless hypervisor
+	 *   that has no way to glean either RVI or PPR.
+	 */
+	return !!xchg(&tdx->buggy_hlt_workaround, 0);
 }
 
 void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
@@ -822,6 +847,17 @@ static int tdx_emulate_cpuid(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static int tdx_emulate_hlt(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+	/* See tdx_protected_apic_has_interrupt() to avoid heavy seamcall */
+	tdx->interrupt_disabled_hlt = tdvmcall_a0_read(vcpu);
+
+	tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS);
+	return kvm_emulate_halt_noskip(vcpu);
+}
+
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
 	if (tdvmcall_exit_type(vcpu))
@@ -830,6 +866,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 	switch (tdvmcall_leaf(vcpu)) {
 	case EXIT_REASON_CPUID:
 		return tdx_emulate_cpuid(vcpu);
+	case EXIT_REASON_HLT:
+		return tdx_emulate_hlt(vcpu);
 	default:
 		break;
 	}
@@ -1135,6 +1173,8 @@ void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
 	struct kvm_vcpu *vcpu = apic->vcpu;
 	struct vcpu_tdx *tdx = to_tdx(vcpu);
 
+	/* See comment in tdx_protected_apic_has_interrupt(). */
+	tdx->buggy_hlt_workaround = 1;
 	/* TDX supports only posted interrupt.  No lapic emulation. */
 	__vmx_deliver_posted_interrupt(vcpu, &tdx->pi_desc, vector);
 }

diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index ce2f49e15243..e79bdf01ad3e 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -113,6 +113,9 @@ struct vcpu_tdx {
 	bool host_state_need_restore;
 	u64 msr_host_kernel_gs_base;
 
+	bool interrupt_disabled_hlt;
+	unsigned int buggy_hlt_workaround;
+
 	/*
 	 * Dummy to make pmu_intel not corrupt memory.
	 * TODO: Support PMU for TDX.  Future work.
-- 
2.25.1
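The buggy_hlt_workaround handshake above can be modeled in isolation: the interrupt-delivery path raises a flag and the halt path consumes it exactly once with an atomic exchange. A minimal sketch, assuming C11 atomics stand in for the kernel's xchg():

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_uint buggy_hlt_workaround;

static void sketch_deliver_interrupt(void)
{
	/* Producer: note that an interrupt may have been recognized. */
	atomic_store(&buggy_hlt_workaround, 1);
}

static bool sketch_protected_apic_has_interrupt(void)
{
	/* Consumer: wake at most once per delivery; clears the flag. */
	return atomic_exchange(&buggy_hlt_workaround, 0) != 0;
}

int main(void)
{
	sketch_deliver_interrupt();
	printf("wake=%d wake=%d\n",
	       sketch_protected_apic_has_interrupt(),   /* 1 */
	       sketch_protected_apic_has_interrupt());  /* 0 */
	return 0;
}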
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [PATCH v9 093/105] KVM: TDX: Handle TDX PV port io hypercall
Date: Fri, 30 Sep 2022 03:18:27 -0700

From: Isaku Yamahata

Wire up the TDX PV port I/O hypercall to the KVM backend function.
Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/tdx.c | 57 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 73dba86f9341..36af819ee99a 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -858,6 +858,61 @@ static int tdx_emulate_hlt(struct kvm_vcpu *vcpu)
 	return kvm_emulate_halt_noskip(vcpu);
 }
 
+static int tdx_complete_pio_in(struct kvm_vcpu *vcpu)
+{
+	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
+	unsigned long val = 0;
+	int ret;
+
+	WARN_ON_ONCE(vcpu->arch.pio.count != 1);
+
+	ret = ctxt->ops->pio_in_emulated(ctxt, vcpu->arch.pio.size,
+					 vcpu->arch.pio.port, &val, 1);
+	WARN_ON_ONCE(!ret);
+
+	tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS);
+	tdvmcall_set_return_val(vcpu, val);
+
+	return 1;
+}
+
+static int tdx_emulate_io(struct kvm_vcpu *vcpu)
+{
+	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
+	unsigned long val = 0;
+	unsigned int port;
+	int size, ret;
+	bool write;
+
+	++vcpu->stat.io_exits;
+
+	size = tdvmcall_a0_read(vcpu);
+	write = tdvmcall_a1_read(vcpu);
+	port = tdvmcall_a2_read(vcpu);
+
+	if (size != 1 && size != 2 && size != 4) {
+		tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_INVALID_OPERAND);
+		return 1;
+	}
+
+	if (write) {
+		val = tdvmcall_a3_read(vcpu);
+		ret = ctxt->ops->pio_out_emulated(ctxt, size, port, &val, 1);
+
+		/* No need for a complete_userspace_io callback. */
+		vcpu->arch.pio.count = 0;
+	} else {
+		ret = ctxt->ops->pio_in_emulated(ctxt, size, port, &val, 1);
+		if (!ret)
+			vcpu->arch.complete_userspace_io = tdx_complete_pio_in;
+		else
+			tdvmcall_set_return_val(vcpu, val);
+	}
+	if (ret)
+		tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS);
+	return ret;
+}
+
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
 	if (tdvmcall_exit_type(vcpu))
@@ -868,6 +923,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 		return tdx_emulate_cpuid(vcpu);
 	case EXIT_REASON_HLT:
 		return tdx_emulate_hlt(vcpu);
+	case EXIT_REASON_IO_INSTRUCTION:
+		return tdx_emulate_io(vcpu);
 	default:
 		break;
 	}
-- 
2.25.1
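The split PIO-in transaction relies on the complete_userspace_io continuation. Below is a hypothetical userspace-style sketch of that pattern, not the kernel implementation: when in-kernel emulation cannot finish, a continuation is recorded and run after "userspace" completes the I/O.

#include <stdbool.h>
#include <stdio.h>

struct sketch_vcpu {
	unsigned long pio_val;                     /* filled in by "userspace" */
	int (*complete_userspace_io)(struct sketch_vcpu *);
};

static int sketch_complete_pio_in(struct sketch_vcpu *v)
{
	printf("guest sees 0x%lx\n", v->pio_val);  /* propagate value to R11 */
	return 1;                                  /* resume the guest       */
}

static int sketch_emulate_io_in(struct sketch_vcpu *v, bool in_kernel)
{
	if (in_kernel)
		return sketch_complete_pio_in(v);  /* fast path              */

	v->complete_userspace_io = sketch_complete_pio_in;
	return 0;                                  /* exit to userspace      */
}

int main(void)
{
	struct sketch_vcpu v = { 0 };

	if (!sketch_emulate_io_in(&v, false)) {
		v.pio_val = 0xab;                  /* "userspace" port read  */
		v.complete_userspace_io(&v);
	}
	return 0;
}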
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, Sean Christopherson
Subject: [PATCH v9 094/105] KVM: TDX: Handle TDX PV MMIO hypercall
Date: Fri, 30 Sep 2022 03:18:28 -0700
Message-Id: <75b6bf56d02cbc22cfd90be308b44ff043a7dfe9.1664530908.git.isaku.yamahata@intel.com>

From: Sean Christopherson

Export kvm_io_bus_read() and the kvm_mmio tracepoint, and wire up the TDX PV MMIO hypercall to the KVM backend functions.

kvm_io_bus_read/write() searches for the in-kernel KVM device emulating the given MMIO address and emulates the MMIO. As TDX PV MMIO also needs this, export kvm_io_bus_read(); kvm_io_bus_write() is already exported. TDX PV MMIO emulates some MMIO itself. To add trace points consistently with x86 KVM, export the kvm_mmio tracepoint.
Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/tdx.c | 114 +++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c     |   1 +
 virt/kvm/kvm_main.c    |   2 +
 3 files changed, 117 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 36af819ee99a..9dbfc8c6a121 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -913,6 +913,118 @@ static int tdx_emulate_io(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
+static int tdx_complete_mmio(struct kvm_vcpu *vcpu)
+{
+	unsigned long val = 0;
+	gpa_t gpa;
+	int size;
+
+	KVM_BUG_ON(vcpu->mmio_needed != 1, vcpu->kvm);
+	vcpu->mmio_needed = 0;
+
+	if (!vcpu->mmio_is_write) {
+		gpa = vcpu->mmio_fragments[0].gpa;
+		size = vcpu->mmio_fragments[0].len;
+
+		memcpy(&val, vcpu->run->mmio.data, size);
+		tdvmcall_set_return_val(vcpu, val);
+		trace_kvm_mmio(KVM_TRACE_MMIO_READ, size, gpa, &val);
+	}
+	return 1;
+}
+
+static inline int tdx_mmio_write(struct kvm_vcpu *vcpu, gpa_t gpa, int size,
+				 unsigned long val)
+{
+	if (kvm_iodevice_write(vcpu, &vcpu->arch.apic->dev, gpa, size, &val) &&
+	    kvm_io_bus_write(vcpu, KVM_MMIO_BUS, gpa, size, &val))
+		return -EOPNOTSUPP;
+
+	trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, size, gpa, &val);
+	return 0;
+}
+
+static inline int tdx_mmio_read(struct kvm_vcpu *vcpu, gpa_t gpa, int size)
+{
+	unsigned long val;
+
+	if (kvm_iodevice_read(vcpu, &vcpu->arch.apic->dev, gpa, size, &val) &&
+	    kvm_io_bus_read(vcpu, KVM_MMIO_BUS, gpa, size, &val))
+		return -EOPNOTSUPP;
+
+	tdvmcall_set_return_val(vcpu, val);
+	trace_kvm_mmio(KVM_TRACE_MMIO_READ, size, gpa, &val);
+	return 0;
+}
+
+static int tdx_emulate_mmio(struct kvm_vcpu *vcpu)
+{
+	struct kvm_memory_slot *slot;
+	int size, write, r;
+	unsigned long val;
+	gpa_t gpa;
+
+	KVM_BUG_ON(vcpu->mmio_needed, vcpu->kvm);
+
+	size = tdvmcall_a0_read(vcpu);
+	write = tdvmcall_a1_read(vcpu);
+	gpa = tdvmcall_a2_read(vcpu);
+	val = write ? tdvmcall_a3_read(vcpu) : 0;
+
+	if (size != 1 && size != 2 && size != 4 && size != 8)
+		goto error;
+	if (write != 0 && write != 1)
+		goto error;
+
+	/* Strip the shared bit, allow MMIO with and without it set. */
+	gpa = gpa & ~gfn_to_gpa(kvm_gfn_shared_mask(vcpu->kvm));
+
+	if (size > 8u || ((gpa + size - 1) ^ gpa) & PAGE_MASK)
+		goto error;
+
+	slot = kvm_vcpu_gfn_to_memslot(vcpu, gpa_to_gfn(gpa));
+	if (slot && !(slot->flags & KVM_MEMSLOT_INVALID))
+		goto error;
+
+	if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
+		trace_kvm_fast_mmio(gpa);
+		return 1;
+	}
+
+	if (write)
+		r = tdx_mmio_write(vcpu, gpa, size, val);
+	else
+		r = tdx_mmio_read(vcpu, gpa, size);
+	if (!r) {
+		/* Kernel completed device emulation. */
+		tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS);
+		return 1;
+	}
+
+	/* Request the device emulation to the userspace device model.
+	 */
+	vcpu->mmio_needed = 1;
+	vcpu->mmio_is_write = write;
+	vcpu->arch.complete_userspace_io = tdx_complete_mmio;
+
+	vcpu->run->mmio.phys_addr = gpa;
+	vcpu->run->mmio.len = size;
+	vcpu->run->mmio.is_write = write;
+	vcpu->run->exit_reason = KVM_EXIT_MMIO;
+
+	if (write) {
+		memcpy(vcpu->run->mmio.data, &val, size);
+	} else {
+		vcpu->mmio_fragments[0].gpa = gpa;
+		vcpu->mmio_fragments[0].len = size;
+		trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, size, gpa, NULL);
+	}
+	return 0;
+
+error:
+	tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_INVALID_OPERAND);
+	return 1;
+}
+
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
 	if (tdvmcall_exit_type(vcpu))
@@ -925,6 +1037,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 		return tdx_emulate_hlt(vcpu);
 	case EXIT_REASON_IO_INSTRUCTION:
 		return tdx_emulate_io(vcpu);
+	case EXIT_REASON_EPT_VIOLATION:
+		return tdx_emulate_mmio(vcpu);
 	default:
 		break;
 	}

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 43b1681a0952..5ecd7a028632 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13645,6 +13645,7 @@ bool kvm_arch_has_private_mem(struct kvm *kvm)
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 20c46f26691d..a5febc1ddeef 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2581,6 +2581,7 @@ struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn
 
 	return NULL;
 }
+EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_memslot);
 
 bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn)
 {
@@ -5520,6 +5521,7 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
 	r = __kvm_io_bus_read(vcpu, bus, &range, val);
 	return r < 0 ? r : 0;
 }
+EXPORT_SYMBOL_GPL(kvm_io_bus_read);
 
 /* Caller must hold slots_lock.
 */
int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
-- 
2.25.1
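The MMIO argument checks in tdx_emulate_mmio() boil down to a size allow-list plus a page-crossing test. A standalone sketch of just that validation logic (illustrative, not the patch itself):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_PAGE_MASK (~0xfffUL)

static bool mmio_args_valid(uint64_t gpa, int size, int write)
{
	if (size != 1 && size != 2 && size != 4 && size != 8)
		return false;                      /* size allow-list       */
	if (write != 0 && write != 1)
		return false;                      /* direction is boolean  */
	if (((gpa + size - 1) ^ gpa) & SKETCH_PAGE_MASK)
		return false;                      /* straddles a 4KiB page */
	return true;
}

int main(void)
{
	printf("%d %d\n",
	       mmio_args_valid(0x1000, 4, 0),      /* 1: aligned read       */
	       mmio_args_valid(0x1ffe, 4, 0));     /* 0: crosses the page   */
	return 0;
}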
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [PATCH v9 095/105] KVM: TDX: Implement callbacks for MSR operations for TDX
Date: Fri, 30 Sep 2022 03:18:29 -0700
Message-Id: <8abe56774de4e5427682a82e8179e40612b892ff.1664530908.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

Implement the set_msr/get_msr/has_emulated_msr methods for TDX to handle hypercalls from the guest TD for paravirtualized rdmsr and wrmsr. The TDX module virtualizes MSRs. For some MSRs, it injects #VE into the guest TD upon RDMSR or WRMSR; the exact list of such MSRs is defined in the spec. Upon #VE, the guest TD may execute the TDG.VP.VMCALL hypercalls for RDMSR and WRMSR, which are defined in the GHCI (Guest-Host Communication Interface), so that the host VMM (e.g. KVM) can virtualize the MSRs.

Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/main.c    | 34 +++++++++++++++++--
 arch/x86/kvm/vmx/tdx.c     | 68 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/x86_ops.h |  6 ++++
 3 files changed, 105 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index ec322f4bcec5..6189bcdc1d80 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -195,6 +195,34 @@ static void vt_handle_exit_irqoff(struct kvm_vcpu *vcpu)
 	vmx_handle_exit_irqoff(vcpu);
 }
 
+static int vt_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+	if (unlikely(is_td_vcpu(vcpu)))
+		return tdx_set_msr(vcpu, msr_info);
+
+	return vmx_set_msr(vcpu, msr_info);
+}
+
+/*
+ * The kvm parameter can be NULL (module initialization, or invocation before
+ * VM creation). Be sure to check the kvm parameter before using it.
+ */
+static bool vt_has_emulated_msr(struct kvm *kvm, u32 index)
+{
+	if (kvm && is_td(kvm))
+		return tdx_is_emulated_msr(index, true);
+
+	return vmx_has_emulated_msr(kvm, index);
+}
+
+static int vt_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+	if (unlikely(is_td_vcpu(vcpu)))
+		return tdx_get_msr(vcpu, msr_info);
+
+	return vmx_get_msr(vcpu, msr_info);
+}
+
 static void vt_apicv_post_state_restore(struct kvm_vcpu *vcpu)
 {
 	struct pi_desc *pi = vcpu_to_pi_desc(vcpu);
@@ -431,7 +459,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 
 	.hardware_enable = vt_hardware_enable,
 	.hardware_disable = vt_hardware_disable,
-	.has_emulated_msr = vmx_has_emulated_msr,
+	.has_emulated_msr = vt_has_emulated_msr,
 
 	.is_vm_type_supported = vt_is_vm_type_supported,
 	.vm_size = sizeof(struct kvm_vmx),
@@ -451,8 +479,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 
 	.update_exception_bitmap = vmx_update_exception_bitmap,
 	.get_msr_feature = vmx_get_msr_feature,
-	.get_msr = vmx_get_msr,
-	.set_msr = vmx_set_msr,
+	.get_msr = vt_get_msr,
+	.set_msr = vt_set_msr,
 	.get_segment_base = vmx_get_segment_base,
 	.get_segment = vmx_get_segment,
 	.set_segment = vmx_set_segment,

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 9dbfc8c6a121..86bde12c7818 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1455,6 +1455,74 @@ void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
 	*error_code = 0;
 }
 
+bool tdx_is_emulated_msr(u32 index, bool write)
+{
+	switch (index) {
+	case MSR_IA32_UCODE_REV:
+	case MSR_IA32_ARCH_CAPABILITIES:
+	case MSR_IA32_POWER_CTL:
+	case MSR_MTRRcap:
+	case 0x200 ... 0x26f:
+		/* IA32_MTRR_PHYS{BASE, MASK}, IA32_MTRR_FIX*_* */
+	case MSR_IA32_CR_PAT:
+	case MSR_MTRRdefType:
+	case MSR_IA32_TSC_DEADLINE:
+	case MSR_IA32_MISC_ENABLE:
+	case MSR_KVM_STEAL_TIME:
+	case MSR_KVM_POLL_CONTROL:
+	case MSR_PLATFORM_INFO:
+	case MSR_MISC_FEATURES_ENABLES:
+	case MSR_IA32_MCG_CAP:
+	case MSR_IA32_MCG_STATUS:
+	case MSR_IA32_MCG_CTL:
+	case MSR_IA32_MCG_EXT_CTL:
+	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_MISC(28) - 1:
+		/* MSR_IA32_MCx_{CTL, STATUS, ADDR, MISC} */
+		return true;
+	case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff:
+		/*
+		 * x2APIC registers that are virtualized by the CPU can't be
+		 * emulated, KVM doesn't have access to the virtual APIC page.
+		 */
+		switch (index) {
+		case X2APIC_MSR(APIC_TASKPRI):
+		case X2APIC_MSR(APIC_PROCPRI):
+		case X2APIC_MSR(APIC_EOI):
+		case X2APIC_MSR(APIC_ISR) ... X2APIC_MSR(APIC_ISR + APIC_ISR_NR):
+		case X2APIC_MSR(APIC_TMR) ... X2APIC_MSR(APIC_TMR + APIC_ISR_NR):
+		case X2APIC_MSR(APIC_IRR) ... X2APIC_MSR(APIC_IRR + APIC_ISR_NR):
+			return false;
+		default:
+			return true;
+		}
+	case MSR_IA32_APICBASE:
+	case MSR_EFER:
+		return !write;
+	case MSR_IA32_MCx_CTL2(0) ... MSR_IA32_MCx_CTL2(31):
+		/*
+		 * 0x280 - 0x29f: The x86 common code doesn't emulate MCx_CTL2.
+		 * Refer to kvm_{get,set}_msr_common(),
+		 * kvm_mtrr_{get, set}_msr(), and msr_mtrr_valid().
+		 */
+	default:
+		return false;
+	}
+}
+
+int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
+{
+	if (tdx_is_emulated_msr(msr->index, false))
+		return kvm_get_msr_common(vcpu, msr);
+	return 1;
+}
+
+int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
+{
+	if (tdx_is_emulated_msr(msr->index, true))
+		return kvm_set_msr_common(vcpu, msr);
+	return 1;
+}
+
 int tdx_dev_ioctl(void __user *argp)
 {
 	struct kvm_tdx_capabilities __user *user_caps;

diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index b8dc1fb7ccb3..0a4bdf63e07a 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -162,6 +162,9 @@ void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
 void tdx_inject_nmi(struct kvm_vcpu *vcpu);
 void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1,
 		       u64 *info2, u32 *intr_info, u32 *error_code);
+bool tdx_is_emulated_msr(u32 index, bool write);
+int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr);
+int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr);
 
 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
@@ -200,6 +203,9 @@ static inline void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mo
 static inline void tdx_inject_nmi(struct kvm_vcpu *vcpu) {}
 static inline void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1,
 				     u64 *info2, u32 *intr_info, u32 *error_code) {}
+static inline bool tdx_is_emulated_msr(u32 index, bool write) { return false; }
+static inline int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { return 1; }
+static inline int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { return 1; }
 
 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
-- 
2.25.1
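Whether a PV RDMSR/WRMSR is emulated is decided purely from the MSR index and, for a few MSRs, the access direction. The following standalone sketch shows the shape of such an allow-list check; the constants are real MSR indices, but the list is heavily abbreviated for illustration and is not the kernel's.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MSR_IA32_APICBASE 0x1b
#define MSR_EFER          0xc0000080
#define MTRR_FIRST        0x200
#define MTRR_LAST         0x26f

static bool sketch_is_emulated_msr(uint32_t index, bool write)
{
	if (index >= MTRR_FIRST && index <= MTRR_LAST)
		return true;                    /* MTRR range: always PV */
	if (index == MSR_IA32_APICBASE || index == MSR_EFER)
		return !write;                  /* read-only for the TD  */
	return false;                           /* default: not emulated */
}

int main(void)
{
	printf("%d %d\n",
	       sketch_is_emulated_msr(MSR_EFER, false),  /* 1 */
	       sketch_is_emulated_msr(MSR_EFER, true));  /* 0 */
	return 0;
}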
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [PATCH v9 096/105] KVM: TDX: Handle TDX PV rdmsr/wrmsr hypercall
Date: Fri, 30 Sep 2022 03:18:30 -0700

From: Isaku Yamahata

Wire up the TDX PV rdmsr/wrmsr hypercall to the KVM backend function.

Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/tdx.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 86bde12c7818..6cf18eca654a 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1025,6 +1025,41 @@ static int tdx_emulate_mmio(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static int tdx_emulate_rdmsr(struct kvm_vcpu *vcpu)
+{
+	u32 index = tdvmcall_a0_read(vcpu);
+	u64 data;
+
+	if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ) ||
+	    kvm_get_msr(vcpu, index, &data)) {
+		trace_kvm_msr_read_ex(index);
+		tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_INVALID_OPERAND);
+		return 1;
+	}
+	trace_kvm_msr_read(index, data);
+
+	tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS);
+	tdvmcall_set_return_val(vcpu, data);
+	return 1;
+}
+
+static int tdx_emulate_wrmsr(struct kvm_vcpu *vcpu)
+{
+	u32 index = tdvmcall_a0_read(vcpu);
+	u64 data = tdvmcall_a1_read(vcpu);
+
+	if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_WRITE) ||
+	    kvm_set_msr(vcpu, index, data)) {
+		trace_kvm_msr_write_ex(index, data);
+		tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_INVALID_OPERAND);
+		return 1;
+	}
+
+	trace_kvm_msr_write(index, data);
+	tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS);
+	return 1;
+}
+
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
 	if (tdvmcall_exit_type(vcpu))
@@ -1039,6 +1074,10 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 		return tdx_emulate_io(vcpu);
 	case EXIT_REASON_EPT_VIOLATION:
 		return tdx_emulate_mmio(vcpu);
+	case EXIT_REASON_MSR_READ:
+		return tdx_emulate_rdmsr(vcpu);
+	case EXIT_REASON_MSR_WRITE:
+		return tdx_emulate_wrmsr(vcpu);
 	default:
 		break;
 	}
-- 
2.25.1
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [PATCH v9 097/105] KVM: TDX: Handle TDX PV report fatal error hypercall
Date: Fri, 30 Sep 2022 03:18:31 -0700
Message-Id: <58ea2a6ddcc53bfe1a5755c308e695c1343a8bd3.1664530908.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

Wire up the TDX PV report fatal error hypercall to a KVM_EXIT_SYSTEM_EVENT exit with a new event type, KVM_SYSTEM_EVENT_TDX.

Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/tdx.c   | 20 ++++++++++++++++++++
 include/uapi/linux/kvm.h |  1 +
 2 files changed, 21 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 6cf18eca654a..a7cf08d60744 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1060,6 +1060,24 @@ static int tdx_emulate_wrmsr(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static int tdx_report_fatal_error(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Exit to the userspace device model for teardown.  Because the
+	 * guest TD is already panicking, returning an error to the guest TD
+	 * doesn't make sense.  No argument check is done.
+	 */
+	vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+	vcpu->run->system_event.type = KVM_SYSTEM_EVENT_TDX;
+	vcpu->run->system_event.ndata = 3;
+	vcpu->run->system_event.data[0] = TDG_VP_VMCALL_REPORT_FATAL_ERROR;
+	vcpu->run->system_event.data[1] = tdvmcall_a0_read(vcpu);
+	vcpu->run->system_event.data[2] = tdvmcall_a1_read(vcpu);
+
+	return 0;
+}
+
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
 	if (tdvmcall_exit_type(vcpu))
@@ -1078,6 +1096,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 		return tdx_emulate_rdmsr(vcpu);
 	case EXIT_REASON_MSR_WRITE:
 		return tdx_emulate_wrmsr(vcpu);
+	case TDG_VP_VMCALL_REPORT_FATAL_ERROR:
+		return tdx_report_fatal_error(vcpu);
 	default:
 		break;
 	}

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 47621588a792..498a2b92f2db 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -478,6 +478,7 @@ struct kvm_run {
 #define KVM_SYSTEM_EVENT_WAKEUP         4
 #define KVM_SYSTEM_EVENT_SUSPEND        5
 #define KVM_SYSTEM_EVENT_SEV_TERM       6
+#define KVM_SYSTEM_EVENT_TDX            7
 	__u32 type;
 	__u32 ndata;
 	union {
-- 
2.25.1
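On the userspace side, a VMM would consume this exit roughly as sketched below. The struct here is a simplified stand-in for the system_event member of struct kvm_run, and the leaf value in the example is assumed, not taken from this patch.

#include <stdint.h>
#include <stdio.h>

#define SKETCH_SYSTEM_EVENT_TDX 7

struct sketch_system_event {
	uint32_t type;
	uint32_t ndata;
	uint64_t data[3];   /* [0]=leaf, [1]=a0 (error code), [2]=a1 */
};

static void handle_system_event(const struct sketch_system_event *ev)
{
	if (ev->type == SKETCH_SYSTEM_EVENT_TDX && ev->ndata >= 3)
		fprintf(stderr, "guest TD reported fatal error: 0x%llx (aux 0x%llx)\n",
			(unsigned long long)ev->data[1],
			(unsigned long long)ev->data[2]);
	/* the VMM would tear the VM down here */
}

int main(void)
{
	struct sketch_system_event ev = {
		.type = SKETCH_SYSTEM_EVENT_TDX, .ndata = 3,
		.data = { 0x10003 /* assumed ReportFatalError leaf */, 0xdead, 0 },
	};
	handle_system_event(&ev);
	return 0;
}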
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [PATCH v9 098/105] KVM: TDX: Handle TDX PV map_gpa hypercall
Date: Fri, 30 Sep 2022 03:18:32 -0700

From: Isaku Yamahata

Wire up the TDX PV map_gpa hypercall to the kvm/mmu backend.

Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/tdx.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index a7cf08d60744..6318c0c09c0d 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1078,6 +1078,37 @@ static int tdx_report_fatal_error(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static int tdx_map_gpa(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	gpa_t gpa = tdvmcall_a0_read(vcpu);
+	gpa_t size = tdvmcall_a1_read(vcpu);
+	gpa_t end = gpa + size;
+	gfn_t s = gpa_to_gfn(gpa) & ~kvm_gfn_shared_mask(kvm);
+	gfn_t e = gpa_to_gfn(end) & ~kvm_gfn_shared_mask(kvm);
+	bool map_private = kvm_is_private_gpa(kvm, gpa);
+	int ret;
+
+	if (!IS_ALIGNED(gpa, 4096) || !IS_ALIGNED(size, 4096) ||
+	    end < gpa ||
+	    end > kvm_gfn_shared_mask(kvm) << (PAGE_SHIFT + 1) ||
+	    kvm_is_private_gpa(kvm, gpa) != kvm_is_private_gpa(kvm, end)) {
+		tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_INVALID_OPERAND);
+		return 1;
+	}
+
+	ret = kvm_mmu_map_gpa(vcpu, &s, e, map_private);
+	if (ret == -EAGAIN) {
+		tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_RETRY);
+		tdvmcall_set_return_val(vcpu, gfn_to_gpa(s));
+	} else if (ret)
+		tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_INVALID_OPERAND);
+	else
+		tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS);
+
+	return 1;
+}
+
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
 	if (tdvmcall_exit_type(vcpu))
@@ -1098,6 +1129,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 		return tdx_emulate_wrmsr(vcpu);
 	case TDG_VP_VMCALL_REPORT_FATAL_ERROR:
 		return tdx_report_fatal_error(vcpu);
+	case TDG_VP_VMCALL_MAP_GPA:
+		return tdx_map_gpa(vcpu);
 	default:
 		break;
 	}
-- 
2.25.1
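The MapGPA validation above (4KiB alignment, no wraparound, and no crossing between the shared and private halves of the GPA space) can be checked in isolation. A standalone sketch, with SHARED_BIT as an assumed bit position for demonstration only:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SHARED_BIT (1ULL << 47)   /* assumed; the real position is VM-specific */

static bool is_private(uint64_t gpa) { return !(gpa & SHARED_BIT); }

static bool map_gpa_args_valid(uint64_t gpa, uint64_t size)
{
	uint64_t end = gpa + size;

	if ((gpa & 0xfff) || (size & 0xfff))
		return false;                     /* must be page aligned */
	if (end < gpa)
		return false;                     /* wraparound           */
	if (is_private(gpa) != is_private(end))
		return false;                     /* crosses the shared bit */
	return true;
}

int main(void)
{
	printf("%d %d\n",
	       map_gpa_args_valid(0x1000, 0x2000),               /* 1 */
	       map_gpa_args_valid(SHARED_BIT - 0x1000, 0x2000)); /* 0 */
	return 0;
}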
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [PATCH v9 099/105] KVM: TDX: Handle TDG.VP.VMCALL hypercall
Date: Fri, 30 Sep 2022 03:18:33 -0700

From: Isaku Yamahata

Implement the TDG.VP.VMCALL<GetTdVmCallInfo> hypercall. If the input value is zero, return a success code and zero in the output registers. GetTdVmCallInfo is a sub-leaf of TDG.VP.VMCALL that enumerates which TDG.VP.VMCALL sub-leaves are supported. This hypercall exists for future enhancement of the Guest-Host-Communication Interface (GHCI) specification. GHCI version 344426-001US defines it to require input R12 to be zero and to return zero in the output registers R11, R12, R13, and R14, so the guest TD enumerates no enhancement.

Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/tdx.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 6318c0c09c0d..e5337fb24e82 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1060,6 +1060,20 @@ static int tdx_emulate_wrmsr(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static int tdx_get_td_vm_call_info(struct kvm_vcpu *vcpu)
+{
+	if (tdvmcall_a0_read(vcpu))
+		tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_INVALID_OPERAND);
+	else {
+		tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS);
+		kvm_r11_write(vcpu, 0);
+		tdvmcall_a0_write(vcpu, 0);
+		tdvmcall_a1_write(vcpu, 0);
+		tdvmcall_a2_write(vcpu, 0);
+	}
+	return 1;
+}
+
 static int tdx_report_fatal_error(struct kvm_vcpu *vcpu)
 {
 	/*
@@ -1127,6 +1141,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 		return tdx_emulate_rdmsr(vcpu);
 	case EXIT_REASON_MSR_WRITE:
 		return tdx_emulate_wrmsr(vcpu);
+	case TDG_VP_VMCALL_GET_TD_VM_CALL_INFO:
+		return tdx_get_td_vm_call_info(vcpu);
 	case TDG_VP_VMCALL_REPORT_FATAL_ERROR:
 		return tdx_report_fatal_error(vcpu);
 	case TDG_VP_VMCALL_MAP_GPA:
-- 
2.25.1
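The contract is small enough to restate as a sketch: R12 must be zero on input, and every enumeration output is zeroed because no extended sub-leaves are advertised. The status value below is assumed for illustration; this is not kernel code.

#include <stdint.h>

struct gprs { uint64_t r10, r11, r12, r13, r14; };

static void sketch_get_td_vm_call_info(struct gprs *g)
{
	if (g->r12) {
		g->r10 = 0x8000000000000000ULL;  /* INVALID_OPERAND (assumed value) */
		return;
	}
	g->r10 = 0;                              /* SUCCESS                   */
	g->r11 = g->r12 = g->r13 = g->r14 = 0;   /* no enhancements to report */
}

int main(void)
{
	struct gprs g = { .r12 = 0 };

	sketch_get_td_vm_call_info(&g);
	return 0;
}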
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [PATCH v9 100/105] KVM: TDX: Silently discard SMI request
Date: Fri, 30 Sep 2022 03:18:34 -0700
Message-Id: <916f8ff8d66e52f10a31bcd812fd9a220c9874bf.1664530908.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

TDX doesn't support system-management mode (SMM) or system-management interrupts (SMI) in guest TDs. Because guest state (vCPU state, memory state) is protected, it can only be changed through the TDX module APIs; injecting an SMI or switching the vCPU mode into SMM would be such changes. The TDX module provides neither a way for the VMM to inject an SMI into a guest TD nor a way to switch the guest vCPU mode into SMM.

When handling an SMM or SMI request for the guest TD from the device model (e.g. QEMU), KVM has two options: 1) silently ignore the request or 2) return a meaningful error. For simplicity, implement option 1).
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/lapic.c | 7 +++++-- arch/x86/kvm/vmx/main.c | 43 ++++++++++++++++++++++++++++++++++---- arch/x86/kvm/vmx/tdx.c | 27 ++++++++++++++++++++++++ arch/x86/kvm/vmx/x86_ops.h | 8 +++++++ arch/x86/kvm/x86.c | 3 ++- 5 files changed, 81 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 4e506084e8ed..e02681061637 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1171,8 +1171,11 @@ static int __apic_accept_irq(struct kvm_lapic *apic,= int delivery_mode, =20 case APIC_DM_SMI: result =3D 1; - kvm_make_request(KVM_REQ_SMI, vcpu); - kvm_vcpu_kick(vcpu); + if (static_call(kvm_x86_has_emulated_msr)(vcpu->kvm, + MSR_IA32_SMBASE)) { + kvm_make_request(KVM_REQ_SMI, vcpu); + kvm_vcpu_kick(vcpu); + } break; =20 case APIC_DM_NMI: diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 6189bcdc1d80..017c24ed16e5 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -223,6 +223,41 @@ static int vt_get_msr(struct kvm_vcpu *vcpu, struct ms= r_data *msr_info) return vmx_get_msr(vcpu, msr_info); } =20 +static int vt_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection) +{ + if (is_td_vcpu(vcpu)) + return tdx_smi_allowed(vcpu, for_injection); + + return vmx_smi_allowed(vcpu, for_injection); +} + +static int vt_enter_smm(struct kvm_vcpu *vcpu, char *smstate) +{ + if (unlikely(is_td_vcpu(vcpu))) + return tdx_enter_smm(vcpu, smstate); + + return vmx_enter_smm(vcpu, smstate); +} + +static int vt_leave_smm(struct kvm_vcpu *vcpu, const char *smstate) +{ + if (unlikely(is_td_vcpu(vcpu))) + return tdx_leave_smm(vcpu, smstate); + + return vmx_leave_smm(vcpu, smstate); +} + +static void vt_enable_smi_window(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) { + tdx_enable_smi_window(vcpu); + return; + } + + /* RSM will cause a vmexit anyway. */ + vmx_enable_smi_window(vcpu); +} + static void vt_apicv_post_state_restore(struct kvm_vcpu *vcpu) { struct pi_desc *pi =3D vcpu_to_pi_desc(vcpu); @@ -580,10 +615,10 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { =20 .setup_mce =3D vmx_setup_mce, =20 - .smi_allowed =3D vmx_smi_allowed, - .enter_smm =3D vmx_enter_smm, - .leave_smm =3D vmx_leave_smm, - .enable_smi_window =3D vmx_enable_smi_window, + .smi_allowed =3D vt_smi_allowed, + .enter_smm =3D vt_enter_smm, + .leave_smm =3D vt_leave_smm, + .enable_smi_window =3D vt_enable_smi_window, =20 .can_emulate_instruction =3D vmx_can_emulate_instruction, .apic_init_signal_blocked =3D vmx_apic_init_signal_blocked, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index e5337fb24e82..c7164404a79f 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1631,6 +1631,33 @@ int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_da= ta *msr) return 1; } =20 +int tdx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection) +{ + /* SMI isn't supported for TDX. */ + WARN_ON_ONCE(1); + return false; +} + +int tdx_enter_smm(struct kvm_vcpu *vcpu, char *smstate) +{ + /* smi_allowed() is always false for TDX as above. */ + WARN_ON_ONCE(1); + return 0; +} + +int tdx_leave_smm(struct kvm_vcpu *vcpu, const char *smstate) +{ + WARN_ON_ONCE(1); + return 0; +} + +void tdx_enable_smi_window(struct kvm_vcpu *vcpu) +{ + /* SMI isn't supported for TDX. Silently discard SMI request. 
*/ + WARN_ON_ONCE(1); + vcpu->arch.smi_pending =3D false; +} + int tdx_dev_ioctl(void __user *argp) { struct kvm_tdx_capabilities __user *user_caps; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 0a4bdf63e07a..a81b47307b39 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -165,6 +165,10 @@ void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *rea= son, bool tdx_is_emulated_msr(u32 index, bool write); int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr); int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr); +int tdx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection); +int tdx_enter_smm(struct kvm_vcpu *vcpu, char *smstate); +int tdx_leave_smm(struct kvm_vcpu *vcpu, const char *smstate); +void tdx_enable_smi_window(struct kvm_vcpu *vcpu); =20 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); @@ -206,6 +210,10 @@ static inline void tdx_get_exit_info(struct kvm_vcpu *= vcpu, u32 *reason, u64 *in static inline bool tdx_is_emulated_msr(u32 index, bool write) { return fal= se; } static inline int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)= { return 1; } static inline int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)= { return 1; } +static inline int tdx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injectio= n) { return false; } +static inline int tdx_enter_smm(struct kvm_vcpu *vcpu, char *smstate) { re= turn 0; } +static inline int tdx_leave_smm(struct kvm_vcpu *vcpu, const char *smstate= ) { return 0; } +static inline void tdx_enable_smi_window(struct kvm_vcpu *vcpu) {} =20 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { retur= n -EOPNOTSUPP; } static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)= { return -EOPNOTSUPP; } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 5ecd7a028632..3ba16fe6c9df 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4919,7 +4919,8 @@ static int kvm_vcpu_ioctl_nmi(struct kvm_vcpu *vcpu) =20 static int kvm_vcpu_ioctl_smi(struct kvm_vcpu *vcpu) { - kvm_make_request(KVM_REQ_SMI, vcpu); + if (static_call(kvm_x86_has_emulated_msr)(vcpu->kvm, MSR_IA32_SMBASE)) + kvm_make_request(KVM_REQ_SMI, vcpu); =20 return 0; } --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B3277C433F5 for ; Fri, 30 Sep 2022 10:32:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232748AbiI3K2b (ORCPT ); Fri, 30 Sep 2022 06:28:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34768 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232594AbiI3K02 (ORCPT ); Fri, 30 Sep 2022 06:26:28 -0400 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6E4FF3719B; Fri, 30 Sep 2022 03:19:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533200; x=1696069200; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=x/pp7Rr6YLHDavCMmIEepzFwSxDLb5TVp3XhWZFvsNw=; b=NBEBa03ohIRpXNjvH6hHBG9BgZOeZPdPTzvewgwbzmphT8w2lVq7qzDF 6nTKJRIlO84iJSd0FvhGjgi1xAIbqXcYh1R3/k+I6W6wBbci7AIYIPX3U 
cK8bOXOmhOPHqh7y6zCEgpmJ+GMByzPw245cgcMRSKlW9iaIQ5uaBEHgv yMnArqZdbON7QnZvP5U+4Sjwg6am5wSD+/24bjSzkxR7EkmcJr79Y3GSA M9r2WTBtx6bkGoCRu58h6WR8YswfH+QGSswgeTZwKRnFTumWAkSVkFJpi ffgDEXqI6RAGpsjXhvxC4tolvc7Iy0q0YcG8liiMiS07Ih2LBjVKSU5Zm A==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="328540184" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="328540184" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:08 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807843" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807843" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:08 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 101/105] KVM: TDX: Silently ignore INIT/SIPI Date: Fri, 30 Sep 2022 03:18:35 -0700 Message-Id: <4df60690fb826f632716e5c2fd5190e34467d86e.1664530908.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata The TDX module API doesn't provide a way for the VMM to inject an INIT IPI or SIPI. Instead, it defines its own protocols to boot application processors. Ignore INIT and SIPI events for the TDX guest. There are two options: 1) (silently) ignore the INIT/SIPI request or 2) somehow return an error to the guest TD. Given that the TDX guest is paravirtualized to boot its APs, option 1 is chosen for simplicity.
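For illustration only, not part of this patch: the decision made by the new vcpu_deliver_init hook, reduced to a standalone model with stand-in types (the real code lives in arch/x86/kvm/vmx/main.c and arch/x86/kvm/lapic.c below):

::

    /* Standalone model, not kernel code: stand-in fields model
     * is_td_vcpu() and kvm_vcpu_is_bsp() to show the INIT choice. */
    #include <stdbool.h>

    enum mp_state { MP_STATE_RUNNABLE, MP_STATE_INIT_RECEIVED };

    struct vcpu_model {
            bool is_td;             /* is_td_vcpu() in the real code */
            bool is_bsp;            /* kvm_vcpu_is_bsp() */
            enum mp_state mp_state;
    };

    static void deliver_init(struct vcpu_model *v)
    {
            if (v->is_td) {
                    /* Option 1: INIT can't be injected into a TD,
                     * so drop the event and stay runnable. */
                    v->mp_state = MP_STATE_RUNNABLE;
                    return;
            }
            /* VMX: reset the vcpu; a non-BSP then waits for SIPI. */
            v->mp_state = v->is_bsp ? MP_STATE_RUNNABLE
                                    : MP_STATE_INIT_RECEIVED;
    }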
Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/lapic.c | 16 +++++++++++----- arch/x86/kvm/svm/svm.c | 1 + arch/x86/kvm/vmx/main.c | 22 +++++++++++++++++++++- 5 files changed, 36 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index 0ecaebcf8a18..89cdc31dca4a 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -140,6 +140,7 @@ KVM_X86_OP_OPTIONAL(migrate_timers) KVM_X86_OP(msr_filter_changed) KVM_X86_OP(complete_emulated_msr) KVM_X86_OP(vcpu_deliver_sipi_vector) +KVM_X86_OP(vcpu_deliver_init) KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons); KVM_X86_OP(check_processor_compatibility) =20 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index de7a716e774c..276f09caf9bd 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1700,6 +1700,7 @@ struct kvm_x86_ops { int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err); =20 void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector); + void (*vcpu_deliver_init)(struct kvm_vcpu *vcpu); =20 /* * Returns vCPU specific APICv inhibit reasons @@ -1908,6 +1909,7 @@ int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu); void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int s= eg); int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int s= eg); void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector); +void kvm_vcpu_deliver_init(struct kvm_vcpu *vcpu); =20 int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index, int reason, bool has_error_code, u32 error_code); diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index e02681061637..03e06e1e901d 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -3035,6 +3035,16 @@ int kvm_lapic_set_pv_eoi(struct kvm_vcpu *vcpu, u64 = data, unsigned long len) return 0; } =20 +void kvm_vcpu_deliver_init(struct kvm_vcpu *vcpu) +{ + kvm_vcpu_reset(vcpu, true); + if (kvm_vcpu_is_bsp(vcpu)) + vcpu->arch.mp_state =3D KVM_MP_STATE_RUNNABLE; + else + vcpu->arch.mp_state =3D KVM_MP_STATE_INIT_RECEIVED; +} +EXPORT_SYMBOL_GPL(kvm_vcpu_deliver_init); + int kvm_apic_accept_events(struct kvm_vcpu *vcpu) { struct kvm_lapic *apic =3D vcpu->arch.apic; @@ -3082,11 +3092,7 @@ int kvm_apic_accept_events(struct kvm_vcpu *vcpu) =20 if (test_bit(KVM_APIC_INIT, &pe)) { clear_bit(KVM_APIC_INIT, &apic->pending_events); - kvm_vcpu_reset(vcpu, true); - if (kvm_vcpu_is_bsp(apic->vcpu)) - vcpu->arch.mp_state =3D KVM_MP_STATE_RUNNABLE; - else - vcpu->arch.mp_state =3D KVM_MP_STATE_INIT_RECEIVED; + static_call(kvm_x86_vcpu_deliver_init)(vcpu); } if (test_bit(KVM_APIC_SIPI, &pe)) { clear_bit(KVM_APIC_SIPI, &apic->pending_events); diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 37c0db89a1a4..455eded53f47 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4866,6 +4866,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata =3D { .complete_emulated_msr =3D svm_complete_emulated_msr, =20 .vcpu_deliver_sipi_vector =3D svm_vcpu_deliver_sipi_vector, + .vcpu_deliver_init =3D kvm_vcpu_deliver_init, .vcpu_get_apicv_inhibit_reasons =3D avic_vcpu_get_apicv_inhibit_reasons, }; =20 diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 017c24ed16e5..72dd771b0993 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -286,6 +286,25 @@ static void 
vt_deliver_interrupt(struct kvm_lapic *api= c, int delivery_mode, vmx_deliver_interrupt(apic, delivery_mode, trig_mode, vector); } =20 +static void vt_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector) +{ + if (is_td_vcpu(vcpu)) + return; + + kvm_vcpu_deliver_sipi_vector(vcpu, vector); +} + +static void vt_vcpu_deliver_init(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) { + /* TDX doesn't support INIT. Ignore INIT event */ + vcpu->arch.mp_state =3D KVM_MP_STATE_RUNNABLE; + return; + } + + kvm_vcpu_deliver_init(vcpu); +} + static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) @@ -627,7 +646,8 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .msr_filter_changed =3D vmx_msr_filter_changed, .complete_emulated_msr =3D kvm_complete_insn_gp, =20 - .vcpu_deliver_sipi_vector =3D kvm_vcpu_deliver_sipi_vector, + .vcpu_deliver_sipi_vector =3D vt_vcpu_deliver_sipi_vector, + .vcpu_deliver_init =3D vt_vcpu_deliver_init, =20 .dev_mem_enc_ioctl =3D tdx_dev_ioctl, .mem_enc_ioctl =3D vt_mem_enc_ioctl, --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 87714C433F5 for ; Fri, 30 Sep 2022 10:28:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232590AbiI3K2J (ORCPT ); Fri, 30 Sep 2022 06:28:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55168 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232493AbiI3KZY (ORCPT ); Fri, 30 Sep 2022 06:25:24 -0400 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6D10632DB7; Fri, 30 Sep 2022 03:19:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533200; x=1696069200; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=2mxF7aPw49+BZQYMIQKcGMHSl864tkdbyqaIIa1qC+8=; b=GBAeuUnY1kTKTeE3LtUULiXjE/6llsrZFlgARpN1/wDvleAoyeRmxmcf PSHvL9PWFTpkwb9LFl+Xq5bAkyaHN7E/wx6R/a7INQ97yKiLARDdoWlti qY+eauCyxjL3wU7BLIT+k++7sf2rhMmUquIj0U8CU/P5u5spKaq6uBcGd LvkPDT7x4jRe9PYN/5E2YfnGTAqMXMTHAHfk2Q/vnGr0nm81f32/a3oOM i+Wgj/dtBC/9nf1C3hDva2OCtWprxfdhSCH27Q6XRqXruYMfi7vOcQGRd W0JtBDNJLYTfVnqjm059fbT8T9I+lt5ZRt2wv93bWeZ8r+7JvC6UDSROM w==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="328540185" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="328540185" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:08 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807846" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807846" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:08 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , Sean Christopherson Subject: [PATCH v9 102/105] KVM: TDX: Add methods to ignore accesses to CPU state Date: Fri, 30 Sep 2022 03:18:36 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 
Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson TDX protects TDX guest state from the VMM. Implement the access methods for TDX guest state so that accesses are ignored or return zero. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/main.c | 463 +++++++++++++++++++++++++++++++++---- arch/x86/kvm/vmx/tdx.c | 55 ++++- arch/x86/kvm/vmx/x86_ops.h | 17 ++ 3 files changed, 490 insertions(+), 45 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 72dd771b0993..2e4be54a3be5 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -258,6 +258,46 @@ static void vt_enable_smi_window(struct kvm_vcpu *vcpu) vmx_enable_smi_window(vcpu); } =20 +static bool vt_can_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type, + void *insn, int insn_len) +{ + if (is_td_vcpu(vcpu)) + return false; + + return vmx_can_emulate_instruction(vcpu, emul_type, insn, insn_len); +} + +static int vt_check_intercept(struct kvm_vcpu *vcpu, + struct x86_instruction_info *info, + enum x86_intercept_stage stage, + struct x86_exception *exception) +{ + /* + * This callback is triggered by the x86 instruction emulator. TDX + * doesn't allow guest memory inspection. + */ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return X86EMUL_UNHANDLEABLE; + + return vmx_check_intercept(vcpu, info, stage, exception); +} + +static bool vt_apic_init_signal_blocked(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return true; + + return vmx_apic_init_signal_blocked(vcpu); +} + +static void vt_set_virtual_apic_mode(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return tdx_set_virtual_apic_mode(vcpu); + + return vmx_set_virtual_apic_mode(vcpu); +} + static void vt_apicv_post_state_restore(struct kvm_vcpu *vcpu) { struct pi_desc *pi =3D vcpu_to_pi_desc(vcpu); @@ -266,6 +306,31 @@ static void vt_apicv_post_state_restore(struct kvm_vcpu *vcpu) memset(pi->pir, 0, sizeof(pi->pir)); } =20 +static void vt_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr) +{ + if (is_td_vcpu(vcpu)) + return; + + return vmx_hwapic_irr_update(vcpu, max_irr); +} + +static void vt_hwapic_isr_update(int max_isr) +{ + if (is_td_vcpu(kvm_get_running_vcpu())) + return; + + return vmx_hwapic_isr_update(max_isr); +} + +static bool vt_guest_apic_has_interrupt(struct kvm_vcpu *vcpu) +{ + /* TDX doesn't support L2 at the moment.
*/ + if (WARN_ON_ONCE(is_td_vcpu(vcpu))) + return false; + + return vmx_guest_apic_has_interrupt(vcpu); +} + static int vt_sync_pir_to_irr(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) @@ -305,6 +370,177 @@ static void vt_vcpu_deliver_init(struct kvm_vcpu *vcp= u) kvm_vcpu_deliver_init(vcpu); } =20 +static void vt_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + return vmx_vcpu_after_set_cpuid(vcpu); +} + +static void vt_update_exception_bitmap(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_update_exception_bitmap(vcpu); +} + +static u64 vt_get_segment_base(struct kvm_vcpu *vcpu, int seg) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return tdx_get_segment_base(vcpu, seg); + + return vmx_get_segment_base(vcpu, seg); +} + +static void vt_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, + int seg) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return tdx_get_segment(vcpu, var, seg); + + vmx_get_segment(vcpu, var, seg); +} + +static void vt_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, + int seg) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_set_segment(vcpu, var, seg); +} + +static int vt_get_cpl(struct kvm_vcpu *vcpu) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return tdx_get_cpl(vcpu); + + return vmx_get_cpl(vcpu); +} + +static void vt_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_get_cs_db_l_bits(vcpu, db, l); +} + +static void vt_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_cr0(vcpu, cr0); +} + +static void vt_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_cr4(vcpu, cr4); +} + +static int vt_set_efer(struct kvm_vcpu *vcpu, u64 efer) +{ + if (is_td_vcpu(vcpu)) + return 0; + + return vmx_set_efer(vcpu, efer); +} + +static void vt_get_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) { + memset(dt, 0, sizeof(*dt)); + return; + } + + vmx_get_idt(vcpu, dt); +} + +static void vt_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_set_idt(vcpu, dt); +} + +static void vt_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) { + memset(dt, 0, sizeof(*dt)); + return; + } + + vmx_get_gdt(vcpu, dt); +} + +static void vt_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_set_gdt(vcpu, dt); +} + +static void vt_set_dr7(struct kvm_vcpu *vcpu, unsigned long val) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_dr7(vcpu, val); +} + +static void vt_sync_dirty_debug_regs(struct kvm_vcpu *vcpu) +{ + /* + * MOV-DR exiting is always cleared for TD guest, even in debug mode. + * Thus KVM_DEBUGREG_WONT_EXIT can never be set and it should never + * reach here for TD vcpu. 
+ */ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_sync_dirty_debug_regs(vcpu); +} + +static void vt_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) +{ + if (is_td_vcpu(vcpu)) + return tdx_cache_reg(vcpu, reg); + + return vmx_cache_reg(vcpu, reg); +} + +static unsigned long vt_get_rflags(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return tdx_get_rflags(vcpu); + + return vmx_get_rflags(vcpu); +} + +static void vt_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_rflags(vcpu, rflags); +} + +static bool vt_get_if_flag(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return false; + + return vmx_get_if_flag(vcpu); +} + static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) @@ -438,6 +674,15 @@ static u32 vt_get_interrupt_shadow(struct kvm_vcpu *vc= pu) return vmx_get_interrupt_shadow(vcpu); } =20 +static void vt_patch_hypercall(struct kvm_vcpu *vcpu, + unsigned char *hypercall) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_patch_hypercall(vcpu, hypercall); +} + static void vt_inject_irq(struct kvm_vcpu *vcpu, bool reinjected) { if (is_td_vcpu(vcpu)) @@ -446,6 +691,14 @@ static void vt_inject_irq(struct kvm_vcpu *vcpu, bool = reinjected) vmx_inject_irq(vcpu, reinjected); } =20 +static void vt_queue_exception(struct kvm_vcpu *vcpu) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_queue_exception(vcpu); +} + static void vt_cancel_injection(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) @@ -478,6 +731,130 @@ static void vt_request_immediate_exit(struct kvm_vcpu= *vcpu) vmx_request_immediate_exit(vcpu); } =20 +static void vt_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int ir= r) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_update_cr8_intercept(vcpu, tpr, irr); +} + +static void vt_set_apic_access_page_addr(struct kvm_vcpu *vcpu) +{ + if (WARN_ON_ONCE(is_td_vcpu(vcpu))) + return; + + vmx_set_apic_access_page_addr(vcpu); +} + +static void vt_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu) +{ + if (WARN_ON_ONCE(is_td_vcpu(vcpu))) + return; + + vmx_refresh_apicv_exec_ctrl(vcpu); +} + +static void vt_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitma= p) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_load_eoi_exitmap(vcpu, eoi_exit_bitmap); +} + +static int vt_set_tss_addr(struct kvm *kvm, unsigned int addr) +{ + if (is_td(kvm)) + return 0; + + return vmx_set_tss_addr(kvm, addr); +} + +static int vt_set_identity_map_addr(struct kvm *kvm, u64 ident_addr) +{ + if (is_td(kvm)) + return 0; + + return vmx_set_identity_map_addr(kvm, ident_addr); +} + +static u8 vt_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) +{ + if (is_td_vcpu(vcpu)) { + if (is_mmio) + return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT; + return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT; + } + + return vmx_get_mt_mask(vcpu, gfn, is_mmio); +} + +static u64 vt_get_l2_tsc_offset(struct kvm_vcpu *vcpu) +{ + /* TDX doesn't support L2 guest at the moment. */ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return 0; + + return vmx_get_l2_tsc_offset(vcpu); +} + +static u64 vt_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu) +{ + /* TDX doesn't support L2 guest at the moment. */ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return 0; + + return vmx_get_l2_tsc_multiplier(vcpu); +} + +static void vt_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset) +{ + /* In TDX, tsc offset can't be changed. 
*/ + if (is_td_vcpu(vcpu)) + return; + + vmx_write_tsc_offset(vcpu, offset); +} + +static void vt_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 multiplier) +{ + /* In TDX, tsc multiplier can't be changed. */ + if (is_td_vcpu(vcpu)) + return; + + vmx_write_tsc_multiplier(vcpu, multiplier); +} + +static void vt_update_cpu_dirty_logging(struct kvm_vcpu *vcpu) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_update_cpu_dirty_logging(vcpu); +} + +#ifdef CONFIG_X86_64 +static int vt_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc, + bool *expired) +{ + /* VMX-preemption timer isn't available for TDX. */ + if (is_td_vcpu(vcpu)) + return -EINVAL; + + return vmx_set_hv_timer(vcpu, guest_deadline_tsc, expired); +} + +static void vt_cancel_hv_timer(struct kvm_vcpu *vcpu) +{ + /* VMX-preemption timer can't be set. See vt_set_hv_timer(). */ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_cancel_hv_timer(vcpu); +} +#endif + static void vt_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code) { @@ -531,29 +908,29 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .vcpu_load =3D vt_vcpu_load, .vcpu_put =3D vt_vcpu_put, =20 - .update_exception_bitmap =3D vmx_update_exception_bitmap, + .update_exception_bitmap =3D vt_update_exception_bitmap, .get_msr_feature =3D vmx_get_msr_feature, .get_msr =3D vt_get_msr, .set_msr =3D vt_set_msr, - .get_segment_base =3D vmx_get_segment_base, - .get_segment =3D vmx_get_segment, - .set_segment =3D vmx_set_segment, - .get_cpl =3D vmx_get_cpl, - .get_cs_db_l_bits =3D vmx_get_cs_db_l_bits, - .set_cr0 =3D vmx_set_cr0, + .get_segment_base =3D vt_get_segment_base, + .get_segment =3D vt_get_segment, + .set_segment =3D vt_set_segment, + .get_cpl =3D vt_get_cpl, + .get_cs_db_l_bits =3D vt_get_cs_db_l_bits, + .set_cr0 =3D vt_set_cr0, .is_valid_cr4 =3D vmx_is_valid_cr4, - .set_cr4 =3D vmx_set_cr4, - .set_efer =3D vmx_set_efer, - .get_idt =3D vmx_get_idt, - .set_idt =3D vmx_set_idt, - .get_gdt =3D vmx_get_gdt, - .set_gdt =3D vmx_set_gdt, - .set_dr7 =3D vmx_set_dr7, - .sync_dirty_debug_regs =3D vmx_sync_dirty_debug_regs, - .cache_reg =3D vmx_cache_reg, - .get_rflags =3D vmx_get_rflags, - .set_rflags =3D vmx_set_rflags, - .get_if_flag =3D vmx_get_if_flag, + .set_cr4 =3D vt_set_cr4, + .set_efer =3D vt_set_efer, + .get_idt =3D vt_get_idt, + .set_idt =3D vt_set_idt, + .get_gdt =3D vt_get_gdt, + .set_gdt =3D vt_set_gdt, + .set_dr7 =3D vt_set_dr7, + .sync_dirty_debug_regs =3D vt_sync_dirty_debug_regs, + .cache_reg =3D vt_cache_reg, + .get_rflags =3D vt_get_rflags, + .set_rflags =3D vt_set_rflags, + .get_if_flag =3D vt_get_if_flag, =20 .flush_tlb_all =3D vt_flush_tlb_all, .flush_tlb_current =3D vt_flush_tlb_current, @@ -569,10 +946,10 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .update_emulated_instruction =3D vmx_update_emulated_instruction, .set_interrupt_shadow =3D vt_set_interrupt_shadow, .get_interrupt_shadow =3D vt_get_interrupt_shadow, - .patch_hypercall =3D vmx_patch_hypercall, + .patch_hypercall =3D vt_patch_hypercall, .inject_irq =3D vt_inject_irq, .inject_nmi =3D vt_inject_nmi, - .queue_exception =3D vmx_queue_exception, + .queue_exception =3D vt_queue_exception, .cancel_injection =3D vt_cancel_injection, .interrupt_allowed =3D vt_interrupt_allowed, .nmi_allowed =3D vt_nmi_allowed, @@ -580,39 +957,39 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .set_nmi_mask =3D vt_set_nmi_mask, .enable_nmi_window =3D vt_enable_nmi_window, .enable_irq_window =3D vt_enable_irq_window, -
.update_cr8_intercept =3D vmx_update_cr8_intercept, - .set_virtual_apic_mode =3D vmx_set_virtual_apic_mode, - .set_apic_access_page_addr =3D vmx_set_apic_access_page_addr, - .refresh_apicv_exec_ctrl =3D vmx_refresh_apicv_exec_ctrl, - .load_eoi_exitmap =3D vmx_load_eoi_exitmap, + .update_cr8_intercept =3D vt_update_cr8_intercept, + .set_virtual_apic_mode =3D vt_set_virtual_apic_mode, + .set_apic_access_page_addr =3D vt_set_apic_access_page_addr, + .refresh_apicv_exec_ctrl =3D vt_refresh_apicv_exec_ctrl, + .load_eoi_exitmap =3D vt_load_eoi_exitmap, .apicv_post_state_restore =3D vt_apicv_post_state_restore, .check_apicv_inhibit_reasons =3D vmx_check_apicv_inhibit_reasons, - .hwapic_irr_update =3D vmx_hwapic_irr_update, - .hwapic_isr_update =3D vmx_hwapic_isr_update, - .guest_apic_has_interrupt =3D vmx_guest_apic_has_interrupt, + .hwapic_irr_update =3D vt_hwapic_irr_update, + .hwapic_isr_update =3D vt_hwapic_isr_update, + .guest_apic_has_interrupt =3D vt_guest_apic_has_interrupt, .sync_pir_to_irr =3D vt_sync_pir_to_irr, .deliver_interrupt =3D vt_deliver_interrupt, .dy_apicv_has_pending_interrupt =3D pi_has_pending_interrupt, .protected_apic_has_interrupt =3D vt_protected_apic_has_interrupt, =20 - .set_tss_addr =3D vmx_set_tss_addr, - .set_identity_map_addr =3D vmx_set_identity_map_addr, - .get_mt_mask =3D vmx_get_mt_mask, + .set_tss_addr =3D vt_set_tss_addr, + .set_identity_map_addr =3D vt_set_identity_map_addr, + .get_mt_mask =3D vt_get_mt_mask, =20 .get_exit_info =3D vt_get_exit_info, =20 - .vcpu_after_set_cpuid =3D vmx_vcpu_after_set_cpuid, + .vcpu_after_set_cpuid =3D vt_vcpu_after_set_cpuid, =20 .has_wbinvd_exit =3D cpu_has_vmx_wbinvd_exit, =20 - .get_l2_tsc_offset =3D vmx_get_l2_tsc_offset, - .get_l2_tsc_multiplier =3D vmx_get_l2_tsc_multiplier, - .write_tsc_offset =3D vmx_write_tsc_offset, - .write_tsc_multiplier =3D vmx_write_tsc_multiplier, + .get_l2_tsc_offset =3D vt_get_l2_tsc_offset, + .get_l2_tsc_multiplier =3D vt_get_l2_tsc_multiplier, + .write_tsc_offset =3D vt_write_tsc_offset, + .write_tsc_multiplier =3D vt_write_tsc_multiplier, =20 .load_mmu_pgd =3D vt_load_mmu_pgd, =20 - .check_intercept =3D vmx_check_intercept, + .check_intercept =3D vt_check_intercept, .handle_exit_irqoff =3D vt_handle_exit_irqoff, =20 .request_immediate_exit =3D vt_request_immediate_exit, @@ -620,7 +997,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .sched_in =3D vt_sched_in, =20 .cpu_dirty_log_size =3D PML_ENTITY_NUM, - .update_cpu_dirty_logging =3D vmx_update_cpu_dirty_logging, + .update_cpu_dirty_logging =3D vt_update_cpu_dirty_logging, =20 .nested_ops =3D &vmx_nested_ops, =20 @@ -628,8 +1005,8 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .pi_start_assignment =3D vmx_pi_start_assignment, =20 #ifdef CONFIG_X86_64 - .set_hv_timer =3D vmx_set_hv_timer, - .cancel_hv_timer =3D vmx_cancel_hv_timer, + .set_hv_timer =3D vt_set_hv_timer, + .cancel_hv_timer =3D vt_cancel_hv_timer, #endif =20 .setup_mce =3D vmx_setup_mce, @@ -639,8 +1016,8 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .leave_smm =3D vt_leave_smm, .enable_smi_window =3D vt_enable_smi_window, =20 - .can_emulate_instruction =3D vmx_can_emulate_instruction, - .apic_init_signal_blocked =3D vmx_apic_init_signal_blocked, + .can_emulate_instruction =3D vt_can_emulate_instruction, + .apic_init_signal_blocked =3D vt_apic_init_signal_blocked, .migrate_timers =3D vmx_migrate_timers, =20 .msr_filter_changed =3D vmx_msr_filter_changed, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index c7164404a79f..cfce15d7361e 100644 --- 
a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -3,6 +3,7 @@ #include =20 #include +#include #include =20 #include "capabilities.h" @@ -485,8 +486,15 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) =20 vcpu->arch.tsc_offset =3D to_kvm_tdx(vcpu->kvm)->tsc_offset; vcpu->arch.l1_tsc_offset =3D vcpu->arch.tsc_offset; - vcpu->arch.guest_state_protected =3D - !(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTRIBUTE_DEBUG); + /* + * TODO: support off-TD debug. If TD DEBUG is enabled, guest state + * can be accessed: guest_state_protected =3D false, and the kvm ioctls + * to access CPU state should be usable for the user space VMM (e.g. qemu). + * + * vcpu->arch.guest_state_protected =3D + * !(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTRIBUTE_DEBUG); + */ + vcpu->arch.guest_state_protected =3D true; =20 tdx->pi_desc.nv =3D POSTED_INTR_VECTOR; tdx->pi_desc.sn =3D 1; @@ -1658,6 +1666,49 @@ void tdx_enable_smi_window(struct kvm_vcpu *vcpu) vcpu->arch.smi_pending =3D false; } =20 +void tdx_set_virtual_apic_mode(struct kvm_vcpu *vcpu) +{ + /* Only x2APIC mode is supported for TD. */ + WARN_ON_ONCE(kvm_get_apic_mode(vcpu) !=3D LAPIC_MODE_X2APIC); +} + +int tdx_get_cpl(struct kvm_vcpu *vcpu) +{ + return 0; +} + +void tdx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) +{ + kvm_register_mark_available(vcpu, reg); + switch (reg) { + case VCPU_REGS_RSP: + case VCPU_REGS_RIP: + case VCPU_EXREG_PDPTR: + case VCPU_EXREG_CR0: + case VCPU_EXREG_CR3: + case VCPU_EXREG_CR4: + break; + default: + KVM_BUG_ON(1, vcpu->kvm); + break; + } +} + +unsigned long tdx_get_rflags(struct kvm_vcpu *vcpu) +{ + return 0; +} + +u64 tdx_get_segment_base(struct kvm_vcpu *vcpu, int seg) +{ + return 0; +} + +void tdx_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg) +{ + memset(var, 0, sizeof(*var)); +} + int tdx_dev_ioctl(void __user *argp) { struct kvm_tdx_capabilities __user *user_caps; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index a81b47307b39..7d5f1c0073b4 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -169,6 +169,14 @@ int tdx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection); int tdx_enter_smm(struct kvm_vcpu *vcpu, char *smstate); int tdx_leave_smm(struct kvm_vcpu *vcpu, const char *smstate); void tdx_enable_smi_window(struct kvm_vcpu *vcpu); +void tdx_set_virtual_apic_mode(struct kvm_vcpu *vcpu); + +int tdx_get_cpl(struct kvm_vcpu *vcpu); +void tdx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg); +unsigned long tdx_get_rflags(struct kvm_vcpu *vcpu); +bool tdx_is_emulated_msr(u32 index, bool write); +u64 tdx_get_segment_base(struct kvm_vcpu *vcpu, int seg); +void tdx_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg); =20 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); @@ -210,10 +218,19 @@ static inline void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *in static inline bool tdx_is_emulated_msr(u32 index, bool write) { return false; } static inline int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { return 1; } static inline int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { return 1; } + static inline int tdx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection) { return false; } static inline int tdx_enter_smm(struct kvm_vcpu *vcpu, char *smstate) { return 0; } static inline int tdx_leave_smm(struct kvm_vcpu *vcpu, const char *smstate) { return 0; } static inline void tdx_enable_smi_window(struct kvm_vcpu *vcpu) {}
+static inline void tdx_set_virtual_apic_mode(struct kvm_vcpu *vcpu) {} + +static inline int tdx_get_cpl(struct kvm_vcpu *vcpu) { return 0; } +static inline void tdx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) {} +static inline unsigned long tdx_get_rflags(struct kvm_vcpu *vcpu) { return 0; } +static inline u64 tdx_get_segment_base(struct kvm_vcpu *vcpu, int seg) { return 0; } +static inline void tdx_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, + int seg) {} =20 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; } static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; } --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3B4D6C43217 for ; Fri, 30 Sep 2022 10:32:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232398AbiI3Kcq (ORCPT ); Fri, 30 Sep 2022 06:32:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53738 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232653AbiI3K1I (ORCPT ); Fri, 30 Sep 2022 06:27:08 -0400 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 129735B7B2; Fri, 30 Sep 2022 03:20:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533211; x=1696069211; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=cIu6PvsPhZ0UQ/64xg/km6uQfrpF5sEyR8sXlvhDs4Y=; b=m6cY7vG51kYqMbjyjegOxsViZhPW09EEp1LXbrLrdSDMhZJxmWFhzP1o 8jceFltZE4DgIroe0FZC5r1kV6yyaCE6tLZxAnNbiV3s16EYTrutlM1y1 4l4EDDEXq+wmrNL0m08odmygOxxqZfmwSxjg32RFAQ+94wRqykOpxIYgS MdJmVNH3JUC+iBXp4hz9jS1ye7cBM5MdqrmTo6/tGQ6caaAZx6e/RMMI8 J8IPEww4qAP4dYub8kucp+U7YxG4XVtRyvQZciVoRpm9j8o/1WOq04hH5 RVHHOZFFODRZnekp7BXDN7jfXAVtQf1U7ORQf1WeQ3IPX9nFeoV0bp7pU w==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="328540187" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="328540187" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:09 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807849" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807849" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:08 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [PATCH v9 103/105] Documentation/virt/kvm: Document on Trust Domain Extensions (TDX) Date: Fri, 30 Sep 2022 03:18:37 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Add documentation for Intel Trust Domain Extensions (TDX) support.
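For illustration only, not part of this patch: a userspace sketch of the KVM_MEMORY_ENCRYPT_OP ABI documented below. The struct mirrors the documentation being added; whether the TDX sub-commands are accepted depends on a kernel with this series applied:

::

    /* Not part of this patch: userspace sketch of the ioctl ABI.
     * Struct layout copied from the documentation added below. */
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    struct kvm_tdx_cmd {
            uint32_t id;     /* enum kvm_tdx_cmd_id, e.g. KVM_TDX_INIT_VM */
            uint32_t flags;  /* sub-command flags, zero if unused */
            uint64_t data;   /* immediate value or pointer to payload */
            uint64_t error;  /* out: TDX SEAMCALL status, if any */
            uint64_t unused;
    };

    /* Issue one TDX sub-command against a VM (or vcpu) fd. */
    static int tdx_cmd(int fd, uint32_t id, uint64_t data)
    {
            struct kvm_tdx_cmd cmd = { .id = id, .data = data };

            return ioctl(fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
    }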
Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/api.rst | 9 +- Documentation/virt/kvm/index.rst | 2 + Documentation/virt/kvm/intel-tdx.rst | 345 +++++++++++++++++++++++++ 3 files changed, 355 insertions(+), 1 deletion(-) create mode 100644 Documentation/virt/kvm/intel-tdx.rst diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index ebf5d2177933..3aa64ba9bb2b 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -1426,6 +1426,9 @@ It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl. The KVM_SET_MEMORY_REGION does not allow fine grained control over memory allocation and is deprecated. =20 +For a TDX guest, deleting/moving a memory region loses the guest memory contents. +Read-only regions aren't supported. Only as-id 0 is supported. + =20 4.36 KVM_SET_TSS_ADDR --------------------- @@ -4712,7 +4715,7 @@ H_GET_CPU_CHARACTERISTICS hypercall. =20 :Capability: basic :Architectures: x86 -:Type: vm +:Type: vm ioctl, vcpu ioctl :Parameters: an opaque platform specific structure (in/out) :Returns: 0 on success; -1 on error =20 @@ -4724,6 +4727,10 @@ Currently, this ioctl is used for issuing Secure Encrypted Virtualization (SEV) commands on AMD Processors. The SEV commands are defined in Documentation/virt/kvm/x86/amd-memory-encryption.rst. + =20 +Currently, this ioctl is also used for issuing Trust Domain Extensions +(TDX) commands on Intel Processors. The TDX commands are defined in +Documentation/virt/kvm/intel-tdx.rst. + 4.111 KVM_MEMORY_ENCRYPT_REG_REGION ----------------------------------- =20 diff --git a/Documentation/virt/kvm/index.rst b/Documentation/virt/kvm/index.rst index e0a2c74e1043..cdb8b43ce797 100644 --- a/Documentation/virt/kvm/index.rst +++ b/Documentation/virt/kvm/index.rst @@ -18,3 +18,5 @@ KVM locking vcpu-requests review-checklist + + intel-tdx diff --git a/Documentation/virt/kvm/intel-tdx.rst b/Documentation/virt/kvm/intel-tdx.rst new file mode 100644 index 000000000000..6999b0f4f6c2 --- /dev/null +++ b/Documentation/virt/kvm/intel-tdx.rst @@ -0,0 +1,345 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D +Intel Trust Domain Extensions (TDX) +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +Overview +=3D=3D=3D=3D=3D=3D=3D=3D +TDX stands for Trust Domain Extensions, which isolates VMs from +the virtual-machine manager (VMM)/hypervisor and any other software on +the platform. For details, see the specifications [1]_, whitepaper [2]_, +architectural extensions specification [3]_, module documentation [4]_, +loader interface specification [5]_, guest-hypervisor communication +interface [6]_, virtual firmware design guide [7]_, and other resources +([8]_, [9]_, [10]_, [11]_, and [12]_). + + +API description +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +KVM_MEMORY_ENCRYPT_OP +--------------------- +:Type: vm ioctl, vcpu ioctl + +For TDX operations, KVM_MEMORY_ENCRYPT_OP is re-purposed to be a generic +ioctl with TDX-specific sub-ioctl commands. + +:: + + /* Trust Domain eXtension sub-ioctl() commands. */ + enum kvm_tdx_cmd_id { + KVM_TDX_CAPABILITIES =3D 0, + KVM_TDX_INIT_VM, + KVM_TDX_INIT_VCPU, + KVM_TDX_INIT_MEM_REGION, + KVM_TDX_FINALIZE_VM, + + KVM_TDX_CMD_NR_MAX, + }; + + struct kvm_tdx_cmd { + /* enum kvm_tdx_cmd_id */ + __u32 id; + /* flags for sub-command. If sub-command doesn't use this, set zero.
*/ + __u32 flags; + /* + * data for each sub-command. An immediate or a pointer to the actual + * data in process virtual address. If sub-command doesn't use it, + * set zero. + */ + __u64 data; + /* + * Auxiliary error code. The sub-command may return TDX SEAMCALL + * status code in addition to -Exxx. + * Defined for consistency with struct kvm_sev_cmd. + */ + __u64 error; + /* Reserved: Defined for consistency with struct kvm_sev_cmd. */ + __u64 unused; + }; + +KVM_TDX_CAPABILITIES +-------------------- +:Type: vm ioctl + +A subset of TDSYSINFO_STRUCT retrieved by the TDH.SYS.INFO TDX SEAM call will be +returned, which describes the Intel TDX module. + +- id: KVM_TDX_CAPABILITIES +- flags: must be 0 +- data: pointer to struct kvm_tdx_capabilities +- error: must be 0 +- unused: must be 0 + +:: + + struct kvm_tdx_cpuid_config { + __u32 leaf; + __u32 sub_leaf; + __u32 eax; + __u32 ebx; + __u32 ecx; + __u32 edx; + }; + + struct kvm_tdx_capabilities { + __u64 attrs_fixed0; + __u64 attrs_fixed1; + __u64 xfam_fixed0; + __u64 xfam_fixed1; + + __u32 nr_cpuid_configs; + struct kvm_tdx_cpuid_config cpuid_configs[0]; + }; + + +KVM_TDX_INIT_VM +--------------- +:Type: vm ioctl + +Does additional VM initialization specific to TDX, which corresponds to +the TDH.MNG.INIT TDX SEAM call. + +- id: KVM_TDX_INIT_VM +- flags: must be 0 +- data: pointer to struct kvm_tdx_init_vm +- error: must be 0 +- unused: must be 0 + +:: + + struct kvm_tdx_init_vm { + __u32 max_vcpus; + __u32 reserved; + __u64 attributes; + __u64 cpuid; /* pointer to struct kvm_cpuid2 */ + __u64 mrconfigid[6]; /* sha384 digest */ + __u64 mrowner[6]; /* sha384 digest */ + __u64 mrownerconfig[6]; /* sha384 digest */ + __u64 reserved[43]; /* must be zero for future extensibility */ + }; + + +KVM_TDX_INIT_VCPU +----------------- +:Type: vcpu ioctl + +Does additional VCPU initialization specific to TDX, which corresponds to +the TDH.VP.INIT TDX SEAM call. + +- id: KVM_TDX_INIT_VCPU +- flags: must be 0 +- data: initial value of the guest TD VCPU RCX +- error: must be 0 +- unused: must be 0 + +KVM_TDX_INIT_MEM_REGION +----------------------- +:Type: vm ioctl + +Encrypt a contiguous memory region, which corresponds to the TDH.MEM.PAGE.ADD +TDX SEAM call. +If the KVM_TDX_MEASURE_MEMORY_REGION flag is specified, it also extends the measurement, +which corresponds to the TDH.MR.EXTEND TDX SEAM call. + +- id: KVM_TDX_INIT_MEM_REGION +- flags: flags, + currently only KVM_TDX_MEASURE_MEMORY_REGION is defined +- data: pointer to struct kvm_tdx_init_mem_region +- error: must be 0 +- unused: must be 0 + +:: + + #define KVM_TDX_MEASURE_MEMORY_REGION (1UL << 0) + + struct kvm_tdx_init_mem_region { + __u64 source_addr; + __u64 gpa; + __u64 nr_pages; + }; + + +KVM_TDX_FINALIZE_VM +------------------- +:Type: vm ioctl + +Complete measurement of the initial TD contents and mark it ready to run, +which corresponds to TDH.MR.FINALIZE. + +- id: KVM_TDX_FINALIZE_VM +- flags: must be 0 +- data: must be 0 +- error: must be 0 +- unused: must be 0 + +KVM TDX creation flow +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D +In addition to the normal KVM flow, new TDX ioctls need to be called. The control flow +looks as follows. + +#. system-wide capability check + + * KVM_CAP_VM_TYPES: check if VM type is supported and if TDX_VM_TYPE is + supported. + +#. creating VM + + * KVM_CREATE_VM + * KVM_TDX_CAPABILITIES: query if TDX is supported on the platform. + * KVM_TDX_INIT_VM: pass TDX specific VM parameters. + +#.
creating VCPU + + * KVM_CREATE_VCPU + * KVM_TDX_INIT_VCPU: pass TDX specific VCPU parameters. + +#. initializing guest memory + + * allocate guest memory and initialize pages the same as in the normal KVM case. + In the TDX case, additionally parse and load TDVF into guest memory. + * KVM_TDX_INIT_MEM_REGION to add and measure guest pages. + If the pages have contents, those pages need to be added here. + Otherwise the contents will be lost and the guest sees zero pages. + * KVM_TDX_FINALIZE_VM: finalize the VM and the measurement. + This must be after KVM_TDX_INIT_MEM_REGION. + +#. run vcpu + +Design discussion +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +Coexistence of normal(VMX) VM and TD VM +--------------------------------------- +It's required to allow both legacy (normal VMX) VMs and new TD VMs to +coexist. Otherwise the benefits of VM flexibility would be eliminated. +The main issue is that the logic of the kvm_x86_ops callbacks for +TDX is different from that for VMX, while kvm_x86_ops is a single +global variable, not per-VM or per-vcpu. + +Several points to be considered: + + * No or minimal overhead when TDX is disabled (CONFIG_INTEL_TDX_HOST=3Dn). + * Avoid the overhead of indirect calls via function pointers. + * Contain the changes under the arch/x86/kvm/vmx directory and share logic + with VMX for maintenance. + Even though the ways to operate on a VM (VMX instructions vs TDX + SEAM calls) are different, the basic idea remains the same, so much + of the logic can be shared. + * Future maintenance + No huge change of kvm_x86_ops is expected in the (near) future, so + a centralized file is acceptable. + +- Wrapping kvm x86_ops: The current choice + + Introduce a dedicated file, arch/x86/kvm/vmx/main.c (the name, + main.c, is just chosen to show the main entry points for callbacks), and + wrapper functions around all the callbacks with + "if (is-tdx) tdx-callback() else vmx-callback()". + + Pros: + + - No major change in common x86 KVM code. The change is (mostly) + contained under arch/x86/kvm/vmx/. + - When TDX is disabled (CONFIG_INTEL_TDX_HOST=3Dn), the overhead is + optimized out. + - Micro-optimization by avoiding function pointers. + + Cons: + + - A lot of boilerplate in arch/x86/kvm/vmx/main.c. + +KVM MMU Changes +--------------- +The KVM MMU needs to be enhanced to handle the Secure/Shared-EPT. The +high-level execution flow is mostly the same as in the normal EPT case: +EPT violation/misconfiguration -> invoke TDP fault handler -> +resolve TDP fault -> resume execution (or emulate MMIO). +The difference is that the S-EPT is operated on (read/written) via TDX SEAM +calls, which are expensive, instead of direct reads/writes of EPT entries. +One bit of the GPA (bit 51 or 47) is repurposed so that it means shared +with the host (if set to 1) or private to the TD (if cleared to 0). + +- The current implementation + + * Reuse the existing MMU code with minimal updates, because the + execution flow is mostly the same. An additional operation, the TDX + call for the S-EPT, is needed, so add hooks for it to kvm_x86_ops. + * For performance, minimize the TDX SEAM calls to operate on the S-EPT. When + getting the corresponding S-EPT pages/entry from a faulting GPA, don't + use a TDX SEAM call to read the S-EPT entry; instead, create a shadow copy + in host memory. + Repurpose the existing kvm_mmu_page as the shadow copy of the S-EPT and + associate the S-EPT with it. + * Treat the shared bit as an attribute: mask/unmask the bit where + necessary to keep the existing traversal code working. + Introduce kvm.arch.gfn_shared_mask and use "if (gfn_shared_mask)" + for the special cases.
+ + * 0 : for the non-TDX case + * bit 51 or 47 set for the TDX case. + + Pros: + + - Large code reuse with minimal new hooks. + - The execution path is the same. + + Cons: + + - Complicates the existing code. + - Repurposing kvm_mmu_page as a shadow of the Secure-EPT can be confusing. + +New KVM API, ioctl (sub)command, to manage TD VMs +------------------------------------------------- +Additional KVM APIs are needed to control TD VMs. The operations on TD +VMs are specific to TDX. + +- Piggyback and repurpose KVM_MEMORY_ENCRYPT_OP + + Although not every operation is memory encryption, repurpose it to get + TDX specific ioctls. + + Pros: + + - No major change in common x86 KVM code. + + Cons: + + - The operations aren't actually memory encryption, but operations + on TD VMs. + +References +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +.. [1] TDX specification + https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html +.. [2] Intel Trust Domain Extensions (Intel TDX) + https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-whitepaper-final9-17.pdf +.. [3] Intel CPU Architectural Extensions Specification + https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-cpu-architectural-specification.pdf +.. [4] Intel TDX Module 1.0 EAS + https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-module-1eas.pdf +.. [5] Intel TDX Loader Interface Specification + https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-seamldr-interface-specification.pdf +.. [6] Intel TDX Guest-Hypervisor Communication Interface + https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-guest-hypervisor-communication-interface.pdf +.. [7] Intel TDX Virtual Firmware Design Guide + https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1. +.. [8] intel public github + + * kvm TDX branch: https://github.com/intel/tdx/tree/kvm + * TDX guest branch: https://github.com/intel/tdx/tree/guest + +.. [9] tdvf + https://github.com/tianocore/edk2-staging/tree/TDVF +.. [10] KVM forum 2020: Intel Virtualization Technology Extensions to + Enable Hardware Isolated VMs + https://osseu2020.sched.com/event/eDzm/intel-virtualization-technology-extensions-to-enable-hardware-isolated-vms-sean-christopherson-intel +.. [11] Linux Security Summit EU 2020: + Architectural Extensions for Hardware Virtual Machine Isolation + to Advance Confidential Computing in Public Clouds - Ravi Sahita + & Jun Nakajima, Intel Corporation + https://osseu2020.sched.com/event/eDOx/architectural-extensions-for-hardware-virtual-machine-isolation-to-advance-confidential-computing-in-public-clouds-ravi-sahita-jun-nakajima-intel-corporation +..
[12] [RFCv2,00/16] KVM protected memory extension + https://lkml.org/lkml/2020/10/20/66 --=20 2.25.1 From nobody Thu Apr 25 13:29:40 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0DAF1C433F5 for ; Fri, 30 Sep 2022 10:32:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232527AbiI3Kcu (ORCPT ); Fri, 30 Sep 2022 06:32:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53854 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232658AbiI3K1I (ORCPT ); Fri, 30 Sep 2022 06:27:08 -0400 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0420D5A3C0; Fri, 30 Sep 2022 03:20:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1664533211; x=1696069211; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=/iL/5HKwTLM57fOI1n7l4Rg3A8WpxBIbgeC/p+evIPo=; b=Z2xipjAazj0ZwCwz80/rifaVNRcqC0nB7xNr87S8SIqttR/+04ApajQr pd3tgjJJr/SF3blTCkMnDndIHNu72BD/cYqT+g6Eb090bDrTzl5dwmIfZ wxQEm7DQL+Gqx+M8HNggtFBQcDf5W9uhrb8ZkyMWoMIVdflqg1amaFtrH vpp0MMFryhCjIBjxv5XpENVCNvjM3fe3lx1GGrblB7qhK8ClvAA4I8isH jfUMrqkd+b+k0OM6FHU3od5lU2yb0PGjAgQBb7sf6xU/bV/Ja3+od0p2N fGIDsmWZFg2CcRjgfzmS9tWHDQMvU+2hoQnncK/QIVxW0GuXt5VC+CZne A==; X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="328540188" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="328540188" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:09 -0700 X-IronPort-AV: E=McAfee;i="6500,9779,10485"; a="726807852" X-IronPort-AV: E=Sophos;i="5.93,358,1654585200"; d="scan'208";a="726807852" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2022 03:19:09 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , Bagas Sanjaya Subject: [PATCH v9 104/105] KVM: x86: design documentation on TDX support of x86 KVM TDP MMU Date: Fri, 30 Sep 2022 03:18:38 -0700 Message-Id: <17fa5756b75d0a1639c68482241bcdbf68e37325.1664530908.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Add a high level design document on TDX changes to TDP MMU. Signed-off-by: Isaku Yamahata Co-developed-by: Bagas Sanjaya Signed-off-by: Bagas Sanjaya --- Documentation/virt/kvm/tdx-tdp-mmu.rst | 417 +++++++++++++++++++++++++ 1 file changed, 417 insertions(+) create mode 100644 Documentation/virt/kvm/tdx-tdp-mmu.rst diff --git a/Documentation/virt/kvm/tdx-tdp-mmu.rst b/Documentation/virt/kv= m/tdx-tdp-mmu.rst new file mode 100644 index 000000000000..2d91c94e6d8f --- /dev/null +++ b/Documentation/virt/kvm/tdx-tdp-mmu.rst @@ -0,0 +1,417 @@ +.. 
SPDX-License-Identifier: GPL-2.0 + +Design of TDP MMU for TDX support +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D +This document describes a (high-level) design for TDX support in the KVM TDP MMU +of x86 KVM. + +In this document, we use "TD" or "guest TD" to differentiate it from the current +"VM" (Virtual Machine), which is supported by KVM today. + + +Background of TDX +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D +TD private memory is designed to hold TD private content, encrypted by the CPU +using the TD ephemeral key. An encryption engine holds a table of encryption +keys, and an encryption key is selected for each memory transaction based on a +Host Key Identifier (HKID). By design, the host VMM does not have access to the +encryption keys. + +In the first generation of MKTME, the HKID is "stolen" from the physical address by +allocating a configurable number of bits from the top of the physical address. +The HKID space is partitioned into shared HKIDs for legacy MKTME accesses and +private HKIDs for SEAM-mode-only accesses. We use 0 for the shared HKID on the +host so that MKTME can be opaque or bypassed on the host. + +During TDX non-root operation (i.e. guest TD), memory accesses can be qualified +as either shared or private, based on the value of a new SHARED bit in the Guest +Physical Address (GPA). The CPU translates shared GPAs using the usual VMX EPT +(Extended Page Table) or "Shared EPT" (in this document), which resides in the +host VMM memory. The Shared EPT is directly managed by the host VMM - the same +as with the current VMX. Since guest TDs usually require I/O and the data +exchange needs to be done via shared memory, KVM needs to use the current +EPT functionality even for TDs. + +The CPU translates private GPAs using a separate Secure EPT. The Secure EPT +pages are encrypted and integrity-protected with the TD's ephemeral private key. +The Secure EPT can be managed _indirectly_ by the host VMM, using the TDX interface +functions (SEAMCALLs), and thus conceptually the Secure EPT is a subset of EPT +because not all functionalities are available. + +Since the execution of such interface functions takes much longer than +accessing memory directly, in KVM we use the existing TDP code to mirror the +Secure EPT for the TD. There are at least two options today in +terms of the timing for executing such SEAMCALLs: + +1. synchronous, i.e. while walking the TDP page tables, or +2. post-walk, i.e. record what needs to be done to the real Secure EPT during + the walk, and execute SEAMCALLs later. + +Option 1 seems to be more intuitive and simpler, but the Secure EPT +concurrency rules are different from those of the TDP MMU or EPT. For example, +TDH.MEM.SEPT.RD() acquires shared access to the whole Secure EPT tree of the target +TD. + +Secure EPT (SEPT) operations +--------------------------- +Secure EPT is an Extended Page Table for GPA-to-HPA translation of TD private +GPAs. A Secure EPT is designed to be encrypted with the TD's ephemeral private +key. SEPT pages are allocated by the host VMM via Intel TDX functions, but their +content is intended to be hidden and is not architectural. + +Unlike the conventional EPT, the CPU can't directly read/write its entries. +Instead, the TDX SEAMCALL API is used. Several SEAMCALLs correspond to operations on +the EPT entries. + +* TDH.MEM.SEPT.ADD(): + + Add a secure EPT page to the secure EPT tree.
Secure EPT (SEPT) operations
----------------------------
The Secure EPT is an Extended Page Table for translating TD private GPAs to
HPAs.  A Secure EPT is designed to be encrypted with the TD's ephemeral
private key.  SEPT pages are allocated by the host VMM via Intel TDX
functions, but their content is intended to be hidden and is not
architectural.

Unlike the conventional EPT, the CPU can't directly read or write Secure EPT
entries.  Instead, the TDX SEAMCALL API is used; several SEAMCALLs
correspond to operations on EPT entries:

* TDH.MEM.SEPT.ADD():

  Add a Secure EPT page to the Secure EPT tree.  This corresponds to
  updating a non-leaf EPT entry with the present bit set.

* TDH.MEM.SEPT.REMOVE():

  Remove a Secure EPT page from the Secure EPT tree.  There is no
  corresponding EPT operation.

* TDH.MEM.SEPT.RD():

  Read a Secure EPT entry.  This corresponds to reading an EPT entry as
  memory.  Note that this is much slower than a direct memory read.

* TDH.MEM.PAGE.ADD() and TDH.MEM.PAGE.AUG():

  Add a private page to the Secure EPT tree.  This corresponds to updating a
  leaf EPT entry with the present bit set.

* TDH.MEM.PAGE.REMOVE():

  Remove a private page from the Secure EPT tree.  There is no corresponding
  EPT operation.

* TDH.MEM.RANGE.BLOCK():

  This (mostly) corresponds to clearing the present bit of a leaf EPT entry.
  Note that the private page is still linked in the Secure EPT.  To remove
  it from the Secure EPT, TDH.MEM.SEPT.REMOVE() or TDH.MEM.PAGE.REMOVE()
  needs to be called.

* TDH.MEM.TRACK():

  Increment the TLB epoch counter.  This (mostly) corresponds to an EPT TLB
  flush.  Note that the private page is still linked in the Secure EPT.  To
  remove it from the Secure EPT, TDH.MEM.PAGE.REMOVE() needs to be called.


Adding private page
-------------------
The procedure to populate a private page looks as follows (see the sketch
after the next section):

1. TDH.MEM.SEPT.ADD(512G level)
2. TDH.MEM.SEPT.ADD(1G level)
3. TDH.MEM.SEPT.ADD(2M level)
4. TDH.MEM.PAGE.AUG(4K level)

Those operations correspond to updating the EPT entries.

Dropping private page and TLB shootdown
---------------------------------------
The procedure to drop a private page looks as follows:

1. TDH.MEM.RANGE.BLOCK(4K level)

   This mostly corresponds to clearing the present bit in the EPT entry.  It
   prevents new TLB entries from being created in the future.  Note that the
   private page is still linked in the Secure EPT tree and existing TLB
   entries aren't flushed.

2. TDH.MEM.TRACK(range) and TLB shootdown

   This mostly corresponds to the EPT TLB shootdown.  Because all vcpus
   share the same Secure EPT, all vcpus need to flush their TLBs.

   * One vcpu issues TDH.MEM.TRACK(range), which increments the global
     internal TLB epoch counter.
   * Send IPIs to the remote vcpus.
   * Each remote vcpu exits to the VMM from the guest TD and then re-enters
     via TDH.VP.ENTER().
   * TDH.VP.ENTER() checks the TLB epoch counter and, if its TLB is stale,
     flushes it.

   Note that only a single vcpu issues TDH.MEM.TRACK().

   Note that the private page is still linked in the Secure EPT tree, unlike
   the conventional EPT.

3. TDH.MEM.PAGE.PROMOTE(), TDH.MEM.PAGE.DEMOTE(), TDH.MEM.PAGE.RELOCATE(),
   or TDH.MEM.PAGE.REMOVE()

   There is no corresponding operation in the conventional EPT.

   * When changing the page size (e.g. 4K <-> 2M), TDH.MEM.PAGE.PROMOTE() or
     TDH.MEM.PAGE.DEMOTE() is used.  During those operations, the guest page
     remains referenced in the Secure EPT.
   * When migrating a page, TDH.MEM.PAGE.RELOCATE() is used.  It requires
     both a source page and a destination page.
   * When destroying a TD, TDH.MEM.PAGE.REMOVE() removes the private page
     from the Secure EPT tree.  In this case, a TLB shootdown is not needed
     because the vcpus no longer run.
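
The add and drop procedures above can be sketched as follows.  The wrappers
tdh_mem_sept_add(), tdh_mem_page_aug(), tdh_mem_range_block(),
tdh_mem_track() and tdh_mem_page_remove(), the helper tdx_alloc_sept_page(),
and their simplified signatures are all assumptions for illustration; the
real SEAMCALL wrappers take more operands and return detailed status codes.
::

    /*
     * Populate a 4K private page: add the missing non-leaf SEPT pages from
     * the 512G level down to the 2M level, then install the leaf mapping.
     * Error unwinding and the extra SEAMCALL operands are omitted.
     */
    static int tdx_sept_populate_4k(struct kvm *kvm, gpa_t gpa, hpa_t hpa)
    {
            int level;

            for (level = PG_LEVEL_512G; level >= PG_LEVEL_2M; level--) {
                    hpa_t sept_page = tdx_alloc_sept_page(kvm);

                    if (tdh_mem_sept_add(kvm, gpa, level, sept_page))
                            return -EIO;
            }
            /* Leaf entry for a running TD. */
            return tdh_mem_page_aug(kvm, gpa, hpa) ? -EIO : 0;
    }

    /* Drop a 4K private page: block, track plus shootdown, then remove. */
    static int tdx_sept_drop_4k(struct kvm *kvm, gpa_t gpa, hpa_t hpa)
    {
            if (tdh_mem_range_block(kvm, gpa, PG_LEVEL_4K))
                    return -EIO;
            tdh_mem_track(kvm);             /* bump the TLB epoch counter */
            kvm_flush_remote_tlbs(kvm);     /* IPIs; vcpus flush on re-entry */
            return tdh_mem_page_remove(kvm, gpa, hpa) ? -EIO : 0;
    }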
The basic idea for TDX support
==============================
Because the Shared EPT is the same as the existing EPT, use the existing
logic for the Shared EPT.  The Secure EPT, on the other hand, requires
additional operations instead of directly reading/writing EPT entries.

On an EPT violation, the KVM MMU walks down the EPT tree from the root,
determines the EPT entry to operate on, and updates the entry.  If
necessary, a TLB shootdown is done.  Because it's very slow to walk the
Secure EPT directly with the TDX SEAMCALL TDH.MEM.SEPT.RD(), a mirror of the
Secure EPT is created and maintained, and hooks are added to the KVM MMU so
that the existing code can be reused.

EPT violation on shared GPA
---------------------------
(1) EPT violation on a shared GPA, or zapping a shared GPA
    ::

        walk down the shared EPT tree (the existing code)
                        |
                        V
        shared EPT tree (the CPU refers to it)

(2) Update the EPT entry (the existing code).  TLB shootdown in the case of
    zapping.


EPT violation on private GPA
----------------------------
(1) EPT violation on a private GPA, or zapping a private GPA
    ::

        walk down the mirror of the Secure EPT tree
        (mostly the same as the existing code)
                        |
                        V
        mirror of the Secure EPT tree
        (KVM MMU software only; reuses the existing code)

(2) Update the (mirrored) EPT entry (mostly the same as the existing code).

(3) Call the hooks with the changed EPT entry
    ::

                        |
                NEW: hooks in KVM MMU
                        |
                        V
        Secure EPT root (the CPU refers to it)

(4) The TDX backend calls the necessary TDX SEAMCALLs to update the real
    Secure EPT.

The major modification is to add hooks for the TDX backend for the
additional operations, to pass down which EPT (shared EPT or private EPT) is
being used, and to twist the behavior when operating on the private EPT.

The following depicts the relationship.
::

                    KVM                      |    TDX module
                     |                       |        |
            ---------+----------             |        |
            |                  |             |        |
            V                  V             |        |
        shared GPA         private GPA       |        |
     CPU shared EPT     KVM private EPT      |  CPU secure EPT
        pointer            pointer           |     pointer
            |                  |             |        |
            V                  V             |        V
        shared EPT        private EPT <------mirror------> Secure EPT
            |                  |             |        |
            |                  \-------------+---\    |
            |                                |   |    |
            V                                |   V    V
     shared guest page                       | private guest page
                                             |
     non-encrypted memory                    |  encrypted memory
                                             |

shared EPT: the CPU and KVM walk it with a shared GPA.
    Maintained by the existing code.
private EPT: KVM walks it with a private GPA.
    Maintained by the twisted existing code.
secure EPT: the CPU walks it with a private GPA.
    Maintained by the TDX module with TDX SEAMCALLs via hooks.


Tracking private EPT pages
==========================
Shared EPT pages are managed by struct kvm_mmu_page.  They are linked in a
list structure, and when necessary, the list is traversed to operate on
them.  Private EPT pages have different characteristics.  For example,
private pages can't be swapped out; when shrinking memory, we'd like to
traverse only shared EPT pages and skip private EPT pages.  Likewise, page
migration isn't supported for private pages (yet).  Introduce an additional
list so that shared EPT pages and private EPT pages are tracked
independently.

At the beginning of an EPT violation, the fault handler knows the faulting
GPA, and thus it knows which EPT to operate on, private or shared.  If it's
the private EPT, an additional task is done, something like
"if (private) { callback a hook }".  Since the fault handler has deep call
chains, it's cumbersome to carry the information of which EPT is being
operated on.  Options to mitigate this are:

1. Pass the information as an argument through the function calls.
2. Record the information in struct kvm_mmu_page somehow.
3. Record the information in the vcpu structure.

Option 2 was chosen, because option 1 would require modifying all the
functions, which would badly affect the normal (non-TDX) case, and option 3
doesn't work well because in some cases we need to walk both the private and
the shared EPT.

For option 2, the role of the EPT page can be utilized: one bit can be
carved out from the unused bits in struct kvm_mmu_page_role.  The
information is initialized when the EPT page is allocated.  struct
kvm_mmu_page is available in most places because we're operating on EPT
pages.
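
A minimal sketch of option 2 follows.  The bit name is_private, its
position, and the accessor are assumptions for illustration; the real
kvm_mmu_page_role layout has many more fields.
::

    /* Assumed: one spare bit in the 32-bit kvm_mmu_page_role word. */
    union kvm_mmu_page_role {
            u32 word;
            struct {
                    unsigned level:4;
                    unsigned is_private:1;  /* page mirrors the Secure EPT */
                    /* the existing role bits are elided */
            };
    };

    /*
     * struct kvm_mmu_page embeds the role, so deep callees can test the
     * page itself instead of a flag threaded through every call.
     */
    static inline bool is_private_sp(const struct kvm_mmu_page *sp)
    {
            return sp->role.is_private;
    }

Because the role is fixed when the page is allocated, no extra state needs
to be passed down the fault handler's call chain.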
The conversion of private GPA and shared GPA
============================================
A page at a given GPA can be assigned as either private or shared, but only
one of the two at a time; it can't be accessed simultaneously via both a
private GPA and a shared GPA.  On guest startup, all GPAs are assigned as
private.  The guest converts a range of GPAs from private to shared (or from
shared to private) with the MapGPA hypercall, which takes the start GPA and
the size of the region.  If the given start GPA is shared, the VMM converts
the region into shared (if it's already shared, this is a nop).  If the
start GPA is private, the VMM converts the region into private.  This
implies that the guest won't access the region in its previous state,
private or shared, after converting it to the other state.

If the guest TD triggers an EPT violation on an already-converted region,
the access isn't allowed (it loops in EPT violation) until another vcpu
converts the region back.

The KVM MMU records, in an xarray, whether each GPA is allowed to be
accessed as private or as shared.


The original TDP MMU and race conditions
========================================
Because the vcpus share the EPT, once an EPT entry is zapped, we need a TLB
shootdown: send IPIs to the remote vcpus, and the remote vcpus flush their
own TLBs.  Until the TLB shootdown is done, vcpus may still reference the
zapped guest page.

The TDP MMU uses the read lock of mmu_lock to mitigate vcpu contention; when
the read lock is held, it relies on atomic updates of the EPT entries.  (The
legacy MMU uses the write lock instead.)  When a vcpu is populating or
zapping an EPT entry with the read lock held, another vcpu may be populating
or zapping the same EPT entry at the same time.

To avoid this race condition, the entry is frozen: the EPT entry is set to
the special value REMOVED_SPTE, which has the present bit cleared.  Then,
after the TLB shootdown, the EPT entry is set to its final value.

Concurrent zapping
------------------
1. read lock
2. freeze the EPT entry (atomically set the value to REMOVED_SPTE)

   If another vcpu froze the entry, restart the page fault.
3. TLB shootdown

   * send IPIs to the remote vcpus
   * TLB flush (local and remote)

   For each entry update, a TLB shootdown is needed because of the
   concurrency.
4. atomically set the EPT entry to the final value
5. read unlock

Concurrent populating
---------------------
In the case of populating a non-present EPT entry, update the EPT entry
atomically.

1. read lock
2. atomically update the EPT entry

   If another vcpu froze or updated the entry, restart the page fault.
3. read unlock

In the case of updating a present EPT entry (e.g. page migration), the
operation is split into two: zapping the entry and then populating it.

1. read lock
2. zap the EPT entry, following the concurrent zapping case
3. populate the non-present EPT entry
4. read unlock
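
A minimal sketch of the freeze step shared by the protocols above, assuming
64-bit SPTEs updated with a compare-exchange; the REMOVED_SPTE value here is
an illustrative non-present marker, not necessarily the exact constant KVM
uses.
::

    #define REMOVED_SPTE    0x5a0ULL        /* illustrative marker value */

    /*
     * Try to freeze an SPTE under the mmu_lock read lock.  Returns true if
     * this vcpu won the race and may proceed with the TLB shootdown; false
     * means another vcpu is already operating on the entry, and the page
     * fault should be retried.
     */
    static bool try_freeze_spte(u64 *sptep, u64 old_spte)
    {
            /* Only one vcpu can move the entry from old_spte to frozen. */
            return try_cmpxchg64(sptep, &old_spte, REMOVED_SPTE);
    }

In the concurrent zapping case, the vcpu that wins try_freeze_spte()
performs the TLB shootdown and then writes the final value; a vcpu that
loses simply restarts the page fault.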
Non-concurrent batched zapping
------------------------------
In some cases, zapping a range of entries is done exclusively, with the
write lock held.  In this case, the TLB shootdown is batched into one.

1. write lock
2. zap the EPT entries by traversing them
3. TLB shootdown
4. write unlock

For the Secure EPT, TDX SEAMCALLs are needed in addition to updating the
mirrored EPT entries.

TDX concurrent zapping
----------------------
Add a hook for the TDX SEAMCALLs at the TLB shootdown step.

1. read lock
2. freeze the EPT entry (set the value to REMOVED_SPTE)
3. TLB shootdown via a hook

   * TDH.MEM.RANGE.BLOCK()
   * TDH.MEM.TRACK()
   * send IPIs to the remote vcpus

4. set the EPT entry to the final value
5. read unlock

TDX concurrent populating
-------------------------
TDX SEAMCALLs are required in addition to operating on the mirrored EPT
entry.  The frozen entry is utilized, following the zapping case, to avoid
the race condition, and a hook is added.

1. read lock
2. freeze the EPT entry
3. hook

   * TDH.MEM.SEPT.ADD() for a non-leaf entry, or TDH.MEM.PAGE.AUG() for a
     leaf entry

4. set the EPT entry to the final value
5. read unlock

Without freezing the entry, the following race can happen.  Suppose two
vcpus are faulting on the same GPA and the 2M and 4K level entries aren't
populated yet.

* vcpu 1: updates the 2M level EPT entry
* vcpu 2: updates the 4K level EPT entry
* vcpu 2: TDX SEAMCALL to update the 4K Secure EPT entry => error
* vcpu 1: TDX SEAMCALL to update the 2M Secure EPT entry
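
A minimal sketch of how the populate hook slots into the frozen window,
reusing the illustrative helpers from the earlier sketches; the function
name and signature are assumptions, not the final KVM interface.
::

    /*
     * Populate path for the private EPT mirror (steps 2-4 above).  The
     * SEAMCALL runs while the mirrored entry is frozen, so a concurrent
     * fault on the same GPA retries instead of issuing a conflicting
     * SEAMCALL (the 2M vs 4K race described above).
     */
    static int tdx_populate_private_spte(struct kvm *kvm, u64 *sptep,
                                         u64 old_spte, u64 new_spte,
                                         gpa_t gpa, int level, hpa_t hpa)
    {
            if (!try_freeze_spte(sptep, old_spte))          /* step 2 */
                    return RET_PF_RETRY;

            if (level > PG_LEVEL_4K)                        /* step 3 */
                    tdh_mem_sept_add(kvm, gpa, level, hpa);
            else
                    tdh_mem_page_aug(kvm, gpa, hpa);

            WRITE_ONCE(*sptep, new_spte);                   /* step 4 */
            return 0;
    }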
TDX non-concurrent batched zapping
----------------------------------
For simplicity, the procedure of concurrent populating is reused.  The
procedure can be optimized later.


Co-existing with unmapping guest private memory
===============================================
TODO.  This needs to be addressed.


Restrictions or future work
===========================
The following features aren't supported yet:

* optimizing non-concurrent zapping
* large pages
* page migration
--
2.25.1

From nobody Thu Apr 25 13:29:40 2024
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini,
    erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [PATCH v9 105/105] [MARKER] the end of (the first phase of) TDX KVM patch series
Date: Fri, 30 Sep 2022 03:18:39 -0700
Message-Id: <981f4f50663e0e86f6cfebef0c3ea0fe662f9be3.1664530908.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

This empty commit marks the end of (the first phase of) the TDX KVM support
patch series.

Signed-off-by: Isaku Yamahata
---
 .../virt/kvm/intel-tdx-layer-status.rst | 33 -------------------
 1 file changed, 33 deletions(-)
 delete mode 100644 Documentation/virt/kvm/intel-tdx-layer-status.rst

diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentation/virt/kvm/intel-tdx-layer-status.rst
deleted file mode 100644
index 1cec14213f69..000000000000
--- a/Documentation/virt/kvm/intel-tdx-layer-status.rst
+++ /dev/null
@@ -1,33 +0,0 @@
.. SPDX-License-Identifier: GPL-2.0

===================================
Intel Trust Domain Extensions (TDX)
===================================

Layer status
============
What qemu can do
----------------
- TDX VM TYPE is exposed to Qemu.
- Qemu can create/destroy a guest of TDX VM type.
- Qemu can create/destroy vcpus of TDX VM type.
- Qemu can populate the initial guest memory image.
- Qemu can finalize the guest TD.
- Qemu can start to run vcpus.  But the vcpus cannot make progress yet.

Patch Layer status
------------------
  Patch layer                           Status
* TDX, VMX coexistence:                 Applied
* TDX architectural definitions:        Applied
* TD VM creation/destruction:           Applied
* TD vcpu creation/destruction:         Applied
* TDX EPT violation:                    Applied
* TD finalization:                      Applied
* TD vcpu enter/exit:                   Applied
* TD vcpu interrupts/exit/hypercall:    Not yet

* KVM MMU GPA shared bits:              Applied
* KVM TDP refactoring for TDX:          Applied
* KVM TDP MMU hooks:                    Applied
* KVM TDP MMU MapGPA:                   Applied
--
2.25.1