From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 001/104] KVM: x86: Move check_processor_compatibility from init ops to runtime ops
Date: Thu, 5 May 2022 11:13:55 -0700

From: Chao Gao

Move check_processor_compatibility from kvm_x86_init_ops to kvm_x86_ops
so that KVM can do compatibility checks on hotplugged CPUs. Drop __init
from check_processor_compatibility() and its callees, and use a
static_call() to invoke .check_processor_compatibility.

Opportunistically rename {svm,vmx}_check_processor_compat to conform to
the naming convention of fields of kvm_x86_ops.
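For reference, the static_call() mechanism relied on here reduces to the
following self-contained sketch. The my_check_compat/vendor_check names are
illustrative stand-ins only; in KVM the call key is generated from the
KVM_X86_OP() entry in kvm-x86-ops.h and retargeted when the vendor module
installs its kvm_x86_ops.

  #include <linux/errno.h>
  #include <linux/static_call.h>

  /* Default target until a vendor module is loaded. */
  static int default_check(void)
  {
          return -EOPNOTSUPP;
  }
  DEFINE_STATIC_CALL(my_check_compat, default_check);

  /* Stand-in for a vendor-specific compatibility check (VMX/SVM probing). */
  static int vendor_check(void)
  {
          return 0;
  }

  static void install_vendor_ops(void)
  {
          /* Retarget the call site; KVM does this when kvm_x86_ops is set. */
          static_call_update(my_check_compat, vendor_check);
  }

  static int check_this_cpu(void)
  {
          /* Expands to a patched direct call, not an indirect branch. */
          return static_call(my_check_compat)();
  }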
Signed-off-by: Chao Gao
Reviewed-by: Sean Christopherson
Link: https://lore.kernel.org/r/20220216031528.92558-2-chao.gao@intel.com
Signed-off-by: Isaku Yamahata
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  2 +-
 arch/x86/kvm/svm/svm.c             |  4 ++--
 arch/x86/kvm/vmx/evmcs.c           |  2 +-
 arch/x86/kvm/vmx/evmcs.h           |  2 +-
 arch/x86/kvm/vmx/vmx.c             | 14 +++++++-------
 arch/x86/kvm/x86.c                 |  3 +--
 7 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 6f2f1affbb78..75bc44aa8d51 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -129,6 +129,7 @@ KVM_X86_OP(msr_filter_changed)
 KVM_X86_OP(complete_emulated_msr)
 KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
+KVM_X86_OP(check_processor_compatibility)

 #undef KVM_X86_OP
 #undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bd8ac9498f1f..5bea36d2d5a4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1331,6 +1331,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
 struct kvm_x86_ops {
 	const char *name;

+	int (*check_processor_compatibility)(void);
 	int (*hardware_enable)(void);
 	void (*hardware_disable)(void);
 	void (*hardware_unsetup)(void);
@@ -1541,7 +1542,6 @@ struct kvm_x86_nested_ops {
 struct kvm_x86_init_ops {
 	int (*cpu_has_kvm_support)(void);
 	int (*disabled_by_bios)(void);
-	int (*check_processor_compatibility)(void);
 	int (*hardware_setup)(void);
 	unsigned int (*handle_intel_pt_intr)(void);

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 63880b33ce37..d977e4ad133d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3985,7 +3985,7 @@ svm_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
 	hypercall[2] = 0xd9;
 }

-static int __init svm_check_processor_compat(void)
+static int svm_check_processor_compatibility(void)
 {
 	return 0;
 }
@@ -4603,6 +4603,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.name = "kvm_amd",

 	.hardware_unsetup = svm_hardware_unsetup,
+	.check_processor_compatibility = svm_check_processor_compatibility,
 	.hardware_enable = svm_hardware_enable,
 	.hardware_disable = svm_hardware_disable,
 	.has_emulated_msr = svm_has_emulated_msr,
@@ -4989,7 +4990,6 @@ static struct kvm_x86_init_ops svm_init_ops __initdata = {
 	.cpu_has_kvm_support = has_svm,
 	.disabled_by_bios = is_disabled,
 	.hardware_setup = svm_hardware_setup,
-	.check_processor_compatibility = svm_check_processor_compat,

 	.runtime_ops = &svm_x86_ops,
 	.pmu_ops = &amd_pmu_ops,
diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
index 6a61b1ae7942..3f84680c8139 100644
--- a/arch/x86/kvm/vmx/evmcs.c
+++ b/arch/x86/kvm/vmx/evmcs.c
@@ -295,7 +295,7 @@ const struct evmcs_field vmcs_field_to_evmcs_1[] = {
 const unsigned int nr_evmcs_1_fields = ARRAY_SIZE(vmcs_field_to_evmcs_1);

 #if IS_ENABLED(CONFIG_HYPERV)
-__init void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf)
+void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf)
 {
 	vmcs_conf->cpu_based_exec_ctrl &= ~EVMCS1_UNSUPPORTED_EXEC_CTRL;
 	vmcs_conf->pin_based_exec_ctrl &= ~EVMCS1_UNSUPPORTED_PINCTRL;
diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
index f886a8ff0342..276f788cef15 100644
--- a/arch/x86/kvm/vmx/evmcs.h
+++ b/arch/x86/kvm/vmx/evmcs.h
@@ -212,7 +212,7 @@ static inline void evmcs_load(u64 phys_addr)
 	vp_ap->enlighten_vmentry = 1;
 }

-__init void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf);
+void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf);
 #else /* !IS_ENABLED(CONFIG_HYPERV) */
 static __always_inline void evmcs_write64(unsigned long field, u64 value) {}
 static inline void evmcs_write32(unsigned long field, u32 value) {}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 26ec9b814651..e30493fe4553 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2394,8 +2394,8 @@ static bool cpu_has_sgx(void)
 	return cpuid_eax(0) >= 0x12 && (cpuid_eax(0x12) & BIT(0));
 }

-static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
-				      u32 msr, u32 *result)
+static int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
+			       u32 msr, u32 *result)
 {
 	u32 vmx_msr_low, vmx_msr_high;
 	u32 ctl = ctl_min | ctl_opt;
@@ -2413,7 +2413,7 @@ static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
 	return 0;
 }

-static __init u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr)
+static u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr)
 {
 	u64 allowed;

@@ -2422,8 +2422,8 @@ static __init u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr)
 	return ctl_opt & allowed;
 }

-static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
-				    struct vmx_capability *vmx_cap)
+static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
+			     struct vmx_capability *vmx_cap)
 {
 	u32 vmx_msr_low, vmx_msr_high;
 	u32 min, opt, min2, opt2;
@@ -7210,7 +7210,7 @@ static int vmx_vm_init(struct kvm *kvm)
 	return 0;
 }

-static int __init vmx_check_processor_compat(void)
+static int vmx_check_processor_compatibility(void)
 {
 	struct vmcs_config vmcs_conf;
 	struct vmx_capability vmx_cap;
@@ -7811,6 +7811,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {

 	.hardware_unsetup = vmx_hardware_unsetup,

+	.check_processor_compatibility = vmx_check_processor_compatibility,
 	.hardware_enable = vmx_hardware_enable,
 	.hardware_disable = vmx_hardware_disable,
 	.has_emulated_msr = vmx_has_emulated_msr,
@@ -8198,7 +8199,6 @@ static __init int hardware_setup(void)
 static struct kvm_x86_init_ops vmx_init_ops __initdata = {
 	.cpu_has_kvm_support = cpu_has_kvm_support,
 	.disabled_by_bios = vmx_disabled_by_bios,
-	.check_processor_compatibility = vmx_check_processor_compat,
 	.hardware_setup = hardware_setup,
 	.handle_intel_pt_intr = NULL,

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7344f75fdd45..09eb1bafd72f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11735,7 +11735,6 @@ void kvm_arch_hardware_unsetup(void)
 int kvm_arch_check_processor_compat(void *opaque)
 {
 	struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
-	struct kvm_x86_init_ops *ops = opaque;

 	WARN_ON(!irqs_disabled());

@@ -11743,7 +11742,7 @@ int kvm_arch_check_processor_compat(void *opaque)
 	    __cr4_reserved_bits(cpu_has, &boot_cpu_data))
 		return -EIO;

-	return ops->check_processor_compatibility();
+	return static_call(kvm_x86_check_processor_compatibility)();
 }

 bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)
--
2.25.1
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 002/104] Partially revert "KVM: Pass kvm_init()'s opaque param to additional arch funcs"
Date: Thu, 5 May 2022 11:13:56 -0700

From: Chao Gao

This partially reverts commit b99040853738 ("KVM: Pass kvm_init()'s opaque
param to additional arch funcs"): remove the opaque parameter from
kvm_arch_check_processor_compat() because nothing uses it anymore. Resolve
conflicts for ARM (due to file movement) and manually handle RISC-V, which
was added after that commit. The changes the original commit made to
kvm_arch_hardware_setup() are still needed, so they are not reverted.
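With the opaque cookie gone, the cross-CPU invocation in kvm_init() can pass
a bare int as the IPI payload, so the kvm_cpu_compat_check wrapper struct can
be deleted. A condensed view of the resulting pattern is below; it mirrors
the kvm_main.c hunk in this patch rather than introducing anything new, and
check_all_online_cpus() is a hypothetical wrapper name used for illustration.

  static void check_processor_compat(void *ret)
  {
          *(int *)ret = kvm_arch_check_processor_compat();
  }

  static int check_all_online_cpus(void)
  {
          int cpu, r = 0;

          for_each_online_cpu(cpu) {
                  /* Run the check on @cpu with IRQs off and wait for it. */
                  smp_call_function_single(cpu, check_processor_compat, &r, 1);
                  if (r < 0)
                          break;
          }
          return r;
  }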
Signed-off-by: Chao Gao
Reviewed-by: Sean Christopherson
Reviewed-by: Suzuki K Poulose
Acked-by: Anup Patel
Acked-by: Claudio Imbrenda
Link: https://lore.kernel.org/r/20220216031528.92558-3-chao.gao@intel.com
Signed-off-by: Isaku Yamahata
---
 arch/arm64/kvm/arm.c       |  2 +-
 arch/mips/kvm/mips.c       |  2 +-
 arch/powerpc/kvm/powerpc.c |  2 +-
 arch/riscv/kvm/main.c      |  2 +-
 arch/s390/kvm/kvm-s390.c   |  2 +-
 arch/x86/kvm/x86.c         |  2 +-
 include/linux/kvm_host.h   |  2 +-
 virt/kvm/kvm_main.c        | 16 +++-------------
 8 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 7fceb855fa71..c8206819c858 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -68,7 +68,7 @@ int kvm_arch_hardware_setup(void *opaque)
 	return 0;
 }

-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
 {
 	return 0;
 }
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index a25e0b73ee70..092d09fb6a7e 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -140,7 +140,7 @@ int kvm_arch_hardware_setup(void *opaque)
 	return 0;
 }

-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
 {
 	return 0;
 }
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 533c4232e5ab..84e1d81ede66 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -445,7 +445,7 @@ int kvm_arch_hardware_setup(void *opaque)
 	return 0;
 }

-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
 {
 	return kvmppc_core_check_processor_compat();
 }
diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
index 2e5ca43c8c49..992877e78393 100644
--- a/arch/riscv/kvm/main.c
+++ b/arch/riscv/kvm/main.c
@@ -20,7 +20,7 @@ long kvm_arch_dev_ioctl(struct file *filp,
 	return -EINVAL;
 }

-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
 {
 	return 0;
 }
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index da7197dae8eb..0e5ef8f51da4 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -251,7 +251,7 @@ int kvm_arch_hardware_enable(void)
 	return 0;
 }

-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
 {
 	return 0;
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 09eb1bafd72f..65f725a49e67 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11732,7 +11732,7 @@ void kvm_arch_hardware_unsetup(void)
 	static_call(kvm_x86_hardware_unsetup)();
 }

-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
 {
 	struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 883e86ec8e8c..1aead3921a16 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1442,7 +1442,7 @@ int kvm_arch_hardware_enable(void);
 void kvm_arch_hardware_disable(void);
 int kvm_arch_hardware_setup(void *opaque);
 void kvm_arch_hardware_unsetup(void);
-int kvm_arch_check_processor_compat(void *opaque);
+int kvm_arch_check_processor_compat(void);
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
 bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4b7a0005f5c6..ec365291c625 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5681,22 +5681,14 @@ void kvm_unregister_perf_callbacks(void)
 }
 #endif

-struct kvm_cpu_compat_check {
-	void *opaque;
-	int *ret;
-};
-
-static void check_processor_compat(void *data)
+static void check_processor_compat(void *rtn)
 {
-	struct kvm_cpu_compat_check *c = data;
-
-	*c->ret = kvm_arch_check_processor_compat(c->opaque);
+	*(int *)rtn = kvm_arch_check_processor_compat();
 }

 int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
 	     struct module *module)
 {
-	struct kvm_cpu_compat_check c;
 	int r;
 	int cpu;

@@ -5724,10 +5716,8 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
 	if (r < 0)
 		goto out_free_1;

-	c.ret = &r;
-	c.opaque = opaque;
 	for_each_online_cpu(cpu) {
-		smp_call_function_single(cpu, check_processor_compat, &c, 1);
+		smp_call_function_single(cpu, check_processor_compat, &r, 1);
 		if (r < 0)
 			goto out_free_2;
 	}
--
2.25.1

From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 003/104] KVM: Refactor CPU compatibility check on module initialization
Date: Thu, 5 May 2022 11:13:57 -0700

From: Isaku Yamahata

Based on code inspection, non-x86 arches are not broken by this change,
but that is inspection only; it should be reviewed by the respective arch
maintainers.

kvm_init() checks CPU compatibility by calling
kvm_arch_check_processor_compat() on all online CPUs. Move that callback
into hardware_enable_nolock() and have kvm_init() invoke it via
hardware_enable_all() and hardware_disable_all(). Add an arch-specific
callback, kvm_arch_post_hardware_enable_setup(), for arch code to do
initialization that requires hardware_enable_all(). This makes room for
the TDX module to be initialized during kvm module loading; the TDX module
requires all online CPUs to have enabled VMX via VMXON.

If kvm_arch_hardware_enable/disable() depends on the post-kvm_init() arch
initialization, marked (*) below, that initialization must now be done
before kvm_init(). kvm_intel in fact has such a dependency; per the code
inspection summarized below, the other arches do not, but this should be
reviewed by the respective arch maintainers.

Before this patch:
- Arch module initialization
  - kvm_init()
    - kvm_arch_init()
    - kvm_arch_check_processor_compat() on each CPU
  - post-arch-specific initialization           ---- (*)
- When creating/deleting the first/last VM
  - kvm_arch_hardware_enable() on each CPU      --- (A)
  - kvm_arch_hardware_disable() on each CPU     --- (B)

After this patch:
- Arch module initialization
  - kvm_init()
    - kvm_arch_init()
    - kvm_arch_hardware_enable() on each CPU        (A)
    - kvm_arch_check_processor_compat() on each CPU
    - kvm_arch_hardware_disable() on each CPU       (B)
  - post-arch-specific initialization           --- (*)

Code inspection result: (A)/(B) can depend on (*) before this patch. Where
such a dependency exists, the initialization must be moved before
kvm_init(); VMX in fact needs this. Of the other arches, only mips does
extra work after kvm_init():

- arch/mips/kvm/mips.c: the module init function, kvm_mips_init(), does
  some initialization after kvm_init(). Compile-tested only; needs review.
- arch/x86/kvm/x86.c: uses vm_list, which is statically initialized, and
  static_call(kvm_x86_hardware_enable)().
  - SVM: (*) is empty.
  - VMX: needs a fix.
- arch/powerpc/kvm/powerpc.c: kvm_arch_hardware_enable/disable() are nops.
- arch/s390/kvm/kvm-s390.c: kvm_arch_hardware_enable/disable() are nops.
- arch/arm64/kvm/arm.c: the module init function, arm_init(), calls only
  kvm_init(); (*) is empty.
- arch/riscv/kvm/main.c: the module init function, riscv_kvm_init(), calls
  only kvm_init(); (*) is empty.

Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
---
 arch/mips/kvm/mips.c   | 12 +++++++-----
 arch/x86/kvm/vmx/vmx.c | 15 +++++++++++----
 virt/kvm/kvm_main.c    | 20 ++++++++++----------
 3 files changed, 28 insertions(+), 19 deletions(-)

diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 092d09fb6a7e..17228584485d 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -1643,11 +1643,6 @@ static int __init kvm_mips_init(void)
 	}

 	ret = kvm_mips_entry_setup();
-	if (ret)
-		return ret;
-
-	ret = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
-
 	if (ret)
 		return ret;

@@ -1656,6 +1651,13 @@ static int __init kvm_mips_init(void)

 	register_die_notifier(&kvm_mips_csr_die_notifier);

+	ret = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
+
+	if (ret) {
+		unregister_die_notifier(&kvm_mips_csr_die_notifier);
+		return ret;
+	}
+
 	return 0;
 }

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e30493fe4553..9bc46c1e64d9 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8254,6 +8254,15 @@ static void vmx_exit(void)
 }
 module_exit(vmx_exit);

+/* initialize before kvm_init() so that hardware_enable/disable() can work. */
+static void __init vmx_init_early(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
+}
+
 static int __init vmx_init(void)
 {
 	int r, cpu;
@@ -8291,6 +8300,7 @@ static int __init vmx_init(void)
 	}
 #endif

+	vmx_init_early();
 	r = kvm_init(&vmx_init_ops, sizeof(struct vcpu_vmx),
 		     __alignof__(struct vcpu_vmx), THIS_MODULE);
 	if (r)
@@ -8309,11 +8319,8 @@ static int __init vmx_init(void)
 		return r;
 	}

-	for_each_possible_cpu(cpu) {
-		INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
-
+	for_each_possible_cpu(cpu)
 		pi_init_cpu(cpu);
-	}

 #ifdef CONFIG_KEXEC_CORE
 	rcu_assign_pointer(crash_vmclear_loaded_vmcss,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ec365291c625..0ff03889aa5d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4883,8 +4883,13 @@ static void hardware_enable_nolock(void *junk)

 	cpumask_set_cpu(cpu, cpus_hardware_enabled);

+	r = kvm_arch_check_processor_compat();
+	if (r)
+		goto out;
+
 	r = kvm_arch_hardware_enable();

+out:
 	if (r) {
 		cpumask_clear_cpu(cpu, cpus_hardware_enabled);
 		atomic_inc(&hardware_enable_failed);
@@ -5681,11 +5686,6 @@ void kvm_unregister_perf_callbacks(void)
 }
 #endif

-static void check_processor_compat(void *rtn)
-{
-	*(int *)rtn = kvm_arch_check_processor_compat();
-}
-
 int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
 	     struct module *module)
 {
@@ -5716,11 +5716,11 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
 	if (r < 0)
 		goto out_free_1;

-	for_each_online_cpu(cpu) {
-		smp_call_function_single(cpu, check_processor_compat, &r, 1);
-		if (r < 0)
-			goto out_free_2;
-	}
+	/* hardware_enable_nolock() checks CPU compatibility on each CPUs. */
+	r = hardware_enable_all();
+	if (r)
+		goto out_free_2;
+	hardware_disable_all();

 	r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_STARTING, "kvm/cpu:starting",
 				      kvm_starting_cpu, kvm_dying_cpu);
--
2.25.1
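To make the ordering constraint from the patch above concrete: once
kvm_init() enables hardware, anything kvm_arch_hardware_enable() touches
must already be set up. A condensed sketch of a vendor module init under
this rule follows; the vendor_* names are hypothetical placeholders, and
error handling is elided.

  static int __init vendor_module_init(void)
  {
          /*
           * (*) State used during hardware enabling, e.g. VMX's per-CPU
           * loaded_vmcss_on_cpu lists, must be initialized first ...
           */
          vendor_init_early();

          /*
           * ... because kvm_init() now runs kvm_arch_hardware_enable()
           * and the per-CPU compatibility check on every online CPU.
           */
          return kvm_init(&vendor_init_ops, sizeof(struct vendor_vcpu),
                          __alignof__(struct vendor_vcpu), THIS_MODULE);
  }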
d="scan'208";a="328746223" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:40 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083138" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:39 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 004/104] KVM: VMX: Move out vmx_x86_ops to 'main.c' to wrap VMX and TDX Date: Thu, 5 May 2022 11:13:58 -0700 Message-Id: <930c73f3f815500a9b3209c1a996733089ed256f.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson KVM accesses Virtual Machine Control Structure (VMCS) with VMX instructions to operate on VM. TDX defines its data structure and TDX SEAMCALL APIs for VMM to operate on Trust Domain (TD) instead. Trust Domain Virtual Processor State (TDVPS) is the root control structure of a TD VCPU. It helps the TDX module control the operation of the VCPU, and holds the VCPU state while the VCPU is not running. TDVPS is opaque to software and DMA access, accessible only by using the TDX module interface functions (such as TDH.VP.RD, TDH.VP.WR ,..). TDVPS includes TD VMCS, and TD VMCS auxiliary structures, such as virtual APIC page, virtualization exception information, etc. TDVPS is composed of Trust Domain Virtual Processor Root (TDVPR) which is the root page of TDVPS and Trust Domain Virtual Processor eXtension (TDVPX) pages which extend TDVPR to help provide enough physical space for the logical TDVPS structure. Also, we have a new structure, Trust Domain Control Structure (TDCS) is the main control structure of a guest TD, and encrypted (using the guest TD's ephemeral private key). At a high level, TDCS holds information for controlling TD operation as a whole, execution, EPTP, MSR bitmaps, etc. KVM needs to set it up. Note that MSR bitmaps are held as part of TDCS (unlike VMX) because they are meant to have the same value for all VCPUs of the same TD. TDCS is a multi-page logical structure composed of multiple Trust Domain Control Extension (TDCX) physical pages. Trust Domain Root (TDR) is the root control structure of a guest TD and is encrypted using the TDX global private key. It holds a minimal set of state variables that enable guest TD control even during times when the TD's private key is not known, or when the TD's key management state does not permit access to memory encrypted using the TD's private key. The following shows the relationship between those structures. TDR--> TDCS per-TD | \--> TDCX \ \--> TDVPS per-TD VCPU \--> TDVPR and TDVPX The existing global struct kvm_x86_ops already defines an interface which fits with TDX. But kvm_x86_ops is system-wide, not per-VM structure. To allow VMX to coexist with TDs, the kvm_x86_ops callbacks will have wrappers "if (tdx) tdx_op() else vmx_op()" to switch VMX or TDX at run time. 
To split the runtime switch, the VMX implementation, and the TDX implementation, add main.c, and move out the vmx_x86_ops hooks in preparation for adding TDX, which can coexist with VMX, i.e. KVM can run both VMs and TDs. Use 'vt' for the naming scheme as a nod to VT-x and as a concatenation of VmxTdx. The current code looks as follows. In vmx.c static vmx_op() { ... } static struct kvm_x86_ops vmx_x86_ops =3D { .op =3D vmx_op, initialization code The eventually converted code will look like In vmx.c, keep the VMX operations. vmx_op() { ... } VMX initialization In tdx.c, define the TDX operations. tdx_op() { ... } TDX initialization In x86_ops.h, declare the VMX and TDX operations. vmx_op(); tdx_op(); In main.c, define common wrappers for VMX and VMX. static vt_ops() { if (tdx) tdx_ops() else vmx_ops() } static struct kvm_x86_ops vt_x86_ops =3D { .op =3D vt_op, initialization to call VMX and TDX initialization Opportunistically, fix the name inconsistency from vmx_create_vcpu() and vmx_free_vcpu() to vmx_vcpu_create() and vxm_vcpu_free(). Co-developed-by: Xiaoyao Li Signed-off-by: Xiaoyao Li Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/Makefile | 2 +- arch/x86/kvm/vmx/main.c | 155 ++++++++++++++++ arch/x86/kvm/vmx/vmx.c | 363 +++++++++++-------------------------- arch/x86/kvm/vmx/x86_ops.h | 125 +++++++++++++ 4 files changed, 386 insertions(+), 259 deletions(-) create mode 100644 arch/x86/kvm/vmx/main.c create mode 100644 arch/x86/kvm/vmx/x86_ops.h diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index 30f244b64523..ee4d0999f20f 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -22,7 +22,7 @@ kvm-$(CONFIG_X86_64) +=3D mmu/tdp_iter.o mmu/tdp_mmu.o kvm-$(CONFIG_KVM_XEN) +=3D xen.o =20 kvm-intel-y +=3D vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \ - vmx/evmcs.o vmx/nested.o vmx/posted_intr.o + vmx/evmcs.o vmx/nested.o vmx/posted_intr.o vmx/main.o kvm-intel-$(CONFIG_X86_SGX_KVM) +=3D vmx/sgx.o =20 kvm-amd-y +=3D svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o = svm/sev.o diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c new file mode 100644 index 000000000000..636768f5b985 --- /dev/null +++ b/arch/x86/kvm/vmx/main.c @@ -0,0 +1,155 @@ +// SPDX-License-Identifier: GPL-2.0 +#include + +#include "x86_ops.h" +#include "vmx.h" +#include "nested.h" +#include "pmu.h" + +struct kvm_x86_ops vt_x86_ops __initdata =3D { + .name =3D "kvm_intel", + + .hardware_unsetup =3D vmx_hardware_unsetup, + .check_processor_compatibility =3D vmx_check_processor_compatibility, + + .hardware_enable =3D vmx_hardware_enable, + .hardware_disable =3D vmx_hardware_disable, + .has_emulated_msr =3D vmx_has_emulated_msr, + + .vm_size =3D sizeof(struct kvm_vmx), + .vm_init =3D vmx_vm_init, + .vm_destroy =3D vmx_vm_destroy, + + .vcpu_precreate =3D vmx_vcpu_precreate, + .vcpu_create =3D vmx_vcpu_create, + .vcpu_free =3D vmx_vcpu_free, + .vcpu_reset =3D vmx_vcpu_reset, + + .prepare_switch_to_guest =3D vmx_prepare_switch_to_guest, + .vcpu_load =3D vmx_vcpu_load, + .vcpu_put =3D vmx_vcpu_put, + + .update_exception_bitmap =3D vmx_update_exception_bitmap, + .get_msr_feature =3D vmx_get_msr_feature, + .get_msr =3D vmx_get_msr, + .set_msr =3D vmx_set_msr, + .get_segment_base =3D vmx_get_segment_base, + .get_segment =3D vmx_get_segment, + .set_segment =3D vmx_set_segment, + .get_cpl =3D vmx_get_cpl, + .get_cs_db_l_bits =3D vmx_get_cs_db_l_bits, + .set_cr0 =3D vmx_set_cr0, + .is_valid_cr4 =3D vmx_is_valid_cr4, + .set_cr4 
+	.set_cr4 = vmx_set_cr4,
+	.set_efer = vmx_set_efer,
+	.get_idt = vmx_get_idt,
+	.set_idt = vmx_set_idt,
+	.get_gdt = vmx_get_gdt,
+	.set_gdt = vmx_set_gdt,
+	.set_dr7 = vmx_set_dr7,
+	.sync_dirty_debug_regs = vmx_sync_dirty_debug_regs,
+	.cache_reg = vmx_cache_reg,
+	.get_rflags = vmx_get_rflags,
+	.set_rflags = vmx_set_rflags,
+	.get_if_flag = vmx_get_if_flag,
+
+	.flush_tlb_all = vmx_flush_tlb_all,
+	.flush_tlb_current = vmx_flush_tlb_current,
+	.flush_tlb_gva = vmx_flush_tlb_gva,
+	.flush_tlb_guest = vmx_flush_tlb_guest,
+
+	.vcpu_pre_run = vmx_vcpu_pre_run,
+	.vcpu_run = vmx_vcpu_run,
+	.handle_exit = vmx_handle_exit,
+	.skip_emulated_instruction = vmx_skip_emulated_instruction,
+	.update_emulated_instruction = vmx_update_emulated_instruction,
+	.set_interrupt_shadow = vmx_set_interrupt_shadow,
+	.get_interrupt_shadow = vmx_get_interrupt_shadow,
+	.patch_hypercall = vmx_patch_hypercall,
+	.inject_irq = vmx_inject_irq,
+	.inject_nmi = vmx_inject_nmi,
+	.queue_exception = vmx_queue_exception,
+	.cancel_injection = vmx_cancel_injection,
+	.interrupt_allowed = vmx_interrupt_allowed,
+	.nmi_allowed = vmx_nmi_allowed,
+	.get_nmi_mask = vmx_get_nmi_mask,
+	.set_nmi_mask = vmx_set_nmi_mask,
+	.enable_nmi_window = vmx_enable_nmi_window,
+	.enable_irq_window = vmx_enable_irq_window,
+	.update_cr8_intercept = vmx_update_cr8_intercept,
+	.set_virtual_apic_mode = vmx_set_virtual_apic_mode,
+	.set_apic_access_page_addr = vmx_set_apic_access_page_addr,
+	.refresh_apicv_exec_ctrl = vmx_refresh_apicv_exec_ctrl,
+	.load_eoi_exitmap = vmx_load_eoi_exitmap,
+	.apicv_post_state_restore = vmx_apicv_post_state_restore,
+	.check_apicv_inhibit_reasons = vmx_check_apicv_inhibit_reasons,
+	.hwapic_irr_update = vmx_hwapic_irr_update,
+	.hwapic_isr_update = vmx_hwapic_isr_update,
+	.guest_apic_has_interrupt = vmx_guest_apic_has_interrupt,
+	.sync_pir_to_irr = vmx_sync_pir_to_irr,
+	.deliver_interrupt = vmx_deliver_interrupt,
+	.dy_apicv_has_pending_interrupt = pi_has_pending_interrupt,
+
+	.set_tss_addr = vmx_set_tss_addr,
+	.set_identity_map_addr = vmx_set_identity_map_addr,
+	.get_mt_mask = vmx_get_mt_mask,
+
+	.get_exit_info = vmx_get_exit_info,
+
+	.vcpu_after_set_cpuid = vmx_vcpu_after_set_cpuid,
+
+	.has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,
+
+	.get_l2_tsc_offset = vmx_get_l2_tsc_offset,
+	.get_l2_tsc_multiplier = vmx_get_l2_tsc_multiplier,
+	.write_tsc_offset = vmx_write_tsc_offset,
+	.write_tsc_multiplier = vmx_write_tsc_multiplier,
+
+	.load_mmu_pgd = vmx_load_mmu_pgd,
+
+	.check_intercept = vmx_check_intercept,
+	.handle_exit_irqoff = vmx_handle_exit_irqoff,
+
+	.request_immediate_exit = vmx_request_immediate_exit,
+
+	.sched_in = vmx_sched_in,
+
+	.cpu_dirty_log_size = PML_ENTITY_NUM,
+	.update_cpu_dirty_logging = vmx_update_cpu_dirty_logging,
+
+	.nested_ops = &vmx_nested_ops,
+
+	.pi_update_irte = vmx_pi_update_irte,
+	.pi_start_assignment = vmx_pi_start_assignment,
+
+#ifdef CONFIG_X86_64
+	.set_hv_timer = vmx_set_hv_timer,
+	.cancel_hv_timer = vmx_cancel_hv_timer,
+#endif
+
+	.setup_mce = vmx_setup_mce,
+
+	.smi_allowed = vmx_smi_allowed,
+	.enter_smm = vmx_enter_smm,
+	.leave_smm = vmx_leave_smm,
+	.enable_smi_window = vmx_enable_smi_window,
+
+	.can_emulate_instruction = vmx_can_emulate_instruction,
+	.apic_init_signal_blocked = vmx_apic_init_signal_blocked,
+	.migrate_timers = vmx_migrate_timers,
+
+	.msr_filter_changed = vmx_msr_filter_changed,
+	.complete_emulated_msr = kvm_complete_insn_gp,
+
+	.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
+};
+
+struct kvm_x86_init_ops vt_init_ops __initdata = {
+	.cpu_has_kvm_support = vmx_cpu_has_kvm_support,
+	.disabled_by_bios = vmx_disabled_by_bios,
+	.hardware_setup = vmx_hardware_setup,
+	.handle_intel_pt_intr = NULL,
+
+	.runtime_ops = &vt_x86_ops,
+	.pmu_ops = &intel_pmu_ops,
+};
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9bc46c1e64d9..e08be67352e0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -66,6 +66,7 @@
 #include "vmcs12.h"
 #include "vmx.h"
 #include "x86.h"
+#include "x86_ops.h"

 MODULE_AUTHOR("Qumranet");
 MODULE_LICENSE("GPL");
@@ -1307,7 +1308,7 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
  * Switches to specified vcpu, until a matching vcpu_put(), but assumes
  * vcpu mutex is already taken.
  */
-static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);

@@ -1318,7 +1319,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	vmx->host_debugctlmsr = get_debugctlmsr();
 }

-static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
+void vmx_vcpu_put(struct kvm_vcpu *vcpu)
 {
 	vmx_vcpu_pi_put(vcpu);

@@ -1372,7 +1373,7 @@ void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 	vmx->emulation_required = vmx_emulation_required(vcpu);
 }

-static bool vmx_get_if_flag(struct kvm_vcpu *vcpu)
+bool vmx_get_if_flag(struct kvm_vcpu *vcpu)
 {
 	return vmx_get_rflags(vcpu) & X86_EFLAGS_IF;
 }
@@ -1478,8 +1479,8 @@ static int vmx_rtit_ctl_check(struct kvm_vcpu *vcpu, u64 data)
 	return 0;
 }

-static bool vmx_can_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type,
-					void *insn, int insn_len)
+bool vmx_can_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type,
+				 void *insn, int insn_len)
 {
 	/*
 	 * Emulation of instructions in SGX enclaves is impossible as RIP does
@@ -1563,7 +1564,7 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
  * Recognizes a pending MTF VM-exit and records the nested state for later
  * delivery.
  */
-static void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu)
+void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -1586,7 +1587,7 @@ static void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu)
 		vmx->nested.mtf_pending = false;
 }

-static int vmx_skip_emulated_instruction(struct kvm_vcpu *vcpu)
+int vmx_skip_emulated_instruction(struct kvm_vcpu *vcpu)
 {
 	vmx_update_emulated_instruction(vcpu);
 	return skip_emulated_instruction(vcpu);
@@ -1605,7 +1606,7 @@ static void vmx_clear_hlt(struct kvm_vcpu *vcpu)
 		vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
 }

-static void vmx_queue_exception(struct kvm_vcpu *vcpu)
+void vmx_queue_exception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	unsigned nr = vcpu->arch.exception.nr;
@@ -1718,12 +1719,12 @@ u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
 	return kvm_default_tsc_scaling_ratio;
 }

-static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
+void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
 	vmcs_write64(TSC_OFFSET, offset);
 }

-static void vmx_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 multiplier)
+void vmx_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 multiplier)
 {
 	vmcs_write64(TSC_MULTIPLIER, multiplier);
 }
@@ -1747,7 +1748,7 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
 	return !(val & ~valid_bits);
 }

-static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
+int vmx_get_msr_feature(struct kvm_msr_entry *msr)
 {
 	switch (msr->index) {
 	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
@@ -1767,7 +1768,7 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
 * Returns 0 on success, non-0 otherwise.
 * Assumes vcpu_load() was already called.
 */
-static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	struct vmx_uret_msr *msr;
@@ -1945,7 +1946,7 @@ static u64 vcpu_supported_debugctl(struct kvm_vcpu *vcpu)
 * Returns 0 on success, non-0 otherwise.
 * Assumes vcpu_load() was already called.
 */
-static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	struct vmx_uret_msr *msr;
@@ -2258,7 +2259,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	return ret;
 }

-static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
+void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
 {
 	unsigned long guest_owned_bits;

@@ -2301,12 +2302,12 @@ static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
 	}
 }

-static __init int cpu_has_kvm_support(void)
+__init int vmx_cpu_has_kvm_support(void)
 {
 	return cpu_has_vmx();
 }

-static __init int vmx_disabled_by_bios(void)
+__init int vmx_disabled_by_bios(void)
 {
 	return !boot_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
 	       !boot_cpu_has(X86_FEATURE_VMX);
@@ -2332,7 +2333,7 @@ static int kvm_cpu_vmxon(u64 vmxon_pointer)
 	return -EFAULT;
 }

-static int vmx_hardware_enable(void)
+int vmx_hardware_enable(void)
 {
 	int cpu = raw_smp_processor_id();
 	u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
@@ -2373,7 +2374,7 @@ static void vmclear_local_loaded_vmcss(void)
 		__loaded_vmcs_clear(v);
 }

-static void vmx_hardware_disable(void)
+void vmx_hardware_disable(void)
 {
 	vmclear_local_loaded_vmcss();

@@ -2929,7 +2930,7 @@ static void exit_lmode(struct kvm_vcpu *vcpu)

 #endif

-static void vmx_flush_tlb_all(struct kvm_vcpu *vcpu)
+void vmx_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);

@@ -2959,7 +2960,7 @@ static inline int vmx_get_current_vpid(struct kvm_vcpu *vcpu)
 	return to_vmx(vcpu)->vpid;
 }

-static void vmx_flush_tlb_current(struct kvm_vcpu *vcpu)
+void vmx_flush_tlb_current(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
 	u64 root_hpa = mmu->root.hpa;
@@ -2975,7 +2976,7 @@ static void vmx_flush_tlb_current(struct kvm_vcpu *vcpu)
 	vpid_sync_context(vmx_get_current_vpid(vcpu));
 }

-static void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr)
+void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr)
 {
 	/*
 	 * vpid_sync_vcpu_addr() is a nop if vpid==0, see the comment in
@@ -2984,7 +2985,7 @@ static void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr)
 	vpid_sync_vcpu_addr(vmx_get_current_vpid(vcpu), addr);
 }

-static void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu)
+void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu)
 {
 	/*
 	 * vpid_sync_context() is a nop if vpid==0, e.g. if enable_vpid==0 or a
@@ -3139,8 +3140,7 @@ u64 construct_eptp(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level)
 	return eptp;
 }

-static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
-			     int root_level)
+void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level)
 {
 	struct kvm *kvm = vcpu->kvm;
 	bool update_guest_cr3 = true;
@@ -3168,8 +3168,7 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 	vmcs_writel(GUEST_CR3, guest_cr3);
 }

-
-static bool vmx_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+bool vmx_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
 	/*
 	 * We operate under the default treatment of SMM, so VMX cannot be
@@ -3285,7 +3284,7 @@ void vmx_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg)
 	var->g = (ar >> 15) & 1;
 }

-static u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg)
+u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg)
 {
 	struct kvm_segment s;

@@ -3365,14 +3364,14 @@ void __vmx_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg)
 	vmcs_write32(sf->ar_bytes, vmx_segment_access_rights(var));
 }

-static void vmx_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg)
+void vmx_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg)
 {
 	__vmx_set_segment(vcpu, var, seg);

 	to_vmx(vcpu)->emulation_required = vmx_emulation_required(vcpu);
 }

-static void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
+void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
 {
 	u32 ar = vmx_read_guest_seg_ar(to_vmx(vcpu), VCPU_SREG_CS);

@@ -3380,25 +3379,25 @@ static void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
 	*l = (ar >> 13) & 1;
 }

-static void vmx_get_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
+void vmx_get_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
 {
 	dt->size = vmcs_read32(GUEST_IDTR_LIMIT);
 	dt->address = vmcs_readl(GUEST_IDTR_BASE);
 }

-static void vmx_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
+void vmx_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
 {
 	vmcs_write32(GUEST_IDTR_LIMIT, dt->size);
 	vmcs_writel(GUEST_IDTR_BASE, dt->address);
 }

-static void vmx_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
+void vmx_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
 {
 	dt->size = vmcs_read32(GUEST_GDTR_LIMIT);
 	dt->address = vmcs_readl(GUEST_GDTR_BASE);
 }

-static void vmx_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
+void vmx_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
 {
 	vmcs_write32(GUEST_GDTR_LIMIT, dt->size);
 	vmcs_writel(GUEST_GDTR_BASE, dt->address);
@@ -3896,7 +3895,7 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
 	}
 }

-static bool vmx_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
+bool vmx_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	void *vapic_page;
@@ -3916,7 +3915,7 @@ static bool vmx_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
 	return ((rvi & 0xf0) > (vppr & 0xf0));
 }

-static void vmx_msr_filter_changed(struct kvm_vcpu *vcpu)
+void vmx_msr_filter_changed(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 i;
@@ -4050,8 +4049,8 @@ static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
 	return 0;
 }

-static void vmx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
-				  int trig_mode, int vector)
+void vmx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
+			   int trig_mode, int vector)
 {
 	struct kvm_vcpu *vcpu = apic->vcpu;

@@ -4194,7 +4193,7 @@ static u32 vmx_vmexit_ctrl(void)
 	       ~(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | VM_EXIT_LOAD_IA32_EFER);
 }

-static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
+void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);

@@ -4431,7 +4430,7 @@ static int vmx_alloc_ipiv_pid_table(struct kvm *kvm)
 	return 0;
 }

-static int vmx_vcpu_precreate(struct kvm *kvm)
+int vmx_vcpu_precreate(struct kvm *kvm)
 {
 	return vmx_alloc_ipiv_pid_table(kvm);
 }
@@ -4580,7 +4579,7 @@ static void __vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 	vmx->pi_desc.sn = 1;
 }

-static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
+void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);

@@ -4637,12 +4636,12 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vpid_sync_context(vmx->vpid);
 }

-static void vmx_enable_irq_window(struct kvm_vcpu *vcpu)
+void vmx_enable_irq_window(struct kvm_vcpu *vcpu)
 {
 	exec_controls_setbit(to_vmx(vcpu), CPU_BASED_INTR_WINDOW_EXITING);
 }

-static void vmx_enable_nmi_window(struct kvm_vcpu *vcpu)
+void vmx_enable_nmi_window(struct kvm_vcpu *vcpu)
 {
 	if (!enable_vnmi ||
 	    vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & GUEST_INTR_STATE_STI) {
@@ -4653,7 +4652,7 @@ static void vmx_enable_nmi_window(struct kvm_vcpu *vcpu)
 	exec_controls_setbit(to_vmx(vcpu), CPU_BASED_NMI_WINDOW_EXITING);
 }

-static void vmx_inject_irq(struct kvm_vcpu *vcpu)
+void vmx_inject_irq(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	uint32_t intr;
@@ -4681,7 +4680,7 @@ static void vmx_inject_irq(struct kvm_vcpu *vcpu)
 	vmx_clear_hlt(vcpu);
 }

-static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
+void vmx_inject_nmi(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);

@@ -4759,7 +4758,7 @@ bool vmx_nmi_blocked(struct kvm_vcpu *vcpu)
 		 GUEST_INTR_STATE_NMI));
 }

-static int vmx_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+int vmx_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 {
 	if (to_vmx(vcpu)->nested.nested_run_pending)
 		return -EBUSY;
@@ -4781,7 +4780,7 @@ bool vmx_interrupt_blocked(struct kvm_vcpu *vcpu)
 		(GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
 }

-static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+int vmx_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 {
 	if (to_vmx(vcpu)->nested.nested_run_pending)
 		return -EBUSY;
@@ -4796,7 +4795,7 @@ static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 	return !vmx_interrupt_blocked(vcpu);
 }

-static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
+int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
 {
 	void __user *ret;

@@ -4816,7 +4815,7 @@ static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
 	return init_rmode_tss(kvm, ret);
 }

-static int vmx_set_identity_map_addr(struct kvm *kvm, u64 ident_addr)
+int vmx_set_identity_map_addr(struct kvm *kvm, u64 ident_addr)
 {
 	to_kvm_vmx(kvm)->ept_identity_map_addr = ident_addr;
 	return 0;
@@ -5095,8 +5094,7 @@ static int handle_io(struct kvm_vcpu *vcpu)
 	return kvm_fast_pio(vcpu, size, port, in);
 }

-static void
-vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
+void vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
 {
 	/*
 	 * Patch in the VMCALL instruction:
@@ -5306,7 +5304,7 @@ static int handle_dr(struct kvm_vcpu *vcpu)
 	return kvm_complete_insn_gp(vcpu, err);
 }

-static void vmx_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
+void vmx_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
 {
 	get_debugreg(vcpu->arch.db[0], 0);
 	get_debugreg(vcpu->arch.db[1], 1);
@@ -5325,7 +5323,7 @@ static void vmx_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
 	set_debugreg(DR6_RESERVED, 6);
 }

-static void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val)
+void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val)
 {
 	vmcs_writel(GUEST_DR7, val);
 }
@@ -5596,7 +5594,7 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
 	return 1;
 }

-static int vmx_vcpu_pre_run(struct kvm_vcpu *vcpu)
+int vmx_vcpu_pre_run(struct kvm_vcpu *vcpu)
 {
 	if (vmx_emulation_required_with_pending_exception(vcpu)) {
 		kvm_prepare_emulation_failure_exit(vcpu);
@@ -5833,9 +5831,8 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 static const int kvm_vmx_max_exit_handlers =
 	ARRAY_SIZE(kvm_vmx_exit_handlers);

-static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
-			      u64 *info1, u64 *info2,
-			      u32 *intr_info, u32 *error_code)
+void vmx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
+		       u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);

@@ -6268,7 +6265,7 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	return 0;
 }

-static int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
+int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 {
 	int ret = __vmx_handle_exit(vcpu, exit_fastpath);

@@ -6356,7 +6353,7 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
 		: "eax", "ebx", "ecx", "edx");
 }

-static void vmx_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
+void vmx_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
 	int tpr_threshold;
@@ -6426,7 +6423,7 @@ void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
 	vmx_update_msr_bitmap_x2apic(vcpu);
 }

-static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
+void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
 {
 	struct page *page;

@@ -6454,7 +6451,7 @@ static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
 	put_page(page);
 }

-static void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
+void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
 {
 	u16 status;
 	u8 old;
@@ -6488,7 +6485,7 @@ static void vmx_set_rvi(int vector)
 	}
 }

-static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
+void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
 {
 	/*
 	 * When running L2, updating RVI is only relevant when
@@ -6502,7 +6499,7 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
 		vmx_set_rvi(max_irr);
 }

-static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
+int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	int max_irr;
@@ -6548,7 +6545,7 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
 	return max_irr;
 }

-static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
+void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
 {
 	if (!kvm_vcpu_apicv_active(vcpu))
 		return;

@@ -6559,7 +6556,7 @@ static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
 	vmcs_write64(EOI_EXIT_BITMAP3, eoi_exit_bitmap[3]);
 }

-static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
+void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);

@@ -6631,7 +6628,7 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
 	handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
 }

-static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
+void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);

@@ -6648,7 +6645,7 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
 * The kvm parameter can be NULL (module initialization, or invocation before
 * VM creation). Be sure to check the kvm parameter before using it.
 */
-static bool vmx_has_emulated_msr(struct kvm *kvm, u32 index)
+bool vmx_has_emulated_msr(struct kvm *kvm, u32 index)
 {
 	switch (index) {
 	case MSR_IA32_SMBASE:
@@ -6769,7 +6766,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 				  IDT_VECTORING_ERROR_CODE);
 }

-static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
+void vmx_cancel_injection(struct kvm_vcpu *vcpu)
 {
 	__vmx_complete_interrupts(vcpu,
 				  vmcs_read32(VM_ENTRY_INTR_INFO_FIELD),
@@ -6865,7 +6862,7 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 	guest_state_exit_irqoff();
 }

-static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
+fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	unsigned long cr3, cr4;
@@ -7059,7 +7056,7 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	return vmx_exit_handlers_fastpath(vcpu);
 }

-static void vmx_vcpu_free(struct kvm_vcpu *vcpu)
+void vmx_vcpu_free(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);

@@ -7070,7 +7067,7 @@ static void vmx_vcpu_free(struct kvm_vcpu *vcpu)
 	free_loaded_vmcs(vmx->loaded_vmcs);
 }

-static int vmx_vcpu_create(struct kvm_vcpu *vcpu)
+int vmx_vcpu_create(struct kvm_vcpu *vcpu)
 {
 	struct vmx_uret_msr *tsx_ctrl;
 	struct vcpu_vmx *vmx;
@@ -7179,7 +7176,7 @@ static int vmx_vcpu_create(struct kvm_vcpu *vcpu)
 #define L1TF_MSG_SMT "L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n"
 #define L1TF_MSG_L1D "L1TF CPU bug present and virtualization mitigation disabled, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n"

-static int vmx_vm_init(struct kvm *kvm)
+int vmx_vm_init(struct kvm *kvm)
 {
 	if (!ple_gap)
 		kvm->arch.pause_in_guest = true;
@@ -7210,7 +7207,7 @@ static int vmx_vm_init(struct kvm *kvm)
 	return 0;
 }

-static int vmx_check_processor_compatibility(void)
+int vmx_check_processor_compatibility(void)
 {
 	struct vmcs_config vmcs_conf;
 	struct vmx_capability vmx_cap;
@@ -7233,7 +7230,7 @@ static int vmx_check_processor_compatibility(void)
 	return 0;
 }

-static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
+u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
 	u8 cache;

@@ -7422,7 +7419,7 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
 		vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
 }

-static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
+void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);

@@ -7527,7 +7524,7 @@ static __init void vmx_set_cpu_caps(void)
 		kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG);
 }

-static void vmx_request_immediate_exit(struct kvm_vcpu *vcpu)
+void vmx_request_immediate_exit(struct kvm_vcpu *vcpu)
 {
 	to_vmx(vcpu)->req_immediate_exit = true;
 }
@@ -7566,10 +7563,10 @@ static int vmx_check_intercept_io(struct kvm_vcpu *vcpu,
 	return intercept ? X86EMUL_UNHANDLEABLE : X86EMUL_CONTINUE;
 }

-static int vmx_check_intercept(struct kvm_vcpu *vcpu,
-			       struct x86_instruction_info *info,
-			       enum x86_intercept_stage stage,
-			       struct x86_exception *exception)
+int vmx_check_intercept(struct kvm_vcpu *vcpu,
+			struct x86_instruction_info *info,
+			enum x86_intercept_stage stage,
+			struct x86_exception *exception)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);

@@ -7634,8 +7631,8 @@ static inline int u64_shl_div_u64(u64 a, unsigned int shift,
 	return 0;
 }

-static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
-			    bool *expired)
+int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
+		     bool *expired)
 {
 	struct vcpu_vmx *vmx;
 	u64 tscl, guest_tscl, delta_tsc, lapic_timer_advance_cycles;
@@ -7674,13 +7671,13 @@ static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
 	return 0;
 }

-static void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu)
+void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu)
 {
 	to_vmx(vcpu)->hv_deadline_tsc = -1;
 }
 #endif

-static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
+void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
 {
 	if (!kvm_pause_in_guest(vcpu->kvm))
 		shrink_ple_window(vcpu);
@@ -7706,7 +7703,7 @@ void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu)
 		secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_ENABLE_PML);
 }

-static void vmx_setup_mce(struct kvm_vcpu *vcpu)
+void vmx_setup_mce(struct kvm_vcpu *vcpu)
 {
 	if (vcpu->arch.mcg_cap & MCG_LMCE_P)
 		to_vmx(vcpu)->msr_ia32_feature_control_valid_bits |=
@@ -7716,7 +7713,7 @@ static void vmx_setup_mce(struct kvm_vcpu *vcpu)
 			~FEAT_CTL_LMCE_ENABLED;
 }

-static int vmx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+int vmx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 {
 	/* we need a nested vmexit to enter SMM, postpone if run is pending */
 	if (to_vmx(vcpu)->nested.nested_run_pending)
@@ -7724,7 +7721,7 @@ static int vmx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 	return !is_smm(vcpu);
 }

-static int vmx_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
+int vmx_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);

@@ -7738,7 +7735,7 @@ static int vmx_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
 	return 0;
 }

-static int vmx_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
+int vmx_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	int ret;
@@ -7759,17 +7756,17 @@ static int vmx_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
 	return 0;
 }

-static void vmx_enable_smi_window(struct kvm_vcpu *vcpu)
+void vmx_enable_smi_window(struct kvm_vcpu *vcpu)
 {
 	/* RSM will cause a vmexit anyway.  */
 }

-static bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
+bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
 {
 	return to_vmx(vcpu)->nested.vmxon && !is_guest_mode(vcpu);
 }

-static void vmx_migrate_timers(struct kvm_vcpu *vcpu)
+void vmx_migrate_timers(struct kvm_vcpu *vcpu)
 {
 	if (is_guest_mode(vcpu)) {
 		struct hrtimer *timer = &to_vmx(vcpu)->nested.preemption_timer;
@@ -7779,7 +7776,7 @@ static void vmx_migrate_timers(struct kvm_vcpu *vcpu)
 	}
 }

-static void vmx_hardware_unsetup(void)
+void vmx_hardware_unsetup(void)
 {
 	kvm_set_posted_intr_wakeup_handler(NULL);

@@ -7789,7 +7786,7 @@ static void vmx_hardware_unsetup(void)
 	free_kvm_area();
 }

-static bool vmx_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
+bool vmx_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
 {
 	ulong supported = BIT(APICV_INHIBIT_REASON_DISABLE) |
 			  BIT(APICV_INHIBIT_REASON_ABSENT) |
@@ -7799,151 +7796,13 @@ static bool vmx_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
 	return supported & BIT(reason);
 }

-static void vmx_vm_destroy(struct kvm *kvm)
+void vmx_vm_destroy(struct kvm *kvm)
 {
 	struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);

 	free_pages((unsigned long)kvm_vmx->pid_table, vmx_get_pid_table_order(kvm));
 }

-static struct kvm_x86_ops vmx_x86_ops __initdata = {
-	.name = "kvm_intel",
-
-	.hardware_unsetup = vmx_hardware_unsetup,
-
-	.check_processor_compatibility = vmx_check_processor_compatibility,
-	.hardware_enable = vmx_hardware_enable,
-	.hardware_disable = vmx_hardware_disable,
-	.has_emulated_msr = vmx_has_emulated_msr,
-
-	.vm_size = sizeof(struct kvm_vmx),
-	.vm_init = vmx_vm_init,
-	.vm_destroy = vmx_vm_destroy,
-
-	.vcpu_precreate = vmx_vcpu_precreate,
-	.vcpu_create = vmx_vcpu_create,
-	.vcpu_free = vmx_vcpu_free,
-	.vcpu_reset = vmx_vcpu_reset,
-
-	.prepare_switch_to_guest = vmx_prepare_switch_to_guest,
-	.vcpu_load = vmx_vcpu_load,
-	.vcpu_put = vmx_vcpu_put,
-
-	.update_exception_bitmap = vmx_update_exception_bitmap,
-	.get_msr_feature = vmx_get_msr_feature,
-	.get_msr = vmx_get_msr,
-	.set_msr = vmx_set_msr,
-	.get_segment_base = vmx_get_segment_base,
-	.get_segment = vmx_get_segment,
-	.set_segment = vmx_set_segment,
-	.get_cpl = vmx_get_cpl,
-	.get_cs_db_l_bits = vmx_get_cs_db_l_bits,
-	.set_cr0 = vmx_set_cr0,
-	.is_valid_cr4 = vmx_is_valid_cr4,
-	.set_cr4 = vmx_set_cr4,
-	.set_efer = vmx_set_efer,
-	.get_idt = vmx_get_idt,
-	.set_idt = vmx_set_idt,
-	.get_gdt = vmx_get_gdt,
-	.set_gdt = vmx_set_gdt,
-	.set_dr7 = vmx_set_dr7,
-	.sync_dirty_debug_regs = vmx_sync_dirty_debug_regs,
-	.cache_reg = vmx_cache_reg,
-	.get_rflags = vmx_get_rflags,
-	.set_rflags = vmx_set_rflags,
-	.get_if_flag = vmx_get_if_flag,
-
-	.flush_tlb_all = vmx_flush_tlb_all,
-	.flush_tlb_current = vmx_flush_tlb_current,
-	.flush_tlb_gva = vmx_flush_tlb_gva,
-
.flush_tlb_guest =3D vmx_flush_tlb_guest, - - .vcpu_pre_run =3D vmx_vcpu_pre_run, - .vcpu_run =3D vmx_vcpu_run, - .handle_exit =3D vmx_handle_exit, - .skip_emulated_instruction =3D vmx_skip_emulated_instruction, - .update_emulated_instruction =3D vmx_update_emulated_instruction, - .set_interrupt_shadow =3D vmx_set_interrupt_shadow, - .get_interrupt_shadow =3D vmx_get_interrupt_shadow, - .patch_hypercall =3D vmx_patch_hypercall, - .inject_irq =3D vmx_inject_irq, - .inject_nmi =3D vmx_inject_nmi, - .queue_exception =3D vmx_queue_exception, - .cancel_injection =3D vmx_cancel_injection, - .interrupt_allowed =3D vmx_interrupt_allowed, - .nmi_allowed =3D vmx_nmi_allowed, - .get_nmi_mask =3D vmx_get_nmi_mask, - .set_nmi_mask =3D vmx_set_nmi_mask, - .enable_nmi_window =3D vmx_enable_nmi_window, - .enable_irq_window =3D vmx_enable_irq_window, - .update_cr8_intercept =3D vmx_update_cr8_intercept, - .set_virtual_apic_mode =3D vmx_set_virtual_apic_mode, - .set_apic_access_page_addr =3D vmx_set_apic_access_page_addr, - .refresh_apicv_exec_ctrl =3D vmx_refresh_apicv_exec_ctrl, - .load_eoi_exitmap =3D vmx_load_eoi_exitmap, - .apicv_post_state_restore =3D vmx_apicv_post_state_restore, - .check_apicv_inhibit_reasons =3D vmx_check_apicv_inhibit_reasons, - .hwapic_irr_update =3D vmx_hwapic_irr_update, - .hwapic_isr_update =3D vmx_hwapic_isr_update, - .guest_apic_has_interrupt =3D vmx_guest_apic_has_interrupt, - .sync_pir_to_irr =3D vmx_sync_pir_to_irr, - .deliver_interrupt =3D vmx_deliver_interrupt, - .dy_apicv_has_pending_interrupt =3D pi_has_pending_interrupt, - - .set_tss_addr =3D vmx_set_tss_addr, - .set_identity_map_addr =3D vmx_set_identity_map_addr, - .get_mt_mask =3D vmx_get_mt_mask, - - .get_exit_info =3D vmx_get_exit_info, - - .vcpu_after_set_cpuid =3D vmx_vcpu_after_set_cpuid, - - .has_wbinvd_exit =3D cpu_has_vmx_wbinvd_exit, - - .get_l2_tsc_offset =3D vmx_get_l2_tsc_offset, - .get_l2_tsc_multiplier =3D vmx_get_l2_tsc_multiplier, - .write_tsc_offset =3D vmx_write_tsc_offset, - .write_tsc_multiplier =3D vmx_write_tsc_multiplier, - - .load_mmu_pgd =3D vmx_load_mmu_pgd, - - .check_intercept =3D vmx_check_intercept, - .handle_exit_irqoff =3D vmx_handle_exit_irqoff, - - .request_immediate_exit =3D vmx_request_immediate_exit, - - .sched_in =3D vmx_sched_in, - - .cpu_dirty_log_size =3D PML_ENTITY_NUM, - .update_cpu_dirty_logging =3D vmx_update_cpu_dirty_logging, - - .nested_ops =3D &vmx_nested_ops, - - .pi_update_irte =3D vmx_pi_update_irte, - .pi_start_assignment =3D vmx_pi_start_assignment, - -#ifdef CONFIG_X86_64 - .set_hv_timer =3D vmx_set_hv_timer, - .cancel_hv_timer =3D vmx_cancel_hv_timer, -#endif - - .setup_mce =3D vmx_setup_mce, - - .smi_allowed =3D vmx_smi_allowed, - .enter_smm =3D vmx_enter_smm, - .leave_smm =3D vmx_leave_smm, - .enable_smi_window =3D vmx_enable_smi_window, - - .can_emulate_instruction =3D vmx_can_emulate_instruction, - .apic_init_signal_blocked =3D vmx_apic_init_signal_blocked, - .migrate_timers =3D vmx_migrate_timers, - - .msr_filter_changed =3D vmx_msr_filter_changed, - .complete_emulated_msr =3D kvm_complete_insn_gp, - - .vcpu_deliver_sipi_vector =3D kvm_vcpu_deliver_sipi_vector, -}; - static unsigned int vmx_handle_intel_pt_intr(void) { struct kvm_vcpu *vcpu =3D kvm_get_running_vcpu(); @@ -8009,9 +7868,7 @@ static void __init vmx_setup_me_spte_mask(void) kvm_mmu_set_me_spte_mask(0, me_mask); } =20 -static struct kvm_x86_init_ops vmx_init_ops __initdata; - -static __init int hardware_setup(void) +__init int vmx_hardware_setup(void) { unsigned long host_bndcfgs; struct 
desc_ptr dt; @@ -8071,16 +7928,16 @@ static __init int hardware_setup(void) * using the APIC_ACCESS_ADDR VMCS field. */ if (!flexpriority_enabled) - vmx_x86_ops.set_apic_access_page_addr =3D NULL; + vt_x86_ops.set_apic_access_page_addr =3D NULL; =20 if (!cpu_has_vmx_tpr_shadow()) - vmx_x86_ops.update_cr8_intercept =3D NULL; + vt_x86_ops.update_cr8_intercept =3D NULL; =20 #if IS_ENABLED(CONFIG_HYPERV) if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH && enable_ept) { - vmx_x86_ops.tlb_remote_flush =3D hv_remote_flush_tlb; - vmx_x86_ops.tlb_remote_flush_with_range =3D + vt_x86_ops.tlb_remote_flush =3D hv_remote_flush_tlb; + vt_x86_ops.tlb_remote_flush_with_range =3D hv_remote_flush_tlb_with_range; } #endif @@ -8096,7 +7953,7 @@ static __init int hardware_setup(void) if (!cpu_has_vmx_apicv()) enable_apicv =3D 0; if (!enable_apicv) - vmx_x86_ops.sync_pir_to_irr =3D NULL; + vt_x86_ops.sync_pir_to_irr =3D NULL; =20 if (!enable_apicv || !cpu_has_vmx_ipiv()) enable_ipiv =3D false; @@ -8131,7 +7988,7 @@ static __init int hardware_setup(void) enable_pml =3D 0; =20 if (!enable_pml) - vmx_x86_ops.cpu_dirty_log_size =3D 0; + vt_x86_ops.cpu_dirty_log_size =3D 0; =20 if (!cpu_has_vmx_preemption_timer()) enable_preemption_timer =3D false; @@ -8158,9 +8015,9 @@ static __init int hardware_setup(void) } =20 if (!enable_preemption_timer) { - vmx_x86_ops.set_hv_timer =3D NULL; - vmx_x86_ops.cancel_hv_timer =3D NULL; - vmx_x86_ops.request_immediate_exit =3D __kvm_request_immediate_exit; + vt_x86_ops.set_hv_timer =3D NULL; + vt_x86_ops.cancel_hv_timer =3D NULL; + vt_x86_ops.request_immediate_exit =3D __kvm_request_immediate_exit; } =20 kvm_mce_cap_supported |=3D MCG_LMCE_P; @@ -8170,9 +8027,9 @@ static __init int hardware_setup(void) if (!enable_ept || !cpu_has_vmx_intel_pt()) pt_mode =3D PT_MODE_SYSTEM; if (pt_mode =3D=3D PT_MODE_HOST_GUEST) - vmx_init_ops.handle_intel_pt_intr =3D vmx_handle_intel_pt_intr; + vt_init_ops.handle_intel_pt_intr =3D vmx_handle_intel_pt_intr; else - vmx_init_ops.handle_intel_pt_intr =3D NULL; + vt_init_ops.handle_intel_pt_intr =3D NULL; =20 setup_default_sgx_lepubkeyhash(); =20 @@ -8196,16 +8053,6 @@ static __init int hardware_setup(void) return r; } =20 -static struct kvm_x86_init_ops vmx_init_ops __initdata =3D { - .cpu_has_kvm_support =3D cpu_has_kvm_support, - .disabled_by_bios =3D vmx_disabled_by_bios, - .hardware_setup =3D hardware_setup, - .handle_intel_pt_intr =3D NULL, - - .runtime_ops =3D &vmx_x86_ops, - .pmu_ops =3D &intel_pmu_ops, -}; - static void vmx_cleanup_l1d_flush(void) { if (vmx_l1d_flush_pages) { @@ -8292,7 +8139,7 @@ static int __init vmx_init(void) } =20 if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH) - vmx_x86_ops.enable_direct_tlbflush + vt_x86_ops.enable_direct_tlbflush =3D hv_enable_direct_tlbflush; =20 } else { @@ -8301,8 +8148,8 @@ static int __init vmx_init(void) #endif =20 vmx_init_early(); - r =3D kvm_init(&vmx_init_ops, sizeof(struct vcpu_vmx), - __alignof__(struct vcpu_vmx), THIS_MODULE); + r =3D kvm_init(&vt_init_ops, sizeof(struct vcpu_vmx), + __alignof__(struct vcpu_vmx), THIS_MODULE); if (r) return r; =20 diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h new file mode 100644 index 000000000000..1d5dff7c0d96 --- /dev/null +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -0,0 +1,125 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __KVM_X86_VMX_X86_OPS_H +#define __KVM_X86_VMX_X86_OPS_H + +#include + +#include + +#include "x86.h" + +__init int vmx_cpu_has_kvm_support(void); +__init int vmx_disabled_by_bios(void); 
+__init int vmx_hardware_setup(void); + +extern struct kvm_x86_ops vt_x86_ops __initdata; +extern struct kvm_x86_init_ops vt_init_ops __initdata; + +void vmx_hardware_unsetup(void); +int vmx_check_processor_compatibility(void); +int vmx_hardware_enable(void); +void vmx_hardware_disable(void); +int vmx_vm_init(struct kvm *kvm); +void vmx_vm_destroy(struct kvm *kvm); +int vmx_vcpu_precreate(struct kvm *kvm); +int vmx_vcpu_create(struct kvm_vcpu *vcpu); +int vmx_vcpu_pre_run(struct kvm_vcpu *vcpu); +fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu); +void vmx_vcpu_free(struct kvm_vcpu *vcpu); +void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event); +void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu); +void vmx_vcpu_put(struct kvm_vcpu *vcpu); +int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath); +void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu); +int vmx_skip_emulated_instruction(struct kvm_vcpu *vcpu); +void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu); +int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info); +int vmx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection); +int vmx_enter_smm(struct kvm_vcpu *vcpu, char *smstate); +int vmx_leave_smm(struct kvm_vcpu *vcpu, const char *smstate); +void vmx_enable_smi_window(struct kvm_vcpu *vcpu); +bool vmx_can_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type, + void *insn, int insn_len); +int vmx_check_intercept(struct kvm_vcpu *vcpu, + struct x86_instruction_info *info, + enum x86_intercept_stage stage, + struct x86_exception *exception); +bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu); +void vmx_migrate_timers(struct kvm_vcpu *vcpu); +void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu); +void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu); +bool vmx_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason); +void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr); +void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr); +bool vmx_guest_apic_has_interrupt(struct kvm_vcpu *vcpu); +int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu); +void vmx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, + int trig_mode, int vector); +void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu); +bool vmx_has_emulated_msr(struct kvm *kvm, u32 index); +void vmx_msr_filter_changed(struct kvm_vcpu *vcpu); +void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu); +void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu); +int vmx_get_msr_feature(struct kvm_msr_entry *msr); +int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info); +u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg); +void vmx_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int s= eg); +void vmx_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int s= eg); +int vmx_get_cpl(struct kvm_vcpu *vcpu); +void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l); +void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0); +void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_leve= l); +void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4); +bool vmx_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4); +int vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer); +void vmx_get_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt); +void vmx_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt); +void vmx_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt); +void vmx_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt); +void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long 
val);
+void vmx_sync_dirty_debug_regs(struct kvm_vcpu *vcpu);
+void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg);
+unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu);
+void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
+bool vmx_get_if_flag(struct kvm_vcpu *vcpu);
+void vmx_flush_tlb_all(struct kvm_vcpu *vcpu);
+void vmx_flush_tlb_current(struct kvm_vcpu *vcpu);
+void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr);
+void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu);
+void vmx_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask);
+u32 vmx_get_interrupt_shadow(struct kvm_vcpu *vcpu);
+void vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall);
+void vmx_inject_irq(struct kvm_vcpu *vcpu);
+void vmx_inject_nmi(struct kvm_vcpu *vcpu);
+void vmx_queue_exception(struct kvm_vcpu *vcpu);
+void vmx_cancel_injection(struct kvm_vcpu *vcpu);
+int vmx_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection);
+int vmx_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection);
+bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu);
+void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked);
+void vmx_enable_nmi_window(struct kvm_vcpu *vcpu);
+void vmx_enable_irq_window(struct kvm_vcpu *vcpu);
+void vmx_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr);
+void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu);
+void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu);
+void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
+int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr);
+int vmx_set_identity_map_addr(struct kvm *kvm, u64 ident_addr);
+u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
+void vmx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
+		       u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code);
+u64 vmx_get_l2_tsc_offset(struct kvm_vcpu *vcpu);
+u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu);
+void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset);
+void vmx_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 multiplier);
+void vmx_request_immediate_exit(struct kvm_vcpu *vcpu);
+void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu);
+void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu);
+#ifdef CONFIG_X86_64
+int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
+		     bool *expired);
+void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu);
+#endif
+void vmx_setup_mce(struct kvm_vcpu *vcpu);
+
+#endif /* __KVM_X86_VMX_X86_OPS_H */
--
2.25.1

From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 005/104] x86/virt/vmx/tdx: export platform_has_tdx
Date: Thu, 5 May 2022 11:13:59 -0700
Message-Id: <78d9874f2c6b6006fd9ba4250ead6df064b6b82b.1651774250.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

TDX KVM uses platform_has_tdx() via vmx_hardware_setup() to check whether the platform supports TDX (concretely, CPU SEAM mode), irrespective of whether the TDX module is loaded or initialized.
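As a rough usage sketch (the wrapper name tdx_platform_usable() and the log
message are illustrative, not part of this patch; platform_has_tdx() is the
helper exported below):

	#include <linux/module.h>
	#include <asm/tdx.h>

	static bool tdx_platform_usable(void)
	{
		/*
		 * platform_has_tdx() is true only when SEAMRR is enabled
		 * and sufficient TDX private keyIDs are configured.
		 */
		if (!platform_has_tdx()) {
			pr_info("TDX unavailable on this platform\n");
			return false;
		}
		return true;
	}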
Signed-off-by: Isaku Yamahata
---
 arch/x86/virt/vmx/tdx/tdx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 8c49ca40b6ad..b6c82e64ad54 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1715,3 +1715,4 @@ bool platform_has_tdx(void)
 {
 	return seamrr_enabled() && tdx_keyid_sufficient();
 }
+EXPORT_SYMBOL_GPL(platform_has_tdx);
--
2.25.1

From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 006/104] KVM: TDX: Detect CPU feature on kernel module initialization
Date: Thu, 5 May 2022 11:14:00 -0700

From: Isaku Yamahata

TDX requires several initialization steps for KVM to create guest TDs: detect the CPU feature, enable VMX (TDX is based on VMX), detect TDX module availability, and initialize the TDX module. This patch implements the first step, detecting the CPU feature. Because VMX is not yet enabled via the VMXON instruction at KVM kernel module initialization, defer the remaining steps until VMX is enabled by the hardware_enable callback.

Introduce a module parameter, enable_tdx, to explicitly enable TDX KVM support.
It's off by default to keep the same behavior for those who don't use TDX. Implement the CPU feature detection at KVM kernel module initialization, as the hardware_setup callback, to check whether the CPU feature is available and to read some CPU parameters.

Signed-off-by: Isaku Yamahata
---
 arch/x86/include/asm/tdx.h  |  2 ++
 arch/x86/kvm/Makefile       |  1 +
 arch/x86/kvm/vmx/main.c     | 18 ++++++++++++++++-
 arch/x86/kvm/vmx/tdx.c      | 39 +++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/x86_ops.h  |  6 ++++++
 arch/x86/virt/vmx/tdx/tdx.c |  3 ++-
 6 files changed, 67 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/kvm/vmx/tdx.c

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 513b9ce9a870..f8f459e28254 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -91,11 +91,13 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
 #endif /* CONFIG_INTEL_TDX_GUEST && CONFIG_KVM_GUEST */

 #ifdef CONFIG_INTEL_TDX_HOST
+bool __seamrr_enabled(void);
 void tdx_detect_cpu(struct cpuinfo_x86 *c);
 int tdx_detect(void);
 int tdx_init(void);
 bool platform_has_tdx(void);
 #else
+static inline bool __seamrr_enabled(void) { return false; }
 static inline void tdx_detect_cpu(struct cpuinfo_x86 *c) { }
 static inline int tdx_detect(void) { return -ENODEV; }
 static inline int tdx_init(void) { return -ENODEV; }
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index ee4d0999f20f..e2c05195cb95 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -24,6 +24,7 @@ kvm-$(CONFIG_KVM_XEN) += xen.o
 kvm-intel-y += vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
 	       vmx/evmcs.o vmx/nested.o vmx/posted_intr.o vmx/main.o
 kvm-intel-$(CONFIG_X86_SGX_KVM) += vmx/sgx.o
+kvm-intel-$(CONFIG_INTEL_TDX_HOST) += vmx/tdx.o

 kvm-amd-y += svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o svm/sev.o

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 636768f5b985..fabf5f22c94f 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -6,6 +6,22 @@
 #include "nested.h"
 #include "pmu.h"

+static bool __read_mostly enable_tdx = IS_ENABLED(CONFIG_INTEL_TDX_HOST);
+module_param_named(tdx, enable_tdx, bool, 0444);
+
+static __init int vt_hardware_setup(void)
+{
+	int ret;
+
+	ret = vmx_hardware_setup();
+	if (ret)
+		return ret;
+
+	enable_tdx = enable_tdx && !tdx_hardware_setup(&vt_x86_ops);
+
+	return 0;
+}
+
 struct kvm_x86_ops vt_x86_ops __initdata = {
 	.name = "kvm_intel",

@@ -147,7 +163,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 struct kvm_x86_init_ops vt_init_ops __initdata = {
 	.cpu_has_kvm_support = vmx_cpu_has_kvm_support,
 	.disabled_by_bios = vmx_disabled_by_bios,
-	.hardware_setup = vmx_hardware_setup,
+	.hardware_setup = vt_hardware_setup,
 	.handle_intel_pt_intr = NULL,

 	.runtime_ops = &vt_x86_ops,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
new file mode 100644
index 000000000000..9e26e3fa60ee
--- /dev/null
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+
+#include
+
+#include "capabilities.h"
+#include "x86_ops.h"
+
+#undef pr_fmt
+#define pr_fmt(fmt) "tdx: " fmt
+
+static u64 hkid_mask __ro_after_init;
+static u8 hkid_start_pos __ro_after_init;
+
+int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops)
+{
+	u32 max_pa;
+
+	if (!enable_ept) {
+		pr_warn("Cannot enable TDX with EPT disabled\n");
+		return -EINVAL;
+	}
+
+	if (!platform_has_tdx()) {
+		if (__seamrr_enabled())
+			pr_warn("Cannot enable TDX with SEAMRR disabled\n");
+		return -ENODEV;
+	}
+
+	if (WARN_ON_ONCE(x86_ops->tlb_remote_flush))
+		return -EIO;
+
+	max_pa = cpuid_eax(0x80000008) & 0xff;
+	hkid_start_pos = boot_cpu_data.x86_phys_bits;
+	hkid_mask = GENMASK_ULL(max_pa - 1, hkid_start_pos);
+	pr_info("hkid start pos %d mask 0x%llx\n", hkid_start_pos, hkid_mask);
+
+	return 0;
+}
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 1d5dff7c0d96..7a885dc84183 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -122,4 +122,10 @@ void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu);
 #endif
 void vmx_setup_mce(struct kvm_vcpu *vcpu);

+#ifdef CONFIG_INTEL_TDX_HOST
+int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops);
+#else
+static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return 0; }
+#endif
+
 #endif /* __KVM_X86_VMX_X86_OPS_H */
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index b6c82e64ad54..e8044114079d 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -126,10 +126,11 @@ static int __init tdx_host_setup(char *s)
 }
 __setup("tdx_host=", tdx_host_setup);

-static bool __seamrr_enabled(void)
+bool __seamrr_enabled(void)
 {
 	return (seamrr_mask & SEAMRR_ENABLED_BITS) == SEAMRR_ENABLED_BITS;
 }
+EXPORT_SYMBOL_GPL(__seamrr_enabled);

 static void detect_seam_bsp(struct cpuinfo_x86 *c)
 {
--
2.25.1

From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo
Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 007/104] KVM: Enable hardware before doing arch VM initialization
Date: Thu, 5 May 2022 11:14:01 -0700

From: Sean Christopherson

Swap the order of hardware_enable_all() and kvm_arch_init_vm() to accommodate Intel's TDX, which needs VMX to be enabled during VM init in order to make SEAMCALLs. This also provides consistent ordering between kvm_create_vm() and kvm_destroy_vm() with respect to calling kvm_arch_destroy_vm() and hardware_disable_all().

Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 virt/kvm/kvm_main.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0ff03889aa5d..6b9dbc55af9e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1129,19 +1129,19 @@ static struct kvm *kvm_create_vm(unsigned long type)
 		rcu_assign_pointer(kvm->buses[i],
 			kzalloc(sizeof(struct kvm_io_bus), GFP_KERNEL_ACCOUNT));
 		if (!kvm->buses[i])
-			goto out_err_no_arch_destroy_vm;
+			goto out_err_no_disable;
 	}

 	kvm->max_halt_poll_ns = halt_poll_ns;

-	r = kvm_arch_init_vm(kvm, type);
-	if (r)
-		goto out_err_no_arch_destroy_vm;
-
 	r = hardware_enable_all();
 	if (r)
 		goto out_err_no_disable;

+	r = kvm_arch_init_vm(kvm, type);
+	if (r)
+		goto out_err_no_arch_destroy_vm;
+
 #ifdef CONFIG_HAVE_KVM_IRQFD
 	INIT_HLIST_HEAD(&kvm->irq_ack_notifier_list);
 #endif
@@ -1179,10 +1179,10 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	mmu_notifier_unregister(&kvm->mmu_notifier, current->mm);
 #endif
 out_err_no_mmu_notifier:
-	hardware_disable_all();
-out_err_no_disable:
 	kvm_arch_destroy_vm(kvm);
 out_err_no_arch_destroy_vm:
+	hardware_disable_all();
+out_err_no_disable:
 	WARN_ON_ONCE(!refcount_dec_and_test(&kvm->users_count));
 	for (i = 0; i < KVM_NR_BUSES; i++)
 		kfree(kvm_get_bus(kvm, i));
--
2.25.1

From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 008/104] KVM: x86: Refactor KVM VMX module init/exit functions
Date: Thu, 5 May 2022 11:14:02 -0700

From: Isaku Yamahata

Currently, the KVM VMX module initialization and exit functions are each a single function. Refactor the initialization into a KVM common part and a VMX part so that the TDX-specific part can be added cleanly. Opportunistically refactor the module exit function as well.

The current module initialization flow is: 1) calculate the sizes of the VMX kvm structure and the VMX vcpu structure, 2) Hyper-V specific initialization, 3) report those sizes to the KVM common layer and do KVM common initialization, and 4) VMX-specific system-wide initialization.

Refactor the KVM VMX module initialization function into separate functions with a wrapper, to keep the VMX logic in vmx.c out of main.c, the file common to VMX and TDX. The wrapper in main.c is vt_init() { vmx kvm/vcpu size calculation; hv_vp_assist_page_init(); kvm_init(); vmx_init(); }, with hv_vp_assist_page_init() and vmx_init() in vmx.c. hv_vp_assist_page_init() initializes the Hyper-V specific assist pages, kvm_init() does system-wide initialization of the KVM common layer, and vmx_init() does system-wide VMX initialization.

The KVM architecture common layer allocates struct kvm with the reported size on behalf of architecture-specific code. The KVM VMX module defines its structure as struct kvm_vmx { struct kvm kvm; /* VMX specific members */ } and uses it as its kvm structure; the vcpu structure is handled similarly. TDX KVM patches will define TDX-specific kvm and vcpu structures and add tdx_pre_kvm_init() to report their sizes to the KVM common layer.

The current module exit function is likewise a single function, a combination of VMX-specific logic and common KVM logic; refactor it the same way. This is just refactoring, to keep the VMX-specific logic in vmx.c out of main.c.
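As a rough sketch of the size-reporting and embedding pattern described above (the accessor mirrors the existing to_kvm_vmx(); comments are illustrative):

	struct kvm_vmx {
		struct kvm kvm;		/* common part, must come first */
		/* VMX-specific members follow */
	};

	static inline struct kvm_vmx *to_kvm_vmx(struct kvm *kvm)
	{
		/* recover the containing VMX structure from the common part */
		return container_of(kvm, struct kvm_vmx, kvm);
	}

	/*
	 * vt_init() reports sizeof(struct kvm_vmx) (and later, with TDX,
	 * the max of the VMX and TDX sizes) so that the common layer
	 * allocates room for the architecture-specific tail.
	 */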
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/main.c | 38 +++++++++++++ arch/x86/kvm/vmx/vmx.c | 106 ++++++++++++++++++------------------- arch/x86/kvm/vmx/x86_ops.h | 6 +++ 3 files changed, 95 insertions(+), 55 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index fabf5f22c94f..371dad728166 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -169,3 +169,41 @@ struct kvm_x86_init_ops vt_init_ops __initdata =3D { .runtime_ops =3D &vt_x86_ops, .pmu_ops =3D &intel_pmu_ops, }; + +static int __init vt_init(void) +{ + unsigned int vcpu_size, vcpu_align; + int r; + + vt_x86_ops.vm_size =3D sizeof(struct kvm_vmx); + vcpu_size =3D sizeof(struct vcpu_vmx); + vcpu_align =3D __alignof__(struct vcpu_vmx); + + hv_vp_assist_page_init(); + vmx_init_early(); + + r =3D kvm_init(&vt_init_ops, vcpu_size, vcpu_align, THIS_MODULE); + if (r) + goto err_vmx_post_exit; + + r =3D vmx_init(); + if (r) + goto err_kvm_exit; + + return 0; + +err_kvm_exit: + kvm_exit(); +err_vmx_post_exit: + hv_vp_assist_page_exit(); + return r; +} +module_init(vt_init); + +static void vt_exit(void) +{ + vmx_exit(); + kvm_exit(); + hv_vp_assist_page_exit(); +} +module_exit(vt_exit); diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index e08be67352e0..b5846e0fc78f 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -8063,15 +8063,45 @@ static void vmx_cleanup_l1d_flush(void) l1tf_vmx_mitigation =3D VMENTER_L1D_FLUSH_AUTO; } =20 -static void vmx_exit(void) +void __init hv_vp_assist_page_init(void) { -#ifdef CONFIG_KEXEC_CORE - RCU_INIT_POINTER(crash_vmclear_loaded_vmcss, NULL); - synchronize_rcu(); -#endif +#if IS_ENABLED(CONFIG_HYPERV) + /* + * Enlightened VMCS usage should be recommended and the host needs + * to support eVMCS v1 or above. We can also disable eVMCS support + * with module parameter. + */ + if (enlightened_vmcs && + ms_hyperv.hints & HV_X64_ENLIGHTENED_VMCS_RECOMMENDED && + (ms_hyperv.nested_features & HV_X64_ENLIGHTENED_VMCS_VERSION) >=3D + KVM_EVMCS_VERSION) { + int cpu; + + /* Check that we have assist pages on all online CPUs */ + for_each_online_cpu(cpu) { + if (!hv_get_vp_assist_page(cpu)) { + enlightened_vmcs =3D false; + break; + } + } =20 - kvm_exit(); + if (enlightened_vmcs) { + pr_info("KVM: vmx: using Hyper-V Enlightened VMCS\n"); + static_branch_enable(&enable_evmcs); + } + + if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH) + vt_x86_ops.enable_direct_tlbflush + =3D hv_enable_direct_tlbflush; =20 + } else { + enlightened_vmcs =3D false; + } +#endif +} + +void hv_vp_assist_page_exit(void) +{ #if IS_ENABLED(CONFIG_HYPERV) if (static_branch_unlikely(&enable_evmcs)) { int cpu; @@ -8095,14 +8125,10 @@ static void vmx_exit(void) static_branch_disable(&enable_evmcs); } #endif - vmx_cleanup_l1d_flush(); - - allow_smaller_maxphyaddr =3D false; } -module_exit(vmx_exit); =20 /* initialize before kvm_init() so that hardware_enable/disable() can work= . */ -static void __init vmx_init_early(void) +void __init vmx_init_early(void) { int cpu; =20 @@ -8110,49 +8136,10 @@ static void __init vmx_init_early(void) INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu)); } =20 -static int __init vmx_init(void) +int __init vmx_init(void) { int r, cpu; =20 -#if IS_ENABLED(CONFIG_HYPERV) - /* - * Enlightened VMCS usage should be recommended and the host needs - * to support eVMCS v1 or above. We can also disable eVMCS support - * with module parameter. 
- */ - if (enlightened_vmcs && - ms_hyperv.hints & HV_X64_ENLIGHTENED_VMCS_RECOMMENDED && - (ms_hyperv.nested_features & HV_X64_ENLIGHTENED_VMCS_VERSION) >=3D - KVM_EVMCS_VERSION) { - - /* Check that we have assist pages on all online CPUs */ - for_each_online_cpu(cpu) { - if (!hv_get_vp_assist_page(cpu)) { - enlightened_vmcs =3D false; - break; - } - } - - if (enlightened_vmcs) { - pr_info("KVM: vmx: using Hyper-V Enlightened VMCS\n"); - static_branch_enable(&enable_evmcs); - } - - if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH) - vt_x86_ops.enable_direct_tlbflush - =3D hv_enable_direct_tlbflush; - - } else { - enlightened_vmcs =3D false; - } -#endif - - vmx_init_early(); - r =3D kvm_init(&vt_init_ops, sizeof(struct vcpu_vmx), - __alignof__(struct vcpu_vmx), THIS_MODULE); - if (r) - return r; - /* * Must be called after kvm_init() so enable_ept is properly set * up. Hand the parameter mitigation value in which was stored in @@ -8161,10 +8148,8 @@ static int __init vmx_init(void) * mitigation mode. */ r =3D vmx_setup_l1d_flush(vmentry_l1d_flush_param); - if (r) { - vmx_exit(); + if (r) return r; - } =20 for_each_possible_cpu(cpu) pi_init_cpu(cpu); @@ -8185,4 +8170,15 @@ static int __init vmx_init(void) =20 return 0; } -module_init(vmx_init); + +void vmx_exit(void) +{ +#ifdef CONFIG_KEXEC_CORE + RCU_INIT_POINTER(crash_vmclear_loaded_vmcss, NULL); + synchronize_rcu(); +#endif + + vmx_cleanup_l1d_flush(); + + allow_smaller_maxphyaddr =3D false; +} diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 7a885dc84183..e28f4e0653b8 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -8,6 +8,12 @@ =20 #include "x86.h" =20 +void __init hv_vp_assist_page_init(void); +void hv_vp_assist_page_exit(void); +void __init vmx_init_early(void); +int __init vmx_init(void); +void vmx_exit(void); + __init int vmx_cpu_has_kvm_support(void); __init int vmx_disabled_by_bios(void); __init int vmx_hardware_setup(void); --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3FED9C433F5 for ; Thu, 5 May 2022 18:16:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383418AbiEESUA (ORCPT ); Thu, 5 May 2022 14:20:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36218 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383070AbiEESTX (ORCPT ); Thu, 5 May 2022 14:19:23 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7A0C512081; Thu, 5 May 2022 11:15:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774543; x=1683310543; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=KUPJ6dK5lBD3fP3mP/jmpr0owrv6CGzN1HXbf/5M6zI=; b=gZcmVafm0X6sEujpyC93BV9dl1OyG+vNeTL36bcc/c/3H/MURe+suYkN Ru06MGGDbE8deaoScPiqDJ+Cjb5c2BJm+1CrzwbGnvkBnPT94u6GFqLsu T8LHsCdOsAQkL5mixN0W388z4XME8mYKYiX0y5FSHJglFQqzgBgNh345X mI4SpLWDm1yuYocFahUuQ65EHmZW5onDUun7k3qSI29e080qPc5I7bkee vFv7gN8+g6FWDTkxGyxy4GzV+ruwe5SHbk922ildC9ydE5bYTx8if0loH STX549VO5YP0faRE9ZI73z0wBYPtG6nvSocUV/rrx2O5bhWhZAaC5yWfa g==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="328746229" X-IronPort-AV: 
E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="328746229" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:40 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083157" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:40 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 009/104] KVM: TDX: Add placeholders for TDX VM/vcpu structure Date: Thu, 5 May 2022 11:14:03 -0700 Message-Id: <021879376d16dfd2c1236857a95672275fdc65fa.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Add placeholders TDX VM/vcpu structure that overlays with VMX VM/vcpu structures. Initialize VM structure size and vcpu size/align so that x86 KVM common code knows those size irrespective of VMX or TDX. Those structures will be populated as guest creation logic develops. Add helper functions to check if the VM is guest TD and add conversion functions between KVM VM/VCPU and TDX VM/VCPU. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/main.c | 8 +++--- arch/x86/kvm/vmx/tdx.c | 1 + arch/x86/kvm/vmx/tdx.h | 54 +++++++++++++++++++++++++++++++++++++++++ 3 files changed, 60 insertions(+), 3 deletions(-) create mode 100644 arch/x86/kvm/vmx/tdx.h diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 371dad728166..349534412216 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -5,6 +5,7 @@ #include "vmx.h" #include "nested.h" #include "pmu.h" +#include "tdx.h" =20 static bool __read_mostly enable_tdx =3D IS_ENABLED(CONFIG_INTEL_TDX_HOST); module_param_named(tdx, enable_tdx, bool, 0444); @@ -175,9 +176,10 @@ static int __init vt_init(void) unsigned int vcpu_size, vcpu_align; int r; =20 - vt_x86_ops.vm_size =3D sizeof(struct kvm_vmx); - vcpu_size =3D sizeof(struct vcpu_vmx); - vcpu_align =3D __alignof__(struct vcpu_vmx); + vt_x86_ops.vm_size =3D max(sizeof(struct kvm_vmx), sizeof(struct kvm_tdx)= ); + vcpu_size =3D max(sizeof(struct vcpu_vmx), sizeof(struct vcpu_tdx)); + vcpu_align =3D max(__alignof__(struct vcpu_vmx), + __alignof__(struct vcpu_tdx)); =20 hv_vp_assist_page_init(); vmx_init_early(); diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 9e26e3fa60ee..5d2b4e87b9b7 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -5,6 +5,7 @@ =20 #include "capabilities.h" #include "x86_ops.h" +#include "tdx.h" =20 #undef pr_fmt #define pr_fmt(fmt) "tdx: " fmt diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h new file mode 100644 index 000000000000..060bf48ec3d6 --- /dev/null +++ b/arch/x86/kvm/vmx/tdx.h @@ -0,0 +1,54 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __KVM_X86_TDX_H +#define __KVM_X86_TDX_H + +#ifdef CONFIG_INTEL_TDX_HOST +struct kvm_tdx { + struct kvm kvm; + /* TDX specific members follow. */ +}; + +struct vcpu_tdx { + struct kvm_vcpu vcpu; + /* TDX specific members follow. */ +}; + +static inline bool is_td(struct kvm *kvm) +{ + /* + * TDX VM type isn't defined yet. 
+ * return kvm->arch.vm_type =3D=3D KVM_X86_TDX_VM; + */ + return false; +} + +static inline bool is_td_vcpu(struct kvm_vcpu *vcpu) +{ + return is_td(vcpu->kvm); +} + +static inline struct kvm_tdx *to_kvm_tdx(struct kvm *kvm) +{ + return container_of(kvm, struct kvm_tdx, kvm); +} + +static inline struct vcpu_tdx *to_tdx(struct kvm_vcpu *vcpu) +{ + return container_of(vcpu, struct vcpu_tdx, vcpu); +} +#else +struct kvm_tdx { + struct kvm kvm; +}; + +struct vcpu_tdx { + struct kvm_vcpu vcpu; +}; + +static inline bool is_td(struct kvm *kvm) { return false; } +static inline bool is_td_vcpu(struct kvm_vcpu *vcpu) { return false; } +static inline struct kvm_tdx *to_kvm_tdx(struct kvm *kvm) { return NULL; } +static inline struct vcpu_tdx *to_tdx(struct kvm_vcpu *vcpu) { return NULL= ; } +#endif /* CONFIG_INTEL_TDX_HOST */ + +#endif /* __KVM_X86_TDX_H */ --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D15EFC433FE for ; Thu, 5 May 2022 18:17:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383513AbiEESVc (ORCPT ); Thu, 5 May 2022 14:21:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36322 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383081AbiEEST1 (ORCPT ); Thu, 5 May 2022 14:19:27 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8F8EA14024; Thu, 5 May 2022 11:15:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774544; x=1683310544; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Kj6DgUMuVhHYbdH5XNKmLJ6UF+Qnim280mYzIBKMa0Q=; b=Yt6Mvm1OdkxH8YWncGPNlxKGQ4FETP+OZzXxBbWZPb5/x+IHPMdnM2z/ YXH1AXqFm0zqaEcWo76p84QpDXXt1dXjctRadfP6tYcTjTQCoHQR7Kh1K o+ICOgjriCg8Jg2XL8+i0HO6RE0o/b3mng72Rbl0sEbO6lM0ASY34CYJu gr8j9srBA2aOL1seXAWYorrVoyvAs8BOyqJWWHN4qQAm8+HOmwsPG6p4k PIiG01rgSMSfPz6v9VTjbtkp5W34OvdFoUdbnqUpZ926mBseowod2Vlzm tT1d3FqUKTYSICDGrdbO6aMJTgO23/jjaHK6we6j8GVQ2hZDozwkZcJct Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="328746230" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="328746230" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:41 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083160" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:40 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 010/104] x86/virt/tdx: Add a helper function to return system wide info about TDX module Date: Thu, 5 May 2022 11:14:04 -0700 Message-Id: <2da654ee581dfcc8ea375bc4dbd313038fc1fe5f.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; 
charset="utf-8" From: Isaku Yamahata TDX KVM needs system-wide information about the TDX module, struct tdsysinfo_struct. Add a helper function tdx_get_sysinfo() to return it instead of KVM getting it with various error checks. Move out the struct definition about it to common place tdx_host.h. Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/tdx.h | 55 +++++++++++++++++++++++++++++++++++++ arch/x86/virt/vmx/tdx/tdx.c | 16 +++++++++-- arch/x86/virt/vmx/tdx/tdx.h | 52 ----------------------------------- 3 files changed, 69 insertions(+), 54 deletions(-) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index f8f459e28254..0670c86de015 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -91,17 +91,72 @@ static inline long tdx_kvm_hypercall(unsigned int nr, u= nsigned long p1, #endif /* CONFIG_INTEL_TDX_GUEST && CONFIG_KVM_GUEST */ =20 #ifdef CONFIG_INTEL_TDX_HOST +struct tdx_cpuid_config { + u32 leaf; + u32 sub_leaf; + u32 eax; + u32 ebx; + u32 ecx; + u32 edx; +} __packed; + +#define TDSYSINFO_STRUCT_SIZE 1024 +#define TDSYSINFO_STRUCT_ALIGNMENT 1024 + +struct tdsysinfo_struct { + /* TDX-SEAM Module Info */ + u32 attributes; + u32 vendor_id; + u32 build_date; + u16 build_num; + u16 minor_version; + u16 major_version; + u8 reserved0[14]; + /* Memory Info */ + u16 max_tdmrs; + u16 max_reserved_per_tdmr; + u16 pamt_entry_size; + u8 reserved1[10]; + /* Control Struct Info */ + u16 tdcs_base_size; + u8 reserved2[2]; + u16 tdvps_base_size; + u8 tdvps_xfam_dependent_size; + u8 reserved3[9]; + /* TD Capabilities */ + u64 attributes_fixed0; + u64 attributes_fixed1; + u64 xfam_fixed0; + u64 xfam_fixed1; + u8 reserved4[32]; + u32 num_cpuid_config; + /* + * The actual number of CPUID_CONFIG depends on above + * 'num_cpuid_config'. The size of 'struct tdsysinfo_struct' + * is 1024B defined by TDX architecture. Use a union with + * specific padding to make 'sizeof(struct tdsysinfo_struct)' + * equal to 1024. 
+ */ + union { + struct tdx_cpuid_config cpuid_configs[0]; + u8 reserved5[892]; + }; +} __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT); + bool __seamrr_enabled(void); void tdx_detect_cpu(struct cpuinfo_x86 *c); int tdx_detect(void); int tdx_init(void); bool platform_has_tdx(void); +const struct tdsysinfo_struct *tdx_get_sysinfo(void); #else static inline bool __seamrr_enabled(void) { return false; } static inline void tdx_detect_cpu(struct cpuinfo_x86 *c) { } static inline int tdx_detect(void) { return -ENODEV; } static inline int tdx_init(void) { return -ENODEV; } static inline bool platform_has_tdx(void) { return false; } +struct tdsysinfo_struct; +static inline const struct tdsysinfo_struct *tdx_get_sysinfo(void) { retur= n NULL; } #endif /* CONFIG_INTEL_TDX_HOST */ =20 #endif /* !__ASSEMBLY__ */ diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index e8044114079d..1ef22c445126 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -645,7 +645,7 @@ static int sanitize_cmrs(struct cmr_info *cmr_array, in= t cmr_num) return 0; } =20 -static int tdx_get_sysinfo(void) +static int __tdx_get_sysinfo(void) { struct tdx_module_output out; u64 tdsysinfo_sz, cmr_num; @@ -680,6 +680,18 @@ static int tdx_get_sysinfo(void) return sanitize_cmrs(tdx_cmr_array, cmr_num); } =20 +const struct tdsysinfo_struct *tdx_get_sysinfo(void) +{ + const struct tdsysinfo_struct *r =3D NULL; + + mutex_lock(&tdx_module_lock); + if (tdx_module_status =3D=3D TDX_MODULE_INITIALIZED) + r =3D &tdx_sysinfo; + mutex_unlock(&tdx_module_lock); + return r; +} +EXPORT_SYMBOL_GPL(tdx_get_sysinfo); + /* Check whether one e820 entry is RAM and could be used as TDX memory */ static bool e820_entry_is_ram(struct e820_entry *entry) { @@ -1467,7 +1479,7 @@ static int init_tdx_module(void) goto out; =20 /* Get TDX module information and CMRs */ - ret =3D tdx_get_sysinfo(); + ret =3D __tdx_get_sysinfo(); if (ret) goto out; =20 diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h index 212f83374c0a..b071d299327b 100644 --- a/arch/x86/virt/vmx/tdx/tdx.h +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -37,58 +37,6 @@ struct cmr_info { #define MAX_CMRS 32 #define CMR_INFO_ARRAY_ALIGNMENT 512 =20 -struct cpuid_config { - u32 leaf; - u32 sub_leaf; - u32 eax; - u32 ebx; - u32 ecx; - u32 edx; -} __packed; - -#define TDSYSINFO_STRUCT_SIZE 1024 -#define TDSYSINFO_STRUCT_ALIGNMENT 1024 - -struct tdsysinfo_struct { - /* TDX-SEAM Module Info */ - u32 attributes; - u32 vendor_id; - u32 build_date; - u16 build_num; - u16 minor_version; - u16 major_version; - u8 reserved0[14]; - /* Memory Info */ - u16 max_tdmrs; - u16 max_reserved_per_tdmr; - u16 pamt_entry_size; - u8 reserved1[10]; - /* Control Struct Info */ - u16 tdcs_base_size; - u8 reserved2[2]; - u16 tdvps_base_size; - u8 tdvps_xfam_dependent_size; - u8 reserved3[9]; - /* TD Capabilities */ - u64 attributes_fixed0; - u64 attributes_fixed1; - u64 xfam_fixed0; - u64 xfam_fixed1; - u8 reserved4[32]; - u32 num_cpuid_config; - /* - * The actual number of CPUID_CONFIG depends on above - * 'num_cpuid_config'. The size of 'struct tdsysinfo_struct' - * is 1024B defined by TDX architecture. Use a union with - * specific padding to make 'sizeof(struct tdsysinfo_struct)' - * equal to 1024. 
- */ - union { - struct cpuid_config cpuid_configs[0]; - u8 reserved5[892]; - }; -} __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT); - struct tdmr_reserved_area { u64 offset; u64 size; --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 92319C4332F for ; Thu, 5 May 2022 18:17:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383737AbiEESV0 (ORCPT ); Thu, 5 May 2022 14:21:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36324 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383082AbiEEST1 (ORCPT ); Thu, 5 May 2022 14:19:27 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 94E5A14081; Thu, 5 May 2022 11:15:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774544; x=1683310544; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=eZnEaulazRXd6HNzQIj0/VmrZv00fy3MDt+VIsVcMYQ=; b=CTE9JvCCwW80eyUQQAOGaCOo3/BxBPw57KktG4AL6Y3TQyYpPgaRbYq6 ncCdqNHGi4GZZ2KQomXcmsx/aHWyCscpunawl0Hr0A84mCzdnyL+f+fFm m/GaMPtdWbYZrsVZ5nbIIowNp2oNOiSpVqo6ZslDlc+HZABbJoKyDKvYq 5AfTta4MZ2isjZ+6FXhmxmj2GV4uKzvChjCoDHOsmsE87DgnNGNoXptfr rqM3vAijgNTgy+hfsj/qK6EFTEZpnbu+r4xFg0rA1F4W4V1JhJJjiqRQj I4iFSO39tp+3X96CDFoKVniMhWS8KqK/PShZt2686kGOplfUf7S2oyUib w==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="328746231" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="328746231" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:41 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083163" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:40 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 011/104] KVM: TDX: Initialize TDX module when loading kvm_intel.ko Date: Thu, 5 May 2022 11:14:05 -0700 Message-Id: <752bc449e13cb3e6874ba2d82f790f6f6018813c.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata To use TDX functionality, TDX module needs to be loaded and initialized. A TDX host patch series[1] implements the detection of the TDX module, tdx_detect() and its initialization, tdx_init(). This patch is to call those functions, tdx_detect() and tdx_init(), when loading kvm_intel.ko. Add a hook, kvm_arch_post_hardware_enable_setup, to module initialization while hardware is enabled, i.e. after hardware_enable_all() and before hardware_disable_all(). Because TDX requires all present CPUs to enable VMX (VMXON). 
[1] https://lore.kernel.org/lkml/cover.1649219184.git.kai.huang@intel.com/ Suggested-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/vmx/main.c | 11 ++++++ arch/x86/kvm/vmx/tdx.c | 65 ++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/tdx.h | 4 ++ arch/x86/kvm/x86.c | 8 ++++ include/linux/kvm_host.h | 1 + virt/kvm/kvm_main.c | 11 ++++++ 7 files changed, 101 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 5bea36d2d5a4..24c15cfe6c32 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1543,6 +1543,7 @@ struct kvm_x86_init_ops { int (*cpu_has_kvm_support)(void); int (*disabled_by_bios)(void); int (*hardware_setup)(void); + int (*post_hardware_enable_setup)(void); unsigned int (*handle_intel_pt_intr)(void); =20 struct kvm_x86_ops *runtime_ops; diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 349534412216..ac788af17d92 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -23,6 +23,16 @@ static __init int vt_hardware_setup(void) return 0; } =20 +static int __init vt_post_hardware_enable_setup(void) +{ + enable_tdx =3D enable_tdx && !tdx_module_setup(); + /* + * Even if the TDX module failed to initialize, conventional VMX is + * still available. Keep VMX usable. + */ + return 0; +} + struct kvm_x86_ops vt_x86_ops __initdata =3D { .name =3D "kvm_intel", =20 @@ -165,6 +175,7 @@ struct kvm_x86_init_ops vt_init_ops __initdata =3D { .cpu_has_kvm_support =3D vmx_cpu_has_kvm_support, .disabled_by_bios =3D vmx_disabled_by_bios, .hardware_setup =3D vt_hardware_setup, + .post_hardware_enable_setup =3D vt_post_hardware_enable_setup, .handle_intel_pt_intr =3D NULL, =20 .runtime_ops =3D &vt_x86_ops, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 5d2b4e87b9b7..34aac9b5d43e 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -13,6 +13,71 @@ static u64 hkid_mask __ro_after_init; static u8 hkid_start_pos __ro_after_init; =20 +#define TDX_MAX_NR_CPUID_CONFIGS \ + ((sizeof(struct tdsysinfo_struct) - \ + offsetof(struct tdsysinfo_struct, cpuid_configs)) \ + / sizeof(struct tdx_cpuid_config)) + +struct tdx_capabilities { + u8 tdcs_nr_pages; + u8 tdvpx_nr_pages; + + u64 attrs_fixed0; + u64 attrs_fixed1; + u64 xfam_fixed0; + u64 xfam_fixed1; + + u32 nr_cpuid_configs; + struct tdx_cpuid_config cpuid_configs[TDX_MAX_NR_CPUID_CONFIGS]; +}; + +/* Capabilities of KVM + the TDX module. */ +static struct tdx_capabilities tdx_caps; + +int __init tdx_module_setup(void) +{ + const struct tdsysinfo_struct *tdsysinfo; + int ret =3D 0; + + BUILD_BUG_ON(sizeof(*tdsysinfo) !=3D 1024); + BUILD_BUG_ON(TDX_MAX_NR_CPUID_CONFIGS !=3D 37); + + ret =3D tdx_detect(); + if (ret) { + pr_info("Failed to detect TDX module.\n"); + return ret; + } + + ret =3D tdx_init(); + if (ret) { + pr_info("Failed to initialize TDX module.\n"); + return ret; + } + + tdsysinfo =3D tdx_get_sysinfo(); + if (!tdsysinfo || tdsysinfo->num_cpuid_config > TDX_MAX_NR_CPUID_CONFIGS) + return -EIO; + + tdx_caps =3D (struct tdx_capabilities) { + .tdcs_nr_pages =3D tdsysinfo->tdcs_base_size / PAGE_SIZE, + /* + * TDVPS =3D TDVPR(4K page) + TDVPX(multiple 4K pages). + * -1 for TDVPR.
+ */ + .tdvpx_nr_pages =3D tdsysinfo->tdvps_base_size / PAGE_SIZE - 1, + .attrs_fixed0 =3D tdsysinfo->attributes_fixed0, + .attrs_fixed1 =3D tdsysinfo->attributes_fixed1, + .xfam_fixed0 =3D tdsysinfo->xfam_fixed0, + .xfam_fixed1 =3D tdsysinfo->xfam_fixed1, + .nr_cpuid_configs =3D tdsysinfo->num_cpuid_config, + }; + memcpy(tdx_caps.cpuid_configs, tdsysinfo->cpuid_configs, + tdsysinfo->num_cpuid_config * + sizeof(struct tdx_cpuid_config)); + + return 0; +} + int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { u32 max_pa; diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 060bf48ec3d6..54d7a26ed9ee 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -3,6 +3,8 @@ #define __KVM_X86_TDX_H =20 #ifdef CONFIG_INTEL_TDX_HOST +int tdx_module_setup(void); + struct kvm_tdx { struct kvm kvm; /* TDX specific members follow. */ @@ -37,6 +39,8 @@ static inline struct vcpu_tdx *to_tdx(struct kvm_vcpu *vc= pu) return container_of(vcpu, struct vcpu_tdx, vcpu); } #else +static inline int tdx_module_setup(void) { return -ENODEV; } + struct kvm_tdx { struct kvm kvm; }; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 65f725a49e67..bf77a8b64647 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -11725,6 +11725,14 @@ int kvm_arch_hardware_setup(void *opaque) return 0; } =20 +int kvm_arch_post_hardware_enable_setup(void *opaque) +{ + struct kvm_x86_init_ops *ops =3D opaque; + if (ops->post_hardware_enable_setup) + return ops->post_hardware_enable_setup(); + return 0; +} + void kvm_arch_hardware_unsetup(void) { kvm_unregister_perf_callbacks(); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 1aead3921a16..55dd08cca5d2 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1441,6 +1441,7 @@ void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vc= pu, struct dentry *debugfs_ int kvm_arch_hardware_enable(void); void kvm_arch_hardware_disable(void); int kvm_arch_hardware_setup(void *opaque); +int kvm_arch_post_hardware_enable_setup(void *opaque); void kvm_arch_hardware_unsetup(void); int kvm_arch_check_processor_compat(void); int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 6b9dbc55af9e..6edce5de54ff 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -5686,6 +5686,11 @@ void kvm_unregister_perf_callbacks(void) } #endif =20 +__weak int kvm_arch_post_hardware_enable_setup(void *opaque) +{ + return 0; +} + int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, struct module *module) { @@ -5720,6 +5725,12 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsig= ned vcpu_align, r =3D hardware_enable_all(); if (r) goto out_free_2; + /* + * Arch specific initialization that requires virtualization to be + * enabled, e.g. TDX module initialization requires VMXON on all + * present CPUs.
+ */ + kvm_arch_post_hardware_enable_setup(opaque); hardware_disable_all(); =20 r =3D cpuhp_setup_state_nocalls(CPUHP_AP_KVM_STARTING, "kvm/cpu:starting", --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 46153C433FE for ; Thu, 5 May 2022 18:18:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383326AbiEESVo (ORCPT ); Thu, 5 May 2022 14:21:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36324 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383096AbiEEST3 (ORCPT ); Thu, 5 May 2022 14:19:29 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BCFFC15822; Thu, 5 May 2022 11:15:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774545; x=1683310545; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ipau9dS5D6zvXjQ3tr/rTVE69FF+JZrbVTsRkhZ0HrM=; b=aLzY3uPeTM8nIOfwalUQm9abH+aFQIH8XHqsy1PbRRidMmnQXXJZhQqG nmB26K5Q4j04sqoBIA4ERBspXsvXNypzU2C8UdynuLY7qqpyxdTDTQnNw pyZxCngo1FMMKF3qmd4spP0jExwCPQoj2ArYNefLlKEI5if+kJePTeAsP czuPyjUt9BqBWicPpNNmX+JoB+ZrhJnwtYSUOkg56sYfhmqT9bwEhA8zc 7SzhIe1MV4xqnEvC5TH23L6l7xTIyUMDgT6/IL55e4Km3zsSfmfUpSq9G UpViWw5tGtCJzfIusoAnnVzZnu6t3DnFS9TA1D//01p63I3t7wykI5YRF Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="328746233" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="328746233" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:41 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083166" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:41 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 012/104] KVM: x86: Introduce vm_type to differentiate default VMs from confidential VMs Date: Thu, 5 May 2022 11:14:06 -0700 Message-Id: <7b355e5b431d42dc4b494dd96f0e941ff1fb0094.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Unlike default VMs, confidential VMs (Intel TDX and AMD SEV-ES) don't allow some operations (e.g., memory read/write, register state access, etc). Introduce vm_type to track the type of the VM to x86 KVM. Other arch KVMs already use vm_type, KVM_INIT_VM accepts vm_type, and x86 KVM callback vm_init accepts vm_type. So follow them. Further, a different policy can be made based on vm_type. Define KVM_X86_DEFAULT_VM for default VM as default and define KVM_X86_TDX_VM for Intel TDX VM. The wrapper function will be defined as "bool is_td(kvm) { return vm_type =3D=3D VM_TYPE_TDX; }" Add a capability KVM_CAP_VM_TYPES to effectively allow device model, e.g. 
qemu, to query what VM types are supported by KVM. This (introduce a new capability and add vm_type) is chosen to align with other arch KVMs that have VM types already. Other arch KVMs uses different name to query supported vm types and there is no common name for it, so new name was chosen. Co-developed-by: Xiaoyao Li Signed-off-by: Xiaoyao Li Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- Documentation/virt/kvm/api.rst | 21 +++++++++++++++++++++ arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/include/uapi/asm/kvm.h | 3 +++ arch/x86/kvm/svm/svm.c | 6 ++++++ arch/x86/kvm/vmx/main.c | 1 + arch/x86/kvm/vmx/tdx.h | 6 +----- arch/x86/kvm/vmx/vmx.c | 5 +++++ arch/x86/kvm/vmx/x86_ops.h | 1 + arch/x86/kvm/x86.c | 9 ++++++++- include/uapi/linux/kvm.h | 1 + tools/arch/x86/include/uapi/asm/kvm.h | 3 +++ tools/include/uapi/linux/kvm.h | 1 + 13 files changed, 54 insertions(+), 6 deletions(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index ae19a1da71f4..7fa6850f1e81 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -147,10 +147,31 @@ described as 'basic' will be available. The new VM has no virtual cpus and no memory. You probably want to use 0 as machine type. =20 +X86: +^^^^ + +Supported vm type can be queried from KVM_CAP_VM_TYPES, which returns the +bitmap of supported vm types. The 1-setting of bit @n means vm type with +value @n is supported. + +S390: +^^^^^ + In order to create user controlled virtual machines on S390, check KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as privileged user (CAP_SYS_ADMIN). =20 +MIPS: +^^^^^ + +To use hardware assisted virtualization on MIPS (VZ ASE) rather than +the default trap & emulate implementation (which changes the virtual +memory layout to fit in user mode), check KVM_CAP_MIPS_VZ and use the +flag KVM_VM_MIPS_VZ. + +ARM64: +^^^^^^ + On arm64, the physical address size for a VM (IPA Size limit) is limited to 40bits by default. The limit can be configured if the host supports the extension KVM_CAP_ARM_VM_IPA_SIZE. 
When supported, use diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index 75bc44aa8d51..a97cdb203a16 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -19,6 +19,7 @@ KVM_X86_OP(hardware_disable) KVM_X86_OP(hardware_unsetup) KVM_X86_OP(has_emulated_msr) KVM_X86_OP(vcpu_after_set_cpuid) +KVM_X86_OP(is_vm_type_supported) KVM_X86_OP(vm_init) KVM_X86_OP_OPTIONAL(vm_destroy) KVM_X86_OP_OPTIONAL_RET0(vcpu_precreate) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 24c15cfe6c32..d1c6c529d52a 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1058,6 +1058,7 @@ enum kvm_apicv_inhibit { }; =20 struct kvm_arch { + unsigned long vm_type; unsigned long n_used_mmu_pages; unsigned long n_requested_mmu_pages; unsigned long n_max_mmu_pages; @@ -1338,6 +1339,7 @@ struct kvm_x86_ops { bool (*has_emulated_msr)(struct kvm *kvm, u32 index); void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu); =20 + bool (*is_vm_type_supported)(unsigned long vm_type); unsigned int vm_size; int (*vm_init)(struct kvm *kvm); void (*vm_destroy)(struct kvm *kvm); diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kv= m.h index 21614807a2cb..1c3b97f403f4 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -526,4 +526,7 @@ struct kvm_pmu_event_filter { #define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TS= C) */ #define KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */ =20 +#define KVM_X86_DEFAULT_VM 0 +#define KVM_X86_TDX_VM 1 + #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index d977e4ad133d..833eb557dee7 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4585,6 +4585,11 @@ static void svm_vm_destroy(struct kvm *kvm) sev_vm_destroy(kvm); } =20 +static bool svm_is_vm_type_supported(unsigned long type) +{ + return type =3D=3D KVM_X86_DEFAULT_VM; +} + static int svm_vm_init(struct kvm *kvm) { if (!pause_filter_count || !pause_filter_thresh) @@ -4612,6 +4617,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata =3D { .vcpu_free =3D svm_vcpu_free, .vcpu_reset =3D svm_vcpu_reset, =20 + .is_vm_type_supported =3D svm_is_vm_type_supported, .vm_size =3D sizeof(struct kvm_svm), .vm_init =3D svm_vm_init, .vm_destroy =3D svm_vm_destroy, diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index ac788af17d92..7be4941e4c4d 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -43,6 +43,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .hardware_disable =3D vmx_hardware_disable, .has_emulated_msr =3D vmx_has_emulated_msr, =20 + .is_vm_type_supported =3D vmx_is_vm_type_supported, .vm_size =3D sizeof(struct kvm_vmx), .vm_init =3D vmx_vm_init, .vm_destroy =3D vmx_vm_destroy, diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 54d7a26ed9ee..2f43db5bbefb 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -17,11 +17,7 @@ struct vcpu_tdx { =20 static inline bool is_td(struct kvm *kvm) { - /* - * TDX VM type isn't defined yet. 
- * return kvm->arch.vm_type =3D=3D KVM_X86_TDX_VM; - */ - return false; + return kvm->arch.vm_type =3D=3D KVM_X86_TDX_VM; } =20 static inline bool is_td_vcpu(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index b5846e0fc78f..5312aa6339b3 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7173,6 +7173,11 @@ int vmx_vcpu_create(struct kvm_vcpu *vcpu) return err; } =20 +bool vmx_is_vm_type_supported(unsigned long type) +{ + return type =3D=3D KVM_X86_DEFAULT_VM; +} + #define L1TF_MSG_SMT "L1TF CPU bug present and SMT on, data leak possible.= See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/h= w-vuln/l1tf.html for details.\n" #define L1TF_MSG_L1D "L1TF CPU bug present and virtualization mitigation d= isabled, data leak possible. See CVE-2018-3646 and https://www.kernel.org/d= oc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n" =20 diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index e28f4e0653b8..544ae23998f7 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -25,6 +25,7 @@ void vmx_hardware_unsetup(void); int vmx_check_processor_compatibility(void); int vmx_hardware_enable(void); void vmx_hardware_disable(void); +bool vmx_is_vm_type_supported(unsigned long type); int vmx_vm_init(struct kvm *kvm); void vmx_vm_destroy(struct kvm *kvm); int vmx_vcpu_precreate(struct kvm *kvm); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index bf77a8b64647..7233ce67ae1d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4389,6 +4389,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, lo= ng ext) case KVM_CAP_DISABLE_QUIRKS2: r =3D KVM_X86_VALID_QUIRKS; break; + case KVM_CAP_VM_TYPES: + r =3D BIT(KVM_X86_DEFAULT_VM); + if (static_call(kvm_x86_is_vm_type_supported)(KVM_X86_TDX_VM)) + r |=3D BIT(KVM_X86_TDX_VM); + break; default: break; } @@ -11791,9 +11796,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned lon= g type) int ret; unsigned long flags; =20 - if (type) + if (!static_call(kvm_x86_is_vm_type_supported)(type)) return -EINVAL; =20 + kvm->arch.vm_type =3D type; + ret =3D kvm_page_track_init(kvm); if (ret) goto out; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index dc5837e7bb40..9a3fd7b41fc5 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1154,6 +1154,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_DISABLE_QUIRKS2 213 #define KVM_CAP_VM_TSC_CONTROL 214 #define KVM_CAP_SYSTEM_EVENT_DATA 215 +#define KVM_CAP_VM_TYPES 216 =20 #ifdef KVM_CAP_IRQ_ROUTING =20 diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include= /uapi/asm/kvm.h index bf6e96011dfe..71a5851475e7 100644 --- a/tools/arch/x86/include/uapi/asm/kvm.h +++ b/tools/arch/x86/include/uapi/asm/kvm.h @@ -525,4 +525,7 @@ struct kvm_pmu_event_filter { #define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TS= C) */ #define KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */ =20 +#define KVM_X86_DEFAULT_VM 0 +#define KVM_X86_TDX_VM 1 + #endif /* _ASM_X86_KVM_H */ diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h index 91a6fe4e02c0..110d5822f8b2 100644 --- a/tools/include/uapi/linux/kvm.h +++ b/tools/include/uapi/linux/kvm.h @@ -1144,6 +1144,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_S390_MEM_OP_EXTENSION 211 #define KVM_CAP_PMU_CAPABILITY 212 #define KVM_CAP_DISABLE_QUIRKS2 213 +#define KVM_CAP_VM_TYPES 216 =20 #ifdef KVM_CAP_IRQ_ROUTING =20 --=20 2.25.1 From nobody Wed Dec 17 
07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 20666C433FE for ; Thu, 5 May 2022 18:18:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383479AbiEESVx (ORCPT ); Thu, 5 May 2022 14:21:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36396 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383106AbiEEST3 (ORCPT ); Thu, 5 May 2022 14:19:29 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C86C915837; Thu, 5 May 2022 11:15:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774545; x=1683310545; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=wc7k9mBOhm2yjTFK7ZxMkJGA1JBjjIRE6jOaW+UZtN4=; b=LVp5lEU8dxCQDmu73smJUEWuvFRNjL6hcyuQJJU5mUiEjbb7anyatK42 7JWhlIxNXNGed1lb3YSFueWbxgFIJal6YCtPMIftbSV032Y6bsL6Xgn9D OoFkJ1QPFfHNV8RUgh4JXUYe3e4pDDxtSuZbDaTentQL+27etxK7QLceA rtVmiR+xrDcGxxxSuysrVdoplIpaHqqqUj5YLtY/CCHwX8B4ozg6j+Cyx /AdTW/Hiu4/Un5LRI98Ap3X6yIBB/eabbCH9rkM9oXzP68W3a/q2Xwf6Q uvKH8/Wbp79ABnI35xCPruRn9/Df3d6R/1Pxg0vywAMaTXW+UzCNWdL5Z Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="328746235" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="328746235" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:41 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083170" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:41 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 013/104] KVM: TDX: Make TDX VM type supported Date: Thu, 5 May 2022 11:14:07 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata NOTE: This patch is placed at this position in the patch series so that developers can test the code in the middle of the series, although the series doesn't provide functional features until all of its patches are applied. When merging this patch series, this patch can be moved to the end. As the first step of TDX VM support, report the TDX VM type as supported to the device model, e.g. qemu. The callback to create a guest TD is the vm_init callback for KVM_CREATE_VM. Add a placeholder function and call a function to initialize the TDX module on demand, because in that callback VMX has already been enabled by the hardware_enable callback (vmx_hardware_enable).
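To make the userspace-visible effect concrete, a device model could probe and exercise the new type roughly as follows. This is a hypothetical sketch, not part of the patch; KVM_CAP_VM_TYPES and KVM_X86_TDX_VM are defined locally with the values from patch 012 in case the installed uapi headers predate this series, and error handling is omitted:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    #define KVM_CAP_VM_TYPES 216	/* value from patch 012 */
    #define KVM_X86_TDX_VM 1	/* value from patch 012 */

    int main(void)
    {
    	int kvm = open("/dev/kvm", O_RDWR);
    	int types = ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_VM_TYPES);

    	if (types > 0 && (types & (1 << KVM_X86_TDX_VM))) {
    		/*
    		 * The type is advertised, but with this patch alone
    		 * KVM_CREATE_VM still fails: vt_vm_init() returns
    		 * -EOPNOTSUPP until TD creation is implemented.
    		 */
    		int vm = ioctl(kvm, KVM_CREATE_VM, KVM_X86_TDX_VM);
    		printf("KVM_CREATE_VM(KVM_X86_TDX_VM) = %d\n", vm);
    	}
    	return 0;
    }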
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/main.c | 18 ++++++++++++++++-- arch/x86/kvm/vmx/tdx.c | 6 ++++++ arch/x86/kvm/vmx/vmx.c | 5 ----- arch/x86/kvm/vmx/x86_ops.h | 3 ++- 4 files changed, 24 insertions(+), 8 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 7be4941e4c4d..47bfa94e538e 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -10,6 +10,12 @@ static bool __read_mostly enable_tdx =3D IS_ENABLED(CONFIG_INTEL_TDX_HOST); module_param_named(tdx, enable_tdx, bool, 0444); =20 +static bool vt_is_vm_type_supported(unsigned long type) +{ + return type =3D=3D KVM_X86_DEFAULT_VM || + (enable_tdx && tdx_is_vm_type_supported(type)); +} + static __init int vt_hardware_setup(void) { int ret; @@ -33,6 +39,14 @@ static int __init vt_post_hardware_enable_setup(void) return 0; } =20 +static int vt_vm_init(struct kvm *kvm) +{ + if (is_td(kvm)) + return -EOPNOTSUPP; /* Not ready to create guest TD yet. */ + + return vmx_vm_init(kvm); +} + struct kvm_x86_ops vt_x86_ops __initdata =3D { .name =3D "kvm_intel", =20 @@ -43,9 +57,9 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .hardware_disable =3D vmx_hardware_disable, .has_emulated_msr =3D vmx_has_emulated_msr, =20 - .is_vm_type_supported =3D vmx_is_vm_type_supported, + .is_vm_type_supported =3D vt_is_vm_type_supported, .vm_size =3D sizeof(struct kvm_vmx), - .vm_init =3D vmx_vm_init, + .vm_init =3D vt_vm_init, .vm_destroy =3D vmx_vm_destroy, =20 .vcpu_precreate =3D vmx_vcpu_precreate, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 34aac9b5d43e..99bda653ef35 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -79,6 +79,12 @@ int __init tdx_module_setup(void) return 0; } =20 +bool tdx_is_vm_type_supported(unsigned long type) +{ + /* enable_tdx check is done by the caller. */ + return type =3D=3D KVM_X86_TDX_VM; +} + int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { u32 max_pa; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 5312aa6339b3..b5846e0fc78f 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7173,11 +7173,6 @@ int vmx_vcpu_create(struct kvm_vcpu *vcpu) return err; } =20 -bool vmx_is_vm_type_supported(unsigned long type) -{ - return type =3D=3D KVM_X86_DEFAULT_VM; -} - #define L1TF_MSG_SMT "L1TF CPU bug present and SMT on, data leak possible.= See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/h= w-vuln/l1tf.html for details.\n" #define L1TF_MSG_L1D "L1TF CPU bug present and virtualization mitigation d= isabled, data leak possible. 
See CVE-2018-3646 and https://www.kernel.org/d= oc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n" =20 diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 544ae23998f7..9d8e8abac6d7 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -25,7 +25,6 @@ void vmx_hardware_unsetup(void); int vmx_check_processor_compatibility(void); int vmx_hardware_enable(void); void vmx_hardware_disable(void); -bool vmx_is_vm_type_supported(unsigned long type); int vmx_vm_init(struct kvm *kvm); void vmx_vm_destroy(struct kvm *kvm); int vmx_vcpu_precreate(struct kvm *kvm); @@ -131,8 +130,10 @@ void vmx_setup_mce(struct kvm_vcpu *vcpu); =20 #ifdef CONFIG_INTEL_TDX_HOST int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops); +bool tdx_is_vm_type_supported(unsigned long type); #else static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return= 0; } +static inline bool tdx_is_vm_type_supported(unsigned long type) { return f= alse; } #endif =20 #endif /* __KVM_X86_VMX_X86_OPS_H */ --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1DDB5C433FE for ; Thu, 5 May 2022 18:16:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1354421AbiEESUa (ORCPT ); Thu, 5 May 2022 14:20:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36322 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383102AbiEEST3 (ORCPT ); Thu, 5 May 2022 14:19:29 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C82D815829; Thu, 5 May 2022 11:15:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774545; x=1683310545; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Wk7ogtc5Ub+4u4KfteyYsotpfo7mJ/YCVxSbhwLcn7s=; b=IZ07dTiu9q0tl2oc2Bek39VlXLVgD+r5z8PQOY8yA4SXI7OpJTV2gaak rQY3xbc8SiWyEkyS193VAuUxC0HWvsD2Ywt+jjBpyPcOuEsjwiciVjbz5 1WFoAq/8R8QGi60vXe03FKGvSSBP99f+DBDY84wkQ0RHJ85CpRc2IyjmY JfAhZYezDM/h/13GH4Yq4DBKSghIiRp98vc+F+awP+tNtVSn3MJTkzM/r /OiCex286iGP9Em9YfKDfn7eA+mFoGumhewFx5P8VJzeqmZho3FNhxms9 7Upw2fc42sZy+FXDo6JtYYFRryKkER70ZradMpRIPuUB0U+A7YpVhPL/E w==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="328746236" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="328746236" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:41 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083173" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:41 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 014/104] [MARKER] The start of TDX KVM patch series: TDX architectural definitions Date: Thu, 5 May 2022 11:14:08 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: 
bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This empty commit is to mark the start of the patch series of TDX architectural definitions. Signed-off-by: Isaku Yamahata --- .../virt/kvm/intel-tdx-layer-status.rst | 29 +++++++++++++++++++ 1 file changed, 29 insertions(+) create mode 100644 Documentation/virt/kvm/intel-tdx-layer-status.rst diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentat= ion/virt/kvm/intel-tdx-layer-status.rst new file mode 100644 index 000000000000..b7a14bc73853 --- /dev/null +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -0,0 +1,29 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D +Intel Trust Domain Extensions (TDX) +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +Layer status +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D +What qemu can do +---------------- +- TDX VM TYPE is exposed to Qemu. +- Qemu can try to create a VM of TDX VM type, which then fails. + +Patch Layer status +------------------ + Patch layer Status +* TDX, VMX coexistence: Applied +* TDX architectural definitions: Applying +* TD VM creation/destruction: Not yet +* TD vcpu creation/destruction: Not yet +* TDX EPT violation: Not yet +* TD finalization: Not yet +* TD vcpu enter/exit: Not yet +* TD vcpu interrupts/exit/hypercall: Not yet + +* KVM MMU GPA shared bits: Not yet +* KVM TDP refactoring for TDX: Not yet +* KVM TDP MMU hooks: Not yet +* KVM TDP MMU MapGPA: Not yet --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D2CDAC433F5 for ; Thu, 5 May 2022 18:17:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383277AbiEESUu (ORCPT ); Thu, 5 May 2022 14:20:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36392 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383104AbiEEST3 (ORCPT ); Thu, 5 May 2022 14:19:29 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4FEE015A09; Thu, 5 May 2022 11:15:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774547; x=1683310547; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=hCnMuBz/aWLrv0xY1KyhBL7QqdXE6/57a1wmJVcYfBw=; b=NE6EI6C7VGZaohlz/TwX4E8l+B9a7nfcXoIU954skG4zJkA9QfDKu4Qj K1VhtpC1BI+TXujEZFEsR/ie6NGbTr/qHPcDyls5DoAjIcIld4wUvVj2k Ibxw8m6lkWmfQ3za3/MiVUApwkXaw0RPOZV5TXCcvFhSLNg7N3RI6+K9n /lVIcTeVDKDtXtIe8veVVDmotlVLZlya7wqiUS08+lpSKWKubKAvetIk5 d1McuzImV71bavdNMadXN8RHEepeL6JDE76wo+iJLLEVoShjhP4VIk5D5 UvMZ78V5P3zgA4seLnzRz/pa36gyXKBRnHzaqtQeEbTvKYcvC8OUU2XM0 w==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="328746239" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="328746239" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:41 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083176" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by
fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:41 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 015/104] KVM: TDX: Define TDX architectural definitions Date: Thu, 5 May 2022 11:14:09 -0700 Message-Id: <2653916aee0b2b58979b8aae3815f4e73251ef65.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Add architectural definitions for KVM to issue the TDX SEAMCALLs: structures and values that are architecturally defined in the TDX module specification, in the ABI Reference chapter. Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/vmx/tdx_arch.h | 157 ++++++++++++++++++++++++++++++++++++ 1 file changed, 157 insertions(+) create mode 100644 arch/x86/kvm/vmx/tdx_arch.h diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h new file mode 100644 index 000000000000..94258056d742 --- /dev/null +++ b/arch/x86/kvm/vmx/tdx_arch.h @@ -0,0 +1,157 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* architectural constants/data definitions for TDX SEAMCALLs */ + +#ifndef __KVM_X86_TDX_ARCH_H +#define __KVM_X86_TDX_ARCH_H + +#include + +/* + * TDX SEAMCALL API function leaves + */ +#define TDH_VP_ENTER 0 +#define TDH_MNG_ADDCX 1 +#define TDH_MEM_PAGE_ADD 2 +#define TDH_MEM_SEPT_ADD 3 +#define TDH_VP_ADDCX 4 +#define TDH_MEM_PAGE_RELOCATE 5 +#define TDH_MEM_PAGE_AUG 6 +#define TDH_MEM_RANGE_BLOCK 7 +#define TDH_MNG_KEY_CONFIG 8 +#define TDH_MNG_CREATE 9 +#define TDH_VP_CREATE 10 +#define TDH_MNG_RD 11 +#define TDH_MR_EXTEND 16 +#define TDH_MR_FINALIZE 17 +#define TDH_VP_FLUSH 18 +#define TDH_MNG_VPFLUSHDONE 19 +#define TDH_MNG_KEY_FREEID 20 +#define TDH_MNG_INIT 21 +#define TDH_VP_INIT 22 +#define TDH_VP_RD 26 +#define TDH_MNG_KEY_RECLAIMID 27 +#define TDH_PHYMEM_PAGE_RECLAIM 28 +#define TDH_MEM_PAGE_REMOVE 29 +#define TDH_MEM_SEPT_REMOVE 30 +#define TDH_MEM_TRACK 38 +#define TDH_MEM_RANGE_UNBLOCK 39 +#define TDH_PHYMEM_CACHE_WB 40 +#define TDH_PHYMEM_PAGE_WBINVD 41 +#define TDH_VP_WR 43 +#define TDH_SYS_LP_SHUTDOWN 44 + +#define TDG_VP_VMCALL_GET_TD_VM_CALL_INFO 0x10000 +#define TDG_VP_VMCALL_MAP_GPA 0x10001 +#define TDG_VP_VMCALL_GET_QUOTE 0x10002 +#define TDG_VP_VMCALL_REPORT_FATAL_ERROR 0x10003 +#define TDG_VP_VMCALL_SETUP_EVENT_NOTIFY_INTERRUPT 0x10004 + +/* TDX control structure (TDR/TDCS/TDVPS) field access codes */ +#define TDX_NON_ARCH BIT_ULL(63) +#define TDX_CLASS_SHIFT 56 +#define TDX_FIELD_MASK GENMASK_ULL(31, 0) + +#define __BUILD_TDX_FIELD(non_arch, class, field) \ + (((non_arch) ?
TDX_NON_ARCH : 0) | \ + ((u64)(class) << TDX_CLASS_SHIFT) | \ + ((u64)(field) & TDX_FIELD_MASK)) + +#define BUILD_TDX_FIELD(class, field) \ + __BUILD_TDX_FIELD(false, (class), (field)) + +#define BUILD_TDX_FIELD_NON_ARCH(class, field) \ + __BUILD_TDX_FIELD(true, (class), (field)) + + +/* @field is the VMCS field encoding */ +#define TDVPS_VMCS(field) BUILD_TDX_FIELD(0, (field)) + +enum tdx_guest_other_state { + TD_VCPU_STATE_DETAILS_NON_ARCH =3D 0x100, +}; + +union tdx_vcpu_state_details { + struct { + u64 vmxip : 1; + u64 reserved : 63; + }; + u64 full; +}; + +/* @field is any of enum tdx_guest_other_state */ +#define TDVPS_STATE(field) BUILD_TDX_FIELD(17, (field)) +#define TDVPS_STATE_NON_ARCH(field) BUILD_TDX_FIELD_NON_ARCH(17, (field)) + +/* Management class fields */ +enum tdx_guest_management { + TD_VCPU_PEND_NMI =3D 11, +}; + +/* @field is any of enum tdx_guest_management */ +#define TDVPS_MANAGEMENT(field) BUILD_TDX_FIELD(32, (field)) + +enum tdx_tdcs_execution_control { + TD_TDCS_EXEC_TSC_OFFSET =3D 10, +}; + +/* @field is any of enum tdx_tdcs_execution_control */ +#define TDCS_EXEC(field) BUILD_TDX_FIELD(17, (field)) + +#define TDX_EXTENDMR_CHUNKSIZE 256 + +struct tdx_cpuid_value { + u32 eax; + u32 ebx; + u32 ecx; + u32 edx; +} __packed; + +#define TDX_TD_ATTRIBUTE_DEBUG BIT_ULL(0) +#define TDX_TD_ATTRIBUTE_PKS BIT_ULL(30) +#define TDX_TD_ATTRIBUTE_KL BIT_ULL(31) +#define TDX_TD_ATTRIBUTE_PERFMON BIT_ULL(63) + +/* + * TD_PARAMS is provided as an input to TDH_MNG_INIT, the size of which is= 1024B. + */ +struct td_params { + u64 attributes; + u64 xfam; + u32 max_vcpus; + u32 reserved0; + + u64 eptp_controls; + u64 exec_controls; + u16 tsc_frequency; + u8 reserved1[38]; + + u64 mrconfigid[6]; + u64 mrowner[6]; + u64 mrownerconfig[6]; + u64 reserved2[4]; + + union { + struct tdx_cpuid_value cpuid_values[0]; + u8 reserved3[768]; + }; +} __packed __aligned(1024); + +/* + * Guest uses MAX_PA for GPAW when set. + * 0: GPA.SHARED bit is GPA[47] + * 1: GPA.SHARED bit is GPA[51] + */ +#define TDX_EXEC_CONTROL_MAX_GPAW BIT_ULL(0) + +/* + * TDX requires the frequency to be defined in units of 25MHz, which is the + * frequency of the core crystal clock on TDX-capable platforms, i.e. the = TDX + * module can only program frequencies that are multiples of 25MHz. The + * frequency must be between 100mhz and 10ghz (inclusive). 
+ */ +#define TDX_TSC_KHZ_TO_25MHZ(tsc_in_khz) ((tsc_in_khz) / (25 * 1000)) +#define TDX_TSC_25MHZ_TO_KHZ(tsc_in_25mhz) ((tsc_in_25mhz) * (25 * 1000)) +#define TDX_MIN_TSC_FREQUENCY_KHZ (100 * 1000) +#define TDX_MAX_TSC_FREQUENCY_KHZ (10 * 1000 * 1000) + +#endif /* __KVM_X86_TDX_ARCH_H */ --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8F94BC433FE for ; Thu, 5 May 2022 18:17:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383381AbiEESUl (ORCPT ); Thu, 5 May 2022 14:20:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36398 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383105AbiEEST3 (ORCPT ); Thu, 5 May 2022 14:19:29 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5910715A3F; Thu, 5 May 2022 11:15:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774547; x=1683310547; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Ammg3A2Mnovfv+rRVb0tQyZFWoZUYbJqVmLEz3yGWKI=; b=VcKxQDnSY0QLG1DkZyBUCiWPXOD6imv7XuqrPFR7RL5nqzdVzGpiEORU NqoJh8RbrI6Ca9AkPpN4KhcxW+v6G08wTbBYbC7h7Zf8UE9ca7OzFkNw/ H7FrA9ZhZhZMdFpxERKDD7U+yPDuyk4isU8/0rD4Ar8Tqd1NAZ+2ze5hl YwhetPm3RSXmNgZ7TFWLJ9xQugldF4ysMqckvbc/axDvrV108D1iAoN0k rZ0jWT5lGzsZxyEuzp0NpIT14MJva4xU25VI3dFBG+/Iz1ceP6dGmPFCY WQHvlDjbSipajWxNb54/ZHLxEu2CuIqNHjtlUghJHGUpxDGbXS+oesh7+ A==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="328746240" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="328746240" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:41 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083182" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:41 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 016/104] KVM: TDX: Add TDX "architectural" error codes Date: Thu, 5 May 2022 11:14:10 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Add error codes for the TDX SEAMCALLs both for TDX VMM side for TDH SEAMCALL and TDX guest side for TDG.VP.VMCALL. KVM issues the TDX SEAMCALLs and checks its error code. KVM handles hypercall from the TDX guest and may return an error. So error code for the TDX guest is also needed. TDX SEAMCALL uses bits 31:0 to return more information, so these error codes will only exactly match RAX[63:32]. Error codes for TDG.VP.VMCALL is defined by TDX Guest-Host-Communication interface spec. 
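One practical consequence worth spelling out: because RAX[31:0] carries call-specific detail, comparisons against these constants should mask it off first. A minimal sketch of that pattern, using only the mask and codes added above:

    /* Compare only RAX[63:32], the architectural part of the status. */
    static inline bool tdx_status_is(u64 rax, u64 status)
    {
    	return (rax & TDX_SEAMCALL_STATUS_MASK) == status;
    }

    /* e.g.: tdx_status_is(err, TDX_EPT_WALK_FAILED) */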
Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/vmx/tdx_errno.h | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) create mode 100644 arch/x86/kvm/vmx/tdx_errno.h diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h new file mode 100644 index 000000000000..5c878488795d --- /dev/null +++ b/arch/x86/kvm/vmx/tdx_errno.h @@ -0,0 +1,29 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* architectural status code for SEAMCALL */ + +#ifndef __KVM_X86_TDX_ERRNO_H +#define __KVM_X86_TDX_ERRNO_H + +#define TDX_SEAMCALL_STATUS_MASK 0xFFFFFFFF00000000ULL + +/* + * TDX SEAMCALL Status Codes (returned in RAX) + */ +#define TDX_SUCCESS 0x0000000000000000ULL +#define TDX_NON_RECOVERABLE_VCPU 0x4000000100000000ULL +#define TDX_INTERRUPTED_RESUMABLE 0x8000000300000000ULL +#define TDX_LIFECYCLE_STATE_INCORRECT 0xC000060700000000ULL +#define TDX_VCPU_NOT_ASSOCIATED 0x8000070200000000ULL +#define TDX_KEY_GENERATION_FAILED 0x8000080000000000ULL +#define TDX_KEY_STATE_INCORRECT 0xC000081100000000ULL +#define TDX_KEY_CONFIGURED 0x0000081500000000ULL +#define TDX_EPT_WALK_FAILED 0xC0000B0000000000ULL + +/* + * TDG.VP.VMCALL Status Codes (returned in R10) + */ +#define TDG_VP_VMCALL_SUCCESS 0x0000000000000000ULL +#define TDG_VP_VMCALL_INVALID_OPERAND 0x8000000000000000ULL +#define TDG_VP_VMCALL_TDREPORT_FAILED 0x8000000000000001ULL + +#endif /* __KVM_X86_TDX_ERRNO_H */ --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 65D9FC433FE for ; Thu, 5 May 2022 18:18:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235397AbiEESWH (ORCPT ); Thu, 5 May 2022 14:22:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36394 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383113AbiEEST3 (ORCPT ); Thu, 5 May 2022 14:19:29 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 63A3015FD5; Thu, 5 May 2022 11:15:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774547; x=1683310547; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=amIDCEje2LpizIwCgnc/FPRrEbXu+uG3+NUYbq3D7fI=; b=E3R17JMDG7a6D+fgI6LY2/3T5nT5sv7IuXp1BolnLyrjGDrgk6AT+k0d CQZonc7z/SsuC6tbEKP8L5IQhQZKPKLtrKsZ3yGQmZWKPmic+wKYVVboq GWnwsu04leoty/VN6h+lQdIY3tn2Jp24SOxeEPtQ98hQAVqKE7L3EbHgK JdoVtUY8gtzZKI5/k5FbvC06L+omcBcSyQeqqkheMPrI0ojZE5IQml223 p3fyhrqEePHA2lmUAhNAhAgRTQPNP1ZGoQ304ikxuThZPs+a3RZI7sPw0 bJMH+6Q8ZJTyqcW9KOOkXc41v/C/J7cawO4CbQN98ybegosFEuv1fDnAm A==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="328746241" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="328746241" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:42 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083185" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:41 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org 
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 017/104] KVM: TDX: Add C wrapper functions for SEAMCALLs to the TDX module Date: Thu, 5 May 2022 11:14:11 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata A VMM interacts with the TDX module using a new instruction (SEAMCALL). A TDX VMM uses SEAMCALLs where a VMX VMM would have directly interacted with VMX instructions. For instance, a TDX VMM does not have full access to the VM control structure corresponding to the VMX VMCS. Instead, a VMM induces the TDX module to act on its behalf via SEAMCALLs. Export __seamcall and define C wrapper functions for the SEAMCALLs for readability. Some SEAMCALL APIs donate pages to the TDX module or to a guest TD. The pages are encrypted with the TDX private host key ID set in the high bits of the physical address. If any modified cache lines may exist for these pages, flush them to memory by clflush_cache_range(). Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/tdx.h | 2 + arch/x86/kvm/vmx/tdx_ops.h | 185 +++++++++++++++++++++++++++++++ arch/x86/virt/vmx/tdx/seamcall.S | 1 + 3 files changed, 188 insertions(+) create mode 100644 arch/x86/kvm/vmx/tdx_ops.h diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 0670c86de015..3a4fb2844f66 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -149,6 +149,8 @@ int tdx_detect(void); int tdx_init(void); bool platform_has_tdx(void); const struct tdsysinfo_struct *tdx_get_sysinfo(void); +u64 __seamcall(u64 op, u64 rcx, u64 rdx, u64 r8, u64 r9, + struct tdx_module_output *out); #else static inline bool __seamrr_enabled(void) { return false; } static inline void tdx_detect_cpu(struct cpuinfo_x86 *c) { } diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h new file mode 100644 index 000000000000..85adbf49c277 --- /dev/null +++ b/arch/x86/kvm/vmx/tdx_ops.h @@ -0,0 +1,185 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* constants/data definitions for TDX SEAMCALLs */ + +#ifndef __KVM_X86_TDX_OPS_H +#define __KVM_X86_TDX_OPS_H + +#include + +#include +#include +#include + +#include "tdx_errno.h" +#include "tdx_arch.h" + +#ifdef CONFIG_INTEL_TDX_HOST + +static inline u64 tdh_mng_addcx(hpa_t tdr, hpa_t addr) +{ + clflush_cache_range(__va(addr), PAGE_SIZE); + return __seamcall(TDH_MNG_ADDCX, addr, tdr, 0, 0, NULL); +} + +static inline u64 tdh_mem_page_add(hpa_t tdr, gpa_t gpa, hpa_t hpa, hpa_t = source, + struct tdx_module_output *out) +{ + clflush_cache_range(__va(hpa), PAGE_SIZE); + return __seamcall(TDH_MEM_PAGE_ADD, gpa, tdr, hpa, source, out); +} + +static inline u64 tdh_mem_sept_add(hpa_t tdr, gpa_t gpa, int level, hpa_t = page, + struct tdx_module_output *out) +{ + clflush_cache_range(__va(page), PAGE_SIZE); + return __seamcall(TDH_MEM_SEPT_ADD, gpa | level, tdr, page, 0, out); +} + +static inline u64 tdh_mem_sept_remove(hpa_t tdr, gpa_t gpa, int level, + struct tdx_module_output *out) +{ + return __seamcall(TDH_MEM_SEPT_REMOVE, gpa | level, tdr, 0, 0, out); +} + +static inline u64 tdh_vp_addcx(hpa_t tdvpr, hpa_t addr) +{ + clflush_cache_range(__va(addr), PAGE_SIZE); + return __seamcall(TDH_VP_ADDCX, addr, tdvpr, 0, 0, NULL); +} + +static inline u64 tdh_mem_page_relocate(hpa_t tdr,
gpa_t gpa, hpa_t hpa, + struct tdx_module_output *out) +{ + clflush_cache_range(__va(hpa), PAGE_SIZE); + return __seamcall(TDH_MEM_PAGE_RELOCATE, gpa, tdr, hpa, 0, out); +} + +static inline u64 tdh_mem_page_aug(hpa_t tdr, gpa_t gpa, hpa_t hpa, + struct tdx_module_output *out) +{ + clflush_cache_range(__va(hpa), PAGE_SIZE); + return __seamcall(TDH_MEM_PAGE_AUG, gpa, tdr, hpa, 0, out); +} + +static inline u64 tdh_mem_range_block(hpa_t tdr, gpa_t gpa, int level, + struct tdx_module_output *out) +{ + return __seamcall(TDH_MEM_RANGE_BLOCK, gpa | level, tdr, 0, 0, out); +} + +static inline u64 tdh_mng_key_config(hpa_t tdr) +{ + return __seamcall(TDH_MNG_KEY_CONFIG, tdr, 0, 0, 0, NULL); +} + +static inline u64 tdh_mng_create(hpa_t tdr, int hkid) +{ + clflush_cache_range(__va(tdr), PAGE_SIZE); + return __seamcall(TDH_MNG_CREATE, tdr, hkid, 0, 0, NULL); +} + +static inline u64 tdh_vp_create(hpa_t tdr, hpa_t tdvpr) +{ + clflush_cache_range(__va(tdvpr), PAGE_SIZE); + return __seamcall(TDH_VP_CREATE, tdvpr, tdr, 0, 0, NULL); +} + +static inline u64 tdh_mng_rd(hpa_t tdr, u64 field, struct tdx_module_outpu= t *out) +{ + return __seamcall(TDH_MNG_RD, tdr, field, 0, 0, out); +} + +static inline u64 tdh_mr_extend(hpa_t tdr, gpa_t gpa, + struct tdx_module_output *out) +{ + return __seamcall(TDH_MR_EXTEND, gpa, tdr, 0, 0, out); +} + +static inline u64 tdh_mr_finalize(hpa_t tdr) +{ + return __seamcall(TDH_MR_FINALIZE, tdr, 0, 0, 0, NULL); +} + +static inline u64 tdh_vp_flush(hpa_t tdvpr) +{ + return __seamcall(TDH_VP_FLUSH, tdvpr, 0, 0, 0, NULL); +} + +static inline u64 tdh_mng_vpflushdone(hpa_t tdr) +{ + return __seamcall(TDH_MNG_VPFLUSHDONE, tdr, 0, 0, 0, NULL); +} + +static inline u64 tdh_mng_key_freeid(hpa_t tdr) +{ + return __seamcall(TDH_MNG_KEY_FREEID, tdr, 0, 0, 0, NULL); +} + +static inline u64 tdh_mng_init(hpa_t tdr, hpa_t td_params, + struct tdx_module_output *out) +{ + return __seamcall(TDH_MNG_INIT, tdr, td_params, 0, 0, out); +} + +static inline u64 tdh_vp_init(hpa_t tdvpr, u64 rcx) +{ + return __seamcall(TDH_VP_INIT, tdvpr, rcx, 0, 0, NULL); +} + +static inline u64 tdh_vp_rd(hpa_t tdvpr, u64 field, + struct tdx_module_output *out) +{ + return __seamcall(TDH_VP_RD, tdvpr, field, 0, 0, out); +} + +static inline u64 tdh_mng_key_reclaimid(hpa_t tdr) +{ + return __seamcall(TDH_MNG_KEY_RECLAIMID, tdr, 0, 0, 0, NULL); +} + +static inline u64 tdh_phymem_page_reclaim(hpa_t page, + struct tdx_module_output *out) +{ + return __seamcall(TDH_PHYMEM_PAGE_RECLAIM, page, 0, 0, 0, out); +} + +static inline u64 tdh_mem_page_remove(hpa_t tdr, gpa_t gpa, int level, + struct tdx_module_output *out) +{ + return __seamcall(TDH_MEM_PAGE_REMOVE, gpa | level, tdr, 0, 0, out); +} + +static inline u64 tdh_sys_lp_shutdown(void) +{ + return __seamcall(TDH_SYS_LP_SHUTDOWN, 0, 0, 0, 0, NULL); +} + +static inline u64 tdh_mem_track(hpa_t tdr) +{ + return __seamcall(TDH_MEM_TRACK, tdr, 0, 0, 0, NULL); +} + +static inline u64 tdh_mem_range_unblock(hpa_t tdr, gpa_t gpa, int level, + struct tdx_module_output *out) +{ + return __seamcall(TDH_MEM_RANGE_UNBLOCK, gpa | level, tdr, 0, 0, out); +} + +static inline u64 tdh_phymem_cache_wb(bool resume) +{ + return __seamcall(TDH_PHYMEM_CACHE_WB, resume ? 
1 : 0, 0, 0, 0, NULL); +} + +static inline u64 tdh_phymem_page_wbinvd(hpa_t page) +{ + return __seamcall(TDH_PHYMEM_PAGE_WBINVD, page, 0, 0, 0, NULL); +} + +static inline u64 tdh_vp_wr(hpa_t tdvpr, u64 field, u64 val, u64 mask, + struct tdx_module_output *out) +{ + return __seamcall(TDH_VP_WR, tdvpr, field, val, mask, out); +} +#endif /* CONFIG_INTEL_TDX_HOST */ + +#endif /* __KVM_X86_TDX_OPS_H */ diff --git a/arch/x86/virt/vmx/tdx/seamcall.S b/arch/x86/virt/vmx/tdx/seamc= all.S index 8df7a16f7685..b4fc8182e1cf 100644 --- a/arch/x86/virt/vmx/tdx/seamcall.S +++ b/arch/x86/virt/vmx/tdx/seamcall.S @@ -50,3 +50,4 @@ SYM_FUNC_START(__seamcall) FRAME_END ret SYM_FUNC_END(__seamcall) +EXPORT_SYMBOL_GPL(__seamcall) --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9C30DC433EF for ; Thu, 5 May 2022 18:18:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236351AbiEESV5 (ORCPT ); Thu, 5 May 2022 14:21:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36322 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383116AbiEEST3 (ORCPT ); Thu, 5 May 2022 14:19:29 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B9E1318E11; Thu, 5 May 2022 11:15:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774548; x=1683310548; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=5Wv/IPFVTRe4jyjKqgjJvRX4XujCf32I2mYt7uAyjxc=; b=LSaAo5qWfvnwSsvP1iKSjcm9P5eoebmLnJcYczlbGPV1hd14fwG4kOxv cHtF6AnEiQrxXcHtVlnhc5TAAS3Jl515Z9Zt4Cf9o2THrJiE8IBehr22l KYN8fLXVqk/5IpC9gotJucuwiay1p2ZyLihgzVzggEAkFheO07TzfVaDX FUtR1sIBDXMvLGyfAG752q8sG32wq8XUdjfb6t3zoyDKsF0CNhWcRXiFa fjhNIw5TaY4jeH2ll9V37JZZIjwEUNddPZIoYySmam0z/5csiuSqquy5Y gz1Q+YEhKE2BQQ0to03qN2N99lBB2xrPEs+agql8R1QtnEEAA+OR91/3n g==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="328746246" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="328746246" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:42 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083188" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:42 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 018/104] KVM: TDX: Add helper functions to print TDX SEAMCALL error Date: Thu, 5 May 2022 11:14:12 -0700 Message-Id: <5171b6919ddf95e6b43c7d28b10caba61a8d83ff.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Add helper functions to print out errors from the TDX module in a uniform manner. 
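As a usage illustration (not part of this patch): a caller is expected to hand the SEAMCALL leaf, the raw status, and the output registers through unchanged. A hypothetical wrapper over __seamcall from patch 017 might look like:

    /* Hypothetical call site: report any nonzero SEAMCALL status uniformly. */
    static u64 tdx_seamcall(u64 op, u64 rcx, u64 rdx, u64 r8, u64 r9,
    			struct tdx_module_output *out)
    {
    	u64 err = __seamcall(op, rcx, rdx, r8, r9, out);

    	if (err)
    		pr_tdx_error(op, err, out);
    	return err;
    }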
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/Makefile | 2 +- arch/x86/kvm/vmx/tdx_error.c | 22 ++++++++++++++++++++++ arch/x86/kvm/vmx/tdx_ops.h | 3 +++ 3 files changed, 26 insertions(+), 1 deletion(-) create mode 100644 arch/x86/kvm/vmx/tdx_error.c diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index e2c05195cb95..f1ad445df505 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -24,7 +24,7 @@ kvm-$(CONFIG_KVM_XEN) +=3D xen.o kvm-intel-y +=3D vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \ vmx/evmcs.o vmx/nested.o vmx/posted_intr.o vmx/main.o kvm-intel-$(CONFIG_X86_SGX_KVM) +=3D vmx/sgx.o -kvm-intel-$(CONFIG_INTEL_TDX_HOST) +=3D vmx/tdx.o +kvm-intel-$(CONFIG_INTEL_TDX_HOST) +=3D vmx/tdx.o vmx/tdx_error.o =20 kvm-amd-y +=3D svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o = svm/sev.o =20 diff --git a/arch/x86/kvm/vmx/tdx_error.c b/arch/x86/kvm/vmx/tdx_error.c new file mode 100644 index 000000000000..61ed855d1188 --- /dev/null +++ b/arch/x86/kvm/vmx/tdx_error.c @@ -0,0 +1,22 @@ +// SPDX-License-Identifier: GPL-2.0 +/* functions to record TDX SEAMCALL error */ + +#include +#include + +#include "tdx_ops.h" + +void pr_tdx_error(u64 op, u64 error_code, const struct tdx_module_output *= out) +{ + if (!out) { + pr_err_ratelimited("SEAMCALL[%lld] failed: 0x%llx\n", + op, error_code); + return; + } + + pr_err_ratelimited( + "SEAMCALL[%lld] failed: 0x%llx " + "RCX 0x%llx, RDX 0x%llx, R8 0x%llx, R9 0x%llx, R10 0x%llx, R11 0x%llx\n", + op, error_code, + out->rcx, out->rdx, out->r8, out->r9, out->r10, out->r11); +} diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h index 85adbf49c277..8cc2f01c509b 100644 --- a/arch/x86/kvm/vmx/tdx_ops.h +++ b/arch/x86/kvm/vmx/tdx_ops.h @@ -9,12 +9,15 @@ #include #include #include +#include =20 #include "tdx_errno.h" #include "tdx_arch.h" =20 #ifdef CONFIG_INTEL_TDX_HOST =20 +void pr_tdx_error(u64 op, u64 error_code, const struct tdx_module_output *= out); + static inline u64 tdh_mng_addcx(hpa_t tdr, hpa_t addr) { clflush_cache_range(__va(addr), PAGE_SIZE); --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9846DC4332F for ; Thu, 5 May 2022 18:28:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384727AbiEES34 (ORCPT ); Thu, 5 May 2022 14:29:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36388 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383120AbiEEST3 (ORCPT ); Thu, 5 May 2022 14:19:29 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C6DE819003; Thu, 5 May 2022 11:15:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774548; x=1683310548; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=wbnROJIV0sPi9VRc1Kym0LCeIa3DweI4YQjfi8+X354=; b=mfxDZPEr94V/ZAnXY3mgcfZZzZoiwm4KPVcK3qJD9hpk8HvAvnWLx6aN voI/Quw4aGN4Le45RQS+yd15NCKmdJaNEZnY5t3GZDvaWhUNF0+KAK7HC +BnEUb/ON+r2sgRgleyKFkm44hPlzQ7jELqXmF6N2Y1hBBgfLqL/0lej4 hSA7yjOdvnh8OknkEnxI5JhrzwDcDoZoIIHn7P/KrBWHGUQgnJLFpWjCp a+UF8W0Pf5HISGffTm84juXUFiDgBTgnGqnFPNsEMlMcNqSG9beL1L9Jw w0O775dfYRn0QNTBV2MGe2KL+3IRn2+jurlhB+KTYoNk+3XRy3kkO7XJ+ w==; 
X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="328746248" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="328746248" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:42 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083191" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:42 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 019/104] [MARKER] The start of TDX KVM patch series: TD VM creation/destruction Date: Thu, 5 May 2022 11:14:13 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This empty commit is to mark the start of patch series of TD VM creation/destruction. Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/intel-tdx-layer-status.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentat= ion/virt/kvm/intel-tdx-layer-status.rst index b7a14bc73853..5e0deaebf843 100644 --- a/Documentation/virt/kvm/intel-tdx-layer-status.rst +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -15,8 +15,8 @@ Patch Layer status ------------------ Patch layer Status * TDX, VMX coexistence: Applied -* TDX architectural definitions: Applying -* TD VM creation/destruction: Not yet +* TDX architectural definitions: Applied +* TD VM creation/destruction: Applying * TD vcpu creation/destruction: Not yet * TDX EPT violation: Not yet * TD finalization: Not yet --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5BD0FC43219 for ; Thu, 5 May 2022 18:28:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384650AbiEES3j (ORCPT ); Thu, 5 May 2022 14:29:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36420 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383122AbiEEST3 (ORCPT ); Thu, 5 May 2022 14:19:29 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CEFF019027; Thu, 5 May 2022 11:15:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774548; x=1683310548; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=dkyGueQFkeyCNCcuHQ4HyYP9QzPEHcqXig5YrBx0PVE=; b=VcMhGHnDmw2P4dkWynuWfO8b5LgjzqYBxdCpPebvDwN8TVVG5zhaz/0z WlcTiNcuYNPZs8D+TN9o9dOPtGtEVor+NECm+CHeTQvdZghTf1XvbltiN xiQpn7WcidGoL65YvXX3SE1m7KVaiFAqU0dFqioxDup6SRbZ7dNY13hJ4 xMdVFVTn9Fw0wdRfemFmxl04dm0ufB0OTqTgtLUicnybyfT2Pi1tN98LS imvqcg42my5K/zsA2tnquHMzbh0vk+ZZpcRn10UGqhL6CMSdzHHQb2ofE AUesfyhcIHenv7WLr7WyLdBaeqHydFiQ2vDeg5tLixNpaiwM+6rViB66C Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="328746249" 
X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="328746249" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:42 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083194" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:42 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 020/104] KVM: TDX: Stub in tdx.h with structs, accessors, and VMCS helpers Date: Thu, 5 May 2022 11:14:14 -0700 Message-Id: <4b4e573dc15d50ce28645558b2610bf7a6effe5a.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Stub in kvm_tdx, vcpu_tdx, and their various accessors. TDX defines SEAMCALL APIs to access TDX control structures corresponding to the VMX VMCS. Introduce helper accessors to hide its SEAMCALL ABI details. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/tdx.h | 103 ++++++++++++++++++++++++++++++++++++++++- 1 file changed, 101 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 2f43db5bbefb..f50d37f3fc9c 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -3,16 +3,29 @@ #define __KVM_X86_TDX_H =20 #ifdef CONFIG_INTEL_TDX_HOST + +#include "tdx_ops.h" + int tdx_module_setup(void); =20 +struct tdx_td_page { + unsigned long va; + hpa_t pa; + bool added; +}; + struct kvm_tdx { struct kvm kvm; - /* TDX specific members follow. */ + + struct tdx_td_page tdr; + struct tdx_td_page *tdcs; }; =20 struct vcpu_tdx { struct kvm_vcpu vcpu; - /* TDX specific members follow. 
*/ + + struct tdx_td_page tdvpr; + struct tdx_td_page *tdvpx; }; =20 static inline bool is_td(struct kvm *kvm) @@ -34,6 +47,92 @@ static inline struct vcpu_tdx *to_tdx(struct kvm_vcpu *v= cpu) { return container_of(vcpu, struct vcpu_tdx, vcpu); } + +static __always_inline void tdvps_vmcs_check(u32 field, u8 bits) +{ + BUILD_BUG_ON_MSG(__builtin_constant_p(field) && (field) & 0x1, + "Read/Write to TD VMCS *_HIGH fields not supported"); + + BUILD_BUG_ON(bits !=3D 16 && bits !=3D 32 && bits !=3D 64); + + BUILD_BUG_ON_MSG(bits !=3D 64 && __builtin_constant_p(field) && + (((field) & 0x6000) =3D=3D 0x2000 || + ((field) & 0x6000) =3D=3D 0x6000), + "Invalid TD VMCS access for 64-bit field"); + BUILD_BUG_ON_MSG(bits !=3D 32 && __builtin_constant_p(field) && + ((field) & 0x6000) =3D=3D 0x4000, + "Invalid TD VMCS access for 32-bit field"); + BUILD_BUG_ON_MSG(bits !=3D 16 && __builtin_constant_p(field) && + ((field) & 0x6000) =3D=3D 0x0000, + "Invalid TD VMCS access for 16-bit field"); +} + +static __always_inline void tdvps_state_non_arch_check(u64 field, u8 bits)= {} +static __always_inline void tdvps_management_check(u64 field, u8 bits) {} + +#define TDX_BUILD_TDVPS_ACCESSORS(bits, uclass, lclass) \ +static __always_inline u##bits td_##lclass##_read##bits(struct vcpu_tdx *t= dx, \ + u32 field) \ +{ \ + struct tdx_module_output out; \ + u64 err; \ + \ + tdvps_##lclass##_check(field, bits); \ + err =3D tdh_vp_rd(tdx->tdvpr.pa, TDVPS_##uclass(field), &out); \ + if (unlikely(err)) { \ + pr_err("TDH_VP_RD["#uclass".0x%x] failed: 0x%llx\n", \ + field, err); \ + return 0; \ + } \ + return (u##bits)out.r8; \ +} \ +static __always_inline void td_##lclass##_write##bits(struct vcpu_tdx *tdx= , \ + u32 field, u##bits val) \ +{ \ + struct tdx_module_output out; \ + u64 err; \ + \ + tdvps_##lclass##_check(field, bits); \ + err =3D tdh_vp_wr(tdx->tdvpr.pa, TDVPS_##uclass(field), val, \ + GENMASK_ULL(bits - 1, 0), &out); \ + if (unlikely(err)) \ + pr_err("TDH_VP_WR["#uclass".0x%x] =3D 0x%llx failed: 0x%llx\n", \ + field, (u64)val, err); \ +} \ +static __always_inline void td_##lclass##_setbit##bits(struct vcpu_tdx *td= x, \ + u32 field, u64 bit) \ +{ \ + struct tdx_module_output out; \ + u64 err; \ + \ + tdvps_##lclass##_check(field, bits); \ + err =3D tdh_vp_wr(tdx->tdvpr.pa, TDVPS_##uclass(field), bit, bit, \ + &out); \ + if (unlikely(err)) \ + pr_err("TDH_VP_WR["#uclass".0x%x] |=3D 0x%llx failed: 0x%llx\n", \ + field, bit, err); \ +} \ +static __always_inline void td_##lclass##_clearbit##bits(struct vcpu_tdx *= tdx, \ + u32 field, u64 bit) \ +{ \ + struct tdx_module_output out; \ + u64 err; \ + \ + tdvps_##lclass##_check(field, bits); \ + err =3D tdh_vp_wr(tdx->tdvpr.pa, TDVPS_##uclass(field), 0, bit, \ + &out); \ + if (unlikely(err)) \ + pr_err("TDH_VP_WR["#uclass".0x%x] &=3D ~0x%llx failed: 0x%llx\n", \ + field, bit, err); \ +} + +TDX_BUILD_TDVPS_ACCESSORS(16, VMCS, vmcs); +TDX_BUILD_TDVPS_ACCESSORS(32, VMCS, vmcs); +TDX_BUILD_TDVPS_ACCESSORS(64, VMCS, vmcs); + +TDX_BUILD_TDVPS_ACCESSORS(64, STATE_NON_ARCH, state_non_arch); +TDX_BUILD_TDVPS_ACCESSORS(8, MANAGEMENT, management); + #else static inline int tdx_module_setup(void) { return -ENODEV; }; =20 --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 25580C433F5 for ; Thu, 5 May 2022 18:28:43 +0000 (UTC) Received: 
(majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384367AbiEES3S (ORCPT ); Thu, 5 May 2022 14:29:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36396 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383173AbiEESTl (ORCPT ); Thu, 5 May 2022 14:19:41 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D682819C33; Thu, 5 May 2022 11:15:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774549; x=1683310549; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ApxzvG9f2c2buqwxf9g/IY+QsPLdO7qoBpdMaSG3x0c=; b=kJYkWjbPjcQRcjLVIyMVwjS5dmLk3KGahqqYodr1btfC6P/0zZcHW3uL Zkmup/WEH72NC9lSGJ45/B57m7+fTV0XRvZS98iB8Km3y4MLlZVzOXn/L Q508/lo6/EBR1PeR2yLa7KMH3m5ZEvH61KkNU0IgABCmf3LGor6JGDI89 CJqqfBy1ufAVdLiaGn5q6jPxEeim0pVviZPVCQjXRwFZCBAplGYJ9LKHp 3b4y0/Yd8VRZTBWLHe9WhW92EsSr1PrZH7Crrwf6wVEiSb/MwbjpcD2mt Qb0aDO9CLeLf8DCvdxmn+KeyxrnfHteuzIjM0Q4xghcwCmzdMFxt9IIjv Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="328746251" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="328746251" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:42 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083197" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:42 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 021/104] x86/cpu: Add helper functions to allocate/free TDX private host key id Date: Thu, 5 May 2022 11:14:15 -0700 Message-Id: <37cbab674de44e0563f81cbe147216a9031a1704.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata

A TDX private host key id (HKID) is assigned to each guest TD; the memory controller encrypts guest TD memory with the assigned HKID. Add helper functions to allocate/free TDX private host key ids so that TDX KVM can manage them.

Also export the global TDX private host key id, which is used to encrypt the TDX module, its memory, and some dynamic data (e.g. TDR). When the VMM releases an encrypted page in order to reuse it, the page needs to be flushed with the host key id it was encrypted with; pages that the TDX module accesses with the global private host key id therefore require the VMM to know that key id for the flush.
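A sketch of the intended usage, based on the TDX KVM consumer added later in this series (kvm_tdx->hkid is a field introduced there): on TD creation, take one key id out of the pool,

	kvm_tdx->hkid = tdx_keyid_alloc();
	if (kvm_tdx->hkid < 0)
		return -EBUSY;

and on TD destruction, return it:

	tdx_keyid_free(kvm_tdx->hkid);
	kvm_tdx->hkid = -1;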
Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/tdx.h | 7 +++++++ arch/x86/virt/vmx/tdx/tdx.c | 33 ++++++++++++++++++++++++++++++++- 2 files changed, 39 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 3a4fb2844f66..7da43ed0e216 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -149,6 +149,10 @@ int tdx_detect(void); int tdx_init(void); bool platform_has_tdx(void); const struct tdsysinfo_struct *tdx_get_sysinfo(void); +u32 tdx_get_global_keyid(void); +int tdx_keyid_alloc(void); +void tdx_keyid_free(int keyid); + u64 __seamcall(u64 op, u64 rcx, u64 rdx, u64 r8, u64 r9, struct tdx_module_output *out); #else @@ -159,6 +163,9 @@ static inline int tdx_init(void) { return -ENODEV; } static inline bool platform_has_tdx(void) { return false; } struct tdsysinfo_struct; static inline const struct tdsysinfo_struct *tdx_get_sysinfo(void) { retur= n NULL; } +static inline u32 tdx_get_global_keyid(void) { return 0; }; +static inline int tdx_keyid_alloc(void) { return -EOPNOTSUPP; } +static inline void tdx_keyid_free(int keyid) { } #endif /* CONFIG_INTEL_TDX_HOST */ =20 #endif /* !__ASSEMBLY__ */ diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 1ef22c445126..799a26e56f11 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -114,7 +114,13 @@ static int tdx_cmr_num; static struct tdsysinfo_struct tdx_sysinfo; =20 /* TDX global KeyID to protect TDX metadata */ -static u32 tdx_global_keyid; +static u32 __read_mostly tdx_global_keyid; + +u32 tdx_get_global_keyid(void) +{ + return tdx_global_keyid; +} +EXPORT_SYMBOL_GPL(tdx_get_global_keyid); =20 static bool enable_tdx_host; =20 @@ -191,6 +197,31 @@ static void detect_seam(struct cpuinfo_x86 *c) detect_seam_ap(c); } =20 +/* TDX KeyID pool */ +static DEFINE_IDA(tdx_keyid_pool); + +int tdx_keyid_alloc(void) +{ + if (WARN_ON_ONCE(!tdx_keyid_start || !tdx_keyid_num)) + return -EINVAL; + + /* The first keyID is reserved for the global key. */ + return ida_alloc_range(&tdx_keyid_pool, tdx_keyid_start + 1, + tdx_keyid_start + tdx_keyid_num - 1, + GFP_KERNEL); +} +EXPORT_SYMBOL_GPL(tdx_keyid_alloc); + +void tdx_keyid_free(int keyid) +{ + /* keyid =3D 0 is reserved. 
*/ + if (keyid <=3D 0) + return; + + ida_free(&tdx_keyid_pool, keyid); +} +EXPORT_SYMBOL_GPL(tdx_keyid_free); + static void detect_tdx_keyids_bsp(struct cpuinfo_x86 *c) { u64 keyid_part; --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E818BC433FE for ; Thu, 5 May 2022 18:18:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383667AbiEESWd (ORCPT ); Thu, 5 May 2022 14:22:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36320 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383175AbiEESTl (ORCPT ); Thu, 5 May 2022 14:19:41 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 34A2C11A18; Thu, 5 May 2022 11:15:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774550; x=1683310550; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=VbnyeacpaoB1tFAr8Qpd/AH+ID/wvxD/LGbmdMtyQCM=; b=UHfJ8jpr0IIGLgXsYu2T1jFLS6hfD7Xwu3LY7ToMmrryWXGhW2YTloHH r9tkT57PmNenYVbEwUE8YMzmjGn7VJ+/xlWWI6D+62j11syNcpFrCdmDk t+YrwjIxthAf+RU8JyFxqk45buG7R8b7gRr7PsWLq3OTuKeZZb2NHqw1M C16jth/TVC2Ck6O6zWwrkGF1JNj/vOJzOEpZQRjmFEFq4AKsWvrhvh8D2 CKo/bAwR72uiw85ivGoNCYZNydzberOJ5pT2ynzZDsKp+fefxs/qtulKs od3ygTKaLoiVuOIFOdsRF6qNsyPypsfKxE8YuPiGPZTLrLUUXc2+SPdwe Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="328746253" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="328746253" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:43 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083201" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:42 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 022/104] KVM: TDX: create/destroy VM structure Date: Thu, 5 May 2022 11:14:16 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson

As the first step toward creating a TDX guest, create/destroy the VM struct. Assign a TDX private Host Key ID (HKID) to the TDX guest for memory encryption and allocate extra pages for the TDX guest. On destruction, free the allocated pages and the HKID.

Before tearing down private page tables, TDX requires some resources of the guest TD to be destroyed (i.e. the keyID must have been reclaimed, etc.); add a flush_shadow_all_private callback, invoked before tearing down private page tables, for this. Add a second kvm_x86_ops hook in kvm_arch_destroy_vm() to support TDX's destruction path, which needs to first put the VM into a teardown state, then free per-vCPU resources, and finally free per-VM resources.
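For orientation, the resulting destruction order is roughly the following (a condensed sketch of the code below, not a verbatim copy):

	kvm_arch_flush_shadow_all()
	  -> tdx_mmu_release_hkid()	/* TDH.PHYMEM.CACHE.WB on each package,
					   then TDH.MNG.KEY.FREEID frees the HKID */
	  -> kvm_mmu_zap_all()		/* now safe to zap private page tables */

	kvm_arch_destroy_vm()
	  -> (free per-vCPU resources)
	  -> tdx_vm_free()		/* TDH.PHYMEM.PAGE.RECLAIM the TDCS/TDR
					   pages, then free them */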
Co-developed-by: Kai Huang Signed-off-by: Kai Huang Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm-x86-ops.h | 2 + arch/x86/include/asm/kvm_host.h | 2 + arch/x86/kvm/vmx/main.c | 34 ++- arch/x86/kvm/vmx/tdx.c | 376 +++++++++++++++++++++++++++++ arch/x86/kvm/vmx/tdx.h | 2 + arch/x86/kvm/vmx/tdx_errno.h | 2 +- arch/x86/kvm/vmx/x86_ops.h | 11 + arch/x86/kvm/x86.c | 8 + 8 files changed, 433 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index a97cdb203a16..fbb2c6746066 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -21,7 +21,9 @@ KVM_X86_OP(has_emulated_msr) KVM_X86_OP(vcpu_after_set_cpuid) KVM_X86_OP(is_vm_type_supported) KVM_X86_OP(vm_init) +KVM_X86_OP_OPTIONAL(flush_shadow_all_private) KVM_X86_OP_OPTIONAL(vm_destroy) +KVM_X86_OP_OPTIONAL(vm_free) KVM_X86_OP_OPTIONAL_RET0(vcpu_precreate) KVM_X86_OP(vcpu_create) KVM_X86_OP(vcpu_free) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index d1c6c529d52a..a6a5518ce0ca 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1342,7 +1342,9 @@ struct kvm_x86_ops { bool (*is_vm_type_supported)(unsigned long vm_type); unsigned int vm_size; int (*vm_init)(struct kvm *kvm); + void (*flush_shadow_all_private)(struct kvm *kvm); void (*vm_destroy)(struct kvm *kvm); + void (*vm_free)(struct kvm *kvm); =20 /* Create, but do not attach this VCPU */ int (*vcpu_precreate)(struct kvm *kvm); diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 47bfa94e538e..6a93b19a8b06 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -39,18 +39,44 @@ static int __init vt_post_hardware_enable_setup(void) return 0; } =20 +static void vt_hardware_unsetup(void) +{ + tdx_hardware_unsetup(); + vmx_hardware_unsetup(); +} + static int vt_vm_init(struct kvm *kvm) { if (is_td(kvm)) - return -EOPNOTSUPP; /* Not ready to create guest TD yet. */ + return tdx_vm_init(kvm); =20 return vmx_vm_init(kvm); } =20 +static void vt_flush_shadow_all_private(struct kvm *kvm) +{ + if (is_td(kvm)) + return tdx_mmu_release_hkid(kvm); +} + +static void vt_vm_destroy(struct kvm *kvm) +{ + if (is_td(kvm)) + return; + + vmx_vm_destroy(kvm); +} + +static void vt_vm_free(struct kvm *kvm) +{ + if (is_td(kvm)) + return tdx_vm_free(kvm); +} + struct kvm_x86_ops vt_x86_ops __initdata =3D { .name =3D "kvm_intel", =20 - .hardware_unsetup =3D vmx_hardware_unsetup, + .hardware_unsetup =3D vt_hardware_unsetup, .check_processor_compatibility =3D vmx_check_processor_compatibility, =20 .hardware_enable =3D vmx_hardware_enable, @@ -60,7 +86,9 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .is_vm_type_supported =3D vt_is_vm_type_supported, .vm_size =3D sizeof(struct kvm_vmx), .vm_init =3D vt_vm_init, - .vm_destroy =3D vmx_vm_destroy, + .flush_shadow_all_private =3D vt_flush_shadow_all_private, + .vm_destroy =3D vt_vm_destroy, + .vm_free =3D vt_vm_free, =20 .vcpu_precreate =3D vmx_vcpu_precreate, .vcpu_create =3D vmx_vcpu_create, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 99bda653ef35..ba0671cbf0a7 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -31,9 +31,367 @@ struct tdx_capabilities { struct tdx_cpuid_config cpuid_configs[TDX_MAX_NR_CPUID_CONFIGS]; }; =20 +/* + * Key id globally used by TDX module: TDX module maps TDR with this TDX g= lobal + * key id. TDR includes key id assigned to the TD. 
Then TDX module maps = other + * TD-related pages with the assigned key id. TDR requires this TDX globa= l key + * id for cache flush unlike other TD-related pages. + */ +static u32 tdx_global_keyid __read_mostly; + /* Capabilities of KVM + the TDX module. */ static struct tdx_capabilities tdx_caps; =20 +/* + * Some TDX SEAMCALLs (TDH.MNG.CREATE, TDH.PHYMEM.CACHE.WB, + * TDH.MNG.KEY.RECLAIMID, TDH.MNG.KEY.FREEID etc) tries to acquire a globa= l lock + * internally in TDX module. If failed, TDX_OPERAND_BUSY is returned with= out + * spinning or waiting due to a constraint on execution time. It's caller= 's + * responsibility to avoid race (or retry on TDX_OPERAND_BUSY). Use this = mutex + * to avoid race in TDX module because the kernel knows better about sched= uling. + */ +static DEFINE_MUTEX(tdx_lock); +static struct mutex *tdx_mng_key_config_lock; + +static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid) +{ + pa &=3D ~hkid_mask; + pa |=3D (u64)hkid << hkid_start_pos; + + return pa; +} + +static inline bool is_td_created(struct kvm_tdx *kvm_tdx) +{ + return kvm_tdx->tdr.added; +} + +static inline void tdx_hkid_free(struct kvm_tdx *kvm_tdx) +{ + tdx_keyid_free(kvm_tdx->hkid); + kvm_tdx->hkid =3D -1; +} + +static inline bool is_hkid_assigned(struct kvm_tdx *kvm_tdx) +{ + return kvm_tdx->hkid > 0; +} + +static void tdx_clear_page(unsigned long page) +{ + const void *zero_page =3D (const void *) __va(page_to_phys(ZERO_PAGE(0))); + unsigned long i; + + /* + * Zeroing the page is only necessary for systems with MKTME-i: + * when re-assign one page from old keyid to a new keyid, MOVDIR64B is + * required to clear/write the page with new keyid to prevent integrity + * error when read on the page with new keyid. + */ + if (!static_cpu_has(X86_FEATURE_MOVDIR64B)) + return; + + for (i =3D 0; i < 4096; i +=3D 64) + /* MOVDIR64B [rdx], es:rdi */ + asm (".byte 0x66, 0x0f, 0x38, 0xf8, 0x3a" + : : "d" (zero_page), "D" (page + i) : "memory"); +} + +static int tdx_reclaim_page(unsigned long va, hpa_t pa, bool do_wb, u16 hk= id) +{ + struct tdx_module_output out; + u64 err; + + err =3D tdh_phymem_page_reclaim(pa, &out); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_PHYMEM_PAGE_RECLAIM, err, &out); + return -EIO; + } + + if (do_wb) { + err =3D tdh_phymem_page_wbinvd(set_hkid_to_hpa(pa, hkid)); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err, NULL); + return -EIO; + } + } + + tdx_clear_page(va); + return 0; +} + +static int tdx_alloc_td_page(struct tdx_td_page *page) +{ + page->va =3D __get_free_page(GFP_KERNEL_ACCOUNT); + if (!page->va) + return -ENOMEM; + + page->pa =3D __pa(page->va); + return 0; +} + +static void tdx_mark_td_page_added(struct tdx_td_page *page) +{ + WARN_ON_ONCE(page->added); + page->added =3D true; +} + +static void tdx_reclaim_td_page(struct tdx_td_page *page) +{ + if (page->added) { + /* + * TDCX are being reclaimed. TDX module maps TDCX with HKID + * assigned to the TD. Here the cache associated to the TD + * was already flushed by TDH.PHYMEM.CACHE.WB before here, So + * cache doesn't need to be flushed again. + */ + if (tdx_reclaim_page(page->va, page->pa, false, 0)) + return; + + page->added =3D false; + } + free_page(page->va); +} + +static int tdx_do_tdh_phymem_cache_wb(void *param) +{ + u64 err =3D 0; + + do { + err =3D tdh_phymem_cache_wb(!!err); + } while (err =3D=3D TDX_INTERRUPTED_RESUMABLE); + + /* Other thread may have done for us. 
*/ + if (err =3D=3D TDX_NO_HKID_READY_TO_WBCACHE) + err =3D TDX_SUCCESS; + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_PHYMEM_CACHE_WB, err, NULL); + return -EIO; + } + + return 0; +} + +void tdx_mmu_release_hkid(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + cpumask_var_t packages; + bool cpumask_allocated; + u64 err; + int ret; + int i; + + if (!is_hkid_assigned(kvm_tdx)) + return; + + if (!is_td_created(kvm_tdx)) + goto free_hkid; + + cpumask_allocated =3D zalloc_cpumask_var(&packages, GFP_KERNEL); + cpus_read_lock(); + for_each_online_cpu(i) { + if (cpumask_allocated && + cpumask_test_and_set_cpu(topology_physical_package_id(i), + packages)) + continue; + + /* + * We can destroy multiple the guest TDs simultaneously. + * Prevent tdh_phymem_cache_wb from returning TDX_BUSY by + * serialization. + */ + mutex_lock(&tdx_lock); + ret =3D smp_call_on_cpu(i, tdx_do_tdh_phymem_cache_wb, NULL, 1); + mutex_unlock(&tdx_lock); + if (ret) + break; + } + cpus_read_unlock(); + free_cpumask_var(packages); + + mutex_lock(&tdx_lock); + err =3D tdh_mng_key_freeid(kvm_tdx->tdr.pa); + mutex_unlock(&tdx_lock); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MNG_KEY_FREEID, err, NULL); + pr_err("tdh_mng_key_freeid failed. HKID %d is leaked.\n", + kvm_tdx->hkid); + return; + } + +free_hkid: + tdx_hkid_free(kvm_tdx); +} + +void tdx_vm_free(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + int i; + + /* Can't reclaim or free TD pages if teardown failed. */ + if (is_hkid_assigned(kvm_tdx)) + return; + + for (i =3D 0; i < tdx_caps.tdcs_nr_pages; i++) + tdx_reclaim_td_page(&kvm_tdx->tdcs[i]); + kfree(kvm_tdx->tdcs); + + /* + * TDX module maps TDR with TDX global HKID. TDX module may access TDR + * while operating on TD (Especially reclaiming TDCS). Cache flush with + * TDX global HKID is needed. + */ + if (kvm_tdx->tdr.added && + tdx_reclaim_page(kvm_tdx->tdr.va, kvm_tdx->tdr.pa, true, + tdx_global_keyid)) + return; + + free_page(kvm_tdx->tdr.va); +} + +static int tdx_do_tdh_mng_key_config(void *param) +{ + hpa_t *tdr_p =3D param; + u64 err; + + do { + err =3D tdh_mng_key_config(*tdr_p); + + /* + * If it failed to generate a random key, retry it because this + * is typically caused by an entropy error of the CPU's random + * number generator. + */ + } while (err =3D=3D TDX_KEY_GENERATION_FAILED); + + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MNG_KEY_CONFIG, err, NULL); + return -EIO; + } + + return 0; +} + +int tdx_vm_init(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + cpumask_var_t packages; + int ret, i; + u64 err; + + /* vCPUs can't be created until after KVM_TDX_INIT_VM. */ + kvm->max_vcpus =3D 0; + + kvm_tdx->hkid =3D tdx_keyid_alloc(); + if (kvm_tdx->hkid < 0) + return -EBUSY; + + ret =3D tdx_alloc_td_page(&kvm_tdx->tdr); + if (ret) + goto free_hkid; + + kvm_tdx->tdcs =3D kcalloc(tdx_caps.tdcs_nr_pages, sizeof(*kvm_tdx->tdcs), + GFP_KERNEL_ACCOUNT); + if (!kvm_tdx->tdcs) + goto free_tdr; + for (i =3D 0; i < tdx_caps.tdcs_nr_pages; i++) { + ret =3D tdx_alloc_td_page(&kvm_tdx->tdcs[i]); + if (ret) + goto free_tdcs; + } + + /* + * Acquire global lock to avoid TDX_OPERAND_BUSY: + * TDH.MNG.CREATE and other APIs try to lock the global Key Owner + * Table (KOT) to track the assigned TDX private HKID. It doesn't spin + * to acquire the lock, returns TDX_OPERAND_BUSY instead, and let the + * caller to handle the contention. This is because of time limitation + * usable inside the TDX module and OS/VMM knows better about process + * scheduling. 
+ * + * APIs to acquire the lock of KOT: + * TDH.MNG.CREATE, TDH.MNG.KEY.FREEID, TDH.MNG.VPFLUSHDONE, and + * TDH.PHYMEM.CACHE.WB. + */ + mutex_lock(&tdx_lock); + err =3D tdh_mng_create(kvm_tdx->tdr.pa, kvm_tdx->hkid); + mutex_unlock(&tdx_lock); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MNG_CREATE, err, NULL); + ret =3D -EIO; + goto free_tdcs; + } + tdx_mark_td_page_added(&kvm_tdx->tdr); + + if (!zalloc_cpumask_var(&packages, GFP_KERNEL)) { + ret =3D -ENOMEM; + goto free_tdcs; + } + cpus_read_lock(); + for_each_online_cpu(i) { + int pkg =3D topology_physical_package_id(i); + + if (cpumask_test_and_set_cpu(pkg, packages)) + continue; + + /* + * Program the memory controller in the package with an + * encryption key associated to a TDX private host key id + * assigned to this TDR. Concurrent operations on same memory + * controller results in TDX_OPERAND_BUSY. Avoid this race by + * mutex. + */ + mutex_lock(&tdx_mng_key_config_lock[pkg]); + ret =3D smp_call_on_cpu(i, tdx_do_tdh_mng_key_config, + &kvm_tdx->tdr.pa, true); + mutex_unlock(&tdx_mng_key_config_lock[pkg]); + if (ret) + break; + } + cpus_read_unlock(); + free_cpumask_var(packages); + if (ret) + goto teardown; + + for (i =3D 0; i < tdx_caps.tdcs_nr_pages; i++) { + err =3D tdh_mng_addcx(kvm_tdx->tdr.pa, kvm_tdx->tdcs[i].pa); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MNG_ADDCX, err, NULL); + ret =3D -EIO; + goto teardown; + } + tdx_mark_td_page_added(&kvm_tdx->tdcs[i]); + } + + /* + * Note, TDH_MNG_INIT cannot be invoked here. TDH_MNG_INIT requires a de= dicated + * ioctl() to define the configure CPUID values for the TD. + */ + return 0; + + /* + * The sequence for freeing resources from a partially initialized TD + * varies based on where in the initialization flow failure occurred. + * Simply use the full teardown and destroy, which naturally play nice + * with partial initialization. + */ +teardown: + tdx_mmu_release_hkid(kvm); + tdx_vm_free(kvm); + return ret; + +free_tdcs: + /* @i points at the TDCS page that failed allocation. */ + for (--i; i >=3D 0; i--) + free_page(kvm_tdx->tdcs[i].va); + kfree(kvm_tdx->tdcs); +free_tdr: + free_page(kvm_tdx->tdr.va); +free_hkid: + tdx_hkid_free(kvm_tdx); + return ret; +} + int __init tdx_module_setup(void) { const struct tdsysinfo_struct *tdsysinfo; @@ -54,6 +412,8 @@ int __init tdx_module_setup(void) return ret; } =20 + tdx_global_keyid =3D tdx_get_global_keyid(); + tdsysinfo =3D tdx_get_sysinfo(); if (tdx_caps.nr_cpuid_configs > TDX_MAX_NR_CPUID_CONFIGS) return -EIO; @@ -87,7 +447,9 @@ bool tdx_is_vm_type_supported(unsigned long type) =20 int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { + int max_pkgs; u32 max_pa; + int i; =20 if (!enable_ept) { pr_warn("Cannot enable TDX with EPT disabled\n"); @@ -103,6 +465,14 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x86_= ops) if (WARN_ON_ONCE(x86_ops->tlb_remote_flush)) return -EIO; =20 + max_pkgs =3D topology_max_packages(); + tdx_mng_key_config_lock =3D kcalloc(max_pkgs, sizeof(*tdx_mng_key_config_= lock), + GFP_KERNEL); + if (!tdx_mng_key_config_lock) + return -ENOMEM; + for (i =3D 0; i < max_pkgs; i++) + mutex_init(&tdx_mng_key_config_lock[i]); + max_pa =3D cpuid_eax(0x80000008) & 0xff; hkid_start_pos =3D boot_cpu_data.x86_phys_bits; hkid_mask =3D GENMASK_ULL(max_pa - 1, hkid_start_pos); @@ -110,3 +480,9 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x86_o= ps) =20 return 0; } + +void tdx_hardware_unsetup(void) +{ + /* kfree accepts NULL. 
*/ + kfree(tdx_mng_key_config_lock); +} diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index f50d37f3fc9c..8058b6b153f8 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -19,6 +19,8 @@ struct kvm_tdx { =20 struct tdx_td_page tdr; struct tdx_td_page *tdcs; + + int hkid; }; =20 struct vcpu_tdx { diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h index 5c878488795d..590fcfdd1899 100644 --- a/arch/x86/kvm/vmx/tdx_errno.h +++ b/arch/x86/kvm/vmx/tdx_errno.h @@ -12,11 +12,11 @@ #define TDX_SUCCESS 0x0000000000000000ULL #define TDX_NON_RECOVERABLE_VCPU 0x4000000100000000ULL #define TDX_INTERRUPTED_RESUMABLE 0x8000000300000000ULL -#define TDX_LIFECYCLE_STATE_INCORRECT 0xC000060700000000ULL #define TDX_VCPU_NOT_ASSOCIATED 0x8000070200000000ULL #define TDX_KEY_GENERATION_FAILED 0x8000080000000000ULL #define TDX_KEY_STATE_INCORRECT 0xC000081100000000ULL #define TDX_KEY_CONFIGURED 0x0000081500000000ULL +#define TDX_NO_HKID_READY_TO_WBCACHE 0x0000082100000000ULL #define TDX_EPT_WALK_FAILED 0xC0000B0000000000ULL =20 /* diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 9d8e8abac6d7..468f00d0633e 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -131,9 +131,20 @@ void vmx_setup_mce(struct kvm_vcpu *vcpu); #ifdef CONFIG_INTEL_TDX_HOST int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops); bool tdx_is_vm_type_supported(unsigned long type); +void tdx_hardware_unsetup(void); + +int tdx_vm_init(struct kvm *kvm); +void tdx_mmu_release_hkid(struct kvm *kvm); +void tdx_vm_free(struct kvm *kvm); #else static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return= 0; } static inline bool tdx_is_vm_type_supported(unsigned long type) { return f= alse; } +static inline void tdx_hardware_unsetup(void) {} + +static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; } +static inline void tdx_mmu_release_hkid(struct kvm *kvm) {} +static inline void tdx_flush_shadow_all_private(struct kvm *kvm) {} +static inline void tdx_vm_free(struct kvm *kvm) {} #endif =20 #endif /* __KVM_X86_VMX_X86_OPS_H */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 7233ce67ae1d..af280db31777 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -11990,6 +11990,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm) kvm_page_track_cleanup(kvm); kvm_xen_destroy_vm(kvm); kvm_hv_destroy_vm(kvm); + static_call_cond(kvm_x86_vm_free)(kvm); } =20 static void memslot_rmap_free(struct kvm_memory_slot *slot) @@ -12254,6 +12255,13 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, =20 void kvm_arch_flush_shadow_all(struct kvm *kvm) { + /* + * kvm_mmu_zap_all() zaps both private and shared page tables. Before + * tearing down private page tables, TDX requires some TD resources to + * be destroyed (i.e. keyID must have been reclaimed, etc). Invoke + * kvm_x86_flush_shadow_all_private() for this. 
+ */ + static_call_cond(kvm_x86_flush_shadow_all_private)(kvm); kvm_mmu_zap_all(kvm); } =20 --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 78E20C433FE for ; Thu, 5 May 2022 18:18:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383637AbiEESWX (ORCPT ); Thu, 5 May 2022 14:22:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36388 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383180AbiEESTl (ORCPT ); Thu, 5 May 2022 14:19:41 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3BA3714081; Thu, 5 May 2022 11:15:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774550; x=1683310550; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=AEtYNjpEBkvZ7J/qWgVfVOw8rhuGAkNWQTcLQNB3yK8=; b=E1InvdU2B3oF8H8D1BdYV9RdTl0cMIn29FQnO7RtKWwUGmuvutwEiLPl SomfntS/vOVEYRPfJDy7itbN4POoD+SHiQ8i4xhGnkrCXL0XFMF2Zk3ez SxxLYfxo2UIgqlebFaZpLbOXJr98hT/EXtp0od9nKI034UbVuRO6guZ5z 1pEDJsLgMCCu5X9YpXBB7JqczjAFgZ4s6dKTkDTZ1cykAq9/YSrDZTrVR tiSb3Vv5wXKhl9ahSlis0bC8th84nS21N4/j6kvJf3I7RkkJZHqCy0Nk1 DTQpow4o27rd4ntww+FsSLTS9+Yf/AXVCfiaO+DRPz87Riu78qtfoBy17 A==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="328746255" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="328746255" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:43 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083204" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:42 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 023/104] KVM: TDX: x86: Add ioctl to get TDX systemwide parameters Date: Thu, 5 May 2022 11:14:17 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Implement a system-scoped ioctl to get system-wide parameters for TDX. 
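A userspace caller might use the new subcommand roughly as follows; this is a hypothetical sketch (kvm_fd is assumed to be an open /dev/kvm file descriptor, and nent is an initial guess to be grown and retried if the ioctl fails with E2BIG):

	struct kvm_tdx_capabilities *caps;
	struct kvm_tdx_cmd cmd = { .id = KVM_TDX_CAPABILITIES };
	int nent = 32;

	caps = calloc(1, sizeof(*caps) +
			 nent * sizeof(struct kvm_tdx_cpuid_config));
	caps->nr_cpuid_configs = nent;
	cmd.data = (__u64)(unsigned long)caps;
	/* System-scoped: issued on the /dev/kvm fd, not on a VM fd. */
	if (ioctl(kvm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd) < 0)
		err(1, "KVM_TDX_CAPABILITIES");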
Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 1 + arch/x86/include/uapi/asm/kvm.h | 48 +++++++++++++++++++++++++++ arch/x86/kvm/vmx/main.c | 2 ++ arch/x86/kvm/vmx/tdx.c | 46 +++++++++++++++++++++++++ arch/x86/kvm/vmx/x86_ops.h | 2 ++ arch/x86/kvm/x86.c | 6 ++++ tools/arch/x86/include/uapi/asm/kvm.h | 48 +++++++++++++++++++++++++++ 8 files changed, 154 insertions(+) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index fbb2c6746066..3677a5015a4f 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -117,6 +117,7 @@ KVM_X86_OP(smi_allowed) KVM_X86_OP(enter_smm) KVM_X86_OP(leave_smm) KVM_X86_OP(enable_smi_window) +KVM_X86_OP_OPTIONAL(dev_mem_enc_ioctl) KVM_X86_OP_OPTIONAL(mem_enc_ioctl) KVM_X86_OP_OPTIONAL(mem_enc_register_region) KVM_X86_OP_OPTIONAL(mem_enc_unregister_region) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index a6a5518ce0ca..e6f6f8c8619f 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1495,6 +1495,7 @@ struct kvm_x86_ops { int (*leave_smm)(struct kvm_vcpu *vcpu, const char *smstate); void (*enable_smi_window)(struct kvm_vcpu *vcpu); =20 + int (*dev_mem_enc_ioctl)(void __user *argp); int (*mem_enc_ioctl)(struct kvm *kvm, void __user *argp); int (*mem_enc_register_region)(struct kvm *kvm, struct kvm_enc_region *ar= gp); int (*mem_enc_unregister_region)(struct kvm *kvm, struct kvm_enc_region *= argp); diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kv= m.h index 1c3b97f403f4..ca85a070ac19 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -529,4 +529,52 @@ struct kvm_pmu_event_filter { #define KVM_X86_DEFAULT_VM 0 #define KVM_X86_TDX_VM 1 =20 +/* Trust Domain eXtension sub-ioctl() commands. */ +enum kvm_tdx_cmd_id { + KVM_TDX_CAPABILITIES =3D 0, + + KVM_TDX_CMD_NR_MAX, +}; + +struct kvm_tdx_cmd { + /* enum kvm_tdx_cmd_id */ + __u32 id; + /* flags for sub-commend. If sub-command doesn't use this, set zero. */ + __u32 flags; + /* + * data for each sub-command. An immediate or a pointer to the actual + * data in process virtual address. If sub-command doesn't use it, + * set zero. + */ + __u64 data; + /* + * Auxiliary error code. The sub-command may return TDX SEAMCALL + * status code in addition to -Exxx. + * Defined for consistency with struct kvm_sev_cmd. + */ + __u64 error; + /* Reserved: Defined for consistency with struct kvm_sev_cmd. 
*/ + __u64 unused; +}; + +struct kvm_tdx_cpuid_config { + __u32 leaf; + __u32 sub_leaf; + __u32 eax; + __u32 ebx; + __u32 ecx; + __u32 edx; +}; + +struct kvm_tdx_capabilities { + __u64 attrs_fixed0; + __u64 attrs_fixed1; + __u64 xfam_fixed0; + __u64 xfam_fixed1; + + __u32 nr_cpuid_configs; + __u32 padding; + struct kvm_tdx_cpuid_config cpuid_configs[0]; +}; + #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 6a93b19a8b06..7b497ed1f21c 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -212,6 +212,8 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .complete_emulated_msr =3D kvm_complete_insn_gp, =20 .vcpu_deliver_sipi_vector =3D kvm_vcpu_deliver_sipi_vector, + + .dev_mem_enc_ioctl =3D tdx_dev_ioctl, }; =20 struct kvm_x86_init_ops vt_init_ops __initdata =3D { diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index ba0671cbf0a7..07b69ffbe55b 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -392,6 +392,52 @@ int tdx_vm_init(struct kvm *kvm) return ret; } =20 +int tdx_dev_ioctl(void __user *argp) +{ + struct kvm_tdx_capabilities __user *user_caps; + struct kvm_tdx_capabilities caps; + struct kvm_tdx_cmd cmd; + + BUILD_BUG_ON(sizeof(struct kvm_tdx_cpuid_config) !=3D + sizeof(struct tdx_cpuid_config)); + + if (copy_from_user(&cmd, argp, sizeof(cmd))) + return -EFAULT; + if (cmd.flags || cmd.error || cmd.unused) + return -EINVAL; + /* + * Currently only KVM_TDX_CAPABILITIES is defined for system-scoped + * mem_enc_ioctl(). + */ + if (cmd.id !=3D KVM_TDX_CAPABILITIES) + return -EINVAL; + + user_caps =3D (void __user *)cmd.data; + if (copy_from_user(&caps, user_caps, sizeof(caps))) + return -EFAULT; + + if (caps.nr_cpuid_configs < tdx_caps.nr_cpuid_configs) + return -E2BIG; + + caps =3D (struct kvm_tdx_capabilities) { + .attrs_fixed0 =3D tdx_caps.attrs_fixed0, + .attrs_fixed1 =3D tdx_caps.attrs_fixed1, + .xfam_fixed0 =3D tdx_caps.xfam_fixed0, + .xfam_fixed1 =3D tdx_caps.xfam_fixed1, + .nr_cpuid_configs =3D tdx_caps.nr_cpuid_configs, + .padding =3D 0, + }; + + if (copy_to_user(user_caps, &caps, sizeof(caps))) + return -EFAULT; + if (copy_to_user(user_caps->cpuid_configs, &tdx_caps.cpuid_configs, + tdx_caps.nr_cpuid_configs * + sizeof(struct tdx_cpuid_config))) + return -EFAULT; + + return 0; +} + int __init tdx_module_setup(void) { const struct tdsysinfo_struct *tdsysinfo; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 468f00d0633e..75e6cdcb06f5 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -132,6 +132,7 @@ void vmx_setup_mce(struct kvm_vcpu *vcpu); int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops); bool tdx_is_vm_type_supported(unsigned long type); void tdx_hardware_unsetup(void); +int tdx_dev_ioctl(void __user *argp); =20 int tdx_vm_init(struct kvm *kvm); void tdx_mmu_release_hkid(struct kvm *kvm); @@ -140,6 +141,7 @@ void tdx_vm_free(struct kvm *kvm); static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return= 0; } static inline bool tdx_is_vm_type_supported(unsigned long type) { return f= alse; } static inline void tdx_hardware_unsetup(void) {} +static inline int tdx_dev_ioctl(void __user *argp) { return -EOPNOTSUPP; }; =20 static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; } static inline void tdx_mmu_release_hkid(struct kvm *kvm) {} diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index af280db31777..77e83e20180a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4546,6 +4546,12 @@ long 
kvm_arch_dev_ioctl(struct file *filp, break; r =3D kvm_x86_dev_has_attr(&attr); break; + case KVM_MEMORY_ENCRYPT_OP: + r =3D -EINVAL; + if (!kvm_x86_ops.dev_mem_enc_ioctl) + goto out; + r =3D static_call(kvm_x86_dev_mem_enc_ioctl)(argp); + break; } default: r =3D -EINVAL; diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include= /uapi/asm/kvm.h index 71a5851475e7..a9ea3573be1b 100644 --- a/tools/arch/x86/include/uapi/asm/kvm.h +++ b/tools/arch/x86/include/uapi/asm/kvm.h @@ -528,4 +528,52 @@ struct kvm_pmu_event_filter { #define KVM_X86_DEFAULT_VM 0 #define KVM_X86_TDX_VM 1 =20 +/* Trust Domain eXtension sub-ioctl() commands. */ +enum kvm_tdx_cmd_id { + KVM_TDX_CAPABILITIES =3D 0, + + KVM_TDX_CMD_NR_MAX, +}; + +struct kvm_tdx_cmd { + /* enum kvm_tdx_cmd_id */ + __u32 id; + /* flags for sub-commend. If sub-command doesn't use this, set zero. */ + __u32 flags; + /* + * data for each sub-command. An immediate or a pointer to the actual + * data in process virtual address. If sub-command doesn't use it, + * set zero. + */ + __u64 data; + /* + * Auxiliary error code. The sub-command may return TDX SEAMCALL + * status code in addition to -Exxx. + * Defined for consistency with struct kvm_sev_cmd. + */ + __u64 error; + /* Reserved: Defined for consistency with struct kvm_sev_cmd. */ + __u64 unused; +}; + +struct kvm_tdx_cpuid_config { + __u32 leaf; + __u32 sub_leaf; + __u32 eax; + __u32 ebx; + __u32 ecx; + __u32 edx; +}; + +struct kvm_tdx_capabilities { + __u64 attrs_fixed0; + __u64 attrs_fixed1; + __u64 xfam_fixed0; + __u64 xfam_fixed1; + + __u32 nr_cpuid_configs; + __u32 padding; + struct kvm_tdx_cpuid_config cpuid_configs[0]; +}; + #endif /* _ASM_X86_KVM_H */ --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E9F0FC433FE for ; Thu, 5 May 2022 18:28:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384323AbiEES3F (ORCPT ); Thu, 5 May 2022 14:29:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36322 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383179AbiEESTl (ORCPT ); Thu, 5 May 2022 14:19:41 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1F6962C67A; Thu, 5 May 2022 11:15:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774552; x=1683310552; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=lEoC3Ic1zLdkdH9xkBmGOfUxHuUICOpZJb0q3gtoue0=; b=BNtyeUKxVuO/6igdjDCJQd0fNlK+aAgY9JHALLhl4ahitDqNRGdaJqoW PlRSIBmyae6gcumi+Zwewg4mmcFvclDk999sgCr8k7kxdLyT/PqqjMPhC TkOZ/MbsnGYVfLnixmFUzQ0W/gI66iEsXPy4bNu2HP0D3L8id02IQtfVi bFLX66jkBaxoaVqZsXO3ov76BD7roJs8M4RApeQza/L3pwOLGB1qz8sfL USQotqDfsGUn00RMZpo859sOUoL2N006TFWh3iclV8xIZS7GRsFxB8Ey3 zq8YSRUF/OA2XEAg2g7qbqFuOgRF1PcTZu1hHduS7NliMfmk3uKaOdOq3 Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="328746257" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="328746257" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:43 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083207" 
Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:43 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 024/104] KVM: TDX: Add place holder for TDX VM specific mem_enc_op ioctl Date: Thu, 5 May 2022 11:14:18 -0700 Message-Id: <46c9ee65c92b526f8c1c009099a08a2e0b291649.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata

Add a placeholder function for the TDX-specific VM-scoped ioctl as mem_enc_op. TDX-specific sub-commands will be added later to retrieve/pass TDX-specific parameters.

KVM_MEMORY_ENCRYPT_OP was introduced for VM-scoped operations specific to guest state-protected VMs and defines subcommands for technology-specific operations. Despite its name, the subcommands are not limited to memory encryption; various technology-specific operations are defined under it. It is therefore natural to repurpose KVM_MEMORY_ENCRYPT_OP for TDX-specific operations and to define subcommands there.

TDX requires both VM-scoped and vCPU-scoped TDX-specific operations for the device model (for example, qemu): getting system-wide parameters, TDX-specific VM initialization, and TDX-specific vCPU initialization. The latter requires KVM vCPU-scoped operations in addition to the existing VM-scoped operations.

Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/main.c | 9 +++++++++ arch/x86/kvm/vmx/tdx.c | 26 ++++++++++++++++++++++++++ arch/x86/kvm/vmx/x86_ops.h | 4 ++++ 3 files changed, 39 insertions(+) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 7b497ed1f21c..067f5de56c53 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -73,6 +73,14 @@ static void vt_vm_free(struct kvm *kvm) return tdx_vm_free(kvm); } =20 +static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) +{ + if (!is_td(kvm)) + return -ENOTTY; + + return tdx_vm_ioctl(kvm, argp); +} + struct kvm_x86_ops vt_x86_ops __initdata =3D { .name =3D "kvm_intel", =20 @@ -214,6 +222,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .vcpu_deliver_sipi_vector =3D kvm_vcpu_deliver_sipi_vector, =20 .dev_mem_enc_ioctl =3D tdx_dev_ioctl, + .mem_enc_ioctl =3D vt_mem_enc_ioctl, }; =20 struct kvm_x86_init_ops vt_init_ops __initdata =3D { diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 07b69ffbe55b..a4be3ef313b2 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -438,6 +438,32 @@ int tdx_dev_ioctl(void __user *argp) return 0; } =20 +int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) +{ + struct kvm_tdx_cmd tdx_cmd; + int r; + + if (copy_from_user(&tdx_cmd, argp, sizeof(struct kvm_tdx_cmd))) + return -EFAULT; + if (tdx_cmd.error || tdx_cmd.unused) + return -EINVAL; + + mutex_lock(&kvm->lock); + + switch (tdx_cmd.id) { + default: + r =3D -EINVAL; + goto out; + } + + if (copy_to_user(argp, &tdx_cmd, sizeof(struct kvm_tdx_cmd))) + r =3D -EFAULT; + +out: + mutex_unlock(&kvm->lock); + return r; +} + int __init tdx_module_setup(void) { const struct tdsysinfo_struct *tdsysinfo; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index
75e6cdcb06f5..1ff555cc6c17 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -137,6 +137,8 @@ int tdx_dev_ioctl(void __user *argp); int tdx_vm_init(struct kvm *kvm); void tdx_mmu_release_hkid(struct kvm *kvm); void tdx_vm_free(struct kvm *kvm); + +int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); #else static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return= 0; } static inline bool tdx_is_vm_type_supported(unsigned long type) { return f= alse; } @@ -147,6 +149,8 @@ static inline int tdx_vm_init(struct kvm *kvm) { return= -EOPNOTSUPP; } static inline void tdx_mmu_release_hkid(struct kvm *kvm) {} static inline void tdx_flush_shadow_all_private(struct kvm *kvm) {} static inline void tdx_vm_free(struct kvm *kvm) {} + +static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { retur= n -EOPNOTSUPP; } #endif =20 #endif /* __KVM_X86_VMX_X86_OPS_H */ --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8A0E5C433EF for ; Thu, 5 May 2022 18:25:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383993AbiEES2p (ORCPT ); Thu, 5 May 2022 14:28:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36420 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383194AbiEESTl (ORCPT ); Thu, 5 May 2022 14:19:41 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 56711387BF; Thu, 5 May 2022 11:15:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774552; x=1683310552; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=sMmfx1EyVq4pvElmqg1VAc6KGZfGRhiNY1BOEukxkwA=; b=R4f6DHN90i9Zm4NJHAP7XZGjdPwPWnRzOJuv60jHIwtKWbLbn5ZH+imk MQu6gQZsiZDHfkg/M6XJRxbBX6FaXYEaVxV+Elu7hADxwvpgnoY78/OSE M3m+Razf44+1aja5i/J2YxDULDmctkE14AvCw2uDtZqqBHwTjZgj/s7nH p5Jp9ZodC2e1sRCDOo+SrQJrC69kUuAoOEp493IlOqe1UpNwmZxJ88xRu gmygu+hGHyWNQEv/LKJYk5Hw8lyf6LFlfqsf6Vfwv5F1II9GLTAXsyPQ5 9yZ083c8DqKqhjxdRxMvagGPQAMz9olOPXrjj1lSTVir+ZPm8k6Ojv0kG A==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="328746258" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="328746258" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:43 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083210" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:43 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 025/104] KVM: TDX: initialize VM with TDX specific parameters Date: Thu, 5 May 2022 11:14:19 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Xiaoyao Li TDX requires additional 
Signed-off-by: Xiaoyao Li Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm_host.h | 2 + arch/x86/include/uapi/asm/kvm.h | 33 +++++ arch/x86/kvm/vmx/tdx.c | 205 ++++++++++++++++++++++++++ arch/x86/kvm/vmx/tdx.h | 23 +++ tools/arch/x86/include/uapi/asm/kvm.h | 33 +++++ 5 files changed, 296 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index e6f6f8c8619f..bc2038157f11 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1250,6 +1250,8 @@ struct kvm_arch { * the global KVM_MAX_VCPU_IDS may lead to significant memory waste. */ u32 max_vcpu_ids; + + gfn_t gfn_shared_mask; }; =20 struct kvm_vm_stat { diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kv= m.h index ca85a070ac19..0f067fe7c186 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -532,6 +532,7 @@ struct kvm_pmu_event_filter { /* Trust Domain eXtension sub-ioctl() commands. */ enum kvm_tdx_cmd_id { KVM_TDX_CAPABILITIES =3D 0, + KVM_TDX_INIT_VM, =20 KVM_TDX_CMD_NR_MAX, }; @@ -577,4 +578,36 @@ struct kvm_tdx_capabilities { struct kvm_tdx_cpuid_config cpuid_configs[0]; }; =20 +struct kvm_tdx_init_vm { + __u64 attributes; + __u32 max_vcpus; + __u32 tsc_khz; + __u64 mrconfigid[6]; /* sha384 digest */ + __u64 mrowner[6]; /* sha384 digest */ + __u64 mrownerconfig[6]; /* sha348 digest */ + union { + /* + * KVM_TDX_INIT_VM is called before vcpu creation, thus before + * KVM_SET_CPUID2. CPUID configurations needs to be passed. + * + * This configuration supersedes KVM_SET_CPUID{,2}. + * The user space VMM, e.g. qemu, should make them consistent + * with this values. + * sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES(256) + * =3D 8KB. + */ + struct { + struct kvm_cpuid2 cpuid; + /* 8KB with KVM_MAX_CPUID_ENTRIES. */ + struct kvm_cpuid_entry2 entries[]; + }; + /* + * For future extensibility. + * The size(struct kvm_tdx_init_vm) =3D 16KB.
+ * This should be enough given sizeof(TD_PARAMS) =3D 1024 + */ + __u64 reserved[2028]; + }; +}; + #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index a4be3ef313b2..095ca7952114 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -438,6 +438,208 @@ int tdx_dev_ioctl(void __user *argp) return 0; } =20 +#define TDX_CPUID_NO_INDEX ((u32)-1) +static const struct kvm_cpuid_entry2 *tdx_find_cpuid_entry( + const struct kvm_tdx_init_vm *init_vm, u32 function, u32 index) +{ + int i; + + for (i =3D 0; i < init_vm->cpuid.nent; i++) { + const struct kvm_cpuid_entry2 *e =3D &init_vm->entries[i]; + + if (e->function =3D=3D function && + (e->index =3D=3D index || + !(e->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX))) + return e; + } + return NULL; +} + +static int setup_tdparams(struct kvm *kvm, struct td_params *td_params, + struct kvm_tdx_init_vm *init_vm) +{ + const struct kvm_cpuid_entry2 *entry; + u64 guest_supported_xcr0; + u64 guest_supported_xss; + u32 guest_tsc_khz; + int max_pa; + int i; + + td_params->max_vcpus =3D init_vm->max_vcpus; + + td_params->attributes =3D init_vm->attributes; + if (td_params->attributes & TDX_TD_ATTRIBUTE_PERFMON) { + /* + * TODO: save/restore PMU related registers around TDENTER. + * Once it's done, remove this guard. + */ + pr_warn("TD doesn't support perfmon yet. KVM needs to save/restore " + "host perf registers properly.\n"); + return -EOPNOTSUPP; + } + + for (i =3D 0; i < tdx_caps.nr_cpuid_configs; i++) { + const struct tdx_cpuid_config *config =3D &tdx_caps.cpuid_configs[i]; + const struct kvm_cpuid_entry2 *entry =3D + tdx_find_cpuid_entry(init_vm, config->leaf, config->sub_leaf); + struct tdx_cpuid_value *value =3D &td_params->cpuid_values[i]; + + value->eax =3D entry->eax & config->eax; + value->ebx =3D entry->ebx & config->ebx; + value->ecx =3D entry->ecx & config->ecx; + value->edx =3D entry->edx & config->edx; + } + + max_pa =3D 36; + entry =3D tdx_find_cpuid_entry(init_vm, 0x80000008, TDX_CPUID_NO_INDEX); + if (entry) + max_pa =3D entry->eax & 0xff; + + td_params->eptp_controls =3D VMX_EPTP_MT_WB; + /* + * No CPU supports 4-level && max_pa > 48. + * "5-level paging and 5-level EPT" section 4.1 4-level EPT + * "4-level EPT is limited to translating 48-bit guest-physical + * addresses." + * cpu_has_vmx_ept_5levels() check is just in case. + */ + if (cpu_has_vmx_ept_5levels() && max_pa > 48) { + td_params->eptp_controls |=3D VMX_EPTP_PWL_5; + td_params->exec_controls |=3D TDX_EXEC_CONTROL_MAX_GPAW; + } else { + td_params->eptp_controls |=3D VMX_EPTP_PWL_4; + } + + /* Setup td_params.xfam */ + entry =3D tdx_find_cpuid_entry(init_vm, 0xd, 0); + if (entry) + guest_supported_xcr0 =3D (entry->eax | ((u64)entry->edx << 32)); + else + guest_supported_xcr0 =3D 0; + guest_supported_xcr0 &=3D supported_xcr0; + + entry =3D tdx_find_cpuid_entry(init_vm, 0xd, 1); + if (entry) + guest_supported_xss =3D (entry->ecx | ((u64)entry->edx << 32)); + else + guest_supported_xss =3D 0; + /* PT can be exposed to TD guest regardless of KVM's XSS support */ + guest_supported_xss &=3D (supported_xss | XFEATURE_MASK_PT); + + td_params->xfam =3D guest_supported_xcr0 | guest_supported_xss; + if (td_params->xfam & XFEATURE_MASK_LBR) { + /* + * TODO: once KVM supports LBR(save/restore LBR related + * registers around TDENTER), remove this guard. + */ + pr_warn("TD doesn't support LBR yet. 
KVM needs to save/restore " + "IA32_LBR_DEPTH properly.\n"); + return -EOPNOTSUPP; + } + + if (td_params->xfam & XFEATURE_MASK_XTILE) { + /* + * TODO: once KVM supports AMX(save/restore AMX related + * registers around TDENTER), remove this guard. + */ + pr_warn("TD doesn't support AMX yet. KVM needs to save/restore " + "IA32_XFD, IA32_XFD_ERR properly.\n"); + return -EOPNOTSUPP; + } + + if (init_vm->tsc_khz) { + guest_tsc_khz =3D init_vm->tsc_khz; + kvm->arch.default_tsc_khz =3D guest_tsc_khz; + } else + guest_tsc_khz =3D kvm->arch.default_tsc_khz; + td_params->tsc_frequency =3D TDX_TSC_KHZ_TO_25MHZ(guest_tsc_khz); + +#define MEMCPY_SAME_SIZE(dst, src) \ + do { \ + BUILD_BUG_ON(sizeof(dst) !=3D sizeof(src)); \ + memcpy((dst), (src), sizeof(dst)); \ + } while (0) + + MEMCPY_SAME_SIZE(td_params->mrconfigid, init_vm->mrconfigid); + MEMCPY_SAME_SIZE(td_params->mrowner, init_vm->mrowner); + MEMCPY_SAME_SIZE(td_params->mrownerconfig, init_vm->mrownerconfig); + + return 0; +} + +static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + struct kvm_tdx_init_vm *init_vm =3D NULL; + struct td_params *td_params =3D NULL; + struct tdx_module_output out; + int ret; + u64 err; + + BUILD_BUG_ON(sizeof(*init_vm) !=3D 16 * 1024); + BUILD_BUG_ON((sizeof(*init_vm) - offsetof(typeof(*init_vm), entries)) / + sizeof(init_vm->entries[0]) < KVM_MAX_CPUID_ENTRIES); + BUILD_BUG_ON(sizeof(struct td_params) !=3D 1024); + + if (is_td_initialized(kvm)) + return -EINVAL; + + if (cmd->flags) + return -EINVAL; + + init_vm =3D kzalloc(sizeof(*init_vm), GFP_KERNEL); + if (copy_from_user(init_vm, (void __user *)cmd->data, sizeof(*init_vm))) { + ret =3D -EFAULT; + goto out; + } + + if (init_vm->max_vcpus > KVM_MAX_VCPUS) { + ret =3D -EINVAL; + goto out; + } + + if (tdx_caps.nr_cpuid_configs > KVM_MAX_CPUID_ENTRIES) { + /* tdx_caps.nr_cpuis_configs should be smaller. */ + pr_warn("too large nr_cpuid_configs\n"); + ret =3D -E2BIG; + goto out; + } + + td_params =3D kzalloc(sizeof(struct td_params), GFP_KERNEL); + if (!td_params) { + ret =3D -ENOMEM; + goto out; + } + + ret =3D setup_tdparams(kvm, td_params, init_vm); + if (ret) + goto out; + + err =3D tdh_mng_init(kvm_tdx->tdr.pa, __pa(td_params), &out); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MNG_INIT, err, &out); + ret =3D -EIO; + goto out; + } + + kvm_tdx->tsc_offset =3D td_tdcs_exec_read64(kvm_tdx, TD_TDCS_EXEC_TSC_OFF= SET); + kvm_tdx->attributes =3D td_params->attributes; + kvm_tdx->xfam =3D td_params->xfam; + kvm_tdx->tsc_khz =3D TDX_TSC_25MHZ_TO_KHZ(td_params->tsc_frequency); + kvm->max_vcpus =3D td_params->max_vcpus; + + if (td_params->exec_controls & TDX_EXEC_CONTROL_MAX_GPAW) + kvm->arch.gfn_shared_mask =3D gpa_to_gfn(BIT_ULL(51)); + else + kvm->arch.gfn_shared_mask =3D gpa_to_gfn(BIT_ULL(47)); + +out: + /* kfree() accepts NULL. 
*/ + kfree(init_vm); + kfree(td_params); + return ret; +} + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { struct kvm_tdx_cmd tdx_cmd; @@ -451,6 +653,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) mutex_lock(&kvm->lock); =20 switch (tdx_cmd.id) { + case KVM_TDX_INIT_VM: + r =3D tdx_td_init(kvm, &tdx_cmd); + break; default: r =3D -EINVAL; goto out; diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 8058b6b153f8..8a0793fcc3ab 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -20,7 +20,12 @@ struct kvm_tdx { struct tdx_td_page tdr; struct tdx_td_page *tdcs; =20 + u64 attributes; + u64 xfam; int hkid; + + u64 tsc_offset; + unsigned long tsc_khz; }; =20 struct vcpu_tdx { @@ -50,6 +55,11 @@ static inline struct vcpu_tdx *to_tdx(struct kvm_vcpu *v= cpu) return container_of(vcpu, struct vcpu_tdx, vcpu); } =20 +static inline bool is_td_initialized(struct kvm *kvm) +{ + return !!kvm->max_vcpus; +} + static __always_inline void tdvps_vmcs_check(u32 field, u8 bits) { BUILD_BUG_ON_MSG(__builtin_constant_p(field) && (field) & 0x1, @@ -135,6 +145,19 @@ TDX_BUILD_TDVPS_ACCESSORS(64, VMCS, vmcs); TDX_BUILD_TDVPS_ACCESSORS(64, STATE_NON_ARCH, state_non_arch); TDX_BUILD_TDVPS_ACCESSORS(8, MANAGEMENT, management); =20 +static __always_inline u64 td_tdcs_exec_read64(struct kvm_tdx *kvm_tdx, u3= 2 field) +{ + struct tdx_module_output out; + u64 err; + + err =3D tdh_mng_rd(kvm_tdx->tdr.pa, TDCS_EXEC(field), &out); + if (unlikely(err)) { + pr_err("TDH_MNG_RD[EXEC.0x%x] failed: 0x%llx\n", field, err); + return 0; + } + return out.r8; +} + #else static inline int tdx_module_setup(void) { return -ENODEV; }; =20 diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include= /uapi/asm/kvm.h index a9ea3573be1b..779dfd683d66 100644 --- a/tools/arch/x86/include/uapi/asm/kvm.h +++ b/tools/arch/x86/include/uapi/asm/kvm.h @@ -531,6 +531,7 @@ struct kvm_pmu_event_filter { /* Trust Domain eXtension sub-ioctl() commands. */ enum kvm_tdx_cmd_id { KVM_TDX_CAPABILITIES =3D 0, + KVM_TDX_INIT_VM, =20 KVM_TDX_CMD_NR_MAX, }; @@ -576,4 +577,36 @@ struct kvm_tdx_capabilities { struct kvm_tdx_cpuid_config cpuid_configs[0]; }; =20 +struct kvm_tdx_init_vm { + __u64 attributes; + __u32 max_vcpus; + __u32 tsc_khz; + __u64 mrconfigid[6]; /* sha384 digest */ + __u64 mrowner[6]; /* sha384 digest */ + __u64 mrownerconfig[6]; /* sha348 digest */ + union { + /* + * KVM_TDX_INIT_VM is called before vcpu creation, thus before + * KVM_SET_CPUID2. CPUID configurations needs to be passed. + * + * This configuration supersedes KVM_SET_CPUID{,2}. + * The user space VMM, e.g. qemu, should make them consistent + * with this values. + * sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES(256) + * =3D 8KB. + */ + struct { + struct kvm_cpuid2 cpuid; + /* 8KB with KVM_MAX_CPUID_ENTRIES. */ + struct kvm_cpuid_entry2 entries[]; + }; + /* + * For future extensibility. + * The size(struct kvm_tdx_init_vm) =3D 16KB. 
+ * This should be enough given sizeof(TD_PARAMS) =3D 1024 + */ + __u64 reserved[2028]; + }; +}; + #endif /* _ASM_X86_KVM_H */ --=20 2.25.1

From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar
Subject: [RFC PATCH v6 026/104] KVM: TDX: Make KVM_CAP_SET_IDENTITY_MAP_ADDR unsupported for TDX
Date: Thu, 5 May 2022 11:14:20 -0700

From: Isaku Yamahata

TDX doesn't support KVM_SET_IDENTITY_MAP_ADDR, so report KVM_CAP_SET_IDENTITY_MAP_ADDR as unsupported for TDX VMs.
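For illustration, with this change userspace can probe the capability per VM. A sketch, assuming a VM fd and a host where KVM_CAP_CHECK_EXTENSION_VM is available so that KVM_CHECK_EXTENSION can be issued on the VM fd:

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Returns 1 for a regular VM, and 0 for a TDX VM after this patch. */
static int identity_map_addr_supported(int vm_fd)
{
	return ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_SET_IDENTITY_MAP_ADDR);
}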
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/x86.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 77e83e20180a..fd282e5efec1 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4236,7 +4236,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, lon= g ext) case KVM_CAP_IOEVENTFD_NO_LENGTH: case KVM_CAP_PIT2: case KVM_CAP_PIT_STATE2: - case KVM_CAP_SET_IDENTITY_MAP_ADDR: case KVM_CAP_VCPU_EVENTS: case KVM_CAP_HYPERV: case KVM_CAP_HYPERV_VAPIC: @@ -4389,6 +4388,12 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, lo= ng ext) case KVM_CAP_DISABLE_QUIRKS2: r =3D KVM_X86_VALID_QUIRKS; break; + case KVM_CAP_SET_IDENTITY_MAP_ADDR: + if (kvm && kvm->arch.vm_type =3D=3D KVM_X86_TDX_VM) + r =3D 0; + else + r =3D 1; + break; case KVM_CAP_VM_TYPES: r =3D BIT(KVM_X86_DEFAULT_VM); if (static_call(kvm_x86_is_vm_type_supported)(KVM_X86_TDX_VM)) --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D75FCC433FE for ; Thu, 5 May 2022 18:17:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383592AbiEESVN (ORCPT ); Thu, 5 May 2022 14:21:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36326 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383083AbiEEST1 (ORCPT ); Thu, 5 May 2022 14:19:27 -0400 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DCE01140DD; Thu, 5 May 2022 11:15:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774544; x=1683310544; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=rn1fA5gObl+Rf3AUa8EQeZo10m5xgJZyjRqWSFr/jCY=; b=T3S6iD7MdaCgHXWk4n9sjtLwHd8XH8f436fOeGYLqivZeVUYC1onnr5Y nbek5eKSR869YqwIRxQvUJluc6SRso1YkHauD2Yzd4j9n9voXsHMgEVw7 9Xqw/psgbHOsoOpdF/naFNFtzsjHJO8K6Bk5Co7VjwJLem1g1h12oG52Y v4jZCshsB6d0eRLJczIU6Y+t62Lx2EzjbQmqN1lyQFOfDBX3WvwSupN2R 3+vuxvhm9o/A+xVcGFg6iDJ8fUfs1YLRg96xZEkZNQPqfCJtBT983bRGr 3epDtehrnI2jK5xKuyaGSVynFlyZMR1GHZ3Eh2qelcOvydlUrfoBCucgY Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="265800738" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="265800738" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:43 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083224" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:43 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 027/104] KVM: TDX: Make pmu_intel.c ignore guest TD case Date: Thu, 5 May 2022 11:14:21 -0700 Message-Id: <3f4e5e8378499193185d4edc799af5643a14df04.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 
Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Because TDX KVM doesn't support PMU yet (it's future work of TDX KVM support as another patch series) and pmu_intel.c touches vmx specific structure in vcpu initialization, as workaround add dummy structure to struct vcpu_tdx and pmu_intel.c can ignore TDX case. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/pmu_intel.c | 33 +++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/pmu_intel.h | 29 +++++++++++++++++++++++++++++ arch/x86/kvm/vmx/tdx.h | 7 +++++++ arch/x86/kvm/vmx/vmx.h | 24 +----------------------- 4 files changed, 70 insertions(+), 23 deletions(-) create mode 100644 arch/x86/kvm/vmx/pmu_intel.h diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index 37e9eb32e3d9..616e9f00cd6e 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -17,6 +17,7 @@ #include "lapic.h" #include "nested.h" #include "pmu.h" +#include "tdx.h" =20 #define MSR_PMC_FULL_WIDTH_BIT (MSR_IA32_PMC0 - MSR_IA32_PERFCTR0) =20 @@ -35,6 +36,26 @@ static struct kvm_event_hw_type_mapping intel_arch_event= s[] =3D { /* mapping between fixed pmc index and intel_arch_events array */ static int fixed_pmc_events[] =3D {1, 0, 7}; =20 +struct lbr_desc *vcpu_to_lbr_desc(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_INTEL_TDX_HOST + if (is_td_vcpu(vcpu)) + return &to_tdx(vcpu)->lbr_desc; +#endif + + return &to_vmx(vcpu)->lbr_desc; +} + +struct x86_pmu_lbr *vcpu_to_lbr_records(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_INTEL_TDX_HOST + if (is_td_vcpu(vcpu)) + return &to_tdx(vcpu)->lbr_desc.records; +#endif + + return &to_vmx(vcpu)->lbr_desc.records; +} + static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data) { int i; @@ -169,6 +190,9 @@ static inline struct kvm_pmc *get_fw_gp_pmc(struct kvm_= pmu *pmu, u32 msr) =20 bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu) { + if (is_td_vcpu(vcpu)) + return false; + /* * As a first step, a guest could only enable LBR feature if its * cpu model is the same as the host because the LBR registers @@ -181,6 +205,9 @@ bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu) { struct x86_pmu_lbr *lbr =3D vcpu_to_lbr_records(vcpu); =20 + if (is_td_vcpu(vcpu)) + return false; + return lbr->nr && (vcpu_get_perf_capabilities(vcpu) & PMU_CAP_LBR_FMT); } =20 @@ -282,6 +309,9 @@ int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *v= cpu) PERF_SAMPLE_BRANCH_USER, }; =20 + if (WARN_ON(is_td_vcpu(vcpu))) + return 0; + if (unlikely(lbr_desc->event)) { __set_bit(INTEL_PMC_IDX_FIXED_VLBR, pmu->pmc_in_use); return 0; @@ -589,6 +619,9 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu) struct kvm_pmc *pmc =3D NULL; int i; =20 + if (is_td_vcpu(vcpu)) + return; + for (i =3D 0; i < INTEL_PMC_MAX_GENERIC; i++) { pmc =3D &pmu->gp_counters[i]; =20 diff --git a/arch/x86/kvm/vmx/pmu_intel.h b/arch/x86/kvm/vmx/pmu_intel.h new file mode 100644 index 000000000000..12d592450d0f --- /dev/null +++ b/arch/x86/kvm/vmx/pmu_intel.h @@ -0,0 +1,29 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __KVM_X86_VMX_PMU_INTEL_H +#define __KVM_X86_VMX_PMU_INTEL_H + +struct lbr_desc *vcpu_to_lbr_desc(struct kvm_vcpu *vcpu); +struct x86_pmu_lbr *vcpu_to_lbr_records(struct kvm_vcpu *vcpu); + +bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu); +bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu); + +int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu); + +struct lbr_desc { + /* Basic info about guest LBR records. 
*/ + struct x86_pmu_lbr records; + + /* + * Emulate LBR feature via passthrough LBR registers when the + * per-vcpu guest LBR event is scheduled on the current pcpu. + * + * The records may be inaccurate if the host reclaims the LBR. + */ + struct perf_event *event; + + /* True if LBRs are marked as not intercepted in the MSR bitmap */ + bool msr_passthrough; +}; + +#endif /* __KVM_X86_VMX_PMU_INTEL_H */ diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 8a0793fcc3ab..892e7dc96e99 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -4,6 +4,7 @@ =20 #ifdef CONFIG_INTEL_TDX_HOST =20 +#include "pmu_intel.h" #include "tdx_ops.h" =20 int tdx_module_setup(void); @@ -33,6 +34,12 @@ struct vcpu_tdx { =20 struct tdx_td_page tdvpr; struct tdx_td_page *tdvpx; + + /* + * Dummy to make pmu_intel not corrupt memory. + * TODO: Support PMU for TDX. Future work. + */ + struct lbr_desc lbr_desc; }; =20 static inline bool is_td(struct kvm *kvm) diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index d7baedda79e5..36dbaf1add45 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -10,6 +10,7 @@ #include "capabilities.h" #include "kvm_cache_regs.h" #include "posted_intr.h" +#include "pmu_intel.h" #include "vmcs.h" #include "vmx_ops.h" #include "cpuid.h" @@ -91,31 +92,8 @@ union vmx_exit_reason { u32 full; }; =20 -#define vcpu_to_lbr_desc(vcpu) (&to_vmx(vcpu)->lbr_desc) -#define vcpu_to_lbr_records(vcpu) (&to_vmx(vcpu)->lbr_desc.records) - -bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu); -bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu); - -int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu); void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu); =20 -struct lbr_desc { - /* Basic info about guest LBR records. */ - struct x86_pmu_lbr records; - - /* - * Emulate LBR feature via passthrough LBR registers when the - * per-vcpu guest LBR event is scheduled on the current pcpu. - * - * The records may be inaccurate if the host reclaims the LBR. - */ - struct perf_event *event; - - /* True if LBRs are marked as not intercepted in the MSR bitmap */ - bool msr_passthrough; -}; - /* * The nested_vmx structure is part of vcpu_vmx, and holds information we = need * for correct emulation of VMX (i.e., nested VMX) on this vcpu. 
--=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0270FC433F5 for ; Thu, 5 May 2022 18:18:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383338AbiEESVk (ORCPT ); Thu, 5 May 2022 14:21:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36248 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383075AbiEESTY (ORCPT ); Thu, 5 May 2022 14:19:24 -0400 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 962CA140D1; Thu, 5 May 2022 11:15:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774544; x=1683310544; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=dq/4kTxFog2mJ+pfF6ngUCSHtSzKcfcxTCggESSMMVA=; b=WGv1p/QAMu/5HS3ljujsT5XXMjrS0zu0eggenYyu4k6w4e9rqsd7oPNG Qn9SwyL5hUcmFRCMC9SCNMGYr+v6XxYrz5PvvXKfIXgo3O+YGSYHaxrWs HcbYBwlidS9KaU78bepLHuVcbTpxMzVIodt5lIEyfBuhCjuJ2i90io5iC tGUersyr0mkQFZH+3TBifE5hkPo+6e+okJIpj6TwllZXA+7YsSaggpowt NXiuqm7e97OaH0Rs73xtap9vZruVgl3pyz3QA+nB5vQk13W52ZT9VfUPB 8Ak+XMMc+4laXWM65a2Wg9qoWYe4Sp0uGPvruDn7dN0rtPhBzSpTiMxek w==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="255683941" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="255683941" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:44 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083228" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:43 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 028/104] [MARKER] The start of TDX KVM patch series: TD vcpu creation/destruction Date: Thu, 5 May 2022 11:14:22 -0700 Message-Id: <6f35b7eeb910dfec5c2c3859618fb767c0b05e63.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This empty commit is to mark the start of patch series of TD vcpu creation/destruction. Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/intel-tdx-layer-status.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentat= ion/virt/kvm/intel-tdx-layer-status.rst index 5e0deaebf843..3e8efde3e3f3 100644 --- a/Documentation/virt/kvm/intel-tdx-layer-status.rst +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -9,15 +9,15 @@ Layer status What qemu can do ---------------- - TDX VM TYPE is exposed to Qemu. -- Qemu can try to create VM of TDX VM type and then fails. +- Qemu can create/destroy guest of TDX vm type. 
=20 Patch Layer status ------------------ Patch layer Status * TDX, VMX coexistence: Applied * TDX architectural definitions: Applied -* TD VM creation/destruction: Applying -* TD vcpu creation/destruction: Not yet +* TD VM creation/destruction: Applied +* TD vcpu creation/destruction: Applying * TDX EPT violation: Not yet * TD finalization: Not yet * TD vcpu enter/exit: Not yet --=20 2.25.1

From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar
Subject: [RFC PATCH v6 029/104] KVM: TDX: allocate/free TDX vcpu structure
Date: Thu, 5 May 2022 11:14:23 -0700
Message-Id: <1fc6f3157deeaa0f6dfe12feecade8d0c99d2509.1651774250.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

The next step of TDX guest creation is to create vcpus. Allocate the TDX vcpu structures and initialize them, and allocate the pages of the TDX vcpu for the TDX module.

In the conventional case, cpuid is empty at vcpu initialization and is configured afterwards. Because TDX supports only x2APIC mode, cpuid is forcibly initialized at vcpu initialization to advertise x2APIC support.
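Putting the VM-scoped and vcpu-scoped pieces together, the intended userspace ordering looks roughly as follows. This is a sketch of the call order described by the series, not code from the patches; it reuses the hypothetical helpers from the earlier sketches, and KVM_X86_TDX_VM is the TDX VM type exposed by this series:

/* Sketch: kvm_fd is an open /dev/kvm fd; error handling omitted. */
static int create_td_with_one_vcpu(int kvm_fd, struct kvm_tdx_init_vm *init_vm)
{
	int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, KVM_X86_TDX_VM);

	/* KVM_TDX_INIT_VM must come before any KVM_CREATE_VCPU. */
	tdx_vm_cmd(vm_fd, KVM_TDX_INIT_VM, (__u64)(unsigned long)init_vm);

	/* Allocates the TDVPR/TDVPX pages for the vcpu (this patch). */
	return ioctl(vm_fd, KVM_CREATE_VCPU, 0);
}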
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/tdx.c | 133 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 133 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 095ca7952114..eb4fd31bc369 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -6,6 +6,7 @@ #include "capabilities.h" #include "x86_ops.h" #include "tdx.h" +#include "x86.h" =20 #undef pr_fmt #define pr_fmt(fmt) "tdx: " fmt @@ -61,6 +62,11 @@ static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u= 16 hkid) return pa; } =20 +static inline bool is_td_vcpu_created(struct vcpu_tdx *tdx) +{ + return tdx->tdvpr.added; +} + static inline bool is_td_created(struct kvm_tdx *kvm_tdx) { return kvm_tdx->tdr.added; @@ -392,6 +398,133 @@ int tdx_vm_init(struct kvm *kvm) return ret; } =20 +int tdx_vcpu_create(struct kvm_vcpu *vcpu) +{ + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + int ret, i; + + /* TDX only supports x2APIC, which requires an in-kernel local APIC. */ + if (!vcpu->arch.apic) + return -EINVAL; + + ret =3D tdx_alloc_td_page(&tdx->tdvpr); + if (ret) + return ret; + + tdx->tdvpx =3D kcalloc(tdx_caps.tdvpx_nr_pages, sizeof(*tdx->tdvpx), + GFP_KERNEL_ACCOUNT); + if (!tdx->tdvpx) { + ret =3D -ENOMEM; + goto free_tdvpr; + } + for (i =3D 0; i < tdx_caps.tdvpx_nr_pages; i++) { + ret =3D tdx_alloc_td_page(&tdx->tdvpx[i]); + if (ret) + goto free_tdvpx; + } + + vcpu->arch.efer =3D EFER_SCE | EFER_LME | EFER_LMA | EFER_NX; + + vcpu->arch.cr0_guest_owned_bits =3D -1ul; + vcpu->arch.cr4_guest_owned_bits =3D -1ul; + + vcpu->arch.tsc_offset =3D to_kvm_tdx(vcpu->kvm)->tsc_offset; + vcpu->arch.l1_tsc_offset =3D vcpu->arch.tsc_offset; + vcpu->arch.guest_state_protected =3D + !(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTRIBUTE_DEBUG); + + return 0; + +free_tdvpx: + /* @i points at the TDVPX page that failed allocation. */ + for (--i; i >=3D 0; i--) + free_page(tdx->tdvpx[i].va); + kfree(tdx->tdvpx); +free_tdvpr: + free_page(tdx->tdvpr.va); + + return ret; +} + +void tdx_vcpu_free(struct kvm_vcpu *vcpu) +{ + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + int i; + + /* Can't reclaim or free pages if teardown failed. */ + if (is_hkid_assigned(to_kvm_tdx(vcpu->kvm))) + return; + + for (i =3D 0; i < tdx_caps.tdvpx_nr_pages; i++) + tdx_reclaim_td_page(&tdx->tdvpx[i]); + kfree(tdx->tdvpx); + tdx_reclaim_td_page(&tdx->tdvpr); +} + +void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(vcpu->kvm); + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + struct msr_data apic_base_msr; + u64 err; + int i; + + /* TDX doesn't support INIT event. */ + if (WARN_ON(init_event)) + goto td_bugged; + if (WARN_ON(is_td_vcpu_created(tdx))) + goto td_bugged; + + err =3D tdh_vp_create(kvm_tdx->tdr.pa, tdx->tdvpr.pa); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_VP_CREATE, err, NULL); + goto td_bugged; + } + tdx_mark_td_page_added(&tdx->tdvpr); + + for (i =3D 0; i < tdx_caps.tdvpx_nr_pages; i++) { + err =3D tdh_vp_addcx(tdx->tdvpr.pa, tdx->tdvpx[i].pa); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_VP_ADDCX, err, NULL); + goto td_bugged; + } + tdx_mark_td_page_added(&tdx->tdvpx[i]); + } + + if (!vcpu->arch.cpuid_entries) { + /* + * On cpu creation, cpuid entry is blank. Forcibly enable + * X2APIC feature to allow X2APIC. 
+ */ + struct kvm_cpuid_entry2 *e; + + e =3D kvmalloc_array(1, sizeof(*e), GFP_KERNEL_ACCOUNT); + *e =3D (struct kvm_cpuid_entry2) { + .function =3D 1, /* Features for X2APIC */ + .index =3D 0, + .eax =3D 0, + .ebx =3D 0, + .ecx =3D 1ULL << 21, /* X2APIC */ + .edx =3D 0, + }; + vcpu->arch.cpuid_entries =3D e; + vcpu->arch.cpuid_nent =3D 1; + } + apic_base_msr.data =3D APIC_DEFAULT_PHYS_BASE | LAPIC_MODE_X2APIC; + if (kvm_vcpu_is_reset_bsp(vcpu)) + apic_base_msr.data |=3D MSR_IA32_APICBASE_BSP; + apic_base_msr.host_initiated =3D true; + if (WARN_ON(kvm_set_apic_base(vcpu, &apic_base_msr))) + goto td_bugged; + + vcpu->arch.mp_state =3D KVM_MP_STATE_RUNNABLE; + + return; + +td_bugged: + vcpu->kvm->vm_bugged =3D true; +} + int tdx_dev_ioctl(void __user *argp) { struct kvm_tdx_capabilities __user *user_caps; --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D5BE9C433F5 for ; Thu, 5 May 2022 18:17:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383401AbiEESVF (ORCPT ); Thu, 5 May 2022 14:21:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36388 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240591AbiEEST3 (ORCPT ); Thu, 5 May 2022 14:19:29 -0400 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7603E15818; Thu, 5 May 2022 11:15:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774545; x=1683310545; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=tHxWnEnWzHxtFAH14IxIaOvVTgyVwNRXnQcU80k0WQQ=; b=K7o5EvYtD08hf0KnMJAlBecNaNw1B5IeZpjkL6dY9fy3Ns5ZCigym+UX pRvA0qMp8xY1KjspSTOQJHNhcPK3ddQoolRLG6BNuVMeCrd0q7qUK4elk YerkDxqXmcz/NnHhfPVaKSxXoTSi3kICd9PdU9Wh3qLINYH47XFGUxm7i bkE7mYJoMIfiQtlJVpt3aeSDDcye1PAKDiIphphf1DNsY4a2QJCIVRsLG MBz/R965NFE2kHzLLhT93v6ozmSQT5jzqbSCtxK0yzB3slpSW21+1Litz 5PmgL+ZE9YBs0oGqb5Ry7SLrNLlRn1Qc2UsslFJtN/kmGzCzZ7RQiU2tH Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="255683947" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="255683947" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:44 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083234" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:44 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 030/104] KVM: TDX: allocate/free TDX vcpu structure Date: Thu, 5 May 2022 11:14:24 -0700 Message-Id: <47649c754bf6115246c0b6bd6a65fcdca76202dc.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata The next step of TDX guest creation is to create vcpu. 
Allocate the TDX vcpu structures and initialize them, and allocate the pages of the TDX vcpu for the TDX module.

In the conventional case, cpuid is empty at vcpu initialization and is configured afterwards. Because TDX supports only x2APIC mode, cpuid is forcibly initialized at vcpu initialization to advertise x2APIC support.

Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/main.c    | 40 ++++++++++++++++++++++++++++++++++----
 arch/x86/kvm/vmx/x86_ops.h |  8 ++++++++
 2 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 067f5de56c53..4f4ed4ad65a7 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -73,6 +73,38 @@ static void vt_vm_free(struct kvm *kvm)
 	return tdx_vm_free(kvm);
 }
 
+static int vt_vcpu_precreate(struct kvm *kvm)
+{
+	if (is_td(kvm))
+		return 0;
+
+	return vmx_vcpu_precreate(kvm);
+}
+
+static int vt_vcpu_create(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return tdx_vcpu_create(vcpu);
+
+	return vmx_vcpu_create(vcpu);
+}
+
+static void vt_vcpu_free(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return tdx_vcpu_free(vcpu);
+
+	return vmx_vcpu_free(vcpu);
+}
+
+static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
+{
+	if (is_td_vcpu(vcpu))
+		return tdx_vcpu_reset(vcpu, init_event);
+
+	return vmx_vcpu_reset(vcpu, init_event);
+}
+
 static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	if (!is_td(kvm))
@@ -98,10 +130,10 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.vm_destroy = vt_vm_destroy,
 	.vm_free = vt_vm_free,
 
-	.vcpu_precreate = vmx_vcpu_precreate,
-	.vcpu_create = vmx_vcpu_create,
-	.vcpu_free = vmx_vcpu_free,
-	.vcpu_reset = vmx_vcpu_reset,
+	.vcpu_precreate = vt_vcpu_precreate,
+	.vcpu_create = vt_vcpu_create,
+	.vcpu_free = vt_vcpu_free,
+	.vcpu_reset = vt_vcpu_reset,
 
 	.prepare_switch_to_guest = vmx_prepare_switch_to_guest,
 	.vcpu_load = vmx_vcpu_load,
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 1ff555cc6c17..74bab1ba2edf 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -138,6 +138,10 @@ int tdx_vm_init(struct kvm *kvm);
 void tdx_mmu_release_hkid(struct kvm *kvm);
 void tdx_vm_free(struct kvm *kvm);
 
+int tdx_vcpu_create(struct kvm_vcpu *vcpu);
+void tdx_vcpu_free(struct kvm_vcpu *vcpu);
+void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
+
 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
 #else
 static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return 0; }
@@ -150,6 +154,10 @@ static inline void tdx_mmu_release_hkid(struct kvm *kvm) {}
 static inline void tdx_flush_shadow_all_private(struct kvm *kvm) {}
 static inline void tdx_vm_free(struct kvm *kvm) {}
 
+static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTSUPP; }
+static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {}
+static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {}
+
 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
 #endif
 
-- 
2.25.1
From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar
Subject: [RFC PATCH v6 031/104] KVM: TDX: Do TDX specific vcpu initialization
Date: Thu, 5 May 2022 11:14:25 -0700
Message-Id: <753717dd4a4ff6b893e951d262bc6b78f959c9c4.1651774250.git.isaku.yamahata@intel.com>

From: Sean Christopherson

A TD guest vcpu needs to be configured before it is ready to run, which requires additional information from the device model (e.g. qemu); one 64-bit value is passed to the vcpu's RCX as an initial value. Repurpose KVM_MEMORY_ENCRYPT_OP to also apply at vcpu scope: add a callback for KVM vCPU-scoped operations of KVM_MEMORY_ENCRYPT_OP and a new subcommand under it, KVM_TDX_INIT_VCPU, for this additional vcpu configuration.
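For illustration, the vcpu-scoped call mirrors the VM-scoped one. A sketch, with the struct kvm_tdx_cmd layout assumed as before and initial_rcx an arbitrary example value:

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Sketch: complete per-vcpu TDX initialization on a vcpu fd;
 * cmd.data becomes the guest vcpu's initial RCX. */
static int tdx_init_vcpu(int vcpu_fd, __u64 initial_rcx)
{
	struct kvm_tdx_cmd cmd = {
		.id = KVM_TDX_INIT_VCPU,
		.data = initial_rcx,
	};

	return ioctl(vcpu_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
}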
Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 1 + arch/x86/include/uapi/asm/kvm.h | 1 + arch/x86/kvm/vmx/main.c | 9 +++++++ arch/x86/kvm/vmx/tdx.c | 36 +++++++++++++++++++++++++++ arch/x86/kvm/vmx/tdx.h | 4 +++ arch/x86/kvm/vmx/x86_ops.h | 2 ++ arch/x86/kvm/x86.c | 6 +++++ tools/arch/x86/include/uapi/asm/kvm.h | 1 + 9 files changed, 61 insertions(+) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index 3677a5015a4f..32a6df784ea6 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -119,6 +119,7 @@ KVM_X86_OP(leave_smm) KVM_X86_OP(enable_smi_window) KVM_X86_OP_OPTIONAL(dev_mem_enc_ioctl) KVM_X86_OP_OPTIONAL(mem_enc_ioctl) +KVM_X86_OP_OPTIONAL(vcpu_mem_enc_ioctl) KVM_X86_OP_OPTIONAL(mem_enc_register_region) KVM_X86_OP_OPTIONAL(mem_enc_unregister_region) KVM_X86_OP_OPTIONAL(vm_copy_enc_context_from) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index bc2038157f11..60a97ae55972 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1499,6 +1499,7 @@ struct kvm_x86_ops { =20 int (*dev_mem_enc_ioctl)(void __user *argp); int (*mem_enc_ioctl)(struct kvm *kvm, void __user *argp); + int (*vcpu_mem_enc_ioctl)(struct kvm_vcpu *vcpu, void __user *argp); int (*mem_enc_register_region)(struct kvm *kvm, struct kvm_enc_region *ar= gp); int (*mem_enc_unregister_region)(struct kvm *kvm, struct kvm_enc_region *= argp); int (*vm_copy_enc_context_from)(struct kvm *kvm, unsigned int source_fd); diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kv= m.h index 0f067fe7c186..6b1c3e0e9a3c 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -533,6 +533,7 @@ struct kvm_pmu_event_filter { enum kvm_tdx_cmd_id { KVM_TDX_CAPABILITIES =3D 0, KVM_TDX_INIT_VM, + KVM_TDX_INIT_VCPU, =20 KVM_TDX_CMD_NR_MAX, }; diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 4f4ed4ad65a7..ce12cc8276ef 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -113,6 +113,14 @@ static int vt_mem_enc_ioctl(struct kvm *kvm, void __us= er *argp) return tdx_vm_ioctl(kvm, argp); } =20 +static int vt_vcpu_mem_enc_ioctl(struct kvm_vcpu *vcpu, void __user *argp) +{ + if (!is_td_vcpu(vcpu)) + return -EINVAL; + + return tdx_vcpu_ioctl(vcpu, argp); +} + struct kvm_x86_ops vt_x86_ops __initdata =3D { .name =3D "kvm_intel", =20 @@ -255,6 +263,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { =20 .dev_mem_enc_ioctl =3D tdx_dev_ioctl, .mem_enc_ioctl =3D vt_mem_enc_ioctl, + .vcpu_mem_enc_ioctl =3D vt_vcpu_mem_enc_ioctl, }; =20 struct kvm_x86_init_ops vt_init_ops __initdata =3D { diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index eb4fd31bc369..54573537e2b8 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -83,6 +83,11 @@ static inline bool is_hkid_assigned(struct kvm_tdx *kvm_= tdx) return kvm_tdx->hkid > 0; } =20 +static inline bool is_td_finalized(struct kvm_tdx *kvm_tdx) +{ + return kvm_tdx->finalized; +} + static void tdx_clear_page(unsigned long page) { const void *zero_page =3D (const void *) __va(page_to_phys(ZERO_PAGE(0))); @@ -802,6 +807,37 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) return r; } =20 +int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(vcpu->kvm); + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + struct kvm_tdx_cmd cmd; + 
u64 err; + + if (tdx->initialized) + return -EINVAL; + + if (!is_td_initialized(vcpu->kvm) || is_td_finalized(kvm_tdx)) + return -EINVAL; + + if (copy_from_user(&cmd, argp, sizeof(cmd))) + return -EFAULT; + + if (cmd.error || cmd.unused) + return -EINVAL; + if (cmd.flags || cmd.id !=3D KVM_TDX_INIT_VCPU) + return -EINVAL; + + err =3D tdh_vp_init(tdx->tdvpr.pa, cmd.data); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_VP_INIT, err, NULL); + return -EIO; + } + + tdx->initialized =3D true; + return 0; +} + int __init tdx_module_setup(void) { const struct tdsysinfo_struct *tdsysinfo; diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 892e7dc96e99..337c3adb4fcf 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -25,6 +25,8 @@ struct kvm_tdx { u64 xfam; int hkid; =20 + bool finalized; + u64 tsc_offset; unsigned long tsc_khz; }; @@ -35,6 +37,8 @@ struct vcpu_tdx { struct tdx_td_page tdvpr; struct tdx_td_page *tdvpx; =20 + bool initialized; + /* * Dummy to make pmu_intel not corrupt memory. * TODO: Support PMU for TDX. Future work. diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 74bab1ba2edf..ab94f95bb915 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -143,6 +143,7 @@ void tdx_vcpu_free(struct kvm_vcpu *vcpu); void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event); =20 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); +int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); #else static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return= 0; } static inline bool tdx_is_vm_type_supported(unsigned long type) { return f= alse; } @@ -159,6 +160,7 @@ static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu)= {} static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) = {} =20 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { retur= n -EOPNOTSUPP; } +static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)= { return -EOPNOTSUPP; } #endif =20 #endif /* __KVM_X86_VMX_X86_OPS_H */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index fd282e5efec1..e9b5d6007025 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5734,6 +5734,12 @@ long kvm_arch_vcpu_ioctl(struct file *filp, case KVM_SET_DEVICE_ATTR: r =3D kvm_vcpu_ioctl_device_attr(vcpu, ioctl, argp); break; + case KVM_MEMORY_ENCRYPT_OP: + r =3D -ENOTTY; + if (!kvm_x86_ops.vcpu_mem_enc_ioctl) + goto out; + r =3D kvm_x86_ops.vcpu_mem_enc_ioctl(vcpu, argp); + break; default: r =3D -EINVAL; } diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include= /uapi/asm/kvm.h index 779dfd683d66..60a79f9ef174 100644 --- a/tools/arch/x86/include/uapi/asm/kvm.h +++ b/tools/arch/x86/include/uapi/asm/kvm.h @@ -532,6 +532,7 @@ struct kvm_pmu_event_filter { enum kvm_tdx_cmd_id { KVM_TDX_CAPABILITIES =3D 0, KVM_TDX_INIT_VM, + KVM_TDX_INIT_VCPU, =20 KVM_TDX_CMD_NR_MAX, }; --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 027E8C433FE for ; Thu, 5 May 2022 18:17:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383396AbiEESU7 (ORCPT ); Thu, 5 May 2022 14:20:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36370 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP 
id S1383087AbiEEST2 (ORCPT ); Thu, 5 May 2022 14:19:28 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EA2DA140F8; Thu, 5 May 2022 11:15:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774544; x=1683310544; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=GrRrgKMz4riAK/ebZ/sI81D2KhEbTTaxZ+pXBobqoJ4=; b=JjVkR0rW+eVV7UK5ktNuoTohlxjmRuB/Sg6Bbw8bV2sTHClDw9NONjcR aJszWGrsiERoWVBSPdMwkxNP/uHH0lc5rHNmPwKtUH47J04YueZh35EVv WccbS1eIk9e8TAvHMCler+aYtaQ7O1ya6KdKex7RzDzR/DzO7R6TCNFsp CkVSFA+bHhTVMVbxSO7jnEcE2kisMeX7WDDxHvGxvQGllysOqTHupO4I1 nWEmJHiYllm7f1TmxVaGUtOuqjaI8m4lQKJLA9QMs1bTFdZGRZRqHxtal NYVEXt3DOx6kLHSmrK/XYPPtG1YzAMBuL1ilxjHfwoFetPUz2QoavWFnT A==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248113868" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248113868" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:44 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083241" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:44 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 032/104] [MARKER] The start of TDX KVM patch series: KVM MMU GPA shared bits Date: Thu, 5 May 2022 11:14:26 -0700 Message-Id: <0003cf30319cfbe25dff1654558b3d6664cb5754.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This empty commit is to mark the start of patch series of KVM MMU GPA shared bits. Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/intel-tdx-layer-status.rst | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentat= ion/virt/kvm/intel-tdx-layer-status.rst index 3e8efde3e3f3..6e3f71ab6b59 100644 --- a/Documentation/virt/kvm/intel-tdx-layer-status.rst +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -10,6 +10,7 @@ What qemu can do ---------------- - TDX VM TYPE is exposed to Qemu. - Qemu can create/destroy guest of TDX vm type. +- Qemu can create/destroy vcpu of TDX vm type. 
=20 Patch Layer status ------------------ @@ -17,13 +18,13 @@ * TDX, VMX coexistence: Applied * TDX architectural definitions: Applied * TD VM creation/destruction: Applied -* TD vcpu creation/destruction: Applying +* TD vcpu creation/destruction: Applied * TDX EPT violation: Not yet * TD finalization: Not yet * TD vcpu enter/exit: Not yet * TD vcpu interrupts/exit/hypercall: Not yet =20 -* KVM MMU GPA shared bits: Not yet +* KVM MMU GPA shared bits: Applying * KVM TDP refactoring for TDX: Not yet * KVM TDP MMU hooks: Not yet * KVM TDP MMU MapGPA: Not yet --=20 2.25.1

From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar
Subject: [RFC PATCH v6 033/104] KVM: x86/mmu: introduce config for PRIVATE KVM MMU
Date: Thu, 5 May 2022 11:14:27 -0700
Message-Id: <7d91cfb0b37b18a072043929c4c6dae7c78cd85d.1651774250.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

To keep the non-TDX case intact, introduce a new config option for private KVM MMU support. At the moment, it is a synonym for CONFIG_INTEL_TDX_HOST && CONFIG_KVM_INTEL. The new flag makes it clear that the config is only for the x86 KVM MMU.
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/Kconfig | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index e3cbd7706136..5a59abc83179 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -129,4 +129,8 @@ config KVM_XEN config KVM_EXTERNAL_WRITE_TRACKING bool =20 +config KVM_MMU_PRIVATE + def_bool y + depends on INTEL_TDX_HOST && KVM_INTEL + endif # VIRTUALIZATION --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 56914C433EF for ; Thu, 5 May 2022 18:23:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1356593AbiEES1d (ORCPT ); Thu, 5 May 2022 14:27:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36456 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383196AbiEESTl (ORCPT ); Thu, 5 May 2022 14:19:41 -0400 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 93D5338DA9; Thu, 5 May 2022 11:15:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774552; x=1683310552; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=h813y3dEF676rU6qIUTQNIYMn233TjmsldMLS/ofG4A=; b=NhjCzpdOuT8ommU+JsCv0cUJBr4IbzsiAuYbKrjNaOKa9nuoi0XKtyeO HZLY3K/aY/9WdBIFf9vCgRHxseu5ChMqZvg3xw68Mcbt9LZY4ytoXwh2Z 4drzjvM0R5evYp6Cvc0JvGyJDmklnDYdaS7/CqFD6B41E0rbY4mdQZ8iY jM6GnybT7FfMaixizDSXTOnO1kxxCjYKsbCaRGNsoRKqfG3jvq4KpDzMN 5ZpPWJxInorBbk/hl2RnrTR3horq4eXjMEcTPSao6evHEdZR8/oUS+Rr3 6SSNigObRkV+eiOFGvk9aCMvqwWFAk1Cu+RRAsIsGGLmABsEC0/aJRpI/ w==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248742015" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248742015" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:45 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083250" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:44 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 034/104] KVM: x86/mmu: Add address conversion functions for TDX shared bits Date: Thu, 5 May 2022 11:14:28 -0700 Message-Id: <38c30f2c5ad6f9ca018c3e990f244c9b67ef10cb.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Rick Edgecombe TDX repurposes one GPA bits (51 bit or 47 bit based on configuration) to indicate the GPA is private(if cleared) or shared (if set) with VMM. If GPA.shared is set, GPA is converted existing conventional EPT pointed by EPTP. If GPA.shared bit is cleared, GPA is converted by Secure-EPT(S-EPT) TDX module manages. VMM has to issue SEAM call to TDX module to operate on S-EPT. e.g. 
populating/zapping guest pages or S-EPT page-table pages: TDH.PAGE.{ADD, REMOVE} for guest pages, TDH.PAGE.SEPT.{ADD, REMOVE} for S-EPT pages, and so on. Several hooks need to be added to KVM MMU to support TDX. Add a function to check if KVM MMU is running for TDX and several functions for address conversion between private-GPA and shared-GPA. Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/mmu.h | 32 ++++++++++++++++++++++++++++++++ arch/x86/kvm/mmu/mmu.c | 6 ++++-- 3 files changed, 38 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 60a97ae55972..88fd3fd3e1a0 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1251,7 +1251,9 @@ struct kvm_arch { */ u32 max_vcpu_ids; =20 +#ifdef CONFIG_KVM_MMU_PRIVATE gfn_t gfn_shared_mask; +#endif }; =20 struct kvm_vm_stat { diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 7e258cc94152..3647035a147e 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -373,4 +373,36 @@ static inline gpa_t kvm_translate_gpa(struct kvm_vcpu *vcpu, return gpa; return translate_nested_gpa(vcpu, gpa, access, exception); } + +static inline gfn_t kvm_gfn_shared_mask(const struct kvm *kvm) +{ +#ifdef CONFIG_KVM_MMU_PRIVATE + return kvm->arch.gfn_shared_mask; +#else + return 0; +#endif +} + +static inline gfn_t kvm_gfn_shared(const struct kvm *kvm, gfn_t gfn) +{ + return gfn | kvm_gfn_shared_mask(kvm); +} + +static inline gfn_t kvm_gfn_private(const struct kvm *kvm, gfn_t gfn) +{ + return gfn & ~kvm_gfn_shared_mask(kvm); +} + +static inline gpa_t kvm_gpa_private(const struct kvm *kvm, gpa_t gpa) +{ + return gpa & ~gfn_to_gpa(kvm_gfn_shared_mask(kvm)); +} + +static inline bool kvm_is_private_gpa(const struct kvm *kvm, gpa_t gpa) +{ + gfn_t mask =3D kvm_gfn_shared_mask(kvm); + + return mask && !(gpa_to_gfn(gpa) & mask); +} + #endif diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 909372762363..d1c37295bb6e 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -264,8 +264,10 @@ static void kvm_flush_remote_tlbs_with_range(struct kvm *kvm, { int ret =3D -ENOTSUPP; =20 - if (range && kvm_x86_ops.tlb_remote_flush_with_range) + if (range && kvm_available_flush_tlb_with_range()) { + /* Callback should flush both private GFN and shared GFN.
*/ ret =3D static_call(kvm_x86_tlb_remote_flush_with_range)(kvm, range); + } =20 if (ret) kvm_flush_remote_tlbs(kvm); @@ -4048,7 +4050,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, s= truct kvm_page_fault *fault unsigned long mmu_seq; int r; =20 - fault->gfn =3D fault->addr >> PAGE_SHIFT; + fault->gfn =3D gpa_to_gfn(fault->addr) & ~kvm_gfn_shared_mask(vcpu->kvm); fault->slot =3D kvm_vcpu_gfn_to_memslot(vcpu, fault->gfn); =20 if (page_fault_handle_page_track(vcpu, fault)) --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 85DE1C433EF for ; Thu, 5 May 2022 18:23:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383530AbiEES11 (ORCPT ); Thu, 5 May 2022 14:27:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36390 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383110AbiEESTl (ORCPT ); Thu, 5 May 2022 14:19:41 -0400 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4A3F341607; Thu, 5 May 2022 11:15:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774553; x=1683310553; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=rWTnsxMrIzKafHc5a68BzRinBeuJZkWqHiP4eLXCUc4=; b=SvIrh+EeILyJsq2PyiGCb4iInmHSp6/fpeEVXARNUKnt0p6CK69l0+A6 Tr1yVns8lqnnQZM36TdvkLSNl4B0Ne5N+UyuFcVOZUw3SRByyiX5WADPF j6eBLLgB1roVVi1tGK+j/9vwWh/lGxDKzc5HW1ugtpUHATP1UrqlIxLg0 bh9S+E3sMCIi6vijhSMIbDb9kLWdRcB58MF66BEiG3K49iztzKjQbypCR S8qawUy8bdejii7Kek/dFIr9liQ7D5qsT2ETp+VROjwP4oaCW0JK6fUHz 9+neO9Gpyj2HKUSKLw+Fdy20WghwfIlMGDOX3TzxWNbxesc/IA8u+F84C w==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248742017" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248742017" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:45 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083254" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:44 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 035/104] [MARKER] The start of TDX KVM patch series: KVM TDP refactoring for TDX Date: Thu, 5 May 2022 11:14:29 -0700 Message-Id: <8fae2816de915ca61f2f35fc8d120f23bf0fcd48.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This empty commit is to mark the start of patch series of KVM TDP refactoring for TDX. 
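As a concrete illustration of the shared-bit arithmetic implemented by the conversion helpers in the previous patch, here is a self-contained userspace sketch (not KVM code; the 47-bit shared-bit position and the sample GPA are assumptions for the example):

#include <stdint.h>
#include <stdio.h>

typedef uint64_t gfn_t;
typedef uint64_t gpa_t;

#define PAGE_SHIFT 12

/* Mirrors kvm_gfn_shared_mask(); assume the shared bit is GPA bit 47. */
static const gfn_t gfn_shared_mask =3D 1ULL << (47 - PAGE_SHIFT);

static gfn_t gpa_to_gfn(gpa_t gpa) { return gpa >> PAGE_SHIFT; }
static gpa_t gfn_to_gpa(gfn_t gfn) { return gfn << PAGE_SHIFT; }

/* Set the shared bit: the GPA alias mapped by the conventional EPT. */
static gfn_t gfn_shared(gfn_t gfn) { return gfn | gfn_shared_mask; }

/* Clear the shared bit: the GPA alias mapped by the Secure-EPT. */
static gfn_t gfn_private(gfn_t gfn) { return gfn & ~gfn_shared_mask; }

/* Private iff a shared mask exists and the shared bit is clear. */
static int is_private_gpa(gpa_t gpa)
{
        return gfn_shared_mask && !(gpa_to_gfn(gpa) & gfn_shared_mask);
}

int main(void)
{
        gpa_t gpa =3D 0x1234000;        /* sample GPA, shared bit clear */

        printf("private? %d\n", is_private_gpa(gpa));   /* prints 1 */
        printf("shared alias: 0x%llx\n", (unsigned long long)
               gfn_to_gpa(gfn_shared(gpa_to_gfn(gpa))));
        /* prints 0x800001234000: same page, shared bit (bit 47) set */
        return 0;
}

Only the mask constant would change for a configuration that places the shared bit at GPA bit 51.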
Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/intel-tdx-layer-status.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentat= ion/virt/kvm/intel-tdx-layer-status.rst index 6e3f71ab6b59..df003d2ed89e 100644 --- a/Documentation/virt/kvm/intel-tdx-layer-status.rst +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -24,7 +24,7 @@ Patch Layer status * TD vcpu enter/exit: Not yet * TD vcpu interrupts/exit/hypercall: Not yet =20 -* KVM MMU GPA shared bits: Applying -* KVM TDP refactoring for TDX: Not yet +* KVM MMU GPA shared bits: Applied +* KVM TDP refactoring for TDX: Applying * KVM TDP MMU hooks: Not yet * KVM TDP MMU MapGPA: Not yet --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 87D8BC433EF for ; Thu, 5 May 2022 18:23:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235548AbiEES0u (ORCPT ); Thu, 5 May 2022 14:26:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36890 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383211AbiEESTl (ORCPT ); Thu, 5 May 2022 14:19:41 -0400 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4A57A41FAE; Thu, 5 May 2022 11:15:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774553; x=1683310553; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=c4l/dtPeYjGdPkK10ruYt5hIF0woALL4dwVeXy0KjaY=; b=hRutbVcxKGYmQS4BmALpfPmhHvRcKPq0zrEaKPyhMl+RvrwdYqDcqYZu zrTYfrKtVot5MvrCfUhq5Tg8qcNWfaIbo+WteMU0lyxuV/kCPZ96VWIYZ EKiGDSkgeEdlUuirGXzlwvF52FkaFt8RKtIuSCO0BOS2gZwdy/Mahw4sa UrYqOxIqySq05pD6fQW6FfR/+Y32HXfJeArYIqHd1N3004YrZKqpW9Ean P4UXI5EDVE+JAm2KPjFctthkAw8p4pOMemcHATPrQ/gq9QEyC9MO4NtV8 GuD/Fqo4oKKvcU95+Ob1VrpDkFMCHgM+BP9di/d8lThgwqosfgE1p36w7 Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248742018" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248742018" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:45 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083257" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:45 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 036/104] KVM: x86/mmu: Explicitly check for MMIO spte in fast page fault Date: Thu, 5 May 2022 11:14:30 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Explicitly check for an MMIO spte in the fast page fault flow. 
TDX will use a not-present entry for MMIO sptes, which can be mistaken for an access-tracked spte since both have SPTE_SPECIAL_MASK set. MMIO sptes are handled in handle_mmio_page_fault for non-TDX VMs, so this patch does not affect them. TDX will handle MMIO emulation through a hypercall instead. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu/mmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index d1c37295bb6e..4a12d862bbb6 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3184,7 +3184,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, str= uct kvm_page_fault *fault) else sptep =3D fast_pf_get_last_sptep(vcpu, fault->addr, &spte); =20 - if (!is_shadow_present_pte(spte)) + if (!is_shadow_present_pte(spte) || is_mmio_spte(spte)) break; =20 sp =3D sptep_to_sp(sptep); --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E2CCAC433F5 for ; Thu, 5 May 2022 18:23:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242982AbiEES05 (ORCPT ); Thu, 5 May 2022 14:26:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36396 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383212AbiEESTl (ORCPT ); Thu, 5 May 2022 14:19:41 -0400 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0F3B647AE1; Thu, 5 May 2022 11:15:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774554; x=1683310554; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=1jkcBKQleZuP+2LYbN3mGxRTD6jMnVu63Fc7/MnhhWw=; b=XW73NtPJFFJ4Kx+rg9y9JJrhAA0+h2NrhxfcRBaW1iwUK3lwvlnZhHbP MOmzYY2/Okpv0rGSBQHh95/1+Wi3GBrPPui9H0s7gwFNm5WRFkTnsz++O Ri9xM1fysVy6XPd7W9xal87fM5T37B/4Ky9FbRbdfGdkCp5bgZMOLpl5y yCuJwo/P5kyw/ZdbE9AZNMfTyBx6KXstfSfoLbqAFB73PFk6WSUC+sV0r ulB5WapgG71cF02FKyT0pEj9egOsE4HOBOsxE4t0tGdAJzyNpl4M7lRfA HJJSEITu+qdqL+3w+dXEaOmnkjgYv63lv0NhsYj+dZH3GSr5ze4hrMaZb A==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248742019" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248742019" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:45 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083261" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:45 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 037/104] KVM: x86/mmu: Allow non-zero value for non-present SPTE Date: Thu, 5 May 2022 11:14:31 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson TDX introduced a new ETP, 
Secure-EPT, in addition to the existing EPT. Secure-EPT maps protected guest memory, which is called private. Since Secure-EPT page tables are themselves protected, those page tables are also called private. The existing EPT is often called shared EPT to distinguish it from Secure-EPT, and the page tables for the shared EPT are likewise called shared. Virtualization Exception, #VE, is a new processor exception in VMX non-root operation. In certain virtualization-related conditions, #VE is injected into the guest instead of exiting from the guest to the VMM, so that the guest is given a chance to inspect the event. One important case is EPT violation: when the "EPT-violation #VE" VM-execution control is set and the "suppress #VE" bit in the EPT entry is cleared, #VE is injected instead of an EPT violation being taken. Because guest memory is protected with TDX, the VMM can't parse instructions in the guest memory. Instead, an MMIO hypercall is used for the guest to pass the necessary information to the VMM. To make unmodified device drivers work, a guest TD expects #VE on accesses to shared GPAs. Once the VMM enables #VE on an EPT entry by clearing its "suppress #VE" bit, the guest #VE handler converts the MMIO access into an MMIO hypercall. Before enabling #VE, the VMM needs to figure out, from an EPT violation, that the given GPA is for MMIO. So the execution flow looks like: - Allocate an unused shared EPT entry with the suppress #VE bit set. - An EPT violation occurs on that GPA. - The VMM figures out the faulted GPA is for MMIO. - The VMM clears the suppress #VE bit. - The guest TD gets #VE and converts the MMIO access into an MMIO hypercall. - If the GPA maps guest memory, the VMM resolves it with guest pages. For both cases, the SPTE needs the "suppress #VE" bit set initially when it is allocated or zapped; therefore a non-zero non-present value for the SPTE needs to be allowed. This change requires updating FNAME(sync_page) for shadow EPT. "if (!sp->spt[i])" in FNAME(sync_page) meant that the SPTE entry held the initial value. With the introduction of shadow_nonpresent_value, which can be non-zero, that no longer holds. Replace the zero check with "!is_shadow_present_pte() && !is_mmio_spte()". If the zero check were kept and an entry held shadow_nonpresent_value, the entry would be wrongly synchronized from non-present to non-present with the pfn (wrongly) treated as changed, rmap removal would be wrongly attempted, and a BUG_ON() would be hit. The TDP MMU uses REMOVED_SPTE =3D 0x5a0ULL as a special constant marking the intermediate value that indicates one thread is operating on the SPTE; the value only needs to be semi-arbitrary. For TDX (more precisely, to use #VE), the value should also include the suppress #VE bit, i.e. SHADOW_NONPRESENT_VALUE. Rename REMOVED_SPTE to __REMOVED_SPTE and define REMOVED_SPTE as SHADOW_NONPRESENT_VALUE | __REMOVED_SPTE to set the suppress #VE bit.
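To make the bit layout concrete before the diff, here is a self-contained userspace sketch of the constants and the invariants being asserted (SPTE_MMU_PRESENT_MASK at bit 11 follows the existing spte.h comment; the asserts mirror the static_asserts in the hunks below):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define BIT_ULL(n)              (1ULL << (n))
#define SPTE_MMU_PRESENT_MASK   BIT_ULL(11)     /* software "MMU present" bit */

/* Suppress-#VE bit: EPT bit 63. SVM NPT ignores it for non-present PTEs. */
#define SHADOW_NONPRESENT_VALUE BIT_ULL(63)

/* Semi-arbitrary intermediate value used by the TDP MMU. */
#define __REMOVED_SPTE          0x5a0ULL
#define REMOVED_SPTE            (SHADOW_NONPRESENT_VALUE | __REMOVED_SPTE)

int main(void)
{
        /* Neither piece may look MMU-present, and the pieces must not overlap. */
        assert(!(__REMOVED_SPTE & SPTE_MMU_PRESENT_MASK));
        assert(!(__REMOVED_SPTE & SHADOW_NONPRESENT_VALUE));
        assert(!(REMOVED_SPTE & SPTE_MMU_PRESENT_MASK));
        printf("REMOVED_SPTE: 0x%llx\n", (unsigned long long)REMOVED_SPTE);
        /* prints 0x80000000000005a0: suppress #VE (bit 63) plus 0x5a0 */
        return 0;
}

Every TDP MMU zap path must then write SHADOW_NONPRESENT_VALUE instead of 0, which is exactly what the tdp_mmu.c hunks below do.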
Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu/mmu.c | 55 ++++++++++++++++++++++++++++++---- arch/x86/kvm/mmu/paging_tmpl.h | 3 +- arch/x86/kvm/mmu/spte.c | 5 +++- arch/x86/kvm/mmu/spte.h | 37 ++++++++++++++++++++--- arch/x86/kvm/mmu/tdp_mmu.c | 23 +++++++++----- 5 files changed, 105 insertions(+), 18 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 4a12d862bbb6..324ea25ee0c7 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -693,6 +693,44 @@ static void walk_shadow_page_lockless_end(struct kvm_v= cpu *vcpu) } } =20 +static inline void kvm_init_shadow_page(void *page) +{ +#ifdef CONFIG_X86_64 + int ign; + + WARN_ON_ONCE(shadow_nonpresent_value !=3D SHADOW_NONPRESENT_VALUE); + asm volatile ( + "rep stosq\n\t" + : "=3Dc"(ign), "=3DD"(page) + : "a"(SHADOW_NONPRESENT_VALUE), "c"(4096/8), "D"(page) + : "memory" + ); +#else + BUG(); +#endif +} + +static int mmu_topup_shadow_page_cache(struct kvm_vcpu *vcpu) +{ + struct kvm_mmu_memory_cache *mc =3D &vcpu->arch.mmu_shadow_page_cache; + int start, end, i, r; + bool is_tdp_mmu =3D is_tdp_mmu_enabled(vcpu->kvm); + + if (is_tdp_mmu && shadow_nonpresent_value) + start =3D kvm_mmu_memory_cache_nr_free_objects(mc); + + r =3D kvm_mmu_topup_memory_cache(mc, PT64_ROOT_MAX_LEVEL); + if (r) + return r; + + if (is_tdp_mmu && shadow_nonpresent_value) { + end =3D kvm_mmu_memory_cache_nr_free_objects(mc); + for (i =3D start; i < end; i++) + kvm_init_shadow_page(mc->objects[i]); + } + return 0; +} + static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indir= ect) { int r; @@ -702,8 +740,7 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcp= u, bool maybe_indirect) 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM); if (r) return r; - r =3D kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache, - PT64_ROOT_MAX_LEVEL); + r =3D mmu_topup_shadow_page_cache(vcpu); if (r) return r; if (maybe_indirect) { @@ -5510,9 +5547,16 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forc= ed_root_level, * what is used by the kernel for any given HVA, i.e. the kernel's * capabilities are ultimately consulted by kvm_mmu_hugepage_adjust(). */ - if (tdp_enabled) + if (tdp_enabled) { + /* + * For TDP MMU, always set bit 63 for TDX support. See the + * comment on SHADOW_NONPRESENT_VALUE. 
+ */ +#ifdef CONFIG_X86_64 + shadow_nonpresent_value =3D SHADOW_NONPRESENT_VALUE; +#endif max_huge_page_level =3D tdp_huge_page_level; - else if (boot_cpu_has(X86_FEATURE_GBPAGES)) + } else if (boot_cpu_has(X86_FEATURE_GBPAGES)) max_huge_page_level =3D PG_LEVEL_1G; else max_huge_page_level =3D PG_LEVEL_2M; @@ -5643,7 +5687,8 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu) vcpu->arch.mmu_page_header_cache.kmem_cache =3D mmu_page_header_cache; vcpu->arch.mmu_page_header_cache.gfp_zero =3D __GFP_ZERO; =20 - vcpu->arch.mmu_shadow_page_cache.gfp_zero =3D __GFP_ZERO; + if (!(is_tdp_mmu_enabled(vcpu->kvm) && shadow_nonpresent_value)) + vcpu->arch.mmu_shadow_page_cache.gfp_zero =3D __GFP_ZERO; =20 vcpu->arch.mmu =3D &vcpu->arch.root_mmu; vcpu->arch.walk_mmu =3D &vcpu->arch.root_mmu; diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index b025decf610d..058efd4bbcbc 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -1030,7 +1030,8 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, st= ruct kvm_mmu_page *sp) gpa_t pte_gpa; gfn_t gfn; =20 - if (!sp->spt[i]) + if (!is_shadow_present_pte(sp->spt[i]) && + !is_mmio_spte(sp->spt[i])) continue; =20 pte_gpa =3D first_pte_gpa + i * sizeof(pt_element_t); diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 75c9e87d446a..1bf934f64b6f 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -36,6 +36,9 @@ u64 __read_mostly shadow_present_mask; u64 __read_mostly shadow_me_value; u64 __read_mostly shadow_me_mask; u64 __read_mostly shadow_acc_track_mask; +#ifdef CONFIG_X86_64 +u64 __read_mostly shadow_nonpresent_value; +#endif =20 u64 __read_mostly shadow_nonpresent_or_rsvd_mask; u64 __read_mostly shadow_nonpresent_or_rsvd_lower_gfn_mask; @@ -330,7 +333,7 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmi= o_mask, u64 access_mask) * not set any RWX bits. */ if (WARN_ON((mmio_value & mmio_mask) !=3D mmio_value) || - WARN_ON(mmio_value && (REMOVED_SPTE & mmio_mask) =3D=3D mmio_value)) + WARN_ON(mmio_value && (__REMOVED_SPTE & mmio_mask) =3D=3D mmio_value)) mmio_value =3D 0; =20 if (!mmio_value) diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index fbbab180395e..3319ca7f8f48 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -140,6 +140,19 @@ static_assert(MMIO_SPTE_GEN_LOW_BITS =3D=3D 8 && MMIO_= SPTE_GEN_HIGH_BITS =3D=3D 11); =20 #define MMIO_SPTE_GEN_MASK GENMASK_ULL(MMIO_SPTE_GEN_LOW_BITS + MMIO_SPTE= _GEN_HIGH_BITS - 1, 0) =20 +/* + * non-present SPTE value for both VMX and SVM for TDP MMU. + * For SVM NPT, for non-present spte (bit 0 =3D 0), other bits are ignored. + * For VMX EPT, bit 63 is ignored if #VE is disabled. + * bit 63 is #VE suppress if #VE is enabled. 
+ */ +#ifdef CONFIG_X86_64 +#define SHADOW_NONPRESENT_VALUE BIT_ULL(63) +static_assert(!(SHADOW_NONPRESENT_VALUE & SPTE_MMU_PRESENT_MASK)); +#else +#define SHADOW_NONPRESENT_VALUE 0ULL +#endif + extern u64 __read_mostly shadow_host_writable_mask; extern u64 __read_mostly shadow_mmu_writable_mask; extern u64 __read_mostly shadow_nx_mask; @@ -154,6 +167,12 @@ extern u64 __read_mostly shadow_present_mask; extern u64 __read_mostly shadow_me_value; extern u64 __read_mostly shadow_me_mask; =20 +#ifdef CONFIG_X86_64 +extern u64 __read_mostly shadow_nonpresent_value; +#else +#define shadow_nonpresent_value 0ULL +#endif + /* * SPTEs in MMUs without A/D bits are marked with SPTE_TDP_AD_DISABLED_MAS= K; * shadow_acc_track_mask is the set of bits to be cleared in non-accessed @@ -174,9 +193,12 @@ extern u64 __read_mostly shadow_nonpresent_or_rsvd_mas= k; =20 /* * If a thread running without exclusive control of the MMU lock must perf= orm a - * multi-part operation on an SPTE, it can set the SPTE to REMOVED_SPTE as= a + * multi-part operation on an SPTE, it can set the SPTE to __REMOVED_SPTE = as a * non-present intermediate value. Other threads which encounter this value - * should not modify the SPTE. + * should not modify the SPTE. For the case that TDX is enabled, + * SHADOW_NONPRESENT_VALUE, which is "suppress #VE" bit set because TDX mo= dule + * always enables "EPT violation #VE". The bit is ignored by non-TDX case= as + * present bit (bit 0) is cleared. * * Use a semi-arbitrary value that doesn't set RWX bits, i.e. is not-prese= nt on * bot AMD and Intel CPUs, and doesn't set PFN bits, i.e. doesn't create a= L1TF @@ -184,10 +206,17 @@ extern u64 __read_mostly shadow_nonpresent_or_rsvd_ma= sk; * * Only used by the TDP MMU. */ -#define REMOVED_SPTE 0x5a0ULL +#define __REMOVED_SPTE 0x5a0ULL =20 /* Removed SPTEs must not be misconstrued as shadow present PTEs. */ -static_assert(!(REMOVED_SPTE & SPTE_MMU_PRESENT_MASK)); +static_assert(!(__REMOVED_SPTE & SPTE_MMU_PRESENT_MASK)); +static_assert(!(__REMOVED_SPTE & SHADOW_NONPRESENT_VALUE)); + +/* + * See above comment around __REMOVED_SPTE. REMOVED_SPTE is the actual + * intermediate value set to the removed SPET. it sets the "suppress #VE"= bit. + */ +#define REMOVED_SPTE (SHADOW_NONPRESENT_VALUE | __REMOVED_SPTE) =20 static inline bool is_removed_spte(u64 spte) { diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 4fabb2cd0ba9..383904742f44 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -673,8 +673,16 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *= kvm, * special removed SPTE value. No bookkeeping is needed * here since the SPTE is going from non-present * to non-present. + * + * Set non-present value to SHADOW_NONPRESENT_VALUE, rather than 0. + * It is because when TDX is enabled, TDX module always + * enables "EPT-violation #VE", so KVM needs to set + * "suppress #VE" bit in EPT table entries, in order to get + * real EPT violation, rather than TDVMCALL. KVM sets + * SHADOW_NONPRESENT_VALUE (which sets "suppress #VE" bit) so it + * can be set when EPT table entries are zapped. 
*/ - kvm_tdp_mmu_write_spte(iter->sptep, 0); + kvm_tdp_mmu_write_spte(iter->sptep, SHADOW_NONPRESENT_VALUE); =20 return 0; } @@ -846,8 +854,8 @@ static void __tdp_mmu_zap_root(struct kvm *kvm, struct = kvm_mmu_page *root, continue; =20 if (!shared) - tdp_mmu_set_spte(kvm, &iter, 0); - else if (tdp_mmu_set_spte_atomic(kvm, &iter, 0)) + tdp_mmu_set_spte(kvm, &iter, SHADOW_NONPRESENT_VALUE); + else if (tdp_mmu_set_spte_atomic(kvm, &iter, SHADOW_NONPRESENT_VALUE)) goto retry; } } @@ -903,8 +911,9 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu= _page *sp) if (WARN_ON_ONCE(!is_shadow_present_pte(old_spte))) return false; =20 - __tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, 0, - sp->gfn, sp->role.level + 1, true, true); + __tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, + SHADOW_NONPRESENT_VALUE, sp->gfn, sp->role.level + 1, + true, true); =20 return true; } @@ -941,7 +950,7 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct k= vm_mmu_page *root, !is_last_spte(iter.old_spte, iter.level)) continue; =20 - tdp_mmu_set_spte(kvm, &iter, 0); + tdp_mmu_set_spte(kvm, &iter, SHADOW_NONPRESENT_VALUE); flush =3D true; } =20 @@ -1312,7 +1321,7 @@ static bool set_spte_gfn(struct kvm *kvm, struct tdp_= iter *iter, * invariant that the PFN of a present * leaf SPTE can never change. * See __handle_changed_spte(). */ - tdp_mmu_set_spte(kvm, iter, 0); + tdp_mmu_set_spte(kvm, iter, SHADOW_NONPRESENT_VALUE); =20 if (!pte_write(range->pte)) { new_spte =3D kvm_mmu_changed_pte_notifier_make_spte(iter->old_spte, --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 86971C433FE for ; Thu, 5 May 2022 18:22:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236729AbiEES0L (ORCPT ); Thu, 5 May 2022 14:26:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36922 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383244AbiEESTm (ORCPT ); Thu, 5 May 2022 14:19:42 -0400 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 393F65DA14; Thu, 5 May 2022 11:15:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774555; x=1683310555; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=enMxtBvaytMuE9UKh3G4V5V6Q6g5MjQ1J0dl7+LzurY=; b=Y3NSryAsTZnAqu0dydiAwBtioIANMjqOBErrOHXti7elZ1nQFq0caxmj BhXCGFH9W8ybT+gZfs8wFu1IQlMOlD1D0/qx4A11dmCn5qYjJzLx06iRL lmox3jKOxiQzEa2mFJqOlGTLX2CwIGM8lL9KwrpDIQmbm8fKlHxJ7fxq1 MYSUCP6q2rUS/zo6jrqzObr7oTRwCmPkxwxP0p4I6n2xQo/rjBXPKtWci EYfX/KRC7xDiG2i/tFLpwnJsJYQPrYwYvaK+l5/d6cl1/fXHBZTf6EQ45 fFkKt6jfbDjku4RXHmLyYND6OeZiwbIp6sXN6+T9yXBfgy0rlPFj6EOef Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248742020" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248742020" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:45 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083264" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 
2022 11:15:45 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 038/104] KVM: x86/mmu: Track shadow MMIO value/mask on a per-VM basis Date: Thu, 5 May 2022 11:14:32 -0700 Message-Id: <879bb2e4e5c64e6000b68b46c7522bfbb8c875d8.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson TDX will use a different shadow PTE entry value for MMIO than VMX does. Add members to kvm_arch and track the MMIO value/mask per-VM instead of in global variables. By using a per-VM EPT entry value for MMIO, the existing VMX logic keeps working. In the VMX VM case, the EPT entry for MMIO is either a non-present PTE (present bit cleared) without a backing guest physical address (on an EPT violation, KVM searches for backing guest memory and finds there is no backing guest page), or the value that triggers an EPT misconfiguration. Once MMIO is triggered on an EPT entry, the entry is updated to trigger EPT misconfiguration for future MMIO on the same GPA, which lets KVM recognize the access as MMIO without searching for backing guest pages. KVM then parses the guest instruction to figure out the address/value/width of the MMIO. In the case of a guest TD, guest memory is protected, so the VMM can't parse guest instructions to learn the value and access width of the MMIO. Instead, the VMM sets up the (shared) EPT to trigger #VE by clearing the suppress-#VE bit. When the guest TD issues MMIO, #VE is injected. The guest #VE handler converts the MMIO access into an MMIO hypercall that passes the address/value/width of the MMIO to the VMM (or the guest directly paravirtualizes the MMIO into a hypercall). The VMM can then handle the MMIO hypercall without parsing guest instructions.
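As a rough, self-contained userspace sketch of the per-VM classification this patch switches to (the struct stands in for the new kvm_arch fields, the kvm_gfn_shared_mask() special case of the real is_mmio_spte() is omitted, and the WX constants follow the EPT-misconfiguration comment moved into vmx_vm_init() below):

#include <stdint.h>
#include <stdio.h>

typedef uint64_t u64;

/* Stand-in for the new per-VM fields added to struct kvm_arch. */
struct vm_mmio {
        int enable_mmio_caching;
        u64 shadow_mmio_value;
        u64 shadow_mmio_mask;
};

/* Bits 2:0 set to 110b (write/execute, no read) cause EPT misconfiguration. */
#define VMX_EPT_RWX_MASK            0x7ULL
#define VMX_EPT_MISCONFIG_WX_VALUE  0x6ULL

/* Simplified is_mmio_spte(): match against the VM's own value/mask. */
static int is_mmio_spte(const struct vm_mmio *vm, u64 spte)
{
        return vm->enable_mmio_caching &&
               (spte & vm->shadow_mmio_mask) =3D=3D vm->shadow_mmio_value;
}

int main(void)
{
        /* A VMX VM configured as vmx_vm_init() does in this patch. */
        struct vm_mmio vmx =3D {
                .enable_mmio_caching =3D 1,
                .shadow_mmio_value   =3D VMX_EPT_MISCONFIG_WX_VALUE,
                .shadow_mmio_mask    =3D VMX_EPT_RWX_MASK,
        };
        u64 spte =3D VMX_EPT_MISCONFIG_WX_VALUE | (0x1234ULL << 12);

        printf("MMIO spte? %d\n", is_mmio_spte(&vmx, spte));    /* prints 1 */
        return 0;
}

Tracking the value/mask per-VM is what later lets a TD use a non-present, suppress-#VE SPTE for MMIO while ordinary VMX VMs keep the write/execute misconfiguration trick.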
Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm_host.h | 4 ++++ arch/x86/include/asm/vmx.h | 1 + arch/x86/kvm/mmu.h | 4 +++- arch/x86/kvm/mmu/mmu.c | 20 ++++++++++++---- arch/x86/kvm/mmu/paging_tmpl.h | 2 +- arch/x86/kvm/mmu/spte.c | 41 +++++++++++++++------------------ arch/x86/kvm/mmu/spte.h | 11 ++++----- arch/x86/kvm/mmu/tdp_mmu.c | 6 ++--- arch/x86/kvm/svm/svm.c | 2 +- arch/x86/kvm/vmx/vmx.c | 8 +++++++ 10 files changed, 59 insertions(+), 40 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 88fd3fd3e1a0..c9c113316fd3 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1078,6 +1078,10 @@ struct kvm_arch { */ spinlock_t mmu_unsync_pages_lock; =20 + bool enable_mmio_caching; + u64 shadow_mmio_value; + u64 shadow_mmio_mask; + struct list_head assigned_dev_head; struct iommu_domain *iommu_domain; bool iommu_noncoherent; diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 89d2172787c5..9682f5a02da8 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -509,6 +509,7 @@ enum vmcs_field { #define VMX_EPT_IPAT_BIT (1ull << 6) #define VMX_EPT_ACCESS_BIT (1ull << 8) #define VMX_EPT_DIRTY_BIT (1ull << 9) +#define VMX_EPT_SUPPRESS_VE_BIT (1ull << 63) #define VMX_EPT_RWX_MASK (VMX_EPT_READABLE_MASK | = \ VMX_EPT_WRITABLE_MASK | \ VMX_EPT_EXECUTABLE_MASK) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 3647035a147e..eecb5e27b6a5 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -108,7 +108,9 @@ static inline u8 kvm_get_shadow_phys_bits(void) return boot_cpu_data.x86_phys_bits; } =20 -void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_= mask); +void kvm_mmu_set_mmio_spte_mask(struct kvm *kvm, u64 mmio_value, u64 mmio_= mask, + u64 access_mask); +void kvm_mmu_set_default_mmio_spte_mask(u64 mask); void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask); void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only); =20 diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 324ea25ee0c7..f4758b1b5202 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -2310,7 +2310,7 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct k= vm_mmu_page *sp, return kvm_mmu_prepare_zap_page(kvm, child, invalid_list); } - } else if (is_mmio_spte(pte)) { + } else if (is_mmio_spte(kvm, pte)) { mmu_spte_clear_no_track(spte); } return 0; @@ -3092,8 +3092,13 @@ static bool handle_abnormal_pfn(struct kvm_vcpu *vcp= u, struct kvm_page_fault *fa * by L0 userspace (you can observe gfn > L1.MAXPHYADDR if * and only if L1's MAXPHYADDR is inaccurate with respect to * the hardware's). + * + * Excludes the INTEL TD guest. Because TD memory is + * protected, the instruction can't be emulated. Instead, use + * SPTE value without #VE suppress bit cleared + * (kvm->arch.shadow_mmio_value =3D 0). 
*/ - if (unlikely(!enable_mmio_caching) || + if (unlikely(!vcpu->kvm->arch.enable_mmio_caching) || unlikely(fault->gfn > kvm_mmu_max_gfn())) { *ret_val =3D RET_PF_EMULATE; return true; @@ -3221,7 +3226,8 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, str= uct kvm_page_fault *fault) else sptep =3D fast_pf_get_last_sptep(vcpu, fault->addr, &spte); =20 - if (!is_shadow_present_pte(spte) || is_mmio_spte(spte)) + if (!is_shadow_present_pte(spte) || + is_mmio_spte(vcpu->kvm, spte)) break; =20 sp =3D sptep_to_sp(sptep); @@ -3914,7 +3920,7 @@ static int handle_mmio_page_fault(struct kvm_vcpu *vc= pu, u64 addr, bool direct) if (WARN_ON(reserved)) return -EINVAL; =20 - if (is_mmio_spte(spte)) { + if (is_mmio_spte(vcpu->kvm, spte)) { gfn_t gfn =3D get_mmio_spte_gfn(spte); unsigned int access =3D get_mmio_spte_access(spte); =20 @@ -4341,7 +4347,7 @@ static unsigned long get_cr3(struct kvm_vcpu *vcpu) static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn, unsigned int access) { - if (unlikely(is_mmio_spte(*sptep))) { + if (unlikely(is_mmio_spte(vcpu->kvm, *sptep))) { if (gfn !=3D get_mmio_spte_gfn(*sptep)) { mmu_spte_clear_no_track(sptep); return true; @@ -5851,6 +5857,10 @@ int kvm_mmu_init_vm(struct kvm *kvm) node->track_write =3D kvm_mmu_pte_write; node->track_flush_slot =3D kvm_mmu_invalidate_zap_pages_in_memslot; kvm_page_track_register_notifier(kvm, node); + kvm_mmu_set_mmio_spte_mask(kvm, shadow_default_mmio_mask, + shadow_default_mmio_mask, + ACC_WRITE_MASK | ACC_USER_MASK); + return 0; } =20 diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 058efd4bbcbc..1850689fa76c 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -1031,7 +1031,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, st= ruct kvm_mmu_page *sp) gfn_t gfn; =20 if (!is_shadow_present_pte(sp->spt[i]) && - !is_mmio_spte(sp->spt[i])) + !is_mmio_spte(vcpu->kvm, sp->spt[i])) continue; =20 pte_gpa =3D first_pte_gpa + i * sizeof(pt_element_t); diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 1bf934f64b6f..dc89eef82951 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -29,8 +29,7 @@ u64 __read_mostly shadow_x_mask; /* mutual exclusive with= nx_mask */ u64 __read_mostly shadow_user_mask; u64 __read_mostly shadow_accessed_mask; u64 __read_mostly shadow_dirty_mask; -u64 __read_mostly shadow_mmio_value; -u64 __read_mostly shadow_mmio_mask; +u64 __read_mostly shadow_default_mmio_mask; u64 __read_mostly shadow_mmio_access_mask; u64 __read_mostly shadow_present_mask; u64 __read_mostly shadow_me_value; @@ -62,10 +61,11 @@ u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsi= gned int access) u64 spte =3D generation_mmio_spte_mask(gen); u64 gpa =3D gfn << PAGE_SHIFT; =20 - WARN_ON_ONCE(!shadow_mmio_value); + WARN_ON_ONCE(!vcpu->kvm->arch.shadow_mmio_value && + !kvm_gfn_shared_mask(vcpu->kvm)); =20 access &=3D shadow_mmio_access_mask; - spte |=3D shadow_mmio_value | access; + spte |=3D vcpu->kvm->arch.shadow_mmio_value | access; spte |=3D gpa | shadow_nonpresent_or_rsvd_mask; spte |=3D (gpa & shadow_nonpresent_or_rsvd_mask) << SHADOW_NONPRESENT_OR_RSVD_MASK_LEN; @@ -307,7 +307,8 @@ u64 mark_spte_for_access_track(u64 spte) return spte; } =20 -void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_= mask) +void kvm_mmu_set_mmio_spte_mask(struct kvm *kvm, u64 mmio_value, u64 mmio_= mask, + u64 access_mask) { BUG_ON((u64)(unsigned)access_mask !=3D access_mask); WARN_ON(mmio_value & 
shadow_nonpresent_or_rsvd_lower_gfn_mask); @@ -336,11 +337,9 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mm= io_mask, u64 access_mask) WARN_ON(mmio_value && (__REMOVED_SPTE & mmio_mask) =3D=3D mmio_value)) mmio_value =3D 0; =20 - if (!mmio_value) - enable_mmio_caching =3D false; - - shadow_mmio_value =3D mmio_value; - shadow_mmio_mask =3D mmio_mask; + kvm->arch.enable_mmio_caching =3D !!mmio_value; + kvm->arch.shadow_mmio_value =3D mmio_value; + kvm->arch.shadow_mmio_mask =3D mmio_mask; shadow_mmio_access_mask =3D access_mask; } EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask); @@ -363,24 +362,18 @@ void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has= _exec_only) shadow_dirty_mask =3D has_ad_bits ? VMX_EPT_DIRTY_BIT : 0ull; shadow_nx_mask =3D 0ull; shadow_x_mask =3D VMX_EPT_EXECUTABLE_MASK; - shadow_present_mask =3D has_exec_only ? 0ull : VMX_EPT_READABLE_MASK; + /* VMX_EPT_SUPPRESS_VE_BIT is needed for W or X violation. */ + shadow_present_mask =3D + (has_exec_only ? 0ull : VMX_EPT_READABLE_MASK) | VMX_EPT_SUPPRESS_VE_BIT; shadow_acc_track_mask =3D VMX_EPT_RWX_MASK; shadow_host_writable_mask =3D EPT_SPTE_HOST_WRITABLE; shadow_mmu_writable_mask =3D EPT_SPTE_MMU_WRITABLE; - - /* - * EPT Misconfigurations are generated if the value of bits 2:0 - * of an EPT paging-structure entry is 110b (write/execute). - */ - kvm_mmu_set_mmio_spte_mask(VMX_EPT_MISCONFIG_WX_VALUE, - VMX_EPT_RWX_MASK, 0); } EXPORT_SYMBOL_GPL(kvm_mmu_set_ept_masks); =20 void kvm_mmu_reset_all_pte_masks(void) { u8 low_phys_bits; - u64 mask; =20 shadow_phys_bits =3D kvm_get_shadow_phys_bits(); =20 @@ -429,9 +422,13 @@ void kvm_mmu_reset_all_pte_masks(void) * PTEs and so the reserved PA approach must be disabled. */ if (shadow_phys_bits < 52) - mask =3D BIT_ULL(51) | PT_PRESENT_MASK; + shadow_default_mmio_mask =3D BIT_ULL(51) | PT_PRESENT_MASK; else - mask =3D 0; + shadow_default_mmio_mask =3D 0; +} =20 - kvm_mmu_set_mmio_spte_mask(mask, mask, ACC_WRITE_MASK | ACC_USER_MASK); +void kvm_mmu_set_default_mmio_spte_mask(u64 mask) +{ + shadow_default_mmio_mask =3D mask; } +EXPORT_SYMBOL_GPL(kvm_mmu_set_default_mmio_spte_mask); diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 3319ca7f8f48..1ac2a7a91166 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -5,8 +5,6 @@ =20 #include "mmu_internal.h" =20 -extern bool __read_mostly enable_mmio_caching; - /* * A MMU present SPTE is backed by actual memory and may or may not be pre= sent * in hardware. E.g. MMIO SPTEs are not considered present. 
Use bit 11, = as it @@ -160,8 +158,7 @@ extern u64 __read_mostly shadow_x_mask; /* mutual exclu= sive with nx_mask */ extern u64 __read_mostly shadow_user_mask; extern u64 __read_mostly shadow_accessed_mask; extern u64 __read_mostly shadow_dirty_mask; -extern u64 __read_mostly shadow_mmio_value; -extern u64 __read_mostly shadow_mmio_mask; +extern u64 __read_mostly shadow_default_mmio_mask; extern u64 __read_mostly shadow_mmio_access_mask; extern u64 __read_mostly shadow_present_mask; extern u64 __read_mostly shadow_me_value; @@ -233,10 +230,10 @@ static inline bool is_removed_spte(u64 spte) */ extern u64 __read_mostly shadow_nonpresent_or_rsvd_lower_gfn_mask; =20 -static inline bool is_mmio_spte(u64 spte) +static inline bool is_mmio_spte(struct kvm *kvm, u64 spte) { - return (spte & shadow_mmio_mask) =3D=3D shadow_mmio_value && - likely(enable_mmio_caching); + return (spte & kvm->arch.shadow_mmio_mask) =3D=3D kvm->arch.shadow_mmio_v= alue && + likely(kvm->arch.enable_mmio_caching || kvm_gfn_shared_mask(kvm)); } =20 static inline bool is_shadow_present_pte(u64 pte) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 383904742f44..f8c1824d85a5 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -549,8 +549,8 @@ static void __handle_changed_spte(struct kvm *kvm, int = as_id, gfn_t gfn, * impact the guest since both the former and current SPTEs * are nonpresent. */ - if (WARN_ON(!is_mmio_spte(old_spte) && - !is_mmio_spte(new_spte) && + if (WARN_ON(!is_mmio_spte(kvm, old_spte) && + !is_mmio_spte(kvm, new_spte) && !is_removed_spte(new_spte))) pr_err("Unexpected SPTE change! Nonpresent SPTEs\n" "should not be replaced with another,\n" @@ -1084,7 +1084,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm= _vcpu *vcpu, } =20 /* If a MMIO SPTE is installed, the MMIO will need to be emulated. */ - if (unlikely(is_mmio_spte(new_spte))) { + if (unlikely(is_mmio_spte(vcpu->kvm, new_spte))) { trace_mark_mmio_spte(rcu_dereference(iter->sptep), iter->gfn, new_spte); ret =3D RET_PF_EMULATE; diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 833eb557dee7..ca2700020322 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4770,7 +4770,7 @@ static __init void svm_adjust_mmio_mask(void) */ mask =3D (mask_bit < 52) ? rsvd_bits(mask_bit, 51) | PT_PRESENT_MASK : 0; =20 - kvm_mmu_set_mmio_spte_mask(mask, mask, PT_WRITABLE_MASK | PT_USER_MASK); + kvm_mmu_set_default_mmio_spte_mask(mask); } =20 static __init void svm_set_cpu_caps(void) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index b5846e0fc78f..d2314389f268 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7181,6 +7181,14 @@ int vmx_vm_init(struct kvm *kvm) if (!ple_gap) kvm->arch.pause_in_guest =3D true; =20 + /* + * EPT Misconfigurations can be generated if the value of bits 2:0 + * of an EPT paging-structure entry is 110b (write/execute). 
+ */ + if (enable_ept) + kvm_mmu_set_mmio_spte_mask(kvm, VMX_EPT_MISCONFIG_WX_VALUE, + VMX_EPT_RWX_MASK, 0); + if (boot_cpu_has(X86_BUG_L1TF) && enable_ept) { switch (l1tf_mitigation) { case L1TF_MITIGATION_OFF: --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2A9D7C433FE for ; Thu, 5 May 2022 18:22:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1352129AbiEES0A (ORCPT ); Thu, 5 May 2022 14:26:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36394 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383240AbiEESTm (ORCPT ); Thu, 5 May 2022 14:19:42 -0400 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5CFFB5DA1E; Thu, 5 May 2022 11:15:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774555; x=1683310555; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=YNoW9z1ClCT7swG5JfpwbF3FFn47leQenTkTt1akezE=; b=MNUYsTUnOaywXuL/FpdO4Bn2jHnGuvSqymdOz/JYQ9VLPYWvk63ALnxf geeShMjHVpeYX1EyNnj95yF/ouakGO5pbqf0c1Rm5DkWZomybvbsZgpgi Z78OvA15MrZplPh39pOn5TDYquEgSnxf8GmWMcROTGmk9f9woBiFKp4AF yxA+NpYS2Uzsm0++3Eqw9qZoeXWN6nBD6zMoGe42u0L1jkNbu/ghVvfrW wrr6HAR3tA70kReH6/zKWhklphEOh0XGGWVRkDuRbo7OoCjtCieSG0rgn whCsMQSGaR8kDFIxzZ8ElIl64t0BwHLEPSx+MrzKxr0Jk2Vw63yWsarDZ g==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248742023" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248742023" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:45 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083270" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:45 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 039/104] KVM: x86/mmu: Disallow fast page fault on private GPA Date: Thu, 5 May 2022 11:14:33 -0700 Message-Id: <7a8550ac1ed70fea901756f84b10960a07089140.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata TDX requires TDX SEAMCALL to operate Secure EPT instead of direct memory access and TDX SEAMCALL is heavy operation. Fast page fault on private GPA doesn't make sense. Disallow fast page fault on private GPA. 
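To see what the new early-out rejects, here is a self-contained userspace sketch of the gate (the 47-bit shared-bit position and the sample GPAs are assumptions for the example; the remaining page_fault_can_be_fast() checks are elided):

#include <stdint.h>
#include <stdio.h>

typedef uint64_t gpa_t;
typedef uint64_t gfn_t;

#define PAGE_SHIFT 12

/* Assume the shared bit sits at GPA bit 47, as in the earlier sketch. */
static const gfn_t gfn_shared_mask =3D 1ULL << (47 - PAGE_SHIFT);

/* Mirrors kvm_is_private_gpa(): private iff a mask exists and the bit is clear. */
static int is_private_gpa(gpa_t gpa)
{
        return gfn_shared_mask && !((gpa >> PAGE_SHIFT) & gfn_shared_mask);
}

/* Mirrors the new early-out: private GPAs never take the fast path. */
static int page_fault_can_be_fast(gpa_t fault_addr)
{
        if (is_private_gpa(fault_addr))
                return 0;       /* the S-EPT is only reachable via SEAMCALLs */
        return 1;               /* ...remaining checks elided... */
}

int main(void)
{
        printf("%d\n", page_fault_can_be_fast(0x1234000ULL));       /* 0: private */
        printf("%d\n", page_fault_can_be_fast(0x800001234000ULL));  /* 1: shared */
        return 0;
}

A private GPA thus always takes the slow path, where the Secure-EPT can be updated via SEAMCALLs.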
Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/mmu/mmu.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index f4758b1b5202..8b26729cb9c4 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3108,8 +3108,16 @@ static bool handle_abnormal_pfn(struct kvm_vcpu *vcp= u, struct kvm_page_fault *fa return false; } =20 -static bool page_fault_can_be_fast(struct kvm_page_fault *fault) +static bool page_fault_can_be_fast(struct kvm *kvm, struct kvm_page_fault = *fault) { + /* + * TDX private mapping doesn't support fast page fault because the EPT + * entry is read/written with TDX SEAMCALLs instead of direct memory + * access. + */ + if (kvm_is_private_gpa(kvm, fault->addr)) + return false; + /* * Do not fix the mmio spte with invalid generation number which * need to be updated by slow page fault path. @@ -3213,7 +3221,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, str= uct kvm_page_fault *fault) u64 *sptep =3D NULL; uint retry_count =3D 0; =20 - if (!page_fault_can_be_fast(fault)) + if (!page_fault_can_be_fast(vcpu->kvm, fault)) return ret; =20 walk_shadow_page_lockless_begin(vcpu); --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2558FC433F5 for ; Thu, 5 May 2022 18:21:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384100AbiEESZZ (ORCPT ); Thu, 5 May 2022 14:25:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36392 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383262AbiEESTo (ORCPT ); Thu, 5 May 2022 14:19:44 -0400 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3CE571A3A4; Thu, 5 May 2022 11:15:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774557; x=1683310557; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=4lquZTTZV+C6DaVeZCPZttfWMjtz7WKuO5kSC5oLrCc=; b=ncbbpo+iOMr9fnU4Pttlw7Kb3dZaXtZ7I6tJP1TnSvZvCSmxmLmDqjQC yoiKAPKx2LTWiYoHP/GNO/kc4A3lTjRpsmKLdmfOs066B6Y+0HMQXZ8iP 8yamuRLJEpqBmlZRGYQEWp+SpD+6KF75lvmI2WvBLy86HfKfXJMa8ojP9 zs8hSk0voyeit+kuWp16CHTlypGV629Nnbjt+3FauDrKK11U/5HbEEfDY jkl/am5mzZ7SH3UqyFRGG2VOVDluMFvhxoKqLLa/JJpDwNZVGn6n5jEkB rYXNDDjNvorvcc8WS1yAa1x4vRRDVu/8+idGQGeGPEDe4x+FBUfZrU5la g==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248742025" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248742025" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:45 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083275" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:45 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 040/104] KVM: x86/mmu: Allow per-VM override of the TDP max page level Date: Thu, 5 
May 2022 11:14:34 -0700 Message-Id: <83e129bcd111c4dec472c377c43926f338b80ac1.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson TODO: This is a transient workaround patch until the large page support for TDX is implemented. Support large page for TDX and remove this patch. At this point, large page for TDX isn't supported, and need to allow guest TD to work only with 4K pages. On the other hand, conventional VMX VMs should continue to work with large page. Allow per-VM override of the TDP max page level. In the existing x86 KVM MMU code, there is already max_level member in struct kvm_page_fault with KVM_MAX_HUGEPAGE_LEVEL initial value. The KVM page fault handler denies page size larger than max_level. Add per-VM member to indicate the allowed maximum page size with KVM_MAX_HUGEPAGE_LEVEL as default value and initialize max_level in struct kvm_page_fault with it. For the guest TD, the set per-VM value for allows maximum page size to 4K page size. Then only allowed page size is 4K. It means large page is disabled. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/mmu.h | 2 +- arch/x86/kvm/mmu/mmu.c | 1 + 3 files changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index c9c113316fd3..60223c21f16a 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1063,6 +1063,7 @@ struct kvm_arch { unsigned long n_requested_mmu_pages; unsigned long n_max_mmu_pages; unsigned int indirect_shadow_pages; + int tdp_max_page_level; u8 mmu_valid_gen; struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES]; struct list_head active_mmu_pages; diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index eecb5e27b6a5..a37b2efec4a8 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -239,7 +239,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu= *vcpu, gpa_t cr2_or_gpa, .is_tdp =3D likely(vcpu->arch.mmu->page_fault =3D=3D kvm_tdp_page_fault), .nx_huge_page_workaround_enabled =3D is_nx_huge_page_enabled(), =20 - .max_level =3D KVM_MAX_HUGEPAGE_LEVEL, + .max_level =3D vcpu->kvm->arch.tdp_max_page_level, .req_level =3D PG_LEVEL_4K, .goal_level =3D PG_LEVEL_4K, }; diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 8b26729cb9c4..8a684a7b1883 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5865,6 +5865,7 @@ int kvm_mmu_init_vm(struct kvm *kvm) node->track_write =3D kvm_mmu_pte_write; node->track_flush_slot =3D kvm_mmu_invalidate_zap_pages_in_memslot; kvm_page_track_register_notifier(kvm, node); + kvm->arch.tdp_max_page_level =3D KVM_MAX_HUGEPAGE_LEVEL; kvm_mmu_set_mmio_spte_mask(kvm, shadow_default_mmio_mask, shadow_default_mmio_mask, ACC_WRITE_MASK | ACC_USER_MASK); --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5C103C433EF for ; Thu, 5 May 2022 18:21:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384184AbiEESZb (ORCPT ); Thu, 5 May 2022 14:25:31 -0400 Received: from 
lindbergh.monkeyblade.net ([23.128.96.19]:36940 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383250AbiEESTn (ORCPT ); Thu, 5 May 2022 14:19:43 -0400 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 123DC5DA5B; Thu, 5 May 2022 11:15:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774556; x=1683310556; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ijQE85NEcnvtg6BiCR/pJrx1S15vRE4puZ8YRfcVbY0=; b=H4eP7Z/vrkfiQAxD+F/YWebAXiTWfZHHIXf8Hg3gQ3GIdaPrGIu3+wmQ ZPBgJXUM7Fpr87an0ydDde2Wsk5tQAGihFuw83EBUARyHVvMhLM+mcEBW mDus6sYY6jLz6KEJRFoaQ05UFHW8q/y/oDvPxBxFgnab5gTe2uDx/4t21 bnz+usy6dXmpwtSEd0ILWzcgXTXY3cYpV2oMmWVl8GUsyVrFDq5aLqKn9 RSqfgj+uYvGszWuV7buvGt+SRqoii7mVjy67kziOPWxXNdVZOulZ3PbcT XwdVqwJMvE/Kuc0WvyLcvkmDTSU62sq1SoiVJQgbK1eSlIVfhHq6znNux A==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248742026" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248742026" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:46 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083279" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:45 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 041/104] KVM: x86/mmu: Zap only leaf SPTEs for deleted/moved memslot for private mmu Date: Thu, 5 May 2022 11:14:35 -0700 Message-Id: <17930cdb95783cf115239d50b5023e56b9a2b61f.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson For kvm mmu that has shared bit mask, zap only leaf SPTEs when deleting/moving a memslot. The existing kvm_mmu_zap_memslot() depends on role.invalid with read lock of mmu_lock so that other vcpu can operate on kvm mmu concurrently. Mark the root page table invalid, unlink it from page table pointer of CPU, process the page table. It doesn't work for private page table to unlink the root page table because it requires all SPTE entry to be non-present. Instead, with write-lock of mmu_lock and zap only leaf SPTEs for kvm mmu with shared bit mask. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu/mmu.c | 35 ++++++++++++++++++++++++++++++++++- 1 file changed, 34 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 8a684a7b1883..96cdafae0468 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5841,11 +5841,44 @@ static bool kvm_has_zapped_obsolete_pages(struct kv= m *kvm) return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages)); } =20 +static void kvm_mmu_zap_memslot(struct kvm *kvm, struct kvm_memory_slot *s= lot) +{ + bool flush =3D false; + + write_lock(&kvm->mmu_lock); + + /* + * Zapping non-leaf SPTEs, a.k.a. 
not-last SPTEs, isn't required, worst + * case scenario we'll have unused shadow pages lying around until they + * are recycled due to age or when the VM is destroyed. + */ + if (is_tdp_mmu_enabled(kvm)) { + struct kvm_gfn_range range =3D { + .slot =3D slot, + .start =3D slot->base_gfn, + .end =3D slot->base_gfn + slot->npages, + .may_block =3D false, + }; + + flush =3D kvm_tdp_mmu_unmap_gfn_range(kvm, &range, flush); + } else { + flush =3D slot_handle_level(kvm, slot, kvm_zap_rmapp, PG_LEVEL_4K, + KVM_MAX_HUGEPAGE_LEVEL, true); + } + if (flush) + kvm_flush_remote_tlbs(kvm); + + write_unlock(&kvm->mmu_lock); +} + static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, struct kvm_page_track_notifier_node *node) { - kvm_mmu_zap_all_fast(kvm); + if (kvm_gfn_shared_mask(kvm)) + kvm_mmu_zap_memslot(kvm, slot); + else + kvm_mmu_zap_all_fast(kvm); } =20 int kvm_mmu_init_vm(struct kvm *kvm) --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E6CD8C4332F for ; Thu, 5 May 2022 18:21:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1382119AbiEESZE (ORCPT ); Thu, 5 May 2022 14:25:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36992 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383264AbiEESTo (ORCPT ); Thu, 5 May 2022 14:19:44 -0400 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 96D105DA64; Thu, 5 May 2022 11:15:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774557; x=1683310557; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=DKO8p1z1u/GRhlWjJ3qX2Z+Q/wRIK2LVWM4LrNFmYpo=; b=C7UXcGoHHZdderSQNoOSDd/xVavnmelB+X4fed03zUyQ+Pwkm2At2Fz0 +zt7EfJuqQiQuWhY8v8wsi1T4pMmZooifSwMyRi0AUvI7cQftTKcyMy3X fcQofjobqKkqdn4Q9Gmjnw7sN2l2HfotFLTICKaeX7HfENJjLahpzrPGX iTfIDL1+3athvGPJ72gQtl4iy95HBRkxzbd1o/T28QnuYVrD5Mb8AprzP 80GemcUvPXZN8fV4nSQM8xnlvsMJaC6mvQdG6kvGwCM//EsUdx7mwGkrr EAbb20hhX21oM3c47/fNKSpDrdWOIUDjUOoCBSVf2qxXsTk2Jbo2wVW7l A==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248742027" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248742027" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:46 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083284" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:46 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 042/104] KVM: VMX: Introduce test mode related to EPT violation VE Date: Thu, 5 May 2022 11:14:36 -0700 Message-Id: <055cf7f4120e5dece70552fab1576ace8bb047ee.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: 
linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata To support TDX, KVM is enhanced to operate with #VE. For TDX, KVM programs the hardware to inject #VE conditionally and sets the #VE suppress bit in EPT entries. For VMX, #VE isn't used; if a #VE happens under VMX, it's a bug. To be defensive (i.e. to test that the VMX case isn't broken), introduce the module parameter ept_violation_ve_test; when it is set, intercept unexpected #VE and report an error. Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/vmx.h | 12 +++++++ arch/x86/kvm/vmx/vmx.c | 68 +++++++++++++++++++++++++++++++++- arch/x86/kvm/vmx/vmx.h | 3 ++ 3 files changed, 82 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 9682f5a02da8..d3c8abcaa35e 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -68,6 +68,7 @@ #define SECONDARY_EXEC_ENCLS_EXITING VMCS_CONTROL_BIT(ENCLS_EXITING) #define SECONDARY_EXEC_RDSEED_EXITING VMCS_CONTROL_BIT(RDSEED_EXITING) #define SECONDARY_EXEC_ENABLE_PML VMCS_CONTROL_BIT(PAGE_MOD_= LOGGING) +#define SECONDARY_EXEC_EPT_VIOLATION_VE VMCS_CONTROL_BIT(EPT_VIOLATION_VE) #define SECONDARY_EXEC_PT_CONCEAL_VMX VMCS_CONTROL_BIT(PT_CONCEAL_VMX) #define SECONDARY_EXEC_XSAVES VMCS_CONTROL_BIT(XSAVES) #define SECONDARY_EXEC_MODE_BASED_EPT_EXEC VMCS_CONTROL_BIT(MODE_BASED_EPT= _EXEC) @@ -222,6 +223,8 @@ enum vmcs_field { VMREAD_BITMAP_HIGH =3D 0x00002027, VMWRITE_BITMAP =3D 0x00002028, VMWRITE_BITMAP_HIGH =3D 0x00002029, + VE_INFORMATION_ADDRESS =3D 0x0000202A, + VE_INFORMATION_ADDRESS_HIGH =3D 0x0000202B, XSS_EXIT_BITMAP =3D 0x0000202C, XSS_EXIT_BITMAP_HIGH =3D 0x0000202D, ENCLS_EXITING_BITMAP =3D 0x0000202E, @@ -621,4 +624,13 @@ enum vmx_l1d_flush_state { =20 extern enum vmx_l1d_flush_state l1tf_vmx_mitigation; =20 +struct vmx_ve_information { + u32 exit_reason; + u32 delivery; + u64 exit_qualification; + u64 guest_linear_address; + u64 guest_physical_address; + u16 eptp_index; +}; + #endif diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index d2314389f268..60dedae31426 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -123,6 +123,9 @@ module_param_named(pml, enable_pml, bool, S_IRUGO); static bool __read_mostly dump_invalid_vmcs =3D 0; module_param(dump_invalid_vmcs, bool, 0644); =20 +static bool __read_mostly ept_violation_ve_test =3D 0; +module_param(ept_violation_ve_test, bool, 0444); + #define MSR_BITMAP_MODE_X2APIC 1 #define MSR_BITMAP_MODE_X2APIC_APICV 2 =20 @@ -721,6 +724,13 @@ void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu) =20 eb =3D (1u << PF_VECTOR) | (1u << UD_VECTOR) | (1u << MC_VECTOR) | (1u << DB_VECTOR) | (1u << AC_VECTOR); + /* + * #VE isn't used for VMX, but for TDX. To test against unexpected + * change related to #VE for VMX, intercept unexpected #VE and warn on + * it. + */ + if (ept_violation_ve_test) + eb |=3D 1u << VE_VECTOR; /* * Guest access to VMware backdoor ports could legitimately * trigger #GP because of TSS I/O permission bitmap.
@@ -2490,6 +2500,8 @@ static int setup_vmcs_config(struct vmcs_config *vmcs= _conf, SECONDARY_EXEC_BUS_LOCK_DETECTION; if (cpu_has_sgx()) opt2 |=3D SECONDARY_EXEC_ENCLS_EXITING; + if (ept_violation_ve_test) + opt2 |=3D SECONDARY_EXEC_EPT_VIOLATION_VE; if (adjust_vmx_controls(min2, opt2, MSR_IA32_VMX_PROCBASED_CTLS2, &_cpu_based_2nd_exec_control) < 0) @@ -2518,6 +2530,7 @@ static int setup_vmcs_config(struct vmcs_config *vmcs= _conf, CPU_BASED_INVLPG_EXITING); } else if (vmx_cap->ept) { vmx_cap->ept =3D 0; + _cpu_based_2nd_exec_control &=3D ~SECONDARY_EXEC_EPT_VIOLATION_VE; pr_warn_once("EPT CAP should not exist if not support " "1-setting enable EPT VM-execution control\n"); } @@ -4331,6 +4344,7 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx= *vmx) exec_control &=3D ~SECONDARY_EXEC_ENABLE_VPID; if (!enable_ept) { exec_control &=3D ~SECONDARY_EXEC_ENABLE_EPT; + exec_control &=3D ~SECONDARY_EXEC_EPT_VIOLATION_VE; enable_unrestricted_guest =3D 0; } if (!enable_unrestricted_guest) @@ -4455,8 +4469,40 @@ static void init_vmcs(struct vcpu_vmx *vmx) =20 exec_controls_set(vmx, vmx_exec_control(vmx)); =20 - if (cpu_has_secondary_exec_ctrls()) + if (cpu_has_secondary_exec_ctrls()) { secondary_exec_controls_set(vmx, vmx_secondary_exec_control(vmx)); + if (secondary_exec_controls_get(vmx) & + SECONDARY_EXEC_EPT_VIOLATION_VE) { + if (!vmx->ve_info) { + /* ve_info must be page aligned. */ + struct page *page; + + BUILD_BUG_ON(sizeof(*vmx->ve_info) > PAGE_SIZE); + page =3D alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); + if (page) + vmx->ve_info =3D page_to_virt(page); + } + if (vmx->ve_info) { + /* + * Allow #VE delivery. CPU sets this field to + * 0xFFFFFFFF on #VE delivery. Another #VE can + * occur only if software clears the field. + */ + vmx->ve_info->delivery =3D 0; + vmcs_write64(VE_INFORMATION_ADDRESS, + __pa(vmx->ve_info)); + } else { + /* + * Because SECONDARY_EXEC_EPT_VIOLATION_VE is + * used only when ept_violation_ve_test is true, + * it's okay to go with the bit disabled. + */ + pr_err("Failed to allocate ve_info. 
disabling EPT_VIOLATION_VE.\n"); + secondary_exec_controls_clearbit( + vmx, SECONDARY_EXEC_EPT_VIOLATION_VE); + } + } + } =20 if (cpu_has_tertiary_exec_ctrls()) tertiary_exec_controls_set(vmx, vmx_tertiary_exec_control(vmx)); @@ -5051,7 +5097,14 @@ static int handle_exception_nmi(struct kvm_vcpu *vcp= u) if (handle_guest_split_lock(kvm_rip_read(vcpu))) return 1; fallthrough; + case VE_VECTOR: default: + if (ept_violation_ve_test && ex_no =3D=3D VE_VECTOR) { + pr_err("VMEXIT due to unexpected #VE.\n"); + secondary_exec_controls_clearbit( + vmx, SECONDARY_EXEC_EPT_VIOLATION_VE); + return 1; + } kvm_run->exit_reason =3D KVM_EXIT_EXCEPTION; kvm_run->ex.exception =3D ex_no; kvm_run->ex.error_code =3D error_code; @@ -6081,6 +6134,17 @@ void dump_vmcs(struct kvm_vcpu *vcpu) if (secondary_exec_control & SECONDARY_EXEC_ENABLE_VPID) pr_err("Virtual processor ID =3D 0x%04x\n", vmcs_read16(VIRTUAL_PROCESSOR_ID)); + if (secondary_exec_control & SECONDARY_EXEC_EPT_VIOLATION_VE) { + struct vmx_ve_information *ve_info; + pr_err("VE info address =3D 0x%016llx\n", + vmcs_read64(VE_INFORMATION_ADDRESS)); + ve_info =3D __va(vmcs_read64(VE_INFORMATION_ADDRESS)); + pr_err("ve_info: 0x%08x 0x%08x 0x%016llx 0x%016llx 0x%016llx 0x%04x\n", + ve_info->exit_reason, ve_info->delivery, + ve_info->exit_qualification, + ve_info->guest_linear_address, + ve_info->guest_physical_address, ve_info->eptp_index); + } } =20 /* @@ -7065,6 +7129,8 @@ void vmx_vcpu_free(struct kvm_vcpu *vcpu) free_vpid(vmx->vpid); nested_vmx_free_vcpu(vcpu); free_loaded_vmcs(vmx->loaded_vmcs); + if (vmx->ve_info) + free_page((unsigned long)vmx->ve_info); } =20 int vmx_vcpu_create(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 36dbaf1add45..f49be71290bd 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -336,6 +336,9 @@ struct vcpu_vmx { DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS); DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS); } shadow_msr_intercept; + + /* ve_info must be page aligned. 
*/ + struct vmx_ve_information *ve_info; }; =20 struct kvm_vmx { --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C3998C433EF for ; Thu, 5 May 2022 18:24:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384088AbiEES1k (ORCPT ); Thu, 5 May 2022 14:27:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36462 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383202AbiEESTl (ORCPT ); Thu, 5 May 2022 14:19:41 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4D3B943AC8; Thu, 5 May 2022 11:15:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774553; x=1683310553; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=9+zynqxUnbt7L2tmrykvNHmz+Wt/xr8Fj4lA+9l+hKk=; b=O4tzeEeEFQSc0IdtyqEH2eQcb88aOCfXrqxPisk+bO9ndK70aWSkoNiB CyP3KKY6sFovYopKHrzKmhCnq9oDZxtHYdrWNql8PokWjN9J/sudF1GSK Yjl9jtI26/Y25Mxed/cazkBN+ryQPt5q2eoLAOMdS65iBJtWq44lpNRXI s9uEFaQLk5zrQ15VvRdXa/jiw8xTkjIRzfNCYz/g60fMfIge+WwzEv9Mf ENFqagJxEU5kCnTWJ4mkoSvaOcRCxqcclwA9yQTSp7IH8TRfjj86lgk66 eMxfqytcrozM42XjAkNLujES++750fOoRGyKiOnanSxXkLcyWgmQPlt8j A==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="328746279" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="328746279" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:46 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083287" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:46 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 043/104] [MARKER] The start of TDX KVM patch series: KVM TDP MMU hooks Date: Thu, 5 May 2022 11:14:37 -0700 Message-Id: <166203215d2227e1bf5b903535f283ae8d02d710.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This empty commit is to mark the start of patch series of KVM TDP MMU hooks. 
Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/intel-tdx-layer-status.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentat= ion/virt/kvm/intel-tdx-layer-status.rst index df003d2ed89e..d5cace00c433 100644 --- a/Documentation/virt/kvm/intel-tdx-layer-status.rst +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -25,6 +25,6 @@ Patch Layer status * TD vcpu interrupts/exit/hypercall: Not yet =20 * KVM MMU GPA shared bits: Applied -* KVM TDP refactoring for TDX: Applying -* KVM TDP MMU hooks: Not yet +* KVM TDP refactoring for TDX: Applied +* KVM TDP MMU hooks: Applying * KVM TDP MMU MapGPA: Not yet --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 044/104] KVM: x86/mmu: Forcibly use TDP MMU for TDX Date: Thu, 5 May 2022 11:14:38 -0700 Message-Id: <74ed344d1ee81bc378dcd0339278b0c3b7d675e9.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata In this patch series, TDX supports only the TDP MMU and doesn't support the legacy MMU. Forcibly use the TDP MMU for TDX regardless of the kernel parameter that disables the TDP MMU.
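As a rough sketch, the gating amounts to the following predicate (illustrative only; tdp_mmu_required() is a hypothetical helper name, the patch below open-codes the check in kvm_mmu_init_tdp_mmu()):

static bool tdp_mmu_required(struct kvm *kvm)
{
	/*
	 * A TD can't run without the TDP MMU, so both tdp_enabled and
	 * the tdp_mmu_enabled module parameter are ignored for TDX VMs.
	 */
	return kvm->arch.vm_type == KVM_X86_TDX_VM;
}

kvm_mmu_init_tdp_mmu() then bails out only when the TDP MMU is neither required nor enabled:

	if (!tdp_mmu_required(kvm) &&
	    (!tdp_enabled || !READ_ONCE(tdp_mmu_enabled)))
		return 0;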
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu/tdp_mmu.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index f8c1824d85a5..b8850a0ceb15 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -18,8 +18,13 @@ int kvm_mmu_init_tdp_mmu(struct kvm *kvm) { struct workqueue_struct *wq; =20 - if (!tdp_enabled || !READ_ONCE(tdp_mmu_enabled)) - return 0; + /* + * Because TDX supports only TDP MMU, forcibly use TDP MMU in the case + * of TDX. + */ + if (kvm->arch.vm_type !=3D KVM_X86_TDX_VM && + (!tdp_enabled || !READ_ONCE(tdp_mmu_enabled))) + return false; =20 wq =3D alloc_workqueue("kvm", WQ_UNBOUND|WQ_MEM_RECLAIM|WQ_CPU_INTENSIVE,= 0); if (!wq) --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0AF96C46467 for ; Thu, 5 May 2022 18:21:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384983AbiEESYz (ORCPT ); Thu, 5 May 2022 14:24:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36326 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383271AbiEESTo (ORCPT ); Thu, 5 May 2022 14:19:44 -0400 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EA3C95DA69; Thu, 5 May 2022 11:15:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774558; x=1683310558; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=9IGSNVmmtKN00Y7BcZ/KwJEWtXgn4QTJ82hX67/rE/E=; b=NZW7eeNxJ5bsGOeEGllmo09XBsDvjCexrGl3kiIAC9RDY4aoprxJ8fVy dFLy/wEBicyd4Wt9vKgkkh1xZL8Jhjj2VVnQJfUeZVlDhDD7VHMN1vCy/ xlbSWu+rZ4myfkubR/xGlqcIxQZFghsLhOZwxqszvoh8AHmiP0QdPBXe9 xwY5Nwoe4jsLEEXcHJsOk/S2gJXsvnKSL2tVwFw2ZY2IpiPqGwSi+R2pZ PtNIX0O6UT+3/RR0YNKrTelhAny6R2V7Ydi4KCZJ1ptk3S/4Nxei4tdpR U77IME5hxVg7ZlpVTrZhf8f/e6CBISq9W6yOlYXOLYKNrGm2JRIBzHMvl A==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248742030" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248742030" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:46 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083293" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:46 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 045/104] KVM: x86/mmu: Add a private pointer to struct kvm_mmu_page Date: Thu, 5 May 2022 11:14:39 -0700 Message-Id: <34f9976182e770ce4c5b0e3888ed15f059055789.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata For private GPA, CPU refers a private page table whose contents are encrypted. 
Dedicated APIs must be used to operate on it (e.g. updating/reading its PTE entries), and they are expensive. When KVM resolves a KVM page fault, it walks the page tables. To reuse the existing KVM MMU code and mitigate the heavy cost of directly walking the encrypted private page table, allocate one more page to mirror the existing KVM page table. Resolve the KVM page fault with the existing code, and do the additional operations necessary for the mirrored private page table. To distinguish such cases, the existing KVM page table is called a shared page table (i.e. it has no mirrored private page table), and a KVM page table with a mirrored private page table is called a private page table. The relationship is depicted below. Add a private pointer to struct kvm_mmu_page for the mirrored private page table, and add helper functions to allocate/initialize/free a mirrored private page table page. Also, add helper functions to check whether a given kvm_mmu_page is private. A later patch introduces hooks to operate on the mirrored private page table.

              KVM page fault                         |
                     |                               |
                     V                               |
        -------------+----------                     |
        |                      |                     |
        V                      V                     |
    shared GPA            private GPA                |
        |                      |                     |
        V                      V                     |
 CPU/KVM shared PT root  KVM private PT root         |  CPU private PT root
        |                      |                     |          |
        V                      V                     |          V
    shared PT             private PT <----mirror---------->  mirrored private PT
        |                      |                     |          |
        |                      \-----------------+-----\       |
        |                                        |     |       |
        V                                        |     V       V
  shared guest page                              |    private guest page
                                                 |
  non-encrypted memory                           |    encrypted memory
                                                 |
  PT: page table

Both the CPU and KVM refer to the CPU/KVM shared page table. The private page table is used only by KVM; the CPU refers to the mirrored private page table. Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/mmu/mmu.c | 9 ++++ arch/x86/kvm/mmu/mmu_internal.h | 84 +++++++++++++++++++++++++++++++++ arch/x86/kvm/mmu/tdp_mmu.c | 3 ++ 4 files changed, 97 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 60223c21f16a..8ef83bcefa57 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -696,6 +696,7 @@ struct kvm_vcpu_arch { struct kvm_mmu_memory_cache mmu_shadow_page_cache; struct kvm_mmu_memory_cache mmu_gfn_array_cache; struct kvm_mmu_memory_cache mmu_page_header_cache; + struct kvm_mmu_memory_cache mmu_private_sp_cache; =20 /* * QEMU userspace and the guest each have their own FPU state.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 96cdafae0468..7e4c96605261 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -716,6 +716,13 @@ static int mmu_topup_shadow_page_cache(struct kvm_vcpu= *vcpu) int start, end, i, r; bool is_tdp_mmu =3D is_tdp_mmu_enabled(vcpu->kvm); =20 + if (kvm_gfn_shared_mask(vcpu->kvm)) { + r =3D kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_private_sp_cache, + PT64_ROOT_MAX_LEVEL); + if (r) + return r; + } + if (is_tdp_mmu && shadow_nonpresent_value) start =3D kvm_mmu_memory_cache_nr_free_objects(mc); =20 @@ -757,6 +764,7 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcp= u) { kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache); + kvm_mmu_free_memory_cache(&vcpu->arch.mmu_private_sp_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_gfn_array_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache); } @@ -1759,6 +1767,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct= kvm_vcpu *vcpu, int direct if (!direct) sp->gfns =3D kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_gfn_array_cache); set_page_private(virt_to_page(sp->spt), (unsigned long)sp); + kvm_mmu_init_private_sp(sp); =20 /* * active_mmu_pages must be a FIFO list, as kvm_zap_obsolete_pages() diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_interna= l.h index 1bff453f7cbe..123736d651e3 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -55,6 +55,10 @@ struct kvm_mmu_page { u64 *spt; /* hold the gfn of each spte inside spt */ gfn_t *gfns; +#ifdef CONFIG_KVM_MMU_PRIVATE + /* associated private shadow page, e.g. SEPT page. */ + void *private_sp; +#endif /* Currently serving as active root */ union { int root_count; @@ -115,6 +119,86 @@ static inline int kvm_mmu_page_as_id(struct kvm_mmu_pa= ge *sp) return kvm_mmu_role_as_id(sp->role); } =20 +/* + * TDX vcpu allocates page for root Secure EPT page and assigns to CPU sec= ure + * EPT pointer. KVM doesn't need to allocate and link to the secure EPT. + * Dummy value to make is_pivate_sp() return true. + */ +#define KVM_MMU_PRIVATE_SP_ROOT ((void *)1) + +#ifdef CONFIG_KVM_MMU_PRIVATE +static inline bool is_private_sp(struct kvm_mmu_page *sp) +{ + return !!sp->private_sp; +} + +static inline bool is_private_sptep(u64 *sptep) +{ + WARN_ON(!sptep); + return is_private_sp(sptep_to_sp(sptep)); +} + +static inline void *kvm_mmu_private_sp(struct kvm_mmu_page *sp) +{ + return sp->private_sp; +} + +static inline void kvm_mmu_init_private_sp(struct kvm_mmu_page *sp) +{ + sp->private_sp =3D NULL; +} + +/* Valid sp->role.level is required. */ +static inline void kvm_mmu_alloc_private_sp( + struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, bool is_root) +{ + if (is_root) + sp->private_sp =3D KVM_MMU_PRIVATE_SP_ROOT; + else + sp->private_sp =3D kvm_mmu_memory_cache_alloc( + &vcpu->arch.mmu_private_sp_cache); + /* + * Because mmu_private_sp_cache is topped up before staring kvm page + * fault resolving, the allocation above shouldn't fail. 
+ */ + WARN_ON_ONCE(!sp->private_sp); +} + +static inline void kvm_mmu_free_private_sp(struct kvm_mmu_page *sp) +{ + if (sp->private_sp !=3D KVM_MMU_PRIVATE_SP_ROOT) + free_page((unsigned long)sp->private_sp); +} +#else +static inline bool is_private_sp(struct kvm_mmu_page *sp) +{ + return false; +} + +static inline bool is_private_sptep(u64 *sptep) +{ + return false; +} + +static inline void *kvm_mmu_private_sp(struct kvm_mmu_page *sp) +{ + return NULL; +} + +static inline void kvm_mmu_init_private_sp(struct kvm_mmu_page *sp) +{ +} + +static inline void kvm_mmu_alloc_private_sp( + struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, bool is_root) +{ +} + +static inline void kvm_mmu_free_private_sp(struct kvm_mmu_page *sp) +{ +} +#endif + static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page = *sp) { /* diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index b8850a0ceb15..b7e13061e57d 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -72,6 +72,8 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) =20 static void tdp_mmu_free_sp(struct kvm_mmu_page *sp) { + if (is_private_sp(sp)) + kvm_mmu_free_private_sp(sp); free_page((unsigned long)sp->spt); kmem_cache_free(mmu_page_header_cache, sp); } @@ -295,6 +297,7 @@ static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, td= p_ptep_t sptep, sp->gfn =3D gfn; sp->ptep =3D sptep; sp->tdp_mmu_page =3D true; + kvm_mmu_init_private_sp(sp); =20 trace_kvm_mmu_get_page(sp, true); } --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5D8A9C433F5 for ; Thu, 5 May 2022 18:24:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384033AbiEES1h (ORCPT ); Thu, 5 May 2022 14:27:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36898 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383218AbiEESTl (ORCPT ); Thu, 5 May 2022 14:19:41 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 21BFF49684; Thu, 5 May 2022 11:15:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774554; x=1683310554; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=pVoiIWs/KT0rFFHvpPmgf/2kpDklS0hKSk3efqDHWmY=; b=nlFjGtGZMRAjNvI9XnOrWmjdsOFz4O6UL4lFCgryqCUwA6Dh+r4H5A0p rN4DdKBGUs46m0FC0QYMTV333s7di+FkcXVlpI3vBs/f75thpJgGPPwxA ZHZUA4h2TtF6pBikliccu8jm0oOio4Y+Q23tJOxv2TmrPcnyUxUg8bDZH tOEXszZJyhkM4/LuYttXZIbN4MNPubPoE//yuChPHB2A9phcOETQWYUPP XLVZdo6KggMJACAxbejtQm7W3YcoAdb2bG5WgHHQ4S3qGnYYsRw1lLx0f R8Q7sYXRZV+hg4Aj7X24PzBE2HvOXOv6qSz2X9VIYsv4GiCQMUoMGAMRy g==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="328746282" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="328746282" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:46 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083298" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:46 -0700 From: isaku.yamahata@intel.com To: 
kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 046/104] KVM: x86/tdp_mmu: refactor kvm_tdp_mmu_map() Date: Thu, 5 May 2022 11:14:40 -0700 Message-Id: <4744ed20a4d9db95b515559e7525623781261b0c.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Factor out non-leaf SPTE population logic from kvm_tdp_mmu_map(). MapGPA hypercall needs to populate non-leaf SPTE to record which GPA, private or shared, is allowed in the leaf EPT entry. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu/tdp_mmu.c | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index b7e13061e57d..9e015b3e0578 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1149,6 +1149,24 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct t= dp_iter *iter, return 0; } =20 +static int tdp_mmu_populate_nonleaf( + struct kvm_vcpu *vcpu, struct tdp_iter *iter, bool account_nx) +{ + struct kvm_mmu_page *sp; + int ret; + + WARN_ON(is_shadow_present_pte(iter->old_spte)); + WARN_ON(is_removed_spte(iter->old_spte)); + + sp =3D tdp_mmu_alloc_sp(vcpu); + tdp_mmu_init_child_sp(sp, iter); + + ret =3D tdp_mmu_link_sp(vcpu->kvm, iter, sp, account_nx, true); + if (ret) + tdp_mmu_free_sp(sp); + return ret; +} + /* * Handle a TDP page fault (NPT/EPT violation/misconfiguration) by install= ing * page tables and SPTEs to translate the faulting guest physical address. 
@@ -1157,7 +1175,6 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm= _page_fault *fault) { struct kvm_mmu *mmu =3D vcpu->arch.mmu; struct tdp_iter iter; - struct kvm_mmu_page *sp; int ret; =20 kvm_mmu_hugepage_adjust(vcpu, fault); @@ -1203,13 +1220,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kv= m_page_fault *fault) if (is_removed_spte(iter.old_spte)) break; =20 - sp =3D tdp_mmu_alloc_sp(vcpu); - tdp_mmu_init_child_sp(sp, &iter); - - if (tdp_mmu_link_sp(vcpu->kvm, &iter, sp, account_nx, true)) { - tdp_mmu_free_sp(sp); + if (tdp_mmu_populate_nonleaf(vcpu, &iter, account_nx)) break; - } } } =20 --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AC075C4167B for ; Thu, 5 May 2022 18:21:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384517AbiEESY2 (ORCPT ); Thu, 5 May 2022 14:24:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36890 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383312AbiEESTo (ORCPT ); Thu, 5 May 2022 14:19:44 -0400 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 227B15DA7B; Thu, 5 May 2022 11:15:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774559; x=1683310559; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=gocwGFQc5nGL8ZF1cjRQZgifSA7fXlCdAUiZR29pvWk=; b=W3tlVk0xylrJZO22x4BtZ8TOnteP9vUmqYpC0sWQd9eIRwaE3ujxHs0R QXsjE/j81GrMXy4mOrddGdOvxApDxrakivtBQQQ0hl4LSZNrj7k9BlZ6K 4kN9j++TBIzv18jA3+IFHJr43AbdEiL/TNPuPrGyrbjNB/I5kaSaZJzz+ ykSaZdr/ZXL4qR8k+2YcQkIhAqgcAjpI6DbgoNvsdx2Mh+LV2r7yBJ4GP xBSlUf+jHl2LNU/GmN5Fw2EnEsGAwWMrEKiH2a9AW4W3ArjWaS25gAKXO MtPlKoigPcTsl42GJnTRJWZFFtFyq7tI3Y6uftbN2y7QOFZyT3rK15y+H w==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248742034" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248742034" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:47 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083306" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:46 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 047/104] KVM: x86/tdp_mmu: Support TDX private mapping for TDP MMU Date: Thu, 5 May 2022 11:14:41 -0700 Message-Id: <653230043fdb2d20e871e79e73f757134ca92eeb.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Allocate mirrored private page table for private page table, and add hooks to operate on mirrored private page table. This patch adds only hooks. As kvm_gfn_shared_mask() returns false always, those hooks aren't called yet. 
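As a rough sketch of that gating (illustrative only, assuming the shared-bit convention used in this series, where a GPA whose shared bit is clear is private):

static inline bool kvm_is_private_gpa(const struct kvm *kvm, gpa_t gpa)
{
	/* No shared bit mask (e.g. plain VMX or SVM) means no private GPAs. */
	gfn_t mask = kvm_gfn_shared_mask(kvm);

	return mask && !(gpa_to_gfn(gpa) & mask);
}

While kvm_gfn_shared_mask() returns 0, the predicate above is always false, so none of the private paths added here is reachable.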
Because a private guest page is protected, page copy with mmu_notifier to migrate the page doesn't work; a callback from the backing store is needed. Instead, pin the page for now; page migration is a future task. When the faulting GPA is private, the KVM page fault is also called private. When resolving a private KVM page fault, allocate a mirrored private page table and call hooks to operate on it. On a change to a private PTE entry, invoke the kvm_x86_ops hook in __handle_changed_spte() to propagate the change to the mirrored private page table. The following depicts the relationship.

   private KVM page fault           |
             |                      |
             V                      |
        private GPA                 |
             |                      |
             V                      |
   KVM private PT root              |  CPU private PT root
             |                      |          |
             V                      |          V
        private PT ---hook to mirror--->  mirrored private PT
             |                      |          |
             \-------------------+-----\       |
                                 |     |       |
                                 |     V       V
                                 |   private guest page
                                 |
   non-encrypted memory          |   encrypted memory
                                 |
   PT: page table

The existing KVM TDP MMU code uses atomic updates of SPTEs: on populating an EPT entry, the entry is set atomically. Zapping an SPTE, however, requires a TLB shootdown. To address that, the entry is frozen with a special SPTE value that clears the present bit; after the TLB shootdown, the entry is set to the eventual value (unfreeze). For a mirrored private page table, hooks are called to update the mirrored private page table in addition to the direct access to the private SPTE. For the zapping case, freezing the SPTE works as-is: the hooks can be called in addition to the TLB shootdown. For populating a private SPTE entry, there can be a race condition without further protection:

  vcpu 1: populating 2M private SPTE
  vcpu 2: populating 4K private SPTE
  vcpu 2: TDX SEAMCALL to update 4K mirrored private SPTE =3D> error
  vcpu 1: TDX SEAMCALL to update 2M mirrored private SPTE

To avoid the race, the frozen SPTE is utilized: instead of an atomic update of the private entry, freeze the entry, call the hook that updates the mirrored private SPTE, then set the entry to the final value. Support 4K pages only at this stage; 2M page support can be done in future patches. Add an is_private member to kvm_page_fault to indicate that the fault is private, and an is_private member to struct tdp_iter to propagate it.
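A condensed sketch of that freeze sequence, mirroring what tdp_mmu_set_spte_atomic() does below for private SPTEs (error handling and surrounding code trimmed):

	/* Freeze: install REMOVED_SPTE so concurrent faults back off. */
	old_spte = cmpxchg64(sptep, iter->old_spte, REMOVED_SPTE);
	if (old_spte != iter->old_spte)
		return -EBUSY;	/* another vCPU changed the SPTE; retry */

	/* Propagate the change to the mirrored private page table. */
	static_call(kvm_x86_handle_changed_private_spte)(kvm, &change);

	/* Unfreeze: publish the final value so faults can proceed. */
	kvm_tdp_mmu_write_spte(sptep, new_spte);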
Co-developed-by: Kai Huang Signed-off-by: Kai Huang Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm-x86-ops.h | 2 + arch/x86/include/asm/kvm_host.h | 20 +++ arch/x86/kvm/mmu.h | 4 + arch/x86/kvm/mmu/mmu.c | 91 ++++++++++- arch/x86/kvm/mmu/mmu_internal.h | 35 +++++ arch/x86/kvm/mmu/paging_tmpl.h | 2 +- arch/x86/kvm/mmu/tdp_iter.c | 1 + arch/x86/kvm/mmu/tdp_iter.h | 5 +- arch/x86/kvm/mmu/tdp_mmu.c | 240 +++++++++++++++++++++++------ arch/x86/kvm/mmu/tdp_mmu.h | 7 +- virt/kvm/kvm_main.c | 1 + 11 files changed, 349 insertions(+), 59 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index 32a6df784ea6..6982d57e4518 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -93,6 +93,8 @@ KVM_X86_OP_OPTIONAL_RET0(set_tss_addr) KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr) KVM_X86_OP(get_mt_mask) KVM_X86_OP(load_mmu_pgd) +KVM_X86_OP_OPTIONAL(free_private_sp) +KVM_X86_OP_OPTIONAL(handle_changed_private_spte) KVM_X86_OP(has_wbinvd_exit) KVM_X86_OP(get_l2_tsc_offset) KVM_X86_OP(get_l2_tsc_multiplier) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 8ef83bcefa57..88c3e9c78797 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -437,6 +437,7 @@ struct kvm_mmu { struct kvm_mmu_page *sp); void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa); struct kvm_mmu_root_info root; + hpa_t private_root_hpa; union kvm_cpu_role cpu_role; union kvm_mmu_page_role root_role; =20 @@ -1339,6 +1340,20 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_= mode_logical) return dest_mode_logical ? APIC_DEST_LOGICAL : APIC_DEST_PHYSICAL; } =20 +struct kvm_spte { + kvm_pfn_t pfn; + bool is_present; + bool is_leaf; +}; + +struct kvm_spte_change { + gfn_t gfn; + enum pg_level level; + struct kvm_spte old; + struct kvm_spte new; + void *sept_page; +}; + struct kvm_x86_ops { const char *name; =20 @@ -1451,6 +1466,11 @@ struct kvm_x86_ops { void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level); =20 + int (*free_private_sp)(struct kvm *kvm, gfn_t gfn, enum pg_level level, + void *private_sp); + void (*handle_changed_private_spte)( + struct kvm *kvm, const struct kvm_spte_change *change); + bool (*has_wbinvd_exit)(void); =20 u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index a37b2efec4a8..d02c0274777a 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -179,6 +179,7 @@ struct kvm_page_fault { /* Derived from mmu and global state. 
*/ const bool is_tdp; const bool nx_huge_page_workaround_enabled; + const bool is_private; =20 /* * Whether a >4KB mapping can be created or is forbidden due to NX @@ -224,6 +225,8 @@ static inline bool is_nx_huge_page_enabled(void) return READ_ONCE(nx_huge_pages); } =20 +static inline bool kvm_is_private_gpa(const struct kvm *kvm, gpa_t gpa); + static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_o= r_gpa, u32 err, bool prefetch) { @@ -238,6 +241,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu= *vcpu, gpa_t cr2_or_gpa, .prefetch =3D prefetch, .is_tdp =3D likely(vcpu->arch.mmu->page_fault =3D=3D kvm_tdp_page_fault), .nx_huge_page_workaround_enabled =3D is_nx_huge_page_enabled(), + .is_private =3D kvm_is_private_gpa(vcpu->kvm, cr2_or_gpa), =20 .max_level =3D vcpu->kvm->arch.tdp_max_page_level, .req_level =3D PG_LEVEL_4K, diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 7e4c96605261..f4284e9cf9ec 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1600,7 +1600,11 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm= _gfn_range *range) flush =3D kvm_handle_gfn_range(kvm, range, kvm_unmap_rmapp); =20 if (is_tdp_mmu_enabled(kvm)) - flush =3D kvm_tdp_mmu_unmap_gfn_range(kvm, range, flush); + /* + * private page needs to be kept and handle page migration + * on next EPT violation. + */ + flush =3D kvm_tdp_mmu_unmap_gfn_range(kvm, range, flush, false); =20 return flush; } @@ -3107,7 +3111,8 @@ static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu= , struct kvm_page_fault *fa * SPTE value without #VE suppress bit cleared * (kvm->arch.shadow_mmio_value =3D 0). */ - if (unlikely(!vcpu->kvm->arch.enable_mmio_caching) || + if (unlikely(!vcpu->kvm->arch.enable_mmio_caching && + !kvm_gfn_shared_mask(vcpu->kvm)) || unlikely(fault->gfn > kvm_mmu_max_gfn())) { *ret_val =3D RET_PF_EMULATE; return true; @@ -3461,7 +3466,12 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *v= cpu) goto out_unlock; =20 if (is_tdp_mmu_enabled(vcpu->kvm)) { - root =3D kvm_tdp_mmu_get_vcpu_root_hpa(vcpu); + if (kvm_gfn_shared_mask(vcpu->kvm) && + !VALID_PAGE(mmu->private_root_hpa)) { + root =3D kvm_tdp_mmu_get_vcpu_root_hpa(vcpu, true); + mmu->private_root_hpa =3D root; + } + root =3D kvm_tdp_mmu_get_vcpu_root_hpa(vcpu, false); mmu->root.hpa =3D root; } else if (shadow_root_level >=3D PT64_ROOT_4LEVEL) { root =3D mmu_alloc_root(vcpu, 0, 0, shadow_root_level, true); @@ -4014,6 +4024,38 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu = *vcpu, gpa_t cr2_or_gpa, kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch); } =20 +/* + * Private page can't be release on mmu_notifier without losing page conte= nts. + * The help, callback, from backing store is needed to allow page migratio= n. + * For now, pin the page. + */ +static bool kvm_faultin_pfn_private(struct kvm_vcpu *vcpu, + struct kvm_page_fault *fault, int *r) +{ + hva_t hva =3D gfn_to_hva_memslot(fault->slot, fault->gfn); + struct page *page[1]; + unsigned int flags; + int npages; + + fault->map_writable =3D false; + fault->pfn =3D KVM_PFN_ERR_FAULT; + *r =3D -1; + if (hva =3D=3D KVM_HVA_ERR_RO_BAD || hva =3D=3D KVM_HVA_ERR_BAD) + return true; + + /* TDX allows only RWX. Read-only isn't supported. 
*/ + WARN_ON_ONCE(!fault->write); + flags =3D FOLL_WRITE | FOLL_LONGTERM; + + npages =3D pin_user_pages_fast(hva, 1, flags, page); + if (npages !=3D 1) + return true; + + fault->map_writable =3D true; + fault->pfn =3D page_to_pfn(page[0]); + return false; +} + static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *= fault, int *r) { struct kvm_memory_slot *slot =3D fault->slot; @@ -4048,6 +4090,9 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, st= ruct kvm_page_fault *fault, } } =20 + if (fault->is_private) + return kvm_faultin_pfn_private(vcpu, fault, r); + async =3D false; fault->pfn =3D __gfn_to_pfn_memslot(slot, fault->gfn, false, &async, fault->write, &fault->map_writable, @@ -4103,6 +4148,18 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcp= u, mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva); } =20 +void kvm_mmu_release_fault(struct kvm *kvm, struct kvm_page_fault *fault, = int r) +{ + if (is_error_noslot_pfn(fault->pfn) || kvm_is_reserved_pfn(fault->pfn)) + return; + + if (fault->is_private) { + if (r !=3D RET_PF_FIXED) + unpin_user_page(pfn_to_page(fault->pfn)); + } else + kvm_release_pfn_clean(fault->pfn); +} + static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault = *fault) { bool is_tdp_mmu_fault =3D is_tdp_mmu(vcpu->arch.mmu); @@ -4157,7 +4214,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, s= truct kvm_page_fault *fault read_unlock(&vcpu->kvm->mmu_lock); else write_unlock(&vcpu->kvm->mmu_lock); - kvm_release_pfn_clean(fault->pfn); + kvm_mmu_release_fault(vcpu->kvm, fault, r); return r; } =20 @@ -5654,6 +5711,7 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, st= ruct kvm_mmu *mmu) =20 mmu->root.hpa =3D INVALID_PAGE; mmu->root.pgd =3D 0; + mmu->private_root_hpa =3D INVALID_PAGE; for (i =3D 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) mmu->prev_roots[i] =3D KVM_MMU_ROOT_INFO_INVALID; =20 @@ -5842,6 +5900,10 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm) * lead to use-after-free. */ if (is_tdp_mmu_enabled(kvm)) + /* + * For now private root is never invalidate during VM is running, + * so this can only happen for shared roots. + */ kvm_tdp_mmu_zap_invalidated_roots(kvm); } =20 @@ -5869,7 +5931,8 @@ static void kvm_mmu_zap_memslot(struct kvm *kvm, stru= ct kvm_memory_slot *slot) .may_block =3D false, }; =20 - flush =3D kvm_tdp_mmu_unmap_gfn_range(kvm, &range, flush); + /* All private page should be zapped on memslot deletion. */ + flush =3D kvm_tdp_mmu_unmap_gfn_range(kvm, &range, flush, true); } else { flush =3D slot_handle_level(kvm, slot, kvm_zap_rmapp, PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL, true); @@ -5977,7 +6040,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_sta= rt, gfn_t gfn_end) if (is_tdp_mmu_enabled(kvm)) { for (i =3D 0; i < KVM_ADDRESS_SPACE_NUM; i++) flush =3D kvm_tdp_mmu_zap_leafs(kvm, i, gfn_start, - gfn_end, true, flush); + gfn_end, true, flush, false); } =20 if (flush) @@ -6010,6 +6073,11 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kv= m, write_unlock(&kvm->mmu_lock); } =20 + /* + * For now this can only happen for non-TD VM, because TD private + * mapping doesn't support write protection. kvm_tdp_mmu_wrprot_slot() + * will give a WARN() if it hits for TD. + */ if (is_tdp_mmu_enabled(kvm)) { read_lock(&kvm->mmu_lock); flush |=3D kvm_tdp_mmu_wrprot_slot(kvm, memslot, start_level); @@ -6098,6 +6166,9 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *= kvm, sp =3D sptep_to_sp(sptep); pfn =3D spte_to_pfn(*sptep); =20 + /* Private page dirty logging is not supported. 
*/ + KVM_BUG_ON(is_private_sptep(sptep), kvm); + /* * We cannot do huge page mapping for indirect shadow pages, * which are found on the last rmap (level =3D 1) when not using @@ -6138,6 +6209,11 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm, write_unlock(&kvm->mmu_lock); } =20 + /* + * This should only be reachable in case of log-dirty, wihch TD private + * mapping doesn't support so far. kvm_tdp_mmu_zap_collapsible_sptes() + * internally gives a WARN() when it hits. + */ if (is_tdp_mmu_enabled(kvm)) { read_lock(&kvm->mmu_lock); kvm_tdp_mmu_zap_collapsible_sptes(kvm, slot); @@ -6424,6 +6500,9 @@ int kvm_mmu_vendor_module_init(void) void kvm_mmu_destroy(struct kvm_vcpu *vcpu) { kvm_mmu_unload(vcpu); + if (is_tdp_mmu_enabled(vcpu->kvm)) + mmu_free_root_page(vcpu->kvm, &vcpu->arch.mmu->private_root_hpa, + NULL); free_mmu_pages(&vcpu->arch.root_mmu); free_mmu_pages(&vcpu->arch.guest_mmu); mmu_free_memory_caches(vcpu); diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_interna= l.h index 123736d651e3..affbfe895dab 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -6,6 +6,8 @@ #include #include =20 +#include "mmu.h" + #undef MMU_DEBUG =20 #ifdef MMU_DEBUG @@ -164,11 +166,30 @@ static inline void kvm_mmu_alloc_private_sp( WARN_ON_ONCE(!sp->private_sp); } =20 +static inline int kvm_alloc_private_sp_for_split( + struct kvm_mmu_page *sp, gfp_t gfp) +{ + gfp &=3D ~__GFP_ZERO; + sp->private_sp =3D (void*)__get_free_page(gfp); + if (!sp->private_sp) + return -ENOMEM; + return 0; +} + static inline void kvm_mmu_free_private_sp(struct kvm_mmu_page *sp) { if (sp->private_sp !=3D KVM_MMU_PRIVATE_SP_ROOT) free_page((unsigned long)sp->private_sp); } + +static inline gfn_t kvm_gfn_for_root(struct kvm *kvm, struct kvm_mmu_page = *root, + gfn_t gfn) +{ + if (is_private_sp(root)) + return kvm_gfn_private(kvm, gfn); + else + return kvm_gfn_shared(kvm, gfn); +} #else static inline bool is_private_sp(struct kvm_mmu_page *sp) { @@ -194,11 +215,25 @@ static inline void kvm_mmu_alloc_private_sp( { } =20 +static inline int kvm_alloc_private_sp_for_split( + struct kvm_mmu_page *sp, gfp_t gfp) +{ + return -ENOMEM; +} + static inline void kvm_mmu_free_private_sp(struct kvm_mmu_page *sp) { } + +static inline gfn_t kvm_gfn_for_root(struct kvm *kvm, struct kvm_mmu_page = *root, + gfn_t gfn) +{ + return gfn; +} #endif =20 +void kvm_mmu_release_fault(struct kvm *kvm, struct kvm_page_fault *fault, = int r); + static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page = *sp) { /* diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 1850689fa76c..7adf70c9a672 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -876,7 +876,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, str= uct kvm_page_fault *fault =20 out_unlock: write_unlock(&vcpu->kvm->mmu_lock); - kvm_release_pfn_clean(fault->pfn); + kvm_mmu_release_fault(vcpu->kvm, fault, r); return r; } =20 diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c index 6d3b3e5a5533..fc427425e0b4 100644 --- a/arch/x86/kvm/mmu/tdp_iter.c +++ b/arch/x86/kvm/mmu/tdp_iter.c @@ -53,6 +53,7 @@ void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu= _page *root, iter->min_level =3D min_level; iter->pt_path[iter->root_level - 1] =3D (tdp_ptep_t)root->spt; iter->as_id =3D kvm_mmu_page_as_id(root); + iter->is_private =3D is_private_sp(root); =20 tdp_iter_restart(iter); } diff --git a/arch/x86/kvm/mmu/tdp_iter.h 
b/arch/x86/kvm/mmu/tdp_iter.h index b1eaf6ec0e0b..882fe7ba4ddb 100644 --- a/arch/x86/kvm/mmu/tdp_iter.h +++ b/arch/x86/kvm/mmu/tdp_iter.h @@ -41,7 +41,7 @@ struct tdp_iter { tdp_ptep_t pt_path[PT64_ROOT_MAX_LEVEL]; /* A pointer to the current SPTE */ tdp_ptep_t sptep; - /* The lowest GFN mapped by the current SPTE */ + /* The lowest GFN (shared bits included) mapped by the current SPTE */ gfn_t gfn; /* The level of the root page given to the iterator */ int root_level; @@ -64,6 +64,9 @@ struct tdp_iter { * level instead of advancing to the next entry. */ bool yielded; + + /* True if this iter is handling private KVM page fault. */ + bool is_private; }; =20 /* diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 9e015b3e0578..ae2ee7cc948a 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -278,18 +278,24 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct = kvm *kvm, kvm_mmu_page_as_id(_root) !=3D _as_id) { \ } else =20 -static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu) +static struct kvm_mmu_page *tdp_mmu_alloc_sp( + struct kvm_vcpu *vcpu, bool private, bool is_root) { struct kvm_mmu_page *sp; =20 sp =3D kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache); sp->spt =3D kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache); =20 + if (private) + kvm_mmu_alloc_private_sp(vcpu, sp, is_root); + else + kvm_mmu_init_private_sp(sp); + return sp; } =20 -static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, tdp_ptep_t sptep, - gfn_t gfn, union kvm_mmu_page_role role) +static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, tdp_ptep_t sptep, gfn= _t gfn, + union kvm_mmu_page_role role) { set_page_private(virt_to_page(sp->spt), (unsigned long)sp); =20 @@ -297,7 +303,6 @@ static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, td= p_ptep_t sptep, sp->gfn =3D gfn; sp->ptep =3D sptep; sp->tdp_mmu_page =3D true; - kvm_mmu_init_private_sp(sp); =20 trace_kvm_mmu_get_page(sp, true); } @@ -316,7 +321,8 @@ static void tdp_mmu_init_child_sp(struct kvm_mmu_page *= child_sp, tdp_mmu_init_sp(child_sp, iter->sptep, iter->gfn, role); } =20 -hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu) +static struct kvm_mmu_page *kvm_tdp_mmu_get_vcpu_root(struct kvm_vcpu *vcp= u, + bool private) { union kvm_mmu_page_role role =3D vcpu->arch.mmu->root_role; struct kvm *kvm =3D vcpu->kvm; @@ -330,11 +336,12 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *= vcpu) */ for_each_tdp_mmu_root(kvm, root, kvm_mmu_role_as_id(role)) { if (root->role.word =3D=3D role.word && + is_private_sp(root) =3D=3D private && kvm_tdp_mmu_get_root(root)) goto out; } =20 - root =3D tdp_mmu_alloc_sp(vcpu); + root =3D tdp_mmu_alloc_sp(vcpu, private, true); tdp_mmu_init_sp(root, NULL, 0, role); =20 refcount_set(&root->tdp_mmu_root_count, 1); @@ -344,12 +351,17 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *= vcpu) spin_unlock(&kvm->arch.tdp_mmu_pages_lock); =20 out: - return __pa(root->spt); + return root; +} + +hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu, bool private) +{ + return __pa(kvm_tdp_mmu_get_vcpu_root(vcpu, private)->spt); } =20 static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, - u64 old_spte, u64 new_spte, int level, - bool shared); + bool private_spte, u64 old_spte, + u64 new_spte, int level, bool shared); =20 static void handle_changed_spte_acc_track(u64 old_spte, u64 new_spte, int = level) { @@ -422,7 +434,8 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct k= vm_mmu_page *sp, * this thread will be 
responsible for ensuring the page is freed. Hence t= he * early rcu_dereferences in the function. */ -static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared) +static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool is_priv= ate, + bool shared) { struct kvm_mmu_page *sp =3D sptep_to_sp(rcu_dereference(pt)); int level =3D sp->role.level; @@ -477,11 +490,22 @@ static void handle_removed_pt(struct kvm *kvm, tdp_pt= ep_t pt, bool shared) */ WRITE_ONCE(*sptep, REMOVED_SPTE); } - handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn, + handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn, is_private, old_child_spte, REMOVED_SPTE, level, shared); } =20 + if (is_private && WARN_ON(static_call(kvm_x86_free_private_sp)( + kvm, sp->gfn, sp->role.level, + kvm_mmu_private_sp(sp)))) { + /* + * Failed to unlink Secure EPT page and there is nothing to do + * further. Intentionally leak the page to prevent the kernel + * from accessing the encrypted page. + */ + kvm_mmu_init_private_sp(sp); + } + call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback); } =20 @@ -490,6 +514,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep= _t pt, bool shared) * @kvm: kvm instance * @as_id: the address space of the paging structure the SPTE was a part of * @gfn: the base GFN that was mapped by the SPTE + * @private_spte: the SPTE is private or not * @old_spte: The value of the SPTE before the change * @new_spte: The value of the SPTE after the change * @level: the level of the PT the SPTE is part of in the paging structure @@ -501,14 +526,30 @@ static void handle_removed_pt(struct kvm *kvm, tdp_pt= ep_t pt, bool shared) * This function must be called for all TDP SPTE modifications. */ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, - u64 old_spte, u64 new_spte, int level, - bool shared) + bool private_spte, u64 old_spte, + u64 new_spte, int level, bool shared) { bool was_present =3D is_shadow_present_pte(old_spte); bool is_present =3D is_shadow_present_pte(new_spte); bool was_leaf =3D was_present && is_last_spte(old_spte, level); bool is_leaf =3D is_present && is_last_spte(new_spte, level); - bool pfn_changed =3D spte_to_pfn(old_spte) !=3D spte_to_pfn(new_spte); + kvm_pfn_t old_pfn =3D spte_to_pfn(old_spte); + kvm_pfn_t new_pfn =3D spte_to_pfn(new_spte); + bool pfn_changed =3D old_pfn !=3D new_pfn; + struct kvm_spte_change change =3D { + .gfn =3D gfn, + .level =3D level, + .old =3D { + .pfn =3D old_pfn, + .is_present =3D was_present, + .is_leaf =3D was_leaf, + }, + .new =3D { + .pfn =3D new_pfn, + .is_present =3D is_present, + .is_leaf =3D is_leaf, + }, + }; =20 WARN_ON(level > PT64_ROOT_MAX_LEVEL); WARN_ON(level < PG_LEVEL_4K); @@ -575,7 +616,7 @@ static void __handle_changed_spte(struct kvm *kvm, int = as_id, gfn_t gfn, =20 if (was_leaf && is_dirty_spte(old_spte) && (!is_present || !is_dirty_spte(new_spte) || pfn_changed)) - kvm_set_pfn_dirty(spte_to_pfn(old_spte)); + kvm_set_pfn_dirty(old_pfn); =20 /* * Recursively handle child PTs if the change removed a subtree from @@ -584,16 +625,47 @@ static void __handle_changed_spte(struct kvm *kvm, in= t as_id, gfn_t gfn, * pages are kernel allocations and should never be migrated. 
*/ if (was_present && !was_leaf && - (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed))) - handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared); + (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed))) { + WARN_ON(private_spte !=3D + is_private_sptep(spte_to_child_pt(old_spte, level))); + handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), + private_spte, shared); + } + + /* + * Special handling for the private mapping. We are either + * setting up new mapping at middle level page table, or leaf, + * or tearing down existing mapping. + * + * This is after handling lower page table by above + * handle_remove_tdp_mmu_page(). S-EPT requires to remove S-EPT tables + * after removing childrens. + */ + if (private_spte && + /* Ignore change of software only bits. e.g. host_writable */ + (was_leaf !=3D is_leaf || was_present !=3D is_present || pfn_changed)= ) { + void *sept_page =3D NULL; + + if (is_present && !is_leaf) { + struct kvm_mmu_page *sp =3D to_shadow_page(pfn_to_hpa(new_pfn)); + + sept_page =3D kvm_mmu_private_sp(sp); + WARN_ON(!sept_page); + WARN_ON(sp->role.level + 1 !=3D level); + WARN_ON(sp->gfn !=3D gfn); + } + change.sept_page =3D sept_page; + + static_call(kvm_x86_handle_changed_private_spte)(kvm, &change); + } } =20 static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, - u64 old_spte, u64 new_spte, int level, - bool shared) + bool private_spte, u64 old_spte, u64 new_spte, + int level, bool shared) { - __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, - shared); + __handle_changed_spte(kvm, as_id, gfn, private_spte, + old_spte, new_spte, level, shared); handle_changed_spte_acc_track(old_spte, new_spte, level); handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte, new_spte, level); @@ -620,6 +692,8 @@ static inline int tdp_mmu_set_spte_atomic(struct kvm *k= vm, struct tdp_iter *iter, u64 new_spte) { + bool freeze_spte =3D iter->is_private && !is_removed_spte(new_spte); + u64 tmp_spte =3D freeze_spte ? REMOVED_SPTE : new_spte; u64 *sptep =3D rcu_dereference(iter->sptep); u64 old_spte; =20 @@ -637,7 +711,7 @@ static inline int tdp_mmu_set_spte_atomic(struct kvm *k= vm, * Note, fast_pf_fix_direct_spte() can also modify TDP MMU SPTEs and * does not hold the mmu_lock. */ - old_spte =3D cmpxchg64(sptep, iter->old_spte, new_spte); + old_spte =3D cmpxchg64(sptep, iter->old_spte, tmp_spte); if (old_spte !=3D iter->old_spte) { /* * The page table entry was modified by a different logical @@ -649,10 +723,14 @@ static inline int tdp_mmu_set_spte_atomic(struct kvm = *kvm, return -EBUSY; } =20 - __handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte, - new_spte, iter->level, true); + __handle_changed_spte( + kvm, iter->as_id, iter->gfn, iter->is_private, + iter->old_spte, new_spte, iter->level, true); handle_changed_spte_acc_track(iter->old_spte, new_spte, iter->level); =20 + if (freeze_spte) + kvm_tdp_mmu_write_spte(sptep, new_spte); + return 0; } =20 @@ -715,10 +793,12 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm = *kvm, * unless performing certain dirty logging operations. * Leaving record_dirty_log unset in that case prevents page * writes from being double counted. + * @is_private: The fault is private. 
*/ static void __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t spte= p, u64 old_spte, u64 new_spte, gfn_t gfn, int level, - bool record_acc_track, bool record_dirty_log) + bool record_acc_track, bool record_dirty_log, + bool is_private) { lockdep_assert_held_write(&kvm->mmu_lock); =20 @@ -733,7 +813,8 @@ static void __tdp_mmu_set_spte(struct kvm *kvm, int as_= id, tdp_ptep_t sptep, =20 kvm_tdp_mmu_write_spte(sptep, new_spte); =20 - __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false); + __handle_changed_spte(kvm, as_id, gfn, is_private, + old_spte, new_spte, level, false); =20 if (record_acc_track) handle_changed_spte_acc_track(old_spte, new_spte, level); @@ -750,7 +831,7 @@ static inline void _tdp_mmu_set_spte(struct kvm *kvm, s= truct tdp_iter *iter, =20 __tdp_mmu_set_spte(kvm, iter->as_id, iter->sptep, iter->old_spte, new_spte, iter->gfn, iter->level, - record_acc_track, record_dirty_log); + record_acc_track, record_dirty_log, iter->is_private); } =20 static inline void tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter, @@ -783,8 +864,11 @@ static inline void tdp_mmu_set_spte_no_dirty_log(struc= t kvm *kvm, continue; \ else =20 -#define tdp_mmu_for_each_pte(_iter, _mmu, _start, _end) \ - for_each_tdp_pte(_iter, to_shadow_page(_mmu->root.hpa), _start, _end) +#define tdp_mmu_for_each_pte(_iter, _mmu, _private, _start, _end) \ + for_each_tdp_pte(_iter, \ + to_shadow_page((_private) ? _mmu->private_root_hpa : \ + _mmu->root.hpa), \ + _start, _end) =20 /* * Yield if the MMU lock is contended or this thread needs to return contr= ol @@ -921,7 +1005,7 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mm= u_page *sp) =20 __tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, SHADOW_NONPRESENT_VALUE, sp->gfn, sp->role.level + 1, - true, true); + true, true, is_private_sp(sp)); =20 return true; } @@ -937,13 +1021,21 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_= mmu_page *sp) * operation can cause a soft lockup. */ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root, - gfn_t start, gfn_t end, bool can_yield, bool flush) + gfn_t start, gfn_t end, bool can_yield, bool flush, + bool drop_private) { struct tdp_iter iter; =20 end =3D min(end, tdp_mmu_max_gfn_exclusive()); =20 lockdep_assert_held_write(&kvm->mmu_lock); + /* + * Extend [start, end) to include GFN shared bit when TDX is enabled, + * and for shared mapping range. + */ + WARN_ON_ONCE(!is_private_sp(root) && drop_private); + start =3D kvm_gfn_for_root(kvm, root, start); + end =3D kvm_gfn_for_root(kvm, root, end); =20 rcu_read_lock(); =20 @@ -978,12 +1070,13 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struc= t kvm_mmu_page *root, * MMU lock. */ bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, gfn_t = end, - bool can_yield, bool flush) + bool can_yield, bool flush, bool drop_private) { struct kvm_mmu_page *root; =20 for_each_tdp_mmu_root_yield_safe(kvm, root, as_id) - flush =3D tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, flush); + flush =3D tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, flush, + drop_private && is_private_sp(root)); =20 return flush; } @@ -1043,6 +1136,12 @@ void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kv= m) =20 lockdep_assert_held_write(&kvm->mmu_lock); list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) { + /* + * Skip private root since private page table + * is only torn down when VM is destroyed. 
+ */ + if (is_private_sp(root)) + continue; if (!root->role.invalid && !WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root))) { root->role.invalid =3D true; @@ -1063,14 +1162,22 @@ static int tdp_mmu_map_handle_target_level(struct k= vm_vcpu *vcpu, u64 new_spte; int ret =3D RET_PF_FIXED; bool wrprot =3D false; + unsigned long pte_access =3D ACC_ALL; + gfn_t gfn_unalias =3D iter->gfn & ~kvm_gfn_shared_mask(vcpu->kvm); =20 WARN_ON(sp->role.level !=3D fault->goal_level); + + /* TDX shared GPAs are not executable; enforce this for the SDV. */ + if (kvm_gfn_shared_mask(vcpu->kvm) && !fault->is_private) + pte_access &=3D ~ACC_EXEC_MASK; + if (unlikely(!fault->slot)) - new_spte =3D make_mmio_spte(vcpu, iter->gfn, ACC_ALL); + new_spte =3D make_mmio_spte(vcpu, gfn_unalias, pte_access); else - wrprot =3D make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn, - fault->pfn, iter->old_spte, fault->prefetch, true, - fault->map_writable, &new_spte); + wrprot =3D make_spte(vcpu, sp, fault->slot, pte_access, + gfn_unalias, fault->pfn, iter->old_spte, + fault->prefetch, true, fault->map_writable, + &new_spte); =20 if (new_spte =3D=3D iter->old_spte) ret =3D RET_PF_SPURIOUS; @@ -1149,8 +1256,7 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct td= p_iter *iter, return 0; } =20 -static int tdp_mmu_populate_nonleaf( - struct kvm_vcpu *vcpu, struct tdp_iter *iter, bool account_nx) +static int tdp_mmu_populate_nonleaf(struct kvm_vcpu *vcpu, struct tdp_iter= *iter, bool account_nx) { struct kvm_mmu_page *sp; int ret; =20 WARN_ON(is_shadow_present_pte(iter->old_spte)); WARN_ON(is_removed_spte(iter->old_spte)); =20 - sp =3D tdp_mmu_alloc_sp(vcpu); + sp =3D tdp_mmu_alloc_sp(vcpu, iter->is_private, false); tdp_mmu_init_child_sp(sp, iter); =20 ret =3D tdp_mmu_link_sp(vcpu->kvm, iter, sp, account_nx, true); @@ -1175,6 +1281,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm= _page_fault *fault) { struct kvm_mmu *mmu =3D vcpu->arch.mmu; struct tdp_iter iter; + gfn_t raw_gfn; + bool is_private =3D fault->is_private; int ret; =20 kvm_mmu_hugepage_adjust(vcpu, fault); @@ -1183,7 +1291,16 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kv= m_page_fault *fault) =20 rcu_read_lock(); =20 - tdp_mmu_for_each_pte(iter, mmu, fault->gfn, fault->gfn + 1) { + raw_gfn =3D gpa_to_gfn(fault->addr); + + if (is_error_noslot_pfn(fault->pfn) || kvm_is_reserved_pfn(fault->pfn)) { + if (is_private) { + rcu_read_unlock(); + return -EFAULT; + } + } + + tdp_mmu_for_each_pte(iter, mmu, is_private, raw_gfn, raw_gfn + 1) { if (fault->nx_huge_page_workaround_enabled) disallowed_hugepage_adjust(fault, iter.old_spte, iter.level); =20 @@ -1199,6 +1316,12 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kv= m_page_fault *fault) is_large_pte(iter.old_spte)) { if (tdp_mmu_zap_spte_atomic(vcpu->kvm, &iter)) break; + /* + * TODO: large page support.
+ * Doesn't support large page for TDX now + */ + WARN_ON(is_private_sptep(iter.sptep)); + =20 /* * The iter must explicitly re-read the spte here @@ -1240,11 +1363,13 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct k= vm_page_fault *fault) return ret; } =20 +/* Used by mmu notifier via kvm_unmap_gfn_range() */ bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *ra= nge, - bool flush) + bool flush, bool drop_private) { return kvm_tdp_mmu_zap_leafs(kvm, range->slot->as_id, range->start, - range->end, range->may_block, flush); + range->end, range->may_block, flush, + drop_private); } =20 typedef bool (*tdp_handler_t)(struct kvm *kvm, struct tdp_iter *iter, @@ -1427,7 +1552,8 @@ bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm, return spte_set; } =20 -static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(gfp_t gfp) +static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split( + gfp_t gfp, bool is_private) { struct kvm_mmu_page *sp; =20 @@ -1438,6 +1564,12 @@ static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_s= plit(gfp_t gfp) return NULL; =20 sp->spt =3D (void *)__get_free_page(gfp); + if (is_private) { + if (kvm_alloc_private_sp_for_split(sp, gfp)) { + free_page((unsigned long)sp->spt); + sp->spt =3D NULL; + } + } if (!sp->spt) { kmem_cache_free(mmu_page_header_cache, sp); return NULL; @@ -1451,6 +1583,11 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_spl= it(struct kvm *kvm, bool shared) { struct kvm_mmu_page *sp; + bool is_private =3D iter->is_private; + + /* TODO: For now large page isn't supported for private SPTE. */ + WARN_ON(is_private); + WARN_ON(iter->is_private !=3D is_private_sptep(iter->sptep)); =20 /* * Since we are allocating while under the MMU lock we have to be @@ -1461,7 +1598,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_spli= t(struct kvm *kvm, * If this allocation fails we drop the lock and retry with reclaim * allowed. */ - sp =3D __tdp_mmu_alloc_sp_for_split(GFP_NOWAIT | __GFP_ACCOUNT); + sp =3D __tdp_mmu_alloc_sp_for_split(GFP_NOWAIT | __GFP_ACCOUNT, is_privat= e); if (sp) return sp; =20 @@ -1473,7 +1610,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_spli= t(struct kvm *kvm, write_unlock(&kvm->mmu_lock); =20 iter->yielded =3D true; - sp =3D __tdp_mmu_alloc_sp_for_split(GFP_KERNEL_ACCOUNT); + sp =3D __tdp_mmu_alloc_sp_for_split(GFP_KERNEL_ACCOUNT, is_private); =20 if (shared) read_lock(&kvm->mmu_lock); @@ -1863,10 +2000,14 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64= addr, u64 *sptes, struct kvm_mmu *mmu =3D vcpu->arch.mmu; gfn_t gfn =3D addr >> PAGE_SHIFT; int leaf =3D -1; + bool is_private =3D kvm_is_private_gpa(vcpu->kvm, addr); =20 *root_level =3D vcpu->arch.mmu->root_role.level; =20 - tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) { + if (WARN_ON(is_private)) + return leaf; + + tdp_mmu_for_each_pte(iter, mmu, false, gfn, gfn + 1) { leaf =3D iter.level; sptes[leaf] =3D iter.old_spte; } @@ -1893,7 +2034,10 @@ u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_v= cpu *vcpu, u64 addr, gfn_t gfn =3D addr >> PAGE_SHIFT; tdp_ptep_t sptep =3D NULL; =20 - tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) { + /* fast page fault for private GPA isn't supported. 
*/ + WARN_ON_ONCE(kvm_is_private_gpa(vcpu->kvm, addr)); + + tdp_mmu_for_each_pte(iter, mmu, false, gfn, gfn + 1) { *spte =3D iter.old_spte; sptep =3D iter.sptep; } diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index c163f7cc23ca..d1655571eb2f 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -5,7 +5,7 @@ =20 #include =20 -hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu); +hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu, bool private); =20 __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *= root) { @@ -16,7 +16,8 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu= _page *root, bool shared); =20 bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, - gfn_t end, bool can_yield, bool flush); + gfn_t end, bool can_yield, bool flush, + bool drop_private); bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp); void kvm_tdp_mmu_zap_all(struct kvm *kvm); void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm); @@ -25,7 +26,7 @@ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm); int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault); =20 bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *ra= nge, - bool flush); + bool flush, bool drop_private); bool kvm_tdp_mmu_age_gfn_range(struct kvm *kvm, struct kvm_gfn_range *rang= e); bool kvm_tdp_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range= ); bool kvm_tdp_mmu_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range= ); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 6edce5de54ff..0ed431a1e35f 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -196,6 +196,7 @@ bool kvm_is_reserved_pfn(kvm_pfn_t pfn) =20 return true; } +EXPORT_SYMBOL_GPL(kvm_is_reserved_pfn); =20 /* * Switches to specified vcpu, until a matching vcpu_put() --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 328D6C433FE for ; Thu, 5 May 2022 18:21:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384073AbiEESXx (ORCPT ); Thu, 5 May 2022 14:23:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36896 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383309AbiEESTo (ORCPT ); Thu, 5 May 2022 14:19:44 -0400 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 338635DBCE; Thu, 5 May 2022 11:16:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774560; x=1683310560; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Fb3dBDV4drTH1Irv7AqwIGDFRaog5goehiwWHDo/z9Q=; b=FBPDH40SO0YDzvz4IsDSabrmXWYhTpVY0C0UUidp7C0fvtfm/xsqNA8f jbgFY/JcH22HQdLWbOAh43w2aT59jUCsWzHZffQ7b6ktbSFxxMTFj/KEa 6NbFqsQb6+n4pAZEiWBMadvAE6sPoeUqayQuxbi9irmGBy6jANh4QQvym UWqXj2mcUkWGJW/jSTLetcPwtBdvPuJRR8KzYnB5VfPOT7g/SD9dON/vc pFjWSYSxzLFIjfCEIjrYS61gHBWjIHIe3FizTWK3IacG6vOd+r7J6fQ/l K5U3AzbUfvE/DGArkryXOh6PMDYqSq8VMPSOvw9KkHKGRyjyyZmCiCTLF A==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248742036" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248742036" Received: from fmsmga002.fm.intel.com 
([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:47 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083310" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:47 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 048/104] [MARKER] The start of TDX KVM patch series: TDX EPT violation Date: Thu, 5 May 2022 11:14:42 -0700 Message-Id: <3bf0b02c05298a4d3f33a2ee31a629401fb16909.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This empty commit is to mark the start of patch series of TDX EPT violation. Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/intel-tdx-layer-status.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentat= ion/virt/kvm/intel-tdx-layer-status.rst index d5cace00c433..c3e675bea802 100644 --- a/Documentation/virt/kvm/intel-tdx-layer-status.rst +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -19,12 +19,12 @@ Patch Layer status * TDX architectural definitions: Applied * TD VM creation/destruction: Applied * TD vcpu creation/destruction: Applied -* TDX EPT violation: Not yet +* TDX EPT violation: Applying * TD finalization: Not yet * TD vcpu enter/exit: Not yet * TD vcpu interrupts/exit/hypercall: Not yet =20 * KVM MMU GPA shared bits: Applied * KVM TDP refactoring for TDX: Applied -* KVM TDP MMU hooks: Applying +* KVM TDP MMU hooks: Applied * KVM TDP MMU MapGPA: Not yet --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7777EC433FE for ; Thu, 5 May 2022 18:20:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349180AbiEESXm (ORCPT ); Thu, 5 May 2022 14:23:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37008 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383290AbiEESTo (ORCPT ); Thu, 5 May 2022 14:19:44 -0400 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 87BD7393F6; Thu, 5 May 2022 11:15:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774559; x=1683310559; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=69sh3pw1NcgscHP0fbjDPQ4nsOQbDzwcSXBGPmnH3Sc=; b=E+qBnxiWagoUAdW6fvQciAOVdHPwex4u8Ca6flkFyKcMQ1Di27xbezNc l1trEjSWhM0dN0TQP6N7H7cgJV0zF5Ram0hAgOGv5WPFTRgAJbPazxsPl qHR7ybYB2DC/2cvovyaCjaHSjbZhYlaGGfg9ZnK/7F0hbce3LIFOgjF2V iksSfF9MmrZE/THI56y8IB6KCvbLHSO4y9pgh1vvjijfA9OgtQrCqZYbf ZketWL3K+Ldp0PA1GEK/yntnLOzR5MYE0LEhi58OfkwXsR3SWh7Lgef8O vYQdkrsL8nUiKa5ZWAYcgfWh17Hg+WZ+jkqF5vmzaSXwmM3oq4JPvSx0N A==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; 
a="248742037" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248742037" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:47 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083313" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:47 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 049/104] KVM: x86/mmu: Disallow dirty logging for x86 TDX Date: Thu, 5 May 2022 11:14:43 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson TDX doesn't support dirty logging. Report dirty logging isn't supported so that device model, for example qemu, can properly handle it. Signed-off-by: Sean Christopherson Signed-off-by: Xiaoyao Li Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/x86.c | 5 +++++ include/linux/kvm_host.h | 1 + virt/kvm/kvm_main.c | 15 ++++++++++++--- 3 files changed, 18 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index e9b5d6007025..8e6d54faa7ba 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -13122,6 +13122,11 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, un= signed int size, } EXPORT_SYMBOL_GPL(kvm_sev_es_string_io); =20 +bool kvm_arch_dirty_log_supported(struct kvm *kvm) +{ + return kvm->arch.vm_type !=3D KVM_X86_TDX_VM; +} + EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 55dd08cca5d2..8488d85fdb33 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1452,6 +1452,7 @@ bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcp= u *vcpu); int kvm_arch_post_init_vm(struct kvm *kvm); void kvm_arch_pre_destroy_vm(struct kvm *kvm); int kvm_arch_create_vm_debugfs(struct kvm *kvm); +bool kvm_arch_dirty_log_supported(struct kvm *kvm); =20 #ifndef __KVM_HAVE_ARCH_VM_ALLOC /* diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 0ed431a1e35f..4bf7178e42bd 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1458,9 +1458,18 @@ static void kvm_replace_memslot(struct kvm *kvm, } } =20 -static int check_memory_region_flags(const struct kvm_userspace_memory_reg= ion *mem) +bool __weak kvm_arch_dirty_log_supported(struct kvm *kvm) { - u32 valid_flags =3D KVM_MEM_LOG_DIRTY_PAGES; + return true; +} + +static int check_memory_region_flags(struct kvm *kvm, + const struct kvm_userspace_memory_region *mem) +{ + u32 valid_flags =3D 0; + + if (kvm_arch_dirty_log_supported(kvm)) + valid_flags |=3D KVM_MEM_LOG_DIRTY_PAGES; =20 #ifdef __KVM_HAVE_READONLY_MEM valid_flags |=3D KVM_MEM_READONLY; @@ -1862,7 +1871,7 @@ int __kvm_set_memory_region(struct kvm *kvm, int as_id, id; int r; =20 - r =3D check_memory_region_flags(mem); + r =3D check_memory_region_flags(kvm, mem); if (r) return r; =20 --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on 
aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 24207C433EF for ; Thu, 5 May 2022 18:22:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384262AbiEESZs (ORCPT ); Thu, 5 May 2022 14:25:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36910 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383245AbiEESTq (ORCPT ); Thu, 5 May 2022 14:19:46 -0400 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 64DBF15837; Thu, 5 May 2022 11:16:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774562; x=1683310562; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ahkXmwLQ9uVoNjJiiXbRcJvEeFXcAtRP+bydslkMsvQ=; b=oGdf90iWZ2Qn8A0RY1sPI2YQwEFLuNdaL3MHUZkfZSQXWE6/GfMxKr+P uleAex/ZiPHOCiMzNGXRIiEQ/Rps7tAwCYKYZDupi8aopvbJcAFjCOwIl L3dFzwuEC2aTopYm199Na5PjalKLiMefArHsoqPs7hbEYNKg7T59t/zT3 viV2xt9LXkc2m7SmN8xxgsH/Qi3aaph4nWWBGMEjnlCHGVJ1jjOlH1Mxj umQcRfzo1LaHdu3q3H6buVHYwJcoksbRO47y98HlT49UqDQwHEtK7F8OF 4fFrybUXMVlpfB+y89G71UFJeImZkP8fg1Qs6n8dB36+4um9LVBUri5mb A==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248742039" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248742039" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:47 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083317" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:47 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 050/104] KVM: x86/tdp_mmu: Ignore unsupported mmu operation on private GFNs Date: Thu, 5 May 2022 11:14:44 -0700 Message-Id: <9be9dc86983ab1ffdc679c1ab5de025d1487dc40.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Some KVM MMU operations (dirty page logging, page migration, aging page) aren't supported for private GFNs (yet) with the first generation of TDX. Silently return on unsupported TDX KVM MMU operations. 
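To make the GFN aliasing concrete, here is a minimal sketch of how a GFN range is re-based onto a root's alias before the range is walked. It mirrors the kvm_gfn_for_root() calls in the diff below; the helper name itself is a hypothetical stand-in, not part of this patch:

/*
 * Sketch only: re-base a GFN onto a root's alias.  For a TD, shared
 * mappings carry the GFN shared bit while private mappings use the
 * unaliased GFN; kvm_gfn_shared_mask() is 0 for non-TDX VMs, so this
 * is a no-op there.
 */
static gfn_t sketch_gfn_for_root(struct kvm *kvm, bool private_root, gfn_t gfn)
{
	gfn_t shared_mask = kvm_gfn_shared_mask(kvm);

	return private_root ? (gfn & ~shared_mask) : (gfn | shared_mask);
}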
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu/mmu_internal.h | 1 + arch/x86/kvm/mmu/tdp_mmu.c | 74 ++++++++++++++++++++++++++++++--- arch/x86/kvm/x86.c | 3 ++ 3 files changed, 73 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_interna= l.h index affbfe895dab..1c4884220660 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -5,6 +5,7 @@ #include #include #include +#include "mmu.h" =20 #include "mmu.h" =20 diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index ae2ee7cc948a..2aa2cb8a9b05 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -387,6 +387,8 @@ static void handle_changed_spte_dirty_log(struct kvm *k= vm, int as_id, gfn_t gfn, =20 if ((!is_writable_pte(old_spte) || pfn_changed) && is_writable_pte(new_spte)) { + /* For memory slot operations, use GFN without aliasing */ + gfn =3D gfn & ~kvm_gfn_shared_mask(kvm); slot =3D __gfn_to_memslot(__kvm_memslots(kvm, as_id), gfn); mark_page_dirty_in_slot(kvm, slot, gfn); } @@ -1377,7 +1379,8 @@ typedef bool (*tdp_handler_t)(struct kvm *kvm, struct= tdp_iter *iter, =20 static __always_inline bool kvm_tdp_mmu_handle_gfn(struct kvm *kvm, struct kvm_gfn_range *range, - tdp_handler_t handler) + tdp_handler_t handler, + bool only_shared) { struct kvm_mmu_page *root; struct tdp_iter iter; @@ -1388,9 +1391,23 @@ static __always_inline bool kvm_tdp_mmu_handle_gfn(s= truct kvm *kvm, * into this helper allow blocking; it'd be dead, wasteful code. */ for_each_tdp_mmu_root(kvm, root, range->slot->as_id) { + gfn_t start; + gfn_t end; + + if (only_shared && is_private_sp(root)) + continue; + rcu_read_lock(); =20 - tdp_root_for_each_leaf_pte(iter, root, range->start, range->end) + /* + * For TDX shared mapping, set GFN shared bit to the range, + * so the handler() doesn't need to set it, to avoid duplicated + * code in multiple handler()s. + */ + start =3D kvm_gfn_for_root(kvm, root, range->start); + end =3D kvm_gfn_for_root(kvm, root, range->end); + + tdp_root_for_each_leaf_pte(iter, root, start, end) ret |=3D handler(kvm, &iter, range); =20 rcu_read_unlock(); @@ -1434,7 +1451,12 @@ static bool age_gfn_range(struct kvm *kvm, struct td= p_iter *iter, =20 bool kvm_tdp_mmu_age_gfn_range(struct kvm *kvm, struct kvm_gfn_range *rang= e) { - return kvm_tdp_mmu_handle_gfn(kvm, range, age_gfn_range); + /* + * First TDX generation doesn't support clearing A bit for private + * mapping, since there's no secure EPT API to support it. However + * it's a legitimate request for TDX guest. + */ + return kvm_tdp_mmu_handle_gfn(kvm, range, age_gfn_range, true); } =20 static bool test_age_gfn(struct kvm *kvm, struct tdp_iter *iter, @@ -1445,7 +1467,7 @@ static bool test_age_gfn(struct kvm *kvm, struct tdp_= iter *iter, =20 bool kvm_tdp_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { - return kvm_tdp_mmu_handle_gfn(kvm, range, test_age_gfn); + return kvm_tdp_mmu_handle_gfn(kvm, range, test_age_gfn, false); } =20 static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter, @@ -1490,8 +1512,11 @@ bool kvm_tdp_mmu_set_spte_gfn(struct kvm *kvm, struc= t kvm_gfn_range *range) * No need to handle the remote TLB flush under RCU protection, the * target SPTE _must_ be a leaf SPTE, i.e. cannot result in freeing a * shadow page. See the WARN on pfn_changed in __handle_changed_spte(). + * + * .change_pte() callback should not happen for private page, because + * for now TDX private pages are pinned during VM's life time. 
*/ - return kvm_tdp_mmu_handle_gfn(kvm, range, set_spte_gfn); + return kvm_tdp_mmu_handle_gfn(kvm, range, set_spte_gfn, true); } =20 /* @@ -1545,6 +1570,14 @@ bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm, =20 lockdep_assert_held_read(&kvm->mmu_lock); =20 + /* + * Because first TDX generation doesn't support write protecting private + * mappings and kvm_arch_dirty_log_supported(kvm) returns false, it's a bug + * to reach here for guest TD. + */ + if (WARN_ON(!kvm_arch_dirty_log_supported(kvm))) + return false; + for_each_valid_tdp_mmu_root_yield_safe(kvm, root, slot->as_id, true) spte_set |=3D wrprot_gfn_range(kvm, root, slot->base_gfn, slot->base_gfn + slot->npages, min_level); @@ -1809,6 +1842,14 @@ bool kvm_tdp_mmu_clear_dirty_slot(struct kvm *kvm, =20 lockdep_assert_held_read(&kvm->mmu_lock); =20 + /* + * First TDX generation doesn't support clearing dirty bit, + * since there's no secure EPT API to support it. It is a + * bug to reach here for TDX guest. + */ + if (WARN_ON(!kvm_arch_dirty_log_supported(kvm))) + return false; + for_each_valid_tdp_mmu_root_yield_safe(kvm, root, slot->as_id, true) spte_set |=3D clear_dirty_gfn_range(kvm, root, slot->base_gfn, slot->base_gfn + slot->npages); @@ -1875,6 +1916,13 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *k= vm, struct kvm_mmu_page *root; =20 lockdep_assert_held_write(&kvm->mmu_lock); + /* + * First TDX generation doesn't support clearing dirty bit, + * since there's no secure EPT API to support it. For now silently + * ignore KVM_CLEAR_DIRTY_LOG. + */ + if (!kvm_arch_dirty_log_supported(kvm)) + return; for_each_tdp_mmu_root(kvm, root, slot->as_id) clear_dirty_pt_masked(kvm, root, gfn, mask, wrprot); } @@ -1928,6 +1976,13 @@ void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *k= vm, =20 lockdep_assert_held_read(&kvm->mmu_lock); =20 + /* + * This should only be reachable when dirty logging is supported; + * it's a bug to reach here otherwise. + */ + if (WARN_ON(!kvm_arch_dirty_log_supported(kvm))) + return; + for_each_valid_tdp_mmu_root_yield_safe(kvm, root, slot->as_id, true) zap_collapsible_spte_range(kvm, root, slot); } @@ -1981,6 +2036,15 @@ bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm, bool spte_set =3D false; =20 lockdep_assert_held_write(&kvm->mmu_lock); + + /* + * First TDX generation doesn't support write protecting private + * mappings, silently ignore the request. KVM_GET_DIRTY_LOG etc. + * can reach here, no warning. + */ + if (!kvm_arch_dirty_log_supported(kvm)) + return false; + for_each_tdp_mmu_root(kvm, root, slot->as_id) spte_set |=3D write_protect_gfn(kvm, root, gfn, min_level); =20 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8e6d54faa7ba..77a9403bdd02 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -12181,6 +12181,9 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kv= m, u32 new_flags =3D new ? new->flags : 0; bool log_dirty_pages =3D new_flags & KVM_MEM_LOG_DIRTY_PAGES; =20 + if (!kvm_arch_dirty_log_supported(kvm) && log_dirty_pages) + return; + /* * Update CPU dirty logging if dirty logging is being toggled. This * applies to all operations.
--=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D1FA7C433F5 for ; Thu, 5 May 2022 18:24:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383721AbiEES2A (ORCPT ); Thu, 5 May 2022 14:28:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36854 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383337AbiEESTr (ORCPT ); Thu, 5 May 2022 14:19:47 -0400 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 816B518E10; Thu, 5 May 2022 11:16:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774563; x=1683310563; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=eYmfvBfYMquS6jwfZ3SHdNcbFWex4Khf+QLEW+sIm5g=; b=m2FidzIGh0E1vo+TmnE8MhYqTXGYV7tSTmm7N5nP845K+TUvkW4paaON b/FUnV7ZXGR8Ppcwq7IxxiGx4izRcoT2HqI5uSQOcfnQmtCCiAgAacTNq MILzO9iG2bwsRObl6miSFl9JXSlwYJe5qq3YCpSMmnQUzCD0SlM4UhAeb tBx+QGBTgeErrgt82xViBn4Hdak22ri5o4wY/FlMNjT3UoWJKFgr4lq72 K9Fug5XAwrJCTbvaHM6Pdh0+P/5/L09GDM89N5KT+eF+af8606Bv2Fju2 q9FCLEt8U/mG36cmBufD2W1GM5oyI1BDtl8brV2m1tpb4lZ/5URKL9gBL A==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248742041" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248742041" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:48 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083321" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:47 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 051/104] KVM: VMX: Split out guts of EPT violation to common/exposed function Date: Thu, 5 May 2022 11:14:45 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson The difference of TDX EPT violation is how to retrieve information, GPA, and exit qualification. To share the code to handle EPT violation, split out the guts of EPT violation handler so that VMX/TDX exit handler can call it after retrieving GPA and exit qualification. 
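As a hedged illustration of the intended reuse (the TDX-side caller lands later in this series; the tdexit_*() accessor names below are assumptions, not part of this patch), a TDX exit handler only needs to fetch the GPA and exit qualification before calling the common helper:

/*
 * Illustrative sketch: a TDX EPT-violation handler built on the common
 * helper introduced below.  tdexit_gpa()/tdexit_exit_qual() are assumed
 * accessors for the TDX exit information.
 */
static int tdx_handle_ept_violation_sketch(struct kvm_vcpu *vcpu)
{
	unsigned long exit_qual = tdexit_exit_qual(vcpu);
	gpa_t gpa = tdexit_gpa(vcpu);

	return __vmx_handle_ept_violation(vcpu, gpa, exit_qual);
}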
Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/vmx/common.h | 33 +++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/vmx.c | 32 ++++++-------------------------- 2 files changed, 39 insertions(+), 26 deletions(-) create mode 100644 arch/x86/kvm/vmx/common.h diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h new file mode 100644 index 000000000000..235908f3e044 --- /dev/null +++ b/arch/x86/kvm/vmx/common.h @@ -0,0 +1,33 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef __KVM_X86_VMX_COMMON_H +#define __KVM_X86_VMX_COMMON_H + +#include + +#include "mmu.h" + +static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t = gpa, + unsigned long exit_qualification) +{ + u64 error_code; + + /* Is it a read fault? */ + error_code =3D (exit_qualification & EPT_VIOLATION_ACC_READ) + ? PFERR_USER_MASK : 0; + /* Is it a write fault? */ + error_code |=3D (exit_qualification & EPT_VIOLATION_ACC_WRITE) + ? PFERR_WRITE_MASK : 0; + /* Is it a fetch fault? */ + error_code |=3D (exit_qualification & EPT_VIOLATION_ACC_INSTR) + ? PFERR_FETCH_MASK : 0; + /* ept page table entry is present? */ + error_code |=3D (exit_qualification & EPT_VIOLATION_RWX_MASK) + ? PFERR_PRESENT_MASK : 0; + + error_code |=3D (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) !=3D = 0 ? + PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK; + + return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0); +} + +#endif /* __KVM_X86_VMX_COMMON_H */ diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 60dedae31426..1a2e8d195891 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -50,6 +50,7 @@ #include =20 #include "capabilities.h" +#include "common.h" #include "cpuid.h" #include "evmcs.h" #include "hyperv.h" @@ -5513,11 +5514,10 @@ static int handle_task_switch(struct kvm_vcpu *vcpu) =20 static int handle_ept_violation(struct kvm_vcpu *vcpu) { - unsigned long exit_qualification; - gpa_t gpa; - u64 error_code; + unsigned long exit_qualification =3D vmx_get_exit_qual(vcpu); + gpa_t gpa =3D vmcs_read64(GUEST_PHYSICAL_ADDRESS); =20 - exit_qualification =3D vmx_get_exit_qual(vcpu); + trace_kvm_page_fault(gpa, exit_qualification); =20 /* * EPT violation happened while executing iret from NMI, @@ -5526,29 +5526,9 @@ static int handle_ept_violation(struct kvm_vcpu *vcp= u) * AAK134, BY25. */ if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) && - enable_vnmi && - (exit_qualification & INTR_INFO_UNBLOCK_NMI)) + enable_vnmi && (exit_qualification & INTR_INFO_UNBLOCK_NMI)) vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI); =20 - gpa =3D vmcs_read64(GUEST_PHYSICAL_ADDRESS); - trace_kvm_page_fault(gpa, exit_qualification); - - /* Is it a read fault? */ - error_code =3D (exit_qualification & EPT_VIOLATION_ACC_READ) - ? PFERR_USER_MASK : 0; - /* Is it a write fault? */ - error_code |=3D (exit_qualification & EPT_VIOLATION_ACC_WRITE) - ? PFERR_WRITE_MASK : 0; - /* Is it a fetch fault? */ - error_code |=3D (exit_qualification & EPT_VIOLATION_ACC_INSTR) - ? PFERR_FETCH_MASK : 0; - /* ept page table entry is present? */ - error_code |=3D (exit_qualification & EPT_VIOLATION_RWX_MASK) - ? PFERR_PRESENT_MASK : 0; - - error_code |=3D (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) !=3D = 0 ? 
- PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK; - vcpu->arch.exit_qualification =3D exit_qualification; =20 /* @@ -5562,7 +5542,7 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu) if (unlikely(allow_smaller_maxphyaddr && kvm_vcpu_is_illegal_gpa(vcpu, gp= a))) return kvm_emulate_instruction(vcpu, 0); =20 - return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0); + return __vmx_handle_ept_violation(vcpu, gpa, exit_qualification); } =20 static int handle_ept_misconfig(struct kvm_vcpu *vcpu) --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8556DC4167B for ; Thu, 5 May 2022 18:28:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384805AbiEES37 (ORCPT ); Thu, 5 May 2022 14:29:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36372 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383115AbiEEST3 (ORCPT ); Thu, 5 May 2022 14:19:29 -0400 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6A5B0186F1; Thu, 5 May 2022 11:15:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774548; x=1683310548; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=n0Cp++5LoPFGP8ehZslxep34XcZ0liOLwOyDKsAMJS4=; b=fQWAqhTYAWDStsjAIeYYqIB+YnmfO/h2AoEApCCAdU4qbOL0/FYVFyBh HjTexJrl0lPFZtIUI7WnigDFFgMjlTYuhuRhdeNCEQskdO49Zd7xx/BlA 8PBWaS3p2T79AGEdmbil7zQuS4IDVDGAgkxwJHpMRtut7JBtdY1dWMNIC 1pfdqh/mrcXe+h6TbUi3camLPDW6YJJby7ADkStNTtwz8UrDC03gixgqO laoibi2JkpdoWEX44mSaNkckRmZwdIy/YIHaA6c0+oxzG1mVqGUkM3D+u oUzSPSaLfn/XqKiNksZqLaRmXRypuEYeu2B6+mAnVSBRdI+MfYqGe8Ta5 g==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="293409459" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="293409459" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:48 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083325" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:47 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 052/104] KVM: VMX: Move setting of EPT MMU masks to common VT-x code Date: Thu, 5 May 2022 11:14:46 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson EPT MMU masks are used commonly for VMX and TDX. The value needs to be initialized in common code before both VMX/TDX-specific initialization code. 
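Condensed for clarity, the resulting common setup path looks like the sketch below (error handling omitted; see the diff that follows for the actual change):

/*
 * Sketch of vt_hardware_setup() after this patch: the EPT masks are
 * programmed once in common VT-x code, so later VMX- and TDX-specific
 * MMU initialization can rely on them being set.
 */
static __init int vt_hardware_setup_sketch(void)
{
	/* ... vmx_hardware_setup() and tdx_hardware_setup() run first ... */

	if (enable_ept)
		kvm_mmu_set_ept_masks(enable_ept_ad_bits,
				      cpu_has_vmx_ept_execute_only());
	return 0;
}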
Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/main.c | 5 +++++ arch/x86/kvm/vmx/vmx.c | 4 ---- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index ce12cc8276ef..9f4c3a0bcc12 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -4,6 +4,7 @@ #include "x86_ops.h" #include "vmx.h" #include "nested.h" +#include "mmu.h" #include "pmu.h" #include "tdx.h" =20 @@ -26,6 +27,10 @@ static __init int vt_hardware_setup(void) =20 enable_tdx =3D enable_tdx && !tdx_hardware_setup(&vt_x86_ops); =20 + if (enable_ept) + kvm_mmu_set_ept_masks(enable_ept_ad_bits, + cpu_has_vmx_ept_execute_only()); + return 0; } =20 diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 1a2e8d195891..df78e2220fec 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -8021,10 +8021,6 @@ __init int vmx_hardware_setup(void) =20 set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */ =20 - if (enable_ept) - kvm_mmu_set_ept_masks(enable_ept_ad_bits, - cpu_has_vmx_ept_execute_only()); - /* * Setup shadow_me_value/shadow_me_mask to include MKTME KeyID * bits to shadow_zero_check. --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A1A7CC433F5 for ; Thu, 5 May 2022 18:18:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383756AbiEESWK (ORCPT ); Thu, 5 May 2022 14:22:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36458 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383129AbiEESTa (ORCPT ); Thu, 5 May 2022 14:19:30 -0400 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 74386193CD; Thu, 5 May 2022 11:15:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774549; x=1683310549; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=f1ODjRHohSp8LErDG1PBYRtvrWL3MuC/3/bcDQB5tmY=; b=Oxm/0tjpg/pL+3fr8ZKz6Nfj+x9ayfSDgLj8QsDog17djswqhXV7oYzw 6lz5yVfXUjQ9yIh6V9vLI/TpcqpUNirpvHnSwwb2s+emR0nlyVhy/efiN lmeKpvB50f/anOEBaHyXmHx8G5nwzvhPuRVrUJ642VTGJfUA4Ep8bhFvg sCe+ixnp8jtjsf1efbFTdnHXTv2gCP0v8OP2eMf2A/9Rpj8l0PbccTifB AEH/bqLNqL9vmCqeIiuRb1Z6ACSzAbpM8YVqUFiQ3ewmyAe8oLfqxSCBm 5kIUN1x7ChPxRlo397L9MXAT9ZgHKYww8lhJ0axx6UZKVJWVFyntK8em7 g==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="293409460" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="293409460" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:48 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083329" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:48 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 053/104] KVM: TDX: Add load_mmu_pgd method for TDX Date: Thu, 5 May 2022 11:14:47 -0700 Message-Id: 
<88270097c7bb87253223959cd8d71c78bb119468.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson For virtual IO, the guest TD shares guest pages with VMM without encryption. Shared EPT is used to map guest pages in unprotected way. Add the VMCS field encoding for the shared EPTP, which will be used by TDX to have separate EPT walks for private GPAs (existing EPTP) versus shared GPAs (new shared EPTP). Set shared EPT pointer value for the TDX guest to initialize TDX MMU. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/include/asm/vmx.h | 1 + arch/x86/kvm/vmx/main.c | 11 ++++++++++- arch/x86/kvm/vmx/tdx.c | 5 +++++ arch/x86/kvm/vmx/x86_ops.h | 4 ++++ 4 files changed, 20 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index d3c8abcaa35e..257bc8e86f47 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -233,6 +233,7 @@ enum vmcs_field { TSC_MULTIPLIER_HIGH =3D 0x00002033, TERTIARY_VM_EXEC_CONTROL =3D 0x00002034, TERTIARY_VM_EXEC_CONTROL_HIGH =3D 0x00002035, + SHARED_EPT_POINTER =3D 0x0000203C, PID_POINTER_TABLE =3D 0x00002042, PID_POINTER_TABLE_HIGH =3D 0x00002043, GUEST_PHYSICAL_ADDRESS =3D 0x00002400, diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 9f4c3a0bcc12..252b7298b230 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -110,6 +110,15 @@ static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool = init_event) return vmx_vcpu_reset(vcpu, init_event); } =20 +static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, + int pgd_level) +{ + if (is_td_vcpu(vcpu)) + return tdx_load_mmu_pgd(vcpu, root_hpa, pgd_level); + + vmx_load_mmu_pgd(vcpu, root_hpa, pgd_level); +} + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { if (!is_td(kvm)) @@ -228,7 +237,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .write_tsc_offset =3D vmx_write_tsc_offset, .write_tsc_multiplier =3D vmx_write_tsc_multiplier, =20 - .load_mmu_pgd =3D vmx_load_mmu_pgd, + .load_mmu_pgd =3D vt_load_mmu_pgd, =20 .check_intercept =3D vmx_check_intercept, .handle_exit_irqoff =3D vmx_handle_exit_irqoff, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 54573537e2b8..dd8553d9f31b 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -530,6 +530,11 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_e= vent) vcpu->kvm->vm_bugged =3D true; } =20 +void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) +{ + td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa & PAGE_MASK); +} + int tdx_dev_ioctl(void __user *argp) { struct kvm_tdx_capabilities __user *user_caps; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index ab94f95bb915..b59dbc1f9906 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -144,6 +144,8 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_ev= ent); =20 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); + +void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_leve= l); #else static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return= 0; } static inline bool 
tdx_is_vm_type_supported(unsigned long type) { return f= alse; } @@ -161,6 +163,8 @@ static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu= , bool init_event) {} =20 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { retur= n -EOPNOTSUPP; } static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)= { return -EOPNOTSUPP; } + +static inline void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,= int root_level) {} #endif =20 #endif /* __KVM_X86_VMX_X86_OPS_H */ --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 46C37C4321E for ; Thu, 5 May 2022 18:28:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384468AbiEES30 (ORCPT ); Thu, 5 May 2022 14:29:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36466 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383132AbiEESTa (ORCPT ); Thu, 5 May 2022 14:19:30 -0400 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A58D919C04; Thu, 5 May 2022 11:15:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774549; x=1683310549; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=AN2hJ1AsB9/+CMP2S4H2VIiRG8T70amEoJdbWSJ2WTc=; b=TQaC8fc/ZtdoQACDfKpSdvmVXkkkGiAuLkbB8g0jua8M3bYWwzDfqFMc AHgN8KLtAAsFFd63hi6yuCVoIDJG7oPmCorYH+TUeGejBMWZv07miR/WJ 5AmRHrMlvNwPlaCzl7FItRUqXvxoM5o3E57sXnEJcBWi6E+uGYt0fGykY IE4NNQltR9dh0hZtDVfRgdDxfzsQgGCbRIXXGuk3LfHdwCATbLPqm6rRm tjpw2vS5N2TFI/thg65baq/ohtdwwi4yrY4GCrRMcJ0MMJX7Xt6Gm3DzQ T+eK7tDdapmPB+XOAOv8O0u14HTJir0Tpl65EQp9ehx80ukLEll73FmXl Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="293409461" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="293409461" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:48 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083334" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:48 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 054/104] KVM: TDX: don't request KVM_REQ_APIC_PAGE_RELOAD Date: Thu, 5 May 2022 11:14:48 -0700 Message-Id: <47f9fcead9424ec88c1a25a8490e83cd236f2223.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata TDX doesn't need APIC page depending on vapic and its callback is WARN_ON_ONCE(is_tdx). To avoid unnecessary overhead and WARN_ON_ONCE(), skip requesting KVM_REQ_APIC_PAGE_RELOAD when TD. 
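Restated as a sketch for clarity (it mirrors the two-line guard in the diff at the end of this mail), kvm_gfn_shared_mask() is non-zero only for TDs, so it doubles as a "this is a TD" test:

/*
 * Sketch of the guarded request: a TD never uses the APIC access page,
 * so skip the reload request and never reach the WARN_ON_ONCE() in the
 * vt_set_apic_access_page_addr() callback.
 */
static void sketch_maybe_reload_apic_page(struct kvm *kvm, unsigned long start,
					  unsigned long end, unsigned long apic_address)
{
	if (start <= apic_address && apic_address < end &&
	    !kvm_gfn_shared_mask(kvm))
		kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
}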
------------[ cut here ]------------ WARNING: CPU: 134 PID: 42205 at arch/x86/kvm/vmx/main.c:696 vt_set_apic_a= ccess_page_addr+0x3c/0x50 [kvm_intel] Modules linked in: squashfs nls_iso8859_1 nls_cp437 vhost_vsock vhost vho= st_iotlb tdx_debug kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul gh= ash_clmulni_intel aesni_intel crypto_simd cryptd i2c_i801 i2c_smbus i2c_ismt CPU: 134 PID: 42205 Comm: tdx_vm_tests Tainted: G W 5.17.0= -rc8 #165 4baba67c36c7c1001d782c47f2964b779a5659c7 Hardware name: Intel Corporation EAGLESTREAM/EAGLESTREAM, BIOS EGSDCRB1.S= YS.0066.D24.2110072326 10/07/2021 RIP: 0010:vt_set_apic_access_page_addr+0x3c/0x50 [kvm_intel] Code: e7 d5 49 8b 1c 24 48 8d bb 78 15 00 00 e8 4c 78 e7 d5 48 83 bb 78 1= 5 00 00 01 74 0d 4c 89 e7 e8 7a 9b fd ff 5b 41 5c 5d c3 90 <0f 0b 90 5b 41= 5c 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f RSP: 0018:ffa0000027477b68 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffa00000572d9000 RCX: ffffffffde6864d4 RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffa00000572da578 RBP: ffa0000027477b78 R08: 0000000000000001 R09: ffe21c006df80008 R10: ff1100036fc0003f R11: ffe21c006df80007 R12: ff1100036fc00000 R13: ff1100036fc000d8 R14: ff1100036fc00038 R15: ff1100036fc00000 FS: 00007fdf1ad32740(0000) GS:ff11000e1ed00000(0000) knlGS:0000000000000= 000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fdf15f1b000 CR3: 000000011e462005 CR4: 0000000000773ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: vcpu_enter_guest+0x145d/0x24d0 [kvm] ? inject_pending_event+0x750/0x750 [kvm] ? xsaves+0x31/0x40 ? rcu_read_lock_held_common+0x1e/0x60 ? rcu_read_lock_sched_held+0x60/0xe0 ? rcu_read_lock_bh_held+0xc0/0xc0 kvm_arch_vcpu_ioctl_run+0x25d/0xcc0 [kvm] kvm_vcpu_ioctl+0x414/0xa30 [kvm]] ? kvm_clear_dirty_log_protect+0x4d0/0x4d0 [kvm] ? userfaultfd_unmap_prep+0x240/0x240 ? __up_read+0x17f/0x530 ? rwsem_wake+0x110/0x110 ? __do_munmap+0x437/0x7c0 ? rcu_read_lock_held_common+0x1e/0x60 ? rcu_read_lock_sched_held+0x60/0xe0 ? rcu_read_lock_sched_held+0x60/0xe0 ? __kasan_check_read+0x11/0x20 ? 
__fget_light+0xa9/0x100 __x64_sys_ioctl+0xc0/0x100 do_syscall_64+0x39/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fdf1ae493db Code: 0f 1e fa 48 8b 05 b5 7a 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff f= f ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48 3d 01 f0 ff= ff 73 01 c3 48 8b 0d 85 7a 0d 00 f7 d8 64 89 01 48 RSP: 002b:00007ffcf8bdfb38 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00000000006f26d0 RCX: 00007fdf1ae493db RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000007 RBP: 0000000000000000 R08: 0000000000411d36 R09: 0000000000000000 R10: fffffffffffffb69 R11: 0000000000000246 R12: 0000000000402410 R13: 00000000006f02b0 R14: 0000000000000000 R15: 0000000000000000 irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [] copy_process+0xaca/0x= 3270 softirqs last enabled at (0): [] copy_process+0xaca/0x= 3270 softirqs last disabled at (0): [<0000000000000000>] 0x0 ---[ end trace 0000000000000000 ]--- Signed-off-by: Isaku Yamahata --- arch/x86/kvm/x86.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 77a9403bdd02..d2dc4333f493 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9981,7 +9981,8 @@ void kvm_arch_mmu_notifier_invalidate_range(struct kv= m *kvm, * Update it when it becomes invalid. */ apic_address =3D gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT); - if (start <=3D apic_address && apic_address < end) + if (start <=3D apic_address && apic_address < end && + !kvm_gfn_shared_mask(kvm)) kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD); } =20 --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6C1C1C4167D for ; Thu, 5 May 2022 18:28:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384687AbiEES3r (ORCPT ); Thu, 5 May 2022 14:29:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36392 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383092AbiEEST3 (ORCPT ); Thu, 5 May 2022 14:19:29 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D0CE91928D; Thu, 5 May 2022 11:15:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774548; x=1683310548; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=B/aN/eyDCa2O5pBSXBCf52WeCECJEO5WYDZiYdsfBiE=; b=jjES0GkXv4aUCHhcylYMTmhBHkasnvdpsVPa0GfAfWEI4VPkn0FgaMFQ QJvbwnyxsSVh1sHQFsmyKQjbr5b7ZUuy42OzIrKEgAhY/ZxpHDtLuBodW T5XckUq0dMQ2/LDSVOLNuuQ+DfhYZPkgBZs+dA9uqoVFKhZ8XMJhFS0Ak 4cMJQYe/ijnMwO5GaB+hyrtYWqgqe96kHm19DouS+//e26MfZKb57BtD3 yTcN4CutUJ5Kqn3JkiYFlYIfPNJ3Awmw2gEAtKQW+VTtL9B0Io2vENgTy GmjZ43aEvE1C/hZYXIloM3dmfocfaWc8MBILtLY+wm0dusy+gNWgd+h/1 A==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="268354843" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="268354843" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:48 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083339" Received: from 
ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:48 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 055/104] KVM: TDX: TDP MMU TDX support Date: Thu, 5 May 2022 11:14:49 -0700 Message-Id: <99f0cfa4924565629a09b8b981b327b7940f2e47.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Implement the TDP MMU hooks for the TDX backend: TLB flush, TLB shootdown, propagating private EPT entry changes to the Secure EPT, and freeing Secure EPT pages. The TLB flush hook covers both shared and private EPT: it flushes shared EPT the same way as VMX and additionally waits for the TDX TLB shootdown to complete. The hook that frees a Secure EPT page unlinks the page from the Secure EPT tree so that the page can be returned to the OS. The entry changes propagated to the Secure EPT are present -> non-present (zapping) and non-present -> present (population). On population, simply link the Secure EPT page or the private guest page into the Secure EPT via a TDX SEAMCALL. Because the TDP MMU allows concurrent zapping and population, zapping requires a synchronous TLB shootdown with the EPT entry frozen: zap the Secure EPT entry, increment the TLB counter, send IPIs to remote vcpus to trigger a TLB flush, and only then unlink the private guest page from the Secure EPT. For simplicity, batched zapping under the exclusive lock is also handled as concurrent zapping; although inefficient, it can be optimized in the future.
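To make the ordering constraint explicit, here is a hedged sketch of the zap sequence described above; the three helpers are hypothetical stand-ins for the tdh_mem_* SEAMCALL wrappers used in the diff below:

/*
 * Sketch only: synchronous zap of a private SPTE.  The KVM-side SPTE is
 * frozen (REMOVED_SPTE) while this runs; helper names are illustrative.
 */
static void sketch_zap_private_spte(struct kvm *kvm, gfn_t gfn, int level)
{
	/* 1. Remove/block the Secure EPT entry via SEAMCALL. */
	sketch_block_secure_ept_entry(kvm, gfn, level);

	/* 2. Increment the TLB tracking counter and IPI remote vCPUs. */
	sketch_track_and_kick_vcpus(kvm);

	/* 3. Only after the flush is it safe to unlink the guest page. */
	sketch_unlink_private_page(kvm, gfn, level);
}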
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/main.c | 40 ++++- arch/x86/kvm/vmx/tdx.c | 319 +++++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/tdx.h | 21 +++ arch/x86/kvm/vmx/x86_ops.h | 2 + 4 files changed, 378 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 252b7298b230..442d89e02459 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -110,6 +110,38 @@ static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool = init_event) return vmx_vcpu_reset(vcpu, init_event); } =20 +static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return tdx_flush_tlb(vcpu); + + vmx_flush_tlb_all(vcpu); +} + +static void vt_flush_tlb_current(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return tdx_flush_tlb(vcpu); + + vmx_flush_tlb_current(vcpu); +} + +static void vt_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_flush_tlb_gva(vcpu, addr); +} + +static void vt_flush_tlb_guest(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_flush_tlb_guest(vcpu); +} + static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) { @@ -185,10 +217,10 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .set_rflags =3D vmx_set_rflags, .get_if_flag =3D vmx_get_if_flag, =20 - .flush_tlb_all =3D vmx_flush_tlb_all, - .flush_tlb_current =3D vmx_flush_tlb_current, - .flush_tlb_gva =3D vmx_flush_tlb_gva, - .flush_tlb_guest =3D vmx_flush_tlb_guest, + .flush_tlb_all =3D vt_flush_tlb_all, + .flush_tlb_current =3D vt_flush_tlb_current, + .flush_tlb_gva =3D vt_flush_tlb_gva, + .flush_tlb_guest =3D vt_flush_tlb_guest, =20 .vcpu_pre_run =3D vmx_vcpu_pre_run, .vcpu_run =3D vmx_vcpu_run, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index dd8553d9f31b..7f1eb75bb79d 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -5,7 +5,9 @@ =20 #include "capabilities.h" #include "x86_ops.h" +#include "mmu.h" #include "tdx.h" +#include "vmx.h" #include "x86.h" =20 #undef pr_fmt @@ -290,6 +292,22 @@ int tdx_vm_init(struct kvm *kvm) int ret, i; u64 err; =20 + /* + * Because guest TD is protected, VMM can't parse the instruction in TD. + * Instead, guest uses MMIO hypercall. For unmodified device driver, + * #VE needs to be injected for MMIO and #VE handler in TD converts MMIO + * instruction into MMIO hypercall. + * + * SPTE value for MMIO needs to be setup so that #VE is injected into + * TD instead of triggering EPT MISCONFIG. + * - RWX=3D0 so that EPT violation is triggered. + * - suppress #VE bit is cleared to inject #VE. + */ + kvm_mmu_set_mmio_spte_mask(kvm, 0, VMX_EPT_RWX_MASK, 0); + + /* TODO: Enable 2mb and 1gb large page support. */ + kvm->arch.tdp_max_page_level =3D PG_LEVEL_4K; + /* vCPUs can't be created until after KVM_TDX_INIT_VM. */ kvm->max_vcpus =3D 0; =20 @@ -374,6 +392,8 @@ int tdx_vm_init(struct kvm *kvm) tdx_mark_td_page_added(&kvm_tdx->tdcs[i]); } =20 + spin_lock_init(&kvm_tdx->seamcall_lock); + /* * Note, TDH_MNG_INIT cannot be invoked here. TDH_MNG_INIT requires a de= dicated * ioctl() to define the configure CPUID values for the TD. 
@@ -535,6 +555,282 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t ro= ot_hpa, int pgd_level) td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa & PAGE_MASK); } =20 +static void tdx_unpin_pfn(struct kvm *kvm, kvm_pfn_t pfn) +{ + struct page *page =3D pfn_to_page(pfn); + + WARN_ON(!page_maybe_dma_pinned(page)); + unpin_user_page(page); + WARN_ON(!page_count(page) && to_kvm_tdx(kvm)->hkid > 0); +} + +static void __tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + hpa_t hpa =3D pfn_to_hpa(pfn); + gpa_t gpa =3D gfn_to_gpa(gfn); + struct tdx_module_output out; + u64 err; + + if (WARN_ON_ONCE(is_error_noslot_pfn(pfn) || kvm_is_reserved_pfn(pfn))) + return; + + /* TODO: handle large pages. */ + if (KVM_BUG_ON(level !=3D PG_LEVEL_4K, kvm)) + return; + + /* + * The caller pinned this pfn for us and don't unpin when success. + * See kvm_faultin_pfn_private() and kvm_mmu_release_fault(). + */ + + if (likely(is_td_finalized(kvm_tdx))) { + err =3D tdh_mem_page_aug(kvm_tdx->tdr.pa, gpa, hpa, &out); + if (KVM_BUG_ON(err, kvm)) + pr_tdx_error(TDH_MEM_PAGE_AUG, err, &out); + return; + } +} + +static void tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + + spin_lock(&kvm_tdx->seamcall_lock); + __tdx_sept_set_private_spte(kvm, gfn, level, pfn); + spin_unlock(&kvm_tdx->seamcall_lock); +} + +static void tdx_sept_drop_private_spte( + struct kvm *kvm, gfn_t gfn, enum pg_level level, kvm_pfn_t pfn) +{ + int tdx_level =3D pg_level_to_tdx_sept_level(level); + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + gpa_t gpa =3D gfn_to_gpa(gfn); + hpa_t hpa =3D pfn_to_hpa(pfn); + hpa_t hpa_with_hkid; + struct tdx_module_output out; + u64 err =3D 0; + + /* TODO: handle large pages. */ + if (KVM_BUG_ON(level !=3D PG_LEVEL_4K, kvm)) + return; + + spin_lock(&kvm_tdx->seamcall_lock); + if (is_hkid_assigned(kvm_tdx)) { + err =3D tdh_mem_page_remove(kvm_tdx->tdr.pa, gpa, tdx_level, &out); + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error(TDH_MEM_PAGE_REMOVE, err, &out); + goto unlock; + } + + hpa_with_hkid =3D set_hkid_to_hpa(hpa, (u16)kvm_tdx->hkid); + err =3D tdh_phymem_page_wbinvd(hpa_with_hkid); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err, NULL); + goto unlock; + } + } else + /* + * The HKID assigned to this TD was already freed and cache + * was already flushed. We don't have to flush again. 
+ */ + err =3D tdx_reclaim_page((unsigned long)__va(hpa), hpa, false, 0); + +unlock: + spin_unlock(&kvm_tdx->seamcall_lock); + + if (!err) + tdx_unpin_pfn(kvm, pfn); +} + +static int tdx_sept_link_private_sp(struct kvm *kvm, gfn_t gfn, + enum pg_level level, void *sept_page) +{ + int tdx_level =3D pg_level_to_tdx_sept_level(level); + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + gpa_t gpa =3D gfn_to_gpa(gfn); + hpa_t hpa =3D __pa(sept_page); + struct tdx_module_output out; + u64 err; + + spin_lock(&kvm_tdx->seamcall_lock); + err =3D tdh_mem_sept_add(kvm_tdx->tdr.pa, gpa, tdx_level, hpa, &out); + spin_unlock(&kvm_tdx->seamcall_lock); + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error(TDH_MEM_SEPT_ADD, err, &out); + return -EIO; + } + + return 0; +} + +static void tdx_sept_zap_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level) +{ + int tdx_level =3D pg_level_to_tdx_sept_level(level); + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + gpa_t gpa =3D gfn_to_gpa(gfn); + struct tdx_module_output out; + u64 err; + + /* For now large page isn't supported yet. */ + WARN_ON_ONCE(level !=3D PG_LEVEL_4K); + spin_lock(&kvm_tdx->seamcall_lock); + err =3D tdh_mem_range_block(kvm_tdx->tdr.pa, gpa, tdx_level, &out); + spin_unlock(&kvm_tdx->seamcall_lock); + if (KVM_BUG_ON(err, kvm)) + pr_tdx_error(TDH_MEM_RANGE_BLOCK, err, &out); +} + +/* + * TLB shoot down procedure: + * There is a global epoch counter and each vcpu has local epoch counter. + * - TDH.MEM.RANGE.BLOCK(TDR. level, range) on one vcpu + * This blocks the subsequenct creation of TLB translation on that range. + * This corresponds to clear the present bit(all RXW) in EPT entry + * - TDH.MEM.TRACK(TDR): advances the epoch counter which is global. + * - IPI to remote vcpus + * - TDExit and re-entry with TDH.VP.ENTER on remote vcpus + * - On re-entry, TDX module compares the local epoch counter with the glo= bal + * epoch counter. If the local epoch counter is older than the global e= poch + * counter, update the local epoch counter and flushes TLB. + */ +static void tdx_track(struct kvm_tdx *kvm_tdx) +{ + u64 err; + + WARN_ON(!is_hkid_assigned(kvm_tdx)); + /* If TD isn't finalized, it's before any vcpu running. */ + if (unlikely(!is_td_finalized(kvm_tdx))) + return; + + /* + * tdx_flush_tlb() waits for this function to issue TDH.MEM.TRACK() by + * the counter. The counter is used instead of bool because multiple + * TDH_MEM_TRACK() can be issued concurrently by multiple vcpus. + */ + atomic_inc(&kvm_tdx->tdh_mem_track); + /* + * KVM_REQ_TLB_FLUSH waits for the empty IPI handler, ack_flush(), with + * KVM_REQUEST_WAIT. + */ + kvm_make_all_cpus_request(&kvm_tdx->kvm, KVM_REQ_TLB_FLUSH); + + spin_lock(&kvm_tdx->seamcall_lock); + err =3D tdh_mem_track(kvm_tdx->tdr.pa); + spin_unlock(&kvm_tdx->seamcall_lock); + + /* Release remote vcpu waiting for TDH.MEM.TRACK in tdx_flush_tlb(). */ + atomic_dec(&kvm_tdx->tdh_mem_track); + + if (KVM_BUG_ON(err, &kvm_tdx->kvm)) + pr_tdx_error(TDH_MEM_TRACK, err, NULL); + +} + +static int tdx_sept_free_private_sp(struct kvm *kvm, gfn_t gfn, enum pg_le= vel level, + void *sept_page) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + int ret; + + /* + * free_private_sp() is (obviously) called when a shadow page is being + * zapped. KVM doesn't (yet) zap private SPs while the TD is active. + * Note: This function is for private shadow page. Not for private + * guest page. private guest page can be zapped during TD is active. + * shared <-> private conversion and slot move/deletion. + * + * TODO: large page support. 
If large page is supported, S-EPT page + * can be freed when promoting 4K page to 2M/1G page during TD running. + * In such case, flush cache and TDH.PAGE.RECLAIM. + */ + if (KVM_BUG_ON(is_hkid_assigned(to_kvm_tdx(kvm)), kvm)) + return -EINVAL; + + /* + * The HKID assigned to this TD was already freed and cache was + * already flushed. We don't have to flush again. + */ + spin_lock(&kvm_tdx->seamcall_lock); + ret =3D tdx_reclaim_page((unsigned long)sept_page, __pa(sept_page), false= , 0); + spin_unlock(&kvm_tdx->seamcall_lock); + + return ret; +} + +static int tdx_sept_tlb_remote_flush(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx; + + if (!is_td(kvm)) + return -EOPNOTSUPP; + + kvm_tdx =3D to_kvm_tdx(kvm); + if (is_hkid_assigned(kvm_tdx)) + tdx_track(kvm_tdx); + + return 0; +} + +static void tdx_handle_changed_private_spte( + struct kvm *kvm, const struct kvm_spte_change *change) +{ + const gfn_t gfn =3D change->gfn; + const enum pg_level level =3D change->level; + + WARN_ON(!is_td(kvm)); + lockdep_assert_held(&kvm->mmu_lock); + + if (change->new.is_present) { + /* TDP MMU doesn't change present -> present */ + WARN_ON(change->old.is_present); + + /* + * Use different call to either set up middle level + * private page table, or leaf. + */ + if (change->new.is_leaf) + tdx_sept_set_private_spte( + kvm, gfn, level, change->new.pfn); + else { + WARN_ON(!change->sept_page); + if (tdx_sept_link_private_sp( + kvm, gfn, level, change->sept_page)) + /* failed to update Secure-EPT. */ + WARN_ON(1); + } + } else if (change->old.is_leaf) { + /* non-present -> non-present doesn't make sense. */ + WARN_ON(!change->old.is_present); + + /* + * Zap private leaf SPTE. Zapping private table is done + * below in handle_removed_tdp_mmu_page(). + */ + tdx_sept_zap_private_spte(kvm, gfn, level); + + /* + * TDX requires TLB tracking before dropping private page. Do + * it here, although it is also done later. + * If hkid isn't assigned, the guest is destroying and no vcpu + * runs further. TLB shootdown isn't needed. + * + * TODO: implement with_range version for optimization. + * kvm_flush_remote_tlbs_with_address(kvm, gfn, 1); + * =3D> tdx_sept_tlb_remote_flush_with_range(kvm, gfn, + * KVM_PAGES_PER_HPAGE(level)); + */ + if (is_hkid_assigned(to_kvm_tdx(kvm))) + kvm_flush_remote_tlbs(kvm); + + tdx_sept_drop_private_spte(kvm, gfn, level, change->old.pfn); + } +} + int tdx_dev_ioctl(void __user *argp) { struct kvm_tdx_capabilities __user *user_caps; @@ -783,6 +1079,25 @@ static int tdx_td_init(struct kvm *kvm, struct kvm_td= x_cmd *cmd) return ret; } =20 +void tdx_flush_tlb(struct kvm_vcpu *vcpu) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(vcpu->kvm); + struct kvm_mmu *mmu =3D vcpu->arch.mmu; + u64 root_hpa =3D mmu->root.hpa; + + /* Flush the shared EPTP, if it's valid. */ + if (VALID_PAGE(root_hpa)) + ept_sync_context(construct_eptp(vcpu, root_hpa, + mmu->root_role.level)); + + /* + * See tdx_track(). Wait for tlb shootdown initiater to finish + * TDH_MEM_TRACK() so that TLB is flushed on the next TDENTER. 
+ */ + while (atomic_read(&kvm_tdx->tdh_mem_track)) + cpu_relax(); +} + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { struct kvm_tdx_cmd tdx_cmd; @@ -929,6 +1244,10 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x86= _ops) hkid_mask =3D GENMASK_ULL(max_pa - 1, hkid_start_pos); pr_info("hkid start pos %d mask 0x%llx\n", hkid_start_pos, hkid_mask); =20 + x86_ops->tlb_remote_flush =3D tdx_sept_tlb_remote_flush; + x86_ops->free_private_sp =3D tdx_sept_free_private_sp; + x86_ops->handle_changed_private_spte =3D tdx_handle_changed_private_spte; + return 0; } =20 diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 337c3adb4fcf..d8dcbedd690b 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -26,9 +26,24 @@ struct kvm_tdx { int hkid; =20 bool finalized; + atomic_t tdh_mem_track; =20 u64 tsc_offset; unsigned long tsc_khz; + + /* + * Some SEAMCALLs try to lock TD resources (e.g. Secure-EPT) they use or + * update. If TDX module fails to obtain the lock, it returns + * TDX_OPERAND_BUSY error without spinning. It's VMM/OS responsibility + * to retry or guarantee no contention because TDX module has the + * restriction on cpu cycles it can spend and VMM/OS knows better + * vcpu scheduling. + * + * TDP MMU uses read lock of kvm.arch.mmu_lock so TDP MMU code can be + * run concurrently with multiple vCPUs. Lock to prevent seamcalls from + * running concurrently when TDP MMU is enabled. + */ + spinlock_t seamcall_lock; }; =20 struct vcpu_tdx { @@ -169,6 +184,12 @@ static __always_inline u64 td_tdcs_exec_read64(struct = kvm_tdx *kvm_tdx, u32 fiel return out.r8; } =20 +static __always_inline int pg_level_to_tdx_sept_level(enum pg_level level) +{ + WARN_ON(level =3D=3D PG_LEVEL_NONE); + return level - 1; +} + #else static inline int tdx_module_setup(void) { return -ENODEV; }; =20 diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index b59dbc1f9906..1e63158081a5 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -145,6 +145,7 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_ev= ent); int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); =20 +void tdx_flush_tlb(struct kvm_vcpu *vcpu); void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_leve= l); #else static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return= 0; } @@ -164,6 +165,7 @@ static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu= , bool init_event) {} static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { retur= n -EOPNOTSUPP; } static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)= { return -EOPNOTSUPP; } =20 +static inline void tdx_flush_tlb(struct kvm_vcpu *vcpu) {} static inline void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,= int root_level) {} #endif =20 --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 209DEC35294 for ; Thu, 5 May 2022 18:28:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384573AbiEES3d (ORCPT ); Thu, 5 May 2022 14:29:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36324 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383164AbiEESTj (ORCPT ); Thu, 5 May 2022 
14:19:39 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D00A619C12; Thu, 5 May 2022 11:15:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774549; x=1683310549; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=A805Zxruh6yYNtylIzj9f+DsWSAMkPS9vWZxb71Yd8s=; b=LbdoH7CLMgyMgejVfSCnK2pU3cpjMhFfdYIGBhk3z9rop4tmd0yr8JIS IisN8B6ml1O58EjSh8RdiD2bx1ay9NrqIG8Ern62J9apKcFWmaVXWreTj /mO8I2ORNuohHc+noJUtNk0VxRs1fAZemI7ObfS8Lw7nVZ0kTRd0OxKB1 ZeqIHvu7Nrf2UZUGfnI3SL4hJ6X+f55aU8P/hy28+qbXSU6OU3rC1/k5y aAK2T7t+PImv3DzRdBjn1umapZYjVjbkNkAnlDknjWfK6YMcpz3R0eA9R VO0gxkXiXCcJ0Voql0ZwvZSoW+sQnal5v5yKR6/aAd1j6+GFNDkcPbEb9 g==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="268354844" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="268354844" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:48 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083344" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:48 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 056/104] [MARKER] The start of TDX KVM patch series: KVM TDP MMU MapGPA Date: Thu, 5 May 2022 11:14:50 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This empty commit is to mark the start of patch series of KVM TDP MMU MapGPA. Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/intel-tdx-layer-status.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentat= ion/virt/kvm/intel-tdx-layer-status.rst index c3e675bea802..5797d172176d 100644 --- a/Documentation/virt/kvm/intel-tdx-layer-status.rst +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -11,6 +11,7 @@ What qemu can do - TDX VM TYPE is exposed to Qemu. - Qemu can create/destroy guest of TDX vm type. - Qemu can create/destroy vcpu of TDX vm type. +- Qemu can populate initial guest memory image. 
 
 Patch Layer status
 ------------------
@@ -19,7 +20,7 @@ Patch Layer status
 * TDX architectural definitions: Applied
 * TD VM creation/destruction: Applied
 * TD vcpu creation/destruction: Applied
-* TDX EPT violation: Applying
+* TDX EPT violation: Applied
 * TD finalization: Not yet
 * TD vcpu enter/exit: Not yet
 * TD vcpu interrupts/exit/hypercall: Not yet
-- 
2.25.1

From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar
Subject: [RFC PATCH v6 057/104] KVM: x86/mmu: steal software usable bit to record if GFN is for shared or not
Date: Thu, 5 May 2022 11:14:51 -0700
Message-Id: 
X-Mailer: git-send-email 2.25.1
In-Reply-To: 
References: 
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Isaku Yamahata 

With TDX, all GFNs are private at guest boot time. At run time the guest TD can explicitly convert a GFN from private to shared, or vice versa, with the MapGPA hypercall. Once a usage is specified, the GFN can't be used the other way: if the guest tells KVM that a GFN is shared, it can't be used as private, and vice versa.

Steal a software-usable bit, SPTE_SHARED_MASK, from the MMIO generation counter to record this. Use the SPTE_SHARED_MASK bit in the shared or private EPT to determine which mapping, shared or private, is allowed.
If requested mapping isn't allowed, return RET_PF_RETRY to wait for other vcpu to change it. The bit is recorded in both shared and private shadow page to avoid traverse one more shadow page when resolving KVM page fault. The bit needs to be kept over zapping the EPT entry. Currently the EPT entry is initialized SHADOW_NONPRESENT_VALUE unconditionally to clear SPTE_SHARED_MASK bit. To carry SPTE_SHARED_MASK bit, introduce a helper function to get initial value for zapped entry with SPTE_SHARED_MASK bit. Replace SHADOW_NONPRESENT_VALUE with it. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu/spte.h | 17 +++++++--- arch/x86/kvm/mmu/tdp_mmu.c | 65 ++++++++++++++++++++++++++++++++------ 2 files changed, 68 insertions(+), 14 deletions(-) diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 1ac2a7a91166..d97ffe440536 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -14,6 +14,9 @@ */ #define SPTE_MMU_PRESENT_MASK BIT_ULL(11) =20 +/* Masks that used to track for shared GPA **/ +#define SPTE_SHARED_MASK BIT_ULL(62) + /* * TDP SPTES (more specifically, EPT SPTEs) may not have A/D bits, and may= also * be restricted to using write-protection (for L2 when CPU dirty logging,= i.e. @@ -104,7 +107,7 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRAC= K_SAVED_MASK)); * the memslots generation and is derived as follows: * * Bits 0-7 of the MMIO generation are propagated to spte bits 3-10 - * Bits 8-18 of the MMIO generation are propagated to spte bits 52-62 + * Bits 8-18 of the MMIO generation are propagated to spte bits 52-61 * * The KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS flag is intentionally not includ= ed in * the MMIO generation number, as doing so would require stealing a bit fr= om @@ -118,7 +121,7 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRAC= K_SAVED_MASK)); #define MMIO_SPTE_GEN_LOW_END 10 =20 #define MMIO_SPTE_GEN_HIGH_START 52 -#define MMIO_SPTE_GEN_HIGH_END 62 +#define MMIO_SPTE_GEN_HIGH_END 61 =20 #define MMIO_SPTE_GEN_LOW_MASK GENMASK_ULL(MMIO_SPTE_GEN_LOW_END, \ MMIO_SPTE_GEN_LOW_START) @@ -131,7 +134,7 @@ static_assert(!(SPTE_MMU_PRESENT_MASK & #define MMIO_SPTE_GEN_HIGH_BITS (MMIO_SPTE_GEN_HIGH_END - MMIO_SPTE_GEN_H= IGH_START + 1) =20 /* remember to adjust the comment above as well if you change these */ -static_assert(MMIO_SPTE_GEN_LOW_BITS =3D=3D 8 && MMIO_SPTE_GEN_HIGH_BITS = =3D=3D 11); +static_assert(MMIO_SPTE_GEN_LOW_BITS =3D=3D 8 && MMIO_SPTE_GEN_HIGH_BITS = =3D=3D 10); =20 #define MMIO_SPTE_GEN_LOW_SHIFT (MMIO_SPTE_GEN_LOW_START - 0) #define MMIO_SPTE_GEN_HIGH_SHIFT (MMIO_SPTE_GEN_HIGH_START - MMIO_SPTE_GEN= _LOW_BITS) @@ -208,6 +211,7 @@ extern u64 __read_mostly shadow_nonpresent_or_rsvd_mask; /* Removed SPTEs must not be misconstrued as shadow present PTEs. */ static_assert(!(__REMOVED_SPTE & SPTE_MMU_PRESENT_MASK)); static_assert(!(__REMOVED_SPTE & SHADOW_NONPRESENT_VALUE)); +static_assert(!(__REMOVED_SPTE & SPTE_SHARED_MASK)); =20 /* * See above comment around __REMOVED_SPTE. 
REMOVED_SPTE is the actual @@ -217,7 +221,12 @@ static_assert(!(__REMOVED_SPTE & SHADOW_NONPRESENT_VAL= UE)); =20 static inline bool is_removed_spte(u64 spte) { - return spte =3D=3D REMOVED_SPTE; + return (spte & ~SPTE_SHARED_MASK) =3D=3D REMOVED_SPTE; +} + +static inline u64 spte_shared_mask(u64 spte) +{ + return spte & SPTE_SHARED_MASK; } =20 /* diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 2aa2cb8a9b05..1d7642a0acc9 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -736,6 +736,11 @@ static inline int tdp_mmu_set_spte_atomic(struct kvm *= kvm, return 0; } =20 +static u64 shadow_nonpresent_spte(u64 old_spte) +{ + return SHADOW_NONPRESENT_VALUE | spte_shared_mask(old_spte); +} + static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm, struct tdp_iter *iter) { @@ -770,7 +775,8 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *k= vm, * SHADOW_NONPRESENT_VALUE (which sets "suppress #VE" bit) so it * can be set when EPT table entries are zapped. */ - kvm_tdp_mmu_write_spte(iter->sptep, SHADOW_NONPRESENT_VALUE); + kvm_tdp_mmu_write_spte(iter->sptep, + shadow_nonpresent_spte(iter->old_spte)); =20 return 0; } @@ -948,8 +954,11 @@ static void __tdp_mmu_zap_root(struct kvm *kvm, struct= kvm_mmu_page *root, continue; =20 if (!shared) - tdp_mmu_set_spte(kvm, &iter, SHADOW_NONPRESENT_VALUE); - else if (tdp_mmu_set_spte_atomic(kvm, &iter, SHADOW_NONPRESENT_VALUE)) + tdp_mmu_set_spte(kvm, &iter, + shadow_nonpresent_spte(iter.old_spte)); + else if (tdp_mmu_set_spte_atomic( + kvm, &iter, + shadow_nonpresent_spte(iter.old_spte))) goto retry; } } @@ -1006,7 +1015,8 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_m= mu_page *sp) return false; =20 __tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, - SHADOW_NONPRESENT_VALUE, sp->gfn, sp->role.level + 1, + shadow_nonpresent_spte(old_spte), + sp->gfn, sp->role.level + 1, true, true, is_private_sp(sp)); =20 return true; @@ -1048,11 +1058,20 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, stru= ct kvm_mmu_page *root, continue; } =20 + /* + * SPTE_SHARED_MASK is stored as 4K granularity. The + * information is lost if we delete upper level SPTE page. + * TODO: support large page. + */ + if (kvm_gfn_shared_mask(kvm) && iter.level > PG_LEVEL_4K) + continue; + if (!is_shadow_present_pte(iter.old_spte) || !is_last_spte(iter.old_spte, iter.level)) continue; =20 - tdp_mmu_set_spte(kvm, &iter, SHADOW_NONPRESENT_VALUE); + tdp_mmu_set_spte(kvm, &iter, + shadow_nonpresent_spte(iter.old_spte)); flush =3D true; } =20 @@ -1168,18 +1187,44 @@ static int tdp_mmu_map_handle_target_level(struct k= vm_vcpu *vcpu, gfn_t gfn_unalias =3D iter->gfn & ~kvm_gfn_shared_mask(vcpu->kvm); =20 WARN_ON(sp->role.level !=3D fault->goal_level); + WARN_ON(is_private_sptep(iter->sptep) !=3D fault->is_private); =20 - /* TDX shared GPAs are no executable, enforce this for the SDV. */ - if (kvm_gfn_shared_mask(vcpu->kvm) && !fault->is_private) - pte_access &=3D ~ACC_EXEC_MASK; + if (kvm_gfn_shared_mask(vcpu->kvm)) { + if (fault->is_private) { + /* + * SPTE allows only RWX mapping. PFN can't be mapped it + * as READONLY in GPA. + */ + if (fault->slot && !fault->map_writable) + return RET_PF_RETRY; + /* + * This GPA is not allowed to map as private. Let + * vcpu loop in page fault until other vcpu change it + * by MapGPA hypercall. + */ + if (fault->slot && + spte_shared_mask(iter->old_spte)) + return RET_PF_RETRY; + } else { + /* This GPA is not allowed to map as shared. 
*/
+			if (fault->slot &&
+			    !spte_shared_mask(iter->old_spte))
+				return RET_PF_RETRY;
+			/* TDX shared GPAs are non-executable, enforce this. */
+			pte_access &= ~ACC_EXEC_MASK;
+		}
+	}
 
 	if (unlikely(!fault->slot))
 		new_spte = make_mmio_spte(vcpu, gfn_unalias, pte_access);
-	else
+	else {
 		wrprot = make_spte(vcpu, sp, fault->slot, pte_access, gfn_unalias,
 				   fault->pfn, iter->old_spte, fault->prefetch, true,
 				   fault->map_writable, &new_spte);
+		if (spte_shared_mask(iter->old_spte))
+			new_spte |= SPTE_SHARED_MASK;
+	}
 
 	if (new_spte == iter->old_spte)
 		ret = RET_PF_SPURIOUS;
@@ -1488,7 +1533,7 @@ static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter,
	 * invariant that the PFN of a present
	 * leaf SPTE can never change.
	 * See __handle_changed_spte().
	 */
-	tdp_mmu_set_spte(kvm, iter, SHADOW_NONPRESENT_VALUE);
+	tdp_mmu_set_spte(kvm, iter, shadow_nonpresent_spte(iter->old_spte));
 
 	if (!pte_write(range->pte)) {
 		new_spte = kvm_mmu_changed_pte_notifier_make_spte(iter->old_spte,
-- 
2.25.1

From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar
Subject: [RFC PATCH v6 058/104] KVM: x86/tdp_mmu: implement MapGPA hypercall for TDX
Date: Thu, 5 May 2022 11:14:52 -0700
Message-Id: 
X-Mailer: git-send-email 2.25.1
In-Reply-To: 
References: 
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Isaku Yamahata 

The TDX Guest-Hypervisor Communication Interface (GHCI) specification defines the MapGPA hypercall, which the guest TD uses to request that the host VMM map a given GPA range as private or shared. The request means the guest TD will use the GPA only as shared (or only as private); the VMM must enforce that usage, but it doesn't have to actually map the GPA at hypercall time.

- Allocate 4K PTEs so the SPTE_SHARED_MASK bit can be recorded.
- Zap the aliased region: if a shared (or private) GPA is requested, zap the private (or shared) GPA (modulo the shared bit).
- Record whether the requested GPA is shared (or private) via SPTE_SHARED_MASK in the SPTEs of both the shared and private EPT tables: with SPTE_SHARED_MASK set, a shared GPA is allowed; with SPTE_SHARED_MASK cleared, a private GPA is allowed.
- Don't map the GPA; it is mapped on the next EPT violation.

Recording SPTE_SHARED_MASK in both the shared and private EPT optimizes the EPT-violation path taken during normal guest TD execution, at the cost of penalizing the map_gpa hypercall. If the guest TD faults on a disallowed GPA (modulo the shared bit), KVM doesn't resolve the EPT violation and lets the vcpu retry; the vcpu keeps faulting until another vcpu maps the region with the MapGPA hypercall. With the non-present SPTE value (shadow_nonpresent_value), SPTE_SHARED_MASK is cleared, so the default behavior doesn't change.

Signed-off-by: Isaku Yamahata 
---
 arch/x86/kvm/mmu.h         |   3 +
 arch/x86/kvm/mmu/mmu.c     | 106 +++++++++++++++
 arch/x86/kvm/mmu/tdp_mmu.c | 271 ++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/mmu/tdp_mmu.h |   5 +
 4 files changed, 382 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index d02c0274777a..beff084d6cd3 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -316,6 +316,9 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
 
 int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
 
+int kvm_mmu_map_gpa(struct kvm_vcpu *vcpu, gfn_t *startp, gfn_t end,
+		    bool allow_private);
+
 int kvm_mmu_post_init_vm(struct kvm *kvm);
 void kvm_mmu_pre_destroy_vm(struct kvm *kvm);
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f4284e9cf9ec..497e2b9e58cc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6317,6 +6317,112 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
 	}
 }
 
+static int kvm_mmu_populate_nonleaf(struct kvm_vcpu *vcpu, gfn_t start, gfn_t end)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_memslots *slots;
+	struct kvm_memslot_iter iter;
+	int ret = 0;
+
+	/* No need to populate as mmu_map_gpa() handles single GPA. */
+	if (!is_tdp_mmu_enabled(kvm))
+		return 0;
+
+	slots = __kvm_memslots(kvm, 0 /* only normal ram. not SMM.
*/); + kvm_for_each_memslot_in_gfn_range(&iter, slots, start, end) { + struct kvm_memory_slot *memslot =3D iter.slot; + gfn_t s =3D max(start, memslot->base_gfn); + gfn_t e =3D min(end, memslot->base_gfn + memslot->npages); + + if (WARN_ON_ONCE(s >=3D e)) + continue; + + ret =3D kvm_tdp_mmu_populate_nonleaf(vcpu, kvm_gfn_private(kvm, s), + kvm_gfn_private(kvm, e), true, false); + if (ret) + break; + ret =3D kvm_tdp_mmu_populate_nonleaf(vcpu, kvm_gfn_shared(kvm, s), + kvm_gfn_shared(kvm, e), false, false); + if (ret) + break; + } + return ret; +} + +int kvm_mmu_map_gpa(struct kvm_vcpu *vcpu, gfn_t *startp, gfn_t end, + bool allow_private) +{ + struct kvm *kvm =3D vcpu->kvm; + struct kvm_memslots *slots; + struct kvm_memslot_iter iter; + gfn_t start =3D *startp; + int ret; + + if (!kvm_gfn_shared_mask(kvm)) + return -EOPNOTSUPP; + + start =3D start & ~kvm_gfn_shared_mask(kvm); + end =3D end & ~kvm_gfn_shared_mask(kvm); + + /* + * Allocate S-EPT pages first so that the operations leaf SPTE entry + * can be done without memory allocation. + */ + while (true) { + ret =3D mmu_topup_memory_caches(vcpu, false); + if (ret) + return ret; + + mutex_lock(&kvm->slots_lock); + write_lock(&kvm->mmu_lock); + + ret =3D kvm_mmu_populate_nonleaf(vcpu, start, end); + if (!ret) + break; + + write_unlock(&kvm->mmu_lock); + mutex_unlock(&kvm->slots_lock); + if (ret =3D=3D -EAGAIN) { + if (need_resched()) + cond_resched(); + continue; + } + return ret; + } + + slots =3D __kvm_memslots(kvm, 0 /* only normal ram. not SMM. */); + kvm_for_each_memslot_in_gfn_range(&iter, slots, start, end) { + struct kvm_memory_slot *memslot =3D iter.slot; + gfn_t s =3D max(start, memslot->base_gfn); + gfn_t e =3D min(end, memslot->base_gfn + memslot->npages); + + if (WARN_ON_ONCE(s >=3D e)) + continue; + if (is_tdp_mmu_enabled(kvm)) { + ret =3D kvm_tdp_mmu_map_gpa(vcpu, &s, e, allow_private); + if (ret) { + start =3D s; + break; + } + } else { + ret =3D -EOPNOTSUPP; + break; + } + } + + write_unlock(&kvm->mmu_lock); + mutex_unlock(&kvm->slots_lock); + + if (ret =3D=3D -EAGAIN) { + if (allow_private) + *startp =3D kvm_gfn_private(kvm, start); + else + *startp =3D kvm_gfn_shared(kvm, start); + } + return ret; +} +EXPORT_SYMBOL_GPL(kvm_mmu_map_gpa); + static unsigned long mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) { diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 1d7642a0acc9..8bcb241cc12c 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -658,6 +658,13 @@ static void __handle_changed_spte(struct kvm *kvm, int= as_id, gfn_t gfn, } change.sept_page =3D sept_page; =20 + /* + * SPTE_SHARED_MASK is only changed by map_gpa that obtains + * write lock of mmu_lock. 
+ */ + WARN_ON(shared && + (spte_shared_mask(old_spte) !=3D + spte_shared_mask(new_spte))); static_call(kvm_x86_handle_changed_private_spte)(kvm, &change); } } @@ -1303,7 +1310,8 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct td= p_iter *iter, return 0; } =20 -static int tdp_mmu_populate_nonleaf(struct kvm_vcpu *vcpu, struct tdp_iter= *iter, bool account_nx) +static int tdp_mmu_populate_nonleaf( + struct kvm_vcpu *vcpu, struct tdp_iter *iter, bool account_nx, bool share= d) { struct kvm_mmu_page *sp; int ret; @@ -1314,7 +1322,7 @@ static int tdp_mmu_populate_nonleaf(struct kvm_vcpu *= vcpu, struct tdp_iter *iter sp =3D tdp_mmu_alloc_sp(vcpu, iter->is_private, false); tdp_mmu_init_child_sp(sp, iter); =20 - ret =3D tdp_mmu_link_sp(vcpu->kvm, iter, sp, account_nx, true); + ret =3D tdp_mmu_link_sp(vcpu->kvm, iter, sp, account_nx, shared); if (ret) tdp_mmu_free_sp(sp); return ret; @@ -1390,7 +1398,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm= _page_fault *fault) if (is_removed_spte(iter.old_spte)) break; =20 - if (tdp_mmu_populate_nonleaf(vcpu, &iter, account_nx)) + if (tdp_mmu_populate_nonleaf(vcpu, &iter, account_nx, true)) break; } } @@ -2096,6 +2104,263 @@ bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm, return spte_set; } =20 +/* + * Allocate shadow page table for given gfn so that the following operatio= ns + * on sptes can be done without memory allocation. + */ +int kvm_tdp_mmu_populate_nonleaf( + struct kvm_vcpu *vcpu, gfn_t start, gfn_t end, bool is_private, bool shar= ed) +{ + struct kvm *kvm =3D vcpu->kvm; + struct tdp_iter iter; + int ret =3D 0; + + kvm_lockdep_assert_mmu_lock_held(kvm, false); + rcu_read_lock(); + tdp_mmu_for_each_pte(iter, vcpu->arch.mmu, is_private, start, end) { + if (iter.level =3D=3D PG_LEVEL_4K) + continue; + if (is_shadow_present_pte(iter.old_spte) && + is_large_pte(iter.old_spte)) { + /* TODO: large page support. */ + WARN_ON_ONCE(true); + return -ENOSYS; + } + + if (is_shadow_present_pte(iter.old_spte)) + continue; + + /* + * Guarantee that alloc_tdp_mmu_page() succees which + * assumes page allocation from cache always successes. + */ + if (vcpu->arch.mmu_page_header_cache.nobjs =3D=3D 0 || + vcpu->arch.mmu_shadow_page_cache.nobjs =3D=3D 0 || + vcpu->arch.mmu_private_sp_cache.nobjs =3D=3D 0) { + ret =3D -EAGAIN; + break; + } + + /* + * write lock of mmu_lock is held. No other thread + * freezes SPTE. + */ + ret =3D tdp_mmu_populate_nonleaf(vcpu, &iter, false, shared); + if (ret) { + /* As write lock is held, this case sholdn't happen. */ + WARN_ON_ONCE(true); + break; + } + } + rcu_read_unlock(); + + return ret; +} + +typedef void (*update_spte_t)( + struct kvm *kvm, struct tdp_iter *iter, bool allow_private); + +static int kvm_tdp_mmu_update_range(struct kvm_vcpu *vcpu, bool is_private, + gfn_t start, gfn_t end, gfn_t *nextp, + update_spte_t fn, bool allow_private) +{ + struct kvm *kvm =3D vcpu->kvm; + struct tdp_iter iter; + int ret =3D 0; + + rcu_read_lock(); + tdp_mmu_for_each_pte(iter, vcpu->arch.mmu, is_private, start, end) { + if (iter.level =3D=3D PG_LEVEL_4K) { + fn(kvm, &iter, allow_private); + continue; + } + + /* + * Which GPA is allowed, private or shared, is recorded in the + * granular of 4K in private leaf spte as SPTE_SHARED_MASK. + * Break large page into 4K. + */ + if (is_shadow_present_pte(iter.old_spte) && + is_large_pte(iter.old_spte)) { + /* + * TODO: large page support. 
+ * Doesn't support large page for TDX now + */ + WARN_ON_ONCE(true); + tdp_mmu_set_spte(kvm, &iter, SHADOW_NONPRESENT_VALUE); + iter.old_spte =3D kvm_tdp_mmu_read_spte(iter.sptep); + } + + if (!is_shadow_present_pte(iter.old_spte)) { + /* + * Guarantee that alloc_tdp_mmu_page() succees which + * assumes page allocation from cache always successes. + */ + if (vcpu->arch.mmu_page_header_cache.nobjs =3D=3D 0 || + vcpu->arch.mmu_shadow_page_cache.nobjs =3D=3D 0 || + vcpu->arch.mmu_private_sp_cache.nobjs =3D=3D 0) { + ret =3D -EAGAIN; + break; + } + /* + * write lock of mmu_lock is held. No other thread + * freezes SPTE. + */ + ret =3D tdp_mmu_populate_nonleaf(vcpu, &iter, false, false); + if (ret) { + /* As write lock is held, this case sholdn't happen. */ + WARN_ON_ONCE(true); + break; + } + } + } + rcu_read_unlock(); + + if (ret =3D=3D -EAGAIN) + *nextp =3D iter.next_last_level_gfn; + + return ret; +} + +static void kvm_tdp_mmu_update_shared_spte( + struct kvm *kvm, struct tdp_iter *iter, bool allow_private) +{ + u64 new_spte; + + WARN_ON(iter->is_private); + if (allow_private) { + /* Zap SPTE and clear SPTE_SHARED_MASK */ + new_spte =3D SHADOW_NONPRESENT_VALUE; + if (new_spte !=3D iter->old_spte) + tdp_mmu_set_spte(kvm, iter, new_spte); + } else { + new_spte =3D iter->old_spte | SPTE_SHARED_MASK; + /* No side effect is needed */ + if (new_spte !=3D iter->old_spte) + kvm_tdp_mmu_write_spte(iter->sptep, new_spte); + } +} + +static void kvm_tdp_mmu_update_private_spte( + struct kvm *kvm, struct tdp_iter *iter, bool allow_private) +{ + u64 new_spte; + + WARN_ON(!iter->is_private); + if (allow_private) { + new_spte =3D iter->old_spte & ~SPTE_SHARED_MASK; + /* No side effect is needed */ + if (new_spte !=3D iter->old_spte) + kvm_tdp_mmu_write_spte(iter->sptep, new_spte); + } else { + if (is_shadow_present_pte(iter->old_spte)) { + /* Zap SPTE */ + new_spte =3D shadow_nonpresent_spte(iter->old_spte) | + SPTE_SHARED_MASK; + if (new_spte !=3D iter->old_spte) + tdp_mmu_set_spte(kvm, iter, new_spte); + } else { + new_spte =3D iter->old_spte | SPTE_SHARED_MASK; + /* No side effect is needed */ + if (new_spte !=3D iter->old_spte) + kvm_tdp_mmu_write_spte(iter->sptep, new_spte); + } + } +} + +/* + * Whether GPA is allowed to map private or shared is recorded in both pri= vate + * and shared leaf spte entry as SPTE_SHARED_MASK bit. They must match. + * private leaf spte entry + * - present: private mapping is allowed. (already mapped) + * - non-present: private mapping is allowed. + * - present | SPTE_SHARED_MASK: invalid state. + * - non-present | SPTE_SHARED_MASK: shared mapping is allowed. + * may or may not be mapped as shar= ed. + * shared leaf spte entry + * - present: invalid state + * - non-present: private mapping is allowed. + * - present | SPTE_SHARED_MASK: shared mapping is allowed (already mapped) + * - non-present | SPTE_SHARED_MASK: shared mapping is allowed. 
+ * + * state change of private spte: + * map_gpa(private): + * private EPT entry: clear SPTE_SHARED_MASK + * present: nop + * non-present: nop + * non-present | SPTE_SHARED_MASK -> non-present + * share EPT entry: zap and clear SPTE_SHARED_MASK + * any -> non-present + * map_gpa(shared): + * private EPT entry: zap and set SPTE_SHARED_MASK + * present -> non-present | SPTE_SHARED_MASK + * non-present -> non-present | SPTE_SHARED_MASK + * non-present | SPTE_SHARED_MASK: nop + * shared EPT entry: set SPTE_SHARED_MASK + * present | SPTE_SHARED_MASK: nop + * non-present -> non-present | SPTE_SHARED_MASK + * non-present | SPTE_SHARED_MASK: nop + * map(private GPA): + * private EPT entry: try to populate + * present: nop + * non-present -> present + * non-present | SPTE_SHARED_MASK: nop. looping on EPT violation + * shared EPT entry: nop + * map(shared GPA): + * private EPT entry: nop + * shared EPT entry: populate + * present | SPTE_SHARED_MASK: nop + * non-present | SPTE_SHARED_MASK -> present | SPTE_SHARED_MASK + * non-present: nop. looping on EPT violation + * zap(private GPA): + * private EPT entry: zap and keep SPTE_SHARED_MASK + * present | SPTE_SHARED_MASK -> non-present | SPTE_SHARED_MASK + * non-present: nop as is_shadow_prsent_pte() is checked + * non-present | SPTE_SHARED_MASK: nop by is_shadow_present_pte() + * shared EPT entry: nop + * zap(shared GPA): + * private EPT entry: nop + * shared EPT entry: zap and keep SPTE_SHARED_MASK + * present | SPTE_SHARED_MASK -> non-present | SPTE_SHARED_MASK + * non-present | SPTE_SHARED_MASK: nop + * non-present: nop. + */ +int kvm_tdp_mmu_map_gpa(struct kvm_vcpu *vcpu, + gfn_t *startp, gfn_t end, bool allow_private) +{ + struct kvm *kvm =3D vcpu->kvm; + struct kvm_mmu *mmu =3D vcpu->arch.mmu; + gfn_t start =3D *startp; + gfn_t next; + int ret =3D 0; + + lockdep_assert_held_write(&kvm->mmu_lock); + WARN_ON(start & kvm_gfn_shared_mask(kvm)); + WARN_ON(end & kvm_gfn_shared_mask(kvm)); + + if (!VALID_PAGE(mmu->root.hpa) || !VALID_PAGE(mmu->private_root_hpa)) + return -EINVAL; + + next =3D end; + ret =3D kvm_tdp_mmu_update_range( + vcpu, false, kvm_gfn_shared(kvm, start), kvm_gfn_shared(kvm, end), + &next, kvm_tdp_mmu_update_shared_spte, allow_private); + if (ret) { + kvm_flush_remote_tlbs_with_address(kvm, start, next - start); + return ret; + } + + ret =3D kvm_tdp_mmu_update_range( + vcpu, true, kvm_gfn_private(kvm, start), kvm_gfn_private(kvm, end), + &next, kvm_tdp_mmu_update_private_spte, allow_private); + if (ret =3D=3D -EAGAIN) { + *startp =3D next; + end =3D *startp; + } + kvm_flush_remote_tlbs_with_address(kvm, start, end - start); + return ret; +} + /* * Return the level of the lowest level SPTE added to sptes. * That SPTE may be non-present. 
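The SPTE_SHARED_MASK state-transition table in the comment above kvm_tdp_mmu_map_gpa() is dense, so here is a self-contained user-space model of just the two map_gpa transitions that can be compiled and run as a sanity check. It is illustrative only: 0 stands in for SHADOW_NONPRESENT_VALUE, PRESENT stands in for SPTE_MMU_PRESENT_MASK, and all of the real zapping and TLB work is abstracted away.

#include <stdint.h>
#include <stdio.h>

#define SPTE_SHARED_MASK (1ULL << 62)
#define PRESENT          (1ULL << 11)	/* stand-in for SPTE_MMU_PRESENT_MASK */

/* map_gpa(private): clear SPTE_SHARED_MASK; the shared entry is zapped. */
static void map_gpa_private(uint64_t *priv, uint64_t *shared)
{
	*priv &= ~SPTE_SHARED_MASK;	/* present and non-present are both nops */
	*shared = 0;			/* any state -> non-present */
}

/* map_gpa(shared): zap the private entry and tag both with the mask. */
static void map_gpa_shared(uint64_t *priv, uint64_t *shared)
{
	*priv = SPTE_SHARED_MASK;	/* any state -> non-present | SHARED */
	*shared |= SPTE_SHARED_MASK;	/* keeps an existing shared mapping */
}

int main(void)
{
	uint64_t priv = PRESENT, shared = 0;	/* GFN starts out mapped private */

	map_gpa_shared(&priv, &shared);
	printf("shared allowed:  %d\n", !!(priv & SPTE_SHARED_MASK));	/* 1 */

	map_gpa_private(&priv, &shared);
	printf("private allowed: %d\n", !(priv & SPTE_SHARED_MASK));	/* 1 */
	return 0;
}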
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index d1655571eb2f..4d1c27911134 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -51,6 +51,11 @@ void kvm_tdp_mmu_try_split_huge_pages(struct kvm *kvm, gfn_t start, gfn_t end, int target_level, bool shared); =20 +int kvm_tdp_mmu_populate_nonleaf(struct kvm_vcpu *vcpu, gfn_t start, gfn_t= end, + bool is_private, bool shared); +int kvm_tdp_mmu_map_gpa(struct kvm_vcpu *vcpu, + gfn_t *startp, gfn_t end, bool allow_private); + static inline void kvm_tdp_mmu_walk_lockless_begin(void) { rcu_read_lock(); --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 33C43C433EF for ; Thu, 5 May 2022 18:22:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383823AbiEES0e (ORCPT ); Thu, 5 May 2022 14:26:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36324 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383229AbiEESTm (ORCPT ); Thu, 5 May 2022 14:19:42 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C5453562CD; Thu, 5 May 2022 11:15:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774554; x=1683310554; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=r2/5ldw5pVUVlMl5ZifuzWjDJnhnIJbR7f7gWYsK9LU=; b=eeGXkpLOfpsQKkghs9OP1ULd2s3qnyTwynBe6Pgx2tfsxvKsq8UzzwZL q5WzZvfPZbpZ9k6DSu3zOir5UAsdj/RkKTacnsBrTKfVDqhRjnsddR8Rz vBmzigo1NoD87vfoVn1GfNcb+IcLmADUdg9c2jhjVb3u/JrsHxp7FMjmB r1i5oO3D4/z/x0/jlMgUESgPV4J28sADrhozoBNEGgNqDGIMOxgydj3Ez xwqaPcs0+6ekva0RSfl0hNRvHXVttukzageUtZU4dx9p4zcInoMNxD5UU 2KRPMD7auPv70FrGKxV1Iknf0fFtWpUrFq21ItoFsAHpzOdPWj36oGa40 A==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="268354849" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="268354849" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:49 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083358" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:49 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 059/104] KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX Date: Thu, 5 May 2022 11:14:53 -0700 Message-Id: <8ef4dd02cdbdbee00064c35c22b753ef32a20c90.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Introduce a helper to directly (pun intended) fault-in a TDP page without having to go through the full page fault path. This allows TDX to get the resulting pfn and also allows the RET_PF_* enums to stay in mmu.c where they belong. 
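For context, a TDX caller is expected to use the helper roughly as follows. This is an illustrative sketch only (the actual caller arrives in a later patch, which also defines TDX_SEPT_PFERR as PFERR_WRITE_MASK), with error handling trimmed:

	kvm_pfn_t pfn;

	/* Pre-fault one private GPA and retrieve the backing PFN. */
	pfn = kvm_mmu_map_tdp_page(vcpu, gpa, PFERR_WRITE_MASK, PG_LEVEL_4K);
	if (is_error_noslot_pfn(pfn) || kvm_is_reserved_pfn(pfn))
		return -EFAULT;	/* the fault could not be fixed */

	/* pfn now backs gpa and can be handed to TDH.MEM.PAGE.ADD. */

Success or failure is communicated purely through the returned pfn, which is what lets the RET_PF_* enums stay internal to mmu.c.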
Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu.h | 3 +++ arch/x86/kvm/mmu/mmu.c | 39 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 42 insertions(+) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index beff084d6cd3..6606f790ae0b 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -254,6 +254,9 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu= *vcpu, gpa_t cr2_or_gpa, return vcpu->arch.mmu->page_fault(vcpu, &fault); } =20 +kvm_pfn_t kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa, + u32 error_code, int max_level); + /* * Check if a given access (described through the I/D, W/R and U/S bits of= a * page fault error code pfec) causes a permission fault with the given PTE diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 497e2b9e58cc..643b33c75ae9 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4276,6 +4276,45 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct= kvm_page_fault *fault) return direct_page_fault(vcpu, fault); } =20 +kvm_pfn_t kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa, + u32 error_code, int max_level) +{ + int r; + struct kvm_page_fault fault =3D (struct kvm_page_fault) { + .addr =3D gpa, + .error_code =3D error_code, + .exec =3D error_code & PFERR_FETCH_MASK, + .write =3D error_code & PFERR_WRITE_MASK, + .present =3D error_code & PFERR_PRESENT_MASK, + .rsvd =3D error_code & PFERR_RSVD_MASK, + .user =3D error_code & PFERR_USER_MASK, + .prefetch =3D false, + .is_tdp =3D true, + .nx_huge_page_workaround_enabled =3D is_nx_huge_page_enabled(), + .is_private =3D kvm_is_private_gpa(vcpu->kvm, gpa), + }; + + if (mmu_topup_memory_caches(vcpu, false)) + return KVM_PFN_ERR_FAULT; + + /* + * Loop on the page fault path to handle the case where an mmu_notifier + * invalidation triggers RET_PF_RETRY. In the normal page fault path, + * KVM needs to resume the guest in case the invalidation changed any + * of the page fault properties, i.e. the gpa or error code. For this + * path, the gpa and error code are fixed by the caller, and the caller + * expects failure if and only if the page fault can't be fixed. 
+ */ + do { + fault.max_level =3D max_level; + fault.req_level =3D PG_LEVEL_4K; + fault.goal_level =3D PG_LEVEL_4K; + r =3D direct_page_fault(vcpu, &fault); + } while (r =3D=3D RET_PF_RETRY && !is_error_noslot_pfn(fault.pfn)); + return fault.pfn; +} +EXPORT_SYMBOL_GPL(kvm_mmu_map_tdp_page); + static void nonpaging_init_context(struct kvm_mmu *context) { context->page_fault =3D nonpaging_page_fault; --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 96A7BC433F5 for ; Thu, 5 May 2022 18:22:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384031AbiEES0Y (ORCPT ); Thu, 5 May 2022 14:26:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36398 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383230AbiEESTm (ORCPT ); Thu, 5 May 2022 14:19:42 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DFC535D644; Thu, 5 May 2022 11:15:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774554; x=1683310554; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=yzsW3HP34uHnzDykzPATauL2DDgMbwFkjBwFhJu1DX0=; b=S/1dSvFKJGer6u5GdJkEKG6eQbmZDEAdcpTFjnazSWkCsFENWs43aa5d aj65dXXcQkg4LLPZtju9wbbZK+y/TOJssaw92YX7MBwNUZW6XzLxCDBf8 aOn8s+Q5uPOr9HYAaCtRRischm7hjPNux5zHSduRQ3qml4Wx8dHHCryXm 3X/7+BRtpXN+yDNpmIdHeQWW0LRtYDx1IMG1xt9+umc7FjHziLAqFk39O W5C4nHmT/wOATbQXrSXCuMonc9qY82YpLHDJlDChwYvRqMVSnqF3Ort3G tt0tJtN21T6Qfma2VG+Lz84WvmjsZlz3170b9wfP1ZIBF+uMwS0Ee6paP w==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="268354850" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="268354850" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:49 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083363" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:49 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 060/104] [MARKER] The start of TDX KVM patch series: TD finalization Date: Thu, 5 May 2022 11:14:54 -0700 Message-Id: <8627d70daf0e76bcfcc473a866366d22f9b088e3.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This empty commit is to mark the start of patch series of TD finalization. 
Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/intel-tdx-layer-status.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentat= ion/virt/kvm/intel-tdx-layer-status.rst index 5797d172176d..53897312699f 100644 --- a/Documentation/virt/kvm/intel-tdx-layer-status.rst +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -21,11 +21,11 @@ Patch Layer status * TD VM creation/destruction: Applied * TD vcpu creation/destruction: Applied * TDX EPT violation: Applied -* TD finalization: Not yet +* TD finalization: Applying * TD vcpu enter/exit: Not yet * TD vcpu interrupts/exit/hypercall: Not yet =20 * KVM MMU GPA shared bits: Applied * KVM TDP refactoring for TDX: Applied * KVM TDP MMU hooks: Applied -* KVM TDP MMU MapGPA: Not yet +* KVM TDP MMU MapGPA: Applied --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 89DE4C433FE for ; Thu, 5 May 2022 18:21:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1356252AbiEESZN (ORCPT ); Thu, 5 May 2022 14:25:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36456 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383276AbiEESTo (ORCPT ); Thu, 5 May 2022 14:19:44 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AEF645DA65; Thu, 5 May 2022 11:15:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774558; x=1683310558; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=6GH2JR5FupkC9izGlie6FMvLYsxUPH1D0aZ4F5r5lMs=; b=bMcy2QUbM2eg2oiAcPU1pw2AjOOdrF17eLRrB+AO38S4vXichZg9hFsE tK/jTtbmdBntOeiHNTVbEdg2bbDvrkwfYrCupo++HUNP5o793r7PgUHx4 UfpR7t8czY7rFOkf9UdAj78S1e6FUHmiymzBNocX3d6t1BZmsrZbqy40z jTDUhLiR8xP9BZ19syJqKNUJ0mD8EBt2cxjY/BiaXZZ9PlqT0oqyoc7Ll Kf6kAWmHGeFpKjXK/1QG7TOlTOP/y4psF4rQ1DqtbjUlqm2yUMa1Lpg8p PRT9VfluSvUB31w0H7OZZz8auK0On7InjsF3AzewwdhjKe7GlxI8AvODJ g==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="268354852" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="268354852" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:49 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083366" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:49 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 061/104] KVM: TDX: Create initial guest memory Date: Thu, 5 May 2022 11:14:55 -0700 Message-Id: <70ed041fd47c1f7571aa259450b3f9244edda48d.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Because the guest 
Because the guest memory is protected in TDX, creating the initial guest
memory requires a dedicated TDX module API, tdh_mem_page_add, instead of
directly copying the memory contents into guest memory as is done for the
default VM type.  The KVM MMU page fault handler callback,
private_page_add, handles this.

Define a new subcommand, KVM_TDX_INIT_MEM_REGION, of the VM-scoped
KVM_MEMORY_ENCRYPT_OP ioctl.  It assigns the guest page, copies the
initial memory contents into the guest memory, and encrypts the guest
memory.  Optionally, it also extends the memory measurement of the TDX
guest.  It calls the KVM MMU page fault (EPT-violation) handler to
trigger the callbacks for this.

Signed-off-by: Isaku Yamahata
---
 arch/x86/include/uapi/asm/kvm.h | 9 ++
 arch/x86/kvm/mmu/mmu.c | 1 +
 arch/x86/kvm/vmx/tdx.c | 135 +++++++++++++++++++++++++-
 arch/x86/kvm/vmx/tdx.h | 2 +
 tools/arch/x86/include/uapi/asm/kvm.h | 9 ++
 5 files changed, 155 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 6b1c3e0e9a3c..3e919b3c7d7b 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -534,6 +534,7 @@ enum kvm_tdx_cmd_id {
 	KVM_TDX_CAPABILITIES = 0,
 	KVM_TDX_INIT_VM,
 	KVM_TDX_INIT_VCPU,
+	KVM_TDX_INIT_MEM_REGION,

 	KVM_TDX_CMD_NR_MAX,
 };
@@ -611,4 +612,12 @@ struct kvm_tdx_init_vm {
 	};
 };

+#define KVM_TDX_MEASURE_MEMORY_REGION	(1UL << 0)
+
+struct kvm_tdx_init_mem_region {
+	__u64 source_addr;
+	__u64 gpa;
+	__u64 nr_pages;
+};
+
 #endif /* _ASM_X86_KVM_H */

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 643b33c75ae9..899dc8466a93 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5283,6 +5283,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 out:
 	return r;
 }
+EXPORT_SYMBOL(kvm_mmu_load);

 void kvm_mmu_unload(struct kvm_vcpu *vcpu)
 {

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 7f1eb75bb79d..3981cd509686 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -555,6 +555,21 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
 	td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa & PAGE_MASK);
 }

+static void tdx_measure_page(struct kvm_tdx *kvm_tdx, hpa_t gpa)
+{
+	struct tdx_module_output out;
+	u64 err;
+	int i;
+
+	for (i = 0; i < PAGE_SIZE; i += TDX_EXTENDMR_CHUNKSIZE) {
+		err = tdh_mr_extend(kvm_tdx->tdr.pa, gpa + i, &out);
+		if (KVM_BUG_ON(err, &kvm_tdx->kvm)) {
+			pr_tdx_error(TDH_MR_EXTEND, err, &out);
+			break;
+		}
+	}
+}
+
 static void tdx_unpin_pfn(struct kvm *kvm, kvm_pfn_t pfn)
 {
 	struct page *page = pfn_to_page(pfn);
@@ -571,6 +586,7 @@ static void __tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 	hpa_t hpa = pfn_to_hpa(pfn);
 	gpa_t gpa = gfn_to_gpa(gfn);
 	struct tdx_module_output out;
+	hpa_t source_pa;
 	u64 err;

 	if (WARN_ON_ONCE(is_error_noslot_pfn(pfn) || kvm_is_reserved_pfn(pfn)))
@@ -585,12 +601,38 @@ static void __tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 	 * See kvm_faultin_pfn_private() and kvm_mmu_release_fault().
 	 */

+	/* Build-time faults are induced and handled via TDH_MEM_PAGE_ADD. */
 	if (likely(is_td_finalized(kvm_tdx))) {
 		err = tdh_mem_page_aug(kvm_tdx->tdr.pa, gpa, hpa, &out);
-		if (KVM_BUG_ON(err, kvm))
+		if (KVM_BUG_ON(err, kvm)) {
 			pr_tdx_error(TDH_MEM_PAGE_AUG, err, &out);
+			tdx_unpin_pfn(kvm, pfn);
+		}
 		return;
 	}
+
+	/*
+	 * In the TDP MMU case, the fault handler can run concurrently.
+	 * Note that 'source_pa' is a TD-scope variable, so if multiple
+	 * threads reached here and all needed to access 'source_pa', it
+	 * would break.  Fortunately this cannot happen, because the
+	 * TDH_MEM_PAGE_ADD code path below is only used while the VM is
+	 * being created, before it runs, via the KVM_TDX_INIT_MEM_REGION
+	 * ioctl (which always uses vcpu 0's page table and is protected
+	 * by vcpu->mutex).
+	 */
+	if (KVM_BUG_ON(kvm_tdx->source_pa == INVALID_PAGE, kvm))
+		return;
+
+	source_pa = kvm_tdx->source_pa & ~KVM_TDX_MEASURE_MEMORY_REGION;
+
+	err = tdh_mem_page_add(kvm_tdx->tdr.pa, gpa, hpa, source_pa, &out);
+	if (KVM_BUG_ON(err, kvm)) {
+		pr_tdx_error(TDH_MEM_PAGE_ADD, err, &out);
+		tdx_unpin_pfn(kvm, pfn);
+	} else if ((kvm_tdx->source_pa & KVM_TDX_MEASURE_MEMORY_REGION))
+		tdx_measure_page(kvm_tdx, gpa);
+
+	kvm_tdx->source_pa = INVALID_PAGE;
 }

 static void tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
@@ -1098,6 +1140,94 @@ void tdx_flush_tlb(struct kvm_vcpu *vcpu)
 		cpu_relax();
 }

+#define TDX_SEPT_PFERR	PFERR_WRITE_MASK
+
+static int tdx_init_mem_region(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
+{
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+	struct kvm_tdx_init_mem_region region;
+	struct kvm_vcpu *vcpu;
+	struct page *page;
+	kvm_pfn_t pfn;
+	int idx, ret = 0;
+
+	/* The BSP vCPU must be created before initializing memory regions. */
+	if (!atomic_read(&kvm->online_vcpus))
+		return -EINVAL;
+
+	if (cmd->flags & ~KVM_TDX_MEASURE_MEMORY_REGION)
+		return -EINVAL;
+
+	if (copy_from_user(&region, (void __user *)cmd->data, sizeof(region)))
+		return -EFAULT;
+
+	/* Sanity check */
+	if (!IS_ALIGNED(region.source_addr, PAGE_SIZE) ||
+	    !IS_ALIGNED(region.gpa, PAGE_SIZE) ||
+	    !region.nr_pages ||
+	    region.gpa + (region.nr_pages << PAGE_SHIFT) <= region.gpa ||
+	    !kvm_is_private_gpa(kvm, region.gpa) ||
+	    !kvm_is_private_gpa(kvm, region.gpa + (region.nr_pages << PAGE_SHIFT)))
+		return -EINVAL;
+
+	vcpu = kvm_get_vcpu(kvm, 0);
+	if (mutex_lock_killable(&vcpu->mutex))
+		return -EINTR;
+
+	vcpu_load(vcpu);
+	idx = srcu_read_lock(&kvm->srcu);
+
+	kvm_mmu_reload(vcpu);
+
+	while (region.nr_pages) {
+		if (signal_pending(current)) {
+			ret = -ERESTARTSYS;
+			break;
+		}
+
+		if (need_resched())
+			cond_resched();
+
+		/* Pin the source page. */
+		ret = get_user_pages_fast(region.source_addr, 1, 0, &page);
+		if (ret < 0)
+			break;
+		if (ret != 1) {
+			ret = -ENOMEM;
+			break;
+		}
+
+		kvm_tdx->source_pa = pfn_to_hpa(page_to_pfn(page)) |
+				     (cmd->flags & KVM_TDX_MEASURE_MEMORY_REGION);
+
+		pfn = kvm_mmu_map_tdp_page(vcpu, region.gpa, TDX_SEPT_PFERR,
+					   PG_LEVEL_4K);
+		if (is_error_noslot_pfn(pfn) || kvm->vm_bugged)
+			ret = -EFAULT;
+		else
+			ret = 0;
+
+		put_page(page);
+		if (ret)
+			break;
+
+		region.source_addr += PAGE_SIZE;
+		region.gpa += PAGE_SIZE;
+		region.nr_pages--;
+	}
+
+	srcu_read_unlock(&kvm->srcu, idx);
+	vcpu_put(vcpu);
+
+	mutex_unlock(&vcpu->mutex);
+
+	if (copy_to_user((void __user *)cmd->data, &region, sizeof(region)))
+		ret = -EFAULT;
+
+	return ret;
+}
+
 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_tdx_cmd tdx_cmd;
@@ -1114,6 +1244,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
 	case KVM_TDX_INIT_VM:
 		r = tdx_td_init(kvm, &tdx_cmd);
 		break;
+	case KVM_TDX_INIT_MEM_REGION:
+		r = tdx_init_mem_region(kvm, &tdx_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;

diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index d8dcbedd690b..29e7accee733 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -25,6 +25,8 @@ struct kvm_tdx {
 	u64 xfam;
 	int hkid;

+	hpa_t source_pa;
+
 	bool finalized;
 	atomic_t tdh_mem_track;

diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include/uapi/asm/kvm.h
index 60a79f9ef174..af39f3adc179 100644
--- a/tools/arch/x86/include/uapi/asm/kvm.h
+++ b/tools/arch/x86/include/uapi/asm/kvm.h
@@ -533,6 +533,7 @@ enum kvm_tdx_cmd_id {
 	KVM_TDX_CAPABILITIES = 0,
 	KVM_TDX_INIT_VM,
 	KVM_TDX_INIT_VCPU,
+	KVM_TDX_INIT_MEM_REGION,

 	KVM_TDX_CMD_NR_MAX,
 };
@@ -610,4 +611,12 @@ struct kvm_tdx_init_vm {
 	};
 };

+#define KVM_TDX_MEASURE_MEMORY_REGION	(1UL << 0)
+
+struct kvm_tdx_init_mem_region {
+	__u64 source_addr;
+	__u64 gpa;
+	__u64 nr_pages;
+};
+
 #endif /* _ASM_X86_KVM_H */
--
2.25.1
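For reference, a minimal userspace sketch of driving this new subcommand.
It assumes struct kvm_tdx_cmd carries the subcommand id, flags, and a data
pointer, as consumed by tdx_vm_ioctl() above (its exact layout is defined
earlier in the series); vm_fd, src, private_gpa, and image_size are
placeholder names:

    /* Hypothetical usage sketch; not code from the series. */
    struct kvm_tdx_init_mem_region region = {
    	.source_addr = (__u64)src,	/* page-aligned initial image */
    	.gpa = private_gpa,		/* page-aligned private GPA */
    	.nr_pages = image_size / 4096,
    };
    struct kvm_tdx_cmd cmd = {
    	.id = KVM_TDX_INIT_MEM_REGION,
    	.flags = KVM_TDX_MEASURE_MEMORY_REGION,	/* also extend the measurement */
    	.data = (__u64)&region,
    };

    if (ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd) < 0)
    	err(1, "KVM_TDX_INIT_MEM_REGION");

Note that the kernel copies the region back to userspace on exit, so an
interrupted caller can resume from the updated source_addr/gpa/nr_pages.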
E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="268354854" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:49 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083369" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:49 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 062/104] KVM: TDX: Finalize VM initialization Date: Thu, 5 May 2022 11:14:56 -0700 Message-Id: <8f5bd8677abfc01e8b02e6d6a72a14acbdf579e7.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata To protect the initial contents of the guest TD, the TDX module measures the guest TD during the build process as SHA-384 measurement. The measurement of the guest TD contents needs to be completed to make the guest TD ready to run. Add a new subcommand, KVM_TDX_FINALIZE_VM, for VM-scoped KVM_MEMORY_ENCRYPT_OP to finalize the measurement and mark the TDX VM ready to run. Signed-off-by: Isaku Yamahata --- arch/x86/include/uapi/asm/kvm.h | 1 + arch/x86/kvm/vmx/tdx.c | 21 +++++++++++++++++++++ tools/arch/x86/include/uapi/asm/kvm.h | 1 + 3 files changed, 23 insertions(+) diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kv= m.h index 3e919b3c7d7b..f058eab57c32 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -535,6 +535,7 @@ enum kvm_tdx_cmd_id { KVM_TDX_INIT_VM, KVM_TDX_INIT_VCPU, KVM_TDX_INIT_MEM_REGION, + KVM_TDX_FINALIZE_VM, =20 KVM_TDX_CMD_NR_MAX, }; diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 3981cd509686..78418e553017 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1228,6 +1228,24 @@ static int tdx_init_mem_region(struct kvm *kvm, stru= ct kvm_tdx_cmd *cmd) return ret; } =20 +static int tdx_td_finalizemr(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + u64 err; + + if (!is_td_initialized(kvm) || is_td_finalized(kvm_tdx)) + return -EINVAL; + + err =3D tdh_mr_finalize(kvm_tdx->tdr.pa); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MR_FINALIZE, err, NULL); + return -EIO; + } + + kvm_tdx->finalized =3D true; + return 0; +} + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { struct kvm_tdx_cmd tdx_cmd; @@ -1247,6 +1265,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) case KVM_TDX_INIT_MEM_REGION: r =3D tdx_init_mem_region(kvm, &tdx_cmd); break; + case KVM_TDX_FINALIZE_VM: + r =3D tdx_td_finalizemr(kvm); + break; default: r =3D -EINVAL; goto out; diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include= /uapi/asm/kvm.h index af39f3adc179..7f5eb5536ec5 100644 --- a/tools/arch/x86/include/uapi/asm/kvm.h +++ b/tools/arch/x86/include/uapi/asm/kvm.h @@ -534,6 +534,7 @@ enum kvm_tdx_cmd_id { KVM_TDX_INIT_VM, KVM_TDX_INIT_VCPU, KVM_TDX_INIT_MEM_REGION, + KVM_TDX_FINALIZE_VM, =20 KVM_TDX_CMD_NR_MAX, }; --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on 
From: isaku.yamahata@intel.com
Subject: [RFC PATCH v6 063/104] [MARKER] The start of TDX KVM patch series: TD vcpu enter/exit
Date: Thu, 5 May 2022 11:14:57 -0700

From: Isaku Yamahata

This empty commit is to mark the start of the patch series for TD vcpu
enter/exit.

Signed-off-by: Isaku Yamahata
---
 Documentation/virt/kvm/intel-tdx-layer-status.rst | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentation/virt/kvm/intel-tdx-layer-status.rst
index 53897312699f..b51e8e6b1541 100644
--- a/Documentation/virt/kvm/intel-tdx-layer-status.rst
+++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst
@@ -12,6 +12,7 @@ What qemu can do
 - Qemu can create/destroy guest of TDX vm type.
 - Qemu can create/destroy vcpu of TDX vm type.
 - Qemu can populate initial guest memory image.
+- Qemu can finalize guest TD.

 Patch Layer status
 ------------------
@@ -21,8 +22,8 @@ Patch Layer status
 * TD VM creation/destruction: Applied
 * TD vcpu creation/destruction: Applied
 * TDX EPT violation: Applied
-* TD finalization: Applying
-* TD vcpu enter/exit: Not yet
+* TD finalization: Applied
+* TD vcpu enter/exit: Applying
 * TD vcpu interrupts/exit/hypercall: Not yet

 * KVM MMU GPA shared bits: Applied
--
2.25.1

From: isaku.yamahata@intel.com
Subject: [RFC PATCH v6 064/104] KVM: TDX: Add helper assembly function to TDX vcpu
Date: Thu, 5 May 2022 11:14:58 -0700

From: Isaku Yamahata

TDX defines an API to run a TDX vcpu with its own ABI.  Define an assembly
helper function to run the TDX vcpu, hiding the special ABI so that C code
can call it with the normal function call ABI.
Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/vmenter.S | 146 +++++++++++++++++++++++++++++++++++++
 1 file changed, 146 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 435c187927c4..f655bcca0e93 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -2,6 +2,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -28,6 +29,13 @@
 #define VCPU_R15	__VCPU_REGS_R15 * WORD_SIZE
 #endif

+#ifdef CONFIG_INTEL_TDX_HOST
+#define TDENTER			0
+#define EXIT_REASON_TDCALL	77
+#define TDENTER_ERROR_BIT	63
+#define seamcall		.byte 0x66,0x0f,0x01,0xcf
+#endif
+
 .section .noinstr.text, "ax"

 /**
@@ -328,3 +336,141 @@ SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff)
 	pop %_ASM_BP
 	RET
 SYM_FUNC_END(vmx_do_interrupt_nmi_irqoff)
+
+#ifdef CONFIG_INTEL_TDX_HOST
+
+.pushsection .noinstr.text, "ax"
+
+/**
+ * __tdx_vcpu_run - Call SEAMCALL(TDENTER) to run a TD vcpu
+ * @tdvpr:	physical address of TDVPR
+ * @regs:	void * (to registers of TDVCPU)
+ * @gpr_mask:	non-zero if guest registers need to be loaded prior to TDENTER
+ *
+ * Returns:
+ *	TD-Exit Reason
+ *
+ * Note: KVM doesn't support using XMM in its hypercalls, it's the HyperV
+ * code's responsibility to save/restore XMM registers on TDVMCALL.
+ */
+SYM_FUNC_START(__tdx_vcpu_run)
+	push %rbp
+	mov  %rsp, %rbp
+
+	push %r15
+	push %r14
+	push %r13
+	push %r12
+	push %rbx
+
+	/* Save @regs, which is needed after TDENTER to capture output. */
+	push %rsi
+
+	/* Load @tdvpr to RCX */
+	mov %rdi, %rcx
+
+	/* No need to load guest GPRs if the last exit wasn't a TDVMCALL. */
+	test %dx, %dx
+	je 1f
+
+	/* Load @regs to RAX, which will be clobbered with $TDENTER anyway. */
+	mov %rsi, %rax
+
+	mov VCPU_RBX(%rax), %rbx
+	mov VCPU_RDX(%rax), %rdx
+	mov VCPU_RBP(%rax), %rbp
+	mov VCPU_RSI(%rax), %rsi
+	mov VCPU_RDI(%rax), %rdi
+
+	mov VCPU_R8 (%rax),  %r8
+	mov VCPU_R9 (%rax),  %r9
+	mov VCPU_R10(%rax), %r10
+	mov VCPU_R11(%rax), %r11
+	mov VCPU_R12(%rax), %r12
+	mov VCPU_R13(%rax), %r13
+	mov VCPU_R14(%rax), %r14
+	mov VCPU_R15(%rax), %r15
+
+	/* Load TDENTER to RAX.  This kills the @regs pointer! */
+1:	mov $TDENTER, %rax
+
+2:	seamcall
+
+	/* Skip to the exit path if TDENTER failed. */
+	bt $TDENTER_ERROR_BIT, %rax
+	jc 4f
+
+	/* Temporarily save the TD-Exit reason. */
+	push %rax
+
+	/* Check whether the TD-exit was due to a TDVMCALL. */
+	cmp $EXIT_REASON_TDCALL, %ax
+
+	/* Reload @regs to RAX. */
+	mov 8(%rsp), %rax
+
+	/* Jump on non-TDVMCALL */
+	jne 3f
+
+	/* Save all output from SEAMCALL(TDENTER) */
+	mov %rbx, VCPU_RBX(%rax)
+	mov %rbp, VCPU_RBP(%rax)
+	mov %rsi, VCPU_RSI(%rax)
+	mov %rdi, VCPU_RDI(%rax)
+	mov %r10, VCPU_R10(%rax)
+	mov %r11, VCPU_R11(%rax)
+	mov %r12, VCPU_R12(%rax)
+	mov %r13, VCPU_R13(%rax)
+	mov %r14, VCPU_R14(%rax)
+	mov %r15, VCPU_R15(%rax)
+
+3:	mov %rcx, VCPU_RCX(%rax)
+	mov %rdx, VCPU_RDX(%rax)
+	mov %r8,  VCPU_R8 (%rax)
+	mov %r9,  VCPU_R9 (%rax)
+
+	/*
+	 * Clear all general purpose registers except RSP and RAX to prevent
+	 * speculative use of the guest's values.
+	 */
+	xor %rbx, %rbx
+	xor %rcx, %rcx
+	xor %rdx, %rdx
+	xor %rsi, %rsi
+	xor %rdi, %rdi
+	xor %rbp, %rbp
+	xor %r8,  %r8
+	xor %r9,  %r9
+	xor %r10, %r10
+	xor %r11, %r11
+	xor %r12, %r12
+	xor %r13, %r13
+	xor %r14, %r14
+	xor %r15, %r15
+
+	/* Restore the TD-Exit reason to RAX for return. */
+	pop %rax
+
+	/* "POP" @regs. */
+4:	add $8, %rsp
+	pop %rbx
+	pop %r12
+	pop %r13
+	pop %r14
+	pop %r15
+
+	pop %rbp
+	ret
+
+5:	cmpb $0, kvm_rebooting
+	je 6f
+	mov $-EFAULT, %rax
+	jmp 4b
+6:	ud2
+	_ASM_EXTABLE(2b, 5b)
+
+SYM_FUNC_END(__tdx_vcpu_run)
+
+.popsection
+
+#endif
--
2.25.1
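On the C side, the next patch declares this helper as
u64 __tdx_vcpu_run(hpa_t tdvpr, void *regs, u32 regs_mask).  A sketch of
the register-level contract implemented above (the wrapper function is
illustrative only, not code from the series):

    /* RDI = @tdvpr (TDVPR physical address), RSI = @regs (guest GPR
     * array), DX != 0 => reload guest GPRs because the previous exit
     * was a TDVMCALL.  The return value is the TD-exit reason, with
     * bit 63 (TDENTER_ERROR_BIT) set if the SEAMCALL itself failed. */
    u64 __tdx_vcpu_run(hpa_t tdvpr, void *regs, u32 regs_mask);

    static u64 run_td_vcpu_sketch(struct vcpu_tdx *tdx, struct kvm_vcpu *vcpu)
    {
    	u64 exit_reason = __tdx_vcpu_run(tdx->tdvpr.pa, vcpu->arch.regs, 0);

    	if (exit_reason & BIT_ULL(63))	/* TDENTER_ERROR_BIT */
    		pr_warn("TDENTER failed: 0x%llx\n", exit_reason);
    	return exit_reason;
    }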
From: isaku.yamahata@intel.com
Subject: [RFC PATCH v6 065/104] KVM: TDX: Implement TDX vcpu enter/exit path
Date: Thu, 5 May 2022 11:14:59 -0700
Message-Id: <085a3bf1cce60b853d984df4426463c97aec63e8.1651774250.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

This patch implements running the TDX vcpu.  Once a vcpu runs on a logical
processor (LP), the TDX vcpu is associated with it.  When the TDX vcpu
moves to another LP, its state on the old LP needs to be flushed.  When
destroying a TDX vcpu, the flush must be completed and the CPU caches
flushed.  Track which LP the TDX vcpu runs on and flush as necessary.

Do nothing on the sched_in event, as TDX doesn't support pause-loop
exiting.

TDX vcpu execution requires restoring the PMU debug store after returning
back to KVM, because the TDX module unconditionally resets the value.  To
reuse the existing code, export perf_restore_debug_store.

Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/main.c | 21 +++++++++++++++++++--
 arch/x86/kvm/vmx/tdx.c | 32 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/tdx.h | 33 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/x86_ops.h | 2 ++
 arch/x86/kvm/x86.c | 1 +
 5 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 442d89e02459..099842a8a397 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -110,6 +110,23 @@ static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	return vmx_vcpu_reset(vcpu, init_event);
 }

+static int vt_vcpu_pre_run(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		/* Unconditionally continue to vcpu_run(). */
+		return 1;
+
+	return vmx_vcpu_pre_run(vcpu);
+}
+
+static fastpath_t vt_vcpu_run(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return tdx_vcpu_run(vcpu);
+
+	return vmx_vcpu_run(vcpu);
+}
+
 static void vt_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
 	if (is_td_vcpu(vcpu))
@@ -222,8 +239,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.flush_tlb_gva = vt_flush_tlb_gva,
 	.flush_tlb_guest = vt_flush_tlb_guest,

-	.vcpu_pre_run = vmx_vcpu_pre_run,
-	.vcpu_run = vmx_vcpu_run,
+	.vcpu_pre_run = vt_vcpu_pre_run,
+	.vcpu_run = vt_vcpu_run,
 	.handle_exit = vmx_handle_exit,
 	.skip_emulated_instruction = vmx_skip_emulated_instruction,
 	.update_emulated_instruction = vmx_update_emulated_instruction,

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 7f1eb75bb79d..4c69fe7c2ee5 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -10,6 +10,9 @@
 #include "vmx.h"
 #include "x86.h"

+#include
+#include "trace.h"
+
 #undef pr_fmt
 #define pr_fmt(fmt) "tdx: " fmt
@@ -550,6 +553,35 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vcpu->kvm->vm_bugged = true;
 }

+u64 __tdx_vcpu_run(hpa_t tdvpr, void *regs, u32 regs_mask);
+
+static noinstr void tdx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
+					struct vcpu_tdx *tdx)
+{
+	guest_enter_irqoff();
+	tdx->exit_reason.full = __tdx_vcpu_run(tdx->tdvpr.pa, vcpu->arch.regs, 0);
+	guest_exit_irqoff();
+}
+
+fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+	if (unlikely(vcpu->kvm->vm_bugged)) {
+		tdx->exit_reason.full = TDX_NON_RECOVERABLE_VCPU;
+		return EXIT_FASTPATH_NONE;
+	}
+
+	trace_kvm_entry(vcpu);
+
+	tdx_vcpu_enter_exit(vcpu, tdx);
+
+	vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
+	trace_kvm_exit(vcpu, KVM_ISA_VMX);
+
+	return EXIT_FASTPATH_NONE;
+}
+
 void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
 {
 	td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa & PAGE_MASK);

diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 29e7accee733..f90f83b22d25 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -48,12 +48,45 @@ struct kvm_tdx {
 	spinlock_t seamcall_lock;
 };

+union tdx_exit_reason {
+	struct {
+		/* 31:0 mirror the VMX Exit Reason format */
+		u64 basic		: 16;
+		u64 reserved16		: 1;
+		u64 reserved17		: 1;
+		u64 reserved18		: 1;
+		u64 reserved19		: 1;
+		u64 reserved20		: 1;
+		u64 reserved21		: 1;
+		u64 reserved22		: 1;
+		u64 reserved23		: 1;
+		u64 reserved24		: 1;
+		u64 reserved25		: 1;
+		u64 bus_lock_detected	: 1;
+		u64 enclave_mode	: 1;
+		u64 smi_pending_mtf	: 1;
+		u64 smi_from_vmx_root	: 1;
+		u64 reserved30		: 1;
+		u64 failed_vmentry	: 1;
+
+		/* 63:32 are TDX specific */
+		u64 details_l1		: 8;
+		u64 class		: 8;
+		u64 reserved61_48	: 14;
+		u64 non_recoverable	: 1;
+		u64 error		: 1;
+	};
+	u64 full;
+};
+
 struct vcpu_tdx {
 	struct kvm_vcpu	vcpu;

 	struct tdx_td_page tdvpr;
 	struct tdx_td_page *tdvpx;

+	union tdx_exit_reason exit_reason;
+
 	bool initialized;

 	/*

diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 1e63158081a5..96e7d853f93f 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -141,6 +141,7 @@ void tdx_vm_free(struct kvm *kvm);
 int tdx_vcpu_create(struct kvm_vcpu *vcpu);
 void tdx_vcpu_free(struct kvm_vcpu *vcpu);
 void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
+fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu);

 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
@@ -161,6 +162,7 @@ static inline void tdx_vm_free(struct kvm *kvm) {}
 static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTSUPP; }
 static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {}
 static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {}
+static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) { return EXIT_FASTPATH_NONE; }

 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d2dc4333f493..b4e79b55ede2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -304,6 +304,7 @@ const struct kvm_stats_header kvm_vcpu_stats_header = {
 };

 u64 __read_mostly host_xcr0;
+EXPORT_SYMBOL_GPL(host_xcr0);
 u64 __read_mostly supported_xcr0;
 EXPORT_SYMBOL_GPL(supported_xcr0);
--
2.25.1
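Because bits 31:0 of union tdx_exit_reason mirror the VMX exit-reason
format, existing EXIT_REASON_* constants apply directly to the basic
field, while the TDX-specific class/error information lives in bits 63:32.
An illustrative classifier (not a function in the series):

    static bool td_exit_is_fatal(union tdx_exit_reason er)
    {
    	/* SEAMCALL-level or non-recoverable TDX failures (bits 63:32) */
    	if (er.error || er.non_recoverable)
    		return true;

    	/* Bits 31:0 follow VMX, so e.g. er.basic == EXIT_REASON_TDCALL
    	 * identifies a TDVMCALL-triggered exit. */
    	return er.failed_vmentry;
    }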
d="scan'208";a="665083381" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:50 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 066/104] KVM: TDX: vcpu_run: save/restore host state(host kernel gs) Date: Thu, 5 May 2022 11:15:00 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata On entering/exiting TDX vcpu, Preserved or clobbered CPU state is different from VMX case. Add TDX hooks to save/restore host/guest CPU state. Save/restore kernel GS base MSR. Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/vmx/main.c | 28 +++++++++++++++++++++++++-- arch/x86/kvm/vmx/tdx.c | 39 ++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/tdx.h | 4 ++++ arch/x86/kvm/vmx/x86_ops.h | 4 ++++ 4 files changed, 73 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 099842a8a397..f101f358d90c 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -110,6 +110,30 @@ static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool = init_event) return vmx_vcpu_reset(vcpu, init_event); } =20 +static void vt_prepare_switch_to_guest(struct kvm_vcpu *vcpu) +{ + /* + * All host state is saved/restored across SEAMCALL/SEAMRET, and the + * guest state of a TD is obviously off limits. Deferring MSRs and DRs + * is pointless because the TDX module needs to load *something* so as + * not to expose guest state. 
+	 */
+	if (is_td_vcpu(vcpu)) {
+		tdx_prepare_switch_to_guest(vcpu);
+		return;
+	}
+
+	vmx_prepare_switch_to_guest(vcpu);
+}
+
+static void vt_vcpu_put(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return tdx_vcpu_put(vcpu);
+
+	return vmx_vcpu_put(vcpu);
+}
+
 static int vt_vcpu_pre_run(struct kvm_vcpu *vcpu)
 {
 	if (is_td_vcpu(vcpu))
@@ -206,9 +230,9 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.vcpu_free = vt_vcpu_free,
 	.vcpu_reset = vt_vcpu_reset,

-	.prepare_switch_to_guest = vmx_prepare_switch_to_guest,
+	.prepare_switch_to_guest = vt_prepare_switch_to_guest,
 	.vcpu_load = vmx_vcpu_load,
-	.vcpu_put = vmx_vcpu_put,
+	.vcpu_put = vt_vcpu_put,

 	.update_exception_bitmap = vmx_update_exception_bitmap,
 	.get_msr_feature = vmx_get_msr_feature,

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 4c69fe7c2ee5..1aa52a093764 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include
+#include

 #include
@@ -461,6 +462,9 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 	vcpu->arch.guest_state_protected =
 		!(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTRIBUTE_DEBUG);

+	tdx->host_state_need_save = true;
+	tdx->host_state_need_restore = false;
+
 	return 0;

 free_tdvpx:
@@ -474,6 +478,39 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 	return ret;
 }

+void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+	if (!tdx->host_state_need_save)
+		return;
+
+	if (likely(is_64bit_mm(current->mm)))
+		tdx->msr_host_kernel_gs_base = current->thread.gsbase;
+	else
+		tdx->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
+
+	tdx->host_state_need_save = false;
+}
+
+static void tdx_prepare_switch_to_host(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+	tdx->host_state_need_save = true;
+	if (!tdx->host_state_need_restore)
+		return;
+
+	wrmsrl(MSR_KERNEL_GS_BASE, tdx->msr_host_kernel_gs_base);
+	tdx->host_state_need_restore = false;
+}
+
+void tdx_vcpu_put(struct kvm_vcpu *vcpu)
+{
+	vmx_vcpu_pi_put(vcpu);
+	tdx_prepare_switch_to_host(vcpu);
+}
+
 void tdx_vcpu_free(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_tdx *tdx = to_tdx(vcpu);
@@ -576,6 +613,8 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu)

 	tdx_vcpu_enter_exit(vcpu, tdx);

+	tdx->host_state_need_restore = true;
+
 	vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
 	trace_kvm_exit(vcpu, KVM_ISA_VMX);

diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index f90f83b22d25..414c15235ed0 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -89,6 +89,10 @@ struct vcpu_tdx {

 	bool initialized;

+	bool host_state_need_save;
+	bool host_state_need_restore;
+	u64 msr_host_kernel_gs_base;
+
 	/*
 	 * Dummy to make pmu_intel not corrupt memory.
 	 * TODO: Support PMU for TDX.  Future work.
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 96e7d853f93f..a2deb42794c0 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -142,6 +142,8 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu);
 void tdx_vcpu_free(struct kvm_vcpu *vcpu);
 void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
 fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu);
+void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
+void tdx_vcpu_put(struct kvm_vcpu *vcpu);

 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
@@ -163,6 +165,8 @@ static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTSUPP; }
 static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {}
 static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {}
 static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) { return EXIT_FASTPATH_NONE; }
+static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {}
+static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {}

 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
--
2.25.1
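The two flags added to struct vcpu_tdx encode a small lazy save/restore
protocol for MSR_KERNEL_GS_BASE.  A reduced model of that state machine,
with the kernel specifics stripped out (toy code, not from the series):

    struct gsbase_lazy {
    	bool need_save;		/* snapshot host value on next vcpu_load */
    	bool need_restore;	/* a TD entry happened; MSR is stale */
    	u64 host_gsbase;
    };

    static void model_vcpu_load(struct gsbase_lazy *s, u64 cur_gsbase)
    {
    	if (s->need_save) {		/* tdx_prepare_switch_to_guest() */
    		s->host_gsbase = cur_gsbase;
    		s->need_save = false;
    	}
    }

    static void model_td_exit(struct gsbase_lazy *s)
    {
    	s->need_restore = true;		/* set in tdx_vcpu_run() */
    }

    static void model_vcpu_put(struct gsbase_lazy *s, void (*do_wrmsr)(u64))
    {
    	s->need_save = true;		/* tdx_prepare_switch_to_host() */
    	if (s->need_restore) {
    		do_wrmsr(s->host_gsbase);
    		s->need_restore = false;
    	}
    }

So a vcpu that is loaded and put without ever entering the TD never pays
the WRMSR on vcpu_put.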
From: isaku.yamahata@intel.com
Subject: [RFC PATCH v6 067/104] KVM: TDX: restore host xsave state when exit from the guest TD
Date: Thu, 5 May 2022 11:15:01 -0700
Message-Id: <547fe194bc6eccfe7ddb23e68065b36a58d64758.1651774250.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

On exiting from the guest TD, the xsave state is clobbered.  Restore the
xsave state on TD exit.

Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/tdx.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 1aa52a093764..a54ee22b6c64 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -2,6 +2,7 @@
 #include
 #include

+#include
 #include

 #include "capabilities.h"
@@ -590,6 +591,22 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vcpu->kvm->vm_bugged = true;
 }

+static void tdx_restore_host_xsave_state(struct kvm_vcpu *vcpu)
+{
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
+
+	if (static_cpu_has(X86_FEATURE_XSAVE) &&
+	    host_xcr0 != (kvm_tdx->xfam & supported_xcr0))
+		xsetbv(XCR_XFEATURE_ENABLED_MASK, host_xcr0);
+	if (static_cpu_has(X86_FEATURE_XSAVES) &&
+	    /* PT can be exposed to TD guest regardless of KVM's XSS support */
+	    host_xss != (kvm_tdx->xfam & (supported_xss | XFEATURE_MASK_PT)))
+		wrmsrl(MSR_IA32_XSS, host_xss);
+	if (static_cpu_has(X86_FEATURE_PKU) &&
+	    (kvm_tdx->xfam & XFEATURE_MASK_PKRU))
+		write_pkru(vcpu->arch.host_pkru);
+}
+
 u64 __tdx_vcpu_run(hpa_t tdvpr, void *regs, u32 regs_mask);

 static noinstr void tdx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
@@ -613,6 +630,7 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu)

 	tdx_vcpu_enter_exit(vcpu, tdx);

+	tdx_restore_host_xsave_state(vcpu);
 	tdx->host_state_need_restore = true;

 	vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
--
2.25.1
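Since kvm_tdx->xfam is fixed when the TD is initialized, the three
comparisons above are constant for the lifetime of the VM.  A hedged
sketch of precomputing them once per VM (purely illustrative, not part of
the series; it assumes xfam is immutable after KVM_TDX_INIT_VM, as it is
here):

    struct tdx_xrestore_hint {
    	bool need_xcr0;		/* host_xcr0 differs from TD's XCR0 image */
    	bool need_xss;		/* host_xss differs from TD's XSS image  */
    	bool need_pkru;		/* TD can write PKRU                     */
    };

    static void tdx_precompute_xrestore(struct tdx_xrestore_hint *h, u64 xfam)
    {
    	h->need_xcr0 = host_xcr0 != (xfam & supported_xcr0);
    	h->need_xss  = host_xss  != (xfam & (supported_xss | XFEATURE_MASK_PT));
    	h->need_pkru = !!(xfam & XFEATURE_MASK_PKRU);
    }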
E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="268354865" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:50 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083387" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:50 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 068/104] KVM: x86: Allow to update cached values in kvm_user_return_msrs w/o wrmsr Date: Thu, 5 May 2022 11:15:02 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Chao Gao Several MSRs are constant and only used in userspace(ring 3). But VMs may have different values. KVM uses kvm_set_user_return_msr() to switch to guest's values and leverages user return notifier to restore them when the kernel is to return to userspace. To eliminate unnecessary wrmsr, KVM also caches the value it wrote to an MSR last time. TDX module unconditionally resets some of these MSRs to architectural INIT state on TD exit. It makes the cached values in kvm_user_return_msrs are inconsistent with values in hardware. This inconsistency needs to be fixed. Otherwise, it may mislead kvm_on_user_return() to skip restoring some MSRs to the host's values. kvm_set_user_return_msr() can help correct this case, but it is not optimal as it always does a wrmsr. So, introduce a variation of kvm_set_user_return_msr() to update cached values and skip that wrmsr. 
Signed-off-by: Chao Gao
Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/x86.c | 25 ++++++++++++++++++++-----
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 88c3e9c78797..4513b619f614 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1944,6 +1944,7 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
 int kvm_add_user_return_msr(u32 msr);
 int kvm_find_user_return_msr(u32 msr);
 int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask);
+void kvm_user_return_update_cache(unsigned int index, u64 val);

 static inline bool kvm_is_supported_user_return_msr(u32 msr)
 {

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b4e79b55ede2..bdba187fb087 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -431,6 +431,15 @@ static void kvm_user_return_msr_cpu_online(void)
 	}
 }

+static void kvm_user_return_register_notifier(struct kvm_user_return_msrs *msrs)
+{
+	if (!msrs->registered) {
+		msrs->urn.on_user_return = kvm_on_user_return;
+		user_return_notifier_register(&msrs->urn);
+		msrs->registered = true;
+	}
+}
+
 int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
 {
 	unsigned int cpu = smp_processor_id();
@@ -445,15 +454,21 @@ int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
 		return 1;

 	msrs->values[slot].curr = value;
-	if (!msrs->registered) {
-		msrs->urn.on_user_return = kvm_on_user_return;
-		user_return_notifier_register(&msrs->urn);
-		msrs->registered = true;
-	}
+	kvm_user_return_register_notifier(msrs);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(kvm_set_user_return_msr);

+/* Update the cache, "curr", and register the notifier */
+void kvm_user_return_update_cache(unsigned int slot, u64 value)
+{
+	struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);
+
+	msrs->values[slot].curr = value;
+	kvm_user_return_register_notifier(msrs);
+}
+EXPORT_SYMBOL_GPL(kvm_user_return_update_cache);
+
 static void drop_user_return_notifiers(void)
 {
 	unsigned int cpu = smp_processor_id();
--
2.25.1
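The distinction between the two entry points is the point of the patch:
kvm_set_user_return_msr() writes the MSR and updates the cache, while
kvm_user_return_update_cache() only refreshes the cache for a value that
hardware is already known to hold.  In sketch form (slot, hw_value,
guest_value, and mask are placeholders):

    /* Hardware changed behind KVM's back (e.g. the TDX module reset the
     * MSR on TD exit): resync the cache and arm the notifier, no wrmsr. */
    kvm_user_return_update_cache(slot, hw_value);

    /* KVM itself switches the MSR to a guest value: write + cache. */
    kvm_set_user_return_msr(slot, guest_value, mask);

Both paths register the user-return notifier, so kvm_on_user_return()
still restores the host values before the kernel returns to ring 3.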
From: isaku.yamahata@intel.com
Subject: [RFC PATCH v6 069/104] KVM: TDX: restore user ret MSRs
Date: Thu, 5 May 2022 11:15:03 -0700
Message-Id: <8c065b3e16a754b80e7af5734330040295d657db.1651774250.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

Several user-return MSRs are clobbered on TD exit.  Restore those values
on TD exit and before returning to ring 3.

Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/tdx.c | 43 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index a54ee22b6c64..80fe76214b3d 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -591,6 +591,28 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vcpu->kvm->vm_bugged = true;
 }

+struct tdx_uret_msr {
+	u32 msr;
+	unsigned int slot;
+	u64 defval;
+};
+
+static struct tdx_uret_msr tdx_uret_msrs[] = {
+	{.msr = MSR_SYSCALL_MASK,},
+	{.msr = MSR_STAR,},
+	{.msr = MSR_LSTAR,},
+	{.msr = MSR_TSC_AUX,},
+};
+
+static void tdx_user_return_update_cache(void)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(tdx_uret_msrs); i++)
+		kvm_user_return_update_cache(tdx_uret_msrs[i].slot,
+					     tdx_uret_msrs[i].defval);
+}
+
 static void tdx_restore_host_xsave_state(struct kvm_vcpu *vcpu)
 {
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
@@ -630,6 +652,7 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu)

 	tdx_vcpu_enter_exit(vcpu, tdx);

+	tdx_user_return_update_cache();
 	tdx_restore_host_xsave_state(vcpu);
 	tdx->host_state_need_restore = true;

@@ -1474,6 +1497,26 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops)
 	if (WARN_ON_ONCE(x86_ops->tlb_remote_flush))
 		return -EIO;

+	for (i = 0; i < ARRAY_SIZE(tdx_uret_msrs); i++) {
+		/*
+		 * Check that the MSRs in tdx_uret_msrs can be saved/restored
+		 * before returning to userspace.
+		 *
+		 * this_cpu_ptr(user_return_msrs)->registered isn't checked
+		 * because registration is done at vcpu runtime by
+		 * kvm_set_user_return_msr().  This sets up the CPU feature
+		 * before any vcpu runs, so registered is always false here.
+		 */
+		tdx_uret_msrs[i].slot = kvm_find_user_return_msr(tdx_uret_msrs[i].msr);
+		if (tdx_uret_msrs[i].slot == -1) {
+			/* If any MSR isn't supported, it is a KVM bug */
+			pr_err("MSR %x isn't included by kvm_find_user_return_msr\n",
+			       tdx_uret_msrs[i].msr);
+			return -EIO;
+		}
+	}
+
 	max_pkgs = topology_max_packages();
 	tdx_mng_key_config_lock = kcalloc(max_pkgs, sizeof(*tdx_mng_key_config_lock),
 					  GFP_KERNEL);
--
2.25.1
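Combined with the previous patch, the per-exit flow is: the TDX module has
already reset MSR_SYSCALL_MASK, MSR_STAR, MSR_LSTAR, and MSR_TSC_AUX to
their INIT values (tracked here as the zero-initialized defval fields),
KVM resyncs its per-CPU cache without any wrmsr, and the deferred restore
happens on the way back to userspace.  Roughly:

    /* Condensed, illustrative sequence; not a function in the series. */
    tdx_vcpu_enter_exit(vcpu, tdx);	/* MSRs now hold INIT values     */
    tdx_user_return_update_cache();	/* cache := defval, arm notifier */
    /* ... later, kvm_on_user_return() compares each cached value with
     * the host value and issues one wrmsr per clobbered MSR. */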
Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/intel-tdx-layer-status.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentat= ion/virt/kvm/intel-tdx-layer-status.rst index b51e8e6b1541..1cec14213f69 100644 --- a/Documentation/virt/kvm/intel-tdx-layer-status.rst +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -13,6 +13,7 @@ What qemu can do - Qemu can create/destroy vcpu of TDX vm type. - Qemu can populate initial guest memory image. - Qemu can finalize guest TD. +- Qemu can start to run vcpu. But vcpu can not make progress yet. =20 Patch Layer status ------------------ @@ -23,7 +24,7 @@ Patch Layer status * TD vcpu creation/destruction: Applied * TDX EPT violation: Applied * TD finalization: Applied -* TD vcpu enter/exit: Applying +* TD vcpu enter/exit: Applied * TD vcpu interrupts/exit/hypercall: Not yet =20 * KVM MMU GPA shared bits: Applied --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A8B34C433F5 for ; Thu, 5 May 2022 18:19:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383836AbiEESWu (ORCPT ); Thu, 5 May 2022 14:22:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36370 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383174AbiEESTl (ORCPT ); Thu, 5 May 2022 14:19:41 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 606ED1AF04; Thu, 5 May 2022 11:15:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774551; x=1683310551; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=zi32JicQj96LCv9Y/HZhsTD683f+9s81qGHuSBeE4Tg=; b=iz4Gx0P9iSU4+DJzOsrTZkfdntoh9Qs7ER6afel6iu5tayq1fAKvojZ5 FwibsN1p674K1WxgLHOpmphdKgeDxE6L/SAZWbHWAQD3cRAGUTuh9/UfS c5BzvwqSonNzDJ4VCNJFDsoFRvBKyRjvxgxlhTMXN+FpqLj0J1j2moXvn t6yMpNF8P/rucfUzy6aJGmf3iQwzuLARCFCIwnIBItLgKBu5LytzpigGd ctAqoSMLS1AE2kDAmoktIQ0G1B9xDIiPRoD+7o+D9ut5KKv/316WRutpi CcxZDjH4OeEksQq8HgLvi6B5/vxY64CD05Mp5PXPIsJgh0h3nHmjJs6mJ Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248113899" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248113899" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:51 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083396" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:51 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 071/104] KVM: TDX: complete interrupts after tdexit Date: Thu, 5 May 2022 11:15:05 -0700 Message-Id: <3dc5e620557dc7d44b8ad3e0da9ff0bf5fefc1a8.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: 
linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This corresponds to VMX __vmx_complete_interrupts(). Because TDX virtualizes the vAPIC, KVM only needs to care about NMI injection. Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/vmx/tdx.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 80fe76214b3d..7af7c84f8891 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -591,6 +591,14 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_e= vent) vcpu->kvm->vm_bugged =3D true; } =20 +static void tdx_complete_interrupts(struct kvm_vcpu *vcpu) +{ + /* Avoid costly SEAMCALL if no NMI was injected */ + if (vcpu->arch.nmi_injected) + vcpu->arch.nmi_injected =3D td_management_read8(to_tdx(vcpu), + TD_VCPU_PEND_NMI); +} + struct tdx_uret_msr { u32 msr; unsigned int slot; @@ -659,6 +667,8 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) vcpu->arch.regs_avail &=3D ~VMX_REGS_LAZY_LOAD_SET; trace_kvm_exit(vcpu, KVM_ISA_VMX); =20 + tdx_complete_interrupts(vcpu); + return EXIT_FASTPATH_NONE; } =20 --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E3A5EC433F5 for ; Thu, 5 May 2022 18:24:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384121AbiEES1s (ORCPT ); Thu, 5 May 2022 14:27:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36858 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383190AbiEESTl (ORCPT ); Thu, 5 May 2022 14:19:41 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 66CFD38D90; Thu, 5 May 2022 11:15:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774552; x=1683310552; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=EmTD5xuasLG79hjIVjNHG4j5atRZDi3WJaBWf+tyl5w=; b=kU5folv09sq5Rg6oQyGO00gzff1B57Su4EMVKXAv1x2eGeS/CfYBxkxY H6BGwXFH8r48iZS/7sxiPeAgR8slCnEA/D1CuaEIcqop1sngnbVgu5tZv hFI3qXY6PrH41iLAI7dk5risaJ4eNaKUirlOm3Gk8JE63BFpFo9lB9Uhp xhZdOnSXtx55OGVsAYNbR96VTjwzNXiRAtBwmEZG9znuoh+Z7sgy0h/TM vfS4ss7Ub8A7zog6HjT+2vRBYLXFHQivIdg3fYkoAN+sueM9N96gG1J/x NEC4oovNcRyNGpR/hLYU34FSMu+lQI/c45C7qHLtJ8LFnB8SefFAK8pLY g==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248113901" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248113901" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:51 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083399" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:51 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 072/104] KVM: TDX: restore debug store when TD exit Date: Thu, 5 May 2022 11:15:06 -0700 Message-Id: 
<171422f20eedf32e1099ea52d6102f5571f80850.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Because debug store is clobbered, restore it on TD exit. Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/events/intel/ds.c | 1 + arch/x86/kvm/vmx/tdx.c | 1 + 2 files changed, 2 insertions(+) diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index 376cc3d66094..cdba4227ad3b 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -2256,3 +2256,4 @@ void perf_restore_debug_store(void) =20 wrmsrl(MSR_IA32_DS_AREA, (unsigned long)ds); } +EXPORT_SYMBOL_GPL(perf_restore_debug_store); diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 7af7c84f8891..f92941586b42 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -661,6 +661,7 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) tdx_vcpu_enter_exit(vcpu, tdx); =20 tdx_user_return_update_cache(); + perf_restore_debug_store(); tdx_restore_host_xsave_state(vcpu); tdx->host_state_need_restore =3D true; =20 --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 56CDFC433EF for ; Thu, 5 May 2022 18:23:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235851AbiEES1C (ORCPT ); Thu, 5 May 2022 14:27:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36372 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383206AbiEESTl (ORCPT ); Thu, 5 May 2022 14:19:41 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 37BE93A714; Thu, 5 May 2022 11:15:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774553; x=1683310553; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=2hsiPaxykS0hFgrg6vKCDbwTAq/Z3c/uskYvahx3PYU=; b=Pe9sT4XveBLmGeA5j/qaNU/K9yaqt2blrQSYa5cGo94o3FiYWKYOG7Kl Sk8Pqcy+gFYHbvVu5QZhwM/B+wgwZeVxXKYeM8idcLDyN4tN6SG4RGnb5 H2uKb4rPI7fSs0KBPNVejDiwbx0xDvPci+FUk3JhNLLtKNXxxVwF6Oozg GLiUTHpSSnRsyjXrWP1dGzTS9HM3p3CdxPuBgi/MIT2c3wKIFtuWVj23C mZmrAHYnxaQpxHYDn+2VMbujkZ32j1X7f7eEg9BdzEZxF0TJ8nNGv3L2B rHl3CI4nfXZ8sz59Daj75XcKAahUpK37wLgUIstTR0IeA13JzkXt5E/Pk w==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248113902" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248113902" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:51 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083402" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:51 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 073/104] KVM: TDX: handle 
vcpu migration over logical processor Date: Thu, 5 May 2022 11:15:07 -0700 Message-Id: <96148d16e5c3b394bec4d7f51130b2d35e33352e.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata For vcpu migration, in the case of VMX, the VMCS is flushed on the source pcpu and loaded on the target pcpu. There are corresponding TDX SEAMCALL APIs; call them on vcpu migration. The logic is mostly the same as VMX, except that the TDX SEAMCALLs are used. When shutting down the machine, vcpus (VMX or TDX) need to be shut down on each pcpu. Do the same for TDX with the TDX SEAMCALL APIs. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/main.c | 43 +++++++++++-- arch/x86/kvm/vmx/tdx.c | 121 +++++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/tdx.h | 2 + arch/x86/kvm/vmx/x86_ops.h | 6 ++ 4 files changed, 168 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index f101f358d90c..ad09988c4faa 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -17,6 +17,25 @@ static bool vt_is_vm_type_supported(unsigned long type) (enable_tdx && tdx_is_vm_type_supported(type)); } =20 +static int vt_hardware_enable(void) +{ + int ret; + + ret =3D vmx_hardware_enable(); + if (ret) + return ret; + + tdx_hardware_enable(); + return 0; +} + +static void vt_hardware_disable(void) +{ + /* Note, TDX *and* VMX need to be disabled if TDX is enabled. */ + tdx_hardware_disable(); + vmx_hardware_disable(); +} + static __init int vt_hardware_setup(void) { int ret; @@ -151,6 +170,14 @@ static fastpath_t vt_vcpu_run(struct kvm_vcpu *vcpu) return vmx_vcpu_run(vcpu); } =20 +static void vt_vcpu_load(struct kvm_vcpu *vcpu, int cpu) +{ + if (is_td_vcpu(vcpu)) + return tdx_vcpu_load(vcpu, cpu); + + return vmx_vcpu_load(vcpu, cpu); +} + static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) @@ -192,6 +219,14 @@ static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa= _t root_hpa, vmx_load_mmu_pgd(vcpu, root_hpa, pgd_level); } =20 +static void vt_sched_in(struct kvm_vcpu *vcpu, int cpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_sched_in(vcpu, cpu); +} + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { if (!is_td(kvm)) @@ -214,8 +249,8 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .hardware_unsetup =3D vt_hardware_unsetup, .check_processor_compatibility =3D vmx_check_processor_compatibility, =20 - .hardware_enable =3D vmx_hardware_enable, - .hardware_disable =3D vmx_hardware_disable, + .hardware_enable =3D vt_hardware_enable, + .hardware_disable =3D vt_hardware_disable, .has_emulated_msr =3D vmx_has_emulated_msr, =20 .is_vm_type_supported =3D vt_is_vm_type_supported, @@ -231,7 +266,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .vcpu_reset =3D vt_vcpu_reset, =20 .prepare_switch_to_guest =3D vt_prepare_switch_to_guest, - .vcpu_load =3D vmx_vcpu_load, + .vcpu_load =3D vt_vcpu_load, .vcpu_put =3D vt_vcpu_put, =20 .update_exception_bitmap =3D vmx_update_exception_bitmap, @@ -317,7 +352,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { =20 .request_immediate_exit =3D vmx_request_immediate_exit, =20 - .sched_in =3D vmx_sched_in, + .sched_in =3D vt_sched_in, =20 .cpu_dirty_log_size =3D PML_ENTITY_NUM, .update_cpu_dirty_logging =3D vmx_update_cpu_dirty_logging, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c 
index f92941586b42..6233c65b2a48 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -61,6 +61,14 @@ static struct tdx_capabilities tdx_caps; static DEFINE_MUTEX(tdx_lock); static struct mutex *tdx_mng_key_config_lock; =20 +/* + * A per-CPU list of TD vCPUs associated with a given CPU. Used when a CPU + * is brought down to invoke TDH_VP_FLUSH on the appropriate TD vCPUs. + * Protected by interrupt mask. This list is manipulated in process conte= xt + * of vcpu and IPI callback. See tdx_flush_vp_on_cpu(). + */ +static DEFINE_PER_CPU(struct list_head, associated_tdvcpus); + static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid) { pa &=3D ~hkid_mask; @@ -95,6 +103,36 @@ static inline bool is_td_finalized(struct kvm_tdx *kvm_= tdx) return kvm_tdx->finalized; } =20 +static inline void tdx_disassociate_vp(struct kvm_vcpu *vcpu) +{ + list_del(&to_tdx(vcpu)->cpu_list); + + /* + * Ensure tdx->cpu_list is updated before setting vcpu->cpu to -1, + * otherwise, a different CPU can see vcpu->cpu =3D -1 and add the vCPU + * to its list before it's deleted from this CPU's list. + */ + smp_wmb(); + + vcpu->cpu =3D -1; +} + +void tdx_hardware_enable(void) +{ + INIT_LIST_HEAD(&per_cpu(associated_tdvcpus, raw_smp_processor_id())); +} + +void tdx_hardware_disable(void) +{ + int cpu =3D raw_smp_processor_id(); + struct list_head *tdvcpus =3D &per_cpu(associated_tdvcpus, cpu); + struct vcpu_tdx *tdx, *tmp; + + /* Safe variant needed as tdx_disassociate_vp() deletes the entry. */ + list_for_each_entry_safe(tdx, tmp, tdvcpus, cpu_list) + tdx_disassociate_vp(&tdx->vcpu); +} + static void tdx_clear_page(unsigned long page) { const void *zero_page =3D (const void *) __va(page_to_phys(ZERO_PAGE(0))); @@ -171,6 +209,41 @@ static void tdx_reclaim_td_page(struct tdx_td_page *pa= ge) free_page(page->va); } =20 +static void tdx_flush_vp(void *arg) +{ + struct kvm_vcpu *vcpu =3D arg; + u64 err; + + lockdep_assert_irqs_disabled(); + + /* Task migration can race with CPU offlining. */ + if (vcpu->cpu !=3D raw_smp_processor_id()) + return; + + /* + * No need to do TDH_VP_FLUSH if the vCPU hasn't been initialized. The + * list tracking still needs to be updated so that it's correct if/when + * the vCPU does get initialized. + */ + if (is_td_vcpu_created(to_tdx(vcpu))) { + err =3D tdh_vp_flush(to_tdx(vcpu)->tdvpr.pa); + if (unlikely(err && err !=3D TDX_VCPU_NOT_ASSOCIATED)) { + if (WARN_ON_ONCE(err)) + pr_tdx_error(TDH_VP_FLUSH, err, NULL); + } + } + + tdx_disassociate_vp(vcpu); +} + +static void tdx_flush_vp_on_cpu(struct kvm_vcpu *vcpu) +{ + if (unlikely(vcpu->cpu =3D=3D -1)) + return; + + smp_call_function_single(vcpu->cpu, tdx_flush_vp, vcpu, 1); +} + static int tdx_do_tdh_phymem_cache_wb(void *param) { u64 err =3D 0; @@ -195,9 +268,11 @@ void tdx_mmu_release_hkid(struct kvm *kvm) struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); cpumask_var_t packages; bool cpumask_allocated; + struct kvm_vcpu *vcpu; u64 err; int ret; int i; + unsigned long j; =20 if (!is_hkid_assigned(kvm_tdx)) return; @@ -205,6 +280,19 @@ void tdx_mmu_release_hkid(struct kvm *kvm) if (!is_td_created(kvm_tdx)) goto free_hkid; =20 + kvm_for_each_vcpu(j, vcpu, kvm) + tdx_flush_vp_on_cpu(vcpu); + + mutex_lock(&tdx_lock); + err =3D tdh_mng_vpflushdone(kvm_tdx->tdr.pa); + mutex_unlock(&tdx_lock); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MNG_VPFLUSHDONE, err, NULL); + pr_err("tdh_mng_vpflushdone failed. HKID %d is leaked.\n", + kvm_tdx->hkid); + return; + } + cpumask_allocated =3D zalloc_cpumask_var(&packages, GFP_KERNEL); cpus_read_lock(); for_each_online_cpu(i) { @@ -479,6 +567,26 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) return ret; } =20 +void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) +{ + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + + if (vcpu->cpu =3D=3D cpu) + return; + + tdx_flush_vp_on_cpu(vcpu); + + local_irq_disable(); + /* + * Pairs with the smp_wmb() in tdx_disassociate_vp() to ensure + * vcpu->cpu is read before tdx->cpu_list. + */ + smp_rmb(); + + list_add(&tdx->cpu_list, &per_cpu(associated_tdvcpus, cpu)); + local_irq_enable(); +} + void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) { struct vcpu_tdx *tdx =3D to_tdx(vcpu); @@ -525,6 +633,19 @@ void tdx_vcpu_free(struct kvm_vcpu *vcpu) tdx_reclaim_td_page(&tdx->tdvpx[i]); kfree(tdx->tdvpx); tdx_reclaim_td_page(&tdx->tdvpr); + + /* + * kvm_free_vcpus() + * -> kvm_unload_vcpu_mmu() + * + * does vcpu_load() for every vcpu after they have already been + * disassociated from the per-cpu list by tdx_vm_teardown(). So we + * need to disassociate them again; otherwise the freed vcpu data + * would be accessed when doing list_{del,add}() on the + * associated_tdvcpus list later. + */ + tdx_flush_vp_on_cpu(vcpu); + WARN_ON(vcpu->cpu !=3D -1); } =20 void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 414c15235ed0..32e05efa70f9 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -85,6 +85,8 @@ struct vcpu_tdx { struct tdx_td_page tdvpr; struct tdx_td_page *tdvpx; =20 + struct list_head cpu_list; + union tdx_exit_reason exit_reason; =20 bool initialized; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index a2deb42794c0..6594bcb717cd 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -132,6 +132,8 @@ void vmx_setup_mce(struct kvm_vcpu *vcpu); int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops); bool tdx_is_vm_type_supported(unsigned long type); void tdx_hardware_unsetup(void); +void tdx_hardware_enable(void); +void tdx_hardware_disable(void); int tdx_dev_ioctl(void __user *argp); =20 int tdx_vm_init(struct kvm *kvm); @@ -144,6 +146,7 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_ev= ent); fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu); void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu); void tdx_vcpu_put(struct kvm_vcpu *vcpu); +void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu); =20 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); @@ -154,6 +157,8 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root= _hpa, int root_level); static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return= 0; } static inline bool tdx_is_vm_type_supported(unsigned long type) { return f= alse; } static inline void tdx_hardware_unsetup(void) {} +static inline void tdx_hardware_enable(void) {} +static inline void tdx_hardware_disable(void) {} static inline int tdx_dev_ioctl(void __user *argp) { return -EOPNOTSUPP; }; =20 static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; } @@ -167,6 +172,7 @@ static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu= , bool init_event) {} static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) { return EXIT= _FASTPATH_NONE; } static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {} static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {} +static inline void 
tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {} =20 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { retur= n -EOPNOTSUPP; } static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)= { return -EOPNOTSUPP; } --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1B591C433EF for ; Thu, 5 May 2022 18:23:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383797AbiEES1V (ORCPT ); Thu, 5 May 2022 14:27:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36886 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383210AbiEESTl (ORCPT ); Thu, 5 May 2022 14:19:41 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 37C733BBCE; Thu, 5 May 2022 11:15:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774553; x=1683310553; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=so7kiXtRlPEg3DgTCrMIl6OMFEHOZkZ06JH3qx9GgwY=; b=TCbpo90I49uMTBdBUJbp5yj2oj8mTKgG97xPh01XHNSINAfXOeTGbMZk zEP0YWYOvOpqjMBwInHRBVuq/oq7/JQZOp4EACBeP+XAUVmR4OOP1yj3e Po1r+Ja+Y02QEKfYoqQorxzqRaML/kMzqjaVKoJWYQJbPnaHoEab/hiAR r7IjSFc2rDmeIlt7GKNTpo+upz9mXk1XiCCLd3ja5zVsODsIVzCp4YSod f7teV55TVqFh2QaZhfaLMhKwYoPurSZau02Xh1iJpQ78xcHrDBjBU8M9d jAMQAvg9i9EW8Gqjy0UG3UCJI2jMf7IFXhyAk1Pxp4Y9FYv1jDfoqG/TS Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248113903" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248113903" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:51 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083405" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:51 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 074/104] KVM: x86: Add a switch_db_regs flag to handle TDX's auto-switched behavior Date: Thu, 5 May 2022 11:15:08 -0700 Message-Id: <628f68b444170562b7dd344c932f7cd2143165b1.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Add a flag, KVM_DEBUGREG_AUTO_SWITCH, to skip saving/restoring DRs irrespective of any other flags. TDX-SEAM unconditionally saves and restores guest DRs and resets them to the architectural INIT state on TD exit. So KVM needs to save host DRs before TD entry and restore host DRs after TD exit, without touching guest DRs. Opportunistically convert the KVM_DEBUGREG_* definitions to use BIT(). 
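The flag semantics can be sanity-checked in isolation. Below is a minimal, self-contained sketch of the mask logic the patch relies on; BIT() and the KVM_DEBUGREG_* names mirror the diff below, while kvm_must_switch_drs() is a hypothetical helper introduced here purely for illustration:

  #include <stdio.h>

  #define BIT(n) (1u << (n))

  #define KVM_DEBUGREG_BP_ENABLED  BIT(0)
  #define KVM_DEBUGREG_WONT_EXIT   BIT(1)
  #define KVM_DEBUGREG_AUTO_SWITCH BIT(2)  /* hardware switches DRs itself */

  /* KVM must switch DRs iff any flag other than AUTO_SWITCH is set. */
  static int kvm_must_switch_drs(unsigned int switch_db_regs)
  {
          return (switch_db_regs & ~KVM_DEBUGREG_AUTO_SWITCH) != 0;
  }

  int main(void)
  {
          /* AUTO_SWITCH alone: hardware handles the DRs, KVM does nothing. */
          printf("%d\n", kvm_must_switch_drs(KVM_DEBUGREG_AUTO_SWITCH));  /* 0 */
          /* Any other flag set: KVM still switches DRs itself. */
          printf("%d\n", kvm_must_switch_drs(KVM_DEBUGREG_AUTO_SWITCH |
                                             KVM_DEBUGREG_BP_ENABLED));  /* 1 */
          return 0;
  }

This is exactly the "& ~KVM_DEBUGREG_AUTO_SWITCH" test that the diff adds to vcpu_enter_guest().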
Reported-by: Xiaoyao Li Signed-off-by: Sean Christopherson Co-developed-by: Chao Gao Signed-off-by: Chao Gao Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm_host.h | 9 +++++++-- arch/x86/kvm/vmx/tdx.c | 1 + arch/x86/kvm/x86.c | 11 ++++++++--- 3 files changed, 16 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 4513b619f614..5d2855e8de81 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -537,8 +537,13 @@ struct kvm_pmu { struct kvm_pmu_ops; =20 enum { - KVM_DEBUGREG_BP_ENABLED =3D 1, - KVM_DEBUGREG_WONT_EXIT =3D 2, + KVM_DEBUGREG_BP_ENABLED =3D BIT(0), + KVM_DEBUGREG_WONT_EXIT =3D BIT(1), + /* + * Guest debug registers are saved/restored by hardware on exit from + * or entry to the guest. KVM needn't switch them. + */ + KVM_DEBUGREG_AUTO_SWITCH =3D BIT(2), }; =20 struct kvm_mtrr_range { diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 6233c65b2a48..41edf3e414ec 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -543,6 +543,7 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) =20 vcpu->arch.efer =3D EFER_SCE | EFER_LME | EFER_LMA | EFER_NX; =20 + vcpu->arch.switch_db_regs =3D KVM_DEBUGREG_AUTO_SWITCH; vcpu->arch.cr0_guest_owned_bits =3D -1ul; vcpu->arch.cr4_guest_owned_bits =3D -1ul; =20 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index bdba187fb087..2691abb46fac 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10265,7 +10265,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (vcpu->arch.guest_fpu.xfd_err) wrmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err); =20 - if (unlikely(vcpu->arch.switch_db_regs)) { + if (unlikely(vcpu->arch.switch_db_regs & ~KVM_DEBUGREG_AUTO_SWITCH)) { set_debugreg(0, 7); set_debugreg(vcpu->arch.eff_db[0], 0); set_debugreg(vcpu->arch.eff_db[1], 1); @@ -10307,6 +10307,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) */ if (unlikely(vcpu->arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT)) { WARN_ON(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP); + WARN_ON(vcpu->arch.switch_db_regs & KVM_DEBUGREG_AUTO_SWITCH); static_call(kvm_x86_sync_dirty_debug_regs)(vcpu); kvm_update_dr0123(vcpu); kvm_update_dr7(vcpu); @@ -10319,8 +10320,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) * care about the messed up debug address registers. But if * we have some of them active, restore the old state. 
*/ - if (hw_breakpoint_active()) - hw_breakpoint_restore(); + if (hw_breakpoint_active()) { + if (!(vcpu->arch.switch_db_regs & KVM_DEBUGREG_AUTO_SWITCH)) + hw_breakpoint_restore(); + else + set_debugreg(__this_cpu_read(cpu_dr7), 7); + } =20 vcpu->arch.last_vmentry_cpu =3D vcpu->cpu; vcpu->arch.last_guest_tsc =3D kvm_read_l1_tsc(vcpu, rdtsc()); --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 94226C433EF for ; Thu, 5 May 2022 18:23:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383652AbiEES1I (ORCPT ); Thu, 5 May 2022 14:27:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36896 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383216AbiEESTl (ORCPT ); Thu, 5 May 2022 14:19:41 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 73D0F45527; Thu, 5 May 2022 11:15:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774553; x=1683310553; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=nNDI032K8EEtqB2nXQeNC2FjSsuLVjodVijndKuOaUM=; b=YgJcpONJK4bwvT9RY3A7l9qmzO93JeG0Q7ptgg6eEf14yvBPlsu3Zx3p RRjF6YuqTVe/KJrukq4ExlVpyRm9WRKGSsLC+t+HQkMVjauqeFL+xnzHQ JG+iCudusBt3lRVjwv5GxBDgvJcCxhe+o8ovypsD+nMKEa0FlkX87XouA PbxUrOtO9gTDpX/0GlofQhZFF7YxTptSUK8p06DTkvBizZS+bDx8stZhg aXi6iEHicvbgYhgG9bcP2r4ADNNFxbAAeykp16LWTAG69Eel8xrK+81vg tOFw8nImzwmwl2vQopPR6YDNBkw+2AdNczahwhT2bIAu8GVbJ7euREU7R g==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248113905" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248113905" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:51 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083408" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:51 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 075/104] KVM: TDX: Add support for find pending IRQ in a protected local APIC Date: Thu, 5 May 2022 11:15:09 -0700 Message-Id: <3ebaeb05d96a5953b1ab2ea2bea722e278fb0a5b.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Add a flag and hook to KVM's local APIC management to support determining whether or not a TDX guest has a pending IRQ. For TDX vCPUs, the virtual APIC page is owned by the TDX module and cannot be accessed by KVM. As a result, registers that are virtualized by the CPU, e.g. PPR, cannot be read or written by KVM. To deliver interrupts for TDX guests, KVM must send an IRQ to the CPU on the posted interrupt notification vector. And to determine if a TDX vCPU has a pending interrupt, KVM must check if there is an outstanding notification. Return "no interrupt" in kvm_apic_has_interrupt() if the guest APIC is protected to short-circuit the various other flows that try to pull an IRQ out of the vAPIC; the only valid operation is querying _if_ an IRQ is pending, as KVM can't do anything based on _which_ IRQ is pending. Intentionally omit sanity checks from other flows, e.g. PPR update, so as not to degrade non-TDX guests with unnecessary checks. A well-behaved KVM and userspace will never reach those flows for TDX guests, but reaching them is not fatal if something does go awry. Note, this doesn't handle interrupts that have been delivered to the vCPU but not yet recognized by the core, i.e. interrupts that are sitting in vmcs.GUEST_INTR_STATUS. Querying that state requires a SEAMCALL and will be supported in a future patch. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/irq.c | 3 +++ arch/x86/kvm/lapic.c | 3 +++ arch/x86/kvm/lapic.h | 2 ++ arch/x86/kvm/vmx/main.c | 11 +++++++++++ arch/x86/kvm/vmx/tdx.c | 7 +++++++ arch/x86/kvm/vmx/x86_ops.h | 2 ++ 8 files changed, 30 insertions(+) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index 6982d57e4518..ec98b3f734a2 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -112,6 +112,7 @@ KVM_X86_OP_OPTIONAL(pi_update_irte) KVM_X86_OP_OPTIONAL(pi_start_assignment) KVM_X86_OP_OPTIONAL(apicv_post_state_restore) KVM_X86_OP_OPTIONAL_RET0(dy_apicv_has_pending_interrupt) +KVM_X86_OP_OPTIONAL(protected_apic_has_interrupt) KVM_X86_OP_OPTIONAL(set_hv_timer) KVM_X86_OP_OPTIONAL(cancel_hv_timer) KVM_X86_OP(setup_mce) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 5d2855e8de81..d6dce57fe0eb 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1518,6 +1518,7 @@ struct kvm_x86_ops { void (*pi_start_assignment)(struct kvm *kvm); void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu); bool (*dy_apicv_has_pending_interrupt)(struct kvm_vcpu *vcpu); + bool (*protected_apic_has_interrupt)(struct kvm_vcpu *vcpu); =20 int (*set_hv_timer)(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc, bool *expired); diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c index f371f1292ca3..56e52eef0269 100644 --- a/arch/x86/kvm/irq.c +++ b/arch/x86/kvm/irq.c @@ -100,6 +100,9 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v) if (kvm_cpu_has_extint(v)) return 1; =20 + if (lapic_in_kernel(v) && v->arch.apic->guest_apic_protected) + return static_call(kvm_x86_protected_apic_has_interrupt)(v); + return kvm_apic_has_interrupt(v) !=3D -1; /* LAPIC */ } EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt); diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 137c3a2f5180..953b7f1d6257 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2567,6 +2567,9 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu) if (!kvm_apic_present(vcpu)) return -1; =20 + if (apic->guest_apic_protected) + return -1; + __apic_update_ppr(apic, &ppr); return apic_has_interrupt_for_ppr(apic, ppr); } diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index 4e4f8a22754f..97972dc56f6e 100644 --- a/arch/x86/kvm/lapic.h +++ b/arch/x86/kvm/lapic.h @@ -52,6 +52,8 @@ struct kvm_lapic { bool sw_enabled; bool irr_pending; bool lvt0_in_nmi_mode; + /* Select registers in the vAPIC cannot be 
read/written. */ + bool guest_apic_protected; /* Number of bits set in ISR. */ s16 isr_count; /* The highest vector set in ISR; if -1 - invalid, must scan ISR. */ diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index ad09988c4faa..f14519c6a861 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -46,6 +46,9 @@ static __init int vt_hardware_setup(void) =20 enable_tdx =3D enable_tdx && !tdx_hardware_setup(&vt_x86_ops); =20 + if (!enable_tdx) + vt_x86_ops.protected_apic_has_interrupt =3D NULL; + if (enable_ept) kvm_mmu_set_ept_masks(enable_ept_ad_bits, cpu_has_vmx_ept_execute_only()); @@ -178,6 +181,13 @@ static void vt_vcpu_load(struct kvm_vcpu *vcpu, int cp= u) return vmx_vcpu_load(vcpu, cpu); } =20 +static bool vt_protected_apic_has_interrupt(struct kvm_vcpu *vcpu) +{ + KVM_BUG_ON(!is_td_vcpu(vcpu), vcpu->kvm); + + return tdx_protected_apic_has_interrupt(vcpu); +} + static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) @@ -329,6 +339,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .sync_pir_to_irr =3D vmx_sync_pir_to_irr, .deliver_interrupt =3D vmx_deliver_interrupt, .dy_apicv_has_pending_interrupt =3D pi_has_pending_interrupt, + .protected_apic_has_interrupt =3D vt_protected_apic_has_interrupt, =20 .set_tss_addr =3D vmx_set_tss_addr, .set_identity_map_addr =3D vmx_set_identity_map_addr, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 41edf3e414ec..98e0a907dc68 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -525,6 +525,8 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) if (!vcpu->arch.apic) return -EINVAL; =20 + vcpu->arch.apic->guest_apic_protected =3D true; + ret =3D tdx_alloc_td_page(&tdx->tdvpr); if (ret) return ret; @@ -588,6 +590,11 @@ void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) local_irq_enable(); } =20 +bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu) +{ + return pi_has_pending_interrupt(vcpu); +} + void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) { struct vcpu_tdx *tdx =3D to_tdx(vcpu); diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 6594bcb717cd..d1face47f547 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -147,6 +147,7 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu); void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu); void tdx_vcpu_put(struct kvm_vcpu *vcpu); void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu); +bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu); =20 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); @@ -173,6 +174,7 @@ static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *= vcpu) { return EXIT_FASTP static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {} static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {} static inline void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {} +static inline bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu)= { return false; } =20 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { retur= n -EOPNOTSUPP; } static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)= { return -EOPNOTSUPP; } --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7BA97C433F5 for ; Thu, 5 May 2022 18:23:07 +0000 (UTC) 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383936AbiEES0o (ORCPT ); Thu, 5 May 2022 14:26:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36322 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383227AbiEESTm (ORCPT ); Thu, 5 May 2022 14:19:42 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2D79F5D676; Thu, 5 May 2022 11:15:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774555; x=1683310555; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=wCptZcUcsyZySNsXKqKhsEo8FgpHxpA5MijVVTkWO3g=; b=FWHPv2wIAiL3rH63zUqyWNNmuYSwhtZMIwbXtmMbXDuSIw1HFDZQ7972 mP4ovJmcS9lCVTG0JMhByUKonite8DiM9RXAFNQcZgMYHPWHXXmL/7Zz+ KS2MQXwW/rXD5MyW7iTD5Ljt7ZTXT3dheVCy6lRLZwVFloyi30fr+pQry 6gYrZhcOyM8JRwpZcAl5tc+NRholceyIRTeTgvxGAbZTdzR0dAeOIHMfB VliROIEo2KXZXSc69voVVYZYS8tB4lbtAvp5b8MTi/WDQNuu/LuW8cyMt vCee30FfcaQEzmCMIZwU0S+rVMiDDx26cRYb/aIWUQtXF2FN81FBx1jkS w==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248113908" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248113908" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:51 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083411" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:51 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 076/104] KVM: x86: Assume timer IRQ was injected if APIC state is protected Date: Thu, 5 May 2022 11:15:10 -0700 Message-Id: <5e167bdffa69ecd0d58b630036719c3a515ee5af.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson If APIC state is protected, i.e. the vCPU is a TDX guest, assume a timer IRQ was injected when deciding whether or not to busy wait in the "timer advanced" path. The "real" vIRR is not readable/writable, so trying to query for a pending timer IRQ will return garbage. Note, TDX can scour the PIR if it wants to be more precise and skip the "wait" call entirely. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/kvm/lapic.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 953b7f1d6257..f8e190da769f 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1577,8 +1577,17 @@ static void apic_update_lvtt(struct kvm_lapic *apic) static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu) { struct kvm_lapic *apic =3D vcpu->arch.apic; - u32 reg =3D kvm_lapic_get_reg(apic, APIC_LVTT); + u32 reg; =20 + /* + * Assume a timer IRQ was "injected" if the APIC is protected. KVM's + * copy of the vIRR is bogus; it's the responsibility of the caller to + * precisely check whether or not a timer IRQ is pending. 
+ */ + if (apic->guest_apic_protected) + return true; + + reg =3D kvm_lapic_get_reg(apic, APIC_LVTT); if (kvm_apic_hw_enabled(apic)) { int vec =3D reg & APIC_VECTOR_MASK; void *bitmap =3D apic->regs + APIC_ISR; --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E6111C433F5 for ; Thu, 5 May 2022 18:19:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383715AbiEESW5 (ORCPT ); Thu, 5 May 2022 14:22:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36914 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383237AbiEESTm (ORCPT ); Thu, 5 May 2022 14:19:42 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 36BE65DA12; Thu, 5 May 2022 11:15:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774555; x=1683310555; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=zguXL40awg9Gey1LF1OZWsmZLz0iZkRMLy45v2YTEQs=; b=YfkLjXnqUW6lavStKcCrb2Znqhmggpeh87XUyqQwHtr4f2q4R+YiL5AB +ve4AfZq5G064bYnhJ7NgnecXVuTryTaUU97tabTJE0IDZKtGwJU/D6TL c/9O/9qWMeWtUfPak8/WSLz8oKbMfu974U6biZC30AKJ1Wm7eZBmsYYhF t/OzyXMsSuPifcnRpp3J3vnEOir0plXkBZhuxL3MhxNuV504YOUIAyPbU vDxJJ4bm1tzZngdxmb7yq6GTRyp5czOlyjwbM9ZvimbdUDvsVczxytpD/ 9eoWPl692mGhpQtscCwRHkzAMurXkokn0eZYa/ssG6qUhbDMXYABoYQ5S Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248113910" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248113910" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:52 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083415" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:52 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 077/104] KVM: TDX: remove use of struct vcpu_vmx from posted_interrupt.c Date: Thu, 5 May 2022 11:15:11 -0700 Message-Id: <9e5418d1de1c924e3d99bd3b3705e59cf8945085.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata As TDX will use posted_interrupt.c, the use of struct vcpu_vmx is a blocker. Because the members struct pi_desc pi_desc and struct list_head pi_wakeup_list are only used in posted_interrupt.c, introduce a common structure, struct vcpu_pi, and make vcpu_vmx and vcpu_tdx share the same layout at the top of the structure. To minimize the diff size, avoid code conversion like vmx->pi_desc =3D> vmx->common->pi_desc. Instead, add a compile-time check that the layout is as expected. 
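The layout trick can be demonstrated standalone. The sketch below uses stub fields in place of the real kvm structures (an assumption for illustration only); it shows the same compile-time offsetof() check the patch adds, which is what makes the prefix cast legal in practice:

  #include <assert.h>
  #include <stddef.h>

  /* Common prefix shared by both vcpu flavors (stub members). */
  struct vcpu_pi {
          int vcpu_stub;          /* stands in for struct kvm_vcpu */
          unsigned long pi_desc;  /* stands in for struct pi_desc */
  };

  struct vcpu_vmx {
          int vcpu_stub;
          unsigned long pi_desc;
          int vmx_only;           /* members past the common prefix */
  };

  /* Compile-time layout check, same idea as the patch's static_assert()s. */
  static_assert(offsetof(struct vcpu_pi, pi_desc) ==
                offsetof(struct vcpu_vmx, pi_desc),
                "pi_desc must sit at the same offset in both structs");

  /* With matching prefixes, the cast mirrors the patch's vcpu_to_pi(). */
  static struct vcpu_pi *to_pi(struct vcpu_vmx *vmx)
  {
          return (struct vcpu_pi *)vmx;
  }

  int main(void)
  {
          struct vcpu_vmx vmx = {0};
          return to_pi(&vmx)->vcpu_stub;
  }

If a member is ever reordered, the build fails instead of the cast silently reading the wrong field.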
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/posted_intr.c | 41 ++++++++++++++++++++++++++-------- arch/x86/kvm/vmx/posted_intr.h | 11 +++++++++ arch/x86/kvm/vmx/tdx.c | 1 + arch/x86/kvm/vmx/tdx.h | 8 +++++++ arch/x86/kvm/vmx/vmx.h | 14 +++++++----- 5 files changed, 60 insertions(+), 15 deletions(-) diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c index 160653925f38..0bc7a848b319 100644 --- a/arch/x86/kvm/vmx/posted_intr.c +++ b/arch/x86/kvm/vmx/posted_intr.c @@ -9,6 +9,7 @@ #include "posted_intr.h" #include "trace.h" #include "vmx.h" +#include "tdx.h" =20 /* * Maintain a per-CPU list of vCPUs that need to be awakened by wakeup_han= dler() @@ -29,9 +30,29 @@ static DEFINE_PER_CPU(struct list_head, wakeup_vcpus_on_= cpu); */ static DEFINE_PER_CPU(raw_spinlock_t, wakeup_vcpus_on_cpu_lock); =20 +/* + * The layout of the head of struct vcpu_vmx and struct vcpu_tdx must matc= h with + * struct vcpu_pi. + */ +static_assert(offsetof(struct vcpu_pi, pi_desc) =3D=3D + offsetof(struct vcpu_vmx, pi_desc)); +static_assert(offsetof(struct vcpu_pi, pi_wakeup_list) =3D=3D + offsetof(struct vcpu_vmx, pi_wakeup_list)); +#ifdef CONFIG_INTEL_TDX_HOST +static_assert(offsetof(struct vcpu_pi, pi_desc) =3D=3D + offsetof(struct vcpu_tdx, pi_desc)); +static_assert(offsetof(struct vcpu_pi, pi_wakeup_list) =3D=3D + offsetof(struct vcpu_tdx, pi_wakeup_list)); +#endif + +static inline struct vcpu_pi *vcpu_to_pi(struct kvm_vcpu *vcpu) +{ + return (struct vcpu_pi*)vcpu; +} + static inline struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu) { - return &(to_vmx(vcpu)->pi_desc); + return &vcpu_to_pi(vcpu)->pi_desc; } =20 static int pi_try_set_control(struct pi_desc *pi_desc, u64 old, u64 new) @@ -50,8 +71,8 @@ static int pi_try_set_control(struct pi_desc *pi_desc, u6= 4 old, u64 new) =20 void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu) { - struct pi_desc *pi_desc =3D vcpu_to_pi_desc(vcpu); - struct vcpu_vmx *vmx =3D to_vmx(vcpu); + struct vcpu_pi *vcpu_pi =3D vcpu_to_pi(vcpu); + struct pi_desc *pi_desc =3D &vcpu_pi->pi_desc; struct pi_desc old, new; unsigned long flags; unsigned int dest; @@ -88,7 +109,7 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu) */ if (pi_desc->nv =3D=3D POSTED_INTR_WAKEUP_VECTOR) { raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu)); - list_del(&vmx->pi_wakeup_list); + list_del(&vcpu_pi->pi_wakeup_list); raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu)); } =20 @@ -142,15 +163,15 @@ static bool vmx_can_use_vtd_pi(struct kvm *kvm) */ static void pi_enable_wakeup_handler(struct kvm_vcpu *vcpu) { - struct pi_desc *pi_desc =3D vcpu_to_pi_desc(vcpu); - struct vcpu_vmx *vmx =3D to_vmx(vcpu); + struct vcpu_pi *vcpu_pi =3D vcpu_to_pi(vcpu); + struct pi_desc *pi_desc =3D &vcpu_pi->pi_desc; struct pi_desc old, new; unsigned long flags; =20 local_irq_save(flags); =20 raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu)); - list_add_tail(&vmx->pi_wakeup_list, + list_add_tail(&vcpu_pi->pi_wakeup_list, &per_cpu(wakeup_vcpus_on_cpu, vcpu->cpu)); raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu)); =20 @@ -187,7 +208,8 @@ static bool vmx_needs_pi_wakeup(struct kvm_vcpu *vcpu) * notification vector is switched to the one that calls * back to the pi_wakeup_handler() function. 
*/ - return vmx_can_use_ipiv(vcpu) || vmx_can_use_vtd_pi(vcpu->kvm); + return (vmx_can_use_ipiv(vcpu) && !is_td_vcpu(vcpu)) || + vmx_can_use_vtd_pi(vcpu->kvm); } =20 void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu) @@ -197,7 +219,8 @@ void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu) if (!vmx_needs_pi_wakeup(vcpu)) return; =20 - if (kvm_vcpu_is_blocking(vcpu) && !vmx_interrupt_blocked(vcpu)) + if (kvm_vcpu_is_blocking(vcpu) && + (is_td_vcpu(vcpu) || !vmx_interrupt_blocked(vcpu))) pi_enable_wakeup_handler(vcpu); =20 /* diff --git a/arch/x86/kvm/vmx/posted_intr.h b/arch/x86/kvm/vmx/posted_intr.h index 26992076552e..2fe8222308b2 100644 --- a/arch/x86/kvm/vmx/posted_intr.h +++ b/arch/x86/kvm/vmx/posted_intr.h @@ -94,6 +94,17 @@ static inline bool pi_test_sn(struct pi_desc *pi_desc) (unsigned long *)&pi_desc->control); } =20 +struct vcpu_pi { + struct kvm_vcpu vcpu; + + /* Posted interrupt descriptor */ + struct pi_desc pi_desc; + + /* Used if this vCPU is waiting for PI notification wakeup. */ + struct list_head pi_wakeup_list; + /* Until here common layout between vcpu_vmx and vcpu_tdx. */ +}; + void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu); void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu); void pi_wakeup_handler(void); diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 98e0a907dc68..758af6ec3507 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -526,6 +526,7 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) return -EINVAL; =20 vcpu->arch.apic->guest_apic_protected =3D true; + INIT_LIST_HEAD(&tdx->pi_wakeup_list); =20 ret =3D tdx_alloc_td_page(&tdx->tdvpr); if (ret) diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 32e05efa70f9..1268a49fdf18 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -4,6 +4,7 @@ =20 #ifdef CONFIG_INTEL_TDX_HOST =20 +#include "posted_intr.h" #include "pmu_intel.h" #include "tdx_ops.h" =20 @@ -82,6 +83,13 @@ union tdx_exit_reason { struct vcpu_tdx { struct kvm_vcpu vcpu; =20 + /* Posted interrupt descriptor */ + struct pi_desc pi_desc; + + /* Used if this vCPU is waiting for PI notification wakeup. */ + struct list_head pi_wakeup_list; + /* Until here same layout as struct vcpu_pi. */ + struct tdx_td_page tdvpr; struct tdx_td_page *tdvpx; =20 diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index f49be71290bd..4b2e610df657 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -224,6 +224,14 @@ struct nested_vmx { =20 struct vcpu_vmx { struct kvm_vcpu vcpu; + + /* Posted interrupt descriptor */ + struct pi_desc pi_desc; + + /* Used if this vCPU is waiting for PI notification wakeup. */ + struct list_head pi_wakeup_list; + /* Until here same layout as struct vcpu_pi. */ + u8 fail; u8 x2apic_msr_bitmap_mode; =20 @@ -293,12 +301,6 @@ struct vcpu_vmx { =20 union vmx_exit_reason exit_reason; =20 - /* Posted interrupt descriptor */ - struct pi_desc pi_desc; - - /* Used if this vCPU is waiting for PI notification wakeup. 
*/ - struct list_head pi_wakeup_list; - /* Support for a guest hypervisor (nested VMX) */ struct nested_vmx nested; =20 --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8944CC433EF for ; Thu, 5 May 2022 18:19:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383563AbiEESXR (ORCPT ); Thu, 5 May 2022 14:23:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36954 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383254AbiEESTn (ORCPT ); Thu, 5 May 2022 14:19:43 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 136765DA5C; Thu, 5 May 2022 11:15:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774556; x=1683310556; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=sR+CfdOThrEJwh/X4kqxV4Wkn0dk4EIctGsQJhrHX5E=; b=diKLh1gBXSlRbu91mlNe4ESGOPsVpMSkOrRg3fjrX5dS15BiZKAhL3r7 UBuA0iyXw0ho7C0S41BDZjFZbVc6U9AYZ/oQ9QTBBlRCxF2x9Qd2ziG2q Ksn+DW+x5xB/USTVXxWlbugiHYuB7ln27o2vAfSe2+2vYxveb3qp9VtGo JUAzc1MqYR463T2TLOi5Bj3FwqxJ/OtH/PvJnFY1v6XDiVv4a3nvgb0+G MD5v7nxpd+3rDxZN88Gmu5aImS0wIO6Emj8ejtPCmIyXeklNwdDKsYqJ4 dA9W0DaIMxMc/IDv8wlraulL7HGEhx93j6yPn1W6L2dvc4g94M8uHRiop A==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248113915" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248113915" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:52 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083420" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:52 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 078/104] KVM: TDX: Implement interrupt injection Date: Thu, 5 May 2022 11:15:12 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata TDX supports interrupt injection into a vcpu with posted interrupts. Wire up the corresponding kvm x86 operations to posted interrupts. Move kvm_vcpu_trigger_posted_interrupt() from vmx.c to common.h to share the code. VMX can inject an interrupt by setting the interrupt information field, VM_ENTRY_INTR_INFO_FIELD, of the VMCS. TDX supports interrupt injection only by posted interrupt, so ignore the execution paths that access VM_ENTRY_INTR_INFO_FIELD. As CPU state is protected and APICv is enabled for the TDX guest, the VMM can inject an interrupt by updating the posted interrupt descriptor. Treat interrupts as always injectable. 
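The delivery flow itself is easy to sketch with C11 atomics; toy_pi_desc and toy_deliver_posted_interrupt() below are illustrative stand-ins (assuming a 64-bit unsigned long), not the kernel's pi_desc:

  #include <stdatomic.h>
  #include <stdbool.h>

  /* Toy posted-interrupt descriptor: 256-bit PIR plus an ON bit. */
  struct toy_pi_desc {
          atomic_ulong pir[4];    /* one bit per vector, 0..255 */
          atomic_bool on;         /* outstanding-notification bit */
  };

  /*
   * Same shape as __vmx_deliver_posted_interrupt() in the diff below:
   * post the vector into the PIR, then notify only if ON was clear.
   */
  bool toy_deliver_posted_interrupt(struct toy_pi_desc *pi, int vector)
  {
          atomic_fetch_or(&pi->pir[vector / 64], 1UL << (vector % 64));

          /* A previous notification already set ON; the IPI is still pending. */
          if (atomic_exchange(&pi->on, true))
                  return false;

          return true;    /* caller sends the notification IPI or wakes the vCPU */
  }

On real hardware the notification is the POSTED_INTR_VECTOR IPI and the CPU syncs the PIR into the vIRR atomically; the early return when ON is already set is what keeps back-to-back deliveries cheap.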
Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/vmx/common.h | 71 ++++++++++++++++++++++++++ arch/x86/kvm/vmx/main.c | 92 ++++++++++++++++++++++++++++++---- arch/x86/kvm/vmx/posted_intr.c | 2 +- arch/x86/kvm/vmx/posted_intr.h | 2 + arch/x86/kvm/vmx/tdx.c | 25 +++++++++ arch/x86/kvm/vmx/vmx.c | 67 +------------------------ arch/x86/kvm/vmx/x86_ops.h | 7 ++- 7 files changed, 189 insertions(+), 77 deletions(-) diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h index 235908f3e044..1522e9e6851b 100644 --- a/arch/x86/kvm/vmx/common.h +++ b/arch/x86/kvm/vmx/common.h @@ -4,6 +4,7 @@ =20 #include =20 +#include "posted_intr.h" #include "mmu.h" =20 static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t = gpa, @@ -30,4 +31,74 @@ static inline int __vmx_handle_ept_violation(struct kvm_= vcpu *vcpu, gpa_t gpa, return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0); } =20 +static inline void kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu, + int pi_vec) +{ +#ifdef CONFIG_SMP + if (vcpu->mode =3D=3D IN_GUEST_MODE) { + /* + * The vector of the virtual interrupt has already been set in + * the PIR. Send a notification event to deliver the virtual + * interrupt unless the vCPU is the currently running vCPU, + * i.e. the event is being sent from a fastpath VM-Exit + * handler, in which case the PIR will be synced to the vIRR + * before re-entering the guest. + * + * When the target is not the running vCPU, the following + * possibilities emerge: + * + * Case 1: vCPU stays in non-root mode. Sending a notification + * event posts the interrupt to the vCPU. + * + * Case 2: vCPU exits to root mode and is still runnable. The + * PIR will be synced to the vIRR before re-entering the guest. + * Sending a notification event is ok as the host IRQ handler + * will ignore the spurious event. + * + * Case 3: vCPU exits to root mode and is blocked. vcpu_block() + * has already synced PIR to vIRR and never blocks the vCPU if + * the vIRR is not empty. Therefore, a blocked vCPU here does + * not wait for any requested interrupts in PIR, and sending a + * notification event also results in a benign, spurious event. + */ + + if (vcpu !=3D kvm_get_running_vcpu()) + apic->send_IPI_mask(get_cpu_mask(vcpu->cpu), pi_vec); + return; + } +#endif + /* + * The vCPU isn't in the guest; wake the vCPU in case it is blocking, + * otherwise do nothing as KVM will grab the highest priority pending + * IRQ via ->sync_pir_to_irr() in vcpu_enter_guest(). + */ + kvm_vcpu_wake_up(vcpu); +} + +/* + * Send an interrupt to the vcpu via posted interrupt. + * 1. If the target vcpu is running (non-root mode), send a posted interrupt + * notification to the vcpu and hardware will sync PIR to vIRR atomically. + * 2. If the target vcpu isn't running (root mode), kick it to pick up the + * interrupt from PIR in the next vmentry. + */ +static inline void __vmx_deliver_posted_interrupt( + struct kvm_vcpu *vcpu, struct pi_desc *pi_desc, int vector) +{ + if (pi_test_and_set_pir(vector, pi_desc)) + return; + + /* If a previous notification has sent the IPI, nothing to do. */ + if (pi_test_and_set_on(pi_desc)) + return; + + /* + * The implied barrier in pi_test_and_set_on() pairs with the smp_mb_*() + * after setting vcpu->mode in vcpu_enter_guest(), thus the vCPU is + * guaranteed to see PID.ON=3D1 and sync the PIR to IRR if triggering a + * posted interrupt "fails" because vcpu->mode !=3D IN_GUEST_MODE. 
+ */ + kvm_vcpu_trigger_posted_interrupt(vcpu, POSTED_INTR_VECTOR); +} + #endif /* __KVM_X86_VMX_COMMON_H */ diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index f14519c6a861..613791b50f55 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -188,6 +188,33 @@ static bool vt_protected_apic_has_interrupt(struct kvm= _vcpu *vcpu) return tdx_protected_apic_has_interrupt(vcpu); } =20 +static void vt_apicv_post_state_restore(struct kvm_vcpu *vcpu) +{ + struct pi_desc *pi =3D vcpu_to_pi_desc(vcpu); + pi_clear_on(pi); + memset(pi->pir, 0, sizeof(pi->pir)); +} + +static int vt_sync_pir_to_irr(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return -1; + + return vmx_sync_pir_to_irr(vcpu); +} + +static void vt_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, + int trig_mode, int vector) +{ + if (is_td_vcpu(apic->vcpu)) { + tdx_deliver_interrupt(apic, delivery_mode, trig_mode, + vector); + return; + } + + vmx_deliver_interrupt(apic, delivery_mode, trig_mode, vector); +} + static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) @@ -237,6 +264,53 @@ static void vt_sched_in(struct kvm_vcpu *vcpu, int cpu) vmx_sched_in(vcpu, cpu); } =20 +static void vt_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask) +{ + if (is_td_vcpu(vcpu)) + return; + vmx_set_interrupt_shadow(vcpu, mask); +} + +static u32 vt_get_interrupt_shadow(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return 0; + + return vmx_get_interrupt_shadow(vcpu); +} + +static void vt_inject_irq(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_inject_irq(vcpu); +} + +static void vt_cancel_injection(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_cancel_injection(vcpu); +} + +static int vt_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection) +{ + if (is_td_vcpu(vcpu)) + return true; + + return vmx_interrupt_allowed(vcpu, for_injection); +} + +static void vt_enable_irq_window(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_enable_irq_window(vcpu); +} + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { if (!is_td(kvm)) @@ -313,31 +387,31 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .handle_exit =3D vmx_handle_exit, .skip_emulated_instruction =3D vmx_skip_emulated_instruction, .update_emulated_instruction =3D vmx_update_emulated_instruction, - .set_interrupt_shadow =3D vmx_set_interrupt_shadow, - .get_interrupt_shadow =3D vmx_get_interrupt_shadow, + .set_interrupt_shadow =3D vt_set_interrupt_shadow, + .get_interrupt_shadow =3D vt_get_interrupt_shadow, .patch_hypercall =3D vmx_patch_hypercall, - .inject_irq =3D vmx_inject_irq, + .inject_irq =3D vt_inject_irq, .inject_nmi =3D vmx_inject_nmi, .queue_exception =3D vmx_queue_exception, - .cancel_injection =3D vmx_cancel_injection, - .interrupt_allowed =3D vmx_interrupt_allowed, + .cancel_injection =3D vt_cancel_injection, + .interrupt_allowed =3D vt_interrupt_allowed, .nmi_allowed =3D vmx_nmi_allowed, .get_nmi_mask =3D vmx_get_nmi_mask, .set_nmi_mask =3D vmx_set_nmi_mask, .enable_nmi_window =3D vmx_enable_nmi_window, - .enable_irq_window =3D vmx_enable_irq_window, + .enable_irq_window =3D vt_enable_irq_window, .update_cr8_intercept =3D vmx_update_cr8_intercept, .set_virtual_apic_mode =3D vmx_set_virtual_apic_mode, .set_apic_access_page_addr =3D vmx_set_apic_access_page_addr, .refresh_apicv_exec_ctrl =3D vmx_refresh_apicv_exec_ctrl, .load_eoi_exitmap =3D vmx_load_eoi_exitmap, - .apicv_post_state_restore =3D vmx_apicv_post_state_restore, + 
.apicv_post_state_restore =3D vt_apicv_post_state_restore, .check_apicv_inhibit_reasons =3D vmx_check_apicv_inhibit_reasons, .hwapic_irr_update =3D vmx_hwapic_irr_update, .hwapic_isr_update =3D vmx_hwapic_isr_update, .guest_apic_has_interrupt =3D vmx_guest_apic_has_interrupt, - .sync_pir_to_irr =3D vmx_sync_pir_to_irr, - .deliver_interrupt =3D vmx_deliver_interrupt, + .sync_pir_to_irr =3D vt_sync_pir_to_irr, + .deliver_interrupt =3D vt_deliver_interrupt, .dy_apicv_has_pending_interrupt =3D pi_has_pending_interrupt, .protected_apic_has_interrupt =3D vt_protected_apic_has_interrupt, =20 diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c index 0bc7a848b319..50cabc8c93c1 100644 --- a/arch/x86/kvm/vmx/posted_intr.c +++ b/arch/x86/kvm/vmx/posted_intr.c @@ -50,7 +50,7 @@ static inline struct vcpu_pi *vcpu_to_pi(struct kvm_vcpu = *vcpu) return (struct vcpu_pi*)vcpu; } =20 -static inline struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu) +struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu) { return &vcpu_to_pi(vcpu)->pi_desc; } diff --git a/arch/x86/kvm/vmx/posted_intr.h b/arch/x86/kvm/vmx/posted_intr.h index 2fe8222308b2..0f9983b6910b 100644 --- a/arch/x86/kvm/vmx/posted_intr.h +++ b/arch/x86/kvm/vmx/posted_intr.h @@ -105,6 +105,8 @@ struct vcpu_pi { /* Until here common layout betwwn vcpu_vmx and vcpu_tdx. */ }; =20 +struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu); + void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu); void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu); void pi_wakeup_handler(void); diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 758af6ec3507..55acf6f1b1a3 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -7,6 +7,7 @@ =20 #include "capabilities.h" #include "x86_ops.h" +#include "common.h" #include "mmu.h" #include "tdx.h" #include "vmx.h" @@ -555,6 +556,9 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) vcpu->arch.guest_state_protected =3D !(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTRIBUTE_DEBUG); =20 + tdx->pi_desc.nv =3D POSTED_INTR_VECTOR; + tdx->pi_desc.sn =3D 1; + tdx->host_state_need_save =3D true; tdx->host_state_need_restore =3D false; =20 @@ -575,6 +579,7 @@ void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) { struct vcpu_tdx *tdx =3D to_tdx(vcpu); =20 + vmx_vcpu_pi_load(vcpu, cpu); if (vcpu->cpu =3D=3D cpu) return; =20 @@ -788,6 +793,12 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) =20 trace_kvm_entry(vcpu); =20 + if (pi_test_on(&tdx->pi_desc)) { + apic->send_IPI_self(POSTED_INTR_VECTOR); + + kvm_wait_lapic_expire(vcpu); + } + tdx_vcpu_enter_exit(vcpu, tdx); =20 tdx_user_return_update_cache(); @@ -1126,6 +1137,16 @@ static void tdx_handle_changed_private_spte( } } =20 +void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, + int trig_mode, int vector) +{ + struct kvm_vcpu *vcpu =3D apic->vcpu; + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + + /* TDX supports only posted interrupt. No lapic emulation. 
*/ + __vmx_deliver_posted_interrupt(vcpu, &tdx->pi_desc, vector); +} + int tdx_dev_ioctl(void __user *argp) { struct kvm_tdx_capabilities __user *user_caps; @@ -1561,6 +1582,10 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __use= r *argp) return -EIO; } =20 + td_vmcs_write16(tdx, POSTED_INTR_NV, POSTED_INTR_VECTOR); + td_vmcs_write64(tdx, POSTED_INTR_DESC_ADDR, __pa(&tdx->pi_desc)); + td_vmcs_setbit32(tdx, PIN_BASED_VM_EXEC_CONTROL, PIN_BASED_POSTED_INTR); + tdx->initialized =3D true; return 0; } diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index df78e2220fec..718b38239e03 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -3951,50 +3951,6 @@ void vmx_msr_filter_changed(struct kvm_vcpu *vcpu) pt_update_intercept_for_msr(vcpu); } =20 -static inline void kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu, - int pi_vec) -{ -#ifdef CONFIG_SMP - if (vcpu->mode =3D=3D IN_GUEST_MODE) { - /* - * The vector of the virtual has already been set in the PIR. - * Send a notification event to deliver the virtual interrupt - * unless the vCPU is the currently running vCPU, i.e. the - * event is being sent from a fastpath VM-Exit handler, in - * which case the PIR will be synced to the vIRR before - * re-entering the guest. - * - * When the target is not the running vCPU, the following - * possibilities emerge: - * - * Case 1: vCPU stays in non-root mode. Sending a notification - * event posts the interrupt to the vCPU. - * - * Case 2: vCPU exits to root mode and is still runnable. The - * PIR will be synced to the vIRR before re-entering the guest. - * Sending a notification event is ok as the host IRQ handler - * will ignore the spurious event. - * - * Case 3: vCPU exits to root mode and is blocked. vcpu_block() - * has already synced PIR to vIRR and never blocks the vCPU if - * the vIRR is not empty. Therefore, a blocked vCPU here does - * not wait for any requested interrupts in PIR, and sending a - * notification event also results in a benign, spurious event. - */ - - if (vcpu !=3D kvm_get_running_vcpu()) - apic->send_IPI_mask(get_cpu_mask(vcpu->cpu), pi_vec); - return; - } -#endif - /* - * The vCPU isn't in the guest; wake the vCPU in case it is blocking, - * otherwise do nothing as KVM will grab the highest priority pending - * IRQ via ->sync_pir_to_irr() in vcpu_enter_guest(). - */ - kvm_vcpu_wake_up(vcpu); -} - static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu, int vector) { @@ -4046,20 +4002,7 @@ static int vmx_deliver_posted_interrupt(struct kvm_v= cpu *vcpu, int vector) if (!vcpu->arch.apicv_active) return -1; =20 - if (pi_test_and_set_pir(vector, &vmx->pi_desc)) - return 0; - - /* If a previous notification has sent the IPI, nothing to do. */ - if (pi_test_and_set_on(&vmx->pi_desc)) - return 0; - - /* - * The implied barrier in pi_test_and_set_on() pairs with the smp_mb_*() - * after setting vcpu->mode in vcpu_enter_guest(), thus the vCPU is - * guaranteed to see PID.ON=3D1 and sync the PIR to IRR if triggering a - * posted interrupt "fails" because vcpu->mode !=3D IN_GUEST_MODE. 
- */ - kvm_vcpu_trigger_posted_interrupt(vcpu, POSTED_INTR_VECTOR); + __vmx_deliver_posted_interrupt(vcpu, &vmx->pi_desc, vector); return 0; } =20 @@ -6600,14 +6543,6 @@ void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64= *eoi_exit_bitmap) vmcs_write64(EOI_EXIT_BITMAP3, eoi_exit_bitmap[3]); } =20 -void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu) -{ - struct vcpu_vmx *vmx =3D to_vmx(vcpu); - - pi_clear_on(&vmx->pi_desc); - memset(vmx->pi_desc.pir, 0, sizeof(vmx->pi_desc.pir)); -} - void vmx_do_interrupt_nmi_irqoff(unsigned long entry); =20 static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index d1face47f547..3eeb35dee8cf 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -53,7 +53,6 @@ int vmx_check_intercept(struct kvm_vcpu *vcpu, bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu); void vmx_migrate_timers(struct kvm_vcpu *vcpu); void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu); -void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu); bool vmx_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason); void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr); void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr); @@ -149,6 +148,9 @@ void tdx_vcpu_put(struct kvm_vcpu *vcpu); void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu); bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu); =20 +void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, + int trig_mode, int vector); + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); =20 @@ -176,6 +178,9 @@ static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) = {} static inline void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {} static inline bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu)= { return false; } =20 +static inline void tdx_deliver_interrupt( + struct kvm_lapic *apic, int delivery_mode, int trig_mode, int vector) {} + static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { retur= n -EOPNOTSUPP; } static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)= { return -EOPNOTSUPP; } =20 --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5757BC4332F for ; Thu, 5 May 2022 18:22:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383255AbiEESZj (ORCPT ); Thu, 5 May 2022 14:25:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36854 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383253AbiEESTn (ORCPT ); Thu, 5 May 2022 14:19:43 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9F08F5DA5E; Thu, 5 May 2022 11:15:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774556; x=1683310556; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=xGxTjXnp3C28uC6iCdzQxF3RthvA/OE2Nb7h0Y2iDng=; b=Hr4a7Bx7x0hD7VQpWX8QF/l7BIdUTi3F1wVKPCvnDhmKEt/WGWwCyJrB 3PkeLOzl20kVVwaIjOkQiuQMP95YJknjIVWJGcGhPNUTLk1bAqoHdw3tq G8NKpwN7DunrP6gQIeycCp7BGytdy/ykEk27o4GtHTUjKJ/a/6P4PBar1 
g+s+cq6u4UB0kfzvs1lINXS7pv6agAU5LjXk0cxZAE4RyzNzNI+qQGKQb hleFL8QGMH/HgiYW+L/286NyuMg7rTz38nd0kBpa8qe/rFZOrKIyerjOj 5NEYeq6ha62swWvaHy8ddgQYDUr7m8VFit5pHtB/6gkWjj7pi/K+x3jZ7 g==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248113919" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248113919" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:52 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083425" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:52 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 079/104] KVM: TDX: Implement vcpu request_immediate_exit Date: Thu, 5 May 2022 11:15:13 -0700 Message-Id: <6a288ef4c20e580ea8e0a26f671f0edcf7366461.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Now that interrupts can be injected into a TDX vcpu, the vcpu is ready to be blocked. Wire up the kvm x86 methods for blocking/unblocking a vcpu for TDX. To unblock on pending events, the request_immediate_exit method is also needed. Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/vmx/main.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 613791b50f55..2c07e195cb8b 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -311,6 +311,14 @@ static void vt_enable_irq_window(struct kvm_vcpu *vcpu) vmx_enable_irq_window(vcpu); } =20 +static void vt_request_immediate_exit(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return __kvm_request_immediate_exit(vcpu); + + vmx_request_immediate_exit(vcpu); +} + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { if (!is_td(kvm)) @@ -435,7 +443,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .check_intercept =3D vmx_check_intercept, .handle_exit_irqoff =3D vmx_handle_exit_irqoff, =20 - .request_immediate_exit =3D vmx_request_immediate_exit, + .request_immediate_exit =3D vt_request_immediate_exit, =20 .sched_in =3D vt_sched_in, =20 --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CFB8DC4332F for ; Thu, 5 May 2022 18:21:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384014AbiEESZR (ORCPT ); Thu, 5 May 2022 14:25:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36960 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383256AbiEESTn (ORCPT ); Thu, 5 May 2022 14:19:43 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DC3575DA60; Thu, 5 May 2022 11:15:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
t=1651774556; x=1683310556; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=LfcaBSODx5xGiL6ffnm4yo/A3+EvKO6v548CR1+chy8=; b=LougT3VrKrqTmUXHHcNJYIloFWDMQPynWCmYu+zxypLP0yfXeJBFEmgU wgbucdjYzdLjM7A7WdAbAr/K9gvGeaSV05m3wQKrqz8quljrAguWXYdEb RfXjrXvdyiwoYgAJ+cz5UcMmGU66ftlCgfiYWjexgMckYYLFUbXZSJiyS qnk5obkcNh8QJCyOFQ6rSzS+6mxGr7oOxaT4Zr3Pt/Eybd6K33rgxifdJ 3fdum6zGsi7YJU9rse9GLNAzJiUO5u9ntYZcUXFptxDbs5jxZy4D2Ca/5 Njgq6+WDhgE2yJksnwnBWAu0F00F2kRUMOsK49En4VyLexqfQokrjzDdN A==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248113921" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248113921" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:52 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083429" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:52 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 080/104] KVM: TDX: Implement methods to inject NMI Date: Thu, 5 May 2022 11:15:14 -0700 Message-Id: <286f84d0bd8dae3e4ffecf2bdc73daf76e9413dc.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata The TDX vcpu control structure defines one bit for a pending NMI, which lets the VMM inject an NMI by setting the bit without knowing the TDX vcpu's NMI state. Because the vcpu state is protected, the VMM can't know the NMI state of a TDX vcpu; the TDX module handles the actual injection and the NMI state transitions. Add methods for NMI and treat NMIs as always injectable. Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/vmx/main.c | 62 +++++++++++++++++++++++++++++++++++--- arch/x86/kvm/vmx/tdx.c | 5 +++ arch/x86/kvm/vmx/x86_ops.h | 2 ++ 3 files changed, 64 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 2c07e195cb8b..74ca538edf46 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -247,6 +247,58 @@ static void vt_flush_tlb_guest(struct kvm_vcpu *vcpu) vmx_flush_tlb_guest(vcpu); } =20 +static void vt_inject_nmi(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return tdx_inject_nmi(vcpu); + + vmx_inject_nmi(vcpu); +} + +static int vt_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection) +{ + /* + * The TDX module manages NMI windows and NMI reinjection, and hides NMI + * blocking; all KVM can do is throw an NMI over the wall. + */ + if (is_td_vcpu(vcpu)) + return true; + + return vmx_nmi_allowed(vcpu, for_injection); +} + +static bool vt_get_nmi_mask(struct kvm_vcpu *vcpu) +{ + /* + * Assume NMIs are always unmasked. KVM could query PEND_NMI and treat + * NMIs as masked if a previous NMI is still pending, but SEAMCALLs are + * expensive and the end result is unchanged as the only relevant usage + * of get_nmi_mask() is to limit the number of pending NMIs, i.e. it + * only changes whether KVM or the TDX module drops an NMI. 
+ */ + if (is_td_vcpu(vcpu)) + return false; + + return vmx_get_nmi_mask(vcpu); +} + +static void vt_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_nmi_mask(vcpu, masked); +} + +static void vt_enable_nmi_window(struct kvm_vcpu *vcpu) +{ + /* Refer the comment in vt_get_nmi_mask(). */ + if (is_td_vcpu(vcpu)) + return; + + vmx_enable_nmi_window(vcpu); +} + static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) { @@ -399,14 +451,14 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .get_interrupt_shadow =3D vt_get_interrupt_shadow, .patch_hypercall =3D vmx_patch_hypercall, .inject_irq =3D vt_inject_irq, - .inject_nmi =3D vmx_inject_nmi, + .inject_nmi =3D vt_inject_nmi, .queue_exception =3D vmx_queue_exception, .cancel_injection =3D vt_cancel_injection, .interrupt_allowed =3D vt_interrupt_allowed, - .nmi_allowed =3D vmx_nmi_allowed, - .get_nmi_mask =3D vmx_get_nmi_mask, - .set_nmi_mask =3D vmx_set_nmi_mask, - .enable_nmi_window =3D vmx_enable_nmi_window, + .nmi_allowed =3D vt_nmi_allowed, + .get_nmi_mask =3D vt_get_nmi_mask, + .set_nmi_mask =3D vt_set_nmi_mask, + .enable_nmi_window =3D vt_enable_nmi_window, .enable_irq_window =3D vt_enable_irq_window, .update_cr8_intercept =3D vmx_update_cr8_intercept, .set_virtual_apic_mode =3D vmx_set_virtual_apic_mode, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 55acf6f1b1a3..39220f63a005 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -814,6 +814,11 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) return EXIT_FASTPATH_NONE; } =20 +void tdx_inject_nmi(struct kvm_vcpu *vcpu) +{ + td_management_write8(to_tdx(vcpu), TD_VCPU_PEND_NMI, 1); +} + void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) { td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa & PAGE_MASK); diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 3eeb35dee8cf..0ef1e94d4196 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -150,6 +150,7 @@ bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *= vcpu); =20 void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, int trig_mode, int vector); +void tdx_inject_nmi(struct kvm_vcpu *vcpu); =20 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); @@ -180,6 +181,7 @@ static inline bool tdx_protected_apic_has_interrupt(str= uct kvm_vcpu *vcpu) { ret =20 static inline void tdx_deliver_interrupt( struct kvm_lapic *apic, int delivery_mode, int trig_mode, int vector) {} +static inline void tdx_inject_nmi(struct kvm_vcpu *vcpu) {} =20 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { retur= n -EOPNOTSUPP; } static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)= { return -EOPNOTSUPP; } --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4E712C433F5 for ; Thu, 5 May 2022 18:21:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1385109AbiEESY7 (ORCPT ); Thu, 5 May 2022 14:24:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36420 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383267AbiEESTo (ORCPT ); Thu, 5 May 2022 14:19:44 
-0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2DE455DA70; Thu, 5 May 2022 11:15:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774558; x=1683310558; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=8nUU+UGNy7mQEIUhX+nB4yYC3U4K7m46T49yAw5q968=; b=ZMSDQxMiJFYpYHWwVTapqvzFFJ8H91xbrjgC6JMqqEfmKTdohdm9qcw6 1lEflFIpH1KglPfREHR4WlmVXpi8zXTYmOqQaryWXPfTlTaO709TEwaiX Rp4ljyiS4alTdXrQwU4SpVQR14RFTJA9QfhKt0HjW40GVX5xXG4KLyRjh bNmbDK3AuSGuDIj0MNWUQ/Ht1SzVfGavSnnVDf6e2qb/9T7oYg2AnsPQF PH7DcCddpK6gebuDXtG3YrJ8b9EctEiY+iScxkpp557GDvGTpJfGoPp1M 5UmrhN4EwQC2E7a9H8sMPLlohEzEQ6BW5s3NrZ70+r+1Gzc3QRNzfL9Oz Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248113924" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248113924" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:52 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083433" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:52 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 081/104] KVM: VMX: Modify NMI and INTR handlers to take intr_info as function argument Date: Thu, 5 May 2022 11:15:15 -0700 Message-Id: <246546b76f01e9eec12910f9eb809a04c02c2ef1.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson TDX uses different ABI to get information about VM exit. Pass intr_info to the NMI and INTR handlers instead of pulling it from vcpu_vmx in preparation for sharing the bulk of the handlers with TDX. When the guest TD exits to VMM, RAX holds status and exit reason, RCX holds exit qualification etc rather than the VMCS fields because VMM doesn't have access to the VMCS. 
The eventual code will be VMX: - get exit reason, intr_info, exit_qualification, and etc from VMCS - call NMI/INTR handlers (common code) TDX: - get exit reason, intr_info, exit_qualification, and etc from guest registers - call NMI/INTR handlers (common code) Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/vmx/vmx.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 718b38239e03..2bcc4511ee28 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6575,28 +6575,27 @@ static void handle_nm_fault_irqoff(struct kvm_vcpu = *vcpu) rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err); } =20 -static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx) +static void handle_exception_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_in= fo) { const unsigned long nmi_entry =3D (unsigned long)asm_exc_nmi_noist; - u32 intr_info =3D vmx_get_intr_info(&vmx->vcpu); =20 /* if exit due to PF check for async PF */ if (is_page_fault(intr_info)) - vmx->vcpu.arch.apf.host_apf_flags =3D kvm_read_and_reset_apf_flags(); + vcpu->arch.apf.host_apf_flags =3D kvm_read_and_reset_apf_flags(); /* if exit due to NM, handle before interrupts are enabled */ else if (is_nm_fault(intr_info)) - handle_nm_fault_irqoff(&vmx->vcpu); + handle_nm_fault_irqoff(vcpu); /* Handle machine checks before interrupts are enabled */ else if (is_machine_check(intr_info)) kvm_machine_check(); /* We need to handle NMIs before interrupts are enabled */ else if (is_nmi(intr_info)) - handle_interrupt_nmi_irqoff(&vmx->vcpu, nmi_entry); + handle_interrupt_nmi_irqoff(vcpu, nmi_entry); } =20 -static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu) +static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu, + u32 intr_info) { - u32 intr_info =3D vmx_get_intr_info(vcpu); unsigned int vector =3D intr_info & INTR_INFO_VECTOR_MASK; gate_desc *desc =3D (gate_desc *)host_idt_base + vector; =20 @@ -6615,9 +6614,9 @@ void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu) return; =20 if (vmx->exit_reason.basic =3D=3D EXIT_REASON_EXTERNAL_INTERRUPT) - handle_external_interrupt_irqoff(vcpu); + handle_external_interrupt_irqoff(vcpu, vmx_get_intr_info(vcpu)); else if (vmx->exit_reason.basic =3D=3D EXIT_REASON_EXCEPTION_NMI) - handle_exception_nmi_irqoff(vmx); + handle_exception_nmi_irqoff(vcpu, vmx_get_intr_info(vcpu)); } =20 /* --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B056FC433F5 for ; Thu, 5 May 2022 18:21:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384026AbiEESXs (ORCPT ); Thu, 5 May 2022 14:23:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37002 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383281AbiEESTo (ORCPT ); Thu, 5 May 2022 14:19:44 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1DE095DA76; Thu, 5 May 2022 11:15:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774559; x=1683310559; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; 
bh=kgo3q/Y6ZqaGO4un1aT6A1szyhIMrrqRiFgjrCXXXYs=; b=nZvPQ9DGHSwZSzMu5mZmX87vDJnF9omHtGvaMVS96sK1sbr56Fwc/ENT KqjOTr/T8PqeEzLflyv7nTqLgLhyRdKNB/0fFCPMQB0CQqNmWZNYYbd1 FywHucpLNGPmeRWKPMqgcQqhsYHNH4Tn61ocq94E6fbknceNZxDzmVgnG 4WqPl72EoGLP36wYQAxJCvIN8BEmPkoTN1xT18eTHyMEWmxCBvKtYJKGq pKIuw5Hf6h6W/gX7z7dd1VeLO7mmGmPCo2X81YRTiBKVo9ZVLAzAQh6wx s5MtjzO4Ig3Fp8B931z5dJbObFZRAdFgwgcqP1cW8TUZ0UbK41qsYOjxN A==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248113927" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248113927" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:53 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083437" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:52 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 082/104] KVM: VMX: Move NMI/exception handler to common helper Date: Thu, 5 May 2022 11:15:16 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson TDX handles NMI/exception exits mostly the same as the VMX case; the difference is how the exit qualification is retrieved. To share the code with TDX, move the NMI/exception handlers to a common header. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/common.h | 50 ++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/vmx.c | 60 ++++++--------------------------- 2 files changed, 59 insertions(+), 51 deletions(-) diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h index 1522e9e6851b..b54e3ed6b2e1 100644 --- a/arch/x86/kvm/vmx/common.h +++ b/arch/x86/kvm/vmx/common.h @@ -4,8 +4,58 @@ =20 #include =20 +#include + #include "posted_intr.h" #include "mmu.h" +#include "vmcs.h" +#include "x86.h" + +extern unsigned long vmx_host_idt_base; +void vmx_handle_nm_fault_irqoff(struct kvm_vcpu *vcpu); +void vmx_do_interrupt_nmi_irqoff(unsigned long entry); + +static inline void vmx_handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, + unsigned long entry) +{ + bool is_nmi =3D entry =3D=3D (unsigned long)asm_exc_nmi_noist; + + kvm_before_interrupt(vcpu, is_nmi ? 
KVM_HANDLING_NMI : KVM_HANDLING_IRQ); + vmx_do_interrupt_nmi_irqoff(entry); + kvm_after_interrupt(vcpu); +} + +static inline void vmx_handle_exception_nmi_irqoff(struct kvm_vcpu *vcpu, + u32 intr_info) +{ + const unsigned long nmi_entry =3D (unsigned long)asm_exc_nmi_noist; + + /* if exit due to PF check for async PF */ + if (is_page_fault(intr_info)) + vcpu->arch.apf.host_apf_flags =3D kvm_read_and_reset_apf_flags(); + /* if exit due to NM, handle before interrupts are enabled */ + else if (is_nm_fault(intr_info)) + vmx_handle_nm_fault_irqoff(vcpu); + /* Handle machine checks before interrupts are enabled */ + else if (is_machine_check(intr_info)) + kvm_machine_check(); + /* We need to handle NMIs before interrupts are enabled */ + else if (is_nmi(intr_info)) + vmx_handle_interrupt_nmi_irqoff(vcpu, nmi_entry); +} + +static inline void vmx_handle_external_interrupt_irqoff(struct kvm_vcpu *v= cpu, + u32 intr_info) +{ + unsigned int vector =3D intr_info & INTR_INFO_VECTOR_MASK; + gate_desc *desc =3D (gate_desc *)vmx_host_idt_base + vector; + + if (KVM_BUG(!is_external_intr(intr_info), vcpu->kvm, + "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info)) + return; + + vmx_handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc)); +} =20 static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t = gpa, unsigned long exit_qualification) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 2bcc4511ee28..7949048c1acf 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -463,7 +463,7 @@ static inline void vmx_segment_cache_clear(struct vcpu_= vmx *vmx) vmx->segment_cache.bitmask =3D 0; } =20 -static unsigned long host_idt_base; +unsigned long vmx_host_idt_base; =20 #if IS_ENABLED(CONFIG_HYPERV) static bool __read_mostly enlightened_vmcs =3D true; @@ -4066,7 +4066,7 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx) vmcs_write16(HOST_SS_SELECTOR, __KERNEL_DS); /* 22.2.4 */ vmcs_write16(HOST_TR_SELECTOR, GDT_ENTRY_TSS*8); /* 22.2.4 */ =20 - vmcs_writel(HOST_IDTR_BASE, host_idt_base); /* 22.2.4 */ + vmcs_writel(HOST_IDTR_BASE, vmx_host_idt_base); /* 22.2.4 */ =20 vmcs_writel(HOST_RIP, (unsigned long)vmx_vmexit); /* 22.2.5 */ =20 @@ -4905,10 +4905,10 @@ static int handle_exception_nmi(struct kvm_vcpu *vc= pu) intr_info =3D vmx_get_intr_info(vcpu); =20 if (is_machine_check(intr_info) || is_nmi(intr_info)) - return 1; /* handled by handle_exception_nmi_irqoff() */ + return 1; /* handled by vmx_handle_exception_nmi_irqoff() */ =20 /* - * Queue the exception here instead of in handle_nm_fault_irqoff(). + * Queue the exception here instead of in vmx_handle_nm_fault_irqoff(). * This ensures the nested_vmx check is not skipped so vmexit can * be reflected to L1 (when it intercepts #NM) before reaching this * point. @@ -6543,19 +6543,7 @@ void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64= *eoi_exit_bitmap) vmcs_write64(EOI_EXIT_BITMAP3, eoi_exit_bitmap[3]); } =20 -void vmx_do_interrupt_nmi_irqoff(unsigned long entry); - -static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, - unsigned long entry) -{ - bool is_nmi =3D entry =3D=3D (unsigned long)asm_exc_nmi_noist; - - kvm_before_interrupt(vcpu, is_nmi ? 
KVM_HANDLING_NMI : KVM_HANDLING_IRQ); - vmx_do_interrupt_nmi_irqoff(entry); - kvm_after_interrupt(vcpu); -} - -static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu) +void vmx_handle_nm_fault_irqoff(struct kvm_vcpu *vcpu) { /* * Save xfd_err to guest_fpu before interrupt is enabled, so the @@ -6575,37 +6563,6 @@ static void handle_nm_fault_irqoff(struct kvm_vcpu *= vcpu) rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err); } =20 -static void handle_exception_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_in= fo) -{ - const unsigned long nmi_entry =3D (unsigned long)asm_exc_nmi_noist; - - /* if exit due to PF check for async PF */ - if (is_page_fault(intr_info)) - vcpu->arch.apf.host_apf_flags =3D kvm_read_and_reset_apf_flags(); - /* if exit due to NM, handle before interrupts are enabled */ - else if (is_nm_fault(intr_info)) - handle_nm_fault_irqoff(vcpu); - /* Handle machine checks before interrupts are enabled */ - else if (is_machine_check(intr_info)) - kvm_machine_check(); - /* We need to handle NMIs before interrupts are enabled */ - else if (is_nmi(intr_info)) - handle_interrupt_nmi_irqoff(vcpu, nmi_entry); -} - -static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu, - u32 intr_info) -{ - unsigned int vector =3D intr_info & INTR_INFO_VECTOR_MASK; - gate_desc *desc =3D (gate_desc *)host_idt_base + vector; - - if (KVM_BUG(!is_external_intr(intr_info), vcpu->kvm, - "KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info)) - return; - - handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc)); -} - void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx =3D to_vmx(vcpu); @@ -6614,9 +6571,10 @@ void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu) return; =20 if (vmx->exit_reason.basic =3D=3D EXIT_REASON_EXTERNAL_INTERRUPT) - handle_external_interrupt_irqoff(vcpu, vmx_get_intr_info(vcpu)); + vmx_handle_external_interrupt_irqoff(vcpu, + vmx_get_intr_info(vcpu)); else if (vmx->exit_reason.basic =3D=3D EXIT_REASON_EXCEPTION_NMI) - handle_exception_nmi_irqoff(vcpu, vmx_get_intr_info(vcpu)); + vmx_handle_exception_nmi_irqoff(vcpu, vmx_get_intr_info(vcpu)); } =20 /* @@ -7863,7 +7821,7 @@ __init int vmx_hardware_setup(void) int r; =20 store_idt(&dt); - host_idt_base =3D dt.address; + vmx_host_idt_base =3D dt.address; =20 vmx_setup_user_return_msrs(); =20 --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1951EC433EF for ; Thu, 5 May 2022 18:19:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1352319AbiEESXf (ORCPT ); Thu, 5 May 2022 14:23:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37006 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383282AbiEESTo (ORCPT ); Thu, 5 May 2022 14:19:44 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2297E5DA7C; Thu, 5 May 2022 11:15:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774559; x=1683310559; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=cixCPEKKpzvmyK7DHYTRav82CT9isuogv5cupV3KuZM=; b=fziwpUtwaZKF5G4as7bNB4GyI78vDX+SjYVmVkLBui6JN7pwNnbJPNQ1 
xPrCCAlXu0f+iYQhQxJNHv7o1XUGFMEHnUYER41hMCjiwhAvNYUlSJdPH S2HuVRc85D0+6H5cjwVC1xjKNib8HRk+kdEaNPRrX/xspyCAYwKyeW4dB 3IOpLAyUNTI02Y26KpFOwuxxobfUsZXnt2jCVyCxIrgztnrWE/NDm2EN4 Vy0Nns/DRVDAeep0rSFF+Tuly/P2P/wxR8vMGt0Y4pjrw9hln0XWOBgXh lFcy2g0oYDosS7ZGfd6UVFXLMFn5xgyITCRSbOskXwoRjlzCO5gFin1vj A==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248113929" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248113929" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:53 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083440" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:53 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 083/104] KVM: x86: Split core of hypercall emulation to helper function Date: Thu, 5 May 2022 11:15:17 -0700 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson By necessity, TDX will use a different register ABI for hypercalls. Break out the core functionality so that it may be reused for TDX. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm_host.h | 4 +++ arch/x86/kvm/x86.c | 54 ++++++++++++++++++++------------- 2 files changed, 37 insertions(+), 21 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index d6dce57fe0eb..f67fe33e6661 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1867,6 +1867,10 @@ static inline void kvm_clear_apicv_inhibit(struct kv= m *kvm, kvm_set_or_clear_apicv_inhibit(kvm, reason, false); } =20 +unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long= nr, + unsigned long a0, unsigned long a1, + unsigned long a2, unsigned long a3, + int op_64_bit); int kvm_emulate_hypercall(struct kvm_vcpu *vcpu); =20 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_= code, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2691abb46fac..5f291470a6f6 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9259,26 +9259,15 @@ static int complete_hypercall_exit(struct kvm_vcpu = *vcpu) return kvm_skip_emulated_instruction(vcpu); } =20 -int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) +unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long= nr, + unsigned long a0, unsigned long a1, + unsigned long a2, unsigned long a3, + int op_64_bit) { - unsigned long nr, a0, a1, a2, a3, ret; - int op_64_bit; - - if (kvm_xen_hypercall_enabled(vcpu->kvm)) - return kvm_xen_hypercall(vcpu); - - if (kvm_hv_hypercall_enabled(vcpu)) - return kvm_hv_hypercall(vcpu); - - nr =3D kvm_rax_read(vcpu); - a0 =3D kvm_rbx_read(vcpu); - a1 =3D kvm_rcx_read(vcpu); - a2 =3D kvm_rdx_read(vcpu); - a3 =3D kvm_rsi_read(vcpu); + unsigned long ret; =20 trace_kvm_hypercall(nr, a0, a1, a2, a3); =20 - op_64_bit =3D is_64_bit_hypercall(vcpu); if (!op_64_bit) { nr &=3D 0xFFFFFFFF; a0 &=3D 0xFFFFFFFF; @@ -9287,11 +9276,6 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) a3 
&=3D 0xFFFFFFFF; } =20 - if (static_call(kvm_x86_get_cpl)(vcpu) !=3D 0) { - ret =3D -KVM_EPERM; - goto out; - } - ret =3D -KVM_ENOSYS; =20 switch (nr) { @@ -9350,6 +9334,34 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) ret =3D -KVM_ENOSYS; break; } + return ret; +} +EXPORT_SYMBOL_GPL(__kvm_emulate_hypercall); + +int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) +{ + unsigned long nr, a0, a1, a2, a3, ret; + int op_64_bit; + + if (kvm_xen_hypercall_enabled(vcpu->kvm)) + return kvm_xen_hypercall(vcpu); + + if (kvm_hv_hypercall_enabled(vcpu)) + return kvm_hv_hypercall(vcpu); + + nr =3D kvm_rax_read(vcpu); + a0 =3D kvm_rbx_read(vcpu); + a1 =3D kvm_rcx_read(vcpu); + a2 =3D kvm_rdx_read(vcpu); + a3 =3D kvm_rsi_read(vcpu); + op_64_bit =3D is_64_bit_hypercall(vcpu); + + if (static_call(kvm_x86_get_cpl)(vcpu) !=3D 0) { + ret =3D -KVM_EPERM; + goto out; + } + + ret =3D __kvm_emulate_hypercall(vcpu, nr, a0, a1, a2, a3, op_64_bit); out: if (!op_64_bit) ret =3D (u32)ret; --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BA2D0C433FE for ; Thu, 5 May 2022 18:22:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384616AbiEESYe (ORCPT ); Thu, 5 May 2022 14:24:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36886 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383313AbiEESTo (ORCPT ); Thu, 5 May 2022 14:19:44 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C6A015DBC6; Thu, 5 May 2022 11:15:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774559; x=1683310559; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=VeGHay8OCa05hjiRm2lQG6GZXIFXJVMCyPZmh9n40Yk=; b=Zf8FG/T0R+Tl9IOx7JQ0HqJgR6Ots8CD7Vg7G7Brxez6qSaUp+KAQEwB fZ28EAA7aKh9SzM4TVVU/NRA8XuPaLA4oyU16PK0+JcuOviueH4xYK/z6 k5MV67Xa+dF36i/qDkNBUtu87AR2Wn2WVN9bkjVIEeksgzmlSTeL/Qj1z HjvPp26mzqxsZEu1aDNK08jX1iQ/JLDWy162oeKe9I+xl29h2WGhItfa/ lxInDu2HhLbkVUqtJ+l+9Y2FhoYQYt4mTWbvAFsBpyFGQUhdhInjx6DtZ 3ggeaJk3KTq21nMDBBcTK0xwnxQU6BjtevI09Rgm32KQKmtdMq+dg6Ylj g==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248113930" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248113930" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:53 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083445" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:53 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 084/104] KVM: TDX: Add a place holder to handle TDX VM exit Date: Thu, 5 May 2022 11:15:18 -0700 Message-Id: <731f560faad173e525f2a7649080125dc221069d.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk 
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Wire up handle_exit and handle_exit_irqoff methods and add a place holder to handle VM exit. Add helper functions to get exit info, exit qualification, etc. Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/vmx/main.c | 35 ++++++++++++++-- arch/x86/kvm/vmx/tdx.c | 81 ++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/x86_ops.h | 11 ++++++ 3 files changed, 124 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 74ca538edf46..95b7a90aa0d7 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -188,6 +188,23 @@ static bool vt_protected_apic_has_interrupt(struct kvm= _vcpu *vcpu) return tdx_protected_apic_has_interrupt(vcpu); } =20 +static int vt_handle_exit(struct kvm_vcpu *vcpu, + enum exit_fastpath_completion fastpath) +{ + if (is_td_vcpu(vcpu)) + return tdx_handle_exit(vcpu, fastpath); + + return vmx_handle_exit(vcpu, fastpath); +} + +static void vt_handle_exit_irqoff(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return tdx_handle_exit_irqoff(vcpu); + + vmx_handle_exit_irqoff(vcpu); +} + static void vt_apicv_post_state_restore(struct kvm_vcpu *vcpu) { struct pi_desc *pi =3D vcpu_to_pi_desc(vcpu); @@ -371,6 +388,18 @@ static void vt_request_immediate_exit(struct kvm_vcpu = *vcpu) vmx_request_immediate_exit(vcpu); } =20 +static void vt_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code) +{ + if (is_td_vcpu(vcpu)) { + tdx_get_exit_info(vcpu, reason, info1, info2, intr_info, + error_code); + return; + } + + vmx_get_exit_info(vcpu, reason, info1, info2, intr_info, error_code); +} + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { if (!is_td(kvm)) @@ -444,7 +473,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { =20 .vcpu_pre_run =3D vt_vcpu_pre_run, .vcpu_run =3D vt_vcpu_run, - .handle_exit =3D vmx_handle_exit, + .handle_exit =3D vt_handle_exit, .skip_emulated_instruction =3D vmx_skip_emulated_instruction, .update_emulated_instruction =3D vmx_update_emulated_instruction, .set_interrupt_shadow =3D vt_set_interrupt_shadow, @@ -479,7 +508,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .set_identity_map_addr =3D vmx_set_identity_map_addr, .get_mt_mask =3D vmx_get_mt_mask, =20 - .get_exit_info =3D vmx_get_exit_info, + .get_exit_info =3D vt_get_exit_info, =20 .vcpu_after_set_cpuid =3D vmx_vcpu_after_set_cpuid, =20 @@ -493,7 +522,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .load_mmu_pgd =3D vt_load_mmu_pgd, =20 .check_intercept =3D vmx_check_intercept, - .handle_exit_irqoff =3D vmx_handle_exit_irqoff, + .handle_exit_irqoff =3D vt_handle_exit_irqoff, =20 .request_immediate_exit =3D vt_request_immediate_exit, =20 diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 39220f63a005..b3fc9d95fffd 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -78,6 +78,26 @@ static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u= 16 hkid) return pa; } =20 +static __always_inline unsigned long tdexit_exit_qual(struct kvm_vcpu *vcp= u) +{ + return kvm_rcx_read(vcpu); +} + +static __always_inline unsigned long tdexit_ext_exit_qual(struct kvm_vcpu = *vcpu) +{ + return kvm_rdx_read(vcpu); +} + +static __always_inline unsigned long tdexit_gpa(struct kvm_vcpu *vcpu) +{ + return kvm_r8_read(vcpu); +} + +static __always_inline unsigned long tdexit_intr_info(struct kvm_vcpu *vcp= u) +{ + return 
kvm_r9_read(vcpu); +} + static inline bool is_td_vcpu_created(struct vcpu_tdx *tdx) { return tdx->tdvpr.added; @@ -819,6 +839,25 @@ void tdx_inject_nmi(struct kvm_vcpu *vcpu) td_management_write8(to_tdx(vcpu), TD_VCPU_PEND_NMI, 1); } =20 +void tdx_handle_exit_irqoff(struct kvm_vcpu *vcpu) +{ + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + u16 exit_reason =3D tdx->exit_reason.basic; + + if (exit_reason =3D=3D EXIT_REASON_EXCEPTION_NMI) + vmx_handle_exception_nmi_irqoff(vcpu, tdexit_intr_info(vcpu)); + else if (exit_reason =3D=3D EXIT_REASON_EXTERNAL_INTERRUPT) + vmx_handle_external_interrupt_irqoff(vcpu, + tdexit_intr_info(vcpu)); +} + +static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu) +{ + vcpu->run->exit_reason =3D KVM_EXIT_SHUTDOWN; + vcpu->mmio_needed =3D 0; + return 0; +} + void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) { td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa & PAGE_MASK); @@ -1152,6 +1191,48 @@ void tdx_deliver_interrupt(struct kvm_lapic *apic, i= nt delivery_mode, __vmx_deliver_posted_interrupt(vcpu, &tdx->pi_desc, vector); } =20 +int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath) +{ + union tdx_exit_reason exit_reason =3D to_tdx(vcpu)->exit_reason; + + if (unlikely(exit_reason.non_recoverable || exit_reason.error)) { + if (exit_reason.basic =3D=3D EXIT_REASON_TRIPLE_FAULT) + return tdx_handle_triple_fault(vcpu); + + kvm_pr_unimpl("TD exit 0x%llx, %d hkid 0x%x hkid pa 0x%llx\n", + exit_reason.full, exit_reason.basic, + to_kvm_tdx(vcpu->kvm)->hkid, + set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid)); + goto unhandled_exit; + } + + WARN_ON_ONCE(fastpath !=3D EXIT_FASTPATH_NONE); + + switch (exit_reason.basic) { + default: + break; + } + +unhandled_exit: + vcpu->run->exit_reason =3D KVM_EXIT_UNKNOWN; + vcpu->run->hw.hardware_exit_reason =3D exit_reason.full; + return 0; +} + +void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code) +{ + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + + *reason =3D tdx->exit_reason.full; + + *info1 =3D tdexit_exit_qual(vcpu); + *info2 =3D tdexit_ext_exit_qual(vcpu); + + *intr_info =3D tdexit_intr_info(vcpu); + *error_code =3D 0; +} + int tdx_dev_ioctl(void __user *argp) { struct kvm_tdx_capabilities __user *user_caps; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 0ef1e94d4196..53cf6d5a72a1 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -147,10 +147,15 @@ void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcp= u); void tdx_vcpu_put(struct kvm_vcpu *vcpu); void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu); bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu); +void tdx_handle_exit_irqoff(struct kvm_vcpu *vcpu); +int tdx_handle_exit(struct kvm_vcpu *vcpu, + enum exit_fastpath_completion fastpath); =20 void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, int trig_mode, int vector); void tdx_inject_nmi(struct kvm_vcpu *vcpu); +void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code); =20 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); @@ -178,10 +183,16 @@ static inline void tdx_prepare_switch_to_guest(struct= kvm_vcpu *vcpu) {} static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {} static inline void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {} static inline bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu)= { 
return false; } +static inline void tdx_handle_exit_irqoff(struct kvm_vcpu *vcpu) {} +static inline int tdx_handle_exit(struct kvm_vcpu *vcpu, + enum exit_fastpath_completion fastpath) { return 0; } =20 static inline void tdx_deliver_interrupt( struct kvm_lapic *apic, int delivery_mode, int trig_mode, int vector) {} static inline void tdx_inject_nmi(struct kvm_vcpu *vcpu) {} +static inline void tdx_get_exit_info( + struct kvm_vcpu *vcpu, u32 *reason, u64 *info1, u64 *info2, + u32 *intr_info, u32 *error_code) {} =20 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { retur= n -EOPNOTSUPP; } static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)= { return -EOPNOTSUPP; } --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 964A2C4167D for ; Thu, 5 May 2022 18:21:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384334AbiEESYR (ORCPT ); Thu, 5 May 2022 14:24:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36826 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383317AbiEESTp (ORCPT ); Thu, 5 May 2022 14:19:45 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A49B85DBD5; Thu, 5 May 2022 11:16:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774561; x=1683310561; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=SOHEj5Fn64+hYv6cZsOEgx/aKfn1VDBjHKB7akkORwQ=; b=FVUpTXIlGR/+Z2llGt5AfbQjR/lhclUO/hVkE/em9m+DgXIPCgm880Mc I+e/3OLCnbdDrwFPfW0jCPL0ANfH8mUiDLCEQHww7Zk9Q6dfj5TllCzhi gUSUEaYqGNL4xUarrLioophlxy16tl024xQ+z9LVAlrjfOABaThvBm+Vf tUZCfM6TUcmOZi132NQXTlxkT3/Y4Z4nzWGHmrZ8YywMJ9D3CYvGkpjQO a9m35NCwjkI/MIed3Gl6EjvHHtIfPSgYe9Nu1Iy/CqxX5yux0ZK9scbSB ZEsk0b7CdXJ1l3PpRmMIloM3taNLIGZvIiWY0DWiKpp2L+vLFUzrB8f6L w==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248113932" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248113932" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:53 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083448" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:53 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 085/104] KVM: TDX: handle EXIT_REASON_OTHER_SMI Date: Thu, 5 May 2022 11:15:19 -0700 Message-Id: <59e90973b777eb6992e41e3d8a949e8a99711bce.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata If the control reaches EXIT_REASON_OTHER_SMI, #SMI is delivered and handled right after returning from the TDX module to KVM nothing needs to be done in KVM. 
Continue TDX vcpu execution. Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/include/uapi/asm/vmx.h | 1 + arch/x86/kvm/vmx/tdx.c | 7 +++++++ 2 files changed, 8 insertions(+) diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vm= x.h index 946d761adbd3..3d9b4598e166 100644 --- a/arch/x86/include/uapi/asm/vmx.h +++ b/arch/x86/include/uapi/asm/vmx.h @@ -34,6 +34,7 @@ #define EXIT_REASON_TRIPLE_FAULT 2 #define EXIT_REASON_INIT_SIGNAL 3 #define EXIT_REASON_SIPI_SIGNAL 4 +#define EXIT_REASON_OTHER_SMI 6 =20 #define EXIT_REASON_INTERRUPT_WINDOW 7 #define EXIT_REASON_NMI_WINDOW 8 diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index b3fc9d95fffd..1f31dead31f2 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1209,6 +1209,13 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_= t fastpath) WARN_ON_ONCE(fastpath !=3D EXIT_FASTPATH_NONE); =20 switch (exit_reason.basic) { + case EXIT_REASON_OTHER_SMI: + /* + * If reach here, it's not a Machine Check System Management + * Interrupt(MSMI). #SMI is delivered and handled right after + * SEAMRET, nothing needs to be done in KVM. + */ + return 1; default: break; } --=20 2.25.1 From nobody Wed Dec 17 07:24:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9A0E2C433F5 for ; Thu, 5 May 2022 18:24:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245753AbiEES2L (ORCPT ); Thu, 5 May 2022 14:28:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36392 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383339AbiEESTr (ORCPT ); Thu, 5 May 2022 14:19:47 -0400 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 22FC01928D; Thu, 5 May 2022 11:16:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774564; x=1683310564; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=OPt5gdUrcNtS6wCuAiVXEkb7LwQ4j8XYfPRSPrOJvhE=; b=S6/aaKTEzrxheLHYNdNB3CcjXMNSurwFePKJsWl+AVVDZurPt+MeHY7y Y+xM+7/p9yvCZGF7DOtml2d9qPj6UkByad0nyBaZX71V85D2zzZjNyPKT AuYloFU5LayQnbDLvisoxem/leccjSVkfBSSa6cGsfY0TnPU+j5w4saH1 wlXcdiakBRcSJAmnfGX10Q0slu1mguVFw2Oh212p2oADbwWO/0W/Zb7ZP O+B5Y4CW+lzt0wMh8psesPD2fn3HygOVgb6K8/6zjRYF+w7iJ0MIYM3Kf JMCs3akMxEFJI8q+QRomkCWnnmWJGzhF12tYcOebQeGZ+pCbvag8jF00A g==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="248742059" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="248742059" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:53 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083452" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:53 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 086/104] KVM: TDX: handle ept violation/misconfig exit Date: Thu, 5 May 2022 11:15:20 -0700 Message-Id: X-Mailer: 
From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 086/104] KVM: TDX: handle ept violation/misconfig exit
Date: Thu, 5 May 2022 11:15:20 -0700

From: Isaku Yamahata

On EPT violation, call a common function, __vmx_handle_ept_violation(), to
trigger the x86 MMU code.  On EPT misconfiguration, exit to ring 3 with
KVM_EXIT_UNKNOWN, because EPT misconfiguration can't happen: MMIO is
triggered by TDG.VP.VMCALL instead, so there is no point in setting up a
misconfiguration value for the fast path.

Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/tdx.c | 46 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 1f31dead31f2..82bfdb05e67b 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1191,6 +1191,48 @@ void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
 	__vmx_deliver_posted_interrupt(vcpu, &tdx->pi_desc, vector);
 }
 
+static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
+{
+	unsigned long exit_qual;
+
+	if (kvm_is_private_gpa(vcpu->kvm, tdexit_gpa(vcpu))) {
+		/*
+		 * Always treat SEPT violations as write faults.  Ignore the
+		 * EXIT_QUALIFICATION reported by TDX-SEAM for SEPT violations.
+		 * TD private pages are always RWX in the SEPT tables,
+		 * i.e. they're always mapped writable.  Just as importantly,
+		 * treating SEPT violations as write faults is necessary to
+		 * avoid COW allocations, which will cause TDAUGPAGE failures
+		 * due to aliasing a single HPA to multiple GPAs.
+		 */
+#define TDX_SEPT_VIOLATION_EXIT_QUAL	EPT_VIOLATION_ACC_WRITE
+		exit_qual = TDX_SEPT_VIOLATION_EXIT_QUAL;
+	} else {
+		exit_qual = tdexit_exit_qual(vcpu);
+		if (exit_qual & EPT_VIOLATION_ACC_INSTR) {
+			pr_warn("kvm: TDX instr fetch to shared GPA = 0x%lx @ RIP = 0x%lx\n",
+				tdexit_gpa(vcpu), kvm_rip_read(vcpu));
+			vcpu->run->exit_reason = KVM_EXIT_EXCEPTION;
+			vcpu->run->ex.exception = PF_VECTOR;
+			vcpu->run->ex.error_code = exit_qual;
+			return 0;
+		}
+	}
+
+	trace_kvm_page_fault(tdexit_gpa(vcpu), exit_qual);
+	return __vmx_handle_ept_violation(vcpu, tdexit_gpa(vcpu), exit_qual);
+}
+
+static int tdx_handle_ept_misconfig(struct kvm_vcpu *vcpu)
+{
+	WARN_ON(1);
+
+	vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
+	vcpu->run->hw.hardware_exit_reason = EXIT_REASON_EPT_MISCONFIG;
+
+	return 0;
+}
+
 int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 {
 	union tdx_exit_reason exit_reason = to_tdx(vcpu)->exit_reason;
@@ -1209,6 +1251,10 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 	WARN_ON_ONCE(fastpath != EXIT_FASTPATH_NONE);
 
 	switch (exit_reason.basic) {
+	case EXIT_REASON_EPT_VIOLATION:
+		return tdx_handle_ept_violation(vcpu);
+	case EXIT_REASON_EPT_MISCONFIG:
+		return tdx_handle_ept_misconfig(vcpu);
 	case EXIT_REASON_OTHER_SMI:
 		/*
 		 * If we reach here, it is not a Machine Check System
-- 
2.25.1
From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 087/104] KVM: TDX: handle EXCEPTION_NMI and EXTERNAL_INTERRUPT
Date: Thu, 5 May 2022 11:15:21 -0700

From: Isaku Yamahata

Because guest TD state is protected, exceptions in guest TDs can't be
intercepted, so the TDX VMM doesn't need to handle exceptions.
tdx_handle_exit_irqoff() handles NMI and machine check: ignore NMI and
machine check and continue guest TD execution.

For external interrupts, increment the stats, the same as in the VMX case.

Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/tdx.c | 23 +++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 82bfdb05e67b..2f88871b5b86 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -851,6 +851,25 @@ void tdx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
			tdexit_intr_info(vcpu));
 }
 
+static int tdx_handle_exception(struct kvm_vcpu *vcpu)
+{
+	u32 intr_info = tdexit_intr_info(vcpu);
+
+	if (is_nmi(intr_info) || is_machine_check(intr_info))
+		return 1;
+
+	kvm_pr_unimpl("unexpected exception 0x%x(exit_reason 0x%llx qual 0x%lx)\n",
+		      intr_info,
+		      to_tdx(vcpu)->exit_reason.full, tdexit_exit_qual(vcpu));
+	return -EFAULT;
+}
+
+static int tdx_handle_external_interrupt(struct kvm_vcpu *vcpu)
+{
+	++vcpu->stat.irq_exits;
+	return 1;
+}
+
 static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu)
 {
 	vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
@@ -1251,6 +1270,10 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 	WARN_ON_ONCE(fastpath != EXIT_FASTPATH_NONE);
 
 	switch (exit_reason.basic) {
+	case EXIT_REASON_EXCEPTION_NMI:
+		return tdx_handle_exception(vcpu);
+	case EXIT_REASON_EXTERNAL_INTERRUPT:
+		return tdx_handle_external_interrupt(vcpu);
 	case EXIT_REASON_EPT_VIOLATION:
 		return tdx_handle_ept_violation(vcpu);
 	case EXIT_REASON_EPT_MISCONFIG:
-- 
2.25.1
From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 088/104] KVM: TDX: Add a placeholder for handling TDX hypercalls (TDG.VP.VMCALL)
Date: Thu, 5 May 2022 11:15:22 -0700

From: Isaku Yamahata

The TDX module specification defines the TDG.VP.VMCALL API (TDVMCALL for
short) for the guest TD to issue hypercalls to the VMM.  When the guest TD
issues a TDG.VP.VMCALL, it exits to the VMM with a new exit reason,
TDVMCALL.  The arguments from the guest TD and the values returned from
the VMM are passed in the guest registers.  The guest RCX register
indicates which registers are used.  Define helper functions to access
those registers per the ABI.

Define the TDVMCALL exit reason, which is carved out from the VMX exit
reason namespace, as a TDVMCALL exit from the TDX guest to TDX-SEAM is
really just a VM-Exit.  Add a placeholder to handle TDVMCALL exits.

Co-developed-by: Xiaoyao Li
Signed-off-by: Xiaoyao Li
Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
---
 arch/x86/include/uapi/asm/vmx.h |  4 ++-
 arch/x86/kvm/vmx/tdx.c          | 56 ++++++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/tdx.h          | 13 ++++++++
 3 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index 3d9b4598e166..cb0a0565219a 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -92,6 +92,7 @@
 #define EXIT_REASON_UMWAIT	67
 #define EXIT_REASON_TPAUSE	68
 #define EXIT_REASON_BUS_LOCK	74
+#define EXIT_REASON_TDCALL	77
 
 #define VMX_EXIT_REASONS \
 	{ EXIT_REASON_EXCEPTION_NMI, "EXCEPTION_NMI" }, \
@@ -154,7 +155,8 @@
 	{ EXIT_REASON_XRSTORS, "XRSTORS" }, \
 	{ EXIT_REASON_UMWAIT, "UMWAIT" }, \
 	{ EXIT_REASON_TPAUSE, "TPAUSE" }, \
-	{ EXIT_REASON_BUS_LOCK, "BUS_LOCK" }
+	{ EXIT_REASON_BUS_LOCK, "BUS_LOCK" }, \
+	{ EXIT_REASON_TDCALL, "TDCALL" }
 
 #define VMX_EXIT_REASON_FLAGS \
 	{ VMX_EXIT_REASONS_FAILED_VMENTRY, "FAILED_VMENTRY" }
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 2f88871b5b86..3c6cf08a2e3c 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -98,6 +98,41 @@ static __always_inline unsigned long tdexit_intr_info(struct kvm_vcpu *vcpu)
 	return kvm_r9_read(vcpu);
 }
 
+#define BUILD_TDVMCALL_ACCESSORS(param, gpr)				\
+static __always_inline							\
+unsigned long tdvmcall_##param##_read(struct kvm_vcpu *vcpu)		\
+{									\
+	return kvm_##gpr##_read(vcpu);					\
+}									\
+static __always_inline void tdvmcall_##param##_write(struct kvm_vcpu *vcpu,\
+						     unsigned long val)	\
+{									\
+	kvm_##gpr##_write(vcpu, val);					\
+}
+BUILD_TDVMCALL_ACCESSORS(a0, r12);
+BUILD_TDVMCALL_ACCESSORS(a1, r13);
+BUILD_TDVMCALL_ACCESSORS(a2, r14);
+BUILD_TDVMCALL_ACCESSORS(a3, r15);
+
+static __always_inline unsigned long tdvmcall_exit_type(struct kvm_vcpu *vcpu)
+{
+	return kvm_r10_read(vcpu);
+}
+static __always_inline unsigned long tdvmcall_leaf(struct kvm_vcpu *vcpu)
+{
+	return kvm_r11_read(vcpu);
+}
+static __always_inline void tdvmcall_set_return_code(struct kvm_vcpu *vcpu,
+						     long val)
+{
+	kvm_r10_write(vcpu, val);
+}
+static __always_inline void tdvmcall_set_return_val(struct kvm_vcpu *vcpu,
+						    unsigned long val)
+{
+	kvm_r11_write(vcpu, val);
+}
+
 static inline bool is_td_vcpu_created(struct vcpu_tdx *tdx)
 {
 	return tdx->tdvpr.added;
@@ -798,7 +833,8 @@ static noinstr void tdx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
					struct vcpu_tdx *tdx)
 {
 	guest_enter_irqoff();
-	tdx->exit_reason.full = __tdx_vcpu_run(tdx->tdvpr.pa, vcpu->arch.regs, 0);
+	tdx->exit_reason.full = __tdx_vcpu_run(tdx->tdvpr.pa, vcpu->arch.regs,
+					       tdx->tdvmcall.regs_mask);
 	guest_exit_irqoff();
 }
 
@@ -831,6 +867,11 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu)
 
 	tdx_complete_interrupts(vcpu);
 
+	if (tdx->exit_reason.basic == EXIT_REASON_TDCALL)
+		tdx->tdvmcall.rcx = vcpu->arch.regs[VCPU_REGS_RCX];
+	else
+		tdx->tdvmcall.rcx = 0;
+
 	return EXIT_FASTPATH_NONE;
 }
 
@@ -877,6 +918,17 @@ static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static int handle_tdvmcall(struct kvm_vcpu *vcpu)
+{
+	switch (tdvmcall_leaf(vcpu)) {
+	default:
+		break;
+	}
+
+	tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_INVALID_OPERAND);
+	return 1;
+}
+
 void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
 {
 	td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa & PAGE_MASK);
@@ -1274,6 +1326,8 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 		return tdx_handle_exception(vcpu);
 	case EXIT_REASON_EXTERNAL_INTERRUPT:
 		return tdx_handle_external_interrupt(vcpu);
+	case EXIT_REASON_TDCALL:
+		return handle_tdvmcall(vcpu);
 	case EXIT_REASON_EPT_VIOLATION:
 		return tdx_handle_ept_violation(vcpu);
 	case EXIT_REASON_EPT_MISCONFIG:
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 1268a49fdf18..b0bb239b51bf 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -95,6 +95,19 @@ struct vcpu_tdx {
 
 	struct list_head cpu_list;
 
+	union {
+		struct {
+			union {
+				struct {
+					u16 gpr_mask;
+					u16 xmm_mask;
+				};
+				u32 regs_mask;
+			};
+			u32 reserved;
+		};
+		u64 rcx;
+	} tdvmcall;
 	union tdx_exit_reason exit_reason;
 
 	bool initialized;
-- 
2.25.1
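To summarize the register convention the accessors above encode, here is a rough guest-side view of the TDG.VP.VMCALL register layout. This is an illustrative sketch derived from the description above, not a definition copied from the GHCI specification; the struct name is made up and is reused by the sketches after the later patches in this series:

/*
 * Illustrative TDVMCALL register layout, matching the KVM-side accessors
 * above (tdvmcall_a0..a3 map to R12..R15).  Which GPRs are actually
 * passed through to the VMM is controlled by the RCX bitmask.
 */
struct tdvmcall_regs_sketch {
	u64 rcx;	/* bitmask: GPRs exposed to the VMM */
	u64 r10;	/* in: exit type (0 = standard, non-zero = vendor);
			 * out: return code */
	u64 r11;	/* in: leaf number; out: return value */
	u64 r12;	/* a0 */
	u64 r13;	/* a1 */
	u64 r14;	/* a2 */
	u64 r15;	/* a3 */
};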
From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 089/104] KVM: TDX: handle KVM hypercall with TDG.VP.VMCALL
Date: Thu, 5 May 2022 11:15:23 -0700

From: Isaku Yamahata

The TDX Guest-Host Communication Interface (GHCI) specification defines
the ABI for the guest TD to issue hypercalls.  It reserves vendor-specific
arguments for VMM-specific use.  Use them for the KVM hypercall and handle
it.

Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/tdx.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 3c6cf08a2e3c..9c712f661a7c 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -918,8 +918,39 @@ static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static int tdx_emulate_vmcall(struct kvm_vcpu *vcpu)
+{
+	unsigned long nr, a0, a1, a2, a3, ret;
+
+	/*
+	 * ABI for the KVM tdvmcall arguments:
+	 * In the Guest-Hypervisor Communication Interface (GHCI)
+	 * specification, a non-zero leaf number (R10 != 0) is defined to
+	 * indicate vendor-specific use.  KVM uses this for the KVM
+	 * hypercall.  NOTE: KVM hypercall numbers start from one; zero is
+	 * not used as a KVM hypercall number.
+	 *
+	 * R10: KVM hypercall number
+	 * arguments: R11, R12, R13, R14.
+	 */
+	nr = kvm_r10_read(vcpu);
+	a0 = kvm_r11_read(vcpu);
+	a1 = kvm_r12_read(vcpu);
+	a2 = kvm_r13_read(vcpu);
+	a3 = kvm_r14_read(vcpu);
+
+	ret = __kvm_emulate_hypercall(vcpu, nr, a0, a1, a2, a3, true);
+
+	tdvmcall_set_return_code(vcpu, ret);
+
+	return 1;
+}
+
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
+	if (tdvmcall_exit_type(vcpu))
+		return tdx_emulate_vmcall(vcpu);
+
 	switch (tdvmcall_leaf(vcpu)) {
 	default:
 		break;
-- 
2.25.1
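From the guest side, a KVM hypercall over TDG.VP.VMCALL would therefore look roughly like the following. A sketch only: kvm_hypercall_on_tdx_sketch() and tdvmcall_raw() are hypothetical names, and struct tdvmcall_regs_sketch is the illustrative layout from the note after patch 088:

/* Hypothetical guest-side KVM hypercall over TDG.VP.VMCALL. */
static long kvm_hypercall_on_tdx_sketch(unsigned long nr, unsigned long p1,
					unsigned long p2, unsigned long p3,
					unsigned long p4)
{
	struct tdvmcall_regs_sketch r = {
		.r10 = nr,	/* non-zero: vendor (KVM) hypercall number */
		.r11 = p1,
		.r12 = p2,
		.r13 = p3,
		.r14 = p4,
	};

	tdvmcall_raw(&r);	/* hypothetical: executes TDCALL */
	return r.r10;		/* KVM's return value, per the ABI above */
}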
From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 090/104] KVM: TDX: Handle TDX PV CPUID hypercall
Date: Thu, 5 May 2022 11:15:24 -0700

From: Isaku Yamahata

Wire up the TDX PV CPUID hypercall to the KVM backend function.

Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/tdx.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 9c712f661a7c..c7cdfee397ec 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -946,12 +946,34 @@ static int tdx_emulate_vmcall(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static int tdx_emulate_cpuid(struct kvm_vcpu *vcpu)
+{
+	u32 eax, ebx, ecx, edx;
+
+	/* EAX and ECX for cpuid are stored in R12 and R13. */
+	eax = tdvmcall_a0_read(vcpu);
+	ecx = tdvmcall_a1_read(vcpu);
+
+	kvm_cpuid(vcpu, &eax, &ebx, &ecx, &edx, true);
+
+	tdvmcall_a0_write(vcpu, eax);
+	tdvmcall_a1_write(vcpu, ebx);
+	tdvmcall_a2_write(vcpu, ecx);
+	tdvmcall_a3_write(vcpu, edx);
+
+	tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS);
+
+	return 1;
+}
+
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
 	if (tdvmcall_exit_type(vcpu))
 		return tdx_emulate_vmcall(vcpu);
 
 	switch (tdvmcall_leaf(vcpu)) {
+	case EXIT_REASON_CPUID:
+		return tdx_emulate_cpuid(vcpu);
 	default:
 		break;
 	}
-- 
2.25.1
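For reference, the register flow of the PV CPUID hypercall handled above, seen from the guest. A sketch under stated assumptions: tdvmcall_raw() is hypothetical, the leaf number 10 is EXIT_REASON_CPUID reused as the TDVMCALL leaf, and struct tdvmcall_regs_sketch is the illustrative layout introduced earlier:

static void cpuid_via_tdvmcall_sketch(u32 leaf, u32 subleaf,
				      u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
{
	struct tdvmcall_regs_sketch r = {
		.r11 = 10,	/* EXIT_REASON_CPUID as the TDVMCALL leaf */
		.r12 = leaf,	/* CPUID EAX input */
		.r13 = subleaf,	/* CPUID ECX input */
	};

	tdvmcall_raw(&r);	/* hypothetical TDCALL wrapper */

	*eax = r.r12;		/* the handler writes EAX..EDX to R12..R15 */
	*ebx = r.r13;
	*ecx = r.r14;
	*edx = r.r15;
}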
From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 091/104] KVM: TDX: Handle TDX PV HLT hypercall
Date: Thu, 5 May 2022 11:15:25 -0700

From: Isaku Yamahata

Wire up the TDX PV HLT hypercall to the KVM backend function.

Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/tdx.c | 42 +++++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/tdx.h |  3 +++
 2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index c7cdfee397ec..631917eb873e 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -653,7 +653,32 @@ void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu)
 {
-	return pi_has_pending_interrupt(vcpu);
+	bool ret = pi_has_pending_interrupt(vcpu);
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+	if (ret || vcpu->arch.mp_state != KVM_MP_STATE_HALTED)
+		return true;
+
+	if (tdx->interrupt_disabled_hlt)
+		return false;
+
+	/*
+	 * This is for the case where the virtual interrupt is recognized,
+	 * i.e. set in vmcs.RVI, between the STI and "HLT".  KVM doesn't have
+	 * access to RVI and the interrupt is no longer in the PID (because
+	 * it was "recognized").  It doesn't get delivered in the guest
+	 * because the TDCALL completes before interrupts are enabled.
+	 *
+	 * The TDX module sets RVI while in an STI interrupt shadow.
+	 * - TDExit (typically TDG.VP.VMCALL) from the guest to the TDX
+	 *   module.  The interrupt shadow at this point is gone.
+	 * - It knows that there is an interrupt that can be delivered
+	 *   (RVI > PPR && EFLAGS.IF=1; the other conditions of 29.2.2 don't
+	 *   matter).
+	 * - It forwards the TDExit nevertheless, to a clueless hypervisor
+	 *   that has no way to glean either RVI or PPR.
+	 */
+	return !!xchg(&tdx->buggy_hlt_workaround, 0);
 }
 
 void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
@@ -966,6 +991,17 @@ static int tdx_emulate_cpuid(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static int tdx_emulate_hlt(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+	/* See tdx_protected_apic_has_interrupt() to avoid a heavy seamcall. */
+	tdx->interrupt_disabled_hlt = tdvmcall_a0_read(vcpu);
+
+	tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS);
+	return kvm_emulate_halt_noskip(vcpu);
+}
+
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
 	if (tdvmcall_exit_type(vcpu))
@@ -974,6 +1010,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 	switch (tdvmcall_leaf(vcpu)) {
 	case EXIT_REASON_CPUID:
 		return tdx_emulate_cpuid(vcpu);
+	case EXIT_REASON_HLT:
+		return tdx_emulate_hlt(vcpu);
 	default:
 		break;
 	}
@@ -1311,6 +1349,8 @@ void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
 	struct kvm_vcpu *vcpu = apic->vcpu;
 	struct vcpu_tdx *tdx = to_tdx(vcpu);
 
+	/* See comment in tdx_protected_apic_has_interrupt(). */
+	tdx->buggy_hlt_workaround = 1;
 	/* TDX supports only posted interrupt.  No lapic emulation. */
 	__vmx_deliver_posted_interrupt(vcpu, &tdx->pi_desc, vector);
 }
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index b0bb239b51bf..a456ca6ec187 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -116,6 +116,9 @@ struct vcpu_tdx {
 	bool host_state_need_restore;
 	u64 msr_host_kernel_gs_base;
 
+	bool interrupt_disabled_hlt;
+	unsigned int buggy_hlt_workaround;
+
 	/*
	 * Dummy to make pmu_intel not corrupt memory.
	 * TODO: Support PMU for TDX.  Future work.
-- 
2.25.1
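The race that tdx_protected_apic_has_interrupt() guards against can be visualized from the guest's idle path. A sketch only, with hypothetical helper names; the real guest code is outside this series:

/* Guest idle path that could lose a wakeup without the workaround above. */
static void tdx_guest_safe_halt_sketch(void)
{
	local_irq_enable();	/* STI; its interrupt shadow covers the TDCALL */
	/*
	 * An interrupt can be recognized (set in RVI) right here, inside
	 * the STI shadow.  KVM cannot read RVI, and the interrupt is no
	 * longer pending in the PID, so without buggy_hlt_workaround the
	 * halted vCPU would never be woken.
	 */
	tdvmcall_hlt(0);	/* hypothetical: R12 = 0, interrupts enabled */
}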
From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 092/104] KVM: TDX: Handle TDX PV port io hypercall
Date: Thu, 5 May 2022 11:15:26 -0700

From: Isaku Yamahata

Wire up the TDX PV port IO hypercall to the KVM backend function.

Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/tdx.c | 57 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 631917eb873e..ee0cf5336ade 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1002,6 +1002,61 @@ static int tdx_emulate_hlt(struct kvm_vcpu *vcpu)
 	return kvm_emulate_halt_noskip(vcpu);
 }
 
+static int tdx_complete_pio_in(struct kvm_vcpu *vcpu)
+{
+	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
+	unsigned long val = 0;
+	int ret;
+
+	WARN_ON(vcpu->arch.pio.count != 1);
+
+	ret = ctxt->ops->pio_in_emulated(ctxt, vcpu->arch.pio.size,
+					 vcpu->arch.pio.port, &val, 1);
+	WARN_ON(!ret);
+
+	tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS);
+	tdvmcall_set_return_val(vcpu, val);
+
+	return 1;
+}
+
+static int tdx_emulate_io(struct kvm_vcpu *vcpu)
+{
+	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
+	unsigned long val = 0;
+	unsigned int port;
+	int size, ret;
+	bool write;
+
+	++vcpu->stat.io_exits;
+
+	size = tdvmcall_a0_read(vcpu);
+	write = tdvmcall_a1_read(vcpu);
+	port = tdvmcall_a2_read(vcpu);
+
+	if (size != 1 && size != 2 && size != 4) {
+		tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_INVALID_OPERAND);
+		return 1;
+	}
+
+	if (write) {
+		val = tdvmcall_a3_read(vcpu);
+		ret = ctxt->ops->pio_out_emulated(ctxt, size, port, &val, 1);
+
+		/* No need for a complete_userspace_io callback. */
+		vcpu->arch.pio.count = 0;
+	} else {
+		ret = ctxt->ops->pio_in_emulated(ctxt, size, port, &val, 1);
+		if (!ret)
+			vcpu->arch.complete_userspace_io = tdx_complete_pio_in;
+		else
+			tdvmcall_set_return_val(vcpu, val);
+	}
+	if (ret)
+		tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS);
+	return ret;
+}
+
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
 	if (tdvmcall_exit_type(vcpu))
@@ -1012,6 +1067,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 		return tdx_emulate_cpuid(vcpu);
 	case EXIT_REASON_HLT:
 		return tdx_emulate_hlt(vcpu);
+	case EXIT_REASON_IO_INSTRUCTION:
+		return tdx_emulate_io(vcpu);
 	default:
 		break;
 	}
-- 
2.25.1
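A guest-side view of the PV port IO ABI handled above, for a one-byte write. A sketch: tdvmcall_raw() and the struct are the hypothetical helpers from the earlier notes; the leaf number 30 is EXIT_REASON_IO_INSTRUCTION reused as the TDVMCALL leaf:

static void outb_via_tdvmcall_sketch(u8 value, u16 port)
{
	struct tdvmcall_regs_sketch r = {
		.r11 = 30,	/* EXIT_REASON_IO_INSTRUCTION as the leaf */
		.r12 = 1,	/* access size in bytes */
		.r13 = 1,	/* direction: 1 = write, 0 = read */
		.r14 = port,
		.r15 = value,	/* data to write */
	};

	tdvmcall_raw(&r);	/* hypothetical TDCALL wrapper */
	/* for a read, the handler returns the value in R11 */
}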
From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 093/104] KVM: TDX: Handle TDX PV MMIO hypercall
Date: Thu, 5 May 2022 11:15:27 -0700

From: Sean Christopherson

Export kvm_io_bus_read and the kvm_mmio tracepoint, and wire up the TDX PV
MMIO hypercall to the KVM backend functions.

kvm_io_bus_read/write() searches the KVM devices emulated in the kernel
for the given MMIO address and emulates the access.  As TDX PV MMIO also
needs this, export kvm_io_bus_read(); kvm_io_bus_write() is already
exported.  TDX PV MMIO emulates some MMIO itself; to add trace points
consistently with x86 KVM, export the kvm_mmio tracepoint.

Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/tdx.c | 114 +++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c     |   1 +
 virt/kvm/kvm_main.c    |   2 +
 3 files changed, 117 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index ee0cf5336ade..6ab4a52fc9e9 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1057,6 +1057,118 @@ static int tdx_emulate_io(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
+static int tdx_complete_mmio(struct kvm_vcpu *vcpu)
+{
+	unsigned long val = 0;
+	gpa_t gpa;
+	int size;
+
+	WARN_ON(vcpu->mmio_needed != 1);
+	vcpu->mmio_needed = 0;
+
+	if (!vcpu->mmio_is_write) {
+		gpa = vcpu->mmio_fragments[0].gpa;
+		size = vcpu->mmio_fragments[0].len;
+
+		memcpy(&val, vcpu->run->mmio.data, size);
+		tdvmcall_set_return_val(vcpu, val);
+		trace_kvm_mmio(KVM_TRACE_MMIO_READ, size, gpa, &val);
+	}
+	return 1;
+}
+
+static inline int tdx_mmio_write(struct kvm_vcpu *vcpu, gpa_t gpa, int size,
+				 unsigned long val)
+{
+	if (kvm_iodevice_write(vcpu, &vcpu->arch.apic->dev, gpa, size, &val) &&
+	    kvm_io_bus_write(vcpu, KVM_MMIO_BUS, gpa, size, &val))
+		return -EOPNOTSUPP;
+
+	trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, size, gpa, &val);
+	return 0;
+}
+
+static inline int tdx_mmio_read(struct kvm_vcpu *vcpu, gpa_t gpa, int size)
+{
+	unsigned long val;
+
+	if (kvm_iodevice_read(vcpu, &vcpu->arch.apic->dev, gpa, size, &val) &&
+	    kvm_io_bus_read(vcpu, KVM_MMIO_BUS, gpa, size, &val))
+		return -EOPNOTSUPP;
+
+	tdvmcall_set_return_val(vcpu, val);
+	trace_kvm_mmio(KVM_TRACE_MMIO_READ, size, gpa, &val);
+	return 0;
+}
+
+static int tdx_emulate_mmio(struct kvm_vcpu *vcpu)
+{
+	struct kvm_memory_slot *slot;
+	int size, write, r;
+	unsigned long val;
+	gpa_t gpa;
+
+	WARN_ON(vcpu->mmio_needed);
+
+	size = tdvmcall_a0_read(vcpu);
+	write = tdvmcall_a1_read(vcpu);
+	gpa = tdvmcall_a2_read(vcpu);
+	val = write ? tdvmcall_a3_read(vcpu) : 0;
+
+	if (size != 1 && size != 2 && size != 4 && size != 8)
+		goto error;
+	if (write != 0 && write != 1)
+		goto error;
+
+	/* Strip the shared bit, allow MMIO with and without it set. */
+	gpa = gpa & ~gfn_to_gpa(kvm_gfn_shared_mask(vcpu->kvm));
+
+	if (size > 8u || ((gpa + size - 1) ^ gpa) & PAGE_MASK)
+		goto error;
+
+	slot = kvm_vcpu_gfn_to_memslot(vcpu, gpa_to_gfn(gpa));
+	if (slot && !(slot->flags & KVM_MEMSLOT_INVALID))
+		goto error;
+
+	if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
+		trace_kvm_fast_mmio(gpa);
+		return 1;
+	}
+
+	if (write)
+		r = tdx_mmio_write(vcpu, gpa, size, val);
+	else
+		r = tdx_mmio_read(vcpu, gpa, size);
+	if (!r) {
+		/* Kernel completed device emulation. */
+		tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS);
+		return 1;
+	}
+
+	/* Request the device emulation from the userspace device model. */
+	vcpu->mmio_needed = 1;
+	vcpu->mmio_is_write = write;
+	vcpu->arch.complete_userspace_io = tdx_complete_mmio;
+
+	vcpu->run->mmio.phys_addr = gpa;
+	vcpu->run->mmio.len = size;
+	vcpu->run->mmio.is_write = write;
+	vcpu->run->exit_reason = KVM_EXIT_MMIO;
+
+	if (write) {
+		memcpy(vcpu->run->mmio.data, &val, size);
+	} else {
+		vcpu->mmio_fragments[0].gpa = gpa;
+		vcpu->mmio_fragments[0].len = size;
+		trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, size, gpa, NULL);
+	}
+	return 0;
+
+error:
+	tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_INVALID_OPERAND);
+	return 1;
+}
+
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
 	if (tdvmcall_exit_type(vcpu))
@@ -1069,6 +1181,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 		return tdx_emulate_hlt(vcpu);
 	case EXIT_REASON_IO_INSTRUCTION:
 		return tdx_emulate_io(vcpu);
+	case EXIT_REASON_EPT_VIOLATION:
+		return tdx_emulate_mmio(vcpu);
 	default:
 		break;
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5f291470a6f6..f367d0dcef97 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13166,6 +13166,7 @@ bool kvm_arch_dirty_log_supported(struct kvm *kvm)
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4bf7178e42bd..7f01131666de 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2294,6 +2294,7 @@ struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn
 
 	return NULL;
 }
+EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_memslot);
 
 bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn)
 {
@@ -5169,6 +5170,7 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
 	r = __kvm_io_bus_read(vcpu, bus, &range, val);
 	return r < 0 ? r : 0;
 }
+EXPORT_SYMBOL_GPL(kvm_io_bus_read);
 
 /* Caller must hold slots_lock. */
 int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
-- 
2.25.1
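The guest-side counterpart of the PV MMIO hypercall handled above, for a 64-bit read. A sketch reusing the hypothetical helpers from the earlier notes; the leaf number 48 is EXIT_REASON_EPT_VIOLATION reused as the TDVMCALL leaf, and shared_gpa is assumed to be a shared (non-private) guest physical address:

static u64 mmio_read64_via_tdvmcall_sketch(u64 shared_gpa)
{
	struct tdvmcall_regs_sketch r = {
		.r11 = 48,	/* EXIT_REASON_EPT_VIOLATION as the leaf */
		.r12 = 8,	/* access size in bytes */
		.r13 = 0,	/* direction: 0 = read, 1 = write */
		.r14 = shared_gpa,
	};

	tdvmcall_raw(&r);	/* hypothetical TDCALL wrapper */
	return r.r11;		/* the read value comes back in R11 */
}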
From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 094/104] KVM: TDX: Implement callbacks for MSR operations for TDX
Date: Thu, 5 May 2022 11:15:28 -0700

From: Isaku Yamahata

Implement the set_msr/get_msr/has_emulated_msr methods for TDX to handle
hypercalls from the guest TD for paravirtualized rdmsr and wrmsr.

The TDX module virtualizes MSRs.  For some MSRs, it injects #VE into the
guest TD upon RDMSR or WRMSR; the exact list of such MSRs is defined in
the spec.  Upon #VE, the guest TD may execute the corresponding
TDG.VP.VMCALL hypercalls for RDMSR and WRMSR, which are defined in the
GHCI (Guest-Host Communication Interface), so that the host VMM
(e.g. KVM) can virtualize the MSRs.

Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/main.c    | 34 +++++++++++++++++--
 arch/x86/kvm/vmx/tdx.c     | 68 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/x86_ops.h |  6 ++++
 3 files changed, 105 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 95b7a90aa0d7..dec9689afab2 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -205,6 +205,34 @@ static void vt_handle_exit_irqoff(struct kvm_vcpu *vcpu)
 	vmx_handle_exit_irqoff(vcpu);
 }
 
+static int vt_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+	if (unlikely(is_td_vcpu(vcpu)))
+		return tdx_set_msr(vcpu, msr_info);
+
+	return vmx_set_msr(vcpu, msr_info);
+}
+
+/*
+ * The kvm parameter can be NULL (module initialization, or invocation
+ * before VM creation).  Be sure to check the kvm parameter before using
+ * it.
+ */
+static bool vt_has_emulated_msr(struct kvm *kvm, u32 index)
+{
+	if (kvm && is_td(kvm))
+		return tdx_is_emulated_msr(index, true);
+
+	return vmx_has_emulated_msr(kvm, index);
+}
+
+static int vt_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+	if (unlikely(is_td_vcpu(vcpu)))
+		return tdx_get_msr(vcpu, msr_info);
+
+	return vmx_get_msr(vcpu, msr_info);
+}
+
 static void vt_apicv_post_state_restore(struct kvm_vcpu *vcpu)
 {
 	struct pi_desc *pi = vcpu_to_pi_desc(vcpu);
@@ -424,7 +452,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 
 	.hardware_enable = vt_hardware_enable,
 	.hardware_disable = vt_hardware_disable,
-	.has_emulated_msr = vmx_has_emulated_msr,
+	.has_emulated_msr = vt_has_emulated_msr,
 
 	.is_vm_type_supported = vt_is_vm_type_supported,
 	.vm_size = sizeof(struct kvm_vmx),
@@ -444,8 +472,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 
 	.update_exception_bitmap = vmx_update_exception_bitmap,
 	.get_msr_feature = vmx_get_msr_feature,
-	.get_msr = vmx_get_msr,
-	.set_msr = vmx_set_msr,
+	.get_msr = vt_get_msr,
+	.set_msr = vt_set_msr,
 	.get_segment_base = vmx_get_segment_base,
 	.get_segment = vmx_get_segment,
 	.set_segment = vmx_set_segment,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 6ab4a52fc9e9..f46825843a8b 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1627,6 +1627,74 @@ void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
 	*error_code = 0;
 }
 
+bool tdx_is_emulated_msr(u32 index, bool write)
+{
+	switch (index) {
+	case MSR_IA32_UCODE_REV:
+	case MSR_IA32_ARCH_CAPABILITIES:
+	case MSR_IA32_POWER_CTL:
+	case MSR_MTRRcap:
+	case 0x200 ... 0x26f:
+		/* IA32_MTRR_PHYS{BASE, MASK}, IA32_MTRR_FIX*_* */
+	case MSR_IA32_CR_PAT:
+	case MSR_MTRRdefType:
+	case MSR_IA32_TSC_DEADLINE:
+	case MSR_IA32_MISC_ENABLE:
+	case MSR_KVM_STEAL_TIME:
+	case MSR_KVM_POLL_CONTROL:
+	case MSR_PLATFORM_INFO:
+	case MSR_MISC_FEATURES_ENABLES:
+	case MSR_IA32_MCG_CAP:
+	case MSR_IA32_MCG_STATUS:
+	case MSR_IA32_MCG_CTL:
+	case MSR_IA32_MCG_EXT_CTL:
+	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_MISC(28) - 1:
+		/* MSR_IA32_MCx_{CTL, STATUS, ADDR, MISC} */
+		return true;
+	case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff:
+		/*
+		 * x2APIC registers that are virtualized by the CPU can't be
+		 * emulated, KVM doesn't have access to the virtual APIC page.
+		 */
+		switch (index) {
+		case X2APIC_MSR(APIC_TASKPRI):
+		case X2APIC_MSR(APIC_PROCPRI):
+		case X2APIC_MSR(APIC_EOI):
+		case X2APIC_MSR(APIC_ISR) ... X2APIC_MSR(APIC_ISR + APIC_ISR_NR):
+		case X2APIC_MSR(APIC_TMR) ... X2APIC_MSR(APIC_TMR + APIC_ISR_NR):
+		case X2APIC_MSR(APIC_IRR) ... X2APIC_MSR(APIC_IRR + APIC_ISR_NR):
+			return false;
+		default:
+			return true;
+		}
+	case MSR_IA32_APICBASE:
+	case MSR_EFER:
+		return !write;
+	case MSR_IA32_MCx_CTL2(0) ... MSR_IA32_MCx_CTL2(31):
+		/*
+		 * 0x280 - 0x29f: The x86 common code doesn't emulate MCx_CTL2.
+		 * Refer to kvm_{get,set}_msr_common(),
+		 * kvm_mtrr_{get, set}_msr(), and msr_mtrr_valid().
+		 */
+	default:
+		return false;
+	}
+}
+
+int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
+{
+	if (tdx_is_emulated_msr(msr->index, false))
+		return kvm_get_msr_common(vcpu, msr);
+	return 1;
+}
+
+int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
+{
+	if (tdx_is_emulated_msr(msr->index, true))
+		return kvm_set_msr_common(vcpu, msr);
+	return 1;
+}
+
 int tdx_dev_ioctl(void __user *argp)
 {
 	struct kvm_tdx_capabilities __user *user_caps;
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 53cf6d5a72a1..64e7da448906 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -156,6 +156,9 @@ void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
 void tdx_inject_nmi(struct kvm_vcpu *vcpu);
 void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1,
		       u64 *info2, u32 *intr_info, u32 *error_code);
+bool tdx_is_emulated_msr(u32 index, bool write);
+int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr);
+int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr);
 
 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
@@ -193,6 +196,9 @@ static inline void tdx_inject_nmi(struct kvm_vcpu *vcpu) {}
 static inline void tdx_get_exit_info(
	struct kvm_vcpu *vcpu, u32 *reason, u64 *info1, u64 *info2,
	u32 *intr_info, u32 *error_code) {}
+static inline bool tdx_is_emulated_msr(u32 index, bool write) { return false; }
+static inline int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { return 1; }
+static inline int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { return 1; }
 
 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
-- 
2.25.1
From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 095/104] KVM: TDX: Handle TDX PV rdmsr/wrmsr hypercall
Date: Thu, 5 May 2022 11:15:29 -0700

From: Isaku Yamahata

Wire up the TDX PV rdmsr/wrmsr hypercalls to the KVM backend functions.

Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/tdx.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index f46825843a8b..1518a8c310d6 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1169,6 +1169,39 @@ static int tdx_emulate_mmio(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static int tdx_emulate_rdmsr(struct kvm_vcpu *vcpu)
+{
+	u32 index = tdvmcall_a0_read(vcpu);
+	u64 data;
+
+	if (kvm_get_msr(vcpu, index, &data)) {
+		trace_kvm_msr_read_ex(index);
+		tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_INVALID_OPERAND);
+		return 1;
+	}
+	trace_kvm_msr_read(index, data);
+
+	tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS);
+	tdvmcall_set_return_val(vcpu, data);
+	return 1;
+}
+
+static int tdx_emulate_wrmsr(struct kvm_vcpu *vcpu)
+{
+	u32 index = tdvmcall_a0_read(vcpu);
+	u64 data = tdvmcall_a1_read(vcpu);
+
+	if (kvm_set_msr(vcpu, index, data)) {
+		trace_kvm_msr_write_ex(index, data);
+		tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_INVALID_OPERAND);
+		return 1;
+	}
+
+	trace_kvm_msr_write(index, data);
+	tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS);
+	return 1;
+}
+
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
 	if (tdvmcall_exit_type(vcpu))
@@ -1183,6 +1216,10 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 		return tdx_emulate_io(vcpu);
 	case EXIT_REASON_EPT_VIOLATION:
 		return tdx_emulate_mmio(vcpu);
+	case EXIT_REASON_MSR_READ:
+		return tdx_emulate_rdmsr(vcpu);
+	case EXIT_REASON_MSR_WRITE:
+		return tdx_emulate_wrmsr(vcpu);
 	default:
 		break;
 	}
-- 
2.25.1
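A guest-side view of the PV rdmsr path handled above, again reusing the hypothetical helpers from the earlier notes. The leaf numbers 31 and 32 are EXIT_REASON_MSR_READ and EXIT_REASON_MSR_WRITE reused as the TDVMCALL leaves:

static u64 rdmsr_via_tdvmcall_sketch(u32 msr)
{
	struct tdvmcall_regs_sketch r = {
		.r11 = 31,	/* EXIT_REASON_MSR_READ as the leaf */
		.r12 = msr,	/* MSR index */
	};

	tdvmcall_raw(&r);	/* hypothetical TDCALL wrapper */
	/* r.r10 holds the return code; the MSR value comes back in R11 */
	return r.r11;
}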
from lindbergh.monkeyblade.net ([23.128.96.19]:36860 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383273AbiEESTo (ORCPT ); Thu, 5 May 2022 14:19:44 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EAC605DA6D; Thu, 5 May 2022 11:15:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774558; x=1683310558; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=5gKEBATWdi7brsOaUqd9Sxn2b02OlGwx2H6cEqgaWQU=; b=Qe02oA3Hxdk5tamuLi064gGfQzUVVXL4ZArhXrF1I2/jaJNYw9WKJ24w 266q5VAO5suDwwIZYP0rfHa0QyzKEm5ati93odo8WwAXgnBC2zT6rY8+T MHzuSakX0/wiCetypb+jQ7ruQY8ptR9b9KIRlSbYdg5u1t088uqrYbmS6 yAMrbdABgE/mEwBnn4nTyZ0kCBk7n77VyAjb8CDdS6rsTlNRqdOWkRY7k RMIoTWXUkmQz6TTR3cb19RSYtCndnZEWZrwRgcl7h26E9W654dQe9J9Uu kNs0RmhoCjydCc4P7igEp3+SBWvIVw+eujJpHXDGsjOdGnstrqikqlpDy g==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="268097119" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="268097119" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:55 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083491" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:55 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 096/104] KVM: TDX: Handle TDX PV report fatal error hypercall Date: Thu, 5 May 2022 11:15:30 -0700 Message-Id: <6bb829189d8d94c64fb42db2d84f7519b7d29359.1651774251.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Wire up TDX PV report fatal error hypercall to KVM_SYSTEM_EVENT_CRASH KVM exit event. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/tdx.c | 21 +++++++++++++++++++++ include/uapi/linux/kvm.h | 1 + 2 files changed, 22 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 1518a8c310d6..ee83539d5228 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1202,6 +1202,25 @@ static int tdx_emulate_wrmsr(struct kvm_vcpu *vcpu) return 1; } =20 +static int tdx_report_fatal_error(struct kvm_vcpu *vcpu) +{ + /* + * Exit to userspace device model for teardown. + * Because guest TD is already panicing, returning an error to guerst TD + * doesn't make sense. No argument check is done. 
+	 */
+
+	vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+	vcpu->run->system_event.type =
+		KVM_SYSTEM_EVENT_TDX | KVM_SYSTEM_EVENT_NDATA_VALID;
+	vcpu->run->system_event.ndata = 3;
+	vcpu->run->system_event.data[0] = TDG_VP_VMCALL_REPORT_FATAL_ERROR;
+	vcpu->run->system_event.data[1] = tdvmcall_a0_read(vcpu);
+	vcpu->run->system_event.data[2] = tdvmcall_a1_read(vcpu);
+
+	return 0;
+}
+
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
 	if (tdvmcall_exit_type(vcpu))
@@ -1220,6 +1239,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 		return tdx_emulate_rdmsr(vcpu);
 	case EXIT_REASON_MSR_WRITE:
 		return tdx_emulate_wrmsr(vcpu);
+	case TDG_VP_VMCALL_REPORT_FATAL_ERROR:
+		return tdx_report_fatal_error(vcpu);
 	default:
 		break;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 9a3fd7b41fc5..df1b89ffdac6 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -445,6 +445,7 @@ struct kvm_run {
 #define KVM_SYSTEM_EVENT_RESET		2
 #define KVM_SYSTEM_EVENT_CRASH		3
 #define KVM_SYSTEM_EVENT_SEV_TERM	4
+#define KVM_SYSTEM_EVENT_TDX		5
 #define KVM_SYSTEM_EVENT_NDATA_VALID	(1u << 31)
 	__u32 type;
 	__u32 ndata;
--
2.25.1

From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 097/104] KVM: TDX: Handle TDX PV map_gpa hypercall
Date: Thu, 5 May 2022 11:15:31 -0700
From: Isaku Yamahata

Wire up the TDX PV map_gpa hypercall to the kvm/mmu backend.

Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/tdx.c | 60 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index ee83539d5228..d5bb5f1cbd21 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1221,6 +1221,64 @@ static int tdx_report_fatal_error(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static int tdx_map_gpa(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	gpa_t gpa = tdvmcall_a0_read(vcpu);
+	gpa_t size = tdvmcall_a1_read(vcpu);
+	gpa_t end = gpa + size;
+	bool allow_private = kvm_is_private_gpa(kvm, gpa);
+
+	tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_INVALID_OPERAND);
+	if (!IS_ALIGNED(gpa, 4096) || !IS_ALIGNED(size, 4096) ||
+	    end < gpa ||
+	    end > kvm_gfn_shared_mask(kvm) << (PAGE_SHIFT + 1) ||
+	    kvm_is_private_gpa(kvm, gpa) != kvm_is_private_gpa(kvm, end))
+		return 1;
+
+	tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS);
+
+#define TDX_MAP_GPA_SIZE_MAX	(16 * 1024 * 1024)
+	while (gpa < end) {
+		gfn_t s = gpa_to_gfn(gpa);
+		gfn_t e = gpa_to_gfn(
+			min(roundup(gpa + 1, TDX_MAP_GPA_SIZE_MAX), end));
+		int ret = kvm_mmu_map_gpa(vcpu, &s, e, allow_private);
+
+		if (ret == -EAGAIN)
+			e = s;
+		else if (ret) {
+			tdvmcall_set_return_code(vcpu,
+						 TDG_VP_VMCALL_INVALID_OPERAND);
+			break;
+		}
+
+		gpa = gfn_to_gpa(e);
+
+		/*
+		 * TODO:
+		 * Interrupt this hypercall invocation to return the remaining
+		 * region to the guest and let the guest resume the
+		 * hypercall.
+		 *
+		 * The TDX Guest-Hypervisor Communication Interface (GHCI)
+		 * specification and guest implementation need to be updated.
+		 *
+		 * if (gpa < end && need_resched()) {
+		 *	size = end - gpa;
+		 *	tdvmcall_a0_write(vcpu, gpa);
+		 *	tdvmcall_a1_write(vcpu, size);
+		 *	tdvmcall_set_return_code(vcpu,
+		 *				 TDG_VP_VMCALL_INTERRUPTED_RESUME);
+		 *	break;
+		 * }
+		 */
+		if (gpa < end && need_resched())
+			cond_resched();
+	}
+
+	return 1;
+}
+
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
 	if (tdvmcall_exit_type(vcpu))
@@ -1241,6 +1299,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 		return tdx_emulate_wrmsr(vcpu);
 	case TDG_VP_VMCALL_REPORT_FATAL_ERROR:
 		return tdx_report_fatal_error(vcpu);
+	case TDG_VP_VMCALL_MAP_GPA:
+		return tdx_map_gpa(vcpu);
 	default:
 		break;
 	}
--
2.25.1
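For context, MapGPA is the hypercall a TDX guest uses to convert a GPA range between private and shared, and it is what lands in tdx_map_gpa() above. A minimal sketch of the guest-side call follows; the leaf number 0x10001 (TDVMCALL_MAP_GPA) comes from the GHCI specification, while the __tdx_hypercall() helper and function name are assumptions modeled on the Linux TDX guest code, not part of this patch.

	/* Sketch: ask the VMM to remap [gpa, gpa + size) as shared or private. */
	static bool tdx_guest_map_gpa(u64 gpa, u64 size)
	{
		struct tdx_hypercall_args args = {
			.r10 = TDX_HYPERCALL_STANDARD,	/* standard TDVMCALL */
			.r11 = 0x10001,			/* TDVMCALL_MAP_GPA (GHCI) */
			/*
			 * a0: start GPA. The shared bit (GPA bit 51 or 47) set
			 * means shared, clear means private, matching
			 * kvm_is_private_gpa() on the KVM side.
			 */
			.r12 = gpa,
			.r13 = size,			/* a1: length, 4KB aligned */
		};

		/* The R10 output is the TDG.VP.VMCALL status; 0 is success. */
		return __tdx_hypercall(&args, 0) == 0;
	}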
From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 098/104] KVM: TDX: Handle TDG.VP.VMCALL hypercall
Date: Thu, 5 May 2022 11:15:32 -0700

From: Isaku Yamahata

Implement the TDG.VP.VMCALL<GetTdVmCallInfo> hypercall, a sub-leaf of TDG.VP.VMCALL that enumerates which TDG.VP.VMCALL sub-leaves are supported. If the input value is zero, return the success code and zero in the output registers. This hypercall exists for future enhancement of the Guest-Hypervisor Communication Interface (GHCI) specification. GHCI version 344426-001US defines it to require input R12 to be zero and to return zero in the output registers R11, R12, R13, and R14, so the guest TD enumerates no enhancements.

Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/tdx.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index d5bb5f1cbd21..4618934700cc 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1202,6 +1202,20 @@ static int tdx_emulate_wrmsr(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static int tdx_get_td_vm_call_info(struct kvm_vcpu *vcpu)
+{
+	if (tdvmcall_a0_read(vcpu))
+		tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_INVALID_OPERAND);
+	else {
+		tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS);
+		kvm_r11_write(vcpu, 0);
+		tdvmcall_a0_write(vcpu, 0);
+		tdvmcall_a1_write(vcpu, 0);
+		tdvmcall_a2_write(vcpu, 0);
+	}
+	return 1;
+}
+
 static int tdx_report_fatal_error(struct kvm_vcpu *vcpu)
 {
 	/*
@@ -1297,6 +1311,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 		return tdx_emulate_rdmsr(vcpu);
 	case EXIT_REASON_MSR_WRITE:
 		return tdx_emulate_wrmsr(vcpu);
+	case TDG_VP_VMCALL_GET_TD_VM_CALL_INFO:
+		return tdx_get_td_vm_call_info(vcpu);
 	case TDG_VP_VMCALL_REPORT_FATAL_ERROR:
 		return tdx_report_fatal_error(vcpu);
 	case TDG_VP_VMCALL_MAP_GPA:
--
2.25.1

From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 099/104] KVM: TDX: Silently discard SMI request
Date: Thu, 5 May 2022 11:15:33 -0700
Message-Id: <05e71e67d894d656a2ebe54cd5c1b0e206628d93.1651774251.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

TDX doesn't support system-management mode (SMM) or the system-management interrupt (SMI) in guest TDs. Because guest state (vCPU state, memory state) is protected, any change to it, such as injecting an SMI or switching a vCPU into SMM, would have to go through the TDX module APIs, and the TDX module provides no way for the VMM to inject an SMI into a guest TD or to switch a guest vCPU into SMM. KVM has two options when the guest TD or the device model (e.g. QEMU) raises an SMI: 1) silently ignore the request, or 2) return a meaningful error. For simplicity, option 1) is implemented.

Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/lapic.c       |  7 +++++--
 arch/x86/kvm/vmx/main.c    | 43 ++++++++++++++++++++++++++++++++++----
 arch/x86/kvm/vmx/tdx.c     | 27 ++++++++++++++++++++++++
 arch/x86/kvm/vmx/x86_ops.h |  8 +++++++
 arch/x86/kvm/x86.c         |  3 ++-
 5 files changed, 81 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index f8e190da769f..bc329c4488a9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1146,8 +1146,11 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 
 	case APIC_DM_SMI:
 		result = 1;
-		kvm_make_request(KVM_REQ_SMI, vcpu);
-		kvm_vcpu_kick(vcpu);
+		if (static_call(kvm_x86_has_emulated_msr)(vcpu->kvm,
+							  MSR_IA32_SMBASE)) {
+			kvm_make_request(KVM_REQ_SMI, vcpu);
+			kvm_vcpu_kick(vcpu);
+		}
 		break;
 
 	case APIC_DM_NMI:
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index dec9689afab2..b8d0b875d8d9 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -233,6 +233,41 @@ static int vt_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	return vmx_get_msr(vcpu, msr_info);
 }
 
+static int vt_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+{
+	if (is_td_vcpu(vcpu))
+		return tdx_smi_allowed(vcpu, for_injection);
+
+	return vmx_smi_allowed(vcpu, for_injection);
+}
+
+static int vt_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
+{
+	if (unlikely(is_td_vcpu(vcpu)))
+		return tdx_enter_smm(vcpu, smstate);
+
+	return vmx_enter_smm(vcpu, smstate);
+}
+
+static int vt_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
+{
+	if (unlikely(is_td_vcpu(vcpu)))
+		return tdx_leave_smm(vcpu, smstate);
+
+	return vmx_leave_smm(vcpu, smstate);
+}
+
+static void vt_enable_smi_window(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu)) {
+		tdx_enable_smi_window(vcpu);
+		return;
+	}
+
+	/* RSM will cause a vmexit anyway.
*/ + vmx_enable_smi_window(vcpu); +} + static void vt_apicv_post_state_restore(struct kvm_vcpu *vcpu) { struct pi_desc *pi =3D vcpu_to_pi_desc(vcpu); @@ -571,10 +606,10 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { =20 .setup_mce =3D vmx_setup_mce, =20 - .smi_allowed =3D vmx_smi_allowed, - .enter_smm =3D vmx_enter_smm, - .leave_smm =3D vmx_leave_smm, - .enable_smi_window =3D vmx_enable_smi_window, + .smi_allowed =3D vt_smi_allowed, + .enter_smm =3D vt_enter_smm, + .leave_smm =3D vt_leave_smm, + .enable_smi_window =3D vt_enable_smi_window, =20 .can_emulate_instruction =3D vmx_can_emulate_instruction, .apic_init_signal_blocked =3D vmx_apic_init_signal_blocked, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 4618934700cc..0b464c6bd81d 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1829,6 +1829,33 @@ int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_da= ta *msr) return 1; } =20 +int tdx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection) +{ + /* SMI isn't supported for TDX. */ + WARN_ON_ONCE(1); + return false; +} + +int tdx_enter_smm(struct kvm_vcpu *vcpu, char *smstate) +{ + /* smi_allowed() is always false for TDX as above. */ + WARN_ON_ONCE(1); + return 0; +} + +int tdx_leave_smm(struct kvm_vcpu *vcpu, const char *smstate) +{ + WARN_ON_ONCE(1); + return 0; +} + +void tdx_enable_smi_window(struct kvm_vcpu *vcpu) +{ + /* SMI isn't supported for TDX. Silently discard SMI request. */ + WARN_ON_ONCE(1); + vcpu->arch.smi_pending =3D false; +} + int tdx_dev_ioctl(void __user *argp) { struct kvm_tdx_capabilities __user *user_caps; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 64e7da448906..63573629a365 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -159,6 +159,10 @@ void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *rea= son, bool tdx_is_emulated_msr(u32 index, bool write); int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr); int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr); +int tdx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection); +int tdx_enter_smm(struct kvm_vcpu *vcpu, char *smstate); +int tdx_leave_smm(struct kvm_vcpu *vcpu, const char *smstate); +void tdx_enable_smi_window(struct kvm_vcpu *vcpu); =20 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); @@ -199,6 +203,10 @@ static inline void tdx_get_exit_info( static inline bool tdx_is_emulated_msr(u32 index, bool write) { return fal= se; } static inline int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)= { return 1; } static inline int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)= { return 1; } +static inline int tdx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injectio= n) { return false; } +static inline int tdx_enter_smm(struct kvm_vcpu *vcpu, char *smstate) { re= turn 0; } +static inline int tdx_leave_smm(struct kvm_vcpu *vcpu, const char *smstate= ) { return 0; } +static inline void tdx_enable_smi_window(struct kvm_vcpu *vcpu) {} =20 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { retur= n -EOPNOTSUPP; } static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)= { return -EOPNOTSUPP; } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f367d0dcef97..38e1f00cd224 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4787,7 +4787,8 @@ static int kvm_vcpu_ioctl_nmi(struct kvm_vcpu *vcpu) =20 static int kvm_vcpu_ioctl_smi(struct kvm_vcpu *vcpu) { - 
kvm_make_request(KVM_REQ_SMI, vcpu);
+	if (static_call(kvm_x86_has_emulated_msr)(vcpu->kvm, MSR_IA32_SMBASE))
+		kvm_make_request(KVM_REQ_SMI, vcpu);
 
 	return 0;
 }
--
2.25.1

From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 100/104] KVM: TDX: Silently ignore INIT/SIPI
Date: Thu, 5 May 2022 11:15:34 -0700

From: Isaku Yamahata

The TDX module API doesn't provide a way for the VMM to inject an INIT IPI or SIPI. Instead, it defines a different protocol to boot application processors. Ignore INIT and SIPI events for the TDX guest. There are two options: 1) (silently) ignore the INIT/SIPI request, or 2) somehow return an error to the guest TD. Given that the TDX guest is paravirtualized to boot APs, option 1 is chosen for simplicity.
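As background on that different boot protocol: instead of INIT/SIPI, a TDX guest wakes APs through the ACPI multiprocessor wakeup mailbox. A rough guest-side sketch follows; the struct and constant names are modeled from memory on the kernel's ACPI MADT wakeup support and are assumptions, not part of this patch.

	/* Sketch: wake an AP via the ACPI MP wakeup mailbox (no INIT/SIPI). */
	static int mp_wakeup_ap(struct acpi_madt_multiproc_wakeup_mailbox *mbox,
				u32 apic_id, u64 start_rip)
	{
		mbox->apic_id = apic_id;
		mbox->wakeup_vector = start_rip;
		/* The command must be written last; the AP polls on it. */
		smp_store_release(&mbox->command, ACPI_MP_WAKE_COMMAND_WAKEUP);

		/* The AP acknowledges by clearing the command field. */
		while (READ_ONCE(mbox->command))
			cpu_relax();
		return 0;
	}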
Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/lapic.c | 16 +++++++++++----- arch/x86/kvm/svm/svm.c | 1 + arch/x86/kvm/vmx/main.c | 22 +++++++++++++++++++++- 5 files changed, 36 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index ec98b3f734a2..ff658969cfff 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -136,6 +136,7 @@ KVM_X86_OP_OPTIONAL(migrate_timers) KVM_X86_OP(msr_filter_changed) KVM_X86_OP(complete_emulated_msr) KVM_X86_OP(vcpu_deliver_sipi_vector) +KVM_X86_OP(vcpu_deliver_init) KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons); KVM_X86_OP(check_processor_compatibility) =20 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index f67fe33e6661..94736f107628 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1553,6 +1553,7 @@ struct kvm_x86_ops { int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err); =20 void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector); + void (*vcpu_deliver_init)(struct kvm_vcpu *vcpu); =20 /* * Returns vCPU specific APICv inhibit reasons @@ -1777,6 +1778,7 @@ int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu); void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int s= eg); int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int s= eg); void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector); +void kvm_vcpu_deliver_init(struct kvm_vcpu *vcpu); =20 int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index, int reason, bool has_error_code, u32 error_code); diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index bc329c4488a9..db5ff56538f7 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2976,6 +2976,16 @@ int kvm_lapic_set_pv_eoi(struct kvm_vcpu *vcpu, u64 = data, unsigned long len) return 0; } =20 +void kvm_vcpu_deliver_init(struct kvm_vcpu *vcpu) +{ + kvm_vcpu_reset(vcpu, true); + if (kvm_vcpu_is_bsp(vcpu)) + vcpu->arch.mp_state =3D KVM_MP_STATE_RUNNABLE; + else + vcpu->arch.mp_state =3D KVM_MP_STATE_INIT_RECEIVED; +} +EXPORT_SYMBOL_GPL(kvm_vcpu_deliver_init); + int kvm_apic_accept_events(struct kvm_vcpu *vcpu) { struct kvm_lapic *apic =3D vcpu->arch.apic; @@ -3023,11 +3033,7 @@ int kvm_apic_accept_events(struct kvm_vcpu *vcpu) =20 if (test_bit(KVM_APIC_INIT, &pe)) { clear_bit(KVM_APIC_INIT, &apic->pending_events); - kvm_vcpu_reset(vcpu, true); - if (kvm_vcpu_is_bsp(apic->vcpu)) - vcpu->arch.mp_state =3D KVM_MP_STATE_RUNNABLE; - else - vcpu->arch.mp_state =3D KVM_MP_STATE_INIT_RECEIVED; + static_call(kvm_x86_vcpu_deliver_init)(vcpu); } if (test_bit(KVM_APIC_SIPI, &pe)) { clear_bit(KVM_APIC_SIPI, &apic->pending_events); diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index ca2700020322..ee11a4537ddd 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4729,6 +4729,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata =3D { .complete_emulated_msr =3D svm_complete_emulated_msr, =20 .vcpu_deliver_sipi_vector =3D svm_vcpu_deliver_sipi_vector, + .vcpu_deliver_init =3D kvm_vcpu_deliver_init, .vcpu_get_apicv_inhibit_reasons =3D avic_vcpu_get_apicv_inhibit_reasons, }; =20 diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index b8d0b875d8d9..d7cc85f81713 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -295,6 +295,25 @@ static void 
vt_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
 	vmx_deliver_interrupt(apic, delivery_mode, trig_mode, vector);
 }
 
+static void vt_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	kvm_vcpu_deliver_sipi_vector(vcpu, vector);
+}
+
+static void vt_vcpu_deliver_init(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu)) {
+		/* TDX doesn't support INIT. Ignore INIT event */
+		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+		return;
+	}
+
+	kvm_vcpu_deliver_init(vcpu);
+}
+
 static void vt_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
 	if (is_td_vcpu(vcpu))
@@ -618,7 +637,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.msr_filter_changed = vmx_msr_filter_changed,
 	.complete_emulated_msr = kvm_complete_insn_gp,
 
-	.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
+	.vcpu_deliver_sipi_vector = vt_vcpu_deliver_sipi_vector,
+	.vcpu_deliver_init = vt_vcpu_deliver_init,
 
 	.dev_mem_enc_ioctl = tdx_dev_ioctl,
 	.mem_enc_ioctl = vt_mem_enc_ioctl,
--
2.25.1

From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 101/104] KVM: TDX: Add methods to ignore accesses to CPU state
Date: Thu, 5 May 2022 11:15:35 -0700
From: Sean Christopherson

TDX protects TDX guest state from the VMM. Implement access methods for TDX guest state so that accesses are either ignored or return zero.

Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/main.c    | 463 +++++++++++++++++++++++++++++++++----
 arch/x86/kvm/vmx/tdx.c     |  55 ++++-
 arch/x86/kvm/vmx/x86_ops.h |  17 ++
 3 files changed, 490 insertions(+), 45 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index d7cc85f81713..3acd4f0f91b7 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -268,6 +268,46 @@ static void vt_enable_smi_window(struct kvm_vcpu *vcpu)
 	vmx_enable_smi_window(vcpu);
 }
 
+static bool vt_can_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type,
+				       void *insn, int insn_len)
+{
+	if (is_td_vcpu(vcpu))
+		return false;
+
+	return vmx_can_emulate_instruction(vcpu, emul_type, insn, insn_len);
+}
+
+static int vt_check_intercept(struct kvm_vcpu *vcpu,
+			      struct x86_instruction_info *info,
+			      enum x86_intercept_stage stage,
+			      struct x86_exception *exception)
+{
+	/*
+	 * This callback is triggered by the x86 instruction emulator. TDX
+	 * doesn't allow guest memory inspection.
+	 */
+	if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm))
+		return X86EMUL_UNHANDLEABLE;
+
+	return vmx_check_intercept(vcpu, info, stage, exception);
+}
+
+static bool vt_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return true;
+
+	return vmx_apic_init_signal_blocked(vcpu);
+}
+
+static void vt_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return tdx_set_virtual_apic_mode(vcpu);
+
+	return vmx_set_virtual_apic_mode(vcpu);
+}
+
 static void vt_apicv_post_state_restore(struct kvm_vcpu *vcpu)
 {
 	struct pi_desc *pi = vcpu_to_pi_desc(vcpu);
@@ -275,6 +315,31 @@ static void vt_apicv_post_state_restore(struct kvm_vcpu *vcpu)
 	memset(pi->pir, 0, sizeof(pi->pir));
 }
 
+static void vt_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	return vmx_hwapic_irr_update(vcpu, max_irr);
+}
+
+static void vt_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	return vmx_hwapic_isr_update(vcpu, max_isr);
+}
+
+static bool vt_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
+{
+	/* TDX doesn't support L2 at the moment.
*/ + if (WARN_ON_ONCE(is_td_vcpu(vcpu))) + return false; + + return vmx_guest_apic_has_interrupt(vcpu); +} + static int vt_sync_pir_to_irr(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) @@ -314,6 +379,177 @@ static void vt_vcpu_deliver_init(struct kvm_vcpu *vcp= u) kvm_vcpu_deliver_init(vcpu); } =20 +static void vt_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + return vmx_vcpu_after_set_cpuid(vcpu); +} + +static void vt_update_exception_bitmap(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_update_exception_bitmap(vcpu); +} + +static u64 vt_get_segment_base(struct kvm_vcpu *vcpu, int seg) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return tdx_get_segment_base(vcpu, seg); + + return vmx_get_segment_base(vcpu, seg); +} + +static void vt_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, + int seg) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return tdx_get_segment(vcpu, var, seg); + + vmx_get_segment(vcpu, var, seg); +} + +static void vt_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, + int seg) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_set_segment(vcpu, var, seg); +} + +static int vt_get_cpl(struct kvm_vcpu *vcpu) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return tdx_get_cpl(vcpu); + + return vmx_get_cpl(vcpu); +} + +static void vt_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_get_cs_db_l_bits(vcpu, db, l); +} + +static void vt_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_cr0(vcpu, cr0); +} + +static void vt_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_cr4(vcpu, cr4); +} + +static int vt_set_efer(struct kvm_vcpu *vcpu, u64 efer) +{ + if (is_td_vcpu(vcpu)) + return 0; + + return vmx_set_efer(vcpu, efer); +} + +static void vt_get_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) { + memset(dt, 0, sizeof(*dt)); + return; + } + + vmx_get_idt(vcpu, dt); +} + +static void vt_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_set_idt(vcpu, dt); +} + +static void vt_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) { + memset(dt, 0, sizeof(*dt)); + return; + } + + vmx_get_gdt(vcpu, dt); +} + +static void vt_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_set_gdt(vcpu, dt); +} + +static void vt_set_dr7(struct kvm_vcpu *vcpu, unsigned long val) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_dr7(vcpu, val); +} + +static void vt_sync_dirty_debug_regs(struct kvm_vcpu *vcpu) +{ + /* + * MOV-DR exiting is always cleared for TD guest, even in debug mode. + * Thus KVM_DEBUGREG_WONT_EXIT can never be set and it should never + * reach here for TD vcpu. 
+ */ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_sync_dirty_debug_regs(vcpu); +} + +static void vt_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) +{ + if (is_td_vcpu(vcpu)) + return tdx_cache_reg(vcpu, reg); + + return vmx_cache_reg(vcpu, reg); +} + +static unsigned long vt_get_rflags(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return tdx_get_rflags(vcpu); + + return vmx_get_rflags(vcpu); +} + +static void vt_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_rflags(vcpu, rflags); +} + +static bool vt_get_if_flag(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return false; + + return vmx_get_if_flag(vcpu); +} + static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) @@ -430,6 +666,15 @@ static u32 vt_get_interrupt_shadow(struct kvm_vcpu *vc= pu) return vmx_get_interrupt_shadow(vcpu); } =20 +static void vt_patch_hypercall(struct kvm_vcpu *vcpu, + unsigned char *hypercall) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_patch_hypercall(vcpu, hypercall); +} + static void vt_inject_irq(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) @@ -438,6 +683,14 @@ static void vt_inject_irq(struct kvm_vcpu *vcpu) vmx_inject_irq(vcpu); } =20 +static void vt_queue_exception(struct kvm_vcpu *vcpu) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_queue_exception(vcpu); +} + static void vt_cancel_injection(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) @@ -470,6 +723,130 @@ static void vt_request_immediate_exit(struct kvm_vcpu= *vcpu) vmx_request_immediate_exit(vcpu); } =20 +static void vt_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int ir= r) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_update_cr8_intercept(vcpu, tpr, irr); +} + +static void vt_set_apic_access_page_addr(struct kvm_vcpu *vcpu) +{ + if (WARN_ON_ONCE(is_td_vcpu(vcpu))) + return; + + vmx_set_apic_access_page_addr(vcpu); +} + +static void vt_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu) +{ + if (WARN_ON_ONCE(is_td_vcpu(vcpu))) + return; + + vmx_refresh_apicv_exec_ctrl(vcpu); +} + +static void vt_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitma= p) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_load_eoi_exitmap(vcpu, eoi_exit_bitmap); +} + +static int vt_set_tss_addr(struct kvm *kvm, unsigned int addr) +{ + if (is_td(kvm)) + return 0; + + return vmx_set_tss_addr(kvm, addr); +} + +static int vt_set_identity_map_addr(struct kvm *kvm, u64 ident_addr) +{ + if (is_td(kvm)) + return 0; + + return vmx_set_identity_map_addr(kvm, ident_addr); +} + +static u64 vt_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) +{ + if (is_td_vcpu(vcpu)) { + if (is_mmio) + return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT; + return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT; + } + + return vmx_get_mt_mask(vcpu, gfn, is_mmio); +} + +static u64 vt_get_l2_tsc_offset(struct kvm_vcpu *vcpu) +{ + /* TDX doesn't support L2 guest at the moment. */ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return 0; + + return vmx_get_l2_tsc_offset(vcpu); +} + +static u64 vt_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu) +{ + /* TDX doesn't support L2 guest at the moment. */ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return 0; + + return vmx_get_l2_tsc_multiplier(vcpu); +} + +static void vt_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset) +{ + /* In TDX, tsc offset can't be changed. 
*/ + if (is_td_vcpu(vcpu)) + return; + + vmx_write_tsc_offset(vcpu, offset); +} + +static void vt_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 multiplier) +{ + /* In TDX, tsc multiplier can't be changed. */ + if (is_td_vcpu(vcpu)) + return; + + vmx_write_tsc_multiplier(vcpu, multiplier); +} + +static void vt_update_cpu_dirty_logging(struct kvm_vcpu *vcpu) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_update_cpu_dirty_logging(vcpu); +} + +#ifdef CONFIG_X86_64 +static int vt_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc, + bool *expired) +{ + /* VMX-preemption timer isn't available for TDX. */ + if (is_td_vcpu(vcpu)) + return -EINVAL; + + return vmx_set_hv_timer(vcpu, guest_deadline_tsc, expired); +} + +static void vt_cancel_hv_timer(struct kvm_vcpu *vcpu) +{ + /* VMX-preemption timer can't be set. Set vt_set_hv_timer(). */ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_cancel_hv_timer(vcpu); +} +#endif + static void vt_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code) { @@ -524,29 +901,29 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .vcpu_load =3D vt_vcpu_load, .vcpu_put =3D vt_vcpu_put, =20 - .update_exception_bitmap =3D vmx_update_exception_bitmap, + .update_exception_bitmap =3D vt_update_exception_bitmap, .get_msr_feature =3D vmx_get_msr_feature, .get_msr =3D vt_get_msr, .set_msr =3D vt_set_msr, - .get_segment_base =3D vmx_get_segment_base, - .get_segment =3D vmx_get_segment, - .set_segment =3D vmx_set_segment, - .get_cpl =3D vmx_get_cpl, - .get_cs_db_l_bits =3D vmx_get_cs_db_l_bits, - .set_cr0 =3D vmx_set_cr0, + .get_segment_base =3D vt_get_segment_base, + .get_segment =3D vt_get_segment, + .set_segment =3D vt_set_segment, + .get_cpl =3D vt_get_cpl, + .get_cs_db_l_bits =3D vt_get_cs_db_l_bits, + .set_cr0 =3D vt_set_cr0, .is_valid_cr4 =3D vmx_is_valid_cr4, - .set_cr4 =3D vmx_set_cr4, - .set_efer =3D vmx_set_efer, - .get_idt =3D vmx_get_idt, - .set_idt =3D vmx_set_idt, - .get_gdt =3D vmx_get_gdt, - .set_gdt =3D vmx_set_gdt, - .set_dr7 =3D vmx_set_dr7, - .sync_dirty_debug_regs =3D vmx_sync_dirty_debug_regs, - .cache_reg =3D vmx_cache_reg, - .get_rflags =3D vmx_get_rflags, - .set_rflags =3D vmx_set_rflags, - .get_if_flag =3D vmx_get_if_flag, + .set_cr4 =3D vt_set_cr4, + .set_efer =3D vt_set_efer, + .get_idt =3D vt_get_idt, + .set_idt =3D vt_set_idt, + .get_gdt =3D vt_get_gdt, + .set_gdt =3D vt_set_gdt, + .set_dr7 =3D vt_set_dr7, + .sync_dirty_debug_regs =3D vt_sync_dirty_debug_regs, + .cache_reg =3D vt_cache_reg, + .get_rflags =3D vt_get_rflags, + .set_rflags =3D vt_set_rflags, + .get_if_flag =3D vt_get_if_flag, =20 .flush_tlb_all =3D vt_flush_tlb_all, .flush_tlb_current =3D vt_flush_tlb_current, @@ -560,10 +937,10 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .update_emulated_instruction =3D vmx_update_emulated_instruction, .set_interrupt_shadow =3D vt_set_interrupt_shadow, .get_interrupt_shadow =3D vt_get_interrupt_shadow, - .patch_hypercall =3D vmx_patch_hypercall, + .patch_hypercall =3D vt_patch_hypercall, .inject_irq =3D vt_inject_irq, .inject_nmi =3D vt_inject_nmi, - .queue_exception =3D vmx_queue_exception, + .queue_exception =3D vt_queue_exception, .cancel_injection =3D vt_cancel_injection, .interrupt_allowed =3D vt_interrupt_allowed, .nmi_allowed =3D vt_nmi_allowed, @@ -571,39 +948,39 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .set_nmi_mask =3D vt_set_nmi_mask, .enable_nmi_window =3D vt_enable_nmi_window, .enable_irq_window =3D vt_enable_irq_window, - 
.update_cr8_intercept =3D vmx_update_cr8_intercept, - .set_virtual_apic_mode =3D vmx_set_virtual_apic_mode, - .set_apic_access_page_addr =3D vmx_set_apic_access_page_addr, - .refresh_apicv_exec_ctrl =3D vmx_refresh_apicv_exec_ctrl, - .load_eoi_exitmap =3D vmx_load_eoi_exitmap, + .update_cr8_intercept =3D vt_update_cr8_intercept, + .set_virtual_apic_mode =3D vt_set_virtual_apic_mode, + .set_apic_access_page_addr =3D vt_set_apic_access_page_addr, + .refresh_apicv_exec_ctrl =3D vt_refresh_apicv_exec_ctrl, + .load_eoi_exitmap =3D vt_load_eoi_exitmap, .apicv_post_state_restore =3D vt_apicv_post_state_restore, .check_apicv_inhibit_reasons =3D vmx_check_apicv_inhibit_reasons, - .hwapic_irr_update =3D vmx_hwapic_irr_update, - .hwapic_isr_update =3D vmx_hwapic_isr_update, - .guest_apic_has_interrupt =3D vmx_guest_apic_has_interrupt, + .hwapic_irr_update =3D vt_hwapic_irr_update, + .hwapic_isr_update =3D vt_hwapic_isr_update, + .guest_apic_has_interrupt =3D vt_guest_apic_has_interrupt, .sync_pir_to_irr =3D vt_sync_pir_to_irr, .deliver_interrupt =3D vt_deliver_interrupt, .dy_apicv_has_pending_interrupt =3D pi_has_pending_interrupt, .protected_apic_has_interrupt =3D vt_protected_apic_has_interrupt, =20 - .set_tss_addr =3D vmx_set_tss_addr, - .set_identity_map_addr =3D vmx_set_identity_map_addr, - .get_mt_mask =3D vmx_get_mt_mask, + .set_tss_addr =3D vt_set_tss_addr, + .set_identity_map_addr =3D vt_set_identity_map_addr, + .get_mt_mask =3D vt_get_mt_mask, =20 .get_exit_info =3D vt_get_exit_info, =20 - .vcpu_after_set_cpuid =3D vmx_vcpu_after_set_cpuid, + .vcpu_after_set_cpuid =3D vt_vcpu_after_set_cpuid, =20 .has_wbinvd_exit =3D cpu_has_vmx_wbinvd_exit, =20 - .get_l2_tsc_offset =3D vmx_get_l2_tsc_offset, - .get_l2_tsc_multiplier =3D vmx_get_l2_tsc_multiplier, - .write_tsc_offset =3D vmx_write_tsc_offset, - .write_tsc_multiplier =3D vmx_write_tsc_multiplier, + .get_l2_tsc_offset =3D vt_get_l2_tsc_offset, + .get_l2_tsc_multiplier =3D vt_get_l2_tsc_multiplier, + .write_tsc_offset =3D vt_write_tsc_offset, + .write_tsc_multiplier =3D vt_write_tsc_multiplier, =20 .load_mmu_pgd =3D vt_load_mmu_pgd, =20 - .check_intercept =3D vmx_check_intercept, + .check_intercept =3D vt_check_intercept, .handle_exit_irqoff =3D vt_handle_exit_irqoff, =20 .request_immediate_exit =3D vt_request_immediate_exit, @@ -611,7 +988,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .sched_in =3D vt_sched_in, =20 .cpu_dirty_log_size =3D PML_ENTITY_NUM, - .update_cpu_dirty_logging =3D vmx_update_cpu_dirty_logging, + .update_cpu_dirty_logging =3D vt_update_cpu_dirty_logging, =20 .nested_ops =3D &vmx_nested_ops, =20 @@ -619,8 +996,8 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .pi_start_assignment =3D vmx_pi_start_assignment, =20 #ifdef CONFIG_X86_64 - .set_hv_timer =3D vmx_set_hv_timer, - .cancel_hv_timer =3D vmx_cancel_hv_timer, + .set_hv_timer =3D vt_set_hv_timer, + .cancel_hv_timer =3D vt_cancel_hv_timer, #endif =20 .setup_mce =3D vmx_setup_mce, @@ -630,8 +1007,8 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .leave_smm =3D vt_leave_smm, .enable_smi_window =3D vt_enable_smi_window, =20 - .can_emulate_instruction =3D vmx_can_emulate_instruction, - .apic_init_signal_blocked =3D vmx_apic_init_signal_blocked, + .can_emulate_instruction =3D vt_can_emulate_instruction, + .apic_init_signal_blocked =3D vt_apic_init_signal_blocked, .migrate_timers =3D vmx_migrate_timers, =20 .msr_filter_changed =3D vmx_msr_filter_changed, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 0b464c6bd81d..9ae02e8d4634 100644 --- 
a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -3,6 +3,7 @@ #include =20 #include +#include #include =20 #include "capabilities.h" @@ -608,8 +609,15 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) =20 vcpu->arch.tsc_offset =3D to_kvm_tdx(vcpu->kvm)->tsc_offset; vcpu->arch.l1_tsc_offset =3D vcpu->arch.tsc_offset; - vcpu->arch.guest_state_protected =3D - !(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTRIBUTE_DEBUG); + /* + * TODO: support off-TD debug. If TD DEBUG is enabled, guest state + * can be accessed. guest_state_protected =3D false. and kvm ioctl to + * access CPU states should be usable for user space VMM (e.g. qemu). + * + * vcpu->arch.guest_state_protected =3D + * !(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTRIBUTE_DEBUG); + */ + vcpu->arch.guest_state_protected =3D true; =20 tdx->pi_desc.nv =3D POSTED_INTR_VECTOR; tdx->pi_desc.sn =3D 1; @@ -1856,6 +1864,49 @@ void tdx_enable_smi_window(struct kvm_vcpu *vcpu) vcpu->arch.smi_pending =3D false; } =20 +void tdx_set_virtual_apic_mode(struct kvm_vcpu *vcpu) +{ + /* Only x2APIC mode is supported for TD. */ + WARN_ON_ONCE(kvm_get_apic_mode(vcpu) !=3D LAPIC_MODE_X2APIC); +} + +int tdx_get_cpl(struct kvm_vcpu *vcpu) +{ + return 0; +} + +void tdx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) +{ + kvm_register_mark_available(vcpu, reg); + switch (reg) { + case VCPU_REGS_RSP: + case VCPU_REGS_RIP: + case VCPU_EXREG_PDPTR: + case VCPU_EXREG_CR0: + case VCPU_EXREG_CR3: + case VCPU_EXREG_CR4: + break; + default: + KVM_BUG_ON(1, vcpu->kvm); + break; + } +} + +unsigned long tdx_get_rflags(struct kvm_vcpu *vcpu) +{ + return 0; +} + +u64 tdx_get_segment_base(struct kvm_vcpu *vcpu, int seg) +{ + return 0; +} + +void tdx_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int s= eg) +{ + memset(var, 0, sizeof(*var)); +} + int tdx_dev_ioctl(void __user *argp) { struct kvm_tdx_capabilities __user *user_caps; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 63573629a365..f8c575ef7560 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -163,6 +163,14 @@ int tdx_smi_allowed(struct kvm_vcpu *vcpu, bool for_in= jection); int tdx_enter_smm(struct kvm_vcpu *vcpu, char *smstate); int tdx_leave_smm(struct kvm_vcpu *vcpu, const char *smstate); void tdx_enable_smi_window(struct kvm_vcpu *vcpu); +void tdx_set_virtual_apic_mode(struct kvm_vcpu *vcpu); + +int tdx_get_cpl(struct kvm_vcpu *vcpu); +void tdx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg); +unsigned long tdx_get_rflags(struct kvm_vcpu *vcpu); +bool tdx_is_emulated_msr(u32 index, bool write); +u64 tdx_get_segment_base(struct kvm_vcpu *vcpu, int seg); +void tdx_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int s= eg); =20 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); @@ -203,10 +211,19 @@ static inline void tdx_get_exit_info( static inline bool tdx_is_emulated_msr(u32 index, bool write) { return fal= se; } static inline int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)= { return 1; } static inline int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)= { return 1; } + static inline int tdx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injectio= n) { return false; } static inline int tdx_enter_smm(struct kvm_vcpu *vcpu, char *smstate) { re= turn 0; } static inline int tdx_leave_smm(struct kvm_vcpu *vcpu, const char *smstate= ) { return 0; } static inline void tdx_enable_smi_window(struct kvm_vcpu *vcpu) {} +static inline void 
tdx_set_virtual_apic_mode(struct kvm_vcpu *vcpu) {}
+
+static inline int tdx_get_cpl(struct kvm_vcpu *vcpu) { return 0; }
+static inline void tdx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) {}
+static inline unsigned long tdx_get_rflags(struct kvm_vcpu *vcpu) { return 0; }
+static inline u64 tdx_get_segment_base(struct kvm_vcpu *vcpu, int seg) { return 0; }
+static inline void tdx_get_segment(
+	struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg) {}
 
 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
--
2.25.1

From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 102/104] Documentation/virtual/kvm: Document Trust Domain Extensions (TDX)
Date: Thu, 5 May 2022 11:15:36 -0700
Message-Id: <1c17b8e6988911d754e319156cfb6a44acd2cb05.1651774251.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

Add documentation for Intel Trust Domain Extensions (TDX) support.
Signed-off-by: Isaku Yamahata
---
 Documentation/virt/kvm/api.rst       |   9 +-
 Documentation/virt/kvm/intel-tdx.rst | 381 +++++++++++++++++++++++++++
 2 files changed, 389 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/virt/kvm/intel-tdx.rst

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 7fa6850f1e81..09691d3a6b4a 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1394,6 +1394,9 @@ It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl.
 The KVM_SET_MEMORY_REGION does not allow fine grained control over memory
 allocation and is deprecated.
 
+For a TDX guest, deleting/moving a memory region loses the guest memory
+contents. Read-only regions aren't supported. Only as-id 0 is supported.
+
 
 4.36 KVM_SET_TSS_ADDR
 ---------------------
@@ -4625,7 +4628,7 @@ H_GET_CPU_CHARACTERISTICS hypercall.
 
 :Capability: basic
 :Architectures: x86
-:Type: vm
+:Type: vm ioctl, vcpu ioctl
 :Parameters: an opaque platform specific structure (in/out)
 :Returns: 0 on success; -1 on error
 
@@ -4637,6 +4640,10 @@ Currently, this ioctl is used for issuing Secure Encrypted Virtualization
 (SEV) commands on AMD Processors. The SEV commands are defined in
 Documentation/virt/kvm/amd-memory-encryption.rst.
 
+Currently, this ioctl is also used for issuing Trust Domain Extensions
+(TDX) commands on Intel Processors. The TDX commands are defined in
+Documentation/virt/kvm/intel-tdx.rst.
+
 4.111 KVM_MEMORY_ENCRYPT_REG_REGION
 -----------------------------------
 
diff --git a/Documentation/virt/kvm/intel-tdx.rst b/Documentation/virt/kvm/intel-tdx.rst
new file mode 100644
index 000000000000..3fae2cf9e534
--- /dev/null
+++ b/Documentation/virt/kvm/intel-tdx.rst
@@ -0,0 +1,381 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
+Intel Trust Domain Extensions (TDX)
+===================================
+
+Overview
+========
+TDX stands for Trust Domain Extensions, which isolates VMs from
+the virtual-machine manager (VMM)/hypervisor and any other software on
+the platform. [1]
+For details, the specifications [2], [3], [4], [5], [6], [7] are
+available.
+
+
+API description
+===============
+
+KVM_MEMORY_ENCRYPT_OP
+---------------------
+:Type: vm ioctl, vcpu ioctl
+
+For TDX operations, KVM_MEMORY_ENCRYPT_OP is re-purposed to be a generic
+ioctl with TDX-specific sub-commands.
+
+::
+
+  /* Trust Domain eXtension sub-ioctl() commands. */
+  enum kvm_tdx_cmd_id {
+          KVM_TDX_CAPABILITIES = 0,
+          KVM_TDX_INIT_VM,
+          KVM_TDX_INIT_VCPU,
+          KVM_TDX_INIT_MEM_REGION,
+          KVM_TDX_FINALIZE_VM,
+
+          KVM_TDX_CMD_NR_MAX,
+  };
+
+  struct kvm_tdx_cmd {
+          /* enum kvm_tdx_cmd_id */
+          __u32 id;
+          /* flags for the sub-command. If the sub-command doesn't use this, set zero. */
+          __u32 flags;
+          /*
+           * data for each sub-command. An immediate or a pointer to the actual
+           * data in process virtual address. If the sub-command doesn't use it,
+           * set zero.
+           */
+          __u64 data;
+          /*
+           * Auxiliary error code. The sub-command may return a TDX SEAMCALL
+           * status code in addition to -Exxx.
+           * Defined for consistency with struct kvm_sev_cmd.
+           */
+          __u64 error;
+          /* Reserved: Defined for consistency with struct kvm_sev_cmd.
*/
+          __u64 unused;
+  };
+
+KVM_TDX_CAPABILITIES
+--------------------
+:Type: vm ioctl
+
+A subset of the TDSYSINFO_STRUCT retrieved by the TDH.SYS.INFO TDX SEAM
+call will be returned, which describes the Intel TDX module.
+
+- id: KVM_TDX_CAPABILITIES
+- flags: must be 0
+- data: pointer to struct kvm_tdx_capabilities
+- error: must be 0
+- unused: must be 0
+
+::
+
+  struct kvm_tdx_cpuid_config {
+          __u32 leaf;
+          __u32 sub_leaf;
+          __u32 eax;
+          __u32 ebx;
+          __u32 ecx;
+          __u32 edx;
+  };
+
+  struct kvm_tdx_capabilities {
+          __u64 attrs_fixed0;
+          __u64 attrs_fixed1;
+          __u64 xfam_fixed0;
+          __u64 xfam_fixed1;
+
+          __u32 nr_cpuid_configs;
+          struct kvm_tdx_cpuid_config cpuid_configs[0];
+  };
+
+
+KVM_TDX_INIT_VM
+---------------
+:Type: vm ioctl
+
+Does additional VM initialization specific to TDX, which corresponds to
+the TDH.MNG.INIT TDX SEAM call.
+
+- id: KVM_TDX_INIT_VM
+- flags: must be 0
+- data: pointer to struct kvm_tdx_init_vm
+- error: must be 0
+- unused: must be 0
+
+::
+
+  struct kvm_tdx_init_vm {
+          __u32 max_vcpus;
+          __u32 reserved;
+          __u64 attributes;
+          __u64 cpuid;            /* pointer to struct kvm_cpuid2 */
+          __u64 mrconfigid[6];    /* sha384 digest */
+          __u64 mrowner[6];       /* sha384 digest */
+          __u64 mrownerconfig[6]; /* sha384 digest */
+          __u64 reserved[43];     /* must be zero for future extensibility */
+  };
+
+
+KVM_TDX_INIT_VCPU
+-----------------
+:Type: vcpu ioctl
+
+Does additional VCPU initialization specific to TDX, which corresponds to
+the TDH.VP.INIT TDX SEAM call.
+
+- id: KVM_TDX_INIT_VCPU
+- flags: must be 0
+- data: initial value of the guest TD VCPU RCX
+- error: must be 0
+- unused: must be 0
+
+KVM_TDX_INIT_MEM_REGION
+-----------------------
+:Type: vm ioctl
+
+Encrypts a contiguous memory region, which corresponds to the
+TDH.MEM.PAGE.ADD TDX SEAM call.
+If the KVM_TDX_MEASURE_MEMORY_REGION flag is specified, it also extends the
+measurement, which corresponds to the TDH.MR.EXTEND TDX SEAM call.
+
+- id: KVM_TDX_INIT_MEM_REGION
+- flags: currently only KVM_TDX_MEASURE_MEMORY_REGION is defined
+- data: pointer to struct kvm_tdx_init_mem_region
+- error: must be 0
+- unused: must be 0
+
+::
+
+  #define KVM_TDX_MEASURE_MEMORY_REGION   (1UL << 0)
+
+  struct kvm_tdx_init_mem_region {
+          __u64 source_addr;
+          __u64 gpa;
+          __u64 nr_pages;
+  };
+
+
+KVM_TDX_FINALIZE_VM
+-------------------
+:Type: vm ioctl
+
+Completes the measurement of the initial TD contents and marks the TD
+ready to run, which corresponds to TDH.MR.FINALIZE.
+
+- id: KVM_TDX_FINALIZE_VM
+- flags: must be 0
+- data: must be 0
+- error: must be 0
+- unused: must be 0
+
+KVM TDX creation flow
+=====================
+In addition to the normal KVM flow, new TDX ioctls need to be called. The
+control flow looks as follows.
+
+#. system wide capability check
+
+   * KVM_CAP_VM_TYPES: check if the VM type is supported and if TDX_VM_TYPE
+     is supported.
+
+#. creating VM
+
+   * KVM_CREATE_VM
+   * KVM_TDX_CAPABILITIES: query if TDX is supported on the platform.
+   * KVM_TDX_INIT_VM: pass TDX specific VM parameters.
+
+#. creating VCPU
+
+   * KVM_CREATE_VCPU
+   * KVM_TDX_INIT_VCPU: pass TDX specific VCPU parameters.
+
+#. initializing guest memory
+
+   * allocate guest memory and initialize pages the same as in the normal
+     KVM case. In the TDX case, parse and load TDVF into guest memory in
+     addition.
+   * KVM_TDX_INIT_MEM_REGION to add and measure guest pages.
+     If the pages have contents as above, those pages need to be added.
+     Otherwise the contents will be lost and the guest sees zero pages.
+Design discussion
+=================
+
+Coexistence of normal (VMX) VMs and TD VMs
+------------------------------------------
+It's required to allow both legacy (normal VMX) VMs and new TD VMs to
+coexist. Otherwise the benefits of VM flexibility would be eliminated.
+The main issue is that the logic of the kvm_x86_ops callbacks for TDX is
+different from that for VMX, while kvm_x86_ops is a single global
+variable, neither per-VM nor per-vcpu.
+
+Several points to be considered:
+
+  . No or minimal overhead when TDX is disabled (CONFIG_INTEL_TDX_HOST=n).
+  . Avoid the overhead of indirect calls via function pointers.
+  . Contain the changes under the arch/x86/kvm/vmx directory and share
+    logic with VMX for maintenance.
+    Even though the way to operate on a VM (VMX instructions vs. TDX SEAM
+    calls) is different, the basic idea remains the same, so much of the
+    logic can be shared.
+  . Future maintenance
+    No huge change to kvm_x86_ops is expected in the (near) future, so a
+    centralized file is acceptable.
+
+- Wrapping kvm_x86_ops: the current choice
+
+  Introduce a dedicated file, arch/x86/kvm/vmx/main.c (the name main.c is
+  chosen to show the main entry points for the callbacks), and wrap all
+  the callbacks with "if (is-tdx) tdx-callback() else vmx-callback()"
+  (see the sketch after the alternatives below).
+
+  Pros:
+
+  - No major change in common x86 KVM code. The change is (mostly)
+    contained under arch/x86/kvm/vmx/.
+  - When TDX is disabled (CONFIG_INTEL_TDX_HOST=n), the overhead is
+    optimized out.
+  - Micro-optimization by avoiding function pointers.
+
+  Cons:
+
+  - Much boilerplate in arch/x86/kvm/vmx/main.c.
+
+Alternatives:
+
+- Introduce another callback layer under arch/x86/kvm/vmx.
+
+  Pros:
+
+  - No major change in common x86 KVM code. The change is (mostly)
+    contained under arch/x86/kvm/vmx/.
+  - Clear separation of callbacks.
+
+  Cons:
+
+  - Overhead in VMX even when TDX is disabled (CONFIG_INTEL_TDX_HOST=n).
+
+- Allow per-VM kvm_x86_ops callbacks instead of a global kvm_x86_ops.
+
+  Pros:
+
+  - Clear separation of callbacks.
+
+  Cons:
+
+  - Big change in common x86 code.
+  - Overhead in common code even when TDX is disabled
+    (CONFIG_INTEL_TDX_HOST=n).
+
+- Introduce a new directory, arch/x86/kvm/tdx.
+
+  Pros:
+
+  - It clarifies that TDX is different from VMX.
+
+  Cons:
+
+  - Given the desired level of code sharing, a separate directory
+    complicates that sharing.
+
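+The wrapper pattern of the current choice looks like the following (a
+sketch; the vt_ prefix and the exact callback names are illustrative):
+
+::
+
+  /* arch/x86/kvm/vmx/main.c */
+  static int vt_vcpu_create(struct kvm_vcpu *vcpu)
+  {
+          /* Dispatch to the TDX backend for a TD, otherwise to VMX. */
+          if (is_td_vcpu(vcpu))
+                  return tdx_vcpu_create(vcpu);
+
+          return vmx_vcpu_create(vcpu);
+  }
+
+With CONFIG_INTEL_TDX_HOST=n, is_td_vcpu() can be a constant false, so the
+compiler drops the TDX branch entirely.
+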
+KVM MMU Changes
+---------------
+The KVM MMU needs to be enhanced to handle Secure/Shared-EPT. The
+high-level execution flow is mostly the same as the normal EPT case:
+EPT violation/misconfiguration -> invoke the TDP fault handler ->
+resolve the TDP fault -> resume execution (or emulate MMIO).
+The difference is that the S-EPT is operated on (read/written) via TDX
+SEAM calls, which are expensive, instead of direct reads/writes of EPT
+entries. One GPA bit (bit 51 or 47) is repurposed to mean shared with the
+host (if set to 1) or private to the TD (if cleared to 0).
+
+- The current implementation
+
+  . Reuse the existing MMU code with minimal updates, because the
+    execution flow is mostly the same. An additional operation, the TDX
+    call for the S-EPT, is needed, so add hooks for it to kvm_x86_ops.
+  . For performance, minimize the TDX SEAM calls that operate on the
+    S-EPT. When getting the corresponding S-EPT pages/entry from a
+    faulting GPA, don't use a TDX SEAM call to read the S-EPT entry.
+    Instead, create a shadow copy in host memory: repurpose the existing
+    kvm_mmu_page as a shadow copy of the S-EPT and associate the S-EPT
+    with it.
+  . Treat the shared bit as an attribute; mask/unmask the bit where
+    necessary to keep the existing traversal code working. Introduce
+    kvm.arch.gfn_shared_mask and use "if (gfn_shared_mask)" for the
+    special case:
+    = 0 for the non-TDX case,
+    = bit 51 or 47 set for the TDX case.
+
+  Pros:
+
+  - Large code reuse with minimal new hooks.
+  - The execution path is the same.
+
+  Cons:
+
+  - Complicates the existing code.
+  - Repurposing kvm_mmu_page as a shadow of the Secure EPT can be
+    confusing.
+
+Alternatives:
+
+- Replace direct reads/writes of EPT entries with TDX SEAM calls by
+  introducing callbacks on EPT entries.
+
+  Pros:
+
+  - Straightforward.
+
+  Cons:
+
+  - Too many touch points.
+  - Too slow due to TDX SEAM calls.
+  - Overhead even when TDX is disabled (CONFIG_INTEL_TDX_HOST=n).
+
+- Sprinkle "if (is-tdx)" for the TDX special cases.
+
+  Pros:
+
+  - Straightforward.
+
+  Cons:
+
+  - The result is non-generic and ugly.
+  - Puts TDX-specific logic into common KVM MMU code.
+
+New KVM API, ioctl (sub)command, to manage TD VMs
+-------------------------------------------------
+Additional KVM APIs are needed to control TD VMs. The operations on TD
+VMs are specific to TDX.
+
+- Piggyback on and repurpose KVM_MEMORY_ENCRYPT_OP.
+
+  Although not every operation is memory encryption, repurpose it to
+  carry the TDX-specific ioctls.
+
+  Pros:
+
+  - No major change in common x86 KVM code.
+
+  Cons:
+
+  - The operations aren't actually memory encryption, but operations on
+    TD VMs.
+
+Alternatives:
+
+- Introduce a new ioctl for guest protection, such as
+  KVM_GUEST_PROTECTION_OP, and introduce subcommands for TDX.
+
+  Pros:
+
+  - Clean name.
+
+  Cons:
+
+  - One more new ioctl for guest protection.
+  - Confusion between KVM_MEMORY_ENCRYPT_OP and KVM_GUEST_PROTECTION_OP.
+
+- Rename KVM_MEMORY_ENCRYPT_OP to KVM_GUEST_PROTECTION_OP and keep
+  KVM_MEMORY_ENCRYPT_OP at the same value for uapi compatibility:
+  "#define KVM_MEMORY_ENCRYPT_OP KVM_GUEST_PROTECTION_OP".
+
+  Pros:
+
+  - No new ioctl, with a more suitable name.
+
+  Cons:
+
+  - May confuse existing user programs.
+
+
+References
+==========
+
+.. [1] TDX specification
+   https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html
+.. [2] Intel Trust Domain Extensions (Intel TDX)
+   https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-whitepaper-final9-17.pdf
+.. [3] Intel CPU Architectural Extensions Specification
+   https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-cpu-architectural-specification.pdf
+.. [4] Intel TDX Module 1.0 EAS
+   https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-module-1eas.pdf
+.. [5] Intel TDX Loader Interface Specification
+   https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-seamldr-interface-specification.pdf
+.. [6] Intel TDX Guest-Hypervisor Communication Interface
+   https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-guest-hypervisor-communication-interface.pdf
+.. [7] Intel TDX Virtual Firmware Design Guide
+   https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1.
+.. [8] Intel public github
+   kvm TDX branch: https://github.com/intel/tdx/tree/kvm
+   TDX guest branch: https://github.com/intel/tdx/tree/guest
+.. [9] TDVF
+   https://github.com/tianocore/edk2-staging/tree/TDVF
+.. [10] KVM forum 2020: Intel Virtualization Technology Extensions to
+   Enable Hardware Isolated VMs
+   https://osseu2020.sched.com/event/eDzm/intel-virtualization-technology-extensions-to-enable-hardware-isolated-vms-sean-christopherson-intel
+.. [11] Linux Security Summit EU 2020:
+   Architectural Extensions for Hardware Virtual Machine Isolation
+   to Advance Confidential Computing in Public Clouds - Ravi Sahita
+   & Jun Nakajima, Intel Corporation
+   https://osseu2020.sched.com/event/eDOx/architectural-extensions-for-hardware-virtual-machine-isolation-to-advance-confidential-computing-in-public-clouds-ravi-sahita-jun-nakajima-intel-corporation
+.. [12] [RFCv2,00/16] KVM protected memory extension
+   https://lkml.org/lkml/2020/10/20/66
-- 
2.25.1

From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini,
    erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 103/104] KVM: x86: design documentation on TDX support of x86 KVM TDP MMU
Date: Thu, 5 May 2022 11:15:37 -0700

From: Isaku Yamahata

Add a high-level design document on the TDX changes to the TDP MMU.
Signed-off-by: Isaku Yamahata
---
 Documentation/virt/kvm/tdx-tdp-mmu.rst | 466 +++++++++++++++++++++++++
 1 file changed, 466 insertions(+)
 create mode 100644 Documentation/virt/kvm/tdx-tdp-mmu.rst

diff --git a/Documentation/virt/kvm/tdx-tdp-mmu.rst b/Documentation/virt/kvm/tdx-tdp-mmu.rst
new file mode 100644
index 000000000000..6d63bb75f785
--- /dev/null
+++ b/Documentation/virt/kvm/tdx-tdp-mmu.rst
@@ -0,0 +1,466 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Design of TDP MMU for TDX support
+=================================
+This document describes a (high-level) design for TDX support in the x86
+KVM TDP MMU.
+
+In this document, we use "TD" or "guest TD" to differentiate it from the
+current "VM" (Virtual Machine), which is supported by KVM today.
+
+
+Background of TDX
+=================
+TD private memory is designed to hold TD private content, encrypted by the
+CPU using the TD ephemeral key. An encryption engine holds a table of
+encryption keys, and an encryption key is selected for each memory
+transaction based on a Host Key Identifier (HKID). By design, the host
+VMM does not have access to the encryption keys.
+
+In the first generation of MKTME, the HKID is "stolen" from the physical
+address by allocating a configurable number of bits from the top of the
+physical address. The HKID space is partitioned into shared HKIDs for
+legacy MKTME accesses and private HKIDs for SEAM-mode-only accesses. We
+use 0 for the shared HKID on the host so that MKTME can be opaque or
+bypassed on the host.
+
+During TDX non-root operation (i.e. in a guest TD), memory accesses can be
+qualified as either shared or private, based on the value of a new SHARED
+bit in the Guest Physical Address (GPA). The CPU translates shared GPAs
+using the usual VMX EPT (Extended Page Table), called the "Shared EPT" in
+this document, which resides in host VMM memory. The Shared EPT is
+directly managed by the host VMM, the same as with current VMX. Since
+guest TDs usually require I/O and the data exchange needs to be done via
+shared memory, KVM needs to use the current EPT functionality even for
+TDs.
+
+The CPU translates private GPAs using a separate Secure EPT. The Secure
+EPT pages are encrypted and integrity-protected with the TD's ephemeral
+private key. The Secure EPT can be managed _indirectly_ by the host VMM,
+using the TDX interface functions (SEAMCALLs); conceptually the Secure EPT
+is thus a subset of EPT, because not all functionalities are available.
+
+Since the execution of such interface functions takes much longer than
+accessing memory directly, in KVM we use the existing TDP code to mirror
+the Secure EPT for the TD. We think there are at least two options today
+for the timing of executing such SEAMCALLs:
+
+1. synchronous, i.e. while walking the TDP page tables, or
+2. post-walk, i.e. record what needs to be done to the real Secure EPT
+   during the walk, and execute the SEAMCALLs later.
+
+Option 1 seems more intuitive and simpler, but the Secure EPT concurrency
+rules are different from those of the TDP or EPT. For example,
+MEM.SEPT.RD acquires shared access to the whole Secure EPT tree of the
+target TD.
+
+Secure EPT (SEPT) operations
+----------------------------
+The Secure EPT is an Extended Page Table for the GPA-to-HPA translation of
+TD private GPAs.
A Secure EPT is designed to be encrypted with the TD's
+ephemeral private key. SEPT pages are allocated by the host VMM via Intel
+TDX functions, but their content is intended to be hidden and is not
+architectural.
+
+Unlike the conventional EPT, the CPU can't directly read or write Secure
+EPT entries. Instead, the TDX SEAMCALL API is used; several SEAMCALLs
+correspond to operations on EPT entries.
+
+* TDH.MEM.SEPT.ADD():
+
+  Add a Secure EPT page to the Secure EPT tree. This corresponds to
+  updating a non-leaf EPT entry with the present bit set.
+
+* TDH.MEM.SEPT.REMOVE():
+
+  Remove a Secure EPT page from the Secure EPT tree. There is no
+  corresponding EPT operation.
+
+* TDH.MEM.SEPT.RD():
+
+  Read a Secure EPT entry. This corresponds to reading an EPT entry as
+  memory. Please note that this is much slower than a direct memory read.
+
+* TDH.MEM.PAGE.ADD() and TDH.MEM.PAGE.AUG():
+
+  Add a private page to the Secure EPT tree. This corresponds to updating
+  a leaf EPT entry with the present bit set.
+
+* TDH.MEM.PAGE.REMOVE():
+
+  Remove a private page from the Secure EPT tree. There is no
+  corresponding EPT operation.
+
+* TDH.MEM.RANGE.BLOCK():
+
+  This (mostly) corresponds to clearing the present bit of a leaf EPT
+  entry. Note that the private page is still linked in the Secure EPT.
+  To remove it from the Secure EPT, TDH.MEM.SEPT.REMOVE() and
+  TDH.MEM.PAGE.REMOVE() need to be called.
+
+* TDH.MEM.TRACK():
+
+  Increment the TLB epoch counter. This (mostly) corresponds to an EPT
+  TLB flush. Note that the private page is still linked in the Secure
+  EPT; to remove it, TDH.MEM.PAGE.REMOVE() needs to be called.
+
+
+Adding a private page
+---------------------
+The procedure for populating a private page looks as follows.
+
+1. TDH.MEM.SEPT.ADD(512G level)
+2. TDH.MEM.SEPT.ADD(1G level)
+3. TDH.MEM.SEPT.ADD(2M level)
+4. TDH.MEM.PAGE.AUG(4K level)
+
+Those operations correspond to updating the EPT entries.
+
+Dropping a private page and TLB shootdown
+-----------------------------------------
+The procedure for dropping a private page looks as follows (see the sketch
+after this list).
+
+1. TDH.MEM.RANGE.BLOCK(4K level)
+
+   This mostly corresponds to clearing the present bit in the EPT entry.
+   It prevents (blocks) new TLB entries from being created in the future.
+   Note that the private page is still linked in the Secure EPT tree and
+   existing cache entries in the TLB aren't flushed.
+
+2. TDH.MEM.TRACK(range) and TLB shootdown
+
+   This mostly corresponds to the EPT TLB shootdown. Because all vcpus
+   share the same Secure EPT, all vcpus need to flush their TLBs.
+
+   * TDH.MEM.TRACK(range) by one vcpu. It increments the global internal
+     TLB epoch counter.
+   * Send IPIs to the remote vcpus.
+   * The other vcpus exit to the VMM from the guest TD and then re-enter
+     via TDH.VP.ENTER().
+   * TDH.VP.ENTER() checks the TLB epoch counter and, if its TLB is stale,
+     flushes the TLB.
+
+   Note that only a single vcpu issues TDH.MEM.TRACK().
+   Note that the private page is still linked in the Secure EPT tree,
+   unlike the conventional EPT.
+
+3. TDH.MEM.PAGE.PROMOTE(), TDH.MEM.PAGE.DEMOTE(), TDH.MEM.PAGE.RELOCATE(),
+   or TDH.MEM.PAGE.REMOVE()
+
+   There is no corresponding operation in the conventional EPT.
+
+   * When changing the page size (e.g. 4K <-> 2M), TDH.MEM.PAGE.PROMOTE()
+     or TDH.MEM.PAGE.DEMOTE() is used. During those operations, the guest
+     page is kept referenced in the Secure EPT.
+   * When migrating a page, TDH.MEM.PAGE.RELOCATE() is used. It requires
+     both a source page and a destination page.
+   * When destroying the TD, TDH.MEM.PAGE.REMOVE() removes the private
+     page from the Secure EPT tree. In this case a TLB shootdown is not
+     needed because the vcpus don't run any more.
+
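+Expressed as a KVM-side sketch for dropping a single 4K page (the tdh_*
+wrappers and the TDX_PS_4K constant are illustrative names for the
+SEAMCALLs above, not an existing API):
+
+::
+
+  /* 1. block: clear the present bit; no new TLB entries can be created */
+  tdh_mem_range_block(kvm_tdx, gpa, TDX_PS_4K);
+
+  /*
+   * 2. track + kick: bump the TLB epoch, then force vcpus to re-enter so
+   *    that TDH.VP.ENTER() flushes their stale TLBs.
+   */
+  tdh_mem_track(kvm_tdx);
+  kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH);
+
+  /* 3. remove: unlink the page from the Secure EPT tree */
+  tdh_mem_page_remove(kvm_tdx, gpa, TDX_PS_4K);
+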
+The basic idea for TDX support
+==============================
+Because the Shared EPT is the same as the existing EPT, use the existing
+logic for the Shared EPT. On the other hand, the Secure EPT requires
+additional operations instead of directly reading/writing EPT entries.
+
+On an EPT violation, the KVM MMU walks down the EPT tree from the root,
+determines the EPT entry to operate on, and updates the entry. If
+necessary, a TLB shootdown is done. Because it is very slow to walk the
+Secure EPT directly via the TDX SEAMCALL TDH.MEM.SEPT.RD(), a mirror of
+the Secure EPT is created and maintained, and hooks are added to the KVM
+MMU to reuse the existing code.
+
+EPT violation on shared GPA
+---------------------------
+::
+
+  (1) EPT violation on a shared GPA, or zapping a shared GPA
+          walk down the shared EPT tree (the existing code)
+                  |
+                  V
+      shared EPT tree (the CPU refers to it)
+  (2) update the EPT entry (the existing code);
+      TLB shootdown in the case of zapping
+
+
+EPT violation on private GPA
+----------------------------
+::
+
+  (1) EPT violation on a private GPA, or zapping a private GPA
+          walk down the mirror of the secure EPT tree
+          (mostly the same as the existing code)
+                  |
+                  V
+      mirror of the secure EPT tree (KVM MMU software only;
+      reuses the existing code)
+  (2) update the (mirrored) EPT entry (mostly the same as the existing
+      code)
+  (3) call the hooks with the EPT entry change
+                  |
+          NEW: hooks in the KVM MMU
+                  |
+                  V
+      secure EPT root (the CPU refers to it)
+  (4) the TDX backend calls the necessary TDX SEAMCALLs to update the
+      real secure EPT
+
+The major modification is to add hooks to the TDX backend for the
+additional operations, to pass down which EPT (shared EPT or private EPT)
+is being used, and to twist the behavior when operating on the private
+EPT.
+
+The following depicts the relationship::
+
+    KVM                                 |    TDX module
+     |       |                          |       |
+  ---+-------+----------                |       |
+     |       |          |               |       |
+     V       V          V               |       |
+   shared GPA      private GPA          |       |
+  CPU shared EPT   KVM private EPT      |  CPU secure EPT
+     pointer          pointer           |     pointer
+     |                  |               |       |
+     V                  V               |       V
+   shared EPT       private EPT <-------+-mirror-> Secure EPT
+     |                  |               |       |
+     |                  \---------------+---\   |
+     |                                  |   |   |
+     V                                  |   V   V
+   shared guest page                    |  private guest page
+                                        |
+  non-encrypted memory                  |  encrypted memory
+                                        |
+
+shared EPT: the CPU and KVM walk it with a shared GPA.
+            Maintained by the existing code.
+private EPT: KVM walks it with a private GPA.
+             Maintained by the twisted existing code.
+secure EPT: the CPU walks it with a private GPA.
+            Maintained by the TDX module with TDX SEAMCALLs via hooks.
+
+
+Tracking private EPT pages
+==========================
+Shared EPT pages are managed by struct kvm_mmu_page. They are linked in a
+list structure. When necessary, the list is traversed to operate on them.
+Private EPT pages have different characteristics. For example, private
+pages can't be swapped out. When shrinking memory, we'd like to traverse
+only shared EPT pages and skip private EPT pages. Likewise, page
+migration isn't supported for private pages (yet). Introduce an
+additional list so that shared EPT pages and private EPT pages are
+tracked independently.
+
+At the beginning of an EPT violation, the fault handler knows the
+faulting GPA, and thus it knows which EPT to operate on, private or
+shared. If it's the private EPT, an additional task is done, something
+like "if (private) { call a hook }".
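+
+A minimal sketch of that dispatch, assuming the kvm.arch.gfn_shared_mask
+field introduced above (the helper name is illustrative):
+
+::
+
+  static inline bool kvm_is_private_gpa(struct kvm *kvm, gpa_t gpa)
+  {
+          /* Bit 51 or 47 of the GPA for TDX; 0 for non-TDX VMs. */
+          gfn_t mask = kvm->arch.gfn_shared_mask;
+
+          return mask && !(gpa_to_gfn(gpa) & mask);
+  }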
+Since the fault handler makes deep function calls, it's cumbersome to
+pass around the information of which EPT is being operated on. Options to
+mitigate this are:
+
+1. Pass the information as an argument through the function calls.
+2. Record the information in struct kvm_mmu_page somehow.
+3. Record the information in the vcpu structure.
+
+Option 2 was chosen, because option 1 would require modifying all the
+functions, which would badly affect the normal case. Option 3 doesn't
+work well because in some cases we need to walk both the private and the
+shared EPT.
+
+The role of the EPT page can be utilized: one bit can be carved out from
+the unused bits in struct kvm_mmu_page_role. When allocating an EPT page,
+initialize the information. In most cases struct kvm_mmu_page is
+available because we're operating on EPT pages.
+
+
+The conversion of private GPA and shared GPA
+============================================
+A page at a given GPA can be assigned to either the private GPA or the
+shared GPA, exclusively, at one time. The page can't be accessed
+simultaneously via both the private GPA and the shared GPA. On guest
+startup, all GPAs are assigned as private. The guest converts a range of
+GPAs from private (or shared) to shared (or private) via the MapGPA
+hypercall, which takes the start GPA and the size of the region. If the
+given start GPA is shared, the VMM converts the region into shared (if
+it's already shared, it's a nop). If the start GPA is private, the VMM
+converts the region into private. This implies that the guest won't
+access the unmapped region, i.e. the private (or shared) region, after
+converting it to shared (or private).
+
+If the guest TD triggers an EPT violation on an already-converted region,
+the access won't be allowed (it loops in the EPT violation) until another
+vcpu converts the region back.
+
+The KVM MMU records which GPA is allowed to be accessed, private or
+shared, by stealing a software-usable bit from the MMU present mask:
+SPTE_SHARED_MASK. The bit is recorded in both the shared EPT and the
+mirror of the secure EPT.
+
+* If SPTE_SHARED_MASK is cleared in the shared EPT and the mirror of the
+  secure EPT: the private GPA is allowed, the shared GPA is not allowed.
+
+* If SPTE_SHARED_MASK is set in the shared EPT and the mirror of the
+  secure EPT: the private GPA is not allowed, the shared GPA is allowed.
+
+The default is that SPTE_SHARED_MASK is cleared, so that the existing KVM
+MMU code (mostly) works.
+
+The reason the bit is recorded in both the shared and the private EPT is
+to optimize the EPT violation path by penalizing the MapGPA hypercall
+instead.
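+
+A sketch of the resulting permission check on an EPT violation (the
+helper name is illustrative; SPTE_SHARED_MASK is the software bit
+described above):
+
+::
+
+  static bool tdp_fault_allowed(u64 spte, bool private_fault)
+  {
+          bool shared_allowed = spte & SPTE_SHARED_MASK;
+
+          /* Private faults are allowed only while the bit is cleared. */
+          return private_fault ? !shared_allowed : shared_allowed;
+  }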
+
+The state machine of EPT entries
+--------------------------------
+(private EPT entry, shared EPT entry) =
+
+  (non-present, non-present):
+      private mapping is allowed
+  (present, non-present):
+      private mapping is mapped
+  (non-present | SPTE_SHARED_MASK, non-present | SPTE_SHARED_MASK):
+      shared mapping is allowed
+  (non-present | SPTE_SHARED_MASK, present | SPTE_SHARED_MASK):
+      shared mapping is mapped
+  (present | SPTE_SHARED_MASK, any):
+      invalid combination
+
+* map_gpa(private GPA): mark the region as allowing private GPAs (NEW)
+
+  private EPT entry: clear SPTE_SHARED_MASK
+
+    present: nop
+    non-present: nop
+    non-present | SPTE_SHARED_MASK -> non-present (clear SPTE_SHARED_MASK)
+
+  shared EPT entry: zap the entry, clear SPTE_SHARED_MASK
+
+    present: invalid
+    non-present -> non-present: nop
+    present | SPTE_SHARED_MASK -> non-present
+    non-present | SPTE_SHARED_MASK -> non-present
+
+* map_gpa(shared GPA): mark the region as allowing shared GPAs (NEW)
+
+  private EPT entry: zap and set SPTE_SHARED_MASK
+
+    present -> non-present | SPTE_SHARED_MASK
+    non-present -> non-present | SPTE_SHARED_MASK
+    non-present | SPTE_SHARED_MASK: nop
+
+  shared EPT entry: set SPTE_SHARED_MASK
+
+    present: invalid
+    non-present -> non-present | SPTE_SHARED_MASK
+    present | SPTE_SHARED_MASK -> present | SPTE_SHARED_MASK: nop
+    non-present | SPTE_SHARED_MASK -> non-present | SPTE_SHARED_MASK: nop
+
+* map(private GPA)
+
+  private EPT entry:
+
+    present: nop
+    non-present -> present
+    non-present | SPTE_SHARED_MASK: nop, looping on EPT violation (NEW)
+
+  shared EPT entry: nop
+
+* map(shared GPA)
+
+  private EPT entry: nop
+
+  shared EPT entry:
+
+    present: invalid
+    present | SPTE_SHARED_MASK: nop
+    non-present | SPTE_SHARED_MASK -> present | SPTE_SHARED_MASK
+    non-present: nop, looping on EPT violation (NEW)
+
+* zap(private GPA)
+
+  private EPT entry: zap the entry while keeping SPTE_SHARED_MASK
+
+    present -> non-present
+    present | SPTE_SHARED_MASK: invalid
+    non-present: nop, as is_shadow_present_pte() is checked
+    non-present | SPTE_SHARED_MASK: nop, as is_shadow_present_pte() is
+    checked
+
+  shared EPT entry: nop
+
+* zap(shared GPA)
+
+  private EPT entry: nop
+
+  shared EPT entry: zap
+
+    any -> non-present
+    present: invalid
+    present | SPTE_SHARED_MASK -> non-present | SPTE_SHARED_MASK
+    non-present: nop, as is_shadow_present_pte() is checked
+    non-present | SPTE_SHARED_MASK: nop, as is_shadow_present_pte() is
+    checked
+
+
+The original TDP MMU and race conditions
+========================================
+Because vcpus share the EPT, once an EPT entry is zapped, we need to
+shoot down the TLBs: send IPIs to the remote vcpus, and the remote vcpus
+flush their own TLBs. Until the TLB shootdown is done, vcpus may still
+reference the zapped guest page.
+
+The TDP MMU uses the read lock of mmu_lock to mitigate vcpu contention.
+When the read lock is held, it relies on atomic updates of EPT entries.
+(The legacy MMU, on the other hand, uses the write lock.) When a vcpu is
+populating/zapping an EPT entry with the read lock held, another vcpu may
+be populating or zapping the same EPT entry at the same time.
+
+To avoid this race condition, the entry is frozen: the EPT entry is set
+to the special value REMOVED_SPTE, which clears the present bit. Then,
+after the TLB shootdown, the EPT entry is updated to its final value.
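+
+A sketch of the freezing step (REMOVED_SPTE and RET_PF_RETRY are the
+existing TDP MMU names; the surrounding code is illustrative):
+
+::
+
+  u64 old_spte = READ_ONCE(*sptep);
+
+  /* Freeze the entry: atomically replace it with REMOVED_SPTE. */
+  if (cmpxchg64(sptep, old_spte, REMOVED_SPTE) != old_spte)
+          return RET_PF_RETRY;  /* another vcpu raced; restart the fault */
+
+  /* ... TLB shootdown (and SEAMCALLs for TDX) happen here ... */
+
+  /* Unfreeze: publish the final value. */
+  WRITE_ONCE(*sptep, new_spte);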
+
+Concurrent zapping
+------------------
+1. read lock
+2. freeze the EPT entry (atomically set the value to REMOVED_SPTE)
+   If another vcpu froze the entry, restart the page fault.
+3. TLB shootdown
+
+   * send IPIs to remote vcpus
+   * TLB flush (local and remote)
+
+   For each entry update, a TLB shootdown is needed because of the
+   concurrency.
+4. atomically set the EPT entry to the final value
+5. read unlock
+
+Concurrent populating
+---------------------
+In the case of populating a non-present EPT entry, update the EPT entry
+atomically.
+
+1. read lock
+2. atomically update the EPT entry
+   If another vcpu froze or updated the entry, restart the page fault.
+3. read unlock
+
+In the case of updating a present EPT entry (e.g. page migration), the
+operation is split into two: zapping the entry and populating the entry.
+
+1. read lock
+2. zap the EPT entry, following the concurrent-zapping case
+3. populate the now non-present EPT entry
+4. read unlock
+
+Non-concurrent batched zapping
+------------------------------
+In some cases, zapping ranges is done exclusively with the write lock
+held. In this case, the TLB shootdown is batched into one.
+
+1. write lock
+2. zap the EPT entries while traversing them
+3. TLB shootdown
+4. write unlock
+
+
+For the Secure EPT, TDX SEAMCALLs are needed in addition to updating the
+mirrored EPT entry.
+
+TDX concurrent zapping
+----------------------
+Add a hook for the TDX SEAMCALLs at the TLB shootdown step.
+
+1. read lock
+2. freeze the EPT entry (set the value to REMOVED_SPTE)
+3. TLB shootdown via a hook
+
+   * TDH.MEM.RANGE.BLOCK()
+   * TDH.MEM.TRACK()
+   * send IPIs to remote vcpus
+4. set the EPT entry to the final value
+5. read unlock
+
+TDX concurrent populating
+-------------------------
+TDX SEAMCALLs are required in addition to operating on the mirrored EPT
+entry. The frozen entry is utilized, following the zapping case, to
+avoid the race condition. A hook can be added.
+
+1. read lock
+2. freeze the EPT entry
+3. hook
+
+   * TDH.MEM.SEPT.ADD() for a non-leaf entry, or TDH.MEM.PAGE.AUG() for a
+     leaf entry
+4. set the EPT entry to the final value
+5. read unlock
+
+Without freezing the entry, the following race can happen. Suppose two
+vcpus are faulting on the same GPA and the 2M and 4K level entries aren't
+populated yet:
+
+* vcpu 1: updates the 2M level EPT entry
+* vcpu 2: updates the 4K level EPT entry
+* vcpu 2: TDX SEAMCALL to update the 4K secure EPT entry => error
+* vcpu 1: TDX SEAMCALL to update the 2M secure EPT entry
+
+
+TDX non-concurrent batched zapping
+----------------------------------
+For simplicity, the procedure of concurrent zapping is utilized. The
+procedure can be optimized later.
+
+
+Co-existing with unmapping guest private memory
+===============================================
+TODO. This needs to be addressed.
+
+
+Restrictions or future work
+===========================
+The following features aren't supported yet:
+
+* optimizing non-concurrent zap
+* Large page
+* Page migration
-- 
2.25.1

From nobody Wed Dec 17 07:24:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini,
    erdemaktas@google.com, Sean Christopherson, Sagi Shahar
Subject: [RFC PATCH v6 104/104] [MARKER] the end of (the first phase of) TDX KVM patch series
Date: Thu, 5 May 2022 11:15:38 -0700
Message-Id: <054442f5c4e7b2ef5da7fb3cdbde3e3cbdd9aeb1.1651774251.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

This empty commit marks the end of (the first phase of) the patch series
for TDX KVM support.

Signed-off-by: Isaku Yamahata
---
 .../virt/kvm/intel-tdx-layer-status.rst | 33 ------------------
 1 file changed, 33 deletions(-)
 delete mode 100644 Documentation/virt/kvm/intel-tdx-layer-status.rst

diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentation/virt/kvm/intel-tdx-layer-status.rst
deleted file mode 100644
index 1cec14213f69..000000000000
--- a/Documentation/virt/kvm/intel-tdx-layer-status.rst
+++ /dev/null
@@ -1,33 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-===================================
-Intel Trust Domain Extensions (TDX)
-===================================
-
-Layer status
-============
-What qemu can do
-----------------
-- TDX VM TYPE is exposed to Qemu.
-- Qemu can create/destroy guest of TDX vm type.
-- Qemu can create/destroy vcpu of TDX vm type.
-- Qemu can populate initial guest memory image.
-- Qemu can finalize guest TD.
-- Qemu can start to run vcpu. But vcpu can not make progress yet.
-
-Patch Layer status
-------------------
-  Patch layer                          Status
-* TDX, VMX coexistence:                Applied
-* TDX architectural definitions:       Applied
-* TD VM creation/destruction:          Applied
-* TD vcpu creation/destruction:        Applied
-* TDX EPT violation:                   Applied
-* TD finalization:                     Applied
-* TD vcpu enter/exit:                  Applied
-* TD vcpu interrupts/exit/hypercall:   Not yet
-
-* KVM MMU GPA shared bits:             Applied
-* KVM TDP refactoring for TDX:         Applied
-* KVM TDP MMU hooks:                   Applied
-* KVM TDP MMU MapGPA:                  Applied
-- 
2.25.1