From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini,
    erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack,
    Kai Huang, Zhi Wang
Subject: [PATCH v12 001/106] [MARKER] The start of TDX KVM patch series: TDX architectural definitions
Date: Mon, 27 Feb 2023 00:22:00 -0800
Message-Id: <93282e130ec3e1cd3d30195e01cdec2a52bcb371.1677484918.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

This empty commit marks the start of the patch sub-series for the TDX
architectural definitions.

Signed-off-by: Isaku Yamahata
---
 .../virt/kvm/intel-tdx-layer-status.rst | 28 +++++++++++++++++++
 1 file changed, 28 insertions(+)
 create mode 100644 Documentation/virt/kvm/intel-tdx-layer-status.rst

diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentation/virt/kvm/intel-tdx-layer-status.rst
new file mode 100644
index 000000000000..db32e89e16e9
--- /dev/null
+++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst
@@ -0,0 +1,28 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
+Intel Trust Domain Extensions (TDX)
+===================================
+
+Layer status
+============
+What qemu can do
+----------------
+- TDX VM TYPE is exposed to Qemu.
+- Qemu can try to create a VM of TDX VM type; the creation then fails.
+
+Patch Layer status
+------------------
+  Patch layer                          Status
+* TDX, VMX coexistence:                Applied
+* TDX architectural definitions:       Applying
+* TD VM creation/destruction:          Not yet
+* TD vcpu creation/destruction:        Not yet
+* TDX EPT violation:                   Not yet
+* TD finalization:                     Not yet
+* TD vcpu enter/exit:                  Not yet
+* TD vcpu interrupts/exit/hypercall:   Not yet
+
+* KVM MMU GPA shared bits:             Not yet
+* KVM TDP refactoring for TDX:         Not yet
+* KVM TDP MMU hooks:                   Not yet
--
2.25.1
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini,
    erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack,
    Kai Huang, Zhi Wang
Subject: [PATCH v12 002/106] KVM: TDX: Define TDX architectural definitions
Date: Mon, 27 Feb 2023 00:22:01 -0800

From: Isaku Yamahata

Define the architectural definitions that KVM needs to issue the TDX
SEAMCALLs.  The structures and values below are architecturally defined
in the ABI Reference chapter of the TDX module specification.

Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/tdx_arch.h | 168 ++++++++++++++++++++++++++++++++++++
 1 file changed, 168 insertions(+)
 create mode 100644 arch/x86/kvm/vmx/tdx_arch.h

diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h
new file mode 100644
index 000000000000..942a0e561a7b
--- /dev/null
+++ b/arch/x86/kvm/vmx/tdx_arch.h
@@ -0,0 +1,168 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* architectural constants/data definitions for TDX SEAMCALLs */
+
+#ifndef __KVM_X86_TDX_ARCH_H
+#define __KVM_X86_TDX_ARCH_H
+
+#include <linux/types.h>
+
+/*
+ * TDX SEAMCALL API function leaves
+ */
+#define TDH_VP_ENTER			0
+#define TDH_MNG_ADDCX			1
+#define TDH_MEM_PAGE_ADD		2
+#define TDH_MEM_SEPT_ADD		3
+#define TDH_VP_ADDCX			4
+#define TDH_MEM_PAGE_RELOCATE		5
+#define TDH_MEM_PAGE_AUG		6
+#define TDH_MEM_RANGE_BLOCK		7
+#define TDH_MNG_KEY_CONFIG		8
+#define TDH_MNG_CREATE			9
+#define TDH_VP_CREATE			10
+#define TDH_MNG_RD			11
+#define TDH_MR_EXTEND			16
+#define TDH_MR_FINALIZE			17
+#define TDH_VP_FLUSH			18
+#define TDH_MNG_VPFLUSHDONE		19
+#define TDH_MNG_KEY_FREEID		20
+#define TDH_MNG_INIT			21
+#define TDH_VP_INIT			22
+#define TDH_VP_RD			26
+#define TDH_MNG_KEY_RECLAIMID		27
+#define TDH_PHYMEM_PAGE_RECLAIM		28
+#define TDH_MEM_PAGE_REMOVE		29
+#define TDH_MEM_SEPT_REMOVE		30
+#define TDH_MEM_TRACK			38
+#define TDH_MEM_RANGE_UNBLOCK		39
+#define TDH_PHYMEM_CACHE_WB		40
+#define TDH_PHYMEM_PAGE_WBINVD		41
+#define TDH_VP_WR			43
+#define TDH_SYS_LP_SHUTDOWN		44
+
+#define TDG_VP_VMCALL_GET_TD_VM_CALL_INFO		0x10000
+#define TDG_VP_VMCALL_MAP_GPA				0x10001
+#define TDG_VP_VMCALL_GET_QUOTE				0x10002
+#define TDG_VP_VMCALL_REPORT_FATAL_ERROR		0x10003
+#define TDG_VP_VMCALL_SETUP_EVENT_NOTIFY_INTERRUPT	0x10004
+
+/* TDX control structure (TDR/TDCS/TDVPS) field access codes */
+#define TDX_NON_ARCH			BIT_ULL(63)
+#define TDX_CLASS_SHIFT			56
+#define TDX_FIELD_MASK			GENMASK_ULL(31, 0)
+
+#define __BUILD_TDX_FIELD(non_arch, class, field)	\
+	(((non_arch) ? TDX_NON_ARCH : 0) |		\
+	 ((u64)(class) << TDX_CLASS_SHIFT) |		\
+	 ((u64)(field) & TDX_FIELD_MASK))
+
+#define BUILD_TDX_FIELD(class, field)			\
+	__BUILD_TDX_FIELD(false, (class), (field))
+
+#define BUILD_TDX_FIELD_NON_ARCH(class, field)		\
+	__BUILD_TDX_FIELD(true, (class), (field))
+
+
+/* Class code for TD */
+#define TD_CLASS_EXECUTION_CONTROLS	17ULL
+
+/* Class code for TDVPS */
+#define TDVPS_CLASS_VMCS		0ULL
+#define TDVPS_CLASS_GUEST_GPR		16ULL
+#define TDVPS_CLASS_OTHER_GUEST		17ULL
+#define TDVPS_CLASS_MANAGEMENT		32ULL
+
+enum tdx_tdcs_execution_control {
+	TD_TDCS_EXEC_TSC_OFFSET = 10,
+};
+
+/* @field is any of enum tdx_tdcs_execution_control */
+#define TDCS_EXEC(field)	BUILD_TDX_FIELD(TD_CLASS_EXECUTION_CONTROLS, (field))
+
+/* @field is the VMCS field encoding */
+#define TDVPS_VMCS(field)	BUILD_TDX_FIELD(TDVPS_CLASS_VMCS, (field))
+
+enum tdx_vcpu_guest_other_state {
+	TD_VCPU_STATE_DETAILS_NON_ARCH = 0x100,
+};
+
+union tdx_vcpu_state_details {
+	struct {
+		u64 vmxip	: 1;
+		u64 reserved	: 63;
+	};
+	u64 full;
+};
+
+/* @field is any of enum tdx_guest_other_state */
+#define TDVPS_STATE(field)	BUILD_TDX_FIELD(TDVPS_CLASS_OTHER_GUEST, (field))
+#define TDVPS_STATE_NON_ARCH(field)	BUILD_TDX_FIELD_NON_ARCH(TDVPS_CLASS_OTHER_GUEST, (field))
+
+/* Management class fields */
+enum tdx_vcpu_guest_management {
+	TD_VCPU_PEND_NMI = 11,
+};
+
+/* @field is any of enum tdx_vcpu_guest_management */
+#define TDVPS_MANAGEMENT(field)	BUILD_TDX_FIELD(TDVPS_CLASS_MANAGEMENT, (field))
+
+#define TDX_EXTENDMR_CHUNKSIZE	256
+
+struct tdx_cpuid_value {
+	u32 eax;
+	u32 ebx;
+	u32 ecx;
+	u32 edx;
+} __packed;
+
+#define TDX_TD_ATTRIBUTE_DEBUG		BIT_ULL(0)
+#define TDX_TD_ATTRIBUTE_PKS		BIT_ULL(30)
+#define TDX_TD_ATTRIBUTE_KL		BIT_ULL(31)
+#define TDX_TD_ATTRIBUTE_PERFMON	BIT_ULL(63)
+
+/*
+ * TD_PARAMS is provided as an input to TDH_MNG_INIT, the size of which is 1024B.
+ */
+#define TDX_MAX_VCPUS	(~(u16)0)
+
+struct td_params {
+	u64 attributes;
+	u64 xfam;
+	u16 max_vcpus;
+	u8 reserved0[6];
+
+	u64 eptp_controls;
+	u64 exec_controls;
+	u16 tsc_frequency;
+	u8 reserved1[38];
+
+	u64 mrconfigid[6];
+	u64 mrowner[6];
+	u64 mrownerconfig[6];
+	u64 reserved2[4];
+
+	union {
+		struct tdx_cpuid_value cpuid_values[0];
+		u8 reserved3[768];
+	};
+} __packed __aligned(1024);
+
+/*
+ * Guest uses MAX_PA for GPAW when set.
+ * 0: GPA.SHARED bit is GPA[47]
+ * 1: GPA.SHARED bit is GPA[51]
+ */
+#define TDX_EXEC_CONTROL_MAX_GPAW	BIT_ULL(0)
+
+/*
+ * TDX requires the frequency to be defined in units of 25MHz, which is the
+ * frequency of the core crystal clock on TDX-capable platforms, i.e. the TDX
+ * module can only program frequencies that are multiples of 25MHz.  The
+ * frequency must be between 100MHz and 10GHz (inclusive).
+ */
+#define TDX_TSC_KHZ_TO_25MHZ(tsc_in_khz)	((tsc_in_khz) / (25 * 1000))
+#define TDX_TSC_25MHZ_TO_KHZ(tsc_in_25mhz)	((tsc_in_25mhz) * (25 * 1000))
+#define TDX_MIN_TSC_FREQUENCY_KHZ		(100 * 1000)
+#define TDX_MAX_TSC_FREQUENCY_KHZ		(10 * 1000 * 1000)
+
+#endif /* __KVM_X86_TDX_ARCH_H */
--
2.25.1

From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini,
    erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack,
    Kai Huang, Zhi Wang
Subject: [PATCH v12 003/106] KVM: TDX: Add TDX "architectural" error codes
Date: Mon, 27 Feb 2023 00:22:02 -0800
Message-Id: <2808e1a3f65d7d91a21f7073ae67bae066816a2e.1677484918.git.isaku.yamahata@intel.com>

From: Sean Christopherson

Add error codes for the TDX SEAMCALLs, both for the TDX VMM side (TDH
SEAMCALLs) and for the TDX guest side (TDG.VP.VMCALL).  KVM issues the
TDX SEAMCALLs and checks their error codes.  KVM also handles hypercalls
from the TDX guest and may return an error, so error codes for the TDX
guest are needed as well.

TDX SEAMCALL uses bits 31:0 to return additional information, so these
error codes only exactly match RAX[63:32].
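Concretely, a caller comparing a raw SEAMCALL return value against these
codes has to mask off the low detail bits first.  A minimal sketch using
the TDX_SEAMCALL_STATUS_MASK defined below (the helper name is
illustrative, not part of this patch):

    /*
     * Hypothetical helper: reduce a raw SEAMCALL return value (RAX) to
     * its architectural status code by dropping the detail in bits 31:0.
     */
    static inline u64 tdx_seamcall_status(u64 rax)
    {
    	return rax & TDX_SEAMCALL_STATUS_MASK;	/* keep RAX[63:32] */
    }

    /*
     * e.g. tdx_seamcall_status(err) == TDX_OPERAND_BUSY still matches
     * when the TDX module reports an operand ID in bits 31:0.
     */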
Error codes for TDG.VP.VMCALL are defined by the TDX Guest-Host-Communication
Interface specification.

Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/tdx_errno.h | 38 ++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)
 create mode 100644 arch/x86/kvm/vmx/tdx_errno.h

diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h
new file mode 100644
index 000000000000..389b1b53da25
--- /dev/null
+++ b/arch/x86/kvm/vmx/tdx_errno.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* architectural status code for SEAMCALL */
+
+#ifndef __KVM_X86_TDX_ERRNO_H
+#define __KVM_X86_TDX_ERRNO_H
+
+#define TDX_SEAMCALL_STATUS_MASK		0xFFFFFFFF00000000ULL
+
+/*
+ * TDX SEAMCALL Status Codes (returned in RAX)
+ */
+#define TDX_SUCCESS				0x0000000000000000ULL
+#define TDX_NON_RECOVERABLE_VCPU		0x4000000100000000ULL
+#define TDX_INTERRUPTED_RESUMABLE		0x8000000300000000ULL
+#define TDX_OPERAND_BUSY			0x8000020000000000ULL
+#define TDX_VCPU_NOT_ASSOCIATED			0x8000070200000000ULL
+#define TDX_KEY_GENERATION_FAILED		0x8000080000000000ULL
+#define TDX_KEY_STATE_INCORRECT			0xC000081100000000ULL
+#define TDX_KEY_CONFIGURED			0x0000081500000000ULL
+#define TDX_NO_HKID_READY_TO_WBCACHE		0x0000082100000000ULL
+#define TDX_EPT_WALK_FAILED			0xC0000B0000000000ULL
+
+/*
+ * TDG.VP.VMCALL Status Codes (returned in R10)
+ */
+#define TDG_VP_VMCALL_SUCCESS			0x0000000000000000ULL
+#define TDG_VP_VMCALL_RETRY			0x0000000000000001ULL
+#define TDG_VP_VMCALL_INVALID_OPERAND		0x8000000000000000ULL
+#define TDG_VP_VMCALL_TDREPORT_FAILED		0x8000000000000001ULL
+
+/*
+ * TDX module operand ID, appears in 31:0 part of error code as
+ * detail information
+ */
+#define TDX_OPERAND_ID_RCX			0x01
+#define TDX_OPERAND_ID_SEPT			0x92
+
+#endif /* __KVM_X86_TDX_ERRNO_H */
--
2.25.1
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini,
    erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack,
    Kai Huang, Zhi Wang
Subject: [PATCH v12 004/106] KVM: TDX: Add C wrapper functions for SEAMCALLs to the TDX module
Date: Mon, 27 Feb 2023 00:22:03 -0800

From: Isaku Yamahata

A VMM interacts with the TDX module using a new instruction (SEAMCALL).
A TDX VMM uses SEAMCALLs where a VMX VMM would have directly interacted
with VMX instructions.  For instance, a TDX VMM does not have full access
to the VM control structure corresponding to the VMX VMCS.  Instead, the
VMM induces the TDX module to act on its behalf via SEAMCALLs.

Export __seamcall and define C wrapper functions for the SEAMCALLs for
readability.

Some SEAMCALL APIs donate host pages to the TDX module or to a guest TD,
and the donated pages are encrypted.  Some such SEAMCALLs flush cache
lines (typically via the movdir64b instruction), some don't.  For those
that don't, the VMM must flush the cache lines itself to avoid cache
line aliasing.

Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
---
 arch/x86/include/asm/tdx.h       |   2 +
 arch/x86/kvm/vmx/tdx_ops.h       | 185 +++++++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/seamcall.S |   2 +
 3 files changed, 189 insertions(+)
 create mode 100644 arch/x86/kvm/vmx/tdx_ops.h

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 2b2efaa4bc0e..9c61d247c425 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -108,6 +108,8 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
 bool platform_tdx_enabled(void);
 int tdx_enable(void);
 int tdx_cpu_online(unsigned int cpu);
+u64 __seamcall(u64 op, u64 rcx, u64 rdx, u64 r8, u64 r9,
+	       struct tdx_module_output *out);
 #else	/* !CONFIG_INTEL_TDX_HOST */
 static inline bool platform_tdx_enabled(void) { return false; }
 static inline int tdx_enable(void)  { return -EINVAL; }
diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h
new file mode 100644
index 000000000000..85adbf49c277
--- /dev/null
+++ b/arch/x86/kvm/vmx/tdx_ops.h
@@ -0,0 +1,185 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* constants/data definitions for TDX SEAMCALLs */
+
+#ifndef __KVM_X86_TDX_OPS_H
+#define __KVM_X86_TDX_OPS_H
+
+#include <linux/compiler.h>
+
+#include <asm/cacheflush.h>
+#include <asm/asm.h>
+#include <asm/kvm_host.h>
+
+#include "tdx_errno.h"
+#include "tdx_arch.h"
+
+#ifdef CONFIG_INTEL_TDX_HOST
+
+static inline u64 tdh_mng_addcx(hpa_t tdr, hpa_t addr)
+{
+	clflush_cache_range(__va(addr), PAGE_SIZE);
+	return __seamcall(TDH_MNG_ADDCX, addr, tdr, 0, 0, NULL);
+}
+
+static inline u64 tdh_mem_page_add(hpa_t tdr, gpa_t gpa, hpa_t hpa, hpa_t source,
+				   struct tdx_module_output *out)
+{
+	clflush_cache_range(__va(hpa), PAGE_SIZE);
+	return __seamcall(TDH_MEM_PAGE_ADD, gpa, tdr, hpa, source, out);
+}
+
+static inline u64 tdh_mem_sept_add(hpa_t tdr, gpa_t gpa, int level, hpa_t page,
+				   struct tdx_module_output *out)
+{
+	clflush_cache_range(__va(page), PAGE_SIZE);
+	return __seamcall(TDH_MEM_SEPT_ADD, gpa | level, tdr, page, 0, out);
+}
+
+static inline u64 tdh_mem_sept_remove(hpa_t tdr, gpa_t gpa, int level,
+				      struct tdx_module_output *out)
+{
+	return __seamcall(TDH_MEM_SEPT_REMOVE, gpa | level, tdr, 0, 0, out);
+}
+
+static inline u64 tdh_vp_addcx(hpa_t tdvpr, hpa_t addr)
+{
+	clflush_cache_range(__va(addr), PAGE_SIZE);
+	return __seamcall(TDH_VP_ADDCX, addr, tdvpr, 0, 0, NULL);
+}
+
+static inline u64 tdh_mem_page_relocate(hpa_t tdr, gpa_t gpa, hpa_t hpa,
+					struct tdx_module_output *out)
+{
+	clflush_cache_range(__va(hpa), PAGE_SIZE);
+	return __seamcall(TDH_MEM_PAGE_RELOCATE, gpa, tdr, hpa, 0, out);
+}
+
+static inline u64 tdh_mem_page_aug(hpa_t tdr, gpa_t gpa, hpa_t hpa,
+				   struct tdx_module_output *out)
+{
+	clflush_cache_range(__va(hpa), PAGE_SIZE);
+	return __seamcall(TDH_MEM_PAGE_AUG, gpa, tdr, hpa, 0, out);
+}
+
+static inline u64 tdh_mem_range_block(hpa_t tdr, gpa_t gpa, int level,
+				      struct tdx_module_output *out)
+{
+	return __seamcall(TDH_MEM_RANGE_BLOCK, gpa | level, tdr, 0, 0, out);
+}
+
+static inline u64 tdh_mng_key_config(hpa_t tdr)
+{
+	return __seamcall(TDH_MNG_KEY_CONFIG, tdr, 0, 0, 0, NULL);
+}
+
+static inline u64 tdh_mng_create(hpa_t tdr, int hkid)
+{
+	clflush_cache_range(__va(tdr), PAGE_SIZE);
+	return __seamcall(TDH_MNG_CREATE, tdr, hkid, 0, 0, NULL);
+}
+
+static inline u64 tdh_vp_create(hpa_t tdr, hpa_t tdvpr)
+{
+	clflush_cache_range(__va(tdvpr), PAGE_SIZE);
+	return __seamcall(TDH_VP_CREATE, tdvpr, tdr, 0, 0, NULL);
+}
+
+static inline u64 tdh_mng_rd(hpa_t tdr, u64 field, struct tdx_module_output *out)
+{
+	return __seamcall(TDH_MNG_RD, tdr, field, 0, 0, out);
+}
+
+static inline u64 tdh_mr_extend(hpa_t tdr, gpa_t gpa,
+				struct tdx_module_output *out)
+{
+	return __seamcall(TDH_MR_EXTEND, gpa, tdr, 0, 0, out);
+}
+
+static inline u64 tdh_mr_finalize(hpa_t tdr)
+{
+	return __seamcall(TDH_MR_FINALIZE, tdr, 0, 0, 0, NULL);
+}
+
+static inline u64 tdh_vp_flush(hpa_t tdvpr)
+{
+	return __seamcall(TDH_VP_FLUSH, tdvpr, 0, 0, 0, NULL);
+}
+
+static inline u64 tdh_mng_vpflushdone(hpa_t tdr)
+{
+	return __seamcall(TDH_MNG_VPFLUSHDONE, tdr, 0, 0, 0, NULL);
+}
+
+static inline u64 tdh_mng_key_freeid(hpa_t tdr)
+{
+	return __seamcall(TDH_MNG_KEY_FREEID, tdr, 0, 0, 0, NULL);
+}
+
+static inline u64 tdh_mng_init(hpa_t tdr, hpa_t td_params,
+			       struct tdx_module_output *out)
+{
+	return __seamcall(TDH_MNG_INIT, tdr, td_params, 0, 0, out);
+}
+
+static inline u64 tdh_vp_init(hpa_t tdvpr, u64 rcx)
+{
+	return __seamcall(TDH_VP_INIT, tdvpr, rcx, 0, 0, NULL);
+}
+
+static inline u64 tdh_vp_rd(hpa_t tdvpr, u64 field,
+			    struct tdx_module_output *out)
+{
+	return __seamcall(TDH_VP_RD, tdvpr, field, 0, 0, out);
+}
+
+static inline u64 tdh_mng_key_reclaimid(hpa_t tdr)
+{
+	return __seamcall(TDH_MNG_KEY_RECLAIMID, tdr, 0, 0, 0, NULL);
+}
+
+static inline u64 tdh_phymem_page_reclaim(hpa_t page,
+					  struct tdx_module_output *out)
+{
+	return __seamcall(TDH_PHYMEM_PAGE_RECLAIM, page, 0, 0, 0, out);
+}
+
+static inline u64 tdh_mem_page_remove(hpa_t tdr, gpa_t gpa, int level,
+				      struct tdx_module_output *out)
+{
+	return __seamcall(TDH_MEM_PAGE_REMOVE, gpa | level, tdr, 0, 0, out);
+}
+
+static inline u64 tdh_sys_lp_shutdown(void)
+{
+	return __seamcall(TDH_SYS_LP_SHUTDOWN, 0, 0, 0, 0, NULL);
+}
+
+static inline u64 tdh_mem_track(hpa_t tdr)
+{
+	return __seamcall(TDH_MEM_TRACK, tdr, 0, 0, 0, NULL);
+}
+
+static inline u64 tdh_mem_range_unblock(hpa_t tdr, gpa_t gpa, int level,
+					struct tdx_module_output *out)
+{
+	return __seamcall(TDH_MEM_RANGE_UNBLOCK, gpa | level, tdr, 0, 0, out);
+}
+
+static inline u64 tdh_phymem_cache_wb(bool resume)
+{
+	return __seamcall(TDH_PHYMEM_CACHE_WB, resume ? 1 : 0, 0, 0, 0, NULL);
+}
+
+static inline u64 tdh_phymem_page_wbinvd(hpa_t page)
+{
+	return __seamcall(TDH_PHYMEM_PAGE_WBINVD, page, 0, 0, 0, NULL);
+}
+
+static inline u64 tdh_vp_wr(hpa_t tdvpr, u64 field, u64 val, u64 mask,
+			    struct tdx_module_output *out)
+{
+	return __seamcall(TDH_VP_WR, tdvpr, field, val, mask, out);
+}
+#endif /* CONFIG_INTEL_TDX_HOST */
+
+#endif /* __KVM_X86_TDX_OPS_H */
diff --git a/arch/x86/virt/vmx/tdx/seamcall.S b/arch/x86/virt/vmx/tdx/seamcall.S
index f81be6b9c133..b90a7fe05494 100644
--- a/arch/x86/virt/vmx/tdx/seamcall.S
+++ b/arch/x86/virt/vmx/tdx/seamcall.S
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #include <linux/linkage.h>
+#include <asm/export.h>
 #include <asm/frame.h>
 
 #include "tdxcall.S"
@@ -50,3 +51,4 @@ SYM_FUNC_START(__seamcall)
 	FRAME_END
 	RET
 SYM_FUNC_END(__seamcall)
+EXPORT_SYMBOL_GPL(__seamcall)
--
2.25.1
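Taken together with the error codes from patch 003, the intended usage
pattern of these wrappers looks roughly like the following hypothetical
caller (illustrative only, not part of the series; only the busy-retry
case is handled):

    /* Hypothetical snippet: configure the HKID on a TD's TDR page. */
    static u64 example_config_td_key(hpa_t tdr)
    {
    	u64 err;

    	do {
    		err = tdh_mng_key_config(tdr);
    		/* bits 31:0 carry detail, so compare the status half only */
    	} while ((err & TDX_SEAMCALL_STATUS_MASK) == TDX_OPERAND_BUSY);

    	return err;	/* TDX_SUCCESS (0) on success */
    }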
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini,
    erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack,
    Kai Huang, Zhi Wang
Subject: [PATCH v12 005/106] KVM: TDX: Add helper functions to print TDX SEAMCALL error
Date: Mon, 27 Feb 2023 00:22:04 -0800
Message-Id: <083041b9149ffbd0a94396aad13126ded3675974.1677484918.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

Add helper functions to print out errors from the TDX module in a uniform
manner.

Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/Makefile        |  2 +-
 arch/x86/kvm/vmx/tdx_error.c | 21 +++++++++++++++++++++
 arch/x86/kvm/vmx/tdx_ops.h   |  3 +++
 3 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kvm/vmx/tdx_error.c

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 4b01ab842ab7..e3354b784e10 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -25,7 +25,7 @@ kvm-$(CONFIG_KVM_SMM)	+= smm.o
 kvm-intel-y		+= vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
 			   vmx/hyperv.o vmx/nested.o vmx/posted_intr.o vmx/main.o
 kvm-intel-$(CONFIG_X86_SGX_KVM)	+= vmx/sgx.o
-kvm-intel-$(CONFIG_INTEL_TDX_HOST)	+= vmx/tdx.o
+kvm-intel-$(CONFIG_INTEL_TDX_HOST)	+= vmx/tdx.o vmx/tdx_error.o
 
 kvm-amd-y		+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o \
 			   svm/sev.o svm/hyperv.o
diff --git a/arch/x86/kvm/vmx/tdx_error.c b/arch/x86/kvm/vmx/tdx_error.c
new file mode 100644
index 000000000000..574b72d34e1e
--- /dev/null
+++ b/arch/x86/kvm/vmx/tdx_error.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0
+/* functions to record TDX SEAMCALL error */
+
+#include <linux/kernel.h>
+#include <linux/bug.h>
+
+#include "tdx_ops.h"
+
+void pr_tdx_error(u64 op, u64 error_code, const struct tdx_module_output *out)
+{
+	if (!out) {
+		pr_err_ratelimited("SEAMCALL[%lld] failed: 0x%llx\n",
+				   op, error_code);
+		return;
+	}
+
+	pr_err_ratelimited("SEAMCALL[%lld] failed: 0x%llx RCX 0x%llx, RDX 0x%llx,"
+			   " R8 0x%llx, R9 0x%llx, R10 0x%llx, R11 0x%llx\n",
+			   op, error_code,
+			   out->rcx, out->rdx, out->r8, out->r9, out->r10, out->r11);
+}
diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h
index 85adbf49c277..8cc2f01c509b 100644
--- a/arch/x86/kvm/vmx/tdx_ops.h
+++ b/arch/x86/kvm/vmx/tdx_ops.h
@@ -9,12 +9,15 @@
 #include <asm/cacheflush.h>
 #include <asm/asm.h>
 #include <asm/kvm_host.h>
+#include
 
 #include "tdx_errno.h"
 #include "tdx_arch.h"
 
 #ifdef CONFIG_INTEL_TDX_HOST
 
+void pr_tdx_error(u64 op, u64 error_code, const struct tdx_module_output *out);
+
 static inline u64 tdh_mng_addcx(hpa_t tdr, hpa_t addr)
 {
 	clflush_cache_range(__va(addr), PAGE_SIZE);
--
2.25.1
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini,
    erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack,
    Kai Huang, Zhi Wang
Subject: [PATCH v12 006/106] [MARKER] The start of TDX KVM patch series: TD VM creation/destruction
Date: Mon, 27 Feb 2023 00:22:05 -0800

From: Isaku Yamahata

This empty commit marks the start of the patch sub-series for TD VM
creation/destruction.
Signed-off-by: Isaku Yamahata
---
 Documentation/virt/kvm/intel-tdx-layer-status.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentation/virt/kvm/intel-tdx-layer-status.rst
index db32e89e16e9..221372cfb4af 100644
--- a/Documentation/virt/kvm/intel-tdx-layer-status.rst
+++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst
@@ -15,8 +15,8 @@ Patch Layer status
 ------------------
   Patch layer                          Status
 * TDX, VMX coexistence:                Applied
-* TDX architectural definitions:       Applying
-* TD VM creation/destruction:          Not yet
+* TDX architectural definitions:       Applied
+* TD VM creation/destruction:          Applying
 * TD vcpu creation/destruction:        Not yet
 * TDX EPT violation:                   Not yet
 * TD finalization:                     Not yet
--
2.25.1

From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini,
    erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack,
    Kai Huang, Zhi Wang
Subject: [PATCH v12 007/106] x86/cpu: Add helper functions to allocate/free TDX private host key id
Date: Mon, 27 Feb 2023 00:22:06 -0800

From: Isaku Yamahata
A TDX private host key id (HKID) is assigned to each guest TD.  The
memory controller encrypts guest TD memory with the assigned TDX HKID.
Add helper functions to allocate/free TDX private HKIDs so that TDX KVM
can manage them.

Also export the global TDX private HKID that is used to encrypt the TDX
module, its memory, and some dynamic data (TDR).  When the VMM releases
an encrypted page for reuse, the page needs to be flushed with the HKID
it was encrypted with; the VMM needs the global TDX private HKID to
flush such pages.

Signed-off-by: Isaku Yamahata
---
 arch/x86/include/asm/tdx.h  | 12 ++++++++++++
 arch/x86/virt/vmx/tdx/tdx.c | 34 +++++++++++++++++++++++++++++++++-
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 9c61d247c425..2094d634e1a3 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -108,12 +108,24 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
 bool platform_tdx_enabled(void);
 int tdx_enable(void);
 int tdx_cpu_online(unsigned int cpu);
+/*
+ * Key id globally used by TDX module: TDX module maps TDR with this TDX global
+ * key id.  TDR includes key id assigned to the TD.  Then TDX module maps other
+ * TD-related pages with the assigned key id.  TDR requires this TDX global key
+ * id for cache flush unlike other TD-related pages.
+ */
+extern u32 tdx_global_keyid __ro_after_init;
+int tdx_guest_keyid_alloc(void);
+void tdx_guest_keyid_free(int keyid);
+
 u64 __seamcall(u64 op, u64 rcx, u64 rdx, u64 r8, u64 r9,
 	       struct tdx_module_output *out);
 #else	/* !CONFIG_INTEL_TDX_HOST */
 static inline bool platform_tdx_enabled(void) { return false; }
 static inline int tdx_enable(void)  { return -EINVAL; }
 static inline int tdx_cpu_online(unsigned int cpu) { return 0; }
+static inline int tdx_guest_keyid_alloc(void) { return -EOPNOTSUPP; }
+static inline void tdx_guest_keyid_free(int keyid) { }
 #endif	/* CONFIG_INTEL_TDX_HOST */
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index c291fbd29bb0..cf5431ee3cf8 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -31,7 +31,8 @@
 #include <asm/tdx.h>
 #include "tdx.h"
 
-static u32 tdx_global_keyid __ro_after_init;
+u32 tdx_global_keyid __ro_after_init;
+EXPORT_SYMBOL_GPL(tdx_global_keyid);
 static u32 tdx_guest_keyid_start __ro_after_init;
 static u32 tdx_nr_guest_keyids __ro_after_init;
 
@@ -132,6 +133,31 @@ static struct notifier_block tdx_memory_nb = {
 	.notifier_call = tdx_memory_notifier,
 };
 
+/* TDX KeyID pool */
+static DEFINE_IDA(tdx_guest_keyid_pool);
+
+int tdx_guest_keyid_alloc(void)
+{
+	if (WARN_ON_ONCE(!tdx_guest_keyid_start || !tdx_nr_guest_keyids))
+		return -EINVAL;
+
+	/* The first keyID is reserved for the global key. */
+	return ida_alloc_range(&tdx_guest_keyid_pool, tdx_guest_keyid_start + 1,
+			       tdx_guest_keyid_start + tdx_nr_guest_keyids - 1,
+			       GFP_KERNEL);
+}
+EXPORT_SYMBOL_GPL(tdx_guest_keyid_alloc);
+
+void tdx_guest_keyid_free(int keyid)
+{
+	/* keyid = 0 is reserved. */
+	if (WARN_ON_ONCE(keyid <= 0))
+		return;
+
+	ida_free(&tdx_guest_keyid_pool, keyid);
+}
+EXPORT_SYMBOL_GPL(tdx_guest_keyid_free);
+
 static int __init tdx_init(void)
 {
 	u32 tdx_keyid_start, nr_tdx_keyids;
@@ -1220,6 +1246,12 @@ static int init_tdx_module(void)
 	if (ret)
 		goto out_free_pamts;
 
+	/*
+	 * Reserve the first TDX KeyID as global KeyID to protect
+	 * TDX module metadata.
+	 */
+	tdx_global_keyid = tdx_keyid_start;
+
 	/* Initialize TDMRs to complete the TDX module initialization */
 	ret = init_tdmrs(&tdx_tdmr_list);
 
--
2.25.1

From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini,
    erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack,
    Kai Huang, Zhi Wang
Subject: [PATCH v12 008/106] x86/virt/tdx: Add a helper function to return system wide info about TDX module
Date: Mon, 27 Feb 2023 00:22:07 -0800

From: Isaku Yamahata

TDX KVM needs system-wide information about the TDX module, struct
tdsysinfo_struct.  Add a helper function tdx_get_sysinfo() to return it,
instead of KVM fetching it itself with various error checks.  Make KVM
call the function and stash the info.  Move the struct definition to the
common header arch/x86/include/asm/tdx.h.
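For example, a KVM-side consumer might cache the pointer at setup time.
A minimal sketch against the interface added below (the caller name and
print-out are illustrative, not part of this patch):

    /* Hypothetical snippet: stash the TDX module info at init time. */
    static const struct tdsysinfo_struct *tdx_sysinfo;

    static int example_stash_sysinfo(void)
    {
    	tdx_sysinfo = tdx_get_sysinfo();
    	if (!tdx_sysinfo)
    		return -ENODEV;	/* TDX module not initialized */

    	pr_info("TDX module %u.%u, build %u\n",
    		tdx_sysinfo->major_version, tdx_sysinfo->minor_version,
    		tdx_sysinfo->build_num);
    	return 0;
    }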
Signed-off-by: Isaku Yamahata
---
 arch/x86/include/asm/tdx.h  | 54 +++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/tdx.c      | 15 ++++++++++-
 arch/x86/virt/vmx/tdx/tdx.c | 21 ++++++++++++---
 arch/x86/virt/vmx/tdx/tdx.h | 51 -----------------------------------
 4 files changed, 85 insertions(+), 56 deletions(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 2094d634e1a3..a10bc61e6008 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -105,6 +105,58 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
 #endif /* CONFIG_INTEL_TDX_GUEST && CONFIG_KVM_GUEST */
 
 #ifdef CONFIG_INTEL_TDX_HOST
+struct tdx_cpuid_config {
+	u32 leaf;
+	u32 sub_leaf;
+	u32 eax;
+	u32 ebx;
+	u32 ecx;
+	u32 edx;
+} __packed;
+
+#define TDSYSINFO_STRUCT_SIZE		1024
+#define TDSYSINFO_STRUCT_ALIGNMENT	1024
+
+/*
+ * The size of this structure itself is flexible.  The actual structure
+ * passed to TDH.SYS.INFO must be padded to TDSYSINFO_STRUCT_SIZE and be
+ * aligned to TDSYSINFO_STRUCT_ALIGNMENT using DECLARE_PADDED_STRUCT().
+ */
+struct tdsysinfo_struct {
+	/* TDX-SEAM Module Info */
+	u32 attributes;
+	u32 vendor_id;
+	u32 build_date;
+	u16 build_num;
+	u16 minor_version;
+	u16 major_version;
+	u8 reserved0[14];
+	/* Memory Info */
+	u16 max_tdmrs;
+	u16 max_reserved_per_tdmr;
+	u16 pamt_entry_size;
+	u8 reserved1[10];
+	/* Control Struct Info */
+	u16 tdcs_base_size;
+	u8 reserved2[2];
+	u16 tdvps_base_size;
+	u8 tdvps_xfam_dependent_size;
+	u8 reserved3[9];
+	/* TD Capabilities */
+	u64 attributes_fixed0;
+	u64 attributes_fixed1;
+	u64 xfam_fixed0;
+	u64 xfam_fixed1;
+	u8 reserved4[32];
+	u32 num_cpuid_config;
+	/*
+	 * The actual number of CPUID_CONFIG depends on above
+	 * 'num_cpuid_config'.
+	 */
+	DECLARE_FLEX_ARRAY(struct tdx_cpuid_config, cpuid_configs);
+} __packed;
+
+const struct tdsysinfo_struct *tdx_get_sysinfo(void);
 bool platform_tdx_enabled(void);
 int tdx_enable(void);
 int tdx_cpu_online(unsigned int cpu);
@@ -121,6 +173,8 @@ void tdx_guest_keyid_free(int keyid);
 u64 __seamcall(u64 op, u64 rcx, u64 rdx, u64 r8, u64 r9,
 	       struct tdx_module_output *out);
 #else	/* !CONFIG_INTEL_TDX_HOST */
+struct tdsysinfo_struct;
+static inline const struct tdsysinfo_struct *tdx_get_sysinfo(void) { return NULL; }
 static inline bool platform_tdx_enabled(void) { return false; }
 static inline int tdx_enable(void)  { return -EINVAL; }
 static inline int tdx_cpu_online(unsigned int cpu) { return 0; }
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 82239d18fde3..4764c29b6988 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -11,9 +11,18 @@
 #undef pr_fmt
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#define TDX_MAX_NR_CPUID_CONFIGS					\
+	((TDSYSINFO_STRUCT_SIZE -					\
+	  offsetof(struct tdsysinfo_struct, cpuid_configs))		\
+	 / sizeof(struct tdx_cpuid_config))
+
 static int __init tdx_module_setup(void)
 {
-	int ret;
+	const struct tdsysinfo_struct *tdsysinfo;
+	int ret = 0;
+
+	BUILD_BUG_ON(sizeof(*tdsysinfo) > TDSYSINFO_STRUCT_SIZE);
+	BUILD_BUG_ON(TDX_MAX_NR_CPUID_CONFIGS != 37);
 
 	ret = tdx_enable();
 	if (ret) {
@@ -21,6 +30,10 @@ static int __init tdx_module_setup(void)
 		return ret;
 	}
 
+	/* Sanity check just in case. */
+	tdsysinfo = tdx_get_sysinfo();
+	WARN_ON(tdsysinfo->num_cpuid_config > TDX_MAX_NR_CPUID_CONFIGS);
+
 	pr_info("TDX is supported.\n");
 	return 0;
 }
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index cf5431ee3cf8..79b7b2d73ff5 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -357,7 +357,7 @@ static void print_cmrs(struct cmr_info *cmr_array, int nr_cmrs)
  * CMRs, and save them to @sysinfo and @cmr_array.  @sysinfo must have
  * been padded to have enough room to save the TDSYSINFO_STRUCT.
  */
-static int tdx_get_sysinfo(struct tdsysinfo_struct *sysinfo,
+static int __tdx_get_sysinfo(struct tdsysinfo_struct *sysinfo,
 			   struct cmr_info *cmr_array)
 {
 	struct tdx_module_output out;
@@ -382,6 +382,21 @@ static int tdx_get_sysinfo(struct tdsysinfo_struct *sysinfo,
 	return 0;
 }
 
+static DECLARE_PADDED_STRUCT(tdsysinfo_struct, tdsysinfo,
+			     TDSYSINFO_STRUCT_SIZE, TDSYSINFO_STRUCT_ALIGNMENT);
+
+const struct tdsysinfo_struct *tdx_get_sysinfo(void)
+{
+	const struct tdsysinfo_struct *r = NULL;
+
+	mutex_lock(&tdx_module_lock);
+	if (tdx_module_status == TDX_MODULE_INITIALIZED)
+		r = &PADDED_STRUCT(tdsysinfo);
+	mutex_unlock(&tdx_module_lock);
+	return r;
+}
+EXPORT_SYMBOL_GPL(tdx_get_sysinfo);
+
 /*
  * Add a memory region as a TDX memory block.  The caller must make sure
  * all memory regions are added in address ascending order and don't
@@ -1164,8 +1179,6 @@ static int init_tdmrs(struct tdmr_info_list *tdmr_list)
 
 static int init_tdx_module(void)
 {
-	static DECLARE_PADDED_STRUCT(tdsysinfo_struct, tdsysinfo,
-				     TDSYSINFO_STRUCT_SIZE, TDSYSINFO_STRUCT_ALIGNMENT);
 	static struct cmr_info cmr_array[MAX_CMRS]
 			__aligned(CMR_INFO_ARRAY_ALIGNMENT);
 	struct tdsysinfo_struct *sysinfo = &PADDED_STRUCT(tdsysinfo);
@@ -1196,7 +1209,7 @@ static int init_tdx_module(void)
 	if (ret)
 		goto out;
 
-	ret = tdx_get_sysinfo(sysinfo, cmr_array);
+	ret = __tdx_get_sysinfo(sysinfo, cmr_array);
 	if (ret)
 		goto out;
 
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 4e312c7f9553..66ca6f1f3d23 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -29,15 +29,6 @@ struct cmr_info {
 #define MAX_CMRS	32
 #define CMR_INFO_ARRAY_ALIGNMENT	512
 
-struct cpuid_config {
-	u32 leaf;
-	u32 sub_leaf;
-	u32 eax;
-	u32 ebx;
-	u32 ecx;
-	u32 edx;
-} __packed;
-
 #define DECLARE_PADDED_STRUCT(type, name, size, alignment)	\
 	struct type##_padded {					\
 		union {						\
@@ -48,48 +39,6 @@ struct cpuid_config {
 
 #define PADDED_STRUCT(name)	(name##_padded.name)
 
-#define TDSYSINFO_STRUCT_SIZE		1024
-#define TDSYSINFO_STRUCT_ALIGNMENT	1024
-
-/*
- * The size of this structure itself is flexible.  The actual structure
- * passed to TDH.SYS.INFO must be padded to TDSYSINFO_STRUCT_SIZE and be
- * aligned to TDSYSINFO_STRUCT_ALIGNMENT using DECLARE_PADDED_STRUCT().
- */
-struct tdsysinfo_struct {
-	/* TDX-SEAM Module Info */
-	u32 attributes;
-	u32 vendor_id;
-	u32 build_date;
-	u16 build_num;
-	u16 minor_version;
-	u16 major_version;
-	u8 reserved0[14];
-	/* Memory Info */
-	u16 max_tdmrs;
-	u16 max_reserved_per_tdmr;
-	u16 pamt_entry_size;
-	u8 reserved1[10];
-	/* Control Struct Info */
-	u16 tdcs_base_size;
-	u8 reserved2[2];
-	u16 tdvps_base_size;
-	u8 tdvps_xfam_dependent_size;
-	u8 reserved3[9];
-	/* TD Capabilities */
-	u64 attributes_fixed0;
-	u64 attributes_fixed1;
-	u64 xfam_fixed0;
-	u64 xfam_fixed1;
-	u8 reserved4[32];
-	u32 num_cpuid_config;
-	/*
-	 * The actual number of CPUID_CONFIG depends on above
-	 * 'num_cpuid_config'.
-	 */
-	DECLARE_FLEX_ARRAY(struct cpuid_config, cpuid_configs);
-} __packed;
-
 struct tdmr_reserved_area {
 	u64 offset;
 	u64 size;
--
2.25.1

From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini,
    erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack,
    Kai Huang, Zhi Wang
Subject: [PATCH v12 009/106] KVM: TDX: x86: Add ioctl to get TDX systemwide parameters
Date: Mon, 27 Feb 2023 00:22:08 -0800

From: Sean Christopherson

Implement a system-scoped ioctl to get system-wide parameters for TDX.
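As a sketch of the intended userspace flow (assuming, as the mem_enc
naming below suggests, that the sub-command is reached via
KVM_MEMORY_ENCRYPT_OP on the /dev/kvm file descriptor; the buffer sizing
is illustrative):

    /* Hypothetical userspace snippet: query TDX capabilities. */
    struct kvm_tdx_capabilities *caps;
    struct kvm_tdx_cmd cmd = { .id = KVM_TDX_CAPABILITIES };
    int nr = 64;	/* guess; must be >= the module's num_cpuid_config */

    caps = calloc(1, sizeof(*caps) +
    		     nr * sizeof(struct kvm_tdx_cpuid_config));
    caps->nr_cpuid_configs = nr;
    cmd.data = (__u64)(unsigned long)caps;

    if (ioctl(kvm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd))
    	err(1, "KVM_TDX_CAPABILITIES");	/* -E2BIG: retry with larger nr */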
Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
---
 arch/x86/include/asm/kvm-x86-ops.h    |  1 +
 arch/x86/include/asm/kvm_host.h       |  1 +
 arch/x86/include/uapi/asm/kvm.h       | 48 +++++++++++++++++++++++++
 arch/x86/kvm/vmx/main.c               |  2 ++
 arch/x86/kvm/vmx/tdx.c                | 51 +++++++++++++++++++++++++++
 arch/x86/kvm/vmx/x86_ops.h            |  2 ++
 arch/x86/kvm/x86.c                    |  6 ++++
 tools/arch/x86/include/uapi/asm/kvm.h | 48 +++++++++++++++++++++++++
 8 files changed, 159 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index eac4b65d1b01..b46dcac078b2 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -117,6 +117,7 @@ KVM_X86_OP(enter_smm)
 KVM_X86_OP(leave_smm)
 KVM_X86_OP(enable_smi_window)
 #endif
+KVM_X86_OP_OPTIONAL(dev_mem_enc_ioctl)
 KVM_X86_OP_OPTIONAL(mem_enc_ioctl)
 KVM_X86_OP_OPTIONAL(mem_enc_register_region)
 KVM_X86_OP_OPTIONAL(mem_enc_unregister_region)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ffb85c35cacc..58fc697095fd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1702,6 +1702,7 @@ struct kvm_x86_ops {
 	void (*enable_smi_window)(struct kvm_vcpu *vcpu);
 #endif
 
+	int (*dev_mem_enc_ioctl)(void __user *argp);
 	int (*mem_enc_ioctl)(struct kvm *kvm, void __user *argp);
 	int (*mem_enc_register_region)(struct kvm *kvm, struct kvm_enc_region *argp);
 	int (*mem_enc_unregister_region)(struct kvm *kvm, struct kvm_enc_region *argp);
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 53ce363ba5fe..861bbf4546c4 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -532,4 +532,52 @@ struct kvm_pmu_event_filter {
 #define KVM_X86_DEFAULT_VM	0
 #define KVM_X86_PROTECTED_VM	1
 
+/* Trust Domain eXtension sub-ioctl() commands. */
+enum kvm_tdx_cmd_id {
+	KVM_TDX_CAPABILITIES = 0,
+
+	KVM_TDX_CMD_NR_MAX,
+};
+
+struct kvm_tdx_cmd {
+	/* enum kvm_tdx_cmd_id */
+	__u32 id;
+	/* flags for sub-command.  If sub-command doesn't use this, set zero. */
+	__u32 flags;
+	/*
+	 * data for each sub-command.  An immediate or a pointer to the actual
+	 * data in process virtual address.  If sub-command doesn't use it,
+	 * set zero.
+	 */
+	__u64 data;
+	/*
+	 * Auxiliary error code.  The sub-command may return TDX SEAMCALL
+	 * status code in addition to -Exxx.
+	 * Defined for consistency with struct kvm_sev_cmd.
+	 */
+	__u64 error;
+	/* Reserved: Defined for consistency with struct kvm_sev_cmd. */
*/ + __u64 unused; +}; + +struct kvm_tdx_cpuid_config { + __u32 leaf; + __u32 sub_leaf; + __u32 eax; + __u32 ebx; + __u32 ecx; + __u32 edx; +}; + +struct kvm_tdx_capabilities { + __u64 attrs_fixed0; + __u64 attrs_fixed1; + __u64 xfam_fixed0; + __u64 xfam_fixed1; + + __u32 nr_cpuid_configs; + __u32 padding; + struct kvm_tdx_cpuid_config cpuid_configs[0]; +}; + #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 030251cf714e..620742d98ed3 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -188,6 +188,8 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .complete_emulated_msr =3D kvm_complete_insn_gp, =20 .vcpu_deliver_sipi_vector =3D kvm_vcpu_deliver_sipi_vector, + + .dev_mem_enc_ioctl =3D tdx_dev_ioctl, }; =20 struct kvm_x86_init_ops vt_init_ops __initdata =3D { diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 4764c29b6988..7a42eee995f6 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -16,6 +16,57 @@ offsetof(struct tdsysinfo_struct, cpuid_configs)) \ / sizeof(struct tdx_cpuid_config)) =20 +int tdx_dev_ioctl(void __user *argp) +{ + struct kvm_tdx_capabilities __user *user_caps; + const struct tdsysinfo_struct *tdsysinfo; + struct kvm_tdx_capabilities caps; + struct kvm_tdx_cmd cmd; + + BUILD_BUG_ON(sizeof(struct kvm_tdx_cpuid_config) !=3D + sizeof(struct tdx_cpuid_config)); + + if (copy_from_user(&cmd, argp, sizeof(cmd))) + return -EFAULT; + if (cmd.flags || cmd.error || cmd.unused) + return -EINVAL; + /* + * Currently only KVM_TDX_CAPABILITIES is defined for system-scoped + * mem_enc_ioctl(). + */ + if (cmd.id !=3D KVM_TDX_CAPABILITIES) + return -EINVAL; + + tdsysinfo =3D tdx_get_sysinfo(); + if (!tdsysinfo) + return -ENOTSUPP; + + user_caps =3D (void __user *)cmd.data; + if (copy_from_user(&caps, user_caps, sizeof(caps))) + return -EFAULT; + + if (caps.nr_cpuid_configs < tdsysinfo->num_cpuid_config) + return -E2BIG; + + caps =3D (struct kvm_tdx_capabilities) { + .attrs_fixed0 =3D tdsysinfo->attributes_fixed0, + .attrs_fixed1 =3D tdsysinfo->attributes_fixed1, + .xfam_fixed0 =3D tdsysinfo->xfam_fixed0, + .xfam_fixed1 =3D tdsysinfo->xfam_fixed1, + .nr_cpuid_configs =3D tdsysinfo->num_cpuid_config, + .padding =3D 0, + }; + + if (copy_to_user(user_caps, &caps, sizeof(caps))) + return -EFAULT; + if (copy_to_user(user_caps->cpuid_configs, &tdsysinfo->cpuid_configs, + tdsysinfo->num_cpuid_config * + sizeof(struct tdx_cpuid_config))) + return -EFAULT; + + return 0; +} + static int __init tdx_module_setup(void) { const struct tdsysinfo_struct *tdsysinfo; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 429faa3deb71..5dc3f0d11427 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -140,9 +140,11 @@ void vmx_setup_mce(struct kvm_vcpu *vcpu); #ifdef CONFIG_INTEL_TDX_HOST int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops); bool tdx_is_vm_type_supported(unsigned long type); +int tdx_dev_ioctl(void __user *argp); #else static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return= -ENOSYS; } static inline bool tdx_is_vm_type_supported(unsigned long type) { return f= alse; } +static inline int tdx_dev_ioctl(void __user *argp) { return -EOPNOTSUPP; }; #endif =20 #endif /* __KVM_X86_VMX_X86_OPS_H */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 589844a27349..c7459bc8b315 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4708,6 +4708,12 @@ long kvm_arch_dev_ioctl(struct file *filp, r =3D kvm_x86_dev_has_attr(&attr); 
break; } + case KVM_MEMORY_ENCRYPT_OP: + r =3D -EINVAL; + if (!kvm_x86_ops.dev_mem_enc_ioctl) + goto out; + r =3D static_call(kvm_x86_dev_mem_enc_ioctl)(argp); + break; default: r =3D -EINVAL; break; diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include= /uapi/asm/kvm.h index 53ce363ba5fe..861bbf4546c4 100644 --- a/tools/arch/x86/include/uapi/asm/kvm.h +++ b/tools/arch/x86/include/uapi/asm/kvm.h @@ -532,4 +532,52 @@ struct kvm_pmu_event_filter { #define KVM_X86_DEFAULT_VM 0 #define KVM_X86_PROTECTED_VM 1 =20 +/* Trust Domain eXtension sub-ioctl() commands. */ +enum kvm_tdx_cmd_id { + KVM_TDX_CAPABILITIES =3D 0, + + KVM_TDX_CMD_NR_MAX, +}; + +struct kvm_tdx_cmd { + /* enum kvm_tdx_cmd_id */ + __u32 id; + /* flags for sub-commend. If sub-command doesn't use this, set zero. */ + __u32 flags; + /* + * data for each sub-command. An immediate or a pointer to the actual + * data in process virtual address. If sub-command doesn't use it, + * set zero. + */ + __u64 data; + /* + * Auxiliary error code. The sub-command may return TDX SEAMCALL + * status code in addition to -Exxx. + * Defined for consistency with struct kvm_sev_cmd. + */ + __u64 error; + /* Reserved: Defined for consistency with struct kvm_sev_cmd. */ + __u64 unused; +}; + +struct kvm_tdx_cpuid_config { + __u32 leaf; + __u32 sub_leaf; + __u32 eax; + __u32 ebx; + __u32 ecx; + __u32 edx; +}; + +struct kvm_tdx_capabilities { + __u64 attrs_fixed0; + __u64 attrs_fixed1; + __u64 xfam_fixed0; + __u64 xfam_fixed1; + + __u32 nr_cpuid_configs; + __u32 padding; + struct kvm_tdx_cpuid_config cpuid_configs[0]; +}; + #endif /* _ASM_X86_KVM_H */ --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6D814C64ED6 for ; Mon, 27 Feb 2023 08:24:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230350AbjB0IYl (ORCPT ); Mon, 27 Feb 2023 03:24:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54810 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230157AbjB0IYJ (ORCPT ); Mon, 27 Feb 2023 03:24:09 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B22E41165A; Mon, 27 Feb 2023 00:24:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486247; x=1709022247; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=vt6CFftujemCX1PnCsrk85Rf20s9RBT08Ts/E/nchBc=; b=C5KsLI+X4tw7SKCG6HQJDPsGAW9hCGZVXlDxd3Ea52TJOsLComf637ms SCPMmEEccQ3sXGSmc+Y46QYXjor6i6bGoUtcbV45KVBKxHalXATA74IPh a68+62eCzlizgBaiV8c1IOiPy0OIky/cw5H1v7YESFz6F3t9Cn5Xw8Drv RO1HRdpft9MpkxnZX0EgaXzPHuboLV42E2F2+g7gG5g6aCjFbRMshDHDD iUmF0qFnTUlc8C/RFfMmUmdHfQ+CChpbt334tlmqlJ9MRffQpPUSj30SS oX7bFwY0H+O84nHJ2Wg3KQ5ZRWvpih+Q86/wVcn11BDlhJadqcu/9mAXo A==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608700" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608700" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:02 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242041" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242041" Received: from 
ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:02 -0800
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang
Subject: [PATCH v12 010/106] KVM: TDX: Add place holder for TDX VM specific mem_enc_op ioctl
Date: Mon, 27 Feb 2023 00:22:09 -0800
Message-Id: <4e6cea32d83276bb887c1e484cb8389fa5728df7.1677484918.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

Add a placeholder function for the TDX-specific, VM-scoped ioctl as
mem_enc_op.  TDX-specific sub-commands will be added later to
retrieve/pass TDX-specific parameters.  Make mem_enc_ioctl non-optional
as it is now always implemented.

KVM_MEMORY_ENCRYPT_OP was introduced for VM-scoped operations specific
to guest state-protected VMs, and sub-commands for technology-specific
operations are defined under it.  Despite its name, the sub-commands are
not limited to memory encryption; various technology-specific operations
are defined there.  It is natural to repurpose KVM_MEMORY_ENCRYPT_OP for
TDX-specific operations and to define sub-commands for them.

TDX requires VM-scoped, TDX-specific operations on behalf of the device
model, e.g. qemu, such as getting system-wide parameters and
TDX-specific VM initialization.

Signed-off-by: Isaku Yamahata
---
 arch/x86/include/asm/kvm-x86-ops.h |  2 +-
 arch/x86/kvm/vmx/main.c            |  9 +++++++++
 arch/x86/kvm/vmx/tdx.c             | 26 ++++++++++++++++++++++++++
 arch/x86/kvm/vmx/x86_ops.h         |  4 ++++
 arch/x86/kvm/x86.c                 |  4 ----
 5 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index b46dcac078b2..58fbaa05fc8c 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -117,7 +117,7 @@ KVM_X86_OP(enter_smm)
 KVM_X86_OP(leave_smm)
 KVM_X86_OP(enable_smi_window)
 #endif
-KVM_X86_OP_OPTIONAL(dev_mem_enc_ioctl)
+KVM_X86_OP(dev_mem_enc_ioctl)
 KVM_X86_OP_OPTIONAL(mem_enc_ioctl)
 KVM_X86_OP_OPTIONAL(mem_enc_register_region)
 KVM_X86_OP_OPTIONAL(mem_enc_unregister_region)
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 620742d98ed3..d90da9fd75bf 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -37,6 +37,14 @@ static int vt_vm_init(struct kvm *kvm)
 	return vmx_vm_init(kvm);
 }
 
+static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
+{
+	if (!is_td(kvm))
+		return -ENOTTY;
+
+	return tdx_vm_ioctl(kvm, argp);
+}
+
 #define VMX_REQUIRED_APICV_INHIBITS \
 	( \
 		BIT(APICV_INHIBIT_REASON_DISABLE)| \
@@ -190,6 +198,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
 
 	.dev_mem_enc_ioctl = tdx_dev_ioctl,
+	.mem_enc_ioctl = vt_mem_enc_ioctl,
 };
 
 struct kvm_x86_init_ops vt_init_ops __initdata = {
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 7a42eee995f6..cfedb2592725 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -67,6 +67,32 @@ int tdx_dev_ioctl(void __user *argp)
 	return 0;
 }
 
+int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
+{
+	struct kvm_tdx_cmd tdx_cmd;
+	int r;
+
+	if
(copy_from_user(&tdx_cmd, argp, sizeof(struct kvm_tdx_cmd))) + return -EFAULT; + if (tdx_cmd.error || tdx_cmd.unused) + return -EINVAL; + + mutex_lock(&kvm->lock); + + switch (tdx_cmd.id) { + default: + r =3D -EINVAL; + goto out; + } + + if (copy_to_user(argp, &tdx_cmd, sizeof(struct kvm_tdx_cmd))) + r =3D -EFAULT; + +out: + mutex_unlock(&kvm->lock); + return r; +} + static int __init tdx_module_setup(void) { const struct tdsysinfo_struct *tdsysinfo; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 5dc3f0d11427..6598e16f8e9f 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -141,10 +141,14 @@ void vmx_setup_mce(struct kvm_vcpu *vcpu); int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops); bool tdx_is_vm_type_supported(unsigned long type); int tdx_dev_ioctl(void __user *argp); + +int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); #else static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return= -ENOSYS; } static inline bool tdx_is_vm_type_supported(unsigned long type) { return f= alse; } static inline int tdx_dev_ioctl(void __user *argp) { return -EOPNOTSUPP; }; + +static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { retur= n -EOPNOTSUPP; } #endif =20 #endif /* __KVM_X86_VMX_X86_OPS_H */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c7459bc8b315..3e00fb7863a8 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6961,10 +6961,6 @@ long kvm_arch_vm_ioctl(struct file *filp, goto out; } case KVM_MEMORY_ENCRYPT_OP: { - r =3D -ENOTTY; - if (!kvm_x86_ops.mem_enc_ioctl) - goto out; - r =3D static_call(kvm_x86_mem_enc_ioctl)(kvm, argp); break; } --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3EDB7C64ED8 for ; Mon, 27 Feb 2023 08:24:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230330AbjB0IYi (ORCPT ); Mon, 27 Feb 2023 03:24:38 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54822 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230162AbjB0IYJ (ORCPT ); Mon, 27 Feb 2023 03:24:09 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 878C01B56A; Mon, 27 Feb 2023 00:24:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486248; x=1709022248; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ZHtH5cD+sidHdCeHBrWYu6ZRCepDrZPF+TBklpC+idg=; b=QmvguWykQ707u+UBZ3vMGTw5NcHoVZtVruPyeP5ffPiQsXSfdHBtZTze m/jfmbIzkXdagwzpeQEJZMhMd5/pHiq4n+1VhiYaLoKjiMLxgypuUyMEv +FRdwCD3HhzyvqyXn8/RSO+SaLisKk3NmAqKBrB196YnexZBiMTpPL4zq Nn9r+BIM+//m4yH1GL9QbWqrz+jbYcYNMghgz6sGZi4BLZUpqyuku2pWD YlZ5YsR1HUJFb+uJwmEkapW0/PoxCsYqCK+Kwi9GceLPkTLQzNSWL7YH4 43iZtUrKEylXnqcAnpV4i3NTGPwUvQeEBeVJD9pQZphpPkb2AzsveAcgf Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608705" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608705" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:03 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242044" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; 
d="scan'208";a="783242044"
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang
Subject: [PATCH v12 011/106] KVM: x86, tdx: Make KVM_CAP_MAX_VCPUS backend specific
Date: Mon, 27 Feb 2023 00:22:10 -0800

From: Isaku Yamahata

TDX has its own limit on the maximum number of vcpus.  Make the limit
backend-specific and return the TDX-specific value for
KVM_CAP_MAX_VCPUS.

Signed-off-by: Isaku Yamahata
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/vmx/main.c            | 13 +++++++++++++
 arch/x86/kvm/x86.c                 |  2 ++
 4 files changed, 17 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 58fbaa05fc8c..6914f1d61803 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -21,6 +21,7 @@ KVM_X86_OP(hardware_unsetup)
 KVM_X86_OP(has_emulated_msr)
 KVM_X86_OP(vcpu_after_set_cpuid)
 KVM_X86_OP(is_vm_type_supported)
+KVM_X86_OP_OPTIONAL(max_vcpus);
 KVM_X86_OP(vm_init)
 KVM_X86_OP_OPTIONAL(vm_destroy)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_precreate)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 58fc697095fd..1c761c9e1edb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1547,6 +1547,7 @@ struct kvm_x86_ops {
 	void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu);
 
 	bool (*is_vm_type_supported)(unsigned long vm_type);
+	int (*max_vcpus)(struct kvm *kvm);
 	unsigned int vm_size;
 	int (*vm_init)(struct kvm *kvm);
 	void (*vm_destroy)(struct kvm *kvm);
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index d90da9fd75bf..41c2e4a1b157 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -6,6 +6,7 @@
 #include "nested.h"
 #include "pmu.h"
 #include "tdx.h"
+#include "tdx_arch.h"
 
 static bool enable_tdx __ro_after_init;
 module_param_named(tdx, enable_tdx, bool, 0444);
@@ -16,6 +17,17 @@ static bool vt_is_vm_type_supported(unsigned long type)
 		(enable_tdx && tdx_is_vm_type_supported(type));
 }
 
+static int vt_max_vcpus(struct kvm *kvm)
+{
+	if (!kvm)
+		return KVM_MAX_VCPUS;
+
+	if (is_td(kvm))
+		return min3(kvm->max_vcpus, KVM_MAX_VCPUS, TDX_MAX_VCPUS);
+
+	return kvm->max_vcpus;
+}
+
 static __init int vt_hardware_setup(void)
 {
 	int ret;
@@ -68,6 +80,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.has_emulated_msr = vmx_has_emulated_msr,
 
 	.is_vm_type_supported = vt_is_vm_type_supported,
+	.max_vcpus = vt_max_vcpus,
 	.vm_size = sizeof(struct kvm_vmx),
 	.vm_init = vt_vm_init,
 	.vm_destroy = vmx_vm_destroy,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3e00fb7863a8..c54baa3973f2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4490,6 +4490,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		break;
 	case KVM_CAP_MAX_VCPUS:
 		r = KVM_MAX_VCPUS;
+		if (kvm_x86_ops.max_vcpus)
+			r = static_call(kvm_x86_max_vcpus)(kvm);
 		break;
 	case KVM_CAP_MAX_VCPU_ID:
 		r = KVM_MAX_VCPU_IDS;
--
2.25.1

From nobody Tue Sep 9 16:53:37 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang
Subject: [PATCH v12 012/106] KVM: x86/vmx, tdx: Allow VMX, TDX to override KVM_ENABLE_CAP
Date: Mon, 27 Feb 2023 00:22:11 -0800

From: Isaku Yamahata

TDX has its own control over the maximum number of vcpus, separate from
KVM_MAX_VCPUS.  Allow VMX and TDX to handle KVM_ENABLE_CAP for
KVM_CAP_MAX_VCPUS so that the backend can specify its own maximum number
of vcpus instead of KVM_MAX_VCPUS.
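
For illustration only (this sketch is not part of the patch; the helper
name and the call site are assumptions), the device model would then
clamp the vcpu count of a TD before creating any vcpu roughly as:

  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int td_set_max_vcpus(int vm_fd, __u64 nr_vcpus)
  {
  	struct kvm_enable_cap cap;

  	memset(&cap, 0, sizeof(cap));	/* non-zero flags are rejected */
  	cap.cap = KVM_CAP_MAX_VCPUS;
  	cap.args[0] = nr_vcpus;	/* 0 < nr_vcpus <= TDX_MAX_VCPUS */

  	/* Must be called before any vcpu is created, else -EBUSY. */
  	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
  }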
Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/vmx/main.c | 9 +++++++++ arch/x86/kvm/vmx/tdx.c | 30 ++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/tdx.h | 3 +++ arch/x86/kvm/vmx/x86_ops.h | 2 ++ arch/x86/kvm/x86.c | 2 ++ 7 files changed, 48 insertions(+) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index 6914f1d61803..7522c193f2b4 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -22,6 +22,7 @@ KVM_X86_OP(has_emulated_msr) KVM_X86_OP(vcpu_after_set_cpuid) KVM_X86_OP(is_vm_type_supported) KVM_X86_OP_OPTIONAL(max_vcpus); +KVM_X86_OP_OPTIONAL(vm_enable_cap) KVM_X86_OP(vm_init) KVM_X86_OP_OPTIONAL(vm_destroy) KVM_X86_OP_OPTIONAL_RET0(vcpu_precreate) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 1c761c9e1edb..bc9ecba514a9 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1549,6 +1549,7 @@ struct kvm_x86_ops { bool (*is_vm_type_supported)(unsigned long vm_type); int (*max_vcpus)(struct kvm *kvm); unsigned int vm_size; + int (*vm_enable_cap)(struct kvm *kvm, struct kvm_enable_cap *cap); int (*vm_init)(struct kvm *kvm); void (*vm_destroy)(struct kvm *kvm); =20 diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 41c2e4a1b157..a090c029efd5 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -41,6 +41,14 @@ static __init int vt_hardware_setup(void) return 0; } =20 +static int vt_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap) +{ + if (is_td(kvm)) + return tdx_vm_enable_cap(kvm, cap); + + return -EINVAL; +} + static int vt_vm_init(struct kvm *kvm) { if (is_td(kvm)) @@ -82,6 +90,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .is_vm_type_supported =3D vt_is_vm_type_supported, .max_vcpus =3D vt_max_vcpus, .vm_size =3D sizeof(struct kvm_vmx), + .vm_enable_cap =3D vt_vm_enable_cap, .vm_init =3D vt_vm_init, .vm_destroy =3D vmx_vm_destroy, =20 diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index cfedb2592725..16e207168dc1 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -16,6 +16,36 @@ offsetof(struct tdsysinfo_struct, cpuid_configs)) \ / sizeof(struct tdx_cpuid_config)) =20 +int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap) +{ + int r; + + switch (cap->cap) { + case KVM_CAP_MAX_VCPUS: { + if (cap->flags || cap->args[0] =3D=3D 0) + return -EINVAL; + if (cap->args[0] > KVM_MAX_VCPUS) + return -E2BIG; + if (cap->args[0] > TDX_MAX_VCPUS) + return -E2BIG; + + mutex_lock(&kvm->lock); + if (kvm->created_vcpus) + r =3D -EBUSY; + else { + kvm->max_vcpus =3D cap->args[0]; + r =3D 0; + } + mutex_unlock(&kvm->lock); + break; + } + default: + r =3D -EINVAL; + break; + } + return r; +} + int tdx_dev_ioctl(void __user *argp) { struct kvm_tdx_capabilities __user *user_caps; diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 2210c8c1e893..3860aa351bd9 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -3,6 +3,9 @@ #define __KVM_X86_TDX_H =20 #ifdef CONFIG_INTEL_TDX_HOST + +#include "tdx_ops.h" + struct kvm_tdx { struct kvm kvm; /* TDX specific members follow. 
 */
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 6598e16f8e9f..71f7dc9ca118 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -142,12 +142,14 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops);
 bool tdx_is_vm_type_supported(unsigned long type);
 int tdx_dev_ioctl(void __user *argp);
 
+int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap);
 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
 #else
 static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return -ENOSYS; }
 static inline bool tdx_is_vm_type_supported(unsigned long type) { return false; }
 static inline int tdx_dev_ioctl(void __user *argp) { return -EOPNOTSUPP; };
 
+static inline int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap) { return -EINVAL; };
 static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOPNOTSUPP; }
 #endif
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c54baa3973f2..318e36535aa6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6421,6 +6421,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		break;
 	default:
 		r = -EINVAL;
+		if (kvm_x86_ops.vm_enable_cap)
+			r = static_call(kvm_x86_vm_enable_cap)(kvm, cap);
 		break;
 	}
 	return r;
--
2.25.1

From nobody Tue Sep 9 16:53:37 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang, Sean Christopherson
Subject: [PATCH v12 013/106] KVM: TDX: create/destroy VM structure
Date: Mon, 27 Feb 2023 00:22:12 -0800

From: Isaku Yamahata

As the first step to create a TDX guest, create/destroy the VM struct.
Assign a TDX private Host Key ID (HKID) to the TDX guest for memory
encryption and allocate extra pages for the TDX guest.  On destruction,
free the allocated pages and the HKID.

Before tearing down private page tables, TDX requires some resources of
the guest TD to be destroyed (i.e. the HKID must have been reclaimed,
etc.).  Add a flush_shadow_all_private callback, invoked before tearing
down private page tables, for that purpose.

Add a vm_free() hook to kvm_x86_ops, called at the end of
kvm_arch_destroy_vm(), because some per-VM TDX resources, e.g. the TDR,
need to be freed after other TDX resources, e.g. the HKID, have been
freed.

Co-developed-by: Kai Huang
Signed-off-by: Kai Huang
Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
---
Changes v11 -> v12:
- use cpu_feature_enabled().

Changes v10 -> v11:
- Fix double free in tdx_vm_free() by setting NULL.
- replace struct tdx_td_page tdr and tdcs from struct kvm_tdx with
  unsigned long
---
 arch/x86/include/asm/kvm-x86-ops.h |   2 +
 arch/x86/include/asm/kvm_host.h    |   2 +
 arch/x86/kvm/vmx/main.c            |  34 ++-
 arch/x86/kvm/vmx/tdx.c             | 440 ++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/tdx.h             |   6 +-
 arch/x86/kvm/vmx/x86_ops.h         |   9 +
 arch/x86/kvm/x86.c                 |   8 +
 7 files changed, 496 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 7522c193f2b4..c30d2d2ad686 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -24,7 +24,9 @@ KVM_X86_OP(is_vm_type_supported)
 KVM_X86_OP_OPTIONAL(max_vcpus);
 KVM_X86_OP_OPTIONAL(vm_enable_cap)
 KVM_X86_OP(vm_init)
+KVM_X86_OP_OPTIONAL(flush_shadow_all_private)
 KVM_X86_OP_OPTIONAL(vm_destroy)
+KVM_X86_OP_OPTIONAL(vm_free)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_precreate)
 KVM_X86_OP(vcpu_create)
 KVM_X86_OP(vcpu_free)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bc9ecba514a9..f4e82ee3d668 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1551,7 +1551,9 @@ struct kvm_x86_ops {
 	unsigned int vm_size;
 	int (*vm_enable_cap)(struct kvm *kvm, struct kvm_enable_cap *cap);
 	int (*vm_init)(struct kvm *kvm);
+	void (*flush_shadow_all_private)(struct kvm *kvm);
 	void (*vm_destroy)(struct kvm *kvm);
+	void (*vm_free)(struct kvm *kvm);
 
 	/* Create, but do not attach this VCPU */
 	int (*vcpu_precreate)(struct kvm *kvm);
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index a090c029efd5..cdc73c09bf0b 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -49,14 +49,40 @@ static int vt_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
 	return -EINVAL;
 }
 
+static void vt_hardware_unsetup(void)
+{
+	tdx_hardware_unsetup();
+	vmx_hardware_unsetup();
+}
+
 static int vt_vm_init(struct kvm *kvm)
 {
 	if (is_td(kvm))
-		return -EOPNOTSUPP;	/* Not ready to create guest TD yet.
*/ + return tdx_vm_init(kvm); =20 return vmx_vm_init(kvm); } =20 +static void vt_flush_shadow_all_private(struct kvm *kvm) +{ + if (is_td(kvm)) + tdx_mmu_release_hkid(kvm); +} + +static void vt_vm_destroy(struct kvm *kvm) +{ + if (is_td(kvm)) + return; + + vmx_vm_destroy(kvm); +} + +static void vt_vm_free(struct kvm *kvm) +{ + if (is_td(kvm)) + tdx_vm_free(kvm); +} + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { if (!is_td(kvm)) @@ -81,7 +107,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { =20 .check_processor_compatibility =3D vmx_check_processor_compat, =20 - .hardware_unsetup =3D vmx_hardware_unsetup, + .hardware_unsetup =3D vt_hardware_unsetup, =20 .hardware_enable =3D vmx_hardware_enable, .hardware_disable =3D vmx_hardware_disable, @@ -92,7 +118,9 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .vm_size =3D sizeof(struct kvm_vmx), .vm_enable_cap =3D vt_vm_enable_cap, .vm_init =3D vt_vm_init, - .vm_destroy =3D vmx_vm_destroy, + .flush_shadow_all_private =3D vt_flush_shadow_all_private, + .vm_destroy =3D vt_vm_destroy, + .vm_free =3D vt_vm_free, =20 .vcpu_precreate =3D vmx_vcpu_precreate, .vcpu_create =3D vmx_vcpu_create, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 16e207168dc1..928eb47e7379 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -6,6 +6,7 @@ #include "capabilities.h" #include "x86_ops.h" #include "tdx.h" +#include "tdx_ops.h" #include "x86.h" =20 #undef pr_fmt @@ -46,6 +47,271 @@ int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enabl= e_cap *cap) return r; } =20 +struct tdx_info { + u8 nr_tdcs_pages; +}; + +/* Info about the TDX module. */ +static struct tdx_info tdx_info; + +/* + * Some TDX SEAMCALLs (TDH.MNG.CREATE, TDH.PHYMEM.CACHE.WB, + * TDH.MNG.KEY.RECLAIMID, TDH.MNG.KEY.FREEID etc) tries to acquire a globa= l lock + * internally in TDX module. If failed, TDX_OPERAND_BUSY is returned with= out + * spinning or waiting due to a constraint on execution time. It's caller= 's + * responsibility to avoid race (or retry on TDX_OPERAND_BUSY). Use this = mutex + * to avoid race in TDX module because the kernel knows better about sched= uling. + */ +static DEFINE_MUTEX(tdx_lock); +static struct mutex *tdx_mng_key_config_lock; + +static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid) +{ + return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits); +} + +static inline bool is_td_created(struct kvm_tdx *kvm_tdx) +{ + return kvm_tdx->tdr_pa; +} + +static inline void tdx_hkid_free(struct kvm_tdx *kvm_tdx) +{ + tdx_guest_keyid_free(kvm_tdx->hkid); + kvm_tdx->hkid =3D 0; +} + +static inline bool is_hkid_assigned(struct kvm_tdx *kvm_tdx) +{ + return kvm_tdx->hkid > 0; +} + +static void tdx_clear_page(unsigned long page_pa) +{ + const void *zero_page =3D (const void *) __va(page_to_phys(ZERO_PAGE(0))); + void *page =3D __va(page_pa); + unsigned long i; + + if (!cpu_feature_enabled(X86_FEATURE_MOVDIR64B)) { + clear_page(page); + return; + } + + /* + * Zeroing the page is only necessary for systems with MKTME-i: + * when re-assign one page from old keyid to a new keyid, MOVDIR64B is + * required to clear/write the page with new keyid to prevent integrity + * error when read on the page with new keyid. + * + * clflush doesn't flush cache with HKID set. + * The cache line could be poisoned (even without MKTME-i), clear the + * poison bit. + */ + for (i =3D 0; i < PAGE_SIZE; i +=3D 64) + movdir64b(page + i, zero_page); + /* + * MOVDIR64B store uses WC buffer. 
Prevent following memory reads + * from seeing potentially poisoned cache. + */ + __mb(); +} + +static int tdx_reclaim_page(hpa_t pa, bool do_wb, u16 hkid) +{ + struct tdx_module_output out; + u64 err; + + do { + err =3D tdh_phymem_page_reclaim(pa, &out); + /* + * TDH.PHYMEM.PAGE.RECLAIM is allowed only when TD is shutdown. + * state. i.e. destructing TD. + * TDH.PHYMEM.PAGE.RECLAIM requires TDR and target page. + * Because we're destructing TD, it's rare to contend with TDR. + */ + } while (err =3D=3D (TDX_OPERAND_BUSY | TDX_OPERAND_ID_RCX)); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_PHYMEM_PAGE_RECLAIM, err, &out); + return -EIO; + } + + if (do_wb) { + /* + * Only TDR page gets into this path. No contention is expected + * because of the last page of TD. + */ + err =3D tdh_phymem_page_wbinvd(set_hkid_to_hpa(pa, hkid)); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err, NULL); + return -EIO; + } + } + + tdx_clear_page(pa); + return 0; +} + +static void tdx_reclaim_td_page(unsigned long td_page_pa) +{ + if (!td_page_pa) + return; + /* + * TDCX are being reclaimed. TDX module maps TDCX with HKID + * assigned to the TD. Here the cache associated to the TD + * was already flushed by TDH.PHYMEM.CACHE.WB before here, So + * cache doesn't need to be flushed again. + */ + if (tdx_reclaim_page(td_page_pa, false, 0)) + /* + * Leak the page on failure: + * tdx_reclaim_page() returns an error if and only if there's an + * unexpected, fatal error, e.g. a SEAMCALL with bad params, + * incorrect concurrency in KVM, a TDX Module bug, etc. + * Retrying at a later point is highly unlikely to be + * successful. + * No log here as tdx_reclaim_page() already did. + */ + return; + free_page((unsigned long)__va(td_page_pa)); +} + +static int tdx_do_tdh_phymem_cache_wb(void *param) +{ + u64 err =3D 0; + + do { + err =3D tdh_phymem_cache_wb(!!err); + } while (err =3D=3D TDX_INTERRUPTED_RESUMABLE); + + /* Other thread may have done for us. */ + if (err =3D=3D TDX_NO_HKID_READY_TO_WBCACHE) + err =3D TDX_SUCCESS; + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_PHYMEM_CACHE_WB, err, NULL); + return -EIO; + } + + return 0; +} + +void tdx_mmu_release_hkid(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + cpumask_var_t packages; + bool cpumask_allocated; + u64 err; + int ret; + int i; + + if (!is_hkid_assigned(kvm_tdx)) + return; + + if (!is_td_created(kvm_tdx)) + goto free_hkid; + + cpumask_allocated =3D zalloc_cpumask_var(&packages, GFP_KERNEL); + cpus_read_lock(); + for_each_online_cpu(i) { + if (cpumask_allocated && + cpumask_test_and_set_cpu(topology_physical_package_id(i), + packages)) + continue; + + /* + * We can destroy multiple the guest TDs simultaneously. + * Prevent tdh_phymem_cache_wb from returning TDX_BUSY by + * serialization. + */ + mutex_lock(&tdx_lock); + ret =3D smp_call_on_cpu(i, tdx_do_tdh_phymem_cache_wb, NULL, 1); + mutex_unlock(&tdx_lock); + if (ret) + break; + } + cpus_read_unlock(); + free_cpumask_var(packages); + + mutex_lock(&tdx_lock); + err =3D tdh_mng_key_freeid(kvm_tdx->tdr_pa); + mutex_unlock(&tdx_lock); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MNG_KEY_FREEID, err, NULL); + pr_err("tdh_mng_key_freeid failed. HKID %d is leaked.\n", + kvm_tdx->hkid); + return; + } + +free_hkid: + tdx_hkid_free(kvm_tdx); +} + +void tdx_vm_free(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + int i; + + /* Can't reclaim or free TD pages if teardown failed. 
*/ + if (is_hkid_assigned(kvm_tdx)) + return; + + if (kvm_tdx->tdcs_pa) { + for (i =3D 0; i < tdx_info.nr_tdcs_pages; i++) + tdx_reclaim_td_page(kvm_tdx->tdcs_pa[i]); + kfree(kvm_tdx->tdcs_pa); + kvm_tdx->tdcs_pa =3D NULL; + } + + if (!kvm_tdx->tdr_pa) + return; + /* + * TDX module maps TDR with TDX global HKID. TDX module may access TDR + * while operating on TD (Especially reclaiming TDCS). Cache flush with + * TDX global HKID is needed. + */ + if (tdx_reclaim_page(kvm_tdx->tdr_pa, true, tdx_global_keyid)) + return; + + free_page((unsigned long)__va(kvm_tdx->tdr_pa)); + kvm_tdx->tdr_pa =3D 0; +} + +static int tdx_do_tdh_mng_key_config(void *param) +{ + hpa_t *tdr_p =3D param; + u64 err; + + do { + err =3D tdh_mng_key_config(*tdr_p); + + /* + * If it failed to generate a random key, retry it because this + * is typically caused by an entropy error of the CPU's random + * number generator. + */ + } while (err =3D=3D TDX_KEY_GENERATION_FAILED); + + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MNG_KEY_CONFIG, err, NULL); + return -EIO; + } + + return 0; +} + +static int __tdx_td_init(struct kvm *kvm); + +int tdx_vm_init(struct kvm *kvm) +{ + /* + * TDX has its own limit of the number of vcpus in addition to + * KVM_MAX_VCPUS. + */ + kvm->max_vcpus =3D min(kvm->max_vcpus, TDX_MAX_VCPUS); + + /* Place holder for TDX specific logic. */ + return __tdx_td_init(kvm); +} + int tdx_dev_ioctl(void __user *argp) { struct kvm_tdx_capabilities __user *user_caps; @@ -97,6 +363,160 @@ int tdx_dev_ioctl(void __user *argp) return 0; } =20 +static int __tdx_td_init(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + cpumask_var_t packages; + unsigned long *tdcs_pa =3D NULL; + unsigned long tdr_pa =3D 0; + unsigned long va; + int ret, i; + u64 err; + + ret =3D tdx_guest_keyid_alloc(); + if (ret < 0) + return ret; + kvm_tdx->hkid =3D ret; + + va =3D __get_free_page(GFP_KERNEL_ACCOUNT); + if (!va) + goto free_hkid; + tdr_pa =3D __pa(va); + + tdcs_pa =3D kcalloc(tdx_info.nr_tdcs_pages, sizeof(*kvm_tdx->tdcs_pa), + GFP_KERNEL_ACCOUNT | __GFP_ZERO); + if (!tdcs_pa) + goto free_tdr; + for (i =3D 0; i < tdx_info.nr_tdcs_pages; i++) { + va =3D __get_free_page(GFP_KERNEL_ACCOUNT); + if (!va) + goto free_tdcs; + tdcs_pa[i] =3D __pa(va); + } + + if (!zalloc_cpumask_var(&packages, GFP_KERNEL)) { + ret =3D -ENOMEM; + goto free_tdcs; + } + cpus_read_lock(); + /* + * Need at least one CPU of the package to be online in order to + * program all packages for host key id. Check it. + */ + for_each_present_cpu(i) + cpumask_set_cpu(topology_physical_package_id(i), packages); + for_each_online_cpu(i) + cpumask_clear_cpu(topology_physical_package_id(i), packages); + if (!cpumask_empty(packages)) { + ret =3D -EIO; + /* + * Because it's hard for human operator to figure out the + * reason, warn it. + */ + pr_warn("All packages need to have online CPU to create TD. Online CPU a= nd retry.\n"); + goto free_packages; + } + + /* + * Acquire global lock to avoid TDX_OPERAND_BUSY: + * TDH.MNG.CREATE and other APIs try to lock the global Key Owner + * Table (KOT) to track the assigned TDX private HKID. It doesn't spin + * to acquire the lock, returns TDX_OPERAND_BUSY instead, and let the + * caller to handle the contention. This is because of time limitation + * usable inside the TDX module and OS/VMM knows better about process + * scheduling. + * + * APIs to acquire the lock of KOT: + * TDH.MNG.CREATE, TDH.MNG.KEY.FREEID, TDH.MNG.VPFLUSHDONE, and + * TDH.PHYMEM.CACHE.WB. 
+ */ + mutex_lock(&tdx_lock); + err =3D tdh_mng_create(tdr_pa, kvm_tdx->hkid); + mutex_unlock(&tdx_lock); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MNG_CREATE, err, NULL); + ret =3D -EIO; + goto free_packages; + } + kvm_tdx->tdr_pa =3D tdr_pa; + + for_each_online_cpu(i) { + int pkg =3D topology_physical_package_id(i); + + if (cpumask_test_and_set_cpu(pkg, packages)) + continue; + + /* + * Program the memory controller in the package with an + * encryption key associated to a TDX private host key id + * assigned to this TDR. Concurrent operations on same memory + * controller results in TDX_OPERAND_BUSY. Avoid this race by + * mutex. + */ + mutex_lock(&tdx_mng_key_config_lock[pkg]); + ret =3D smp_call_on_cpu(i, tdx_do_tdh_mng_key_config, + &kvm_tdx->tdr_pa, true); + mutex_unlock(&tdx_mng_key_config_lock[pkg]); + if (ret) + break; + } + cpus_read_unlock(); + free_cpumask_var(packages); + if (ret) + goto teardown; + + kvm_tdx->tdcs_pa =3D tdcs_pa; + for (i =3D 0; i < tdx_info.nr_tdcs_pages; i++) { + err =3D tdh_mng_addcx(kvm_tdx->tdr_pa, tdcs_pa[i]); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MNG_ADDCX, err, NULL); + for (i++; i < tdx_info.nr_tdcs_pages; i++) { + free_page((unsigned long)__va(tdcs_pa[i])); + tdcs_pa[i] =3D 0; + } + ret =3D -EIO; + goto teardown; + } + } + + /* + * Note, TDH_MNG_INIT cannot be invoked here. TDH_MNG_INIT requires a de= dicated + * ioctl() to define the configure CPUID values for the TD. + */ + return 0; + + /* + * The sequence for freeing resources from a partially initialized TD + * varies based on where in the initialization flow failure occurred. + * Simply use the full teardown and destroy, which naturally play nice + * with partial initialization. + */ +teardown: + tdx_mmu_release_hkid(kvm); + tdx_vm_free(kvm); + return ret; + +free_packages: + cpus_read_unlock(); + free_cpumask_var(packages); +free_tdcs: + for (i =3D 0; i < tdx_info.nr_tdcs_pages; i++) { + if (tdcs_pa[i]) + free_page((unsigned long)__va(tdcs_pa[i])); + } + kfree(tdcs_pa); + kvm_tdx->tdcs_pa =3D NULL; + +free_tdr: + if (tdr_pa) + free_page((unsigned long)__va(tdr_pa)); + kvm_tdx->tdr_pa =3D 0; +free_hkid: + if (is_hkid_assigned(kvm_tdx)) + tdx_hkid_free(kvm_tdx); + return ret; +} + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { struct kvm_tdx_cmd tdx_cmd; @@ -137,9 +557,11 @@ static int __init tdx_module_setup(void) return ret; } =20 - /* Sanitary check just in case. */ tdsysinfo =3D tdx_get_sysinfo(); WARN_ON(tdsysinfo->num_cpuid_config > TDX_MAX_NR_CPUID_CONFIGS); + tdx_info =3D (struct tdx_info) { + .nr_tdcs_pages =3D tdsysinfo->tdcs_base_size / PAGE_SIZE, + }; =20 pr_info("TDX is supported.\n"); return 0; @@ -153,6 +575,8 @@ bool tdx_is_vm_type_supported(unsigned long type) =20 int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { + int max_pkgs; + int i; int r; =20 if (!enable_ept) { @@ -160,6 +584,14 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x86_= ops) return -EINVAL; } =20 + max_pkgs =3D topology_max_packages(); + tdx_mng_key_config_lock =3D kcalloc(max_pkgs, sizeof(*tdx_mng_key_config_= lock), + GFP_KERNEL); + if (!tdx_mng_key_config_lock) + return -ENOMEM; + for (i =3D 0; i < max_pkgs; i++) + mutex_init(&tdx_mng_key_config_lock[i]); + /* TDX requires VMX. */ r =3D vmxon_all(); if (!r) @@ -168,3 +600,9 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x86_o= ps) =20 return r; } + +void tdx_hardware_unsetup(void) +{ + /* kfree accepts NULL. 
*/ + kfree(tdx_mng_key_config_lock); +} diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 3860aa351bd9..4b790503e43e 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -8,7 +8,11 @@ =20 struct kvm_tdx { struct kvm kvm; - /* TDX specific members follow. */ + + unsigned long tdr_pa; + unsigned long *tdcs_pa; + + int hkid; }; =20 struct vcpu_tdx { diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 71f7dc9ca118..e497a5347329 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -139,17 +139,26 @@ void vmx_setup_mce(struct kvm_vcpu *vcpu); =20 #ifdef CONFIG_INTEL_TDX_HOST int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops); +void tdx_hardware_unsetup(void); bool tdx_is_vm_type_supported(unsigned long type); int tdx_dev_ioctl(void __user *argp); =20 int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap); +int tdx_vm_init(struct kvm *kvm); +void tdx_mmu_release_hkid(struct kvm *kvm); +void tdx_vm_free(struct kvm *kvm); int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); #else static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return= -ENOSYS; } +static inline void tdx_hardware_unsetup(void) {} static inline bool tdx_is_vm_type_supported(unsigned long type) { return f= alse; } static inline int tdx_dev_ioctl(void __user *argp) { return -EOPNOTSUPP; }; =20 static inline int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap= *cap) { return -EINVAL; }; +static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; } +static inline void tdx_mmu_release_hkid(struct kvm *kvm) {} +static inline void tdx_flush_shadow_all_private(struct kvm *kvm) {} +static inline void tdx_vm_free(struct kvm *kvm) {} static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { retur= n -EOPNOTSUPP; } #endif =20 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 318e36535aa6..b2dd5670f552 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -12356,6 +12356,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm) kvm_page_track_cleanup(kvm); kvm_xen_destroy_vm(kvm); kvm_hv_destroy_vm(kvm); + static_call_cond(kvm_x86_vm_free)(kvm); } =20 static void memslot_rmap_free(struct kvm_memory_slot *slot) @@ -12670,6 +12671,13 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, =20 void kvm_arch_flush_shadow_all(struct kvm *kvm) { + /* + * kvm_mmu_zap_all() zaps both private and shared page tables. Before + * tearing down private page tables, TDX requires some TD resources to + * be destroyed (i.e. keyID must have been reclaimed, etc). Invoke + * kvm_x86_flush_shadow_all_private() for this. 
+	 */
+	static_call_cond(kvm_x86_flush_shadow_all_private)(kvm);
 	kvm_mmu_zap_all(kvm);
 }
 
--
2.25.1

From nobody Tue Sep 9 16:53:37 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang, Xiaoyao Li
Subject: [PATCH v12 014/106] KVM: TDX: initialize VM with TDX specific parameters
Date: Mon, 27 Feb 2023 00:22:13 -0800
Message-Id: <914ff128be7fcb777493d8c3e8a94f76e100a54e.1677484918.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

TDX requires additional parameters for a TDX VM for confidential
execution, to protect the confidentiality of its memory contents and its
CPU state from any other software, including the VMM.  The parameters
must be supplied before any vcpu is created: the maximum number of
vcpus, the TSC frequency (which is the same across vcpus and cannot be
changed later), the CPUID values emulated by the TDX module (meaning the
guest can trust those CPUIDs), and the SHA-384 values for measurement.

Add a new subcommand, KVM_TDX_INIT_VM, to pass these parameters for the
TDX guest.  It assigns an encryption key to the TDX guest for memory
encryption; TDX encrypts memory on a per-guest basis.  With this
subcommand the device model passes the per-VM parameters for the TDX
guest: the maximum number of vcpus, the TSC frequency (the TDX guest has
a fixed VM-wide TSC frequency, not per-vcpu, which the TDX guest cannot
change), attributes (production or debug), available extended features
(which are reflected into guest XCR0 and the IA32_XSS MSR), CPUIDs,
SHA-384 measurements, etc.

This subcommand is called before creating any vcpu and hence before
KVM_SET_CPUID2, i.e. the CPUID configuration is not available yet at
that point.  So the CPUID configuration values need to be passed in
struct kvm_tdx_init_vm.  It is the device model's responsibility to make
the CPUID configuration for KVM_TDX_INIT_VM consistent with
KVM_SET_CPUID2.

Signed-off-by: Xiaoyao Li
Signed-off-by: Isaku Yamahata
---
Changes from v11 to v12:
- ABI change: changed the struct kvm_tdx_init_vm layout.
---
 arch/x86/include/asm/tdx.h            |   3 +
 arch/x86/include/uapi/asm/kvm.h       |  24 +++
 arch/x86/kvm/cpuid.c                  |   7 +
 arch/x86/kvm/cpuid.h                  |   2 +
 arch/x86/kvm/vmx/tdx.c                | 247 ++++++++++++++++++++++++--
 arch/x86/kvm/vmx/tdx.h                |  18 ++
 tools/arch/x86/include/uapi/asm/kvm.h |  33 ++++
 7 files changed, 324 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index a10bc61e6008..605af911632b 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -105,6 +105,9 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
 #endif /* CONFIG_INTEL_TDX_GUEST && CONFIG_KVM_GUEST */
 
 #ifdef CONFIG_INTEL_TDX_HOST
+
+/* -1 indicates CPUID leaf with no sub-leaves. */
+#define TDX_CPUID_NO_SUBLEAF ((u32)-1)
 struct tdx_cpuid_config {
 	u32 leaf;
 	u32 sub_leaf;
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 861bbf4546c4..04b3fa91e5b9 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -535,6 +535,7 @@ struct kvm_pmu_event_filter {
 /* Trust Domain eXtension sub-ioctl() commands. */
 enum kvm_tdx_cmd_id {
 	KVM_TDX_CAPABILITIES = 0,
+	KVM_TDX_INIT_VM,
 
 	KVM_TDX_CMD_NR_MAX,
 };
@@ -580,4 +581,27 @@ struct kvm_tdx_capabilities {
 	struct kvm_tdx_cpuid_config cpuid_configs[0];
 };
 
+struct kvm_tdx_init_vm {
+	__u64 attributes;
+	__u64 mrconfigid[6];	/* sha384 digest */
+	__u64 mrowner[6];	/* sha384 digest */
+	__u64 mrownerconfig[6];	/* sha384 digest */
+	/*
+	 * For future extensibility to make sizeof(struct kvm_tdx_init_vm) = 8KB.
+	 * This should be enough given sizeof(TD_PARAMS) = 1024.
+	 * 8KB was chosen because
+	 * sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES(=256) = 8KB.
+	 */
+	__u64 reserved[1004];
+
+	/*
+	 * KVM_TDX_INIT_VM is called before vcpu creation, thus before
+	 * KVM_SET_CPUID2.
+	 * This configuration supersedes KVM_SET_CPUID2s for VCPUs.  The user
+	 * space VMM, e.g. qemu, should make KVM_SET_CPUID2 consistent with
+	 * these values.
+ */ + struct kvm_cpuid2 cpuid; +}; + #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 2a9f1e200dbc..8866e4c1ca2b 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -1378,6 +1378,13 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid, return r; } =20 +struct kvm_cpuid_entry2 *kvm_find_cpuid_entry2( struct kvm_cpuid2 *cpuid, + u32 function, u32 index) +{ + return cpuid_entry2_find(cpuid->entries, cpuid->nent, function, index); +} +EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry2); + struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu, u32 function, u32 index) { diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index b1658c0de847..a0e799297629 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -13,6 +13,8 @@ void kvm_set_cpu_caps(void); =20 void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu); void kvm_update_pv_runtime(struct kvm_vcpu *vcpu); +struct kvm_cpuid_entry2 *kvm_find_cpuid_entry2(struct kvm_cpuid2 *cpuid, + u32 function, u32 index); struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu, u32 function, u32 index); struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 928eb47e7379..b172fcb075b2 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -6,7 +6,6 @@ #include "capabilities.h" #include "x86_ops.h" #include "tdx.h" -#include "tdx_ops.h" #include "x86.h" =20 #undef pr_fmt @@ -298,18 +297,21 @@ static int tdx_do_tdh_mng_key_config(void *param) return 0; } =20 -static int __tdx_td_init(struct kvm *kvm); - int tdx_vm_init(struct kvm *kvm) { + /* + * This function initializes only KVM software construct. It doesn't + * initialize TDX stuff, e.g. TDCS, TDR, TDCX, HKID etc. + * It is handled by KVM_TDX_INIT_VM, __tdx_td_init(). + */ + /* * TDX has its own limit of the number of vcpus in addition to * KVM_MAX_VCPUS. */ kvm->max_vcpus =3D min(kvm->max_vcpus, TDX_MAX_VCPUS); =20 - /* Place holder for TDX specific logic. */ - return __tdx_td_init(kvm); + return 0; } =20 int tdx_dev_ioctl(void __user *argp) @@ -363,9 +365,162 @@ int tdx_dev_ioctl(void __user *argp) return 0; } =20 -static int __tdx_td_init(struct kvm *kvm) +static void setup_tdparams_eptp_controls(struct kvm_cpuid2 *cpuid, struct = td_params *td_params) +{ + const struct kvm_cpuid_entry2 *entry; + int max_pa =3D 36; + + entry =3D kvm_find_cpuid_entry2(cpuid, 0x80000008, 0); + if (entry) + max_pa =3D entry->eax & 0xff; + + td_params->eptp_controls =3D VMX_EPTP_MT_WB; + /* + * No CPU supports 4-level && max_pa > 48. + * "5-level paging and 5-level EPT" section 4.1 4-level EPT + * "4-level EPT is limited to translating 48-bit guest-physical + * addresses." + * cpu_has_vmx_ept_5levels() check is just in case. + */ + if (cpu_has_vmx_ept_5levels() && max_pa > 48) { + td_params->eptp_controls |=3D VMX_EPTP_PWL_5; + td_params->exec_controls |=3D TDX_EXEC_CONTROL_MAX_GPAW; + } else { + td_params->eptp_controls |=3D VMX_EPTP_PWL_4; + } +} + +static void setup_tdparams_cpuids(const struct tdsysinfo_struct *tdsysinfo, + struct kvm_cpuid2 *cpuid, + struct td_params *td_params) +{ + int i; + + /* + * td_params.cpuid_values: The number and the order of cpuid_value must + * be same to the one of struct tdsysinfo.{num_cpuid_config, cpuid_config= s} + * It's assumed that td_params was zeroed. 
+ */ + for (i =3D 0; i < tdsysinfo->num_cpuid_config; i++) { + const struct tdx_cpuid_config *config =3D &tdsysinfo->cpuid_configs[i]; + /* TDX_CPUID_NO_SUBLEAF in TDX CPUID_CONFIG means index =3D 0. */ + u32 index =3D config->sub_leaf =3D=3D TDX_CPUID_NO_SUBLEAF ? 0: config->= sub_leaf; + const struct kvm_cpuid_entry2 *entry =3D + kvm_find_cpuid_entry2(cpuid, config->leaf, index); + struct tdx_cpuid_value *value =3D &td_params->cpuid_values[i]; + + if (!entry) + continue; + + /* + * tdsysinfo.cpuid_configs[].{eax, ebx, ecx, edx} + * bit 1 means it can be configured to zero or one. + * bit 0 means it must be zero. + * Mask out non-configurable bits. + */ + value->eax =3D entry->eax & config->eax; + value->ebx =3D entry->ebx & config->ebx; + value->ecx =3D entry->ecx & config->ecx; + value->edx =3D entry->edx & config->edx; + } +} + +static int setup_tdparams_xfam(struct kvm_cpuid2 *cpuid, struct td_params = *td_params) +{ + const struct kvm_cpuid_entry2 *entry; + u64 guest_supported_xcr0; + u64 guest_supported_xss; + + /* Setup td_params.xfam */ + entry =3D kvm_find_cpuid_entry2(cpuid, 0xd, 0); + if (entry) + guest_supported_xcr0 =3D (entry->eax | ((u64)entry->edx << 32)); + else + guest_supported_xcr0 =3D 0; + guest_supported_xcr0 &=3D kvm_caps.supported_xcr0; + + entry =3D kvm_find_cpuid_entry2(cpuid, 0xd, 1); + if (entry) + guest_supported_xss =3D (entry->ecx | ((u64)entry->edx << 32)); + else + guest_supported_xss =3D 0; + /* PT can be exposed to TD guest regardless of KVM's XSS support */ + guest_supported_xss &=3D (kvm_caps.supported_xss | XFEATURE_MASK_PT); + + td_params->xfam =3D guest_supported_xcr0 | guest_supported_xss; + if (td_params->xfam & XFEATURE_MASK_LBR) { + /* + * TODO: once KVM supports LBR(save/restore LBR related + * registers around TDENTER), remove this guard. + */ + pr_warn("TD doesn't support LBR yet. KVM needs to save/restore " + "IA32_LBR_DEPTH properly.\n"); + return -EOPNOTSUPP; + } + + if (td_params->xfam & XFEATURE_MASK_XTILE) { + /* + * TODO: once KVM supports AMX(save/restore AMX related + * registers around TDENTER), remove this guard. + */ + pr_warn("TD doesn't support AMX yet. KVM needs to save/restore " + "IA32_XFD, IA32_XFD_ERR properly.\n"); + return -EOPNOTSUPP; + } + + return 0; +} + +static int setup_tdparams(struct kvm *kvm, struct td_params *td_params, + struct kvm_tdx_init_vm *init_vm) +{ + struct kvm_cpuid2 *cpuid =3D &init_vm->cpuid; + const struct tdsysinfo_struct *tdsysinfo; + int ret; + + tdsysinfo =3D tdx_get_sysinfo(); + if (!tdsysinfo) + return -ENOTSUPP; + if (kvm->created_vcpus) + return -EBUSY; + + if (td_params->attributes & TDX_TD_ATTRIBUTE_PERFMON) { + /* + * TODO: save/restore PMU related registers around TDENTER. + * Once it's done, remove this guard. + */ + pr_warn("TD doesn't support perfmon yet. 
KVM needs to save/restore " + "host perf registers properly.\n"); + return -EOPNOTSUPP; + } + + td_params->max_vcpus =3D kvm->max_vcpus; + td_params->attributes =3D init_vm->attributes; + td_params->tsc_frequency =3D TDX_TSC_KHZ_TO_25MHZ(kvm->arch.default_tsc_k= hz); + + setup_tdparams_eptp_controls(cpuid, td_params); + setup_tdparams_cpuids(tdsysinfo, cpuid, td_params); + ret =3D setup_tdparams_xfam(cpuid, td_params); + if (ret) + return ret; + +#define MEMCPY_SAME_SIZE(dst, src) \ + do { \ + BUILD_BUG_ON(sizeof(dst) !=3D sizeof(src)); \ + memcpy((dst), (src), sizeof(dst)); \ + } while (0) + + MEMCPY_SAME_SIZE(td_params->mrconfigid, init_vm->mrconfigid); + MEMCPY_SAME_SIZE(td_params->mrowner, init_vm->mrowner); + MEMCPY_SAME_SIZE(td_params->mrownerconfig, init_vm->mrownerconfig); + + return 0; +} + +static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params) { struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + struct tdx_module_output out; cpumask_var_t packages; unsigned long *tdcs_pa =3D NULL; unsigned long tdr_pa =3D 0; @@ -479,10 +634,13 @@ static int __tdx_td_init(struct kvm *kvm) } } =20 - /* - * Note, TDH_MNG_INIT cannot be invoked here. TDH_MNG_INIT requires a de= dicated - * ioctl() to define the configure CPUID values for the TD. - */ + err =3D tdh_mng_init(kvm_tdx->tdr_pa, __pa(td_params), &out); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MNG_INIT, err, &out); + ret =3D -EIO; + goto teardown; + } + return 0; =20 /* @@ -517,6 +675,72 @@ static int __tdx_td_init(struct kvm *kvm) return ret; } =20 +static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + struct kvm_tdx_init_vm *init_vm =3D NULL; + struct td_params *td_params =3D NULL; + int ret; + + BUILD_BUG_ON(sizeof(*init_vm) !=3D 8 * 1024); + BUILD_BUG_ON(sizeof(struct td_params) !=3D 1024); + + if (is_hkid_assigned(kvm_tdx)) + return -EINVAL; + + if (cmd->flags) + return -EINVAL; + + init_vm =3D kzalloc(sizeof(*init_vm) + + sizeof(init_vm->cpuid.entries[0]) * KVM_MAX_CPUID_ENTRIES, + GFP_KERNEL); + if (!init_vm) + return -ENOMEM; + if (copy_from_user(init_vm, (void __user *)cmd->data, sizeof(*init_vm))) { + ret =3D -EFAULT; + goto out; + } + if (init_vm->cpuid.nent > KVM_MAX_CPUID_ENTRIES) { + ret =3D -E2BIG; + goto out; + } + if (copy_from_user(init_vm->cpuid.entries, + (void __user *)cmd->data + sizeof(*init_vm), + sizeof(init_vm->cpuid.entries[0]) * init_vm->cpuid.nent)) { + ret =3D -EFAULT; + goto out; + } + + if (init_vm->cpuid.padding) { + ret =3D -EINVAL; + goto out; + } + + td_params =3D kzalloc(sizeof(struct td_params), GFP_KERNEL); + if (!td_params) { + ret =3D -ENOMEM; + goto out; + } + + ret =3D setup_tdparams(kvm, td_params, init_vm); + if (ret) + goto out; + + ret =3D __tdx_td_init(kvm, td_params); + if (ret) + goto out; + + kvm_tdx->tsc_offset =3D td_tdcs_exec_read64(kvm_tdx, TD_TDCS_EXEC_TSC_OFF= SET); + kvm_tdx->attributes =3D td_params->attributes; + kvm_tdx->xfam =3D td_params->xfam; + +out: + /* kfree() accepts NULL. 
*/ + kfree(init_vm); + kfree(td_params); + return ret; +} + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { struct kvm_tdx_cmd tdx_cmd; @@ -530,6 +754,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) mutex_lock(&kvm->lock); =20 switch (tdx_cmd.id) { + case KVM_TDX_INIT_VM: + r =3D tdx_td_init(kvm, &tdx_cmd); + break; default: r =3D -EINVAL; goto out; diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 4b790503e43e..1e00e75b1c5e 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -12,7 +12,11 @@ struct kvm_tdx { unsigned long tdr_pa; unsigned long *tdcs_pa; =20 + u64 attributes; + u64 xfam; int hkid; + + u64 tsc_offset; }; =20 struct vcpu_tdx { @@ -39,6 +43,20 @@ static inline struct vcpu_tdx *to_tdx(struct kvm_vcpu *v= cpu) { return container_of(vcpu, struct vcpu_tdx, vcpu); } + +static __always_inline u64 td_tdcs_exec_read64(struct kvm_tdx *kvm_tdx, u3= 2 field) +{ + struct tdx_module_output out; + u64 err; + + err =3D tdh_mng_rd(kvm_tdx->tdr_pa, TDCS_EXEC(field), &out); + if (unlikely(err)) { + pr_err("TDH_MNG_RD[EXEC.0x%x] failed: 0x%llx\n", field, err); + return 0; + } + return out.r8; +} + #else struct kvm_tdx { struct kvm kvm; diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include= /uapi/asm/kvm.h index 861bbf4546c4..85cd178c10a7 100644 --- a/tools/arch/x86/include/uapi/asm/kvm.h +++ b/tools/arch/x86/include/uapi/asm/kvm.h @@ -535,6 +535,7 @@ struct kvm_pmu_event_filter { /* Trust Domain eXtension sub-ioctl() commands. */ enum kvm_tdx_cmd_id { KVM_TDX_CAPABILITIES =3D 0, + KVM_TDX_INIT_VM, =20 KVM_TDX_CMD_NR_MAX, }; @@ -580,4 +581,36 @@ struct kvm_tdx_capabilities { struct kvm_tdx_cpuid_config cpuid_configs[0]; }; =20 +struct kvm_tdx_init_vm { + __u64 attributes; + __u32 max_vcpus; + __u32 padding; + __u64 mrconfigid[6]; /* sha384 digest */ + __u64 mrowner[6]; /* sha384 digest */ + __u64 mrownerconfig[6]; /* sha348 digest */ + union { + /* + * KVM_TDX_INIT_VM is called before vcpu creation, thus before + * KVM_SET_CPUID2. CPUID configurations needs to be passed. + * + * This configuration supersedes KVM_SET_CPUID{,2}. + * The user space VMM, e.g. qemu, should make them consistent + * with this values. + * sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES(256) + * =3D 8KB. + */ + struct { + struct kvm_cpuid2 cpuid; + /* 8KB with KVM_MAX_CPUID_ENTRIES. */ + struct kvm_cpuid_entry2 entries[]; + }; + /* + * For future extensibility. + * The size(struct kvm_tdx_init_vm) =3D 16KB. 
+ * This should be enough given sizeof(TD_PARAMS) =3D 1024 + */ + __u64 reserved[2028]; + }; +}; + #endif /* _ASM_X86_KVM_H */ --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7273DC64ED8 for ; Mon, 27 Feb 2023 08:25:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230370AbjB0IYu (ORCPT ); Mon, 27 Feb 2023 03:24:50 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55030 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229560AbjB0IYM (ORCPT ); Mon, 27 Feb 2023 03:24:12 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 95142EC50; Mon, 27 Feb 2023 00:24:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486249; x=1709022249; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=aUkxk0K+ujM2vXAltuKHLqNEsWK1nhlL3i5SZD9Eemg=; b=k8xV4BU8nj7T4b4gMDQlkzLq1nrvUBY7buCrApA6MMnWZ/2X+b8rBpNX eDih0Yiw/ACOCKNXfsgy1wHHlLIlygNq04bDzQf1Q6hL6wikApiDhzDgF A5YpUbPkL6fvjY0k4gh2ARXCIs4lEv0Ag1m4dWuTAqzhESTr06ovF9DsG GNMbZWQVcD/Gp3gF/GX/9k3HJmrOK4OjY1EW7lDMg/L2c+zvM7k+bXiNr 3MgoOXtyl+yDDSEq4fInCxqKIJnkB3jxNIsxOo8lRbjWBBqDg4ASydmUv Pwe5he6lQetZPFDV7UtaydKIf+TY0pKAK0rP/seA163X8FbvBDog/0Nuu g==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608719" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608719" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:03 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242060" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242060" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:03 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 015/106] KVM: TDX: Make pmu_intel.c ignore guest TD case Date: Mon, 27 Feb 2023 00:22:14 -0800 Message-Id: <0fbac4f1a4c25536cd53ad2194ee1b25d17d1702.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Because TDX KVM doesn't support PMU yet (it's future work of TDX KVM support as another patch series) and pmu_intel.c touches vmx specific structure in vcpu initialization, as workaround add dummy structure to struct vcpu_tdx and pmu_intel.c can ignore TDX case. 
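In other words, the workaround boils down to the following standalone
sketch (is_td_vcpu()/to_tdx()/to_vmx() are the helpers this series already
defines; the comments here are explanatory, the authoritative version is
in the diff below):

	/*
	 * The accessor dispatches on the guest type instead of assuming
	 * the container is vcpu_vmx.  A TD vcpu carries only a dummy
	 * lbr_desc, but returning it gives pmu_intel.c valid storage to
	 * touch, so the common vcpu-init path cannot corrupt the memory
	 * next to vcpu_tdx.
	 */
	struct lbr_desc *vcpu_to_lbr_desc(struct kvm_vcpu *vcpu)
	{
	#ifdef CONFIG_INTEL_TDX_HOST
		if (is_td_vcpu(vcpu))
			return &to_tdx(vcpu)->lbr_desc;
	#endif
		return &to_vmx(vcpu)->lbr_desc;
	}

Returning real (if dummy) storage is cheaper than auditing every
pmu_intel.c path for a TD special case; the added is_td_vcpu() checks then
turn the LBR paths into no-ops for TDs.
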
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/pmu_intel.c | 46 +++++++++++++++++++++++++++++++++++- arch/x86/kvm/vmx/pmu_intel.h | 28 ++++++++++++++++++++++ arch/x86/kvm/vmx/tdx.h | 8 ++++++- arch/x86/kvm/vmx/vmx.c | 2 +- arch/x86/kvm/vmx/vmx.h | 32 +------------------------ 5 files changed, 82 insertions(+), 34 deletions(-) create mode 100644 arch/x86/kvm/vmx/pmu_intel.h diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index efce9ad70e4e..39f43b0290c5 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -19,6 +19,7 @@ #include "lapic.h" #include "nested.h" #include "pmu.h" +#include "tdx.h" =20 #define MSR_PMC_FULL_WIDTH_BIT (MSR_IA32_PMC0 - MSR_IA32_PERFCTR0) =20 @@ -37,6 +38,26 @@ static struct kvm_event_hw_type_mapping intel_arch_event= s[] =3D { /* mapping between fixed pmc index and intel_arch_events array */ static int fixed_pmc_events[] =3D {1, 0, 7}; =20 +struct lbr_desc *vcpu_to_lbr_desc(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_INTEL_TDX_HOST + if (is_td_vcpu(vcpu)) + return &to_tdx(vcpu)->lbr_desc; +#endif + + return &to_vmx(vcpu)->lbr_desc; +} + +struct x86_pmu_lbr *vcpu_to_lbr_records(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_INTEL_TDX_HOST + if (is_td_vcpu(vcpu)) + return &to_tdx(vcpu)->lbr_desc.records; +#endif + + return &to_vmx(vcpu)->lbr_desc.records; +} + static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data) { struct kvm_pmc *pmc; @@ -169,6 +190,23 @@ static inline struct kvm_pmc *get_fw_gp_pmc(struct kvm= _pmu *pmu, u32 msr) return get_gp_pmc(pmu, msr, MSR_IA32_PMC0); } =20 +bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return false; + return cpuid_model_is_consistent(vcpu); +} + +bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu) +{ + struct x86_pmu_lbr *lbr =3D vcpu_to_lbr_records(vcpu); + + if (is_td_vcpu(vcpu)) + return false; + + return lbr->nr && (vcpu_get_perf_capabilities(vcpu) & PMU_CAP_LBR_FMT); +} + static bool intel_pmu_is_valid_lbr_msr(struct kvm_vcpu *vcpu, u32 index) { struct x86_pmu_lbr *records =3D vcpu_to_lbr_records(vcpu); @@ -279,6 +317,9 @@ int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *v= cpu) PERF_SAMPLE_BRANCH_USER, }; =20 + if (WARN_ON_ONCE(is_td_vcpu(vcpu))) + return 0; + if (unlikely(lbr_desc->event)) { __set_bit(INTEL_PMC_IDX_FIXED_VLBR, pmu->pmc_in_use); return 0; @@ -588,7 +629,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu) INTEL_PMC_MAX_GENERIC, pmu->nr_arch_fixed_counters); =20 perf_capabilities =3D vcpu_get_perf_capabilities(vcpu); - if (cpuid_model_is_consistent(vcpu) && + if (intel_pmu_lbr_is_compatible(vcpu) && (perf_capabilities & PMU_CAP_LBR_FMT)) x86_perf_get_lbr(&lbr_desc->records); else @@ -644,6 +685,9 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu) struct kvm_pmc *pmc =3D NULL; int i; =20 + if (is_td_vcpu(vcpu)) + return; + for (i =3D 0; i < KVM_INTEL_PMC_MAX_GENERIC; i++) { pmc =3D &pmu->gp_counters[i]; =20 diff --git a/arch/x86/kvm/vmx/pmu_intel.h b/arch/x86/kvm/vmx/pmu_intel.h new file mode 100644 index 000000000000..66bba47c1269 --- /dev/null +++ b/arch/x86/kvm/vmx/pmu_intel.h @@ -0,0 +1,28 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __KVM_X86_VMX_PMU_INTEL_H +#define __KVM_X86_VMX_PMU_INTEL_H + +struct lbr_desc *vcpu_to_lbr_desc(struct kvm_vcpu *vcpu); +struct x86_pmu_lbr *vcpu_to_lbr_records(struct kvm_vcpu *vcpu); + +bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu); +bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu); +int intel_pmu_create_guest_lbr_event(struct kvm_vcpu 
*vcpu); + +struct lbr_desc { + /* Basic info about guest LBR records. */ + struct x86_pmu_lbr records; + + /* + * Emulate LBR feature via passthrough LBR registers when the + * per-vcpu guest LBR event is scheduled on the current pcpu. + * + * The records may be inaccurate if the host reclaims the LBR. + */ + struct perf_event *event; + + /* True if LBRs are marked as not intercepted in the MSR bitmap */ + bool msr_passthrough; +}; + +#endif /* __KVM_X86_VMX_PMU_INTEL_H */ diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 1e00e75b1c5e..5728820fed5e 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -4,6 +4,7 @@ =20 #ifdef CONFIG_INTEL_TDX_HOST =20 +#include "pmu_intel.h" #include "tdx_ops.h" =20 struct kvm_tdx { @@ -21,7 +22,12 @@ struct kvm_tdx { =20 struct vcpu_tdx { struct kvm_vcpu vcpu; - /* TDX specific members follow. */ + + /* + * Dummy to make pmu_intel not corrupt memory. + * TODO: Support PMU for TDX. Future work. + */ + struct lbr_desc lbr_desc; }; =20 static inline bool is_td(struct kvm *kvm) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index bddbdd2988f4..5d2ff4d964bd 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2434,7 +2434,7 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_dat= a *msr_info) if ((data & PMU_CAP_LBR_FMT) !=3D (kvm_caps.supported_perf_cap & PMU_CAP_LBR_FMT)) return 1; - if (!cpuid_model_is_consistent(vcpu)) + if (!intel_pmu_lbr_is_compatible(vcpu)) return 1; } if (data & PERF_CAP_PEBS_FORMAT) { diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index a3da84f4ea45..d49d0ace9fb8 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -11,6 +11,7 @@ #include "capabilities.h" #include "../kvm_cache_regs.h" #include "posted_intr.h" +#include "pmu_intel.h" #include "vmcs.h" #include "vmx_ops.h" #include "../cpuid.h" @@ -105,22 +106,6 @@ static inline bool intel_pmu_has_perf_global_ctrl(stru= ct kvm_pmu *pmu) return pmu->version > 1; } =20 -struct lbr_desc { - /* Basic info about guest LBR records. */ - struct x86_pmu_lbr records; - - /* - * Emulate LBR feature via passthrough LBR registers when the - * per-vcpu guest LBR event is scheduled on the current pcpu. - * - * The records may be inaccurate if the host reclaims the LBR. - */ - struct perf_event *event; - - /* True if LBRs are marked as not intercepted in the MSR bitmap */ - bool msr_passthrough; -}; - /* * The nested_vmx structure is part of vcpu_vmx, and holds information we = need * for correct emulation of VMX (i.e., nested VMX) on this vcpu. 
@@ -650,21 +635,6 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu = *vcpu) return container_of(vcpu, struct vcpu_vmx, vcpu); } =20 -static inline struct lbr_desc *vcpu_to_lbr_desc(struct kvm_vcpu *vcpu) -{ - return &to_vmx(vcpu)->lbr_desc; -} - -static inline struct x86_pmu_lbr *vcpu_to_lbr_records(struct kvm_vcpu *vcp= u) -{ - return &vcpu_to_lbr_desc(vcpu)->records; -} - -static inline bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu) -{ - return !!vcpu_to_lbr_records(vcpu)->nr; -} - void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu); int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu); void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu); --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 46B86C64ED6 for ; Mon, 27 Feb 2023 08:25:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230388AbjB0IYy (ORCPT ); Mon, 27 Feb 2023 03:24:54 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55116 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230189AbjB0IYN (ORCPT ); Mon, 27 Feb 2023 03:24:13 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3F0D51165A; Mon, 27 Feb 2023 00:24:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486250; x=1709022250; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=zCc6NW7WYHgo5usOtP96CBCwLP0H+JhEtaoEGU4OuyU=; b=Z4gyfwj8+8Uz0vJ1EPp4mowFiOgugKHuZlDNi2f/QyBQJ4Hgz8D69VHr haa2DAfVExPtoqg9pRYHdXMNVu5FGrqzXv8Tu4tc2pigS1zNY6I1A5xqk 29PuT17xtLkEALFhaGp3r74aqIiiuPhcL+J0C6bTL5Itjg8NsgFE25QOU nRbofBFfwwKEy1mLt1kzTU1YpPtbGahI9aWaLDV/3T7tuk+CLLxg22yBg GlncoabpYGTAWZyRrhfOC3ZU6d/0T/D9nzzBEXlZrIcuvCnsMeQI6s3TK LxTw3o4O2WrceTPSiD1h4eJYRsP9KcxN5uVwkZj9hFDUhV8seVPgHndzU w==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608724" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608724" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:03 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242067" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242067" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:03 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 016/106] KVM: TDX: Refuse to unplug the last cpu on the package Date: Mon, 27 Feb 2023 00:22:15 -0800 Message-Id: <2569a9b652be6e69ef69b5a930cdb170f773b4aa.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata In order to reclaim TDX HKID, (i.e. 
when deleting a guest TD), KVM needs to call TDH.PHYMEM.PAGE.WBINVD on all
packages.  If there is an active TDX HKID, refuse to offline the last
online cpu of a package so that at least one CPU per package stays online.
Add an arch callback for cpu offlining.  Because the TDX 1.0 spec doesn't
support suspend, this also refuses suspend if TDs are running; if no TD is
running, suspend is allowed.

Suggested-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/vmx/main.c            |  1 +
 arch/x86/kvm/vmx/tdx.c             | 43 +++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/x86_ops.h         |  2 ++
 arch/x86/kvm/x86.c                 |  5 ++++
 include/linux/kvm_host.h           |  1 +
 virt/kvm/kvm_main.c                | 12 +++++++--
 8 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index c30d2d2ad686..f763981b7dbc 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -18,6 +18,7 @@ KVM_X86_OP(check_processor_compatibility)
 KVM_X86_OP(hardware_enable)
 KVM_X86_OP(hardware_disable)
 KVM_X86_OP(hardware_unsetup)
+KVM_X86_OP_OPTIONAL_RET0(offline_cpu)
 KVM_X86_OP(has_emulated_msr)
 KVM_X86_OP(vcpu_after_set_cpuid)
 KVM_X86_OP(is_vm_type_supported)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f4e82ee3d668..5ca84fd5bd43 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1543,6 +1543,7 @@ struct kvm_x86_ops {
 	int (*hardware_enable)(void);
 	void (*hardware_disable)(void);
 	void (*hardware_unsetup)(void);
+	int (*offline_cpu)(void);
 	bool (*has_emulated_msr)(struct kvm *kvm, u32 index);
 	void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu);
 
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index cdc73c09bf0b..a15ee25d47e0 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -108,6 +108,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.check_processor_compatibility = vmx_check_processor_compat,
 
 	.hardware_unsetup = vt_hardware_unsetup,
+	.offline_cpu = tdx_offline_cpu,
 
 	.hardware_enable = vmx_hardware_enable,
 	.hardware_disable = vmx_hardware_disable,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index b172fcb075b2..8d657bacc050 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -63,6 +63,7 @@ static struct tdx_info tdx_info;
  */
 static DEFINE_MUTEX(tdx_lock);
 static struct mutex *tdx_mng_key_config_lock;
+static atomic_t nr_configured_hkid;
 
 static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid)
 {
@@ -238,7 +239,8 @@ void tdx_mmu_release_hkid(struct kvm *kvm)
 		pr_err("tdh_mng_key_freeid failed. HKID %d is leaked.\n",
 		       kvm_tdx->hkid);
 		return;
-	}
+	} else
+		atomic_dec(&nr_configured_hkid);
 
 free_hkid:
 	tdx_hkid_free(kvm_tdx);
@@ -615,6 +617,8 @@ static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params)
 		if (ret)
 			break;
 	}
+	if (!ret)
+		atomic_inc(&nr_configured_hkid);
 	cpus_read_unlock();
 	free_cpumask_var(packages);
 	if (ret)
@@ -833,3 +837,40 @@ void tdx_hardware_unsetup(void)
 	/* kfree accepts NULL. */
 	kfree(tdx_mng_key_config_lock);
 }
+
+int tdx_offline_cpu(void)
+{
+	int curr_cpu = smp_processor_id();
+	cpumask_var_t packages;
+	int ret = 0;
+	int i;
+
+	/* No TD is running. Allow any cpu to be offlined. */
+	if (!atomic_read(&nr_configured_hkid))
+		return 0;
+
+	/*
+	 * In order to reclaim TDX HKID (i.e.
when deleting guest TD), need to + * call TDH.PHYMEM.PAGE.WBINVD on all packages to program all memory + * controller with pconfig. If we have active TDX HKID, refuse to + * offline the last online cpu. + */ + if (!zalloc_cpumask_var(&packages, GFP_KERNEL)) + return -ENOMEM; + for_each_online_cpu(i) { + if (i !=3D curr_cpu) + cpumask_set_cpu(topology_physical_package_id(i), packages); + } + /* Check if this cpu is the last online cpu of this package. */ + if (!cpumask_test_cpu(topology_physical_package_id(curr_cpu), packages)) + ret =3D -EBUSY; + free_cpumask_var(packages); + if (ret) + /* + * Because it's hard for human operator to understand the + * reason, warn it. + */ + pr_warn_ratelimited("TDX requires all packages to have an online CPU. " + "Delete all TDs in order to offline all CPUs of a package.\n"); + return ret; +} diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index e497a5347329..4960e7d58add 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -142,6 +142,7 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x86_o= ps); void tdx_hardware_unsetup(void); bool tdx_is_vm_type_supported(unsigned long type); int tdx_dev_ioctl(void __user *argp); +int tdx_offline_cpu(void); =20 int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap); int tdx_vm_init(struct kvm *kvm); @@ -153,6 +154,7 @@ static inline int tdx_hardware_setup(struct kvm_x86_ops= *x86_ops) { return -ENOS static inline void tdx_hardware_unsetup(void) {} static inline bool tdx_is_vm_type_supported(unsigned long type) { return f= alse; } static inline int tdx_dev_ioctl(void __user *argp) { return -EOPNOTSUPP; }; +static inline int tdx_offline_cpu(void) { return 0; } =20 static inline int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_cap= *cap) { return -EINVAL; }; static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index b2dd5670f552..a2a2d62d490a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -12119,6 +12119,11 @@ void kvm_arch_hardware_disable(void) drop_user_return_notifiers(); } =20 +int kvm_arch_offline_cpu(unsigned int cpu) +{ + return static_call(kvm_x86_offline_cpu)(); +} + bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu) { return vcpu->kvm->arch.bsp_vcpu_id =3D=3D vcpu->vcpu_id; diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 7b0bef248dd8..f6470338d5fa 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1478,6 +1478,7 @@ static inline void kvm_create_vcpu_debugfs(struct kvm= _vcpu *vcpu) {} int kvm_arch_hardware_enable(void); void kvm_arch_hardware_disable(void); #endif +int kvm_arch_offline_cpu(unsigned int cpu); int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu); bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu); int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 3f1a55834440..f8495e27d210 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -5509,13 +5509,21 @@ static void hardware_disable_nolock(void *junk) __this_cpu_write(hardware_enabled, false); } =20 +__weak int kvm_arch_offline_cpu(unsigned int cpu) +{ + return 0; +} + static int kvm_offline_cpu(unsigned int cpu) { + int r =3D 0; + mutex_lock(&kvm_lock); - if (kvm_usage_count) + r =3D kvm_arch_offline_cpu(cpu); + if (!r && kvm_usage_count) hardware_disable_nolock(NULL); mutex_unlock(&kvm_lock); - return 0; + return r; } =20 static void hardware_disable_all_nolock(void) --=20 2.25.1 From 
nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 48EEDC7EE30 for ; Mon, 27 Feb 2023 08:25:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230510AbjB0IZD (ORCPT ); Mon, 27 Feb 2023 03:25:03 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55130 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230191AbjB0IYN (ORCPT ); Mon, 27 Feb 2023 03:24:13 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 43FA01C305; Mon, 27 Feb 2023 00:24:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486251; x=1709022251; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=yLb9uSg9ibzit6JQaiXE1CHceF2wSNvx507S82c8rJA=; b=XFTu2Dj7grgiZpzJz1xJq0jOi7Pw/oGRX71UTROvGTM0HVcBO2m8HGGp fqjeGvHY8k/XqOwUvpgwBNAd74wKD07zw4jJC6cmglB4dK6S7s2SJHHxb BtmqonHpAkQYFSHLTSATaAkrktgsLaP0S0l5bPb3sOk9Ebo8Rd6cd3zUI QssHUkQQQ8Bw/Js/7qDbQ4HFipCKSEf1z4l5Kn7+k+yO+HUkf83n2nRqg SO0Hr3vd9M+MiwRGsh0DFr8H98aJAPrHR1kNCaZ+GAty00gjtgj6Eis8o IkIQaYs1GvIummS4WFNGLt7C1aIeLKH32OGZt5CiMgEKKwBuD6RX0GCnF w==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608728" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608728" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:04 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242074" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242074" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:03 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 017/106] [MARKER] The start of TDX KVM patch series: TD vcpu creation/destruction Date: Mon, 27 Feb 2023 00:22:16 -0800 Message-Id: <1764bee4547d53ee40137737a82b9dd712fe307b.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This empty commit is to mark the start of patch series of TD vcpu creation/destruction. Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/intel-tdx-layer-status.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentat= ion/virt/kvm/intel-tdx-layer-status.rst index 221372cfb4af..a4ee04271d66 100644 --- a/Documentation/virt/kvm/intel-tdx-layer-status.rst +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -9,15 +9,15 @@ Layer status What qemu can do ---------------- - TDX VM TYPE is exposed to Qemu. -- Qemu can try to create VM of TDX VM type and then fails. +- Qemu can create/destroy guest of TDX vm type. 
=20 Patch Layer status ------------------ Patch layer Status * TDX, VMX coexistence: Applied * TDX architectural definitions: Applied -* TD VM creation/destruction: Applying -* TD vcpu creation/destruction: Not yet +* TD VM creation/destruction: Applied +* TD vcpu creation/destruction: Applying * TDX EPT violation: Not yet * TD finalization: Not yet * TD vcpu enter/exit: Not yet --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AD23EC64ED8 for ; Mon, 27 Feb 2023 08:25:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231132AbjB0IZP (ORCPT ); Mon, 27 Feb 2023 03:25:15 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55116 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230207AbjB0IYO (ORCPT ); Mon, 27 Feb 2023 03:24:14 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 38CC61BAE7; Mon, 27 Feb 2023 00:24:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486252; x=1709022252; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=WVUeIvYEdAcjqa2VkjlwVtG3JQgy8obVJnSSJx2b6f0=; b=UjUaFabEw+UW4unEbAWJbhhozsKkAr1HN37nOagQROPp+AhNCCCzq/Da qfHEdg9Pk6eOgN/H4IG747BxNWQ6pldCIWKzEfLd+TMpi9d/rSlN8ofcw yivIPCwR97vCgi2B0MJThjB9v6zfTx303z7q3NqnTM5A5COj5s/9dTTqg zpF6cpN6s4rDPrdsMmQlRACgdgzV/RpYgDkkaCHPB+2jp4I+IIKjV4hEF jXks1lLVdxuvhzzGPjqEhAaXnrfoDxIsHQC3ag4WHDDFlkiMxjKbhL6W6 w+JqlNbjBkGw+qtmRBNDS9hFIgKOtY/Gjk66VIr9xuvMryxss6RHorEYd g==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608734" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608734" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:04 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242079" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242079" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:03 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 018/106] KVM: TDX: allocate/free TDX vcpu structure Date: Mon, 27 Feb 2023 00:22:17 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata The next step of TDX guest creation is to create vcpu. Allocate TDX vcpu structures, initialize it that doesn't require TDX SEAMCALL. Allocate pages of TDX vcpu for the TDX module. Actual donation TDX vcpu pages to the TDX module is not done yet. In the case of the conventional case, cpuid is empty at the initialization. and cpuid is configured after the vcpu initialization. 
Because TDX supports only X2APIC mode, the CPUID is forcibly initialized
to enable X2APIC during vcpu initialization, in the vcpu_reset method:
kvm_arch_vcpu_create() also initializes the KVM MMU, which depends on the
local APIC settings, so the APIC must be switched to X2APIC mode by the
vcpu_reset method.

Signed-off-by: Isaku Yamahata
---
Changes v11 -> v12:
- add more comments in tdx_vcpu_reset().
- use KVM_BUG_ON()

Changes v10 -> v11:
- NULL check of kvmalloc_array() in tdx_vcpu_reset. Move it to
  tdx_vcpu_create()
---
 arch/x86/kvm/vmx/main.c    | 44 ++++++++++++++++++--
 arch/x86/kvm/vmx/tdx.c     | 82 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/x86_ops.h | 10 +++++
 arch/x86/kvm/x86.c         |  2 +
 4 files changed, 134 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index a15ee25d47e0..904b98a9a7ed 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -83,6 +83,42 @@ static void vt_vm_free(struct kvm *kvm)
 	tdx_vm_free(kvm);
 }
 
+static int vt_vcpu_precreate(struct kvm *kvm)
+{
+	if (is_td(kvm))
+		return 0;
+
+	return vmx_vcpu_precreate(kvm);
+}
+
+static int vt_vcpu_create(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu))
+		return tdx_vcpu_create(vcpu);
+
+	return vmx_vcpu_create(vcpu);
+}
+
+static void vt_vcpu_free(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu)) {
+		tdx_vcpu_free(vcpu);
+		return;
+	}
+
+	vmx_vcpu_free(vcpu);
+}
+
+static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
+{
+	if (is_td_vcpu(vcpu)) {
+		tdx_vcpu_reset(vcpu, init_event);
+		return;
+	}
+
+	vmx_vcpu_reset(vcpu, init_event);
+}
+
 static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	if (!is_td(kvm))
@@ -123,10 +159,10 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.vm_destroy = vt_vm_destroy,
 	.vm_free = vt_vm_free,
 
-	.vcpu_precreate = vmx_vcpu_precreate,
-	.vcpu_create = vmx_vcpu_create,
-	.vcpu_free = vmx_vcpu_free,
-	.vcpu_reset = vmx_vcpu_reset,
+	.vcpu_precreate = vt_vcpu_precreate,
+	.vcpu_create = vt_vcpu_create,
+	.vcpu_free = vt_vcpu_free,
+	.vcpu_reset = vt_vcpu_reset,
 
 	.prepare_switch_to_guest = vmx_prepare_switch_to_guest,
 	.vcpu_load = vmx_vcpu_load,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 8d657bacc050..e6c83634582e 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -316,6 +316,88 @@ int tdx_vm_init(struct kvm *kvm)
 	return 0;
 }
 
+int tdx_vcpu_create(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *e;
+
+	/*
+	 * On vcpu creation, the cpuid entries are blank.  Forcibly enable
+	 * the X2APIC feature to allow X2APIC.
+	 * Because vcpu_reset() can't return an error, allocation is done here.
+	 */
+	WARN_ON_ONCE(vcpu->arch.cpuid_entries);
+	WARN_ON_ONCE(vcpu->arch.cpuid_nent);
+	/*
+	 * Because vcpu->arch.cpuid_entries is freed by kvfree(), use kvmalloc
+	 * same as kvm_vcpu_ioctl_set_cpuid().
+	 * In the error case, the memory freeing is done by kvm_arch_destroy_vm()
+	 * => kvm_destroy_vcpus() => kvm_vcpu_destroy() =>
+	 * kvm_arch_vcpu_destroy().
+	 */
+	e = kvmalloc_array(1, sizeof(*e), GFP_KERNEL_ACCOUNT);
+	if (!e)
+		return -ENOMEM;
+	*e = (struct kvm_cpuid_entry2) {
+		.function = 1,	/* Features for X2APIC */
+		.index = 0,
+		.eax = 0,
+		.ebx = 0,
+		.ecx = 1ULL << 21,	/* X2APIC */
+		.edx = 0,
+	};
+	vcpu->arch.cpuid_entries = e;
+	vcpu->arch.cpuid_nent = 1;
+
+	/* TDX only supports x2APIC, which requires an in-kernel local APIC.
*/ + if (!vcpu->arch.apic) + return -EINVAL; + + fpstate_set_confidential(&vcpu->arch.guest_fpu); + + vcpu->arch.efer =3D EFER_SCE | EFER_LME | EFER_LMA | EFER_NX; + + vcpu->arch.cr0_guest_owned_bits =3D -1ul; + vcpu->arch.cr4_guest_owned_bits =3D -1ul; + + vcpu->arch.tsc_offset =3D to_kvm_tdx(vcpu->kvm)->tsc_offset; + vcpu->arch.l1_tsc_offset =3D vcpu->arch.tsc_offset; + vcpu->arch.guest_state_protected =3D + !(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTRIBUTE_DEBUG); + + return 0; +} + +void tdx_vcpu_free(struct kvm_vcpu *vcpu) +{ + /* This is stub for now. More logic will come. */ +} + +void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) +{ + struct msr_data apic_base_msr; + + /* Ignore INIT silently because TDX doesn't support INIT event. */ + if (init_event) + return; + + /* + * TDX requires X2APIC. kvm_arch_vcpu_reset() initialize KVM mmu that + * depends on local apic setting. Set local apic mode before it. + */ + apic_base_msr.data =3D APIC_DEFAULT_PHYS_BASE | LAPIC_MODE_X2APIC; + if (kvm_vcpu_is_reset_bsp(vcpu)) + apic_base_msr.data |=3D MSR_IA32_APICBASE_BSP; + apic_base_msr.host_initiated =3D true; + if (KVM_BUG_ON(kvm_set_apic_base(vcpu, &apic_base_msr), vcpu->kvm)) + return; + + /* + * Don't update mp_state to runnable because more initialization + * is needed by TDX_VCPU_INIT. + */ + return; +} + int tdx_dev_ioctl(void __user *argp) { struct kvm_tdx_capabilities __user *user_caps; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 4960e7d58add..b7708e725e93 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -148,7 +148,12 @@ int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enab= le_cap *cap); int tdx_vm_init(struct kvm *kvm); void tdx_mmu_release_hkid(struct kvm *kvm); void tdx_vm_free(struct kvm *kvm); + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); + +int tdx_vcpu_create(struct kvm_vcpu *vcpu); +void tdx_vcpu_free(struct kvm_vcpu *vcpu); +void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event); #else static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return= -ENOSYS; } static inline void tdx_hardware_unsetup(void) {} @@ -161,7 +166,12 @@ static inline int tdx_vm_init(struct kvm *kvm) { retur= n -EOPNOTSUPP; } static inline void tdx_mmu_release_hkid(struct kvm *kvm) {} static inline void tdx_flush_shadow_all_private(struct kvm *kvm) {} static inline void tdx_vm_free(struct kvm *kvm) {} + static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { retur= n -EOPNOTSUPP; } + +static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTS= UPP; } +static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {} +static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) = {} #endif =20 #endif /* __KVM_X86_VMX_X86_OPS_H */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a2a2d62d490a..275bdbcb3043 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -492,6 +492,7 @@ int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr= _data *msr_info) kvm_recalculate_apic_map(vcpu->kvm); return 0; } +EXPORT_SYMBOL_GPL(kvm_set_apic_base); =20 /* * Handle a fault on a hardware virtualization (VMX or SVM) instruction. 
@@ -12128,6 +12129,7 @@ bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu) { return vcpu->kvm->arch.bsp_vcpu_id =3D=3D vcpu->vcpu_id; } +EXPORT_SYMBOL_GPL(kvm_vcpu_is_reset_bsp); =20 bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu) { --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B27EEC7EE2D for ; Mon, 27 Feb 2023 08:25:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230521AbjB0IZG (ORCPT ); Mon, 27 Feb 2023 03:25:06 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55138 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230220AbjB0IYP (ORCPT ); Mon, 27 Feb 2023 03:24:15 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 52D181C7F0; Mon, 27 Feb 2023 00:24:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486252; x=1709022252; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=FRF4YG7latKPR+rb5VKLvf/aGh0jUExntGVwuO44yDU=; b=UUV/onQOKYqqLY+RdYA/vYuIQXFw/RYnKehw2TnhcKjeagdJCBRUr0/g ACYcmehr5BKrBn/oLJQh4Zj0doNQd2WFJl1lq3IZYbQgOVzXaA0oHFlET 7UvVRY6xChbOIGPHjtIK0MH/2L3s9IJxwpnZc8cH2yZuDx2+cZwXPeTEI lA8vhMfbuycZTALK8aJWt6z07/8uurCQ0GOqef6fSz1qxvDN9NcJ5wat1 dpwOxIPiwaj36oLmvckLUsSWxdOJPC3KYZP6NVtcTRVedarPBdQ/l3hMZ wENyXuVwOYeC0QFGkDn+vaFzXFNwu+fKbXaHi4GoHjIjgLTVb2tkj9FHS w==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608738" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608738" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:04 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242086" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242086" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:04 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang , Sean Christopherson Subject: [PATCH v12 019/106] KVM: TDX: Do TDX specific vcpu initialization Date: Mon, 27 Feb 2023 00:22:18 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata TD guest vcpu needs TDX specific initialization before running. Repurpose KVM_MEMORY_ENCRYPT_OP to vcpu-scope, add a new sub-command KVM_TDX_INIT_VCPU, and implement the callback for it. 
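For context, a minimal sketch of how a VMM would drive this sub-command
(it assumes the vcpu fd from KVM_CREATE_VCPU and mirrors the struct
kvm_tdx_cmd layout used elsewhere in this series; error handling omitted):

	#include <string.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>	/* KVM_MEMORY_ENCRYPT_OP, KVM_TDX_INIT_VCPU */

	/* Same layout as this series' struct kvm_tdx_cmd. */
	struct tdx_cmd {
		__u32 id;
		__u32 flags;	/* must be 0 for KVM_TDX_INIT_VCPU */
		__u64 data;	/* initial RCX value handed to the guest BIOS */
		__u64 error;
		__u64 unused;
	};

	static int tdx_init_vcpu(int vcpu_fd, __u64 initial_rcx)
	{
		struct tdx_cmd cmd;

		memset(&cmd, 0, sizeof(cmd));
		cmd.id = KVM_TDX_INIT_VCPU;
		cmd.data = initial_rcx;

		/* Valid only after KVM_TDX_INIT_VM, before the TD is finalized. */
		return ioctl(vcpu_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
	}

The call must be issued exactly once per vcpu; a second invocation fails
with -EINVAL because tdx->initialized is already set.
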
Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 1 + arch/x86/include/uapi/asm/kvm.h | 1 + arch/x86/kvm/vmx/main.c | 9 ++ arch/x86/kvm/vmx/tdx.c | 154 +++++++++++++++++++++++++- arch/x86/kvm/vmx/tdx.h | 7 ++ arch/x86/kvm/vmx/x86_ops.h | 4 + arch/x86/kvm/x86.c | 6 + tools/arch/x86/include/uapi/asm/kvm.h | 1 + 9 files changed, 183 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index f763981b7dbc..d29e16098c30 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -124,6 +124,7 @@ KVM_X86_OP(enable_smi_window) #endif KVM_X86_OP(dev_mem_enc_ioctl) KVM_X86_OP_OPTIONAL(mem_enc_ioctl) +KVM_X86_OP_OPTIONAL(vcpu_mem_enc_ioctl) KVM_X86_OP_OPTIONAL(mem_enc_register_region) KVM_X86_OP_OPTIONAL(mem_enc_unregister_region) KVM_X86_OP_OPTIONAL(vm_copy_enc_context_from) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 5ca84fd5bd43..bc8d40572238 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1709,6 +1709,7 @@ struct kvm_x86_ops { =20 int (*dev_mem_enc_ioctl)(void __user *argp); int (*mem_enc_ioctl)(struct kvm *kvm, void __user *argp); + int (*vcpu_mem_enc_ioctl)(struct kvm_vcpu *vcpu, void __user *argp); int (*mem_enc_register_region)(struct kvm *kvm, struct kvm_enc_region *ar= gp); int (*mem_enc_unregister_region)(struct kvm *kvm, struct kvm_enc_region *= argp); int (*vm_copy_enc_context_from)(struct kvm *kvm, unsigned int source_fd); diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kv= m.h index 04b3fa91e5b9..212df13e4ab5 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -536,6 +536,7 @@ struct kvm_pmu_event_filter { enum kvm_tdx_cmd_id { KVM_TDX_CAPABILITIES =3D 0, KVM_TDX_INIT_VM, + KVM_TDX_INIT_VCPU, =20 KVM_TDX_CMD_NR_MAX, }; diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 904b98a9a7ed..fa0590e37ec1 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -127,6 +127,14 @@ static int vt_mem_enc_ioctl(struct kvm *kvm, void __us= er *argp) return tdx_vm_ioctl(kvm, argp); } =20 +static int vt_vcpu_mem_enc_ioctl(struct kvm_vcpu *vcpu, void __user *argp) +{ + if (!is_td_vcpu(vcpu)) + return -EINVAL; + + return tdx_vcpu_ioctl(vcpu, argp); +} + #define VMX_REQUIRED_APICV_INHIBITS \ ( \ BIT(APICV_INHIBIT_REASON_DISABLE)| \ @@ -286,6 +294,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { =20 .dev_mem_enc_ioctl =3D tdx_dev_ioctl, .mem_enc_ioctl =3D vt_mem_enc_ioctl, + .vcpu_mem_enc_ioctl =3D vt_vcpu_mem_enc_ioctl, }; =20 struct kvm_x86_init_ops vt_init_ops __initdata =3D { diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index e6c83634582e..ae7acf78631f 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -48,6 +48,7 @@ int tdx_vm_enable_cap(struct kvm *kvm, struct kvm_enable_= cap *cap) =20 struct tdx_info { u8 nr_tdcs_pages; + u8 nr_tdvpx_pages; }; =20 /* Info about the TDX module. 
*/ @@ -70,6 +71,11 @@ static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u= 16 hkid) return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits); } =20 +static inline bool is_td_vcpu_created(struct vcpu_tdx *tdx) +{ + return tdx->tdvpr_pa; +} + static inline bool is_td_created(struct kvm_tdx *kvm_tdx) { return kvm_tdx->tdr_pa; @@ -86,6 +92,11 @@ static inline bool is_hkid_assigned(struct kvm_tdx *kvm_= tdx) return kvm_tdx->hkid > 0; } =20 +static inline bool is_td_finalized(struct kvm_tdx *kvm_tdx) +{ + return kvm_tdx->finalized; +} + static void tdx_clear_page(unsigned long page_pa) { const void *zero_page =3D (const void *) __va(page_to_phys(ZERO_PAGE(0))); @@ -369,7 +380,25 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) =20 void tdx_vcpu_free(struct kvm_vcpu *vcpu) { - /* This is stub for now. More logic will come. */ + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + int i; + + /* + * This methods can be called when vcpu allocation/initialization + * failed. So it's possible that hkid, tdvpx and tdvpr are not assigned + * yet. + */ + if (is_hkid_assigned(to_kvm_tdx(vcpu->kvm))) + return; + + if (tdx->tdvpx_pa) { + for (i =3D 0; i < tdx_info.nr_tdvpx_pages; i++) + tdx_reclaim_td_page(tdx->tdvpx_pa[i]); + kfree(tdx->tdvpx_pa); + tdx->tdvpx_pa =3D NULL; + } + tdx_reclaim_td_page(tdx->tdvpr_pa); + tdx->tdvpr_pa =3D 0; } =20 void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) @@ -379,6 +408,8 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_ev= ent) /* Ignore INIT silently because TDX doesn't support INIT event. */ if (init_event) return; + if (KVM_BUG_ON(is_td_vcpu_created(to_tdx(vcpu)), vcpu->kvm)) + return; =20 /* * TDX requires X2APIC. kvm_arch_vcpu_reset() initialize KVM mmu that @@ -856,6 +887,122 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) return r; } =20 +/* VMM can pass one 64bit auxiliary data to vcpu via RCX for guest BIOS. */ +static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(vcpu->kvm); + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + unsigned long *tdvpx_pa =3D NULL; + unsigned long tdvpr_pa; + unsigned long va; + int ret, i; + u64 err; + + if (is_td_vcpu_created(tdx)) + return -EINVAL; + + /* + * vcpu_free method frees allocated pages. Avoid partial setup so + * that the method can't handle it. 
+ */ + va =3D __get_free_page(GFP_KERNEL_ACCOUNT); + if (!va) + return -ENOMEM; + tdvpr_pa =3D __pa(va); + + tdvpx_pa =3D kcalloc(tdx_info.nr_tdvpx_pages, sizeof(*tdx->tdvpx_pa), + GFP_KERNEL_ACCOUNT); + if (!tdvpx_pa) { + ret =3D -ENOMEM; + goto free_tdvpr; + } + for (i =3D 0; i < tdx_info.nr_tdvpx_pages; i++) { + va =3D __get_free_page(GFP_KERNEL_ACCOUNT); + if (!va) { + ret =3D -ENOMEM; + goto free_tdvpx; + } + tdvpx_pa[i] =3D __pa(va); + } + + err =3D tdh_vp_create(kvm_tdx->tdr_pa, tdvpr_pa); + if (KVM_BUG_ON(err, vcpu->kvm)) { + ret =3D -EIO; + pr_tdx_error(TDH_VP_CREATE, err, NULL); + goto free_tdvpx; + } + tdx->tdvpr_pa =3D tdvpr_pa; + + tdx->tdvpx_pa =3D tdvpx_pa; + for (i =3D 0; i < tdx_info.nr_tdvpx_pages; i++) { + err =3D tdh_vp_addcx(tdx->tdvpr_pa, tdvpx_pa[i]); + if (KVM_BUG_ON(err, vcpu->kvm)) { + pr_tdx_error(TDH_VP_ADDCX, err, NULL); + for (; i < tdx_info.nr_tdvpx_pages; i++) { + free_page((unsigned long)__va(tdvpx_pa[i])); + tdvpx_pa[i] =3D 0; + } + /* vcpu_free method frees TDVPX and TDR donated to TDX */ + return -EIO; + } + } + + err =3D tdh_vp_init(tdx->tdvpr_pa, vcpu_rcx); + if (KVM_BUG_ON(err, vcpu->kvm)) { + pr_tdx_error(TDH_VP_INIT, err, NULL); + return -EIO; + } + + vcpu->arch.mp_state =3D KVM_MP_STATE_RUNNABLE; + return 0; + +free_tdvpx: + for (i =3D 0; i < tdx_info.nr_tdvpx_pages; i++) { + if (tdvpx_pa[i]) + free_page((unsigned long)__va(tdvpx_pa[i])); + tdvpx_pa[i] =3D 0; + } + kfree(tdvpx_pa); + tdx->tdvpx_pa =3D NULL; +free_tdvpr: + if (tdvpr_pa) + free_page((unsigned long)__va(tdvpr_pa)); + tdx->tdvpr_pa =3D 0; + + return ret; +} + +int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(vcpu->kvm); + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + struct kvm_tdx_cmd cmd; + int ret; + + if (tdx->initialized) + return -EINVAL; + + if (!is_hkid_assigned(kvm_tdx) || is_td_finalized(kvm_tdx)) + return -EINVAL; + + if (copy_from_user(&cmd, argp, sizeof(cmd))) + return -EFAULT; + + if (cmd.error || cmd.unused) + return -EINVAL; + + /* Currently only KVM_TDX_INTI_VCPU is defined for vcpu operation. */ + if (cmd.flags || cmd.id !=3D KVM_TDX_INIT_VCPU) + return -EINVAL; + + ret =3D tdx_td_vcpu_init(vcpu, (u64)cmd.data); + if (ret) + return ret; + + tdx->initialized =3D true; + return 0; +} + static int __init tdx_module_setup(void) { const struct tdsysinfo_struct *tdsysinfo; @@ -874,6 +1021,11 @@ static int __init tdx_module_setup(void) WARN_ON(tdsysinfo->num_cpuid_config > TDX_MAX_NR_CPUID_CONFIGS); tdx_info =3D (struct tdx_info) { .nr_tdcs_pages =3D tdsysinfo->tdcs_base_size / PAGE_SIZE, + /* + * TDVPS =3D TDVPR(4K page) + TDVPX(multiple 4K pages). + * -1 for TDVPR. + */ + .nr_tdvpx_pages =3D tdsysinfo->tdvps_base_size / PAGE_SIZE - 1, }; =20 pr_info("TDX is supported.\n"); diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 5728820fed5e..5fa4d3198873 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -17,12 +17,19 @@ struct kvm_tdx { u64 xfam; int hkid; =20 + bool finalized; + u64 tsc_offset; }; =20 struct vcpu_tdx { struct kvm_vcpu vcpu; =20 + unsigned long tdvpr_pa; + unsigned long *tdvpx_pa; + + bool initialized; + /* * Dummy to make pmu_intel not corrupt memory. * TODO: Support PMU for TDX. Future work. 
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index b7708e725e93..c939c606b38f 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -154,6 +154,8 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); int tdx_vcpu_create(struct kvm_vcpu *vcpu); void tdx_vcpu_free(struct kvm_vcpu *vcpu); void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event); + +int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); #else static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return= -ENOSYS; } static inline void tdx_hardware_unsetup(void) {} @@ -172,6 +174,8 @@ static inline int tdx_vm_ioctl(struct kvm *kvm, void __= user *argp) { return -EOP static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTS= UPP; } static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {} static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) = {} + +static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)= { return -EOPNOTSUPP; } #endif =20 #endif /* __KVM_X86_VMX_X86_OPS_H */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 275bdbcb3043..b5b51342c9a9 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5993,6 +5993,12 @@ long kvm_arch_vcpu_ioctl(struct file *filp, case KVM_SET_DEVICE_ATTR: r =3D kvm_vcpu_ioctl_device_attr(vcpu, ioctl, argp); break; + case KVM_MEMORY_ENCRYPT_OP: + r =3D -ENOTTY; + if (!kvm_x86_ops.vcpu_mem_enc_ioctl) + goto out; + r =3D kvm_x86_ops.vcpu_mem_enc_ioctl(vcpu, argp); + break; default: r =3D -EINVAL; } diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include= /uapi/asm/kvm.h index 85cd178c10a7..4bde72881dc1 100644 --- a/tools/arch/x86/include/uapi/asm/kvm.h +++ b/tools/arch/x86/include/uapi/asm/kvm.h @@ -536,6 +536,7 @@ struct kvm_pmu_event_filter { enum kvm_tdx_cmd_id { KVM_TDX_CAPABILITIES =3D 0, KVM_TDX_INIT_VM, + KVM_TDX_INIT_VCPU, =20 KVM_TDX_CMD_NR_MAX, }; --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 80F67C64ED6 for ; Mon, 27 Feb 2023 08:25:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231140AbjB0IZT (ORCPT ); Mon, 27 Feb 2023 03:25:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55254 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230214AbjB0IYO (ORCPT ); Mon, 27 Feb 2023 03:24:14 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 52B591C7EE; Mon, 27 Feb 2023 00:24:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486252; x=1709022252; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=UXNXNU4nYYne5av5LrY+aIKSppF2pcXmbEdX5ewBtCU=; b=DpCe2wILAIxBKy6MYZyb0FvnRC0DHoNsJYzGlnSdYlxmCZqaOacEwpUo R13H6lDVyQ8yC+b7MAUZER4eRyEqk78FLhPwaaHgTUEO4XqyX7ObCi2HC +67tZWntDybLvJviE4JLkcnQaHabJDV7SvrG92sLReXTNEts0B+/9nKrh 43vbjVijhtaMrBNUSX24x8NsrcNNIxj5fC01jZH2dWI7Pc2KT8H4b94eV Pilvz6zTsP1GkVP2MBMS+fL4eMc9LhF9BwE6bloRBMa9eJwdoVFnR/CSR dQkmhU2VT2qkzcdan7/WMr4oZSiTVUHnIl/U6dWsn6BJrp4sfnmcoJSIu Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608743" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; 
d="scan'208";a="317608743" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:04 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242094" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242094" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:04 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 020/106] [MARKER] The start of TDX KVM patch series: KVM MMU GPA shared bits Date: Mon, 27 Feb 2023 00:22:19 -0800 Message-Id: <7e3ac5624457cac3d6f06f85c1d8db74e01db4c6.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This empty commit is to mark the start of patch series of KVM MMU GPA shared bits. Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/intel-tdx-layer-status.rst | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentat= ion/virt/kvm/intel-tdx-layer-status.rst index a4ee04271d66..88343749d4c2 100644 --- a/Documentation/virt/kvm/intel-tdx-layer-status.rst +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -10,6 +10,7 @@ What qemu can do ---------------- - TDX VM TYPE is exposed to Qemu. - Qemu can create/destroy guest of TDX vm type. +- Qemu can create/destroy vcpu of TDX vm type. 
=20 Patch Layer status ------------------ @@ -17,12 +18,12 @@ Patch Layer status * TDX, VMX coexistence: Applied * TDX architectural definitions: Applied * TD VM creation/destruction: Applied -* TD vcpu creation/destruction: Applying +* TD vcpu creation/destruction: Applied * TDX EPT violation: Not yet * TD finalization: Not yet * TD vcpu enter/exit: Not yet * TD vcpu interrupts/exit/hypercall: Not yet =20 -* KVM MMU GPA shared bits: Not yet +* KVM MMU GPA shared bits: Applying * KVM TDP refactoring for TDX: Not yet * KVM TDP MMU hooks: Not yet --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 31AC4C64ED6 for ; Mon, 27 Feb 2023 08:25:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231130AbjB0IZJ (ORCPT ); Mon, 27 Feb 2023 03:25:09 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55312 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230227AbjB0IYQ (ORCPT ); Mon, 27 Feb 2023 03:24:16 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 62E0F1CACE; Mon, 27 Feb 2023 00:24:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486253; x=1709022253; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=6IUEaKr0ie2ujCgGN6VZgF9eKrBotyox/Fn0sSiFqCA=; b=cTG9cgTC6D3ObJlEIR5H/XfHS9qsMt5gCLiLvvsVOzVer7W+SApUp/FJ Q9ApsIFjlDHe0bNvkumkXMewh6FdFlpe8PFldEg94VschhfjC5+mfZy6j pPkXgxNEI648ua8BeUJPoy64ghSpalkTZMtApfyO+a5tSyGbkn70BK8pX /K0hkMMyQpRo+V77FZLQjgTvt2EInP2BSP8IklJZ5ZYVlabHcNlApnlTl E7LK38YRBKdCkbid2zdRNsPlqZIQD6aZnenxGPmemXHVN0UT+QYJN5BOf KFr8hyf+W8pI3a5GRw6eEtAcPVqlM7CnuZ82D5mkob3OFaaepa1baDO0v A==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608750" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608750" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:04 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242104" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242104" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:04 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 021/106] KVM: x86/mmu: introduce config for PRIVATE KVM MMU Date: Mon, 27 Feb 2023 00:22:20 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata To keep the case of non TDX intact, introduce a new config option for private KVM MMU support. At the moment, this is synonym for CONFIG_INTEL_TDX_HOST && CONFIG_KVM_INTEL. The config makes it clear that the config is only for x86 KVM MMU. 
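
A quick illustration of what the new symbol looks like from C code (not part
of this patch): callers can branch on it with the standard IS_ENABLED()
macro, or guard struct fields with #ifdef as the next patch does for
gfn_shared_mask:

  /* Illustrative only; the compiler drops the TDX-only paths when the
   * config is off, without #ifdef clutter at the call site. */
  static inline bool kvm_mmu_private_enabled(void)
  {
  	return IS_ENABLED(CONFIG_KVM_MMU_PRIVATE);
  }
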
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/Kconfig | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 718010600956..45ac6b01af44 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -153,4 +153,8 @@ config KVM_XEN config KVM_EXTERNAL_WRITE_TRACKING bool =20 +config KVM_MMU_PRIVATE + def_bool y + depends on INTEL_TDX_HOST && KVM_INTEL + endif # VIRTUALIZATION --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E4E65C64ED6 for ; Mon, 27 Feb 2023 08:25:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231153AbjB0IZV (ORCPT ); Mon, 27 Feb 2023 03:25:21 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56744 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230374AbjB0IYu (ORCPT ); Mon, 27 Feb 2023 03:24:50 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8568C1ACE1; Mon, 27 Feb 2023 00:24:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486254; x=1709022254; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=tznujyOV/xWUCOerVv7HhNu3v4jxWrzWup7DnAlogmI=; b=FU5wC84IVBNBDqeNz6fAL3OaocU/Spr0CWjRTuwenBWqd3ugaI3s1dI0 8p6gefkpY90QtPCGlxRpWzPPUPmTZanlUXpphRSs94u7VLCk0D8WTNkiy MBZmz4iN3kT20193g3DgeXNIDtT4gHmNpt8L0uCsHlHqDkumzvfNs+ovR gv1vk0bnSk+IjzzCKM0Mgtpli2Q0mwqdRdVjQZKcENoji/EXK0Lr9ilep yqTbHbWOYSu+D3jG0TrD7YMQuTq5HR596Zlt6b9UsNEQAxHtTOrt89pd3 mBpJLkaJnPehZ3iLbSqlop+Z2xhaPK/P7yc05fiHCCV0M3EdcPWZ9Y6ad w==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608755" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608755" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:05 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242112" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242112" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:04 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang , Rick Edgecombe Subject: [PATCH v12 022/106] KVM: x86/mmu: Add address conversion functions for TDX shared bit of GPA Date: Mon, 27 Feb 2023 00:22:21 -0800 Message-Id: <859cd56a7d059d5549536e88f2882b8495940e6b.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata TDX repurposes one GPA bit (51 bit or 47 bit based on configuration) to indicate the GPA is private(if cleared) or shared (if set) with VMM. If GPA.shared is set, GPA is covered by the existing conventional EPT pointed by EPTP. If GPA.shared bit is cleared, GPA is covered by TDX module. 
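
Before the rest of the description, a self-contained userspace sketch of the
bit layout just described (assuming the MAX_GPAW case, i.e. the shared bit
at GPA bit 51; all names are local to the example):

  #include <assert.h>
  #include <stdbool.h>
  #include <stdint.h>

  #define PAGE_SHIFT 12
  /* GPA bit 51 expressed as a GFN bit, matching gpa_to_gfn(BIT_ULL(51)). */
  static const uint64_t gfn_shared_mask = 1ULL << (51 - PAGE_SHIFT);

  static uint64_t gpa_to_shared(uint64_t gpa)
  {
  	return gpa | (gfn_shared_mask << PAGE_SHIFT);
  }

  static uint64_t gpa_to_private(uint64_t gpa)
  {
  	return gpa & ~(gfn_shared_mask << PAGE_SHIFT);
  }

  static bool gpa_is_private(uint64_t gpa)
  {
  	/* Private iff the shared bit is clear. */
  	return !((gpa >> PAGE_SHIFT) & gfn_shared_mask);
  }

  int main(void)
  {
  	uint64_t gpa = 0x1234000;

  	assert(gpa_is_private(gpa));
  	assert(!gpa_is_private(gpa_to_shared(gpa)));
  	assert(gpa_to_private(gpa_to_shared(gpa)) == gpa);
  	return 0;
  }
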
The VMM has to issue SEAMCALLs to operate on it.

Add a member to remember the GPA shared bit for each guest TD, add address
conversion functions between private GPA and shared GPA, and add a test for
whether a GPA is private.

Because struct kvm_arch (or struct kvm, which includes struct kvm_arch; see
kvm_arch_alloc_vm() that passes __GFP_ZERO) is zero-cleared when allocated,
the new member that remembers the GPA shared bit is guaranteed to be zero
with this patch unless it's initialized explicitly.

Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Signed-off-by: Isaku Yamahata
---
 arch/x86/include/asm/kvm_host.h |  4 ++++
 arch/x86/kvm/mmu.h              | 32 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/tdx.c          |  5 +++++
 3 files changed, 41 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bc8d40572238..e8506b48b97c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1448,6 +1448,10 @@ struct kvm_arch {
 	 */
 #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1)
 	struct kvm_mmu_memory_cache split_desc_cache;
+
+#ifdef CONFIG_KVM_MMU_PRIVATE
+	gfn_t gfn_shared_mask;
+#endif
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 168c46fd8dd1..951b14079602 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -278,4 +278,36 @@ static inline gpa_t kvm_translate_gpa(struct kvm_vcpu *vcpu,
 		return gpa;
 	return translate_nested_gpa(vcpu, gpa, access, exception);
 }
+
+static inline gfn_t kvm_gfn_shared_mask(const struct kvm *kvm)
+{
+#ifdef CONFIG_KVM_MMU_PRIVATE
+	return kvm->arch.gfn_shared_mask;
+#else
+	return 0;
+#endif
+}
+
+static inline gfn_t kvm_gfn_shared(const struct kvm *kvm, gfn_t gfn)
+{
+	return gfn | kvm_gfn_shared_mask(kvm);
+}
+
+static inline gfn_t kvm_gfn_private(const struct kvm *kvm, gfn_t gfn)
+{
+	return gfn & ~kvm_gfn_shared_mask(kvm);
+}
+
+static inline gpa_t kvm_gpa_private(const struct kvm *kvm, gpa_t gpa)
+{
+	return gpa & ~gfn_to_gpa(kvm_gfn_shared_mask(kvm));
+}
+
+static inline bool kvm_is_private_gpa(const struct kvm *kvm, gpa_t gpa)
+{
+	gfn_t mask = kvm_gfn_shared_mask(kvm);
+
+	return mask && !(gpa_to_gfn(gpa) & mask);
+}
+
 #endif
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index ae7acf78631f..3f61cdc53c57 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -851,6 +851,11 @@ static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
 	kvm_tdx->attributes = td_params->attributes;
 	kvm_tdx->xfam = td_params->xfam;
 
+	if (td_params->exec_controls & TDX_EXEC_CONTROL_MAX_GPAW)
+		kvm->arch.gfn_shared_mask = gpa_to_gfn(BIT_ULL(51));
+	else
+		kvm->arch.gfn_shared_mask = gpa_to_gfn(BIT_ULL(47));
+
 out:
	/* kfree() accepts NULL.
*/ kfree(init_vm); --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 36E65C64ED8 for ; Mon, 27 Feb 2023 08:26:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231398AbjB0I0y (ORCPT ); Mon, 27 Feb 2023 03:26:54 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55130 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230328AbjB0IYi (ORCPT ); Mon, 27 Feb 2023 03:24:38 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 670C21A652; Mon, 27 Feb 2023 00:24:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486253; x=1709022253; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=PWnXznMMeziyc9Hs5eWqvj28CFaE86kvU5fCx1QnDYQ=; b=a1eiobxWh7MTIwaqmjarhVCqzC/xE+C76wCx+OItVP2fJCyyCeyVt0B2 BgdbEPR0bp6BdnhhRqu8eKASG3gl1spsufJ6f5ucP3T8EyhBJRDR1p71G aJCa27Gy7P4n9O9HeIBHPX3gzI5P9zTNonQOgCCL7obBHhkzITYvbv8F5 bx538BAgd1dhezw/SusNTehfuHR0zki3blBAvg12UtatsyFx718kRAyBX 4uiTbQdCyeFr1/vgbBuyosF/agPvE/f6amcvvdIEpS0chEjSKJGjfLvkS WbyhF/xDLBV/rzh19jV10mzjOLL5sFSKNbZgFQQHzxx3QDNVjPqtEeIFK Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608757" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608757" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:05 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242122" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242122" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:04 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 023/106] [MARKER] The start of TDX KVM patch series: KVM TDP refactoring for TDX Date: Mon, 27 Feb 2023 00:22:22 -0800 Message-Id: <3b8904822cc7ae6fd813a76e02fb741838f79094.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This empty commit is to mark the start of patch series of KVM TDP refactoring for TDX. 
Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/intel-tdx-layer-status.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentat= ion/virt/kvm/intel-tdx-layer-status.rst index 88343749d4c2..f10aff0b060e 100644 --- a/Documentation/virt/kvm/intel-tdx-layer-status.rst +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -24,6 +24,6 @@ Patch Layer status * TD vcpu enter/exit: Not yet * TD vcpu interrupts/exit/hypercall: Not yet =20 -* KVM MMU GPA shared bits: Applying -* KVM TDP refactoring for TDX: Not yet +* KVM MMU GPA shared bits: Applied +* KVM TDP refactoring for TDX: Applying * KVM TDP MMU hooks: Not yet --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 00159C64ED6 for ; Mon, 27 Feb 2023 08:25:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231161AbjB0IZY (ORCPT ); Mon, 27 Feb 2023 03:25:24 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56762 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230205AbjB0IYu (ORCPT ); Mon, 27 Feb 2023 03:24:50 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E000D1C303; Mon, 27 Feb 2023 00:24:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486254; x=1709022254; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=VCu6jpU86EpIGkSblys4cK9cOLbJ9NW8iFkKQFkQpM8=; b=jWA301taI8CSjvpcBBX/CfAsMLJGYVhNI9gU/28WH0YwkKcLDiE8+OoW jbjqRyTlGTlLnRkz7zpVhWRUpPeESoLvfjzqYCEThHt5ZZd9lqpQ80IRG 4hi/MBHosrFmqfUBqvW4701+0DM07sMlMjQXMsSmnxd+oocuqxGMhQhBV 6miq0G/AYb6odJTtTY9vNMsGNWIfrHFLgdAJFGIvPFkBdTzrO472bm3Ll El0v0t61dmFAeSmjsnAFS7S7PL3ugLEEkVyTrM3YIqyjDDLccEp+ZVxh9 0acsIVUIMzGgDS/Dnw5xDvFyOor2VCmSwg6Tc+wb3xot9yjjCXc5AA31w A==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608759" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608759" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:05 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242129" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242129" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:05 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 024/106] KVM: Allow page-sized MMU caches to be initialized with custom 64-bit values Date: Mon, 27 Feb 2023 00:22:23 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Add support to MMU caches for initializing a page with a custom 64-bit value, e.g. 
to pre-fill an entire page table with non-zero PTE values. The functionality will be used by x86 to support Intel's TDX, which needs to set bit 63 in all non-present PTEs in order to prevent !PRESENT page faults from getting reflected into the guest (Intel's EPT Violation #VE architecture made the less than brilliant decision of having the per-PTE behavior be opt-out instead of opt-in). Signed-off-by: Sean Christopherson --- include/linux/kvm_types.h | 1 + virt/kvm/kvm_main.c | 16 ++++++++++++++-- 2 files changed, 15 insertions(+), 2 deletions(-) diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h index 2728d49bbdf6..7c2b9332b7c5 100644 --- a/include/linux/kvm_types.h +++ b/include/linux/kvm_types.h @@ -94,6 +94,7 @@ struct kvm_mmu_memory_cache { int nobjs; gfp_t gfp_zero; gfp_t gfp_custom; + u64 init_value; struct kmem_cache *kmem_cache; int capacity; void **objects; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f8495e27d210..87400796df6e 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -381,12 +381,17 @@ static void kvm_flush_shadow_all(struct kvm *kvm) static inline void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache= *mc, gfp_t gfp_flags) { + void *page; + gfp_flags |=3D mc->gfp_zero; =20 if (mc->kmem_cache) return kmem_cache_alloc(mc->kmem_cache, gfp_flags); - else - return (void *)__get_free_page(gfp_flags); + + page =3D (void *)__get_free_page(gfp_flags); + if (page && mc->init_value) + memset64(page, mc->init_value, PAGE_SIZE / sizeof(mc->init_value)); + return page; } =20 int __kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int capa= city, int min) @@ -401,6 +406,13 @@ int __kvm_mmu_topup_memory_cache(struct kvm_mmu_memory= _cache *mc, int capacity, if (WARN_ON_ONCE(!capacity)) return -EIO; =20 + /* + * Custom init values can be used only for page allocations, + * and obviously conflict with __GFP_ZERO. 
+ */ + if (WARN_ON_ONCE(mc->init_value && (mc->kmem_cache || mc->gfp_zero))) + return -EIO; + mc->objects =3D kvmalloc_array(sizeof(void *), capacity, gfp); if (!mc->objects) return -ENOMEM; --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 88D20C64ED8 for ; Mon, 27 Feb 2023 08:25:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231166AbjB0IZ1 (ORCPT ); Mon, 27 Feb 2023 03:25:27 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55136 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230415AbjB0IY4 (ORCPT ); Mon, 27 Feb 2023 03:24:56 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E01941C337; Mon, 27 Feb 2023 00:24:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486254; x=1709022254; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=X+3zCrOz1M3BQjf665aykPhXZdf0cnWo4OHe8tVXUKQ=; b=bOnLCXRpdYVGg6WjvFcFUlG6vpxiobiHYnOGv/ntsfN6ujbc9O7gQz3W HBMxMYHeS96IItAHkTjTUEzkO+tE/vk00QaXqhwMN95Bou3Gew5XuMZOr c5X8xz+2MqyrIJ3b2++sCDo/tTcMLPI4mOpRxxSDKIFlJdW0ga8zsfmoC txnJO+1U3cG1C1boeDQelT8xqflK1u6E8nbX/tRhfaot1BcGxdl0tnSyI pCg6EygicZM8L6lIlH8JPTu6o7ixd3bZvBStKmyXHvL30VnKdwCQuAcLY qsmq8zCIsuCcMVR3k0SovXYaRjUKq89wUeOKHOWanWmIMdT2F0Cskihvo A==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608771" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608771" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:05 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242133" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242133" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:05 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang , Sean Christopherson Subject: [PATCH v12 025/106] KVM: x86/mmu: Replace hardcoded value 0 for the initial value for SPTE Date: Mon, 27 Feb 2023 00:22:24 -0800 Message-Id: <04d1691215b2ef7e81041b597f5330158103ff28.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata The TDX support will need the "suppress #VE" bit (bit 63) set as the initial value for SPTE. To reduce code change size, introduce a new macro SHADOW_NONPRESENT_VALUE for the initial value for the shadow page table entry (SPTE) and replace hard-coded value 0 for it. Initialize shadow page tables with their value. 
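
As a userspace sketch of the pre-fill mechanism this builds on (the
init_value path added to mmu_memory_cache_alloc_obj() in the previous
patch), assuming a 4KiB page; in-kernel this is a single memset64() call:

  #include <stdint.h>
  #include <stdlib.h>

  #define PAGE_SIZE 4096

  /* Seed every 64-bit slot of a fresh page-table page with a non-zero
   * "non-present" value instead of relying on zeroed allocation. */
  static uint64_t *alloc_prefilled_page(uint64_t init_value)
  {
  	uint64_t *page = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
  	size_t i;

  	if (!page)
  		return NULL;
  	for (i = 0; i < PAGE_SIZE / sizeof(*page); i++)
  		page[i] = init_value;
  	return page;
  }
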
The plan is to unconditionally set the "suppress #VE" bit for both AMD and Intel as: 1) AMD hardware uses the bit 63 as NX for present SPTE and ignored for non-present SPTE; 2) for conventional VMX guests, KVM never enables the "EPT-violation #VE" in VMCS control and "suppress #VE" bit is ignored by hardware. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu/mmu.c | 17 +++++++++++++---- arch/x86/kvm/mmu/paging_tmpl.h | 3 ++- arch/x86/kvm/mmu/spte.h | 2 ++ arch/x86/kvm/mmu/tdp_mmu.c | 15 ++++++++------- 4 files changed, 25 insertions(+), 12 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 38745d275e22..5ce591bb6c19 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -547,9 +547,9 @@ static u64 mmu_spte_clear_track_bits(struct kvm *kvm, u= 64 *sptep) =20 if (!is_shadow_present_pte(old_spte) || !spte_has_volatile_bits(old_spte)) - __update_clear_spte_fast(sptep, 0ull); + __update_clear_spte_fast(sptep, SHADOW_NONPRESENT_VALUE); else - old_spte =3D __update_clear_spte_slow(sptep, 0ull); + old_spte =3D __update_clear_spte_slow(sptep, SHADOW_NONPRESENT_VALUE); =20 if (!is_shadow_present_pte(old_spte)) return old_spte; @@ -583,7 +583,7 @@ static u64 mmu_spte_clear_track_bits(struct kvm *kvm, u= 64 *sptep) */ static void mmu_spte_clear_no_track(u64 *sptep) { - __update_clear_spte_fast(sptep, 0ull); + __update_clear_spte_fast(sptep, SHADOW_NONPRESENT_VALUE); } =20 static u64 mmu_spte_get_lockless(u64 *sptep) @@ -6001,7 +6001,16 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu) vcpu->arch.mmu_page_header_cache.kmem_cache =3D mmu_page_header_cache; vcpu->arch.mmu_page_header_cache.gfp_zero =3D __GFP_ZERO; =20 - vcpu->arch.mmu_shadow_page_cache.gfp_zero =3D __GFP_ZERO; + /* + * When X86_64, initial SEPT entries are initialized with + * SHADOW_NONPRESENT_VALUE. Otherwise zeroed. See + * mmu_memory_cache_alloc_obj(). 
+ */ + if (IS_ENABLED(CONFIG_X86_64)) + vcpu->arch.mmu_shadow_page_cache.init_value =3D + SHADOW_NONPRESENT_VALUE; + if (!vcpu->arch.mmu_shadow_page_cache.init_value) + vcpu->arch.mmu_shadow_page_cache.gfp_zero =3D __GFP_ZERO; =20 vcpu->arch.mmu =3D &vcpu->arch.root_mmu; vcpu->arch.walk_mmu =3D &vcpu->arch.root_mmu; diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index e5662dbd519c..8ef7bc674d41 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -1028,7 +1028,8 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, st= ruct kvm_mmu_page *sp) gpa_t pte_gpa; gfn_t gfn; =20 - if (!sp->spt[i]) + /* spt[i] has initial value of shadow page table allocation */ + if (sp->spt[i] =3D=3D SHADOW_NONPRESENT_VALUE) continue; =20 pte_gpa =3D first_pte_gpa + i * sizeof(pt_element_t); diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 0d8deefee66c..f190eaf6b2b5 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -148,6 +148,8 @@ static_assert(MMIO_SPTE_GEN_LOW_BITS =3D=3D 8 && MMIO_S= PTE_GEN_HIGH_BITS =3D=3D 11); =20 #define MMIO_SPTE_GEN_MASK GENMASK_ULL(MMIO_SPTE_GEN_LOW_BITS + MMIO_SPTE= _GEN_HIGH_BITS - 1, 0) =20 +#define SHADOW_NONPRESENT_VALUE 0ULL + extern u64 __read_mostly shadow_host_writable_mask; extern u64 __read_mostly shadow_mmu_writable_mask; extern u64 __read_mostly shadow_nx_mask; diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 59f47b51784d..380bd0cea939 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -690,7 +690,7 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *k= vm, * here since the SPTE is going from non-present to non-present. Use * the raw write helper to avoid an unnecessary check on volatile bits. */ - __kvm_tdp_mmu_write_spte(iter->sptep, 0); + __kvm_tdp_mmu_write_spte(iter->sptep, SHADOW_NONPRESENT_VALUE); =20 return 0; } @@ -867,8 +867,8 @@ static void __tdp_mmu_zap_root(struct kvm *kvm, struct = kvm_mmu_page *root, continue; =20 if (!shared) - tdp_mmu_set_spte(kvm, &iter, 0); - else if (tdp_mmu_set_spte_atomic(kvm, &iter, 0)) + tdp_mmu_set_spte(kvm, &iter, SHADOW_NONPRESENT_VALUE); + else if (tdp_mmu_set_spte_atomic(kvm, &iter, SHADOW_NONPRESENT_VALUE)) goto retry; } } @@ -924,8 +924,9 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu= _page *sp) if (WARN_ON_ONCE(!is_shadow_present_pte(old_spte))) return false; =20 - __tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, 0, - sp->gfn, sp->role.level + 1, true, true); + __tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, + SHADOW_NONPRESENT_VALUE, sp->gfn, sp->role.level + 1, + true, true); =20 return true; } @@ -959,7 +960,7 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct k= vm_mmu_page *root, !is_last_spte(iter.old_spte, iter.level)) continue; =20 - tdp_mmu_set_spte(kvm, &iter, 0); + tdp_mmu_set_spte(kvm, &iter, SHADOW_NONPRESENT_VALUE); flush =3D true; } =20 @@ -1328,7 +1329,7 @@ static bool set_spte_gfn(struct kvm *kvm, struct tdp_= iter *iter, * invariant that the PFN of a present * leaf SPTE can never change. * See __handle_changed_spte(). 
*/ - tdp_mmu_set_spte(kvm, iter, 0); + tdp_mmu_set_spte(kvm, iter, SHADOW_NONPRESENT_VALUE); =20 if (!pte_write(range->pte)) { new_spte =3D kvm_mmu_changed_pte_notifier_make_spte(iter->old_spte, --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 317A9C64ED6 for ; Mon, 27 Feb 2023 08:25:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231172AbjB0IZa (ORCPT ); Mon, 27 Feb 2023 03:25:30 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55254 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230421AbjB0IY4 (ORCPT ); Mon, 27 Feb 2023 03:24:56 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5CB951C7CC; Mon, 27 Feb 2023 00:24:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486255; x=1709022255; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ZtDPsWa2OtV4wCi06vZnxAcBCv72AJDWzb2ws6jTaDg=; b=hlTTxuAlRezdaOlRdGYICfmKVQwr8XPv3AopZLa97O8Ja+6UKaWSQehb cwj/BFtYdVpsHx5n85Vbi0mmE7Zdu0coCHPylmzNwNu2yXV2uYLYKqd0z mSx2tVw17x8S6L9MeVCHBI/VC6H7UA/60T5t6rREvYdz0/BV/tkdj59On gQrOce4GcjREJnwPPmTywKZI+YizPMrIIF0M5HYFO2FfouvVzGRl93cMZ xrm8wf19qAydwPVCjeSZoQzYSsoCU8Nud8dOBAXIaCyFlGnQh2owg8/gQ k+IS71Jav3wshp7XOhJTTQ0MSzZEpvDtjltwT6GZIvA1F15WvuS+zXHoo g==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608774" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608774" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:05 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242137" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242137" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:05 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang , Sean Christopherson Subject: [PATCH v12 026/106] KVM: x86/mmu: Allow non-zero value for non-present SPTE and removed SPTE Date: Mon, 27 Feb 2023 00:22:25 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson For TD guest, the current way to emulate MMIO doesn't work any more, as KVM is not able to access the private memory of TD guest and do the emulation. Instead, TD guest expects to receive #VE when it accesses the MMIO and then it can explicitly make hypercall to KVM to get the expected information. To achieve this, the TDX module always enables "EPT-violation #VE" in the VMCS control. And accordingly, for the MMIO spte for the shared GPA, 1. KVM needs to set "suppress #VE" bit for the non-present SPTE so that EPT violation happens on TD accessing MMIO range. 2. 
On EPT violation, KVM sets the MMIO spte to clear the "suppress #VE" bit so
the TD guest can receive the #VE instead of an EPT misconfiguration, unlike
the VMX case. For a shared GPA that is not populated yet, an EPT violation
needs to be triggered when the TD guest accesses such a shared GPA. The
non-present SPTE value for shared GPA should set the "suppress #VE" bit.

Add the "suppress #VE" bit (bit 63) to SHADOW_NONPRESENT_VALUE and
REMOVED_SPTE. Unconditionally set the "suppress #VE" bit (which is bit 63)
for both AMD and Intel as: 1) AMD hardware doesn't use this bit when the
present bit is off; 2) for a normal VMX guest, KVM never enables
"EPT-violation #VE" in the VMCS control, and the "suppress #VE" bit is
ignored by hardware.

Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/mmu/spte.h | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index f190eaf6b2b5..471378ee9071 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -148,7 +148,20 @@ static_assert(MMIO_SPTE_GEN_LOW_BITS == 8 && MMIO_SPTE_GEN_HIGH_BITS == 11);
 
 #define MMIO_SPTE_GEN_MASK	GENMASK_ULL(MMIO_SPTE_GEN_LOW_BITS + MMIO_SPTE_GEN_HIGH_BITS - 1, 0)
 
+/*
+ * Non-present SPTE value for both VMX and SVM for TDP MMU.
+ * For SVM NPT, for non-present spte (bit 0 = 0), other bits are ignored.
+ * For VMX EPT, bit 63 is ignored if #VE is disabled. (EPT_VIOLATION_VE=0)
+ *              bit 63 is #VE suppress if #VE is enabled. (EPT_VIOLATION_VE=1)
+ * For TDX:
+ *   TDX module sets EPT_VIOLATION_VE for Secure-EPT and conventional EPT
+ */
+#ifdef CONFIG_X86_64
+#define SHADOW_NONPRESENT_VALUE	BIT_ULL(63)
+static_assert(!(SHADOW_NONPRESENT_VALUE & SPTE_MMU_PRESENT_MASK));
+#else
 #define SHADOW_NONPRESENT_VALUE	0ULL
+#endif
 
 extern u64 __read_mostly shadow_host_writable_mask;
 extern u64 __read_mostly shadow_mmu_writable_mask;
@@ -195,7 +208,7 @@ extern u64 __read_mostly shadow_nonpresent_or_rsvd_mask;
  *
  * Only used by the TDP MMU.
  */
-#define REMOVED_SPTE	0x5a0ULL
+#define REMOVED_SPTE	(SHADOW_NONPRESENT_VALUE | 0x5a0ULL)
 
	/* Removed SPTEs must not be misconstrued as shadow present PTEs.
*/ static_assert(!(REMOVED_SPTE & SPTE_MMU_PRESENT_MASK)); --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8551CC7EE2D for ; Mon, 27 Feb 2023 08:25:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230493AbjB0IZp (ORCPT ); Mon, 27 Feb 2023 03:25:45 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56980 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230472AbjB0IY5 (ORCPT ); Mon, 27 Feb 2023 03:24:57 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5C2F11CAC2; Mon, 27 Feb 2023 00:24:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486256; x=1709022256; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=6fv/A0IW7uL9kDyROAGN7ITa97YOBlnR3nZihL0cU0I=; b=F90DRcbz0educKrzqilo8gEUCMkRA1r+RKN68UUyl17EgCKwv8OP7M8s r25UxKDGlAKHSqg9J+jQ47VPBsMmQhZ2cXV/9or6ChK/oHM25RbXNwJ78 l3nIY2w2DEUCuzgxfWYmn7TcN58kWLO08030LWYssVuqDjNXd73D1ZVGW VBL3SWFYEEma0wuqfPxfzmWq7UZRB6B3VRcarEx9Y0VbT05VYFoK5RFVD LW3zwKEBuccZaJYmw+EOA+lp6wj7N9dw5uQubDNRWM+ACExPegiAtTT2p nLZPg7z0QYLJV+kpsoIDSxfam77Z9SuiuiaDi05CUFzAufUjJUnYJZl/L Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608779" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608779" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:06 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242142" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242142" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:05 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 027/106] KVM: x86/mmu: Add Suppress VE bit to shadow_mmio_mask/shadow_present_mask Date: Mon, 27 Feb 2023 00:22:26 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata To make use of the same value of shadow_mmio_mask and shadow_present_mask for TDX and VMX, add Suppress-VE bit to shadow_mmio_mask and shadow_present_mask so that they can be common for both VMX and TDX. TDX will require shadow_mmio_mask and shadow_present_mask to include VMX_SUPPRESS_VE for shared GPA so that EPT violation is triggered for shared GPA. For VMX, VMX_SUPPRESS_VE doesn't matter for MMIO because the spte value is required to cause EPT misconfig. the additional bit doesn't affect VMX logic to add the bit to shadow_mmio_{value, mask}. 
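
A small self-contained check of the mask arithmetic described above, with the
EPT permission bits and the suppress-#VE bit mirrored locally for
illustration (see arch/x86/include/asm/vmx.h for the real definitions):

  #include <assert.h>
  #include <stdint.h>

  #define VMX_EPT_WRITABLE_MASK		0x2ULL
  #define VMX_EPT_EXECUTABLE_MASK	0x4ULL
  #define VMX_EPT_RWX_MASK		0x7ULL
  #define VMX_EPT_MISCONFIG_WX_VALUE	(VMX_EPT_WRITABLE_MASK | VMX_EPT_EXECUTABLE_MASK)
  #define VMX_EPT_SUPPRESS_VE_BIT	(1ULL << 63)

  int main(void)
  {
  	uint64_t mmio_value = VMX_EPT_MISCONFIG_WX_VALUE;
  	uint64_t mmio_mask  = VMX_EPT_RWX_MASK | VMX_EPT_SUPPRESS_VE_BIT;

  	/* The VMX MMIO pattern (write/execute, no read) still matches... */
  	assert((mmio_value & mmio_mask) == mmio_value);
  	/* ...while an SPTE with suppress-#VE set cannot be mistaken for it. */
  	assert(((mmio_value | VMX_EPT_SUPPRESS_VE_BIT) & mmio_mask) != mmio_value);
  	return 0;
  }
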
Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/vmx.h | 1 + arch/x86/kvm/mmu/spte.c | 6 ++++-- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 498dc600bd5c..cdbf12c1a83c 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -511,6 +511,7 @@ enum vmcs_field { #define VMX_EPT_IPAT_BIT (1ull << 6) #define VMX_EPT_ACCESS_BIT (1ull << 8) #define VMX_EPT_DIRTY_BIT (1ull << 9) +#define VMX_EPT_SUPPRESS_VE_BIT (1ull << 63) #define VMX_EPT_RWX_MASK (VMX_EPT_READABLE_MASK | = \ VMX_EPT_WRITABLE_MASK | \ VMX_EPT_EXECUTABLE_MASK) diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index fce6f047399f..cc0bc058fb25 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -431,7 +431,9 @@ void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_e= xec_only) shadow_dirty_mask =3D has_ad_bits ? VMX_EPT_DIRTY_BIT : 0ull; shadow_nx_mask =3D 0ull; shadow_x_mask =3D VMX_EPT_EXECUTABLE_MASK; - shadow_present_mask =3D has_exec_only ? 0ull : VMX_EPT_READABLE_MASK; + /* VMX_EPT_SUPPRESS_VE_BIT is needed for W or X violation. */ + shadow_present_mask =3D + (has_exec_only ? 0ull : VMX_EPT_READABLE_MASK) | VMX_EPT_SUPPRESS_VE_BIT; /* * EPT overrides the host MTRRs, and so KVM must program the desired * memtype directly into the SPTEs. Note, this mask is just the mask @@ -448,7 +450,7 @@ void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_e= xec_only) * of an EPT paging-structure entry is 110b (write/execute). */ kvm_mmu_set_mmio_spte_mask(VMX_EPT_MISCONFIG_WX_VALUE, - VMX_EPT_RWX_MASK, 0); + VMX_EPT_RWX_MASK | VMX_EPT_SUPPRESS_VE_BIT, 0); } EXPORT_SYMBOL_GPL(kvm_mmu_set_ept_masks); =20 --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CBFB9C7EE30 for ; Mon, 27 Feb 2023 08:25:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230497AbjB0IZr (ORCPT ); Mon, 27 Feb 2023 03:25:47 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55158 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230475AbjB0IY5 (ORCPT ); Mon, 27 Feb 2023 03:24:57 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5B4301C7F2; Mon, 27 Feb 2023 00:24:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486256; x=1709022256; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=WqnskFIQm98MygHS3gDq/PQGttioEKm5ixTtOo1X8Eg=; b=jmQkASTdIGI594CW6iIEmlcCQBMVlLc2wdRAnfc9GRJcjYL8uBt9B9wo u0OocMDQCYoSSodudWmNf5scb8EiHXkBQZqx//fo7osy1LOMqMzlhUfST KKTsL2oorhN5AYXsT5iVS1hTkX8ejdChA5ukTz3ilCA76Ne16nyXL4ooT s8HC2xMu/6UGfUMRwI7Q/mWIfjxlNgVV+jKB21eW6XA/gNhUx9G36MQTw v2IstUgofysbZjpcGeysnLFXcix91Q+wWsYyKbu4xequYKuHaogZLhNZz ySHIQnQOvAZoKQ9BAqFABkr7AQE8n8ZC2w1jCe7Nv2rQdjvdhDSjj0jeb w==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608784" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608784" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:06 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242145" 
X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242145"
Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by
 fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 27 Feb 2023 00:24:05 -0800
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini
 , erdemaktas@google.com, Sean Christopherson
 , Sagi Shahar , David Matlack
 , Kai Huang , Zhi Wang
 , Sean Christopherson
Subject: [PATCH v12 028/106] KVM: x86/mmu: Track shadow MMIO value on a
 per-VM basis
Date: Mon, 27 Feb 2023 00:22:27 -0800
Message-Id:
X-Mailer: git-send-email 2.25.1
In-Reply-To:
References:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Isaku Yamahata

TDX will use a different shadow PTE entry value for MMIO from VMX.  Add
members to kvm_arch and track the value for MMIO per-VM instead of in
global variables.  By using the per-VM EPT entry value for MMIO, the
existing VMX logic is kept working.  Introduce a separate setter function
so that a guest TD can override it later.

Also require MMIO SPTE caching for TDX.  This is effectively already the
case, because TDX requires EPT and KVM allows MMIO SPTE caching with EPT.

Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/mmu.h              |  1 +
 arch/x86/kvm/mmu/mmu.c          |  7 ++++---
 arch/x86/kvm/mmu/spte.c         | 10 ++++++++--
 arch/x86/kvm/mmu/spte.h         |  4 ++--
 arch/x86/kvm/mmu/tdp_mmu.c      |  6 +++---
 6 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e8506b48b97c..f120c4484316 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1260,6 +1260,8 @@ struct kvm_arch {
 	 */
 	spinlock_t mmu_unsync_pages_lock;
 
+	u64 shadow_mmio_value;
+
 	struct list_head assigned_dev_head;
 	struct iommu_domain *iommu_domain;
 	bool iommu_noncoherent;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 951b14079602..0234201d5e63 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -101,6 +101,7 @@ static inline u8 kvm_get_shadow_phys_bits(void)
 }
 
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
+void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value);
 void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask);
 void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only);
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5ce591bb6c19..8955d893d173 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2445,7 +2445,7 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
 			return kvm_mmu_prepare_zap_page(kvm, child,
 							invalid_list);
 		}
-	} else if (is_mmio_spte(pte)) {
+	} else if (is_mmio_spte(kvm, pte)) {
 		mmu_spte_clear_no_track(spte);
 	}
 	return 0;
@@ -4126,7 +4126,7 @@ static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct)
 	if (WARN_ON(reserved))
 		return -EINVAL;
 
-	if (is_mmio_spte(spte)) {
+	if (is_mmio_spte(vcpu->kvm, spte)) {
 		gfn_t gfn = get_mmio_spte_gfn(spte);
 		unsigned int access = get_mmio_spte_access(spte);
 
@@ -4674,7 +4674,7 @@ static unsigned long get_cr3(struct kvm_vcpu *vcpu)
 static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
 			   unsigned int access)
 {
-	if (unlikely(is_mmio_spte(*sptep))) {
+	if
(unlikely(is_mmio_spte(vcpu->kvm, *sptep))) { if (gfn !=3D get_mmio_spte_gfn(*sptep)) { mmu_spte_clear_no_track(sptep); return true; @@ -6163,6 +6163,7 @@ int kvm_mmu_init_vm(struct kvm *kvm) struct kvm_page_track_notifier_node *node =3D &kvm->arch.mmu_sp_tracker; int r; =20 + kvm->arch.shadow_mmio_value =3D shadow_mmio_value; INIT_LIST_HEAD(&kvm->arch.active_mmu_pages); INIT_LIST_HEAD(&kvm->arch.zapped_obsolete_pages); INIT_LIST_HEAD(&kvm->arch.possible_nx_huge_pages); diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index cc0bc058fb25..a23e9205fc42 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -74,10 +74,10 @@ u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsi= gned int access) u64 spte =3D generation_mmio_spte_mask(gen); u64 gpa =3D gfn << PAGE_SHIFT; =20 - WARN_ON_ONCE(!shadow_mmio_value); + WARN_ON_ONCE(!vcpu->kvm->arch.shadow_mmio_value); =20 access &=3D shadow_mmio_access_mask; - spte |=3D shadow_mmio_value | access; + spte |=3D vcpu->kvm->arch.shadow_mmio_value | access; spte |=3D gpa | shadow_nonpresent_or_rsvd_mask; spte |=3D (gpa & shadow_nonpresent_or_rsvd_mask) << SHADOW_NONPRESENT_OR_RSVD_MASK_LEN; @@ -413,6 +413,12 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mm= io_mask, u64 access_mask) } EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask); =20 +void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value) +{ + kvm->arch.shadow_mmio_value =3D mmio_value; +} +EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_value); + void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask) { /* shadow_me_value must be a subset of shadow_me_mask */ diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 471378ee9071..256395eb593f 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -251,9 +251,9 @@ static inline struct kvm_mmu_page *sptep_to_sp(u64 *spt= ep) return to_shadow_page(__pa(sptep)); } =20 -static inline bool is_mmio_spte(u64 spte) +static inline bool is_mmio_spte(struct kvm *kvm, u64 spte) { - return (spte & shadow_mmio_mask) =3D=3D shadow_mmio_value && + return (spte & shadow_mmio_mask) =3D=3D kvm->arch.shadow_mmio_value && likely(enable_mmio_caching); } =20 diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 380bd0cea939..2619e20c4dfa 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -576,8 +576,8 @@ static void __handle_changed_spte(struct kvm *kvm, int = as_id, gfn_t gfn, * impact the guest since both the former and current SPTEs * are nonpresent. */ - if (WARN_ON(!is_mmio_spte(old_spte) && - !is_mmio_spte(new_spte) && + if (WARN_ON(!is_mmio_spte(kvm, old_spte) && + !is_mmio_spte(kvm, new_spte) && !is_removed_spte(new_spte))) pr_err("Unexpected SPTE change! Nonpresent SPTEs\n" "should not be replaced with another,\n" @@ -1095,7 +1095,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm= _vcpu *vcpu, } =20 /* If a MMIO SPTE is installed, the MMIO will need to be emulated. 
*/ - if (unlikely(is_mmio_spte(new_spte))) { + if (unlikely(is_mmio_spte(vcpu->kvm, new_spte))) { vcpu->stat.pf_mmio_spte_created++; trace_mark_mmio_spte(rcu_dereference(iter->sptep), iter->gfn, new_spte); --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 43F4EC64ED6 for ; Mon, 27 Feb 2023 08:25:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231223AbjB0IZs (ORCPT ); Mon, 27 Feb 2023 03:25:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54810 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230486AbjB0IY6 (ORCPT ); Mon, 27 Feb 2023 03:24:58 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5B5D91C7F5; Mon, 27 Feb 2023 00:24:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486256; x=1709022256; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=NQmrs82q67JKccX4iPWUTE8xOSyHlsd2sMxQgdEWU+4=; b=Q19FHmn3kvV4hFQ/7hrVJ6MlznIjKSsdvluT8SwGHMlEqgBi3I0fWnff 302mc5GlszgYTZbfTkE4TYsgJEKulgU+vP995oZ9y83RM5KR2BwzMtzzj 1kJaU2Qt6nl3TuM7UkAXeJvn8hA65lbyA74UIa60Ht+iJUbSgmGKwNnlX DE6O2LuixPoDlCYGwIkmx5EYbswVzcQYkm9EJRi3SoL1sc0dJ6bvZbvmr FkkEUGzeFhrLw0QUDw7cm4ezrxDy5CE/I+B8lsGPZDjdOIynQoQ1htGzX tg1DCOI3xY44bJoslzr43uP3NpYfmzFShfcwVrSxvXQ2TseXul2hdRHIg w==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608788" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608788" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:06 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242148" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242148" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:06 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 029/106] KVM: x86/mmu: Disallow fast page fault on private GPA Date: Mon, 27 Feb 2023 00:22:28 -0800 Message-Id: <475bc1d2f93dd2f95eacc0d05eef35fe66c72ab6.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata TDX requires TDX SEAMCALL to operate Secure EPT instead of direct memory access and TDX SEAMCALL is heavy operation. Fast page fault on private GPA doesn't make sense. Disallow fast page fault on private GPA. 
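
A compact standalone illustration of the new bail-out (reusing the shared-bit
convention from the earlier GPA patch, shared bit assumed at GPA bit 51; the
remaining fast-path checks are elided):

  #include <assert.h>
  #include <stdbool.h>
  #include <stdint.h>

  #define PAGE_SHIFT 12
  static const uint64_t gfn_shared_mask = 1ULL << (51 - PAGE_SHIFT);

  static bool fault_can_be_fast(uint64_t fault_addr)
  {
  	/* Secure-EPT entries are only reachable via SEAMCALL, so the
  	 * lockless fast path never applies to private GPAs. */
  	if (gfn_shared_mask && !((fault_addr >> PAGE_SHIFT) & gfn_shared_mask))
  		return false;
  	return true;	/* remaining checks elided */
  }

  int main(void)
  {
  	assert(!fault_can_be_fast(0x1000));			/* private */
  	assert(fault_can_be_fast(0x1000 | (1ULL << 51)));	/* shared */
  	return 0;
  }
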
Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/mmu/mmu.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8955d893d173..aaa485daa4d9 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3280,8 +3280,16 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
 	return RET_PF_CONTINUE;
 }
 
-static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
+static bool page_fault_can_be_fast(struct kvm *kvm, struct kvm_page_fault *fault)
 {
+	/*
+	 * TDX private mappings don't support fast page faults because the EPT
+	 * entry is read/written with TDX SEAMCALLs instead of direct memory
+	 * access.
+	 */
+	if (kvm_is_private_gpa(kvm, fault->addr))
+		return false;
+
 	/*
 	 * Page faults with reserved bits set, i.e. faults on MMIO SPTEs, only
 	 * reach the common page fault handler if the SPTE has an invalid MMIO
@@ -3391,7 +3399,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	u64 *sptep = NULL;
 	uint retry_count = 0;
 
-	if (!page_fault_can_be_fast(fault))
+	if (!page_fault_can_be_fast(vcpu->kvm, fault))
 		return ret;
 
 	walk_shadow_page_lockless_begin(vcpu);
-- 
2.25.1

From nobody Tue Sep 9 16:53:37 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang, Sean Christopherson
Subject: [PATCH v12 030/106] KVM: x86/mmu: Allow per-VM override of the TDP max page level
Date: Mon, 27 Feb 2023 00:22:29 -0800
Message-Id: <027a00b862e456ad31c862259bff2d5a03e4eaa3.1677484918.git.isaku.yamahata@intel.com>

From: Sean Christopherson

TDX requires special handling to support large private pages.  For
simplicity, support only 4K pages for TD guests for now.  Add per-VM
maximum page level support to allow different maximum page sizes for
TD guests and conventional VMX guests.

Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Acked-by: Kai Huang
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/mmu/mmu.c          | 1 +
 arch/x86/kvm/mmu/mmu_internal.h | 2 +-
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f120c4484316..079503be0fb3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1234,6 +1234,7 @@ struct kvm_arch {
 	unsigned long n_requested_mmu_pages;
 	unsigned long n_max_mmu_pages;
 	unsigned int indirect_shadow_pages;
+	int tdp_max_page_level;
 	u8 mmu_valid_gen;
 	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
 	struct list_head active_mmu_pages;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index aaa485daa4d9..898f36f2d84a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6195,6 +6195,7 @@ int kvm_mmu_init_vm(struct kvm *kvm)
 	kvm->arch.split_desc_cache.kmem_cache = pte_list_desc_cache;
 	kvm->arch.split_desc_cache.gfp_zero = __GFP_ZERO;
 
+	kvm->arch.tdp_max_page_level = KVM_MAX_HUGEPAGE_LEVEL;
 	return 0;
 }
 
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index e642d431df4b..f6d81505d4ba 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -277,7 +277,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		.nx_huge_page_workaround_enabled = is_nx_huge_page_enabled(vcpu->kvm),
 
-		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
+		.max_level = vcpu->kvm->arch.tdp_max_page_level,
 		.req_level = PG_LEVEL_4K,
 		.goal_level = PG_LEVEL_4K,
 		.is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
-- 
2.25.1

From nobody Tue Sep 9 16:53:37 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang
Subject: [PATCH v12 031/106] KVM: VMX: Introduce test mode related to EPT violation VE
Date: Mon, 27 Feb 2023 00:22:30 -0800

From: Isaku Yamahata

To support TDX, KVM is enhanced to operate with #VE.  For TDX, KVM is
programmed to inject #VE conditionally and to set the #VE suppress bit
in EPT entries.  For the VMX case, #VE isn't used; if a #VE happens
under VMX, it's a bug.  To be defensive (i.e. to test that the VMX case
isn't broken), introduce the module option ept_violation_ve_test; when
it is set, intercept unexpected #VEs and flag them as an error.
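As an aside for reviewers, the #VE re-arm protocol the patch relies on
can be modeled outside the kernel.  The following stand-alone C sketch
is illustrative only (the "CPU" and "guest" helpers are hypothetical
stand-ins for hardware behavior, not kernel APIs); the struct mirrors
vmx_ve_information from the diff below:

	#include <stdint.h>
	#include <stdio.h>

	struct ve_info {
		uint32_t exit_reason;
		uint32_t delivery;	/* 0xFFFFFFFF while a #VE is outstanding */
		uint64_t exit_qualification;
		uint64_t guest_linear_address;
		uint64_t guest_physical_address;
		uint16_t eptp_index;
	};

	/* CPU side: a new #VE can be delivered only if delivery is clear. */
	static int cpu_try_deliver_ve(struct ve_info *ve, uint32_t reason)
	{
		if (ve->delivery)
			return 0;	/* would cause an EPT-violation VM exit instead */
		ve->exit_reason = reason;
		ve->delivery = 0xFFFFFFFFu;
		return 1;
	}

	/* Guest #VE handler side: re-arm delivery after consuming the info. */
	static void guest_ack_ve(struct ve_info *ve)
	{
		ve->delivery = 0;
	}

	int main(void)
	{
		struct ve_info ve = { .delivery = 0 };	/* armed, as in init_vmcs() */

		printf("first #VE delivered: %d\n", cpu_try_deliver_ve(&ve, 48));
		printf("second #VE while outstanding: %d\n", cpu_try_deliver_ve(&ve, 48));
		guest_ack_ve(&ve);
		printf("after ack: %d\n", cpu_try_deliver_ve(&ve, 48));
		return 0;
	}

This is why init_vmcs() below only clears the delivery field and writes
the page's physical address; the hardware does the rest.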
Suggested-by: Paolo Bonzini
Signed-off-by: Isaku Yamahata
---
 arch/x86/include/asm/vmx.h | 12 +++++++
 arch/x86/kvm/vmx/vmcs.h    |  5 +++
 arch/x86/kvm/vmx/vmx.c     | 69 +++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/vmx.h     |  6 +++-
 4 files changed, 90 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index cdbf12c1a83c..752d53652007 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -68,6 +68,7 @@
 #define SECONDARY_EXEC_ENCLS_EXITING		VMCS_CONTROL_BIT(ENCLS_EXITING)
 #define SECONDARY_EXEC_RDSEED_EXITING		VMCS_CONTROL_BIT(RDSEED_EXITING)
 #define SECONDARY_EXEC_ENABLE_PML		VMCS_CONTROL_BIT(PAGE_MOD_LOGGING)
+#define SECONDARY_EXEC_EPT_VIOLATION_VE		VMCS_CONTROL_BIT(EPT_VIOLATION_VE)
 #define SECONDARY_EXEC_PT_CONCEAL_VMX		VMCS_CONTROL_BIT(PT_CONCEAL_VMX)
 #define SECONDARY_EXEC_XSAVES			VMCS_CONTROL_BIT(XSAVES)
 #define SECONDARY_EXEC_MODE_BASED_EPT_EXEC	VMCS_CONTROL_BIT(MODE_BASED_EPT_EXEC)
@@ -223,6 +224,8 @@ enum vmcs_field {
 	VMREAD_BITMAP_HIGH              = 0x00002027,
 	VMWRITE_BITMAP                  = 0x00002028,
 	VMWRITE_BITMAP_HIGH             = 0x00002029,
+	VE_INFORMATION_ADDRESS		= 0x0000202A,
+	VE_INFORMATION_ADDRESS_HIGH	= 0x0000202B,
 	XSS_EXIT_BITMAP                 = 0x0000202C,
 	XSS_EXIT_BITMAP_HIGH            = 0x0000202D,
 	ENCLS_EXITING_BITMAP		= 0x0000202E,
@@ -628,4 +631,13 @@ enum vmx_l1d_flush_state {
 
 extern enum vmx_l1d_flush_state l1tf_vmx_mitigation;
 
+struct vmx_ve_information {
+	u32 exit_reason;
+	u32 delivery;
+	u64 exit_qualification;
+	u64 guest_linear_address;
+	u64 guest_physical_address;
+	u16 eptp_index;
+};
+
 #endif
diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h
index ac290a44a693..9277676057a7 100644
--- a/arch/x86/kvm/vmx/vmcs.h
+++ b/arch/x86/kvm/vmx/vmcs.h
@@ -140,6 +140,11 @@ static inline bool is_nm_fault(u32 intr_info)
 	return is_exception_n(intr_info, NM_VECTOR);
 }
 
+static inline bool is_ve_fault(u32 intr_info)
+{
+	return is_exception_n(intr_info, VE_VECTOR);
+}
+
 /* Undocumented: icebp/int1 */
 static inline bool is_icebp(u32 intr_info)
 {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5d2ff4d964bd..2afa29eaa258 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -127,6 +127,9 @@ module_param(error_on_inconsistent_vmcs_config, bool, 0444);
 static bool __read_mostly dump_invalid_vmcs = 0;
 module_param(dump_invalid_vmcs, bool, 0644);
 
+static bool __read_mostly ept_violation_ve_test;
+module_param(ept_violation_ve_test, bool, 0444);
+
 #define MSR_BITMAP_MODE_X2APIC		1
 #define MSR_BITMAP_MODE_X2APIC_APICV	2
 
@@ -844,6 +847,13 @@ void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu)
 
 	eb = (1u << PF_VECTOR) | (1u << UD_VECTOR) | (1u << MC_VECTOR) |
 	     (1u << DB_VECTOR) | (1u << AC_VECTOR);
+	/*
+	 * #VE isn't used for VMX, only for TDX.  To test against unexpected
+	 * changes related to #VE for VMX, intercept unexpected #VE and warn on
+	 * it.
+	 */
+	if (ept_violation_ve_test)
+		eb |= 1u << VE_VECTOR;
 	/*
 	 * Guest access to VMware backdoor ports could legitimately
 	 * trigger #GP because of TSS I/O permission bitmap.
@@ -2615,6 +2625,9 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 				&_cpu_based_2nd_exec_control))
 			return -EIO;
 	}
+	if (!ept_violation_ve_test)
+		_cpu_based_2nd_exec_control &= ~SECONDARY_EXEC_EPT_VIOLATION_VE;
+
 #ifndef CONFIG_X86_64
 	if (!(_cpu_based_2nd_exec_control &
 				SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))
@@ -2639,6 +2652,7 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 			return -EIO;
 
 		vmx_cap->ept = 0;
+		_cpu_based_2nd_exec_control &= ~SECONDARY_EXEC_EPT_VIOLATION_VE;
 	}
 	if (!(_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_VPID) &&
 	    vmx_cap->vpid) {
@@ -4568,6 +4582,7 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
 		exec_control &= ~SECONDARY_EXEC_ENABLE_VPID;
 	if (!enable_ept) {
 		exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
+		exec_control &= ~SECONDARY_EXEC_EPT_VIOLATION_VE;
 		enable_unrestricted_guest = 0;
 	}
 	if (!enable_unrestricted_guest)
@@ -4695,8 +4710,40 @@ static void init_vmcs(struct vcpu_vmx *vmx)
 
 	exec_controls_set(vmx, vmx_exec_control(vmx));
 
-	if (cpu_has_secondary_exec_ctrls())
+	if (cpu_has_secondary_exec_ctrls()) {
 		secondary_exec_controls_set(vmx, vmx_secondary_exec_control(vmx));
+		if (secondary_exec_controls_get(vmx) &
+		    SECONDARY_EXEC_EPT_VIOLATION_VE) {
+			if (!vmx->ve_info) {
+				/* ve_info must be page aligned. */
+				struct page *page;
+
+				BUILD_BUG_ON(sizeof(*vmx->ve_info) > PAGE_SIZE);
+				page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+				if (page)
+					vmx->ve_info = page_to_virt(page);
+			}
+			if (vmx->ve_info) {
+				/*
+				 * Allow #VE delivery.  The CPU sets this field
+				 * to 0xFFFFFFFF on #VE delivery.  Another #VE
+				 * can occur only if software clears the field.
+				 */
+				vmx->ve_info->delivery = 0;
+				vmcs_write64(VE_INFORMATION_ADDRESS,
					     __pa(vmx->ve_info));
+			} else {
+				/*
+				 * Because SECONDARY_EXEC_EPT_VIOLATION_VE is
+				 * used only when ept_violation_ve_test is true,
+				 * it's okay to go with the bit disabled.
+				 */
+				pr_err("Failed to allocate ve_info. disabling EPT_VIOLATION_VE.\n");
+				secondary_exec_controls_clearbit(vmx,
+						SECONDARY_EXEC_EPT_VIOLATION_VE);
+			}
+		}
+	}
 
 	if (cpu_has_tertiary_exec_ctrls())
 		tertiary_exec_controls_set(vmx, vmx_tertiary_exec_control(vmx));
@@ -5176,6 +5223,12 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 	if (is_invalid_opcode(intr_info))
 		return handle_ud(vcpu);
 
+	/*
+	 * #VE isn't supposed to happen for VMX; if one arrives, it's a bug.
+	 */
+	if (KVM_BUG_ON(is_ve_fault(intr_info), vcpu->kvm))
+		return -EIO;
+
 	error_code = 0;
 	if (intr_info & INTR_INFO_DELIVER_CODE_MASK)
 		error_code = vmcs_read32(VM_EXIT_INTR_ERROR_CODE);
@@ -6364,6 +6417,18 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 	if (secondary_exec_control & SECONDARY_EXEC_ENABLE_VPID)
 		pr_err("Virtual processor ID = 0x%04x\n",
 		       vmcs_read16(VIRTUAL_PROCESSOR_ID));
+	if (secondary_exec_control & SECONDARY_EXEC_EPT_VIOLATION_VE) {
+		struct vmx_ve_information *ve_info;
+
+		pr_err("VE info address = 0x%016llx\n",
+		       vmcs_read64(VE_INFORMATION_ADDRESS));
+		ve_info = __va(vmcs_read64(VE_INFORMATION_ADDRESS));
+		pr_err("ve_info: 0x%08x 0x%08x 0x%016llx 0x%016llx 0x%016llx 0x%04x\n",
+		       ve_info->exit_reason, ve_info->delivery,
+		       ve_info->exit_qualification,
+		       ve_info->guest_linear_address,
+		       ve_info->guest_physical_address, ve_info->eptp_index);
+	}
 }
 
 /*
@@ -7362,6 +7427,8 @@ void vmx_vcpu_free(struct kvm_vcpu *vcpu)
 	free_vpid(vmx->vpid);
 	nested_vmx_free_vcpu(vcpu);
 	free_loaded_vmcs(vmx->loaded_vmcs);
+	if (vmx->ve_info)
+		free_page((unsigned long)vmx->ve_info);
 }
 
 int vmx_vcpu_create(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index d49d0ace9fb8..1813caeb24d8 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -359,6 +359,9 @@ struct vcpu_vmx {
 		DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
 		DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
 	} shadow_msr_intercept;
+
+	/* ve_info must be page aligned. */
+	struct vmx_ve_information *ve_info;
 };
 
 struct kvm_vmx {
@@ -570,7 +573,8 @@ static inline u8 vmx_get_rvi(void)
 	 SECONDARY_EXEC_ENABLE_VMFUNC |					\
 	 SECONDARY_EXEC_BUS_LOCK_DETECTION |				\
	 SECONDARY_EXEC_NOTIFY_VM_EXITING |				\
-	 SECONDARY_EXEC_ENCLS_EXITING)
+	 SECONDARY_EXEC_ENCLS_EXITING |					\
+	 SECONDARY_EXEC_EPT_VIOLATION_VE)
 
 #define KVM_REQUIRED_VMX_TERTIARY_VM_EXEC_CONTROL 0
 #define KVM_OPTIONAL_VMX_TERTIARY_VM_EXEC_CONTROL \
-- 
2.25.1
From nobody Tue Sep 9 16:53:37 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang
Subject: [PATCH v12 032/106] [MARKER] The start of TDX KVM patch series: KVM TDP MMU hooks
Date: Mon, 27 Feb 2023 00:22:31 -0800
Message-Id: <0a9f9d0a333de9c7bde65edd9eb1fc958c0d1760.1677484918.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

This empty commit is to mark the start of the patch series of KVM TDP
MMU hooks.

Signed-off-by: Isaku Yamahata
---
 Documentation/virt/kvm/intel-tdx-layer-status.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentation/virt/kvm/intel-tdx-layer-status.rst
index f10aff0b060e..f4aba85148e3 100644
--- a/Documentation/virt/kvm/intel-tdx-layer-status.rst
+++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst
@@ -25,5 +25,5 @@ Patch Layer status
 * TD vcpu interrupts/exit/hypercall: Not yet
 
 * KVM MMU GPA shared bits: Applied
-* KVM TDP refactoring for TDX: Applying
-* KVM TDP MMU hooks: Not yet
+* KVM TDP refactoring for TDX: Applied
+* KVM TDP MMU hooks: Applying
-- 
2.25.1

From nobody Tue Sep 9 16:53:37 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang
Subject: [PATCH v12 033/106] KVM: x86/tdp_mmu: Init role member of struct kvm_mmu_page at allocation
Date: Mon, 27 Feb 2023 00:22:32 -0800
Message-Id: <897eb6e3f61d164de13ebda4e203ac4508bfb175.1677484918.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

Refactor tdp_mmu_alloc_sp() and tdp_mmu_init_sp() and eliminate
tdp_mmu_init_child_sp().  Currently tdp_mmu_init_sp() (or
tdp_mmu_init_child_sp()) sets kvm_mmu_page.role after tdp_mmu_alloc_sp()
has allocated the struct kvm_mmu_page and its page table page.  This
patch makes tdp_mmu_alloc_sp() initialize kvm_mmu_page.role instead of
tdp_mmu_init_sp().

To handle private page tables, an is_private argument needs to be passed
down.  Given that the page level is already passed down, adding one more
parameter about the sp would be cumbersome.  Instead, replace the level
argument with union kvm_mmu_page_role.  Thus the number of arguments
doesn't increase and more information about the sp can be passed down.

For a private sp, a secure page table will also be allocated in addition
to the struct kvm_mmu_page and the page table (the spt member).  The
allocation functions (tdp_mmu_alloc_sp() and
__tdp_mmu_alloc_sp_for_split()) need to know whether the allocation is
for a conventional page table or a private page table.  Pass union
kvm_mmu_page_role to those functions and initialize the role member of
struct kvm_mmu_page.
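To see why a single role argument is enough, consider this stand-alone
sketch of the child-role derivation (the union below is a hypothetical,
simplified subset of kvm_mmu_page_role, for illustration only): the
child inherits the whole role word and only decrements the level, so
the private bit added by a later patch rides along for free.

	#include <stdio.h>

	union mmu_page_role {
		unsigned word;
		struct {
			unsigned level:4;
			unsigned direct:1;
			unsigned is_private:1;	/* added by a later patch */
		};
	};

	static union mmu_page_role child_role(union mmu_page_role parent)
	{
		union mmu_page_role child = parent;

		child.level--;	/* everything else is inherited unchanged */
		return child;
	}

	int main(void)
	{
		union mmu_page_role root = { .word = 0 };

		root.level = 4;		/* root of a 4-level walk */
		root.is_private = 1;

		for (union mmu_page_role r = root; r.level >= 1; r = child_role(r))
			printf("level %u private %u\n", r.level, r.is_private);
		return 0;
	}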
Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/mmu/tdp_iter.h | 12 ++++++++++
 arch/x86/kvm/mmu/tdp_mmu.c  | 44 ++++++++++++++++---------------------
 2 files changed, 31 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index f0af385c56e0..9e56a5b1024c 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -115,4 +115,16 @@ void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
 void tdp_iter_next(struct tdp_iter *iter);
 void tdp_iter_restart(struct tdp_iter *iter);
 
+static inline union kvm_mmu_page_role tdp_iter_child_role(struct tdp_iter *iter)
+{
+	union kvm_mmu_page_role child_role;
+	struct kvm_mmu_page *parent_sp;
+
+	parent_sp = sptep_to_sp(rcu_dereference(iter->sptep));
+
+	child_role = parent_sp->role;
+	child_role.level--;
+	return child_role;
+}
+
 #endif /* __KVM_X86_MMU_TDP_ITER_H */
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 2619e20c4dfa..d4e7880a2eea 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -260,24 +260,30 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
 		     kvm_mmu_page_as_id(_root) != _as_id) {		\
 	} else
 
-static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
+static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu,
+					     union kvm_mmu_page_role role)
 {
 	struct kvm_mmu_page *sp;
 
 	sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
 	sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
+	sp->role = role;
 
 	return sp;
 }
 
 static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, tdp_ptep_t sptep,
-			    gfn_t gfn, union kvm_mmu_page_role role)
+			    gfn_t gfn)
 {
 	INIT_LIST_HEAD(&sp->possible_nx_huge_page_link);
 
 	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
 
-	sp->role = role;
+	/*
+	 * The role must be set before calling this function.  At minimum,
+	 * role.level must not be 0 (PG_LEVEL_NONE).
+	 */
+	WARN_ON_ONCE(!sp->role.word);
 	sp->gfn = gfn;
 	sp->ptep = sptep;
 	sp->tdp_mmu_page = true;
@@ -285,20 +291,6 @@ static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, tdp_ptep_t sptep,
 	trace_kvm_mmu_get_page(sp, true);
 }
 
-static void tdp_mmu_init_child_sp(struct kvm_mmu_page *child_sp,
-				  struct tdp_iter *iter)
-{
-	struct kvm_mmu_page *parent_sp;
-	union kvm_mmu_page_role role;
-
-	parent_sp = sptep_to_sp(rcu_dereference(iter->sptep));
-
-	role = parent_sp->role;
-	role.level--;
-
-	tdp_mmu_init_sp(child_sp, iter->sptep, iter->gfn, role);
-}
-
 hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
 {
 	union kvm_mmu_page_role role = vcpu->arch.mmu->root_role;
@@ -317,8 +309,8 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
 		goto out;
 	}
 
-	root = tdp_mmu_alloc_sp(vcpu);
-	tdp_mmu_init_sp(root, NULL, 0, role);
+	root = tdp_mmu_alloc_sp(vcpu, role);
+	tdp_mmu_init_sp(root, NULL, 0);
 
 	refcount_set(&root->tdp_mmu_root_count, 1);
 
@@ -1185,8 +1177,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		 * The SPTE is either non-present or points to a huge page that
 		 * needs to be split.
 		 */
-		sp = tdp_mmu_alloc_sp(vcpu);
-		tdp_mmu_init_child_sp(sp, &iter);
+		sp = tdp_mmu_alloc_sp(vcpu, tdp_iter_child_role(&iter));
+		tdp_mmu_init_sp(sp, iter.sptep, iter.gfn);
 
 		sp->nx_huge_page_disallowed = fault->huge_page_disallowed;
 
@@ -1415,7 +1407,7 @@ bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm,
 	return spte_set;
 }
 
-static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(gfp_t gfp)
+static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(gfp_t gfp, union kvm_mmu_page_role role)
 {
 	struct kvm_mmu_page *sp;
 
@@ -1425,6 +1417,7 @@ static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(gfp_t gfp)
 	if (!sp)
 		return NULL;
 
+	sp->role = role;
 	sp->spt = (void *)__get_free_page(gfp);
 	if (!sp->spt) {
 		kmem_cache_free(mmu_page_header_cache, sp);
@@ -1438,6 +1431,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
 						       struct tdp_iter *iter,
 						       bool shared)
 {
+	union kvm_mmu_page_role role = tdp_iter_child_role(iter);
 	struct kvm_mmu_page *sp;
 
 	/*
@@ -1449,7 +1443,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
 	 * If this allocation fails we drop the lock and retry with reclaim
 	 * allowed.
 	 */
-	sp = __tdp_mmu_alloc_sp_for_split(GFP_NOWAIT | __GFP_ACCOUNT);
+	sp = __tdp_mmu_alloc_sp_for_split(GFP_NOWAIT | __GFP_ACCOUNT, role);
 	if (sp)
 		return sp;
 
@@ -1461,7 +1455,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
 	write_unlock(&kvm->mmu_lock);
 
 	iter->yielded = true;
-	sp = __tdp_mmu_alloc_sp_for_split(GFP_KERNEL_ACCOUNT);
+	sp = __tdp_mmu_alloc_sp_for_split(GFP_KERNEL_ACCOUNT, role);
 
 	if (shared)
 		read_lock(&kvm->mmu_lock);
@@ -1556,7 +1550,7 @@ static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
 			continue;
 		}
 
-		tdp_mmu_init_child_sp(sp, &iter);
+		tdp_mmu_init_sp(sp, iter.sptep, iter.gfn);
 
 		if (tdp_mmu_split_huge_page(kvm, &iter, sp, shared))
 			goto retry;
-- 
2.25.1

From nobody Tue Sep 9 16:53:37 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang
Subject: [PATCH v12 034/106] KVM: x86/mmu: Require TDP MMU and mmio caching for TDX
Date: Mon, 27 Feb 2023 00:22:33 -0800
Message-Id: <4bb83ed60289137bcc2ca72c56a88c2f935536af.1677484918.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

Because the TDP MMU has become the mainstream MMU and the legacy MMU is
being phased out, legacy MMU support for TDX isn't implemented.  TDX
also requires KVM MMIO caching.  Disable TDX support when the TDP MMU
or MMIO caching is unavailable.

Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/mmu/mmu.c  | 1 +
 arch/x86/kvm/vmx/main.c | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 898f36f2d84a..a6b0b53634e8 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -99,6 +99,7 @@ module_param_named(flush_on_reuse, force_flush_and_sync_on_reuse, bool, 0644);
  * If the hardware supports that we don't need to do shadow paging.
  */
 bool tdp_enabled = false;
+EXPORT_SYMBOL_GPL(tdp_enabled);
 
 bool __ro_after_init tdp_mmu_allowed;
 
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index fa0590e37ec1..0cd85c96ed84 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -2,6 +2,7 @@
 #include
 
 #include "x86_ops.h"
+#include "mmu.h"
 #include "vmx.h"
 #include "nested.h"
 #include "pmu.h"
@@ -38,6 +39,11 @@ static __init int vt_hardware_setup(void)
 
 	enable_tdx = enable_tdx && !tdx_hardware_setup(&vt_x86_ops);
 
+	/* TDX requires KVM TDP MMU and MMIO caching. */
+	if (enable_tdx && (!tdp_enabled || !enable_mmio_caching)) {
+		enable_tdx = false;
+		pr_warn_ratelimited("tdp mmu and mmio caching need to be enabled.\n");
+	}
 	return 0;
 }
 
-- 
2.25.1

From nobody Tue Sep 9 16:53:37 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang
Subject: [PATCH v12 035/106] KVM: x86/mmu: Add a new is_private member for union kvm_mmu_page_role
Date: Mon, 27 Feb 2023 00:22:34 -0800

From: Isaku Yamahata

Because TDX support introduces private mappings, add a new member to
union kvm_mmu_page_role, along with access functions to check it.
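The config-gated accessor pattern used below can be tried stand-alone.
This sketch is illustrative only (the union is a hypothetical subset of
kvm_mmu_page_role): with the config off, the accessors collapse to
constants, so callers need no #ifdefs of their own.  Compile with
-DCONFIG_KVM_MMU_PRIVATE to flip the behavior.

	#include <stdbool.h>
	#include <stdio.h>

	union page_role {
		unsigned word;
		struct {
			unsigned level:4;
	#ifdef CONFIG_KVM_MMU_PRIVATE
			unsigned is_private:1;
	#endif
		};
	};

	static inline bool role_is_private(union page_role role)
	{
	#ifdef CONFIG_KVM_MMU_PRIVATE
		return !!role.is_private;
	#else
		(void)role;
		return false;
	#endif
	}

	static inline void role_set_private(union page_role *role)
	{
	#ifdef CONFIG_KVM_MMU_PRIVATE
		role->is_private = 1;
	#else
		(void)role;	/* the kernel version warns here instead */
	#endif
	}

	int main(void)
	{
		union page_role role = { .word = 0 };

		role_set_private(&role);
		printf("is_private=%d\n", role_is_private(role));
		return 0;
	}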
Signed-off-by: Isaku Yamahata
---
 arch/x86/include/asm/kvm_host.h | 27 +++++++++++++++++++++++++++
 arch/x86/kvm/mmu/mmu_internal.h |  5 +++++
 arch/x86/kvm/mmu/spte.h         |  6 ++++++
 3 files changed, 38 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 079503be0fb3..afe9285930b3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -338,7 +338,12 @@ union kvm_mmu_page_role {
 		unsigned ad_disabled:1;
 		unsigned guest_mode:1;
 		unsigned passthrough:1;
+#ifdef CONFIG_KVM_MMU_PRIVATE
+		unsigned is_private:1;
+		unsigned :4;
+#else
 		unsigned :5;
+#endif
 
 		/*
 		 * This is left at the top of the word so that
@@ -350,6 +355,28 @@ union kvm_mmu_page_role {
 	};
 };
 
+#ifdef CONFIG_KVM_MMU_PRIVATE
+static inline bool kvm_mmu_page_role_is_private(union kvm_mmu_page_role role)
+{
+	return !!role.is_private;
+}
+
+static inline void kvm_mmu_page_role_set_private(union kvm_mmu_page_role *role)
+{
+	role->is_private = 1;
+}
+#else
+static inline bool kvm_mmu_page_role_is_private(union kvm_mmu_page_role role)
+{
+	return false;
+}
+
+static inline void kvm_mmu_page_role_set_private(union kvm_mmu_page_role *role)
+{
+	WARN_ON_ONCE(1);
+}
+#endif
+
 /*
  * kvm_mmu_extended_role complements kvm_mmu_page_role, tracking properties
  * relevant to the current MMU configuration.   When loading CR0, CR4, or EFER,
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index f6d81505d4ba..0111eed08c04 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -143,6 +143,11 @@ static inline int kvm_mmu_page_as_id(struct kvm_mmu_page *sp)
 	return kvm_mmu_role_as_id(sp->role);
 }
 
+static inline bool is_private_sp(const struct kvm_mmu_page *sp)
+{
+	return kvm_mmu_page_role_is_private(sp->role);
+}
+
 static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
 {
 	/*
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 256395eb593f..7046671b08cb 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -251,6 +251,12 @@ static inline struct kvm_mmu_page *sptep_to_sp(u64 *sptep)
 	return to_shadow_page(__pa(sptep));
 }
 
+static inline bool is_private_sptep(u64 *sptep)
+{
+	WARN_ON_ONCE(!sptep);
+	return is_private_sp(sptep_to_sp(sptep));
+}
+
 static inline bool is_mmio_spte(struct kvm *kvm, u64 spte)
 {
 	return (spte & shadow_mmio_mask) == kvm->arch.shadow_mmio_value &&
-- 
2.25.1

From nobody Tue Sep 9 16:53:37 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang
Subject: [PATCH v12 036/106] KVM: x86/mmu: Add a private pointer to struct kvm_mmu_page
Date: Mon, 27 Feb 2023 00:22:35 -0800
Message-Id: <1ac13fb0ac8647642aeff20d84001564f8fbfbd5.1677484918.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

For a private GPA, the CPU refers to a private page table whose
contents are encrypted.  Dedicated APIs (e.g. for updating/reading its
PTE entries) operate on it, and their cost is high.

When KVM resolves a KVM page fault, it walks the page tables.  To reuse
the existing KVM MMU code and to mitigate the heavy cost of directly
walking the private page table, allocate one more page that holds a
dummy page table which the KVM MMU code walks directly.  Resolve the
KVM page fault with the existing code, and do the additional operations
necessary for the private page table.  To distinguish the cases, the
existing KVM page table is called a shared page table (i.e. not
associated with a private page table), and the page table backed by a
private page table is called a private page table.  The relationship is
depicted below.

Add a private pointer to struct kvm_mmu_page for the private page table
and add helper functions to allocate/initialize/free a private page
table page.

              KVM page fault                     |
                     |                           |
                     V                           |
        -------------+----------                 |
        |                      |                 |
        V                      V                 |
     shared GPA           private GPA            |
        |                      |                 |
        V                      V                 |
    shared PT root      dummy PT root            |   private PT root
        |                      |                 |          |
        V                      V                 |          V
     shared PT            dummy PT ----propagate---->  private PT
        |                      |                 |          |
        |                      \-----------------+-----\    |
        |                                        |     |    |
        V                                        |     V    V
  shared guest page                              |   private guest page
                                                 |
  non-encrypted memory                           |   encrypted memory

  PT: page table
  - Shared PT is visible to KVM and it is used by the CPU.
  - Private PT is used by the CPU but it is invisible to KVM.
  - Dummy PT is visible to KVM but not used by the CPU.  It is used to
    propagate PT changes to the actual private PT, which is used by the
    CPU.
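The layout trick in the diff below is that private_spt overlays fields
that are only meaningful for shadow-MMU pages, so struct kvm_mmu_page
does not grow.  A simplified, hypothetical model of that overlay (not
the kernel structure) for illustration:

	#include <stdio.h>
	#include <stdlib.h>

	struct mmu_page {
		union {
			struct {
				unsigned int unsync_children;
				int write_flooding_count;	/* atomic_t in the kernel */
			};
			void *private_spt;	/* Secure-EPT page handed to the TDX module */
		};
		void *spt;	/* the page KVM itself walks (the "dummy" PT) */
	};

	int main(void)
	{
		struct mmu_page sp = { 0 };

		sp.private_spt = malloc(4096);	/* stands in for the cache-backed page */
		printf("sizeof(struct mmu_page) = %zu\n", sizeof(sp));
		printf("private_spt aliases unsync_children: %s\n",
		       (void *)&sp.unsync_children == (void *)&sp.private_spt ?
		       "yes" : "no");
		free(sp.private_spt);
		return 0;
	}

The overlay is safe only because a TDP MMU page for a TD never uses the
shadow-MMU unsync/write-flooding machinery; the two sets of fields are
never live at the same time.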
Signed-off-by: Isaku Yamahata
---
 arch/x86/include/asm/kvm_host.h |  5 ++
 arch/x86/kvm/mmu/mmu.c          |  7 +++
 arch/x86/kvm/mmu/mmu_internal.h | 83 +++++++++++++++++++++++++++++++--
 arch/x86/kvm/mmu/tdp_mmu.c      |  1 +
 4 files changed, 92 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index afe9285930b3..0f25cd0a0f02 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -810,6 +810,11 @@ struct kvm_vcpu_arch {
 	struct kvm_mmu_memory_cache mmu_shadow_page_cache;
 	struct kvm_mmu_memory_cache mmu_shadowed_info_cache;
 	struct kvm_mmu_memory_cache mmu_page_header_cache;
+	/*
+	 * This cache is used to allocate private page tables, e.g. the
+	 * Secure-EPT pages used by the TDX module.
+	 */
+	struct kvm_mmu_memory_cache mmu_private_spt_cache;
 
 	/*
 	 * QEMU userspace and the guest each have their own FPU state.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a6b0b53634e8..1d0560111554 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -666,6 +666,12 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
 				       1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
 	if (r)
 		return r;
+	if (kvm_gfn_shared_mask(vcpu->kvm)) {
+		r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_private_spt_cache,
+					       PT64_ROOT_MAX_LEVEL);
+		if (r)
+			return r;
+	}
 	r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
 				       PT64_ROOT_MAX_LEVEL);
 	if (r)
@@ -685,6 +691,7 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache);
+	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_private_spt_cache);
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
 }
 
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 0111eed08c04..e1168a01af64 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -93,7 +93,23 @@ struct kvm_mmu_page {
 		int root_count;
 		refcount_t tdp_mmu_root_count;
 	};
-	unsigned int unsync_children;
+	union {
+		struct {
+			unsigned int unsync_children;
+			/*
+			 * Number of writes since the last time traversal
+			 * visited this page.
+			 */
+			atomic_t write_flooding_count;
+		};
+#ifdef CONFIG_KVM_MMU_PRIVATE
+		/*
+		 * Associated private shadow page table, e.g. the Secure-EPT
+		 * page passed to the TDX module.
+		 */
+		void *private_spt;
+#endif
+	};
 	union {
 		struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
 		tdp_ptep_t ptep;
@@ -122,9 +138,6 @@ struct kvm_mmu_page {
 	int clear_spte_count;
 #endif
 
-	/* Number of writes since the last time traversal visited this page. */
-	atomic_t write_flooding_count;
-
 #ifdef CONFIG_X86_64
 	/* Used for freeing the page asynchronously if it is a TDP MMU page. */
 	struct rcu_head rcu_head;
@@ -148,6 +161,68 @@ static inline bool is_private_sp(const struct kvm_mmu_page *sp)
 	return kvm_mmu_page_role_is_private(sp->role);
 }
 
+#ifdef CONFIG_KVM_MMU_PRIVATE
+static inline void *kvm_mmu_private_spt(struct kvm_mmu_page *sp)
+{
+	return sp->private_spt;
+}
+
+static inline void kvm_mmu_init_private_spt(struct kvm_mmu_page *sp, void *private_spt)
+{
+	sp->private_spt = private_spt;
+}
+
+static inline void kvm_mmu_alloc_private_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+	bool is_root = vcpu->arch.root_mmu.root_role.level == sp->role.level;
+
+	KVM_BUG_ON(!kvm_mmu_page_role_is_private(sp->role), vcpu->kvm);
+	if (is_root)
+		/*
+		 * Because the TDX module assigns the root Secure-EPT page and
+		 * sets it as the Secure-EPTP when the TD vcpu is created, a
+		 * secure page table for the root isn't needed.
+		 */
+		sp->private_spt = NULL;
+	else {
+		/*
+		 * Because the TDX module doesn't trust the VMM and initializes
+		 * the pages itself, KVM doesn't initialize them.  Allocate
+		 * pages with garbage and give them to the TDX module.
+		 */
+		sp->private_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_private_spt_cache);
+		/*
+		 * Because mmu_private_spt_cache is topped up before KVM starts
+		 * resolving a page fault, the allocation above shouldn't fail.
+		 */
+		WARN_ON_ONCE(!sp->private_spt);
+	}
+}
+
+static inline void kvm_mmu_free_private_spt(struct kvm_mmu_page *sp)
+{
+	if (sp->private_spt)
+		free_page((unsigned long)sp->private_spt);
+}
+#else
+static inline void *kvm_mmu_private_spt(struct kvm_mmu_page *sp)
+{
+	return NULL;
+}
+
+static inline void kvm_mmu_init_private_spt(struct kvm_mmu_page *sp, void *private_spt)
+{
+}
+
+static inline void kvm_mmu_alloc_private_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+}
+
+static inline void kvm_mmu_free_private_spt(struct kvm_mmu_page *sp)
+{
+}
+#endif
+
 static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
 {
 	/*
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index d4e7880a2eea..cc93085e27a0 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -56,6 +56,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
 
 static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
 {
+	kvm_mmu_free_private_spt(sp);
 	free_page((unsigned long)sp->spt);
 	kmem_cache_free(mmu_page_header_cache, sp);
 }
-- 
2.25.1

From nobody Tue Sep 9 16:53:37 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang
Subject: [PATCH v12 037/106] KVM: Add flags to struct kvm_gfn_range
Date: Mon, 27 Feb 2023 00:22:36 -0800
Message-Id: <3f815ce9665313a5f1fedcf6afebb746ea6de678.1677484918.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

kvm_unmap_gfn_range() needs to know the reason for the callback for
TDX: the mmu notifier, the set-memory-attributes ioctl, or the
restrictedmem notifier.  Based on the reason, TDX changes the behavior.
For the mmu notifier, it's an operation on a shared memory slot, so zap
the shared PTEs.  For set-memory-attributes, it's a private<->shared
conversion, so zap the original PTEs.  For restrictedmem, it's punching
a hole in the range, so zap the corresponding PTEs.
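A stand-alone sketch of how a consumer can dispatch on the new flags to
pick a zap policy, mirroring the logic a later patch applies in
kvm_unmap_gfn_range() (the types and the zap_private() helper here are
hypothetical simplifications, for illustration only):

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	#define GFN_RANGE_FLAGS_RESTRICTED_MEM	(1u << 0)
	#define GFN_RANGE_FLAGS_SET_MEM_ATTR	(1u << 1)
	#define MEMORY_ATTRIBUTE_PRIVATE	(1u << 3)

	struct gfn_range {
		uint64_t start, end;
		uint64_t attrs;		/* valid only with SET_MEM_ATTR */
		unsigned int flags;
	};

	static bool zap_private(const struct gfn_range *r)
	{
		if (r->flags & GFN_RANGE_FLAGS_RESTRICTED_MEM)
			return true;	/* hole punched in restricted memory */
		if (r->flags & GFN_RANGE_FLAGS_SET_MEM_ATTR)
			return !(r->attrs & MEMORY_ATTRIBUTE_PRIVATE); /* to shared */
		return false;		/* plain mmu notifier: shared PTEs only */
	}

	int main(void)
	{
		struct gfn_range hole = { .flags = GFN_RANGE_FLAGS_RESTRICTED_MEM };
		struct gfn_range conv = { .flags = GFN_RANGE_FLAGS_SET_MEM_ATTR, .attrs = 0 };
		struct gfn_range mmun = { .flags = 0 };

		printf("punch hole: %d, to-shared: %d, notifier: %d\n",
		       zap_private(&hole), zap_private(&conv), zap_private(&mmun));
		return 0;
	}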
Signed-off-by: Isaku Yamahata --- include/linux/kvm_host.h | 10 +++++++++- virt/kvm/kvm_main.c | 5 ++++- 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index f6470338d5fa..5e4bf78025e3 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -257,12 +257,20 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu); #endif =20 #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER + +#define KVM_GFN_RANGE_FLAGS_RESTRICTED_MEM BIT(0) +#define KVM_GFN_RANGE_FLAGS_SET_MEM_ATTR BIT(1) + struct kvm_gfn_range { struct kvm_memory_slot *slot; gfn_t start; gfn_t end; - pte_t pte; + union { + pte_t pte; + u64 attrs; + }; bool may_block; + unsigned int flags; }; bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 87400796df6e..d6db3f19ad74 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -626,6 +626,7 @@ static __always_inline int __kvm_handle_hva_range(struc= t kvm *kvm, gfn_range.start =3D hva_to_gfn_memslot(hva_start, slot); gfn_range.end =3D hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, slot); gfn_range.slot =3D slot; + gfn_range.flags =3D 0; =20 if (!locked) { locked =3D true; @@ -957,6 +958,7 @@ static int restrictedmem_get_gfn_range(struct kvm_memor= y_slot *slot, range->slot =3D slot; range->pte =3D __pte(0); range->may_block =3D true; + range->flags =3D KVM_GFN_RANGE_FLAGS_RESTRICTED_MEM; return 0; } =20 @@ -2557,8 +2559,9 @@ static void kvm_mem_attrs_changed(struct kvm *kvm, un= signed long attrs, bool flush =3D false; int i; =20 - gfn_range.pte =3D __pte(0); + gfn_range.attrs =3D attrs; gfn_range.may_block =3D true; + gfn_range.flags =3D KVM_GFN_RANGE_FLAGS_SET_MEM_ATTR; =20 for (i =3D 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) { slots =3D __kvm_memslots(kvm, i); --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 70B6BC7EE2D for ; Mon, 27 Feb 2023 08:26:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230481AbjB0I0V (ORCPT ); Mon, 27 Feb 2023 03:26:21 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59278 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231184AbjB0IZm (ORCPT ); Mon, 27 Feb 2023 03:25:42 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4CE6416313; Mon, 27 Feb 2023 00:24:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486262; x=1709022262; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=u2I6bng2mwuKzmtAe3KryI63mfch6qNsX77FSW8RZdU=; b=eB0UhO+M5hLsVDBxoPQweL7GGMmiC4leoddKLwzyHvH+di2P19zLRiei xzkVsdAjcJ46mhgj4bHe5JXGzTAeZtzFXZjIqLFpCi4WCIFxAgov+F3gT 4O682FCDyLcsB601iiVN1XWdEqL6rWY55vdU5oskhI5SCvmhxXuaL6jOb liqNrpATJFohI6N3vhtvTMlCOa+8jc0PkWM6hud/p4K1dnv97b57OyP2S hJOeYS0oN7p4J430ZWxazMc2+kwoU2Sv88EvqnEFyunysfsW/hAApHNZ/ QHEXxlSK2ZNoYI5T6xJN/OT+CEbnoLjHrzV1SDlF/RTfanBJAAAHGau/6 Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608821" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608821" Received: from 
fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:07 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242180" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242180" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:07 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang , Sean Christopherson Subject: [PATCH v12 038/106] KVM: x86/tdp_mmu: Don't zap private pages for unsupported cases Date: Mon, 27 Feb 2023 00:22:37 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson TDX supports only write-back(WB) memory type for private memory architecturally so that (virtualized) memory type change doesn't make sense for private memory. Also currently, page migration isn't supported for TDX yet. (TDX architecturally supports page migration. it's KVM and kernel implementation issue.) Regarding memory type change (mtrr virtualization and lapic page mapping change), pages are zapped by kvm_zap_gfn_range(). On the next KVM page fault, the SPTE entry with a new memory type for the page is populated. Regarding page migration, pages are zapped by the mmu notifier. On the next KVM page fault, the new migrated page is populated. Don't zap private pages on unmapping for those two cases. When deleting/moving a KVM memory slot, zap private pages. Typically tearing down VM. Don't invalidate private page tables. i.e. zap only leaf SPTEs for KVM mmu that has a shared bit mask. The existing kvm_tdp_mmu_invalidate_all_roots() depends on role.invalid with read-lock of mmu_lock so that other vcpu can operate on KVM mmu concurrently. It marks the root page table invalid and zaps SPTEs of the root page tables. The TDX module doesn't allow to unlink a protected root page table from the hardware and then allocate a new one for it. i.e. replacing a protected root page table. Instead, zap only leaf SPTEs for KVM mmu with a shared bit mask set. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu/mmu.c | 81 ++++++++++++++++++++++++++++++++++++-- arch/x86/kvm/mmu/tdp_mmu.c | 24 ++++++++--- arch/x86/kvm/mmu/tdp_mmu.h | 5 ++- 3 files changed, 99 insertions(+), 11 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 1d0560111554..d050495f834a 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1590,8 +1590,28 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm= _gfn_range *range) if (kvm_memslots_have_rmaps(kvm)) flush =3D kvm_handle_gfn_range(kvm, range, kvm_zap_rmap); =20 - if (tdp_mmu_enabled) - flush =3D kvm_tdp_mmu_unmap_gfn_range(kvm, range, flush); + if (tdp_mmu_enabled) { + bool zap_private; + + if (range->flags & KVM_GFN_RANGE_FLAGS_RESTRICTED_MEM) { + WARN_ON_ONCE(!kvm_slot_can_be_private(range->slot)); + /* + * For private slot, the callback is triggered + * via PUNCH_HOLE (fallocate(PUNCH_HOLE) or truncate). + * private-shared conversion is done by + * KVM_SET_MEMORY_ATTRIBUTES. 
+ */ + zap_private =3D true; + } else if (range->flags & KVM_GFN_RANGE_FLAGS_SET_MEM_ATTR) + zap_private =3D !(range->attrs & KVM_MEMORY_ATTRIBUTE_PRIVATE); + else + /* + * For now private pages are pinned during VM's life + * time. + */ + zap_private =3D false; + flush =3D kvm_tdp_mmu_unmap_gfn_range(kvm, range, flush, zap_private); + } =20 return flush; } @@ -6167,11 +6187,54 @@ static bool kvm_has_zapped_obsolete_pages(struct kv= m *kvm) return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages)); } =20 +static void kvm_mmu_zap_memslot(struct kvm *kvm, struct kvm_memory_slot *s= lot) +{ + bool flush =3D false; + + write_lock(&kvm->mmu_lock); + + /* + * Zapping non-leaf SPTEs, a.k.a. not-last SPTEs, isn't required, worst + * case scenario we'll have unused shadow pages lying around until they + * are recycled due to age or when the VM is destroyed. + */ + if (tdp_mmu_enabled) { + struct kvm_gfn_range range =3D { + .slot =3D slot, + .start =3D slot->base_gfn, + .end =3D slot->base_gfn + slot->npages, + .may_block =3D true, + }; + + /* + * this handles both private gfn and shared gfn. + * All private page should be zapped on memslot deletion. + */ + flush =3D kvm_tdp_mmu_unmap_gfn_range(kvm, &range, flush, true); + } else { + /* TDX supports only TDP-MMU case. */ + WARN_ON_ONCE(1); + flush =3D true; + } + if (flush) + kvm_flush_remote_tlbs(kvm); + + write_unlock(&kvm->mmu_lock); +} + static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, struct kvm_page_track_notifier_node *node) { - kvm_mmu_zap_all_fast(kvm); + if (kvm_gfn_shared_mask(kvm)) + /* + * Secure-EPT requires to release PTs from the leaf. The + * optimization to zap root PT first with child PT doesn't + * work. + */ + kvm_mmu_zap_memslot(kvm, slot); + else + kvm_mmu_zap_all_fast(kvm); } =20 int kvm_mmu_init_vm(struct kvm *kvm) @@ -6279,8 +6342,18 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_st= art, gfn_t gfn_end) =20 if (tdp_mmu_enabled) { for (i =3D 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) + /* + * zap_private =3D true. Zap both private/shared pages. + * + * kvm_zap_gfn_range() is used when MTRR or PAT memory + * type was changed. Later on the next kvm page fault, + * populate it with updated spte entry. + * Because only WB is supported for private pages, don't + * care of private pages. + */ flush =3D kvm_tdp_mmu_zap_leafs(kvm, i, gfn_start, - gfn_end, true, flush); + gfn_end, true, flush, + false); } =20 if (flush) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index cc93085e27a0..e17adceec426 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -932,7 +932,8 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu= _page *sp) * operation can cause a soft lockup. */ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root, - gfn_t start, gfn_t end, bool can_yield, bool flush) + gfn_t start, gfn_t end, bool can_yield, bool flush, + bool zap_private) { struct tdp_iter iter; =20 @@ -940,6 +941,10 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct = kvm_mmu_page *root, =20 lockdep_assert_held_write(&kvm->mmu_lock); =20 + WARN_ON_ONCE(zap_private && !is_private_sp(root)); + if (!zap_private && is_private_sp(root)) + return false; + rcu_read_lock(); =20 for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end) { @@ -972,12 +977,13 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct= kvm_mmu_page *root, * more SPTEs were zapped since the MMU lock was last acquired. 
*/ bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, gfn_t = end, - bool can_yield, bool flush) + bool can_yield, bool flush, bool zap_private) { struct kvm_mmu_page *root; =20 for_each_tdp_mmu_root_yield_safe(kvm, root, as_id) - flush =3D tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, flush); + flush =3D tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, flush, + zap_private && is_private_sp(root)); =20 return flush; } @@ -1037,6 +1043,12 @@ void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kv= m) =20 lockdep_assert_held_write(&kvm->mmu_lock); list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) { + /* + * Skip private root since private page table + * is only torn down when VM is destroyed. + */ + if (is_private_sp(root)) + continue; if (!root->role.invalid && !WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root))) { root->role.invalid =3D true; @@ -1221,11 +1233,13 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct k= vm_page_fault *fault) return ret; } =20 +/* Used by mmu notifier via kvm_unmap_gfn_range() */ bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *ra= nge, - bool flush) + bool flush, bool zap_private) { return kvm_tdp_mmu_zap_leafs(kvm, range->slot->as_id, range->start, - range->end, range->may_block, flush); + range->end, range->may_block, flush, + zap_private); } =20 typedef bool (*tdp_handler_t)(struct kvm *kvm, struct tdp_iter *iter, diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index 0a63b1afabd3..b32cbdf2f675 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -21,7 +21,8 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu= _page *root, bool shared); =20 bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, - gfn_t end, bool can_yield, bool flush); + gfn_t end, bool can_yield, bool flush, + bool zap_private); bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp); void kvm_tdp_mmu_zap_all(struct kvm *kvm); void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm); @@ -30,7 +31,7 @@ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm); int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault); =20 bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *ra= nge, - bool flush); + bool flush, bool zap_private); bool kvm_tdp_mmu_age_gfn_range(struct kvm *kvm, struct kvm_gfn_range *rang= e); bool kvm_tdp_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range= ); bool kvm_tdp_mmu_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range= ); --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0BEF0C64ED6 for ; Mon, 27 Feb 2023 08:26:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230339AbjB0I0e (ORCPT ); Mon, 27 Feb 2023 03:26:34 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56468 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231203AbjB0IZn (ORCPT ); Mon, 27 Feb 2023 03:25:43 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D23081DB90; Mon, 27 Feb 2023 00:24:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486263; x=1709022263; 
h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=pdgoFQm3JRuaMw47/Kh+D9Xlv4eahf93UcQ7oLnvQ+k=; b=aaLsea3IAcl381HtOr3qt3li5+kuf7lIL3PKc/JCmJcidqUNZFfvjikw MF0tIdEOpTrZ1sSdT7PdA+ITFU8RertCSTRb63vjUzpMagWAVrKkrfv/g eHW/Ux6c8jooX+g/0j62wYSNhbqNIr8NFQdAu01vz7TgHkzYj+CSmXoyc lPY57OQhQ76OgB+4U1RkD/hmNk5tSDVSPEN4+VwdMLk6gx4KFV3KqTlZP WNtuz3o5rAnxmewQ1UPoJ2/hmbo/NgAbb1juTh/QYrq2mmRRyI/eR+LR5 bLfbCW7aA9lvVR7Iee8CKmDvWEyPbK/+blPr6MOQgXUsIVf/w2uayHkVY Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608823" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608823" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:08 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242183" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242183" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:07 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 039/106] KVM: x86/tdp_mmu: Sprinkle __must_check Date: Mon, 27 Feb 2023 00:22:38 -0800 Message-Id: <40c4f479dfab6092dbd0054dea05ec18f719dd40.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata TDP MMU allows tdp_mmu_set_spte_atomic() and tdp_mmu_zap_spte_atomic() to return -EBUSY or -EAGAIN error. The caller must check the return value and retry. Sprinkle __must_check to guarantee it. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu/tdp_mmu.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index e17adceec426..03a16f8ee8c7 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -628,9 +628,9 @@ static void handle_changed_spte(struct kvm *kvm, int as= _id, gfn_t gfn, * no side-effects other than setting iter->old_spte to the last * known value of the spte. 
*/ -static inline int tdp_mmu_set_spte_atomic(struct kvm *kvm, - struct tdp_iter *iter, - u64 new_spte) +static inline int __must_check tdp_mmu_set_spte_atomic(struct kvm *kvm, + struct tdp_iter *iter, + u64 new_spte) { u64 *sptep =3D rcu_dereference(iter->sptep); =20 @@ -658,8 +658,8 @@ static inline int tdp_mmu_set_spte_atomic(struct kvm *k= vm, return 0; } =20 -static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm, - struct tdp_iter *iter) +static inline int __must_check tdp_mmu_zap_spte_atomic(struct kvm *kvm, + struct tdp_iter *iter) { int ret; =20 --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5CFEAC64ED8 for ; Mon, 27 Feb 2023 08:26:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231360AbjB0I0i (ORCPT ); Mon, 27 Feb 2023 03:26:38 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55120 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231208AbjB0IZp (ORCPT ); Mon, 27 Feb 2023 03:25:45 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7728B1CAF7; Mon, 27 Feb 2023 00:24:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486264; x=1709022264; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=I26buyunuEQ1vsnvR89PZJpLRJDoztid9o5R0qFYcW8=; b=DOLX+5GAPCY0i1FLXuTYfN01YIODIC2fbD1BY1DPVBjQd53aBjdNHj3q utDdDLpC3VRZ5TZU58aXuiJbhETDuhZ2KZzV/kJ7RwAaLmOa3MM3TuM9W XMCMw/bHw5D7l5FIgb05Xa3cCl9zMMgH7PBxhN/NuUDeeYI9NPc4rXF9F JJyJVWkBHy/ql4qPAA52guk/veMHeuX+O+1Fd5mJ0mPEBQK9Us4di2twY 0mq6sBXsgIw2/YE/ZIFgsEZHQaIXxOOY+h32rmvjEMjdto78iZLiqWOiV JBUK4SSKN6x4OwqaQ2wsAuSsZ7uJF4bo8975mJL4ynfk+uvRMtCAd97dT Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608835" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608835" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:08 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242187" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242187" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:08 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 040/106] KVM: x86/tdp_mmu: Support TDX private mapping for TDP MMU Date: Mon, 27 Feb 2023 00:22:39 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Allocate protected page table for private page table, and add hooks to operate on protected page table. This patch adds allocation/free of protected page tables and hooks. 
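The hooks are plumbed as optional kvm_x86_ops callbacks (see the kvm-x86-ops.h and kvm_host.h hunks below). As a rough, self-contained sketch of that dispatch shape (hypothetical types and names, user-space only; the kernel dispatches through static_call() and KVM_X86_OP_OPTIONAL(), not bare function pointers):

#include <stdio.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Mirrors the shape of the new kvm_x86_ops members: per-GFN, per-level
 * callbacks into whatever owns the protected page table. */
struct protected_pt_ops {
	int (*link_private_spt)(gfn_t gfn, int level, void *private_spt);
	int (*zap_private_spte)(gfn_t gfn, int level);
};

static int tdx_link_private_spt(gfn_t gfn, int level, void *private_spt)
{
	printf("link: gfn=%llu level=%d spt=%p\n",
	       (unsigned long long)gfn, level, private_spt);
	return 0;
}

/* Optional hook: call it only when the backend provides one, which is
 * what KVM_X86_OP_OPTIONAL() arranges for in the kernel. */
static int link_private_spt(const struct protected_pt_ops *ops,
			    gfn_t gfn, int level, void *spt)
{
	if (!ops->link_private_spt)
		return 0;
	return ops->link_private_spt(gfn, level, spt);
}

int main(void)
{
	struct protected_pt_ops ops = {
		.link_private_spt = tdx_link_private_spt,
	};
	uint64_t page_table[512] = { 0 };	/* the "protected" PT page */

	return link_private_spt(&ops, 42, 1, page_table);
}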
When calling hooks to update an SPTE entry, freeze the entry, call the
hooks, and unfreeze the entry to allow concurrent updates on page
tables, which is the advantage of the TDP MMU. Because
kvm_gfn_shared_mask() always returns false, those hooks aren't called
yet with this patch.

When the faulting GPA is private, the KVM page fault is considered
private. When resolving a private KVM page fault, allocate a protected
page table and call the hooks to operate on the protected page table.
On a change of a private PTE entry, invoke the kvm_x86_ops hook in
__handle_changed_spte() to propagate the change to the protected page
table. The following depicts the relationship.

             private KVM page fault   |
                     |                |
                     V                |
                private GPA           |     CPU protected EPTP
                     |                |           |
                     V                |           V
             private PT root          |    protected PT root
                     |                |           |
                     V                |           V
                private PT --hook to propagate--> protected PT
                     |                |           |
                     \----------------+-----------\
                                      |           |
                                      |           V
                                      |   private guest page
                                      |
    non-encrypted memory              |     encrypted memory
                                      |

PT: page table

The existing KVM TDP MMU code updates SPTEs atomically. When populating
an EPT entry, the entry is set atomically. Zapping an SPTE, however,
requires a TLB shootdown. To handle that, the entry is frozen with a
special SPTE value that clears the present bit; after the TLB shootdown,
the entry is set to its eventual value (unfrozen).

For the protected page table, hooks are called to update the protected
page table in addition to the direct access to the private SPTE. For
the zapping case, freezing works as-is: the SPTE is frozen and the
hooks can be called in addition to the TLB shootdown. For populating a
private SPTE entry, there can be a race condition without further
protection:

vcpu 1: populating 2M private SPTE
vcpu 2: populating 4K private SPTE
vcpu 2: TDX SEAMCALL to update 4K protected SPTE => error
vcpu 1: TDX SEAMCALL to update 2M protected SPTE

To avoid the race, the frozen SPTE is utilized. Instead of updating the
private entry atomically, freeze the entry, call the hook that updates
the protected SPTE, then set the entry to the final value.

Only 4K pages are supported at this stage. 2M page support can be done
in future patches.

Co-developed-by: Kai Huang
Signed-off-by: Kai Huang
Signed-off-by: Isaku Yamahata
---
Changes v11 -> v12
- Split tdp mmu hooks at tdp_mmu_set_spte_atomic() for populating and
  __handle_changed_spte() for zapping.
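To make the freeze protocol concrete, here is a minimal, self-contained
user-space sketch of the idea (illustrative only: FROZEN_SPTE and
set_spte_frozen() are hypothetical names; the kernel uses
try_cmpxchg64() with REMOVED_SPTE as the frozen value, and the
"protected copy" update is really a SEAMCALL issued through the new
kvm_x86_ops hooks):

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <errno.h>

#define FROZEN_SPTE ((uint64_t)-2)	/* stand-in for REMOVED_SPTE */

static _Atomic uint64_t spte;		/* the KVM-visible entry */
static uint64_t protected_spte;		/* stand-in for the Secure-EPT copy */

/* Freeze the entry, mirror the change, then publish the final value. */
static int set_spte_frozen(uint64_t old, uint64_t new_val)
{
	/* Step 1: freeze. Concurrent updaters now observe FROZEN_SPTE
	 * and back off, so the mirror update below cannot interleave
	 * with another vcpu's update. */
	if (!atomic_compare_exchange_strong(&spte, &old, FROZEN_SPTE))
		return -EBUSY;		/* lost the race; caller retries */

	/* Step 2: propagate to the protected copy (the hook/SEAMCALL). */
	protected_spte = new_val;

	/* Step 3: unfreeze by publishing the final value. */
	atomic_store(&spte, new_val);
	return 0;
}

int main(void)
{
	atomic_store(&spte, 0);
	printf("update: %d spte=%llu\n", set_spte_frozen(0, 42),
	       (unsigned long long)atomic_load(&spte));
	/* A second update based on a stale old value fails with -EBUSY,
	 * exactly like the vcpu1/vcpu2 race above would. */
	printf("stale:  %d\n", set_spte_frozen(0, 7));
	return 0;
}

In the TDP MMU, callers of tdp_mmu_set_spte_atomic() handle -EBUSY by
re-reading the entry and retrying the fault, per the __must_check
change in the previous patch.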
--- arch/x86/include/asm/kvm-x86-ops.h | 5 + arch/x86/include/asm/kvm_host.h | 11 ++ arch/x86/kvm/mmu/mmu.c | 13 +- arch/x86/kvm/mmu/mmu_internal.h | 21 +- arch/x86/kvm/mmu/tdp_iter.h | 2 +- arch/x86/kvm/mmu/tdp_mmu.c | 300 +++++++++++++++++++++++++---- arch/x86/kvm/mmu/tdp_mmu.h | 2 +- virt/kvm/kvm_main.c | 1 + 8 files changed, 313 insertions(+), 42 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index d29e16098c30..2681300ce142 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -96,6 +96,11 @@ KVM_X86_OP_OPTIONAL_RET0(set_tss_addr) KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr) KVM_X86_OP_OPTIONAL_RET0(get_mt_mask) KVM_X86_OP(load_mmu_pgd) +KVM_X86_OP_OPTIONAL(link_private_spt) +KVM_X86_OP_OPTIONAL(free_private_spt) +KVM_X86_OP_OPTIONAL(set_private_spte) +KVM_X86_OP_OPTIONAL(remove_private_spte) +KVM_X86_OP_OPTIONAL(zap_private_spte) KVM_X86_OP(has_wbinvd_exit) KVM_X86_OP(get_l2_tsc_offset) KVM_X86_OP(get_l2_tsc_multiplier) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 0f25cd0a0f02..39c28383c2d6 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -470,6 +470,7 @@ struct kvm_mmu { struct kvm_mmu_page *sp); void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa); struct kvm_mmu_root_info root; + hpa_t private_root_hpa; union kvm_cpu_role cpu_role; union kvm_mmu_page_role root_role; =20 @@ -1690,6 +1691,16 @@ struct kvm_x86_ops { void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level); =20 + int (*link_private_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level, + void *private_spt); + int (*free_private_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level, + void *private_spt); + int (*set_private_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level, + kvm_pfn_t pfn); + int (*remove_private_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level leve= l, + kvm_pfn_t pfn); + int (*zap_private_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level); + bool (*has_wbinvd_exit)(void); =20 u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index d050495f834a..621ac8ea54d6 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3681,7 +3681,12 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *v= cpu) goto out_unlock; =20 if (tdp_mmu_enabled) { - root =3D kvm_tdp_mmu_get_vcpu_root_hpa(vcpu); + if (kvm_gfn_shared_mask(vcpu->kvm) && + !VALID_PAGE(mmu->private_root_hpa)) { + root =3D kvm_tdp_mmu_get_vcpu_root_hpa(vcpu, true); + mmu->private_root_hpa =3D root; + } + root =3D kvm_tdp_mmu_get_vcpu_root_hpa(vcpu, false); mmu->root.hpa =3D root; } else if (shadow_root_level >=3D PT64_ROOT_4LEVEL) { root =3D mmu_alloc_root(vcpu, 0, 0, shadow_root_level); @@ -5981,6 +5986,7 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, st= ruct kvm_mmu *mmu) =20 mmu->root.hpa =3D INVALID_PAGE; mmu->root.pgd =3D 0; + mmu->private_root_hpa =3D INVALID_PAGE; for (i =3D 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) mmu->prev_roots[i] =3D KVM_MMU_ROOT_INFO_INVALID; =20 @@ -6207,7 +6213,7 @@ static void kvm_mmu_zap_memslot(struct kvm *kvm, stru= ct kvm_memory_slot *slot) }; =20 /* - * this handles both private gfn and shared gfn. + * This handles both private gfn and shared gfn. * All private page should be zapped on memslot deletion. 
*/ flush =3D kvm_tdp_mmu_unmap_gfn_range(kvm, &range, flush, true); @@ -7027,6 +7033,9 @@ int kvm_mmu_vendor_module_init(void) void kvm_mmu_destroy(struct kvm_vcpu *vcpu) { kvm_mmu_unload(vcpu); + if (tdp_mmu_enabled) + mmu_free_root_page(vcpu->kvm, &vcpu->arch.mmu->private_root_hpa, + NULL); free_mmu_pages(&vcpu->arch.root_mmu); free_mmu_pages(&vcpu->arch.guest_mmu); mmu_free_memory_caches(vcpu); diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_interna= l.h index e1168a01af64..917f7066527b 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -6,6 +6,8 @@ #include #include =20 +#include "mmu.h" + #undef MMU_DEBUG =20 #ifdef MMU_DEBUG @@ -204,6 +206,15 @@ static inline void kvm_mmu_free_private_spt(struct kvm= _mmu_page *sp) if (sp->private_spt) free_page((unsigned long)sp->private_spt); } + +static inline gfn_t kvm_gfn_for_root(struct kvm *kvm, struct kvm_mmu_page = *root, + gfn_t gfn) +{ + if (is_private_sp(root)) + return kvm_gfn_private(kvm, gfn); + else + return kvm_gfn_shared(kvm, gfn); +} #else static inline void *kvm_mmu_private_spt(struct kvm_mmu_page *sp) { @@ -221,6 +232,12 @@ static inline void kvm_mmu_alloc_private_spt(struct kv= m_vcpu *vcpu, struct kvm_m static inline void kvm_mmu_free_private_spt(struct kvm_mmu_page *sp) { } + +static inline gfn_t kvm_gfn_for_root(struct kvm *kvm, struct kvm_mmu_page = *root, + gfn_t gfn) +{ + return gfn; +} #endif =20 static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page = *sp) @@ -360,12 +377,12 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vc= pu *vcpu, gpa_t cr2_or_gpa, .max_level =3D vcpu->kvm->arch.tdp_max_page_level, .req_level =3D PG_LEVEL_4K, .goal_level =3D PG_LEVEL_4K, - .is_private =3D kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT), + .is_private =3D kvm_is_private_gpa(vcpu->kvm, cr2_or_gpa), }; int r; =20 if (vcpu->arch.mmu->root_role.direct) { - fault.gfn =3D fault.addr >> PAGE_SHIFT; + fault.gfn =3D gpa_to_gfn(fault.addr) & ~kvm_gfn_shared_mask(vcpu->kvm); fault.slot =3D kvm_vcpu_gfn_to_memslot(vcpu, fault.gfn); } =20 diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h index 9e56a5b1024c..eab62baf8549 100644 --- a/arch/x86/kvm/mmu/tdp_iter.h +++ b/arch/x86/kvm/mmu/tdp_iter.h @@ -71,7 +71,7 @@ struct tdp_iter { tdp_ptep_t pt_path[PT64_ROOT_MAX_LEVEL]; /* A pointer to the current SPTE */ tdp_ptep_t sptep; - /* The lowest GFN mapped by the current SPTE */ + /* The lowest GFN (shared bits included) mapped by the current SPTE */ gfn_t gfn; /* The level of the root page given to the iterator */ int root_level; diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 03a16f8ee8c7..106e858ee39a 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -270,6 +270,9 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm= _vcpu *vcpu, sp->spt =3D kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache); sp->role =3D role; =20 + if (kvm_mmu_page_role_is_private(role)) + kvm_mmu_alloc_private_spt(vcpu, sp); + return sp; } =20 @@ -292,7 +295,8 @@ static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, td= p_ptep_t sptep, trace_kvm_mmu_get_page(sp, true); } =20 -hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu) +static struct kvm_mmu_page *kvm_tdp_mmu_get_vcpu_root(struct kvm_vcpu *vcp= u, + bool private) { union kvm_mmu_page_role role =3D vcpu->arch.mmu->root_role; struct kvm *kvm =3D vcpu->kvm; @@ -304,6 +308,8 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vc= pu) * Check 
for an existing root before allocating a new one. Note, the * role check prevents consuming an invalid root. */ + if (private) + kvm_mmu_page_role_set_private(&role); for_each_tdp_mmu_root(kvm, root, kvm_mmu_role_as_id(role)) { if (root->role.word =3D=3D role.word && kvm_tdp_mmu_get_root(root)) @@ -320,11 +326,17 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *= vcpu) spin_unlock(&kvm->arch.tdp_mmu_pages_lock); =20 out: - return __pa(root->spt); + return root; +} + +hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu, bool private) +{ + return __pa(kvm_tdp_mmu_get_vcpu_root(vcpu, private)->spt); } =20 static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, - u64 old_spte, u64 new_spte, int level, + u64 old_spte, u64 new_spte, + union kvm_mmu_page_role role, bool shared); =20 static void handle_changed_spte_acc_track(u64 old_spte, u64 new_spte, int = level) @@ -351,6 +363,8 @@ static void handle_changed_spte_dirty_log(struct kvm *k= vm, int as_id, gfn_t gfn, =20 if ((!is_writable_pte(old_spte) || pfn_changed) && is_writable_pte(new_spte)) { + /* For memory slot operations, use GFN without aliasing */ + gfn =3D gfn & ~kvm_gfn_shared_mask(kvm); slot =3D __gfn_to_memslot(__kvm_memslots(kvm, as_id), gfn); mark_page_dirty_in_slot(kvm, slot, gfn); } @@ -491,12 +505,78 @@ static void handle_removed_pt(struct kvm *kvm, tdp_pt= ep_t pt, bool shared) REMOVED_SPTE, level); } handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn, - old_spte, REMOVED_SPTE, level, shared); + old_spte, REMOVED_SPTE, sp->role, + shared); + } + + if (is_private_sp(sp) && + WARN_ON(static_call(kvm_x86_free_private_spt)(kvm, sp->gfn, sp->role.= level, + kvm_mmu_private_spt(sp)))) { + /* + * Failed to unlink Secure EPT page and there is nothing to do + * further. Intentionally leak the page to prevent the kernel + * from accessing the encrypted page. + */ + kvm_mmu_init_private_spt(sp, NULL); } =20 call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback); } =20 +static void *get_private_spt(gfn_t gfn, u64 new_spte, int level) +{ + if (is_shadow_present_pte(new_spte) && !is_last_spte(new_spte, level)) { + struct kvm_mmu_page *sp =3D to_shadow_page(pfn_to_hpa(spte_to_pfn(new_sp= te))); + void *private_spt =3D kvm_mmu_private_spt(sp); + + WARN_ON_ONCE(!private_spt); + WARN_ON_ONCE(sp->role.level + 1 !=3D level); + WARN_ON_ONCE(sp->gfn !=3D gfn); + return private_spt; + } + + return NULL; +} + +static void handle_removed_private_spte(struct kvm *kvm, gfn_t gfn, + u64 old_spte, u64 new_spte, + int level) +{ + bool was_present =3D is_shadow_present_pte(old_spte); + bool is_present =3D is_shadow_present_pte(new_spte); + bool was_leaf =3D was_present && is_last_spte(old_spte, level); + bool is_leaf =3D is_present && is_last_spte(new_spte, level); + kvm_pfn_t old_pfn =3D spte_to_pfn(old_spte); + kvm_pfn_t new_pfn =3D spte_to_pfn(new_spte); + int ret; + + /* Ignore change of software only bits. e.g. host_writable */ + if (was_leaf =3D=3D is_leaf && was_present =3D=3D is_present) + return; + + /* + * Allow only leaf page to be zapped. Reclaim Non-leaf page tables at + * destroying VM. + */ + WARN_ON_ONCE(is_present); + if (!was_leaf) + return; + + /* non-present -> non-present doesn't make sense. */ + KVM_BUG_ON(!was_present, kvm); + KVM_BUG_ON(new_pfn, kvm); + + /* Zapping leaf spte is allowed only when write lock is held. */ + lockdep_assert_held_write(&kvm->mmu_lock); + ret =3D static_call(kvm_x86_zap_private_spte)(kvm, gfn, level); + /* Because write lock is held, operation should success. 
*/ + if (KVM_BUG_ON(ret, kvm)) + return; + + ret =3D static_call(kvm_x86_remove_private_spte)(kvm, gfn, level, old_pfn= ); + KVM_BUG_ON(ret, kvm); +} + /** * __handle_changed_spte - handle bookkeeping associated with an SPTE chan= ge * @kvm: kvm instance @@ -504,7 +584,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep= _t pt, bool shared) * @gfn: the base GFN that was mapped by the SPTE * @old_spte: The value of the SPTE before the change * @new_spte: The value of the SPTE after the change - * @level: the level of the PT the SPTE is part of in the paging structure + * @role: the role of the PT the SPTE is part of in the paging structure * @shared: This operation may not be running under the exclusive use of * the MMU lock and the operation must synchronize with other * threads that might be modifying SPTEs. @@ -513,14 +593,18 @@ static void handle_removed_pt(struct kvm *kvm, tdp_pt= ep_t pt, bool shared) * This function must be called for all TDP SPTE modifications. */ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, - u64 old_spte, u64 new_spte, int level, - bool shared) + u64 old_spte, u64 new_spte, + union kvm_mmu_page_role role, bool shared) { + bool is_private =3D kvm_mmu_page_role_is_private(role); + int level =3D role.level; bool was_present =3D is_shadow_present_pte(old_spte); bool is_present =3D is_shadow_present_pte(new_spte); bool was_leaf =3D was_present && is_last_spte(old_spte, level); bool is_leaf =3D is_present && is_last_spte(new_spte, level); - bool pfn_changed =3D spte_to_pfn(old_spte) !=3D spte_to_pfn(new_spte); + kvm_pfn_t old_pfn =3D spte_to_pfn(old_spte); + kvm_pfn_t new_pfn =3D spte_to_pfn(new_spte); + bool pfn_changed =3D old_pfn !=3D new_pfn; =20 WARN_ON(level > PT64_ROOT_MAX_LEVEL); WARN_ON(level < PG_LEVEL_4K); @@ -587,7 +671,7 @@ static void __handle_changed_spte(struct kvm *kvm, int = as_id, gfn_t gfn, =20 if (was_leaf && is_dirty_spte(old_spte) && (!is_present || !is_dirty_spte(new_spte) || pfn_changed)) - kvm_set_pfn_dirty(spte_to_pfn(old_spte)); + kvm_set_pfn_dirty(old_pfn); =20 /* * Recursively handle child PTs if the change removed a subtree from @@ -596,19 +680,88 @@ static void __handle_changed_spte(struct kvm *kvm, in= t as_id, gfn_t gfn, * pages are kernel allocations and should never be migrated. */ if (was_present && !was_leaf && - (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed))) + (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed))) { + KVM_BUG_ON(is_private !=3D is_private_sptep(spte_to_child_pt(old_spte, l= evel)), + kvm); handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared); + } + + /* + * Secure-EPT requires to remove Secure-EPT tables after removing + * children. hooks after after handling lower page table by above + * handle_remove_pt(). 
+ */ + if (is_private && !is_removed_spte(new_spte) && !is_present) + handle_removed_private_spte(kvm, gfn, old_spte, new_spte, role.level); } =20 static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, - u64 old_spte, u64 new_spte, int level, - bool shared) + u64 old_spte, u64 new_spte, + union kvm_mmu_page_role role, + bool shared) { - __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, - shared); - handle_changed_spte_acc_track(old_spte, new_spte, level); + __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, role, shared); + + handle_changed_spte_acc_track(old_spte, new_spte, role.level); handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte, - new_spte, level); + new_spte, role.level); +} + +static int __must_check __set_private_spte_present(struct kvm *kvm, tdp_pt= ep_t sptep, + gfn_t gfn, u64 old_spte, + u64 new_spte, int level) +{ + bool was_present =3D is_shadow_present_pte(old_spte); + bool is_present =3D is_shadow_present_pte(new_spte); + bool is_leaf =3D is_present && is_last_spte(new_spte, level); + kvm_pfn_t new_pfn =3D spte_to_pfn(new_spte); + int ret =3D 0; + + lockdep_assert_held(&kvm->mmu_lock); + /* TDP MMU doesn't change present -> present */ + KVM_BUG_ON(was_present, kvm); + + /* + * Use different call to either set up middle level + * private page table, or leaf. + */ + if (is_leaf) + ret =3D static_call(kvm_x86_set_private_spte)(kvm, gfn, level, new_pfn); + else { + void *private_spt =3D get_private_spt(gfn, new_spte, level); + + KVM_BUG_ON(!private_spt, kvm); + ret =3D static_call(kvm_x86_link_private_spt)(kvm, gfn, level, private_s= pt); + } + + return ret; +} + +static int __must_check set_private_spte_present(struct kvm *kvm, tdp_ptep= _t sptep, + gfn_t gfn, u64 old_spte, + u64 new_spte, int level) +{ + int ret; + + /* + * For private page table, callbacks are needed to propagate SPTE + * change into the protected page table. In order to atomically update + * both the SPTE and the protected page tables with callbacks, utilize + * freezing SPTE. + * - Freeze the SPTE. Set entry to REMOVED_SPTE. + * - Trigger callbacks for protected page tables. + * - Unfreeze the SPTE. Set the entry to new_spte. + */ + lockdep_assert_held(&kvm->mmu_lock); + if (!try_cmpxchg64(sptep, &old_spte, REMOVED_SPTE)) + return -EBUSY; + + ret =3D __set_private_spte_present(kvm, sptep, gfn, old_spte, new_spte, l= evel); + if (ret) + __kvm_tdp_mmu_write_spte(sptep, old_spte); + else + __kvm_tdp_mmu_write_spte(sptep, new_spte); + return ret; } =20 /* @@ -633,6 +786,7 @@ static inline int __must_check tdp_mmu_set_spte_atomic(= struct kvm *kvm, u64 new_spte) { u64 *sptep =3D rcu_dereference(iter->sptep); + bool freezed =3D false; =20 /* * The caller is responsible for ensuring the old SPTE is not a REMOVED @@ -644,17 +798,33 @@ static inline int __must_check tdp_mmu_set_spte_atomi= c(struct kvm *kvm, =20 lockdep_assert_held_read(&kvm->mmu_lock); =20 - /* - * Note, fast_pf_fix_direct_spte() can also modify TDP MMU SPTEs and - * does not hold the mmu_lock. 
- */ - if (!try_cmpxchg64(sptep, &iter->old_spte, new_spte)) - return -EBUSY; + if (is_private_sptep(iter->sptep) && !is_removed_spte(new_spte)) { + int ret; + + if (is_shadow_present_pte(new_spte)) { + ret =3D set_private_spte_present(kvm, iter->sptep, iter->gfn, + iter->old_spte, new_spte, iter->level); + if (ret) + return ret; + } else { + if (!try_cmpxchg64(sptep, &iter->old_spte, REMOVED_SPTE)) + return -EBUSY; + freezed =3D true; + } + } else { + /* + * Note, fast_pf_fix_direct_spte() can also modify TDP MMU SPTEs + * and does not hold the mmu_lock. + */ + if (!try_cmpxchg64(sptep, &iter->old_spte, new_spte)) + return -EBUSY; + } =20 __handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte, - new_spte, iter->level, true); + new_spte, sptep_to_sp(sptep)->role, true); handle_changed_spte_acc_track(iter->old_spte, new_spte, iter->level); - + if (freezed) + __kvm_tdp_mmu_write_spte(sptep, new_spte); return 0; } =20 @@ -716,6 +886,8 @@ static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_i= d, tdp_ptep_t sptep, u64 old_spte, u64 new_spte, gfn_t gfn, int level, bool record_acc_track, bool record_dirty_log) { + union kvm_mmu_page_role role; + lockdep_assert_held_write(&kvm->mmu_lock); =20 /* @@ -728,8 +900,17 @@ static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_= id, tdp_ptep_t sptep, WARN_ON(is_removed_spte(old_spte) || is_removed_spte(new_spte)); =20 old_spte =3D kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level); + if (is_private_sptep(sptep) && !is_removed_spte(new_spte) && + is_shadow_present_pte(new_spte)) { + lockdep_assert_held_write(&kvm->mmu_lock); + /* Because write spin lock is held, no race. It should success. */ + KVM_BUG_ON(__set_private_spte_present(kvm, sptep, gfn, old_spte, + new_spte, level), kvm); + } =20 - __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false); + role =3D sptep_to_sp(sptep)->role; + role.level =3D level; + __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, role, false); =20 if (record_acc_track) handle_changed_spte_acc_track(old_spte, new_spte, level); @@ -781,8 +962,11 @@ static inline void tdp_mmu_set_spte_no_dirty_log(struc= t kvm *kvm, continue; \ else =20 -#define tdp_mmu_for_each_pte(_iter, _mmu, _start, _end) \ - for_each_tdp_pte(_iter, to_shadow_page(_mmu->root.hpa), _start, _end) +#define tdp_mmu_for_each_pte(_iter, _mmu, _private, _start, _end) \ + for_each_tdp_pte(_iter, \ + to_shadow_page((_private) ? _mmu->private_root_hpa : \ + _mmu->root.hpa), \ + _start, _end) =20 /* * Yield if the MMU lock is contended or this thread needs to return contr= ol @@ -945,6 +1129,14 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct= kvm_mmu_page *root, if (!zap_private && is_private_sp(root)) return false; =20 + /* + * start and end doesn't have GFN shared bit. This function zaps + * a region including alias. Adjust shared bit of [start, end) if the + * root is shared. + */ + start =3D kvm_gfn_for_root(kvm, root, start); + end =3D kvm_gfn_for_root(kvm, root, end); + rcu_read_lock(); =20 for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end) { @@ -1075,10 +1267,19 @@ static int tdp_mmu_map_handle_target_level(struct k= vm_vcpu *vcpu, =20 if (unlikely(!fault->slot)) new_spte =3D make_mmio_spte(vcpu, iter->gfn, ACC_ALL); - else - wrprot =3D make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn, - fault->pfn, iter->old_spte, fault->prefetch, true, - fault->map_writable, &new_spte); + else { + unsigned long pte_access =3D ACC_ALL; + + /* TDX shared GPAs are no executable, enforce this for the SDV. 
*/ + if (kvm_gfn_shared_mask(vcpu->kvm) && !fault->is_private) + pte_access &=3D ~ACC_EXEC_MASK; + + wrprot =3D make_spte(vcpu, sp, fault->slot, pte_access, + gpa_to_gfn(fault->addr)/* include shared bit */, + fault->pfn, iter->old_spte, + fault->prefetch, true, fault->map_writable, + &new_spte); + } =20 if (new_spte =3D=3D iter->old_spte) ret =3D RET_PF_SPURIOUS; @@ -1157,6 +1358,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm= _page_fault *fault) struct kvm *kvm =3D vcpu->kvm; struct tdp_iter iter; struct kvm_mmu_page *sp; + gfn_t raw_gfn; + bool is_private =3D fault->is_private; int ret =3D RET_PF_RETRY; =20 kvm_mmu_hugepage_adjust(vcpu, fault); @@ -1165,7 +1368,17 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kv= m_page_fault *fault) =20 rcu_read_lock(); =20 - tdp_mmu_for_each_pte(iter, mmu, fault->gfn, fault->gfn + 1) { + raw_gfn =3D gpa_to_gfn(fault->addr); + + if (is_error_noslot_pfn(fault->pfn) || + !kvm_pfn_to_refcounted_page(fault->pfn)) { + if (is_private) { + rcu_read_unlock(); + return -EFAULT; + } + } + + tdp_mmu_for_each_pte(iter, mmu, is_private, raw_gfn, raw_gfn + 1) { int r; =20 if (fault->nx_huge_page_workaround_enabled) @@ -1195,9 +1408,14 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kv= m_page_fault *fault) =20 sp->nx_huge_page_disallowed =3D fault->huge_page_disallowed; =20 - if (is_shadow_present_pte(iter.old_spte)) + if (is_shadow_present_pte(iter.old_spte)) { + /* + * TODO: large page support. + * Doesn't support large page for TDX now + */ + KVM_BUG_ON(is_private_sptep(iter.sptep), vcpu->kvm); r =3D tdp_mmu_split_huge_page(kvm, &iter, sp, true); - else + } else r =3D tdp_mmu_link_sp(kvm, &iter, sp, true); =20 /* @@ -1434,6 +1652,8 @@ static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_sp= lit(gfp_t gfp, union kvm_mm =20 sp->role =3D role; sp->spt =3D (void *)__get_free_page(gfp); + /* TODO: large page support for private GPA. */ + WARN_ON_ONCE(kvm_mmu_page_role_is_private(role)); if (!sp->spt) { kmem_cache_free(mmu_page_header_cache, sp); return NULL; @@ -1449,6 +1669,11 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_spl= it(struct kvm *kvm, union kvm_mmu_page_role role =3D tdp_iter_child_role(iter); struct kvm_mmu_page *sp; =20 + KVM_BUG_ON(kvm_mmu_page_role_is_private(role) !=3D + is_private_sptep(iter->sptep), kvm); + /* TODO: Large page isn't supported for private SPTE yet. */ + KVM_BUG_ON(kvm_mmu_page_role_is_private(role), kvm); + /* * Since we are allocating while under the MMU lock we have to be * careful about GFP flags. Use GFP_NOWAIT to avoid blocking on direct @@ -1877,7 +2102,7 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 a= ddr, u64 *sptes, =20 *root_level =3D vcpu->arch.mmu->root_role.level; =20 - tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) { + tdp_mmu_for_each_pte(iter, mmu, false, gfn, gfn + 1) { leaf =3D iter.level; sptes[leaf] =3D iter.old_spte; } @@ -1904,7 +2129,10 @@ u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_v= cpu *vcpu, u64 addr, gfn_t gfn =3D addr >> PAGE_SHIFT; tdp_ptep_t sptep =3D NULL; =20 - tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) { + /* fast page fault for private GPA isn't supported. 
*/ + WARN_ON_ONCE(kvm_is_private_gpa(vcpu->kvm, addr)); + + tdp_mmu_for_each_pte(iter, mmu, false, gfn, gfn + 1) { *spte =3D iter.old_spte; sptep =3D iter.sptep; } diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index b32cbdf2f675..3ae3c3b8642a 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -10,7 +10,7 @@ int kvm_mmu_init_tdp_mmu(struct kvm *kvm); void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm); =20 -hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu); +hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu, bool private); =20 __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *= root) { diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index d6db3f19ad74..42f01d0d6a49 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -206,6 +206,7 @@ struct page *kvm_pfn_to_refcounted_page(kvm_pfn_t pfn) =20 return NULL; } +EXPORT_SYMBOL_GPL(kvm_pfn_to_refcounted_page); =20 /* * Switches to specified vcpu, until a matching vcpu_put() --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 984A7C64ED8 for ; Mon, 27 Feb 2023 08:27:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230471AbjB0I07 (ORCPT ); Mon, 27 Feb 2023 03:26:59 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57828 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231250AbjB0I0B (ORCPT ); Mon, 27 Feb 2023 03:26:01 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B5BE51DBBD; Mon, 27 Feb 2023 00:24:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486273; x=1709022273; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=r1B2F+HLEioNnoLhCM1zZNobB3Qg1+lQsIfJnAC3q9A=; b=f5aUgPB0aYtkFluU88i08tB8s1NYPJL2K6bAhbKX+0oszxrVo/l52xij vBbFDrDWN2CB2yxsxRGNXoGTnxjGPURhpTcyDEdqockbWmbBv5bx/yXSh 6sKwavX1obnW1uIbOMvDNYM2Wx9kxFXxZ82tnjKgTySZpJwN5yhUfAGuS p9qc2FlXhm27sV+ltX9of9tzVcF3HhrpxWZJoi0bs6B4Q/CRYGOU0PSmf lZu+DkSrYWFZx7t7/ez23WHFfimLdb3IqpUiVj4dmGFA4CnNTrqWigy6a FKoXihgb5lHi6uR6YGVRZBLrV/fnDak4EWT6+JmccYJkBt8loswFdgFdc g==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608845" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608845" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:10 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242191" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242191" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:08 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 041/106] [MARKER] The start of TDX KVM patch series: TDX EPT violation Date: Mon, 27 Feb 2023 00:22:40 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: 
MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This empty commit is to mark the start of patch series of TDX EPT violation. Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/intel-tdx-layer-status.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentat= ion/virt/kvm/intel-tdx-layer-status.rst index f4aba85148e3..9b3ab0363184 100644 --- a/Documentation/virt/kvm/intel-tdx-layer-status.rst +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -19,11 +19,11 @@ Patch Layer status * TDX architectural definitions: Applied * TD VM creation/destruction: Applied * TD vcpu creation/destruction: Applied -* TDX EPT violation: Not yet +* TDX EPT violation: Applying * TD finalization: Not yet * TD vcpu enter/exit: Not yet * TD vcpu interrupts/exit/hypercall: Not yet =20 * KVM MMU GPA shared bits: Applied * KVM TDP refactoring for TDX: Applied -* KVM TDP MMU hooks: Applying +* KVM TDP MMU hooks: Applied --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9915AC64ED8 for ; Mon, 27 Feb 2023 08:26:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231371AbjB0I0l (ORCPT ); Mon, 27 Feb 2023 03:26:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55258 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231234AbjB0IZ7 (ORCPT ); Mon, 27 Feb 2023 03:25:59 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D6F601CF45; Mon, 27 Feb 2023 00:24:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486268; x=1709022268; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=8S92uRGEDTJGnSHCFrEvDnFy25uhgD3zt+1Ujpc9p0Q=; b=OEi709ICY85xc26WC5b9NDGfpV4HQlAQaaO2mvqiWgNgxJzlBQMfKWKR HBzzfK2tcJkaXuvyVjVD24YI7Gih7NMolnGMNklXeDrrZQuGUA2J7YCBf Oe3gvO/9Ol7M0TPL4ZyTIti2OazpVeEdQTEMlyed004rMWpnYR1NeHr7k Cq+2GYtT85FRtcUNufs0TCDXnzx6JirYVnSmxZ3wFj2m/85fhC7uzbBXV ltFVwk1ILf8oQh9MIm6X/WgiudPBNE65AsOSy5qN+Q0YiLaEeoZD1a37s ygNlV5fO1oOsPWTD5dXGmx1DRUDBDlVEqk16GwGmjklh/ejvatkck5/8I g==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608841" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608841" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:10 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242195" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242195" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:08 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang , Sean Christopherson Subject: [PATCH v12 042/106] KVM: x86/mmu: Disallow dirty 
logging for x86 TDX Date: Mon, 27 Feb 2023 00:22:41 -0800 Message-Id: <40b8dbc590de6b75a0778d3fd3d5781848a642ef.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata TDX doesn't support dirty logging. Report dirty logging isn't supported so that device model, for example qemu, can properly handle it. Silently ignore on dirty logging on private GFNs of TDX. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu/mmu.c | 3 +++ arch/x86/kvm/mmu/tdp_mmu.c | 36 +++++++++++++++++++++++++++++++++++- arch/x86/kvm/x86.c | 8 ++++++++ include/linux/kvm_host.h | 1 + virt/kvm/kvm_main.c | 10 +++++++++- 5 files changed, 56 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 621ac8ea54d6..6421f92e618e 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -6690,6 +6690,9 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *= kvm, for_each_rmap_spte(rmap_head, &iter, sptep) { sp =3D sptep_to_sp(sptep); =20 + /* Private page dirty logging is not supported yet. */ + KVM_BUG_ON(is_private_sptep(sptep), kvm); + /* * We cannot do huge page mapping for indirect shadow pages, * which are found on the last rmap (level =3D 1) when not using diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 106e858ee39a..573f38e472f3 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1476,9 +1476,27 @@ static __always_inline bool kvm_tdp_mmu_handle_gfn(s= truct kvm *kvm, * into this helper allow blocking; it'd be dead, wasteful code. */ for_each_tdp_mmu_root(kvm, root, range->slot->as_id) { + gfn_t start; + gfn_t end; + + /* + * For now, operation on private GPA, e.g. dirty page logging, + * isn't supported yet. + */ + if (is_private_sp(root)) + continue; + rcu_read_lock(); =20 - tdp_root_for_each_leaf_pte(iter, root, range->start, range->end) + /* + * For TDX shared mapping, set GFN shared bit to the range, + * so the handler() doesn't need to set it, to avoid duplicated + * code in multiple handler()s. + */ + start =3D kvm_gfn_shared(kvm, range->start); + end =3D kvm_gfn_shared(kvm, range->end); + + tdp_root_for_each_leaf_pte(iter, root, start, end) ret |=3D handler(kvm, &iter, range); =20 rcu_read_unlock(); @@ -1961,6 +1979,13 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *k= vm, struct kvm_mmu_page *root; =20 lockdep_assert_held_write(&kvm->mmu_lock); + /* + * First TDX generation doesn't support clearing dirty bit, + * since there's no secure EPT API to support it. For now silently + * ignore KVM_CLEAR_DIRTY_LOG. + */ + if (!kvm_arch_dirty_log_supported(kvm)) + return; for_each_tdp_mmu_root(kvm, root, slot->as_id) clear_dirty_pt_masked(kvm, root, gfn, mask, wrprot); } @@ -2080,6 +2105,15 @@ bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm, bool spte_set =3D false; =20 lockdep_assert_held_write(&kvm->mmu_lock); + + /* + * First TDX generation doesn't support write protecting private + * mappings, silently ignore the request. KVM_GET_DIRTY_LOG etc + * can reach here, no warning. 
+ */ + if (!kvm_arch_dirty_log_supported(kvm)) + return false; + for_each_tdp_mmu_root(kvm, root, slot->as_id) spte_set |=3D write_protect_gfn(kvm, root, gfn, min_level); =20 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index b5b51342c9a9..89ee421e0cbf 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -12549,6 +12549,9 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kv= m, u32 new_flags =3D new ? new->flags : 0; bool log_dirty_pages =3D new_flags & KVM_MEM_LOG_DIRTY_PAGES; =20 + if (!kvm_arch_dirty_log_supported(kvm) && log_dirty_pages) + return; + /* * Update CPU dirty logging if dirty logging is being toggled. This * applies to all operations. @@ -13521,6 +13524,11 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, un= signed int size, } EXPORT_SYMBOL_GPL(kvm_sev_es_string_io); =20 +bool kvm_arch_dirty_log_supported(struct kvm *kvm) +{ + return kvm->arch.vm_type !=3D KVM_X86_PROTECTED_VM; +} + EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 5e4bf78025e3..04debfe30572 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1495,6 +1495,7 @@ bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcp= u *vcpu); int kvm_arch_post_init_vm(struct kvm *kvm); void kvm_arch_pre_destroy_vm(struct kvm *kvm); int kvm_arch_create_vm_debugfs(struct kvm *kvm); +bool kvm_arch_dirty_log_supported(struct kvm *kvm); =20 #ifndef __KVM_HAVE_ARCH_VM_ALLOC /* diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 42f01d0d6a49..e9f8225f3406 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1700,10 +1700,18 @@ static void kvm_replace_memslot(struct kvm *kvm, } } =20 +bool __weak kvm_arch_dirty_log_supported(struct kvm *kvm) +{ + return true; +} + static int check_memory_region_flags(struct kvm *kvm, const struct kvm_userspace_memory_region2 *mem) { - u32 valid_flags =3D KVM_MEM_LOG_DIRTY_PAGES; + u32 valid_flags =3D 0; + + if (kvm_arch_dirty_log_supported(kvm)) + valid_flags |=3D KVM_MEM_LOG_DIRTY_PAGES; =20 if (kvm_arch_has_private_mem(kvm)) valid_flags |=3D KVM_MEM_PRIVATE; --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0995AC7EE2E for ; Mon, 27 Feb 2023 08:27:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231424AbjB0I1B (ORCPT ); Mon, 27 Feb 2023 03:27:01 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58160 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231265AbjB0I0D (ORCPT ); Mon, 27 Feb 2023 03:26:03 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DC7951A494; Mon, 27 Feb 2023 00:24:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486274; x=1709022274; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=bAyBc4s3tiQ/01jXFwZRD2ovwWFrfG+pQJPt5k9h7e8=; b=ksm94Tzgna+Ram9a/aL4ZZ9uNiW+5dYXvxxSPywzF5PGBboiH0b3jZhV 6PCPGwhSYd0LfcaxUCZJJ+JzX29A+PwEDfdJlHwepYVCE5tQgZ+lEy76+ iHW4SkSROS6576it/hg5kWrOVo1Re6dE+ZmOqom3Z+VuufsXU5U5MY5KZ g4C3rS4ezNchQebUiwBd5jW4HpXTqVXiRbq1YSPn4nXAtcij/l6ljo00P 
tXR1kIuadLP59iZ4RaiqY697DpErhfcFc0uOpD1pmK8wCCmF6wS8AiDp/ xosFFzS+USolOhbwUk015944siqedhvwlp6VJO7S1Y+Q0B64GReRaTCd7 A==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608853" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608853" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:10 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242199" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242199" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:08 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang , Yan Zhao , Yuan Yao Subject: [PATCH v12 043/106] KVM: x86/mmu: TDX: Do not enable page track for TD guest Date: Mon, 27 Feb 2023 00:22:42 -0800 Message-Id: <29d0cc81c60c2d48c47c2a8fd9243ac9b6bc0e63.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Yan Zhao TDX does not support write protection and hence page track. Though !tdp_enabled and kvm_shadow_root_allocated(kvm) are always false for TD guest, should also return false when external write tracking is enabled. Cc: Yuan Yao Signed-off-by: Yan Zhao --- arch/x86/kvm/mmu/page_track.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c index 0a2ac438d647..d738d318ce17 100644 --- a/arch/x86/kvm/mmu/page_track.c +++ b/arch/x86/kvm/mmu/page_track.c @@ -22,6 +22,9 @@ =20 bool kvm_page_track_write_tracking_enabled(struct kvm *kvm) { + if (kvm->arch.vm_type =3D=3D KVM_X86_PROTECTED_VM) + return false; + return IS_ENABLED(CONFIG_KVM_EXTERNAL_WRITE_TRACKING) || !tdp_enabled || kvm_shadow_root_allocated(kvm); } --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C4C8FC64ED8 for ; Mon, 27 Feb 2023 08:27:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231481AbjB0I1R (ORCPT ); Mon, 27 Feb 2023 03:27:17 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59310 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231190AbjB0I0S (ORCPT ); Mon, 27 Feb 2023 03:26:18 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 776F61E1E7; Mon, 27 Feb 2023 00:24:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486284; x=1709022284; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=OisPSAN9WpQglyBtnlIoqac+2UNEIFCqzm+gT4fkGwU=; b=mcZvQOGq2ll5YLtdu0YIT6Muzo2mo3Lodyo6YQSBIENjB8L0R0iLxshu yTSz5gA2hpOrxaD8PJ/UkBDnPJ4paNRpmyzX0ZrdUXkjmAXnBcdZnHJiP sP8CW/1UKC3/9RQnD/iMODOV1CNLbQSsoyALpHUT2ox7M43ycCKvd0URl 
1djGbUIoOCiyXWyirTMVz+MDzrFAOy5t8U3RfVVf4ZcVfCYMzUSGswnoh rXkyNEytWoreS/r8g5KjJAmq/2GGBKYNRcJCRuODPSEWtkbOhvrQfvNFf 6L1SU4UtZdEaiFXbCNTGVSF3B3MWGEotJGicdhRF+LnMVkdAsVnzjv8vs Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608867" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608867" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:11 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242203" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242203" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:09 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang , Sean Christopherson Subject: [PATCH v12 044/106] KVM: VMX: Split out guts of EPT violation to common/exposed function Date: Mon, 27 Feb 2023 00:22:43 -0800 Message-Id: <138391bda018cf7a91f309039784f054c5142a1a.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson The difference of TDX EPT violation is how to retrieve information, GPA, and exit qualification. To share the code to handle EPT violation, split out the guts of EPT violation handler so that VMX/TDX exit handler can call it after retrieving GPA and exit qualification. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini Reviewed-by: Kai Huang --- arch/x86/kvm/vmx/common.h | 33 +++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/vmx.c | 25 +++---------------------- 2 files changed, 36 insertions(+), 22 deletions(-) create mode 100644 arch/x86/kvm/vmx/common.h diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h new file mode 100644 index 000000000000..235908f3e044 --- /dev/null +++ b/arch/x86/kvm/vmx/common.h @@ -0,0 +1,33 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef __KVM_X86_VMX_COMMON_H +#define __KVM_X86_VMX_COMMON_H + +#include + +#include "mmu.h" + +static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t = gpa, + unsigned long exit_qualification) +{ + u64 error_code; + + /* Is it a read fault? */ + error_code =3D (exit_qualification & EPT_VIOLATION_ACC_READ) + ? PFERR_USER_MASK : 0; + /* Is it a write fault? */ + error_code |=3D (exit_qualification & EPT_VIOLATION_ACC_WRITE) + ? PFERR_WRITE_MASK : 0; + /* Is it a fetch fault? */ + error_code |=3D (exit_qualification & EPT_VIOLATION_ACC_INSTR) + ? PFERR_FETCH_MASK : 0; + /* ept page table entry is present? */ + error_code |=3D (exit_qualification & EPT_VIOLATION_RWX_MASK) + ? PFERR_PRESENT_MASK : 0; + + error_code |=3D (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) !=3D = 0 ? 
+ PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK; + + return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0); +} + +#endif /* __KVM_X86_VMX_COMMON_H */ diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 2afa29eaa258..c2f4d76f7902 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -51,6 +51,7 @@ #include =20 #include "capabilities.h" +#include "common.h" #include "cpuid.h" #include "hyperv.h" #include "kvm_onhyperv.h" @@ -5760,11 +5761,8 @@ static int handle_task_switch(struct kvm_vcpu *vcpu) =20 static int handle_ept_violation(struct kvm_vcpu *vcpu) { - unsigned long exit_qualification; + unsigned long exit_qualification =3D vmx_get_exit_qual(vcpu); gpa_t gpa; - u64 error_code; - - exit_qualification =3D vmx_get_exit_qual(vcpu); =20 /* * EPT violation happened while executing iret from NMI, @@ -5779,23 +5777,6 @@ static int handle_ept_violation(struct kvm_vcpu *vcp= u) =20 gpa =3D vmcs_read64(GUEST_PHYSICAL_ADDRESS); trace_kvm_page_fault(vcpu, gpa, exit_qualification); - - /* Is it a read fault? */ - error_code =3D (exit_qualification & EPT_VIOLATION_ACC_READ) - ? PFERR_USER_MASK : 0; - /* Is it a write fault? */ - error_code |=3D (exit_qualification & EPT_VIOLATION_ACC_WRITE) - ? PFERR_WRITE_MASK : 0; - /* Is it a fetch fault? */ - error_code |=3D (exit_qualification & EPT_VIOLATION_ACC_INSTR) - ? PFERR_FETCH_MASK : 0; - /* ept page table entry is present? */ - error_code |=3D (exit_qualification & EPT_VIOLATION_RWX_MASK) - ? PFERR_PRESENT_MASK : 0; - - error_code |=3D (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) !=3D = 0 ? - PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK; - vcpu->arch.exit_qualification =3D exit_qualification; =20 /* @@ -5809,7 +5790,7 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu) if (unlikely(allow_smaller_maxphyaddr && kvm_vcpu_is_illegal_gpa(vcpu, gp= a))) return kvm_emulate_instruction(vcpu, 0); =20 - return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0); + return __vmx_handle_ept_violation(vcpu, gpa, exit_qualification); } =20 static int handle_ept_misconfig(struct kvm_vcpu *vcpu) --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9E11FC7EE2D for ; Mon, 27 Feb 2023 08:27:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231441AbjB0I1D (ORCPT ); Mon, 27 Feb 2023 03:27:03 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55158 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231281AbjB0I0H (ORCPT ); Mon, 27 Feb 2023 03:26:07 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 391CB1DBA1; Mon, 27 Feb 2023 00:24:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486279; x=1709022279; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=wPBml+MMAV/1PWip7l5RqLNhMgjPjuoG7zHBpGXgJGg=; b=nzPZOb9RLyafWsO2QMHyN3tXl4/VpcZlA9lVgbuL+hXVQDIAifxxwZZf 5s48JTACiEtyiIpXA31Bw2yvBjppEgugAXGQnJWfHj41kaF7iU6jx8CP+ t3J3bwFkIQe3myG5N8Q2gWfIki3tpiMhcQ6fLTyDHtB1JMKv9Jf0jnexu 1qS3P5LzmJwxRsq4x4mQWtC38weO0JwMhCSGubKRw+HFm8SuBKN+Jgsuu J7gULa5vyzDNDvddoLBTDyrz6sz7PxXo64v5GJH3URdftJcb7h0qxlgQw 
wO8MgKBYnoUJnP3qnp9BSEXF/H0ZkXYKsAjAg8xnjTvL/QrOGhg0bXzAr A==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608859" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608859" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:10 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242206" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242206" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:09 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang , Sean Christopherson Subject: [PATCH v12 045/106] KVM: VMX: Move setting of EPT MMU masks to common VT-x code Date: Mon, 27 Feb 2023 00:22:44 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson EPT MMU masks are used commonly for VMX and TDX. The value needs to be initialized in common code before both VMX/TDX-specific initialization code. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/main.c | 9 +++++++++ arch/x86/kvm/vmx/vmx.c | 4 ---- 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 0cd85c96ed84..71fa6d27c0ef 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -5,6 +5,7 @@ #include "mmu.h" #include "vmx.h" #include "nested.h" +#include "mmu.h" #include "pmu.h" #include "tdx.h" #include "tdx_arch.h" @@ -39,6 +40,14 @@ static __init int vt_hardware_setup(void) =20 enable_tdx =3D enable_tdx && !tdx_hardware_setup(&vt_x86_ops); =20 + /* + * As kvm_mmu_set_ept_masks() updates enable_mmio_caching, call it + * before checking enable_mmio_caching. + */ + if (enable_ept) + kvm_mmu_set_ept_masks(enable_ept_ad_bits, + cpu_has_vmx_ept_execute_only()); + /* TDX requires KVM TDP MMU and MMIO caching. */ if (enable_tdx && (!tdp_enabled || !enable_mmio_caching)) { enable_tdx =3D false; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index c2f4d76f7902..3ff3b33fe9af 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -8351,10 +8351,6 @@ __init int vmx_hardware_setup(void) =20 set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */ =20 - if (enable_ept) - kvm_mmu_set_ept_masks(enable_ept_ad_bits, - cpu_has_vmx_ept_execute_only()); - /* * Setup shadow_me_value/shadow_me_mask to include MKTME KeyID * bits to shadow_zero_check. 
--=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 01949C64ED8 for ; Mon, 27 Feb 2023 08:27:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231462AbjB0I1K (ORCPT ); Mon, 27 Feb 2023 03:27:10 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59254 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231287AbjB0I0H (ORCPT ); Mon, 27 Feb 2023 03:26:07 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5BFAF1E1F5; Mon, 27 Feb 2023 00:24:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486280; x=1709022280; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=kcN0sh5u0GjherE8wd3XD0GFtCM4wz9n5VCD6qLEbec=; b=gLFT6RvW6+xh0uQIVaPykT/0V15L15Cn1mu2IB+w1+ggWQj2lh10+LfV ShZbSRtiBurcn3DPmu3UgyK23mhk4gCG1eWpu6HiEkDdD6h40ljm6iq/4 q6trWvLFBp9dihANH9goECjtgW8cTR88S4cqjFFQ6siar0t3rUvdze6ig KWr3bhRDKt4vODfiEoDDalCaIhkBe9TugZ1dCvF4BYxcYmah3KyB/gvxm pfcC1ywbTuAXErUkaU3VGRxFrgJsWkRnL6VCDe7Ln3fFMGV8G0dBnLYuC VR9/p0rv3KK9TYD2uixamy3ONwHBSMUSNhV8YERH3+T/Cn+NLSHCwyPzF g==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608860" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608860" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:10 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242210" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242210" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:09 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang , Sean Christopherson Subject: [PATCH v12 046/106] KVM: TDX: Add accessors VMX VMCS helpers Date: Mon, 27 Feb 2023 00:22:45 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson TDX defines SEAMCALL APIs to access TDX control structures corresponding to the VMX VMCS. Introduce helper accessors to hide its SEAMCALL ABI details. 
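As an illustration of the resulting interface (this sketch is not part of the patch; example_dump_td_rip() is hypothetical, and whether a given field is readable depends on the TDX module), the generated accessors read like ordinary VMCS helpers:

static void example_dump_td_rip(struct vcpu_tdx *tdx)
{
	/*
	 * td_vmcs_read64() is generated by
	 * TDX_BUILD_TDVPS_ACCESSORS(64, VMCS, vmcs) below and wraps the
	 * TDH.VP.RD SEAMCALL on the TD VMCS, hiding the ABI details.
	 */
	u64 rip = td_vmcs_read64(tdx, GUEST_RIP);

	pr_info("TD GUEST_RIP: 0x%llx\n", rip);
}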
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/tdx.h | 95 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 95 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 5fa4d3198873..7c8f5880d104 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -57,6 +57,101 @@ static inline struct vcpu_tdx *to_tdx(struct kvm_vcpu *= vcpu) return container_of(vcpu, struct vcpu_tdx, vcpu); } =20 +static __always_inline void tdvps_vmcs_check(u32 field, u8 bits) +{ +#define VMCS_ENC_ACCESS_TYPE_MASK 0x1UL +#define VMCS_ENC_ACCESS_TYPE_FULL 0x0UL +#define VMCS_ENC_ACCESS_TYPE_HIGH 0x1UL +#define VMCS_ENC_ACCESS_TYPE(field) ((field) & VMCS_ENC_ACCESS_TYPE_MASK) + + /* TDX is 64bit only. HIGH field isn't supported. */ + BUILD_BUG_ON_MSG(__builtin_constant_p(field) && + VMCS_ENC_ACCESS_TYPE(field) =3D=3D VMCS_ENC_ACCESS_TYPE_HIGH, + "Read/Write to TD VMCS *_HIGH fields not supported"); + + BUILD_BUG_ON(bits !=3D 16 && bits !=3D 32 && bits !=3D 64); + +#define VMCS_ENC_WIDTH_MASK GENMASK(14, 13) +#define VMCS_ENC_WIDTH_16BIT (0UL << 13) +#define VMCS_ENC_WIDTH_64BIT (1UL << 13) +#define VMCS_ENC_WIDTH_32BIT (2UL << 13) +#define VMCS_ENC_WIDTH_NATURAL (3UL << 13) +#define VMCS_ENC_WIDTH(field) ((field) & VMCS_ENC_WIDTH_MASK) + + /* TDX is 64bit only. i.e. natural width =3D 64bit. */ + BUILD_BUG_ON_MSG(bits !=3D 64 && __builtin_constant_p(field) && + (VMCS_ENC_WIDTH(field) =3D=3D VMCS_ENC_WIDTH_64BIT || + VMCS_ENC_WIDTH(field) =3D=3D VMCS_ENC_WIDTH_NATURAL), + "Invalid TD VMCS access for 64-bit field"); + BUILD_BUG_ON_MSG(bits !=3D 32 && __builtin_constant_p(field) && + VMCS_ENC_WIDTH(field) =3D=3D VMCS_ENC_WIDTH_32BIT, + "Invalid TD VMCS access for 32-bit field"); + BUILD_BUG_ON_MSG(bits !=3D 16 && __builtin_constant_p(field) && + VMCS_ENC_WIDTH(field) =3D=3D VMCS_ENC_WIDTH_16BIT, + "Invalid TD VMCS access for 16-bit field"); +} + +static __always_inline void tdvps_state_non_arch_check(u64 field, u8 bits)= {} +static __always_inline void tdvps_management_check(u64 field, u8 bits) {} + +#define TDX_BUILD_TDVPS_ACCESSORS(bits, uclass, lclass) \ +static __always_inline u##bits td_##lclass##_read##bits(struct vcpu_tdx *t= dx, \ + u32 field) \ +{ \ + struct tdx_module_output out; \ + u64 err; \ + \ + tdvps_##lclass##_check(field, bits); \ + err =3D tdh_vp_rd(tdx->tdvpr_pa, TDVPS_##uclass(field), &out); \ + if (KVM_BUG_ON(err, tdx->vcpu.kvm)) { \ + pr_err("TDH_VP_RD["#uclass".0x%x] failed: 0x%llx\n", \ + field, err); \ + return 0; \ + } \ + return (u##bits)out.r8; \ +} \ +static __always_inline void td_##lclass##_write##bits(struct vcpu_tdx *tdx= , \ + u32 field, u##bits val) \ +{ \ + struct tdx_module_output out; \ + u64 err; \ + \ + tdvps_##lclass##_check(field, bits); \ + err =3D tdh_vp_wr(tdx->tdvpr_pa, TDVPS_##uclass(field), val, \ + GENMASK_ULL(bits - 1, 0), &out); \ + if (KVM_BUG_ON(err, tdx->vcpu.kvm)) \ + pr_err("TDH_VP_WR["#uclass".0x%x] =3D 0x%llx failed: 0x%llx\n", \ + field, (u64)val, err); \ +} \ +static __always_inline void td_##lclass##_setbit##bits(struct vcpu_tdx *td= x, \ + u32 field, u64 bit) \ +{ \ + struct tdx_module_output out; \ + u64 err; \ + \ + tdvps_##lclass##_check(field, bits); \ + err =3D tdh_vp_wr(tdx->tdvpr_pa, TDVPS_##uclass(field), bit, bit, &out); \ + if (KVM_BUG_ON(err, tdx->vcpu.kvm)) \ + pr_err("TDH_VP_WR["#uclass".0x%x] |=3D 0x%llx failed: 0x%llx\n", \ + field, bit, err); \ +} \ +static __always_inline void td_##lclass##_clearbit##bits(struct vcpu_tdx *= tdx, \ + u32 field, u64 bit) \ +{ \ + struct tdx_module_output out; \ + 
u64 err; \ + \ + tdvps_##lclass##_check(field, bits); \ + err =3D tdh_vp_wr(tdx->tdvpr_pa, TDVPS_##uclass(field), 0, bit, &out); \ + if (KVM_BUG_ON(err, tdx->vcpu.kvm)) \ + pr_err("TDH_VP_WR["#uclass".0x%x] &=3D ~0x%llx failed: 0x%llx\n", \ + field, bit, err); \ +} + +TDX_BUILD_TDVPS_ACCESSORS(16, VMCS, vmcs); +TDX_BUILD_TDVPS_ACCESSORS(32, VMCS, vmcs); +TDX_BUILD_TDVPS_ACCESSORS(64, VMCS, vmcs); + static __always_inline u64 td_tdcs_exec_read64(struct kvm_tdx *kvm_tdx, u3= 2 field) { struct tdx_module_output out; --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DBCE3C64ED6 for ; Mon, 27 Feb 2023 08:27:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231470AbjB0I1P (ORCPT ); Mon, 27 Feb 2023 03:27:15 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56956 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230525AbjB0I0C (ORCPT ); Mon, 27 Feb 2023 03:26:02 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0AF661E1CA; Mon, 27 Feb 2023 00:24:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486274; x=1709022274; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=wqhYAgEq0aQVzIcjQv09Ul3LyWHRxYoDhByn6KU8XzQ=; b=mkHf88mvn9Q4ZwIbzIrZy1pCRWopt9vkp5zBlmGIkmCJ+jST8CneIn89 +zJ9cd1hl8AIRpob2v9v5C425a4Js1+BTtoY8+uQD+PVy/TIhzkc9Gp0A DWvQwNvwbd7W/vScs6LP/qrhtV9C0BBe0i//OHjXwYqUqEocl1b/zWisf x7ygNwVRMCCP2Mmn8K8VKjusyuStrwamlp6H40xaiyfr6fNNDhME9BuOY pZJ9ZMfZBD1Gdg9UslP5aTkjLb1Fjd+6zTP1e9mXJWubgW/ugxdJQrnGH TN2Jo4oHd3Uf2tULTzvfmVRAuiCPlENH99Q6epZz9YeSVQr2O9qhxQgiM A==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608847" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608847" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:10 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242213" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242213" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:09 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang , Sean Christopherson Subject: [PATCH v12 047/106] KVM: TDX: Add load_mmu_pgd method for TDX Date: Mon, 27 Feb 2023 00:22:46 -0800 Message-Id: <8a0c3eebc742ce4a5a596c601d462ca2e7d56425.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson For virtual IO, the guest TD shares guest pages with VMM without encryption. Shared EPT is used to map guest pages in unprotected way. 
Add the VMCS field encoding for the shared EPTP, which will be used by TDX to have separate EPT walks for private GPAs (existing EPTP) versus shared GPAs (new shared EPTP). Set shared EPT pointer value for the TDX guest to initialize TDX MMU. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/include/asm/vmx.h | 1 + arch/x86/kvm/vmx/main.c | 13 ++++++++++++- arch/x86/kvm/vmx/tdx.c | 5 +++++ arch/x86/kvm/vmx/x86_ops.h | 4 ++++ 4 files changed, 22 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 752d53652007..1205018b5b6b 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -234,6 +234,7 @@ enum vmcs_field { TSC_MULTIPLIER_HIGH =3D 0x00002033, TERTIARY_VM_EXEC_CONTROL =3D 0x00002034, TERTIARY_VM_EXEC_CONTROL_HIGH =3D 0x00002035, + SHARED_EPT_POINTER =3D 0x0000203C, PID_POINTER_TABLE =3D 0x00002042, PID_POINTER_TABLE_HIGH =3D 0x00002043, GUEST_PHYSICAL_ADDRESS =3D 0x00002400, diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 71fa6d27c0ef..68b91b1b2162 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -134,6 +134,17 @@ static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool = init_event) vmx_vcpu_reset(vcpu, init_event); } =20 +static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, + int pgd_level) +{ + if (is_td_vcpu(vcpu)) { + tdx_load_mmu_pgd(vcpu, root_hpa, pgd_level); + return; + } + + vmx_load_mmu_pgd(vcpu, root_hpa, pgd_level); +} + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { if (!is_td(kvm)) @@ -267,7 +278,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .write_tsc_offset =3D vmx_write_tsc_offset, .write_tsc_multiplier =3D vmx_write_tsc_multiplier, =20 - .load_mmu_pgd =3D vmx_load_mmu_pgd, + .load_mmu_pgd =3D vt_load_mmu_pgd, =20 .check_intercept =3D vmx_check_intercept, .handle_exit_irqoff =3D vmx_handle_exit_irqoff, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 3f61cdc53c57..477ad69b1361 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -429,6 +429,11 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_e= vent) return; } =20 +void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) +{ + td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa & PAGE_MASK); +} + int tdx_dev_ioctl(void __user *argp) { struct kvm_tdx_capabilities __user *user_caps; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index c939c606b38f..d730c63185a9 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -156,6 +156,8 @@ void tdx_vcpu_free(struct kvm_vcpu *vcpu); void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event); =20 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); + +void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_leve= l); #else static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return= -ENOSYS; } static inline void tdx_hardware_unsetup(void) {} @@ -176,6 +178,8 @@ static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu)= {} static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) = {} =20 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)= { return -EOPNOTSUPP; } + +static inline void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,= int root_level) {} #endif =20 #endif /* __KVM_X86_VMX_X86_OPS_H */ --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: 
SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 82B97C7EE2E for ; Mon, 27 Feb 2023 08:27:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229512AbjB0I1I (ORCPT ); Mon, 27 Feb 2023 03:27:08 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55130 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230437AbjB0I0G (ORCPT ); Mon, 27 Feb 2023 03:26:06 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AFB291D90C; Mon, 27 Feb 2023 00:24:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486277; x=1709022277; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=hdxwm7yMgHpzCli9IIsSaeTMsG08HLO4AGUK13v/zFo=; b=WDIUTQGDmppkW3CPRVbwan9XnFwukzrhOsQ0Ai0KoBNbi4gFMs99jcsu uppbn3hLsqgq/QzpEuHcxrcz52Pz4JzZVzTlMOQLtZ+bE3Lhzb+VhavEo PjCG1fzCdd36jLm69iZ0WWMeQEMlWXtWcNbf47xe4guzP9FMuyLQGn0NM 3izR2pVCncWRqCZbRS4Fq3oQMEMj94qBjGWeG2qlXHqXqb9qbYjOew5Eb uXLAcjgHWyl59gnlSzE8/3Ax3uZr0aZGP9tXflWlZaKuo33zcUL9ZoVdk yaXOV/mE2Ek4YdLHPRBxmdtfXfYk0YM0CzKeV1eNjKU1XV/EhoSKhIWUT Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608854" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608854" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:10 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242216" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242216" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:09 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang , Yuan Yao Subject: [PATCH v12 048/106] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT Date: Mon, 27 Feb 2023 00:22:47 -0800 Message-Id: <91c327b370bf5e44df4d254bd0f3408efacc5c09.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Yuan Yao The TDX module internally uses locks to protect its resources. If it fails to obtain a lock, it returns a TDX_OPERAND_BUSY error without spinning because of its execution time limitation. The TDX SEAMCALL API reference describes which resources each SEAMCALL uses, so it's known which TDX SEAMCALLs can contend on which resources. The VMM can avoid contention inside the TDX module by serializing contentious TDX SEAMCALLs with, for example, a spinlock. Because the OS knows its process scheduling and scalability better, a lock at the OS/VMM layer would work better than simply retrying TDX SEAMCALLs. The TDH.MEM.* APIs, except for TDH.MEM.TRACK, operate on the secure EPT tree, and the TDX module internally tries to acquire the lock of the secure EPT tree. They return TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT when they fail to get the lock. TDX KVM allows the SEPT callbacks to return an error so that the TDP MMU layer can retry, as sketched below.
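For illustration only (example_sept_op() is hypothetical and merely anticipates the SEPT hooks added later in this series), a caller can translate the busy status into -EAGAIN so that the TDP MMU, which holds only the MMU read lock, backs off and retries the fault:

static int example_sept_op(struct kvm_tdx *kvm_tdx, gpa_t gpa, int tdx_level,
			   struct tdx_module_output *out)
{
	/* TDH.MEM.RANGE.BLOCK contends on the secure EPT tree lock. */
	u64 err = tdh_mem_range_block(kvm_tdx->tdr_pa, gpa, tdx_level, out);

	if (err == (TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT))
		return -EAGAIN;	/* let the TDP MMU retry the fault */
	return err ? -EIO : 0;
}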
TDH.VP.ENTER is an exception due to zero-step attack mitigation. Normally TDH.VP.ENTER uses only TD vcpu resources and doesn't cause contention. When a zero-step attack is suspected, it obtains the secure EPT tree lock and tracks the GPAs causing secure EPT faults. Thus TDH.VP.ENTER may result in TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT, and the TDH.MEM.* SEAMCALLs may likewise result in TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT. Retry the TDH.MEM.* APIs and TDH.VP.ENTER on this error because it is a rare event caused by zero-step attack mitigation, and a spinlock cannot be used for TDH.VP.ENTER due to its indefinite execution time. Signed-off-by: Yuan Yao Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/tdx_ops.h | 42 ++++++++++++++++++++++++++++++++------ 1 file changed, 36 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h index 8cc2f01c509b..86330d0e4b22 100644 --- a/arch/x86/kvm/vmx/tdx_ops.h +++ b/arch/x86/kvm/vmx/tdx_ops.h @@ -18,6 +18,36 @@ =20 void pr_tdx_error(u64 op, u64 error_code, const struct tdx_module_output *= out); =20 +/* + * TDX module acquires its internal lock for resources. It doesn't spin t= o get + * locks because of its restrictions of allowed execution time. Instead, = it + * returns TDX_OPERAND_BUSY with an operand id. + * + * Multiple VCPUs can operate on SEPT. Also with zero-step attack mitigat= ion, + * TDH.VP.ENTER may rarely acquire SEPT lock and release it when zero-step + * attack is suspected. It results in TDX_OPERAND_BUSY | TDX_OPERAND_ID_S= EPT + * with TDH.MEM.* operation. Note: TDH.MEM.TRACK is an exception. + * + * Because TDP MMU uses read lock for scalability, spin lock around SEAMCA= LL + * spoils TDP MMU effort. Retry several times with the assumption that SE= PT + * lock contention is rare. But don't loop forever to avoid lockup. Let = TDP + * MMU retry.
+ */ +#define TDX_ERROR_SEPT_BUSY (TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT) + +static inline u64 seamcall_sept(u64 op, u64 rcx, u64 rdx, u64 r8, u64 r9, + struct tdx_module_output *out) +{ +#define SEAMCALL_RETRY_MAX 16 + int retry =3D SEAMCALL_RETRY_MAX; + u64 ret; + + do { + ret =3D __seamcall(op, rcx, rdx, r8, r9, out); + } while (ret =3D=3D TDX_ERROR_SEPT_BUSY && retry-- > 0); + return ret; +} + static inline u64 tdh_mng_addcx(hpa_t tdr, hpa_t addr) { clflush_cache_range(__va(addr), PAGE_SIZE); @@ -28,14 +58,14 @@ static inline u64 tdh_mem_page_add(hpa_t tdr, gpa_t gpa= , hpa_t hpa, hpa_t source struct tdx_module_output *out) { clflush_cache_range(__va(hpa), PAGE_SIZE); - return __seamcall(TDH_MEM_PAGE_ADD, gpa, tdr, hpa, source, out); + return seamcall_sept(TDH_MEM_PAGE_ADD, gpa, tdr, hpa, source, out); } =20 static inline u64 tdh_mem_sept_add(hpa_t tdr, gpa_t gpa, int level, hpa_t = page, struct tdx_module_output *out) { clflush_cache_range(__va(page), PAGE_SIZE); - return __seamcall(TDH_MEM_SEPT_ADD, gpa | level, tdr, page, 0, out); + return seamcall_sept(TDH_MEM_SEPT_ADD, gpa | level, tdr, page, 0, out); } =20 static inline u64 tdh_mem_sept_remove(hpa_t tdr, gpa_t gpa, int level, @@ -61,13 +91,13 @@ static inline u64 tdh_mem_page_aug(hpa_t tdr, gpa_t gpa= , hpa_t hpa, struct tdx_module_output *out) { clflush_cache_range(__va(hpa), PAGE_SIZE); - return __seamcall(TDH_MEM_PAGE_AUG, gpa, tdr, hpa, 0, out); + return seamcall_sept(TDH_MEM_PAGE_AUG, gpa, tdr, hpa, 0, out); } =20 static inline u64 tdh_mem_range_block(hpa_t tdr, gpa_t gpa, int level, struct tdx_module_output *out) { - return __seamcall(TDH_MEM_RANGE_BLOCK, gpa | level, tdr, 0, 0, out); + return seamcall_sept(TDH_MEM_RANGE_BLOCK, gpa | level, tdr, 0, 0, out); } =20 static inline u64 tdh_mng_key_config(hpa_t tdr) @@ -149,7 +179,7 @@ static inline u64 tdh_phymem_page_reclaim(hpa_t page, static inline u64 tdh_mem_page_remove(hpa_t tdr, gpa_t gpa, int level, struct tdx_module_output *out) { - return __seamcall(TDH_MEM_PAGE_REMOVE, gpa | level, tdr, 0, 0, out); + return seamcall_sept(TDH_MEM_PAGE_REMOVE, gpa | level, tdr, 0, 0, out); } =20 static inline u64 tdh_sys_lp_shutdown(void) @@ -165,7 +195,7 @@ static inline u64 tdh_mem_track(hpa_t tdr) static inline u64 tdh_mem_range_unblock(hpa_t tdr, gpa_t gpa, int level, struct tdx_module_output *out) { - return __seamcall(TDH_MEM_RANGE_UNBLOCK, gpa | level, tdr, 0, 0, out); + return seamcall_sept(TDH_MEM_RANGE_UNBLOCK, gpa | level, tdr, 0, 0, out); } =20 static inline u64 tdh_phymem_cache_wb(bool resume) --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 271B9C64ED6 for ; Mon, 27 Feb 2023 08:27:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231493AbjB0I1X (ORCPT ); Mon, 27 Feb 2023 03:27:23 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56468 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231327AbjB0I03 (ORCPT ); Mon, 27 Feb 2023 03:26:29 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 670041E1FC; Mon, 27 Feb 2023 00:24:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486287; x=1709022287; 
h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=X6KqWXkyD93I4EZoIvrtBjzb6FA8Vszrukcnn17tpjI=; b=eZa6J6msj2j0drwJQmU6vseCJ+O9dmM1Hfb/JBGsbTo3Nz4MlPZJDRB3 GTSXYQxruY810nY1/ToJ00uesEd6BG/yYoV9IrUM+ZmmXf9b+eZJ6PQsK Af13NPQNXN6W3ITMjimLZLXGsIjxeuDvMXirxZ7oz+9Q6+i9LJGd0nPvZ yKkIF4ffrey3NCwgr6vSoHrmoff/XFR/wzDVnCEGeEB07fzeMgm7NSReO N6x5DnDHAuq5Sr52vexTxEXqWSuyRo+lEX+7EPMPKYXYfOqOErl7Ke47o ZPn7ZZrsWVz16+zS5IWGF6lhE3tA0zNcTd+DloxFdglYMq/nMYoB+GYgg w==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608870" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608870" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:11 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242223" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242223" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:10 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 049/106] KVM: TDX: TDP MMU TDX support Date: Mon, 27 Feb 2023 00:22:48 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Implement the TDP MMU hooks for the TDX backend: TLB flush, TLB shootdown, propagating private EPT entry changes to the Secure EPT, and freeing Secure EPT pages. TLB flush handles both shared EPT and private EPT. It flushes the shared EPT the same way as VMX and also waits for the TDX TLB shootdown. The hook to free a Secure EPT page unlinks the page from the Secure EPT so that it can be freed back to the OS. Propagate entry changes to the Secure EPT. The possible entry changes are present -> non-present (zapping) and non-present -> present (population). On population, just link the Secure EPT page or the private guest page into the Secure EPT with a TDX SEAMCALL. Because the TDP MMU allows concurrent zapping/population, zapping requires a synchronous TLB shootdown with the frozen EPT entry: zap the secure entry, increment the TLB counter, send an IPI to remote vcpus to trigger a TLB flush, and then unlink the private guest page from the Secure EPT. For simplicity, batched zapping under the exclusive lock is handled as concurrent zapping. Although it's inefficient, it can be optimized in the future. For an MMIO SPTE, the SPTE value changes as follows. initial value (suppress VE bit is set) -> Guest issues MMIO and triggers EPT violation -> KVM updates SPTE value to MMIO value (suppress VE bit is cleared) -> Guest MMIO resumes. It triggers VE exception in guest TD -> Guest VE handler issues TDG.VP.VMCALL -> KVM handles MMIO -> Guest VE handler resumes its execution after MMIO instruction Signed-off-by: Isaku Yamahata --- Changes v11 to v12 - removed the tlb_remote_flush_with_range method.
Instead, disable TDX if Hyper-V support populates tlb_remote_flush or tlb_remote_flush_with_range. --- arch/x86/kvm/mmu/spte.c | 3 +- arch/x86/kvm/vmx/main.c | 71 ++++++++- arch/x86/kvm/vmx/tdx.c | 302 ++++++++++++++++++++++++++++++++++++- arch/x86/kvm/vmx/tdx.h | 7 + arch/x86/kvm/vmx/x86_ops.h | 4 + 5 files changed, 380 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index a23e9205fc42..48e17588a127 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -74,7 +74,8 @@ u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsign= ed int access) u64 spte =3D generation_mmio_spte_mask(gen); u64 gpa =3D gfn << PAGE_SHIFT; =20 - WARN_ON_ONCE(!vcpu->kvm->arch.shadow_mmio_value); + WARN_ON_ONCE(!vcpu->kvm->arch.shadow_mmio_value && + !kvm_gfn_shared_mask(vcpu->kvm)); =20 access &=3D shadow_mmio_access_mask; spte |=3D vcpu->kvm->arch.shadow_mmio_value | access; diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 68b91b1b2162..a2ca09f10b6e 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -29,6 +29,7 @@ static int vt_max_vcpus(struct kvm *kvm) =20 return kvm->max_vcpus; } +static int vt_tlb_remote_flush(struct kvm *kvm); =20 static __init int vt_hardware_setup(void) { @@ -51,8 +52,20 @@ static __init int vt_hardware_setup(void) /* TDX requires KVM TDP MMU and MMIO caching. */ if (enable_tdx && (!tdp_enabled || !enable_mmio_caching)) { enable_tdx =3D false; - pr_warn_ratelimited("tdp mmu and mmio caching need to be enabled.\n"); + pr_warn_ratelimited("tdp mmu and mmio caching need to be enabled for TDX= support.\n"); } + /* + * TDX KVM overrides tlb_remote_flush method and assumes + * tlb_remote_flush_with_range =3D NULL that falls back to + * tlb_remote_flush. Disable TDX if there are conflicts. + */ + if (vt_x86_ops.tlb_remote_flush || + vt_x86_ops.tlb_remote_flush_with_range) { + enable_tdx =3D false; + pr_warn_ratelimited("TDX requires baremetal. Not Supported on VMM guest.= \n"); + } + if (enable_tdx) + vt_x86_ops.tlb_remote_flush =3D vt_tlb_remote_flush; return 0; } =20 @@ -134,6 +147,54 @@ static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool = init_event) vmx_vcpu_reset(vcpu, init_event); } =20 +static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) { + tdx_flush_tlb(vcpu); + return; + } + + vmx_flush_tlb_all(vcpu); +} + +static void vt_flush_tlb_current(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) { + tdx_flush_tlb(vcpu); + return; + } + + vmx_flush_tlb_current(vcpu); +} + +static int vt_tlb_remote_flush(struct kvm *kvm) +{ + if (is_td(kvm)) + return tdx_sept_tlb_remote_flush(kvm); + + /* + * fallback to KVM_REQ_TLB_FLUSH. + * See kvm_arch_flush_remote_tlb() and kvm_flush_remote_tlbs().
+ */ + return -EOPNOTSUPP; +} + +static void vt_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_flush_tlb_gva(vcpu, addr); +} + +static void vt_flush_tlb_guest(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_flush_tlb_guest(vcpu); +} + static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) { @@ -226,10 +287,10 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .set_rflags =3D vmx_set_rflags, .get_if_flag =3D vmx_get_if_flag, =20 - .flush_tlb_all =3D vmx_flush_tlb_all, - .flush_tlb_current =3D vmx_flush_tlb_current, - .flush_tlb_gva =3D vmx_flush_tlb_gva, - .flush_tlb_guest =3D vmx_flush_tlb_guest, + .flush_tlb_all =3D vt_flush_tlb_all, + .flush_tlb_current =3D vt_flush_tlb_current, + .flush_tlb_gva =3D vt_flush_tlb_gva, + .flush_tlb_guest =3D vt_flush_tlb_guest, =20 .vcpu_pre_run =3D vmx_vcpu_pre_run, .vcpu_run =3D vmx_vcpu_run, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 477ad69b1361..8f191177bfe9 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -6,7 +6,9 @@ #include "capabilities.h" #include "x86_ops.h" #include "tdx.h" +#include "vmx.h" #include "x86.h" +#include "mmu.h" =20 #undef pr_fmt #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt @@ -312,6 +314,22 @@ static int tdx_do_tdh_mng_key_config(void *param) =20 int tdx_vm_init(struct kvm *kvm) { + /* + * Because guest TD is protected, VMM can't parse the instruction in TD. + * Instead, guest uses MMIO hypercall. For unmodified device driver, + * #VE needs to be injected for MMIO and #VE handler in TD converts MMIO + * instruction into MMIO hypercall. + * + * SPTE value for MMIO needs to be setup so that #VE is injected into + * TD instead of triggering EPT MISCONFIG. + * - RWX=3D0 so that EPT violation is triggered. + * - suppress #VE bit is cleared to inject #VE. + */ + kvm_mmu_set_mmio_spte_value(kvm, 0); + + /* TODO: Enable 2mb and 1gb large page support. */ + kvm->arch.tdp_max_page_level =3D PG_LEVEL_4K; + /* * This function initializes only KVM software construct. It doesn't * initialize TDX stuff, e.g. TDCS, TDR, TDCX, HKID etc. @@ -434,6 +452,261 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t ro= ot_hpa, int pgd_level) td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa & PAGE_MASK); } =20 +static void tdx_unpin(struct kvm *kvm, kvm_pfn_t pfn) +{ + struct page *page =3D pfn_to_page(pfn); + + put_page(page); +} + +static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + hpa_t hpa =3D pfn_to_hpa(pfn); + gpa_t gpa =3D gfn_to_gpa(gfn); + struct tdx_module_output out; + u64 err; + + /* TODO: handle large pages. */ + if (KVM_BUG_ON(level !=3D PG_LEVEL_4K, kvm)) + return -EINVAL; + + /* + * Because restricted mem doesn't support page migration with + * a_ops->migrate_page (yet), no callback is triggered for KVM on + * page migration. Until restricted mem supports page migration, + * prevent page migration. + * TODO: Once restricted mem introduces callback on page migration, + * implement it and remove get_page/put_page().
+ */ + get_page(pfn_to_page(pfn)); + + if (likely(is_td_finalized(kvm_tdx))) { + err =3D tdh_mem_page_aug(kvm_tdx->tdr_pa, gpa, hpa, &out); + if (err =3D=3D TDX_ERROR_SEPT_BUSY) { + tdx_unpin(kvm, pfn); + return -EAGAIN; + } + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error(TDH_MEM_PAGE_AUG, err, &out); + tdx_unpin(kvm, pfn); + return -EIO; + } + return 0; + } + + /* TODO: tdh_mem_page_add() comes here for the initial memory. */ + + return 0; +} + +static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn) +{ + int tdx_level =3D pg_level_to_tdx_sept_level(level); + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + struct tdx_module_output out; + gpa_t gpa =3D gfn_to_gpa(gfn); + hpa_t hpa =3D pfn_to_hpa(pfn); + hpa_t hpa_with_hkid; + u64 err; + + /* TODO: handle large pages. */ + if (KVM_BUG_ON(level !=3D PG_LEVEL_4K, kvm)) + return -EINVAL; + + if (!is_hkid_assigned(kvm_tdx)) { + /* + * The HKID assigned to this TD was already freed and cache + * was already flushed. We don't have to flush again. + */ + err =3D tdx_reclaim_page(hpa, false, 0); + if (KVM_BUG_ON(err, kvm)) + return -EIO; + tdx_unpin(kvm, pfn); + return 0; + } + + do { + /* + * When zapping private page, write lock is held. So no race + * condition with other vcpu sept operation. Race only with + * TDH.VP.ENTER. + */ + err =3D tdh_mem_page_remove(kvm_tdx->tdr_pa, gpa, tdx_level, &out); + } while (err =3D=3D TDX_ERROR_SEPT_BUSY); + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error(TDH_MEM_PAGE_REMOVE, err, &out); + return -EIO; + } + + hpa_with_hkid =3D set_hkid_to_hpa(hpa, (u16)kvm_tdx->hkid); + do { + /* + * TDX_OPERAND_BUSY can happen on locking PAMT entry. Because + * this page was removed above, other thread shouldn't be + * repeatedly operating on this page. Just retry loop. + */ + err =3D tdh_phymem_page_wbinvd(hpa_with_hkid); + } while (err =3D=3D (TDX_OPERAND_BUSY | TDX_OPERAND_ID_RCX)); + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err, NULL); + return -EIO; + } + tdx_unpin(kvm, pfn); + return 0; +} + +static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, void *private_spt) +{ + int tdx_level =3D pg_level_to_tdx_sept_level(level); + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + gpa_t gpa =3D gfn_to_gpa(gfn); + hpa_t hpa =3D __pa(private_spt); + struct tdx_module_output out; + u64 err; + + err =3D tdh_mem_sept_add(kvm_tdx->tdr_pa, gpa, tdx_level, hpa, &out); + if (err =3D=3D TDX_ERROR_SEPT_BUSY) + return -EAGAIN; + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error(TDH_MEM_SEPT_ADD, err, &out); + return -EIO; + } + + return 0; +} + +static int tdx_sept_zap_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level) +{ + int tdx_level =3D pg_level_to_tdx_sept_level(level); + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + gpa_t gpa =3D gfn_to_gpa(gfn); + struct tdx_module_output out; + u64 err; + + /* For now large page isn't supported yet. */ + WARN_ON_ONCE(level !=3D PG_LEVEL_4K); + err =3D tdh_mem_range_block(kvm_tdx->tdr_pa, gpa, tdx_level, &out); + if (err =3D=3D TDX_ERROR_SEPT_BUSY) + return -EAGAIN; + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error(TDH_MEM_RANGE_BLOCK, err, &out); + return -EIO; + } + return 0; +} + +/* + * TLB shoot down procedure: + * There is a global epoch counter and each vcpu has a local epoch counter. + * - TDH.MEM.RANGE.BLOCK(TDR, level, range) on one vcpu + * This blocks the subsequent creation of TLB translation on that range.
+ * This corresponds to clearing the present bit (all RWX) in the EPT entry + * - TDH.MEM.TRACK(TDR): advances the epoch counter which is global. + * - IPI to remote vcpus + * - TDExit and re-entry with TDH.VP.ENTER on remote vcpus + * - On re-entry, TDX module compares the local epoch counter with the global + * epoch counter. If the local epoch counter is older than the global + * epoch counter, it updates the local epoch counter and flushes the TLB. + */ +static void tdx_track(struct kvm_tdx *kvm_tdx) +{ + u64 err; + + KVM_BUG_ON(!is_hkid_assigned(kvm_tdx), &kvm_tdx->kvm); + /* If TD isn't finalized, it's before any vcpu runs. */ + if (unlikely(!is_td_finalized(kvm_tdx))) + return; + + /* + * tdx_flush_tlb() waits for this function to issue TDH.MEM.TRACK() by + * the counter. The counter is used instead of bool because multiple + * TDH_MEM_TRACK() can be issued concurrently by multiple vcpus. + */ + atomic_inc(&kvm_tdx->tdh_mem_track); + /* + * KVM_REQ_TLB_FLUSH waits for the empty IPI handler, ack_flush(), with + * KVM_REQUEST_WAIT. + */ + kvm_make_all_cpus_request(&kvm_tdx->kvm, KVM_REQ_TLB_FLUSH); + + do { + /* + * kvm_flush_remote_tlbs() doesn't allow returning an error + * and retrying. + */ + err =3D tdh_mem_track(kvm_tdx->tdr_pa); + } while ((err & TDX_SEAMCALL_STATUS_MASK) =3D=3D TDX_OPERAND_BUSY); + + /* Release remote vcpus waiting for TDH.MEM.TRACK in tdx_flush_tlb(). */ + atomic_dec(&kvm_tdx->tdh_mem_track); + + if (KVM_BUG_ON(err, &kvm_tdx->kvm)) + pr_tdx_error(TDH_MEM_TRACK, err, NULL); + +} + +static int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, void *private_spt) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + + /* + * The HKID assigned to this TD was already freed and cache was + * already flushed. We don't have to flush again. + */ + if (!is_hkid_assigned(kvm_tdx)) + return tdx_reclaim_page(__pa(private_spt), false, 0); + + /* + * free_private_spt() is (obviously) called when a shadow page is being + * zapped. KVM doesn't (yet) zap private SPs while the TD is active. + * Note: This function is for private shadow pages, not for private + * guest pages. A private guest page can be zapped while the TD is + * active: shared <-> private conversion and slot move/deletion. + */ + KVM_BUG_ON(is_hkid_assigned(kvm_tdx), kvm); + return -EINVAL; +} + +int tdx_sept_tlb_remote_flush(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx; + + if (!is_td(kvm)) + return -EOPNOTSUPP; + + kvm_tdx =3D to_kvm_tdx(kvm); + if (is_hkid_assigned(kvm_tdx)) + tdx_track(kvm_tdx); + + return 0; +} + +static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn) +{ + /* + * TDX requires TLB tracking before dropping private page. Do + * it here, although it is also done later. + * If hkid isn't assigned, the guest is being destroyed and no vcpu + * runs further. TLB shootdown isn't needed. + * + * TODO: implement with_range version for optimization.
+ * kvm_flush_remote_tlbs_with_address(kvm, gfn, 1); + * =3D> tdx_sept_tlb_remote_flush_with_range(kvm, gfn, + * KVM_PAGES_PER_HPAGE(level)); + */ + if (is_hkid_assigned(to_kvm_tdx(kvm))) + kvm_flush_remote_tlbs(kvm); + + return tdx_sept_drop_private_spte(kvm, gfn, level, pfn); +} + int tdx_dev_ioctl(void __user *argp) { struct kvm_tdx_capabilities __user *user_caps; @@ -868,6 +1141,25 @@ static int tdx_td_init(struct kvm *kvm, struct kvm_td= x_cmd *cmd) return ret; } =20 +void tdx_flush_tlb(struct kvm_vcpu *vcpu) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(vcpu->kvm); + struct kvm_mmu *mmu =3D vcpu->arch.mmu; + u64 root_hpa =3D mmu->root.hpa; + + /* Flush the shared EPTP, if it's valid. */ + if (VALID_PAGE(root_hpa)) + ept_sync_context(construct_eptp(vcpu, root_hpa, + mmu->root_role.level)); + + /* + * See tdx_track(). Wait for the tlb shootdown initiator to finish + * TDH_MEM_TRACK() so that the TLB is flushed on the next TDENTER. + */ + while (atomic_read(&kvm_tdx->tdh_mem_track)) + cpu_relax(); +} + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { struct kvm_tdx_cmd tdx_cmd; @@ -1072,8 +1364,16 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x8= 6_ops) if (!r) r =3D tdx_module_setup(); vmxoff_all(); + if (r) + return r; =20 - return r; + x86_ops->link_private_spt =3D tdx_sept_link_private_spt; + x86_ops->free_private_spt =3D tdx_sept_free_private_spt; + x86_ops->set_private_spte =3D tdx_sept_set_private_spte; + x86_ops->remove_private_spte =3D tdx_sept_remove_private_spte; + x86_ops->zap_private_spte =3D tdx_sept_zap_private_spte; + + return 0; } =20 void tdx_hardware_unsetup(void) diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 7c8f5880d104..7acd708bffa8 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -18,6 +18,7 @@ struct kvm_tdx { int hkid; =20 bool finalized; + atomic_t tdh_mem_track; =20 u64 tsc_offset; }; @@ -165,6 +166,12 @@ static __always_inline u64 td_tdcs_exec_read64(struct = kvm_tdx *kvm_tdx, u32 fiel return out.r8; } =20 +static __always_inline int pg_level_to_tdx_sept_level(enum pg_level level) +{ + WARN_ON_ONCE(level =3D=3D PG_LEVEL_NONE); + return level - 1; +} + #else struct kvm_tdx { struct kvm kvm; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index d730c63185a9..2d1d53d14843 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -157,6 +157,8 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_ev= ent); =20 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); =20 +void tdx_flush_tlb(struct kvm_vcpu *vcpu); +int tdx_sept_tlb_remote_flush(struct kvm *kvm); void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_leve= l); #else static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return= -ENOSYS; } @@ -179,6 +181,8 @@ static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu= , bool init_event) {} =20 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)= { return -EOPNOTSUPP; } =20 +static inline void tdx_flush_tlb(struct kvm_vcpu *vcpu) {} +static inline int tdx_sept_tlb_remote_flush(struct kvm *kvm) { return 0; } static inline void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,= int root_level) {} #endif =20 --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 51525C64ED8 for ; Mon, 27 Feb
2023 08:27:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230495AbjB0I1U (ORCPT ); Mon, 27 Feb 2023 03:27:20 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57014 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231328AbjB0I03 (ORCPT ); Mon, 27 Feb 2023 03:26:29 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 044591DBBC; Mon, 27 Feb 2023 00:24:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486289; x=1709022289; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ZIquss/UCHV9FIcBuzeg5rtBZs8Xjy0qgzbdb+X3RSQ=; b=Wuzmq1vVqSo4svw58hjFV0c6f/36jxd8mrbAl/3Seyv6Ctmgy5Qp9HuL Jz10Un5Gh5pv+gswopko/RhuDhQ93QQ73xiEebVdPBmWI4q76t880JxLt Dos7gjHdWbvD8VlTaCsbYrSYoWV55sSyQJ5DnbIJwEXmH54gpYDVW4Xqv jIozaxUyZ42OlGRd/cPi8YWbHLEJLRmqbaI5ESYUUx+wvrb3SImQ8q2TD Ea/ed1IMxUym7bImVVsoDsl+gNxpwWNQeaL+/Y6WzmVDXgouVYEcOMTuf fWJqSyYUcJIZnoGgvrfvUHcMNsHsah+4l0pzi9fFpbURY9FcxPSMoJ5rh Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608872" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608872" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:11 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242226" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242226" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:10 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 050/106] KVM: TDX: MTRR: implement get_mt_mask() for TDX Date: Mon, 27 Feb 2023 00:22:49 -0800 Message-Id: <93ef8e57cb80e8e0268c91758968a1950de4b5f0.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Because TDX virtualizes cpuid[0x1].EDX[MTRR: bit 12] as fixed 1, the guest TD thinks MTRR is supported. Although TDX supports only WB for private GPAs, it's desirable to support MTRR for shared GPAs. As guest accesses to the MTRR MSRs cause #VE and KVM/x86 tracks the values of the MTRR MSRs, the remaining part is to implement the get_mt_mask method for TDX for shared GPAs. Pass the shared bit around from the KVM fault handler to the get_mt_mask method so that it can determine whether the gfn is shared or private. Implement get_mt_mask() following the VMX case for shared GPAs and return WB for private GPAs. The existing vmx_get_mt_mask() can't be used directly because the CPU state (CR0.CD) is protected. The GFN passed to kvm_mtrr_check_gfn_range_consistency() should include the shared bit.
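For illustration (example_effective_memtype() is hypothetical and not part of this patch), the policy being implemented can be summarized as:

static u8 example_effective_memtype(struct kvm_vcpu *vcpu, gfn_t gfn,
				    bool is_mmio)
{
	gfn_t shared_mask = kvm_gfn_shared_mask(vcpu->kvm);

	/* Private GPA: TDX enforces WB regardless of the guest MTRR state. */
	if (!(gfn & shared_mask))
		return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;

	/*
	 * Shared GPA: follow the MTRR values KVM tracks, but skip the
	 * CR0.CD check because TDX forces CR0.CD to 0 and the real CPU
	 * state is protected.
	 */
	return __vmx_get_mt_mask(vcpu, gfn & ~shared_mask, is_mmio, false);
}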
Suggested-by: Kai Huang Signed-off-by: Isaku Yamahata --- Changes from v11 to V12 - Make common function for VMX and TDX - pass around shared bit from KVM fault handler to get_mt_mask method - updated commit message --- arch/x86/kvm/mmu/mmu.c | 2 +- arch/x86/kvm/mmu/spte.c | 5 +++-- arch/x86/kvm/mmu/spte.h | 2 +- arch/x86/kvm/vmx/common.h | 2 ++ arch/x86/kvm/vmx/main.c | 11 ++++++++++- arch/x86/kvm/vmx/tdx.c | 17 +++++++++++++++++ arch/x86/kvm/vmx/vmx.c | 5 +++-- arch/x86/kvm/vmx/x86_ops.h | 2 ++ 8 files changed, 39 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 6421f92e618e..0c852517c0e7 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4555,7 +4555,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct = kvm_page_fault *fault) if (shadow_memtype_mask && kvm_arch_has_noncoherent_dma(vcpu->kvm)) { for ( ; fault->max_level > PG_LEVEL_4K; --fault->max_level) { int page_num =3D KVM_PAGES_PER_HPAGE(fault->max_level); - gfn_t base =3D fault->gfn & ~(page_num - 1); + gfn_t base =3D (fault->addr >> PAGE_SHIFT) & ~(page_num - 1); =20 if (kvm_mtrr_check_gfn_range_consistency(vcpu, base, page_num)) break; diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 48e17588a127..9c874bca69f6 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -137,13 +137,14 @@ bool spte_has_volatile_bits(u64 spte) =20 bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, const struct kvm_memory_slot *slot, - unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn, + unsigned int pte_access, gfn_t gfn_including_shared, kvm_pfn_t pfn, u64 old_spte, bool prefetch, bool can_unsync, bool host_writable, u64 *new_spte) { int level =3D sp->role.level; u64 spte =3D SPTE_MMU_PRESENT_MASK; bool wrprot =3D false; + gfn_t gfn =3D gfn_including_shared & ~kvm_gfn_shared_mask(vcpu->kvm); =20 WARN_ON_ONCE(!pte_access && !shadow_present_mask); =20 @@ -191,7 +192,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_pa= ge *sp, spte |=3D PT_PAGE_SIZE_MASK; =20 if (shadow_memtype_mask) - spte |=3D static_call(kvm_x86_get_mt_mask)(vcpu, gfn, + spte |=3D static_call(kvm_x86_get_mt_mask)(vcpu, gfn_including_shared, kvm_is_mmio_pfn(pfn)); if (host_writable) spte |=3D shadow_host_writable_mask; diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 7046671b08cb..067ea1ae3a13 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -481,7 +481,7 @@ bool spte_has_volatile_bits(u64 spte); =20 bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, const struct kvm_memory_slot *slot, - unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn, + unsigned int pte_access, gfn_t gfn_including_shared, kvm_pfn_t pfn, u64 old_spte, bool prefetch, bool can_unsync, bool host_writable, u64 *new_spte); u64 make_huge_page_split_spte(struct kvm *kvm, u64 huge_spte, diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h index 235908f3e044..422b24af7fc1 100644 --- a/arch/x86/kvm/vmx/common.h +++ b/arch/x86/kvm/vmx/common.h @@ -6,6 +6,8 @@ =20 #include "mmu.h" =20 +u8 __vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio, bool = check_cr0_cd); + static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t = gpa, unsigned long exit_qualification) { diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index a2ca09f10b6e..f42617c7aeaf 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -3,6 +3,7 @@ =20 #include "x86_ops.h" #include "mmu.h" +#include "common.h" #include 
"vmx.h" #include "nested.h" #include "mmu.h" @@ -206,6 +207,14 @@ static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa= _t root_hpa, vmx_load_mmu_pgd(vcpu, root_hpa, pgd_level); } =20 +static u8 vt_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) +{ + if (is_td_vcpu(vcpu)) + return tdx_get_mt_mask(vcpu, gfn, is_mmio); + + return __vmx_get_mt_mask(vcpu, gfn, is_mmio, true); +} + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { if (!is_td(kvm)) @@ -326,7 +335,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { =20 .set_tss_addr =3D vmx_set_tss_addr, .set_identity_map_addr =3D vmx_set_identity_map_addr, - .get_mt_mask =3D vmx_get_mt_mask, + .get_mt_mask =3D vt_get_mt_mask, =20 .get_exit_info =3D vmx_get_exit_info, =20 diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 8f191177bfe9..f532f5c352f3 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -5,6 +5,7 @@ =20 #include "capabilities.h" #include "x86_ops.h" +#include "common.h" #include "tdx.h" #include "vmx.h" #include "x86.h" @@ -345,6 +346,22 @@ int tdx_vm_init(struct kvm *kvm) return 0; } =20 +u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) +{ + /* TDX private GPA is always WB. */ + if (kvm_gfn_private(vcpu->kvm, gfn)) { + /* MMIO is only for shared GPA. */ + WARN_ON_ONCE(is_mmio); + return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT; + } + + /* Drop shared bit as MTRR doesn't know about shared bit. */ + gfn =3D kvm_gfn_private(vcpu->kvm, gfn); + + /* As TDX enforces CR0.CD to 0, pass check_cr0_cd =3D false. */ + return __vmx_get_mt_mask(vcpu, gfn, is_mmio, false); +} + int tdx_vcpu_create(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *e; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 3ff3b33fe9af..72da86abf989 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7552,7 +7552,8 @@ int vmx_vm_init(struct kvm *kvm) return 0; } =20 -u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) +u8 __vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio, + bool check_cr0_cd) { u8 cache; =20 @@ -7580,7 +7581,7 @@ u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, = bool is_mmio) if (!kvm_arch_has_noncoherent_dma(vcpu->kvm)) return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT; =20 - if (kvm_read_cr0(vcpu) & X86_CR0_CD) { + if (check_cr0_cd && kvm_read_cr0(vcpu) & X86_CR0_CD) { if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED)) cache =3D MTRR_TYPE_WRBACK; else diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 2d1d53d14843..69f66e857ce5 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -154,6 +154,7 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); int tdx_vcpu_create(struct kvm_vcpu *vcpu); void tdx_vcpu_free(struct kvm_vcpu *vcpu); void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event); +u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); =20 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); =20 @@ -178,6 +179,7 @@ static inline int tdx_vm_ioctl(struct kvm *kvm, void __= user *argp) { return -EOP static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTS= UPP; } static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {} static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) = {} +static inline u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is= _mmio) { return 0; } =20 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)= { 
return -EOPNOTSUPP; } =20 --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1E00EC64ED8 for ; Mon, 27 Feb 2023 08:27:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231499AbjB0I1Z (ORCPT ); Mon, 27 Feb 2023 03:27:25 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55120 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231342AbjB0I0b (ORCPT ); Mon, 27 Feb 2023 03:26:31 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EE5341E298; Mon, 27 Feb 2023 00:24:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486291; x=1709022291; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=fhfduvAtIn2GOQVZ8YLJzAnqeWsCauqfTbMobAUWChw=; b=HLK2m6vbd6OqYvLEIB/teNuf8uGBWkwgNf55vKwQzPI2wY4SDLq2nKjX FjJiZq3iCMEhB6uQHZiegFgq8bwgvSJBXpgOyKpROSQWZnCwFg0MrfS6y k9ZtNylTAdAudO3qOIrQf5kqKpalKk9UFyQuyIPl3Ajbhde30SmMvD/qm sQRnXRkpcIM9/1EA0SNbR4R+CTa74Of6bSGKTcx16WZDpastq6EWXEy6K hyVXz+oS/Mgpqi49ibGh8Ij610IpcE4ZcpECE00FeGfz8KGwH+DUtdspF CAl3GK4FwGBjZuUGykoTYECpEoD5vzksgbZRtzM5iToMul1dSq4y7JrxS w==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608876" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608876" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:11 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242229" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242229" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:10 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 051/106] [MARKER] The start of TDX KVM patch series: TD finalization Date: Mon, 27 Feb 2023 00:22:50 -0800 Message-Id: <081047431e463725f4bd75169a63ab673c5becb3.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This empty commit is to mark the start of patch series of TD finalization. Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/intel-tdx-layer-status.rst | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentat= ion/virt/kvm/intel-tdx-layer-status.rst index 9b3ab0363184..c081217a0036 100644 --- a/Documentation/virt/kvm/intel-tdx-layer-status.rst +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -11,6 +11,7 @@ What qemu can do - TDX VM TYPE is exposed to Qemu. - Qemu can create/destroy guest of TDX vm type. - Qemu can create/destroy vcpu of TDX vm type. +- Qemu can populate initial guest memory image. 
=20 Patch Layer status ------------------ @@ -19,8 +20,8 @@ Patch Layer status * TDX architectural definitions: Applied * TD VM creation/destruction: Applied * TD vcpu creation/destruction: Applied -* TDX EPT violation: Applying -* TD finalization: Not yet +* TDX EPT violation: Applied +* TD finalization: Applying * TD vcpu enter/exit: Not yet * TD vcpu interrupts/exit/hypercall: Not yet =20 --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 27515C64ED6 for ; Mon, 27 Feb 2023 08:27:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231512AbjB0I1m (ORCPT ); Mon, 27 Feb 2023 03:27:42 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57132 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231350AbjB0I0d (ORCPT ); Mon, 27 Feb 2023 03:26:33 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DEF511BAE7; Mon, 27 Feb 2023 00:24:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486296; x=1709022296; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=tfBK5JpdMF/MY0Wdo/DRpyz4bbNvCN8GnTwi5j1Zf+A=; b=drD42yP+Rm929WarRyEW5ZIPxHGc9xg8ycHtfbkrOnrwU3MhsTkgiGON d7qvU+MSY54/5jDSqCQbRizjahw80j/bml9hlAwy3J1bCEt8mNV2btxsF jAEg/dD6IlgJVg8BC1W2cy+hDAHVKAPfIfaY534B/+rfXxu+9pbZM/Dxt diW98sKBgvHvzVjbZPzdQEDBLZwlPuMdtSTlrnBaT3XwpSz2sn3TpjxBI 9vZJLlgbTGRHZMiZXrA/3Jm9acdZR7rSuXb0Jw8VVl7aEqA9xbxesQSQZ SYLitUa1jOE5GPOZMrrc8hePqiQuUCgevzJ4ILBxNqTzRrc38rnR/weJk A==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608879" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608879" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:11 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242233" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242233" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:10 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang , Sean Christopherson Subject: [PATCH v12 052/106] KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX Date: Mon, 27 Feb 2023 00:22:51 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Introduce a helper to directly (pun intended) fault-in a TDP page without having to go through the full page fault path. This allows TDX to get the resulting pfn and also allows the RET_PF_* enums to stay in mmu.c where they belong. 
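As a usage sketch (example_prefault_gpa() is a hypothetical caller, not part of this patch), a TDX-side user could pre-fault a GPA and retrieve the backing pfn like this:

static kvm_pfn_t example_prefault_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
{
	/* Error-code bits mirror what the EPT-violation path would set. */
	u32 error_code = PFERR_WRITE_MASK | PFERR_USER_MASK;
	kvm_pfn_t pfn = kvm_mmu_map_tdp_page(vcpu, gpa, error_code,
					     PG_LEVEL_4K);

	if (is_error_noslot_pfn(pfn))
		return KVM_PFN_ERR_FAULT;
	return pfn;
}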
Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu.h | 3 +++ arch/x86/kvm/mmu/mmu.c | 49 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 52 insertions(+) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 0234201d5e63..6944f78c4401 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -154,6 +154,9 @@ static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vc= pu) vcpu->arch.mmu->root_role.level); } =20 +kvm_pfn_t kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa, + u32 error_code, int max_level); + /* * Check if a given access (described through the I/D, W/R and U/S bits of= a * page fault error code pfec) causes a permission fault with the given PTE diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 0c852517c0e7..6fef584c92c3 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4570,6 +4570,55 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct= kvm_page_fault *fault) return direct_page_fault(vcpu, fault); } =20 +kvm_pfn_t kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa, + u32 error_code, int max_level) +{ + int r; + struct kvm_page_fault fault =3D (struct kvm_page_fault) { + .addr =3D gpa, + .error_code =3D error_code, + .exec =3D error_code & PFERR_FETCH_MASK, + .write =3D error_code & PFERR_WRITE_MASK, + .present =3D error_code & PFERR_PRESENT_MASK, + .rsvd =3D error_code & PFERR_RSVD_MASK, + .user =3D error_code & PFERR_USER_MASK, + .prefetch =3D false, + .is_tdp =3D true, + .nx_huge_page_workaround_enabled =3D is_nx_huge_page_enabled(vcpu->kvm), + .is_private =3D kvm_is_private_gpa(vcpu->kvm, gpa), + }; + + WARN_ON_ONCE(!vcpu->arch.mmu->root_role.direct); + fault.gfn =3D gpa_to_gfn(fault.addr) & ~kvm_gfn_shared_mask(vcpu->kvm); + fault.slot =3D kvm_vcpu_gfn_to_memslot(vcpu, fault.gfn); + + if (mmu_topup_memory_caches(vcpu, false)) + return KVM_PFN_ERR_FAULT; + + /* + * Loop on the page fault path to handle the case where an mmu_notifier + * invalidation triggers RET_PF_RETRY. In the normal page fault path, + * KVM needs to resume the guest in case the invalidation changed any + * of the page fault properties, i.e. the gpa or error code. For this + * path, the gpa and error code are fixed by the caller, and the caller + * expects failure if and only if the page fault can't be fixed. 
+ */ + do { + fault.max_level =3D max_level; + fault.req_level =3D PG_LEVEL_4K; + fault.goal_level =3D PG_LEVEL_4K; + +#ifdef CONFIG_X86_64 + if (tdp_mmu_enabled) + r =3D kvm_tdp_mmu_page_fault(vcpu, &fault); + else +#endif + r =3D direct_page_fault(vcpu, &fault); + } while (r =3D=3D RET_PF_RETRY && !is_error_noslot_pfn(fault.pfn)); + return fault.pfn; +} +EXPORT_SYMBOL_GPL(kvm_mmu_map_tdp_page); + static void nonpaging_init_context(struct kvm_mmu *context) { context->page_fault =3D nonpaging_page_fault; --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6E63EC64ED6 for ; Mon, 27 Feb 2023 08:27:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231552AbjB0I14 (ORCPT ); Mon, 27 Feb 2023 03:27:56 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56384 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231395AbjB0I0y (ORCPT ); Mon, 27 Feb 2023 03:26:54 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E66E81E5C1; Mon, 27 Feb 2023 00:25:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486301; x=1709022301; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=GIYJoz3gOJ0fv60dErCRVxeDGvP4DZYUFM2kK0Hz8dQ=; b=B5QeTkhZk0xTNM3Mw1bG7v1R1Ze3prGoMs/8tY5xURuGgFHV6t6pt650 uUYh/HplwH2FEhlO1sgrSXwzVhCsRgwoHBRgQEAVfatE2YWcKJlsfm4Ka j25mzHwKVAj4YaWjRm7rWZkgC0l3QYxFir+nnPKUEwyYCXlhs6Qo8NzIX 7Ogv1x8/MJJC2G4zG7QV8HolthwkpEqUCdKdSy5SSyEfeEZ2+A1xhaKWS J5t5Bd55O7aCXhTrZQoqzY6XHb6lvrKY5fN3lSUJD6tqCM8C8w+KSfQXQ 7XlQZb2B8M6Y189pHa4H3shFH56z3jSiznCucZUBDktGK2nksPlv59e3E A==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608884" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608884" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:12 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242237" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242237" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:10 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 053/106] KVM: TDX: Create initial guest memory Date: Mon, 27 Feb 2023 00:22:52 -0800 Message-Id: <8bbd6c6cb0bb04de920a2c3f59f8aceb91b07c2c.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Because the guest memory is protected in TDX, the creation of the initial guest memory requires a dedicated TDX module API, tdh_mem_page_add, instead of directly copying the memory contents into the guest memory in the case of the default VM type. 
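The contrast can be sketched as follows (pseudocode: tdh_mem_page_add()
follows the SEAMCALL wrapper naming used in this series, the surrounding
variables are illustrative):

  /* Default VM: the initial image is simply copied into guest memory. */
  memcpy(guest_hva, source_hva, PAGE_SIZE);

  /* TDX: guest memory is private, so the TDX module must copy and
   * encrypt the source page, optionally extending the TD measurement
   * as it does so. */
  err = tdh_mem_page_add(kvm_tdx->tdr_pa, gpa, hpa, source_pa, &out);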
KVM MMU page fault handler callback, private_page_add, handles it. Define new subcommand, KVM_TDX_INIT_MEM_REGION, of VM-scoped KVM_MEMORY_ENCRYPT_OP. It assigns the guest page, copies the initial memory contents into the guest memory, encrypts the guest memory. At the same time, optionally it extends memory measurement of the TDX guest. It calls the KVM MMU page fault(EPT-violation) handler to trigger the callbacks for it. Signed-off-by: Isaku Yamahata --- arch/x86/include/uapi/asm/kvm.h | 9 ++ arch/x86/kvm/mmu/mmu.c | 1 + arch/x86/kvm/vmx/tdx.c | 156 +++++++++++++++++++++++++- arch/x86/kvm/vmx/tdx.h | 2 + tools/arch/x86/include/uapi/asm/kvm.h | 9 ++ 5 files changed, 172 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kv= m.h index 212df13e4ab5..096ff9465fc0 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -537,6 +537,7 @@ enum kvm_tdx_cmd_id { KVM_TDX_CAPABILITIES =3D 0, KVM_TDX_INIT_VM, KVM_TDX_INIT_VCPU, + KVM_TDX_INIT_MEM_REGION, =20 KVM_TDX_CMD_NR_MAX, }; @@ -605,4 +606,12 @@ struct kvm_tdx_init_vm { struct kvm_cpuid2 cpuid; }; =20 +#define KVM_TDX_MEASURE_MEMORY_REGION (1UL << 0) + +struct kvm_tdx_init_mem_region { + __u64 source_addr; + __u64 gpa; + __u64 nr_pages; +}; + #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 6fef584c92c3..7a7ba4e4574c 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5587,6 +5587,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu) out: return r; } +EXPORT_SYMBOL(kvm_mmu_load); =20 void kvm_mmu_unload(struct kvm_vcpu *vcpu) { diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index f532f5c352f3..e503590f5e59 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -469,6 +469,21 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t roo= t_hpa, int pgd_level) td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa & PAGE_MASK); } =20 +static void tdx_measure_page(struct kvm_tdx *kvm_tdx, hpa_t gpa) +{ + struct tdx_module_output out; + u64 err; + int i; + + for (i =3D 0; i < PAGE_SIZE; i +=3D TDX_EXTENDMR_CHUNKSIZE) { + err =3D tdh_mr_extend(kvm_tdx->tdr_pa, gpa + i, &out); + if (KVM_BUG_ON(err, &kvm_tdx->kvm)) { + pr_tdx_error(TDH_MR_EXTEND, err, &out); + break; + } + } +} + static void tdx_unpin(struct kvm *kvm, kvm_pfn_t pfn) { struct page *page =3D pfn_to_page(pfn); @@ -483,12 +498,10 @@ static int tdx_sept_set_private_spte(struct kvm *kvm,= gfn_t gfn, hpa_t hpa =3D pfn_to_hpa(pfn); gpa_t gpa =3D gfn_to_gpa(gfn); struct tdx_module_output out; + hpa_t source_pa; + bool measure; u64 err; =20 - /* TODO: handle large pages. */ - if (KVM_BUG_ON(level !=3D PG_LEVEL_4K, kvm)) - return -EINVAL; - /* * Because restricted mem doesn't support page migration with * a_ops->migrate_page (yet), no callback isn't triggered for KVM on @@ -499,7 +512,12 @@ static int tdx_sept_set_private_spte(struct kvm *kvm, = gfn_t gfn, */ get_page(pfn_to_page(pfn)); =20 + /* Build-time faults are induced and handled via TDH_MEM_PAGE_ADD. */ if (likely(is_td_finalized(kvm_tdx))) { + /* TODO: handle large pages. */ + if (KVM_BUG_ON(level !=3D PG_LEVEL_4K, kvm)) + return -EINVAL; + err =3D tdh_mem_page_aug(kvm_tdx->tdr_pa, gpa, hpa, &out); if (err =3D=3D TDX_ERROR_SEPT_BUSY) { tdx_unpin(kvm, pfn); @@ -513,7 +531,45 @@ static int tdx_sept_set_private_spte(struct kvm *kvm, = gfn_t gfn, return 0; } =20 - /* TODO: tdh_mem_page_add() comes here for the initial memory. 
*/ + /* + * KVM_INIT_MEM_REGION, tdx_init_mem_region(), supports only 4K page + * because tdh_mem_page_add() supports only 4K page. + */ + if (KVM_BUG_ON(level !=3D PG_LEVEL_4K, kvm)) + return -EINVAL; + + /* + * In case of TDP MMU, fault handler can run concurrently. Note + * 'source_pa' is a TD scope variable, meaning if there are multiple + * threads reaching here with all needing to access 'source_pa', it + * will break. However fortunately this won't happen, because below + * TDH_MEM_PAGE_ADD code path is only used when VM is being created + * before it is running, using KVM_TDX_INIT_MEM_REGION ioctl (which + * always uses vcpu 0's page table and protected by vcpu->mutex). + */ + if (KVM_BUG_ON(kvm_tdx->source_pa =3D=3D INVALID_PAGE, kvm)) { + tdx_unpin(kvm, pfn); + return -EINVAL; + } + + source_pa =3D kvm_tdx->source_pa & ~KVM_TDX_MEASURE_MEMORY_REGION; + measure =3D kvm_tdx->source_pa & KVM_TDX_MEASURE_MEMORY_REGION; + kvm_tdx->source_pa =3D INVALID_PAGE; + + do { + err =3D tdh_mem_page_add(kvm_tdx->tdr_pa, gpa, hpa, source_pa, + &out); + /* + * This path is executed during populating initial guest memory + * image. i.e. before running any vcpu. Race is rare. + */ + } while (err =3D=3D TDX_ERROR_SEPT_BUSY); + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error(TDH_MEM_PAGE_ADD, err, &out); + tdx_unpin(kvm, pfn); + return -EIO; + } else if (measure) + tdx_measure_page(kvm_tdx, gpa); =20 return 0; } @@ -1177,6 +1233,93 @@ void tdx_flush_tlb(struct kvm_vcpu *vcpu) cpu_relax(); } =20 +#define TDX_SEPT_PFERR PFERR_WRITE_MASK + +static int tdx_init_mem_region(struct kvm *kvm, struct kvm_tdx_cmd *cmd) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + struct kvm_tdx_init_mem_region region; + struct kvm_vcpu *vcpu; + struct page *page; + kvm_pfn_t pfn; + int idx, ret =3D 0; + + /* The BSP vCPU must be created before initializing memory regions. */ + if (!atomic_read(&kvm->online_vcpus)) + return -EINVAL; + + if (cmd->flags & ~KVM_TDX_MEASURE_MEMORY_REGION) + return -EINVAL; + + if (copy_from_user(®ion, (void __user *)cmd->data, sizeof(region))) + return -EFAULT; + + /* Sanity check */ + if (!IS_ALIGNED(region.source_addr, PAGE_SIZE) || + !IS_ALIGNED(region.gpa, PAGE_SIZE) || + !region.nr_pages || + region.gpa + (region.nr_pages << PAGE_SHIFT) <=3D region.gpa || + !kvm_is_private_gpa(kvm, region.gpa) || + !kvm_is_private_gpa(kvm, region.gpa + (region.nr_pages << PAGE_SHIFT)= )) + return -EINVAL; + + vcpu =3D kvm_get_vcpu(kvm, 0); + if (mutex_lock_killable(&vcpu->mutex)) + return -EINTR; + + vcpu_load(vcpu); + idx =3D srcu_read_lock(&kvm->srcu); + + kvm_mmu_reload(vcpu); + + while (region.nr_pages) { + if (signal_pending(current)) { + ret =3D -ERESTARTSYS; + break; + } + + if (need_resched()) + cond_resched(); + + /* Pin the source page. 
*/ + ret =3D get_user_pages_fast(region.source_addr, 1, 0, &page); + if (ret < 0) + break; + if (ret !=3D 1) { + ret =3D -ENOMEM; + break; + } + + kvm_tdx->source_pa =3D pfn_to_hpa(page_to_pfn(page)) | + (cmd->flags & KVM_TDX_MEASURE_MEMORY_REGION); + + pfn =3D kvm_mmu_map_tdp_page(vcpu, region.gpa, TDX_SEPT_PFERR, + PG_LEVEL_4K); + if (is_error_noslot_pfn(pfn) || kvm->vm_bugged) + ret =3D -EFAULT; + else + ret =3D 0; + + put_page(page); + if (ret) + break; + + region.source_addr +=3D PAGE_SIZE; + region.gpa +=3D PAGE_SIZE; + region.nr_pages--; + } + + srcu_read_unlock(&kvm->srcu, idx); + vcpu_put(vcpu); + + mutex_unlock(&vcpu->mutex); + + if (copy_to_user((void __user *)cmd->data, ®ion, sizeof(region))) + ret =3D -EFAULT; + + return ret; +} + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { struct kvm_tdx_cmd tdx_cmd; @@ -1193,6 +1336,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) case KVM_TDX_INIT_VM: r =3D tdx_td_init(kvm, &tdx_cmd); break; + case KVM_TDX_INIT_MEM_REGION: + r =3D tdx_init_mem_region(kvm, &tdx_cmd); + break; default: r =3D -EINVAL; goto out; diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 7acd708bffa8..9d8445324841 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -17,6 +17,8 @@ struct kvm_tdx { u64 xfam; int hkid; =20 + hpa_t source_pa; + bool finalized; atomic_t tdh_mem_track; =20 diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include= /uapi/asm/kvm.h index 4bde72881dc1..ac0bef5e497d 100644 --- a/tools/arch/x86/include/uapi/asm/kvm.h +++ b/tools/arch/x86/include/uapi/asm/kvm.h @@ -537,6 +537,7 @@ enum kvm_tdx_cmd_id { KVM_TDX_CAPABILITIES =3D 0, KVM_TDX_INIT_VM, KVM_TDX_INIT_VCPU, + KVM_TDX_INIT_MEM_REGION, =20 KVM_TDX_CMD_NR_MAX, }; @@ -614,4 +615,12 @@ struct kvm_tdx_init_vm { }; }; =20 +#define KVM_TDX_MEASURE_MEMORY_REGION (1UL << 0) + +struct kvm_tdx_init_mem_region { + __u64 source_addr; + __u64 gpa; + __u64 nr_pages; +}; + #endif /* _ASM_X86_KVM_H */ --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6A3F2C64ED6 for ; Mon, 27 Feb 2023 08:27:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231528AbjB0I1s (ORCPT ); Mon, 27 Feb 2023 03:27:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57828 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231379AbjB0I0w (ORCPT ); Mon, 27 Feb 2023 03:26:52 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3AABF1ACE1; Mon, 27 Feb 2023 00:25:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486302; x=1709022302; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Detz0K5v4lG/FqhE8VT6sVgoLB3EmKFYg75QVE/fo4Y=; b=OLhzZJlqdfd+LHGiIZNdLfLOW7OdV8mHeWqCxghukuv5GYBFYGxXWNh/ HwVnmSdpCEJqvgvDsjKMZsyKJZclSAxvsruKG/QQEfNHDkeWN38M1PEdP DR1XYx1tSxOf0VUe+c96HmyiDJfHjpXufuLa9g6DOm1eucJmRc8ScETkk /H0P32gmk0G13toN0ZSFAYC5BXwEZ5oH3m7h0zM4fXzGD3NY3MU9d7UQA cxTb3LqBp1PgjXXUaWtVAi4MvIj49i9gSQQdr+hvk91UKx6+kocFwuWOH JqzRXQz5ilzIP1OXjubZ9SUahyPsq87vI0Ux+uuDsICN+JTxC3bDIq0XQ w==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608888" X-IronPort-AV: 
E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608888" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:12 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242243" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242243" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:11 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 054/106] KVM: TDX: Finalize VM initialization Date: Mon, 27 Feb 2023 00:22:53 -0800 Message-Id: <06783dfca318ba31448f9f99b02f73b174404bb4.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata To protect the initial contents of the guest TD, the TDX module measures the guest TD during the build process as SHA-384 measurement. The measurement of the guest TD contents needs to be completed to make the guest TD ready to run. Add a new subcommand, KVM_TDX_FINALIZE_VM, for VM-scoped KVM_MEMORY_ENCRYPT_OP to finalize the measurement and mark the TDX VM ready to run. Signed-off-by: Isaku Yamahata --- arch/x86/include/uapi/asm/kvm.h | 1 + arch/x86/kvm/vmx/tdx.c | 31 +++++++++++++++++++++++++++ tools/arch/x86/include/uapi/asm/kvm.h | 1 + 3 files changed, 33 insertions(+) diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kv= m.h index 096ff9465fc0..082e808c9260 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -538,6 +538,7 @@ enum kvm_tdx_cmd_id { KVM_TDX_INIT_VM, KVM_TDX_INIT_VCPU, KVM_TDX_INIT_MEM_REGION, + KVM_TDX_FINALIZE_VM, =20 KVM_TDX_CMD_NR_MAX, }; diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index e503590f5e59..8e6f4122e99e 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1320,6 +1320,34 @@ static int tdx_init_mem_region(struct kvm *kvm, stru= ct kvm_tdx_cmd *cmd) return ret; } =20 +static int tdx_td_finalizemr(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + u64 err; + + if (!is_hkid_assigned(kvm_tdx) || is_td_finalized(kvm_tdx)) + return -EINVAL; + + err =3D tdh_mr_finalize(kvm_tdx->tdr_pa); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MR_FINALIZE, err, NULL); + return -EIO; + } + + /* + * Blindly do TDH_MEM_TRACK after finalizing the measurement to handle + * the case where SEPT entries were zapped/blocked, e.g. from failed + * NUMA balancing, after they were added to the TD via + * tdx_init_mem_region(). TDX module doesn't allow TDH_MEM_TRACK prior + * to TDH.MR.FINALIZE, and conversely requires TDH.MEM.TRACK for entries + * that were TDH.MEM.RANGE.BLOCK'd prior to TDH.MR.FINALIZE. 
+ */ + (void)tdh_mem_track(to_kvm_tdx(kvm)->tdr_pa); + + kvm_tdx->finalized =3D true; + return 0; +} + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { struct kvm_tdx_cmd tdx_cmd; @@ -1339,6 +1367,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) case KVM_TDX_INIT_MEM_REGION: r =3D tdx_init_mem_region(kvm, &tdx_cmd); break; + case KVM_TDX_FINALIZE_VM: + r =3D tdx_td_finalizemr(kvm); + break; default: r =3D -EINVAL; goto out; diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include= /uapi/asm/kvm.h index ac0bef5e497d..4cecea304922 100644 --- a/tools/arch/x86/include/uapi/asm/kvm.h +++ b/tools/arch/x86/include/uapi/asm/kvm.h @@ -538,6 +538,7 @@ enum kvm_tdx_cmd_id { KVM_TDX_INIT_VM, KVM_TDX_INIT_VCPU, KVM_TDX_INIT_MEM_REGION, + KVM_TDX_FINALIZE_VM, =20 KVM_TDX_CMD_NR_MAX, }; --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 26E3DC64ED6 for ; Mon, 27 Feb 2023 08:27:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231542AbjB0I1w (ORCPT ); Mon, 27 Feb 2023 03:27:52 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59242 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230214AbjB0I0y (ORCPT ); Mon, 27 Feb 2023 03:26:54 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A6BD41E5D9; Mon, 27 Feb 2023 00:25:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486302; x=1709022302; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=q6CZUCqvV52bUpbvBQe02hCjee74Kgx17UhYM581eRM=; b=b99uReLJFR3HnwOXb3D+mMgUlGJQOgh5lOaRLg5ECc8/M1OB/JLPNNsx skC9M7UcarcaT80bx3MXxQ8SAFK/zXHMpc1FtMb5gBNUITR6UxHI3tPDW Z20uCYzsQyXLBzqsgQpT19O2Fh+kpuGBdCUln44Zvvn84r63QhEReB77A pDr1wjM3Fo7vdCHaxbrI6xHyDWD9o2aBg9MMz3PqSUzg779gO1kw5aeA/ TMXCRWCDKjqn7LOPhOTjnHW5Ie8omxwbvTOpfq6qtfTQSHGyMZREawLCB 3/PQn82yC0KTpT9BYoieiwAIYCKAJnFHLprmfffNn3hm427TCQdsWbXmJ A==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608889" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608889" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:12 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242248" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242248" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:11 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 055/106] [MARKER] The start of TDX KVM patch series: TD vcpu enter/exit Date: Mon, 27 Feb 2023 00:22:54 -0800 Message-Id: <3cc8fbdd5c75959549fd695a12226be49fea5fd8.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: 
linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This empty commit is to mark the start of patch series of TD vcpu enter/exit. Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/intel-tdx-layer-status.rst | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentat= ion/virt/kvm/intel-tdx-layer-status.rst index c081217a0036..58bff496abda 100644 --- a/Documentation/virt/kvm/intel-tdx-layer-status.rst +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -12,6 +12,7 @@ What qemu can do - Qemu can create/destroy guest of TDX vm type. - Qemu can create/destroy vcpu of TDX vm type. - Qemu can populate initial guest memory image. +- Qemu can finalize guest TD. =20 Patch Layer status ------------------ @@ -21,8 +22,8 @@ Patch Layer status * TD VM creation/destruction: Applied * TD vcpu creation/destruction: Applied * TDX EPT violation: Applied -* TD finalization: Applying -* TD vcpu enter/exit: Not yet +* TD finalization: Applied +* TD vcpu enter/exit: Applying * TD vcpu interrupts/exit/hypercall: Not yet =20 * KVM MMU GPA shared bits: Applied --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A1A7CC64ED6 for ; Mon, 27 Feb 2023 08:28:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231560AbjB0I2A (ORCPT ); Mon, 27 Feb 2023 03:28:00 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59254 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230523AbjB0I04 (ORCPT ); Mon, 27 Feb 2023 03:26:56 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ADFDB1E5DA; Mon, 27 Feb 2023 00:25:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486304; x=1709022304; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=SPNDiE0gWf+etVYH4mwxBcWL2qwEF5HM7/i5D4ZRbpM=; b=jWysgh03zHtZvYhZsDY92107yGKuyfWkqK36mbRmpmPYBYJdydD2MrT9 z0f5ct4Yl/v/lt6onIprDPOR+20Juulb1yAWMoDQUWxYeIpTO8Zif69os p+MK0PO+NeQaQ6eURLbCsZ9phH7eewBHuUVbr6zVAl7ggv2qEINP487ex r2P/MR/VU13YB6DJVFaO3zjWOQCnYrZWtw0DnKiOSWUCpk1r0VfF3obRA fVWXyI2lBY05675XfEsSfRx9IBYSOKWixTKUPK+NE3THgGvxmPmqaQtPf Wo3pM3bRFFkv375GAKUbDSQXBi/c82a6fO6hBEJbQBqegoa8leCoJaPLa w==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608905" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608905" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:13 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242253" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242253" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:12 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 056/106] KVM: TDX: Add helper 
assembly function to TDX vcpu Date: Mon, 27 Feb 2023 00:22:55 -0800 Message-Id: <61f8f8ec4bd45bebf1d5f657e61090fcbbc42946.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata TDX defines an API to run TDX vcpu with its own ABI. Define an assembly helper function to run TDX vcpu to hide the special ABI so that C code can call it with function call ABI. Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/tdx.h | 3 +- arch/x86/kvm/vmx/vmenter.S | 156 +++++++++++++++++++++++++++++++++++++ 2 files changed, 158 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 605af911632b..c8be21089c5a 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -18,7 +18,8 @@ * Bits 47:40 =3D=3D 0xFF indicate Reserved status code class that never u= sed by * TDX module. */ -#define TDX_ERROR _BITUL(63) +#define TDX_ERROR_BIT 63 +#define TDX_ERROR _BITUL(TDX_ERROR_BIT) #define TDX_SW_ERROR (TDX_ERROR | GENMASK_ULL(47, 40)) #define TDX_SEAMCALL_VMFAILINVALID (TDX_SW_ERROR | _UL(0xFFFF0000)) =20 diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S index 766c6b3ef5ed..58516611f31b 100644 --- a/arch/x86/kvm/vmx/vmenter.S +++ b/arch/x86/kvm/vmx/vmenter.S @@ -6,6 +6,7 @@ #include #include #include +#include #include "kvm-asm-offsets.h" #include "run_flags.h" =20 @@ -31,6 +32,12 @@ #define VCPU_R15 __VCPU_REGS_R15 * WORD_SIZE #endif =20 +#ifdef CONFIG_INTEL_TDX_HOST +#define TDH_VP_ENTER 0 +#define EXIT_REASON_TDCALL 77 +#define seamcall .byte 0x66,0x0f,0x01,0xcf +#endif + .section .noinstr.text, "ax" =20 /** @@ -352,3 +359,152 @@ SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff) pop %_ASM_BP RET SYM_FUNC_END(vmx_do_interrupt_nmi_irqoff) + +#ifdef CONFIG_INTEL_TDX_HOST + +.pushsection .noinstr.text, "ax" + +/** + * __tdx_vcpu_run - Call SEAMCALL(TDH_VP_ENTER) to run a TD vcpu + * @tdvpr: physical address of TDVPR + * @regs: void * (to registers of TDVCPU) + * @gpr_mask: non-zero if guest registers need to be loaded prior to TDH_V= P_ENTER + * + * Returns: + * TD-Exit Reason + * + * Note: KVM doesn't support using XMM in its hypercalls, it's the HyperV + * code's responsibility to save/restore XMM registers on TDVMCALL. + */ +SYM_FUNC_START(__tdx_vcpu_run) + push %rbp + mov %rsp, %rbp + + push %r15 + push %r14 + push %r13 + push %r12 + push %rbx + + /* Save @regs, which is needed after TDH_VP_ENTER to capture output. */ + push %rsi + + /* Load @tdvpr to RCX */ + mov %rdi, %rcx + + /* No need to load guest GPRs if the last exit wasn't a TDVMCALL. */ + test %dx, %dx + je 1f + + /* Load @regs to RAX, which will be clobbered with $TDH_VP_ENTER anyways.= */ + mov %rsi, %rax + + mov VCPU_RBX(%rax), %rbx + mov VCPU_RDX(%rax), %rdx + mov VCPU_RBP(%rax), %rbp + mov VCPU_RSI(%rax), %rsi + mov VCPU_RDI(%rax), %rdi + + mov VCPU_R8 (%rax), %r8 + mov VCPU_R9 (%rax), %r9 + mov VCPU_R10(%rax), %r10 + mov VCPU_R11(%rax), %r11 + mov VCPU_R12(%rax), %r12 + mov VCPU_R13(%rax), %r13 + mov VCPU_R14(%rax), %r14 + mov VCPU_R15(%rax), %r15 + + /* Load TDH_VP_ENTER to RAX. This kills the @regs pointer! */ +1: mov $TDH_VP_ENTER, %rax + +2: seamcall + + /* + * Use same return value convention to tdxcall.S. + * TDX_SEAMCALL_VMFAILINVALID doesn't conflict with any TDX status code. 
+ */ + jnc 3f + mov $TDX_SEAMCALL_VMFAILINVALID, %rax + jmp 5f +3: + + /* Skip to the exit path if TDH_VP_ENTER failed. */ + bt $TDX_ERROR_BIT, %rax + jc 5f + + /* Temporarily save the TD-Exit reason. */ + push %rax + + /* check if TD-exit due to TDVMCALL */ + cmp $EXIT_REASON_TDCALL, %ax + + /* Reload @regs to RAX. */ + mov 8(%rsp), %rax + + /* Jump on non-TDVMCALL */ + jne 4f + + /* Save all output from SEAMCALL(TDH_VP_ENTER) */ + mov %rbx, VCPU_RBX(%rax) + mov %rbp, VCPU_RBP(%rax) + mov %rsi, VCPU_RSI(%rax) + mov %rdi, VCPU_RDI(%rax) + mov %r10, VCPU_R10(%rax) + mov %r11, VCPU_R11(%rax) + mov %r12, VCPU_R12(%rax) + mov %r13, VCPU_R13(%rax) + mov %r14, VCPU_R14(%rax) + mov %r15, VCPU_R15(%rax) + +4: mov %rcx, VCPU_RCX(%rax) + mov %rdx, VCPU_RDX(%rax) + mov %r8, VCPU_R8 (%rax) + mov %r9, VCPU_R9 (%rax) + + /* + * Clear all general purpose registers except RSP and RAX to prevent + * speculative use of the guest's values. + */ + xor %rbx, %rbx + xor %rcx, %rcx + xor %rdx, %rdx + xor %rsi, %rsi + xor %rdi, %rdi + xor %rbp, %rbp + xor %r8, %r8 + xor %r9, %r9 + xor %r10, %r10 + xor %r11, %r11 + xor %r12, %r12 + xor %r13, %r13 + xor %r14, %r14 + xor %r15, %r15 + + /* Restore the TD-Exit reason to RAX for return. */ + pop %rax + + /* "POP" @regs. */ +5: add $8, %rsp + pop %rbx + pop %r12 + pop %r13 + pop %r14 + pop %r15 + + pop %rbp + RET + +6: cmpb $0, kvm_rebooting + je 1f + mov $TDX_SW_ERROR, %r12 + orq %r12, %rax + jmp 5b +1: ud2 + /* Use FAULT version to know what fault happened. */ + _ASM_EXTABLE_FAULT(2b, 6b) + +SYM_FUNC_END(__tdx_vcpu_run) + +.popsection + +#endif --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EE8CDC64ED6 for ; Mon, 27 Feb 2023 08:28:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231575AbjB0I2N (ORCPT ); Mon, 27 Feb 2023 03:28:13 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56468 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231435AbjB0I1D (ORCPT ); Mon, 27 Feb 2023 03:27:03 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0A1B41C300; Mon, 27 Feb 2023 00:25:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486306; x=1709022306; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=xZQIFheOhSM43WqCbELIcqO3jlltrDoUOEztrD7R924=; b=fj1DoXw6nF5uQvLp+1ayWD3kuofo4EmYtHAK7IRCZkkhpMc0VtbRl1fE 7/bVwMLEjqshO9Mjm7SXZFYLvuLe6jIEqh5TgkZdOm2J1LM7sdHpw4q5v 65JaeC+WQFWzPrRpf7RHeNB5eG3CidnpsOxJlN2RI/sN+CZ01ZUw2Vb7T FswPfS25u8p/RZV+KL0yBAXCh2ymRf8ldgsaW0Xv69vYDLp2t16f7xPgx g+bRhUz7863MUuhHt1UxvDbQyozb7tdkr77eByFoYut8sl7Qkc0Sli4Yf YB8PvwxcbvP7xrQ6VWnU8COGEDv60ZR/qKQQJAiA5cZDTbWXTX8hyD/ZZ w==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608923" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608923" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:14 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242261" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242261" Received: from ls.sc.intel.com (HELO localhost) 
([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:12 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 057/106] KVM: TDX: Implement TDX vcpu enter/exit path Date: Mon, 27 Feb 2023 00:22:56 -0800 Message-Id: <4b5a5aeddddf17b9e736bc70f2bcfe1b5f05ced4.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This patch implements running TDX vcpu. Once vcpu runs on the logical processor (LP), the TDX vcpu is associated with it. When the TDX vcpu moves to another LP, the TDX vcpu needs to flush its status on the LP. When destroying TDX vcpu, it needs to complete flush and flush cpu memory cache. Track which LP the TDX vcpu run and flush it as necessary. Do nothing on sched_in event as TDX doesn't support pause loop. TDX vcpu execution requires restoring PMU debug store after returning back to KVM because the TDX module unconditionally resets the value. To reuse the existing code, export perf_restore_debug_store. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/main.c | 21 +++++++++++++++++++-- arch/x86/kvm/vmx/tdx.c | 32 ++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/tdx.h | 33 +++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/x86_ops.h | 2 ++ arch/x86/kvm/x86.c | 1 + 5 files changed, 87 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index f42617c7aeaf..fe490620301e 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -148,6 +148,23 @@ static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool = init_event) vmx_vcpu_reset(vcpu, init_event); } =20 +static int vt_vcpu_pre_run(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + /* Unconditionally continue to vcpu_run(). 
*/ + return 1; + + return vmx_vcpu_pre_run(vcpu); +} + +static fastpath_t vt_vcpu_run(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return tdx_vcpu_run(vcpu); + + return vmx_vcpu_run(vcpu); +} + static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) { @@ -301,8 +318,8 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .flush_tlb_gva =3D vt_flush_tlb_gva, .flush_tlb_guest =3D vt_flush_tlb_guest, =20 - .vcpu_pre_run =3D vmx_vcpu_pre_run, - .vcpu_run =3D vmx_vcpu_run, + .vcpu_pre_run =3D vt_vcpu_pre_run, + .vcpu_run =3D vt_vcpu_run, .handle_exit =3D vmx_handle_exit, .skip_emulated_instruction =3D vmx_skip_emulated_instruction, .update_emulated_instruction =3D vmx_update_emulated_instruction, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 8e6f4122e99e..9273e7399c49 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -11,6 +11,9 @@ #include "x86.h" #include "mmu.h" =20 +#include +#include "trace.h" + #undef pr_fmt #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt =20 @@ -464,6 +467,35 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_e= vent) return; } =20 +u64 __tdx_vcpu_run(hpa_t tdvpr, void *regs, u32 regs_mask); + +static noinstr void tdx_vcpu_enter_exit(struct kvm_vcpu *vcpu, + struct vcpu_tdx *tdx) +{ + guest_enter_irqoff(); + tdx->exit_reason.full =3D __tdx_vcpu_run(tdx->tdvpr_pa, vcpu->arch.regs, = 0); + guest_exit_irqoff(); +} + +fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) +{ + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + + if (unlikely(vcpu->kvm->vm_bugged)) { + tdx->exit_reason.full =3D TDX_NON_RECOVERABLE_VCPU; + return EXIT_FASTPATH_NONE; + } + + trace_kvm_entry(vcpu); + + tdx_vcpu_enter_exit(vcpu, tdx); + + vcpu->arch.regs_avail &=3D ~VMX_REGS_LAZY_LOAD_SET; + trace_kvm_exit(vcpu, KVM_ISA_VMX); + + return EXIT_FASTPATH_NONE; +} + void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) { td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa & PAGE_MASK); diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 9d8445324841..af29e1d89657 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -25,12 +25,45 @@ struct kvm_tdx { u64 tsc_offset; }; =20 +union tdx_exit_reason { + struct { + /* 31:0 mirror the VMX Exit Reason format */ + u64 basic : 16; + u64 reserved16 : 1; + u64 reserved17 : 1; + u64 reserved18 : 1; + u64 reserved19 : 1; + u64 reserved20 : 1; + u64 reserved21 : 1; + u64 reserved22 : 1; + u64 reserved23 : 1; + u64 reserved24 : 1; + u64 reserved25 : 1; + u64 bus_lock_detected : 1; + u64 enclave_mode : 1; + u64 smi_pending_mtf : 1; + u64 smi_from_vmx_root : 1; + u64 reserved30 : 1; + u64 failed_vmentry : 1; + + /* 63:32 are TDX specific */ + u64 details_l1 : 8; + u64 class : 8; + u64 reserved61_48 : 14; + u64 non_recoverable : 1; + u64 error : 1; + }; + u64 full; +}; + struct vcpu_tdx { struct kvm_vcpu vcpu; =20 unsigned long tdvpr_pa; unsigned long *tdvpx_pa; =20 + union tdx_exit_reason exit_reason; + bool initialized; =20 /* diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 69f66e857ce5..482839d5acc9 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -154,6 +154,7 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); int tdx_vcpu_create(struct kvm_vcpu *vcpu); void tdx_vcpu_free(struct kvm_vcpu *vcpu); void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event); +fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu); u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); =20 int tdx_vcpu_ioctl(struct 
kvm_vcpu *vcpu, void __user *argp); @@ -179,6 +180,7 @@ static inline int tdx_vm_ioctl(struct kvm *kvm, void __= user *argp) { return -EOP static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTS= UPP; } static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {} static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) = {} +static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) { return EXIT= _FASTPATH_NONE; } static inline u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is= _mmio) { return 0; } =20 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)= { return -EOPNOTSUPP; } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 89ee421e0cbf..5b6705f81eb4 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -305,6 +305,7 @@ const struct kvm_stats_header kvm_vcpu_stats_header =3D= { }; =20 u64 __read_mostly host_xcr0; +EXPORT_SYMBOL_GPL(host_xcr0); =20 static struct kmem_cache *x86_emulator_cache; =20 --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 132B9C64ED6 for ; Mon, 27 Feb 2023 08:28:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231433AbjB0I2K (ORCPT ); Mon, 27 Feb 2023 03:28:10 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59312 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231421AbjB0I1B (ORCPT ); Mon, 27 Feb 2023 03:27:01 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D45B81C337; Mon, 27 Feb 2023 00:25:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486305; x=1709022305; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=M7VduQRpTCB62659dGw/1suEXSDb1C2YMLZ0heer3y8=; b=CMe3X4+HbTxuraf80Og/OZVXqYuB0wCTr0fTwE8Ri4NMy+kAv1gh0nIA Tyg7uk31GiP2L+hfrqptTR5ZAQAPaFtGK78InV0FUuD+jYu9mW8Dqhza6 6E1iL3jIj9nwxhXRiP5ggUb5TMbAyCOr/1LkskBgtAAXEygojcu3vVBlh kLXrDtukdGmsD4UJaOQ8OF5Y34Ly/TGag+G5p//BgsJsN6WCZ7rAsR4/n cMvVBTZaPDEWWpAe/AzuL7nmsaDFp2zoCD/xxXoBUl4oWS+pVkeHb1kmz oWOrv21byU3ZQQMlxoaTEn7gkT/FdR6mYisZ0uwOf/B8O1p4MprT1r2SW Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608915" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608915" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:14 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242265" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242265" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:12 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 058/106] KVM: TDX: vcpu_run: save/restore host state(host kernel gs) Date: Mon, 27 Feb 2023 00:22:57 -0800 Message-Id: <75de792e506a3518bb4f9b03ac05ae38d4b83c81.1677484918.git.isaku.yamahata@intel.com> 
X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata On entering/exiting TDX vcpu, Preserved or clobbered CPU state is different from VMX case. Add TDX hooks to save/restore host/guest CPU state. Save/restore kernel GS base MSR. Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/vmx/main.c | 30 ++++++++++++++++++++++++++-- arch/x86/kvm/vmx/tdx.c | 40 ++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/tdx.h | 4 ++++ arch/x86/kvm/vmx/x86_ops.h | 4 ++++ 4 files changed, 76 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index fe490620301e..6066682cb4ff 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -148,6 +148,32 @@ static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool = init_event) vmx_vcpu_reset(vcpu, init_event); } =20 +static void vt_prepare_switch_to_guest(struct kvm_vcpu *vcpu) +{ + /* + * All host state is saved/restored across SEAMCALL/SEAMRET, and the + * guest state of a TD is obviously off limits. Deferring MSRs and DRs + * is pointless because the TDX module needs to load *something* so as + * not to expose guest state. + */ + if (is_td_vcpu(vcpu)) { + tdx_prepare_switch_to_guest(vcpu); + return; + } + + vmx_prepare_switch_to_guest(vcpu); +} + +static void vt_vcpu_put(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) { + tdx_vcpu_put(vcpu); + return; + } + + vmx_vcpu_put(vcpu); +} + static int vt_vcpu_pre_run(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) @@ -285,9 +311,9 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .vcpu_free =3D vt_vcpu_free, .vcpu_reset =3D vt_vcpu_reset, =20 - .prepare_switch_to_guest =3D vmx_prepare_switch_to_guest, + .prepare_switch_to_guest =3D vt_prepare_switch_to_guest, .vcpu_load =3D vmx_vcpu_load, - .vcpu_put =3D vmx_vcpu_put, + .vcpu_put =3D vt_vcpu_put, =20 .update_exception_bitmap =3D vmx_update_exception_bitmap, .get_msr_feature =3D vmx_get_msr_feature, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 9273e7399c49..734925d3a5c0 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1,5 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 #include +#include =20 #include =20 @@ -367,6 +368,7 @@ u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bo= ol is_mmio) =20 int tdx_vcpu_create(struct kvm_vcpu *vcpu) { + struct vcpu_tdx *tdx =3D to_tdx(vcpu); struct kvm_cpuid_entry2 *e; =20 /* @@ -413,9 +415,45 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) vcpu->arch.guest_state_protected =3D !(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTRIBUTE_DEBUG); =20 + tdx->host_state_need_save =3D true; + tdx->host_state_need_restore =3D false; + return 0; } =20 +void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) +{ + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + + if (!tdx->host_state_need_save) + return; + + if (likely(is_64bit_mm(current->mm))) + tdx->msr_host_kernel_gs_base =3D current->thread.gsbase; + else + tdx->msr_host_kernel_gs_base =3D read_msr(MSR_KERNEL_GS_BASE); + + tdx->host_state_need_save =3D false; +} + +static void tdx_prepare_switch_to_host(struct kvm_vcpu *vcpu) +{ + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + + tdx->host_state_need_save =3D true; + if (!tdx->host_state_need_restore) + return; + + wrmsrl(MSR_KERNEL_GS_BASE, tdx->msr_host_kernel_gs_base); + tdx->host_state_need_restore =3D false; +} + +void tdx_vcpu_put(struct 
kvm_vcpu *vcpu) +{ + vmx_vcpu_pi_put(vcpu); + tdx_prepare_switch_to_host(vcpu); +} + void tdx_vcpu_free(struct kvm_vcpu *vcpu) { struct vcpu_tdx *tdx =3D to_tdx(vcpu); @@ -490,6 +528,8 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) =20 tdx_vcpu_enter_exit(vcpu, tdx); =20 + tdx->host_state_need_restore =3D true; + vcpu->arch.regs_avail &=3D ~VMX_REGS_LAZY_LOAD_SET; trace_kvm_exit(vcpu, KVM_ISA_VMX); =20 diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index af29e1d89657..cd50d366b7ee 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -66,6 +66,10 @@ struct vcpu_tdx { =20 bool initialized; =20 + bool host_state_need_save; + bool host_state_need_restore; + u64 msr_host_kernel_gs_base; + /* * Dummy to make pmu_intel not corrupt memory. * TODO: Support PMU for TDX. Future work. diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 482839d5acc9..bafbf4e06a5b 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -155,6 +155,8 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu); void tdx_vcpu_free(struct kvm_vcpu *vcpu); void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event); fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu); +void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu); +void tdx_vcpu_put(struct kvm_vcpu *vcpu); u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); =20 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); @@ -181,6 +183,8 @@ static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu= ) { return -EOPNOTSUPP; } static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {} static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) = {} static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) { return EXIT= _FASTPATH_NONE; } +static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {} +static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {} static inline u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is= _mmio) { return 0; } =20 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)= { return -EOPNOTSUPP; } --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id ACE79C7EE2D for ; Mon, 27 Feb 2023 08:28:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231379AbjB0I22 (ORCPT ); Mon, 27 Feb 2023 03:28:28 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57464 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230416AbjB0I1n (ORCPT ); Mon, 27 Feb 2023 03:27:43 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1351E1EBCD; Mon, 27 Feb 2023 00:25:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486314; x=1709022314; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=tJiFpzx61hSkb+RfZk9tKpuU2W8hI14I36ADlaPWiEI=; b=a0HHpbXNKqyggs+GFCdUYkrP0DqL3g7xzgppC0lYHsdQn9/D6qelSxk8 Y0LbO+4qHXbiKr92Y57AMZeoCnh+X/wQKQv5P9U+e7tQGSmZ40knxbleR Vq2qDj0LN36qyY1IFnsRBfJ0qO8BeMKFcpqHNZ4Rdu0GZLa8zLirJO4Cx FeOT4BLxdclEHK9+ucQtL9GT3oKH0Hz0hINzEgMP8s3fwiPo0Ll3vhyCZ 2CPl53LU/ra0Y7Ch2Px1fmpYC/ufX/FEC2fYejC+P6iJZbOwI/xuF9QUs 
xy6EskNL9gh1E1V4GX6Pq2PtqvQFiNzH6BfmBQvg79OarbjzOH+xC7jO+ w==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608935" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608935" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:14 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242269" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242269" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:12 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 059/106] KVM: TDX: restore host xsave state when exit from the guest TD Date: Mon, 27 Feb 2023 00:22:58 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata On exiting from the guest TD, xsave state is clobbered. Restore xsave state on TD exit. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/tdx.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 734925d3a5c0..dee63c3931c8 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -2,6 +2,7 @@ #include #include =20 +#include #include =20 #include "capabilities.h" @@ -505,6 +506,22 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_e= vent) return; } =20 +static void tdx_restore_host_xsave_state(struct kvm_vcpu *vcpu) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(vcpu->kvm); + + if (static_cpu_has(X86_FEATURE_XSAVE) && + host_xcr0 !=3D (kvm_tdx->xfam & kvm_caps.supported_xcr0)) + xsetbv(XCR_XFEATURE_ENABLED_MASK, host_xcr0); + if (static_cpu_has(X86_FEATURE_XSAVES) && + /* PT can be exposed to TD guest regardless of KVM's XSS support */ + host_xss !=3D (kvm_tdx->xfam & (kvm_caps.supported_xss | XFEATURE_MAS= K_PT))) + wrmsrl(MSR_IA32_XSS, host_xss); + if (static_cpu_has(X86_FEATURE_PKU) && + (kvm_tdx->xfam & XFEATURE_MASK_PKRU)) + write_pkru(vcpu->arch.host_pkru); +} + u64 __tdx_vcpu_run(hpa_t tdvpr, void *regs, u32 regs_mask); =20 static noinstr void tdx_vcpu_enter_exit(struct kvm_vcpu *vcpu, @@ -528,6 +545,7 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) =20 tdx_vcpu_enter_exit(vcpu, tdx); =20 + tdx_restore_host_xsave_state(vcpu); tdx->host_state_need_restore =3D true; =20 vcpu->arch.regs_avail &=3D ~VMX_REGS_LAZY_LOAD_SET; --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0872EC64ED6 for ; Mon, 27 Feb 2023 08:28:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231582AbjB0I2Q (ORCPT ); Mon, 27 Feb 2023 03:28:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57030 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231475AbjB0I1R (ORCPT ); Mon, 27 Feb 2023 03:27:17 -0500 Received: from mga18.intel.com 
(mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9F0B31E5C3; Mon, 27 Feb 2023 00:25:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486307; x=1709022307; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=z5+pcCEM0KIA+R+2FU+LRdsod874uutXmWYads+dgjg=; b=k3z7XTOMnvKAjqE9D6Ny3LipQS+m4ZhRUaONqRkzntQhpyb5ofNNnQ24 G3luf7nWpfklFoaTT1HJ4utgxf+m6LfESMm34Eybwi3JQ4YfXJ19MJ46v 4pO4u0ypxwQyvLD3rdaTGya7sGuL66fUX9QKzpFLSGBkInaOfOSzKJmjS OwOe38wbfJcEPS1R3ku7hPlomWsEYIyWtoJM2xJmR9ewyZtPaFCfb9e8w 9wyw/gbS6kf3S90pG7xplyIDLLfCURzuK2csyWgX7Ub+LDS5G/KEhsjBA ow6+gRdbu2YGgjOvTyj/xOf+WrDobELhnDFJrmR+xR9Ewg7grXlQQ38mb g==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608928" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608928" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:14 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242273" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242273" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:13 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang , Chao Gao Subject: [PATCH v12 060/106] KVM: x86: Allow to update cached values in kvm_user_return_msrs w/o wrmsr Date: Mon, 27 Feb 2023 00:22:59 -0800 Message-Id: <6c8fa75f6e2d004d1e3fb14afe701de42f728df1.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Chao Gao Several MSRs are constant and only used in userspace(ring 3). But VMs may have different values. KVM uses kvm_set_user_return_msr() to switch to guest's values and leverages user return notifier to restore them when the kernel is to return to userspace. To eliminate unnecessary wrmsr, KVM also caches the value it wrote to an MSR last time. TDX module unconditionally resets some of these MSRs to architectural INIT state on TD exit. It makes the cached values in kvm_user_return_msrs are inconsistent with values in hardware. This inconsistency needs to be fixed. Otherwise, it may mislead kvm_on_user_return() to skip restoring some MSRs to the host's values. kvm_set_user_return_msr() can help correct this case, but it is not optimal as it always does a wrmsr. So, introduce a variation of kvm_set_user_return_msr() to update cached values and skip that wrmsr. 
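A minimal standalone model of the two paths, assuming nothing beyond the
semantics described above (all names below are illustrative, not kernel
code):

  #include <stdint.h>
  #include <stdio.h>

  /* Illustrative stand-in for one kvm_user_return_msrs slot. */
  struct slot { uint64_t host; uint64_t curr; };

  /* Models kvm_set_user_return_msr(): WRMSR plus cache update. */
  static void set_msr(struct slot *s, uint64_t val)
  {
          if (s->curr != val) {
                  /* wrmsr(msr, val) would go here */
                  s->curr = val;
          }
  }

  /* Models the cache-only variant: hardware already holds 'val'
   * (the TDX module reset the MSR on TD exit), so only the cached
   * value is corrected and no WRMSR is issued. */
  static void update_cache(struct slot *s, uint64_t val)
  {
          s->curr = val;
  }

  int main(void)
  {
          struct slot s = { .host = 0x10, .curr = 0x10 };

          set_msr(&s, 0xdead);  /* load guest value, does a WRMSR  */
          update_cache(&s, 0);  /* TD exit reset the MSR to INIT   */

          /* kvm_on_user_return() restores iff the cache differs. */
          if (s.curr != s.host)
                  printf("restore host value 0x%llx\n",
                         (unsigned long long)s.host);
          return 0;
  }

Without the cache-only update, the restore check would compare against a
stale cached value equal to the host value and incorrectly skip the WRMSR,
leaving the INIT value in hardware.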
Signed-off-by: Chao Gao Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/x86.c | 25 ++++++++++++++++++++----- 2 files changed, 21 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 39c28383c2d6..fdfb37e31fa3 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -2170,6 +2170,7 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ip= i_bitmap_low, int kvm_add_user_return_msr(u32 msr); int kvm_find_user_return_msr(u32 msr); int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask); +void kvm_user_return_update_cache(unsigned int index, u64 val); =20 static inline bool kvm_is_supported_user_return_msr(u32 msr) { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 5b6705f81eb4..049ec2fcfef0 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -430,6 +430,15 @@ static void kvm_user_return_msr_cpu_online(void) } } =20 +static void kvm_user_return_register_notifier(struct kvm_user_return_msrs = *msrs) +{ + if (!msrs->registered) { + msrs->urn.on_user_return =3D kvm_on_user_return; + user_return_notifier_register(&msrs->urn); + msrs->registered =3D true; + } +} + int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask) { unsigned int cpu =3D smp_processor_id(); @@ -444,15 +453,21 @@ int kvm_set_user_return_msr(unsigned slot, u64 value,= u64 mask) return 1; =20 msrs->values[slot].curr =3D value; - if (!msrs->registered) { - msrs->urn.on_user_return =3D kvm_on_user_return; - user_return_notifier_register(&msrs->urn); - msrs->registered =3D true; - } + kvm_user_return_register_notifier(msrs); return 0; } EXPORT_SYMBOL_GPL(kvm_set_user_return_msr); =20 +/* Update the cache, "curr", and register the notifier */ +void kvm_user_return_update_cache(unsigned int slot, u64 value) +{ + struct kvm_user_return_msrs *msrs =3D this_cpu_ptr(user_return_msrs); + + msrs->values[slot].curr =3D value; + kvm_user_return_register_notifier(msrs); +} +EXPORT_SYMBOL_GPL(kvm_user_return_update_cache); + static void drop_user_return_notifiers(void) { unsigned int cpu =3D smp_processor_id(); --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 30D4BC7EE2E for ; Mon, 27 Feb 2023 08:28:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230480AbjB0I2a (ORCPT ); Mon, 27 Feb 2023 03:28:30 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55310 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231350AbjB0I1o (ORCPT ); Mon, 27 Feb 2023 03:27:44 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 12C0B1EBC4; Mon, 27 Feb 2023 00:25:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486314; x=1709022314; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ven1HxTVimVypG4T/5nPJRVxUdsPtANYR+ssWz9PhF4=; b=SI+7jjou2eyUjrU+YZexT4dg9HxbJ3yZbVhH+ZaHrrhqsu8yL6F8p5o8 hVSBFLMlJDqgXff/CW9KngqtujfOQk8Au8KeV45GNIw/JEXfPaZhU1rgO SG/IBeUTLgUPTeZxAhngx2Bj1sCtr5RP0XxecPnuelG68OsnmB7Lv4BX2 joJZhXguDzIJ1+BudmB2HWJMAkT24Z7yhXrbH9QXQdNCGXh9SRQgncYuy 
k2OBm3UUOrjZjA/xd1pN3M4wD5kx8O8WSWkw8MZSAE3pHPe1knBPZ+lxp TyGnK11v30dO08cLATUB+jJyeJjCQcRYYhQ5rUuMhBoiBDIZ4tgt6aNG6 Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608933" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608933" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:14 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242276" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242276" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:13 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 061/106] KVM: TDX: restore user ret MSRs Date: Mon, 27 Feb 2023 00:23:00 -0800 Message-Id: <500381b5c5bc357b39f3a1c3135513698716cff7.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Several user ret MSRs are clobbered on TD exit. Restore those values on TD exit and before returning to ring 3. Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/vmx/tdx.c | 43 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index dee63c3931c8..de8d2d4b03aa 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -506,6 +506,28 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_e= vent) return; } =20 +struct tdx_uret_msr { + u32 msr; + unsigned int slot; + u64 defval; +}; + +static struct tdx_uret_msr tdx_uret_msrs[] =3D { + {.msr =3D MSR_SYSCALL_MASK,}, + {.msr =3D MSR_STAR,}, + {.msr =3D MSR_LSTAR,}, + {.msr =3D MSR_TSC_AUX,}, +}; + +static void tdx_user_return_update_cache(void) +{ + int i; + + for (i =3D 0; i < ARRAY_SIZE(tdx_uret_msrs); i++) + kvm_user_return_update_cache(tdx_uret_msrs[i].slot, + tdx_uret_msrs[i].defval); +} + static void tdx_restore_host_xsave_state(struct kvm_vcpu *vcpu) { struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(vcpu->kvm); @@ -545,6 +567,7 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) =20 tdx_vcpu_enter_exit(vcpu, tdx); =20 + tdx_user_return_update_cache(); tdx_restore_host_xsave_state(vcpu); tdx->host_state_need_restore =3D true; =20 @@ -1635,6 +1658,26 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x8= 6_ops) return -EINVAL; } =20 + for (i =3D 0; i < ARRAY_SIZE(tdx_uret_msrs); i++) { + /* + * Here it checks if MSRs (tdx_uret_msrs) can be saved/restored + * before returning to user space. + * + * this_cpu_ptr(user_return_msrs)->registered isn't checked + * because the registration is done at vcpu runtime by + * kvm_set_user_return_msr(). + * Here is setting up cpu feature before running vcpu, + * registered is already false. 
+ */ + tdx_uret_msrs[i].slot =3D kvm_find_user_return_msr(tdx_uret_msrs[i].msr); + if (tdx_uret_msrs[i].slot =3D=3D -1) { + /* If any MSR isn't supported, it is a KVM bug */ + pr_err("MSR %x isn't included by kvm_find_user_return_msr\n", + tdx_uret_msrs[i].msr); + return -EIO; + } + } + max_pkgs =3D topology_max_packages(); tdx_mng_key_config_lock =3D kcalloc(max_pkgs, sizeof(*tdx_mng_key_config_= lock), GFP_KERNEL); --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2FDEBC64ED8 for ; Mon, 27 Feb 2023 08:28:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231350AbjB0I2d (ORCPT ); Mon, 27 Feb 2023 03:28:33 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56956 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231518AbjB0I1r (ORCPT ); Mon, 27 Feb 2023 03:27:47 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A8DD61E2B2; Mon, 27 Feb 2023 00:25:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486314; x=1709022314; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=2MZHmHkMFeF59QdyZJqNO+OmKhW9N2Ox4GCrmhtpfBg=; b=n3Ikr6+iPNi8gQNn1DlbAnv/JtfhVbbqH2pFCQqKac8NETGF7ffNzOwT TbbKowbbmN0HS/6cUcU7v97llaNxQH+s5u+0h+eAE6LdJZ1Sgvask1VGs ZLCKvPGgPUPWQd8BeidA/fyW7OuZieouVbmCmsR/yEZzo45xiBv3706Zf UT66L2a57v6AE17g0xa0rMeYZSt7AXzEbMipR60t2xK1jxjR+o/HzkZCY SLe7R2IWvJEg+3kvYLY8LU8P7Y/ggu2VAONPikgm+KzlxiQoD9GzLQA5g H+UQBK6zGvp410mP0ZLpEZkp1gRwNpK4pI9qyV52Lb4C74gwEZHJNDX4i w==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608944" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608944" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:14 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242283" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242283" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:13 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 062/106] [MARKER] The start of TDX KVM patch series: TD vcpu exits/interrupts/hypercalls Date: Mon, 27 Feb 2023 00:23:01 -0800 Message-Id: <2f18e1a5dcf8077aab938bcaeee5307d6440597b.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata This empty commit is to mark the start of patch series of TD vcpu exits, interrupts, and hypercalls. 
Signed-off-by: Isaku Yamahata --- Documentation/virt/kvm/intel-tdx-layer-status.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentat= ion/virt/kvm/intel-tdx-layer-status.rst index 58bff496abda..010c387ef5cc 100644 --- a/Documentation/virt/kvm/intel-tdx-layer-status.rst +++ b/Documentation/virt/kvm/intel-tdx-layer-status.rst @@ -13,6 +13,7 @@ What qemu can do - Qemu can create/destroy vcpu of TDX vm type. - Qemu can populate initial guest memory image. - Qemu can finalize guest TD. +- Qemu can start to run vcpu. But vcpu can not make progress yet. =20 Patch Layer status ------------------ @@ -23,7 +24,7 @@ Patch Layer status * TD vcpu creation/destruction: Applied * TDX EPT violation: Applied * TD finalization: Applied -* TD vcpu enter/exit: Applying +* TD vcpu enter/exit: Applied * TD vcpu interrupts/exit/hypercall: Not yet =20 * KVM MMU GPA shared bits: Applied --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D1AAEC7EE2D for ; Mon, 27 Feb 2023 08:28:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231633AbjB0I2o (ORCPT ); Mon, 27 Feb 2023 03:28:44 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57004 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231448AbjB0I2J (ORCPT ); Mon, 27 Feb 2023 03:28:09 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EE54B1EBE5; Mon, 27 Feb 2023 00:25:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486317; x=1709022317; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=DlwMNkw053J3WJvncMnbVtQODdBLGKqN9ksDYBpJ4yk=; b=eewTGA5h4KFHZccmed4Izr0ds0GzqG3ux/6yvQkZzB+3X0QM35XT7wj0 IxOGzrbj4zVKCMu+KIdCvf1kZn9U4NY+ilx9ZQU2YaqQilk8zcMJT6plp 99sDgzsvN2IPQmshaQprXDaROzouHRQQBLp75DeoHxgQPy01OHibVklYg RsyB39BKd9jhXBOVhK70SPWSZPUNiOV6gtWYccWvNxcz/8deq6NcYAHKN zE/9cadVjFziWk5QrNdPJxQz81pgKfvhiBsRJqv0XinMOdP1zMg/sQR5a FQ8s1Cb+kPDekFBjstmknViVY2As5nKgA/B5Wkm9N1UeQLruP0R8GBUU5 Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608945" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608945" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:14 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242295" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242295" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:13 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 063/106] KVM: TDX: complete interrupts after tdexit Date: Mon, 27 Feb 2023 00:23:02 -0800 Message-Id: <583bb97acb8a2d6da4ae4b2b8270fc7831aa4810.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: 
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Isaku Yamahata

This corresponds to VMX's __vmx_complete_interrupts(). Because TDX virtualizes the vAPIC, KVM only needs to care about NMI injection.

Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/tdx.c | 10 ++++++++++
 arch/x86/kvm/vmx/tdx.h |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index de8d2d4b03aa..0e07fd13ec66 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -506,6 +506,14 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	return;
 }
 
+static void tdx_complete_interrupts(struct kvm_vcpu *vcpu)
+{
+	/* Avoid a costly SEAMCALL if no NMI was injected. */
+	if (vcpu->arch.nmi_injected)
+		vcpu->arch.nmi_injected = td_management_read8(to_tdx(vcpu),
+							      TD_VCPU_PEND_NMI);
+}
+
 struct tdx_uret_msr {
 	u32 msr;
 	unsigned int slot;
@@ -574,6 +582,8 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu)
 	vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
 	trace_kvm_exit(vcpu, KVM_ISA_VMX);
 
+	tdx_complete_interrupts(vcpu);
+
 	return EXIT_FASTPATH_NONE;
 }
 
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index cd50d366b7ee..e66e5762ae04 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -192,6 +192,8 @@ TDX_BUILD_TDVPS_ACCESSORS(16, VMCS, vmcs);
 TDX_BUILD_TDVPS_ACCESSORS(32, VMCS, vmcs);
 TDX_BUILD_TDVPS_ACCESSORS(64, VMCS, vmcs);
 
+TDX_BUILD_TDVPS_ACCESSORS(8, MANAGEMENT, management);
+
 static __always_inline u64 td_tdcs_exec_read64(struct kvm_tdx *kvm_tdx, u32 field)
 {
 	struct tdx_module_output out;
--
2.25.1

From nobody Tue Sep 9 16:53:37 2025
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B9E1CC7EE2D for ; Mon, 27 Feb 2023 08:28:48 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231652AbjB0I2q (ORCPT ); Mon, 27 Feb 2023 03:28:46 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57030 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231570AbjB0I2M (ORCPT ); Mon, 27 Feb 2023 03:28:12 -0500
Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ACDBF1EBED; Mon, 27 Feb 2023 00:25:22 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486322; x=1709022322; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=wQRsBjUiBMYTpgEwJyYgRZtvi4+wq6LFWeS2Z3xRVmY=; b=C1sTnLac6mNQkfujHWOQA7UINP7ODCu9M/QQtL5gyD6iGOpd2JQ9/ZIs huXlWQWp9y7+EO2JKBP4xLvLDbeWz7qCHA5QWVyqKO1ipqaOzuuaBkMpW yabTpHAhmW8AOAzYDAeqBw7IMmwnHZdJtvb9Nb4LLBBRMip7ojEMCV37d XYeAfNRvtJoibsOE5h5z0g5JWHv11aaadO5DYfyplIa6jqB8KcIQU5H23 GQAN6NYJ9me8l00LWW/lNFQe6kRaW7HjhHZpfxscQslKzHLTRuDuoiBU8 u2kGO9cQ7yzm361ATgUuyBsqmmRgWnAQGK67G+fpa+rZlQf1o9StLTQ7W Q==;
X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608954"
X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608954"
Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:15 -0800
X-IronPort-AV: E=McAfee;i="6500,9779,10633";
a="783242299" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242299" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:14 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 064/106] KVM: TDX: restore debug store when TD exit Date: Mon, 27 Feb 2023 00:23:03 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Because debug store is clobbered, restore it on TD exit. Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/events/intel/ds.c | 1 + arch/x86/kvm/vmx/tdx.c | 1 + 2 files changed, 2 insertions(+) diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index 88e58b6ee73c..4989b35161b6 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -2350,3 +2350,4 @@ void perf_restore_debug_store(void) =20 wrmsrl(MSR_IA32_DS_AREA, (unsigned long)ds); } +EXPORT_SYMBOL_GPL(perf_restore_debug_store); diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 0e07fd13ec66..30eae42b071d 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -576,6 +576,7 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) tdx_vcpu_enter_exit(vcpu, tdx); =20 tdx_user_return_update_cache(); + perf_restore_debug_store(); tdx_restore_host_xsave_state(vcpu); tdx->host_state_need_restore =3D true; =20 --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 656D6C64ED8 for ; Mon, 27 Feb 2023 08:28:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231661AbjB0I2s (ORCPT ); Mon, 27 Feb 2023 03:28:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56744 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231572AbjB0I2M (ORCPT ); Mon, 27 Feb 2023 03:28:12 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 17DBF1EFC3; Mon, 27 Feb 2023 00:25:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486323; x=1709022323; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=LanarsgvaG0xu5W8XHMQZbnvKl6TokXzlWJtLLSNfL0=; b=Xvbcdw8l3xLCwa+op1MJ1VLNtGBUwr/KScT29bRGirWk67sUcb7O2+E4 w/en5ayRyLblUHlQtYZWA4Fu6cxnkWHpMfdruq4ni6dqhC3RE8eUSq7X3 8fYJfkn/mfb1S757AycI766Vp+BcZ2T/Zw3m1OB9AuEsUMfr7Rl0G/zKD 9pYonB2Z7J77bGr1SxOK3Ksb+j3+qQZZ1oiz/un92GCreMhsjNrejcfr5 1mnmyEqPvS1bEYvyFCD06jMxvCgPvDhkkVxNl7jMpzt9U//LSHXa0TMzR GCu6muy132QgJCMOedfjpwb65i+OXAnrlnjIPAmH6Sdmw6zWEN3xkwnsI g==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608956" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608956" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:15 -0800
X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242303"
X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242303"
Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:14 -0800
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang
Subject: [PATCH v12 065/106] KVM: TDX: handle vcpu migration over logical processor
Date: Mon, 27 Feb 2023 00:23:04 -0800
Message-Id: <4318ac4e20afbcd467895a5d988ec28ae769d3a0.1677484918.git.isaku.yamahata@intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To:
References:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Isaku Yamahata

For vcpu migration, in the case of VMX, the VMCS is flushed on the source pcpu and loaded on the target pcpu. There are corresponding TDX SEAMCALL APIs; call them on vcpu migration. The logic is mostly the same as for VMX, except that the TDX SEAMCALLs are used.

When shutting down the machine, (VMX or TDX) vcpus need to be shut down on each pcpu. Do the same for TDX with the TDX SEAMCALL APIs.

Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/main.c    |  37 ++++++++-
 arch/x86/kvm/vmx/tdx.c     | 152 +++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/tdx.h     |   2 +
 arch/x86/kvm/vmx/x86_ops.h |   7 ++
 4 files changed, 195 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 6066682cb4ff..2749d6995638 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -32,6 +32,13 @@ static int vt_max_vcpus(struct kvm *kvm)
 }
 static int vt_tlb_remote_flush(struct kvm *kvm);
 
+static void vt_hardware_disable(void)
+{
+	/* Note, TDX *and* VMX need to be disabled if TDX is enabled.
*/ + tdx_hardware_disable(); + vmx_hardware_disable(); +} + static __init int vt_hardware_setup(void) { int ret; @@ -191,6 +198,16 @@ static fastpath_t vt_vcpu_run(struct kvm_vcpu *vcpu) return vmx_vcpu_run(vcpu); } =20 +static void vt_vcpu_load(struct kvm_vcpu *vcpu, int cpu) +{ + if (is_td_vcpu(vcpu)) { + tdx_vcpu_load(vcpu, cpu); + return; + } + + vmx_vcpu_load(vcpu, cpu); +} + static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) { @@ -250,6 +267,14 @@ static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa= _t root_hpa, vmx_load_mmu_pgd(vcpu, root_hpa, pgd_level); } =20 +static void vt_sched_in(struct kvm_vcpu *vcpu, int cpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_sched_in(vcpu, cpu); +} + static u8 vt_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) { if (is_td_vcpu(vcpu)) @@ -294,7 +319,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .offline_cpu =3D tdx_offline_cpu, =20 .hardware_enable =3D vmx_hardware_enable, - .hardware_disable =3D vmx_hardware_disable, + .hardware_disable =3D vt_hardware_disable, .has_emulated_msr =3D vmx_has_emulated_msr, =20 .is_vm_type_supported =3D vt_is_vm_type_supported, @@ -312,7 +337,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .vcpu_reset =3D vt_vcpu_reset, =20 .prepare_switch_to_guest =3D vt_prepare_switch_to_guest, - .vcpu_load =3D vmx_vcpu_load, + .vcpu_load =3D vt_vcpu_load, .vcpu_put =3D vt_vcpu_put, =20 .update_exception_bitmap =3D vmx_update_exception_bitmap, @@ -398,7 +423,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { =20 .request_immediate_exit =3D vmx_request_immediate_exit, =20 - .sched_in =3D vmx_sched_in, + .sched_in =3D vt_sched_in, =20 .cpu_dirty_log_size =3D PML_ENTITY_NUM, .update_cpu_dirty_logging =3D vmx_update_cpu_dirty_logging, @@ -466,6 +491,10 @@ static int __init vt_init(void) if (r) goto err_vmx_init; =20 + r =3D tdx_init(); + if (r) + goto err_tdx_init; + /* * Common KVM initialization _must_ come last, after this, /dev/kvm is * exposed to userspace! @@ -488,6 +517,8 @@ static int __init vt_init(void) return 0; =20 err_kvm_init: + /* tdx_exit() is not defined. */ +err_tdx_init: vmx_exit(); err_vmx_init: kvm_x86_vendor_exit(); diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 30eae42b071d..8b5a3d852e57 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -74,6 +74,14 @@ static DEFINE_MUTEX(tdx_lock); static struct mutex *tdx_mng_key_config_lock; static atomic_t nr_configured_hkid; =20 +/* + * A per-CPU list of TD vCPUs associated with a given CPU. Used when a CPU + * is brought down to invoke TDH_VP_FLUSH on the approapriate TD vCPUS. + * Protected by interrupt mask. This list is manipulated in process conte= xt + * of vcpu and IPI callback. See tdx_flush_vp_on_cpu(). + */ +static DEFINE_PER_CPU(struct list_head, associated_tdvcpus); + static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid) { return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits); @@ -105,6 +113,31 @@ static inline bool is_td_finalized(struct kvm_tdx *kvm= _tdx) return kvm_tdx->finalized; } =20 +static inline void tdx_disassociate_vp(struct kvm_vcpu *vcpu) +{ + list_del(&to_tdx(vcpu)->cpu_list); + + /* + * Ensure tdx->cpu_list is updated is before setting vcpu->cpu to -1, + * otherwise, a different CPU can see vcpu->cpu =3D -1 and add the vCPU + * to its list before its deleted from this CPUs list. 
+ */ + smp_wmb(); + + vcpu->cpu =3D -1; +} + +void tdx_hardware_disable(void) +{ + int cpu =3D raw_smp_processor_id(); + struct list_head *tdvcpus =3D &per_cpu(associated_tdvcpus, cpu); + struct vcpu_tdx *tdx, *tmp; + + /* Safe variant needed as tdx_disassociate_vp() deletes the entry. */ + list_for_each_entry_safe(tdx, tmp, tdvcpus, cpu_list) + tdx_disassociate_vp(&tdx->vcpu); +} + static void tdx_clear_page(unsigned long page_pa) { const void *zero_page =3D (const void *) __va(page_to_phys(ZERO_PAGE(0))); @@ -194,6 +227,68 @@ static void tdx_reclaim_td_page(unsigned long td_page_= pa) free_page((unsigned long)__va(td_page_pa)); } =20 +struct tdx_flush_vp_arg { + struct kvm_vcpu *vcpu; + u64 err; +}; + +static void tdx_flush_vp(void *arg_) +{ + struct tdx_flush_vp_arg *arg =3D arg_; + struct kvm_vcpu *vcpu =3D arg->vcpu; + u64 err; + + arg->err =3D 0; + lockdep_assert_irqs_disabled(); + + /* Task migration can race with CPU offlining. */ + if (vcpu->cpu !=3D raw_smp_processor_id()) + return; + + /* + * No need to do TDH_VP_FLUSH if the vCPU hasn't been initialized. The + * list tracking still needs to be updated so that it's correct if/when + * the vCPU does get initialized. + */ + if (is_td_vcpu_created(to_tdx(vcpu))) { + /* + * No need to retry. TDX Resources needed for TDH.VP.FLUSH are, + * TDVPR as exclusive, TDR as shared, and TDCS as shared. This + * vp flush function is called when destructing vcpu/TD or vcpu + * migration. No other thread uses TDVPR in those cases. + */ + err =3D tdh_vp_flush(to_tdx(vcpu)->tdvpr_pa); + if (unlikely(err && err !=3D TDX_VCPU_NOT_ASSOCIATED)) { + /* + * This function is called in IPI context. Do not use + * printk to avoid console semaphore. + * The caller prints out the error message, instead. + */ + if (err) + arg->err =3D err; + } + } + + tdx_disassociate_vp(vcpu); +} + +static void tdx_flush_vp_on_cpu(struct kvm_vcpu *vcpu) +{ + struct tdx_flush_vp_arg arg =3D { + .vcpu =3D vcpu, + }; + int cpu =3D vcpu->cpu; + + if (unlikely(cpu =3D=3D -1)) + return; + + smp_call_function_single(cpu, tdx_flush_vp, &arg, 1); + if (WARN_ON_ONCE(arg.err)) { + pr_err("cpu: %d ", cpu); + pr_tdx_error(TDH_VP_FLUSH, arg.err, NULL); + } +} + static int tdx_do_tdh_phymem_cache_wb(void *param) { u64 err =3D 0; @@ -218,6 +313,8 @@ void tdx_mmu_release_hkid(struct kvm *kvm) struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); cpumask_var_t packages; bool cpumask_allocated; + struct kvm_vcpu *vcpu; + unsigned long j; u64 err; int ret; int i; @@ -228,6 +325,19 @@ void tdx_mmu_release_hkid(struct kvm *kvm) if (!is_td_created(kvm_tdx)) goto free_hkid; =20 + kvm_for_each_vcpu(j, vcpu, kvm) + tdx_flush_vp_on_cpu(vcpu); + + mutex_lock(&tdx_lock); + err =3D tdh_mng_vpflushdone(kvm_tdx->tdr_pa); + mutex_unlock(&tdx_lock); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MNG_VPFLUSHDONE, err, NULL); + pr_err("tdh_mng_vpflushdone failed. HKID %d is leaked.\n", + kvm_tdx->hkid); + return; + } + cpumask_allocated =3D zalloc_cpumask_var(&packages, GFP_KERNEL); cpus_read_lock(); for_each_online_cpu(i) { @@ -422,6 +532,26 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) return 0; } =20 +void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) +{ + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + + if (vcpu->cpu =3D=3D cpu) + return; + + tdx_flush_vp_on_cpu(vcpu); + + local_irq_disable(); + /* + * Pairs with the smp_wmb() in tdx_disassociate_vp() to ensure + * vcpu->cpu is read before tdx->cpu_list. 
+ */ + smp_rmb(); + + list_add(&tdx->cpu_list, &per_cpu(associated_tdvcpus, cpu)); + local_irq_enable(); +} + void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) { struct vcpu_tdx *tdx =3D to_tdx(vcpu); @@ -476,6 +606,19 @@ void tdx_vcpu_free(struct kvm_vcpu *vcpu) } tdx_reclaim_td_page(tdx->tdvpr_pa); tdx->tdvpr_pa =3D 0; + + /* + * kvm_free_vcpus() + * -> kvm_unload_vcpu_mmu() + * + * does vcpu_load() for every vcpu after they already disassociated + * from the per cpu list when tdx_vm_teardown(). So we need to + * disassociate them again, otherwise the freed vcpu data will be + * accessed when do list_{del,add}() on associated_tdvcpus list + * later. + */ + tdx_flush_vp_on_cpu(vcpu); + WARN_ON_ONCE(vcpu->cpu !=3D -1); } =20 void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) @@ -1756,3 +1899,12 @@ int tdx_offline_cpu(void) "Delete all TDs in order to offline all CPUs of a package.\n"); return ret; } + +int __init tdx_init(void) +{ + int cpu; + + for_each_possible_cpu(cpu) + INIT_LIST_HEAD(&per_cpu(associated_tdvcpus, cpu)); + return 0; +} diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index e66e5762ae04..1595c124899d 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -62,6 +62,8 @@ struct vcpu_tdx { unsigned long tdvpr_pa; unsigned long *tdvpx_pa; =20 + struct list_head cpu_list; + union tdx_exit_reason exit_reason; =20 bool initialized; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index bafbf4e06a5b..49f8d63dd91a 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -138,8 +138,11 @@ void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu); void vmx_setup_mce(struct kvm_vcpu *vcpu); =20 #ifdef CONFIG_INTEL_TDX_HOST +int __init tdx_init(void); int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops); void tdx_hardware_unsetup(void); +void tdx_hardware_enable(void); +void tdx_hardware_disable(void); bool tdx_is_vm_type_supported(unsigned long type); int tdx_dev_ioctl(void __user *argp); int tdx_offline_cpu(void); @@ -157,6 +160,7 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_ev= ent); fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu); void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu); void tdx_vcpu_put(struct kvm_vcpu *vcpu); +void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu); u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); =20 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); @@ -165,8 +169,10 @@ void tdx_flush_tlb(struct kvm_vcpu *vcpu); int tdx_sept_tlb_remote_flush(struct kvm *kvm); void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_leve= l); #else +static inline int tdx_init(void) { return 0; }; static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return= -ENOSYS; } static inline void tdx_hardware_unsetup(void) {} +static inline void tdx_hardware_disable(void) {} static inline bool tdx_is_vm_type_supported(unsigned long type) { return f= alse; } static inline int tdx_dev_ioctl(void __user *argp) { return -EOPNOTSUPP; }; static inline int tdx_offline_cpu(void) { return 0; } @@ -185,6 +191,7 @@ static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu= , bool init_event) {} static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) { return EXIT= _FASTPATH_NONE; } static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {} static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {} +static inline void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {} static inline u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, 
bool is= _mmio) { return 0; } =20 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)= { return -EOPNOTSUPP; } --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 24170C64ED8 for ; Mon, 27 Feb 2023 08:28:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231674AbjB0I2v (ORCPT ); Mon, 27 Feb 2023 03:28:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57070 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231487AbjB0I2U (ORCPT ); Mon, 27 Feb 2023 03:28:20 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7454B1EFD2; Mon, 27 Feb 2023 00:25:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486323; x=1709022323; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=7jyAWIM/5SZwDrzTqOaEufAEey8qKjzXI2tIcfboZvM=; b=dn0sNxLfG7Ng5qMMsrrcwwCgDGbKvAs06NC4UE505uTl/KR6AN1bAPdh 44K9bf/3UoPYZqbNzJUjgfZs2JoMm7NBvEuQB4EhuL32CcQEg77rhQKp0 b75bbXEqsVeI9+53PpnV7i2fIglYWI6NuI+gD5tOF96OVAL7OBwuJofKi q5HUbI7GtQ45V8JbpqIwyNR8ynEhgQIoiPplAx0zQ+saCpvPoTM2rnx60 iomRHYHXoP6E2du833JPXz3I06qeZ2IW+ldsFCS31fkWEbaKL8Qt4uwEr nDqMGP3qzBL2gmnaOgjiv3HCegunqOur9WWcqNF+YoFtaPuPRl7FmXjZY w==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608958" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608958" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:15 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242308" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242308" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:14 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang , Xiaoyao Li , Sean Christopherson , Chao Gao Subject: [PATCH v12 066/106] KVM: x86: Add a switch_db_regs flag to handle TDX's auto-switched behavior Date: Mon, 27 Feb 2023 00:23:05 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Add a flag, KVM_DEBUGREG_AUTO_SWITCHED_GUEST, to skip saving/restoring DRs irrespective of any other flags. TDX-SEAM unconditionally saves and restores guest DRs and reset to architectural INIT state on TD exit. So, KVM needs to save host DRs before TD enter without restoring guest DRs and restore host DRs after TD exit. Opportunistically convert the KVM_DEBUGREG_* definitions to use BIT(). 
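To make the new flag's intended effect concrete, here is a rough standalone model (hypothetical helper names, heavily simplified; not the kernel code) of how KVM_DEBUGREG_AUTO_SWITCH gates the debug-register switching on entry and exit:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define KVM_DEBUGREG_BP_ENABLED		(1u << 0)
#define KVM_DEBUGREG_WONT_EXIT		(1u << 1)
#define KVM_DEBUGREG_AUTO_SWITCH	(1u << 2)

struct vcpu_model {
	unsigned int switch_db_regs;
	uint64_t eff_db[4];
};

/* Stubs standing in for the real debug-register accessors. */
static void set_dr_model(int reg, uint64_t val)
{
	printf("DR%d <- %#llx\n", reg, (unsigned long long)val);
}

static bool host_breakpoints_active_model(void) { return true; }
static uint64_t host_dr7_model(void) { return 0x401; }

static void full_host_dr_restore_model(void)
{
	printf("restore host DR0-3, DR6, DR7\n");
}

static void before_enter(struct vcpu_model *v)
{
	/* Load guest DRs only when software switching is actually needed. */
	if (v->switch_db_regs & ~KVM_DEBUGREG_AUTO_SWITCH) {
		set_dr_model(7, 0);
		for (int i = 0; i < 4; i++)
			set_dr_model(i, v->eff_db[i]);
	}
}

static void after_exit(struct vcpu_model *v)
{
	if (!host_breakpoints_active_model())
		return;
	if (!(v->switch_db_regs & KVM_DEBUGREG_AUTO_SWITCH))
		full_host_dr_restore_model();		/* VMX-style full restore */
	else
		set_dr_model(7, host_dr7_model());	/* TDX: only DR7 was cleared */
}

int main(void)
{
	struct vcpu_model td_vcpu = { .switch_db_regs = KVM_DEBUGREG_AUTO_SWITCH };

	before_enter(&td_vcpu);	/* no guest DR loads: the TDX module handles them */
	after_exit(&td_vcpu);	/* only DR7 is reloaded */
	return 0;
}

A TD vcpu sets only KVM_DEBUGREG_AUTO_SWITCH, so the entry path loads no guest DRs (the TDX module swaps them) and the exit path reloads only DR7, which hardware clears on TD exit.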
Reported-by: Xiaoyao Li Signed-off-by: Sean Christopherson Co-developed-by: Chao Gao Signed-off-by: Chao Gao Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm_host.h | 10 ++++++++-- arch/x86/kvm/vmx/tdx.c | 1 + arch/x86/kvm/x86.c | 11 ++++++++--- 3 files changed, 17 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index fdfb37e31fa3..fc06f0885206 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -602,8 +602,14 @@ struct kvm_pmu { struct kvm_pmu_ops; =20 enum { - KVM_DEBUGREG_BP_ENABLED =3D 1, - KVM_DEBUGREG_WONT_EXIT =3D 2, + KVM_DEBUGREG_BP_ENABLED =3D BIT(0), + KVM_DEBUGREG_WONT_EXIT =3D BIT(1), + /* + * Guest debug registers (DR0-3 and DR6) are saved/restored by hardware + * on exit from or enter to guest. KVM needn't switch them. Because DR7 + * is cleared on exit from guest, DR7 need to be saved/restored. + */ + KVM_DEBUGREG_AUTO_SWITCH =3D BIT(2), }; =20 struct kvm_mtrr_range { diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 8b5a3d852e57..20f33218f069 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -518,6 +518,7 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) =20 vcpu->arch.efer =3D EFER_SCE | EFER_LME | EFER_LMA | EFER_NX; =20 + vcpu->arch.switch_db_regs =3D KVM_DEBUGREG_AUTO_SWITCH; vcpu->arch.cr0_guest_owned_bits =3D -1ul; vcpu->arch.cr4_guest_owned_bits =3D -1ul; =20 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 049ec2fcfef0..1b10f54c8acf 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10632,7 +10632,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (vcpu->arch.guest_fpu.xfd_err) wrmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err); =20 - if (unlikely(vcpu->arch.switch_db_regs)) { + if (unlikely(vcpu->arch.switch_db_regs & ~KVM_DEBUGREG_AUTO_SWITCH)) { set_debugreg(0, 7); set_debugreg(vcpu->arch.eff_db[0], 0); set_debugreg(vcpu->arch.eff_db[1], 1); @@ -10675,6 +10675,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) */ if (unlikely(vcpu->arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT)) { WARN_ON(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP); + WARN_ON(vcpu->arch.switch_db_regs & KVM_DEBUGREG_AUTO_SWITCH); static_call(kvm_x86_sync_dirty_debug_regs)(vcpu); kvm_update_dr0123(vcpu); kvm_update_dr7(vcpu); @@ -10687,8 +10688,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) * care about the messed up debug address registers. But if * we have some of them active, restore the old state. 
 */
-	if (hw_breakpoint_active())
-		hw_breakpoint_restore();
+	if (hw_breakpoint_active()) {
+		if (!(vcpu->arch.switch_db_regs & KVM_DEBUGREG_AUTO_SWITCH))
+			hw_breakpoint_restore();
+		else
+			set_debugreg(__this_cpu_read(cpu_dr7), 7);
+	}
 
 	vcpu->arch.last_vmentry_cpu = vcpu->cpu;
 	vcpu->arch.last_guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
--
2.25.1

From nobody Tue Sep 9 16:53:37 2025
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id ED5A4C64ED6 for ; Mon, 27 Feb 2023 08:28:57 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231680AbjB0I2z (ORCPT ); Mon, 27 Feb 2023 03:28:55 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57140 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230225AbjB0I2X (ORCPT ); Mon, 27 Feb 2023 03:28:23 -0500
Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A08931E9E5; Mon, 27 Feb 2023 00:25:25 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486325; x=1709022325; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=dYK7xY/icl9Sg8vYIdr9Yck02Aqi+i591xQC1tLpJVg=; b=VUdgT5I4GosrhXP3ErFKShQO0LbHEWlP85hN9s2RhRFvsaczCwCO/KQg quCRak2v9HrQ26cfyUsGjgeG9mbwLNYVD8faBb2Y4bY4lwHwYQAhFYnDn eAWYrZJBaFi/mQ5qoanyGMQaBHT2JV8he0MR0egtyMfSCeT3yrab9zvtU eoFmlbHzgxUQYzaJFFavJ77dGzmMfTM30rVfy7V5CRWZTvwJP8EEvaE+E G7LsU7owhTIhQUDeRfUoDzjmNNpulAIHqUWnBuzCL7zz5Zeix847QajAs xVEeQvnK/MIcmasJyGggDS1+jhwfPDaSHaNZfe7YP3IfYH3pGMtm54lKT w==;
X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608963"
X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608963"
Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:15 -0800
X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242313"
X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242313"
Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:14 -0800
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang
Subject: [PATCH v12 067/106] KVM: TDX: Add support for finding a pending IRQ in a protected local APIC
Date: Mon, 27 Feb 2023 00:23:06 -0800
Message-Id:
X-Mailer: git-send-email 2.25.1
In-Reply-To:
References:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Sean Christopherson

Add a flag and hook to KVM's local APIC management to support determining whether or not a TDX guest has a pending IRQ. For TDX vCPUs, the virtual APIC page is owned by the TDX module and cannot be accessed by KVM. As a result, registers that are virtualized by the CPU, e.g. PPR, cannot be read or written by KVM. To deliver interrupts for TDX guests, KVM must send an IRQ to the CPU on the posted interrupt notification vector.
And to determine if TDX vCPU has a pending interrupt, KVM must check if there is an outstanding notification. Return "no interrupt" in kvm_apic_has_interrupt() if the guest APIC is protected to short-circuit the various other flows that try to pull an IRQ out of the vAPIC, the only valid operation is querying _if_ an IRQ is pending, KVM can't do anything based on _which_ IRQ is pending. Intentionally omit sanity checks from other flows, e.g. PPR update, so as not to degrade non-TDX guests with unnecessary checks. A well-behaved KVM and userspace will never reach those flows for TDX guests, but reaching them is not fatal if something does go awry. Note, this doesn't handle interrupts that have been delivered to the vCPU but not yet recognized by the core, i.e. interrupts that are sitting in vmcs.GUEST_INTR_STATUS. Querying that state requires a SEAMCALL and will be supported in a future patch. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/irq.c | 3 +++ arch/x86/kvm/lapic.c | 3 +++ arch/x86/kvm/lapic.h | 2 ++ arch/x86/kvm/vmx/main.c | 11 +++++++++++ arch/x86/kvm/vmx/tdx.c | 6 ++++++ arch/x86/kvm/vmx/x86_ops.h | 2 ++ 8 files changed, 29 insertions(+) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index 2681300ce142..e1242c4b248f 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -118,6 +118,7 @@ KVM_X86_OP_OPTIONAL(pi_update_irte) KVM_X86_OP_OPTIONAL(pi_start_assignment) KVM_X86_OP_OPTIONAL(apicv_post_state_restore) KVM_X86_OP_OPTIONAL_RET0(dy_apicv_has_pending_interrupt) +KVM_X86_OP_OPTIONAL(protected_apic_has_interrupt) KVM_X86_OP_OPTIONAL(set_hv_timer) KVM_X86_OP_OPTIONAL(cancel_hv_timer) KVM_X86_OP(setup_mce) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index fc06f0885206..2051ae6da619 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1749,6 +1749,7 @@ struct kvm_x86_ops { void (*pi_start_assignment)(struct kvm *kvm); void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu); bool (*dy_apicv_has_pending_interrupt)(struct kvm_vcpu *vcpu); + bool (*protected_apic_has_interrupt)(struct kvm_vcpu *vcpu); =20 int (*set_hv_timer)(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc, bool *expired); diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c index b2c397dd2bc6..fd6af5530c32 100644 --- a/arch/x86/kvm/irq.c +++ b/arch/x86/kvm/irq.c @@ -100,6 +100,9 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v) if (kvm_cpu_has_extint(v)) return 1; =20 + if (lapic_in_kernel(v) && v->arch.apic->guest_apic_protected) + return static_call(kvm_x86_protected_apic_has_interrupt)(v); + return kvm_apic_has_interrupt(v) !=3D -1; /* LAPIC */ } EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt); diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 80f92cbc4029..8c99a9d7b39b 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2785,6 +2785,9 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu) if (!kvm_apic_present(vcpu)) return -1; =20 + if (apic->guest_apic_protected) + return -1; + __apic_update_ppr(apic, &ppr); return apic_has_interrupt_for_ppr(apic, ppr); } diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index df316ede7546..b2f652c0db33 100644 --- a/arch/x86/kvm/lapic.h +++ b/arch/x86/kvm/lapic.h @@ -66,6 +66,8 @@ struct kvm_lapic { bool sw_enabled; bool irr_pending; bool lvt0_in_nmi_mode; + /* Select registers in the vAPIC cannot be 
read/written. */ + bool guest_apic_protected; /* Number of bits set in ISR. */ s16 isr_count; /* The highest vector set in ISR; if -1 - invalid, must scan ISR. */ diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 2749d6995638..f5c569f43ccd 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -49,6 +49,9 @@ static __init int vt_hardware_setup(void) =20 enable_tdx =3D enable_tdx && !tdx_hardware_setup(&vt_x86_ops); =20 + if (!enable_tdx) + vt_x86_ops.protected_apic_has_interrupt =3D NULL; + /* * As kvm_mmu_set_ept_masks() updates enable_mmio_caching, call it * before checking enable_mmio_caching. @@ -208,6 +211,13 @@ static void vt_vcpu_load(struct kvm_vcpu *vcpu, int cp= u) vmx_vcpu_load(vcpu, cpu); } =20 +static bool vt_protected_apic_has_interrupt(struct kvm_vcpu *vcpu) +{ + KVM_BUG_ON(!is_td_vcpu(vcpu), vcpu->kvm); + + return tdx_protected_apic_has_interrupt(vcpu); +} + static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) { @@ -400,6 +410,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .sync_pir_to_irr =3D vmx_sync_pir_to_irr, .deliver_interrupt =3D vmx_deliver_interrupt, .dy_apicv_has_pending_interrupt =3D pi_has_pending_interrupt, + .protected_apic_has_interrupt =3D vt_protected_apic_has_interrupt, =20 .set_tss_addr =3D vmx_set_tss_addr, .set_identity_map_addr =3D vmx_set_identity_map_addr, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 20f33218f069..f5a3150ecff1 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -515,6 +515,7 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) return -EINVAL; =20 fpstate_set_confidential(&vcpu->arch.guest_fpu); + vcpu->arch.apic->guest_apic_protected =3D true; =20 vcpu->arch.efer =3D EFER_SCE | EFER_LME | EFER_LMA | EFER_NX; =20 @@ -553,6 +554,11 @@ void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) local_irq_enable(); } =20 +bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu) +{ + return pi_has_pending_interrupt(vcpu); +} + void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) { struct vcpu_tdx *tdx =3D to_tdx(vcpu); diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 49f8d63dd91a..238a948c3bdb 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -161,6 +161,7 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu); void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu); void tdx_vcpu_put(struct kvm_vcpu *vcpu); void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu); +bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu); u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); =20 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); @@ -192,6 +193,7 @@ static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *= vcpu) { return EXIT_FASTP static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {} static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {} static inline void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {} +static inline bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu)= { return false; } static inline u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is= _mmio) { return 0; } =20 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)= { return -EOPNOTSUPP; } --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with 
ESMTP id 7D3A3C7EE2D for ; Mon, 27 Feb 2023 08:29:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231690AbjB0I3D (ORCPT ); Mon, 27 Feb 2023 03:29:03 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56928 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231501AbjB0I2Y (ORCPT ); Mon, 27 Feb 2023 03:28:24 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3074B1EFEC; Mon, 27 Feb 2023 00:25:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486328; x=1709022328; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=GFLU4367mK+hlnmpk1ajR6jDNEWYTf2ro/PSD8hACls=; b=V4Tm2UjOeY88IZbF8Au7baazQlq+j8Ac1a4eIzakNK6KycXAAYeSfb5k kUhNdtvFwCTnLjwbgPG+Ji6ahI+IZ5nMz/f571IGDK+h70LTo/mkPg2gu SE5O4Y+3EeC61GGFuCiM7+VRQcdyWi6aI5/6729xroDb1SKdpIp0ugBLU t5Hh3FbLneX/TM0crSTRDJx3Do1mRy8lZC6mCNVu1t27kOSmInSCwnfGA Nz8+2v8MNZS7z5DgXlKiDqFsmr8BRuyXezuno5+XZiZ0+MCjbeWn12t7g tHz3DW2Q3cMjd6DKCiCh03mGcv2X1v3YTwi59qtCPQDYpEzEOj4tYLO1C A==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608964" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608964" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:15 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242318" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242318" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:14 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 068/106] KVM: x86: Assume timer IRQ was injected if APIC state is proteced Date: Mon, 27 Feb 2023 00:23:07 -0800 Message-Id: <562d136ca6bb573f59c4bfa67f8376da27f795f2.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson If APIC state is protected, i.e. the vCPU is a TDX guest, assume a timer IRQ was injected when deciding whether or not to busy wait in the "timer advanced" path. The "real" vIRR is not readable/writable, so trying to query for a pending timer IRQ will return garbage. Note, TDX can scour the PIR if it wants to be more precise and skip the "wait" call entirely. Signed-off-by: Sean Christopherson --- arch/x86/kvm/lapic.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 8c99a9d7b39b..eae1459f8283 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1694,8 +1694,17 @@ static void apic_update_lvtt(struct kvm_lapic *apic) static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu) { struct kvm_lapic *apic =3D vcpu->arch.apic; - u32 reg =3D kvm_lapic_get_reg(apic, APIC_LVTT); + u32 reg; =20 + /* + * Assume a timer IRQ was "injected" if the APIC is protected. 
KVM's
+	 * copy of the vIRR is bogus; it's the responsibility of the caller to
+	 * precisely check whether or not a timer IRQ is pending.
+	 */
+	if (apic->guest_apic_protected)
+		return true;
+
+	reg = kvm_lapic_get_reg(apic, APIC_LVTT);
 	if (kvm_apic_hw_enabled(apic)) {
 		int vec = reg & APIC_VECTOR_MASK;
 		void *bitmap = apic->regs + APIC_ISR;
--
2.25.1

From nobody Tue Sep 9 16:53:37 2025
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 67B0DC64ED6 for ; Mon, 27 Feb 2023 08:29:12 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231710AbjB0I3L (ORCPT ); Mon, 27 Feb 2023 03:29:11 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57826 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230506AbjB0I2Z (ORCPT ); Mon, 27 Feb 2023 03:28:25 -0500
Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D0DB31E5E4; Mon, 27 Feb 2023 00:25:28 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486328; x=1709022328; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=dgh98XI2cNAkvEv0Hnd0+5QoSVsbNBqK22P5hIqCA8I=; b=WRGIIu1qbU03Bkjaa11KOgUL4noQOjtmWunLMAUTw9lq5z2TKMEFHjOl y0K5COAGmv3RRGdmAYWVLA6GPcdNb/ZSoEEt2xQ96m+MEphynNjrdWsb8 Bj7qa5ar2pjna3zfyl+C5IgeKb+GC6RLncAfW3hmrImg3WEXY/2lNNp9p bxUJxWNZ8BbOHB0zn5dqTGVRcTDmhDDGoHCC5zQj7OZSpzEfKl6HuR21f POVVjXAFqj+iL22IxkO9XdbdkIFOc7+q37Jz5ZAH4yI27te0pjmm74FYn 9NLyU7eqAee8jmjGcXopXEpxr20TPjkLl5tmtnqD4SGFKQXj4g5DfPDDI A==;
X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317608967"
X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317608967"
Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:15 -0800
X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242322"
X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242322"
Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:14 -0800
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang
Subject: [PATCH v12 069/106] KVM: TDX: remove use of struct vcpu_vmx from posted_interrupt.c
Date: Mon, 27 Feb 2023 00:23:08 -0800
Message-Id: <38d9695753c768657b77b88b06a3c74785a264c3.1677484918.git.isaku.yamahata@intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To:
References:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Isaku Yamahata

As TDX will use posted_interrupt.c, the use of struct vcpu_vmx is a blocker. Because the members struct pi_desc pi_desc and struct list_head pi_wakeup_list are only used in posted_interrupt.c, introduce a common structure, struct vcpu_pi, and make vcpu_vmx and vcpu_tdx share the same layout at the top of the structure. To minimize the diff size, avoid code conversions like vmx->pi_desc => vmx->common->pi_desc.
Instead add compile time check if the layout is expected. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/posted_intr.c | 41 ++++++++++++++++++++++++++-------- arch/x86/kvm/vmx/posted_intr.h | 11 +++++++++ arch/x86/kvm/vmx/tdx.c | 1 + arch/x86/kvm/vmx/tdx.h | 8 +++++++ arch/x86/kvm/vmx/vmx.h | 14 +++++++----- 5 files changed, 60 insertions(+), 15 deletions(-) diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c index 94c38bea60e7..92de016852ca 100644 --- a/arch/x86/kvm/vmx/posted_intr.c +++ b/arch/x86/kvm/vmx/posted_intr.c @@ -11,6 +11,7 @@ #include "posted_intr.h" #include "trace.h" #include "vmx.h" +#include "tdx.h" =20 /* * Maintain a per-CPU list of vCPUs that need to be awakened by wakeup_han= dler() @@ -31,9 +32,29 @@ static DEFINE_PER_CPU(struct list_head, wakeup_vcpus_on_= cpu); */ static DEFINE_PER_CPU(raw_spinlock_t, wakeup_vcpus_on_cpu_lock); =20 +/* + * The layout of the head of struct vcpu_vmx and struct vcpu_tdx must matc= h with + * struct vcpu_pi. + */ +static_assert(offsetof(struct vcpu_pi, pi_desc) =3D=3D + offsetof(struct vcpu_vmx, pi_desc)); +static_assert(offsetof(struct vcpu_pi, pi_wakeup_list) =3D=3D + offsetof(struct vcpu_vmx, pi_wakeup_list)); +#ifdef CONFIG_INTEL_TDX_HOST +static_assert(offsetof(struct vcpu_pi, pi_desc) =3D=3D + offsetof(struct vcpu_tdx, pi_desc)); +static_assert(offsetof(struct vcpu_pi, pi_wakeup_list) =3D=3D + offsetof(struct vcpu_tdx, pi_wakeup_list)); +#endif + +static inline struct vcpu_pi *vcpu_to_pi(struct kvm_vcpu *vcpu) +{ + return (struct vcpu_pi *)vcpu; +} + static inline struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu) { - return &(to_vmx(vcpu)->pi_desc); + return &vcpu_to_pi(vcpu)->pi_desc; } =20 static int pi_try_set_control(struct pi_desc *pi_desc, u64 *pold, u64 new) @@ -52,8 +73,8 @@ static int pi_try_set_control(struct pi_desc *pi_desc, u6= 4 *pold, u64 new) =20 void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu) { - struct pi_desc *pi_desc =3D vcpu_to_pi_desc(vcpu); - struct vcpu_vmx *vmx =3D to_vmx(vcpu); + struct vcpu_pi *vcpu_pi =3D vcpu_to_pi(vcpu); + struct pi_desc *pi_desc =3D &vcpu_pi->pi_desc; struct pi_desc old, new; unsigned long flags; unsigned int dest; @@ -90,7 +111,7 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu) */ if (pi_desc->nv =3D=3D POSTED_INTR_WAKEUP_VECTOR) { raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu)); - list_del(&vmx->pi_wakeup_list); + list_del(&vcpu_pi->pi_wakeup_list); raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu)); } =20 @@ -145,15 +166,15 @@ static bool vmx_can_use_vtd_pi(struct kvm *kvm) */ static void pi_enable_wakeup_handler(struct kvm_vcpu *vcpu) { - struct pi_desc *pi_desc =3D vcpu_to_pi_desc(vcpu); - struct vcpu_vmx *vmx =3D to_vmx(vcpu); + struct vcpu_pi *vcpu_pi =3D vcpu_to_pi(vcpu); + struct pi_desc *pi_desc =3D &vcpu_pi->pi_desc; struct pi_desc old, new; unsigned long flags; =20 local_irq_save(flags); =20 raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu)); - list_add_tail(&vmx->pi_wakeup_list, + list_add_tail(&vcpu_pi->pi_wakeup_list, &per_cpu(wakeup_vcpus_on_cpu, vcpu->cpu)); raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu)); =20 @@ -190,7 +211,8 @@ static bool vmx_needs_pi_wakeup(struct kvm_vcpu *vcpu) * notification vector is switched to the one that calls * back to the pi_wakeup_handler() function. 
*/ - return vmx_can_use_ipiv(vcpu) || vmx_can_use_vtd_pi(vcpu->kvm); + return (vmx_can_use_ipiv(vcpu) && !is_td_vcpu(vcpu)) || + vmx_can_use_vtd_pi(vcpu->kvm); } =20 void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu) @@ -200,7 +222,8 @@ void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu) if (!vmx_needs_pi_wakeup(vcpu)) return; =20 - if (kvm_vcpu_is_blocking(vcpu) && !vmx_interrupt_blocked(vcpu)) + if (kvm_vcpu_is_blocking(vcpu) && + (is_td_vcpu(vcpu) || !vmx_interrupt_blocked(vcpu))) pi_enable_wakeup_handler(vcpu); =20 /* diff --git a/arch/x86/kvm/vmx/posted_intr.h b/arch/x86/kvm/vmx/posted_intr.h index 26992076552e..2fe8222308b2 100644 --- a/arch/x86/kvm/vmx/posted_intr.h +++ b/arch/x86/kvm/vmx/posted_intr.h @@ -94,6 +94,17 @@ static inline bool pi_test_sn(struct pi_desc *pi_desc) (unsigned long *)&pi_desc->control); } =20 +struct vcpu_pi { + struct kvm_vcpu vcpu; + + /* Posted interrupt descriptor */ + struct pi_desc pi_desc; + + /* Used if this vCPU is waiting for PI notification wakeup. */ + struct list_head pi_wakeup_list; + /* Until here common layout betwwn vcpu_vmx and vcpu_tdx. */ +}; + void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu); void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu); void pi_wakeup_handler(void); diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index f5a3150ecff1..6d93289f201b 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -516,6 +516,7 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) =20 fpstate_set_confidential(&vcpu->arch.guest_fpu); vcpu->arch.apic->guest_apic_protected =3D true; + INIT_LIST_HEAD(&tdx->pi_wakeup_list); =20 vcpu->arch.efer =3D EFER_SCE | EFER_LME | EFER_LMA | EFER_NX; =20 diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 1595c124899d..cee7b4bc0d0a 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -4,6 +4,7 @@ =20 #ifdef CONFIG_INTEL_TDX_HOST =20 +#include "posted_intr.h" #include "pmu_intel.h" #include "tdx_ops.h" =20 @@ -59,6 +60,13 @@ union tdx_exit_reason { struct vcpu_tdx { struct kvm_vcpu vcpu; =20 + /* Posted interrupt descriptor */ + struct pi_desc pi_desc; + + /* Used if this vCPU is waiting for PI notification wakeup. */ + struct list_head pi_wakeup_list; + /* Until here same layout to struct vcpu_pi. */ + unsigned long tdvpr_pa; unsigned long *tdvpx_pa; =20 diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 1813caeb24d8..0a7ab0a7d604 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -245,6 +245,14 @@ struct nested_vmx { =20 struct vcpu_vmx { struct kvm_vcpu vcpu; + + /* Posted interrupt descriptor */ + struct pi_desc pi_desc; + + /* Used if this vCPU is waiting for PI notification wakeup. */ + struct list_head pi_wakeup_list; + /* Until here same layout to struct vcpu_pi. */ + u8 fail; u8 x2apic_msr_bitmap_mode; =20 @@ -314,12 +322,6 @@ struct vcpu_vmx { =20 union vmx_exit_reason exit_reason; =20 - /* Posted interrupt descriptor */ - struct pi_desc pi_desc; - - /* Used if this vCPU is waiting for PI notification wakeup. 
 */
-	struct list_head pi_wakeup_list;
-
 	/* Support for a guest hypervisor (nested VMX) */
 	struct nested_vmx nested;

-- 
2.25.1

From nobody Tue Sep 9 16:53:37 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang
Subject: [PATCH v12 070/106] KVM: TDX: Implement interrupt injection
Date: Mon, 27 Feb 2023 00:23:09 -0800
Message-Id: <47a7d9af34207ee94d6ef63cb3aa24ec3499ccb4.1677484918.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

TDX supports interrupt injection into a vcpu via posted interrupts. Wire up
the corresponding KVM x86 operations to the posted-interrupt machinery, and
move kvm_vcpu_trigger_posted_interrupt() from vmx.c to common.h so the code
can be shared.

VMX can inject an interrupt by programming the interrupt-information field,
VM_ENTRY_INTR_INFO_FIELD, of the VMCS. TDX supports interrupt injection
only via posted interrupts, so ignore the execution paths that access
VM_ENTRY_INTR_INFO_FIELD. Because the vcpu state is protected and APICv is
enabled for TDX guests, the VMM injects an interrupt by updating the posted
interrupt descriptor; treat interrupts as always injectable.
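For orientation, the delivery flow this patch consolidates can be sketched
as follows (a simplified summary using names from the diffs below, not a
verbatim copy of the final code):

static void deliver_posted_interrupt_sketch(struct kvm_vcpu *vcpu,
					    struct pi_desc *pid, int vector)
{
	/* 1. Record the vector in the Posted-Interrupt Request bitmap. */
	if (pi_test_and_set_pir(vector, pid))
		return;

	/* 2. Set Outstanding Notification; if already set, the IPI was sent. */
	if (pi_test_and_set_on(pid))
		return;

	/*
	 * 3. Notify the target: send the notification IPI if it is running
	 * in non-root mode, otherwise wake it so the PIR is synced to the
	 * vIRR on the next VM entry.
	 */
	kvm_vcpu_trigger_posted_interrupt(vcpu, POSTED_INTR_VECTOR);
}

The same flow serves both VMX and TDX vcpus because it touches only the
shared posted-interrupt descriptor and never the VMCS.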
Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/vmx/common.h | 71 ++++++++++++++++++++++++++ arch/x86/kvm/vmx/main.c | 93 ++++++++++++++++++++++++++++++---- arch/x86/kvm/vmx/posted_intr.c | 2 +- arch/x86/kvm/vmx/posted_intr.h | 2 + arch/x86/kvm/vmx/tdx.c | 24 +++++++++ arch/x86/kvm/vmx/vmx.c | 67 +----------------------- arch/x86/kvm/vmx/x86_ops.h | 7 ++- 7 files changed, 189 insertions(+), 77 deletions(-) diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h index 422b24af7fc1..39ddead9d2bd 100644 --- a/arch/x86/kvm/vmx/common.h +++ b/arch/x86/kvm/vmx/common.h @@ -4,6 +4,7 @@ =20 #include =20 +#include "posted_intr.h" #include "mmu.h" =20 u8 __vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio, bool = check_cr0_cd); @@ -32,4 +33,74 @@ static inline int __vmx_handle_ept_violation(struct kvm_= vcpu *vcpu, gpa_t gpa, return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0); } =20 +static inline void kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu, + int pi_vec) +{ +#ifdef CONFIG_SMP + if (vcpu->mode =3D=3D IN_GUEST_MODE) { + /* + * The vector of the virtual has already been set in the PIR. + * Send a notification event to deliver the virtual interrupt + * unless the vCPU is the currently running vCPU, i.e. the + * event is being sent from a fastpath VM-Exit handler, in + * which case the PIR will be synced to the vIRR before + * re-entering the guest. + * + * When the target is not the running vCPU, the following + * possibilities emerge: + * + * Case 1: vCPU stays in non-root mode. Sending a notification + * event posts the interrupt to the vCPU. + * + * Case 2: vCPU exits to root mode and is still runnable. The + * PIR will be synced to the vIRR before re-entering the guest. + * Sending a notification event is ok as the host IRQ handler + * will ignore the spurious event. + * + * Case 3: vCPU exits to root mode and is blocked. vcpu_block() + * has already synced PIR to vIRR and never blocks the vCPU if + * the vIRR is not empty. Therefore, a blocked vCPU here does + * not wait for any requested interrupts in PIR, and sending a + * notification event also results in a benign, spurious event. + */ + + if (vcpu !=3D kvm_get_running_vcpu()) + apic->send_IPI_mask(get_cpu_mask(vcpu->cpu), pi_vec); + return; + } +#endif + /* + * The vCPU isn't in the guest; wake the vCPU in case it is blocking, + * otherwise do nothing as KVM will grab the highest priority pending + * IRQ via ->sync_pir_to_irr() in vcpu_enter_guest(). + */ + kvm_vcpu_wake_up(vcpu); +} + +/* + * Send interrupt to vcpu via posted interrupt way. + * 1. If target vcpu is running(non-root mode), send posted interrupt + * notification to vcpu and hardware will sync PIR to vIRR atomically. + * 2. If target vcpu isn't running(root mode), kick it to pick up the + * interrupt from PIR in next vmentry. + */ +static inline void __vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, + struct pi_desc *pi_desc, int vector) +{ + if (pi_test_and_set_pir(vector, pi_desc)) + return; + + /* If a previous notification has sent the IPI, nothing to do. */ + if (pi_test_and_set_on(pi_desc)) + return; + + /* + * The implied barrier in pi_test_and_set_on() pairs with the smp_mb_*() + * after setting vcpu->mode in vcpu_enter_guest(), thus the vCPU is + * guaranteed to see PID.ON=3D1 and sync the PIR to IRR if triggering a + * posted interrupt "fails" because vcpu->mode !=3D IN_GUEST_MODE. 
+ */ + kvm_vcpu_trigger_posted_interrupt(vcpu, POSTED_INTR_VECTOR); +} + #endif /* __KVM_X86_VMX_COMMON_H */ diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index f5c569f43ccd..5fe87253668a 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -218,6 +218,34 @@ static bool vt_protected_apic_has_interrupt(struct kvm= _vcpu *vcpu) return tdx_protected_apic_has_interrupt(vcpu); } =20 +static void vt_apicv_post_state_restore(struct kvm_vcpu *vcpu) +{ + struct pi_desc *pi =3D vcpu_to_pi_desc(vcpu); + + pi_clear_on(pi); + memset(pi->pir, 0, sizeof(pi->pir)); +} + +static int vt_sync_pir_to_irr(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return -1; + + return vmx_sync_pir_to_irr(vcpu); +} + +static void vt_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, + int trig_mode, int vector) +{ + if (is_td_vcpu(apic->vcpu)) { + tdx_deliver_interrupt(apic, delivery_mode, trig_mode, + vector); + return; + } + + vmx_deliver_interrupt(apic, delivery_mode, trig_mode, vector); +} + static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) { @@ -285,6 +313,53 @@ static void vt_sched_in(struct kvm_vcpu *vcpu, int cpu) vmx_sched_in(vcpu, cpu); } =20 +static void vt_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask) +{ + if (is_td_vcpu(vcpu)) + return; + vmx_set_interrupt_shadow(vcpu, mask); +} + +static u32 vt_get_interrupt_shadow(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return 0; + + return vmx_get_interrupt_shadow(vcpu); +} + +static void vt_inject_irq(struct kvm_vcpu *vcpu, bool reinjected) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_inject_irq(vcpu, reinjected); +} + +static void vt_cancel_injection(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_cancel_injection(vcpu); +} + +static int vt_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection) +{ + if (is_td_vcpu(vcpu)) + return true; + + return vmx_interrupt_allowed(vcpu, for_injection); +} + +static void vt_enable_irq_window(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_enable_irq_window(vcpu); +} + static u8 vt_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) { if (is_td_vcpu(vcpu)) @@ -384,31 +459,31 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .handle_exit =3D vmx_handle_exit, .skip_emulated_instruction =3D vmx_skip_emulated_instruction, .update_emulated_instruction =3D vmx_update_emulated_instruction, - .set_interrupt_shadow =3D vmx_set_interrupt_shadow, - .get_interrupt_shadow =3D vmx_get_interrupt_shadow, + .set_interrupt_shadow =3D vt_set_interrupt_shadow, + .get_interrupt_shadow =3D vt_get_interrupt_shadow, .patch_hypercall =3D vmx_patch_hypercall, - .inject_irq =3D vmx_inject_irq, + .inject_irq =3D vt_inject_irq, .inject_nmi =3D vmx_inject_nmi, .inject_exception =3D vmx_inject_exception, - .cancel_injection =3D vmx_cancel_injection, - .interrupt_allowed =3D vmx_interrupt_allowed, + .cancel_injection =3D vt_cancel_injection, + .interrupt_allowed =3D vt_interrupt_allowed, .nmi_allowed =3D vmx_nmi_allowed, .get_nmi_mask =3D vmx_get_nmi_mask, .set_nmi_mask =3D vmx_set_nmi_mask, .enable_nmi_window =3D vmx_enable_nmi_window, - .enable_irq_window =3D vmx_enable_irq_window, + .enable_irq_window =3D vt_enable_irq_window, .update_cr8_intercept =3D vmx_update_cr8_intercept, .set_virtual_apic_mode =3D vmx_set_virtual_apic_mode, .set_apic_access_page_addr =3D vmx_set_apic_access_page_addr, .refresh_apicv_exec_ctrl =3D vmx_refresh_apicv_exec_ctrl, .load_eoi_exitmap =3D vmx_load_eoi_exitmap, - .apicv_post_state_restore 
=3D vmx_apicv_post_state_restore, + .apicv_post_state_restore =3D vt_apicv_post_state_restore, .required_apicv_inhibits =3D VMX_REQUIRED_APICV_INHIBITS, .hwapic_irr_update =3D vmx_hwapic_irr_update, .hwapic_isr_update =3D vmx_hwapic_isr_update, .guest_apic_has_interrupt =3D vmx_guest_apic_has_interrupt, - .sync_pir_to_irr =3D vmx_sync_pir_to_irr, - .deliver_interrupt =3D vmx_deliver_interrupt, + .sync_pir_to_irr =3D vt_sync_pir_to_irr, + .deliver_interrupt =3D vt_deliver_interrupt, .dy_apicv_has_pending_interrupt =3D pi_has_pending_interrupt, .protected_apic_has_interrupt =3D vt_protected_apic_has_interrupt, =20 diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c index 92de016852ca..2b2da6c18504 100644 --- a/arch/x86/kvm/vmx/posted_intr.c +++ b/arch/x86/kvm/vmx/posted_intr.c @@ -52,7 +52,7 @@ static inline struct vcpu_pi *vcpu_to_pi(struct kvm_vcpu = *vcpu) return (struct vcpu_pi *)vcpu; } =20 -static inline struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu) +struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu) { return &vcpu_to_pi(vcpu)->pi_desc; } diff --git a/arch/x86/kvm/vmx/posted_intr.h b/arch/x86/kvm/vmx/posted_intr.h index 2fe8222308b2..0f9983b6910b 100644 --- a/arch/x86/kvm/vmx/posted_intr.h +++ b/arch/x86/kvm/vmx/posted_intr.h @@ -105,6 +105,8 @@ struct vcpu_pi { /* Until here common layout betwwn vcpu_vmx and vcpu_tdx. */ }; =20 +struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu); + void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu); void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu); void pi_wakeup_handler(void); diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 6d93289f201b..455060f39018 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -529,6 +529,9 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) vcpu->arch.guest_state_protected =3D !(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTRIBUTE_DEBUG); =20 + tdx->pi_desc.nv =3D POSTED_INTR_VECTOR; + tdx->pi_desc.sn =3D 1; + tdx->host_state_need_save =3D true; tdx->host_state_need_restore =3D false; =20 @@ -539,6 +542,7 @@ void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) { struct vcpu_tdx *tdx =3D to_tdx(vcpu); =20 + vmx_vcpu_pi_load(vcpu, cpu); if (vcpu->cpu =3D=3D cpu) return; =20 @@ -724,6 +728,12 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) =20 trace_kvm_entry(vcpu); =20 + if (pi_test_on(&tdx->pi_desc)) { + apic->send_IPI_self(POSTED_INTR_VECTOR); + + kvm_wait_lapic_expire(vcpu); + } + tdx_vcpu_enter_exit(vcpu, tdx); =20 tdx_user_return_update_cache(); @@ -1055,6 +1065,16 @@ static int tdx_sept_remove_private_spte(struct kvm *= kvm, gfn_t gfn, return tdx_sept_drop_private_spte(kvm, gfn, level, pfn); } =20 +void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, + int trig_mode, int vector) +{ + struct kvm_vcpu *vcpu =3D apic->vcpu; + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + + /* TDX supports only posted interrupt. No lapic emulation. 
*/ + __vmx_deliver_posted_interrupt(vcpu, &tdx->pi_desc, vector); +} + int tdx_dev_ioctl(void __user *argp) { struct kvm_tdx_capabilities __user *user_caps; @@ -1770,6 +1790,10 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __use= r *argp) if (ret) return ret; =20 + td_vmcs_write16(tdx, POSTED_INTR_NV, POSTED_INTR_VECTOR); + td_vmcs_write64(tdx, POSTED_INTR_DESC_ADDR, __pa(&tdx->pi_desc)); + td_vmcs_setbit32(tdx, PIN_BASED_VM_EXEC_CONTROL, PIN_BASED_POSTED_INTR); + tdx->initialized =3D true; return 0; } diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 72da86abf989..f68ee2f5586b 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -4144,50 +4144,6 @@ void vmx_msr_filter_changed(struct kvm_vcpu *vcpu) pt_update_intercept_for_msr(vcpu); } =20 -static inline void kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu, - int pi_vec) -{ -#ifdef CONFIG_SMP - if (vcpu->mode =3D=3D IN_GUEST_MODE) { - /* - * The vector of the virtual has already been set in the PIR. - * Send a notification event to deliver the virtual interrupt - * unless the vCPU is the currently running vCPU, i.e. the - * event is being sent from a fastpath VM-Exit handler, in - * which case the PIR will be synced to the vIRR before - * re-entering the guest. - * - * When the target is not the running vCPU, the following - * possibilities emerge: - * - * Case 1: vCPU stays in non-root mode. Sending a notification - * event posts the interrupt to the vCPU. - * - * Case 2: vCPU exits to root mode and is still runnable. The - * PIR will be synced to the vIRR before re-entering the guest. - * Sending a notification event is ok as the host IRQ handler - * will ignore the spurious event. - * - * Case 3: vCPU exits to root mode and is blocked. vcpu_block() - * has already synced PIR to vIRR and never blocks the vCPU if - * the vIRR is not empty. Therefore, a blocked vCPU here does - * not wait for any requested interrupts in PIR, and sending a - * notification event also results in a benign, spurious event. - */ - - if (vcpu !=3D kvm_get_running_vcpu()) - apic->send_IPI_mask(get_cpu_mask(vcpu->cpu), pi_vec); - return; - } -#endif - /* - * The vCPU isn't in the guest; wake the vCPU in case it is blocking, - * otherwise do nothing as KVM will grab the highest priority pending - * IRQ via ->sync_pir_to_irr() in vcpu_enter_guest(). - */ - kvm_vcpu_wake_up(vcpu); -} - static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu, int vector) { @@ -4240,20 +4196,7 @@ static int vmx_deliver_posted_interrupt(struct kvm_v= cpu *vcpu, int vector) if (!vcpu->arch.apic->apicv_active) return -1; =20 - if (pi_test_and_set_pir(vector, &vmx->pi_desc)) - return 0; - - /* If a previous notification has sent the IPI, nothing to do. */ - if (pi_test_and_set_on(&vmx->pi_desc)) - return 0; - - /* - * The implied barrier in pi_test_and_set_on() pairs with the smp_mb_*() - * after setting vcpu->mode in vcpu_enter_guest(), thus the vCPU is - * guaranteed to see PID.ON=3D1 and sync the PIR to IRR if triggering a - * posted interrupt "fails" because vcpu->mode !=3D IN_GUEST_MODE. 
- */
-	kvm_vcpu_trigger_posted_interrupt(vcpu, POSTED_INTR_VECTOR);
+	__vmx_deliver_posted_interrupt(vcpu, &vmx->pi_desc, vector);
 	return 0;
 }

@@ -6886,14 +6829,6 @@ void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
 	vmcs_write64(EOI_EXIT_BITMAP3, eoi_exit_bitmap[3]);
 }

-void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
-{
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-
-	pi_clear_on(&vmx->pi_desc);
-	memset(vmx->pi_desc.pir, 0, sizeof(vmx->pi_desc.pir));
-}
-
 void vmx_do_interrupt_nmi_irqoff(unsigned long entry);

 static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 238a948c3bdb..fdad8f0edd6a 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -62,7 +62,6 @@ int vmx_check_intercept(struct kvm_vcpu *vcpu,
 bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu);
 void vmx_migrate_timers(struct kvm_vcpu *vcpu);
 void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu);
-void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu);
 bool vmx_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason);
 void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr);
 void vmx_hwapic_isr_update(int max_isr);
@@ -164,6 +163,9 @@ void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu);
 u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);

+void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
+			   int trig_mode, int vector);
+
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);

 void tdx_flush_tlb(struct kvm_vcpu *vcpu);
@@ -196,6 +198,9 @@ static inline void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {}
 static inline bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu) { return false; }
 static inline u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) { return 0; }

+static inline void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
+					 int trig_mode, int vector) {}
+
 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }

 static inline void tdx_flush_tlb(struct kvm_vcpu *vcpu) {}
-- 
2.25.1

From nobody Tue Sep 9 16:53:37 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang
Subject: [PATCH v12 071/106] KVM: TDX: Implement vcpu request_immediate_exit
Date: Mon, 27 Feb 2023 00:23:10 -0800
Message-Id: <645e278b405ced7d0ba5f7fd60564f37ecd1cd5a.1677484918.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

Now that interrupts can be injected into a TDX vcpu, the vcpu is ready to
block. Wire up the KVM x86 methods for blocking/unblocking a vcpu for TDX.
To unblock on pending events, the request_immediate_exit method is also
needed.

Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/main.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 5fe87253668a..cfcd5818f8cb 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -360,6 +360,16 @@ static void vt_enable_irq_window(struct kvm_vcpu *vcpu)
 	vmx_enable_irq_window(vcpu);
 }

+static void vt_request_immediate_exit(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu)) {
+		__kvm_request_immediate_exit(vcpu);
+		return;
+	}
+
+	vmx_request_immediate_exit(vcpu);
+}
+
 static u8 vt_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
 	if (is_td_vcpu(vcpu))
@@ -507,7 +517,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.check_intercept = vmx_check_intercept,
 	.handle_exit_irqoff = vmx_handle_exit_irqoff,

-	.request_immediate_exit = vmx_request_immediate_exit,
+	.request_immediate_exit = vt_request_immediate_exit,

 	.sched_in = vt_sched_in,

-- 
2.25.1

From nobody Tue Sep 9 16:53:37 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang
Subject: [PATCH v12 072/106] KVM: TDX: Implement methods to inject NMI
Date: Mon, 27 Feb 2023 00:23:11 -0800
Message-Id: <4dcdb9d3fd2d25d9ba8e2b68248f6d19978b9ffa.1677484918.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

The TDX vcpu control structure defines one bit for a pending NMI so that
the VMM can inject an NMI by setting the bit without knowing the TDX vcpu's
NMI state. Because the vcpu state is protected, the VMM can't know the NMI
state of a TDX vcpu; the TDX module handles the actual injection and the
NMI state transitions.

Add methods for NMI handling and treat NMIs as always injectable.

Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/main.c    | 64 +++++++++++++++++++++++++++++++++++---
 arch/x86/kvm/vmx/tdx.c     |  5 +++
 arch/x86/kvm/vmx/x86_ops.h |  2 ++
 3 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index cfcd5818f8cb..134f03a891b4 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -294,6 +294,60 @@ static void vt_flush_tlb_guest(struct kvm_vcpu *vcpu)
 	vmx_flush_tlb_guest(vcpu);
 }

+static void vt_inject_nmi(struct kvm_vcpu *vcpu)
+{
+	if (is_td_vcpu(vcpu)) {
+		tdx_inject_nmi(vcpu);
+		return;
+	}
+
+	vmx_inject_nmi(vcpu);
+}
+
+static int vt_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+{
+	/*
+	 * The TDX module manages NMI windows and NMI reinjection, and hides NMI
+	 * blocking, all KVM can do is throw an NMI over the wall.
+	 */
+	if (is_td_vcpu(vcpu))
+		return true;
+
+	return vmx_nmi_allowed(vcpu, for_injection);
+}
+
+static bool vt_get_nmi_mask(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Assume NMIs are always unmasked. KVM could query PEND_NMI and treat
+	 * NMIs as masked if a previous NMI is still pending, but SEAMCALLs are
+	 * expensive and the end result is unchanged as the only relevant usage
+	 * of get_nmi_mask() is to limit the number of pending NMIs, i.e. it
+	 * only changes whether KVM or the TDX module drops an NMI.
+	 */
+	if (is_td_vcpu(vcpu))
+		return false;
+
+	return vmx_get_nmi_mask(vcpu);
+}
+
+static void vt_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
+{
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_set_nmi_mask(vcpu, masked);
+}
+
+static void vt_enable_nmi_window(struct kvm_vcpu *vcpu)
+{
+	/* Refer the comment in vt_get_nmi_mask(). */
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_enable_nmi_window(vcpu);
+}
+
 static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 			    int pgd_level)
 {
@@ -473,14 +527,14 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.get_interrupt_shadow = vt_get_interrupt_shadow,
 	.patch_hypercall = vmx_patch_hypercall,
 	.inject_irq = vt_inject_irq,
-	.inject_nmi = vmx_inject_nmi,
+	.inject_nmi = vt_inject_nmi,
 	.inject_exception = vmx_inject_exception,
 	.cancel_injection = vt_cancel_injection,
 	.interrupt_allowed = vt_interrupt_allowed,
-	.nmi_allowed = vmx_nmi_allowed,
-	.get_nmi_mask = vmx_get_nmi_mask,
-	.set_nmi_mask = vmx_set_nmi_mask,
-	.enable_nmi_window = vmx_enable_nmi_window,
+	.nmi_allowed = vt_nmi_allowed,
+	.get_nmi_mask = vt_get_nmi_mask,
+	.set_nmi_mask = vt_set_nmi_mask,
+	.enable_nmi_window = vt_enable_nmi_window,
 	.enable_irq_window = vt_enable_irq_window,
 	.update_cr8_intercept = vmx_update_cr8_intercept,
 	.set_virtual_apic_mode = vmx_set_virtual_apic_mode,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 455060f39018..42ab1f13a48f 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -749,6 +749,11 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu)
 	return EXIT_FASTPATH_NONE;
 }

+void tdx_inject_nmi(struct kvm_vcpu *vcpu)
+{
+	td_management_write8(to_tdx(vcpu), TD_VCPU_PEND_NMI, 1);
+}
+
 void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
 {
 	td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa & PAGE_MASK);
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index fdad8f0edd6a..3aaee10aaa29 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -165,6 +165,7 @@ u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);

 void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
 			   int trig_mode, int vector);
+void tdx_inject_nmi(struct kvm_vcpu *vcpu);

 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);

@@ -200,6 +201,7 @@ static inline u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)

 static inline void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
 					 int trig_mode, int vector) {}
+static inline void tdx_inject_nmi(struct kvm_vcpu *vcpu) {}

 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }

-- 
2.25.1

From nobody Tue Sep 9 16:53:37 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang, Sean Christopherson
Subject: [PATCH v12 073/106] KVM: VMX: Modify NMI and INTR handlers to take intr_info as function argument
Date: Mon, 27 Feb 2023 00:23:12 -0800

From: Sean Christopherson

TDX uses a different ABI to get information about VM exits. Pass intr_info
to the NMI and INTR handlers instead of pulling it from vcpu_vmx, in
preparation for sharing the bulk of the handlers with TDX.

When a guest TD exits to the VMM, RAX holds the status and exit reason and
RCX holds the exit qualification, etc., rather than the VMCS fields,
because the VMM has no access to the VMCS.
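In sketch form (illustrative only; the tdexit_*() accessor shown here is
introduced by a later patch in this series):

	/* VMX: exit information is read from the VMCS. */
	intr_info = vmx_get_intr_info(vcpu);

	/* TDX: the TDX module hands the same information over in GPRs. */
	intr_info = kvm_r9_read(vcpu);		/* see tdexit_intr_info() */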
The eventual code will be:

VMX:
- get exit reason, intr_info, exit_qualification, etc. from the VMCS
- call NMI/INTR handlers (common code)

TDX:
- get exit reason, intr_info, exit_qualification, etc. from guest registers
- call NMI/INTR handlers (common code)

Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/vmx.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f68ee2f5586b..5081679eba4a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6861,28 +6861,27 @@ static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu)
 	rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
 }

-static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
+static void handle_exception_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info)
 {
 	const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist;
-	u32 intr_info = vmx_get_intr_info(&vmx->vcpu);

 	/* if exit due to PF check for async PF */
 	if (is_page_fault(intr_info))
-		vmx->vcpu.arch.apf.host_apf_flags = kvm_read_and_reset_apf_flags();
+		vcpu->arch.apf.host_apf_flags = kvm_read_and_reset_apf_flags();
 	/* if exit due to NM, handle before interrupts are enabled */
 	else if (is_nm_fault(intr_info))
-		handle_nm_fault_irqoff(&vmx->vcpu);
+		handle_nm_fault_irqoff(vcpu);
 	/* Handle machine checks before interrupts are enabled */
 	else if (is_machine_check(intr_info))
 		kvm_machine_check();
 	/* We need to handle NMIs before interrupts are enabled */
 	else if (is_nmi(intr_info))
-		handle_interrupt_nmi_irqoff(&vmx->vcpu, nmi_entry);
+		handle_interrupt_nmi_irqoff(vcpu, nmi_entry);
 }

-static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
+static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu,
+					     u32 intr_info)
 {
-	u32 intr_info = vmx_get_intr_info(vcpu);
 	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
 	gate_desc *desc = (gate_desc *)host_idt_base + vector;

@@ -6902,9 +6901,9 @@ void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
 		return;

 	if (vmx->exit_reason.basic == EXIT_REASON_EXTERNAL_INTERRUPT)
-		handle_external_interrupt_irqoff(vcpu);
+		handle_external_interrupt_irqoff(vcpu, vmx_get_intr_info(vcpu));
 	else if (vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI)
-		handle_exception_nmi_irqoff(vmx);
+		handle_exception_nmi_irqoff(vcpu, vmx_get_intr_info(vcpu));
 }

 /*
-- 
2.25.1

From nobody Tue Sep 9 16:53:37 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang, Sean Christopherson
Subject: [PATCH v12 074/106] KVM: VMX: Move NMI/exception handler to common helper
Date: Mon, 27 Feb 2023 00:23:13 -0800

From: Sean Christopherson

TDX handles NMI/exception exits mostly the same way as VMX. The difference
is how the exit qualification is retrieved. To share the code with TDX,
move the NMI/exception handling to a common header, common.h.

Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
---
 arch/x86/kvm/vmx/common.h | 70 ++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.c    | 79 ++++-----------------------------------
 2 files changed, 78 insertions(+), 71 deletions(-)

diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h
index 39ddead9d2bd..9e138318696c 100644
--- a/arch/x86/kvm/vmx/common.h
+++ b/arch/x86/kvm/vmx/common.h
@@ -4,8 +4,78 @@

 #include

+#include
+
 #include "posted_intr.h"
 #include "mmu.h"
+#include "vmcs.h"
+#include "x86.h"
+
+extern unsigned long vmx_host_idt_base;
+void vmx_do_interrupt_nmi_irqoff(unsigned long entry);
+
+static inline void vmx_handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
+						   unsigned long entry)
+{
+	bool is_nmi = entry == (unsigned long)asm_exc_nmi_noist;
+
+	kvm_before_interrupt(vcpu, is_nmi ? KVM_HANDLING_NMI : KVM_HANDLING_IRQ);
+	vmx_do_interrupt_nmi_irqoff(entry);
+	kvm_after_interrupt(vcpu);
+}
+
+static inline void vmx_handle_nm_fault_irqoff(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Save xfd_err to guest_fpu before interrupt is enabled, so the
+	 * MSR value is not clobbered by the host activity before the guest
+	 * has chance to consume it.
+	 *
+	 * Do not blindly read xfd_err here, since this exception might
+	 * be caused by L1 interception on a platform which doesn't
+	 * support xfd at all.
+	 *
+	 * Do it conditionally upon guest_fpu::xfd. xfd_err matters
+	 * only when xfd contains a non-zero value.
+	 *
+	 * Queuing exception is done in vmx_handle_exit. See comment there.
+ */ + if (vcpu->arch.guest_fpu.fpstate->xfd) + rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err); +} + +static inline void vmx_handle_exception_nmi_irqoff(struct kvm_vcpu *vcpu, + u32 intr_info) +{ + const unsigned long nmi_entry =3D (unsigned long)asm_exc_nmi_noist; + + /* if exit due to PF check for async PF */ + if (is_page_fault(intr_info)) + vcpu->arch.apf.host_apf_flags =3D kvm_read_and_reset_apf_flags(); + /* if exit due to NM, handle before interrupts are enabled */ + else if (is_nm_fault(intr_info)) + vmx_handle_nm_fault_irqoff(vcpu); + /* Handle machine checks before interrupts are enabled */ + else if (is_machine_check(intr_info)) + kvm_machine_check(); + /* We need to handle NMIs before interrupts are enabled */ + else if (is_nmi(intr_info)) + vmx_handle_interrupt_nmi_irqoff(vcpu, nmi_entry); +} + +static inline void vmx_handle_external_interrupt_irqoff(struct kvm_vcpu *v= cpu, + u32 intr_info) +{ + unsigned int vector =3D intr_info & INTR_INFO_VECTOR_MASK; + gate_desc *desc =3D (gate_desc *)vmx_host_idt_base + vector; + + if (KVM_BUG(!is_external_intr(intr_info), vcpu->kvm, + "unexpected VM-Exit interrupt info: 0x%x", intr_info)) + return; + + vmx_handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc)); + vcpu->arch.at_instruction_boundary =3D true; +} =20 u8 __vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio, bool = check_cr0_cd); =20 diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 5081679eba4a..17d047a73550 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -526,7 +526,7 @@ static inline void vmx_segment_cache_clear(struct vcpu_= vmx *vmx) vmx->segment_cache.bitmask =3D 0; } =20 -static unsigned long host_idt_base; +unsigned long vmx_host_idt_base; =20 #if IS_ENABLED(CONFIG_HYPERV) static bool __read_mostly enlightened_vmcs =3D true; @@ -4260,7 +4260,7 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx) vmcs_write16(HOST_SS_SELECTOR, __KERNEL_DS); /* 22.2.4 */ vmcs_write16(HOST_TR_SELECTOR, GDT_ENTRY_TSS*8); /* 22.2.4 */ =20 - vmcs_writel(HOST_IDTR_BASE, host_idt_base); /* 22.2.4 */ + vmcs_writel(HOST_IDTR_BASE, vmx_host_idt_base); /* 22.2.4 */ =20 vmcs_writel(HOST_RIP, (unsigned long)vmx_vmexit); /* 22.2.5 */ =20 @@ -5151,10 +5151,10 @@ static int handle_exception_nmi(struct kvm_vcpu *vc= pu) intr_info =3D vmx_get_intr_info(vcpu); =20 if (is_machine_check(intr_info) || is_nmi(intr_info)) - return 1; /* handled by handle_exception_nmi_irqoff() */ + return 1; /* handled by vmx_handle_exception_nmi_irqoff() */ =20 /* - * Queue the exception here instead of in handle_nm_fault_irqoff(). + * Queue the exception here instead of in vmx_handle_nm_fault_irqoff(). * This ensures the nested_vmx check is not skipped so vmexit can * be reflected to L1 (when it intercepts #NM) before reaching this * point. @@ -6829,70 +6829,6 @@ void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64= *eoi_exit_bitmap) vmcs_write64(EOI_EXIT_BITMAP3, eoi_exit_bitmap[3]); } =20 -void vmx_do_interrupt_nmi_irqoff(unsigned long entry); - -static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, - unsigned long entry) -{ - bool is_nmi =3D entry =3D=3D (unsigned long)asm_exc_nmi_noist; - - kvm_before_interrupt(vcpu, is_nmi ? 
KVM_HANDLING_NMI : KVM_HANDLING_IRQ);
-	vmx_do_interrupt_nmi_irqoff(entry);
-	kvm_after_interrupt(vcpu);
-}
-
-static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu)
-{
-	/*
-	 * Save xfd_err to guest_fpu before interrupt is enabled, so the
-	 * MSR value is not clobbered by the host activity before the guest
-	 * has chance to consume it.
-	 *
-	 * Do not blindly read xfd_err here, since this exception might
-	 * be caused by L1 interception on a platform which doesn't
-	 * support xfd at all.
-	 *
-	 * Do it conditionally upon guest_fpu::xfd. xfd_err matters
-	 * only when xfd contains a non-zero value.
-	 *
-	 * Queuing exception is done in vmx_handle_exit. See comment there.
-	 */
-	if (vcpu->arch.guest_fpu.fpstate->xfd)
-		rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
-}
-
-static void handle_exception_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info)
-{
-	const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist;
-
-	/* if exit due to PF check for async PF */
-	if (is_page_fault(intr_info))
-		vcpu->arch.apf.host_apf_flags = kvm_read_and_reset_apf_flags();
-	/* if exit due to NM, handle before interrupts are enabled */
-	else if (is_nm_fault(intr_info))
-		handle_nm_fault_irqoff(vcpu);
-	/* Handle machine checks before interrupts are enabled */
-	else if (is_machine_check(intr_info))
-		kvm_machine_check();
-	/* We need to handle NMIs before interrupts are enabled */
-	else if (is_nmi(intr_info))
-		handle_interrupt_nmi_irqoff(vcpu, nmi_entry);
-}
-
-static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu,
-					     u32 intr_info)
-{
-	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
-	gate_desc *desc = (gate_desc *)host_idt_base + vector;
-
-	if (KVM_BUG(!is_external_intr(intr_info), vcpu->kvm,
-	    "unexpected VM-Exit interrupt info: 0x%x", intr_info))
-		return;
-
-	handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
-	vcpu->arch.at_instruction_boundary = true;
-}
-
 void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -6901,9 +6837,10 @@ void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
 		return;

 	if (vmx->exit_reason.basic == EXIT_REASON_EXTERNAL_INTERRUPT)
-		handle_external_interrupt_irqoff(vcpu, vmx_get_intr_info(vcpu));
+		vmx_handle_external_interrupt_irqoff(vcpu,
+						     vmx_get_intr_info(vcpu));
 	else if (vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI)
-		handle_exception_nmi_irqoff(vcpu, vmx_get_intr_info(vcpu));
+		vmx_handle_exception_nmi_irqoff(vcpu, vmx_get_intr_info(vcpu));
 }

 /*
@@ -8184,7 +8121,7 @@ __init int vmx_hardware_setup(void)
 	int r;

 	store_idt(&dt);
-	host_idt_base = dt.address;
+	vmx_host_idt_base = dt.address;

 	vmx_setup_user_return_msrs();

-- 
2.25.1

From nobody Tue Sep 9 16:53:37 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang, Sean Christopherson
Subject: [PATCH v12 075/106] KVM: x86: Split core of hypercall emulation to helper function
Date: Mon, 27 Feb 2023 00:23:14 -0800

From: Sean Christopherson

By necessity, TDX will use a different register ABI for hypercalls.
Break out the core functionality so that it may be reused for TDX.
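As an illustration of the intended reuse (a hypothetical caller, not part
of this patch): a TDX hypercall exit handler that receives the number and
arguments in guest GPRs per its own ABI could feed the shared helper
directly, e.g.:

	/*
	 * Hypothetical TDX-side use of the new helper; the r10-r14
	 * register mapping shown here is illustrative, not a final ABI.
	 */
	ret = __kvm_emulate_hypercall(vcpu, kvm_r10_read(vcpu),
				      kvm_r11_read(vcpu), kvm_r12_read(vcpu),
				      kvm_r13_read(vcpu), kvm_r14_read(vcpu),
				      true);

Only the register fetching differs between the callers; the hypercall
dispatch in __kvm_emulate_hypercall() stays common.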
Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
---
 arch/x86/include/asm/kvm_host.h |  4 +++
 arch/x86/kvm/x86.c              | 54 ++++++++++++++++++++-------------
 2 files changed, 37 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2051ae6da619..301059b2e882 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2085,6 +2085,10 @@ static inline void kvm_clear_apicv_inhibit(struct kvm *kvm,
 	kvm_set_or_clear_apicv_inhibit(kvm, reason, false);
 }

+unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
+				      unsigned long a0, unsigned long a1,
+				      unsigned long a2, unsigned long a3,
+				      int op_64_bit);
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);

 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1b10f54c8acf..94416992868b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9731,26 +9731,15 @@ static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
 	return kvm_skip_emulated_instruction(vcpu);
 }

-int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
+unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
+				      unsigned long a0, unsigned long a1,
+				      unsigned long a2, unsigned long a3,
+				      int op_64_bit)
 {
-	unsigned long nr, a0, a1, a2, a3, ret;
-	int op_64_bit;
-
-	if (kvm_xen_hypercall_enabled(vcpu->kvm))
-		return kvm_xen_hypercall(vcpu);
-
-	if (kvm_hv_hypercall_enabled(vcpu))
-		return kvm_hv_hypercall(vcpu);
-
-	nr = kvm_rax_read(vcpu);
-	a0 = kvm_rbx_read(vcpu);
-	a1 = kvm_rcx_read(vcpu);
-	a2 = kvm_rdx_read(vcpu);
-	a3 = kvm_rsi_read(vcpu);
+	unsigned long ret;

 	trace_kvm_hypercall(nr, a0, a1, a2, a3);

-	op_64_bit = is_64_bit_hypercall(vcpu);
 	if (!op_64_bit) {
 		nr &= 0xFFFFFFFF;
 		a0 &= 0xFFFFFFFF;
@@ -9759,11 +9748,6 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 		a3 &= 0xFFFFFFFF;
 	}

-	if (static_call(kvm_x86_get_cpl)(vcpu) != 0) {
-		ret = -KVM_EPERM;
-		goto out;
-	}
-
 	ret = -KVM_ENOSYS;

 	switch (nr) {
@@ -9822,6 +9806,34 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 		ret = -KVM_ENOSYS;
 		break;
 	}
+	return ret;
+}
+EXPORT_SYMBOL_GPL(__kvm_emulate_hypercall);
+
+int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
+{
+	unsigned long nr, a0, a1, a2, a3, ret;
+	int op_64_bit;
+
+	if (kvm_xen_hypercall_enabled(vcpu->kvm))
+		return kvm_xen_hypercall(vcpu);
+
+	if (kvm_hv_hypercall_enabled(vcpu))
+		return kvm_hv_hypercall(vcpu);
+
+	nr = kvm_rax_read(vcpu);
+	a0 = kvm_rbx_read(vcpu);
+	a1 = kvm_rcx_read(vcpu);
+	a2 = kvm_rdx_read(vcpu);
+	a3 = kvm_rsi_read(vcpu);
+	op_64_bit = is_64_bit_hypercall(vcpu);
+
+	if (static_call(kvm_x86_get_cpl)(vcpu) != 0) {
+		ret = -KVM_EPERM;
+		goto out;
+	}
+
+	ret = __kvm_emulate_hypercall(vcpu, nr, a0, a1, a2, a3, op_64_bit);
 out:
 	if (!op_64_bit)
 		ret = (u32)ret;
-- 
2.25.1

From nobody Tue Sep 9 16:53:37 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang
Subject: [PATCH v12 076/106] KVM: TDX: Add a placeholder to handle TDX VM exit
Date: Mon, 27 Feb 2023 00:23:15 -0800
Message-Id: <81acef289a9928eef080fe9b04740c5e6b239ec7.1677484918.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

Wire up the handle_exit and handle_exit_irqoff methods and add a
placeholder to handle VM exits. Add helper functions to retrieve the exit
info, exit qualification, etc.
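The register ABI the helpers below decode can be summarized as follows (a
sketch matching the tdexit_*() accessors in the diff; the TDX module passes
exit information in GPRs because KVM cannot read the VMCS of a TD):

	exit_qual     = kvm_rcx_read(vcpu);	/* tdexit_exit_qual() */
	ext_exit_qual = kvm_rdx_read(vcpu);	/* tdexit_ext_exit_qual() */
	gpa           = kvm_r8_read(vcpu);	/* tdexit_gpa() */
	intr_info     = kvm_r9_read(vcpu);	/* tdexit_intr_info() */

tdx_get_exit_info() then reports these values through the common
get_exit_info hook.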
Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/vmx/main.c | 37 ++++++++++++++-- arch/x86/kvm/vmx/tdx.c | 88 ++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/x86_ops.h | 10 +++++ 3 files changed, 132 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 134f03a891b4..fdb016ee6848 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -218,6 +218,25 @@ static bool vt_protected_apic_has_interrupt(struct kvm= _vcpu *vcpu) return tdx_protected_apic_has_interrupt(vcpu); } =20 +static int vt_handle_exit(struct kvm_vcpu *vcpu, + enum exit_fastpath_completion fastpath) +{ + if (is_td_vcpu(vcpu)) + return tdx_handle_exit(vcpu, fastpath); + + return vmx_handle_exit(vcpu, fastpath); +} + +static void vt_handle_exit_irqoff(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) { + tdx_handle_exit_irqoff(vcpu); + return; + } + + vmx_handle_exit_irqoff(vcpu); +} + static void vt_apicv_post_state_restore(struct kvm_vcpu *vcpu) { struct pi_desc *pi =3D vcpu_to_pi_desc(vcpu); @@ -424,6 +443,18 @@ static void vt_request_immediate_exit(struct kvm_vcpu = *vcpu) vmx_request_immediate_exit(vcpu); } =20 +static void vt_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code) +{ + if (is_td_vcpu(vcpu)) { + tdx_get_exit_info(vcpu, reason, info1, info2, intr_info, + error_code); + return; + } + + vmx_get_exit_info(vcpu, reason, info1, info2, intr_info, error_code); +} + static u8 vt_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) { if (is_td_vcpu(vcpu)) @@ -520,7 +551,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { =20 .vcpu_pre_run =3D vt_vcpu_pre_run, .vcpu_run =3D vt_vcpu_run, - .handle_exit =3D vmx_handle_exit, + .handle_exit =3D vt_handle_exit, .skip_emulated_instruction =3D vmx_skip_emulated_instruction, .update_emulated_instruction =3D vmx_update_emulated_instruction, .set_interrupt_shadow =3D vt_set_interrupt_shadow, @@ -555,7 +586,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .set_identity_map_addr =3D vmx_set_identity_map_addr, .get_mt_mask =3D vt_get_mt_mask, =20 - .get_exit_info =3D vmx_get_exit_info, + .get_exit_info =3D vt_get_exit_info, =20 .vcpu_after_set_cpuid =3D vmx_vcpu_after_set_cpuid, =20 @@ -569,7 +600,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .load_mmu_pgd =3D vt_load_mmu_pgd, =20 .check_intercept =3D vmx_check_intercept, - .handle_exit_irqoff =3D vmx_handle_exit_irqoff, + .handle_exit_irqoff =3D vt_handle_exit_irqoff, =20 .request_immediate_exit =3D vt_request_immediate_exit, =20 diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 42ab1f13a48f..1218896d8ba4 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -87,6 +87,26 @@ static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u= 16 hkid) return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits); } =20 +static __always_inline unsigned long tdexit_exit_qual(struct kvm_vcpu *vcp= u) +{ + return kvm_rcx_read(vcpu); +} + +static __always_inline unsigned long tdexit_ext_exit_qual(struct kvm_vcpu = *vcpu) +{ + return kvm_rdx_read(vcpu); +} + +static __always_inline unsigned long tdexit_gpa(struct kvm_vcpu *vcpu) +{ + return kvm_r8_read(vcpu); +} + +static __always_inline unsigned long tdexit_intr_info(struct kvm_vcpu *vcp= u) +{ + return kvm_r9_read(vcpu); +} + static inline bool is_td_vcpu_created(struct vcpu_tdx *tdx) { return tdx->tdvpr_pa; @@ -754,6 +774,25 @@ void tdx_inject_nmi(struct kvm_vcpu *vcpu) td_management_write8(to_tdx(vcpu), 
TD_VCPU_PEND_NMI, 1); } =20 +void tdx_handle_exit_irqoff(struct kvm_vcpu *vcpu) +{ + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + u16 exit_reason =3D tdx->exit_reason.basic; + + if (exit_reason =3D=3D EXIT_REASON_EXCEPTION_NMI) + vmx_handle_exception_nmi_irqoff(vcpu, tdexit_intr_info(vcpu)); + else if (exit_reason =3D=3D EXIT_REASON_EXTERNAL_INTERRUPT) + vmx_handle_external_interrupt_irqoff(vcpu, + tdexit_intr_info(vcpu)); +} + +static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu) +{ + vcpu->run->exit_reason =3D KVM_EXIT_SHUTDOWN; + vcpu->mmio_needed =3D 0; + return 0; +} + void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) { td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa & PAGE_MASK); @@ -1080,6 +1119,55 @@ void tdx_deliver_interrupt(struct kvm_lapic *apic, i= nt delivery_mode, __vmx_deliver_posted_interrupt(vcpu, &tdx->pi_desc, vector); } =20 +int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath) +{ + union tdx_exit_reason exit_reason =3D to_tdx(vcpu)->exit_reason; + + /* See the comment of tdh_sept_seamcall(). */ + if (unlikely(exit_reason.full =3D=3D (TDX_OPERAND_BUSY | TDX_OPERAND_ID_S= EPT))) + return 1; + + if (unlikely(exit_reason.non_recoverable || exit_reason.error)) { + if (exit_reason.basic =3D=3D EXIT_REASON_TRIPLE_FAULT) + return tdx_handle_triple_fault(vcpu); + + kvm_pr_unimpl("TD exit 0x%llx, %d hkid 0x%x hkid pa 0x%llx\n", + exit_reason.full, exit_reason.basic, + to_kvm_tdx(vcpu->kvm)->hkid, + set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid)); + goto unhandled_exit; + } + + WARN_ON_ONCE(fastpath !=3D EXIT_FASTPATH_NONE); + + switch (exit_reason.basic) { + default: + break; + } + +unhandled_exit: + vcpu->run->exit_reason =3D KVM_EXIT_INTERNAL_ERROR; + vcpu->run->internal.suberror =3D KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASO= N; + vcpu->run->internal.ndata =3D 2; + vcpu->run->internal.data[0] =3D exit_reason.full; + vcpu->run->internal.data[1] =3D vcpu->arch.last_vmentry_cpu; + return 0; +} + +void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code) +{ + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + + *reason =3D tdx->exit_reason.full; + + *info1 =3D tdexit_exit_qual(vcpu); + *info2 =3D tdexit_ext_exit_qual(vcpu); + + *intr_info =3D tdexit_intr_info(vcpu); + *error_code =3D 0; +} + int tdx_dev_ioctl(void __user *argp) { struct kvm_tdx_capabilities __user *user_caps; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 3aaee10aaa29..ac08f45d8c9e 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -161,11 +161,16 @@ void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcp= u); void tdx_vcpu_put(struct kvm_vcpu *vcpu); void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu); bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu); +void tdx_handle_exit_irqoff(struct kvm_vcpu *vcpu); +int tdx_handle_exit(struct kvm_vcpu *vcpu, + enum exit_fastpath_completion fastpath); u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); =20 void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, int trig_mode, int vector); void tdx_inject_nmi(struct kvm_vcpu *vcpu); +void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, + u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code); =20 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); =20 @@ -197,11 +202,16 @@ static inline void tdx_prepare_switch_to_guest(struct= kvm_vcpu *vcpu) {} static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {} static inline void 
tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {} static inline bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu)= { return false; } +static inline void tdx_handle_exit_irqoff(struct kvm_vcpu *vcpu) {} +static inline int tdx_handle_exit(struct kvm_vcpu *vcpu, + enum exit_fastpath_completion fastpath) { return 0; } static inline u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is= _mmio) { return 0; } =20 static inline void tdx_deliver_interrupt(struct kvm_lapic *apic, int deliv= ery_mode, int trig_mode, int vector) {} static inline void tdx_inject_nmi(struct kvm_vcpu *vcpu) {} +static inline void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u= 64 *info1, + u64 *info2, u32 *intr_info, u32 *error_code) {} =20 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)= { return -EOPNOTSUPP; } =20 --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EBDB5C7EE2E for ; Mon, 27 Feb 2023 08:29:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231785AbjB0I3o (ORCPT ); Mon, 27 Feb 2023 03:29:44 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56928 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231358AbjB0I3A (ORCPT ); Mon, 27 Feb 2023 03:29:00 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9933C1F92B; Mon, 27 Feb 2023 00:25:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486345; x=1709022345; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Ov5DgyTrv7L7li/EvdiDpXo3AQar4ALCRKLAlmlcPEo=; b=BqysAO8B7QLTTBzJijRb6KUtu5oOglEus3NiE2Y95FkvXxnT1ih+Mje/ pyy+aIF3PKEF1R7qWcdSspBboGAIv1VNALhECJ34wrtbV58/37tNBp3d1 gO7t7qHg+1d52TvfrqCSX5yZ5ipZr3GDkX/en4ZO9Sx/7ZGQW4J7HFFQt R3nrsKXCUIQLdbaCjhvsUSrLyAq1R621scL0989UVNqFJI0gBYve6dTtr 01fgYNqqgT5hmwIVAPwzn2Ux+hK1XHjRF2zP5RV9db1anHA0wQU0Sff2b 8kjCzmzn5kTCSjD8c3VrizxYxTJXOYKZ9oya+KxOw3qyMzfsLnU79uvJT A==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317609008" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317609008" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:17 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242356" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242356" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:16 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang , Yao Yuan Subject: [PATCH v12 077/106] KVM: TDX: Handle vmentry failure for INTEL TD guest Date: Mon, 27 Feb 2023 00:23:16 -0800 Message-Id: <1de5d5d4e406c4103aec47ccdb4ecf2a826a73be.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Yao Yuan TDX module passes control back to VMM if it failed to vmentry for a TD, use same exit reason to notify user space, align with VMX. If VMM corrupted TD VMCS, machine check during entry can happens. vm exit reason will be EXIT_REASON_MCE_DURING_VMENTRY. If VMM corrupted TD VMCS with debug TD by TDH.VP.WR, the exit reason would be EXIT_REASON_INVALID_STATE or EXIT_REASON_MSR_LOAD_FAIL. Signed-off-by: Yao Yuan --- arch/x86/kvm/vmx/tdx.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 1218896d8ba4..36c5a0b6e452 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1138,6 +1138,28 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_= t fastpath) goto unhandled_exit; } =20 + /* + * When TDX module saw VMEXIT_REASON_FAILED_VMENTER_MC etc, TDH.VP.ENTER + * returns with TDX_SUCCESS | exit_reason with failed_vmentry =3D 1. + * Because TDX module maintains TD VMCS correctness, usually vmentry + * failure shouldn't happen. In some corner cases it can happen. For + * example + * - machine check during entry: EXIT_REASON_MCE_DURING_VMENTRY + * - TDH.VP.WR with debug TD. VMM can corrupt TD VMCS + * - EXIT_REASON_INVALID_STATE + * - EXIT_REASON_MSR_LOAD_FAIL + */ + if (unlikely(exit_reason.failed_vmentry)) { + pr_err("TDExit: exit_reason 0x%016llx qualification=3D%016lx ext_qualifi= cation=3D%016lx\n", + exit_reason.full, tdexit_exit_qual(vcpu), tdexit_ext_exit_qual(vc= pu)); + vcpu->run->exit_reason =3D KVM_EXIT_FAIL_ENTRY; + vcpu->run->fail_entry.hardware_entry_failure_reason + =3D exit_reason.full; + vcpu->run->fail_entry.cpu =3D vcpu->arch.last_vmentry_cpu; + + return 0; + } + WARN_ON_ONCE(fastpath !=3D EXIT_FASTPATH_NONE); =20 switch (exit_reason.basic) { --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F2A29C64ED8 for ; Mon, 27 Feb 2023 08:29:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230505AbjB0I3r (ORCPT ); Mon, 27 Feb 2023 03:29:47 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56950 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231503AbjB0I3H (ORCPT ); Mon, 27 Feb 2023 03:29:07 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5DBDA2007F; Mon, 27 Feb 2023 00:25:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486349; x=1709022349; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=jtQyQ4YIe85j6uadyLZpZT+m3TocaRzDlQSxEgXebOY=; b=f/n4XUjs4djxOAXbga/10zXFznl1ElcO14p3cmxgD1wIOPE7Nh1gtG4o vDO7wU9eyU612IJYqFAn9pnL1SBpN1+QcsS9vPgNMA3Uqud7CObdDiuMm WRv3hPLMvykp2xZzKd7lAm1cBU0cCUH/n+DvKtTjI/s1U7JQZHvfDZDJ/ 5bT6fjTec9s040Dh7cd8ZbNKeFnudGX25SZOHb/C1Wlx1zUMYammFDgtH 3VPdrAo942KQBxgcVG5+s5iHv2KD9TGlAIKUZqOAedurn+8pB5oMdBjsP JqmbAa5Q2jv3EOCAhCoKO5eSRuKOwouxrQNW1yrrXShCvQMo4wE5UelfU Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317609013" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317609013" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by 
orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:17 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242361" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242361" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:17 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 078/106] KVM: TDX: handle EXIT_REASON_OTHER_SMI Date: Mon, 27 Feb 2023 00:23:17 -0800 Message-Id: <1f4b4525d501aa503950d36d3a75c907fb0531cb.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata If the control reaches EXIT_REASON_OTHER_SMI, #SMI is delivered and handled right after returning from the TDX module to KVM nothing needs to be done in KVM. Continue TDX vcpu execution. Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/include/uapi/asm/vmx.h | 1 + arch/x86/kvm/vmx/tdx.c | 7 +++++++ 2 files changed, 8 insertions(+) diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vm= x.h index a5faf6d88f1b..b3a30ef3efdd 100644 --- a/arch/x86/include/uapi/asm/vmx.h +++ b/arch/x86/include/uapi/asm/vmx.h @@ -34,6 +34,7 @@ #define EXIT_REASON_TRIPLE_FAULT 2 #define EXIT_REASON_INIT_SIGNAL 3 #define EXIT_REASON_SIPI_SIGNAL 4 +#define EXIT_REASON_OTHER_SMI 6 =20 #define EXIT_REASON_INTERRUPT_WINDOW 7 #define EXIT_REASON_NMI_WINDOW 8 diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 36c5a0b6e452..f2cafa107042 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1163,6 +1163,13 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_= t fastpath) WARN_ON_ONCE(fastpath !=3D EXIT_FASTPATH_NONE); =20 switch (exit_reason.basic) { + case EXIT_REASON_OTHER_SMI: + /* + * If reach here, it's not a Machine Check System Management + * Interrupt(MSMI). #SMI is delivered and handled right after + * SEAMRET, nothing needs to be done in KVM. 
+ */ + return 1; default: break; } --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B7952C64ED6 for ; Mon, 27 Feb 2023 08:29:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231804AbjB0I3u (ORCPT ); Mon, 27 Feb 2023 03:29:50 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57848 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230506AbjB0I3M (ORCPT ); Mon, 27 Feb 2023 03:29:12 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A2EDE1F912; Mon, 27 Feb 2023 00:25:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486351; x=1709022351; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=+j/O5c59ftnWOY3P/1CS7Aqjf84td7eyGA6wZZDWE1A=; b=ahZMquDMZekhvoD4TUedpm2imWNHsJqkNmxRURbzBuPfEiNp/tzt1K0K BFAnedru6S4yalxxOKT7Rb7jdEBt1Avi4GDRms4hnQQvPy3rzc/zVcUQ2 tYCNxGHmcsPckMT0AVCODcM6mENcNplkBAmETJCANIcR4qQSXsSlNmot8 tCnrdrORAEcxBBF5/kb+v4BKbCErjULWVG9P9XBc0KaYQWGN3D7kNaV2I VAiH5t6lUwYld8BHB7IQUsKaVxuqn4ge7ofIxjMMAdoqSOdDaJZOnF/Qx AliN5eUY6QyU+zGEGO48WaONScfYvbjVrVzxxYccD6yoUOWE8W3P48F+h w==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317609015" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317609015" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:17 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242365" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242365" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:17 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 079/106] KVM: TDX: handle ept violation/misconfig exit Date: Mon, 27 Feb 2023 00:23:18 -0800 Message-Id: <58fb0c71ecf2e8a54b3ae7844acb61b76f2b1a14.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata On EPT violation, call a common function, __vmx_handle_ept_violation() to trigger x86 MMU code. On EPT misconfiguration, exit to ring 3 with KVM_EXIT_UNKNOWN. because EPT misconfiguration can't happen as MMIO is trigged by TDG.VP.VMCALL. No point to set a misconfiguration value for the fast path. 
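[A small self-contained sketch — not from this patch — of the private-vs-shared GPA split that tdx_handle_ept_violation() below depends on via kvm_is_private_gpa(). The shared-bit position used here is purely illustrative; the real mask is GPAW-dependent (e.g. bit 47 or 51) and comes from the TDX module via the gfn_shared_mask plumbing earlier in the series.

#include <stdbool.h>
#include <stdint.h>

#define GPA_SHARED_BIT	51	/* assumption for illustration only */

static bool gpa_is_private(uint64_t gpa)
{
	/* Private GPAs have the shared bit clear; SEPT violations on
	 * them are treated as write faults by the handler below. */
	return !(gpa & (1ULL << GPA_SHARED_BIT));
}
]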
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/tdx.c | 46 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index f2cafa107042..e9d9b2e2300a 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1119,6 +1119,48 @@ void tdx_deliver_interrupt(struct kvm_lapic *apic, i= nt delivery_mode, __vmx_deliver_posted_interrupt(vcpu, &tdx->pi_desc, vector); } =20 +static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu) +{ + unsigned long exit_qual; + + if (kvm_is_private_gpa(vcpu->kvm, tdexit_gpa(vcpu))) { + /* + * Always treat SEPT violations as write faults. Ignore the + * EXIT_QUALIFICATION reported by TDX-SEAM for SEPT violations. + * TD private pages are always RWX in the SEPT tables, + * i.e. they're always mapped writable. Just as importantly, + * treating SEPT violations as write faults is necessary to + * avoid COW allocations, which will cause TDAUGPAGE failures + * due to aliasing a single HPA to multiple GPAs. + */ +#define TDX_SEPT_VIOLATION_EXIT_QUAL EPT_VIOLATION_ACC_WRITE + exit_qual =3D TDX_SEPT_VIOLATION_EXIT_QUAL; + } else { + exit_qual =3D tdexit_exit_qual(vcpu); + if (exit_qual & EPT_VIOLATION_ACC_INSTR) { + pr_warn("kvm: TDX instr fetch to shared GPA =3D 0x%lx @ RIP =3D 0x%lx\n= ", + tdexit_gpa(vcpu), kvm_rip_read(vcpu)); + vcpu->run->exit_reason =3D KVM_EXIT_EXCEPTION; + vcpu->run->ex.exception =3D PF_VECTOR; + vcpu->run->ex.error_code =3D exit_qual; + return 0; + } + } + + trace_kvm_page_fault(vcpu, tdexit_gpa(vcpu), exit_qual); + return __vmx_handle_ept_violation(vcpu, tdexit_gpa(vcpu), exit_qual); +} + +static int tdx_handle_ept_misconfig(struct kvm_vcpu *vcpu) +{ + WARN_ON_ONCE(1); + + vcpu->run->exit_reason =3D KVM_EXIT_UNKNOWN; + vcpu->run->hw.hardware_exit_reason =3D EXIT_REASON_EPT_MISCONFIG; + + return 0; +} + int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath) { union tdx_exit_reason exit_reason =3D to_tdx(vcpu)->exit_reason; @@ -1163,6 +1205,10 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_= t fastpath) WARN_ON_ONCE(fastpath !=3D EXIT_FASTPATH_NONE); =20 switch (exit_reason.basic) { + case EXIT_REASON_EPT_VIOLATION: + return tdx_handle_ept_violation(vcpu); + case EXIT_REASON_EPT_MISCONFIG: + return tdx_handle_ept_misconfig(vcpu); case EXIT_REASON_OTHER_SMI: /* * If reach here, it's not a Machine Check System Management --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3C101C64ED8 for ; Mon, 27 Feb 2023 08:31:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231966AbjB0Ibw (ORCPT ); Mon, 27 Feb 2023 03:31:52 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59244 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231604AbjB0I3S (ORCPT ); Mon, 27 Feb 2023 03:29:18 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DA70720567; Mon, 27 Feb 2023 00:25:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486353; x=1709022353; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=JlHvTaSpNKtx5l0pdLS+h+UoOp+BvjSJQEMljYn0BP4=; 
b=Al+ikSQcg54zYkq0YqTZjFQ2QH2PynzQHGcZ8ErK98j/yEqspX945MAU ZJppFZXW63inA44XM+Qfhut+6LnorBBX+PCBp0VDzP0HGMuPvxvd+JOXm bEFkaEoSifgoDl9XA9Nspg6v6JVTRQAAZxdVA5CliDzonWZd6soW/l+Rk 1KR7zd8T6ICE2uYqh2GHIfs3t8lhMyKl1re0fLQ+6ME1OUKlam9oMXWWR KL3TphBzI5LV/5IIybHE6/llk4co6pjaNEL9JyBOM3U99/mnlQMFXZXm1 pAwCOKHBxjQDWMDJI4nOA+SmYeaK22XnDRDAWEwx1TcFP9hM/QKEUZpPU g==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317609022" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317609022" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:17 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242368" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242368" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:17 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 080/106] KVM: TDX: handle EXCEPTION_NMI and EXTERNAL_INTERRUPT Date: Mon, 27 Feb 2023 00:23:19 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Because guest TD state is protected, exceptions in guest TDs can't be intercepted. TDX VMM doesn't need to handle exceptions. tdx_handle_exit_irqoff() handles NMI and machine check. Ignore NMI and machine check and continue guest TD execution. For external interrupt, increment stats same to the VMX case. 
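[The is_nmi()/is_machine_check() checks in the handler below test the architectural VT-x interrupt-information encoding (vector in bits 7:0, type in bits 10:8, valid in bit 31). A compact, self-contained sketch; the helper names here are illustrative, the field layout is architectural:

#include <stdbool.h>
#include <stdint.h>

#define INTR_INFO_VECTOR(x)	((x) & 0xffu)
#define INTR_INFO_TYPE(x)	(((x) >> 8) & 0x7u)
#define INTR_INFO_VALID(x)	(((x) >> 31) & 0x1u)
#define INTR_TYPE_NMI		2u
#define INTR_TYPE_HARD_EXCEPTION 3u
#define MC_VECTOR		18u

static bool intr_is_nmi(uint32_t info)
{
	return INTR_INFO_VALID(info) &&
	       INTR_INFO_TYPE(info) == INTR_TYPE_NMI;
}

static bool intr_is_machine_check(uint32_t info)
{
	return INTR_INFO_VALID(info) &&
	       INTR_INFO_TYPE(info) == INTR_TYPE_HARD_EXCEPTION &&
	       INTR_INFO_VECTOR(info) == MC_VECTOR;
}
]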
Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/vmx/tdx.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index e9d9b2e2300a..336fb3fc6f0e 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -786,6 +786,25 @@ void tdx_handle_exit_irqoff(struct kvm_vcpu *vcpu) tdexit_intr_info(vcpu)); } =20 +static int tdx_handle_exception(struct kvm_vcpu *vcpu) +{ + u32 intr_info =3D tdexit_intr_info(vcpu); + + if (is_nmi(intr_info) || is_machine_check(intr_info)) + return 1; + + kvm_pr_unimpl("unexpected exception 0x%x(exit_reason 0x%llx qual 0x%lx)\n= ", + intr_info, + to_tdx(vcpu)->exit_reason.full, tdexit_exit_qual(vcpu)); + return -EFAULT; +} + +static int tdx_handle_external_interrupt(struct kvm_vcpu *vcpu) +{ + ++vcpu->stat.irq_exits; + return 1; +} + static int tdx_handle_triple_fault(struct kvm_vcpu *vcpu) { vcpu->run->exit_reason =3D KVM_EXIT_SHUTDOWN; @@ -1205,6 +1224,10 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_= t fastpath) WARN_ON_ONCE(fastpath !=3D EXIT_FASTPATH_NONE); =20 switch (exit_reason.basic) { + case EXIT_REASON_EXCEPTION_NMI: + return tdx_handle_exception(vcpu); + case EXIT_REASON_EXTERNAL_INTERRUPT: + return tdx_handle_external_interrupt(vcpu); case EXIT_REASON_EPT_VIOLATION: return tdx_handle_ept_violation(vcpu); case EXIT_REASON_EPT_MISCONFIG: --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EF6CCC64ED8 for ; Mon, 27 Feb 2023 08:32:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232112AbjB0IcL (ORCPT ); Mon, 27 Feb 2023 03:32:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59278 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231545AbjB0I3U (ORCPT ); Mon, 27 Feb 2023 03:29:20 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2BF0C1E28B; Mon, 27 Feb 2023 00:25:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486355; x=1709022355; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=+dQ//FPeJciaxnp/rLPyp0BhnelDwWIYgp7ZgSZ/9Qw=; b=ZNeJ435ImS1P9fawkxJai+Au7Pho9Z/w4D6oceVPhPP2ZT9heX/hfVzy ARDma3I6hoO1kX1WwOOl9mwiNcfpNaIPZpepVnMmavLSLdo5JOAtOXLo2 RBlg0qtUUKhty+bQafuv6RN2kDyY0Mc5PfVYaLL6SsgsVbXFymfI6jciN kTMLfpOnMpXwiUI6OlNx8QIUaY54ZIKmMU96mE/aQroGoX7NHfKBP1CQP 4GfMrblxU/h0/GKUUclILDXvT1xw18AIAC5qzyC+6rfGse2hSz6bwRmzL 1TW63MQiCjJn0xJSXexlTatlG/YlJ+mHZLADS0gFZzWaJBSmCPQZzIOu9 g==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317609028" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317609028" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:18 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242372" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242372" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:17 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, 
linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang , Xiaoyao Li , Sean Christopherson Subject: [PATCH v12 081/106] KVM: TDX: Add a place holder for handler of TDX hypercalls (TDG.VP.VMCALL) Date: Mon, 27 Feb 2023 00:23:20 -0800 Message-Id: <00b0d4a435a639f756d11a29bdd04a853f58d692.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata The TDX module specification defines TDG.VP.VMCALL API (TDVMCALL for short) for the guest TD to call hypercall to VMM. When the guest TD issues TDG.VP.VMCALL, the guest TD exits to VMM with a new exit reason of TDVMCALL. The arguments from the guest TD and returned values from the VMM are passed in the guest registers. The guest RCX registers indicates which registers are used. Define helper functions to access those registers as ABI. Define the TDVMCALL exit reason, which is carved out from the VMX exit reason namespace as the TDVMCALL exit from TDX guest to TDX-SEAM is really just a VM-Exit. Add a place holder to handle TDVMCALL exit. Co-developed-by: Xiaoyao Li Signed-off-by: Xiaoyao Li Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata --- arch/x86/include/uapi/asm/vmx.h | 4 ++- arch/x86/kvm/vmx/tdx.c | 56 ++++++++++++++++++++++++++++++++- arch/x86/kvm/vmx/tdx.h | 13 ++++++++ 3 files changed, 71 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vm= x.h index b3a30ef3efdd..f0f4a4cf84a7 100644 --- a/arch/x86/include/uapi/asm/vmx.h +++ b/arch/x86/include/uapi/asm/vmx.h @@ -93,6 +93,7 @@ #define EXIT_REASON_TPAUSE 68 #define EXIT_REASON_BUS_LOCK 74 #define EXIT_REASON_NOTIFY 75 +#define EXIT_REASON_TDCALL 77 =20 #define VMX_EXIT_REASONS \ { EXIT_REASON_EXCEPTION_NMI, "EXCEPTION_NMI" }, \ @@ -156,7 +157,8 @@ { EXIT_REASON_UMWAIT, "UMWAIT" }, \ { EXIT_REASON_TPAUSE, "TPAUSE" }, \ { EXIT_REASON_BUS_LOCK, "BUS_LOCK" }, \ - { EXIT_REASON_NOTIFY, "NOTIFY" } + { EXIT_REASON_NOTIFY, "NOTIFY" }, \ + { EXIT_REASON_TDCALL, "TDCALL" } =20 #define VMX_EXIT_REASON_FLAGS \ { VMX_EXIT_REASONS_FAILED_VMENTRY, "FAILED_VMENTRY" } diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 336fb3fc6f0e..553fa5b431bc 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -107,6 +107,41 @@ static __always_inline unsigned long tdexit_intr_info(= struct kvm_vcpu *vcpu) return kvm_r9_read(vcpu); } =20 +#define BUILD_TDVMCALL_ACCESSORS(param, gpr) \ +static __always_inline \ +unsigned long tdvmcall_##param##_read(struct kvm_vcpu *vcpu) \ +{ \ + return kvm_##gpr##_read(vcpu); \ +} \ +static __always_inline void tdvmcall_##param##_write(struct kvm_vcpu *vcpu= , \ + unsigned long val) \ +{ \ + kvm_##gpr##_write(vcpu, val); \ +} +BUILD_TDVMCALL_ACCESSORS(a0, r12); +BUILD_TDVMCALL_ACCESSORS(a1, r13); +BUILD_TDVMCALL_ACCESSORS(a2, r14); +BUILD_TDVMCALL_ACCESSORS(a3, r15); + +static __always_inline unsigned long tdvmcall_exit_type(struct kvm_vcpu *v= cpu) +{ + return kvm_r10_read(vcpu); +} +static __always_inline unsigned long tdvmcall_leaf(struct kvm_vcpu *vcpu) +{ + return kvm_r11_read(vcpu); +} +static __always_inline void tdvmcall_set_return_code(struct kvm_vcpu *vcpu, + long val) +{ + kvm_r10_write(vcpu, val); +} +static 
__always_inline void tdvmcall_set_return_val(struct kvm_vcpu *vcpu, + unsigned long val) +{ + kvm_r11_write(vcpu, val); +} + static inline bool is_td_vcpu_created(struct vcpu_tdx *tdx) { return tdx->tdvpr_pa; @@ -733,7 +768,8 @@ static noinstr void tdx_vcpu_enter_exit(struct kvm_vcpu= *vcpu, struct vcpu_tdx *tdx) { guest_enter_irqoff(); - tdx->exit_reason.full =3D __tdx_vcpu_run(tdx->tdvpr_pa, vcpu->arch.regs, = 0); + tdx->exit_reason.full =3D __tdx_vcpu_run(tdx->tdvpr_pa, vcpu->arch.regs, + tdx->tdvmcall.regs_mask); guest_exit_irqoff(); } =20 @@ -766,6 +802,11 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu) =20 tdx_complete_interrupts(vcpu); =20 + if (tdx->exit_reason.basic =3D=3D EXIT_REASON_TDCALL) + tdx->tdvmcall.rcx =3D vcpu->arch.regs[VCPU_REGS_RCX]; + else + tdx->tdvmcall.rcx =3D 0; + return EXIT_FASTPATH_NONE; } =20 @@ -812,6 +853,17 @@ static int tdx_handle_triple_fault(struct kvm_vcpu *vc= pu) return 0; } =20 +static int handle_tdvmcall(struct kvm_vcpu *vcpu) +{ + switch (tdvmcall_leaf(vcpu)) { + default: + break; + } + + tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_INVALID_OPERAND); + return 1; +} + void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) { td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa & PAGE_MASK); @@ -1228,6 +1280,8 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t= fastpath) return tdx_handle_exception(vcpu); case EXIT_REASON_EXTERNAL_INTERRUPT: return tdx_handle_external_interrupt(vcpu); + case EXIT_REASON_TDCALL: + return handle_tdvmcall(vcpu); case EXIT_REASON_EPT_VIOLATION: return tdx_handle_ept_violation(vcpu); case EXIT_REASON_EPT_MISCONFIG: diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index cee7b4bc0d0a..fa44a1a9295f 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -72,6 +72,19 @@ struct vcpu_tdx { =20 struct list_head cpu_list; =20 + union { + struct { + union { + struct { + u16 gpr_mask; + u16 xmm_mask; + }; + u32 regs_mask; + }; + u32 reserved; + }; + u64 rcx; + } tdvmcall; union tdx_exit_reason exit_reason; =20 bool initialized; --=20 2.25.1 From nobody Tue Sep 9 16:53:37 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9DE44C64ED8 for ; Mon, 27 Feb 2023 08:31:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231691AbjB0Ib1 (ORCPT ); Mon, 27 Feb 2023 03:31:27 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57004 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231741AbjB0I3X (ORCPT ); Mon, 27 Feb 2023 03:29:23 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5DE921CAF9; Mon, 27 Feb 2023 00:25:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486359; x=1709022359; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=02E5upieJWsSkdv0u3zu+nn8onJ2BcXYRIZhkNViwvM=; b=D1o1mIIUzH+0fbAlsoTK3dKUbB1ZM+jVqqWRm0PvhtbGAFH1MHJC0Hmy QxSV6HgCmxOtRuchng7CUccRyuaXtazx+Op5Cx941QW7kT+J20WFSTVtf Mf8PsUumPN2+86pqSlRYF3PbCCzdIMrRInMNkV/elDopnflrWNRbx79Qg hMaLrOHq1h+rh9t/UkNJbiEVNONGomLUIN70AYRwfS9iWVS2IM6WEaNWM UnPg3XFF+dgWfafBRVwtx53r7BXKfS0dNM1w2aqWi9nyj1I43LR7t3LNM 
2/1iijP1Tk1ODY2/EQVWzgzJOarjedmVcNiZ9+qG15GUim32mZs1dNC2Z g==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317609030" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317609030" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:18 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242377" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242377" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:17 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 082/106] KVM: TDX: handle KVM hypercall with TDG.VP.VMCALL Date: Mon, 27 Feb 2023 00:23:21 -0800 Message-Id: <1bf33cd9de03976825361bbe27a7ec5440d7910e.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata The TDX Guest-Host communication interface (GHCI) specification defines the ABI for the guest TD to issue hypercall. It reserves vendor specific arguments for VMM specific use. Use it as KVM hypercall and handle it. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/tdx.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 553fa5b431bc..7f8431c95b83 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -853,8 +853,39 @@ static int tdx_handle_triple_fault(struct kvm_vcpu *vc= pu) return 0; } =20 +static int tdx_emulate_vmcall(struct kvm_vcpu *vcpu) +{ + unsigned long nr, a0, a1, a2, a3, ret; + + /* + * ABI for KVM tdvmcall argument: + * In Guest-Hypervisor Communication Interface(GHCI) specification, + * Non-zero leaf number (R10 !=3D 0) is defined to indicate + * vendor-specific. KVM uses this for KVM hypercall. NOTE: KVM + * hypercall number starts from one. Zero isn't used for KVM hypercall + * number. + * + * R10: KVM hypercall number + * arguments: R11, R12, R13, R14. 
+ */ + nr =3D kvm_r10_read(vcpu); + a0 =3D kvm_r11_read(vcpu); + a1 =3D kvm_r12_read(vcpu); + a2 =3D kvm_r13_read(vcpu); + a3 =3D kvm_r14_read(vcpu); + + ret =3D __kvm_emulate_hypercall(vcpu, nr, a0, a1, a2, a3, true); + + tdvmcall_set_return_code(vcpu, ret); + + return 1; +} + static int handle_tdvmcall(struct kvm_vcpu *vcpu) { + if (tdvmcall_exit_type(vcpu)) + return tdx_emulate_vmcall(vcpu); + switch (tdvmcall_leaf(vcpu)) { default: break; --=20 2.25.1 From nobody Tue Sep 9 16:53:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A7EADC64ED6 for ; Mon, 27 Feb 2023 08:32:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232088AbjB0IcG (ORCPT ); Mon, 27 Feb 2023 03:32:06 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59372 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231348AbjB0I32 (ORCPT ); Mon, 27 Feb 2023 03:29:28 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 28A2E1CAE4; Mon, 27 Feb 2023 00:26:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486360; x=1709022360; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=T6nq7g2r2mrBlzP/27TwKQjtModad84Pmc4tI0HvULE=; b=UdVnddllTo12xkP3CX0YlUeymVIUu2649gEfR4MNT1iWH4tgDVBpuoOB Bp+4OSYeIWLtSPrfW9KjBxxdIIOMJTTMAYq3WPl5HVzFFK4HOurmi55OA vnT+uR2GRDbwZP3bZM9maS33bEmCWWLXrV7UeBxhRsrLfw4V0QNmmiq55 yCvAITgnPVQmZfBtiz54JdofHcRvH6tGiGHTd7Ur7LN6O5D4THvL0RMDI Hr1kRlgtnJkGQ/fyfHxeBkre4/VSxXBPuftjJNLFfZdOxWuicSHeN+YdR t/iWaogEEhZc9oPog3C59i//3affAIH6k2HIFB6QwdoUm9J5+BNse/r3n g==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317609036" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317609036" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:18 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242380" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242380" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:18 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 083/106] KVM: TDX: Add KVM Exit for TDX TDG.VP.VMCALL Date: Mon, 27 Feb 2023 00:23:22 -0800 Message-Id: <127082dc6ecb2db38fd663cc0529dec189bdbe3f.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Some of TDG.VP.VMCALL require device model, for example, qemu, to handle them on behalf of kvm kernel module. Introduce new kvm exit, KVM_EXIT_TDX, and functions to setup it. TDG_VP_VMCALL_INVALID_OPERAND is set as default return value to avoid random value. Device model should update R10 if necessary. 
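[As a usage illustration of the new ABI — not part of the patch — here is how a device model's vCPU-run loop might consume KVM_EXIT_TDX, assuming headers built from this series; vmcall_subfn_supported() is a stand-in for whatever optional subfunctions the VMM implements.

#include <linux/kvm.h>
#include <stdbool.h>

static bool vmcall_subfn_supported(__u64 subfn)
{
	return false;	/* stand-in: no optional subfunctions handled */
}

/* Called when KVM_RUN returns with run->exit_reason == KVM_EXIT_TDX. */
static void handle_tdx_exit(struct kvm_run *run)
{
	struct kvm_tdx_vmcall *vmcall = &run->tdx.u.vmcall;

	if (run->tdx.type != KVM_EXIT_TDX_VMCALL)
		return;

	/* KVM preloads status_code (R10) with
	 * TDG_VP_VMCALL_INVALID_OPERAND, so an unrecognized request
	 * fails safely even if userspace does nothing here. */
	if (vmcall_subfn_supported(vmcall->subfunction)) {
		vmcall->status_code = 0;  /* TDG_VP_VMCALL_SUCCESS per GHCI */
		vmcall->out_r11 = 0;      /* subfunction-specific result */
	}
}
]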
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/tdx.c | 93 +++++++++++++++++++++++++++++++++++++++- include/uapi/linux/kvm.h | 57 ++++++++++++++++++++++++ 2 files changed, 148 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 7f8431c95b83..7d806aab9598 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -123,6 +123,18 @@ BUILD_TDVMCALL_ACCESSORS(a1, r13); BUILD_TDVMCALL_ACCESSORS(a2, r14); BUILD_TDVMCALL_ACCESSORS(a3, r15); =20 +#define TDX_VMCALL_REG_MASK_RBX BIT_ULL(2) +#define TDX_VMCALL_REG_MASK_RDX BIT_ULL(3) +#define TDX_VMCALL_REG_MASK_RBP BIT_ULL(5) +#define TDX_VMCALL_REG_MASK_RSI BIT_ULL(6) +#define TDX_VMCALL_REG_MASK_RDI BIT_ULL(7) +#define TDX_VMCALL_REG_MASK_R8 BIT_ULL(8) +#define TDX_VMCALL_REG_MASK_R9 BIT_ULL(9) +#define TDX_VMCALL_REG_MASK_R12 BIT_ULL(12) +#define TDX_VMCALL_REG_MASK_R13 BIT_ULL(13) +#define TDX_VMCALL_REG_MASK_R14 BIT_ULL(14) +#define TDX_VMCALL_REG_MASK_R15 BIT_ULL(15) + static __always_inline unsigned long tdvmcall_exit_type(struct kvm_vcpu *v= cpu) { return kvm_r10_read(vcpu); @@ -881,6 +893,80 @@ static int tdx_emulate_vmcall(struct kvm_vcpu *vcpu) return 1; } =20 +static int tdx_complete_vp_vmcall(struct kvm_vcpu *vcpu) +{ + struct kvm_tdx_vmcall *tdx_vmcall =3D &vcpu->run->tdx.u.vmcall; + __u64 reg_mask; + + tdvmcall_set_return_code(vcpu, tdx_vmcall->status_code); + tdvmcall_set_return_val(vcpu, tdx_vmcall->out_r11); + + reg_mask =3D kvm_rcx_read(vcpu); + if (reg_mask & TDX_VMCALL_REG_MASK_R12) + kvm_r12_write(vcpu, tdx_vmcall->out_r12); + if (reg_mask & TDX_VMCALL_REG_MASK_R13) + kvm_r13_write(vcpu, tdx_vmcall->out_r13); + if (reg_mask & TDX_VMCALL_REG_MASK_R14) + kvm_r14_write(vcpu, tdx_vmcall->out_r14); + if (reg_mask & TDX_VMCALL_REG_MASK_R15) + kvm_r15_write(vcpu, tdx_vmcall->out_r15); + if (reg_mask & TDX_VMCALL_REG_MASK_RBX) + kvm_rbx_write(vcpu, tdx_vmcall->out_rbx); + if (reg_mask & TDX_VMCALL_REG_MASK_RDI) + kvm_rdi_write(vcpu, tdx_vmcall->out_rdi); + if (reg_mask & TDX_VMCALL_REG_MASK_RSI) + kvm_rsi_write(vcpu, tdx_vmcall->out_rsi); + if (reg_mask & TDX_VMCALL_REG_MASK_R8) + kvm_r8_write(vcpu, tdx_vmcall->out_r8); + if (reg_mask & TDX_VMCALL_REG_MASK_R9) + kvm_r9_write(vcpu, tdx_vmcall->out_r9); + if (reg_mask & TDX_VMCALL_REG_MASK_RDX) + kvm_rdx_write(vcpu, tdx_vmcall->out_rdx); + + return 1; +} + +static int tdx_vp_vmcall_to_user(struct kvm_vcpu *vcpu) +{ + struct kvm_tdx_vmcall *tdx_vmcall =3D &vcpu->run->tdx.u.vmcall; + __u64 reg_mask; + + vcpu->arch.complete_userspace_io =3D tdx_complete_vp_vmcall; + memset(tdx_vmcall, 0, sizeof(*tdx_vmcall)); + + vcpu->run->exit_reason =3D KVM_EXIT_TDX; + vcpu->run->tdx.type =3D KVM_EXIT_TDX_VMCALL; + tdx_vmcall->type =3D tdvmcall_exit_type(vcpu); + tdx_vmcall->subfunction =3D tdvmcall_leaf(vcpu); + tdx_vmcall->status_code =3D TDG_VP_VMCALL_INVALID_OPERAND; + + reg_mask =3D kvm_rcx_read(vcpu); + tdx_vmcall->reg_mask =3D reg_mask; + if (reg_mask & TDX_VMCALL_REG_MASK_R12) + tdx_vmcall->in_r12 =3D kvm_r12_read(vcpu); + if (reg_mask & TDX_VMCALL_REG_MASK_R13) + tdx_vmcall->in_r13 =3D kvm_r13_read(vcpu); + if (reg_mask & TDX_VMCALL_REG_MASK_R14) + tdx_vmcall->in_r14 =3D kvm_r14_read(vcpu); + if (reg_mask & TDX_VMCALL_REG_MASK_R15) + tdx_vmcall->in_r15 =3D kvm_r15_read(vcpu); + if (reg_mask & TDX_VMCALL_REG_MASK_RBX) + tdx_vmcall->in_rbx =3D kvm_rbx_read(vcpu); + if (reg_mask & TDX_VMCALL_REG_MASK_RDI) + tdx_vmcall->in_rdi =3D kvm_rdi_read(vcpu); + if (reg_mask & TDX_VMCALL_REG_MASK_RSI) + tdx_vmcall->in_rsi =3D kvm_rsi_read(vcpu); + if 
(reg_mask & TDX_VMCALL_REG_MASK_R8) + tdx_vmcall->in_r8 =3D kvm_r8_read(vcpu); + if (reg_mask & TDX_VMCALL_REG_MASK_R9) + tdx_vmcall->in_r9 =3D kvm_r9_read(vcpu); + if (reg_mask & TDX_VMCALL_REG_MASK_RDX) + tdx_vmcall->in_rdx =3D kvm_rdx_read(vcpu); + + /* notify userspace to handle the request */ + return 0; +} + static int handle_tdvmcall(struct kvm_vcpu *vcpu) { if (tdvmcall_exit_type(vcpu)) @@ -891,8 +977,11 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu) break; } =20 - tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_INVALID_OPERAND); - return 1; + /* + * Unknown VMCALL. Toss the request to the user space as it may know + * how to handle. + */ + return tdx_vp_vmcall_to_user(vcpu); } =20 void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 2fba29125ec2..433b0ee9e4bb 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -237,6 +237,60 @@ struct kvm_xen_exit { } u; }; =20 +struct kvm_tdx_exit { +#define KVM_EXIT_TDX_VMCALL 1 + __u32 type; + __u32 pad; + + union { + struct kvm_tdx_vmcall { + /* + * Guest-Host-Communication Interface for TDX spec + * defines the ABI for TDG.VP.VMCALL. + */ + + /* Input parameters: guest -> VMM */ + __u64 type; /* r10 */ + __u64 subfunction; /* r11 */ + __u64 reg_mask; /* rcx */ + /* + * Subfunction specific. + * Registers are used in this order to pass input + * arguments. r12=3Darg0, r13=3Darg1, etc. + */ + __u64 in_r12; + __u64 in_r13; + __u64 in_r14; + __u64 in_r15; + __u64 in_rbx; + __u64 in_rdi; + __u64 in_rsi; + __u64 in_r8; + __u64 in_r9; + __u64 in_rdx; + + /* Output parameters: VMM -> guest */ + __u64 status_code; /* r10 */ + /* + * Subfunction specific. + * Registers are used in this order to output return + * values. r11=3Dret0, r12=3Dret1, etc. + */ + __u64 out_r11; + __u64 out_r12; + __u64 out_r13; + __u64 out_r14; + __u64 out_r15; + __u64 out_rbx; + __u64 out_rdi; + __u64 out_rsi; + __u64 out_r8; + __u64 out_r9; + __u64 out_rdx; + } vmcall; + } u; +}; + #define KVM_S390_GET_SKEYS_NONE 1 #define KVM_S390_SKEYS_MAX 1048576 =20 @@ -279,6 +333,7 @@ struct kvm_xen_exit { #define KVM_EXIT_RISCV_CSR 36 #define KVM_EXIT_NOTIFY 37 #define KVM_EXIT_MEMORY_FAULT 38 +#define KVM_EXIT_TDX 39 =20 /* For KVM_EXIT_INTERNAL_ERROR */ /* Emulate instruction failed. */ @@ -527,6 +582,8 @@ struct kvm_run { __u64 gpa; __u64 size; } memory; + /* KVM_EXIT_TDX_VMCALL */ + struct kvm_tdx_exit tdx; /* Fix the size of the union. 
*/ char padding[256]; }; --=20 2.25.1 From nobody Tue Sep 9 16:53:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 72ABFC64ED6 for ; Mon, 27 Feb 2023 08:32:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232182AbjB0Icc (ORCPT ); Mon, 27 Feb 2023 03:32:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57030 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231526AbjB0I33 (ORCPT ); Mon, 27 Feb 2023 03:29:29 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DDE8C1D93D; Mon, 27 Feb 2023 00:26:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486361; x=1709022361; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=/Y94LCoLS6sZG9ihodBSXz/QYH5KARFcUZtH+ByNpEg=; b=AG+wpJTGktEt2RMytXOxRAq4fuqVTSXvh5PpCZ8eScb36YlGRyikv2bO gJsfqYxcY8R5arultj+77ZL1iIIWqDEb9hzqRVQAVaZNRTo1ZyxFEmKM7 uX3JQ+PZ9v9fgOiM9Om7oELtGjcmwXP7iY5gsoPSq/ZRBD1kzdDaV0pTQ HCOf3QLKmsH0doMgnqQ+upsbdGb+s9sc/JKb+Yk5pJbTNUaOzsE8sE6wy +ct0bKqfiapPBzfNAppnz60LKxu7d3PAweKvbYiGeYPbDjwkV4X65+R2M CiN9Q5Ugpx1krZPZPcRhH3iFvWTtMBXI6xDChN5t4jo0etRxgKkSjE2Lr A==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317609040" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317609040" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:18 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242383" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242383" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:18 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 084/106] KVM: TDX: Handle TDX PV CPUID hypercall Date: Mon, 27 Feb 2023 00:23:23 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Wire up TDX PV CPUID hypercall to the KVM backend function. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/tdx.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 7d806aab9598..5add38cfd9f9 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -967,12 +967,34 @@ static int tdx_vp_vmcall_to_user(struct kvm_vcpu *vcp= u) return 0; } =20 +static int tdx_emulate_cpuid(struct kvm_vcpu *vcpu) +{ + u32 eax, ebx, ecx, edx; + + /* EAX and ECX for cpuid is stored in R12 and R13. 
*/ + eax =3D tdvmcall_a0_read(vcpu); + ecx =3D tdvmcall_a1_read(vcpu); + + kvm_cpuid(vcpu, &eax, &ebx, &ecx, &edx, false); + + tdvmcall_a0_write(vcpu, eax); + tdvmcall_a1_write(vcpu, ebx); + tdvmcall_a2_write(vcpu, ecx); + tdvmcall_a3_write(vcpu, edx); + + tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS); + + return 1; +} + static int handle_tdvmcall(struct kvm_vcpu *vcpu) { if (tdvmcall_exit_type(vcpu)) return tdx_emulate_vmcall(vcpu); =20 switch (tdvmcall_leaf(vcpu)) { + case EXIT_REASON_CPUID: + return tdx_emulate_cpuid(vcpu); default: break; } --=20 2.25.1 From nobody Tue Sep 9 16:53:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D2459C7EE2D for ; Mon, 27 Feb 2023 08:32:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232223AbjB0Ico (ORCPT ); Mon, 27 Feb 2023 03:32:44 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57464 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231786AbjB0I3p (ORCPT ); Mon, 27 Feb 2023 03:29:45 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2E4AC1CAD6; Mon, 27 Feb 2023 00:26:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486375; x=1709022375; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=MwS5q3Mu/hl9ne31Qt+848iYLVrjE88ultEAyvMyIB0=; b=jGILGuXujSFHwxcdeA1pP8Isc4XMgkRJmL9xsjR+/2bVw4/ukVxGmmXp +j7yfEiC4hkTyEsqerspkYgkw8msQFm9W297dO1MjJZpLzuc11m8o2lRd der7QmUs/LhuZ/DW4sN/ZZcPphAaYu916EPUYlP7HVSzIWMGFLko729bN oHJNsiMU5xn1qVYelUpZ4VP1Ml/f9vjZ9bVjsUFxM4sCaUFBp1vgG/i/8 gp1zKVDAcSI9pdh+jCDCzdTHKDKqOyV4t6fg2qMifvz/YRKhSigFbWMCG HMkz/RikB9nurKLpPx70DQQ+vF1aHdSoOJNrBqWmktfdtOhA83/CTAuHc w==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317609045" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317609045" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:18 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242387" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242387" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:18 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 085/106] KVM: TDX: Handle TDX PV HLT hypercall Date: Mon, 27 Feb 2023 00:23:24 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Wire up TDX PV HLT hypercall to the KVM backend function. 
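[Guest-side context, illustrative and not part of this host patch: the PV HLT request that tdx_emulate_hlt() below services arrives as a TDVMCALL with leaf 12 (EXIT_REASON_HLT) and a0 carrying the interrupt-disabled flag. A minimal sketch, with __tdvmcall() as a hypothetical wrapper around the TDCALL(TDG.VP.VMCALL) instruction:

#include <stdbool.h>

#define EXIT_REASON_HLT	12

/* Hypothetical: issues TDG.VP.VMCALL with R10=type, R11=leaf, R12=a0
 * and returns the status code from R10. */
extern unsigned long __tdvmcall(unsigned long type, unsigned long leaf,
				unsigned long a0);

static void td_guest_halt(bool irqs_disabled)
{
	/* type 0 = standard GHCI TDVMCALL; a0 tells the host whether
	 * the vCPU halted with interrupts disabled, feeding
	 * tdx->interrupt_disabled_hlt in the handler below. */
	__tdvmcall(0, EXIT_REASON_HLT, irqs_disabled);
}
]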
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/tdx.c | 42 +++++++++++++++++++++++++++++++++++++++++- arch/x86/kvm/vmx/tdx.h | 3 +++ 2 files changed, 44 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 5add38cfd9f9..a5bfb82c620f 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -628,7 +628,32 @@ void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) =20 bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu) { - return pi_has_pending_interrupt(vcpu); + bool ret =3D pi_has_pending_interrupt(vcpu); + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + + if (ret || vcpu->arch.mp_state !=3D KVM_MP_STATE_HALTED) + return true; + + if (tdx->interrupt_disabled_hlt) + return false; + + /* + * This is for the case where the virtual interrupt is recognized, + * i.e. set in vmcs.RVI, between the STI and "HLT". KVM doesn't have + * access to RVI and the interrupt is no longer in the PID (because it + * was "recognized". It doesn't get delivered in the guest because the + * TDCALL completes before interrupts are enabled. + * + * TDX modules sets RVI while in an STI interrupt shadow. + * - TDExit(typically TDG.VP.VMCALL) from the guest to TDX module. + * The interrupt shadow at this point is gone. + * - It knows that there is an interrupt that can be delivered + * (RVI > PPR && EFLAGS.IF=3D1, the other conditions of 29.2.2 don't + * matter) + * - It forwards the TDExit nevertheless, to a clueless hypervisor that + * has no way to glean either RVI or PPR. + */ + return !!xchg(&tdx->buggy_hlt_workaround, 0); } =20 void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) @@ -987,6 +1012,17 @@ static int tdx_emulate_cpuid(struct kvm_vcpu *vcpu) return 1; } =20 +static int tdx_emulate_hlt(struct kvm_vcpu *vcpu) +{ + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + + /* See tdx_protected_apic_has_interrupt() to avoid heavy seamcall */ + tdx->interrupt_disabled_hlt =3D tdvmcall_a0_read(vcpu); + + tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS); + return kvm_emulate_halt_noskip(vcpu); +} + static int handle_tdvmcall(struct kvm_vcpu *vcpu) { if (tdvmcall_exit_type(vcpu)) @@ -995,6 +1031,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu) switch (tdvmcall_leaf(vcpu)) { case EXIT_REASON_CPUID: return tdx_emulate_cpuid(vcpu); + case EXIT_REASON_HLT: + return tdx_emulate_hlt(vcpu); default: break; } @@ -1328,6 +1366,8 @@ void tdx_deliver_interrupt(struct kvm_lapic *apic, in= t delivery_mode, struct kvm_vcpu *vcpu =3D apic->vcpu; struct vcpu_tdx *tdx =3D to_tdx(vcpu); =20 + /* See comment in tdx_protected_apic_has_interrupt(). */ + tdx->buggy_hlt_workaround =3D 1; /* TDX supports only posted interrupt. No lapic emulation. */ __vmx_deliver_posted_interrupt(vcpu, &tdx->pi_desc, vector); } diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index fa44a1a9295f..71818c500186 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -93,6 +93,9 @@ struct vcpu_tdx { bool host_state_need_restore; u64 msr_host_kernel_gs_base; =20 + bool interrupt_disabled_hlt; + unsigned int buggy_hlt_workaround; + /* * Dummy to make pmu_intel not corrupt memory. * TODO: Support PMU for TDX. Future work. 
--=20 2.25.1 From nobody Tue Sep 9 16:53:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3891EC64ED8 for ; Mon, 27 Feb 2023 08:32:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232163AbjB0IcZ (ORCPT ); Mon, 27 Feb 2023 03:32:25 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56956 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231796AbjB0I3r (ORCPT ); Mon, 27 Feb 2023 03:29:47 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 115A81DBAE; Mon, 27 Feb 2023 00:26:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486379; x=1709022379; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=RsnmacNGyTsy5V37hvOsIqmS/OZZ7UCE26uJO+k3xrM=; b=L1i1ISUGEnDBuAfROpioy/PqHNxIARyh2fTMOYa704dJZ7CgMv31INHm IF7BH383B38E5Sf5gvBP7j3V3WwImtKUgyd5cVwxkbdqtp0+gaS3/coUK ZS2DQFhBuXplfOS9f76ZxHoX8bbUnBfrEBInDpeYOwiJR7Wut6Nr9Qiwp FW2+D4VbvgkLglG/dJcnmOpfF08lqw3fRAOUfUzahrO/TT9Hvnq5pueoQ yjpLYfoUeRfgJyTlQE2zyngO1QBw6jZxWFDXza4BHu50NzUOL3/ick1OG /JWSOOBdaOLhMsAtkTJkzkXVppc/17KvcZmrSC4LjpAB5ipktVibOAsvh Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317609051" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317609051" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:19 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242392" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242392" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:18 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 086/106] KVM: TDX: Handle TDX PV port io hypercall Date: Mon, 27 Feb 2023 00:23:25 -0800 Message-Id: <63b6cdcd4219e23bc806e27de1fdd68b3ef8a841.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Wire up TDX PV port IO hypercall to the KVM backend function. 
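[Guest-side context, illustrative and not in this patch: tdx_emulate_io() below decodes the GHCI port-I/O TDVMCALL as a0=size (1/2/4), a1=direction (non-zero for write), a2=port, a3=write data. A sketch with a hypothetical __tdvmcall4() wrapper:

#define EXIT_REASON_IO_INSTRUCTION	30

/* Hypothetical: TDG.VP.VMCALL with R11=leaf, R12..R15=a0..a3. */
extern unsigned long __tdvmcall4(unsigned long leaf, unsigned long a0,
				 unsigned long a1, unsigned long a2,
				 unsigned long a3);

static void td_guest_outb(unsigned short port, unsigned char val)
{
	__tdvmcall4(EXIT_REASON_IO_INSTRUCTION,
		    1 /* size */, 1 /* write */, port, val);
}
]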
Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/vmx/tdx.c | 57 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 57 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index a5bfb82c620f..17399fa558ca 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1023,6 +1023,61 @@ static int tdx_emulate_hlt(struct kvm_vcpu *vcpu) return kvm_emulate_halt_noskip(vcpu); } =20 +static int tdx_complete_pio_in(struct kvm_vcpu *vcpu) +{ + struct x86_emulate_ctxt *ctxt =3D vcpu->arch.emulate_ctxt; + unsigned long val =3D 0; + int ret; + + WARN_ON_ONCE(vcpu->arch.pio.count !=3D 1); + + ret =3D ctxt->ops->pio_in_emulated(ctxt, vcpu->arch.pio.size, + vcpu->arch.pio.port, &val, 1); + WARN_ON_ONCE(!ret); + + tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS); + tdvmcall_set_return_val(vcpu, val); + + return 1; +} + +static int tdx_emulate_io(struct kvm_vcpu *vcpu) +{ + struct x86_emulate_ctxt *ctxt =3D vcpu->arch.emulate_ctxt; + unsigned long val =3D 0; + unsigned int port; + int size, ret; + bool write; + + ++vcpu->stat.io_exits; + + size =3D tdvmcall_a0_read(vcpu); + write =3D tdvmcall_a1_read(vcpu); + port =3D tdvmcall_a2_read(vcpu); + + if (size !=3D 1 && size !=3D 2 && size !=3D 4) { + tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_INVALID_OPERAND); + return 1; + } + + if (write) { + val =3D tdvmcall_a3_read(vcpu); + ret =3D ctxt->ops->pio_out_emulated(ctxt, size, port, &val, 1); + + /* No need for a complete_userspace_io callback. */ + vcpu->arch.pio.count =3D 0; + } else { + ret =3D ctxt->ops->pio_in_emulated(ctxt, size, port, &val, 1); + if (!ret) + vcpu->arch.complete_userspace_io =3D tdx_complete_pio_in; + else + tdvmcall_set_return_val(vcpu, val); + } + if (ret) + tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS); + return ret; +} + static int handle_tdvmcall(struct kvm_vcpu *vcpu) { if (tdvmcall_exit_type(vcpu)) @@ -1033,6 +1088,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu) return tdx_emulate_cpuid(vcpu); case EXIT_REASON_HLT: return tdx_emulate_hlt(vcpu); + case EXIT_REASON_IO_INSTRUCTION: + return tdx_emulate_io(vcpu); default: break; } --=20 2.25.1 From nobody Tue Sep 9 16:53:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 596ECC64ED6 for ; Mon, 27 Feb 2023 08:31:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231610AbjB0IbX (ORCPT ); Mon, 27 Feb 2023 03:31:23 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57828 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231702AbjB0I3r (ORCPT ); Mon, 27 Feb 2023 03:29:47 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 962B020696; Mon, 27 Feb 2023 00:26:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486377; x=1709022377; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=WTa1iMeCd9TadMkXVh+i7NaovPdalZW+3j59Ps8gGGU=; b=XhzaF9YJX9Z+6EKwtdjf8M6rBk4MYX3DJtj4mMNVYNhkRRgS+lPbtuHD VMJzmmMBeR3Dcw/mxBFGxgzFoG3i2wJh2J0S8e/HJNNHn75H6sJIjSNCG 1pl/McRY9F7ZnI64uB6Fm3rHnX7d9seUZvRKtrx5VIuF3THp2STDa0BmG fCO9KJ/5QJa3fwb2psSWg9vG2IoxmD+bDaZMNE8V/xxUb3JjtgwiesMTc 
HhZkRMyE8yQGUGEzKQfPFOrMUJUFQLR4OdMj807n6M4yLNap3/e0tQBqF nYMG98phkpIy/d9Fc0WHwyEJpj3GsRrcIMEY5p//m7hv9E98qTj4Ixc8b w==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317609054" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317609054" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:19 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242397" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242397" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:18 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang , Sean Christopherson Subject: [PATCH v12 087/106] KVM: TDX: Handle TDX PV MMIO hypercall Date: Mon, 27 Feb 2023 00:23:26 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Export kvm_io_bus_read and kvm_mmio tracepoint and wire up TDX PV MMIO hypercall to the KVM backend functions. kvm_io_bus_read/write() searches KVM device emulated in kernel of the given MMIO address and emulates the MMIO. As TDX PV MMIO also needs it, export kvm_io_bus_read(). kvm_io_bus_write() is already exported. TDX PV MMIO emulates some of MMIO itself. To add trace point consistently with x86 kvm, export kvm_mmio tracepoint. 
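
Before the diff, a minimal stand-alone sketch of the operand validation
tdx_emulate_mmio() performs on a PV MMIO request. mmio_request_valid()
is a hypothetical name for illustration; only the size and page-crossing
checks are modeled, since the shared-bit stripping and memslot lookup
need live KVM state.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096ULL

/* True if the request is well-formed: size is 1/2/4/8 and the access
 * does not cross a page boundary. The crossing test mirrors the
 * "((gpa + size - 1) ^ gpa) & PAGE_MASK" expression in the patch. */
static bool mmio_request_valid(uint64_t gpa, uint64_t size)
{
	const uint64_t page_mask = ~(PAGE_SIZE - 1);

	if (size != 1 && size != 2 && size != 4 && size != 8)
		return false;
	/* XOR of the first and last byte address keeps only differing
	 * bits; any differing bit above the page offset is a crossing. */
	return (((gpa + size - 1) ^ gpa) & page_mask) == 0;
}

int main(void)
{
	printf("%d\n", mmio_request_valid(0x1000, 8));	/* 1: fits in page */
	printf("%d\n", mmio_request_valid(0x1ffd, 8));	/* 0: crosses page */
	printf("%d\n", mmio_request_valid(0x1000, 3));	/* 0: bad size */
	return 0;
}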
Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/vmx/tdx.c | 114 +++++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/x86.c | 1 + virt/kvm/kvm_main.c | 2 + 3 files changed, 117 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 17399fa558ca..6d1100a2de7c 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1078,6 +1078,118 @@ static int tdx_emulate_io(struct kvm_vcpu *vcpu) return ret; } =20 +static int tdx_complete_mmio(struct kvm_vcpu *vcpu) +{ + unsigned long val =3D 0; + gpa_t gpa; + int size; + + KVM_BUG_ON(vcpu->mmio_needed !=3D 1, vcpu->kvm); + vcpu->mmio_needed =3D 0; + + if (!vcpu->mmio_is_write) { + gpa =3D vcpu->mmio_fragments[0].gpa; + size =3D vcpu->mmio_fragments[0].len; + + memcpy(&val, vcpu->run->mmio.data, size); + tdvmcall_set_return_val(vcpu, val); + trace_kvm_mmio(KVM_TRACE_MMIO_READ, size, gpa, &val); + } + return 1; +} + +static inline int tdx_mmio_write(struct kvm_vcpu *vcpu, gpa_t gpa, int siz= e, + unsigned long val) +{ + if (kvm_iodevice_write(vcpu, &vcpu->arch.apic->dev, gpa, size, &val) && + kvm_io_bus_write(vcpu, KVM_MMIO_BUS, gpa, size, &val)) + return -EOPNOTSUPP; + + trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, size, gpa, &val); + return 0; +} + +static inline int tdx_mmio_read(struct kvm_vcpu *vcpu, gpa_t gpa, int size) +{ + unsigned long val; + + if (kvm_iodevice_read(vcpu, &vcpu->arch.apic->dev, gpa, size, &val) && + kvm_io_bus_read(vcpu, KVM_MMIO_BUS, gpa, size, &val)) + return -EOPNOTSUPP; + + tdvmcall_set_return_val(vcpu, val); + trace_kvm_mmio(KVM_TRACE_MMIO_READ, size, gpa, &val); + return 0; +} + +static int tdx_emulate_mmio(struct kvm_vcpu *vcpu) +{ + struct kvm_memory_slot *slot; + int size, write, r; + unsigned long val; + gpa_t gpa; + + KVM_BUG_ON(vcpu->mmio_needed, vcpu->kvm); + + size =3D tdvmcall_a0_read(vcpu); + write =3D tdvmcall_a1_read(vcpu); + gpa =3D tdvmcall_a2_read(vcpu); + val =3D write ? tdvmcall_a3_read(vcpu) : 0; + + if (size !=3D 1 && size !=3D 2 && size !=3D 4 && size !=3D 8) + goto error; + if (write !=3D 0 && write !=3D 1) + goto error; + + /* Strip the shared bit, allow MMIO with and without it set. */ + gpa =3D gpa & ~gfn_to_gpa(kvm_gfn_shared_mask(vcpu->kvm)); + + if (size > 8u || ((gpa + size - 1) ^ gpa) & PAGE_MASK) + goto error; + + slot =3D kvm_vcpu_gfn_to_memslot(vcpu, gpa_to_gfn(gpa)); + if (slot && !(slot->flags & KVM_MEMSLOT_INVALID)) + goto error; + + if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) { + trace_kvm_fast_mmio(gpa); + return 1; + } + + if (write) + r =3D tdx_mmio_write(vcpu, gpa, size, val); + else + r =3D tdx_mmio_read(vcpu, gpa, size); + if (!r) { + /* Kernel completed device emulation. */ + tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS); + return 1; + } + + /* Request the device emulation to userspace device model. 
*/ + vcpu->mmio_needed =3D 1; + vcpu->mmio_is_write =3D write; + vcpu->arch.complete_userspace_io =3D tdx_complete_mmio; + + vcpu->run->mmio.phys_addr =3D gpa; + vcpu->run->mmio.len =3D size; + vcpu->run->mmio.is_write =3D write; + vcpu->run->exit_reason =3D KVM_EXIT_MMIO; + + if (write) { + memcpy(vcpu->run->mmio.data, &val, size); + } else { + vcpu->mmio_fragments[0].gpa =3D gpa; + vcpu->mmio_fragments[0].len =3D size; + trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, size, gpa, NULL); + } + return 0; + +error: + tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_INVALID_OPERAND); + return 1; +} + static int handle_tdvmcall(struct kvm_vcpu *vcpu) { if (tdvmcall_exit_type(vcpu)) @@ -1090,6 +1202,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu) return tdx_emulate_hlt(vcpu); case EXIT_REASON_IO_INSTRUCTION: return tdx_emulate_io(vcpu); + case EXIT_REASON_EPT_VIOLATION: + return tdx_emulate_mmio(vcpu); default: break; } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 94416992868b..4ec246c88e41 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -13564,6 +13564,7 @@ bool kvm_arch_dirty_log_supported(struct kvm *kvm) =20 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit); +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_mmio); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index e9f8225f3406..79e3c228bedf 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2683,6 +2683,7 @@ struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struc= t kvm_vcpu *vcpu, gfn_t gfn =20 return NULL; } +EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_memslot); =20 bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn) { @@ -5839,6 +5840,7 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_b= us bus_idx, gpa_t addr, r =3D __kvm_io_bus_read(vcpu, bus, &range, val); return r < 0 ? r : 0; } +EXPORT_SYMBOL_GPL(kvm_io_bus_read); =20 /* Caller must hold slots_lock. 
 */
int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
-- 
2.25.1

From nobody Tue Sep 9 16:53:38 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini,
    erdemaktas@google.com, Sean Christopherson, Sagi Shahar,
    David Matlack, Kai Huang, Zhi Wang
Subject: [PATCH v12 088/106] KVM: TDX: Implement callbacks for MSR operations for TDX
Date: Mon, 27 Feb 2023 00:23:27 -0800

From: Isaku Yamahata <isaku.yamahata@intel.com>

Implement the set_msr/get_msr/has_emulated_msr methods for TDX to handle
hypercalls from the guest TD for paravirtualized RDMSR and WRMSR.

The TDX module virtualizes MSRs. For some MSRs, it injects #VE into the
guest TD upon RDMSR or WRMSR. The exact list of such MSRs is defined in
the spec. Upon #VE, the guest TD may issue the RDMSR/WRMSR hypercalls
(subfunctions of TDG.VP.VMCALL) defined in the GHCI (Guest-Host
Communication Interface) so that the host VMM (e.g. KVM) can virtualize
the MSRs.

There are three classes of MSR virtualization:
- Non-configurable: the TDX module directly virtualizes the MSR. The VMM
  can't configure it; the value set by KVM_SET_MSR_INDEX_LIST is ignored.
- Configurable: the TDX module directly virtualizes the MSR. The VMM can
  configure it at VM creation time.
The value set by KVM_SET_MSR_INDEX_LIST is used. - #VE case Guest TD would issue TDG.VP.VMCALL and VMM handles the MSR hypercall. The value set by KVM_SET_MSR_INDEX_LIST is used. Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- Changes v10 -> v11 - added .msr_filter_changed() --- arch/x86/kvm/vmx/main.c | 44 ++++++++++++++++++++--- arch/x86/kvm/vmx/tdx.c | 74 ++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/x86_ops.h | 6 ++++ arch/x86/kvm/x86.c | 1 - arch/x86/kvm/x86.h | 2 ++ 5 files changed, 122 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index fdb016ee6848..67d565a32e96 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -237,6 +237,42 @@ static void vt_handle_exit_irqoff(struct kvm_vcpu *vcp= u) vmx_handle_exit_irqoff(vcpu); } =20 +static int vt_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) +{ + if (unlikely(is_td_vcpu(vcpu))) + return tdx_set_msr(vcpu, msr_info); + + return vmx_set_msr(vcpu, msr_info); +} + +/* + * The kvm parameter can be NULL (module initialization, or invocation bef= ore + * VM creation). Be sure to check the kvm parameter before using it. + */ +static bool vt_has_emulated_msr(struct kvm *kvm, u32 index) +{ + if (kvm && is_td(kvm)) + return tdx_has_emulated_msr(index, true); + + return vmx_has_emulated_msr(kvm, index); +} + +static int vt_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) +{ + if (unlikely(is_td_vcpu(vcpu))) + return tdx_get_msr(vcpu, msr_info); + + return vmx_get_msr(vcpu, msr_info); +} + +static void vt_msr_filter_changed(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_msr_filter_changed(vcpu); +} + static void vt_apicv_post_state_restore(struct kvm_vcpu *vcpu) { struct pi_desc *pi =3D vcpu_to_pi_desc(vcpu); @@ -500,7 +536,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { =20 .hardware_enable =3D vmx_hardware_enable, .hardware_disable =3D vt_hardware_disable, - .has_emulated_msr =3D vmx_has_emulated_msr, + .has_emulated_msr =3D vt_has_emulated_msr, =20 .is_vm_type_supported =3D vt_is_vm_type_supported, .max_vcpus =3D vt_max_vcpus, @@ -522,8 +558,8 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { =20 .update_exception_bitmap =3D vmx_update_exception_bitmap, .get_msr_feature =3D vmx_get_msr_feature, - .get_msr =3D vmx_get_msr, - .set_msr =3D vmx_set_msr, + .get_msr =3D vt_get_msr, + .set_msr =3D vt_set_msr, .get_segment_base =3D vmx_get_segment_base, .get_segment =3D vmx_get_segment, .set_segment =3D vmx_set_segment, @@ -632,7 +668,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .apic_init_signal_blocked =3D vmx_apic_init_signal_blocked, .migrate_timers =3D vmx_migrate_timers, =20 - .msr_filter_changed =3D vmx_msr_filter_changed, + .msr_filter_changed =3D vt_msr_filter_changed, .complete_emulated_msr =3D kvm_complete_insn_gp, =20 .vcpu_deliver_sipi_vector =3D kvm_vcpu_deliver_sipi_vector, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 6d1100a2de7c..4455bf5ae1ac 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1673,6 +1673,80 @@ void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *r= eason, *error_code =3D 0; } =20 +static bool tdx_is_emulated_kvm_msr(u32 index, bool write) +{ + switch (index) { + case MSR_KVM_POLL_CONTROL: + return true; + default: + return false; + } +} + +bool tdx_has_emulated_msr(u32 index, bool write) +{ + switch (index) { + case MSR_IA32_UCODE_REV: + case MSR_IA32_ARCH_CAPABILITIES: + case MSR_IA32_POWER_CTL: + case MSR_MTRRcap: + case 0x200 ... 
0x26f: + /* IA32_MTRR_PHYS{BASE, MASK}, IA32_MTRR_FIX*_* */ + case MSR_IA32_CR_PAT: + case MSR_MTRRdefType: + case MSR_IA32_TSC_DEADLINE: + case MSR_IA32_MISC_ENABLE: + case MSR_PLATFORM_INFO: + case MSR_MISC_FEATURES_ENABLES: + case MSR_IA32_MCG_CAP: + case MSR_IA32_MCG_STATUS: + case MSR_IA32_MCG_CTL: + case MSR_IA32_MCG_EXT_CTL: + case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1: + case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1: + /* MSR_IA32_MCx_{CTL, STATUS, ADDR, MISC, CTL2} */ + return true; + case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff: + /* + * x2APIC registers that are virtualized by the CPU can't be + * emulated, KVM doesn't have access to the virtual APIC page. + */ + switch (index) { + case X2APIC_MSR(APIC_TASKPRI): + case X2APIC_MSR(APIC_PROCPRI): + case X2APIC_MSR(APIC_EOI): + case X2APIC_MSR(APIC_ISR) ... X2APIC_MSR(APIC_ISR + APIC_ISR_NR): + case X2APIC_MSR(APIC_TMR) ... X2APIC_MSR(APIC_TMR + APIC_ISR_NR): + case X2APIC_MSR(APIC_IRR) ... X2APIC_MSR(APIC_IRR + APIC_ISR_NR): + return false; + default: + return true; + } + case MSR_IA32_APICBASE: + case MSR_EFER: + return !write; + case 0x4b564d00 ... 0x4b564dff: + /* KVM custom MSRs */ + return tdx_is_emulated_kvm_msr(index, write); + default: + return false; + } +} + +int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) +{ + if (tdx_has_emulated_msr(msr->index, false)) + return kvm_get_msr_common(vcpu, msr); + return 1; +} + +int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) +{ + if (tdx_has_emulated_msr(msr->index, true)) + return kvm_set_msr_common(vcpu, msr); + return 1; +} + int tdx_dev_ioctl(void __user *argp) { struct kvm_tdx_capabilities __user *user_caps; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index ac08f45d8c9e..26d07514f6a4 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -171,6 +171,9 @@ void tdx_deliver_interrupt(struct kvm_lapic *apic, int = delivery_mode, void tdx_inject_nmi(struct kvm_vcpu *vcpu); void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code); +bool tdx_has_emulated_msr(u32 index, bool write); +int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr); +int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr); =20 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); =20 @@ -212,6 +215,9 @@ static inline void tdx_deliver_interrupt(struct kvm_lap= ic *apic, int delivery_mo static inline void tdx_inject_nmi(struct kvm_vcpu *vcpu) {} static inline void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u= 64 *info1, u64 *info2, u32 *intr_info, u32 *error_code) {} +static inline bool tdx_has_emulated_msr(u32 index, bool write) { return fa= lse; } +static inline int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)= { return 1; } +static inline int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)= { return 1; } =20 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)= { return -EOPNOTSUPP; } =20 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 4ec246c88e41..7530efce6810 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -87,7 +87,6 @@ #include "trace.h" =20 #define MAX_IO_MSRS 256 -#define KVM_MAX_MCE_BANKS 32 =20 struct kvm_caps kvm_caps __read_mostly =3D { .supported_mce_cap =3D MCG_CTL_P | MCG_SER_P, diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 888f34224bba..2608432fe524 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -8,6 +8,8 @@ 
#include "kvm_cache_regs.h" #include "kvm_emulate.h" =20 +#define KVM_MAX_MCE_BANKS 32 + bool __kvm_is_vm_type_supported(unsigned long type); =20 struct kvm_caps { --=20 2.25.1 From nobody Tue Sep 9 16:53:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5E6E8C64ED8 for ; Mon, 27 Feb 2023 08:32:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232194AbjB0Ici (ORCPT ); Mon, 27 Feb 2023 03:32:38 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59244 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231822AbjB0I34 (ORCPT ); Mon, 27 Feb 2023 03:29:56 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E44421DBA4; Mon, 27 Feb 2023 00:26:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486381; x=1709022381; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=VFZFvC05VbVBY4hlx/gwByBhvWXYymqzPur7cG6HHJ4=; b=R7AT5vQgxpqZdFqkgOoXPpbIp1hikerorGT4MpS1SAIXPgveYeq0+fdf ZpwoNu4HlfRr34kWVnuhEmMkE1lHCRVT3EjqIvPx/N3czykThnim7QitD EBReqPnGXBjoV/tcHaQfD64ZhDZbH0f+GAWry7XosMAHTcOP/CTQN4Wt9 dFza/8o6sPoq3WiC/vRhiGgej3lFHIIQbELmH2jYDPbRF75cDdECPMKZN eS02CDIKzdTulZS53Q44eRcIzl24PDFcCz1sRNz863rasLJyLRbAYs60s /COw20KfqznQGYhB6UD12sYrQvxulSk4dqugM396/97TwvOUwQ3QUuoPn g==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317609062" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317609062" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:19 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242404" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242404" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:19 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 089/106] KVM: TDX: Handle TDX PV rdmsr/wrmsr hypercall Date: Mon, 27 Feb 2023 00:23:28 -0800 Message-Id: <9ed1d1be8ebb41c72f9077134dee9107a596a5fa.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Wire up TDX PV rdmsr/wrmsr hypercall to the KVM backend function. 
Signed-off-by: Isaku Yamahata Reviewed-by: Paolo Bonzini --- arch/x86/kvm/vmx/tdx.c | 39 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 4455bf5ae1ac..f4c0f715a36f 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1190,6 +1190,41 @@ static int tdx_emulate_mmio(struct kvm_vcpu *vcpu) return 1; } =20 +static int tdx_emulate_rdmsr(struct kvm_vcpu *vcpu) +{ + u32 index =3D tdvmcall_a0_read(vcpu); + u64 data; + + if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ) || + kvm_get_msr(vcpu, index, &data)) { + trace_kvm_msr_read_ex(index); + tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_INVALID_OPERAND); + return 1; + } + trace_kvm_msr_read(index, data); + + tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS); + tdvmcall_set_return_val(vcpu, data); + return 1; +} + +static int tdx_emulate_wrmsr(struct kvm_vcpu *vcpu) +{ + u32 index =3D tdvmcall_a0_read(vcpu); + u64 data =3D tdvmcall_a1_read(vcpu); + + if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_WRITE) || + kvm_set_msr(vcpu, index, data)) { + trace_kvm_msr_write_ex(index, data); + tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_INVALID_OPERAND); + return 1; + } + + trace_kvm_msr_write(index, data); + tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS); + return 1; +} + static int handle_tdvmcall(struct kvm_vcpu *vcpu) { if (tdvmcall_exit_type(vcpu)) @@ -1204,6 +1239,10 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu) return tdx_emulate_io(vcpu); case EXIT_REASON_EPT_VIOLATION: return tdx_emulate_mmio(vcpu); + case EXIT_REASON_MSR_READ: + return tdx_emulate_rdmsr(vcpu); + case EXIT_REASON_MSR_WRITE: + return tdx_emulate_wrmsr(vcpu); default: break; } --=20 2.25.1 From nobody Tue Sep 9 16:53:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 69F75C64ED6 for ; Mon, 27 Feb 2023 08:32:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232248AbjB0Ict (ORCPT ); Mon, 27 Feb 2023 03:32:49 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38278 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231836AbjB0I35 (ORCPT ); Mon, 27 Feb 2023 03:29:57 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ADE30206B6; Mon, 27 Feb 2023 00:26:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486385; x=1709022385; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=+e/THgQuzy9HKQWdHmsvCjGEpKAbWN26lu2fjw0xxh8=; b=M9t5L9PQ9sverzjKxV65vMAH2nXx5xryonvM0n51i2GyLojb/MBOu8uU OrAVJj71b6O4Smk6J/oSfH9wRigvivcE0r/Mw6EZ8nS124YfE5JMrOjaU 66tc7a/HEC45xjfRBIuw1kogZVEu/YJoq26EdJG2QAOKyXBr1j4RqDren GITNHf9Gjh9kcxul4PhSS7EkjyUBru7vHIeWSVdsAgG+JuGC/eP/Jmn43 ok7dQjTcByM3wKVnBgylYSiT1ZPhBZl/BBd8EZzBrAaBHafEIyQTLQIux JmqkRH87TpChUpkt/GfojPARfQklRpL+fVXWBdJYFb/14Auz/29HFzRLO A==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317609067" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317609067" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:19 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; 
a="783242408" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242408" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:19 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 090/106] KVM: TDX: Handle TDX PV report fatal error hypercall Date: Mon, 27 Feb 2023 00:23:29 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Wire up TDX PV report fatal error hypercall to exit to device model so that it can gracefully handle it. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/tdx.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index f4c0f715a36f..f20fcf8325aa 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1243,6 +1243,13 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu) return tdx_emulate_rdmsr(vcpu); case EXIT_REASON_MSR_WRITE: return tdx_emulate_wrmsr(vcpu); + case TDG_VP_VMCALL_REPORT_FATAL_ERROR: + /* + * Exit to userspace device model for tear down. + * Because guest TD is already panicking, returning an error to + * guest TD doesn't make sense. No argument check is done. + */ + return tdx_vp_vmcall_to_user(vcpu); default: break; } --=20 2.25.1 From nobody Tue Sep 9 16:53:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 45E6AC64ED6 for ; Mon, 27 Feb 2023 08:31:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231842AbjB0Ibg (ORCPT ); Mon, 27 Feb 2023 03:31:36 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59310 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231843AbjB0I37 (ORCPT ); Mon, 27 Feb 2023 03:29:59 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D282E206B8; Mon, 27 Feb 2023 00:26:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486388; x=1709022388; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=crRX4eJKIGYLcIqb0h3lQZf/BA+IJdNsvIu9Nezrjsc=; b=Z1B4RG0K6QW2X2YBAav4vB6UzBcTFTZrWwVrZ0xPQEr3qToAJxLis25o Zr+gvFGPmyTUrQSpPkmpc/pNSB8hXMrbJDwf46MsJWdLWfX9LyY+xay5o NrHPx5yoKOyEbI5z4ymiLb+c9Yp57vXW/px/ThhYPEwva2obeBnxUCmoA 5Uyx8iFZ7gw3IGsKL9Jy5twLVz28XQn2HuBPX4JBuWCUWMC2CX4LgHrTV ps7xuH9Z4NHRS1osa4JrfBJwJhM0L9mnDPX4cFI2qJd9gJ7m2eZ8nbEbq QCPGFXWr2TRYKdB8jU3hhjNpCJWR5cSkXVbo8nZVJYGv+I0CPvVIX3ho/ w==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317609070" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317609070" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:19 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242411" X-IronPort-AV: 
E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242411" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:19 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 091/106] KVM: TDX: Handle TDX PV map_gpa hypercall Date: Mon, 27 Feb 2023 00:23:30 -0800 Message-Id: <7cf93fefc4b2164e35a6e127abcd970c9465f92e.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Wire up TDX PV map_gpa hypercall to the kvm/mmu backend. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/tdx.c | 53 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 53 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index f20fcf8325aa..08a4e63e4aea 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1225,6 +1225,57 @@ static int tdx_emulate_wrmsr(struct kvm_vcpu *vcpu) return 1; } =20 +static int tdx_map_gpa(struct kvm_vcpu *vcpu) +{ + struct kvm *kvm =3D vcpu->kvm; + gpa_t gpa =3D tdvmcall_a0_read(vcpu); + gpa_t size =3D tdvmcall_a1_read(vcpu); + gpa_t end =3D gpa + size; + gfn_t s =3D gpa_to_gfn(gpa) & ~kvm_gfn_shared_mask(kvm); + gfn_t e =3D gpa_to_gfn(end) & ~kvm_gfn_shared_mask(kvm); + int i; + + if (!IS_ALIGNED(gpa, 4096) || !IS_ALIGNED(size, 4096) || + end < gpa || + end > kvm_gfn_shared_mask(kvm) << (PAGE_SHIFT + 1) || + kvm_is_private_gpa(kvm, gpa) !=3D kvm_is_private_gpa(kvm, end)) { + tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_INVALID_OPERAND); + return 1; + } + + /* + * Check how the requested region overlaps with the KVM memory slots. + * For simplicity, require that it must be contained within a memslot or + * it must not overlap with any memslots (MMIO). + */ + for (i =3D 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) { + struct kvm_memslots *slots =3D __kvm_memslots(kvm, i); + struct kvm_memslot_iter iter; + + kvm_for_each_memslot_in_gfn_range(&iter, slots, s, e) { + struct kvm_memory_slot *slot =3D iter.slot; + gfn_t slot_s =3D slot->base_gfn; + gfn_t slot_e =3D slot->base_gfn + slot->npages; + + /* no overlap */ + if (e < slot_s || s >=3D slot_e) + continue; + + /* contained in slot */ + if (slot_s <=3D s && e <=3D slot_e) { + if (kvm_slot_can_be_private(slot)) + return tdx_vp_vmcall_to_user(vcpu); + continue; + } + + break; + } + } + + tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_INVALID_OPERAND); + return 1; +} + static int handle_tdvmcall(struct kvm_vcpu *vcpu) { if (tdvmcall_exit_type(vcpu)) @@ -1250,6 +1301,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu) * guest TD doesn't make sense. No argument check is done. 
 	 */
 		return tdx_vp_vmcall_to_user(vcpu);
+	case TDG_VP_VMCALL_MAP_GPA:
+		return tdx_map_gpa(vcpu);
 	default:
 		break;
 	}
-- 
2.25.1

From nobody Tue Sep 9 16:53:38 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini,
    erdemaktas@google.com, Sean Christopherson, Sagi Shahar,
    David Matlack, Kai Huang, Zhi Wang
Subject: [PATCH v12 092/106] KVM: TDX: Handle TDG.VP.VMCALL hypercall
Date: Mon, 27 Feb 2023 00:23:31 -0800
Message-Id: <2b22361d1decea80492eff3ec87d87ca16a0e9c5.1677484918.git.isaku.yamahata@intel.com>

From: Isaku Yamahata <isaku.yamahata@intel.com>

Implement the TDG.VP.VMCALL<GetTdVmCallInfo> hypercall. If the input
value is zero, return a success code and zero in the output registers.

TDG.VP.VMCALL<GetTdVmCallInfo> is a subleaf of TDG.VP.VMCALL that
enumerates which TDG.VP.VMCALL subleaves are supported. This hypercall
exists for future enhancement of the Guest-Host Communication Interface
(GHCI) specification. GHCI version 344426-001US defines it to require
input R12 to be zero and to return zero in output registers R11, R12,
R13, and R14, so that the guest TD enumerates no enhancements.
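
To make the contract concrete before the diff, a tiny self-contained
model of the handler. The struct and function names are hypothetical;
TDG_VP_VMCALL_SUCCESS matches this series, and the INVALID_OPERAND
encoding (bit 63 set) is an assumption taken from the GHCI status
convention.

#include <stdint.h>
#include <stdio.h>

#define TDG_VP_VMCALL_SUCCESS		0x0000000000000000ULL
/* GHCI "invalid operand" status; exact value assumed from the spec. */
#define TDG_VP_VMCALL_INVALID_OPERAND	0x8000000000000000ULL

struct gpr_image {
	uint64_t r10;	/* out: return code              */
	uint64_t r11;	/* out: 0, no extended subleaves */
	uint64_t r12;	/* in: leaf (must be 0); out: 0  */
	uint64_t r13;	/* out: 0                        */
	uint64_t r14;	/* out: 0                        */
};

/* Models tdx_get_td_vm_call_info(): leaf 0 succeeds and reports that
 * no optional TDG.VP.VMCALL subleaves beyond the baseline exist. */
static void get_td_vm_call_info(struct gpr_image *regs)
{
	if (regs->r12 != 0) {
		regs->r10 = TDG_VP_VMCALL_INVALID_OPERAND;
		return;
	}
	regs->r10 = TDG_VP_VMCALL_SUCCESS;
	regs->r11 = regs->r12 = regs->r13 = regs->r14 = 0;
}

int main(void)
{
	struct gpr_image ok = { .r12 = 0 }, bad = { .r12 = 1 };

	get_td_vm_call_info(&ok);
	get_td_vm_call_info(&bad);
	printf("leaf 0 -> r10=%#llx\n", (unsigned long long)ok.r10);
	printf("leaf 1 -> r10=%#llx\n", (unsigned long long)bad.r10);
	return 0;
}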
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/tdx.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 08a4e63e4aea..35c6875d3bef 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1225,6 +1225,20 @@ static int tdx_emulate_wrmsr(struct kvm_vcpu *vcpu) return 1; } =20 +static int tdx_get_td_vm_call_info(struct kvm_vcpu *vcpu) +{ + if (tdvmcall_a0_read(vcpu)) + tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_INVALID_OPERAND); + else { + tdvmcall_set_return_code(vcpu, TDG_VP_VMCALL_SUCCESS); + kvm_r11_write(vcpu, 0); + tdvmcall_a0_write(vcpu, 0); + tdvmcall_a1_write(vcpu, 0); + tdvmcall_a2_write(vcpu, 0); + } + return 1; +} + static int tdx_map_gpa(struct kvm_vcpu *vcpu) { struct kvm *kvm =3D vcpu->kvm; @@ -1294,6 +1308,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu) return tdx_emulate_rdmsr(vcpu); case EXIT_REASON_MSR_WRITE: return tdx_emulate_wrmsr(vcpu); + case TDG_VP_VMCALL_GET_TD_VM_CALL_INFO: + return tdx_get_td_vm_call_info(vcpu); case TDG_VP_VMCALL_REPORT_FATAL_ERROR: /* * Exit to userspace device model for tear down. --=20 2.25.1 From nobody Tue Sep 9 16:53:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C3775C64ED8 for ; Mon, 27 Feb 2023 08:32:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232209AbjB0Icl (ORCPT ); Mon, 27 Feb 2023 03:32:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38610 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231875AbjB0IaF (ORCPT ); Mon, 27 Feb 2023 03:30:05 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 751E220D12; Mon, 27 Feb 2023 00:26:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486402; x=1709022402; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=x0ayolqox89Eiek3h19Ty17vWo1hFMBecMFI+9M9Tko=; b=FfbztvSMUbIkrfq9ybDoZveVhPBK3PUYv7ZoC9JBSsjFMH/SIqUGN7gL JjWk/J4KAteCsv0x34PAz1WWhvBYmpBbgts5HwEZffYpd+/6BjAhQMPpw XLexrq977pzGlV/YH7VXX8Vzcqp479VzCr+IjnN0CSQftwdDsQMgu/kSt hshX14ZeU9p2pFv3wh7Md3/v76ClmqMsiWbTmfxve9BrJOeu6vbgNel2+ dXMfznDnjJCt7pIV7jYo7zkVNv9CIt7xM4bVRqaX/K5JX5ZXuQAgad91U I4OQ0cqlJtrOUfvSgvNeyfObSHPmYLXoprTAvo7mjhhFr/WCSTth6996g A==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317609074" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317609074" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:19 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242422" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242422" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:19 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 093/106] KVM: TDX: Silently discard SMI request Date: Mon, 27 Feb 2023 
00:23:32 -0800 Message-Id: <0f2736a8b7526c91d1e7438f9b0726f88c0aa825.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata TDX doesn't support system-management mode (SMM) and system-management interrupt (SMI) in guest TDs. Because guest state (vcpu state, memory state) is protected, it must go through the TDX module APIs to change guest state, injecting SMI and changing vcpu mode into SMM. The TDX module doesn't provide a way for VMM to inject SMI into guest TD and a way for VMM to switch guest vcpu mode into SMM. We have two options in KVM when handling SMM or SMI in the guest TD or the device model (e.g. QEMU): 1) silently ignore the request or 2) return a meaningful error. For simplicity, we implemented the option 1). Signed-off-by: Isaku Yamahata --- arch/x86/kvm/smm.h | 7 +++++- arch/x86/kvm/vmx/main.c | 45 ++++++++++++++++++++++++++++++++++---- arch/x86/kvm/vmx/tdx.c | 29 ++++++++++++++++++++++++ arch/x86/kvm/vmx/x86_ops.h | 12 ++++++++++ 4 files changed, 88 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/smm.h b/arch/x86/kvm/smm.h index a1cf2ac5bd78..bc77902f5c18 100644 --- a/arch/x86/kvm/smm.h +++ b/arch/x86/kvm/smm.h @@ -142,7 +142,12 @@ union kvm_smram { =20 static inline int kvm_inject_smi(struct kvm_vcpu *vcpu) { - kvm_make_request(KVM_REQ_SMI, vcpu); + /* + * If SMM isn't supported (e.g. TDX), silently discard SMI request. + * Assume that SMM supported =3D MSR_IA32_SMBASE supported. + */ + if (static_call(kvm_x86_has_emulated_msr)(vcpu->kvm, MSR_IA32_SMBASE)) + kvm_make_request(KVM_REQ_SMI, vcpu); return 0; } =20 diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 67d565a32e96..872479b8edb8 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -273,6 +273,43 @@ static void vt_msr_filter_changed(struct kvm_vcpu *vcp= u) vmx_msr_filter_changed(vcpu); } =20 +#ifdef CONFIG_KVM_SMM +static int vt_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection) +{ + if (is_td_vcpu(vcpu)) + return tdx_smi_allowed(vcpu, for_injection); + + return vmx_smi_allowed(vcpu, for_injection); +} + +static int vt_enter_smm(struct kvm_vcpu *vcpu, union kvm_smram *smram) +{ + if (unlikely(is_td_vcpu(vcpu))) + return tdx_enter_smm(vcpu, smram); + + return vmx_enter_smm(vcpu, smram); +} + +static int vt_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smra= m) +{ + if (unlikely(is_td_vcpu(vcpu))) + return tdx_leave_smm(vcpu, smram); + + return vmx_leave_smm(vcpu, smram); +} + +static void vt_enable_smi_window(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) { + tdx_enable_smi_window(vcpu); + return; + } + + /* RSM will cause a vmexit anyway. 
*/ + vmx_enable_smi_window(vcpu); +} +#endif + static void vt_apicv_post_state_restore(struct kvm_vcpu *vcpu) { struct pi_desc *pi =3D vcpu_to_pi_desc(vcpu); @@ -658,10 +695,10 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .setup_mce =3D vmx_setup_mce, =20 #ifdef CONFIG_KVM_SMM - .smi_allowed =3D vmx_smi_allowed, - .enter_smm =3D vmx_enter_smm, - .leave_smm =3D vmx_leave_smm, - .enable_smi_window =3D vmx_enable_smi_window, + .smi_allowed =3D vt_smi_allowed, + .enter_smm =3D vt_enter_smm, + .leave_smm =3D vt_leave_smm, + .enable_smi_window =3D vt_enable_smi_window, #endif =20 .can_emulate_instruction =3D vmx_can_emulate_instruction, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 35c6875d3bef..01871f343ce2 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1862,6 +1862,35 @@ int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_da= ta *msr) return 1; } =20 +#ifdef CONFIG_KVM_SMM +int tdx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection) +{ + /* SMI isn't supported for TDX. */ + WARN_ON_ONCE(1); + return false; +} + +int tdx_enter_smm(struct kvm_vcpu *vcpu, union kvm_smram *smram) +{ + /* smi_allowed() is always false for TDX as above. */ + WARN_ON_ONCE(1); + return 0; +} + +int tdx_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram) +{ + WARN_ON_ONCE(1); + return 0; +} + +void tdx_enable_smi_window(struct kvm_vcpu *vcpu) +{ + /* SMI isn't supported for TDX. Silently discard SMI request. */ + WARN_ON_ONCE(1); + vcpu->arch.smi_pending =3D false; +} +#endif + int tdx_dev_ioctl(void __user *argp) { struct kvm_tdx_capabilities __user *user_caps; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 26d07514f6a4..242fcd043d0a 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -226,4 +226,16 @@ static inline int tdx_sept_tlb_remote_flush(struct kvm= *kvm) { return 0; } static inline void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,= int root_level) {} #endif =20 +#if defined(CONFIG_INTEL_TDX_HOST) && defined(CONFIG_KVM_SMM) +int tdx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection); +int tdx_enter_smm(struct kvm_vcpu *vcpu, union kvm_smram *smram); +int tdx_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram); +void tdx_enable_smi_window(struct kvm_vcpu *vcpu); +#else +static inline int tdx_smi_allowed(struct kvm_vcpu *vcpu, bool for_injectio= n) { return false; } +static inline int tdx_enter_smm(struct kvm_vcpu *vcpu, union kvm_smram *sm= ram) { return 0; } +static inline int tdx_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smr= am *smram) { return 0; } +static inline void tdx_enable_smi_window(struct kvm_vcpu *vcpu) {} +#endif + #endif /* __KVM_X86_VMX_X86_OPS_H */ --=20 2.25.1 From nobody Tue Sep 9 16:53:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C0FC5C7EE30 for ; Mon, 27 Feb 2023 08:32:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232012AbjB0IcA (ORCPT ); Mon, 27 Feb 2023 03:32:00 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37450 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231895AbjB0IaH (ORCPT ); Mon, 27 Feb 2023 03:30:07 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3E519206BA; Mon, 27 
Feb 2023 00:26:45 -0800 (PST)
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini,
    erdemaktas@google.com, Sean Christopherson, Sagi Shahar,
    David Matlack, Kai Huang, Zhi Wang
Subject: [PATCH v12 094/106] KVM: TDX: Silently ignore INIT/SIPI
Date: Mon, 27 Feb 2023 00:23:33 -0800
Message-Id: <123833499f97111f361f304a67af1e86e04f9162.1677484918.git.isaku.yamahata@intel.com>

From: Isaku Yamahata <isaku.yamahata@intel.com>

The TDX module API doesn't provide a way for the VMM to inject an INIT
IPI or SIPI. Instead, it defines its own protocols to boot application
processors. Ignore INIT and SIPI events for the TDX guest.

There are two options: 1) silently ignore the INIT/SIPI request, or
2) somehow return an error to the guest TD. Given that the TDX guest is
paravirtualized to boot APs, option 1 is chosen for simplicity.
Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/lapic.c | 19 +++++++++++------- arch/x86/kvm/svm/svm.c | 1 + arch/x86/kvm/vmx/main.c | 32 ++++++++++++++++++++++++++++-- arch/x86/kvm/vmx/tdx.c | 4 ++-- 6 files changed, 48 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index e1242c4b248f..5f699a93b9b3 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -144,6 +144,7 @@ KVM_X86_OP_OPTIONAL(migrate_timers) KVM_X86_OP(msr_filter_changed) KVM_X86_OP(complete_emulated_msr) KVM_X86_OP(vcpu_deliver_sipi_vector) +KVM_X86_OP(vcpu_deliver_init) KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons); =20 #undef KVM_X86_OP diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 301059b2e882..473b2538424a 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1786,6 +1786,7 @@ struct kvm_x86_ops { int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err); =20 void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector); + void (*vcpu_deliver_init)(struct kvm_vcpu *vcpu); =20 /* * Returns vCPU specific APICv inhibit reasons @@ -1997,6 +1998,7 @@ void kvm_get_segment(struct kvm_vcpu *vcpu, struct kv= m_segment *var, int seg); void kvm_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int s= eg); int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int s= eg); void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector); +void kvm_vcpu_deliver_init(struct kvm_vcpu *vcpu); =20 int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index, int reason, bool has_error_code, u32 error_code); diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index eae1459f8283..644a6a04fb56 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -3193,6 +3193,16 @@ int kvm_lapic_set_pv_eoi(struct kvm_vcpu *vcpu, u64 = data, unsigned long len) return 0; } =20 +void kvm_vcpu_deliver_init(struct kvm_vcpu *vcpu) +{ + kvm_vcpu_reset(vcpu, true); + if (kvm_vcpu_is_bsp(vcpu)) + vcpu->arch.mp_state =3D KVM_MP_STATE_RUNNABLE; + else + vcpu->arch.mp_state =3D KVM_MP_STATE_INIT_RECEIVED; +} +EXPORT_SYMBOL_GPL(kvm_vcpu_deliver_init); + int kvm_apic_accept_events(struct kvm_vcpu *vcpu) { struct kvm_lapic *apic =3D vcpu->arch.apic; @@ -3224,13 +3234,8 @@ int kvm_apic_accept_events(struct kvm_vcpu *vcpu) return 0; } =20 - if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) { - kvm_vcpu_reset(vcpu, true); - if (kvm_vcpu_is_bsp(apic->vcpu)) - vcpu->arch.mp_state =3D KVM_MP_STATE_RUNNABLE; - else - vcpu->arch.mp_state =3D KVM_MP_STATE_INIT_RECEIVED; - } + if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) + static_call(kvm_x86_vcpu_deliver_init)(vcpu); if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events)) { if (vcpu->arch.mp_state =3D=3D KVM_MP_STATE_INIT_RECEIVED) { /* evaluate pending_events before reading the vector */ diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index d0b01956e420..bdb08dc236a1 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4830,6 +4830,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata =3D { .complete_emulated_msr =3D svm_complete_emulated_msr, =20 .vcpu_deliver_sipi_vector =3D svm_vcpu_deliver_sipi_vector, + .vcpu_deliver_init =3D kvm_vcpu_deliver_init, .vcpu_get_apicv_inhibit_reasons =3D avic_vcpu_get_apicv_inhibit_reasons, }; =20 diff 
--git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 872479b8edb8..507a4b433ad0 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -310,6 +310,14 @@ static void vt_enable_smi_window(struct kvm_vcpu *vcpu) } #endif =20 +static bool vt_apic_init_signal_blocked(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return true; + + return vmx_apic_init_signal_blocked(vcpu); +} + static void vt_apicv_post_state_restore(struct kvm_vcpu *vcpu) { struct pi_desc *pi =3D vcpu_to_pi_desc(vcpu); @@ -338,6 +346,25 @@ static void vt_deliver_interrupt(struct kvm_lapic *api= c, int delivery_mode, vmx_deliver_interrupt(apic, delivery_mode, trig_mode, vector); } =20 +static void vt_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector) +{ + if (is_td_vcpu(vcpu)) + return; + + kvm_vcpu_deliver_sipi_vector(vcpu, vector); +} + +static void vt_vcpu_deliver_init(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) { + /* TDX doesn't support INIT. Ignore INIT event */ + vcpu->arch.mp_state =3D KVM_MP_STATE_RUNNABLE; + return; + } + + kvm_vcpu_deliver_init(vcpu); +} + static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) { @@ -702,13 +729,14 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { #endif =20 .can_emulate_instruction =3D vmx_can_emulate_instruction, - .apic_init_signal_blocked =3D vmx_apic_init_signal_blocked, + .apic_init_signal_blocked =3D vt_apic_init_signal_blocked, .migrate_timers =3D vmx_migrate_timers, =20 .msr_filter_changed =3D vt_msr_filter_changed, .complete_emulated_msr =3D kvm_complete_insn_gp, =20 - .vcpu_deliver_sipi_vector =3D kvm_vcpu_deliver_sipi_vector, + .vcpu_deliver_sipi_vector =3D vt_vcpu_deliver_sipi_vector, + .vcpu_deliver_init =3D vt_vcpu_deliver_init, =20 .dev_mem_enc_ioctl =3D tdx_dev_ioctl, .mem_enc_ioctl =3D vt_mem_enc_ioctl, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 01871f343ce2..6df79f3659b3 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -729,8 +729,8 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_ev= ent) { struct msr_data apic_base_msr; =20 - /* Ignore INIT silently because TDX doesn't support INIT event. */ - if (init_event) + /* vcpu_deliver_init method silently discards INIT event. 
 */
+	if (KVM_BUG_ON(init_event, vcpu->kvm))
 		return;
 	if (KVM_BUG_ON(is_td_vcpu_created(to_tdx(vcpu)), vcpu->kvm))
 		return;
-- 
2.25.1

From nobody Tue Sep 9 16:53:38 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini,
    erdemaktas@google.com, Sean Christopherson, Sagi Shahar,
    David Matlack, Kai Huang, Zhi Wang, Sean Christopherson
Subject: [PATCH v12 095/106] KVM: TDX: Add methods to ignore accesses to CPU state
Date: Mon, 27 Feb 2023 00:23:34 -0800
Message-Id: <647bd3aa177eeef5c3c9a59d2547c4af29af561a.1677484918.git.isaku.yamahata@intel.com>

From: Sean Christopherson

TDX protects TDX guest state from the VMM. Implement access methods for
TDX guest state that either ignore the access or return zero. Because
these methods can be called by KVM ioctls to get/set vCPU registers,
they don't use KVM_BUG_ON, except for the one method
(sync_dirty_debug_regs) that should never be reached for a TD vcpu.
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/main.c | 269 +++++++++++++++++++++++++++++++++---- arch/x86/kvm/vmx/tdx.c | 49 ++++++- arch/x86/kvm/vmx/x86_ops.h | 13 ++ 3 files changed, 304 insertions(+), 27 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 507a4b433ad0..4296e1e729b7 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -365,6 +365,184 @@ static void vt_vcpu_deliver_init(struct kvm_vcpu *vcp= u) kvm_vcpu_deliver_init(vcpu); } =20 +static void vt_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_vcpu_after_set_cpuid(vcpu); +} + +static void vt_update_exception_bitmap(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_update_exception_bitmap(vcpu); +} + +static u64 vt_get_segment_base(struct kvm_vcpu *vcpu, int seg) +{ + if (is_td_vcpu(vcpu)) + return tdx_get_segment_base(vcpu, seg); + + return vmx_get_segment_base(vcpu, seg); +} + +static void vt_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, + int seg) +{ + if (is_td_vcpu(vcpu)) { + tdx_get_segment(vcpu, var, seg); + return; + } + + vmx_get_segment(vcpu, var, seg); +} + +static void vt_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, + int seg) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_segment(vcpu, var, seg); +} + +static int vt_get_cpl(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return tdx_get_cpl(vcpu); + + return vmx_get_cpl(vcpu); +} + +static void vt_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l) +{ + if (is_td_vcpu(vcpu)) { + *db =3D 0; + *l =3D 0; + return; + } + + vmx_get_cs_db_l_bits(vcpu, db, l); +} + +static void vt_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_cr0(vcpu, cr0); +} + +static void vt_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_cr4(vcpu, cr4); +} + +static int vt_set_efer(struct kvm_vcpu *vcpu, u64 efer) +{ + if (is_td_vcpu(vcpu)) + return 0; + + return vmx_set_efer(vcpu, efer); +} + +static void vt_get_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + if (is_td_vcpu(vcpu)) { + memset(dt, 0, sizeof(*dt)); + return; + } + + vmx_get_idt(vcpu, dt); +} + +static void vt_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_idt(vcpu, dt); +} + +static void vt_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + if (is_td_vcpu(vcpu)) { + memset(dt, 0, sizeof(*dt)); + return; + } + + vmx_get_gdt(vcpu, dt); +} + +static void vt_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_gdt(vcpu, dt); +} + +static void vt_set_dr7(struct kvm_vcpu *vcpu, unsigned long val) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_dr7(vcpu, val); +} + +static void vt_sync_dirty_debug_regs(struct kvm_vcpu *vcpu) +{ + /* + * MOV-DR exiting is always cleared for TD guest, even in debug mode. + * Thus KVM_DEBUGREG_WONT_EXIT can never be set and it should never + * reach here for TD vcpu. 
+ */ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_sync_dirty_debug_regs(vcpu); +} + +static void vt_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) +{ + if (is_td_vcpu(vcpu)) { + tdx_cache_reg(vcpu, reg); + return; + } + + vmx_cache_reg(vcpu, reg); +} + +static unsigned long vt_get_rflags(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return tdx_get_rflags(vcpu); + + return vmx_get_rflags(vcpu); +} + +static void vt_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_rflags(vcpu, rflags); +} + +static bool vt_get_if_flag(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return false; + + return vmx_get_if_flag(vcpu); +} + static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) { @@ -509,6 +687,14 @@ static void vt_inject_irq(struct kvm_vcpu *vcpu, bool = reinjected) vmx_inject_irq(vcpu, reinjected); } =20 +static void vt_inject_exception(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_inject_exception(vcpu); +} + static void vt_cancel_injection(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) @@ -555,6 +741,39 @@ static void vt_get_exit_info(struct kvm_vcpu *vcpu, u3= 2 *reason, vmx_get_exit_info(vcpu, reason, info1, info2, intr_info, error_code); } =20 + +static void vt_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int ir= r) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_update_cr8_intercept(vcpu, tpr, irr); +} + +static void vt_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitma= p) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_load_eoi_exitmap(vcpu, eoi_exit_bitmap); +} + +static int vt_set_tss_addr(struct kvm *kvm, unsigned int addr) +{ + if (is_td(kvm)) + return 0; + + return vmx_set_tss_addr(kvm, addr); +} + +static int vt_set_identity_map_addr(struct kvm *kvm, u64 ident_addr) +{ + if (is_td(kvm)) + return 0; + + return vmx_set_identity_map_addr(kvm, ident_addr); +} + static u8 vt_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) { if (is_td_vcpu(vcpu)) @@ -620,29 +839,29 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .vcpu_load =3D vt_vcpu_load, .vcpu_put =3D vt_vcpu_put, =20 - .update_exception_bitmap =3D vmx_update_exception_bitmap, + .update_exception_bitmap =3D vt_update_exception_bitmap, .get_msr_feature =3D vmx_get_msr_feature, .get_msr =3D vt_get_msr, .set_msr =3D vt_set_msr, - .get_segment_base =3D vmx_get_segment_base, - .get_segment =3D vmx_get_segment, - .set_segment =3D vmx_set_segment, - .get_cpl =3D vmx_get_cpl, - .get_cs_db_l_bits =3D vmx_get_cs_db_l_bits, - .set_cr0 =3D vmx_set_cr0, + .get_segment_base =3D vt_get_segment_base, + .get_segment =3D vt_get_segment, + .set_segment =3D vt_set_segment, + .get_cpl =3D vt_get_cpl, + .get_cs_db_l_bits =3D vt_get_cs_db_l_bits, + .set_cr0 =3D vt_set_cr0, .is_valid_cr4 =3D vmx_is_valid_cr4, - .set_cr4 =3D vmx_set_cr4, - .set_efer =3D vmx_set_efer, - .get_idt =3D vmx_get_idt, - .set_idt =3D vmx_set_idt, - .get_gdt =3D vmx_get_gdt, - .set_gdt =3D vmx_set_gdt, - .set_dr7 =3D vmx_set_dr7, - .sync_dirty_debug_regs =3D vmx_sync_dirty_debug_regs, - .cache_reg =3D vmx_cache_reg, - .get_rflags =3D vmx_get_rflags, - .set_rflags =3D vmx_set_rflags, - .get_if_flag =3D vmx_get_if_flag, + .set_cr4 =3D vt_set_cr4, + .set_efer =3D vt_set_efer, + .get_idt =3D vt_get_idt, + .set_idt =3D vt_set_idt, + .get_gdt =3D vt_get_gdt, + .set_gdt =3D vt_set_gdt, + .set_dr7 =3D vt_set_dr7, + .sync_dirty_debug_regs =3D vt_sync_dirty_debug_regs, + .cache_reg =3D vt_cache_reg, + .get_rflags =3D vt_get_rflags, + 
.set_rflags =3D vt_set_rflags, + .get_if_flag =3D vt_get_if_flag, =20 .flush_tlb_all =3D vt_flush_tlb_all, .flush_tlb_current =3D vt_flush_tlb_current, @@ -659,7 +878,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .patch_hypercall =3D vmx_patch_hypercall, .inject_irq =3D vt_inject_irq, .inject_nmi =3D vt_inject_nmi, - .inject_exception =3D vmx_inject_exception, + .inject_exception =3D vt_inject_exception, .cancel_injection =3D vt_cancel_injection, .interrupt_allowed =3D vt_interrupt_allowed, .nmi_allowed =3D vt_nmi_allowed, @@ -667,11 +886,11 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .set_nmi_mask =3D vt_set_nmi_mask, .enable_nmi_window =3D vt_enable_nmi_window, .enable_irq_window =3D vt_enable_irq_window, - .update_cr8_intercept =3D vmx_update_cr8_intercept, + .update_cr8_intercept =3D vt_update_cr8_intercept, .set_virtual_apic_mode =3D vmx_set_virtual_apic_mode, .set_apic_access_page_addr =3D vmx_set_apic_access_page_addr, .refresh_apicv_exec_ctrl =3D vmx_refresh_apicv_exec_ctrl, - .load_eoi_exitmap =3D vmx_load_eoi_exitmap, + .load_eoi_exitmap =3D vt_load_eoi_exitmap, .apicv_post_state_restore =3D vt_apicv_post_state_restore, .required_apicv_inhibits =3D VMX_REQUIRED_APICV_INHIBITS, .hwapic_irr_update =3D vmx_hwapic_irr_update, @@ -682,13 +901,13 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .dy_apicv_has_pending_interrupt =3D pi_has_pending_interrupt, .protected_apic_has_interrupt =3D vt_protected_apic_has_interrupt, =20 - .set_tss_addr =3D vmx_set_tss_addr, - .set_identity_map_addr =3D vmx_set_identity_map_addr, + .set_tss_addr =3D vt_set_tss_addr, + .set_identity_map_addr =3D vt_set_identity_map_addr, .get_mt_mask =3D vt_get_mt_mask, =20 .get_exit_info =3D vt_get_exit_info, =20 - .vcpu_after_set_cpuid =3D vmx_vcpu_after_set_cpuid, + .vcpu_after_set_cpuid =3D vt_vcpu_after_set_cpuid, =20 .has_wbinvd_exit =3D cpu_has_vmx_wbinvd_exit, =20 diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 6df79f3659b3..6784cbf08cc4 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -3,6 +3,7 @@ #include =20 #include +#include #include =20 #include "capabilities.h" @@ -593,8 +594,15 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) =20 vcpu->arch.tsc_offset =3D to_kvm_tdx(vcpu->kvm)->tsc_offset; vcpu->arch.l1_tsc_offset =3D vcpu->arch.tsc_offset; - vcpu->arch.guest_state_protected =3D - !(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTRIBUTE_DEBUG); + /* + * TODO: support off-TD debug. If TD DEBUG is enabled, guest state + * can be accessed. guest_state_protected =3D false. and kvm ioctl to + * access CPU states should be usable for user space VMM (e.g. qemu). 
+ * + * vcpu->arch.guest_state_protected =3D + * !(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTRIBUTE_DEBUG); + */ + vcpu->arch.guest_state_protected =3D true; =20 tdx->pi_desc.nv =3D POSTED_INTR_VECTOR; tdx->pi_desc.sn =3D 1; @@ -1891,6 +1899,43 @@ void tdx_enable_smi_window(struct kvm_vcpu *vcpu) } #endif =20 +int tdx_get_cpl(struct kvm_vcpu *vcpu) +{ + return 0; +} + +void tdx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) +{ + kvm_register_mark_available(vcpu, reg); + switch (reg) { + case VCPU_REGS_RSP: + case VCPU_REGS_RIP: + case VCPU_EXREG_PDPTR: + case VCPU_EXREG_CR0: + case VCPU_EXREG_CR3: + case VCPU_EXREG_CR4: + break; + default: + KVM_BUG_ON(1, vcpu->kvm); + break; + } +} + +unsigned long tdx_get_rflags(struct kvm_vcpu *vcpu) +{ + return 0; +} + +u64 tdx_get_segment_base(struct kvm_vcpu *vcpu, int seg) +{ + return 0; +} + +void tdx_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int s= eg) +{ + memset(var, 0, sizeof(*var)); +} + int tdx_dev_ioctl(void __user *argp) { struct kvm_tdx_capabilities __user *user_caps; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 242fcd043d0a..553f2b3880b6 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -175,6 +175,12 @@ bool tdx_has_emulated_msr(u32 index, bool write); int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr); int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr); =20 +int tdx_get_cpl(struct kvm_vcpu *vcpu); +void tdx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg); +unsigned long tdx_get_rflags(struct kvm_vcpu *vcpu); +u64 tdx_get_segment_base(struct kvm_vcpu *vcpu, int seg); +void tdx_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int s= eg); + int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); =20 void tdx_flush_tlb(struct kvm_vcpu *vcpu); @@ -219,6 +225,13 @@ static inline bool tdx_has_emulated_msr(u32 index, boo= l write) { return false; } static inline int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)= { return 1; } static inline int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)= { return 1; } =20 +static inline int tdx_get_cpl(struct kvm_vcpu *vcpu) { return 0; } +static inline void tdx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) = {} +static inline unsigned long tdx_get_rflags(struct kvm_vcpu *vcpu) { return= 0; } +static inline u64 tdx_get_segment_base(struct kvm_vcpu *vcpu, int seg) { r= eturn 0; } +static inline void tdx_get_segment(struct kvm_vcpu *vcpu, struct kvm_segme= nt *var, + int seg) {} + static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)= { return -EOPNOTSUPP; } =20 static inline void tdx_flush_tlb(struct kvm_vcpu *vcpu) {} --=20 2.25.1 From nobody Tue Sep 9 16:53:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F2A67C7EE2E for ; Mon, 27 Feb 2023 08:31:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231926AbjB0Ibp (ORCPT ); Mon, 27 Feb 2023 03:31:45 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38932 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231939AbjB0IaN (ORCPT ); Mon, 27 Feb 2023 03:30:13 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6978C20D16; Mon, 27 Feb 2023 00:26:56 
-0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486416; x=1709022416; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=3sz94AYFj+sDvvgjnAkjoG1G/WHdPHxZoFLnSh7r8a4=; b=CzlEy41afW+EPrItFljDwXTh8VEPh4hinPDxoDsVqMtouYPQQcI9m6/V 1EDklLuA7APrWepxwmT4V8PXGDiYHdbPjZnei/JDyrK0Tz0Jf2dCCA+fY AbJqeT1QFwZ3RwuwarJeSlLAkSK7JO5TZd1gE0zWx3LovLwYGKaPoxTRe HSEQNjs4eWvSkLc96DL3se5mAn8+kNxQ8jboEGIF+rftGYZi7gg+yv0CG mjpu65Be2KqOeWaVJzshddvCmMiCzFB5gwA1KhKjr6+5rrFb7epE1eSLe oXpolAXk3dmz5u3vs5qlsmElAE5FjvgVE+LsteHCUoP5/cMcrSjB8oCep Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317609090" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317609090" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:20 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242432" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242432" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:20 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 096/106] KVM: TDX: Add methods to ignore guest instruction emulation Date: Mon, 27 Feb 2023 00:23:35 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Because TDX protects TDX guest state from VMM, instructions in guest memory cannot be emulated. Implement methods to ignore guest instruction emulator. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/main.c | 28 ++++++++++++++++++++++++++-- 1 file changed, 26 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 4296e1e729b7..d15056666311 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -310,6 +310,30 @@ static void vt_enable_smi_window(struct kvm_vcpu *vcpu) } #endif =20 +static bool vt_can_emulate_instruction(struct kvm_vcpu *vcpu, int emul_typ= e, + void *insn, int insn_len) +{ + if (is_td_vcpu(vcpu)) + return false; + + return vmx_can_emulate_instruction(vcpu, emul_type, insn, insn_len); +} + +static int vt_check_intercept(struct kvm_vcpu *vcpu, + struct x86_instruction_info *info, + enum x86_intercept_stage stage, + struct x86_exception *exception) +{ + /* + * This call back is triggered by the x86 instruction emulator. TDX + * doesn't allow guest memory inspection. 
+ */ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return X86EMUL_UNHANDLEABLE; + + return vmx_check_intercept(vcpu, info, stage, exception); +} + static bool vt_apic_init_signal_blocked(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) @@ -918,7 +942,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { =20 .load_mmu_pgd =3D vt_load_mmu_pgd, =20 - .check_intercept =3D vmx_check_intercept, + .check_intercept =3D vt_check_intercept, .handle_exit_irqoff =3D vt_handle_exit_irqoff, =20 .request_immediate_exit =3D vt_request_immediate_exit, @@ -947,7 +971,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .enable_smi_window =3D vt_enable_smi_window, #endif =20 - .can_emulate_instruction =3D vmx_can_emulate_instruction, + .can_emulate_instruction =3D vt_can_emulate_instruction, .apic_init_signal_blocked =3D vt_apic_init_signal_blocked, .migrate_timers =3D vmx_migrate_timers, =20 --=20 2.25.1 From nobody Tue Sep 9 16:53:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3F8D3C64ED6 for ; Mon, 27 Feb 2023 08:32:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232171AbjB0Ic3 (ORCPT ); Mon, 27 Feb 2023 03:32:29 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39012 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231955AbjB0IaP (ORCPT ); Mon, 27 Feb 2023 03:30:15 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7CE9A1D904; Mon, 27 Feb 2023 00:27:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486420; x=1709022420; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Sb3GhmOVq7LbIqC0EdKDOcQ7oJ/c//lx2iSy3QD7UPI=; b=j67oXbEn+NPne1zv/c7NXDSaV9cuLWouvlhT2T82e/PY1bU4kiK9BAB/ VijeojBVI+PNO8/TNJvfL5aDWjtFtWGXe/5MqSP+buoThlxMR4XShttVh wgukaYcPTRzzls109HAXnigco7K7pDlgXjCMaXNf0Jxn5bKhn4BFAIOsy bIi2EeCe6u/87JxAGaLelg3lipvYOj67GN9Afoy4wbRtju02MlFV4d6+3 T/MU3P/EvwAhtT8/Q8oShZmlxD0GmPnsqHrl5bBGQ/NDSAXnoJj1FbvZl Ygj3Ofr5KQDYal3tH3e+m6D7dgkXSqlQg2w3UpB60pNV4nO4siC+Sud/c g==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317609095" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317609095" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:20 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242435" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242435" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:20 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 097/106] KVM: TDX: Add a method to ignore dirty logging Date: Mon, 27 Feb 2023 00:23:36 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; 
charset="utf-8" From: Isaku Yamahata Currently TDX KVM doesn't support tracking dirty pages (yet). Implement a method to ignore it. Because the flag for kvm memory slot to enable dirty logging isn't accepted for TDX, warn on the method is called for TDX. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/main.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index d15056666311..79a3c623bccf 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -806,6 +806,14 @@ static u8 vt_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t = gfn, bool is_mmio) return __vmx_get_mt_mask(vcpu, gfn, is_mmio, true); } =20 +static void vt_update_cpu_dirty_logging(struct kvm_vcpu *vcpu) +{ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_update_cpu_dirty_logging(vcpu); +} + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { if (!is_td(kvm)) @@ -950,7 +958,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .sched_in =3D vt_sched_in, =20 .cpu_dirty_log_size =3D PML_ENTITY_NUM, - .update_cpu_dirty_logging =3D vmx_update_cpu_dirty_logging, + .update_cpu_dirty_logging =3D vt_update_cpu_dirty_logging, =20 .nested_ops =3D &vmx_nested_ops, =20 --=20 2.25.1 From nobody Tue Sep 9 16:53:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BB2B4C64ED6 for ; Mon, 27 Feb 2023 08:32:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232123AbjB0IcN (ORCPT ); Mon, 27 Feb 2023 03:32:13 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59246 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231958AbjB0IaP (ORCPT ); Mon, 27 Feb 2023 03:30:15 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BF5861D90C; Mon, 27 Feb 2023 00:27:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486420; x=1709022420; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ATtdQPA7Lu+GCXwzx5H1T5bzpq8fe+VljG85p8vhHvw=; b=UavQVdTPfu6dO7OIyZ7lcWdJbLWe9imcKmbJbCP9nID42aLIe2ZghAom +Grbpt1SMHGkeJXStN8K/SIFDRrY4y27Ros0elSHZyh+ThLAY7VCkaNSx Kjx2fUJP+08UwdM0fCtXdU9j0lxe35d+ILTtcZw21xo9wLSoL/GoXbVio PgF+51COWTIaQSuXViUoGa3/oj6ij0uYqE1POtn6vpHx97MnbGoItMpN1 cq1rPiC/UweEhxdb6qZittDuIZt5EKaVT/VITqzyq8QNx17jjrvuTGuhS jK01qtWXBX68wYB/1WCYrcQRMCmzB2LasbPIOiyjp6BYB8zJvJGGi3BXJ Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317609100" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317609100" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:21 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242438" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242438" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:20 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai 
Huang , Zhi Wang Subject: [PATCH v12 098/106] KVM: TDX: Add methods to ignore VMX preemption timer Date: Mon, 27 Feb 2023 00:23:37 -0800 Message-Id: <89314914a69303bb014beb56d0c30e73029b378a.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata TDX doesn't support VMX preemption timer. Implement access methods for VMM to ignore VMX preemption timer. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/main.c | 25 +++++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 79a3c623bccf..ed9c5f9f3413 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -814,6 +814,27 @@ static void vt_update_cpu_dirty_logging(struct kvm_vcp= u *vcpu) vmx_update_cpu_dirty_logging(vcpu); } =20 +#ifdef CONFIG_X86_64 +static int vt_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc, + bool *expired) +{ + /* VMX-preemption timer isn't available for TDX. */ + if (is_td_vcpu(vcpu)) + return -EINVAL; + + return vmx_set_hv_timer(vcpu, guest_deadline_tsc, expired); +} + +static void vt_cancel_hv_timer(struct kvm_vcpu *vcpu) +{ + /* VMX-preemption timer can't be set. See vt_set_hv_timer(). */ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return; + + vmx_cancel_hv_timer(vcpu); +} +#endif + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { if (!is_td(kvm)) @@ -966,8 +987,8 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .pi_start_assignment =3D vmx_pi_start_assignment, =20 #ifdef CONFIG_X86_64 - .set_hv_timer =3D vmx_set_hv_timer, - .cancel_hv_timer =3D vmx_cancel_hv_timer, + .set_hv_timer =3D vt_set_hv_timer, + .cancel_hv_timer =3D vt_cancel_hv_timer, #endif =20 .setup_mce =3D vmx_setup_mce, --=20 2.25.1 From nobody Tue Sep 9 16:53:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5F3C3C64ED6 for ; Mon, 27 Feb 2023 08:31:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231812AbjB0Ibc (ORCPT ); Mon, 27 Feb 2023 03:31:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59348 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231980AbjB0IaS (ORCPT ); Mon, 27 Feb 2023 03:30:18 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 838321E1DB; Mon, 27 Feb 2023 00:27:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486423; x=1709022423; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=RgiM7RRY0j0k8kS1Un798rhchRa7fM1NXF8nqbI9RsY=; b=MuhDREMIkJf3gmOURTeKOOkOj5fGPNU+Pb2K+xJe8tCfv+CVo8iAVzKJ 1lYeBcm5rq6GcpC2EayJppCYkr9nY6hyY2SGOhsiNXJ1WzyHB6a8lRf5C zAve131NQP+2YGa2EZUYSowu8XHthskgGJFKtaXPDDgKNS7tTGJwZKYgc eKwuRxZNGXXWK9o6RlmTyY7uuM14QAXooxy8rC7KRUtIvvGOrfr80ByCC mFGc7UzXba5NcaancwKQjPug7ZIqc5qyEPRk/yoXvYqymQA4ZPMKjatwu 0GtKOScsCvPCiEEtIz+toGID+cI5/b3jPflcng+/pU9HyrXi5JXWS90Qk g==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317609105" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; 
d="scan'208";a="317609105" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:21 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242441" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242441" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:20 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 099/106] KVM: TDX: Add methods to ignore accesses to TSC Date: Mon, 27 Feb 2023 00:23:38 -0800 Message-Id: <9b5301bd28a90eece54d301ccb41bbbc39bda024.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata TDX protects TDX guest TSC state from VMM. Implement access methods to ignore guest TSC. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/main.c | 44 +++++++++++++++++++++++++++++++++++++---- 1 file changed, 40 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index ed9c5f9f3413..340e76e1b59e 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -806,6 +806,42 @@ static u8 vt_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t = gfn, bool is_mmio) return __vmx_get_mt_mask(vcpu, gfn, is_mmio, true); } =20 +static u64 vt_get_l2_tsc_offset(struct kvm_vcpu *vcpu) +{ + /* TDX doesn't support L2 guest at the moment. */ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return 0; + + return vmx_get_l2_tsc_offset(vcpu); +} + +static u64 vt_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu) +{ + /* TDX doesn't support L2 guest at the moment. */ + if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) + return 0; + + return vmx_get_l2_tsc_multiplier(vcpu); +} + +static void vt_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset) +{ + /* In TDX, tsc offset can't be changed. */ + if (is_td_vcpu(vcpu)) + return; + + vmx_write_tsc_offset(vcpu, offset); +} + +static void vt_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 multiplier) +{ + /* In TDX, tsc multiplier can't be changed. 
*/ + if (is_td_vcpu(vcpu)) + return; + + vmx_write_tsc_multiplier(vcpu, multiplier); +} + static void vt_update_cpu_dirty_logging(struct kvm_vcpu *vcpu) { if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm)) @@ -964,10 +1000,10 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { =20 .has_wbinvd_exit =3D cpu_has_vmx_wbinvd_exit, =20 - .get_l2_tsc_offset =3D vmx_get_l2_tsc_offset, - .get_l2_tsc_multiplier =3D vmx_get_l2_tsc_multiplier, - .write_tsc_offset =3D vmx_write_tsc_offset, - .write_tsc_multiplier =3D vmx_write_tsc_multiplier, + .get_l2_tsc_offset =3D vt_get_l2_tsc_offset, + .get_l2_tsc_multiplier =3D vt_get_l2_tsc_multiplier, + .write_tsc_offset =3D vt_write_tsc_offset, + .write_tsc_multiplier =3D vt_write_tsc_multiplier, =20 .load_mmu_pgd =3D vt_load_mmu_pgd, =20 --=20 2.25.1 From nobody Tue Sep 9 16:53:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BDCCEC7EE2D for ; Mon, 27 Feb 2023 08:32:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232153AbjB0IcW (ORCPT ); Mon, 27 Feb 2023 03:32:22 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38264 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231961AbjB0IaP (ORCPT ); Mon, 27 Feb 2023 03:30:15 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 94B041D925; Mon, 27 Feb 2023 00:27:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486421; x=1709022421; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=RlLLDqsjdsWGZZGnbwmWrW/7AkVVVP9WsV72zRvKFMk=; b=S11P3KjvOEVOWJEVCd6zcoNFF1bBXRZj9l36BhYOnm5eSmfgvTKpXGED munJP6u+L5I2Uwr6/XqcXKjVAqNqQHnbnGBP7TS4KvQiGk0iePwVstM3N IbbODhjN4His3BkVprt9kWDvwTl2+gcjI4Ez5MqWKjkbj9bkKUh4hMxA1 l9M0XlpdnuDQNUUzSi2BUU6aOsobzt3szpt6+LEKSL2ocV6JgI/zxv9ez C0pF3MUQz8oIz6wOd56fLVx1a/3O7KrEV68PLwCvAkgdXj1z1OQvqVb0Y nGJPHyjqvWLZ7pQGtQZmjJ1QrsIMUWdwZ3FF1j8ri+3fudA37sF8VrjyA Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317609109" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317609109" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:21 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242444" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242444" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:20 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 100/106] KVM: TDX: Ignore setting up mce Date: Mon, 27 Feb 2023 00:23:39 -0800 Message-Id: <4e6af93ad7f1970dc058bca1ffbedf5cd6b936f4.1677484918.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Because 
vmx_set_mce function is VMX specific and it cannot be used for TDX. Add vt stub to ignore setting up mce for TDX. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/main.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 340e76e1b59e..73ea15754102 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -871,6 +871,14 @@ static void vt_cancel_hv_timer(struct kvm_vcpu *vcpu) } #endif =20 +static void vt_setup_mce(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_setup_mce(vcpu); +} + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { if (!is_td(kvm)) @@ -1027,7 +1035,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .cancel_hv_timer =3D vt_cancel_hv_timer, #endif =20 - .setup_mce =3D vmx_setup_mce, + .setup_mce =3D vt_setup_mce, =20 #ifdef CONFIG_KVM_SMM .smi_allowed =3D vt_smi_allowed, --=20 2.25.1 From nobody Tue Sep 9 16:53:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 972D6C7EE2D for ; Mon, 27 Feb 2023 08:31:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231983AbjB0Ib5 (ORCPT ); Mon, 27 Feb 2023 03:31:57 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38626 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232007AbjB0IaW (ORCPT ); Mon, 27 Feb 2023 03:30:22 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8FED01E1FE; Mon, 27 Feb 2023 00:27:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486425; x=1709022425; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ZWV7hCkAnYam2QtQ43gFRUgp5krxkouJSB0YkN4Z1BQ=; b=WhHOADGfr8S7ldyDtCgwcIzVLy1eGFHdiUFQwXGpo55zujOrA2H5xIRV hyj+03wLDTwhhW3E6wPke4YrO8QOsf3lnHAoGAKgwWOmmN3zfuB/Yr6Uo RIyIAlh1yypZZsYfHQb8wu+o5Kwlz4t9ki51Ggclka73pBehGTkoOgXVz lgtwDOcG0BGybW6HVa7dLdOl32GCKR4pcUvCvlE9wFGzk6Jtw7c3Heiy5 cdKlEW8U6NdJFaChEzPBbxO+21XT8SiTsu0sS0NlAoCoZ1PI5xOyB2UCt 00RFF3nKSUoxvP6rGt8+nwyASDvB/HU2LLAyS5fkgs2C+9ffuSMp8kA2I Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317609111" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317609111" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:21 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242448" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242448" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:21 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 101/106] KVM: TDX: Add a method to ignore for TDX to ignore hypercall patch Date: Mon, 27 Feb 2023 00:23:40 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Isaku Yamahata 

Because guest TD memory is protected, it isn't possible for the VMM to
patch the guest binary with a hypercall instruction. Add a method to
ignore hypercall patching with a warning. Note: the guest TD kernel needs
to be modified to use TDG.VP.VMCALL for hypercalls.

Signed-off-by: Isaku Yamahata 
---
 arch/x86/kvm/vmx/main.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 73ea15754102..6a63d99ddf07 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -703,6 +703,19 @@ static u32 vt_get_interrupt_shadow(struct kvm_vcpu *vcpu)
 	return vmx_get_interrupt_shadow(vcpu);
 }
=20
+static void vt_patch_hypercall(struct kvm_vcpu *vcpu,
+			       unsigned char *hypercall)
+{
+	/*
+	 * Because guest memory is protected, guest can't be patched. TD kernel
+	 * is modified to use TDG.VP.VMCALL for hypercall.
+	 */
+	if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm))
+		return;
+
+	vmx_patch_hypercall(vcpu, hypercall);
+}
+
 static void vt_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
 {
 	if (is_td_vcpu(vcpu))
@@ -972,7 +985,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D {
 	.update_emulated_instruction =3D vmx_update_emulated_instruction,
 	.set_interrupt_shadow =3D vt_set_interrupt_shadow,
 	.get_interrupt_shadow =3D vt_get_interrupt_shadow,
-	.patch_hypercall =3D vmx_patch_hypercall,
+	.patch_hypercall =3D vt_patch_hypercall,
 	.inject_irq =3D vt_inject_irq,
 	.inject_nmi =3D vt_inject_nmi,
 	.inject_exception =3D vt_inject_exception,
--=20
2.25.1

From nobody Tue Sep 9 16:53:38 2025
Return-Path: 
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 77855C64ED8
	for ; Mon, 27 Feb 2023 08:32:48 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S232235AbjB0Icq (ORCPT );
	Mon, 27 Feb 2023 03:32:46 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38650 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S232013AbjB0IaW (ORCPT
	);
	Mon, 27 Feb 2023 03:30:22 -0500
Received: from mga18.intel.com (mga18.intel.com [134.134.136.126])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1843D20D2C;
	Mon, 27 Feb 2023 00:27:07 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1677486427; x=1709022427;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=hoNyuwRHh8mYvNtBjngC8pcCNGALtnvX2EsBum9D/zg=;
  b=MAoEamUJp6Ayax2MYXwaUckvBiNIk2zsW+cAw6t09a0uNV1mdxuOJ6ge
   CPPqkILkboUPtnB1MvqNei5GMCgtR0+rqLr+zkjdH+XsqXQ8wJFfyb7h3
   Q95ULrMX7fRzK9EiFfpgxe2cYC4pCN1uYFZynX672dcrhf2Ev+Jgt2faF
   6Mu3EhVE2e+wMJ9K+oK5jFVkYKc03INLMsUbwNRjU6+pfKqiBvq6WXBzp
   QNkSiqdLsoC8fG8tpvR9WQtN+4M/hEOzIu8Hy2bYvdp4TqqYuqFon0qS2
   UdYMUr1BfXgUzrqdm10LuHVhD6b5ZVWrJIknWfiA8mgbLYfupOY5cRCdm
   w==;
X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317609114"
X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317609114"
Received: from fmsmga002.fm.intel.com ([10.253.24.26])
  by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:21 -0800
X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242452"
X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242452"
Received: from ls.sc.intel.com (HELO
localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:21 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang Subject: [PATCH v12 102/106] KVM: TDX: Add methods to ignore virtual apic related operation Date: Mon, 27 Feb 2023 00:23:41 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata TDX protects TDX guest APIC state from VMM. Implement access methods of TDX guest vAPIC state to ignore them or return zero. Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/main.c | 61 ++++++++++++++++++++++++++++++++++---- arch/x86/kvm/vmx/tdx.c | 6 ++++ arch/x86/kvm/vmx/x86_ops.h | 3 ++ 3 files changed, 64 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 6a63d99ddf07..3f135d791422 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -342,6 +342,14 @@ static bool vt_apic_init_signal_blocked(struct kvm_vcp= u *vcpu) return vmx_apic_init_signal_blocked(vcpu); } =20 +static void vt_set_virtual_apic_mode(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return tdx_set_virtual_apic_mode(vcpu); + + return vmx_set_virtual_apic_mode(vcpu); +} + static void vt_apicv_post_state_restore(struct kvm_vcpu *vcpu) { struct pi_desc *pi =3D vcpu_to_pi_desc(vcpu); @@ -350,6 +358,31 @@ static void vt_apicv_post_state_restore(struct kvm_vcp= u *vcpu) memset(pi->pir, 0, sizeof(pi->pir)); } =20 +static void vt_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr) +{ + if (is_td_vcpu(vcpu)) + return; + + return vmx_hwapic_irr_update(vcpu, max_irr); +} + +static void vt_hwapic_isr_update(int max_isr) +{ + if (is_td_vcpu(kvm_get_running_vcpu())) + return; + + return vmx_hwapic_isr_update(max_isr); +} + +static bool vt_guest_apic_has_interrupt(struct kvm_vcpu *vcpu) +{ + /* TDX doesn't support L2 at the moment. 
*/ + if (WARN_ON_ONCE(is_td_vcpu(vcpu))) + return false; + + return vmx_guest_apic_has_interrupt(vcpu); +} + static int vt_sync_pir_to_irr(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) @@ -787,6 +820,22 @@ static void vt_update_cr8_intercept(struct kvm_vcpu *v= cpu, int tpr, int irr) vmx_update_cr8_intercept(vcpu, tpr, irr); } =20 +static void vt_set_apic_access_page_addr(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_apic_access_page_addr(vcpu); +} + +static void vt_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu) +{ + if (WARN_ON_ONCE(is_td_vcpu(vcpu))) + return; + + vmx_refresh_apicv_exec_ctrl(vcpu); +} + static void vt_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitma= p) { if (is_td_vcpu(vcpu)) @@ -997,15 +1046,15 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .enable_nmi_window =3D vt_enable_nmi_window, .enable_irq_window =3D vt_enable_irq_window, .update_cr8_intercept =3D vt_update_cr8_intercept, - .set_virtual_apic_mode =3D vmx_set_virtual_apic_mode, - .set_apic_access_page_addr =3D vmx_set_apic_access_page_addr, - .refresh_apicv_exec_ctrl =3D vmx_refresh_apicv_exec_ctrl, + .set_virtual_apic_mode =3D vt_set_virtual_apic_mode, + .set_apic_access_page_addr =3D vt_set_apic_access_page_addr, + .refresh_apicv_exec_ctrl =3D vt_refresh_apicv_exec_ctrl, .load_eoi_exitmap =3D vt_load_eoi_exitmap, .apicv_post_state_restore =3D vt_apicv_post_state_restore, .required_apicv_inhibits =3D VMX_REQUIRED_APICV_INHIBITS, - .hwapic_irr_update =3D vmx_hwapic_irr_update, - .hwapic_isr_update =3D vmx_hwapic_isr_update, - .guest_apic_has_interrupt =3D vmx_guest_apic_has_interrupt, + .hwapic_irr_update =3D vt_hwapic_irr_update, + .hwapic_isr_update =3D vt_hwapic_isr_update, + .guest_apic_has_interrupt =3D vt_guest_apic_has_interrupt, .sync_pir_to_irr =3D vt_sync_pir_to_irr, .deliver_interrupt =3D vt_deliver_interrupt, .dy_apicv_has_pending_interrupt =3D pi_has_pending_interrupt, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 6784cbf08cc4..292f55efe8f7 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1899,6 +1899,12 @@ void tdx_enable_smi_window(struct kvm_vcpu *vcpu) } #endif =20 +void tdx_set_virtual_apic_mode(struct kvm_vcpu *vcpu) +{ + /* Only x2APIC mode is supported for TD. 
*/ + WARN_ON_ONCE(kvm_get_apic_mode(vcpu) !=3D LAPIC_MODE_X2APIC); +} + int tdx_get_cpl(struct kvm_vcpu *vcpu) { return 0; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 553f2b3880b6..c5ff72b62140 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -174,6 +174,7 @@ void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reas= on, bool tdx_has_emulated_msr(u32 index, bool write); int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr); int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr); +void tdx_set_virtual_apic_mode(struct kvm_vcpu *vcpu); =20 int tdx_get_cpl(struct kvm_vcpu *vcpu); void tdx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg); @@ -225,6 +226,8 @@ static inline bool tdx_has_emulated_msr(u32 index, bool= write) { return false; } static inline int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)= { return 1; } static inline int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)= { return 1; } =20 +static inline void tdx_set_virtual_apic_mode(struct kvm_vcpu *vcpu) {} + static inline int tdx_get_cpl(struct kvm_vcpu *vcpu) { return 0; } static inline void tdx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) = {} static inline unsigned long tdx_get_rflags(struct kvm_vcpu *vcpu) { return= 0; } --=20 2.25.1 From nobody Tue Sep 9 16:53:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 72C75C64ED6 for ; Mon, 27 Feb 2023 08:32:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232143AbjB0IcT (ORCPT ); Mon, 27 Feb 2023 03:32:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38888 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232063AbjB0Iai (ORCPT ); Mon, 27 Feb 2023 03:30:38 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7713C211C2; Mon, 27 Feb 2023 00:27:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1677486428; x=1709022428; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=m4Ol8bMaoRsqFFF6dmsge4GR5c1IqOED1WLtsGCc6F0=; b=TP30hRZ2Sq4ERpUH+MUwTbCQ8yZbdngabgcU1cFOrltF0G3nUHlsXGvj daslAjNa75bFOtpq1GJP0jvZd+/3dO7BMjFXOJ7SyWXSEYRExaHcguWBo pa56cCkGffHaUsZ6J/12khGASJbn5cMk9RQqqykF9pZbDEzrf5lrs0orJ 0uiHah9CoKPj3i5e1ly7aIUYvqSk3WYJgQnoYm8wSQPiAViIEDDozyKbF JcMCSjVEX5oQYL2O5CPUU0A3hWqysFdAouXBskPPXK/Wq8JYW7WBDr3nl HLzZEXhnBKUOtrBXlR6ZrFGa42ZTR6KY4N2WxAgormD+8Sxq6iYXpmtpT A==; X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="317609116" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="317609116" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:21 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10633"; a="783242456" X-IronPort-AV: E=Sophos;i="5.97,331,1669104000"; d="scan'208";a="783242456" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Feb 2023 00:24:21 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, 
Sean Christopherson , Sagi Shahar , David Matlack , Kai Huang , Zhi Wang
Subject: [PATCH v12 103/106] Documentation/virt/kvm: Document on Trust Domain Extensions(TDX)
Date: Mon, 27 Feb 2023 00:23:42 -0800
Message-Id: 
X-Mailer: git-send-email 2.25.1
In-Reply-To: 
References: 
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Isaku Yamahata 

Add documentation for Intel Trust Domain Extensions (TDX) support.

Signed-off-by: Isaku Yamahata 
---
 Documentation/virt/kvm/api.rst       |   9 +-
 Documentation/virt/kvm/index.rst     |   2 +
 Documentation/virt/kvm/intel-tdx.rst | 351 +++++++++++++++++++++++++++
 3 files changed, 361 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/virt/kvm/intel-tdx.rst

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index fbff5cd6e404..59d7a3c66c4f 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1375,6 +1375,9 @@ the memory region are automatically reflected into the guest. For example, an
 mmap() that affects the region will be made visible immediately. Another
 example is madvise(MADV_DROP).
=20
+For a TDX guest, deleting/moving a memory region loses the guest memory
+contents. Read-only regions aren't supported. Only as-id 0 is supported.
+
 Note: On arm64, a write generated by the page-table walker (to update
 the Access and Dirty flags, for example) never results in a
 KVM_EXIT_MMIO exit when the slot has the KVM_MEM_READONLY flag. This
@@ -4664,7 +4667,7 @@ H_GET_CPU_CHARACTERISTICS hypercall.
=20
 :Capability: basic
 :Architectures: x86
-:Type: vm
+:Type: vm ioctl, vcpu ioctl
 :Parameters: an opaque platform specific structure (in/out)
 :Returns: 0 on success; -1 on error
=20
@@ -4676,6 +4679,10 @@ Currently, this ioctl is used for issuing Secure Encrypted Virtualization
 (SEV) commands on AMD Processors. The SEV commands are defined in
 Documentation/virt/kvm/x86/amd-memory-encryption.rst.
=20
+This ioctl is also used for issuing Trust Domain Extensions
+(TDX) commands on Intel Processors. The TDX commands are defined in
+Documentation/virt/kvm/intel-tdx.rst.
+
 4.111 KVM_MEMORY_ENCRYPT_REG_REGION
 -----------------------------------
=20
diff --git a/Documentation/virt/kvm/index.rst b/Documentation/virt/kvm/index.rst
index ad13ec55ddfe..20a2ab8fc78c 100644
--- a/Documentation/virt/kvm/index.rst
+++ b/Documentation/virt/kvm/index.rst
@@ -19,3 +19,5 @@ KVM
    vcpu-requests
    halt-polling
    review-checklist
+
+   intel-tdx
diff --git a/Documentation/virt/kvm/intel-tdx.rst b/Documentation/virt/kvm/intel-tdx.rst
new file mode 100644
index 000000000000..cd96ee3099c6
--- /dev/null
+++ b/Documentation/virt/kvm/intel-tdx.rst
@@ -0,0 +1,351 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+Intel Trust Domain Extensions (TDX)
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+
+Overview
+=3D=3D=3D=3D=3D=3D=3D=3D
+TDX stands for Trust Domain Extensions, which isolates VMs from
+the virtual-machine manager (VMM)/hypervisor and any other software on
+the platform.
+For details, see the specifications [1]_, whitepaper [2]_,
+architectural extensions specification [3]_, module documentation [4]_,
+loader interface specification [5]_, guest-hypervisor communication
+interface [6]_, virtual firmware design guide [7]_, and other resources
+([8]_, [9]_, [10]_, [11]_, and [12]_).
+
+
+API description
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+
+KVM_MEMORY_ENCRYPT_OP
+---------------------
+:Type: vm ioctl, vcpu ioctl
+
+For TDX operations, KVM_MEMORY_ENCRYPT_OP is re-purposed as a generic
+ioctl with TDX-specific sub-ioctl commands.
+
+::
+
+  /* Trust Domain eXtension sub-ioctl() commands. */
+  enum kvm_tdx_cmd_id {
+          KVM_TDX_CAPABILITIES =3D 0,
+          KVM_TDX_INIT_VM,
+          KVM_TDX_INIT_VCPU,
+          KVM_TDX_INIT_MEM_REGION,
+          KVM_TDX_FINALIZE_VM,
+
+          KVM_TDX_CMD_NR_MAX,
+  };
+
+  struct kvm_tdx_cmd {
+          /* enum kvm_tdx_cmd_id */
+          __u32 id;
+          /* flags for sub-command. If sub-command doesn't use this, set zero. */
+          __u32 flags;
+          /*
+           * data for each sub-command. An immediate value or a pointer to
+           * the actual data in process virtual address space. If sub-command
+           * doesn't use it, set zero.
+           */
+          __u64 data;
+          /*
+           * Auxiliary error code. The sub-command may return TDX SEAMCALL
+           * status code in addition to -Exxx.
+           * Defined for consistency with struct kvm_sev_cmd.
+           */
+          __u64 error;
+          /* Reserved: Defined for consistency with struct kvm_sev_cmd. */
+          __u64 unused;
+  };
+
+KVM_TDX_CAPABILITIES
+--------------------
+:Type: vm ioctl
+
+A subset of the TDSYSINFO_STRUCT retrieved by the TDH.SYS.INFO TDX SEAM
+call is returned. It describes the Intel TDX module.
+
+- id: KVM_TDX_CAPABILITIES
+- flags: must be 0
+- data: pointer to struct kvm_tdx_capabilities
+- error: must be 0
+- unused: must be 0
+
+::
+
+  struct kvm_tdx_cpuid_config {
+          __u32 leaf;
+          __u32 sub_leaf;
+          __u32 eax;
+          __u32 ebx;
+          __u32 ecx;
+          __u32 edx;
+  };
+
+  struct kvm_tdx_capabilities {
+          __u64 attrs_fixed0;
+          __u64 attrs_fixed1;
+          __u64 xfam_fixed0;
+          __u64 xfam_fixed1;
+
+          __u32 nr_cpuid_configs;
+          struct kvm_tdx_cpuid_config cpuid_configs[0];
+  };
+
+
+KVM_TDX_INIT_VM
+---------------
+:Type: vm ioctl
+
+Does additional VM initialization specific to TDX, which corresponds to
+the TDH.MNG.INIT TDX SEAM call.
+
+- id: KVM_TDX_INIT_VM
+- flags: must be 0
+- data: pointer to struct kvm_tdx_init_vm
+- error: must be 0
+- unused: must be 0
+
+::
+
+  struct kvm_tdx_init_vm {
+          __u64 attributes;
+          __u64 mrconfigid[6];    /* sha384 digest */
+          __u64 mrowner[6];       /* sha384 digest */
+          __u64 mrownerconfig[6]; /* sha384 digest */
+          __u64 reserved[1004];   /* must be zero for future extensibility */
+
+          struct kvm_cpuid2 cpuid;
+  };
+
+
+KVM_TDX_INIT_VCPU
+-----------------
+:Type: vcpu ioctl
+
+Does additional VCPU initialization specific to TDX, which corresponds to
+the TDH.VP.INIT TDX SEAM call.
+
+- id: KVM_TDX_INIT_VCPU
+- flags: must be 0
+- data: initial value of the guest TD VCPU RCX
+- error: must be 0
+- unused: must be 0
+
+KVM_TDX_INIT_MEM_REGION
+-----------------------
+:Type: vm ioctl
+
+Encrypt a contiguous memory region, which corresponds to the
+TDH.MEM.PAGE.ADD TDX SEAM call.
+If the KVM_TDX_MEASURE_MEMORY_REGION flag is specified, it also extends
+the measurement, which corresponds to the TDH.MR.EXTEND TDX SEAM call.
+
+- id: KVM_TDX_INIT_MEM_REGION
+- flags: currently only KVM_TDX_MEASURE_MEMORY_REGION is defined
+- data: pointer to struct kvm_tdx_init_mem_region
+- error: must be 0
+- unused: must be 0
+
+::
+
+  #define KVM_TDX_MEASURE_MEMORY_REGION (1UL << 0)
+
+  struct kvm_tdx_init_mem_region {
+          __u64 source_addr;
+          __u64 gpa;
+          __u64 nr_pages;
+  };
+
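+For illustration, a minimal user space sketch of issuing this sub-command
+through KVM_MEMORY_ENCRYPT_OP (error handling omitted; ``vm_fd``,
+``src_buf`` and ``nr_pages`` are assumptions of the example, not part of
+the ABI)::
+
+  struct kvm_tdx_init_mem_region region =3D {
+          .source_addr =3D (__u64)src_buf,  /* page-aligned source buffer */
+          .gpa =3D 0x1000000,               /* private GPA to populate */
+          .nr_pages =3D nr_pages,
+  };
+  struct kvm_tdx_cmd cmd =3D {
+          .id =3D KVM_TDX_INIT_MEM_REGION,
+          .flags =3D KVM_TDX_MEASURE_MEMORY_REGION,
+          .data =3D (__u64)&region,
+  };
+
+  ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
+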
+KVM_TDX_FINALIZE_VM
+-------------------
+:Type: vm ioctl
+
+Complete the measurement of the initial TD contents and mark the TD ready
+to run, which corresponds to TDH.MR.FINALIZE.
+
+- id: KVM_TDX_FINALIZE_VM
+- flags: must be 0
+- data: must be 0
+- error: must be 0
+- unused: must be 0
+
+KVM TDX creation flow
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+In addition to the normal KVM flow, new TDX ioctls need to be called. The
+control flow is as follows.
+
+#. system-wide capability check
+
+   * KVM_CAP_VM_TYPES: check if the VM type is supported and if
+     KVM_X86_PROTECTED_VM is supported.
+
+#. creating VM
+
+   * KVM_CREATE_VM
+   * KVM_TDX_CAPABILITIES: query if TDX is supported on the platform.
+   * KVM_ENABLE_CAP_VM(KVM_CAP_MAX_VCPUS): set max_vcpus. KVM_MAX_VCPUS by
+     default. KVM_MAX_VCPUS is not part of the ABI, but a kernel-internal
+     constant that is subject to change. Because the maximum number of
+     vcpus is part of attestation, it should be set explicitly.
+   * KVM_SET_TSC_KHZ for the VM (optional).
+   * KVM_TDX_INIT_VM: pass TDX-specific VM parameters.
+
+#. creating VCPU
+
+   * KVM_CREATE_VCPU
+   * KVM_TDX_INIT_VCPU: pass TDX-specific VCPU parameters.
+
+#. initializing guest memory
+
+   * Allocate guest memory and initialize pages the same as in the normal
+     KVM case. In the TDX case, additionally parse and load TDVF into
+     guest memory.
+   * KVM_TDX_INIT_MEM_REGION to add and measure guest pages.
+     If the pages have initial contents, they need to be added this way;
+     otherwise the contents are lost and the guest sees zero pages.
+   * KVM_TDX_FINALIZE_VM: finalize the VM and its measurement.
+     This must come after KVM_TDX_INIT_MEM_REGION.
+
+#. run vcpu
+
+Design discussion
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+
+Coexistence of normal(VMX) VM and TD VM
+---------------------------------------
+It's required to allow both legacy (normal VMX) VMs and new TD VMs to
+coexist. Otherwise the benefits of VM flexibility would be eliminated.
+The main issue is that the logic of the kvm_x86_ops callbacks for TDX is
+different from VMX, while kvm_x86_ops is a single global variable: not
+per-VM, not per-vcpu.
+
+Several points to be considered:
+
+  * No or minimal overhead when TDX is disabled (CONFIG_INTEL_TDX_HOST=3Dn).
+  * Avoid the overhead of indirect calls via function pointers.
+  * Contain the changes under the arch/x86/kvm/vmx directory and share
+    logic with VMX for maintenance.
+    Even though the ways to operate on a VM (VMX instructions vs. TDX
+    SEAM calls) are different, the basic idea remains the same, so much
+    of the logic can be shared.
+  * Future maintenance: no huge change of kvm_x86_ops is expected in the
+    (near) future, so a centralized file is acceptable.
+
+- Wrapping kvm x86_ops: The current choice
+
+  Introduce a dedicated file, arch/x86/kvm/vmx/main.c (the name main.c is
+  just chosen to show the main entry points for callbacks), with wrapper
+  functions around all the callbacks of the form
+  "if (is-tdx) tdx-callback() else vmx-callback()"; see the sketch at the
+  end of this subsection.
+
+  Pros:
+
+  - No major change in common x86 KVM code. The change is (mostly)
+    contained under arch/x86/kvm/vmx/.
+  - When TDX is disabled (CONFIG_INTEL_TDX_HOST=3Dn), the overhead is
+    optimized out.
+  - Micro-optimization by avoiding function pointers.
+
+  Cons:
+
+  - A lot of boilerplate in arch/x86/kvm/vmx/main.c.
+
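+  As an illustration, the wrapper shape (taken from this patch series;
+  see vt_get_cpl() in arch/x86/kvm/vmx/main.c) looks like::
+
+    static int vt_get_cpl(struct kvm_vcpu *vcpu)
+    {
+            /* TDX guest state is protected; tdx_get_cpl() reports CPL 0. */
+            if (is_td_vcpu(vcpu))
+                    return tdx_get_cpl(vcpu);
+
+            return vmx_get_cpl(vcpu);
+    }
+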
+
+KVM MMU Changes
+---------------
+KVM MMU needs to be enhanced to handle Secure/Shared-EPT.  The
+high-level execution flow is mostly the same as in the normal EPT case:
+EPT violation/misconfiguration -> invoke the TDP fault handler ->
+resolve the TDP fault -> resume execution (or emulate MMIO).
+The difference is that the S-EPT is operated on (read/write) via TDX
+SEAMCALLs, which are expensive, instead of directly reading/writing EPT
+entries.  One bit of the GPA (bit 51 or 47) is repurposed to mean shared
+with the host (if set to 1) or private to the TD (if cleared to 0).
+
+- The current implementation
+
+  * Reuse the existing MMU code with minimal update, because the
+    execution flow is mostly the same.  But an additional operation, a TDX
+    call for the S-EPT, is needed, so add hooks for it to kvm_x86_ops.
+  * For performance, minimize the TDX SEAMCALLs that operate on the S-EPT.
+    When getting the corresponding S-EPT pages/entry from a faulting GPA,
+    don't use a TDX SEAMCALL to read the S-EPT entry.  Instead, create a
+    shadow copy in host memory: repurpose the existing kvm_mmu_page as a
+    shadow copy of the S-EPT and associate the S-EPT with it.
+  * Treat the shared bit as an attribute.  Mask/unmask the bit where
+    necessary to keep the existing traversing code working.
+    Introduce kvm.arch.gfn_shared_mask and use "if (gfn_shared_mask)"
+    for the special case (see the sketch after the pros and cons below):
+
+    * 0 for the non-TDX case
+    * bit 51 or 47 set for the TDX case
+
+  Pros:
+
+  - Large code reuse with minimal new hooks.
+  - The execution path is the same.
+
+  Cons:
+
+  - Complicates the existing code.
+  - Repurposing kvm_mmu_page as a shadow of the Secure-EPT can be confusing.
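+
+  A sketch of how the shared bit could be masked (the helper names are
+  illustrative, not necessarily the ones used in this series):
+
+  ::
+
+    /* 0 for non-TDX VMs; BIT_ULL(51) or BIT_ULL(47) for TDs. */
+    static inline gfn_t kvm_gfn_shared_mask(const struct kvm *kvm)
+    {
+            return kvm->arch.gfn_shared_mask;
+    }
+
+    /* Strip the shared bit to get the canonical private GFN. */
+    static inline gfn_t kvm_gfn_to_private(const struct kvm *kvm, gfn_t gfn)
+    {
+            return gfn & ~kvm_gfn_shared_mask(kvm);
+    }
+
+    /* Set the shared bit to address the shared alias of a GFN. */
+    static inline gfn_t kvm_gfn_to_shared(const struct kvm *kvm, gfn_t gfn)
+    {
+            return gfn | kvm_gfn_shared_mask(kvm);
+    }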
+
+New KVM API, ioctl (sub)command, to manage TD VMs
+-------------------------------------------------
+Additional KVM APIs are needed to control TD VMs.  The operations on TD
+VMs are specific to TDX.
+
+- Piggyback and repurpose KVM_MEMORY_ENCRYPT_OP
+
+  Although operations for TD VMs aren't necessarily related to memory
+  encryption, define sub-operations of KVM_MEMORY_ENCRYPT_OP for TDX-specific
+  ioctls.
+
+  Pros:
+
+  - No major change in common x86 KVM code.
+  - Follows the SEV case.
+
+  Cons:
+
+  - The sub-operations of KVM_MEMORY_ENCRYPT_OP aren't necessarily memory
+    encryption, but operations on TD VMs.
+
+References
+==========
+
+.. [1] TDX specification
+   https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html
+.. [2] Intel Trust Domain Extensions (Intel TDX)
+   https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-whitepaper-final9-17.pdf
+.. [3] Intel CPU Architectural Extensions Specification
+   https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-cpu-architectural-specification.pdf
+.. [4] Intel TDX Module 1.0 EAS
+   https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-module-1eas.pdf
+.. [5] Intel TDX Loader Interface Specification
+   https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-seamldr-interface-specification.pdf
+.. [6] Intel TDX Guest-Hypervisor Communication Interface
+   https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-guest-hypervisor-communication-interface.pdf
+.. [7] Intel TDX Virtual Firmware Design Guide
+   https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1.
+.. [8] Intel public github
+
+   * kvm TDX branch: https://github.com/intel/tdx/tree/kvm
+   * TDX guest branch: https://github.com/intel/tdx/tree/guest
+
+.. [9] TDVF
+   https://github.com/tianocore/edk2-staging/tree/TDVF
+.. [10] KVM forum 2020: Intel Virtualization Technology Extensions to
+   Enable Hardware Isolated VMs
+   https://osseu2020.sched.com/event/eDzm/intel-virtualization-technology-extensions-to-enable-hardware-isolated-vms-sean-christopherson-intel
+.. [11] Linux Security Summit EU 2020:
+   Architectural Extensions for Hardware Virtual Machine Isolation
+   to Advance Confidential Computing in Public Clouds - Ravi Sahita
+   & Jun Nakajima, Intel Corporation
+   https://osseu2020.sched.com/event/eDOx/architectural-extensions-for-hardware-virtual-machine-isolation-to-advance-confidential-computing-in-public-clouds-ravi-sahita-jun-nakajima-intel-corporation
+.. [12] [RFCv2,00/16] KVM protected memory extension
+   https://lkml.org/lkml/2020/10/20/66
--
2.25.1

From nobody Tue Sep 9 16:53:38 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang, Bagas Sanjaya
Subject: [PATCH v12 104/106] KVM: x86: design documentation on TDX support of x86 KVM TDP MMU
Date: Mon, 27 Feb 2023 00:23:43 -0800
Message-Id: <8cb18e0aa03f6eb307710e77a05e7df66f4de5d7.1677484918.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

Add a high-level design document on the TDX changes to the TDP MMU.

Signed-off-by: Isaku Yamahata
Co-developed-by: Bagas Sanjaya
Signed-off-by: Bagas Sanjaya
---
 Documentation/virt/kvm/index.rst       |   1 +
 Documentation/virt/kvm/tdx-tdp-mmu.rst | 417 +++++++++++++++++++++++++
 2 files changed, 418 insertions(+)
 create mode 100644 Documentation/virt/kvm/tdx-tdp-mmu.rst

diff --git a/Documentation/virt/kvm/index.rst b/Documentation/virt/kvm/index.rst
index 20a2ab8fc78c..eafacbff1f4e 100644
--- a/Documentation/virt/kvm/index.rst
+++ b/Documentation/virt/kvm/index.rst
@@ -21,3 +21,4 @@ KVM
 review-checklist
 
 intel-tdx
+tdx-tdp-mmu
diff --git a/Documentation/virt/kvm/tdx-tdp-mmu.rst b/Documentation/virt/kvm/tdx-tdp-mmu.rst
new file mode 100644
index 000000000000..2d91c94e6d8f
--- /dev/null
+++ b/Documentation/virt/kvm/tdx-tdp-mmu.rst
@@ -0,0 +1,417 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Design of TDP MMU for TDX support
+=================================
+This document describes a (high-level) design for TDX support in the KVM
+TDP MMU on x86.
+
+In this document, we use "TD" or "guest TD" to differentiate it from the
+current "VM" (Virtual Machine), which is supported by KVM today.
+
+
+Background of TDX
+=================
+TD private memory is designed to hold TD private content, encrypted by the CPU
+using the TD ephemeral key.  An encryption engine holds a table of encryption
+keys, and an encryption key is selected for each memory transaction based on a
+Host Key Identifier (HKID).  By design, the host VMM does not have access to
+the encryption keys.
+
+In the first generation of MKTME, the HKID is "stolen" from the physical
+address by allocating a configurable number of bits from the top of the
+physical address.  The HKID space is partitioned into shared HKIDs for legacy
+MKTME accesses and private HKIDs for SEAM-mode-only accesses.  We use 0 for
+the shared HKID on the host so that MKTME can be opaque or bypassed on the
+host.
+
+During TDX non-root operation (i.e. guest TD), memory accesses can be
+qualified as either shared or private, based on the value of a new SHARED bit
+in the Guest Physical Address (GPA).  The CPU translates shared GPAs using the
+usual VMX EPT (Extended Page Table) or "Shared EPT" (in this document), which
+resides in host VMM memory.  The Shared EPT is directly managed by the host
+VMM - the same as with current VMX.  Since guest TDs usually require I/O, and
+the data exchange needs to be done via shared memory, KVM needs to use the
+current EPT functionality even for TDs.
+
+The CPU translates private GPAs using a separate Secure EPT.  The Secure EPT
+pages are encrypted and integrity-protected with the TD's ephemeral private
+key.  Secure EPT can be managed _indirectly_ by the host VMM, using the TDX
+interface functions (SEAMCALLs); thus, conceptually, Secure EPT is a subset of
+EPT because not all functionalities are available.
+
+Since the execution of such interface functions takes much longer than
+accessing memory directly, in KVM we use the existing TDP code to mirror the
+Secure EPT for the TD.
And we think there are at least two options today in
+terms of the timing for executing such SEAMCALLs:
+
+1. synchronous, i.e. while walking the TDP page tables, or
+2. post-walk, i.e. record what needs to be done to the real Secure EPT during
+   the walk, and execute the SEAMCALLs later.
+
+Option 1 seems to be more intuitive and simpler, but the Secure EPT
+concurrency rules are different from those of the TDP or EPT.  For example,
+MEM.SEPT.RD acquires shared access to the whole Secure EPT tree of the target
+TD.
+
+Secure EPT(SEPT) operations
+---------------------------
+Secure EPT is an Extended Page Table for GPA-to-HPA translation of TD private
+GPAs.  A Secure EPT is designed to be encrypted with the TD's ephemeral
+private key.  SEPT pages are allocated by the host VMM via Intel TDX
+functions, but their content is intended to be hidden and is not
+architectural.
+
+Unlike the conventional EPT, the CPU can't directly read/write its entries.
+Instead, the TDX SEAMCALL API is used.  Several SEAMCALLs correspond to
+operations on EPT entries.
+
+* TDH.MEM.SEPT.ADD():
+
+  Add a secure EPT page to the secure EPT tree.  This corresponds to updating
+  a non-leaf EPT entry with the present bit set.
+
+* TDH.MEM.SEPT.REMOVE():
+
+  Remove a secure page from the secure EPT tree.  There is no corresponding
+  EPT operation.
+
+* TDH.MEM.SEPT.RD():
+
+  Read a secure EPT entry.  This corresponds to reading an EPT entry as
+  memory.  Please note that this is much slower than direct memory reading.
+
+* TDH.MEM.PAGE.ADD() and TDH.MEM.PAGE.AUG():
+
+  Add a private page to the secure EPT tree.  This corresponds to updating a
+  leaf EPT entry with the present bit set.
+
+* TDH.MEM.PAGE.REMOVE():
+
+  Remove a private page from the secure EPT tree.  There is no corresponding
+  EPT operation.
+
+* TDH.MEM.RANGE.BLOCK():
+
+  This (mostly) corresponds to clearing the present bit of a leaf EPT entry.
+  Note that the private page is still linked in the secure EPT.  To remove it
+  from the secure EPT, TDH.MEM.SEPT.REMOVE() and TDH.MEM.PAGE.REMOVE() need
+  to be called.
+
+* TDH.MEM.TRACK():
+
+  Increment the TLB epoch counter.  This (mostly) corresponds to an EPT TLB
+  flush.  Note that the private page is still linked in the secure EPT.  To
+  remove it from the secure EPT, TDH.MEM.PAGE.REMOVE() needs to be called.
+
+
+Adding private page
+-------------------
+The procedure for populating a private page, illustrated by the sketch
+after this list, looks as follows.
+
+1. TDH.MEM.SEPT.ADD(512G level)
+2. TDH.MEM.SEPT.ADD(1G level)
+3. TDH.MEM.SEPT.ADD(2M level)
+4. TDH.MEM.PAGE.AUG(4K level)
+
+Those operations correspond to updating the EPT entries.
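+
+As a sketch in pseudo-C (the helper names and signatures here are
+hypothetical, not the actual SEAMCALL wrappers of this series):
+
+::
+
+  /* Illustrative only: map one 4K private page at gpa, backed by hpa. */
+  static int tdx_map_private_4k(struct kvm_tdx *kvm_tdx, gpa_t gpa, hpa_t hpa)
+  {
+          int level;
+
+          /* Steps 1.-3.: add any missing non-leaf SEPT pages, top-down. */
+          for (level = PG_LEVEL_512G; level >= PG_LEVEL_2M; level--) {
+                  int err = tdh_mem_sept_add(kvm_tdx, gpa, level,
+                                             alloc_sept_page());
+
+                  if (err)
+                          return err;
+          }
+
+          /* Step 4.: install the 4K leaf entry via TDH.MEM.PAGE.AUG. */
+          return tdh_mem_page_aug(kvm_tdx, gpa, hpa);
+  }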
+
+Dropping private page and TLB shootdown
+---------------------------------------
+The procedure for dropping a private page looks as follows.
+
+1. TDH.MEM.RANGE.BLOCK(4K level)
+
+   This mostly corresponds to clearing the present bit in the EPT entry.  It
+   prevents (or blocks) TLB entries from being created in the future.  Note
+   that the private page is still linked in the secure EPT tree and the
+   existing cache entries in the TLB aren't flushed.
+
+2. TDH.MEM.TRACK(range) and TLB shootdown
+
+   This mostly corresponds to the EPT TLB shootdown.  Because all vcpus share
+   the same Secure EPT, all vcpus need to flush their TLBs.
+
+   * TDH.MEM.TRACK(range) by one vcpu.  It increments the global internal TLB
+     epoch counter.
+   * Send IPIs to remote vcpus.
+   * The other vcpus exit to the VMM from the guest TD and then re-enter with
+     TDH.VP.ENTER().
+   * TDH.VP.ENTER() checks the TLB epoch counter and, if its TLB is old,
+     flushes the TLB.
+
+   Note that only a single vcpu issues TDH.MEM.TRACK().
+
+   Note that the private page is still linked in the secure EPT tree, unlike
+   the conventional EPT.
+
+3. TDH.MEM.PAGE.PROMOTE(), TDH.MEM.PAGE.DEMOTE(), TDH.MEM.PAGE.RELOCATE(), or
+   TDH.MEM.PAGE.REMOVE()
+
+   There is no corresponding operation in the conventional EPT.
+
+   * When changing the page size (e.g. 4K <-> 2M), TDH.MEM.PAGE.PROMOTE() or
+     TDH.MEM.PAGE.DEMOTE() is used.  During those operations, the guest page
+     is kept referenced in the Secure EPT.
+   * When migrating a page, TDH.MEM.PAGE.RELOCATE() is used.  This requires
+     both a source page and a destination page.
+   * When destroying a TD, TDH.MEM.PAGE.REMOVE() removes the private page
+     from the secure EPT tree.  In this case a TLB shootdown is not needed
+     because the vcpus don't run any more.
+
+The basic idea for TDX support
+==============================
+Because the shared EPT is the same as the existing EPT, use the existing logic
+for the shared EPT.  On the other hand, the secure EPT requires additional
+operations instead of directly reading/writing EPT entries.
+
+On an EPT violation, the KVM MMU walks down the EPT tree from the root,
+determines the EPT entry to operate on, and updates the entry.  If necessary,
+a TLB shootdown is done.  Because it's very slow to directly walk the secure
+EPT by the TDX SEAMCALL TDH.MEM.SEPT.RD(), a mirror of the secure EPT is
+created and maintained.  Add hooks to the KVM MMU to reuse the existing code.
+
+EPT violation on shared GPA
+---------------------------
+(1) EPT violation on shared GPA or zapping shared GPA
+    ::
+
+        walk down shared EPT tree (the existing code)
+                |
+                |
+                V
+        shared EPT tree (CPU refers.)
+
+(2) update the EPT entry. (the existing code)
+
+    TLB shootdown in the case of zapping.
+
+
+EPT violation on private GPA
+----------------------------
+(1) EPT violation on private GPA or zapping private GPA
+    ::
+
+        walk down the mirror of secure EPT tree (mostly the same as the existing code)
+                |
+                |
+                V
+        mirror of secure EPT tree (KVM MMU software only. reuse of the existing code)
+
+(2) update the (mirrored) EPT entry.  (mostly the same as the existing code)
+
+(3) call the hooks with what EPT entry is changed
+    ::
+
+                |
+        NEW: hooks in KVM MMU
+                |
+                V
+        secure EPT root(CPU refers)
+
+(4) the TDX backend calls the necessary TDX SEAMCALLs to update the real
+    secure EPT.
+
+The major modification is to add hooks to the TDX backend for the additional
+operations, to pass down which EPT (shared EPT or private EPT) is used, and to
+twist the behavior if we're operating on the private EPT.
+
+The following depicts the relationship.
+::
+
+             KVM                            |      TDX module
+              |                             |          |
+      --------+---------                    |          |
+      |                |                    |          |
+      V                V                    |          |
+  shared GPA      private GPA               |          |
+  CPU shared EPT pointer  KVM private EPT pointer   |  CPU secure EPT pointer
+      |                |                    |          |
+      |                |                    |          |
+      V                V                    |          V
+  shared EPT      private EPT<-------mirror----->Secure EPT
+      |                |                    |          |
+      |                \--------------------+------\   |
+      |                                     |      |   |
+      V                                     |      V   V
+  shared guest page                         |    private guest page
+                                            |
+                                            |
+  non-encrypted memory                      |    encrypted memory
+                                            |
+
+shared EPT: CPU and KVM walk with shared GPA
+            Maintained by the existing code
+private EPT: KVM walks with private GPA
+             Maintained by the twisted existing code
+secure EPT: CPU walks with private GPA.
+            Maintained by the TDX module with TDX SEAMCALLs via hooks
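+
+The hooks mentioned above could take roughly the following shape (the
+member names are illustrative, not necessarily the exact ABI of this
+series):
+
+::
+
+  struct kvm_x86_ops {
+          ...
+          /* Link a non-leaf Secure EPT page (TDH.MEM.SEPT.ADD()). */
+          int (*link_private_spt)(struct kvm *kvm, gfn_t gfn,
+                                  enum pg_level level, void *private_spt);
+          /* Install a leaf entry (TDH.MEM.PAGE.ADD()/TDH.MEM.PAGE.AUG()). */
+          int (*set_private_spte)(struct kvm *kvm, gfn_t gfn,
+                                  enum pg_level level, kvm_pfn_t pfn);
+          /* Zap a leaf entry (TDH.MEM.RANGE.BLOCK(), then removal). */
+          int (*remove_private_spte)(struct kvm *kvm, gfn_t gfn,
+                                     enum pg_level level, kvm_pfn_t pfn);
+          ...
+  };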
+
+
+Tracking private EPT page
+=========================
+Shared EPT pages are managed by struct kvm_mmu_page.  They are linked in a
+list structure.  When necessary, the list is traversed to operate on them.
+Private EPT pages have different characteristics.  For example, private pages
+can't be swapped out.  When shrinking memory, we'd like to traverse only
+shared EPT pages and skip private EPT pages.  Likewise, page migration isn't
+supported for private pages (yet).  Introduce an additional list so that
+shared EPT pages and private EPT pages are tracked independently.
+
+At the beginning of an EPT violation, the fault handler knows the faulting
+GPA, and thus it knows which EPT to operate on, private or shared.  If it's
+the private EPT, an additional task is done, something like
+"if (private) { callback a hook }".  Since the fault handler has deep function
+calls, it's cumbersome to pass around the information of which EPT is being
+operated on.  Options to mitigate this are:
+
+1. Pass the information as an argument to each function call.
+2. Record the information in struct kvm_mmu_page somehow.
+3. Record the information in the vcpu structure.
+
+Option 2 was chosen, because option 1 requires modifying all the functions,
+which would badly affect the normal case, and option 3 doesn't work well
+because in some cases we need to walk both the private and shared EPT.
+
+The role of the EPT page can be utilized, and one bit can be carved out from
+unused bits in struct kvm_mmu_page_role.  When allocating the EPT page,
+initialize the information.  struct kvm_mmu_page is mostly available because
+we're operating on EPT pages.
+
+
+The conversion of private GPA and shared GPA
+============================================
+A page of a given GPA can be assigned to only private GPA xor shared GPA at
+one time.  (It can't be accessed simultaneously via both the private GPA and
+the shared GPA.)  On guest startup, all the GPAs are assigned as private.
+The guest converts a range of GPAs from private (or shared) to shared (or
+private) with the MapGPA hypercall, which takes the start GPA and the size of
+the region.  If the given start GPA is shared, the VMM converts the region
+into shared (if it's already shared, nop).  If the start GPA is private, the
+VMM converts the region into private.  It implies that the guest won't access
+the unmapped private (or shared) region after converting it to shared (or
+private).
+
+If the guest TD triggers an EPT violation on the already-converted region, the
+access won't be allowed (it loops in the EPT violation) until another vcpu
+converts the region back.
+
+KVM MMU records which GPAs are allowed to be accessed, private or shared, in
+an xarray.
+
+
+The original TDP MMU and race condition
+=======================================
+Because vcpus share the EPT, once an EPT entry is zapped, we need a TLB
+shootdown: send IPIs to remote vcpus, and the remote vcpus flush their own
+TLBs.  Until the TLB shootdown is done, vcpus may reference the zapped guest
+page.
+
+The TDP MMU uses the read lock of mmu_lock to mitigate vcpu contention.  When
+the read lock is held, it depends on atomic updates of the EPT entry.  (The
+legacy MMU, on the other hand, uses the write lock.)  When a vcpu is
+populating/zapping an EPT entry with the read lock held, another vcpu may be
+populating or zapping the same EPT entry at the same time.
+
+To avoid this race condition, the entry is frozen: the EPT entry is set to the
+special value REMOVED_SPTE, which clears the present bit.  Then, after the TLB
+shootdown, the EPT entry is updated to the final value.
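+
+Freezing relies on an atomic compare-and-exchange of the EPT entry, roughly
+as in the following sketch (simplified from the TDP MMU logic):
+
+::
+
+  /* Illustrative only: try to freeze an SPTE under the read lock. */
+  static bool try_freeze_spte(u64 *sptep, u64 old_spte)
+  {
+          /*
+           * If another vcpu changed or froze the entry first, the
+           * cmpxchg fails and the page fault is retried.
+           */
+          return try_cmpxchg64(sptep, &old_spte, REMOVED_SPTE);
+  }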
+
+Concurrent zapping
+------------------
+1. read lock
+2. freeze the EPT entry (atomically set the value to REMOVED_SPTE)
+   If another vcpu froze the entry, restart the page fault.
+3. TLB shootdown
+
+   * send IPIs to remote vcpus
+   * TLB flush (local and remote)
+
+   For each entry update, a TLB shootdown is needed because of the
+   concurrency.
+4. atomically set the EPT entry to the final value
+5. read unlock
+
+Concurrent populating
+---------------------
+In the case of populating a non-present EPT entry, atomically update the EPT
+entry.
+
+1. read lock
+
+2. atomically update the EPT entry
+   If another vcpu froze or updated the entry, restart the page fault.
+
+3. read unlock
+
+In the case of updating a present EPT entry (e.g. page migration), the
+operation is split into two: zapping the entry and populating the entry.
+
+1. read lock
+2. zap the EPT entry, following the concurrent zapping case
+3. populate the non-present EPT entry
+4. read unlock
+
+Non-concurrent batched zapping
+------------------------------
+In some cases, zapping ranges is done exclusively with a write lock held.
+In this case, the TLB shootdown is batched into one.
+
+1. write lock
+2. zap the EPT entries by traversing them
+3. TLB shootdown
+4. write unlock
+
+For the Secure EPT, TDX SEAMCALLs are needed in addition to updating the
+mirrored EPT entry.
+
+TDX concurrent zapping
+----------------------
+Add a hook for the TDX SEAMCALLs at the TLB shootdown step.
+
+1. read lock
+2. freeze the EPT entry (set the value to REMOVED_SPTE)
+3. TLB shootdown via a hook
+
+   * TDH.MEM.RANGE.BLOCK()
+   * TDH.MEM.TRACK()
+   * send IPIs to remote vcpus
+
+4. set the EPT entry to the final value
+5. read unlock
+
+TDX concurrent populating
+-------------------------
+TDX SEAMCALLs are required in addition to operating on the mirrored EPT
+entry.  The frozen entry is utilized, following the zapping case, to avoid
+the race condition.  A hook can be added.
+
+1. read lock
+2. freeze the EPT entry
+3. hook
+
+   * TDH.MEM.SEPT.ADD() for non-leaf entries, or TDH.MEM.PAGE.AUG() for leaf
+     entries.
+
+4. set the EPT entry to the final value
+5. read unlock
+
+Without freezing the entry, the following race can happen.  Suppose two vcpus
+are faulting on the same GPA and the 2M and 4K level entries aren't populated
+yet.
+
+* vcpu 1: update the 2M level EPT entry
+* vcpu 2: update the 4K level EPT entry
+* vcpu 2: TDX SEAMCALL to update the 4K secure EPT entry => error
+* vcpu 1: TDX SEAMCALL to update the 2M secure EPT entry
+
+
+TDX non-concurrent batched zapping
+----------------------------------
+For simplicity, the procedure for concurrent populating is utilized.  The
+procedure can be optimized later.
+
+
+Co-existing with unmapping guest private memory
+===============================================
+TODO.  This needs to be addressed.
+
+
+Restrictions or future work
+===========================
+The following features aren't supported yet.
+
+* optimizing non-concurrent zap
+* Large page
+* Page migration
--
2.25.1

From nobody Tue Sep 9 16:53:38 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang
Subject: [PATCH v12 105/106] RFC: KVM: TDX: Make busy with S-EPT on entry bug
Date: Mon, 27 Feb 2023 00:23:44 -0800
Message-Id: <4ae435eb4a0e1cafcd471c7dd7124a4f0289b7e7.1677484918.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

The TDX module has mitigations against zero-step attacks and single-step
attacks.  When the TDX module finds repeated EPT violations on the same
guest RIP, i.e. no advance in the guest, it starts to suspect an attack.
The mitigation logic on the next entry tries to take the lock of the
S-EPT, which may result in an error of TDX_OPERAND_BUSY |
TDX_OPERAND_ID_SEPT.  As KVM shouldn't spuriously zap private S-EPT
entries (so that the guest can make progress), KVM shouldn't cause the
TDX module to trigger the mitigation.  Make (TDX_OPERAND_BUSY |
TDX_OPERAND_ID_SEPT) on entry a KVM bug.
Suggested-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
---
- This patch is RFC because it is only lightly tested and a stress test
  hasn't been done.
---
 arch/x86/kvm/vmx/tdx.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 292f55efe8f7..846cd4255f49 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1712,8 +1712,20 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 {
 	union tdx_exit_reason exit_reason = to_tdx(vcpu)->exit_reason;
 
-	/* See the comment of tdh_sept_seamcall(). */
-	if (unlikely(exit_reason.full == (TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT)))
+	/*
+	 * See the comment of tdh_sept_seamcall().  The TDX module has
+	 * mitigations against zero-step and single-step attacks.  When the
+	 * TDX module finds repeated EPT violations on the same guest RIP,
+	 * i.e. no advance in the guest, it starts to suspect an attack.  The
+	 * mitigation logic on the next entry tries to take the S-EPT lock,
+	 * which may result in an error of (TDX_OPERAND_BUSY |
+	 * TDX_OPERAND_ID_SEPT).  As KVM shouldn't spuriously zap private
+	 * S-EPT entries so that the guest can make progress, KVM shouldn't
+	 * cause the TDX module to trigger the mitigation.  Make this error
+	 * on entry a KVM bug.
+	 */
+	if (KVM_BUG_ON(exit_reason.full == (TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT),
+		       vcpu->kvm))
 		return 1;
 
 	if (unlikely(exit_reason.non_recoverable || exit_reason.error)) {
--
2.25.1

From nobody Tue Sep 9 16:53:38 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini, erdemaktas@google.com, Sean Christopherson, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang
Subject: [PATCH v12 106/106] [MARKER] the end of (the first phase of) TDX KVM patch series
Date: Mon, 27 Feb 2023 00:23:45 -0800
Message-Id: <968095994bedc18c621af2b449900dcf0a7c3dfe.1677484918.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

This empty commit marks the end of (the first phase of) the patch series
of TDX KVM support.

Signed-off-by: Isaku Yamahata
---
 .../virt/kvm/intel-tdx-layer-status.rst | 32 ------------------
 1 file changed, 32 deletions(-)
 delete mode 100644 Documentation/virt/kvm/intel-tdx-layer-status.rst

diff --git a/Documentation/virt/kvm/intel-tdx-layer-status.rst b/Documentation/virt/kvm/intel-tdx-layer-status.rst
deleted file mode 100644
index 010c387ef5cc..000000000000
--- a/Documentation/virt/kvm/intel-tdx-layer-status.rst
+++ /dev/null
@@ -1,32 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-===================================
-Intel Trust Dodmain Extensions(TDX)
-===================================
-
-Layer status
-============
-What qemu can do
-----------------
-- TDX VM TYPE is exposed to Qemu.
-- Qemu can create/destroy guest of TDX vm type.
-- Qemu can create/destroy vcpu of TDX vm type.
-- Qemu can populate initial guest memory image.
-- Qemu can finalize guest TD.
-- Qemu can start to run vcpu. But vcpu can not make progress yet.
-
-Patch Layer status
-------------------
-  Patch layer                          Status
-* TDX, VMX coexistence:                Applied
-* TDX architectural definitions:       Applied
-* TD VM creation/destruction:          Applied
-* TD vcpu creation/destruction:        Applied
-* TDX EPT violation:                   Applied
-* TD finalization:                     Applied
-* TD vcpu enter/exit:                  Applied
-* TD vcpu interrupts/exit/hypercall:   Not yet
-
-* KVM MMU GPA shared bits:             Applied
-* KVM TDP refactoring for TDX:         Applied
-* KVM TDP MMU hooks:                   Applied
--
2.25.1