From nobody Mon Nov 25 03:07:01 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A11E21D0DF6; Wed, 30 Oct 2024 19:00:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314859; cv=none; b=FlsiGc0mcUPvT92K+ZWe7aecgXc1f7eeZ/JfoR1TXv2Gjub4u+vz60wofvvvFAuni7mVSAQALjxXltH6fG0vXkc/iiY7v8mFkMAJNOl8fCF/G1UEaAhC9xS7aOG10a+AZLS+7noWS5oipKLCGMeURbm/YVDqJsdk3d/au9mZMig= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314859; c=relaxed/simple; bh=k+6bY16Z64yOlLYbI2NwSfZnYaym8nehqj2OcKUSP7w=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=EO7qrnWoGmQgBRtSL9l0JVp6IADNP7TpeChpBEpRFVXL6oRtOwezyIlm9KLbSk7IxPQn4hedNpRaY+VtydCZ/Dxue2167Qmu2+uobUbwiGOMIiCgy/fD328FWfO/Q/Qih/sHK5KNGTt+AuBMBBHcxkwamPGoNg8zZVTdXzhnVNY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=QoRMtM2f; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="QoRMtM2f" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1730314857; x=1761850857; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=k+6bY16Z64yOlLYbI2NwSfZnYaym8nehqj2OcKUSP7w=; 
b=QoRMtM2fXXa9QLTaKfZYmBssGSTGm+yK6tHLayW6fiaRc1uGyFElOigX HfnggF3SaQAQZ9BxsszMbla4diD6aRzWg1/sB3JYuOMTRmE+vhARWHAfs PempsA31NW12ZFojMz+5JtyQk3gQC3miJkQDVKWTASp8kR5GONoKXq/VT GGcGGKPyYYtCavbFdlKJyOaP86ImENMc3T6iuwEIMUYIO1nxU4zRgBErt LWpCoiFxPSY2C2Xr3Wa/NTLMOPyl5EPnDN3MR4ORtVdzvrJ7UN7XyyUSC voCKtWelD20IH5ot1DaHo0juYw25U8JBpFU6IT6zigfxAKyupY9LQSB8V g==; X-CSE-ConnectionGUID: zp4Xw3fkTIK8GWKDITxmiw== X-CSE-MsgGUID: JliSkwCQRn6Kon9bAvn5wA== X-IronPort-AV: E=McAfee;i="6700,10204,11241"; a="17678709" X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="17678709" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:00:55 -0700 X-CSE-ConnectionGUID: eWThE8BCSKmoqvZJ8KYh9A== X-CSE-MsgGUID: 50ZX0VBMRJuQWM7C+gcjRA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="82499303" Received: from sramkris-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4..) ([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:00:54 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com Subject: [PATCH v2 01/25] x86/virt/tdx: Share the global metadata structure for KVM to use Date: Wed, 30 Oct 2024 12:00:14 -0700 Message-ID: <20241030190039.77971-2-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com> References: <20241030190039.77971-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kai Huang The TDX host tracks all 
global metadata fields in 'struct tdx_sys_info'. For now they are only used by module initialization and are not shared with other kernel components.

Future changes to support KVM TDX will need to read more global metadata fields, e.g., those in "TD Control Structures" and "TD Configurability". In the longer term, other TDX features like TDX Connect (which supports assigning trusted devices to TDX guests) will also require other kernel components such as pci/vt-d to access global metadata.

To meet all those requirements, the idea is for the TDX host core-kernel to provide a centralized, canonical, and read-only structure for the global metadata that comes from the TDX module, for all kernel components to use.

To achieve "read-only", the ideal way is to annotate the whole structure with __ro_after_init. However, currently all global metadata fields are read by tdx_enable(), which could be called at any time at runtime and thus isn't annotated with __init.

The __ro_after_init annotation can be added eventually, but only after moving VMXON out of KVM to the core-kernel: after that, all metadata can be read during kernel boot (thus __ro_after_init), and the read doesn't necessarily have to happen in tdx_enable().

For now, add a helper function to return a 'const struct tdx_sys_info *' and export it for KVM to use.

Note, KVM doesn't need to access all global metadata for TDX, thus exporting the entire 'struct tdx_sys_info' is overkill. Another option is to export sub-structures on demand, but this would result in more exports. Given that the export is done via a const pointer, other in-kernel TDX users won't be able to write to the global metadata, so simply export all global metadata fields in one function.

The auto-generated 'tdx_global_metadata.h' contains declarations of 'struct tdx_sys_info' and its sub-structures. Move it to arch/x86/include/asm/ and include it in 'asm/tdx.h' to expose those structures.
Include 'tdx_global_metadata.h' inside the '#ifndef __ASSEMBLY__' block, since otherwise there will be a build warning because 'asm/tdx.h' is also included by assembly. Signed-off-by: Kai Huang Signed-off-by: Rick Edgecombe --- uAPI breakout v2: - New patch --- arch/x86/include/asm/tdx.h | 3 ++ .../tdx =3D> include/asm}/tdx_global_metadata.h | 0 arch/x86/virt/vmx/tdx/tdx.c | 28 +++++++++++++++---- arch/x86/virt/vmx/tdx/tdx.h | 1 - 4 files changed, 25 insertions(+), 7 deletions(-) rename arch/x86/{virt/vmx/tdx =3D> include/asm}/tdx_global_metadata.h (100= %) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index eba178996d84..b9758369d82c 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -33,6 +33,7 @@ #ifndef __ASSEMBLY__ =20 #include +#include "tdx_global_metadata.h" =20 /* * Used by the #VE exception handler to gather the #VE exception @@ -116,11 +117,13 @@ static inline u64 sc_retry(sc_func_t func, u64 fn, int tdx_cpu_enable(void); int tdx_enable(void); const char *tdx_dump_mce_info(struct mce *m); +const struct tdx_sys_info *tdx_get_sysinfo(void); #else static inline void tdx_init(void) { } static inline int tdx_cpu_enable(void) { return -ENODEV; } static inline int tdx_enable(void) { return -ENODEV; } static inline const char *tdx_dump_mce_info(struct mce *m) { return NULL; } +static inline const struct tdx_sys_info *tdx_get_sysinfo(void) { return NU= LL; } #endif /* CONFIG_INTEL_TDX_HOST */ =20 #endif /* !__ASSEMBLY__ */ diff --git a/arch/x86/virt/vmx/tdx/tdx_global_metadata.h b/arch/x86/include= /asm/tdx_global_metadata.h similarity index 100% rename from arch/x86/virt/vmx/tdx/tdx_global_metadata.h rename to arch/x86/include/asm/tdx_global_metadata.h diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 6982e100536d..7589c75eaa6c 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -52,6 +52,8 @@ static DEFINE_MUTEX(tdx_module_lock); /* All TDX-usable memory regions. 
Protected by mem_hotplug_lock. */ static LIST_HEAD(tdx_memlist); =20 +static struct tdx_sys_info tdx_sysinfo; + typedef void (*sc_err_func_t)(u64 fn, u64 err, struct tdx_module_args *arg= s); =20 static inline void seamcall_err(u64 fn, u64 err, struct tdx_module_args *a= rgs) @@ -1132,15 +1134,14 @@ static int init_tdmrs(struct tdmr_info_list *tdmr_l= ist) =20 static int init_tdx_module(void) { - struct tdx_sys_info sysinfo; int ret; =20 - ret =3D init_tdx_sys_info(&sysinfo); + ret =3D init_tdx_sys_info(&tdx_sysinfo); if (ret) return ret; =20 /* Check whether the kernel can support this module */ - ret =3D check_features(&sysinfo); + ret =3D check_features(&tdx_sysinfo); if (ret) return ret; =20 @@ -1161,13 +1162,14 @@ static int init_tdx_module(void) goto out_put_tdxmem; =20 /* Allocate enough space for constructing TDMRs */ - ret =3D alloc_tdmr_list(&tdx_tdmr_list, &sysinfo.tdmr); + ret =3D alloc_tdmr_list(&tdx_tdmr_list, &tdx_sysinfo.tdmr); if (ret) goto err_free_tdxmem; =20 /* Cover all TDX-usable memory regions in TDMRs */ - ret =3D construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &sysinfo.tdmr, - &sysinfo.cmr); + ret =3D construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, + &tdx_sysinfo.tdmr, &tdx_sysinfo.cmr); + if (ret) goto err_free_tdmrs; =20 @@ -1529,3 +1531,17 @@ void __init tdx_init(void) =20 check_tdx_erratum(); } + +const struct tdx_sys_info *tdx_get_sysinfo(void) +{ + const struct tdx_sys_info *p =3D NULL; + + /* Make sure all fields in @tdx_sysinfo have been populated */ + mutex_lock(&tdx_module_lock); + if (tdx_module_status =3D=3D TDX_MODULE_INITIALIZED) + p =3D (const struct tdx_sys_info *)&tdx_sysinfo; + mutex_unlock(&tdx_module_lock); + + return p; +} +EXPORT_SYMBOL_GPL(tdx_get_sysinfo); diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h index c8be00f6b15a..9b708a8fb568 100644 --- a/arch/x86/virt/vmx/tdx/tdx.h +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -6,7 +6,6 @@ #include #include #include -#include "tdx_global_metadata.h" =20 /* * This 
file contains both macros and data structures defined by the TDX --=20 2.47.0 From nobody Mon Nov 25 03:07:01 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C3A221E8850; Wed, 30 Oct 2024 19:00:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314859; cv=none; b=fRCZlBQwnE5Ub36zuDnGY5Af8rY9c0XRp9mOMaIXFMiy108EzCLR5Yx5DRlR0sAZ3frSie/EOcnCoMyETmsPPH///HaXFbLAAq69o/Eeo6U/wkItKpJ5MlXqRZemJNEvRDF6V/J9t+uNvh7t+m9Ke0u2KtXkVBDcIzve2BGFj8g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314859; c=relaxed/simple; bh=jl2tybTGxkzwlloval+G94gP9hQCDDzMKi+ZVsdOKT8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=fjhWtKVotrcU6GG2EXbYRJKcSWHxFcSzL9ELLHrUwQp4RfvExbRonbngbMhFONQ23esXpaxyu+vw3K30o7uywuqg4OKvKqGrdXeFstwpBkk9PJoH3rGOyllH5N8Dhk8UUmwiCbZpKUWLeUasBv027DnTdBmy4C5lwStH0d2iujs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=lKQVNaNs; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="lKQVNaNs" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1730314858; x=1761850858; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; 
bh=jl2tybTGxkzwlloval+G94gP9hQCDDzMKi+ZVsdOKT8=; b=lKQVNaNsqQghDl+/lSOo6zNU3a0Hx/i9SVIBiQvdQPBNSwHRRsdeNhQe DTfEvB4ApprkZkbQXr/upg3NF4UP76skaxzxggxZwd029AFEe0GjciEbd Q2pLNNEO4e8g7cvbITD3F/BFhGb56DvGHamv+23zE2/yvc0y3jDwKa29t HVULiddJCVlhFCRIlZyMk+WPcGN2jjSNKtS4+NqAk85iZZyTL4l5sIIHr 5JrREff8VY5Yh4xR84go/WYuS1jGGbyvThhtXl2dACLbpKPAit1KwXPBl q55TFSIKrWS9IWBKv7YKHGqul5/TTwK58KKL/8Jh25PyiC/v2hsqn+k7M g==; X-CSE-ConnectionGUID: jtlRHt7jSmezJ3OF3WbxQg== X-CSE-MsgGUID: 6Tu0h+vKTtSos1GGUK926Q== X-IronPort-AV: E=McAfee;i="6700,10204,11241"; a="17678714" X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="17678714" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:00:55 -0700 X-CSE-ConnectionGUID: /2YVidyFRmmqSbT3+hjlIQ== X-CSE-MsgGUID: nEM97a+7R8CQgU8SC6kQDw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="82499310" Received: from sramkris-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4..) 
([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:00:55 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com Subject: [PATCH v2 02/25] KVM: TDX: Get TDX global information Date: Wed, 30 Oct 2024 12:00:15 -0700 Message-ID: <20241030190039.77971-3-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com> References: <20241030190039.77971-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kai Huang KVM will need to consult some essential TDX global information to create and run TDX guests. Get the global information after initializing TDX. 
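For readers following the series, the "publish only after initialization" pattern that tdx_get_sysinfo() relies on can be sketched in userspace C. This is an illustration under stated assumptions, not the kernel code: the demo_* names and the single field are hypothetical, and a pthread mutex stands in for tdx_module_lock. The getter returns NULL until initialization has completed, so a caller has to treat NULL as an error.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for 'struct tdx_sys_info'; the field is illustrative. */
struct demo_sys_info {
	unsigned short max_vcpus_per_td;
};

enum demo_status { DEMO_UNINITIALIZED, DEMO_INITIALIZED };

/* Stand-ins for tdx_module_lock, tdx_module_status and tdx_sysinfo. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static enum demo_status demo_state = DEMO_UNINITIALIZED;
static struct demo_sys_info demo_sysinfo;

/* Populate the structure once, then mark it as initialized. */
static void demo_init(unsigned short max_vcpus)
{
	pthread_mutex_lock(&demo_lock);
	assert(demo_state == DEMO_UNINITIALIZED);
	demo_sysinfo.max_vcpus_per_td = max_vcpus;
	demo_state = DEMO_INITIALIZED;
	pthread_mutex_unlock(&demo_lock);
}

/* Return a read-only view, or NULL if initialization has not completed. */
static const struct demo_sys_info *demo_get_sysinfo(void)
{
	const struct demo_sys_info *p = NULL;

	pthread_mutex_lock(&demo_lock);
	if (demo_state == DEMO_INITIALIZED)
		p = &demo_sysinfo;
	pthread_mutex_unlock(&demo_lock);

	return p;
}
```

A caller that runs before demo_init() gets NULL back and should bail out, which mirrors the WARN_ON_ONCE(!tdx_sysinfo) check this patch adds to __tdx_bringup().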
Signed-off-by: Kai Huang Signed-off-by: Rick Edgecombe --- uAPI breakout v2: - New patch --- arch/x86/kvm/vmx/tdx.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 8651599822d5..f95a4dbcaf4a 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -12,6 +12,8 @@ module_param_named(tdx, enable_tdx, bool, 0444); =20 static enum cpuhp_state tdx_cpuhp_state; =20 +static const struct tdx_sys_info *tdx_sysinfo; + static int tdx_online_cpu(unsigned int cpu) { unsigned long flags; @@ -91,11 +93,20 @@ static int __init __tdx_bringup(void) if (r) goto tdx_bringup_err; =20 + /* Get TDX global information for later use */ + tdx_sysinfo =3D tdx_get_sysinfo(); + if (WARN_ON_ONCE(!tdx_sysinfo)) { + r =3D -EINVAL; + goto get_sysinfo_err; + } + /* * Leave hardware virtualization enabled after TDX is enabled * successfully. TDX CPU hotplug depends on this. */ return 0; +get_sysinfo_err: + __do_tdx_cleanup(); tdx_bringup_err: kvm_disable_virtualization(); return r; --=20 2.47.0 From nobody Mon Nov 25 03:07:01 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5534B1F4738; Wed, 30 Oct 2024 19:00:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314860; cv=none; b=Fqknj3u32TTen+lQdCziN3uqwduk6xH1tZ7G9/9VQGtLxrlTsrkk4dV8RYFJ/yYrjDQNWr2PBRvwE62rfYgpXuANa07F0BN2o9HgYUZLdujpSldBygMX9p+O0Fa/aD0enwbJscxp0+puEAhUzoAVNyN2WLSlx1CXFHBt9vPa4Zk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314860; c=relaxed/simple; bh=acNVo783QACSFNE9UVnReuXtxBaw25XkhCJyiieeobw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=TApTUd0lIJ9P72NkPRFPpVONnApP/BhGUaNmufbXMsQ+JTpUw0hyjo2MxkazWu+TEdXab9PWYb4do0vwcpJxL/sJRnRLRSGuoQUtvgz/2a/ufNm2fJOlA6rbBKeG5lsfAikYSR169e4VyZYYCsK63NQinUW3gwZDWkJgeP9JwUM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=IPXnq3Sr; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="IPXnq3Sr" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1730314858; x=1761850858; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=acNVo783QACSFNE9UVnReuXtxBaw25XkhCJyiieeobw=; b=IPXnq3Sr+83LR8fI7cEe20rYNZ712mBKTn0qBgJqxtTqBz+Ao/zDWuk/ D7OU+066vVPlkQ54ZJ5Eb4ZAitXR8nUWSJ2u3BvDmoeYS8NB41lAmxB04 u+hzTOrOGNXU7o0A8so1GQnuOwdNL/xCsUjCqiDiftmdUFdobUwJNytL6 psVdts5BzKczs3JRFeNGrjA/uZ+T7jxDNuKt5nByXTbwSQ/o1IMoKbwHd /L2Y6wOFOj9qIiddkotNMQ2wv5DWJwxF1rUz12+AB8ISjQkKAlar4xtGY ucELqQgQGu4OiduO6YBviIz2DXUxad7Z3r3KrjzG6VPaF1GcTEX1gtfJB A==; X-CSE-ConnectionGUID: mBXb2JsZTUisZYvXNLuIdQ== X-CSE-MsgGUID: nqFcW63qQd+auDk/Q6xQHw== X-IronPort-AV: E=McAfee;i="6700,10204,11241"; a="17678719" X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="17678719" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:00:56 -0700 X-CSE-ConnectionGUID: T//1YEvYSIC4gEiJMnxUTQ== X-CSE-MsgGUID: w1jPkm1uQai1hT9JrUu75Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="82499319" Received: from sramkris-mobl1.amr.corp.intel.com (HELO 
rpedgeco-desk4..) ([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:00:55 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com Subject: [PATCH v2 03/25] x86/virt/tdx: Read essential global metadata for KVM Date: Wed, 30 Oct 2024 12:00:16 -0700 Message-ID: <20241030190039.77971-4-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com> References: <20241030190039.77971-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kai Huang

KVM needs two classes of global metadata to create and run TDX guests:

 - "TD Control Structures"
 - "TD Configurability"

The first class contains the sizes of TDX guest per-VM and per-vCPU control structures. KVM will need to use them to allocate enough space for those control structures.

The second class reports things such as which features are configurable for a TDX guest. KVM will need to use them to properly configure TDX guests.

Read them for KVM TDX to use.

The code change is auto-generated by re-running the script in [1] after uncommenting the "td_conf" and "td_ctrl" parts to regenerate tdx_global_metadata.{h,c}, and then updating the existing copies in the kernel with the result.

 #python tdx.py global_metadata.json tdx_global_metadata.h \
	tdx_global_metadata.c

The 'global_metadata.json' can be fetched from [2].
Link: https://lore.kernel.org/kvm/0853b155ec9aac09c594caa60914ed6ea4dc0a71.= camel@intel.com/ [1] Link: https://cdrdv2.intel.com/v1/dl/getContent/795381 [2] Signed-off-by: Kai Huang Signed-off-by: Rick Edgecombe --- uAPI breakout v2: - New patch --- arch/x86/include/asm/tdx_global_metadata.h | 19 +++++++++ arch/x86/virt/vmx/tdx/tdx_global_metadata.c | 46 +++++++++++++++++++++ 2 files changed, 65 insertions(+) diff --git a/arch/x86/include/asm/tdx_global_metadata.h b/arch/x86/include/= asm/tdx_global_metadata.h index fde370b855f1..206090c9952f 100644 --- a/arch/x86/include/asm/tdx_global_metadata.h +++ b/arch/x86/include/asm/tdx_global_metadata.h @@ -32,11 +32,30 @@ struct tdx_sys_info_cmr { u64 cmr_size[32]; }; =20 +struct tdx_sys_info_td_ctrl { + u16 tdr_base_size; + u16 tdcs_base_size; + u16 tdvps_base_size; +}; + +struct tdx_sys_info_td_conf { + u64 attributes_fixed0; + u64 attributes_fixed1; + u64 xfam_fixed0; + u64 xfam_fixed1; + u16 num_cpuid_config; + u16 max_vcpus_per_td; + u64 cpuid_config_leaves[32]; + u64 cpuid_config_values[32][2]; +}; + struct tdx_sys_info { struct tdx_sys_info_version version; struct tdx_sys_info_features features; struct tdx_sys_info_tdmr tdmr; struct tdx_sys_info_cmr cmr; + struct tdx_sys_info_td_ctrl td_ctrl; + struct tdx_sys_info_td_conf td_conf; }; =20 #endif diff --git a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c b/arch/x86/virt/vm= x/tdx/tdx_global_metadata.c index 2fe57e084453..44c2b3e079de 100644 --- a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c +++ b/arch/x86/virt/vmx/tdx/tdx_global_metadata.c @@ -76,6 +76,50 @@ static int get_tdx_sys_info_cmr(struct tdx_sys_info_cmr = *sysinfo_cmr) return ret; } =20 +static int get_tdx_sys_info_td_ctrl(struct tdx_sys_info_td_ctrl *sysinfo_t= d_ctrl) +{ + int ret =3D 0; + u64 val; + + if (!ret && !(ret =3D read_sys_metadata_field(0x9800000100000000, &val))) + sysinfo_td_ctrl->tdr_base_size =3D val; + if (!ret && !(ret =3D read_sys_metadata_field(0x9800000100000100, &val))) + 
sysinfo_td_ctrl->tdcs_base_size =3D val; + if (!ret && !(ret =3D read_sys_metadata_field(0x9800000100000200, &val))) + sysinfo_td_ctrl->tdvps_base_size =3D val; + + return ret; +} + +static int get_tdx_sys_info_td_conf(struct tdx_sys_info_td_conf *sysinfo_t= d_conf) +{ + int ret =3D 0; + u64 val; + int i, j; + + if (!ret && !(ret =3D read_sys_metadata_field(0x1900000300000000, &val))) + sysinfo_td_conf->attributes_fixed0 =3D val; + if (!ret && !(ret =3D read_sys_metadata_field(0x1900000300000001, &val))) + sysinfo_td_conf->attributes_fixed1 =3D val; + if (!ret && !(ret =3D read_sys_metadata_field(0x1900000300000002, &val))) + sysinfo_td_conf->xfam_fixed0 =3D val; + if (!ret && !(ret =3D read_sys_metadata_field(0x1900000300000003, &val))) + sysinfo_td_conf->xfam_fixed1 =3D val; + if (!ret && !(ret =3D read_sys_metadata_field(0x9900000100000004, &val))) + sysinfo_td_conf->num_cpuid_config =3D val; + if (!ret && !(ret =3D read_sys_metadata_field(0x9900000100000008, &val))) + sysinfo_td_conf->max_vcpus_per_td =3D val; + for (i =3D 0; i < sysinfo_td_conf->num_cpuid_config; i++) + if (!ret && !(ret =3D read_sys_metadata_field(0x9900000300000400 + i, &v= al))) + sysinfo_td_conf->cpuid_config_leaves[i] =3D val; + for (i =3D 0; i < sysinfo_td_conf->num_cpuid_config; i++) + for (j =3D 0; j < 2; j++) + if (!ret && !(ret =3D read_sys_metadata_field(0x9900000300000500 + i * = 2 + j, &val))) + sysinfo_td_conf->cpuid_config_values[i][j] =3D val; + + return ret; +} + static int get_tdx_sys_info(struct tdx_sys_info *sysinfo) { int ret =3D 0; @@ -84,6 +128,8 @@ static int get_tdx_sys_info(struct tdx_sys_info *sysinfo) ret =3D ret ?: get_tdx_sys_info_features(&sysinfo->features); ret =3D ret ?: get_tdx_sys_info_tdmr(&sysinfo->tdmr); ret =3D ret ?: get_tdx_sys_info_cmr(&sysinfo->cmr); + ret =3D ret ?: get_tdx_sys_info_td_ctrl(&sysinfo->td_ctrl); + ret =3D ret ?: get_tdx_sys_info_td_conf(&sysinfo->td_conf); =20 return ret; } --=20 2.47.0 From nobody Mon Nov 25 03:07:01 2024 Received: 
from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 702FA213EE0; Wed, 30 Oct 2024 19:00:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314862; cv=none; b=DaVW+HpMC6TUU4l1EeY4hkCcM+SH8cQ5hcz1lSBi7QnKWXTqy3QcdAimyjorN0Xkhkx54D6nf0L4wnpp+KTfuse8UzNkwPWd3nQ+dwuYv5czMrSKo4CrDUYIZwIl2au+Ltctuh4tErg10vC1tDyDh1tduvJWoSXSa35CQq5KSTA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314862; c=relaxed/simple; bh=r4DqlT6kr0U8khXg231vkLTSX6v5nFIoCA9SrxE9kFg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=p77VfcHYwQ6VV2ElIeuzP70aWatWARUrZ76IIQu9HijKtr118eLt4PsbC984TENkp80Yc7MPQ4XYlYV+mmA5DDWeGUv7xBN9XCXQr+FDdMYjY7MSaIp0pna3VGdBTvA/NqsWV1Eml3O6ZM66JCd550OlxRMxyVMn248gxQ/Y9ZQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=AwVRXfiF; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="AwVRXfiF" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1730314859; x=1761850859; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=r4DqlT6kr0U8khXg231vkLTSX6v5nFIoCA9SrxE9kFg=; b=AwVRXfiFV6hX3m+3LF7zJnNfLWGfTON9ltYPpnxP88tguxjy5P+9HSDi 
7zZe99t4FxLIZraVi/zEhSXufX/zzaWuY1uhm1M6rly/s28TjPoHmJTna s96wmtrJLB3Pay0UCP0McsaMWqym24J89Uz3Bd5C2gJwAyVfMFPnjjlY6 OowYy2xnl7nX/1ZcoGK2WlK9NkR0osoJM5qkx+gmR5sZY1PEM/onC5zIj U8H7nbRJ79/T5DToeqGsgodnqVQZdBoB3ReBupxThakaR+0zEMSYbT+Js KIxESzzUYNVZYp8tIYY3acnOMFE1qcrtNclvWtbllgU+JGnAw+0pdWuJT w==; X-CSE-ConnectionGUID: vZbp1NL6TqywD4gW2QtCoQ== X-CSE-MsgGUID: pf0hGTkDTrGEXGusvPHLkA== X-IronPort-AV: E=McAfee;i="6700,10204,11241"; a="17678724" X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="17678724" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:00:56 -0700 X-CSE-ConnectionGUID: PlJXjUXCTtiMXcl0KHZs0Q== X-CSE-MsgGUID: I/VwUYewReu05n+gluy0sg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="82499327" Received: from sramkris-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4..) ([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:00:56 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com, Isaku Yamahata Subject: [PATCH v2 04/25] x86/virt/tdx: Add tdx_guest_keyid_alloc/free() to alloc and free TDX guest KeyID Date: Wed, 30 Oct 2024 12:00:17 -0700 Message-ID: <20241030190039.77971-5-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com> References: <20241030190039.77971-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Intel TDX protects guest VMs from malicious 
host and certain physical attacks. Pre-TDX Intel hardware has support for a memory encryption architecture called MK-TME, which repurposes several high bits of physical address as "KeyID". The BIOS reserves a sub-range of MK-TME KeyIDs as "TDX private KeyIDs".

Each TDX guest must be assigned a unique TDX KeyID when it is created. The kernel reserves the first TDX private KeyID for crypto-protection of specific TDX module data which has a lifecycle that exceeds the KeyID reserved for the TD's use. The rest of the KeyIDs are left for TDX guests to use.

Create a small KeyID allocator. Export tdx_guest_keyid_alloc()/tdx_guest_keyid_free() to allocate and free TDX guest KeyIDs for KVM to use.

Don't provide the stub functions when CONFIG_INTEL_TDX_HOST=3Dn since they are not supposed to be called in this case.

Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
---
uAPI breakout v2:
 - Move code from KVM to x86 core, and export them.
 - Update log.

uAPI breakout v1:
 - Update the commit message
 - Delete stale comment on global hkid
 - Deleted WARN_ON_ONCE() as it didn't seem very useful

v19:
 - Removed stale comment in tdx_guest_keyid_alloc() by Binbin
 - Update sanity check in tdx_guest_keyid_free() by Binbin

v18:
 - Moved the functions to kvm tdx from arch/x86/virt/vmx/tdx/
 - Drop exporting symbols as the host tdx does.
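As a reading aid for the allocator described above, here is a minimal userspace sketch of allocating IDs from an inclusive range. It is not the kernel implementation, which uses an IDA via ida_alloc_range() over [tdx_guest_keyid_start, tdx_guest_keyid_start + tdx_nr_guest_keyids - 1]. The demo_* names, the range constants, and the fixed bool array are assumptions made for illustration, and handing out the lowest free ID is a choice of this sketch rather than something the patch states.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical values; the kernel derives the real range at boot. */
#define DEMO_KEYID_START	33u	/* first KeyID available to guests */
#define DEMO_NR_KEYIDS		4u	/* number of guest KeyIDs */

static bool demo_keyid_used[DEMO_NR_KEYIDS];

/* Return the lowest free KeyID in the guest range, or -ENOSPC if none is left. */
static int demo_keyid_alloc(void)
{
	for (unsigned int i = 0; i < DEMO_NR_KEYIDS; i++) {
		if (!demo_keyid_used[i]) {
			demo_keyid_used[i] = true;
			return (int)(DEMO_KEYID_START + i);
		}
	}
	return -ENOSPC;
}

/* Release a KeyID previously returned by demo_keyid_alloc(). */
static void demo_keyid_free(unsigned int keyid)
{
	assert(keyid >= DEMO_KEYID_START &&
	       keyid < DEMO_KEYID_START + DEMO_NR_KEYIDS);
	assert(demo_keyid_used[keyid - DEMO_KEYID_START]);
	demo_keyid_used[keyid - DEMO_KEYID_START] = false;
}
```

The upper bound is inclusive in the kernel call, which is why the patch passes start + nr - 1; the sketch expresses the same range as an array of DEMO_NR_KEYIDS slots. A negative return value signals exhaustion, so a caller has to check the sign before using the result as a KeyID.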
--- arch/x86/include/asm/tdx.h | 3 +++ arch/x86/virt/vmx/tdx/tdx.c | 17 +++++++++++++++++ 2 files changed, 20 insertions(+) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index b9758369d82c..d33e46d53d59 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -118,6 +118,9 @@ int tdx_cpu_enable(void); int tdx_enable(void); const char *tdx_dump_mce_info(struct mce *m); const struct tdx_sys_info *tdx_get_sysinfo(void); + +int tdx_guest_keyid_alloc(void); +void tdx_guest_keyid_free(unsigned int keyid); #else static inline void tdx_init(void) { } static inline int tdx_cpu_enable(void) { return -ENODEV; } diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 7589c75eaa6c..b883c1a4b002 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include #include @@ -42,6 +43,8 @@ static u32 tdx_global_keyid __ro_after_init; static u32 tdx_guest_keyid_start __ro_after_init; static u32 tdx_nr_guest_keyids __ro_after_init; =20 +static DEFINE_IDA(tdx_guest_keyid_pool); + static DEFINE_PER_CPU(bool, tdx_lp_initialized); =20 static struct tdmr_info_list tdx_tdmr_list; @@ -1545,3 +1548,17 @@ const struct tdx_sys_info *tdx_get_sysinfo(void) return p; } EXPORT_SYMBOL_GPL(tdx_get_sysinfo); + +int tdx_guest_keyid_alloc(void) +{ + return ida_alloc_range(&tdx_guest_keyid_pool, tdx_guest_keyid_start, + tdx_guest_keyid_start + tdx_nr_guest_keyids - 1, + GFP_KERNEL); +} +EXPORT_SYMBOL_GPL(tdx_guest_keyid_alloc); + +void tdx_guest_keyid_free(unsigned int keyid) +{ + ida_free(&tdx_guest_keyid_pool, keyid); +} +EXPORT_SYMBOL_GPL(tdx_guest_keyid_free); --=20 2.47.0 From nobody Mon Nov 25 03:07:01 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EE9DD2144C9; Wed, 30 Oct 
2024 19:00:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314862; cv=none; b=QgSDU3lVNPZ48o3ywlO6WJIzyo4ODw/mW5ywjbq6vzb/9QKyx3EA2Kuq/reGe2Q6pq48mSxXzhnPTSTIX8y6VB1Iv61V/BZ/rSN5PBa+yKJwo6Fb163jpLFo42xfn8ns+R2Ucc+6JreMQVchCRGbY/RmzOmJxRdYEVisQCRE5y8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314862; c=relaxed/simple; bh=CiSo38Rz8D/47DL/LqhUW7zxwMDCBDtJIrYv9FERmIY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=L6bd0Q3y2lg9cmCnkbRrW44zFQPrvEKUWhLFVwEmjYMoiZIQhE/4XNSC+JY9yZb+KJpxo4QgboWGzMzSvLXlApQssXm3+diSLtY8VmDHaWVN1s8KPwb+QwOYMu9JOtmzMtrSWXzgE75mAwTNAOa5fzMyc7uXG7Cg9cMmwvgbgeA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=l24PXbhi; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="l24PXbhi" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1730314860; x=1761850860; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=CiSo38Rz8D/47DL/LqhUW7zxwMDCBDtJIrYv9FERmIY=; b=l24PXbhiU7zIrxThO5rmp4aQR1QbugDuHExx2XDZF1xw7K3YeaWSQu3Z PPPU5DBW7cLq4mWRijSjMH3c1Aj0pqSJn8lVjlUAuQs4Y7+691wB22tl4 6D/YBev8GKfUqMb1GVPz5ogSW5/jmat2BzacKbI6LZJ1tBl3VMBm2Mngq LYdyFlhNM0Sw/7txcV6EpOlUnsuXhQQR4ICCEfTwodAjlgmkEMXiDnZ8e HB/kdeQviia75G0dDGGEFeQUman82eM5iR0REWlm2DJK7U/BmDxhID22N 
2GibhpkMUjKsgyqF5qLqp9xqBpj359URtQWVPp0IxqNodWM943SuRQqIU w==; X-CSE-ConnectionGUID: oxQ+PN7HRUOrgqGZWhTyTQ== X-CSE-MsgGUID: h2k15WMdRIWJp/IPXAOfwA== X-IronPort-AV: E=McAfee;i="6700,10204,11241"; a="17678732" X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="17678732" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:00:57 -0700 X-CSE-ConnectionGUID: 7spFaC6LQ2GIQ3AxxXTuWA== X-CSE-MsgGUID: C4WTCg4RQcKMTbXdbrI9TA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="82499335" Received: from sramkris-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4..) ([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:00:56 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com, Isaku Yamahata , Sean Christopherson , Binbin Wu , Yuan Yao Subject: [PATCH v2 05/25] x86/virt/tdx: Add SEAMCALL wrappers for TDX KeyID management Date: Wed, 30 Oct 2024 12:00:18 -0700 Message-ID: <20241030190039.77971-6-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com> References: <20241030190039.77971-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Intel TDX protects guest VMs from malicious host and certain physical attacks. Pre-TDX Intel hardware has support for a memory encryption architecture called MK-TME, which repurposes several high bits of physical address as "KeyID". 
TDX reserves a sub-range of MK-TME KeyIDs as "TDX private KeyIDs".

Like MK-TME, these KeyIDs can be associated with an ephemeral key. For
TDX this association is done by the TDX module. It also has its own
tracking for which KeyIDs are in use. To do this ephemeral key setup and
manipulate the TDX module's internal tracking, KVM will use the following
SEAMCALLs:
 TDH.MNG.KEY.CONFIG: Mark the KeyID as in use, and initialize its
                     ephemeral key.
 TDH.MNG.KEY.FREEID: Mark the KeyID as not in use.

These SEAMCALLs both operate on TDR structures, which are set up using
the TDH.MNG.CREATE SEAMCALL (added in a later patch). KVM's use of these
operations will go like:
 - tdx_guest_keyid_alloc()
 - Initialize TD and TDR page with TDH.MNG.CREATE (not yet added),
   passing KeyID
 - TDH.MNG.KEY.CONFIG to initialize the key
 - TD runs, teardown is started
 - TDH.MNG.KEY.FREEID
 - tdx_guest_keyid_free()

Don't try to combine the tdx_guest_keyid_alloc() and TDH.MNG.KEY.CONFIG
operations because TDH.MNG.CREATE and some locking need to be done in the
middle. Don't combine TDH.MNG.KEY.FREEID and tdx_guest_keyid_free()
either, so that teardown stays symmetrical with the creation path.

So implement tdh_mng_key_config() and tdh_mng_key_freeid() as separate
functions from tdx_guest_keyid_alloc() and tdx_guest_keyid_free().

Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
Reviewed-by: Binbin Wu
Reviewed-by: Yuan Yao
Acked-by: Dave Hansen
---
uAPI breakout v2:
 - Change to use 'u64' as function parameter to prepare to move
   SEAMCALL wrappers to arch/x86.
(Kai) - Split to separate patch - Move SEAMCALL wrappers from KVM to x86 core; - Move TDH_xx macros from KVM to x86 core; - Re-write log uAPI breakout v1: - Make argument to C wrapper function struct kvm_tdx * or struct vcpu_tdx * .(Sean) - Drop unused helpers (Kai) - Fix bisectability issues in headers (Kai) - Updates from seamcall overhaul (Kai) v19: - Update the commit message to match the patch by Yuan - Use seamcall() and seamcall_ret() by paolo v18: - removed stub functions for __seamcall{,_ret}() - Added Reviewed-by Binbin - Make tdx_seamcall() use struct tdx_module_args instead of taking each inputs. v16: - use struct tdx_module_args instead of struct tdx_module_output - Add tdh_mem_sept_rd() for SEPT_VE_DISABLE=3D1. --- arch/x86/include/asm/tdx.h | 4 ++++ arch/x86/virt/vmx/tdx/tdx.c | 20 ++++++++++++++++++++ arch/x86/virt/vmx/tdx/tdx.h | 16 +++++++++------- 3 files changed, 33 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index d33e46d53d59..9897335a8e2f 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -121,6 +121,10 @@ const struct tdx_sys_info *tdx_get_sysinfo(void); =20 int tdx_guest_keyid_alloc(void); void tdx_guest_keyid_free(unsigned int keyid); + +/* SEAMCALL wrappers for creating/destroying/running TDX guests */ +u64 tdh_mng_key_config(u64 tdr); +u64 tdh_mng_key_freeid(u64 tdr); #else static inline void tdx_init(void) { } static inline int tdx_cpu_enable(void) { return -ENODEV; } diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index b883c1a4b002..c42eab8cc069 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -1562,3 +1562,23 @@ void tdx_guest_keyid_free(unsigned int keyid) ida_free(&tdx_guest_keyid_pool, keyid); } EXPORT_SYMBOL_GPL(tdx_guest_keyid_free); + +u64 tdh_mng_key_config(u64 tdr) +{ + struct tdx_module_args args =3D { + .rcx =3D tdr, + }; + + return seamcall(TDH_MNG_KEY_CONFIG, &args); +} 
+EXPORT_SYMBOL_GPL(tdh_mng_key_config); + +u64 tdh_mng_key_freeid(u64 tdr) +{ + struct tdx_module_args args =3D { + .rcx =3D tdr, + }; + + return seamcall(TDH_MNG_KEY_FREEID, &args); +} +EXPORT_SYMBOL_GPL(tdh_mng_key_freeid); diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h index 9b708a8fb568..95002e7ff4c5 100644 --- a/arch/x86/virt/vmx/tdx/tdx.h +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -17,13 +17,15 @@ /* * TDX module SEAMCALL leaf functions */ -#define TDH_PHYMEM_PAGE_RDMD 24 -#define TDH_SYS_KEY_CONFIG 31 -#define TDH_SYS_INIT 33 -#define TDH_SYS_RD 34 -#define TDH_SYS_LP_INIT 35 -#define TDH_SYS_TDMR_INIT 36 -#define TDH_SYS_CONFIG 45 +#define TDH_MNG_KEY_CONFIG 8 +#define TDH_MNG_KEY_FREEID 20 +#define TDH_PHYMEM_PAGE_RDMD 24 +#define TDH_SYS_KEY_CONFIG 31 +#define TDH_SYS_INIT 33 +#define TDH_SYS_RD 34 +#define TDH_SYS_LP_INIT 35 +#define TDH_SYS_TDMR_INIT 36 +#define TDH_SYS_CONFIG 45 =20 /* TDX page types */ #define PT_NDA 0x0 --=20 2.47.0 From nobody Mon Nov 25 03:07:01 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0B1E7205AC7; Wed, 30 Oct 2024 19:01:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314865; cv=none; b=DtntZqHyB9jlFbMLFBbG2AHmRR+naqgJk4Y+2Tkvj8RkriJw3lZWQ3v9P9W4J/+SYWvcPzYFhc8MkPbuHzYz3Q0j7GHGPQcSi0tbt2Iag6ZWWgJhpI4iCPJ2QftwaXQJjAqOsMBbaKpMKyo1AWg9XyPch8xlgEq26F8Z+Wp5048= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314865; c=relaxed/simple; bh=6ROfy2DDXp7OY67erwalEx4mAjPzBIrJaF6l96Cx5xA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=lHx6Zn/VptgIAzi1OEouNn2lQzMh8w7P8wO0GcEahJGNAIi7PZT8GNI/7f6lKyYGbWVm/vzJHYX07rX8uQ76g+xG2PSmGUA9VnyHsXNTsiLbnuEn1ZJxv36oIS4B2cJTkg2LZSlqE+s4YN+wQGHZrlyW65kkQKPo8KhgjFx/qBg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=aIICdekM; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="aIICdekM" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1730314861; x=1761850861; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=6ROfy2DDXp7OY67erwalEx4mAjPzBIrJaF6l96Cx5xA=; b=aIICdekM7RUKXNIrsWHoWDKhRacSTGrJrn62GP/x4i5wdOqsM3max6UN vV29f78d3bzcKsgsujq1PYHK8LWJ3UCm4OYaSvpWFYquiFPqCIwMVFd8V 266lE4s5wuvIBK5ThXyuLbdsrJ7Sk7CnGBErtUc8HoYCWTs6RonQuMdb/ bmGPWfrE0fr2Cyn2C55vaoUKKrh9qPJ05D54vfZ+/sDpX8DZ+EAZccL7A xrBaswBqwriOZ8MJcHGRcHXPO9Zm5IssM34P1mTGGcO7wDEjSRO9Wxqh3 rXDGNgvk5VlJRkULrYW5UT5ne/b6uCy/ttFYuDQbmvCQI4kRHssUctV+o w==; X-CSE-ConnectionGUID: 2GbWuA2JSYK8ns9mqnkQoQ== X-CSE-MsgGUID: EwqkGwTvSLOCDZ7QPpWANg== X-IronPort-AV: E=McAfee;i="6700,10204,11241"; a="17678740" X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="17678740" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:00:57 -0700 X-CSE-ConnectionGUID: bJgxID0XQYC7InQBpOZYGw== X-CSE-MsgGUID: QSoVWIl+QMa7TV5WD2yQpw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="82499341" Received: from sramkris-mobl1.amr.corp.intel.com (HELO 
rpedgeco-desk4..) ([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:00:57 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com, Isaku Yamahata , Sean Christopherson , Binbin Wu , Yuan Yao Subject: [PATCH v2 06/25] x86/virt/tdx: Add SEAMCALL wrappers for TDX TD creation Date: Wed, 30 Oct 2024 12:00:19 -0700 Message-ID: <20241030190039.77971-7-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com> References: <20241030190039.77971-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

From: Isaku Yamahata

Intel TDX protects guest VMs from malicious hosts and certain physical
attacks. It defines various control structures that hold state for
things like TDs or vCPUs. These control structures are stored in pages
given to the TDX module and encrypted with either the global KeyID or
the guest KeyIDs.

To manipulate these control structures the TDX module defines a few
SEAMCALLs. KVM will use these during the process of creating a TD as
follows:

1) Allocate a unique TDX KeyID for a new guest.

2) Call TDH.MNG.CREATE to create a "TD Root" (TDR) page, together with
   the newly allocated KeyID. Unlike the rest of the TDX guest, the TDR
   page is crypto-protected by the 'global KeyID'.

3) Call the previously added TDH.MNG.KEY.CONFIG on each package to
   configure the KeyID for the guest. After this step, the KeyID to
   protect the guest is ready and the rest of the guest will be
   protected by this KeyID.

4) Call TDH.MNG.ADDCX to add TD Control Structure (TDCS) pages.

5) Call TDH.MNG.INIT to initialize the TDCS.

To reclaim these pages for use by the kernel other SEAMCALLs are needed,
which will be added in future patches.

Add tdh_mng_addcx(), tdh_mng_create() and tdh_mng_init() to export these
SEAMCALLs so that KVM can use them to create TDs.

For SEAMCALLs that give a page to the TDX module to be encrypted, clflush
the page mapped with KeyID 0, such that any dirty cache lines don't write
back later and clobber TD memory or control structures. Don't worry about
the other MK-TME KeyIDs because the kernel doesn't use them. The TDX docs
specify that this flush is not needed unless the TDX module exposes the
CLFLUSH_BEFORE_ALLOC feature bit. Be conservative and always flush.

Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
Reviewed-by: Binbin Wu
Reviewed-by: Yuan Yao
---
uAPI breakout v2:
 - Change to use 'u64' as function parameter to prepare to move
   SEAMCALL wrappers to arch/x86. (Kai)
 - Split to separate patch
 - Move SEAMCALL wrappers from KVM to x86 core;
 - Move TDH_xx macros from KVM to x86 core;
 - Re-write log

uAPI breakout v1:
 - Make argument to C wrapper function struct kvm_tdx * or
   struct vcpu_tdx * .(Sean)
 - Drop unused helpers (Kai)
 - Fix bisectability issues in headers (Kai)
 - Updates from seamcall overhaul (Kai)

v19:
 - Update the commit message to match the patch by Yuan
 - Use seamcall() and seamcall_ret() by paolo

v18:
 - removed stub functions for __seamcall{,_ret}()
 - Added Reviewed-by Binbin
 - Make tdx_seamcall() use struct tdx_module_args instead of taking
   each inputs.

v16:
 - use struct tdx_module_args instead of struct tdx_module_output
 - Add tdh_mem_sept_rd() for SEPT_VE_DISABLE=3D1.
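Illustration only, not part of the patch: the TD creation order described
in the commit message above can be sketched in plain userspace C. The
wrapper names and argument order mirror this patch, but every function
body below is a stub, and sketch_create_td(), the KeyID value 32, the
call log and the "non-zero return means failure" handling are
assumptions made for the sketch. Error unwinding (freeing the KeyID,
reclaiming pages) is deliberately left out.

```c
#include <stdint.h>
#include <string.h>

typedef uint64_t u64;

/* Records the order in which the stubbed wrappers are called. */
static char call_log[256];

static void log_call(const char *name)
{
	strcat(call_log, name);
	strcat(call_log, ";");
}

/*
 * Stubs standing in for the wrappers added by this series. The real ones
 * issue SEAMCALLs (and flush the page being handed over); these only
 * record the call and report success.
 */
static int tdx_guest_keyid_alloc(void)
{
	log_call("keyid_alloc");
	return 32;	/* arbitrary KeyID, assumption for the sketch */
}

static u64 tdh_mng_create(u64 tdr, u64 hkid)
{
	(void)tdr; (void)hkid;
	log_call("create");
	return 0;
}

static u64 tdh_mng_key_config(u64 tdr)
{
	(void)tdr;
	log_call("key_config");
	return 0;
}

static u64 tdh_mng_addcx(u64 tdr, u64 tdcs)
{
	(void)tdr; (void)tdcs;
	log_call("addcx");
	return 0;
}

static u64 tdh_mng_init(u64 tdr, u64 td_params, u64 *rcx)
{
	(void)tdr; (void)td_params;
	*rcx = 0;
	log_call("init");
	return 0;
}

/* Hypothetical caller: walks through the TD creation steps in order. */
static int sketch_create_td(u64 tdr, const u64 *tdcs, int nr_tdcs,
			    u64 td_params, int nr_packages)
{
	u64 rcx;
	int hkid, i;

	hkid = tdx_guest_keyid_alloc();		/* allocate a KeyID */
	if (hkid < 0)
		return hkid;

	if (tdh_mng_create(tdr, (u64)hkid))	/* create the TDR */
		return -1;

	for (i = 0; i < nr_packages; i++)	/* once per package */
		if (tdh_mng_key_config(tdr))
			return -1;

	for (i = 0; i < nr_tdcs; i++)		/* add each TDCS page */
		if (tdh_mng_addcx(tdr, tdcs[i]))
			return -1;

	if (tdh_mng_init(tdr, td_params, &rcx))	/* initialize the TDCS */
		return -1;

	return 0;
}
```

In the kernel the per-package TDH.MNG.KEY.CONFIG calls would have to run
on a CPU of each package; the loop above only shows the count and the
ordering.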
--- arch/x86/include/asm/tdx.h | 3 +++ arch/x86/virt/vmx/tdx/tdx.c | 39 +++++++++++++++++++++++++++++++++++++ arch/x86/virt/vmx/tdx/tdx.h | 3 +++ 3 files changed, 45 insertions(+) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 9897335a8e2f..9d19ca33e884 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -123,8 +123,11 @@ int tdx_guest_keyid_alloc(void); void tdx_guest_keyid_free(unsigned int keyid); =20 /* SEAMCALL wrappers for creating/destroying/running TDX guests */ +u64 tdh_mng_addcx(u64 tdr, u64 tdcs); u64 tdh_mng_key_config(u64 tdr); +u64 tdh_mng_create(u64 tdr, u64 hkid); u64 tdh_mng_key_freeid(u64 tdr); +u64 tdh_mng_init(u64 tdr, u64 td_params, u64 *rcx); #else static inline void tdx_init(void) { } static inline int tdx_cpu_enable(void) { return -ENODEV; } diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index c42eab8cc069..16122fd552ff 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -1563,6 +1563,18 @@ void tdx_guest_keyid_free(unsigned int keyid) } EXPORT_SYMBOL_GPL(tdx_guest_keyid_free); =20 +u64 tdh_mng_addcx(u64 tdr, u64 tdcs) +{ + struct tdx_module_args args =3D { + .rcx =3D tdcs, + .rdx =3D tdr, + }; + + clflush_cache_range(__va(tdcs), PAGE_SIZE); + return seamcall(TDH_MNG_ADDCX, &args); +} +EXPORT_SYMBOL_GPL(tdh_mng_addcx); + u64 tdh_mng_key_config(u64 tdr) { struct tdx_module_args args =3D { @@ -1573,6 +1585,17 @@ u64 tdh_mng_key_config(u64 tdr) } EXPORT_SYMBOL_GPL(tdh_mng_key_config); =20 +u64 tdh_mng_create(u64 tdr, u64 hkid) +{ + struct tdx_module_args args =3D { + .rcx =3D tdr, + .rdx =3D hkid, + }; + clflush_cache_range(__va(tdr), PAGE_SIZE); + return seamcall(TDH_MNG_CREATE, &args); +} +EXPORT_SYMBOL_GPL(tdh_mng_create); + u64 tdh_mng_key_freeid(u64 tdr) { struct tdx_module_args args =3D { @@ -1582,3 +1605,19 @@ u64 tdh_mng_key_freeid(u64 tdr) return seamcall(TDH_MNG_KEY_FREEID, &args); } EXPORT_SYMBOL_GPL(tdh_mng_key_freeid); + +u64 
tdh_mng_init(u64 tdr, u64 td_params, u64 *rcx) +{ + struct tdx_module_args args =3D { + .rcx =3D tdr, + .rdx =3D td_params, + }; + u64 ret; + + ret =3D seamcall_ret(TDH_MNG_INIT, &args); + + *rcx =3D args.rcx; + + return ret; +} +EXPORT_SYMBOL_GPL(tdh_mng_init); diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h index 95002e7ff4c5..b9287304f372 100644 --- a/arch/x86/virt/vmx/tdx/tdx.h +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -17,8 +17,11 @@ /* * TDX module SEAMCALL leaf functions */ +#define TDH_MNG_ADDCX 1 #define TDH_MNG_KEY_CONFIG 8 +#define TDH_MNG_CREATE 9 #define TDH_MNG_KEY_FREEID 20 +#define TDH_MNG_INIT 21 #define TDH_PHYMEM_PAGE_RDMD 24 #define TDH_SYS_KEY_CONFIG 31 #define TDH_SYS_INIT 33 --=20 2.47.0 From nobody Mon Nov 25 03:07:01 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A95111D0F74; Wed, 30 Oct 2024 19:01:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314865; cv=none; b=FBfWg3rpF0nSR6K3cCw/jNVh77/ClvPmuk+P0kzrwpNfz+b+e1UG6WBr0GFMeWU8IOB2xPg51sLgG08Fwgvz+duaDiRWU4/BEuefLDUfdBLdtpbvxTiedRvv86Dl5snbOdR95oi4UyO9f/FzKuv4U163ZvKsKLTjoR/BTTKW8ec= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314865; c=relaxed/simple; bh=m6vNof1vbkA2Ig4mR5WEhb8pC979uaUjk9FfOxwxvy0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=tuKffqBwNPz7Ouy4711H4PqP/V2H2uTBtbfDs310zGkahdNoVoKj3JOlPQGwjAHzdMFcfwt5nlYz6SNoM3IuN6z8NiskZG9crFOprwmxlHwf8uD+7se5CNr/CfFwVapiIvkUZfjOZXPAReRLmxE80LiILkOprzXcCVZhLVVu9kI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) 
header.d=intel.com header.i=@intel.com header.b=OT4v/Sdp; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="OT4v/Sdp" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1730314863; x=1761850863; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=m6vNof1vbkA2Ig4mR5WEhb8pC979uaUjk9FfOxwxvy0=; b=OT4v/SdpQXtLDybv+mGzt/QWxIu0MmY84z+07KQAYUYTXoquUj+wBXhj mOdjxugQBZedboHntrgTEmYQ0NE/ruBgH4TgEdpMhRjWtfM3SASYa/MNj 0Di0adbtC1zJuExBdIMOWIuulAKV4QF1uojakz/BTeNyFBTO9NpE6dAbH utpNzGJm0JngkTz2Ztvx5ohI9tAHAJJ7Vgbzj63cPgY84wqf88cNrC/0L ikTSpUZ0hZS/Zr5ji5qATjlm4E2qTSO4e0Smca3zvd//9HOjlUIT1iefN raeKPLvzxZ7VAGO+R4+TZOtYDHQAuNWZfOyGliQZcPJdioKxoA6Dq7qRT A==; X-CSE-ConnectionGUID: nKhZFDz5TyCI/VuMvfkv+w== X-CSE-MsgGUID: +5Gju/KOQXWG4IK6Eaj6Lg== X-IronPort-AV: E=McAfee;i="6700,10204,11241"; a="17678748" X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="17678748" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:00:58 -0700 X-CSE-ConnectionGUID: 5N/qpNJiRrqGCqdstuFtSQ== X-CSE-MsgGUID: lrpDf6hdTOq/1hNnTyIB+A== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="82499351" Received: from sramkris-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4..) 
([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:00:57 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com, Isaku Yamahata , Sean Christopherson , Binbin Wu , Yuan Yao Subject: [PATCH v2 07/25] x86/virt/tdx: Add SEAMCALL wrappers for TDX vCPU creation Date: Wed, 30 Oct 2024 12:00:20 -0700 Message-ID: <20241030190039.77971-8-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com> References: <20241030190039.77971-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

From: Isaku Yamahata

Intel TDX protects guest VMs from malicious host and certain physical
attacks. It defines various control structures that hold state for
virtualized components of the TD (i.e. VMs or vCPUs). These control
structures are stored in pages given to the TDX module and encrypted
with either the global KeyID or the guest KeyIDs.

To manipulate these control structures the TDX module defines a few
SEAMCALLs. KVM will use these during the process of creating a vCPU as
follows:

1) Call TDH.VP.CREATE to create a TD vCPU Root (TDVPR) page for each
   vCPU.

2) Call TDH.VP.ADDCX to add per-vCPU control pages (TDCX) for each vCPU.

3) Call TDH.VP.INIT to initialize the TDCX for each vCPU.

To reclaim these pages for use by the kernel other SEAMCALLs are needed,
which will be added in future patches.

Export functions to allow KVM to make these SEAMCALLs.
Export two variants for TDH.VP.INIT, tdh_vp_init() and
tdh_vp_init_apicid(), in order to support the planned logic of KVM to
support TDX modules with and without the ENUM_TOPOLOGY feature. If KVM
can drop support for the !ENUM_TOPOLOGY case, this could go down to a
single version. Leave that for later discussion.

For SEAMCALLs that give a page to the TDX module to be encrypted, clflush
the page mapped with KeyID 0, such that any dirty cache lines don't write
back later and clobber TD memory or control structures. Don't worry about
the other MK-TME KeyIDs because the kernel doesn't use them. The TDX docs
specify that this flush is not needed unless the TDX module exposes the
CLFLUSH_BEFORE_ALLOC feature bit. Be conservative and always flush.

Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
Reviewed-by: Binbin Wu
Reviewed-by: Yuan Yao
---
uAPI breakout v2:
 - Change to use 'u64' as function parameter to prepare to move
   SEAMCALL wrappers to arch/x86. (Kai)
 - Split to separate patch
 - Move SEAMCALL wrappers from KVM to x86 core;
 - Move TDH_xx macros from KVM to x86 core;
 - Re-write log

uAPI breakout v1:
 - Make argument to C wrapper function struct kvm_tdx * or
   struct vcpu_tdx * .(Sean)
 - Drop unused helpers (Kai)
 - Fix bisectability issues in headers (Kai)
 - Updates from seamcall overhaul (Kai)

v19:
 - Update the commit message to match the patch by Yuan
 - Use seamcall() and seamcall_ret() by paolo

v18:
 - removed stub functions for __seamcall{,_ret}()
 - Added Reviewed-by Binbin
 - Make tdx_seamcall() use struct tdx_module_args instead of taking
   each inputs.

v16:
 - use struct tdx_module_args instead of struct tdx_module_output
 - Add tdh_mem_sept_rd() for SEPT_VE_DISABLE=3D1.
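Illustration only, not part of the patch: the versioned variant of the
vCPU init wrapper selects version 1 of the leaf by OR-ing a version
field into the SEAMCALL function number. A minimal userspace sketch of
that encoding follows. TDH_VP_INIT (22) and TDX_VERSION_SHIFT (16) are
the values defined in this patch; the three helper functions are
hypothetical names introduced here to make the bit layout explicit.

```c
#include <stdint.h>

/* Values as defined in arch/x86/virt/vmx/tdx/tdx.h by this patch. */
#define TDH_VP_INIT		22
#define TDX_VERSION_SHIFT	16

/*
 * Hypothetical helper: build a SEAMCALL function number.
 *   Bit 15:0   leaf number
 *   Bit 23:16  version number
 */
static inline uint64_t seamcall_fn(uint64_t leaf, uint64_t version)
{
	return leaf | (version << TDX_VERSION_SHIFT);
}

/* Hypothetical helper: extract the leaf number (bits 15:0). */
static inline uint64_t seamcall_fn_leaf(uint64_t fn)
{
	return fn & 0xffff;
}

/* Hypothetical helper: extract the version number (bits 23:16). */
static inline uint64_t seamcall_fn_version(uint64_t fn)
{
	return (fn >> TDX_VERSION_SHIFT) & 0xff;
}
```

With these definitions, seamcall_fn(TDH_VP_INIT, 1) yields 0x10016, the
same value as TDH_VP_INIT | (1ULL << TDX_VERSION_SHIFT) in
tdh_vp_init_apicid(), while version 0 leaves the leaf number unchanged.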
--- arch/x86/include/asm/tdx.h | 4 +++ arch/x86/virt/vmx/tdx/tdx.c | 49 +++++++++++++++++++++++++++++++++++++ arch/x86/virt/vmx/tdx/tdx.h | 12 +++++++++ 3 files changed, 65 insertions(+) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 9d19ca33e884..6951faa37031 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -124,10 +124,14 @@ void tdx_guest_keyid_free(unsigned int keyid); =20 /* SEAMCALL wrappers for creating/destroying/running TDX guests */ u64 tdh_mng_addcx(u64 tdr, u64 tdcs); +u64 tdh_vp_addcx(u64 tdvpr, u64 tdcx); u64 tdh_mng_key_config(u64 tdr); u64 tdh_mng_create(u64 tdr, u64 hkid); +u64 tdh_vp_create(u64 tdr, u64 tdvpr); u64 tdh_mng_key_freeid(u64 tdr); u64 tdh_mng_init(u64 tdr, u64 td_params, u64 *rcx); +u64 tdh_vp_init(u64 tdvpr, u64 initial_rcx); +u64 tdh_vp_init_apicid(u64 tdvpr, u64 initial_rcx, u32 x2apicid); #else static inline void tdx_init(void) { } static inline int tdx_cpu_enable(void) { return -ENODEV; } diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 16122fd552ff..b3003031e0fe 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -1575,6 +1575,18 @@ u64 tdh_mng_addcx(u64 tdr, u64 tdcs) } EXPORT_SYMBOL_GPL(tdh_mng_addcx); =20 +u64 tdh_vp_addcx(u64 tdvpr, u64 tdcx) +{ + struct tdx_module_args args =3D { + .rcx =3D tdcx, + .rdx =3D tdvpr, + }; + + clflush_cache_range(__va(tdcx), PAGE_SIZE); + return seamcall(TDH_VP_ADDCX, &args); +} +EXPORT_SYMBOL_GPL(tdh_vp_addcx); + u64 tdh_mng_key_config(u64 tdr) { struct tdx_module_args args =3D { @@ -1591,11 +1603,24 @@ u64 tdh_mng_create(u64 tdr, u64 hkid) .rcx =3D tdr, .rdx =3D hkid, }; + clflush_cache_range(__va(tdr), PAGE_SIZE); return seamcall(TDH_MNG_CREATE, &args); } EXPORT_SYMBOL_GPL(tdh_mng_create); =20 +u64 tdh_vp_create(u64 tdr, u64 tdvpr) +{ + struct tdx_module_args args =3D { + .rcx =3D tdvpr, + .rdx =3D tdr, + }; + + clflush_cache_range(__va(tdvpr), PAGE_SIZE); + return
seamcall(TDH_VP_CREATE, &args); +} +EXPORT_SYMBOL_GPL(tdh_vp_create); + u64 tdh_mng_key_freeid(u64 tdr) { struct tdx_module_args args =3D { @@ -1621,3 +1646,27 @@ u64 tdh_mng_init(u64 tdr, u64 td_params, u64 *rcx) return ret; } EXPORT_SYMBOL_GPL(tdh_mng_init); + +u64 tdh_vp_init(u64 tdvpr, u64 initial_rcx) +{ + struct tdx_module_args args =3D { + .rcx =3D tdvpr, + .rdx =3D initial_rcx, + }; + + return seamcall(TDH_VP_INIT, &args); +} +EXPORT_SYMBOL_GPL(tdh_vp_init); + +u64 tdh_vp_init_apicid(u64 tdvpr, u64 initial_rcx, u32 x2apicid) +{ + struct tdx_module_args args =3D { + .rcx =3D tdvpr, + .rdx =3D initial_rcx, + .r8 =3D x2apicid, + }; + + /* apicid requires version =3D=3D 1. */ + return seamcall(TDH_VP_INIT | (1ULL << TDX_VERSION_SHIFT), &args); +} +EXPORT_SYMBOL_GPL(tdh_vp_init_apicid); diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h index b9287304f372..64b6504791e1 100644 --- a/arch/x86/virt/vmx/tdx/tdx.h +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -18,10 +18,13 @@ * TDX module SEAMCALL leaf functions */ #define TDH_MNG_ADDCX 1 +#define TDH_VP_ADDCX 4 #define TDH_MNG_KEY_CONFIG 8 #define TDH_MNG_CREATE 9 +#define TDH_VP_CREATE 10 #define TDH_MNG_KEY_FREEID 20 #define TDH_MNG_INIT 21 +#define TDH_VP_INIT 22 #define TDH_PHYMEM_PAGE_RDMD 24 #define TDH_SYS_KEY_CONFIG 31 #define TDH_SYS_INIT 33 @@ -30,6 +33,15 @@ #define TDH_SYS_TDMR_INIT 36 #define TDH_SYS_CONFIG 45 =20 + +/* + * SEAMCALL leaf: + * + * Bit 15:0 Leaf number + * Bit 23:16 Version number + */ +#define TDX_VERSION_SHIFT 16 + /* TDX page types */ #define PT_NDA 0x0 #define PT_RSVD 0x1 --=20 2.47.0 From nobody Mon Nov 25 03:07:01 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5DEFC21765C; Wed, 30 Oct 2024 19:01:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none 
smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314866; cv=none; b=HUOgsgk85IEl4YRl16nlWoh27p9i+QrgrSq/sf4rIpPHZtZ2/5t8pNAKlbk/ZY4XaXja5zyhSKlBXJmIZzfiIv/0zbl0/Bg37AXcGH9t0R6BAZpB8G7z4R4a0bftJx2idZZm45wLrm3G9hdzLRnFKgeDGQO1g+dvgRRjYrZRGfw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314866; c=relaxed/simple; bh=SjyU2BjDGns7PRjLKlfcB4VDcu/rQNtiWvnjpu5j1rk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=mnVx2qTBsYtScyP4VD6zBy1eICXyM6hry/LyCmCFief3yAe8rF8zBSitV0AtHlAp9e15PGqF61DCYSNO2/c7v+umAdT46LsoOJe4pVw0Jff3Nm2RSKV/OBP5mBV9s7El64zmXylk9yKkv/s0zr4JUw/IPXbgqQthqZy9KSv8xgc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=YDK/dtSN; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="YDK/dtSN" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1730314862; x=1761850862; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=SjyU2BjDGns7PRjLKlfcB4VDcu/rQNtiWvnjpu5j1rk=; b=YDK/dtSNDIcTt7pJBT0CGsUMK+QbG+LXte/eeRfXj50ENendDpiqpXfb S0Z5YYwMN+S2NgtfoCMcKsh/sdim/uDqbsNyPuO8upGhe5OO8XOOzzWgH jP1h1GUfyP0Fg9nLRNcqX36XNDc3wV4fvTy62LPo0LcDmW50CISdVQ+sy 8b194zGlwryoqQPcpIn6wqrrU+3lg7aa0n3o/UGVHz+BrUqLd4LDRI2Z8 T+DLyoEhKUm1WRuP3n9zlu5IEd0Wl2I+MgEiMqsif1ZdLMCoky/DE4k7Q NaTHnYpOD5ToIjc2AqVsog5mrNhd1mASARqa2KuCpi3ZfBcpV5cdH3zOj w==; X-CSE-ConnectionGUID: ILlFkWSmQFOc+7N88JEuYQ== X-CSE-MsgGUID: 
ELAXIG+OSBihNCmdVnYzLQ== X-IronPort-AV: E=McAfee;i="6700,10204,11241"; a="17678755" X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="17678755" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:00:58 -0700 X-CSE-ConnectionGUID: ZEZ6jH+BRHu2ufnvsQHr6Q== X-CSE-MsgGUID: 63WGY/ogRi+CpjPAynOhXg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="82499358" Received: from sramkris-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4..) ([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:00:58 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com, Isaku Yamahata , Sean Christopherson , Binbin Wu , Yuan Yao Subject: [PATCH v2 08/25] x86/virt/tdx: Add SEAMCALL wrappers for TDX page cache management Date: Wed, 30 Oct 2024 12:00:21 -0700 Message-ID: <20241030190039.77971-9-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com> References: <20241030190039.77971-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Intel TDX protects guest VMs from malicious host and certain physical attacks. The TDX module uses pages provided by the host for both control structures and for TD guest pages. These pages are encrypted using the MK-TME encryption engine, with its special requirements around cache invalidation. 
For its own security, the TDX module ensures pages are flushed properly
and tracks which usage they are currently assigned to. For creating and
tearing down TD VMs and vCPUs KVM will need to use the
TDH.PHYMEM.PAGE.RECLAIM, TDH.PHYMEM.CACHE.WB, and TDH.PHYMEM.PAGE.WBINVD
SEAMCALLs.

Add tdh_phymem_page_reclaim() to enable KVM to call
TDH.PHYMEM.PAGE.RECLAIM to reclaim the page for use by the host kernel.
This effectively resets its state in the TDX module's page tracking
(PAMT), if the page is available to be reclaimed. This will be used by
KVM to reclaim the various types of pages owned by the TDX module. It
will have a small wrapper in KVM that retries in the case of a relevant
error code. Don't implement this wrapper in arch/x86 because KVM's
solution around retrying SEAMCALLs will be better located in a single
place.

Add tdh_phymem_cache_wb() to enable KVM to call TDH.PHYMEM.CACHE.WB to
do a cache write back in a way that the TDX module can verify, before it
allows a KeyID to be freed. KVM will wrap this in a small helper that
handles retries. Since the TDH.PHYMEM.CACHE.WB operation is
interruptible, have tdh_phymem_cache_wb() take a resume argument to pass
this info to the TDX module for restarts. It is worth noting that this
SEAMCALL uses a SEAM specific MSR to do the write back in sections. In
this way it does export some new functionality that affects CPU state.

Add tdh_phymem_page_wbinvd_tdr() to enable KVM to call
TDH.PHYMEM.PAGE.WBINVD to write back and invalidate the cache lines of a
TDR, using the global KeyID. The underlying TDH.PHYMEM.PAGE.WBINVD
SEAMCALL requires the related KeyID to be encoded into the SEAMCALL
args. Since the global KeyID is not exposed to KVM, a dedicated wrapper
is needed for TDR focused TDH.PHYMEM.PAGE.WBINVD operations.
Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Signed-off-by: Rick Edgecombe Reviewed-by: Binbin Wu Reviewed-by: Yuan Yao --- uAPI breakout v2: - Change to use 'u64' as function parameter to prepare to move SEAMCALL wrappers to arch/x86. (Kai) - Split to separate patch - Move SEAMCALL wrappers from KVM to x86 core; - Move TDH_xx macros from KVM to x86 core; - Re-write log uAPI breakout v1: - Make argument to C wrapper function struct kvm_tdx * or struct vcpu_tdx * .(Sean) - Drop unused helpers (Kai) - Fix bisectability issues in headers (Kai) - Updates from seamcall overhaul (Kai) v19: - Update the commit message to match the patch by Yuan - Use seamcall() and seamcall_ret() by paolo v18: - removed stub functions for __seamcall{,_ret}() - Added Reviewed-by Binbin - Make tdx_seamcall() use struct tdx_module_args instead of taking each inputs. v16: - use struct tdx_module_args instead of struct tdx_module_output - Add tdh_mem_sept_rd() for SEPT_VE_DISABLE=3D1. 
--- arch/x86/include/asm/tdx.h | 3 +++ arch/x86/virt/vmx/tdx/tdx.c | 44 +++++++++++++++++++++++++++++++++++++ arch/x86/virt/vmx/tdx/tdx.h | 4 +++- 3 files changed, 50 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 6951faa37031..0cf8975759de 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -132,6 +132,9 @@ u64 tdh_mng_key_freeid(u64 tdr); u64 tdh_mng_init(u64 tdr, u64 td_params, u64 *rcx); u64 tdh_vp_init(u64 tdvpr, u64 initial_rcx); u64 tdh_vp_init_apicid(u64 tdvpr, u64 initial_rcx, u32 x2apicid); +u64 tdh_phymem_page_reclaim(u64 page, u64 *rcx, u64 *rdx, u64 *r8); +u64 tdh_phymem_cache_wb(bool resume); +u64 tdh_phymem_page_wbinvd_tdr(u64 tdr); #else static inline void tdx_init(void) { } static inline int tdx_cpu_enable(void) { return -ENODEV; } diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index b3003031e0fe..7e7c2e2360af 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -1670,3 +1670,47 @@ u64 tdh_vp_init_apicid(u64 tdvpr, u64 initial_rcx, u= 32 x2apicid) return seamcall(TDH_VP_INIT | (1ULL << TDX_VERSION_SHIFT), &args); } EXPORT_SYMBOL_GPL(tdh_vp_init_apicid); + +u64 tdh_phymem_page_reclaim(u64 page, u64 *rcx, u64 *rdx, u64 *r8) +{ + struct tdx_module_args args =3D { + .rcx =3D page, + }; + u64 ret; + + ret =3D seamcall_ret(TDH_PHYMEM_PAGE_RECLAIM, &args); + + /* + * Additional error information: + * + * - RCX: page type + * - RDX: owner + * - R8: page size (4K, 2M or 1G) + */ + *rcx =3D args.rcx; + *rdx =3D args.rdx; + *r8 =3D args.r8; + + return ret; +} +EXPORT_SYMBOL_GPL(tdh_phymem_page_reclaim); + +u64 tdh_phymem_cache_wb(bool resume) +{ + struct tdx_module_args args =3D { + .rcx =3D resume ? 
1 : 0, + }; + + return seamcall(TDH_PHYMEM_CACHE_WB, &args); +} +EXPORT_SYMBOL_GPL(tdh_phymem_cache_wb); + +u64 tdh_phymem_page_wbinvd_tdr(u64 tdr) +{ + struct tdx_module_args args =3D {}; + + args.rcx =3D tdr | ((u64)tdx_global_keyid << boot_cpu_data.x86_phys_bits); + + return seamcall(TDH_PHYMEM_PAGE_WBINVD, &args); +} +EXPORT_SYMBOL_GPL(tdh_phymem_page_wbinvd_tdr); diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h index 64b6504791e1..191bdd1e571d 100644 --- a/arch/x86/virt/vmx/tdx/tdx.h +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -26,14 +26,16 @@ #define TDH_MNG_INIT 21 #define TDH_VP_INIT 22 #define TDH_PHYMEM_PAGE_RDMD 24 +#define TDH_PHYMEM_PAGE_RECLAIM 28 #define TDH_SYS_KEY_CONFIG 31 #define TDH_SYS_INIT 33 #define TDH_SYS_RD 34 #define TDH_SYS_LP_INIT 35 #define TDH_SYS_TDMR_INIT 36 +#define TDH_PHYMEM_CACHE_WB 40 +#define TDH_PHYMEM_PAGE_WBINVD 41 #define TDH_SYS_CONFIG 45 =20 - /* * SEAMCALL leaf: * --=20 2.47.0 From nobody Mon Nov 25 03:07:01 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 69D372185B7; Wed, 30 Oct 2024 19:01:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314867; cv=none; b=MTXPUqtR7MXx6+u7YDlAkRiv/IFUoCAuyCcO+2fwXJgA03ItKNdzg0r4URFRxvxN8jOoGVlw8UeAT4xCIkhG9nEh5QgqL7HaAacWWzWhwf1vUdSoMRSzNATUyDY7sFqNbZWPeBUTNEsHe9cRC8TPJ3MUdwehE2t7XVM+sJs4WMA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314867; c=relaxed/simple; bh=QKmL1dXWlxybwbuO7zWK3CjAUm54v58gDsSxoui5AdE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=JciO+V4AXg0LKLQfDDPg/NWKrBLSUNj7Oe4kWIqYubG69KX+oo0he2StqB2yygUHsKCD7lKkZITT6YOco+nBwitxDHU0rjIGyRCebJ37R7sw9YEyGiU5qtZkMMj+sojC87VW4FnFVl35OmyLtieB8dZ/3RdKbCAJdxk9AkNdLG8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=KbxQydTq; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="KbxQydTq" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1730314865; x=1761850865; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=QKmL1dXWlxybwbuO7zWK3CjAUm54v58gDsSxoui5AdE=; b=KbxQydTqELORbCQXH5tzi54jjdzC3wW6mCzzPD2J5RaaOVtaK2mcwqUt 6gpr9C79TjetVCvewDH7Nzkkdoce7Uu6JdTVaiu9uWLnc+HLMpTzbC6EA hAIvNYUC6pQcwmqbNV2QEHihe5LUY96c1gsBqW1Y2QZb+KzoPcI5cE/+Z Tf0AsiWE2a+xfjSqqHurqJEML8y5ha+Cr9mAQx26D+FUteZVX6nhnAAU5 kHj1NKOw3EsS+s+H2HgwK2Q4PN1QVbesgBN+VFNLX07m6Nf/hCYKhBcEa bRbq1D+v+a/QFetTqKwbKaBbNryEm49k79AFgQDsKY+nmjlp4d66A9VKr A==; X-CSE-ConnectionGUID: Z2wyeFgQSSuhs/uNOxPvxg== X-CSE-MsgGUID: LFGu1TvySnyp7mdqIn5wsg== X-IronPort-AV: E=McAfee;i="6700,10204,11241"; a="17678760" X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="17678760" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:00:59 -0700 X-CSE-ConnectionGUID: Y970gQYWQSKgCZbOx/JWbg== X-CSE-MsgGUID: eL8nNDU2QlK420knL8JN0g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="82499364" Received: from sramkris-mobl1.amr.corp.intel.com (HELO 
rpedgeco-desk4..) ([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:00:58 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com, Isaku Yamahata , Sean Christopherson , Binbin Wu , Yuan Yao Subject: [PATCH v2 09/25] x86/virt/tdx: Add SEAMCALL wrappers for TDX VM/vCPU field access Date: Wed, 30 Oct 2024 12:00:22 -0700 Message-ID: <20241030190039.77971-10-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com> References: <20241030190039.77971-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Intel TDX protects guest VMs from malicious host and certain physical attacks. The TDX module has TD scoped and vCPU scoped "metadata fields". These fields are a bit like VMCS fields, and are stored in data structures maintained by the TDX module. Export 3 SEAMCALLs for use in reading and writing these fields: Make tdh_mng_rd() use TDH.MNG.RD to read the TD scoped metadata. Make tdh_vp_rd()/tdh_vp_wr() use TDH.VP.RD/WR to read/write the vCPU scoped metadata. KVM will use these by creating inline helpers that target various metadata sizes. Export the raw SEAMCALL leaf, to avoid exporting the large number of various sized helpers.
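As a rough illustration of the "raw 64-bit wrapper plus sized inline helpers" split described above, the following userspace sketch generates 8/16/32/64-bit accessors on top of one raw read and one raw masked write. All sketch_* names are hypothetical, and the backing variable stands in for a field held by the TDX module. The mask behaviour modeled here (only bits set in the mask are taken from the data) is an assumption about how the data/mask pair is meant to be used and should be checked against the TDX module specification. These are not the helpers KVM ends up with.

```c
#include <stdint.h>

/* Hypothetical stand-in for one 64-bit metadata field. */
static uint64_t sketch_field;

/* Models the raw read: status is returned, content goes to *data. */
static uint64_t sketch_vp_rd(uint64_t field, uint64_t *data)
{
	(void)field;	/* a single field is modeled, so the ID is unused */
	*data = sketch_field;
	return 0;
}

/* Models the raw write: only bits set in mask are taken from data. */
static uint64_t sketch_vp_wr(uint64_t field, uint64_t data, uint64_t mask)
{
	(void)field;
	sketch_field = (sketch_field & ~mask) | (data & mask);
	return 0;
}

/* Generate one sized reader and writer pair per access width. */
#define SKETCH_BUILD_ACCESSORS(bits)						\
static inline uint##bits##_t sketch_read##bits(uint64_t field)			\
{										\
	uint64_t data = 0;							\
										\
	sketch_vp_rd(field, &data);						\
	return (uint##bits##_t)data;						\
}										\
static inline void sketch_write##bits(uint64_t field, uint##bits##_t val)	\
{										\
	sketch_vp_wr(field, val, (uint64_t)(uint##bits##_t)~0ULL);		\
}

SKETCH_BUILD_ACCESSORS(8)
SKETCH_BUILD_ACCESSORS(16)
SKETCH_BUILD_ACCESSORS(32)
SKETCH_BUILD_ACCESSORS(64)
```

Keeping only the raw wrappers exported means the sized variants can live as inlines next to their callers, which is the point the commit message makes about not exporting a large number of helpers.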
Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Signed-off-by: Rick Edgecombe Reviewed-by: Binbin Wu Reviewed-by: Yuan Yao --- uAPI breakout v2: - Change to use 'u64' as function parameter to prepare to move SEAMCALL wrappers to arch/x86. (Kai) - Split to separate patch - Move SEAMCALL wrappers from KVM to x86 core; - Move TDH_xx macros from KVM to x86 core; - Re-write log uAPI breakout v1: - Make argument to C wrapper function struct kvm_tdx * or struct vcpu_tdx * .(Sean) - Drop unused helpers (Kai) - Fix bisectability issues in headers (Kai) - Updates from seamcall overhaul (Kai) v19: - Update the commit message to match the patch by Yuan - Use seamcall() and seamcall_ret() by paolo v18: - removed stub functions for __seamcall{,_ret}() - Added Reviewed-by Binbin - Make tdx_seamcall() use struct tdx_module_args instead of taking each inputs. --- arch/x86/include/asm/tdx.h | 3 +++ arch/x86/virt/vmx/tdx/tdx.c | 47 +++++++++++++++++++++++++++++++++++++ arch/x86/virt/vmx/tdx/tdx.h | 3 +++ 3 files changed, 53 insertions(+) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 0cf8975759de..a70933ec7808 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -128,9 +128,12 @@ u64 tdh_vp_addcx(u64 tdvpr, u64 tdcx); u64 tdh_mng_key_config(u64 tdr); u64 tdh_mng_create(u64 tdr, u64 hkid); u64 tdh_vp_create(u64 tdr, u64 tdvpr); +u64 tdh_mng_rd(u64 tdr, u64 field, u64 *data); u64 tdh_mng_key_freeid(u64 tdr); u64 tdh_mng_init(u64 tdr, u64 td_params, u64 *rcx); u64 tdh_vp_init(u64 tdvpr, u64 initial_rcx); +u64 tdh_vp_rd(u64 tdvpr, u64 field, u64 *data); +u64 tdh_vp_wr(u64 tdvpr, u64 field, u64 data, u64 mask); u64 tdh_vp_init_apicid(u64 tdvpr, u64 initial_rcx, u32 x2apicid); u64 tdh_phymem_page_reclaim(u64 page, u64 *rcx, u64 *rdx, u64 *r8); u64 tdh_phymem_cache_wb(bool resume); diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 7e7c2e2360af..82820422d698 
100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -1621,6 +1621,23 @@ u64 tdh_vp_create(u64 tdr, u64 tdvpr) } EXPORT_SYMBOL_GPL(tdh_vp_create); =20 +u64 tdh_mng_rd(u64 tdr, u64 field, u64 *data) +{ + struct tdx_module_args args =3D { + .rcx =3D tdr, + .rdx =3D field, + }; + u64 ret; + + ret =3D seamcall_ret(TDH_MNG_RD, &args); + + /* R8: Content of the field, or 0 in case of error. */ + *data =3D args.r8; + + return ret; +} +EXPORT_SYMBOL_GPL(tdh_mng_rd); + u64 tdh_mng_key_freeid(u64 tdr) { struct tdx_module_args args =3D { @@ -1658,6 +1675,36 @@ u64 tdh_vp_init(u64 tdvpr, u64 initial_rcx) } EXPORT_SYMBOL_GPL(tdh_vp_init); =20 +u64 tdh_vp_rd(u64 tdvpr, u64 field, u64 *data) +{ + struct tdx_module_args args =3D { + .rcx =3D tdvpr, + .rdx =3D field, + }; + u64 ret; + + ret =3D seamcall_ret(TDH_VP_RD, &args); + + /* R8: Content of the field, or 0 in case of error. */ + *data =3D args.r8; + + return ret; +} +EXPORT_SYMBOL_GPL(tdh_vp_rd); + +u64 tdh_vp_wr(u64 tdvpr, u64 field, u64 data, u64 mask) +{ + struct tdx_module_args args =3D { + .rcx =3D tdvpr, + .rdx =3D field, + .r8 =3D data, + .r9 =3D mask, + }; + + return seamcall(TDH_VP_WR, &args); +} +EXPORT_SYMBOL_GPL(tdh_vp_wr); + u64 tdh_vp_init_apicid(u64 tdvpr, u64 initial_rcx, u32 x2apicid) { struct tdx_module_args args =3D { diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h index 191bdd1e571d..1915a558c126 100644 --- a/arch/x86/virt/vmx/tdx/tdx.h +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -22,10 +22,12 @@ #define TDH_MNG_KEY_CONFIG 8 #define TDH_MNG_CREATE 9 #define TDH_VP_CREATE 10 +#define TDH_MNG_RD 11 #define TDH_MNG_KEY_FREEID 20 #define TDH_MNG_INIT 21 #define TDH_VP_INIT 22 #define TDH_PHYMEM_PAGE_RDMD 24 +#define TDH_VP_RD 26 #define TDH_PHYMEM_PAGE_RECLAIM 28 #define TDH_SYS_KEY_CONFIG 31 #define TDH_SYS_INIT 33 @@ -33,6 +35,7 @@ #define TDH_SYS_LP_INIT 35 #define TDH_SYS_TDMR_INIT 36 #define TDH_PHYMEM_CACHE_WB 40 +#define TDH_VP_WR 43 #define 
TDH_PHYMEM_PAGE_WBINVD 41 #define TDH_SYS_CONFIG 45 =20 --=20 2.47.0 From nobody Mon Nov 25 03:07:01 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CE7AB2178F3; Wed, 30 Oct 2024 19:01:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314868; cv=none; b=AoQpKm81IZns6rHOyCyJYjHLqidh6guxtcFnith3T4kP8TKLDakH9buLKuhCQ4jbjo1T1YA8p/k6gNWFxOweE8QDGUbtsR6llDCREJcDF2OUMBOXtVRFA6lRIQbj4Eaf8XVRAH4KCjVbK3Q0AYaFtmUILR0ZBfw0YhY5+lk6QPs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314868; c=relaxed/simple; bh=ohGJW98zbzi41fnrs+d0ySgurKfZKhW8uy+/TSjCT/A=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=UU8Mb+GC7uZ89bMBMpbgZrEimxbwXkclFpvw8HkWN5NL1//ZOETppFTx7IuIVQm9siqxxl/+xPh39C6+kq1OuQHv7enK9aaBfiPmQVFz6wlOYRFuCEbPy/AlopxYUBIsBIoIiYw/5XxhzPvZcdO7vxB0L3S0BhSSH1BPyyal/8c= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=W1m2c6Xh; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="W1m2c6Xh" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1730314866; x=1761850866; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; 
bh=ohGJW98zbzi41fnrs+d0ySgurKfZKhW8uy+/TSjCT/A=; b=W1m2c6XhaRnf96lziMJjayZPiEhe74Yr6Ir3x93rcDORZXaEuFokJZyA Ll795qbwnSFiNVjVMEs6CwmwC0JPrAps2JKMrW08uZtVmWO0WZBBFbTCs I6JdQSFzpMxvWyRY7FXAubwRafstYHSDeI0R4XYvBLlnVNPxAmEhqRPjq fVuDeLkkYcQKjlur3kQfxtmYkkOfl9MaXGrD2CeaVPPHV4yilAnZsQxR6 wb+g1bjuj/tVHb9msW/xPMnKlLmoSo01xUxbluUzlM7o2JYrUjSjmmJqU k7oRatpkCRfMCM2dQJ/8n206+mVuOeOoRIwSwfktoY8QUa9l1l4jiWxKH Q==; X-CSE-ConnectionGUID: WzyCXqgHRz+pjlVKWzzlsw== X-CSE-MsgGUID: 3tllxLS1TcqDS4qWO2QcHw== X-IronPort-AV: E=McAfee;i="6700,10204,11241"; a="17678766" X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="17678766" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:00 -0700 X-CSE-ConnectionGUID: Pm3GizZ1SrOtZ2EIozGNAQ== X-CSE-MsgGUID: 6zVzmGVARSGu5p0H9JnLuw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="82499372" Received: from sramkris-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4..) 
([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:00:59 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com, Isaku Yamahata , Sean Christopherson , Binbin Wu , Yuan Yao Subject: [PATCH v2 10/25] x86/virt/tdx: Add SEAMCALL wrappers for TDX flush operations Date: Wed, 30 Oct 2024 12:00:23 -0700 Message-ID: <20241030190039.77971-11-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com> References: <20241030190039.77971-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Intel TDX protects guest VMs from malicious host and certain physical attacks. The TDX module has the concept of flushing vCPUs. These flushes include both a flush of the translation caches and also any other state internal to the TDX module. Before freeing a KeyID, this flush operation needs to be done. KVM will need to perform the flush on each pCPU associated with the TD, and also perform a TD scoped operation that checks if the flush has been done on all vCPUs associated with the TD. Add a tdh_vp_flush() function to be used to call TDH.VP.FLUSH on each pCPU associated with the TD during TD teardown. It will also be called when disabling TDX and during vCPU migration between pCPUs. Add tdh_mng_vpflushdone() to be used by KVM to call TDH.MNG.VPFLUSHDONE. KVM will use this during TD teardown to verify that TDH.VP.FLUSH has been called sufficiently, and advance the state machine that will allow for reclaiming the TD's KeyID.
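The ordering described above (flush every vCPU first, then ask the module to confirm that all flushes are done before the KeyID can be reclaimed) can be modeled with a small userspace state machine. This is a simplified sketch, not the TDX module's actual behaviour: the sketch_* names, the fixed vCPU count, and the status values are all hypothetical, and the per-pCPU aspect of the real flush is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_VCPUS		4

/* Placeholder status values for this sketch, not the real TDX codes. */
#define SKETCH_SUCCESS		0ULL
#define SKETCH_FLUSH_INCOMPLETE	1ULL

/* Minimal model of the TD state that the two calls act on. */
struct sketch_td {
	bool vcpu_flushed[SKETCH_NR_VCPUS];
	bool keyid_reclaimable;
};

/* Stands in for tdh_vp_flush(): record that one vCPU was flushed. */
static uint64_t sketch_vp_flush(struct sketch_td *td, int vcpu)
{
	td->vcpu_flushed[vcpu] = true;
	return SKETCH_SUCCESS;
}

/*
 * Stands in for tdh_mng_vpflushdone(): succeed and advance the state
 * only if every vCPU has been flushed.
 */
static uint64_t sketch_mng_vpflushdone(struct sketch_td *td)
{
	int i;

	for (i = 0; i < SKETCH_NR_VCPUS; i++) {
		if (!td->vcpu_flushed[i])
			return SKETCH_FLUSH_INCOMPLETE;
	}
	td->keyid_reclaimable = true;
	return SKETCH_SUCCESS;
}

/* Teardown order: flush all vCPUs, then run the TD scoped check. */
static uint64_t sketch_td_teardown_flush(struct sketch_td *td)
{
	uint64_t err;
	int i;

	for (i = 0; i < SKETCH_NR_VCPUS; i++) {
		err = sketch_vp_flush(td, i);
		if (err != SKETCH_SUCCESS)
			return err;
	}
	return sketch_mng_vpflushdone(td);
}
```

Calling the TD scoped check before any flush fails in this model, which is the property the teardown path relies on to know that no vCPU was missed.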
Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Signed-off-by: Rick Edgecombe Reviewed-by: Binbin Wu Reviewed-by: Yuan Yao --- uAPI breakout v2: - Change to use 'u64' as function parameter to prepare to move SEAMCALL wrappers to arch/x86. (Kai) - Split to separate patch - Move SEAMCALL wrappers from KVM to x86 core; - Move TDH_xx macros from KVM to x86 core; - Re-write log uAPI breakout v1: - Make argument to C wrapper function struct kvm_tdx * or struct vcpu_tdx * .(Sean) - Drop unused helpers (Kai) - Fix bisectability issues in headers (Kai) - Updates from seamcall overhaul (Kai) v19: - Update the commit message to match the patch by Yuan - Use seamcall() and seamcall_ret() by paolo v18: - removed stub functions for __seamcall{,_ret}() - Added Reviewed-by Binbin - Make tdx_seamcall() use struct tdx_module_args instead of taking each inputs. --- arch/x86/include/asm/tdx.h | 2 ++ arch/x86/virt/vmx/tdx/tdx.c | 20 ++++++++++++++++++++ arch/x86/virt/vmx/tdx/tdx.h | 2 ++ 3 files changed, 24 insertions(+) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index a70933ec7808..d093dc4350ac 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -129,6 +129,8 @@ u64 tdh_mng_key_config(u64 tdr); u64 tdh_mng_create(u64 tdr, u64 hkid); u64 tdh_vp_create(u64 tdr, u64 tdvpr); u64 tdh_mng_rd(u64 tdr, u64 field, u64 *data); +u64 tdh_vp_flush(u64 tdvpr); +u64 tdh_mng_vpflushdone(u64 tdr); u64 tdh_mng_key_freeid(u64 tdr); u64 tdh_mng_init(u64 tdr, u64 td_params, u64 *rcx); u64 tdh_vp_init(u64 tdvpr, u64 initial_rcx); diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 82820422d698..af121a73de80 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -1638,6 +1638,26 @@ u64 tdh_mng_rd(u64 tdr, u64 field, u64 *data) } EXPORT_SYMBOL_GPL(tdh_mng_rd); =20 +u64 tdh_vp_flush(u64 tdvpr) +{ + struct tdx_module_args args =3D { + .rcx =3D tdvpr, + }; + + 
return seamcall(TDH_VP_FLUSH, &args); +} +EXPORT_SYMBOL_GPL(tdh_vp_flush); + +u64 tdh_mng_vpflushdone(u64 tdr) +{ + struct tdx_module_args args =3D { + .rcx =3D tdr, + }; + + return seamcall(TDH_MNG_VPFLUSHDONE, &args); +} +EXPORT_SYMBOL_GPL(tdh_mng_vpflushdone); + u64 tdh_mng_key_freeid(u64 tdr) { struct tdx_module_args args =3D { diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h index 1915a558c126..a63037036c91 100644 --- a/arch/x86/virt/vmx/tdx/tdx.h +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -23,6 +23,8 @@ #define TDH_MNG_CREATE 9 #define TDH_VP_CREATE 10 #define TDH_MNG_RD 11 +#define TDH_VP_FLUSH 18 +#define TDH_MNG_VPFLUSHDONE 19 #define TDH_MNG_KEY_FREEID 20 #define TDH_MNG_INIT 21 #define TDH_VP_INIT 22 --=20 2.47.0 From nobody Mon Nov 25 03:07:01 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D889D1D12ED; Wed, 30 Oct 2024 19:01:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314869; cv=none; b=JDW3vuddDsW9hfPjqkf1yvRNLRk3SqOXze3cHSGDmRxqm9F560Sf0iOGL97JZHh5US8sIzcz2BEjPNqmwMh/TTBJwDSFcRAamDIJw2Iy3iBldpSvznu8afEn8DiXa4GqOJIs1aNJA7oS7dWElkY35nHAbDfjx1AERN35br/jbzk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314869; c=relaxed/simple; bh=m++ly584oN13pAwPwRJbz1ZBL5ddHyfpuukLAk6SSdY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=kDQ7FxQh6NQpHslC0EEdgkCYQBSp+vMYZJJfuJuRmKQDMCHCviaDc0kHRUkyA5gLQbs8aSrGki8Fc4OgchEw9rlNwrBBjfDoD4LML3eFpheZHBX85Iz0JTX2pBGmzfkcNyblXQnDMUIvYim58Q15zoJ2mf6f32JqbW2BR9nejLU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit 
key) header.d=intel.com header.i=@intel.com header.b=hXvHha86; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="hXvHha86" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1730314867; x=1761850867; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=m++ly584oN13pAwPwRJbz1ZBL5ddHyfpuukLAk6SSdY=; b=hXvHha86cTyfqTmznqSXL/qHsBqY2ZIvZegmIyUumWGg1YUXKlKCmL/y hU3EfQ9wT1pGxa2eZFJv3nLVamYbX/ZBiV/gAk0ByMzxh9vHrBgBCbaY1 zQey/uLXtvwX2f8eEVX+rtWVbB8Nfkh0NbWfUPqIMO9nb4qafnP3ZyQAr Iyau9q/qgLUTc3bKT7hhna6Eo46P//LkpO5qZ1YEnyvNKMi0kERZARXf9 RxwQBUJHiY97mYDZxc94f1MLkPdozb6fLfhf2bcqzFBDSz103GvbP966s y8IqulBnPMsY+csiQ90pzKadGNFoqN5Q6MAByh7ZNIu7fXMk9aCVFl4nh A==; X-CSE-ConnectionGUID: CyDMbqk5RryCLZhUNpJoxw== X-CSE-MsgGUID: NKQzmycIR223nXrxItlTPw== X-IronPort-AV: E=McAfee;i="6700,10204,11241"; a="17678772" X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="17678772" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:01 -0700 X-CSE-ConnectionGUID: 3u2dExyDT1yrk7PU0vcPDg== X-CSE-MsgGUID: 7dH3aA+NRuCOesXmhBVSgA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="82499381" Received: from sramkris-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4..) 
([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:00 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com, Isaku Yamahata Subject: [PATCH v2 11/25] KVM: TDX: Add placeholders for TDX VM/vCPU structures Date: Wed, 30 Oct 2024 12:00:24 -0700 Message-ID: <20241030190039.77971-12-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com> References: <20241030190039.77971-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Add TDX's own VM and vCPU structures as placeholders to manage and run TDX guests. Also add helper functions to check whether a VM/vCPU is a TDX or a normal VMX one, and add helpers to convert between TDX VM/vCPU and KVM VM/vCPU. TDX protects guest VMs from a malicious host. Unlike VMX guests, TDX guests are crypto-protected. KVM cannot access TDX guests' memory and vCPU states directly. Instead, TDX requires KVM to use a set of TDX architecture-defined firmware APIs (a.k.a. TDX module SEAMCALLs) to manage and run TDX guests. In fact, the ways to manage and run TDX guests and normal VMX guests are quite different. Because of that, the current structures ('struct kvm_vmx' and 'struct vcpu_vmx') to manage VMX guests are not quite suitable for TDX guests. E.g., the majority of the members of 'struct vcpu_vmx' don't apply to TDX guests. Introduce TDX's own VM and vCPU structures ('struct kvm_tdx' and 'struct vcpu_tdx' respectively) for KVM to manage and run TDX guests.
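The conversion between the generic and the TDX-specific structure relies on embedding the generic one as a member and recovering the outer structure from a pointer to that member. A minimal userspace sketch of the pattern follows. The sketch_* types, the hkid member, and the macro are hypothetical stand-ins written for this example. The kernel uses its own container_of() and the real 'struct kvm', and the members here are not the ones the series adds.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for the generic VM structure. */
struct sketch_kvm {
	int vm_type;
};

/* TDX-specific VM structure that embeds the generic one. */
struct sketch_kvm_tdx {
	struct sketch_kvm kvm;
	int hkid;	/* hypothetical TDX-only member */
};

/* Recover the outer structure from a pointer to the embedded member. */
static struct sketch_kvm_tdx *sketch_to_kvm_tdx(struct sketch_kvm *kvm)
{
	return sketch_container_of(kvm, struct sketch_kvm_tdx, kvm);
}
```

The conversion is only valid for a pointer that really is embedded in the outer type, which is why the patch pairs it with is_td()-style checks on the VM type.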
Instead of building TDX's VM and vCPU structures based on VMX's, build them directly based on 'struct kvm'. As a result, TDX and VMX guests will have different VM sizes and vCPU sizes/alignments. Currently, kvm_arch_alloc_vm() uses 'kvm_x86_ops::vm_size' to allocate enough space for the VM structure when creating a guest. With TDX guests, ideally, KVM should allocate the VM structure based on the VM type so that the precise size can be allocated for VMX and TDX guests. But this requires more extensive code changes. For now, simply choose the maximum size of 'struct kvm_tdx' and 'struct kvm_vmx' for VM structure allocation for both VMX and TDX guests. This results in a small memory waste for each VM that has the smaller VM structure, but this is acceptable. For simplicity, use the same approach for vCPU allocation too. Otherwise KVM would need to maintain a separate 'kvm_vcpu_cache' for each VM type. Note, updating the 'vt_x86_ops::vm_size' needs to be done before calling kvm_ops_update(), which copies vt_x86_ops to kvm_x86_ops. However, this happens before TDX module initialization. Therefore it is theoretically possible that 'kvm_x86_ops::vm_size' is set to the size of 'struct kvm_tdx' (when it's larger) but TDX actually fails to initialize at a later time. Again, the worst case of this is wasting a couple of bytes of memory for each VM. KVM could choose to update 'kvm_x86_ops::vm_size' at a later time depending on TDX's status, but that would require the base KVM module to export either kvm_x86_ops or kvm_ops_update(). Signed-off-by: Isaku Yamahata Signed-off-by: Rick Edgecombe --- uAPI breakout v2: - Correct typo for update (Tony) uAPI breakout v1: - Re-add __always_inline to to_kvm_tdx(), to_tdx(). (Sean) - Fix bisectability issues in headers (Kai) - Add a comment around updating vt_x86_ops.vm_size.
- Update the comment around updating vcpu_size/align: https://lore.kernel.org/kvm/25d2bf93854ae7410d82119227be3cb2ce47c4f2.cam= el@intel.com/ - Refine changelog: https://lore.kernel.org/kvm/9c592801471a137c51f583065764fbfc3081c016.cam= el@intel.com/ v19: - correctly update ops.vm_size, vcpu_size and, vcpu_align by Xiaoyao v14 -> v15: - use KVM_X86_TDX_VM --- arch/x86/kvm/vmx/main.c | 53 ++++++++++++++++++++++++++++++++++++++--- arch/x86/kvm/vmx/tdx.c | 2 +- arch/x86/kvm/vmx/tdx.h | 49 +++++++++++++++++++++++++++++++++++++ 3 files changed, 100 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 053294939eb1..245f7d1f1bd4 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -8,6 +8,39 @@ #include "posted_intr.h" #include "tdx.h" =20 +static __init int vt_hardware_setup(void) +{ + int ret; + + ret =3D vmx_hardware_setup(); + if (ret) + return ret; + + /* + * Update vt_x86_ops::vm_size here so it is ready before + * kvm_ops_update() is called in kvm_x86_vendor_init(). + * + * Note, the actual bringing up of TDX must be done after + * kvm_ops_update() because enabling TDX requires enabling + * hardware virtualization first, i.e., all online CPUs must + * be in post-VMXON state. This means the @vm_size here + * may be updated to TDX's size but TDX may fail to enable + * at later time. + * + * The VMX/VT code could update kvm_x86_ops::vm_size again + * after bringing up TDX, but this would require exporting + * either kvm_x86_ops or kvm_ops_update() from the base KVM + * module, which looks overkill. Anyway, the worst case here + * is KVM may allocate couple of more bytes than needed for + * each VM. 
+ */ + if (enable_tdx) + vt_x86_ops.vm_size =3D max_t(unsigned int, vt_x86_ops.vm_size, + sizeof(struct kvm_tdx)); + + return 0; +} + #define VMX_REQUIRED_APICV_INHIBITS \ (BIT(APICV_INHIBIT_REASON_DISABLED) | \ BIT(APICV_INHIBIT_REASON_ABSENT) | \ @@ -161,7 +194,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { }; =20 struct kvm_x86_init_ops vt_init_ops __initdata =3D { - .hardware_setup =3D vmx_hardware_setup, + .hardware_setup =3D vt_hardware_setup, .handle_intel_pt_intr =3D NULL, =20 .runtime_ops =3D &vt_x86_ops, @@ -178,6 +211,7 @@ module_exit(vt_exit); =20 static int __init vt_init(void) { + unsigned vcpu_size, vcpu_align; int r; =20 r =3D vmx_init(); @@ -187,12 +221,25 @@ static int __init vt_init(void) /* tdx_init() has been taken */ tdx_bringup(); =20 + /* + * TDX and VMX have different vCPU structures. Calculate the + * maximum size/align so that kvm_init() can use the larger + * values to create the kmem_vcpu_cache. + */ + vcpu_size =3D sizeof(struct vcpu_vmx); + vcpu_align =3D __alignof__(struct vcpu_vmx); + if (enable_tdx) { + vcpu_size =3D max_t(unsigned, vcpu_size, + sizeof(struct vcpu_tdx)); + vcpu_align =3D max_t(unsigned, vcpu_align, + __alignof__(struct vcpu_tdx)); + } + /* * Common KVM initialization _must_ come last, after this, /dev/kvm is * exposed to userspace! 
*/ - r =3D kvm_init(sizeof(struct vcpu_vmx), __alignof__(struct vcpu_vmx), - THIS_MODULE); + r =3D kvm_init(vcpu_size, vcpu_align, THIS_MODULE); if (r) goto err_kvm_init; =20 diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index f95a4dbcaf4a..f2830ff2af1d 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -7,7 +7,7 @@ #undef pr_fmt #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt =20 -static bool enable_tdx __ro_after_init; +bool enable_tdx __ro_after_init; module_param_named(tdx, enable_tdx, bool, 0444); =20 static enum cpuhp_state tdx_cpuhp_state; diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 766a6121f670..e6a232d58e6a 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -4,9 +4,58 @@ #ifdef CONFIG_INTEL_TDX_HOST void tdx_bringup(void); void tdx_cleanup(void); + +extern bool enable_tdx; + +struct kvm_tdx { + struct kvm kvm; + /* TDX specific members follow. */ +}; + +struct vcpu_tdx { + struct kvm_vcpu vcpu; + /* TDX specific members follow. 
*/ +}; + +static inline bool is_td(struct kvm *kvm) +{ + return kvm->arch.vm_type =3D=3D KVM_X86_TDX_VM; +} + +static inline bool is_td_vcpu(struct kvm_vcpu *vcpu) +{ + return is_td(vcpu->kvm); +} + +static __always_inline struct kvm_tdx *to_kvm_tdx(struct kvm *kvm) +{ + return container_of(kvm, struct kvm_tdx, kvm); +} + +static __always_inline struct vcpu_tdx *to_tdx(struct kvm_vcpu *vcpu) +{ + return container_of(vcpu, struct vcpu_tdx, vcpu); +} + #else static inline void tdx_bringup(void) {} static inline void tdx_cleanup(void) {} + +#define enable_tdx 0 + +struct kvm_tdx { + struct kvm kvm; +}; + +struct vcpu_tdx { + struct kvm_vcpu vcpu; +}; + +static inline bool is_td(struct kvm *kvm) { return false; } +static inline bool is_td_vcpu(struct kvm_vcpu *vcpu) { return false; } +static inline struct kvm_tdx *to_kvm_tdx(struct kvm *kvm) { return NULL; } +static inline struct vcpu_tdx *to_tdx(struct kvm_vcpu *vcpu) { return NULL= ; } + #endif =20 #endif --=20 2.47.0 From nobody Mon Nov 25 03:07:01 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 20887219485; Wed, 30 Oct 2024 19:01:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314870; cv=none; b=oL/B201mPIN81HsdxYb0pe169F0ELxuHcuiIzax9A87iZxwkYFI1lRu/p1YvPI2jYxOqv07v+o/frPcZEwmc23tkLUo/O5BJLMCt91QXE2iG5IOHVhWvFYSGkiduFL2dQBDy8heFM+wUTYZ7mzTmF4VB/S4nJ17EyCnIcJ2hK4M= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314870; c=relaxed/simple; bh=ltoCTvaZo6xvLJEibxTPWANAwGnhjBVURGUrfWmgb/Q=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=feI0F6WQxQm9XsGFRc/7tcVpvG6PCQUYSo6KICsoma5V+CJylF7KGsW8pVs9H4NXe3UJ+dllRXBz8TLQ9kk2ycdAIbolQiulGgXvx+jZl8H8EvGlepCv33SliFvkKmBICkWjOGy0WmIuKInfhoTiR5cVx9WL47Isi8k3FCk/Glo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=C0AgSYH/; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="C0AgSYH/" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1730314868; x=1761850868; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ltoCTvaZo6xvLJEibxTPWANAwGnhjBVURGUrfWmgb/Q=; b=C0AgSYH/gw75UDIVm3Q5CSiignhlkbuiKjWYpj3Y8oz+nq7ukGcTNgZb IVSWDGjcG9WPt8Ya7X4fQDbA7VDlg7E4YW1Sdf+XOQMtHRIYV1wG73yUg bWtcUFRsoUBcFVCFMECoAGUa+J9edTOiHItugUocbKQY3PD4kBiU5LUUa wASW8ID499j8LySLkX4zzFDBtN5JPTSo+apdU6Or3yJrhuzz+Z8pINpKc Uhiq5p8d3ovy+vWyKH6sCqrBuC5Pac2Y7X1RvhG4ZUSC6EgbFYdNz4jJK Ho4fgerXtZGHZdTdJP9POPhYk50xqvyGEqm3iUvNNsWLAd8H9seeWP1sv g==; X-CSE-ConnectionGUID: Ro1y/o+TRWS62GQ98mzi+Q== X-CSE-MsgGUID: Yf2C/cI1S5Sr3sajRHpfXg== X-IronPort-AV: E=McAfee;i="6700,10204,11241"; a="17678780" X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="17678780" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:01 -0700 X-CSE-ConnectionGUID: P29XC3duT0G8Znl/ysVwgw== X-CSE-MsgGUID: E244EQLmTCiIOF4pRXz5yw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="82499389" Received: from sramkris-mobl1.amr.corp.intel.com (HELO 
rpedgeco-desk4..) ([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:01 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com, Isaku Yamahata , Sean Christopherson Subject: [PATCH v2 12/25] KVM: TDX: Define TDX architectural definitions Date: Wed, 30 Oct 2024 12:00:25 -0700 Message-ID: <20241030190039.77971-13-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com> References: <20241030190039.77971-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Define architectural definitions for KVM to issue the TDX SEAMCALLs. These are the structures and values that are architecturally defined in the ABI Reference chapter of the TDX module specifications. Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Signed-off-by: Rick Edgecombe Reviewed-by: Paolo Bonzini Reviewed-by: Xiaoyao Li --- uAPI breakout v2: - Use TDX 1.5 naming of config_flags instead of exec_controls (Xiaoyao) uAPI breakout v1: - Remove macros no longer needed due to reading metadata done in TDX host code: - Metadata field ID macros, bit definitions - TDX_MAX_NR_CPUID_CONFIGS - Drop unused defines (Kai) - Fix bisectability issues in headers (Kai) - Remove TDX_MAX_VCPUS define (Kai) - Remove unused TD_EXIT_OTHER_SMI_IS_MSMI define. 
- Move TDX vm type to separate patch - Move unions in tdx_arch.h to where they are introduced (Sean) v19: - drop tdvmcall constants by Xiaoyao v18: - Add metadata field id --- arch/x86/kvm/vmx/tdx.h | 2 + arch/x86/kvm/vmx/tdx_arch.h | 158 ++++++++++++++++++++++++++++++++++++ 2 files changed, 160 insertions(+) create mode 100644 arch/x86/kvm/vmx/tdx_arch.h diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index e6a232d58e6a..1d6fa81a072d 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -1,6 +1,8 @@ #ifndef __KVM_X86_VMX_TDX_H #define __KVM_X86_VMX_TDX_H =20 +#include "tdx_arch.h" + #ifdef CONFIG_INTEL_TDX_HOST void tdx_bringup(void); void tdx_cleanup(void); diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h new file mode 100644 index 000000000000..84af7666e958 --- /dev/null +++ b/arch/x86/kvm/vmx/tdx_arch.h @@ -0,0 +1,158 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* architectural constants/data definitions for TDX SEAMCALLs */ + +#ifndef __KVM_X86_TDX_ARCH_H +#define __KVM_X86_TDX_ARCH_H + +#include + +#define TDX_VERSION_SHIFT 16 + +/* + * TDX SEAMCALL API function leaves + */ +#define TDH_VP_ENTER 0 +#define TDH_MNG_ADDCX 1 +#define TDH_MEM_PAGE_ADD 2 +#define TDH_MEM_SEPT_ADD 3 +#define TDH_VP_ADDCX 4 +#define TDH_MEM_PAGE_AUG 6 +#define TDH_MEM_RANGE_BLOCK 7 +#define TDH_MNG_KEY_CONFIG 8 +#define TDH_MNG_CREATE 9 +#define TDH_VP_CREATE 10 +#define TDH_MNG_RD 11 +#define TDH_MR_EXTEND 16 +#define TDH_MR_FINALIZE 17 +#define TDH_VP_FLUSH 18 +#define TDH_MNG_VPFLUSHDONE 19 +#define TDH_MNG_KEY_FREEID 20 +#define TDH_MNG_INIT 21 +#define TDH_VP_INIT 22 +#define TDH_VP_RD 26 +#define TDH_MNG_KEY_RECLAIMID 27 +#define TDH_PHYMEM_PAGE_RECLAIM 28 +#define TDH_MEM_PAGE_REMOVE 29 +#define TDH_MEM_SEPT_REMOVE 30 +#define TDH_SYS_RD 34 +#define TDH_MEM_TRACK 38 +#define TDH_MEM_RANGE_UNBLOCK 39 +#define TDH_PHYMEM_CACHE_WB 40 +#define TDH_PHYMEM_PAGE_WBINVD 41 +#define TDH_VP_WR 43 + +/* TDX control structure 
(TDR/TDCS/TDVPS) field access codes */ +#define TDX_NON_ARCH BIT_ULL(63) +#define TDX_CLASS_SHIFT 56 +#define TDX_FIELD_MASK GENMASK_ULL(31, 0) + +#define __BUILD_TDX_FIELD(non_arch, class, field) \ + (((non_arch) ? TDX_NON_ARCH : 0) | \ + ((u64)(class) << TDX_CLASS_SHIFT) | \ + ((u64)(field) & TDX_FIELD_MASK)) + +#define BUILD_TDX_FIELD(class, field) \ + __BUILD_TDX_FIELD(false, (class), (field)) + +#define BUILD_TDX_FIELD_NON_ARCH(class, field) \ + __BUILD_TDX_FIELD(true, (class), (field)) + + +/* Class code for TD */ +#define TD_CLASS_EXECUTION_CONTROLS 17ULL + +/* Class code for TDVPS */ +#define TDVPS_CLASS_VMCS 0ULL +#define TDVPS_CLASS_GUEST_GPR 16ULL +#define TDVPS_CLASS_OTHER_GUEST 17ULL +#define TDVPS_CLASS_MANAGEMENT 32ULL + +enum tdx_tdcs_execution_control { + TD_TDCS_EXEC_TSC_OFFSET =3D 10, +}; + +/* @field is any of enum tdx_tdcs_execution_control */ +#define TDCS_EXEC(field) BUILD_TDX_FIELD(TD_CLASS_EXECUTION_CONTROLS, (fi= eld)) + +/* @field is the VMCS field encoding */ +#define TDVPS_VMCS(field) BUILD_TDX_FIELD(TDVPS_CLASS_VMCS, (field)) + +/* @field is any of enum tdx_guest_other_state */ +#define TDVPS_STATE(field) BUILD_TDX_FIELD(TDVPS_CLASS_OTHER_GUEST, (fiel= d)) +#define TDVPS_STATE_NON_ARCH(field) BUILD_TDX_FIELD_NON_ARCH(TDVPS_CLASS_O= THER_GUEST, (field)) + +/* Management class fields */ +enum tdx_vcpu_guest_management { + TD_VCPU_PEND_NMI =3D 11, +}; + +/* @field is any of enum tdx_vcpu_guest_management */ +#define TDVPS_MANAGEMENT(field) BUILD_TDX_FIELD(TDVPS_CLASS_MANAGEMENT, (= field)) + +#define TDX_EXTENDMR_CHUNKSIZE 256 + +struct tdx_cpuid_value { + u32 eax; + u32 ebx; + u32 ecx; + u32 edx; +} __packed; + +#define TDX_TD_ATTR_DEBUG BIT_ULL(0) +#define TDX_TD_ATTR_SEPT_VE_DISABLE BIT_ULL(28) +#define TDX_TD_ATTR_PKS BIT_ULL(30) +#define TDX_TD_ATTR_KL BIT_ULL(31) +#define TDX_TD_ATTR_PERFMON BIT_ULL(63) + +/* + * TD_PARAMS is provided as an input to TDH_MNG_INIT, the size of which is= 1024B. 
+ */ +struct td_params { + u64 attributes; + u64 xfam; + u16 max_vcpus; + u8 reserved0[6]; + + u64 eptp_controls; + u64 config_flags; + u16 tsc_frequency; + u8 reserved1[38]; + + u64 mrconfigid[6]; + u64 mrowner[6]; + u64 mrownerconfig[6]; + u64 reserved2[4]; + + union { + DECLARE_FLEX_ARRAY(struct tdx_cpuid_value, cpuid_values); + u8 reserved3[768]; + }; +} __packed __aligned(1024); + +/* + * Guest uses MAX_PA for GPAW when set. + * 0: GPA.SHARED bit is GPA[47] + * 1: GPA.SHARED bit is GPA[51] + */ +#define TDX_CONFIG_FLAGS_MAX_GPAW BIT_ULL(0) + +/* + * TDH.VP.ENTER, TDG.VP.VMCALL preserves RBP + * 0: RBP can be used for TDG.VP.VMCALL input. RBP is clobbered. + * 1: RBP can't be used for TDG.VP.VMCALL input. RBP is preserved. + */ +#define TDX_CONFIG_FLAGS_NO_RBP_MOD BIT_ULL(2) + + +/* + * TDX requires the frequency to be defined in units of 25MHz, which is the + * frequency of the core crystal clock on TDX-capable platforms, i.e. the = TDX + * module can only program frequencies that are multiples of 25MHz. The + * frequency must be between 100MHz and 10GHz (inclusive). 
+ */ +#define TDX_TSC_KHZ_TO_25MHZ(tsc_in_khz) ((tsc_in_khz) / (25 * 1000)) +#define TDX_TSC_25MHZ_TO_KHZ(tsc_in_25mhz) ((tsc_in_25mhz) * (25 * 1000)) +#define TDX_MIN_TSC_FREQUENCY_KHZ (100 * 1000) +#define TDX_MAX_TSC_FREQUENCY_KHZ (10 * 1000 * 1000) + +#endif /* __KVM_X86_TDX_ARCH_H */ --=20 2.47.0 From nobody Mon Nov 25 03:07:01 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2D4981D0E03; Wed, 30 Oct 2024 19:01:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314870; cv=none; b=jp8p8Hk4X30cjBZ/lz4OL8evXXEQUa+xQ9laZ90T1iiXZQTPxPsgpy+YwMiM/QxAjCi/ZOrvZ7AZZS4SI7ENvIhZAQ0B/Rwj4ylhJYGt+Ue6gwanVDlY7cBc0kkQuCcIaz0YC79qgGV84GcoMskxMAml0mbVufev61MFoOl6Rs8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314870; c=relaxed/simple; bh=qwoM3hXtk8NjnongzOLCIGAUZzhoO9es9XXbY37Guow=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=mrqZcUTl+We9I3rtnHUb3maaeIp37Ru0PThtbXinpLuKtMxlks/3nCguUZVPseRJLE6rLFBVMr/uJeQ7QwcQ+eXrkuvl9/uuYHXJG62or0G31AUkBELq/kP1BXY+VukixhkaRZ75tWrvtl0olt9efhCrOGbRqKjD6hBCElp4QdA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=RkqNeDxC; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="RkqNeDxC" DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1730314868; x=1761850868; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=qwoM3hXtk8NjnongzOLCIGAUZzhoO9es9XXbY37Guow=; b=RkqNeDxCaFg4hRLdYRW68KtldeBjegnus18BsMKhEPlyZK/URlGlIdxn h9vYvGme3aZJ1u9oCgfF4JCLONldsEw4P8jNO4QgQUA1rrr06qtzcQbrQ jrCBg02fbXj3HGmkDNFn8CXWEM2oJOmM2IIsXeQFku3BoHVPw479x7D50 cnFfOSQom3wZb6iFIWhQgaspzpEypcEfkjCcoiCW3lN46wUTkXBE39VGP A+njbJHASP+qoOesNvLIA+uNZYFnLusCOMnSPurORfkBy9X+XrwFfP8XM ZxoPihsg75Y68XysXcf+oW2p7ffOlrOKHIiouc4YLauoR26MPLM1qn8jU g==; X-CSE-ConnectionGUID: vUUTfk5aQJuVmSG+RUZkQA== X-CSE-MsgGUID: Ib3QRcZuQBma+fK/YnIDaw== X-IronPort-AV: E=McAfee;i="6700,10204,11241"; a="17678787" X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="17678787" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:02 -0700 X-CSE-ConnectionGUID: C23x65XRQvitv7PN8sBfQg== X-CSE-MsgGUID: AXSKYaAXSlmS86T1AvXEvQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="82499395" Received: from sramkris-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4..) 
([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:01 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com, Sean Christopherson , Isaku Yamahata , Yuan Yao Subject: [PATCH v2 13/25] KVM: TDX: Add TDX "architectural" error codes Date: Wed, 30 Oct 2024 12:00:26 -0700 Message-ID: <20241030190039.77971-14-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com> References: <20241030190039.77971-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Add error codes for the TDX SEAMCALLs, both for the TDX VMM side (TDH SEAMCALLs) and for the TDX guest side (TDG.VP.VMCALL). KVM issues the TDX SEAMCALLs and checks their error codes. KVM also handles hypercalls from the TDX guest and may return an error, so error codes for the TDX guest are needed as well. TDX SEAMCALLs use bits 31:0 to return more information, so these error codes will only exactly match RAX[63:32]. Error codes for TDG.VP.VMCALL are defined by the TDX Guest-Host-Communication Interface spec. 
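Because the status codes only match RAX[63:32], a caller has to mask the return value before comparing it. The following is a minimal userspace sketch of that comparison, not code from this series: the three constants are copied from tdx_errno.h in this patch, while the sketch_* helpers are hypothetical names, and treating RAX[31:0] as exactly an operand ID is an assumption made for illustration.

```c
#include <stdint.h>

/* Values copied from tdx_errno.h in this patch. */
#define TDX_SEAMCALL_STATUS_MASK 0xFFFFFFFF00000000ULL
#define TDX_OPERAND_BUSY         0x8000020000000000ULL
#define TDX_OPERAND_ID_SEPT      0x92

/* Hypothetical helper: keep only RAX[63:32], the architectural status code. */
static inline uint64_t sketch_tdx_status(uint64_t rax)
{
	return rax & TDX_SEAMCALL_STATUS_MASK;
}

/* Hypothetical helper: the extra detail carried in RAX[31:0]. */
static inline uint32_t sketch_tdx_detail(uint64_t rax)
{
	return (uint32_t)(rax & ~TDX_SEAMCALL_STATUS_MASK);
}

/*
 * Hypothetical check: was the S-EPT operand busy? Assumes the detail bits
 * hold exactly the operand ID, which is a simplification for this sketch.
 */
static inline int sketch_tdx_sept_busy(uint64_t rax)
{
	return sketch_tdx_status(rax) == TDX_OPERAND_BUSY &&
	       sketch_tdx_detail(rax) == TDX_OPERAND_ID_SEPT;
}
```

Comparing the raw RAX value against TDX_OPERAND_BUSY would miss any return value that carries detail bits, which is why the mask is applied first.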
Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Signed-off-by: Rick Edgecombe Reviewed-by: Paolo Bonzini Reviewed-by: Yuan Yao Reviewed-by: Xiaoyao Li --- uAPI breakout v2: - Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin) v19: - Drop TDX_EPT_WALK_FAILED, TDX_EPT_ENTRY_NOT_FREE - Rename TDG_VP_VMCALL_ =3D> TDVMCALL_ to match the existing code - Move TDVMCALL error codes to shared/tdx.h - Added TDX_OPERAND_ID_TDR - Fix bisectability issues in headers (Kai) --- arch/x86/include/asm/shared/tdx.h | 7 +++++- arch/x86/kvm/vmx/tdx.h | 1 + arch/x86/kvm/vmx/tdx_errno.h | 36 +++++++++++++++++++++++++++++++ 3 files changed, 43 insertions(+), 1 deletion(-) create mode 100644 arch/x86/kvm/vmx/tdx_errno.h diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/share= d/tdx.h index fdfd41511b02..620327f0161f 100644 --- a/arch/x86/include/asm/shared/tdx.h +++ b/arch/x86/include/asm/shared/tdx.h @@ -26,7 +26,12 @@ #define TDVMCALL_GET_QUOTE 0x10002 #define TDVMCALL_REPORT_FATAL_ERROR 0x10003 =20 -#define TDVMCALL_STATUS_RETRY 1 +/* + * TDG.VP.VMCALL Status Codes (returned in R10) + */ +#define TDVMCALL_STATUS_SUCCESS 0x0000000000000000ULL +#define TDVMCALL_STATUS_RETRY 0x0000000000000001ULL +#define TDVMCALL_STATUS_INVALID_OPERAND 0x8000000000000000ULL =20 /* * Bitmasks of exposed registers (with VMM). 
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 1d6fa81a072d..faed454385ca 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -2,6 +2,7 @@ #define __KVM_X86_VMX_TDX_H =20 #include "tdx_arch.h" +#include "tdx_errno.h" =20 #ifdef CONFIG_INTEL_TDX_HOST void tdx_bringup(void); diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h new file mode 100644 index 000000000000..dc3fa2a58c2c --- /dev/null +++ b/arch/x86/kvm/vmx/tdx_errno.h @@ -0,0 +1,36 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* architectural status code for SEAMCALL */ + +#ifndef __KVM_X86_TDX_ERRNO_H +#define __KVM_X86_TDX_ERRNO_H + +#define TDX_SEAMCALL_STATUS_MASK 0xFFFFFFFF00000000ULL + +/* + * TDX SEAMCALL Status Codes (returned in RAX) + */ +#define TDX_NON_RECOVERABLE_VCPU 0x4000000100000000ULL +#define TDX_INTERRUPTED_RESUMABLE 0x8000000300000000ULL +#define TDX_OPERAND_INVALID 0xC000010000000000ULL +#define TDX_OPERAND_BUSY 0x8000020000000000ULL +#define TDX_PREVIOUS_TLB_EPOCH_BUSY 0x8000020100000000ULL +#define TDX_PAGE_METADATA_INCORRECT 0xC000030000000000ULL +#define TDX_VCPU_NOT_ASSOCIATED 0x8000070200000000ULL +#define TDX_KEY_GENERATION_FAILED 0x8000080000000000ULL +#define TDX_KEY_STATE_INCORRECT 0xC000081100000000ULL +#define TDX_KEY_CONFIGURED 0x0000081500000000ULL +#define TDX_NO_HKID_READY_TO_WBCACHE 0x0000082100000000ULL +#define TDX_FLUSHVP_NOT_DONE 0x8000082400000000ULL +#define TDX_EPT_WALK_FAILED 0xC0000B0000000000ULL +#define TDX_EPT_ENTRY_STATE_INCORRECT 0xC0000B0D00000000ULL + +/* + * TDX module operand ID, appears in 31:0 part of error code as + * detail information + */ +#define TDX_OPERAND_ID_RCX 0x01 +#define TDX_OPERAND_ID_TDR 0x80 +#define TDX_OPERAND_ID_SEPT 0x92 +#define TDX_OPERAND_ID_TD_EPOCH 0xa9 + +#endif /* __KVM_X86_TDX_ERRNO_H */ --=20 2.47.0 From nobody Mon Nov 25 03:07:01 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 
bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C26A9219CB2; Wed, 30 Oct 2024 19:01:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314872; cv=none; b=Fvdg9CGx7AQfD9S4FI0nr5V8U7K8l1H2lb71ch8lYuagvLGLYrnNTW2CU9H/jsTyFs8psnVWJnkffdYSNIHLrVPSvIk1cLNJcTCvGsdU7sye7aiFMIfLspGZgVA+JqbOimCvYawsgJnXYo70ywTCeDVZt4DI0v4mMDpobq27Yds= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314872; c=relaxed/simple; bh=V3368ffDx785RJyNPxRJG2meb/Ss18eCYKbchJoTgJk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=NedfzlotqbKe2Zswif/9WQetJ0pvRHt2Gv6uJ8PZDVSovsAEf1AYbnS8w9ZvqZIyh2DHGeHX/aXwbL6O+hJhjK2LJgek3gF4iwog4fKfxZwcBJqIr0UlS8dFyjM+s3hQ1PgDOZgss5m3DZtwFrfNtQKkoBuCm3OelPB9CtP6B6g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=F7UuW2Vp; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="F7UuW2Vp" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1730314870; x=1761850870; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=V3368ffDx785RJyNPxRJG2meb/Ss18eCYKbchJoTgJk=; b=F7UuW2VpdoTpzKnyxplm0AjR992ovYxqN7foPFPdQGRIhPYq813kRvIx dwaRbWD+Uz/ebprAtOdHNS2nfvPMHfUz+Ynhdm73i3t2qUQryb3zggBgl 4Yf5cB4NNvxPVsPMFi9P6wHwxc/LjopPondwbfZDMvgDXr/lv1GkE2qVU 
PyFyYb+qdbIVV4NFBq0OT8pO8yLMXR14wvUiQyyGHvGxy+zS3D4gKwoGP cxQ4vmD/DVmy5cuflNxhf7HDK4gajqUgXUvuX+75yaMnHpWbN1MGe0k+B q8LYqWtV4JhW/xHJ7Dnz7MCLcmiQBzikilTXnyyOjCGwdM863ps4SUhSU A==; X-CSE-ConnectionGUID: CLGx0q+sQP6b0zINHONxHw== X-CSE-MsgGUID: ghMi/JvjQQSIMg7rRRlFtg== X-IronPort-AV: E=McAfee;i="6700,10204,11241"; a="17678791" X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="17678791" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:03 -0700 X-CSE-ConnectionGUID: tMYuWgStRVq3CBNULN/6Pg== X-CSE-MsgGUID: CgUoCdcGS5O1a0XLRRDC+Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="82499406" Received: from sramkris-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4..) ([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:02 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com, Isaku Yamahata , Binbin Wu , Yuan Yao Subject: [PATCH v2 14/25] KVM: TDX: Add helper functions to print TDX SEAMCALL error Date: Wed, 30 Oct 2024 12:00:27 -0700 Message-ID: <20241030190039.77971-15-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com> References: <20241030190039.77971-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Add helper functions to print out errors from the TDX module in a uniform manner. 
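The helpers rely on two preprocessor features: stringifying the SEAMCALL name with # and concatenating adjacent string literals into one format string. The following is a minimal userspace sketch of that technique, assuming snprintf() as a stand-in for pr_err_ratelimited(); the sketch_* macro names are hypothetical and are not the macros added by this patch.

```c
#include <stdio.h>

/* Leaf number from tdx_arch.h; defined here to show it is not expanded by '#'. */
#define TDH_MNG_RD 11

/*
 * Hypothetical stand-in for the kernel helper: fn_str must be a string
 * literal so it can be concatenated into the format at compile time.
 */
#define sketch_tdx_error_N(buf, len, fn_str, err, fmt, ...) \
	snprintf((buf), (len), "SEAMCALL " fn_str " failed: 0x%llx, " fmt, \
		 (unsigned long long)(err), __VA_ARGS__)

/* '#fn' turns the macro argument into its spelling, e.g. "TDH_MNG_RD". */
#define sketch_tdx_error_2(buf, len, fn, err, rcx, rdx) \
	sketch_tdx_error_N(buf, len, #fn, err, "rcx 0x%llx, rdx 0x%llx\n", \
			   (unsigned long long)(rcx), (unsigned long long)(rdx))
```

Passing TDH_MNG_RD as fn yields the text "TDH_MNG_RD" rather than "11", because an argument used as the operand of # is not macro-expanded.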
Signed-off-by: Isaku Yamahata Co-developed-by: Tony Lindgren Signed-off-by: Tony Lindgren Signed-off-by: Rick Edgecombe Reviewed-by: Binbin Wu Reviewed-by: Yuan Yao --- uAPI breakout v2: - Stringify the error codes, use ____pr_tdx_error_N() naming (Isaku, Kai) uAPI breakout v1: - Update for the wrapper functions for SEAMCALLs. (Sean) - Reorder header file include to adjust argument change of the C wrapper. - Fix bisectability issues in headers (Kai) - Updates from seamcall overhaul (Kai) v19: - dropped unnecessary include v18: - Added Reviewed-by Binbin. --- arch/x86/kvm/vmx/tdx.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index f2830ff2af1d..60b577379a9a 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -7,6 +7,21 @@ #undef pr_fmt #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt =20 +#define pr_tdx_error(__fn, __err) \ + pr_err_ratelimited("SEAMCALL %s failed: 0x%llx\n", #__fn, __err) + +#define __pr_tdx_error_N(__fn_str, __err, __fmt, ...) 
\ + pr_err_ratelimited("SEAMCALL " __fn_str " failed: 0x%llx, " __fmt, __err= , __VA_ARGS__) + +#define pr_tdx_error_1(__fn, __err, __rcx) \ + __pr_tdx_error_N(#__fn, __err, "rcx 0x%llx\n", __rcx) + +#define pr_tdx_error_2(__fn, __err, __rcx, __rdx) \ + __pr_tdx_error_N(#__fn, __err, "rcx 0x%llx, rdx 0x%llx\n", __rcx, __rdx) + +#define pr_tdx_error_3(__fn, __err, __rcx, __rdx, __r8) \ + __pr_tdx_error_N(#__fn, __err, "rcx 0x%llx, rdx 0x%llx, r8 0x%llx\n", __r= cx, __rdx, __r8) + bool enable_tdx __ro_after_init; module_param_named(tdx, enable_tdx, bool, 0444); =20 --=20 2.47.0 From nobody Mon Nov 25 03:07:01 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C234D21A4DE; Wed, 30 Oct 2024 19:01:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314873; cv=none; b=dzYD96/wDogEhJUIh9UK0dl2Rwdew4ELGh1oXIz+tEH01/buo0Fhi1xiKKHbX11/xv674e8L6Qgcgq0UsdmhWVrIhgD7DILR1PovQHBVofuVf5B1HOLcagt8mJwXyvjCZS5dekGMHJB1MH3XnYqYbVzW3B0td58IYhhB88VLUzA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314873; c=relaxed/simple; bh=mzeac55PChFYlHtX87/enRBB80n5kh3lSvzao4EQ58g=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=CBG3VuiexE4JrT+fDw9uuTG3OMKD5AF6gP+vFIAqiMMb46P3JfWKYXm4eq/Ls2gcbFUrDYjdMtg30UjlV32bIhT/1LdiP5a2OdlHDT8VjWqoCl0Nbt3kYc+MhV8FK1BpDNGBw77yORGIvYlsZo7sbyOFsAJBpVm8LehK2mdopMs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=SzdhNq9p; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass 
(p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="SzdhNq9p" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1730314871; x=1761850871; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=mzeac55PChFYlHtX87/enRBB80n5kh3lSvzao4EQ58g=; b=SzdhNq9pUwCUGutd/3Rt+joDDpX2mXFbEzlVwjYzHQ8lvWEu9J+4i5p6 qfC+t4rnUYQBJCEp81RwPvqUU0o3iZOhLlDnsOPW0k5QAVQund7cODLhK MU+TIqFR/1DBi21u/a1dufXFnrKDL6Kklhgt9V55M4I6i87H14fj0cYvq u2xpJnrILESvNv1WV7WI/TTWcQHOHR3CrGO+TYsGb7lDzvSsqL9wTt/LE R44CfBEAeRhMMy//8C3t806olgoOc+WLmmFIxWR5+yrfLrA/ltoqIxjJ9 dfg2vdCcJhwYl1STMy92Y9jg0whbs4iPw7WRCY0mZSKo9DTxVoNdOmGfk Q==; X-CSE-ConnectionGUID: KFZLrowySwa0EMf2l/RH6Q== X-CSE-MsgGUID: VLugOfobSCuzKuKcqQXXhA== X-IronPort-AV: E=McAfee;i="6700,10204,11241"; a="17678798" X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="17678798" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:03 -0700 X-CSE-ConnectionGUID: gOY4n9EnRNerv0f0EE+GUA== X-CSE-MsgGUID: ewgUmjeJSQOvKlqTd3pgUQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="82499410" Received: from sramkris-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4..) 
([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:03 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com, Isaku Yamahata Subject: [PATCH v2 15/25] KVM: TDX: Add place holder for TDX VM specific mem_enc_op ioctl Date: Wed, 30 Oct 2024 12:00:28 -0700 Message-ID: <20241030190039.77971-16-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com> References: <20241030190039.77971-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata KVM_MEMORY_ENCRYPT_OP was introduced for VM-scoped operations specific to guest state-protected VMs. It defines subcommands for technology-specific operations. Despite its name, the subcommands are not limited to memory encryption; various technology-specific operations are defined under it. It is natural to repurpose KVM_MEMORY_ENCRYPT_OP for TDX-specific operations and define TDX subcommands. Add a placeholder function for the TDX-specific VM-scoped ioctl as mem_enc_op. TDX-specific subcommands will be added to retrieve/pass TDX-specific parameters. Make mem_enc_ioctl non-optional as it's always filled. 
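The placeholder accepts a command structure, rejects a caller-supplied hw_error, and fails every sub-command ID because none are defined yet. The following is a minimal userspace model of that behavior, not the kernel code: the field layout follows struct kvm_tdx_cmd from this patch, the sketch_* names are hypothetical, and the copy_from_user()/copy_to_user() and locking steps are left out.

```c
#include <errno.h>
#include <stdint.h>

/* Mirrors the layout of struct kvm_tdx_cmd from this patch (userspace types). */
struct sketch_kvm_tdx_cmd {
	uint32_t id;       /* sub-command ID */
	uint32_t flags;    /* zero if the sub-command does not use it */
	uint64_t data;     /* immediate or userspace pointer */
	uint64_t hw_error; /* filled by the kernel, must be zero on input */
};

/* Hypothetical model of the placeholder dispatcher: no sub-commands exist yet. */
static int sketch_tdx_vm_ioctl(struct sketch_kvm_tdx_cmd *cmd)
{
	/* hw_error is output-only; reject a caller that pre-sets it. */
	if (cmd->hw_error)
		return -EINVAL;

	switch (cmd->id) {
	default:
		/* Every ID is unknown until real sub-commands are added. */
		return -EINVAL;
	}
}
```

Later patches would presumably add cases to the switch; until then every well-formed request returns -EINVAL.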
Signed-off-by: Isaku Yamahata Co-developed-by: Tony Lindgren Signed-off-by: Tony Lindgren Signed-off-by: Rick Edgecombe --- uAPI breakout v2: - Correct comment to use hw_error naming (Isaku) - Drop KVM_TDX_CAPABILITIES, it's not needed yet (Binbin) uAPI breakout v1: - rename error->hw_error (Kai) - Include "x86_ops.h" in tdx.c as the patch to initialize TDX module doesn't include it anymore. - Introduce tdx_vm_ioctl() as the first tdx func in x86_ops.h - Drop middle paragraph in the commit log (Tony) v15: - change struct kvm_tdx_cmd to drop unused member. --- arch/x86/include/asm/kvm-x86-ops.h | 2 +- arch/x86/include/uapi/asm/kvm.h | 24 ++++++++++++++++++++++ arch/x86/kvm/vmx/main.c | 10 ++++++++++ arch/x86/kvm/vmx/tdx.c | 32 ++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/x86_ops.h | 6 ++++++ arch/x86/kvm/x86.c | 4 ---- 6 files changed, 73 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index 53756a670f41..f250137c837a 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -123,7 +123,7 @@ KVM_X86_OP(leave_smm) KVM_X86_OP(enable_smi_window) #endif KVM_X86_OP_OPTIONAL(dev_get_attr) -KVM_X86_OP_OPTIONAL(mem_enc_ioctl) +KVM_X86_OP(mem_enc_ioctl) KVM_X86_OP_OPTIONAL(mem_enc_register_region) KVM_X86_OP_OPTIONAL(mem_enc_unregister_region) KVM_X86_OP_OPTIONAL(vm_copy_enc_context_from) diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kv= m.h index cba4351b3091..b6cb87f2b477 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -926,4 +926,28 @@ struct kvm_hyperv_eventfd { #define KVM_X86_SNP_VM 4 #define KVM_X86_TDX_VM 5 =20 +/* Trust Domain eXtension sub-ioctl() commands. */ +enum kvm_tdx_cmd_id { + KVM_TDX_CMD_NR_MAX, +}; + +struct kvm_tdx_cmd { + /* enum kvm_tdx_cmd_id */ + __u32 id; + /* flags for sub-command. If sub-command doesn't use this, set zero. */ + __u32 flags; + /* + * data for each sub-command. 
An immediate or a pointer to the actual + * data in process virtual address. If sub-command doesn't use it, + * set zero. + */ + __u64 data; + /* + * Auxiliary error code. The sub-command may return TDX SEAMCALL + * status code in addition to -Exxx. + * Defined for consistency with struct kvm_sev_cmd. + */ + __u64 hw_error; +}; + #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 245f7d1f1bd4..6ed78deea543 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -41,6 +41,14 @@ static __init int vt_hardware_setup(void) return 0; } =20 +static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) +{ + if (!is_td(kvm)) + return -ENOTTY; + + return tdx_vm_ioctl(kvm, argp); +} + #define VMX_REQUIRED_APICV_INHIBITS \ (BIT(APICV_INHIBIT_REASON_DISABLED) | \ BIT(APICV_INHIBIT_REASON_ABSENT) | \ @@ -191,6 +199,8 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .vcpu_deliver_sipi_vector =3D kvm_vcpu_deliver_sipi_vector, =20 .get_untagged_addr =3D vmx_get_untagged_addr, + + .mem_enc_ioctl =3D vt_mem_enc_ioctl, }; =20 struct kvm_x86_init_ops vt_init_ops __initdata =3D { diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 60b577379a9a..76655d82f749 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -2,6 +2,7 @@ #include #include #include "capabilities.h" +#include "x86_ops.h" #include "tdx.h" =20 #undef pr_fmt @@ -29,6 +30,37 @@ static enum cpuhp_state tdx_cpuhp_state; =20 static const struct tdx_sys_info *tdx_sysinfo; =20 +int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) +{ + struct kvm_tdx_cmd tdx_cmd; + int r; + + if (copy_from_user(&tdx_cmd, argp, sizeof(struct kvm_tdx_cmd))) + return -EFAULT; + + /* + * Userspace should never set hw_error. It is used to fill + * hardware-defined error by the kernel. 
+ */ + if (tdx_cmd.hw_error) + return -EINVAL; + + mutex_lock(&kvm->lock); + + switch (tdx_cmd.id) { + default: + r =3D -EINVAL; + goto out; + } + + if (copy_to_user(argp, &tdx_cmd, sizeof(struct kvm_tdx_cmd))) + r =3D -EFAULT; + +out: + mutex_unlock(&kvm->lock); + return r; +} + static int tdx_online_cpu(unsigned int cpu) { unsigned long flags; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index a55981c5216e..42901be70f9d 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -118,4 +118,10 @@ void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu); #endif void vmx_setup_mce(struct kvm_vcpu *vcpu); =20 +#ifdef CONFIG_INTEL_TDX_HOST +int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); +#else +static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { retur= n -EOPNOTSUPP; } +#endif + #endif /* __KVM_X86_VMX_X86_OPS_H */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 9da7c728c391..d86a18a4195b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -7308,10 +7308,6 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned in= t ioctl, unsigned long arg) goto out; } case KVM_MEMORY_ENCRYPT_OP: { - r =3D -ENOTTY; - if (!kvm_x86_ops.mem_enc_ioctl) - goto out; - r =3D kvm_x86_call(mem_enc_ioctl)(kvm, argp); break; } --=20 2.47.0 From nobody Mon Nov 25 03:07:01 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 15F2121A6F7; Wed, 30 Oct 2024 19:01:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314873; cv=none; b=Jxr0xIQdkzR3XHXOvp8K+qT58H0IYP11Jk3StoryDZl3Ifkn089ZxBf7g6h2CLduHx8Wb1/FJkAsmAFeeQt65HgqMCVJ29n9GLbKNp/ciuuNCyENLgoQ1FOWOBBYKNuzMwQwyG2aFHpvJob0xr6jk5xXhNqW1N1gce9O26RfvOs= ARC-Message-Signature: 
X-CSE-ConnectionGUID: kP2ChCDPQBeG5h8dz2bBQw== X-CSE-MsgGUID: 5dloIss8TS+yjaN9BLjLTg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="82499417" Received: from sramkris-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4..) ([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:03 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com, Isaku Yamahata , Binbin Wu Subject: [PATCH v2 16/25] KVM: TDX: Get system-wide info about TDX module on initialization Date: Wed, 30 Oct 2024 12:00:29 -0700 Message-ID: <20241030190039.77971-17-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com> References: <20241030190039.77971-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata TDX KVM needs system-wide information about the TDX module. Generate the data based on tdx_sysinfo td_conf CPUID data. 
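For illustration only, the unpacking of the td_conf CPUID data that this patch performs in td_init_cpuid_entry2() can be sketched in user space as below. This is a minimal sketch under stated assumptions, not the kernel code: struct cpuid_entry, CPUID_NO_SUBLEAF and unpack_cpuid_config() are hypothetical stand-ins for struct kvm_cpuid_entry2, KVM_TDX_CPUID_NO_SUBLEAF and the in-kernel helper, and the 0x80000008 workaround is left out.

```c
#include <stdint.h>

/* Hypothetical user-space stand-in for a subset of struct kvm_cpuid_entry2. */
struct cpuid_entry {
	uint32_t function;
	uint32_t index;
	uint32_t eax, ebx, ecx, edx;
};

/* Assumed to mirror KVM_TDX_CPUID_NO_SUBLEAF, i.e. (u32)-1. */
#define CPUID_NO_SUBLEAF 0xffffffffu

/*
 * Split one packed configuration pair into a CPUID entry:
 *   leaf      = function (low 32 bits) | sub-leaf index (high 32 bits)
 *   values[0] = EAX (low) | EBX (high)
 *   values[1] = ECX (low) | EDX (high)
 */
static void unpack_cpuid_config(uint64_t leaf, const uint64_t values[2],
				struct cpuid_entry *e)
{
	e->function = (uint32_t)leaf;
	e->index = (uint32_t)(leaf >> 32);
	e->eax = (uint32_t)values[0];
	e->ebx = (uint32_t)(values[0] >> 32);
	e->ecx = (uint32_t)values[1];
	e->edx = (uint32_t)(values[1] >> 32);

	/* A leaf without sub-leaves is reported with index 0. */
	if (e->index == CPUID_NO_SUBLEAF)
		e->index = 0;
}
```

A caller would invoke unpack_cpuid_config() once per reported leaf and store the results in an array sized from the entry count, which is the shape the KVM_TDX_CAPABILITIES handler in this patch returns to userspace.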
Signed-off-by: Isaku Yamahata
Co-developed-by: Xiaoyao Li
Signed-off-by: Xiaoyao Li
Co-developed-by: Tony Lindgren
Signed-off-by: Tony Lindgren
Signed-off-by: Rick Edgecombe
Reviewed-by: Binbin Wu
---
uAPI breakout v2:
 - Update stale patch description (Binbin)
 - Add KVM_TDX_CAPABILITIES where it's first used (Binbin)
 - Drop unused KVM_TDX_CPUID_NO_SUBLEAF (Chao)
 - Drop mmu.h, it's only needed in later patches (Binbin)
 - Fold in Xiaoyao's capabilities changes (Tony)
 - Generate data without struct kvm_tdx_caps (Tony)
 - Use struct kvm_cpuid_entry2 as suggested (Binbin)
 - Use helpers for phys_addr_bits (Paolo)
 - Check TDX and KVM capabilities in __tdx_bringup() (Xiaoyao)
 - Change code around cpuid_config_value since struct tdx_cpuid_config_value {} is removed (Kai)

uAPI breakout v1:
 - Mention hardware_unsetup(). (Binbin)
 - Added Reviewed-by. (Binbin)
 - Eliminated tdx_md_read(). (Kai)
 - Include "x86_ops.h" in tdx.c as the patch to initialize TDX module doesn't include it anymore.
 - Introduce tdx_vm_ioctl() as the first tdx func in x86_ops.h

v19:
 - Added features0
 - Use tdx_sys_metadata_read()
 - Fix error recovery path by Yuan

Change v18:
 - Newly Added
---
 arch/x86/include/uapi/asm/kvm.h | 9 +++
 arch/x86/kvm/vmx/tdx.c | 137 ++++++++++++++++++++++++++++++++
 2 files changed, 146 insertions(+)

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kv= m.h index b6cb87f2b477..0630530af334 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -928,6 +928,8 @@ struct kvm_hyperv_eventfd { =20 /* Trust Domain eXtension sub-ioctl() commands.
*/ enum kvm_tdx_cmd_id { + KVM_TDX_CAPABILITIES =3D 0, + KVM_TDX_CMD_NR_MAX, }; =20 @@ -950,4 +952,11 @@ struct kvm_tdx_cmd { __u64 hw_error; }; =20 +struct kvm_tdx_capabilities { + __u64 supported_attrs; + __u64 supported_xfam; + __u64 reserved[254]; + struct kvm_cpuid2 cpuid; +}; + #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 76655d82f749..253debbe685f 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -30,6 +30,134 @@ static enum cpuhp_state tdx_cpuhp_state; =20 static const struct tdx_sys_info *tdx_sysinfo; =20 +#define KVM_SUPPORTED_TD_ATTRS (TDX_TD_ATTR_SEPT_VE_DISABLE) + +static u64 tdx_get_supported_attrs(const struct tdx_sys_info_td_conf *td_c= onf) +{ + u64 val =3D KVM_SUPPORTED_TD_ATTRS; + + if ((val & td_conf->attributes_fixed1) !=3D td_conf->attributes_fixed1) + return 0; + + val &=3D td_conf->attributes_fixed0; + + return val; +} + +static u64 tdx_get_supported_xfam(const struct tdx_sys_info_td_conf *td_co= nf) +{ + u64 val =3D kvm_caps.supported_xcr0 | kvm_caps.supported_xss; + + /* + * PT and CET can be exposed to TD guest regardless of KVM's XSS, PT + * and, CET support. 
+ */ + val |=3D XFEATURE_MASK_PT | XFEATURE_MASK_CET_USER | + XFEATURE_MASK_CET_KERNEL; + + if ((val & td_conf->xfam_fixed1) !=3D td_conf->xfam_fixed1) + return 0; + + val &=3D td_conf->xfam_fixed0; + + return val; +} + +static u32 tdx_set_guest_phys_addr_bits(const u32 eax, int addr_bits) +{ + return (eax & ~GENMASK(23, 16)) | (addr_bits & 0xff) << 16; +} + +#define KVM_TDX_CPUID_NO_SUBLEAF ((__u32)-1) + +static void td_init_cpuid_entry2(struct kvm_cpuid_entry2 *entry, unsigned = char idx) +{ + const struct tdx_sys_info_td_conf *td_conf =3D &tdx_sysinfo->td_conf; + + entry->function =3D (u32)td_conf->cpuid_config_leaves[idx]; + entry->index =3D td_conf->cpuid_config_leaves[idx] >> 32; + entry->eax =3D (u32)td_conf->cpuid_config_values[idx][0]; + entry->ebx =3D td_conf->cpuid_config_values[idx][0] >> 32; + entry->ecx =3D (u32)td_conf->cpuid_config_values[idx][1]; + entry->edx =3D td_conf->cpuid_config_values[idx][1] >> 32; + + if (entry->index =3D=3D KVM_TDX_CPUID_NO_SUBLEAF) + entry->index =3D 0; + + /* Work around missing support on old TDX modules */ + if (entry->function =3D=3D 0x80000008) + entry->eax =3D tdx_set_guest_phys_addr_bits(entry->eax, 0xff); +} + +static int init_kvm_tdx_caps(const struct tdx_sys_info_td_conf *td_conf, + struct kvm_tdx_capabilities *caps) +{ + int i; + + caps->supported_attrs =3D tdx_get_supported_attrs(td_conf); + if (!caps->supported_attrs) + return -EIO; + + caps->supported_xfam =3D tdx_get_supported_xfam(td_conf); + if (!caps->supported_xfam) + return -EIO; + + caps->cpuid.nent =3D td_conf->num_cpuid_config; + + for (i =3D 0; i < td_conf->num_cpuid_config; i++) + td_init_cpuid_entry2(&caps->cpuid.entries[i], i); + + return 0; +} + +static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) +{ + const struct tdx_sys_info_td_conf *td_conf =3D &tdx_sysinfo->td_conf; + struct kvm_tdx_capabilities __user *user_caps; + struct kvm_tdx_capabilities *caps =3D NULL; + int ret =3D 0; + + /* flags is reserved for future use */ + if 
(cmd->flags) + return -EINVAL; + + caps =3D kmalloc(sizeof(*caps) + + sizeof(struct kvm_cpuid_entry2) * td_conf->num_cpuid_config, + GFP_KERNEL); + if (!caps) + return -ENOMEM; + + user_caps =3D u64_to_user_ptr(cmd->data); + if (copy_from_user(caps, user_caps, sizeof(*caps))) { + ret =3D -EFAULT; + goto out; + } + + if (caps->cpuid.nent < td_conf->num_cpuid_config) { + ret =3D -E2BIG; + goto out; + } + + ret =3D init_kvm_tdx_caps(td_conf, caps); + if (ret) + goto out; + + if (copy_to_user(user_caps, caps, sizeof(*caps))) { + ret =3D -EFAULT; + goto out; + } + + if (copy_to_user(user_caps->cpuid.entries, caps->cpuid.entries, + caps->cpuid.nent * + sizeof(caps->cpuid.entries[0]))) + ret =3D -EFAULT; + +out: + /* kfree() accepts NULL. */ + kfree(caps); + return ret; +} + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { struct kvm_tdx_cmd tdx_cmd; @@ -48,6 +176,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) mutex_lock(&kvm->lock); =20 switch (tdx_cmd.id) { + case KVM_TDX_CAPABILITIES: + r =3D tdx_get_capabilities(&tdx_cmd); + break; default: r =3D -EINVAL; goto out; @@ -147,11 +278,17 @@ static int __init __tdx_bringup(void) goto get_sysinfo_err; } =20 + /* Check TDX module and KVM capabilities */ + if (!tdx_get_supported_attrs(&tdx_sysinfo->td_conf) || + !tdx_get_supported_xfam(&tdx_sysinfo->td_conf)) + goto get_sysinfo_err; + /* * Leave hardware virtualization enabled after TDX is enabled * successfully. TDX CPU hotplug depends on this. 
*/ return 0; + get_sysinfo_err: __do_tdx_cleanup(); tdx_bringup_err: --=20 2.47.0 From nobody Mon Nov 25 03:07:01 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 391E421B433; Wed, 30 Oct 2024 19:01:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314876; cv=none; b=TgUSzmcyE3Ps+X7A3coa916SUDJ9psxGMC6mleijiwka7kcdNgl5+10DcvZ6jTQkg+9fjvvzE0mCAmIfRSWrl30FyknS6E+bQ/o4rVlgSR/l8vZbrLgNq0BLnculxlZd/pgktLPb3+UiLPJLfEvqBgRFCuUu33C9BN3XbD57q1Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314876; c=relaxed/simple; bh=GKgUieIxvK4U5APsHBW7qMuw780AqPBpGdFDmp9XDHw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=fBrJ8yFVb2vEs8Hjv6y/yO6cXsyXzvdjxCRnS+z3UaEybjHutWuMmd36VS1p9/IoImE+1ATa+AbuAxbopk3ZhcTkvEO+2OuIByPInQtdPNPwYya9Jil/Nco2WjzJGcoKr18wcaGSiZF9JsoFe1L9vvr8rtc1h18Hk3UJ6Tkp1tc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=aS2CJfZY; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="aS2CJfZY" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1730314872; x=1761850872; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; 
([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:04 -0700
From: Rick Edgecombe
To: pbonzini@redhat.com, seanjc@google.com
Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com, Isaku Yamahata, Sean Christopherson
Subject: [PATCH v2 17/25] KVM: TDX: create/destroy VM structure
Date: Wed, 30 Oct 2024 12:00:30 -0700
Message-ID: <20241030190039.77971-18-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com>
References: <20241030190039.77971-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Isaku Yamahata

Implement management of the TDX private KeyID, along with the create, destroy, and free paths for a TDX guest.

When creating a TDX guest, assign it a TDX private KeyID for memory encryption, and allocate pages for the guest. These are used for the Trust Domain Root (TDR) and Trust Domain Control Structure (TDCS).

On destruction, free the allocated pages and the KeyID.

Before tearing down the private page tables, TDX requires the guest TD to be destroyed by reclaiming the KeyID. Do it in the vm_destroy() kvm_x86_ops hook. Add a call to vm_free() at the end of kvm_arch_destroy_vm() because the per-VM TDR needs to be freed after the KeyID.
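The ordering constraint described above (release the KeyID first, free the control pages only afterwards) can be illustrated with a small user-space model. This is a sketch under stated assumptions rather than the kernel implementation: struct td_model and the td_model_*() functions are hypothetical names, malloc() stands in for page allocation, and no SEAMCALLs or cache flushes are modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096

/* Hypothetical model of the per-TD state tracked by the patch. */
struct td_model {
	int hkid;	/* > 0 while a private KeyID is assigned, -1 otherwise */
	void *tdr;	/* stands in for the TDR page */
	void **tdcs;	/* stands in for the array of TDCS pages */
	int nr_tdcs;
};

static int next_hkid = 1;	/* toy KeyID allocator */

/* Models the vm_destroy() step: give the KeyID back first. */
static void td_model_release_hkid(struct td_model *td)
{
	if (td->hkid > 0)
		td->hkid = -1;
}

/*
 * Models the vm_free() step: control pages may only be reclaimed once the
 * KeyID is gone. If it is still assigned, leak the pages and report failure.
 */
static bool td_model_free(struct td_model *td)
{
	int i;

	if (td->hkid > 0)
		return false;

	if (td->tdcs) {
		for (i = 0; i < td->nr_tdcs; i++)
			free(td->tdcs[i]);
		free(td->tdcs);
		td->tdcs = NULL;
	}

	/* The TDR goes last, after the TDCS pages that depend on it. */
	free(td->tdr);
	td->tdr = NULL;
	return true;
}

/* Models creation: assign a KeyID, then allocate the TDR and TDCS pages. */
static int td_model_create(struct td_model *td, int nr_tdcs)
{
	int i;

	td->hkid = next_hkid++;
	td->nr_tdcs = nr_tdcs;
	td->tdr = malloc(MODEL_PAGE_SIZE);
	td->tdcs = calloc((size_t)nr_tdcs, sizeof(*td->tdcs));
	if (!td->tdr || !td->tdcs)
		goto fail;

	for (i = 0; i < nr_tdcs; i++) {
		td->tdcs[i] = malloc(MODEL_PAGE_SIZE);
		if (!td->tdcs[i])
			goto fail;
	}
	return 0;

fail:
	/* Reuse the full teardown, which tolerates partial initialization. */
	td_model_release_hkid(td);
	td_model_free(td);
	return -1;
}
```

Calling td_model_free() before td_model_release_hkid() returns false and leaves the pages in place. This loosely mirrors the early return in tdx_reclaim_td_control_pages() when the HKID is still assigned.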
Co-developed-by: Tony Lindgren
Signed-off-by: Tony Lindgren
Signed-off-by: Isaku Yamahata
Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Co-developed-by: Kai Huang
Signed-off-by: Kai Huang
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
---
Comment: this SOB/developed-by chain is getting a bit long.

uAPI breakout v2:
 - Remove write_lock(&kvm->mmu_lock), as HKID release was moved and vCPUs should be destroyed (Isaku)
 - Drop include for tdx_ops.h (Xu)
 - Drop unnecessary clearing of tdr_pa in __tdx_td_init() (Yuan)
 - Use is_td_created() in tdx_vm_free() (Nikolay)
 - Precalculate nr_tdcs_pages (Nikolay)
 - Fix up SEAMCALL call sites due to function parameter changes to SEAMCALL wrappers (Kai)
 - Add TD state handling (Tony)

uAPI breakout v1:
 - Fix unnecessary include re-ordering (Chao)
 - Fix the unpaired curly brackets (Chao)
 - Drop the tdx_mng_key_config_lock (Chao)
 - Drop unnecessary is_hkid_assigned() check (Chao)
 - Use KVM_GENERIC_PRIVATE_MEM and undo the removal of EXPERT (Binbin)
 - Drop the word typically from comments (Binbin)
 - Clarify comments for the need of global tdx_lock mutex (Kai)
 - Add function comments for tdx_clear_page() (Kai)
 - Clarify comments for tdx_clear_page() poisoned page (Kai)
 - Move and update comments for limitations of __tdx_reclaim_page() (Kai)
 - Drop comment related to "rare to contend" (Kai)
 - Drop comment related to TDR and target page (Tony)
 - Make code easier to read with line breaks between paragraphs (Kai)
 - Use cond_resched() retry (Kai)
 - Use for loop for retries (Tony)
 - Use switch to handle errors (Tony)
 - Drop loop for tdh_mng_key_config() (Tony)
 - Rename tdx_reclaim_control_page() td_page_pa to ctrl_page_pa (Kai)
 - Reorganize comments for tdx_reclaim_control_page() (Kai)
 - Use smp_func_do_phymem_cache_wb() naming to indicate SMP (Kai)
 - Use bool resume in smp_func_do_phymem_cache_wb() (Kai)
 - Add comment on retrying to
smp_func_do_phymem_cache_wb() (Kai) - Move code change to tdx_module_setup() to __tdx_bringup() due to initializing is done in post hardware_setup() now and tdx_module_setup() is removed. Remove the code to use API to read global metadata but use exported 'struct tdx_sysinfo' pointer. - Replace 'tdx_info->nr_tdcs_pages' with a wrapper tdx_sysinfo_nr_tdcs_pages() because the 'struct tdx_sysinfo' doesn't have nr_tdcs_pages directly. - Replace tdx_info->max_vcpus_per_td with the new exported pointer in tdx_vm_init(). - Add comment to tdx_mmu_release_hkid() on KeyID allocated (Kai) - Update comments for tdx_mmu_release_hkid() for locking (Kai) - Clarify tdx_mmu_release_hkid() comments for freeing HKID (Kai) - Use KVM_BUG_ON() for SEAMCALLs in tdx_mmu_release_hkid() (Kai) - Use continue for loop in tdx_vm_free() (Kai) - Clarify comments in tdx_vm_free() for reclaiming TDCS (Kai) - Use KVM_BUG_ON() for tdx_vm_free() - Prettify format with line breaks in tdx_vm_free() (Tony) - Prettify formatting for __tdx_td_init() with line breaks (Kai) - Simplify comments for __tdx_td_init() locking (Kai) - Update patch description (Kai) --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/Kconfig | 2 + arch/x86/kvm/vmx/main.c | 28 +- arch/x86/kvm/vmx/tdx.c | 458 +++++++++++++++++++++++++++++ arch/x86/kvm/vmx/tdx.h | 7 +- arch/x86/kvm/vmx/x86_ops.h | 6 + arch/x86/kvm/x86.c | 1 + 8 files changed, 501 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index f250137c837a..e7bd7867cb94 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -21,6 +21,7 @@ KVM_X86_OP(has_emulated_msr) KVM_X86_OP(vcpu_after_set_cpuid) KVM_X86_OP(vm_init) KVM_X86_OP_OPTIONAL(vm_destroy) +KVM_X86_OP_OPTIONAL(vm_free) KVM_X86_OP_OPTIONAL_RET0(vcpu_precreate) KVM_X86_OP(vcpu_create) KVM_X86_OP(vcpu_free) diff --git a/arch/x86/include/asm/kvm_host.h 
b/arch/x86/include/asm/kvm_hos= t.h index 85ed576660ee..d8478e103f07 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1647,6 +1647,7 @@ struct kvm_x86_ops { unsigned int vm_size; int (*vm_init)(struct kvm *kvm); void (*vm_destroy)(struct kvm *kvm); + void (*vm_free)(struct kvm *kvm); =20 /* Create, but do not attach this VCPU */ int (*vcpu_precreate)(struct kvm *kvm); diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index f09f13c01c6b..8d1c3f75028d 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -92,6 +92,8 @@ config KVM_SW_PROTECTED_VM config KVM_INTEL tristate "KVM for Intel (and compatible) processors support" depends on KVM && IA32_FEAT_CTL + select KVM_GENERIC_PRIVATE_MEM if INTEL_TDX_HOST + select KVM_GENERIC_MEMORY_ATTRIBUTES if INTEL_TDX_HOST help Provides support for KVM on processors equipped with Intel's VT extensions, a.k.a. Virtual Machine Extensions (VMX). diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 6ed78deea543..ed4afa45b16b 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -41,6 +41,28 @@ static __init int vt_hardware_setup(void) return 0; } =20 +static int vt_vm_init(struct kvm *kvm) +{ + if (is_td(kvm)) + return tdx_vm_init(kvm); + + return vmx_vm_init(kvm); +} + +static void vt_vm_destroy(struct kvm *kvm) +{ + if (is_td(kvm)) + return tdx_mmu_release_hkid(kvm); + + vmx_vm_destroy(kvm); +} + +static void vt_vm_free(struct kvm *kvm) +{ + if (is_td(kvm)) + tdx_vm_free(kvm); +} + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { if (!is_td(kvm)) @@ -72,8 +94,10 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .has_emulated_msr =3D vmx_has_emulated_msr, =20 .vm_size =3D sizeof(struct kvm_vmx), - .vm_init =3D vmx_vm_init, - .vm_destroy =3D vmx_vm_destroy, + + .vm_init =3D vt_vm_init, + .vm_destroy =3D vt_vm_destroy, + .vm_free =3D vt_vm_free, =20 .vcpu_precreate =3D vmx_vcpu_precreate, .vcpu_create =3D vmx_vcpu_create, diff --git 
a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 253debbe685f..50217f601061 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -110,6 +110,285 @@ static int init_kvm_tdx_caps(const struct tdx_sys_inf= o_td_conf *td_conf, return 0; } =20 +/* + * Some SEAMCALLs acquire the TDX module global lock, and can fail with + * TDX_OPERAND_BUSY. Use a global mutex to serialize these SEAMCALLs. + */ +static DEFINE_MUTEX(tdx_lock); + +/* Maximum number of retries to attempt for SEAMCALLs. */ +#define TDX_SEAMCALL_RETRIES 10000 + +static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid) +{ + return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits); +} + +static inline void tdx_hkid_free(struct kvm_tdx *kvm_tdx) +{ + tdx_guest_keyid_free(kvm_tdx->hkid); + kvm_tdx->hkid =3D -1; +} + +static inline bool is_hkid_assigned(struct kvm_tdx *kvm_tdx) +{ + return kvm_tdx->hkid > 0; +} + +static void tdx_clear_page(unsigned long page_pa) +{ + const void *zero_page =3D (const void *) __va(page_to_phys(ZERO_PAGE(0))); + void *page =3D __va(page_pa); + unsigned long i; + + /* + * The page could have been poisoned. MOVDIR64B also clears + * the poison bit so the kernel can safely use the page again. + */ + for (i =3D 0; i < PAGE_SIZE; i +=3D 64) + movdir64b(page + i, zero_page); + /* + * MOVDIR64B store uses WC buffer. Prevent following memory reads + * from seeing potentially poisoned cache. + */ + __mb(); +} + +/* TDH.PHYMEM.PAGE.RECLAIM is allowed only when destroying the TD. */ +static int __tdx_reclaim_page(hpa_t pa) +{ + u64 err, rcx, rdx, r8; + int i; + + for (i =3D TDX_SEAMCALL_RETRIES; i > 0; i--) { + err =3D tdh_phymem_page_reclaim(pa, &rcx, &rdx, &r8); + + /* + * TDH.PHYMEM.PAGE.RECLAIM is allowed only when the TD is in + * shutdown state, i.e. when destructing the TD. + * TDH.PHYMEM.PAGE.RECLAIM requires TDR and target page. + * Because we're destructing TD, it's rare to contend with TDR.
+ */ + switch (err) { + case TDX_OPERAND_BUSY | TDX_OPERAND_ID_RCX: + case TDX_OPERAND_BUSY | TDX_OPERAND_ID_TDR: + cond_resched(); + continue; + default: + goto out; + } + } + +out: + if (WARN_ON_ONCE(err)) { + pr_tdx_error_3(TDH_PHYMEM_PAGE_RECLAIM, err, rcx, rdx, r8); + return -EIO; + } + return 0; +} + +static int tdx_reclaim_page(hpa_t pa) +{ + int r; + + r =3D __tdx_reclaim_page(pa); + if (!r) + tdx_clear_page(pa); + return r; +} + + +/* + * Reclaim the TD control page(s) which are crypto-protected by TDX guest's + * private KeyID. Assume the cache associated with the TDX private KeyID = has + * been flushed. + */ +static void tdx_reclaim_control_page(unsigned long ctrl_page_pa) +{ + /* + * Leak the page if the kernel failed to reclaim the page. + * The kernel cannot use it safely anymore. + */ + if (tdx_reclaim_page(ctrl_page_pa)) + return; + + free_page((unsigned long)__va(ctrl_page_pa)); +} + +static void smp_func_do_phymem_cache_wb(void *unused) +{ + u64 err =3D 0; + bool resume; + int i; + + /* + * TDH.PHYMEM.CACHE.WB flushes caches associated with any TDX private + * KeyID on the package or core. The TDX module may not finish the + * cache flush but return TDX_INTERRUPTED_RESUMEABLE instead. The + * kernel should retry it until it returns success w/o rescheduling. 
+ */ + for (i =3D TDX_SEAMCALL_RETRIES; i > 0; i--) { + resume =3D !!err; + err =3D tdh_phymem_cache_wb(resume); + switch (err) { + case TDX_INTERRUPTED_RESUMABLE: + continue; + case TDX_NO_HKID_READY_TO_WBCACHE: + err =3D TDX_SUCCESS; /* Already done by other thread */ + fallthrough; + default: + goto out; + } + } + +out: + if (WARN_ON_ONCE(err)) + pr_tdx_error(TDH_PHYMEM_CACHE_WB, err); +} + +void tdx_mmu_release_hkid(struct kvm *kvm) +{ + bool packages_allocated, targets_allocated; + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + cpumask_var_t packages, targets; + u64 err; + int i; + + if (!is_hkid_assigned(kvm_tdx)) + return; + + /* KeyID has been allocated but guest is not yet configured */ + if (!kvm_tdx->tdr_pa) { + tdx_hkid_free(kvm_tdx); + return; + } + + packages_allocated =3D zalloc_cpumask_var(&packages, GFP_KERNEL); + targets_allocated =3D zalloc_cpumask_var(&targets, GFP_KERNEL); + cpus_read_lock(); + + /* + * TDH.PHYMEM.CACHE.WB tries to acquire the TDX module global lock + * and can fail with TDX_OPERAND_BUSY when it fails to get the lock. + * Multiple TDX guests can be destroyed simultaneously. Take the + * mutex to prevent it from getting error. + */ + mutex_lock(&tdx_lock); + + /* + * Releasing HKID is in vm_destroy(). + * After the above flushing vps, there should be no more vCPU + * associations, as all vCPU fds have been released at this stage. + */ + for_each_online_cpu(i) { + if (packages_allocated && + cpumask_test_and_set_cpu(topology_physical_package_id(i), + packages)) + continue; + if (targets_allocated) + cpumask_set_cpu(i, targets); + } + if (targets_allocated) + on_each_cpu_mask(targets, smp_func_do_phymem_cache_wb, NULL, true); + else + on_each_cpu(smp_func_do_phymem_cache_wb, NULL, true); + /* + * In the case of error in smp_func_do_phymem_cache_wb(), the following + * tdh_mng_key_freeid() will fail. 
+ */ + err =3D tdh_mng_key_freeid(kvm_tdx->tdr_pa); + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error(TDH_MNG_KEY_FREEID, err); + pr_err("tdh_mng_key_freeid() failed. HKID %d is leaked.\n", + kvm_tdx->hkid); + } else { + tdx_hkid_free(kvm_tdx); + } + + mutex_unlock(&tdx_lock); + cpus_read_unlock(); + free_cpumask_var(targets); + free_cpumask_var(packages); +} + +static void tdx_reclaim_td_control_pages(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + u64 err; + int i; + + /* + * tdx_mmu_release_hkid() failed to reclaim HKID. Something went wrong + * heavily with TDX module. Give up freeing TD pages. As the function + * already warned, don't warn it again. + */ + if (is_hkid_assigned(kvm_tdx)) + return; + + if (kvm_tdx->tdcs_pa) { + for (i =3D 0; i < kvm_tdx->nr_tdcs_pages; i++) { + if (!kvm_tdx->tdcs_pa[i]) + continue; + + tdx_reclaim_control_page(kvm_tdx->tdcs_pa[i]); + } + kfree(kvm_tdx->tdcs_pa); + kvm_tdx->tdcs_pa =3D NULL; + } + + if (!kvm_tdx->tdr_pa) + return; + + if (__tdx_reclaim_page(kvm_tdx->tdr_pa)) + return; + + /* + * Use a SEAMCALL to ask the TDX module to flush the cache based on the + * KeyID. TDX module may access TDR while operating on TD (Especially + * when it is reclaiming TDCS). 
+ */ + err =3D tdh_phymem_page_wbinvd_tdr(kvm_tdx->tdr_pa); + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err); + return; + } + tdx_clear_page(kvm_tdx->tdr_pa); + + free_page((unsigned long)__va(kvm_tdx->tdr_pa)); + kvm_tdx->tdr_pa =3D 0; +} + +void tdx_vm_free(struct kvm *kvm) +{ + tdx_reclaim_td_control_pages(kvm); +} + +static int tdx_do_tdh_mng_key_config(void *param) +{ + struct kvm_tdx *kvm_tdx =3D param; + u64 err; + + /* TDX_RND_NO_ENTROPY related retries are handled by sc_retry() */ + err =3D tdh_mng_key_config(kvm_tdx->tdr_pa); + + if (KVM_BUG_ON(err, &kvm_tdx->kvm)) { + pr_tdx_error(TDH_MNG_KEY_CONFIG, err); + return -EIO; + } + + return 0; +} + +static int __tdx_td_init(struct kvm *kvm); + +int tdx_vm_init(struct kvm *kvm) +{ + kvm->arch.has_private_mem =3D true; + + /* Place holder for TDX specific logic. */ + return __tdx_td_init(kvm); +} + static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) { const struct tdx_sys_info_td_conf *td_conf =3D &tdx_sysinfo->td_conf; @@ -158,6 +437,180 @@ static int tdx_get_capabilities(struct kvm_tdx_cmd *c= md) return ret; } =20 +static int __tdx_td_init(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + cpumask_var_t packages; + unsigned long *tdcs_pa =3D NULL; + unsigned long tdr_pa =3D 0; + unsigned long va; + int ret, i; + u64 err; + + ret =3D tdx_guest_keyid_alloc(); + if (ret < 0) + return ret; + kvm_tdx->hkid =3D ret; + + va =3D __get_free_page(GFP_KERNEL_ACCOUNT); + if (!va) + goto free_hkid; + tdr_pa =3D __pa(va); + + kvm_tdx->nr_tdcs_pages =3D tdx_sysinfo->td_ctrl.tdcs_base_size / PAGE_SIZ= E; + tdcs_pa =3D kcalloc(kvm_tdx->nr_tdcs_pages, sizeof(*kvm_tdx->tdcs_pa), + GFP_KERNEL_ACCOUNT | __GFP_ZERO); + if (!tdcs_pa) + goto free_tdr; + + for (i =3D 0; i < kvm_tdx->nr_tdcs_pages; i++) { + va =3D __get_free_page(GFP_KERNEL_ACCOUNT); + if (!va) + goto free_tdcs; + tdcs_pa[i] =3D __pa(va); + } + + if (!zalloc_cpumask_var(&packages, GFP_KERNEL)) { + ret =3D -ENOMEM; + 
goto free_tdcs; + } + + cpus_read_lock(); + + /* + * Need at least one CPU of the package to be online in order to + * program all packages for host key id. Check it. + */ + for_each_present_cpu(i) + cpumask_set_cpu(topology_physical_package_id(i), packages); + for_each_online_cpu(i) + cpumask_clear_cpu(topology_physical_package_id(i), packages); + if (!cpumask_empty(packages)) { + ret =3D -EIO; + /* + * Because it's hard for human operator to figure out the + * reason, warn it. + */ +#define MSG_ALLPKG "All packages need to have online CPU to create TD. Onl= ine CPU and retry.\n" + pr_warn_ratelimited(MSG_ALLPKG); + goto free_packages; + } + + /* + * TDH.MNG.CREATE tries to grab the global TDX module and fails + * with TDX_OPERAND_BUSY when it fails to grab. Take the global + * lock to prevent it from failure. + */ + mutex_lock(&tdx_lock); + kvm_tdx->tdr_pa =3D tdr_pa; + err =3D tdh_mng_create(kvm_tdx->tdr_pa, kvm_tdx->hkid); + mutex_unlock(&tdx_lock); + + if (err =3D=3D TDX_RND_NO_ENTROPY) { + ret =3D -EAGAIN; + goto free_packages; + } + + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MNG_CREATE, err); + ret =3D -EIO; + goto free_packages; + } + + for_each_online_cpu(i) { + int pkg =3D topology_physical_package_id(i); + + if (cpumask_test_and_set_cpu(pkg, packages)) + continue; + + /* + * Program the memory controller in the package with an + * encryption key associated to a TDX private host key id + * assigned to this TDR. Concurrent operations on same memory + * controller results in TDX_OPERAND_BUSY. No locking needed + * beyond the cpus_read_lock() above as it serializes against + * hotplug and the first online CPU of the package is always + * used. We never have two CPUs in the same socket trying to + * program the key. 
+ */ + ret =3D smp_call_on_cpu(i, tdx_do_tdh_mng_key_config, + kvm_tdx, true); + if (ret) + break; + } + cpus_read_unlock(); + free_cpumask_var(packages); + if (ret) { + i =3D 0; + goto teardown; + } + + kvm_tdx->tdcs_pa =3D tdcs_pa; + for (i =3D 0; i < kvm_tdx->nr_tdcs_pages; i++) { + err =3D tdh_mng_addcx(kvm_tdx->tdr_pa, tdcs_pa[i]); + if (err =3D=3D TDX_RND_NO_ENTROPY) { + /* Here it's hard to allow userspace to retry. */ + ret =3D -EBUSY; + goto teardown; + } + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_MNG_ADDCX, err); + ret =3D -EIO; + goto teardown; + } + } + + /* + * Note, TDH_MNG_INIT cannot be invoked here. TDH_MNG_INIT requires a de= dicated + * ioctl() to configure the CPUID values for the TD. + */ + return 0; + + /* + * The sequence for freeing resources from a partially initialized TD + * varies based on where in the initialization flow failure occurred. + * Simply use the full teardown and destroy, which naturally play nice + * with partial initialization. + */ +teardown: + /* Only free pages not yet added, so start at 'i' */ + for (; i < kvm_tdx->nr_tdcs_pages; i++) { + if (tdcs_pa[i]) { + free_page((unsigned long)__va(tdcs_pa[i])); + tdcs_pa[i] =3D 0; + } + } + if (!kvm_tdx->tdcs_pa) + kfree(tdcs_pa); + + tdx_mmu_release_hkid(kvm); + tdx_reclaim_td_control_pages(kvm); + + return ret; + +free_packages: + cpus_read_unlock(); + free_cpumask_var(packages); + +free_tdcs: + for (i =3D 0; i < kvm_tdx->nr_tdcs_pages; i++) { + if (tdcs_pa[i]) + free_page((unsigned long)__va(tdcs_pa[i])); + } + kfree(tdcs_pa); + kvm_tdx->tdcs_pa =3D NULL; + +free_tdr: + if (tdr_pa) + free_page((unsigned long)__va(tdr_pa)); + kvm_tdx->tdr_pa =3D 0; + +free_hkid: + tdx_hkid_free(kvm_tdx); + + return ret; +} + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { struct kvm_tdx_cmd tdx_cmd; @@ -251,6 +704,11 @@ static int __init __tdx_bringup(void) { int r; =20 + if (!cpu_feature_enabled(X86_FEATURE_MOVDIR64B)) { + pr_warn("MOVDIR64B is required for TDX\n"); + return
-EOPNOTSUPP; + } + if (!enable_ept) { pr_err("Cannot enable TDX with EPT disabled.\n"); return -EINVAL; diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index faed454385ca..e557a82bc882 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -12,7 +12,12 @@ extern bool enable_tdx; =20 struct kvm_tdx { struct kvm kvm; - /* TDX specific members follow. */ + + unsigned long tdr_pa; + unsigned long *tdcs_pa; + + int hkid; + u8 nr_tdcs_pages; }; =20 struct vcpu_tdx { diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 42901be70f9d..e7d5afce68f0 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -119,8 +119,14 @@ void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu); void vmx_setup_mce(struct kvm_vcpu *vcpu); =20 #ifdef CONFIG_INTEL_TDX_HOST +int tdx_vm_init(struct kvm *kvm); +void tdx_mmu_release_hkid(struct kvm *kvm); +void tdx_vm_free(struct kvm *kvm); int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); #else +static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; } +static inline void tdx_mmu_release_hkid(struct kvm *kvm) {} +static inline void tdx_vm_free(struct kvm *kvm) {} static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { retur= n -EOPNOTSUPP; } #endif =20 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index d86a18a4195b..8a103c29dcd0 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -12883,6 +12883,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm) kvm_page_track_cleanup(kvm); kvm_xen_destroy_vm(kvm); kvm_hv_destroy_vm(kvm); + static_call_cond(kvm_x86_vm_free)(kvm); } =20 static void memslot_rmap_free(struct kvm_memory_slot *slot) --=20 2.47.0 From nobody Mon Nov 25 03:07:01 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 89CB821F4A4; Wed, 30 Oct 2024 19:01:13 +0000 
(UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314876; cv=none; b=LJxEjggQvscvf/qBjGptXa+mouBJo4Qdk1bzCXFLDq8ftSPVsy1Sq1xk/bnVeKjxxwLB/XB53a2b7Ymr+Kqdj+qjE0HvkRka74La2zXc6UQfRXB8AzSleZUeBNvJwhCb28RWY/zKmWu+j7zIknDn+X5CttowyaKPRlNT7SrcjUQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314876; c=relaxed/simple; bh=Au+x7T6/rqwLMWU1pQUqQQMAW+zIxuKwmu5a/UDxj/o=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=qNjPegcxN2XPx2GzbxACdlvZQv899E2s6G6WZq0oU/hmBZXBxCceMzQYlFqDqr3v/mnhxaiRXgSnblU9hl/GsaRQSYfey76gdOdGhWT/DKD8oZWt1ew0cwE9g5eO7YSe4AlQc+n/CMXjUENokW5EhLbx/yirpOgkLCNHXS3WwWQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=QNbiiXdC; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="QNbiiXdC" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1730314874; x=1761850874; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Au+x7T6/rqwLMWU1pQUqQQMAW+zIxuKwmu5a/UDxj/o=; b=QNbiiXdC90MLlrIej/TQO2UMi9iwd5HKeua9XXqUHcWAvg/1IxGGqFxJ Fit5rS6adh3dzdl7zXSvbPbPOTS/ekCCUSaMCQVQGvWaVqh7+5ZgWiWAC h4/iZb+up1tDhuFNHmKFG8Q7r5o1vTVXGQQXS9xpN6rDvnuF0hpeMg8rB Gqg1PBk2EB3/5YZrstuv7xH6RarQ0XbV5cQ0GkY3pIn932tUmbwSk23qO ElWKq8foAY1cggq5/EKaHw9oJZvw4SibQv8PVajqVXg1fMtzRYsNV6/Pf qBviJBLcNK8TzIB6pP/JDTJcYW6m8H2OLsOIOGT3QVtfAOfR457HiImb6 w==; 
X-CSE-ConnectionGUID: JB4HYZ8CTaeUQOqHfpBYFg== X-CSE-MsgGUID: dvse+m0WQYqiWzK6fVXvig== X-IronPort-AV: E=McAfee;i="6700,10204,11241"; a="17678813" X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="17678813" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:05 -0700 X-CSE-ConnectionGUID: qgYHTSsGSi2GXtc/Gpe2NQ== X-CSE-MsgGUID: oZJJCf8pTgWWuOm8lHTXJg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="82499430" Received: from sramkris-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4..) ([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:04 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com, Isaku Yamahata Subject: [PATCH v2 18/25] KVM: TDX: Support per-VM KVM_CAP_MAX_VCPUS extension check Date: Wed, 30 Oct 2024 12:00:31 -0700 Message-ID: <20241030190039.77971-19-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com> References: <20241030190039.77971-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Change to report the KVM_CAP_MAX_VCPUS extension from globally to per-VM to allow userspace to be able to query maximum vCPUs for TDX guest via checking the KVM_CAP_MAX_VCPU extension on per-VM basis. Today KVM x86 reports KVM_MAX_VCPUS as guest's maximum vCPUs for all guests globally, and userspace, i.e. 
Qemu, queries the KVM_CAP_MAX_VCPUS extension globally rather than on a per-VM basis.

TDX has its own limit on the maximum number of vCPUs it can support for all TDX guests, in addition to KVM_MAX_VCPUS. The TDX module reports this limit via the MAX_VCPU_PER_TD global metadata. Different modules may report different values. In practice, the reported value reflects the maximum number of logical CPUs that ALL the platforms the module supports can possibly have. Note that some old modules may not support this metadata, in which case the limit is U16_MAX.

Always reporting KVM_MAX_VCPUS in the KVM_CAP_MAX_VCPUS extension is not enough for TDX. To accommodate TDX, report the KVM_CAP_MAX_VCPUS extension on a per-VM basis. Specifically, override kvm->max_vcpus in tdx_vm_init() for TDX guests, and report kvm->max_vcpus in the KVM_CAP_MAX_VCPUS extension check.

Report "the number of logical CPUs the platform has" as the maximum vCPUs for a TDX guest. Simply forwarding the MAX_VCPU_PER_TD reported by the TDX module would result in an unpredictable ABI, because the value reported to userspace would depend on the whims of TDX modules.

This works in practice because the MAX_VCPU_PER_TD reported by the TDX module will never be smaller than the value reported to userspace. But to make sure KVM never reports an unsupported value, sanity check that the MAX_VCPU_PER_TD reported by the TDX module is not smaller than the number of logical CPUs the platform has; otherwise refuse to use TDX.

Note, when creating a TDX guest, TDX actually requires the "maximum vCPUs for _this_ TDX guest" as an input to initialize the TDX guest. But a TDX guest's maximum vCPUs is not part of TDREPORT and thus not part of attestation, so there is no need to allow userspace to explicitly _configure_ the maximum vCPUs on a per-VM basis. KVM will simply use kvm->max_vcpus as input when initializing the TDX guest.
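
For illustration only (not part of this patch), the per-VM limit and the bringup sanity check described in this changelog can be modelled in plain userspace C. The names model_tdx_max_vcpus() and model_tdx_bringup_check() and their parameters are hypothetical stand-ins for kvm->max_vcpus, num_present_cpus() and MAX_VCPU_PER_TD; this is a sketch of the logic under those assumptions, not the kernel code:

```c
#include <errno.h>

/*
 * Hypothetical model: the value a TDX guest would see for
 * KVM_CAP_MAX_VCPUS is the smaller of the generic KVM limit and the
 * number of logical CPUs present on the platform.
 */
static unsigned int model_tdx_max_vcpus(unsigned int kvm_max_vcpus,
					unsigned int present_cpus)
{
	return kvm_max_vcpus < present_cpus ? kvm_max_vcpus : present_cpus;
}

/*
 * Hypothetical model of the bringup check: refuse TDX (-EINVAL) when
 * the module's MAX_VCPU_PER_TD is smaller than the number of logical
 * CPUs, since the advertised value would then be unsupported.
 */
static int model_tdx_bringup_check(unsigned int max_vcpus_per_td,
				   unsigned int present_cpus)
{
	if (max_vcpus_per_td < present_cpus)
		return -EINVAL;

	return 0;
}
```

Under this model, a platform with 224 logical CPUs and an assumed generic limit of 1024 would advertise 224 vCPUs for a TDX guest, and a module reporting a MAX_VCPU_PER_TD of 128 on that platform would cause TDX to be refused.
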
Signed-off-by: Isaku Yamahata Signed-off-by: Rick Edgecombe --- uAPI breakout v2: - Implement proposal from Sean: (Kai) - https://lore.kernel.org/kvm/ZmzaqRy2zjvlsDfL@google.com/ - https://lore.kernel.org/kvm/fcbc5a898c3434af98656b92a83dbba01d055e51.c= amel@intel.com/ - Change title from "KVM: TDX: Allow userspace to configure maximum vCPUs for TDX guests" - Correct setting of kvm->max_vcpus (Kai) uAPI breakout v1: - Change to use exported 'struct tdx_sysinfo' pointer. - Remove the code to read 'max_vcpus_per_td' since it is now done in TDX host code. - Drop max_vcpu ops to use kvm.max_vcpus - Remove TDX_MAX_VCPUS (Kai) - Use type cast (u16) instead of calling memcpy() when reading the 'max_vcpus_per_td' (Kai) - Improve change log and change patch title from "KVM: TDX: Make KVM_CAP_MAX_VCPUS backend specific" (Kai) --- arch/x86/kvm/vmx/main.c | 1 + arch/x86/kvm/vmx/tdx.c | 51 +++++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/x86.c | 2 ++ 3 files changed, 54 insertions(+) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index ed4afa45b16b..559f9450dec7 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -7,6 +7,7 @@ #include "pmu.h" #include "posted_intr.h" #include "tdx.h" +#include "tdx_arch.h" =20 static __init int vt_hardware_setup(void) { diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 50217f601061..c9093b003c13 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -385,6 +385,19 @@ int tdx_vm_init(struct kvm *kvm) { kvm->arch.has_private_mem =3D true; =20 + /* + * TDX has its own limit of maximum vCPUs it can support for all + * TDX guests in addition to KVM_MAX_VCPUS. TDX module reports + * such limit via the MAX_VCPU_PER_TD global metadata. In + * practice, it reflects the number of logical CPUs that ALL + * platforms that the TDX module supports can possibly have. + * + * Limit TDX guest's maximum vCPUs to the number of logical CPUs + * the platform has. 
Simply forwarding the MAX_VCPU_PER_TD to + * userspace would result in an unpredictable ABI. + */ + kvm->max_vcpus =3D min_t(int, kvm->max_vcpus, num_present_cpus()); + /* Place holder for TDX specific logic. */ return __tdx_td_init(kvm); } @@ -702,6 +715,7 @@ static int __init __do_tdx_bringup(void) =20 static int __init __tdx_bringup(void) { + const struct tdx_sys_info_td_conf *td_conf; int r; =20 if (!cpu_feature_enabled(X86_FEATURE_MOVDIR64B)) { @@ -741,6 +755,43 @@ static int __init __tdx_bringup(void) !tdx_get_supported_xfam(&tdx_sysinfo->td_conf)) goto get_sysinfo_err; =20 + /* + * TDX has its own limit of maximum vCPUs it can support for all + * TDX guests in addition to KVM_MAX_VCPUS. Userspace needs to + * query TDX guest's maximum vCPUs by checking KVM_CAP_MAX_VCPU + * extension on per-VM basis. + * + * TDX module reports such limit via the MAX_VCPU_PER_TD global + * metadata. Different modules may report different values. + * Some old module may also not support this metadata (in which + * case this limit is U16_MAX). + * + * In practice, the reported value reflects the maximum logical + * CPUs that ALL the platforms that the module supports can + * possibly have. + * + * Simply forwarding the MAX_VCPU_PER_TD to userspace could + * result in an unpredictable ABI. KVM instead always advertise + * the number of logical CPUs the platform has as the maximum + * vCPUs for TDX guests. + * + * Make sure MAX_VCPU_PER_TD reported by TDX module is not + * smaller than the number of logical CPUs, otherwise KVM will + * report an unsupported value to userspace. + * + * Note, a platform with TDX enabled in the BIOS cannot support + * physical CPU hotplug, and TDX requires the BIOS has marked + * all logical CPUs in MADT table as enabled. Just use + * num_present_cpus() for the number of logical CPUs. 
+ */ + td_conf =3D &tdx_sysinfo->td_conf; + if (td_conf->max_vcpus_per_td < num_present_cpus()) { + pr_err("Disable TDX: MAX_VCPU_PER_TD (%u) smaller than number of logical= CPUs (%u).\n", + td_conf->max_vcpus_per_td, num_present_cpus()); + r =3D -EINVAL; + goto get_sysinfo_err; + } + /* * Leave hardware virtualization enabled after TDX is enabled * successfully. TDX CPU hotplug depends on this. diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8a103c29dcd0..95a10c7bc507 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4744,6 +4744,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, lon= g ext) break; case KVM_CAP_MAX_VCPUS: r =3D KVM_MAX_VCPUS; + if (kvm) + r =3D kvm->max_vcpus; break; case KVM_CAP_MAX_VCPU_ID: r =3D KVM_MAX_VCPU_IDS; --=20 2.47.0 From nobody Mon Nov 25 03:07:01 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DCBCD21FD87; Wed, 30 Oct 2024 19:01:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314877; cv=none; b=Ht7kHwQoTkkZH12GWo5gGmWHRovWXEPrHcusseBbLl8mzc09X/QplX/ZeN7HsWsuX93aNMAGIYlIi5fj7poBzlnZ3kSD2RdJveLGZPbEq2r3s5ZRDICjhYf9c4FDNPGiXjo2x/r557+cXUQsBLw/fqWJIuD5Ll9on68LfU4tM4A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314877; c=relaxed/simple; bh=eulfXokD5/qZzy03Oms7b48P4y09C+Abv8jAdEYUHSk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=V5z4DApHfyvxETruU+WbUQmL1+SPBwUYsNPlllS3Unk4SIXX75rp2RZOSLZr5P9vNJ9S6GoNbmMTeFqEwZozlk2NGDyyGC0cVibFm/EiPZBVd6ZyYMR7onEIFUDsAk8qdUnmshrf9W7zyvY100b3JbAs/n8IczoVJq8JkdMRH6E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass 
smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=d6hyuXjZ; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="d6hyuXjZ" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1730314874; x=1761850874; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=eulfXokD5/qZzy03Oms7b48P4y09C+Abv8jAdEYUHSk=; b=d6hyuXjZJbM8iO8nVGkLZ5Si0LGevBFPaLOXatw6nuvMk/KnvvSTGzYH YGEfZtj5FgZHfsuOW62p8hyFicQontCCLcC/kwqiFQg9Y8yrUi8VFn+D0 gzP7zyPD+7CY2KIzwpASdMrRFzpw20kjuRlO0tQ5Mmqo5bDUK3Qb9mea9 PnYodtnaDk0eqAk2J3AwoxFLc/FdfqNN+Jhdjk9fhkcjmOwgy5Y1RQL86 2WUEsF6FD8SEx73YiXVWLYisgLLi0Ph/ZxmLKTjKdnS2Nc7ZE2eJlGYiX CkXsmAdpktc8zVKNrUHw7/xDW9VU9swadd3EBCNTNxiz950ELRPFLO0Qg Q==; X-CSE-ConnectionGUID: R69U6LmiR/mb2l2mOR6seA== X-CSE-MsgGUID: K2MTfBKrR4qHHcCEubzatw== X-IronPort-AV: E=McAfee;i="6700,10204,11241"; a="17678821" X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="17678821" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:05 -0700 X-CSE-ConnectionGUID: mpyYMRYmRZ6Nk1XKNE8sKA== X-CSE-MsgGUID: zs4cth0sSy+ItWgwmeRccA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="82499437" Received: from sramkris-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4..) 
([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:05 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com, Isaku Yamahata Subject: [PATCH v2 19/25] KVM: TDX: initialize VM with TDX specific parameters Date: Wed, 30 Oct 2024 12:00:32 -0700 Message-ID: <20241030190039.77971-20-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com> References: <20241030190039.77971-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata After the crypto-protection key has been configured, TDX requires a VM-scope initialization as a step of creating the TDX guest. This "per-VM" TDX initialization does the global configurations/features that the TDX guest can support, such as guest's CPUIDs (emulated by the TDX module), the maximum number of vcpus etc. This "per-VM" TDX initialization must be done before any "vcpu-scope" TDX initialization. To match this better, require the KVM_TDX_INIT_VM IOCTL() to be done before KVM creates any vcpus. 
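
For illustration only (not part of this patch), the ordering requirement above can be modelled as a small state check in userspace C. The struct vm_model and model_init_vm() names are hypothetical; the error codes mirror what the diff below returns (-EINVAL when the TD is not in the uninitialized state, -EBUSY when vcpus were already created), but this is a sketch, not the kernel implementation:

```c
#include <errno.h>

/* Mirrors the TD states introduced by this patch. */
enum td_state {
	TD_STATE_UNINITIALIZED = 0,
	TD_STATE_INITIALIZED,
	TD_STATE_RUNNABLE,
};

/* Hypothetical stand-in for the relevant fields of the VM. */
struct vm_model {
	enum td_state state;
	unsigned int created_vcpus;
};

/*
 * Hypothetical model of KVM_TDX_INIT_VM: it may run only once, and
 * only before any vcpu has been created.
 */
static int model_init_vm(struct vm_model *vm)
{
	if (vm->state != TD_STATE_UNINITIALIZED)
		return -EINVAL;

	if (vm->created_vcpus)
		return -EBUSY;

	vm->state = TD_STATE_INITIALIZED;
	return 0;
}
```

In this model a second call on the same VM fails with -EINVAL, and a call made after a vcpu exists fails with -EBUSY, which matches the order userspace is expected to follow: initialize the TD first, then create vcpus.
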
Co-developed-by: Xiaoyao Li
Signed-off-by: Xiaoyao Li
Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
---
uAPI breakout v2:
 - Move setting of gfn_direct_bits into MMU part 2 series (Yan)
 - Enable NO_RBP_MOD after removing workaround in later patches (Fengwei)
 - Use TDX 1.5 naming of config_flags instead of exec_controls (Xiaoyao)
 - Fix up SEAMCALL call sites due to function parameter changes to SEAMCALL wrappers (Kai)
 - Reject leaves that are not in tdx caps (Rick)
 - Add TD state handling (Tony)
 - Generate data directly from td_conf (Tony)
 - Fold in guest physical address to configure EPT level (Xiaoyao)
 - Use helpers for phys_addr_bits and add comments (Paolo)

uAPI breakout v1:
 - Drop TDX_TD_XFAM_CET and use XFEATURE_MASK_CET_{USER, KERNEL}.
 - Update for the wrapper functions for SEAMCALLs. (Sean)
 - Move gfn_shared_mask settings into this patch due to MMU section move
 - Fix bisectability issues in headers (Kai)
 - Updates from seamcall overhaul (Kai)
 - Allow userspace to configure xfam directly
 - Check if user sets non-configurable bits in CPUIDs
 - Rename error->hw_error
 - Move code change from tdx_module_setup() to __tdx_bringup() because initialization is now done after hardware_setup() and tdx_module_setup() is removed. Remove the code that used the API to read global metadata and use the exported 'struct tdx_sysinfo' pointer instead.
 - Replace 'tdx_info->nr_tdcs_pages' with a wrapper tdx_sysinfo_nr_tdcs_pages() because the 'struct tdx_sysinfo' doesn't have nr_tdcs_pages directly.
 - Replace tdx_info->max_vcpus_per_td with the new exported pointer in tdx_vm_init().
- Decrease the reserved space for struct kvm_tdx_init_vm (Kai) - Use sizeof_field() for struct kvm_tdx_init_vm cpuids (Tony) - No need to init init_vm, it gets copied over in tdx_td_init() (Chao) - Use kmalloc() instead of () kzalloc for init_vm in tdx_td_init() (Chao) - Add more line breaks to tdx_td_init() to make code easier to read (Tony) - Clarify patch description (Kai) v19: - Check NO_RBP_MOD of feature0 and set it - Update the comment for PT and CET v18: - remove the change of tools/arch/x86/include/uapi/asm/kvm.h - typo in comment. sha348 =3D> sha384 - updated comment in setup_tdparams_xfam() - fix setup_tdparams_xfam() to use init_vm instead of td_params --- arch/x86/include/uapi/asm/kvm.h | 24 +++ arch/x86/kvm/cpuid.c | 7 + arch/x86/kvm/cpuid.h | 2 + arch/x86/kvm/vmx/tdx.c | 259 ++++++++++++++++++++++++++++++-- arch/x86/kvm/vmx/tdx.h | 24 +++ 5 files changed, 306 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kv= m.h index 0630530af334..892e16bd7430 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -929,6 +929,7 @@ struct kvm_hyperv_eventfd { /* Trust Domain eXtension sub-ioctl() commands. */ enum kvm_tdx_cmd_id { KVM_TDX_CAPABILITIES =3D 0, + KVM_TDX_INIT_VM, =20 KVM_TDX_CMD_NR_MAX, }; @@ -959,4 +960,27 @@ struct kvm_tdx_capabilities { struct kvm_cpuid2 cpuid; }; =20 +struct kvm_tdx_init_vm { + __u64 attributes; + __u64 xfam; + __u64 mrconfigid[6]; /* sha384 digest */ + __u64 mrowner[6]; /* sha384 digest */ + __u64 mrownerconfig[6]; /* sha384 digest */ + + /* The total space for TD_PARAMS before the CPUIDs is 256 bytes */ + __u64 reserved[12]; + + /* + * Call KVM_TDX_INIT_VM before vcpu creation, thus before + * KVM_SET_CPUID2. + * This configuration supersedes KVM_SET_CPUID2s for VCPUs because the + * TDX module directly virtualizes those CPUIDs without VMM. The user + * space VMM, e.g. qemu, should make KVM_SET_CPUID2 consistent with + * those values. 
If it doesn't, KVM may have wrong idea of vCPUIDs of + * the guest, and KVM may wrongly emulate CPUIDs or MSRs that the TDX + * module doesn't virtualize. + */ + struct kvm_cpuid2 cpuid; +}; + #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 41786b834b16..14be20e003f4 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -1513,6 +1513,13 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid, return r; } =20 +struct kvm_cpuid_entry2 *kvm_find_cpuid_entry2( + struct kvm_cpuid_entry2 *entries, int nent, u32 function, u64 index) +{ + return cpuid_entry2_find(entries, nent, function, index); +} +EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry2); + struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu, u32 function, u32 index) { diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index 41697cca354e..00570227e2ae 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -13,6 +13,8 @@ void kvm_set_cpu_caps(void); =20 void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu); void kvm_update_pv_runtime(struct kvm_vcpu *vcpu); +struct kvm_cpuid_entry2 *kvm_find_cpuid_entry2(struct kvm_cpuid_entry2 *en= tries, + int nent, u32 function, u64 index); struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu, u32 function, u32 index); struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index c9093b003c13..ac224d79ba1e 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -5,6 +5,7 @@ #include "x86_ops.h" #include "tdx.h" =20 + #undef pr_fmt #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt =20 @@ -63,6 +64,11 @@ static u64 tdx_get_supported_xfam(const struct tdx_sys_i= nfo_td_conf *td_conf) return val; } =20 +static int tdx_get_guest_phys_addr_bits(const u32 eax) +{ + return (eax & GENMASK(23, 16)) >> 16; +} + static u32 tdx_set_guest_phys_addr_bits(const u32 eax, int addr_bits) { return (eax & ~GENMASK(23, 
16)) | (addr_bits & 0xff) << 16; @@ -360,7 +366,11 @@ static void tdx_reclaim_td_control_pages(struct kvm *k= vm) =20 void tdx_vm_free(struct kvm *kvm) { + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + tdx_reclaim_td_control_pages(kvm); + + kvm_tdx->state =3D TD_STATE_UNINITIALIZED; } =20 static int tdx_do_tdh_mng_key_config(void *param) @@ -379,10 +389,10 @@ static int tdx_do_tdh_mng_key_config(void *param) return 0; } =20 -static int __tdx_td_init(struct kvm *kvm); - int tdx_vm_init(struct kvm *kvm) { + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + kvm->arch.has_private_mem =3D true; =20 /* @@ -398,8 +408,9 @@ int tdx_vm_init(struct kvm *kvm) */ kvm->max_vcpus =3D min_t(int, kvm->max_vcpus, num_present_cpus()); =20 - /* Place holder for TDX specific logic. */ - return __tdx_td_init(kvm); + kvm_tdx->state =3D TD_STATE_UNINITIALIZED; + + return 0; } =20 static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) @@ -450,7 +461,142 @@ static int tdx_get_capabilities(struct kvm_tdx_cmd *c= md) return ret; } =20 -static int __tdx_td_init(struct kvm *kvm) +/* + * KVM reports guest physical address in CPUID.0x800000008.EAX[23:16], whi= ch is + * similar to TDX's GPAW. Use this field as the interface for userspace to + * configure the GPAW and EPT level for TDs. + * + * Only values 48 and 52 are supported. Value 52 means GPAW-52 and EPT lev= el + * 5, Value 48 means GPAW-48 and EPT level 4. For value 48, GPAW-48 is alw= ays + * supported. Value 52 is only supported when the platform supports 5 level + * EPT. 
+ */ +static int setup_tdparams_eptp_controls(struct kvm_cpuid2 *cpuid, + struct td_params *td_params) +{ + const struct kvm_cpuid_entry2 *entry; + int guest_pa; + + entry =3D kvm_find_cpuid_entry2(cpuid->entries, cpuid->nent, 0x80000008, = 0); + if (!entry) + return -EINVAL; + + guest_pa =3D tdx_get_guest_phys_addr_bits(entry->eax); + + if (guest_pa !=3D 48 && guest_pa !=3D 52) + return -EINVAL; + + if (guest_pa =3D=3D 52 && !cpu_has_vmx_ept_5levels()) + return -EINVAL; + + td_params->eptp_controls =3D VMX_EPTP_MT_WB; + if (guest_pa =3D=3D 52) { + td_params->eptp_controls |=3D VMX_EPTP_PWL_5; + td_params->config_flags |=3D TDX_CONFIG_FLAGS_MAX_GPAW; + } else { + td_params->eptp_controls |=3D VMX_EPTP_PWL_4; + } + + return 0; +} + +static int setup_tdparams_cpuids(struct kvm_cpuid2 *cpuid, + struct td_params *td_params) +{ + const struct tdx_sys_info_td_conf *td_conf =3D &tdx_sysinfo->td_conf; + const struct kvm_cpuid_entry2 *entry; + struct tdx_cpuid_value *value; + int i, copy_cnt =3D 0; + + /* + * td_params.cpuid_values: The number and the order of cpuid_value must + * be same to the one of struct tdsysinfo.{num_cpuid_config, cpuid_config= s} + * It's assumed that td_params was zeroed. + */ + for (i =3D 0; i < td_conf->num_cpuid_config; i++) { + struct kvm_cpuid_entry2 tmp; + + td_init_cpuid_entry2(&tmp, i); + + entry =3D kvm_find_cpuid_entry2(cpuid->entries, cpuid->nent, + tmp.function, tmp.index); + if (!entry) + continue; + + copy_cnt++; + + value =3D &td_params->cpuid_values[i]; + value->eax =3D entry->eax; + value->ebx =3D entry->ebx; + value->ecx =3D entry->ecx; + value->edx =3D entry->edx; + + /* + * TDX module does not accept nonzero bits 16..23 for the + * CPUID[0x80000008].EAX, see setup_tdparams_eptp_controls(). 
+ */ + if (tmp.function =3D=3D 0x80000008) + value->eax =3D tdx_set_guest_phys_addr_bits(value->eax, 0); + } + + /* + * Rely on the TDX module to reject invalid configuration, but it can't + * check of leafs that don't have a proper slot in td_params->cpuid_values + * to stick then. So fail if there were entries that didn't get copied to + * td_params. + */ + if (copy_cnt !=3D cpuid->nent) + return -EINVAL; + + return 0; +} + +static int setup_tdparams(struct kvm *kvm, struct td_params *td_params, + struct kvm_tdx_init_vm *init_vm) +{ + const struct tdx_sys_info_td_conf *td_conf =3D &tdx_sysinfo->td_conf; + struct kvm_cpuid2 *cpuid =3D &init_vm->cpuid; + int ret; + + if (kvm->created_vcpus) + return -EBUSY; + + if (init_vm->attributes & ~tdx_get_supported_attrs(td_conf)) + return -EINVAL; + + if (init_vm->xfam & ~tdx_get_supported_xfam(td_conf)) + return -EINVAL; + + td_params->max_vcpus =3D kvm->max_vcpus; + td_params->attributes =3D init_vm->attributes | td_conf->attributes_fixed= 1; + td_params->xfam =3D init_vm->xfam | td_conf->xfam_fixed1; + + td_params->config_flags =3D TDX_CONFIG_FLAGS_NO_RBP_MOD; + td_params->tsc_frequency =3D TDX_TSC_KHZ_TO_25MHZ(kvm->arch.default_tsc_k= hz); + + ret =3D setup_tdparams_eptp_controls(cpuid, td_params); + if (ret) + return ret; + + ret =3D setup_tdparams_cpuids(cpuid, td_params); + if (ret) + return ret; + +#define MEMCPY_SAME_SIZE(dst, src) \ + do { \ + BUILD_BUG_ON(sizeof(dst) !=3D sizeof(src)); \ + memcpy((dst), (src), sizeof(dst)); \ + } while (0) + + MEMCPY_SAME_SIZE(td_params->mrconfigid, init_vm->mrconfigid); + MEMCPY_SAME_SIZE(td_params->mrowner, init_vm->mrowner); + MEMCPY_SAME_SIZE(td_params->mrownerconfig, init_vm->mrownerconfig); + + return 0; +} + +static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params, + u64 *seamcall_err) { struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); cpumask_var_t packages; @@ -458,8 +604,9 @@ static int __tdx_td_init(struct kvm *kvm) unsigned long tdr_pa =3D 0; unsigned long 
va; int ret, i; - u64 err; + u64 err, rcx; =20 + *seamcall_err =3D 0; ret =3D tdx_guest_keyid_alloc(); if (ret < 0) return ret; @@ -573,10 +720,23 @@ static int __tdx_td_init(struct kvm *kvm) } } =20 - /* - * Note, TDH_MNG_INIT cannot be invoked here. TDH_MNG_INIT requires a de= dicated - * ioctl() to define the configure CPUID values for the TD. - */ + err =3D tdh_mng_init(kvm_tdx->tdr_pa, __pa(td_params), &rcx); + if ((err & TDX_SEAMCALL_STATUS_MASK) =3D=3D TDX_OPERAND_INVALID) { + /* + * Because a user gives operands, don't warn. + * Return a hint to the user because it's sometimes hard for the + * user to figure out which operand is invalid. SEAMCALL status + * code includes which operand caused invalid operand error. + */ + *seamcall_err =3D err; + ret =3D -EINVAL; + goto teardown; + } else if (WARN_ON_ONCE(err)) { + pr_tdx_error_1(TDH_MNG_INIT, err, rcx); + ret =3D -EIO; + goto teardown; + } + return 0; =20 /* @@ -624,6 +784,82 @@ static int __tdx_td_init(struct kvm *kvm) return ret; } =20 +static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + struct kvm_tdx_init_vm *init_vm; + struct td_params *td_params =3D NULL; + int ret; + + BUILD_BUG_ON(sizeof(*init_vm) !=3D 256 + sizeof_field(struct kvm_tdx_init= _vm, cpuid)); + BUILD_BUG_ON(sizeof(struct td_params) !=3D 1024); + + if (kvm_tdx->state !=3D TD_STATE_UNINITIALIZED) + return -EINVAL; + + if (cmd->flags) + return -EINVAL; + + init_vm =3D kmalloc(sizeof(*init_vm) + + sizeof(init_vm->cpuid.entries[0]) * KVM_MAX_CPUID_ENTRIES, + GFP_KERNEL); + if (!init_vm) + return -ENOMEM; + + if (copy_from_user(init_vm, u64_to_user_ptr(cmd->data), sizeof(*init_vm))= ) { + ret =3D -EFAULT; + goto out; + } + + if (init_vm->cpuid.nent > KVM_MAX_CPUID_ENTRIES) { + ret =3D -E2BIG; + goto out; + } + + if (copy_from_user(init_vm->cpuid.entries, + u64_to_user_ptr(cmd->data) + sizeof(*init_vm), + flex_array_size(init_vm, cpuid.entries, init_vm->cpuid.nent))) { + ret =3D 
-EFAULT; + goto out; + } + + if (memchr_inv(init_vm->reserved, 0, sizeof(init_vm->reserved))) { + ret =3D -EINVAL; + goto out; + } + + if (init_vm->cpuid.padding) { + ret =3D -EINVAL; + goto out; + } + + td_params =3D kzalloc(sizeof(struct td_params), GFP_KERNEL); + if (!td_params) { + ret =3D -ENOMEM; + goto out; + } + + ret =3D setup_tdparams(kvm, td_params, init_vm); + if (ret) + goto out; + + ret =3D __tdx_td_init(kvm, td_params, &cmd->hw_error); + if (ret) + goto out; + + kvm_tdx->tsc_offset =3D td_tdcs_exec_read64(kvm_tdx, TD_TDCS_EXEC_TSC_OFF= SET); + kvm_tdx->attributes =3D td_params->attributes; + kvm_tdx->xfam =3D td_params->xfam; + + kvm_tdx->state =3D TD_STATE_INITIALIZED; +out: + /* kfree() accepts NULL. */ + kfree(init_vm); + kfree(td_params); + + return ret; +} + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { struct kvm_tdx_cmd tdx_cmd; @@ -645,6 +881,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) case KVM_TDX_CAPABILITIES: r =3D tdx_get_capabilities(&tdx_cmd); break; + case KVM_TDX_INIT_VM: + r =3D tdx_td_init(kvm, &tdx_cmd); + break; default: r =3D -EINVAL; goto out; diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index e557a82bc882..1fcb7c1b078d 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -10,14 +10,27 @@ void tdx_cleanup(void); =20 extern bool enable_tdx; =20 +/* TDX module hardware states. These follow the TDX module OP_STATEs. 
*/ +enum kvm_tdx_state { + TD_STATE_UNINITIALIZED =3D 0, + TD_STATE_INITIALIZED, + TD_STATE_RUNNABLE, +}; + struct kvm_tdx { struct kvm kvm; =20 unsigned long tdr_pa; unsigned long *tdcs_pa; =20 + u64 attributes; + u64 xfam; int hkid; u8 nr_tdcs_pages; + + u64 tsc_offset; + + enum kvm_tdx_state state; }; =20 struct vcpu_tdx { @@ -45,6 +58,17 @@ static __always_inline struct vcpu_tdx *to_tdx(struct kv= m_vcpu *vcpu) return container_of(vcpu, struct vcpu_tdx, vcpu); } =20 +static __always_inline u64 td_tdcs_exec_read64(struct kvm_tdx *kvm_tdx, u3= 2 field) +{ + u64 err, data; + + err =3D tdh_mng_rd(kvm_tdx->tdr_pa, TDCS_EXEC(field), &data); + if (unlikely(err)) { + pr_err("TDH_MNG_RD[EXEC.0x%x] failed: 0x%llx\n", field, err); + return 0; + } + return data; +} #else static inline void tdx_bringup(void) {} static inline void tdx_cleanup(void) {} --=20 2.47.0 From nobody Mon Nov 25 03:07:01 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2F1841D1506; Wed, 30 Oct 2024 19:01:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314878; cv=none; b=LYIKknRITNTOoLttGsggRxSPJjucRWtiEbNnCfvD+DZaACxafA7wviLKfAQmymowPPZu+gmCpL1DgMkQrqmJtNf5FRSMPgvDODvde86nUcyfFZP/Cb0RuXMFddmNkiyPcKxJQx6MXVw+j0x+hOQwON9g5ZBaER9H3yfG3aUZu3g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314878; c=relaxed/simple; bh=BEDue/QNC2+Q1sz4S/Xu8rWd/5NHgqbk6JjIdzhTmPg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=XraKe6EntmCxwR547ooRUq1tnEMPqjUhaPCSV6QvMYTc8kDqCN053zJU4NymOtGzac7BYppWCUtJoPWHtrHEdrUaO1nOpentQK0vbalEIcjJagdICPSQA0p7ggpQHJ6K6BCFF4FuMqtEJyIOVEbLTAs1NMD6FzEvdzpueJi7TlU= ARC-Authentication-Results: i=1; 
smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=dDp2zYFH; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="dDp2zYFH" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1730314876; x=1761850876; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=BEDue/QNC2+Q1sz4S/Xu8rWd/5NHgqbk6JjIdzhTmPg=; b=dDp2zYFHkloouEaY8XyWBpdS7QVn1DBHSc+lUHCbwGgepa2pxoRpI8K5 SAfqhcz3OzWnMfCvLhdBA1AEV6pb4aIcUmK/XlNYbl6L2oAQZUEAPbUut Ak4zJKtIgg4iP3DGvL9lg7EQUgHsRXQKvewNZx3XqzWF+SdHsSQAc+uOq LXwAGYLq84sOA48o47RsDLU/efV9UxRUYbfPLgzqXHK6A8u7NjVS+N6HR 8GhwEs0YQHjyY1a9mW+ifTEG0U6Zjq37c74ulonP86/SrOnkPuNIwpzjv 7takc+6oqhw8AmgbF5U/MmtL3f2XEKInNE7jfeBv0eEbM1X9w1k69gnMU g==; X-CSE-ConnectionGUID: TpiR/5PfRO+XR8PhTPFvDw== X-CSE-MsgGUID: gdhpFwUCTH6+olAO1Vt1dQ== X-IronPort-AV: E=McAfee;i="6700,10204,11241"; a="17678829" X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="17678829" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:06 -0700 X-CSE-ConnectionGUID: 3WPS5F1JQFCJxLJDoTKviA== X-CSE-MsgGUID: dAb0XkRETi6SyJgRy5F9WQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="82499442" Received: from sramkris-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4..) 
([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:05 -0700
From: Rick Edgecombe
To: pbonzini@redhat.com, seanjc@google.com
Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com, Isaku Yamahata
Subject: [PATCH v2 20/25] KVM: TDX: Make pmu_intel.c ignore guest TD case
Date: Wed, 30 Oct 2024 12:00:33 -0700
Message-ID: <20241030190039.77971-21-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com>
References: <20241030190039.77971-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Isaku Yamahata

TDX KVM doesn't support the PMU yet; PMU support is future work and will be
posted as a separate patch series. For now, handle TDX by making
vcpu_to_lbr_desc() and vcpu_to_lbr_records() return NULL for TD vcpus.

Signed-off-by: Isaku Yamahata
Co-developed-by: Tony Lindgren
Signed-off-by: Tony Lindgren
Signed-off-by: Rick Edgecombe
---
uAPI breakout v2:
- Make vcpu_to_lbr_desc() return NULL (Paolo)
- Drop unnecessary ifdefs around is_td_vcpu() (Tony)

uAPI breakout v1:
- Fix bisectability issues in headers (Kai)
- Fix rebase error from v19 (Chao Gao)
- Make helpers static (Tony Lindgren)
- Improve whitespace (Tony Lindgren)

v18:
- Removed unnecessary change to vmx.c which caused a kernel warning.
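
For readers skimming the series, the pattern this patch relies on (an accessor that returns NULL for TD vcpus, with callers checking before they dereference) can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct model_vcpu, struct model_lbr_desc and the model_*() helpers are hypothetical stand-ins and are not the KVM structures or functions touched by the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the LBR descriptor; not the KVM struct. */
struct model_lbr_desc {
	unsigned int nr_records;
};

/* Hypothetical stand-in for a vcpu; 'is_td' models is_td_vcpu(). */
struct model_vcpu {
	bool is_td;
	struct model_lbr_desc lbr_desc;
};

/* Accessor: a TD vcpu has no usable LBR descriptor, so report NULL. */
static struct model_lbr_desc *model_vcpu_to_lbr_desc(struct model_vcpu *vcpu)
{
	assert(vcpu != NULL);

	if (vcpu->is_td)
		return NULL;

	return &vcpu->lbr_desc;
}

/* Caller: check for NULL before touching the descriptor. */
static bool model_lbr_is_enabled(struct model_vcpu *vcpu)
{
	struct model_lbr_desc *desc = model_vcpu_to_lbr_desc(vcpu);

	if (!desc)
		return false;

	return desc->nr_records != 0;
}
```

The point of the sketch is the division of labour: the accessor is the single place that knows about the TD case, and each caller only needs a NULL check.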
--- arch/x86/kvm/vmx/pmu_intel.c | 50 +++++++++++++++++++++++++++++++++++- arch/x86/kvm/vmx/pmu_intel.h | 28 ++++++++++++++++++++ arch/x86/kvm/vmx/vmx.h | 34 +----------------------- 3 files changed, 78 insertions(+), 34 deletions(-) create mode 100644 arch/x86/kvm/vmx/pmu_intel.h diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index 83382a4d1d66..1cd92b43f463 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -19,6 +19,7 @@ #include "lapic.h" #include "nested.h" #include "pmu.h" +#include "tdx.h" =20 /* * Perf's "BASE" is wildly misleading, architectural PMUs use bits 31:16 o= f ECX @@ -34,6 +35,22 @@ =20 #define MSR_PMC_FULL_WIDTH_BIT (MSR_IA32_PMC0 - MSR_IA32_PERFCTR0) =20 +static struct lbr_desc *vcpu_to_lbr_desc(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return NULL; + + return &to_vmx(vcpu)->lbr_desc; +} + +static struct x86_pmu_lbr *vcpu_to_lbr_records(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return NULL; + + return &to_vmx(vcpu)->lbr_desc.records; +} + static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data) { struct kvm_pmc *pmc; @@ -129,6 +146,22 @@ static inline struct kvm_pmc *get_fw_gp_pmc(struct kvm= _pmu *pmu, u32 msr) return get_gp_pmc(pmu, msr, MSR_IA32_PMC0); } =20 +static bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return false; + + return cpuid_model_is_consistent(vcpu); +} + +bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return false; + + return !!vcpu_to_lbr_records(vcpu)->nr; +} + static bool intel_pmu_is_valid_lbr_msr(struct kvm_vcpu *vcpu, u32 index) { struct x86_pmu_lbr *records =3D vcpu_to_lbr_records(vcpu); @@ -194,6 +227,9 @@ static inline void intel_pmu_release_guest_lbr_event(st= ruct kvm_vcpu *vcpu) { struct lbr_desc *lbr_desc =3D vcpu_to_lbr_desc(vcpu); =20 + if (!lbr_desc) + return; + if (lbr_desc->event) { perf_event_release_kernel(lbr_desc->event); lbr_desc->event =3D 
NULL; @@ -235,6 +271,9 @@ int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *v= cpu) PERF_SAMPLE_BRANCH_USER, }; =20 + if (WARN_ON_ONCE(!lbr_desc)) + return 0; + if (unlikely(lbr_desc->event)) { __set_bit(INTEL_PMC_IDX_FIXED_VLBR, pmu->pmc_in_use); return 0; @@ -466,6 +505,9 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu) u64 perf_capabilities; u64 counter_rsvd; =20 + if (!lbr_desc) + return; + memset(&lbr_desc->records, 0, sizeof(lbr_desc->records)); =20 /* @@ -542,7 +584,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu) INTEL_PMC_MAX_GENERIC, pmu->nr_arch_fixed_counters); =20 perf_capabilities =3D vcpu_get_perf_capabilities(vcpu); - if (cpuid_model_is_consistent(vcpu) && + if (intel_pmu_lbr_is_compatible(vcpu) && (perf_capabilities & PMU_CAP_LBR_FMT)) memcpy(&lbr_desc->records, &vmx_lbr_caps, sizeof(vmx_lbr_caps)); else @@ -570,6 +612,9 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu) struct kvm_pmu *pmu =3D vcpu_to_pmu(vcpu); struct lbr_desc *lbr_desc =3D vcpu_to_lbr_desc(vcpu); =20 + if (!lbr_desc) + return; + for (i =3D 0; i < KVM_MAX_NR_INTEL_GP_COUNTERS; i++) { pmu->gp_counters[i].type =3D KVM_PMC_GP; pmu->gp_counters[i].vcpu =3D vcpu; @@ -677,6 +722,9 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu) struct kvm_pmu *pmu =3D vcpu_to_pmu(vcpu); struct lbr_desc *lbr_desc =3D vcpu_to_lbr_desc(vcpu); =20 + if (WARN_ON_ONCE(!lbr_desc)) + return; + if (!lbr_desc->event) { vmx_disable_lbr_msrs_passthrough(vcpu); if (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR) diff --git a/arch/x86/kvm/vmx/pmu_intel.h b/arch/x86/kvm/vmx/pmu_intel.h new file mode 100644 index 000000000000..5620d0882cdc --- /dev/null +++ b/arch/x86/kvm/vmx/pmu_intel.h @@ -0,0 +1,28 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __KVM_X86_VMX_PMU_INTEL_H +#define __KVM_X86_VMX_PMU_INTEL_H + +#include + +bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu); +int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu); + +struct lbr_desc { + /* Basic info about 
guest LBR records. */ + struct x86_pmu_lbr records; + + /* + * Emulate LBR feature via passthrough LBR registers when the + * per-vcpu guest LBR event is scheduled on the current pcpu. + * + * The records may be inaccurate if the host reclaims the LBR. + */ + struct perf_event *event; + + /* True if LBRs are marked as not intercepted in the MSR bitmap */ + bool msr_passthrough; +}; + +extern struct x86_pmu_lbr vmx_lbr_caps; + +#endif /* __KVM_X86_VMX_PMU_INTEL_H */ diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index ad9efe41e691..37a555c6dfbf 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -11,6 +11,7 @@ =20 #include "capabilities.h" #include "../kvm_cache_regs.h" +#include "pmu_intel.h" #include "vmcs.h" #include "vmx_ops.h" #include "../cpuid.h" @@ -90,24 +91,6 @@ union vmx_exit_reason { u32 full; }; =20 -struct lbr_desc { - /* Basic info about guest LBR records. */ - struct x86_pmu_lbr records; - - /* - * Emulate LBR feature via passthrough LBR registers when the - * per-vcpu guest LBR event is scheduled on the current pcpu. - * - * The records may be inaccurate if the host reclaims the LBR. - */ - struct perf_event *event; - - /* True if LBRs are marked as not intercepted in the MSR bitmap */ - bool msr_passthrough; -}; - -extern struct x86_pmu_lbr vmx_lbr_caps; - /* * The nested_vmx structure is part of vcpu_vmx, and holds information we = need * for correct emulation of VMX (i.e., nested VMX) on this vcpu. 
@@ -659,21 +642,6 @@ static __always_inline struct vcpu_vmx *to_vmx(struct = kvm_vcpu *vcpu) return container_of(vcpu, struct vcpu_vmx, vcpu); } =20 -static inline struct lbr_desc *vcpu_to_lbr_desc(struct kvm_vcpu *vcpu) -{ - return &to_vmx(vcpu)->lbr_desc; -} - -static inline struct x86_pmu_lbr *vcpu_to_lbr_records(struct kvm_vcpu *vcp= u) -{ - return &vcpu_to_lbr_desc(vcpu)->records; -} - -static inline bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu) -{ - return !!vcpu_to_lbr_records(vcpu)->nr; -} - void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu); int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu); void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu); --=20 2.47.0 From nobody Mon Nov 25 03:07:01 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5D8BA227366; Wed, 30 Oct 2024 19:01:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314878; cv=none; b=cgtbHhpxQzV0j+DagLDi7dBbXZgO3zTt0rfy4k/Q5b81bfH+fkXLC5RtQEbn+yPKV2jhiedPAKbEUe9xcUsZAFgbr7ICSPfPzFhDyJ3157aZ5hOLNRjxsmry7EmWMbV/5ZZRTt1CjaHzlS0HN2BHbdpE75yl+1IQVtnAsQmvhOs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314878; c=relaxed/simple; bh=jqkj4Hb7DRvcZMExIzjynTbXnLXSSlyhjX7U93N28yw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=LOmQop/vURyCRz9m33e0/NoXqexUwu13p+zQQAabal2sSitiK2ApIYufQ5g6cus1+97ov2mtp5UtXnpURUk+2QKngnOByvS5hXt804/mdPLh1/1DST7TGeEvDX/pDJi3xtB8bzOuIQCO2pB5sEeNrPyRqgkozF4UzktyYi9kpyg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com 
header.b=ZUOxsa5W; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="ZUOxsa5W" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1730314876; x=1761850876; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=jqkj4Hb7DRvcZMExIzjynTbXnLXSSlyhjX7U93N28yw=; b=ZUOxsa5WDgWGJxjzmgnuZfVA2bOoQnArsEWJPyXQw6j7w7rnySSF/hgp AoqpiT54+AS7I8cONG3B53FsP1hIRjXPEOseUTfoM2c48GqdDybFdAmVf QfZYQEW+rFY87yTbWYcOaxWGIGxTBuhvn3gxlDI6lWNTHEX01fHUOzd7n 3bduxb9Csse0RbFu2s++N4C5jC85Bb+KvJ2rfM3NF2HxxjSeldcGNrVqH LXg0mQsEt+m8Ss+CfjCbLNh7KRg8t2P66UDfiGrKj4IHDdn9teN/TAb6I ZLoXxHbCEPmPLrfpv3xjFBj9aqEyU8mdwtSALuOyGclFaUrJCb1fNRz6Q g==; X-CSE-ConnectionGUID: S9Efz8FZTzeEEpy1GkGnGg== X-CSE-MsgGUID: L/U978McQRenQq2h2GgD5Q== X-IronPort-AV: E=McAfee;i="6700,10204,11241"; a="17678836" X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="17678836" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:07 -0700 X-CSE-ConnectionGUID: Ww617W06Ty+o0ciM4QW+Dg== X-CSE-MsgGUID: 2CCMq2zgS3eRXqqEoDd9Tw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="82499448" Received: from sramkris-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4..) 
([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:06 -0700
From: Rick Edgecombe
To: pbonzini@redhat.com, seanjc@google.com
Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com, Isaku Yamahata, Binbin Wu
Subject: [PATCH v2 21/25] KVM: TDX: Don't offline the last cpu of one package when there's TDX guest
Date: Wed, 30 Oct 2024 12:00:34 -0700
Message-ID: <20241030190039.77971-22-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com>
References: <20241030190039.77971-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Isaku Yamahata

Destroying a TDX guest requires at least one online cpu in each package,
because reclaiming the guest's TDX KeyID (as part of the teardown process)
requires calling a SEAMCALL (on any cpu) on every package.

Do not offline the last cpu of a package while any TDX guest is running;
otherwise KVM may be unable to tear down the TDX guest, leaking the TDX
KeyID and other resources such as the TDX guest control structure pages.

Implement the TDX version of 'offline_cpu()' to prevent a cpu from going
offline if it is the last online cpu in its package.
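
As a rough illustration of the rule described above, the check can be modelled in userspace C. This is a sketch under assumptions, not the kernel implementation: struct model_cpu, model_topology and model_offline_cpu() are hypothetical names, the topology is made up, and the nr_configured_hkid argument merely mirrors the idea of counting TDs that currently hold a KeyID.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a CPU; not a kernel structure. */
struct model_cpu {
	bool online;
	int package_id;
};

/* Made-up topology: package 0 has two online CPUs, package 1 has one. */
static const struct model_cpu model_topology[] = {
	{ .online = true,  .package_id = 0 },
	{ .online = true,  .package_id = 0 },
	{ .online = true,  .package_id = 1 },
	{ .online = false, .package_id = 1 },
};

/*
 * Return 0 if 'cpu' may go offline, or -EBUSY if it is the last online
 * CPU of its package while at least one TD holds a KeyID.
 */
static int model_offline_cpu(const struct model_cpu *cpus, size_t nr_cpus,
			     size_t cpu, int nr_configured_hkid)
{
	size_t i;

	assert(cpu < nr_cpus);

	/* No TD holds a KeyID: any CPU may go offline. */
	if (!nr_configured_hkid)
		return 0;

	for (i = 0; i < nr_cpus; i++) {
		/* Another online CPU in the same package: allow. */
		if (i != cpu && cpus[i].online &&
		    cpus[i].package_id == cpus[cpu].package_id)
			return 0;
	}

	/* 'cpu' is the last online CPU of its package: refuse. */
	return -EBUSY;
}
```

With this made-up topology, a request to offline CPU 0 is allowed because CPU 1 shares its package, while a request to offline CPU 2 is refused for as long as the KeyID count is non-zero.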
Co-developed-by: Kai Huang Signed-off-by: Kai Huang Suggested-by: Sean Christopherson Signed-off-by: Isaku Yamahata Signed-off-by: Rick Edgecombe Reviewed-by: Binbin Wu --- uAPI breakout v2: - Update description to leave out stale part (Binbin) - Add some local hkid tracking on KVM side, now that the allocator is in arch/x86 code (Kai) uAPI breakout v1: - Remove nr_configured_keyid, use ida_is_empty() instead (Chao) - Change to use a simpler way to check whether the to-go-offline cpu is the last online cpu on the package. (Chao) - Improve the changelog (Kai) - Improve the patch title to call out "when there's TDX guest". (Kai) - Significantly reduce the code by using TDX's own CPUHP callback, instead of hooking into KVM's. - Update changelog to reflect the change. v18: - Added reviewed-by BinBin --- arch/x86/kvm/vmx/tdx.c | 43 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 42 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index ac224d79ba1e..17df857ae4c1 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -122,6 +122,8 @@ static int init_kvm_tdx_caps(const struct tdx_sys_info_= td_conf *td_conf, */ static DEFINE_MUTEX(tdx_lock); =20 +static atomic_t nr_configured_hkid; + /* Maximum number of retries to attempt for SEAMCALLs. 
*/ #define TDX_SEAMCALL_RETRIES 10000 =20 @@ -134,6 +136,7 @@ static inline void tdx_hkid_free(struct kvm_tdx *kvm_td= x) { tdx_guest_keyid_free(kvm_tdx->hkid); kvm_tdx->hkid =3D -1; + atomic_dec(&nr_configured_hkid); } =20 static inline bool is_hkid_assigned(struct kvm_tdx *kvm_tdx) @@ -612,6 +615,8 @@ static int __tdx_td_init(struct kvm *kvm, struct td_par= ams *td_params, return ret; kvm_tdx->hkid =3D ret; =20 + atomic_inc(&nr_configured_hkid); + va =3D __get_free_page(GFP_KERNEL_ACCOUNT); if (!va) goto free_hkid; @@ -913,6 +918,42 @@ static int tdx_online_cpu(unsigned int cpu) return r; } =20 +static int tdx_offline_cpu(unsigned int cpu) +{ + int i; + + /* No TD is running. Allow any cpu to be offline. */ + if (!atomic_read(&nr_configured_hkid)) + return 0; + + /* + * In order to reclaim TDX HKID, (i.e. when deleting guest TD), need to + * call TDH.PHYMEM.PAGE.WBINVD on all packages to program all memory + * controller with pconfig. If we have active TDX HKID, refuse to + * offline the last online cpu. + */ + for_each_online_cpu(i) { + /* + * Found another online cpu on the same package. + * Allow to offline. + */ + if (i !=3D cpu && topology_physical_package_id(i) =3D=3D + topology_physical_package_id(cpu)) + return 0; + } + + /* + * This is the last cpu of this package. Don't offline it. + * + * Because it's hard for human operator to understand the + * reason, warn it. + */ +#define MSG_ALLPKG_ONLINE \ + "TDX requires all packages to have an online CPU. 
Delete all TDs in order= to offline all CPUs of a package.\n" + pr_warn_ratelimited(MSG_ALLPKG_ONLINE); + return -EBUSY; +} + static void __do_tdx_cleanup(void) { /* @@ -938,7 +979,7 @@ static int __init __do_tdx_bringup(void) */ r =3D cpuhp_setup_state_cpuslocked(CPUHP_AP_ONLINE_DYN, "kvm/cpu/tdx:online", - tdx_online_cpu, NULL); + tdx_online_cpu, tdx_offline_cpu); if (r < 0) return r; =20 --=20 2.47.0 From nobody Mon Nov 25 03:07:01 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 454C822803D; Wed, 30 Oct 2024 19:01:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314879; cv=none; b=QWwbxaBeB61PnYxI95c+wvuqOhF1JdUz8ieOHA1/SY0RbDAf4ZVS22qaOElhP+nTGPlgYeh3t4I466rb9Pl8QlPBuBBucAr74wBMkoLgE+M/pFxLIGofCQSdhYLjS2GAv+S9yYDC9shvp0cV1n11kvouNSUGRxmp233/c177A0s= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314879; c=relaxed/simple; bh=fqmwphDCrskUaL59MGcD/o60ek35Jc4EKTRV3Dd/iMc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=BSaDKCc+SIX003tQ4PI3aaw9H4COT0+acS6cL4m35wf0g7uGn4nLGtEm+cedfBI56ftJr4uj1bc22oib5iZFYi5zqkmPzRFGe8hZpfb8uM5UA7gkaeqmIIvlySObCfii9FJ07OJ43T+SgGuEI1l2+qBuW6y2B3fTPnoFQpSd7os= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=nXrV47gV; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; 
dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="nXrV47gV" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1730314877; x=1761850877; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=fqmwphDCrskUaL59MGcD/o60ek35Jc4EKTRV3Dd/iMc=; b=nXrV47gVAU+VjlveJeC6DMi56X7MH9uwEd1jdQChzRAYqlaGtF64CZYU VCMN/D2ZffN1SzWlwPn2lrk7Z11ge4l5zJUvHJ3NdeH0Lf6cwwH6VxyfC bO0kNndnxRImwIvpBxJcFOt6mv/gKf6WgvLafeV7uLhxEdrlYkYrGrKJW lIv0ZbB0nFXxf4y0PHJegNjVcFa4442DLcy5hcmc9a/AmcFdL8zB51ExY IjxUrzsNS/VMUCw1G2YR+njE+GcUf+dU+RqAZqCweO+DKq0q0IaeD+yf0 AaTZ4ObbS7u/oizwvlG2ffAQOlooH6Sb75c01Dzye779H1b0mgfqOVNKd w==; X-CSE-ConnectionGUID: sWDU3JisQ3GgQR9kjXHQjw== X-CSE-MsgGUID: 19HHaE1dQ3uhxfhPyhe7TQ== X-IronPort-AV: E=McAfee;i="6700,10204,11241"; a="17678843" X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="17678843" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:07 -0700 X-CSE-ConnectionGUID: Pr6ol+dMTJ2NETEbXzs0kg== X-CSE-MsgGUID: zQmAWYC2SyWkBoeUEIvyEA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="82499453" Received: from sramkris-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4..) 
([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:07 -0700
From: Rick Edgecombe
To: pbonzini@redhat.com, seanjc@google.com
Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com, Isaku Yamahata
Subject: [PATCH v2 22/25] KVM: TDX: create/free TDX vcpu structure
Date: Wed, 30 Oct 2024 12:00:35 -0700
Message-ID: <20241030190039.77971-23-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com>
References: <20241030190039.77971-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Isaku Yamahata

Implement vcpu-related stubs for TDX for create, reset and free.

For now, create only the features that do not require a TDX SEAMCALL.
The TDX-specific vcpu initialization will be handled by KVM_TDX_INIT_VCPU.

Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
---
uAPI breakout v2:
- Drop dummy tdx_vcpu_reset() (Binbin)
- Add TD state handling (Tony)

uAPI breakout v1:
- Dropped unnecessary WARN_ON_ONCE() in tdx_vcpu_create().
  WARN_ON_ONCE(vcpu->arch.cpuid_entries),
  WARN_ON_ONCE(vcpu->arch.cpuid_nent)
- Use kvm_tdx instead of to_kvm_tdx() in tdx_vcpu_create() (Chao)

v19:
- Removed stale comment in tdx_vcpu_create().

v18:
- Update commit log to use create instead of allocate because the patch
  doesn't newly allocate memory for TDX vcpu.

v16:
- Add AMX support as the KVM upstream supports it.
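
To make the shape of this patch easier to follow, here is a small userspace sketch of the two ideas it combines: a wrapper that dispatches to a TDX or VMX implementation, and a TDX create path that refuses to run until the TD has been initialized. Everything below is an assumption made for illustration: struct model_vm, enum model_td_state and the model_*() functions are hypothetical and only loosely mirror the vt_*(), tdx_*() and vmx_*() functions in the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the three TD states named in the series; values are illustrative. */
enum model_td_state {
	MODEL_TD_UNINITIALIZED = 0,
	MODEL_TD_INITIALIZED,
	MODEL_TD_RUNNABLE,
};

/* Hypothetical stand-in for a VM; not the kernel's struct kvm. */
struct model_vm {
	bool is_td;
	enum model_td_state state;
};

/* TDX path: vcpus may only be created once the TD is initialized. */
static int model_tdx_vcpu_create(const struct model_vm *vm)
{
	if (vm->state != MODEL_TD_INITIALIZED)
		return -EIO;

	return 0;
}

/* VMX path: placeholder that always succeeds in this model. */
static int model_vmx_vcpu_create(const struct model_vm *vm)
{
	(void)vm;
	return 0;
}

/* Dispatcher in the style of the vt_*() wrappers. */
static int model_vt_vcpu_create(const struct model_vm *vm)
{
	assert(vm != NULL);

	if (vm->is_td)
		return model_tdx_vcpu_create(vm);

	return model_vmx_vcpu_create(vm);
}
```

Keeping the TD check in one wrapper per operation means the VMX code path stays untouched for non-TD guests.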
-- 2.46.0 --- arch/x86/kvm/vmx/main.c | 42 ++++++++++++++++++++++++++++++++++---- arch/x86/kvm/vmx/tdx.c | 34 ++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/x86_ops.h | 8 ++++++++ 3 files changed, 80 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 559f9450dec7..0548d54eb055 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -64,6 +64,40 @@ static void vt_vm_free(struct kvm *kvm) tdx_vm_free(kvm); } =20 +static int vt_vcpu_precreate(struct kvm *kvm) +{ + if (is_td(kvm)) + return 0; + + return vmx_vcpu_precreate(kvm); +} + +static int vt_vcpu_create(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return tdx_vcpu_create(vcpu); + + return vmx_vcpu_create(vcpu); +} + +static void vt_vcpu_free(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) { + tdx_vcpu_free(vcpu); + return; + } + + vmx_vcpu_free(vcpu); +} + +static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_vcpu_reset(vcpu, init_event); +} + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { if (!is_td(kvm)) @@ -100,10 +134,10 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .vm_destroy =3D vt_vm_destroy, .vm_free =3D vt_vm_free, =20 - .vcpu_precreate =3D vmx_vcpu_precreate, - .vcpu_create =3D vmx_vcpu_create, - .vcpu_free =3D vmx_vcpu_free, - .vcpu_reset =3D vmx_vcpu_reset, + .vcpu_precreate =3D vt_vcpu_precreate, + .vcpu_create =3D vt_vcpu_create, + .vcpu_free =3D vt_vcpu_free, + .vcpu_reset =3D vt_vcpu_reset, =20 .prepare_switch_to_guest =3D vmx_prepare_switch_to_guest, .vcpu_load =3D vmx_vcpu_load, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 17df857ae4c1..479ffb8f41c8 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -416,6 +416,40 @@ int tdx_vm_init(struct kvm *kvm) return 0; } =20 +int tdx_vcpu_create(struct kvm_vcpu *vcpu) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(vcpu->kvm); + + if (kvm_tdx->state !=3D 
TD_STATE_INITIALIZED) + return -EIO; + + /* TDX only supports x2APIC, which requires an in-kernel local APIC. */ + if (!vcpu->arch.apic) + return -EINVAL; + + fpstate_set_confidential(&vcpu->arch.guest_fpu); + + vcpu->arch.efer =3D EFER_SCE | EFER_LME | EFER_LMA | EFER_NX; + + vcpu->arch.cr0_guest_owned_bits =3D -1ul; + vcpu->arch.cr4_guest_owned_bits =3D -1ul; + + vcpu->arch.tsc_offset =3D kvm_tdx->tsc_offset; + vcpu->arch.l1_tsc_offset =3D vcpu->arch.tsc_offset; + vcpu->arch.guest_state_protected =3D + !(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTR_DEBUG); + + if ((kvm_tdx->xfam & XFEATURE_MASK_XTILE) =3D=3D XFEATURE_MASK_XTILE) + vcpu->arch.xfd_no_write_intercept =3D true; + + return 0; +} + +void tdx_vcpu_free(struct kvm_vcpu *vcpu) +{ + /* This is stub for now. More logic will come. */ +} + static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) { const struct tdx_sys_info_td_conf *td_conf =3D &tdx_sysinfo->td_conf; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index e7d5afce68f0..107c60ac94f4 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -122,12 +122,20 @@ void vmx_setup_mce(struct kvm_vcpu *vcpu); int tdx_vm_init(struct kvm *kvm); void tdx_mmu_release_hkid(struct kvm *kvm); void tdx_vm_free(struct kvm *kvm); + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); + +int tdx_vcpu_create(struct kvm_vcpu *vcpu); +void tdx_vcpu_free(struct kvm_vcpu *vcpu); #else static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; } static inline void tdx_mmu_release_hkid(struct kvm *kvm) {} static inline void tdx_vm_free(struct kvm *kvm) {} + static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { retur= n -EOPNOTSUPP; } + +static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTS= UPP; } +static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {} #endif =20 #endif /* __KVM_X86_VMX_X86_OPS_H */ --=20 2.47.0 From nobody Mon Nov 25 03:07:01 2024 Received: from mgamail.intel.com 
(mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DAE0B2281CE; Wed, 30 Oct 2024 19:01:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314882; cv=none; b=VtX9OrgGP0Q2EF474q5BHz2x078uaoFiwDHCUwJCSSifgWU4bLor59C+LFKWCCU22VVd4d6rqqQNxREROsl8m4ClW3er1m0agCesF0E7KXykVqIl1SuiY1GXEP/xBmuFx/uVpgvxsgVluAqRjbTDD1j3LXCUMr7IdDbwk+tmhLw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314882; c=relaxed/simple; bh=F2bJ2/m6qJuvCutjYJe3t4NmQ/BD46xmLmcXSFwO62o=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=UH1gp7oLJSi/lB6YtT3jmx7ggh0NBBjekJ2QYikv3AeVQ5kLsv5BhiKQdCus4C37Lg+smvCqZJ2C2BKkZxOobqGGpLsCrVzbIIz8DdR6Xo4b3WK6b1IFq7QlweZPYuWz6Wk8MEhHhgsowAOqjJyPaRojlG0rqJzs/RefNRK1eNY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=SIu7pD0r; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="SIu7pD0r" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1730314879; x=1761850879; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=F2bJ2/m6qJuvCutjYJe3t4NmQ/BD46xmLmcXSFwO62o=; b=SIu7pD0rabIPTEZztFufirDAnnzCfvw3lfkfooEHKlqkQetlrueByYlm ffzQlEWg42KfUb4jKlo5OyHrG/ZJY526BeCL7/ZkhdhUmCvQNtE1bD3XI 
lFsPISd1rM4EQU949SbMemz1Nay/+lXBBMNOb7eVnGq8+yZ+wZ5+b5OIB 7WCMjByGqqbLU5bwF4uqWzvZ6pA+y7XMkR+E7HOBwYAYHoKnQyY9kP0v5 NtwVQ4KARIHRpQPhYhGk+2qj3JgLTqgoOxtcJd4F2NdeD3yaqKFJENi77 b3UHh9rEP+2Cpfhwdd6wIk6iSviEmYmybwGwMjyaNL9uI+sQVLvR4TVvO w==; X-CSE-ConnectionGUID: wp+vvAnnSaaFyBMyneXXXg== X-CSE-MsgGUID: r9qv2iAETeef0sL+nxzVAw== X-IronPort-AV: E=McAfee;i="6700,10204,11241"; a="17678849" X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="17678849" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:08 -0700 X-CSE-ConnectionGUID: QXu4dlQeSoSWBS27JUAWMQ== X-CSE-MsgGUID: CeygPoN7TEiW6Oo9fbL0AQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="82499460" Received: from sramkris-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4..) ([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:07 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com, Isaku Yamahata , Sean Christopherson , Adrian Hunter Subject: [PATCH v2 23/25] KVM: TDX: Do TDX specific vcpu initialization Date: Wed, 30 Oct 2024 12:00:36 -0700 Message-ID: <20241030190039.77971-24-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com> References: <20241030190039.77971-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata TD guest vcpu needs TDX specific initialization before running. 
Repurpose KVM_MEMORY_ENCRYPT_OP to vcpu-scope, add a new sub-command
KVM_TDX_INIT_VCPU, and implement the callback for it.

Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Co-developed-by: Tony Lindgren
Signed-off-by: Tony Lindgren
Co-developed-by: Adrian Hunter
Signed-off-by: Adrian Hunter
Signed-off-by: Rick Edgecombe
---
uAPI breakout v2:
- Drop dummy tdx_vcpu_reset() (Binbin)
- Calculate nr_vcpu_tdcx_pages on init (Nikolay)
- Use vcpu_tdcx naming instead of just tdcx (Yuan)
- No need for is_td_vcpu_created() (Rick)
- No need for is_td_finalized() (Rick)
- Export functions used (Binbin)
- Fixup SEAMCALL call sites due to function parameter changes to SEAMCALL
  wrappers (Kai)
- Add TD state handling (Tony)
- Clarify comment wrt is_hkid_assigned() in tdx_vcpu_free() (Yuan)
- Fix error paths in tdx_td_vcpu_init() (Yuan)
- Do not unnecessarily leak tdx->tdvpr_pa in tdx_vcpu_free() (Yuan)

uAPI breakout v1:
- Support FEATURES0_TOPOLOGY_ENUM
- Update for the wrapper functions for SEAMCALLs. (Sean)
- Remove WARN_ON_ONCE() in tdx_vcpu_free().
  WARN_ON_ONCE(vcpu->cpu != -1), WARN_ON_ONCE(tdx->tdvpx_pa),
  WARN_ON_ONCE(tdx->tdvpr_pa)
- Remove KVM_BUG_ON() in tdx_vcpu_reset().
- Remove duplicate "tdx->tdvpr_pa=" lines
- Rename tdvpx to tdcx as it is confusing, follow spec change for same
  reason (Isaku)
- Updates from seamcall overhaul (Kai)
  - Rename error->hw_error
  - Change using tdx_info to using exported 'tdx_sysinfo' pointer in
    tdx_td_vcpu_init().
  - Remove code to the old (non-existing) tdx_module_setup().
  - Use a new wrapper tdx_sysinfo_nr_tdcx_pages() to replace
    tdx_info->nr_tdcx_pages.
- Combine the two for loops in tdx_td_vcpu_init() (Chao)
- Add more line breaks into tdx_td_vcpu_init() for readability (Tony)
- Drop local tdcx_pa in tdx_td_vcpu_init() (Rick)
- Drop local tdvpr_pa in tdx_td_vcpu_init() (Rick)

v18:
- Use tdh_sys_rd() instead of struct tdsysinfo_struct.
- Rename tdx_reclaim_td_page() =3D> tdx_reclaim_control_page() - Remove the change of tools/arch/x86/include/uapi/asm/kvm.h. --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 1 + arch/x86/include/uapi/asm/kvm.h | 1 + arch/x86/kvm/vmx/main.c | 9 ++ arch/x86/kvm/vmx/tdx.c | 182 ++++++++++++++++++++++++++++- arch/x86/kvm/vmx/tdx.h | 13 ++- arch/x86/kvm/vmx/tdx_arch.h | 2 + arch/x86/kvm/vmx/x86_ops.h | 4 + arch/x86/kvm/x86.c | 8 ++ 9 files changed, 219 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index e7bd7867cb94..ec1b1b39c6b3 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -125,6 +125,7 @@ KVM_X86_OP(enable_smi_window) #endif KVM_X86_OP_OPTIONAL(dev_get_attr) KVM_X86_OP(mem_enc_ioctl) +KVM_X86_OP_OPTIONAL(vcpu_mem_enc_ioctl) KVM_X86_OP_OPTIONAL(mem_enc_register_region) KVM_X86_OP_OPTIONAL(mem_enc_unregister_region) KVM_X86_OP_OPTIONAL(vm_copy_enc_context_from) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index d8478e103f07..dfa89a5d15ef 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1827,6 +1827,7 @@ struct kvm_x86_ops { =20 int (*dev_get_attr)(u32 group, u64 attr, u64 *val); int (*mem_enc_ioctl)(struct kvm *kvm, void __user *argp); + int (*vcpu_mem_enc_ioctl)(struct kvm_vcpu *vcpu, void __user *argp); int (*mem_enc_register_region)(struct kvm *kvm, struct kvm_enc_region *ar= gp); int (*mem_enc_unregister_region)(struct kvm *kvm, struct kvm_enc_region *= argp); int (*vm_copy_enc_context_from)(struct kvm *kvm, unsigned int source_fd); diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kv= m.h index 892e16bd7430..2cfec4b42b9d 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -930,6 +930,7 @@ struct kvm_hyperv_eventfd { enum kvm_tdx_cmd_id { KVM_TDX_CAPABILITIES =3D 0, KVM_TDX_INIT_VM, + 
KVM_TDX_INIT_VCPU, =20 KVM_TDX_CMD_NR_MAX, }; diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 0548d54eb055..d28ffddd766f 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -106,6 +106,14 @@ static int vt_mem_enc_ioctl(struct kvm *kvm, void __us= er *argp) return tdx_vm_ioctl(kvm, argp); } =20 +static int vt_vcpu_mem_enc_ioctl(struct kvm_vcpu *vcpu, void __user *argp) +{ + if (!is_td_vcpu(vcpu)) + return -EINVAL; + + return tdx_vcpu_ioctl(vcpu, argp); +} + #define VMX_REQUIRED_APICV_INHIBITS \ (BIT(APICV_INHIBIT_REASON_DISABLED) | \ BIT(APICV_INHIBIT_REASON_ABSENT) | \ @@ -260,6 +268,7 @@ struct kvm_x86_ops vt_x86_ops __initdata =3D { .get_untagged_addr =3D vmx_get_untagged_addr, =20 .mem_enc_ioctl =3D vt_mem_enc_ioctl, + .vcpu_mem_enc_ioctl =3D vt_vcpu_mem_enc_ioctl, }; =20 struct kvm_x86_init_ops vt_init_ops __initdata =3D { diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 479ffb8f41c8..9008db6cf3b4 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -419,6 +419,7 @@ int tdx_vm_init(struct kvm *kvm) int tdx_vcpu_create(struct kvm_vcpu *vcpu) { struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(vcpu->kvm); + struct vcpu_tdx *tdx =3D to_tdx(vcpu); =20 if (kvm_tdx->state !=3D TD_STATE_INITIALIZED) return -EIO; @@ -442,12 +443,42 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) if ((kvm_tdx->xfam & XFEATURE_MASK_XTILE) =3D=3D XFEATURE_MASK_XTILE) vcpu->arch.xfd_no_write_intercept =3D true; =20 + tdx->state =3D VCPU_TD_STATE_UNINITIALIZED; + return 0; } =20 void tdx_vcpu_free(struct kvm_vcpu *vcpu) { - /* This is stub for now. More logic will come. */ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(vcpu->kvm); + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + int i; + + /* + * It is not possible to reclaim pages while hkid is assigned. It might + * be assigned if: + * 1. the TD VM is being destroyed but freeing hkid failed, in which + * case the pages are leaked + * 2. 
TD VCPU creation failed and this is on the error path, in which case + * there is nothing to do anyway + */ + if (is_hkid_assigned(kvm_tdx)) + return; + + if (tdx->tdcx_pa) { + for (i =3D 0; i < kvm_tdx->nr_vcpu_tdcx_pages; i++) { + if (tdx->tdcx_pa[i]) + tdx_reclaim_control_page(tdx->tdcx_pa[i]); + } + kfree(tdx->tdcx_pa); + tdx->tdcx_pa =3D NULL; + } + if (tdx->tdvpr_pa) { + tdx_reclaim_control_page(tdx->tdvpr_pa); + tdx->tdvpr_pa =3D 0; + } + + tdx->state =3D VCPU_TD_STATE_UNINITIALIZED; } =20 static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) @@ -657,6 +688,9 @@ static int __tdx_td_init(struct kvm *kvm, struct td_par= ams *td_params, tdr_pa =3D __pa(va); =20 kvm_tdx->nr_tdcs_pages =3D tdx_sysinfo->td_ctrl.tdcs_base_size / PAGE_SIZ= E; + /* TDVPS =3D TDVPR(4K page) + TDCX(multiple 4K pages), -1 for TDVPR. */ + kvm_tdx->nr_vcpu_tdcx_pages =3D tdx_sysinfo->td_ctrl.tdvps_base_size / PA= GE_SIZE - 1; + tdcs_pa =3D kcalloc(kvm_tdx->nr_tdcs_pages, sizeof(*kvm_tdx->tdcs_pa), GFP_KERNEL_ACCOUNT | __GFP_ZERO); if (!tdcs_pa) @@ -936,6 +970,152 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) return r; } =20 +/* VMM can pass one 64-bit auxiliary value to vcpu via RCX for guest BIOS. 
*/ +static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx) +{ + const struct tdx_sys_info_features *modinfo =3D &tdx_sysinfo->features; + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(vcpu->kvm); + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + unsigned long va; + int ret, i; + u64 err; + + va =3D __get_free_page(GFP_KERNEL_ACCOUNT); + if (!va) + return -ENOMEM; + tdx->tdvpr_pa =3D __pa(va); + + tdx->tdcx_pa =3D kcalloc(kvm_tdx->nr_vcpu_tdcx_pages, sizeof(*tdx->tdcx_p= a), + GFP_KERNEL_ACCOUNT); + if (!tdx->tdcx_pa) { + ret =3D -ENOMEM; + goto free_tdvpr; + } + + for (i =3D 0; i < kvm_tdx->nr_vcpu_tdcx_pages; i++) { + va =3D __get_free_page(GFP_KERNEL_ACCOUNT); + if (!va) { + ret =3D -ENOMEM; + goto free_tdcx; + } + tdx->tdcx_pa[i] =3D __pa(va); + } + + err =3D tdh_vp_create(kvm_tdx->tdr_pa, tdx->tdvpr_pa); + if (KVM_BUG_ON(err, vcpu->kvm)) { + ret =3D -EIO; + pr_tdx_error(TDH_VP_CREATE, err); + goto free_tdcx; + } + + for (i =3D 0; i < kvm_tdx->nr_vcpu_tdcx_pages; i++) { + err =3D tdh_vp_addcx(tdx->tdvpr_pa, tdx->tdcx_pa[i]); + if (KVM_BUG_ON(err, vcpu->kvm)) { + pr_tdx_error(TDH_VP_ADDCX, err); + /* + * Pages already added are reclaimed by the vcpu_free + * method, but the rest are freed here. 
+ */ + for (; i < kvm_tdx->nr_vcpu_tdcx_pages; i++) { + free_page((unsigned long)__va(tdx->tdcx_pa[i])); + tdx->tdcx_pa[i] =3D 0; + } + return -EIO; + } + } + + if (modinfo->tdx_features0 & MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM) + err =3D tdh_vp_init_apicid(tdx->tdvpr_pa, vcpu_rcx, vcpu->vcpu_id); + else + err =3D tdh_vp_init(tdx->tdvpr_pa, vcpu_rcx); + + if (KVM_BUG_ON(err, vcpu->kvm)) { + pr_tdx_error(TDH_VP_INIT, err); + return -EIO; + } + + vcpu->arch.mp_state =3D KVM_MP_STATE_RUNNABLE; + + return 0; + +free_tdcx: + for (i =3D 0; i < kvm_tdx->nr_vcpu_tdcx_pages; i++) { + if (tdx->tdcx_pa[i]) + free_page((unsigned long)__va(tdx->tdcx_pa[i])); + tdx->tdcx_pa[i] =3D 0; + } + kfree(tdx->tdcx_pa); + tdx->tdcx_pa =3D NULL; + +free_tdvpr: + if (tdx->tdvpr_pa) + free_page((unsigned long)__va(tdx->tdvpr_pa)); + tdx->tdvpr_pa =3D 0; + + return ret; +} + +static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd) +{ + struct msr_data apic_base_msr; + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + int ret; + + if (cmd->flags) + return -EINVAL; + + if (tdx->state !=3D VCPU_TD_STATE_UNINITIALIZED) + return -EINVAL; + + /* + * As TDX requires X2APIC, set local apic mode to X2APIC. User space + * VMM, e.g. qemu, is required to set CPUID[0x1].ecx.X2APIC=3D1 by + * KVM_SET_CPUID2. Otherwise kvm_set_apic_base() will fail. + */ + apic_base_msr =3D (struct msr_data) { + .host_initiated =3D true, + .data =3D APIC_DEFAULT_PHYS_BASE | LAPIC_MODE_X2APIC | + (kvm_vcpu_is_reset_bsp(vcpu) ? 
MSR_IA32_APICBASE_BSP : 0), + }; + if (kvm_set_apic_base(vcpu, &apic_base_msr)) + return -EINVAL; + + ret =3D tdx_td_vcpu_init(vcpu, (u64)cmd->data); + if (ret) + return ret; + + tdx->state =3D VCPU_TD_STATE_INITIALIZED; + + return 0; +} + +int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(vcpu->kvm); + struct kvm_tdx_cmd cmd; + int ret; + + if (!is_hkid_assigned(kvm_tdx) || kvm_tdx->state =3D=3D TD_STATE_RUNNABLE) + return -EINVAL; + + if (copy_from_user(&cmd, argp, sizeof(cmd))) + return -EFAULT; + + if (cmd.hw_error) + return -EINVAL; + + switch (cmd.id) { + case KVM_TDX_INIT_VCPU: + ret =3D tdx_vcpu_init(vcpu, &cmd); + break; + default: + ret =3D -EINVAL; + break; + } + + return ret; +} + static int tdx_online_cpu(unsigned int cpu) { unsigned long flags; diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 1fcb7c1b078d..1b78a7ea988e 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -27,15 +27,26 @@ struct kvm_tdx { u64 xfam; int hkid; u8 nr_tdcs_pages; + u8 nr_vcpu_tdcx_pages; =20 u64 tsc_offset; =20 enum kvm_tdx_state state; }; =20 +/* TDX module vCPU states */ +enum vcpu_tdx_state { + VCPU_TD_STATE_UNINITIALIZED =3D 0, + VCPU_TD_STATE_INITIALIZED, +}; + struct vcpu_tdx { struct kvm_vcpu vcpu; - /* TDX specific members follow. 
*/ + + unsigned long tdvpr_pa; + unsigned long *tdcx_pa; + + enum vcpu_tdx_state state; }; =20 static inline bool is_td(struct kvm *kvm) diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h index 84af7666e958..9d41699e66a2 100644 --- a/arch/x86/kvm/vmx/tdx_arch.h +++ b/arch/x86/kvm/vmx/tdx_arch.h @@ -155,4 +155,6 @@ struct td_params { #define TDX_MIN_TSC_FREQUENCY_KHZ (100 * 1000) #define TDX_MAX_TSC_FREQUENCY_KHZ (10 * 1000 * 1000) =20 +#define MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM BIT_ULL(20) + #endif /* __KVM_X86_TDX_ARCH_H */ diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 107c60ac94f4..4739891858ea 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -127,6 +127,8 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp); =20 int tdx_vcpu_create(struct kvm_vcpu *vcpu); void tdx_vcpu_free(struct kvm_vcpu *vcpu); + +int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); #else static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; } static inline void tdx_mmu_release_hkid(struct kvm *kvm) {} @@ -136,6 +138,8 @@ static inline int tdx_vm_ioctl(struct kvm *kvm, void __= user *argp) { return -EOP =20 static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTS= UPP; } static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {} + +static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)= { return -EOPNOTSUPP; } #endif =20 #endif /* __KVM_X86_VMX_X86_OPS_H */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 95a10c7bc507..92de7ebf2cee 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -698,6 +698,7 @@ int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr= _data *msr_info) kvm_recalculate_apic_map(vcpu->kvm); return 0; } +EXPORT_SYMBOL_GPL(kvm_set_apic_base); =20 /* * Handle a fault on a hardware virtualization (VMX or SVM) instruction. 
@@ -6308,6 +6309,12 @@ long kvm_arch_vcpu_ioctl(struct file *filp, case KVM_SET_DEVICE_ATTR: r =3D kvm_vcpu_ioctl_device_attr(vcpu, ioctl, argp); break; + case KVM_MEMORY_ENCRYPT_OP: + r =3D -ENOTTY; + if (!kvm_x86_ops.vcpu_mem_enc_ioctl) + goto out; + r =3D kvm_x86_ops.vcpu_mem_enc_ioctl(vcpu, argp); + break; default: r =3D -EINVAL; } @@ -12663,6 +12670,7 @@ bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu) { return vcpu->kvm->arch.bsp_vcpu_id =3D=3D vcpu->vcpu_id; } +EXPORT_SYMBOL_GPL(kvm_vcpu_is_reset_bsp); =20 bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu) { --=20 2.47.0 From nobody Mon Nov 25 03:07:01 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EE0B02281D3; Wed, 30 Oct 2024 19:01:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314881; cv=none; b=scuO+EeJCps2FwuvTtKjaID74zR4H336msU6x78pljVazXEsB/04lYRGiFEEkDEU1EXQ7jy8c5g2Q0RONTE4cduyiLMUW1wSOzOOvxWUrpldkswnHdYFADn/VNj1aNfwvpZEeu7YavktHCLU6QIHbVclgWWozJNtfohwdmlEhQk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314881; c=relaxed/simple; bh=PFnZhuzjhmZwmPdfn+yrAnFOs+bLGYc33soDXvtJ5ls=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=HwiLpdGq84thZwD43ZNwZ/0MFs9MEz2RsbtJ/jwELML02gGcaIfNx0UCzXXmxMS+60ES2+//zsoDr19/6bIpYvam9VoHi8acLGz65blHdbqVl77pS+y0qegmQg73Fmb63XGUrX57q0mBi5P5mcF387Culh98b3kseHDBs4gNmw8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=AWk9ItZ8; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none 
dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="AWk9ItZ8" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1730314879; x=1761850879; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=PFnZhuzjhmZwmPdfn+yrAnFOs+bLGYc33soDXvtJ5ls=; b=AWk9ItZ8Ao5TFprHQetRZBb5prO4bihuhfWYJeTKcFA2JyhKrK4v2y3Y aBRdsufqNJVR7OsVQ0QLcCyjZDhc0cdnjOt6LzXnktezc4Fd2vbwrScdM qAEoqrfJU61m/ikg1QrjOgGsg4vvDol4FnI0AeSslOkXxlidfFqWmVtIg kIsD5pH3YgdBHpbbezOnyEthzPQsJTIcQPnViQ/HdFk9/thLYRO0nubtf YFX308+ZV7oM4NvNMIVI3d8OU7CLzPO7ci8qWW3ldYJpn/J8XLv7IwfIK sU3kKTLJ/gBO+YKxCH+c27Mi+ZO/G9Qdus1HXhO6o5GjM4RGcLrX3kvPV g==; X-CSE-ConnectionGUID: KEmDUjKKRS6opc7oaEorSA== X-CSE-MsgGUID: ANtsv8NnReurM0KbrxxpAQ== X-IronPort-AV: E=McAfee;i="6700,10204,11241"; a="17678857" X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="17678857" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:08 -0700 X-CSE-ConnectionGUID: HnAeGdStQMSH5siphOsnlA== X-CSE-MsgGUID: FHnolFZfSXyObX9m1gnFKQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="82499465" Received: from sramkris-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4..) 
([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:08 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com Subject: [PATCH v2 24/25] KVM: x86: Introduce KVM_TDX_GET_CPUID Date: Wed, 30 Oct 2024 12:00:37 -0700 Message-ID: <20241030190039.77971-25-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com> References: <20241030190039.77971-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Xiaoyao Li Implement an IOCTL to allow userspace to read the CPUID bit values for a configured TD. The TDX module doesn't provide the ability to set all CPUID bits. Instead some are configured indirectly, or have fixed values. But it does allow for the final resulting CPUID bits to be read. This information will be useful for userspace to understand the configuration of the TD, and set KVM's copy via KVM_SET_CPUID2. To prevent userspace from starting to use features that might not have KVM support yet, filter the reported values by KVM's supported CPUID bits. 
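For illustration, the way a CPUID leaf and sub-leaf are packed into a TD metadata field ID before the read can be sketched as a standalone function. This is a minimal userspace sketch, not kernel code: cpuid_field_id() is a hypothetical name, returning 0 for an unreadable leaf is a convention chosen for the sketch only, and the bit layout and base value are copied from the comment in tdx_read_cpuid() and the tdx_arch.h hunk of this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Base field ID for CPUID values, as added to tdx_arch.h by this patch. */
#define TD_MD_FIELD_ID_CPUID_VALUES 0x9410000300000000ULL

/*
 * Hypothetical helper (not part of the patch): build the field ID of the
 * EBX:EAX element for a CPUID leaf. Returns 0 when the leaf or sub-leaf
 * cannot be expressed in the 7-bit LEAF_6_0 / SUBLEAF_6_0 fields.
 */
static uint64_t cpuid_field_id(uint32_t leaf, uint32_t sub_leaf,
			       bool sub_leaf_set)
{
	uint64_t id = TD_MD_FIELD_ID_CPUID_VALUES;

	/* Leaf bits 30:7 and sub-leaf bits 31:7 must be zero. */
	if ((leaf & 0x7fffff80u) || (sub_leaf & 0xffffff80u))
		return 0;

	id |= (uint64_t)(leaf >> 31) << 16;	/* bit 16: LEAF_31 */
	id |= (uint64_t)(leaf & 0x7f) << 9;	/* bits 15:9: LEAF_6_0 */
	if (sub_leaf_set)
		id |= (uint64_t)(sub_leaf & 0x7f) << 1;	/* bits 7:1, SUBLEAF_NA = 0 */
	else
		id |= 0x1fe;	/* SUBLEAF_NA = 1 and SUBLEAF_6_0 all ones */

	return id;
}
```

Bit 0 (ELEMENT_I) is left clear, so the returned ID selects the EBX:EAX element; the patch reads EDX:ECX by incrementing the field ID by one.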
Signed-off-by: Xiaoyao Li Co-developed-by: Tony Lindgren Signed-off-by: Tony Lindgren Signed-off-by: Rick Edgecombe --- uAPI breakout v2: - Improve error path for tdx_vcpu_get_cpuid() (Xu) - Drop unused cpuid in struct kvm_tdx (Xu) - Rip out cpuid bit filtering - Fixup SEAMCALL call sites due to function parameter changes to SEAMCALL wrappers (Kai) - Add mmu.h for kvm_gfn_direct_bits() (Binbin) - Drop unused nr_premapped (Tao) - Fix formatting for tdx_vcpu_get_cpuid_leaf() (Tony) - Use helpers for phys_addr_bits (Paolo) uAPI breakout v1: - New patch --- arch/x86/include/uapi/asm/kvm.h | 1 + arch/x86/kvm/vmx/tdx.c | 167 ++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/tdx_arch.h | 5 + arch/x86/kvm/vmx/tdx_errno.h | 1 + 4 files changed, 174 insertions(+) diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kv= m.h index 2cfec4b42b9d..36fa03376581 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -931,6 +931,7 @@ enum kvm_tdx_cmd_id { KVM_TDX_CAPABILITIES =3D 0, KVM_TDX_INIT_VM, KVM_TDX_INIT_VCPU, + KVM_TDX_GET_CPUID, =20 KVM_TDX_CMD_NR_MAX, }; diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 9008db6cf3b4..1feb3307fd70 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -2,6 +2,7 @@ #include #include #include "capabilities.h" +#include "mmu.h" #include "x86_ops.h" #include "tdx.h" =20 @@ -857,6 +858,94 @@ static int __tdx_td_init(struct kvm *kvm, struct td_pa= rams *td_params, return ret; } =20 +static u64 tdx_td_metadata_field_read(struct kvm_tdx *tdx, u64 field_id, + u64 *data) +{ + u64 err; + + err =3D tdh_mng_rd(tdx->tdr_pa, field_id, data); + + return err; +} + +#define TDX_MD_UNREADABLE_LEAF_MASK GENMASK(30, 7) +#define TDX_MD_UNREADABLE_SUBLEAF_MASK GENMASK(31, 7) + +static int tdx_read_cpuid(struct kvm_vcpu *vcpu, u32 leaf, u32 sub_leaf, + bool sub_leaf_set, struct kvm_cpuid_entry2 *out) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(vcpu->kvm); + u64 field_id =3D 
TD_MD_FIELD_ID_CPUID_VALUES; + u64 ebx_eax, edx_ecx; + u64 err =3D 0; + + if (leaf & TDX_MD_UNREADABLE_LEAF_MASK || + sub_leaf & TDX_MD_UNREADABLE_SUBLEAF_MASK) + return -EINVAL; + + /* + * bit 23:17, RESERVED: reserved, must be 0; + * bit 16, LEAF_31: leaf number bit 31; + * bit 15:9, LEAF_6_0: leaf number bits 6:0, leaf bits 30:7 are + * implicitly 0; + * bit 8, SUBLEAF_NA: sub-leaf not applicable flag; + * bit 7:1, SUBLEAF_6_0: sub-leaf number bits 6:0. If SUBLEAF_NA is 1, + * the SUBLEAF_6_0 is all-1. + * sub-leaf bits 31:7 are implicitly 0; + * bit 0, ELEMENT_I: Element index within field; + */ + field_id |=3D ((leaf & 0x80000000) ? 1 : 0) << 16; + field_id |=3D (leaf & 0x7f) << 9; + if (sub_leaf_set) + field_id |=3D (sub_leaf & 0x7f) << 1; + else + field_id |=3D 0x1fe; + + err =3D tdx_td_metadata_field_read(kvm_tdx, field_id, &ebx_eax); + if (err) //TODO check for specific errors + goto err_out; + + out->eax =3D (u32) ebx_eax; + out->ebx =3D (u32) (ebx_eax >> 32); + + field_id++; + err =3D tdx_td_metadata_field_read(kvm_tdx, field_id, &edx_ecx); + /* + * It's weird that reading edx_ecx fails while reading ebx_eax + * succeeded. + */ + if (WARN_ON_ONCE(err)) + goto err_out; + + out->ecx =3D (u32) edx_ecx; + out->edx =3D (u32) (edx_ecx >> 32); + + out->function =3D leaf; + out->index =3D sub_leaf; + out->flags |=3D sub_leaf_set ? KVM_CPUID_FLAG_SIGNIFCANT_INDEX : 0; + + /* + * Work around missing support on old TDX modules, fetch + * guest maxpa from gfn_direct_bits. 
+ */ + if (leaf =3D=3D 0x80000008) { + gpa_t gpa_bits =3D gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm)); + unsigned int g_maxpa =3D __ffs(gpa_bits) + 1; + + out->eax =3D tdx_set_guest_phys_addr_bits(out->eax, g_maxpa); + } + + return 0; + +err_out: + out->eax =3D 0; + out->ebx =3D 0; + out->ecx =3D 0; + out->edx =3D 0; + + return -EIO; +} + static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd) { struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); @@ -1055,6 +1144,81 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u= 64 vcpu_rcx) return ret; } =20 +/* Sometimes reads multiple subleafs. Return how many entries were written= . */ +static int tdx_vcpu_get_cpuid_leaf(struct kvm_vcpu *vcpu, u32 leaf, int ma= x_cnt, + struct kvm_cpuid_entry2 *output_e) +{ + int i; + + if (!max_cnt) + return 0; + + /* First try without a subleaf */ + if (!tdx_read_cpuid(vcpu, leaf, 0, false, output_e)) + return 1; + + /* + * If the try without a subleaf failed, try reading subleafs until + * failure. The TDX module only supports 6 bits of subleaf index. + */ + for (i =3D 0; i < 0b111111; i++) { + if (i > max_cnt) + goto out; + + /* Keep reading subleafs until there is a failure. 
*/ + if (tdx_read_cpuid(vcpu, leaf, i, true, output_e)) + return i; + + output_e++; + } + +out: + return i; +} + +static int tdx_vcpu_get_cpuid(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *c= md) +{ + struct kvm_cpuid2 __user *output, *td_cpuid; + struct kvm_cpuid_entry2 *output_e; + int r =3D 0, i =3D 0, leaf; + + output =3D u64_to_user_ptr(cmd->data); + td_cpuid =3D kzalloc(sizeof(*td_cpuid) + + sizeof(output->entries[0]) * KVM_MAX_CPUID_ENTRIES, + GFP_KERNEL); + if (!td_cpuid) + return -ENOMEM; + + for (leaf =3D 0; leaf <=3D 0x1f; leaf++) { + output_e =3D &td_cpuid->entries[i]; + i +=3D tdx_vcpu_get_cpuid_leaf(vcpu, leaf, + KVM_MAX_CPUID_ENTRIES - i - 1, + output_e); + } + + for (leaf =3D 0x80000000; leaf <=3D 0x80000008; leaf++) { + output_e =3D &td_cpuid->entries[i]; + i +=3D tdx_vcpu_get_cpuid_leaf(vcpu, leaf, + KVM_MAX_CPUID_ENTRIES - i - 1, + output_e); + } + + td_cpuid->nent =3D i; + + if (copy_to_user(output, td_cpuid, sizeof(*output))) { + r =3D -EFAULT; + goto out; + } + if (copy_to_user(output->entries, td_cpuid->entries, + td_cpuid->nent * sizeof(struct kvm_cpuid_entry2))) + r =3D -EFAULT; + +out: + kfree(td_cpuid); + + return r; +} + static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd) { struct msr_data apic_base_msr; @@ -1108,6 +1272,9 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user= *argp) case KVM_TDX_INIT_VCPU: ret =3D tdx_vcpu_init(vcpu, &cmd); break; + case KVM_TDX_GET_CPUID: + ret =3D tdx_vcpu_get_cpuid(vcpu, &cmd); + break; default: ret =3D -EINVAL; break; diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h index 9d41699e66a2..d80ec118834e 100644 --- a/arch/x86/kvm/vmx/tdx_arch.h +++ b/arch/x86/kvm/vmx/tdx_arch.h @@ -157,4 +157,9 @@ struct td_params { =20 #define MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM BIT_ULL(20) =20 +/* + * TD scope metadata field ID. 
+ */ +#define TD_MD_FIELD_ID_CPUID_VALUES 0x9410000300000000ULL + #endif /* __KVM_X86_TDX_ARCH_H */ diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h index dc3fa2a58c2c..f9dbb3a065cc 100644 --- a/arch/x86/kvm/vmx/tdx_errno.h +++ b/arch/x86/kvm/vmx/tdx_errno.h @@ -23,6 +23,7 @@ #define TDX_FLUSHVP_NOT_DONE 0x8000082400000000ULL #define TDX_EPT_WALK_FAILED 0xC0000B0000000000ULL #define TDX_EPT_ENTRY_STATE_INCORRECT 0xC0000B0D00000000ULL +#define TDX_METADATA_FIELD_NOT_READABLE 0xC0000C0200000000ULL =20 /* * TDX module operand ID, appears in 31:0 part of error code as --=20 2.47.0 From nobody Mon Nov 25 03:07:01 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DD0302281FE; Wed, 30 Oct 2024 19:01:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314883; cv=none; b=BiTgPPywhiqSqkFZYvqTeO5+6j4pf8oEqMw8j6S3nQmtrznCS/dkgA6y8nMmvGEjabkeIyJ2Q6rpv4Ahf0KoI8WOEmLYBg/l3RrmdtZuhA9NcyXeaf3QTbO3Pxxw2/frB4HZecgR1ZLcer/hJQeU0ioMuggoRie2i7aMU9y8sAo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730314883; c=relaxed/simple; bh=2CVYgvFARR/9Ofmj+WCQSPOPjEa/ehZNBe3S2d4rNUg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=kno2uF9kGanvvKLEOSsN7mphxG4bKsamWbv3dgv7rJk0RDVWiylJ+8Jf5ftfsc13azfmwjRbZDd0PLztmZATtbyr6QkcM4M1/zGnHc7Gmd2P3aQPKZQdNmhZJGGg6JQkoNvTZJsrWyYIzOlcrnOun5aJicOx8GgTREcn9V/sZ4o= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=KBB92bkW; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: 
smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="KBB92bkW" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1730314880; x=1761850880; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=2CVYgvFARR/9Ofmj+WCQSPOPjEa/ehZNBe3S2d4rNUg=; b=KBB92bkWPjwQtUX3OFtFgMMW6uVo6gQG63tDHlCCOX6gEfmRJ4eZ4Sln +h7imxeLtR3M8gFxrBRcp25gsj4CSRxXEnfLsef0OGhE4kPEAOxD7Bq75 Cb4bn+8Z69N+L5UaZ2by9rXESuzj9XEgpEIm5SvKi6pNe3DETzmlaO08X EBYm/jMFnlbPohXDKtk4XPt3rzf2elCQTZB5aRNXQgysxBBnjbkXOLVAT HEcNaBBHLWcT2HwkveaPN7V+ooZocYU4Fm4zDrDyPGNLn2xpWoGBdCDF0 o9a3wfGRv7UxYCgbgPXh9vC0W5p4x/AgqAVyK8c0YRz6WP9+v6oRPE3vG w==; X-CSE-ConnectionGUID: q5xeqredS1GrvnP7rQ4abw== X-CSE-MsgGUID: TYCMro0ARxmenu3qS/0soQ== X-IronPort-AV: E=McAfee;i="6700,10204,11241"; a="17678863" X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="17678863" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:09 -0700 X-CSE-ConnectionGUID: WxANfO81RhqAnMRdHJ9OsQ== X-CSE-MsgGUID: vbdhL48zSX2xJ2zq+5XdpQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,245,1725346800"; d="scan'208";a="82499471" Received: from sramkris-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4..) 
([10.124.223.186]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2024 12:01:08 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com Cc: rick.p.edgecombe@intel.com, yan.y.zhao@intel.com, isaku.yamahata@gmail.com, kai.huang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tony.lindgren@linux.intel.com, xiaoyao.li@intel.com, reinette.chatre@intel.com Subject: [PATCH v2 25/25] KVM: x86/mmu: Taking guest pa into consideration when calculate tdp level Date: Wed, 30 Oct 2024 12:00:38 -0700 Message-ID: <20241030190039.77971-26-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241030190039.77971-1-rick.p.edgecombe@intel.com> References: <20241030190039.77971-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Xiaoyao Li For TDX, the maxpa (CPUID.0x80000008.EAX[7:0]) is fixed as native and the max_gpa (CPUID.0x80000008.EAX[23:16]) is configurable and used to configure the EPT level and GPAW. Use max_gpa to determine the TDP level. 
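As a rough illustration of the selection described above, the two address widths in CPUID.0x80000008.EAX and the resulting level choice can be sketched in standalone C. This is a sketch under stated assumptions, not the kernel implementation: the helper names are hypothetical, and pick_tdp_level() only mirrors the decision in kvm_mmu_get_tdp_level() with the forced level, the maximum level and the address width passed in as plain arguments.

```c
#include <stdint.h>

/* Native physical address width: CPUID.0x80000008.EAX[7:0]. */
static unsigned int maxpa_from_eax(uint32_t eax)
{
	return eax & 0xff;
}

/* Guest physical address width: CPUID.0x80000008.EAX[23:16]. */
static unsigned int guest_maxpa_from_eax(uint32_t eax)
{
	return (eax >> 16) & 0xff;
}

/*
 * Hypothetical stand-in for the level selection: a nonzero forced level
 * wins; otherwise 5-level paging is used only when the maximum is 5 and
 * the address width exceeds 48 bits.
 */
static int pick_tdp_level(int forced_level, int max_level, unsigned int maxpa)
{
	if (forced_level)
		return forced_level;

	if (max_level == 5 && maxpa <= 48)
		return 4;

	return max_level;
}
```

With EAX = 0x00303034, the native width is 52 bits and the guest width is 48 bits, so a TD would be given a 4-level TDP even where 5 levels are available. A guest width of 0, which is what cpuid_query_maxguestphyaddr() returns when the leaf is not found, also selects 4 levels in that case.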
Signed-off-by: Xiaoyao Li Signed-off-by: Rick Edgecombe --- uAPI breakout v2: - Use if else for cpuid_query_maxguestphyaddr() (Paolo) uAPI breakout v1: - New patch --- arch/x86/kvm/cpuid.c | 14 ++++++++++++++ arch/x86/kvm/cpuid.h | 1 + arch/x86/kvm/mmu/mmu.c | 9 ++++++++- 3 files changed, 23 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 14be20e003f4..e7179ce8eadc 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -423,6 +423,20 @@ int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu) return 36; } =20 +int cpuid_query_maxguestphyaddr(struct kvm_vcpu *vcpu) +{ + struct kvm_cpuid_entry2 *best; + + best =3D kvm_find_cpuid_entry(vcpu, 0x80000000); + if (!best || best->eax < 0x80000008) + goto not_found; + best =3D kvm_find_cpuid_entry(vcpu, 0x80000008); + if (best) + return (best->eax >> 16) & 0xff; +not_found: + return 0; +} + /* * This "raw" version returns the reserved GPA bits without any adjustment= s for * encryption technologies that usurp bits. 
The raw mask should be used i= f and diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index 00570227e2ae..61b839aa3548 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -37,6 +37,7 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, u32 xstate_required_size(u64 xstate_bv, bool compacted); =20 int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu); +int cpuid_query_maxguestphyaddr(struct kvm_vcpu *vcpu); u64 kvm_vcpu_reserved_gpa_bits_raw(struct kvm_vcpu *vcpu); =20 static inline int cpuid_maxphyaddr(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 9a0fbec33984..2e253a488949 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5474,12 +5474,19 @@ void __kvm_mmu_refresh_passthrough_bits(struct kvm_= vcpu *vcpu, =20 static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu) { + int maxpa; + + if (vcpu->kvm->arch.vm_type =3D=3D KVM_X86_TDX_VM) + maxpa =3D cpuid_query_maxguestphyaddr(vcpu); + else + maxpa =3D cpuid_maxphyaddr(vcpu); + /* tdp_root_level is architecture forced level, use it if nonzero */ if (tdp_root_level) return tdp_root_level; =20 /* Use 5-level TDP if and only if it's useful/necessary. */ - if (max_tdp_level =3D=3D 5 && cpuid_maxphyaddr(vcpu) <=3D 48) + if (max_tdp_level =3D=3D 5 && maxpa <=3D 48) return 4; =20 return max_tdp_level; --=20 2.47.0