From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 31A4CC433EF for ; Fri, 18 Mar 2022 15:32:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238213AbiCRPeK (ORCPT ); Fri, 18 Mar 2022 11:34:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51586 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238142AbiCRPcw (ORCPT ); Fri, 18 Mar 2022 11:32:52 -0400 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ABC0B1B6961 for ; Fri, 18 Mar 2022 08:31:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617466; x=1679153466; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=RV/lKPLiao9Xsjjozl2M1KuJME5QF9hTdLZ/K17rfcE=; b=KlsaUx1jrQhE9NOIUYq27E+IZpyWfd4zPZLX55Iqfd+qR/bfPeJwpRcU By/YGz+oKiQQuksFFDb5SHFJlosFX1/90WsbY5imhCFTEsYsNkWeyytBI EQYqvK4/DT0KEW3mHBDZyhfx81UdutpPzgmW8r5mpBxCybmXPf9IKr51k qMUT71wEkVEVqTuQiPfexu8GJJ9+O8diLFKWJarInGFKXEH9/j29Ls/yY Ulyi8338ehuRXhvxzHeneRhpoHzwNPfz7CqgVPBC1jAvYtrpvqY9GxaT+ rUYNqrW/0qmDtEJCvF+DdSBxG6YutVgsaXbN9WsoZS//z3RkhgG2Mz+X7 Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="254710602" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="254710602" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:36 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="784269694" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga006.fm.intel.com with ESMTP; 18 Mar 2022 08:30:29 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 9D089107; Fri, 18 Mar 2022 17:30:49 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A . Shutemov" , Dave Hansen , Borislav Petkov Subject: [PATCHv7 01/30] x86/tdx: Detect running as a TDX guest in early boot Date: Fri, 18 Mar 2022 18:30:19 +0300 Message-Id: <20220318153048.51177-2-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Kuppuswamy Sathyanarayanan In preparation of extending cc_platform_has() API to support TDX guest, use CPUID instruction to detect support for TDX guests in the early boot code (via tdx_early_init()). Since copy_bootdata() is the first user of cc_platform_has() API, detect the TDX guest status before it. Define a synthetic feature flag (X86_FEATURE_TDX_GUEST) and set this bit in a valid TDX guest platform. Signed-off-by: Kuppuswamy Sathyanarayanan Reviewed-by: Andi Kleen Reviewed-by: Tony Luck Signed-off-by: Kirill A. Shutemov Reviewed-by: Dave Hansen Reviewed-by: Borislav Petkov Reviewed-by: Thomas Gleixner --- arch/x86/Kconfig | 12 ++++++++++++ arch/x86/coco/Makefile | 2 ++ arch/x86/coco/tdx/Makefile | 3 +++ arch/x86/coco/tdx/tdx.c | 22 ++++++++++++++++++++++ arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/disabled-features.h | 8 +++++++- arch/x86/include/asm/tdx.h | 21 +++++++++++++++++++++ arch/x86/kernel/head64.c | 4 ++++ 8 files changed, 72 insertions(+), 1 deletion(-) create mode 100644 arch/x86/coco/tdx/Makefile create mode 100644 arch/x86/coco/tdx/tdx.c create mode 100644 arch/x86/include/asm/tdx.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 57a4e0285a80..c346d66b51fc 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -880,6 +880,18 @@ config ACRN_GUEST IOT with small footprint and real-time features. More details can be found in https://projectacrn.org/. =20 +config INTEL_TDX_GUEST + bool "Intel TDX (Trust Domain Extensions) - Guest Support" + depends on X86_64 && CPU_SUP_INTEL + depends on X86_X2APIC + help + Support running as a guest under Intel TDX. Without this support, + the guest kernel can not boot or run under TDX. + TDX includes memory encryption and integrity capabilities + which protect the confidentiality and integrity of guest + memory contents and CPU state. TDX guests are protected from + some attacks from the VMM. + endif #HYPERVISOR_GUEST =20 source "arch/x86/Kconfig.cpu" diff --git a/arch/x86/coco/Makefile b/arch/x86/coco/Makefile index c1ead00017a7..c816acf78b6a 100644 --- a/arch/x86/coco/Makefile +++ b/arch/x86/coco/Makefile @@ -4,3 +4,5 @@ KASAN_SANITIZE_core.o :=3D n CFLAGS_core.o +=3D -fno-stack-protector =20 obj-y +=3D core.o + +obj-$(CONFIG_INTEL_TDX_GUEST) +=3D tdx/ diff --git a/arch/x86/coco/tdx/Makefile b/arch/x86/coco/tdx/Makefile new file mode 100644 index 000000000000..c929d53ee059 --- /dev/null +++ b/arch/x86/coco/tdx/Makefile @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0 + +obj-y +=3D tdx.o diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c new file mode 100644 index 000000000000..97674471fd1e --- /dev/null +++ b/arch/x86/coco/tdx/tdx.c @@ -0,0 +1,22 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2021-2022 Intel Corporation */ + +#undef pr_fmt +#define pr_fmt(fmt) "tdx: " fmt + +#include +#include + +void __init tdx_early_init(void) +{ + u32 eax, sig[3]; + + cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &sig[0], &sig[2], &sig[1]); + + if (memcmp(TDX_IDENT, sig, sizeof(sig))) + return; + + setup_force_cpu_cap(X86_FEATURE_TDX_GUEST); + + pr_info("Guest detected\n"); +} diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpuf= eatures.h index 5cd22090e53d..cacc8dde854b 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -238,6 +238,7 @@ #define X86_FEATURE_VMW_VMMCALL ( 8*32+19) /* "" VMware prefers VMMCALL h= ypercall instruction */ #define X86_FEATURE_PVUNLOCK ( 8*32+20) /* "" PV unlock function */ #define X86_FEATURE_VCPUPREEMPT ( 8*32+21) /* "" PV vcpu_is_preempted fun= ction */ +#define X86_FEATURE_TDX_GUEST ( 8*32+22) /* Intel Trust Domain Extensions= Guest */ =20 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */ #define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, = WRGSBASE instructions*/ diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/as= m/disabled-features.h index 1231d63f836d..b37de8268c9a 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -68,6 +68,12 @@ # define DISABLE_SGX (1 << (X86_FEATURE_SGX & 31)) #endif =20 +#ifdef CONFIG_INTEL_TDX_GUEST +# define DISABLE_TDX_GUEST 0 +#else +# define DISABLE_TDX_GUEST (1 << (X86_FEATURE_TDX_GUEST & 31)) +#endif + /* * Make sure to add features to the correct mask */ @@ -79,7 +85,7 @@ #define DISABLED_MASK5 0 #define DISABLED_MASK6 0 #define DISABLED_MASK7 (DISABLE_PTI) -#define DISABLED_MASK8 0 +#define DISABLED_MASK8 (DISABLE_TDX_GUEST) #define DISABLED_MASK9 (DISABLE_SMAP|DISABLE_SGX) #define DISABLED_MASK10 0 #define DISABLED_MASK11 0 diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h new file mode 100644 index 000000000000..ba8042ce61c2 --- /dev/null +++ b/arch/x86/include/asm/tdx.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright (C) 2021-2022 Intel Corporation */ +#ifndef _ASM_X86_TDX_H +#define _ASM_X86_TDX_H + +#include + +#define TDX_CPUID_LEAF_ID 0x21 +#define TDX_IDENT "IntelTDX " + +#ifdef CONFIG_INTEL_TDX_GUEST + +void __init tdx_early_init(void); + +#else + +static inline void tdx_early_init(void) { }; + +#endif /* CONFIG_INTEL_TDX_GUEST */ + +#endif /* _ASM_X86_TDX_H */ diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index 4f5ecbbaae77..6dff50c3edd6 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -40,6 +40,7 @@ #include #include #include +#include =20 /* * Manage page tables very early on. @@ -514,6 +515,9 @@ asmlinkage __visible void __init x86_64_start_kernel(ch= ar * real_mode_data) =20 idt_setup_early_handler(); =20 + /* Needed before cc_platform_has() can be used for TDX */ + tdx_early_init(); + copy_bootdata(__va(real_mode_data)); =20 /* --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3EE73C433F5 for ; Fri, 18 Mar 2022 15:30:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233408AbiCRPcD (ORCPT ); Fri, 18 Mar 2022 11:32:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47998 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233631AbiCRPb4 (ORCPT ); Fri, 18 Mar 2022 11:31:56 -0400 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5EB7B15DABE for ; Fri, 18 Mar 2022 08:30:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617437; x=1679153437; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=glw4TYJihfQV4SIJW7iDETuiSIcWoXqgeVFMA3Yu2UI=; b=XmFzecD/TqpV+UwLVi6Ehni6w2AmBFdREC6eaZut/mCEPx+CnatPjWdh Xn5nNH/hAgejBQbPk32Z00huExENQu4qIknljnOkS9k13Kovfte7xf/4k DR5dqLHUlagyfBdq6OW2300A0/nSKXlDJRhL5qn3eNB6W0evH/RQ/m7v6 ida01PbsZrGGklugYaVrFH052kkJdaU+RgjVTdB5wpraaF3lLi+mSqjMA +o/p2J8ceq5pnLZVGnjRBm/uIIA26x0zYb92X3wt4ez/XnKLOSiLWkDHB /5eiU4qixMeAfN3H+iaD7ZmaU4L4VSX6zO+hBOCjuYq+lRWYprR2Fv/X8 g==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="244612555" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="244612555" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:36 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="581751685" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga001.jf.intel.com with ESMTP; 18 Mar 2022 08:30:30 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id AF711192; Fri, 18 Mar 2022 17:30:49 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" , Dave Hansen Subject: [PATCHv7 02/30] x86/tdx: Provide common base for SEAMCALL and TDCALL C wrappers Date: Fri, 18 Mar 2022 18:30:20 +0300 Message-Id: <20220318153048.51177-3-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Secure Arbitration Mode (SEAM) is an extension of VMX architecture. It defines a new VMX root operation (SEAM VMX root) and a new VMX non-root operation (SEAM VMX non-root) which are both isolated from the legacy VMX operation where the host kernel runs. A CPU-attested software module (called 'TDX module') runs in SEAM VMX root to manage and protect VMs running in SEAM VMX non-root. SEAM VMX root is also used to host another CPU-attested software module (called 'P-SEAMLDR') to load and update the TDX module. Host kernel transits to either P-SEAMLDR or TDX module via the new SEAMCALL instruction, which is essentially a VMExit from VMX root mode to SEAM VMX root mode. SEAMCALLs are leaf functions defined by P-SEAMLDR and TDX module around the new SEAMCALL instruction. A guest kernel can also communicate with TDX module via TDCALL instruction. TDCALLs and SEAMCALLs use an ABI different from the x86-64 system-v ABI. RAX is used to carry both the SEAMCALL leaf function number (input) and the completion status (output). Additional GPRs (RCX, RDX, R8-R11) may be further used as both input and output operands in individual leaf. TDCALL and SEAMCALL share the same ABI and require the largely same code to pass down arguments and retrieve results. Define an assembly macro that can be used to implement C wrapper for both TDCALL and SEAMCALL. Suggested-by: Thomas Gleixner Signed-off-by: Kirill A. Shutemov Reviewed-by: Dave Hansen Reviewed-by: Thomas Gleixner --- TDCALL wrapper will be implemented using the macro later in the series. SEAMCALL wrapper is out-of-scope for the series and will be implemented as part of TDX host enabling. --- arch/x86/include/asm/tdx.h | 28 ++++++++++ arch/x86/kernel/asm-offsets.c | 9 ++++ arch/x86/virt/vmx/tdx/tdxcall.S | 94 +++++++++++++++++++++++++++++++++ 3 files changed, 131 insertions(+) create mode 100644 arch/x86/virt/vmx/tdx/tdxcall.S diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index ba8042ce61c2..2ffefe22f10a 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -8,6 +8,33 @@ #define TDX_CPUID_LEAF_ID 0x21 #define TDX_IDENT "IntelTDX " =20 +/* + * SW-defined error codes. + * + * Bits 47:40 =3D=3D 0xFF indicate Reserved status code class that never u= sed by + * TDX module. + */ +#define TDX_ERROR _BITUL(63) +#define TDX_SW_ERROR (TDX_ERROR | GENMASK_ULL(40, 47)) +#define TDX_SEAMCALL_VMFAILINVALID (TDX_SW_ERROR | _UL(0xFFFF0000)) + +#ifndef __ASSEMBLY__ + +/* + * Used to gather the output registers values of the TDCALL and SEAMCALL + * instructions when requesting services from the TDX module. + * + * This is a software only structure and not part of the TDX module/VMM AB= I. + */ +struct tdx_module_output { + u64 rcx; + u64 rdx; + u64 r8; + u64 r9; + u64 r10; + u64 r11; +}; + #ifdef CONFIG_INTEL_TDX_GUEST =20 void __init tdx_early_init(void); @@ -18,4 +45,5 @@ static inline void tdx_early_init(void) { }; =20 #endif /* CONFIG_INTEL_TDX_GUEST */ =20 +#endif /* !__ASSEMBLY__ */ #endif /* _ASM_X86_TDX_H */ diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c index 9fb0a2f8b62a..7dca52f5cfc6 100644 --- a/arch/x86/kernel/asm-offsets.c +++ b/arch/x86/kernel/asm-offsets.c @@ -18,6 +18,7 @@ #include #include #include +#include =20 #ifdef CONFIG_XEN #include @@ -65,6 +66,14 @@ static void __used common(void) OFFSET(XEN_vcpu_info_arch_cr2, vcpu_info, arch.cr2); #endif =20 + BLANK(); + OFFSET(TDX_MODULE_rcx, tdx_module_output, rcx); + OFFSET(TDX_MODULE_rdx, tdx_module_output, rdx); + OFFSET(TDX_MODULE_r8, tdx_module_output, r8); + OFFSET(TDX_MODULE_r9, tdx_module_output, r9); + OFFSET(TDX_MODULE_r10, tdx_module_output, r10); + OFFSET(TDX_MODULE_r11, tdx_module_output, r11); + BLANK(); OFFSET(BP_scratch, boot_params, scratch); OFFSET(BP_secure_boot, boot_params, secure_boot); diff --git a/arch/x86/virt/vmx/tdx/tdxcall.S b/arch/x86/virt/vmx/tdx/tdxcal= l.S new file mode 100644 index 000000000000..a6c140fb6f0f --- /dev/null +++ b/arch/x86/virt/vmx/tdx/tdxcall.S @@ -0,0 +1,94 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#include +#include + +/* + * TDX_MODULE_CALL - common helper macro for both + * TDCALL and SEAMCALL instructions. + * + * TDCALL - used by TDX guests to make requests to the + * TDX module and hypercalls to the VMM. + * SEAMCALL - used by TDX hosts to make requests to the + * TDX module. + * + * Both instruction are supported in Binutils >=3D 2.36. + */ +#define tdcall .byte 0x66,0x0f,0x01,0xcc +#define seamcall .byte 0x66,0x0f,0x01,0xcf + +.macro TDX_MODULE_CALL host:req + /* + * R12 will be used as temporary storage for struct tdx_module_output + * pointer. Since R12-R15 registers are not used by TDCALL/SEAMCALL + * services supported by this function, it can be reused. + */ + + /* Callee saved, so preserve it */ + push %r12 + + /* + * Push output pointer to stack. + * After the operation, it will be fetched into R12 register. + */ + push %r9 + + /* Mangle function call ABI into TDCALL/SEAMCALL ABI: */ + /* Move Leaf ID to RAX */ + mov %rdi, %rax + /* Move input 4 to R9 */ + mov %r8, %r9 + /* Move input 3 to R8 */ + mov %rcx, %r8 + /* Move input 1 to RCX */ + mov %rsi, %rcx + /* Leave input param 2 in RDX */ + + .if \host + seamcall + /* + * SEAMCALL instruction is essentially a VMExit from VMX root + * mode to SEAM VMX root mode. VMfailInvalid (CF=3D1) indicates + * that the targeted SEAM firmware is not loaded or disabled, + * or P-SEAMLDR is busy with another SEAMCALL. %rax is not + * changed in this case. + * + * Set %rax to TDX_SEAMCALL_VMFAILINVALID for VMfailInvalid. + * This value will never be used as actual SEAMCALL error code as + * it is from the Reserved status code class. + */ + jnc .Lno_vmfailinvalid + mov $TDX_SEAMCALL_VMFAILINVALID, %rax +.Lno_vmfailinvalid: + .else + tdcall + .endif + + /* + * Fetch output pointer from stack to R12 (It is used + * as temporary storage) + */ + pop %r12 + + /* + * Since this function can be initiated without an output pointer, + * check if caller provided an output struct before storing output + * registers. + * + * Update output registers, even if the call failed (RAX !=3D 0). + * Other registers may contain details of the failure. + */ + test %r12, %r12 + jz .Lno_output_struct + + /* Copy result registers to output struct: */ + movq %rcx, TDX_MODULE_rcx(%r12) + movq %rdx, TDX_MODULE_rdx(%r12) + movq %r8, TDX_MODULE_r8(%r12) + movq %r9, TDX_MODULE_r9(%r12) + movq %r10, TDX_MODULE_r10(%r12) + movq %r11, TDX_MODULE_r11(%r12) + +.Lno_output_struct: + /* Restore the state of R12 register */ + pop %r12 +.endm --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6E3A8C433EF for ; Fri, 18 Mar 2022 15:32:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238063AbiCRPeH (ORCPT ); Fri, 18 Mar 2022 11:34:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51630 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238109AbiCRPcn (ORCPT ); Fri, 18 Mar 2022 11:32:43 -0400 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 68871196D68 for ; Fri, 18 Mar 2022 08:31:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617464; x=1679153464; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=/NfH8xbhdwy3Am/v6elhgw+vFs4vnswQIJ/0HMtfLtc=; b=hojQPVc4PCdcWiU/R8ddWptGwoxKQEtudyrYXOnCSscPGJT0CkBGx6A2 xuVYPKwVZIP4xDcbpiCRz0+NG7SEFyqLwyxXbRftRU17BaJ4dGdZEgUwg i+Oe8hBGTL2gEHk0/GrB9hlU1afoHk/scHScTP0g19RN+uKjvcKZHfqO5 LoDL6JuU7LPellzAOBizQRdMqtpvV0UPLUxabT50EmYYA1KdMqYSoSn4x RNs4tT2WRwqOScIVtyzFCWD0fiLJvYGZYL21DxNpqPwJVr0L7xdNEGJz1 VYoV0uIDgXAJripeYTD7YEzu7t9yL6UV/yRPsrzcKsnq4ArqjXdnlb3aR Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="254710604" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="254710604" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:36 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="784269698" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga006.fm.intel.com with ESMTP; 18 Mar 2022 08:30:30 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id BD9641A1; Fri, 18 Mar 2022 17:30:49 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A . Shutemov" , Dave Hansen , Borislav Petkov Subject: [PATCHv7 03/30] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions Date: Fri, 18 Mar 2022 18:30:21 +0300 Message-Id: <20220318153048.51177-4-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Kuppuswamy Sathyanarayanan Guests communicate with VMMs with hypercalls. Historically, these are implemented using instructions that are known to cause VMEXITs like VMCALL, VMLAUNCH, etc. However, with TDX, VMEXITs no longer expose the guest state to the host. This prevents the old hypercall mechanisms from working. So, to communicate with VMM, TDX specification defines a new instruction called TDCALL. In a TDX based VM, since the VMM is an untrusted entity, an intermediary layer -- TDX module -- facilitates secure communication between the host and the guest. TDX module is loaded like a firmware into a special CPU mode called SEAM. TDX guests communicate with the TDX module using the TDCALL instruction. A guest uses TDCALL to communicate with both the TDX module and VMM. The value of the RAX register when executing the TDCALL instruction is used to determine the TDCALL type. A leaf of TDCALL used to communicate with the VMM is called TDVMCALL. Add generic interfaces to communicate with the TDX module and VMM (using the TDCALL instruction). __tdx_module_call() - Used to communicate with the TDX module (via TDCALL instruction). __tdx_hypercall() - Used by the guest to request services from the VMM (via TDVMCALL leaf of TDCALL). Also define an additional wrapper _tdx_hypercall(), which adds error handling support for the TDCALL failure. The __tdx_module_call() and __tdx_hypercall() helper functions are implemented in assembly in a .S file. The TDCALL ABI requires shuffling arguments in and out of registers, which proved to be awkward with inline assembly. Just like syscalls, not all TDVMCALL use cases need to use the same number of argument registers. The implementation here picks the current worst-case scenario for TDCALL (4 registers). For TDCALLs with fewer than 4 arguments, there will end up being a few superfluous (cheap) instructions. But, this approach maximizes code reuse. For registers used by the TDCALL instruction, please check TDX GHCI specification, the section titled "TDCALL instruction" and "TDG.VP.VMCALL Interface". Based on previous patch by Sean Christopherson. Reviewed-by: Tony Luck Signed-off-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kirill A. Shutemov Reviewed-by: Dave Hansen Reviewed-by: Thomas Gleixner Reviewed-by: Borislav Petkov --- arch/x86/coco/tdx/Makefile | 2 +- arch/x86/coco/tdx/tdcall.S | 191 ++++++++++++++++++++++++++++++++++ arch/x86/coco/tdx/tdx.c | 24 +++++ arch/x86/include/asm/tdx.h | 30 ++++++ arch/x86/kernel/asm-offsets.c | 8 ++ 5 files changed, 254 insertions(+), 1 deletion(-) create mode 100644 arch/x86/coco/tdx/tdcall.S diff --git a/arch/x86/coco/tdx/Makefile b/arch/x86/coco/tdx/Makefile index c929d53ee059..46c55998557d 100644 --- a/arch/x86/coco/tdx/Makefile +++ b/arch/x86/coco/tdx/Makefile @@ -1,3 +1,3 @@ # SPDX-License-Identifier: GPL-2.0 =20 -obj-y +=3D tdx.o +obj-y +=3D tdx.o tdcall.o diff --git a/arch/x86/coco/tdx/tdcall.S b/arch/x86/coco/tdx/tdcall.S new file mode 100644 index 000000000000..662479ccf630 --- /dev/null +++ b/arch/x86/coco/tdx/tdcall.S @@ -0,0 +1,191 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#include +#include +#include +#include + +#include +#include +#include + +#include "../../virt/vmx/tdx/tdxcall.S" + +/* + * Bitmasks of exposed registers (with VMM). + */ +#define TDX_R10 BIT(10) +#define TDX_R11 BIT(11) +#define TDX_R12 BIT(12) +#define TDX_R13 BIT(13) +#define TDX_R14 BIT(14) +#define TDX_R15 BIT(15) + +/* + * These registers are clobbered to hold arguments for each + * TDVMCALL. They are safe to expose to the VMM. + * Each bit in this mask represents a register ID. Bit field + * details can be found in TDX GHCI specification, section + * titled "TDCALL [TDG.VP.VMCALL] leaf". + */ +#define TDVMCALL_EXPOSE_REGS_MASK ( TDX_R10 | TDX_R11 | \ + TDX_R12 | TDX_R13 | \ + TDX_R14 | TDX_R15 ) + +/* + * __tdx_module_call() - Used by TDX guests to request services from + * the TDX module (does not include VMM services) using TDCALL instruction. + * + * Transforms function call register arguments into the TDCALL register AB= I. + * After TDCALL operation, TDX module output is saved in @out (if it is + * provided by the user). + * + *------------------------------------------------------------------------- + * TDCALL ABI: + *------------------------------------------------------------------------- + * Input Registers: + * + * RAX - TDCALL Leaf number. + * RCX,RDX,R8-R9 - TDCALL Leaf specific input registers. + * + * Output Registers: + * + * RAX - TDCALL instruction error code. + * RCX,RDX,R8-R11 - TDCALL Leaf specific output registers. + * + *------------------------------------------------------------------------- + * + * __tdx_module_call() function ABI: + * + * @fn (RDI) - TDCALL Leaf ID, moved to RAX + * @rcx (RSI) - Input parameter 1, moved to RCX + * @rdx (RDX) - Input parameter 2, moved to RDX + * @r8 (RCX) - Input parameter 3, moved to R8 + * @r9 (R8) - Input parameter 4, moved to R9 + * + * @out (R9) - struct tdx_module_output pointer + * stored temporarily in R12 (not + * shared with the TDX module). It + * can be NULL. + * + * Return status of TDCALL via RAX. + */ +SYM_FUNC_START(__tdx_module_call) + FRAME_BEGIN + TDX_MODULE_CALL host=3D0 + FRAME_END + ret +SYM_FUNC_END(__tdx_module_call) + +/* + * __tdx_hypercall() - Make hypercalls to a TDX VMM using TDVMCALL leaf + * of TDCALL instruction + * + * Transforms values in function call argument struct tdx_hypercall_args = @args + * into the TDCALL register ABI. After TDCALL operation, VMM output is sav= ed + * back in @args. + * + *------------------------------------------------------------------------- + * TD VMCALL ABI: + *------------------------------------------------------------------------- + * + * Input Registers: + * + * RAX - TDCALL instruction leaf number (0 - TDG.VP.VMCALL) + * RCX - BITMAP which controls which part of TD Guest GPR + * is passed as-is to the VMM and back. + * R10 - Set 0 to indicate TDCALL follows standard TDX ABI + * specification. Non zero value indicates vendor + * specific ABI. + * R11 - VMCALL sub function number + * RBX, RBP, RDI, RSI - Used to pass VMCALL sub function specific argumen= ts. + * R8-R9, R12-R15 - Same as above. + * + * Output Registers: + * + * RAX - TDCALL instruction status (Not related to hyperca= ll + * output). + * R10 - Hypercall output error code. + * R11-R15 - Hypercall sub function specific output values. + * + *------------------------------------------------------------------------- + * + * __tdx_hypercall() function ABI: + * + * @args (RDI) - struct tdx_hypercall_args for input and output + * @flags (RSI) - TDX_HCALL_* flags + * + * On successful completion, return the hypercall error code. + */ +SYM_FUNC_START(__tdx_hypercall) + FRAME_BEGIN + + /* Save callee-saved GPRs as mandated by the x86_64 ABI */ + push %r15 + push %r14 + push %r13 + push %r12 + + /* Mangle function call ABI into TDCALL ABI: */ + /* Set TDCALL leaf ID (TDVMCALL (0)) in RAX */ + xor %eax, %eax + + /* Copy hypercall registers from arg struct: */ + movq TDX_HYPERCALL_r10(%rdi), %r10 + movq TDX_HYPERCALL_r11(%rdi), %r11 + movq TDX_HYPERCALL_r12(%rdi), %r12 + movq TDX_HYPERCALL_r13(%rdi), %r13 + movq TDX_HYPERCALL_r14(%rdi), %r14 + movq TDX_HYPERCALL_r15(%rdi), %r15 + + movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx + + tdcall + + /* + * RAX=3D=3D0 indicates a failure of the TDVMCALL mechanism itself and th= at + * something has gone horribly wrong with the TDX module. + * + * The return status of the hypercall operation is in a separate + * register (in R10). Hypercall errors are a part of normal operation + * and are handled by callers. + */ + testq %rax, %rax + jne .Lpanic + + /* TDVMCALL leaf return code is in R10 */ + movq %r10, %rax + + /* Copy hypercall result registers to arg struct if needed */ + testq $TDX_HCALL_HAS_OUTPUT, %rsi + jz .Lout + + movq %r10, TDX_HYPERCALL_r10(%rdi) + movq %r11, TDX_HYPERCALL_r11(%rdi) + movq %r12, TDX_HYPERCALL_r12(%rdi) + movq %r13, TDX_HYPERCALL_r13(%rdi) + movq %r14, TDX_HYPERCALL_r14(%rdi) + movq %r15, TDX_HYPERCALL_r15(%rdi) +.Lout: + /* + * Zero out registers exposed to the VMM to avoid speculative execution + * with VMM-controlled values. This needs to include all registers + * present in TDVMCALL_EXPOSE_REGS_MASK (except R12-R15). R12-R15 + * context will be restored. + */ + xor %r10d, %r10d + xor %r11d, %r11d + + /* Restore callee-saved GPRs as mandated by the x86_64 ABI */ + pop %r12 + pop %r13 + pop %r14 + pop %r15 + + FRAME_END + + retq +.Lpanic: + call __tdx_hypercall_failed + /* __tdx_hypercall_failed never returns */ + jmp .Lpanic +SYM_FUNC_END(__tdx_hypercall) diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c index 97674471fd1e..4b57880e45b0 100644 --- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -7,6 +7,30 @@ #include #include =20 +/* + * Wrapper for standard use of __tdx_hypercall with no output aside from + * return code. + */ +static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r1= 5) +{ + struct tdx_hypercall_args args =3D { + .r10 =3D TDX_HYPERCALL_STANDARD, + .r11 =3D fn, + .r12 =3D r12, + .r13 =3D r13, + .r14 =3D r14, + .r15 =3D r15, + }; + + return __tdx_hypercall(&args, 0); +} + +/* Called from __tdx_hypercall() for unrecoverable failure */ +void __tdx_hypercall_failed(void) +{ + panic("TDVMCALL failed. TDX module bug?"); +} + void __init tdx_early_init(void) { u32 eax, sig[3]; diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 2ffefe22f10a..0fd34019416a 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -3,11 +3,16 @@ #ifndef _ASM_X86_TDX_H #define _ASM_X86_TDX_H =20 +#include #include =20 #define TDX_CPUID_LEAF_ID 0x21 #define TDX_IDENT "IntelTDX " =20 +#define TDX_HYPERCALL_STANDARD 0 + +#define TDX_HCALL_HAS_OUTPUT BIT(0) + /* * SW-defined error codes. * @@ -35,10 +40,35 @@ struct tdx_module_output { u64 r11; }; =20 +/* + * Used in __tdx_hypercall() to pass down and get back registers' values of + * the TDCALL instruction when requesting services from the VMM. + * + * This is a software only structure and not part of the TDX module/VMM AB= I. + */ +struct tdx_hypercall_args { + u64 r10; + u64 r11; + u64 r12; + u64 r13; + u64 r14; + u64 r15; +}; + #ifdef CONFIG_INTEL_TDX_GUEST =20 void __init tdx_early_init(void); =20 +/* Used to communicate with the TDX module */ +u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9, + struct tdx_module_output *out); + +/* Used to request services from the VMM */ +u64 __tdx_hypercall(struct tdx_hypercall_args *args, unsigned long flags); + +/* Called from __tdx_hypercall() for unrecoverable failure */ +void __tdx_hypercall_failed(void); + #else =20 static inline void tdx_early_init(void) { }; diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c index 7dca52f5cfc6..437308004ef2 100644 --- a/arch/x86/kernel/asm-offsets.c +++ b/arch/x86/kernel/asm-offsets.c @@ -74,6 +74,14 @@ static void __used common(void) OFFSET(TDX_MODULE_r10, tdx_module_output, r10); OFFSET(TDX_MODULE_r11, tdx_module_output, r11); =20 + BLANK(); + OFFSET(TDX_HYPERCALL_r10, tdx_hypercall_args, r10); + OFFSET(TDX_HYPERCALL_r11, tdx_hypercall_args, r11); + OFFSET(TDX_HYPERCALL_r12, tdx_hypercall_args, r12); + OFFSET(TDX_HYPERCALL_r13, tdx_hypercall_args, r13); + OFFSET(TDX_HYPERCALL_r14, tdx_hypercall_args, r14); + OFFSET(TDX_HYPERCALL_r15, tdx_hypercall_args, r15); + BLANK(); OFFSET(BP_scratch, boot_params, scratch); OFFSET(BP_secure_boot, boot_params, secure_boot); --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2FF5FC433EF for ; Fri, 18 Mar 2022 15:30:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238050AbiCRPcG (ORCPT ); Fri, 18 Mar 2022 11:32:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48210 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234192AbiCRPb7 (ORCPT ); Fri, 18 Mar 2022 11:31:59 -0400 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2D4CC17B884 for ; Fri, 18 Mar 2022 08:30:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617440; x=1679153440; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=SWUA+7Quin3xMu8zY6rdS59Vy2K0Q4kmu8rvOJoKlSQ=; b=TnGmeU0BZ8OdT1QVtExaqy6e4JRLVXgZmFehPgdUUh1mTNDXMJfFQtFs W+s5rDrE/8dctaXtKyWOPObxnFDMRczZxOmQZzOz6s7wSZvUA7glcD5er FPHkjE+4/AO6XDm4ODHoAGi6/KAXX4zvprKuMylTXyMRDtrh2ZFUdrEhy oIgj+xEZEqNGGmxdqv/3RFpdihd36EfxILQpmaXFiMFmmw2ZWGvnzWMTw F9p7Kp7uLKvWFi91DaMXVOu7ostqHdxAdpuLsDIWCKsaNXuzMewmJ8Ng4 tIoB1Y7yuy5AcWqHRj2y11HQCDL2aDlCBXpN3cgjmYNcrgRJM5xsIwsZs g==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="244612570" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="244612570" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:39 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="645558365" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga002.fm.intel.com with ESMTP; 18 Mar 2022 08:30:30 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id CB07A1DD; Fri, 18 Mar 2022 17:30:49 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" , Dave Hansen , Borislav Petkov Subject: [PATCHv7 04/30] x86/tdx: Extend the confidential computing API to support TDX guests Date: Fri, 18 Mar 2022 18:30:22 +0300 Message-Id: <20220318153048.51177-5-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Confidential Computing (CC) features (like string I/O unroll support, memory encryption/decryption support, etc) are conditionally enabled in the kernel using cc_platform_has() API. Since TDX guests also need to use these CC features, extend cc_platform_has() API and add TDX guest-specific CC attributes support. CC API also provides an interface to deal with encryption mask. Extend it to cover TDX. Details about which bit in the page table entry to be used to indicate shared/private state is determined by using the TDINFO TDCALL. Signed-off-by: Kirill A. Shutemov Reviewed-by: Dave Hansen Reviewed-by: Borislav Petkov --- arch/x86/Kconfig | 1 + arch/x86/coco/core.c | 12 +++++++++++ arch/x86/coco/tdx/tdx.c | 47 +++++++++++++++++++++++++++++++++++++++++ 3 files changed, 60 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index c346d66b51fc..93e67842e369 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -884,6 +884,7 @@ config INTEL_TDX_GUEST bool "Intel TDX (Trust Domain Extensions) - Guest Support" depends on X86_64 && CPU_SUP_INTEL depends on X86_X2APIC + select ARCH_HAS_CC_PLATFORM help Support running as a guest under Intel TDX. Without this support, the guest kernel can not boot or run under TDX. diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c index fc1365dd927e..3f3008783e05 100644 --- a/arch/x86/coco/core.c +++ b/arch/x86/coco/core.c @@ -87,9 +87,18 @@ EXPORT_SYMBOL_GPL(cc_platform_has); =20 u64 cc_mkenc(u64 val) { + /* + * Both AMD and Intel use a bit in the page table to indicate + * encryption status of the page. + * + * - for AMD, bit *set* means the page is encrypted + * - for Intel *clear* means encrypted. + */ switch (vendor) { case CC_VENDOR_AMD: return val | cc_mask; + case CC_VENDOR_INTEL: + return val & ~cc_mask; default: return val; } @@ -97,9 +106,12 @@ u64 cc_mkenc(u64 val) =20 u64 cc_mkdec(u64 val) { + /* See comment in cc_mkenc() */ switch (vendor) { case CC_VENDOR_AMD: return val & ~cc_mask; + case CC_VENDOR_INTEL: + return val | cc_mask; default: return val; } diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c index 4b57880e45b0..96b2611baac5 100644 --- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -5,8 +5,12 @@ #define pr_fmt(fmt) "tdx: " fmt =20 #include +#include #include =20 +/* TDX module Call Leaf IDs */ +#define TDX_GET_INFO 1 + /* * Wrapper for standard use of __tdx_hypercall with no output aside from * return code. @@ -31,8 +35,47 @@ void __tdx_hypercall_failed(void) panic("TDVMCALL failed. TDX module bug?"); } =20 +/* + * Used for TDX guests to make calls directly to the TD module. This + * should only be used for calls that have no legitimate reason to fail + * or where the kernel can not survive the call failing. + */ +static inline void tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r= 9, + struct tdx_module_output *out) +{ + if (__tdx_module_call(fn, rcx, rdx, r8, r9, out)) + panic("TDCALL %lld failed (Buggy TDX module!)\n", fn); +} + +static u64 get_cc_mask(void) +{ + struct tdx_module_output out; + unsigned int gpa_width; + + /* + * TDINFO TDX module call is used to get the TD execution environment + * information like GPA width, number of available vcpus, debug mode + * information, etc. More details about the ABI can be found in TDX + * Guest-Host-Communication Interface (GHCI), section 2.4.2 TDCALL + * [TDG.VP.INFO]. + * + * The GPA width that comes out of this call is critical. TDX guests + * can not meaningfully run without it. + */ + tdx_module_call(TDX_GET_INFO, 0, 0, 0, 0, &out); + + gpa_width =3D out.rcx & GENMASK(5, 0); + + /* + * The highest bit of a guest physical address is the "sharing" bit. + * Set it for shared pages and clear it for private pages. + */ + return BIT_ULL(gpa_width - 1); +} + void __init tdx_early_init(void) { + u64 cc_mask; u32 eax, sig[3]; =20 cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &sig[0], &sig[2], &sig[1]); @@ -42,5 +85,9 @@ void __init tdx_early_init(void) =20 setup_force_cpu_cap(X86_FEATURE_TDX_GUEST); =20 + cc_set_vendor(CC_VENDOR_INTEL); + cc_mask =3D get_cc_mask(); + cc_set_mask(cc_mask); + pr_info("Guest detected\n"); } --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 19036C433F5 for ; Fri, 18 Mar 2022 15:31:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238069AbiCRPcW (ORCPT ); Fri, 18 Mar 2022 11:32:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48770 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238038AbiCRPcG (ORCPT ); Fri, 18 Mar 2022 11:32:06 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 03CFF17E35C for ; Fri, 18 Mar 2022 08:30:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617443; x=1679153443; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=K0Fgqrr+sASpxrEKEIlYhpHM4RccEPGBYxF+jivwZns=; b=icjpmK2kxgTY7JHXxAUcT/6mj5vCdeCDtCyZwnn3w6h0G8Lv8pJkVL1m 4NFgQ235TUUr9lmZz7+CTjExDTNjqkQYKkDKPpU8uZu85NNc1FyPaQBBG OpEoFScfDRl2xRCDfcXTxaZSvWrz3KW8loXJ8jeMKf/MxaoTnIo6OpyvA ZTvL0I5fjwxy61eT6nM7PAvsVg5wQf4s7fZ8X3IAysnr+KJ4Jx9hSoMSj K3lirorSBw6gedsDMGL3I+dyrR2QAaN8TfQchCup7o2T+3C2Ce+UwiwZR 2/rGS6WcysUyYGrdpuJb8gFlqeY9ygEap1VyivGvsa8P4XmRdZ0nBXc2c w==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="237102039" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="237102039" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:43 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="784269757" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga006.fm.intel.com with ESMTP; 18 Mar 2022 08:30:37 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id D4BC3150; Fri, 18 Mar 2022 17:30:49 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv7 05/30] x86/tdx: Exclude shared bit from __PHYSICAL_MASK Date: Fri, 18 Mar 2022 18:30:23 +0300 Message-Id: <20220318153048.51177-6-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" In TDX guests, by default memory is protected from host access. If a guest needs to communicate with the VMM (like the I/O use case), it uses a single bit in the physical address to communicate the protected/shared attribute of the given page. In the x86 ARCH code, __PHYSICAL_MASK macro represents the width of the physical address in the given architecture. It is used in creating physical PAGE_MASK for address bits in the kernel. Since in TDX guest, a single bit is used as metadata, it needs to be excluded from valid physical address bits to avoid using incorrect addresses bits in the kernel. Enable DYNAMIC_PHYSICAL_MASK to support updating the __PHYSICAL_MASK. Co-developed-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kuppuswamy Sathyanarayanan Reviewed-by: Andi Kleen Reviewed-by: Tony Luck Signed-off-by: Kirill A. Shutemov Reviewed-by: Thomas Gleixner --- arch/x86/Kconfig | 1 + arch/x86/coco/tdx/tdx.c | 8 ++++++++ 2 files changed, 9 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 93e67842e369..d2f45e58e846 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -885,6 +885,7 @@ config INTEL_TDX_GUEST depends on X86_64 && CPU_SUP_INTEL depends on X86_X2APIC select ARCH_HAS_CC_PLATFORM + select DYNAMIC_PHYSICAL_MASK help Support running as a guest under Intel TDX. Without this support, the guest kernel can not boot or run under TDX. diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c index 96b2611baac5..e84f6dd3ed2a 100644 --- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -89,5 +89,13 @@ void __init tdx_early_init(void) cc_mask =3D get_cc_mask(); cc_set_mask(cc_mask); =20 + /* + * All bits above GPA width are reserved and kernel treats shared bit + * as flag, not as part of physical address. + * + * Adjust physical mask to only cover valid GPA bits. + */ + physical_mask &=3D cc_mask - 1; + pr_info("Guest detected\n"); } --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AD0EAC433F5 for ; Fri, 18 Mar 2022 15:32:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238204AbiCRPd4 (ORCPT ); Fri, 18 Mar 2022 11:33:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48814 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238037AbiCRPcH (ORCPT ); Fri, 18 Mar 2022 11:32:07 -0400 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A88B5184631 for ; Fri, 18 Mar 2022 08:30:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617445; x=1679153445; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=MiHX6mAXe7Ut5goHg/MVZJZQy2HoE8tj7NOqZIWH6ss=; b=hRa7z0yuQBb3rfeS/cOx5ZnNwai6cZYgBm2zyHsrnbA8Ov6iBIRbU4Vw Lysj3LO/7DUoMwOwdTmMXNchHciLl4Oc4lG8ZbFkkd8qsA8OoF9dZ18OS 8d0AspRhoQM4Cqqt1Qurrt67x5zSdsC/3kAKjttvIGZi88gWGKun2vnUZ oRJaJGBXDYfOGrrSXRTNLhuOV6u6ImTT/8wLuCpAIgiiT75sD9HOeAzxy SyVGTyLASX1hfOnDSSrLLlUmlfaOCH2ojjcdl9YL7HMwc9Af5VlTgaVnm EXhMONX08rB6LUlaY8s4OGfNcMGWCH8PCuu59k97UCGm7u48CEhmD8GNu w==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="320366506" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="320366506" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:44 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="599534226" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga008.fm.intel.com with ESMTP; 18 Mar 2022 08:30:37 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id E766A374; Fri, 18 Mar 2022 17:30:49 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" , Dave Hansen Subject: [PATCHv7 06/30] x86/traps: Refactor exc_general_protection() Date: Fri, 18 Mar 2022 18:30:24 +0300 Message-Id: <20220318153048.51177-7-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" TDX brings a new exception -- Virtualization Exception (#VE). Handling of #VE structurally very similar to handling #GP. Extract two helpers from exc_general_protection() that can be reused for handling #VE. No functional changes. Suggested-by: Thomas Gleixner Signed-off-by: Kirill A. Shutemov Reviewed-by: Dave Hansen --- arch/x86/kernel/traps.c | 57 ++++++++++++++++++++++------------------- 1 file changed, 31 insertions(+), 26 deletions(-) diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 7ef00dee35be..bb02faf5a50c 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -611,13 +611,40 @@ static bool try_fixup_enqcmd_gp(void) #endif } =20 +static bool gp_try_fixup_and_notify(struct pt_regs *regs, int trapnr, + unsigned long error_code, const char *str) +{ + if (fixup_exception(regs, trapnr, error_code, 0)) + return true; + + current->thread.error_code =3D error_code; + current->thread.trap_nr =3D trapnr; + + /* + * To be potentially processing a kprobe fault and to trust the result + * from kprobe_running(), we have to be non-preemptible. + */ + if (!preemptible() && kprobe_running() && + kprobe_fault_handler(regs, trapnr)) + return true; + + return notify_die(DIE_GPF, str, regs, error_code, trapnr, SIGSEGV) =3D=3D= NOTIFY_STOP; +} + +static void gp_user_force_sig_segv(struct pt_regs *regs, int trapnr, + unsigned long error_code, const char *str) +{ + current->thread.error_code =3D error_code; + current->thread.trap_nr =3D trapnr; + show_signal(current, SIGSEGV, "", str, regs, error_code); + force_sig(SIGSEGV); +} + DEFINE_IDTENTRY_ERRORCODE(exc_general_protection) { char desc[sizeof(GPFSTR) + 50 + 2*sizeof(unsigned long) + 1] =3D GPFSTR; enum kernel_gp_hint hint =3D GP_NO_HINT; - struct task_struct *tsk; unsigned long gp_addr; - int ret; =20 if (user_mode(regs) && try_fixup_enqcmd_gp()) return; @@ -636,40 +663,18 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection) return; } =20 - tsk =3D current; - if (user_mode(regs)) { if (fixup_iopl_exception(regs)) goto exit; =20 - tsk->thread.error_code =3D error_code; - tsk->thread.trap_nr =3D X86_TRAP_GP; - if (fixup_vdso_exception(regs, X86_TRAP_GP, error_code, 0)) goto exit; =20 - show_signal(tsk, SIGSEGV, "", desc, regs, error_code); - force_sig(SIGSEGV); + gp_user_force_sig_segv(regs, X86_TRAP_GP, error_code, desc); goto exit; } =20 - if (fixup_exception(regs, X86_TRAP_GP, error_code, 0)) - goto exit; - - tsk->thread.error_code =3D error_code; - tsk->thread.trap_nr =3D X86_TRAP_GP; - - /* - * To be potentially processing a kprobe fault and to trust the result - * from kprobe_running(), we have to be non-preemptible. - */ - if (!preemptible() && - kprobe_running() && - kprobe_fault_handler(regs, X86_TRAP_GP)) - goto exit; - - ret =3D notify_die(DIE_GPF, desc, regs, error_code, X86_TRAP_GP, SIGSEGV); - if (ret =3D=3D NOTIFY_STOP) + if (gp_try_fixup_and_notify(regs, X86_TRAP_GP, error_code, desc)) goto exit; =20 if (error_code) --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 06622C433EF for ; Fri, 18 Mar 2022 15:31:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238077AbiCRPc0 (ORCPT ); Fri, 18 Mar 2022 11:32:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49288 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238034AbiCRPcN (ORCPT ); Fri, 18 Mar 2022 11:32:13 -0400 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A472A187BA1 for ; Fri, 18 Mar 2022 08:30:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617447; x=1679153447; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=RoAYLF7wSNigQdVy/C88CKPH34bh1+dOWypwNcODMGs=; b=N3qfI6ADymeXL0ON/g7cDdIE6o0w8FGGIZpAgsynUs+eOsNXPU8P+tNS KZAC00cgSvnZmKTDXLnQbpvKphfMKP9gUPOtFbNFovrfVSCCBvsRF+2Yo 5y+Cf1vPU8j8TDn1EP6lEjyLrQRkxSi+uAZogzeRPGpqwrrzpRYTSTTXy 1mZm2TppRJscNs7Y7TXOM6JxsezceoI/bysUuCQYAJKicBBf7/wfMJcOC WH8ePGhup5NmPMKaWvV+cN3MO6En5oEYry0BSDAWKFM6SutIO5Lmn6k9y GHyne7iqeNWf+6sKbYdFAhVwdeeIBFawbGQJVQyRekrdpHTzt16hHjjMd Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="256883201" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="256883201" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:45 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="715523725" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga005.jf.intel.com with ESMTP; 18 Mar 2022 08:30:37 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 01B12381; Fri, 18 Mar 2022 17:30:49 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" , Sean Christopherson , Dave Hansen Subject: [PATCHv7 07/30] x86/traps: Add #VE support for TDX guest Date: Fri, 18 Mar 2022 18:30:25 +0300 Message-Id: <20220318153048.51177-8-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Virtualization Exceptions (#VE) are delivered to TDX guests due to specific guest actions which may happen in either user space or the kernel: * Specific instructions (WBINVD, for example) * Specific MSR accesses * Specific CPUID leaf accesses * Access to specific guest physical addresses Syscall entry code has a critical window where the kernel stack is not yet set up. Any exception in this window leads to hard to debug issues and can be exploited for privilege escalation. Exceptions in the NMI entry code also cause issues. Returning from the exception handler with IRET will re-enable NMIs and nested NMI will corrupt the NMI stack. For these reasons, the kernel avoids #VEs during the syscall gap and the NMI entry code. Entry code paths do not access TD-shared memory, MMIO regions, use #VE triggering MSRs, instructions, or CPUID leaves that might generate #VE. VMM can remove memory from TD at any point, but access to unaccepted (or missing) private memory leads to VM termination, not to #VE. Similarly to page faults and breakpoints, #VEs are allowed in NMI handlers once the kernel is ready to deal with nested NMIs. During #VE delivery, all interrupts, including NMIs, are blocked until TDGETVEINFO is called. It prevents #VE nesting until the kernel reads the VE info. TDGETVEINFO retrieves the #VE info from the TDX module, which also clears the "#VE valid" flag. This must be done before anything else as any #VE that occurs while the valid flag is set escalates to #DF by TDX module. It will result in an oops. Virtual NMIs are inhibited if the #VE valid flag is set. NMI will not be delivered until TDGETVEINFO is called. For now, convert unhandled #VE's (everything, until later in this series) so that they appear just like a #GP by calling the ve_raise_fault() directly. The ve_raise_fault() function is similar to #GP handler and is responsible for sending SIGSEGV to userspace and CPU die and notifying debuggers and other die chain users. Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Co-developed-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kuppuswamy Sathyanarayanan Reviewed-by: Andi Kleen Reviewed-by: Tony Luck Signed-off-by: Kirill A. Shutemov Reviewed-by: Dave Hansen --- arch/x86/coco/tdx/tdx.c | 38 +++++++++++++++ arch/x86/include/asm/idtentry.h | 4 ++ arch/x86/include/asm/tdx.h | 21 ++++++++ arch/x86/kernel/idt.c | 3 ++ arch/x86/kernel/traps.c | 86 +++++++++++++++++++++++++++++++++ 5 files changed, 152 insertions(+) diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c index e84f6dd3ed2a..60a3f2ff5b95 100644 --- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -10,6 +10,7 @@ =20 /* TDX module Call Leaf IDs */ #define TDX_GET_INFO 1 +#define TDX_GET_VEINFO 3 =20 /* * Wrapper for standard use of __tdx_hypercall with no output aside from @@ -73,6 +74,43 @@ static u64 get_cc_mask(void) return BIT_ULL(gpa_width - 1); } =20 +void tdx_get_ve_info(struct ve_info *ve) +{ + struct tdx_module_output out; + + /* + * Called during #VE handling to retrieve the #VE info from the + * TDX module. + * + * This has to be called early in #VE handling. A "nested" #VE which + * occurs before this will raise a #DF and is not recoverable. + * + * The call retrieves the #VE info from the TDX module, which also + * clears the "#VE valid" flag. This must be done before anything else + * because any #VE that occurs while the valid flag is set will lead to + * #DF. + * + * Note, the TDX module treats virtual NMIs as inhibited if the #VE + * valid flag is set. It means that NMI=3D>#VE will not result in a #DF. + */ + tdx_module_call(TDX_GET_VEINFO, 0, 0, 0, 0, &out); + + /* Transfer the output parameters */ + ve->exit_reason =3D out.rcx; + ve->exit_qual =3D out.rdx; + ve->gla =3D out.r8; + ve->gpa =3D out.r9; + ve->instr_len =3D lower_32_bits(out.r10); + ve->instr_info =3D upper_32_bits(out.r10); +} + +bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve) +{ + pr_warn("Unexpected #VE: %lld\n", ve->exit_reason); + + return false; +} + void __init tdx_early_init(void) { u64 cc_mask; diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentr= y.h index 1345088e9902..8ccc81d653b3 100644 --- a/arch/x86/include/asm/idtentry.h +++ b/arch/x86/include/asm/idtentry.h @@ -625,6 +625,10 @@ DECLARE_IDTENTRY_XENCB(X86_TRAP_OTHER, exc_xen_hypervi= sor_callback); DECLARE_IDTENTRY_RAW(X86_TRAP_OTHER, exc_xen_unknown_trap); #endif =20 +#ifdef CONFIG_INTEL_TDX_GUEST +DECLARE_IDTENTRY(X86_TRAP_VE, exc_virtualization_exception); +#endif + /* Device interrupts common/spurious */ DECLARE_IDTENTRY_IRQ(X86_TRAP_OTHER, common_interrupt); #ifdef CONFIG_X86_LOCAL_APIC diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 0fd34019416a..ddbe750861f0 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -5,6 +5,7 @@ =20 #include #include +#include =20 #define TDX_CPUID_LEAF_ID 0x21 #define TDX_IDENT "IntelTDX " @@ -55,6 +56,22 @@ struct tdx_hypercall_args { u64 r15; }; =20 +/* + * Used by the #VE exception handler to gather the #VE exception + * info from the TDX module. This is a software only structure + * and not part of the TDX module/VMM ABI. + */ +struct ve_info { + u64 exit_reason; + u64 exit_qual; + /* Guest Linear (virtual) Address */ + u64 gla; + /* Guest Physical Address */ + u64 gpa; + u32 instr_len; + u32 instr_info; +}; + #ifdef CONFIG_INTEL_TDX_GUEST =20 void __init tdx_early_init(void); @@ -69,6 +86,10 @@ u64 __tdx_hypercall(struct tdx_hypercall_args *args, uns= igned long flags); /* Called from __tdx_hypercall() for unrecoverable failure */ void __tdx_hypercall_failed(void); =20 +void tdx_get_ve_info(struct ve_info *ve); + +bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve); + #else =20 static inline void tdx_early_init(void) { }; diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c index df0fa695bb09..1da074123c16 100644 --- a/arch/x86/kernel/idt.c +++ b/arch/x86/kernel/idt.c @@ -68,6 +68,9 @@ static const __initconst struct idt_data early_idts[] =3D= { */ INTG(X86_TRAP_PF, asm_exc_page_fault), #endif +#ifdef CONFIG_INTEL_TDX_GUEST + INTG(X86_TRAP_VE, asm_exc_virtualization_exception), +#endif }; =20 /* diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index bb02faf5a50c..a77103e541bf 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -62,6 +62,7 @@ #include #include #include +#include =20 #ifdef CONFIG_X86_64 #include @@ -1272,6 +1273,91 @@ DEFINE_IDTENTRY(exc_device_not_available) } } =20 +#ifdef CONFIG_INTEL_TDX_GUEST + +#define VE_FAULT_STR "VE fault" + +static void ve_raise_fault(struct pt_regs *regs, long error_code) +{ + if (user_mode(regs)) { + gp_user_force_sig_segv(regs, X86_TRAP_VE, error_code, VE_FAULT_STR); + return; + } + + if (gp_try_fixup_and_notify(regs, X86_TRAP_VE, error_code, VE_FAULT_STR)) + return; + + die_addr(VE_FAULT_STR, regs, error_code, 0); +} + +/* + * Virtualization Exceptions (#VE) are delivered to TDX guests due to + * specific guest actions which may happen in either user space or the + * kernel: + * + * * Specific instructions (WBINVD, for example) + * * Specific MSR accesses + * * Specific CPUID leaf accesses + * * Access to specific guest physical addresses + * + * In the settings that Linux will run in, virtualization exceptions are + * never generated on accesses to normal, TD-private memory that has been + * accepted. + * + * Syscall entry code has a critical window where the kernel stack is not + * yet set up. Any exception in this window leads to hard to debug issues + * and can be exploited for privilege escalation. Exceptions in the NMI + * entry code also cause issues. Returning from the exception handler with + * IRET will re-enable NMIs and nested NMI will corrupt the NMI stack. + * + * For these reasons, the kernel avoids #VEs during the syscall gap and + * the NMI entry code. Entry code paths do not access TD-shared memory, + * MMIO regions, use #VE triggering MSRs, instructions, or CPUID leaves + * that might generate #VE. VMM can remove memory from TD at any point, + * but access to unaccepted (or missing) private memory leads to VM + * termination, not to #VE. + * + * Similarly to page faults and breakpoints, #VEs are allowed in NMI + * handlers once the kernel is ready to deal with nested NMIs. + * + * During #VE delivery, all interrupts, including NMIs, are blocked until + * TDGETVEINFO is called. It prevents #VE nesting until the kernel reads + * the VE info. + * + * If a guest kernel action which would normally cause a #VE occurs in + * the interrupt-disabled region before TDGETVEINFO, a #DF (fault + * exception) is delivered to the guest which will result in an oops. + * + * The entry code has been audited carefully for following these expectati= ons. + * Changes in the entry code have to be audited for correctness vs. this + * aspect. Similarly to #PF, #VE in these places will expose kernel to + * privilege escalation or may lead to random crashes. + */ +DEFINE_IDTENTRY(exc_virtualization_exception) +{ + struct ve_info ve; + + /* + * NMIs/Machine-checks/Interrupts will be in a disabled state + * till TDGETVEINFO TDCALL is executed. This ensures that VE + * info cannot be overwritten by a nested #VE. + */ + tdx_get_ve_info(&ve); + + cond_local_irq_enable(regs); + + /* + * If tdx_handle_virt_exception() could not process + * it successfully, treat it as #GP(0) and handle it. + */ + if (!tdx_handle_virt_exception(regs, &ve)) + ve_raise_fault(regs, 0); + + cond_local_irq_disable(regs); +} + +#endif + #ifdef CONFIG_X86_32 DEFINE_IDTENTRY_SW(iret_error) { --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A72D9C433EF for ; Fri, 18 Mar 2022 15:32:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238206AbiCRPeC (ORCPT ); Fri, 18 Mar 2022 11:34:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48816 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238053AbiCRPcH (ORCPT ); Fri, 18 Mar 2022 11:32:07 -0400 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DBFEA185453 for ; Fri, 18 Mar 2022 08:30:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617445; x=1679153445; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=nU1Ki/wgNi2DWNog8siHiOXpGYmZHO4/2osYozxHGTs=; b=mUrAlMOtPEsKJmctSl7/nSpIs+jehJad2Ddpy+NGDp9zlN+xAQOrYI3D CXMMNvIiHQrEhnG4/lL/rbS1Fw7744/CLM4RrF2bA2klHR0y4dV7AG/EE kHqwhka27vkETvwGxxD4yrApkYtN2YAPyHTVaimQH5Ya7vA2vvCC1Mqzy x2WXUhLWKjSUAPyHVXhpLjG8JnInocNG6RmFpXvHdqcYcuVI0jPXhSqEM PUi0rcqindxdt6nlNWoixEexjXXt+6sUTaJlNtaG2xNf3BPKkUT+e5aRO pTwWlIXU7xayK7AvG9qSB5QHv/FOtT0f6sBOuWoYjq5mo9jmuZ5toeYyM Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="317872560" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="317872560" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:44 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="513919294" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga002.jf.intel.com with ESMTP; 18 Mar 2022 08:30:37 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 1050D3C3; Fri, 18 Mar 2022 17:30:50 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" , Dave Hansen Subject: [PATCHv7 08/30] x86/tdx: Add HLT support for TDX guests Date: Fri, 18 Mar 2022 18:30:26 +0300 Message-Id: <20220318153048.51177-9-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" The HLT instruction is a privileged instruction, executing it stops instruction execution and places the processor in a HALT state. It is used in kernel for cases like reboot, idle loop and exception fixup handlers. For the idle case, interrupts will be enabled (using STI) before the HLT instruction (this is also called safe_halt()). To support the HLT instruction in TDX guests, it needs to be emulated using TDVMCALL (hypercall to VMM). More details about it can be found in Intel Trust Domain Extensions (Intel TDX) Guest-Host-Communication Interface (GHCI) specification, section TDVMCALL[Instruction.HLT]. In TDX guests, executing HLT instruction will generate a #VE, which is used to emulate the HLT instruction. But #VE based emulation will not work for the safe_halt() flavor, because it requires STI instruction to be executed just before the TDCALL. Since idle loop is the only user of safe_halt() variant, handle it as a special case. To avoid *safe_halt() call in the idle function, define the tdx_guest_idle() and use it to override the "x86_idle" function pointer for a valid TDX guest. Alternative choices like PV ops have been considered for adding safe_halt() support. But it was rejected because HLT paravirt calls only exist under PARAVIRT_XXL, and enabling it in TDX guest just for safe_halt() use case is not worth the cost. Co-developed-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kuppuswamy Sathyanarayanan Reviewed-by: Andi Kleen Reviewed-by: Tony Luck Signed-off-by: Kirill A. Shutemov Reviewed-by: Dave Hansen --- arch/x86/coco/tdx/tdcall.S | 13 ++++++ arch/x86/coco/tdx/tdx.c | 93 +++++++++++++++++++++++++++++++++++++- arch/x86/include/asm/tdx.h | 4 ++ arch/x86/kernel/process.c | 4 ++ 4 files changed, 112 insertions(+), 2 deletions(-) diff --git a/arch/x86/coco/tdx/tdcall.S b/arch/x86/coco/tdx/tdcall.S index 662479ccf630..245888290bb6 100644 --- a/arch/x86/coco/tdx/tdcall.S +++ b/arch/x86/coco/tdx/tdcall.S @@ -139,6 +139,19 @@ SYM_FUNC_START(__tdx_hypercall) =20 movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx =20 + /* + * For the idle loop STI needs to be called directly before the TDCALL + * that enters idle (EXIT_REASON_HLT case). STI instruction enables + * interrupts only one instruction later. If there is a window between + * STI and the instruction that emulates the HALT state, there is a + * chance for interrupts to happen in this window, which can delay the + * HLT operation indefinitely. Since this is the not the desired + * result, conditionally call STI before TDCALL. + */ + testq $TDX_HCALL_ISSUE_STI, %rsi + jz .Lskip_sti + sti +.Lskip_sti: tdcall =20 /* diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c index 60a3f2ff5b95..ed7302581cc7 100644 --- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -7,6 +7,7 @@ #include #include #include +#include =20 /* TDX module Call Leaf IDs */ #define TDX_GET_INFO 1 @@ -36,6 +37,17 @@ void __tdx_hypercall_failed(void) panic("TDVMCALL failed. TDX module bug?"); } =20 +/* + * The TDG.VP.VMCALL-Instruction-execution sub-functions are defined + * independently from but are currently matched 1:1 with VMX EXIT_REASONs. + * Reusing the KVM EXIT_REASON macros makes it easier to connect the host = and + * guest sides of these calls. + */ +static u64 hcall_func(u64 exit_reason) +{ + return exit_reason; +} + /* * Used for TDX guests to make calls directly to the TD module. This * should only be used for calls that have no legitimate reason to fail @@ -74,6 +86,62 @@ static u64 get_cc_mask(void) return BIT_ULL(gpa_width - 1); } =20 +static u64 __cpuidle __halt(const bool irq_disabled, const bool do_sti) +{ + struct tdx_hypercall_args args =3D { + .r10 =3D TDX_HYPERCALL_STANDARD, + .r11 =3D hcall_func(EXIT_REASON_HLT), + .r12 =3D irq_disabled, + }; + + /* + * Emulate HLT operation via hypercall. More info about ABI + * can be found in TDX Guest-Host-Communication Interface + * (GHCI), section 3.8 TDG.VP.VMCALL. + * + * The VMM uses the "IRQ disabled" param to understand IRQ + * enabled status (RFLAGS.IF) of the TD guest and to determine + * whether or not it should schedule the halted vCPU if an + * IRQ becomes pending. E.g. if IRQs are disabled, the VMM + * can keep the vCPU in virtual HLT, even if an IRQ is + * pending, without hanging/breaking the guest. + */ + return __tdx_hypercall(&args, do_sti ? TDX_HCALL_ISSUE_STI : 0); +} + +static bool handle_halt(void) +{ + /* + * Since non safe halt is mainly used in CPU offlining + * and the guest will always stay in the halt state, don't + * call the STI instruction (set do_sti as false). + */ + const bool irq_disabled =3D irqs_disabled(); + const bool do_sti =3D false; + + if (__halt(irq_disabled, do_sti)) + return false; + + return true; +} + +void __cpuidle tdx_safe_halt(void) +{ + /* + * For do_sti=3Dtrue case, __tdx_hypercall() function enables + * interrupts using the STI instruction before the TDCALL. So + * set irq_disabled as false. + */ + const bool irq_disabled =3D false; + const bool do_sti =3D true; + + /* + * Use WARN_ONCE() to report the failure. + */ + if (__halt(irq_disabled, do_sti)) + WARN_ONCE(1, "HLT instruction emulation failed\n"); +} + void tdx_get_ve_info(struct ve_info *ve) { struct tdx_module_output out; @@ -104,11 +172,32 @@ void tdx_get_ve_info(struct ve_info *ve) ve->instr_info =3D upper_32_bits(out.r10); } =20 +/* Handle the kernel #VE */ +static bool virt_exception_kernel(struct pt_regs *regs, struct ve_info *ve) +{ + switch (ve->exit_reason) { + case EXIT_REASON_HLT: + return handle_halt(); + default: + pr_warn("Unexpected #VE: %lld\n", ve->exit_reason); + return false; + } +} + bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve) { - pr_warn("Unexpected #VE: %lld\n", ve->exit_reason); + bool ret; + + if (user_mode(regs)) + ret =3D false; + else + ret =3D virt_exception_kernel(regs, ve); + + /* After successful #VE handling, move the IP */ + if (ret) + regs->ip +=3D ve->instr_len; =20 - return false; + return ret; } =20 void __init tdx_early_init(void) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index ddbe750861f0..a792096a3750 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -13,6 +13,7 @@ #define TDX_HYPERCALL_STANDARD 0 =20 #define TDX_HCALL_HAS_OUTPUT BIT(0) +#define TDX_HCALL_ISSUE_STI BIT(1) =20 /* * SW-defined error codes. @@ -90,9 +91,12 @@ void tdx_get_ve_info(struct ve_info *ve); =20 bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve); =20 +void tdx_safe_halt(void); + #else =20 static inline void tdx_early_init(void) { }; +static inline void tdx_safe_halt(void) { }; =20 #endif /* CONFIG_INTEL_TDX_GUEST */ =20 diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index e131d71b3cae..2e90d57cf86e 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -46,6 +46,7 @@ #include #include #include +#include =20 #include "process.h" =20 @@ -873,6 +874,9 @@ void select_idle_routine(const struct cpuinfo_x86 *c) } else if (prefer_mwait_c1_over_halt(c)) { pr_info("using mwait in idle threads\n"); x86_idle =3D mwait_idle; + } else if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) { + pr_info("using TDX aware idle routine\n"); + x86_idle =3D tdx_safe_halt; } else x86_idle =3D default_idle; } --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AE1E2C433FE for ; Fri, 18 Mar 2022 15:31:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238103AbiCRPcl (ORCPT ); Fri, 18 Mar 2022 11:32:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49290 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238048AbiCRPcO (ORCPT ); Fri, 18 Mar 2022 11:32:14 -0400 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D4C03187BB7 for ; Fri, 18 Mar 2022 08:30:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617447; x=1679153447; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=vN6WcSfTQv8qfP8rq3qI612bnl8SkCCTfJrN6qzcq3o=; b=GdM6SfqxHvCToLocUkA2+RQv5w0RjDUQNzq9IV1Q2VHUMkp6feNKc3ui 4SnBPelEL6ZTqVXgeyxHCZcqsAUaPMcc/CJAMT05fXlSrqD4n9t10iI9o bh5MXiDmNz6LmMZDLSwEuaF0AAN1yabq+C8BkGSDf4RyiqZnIBdObCtWt PAUzWLmxOwDIVc5olsXYQmJpGL9y0fag9LF4zHYJUmfujeQC8rohqVIeK oozwKiugLK7QWJxq2d7F9hWQB0CflnJoFYvvIVw3LuT4YRU9737kP4tBG hYaqTX33QHvQRpbCwoFijzace5xbtKrbEyS3Nkr4UImAnWhSw8LkpF018 Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="256883203" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="256883203" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:45 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="715523727" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga005.jf.intel.com with ESMTP; 18 Mar 2022 08:30:37 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 1E0CB48B; Fri, 18 Mar 2022 17:30:50 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" , Dave Hansen Subject: [PATCHv7 09/30] x86/tdx: Add MSR support for TDX guests Date: Fri, 18 Mar 2022 18:30:27 +0300 Message-Id: <20220318153048.51177-10-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Use hypercall to emulate MSR read/write for the TDX platform. There are two viable approaches for doing MSRs in a TD guest: 1. Execute the RDMSR/WRMSR instructions like most VMs and bare metal do. Some will succeed, others will cause a #VE. All of those that cause a #VE will be handled with a TDCALL. 2. Use paravirt infrastructure. The paravirt hook has to keep a list of which MSRs would cause a #VE and use a TDCALL. All other MSRs execute RDMSR/WRMSR instructions directly. The second option can be ruled out because the list of MSRs was challenging to maintain. That leaves option #1 as the only viable solution for the minimal TDX support. Kernel relies on the exception fixup machinery to handle MSR access errors. #VE handler uses the same exception fixup code as #GP. It covers MSR accesses along with other types of fixups. For performance-critical MSR writes (like TSC_DEADLINE), future patches will replace the WRMSR/#VE sequence with the direct TDCALL. RDMSR and WRMSR specification details can be found in Guest-Host-Communication Interface (GHCI) for Intel Trust Domain Extensions (Intel TDX) specification, sec titled "TDG.VP. VMCALL" and "TDG.VP.VMCALL". Co-developed-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kuppuswamy Sathyanarayanan Reviewed-by: Andi Kleen Reviewed-by: Tony Luck Signed-off-by: Kirill A. Shutemov Reviewed-by: Dave Hansen Reviewed-by: Thomas Gleixner --- arch/x86/coco/tdx/tdx.c | 42 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c index ed7302581cc7..00ff0a830970 100644 --- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -142,6 +142,44 @@ void __cpuidle tdx_safe_halt(void) WARN_ONCE(1, "HLT instruction emulation failed\n"); } =20 +static bool read_msr(struct pt_regs *regs) +{ + struct tdx_hypercall_args args =3D { + .r10 =3D TDX_HYPERCALL_STANDARD, + .r11 =3D hcall_func(EXIT_REASON_MSR_READ), + .r12 =3D regs->cx, + }; + + /* + * Emulate the MSR read via hypercall. More info about ABI + * can be found in TDX Guest-Host-Communication Interface + * (GHCI), section titled "TDG.VP.VMCALL". + */ + if (__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT)) + return false; + + regs->ax =3D lower_32_bits(args.r11); + regs->dx =3D upper_32_bits(args.r11); + return true; +} + +static bool write_msr(struct pt_regs *regs) +{ + struct tdx_hypercall_args args =3D { + .r10 =3D TDX_HYPERCALL_STANDARD, + .r11 =3D hcall_func(EXIT_REASON_MSR_WRITE), + .r12 =3D regs->cx, + .r13 =3D (u64)regs->dx << 32 | regs->ax, + }; + + /* + * Emulate the MSR write via hypercall. More info about ABI + * can be found in TDX Guest-Host-Communication Interface + * (GHCI) section titled "TDG.VP.VMCALL". + */ + return !__tdx_hypercall(&args, 0); +} + void tdx_get_ve_info(struct ve_info *ve) { struct tdx_module_output out; @@ -178,6 +216,10 @@ static bool virt_exception_kernel(struct pt_regs *regs= , struct ve_info *ve) switch (ve->exit_reason) { case EXIT_REASON_HLT: return handle_halt(); + case EXIT_REASON_MSR_READ: + return read_msr(regs); + case EXIT_REASON_MSR_WRITE: + return write_msr(regs); default: pr_warn("Unexpected #VE: %lld\n", ve->exit_reason); return false; --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8407AC4332F for ; Fri, 18 Mar 2022 15:32:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238172AbiCRPcx (ORCPT ); Fri, 18 Mar 2022 11:32:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48820 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238057AbiCRPcQ (ORCPT ); Fri, 18 Mar 2022 11:32:16 -0400 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 23D1A181B34 for ; Fri, 18 Mar 2022 08:30:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617457; x=1679153457; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=xBrD9faVfntgWnuya1iuDAqcY9OqfL5gZb7hKgUoxq0=; b=aEQtxgZpkKADnAkDBS55E6KWIBi0RH6zHDxWWEmMv3tNl2EUDF1zrk0X In4AxwXGQ0FAFJBfB4Hu5DCeRYKw3RQqm6QmOZJU/BU0nIZJh57LVEq61 6zeic/rtJ6P5wmRJvU5g3s5ZBVx0flByTdRZYdlmHX03cJ+jBZwdyFwDT yhLLPRhdcEK+NssrxC292rlFITnXSz78AASDt2bJUpStk10bPWBgWF4Q+ ajl3ejbk8mfh0ZuFOxvUoiq73iF3anVHFJu9UphJhc3UO9aPUzCnSUPkf WTmijm4D+qVld8KkW3Vxz6MwMRHMX9Vn98Bs+5X0w0xrZRdTtRb5GBF85 A==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="255982582" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="255982582" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:44 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="558495420" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga008.jf.intel.com with ESMTP; 18 Mar 2022 08:30:37 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 2B3984E5; Fri, 18 Mar 2022 17:30:50 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" , Dave Hansen Subject: [PATCHv7 10/30] x86/tdx: Handle CPUID via #VE Date: Fri, 18 Mar 2022 18:30:28 +0300 Message-Id: <20220318153048.51177-11-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" In TDX guests, most CPUID leaf/sub-leaf combinations are virtualized by the TDX module while some trigger #VE. Implement the #VE handling for EXIT_REASON_CPUID by handing it through the hypercall, which in turn lets the TDX module handle it by invoking the host VMM. More details on CPUID Virtualization can be found in the TDX module specification, the section titled "CPUID Virtualization". Note that VMM that handles the hypercall is not trusted. It can return data that may steer the guest kernel in wrong direct. Only allow VMM to control range reserved for hypervisor communication. Return all-zeros for any CPUID outside the hypervisor range. It matches CPU behaviour for non-supported leaf. Co-developed-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kuppuswamy Sathyanarayanan Reviewed-by: Andi Kleen Reviewed-by: Tony Luck Signed-off-by: Kirill A. Shutemov Reviewed-by: Dave Hansen Reviewed-by: Thomas Gleixner --- arch/x86/coco/tdx/tdx.c | 58 ++++++++++++++++++++++++++++++++++++++++- 1 file changed, 57 insertions(+), 1 deletion(-) diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c index 00ff0a830970..50c3b97d6db7 100644 --- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -180,6 +180,48 @@ static bool write_msr(struct pt_regs *regs) return !__tdx_hypercall(&args, 0); } =20 +static bool handle_cpuid(struct pt_regs *regs) +{ + struct tdx_hypercall_args args =3D { + .r10 =3D TDX_HYPERCALL_STANDARD, + .r11 =3D hcall_func(EXIT_REASON_CPUID), + .r12 =3D regs->ax, + .r13 =3D regs->cx, + }; + + /* + * Only allow VMM to control range reserved for hypervisor + * communication. + * + * Return all-zeros for any CPUID outside the range. It matches CPU + * behaviour for non-supported leaf. + */ + if (regs->ax < 0x40000000 || regs->ax > 0x4FFFFFFF) { + regs->ax =3D regs->bx =3D regs->cx =3D regs->dx =3D 0; + return true; + } + + /* + * Emulate the CPUID instruction via a hypercall. More info about + * ABI can be found in TDX Guest-Host-Communication Interface + * (GHCI), section titled "VP.VMCALL". + */ + if (__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT)) + return false; + + /* + * As per TDX GHCI CPUID ABI, r12-r15 registers contain contents of + * EAX, EBX, ECX, EDX registers after the CPUID instruction execution. + * So copy the register contents back to pt_regs. + */ + regs->ax =3D args.r12; + regs->bx =3D args.r13; + regs->cx =3D args.r14; + regs->dx =3D args.r15; + + return true; +} + void tdx_get_ve_info(struct ve_info *ve) { struct tdx_module_output out; @@ -210,6 +252,18 @@ void tdx_get_ve_info(struct ve_info *ve) ve->instr_info =3D upper_32_bits(out.r10); } =20 +/* Handle the user initiated #VE */ +static bool virt_exception_user(struct pt_regs *regs, struct ve_info *ve) +{ + switch (ve->exit_reason) { + case EXIT_REASON_CPUID: + return handle_cpuid(regs); + default: + pr_warn("Unexpected #VE: %lld\n", ve->exit_reason); + return false; + } +} + /* Handle the kernel #VE */ static bool virt_exception_kernel(struct pt_regs *regs, struct ve_info *ve) { @@ -220,6 +274,8 @@ static bool virt_exception_kernel(struct pt_regs *regs,= struct ve_info *ve) return read_msr(regs); case EXIT_REASON_MSR_WRITE: return write_msr(regs); + case EXIT_REASON_CPUID: + return handle_cpuid(regs); default: pr_warn("Unexpected #VE: %lld\n", ve->exit_reason); return false; @@ -231,7 +287,7 @@ bool tdx_handle_virt_exception(struct pt_regs *regs, st= ruct ve_info *ve) bool ret; =20 if (user_mode(regs)) - ret =3D false; + ret =3D virt_exception_user(regs, ve); else ret =3D virt_exception_kernel(regs, ve); =20 --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C461CC433EF for ; Fri, 18 Mar 2022 15:31:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234192AbiCRPcR (ORCPT ); Fri, 18 Mar 2022 11:32:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48802 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238052AbiCRPcG (ORCPT ); Fri, 18 Mar 2022 11:32:06 -0400 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1CE0E181B34 for ; Fri, 18 Mar 2022 08:30:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617445; x=1679153445; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=zUSjuCc2KWi4DDSLQKo+k0BdkoInqcJcDXFHbFwNot4=; b=IFCplKAciQZjqYva61NITffLV/os/4BY6Hpt10WwaE/2xdY31Ir4RWkV VkrIiJ63lQZ5klaGk8aFMqGe+lroELKTexTlOZNTYyasYXf0cLvNhAx5d Lzh113YbtewsJRn6zvPfzydTUAgcYNC+YtcTqB4QwgwuLll+7KHSkAAAU S+Q83iD0q+wmynSVK1YySjKc2RXxD5gChctItPC+xAcZp9pD5oD3SMFiN uHkvqLWVvpfdVQYdWwrZ7yl9ZLQEZNbwZu6vdshANVGVjwz4esG3mRAdz TI1ZcktySCEfPZhJ3D3LuZrVhzt0ehwyngARg2GVnvODdHcVx3HhmP+LL A==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="244612585" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="244612585" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:44 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="645558392" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga002.fm.intel.com with ESMTP; 18 Mar 2022 08:30:37 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 3D54D527; Fri, 18 Mar 2022 17:30:50 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" , Dave Hansen Subject: [PATCHv7 11/30] x86/tdx: Handle in-kernel MMIO Date: Fri, 18 Mar 2022 18:30:29 +0300 Message-Id: <20220318153048.51177-12-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" In non-TDX VMs, MMIO is implemented by providing the guest a mapping which will cause a VMEXIT on access and then the VMM emulating the instruction that caused the VMEXIT. That's not possible for TDX VM. To emulate an instruction an emulator needs two things: - R/W access to the register file to read/modify instruction arguments and see RIP of the faulted instruction. - Read access to memory where instruction is placed to see what to emulate. In this case it is guest kernel text. Both of them are not available to VMM in TDX environment: - Register file is never exposed to VMM. When a TD exits to the module, it saves registers into the state-save area allocated for that TD. The module then scrubs these registers before returning execution control to the VMM, to help prevent leakage of TD state. - TDX does not allow guests to execute from shared memory. All executed instructions are in TD-private memory. Being private to the TD, VMMs have no way to access TD-private memory and no way to read the instruction to decode and emulate it. In TDX the MMIO regions are instead configured by VMM to trigger a #VE exception in the guest. Add #VE handling that emulates the MMIO instruction inside the guest and converts it into a controlled hypercall to the host. This approach is bad for performance. But, it has (virtually) no impact on the size of the kernel image and will work for a wide variety of drivers. This allows TDX deployments to use arbitrary devices and device drivers, including virtio. TDX customers have asked for the capability to use random devices in their deployments. In other words, even if all of the work was done to paravirtualize all x86 MMIO users and virtio, this approach would still be needed. There is essentially no way to get rid of this code. This approach is functional for all in-kernel MMIO users current and future and does so with a minimal amount of code and kernel image bloat. MMIO addresses can be used with any CPU instruction that accesses memory. Address only MMIO accesses done via io.h helpers, such as 'readl()' or 'writeq()'. Any CPU instruction that accesses memory can also be used to access MMIO. However, by convention, MMIO access are typically performed via io.h helpers such as 'readl()' or 'writeq()'. The io.h helpers intentionally use a limited set of instructions when accessing MMIO. This known, limited set of instructions makes MMIO instruction decoding and emulation feasible in KVM hosts and SEV guests today. MMIO accesses performed without the io.h helpers are at the mercy of the compiler. Compilers can and will generate a much more broad set of instructions which can not practically be decoded and emulated. TDX guests will oops if they encounter one of these decoding failures. This means that TDX guests *must* use the io.h helpers to access MMIO. This requirement is not new. Both KVM hosts and AMD SEV guests have the same limitations on MMIO access. =3D=3D=3D Potential alternative approaches =3D=3D=3D =3D=3D Paravirtualizing all MMIO =3D=3D An alternative to letting MMIO induce a #VE exception is to avoid the #VE in the first place. Similar to the port I/O case, it is theoretically possible to paravirtualize MMIO accesses. Like the exception-based approach offered here, a fully paravirtualized approach would be limited to MMIO users that leverage common infrastructure like the io.h macros. However, any paravirtual approach would be patching approximately 120k call sites. Any paravirtual approach would need to replace a bare memory access instruction with (at least) a function call. With a conservative overhead estimation of 5 bytes per call site (CALL instruction), it leads to bloating code by 600k. Many drivers will never be used in the TDX environment and the bloat cannot be justified. =3D=3D Patching TDX drivers =3D=3D Rather than touching the entire kernel, it might also be possible to just go after drivers that use MMIO in TDX guests *and* are performance critical to justify the effrort. Right now, that's limited only to virtio. All virtio MMIO appears to be done through a single function, which makes virtio eminently easy to patch. This approach will be adopted in the future, removing the bulk of MMIO #VEs. The #VE-based MMIO will remain serving non-virtio use cases. Co-developed-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kuppuswamy Sathyanarayanan Reviewed-by: Andi Kleen Reviewed-by: Tony Luck Signed-off-by: Kirill A. Shutemov Reviewed-by: Dave Hansen Reviewed-by: Thomas Gleixner --- arch/x86/coco/tdx/tdx.c | 121 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 121 insertions(+) diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c index 50c3b97d6db7..ab10bc73a7c5 100644 --- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -8,11 +8,17 @@ #include #include #include +#include +#include =20 /* TDX module Call Leaf IDs */ #define TDX_GET_INFO 1 #define TDX_GET_VEINFO 3 =20 +/* MMIO direction */ +#define EPT_READ 0 +#define EPT_WRITE 1 + /* * Wrapper for standard use of __tdx_hypercall with no output aside from * return code. @@ -222,6 +228,119 @@ static bool handle_cpuid(struct pt_regs *regs) return true; } =20 +static bool mmio_read(int size, unsigned long addr, unsigned long *val) +{ + struct tdx_hypercall_args args =3D { + .r10 =3D TDX_HYPERCALL_STANDARD, + .r11 =3D hcall_func(EXIT_REASON_EPT_VIOLATION), + .r12 =3D size, + .r13 =3D EPT_READ, + .r14 =3D addr, + .r15 =3D *val, + }; + + if (__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT)) + return false; + *val =3D args.r11; + return true; +} + +static bool mmio_write(int size, unsigned long addr, unsigned long val) +{ + return !_tdx_hypercall(hcall_func(EXIT_REASON_EPT_VIOLATION), size, + EPT_WRITE, addr, val); +} + +static bool handle_mmio(struct pt_regs *regs, struct ve_info *ve) +{ + char buffer[MAX_INSN_SIZE]; + unsigned long *reg, val; + struct insn insn =3D {}; + enum mmio_type mmio; + int size, extend_size; + u8 extend_val =3D 0; + + /* Only in-kernel MMIO is supported */ + if (WARN_ON_ONCE(user_mode(regs))) + return false; + + if (copy_from_kernel_nofault(buffer, (void *)regs->ip, MAX_INSN_SIZE)) + return false; + + if (insn_decode(&insn, buffer, MAX_INSN_SIZE, INSN_MODE_64)) + return false; + + mmio =3D insn_decode_mmio(&insn, &size); + if (WARN_ON_ONCE(mmio =3D=3D MMIO_DECODE_FAILED)) + return false; + + if (mmio !=3D MMIO_WRITE_IMM && mmio !=3D MMIO_MOVS) { + reg =3D insn_get_modrm_reg_ptr(&insn, regs); + if (!reg) + return false; + } + + ve->instr_len =3D insn.length; + + /* Handle writes first */ + switch (mmio) { + case MMIO_WRITE: + memcpy(&val, reg, size); + return mmio_write(size, ve->gpa, val); + case MMIO_WRITE_IMM: + val =3D insn.immediate.value; + return mmio_write(size, ve->gpa, val); + case MMIO_READ: + case MMIO_READ_ZERO_EXTEND: + case MMIO_READ_SIGN_EXTEND: + /* Reads are handled below */ + break; + case MMIO_MOVS: + case MMIO_DECODE_FAILED: + /* + * MMIO was accessed with an instruction that could not be + * decoded or handled properly. It was likely not using io.h + * helpers or accessed MMIO accidentally. + */ + return false; + default: + WARN_ONCE(1, "Unknown insn_decode_mmio() decode value?"); + return false; + } + + /* Handle reads */ + if (!mmio_read(size, ve->gpa, &val)) + return false; + + switch (mmio) { + case MMIO_READ: + /* Zero-extend for 32-bit operation */ + extend_size =3D size =3D=3D 4 ? sizeof(*reg) : 0; + break; + case MMIO_READ_ZERO_EXTEND: + /* Zero extend based on operand size */ + extend_size =3D insn.opnd_bytes; + break; + case MMIO_READ_SIGN_EXTEND: + /* Sign extend based on operand size */ + extend_size =3D insn.opnd_bytes; + if (size =3D=3D 1 && val & BIT(7)) + extend_val =3D 0xFF; + else if (size > 1 && val & BIT(15)) + extend_val =3D 0xFF; + break; + default: + /* All other cases has to be covered with the first switch() */ + WARN_ON_ONCE(1); + return false; + } + + if (extend_size) + memset(reg, extend_val, extend_size); + memcpy(reg, &val, size); + return true; +} + void tdx_get_ve_info(struct ve_info *ve) { struct tdx_module_output out; @@ -276,6 +395,8 @@ static bool virt_exception_kernel(struct pt_regs *regs,= struct ve_info *ve) return write_msr(regs); case EXIT_REASON_CPUID: return handle_cpuid(regs); + case EXIT_REASON_EPT_VIOLATION: + return handle_mmio(regs, ve); default: pr_warn("Unexpected #VE: %lld\n", ve->exit_reason); return false; --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A24EFC433F5 for ; Fri, 18 Mar 2022 15:31:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238084AbiCRPc2 (ORCPT ); Fri, 18 Mar 2022 11:32:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48820 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238039AbiCRPcH (ORCPT ); Fri, 18 Mar 2022 11:32:07 -0400 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A8048183825 for ; Fri, 18 Mar 2022 08:30:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617445; x=1679153445; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=vNzFzzPHe954x9p3SbOBt34LvV87Fb4PjuFuuahnWbA=; b=YreO+f/e910tdUlOclAC84vNO0jSVH+Wf45RoBsIHM5dUlPSl4gny0b9 Z1mD0lb7ts4WoR+G8Mk4cxNYBSTZX5lcrcmwgPYhfLpW1gZEmJ6Tqo0v2 VSQu0smeAzrRJEZ3kpXXXLNO6Xe17S0sl3Tg6iN1y14sxXK7X9hjmDM2F Ngg1zBn5my5XZZrml0xrkj4qP9C2bGhK+/pLSlN4/QIHtkM/tqfkztS0q dr4HDNaT+ooARS/m12/5tMEVaUerd0YRrbR8kmrRNDoAJ/HSDVIhRkZge BQO+AZbeNh+N2cjztb0Uey8ltyHvfQypSCGCXUy75EtXTHxI/f3hRHPqT w==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="256883198" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="256883198" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:44 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="715523720" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga005.jf.intel.com with ESMTP; 18 Mar 2022 08:30:37 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 4A6EA681; Fri, 18 Mar 2022 17:30:50 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A . Shutemov" , Dave Hansen Subject: [PATCHv7 12/30] x86/tdx: Detect TDX at early kernel decompression time Date: Fri, 18 Mar 2022 18:30:30 +0300 Message-Id: <20220318153048.51177-13-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Kuppuswamy Sathyanarayanan The early decompression code does port I/O for its console output. But, handling the decompression-time port I/O demands a different approach from normal runtime because the IDT required to support #VE based port I/O emulation is not yet set up. Paravirtualizing I/O calls during the decompression step is acceptable because the decompression code doesn't have a lot of call sites to IO instruction. To support port I/O in decompression code, TDX must be detected before the decompression code might do port I/O. Detect whether the kernel runs in a TDX guest. Add an early_is_tdx_guest() interface to query the cached TDX guest status in the decompression code. TDX is detected with CPUID. Make cpuid_count() accessible outside boot/cpuflags.c. TDX detection in the main kernel is very similar. Move common bits into . The actual port I/O paravirtualization will come later in the series. Reviewed-by: Tony Luck Signed-off-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kirill A. Shutemov Reviewed-by: Dave Hansen --- arch/x86/boot/compressed/Makefile | 1 + arch/x86/boot/compressed/misc.c | 8 ++++++++ arch/x86/boot/compressed/misc.h | 2 ++ arch/x86/boot/compressed/tdx.c | 16 ++++++++++++++++ arch/x86/boot/compressed/tdx.h | 13 +++++++++++++ arch/x86/boot/cpuflags.c | 3 +-- arch/x86/boot/cpuflags.h | 1 + arch/x86/include/asm/shared/tdx.h | 8 ++++++++ arch/x86/include/asm/tdx.h | 4 +--- 9 files changed, 51 insertions(+), 5 deletions(-) create mode 100644 arch/x86/boot/compressed/tdx.c create mode 100644 arch/x86/boot/compressed/tdx.h create mode 100644 arch/x86/include/asm/shared/tdx.h diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/M= akefile index 6115274fe10f..732f6b21ecbd 100644 --- a/arch/x86/boot/compressed/Makefile +++ b/arch/x86/boot/compressed/Makefile @@ -101,6 +101,7 @@ ifdef CONFIG_X86_64 endif =20 vmlinux-objs-$(CONFIG_ACPI) +=3D $(obj)/acpi.o +vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) +=3D $(obj)/tdx.o =20 vmlinux-objs-$(CONFIG_EFI_MIXED) +=3D $(obj)/efi_thunk_$(BITS).o efi-obj-$(CONFIG_EFI_STUB) =3D $(objtree)/drivers/firmware/efi/libstub/lib= .a diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/mis= c.c index a4339cb2d247..2b1169869b96 100644 --- a/arch/x86/boot/compressed/misc.c +++ b/arch/x86/boot/compressed/misc.c @@ -370,6 +370,14 @@ asmlinkage __visible void *extract_kernel(void *rmode,= memptr heap, lines =3D boot_params->screen_info.orig_video_lines; cols =3D boot_params->screen_info.orig_video_cols; =20 + /* + * Detect TDX guest environment. + * + * It has to be done before console_init() in order to use + * paravirtualized port I/O operations if needed. + */ + early_tdx_detect(); + console_init(); =20 /* diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/mis= c.h index 16ed360b6692..0d8e275a9d96 100644 --- a/arch/x86/boot/compressed/misc.h +++ b/arch/x86/boot/compressed/misc.h @@ -28,6 +28,8 @@ #include #include =20 +#include "tdx.h" + #define BOOT_CTYPE_H #include =20 diff --git a/arch/x86/boot/compressed/tdx.c b/arch/x86/boot/compressed/tdx.c new file mode 100644 index 000000000000..5f6d01a2f1f4 --- /dev/null +++ b/arch/x86/boot/compressed/tdx.c @@ -0,0 +1,16 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include "../cpuflags.h" +#include "../string.h" + +#include + +void early_tdx_detect(void) +{ + u32 eax, sig[3]; + + cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &sig[0], &sig[2], &sig[1]); + + if (memcmp(TDX_IDENT, sig, sizeof(sig))) + return; +} diff --git a/arch/x86/boot/compressed/tdx.h b/arch/x86/boot/compressed/tdx.h new file mode 100644 index 000000000000..9055482cd35c --- /dev/null +++ b/arch/x86/boot/compressed/tdx.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef BOOT_COMPRESSED_TDX_H +#define BOOT_COMPRESSED_TDX_H + +#include + +#ifdef CONFIG_INTEL_TDX_GUEST +void early_tdx_detect(void); +#else +static inline void early_tdx_detect(void) { }; +#endif + +#endif /* BOOT_COMPRESSED_TDX_H */ diff --git a/arch/x86/boot/cpuflags.c b/arch/x86/boot/cpuflags.c index a0b75f73dc63..a83d67ec627d 100644 --- a/arch/x86/boot/cpuflags.c +++ b/arch/x86/boot/cpuflags.c @@ -71,8 +71,7 @@ int has_eflag(unsigned long mask) # define EBX_REG "=3Db" #endif =20 -static inline void cpuid_count(u32 id, u32 count, - u32 *a, u32 *b, u32 *c, u32 *d) +void cpuid_count(u32 id, u32 count, u32 *a, u32 *b, u32 *c, u32 *d) { asm volatile(".ifnc %%ebx,%3 ; movl %%ebx,%3 ; .endif \n\t" "cpuid \n\t" diff --git a/arch/x86/boot/cpuflags.h b/arch/x86/boot/cpuflags.h index 2e20814d3ce3..475b8fde90f7 100644 --- a/arch/x86/boot/cpuflags.h +++ b/arch/x86/boot/cpuflags.h @@ -17,5 +17,6 @@ extern u32 cpu_vendor[3]; =20 int has_eflag(unsigned long mask); void get_cpuflags(void); +void cpuid_count(u32 id, u32 count, u32 *a, u32 *b, u32 *c, u32 *d); =20 #endif diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/share= d/tdx.h new file mode 100644 index 000000000000..8209ba9ffe1a --- /dev/null +++ b/arch/x86/include/asm/shared/tdx.h @@ -0,0 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_SHARED_TDX_H +#define _ASM_X86_SHARED_TDX_H + +#define TDX_CPUID_LEAF_ID 0x21 +#define TDX_IDENT "IntelTDX " + +#endif /* _ASM_X86_SHARED_TDX_H */ diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index a792096a3750..ca23adaa6681 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -6,9 +6,7 @@ #include #include #include - -#define TDX_CPUID_LEAF_ID 0x21 -#define TDX_IDENT "IntelTDX " +#include =20 #define TDX_HYPERCALL_STANDARD 0 =20 --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 33817C433EF for ; Fri, 18 Mar 2022 15:31:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238097AbiCRPch (ORCPT ); Fri, 18 Mar 2022 11:32:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48750 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238041AbiCRPcH (ORCPT ); Fri, 18 Mar 2022 11:32:07 -0400 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3017A1877DE for ; Fri, 18 Mar 2022 08:30:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617447; x=1679153447; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=aTa5BqUOjHSU0UOUB134z7W9sAixWU57Pm+JiJRgK24=; b=We9XtufUCB62blJlizrM6N0JoDFprs/657koUxsN63WmXgCiqBBxGH6S 502bNlfqxf+eUFWFQs8QwdV/oamVMUvBdc41tM9+L8+erTDANlhKW4Ju9 DRH0G08JLbf6y2OCOch4KXcp/sddPBRDkpYfYD4+oT4fZnYoFA8yNBSew 9KzF4iPIwJRytAD1SXAXchceRaRnssEeGqjRAaJ4SGnwXxed16Mh0AxOq xwpLPoeNTVPziPBKQYqI3PpYR83AISZrWcls/gNUN67bb5xU4+uDPLPiL oQT2G/SzESzJ2Hx+49+BZ3/D0BMtxZJTGmRwqli3dEnUGzWqIglm9bFYg A==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="244612606" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="244612606" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:46 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="645558404" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga002.fm.intel.com with ESMTP; 18 Mar 2022 08:30:40 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 57A23692; Fri, 18 Mar 2022 17:30:50 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" , Dave Hansen Subject: [PATCHv7 13/30] x86: Adjust types used in port I/O helpers Date: Fri, 18 Mar 2022 18:30:31 +0300 Message-Id: <20220318153048.51177-14-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Change port I/O helpers to use u8/u16/u32 instead of unsigned char/short/int for values. Use u16 instead of int for port number. It aligns the helpers with implementation in boot stub in preparation for consolidation. Signed-off-by: Kirill A. Shutemov Reviewed-by: Dave Hansen Reviewed-by: Thomas Gleixner --- arch/x86/include/asm/io.h | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h index f6d91ecb8026..638c1a2a82e0 100644 --- a/arch/x86/include/asm/io.h +++ b/arch/x86/include/asm/io.h @@ -258,37 +258,37 @@ static inline void slow_down_io(void) #endif =20 #define BUILDIO(bwl, bw, type) \ -static inline void out##bwl(unsigned type value, int port) \ +static inline void out##bwl(type value, u16 port) \ { \ asm volatile("out" #bwl " %" #bw "0, %w1" \ : : "a"(value), "Nd"(port)); \ } \ \ -static inline unsigned type in##bwl(int port) \ +static inline type in##bwl(u16 port) \ { \ - unsigned type value; \ + type value; \ asm volatile("in" #bwl " %w1, %" #bw "0" \ : "=3Da"(value) : "Nd"(port)); \ return value; \ } \ \ -static inline void out##bwl##_p(unsigned type value, int port) \ +static inline void out##bwl##_p(type value, u16 port) \ { \ out##bwl(value, port); \ slow_down_io(); \ } \ \ -static inline unsigned type in##bwl##_p(int port) \ +static inline type in##bwl##_p(u16 port) \ { \ - unsigned type value =3D in##bwl(port); \ + type value =3D in##bwl(port); \ slow_down_io(); \ return value; \ } \ \ -static inline void outs##bwl(int port, const void *addr, unsigned long cou= nt) \ +static inline void outs##bwl(u16 port, const void *addr, unsigned long cou= nt) \ { \ if (cc_platform_has(CC_ATTR_GUEST_UNROLL_STRING_IO)) { \ - unsigned type *value =3D (unsigned type *)addr; \ + type *value =3D (type *)addr; \ while (count) { \ out##bwl(*value, port); \ value++; \ @@ -301,10 +301,10 @@ static inline void outs##bwl(int port, const void *ad= dr, unsigned long count) \ } \ } \ \ -static inline void ins##bwl(int port, void *addr, unsigned long count) \ +static inline void ins##bwl(u16 port, void *addr, unsigned long count) \ { \ if (cc_platform_has(CC_ATTR_GUEST_UNROLL_STRING_IO)) { \ - unsigned type *value =3D (unsigned type *)addr; \ + type *value =3D (type *)addr; \ while (count) { \ *value =3D in##bwl(port); \ value++; \ @@ -317,9 +317,9 @@ static inline void ins##bwl(int port, void *addr, unsig= ned long count) \ } \ } =20 -BUILDIO(b, b, char) -BUILDIO(w, w, short) -BUILDIO(l, , int) +BUILDIO(b, b, u8) +BUILDIO(w, w, u16) +BUILDIO(l, , u32) =20 #define inb inb #define inw inw --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8CF2BC433F5 for ; Fri, 18 Mar 2022 15:31:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238090AbiCRPcc (ORCPT ); Fri, 18 Mar 2022 11:32:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48768 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238035AbiCRPcO (ORCPT ); Fri, 18 Mar 2022 11:32:14 -0400 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CE7AD187BAC for ; Fri, 18 Mar 2022 08:30:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617447; x=1679153447; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=MrbfU+2Ld2oSKZSOCd/qGfZ6MHnP7qtA9jz4VC0Pg0E=; b=hrSHZKx9HuMnCvO4eUZroEqEn/zfc5tMkyBNHBmTXdQB/jPehDJL2n1d P8D1UF8I/Phi0ur+vs4t4TrcZorIAwZdAc33pDob+FaBtQYS4iAljlowu iYGAl75SBVM1q7YRDWB/XlQBF9MAoSpS+53Jh+Dp2UzKPL8EOeUnZk7u2 ePxmFzYsiL7K/hZm27w6vmhw5eouYZhG3pXUIOYyVGf1Hqaz7Ndadoqxr yWiQWyxG3T+Z/C86efItLHwBcpjA3wAsmiQmARn/Fkbtz42F2I10jyzYw hVLL0ova8FQLMoCo/6IQIma8cyjlL2XQzQLx307/5JrmEf5tF9K49+B2d g==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="281967573" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="281967573" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:46 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="691353416" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga001.fm.intel.com with ESMTP; 18 Mar 2022 08:30:40 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 6597571E; Fri, 18 Mar 2022 17:30:50 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv7 14/30] x86: Consolidate port I/O helpers Date: Fri, 18 Mar 2022 18:30:32 +0300 Message-Id: <20220318153048.51177-15-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" There are two implementations of port I/O helpers: one in the kernel and one in the boot stub. Move the helpers required for both to and use the one implementation everywhere. Signed-off-by: Kirill A. Shutemov Reviewed-by: Thomas Gleixner --- arch/x86/boot/boot.h | 35 +------------------------------- arch/x86/boot/compressed/misc.h | 2 +- arch/x86/include/asm/io.h | 22 ++------------------ arch/x86/include/asm/shared/io.h | 34 +++++++++++++++++++++++++++++++ 4 files changed, 38 insertions(+), 55 deletions(-) create mode 100644 arch/x86/include/asm/shared/io.h diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h index 34c9dbb6a47d..22a474c5b3e8 100644 --- a/arch/x86/boot/boot.h +++ b/arch/x86/boot/boot.h @@ -23,6 +23,7 @@ #include #include #include +#include #include "bitops.h" #include "ctype.h" #include "cpuflags.h" @@ -35,40 +36,6 @@ extern struct boot_params boot_params; =20 #define cpu_relax() asm volatile("rep; nop") =20 -/* Basic port I/O */ -static inline void outb(u8 v, u16 port) -{ - asm volatile("outb %0,%1" : : "a" (v), "dN" (port)); -} -static inline u8 inb(u16 port) -{ - u8 v; - asm volatile("inb %1,%0" : "=3Da" (v) : "dN" (port)); - return v; -} - -static inline void outw(u16 v, u16 port) -{ - asm volatile("outw %0,%1" : : "a" (v), "dN" (port)); -} -static inline u16 inw(u16 port) -{ - u16 v; - asm volatile("inw %1,%0" : "=3Da" (v) : "dN" (port)); - return v; -} - -static inline void outl(u32 v, u16 port) -{ - asm volatile("outl %0,%1" : : "a" (v), "dN" (port)); -} -static inline u32 inl(u16 port) -{ - u32 v; - asm volatile("inl %1,%0" : "=3Da" (v) : "dN" (port)); - return v; -} - static inline void io_delay(void) { const u16 DELAY_PORT =3D 0x80; diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/mis= c.h index 0d8e275a9d96..8a253e85f990 100644 --- a/arch/x86/boot/compressed/misc.h +++ b/arch/x86/boot/compressed/misc.h @@ -22,11 +22,11 @@ #include #include #include -#include #include #include #include #include +#include =20 #include "tdx.h" =20 diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h index 638c1a2a82e0..a1eb218a49f8 100644 --- a/arch/x86/include/asm/io.h +++ b/arch/x86/include/asm/io.h @@ -44,6 +44,7 @@ #include #include #include +#include =20 #define build_mmio_read(name, size, type, reg, barrier) \ static inline type name(const volatile void __iomem *addr) \ @@ -258,20 +259,6 @@ static inline void slow_down_io(void) #endif =20 #define BUILDIO(bwl, bw, type) \ -static inline void out##bwl(type value, u16 port) \ -{ \ - asm volatile("out" #bwl " %" #bw "0, %w1" \ - : : "a"(value), "Nd"(port)); \ -} \ - \ -static inline type in##bwl(u16 port) \ -{ \ - type value; \ - asm volatile("in" #bwl " %w1, %" #bw "0" \ - : "=3Da"(value) : "Nd"(port)); \ - return value; \ -} \ - \ static inline void out##bwl##_p(type value, u16 port) \ { \ out##bwl(value, port); \ @@ -320,10 +307,8 @@ static inline void ins##bwl(u16 port, void *addr, unsi= gned long count) \ BUILDIO(b, b, u8) BUILDIO(w, w, u16) BUILDIO(l, , u32) +#undef BUILDIO =20 -#define inb inb -#define inw inw -#define inl inl #define inb_p inb_p #define inw_p inw_p #define inl_p inl_p @@ -331,9 +316,6 @@ BUILDIO(l, , u32) #define insw insw #define insl insl =20 -#define outb outb -#define outw outw -#define outl outl #define outb_p outb_p #define outw_p outw_p #define outl_p outl_p diff --git a/arch/x86/include/asm/shared/io.h b/arch/x86/include/asm/shared= /io.h new file mode 100644 index 000000000000..c0ef921c0586 --- /dev/null +++ b/arch/x86/include/asm/shared/io.h @@ -0,0 +1,34 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_SHARED_IO_H +#define _ASM_X86_SHARED_IO_H + +#include + +#define BUILDIO(bwl, bw, type) \ +static inline void __out##bwl(type value, u16 port) \ +{ \ + asm volatile("out" #bwl " %" #bw "0, %w1" \ + : : "a"(value), "Nd"(port)); \ +} \ + \ +static inline type __in##bwl(u16 port) \ +{ \ + type value; \ + asm volatile("in" #bwl " %w1, %" #bw "0" \ + : "=3Da"(value) : "Nd"(port)); \ + return value; \ +} + +BUILDIO(b, b, u8) +BUILDIO(w, w, u16) +BUILDIO(l, , u32) +#undef BUILDIO + +#define inb __inb +#define inw __inw +#define inl __inl +#define outb __outb +#define outw __outw +#define outl __outl + +#endif --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 167A4C433F5 for ; Fri, 18 Mar 2022 15:33:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238262AbiCRPeu (ORCPT ); Fri, 18 Mar 2022 11:34:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52420 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238147AbiCRPcw (ORCPT ); Fri, 18 Mar 2022 11:32:52 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4E5E81B694F for ; Fri, 18 Mar 2022 08:31:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617466; x=1679153466; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=GPsonxD2SH/VQBEfitQ+ooyvb6UrqdA98KPm1ddBE5M=; b=Lxx20Sm2WDXNnX+1F4DCXwbi8XIcB5kXIdXcX9ZhmPw4V8YPG/AuepMC xyRNKJRR0386Ab5QfdnCNrQMNB9KimbWCmsD9LCMXmFqLBVvAUQlI5NLe Maof7eAPncXfgaglVaOHFHDblDohWCQ1mbHOrq5PgpMCzxFGdZ20wcCPs gJ2QFBIu35+3EimSbLyDdjWTW4MHE/ibJGV87PvnFfdwknPK0TGfYF51W JS625+/TDRdEd6xNLZHEOER3mqs6LBquMW6kvNpzsWp8gnbzV3Q/30byg nOHJ5YWlFTmF3ZB+PTBLMVJldsS+Mr+BEeVNlcg8m0O1xOHLnCr1We5eI A==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="257344036" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="257344036" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:50 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="784269845" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga006.fm.intel.com with ESMTP; 18 Mar 2022 08:30:44 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 73E5876F; Fri, 18 Mar 2022 17:30:50 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" , Dave Hansen Subject: [PATCHv7 15/30] x86/boot: Port I/O: allow to hook up alternative helpers Date: Fri, 18 Mar 2022 18:30:33 +0300 Message-Id: <20220318153048.51177-16-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Port I/O instructions trigger #VE in the TDX environment. In response to the exception, kernel emulates these instructions using hypercalls. But during early boot, on the decompression stage, it is cumbersome to deal with #VE. It is cleaner to go to hypercalls directly, bypassing #VE handling. Add a way to hook up alternative port I/O helpers in the boot stub with a new pio_ops structure. For now, set the ops structure to just call the normal I/O operation functions. out*()/in*() macros redefined to use pio_ops callbacks. It eliminates need in changing call sites. io_delay() changed to use port I/O helper instead of inline assembly. Signed-off-by: Kirill A. Shutemov Reviewed-by: Dave Hansen --- arch/x86/boot/boot.h | 4 ++-- arch/x86/boot/compressed/misc.c | 4 ++++ arch/x86/boot/compressed/misc.h | 2 +- arch/x86/boot/io.h | 41 +++++++++++++++++++++++++++++++++ arch/x86/boot/main.c | 4 ++++ arch/x86/realmode/rm/wakemain.c | 4 ++++ 6 files changed, 56 insertions(+), 3 deletions(-) create mode 100644 arch/x86/boot/io.h diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h index 22a474c5b3e8..b42b91606ca8 100644 --- a/arch/x86/boot/boot.h +++ b/arch/x86/boot/boot.h @@ -23,10 +23,10 @@ #include #include #include -#include #include "bitops.h" #include "ctype.h" #include "cpuflags.h" +#include "io.h" =20 /* Useful macros */ #define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x))) @@ -39,7 +39,7 @@ extern struct boot_params boot_params; static inline void io_delay(void) { const u16 DELAY_PORT =3D 0x80; - asm volatile("outb %%al,%0" : : "dN" (DELAY_PORT)); + outb(0, DELAY_PORT); } =20 /* These functions are used to reference data in other segments. */ diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/mis= c.c index 2b1169869b96..9fdef6af20a1 100644 --- a/arch/x86/boot/compressed/misc.c +++ b/arch/x86/boot/compressed/misc.c @@ -47,6 +47,8 @@ void *memmove(void *dest, const void *src, size_t n); */ struct boot_params *boot_params; =20 +struct port_io_ops pio_ops; + memptr free_mem_ptr; memptr free_mem_end_ptr; =20 @@ -370,6 +372,8 @@ asmlinkage __visible void *extract_kernel(void *rmode, = memptr heap, lines =3D boot_params->screen_info.orig_video_lines; cols =3D boot_params->screen_info.orig_video_cols; =20 + init_default_io_ops(); + /* * Detect TDX guest environment. * diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/mis= c.h index 8a253e85f990..ea71cf3d64e1 100644 --- a/arch/x86/boot/compressed/misc.h +++ b/arch/x86/boot/compressed/misc.h @@ -26,7 +26,6 @@ #include #include #include -#include =20 #include "tdx.h" =20 @@ -35,6 +34,7 @@ =20 #define BOOT_BOOT_H #include "../ctype.h" +#include "../io.h" =20 #ifdef CONFIG_X86_64 #define memptr long diff --git a/arch/x86/boot/io.h b/arch/x86/boot/io.h new file mode 100644 index 000000000000..fd589ebcbd05 --- /dev/null +++ b/arch/x86/boot/io.h @@ -0,0 +1,41 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef BOOT_IO_H +#define BOOT_IO_H + +#include + +#undef inb +#undef inw +#undef inl +#undef outb +#undef outw +#undef outl + +struct port_io_ops { + u8 (*inb)(u16 port); + void (*outb)(u8 v, u16 port); + void (*outw)(u16 v, u16 port); +}; + +extern struct port_io_ops pio_ops; + +/* + * Use the normal I/O instructions by default. + * TDX guests override these to use hypercalls. + */ +static inline void init_default_io_ops(void) +{ + pio_ops.inb =3D __inb; + pio_ops.outb =3D __outb; + pio_ops.outw =3D __outw; +} + +/* + * Redirect port I/O operations via pio_ops callbacks. + * TDX guests override these callbacks with TDX-specific helpers. + */ +#define inb pio_ops.inb +#define outb pio_ops.outb +#define outw pio_ops.outw + +#endif diff --git a/arch/x86/boot/main.c b/arch/x86/boot/main.c index e3add857c2c9..1202d4f8a390 100644 --- a/arch/x86/boot/main.c +++ b/arch/x86/boot/main.c @@ -17,6 +17,8 @@ =20 struct boot_params boot_params __attribute__((aligned(16))); =20 +struct port_io_ops pio_ops; + char *HEAP =3D _end; char *heap_end =3D _end; /* Default end of heap =3D no heap */ =20 @@ -133,6 +135,8 @@ static void init_heap(void) =20 void main(void) { + init_default_io_ops(); + /* First, copy the boot header into the "zeropage" */ copy_boot_params(); =20 diff --git a/arch/x86/realmode/rm/wakemain.c b/arch/x86/realmode/rm/wakemai= n.c index 1d6437e6d2ba..a6f4d8388ad8 100644 --- a/arch/x86/realmode/rm/wakemain.c +++ b/arch/x86/realmode/rm/wakemain.c @@ -62,8 +62,12 @@ static void send_morse(const char *pattern) } } =20 +struct port_io_ops pio_ops; + void main(void) { + init_default_io_ops(); + /* Kill machine if structures are wrong */ if (wakeup_header.real_magic !=3D 0x12345678) while (1) --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 73821C433EF for ; Fri, 18 Mar 2022 15:32:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238129AbiCRPcv (ORCPT ); Fri, 18 Mar 2022 11:32:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48752 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238064AbiCRPcO (ORCPT ); Fri, 18 Mar 2022 11:32:14 -0400 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B8E3518F22D for ; Fri, 18 Mar 2022 08:30:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617451; x=1679153451; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=C5OFnN3edMKiN/TXzxy53Xa0qjOU0a7CBm3QNbzbWCc=; b=ahjWlAN4JjIrEITUdvzqSkgvLyV3XcICEJRA2a6doRfHUXMPhmzUfK0n 2b5OfnBWn6a9B9/StmTLu2h1A+cxSpSJ6qNYjvrYXJ285MkmMvO5H6jC3 CERtjVJEqAOPbOFvV8ejm3aOs5iiG4hBxs+JZ3vc1+uVB8pIlMndEO+b+ tOPHkK/g07jFvv/Sm3E/xiCWY1NBPWYL9AbtfbEs+10Dk0rbyrtoLIvrB xvNUgbHERjlo7tMp91HtepAa1IjPS1YplEvf2z9UziIcnLWKVrm0ITqt+ jHfJmh7IV84X/T39QSdjopEeEsQ0g/XhBwevpjtYGp33F1xJ8u1Cq4knr A==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="244612628" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="244612628" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:51 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="645558456" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga002.fm.intel.com with ESMTP; 18 Mar 2022 08:30:44 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 85C8B7B7; Fri, 18 Mar 2022 17:30:50 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" , Dave Hansen Subject: [PATCHv7 16/30] x86/boot: Port I/O: add decompression-time support for TDX Date: Fri, 18 Mar 2022 18:30:34 +0300 Message-Id: <20220318153048.51177-17-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Port I/O instructions trigger #VE in the TDX environment. In response to the exception, kernel emulates these instructions using hypercalls. But during early boot, on the decompression stage, it is cumbersome to deal with #VE. It is cleaner to go to hypercalls directly, bypassing #VE handling. Hook up TDX-specific port I/O helpers if booting in TDX environment. Signed-off-by: Kirill A. Shutemov Reviewed-by: Dave Hansen --- arch/x86/boot/compressed/Makefile | 2 +- arch/x86/boot/compressed/tdcall.S | 3 ++ arch/x86/boot/compressed/tdx.c | 61 +++++++++++++++++++++++++++++++ arch/x86/include/asm/shared/tdx.h | 32 ++++++++++++++++ arch/x86/include/asm/tdx.h | 27 -------------- 5 files changed, 97 insertions(+), 28 deletions(-) create mode 100644 arch/x86/boot/compressed/tdcall.S diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/M= akefile index 732f6b21ecbd..8fd0e6ae2e1f 100644 --- a/arch/x86/boot/compressed/Makefile +++ b/arch/x86/boot/compressed/Makefile @@ -101,7 +101,7 @@ ifdef CONFIG_X86_64 endif =20 vmlinux-objs-$(CONFIG_ACPI) +=3D $(obj)/acpi.o -vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) +=3D $(obj)/tdx.o +vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) +=3D $(obj)/tdx.o $(obj)/tdcall.o =20 vmlinux-objs-$(CONFIG_EFI_MIXED) +=3D $(obj)/efi_thunk_$(BITS).o efi-obj-$(CONFIG_EFI_STUB) =3D $(objtree)/drivers/firmware/efi/libstub/lib= .a diff --git a/arch/x86/boot/compressed/tdcall.S b/arch/x86/boot/compressed/t= dcall.S new file mode 100644 index 000000000000..46d0495e0d3a --- /dev/null +++ b/arch/x86/boot/compressed/tdcall.S @@ -0,0 +1,3 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#include "../../coco/tdx/tdcall.S" diff --git a/arch/x86/boot/compressed/tdx.c b/arch/x86/boot/compressed/tdx.c index 5f6d01a2f1f4..ff1fbe94243b 100644 --- a/arch/x86/boot/compressed/tdx.c +++ b/arch/x86/boot/compressed/tdx.c @@ -2,9 +2,65 @@ =20 #include "../cpuflags.h" #include "../string.h" +#include "../io.h" +#include "error.h" + +#include +#include =20 #include =20 +/* Called from __tdx_hypercall() for unrecoverable failure */ +void __tdx_hypercall_failed(void) +{ + error("TDVMCALL failed. TDX module bug?"); +} + +static inline unsigned int tdx_io_in(int size, u16 port) +{ + struct tdx_hypercall_args args =3D { + .r10 =3D TDX_HYPERCALL_STANDARD, + .r11 =3D EXIT_REASON_IO_INSTRUCTION, + .r12 =3D size, + .r13 =3D 0, + .r14 =3D port, + }; + + if (__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT)) + return UINT_MAX; + + return args.r11; +} + +static inline void tdx_io_out(int size, u16 port, u32 value) +{ + struct tdx_hypercall_args args =3D { + .r10 =3D TDX_HYPERCALL_STANDARD, + .r11 =3D EXIT_REASON_IO_INSTRUCTION, + .r12 =3D size, + .r13 =3D 1, + .r14 =3D port, + .r15 =3D value, + }; + + __tdx_hypercall(&args, 0); +} + +static inline u8 tdx_inb(u16 port) +{ + return tdx_io_in(1, port); +} + +static inline void tdx_outb(u8 value, u16 port) +{ + tdx_io_out(1, port, value); +} + +static inline void tdx_outw(u16 value, u16 port) +{ + tdx_io_out(2, port, value); +} + void early_tdx_detect(void) { u32 eax, sig[3]; @@ -13,4 +69,9 @@ void early_tdx_detect(void) =20 if (memcmp(TDX_IDENT, sig, sizeof(sig))) return; + + /* Use hypercalls instead of I/O instructions */ + inb =3D tdx_inb; + outb =3D tdx_outb; + outw =3D tdx_outw; } diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/share= d/tdx.h index 8209ba9ffe1a..e53f26228fbb 100644 --- a/arch/x86/include/asm/shared/tdx.h +++ b/arch/x86/include/asm/shared/tdx.h @@ -2,7 +2,39 @@ #ifndef _ASM_X86_SHARED_TDX_H #define _ASM_X86_SHARED_TDX_H =20 +#include +#include + +#define TDX_HYPERCALL_STANDARD 0 + +#define TDX_HCALL_HAS_OUTPUT BIT(0) +#define TDX_HCALL_ISSUE_STI BIT(1) + #define TDX_CPUID_LEAF_ID 0x21 #define TDX_IDENT "IntelTDX " =20 +#ifndef __ASSEMBLY__ + +/* + * Used in __tdx_hypercall() to pass down and get back registers' values of + * the TDCALL instruction when requesting services from the VMM. + * + * This is a software only structure and not part of the TDX module/VMM AB= I. + */ +struct tdx_hypercall_args { + u64 r10; + u64 r11; + u64 r12; + u64 r13; + u64 r14; + u64 r15; +}; + +/* Used to request services from the VMM */ +u64 __tdx_hypercall(struct tdx_hypercall_args *args, unsigned long flags); + +/* Called from __tdx_hypercall() for unrecoverable failure */ +void __tdx_hypercall_failed(void); + +#endif /* !__ASSEMBLY__ */ #endif /* _ASM_X86_SHARED_TDX_H */ diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index ca23adaa6681..1093a5e5f446 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -3,16 +3,10 @@ #ifndef _ASM_X86_TDX_H #define _ASM_X86_TDX_H =20 -#include #include #include #include =20 -#define TDX_HYPERCALL_STANDARD 0 - -#define TDX_HCALL_HAS_OUTPUT BIT(0) -#define TDX_HCALL_ISSUE_STI BIT(1) - /* * SW-defined error codes. * @@ -40,21 +34,6 @@ struct tdx_module_output { u64 r11; }; =20 -/* - * Used in __tdx_hypercall() to pass down and get back registers' values of - * the TDCALL instruction when requesting services from the VMM. - * - * This is a software only structure and not part of the TDX module/VMM AB= I. - */ -struct tdx_hypercall_args { - u64 r10; - u64 r11; - u64 r12; - u64 r13; - u64 r14; - u64 r15; -}; - /* * Used by the #VE exception handler to gather the #VE exception * info from the TDX module. This is a software only structure @@ -79,12 +58,6 @@ void __init tdx_early_init(void); u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9, struct tdx_module_output *out); =20 -/* Used to request services from the VMM */ -u64 __tdx_hypercall(struct tdx_hypercall_args *args, unsigned long flags); - -/* Called from __tdx_hypercall() for unrecoverable failure */ -void __tdx_hypercall_failed(void); - void tdx_get_ve_info(struct ve_info *ve); =20 bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve); --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D3EBDC433F5 for ; Fri, 18 Mar 2022 15:33:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238053AbiCRPer (ORCPT ); Fri, 18 Mar 2022 11:34:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51220 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238140AbiCRPcw (ORCPT ); Fri, 18 Mar 2022 11:32:52 -0400 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6E7981B6951 for ; Fri, 18 Mar 2022 08:31:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617466; x=1679153466; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=jbcz579ugk1DZLFgXoNISGGRTPCba2zYCNzfG63YnFs=; b=Sb+dBbYwi32wpgSCZl8BsRlxpADuuA7Ibel+X6SDtJ6xDoltdhwxZUdN GObLl9hwZB9lSslDbr0Szd+URFJ5OAdAAw0wwomYORrKTjk7X5BVWcje/ ni0aDX0r3OcYXE8OYl48Ko0mt4k0QR5gnhnTPgWAnneWOyfK5NgZSVGlp hsuqaKZr/tm3FrWK2NXwjKXlSQe4HLD6T+FprtAK/5XpQHdIyyV8sAgnj QXqi8U+OsgMWZ2IZpqC4IAoQtS7LMNASpS804/+a+5j08cJdMSEKhOulE clqn1OAjU6YtKAx8I0uiGqV+JRGt5C2Zf9KGKVosYwG/hNXP4ky1oit3e Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="237767760" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="237767760" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:51 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="599534252" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga008.fm.intel.com with ESMTP; 18 Mar 2022 08:30:44 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 921A27B8; Fri, 18 Mar 2022 17:30:50 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A . Shutemov" , Dave Hansen Subject: [PATCHv7 17/30] x86/tdx: Port I/O: add runtime hypercalls Date: Fri, 18 Mar 2022 18:30:35 +0300 Message-Id: <20220318153048.51177-18-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Kuppuswamy Sathyanarayanan TDX hypervisors cannot emulate instructions directly. This includes port I/O which is normally emulated in the hypervisor. All port I/O instructions inside TDX trigger the #VE exception in the guest and would be normally emulated there. Use a hypercall to emulate port I/O. Extend the tdx_handle_virt_exception() and add support to handle the #VE due to port I/O instructions. String I/O operations are not supported in TDX. Unroll them by declaring CC_ATTR_GUEST_UNROLL_STRING_IO confidential computing attribute. =3D=3D Userspace Implications =3D=3D The ioperm() facility allows userspace access to I/O instructions like inb/outb. Among other things, this allows writing userspace device drivers. This series has no special handling for ioperm(). Users will be able to successfully request I/O permissions but will induce a #VE on their first I/O instruction which leads SIGSEGV. If this is undesirable users can enable kernel lockdown feature with 'lockdown=3Dintegrity' kernel command line option. It makes ioperm() fail. More robust handling of this situation (denying ioperm() in all TDX guests) will be addressed in follow-on work. Signed-off-by: Kuppuswamy Sathyanarayanan Reviewed-by: Andi Kleen Reviewed-by: Dan Williams Signed-off-by: Kirill A. Shutemov Reviewed-by: Dave Hansen Reviewed-by: Thomas Gleixner --- arch/x86/coco/core.c | 7 +++- arch/x86/coco/tdx/tdx.c | 79 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 85 insertions(+), 1 deletion(-) diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c index 3f3008783e05..df08edc94f9b 100644 --- a/arch/x86/coco/core.c +++ b/arch/x86/coco/core.c @@ -18,7 +18,12 @@ static u64 cc_mask __ro_after_init; =20 static bool intel_cc_platform_has(enum cc_attr attr) { - return false; + switch (attr) { + case CC_ATTR_GUEST_UNROLL_STRING_IO: + return true; + default: + return false; + } } =20 /* diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c index ab10bc73a7c5..e47e2ed6b03e 100644 --- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -19,6 +19,16 @@ #define EPT_READ 0 #define EPT_WRITE 1 =20 +/* Port I/O direction */ +#define PORT_READ 0 +#define PORT_WRITE 1 + +/* See Exit Qualification for I/O Instructions in VMX documentation */ +#define VE_IS_IO_IN(e) ((e) & BIT(3)) +#define VE_GET_IO_SIZE(e) (((e) & GENMASK(2, 0)) + 1) +#define VE_GET_PORT_NUM(e) ((e) >> 16) +#define VE_IS_IO_STRING(e) ((e) & BIT(4)) + /* * Wrapper for standard use of __tdx_hypercall with no output aside from * return code. @@ -341,6 +351,73 @@ static bool handle_mmio(struct pt_regs *regs, struct v= e_info *ve) return true; } =20 +static bool handle_in(struct pt_regs *regs, int size, int port) +{ + struct tdx_hypercall_args args =3D { + .r10 =3D TDX_HYPERCALL_STANDARD, + .r11 =3D hcall_func(EXIT_REASON_IO_INSTRUCTION), + .r12 =3D size, + .r13 =3D PORT_READ, + .r14 =3D port, + }; + u64 mask =3D GENMASK(BITS_PER_BYTE * size, 0); + bool success; + + /* + * Emulate the I/O read via hypercall. More info about ABI can be found + * in TDX Guest-Host-Communication Interface (GHCI) section titled + * "TDG.VP.VMCALL". + */ + success =3D !__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT); + + /* Update part of the register affected by the emulated instruction */ + regs->ax &=3D ~mask; + if (success) + regs->ax |=3D args.r11 & mask; + + return success; +} + +static bool handle_out(struct pt_regs *regs, int size, int port) +{ + u64 mask =3D GENMASK(BITS_PER_BYTE * size, 0); + + /* + * Emulate the I/O write via hypercall. More info about ABI can be found + * in TDX Guest-Host-Communication Interface (GHCI) section titled + * "TDG.VP.VMCALL". + */ + return !_tdx_hypercall(hcall_func(EXIT_REASON_IO_INSTRUCTION), size, + PORT_WRITE, port, regs->ax & mask); +} + +/* + * Emulate I/O using hypercall. + * + * Assumes the IO instruction was using ax, which is enforced + * by the standard io.h macros. + * + * Return True on success or False on failure. + */ +static bool handle_io(struct pt_regs *regs, u32 exit_qual) +{ + int size, port; + bool in; + + if (VE_IS_IO_STRING(exit_qual)) + return false; + + in =3D VE_IS_IO_IN(exit_qual); + size =3D VE_GET_IO_SIZE(exit_qual); + port =3D VE_GET_PORT_NUM(exit_qual); + + + if (in) + return handle_in(regs, size, port); + else + return handle_out(regs, size, port); +} + void tdx_get_ve_info(struct ve_info *ve) { struct tdx_module_output out; @@ -397,6 +474,8 @@ static bool virt_exception_kernel(struct pt_regs *regs,= struct ve_info *ve) return handle_cpuid(regs); case EXIT_REASON_EPT_VIOLATION: return handle_mmio(regs, ve); + case EXIT_REASON_IO_INSTRUCTION: + return handle_io(regs, ve->exit_qual); default: pr_warn("Unexpected #VE: %lld\n", ve->exit_reason); return false; --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E674EC433EF for ; Fri, 18 Mar 2022 15:32:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238170AbiCRPdg (ORCPT ); Fri, 18 Mar 2022 11:33:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49346 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230257AbiCRPcP (ORCPT ); Fri, 18 Mar 2022 11:32:15 -0400 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A1B751903DC for ; Fri, 18 Mar 2022 08:30:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617454; x=1679153454; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=iznNJy8gxVc9yYg7adxEnkSxDrtLakTn+JZ2TNjZC0k=; b=nfzrFXAU153s2EpXVBIQKakAU0EmYkkkdfhMG8yFnmOiAjZrmj9IPnNH MbjKwdB/rs+vx4+XWVbPqjpIJ5wcjF8o0mdPcIYBu6aG+/PWsnGzS6/XS GsxN1moYiinfj4SLxphXTMZAyyQwU1RUrCGM+9MNG6D6vDLmo5qT2c7Ab u9xa1tgdVs1kZkFTrthzfHc20+daqbP6XUf18XtG8R78mYFAs9GCNr97d G+x+GiZ9GQ9LuxfxRTlS0L2/r5/gwpxeKChG/jByCkdJ5KRC9sK+SszPp Yp4jJpkVooqCugyG4kgF3v3WNMfQv5vlIVn8SJO93WruaeQ7XGjsPbGlh w==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="256883244" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="256883244" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:52 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="647504857" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga004.jf.intel.com with ESMTP; 18 Mar 2022 08:30:45 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 9DA4C7FE; Fri, 18 Mar 2022 17:30:50 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A . Shutemov" , Dave Hansen Subject: [PATCHv7 18/30] x86/tdx: Port I/O: add early boot support Date: Fri, 18 Mar 2022 18:30:36 +0300 Message-Id: <20220318153048.51177-19-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Andi Kleen TDX guests cannot do port I/O directly. The TDX module triggers a #VE exception to let the guest kernel emulate port I/O by converting them into TDCALLs to call the host. But before IDT handlers are set up, port I/O cannot be emulated using normal kernel #VE handlers. To support the #VE-based emulation during this boot window, add a minimal early #VE handler support in early exception handlers. This is similar to what AMD SEV does. This is mainly to support earlyprintk's serial driver, as well as potentially the VGA driver. The early handler only supports I/O-related #VE exceptions. Unhandled or failed exceptions will be handled via early_fixup_exceptions() (like normal exception failures). At runtime I/O-related #VE exceptions (along with other types) handled by virt_exception_kernel(). Signed-off-by: Andi Kleen Reviewed-by: Dan Williams Signed-off-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kirill A. Shutemov Reviewed-by: Thomas Gleixner Reviewed-by: Dave Hansen --- arch/x86/coco/tdx/tdx.c | 16 ++++++++++++++++ arch/x86/include/asm/tdx.h | 4 ++++ arch/x86/kernel/head64.c | 3 +++ 3 files changed, 23 insertions(+) diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c index e47e2ed6b03e..cc14b7c0c157 100644 --- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -418,6 +418,22 @@ static bool handle_io(struct pt_regs *regs, u32 exit_q= ual) return handle_out(regs, size, port); } =20 +/* + * Early #VE exception handler. Only handles a subset of port I/O. + * Intended only for earlyprintk. If failed, return false. + */ +__init bool tdx_early_handle_ve(struct pt_regs *regs) +{ + struct ve_info ve; + + tdx_get_ve_info(&ve); + + if (ve.exit_reason !=3D EXIT_REASON_IO_INSTRUCTION) + return false; + + return handle_io(regs, ve.exit_qual); +} + void tdx_get_ve_info(struct ve_info *ve) { struct tdx_module_output out; diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 1093a5e5f446..c70d9bc74b67 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -64,11 +64,15 @@ bool tdx_handle_virt_exception(struct pt_regs *regs, st= ruct ve_info *ve); =20 void tdx_safe_halt(void); =20 +bool tdx_early_handle_ve(struct pt_regs *regs); + #else =20 static inline void tdx_early_init(void) { }; static inline void tdx_safe_halt(void) { }; =20 +static inline bool tdx_early_handle_ve(struct pt_regs *regs) { return fals= e; } + #endif /* CONFIG_INTEL_TDX_GUEST */ =20 #endif /* !__ASSEMBLY__ */ diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index 6dff50c3edd6..ecbf50e5b8e0 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -417,6 +417,9 @@ void __init do_early_exception(struct pt_regs *regs, in= t trapnr) trapnr =3D=3D X86_TRAP_VC && handle_vc_boot_ghcb(regs)) return; =20 + if (trapnr =3D=3D X86_TRAP_VE && tdx_early_handle_ve(regs)) + return; + early_fixup_exception(regs, trapnr); } =20 --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6C1EBC433EF for ; Fri, 18 Mar 2022 15:32:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238189AbiCRPdv (ORCPT ); Fri, 18 Mar 2022 11:33:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48770 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238063AbiCRPcO (ORCPT ); Fri, 18 Mar 2022 11:32:14 -0400 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9FA1218EEA5 for ; Fri, 18 Mar 2022 08:30:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617451; x=1679153451; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=YWbMeFxbiZInnJwZYaMcEWOMo5NSAST0KVgK1zOCVnc=; b=klLadZ9jLcKEqKdBe7l2PMcdbCsOl4LpC/kcXDfn/V0sEyH6nszueBOm 79KQ2E5V8nkI2AAoxdwkzWp/fDZCtiIUzVLByn2KLoKl9RqcRJEiTlPJ5 n0wnct28KkKfbIOuskko2vgajU8mo0nCPNub/Gk4r/7mEZVcy/Yus0Jlt Q1GXC6zIzIArQrHoQk0f4MS4zMn5W9siv3bZJw0M0hJyxQ5MCvG5AjMtX TQissAT1/nb52M1BIUrnH9V6zEsRT0MofiQBXxX/wU2OsUXZ0mIMZquVH z52s6WqS6YootAqDPZY/C5lv70W2VtwIzdKqS4kPmKrAr21p3pQYfO+IN A==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="281967593" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="281967593" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:51 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="645558453" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga002.fm.intel.com with ESMTP; 18 Mar 2022 08:30:45 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id A8AD2862; Fri, 18 Mar 2022 17:30:50 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A . Shutemov" Subject: [PATCHv7 19/30] x86/tdx: Wire up KVM hypercalls Date: Fri, 18 Mar 2022 18:30:37 +0300 Message-Id: <20220318153048.51177-20-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Kuppuswamy Sathyanarayanan KVM hypercalls use the VMCALL or VMMCALL instructions. Although the ABI is similar, those instructions no longer function for TDX guests. Make vendor-specific TDVMCALLs instead of VMCALL. This enables TDX guests to run with KVM acting as the hypervisor. Among other things, KVM hypercall is used to send IPIs. Since the KVM driver can be built as a kernel module, export tdx_kvm_hypercall() to make the symbols visible to kvm.ko. Signed-off-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kirill A. Shutemov Reviewed-by: Thomas Gleixner --- arch/x86/coco/tdx/tdx.c | 17 +++++++++++++++++ arch/x86/include/asm/kvm_para.h | 22 ++++++++++++++++++++++ arch/x86/include/asm/tdx.h | 11 +++++++++++ 3 files changed, 50 insertions(+) diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c index cc14b7c0c157..f50f530aff5f 100644 --- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -64,6 +64,23 @@ static u64 hcall_func(u64 exit_reason) return exit_reason; } =20 +#ifdef CONFIG_KVM_GUEST +long tdx_kvm_hypercall(unsigned int nr, unsigned long p1, unsigned long p2, + unsigned long p3, unsigned long p4) +{ + struct tdx_hypercall_args args =3D { + .r10 =3D nr, + .r11 =3D p1, + .r12 =3D p2, + .r13 =3D p3, + .r14 =3D p4, + }; + + return __tdx_hypercall(&args, 0); +} +EXPORT_SYMBOL_GPL(tdx_kvm_hypercall); +#endif + /* * Used for TDX guests to make calls directly to the TD module. This * should only be used for calls that have no legitimate reason to fail diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_par= a.h index 56935ebb1dfe..57bc74e112f2 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -7,6 +7,8 @@ #include #include =20 +#include + #ifdef CONFIG_KVM_GUEST bool kvm_check_and_clear_guest_paused(void); #else @@ -32,6 +34,10 @@ static inline bool kvm_check_and_clear_guest_paused(void) static inline long kvm_hypercall0(unsigned int nr) { long ret; + + if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) + return tdx_kvm_hypercall(nr, 0, 0, 0, 0); + asm volatile(KVM_HYPERCALL : "=3Da"(ret) : "a"(nr) @@ -42,6 +48,10 @@ static inline long kvm_hypercall0(unsigned int nr) static inline long kvm_hypercall1(unsigned int nr, unsigned long p1) { long ret; + + if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) + return tdx_kvm_hypercall(nr, p1, 0, 0, 0); + asm volatile(KVM_HYPERCALL : "=3Da"(ret) : "a"(nr), "b"(p1) @@ -53,6 +63,10 @@ static inline long kvm_hypercall2(unsigned int nr, unsig= ned long p1, unsigned long p2) { long ret; + + if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) + return tdx_kvm_hypercall(nr, p1, p2, 0, 0); + asm volatile(KVM_HYPERCALL : "=3Da"(ret) : "a"(nr), "b"(p1), "c"(p2) @@ -64,6 +78,10 @@ static inline long kvm_hypercall3(unsigned int nr, unsig= ned long p1, unsigned long p2, unsigned long p3) { long ret; + + if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) + return tdx_kvm_hypercall(nr, p1, p2, p3, 0); + asm volatile(KVM_HYPERCALL : "=3Da"(ret) : "a"(nr), "b"(p1), "c"(p2), "d"(p3) @@ -76,6 +94,10 @@ static inline long kvm_hypercall4(unsigned int nr, unsig= ned long p1, unsigned long p4) { long ret; + + if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) + return tdx_kvm_hypercall(nr, p1, p2, p3, p4); + asm volatile(KVM_HYPERCALL : "=3Da"(ret) : "a"(nr), "b"(p1), "c"(p2), "d"(p3), "S"(p4) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index c70d9bc74b67..ea92641dd1f8 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -75,5 +75,16 @@ static inline bool tdx_early_handle_ve(struct pt_regs *r= egs) { return false; } =20 #endif /* CONFIG_INTEL_TDX_GUEST */ =20 +#if defined(CONFIG_KVM_GUEST) && defined(CONFIG_INTEL_TDX_GUEST) +long tdx_kvm_hypercall(unsigned int nr, unsigned long p1, unsigned long p2, + unsigned long p3, unsigned long p4); +#else +static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1, + unsigned long p2, unsigned long p3, + unsigned long p4) +{ + return -ENODEV; +} +#endif /* CONFIG_INTEL_TDX_GUEST && CONFIG_KVM_GUEST */ #endif /* !__ASSEMBLY__ */ #endif /* _ASM_X86_TDX_H */ --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C307EC433F5 for ; Fri, 18 Mar 2022 15:33:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238252AbiCRPek (ORCPT ); Fri, 18 Mar 2022 11:34:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51630 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238141AbiCRPcw (ORCPT ); Fri, 18 Mar 2022 11:32:52 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E5D371B7633 for ; Fri, 18 Mar 2022 08:31:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617467; x=1679153467; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=/2lBal/ZW3OSepDWlMlqxbUt7B7XIsjCt1gCcR1IiEk=; b=AfRz07bSDzVFnO5W+U9Pi+9Sxp+Va2QbrklMb/0GAr8WAyuof3WKBdt8 6sIEY2yyKSml8JurwJWSgXRdQsPC6pFbdITqYPEtLm9MwHzo0lebqcRDR Fc7mgsKav7NX3K3xnry0Fbey2X8iWqJ7cy+7tmCTfEY7Uq2GzOZqe1gUn CSPOtGTM/XVNgqGhAndxQ5+ihdQxplKk3IZ4sORsXkOjKwnc0zLFl/hT7 MZXx4jWUo8OnDSvMlNWlI0WMU4TuqkOpMdIqm54zW/QbAsvoNfu/RKVAU DXBTWBe96zsBq7OsIiw7J387fOyztIC/s9MIcJhDWg4fWFltLH6zGXTPN Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="257344043" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="257344043" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:52 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="715523775" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga005.jf.intel.com with ESMTP; 18 Mar 2022 08:30:45 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id B41A78A1; Fri, 18 Mar 2022 17:30:50 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, Sean Christopherson , "Kirill A . Shutemov" Subject: [PATCHv7 20/30] x86/boot: Add a trampoline for booting APs via firmware handoff Date: Fri, 18 Mar 2022 18:30:38 +0300 Message-Id: <20220318153048.51177-21-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Historically, x86 platforms have booted secondary processors (APs) using INIT followed by the start up IPI (SIPI) messages. In regular VMs, this boot sequence is supported by the VMM emulation. But such a wakeup model is fatal for secure VMs like TDX in which VMM is an untrusted entity. To address this issue, a new wakeup model was added in ACPI v6.4, in which firmware (like TDX virtual BIOS) will help boot the APs. More details about this wakeup model can be found in ACPI specification v6.4, the section titled "Multiprocessor Wakeup Structure". Since the existing trampoline code requires processors to boot in real mode with 16-bit addressing, it will not work for this wakeup model (because it boots the AP in 64-bit mode). To handle it, extend the trampoline code to support 64-bit mode firmware handoff. Also, extend IDT and GDT pointers to support 64-bit mode hand off. There is no TDX-specific detection for this new boot method. The kernel will rely on it as the sole boot method whenever the new ACPI structure is present. The ACPI table parser for the MADT multiprocessor wake up structure and the wakeup method that uses this structure will be added by the following patch in this series. Signed-off-by: Sean Christopherson Reviewed-by: Andi Kleen Reviewed-by: Dan Williams Signed-off-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kirill A. Shutemov Reviewed-by: Thomas Gleixner --- arch/x86/include/asm/apic.h | 2 ++ arch/x86/include/asm/realmode.h | 1 + arch/x86/kernel/smpboot.c | 12 ++++++-- arch/x86/realmode/rm/header.S | 1 + arch/x86/realmode/rm/trampoline_64.S | 38 ++++++++++++++++++++++++ arch/x86/realmode/rm/trampoline_common.S | 12 +++++++- 6 files changed, 63 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index 48067af94678..35006e151774 100644 --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -328,6 +328,8 @@ struct apic { =20 /* wakeup_secondary_cpu */ int (*wakeup_secondary_cpu)(int apicid, unsigned long start_eip); + /* wakeup secondary CPU using 64-bit wakeup point */ + int (*wakeup_secondary_cpu_64)(int apicid, unsigned long start_eip); =20 void (*inquire_remote_apic)(int apicid); =20 diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmod= e.h index 331474b150f1..fd6f6e5b755a 100644 --- a/arch/x86/include/asm/realmode.h +++ b/arch/x86/include/asm/realmode.h @@ -25,6 +25,7 @@ struct real_mode_header { u32 sev_es_trampoline_start; #endif #ifdef CONFIG_X86_64 + u32 trampoline_start64; u32 trampoline_pgd; #endif /* ACPI S3 wakeup */ diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 617012f4619f..6269dd126dba 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -1088,6 +1088,11 @@ static int do_boot_cpu(int apicid, int cpu, struct t= ask_struct *idle, unsigned long boot_error =3D 0; unsigned long timeout; =20 +#ifdef CONFIG_X86_64 + /* If 64-bit wakeup method exists, use the 64-bit mode trampoline IP */ + if (apic->wakeup_secondary_cpu_64) + start_ip =3D real_mode_header->trampoline_start64; +#endif idle->thread.sp =3D (unsigned long)task_pt_regs(idle); early_gdt_descr.address =3D (unsigned long)get_cpu_gdt_rw(cpu); initial_code =3D (unsigned long)start_secondary; @@ -1129,11 +1134,14 @@ static int do_boot_cpu(int apicid, int cpu, struct = task_struct *idle, =20 /* * Wake up a CPU in difference cases: - * - Use the method in the APIC driver if it's defined + * - Use a method from the APIC driver if one defined, with wakeup + * straight to 64-bit mode preferred over wakeup to RM. * Otherwise, * - Use an INIT boot APIC message for APs or NMI for BSP. */ - if (apic->wakeup_secondary_cpu) + if (apic->wakeup_secondary_cpu_64) + boot_error =3D apic->wakeup_secondary_cpu_64(apicid, start_ip); + else if (apic->wakeup_secondary_cpu) boot_error =3D apic->wakeup_secondary_cpu(apicid, start_ip); else boot_error =3D wakeup_cpu_via_init_nmi(cpu, start_ip, apicid, diff --git a/arch/x86/realmode/rm/header.S b/arch/x86/realmode/rm/header.S index 8c1db5bf5d78..2eb62be6d256 100644 --- a/arch/x86/realmode/rm/header.S +++ b/arch/x86/realmode/rm/header.S @@ -24,6 +24,7 @@ SYM_DATA_START(real_mode_header) .long pa_sev_es_trampoline_start #endif #ifdef CONFIG_X86_64 + .long pa_trampoline_start64 .long pa_trampoline_pgd; #endif /* ACPI S3 wakeup */ diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/tr= ampoline_64.S index cc8391f86cdb..ae112a91592f 100644 --- a/arch/x86/realmode/rm/trampoline_64.S +++ b/arch/x86/realmode/rm/trampoline_64.S @@ -161,6 +161,19 @@ SYM_CODE_START(startup_32) ljmpl $__KERNEL_CS, $pa_startup_64 SYM_CODE_END(startup_32) =20 +SYM_CODE_START(pa_trampoline_compat) + /* + * In compatibility mode. Prep ESP and DX for startup_32, then disable + * paging and complete the switch to legacy 32-bit mode. + */ + movl $rm_stack_end, %esp + movw $__KERNEL_DS, %dx + + movl $X86_CR0_PE, %eax + movl %eax, %cr0 + ljmpl $__KERNEL32_CS, $pa_startup_32 +SYM_CODE_END(pa_trampoline_compat) + .section ".text64","ax" .code64 .balign 4 @@ -169,6 +182,20 @@ SYM_CODE_START(startup_64) jmpq *tr_start(%rip) SYM_CODE_END(startup_64) =20 +SYM_CODE_START(trampoline_start64) + /* + * APs start here on a direct transfer from 64-bit BIOS with identity + * mapped page tables. Load the kernel's GDT in order to gear down to + * 32-bit mode (to handle 4-level vs. 5-level paging), and to (re)load + * segment registers. Load the zero IDT so any fault triggers a + * shutdown instead of jumping back into BIOS. + */ + lidt tr_idt(%rip) + lgdt tr_gdt64(%rip) + + ljmpl *tr_compat(%rip) +SYM_CODE_END(trampoline_start64) + .section ".rodata","a" # Duplicate the global descriptor table # so the kernel can live anywhere @@ -182,6 +209,17 @@ SYM_DATA_START(tr_gdt) .quad 0x00cf93000000ffff # __KERNEL_DS SYM_DATA_END_LABEL(tr_gdt, SYM_L_LOCAL, tr_gdt_end) =20 +SYM_DATA_START(tr_gdt64) + .short tr_gdt_end - tr_gdt - 1 # gdt limit + .long pa_tr_gdt + .long 0 +SYM_DATA_END(tr_gdt64) + +SYM_DATA_START(tr_compat) + .long pa_trampoline_compat + .short __KERNEL32_CS +SYM_DATA_END(tr_compat) + .bss .balign PAGE_SIZE SYM_DATA(trampoline_pgd, .space PAGE_SIZE) diff --git a/arch/x86/realmode/rm/trampoline_common.S b/arch/x86/realmode/r= m/trampoline_common.S index 5033e640f957..4331c32c47f8 100644 --- a/arch/x86/realmode/rm/trampoline_common.S +++ b/arch/x86/realmode/rm/trampoline_common.S @@ -1,4 +1,14 @@ /* SPDX-License-Identifier: GPL-2.0 */ .section ".rodata","a" .balign 16 -SYM_DATA_LOCAL(tr_idt, .fill 1, 6, 0) + +/* + * When a bootloader hands off to the kernel in 32-bit mode an + * IDT with a 2-byte limit and 4-byte base is needed. When a boot + * loader hands off to a kernel 64-bit mode the base address + * extends to 8-bytes. Reserve enough space for either scenario. + */ +SYM_DATA_START_LOCAL(tr_idt) + .short 0 + .quad 0 +SYM_DATA_END(tr_idt) --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 22935C433F5 for ; Fri, 18 Mar 2022 15:33:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237058AbiCRPef (ORCPT ); Fri, 18 Mar 2022 11:34:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52418 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238146AbiCRPcw (ORCPT ); Fri, 18 Mar 2022 11:32:52 -0400 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7B58B1B8FFE for ; Fri, 18 Mar 2022 08:31:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617468; x=1679153468; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=crdqLilDmDTbuIri6IDcrzQ6xYfWMHL1JhQqJk+iwUI=; b=lyXzKSa8wQyzCh9Vo5wjXqc9apkX7X3iWXsePDdZvTJqI9rdcEDslyGt GFshCRHhPj85bH1vK42BcGuCemCl5WvVhJnssP5D+3Pcl3oyJtD9VymIL hl1LWOecqPzZry/xQ2v9YPO/DpTpL5brB4XozE6yIdesWJOVimFQbSRC0 itCsIJF4P0LsHPILYjNhb7Zj3XbY9CXVF02wx1o9ia70naC3CXGS8H0qV 9UunHTY6sp7gjpMEvZygxM/7uWyrfzkqFBDYfQDLIWHsDIIoXpNhTlNvX JSTAF26ptZbdQ+OgLUYaaKPdj7u/FWkZ0LKiGBqPVf4nfliwKDz03AuwF Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="237767777" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="237767777" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:52 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="558495465" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga008.jf.intel.com with ESMTP; 18 Mar 2022 08:30:45 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id BFDF08A8; Fri, 18 Mar 2022 17:30:50 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, Sean Christopherson , "Rafael J . Wysocki" , "Kirill A . Shutemov" Subject: [PATCHv7 21/30] x86/acpi, x86/boot: Add multiprocessor wake-up support Date: Fri, 18 Mar 2022 18:30:39 +0300 Message-Id: <20220318153048.51177-22-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Kuppuswamy Sathyanarayanan TDX cannot use INIT/SIPI protocol to bring up secondary CPUs because it requires assistance from untrusted VMM. For platforms that do not support SIPI/INIT, ACPI defines a wakeup model (using mailbox) via MADT multiprocessor wakeup structure. More details about it can be found in ACPI specification v6.4, the section titled "Multiprocessor Wakeup Structure". If a platform firmware produces the multiprocessor wakeup structure, then OS may use this new mailbox-based mechanism to wake up the APs. Add ACPI MADT wake structure parsing support for x86 platform and if MADT wake table is present, update apic->wakeup_secondary_cpu_64 with new API which uses MADT wake mailbox to wake-up CPU. Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Reviewed-by: Andi Kleen Reviewed-by: Rafael J. Wysocki Signed-off-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kirill A. Shutemov Reviewed-by: Dave Hansen --- arch/x86/include/asm/apic.h | 5 ++ arch/x86/kernel/acpi/boot.c | 115 ++++++++++++++++++++++++++++++++++++ arch/x86/kernel/apic/apic.c | 10 ++++ 3 files changed, 130 insertions(+) diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index 35006e151774..bd8ae0a7010a 100644 --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -490,6 +490,11 @@ static inline unsigned int read_apic_id(void) return apic->get_apic_id(reg); } =20 +#ifdef CONFIG_X86_64 +typedef int (*wakeup_cpu_handler)(int apicid, unsigned long start_eip); +extern void acpi_wake_cpu_handler_update(wakeup_cpu_handler handler); +#endif + extern int default_apic_id_valid(u32 apicid); extern int default_acpi_madt_oem_check(char *, char *); extern void default_setup_apic_routing(void); diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c index 5b6d1a95776f..0ebfab0f4f7a 100644 --- a/arch/x86/kernel/acpi/boot.c +++ b/arch/x86/kernel/acpi/boot.c @@ -65,6 +65,13 @@ static u64 acpi_lapic_addr __initdata =3D APIC_DEFAULT_P= HYS_BASE; static bool acpi_support_online_capable; #endif =20 +#ifdef CONFIG_X86_64 +/* Physical address of the Multiprocessor Wakeup Structure mailbox */ +static u64 acpi_mp_wake_mailbox_paddr; +/* Virtual address of the Multiprocessor Wakeup Structure mailbox */ +static struct acpi_madt_multiproc_wakeup_mailbox *acpi_mp_wake_mailbox; +#endif + #ifdef CONFIG_X86_IO_APIC /* * Locks related to IOAPIC hotplug @@ -336,6 +343,83 @@ acpi_parse_lapic_nmi(union acpi_subtable_headers * hea= der, const unsigned long e return 0; } =20 +#ifdef CONFIG_X86_64 +/* Virtual address of the Multiprocessor Wakeup Structure mailbox */ +static int acpi_wakeup_cpu(int apicid, unsigned long start_ip) +{ + static physid_mask_t apic_id_wakemap =3D PHYSID_MASK_NONE; + u8 timeout; + + /* + * Remap mailbox memory only for the first call to acpi_wakeup_cpu(). + * + * Wakeup of secondary CPUs is fully serialized in the core code. + * No need to protect acpi_mp_wake_mailbox from concurrent accesses. + */ + if (!acpi_mp_wake_mailbox) { + acpi_mp_wake_mailbox =3D memremap(acpi_mp_wake_mailbox_paddr, + sizeof(*acpi_mp_wake_mailbox), + MEMREMAP_WB); + } + + /* + * According to the ACPI specification r6.4, section titled + * "Multiprocessor Wakeup Structure" the mailbox-based wakeup + * mechanism cannot be used more than once for the same CPU. + * Skip wakeups if they are attempted more than once. + */ + if (physid_isset(apicid, apic_id_wakemap)) { + pr_err("CPU already awake (APIC ID %x), skipping wakeup\n", + apicid); + return -EINVAL; + } + + /* + * Mailbox memory is shared between firmware and OS. Firmware will + * listen on mailbox command address, and once it receives the wakeup + * command, CPU associated with the given apicid will be booted. + * + * The value of apic_id and wakeup_vector has to be set before updating + * the wakeup command. Use memory barrier to preserve write order. + */ + acpi_mp_wake_mailbox->apic_id =3D apicid; + acpi_mp_wake_mailbox->wakeup_vector =3D start_ip; + smp_wmb(); + WRITE_ONCE(acpi_mp_wake_mailbox->command, ACPI_MP_WAKE_COMMAND_WAKEUP); + + /* + * After writing the wakeup command, wait for maximum timeout of 0xFF + * for firmware to reset the command address back zero to indicate + * the successful reception of command. + * NOTE: 0xFF as timeout value is decided based on our experiments. + * + * XXX: Change the timeout once ACPI specification comes up with + * standard maximum timeout value. + */ + timeout =3D 0xFF; + while (READ_ONCE(acpi_mp_wake_mailbox->command) && --timeout) + cpu_relax(); + + /* If timed out (timeout =3D=3D 0), return error */ + if (!timeout) { + /* + * XXX: Is there a recovery path after timeout is hit? + * Spec is unclear. Reset command to 0 if timeout is hit. + */ + acpi_mp_wake_mailbox->command =3D 0; + return -EIO; + } + + /* + * If the CPU wakeup process is successful, store the + * status in apic_id_wakemap to prevent re-wakeup + * requests. + */ + physid_set(apicid, apic_id_wakemap); + + return 0; +} +#endif #endif /*CONFIG_X86_LOCAL_APIC */ =20 #ifdef CONFIG_X86_IO_APIC @@ -1083,6 +1167,29 @@ static int __init acpi_parse_madt_lapic_entries(void) } return 0; } + +#ifdef CONFIG_X86_64 +static int __init acpi_parse_mp_wake(union acpi_subtable_headers *header, + const unsigned long end) +{ + struct acpi_madt_multiproc_wakeup *mp_wake; + + if (!IS_ENABLED(CONFIG_SMP)) + return -ENODEV; + + mp_wake =3D (struct acpi_madt_multiproc_wakeup *)header; + if (BAD_MADT_ENTRY(mp_wake, end)) + return -EINVAL; + + acpi_table_print_madt_entry(&header->common); + + acpi_mp_wake_mailbox_paddr =3D mp_wake->base_address; + + acpi_wake_cpu_handler_update(acpi_wakeup_cpu); + + return 0; +} +#endif /* CONFIG_X86_64 */ #endif /* CONFIG_X86_LOCAL_APIC */ =20 #ifdef CONFIG_X86_IO_APIC @@ -1278,6 +1385,14 @@ static void __init acpi_process_madt(void) =20 smp_found_config =3D 1; } + +#ifdef CONFIG_X86_64 + /* + * Parse MADT MP Wake entry. + */ + acpi_table_parse_madt(ACPI_MADT_TYPE_MULTIPROC_WAKEUP, + acpi_parse_mp_wake, 1); +#endif } if (error =3D=3D -EINVAL) { /* diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index b70344bf6600..3c8f2c797a98 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -2551,6 +2551,16 @@ u32 x86_msi_msg_get_destid(struct msi_msg *msg, bool= extid) } EXPORT_SYMBOL_GPL(x86_msi_msg_get_destid); =20 +#ifdef CONFIG_X86_64 +void __init acpi_wake_cpu_handler_update(wakeup_cpu_handler handler) +{ + struct apic **drv; + + for (drv =3D __apicdrivers; drv < __apicdrivers_end; drv++) + (*drv)->wakeup_secondary_cpu_64 =3D handler; +} +#endif + /* * Override the generic EOI implementation with an optimized version. * Only called during early boot when only one CPU is active and with --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A9637C433EF for ; Fri, 18 Mar 2022 15:32:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238108AbiCRPda (ORCPT ); Fri, 18 Mar 2022 11:33:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49652 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238052AbiCRPcS (ORCPT ); Fri, 18 Mar 2022 11:32:18 -0400 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 31CD21760C8 for ; Fri, 18 Mar 2022 08:30:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617459; x=1679153459; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=aja104HIf6wRILxqTg0MXkoRQclPoguBXAUavtna1c8=; b=E2FOpuVtRqzYo/34LI35AtnzoGEnKmb6ikCjSHRhqZOciVtvQ/pMf3Gz ldCjyJCU3ApaSVfpqEVDgEuUbsIZg1GzTbkXYxL2CZ9GH31oT482Vgp47 S9tFOuL0TLmg2QGF7Zs9ZOO5vJxWX0UFQwmc4iUBFziHNQpVgYqfKdnST jfoEWZGhSY+tu9LlxPqCtaoOgs1j3KQqiw0yIvuW+qQhUy+Q0eJfyPYZ2 PXAC+wS08CNjb+jPgCnaQtvWn0E4lkAoAS/xmspXK1VTPfq/8n/ixZ7B8 dx+QPBQpOVWtP0n/NSXgCiDVKLc+kaa/bNMh89eBHQS56nMaEDnvgVLdw g==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="255982620" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="255982620" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:52 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="513919405" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga002.jf.intel.com with ESMTP; 18 Mar 2022 08:30:45 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id CB7CC8E4; Fri, 18 Mar 2022 17:30:50 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" , Dave Hansen Subject: [PATCHv7 22/30] x86/boot: Set CR0.NE early and keep it set during the boot Date: Fri, 18 Mar 2022 18:30:40 +0300 Message-Id: <20220318153048.51177-23-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org TDX guest requires CR0.NE to be set. Clearing the bit triggers #GP(0). If CR0.NE is 0, the MS-DOS compatibility mode for handling floating-point exceptions is selected. In this mode, the software exception handler for floating-point exceptions is invoked externally using the processor=E2=80= =99s FERR#, INTR, and IGNNE# pins. Using FERR# and IGNNE# to handle floating-point exception is deprecated. CR0.NE=3D0 also limits newer processors to operate with one logical processor active. Kernel uses CR0_STATE constant to initialize CR0. It has NE bit set. But during early boot kernel has more ad-hoc approach to setting bit in the register. During some of this ad-hoc manipulation, CR0.NE is cleared. This causes a #GP in TDX guests and makes it die in early boot. Make CR0 initialization consistent, deriving the initial value of CR0 from CR0_STATE. Since CR0_STATE always has CR0.NE=3D1, this ensures that CR0.NE is never 0 and avoids the #GP. Signed-off-by: Kirill A. Shutemov Reviewed-by: Dave Hansen Reviewed-by: Thomas Gleixner --- arch/x86/boot/compressed/head_64.S | 7 ++++--- arch/x86/realmode/rm/trampoline_64.S | 8 ++++---- 2 files changed, 8 insertions(+), 7 deletions(-) diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/= head_64.S index fd9441f40457..d0c3d33f3542 100644 --- a/arch/x86/boot/compressed/head_64.S +++ b/arch/x86/boot/compressed/head_64.S @@ -289,7 +289,7 @@ SYM_FUNC_START(startup_32) pushl %eax =20 /* Enter paged protected Mode, activating Long Mode */ - movl $(X86_CR0_PG | X86_CR0_PE), %eax /* Enable Paging and Protected mode= */ + movl $CR0_STATE, %eax movl %eax, %cr0 =20 /* Jump from 32bit compatibility mode into 64bit mode. */ @@ -662,8 +662,9 @@ SYM_CODE_START(trampoline_32bit_src) pushl $__KERNEL_CS pushl %eax =20 - /* Enable paging again */ - movl $(X86_CR0_PG | X86_CR0_PE), %eax + /* Enable paging again. */ + movl %cr0, %eax + btsl $X86_CR0_PG_BIT, %eax movl %eax, %cr0 =20 lret diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/tr= ampoline_64.S index ae112a91592f..d380f2d1fd23 100644 --- a/arch/x86/realmode/rm/trampoline_64.S +++ b/arch/x86/realmode/rm/trampoline_64.S @@ -70,7 +70,7 @@ SYM_CODE_START(trampoline_start) movw $__KERNEL_DS, %dx # Data segment descriptor =20 # Enable protected mode - movl $X86_CR0_PE, %eax # protected mode (PE) bit + movl $(CR0_STATE & ~X86_CR0_PG), %eax movl %eax, %cr0 # into protected mode =20 # flush prefetch and jump to startup_32 @@ -148,8 +148,8 @@ SYM_CODE_START(startup_32) movl $MSR_EFER, %ecx wrmsr =20 - # Enable paging and in turn activate Long Mode - movl $(X86_CR0_PG | X86_CR0_WP | X86_CR0_PE), %eax + # Enable paging and in turn activate Long Mode. + movl $CR0_STATE, %eax movl %eax, %cr0 =20 /* @@ -169,7 +169,7 @@ SYM_CODE_START(pa_trampoline_compat) movl $rm_stack_end, %esp movw $__KERNEL_DS, %dx =20 - movl $X86_CR0_PE, %eax + movl $(CR0_STATE & ~X86_CR0_PG), %eax movl %eax, %cr0 ljmpl $__KERNEL32_CS, $pa_startup_32 SYM_CODE_END(pa_trampoline_compat) --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 22DB5C433EF for ; Fri, 18 Mar 2022 15:32:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238177AbiCRPdj (ORCPT ); Fri, 18 Mar 2022 11:33:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49654 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238060AbiCRPcS (ORCPT ); Fri, 18 Mar 2022 11:32:18 -0400 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5D66219058B for ; Fri, 18 Mar 2022 08:30:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617459; x=1679153459; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=MSvS+9NbYSrsQgy6kndTGjPkboouyc+qZrLBTCDZhBM=; b=JM8hnDy6Nl3bnZZ/uUBZ7vEw8C71Ximy5NK/dz3cTzpychyTomu3ZITY 9T9vEGn/7fuvj2OFyoSVVZTBn4XwDKQGAumc/bEHG7No7x3/ie5r4fxUv wW99jAUO91jquuSTPGjF9Lbnq1Dr3Bh5UEdRQAPrVCB8X52rvrpQyufF6 UOC/YmXesGyd4ntdoCA9IV38vfBw49I9uiSmgUwyF/m2/FTe9c4bjCtsn cQlLPEOE5E8U7gsB+/UwXYAjL5pw8FiVDnYDgWeitkAJtBmjDmpzf8K3d hEHY5rAwe6hR6/hDzmzu110kd9IuFjAnjOUbHnII99t80bKuHeHs8NcfC g==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="255982618" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="255982618" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:52 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="581751864" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga001.jf.intel.com with ESMTP; 18 Mar 2022 08:30:45 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id D6A8AA2E; Fri, 18 Mar 2022 17:30:50 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A . Shutemov" Subject: [PATCHv7 23/30] x86/boot: Avoid #VE during boot for TDX platforms Date: Fri, 18 Mar 2022 18:30:41 +0300 Message-Id: <20220318153048.51177-24-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson There are a few MSRs and control register bits that the kernel normally needs to modify during boot. But, TDX disallows modification of these registers to help provide consistent security guarantees. Fortunately, TDX ensures that these are all in the correct state before the kernel loads, which means the kernel does not need to modify them. The conditions to avoid are: * Any writes to the EFER MSR * Clearing CR4.MCE This theoretically makes the guest boot more fragile. If, for instance, EFER was set up incorrectly and a WRMSR was performed, it will trigger early exception panic or a triple fault, if it's before early exceptions are set up. However, this is likely to trip up the guest BIOS long before control reaches the kernel. In any case, these kinds of problems are unlikely to occur in production environments, and developers have good debug tools to fix them quickly. Change the common boot code to work on TDX and non-TDX systems. This should have no functional effect on non-TDX systems. Signed-off-by: Sean Christopherson Reviewed-by: Andi Kleen Reviewed-by: Dan Williams Signed-off-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kirill A. Shutemov Reviewed-by: Thomas Gleixner --- arch/x86/Kconfig | 1 + arch/x86/boot/compressed/head_64.S | 20 ++++++++++++++++++-- arch/x86/boot/compressed/pgtable.h | 2 +- arch/x86/kernel/head_64.S | 28 ++++++++++++++++++++++++++-- arch/x86/realmode/rm/trampoline_64.S | 13 ++++++++++++- 5 files changed, 58 insertions(+), 6 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index d2f45e58e846..98efb35ed7b1 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -886,6 +886,7 @@ config INTEL_TDX_GUEST depends on X86_X2APIC select ARCH_HAS_CC_PLATFORM select DYNAMIC_PHYSICAL_MASK + select X86_MCE help Support running as a guest under Intel TDX. Without this support, the guest kernel can not boot or run under TDX. diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/= head_64.S index d0c3d33f3542..6d903b2fc544 100644 --- a/arch/x86/boot/compressed/head_64.S +++ b/arch/x86/boot/compressed/head_64.S @@ -643,12 +643,28 @@ SYM_CODE_START(trampoline_32bit_src) movl $MSR_EFER, %ecx rdmsr btsl $_EFER_LME, %eax + /* Avoid writing EFER if no change was made (for TDX guest) */ + jc 1f wrmsr - popl %edx +1: popl %edx popl %ecx =20 +#ifdef CONFIG_X86_MCE + /* + * Preserve CR4.MCE if the kernel will enable #MC support. + * Clearing MCE may fault in some environments (that also force #MC + * support). Any machine check that occurs before #MC support is fully + * configured will crash the system regardless of the CR4.MCE value set + * here. + */ + movl %cr4, %eax + andl $X86_CR4_MCE, %eax +#else + movl $0, %eax +#endif + /* Enable PAE and LA57 (if required) paging modes */ - movl $X86_CR4_PAE, %eax + orl $X86_CR4_PAE, %eax testl %edx, %edx jz 1f orl $X86_CR4_LA57, %eax diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/= pgtable.h index 6ff7e81b5628..cc9b2529a086 100644 --- a/arch/x86/boot/compressed/pgtable.h +++ b/arch/x86/boot/compressed/pgtable.h @@ -6,7 +6,7 @@ #define TRAMPOLINE_32BIT_PGTABLE_OFFSET 0 =20 #define TRAMPOLINE_32BIT_CODE_OFFSET PAGE_SIZE -#define TRAMPOLINE_32BIT_CODE_SIZE 0x70 +#define TRAMPOLINE_32BIT_CODE_SIZE 0x80 =20 #define TRAMPOLINE_32BIT_STACK_END TRAMPOLINE_32BIT_SIZE =20 diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 9c63fc5988cd..184b7468ea76 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -140,8 +140,22 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_= GLOBAL) addq $(init_top_pgt - __START_KERNEL_map), %rax 1: =20 +#ifdef CONFIG_X86_MCE + /* + * Preserve CR4.MCE if the kernel will enable #MC support. + * Clearing MCE may fault in some environments (that also force #MC + * support). Any machine check that occurs before #MC support is fully + * configured will crash the system regardless of the CR4.MCE value set + * here. + */ + movq %cr4, %rcx + andl $X86_CR4_MCE, %ecx +#else + movl $0, %ecx +#endif + /* Enable PAE mode, PGE and LA57 */ - movl $(X86_CR4_PAE | X86_CR4_PGE), %ecx + orl $(X86_CR4_PAE | X86_CR4_PGE), %ecx #ifdef CONFIG_X86_5LEVEL testl $1, __pgtable_l5_enabled(%rip) jz 1f @@ -246,13 +260,23 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L= _GLOBAL) /* Setup EFER (Extended Feature Enable Register) */ movl $MSR_EFER, %ecx rdmsr + /* + * Preserve current value of EFER for comparison and to skip + * EFER writes if no change was made (for TDX guest) + */ + movl %eax, %edx btsl $_EFER_SCE, %eax /* Enable System Call */ btl $20,%edi /* No Execute supported? */ jnc 1f btsl $_EFER_NX, %eax btsq $_PAGE_BIT_NX,early_pmd_flags(%rip) -1: wrmsr /* Make changes effective */ =20 + /* Avoid writing EFER if no change was made (for TDX guest) */ +1: cmpl %edx, %eax + je 1f + xor %edx, %edx + wrmsr /* Make changes effective */ +1: /* Setup cr0 */ movl $CR0_STATE, %eax /* Make changes effective */ diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/tr= ampoline_64.S index d380f2d1fd23..e38d61d6562e 100644 --- a/arch/x86/realmode/rm/trampoline_64.S +++ b/arch/x86/realmode/rm/trampoline_64.S @@ -143,11 +143,22 @@ SYM_CODE_START(startup_32) movl %eax, %cr3 =20 # Set up EFER + movl $MSR_EFER, %ecx + rdmsr + /* + * Skip writing to EFER if the register already has desired + * value (to avoid #VE for the TDX guest). + */ + cmp pa_tr_efer, %eax + jne .Lwrite_efer + cmp pa_tr_efer + 4, %edx + je .Ldone_efer +.Lwrite_efer: movl pa_tr_efer, %eax movl pa_tr_efer + 4, %edx - movl $MSR_EFER, %ecx wrmsr =20 +.Ldone_efer: # Enable paging and in turn activate Long Mode. movl $CR0_STATE, %eax movl %eax, %cr0 --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BDB77C433F5 for ; Fri, 18 Mar 2022 15:32:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238068AbiCRPds (ORCPT ); Fri, 18 Mar 2022 11:33:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48802 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238072AbiCRPcT (ORCPT ); Fri, 18 Mar 2022 11:32:19 -0400 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6A550192590 for ; Fri, 18 Mar 2022 08:31:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617460; x=1679153460; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=5jDVsKlXLEWs7hNvGtEGjQQDLrI7Lv8BWQrcDFTtyks=; b=Zi8Vf0eTd7f2jMeDiUannCoOCnZjPSUE3QbH86MFmuTDP5S6rAn4MKwV MvdlSqd3asbQh8h18R+vMhTlaEbdIgc9037vhA84I1Y9Vh9ggJiGFNLE3 Xz02LD+FAbpTdyZb+sLHxkanYUj9wTaRL0IvjfvY03puUF0Oibvo4/Ql/ m4BMmTJvtG5HICBYNahT56Q02wgnMbeSC6YK6QVRawsS97+mkGXxvPSr+ kwS9nS052/1IZ8DhUbYlNhwgBE+9VziXcopcrcSNBIuez39M3DlggKCpH yV+Pr3Q0Tl9pnkqM3VEd+ukWWDHmjEGhoFx96hiuBHYWcFoAaKswWdlEY w==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="255982621" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="255982621" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:52 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="513919409" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga002.jf.intel.com with ESMTP; 18 Mar 2022 08:30:45 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id E1776A7A; Fri, 18 Mar 2022 17:30:50 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A . Shutemov" Subject: [PATCHv7 24/30] x86/topology: Disable CPU online/offline control for TDX guests Date: Fri, 18 Mar 2022 18:30:42 +0300 Message-Id: <20220318153048.51177-25-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Kuppuswamy Sathyanarayanan Unlike regular VMs, TDX guests use the firmware hand-off wakeup method to wake up the APs during the boot process. This wakeup model uses a mailbox to communicate with firmware to bring up the APs. As per the design, this mailbox can only be used once for the given AP, which means after the APs are booted, the same mailbox cannot be used to offline/online the given AP. More details about this requirement can be found in Intel TDX Virtual Firmware Design Guide, sec titled "AP initialization in OS" and in sec titled "Hotplug Device". Since the architecture does not support any method of offlining the CPUs, disable CPU hotplug support in the kernel. Since this hotplug disable feature can be re-used by other VM guests, add a new CC attribute CC_ATTR_HOTPLUG_DISABLED and use it to disable the hotplug support. Attempt to offline CPU will fail with -EOPNOTSUPP. Signed-off-by: Kuppuswamy Sathyanarayanan Reviewed-by: Andi Kleen Reviewed-by: Tony Luck Signed-off-by: Kirill A. Shutemov Reviewed-by: Thomas Gleixner --- arch/x86/coco/core.c | 1 + include/linux/cc_platform.h | 10 ++++++++++ kernel/cpu.c | 7 +++++++ 3 files changed, 18 insertions(+) diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c index df08edc94f9b..70956f9d7c7e 100644 --- a/arch/x86/coco/core.c +++ b/arch/x86/coco/core.c @@ -20,6 +20,7 @@ static bool intel_cc_platform_has(enum cc_attr attr) { switch (attr) { case CC_ATTR_GUEST_UNROLL_STRING_IO: + case CC_ATTR_HOTPLUG_DISABLED: return true; default: return false; diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h index efd8205282da..691494bbaf5a 100644 --- a/include/linux/cc_platform.h +++ b/include/linux/cc_platform.h @@ -72,6 +72,16 @@ enum cc_attr { * Examples include TDX guest & SEV. */ CC_ATTR_GUEST_UNROLL_STRING_IO, + + /** + * @CC_ATTR_HOTPLUG_DISABLED: Hotplug is not supported or disabled. + * + * The platform/OS is running as a guest/virtual machine does not + * support CPU hotplug feature. + * + * Examples include TDX Guest. + */ + CC_ATTR_HOTPLUG_DISABLED, }; =20 #ifdef CONFIG_ARCH_HAS_CC_PLATFORM diff --git a/kernel/cpu.c b/kernel/cpu.c index f39eb0b52dfe..c94f00fa34d3 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -34,6 +34,7 @@ #include #include #include +#include =20 #include #define CREATE_TRACE_POINTS @@ -1185,6 +1186,12 @@ static int __ref _cpu_down(unsigned int cpu, int tas= ks_frozen, =20 static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target) { + /* + * If the platform does not support hotplug, report it explicitly to + * differentiate it from a transient offlining failure. + */ + if (cc_platform_has(CC_ATTR_HOTPLUG_DISABLED)) + return -EOPNOTSUPP; if (cpu_hotplug_disabled) return -EBUSY; return _cpu_down(cpu, 0, target); --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E5972C433EF for ; Fri, 18 Mar 2022 15:32:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238190AbiCRPdn (ORCPT ); Fri, 18 Mar 2022 11:33:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48768 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238065AbiCRPcO (ORCPT ); Fri, 18 Mar 2022 11:32:14 -0400 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4110818FAF9 for ; Fri, 18 Mar 2022 08:30:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617453; x=1679153453; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=62Po8lCg5yXfQsMK7VB6vvco0r9paC38cPMkaim6FVk=; b=Bypt44zoXoVMpcV/0iF1JRzfCCUqnKDD6uleo9S7CZwU6Hhc+TcS7GRW 7sVentfoLKajZu4sZr0pJsITqTUfbgyl0TsNlWK/IgdYO7DmvIkdriHbr OaFIMrjQ60opl+ihnYdlcplL/+qrA5+2RUKmYlF6wdTUmzcSaqJrkAXvu QrVqBdV2kR3lPcIj4XUepzI3icGKbzsufPWFGmEH4xppTUYSNbqiGRzoG Jyxp2PKcquM6qEy6kQyxb3wJhO/dYoagntRzWNHn35fReFSM0G57cI2R3 6esQz71VUKYF0R5o+xQrAwHQc/i9Kb6pYjlH9RRkDMNNdZrTV3xkUxcRy w==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="256883243" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="256883243" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:52 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="647504855" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga004.jf.intel.com with ESMTP; 18 Mar 2022 08:30:45 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id ECC3EAA5; Fri, 18 Mar 2022 17:30:50 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" , Dave Hansen Subject: [PATCHv7 25/30] x86/tdx: Make pages shared in ioremap() Date: Fri, 18 Mar 2022 18:30:43 +0300 Message-Id: <20220318153048.51177-26-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" In TDX guests, guest memory is protected from host access. If a guest performs I/O, it needs to explicitly share the I/O memory with the host. Make all ioremap()ed pages that are not backed by normal memory (IORES_DESC_NONE or IORES_DESC_RESERVED) mapped as shared. The permissions in PAGE_KERNEL_IO already work for "decrypted" memory on AMD SEV/SME systems. That means that they have no need to make a pgprot_decrypted() call. TDX guests, on the other hand, _need_ change to PAGE_KERNEL_IO for "decrypted" mappings. Add a pgprot_decrypted() for TDX. Co-developed-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kuppuswamy Sathyanarayanan Reviewed-by: Andi Kleen Reviewed-by: Tony Luck Signed-off-by: Kirill A. Shutemov Reviewed-by: Dave Hansen Reviewed-by: Thomas Gleixner --- arch/x86/mm/ioremap.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index 026031b3b782..a5d4ec1afca2 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -242,10 +242,15 @@ __ioremap_caller(resource_size_t phys_addr, unsigned = long size, * If the page being mapped is in memory and SEV is active then * make sure the memory encryption attribute is enabled in the * resulting mapping. + * In TDX guests, memory is marked private by default. If encryption + * is not requested (using encrypted), explicitly set decrypt + * attribute in all IOREMAPPED memory. */ prot =3D PAGE_KERNEL_IO; if ((io_desc.flags & IORES_MAP_ENCRYPTED) || encrypted) prot =3D pgprot_encrypted(prot); + else + prot =3D pgprot_decrypted(prot); =20 switch (pcm) { case _PAGE_CACHE_MODE_UC: --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5AF4AC433EF for ; Fri, 18 Mar 2022 15:33:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238162AbiCRPez (ORCPT ); Fri, 18 Mar 2022 11:34:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52416 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238145AbiCRPcw (ORCPT ); Fri, 18 Mar 2022 11:32:52 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C4C1F1B7624 for ; Fri, 18 Mar 2022 08:31:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617467; x=1679153467; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=uNm+fl5+SRYSH6fmORtBED/Ic8hO0OtQFRhFzfeJrJo=; b=ltSfFEd0WwAzXhHTJUZrLCazfg7ATbd6RKzQHSfVcMfIeDoPyzppDLqB FrUlVsy0JunKbl4Yho6LWbr0pWJKZGksSX5TFRdQk5qo7FsfC9AhpyBf9 QR8gv27rmcBVHFE/OxklsGK256lmRzycHJdhut9qhmISFmyhly2AMcJ3H uN2H6qDRBGxesvJ7N0ANddGMgZTI0tYSjJYcCt+Oh4e/DHBLYAiViFLD4 FnYTHtOKagrKLclm8lb0kCbeNXDoAAhFn6xUvvtuy2j2ZvMN2uRCX6cj6 9qfToubOOAl28NDQm4rGJBIfr+2XLrhHG058qM/7yWffqSEj2o3x9Sp+D A==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="257344046" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="257344046" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:52 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="784269859" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga006.fm.intel.com with ESMTP; 18 Mar 2022 08:30:45 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 03224B02; Fri, 18 Mar 2022 17:30:50 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv7 26/30] x86/mm/cpa: Add support for TDX shared memory Date: Fri, 18 Mar 2022 18:30:44 +0300 Message-Id: <20220318153048.51177-27-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Intel TDX protects guest memory from VMM access. Any memory that is required for communication with the VMM must be explicitly shared. It is a two-step process: the guest sets the shared bit in the page table entry and notifies VMM about the change. The notification happens using MapGPA hypercall. Conversion back to private memory requires clearing the shared bit, notifying VMM with MapGPA hypercall following with accepting the memory with AcceptPage hypercall. Provide a TDX version of x86_platform.guest.* callbacks. It makes __set_memory_enc_pgtable() work right in TDX guest. Signed-off-by: Kirill A. Shutemov Reviewed-by: Thomas Gleixner --- arch/x86/coco/core.c | 1 + arch/x86/coco/tdx/tdx.c | 151 ++++++++++++++++++++++++++++++++++++++++ arch/x86/kernel/traps.c | 2 +- 3 files changed, 153 insertions(+), 1 deletion(-) diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c index 70956f9d7c7e..9f74125c582d 100644 --- a/arch/x86/coco/core.c +++ b/arch/x86/coco/core.c @@ -21,6 +21,7 @@ static bool intel_cc_platform_has(enum cc_attr attr) switch (attr) { case CC_ATTR_GUEST_UNROLL_STRING_IO: case CC_ATTR_HOTPLUG_DISABLED: + case CC_ATTR_GUEST_MEM_ENCRYPT: return true; default: return false; diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c index f50f530aff5f..1a8ac2f08bd4 100644 --- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -10,10 +10,15 @@ #include #include #include +#include =20 /* TDX module Call Leaf IDs */ #define TDX_GET_INFO 1 #define TDX_GET_VEINFO 3 +#define TDX_ACCEPT_PAGE 6 + +/* TDX hypercall Leaf IDs */ +#define TDVMCALL_MAP_GPA 0x10001 =20 /* MMIO direction */ #define EPT_READ 0 @@ -531,6 +536,148 @@ bool tdx_handle_virt_exception(struct pt_regs *regs, = struct ve_info *ve) return ret; } =20 +static bool tdx_tlb_flush_required(bool private) +{ + /* + * TDX guest is responsible for flushing TLB on private->shared + * transition. VMM is responsible for flushing on shared->private. + * + * The VMM _can't_ flush private addresses as it can't generate PAs + * with the guest's HKID. Shared memory isn't subject to integrity + * checking, i.e. the VMM doesn't need to flush for its own protection. + * + * There's no need to flush when converting from shared to private, + * as flushing is the VMM's responsibility in this case, e.g. it must + * flush to avoid integrity failures in the face of a buggy or + * malicious guest. + */ + return !private; +} + +static bool tdx_cache_flush_required(void) +{ + /* + * AMD SME/SEV can avoid cache flushing if HW enforces cache coherence. + * TDX doesn't have such capability. + * + * Flush cache unconditionally. + */ + return true; +} + +static bool accept_page(phys_addr_t gpa, enum pg_level pg_level) +{ + u8 page_size; + u64 tdcall_rcx; + + /* + * Pass the page physical address to the TDX module to accept the + * pending, private page. + * + * Bits 2:0 of RCX encode page size: 0 - 4K, 1 - 2M, 2 - 1G. + */ + switch (pg_level) { + case PG_LEVEL_4K: + page_size =3D 0; + break; + case PG_LEVEL_2M: + page_size =3D 1; + break; + case PG_LEVEL_1G: + page_size =3D 2; + break; + default: + return false; + } + + tdcall_rcx =3D gpa | page_size; + return !__tdx_module_call(TDX_ACCEPT_PAGE, tdcall_rcx, 0, 0, 0, NULL); +} + +static inline enum pg_level size_to_level(unsigned long size) +{ + switch (size) { + case PUD_SIZE: + return PG_LEVEL_1G; + case PMD_SIZE: + return PG_LEVEL_2M; + case PAGE_SIZE: + return PG_LEVEL_4K; + default: + BUILD_BUG(); + } +} + +static bool try_accept_one(phys_addr_t *start, unsigned long len, + unsigned long accept_size) +{ + if (!IS_ALIGNED(*start, accept_size)) + return false; + + if (len < accept_size) + return false; + + if (!accept_page(*start, size_to_level(accept_size))) + return false; + + *start +=3D accept_size; + return true; +} + +/* + * Inform the VMM of the guest's intent for this physical page: shared with + * the VMM or private to the guest. The VMM is expected to change its map= ping + * of the page in response. + */ +static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool= enc) +{ + phys_addr_t start =3D __pa(vaddr); + phys_addr_t end =3D __pa(vaddr + numpages * PAGE_SIZE); + + if (!enc) { + /* Set the shared (decrypted) bits: */ + start |=3D cc_mkdec(0); + end |=3D cc_mkdec(0); + } + + /* + * Notify the VMM about page mapping conversion. More info about ABI + * can be found in TDX Guest-Host-Communication Interface (GHCI), + * section "TDG.VP.VMCALL" + */ + if (_tdx_hypercall(TDVMCALL_MAP_GPA, start, end - start, 0, 0)) + return false; + + /* private->shared conversion requires only MapGPA call */ + if (!enc) + return true; + + /* + * For shared->private conversion, accept the page using + * TDX_ACCEPT_PAGE TDX module call. + */ + while (start < end) { + unsigned long len =3D end - start; + + /* + * Try larger accepts first. It gives chance to VMM to keep + * 1G/2M SEPT entries where possible and speeds up process by + * cutting number of hypercalls (if successful). + */ + + if (try_accept_one(&start, len, PUD_SIZE)) + continue; + + if (try_accept_one(&start, len, PMD_SIZE)) + continue; + + if (!try_accept_one(&start, len, PAGE_SIZE)) + return false; + } + + return true; +} + void __init tdx_early_init(void) { u64 cc_mask; @@ -555,5 +702,9 @@ void __init tdx_early_init(void) */ physical_mask &=3D cc_mask - 1; =20 + x86_platform.guest.enc_cache_flush_required =3D tdx_cache_flush_required; + x86_platform.guest.enc_tlb_flush_required =3D tdx_tlb_flush_required; + x86_platform.guest.enc_status_change_finish =3D tdx_enc_status_changed; + pr_info("Guest detected\n"); } diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index a77103e541bf..962c7f531bf9 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -1302,7 +1302,7 @@ static void ve_raise_fault(struct pt_regs *regs, long= error_code) * * In the settings that Linux will run in, virtualization exceptions are * never generated on accesses to normal, TD-private memory that has been - * accepted. + * accepted (by BIOS or with tdx_enc_status_changed()). * * Syscall entry code has a critical window where the kernel stack is not * yet set up. Any exception in this window leads to hard to debug issues --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A27C3C433F5 for ; Fri, 18 Mar 2022 15:33:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238116AbiCRPeW (ORCPT ); Fri, 18 Mar 2022 11:34:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52354 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238154AbiCRPcw (ORCPT ); Fri, 18 Mar 2022 11:32:52 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4F4181C2D98 for ; Fri, 18 Mar 2022 08:31:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617469; x=1679153469; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=H+Ipj73F2jgu/yunUnfKfnvnT5UJ0uuH36Yo4GgpWM8=; b=OqtJSujQa+eahQl9X9PWaDjZ8KHyGHidIRZfPX0V1rs3AJxgrJ7dBbhV rTRG3jlS769G37kv86J0Do/cMwQKrPBLEFukiREf0HB+NovoQev8gibFI YKXhaL0jtTdR/fhBj5Ig1K5FVzxWSmgfbekWCoEbja67MAA1/phsOsiPU Ctuu4WhBF9xO0M16qmOC3OX/DFnm3qmwfZbmuthFK77feJyhKYEk+ix/G Y/XnU5mTPmtSwz71dzYKtAG+xwMAn37zI5KkeYsqozRO64kLThBJ0pSfE Dem4nZ6QOULDA1JhaE5xqddmZmL4muO/1fVwW7yDTzk57VRN/Ktfwwgvr w==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="257344050" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="257344050" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:52 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="715523779" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga005.jf.intel.com with ESMTP; 18 Mar 2022 08:30:45 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 0FF0EB04; Fri, 18 Mar 2022 17:30:51 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv7 27/30] x86/mm: Make DMA memory shared for TD guest Date: Fri, 18 Mar 2022 18:30:45 +0300 Message-Id: <20220318153048.51177-28-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Intel TDX doesn't allow VMM to directly access guest private memory. Any memory that is required for communication with the VMM must be shared explicitly. The same rule applies for any DMA to and from the TDX guest. All DMA pages have to be marked as shared pages. A generic way to achieve this without any changes to device drivers is to use the SWIOTLB framework. In TDX guest, CC_ATTR_GUEST_MEM_ENCRYPT is set. It makes all DMA to be rerouted via SWIOTLB (see pci_swiotlb_detect()). mem_encrypt_init() generalized to cover TDX. It makes SWIOTLB buffer shared. Stop selecting DYNAMIC_PHYSICAL_MASK directly. It will get set indirectly by selcting X86_MEM_ENCRYPT. mem_encrypt_init() is currently under an AMD-specific #ifdef. Move it to a generic area of the header. Co-developed-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kuppuswamy Sathyanarayanan Reviewed-by: Andi Kleen Reviewed-by: Tony Luck Signed-off-by: Kirill A. Shutemov Reviewed-by: Dave Hansen --- arch/x86/Kconfig | 2 +- arch/x86/coco/core.c | 1 + arch/x86/include/asm/mem_encrypt.h | 6 +++--- arch/x86/mm/mem_encrypt.c | 9 ++++++++- 4 files changed, 13 insertions(+), 5 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 98efb35ed7b1..1312cefb927d 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -885,7 +885,7 @@ config INTEL_TDX_GUEST depends on X86_64 && CPU_SUP_INTEL depends on X86_X2APIC select ARCH_HAS_CC_PLATFORM - select DYNAMIC_PHYSICAL_MASK + select X86_MEM_ENCRYPT select X86_MCE help Support running as a guest under Intel TDX. Without this support, diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c index 9f74125c582d..4320fadae716 100644 --- a/arch/x86/coco/core.c +++ b/arch/x86/coco/core.c @@ -22,6 +22,7 @@ static bool intel_cc_platform_has(enum cc_attr attr) case CC_ATTR_GUEST_UNROLL_STRING_IO: case CC_ATTR_HOTPLUG_DISABLED: case CC_ATTR_GUEST_MEM_ENCRYPT: + case CC_ATTR_MEM_ENCRYPT: return true; default: return false; diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_= encrypt.h index e2c6f433ed10..88ceaf3648b3 100644 --- a/arch/x86/include/asm/mem_encrypt.h +++ b/arch/x86/include/asm/mem_encrypt.h @@ -49,9 +49,6 @@ void __init early_set_mem_enc_dec_hypercall(unsigned long= vaddr, int npages, =20 void __init mem_encrypt_free_decrypted_mem(void); =20 -/* Architecture __weak replacement functions */ -void __init mem_encrypt_init(void); - void __init sev_es_init_vc_handling(void); =20 #define __bss_decrypted __section(".bss..decrypted") @@ -89,6 +86,9 @@ static inline void mem_encrypt_free_decrypted_mem(void) {= } =20 #endif /* CONFIG_AMD_MEM_ENCRYPT */ =20 +/* Architecture __weak replacement functions */ +void __init mem_encrypt_init(void); + /* * The __sme_pa() and __sme_pa_nodebug() macros are meant for use when * writing to or comparing values from the cr3 register. Having the diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c index 50d209939c66..10ee40b5204b 100644 --- a/arch/x86/mm/mem_encrypt.c +++ b/arch/x86/mm/mem_encrypt.c @@ -42,7 +42,14 @@ bool force_dma_unencrypted(struct device *dev) =20 static void print_mem_encrypt_feature_info(void) { - pr_info("AMD Memory Encryption Features active:"); + pr_info("Memory Encryption Features active:"); + + if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) { + pr_cont(" Intel TDX\n"); + return; + } + + pr_cont("AMD "); =20 /* Secure Memory Encryption */ if (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT)) { --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 492AEC433EF for ; Fri, 18 Mar 2022 15:33:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235602AbiCRPeR (ORCPT ); Fri, 18 Mar 2022 11:34:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52356 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238149AbiCRPcw (ORCPT ); Fri, 18 Mar 2022 11:32:52 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 537411C2D9A for ; Fri, 18 Mar 2022 08:31:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617469; x=1679153469; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=u1cVO6vC+Je5QgVAf5o9RaTjVFNyO8cnFMef5xqiSLA=; b=DrMTCkEZD+WCyJCVtKhfXlUv3cR+y5ZKwiftQ04b+QSMkU2YyESpQHaB sIWY+qezoHtLZJ2T6o1bQEjl92vSq0FHAmbTR7EPHOAxi5JIHtwzVvmDZ Y/5PDHwpBAzs3CN3FHfbQDAj2UUvOE+3HjWeTbFbzPfORvxJ3eTNHeWgs D7vvxUcNDt2qVt+r6m4sSdQZtOOtVnIwq8LChjkaMimwyU7fiUuDD8xLw V231YyjMbzwxCMIQoPhQr6xIVYDL0PzXacYSAMp/78B+6Q18r6Gd9jWt4 nfA75hKyJO2+oIckp8sCQ9AcSspr4qEEGzr6aNwRhjugtpQNPuyywbxkc w==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="257344063" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="257344063" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:52 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="715523785" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga005.jf.intel.com with ESMTP; 18 Mar 2022 08:30:45 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 1D4F8B16; Fri, 18 Mar 2022 17:30:51 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, Isaku Yamahata , "Kirill A . Shutemov" , Dave Hansen Subject: [PATCHv7 28/30] x86/tdx: ioapic: Add shared bit for IOAPIC base address Date: Fri, 18 Mar 2022 18:30:46 +0300 Message-Id: <20220318153048.51177-29-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata The kernel interacts with each bare-metal IOAPIC with a special MMIO page. When running under KVM, the guest's IOAPICs are emulated by KVM. When running as a TDX guest, the guest needs to mark each IOAPIC mapping as "shared" with the host. This ensures that TDX private protections are not applied to the page, which allows the TDX host emulation to work. ioremap()-created mappings such as virtio will be marked as shared by default. However, the IOAPIC code does not use ioremap() and instead uses the fixmap mechanism. Introduce a special fixmap helper just for the IOAPIC code. Ensure that it marks IOAPIC pages as "shared". This replaces set_fixmap_nocache() with __set_fixmap() since __set_fixmap() allows custom 'prot' values. AMD SEV gets IOAPIC pages shared because FIXMAP_PAGE_NOCACHE has _ENC bit clear. TDX has to set bit to share the page with the host. Signed-off-by: Isaku Yamahata Reviewed-by: Andi Kleen Reviewed-by: Tony Luck Signed-off-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kirill A. Shutemov Reviewed-by: Dave Hansen Reviewed-by: Thomas Gleixner --- arch/x86/kernel/apic/io_apic.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index c1bb384935b0..a868b76cd3d4 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -65,6 +65,7 @@ #include #include #include +#include =20 #define for_each_ioapic(idx) \ for ((idx) =3D 0; (idx) < nr_ioapics; (idx)++) @@ -2677,6 +2678,19 @@ static struct resource * __init ioapic_setup_resourc= es(void) return res; } =20 +static void io_apic_set_fixmap(enum fixed_addresses idx, phys_addr_t phys) +{ + pgprot_t flags =3D FIXMAP_PAGE_NOCACHE; + + /* + * Ensure fixmaps for IOAPIC MMIO respect memory encryption pgprot + * bits, just like normal ioremap(): + */ + flags =3D pgprot_decrypted(flags); + + __set_fixmap(idx, phys, flags); +} + void __init io_apic_init_mappings(void) { unsigned long ioapic_phys, idx =3D FIX_IO_APIC_BASE_0; @@ -2709,7 +2723,7 @@ void __init io_apic_init_mappings(void) __func__, PAGE_SIZE, PAGE_SIZE); ioapic_phys =3D __pa(ioapic_phys); } - set_fixmap_nocache(idx, ioapic_phys); + io_apic_set_fixmap(idx, ioapic_phys); apic_printk(APIC_VERBOSE, "mapped IOAPIC to %08lx (%08lx)\n", __fix_to_virt(idx) + (ioapic_phys & ~PAGE_MASK), ioapic_phys); @@ -2838,7 +2852,7 @@ int mp_register_ioapic(int id, u32 address, u32 gsi_b= ase, ioapics[idx].mp_config.flags =3D MPC_APIC_USABLE; ioapics[idx].mp_config.apicaddr =3D address; =20 - set_fixmap_nocache(FIX_IO_APIC_BASE_0 + idx, address); + io_apic_set_fixmap(FIX_IO_APIC_BASE_0 + idx, address); if (bad_ioapic_register(idx)) { clear_fixmap(FIX_IO_APIC_BASE_0 + idx); return -ENODEV; --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1E21DC433F5 for ; Fri, 18 Mar 2022 15:32:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238115AbiCRPct (ORCPT ); Fri, 18 Mar 2022 11:32:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49318 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238066AbiCRPcO (ORCPT ); Fri, 18 Mar 2022 11:32:14 -0400 Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BFA9F1903C1 for ; Fri, 18 Mar 2022 08:30:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617453; x=1679153453; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=lIZlINAWsXzzBA4yVTehNHWf6CjqMPcGKd53Pxkg/vI=; b=AI0vcYcKXMoD1l02B+BvtyVB/HHaS/lT8ZP4XRq7cZjWWc9Zr+bFvOM4 Uq+rTrD540jQnFFZw1iPWejRgkLwxVGxaTY0GJSKcbLaJ2nMYN+XqGG2/ fcV/VzYmsvJPnMSE57agtcp/N229uXTYqutmbHJ25hadtl1/z/LvpB3wT TQ7TrSKVv44CBKnOx2ABh9xE5HWSRltdTjyeF5zWSfoxR2pY2lUagDFs9 qzfTtyhRgjimGak8bCuR30133EOBb9DhNvnlvYz4OobQrsg90Tp8RhXTU RiH5Jk0Q/U/HTLbcqKXloHly/xtMLOltLT3JGohX7PgID9WTAiBcYDk3h Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="317872610" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="317872610" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:53 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="517262374" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga006.jf.intel.com with ESMTP; 18 Mar 2022 08:30:46 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 2B5E0B23; Fri, 18 Mar 2022 17:30:51 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" , Dave Hansen Subject: [PATCHv7 29/30] ACPICA: Avoid cache flush inside virtual machines Date: Fri, 18 Mar 2022 18:30:47 +0300 Message-Id: <20220318153048.51177-30-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" While running inside virtual machine, the kernel can bypass cache flushing. Changing sleep state in a virtual machine doesn't affect the host system sleep state and cannot lead to data loss. Before entering sleep states, the ACPI code flushes caches to prevent data loss using the WBINVD instruction. This mechanism is required on bare metal. But, any use WBINVD inside of a guest is worthless. Changing sleep state in a virtual machine doesn't affect the host system sleep state and cannot lead to data loss, so most hypervisors simply ignore it. Despite this, the ACPI code calls WBINVD unconditionally anyway. It's useless, but also normally harmless. In TDX guests, though, WBINVD stops being harmless; it triggers a virtualization exception (#VE). If the ACPI cache-flushing WBINVD were left in place, TDX guests would need handling to recover from the exception. Avoid using WBINVD whenever running under a hypervisor. This both removes the useless WBINVDs and saves TDX from implementing WBINVD handling. Signed-off-by: Kirill A. Shutemov Reviewed-by: Dave Hansen Reviewed-by: Dan Williams Reviewed-by: Thomas Gleixner --- arch/x86/include/asm/acenv.h | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/acenv.h b/arch/x86/include/asm/acenv.h index 9aff97f0de7f..d937c55e717e 100644 --- a/arch/x86/include/asm/acenv.h +++ b/arch/x86/include/asm/acenv.h @@ -13,7 +13,19 @@ =20 /* Asm macros */ =20 -#define ACPI_FLUSH_CPU_CACHE() wbinvd() +/* + * ACPI_FLUSH_CPU_CACHE() flushes caches on entering sleep states. + * It is required to prevent data loss. + * + * While running inside virtual machine, the kernel can bypass cache flush= ing. + * Changing sleep state in a virtual machine doesn't affect the host system + * sleep state and cannot lead to data loss. + */ +#define ACPI_FLUSH_CPU_CACHE() \ +do { \ + if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) \ + wbinvd(); \ +} while (0) =20 int __acpi_acquire_global_lock(unsigned int *lock); int __acpi_release_global_lock(unsigned int *lock); --=20 2.34.1 From nobody Mon Jun 22 18:08:59 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 39FF1C433EF for ; Fri, 18 Mar 2022 15:33:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238243AbiCRPe3 (ORCPT ); Fri, 18 Mar 2022 11:34:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52414 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238144AbiCRPcw (ORCPT ); Fri, 18 Mar 2022 11:32:52 -0400 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 011C61B84CB for ; Fri, 18 Mar 2022 08:31:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647617467; x=1679153467; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=jy8hmld/FRVFtw+SoRqVzGZssOns60kyihmirmt7mEs=; b=fOmTcQGiYIMO/Rnkk+JexmJYs0lys0X3uqPFgeCERGFRevXEFZAj/2DL vIAz63GC48hB61WVnBDEOx4n+2ST0jRJLIWRe9ck5VhOpZgDor3/BzVKh 8q2RBjDOQsuqivawPhsd27t5uNnHnoF2HpLibxPxlG+FJArJdxIcLaQfQ GT0am3vPQBtk/9yUiyNsUzp3n2zubo5dK59LNSueeU1nKwx7TJoLkKdNL 5FvrXlqhnP5egFzArg28a+XIClAUkEAyZhir40yup+3MwDyRSrW2nJHsI o4z2aERSQj+T0ZhCLPIK0ordW0IBpW/D6/SnG3TT6wjtLtdOrV0AV0bxw g==; X-IronPort-AV: E=McAfee;i="6200,9189,10290"; a="237767771" X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="237767771" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2022 08:30:52 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,192,1643702400"; d="scan'208";a="599534258" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga008.fm.intel.com with ESMTP; 18 Mar 2022 08:30:46 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 39B3AB42; Fri, 18 Mar 2022 17:30:51 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A . Shutemov" , Dave Hansen Subject: [PATCHv7 30/30] Documentation/x86: Document TDX kernel architecture Date: Fri, 18 Mar 2022 18:30:48 +0300 Message-Id: <20220318153048.51177-31-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> References: <20220318153048.51177-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Kuppuswamy Sathyanarayanan Document the TDX guest architecture details like #VE support, shared memory, etc. Signed-off-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kirill A. Shutemov Reviewed-by: Dave Hansen --- Documentation/x86/index.rst | 1 + Documentation/x86/tdx.rst | 214 ++++++++++++++++++++++++++++++++++++ 2 files changed, 215 insertions(+) create mode 100644 Documentation/x86/tdx.rst diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst index f498f1d36cd3..382e53ca850a 100644 --- a/Documentation/x86/index.rst +++ b/Documentation/x86/index.rst @@ -24,6 +24,7 @@ x86-specific Documentation intel-iommu intel_txt amd-memory-encryption + tdx pti mds microcode diff --git a/Documentation/x86/tdx.rst b/Documentation/x86/tdx.rst new file mode 100644 index 000000000000..8ca60256511b --- /dev/null +++ b/Documentation/x86/tdx.rst @@ -0,0 +1,214 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D +Intel Trust Domain Extensions (TDX) +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +Intel's Trust Domain Extensions (TDX) protect confidential guest VMs from +the host and physical attacks by isolating the guest register state and by +encrypting the guest memory. In TDX, a special TDX module sits between the +host and the guest, and runs in a special mode and manages the guest/host +separation. + +Since the host cannot directly access guest registers or memory, much +normal functionality of a hypervisor must be moved into the guest. This is +implemented using a Virtualization Exception (#VE) that is handled by the +guest kernel. Some #VEs are handled entirely inside the guest kernel, but +some require the hypervisor to be involved. + +TDX includes new hypercall-like mechanisms for communicating from the +guest to the hypervisor or the TDX module. + +New TDX Exceptions +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +TDX guests behave differently from bare-metal and traditional VMX guests. +In TDX guests, otherwise normal instructions or memory accesses can cause +#VE or #GP exceptions. + +Instructions marked with an '*' conditionally cause exceptions. The +details for these instructions are discussed below. + +Instruction-based #VE +--------------------- + +- Port I/O (INS, OUTS, IN, OUT) +- HLT +- MONITOR, MWAIT +- WBINVD, INVD +- VMCALL +- RDMSR*,WRMSR* +- CPUID* + +Instruction-based #GP +--------------------- + +- All VMX instructions: INVEPT, INVVPID, VMCLEAR, VMFUNC, VMLAUNCH, + VMPTRLD, VMPTRST, VMREAD, VMRESUME, VMWRITE, VMXOFF, VMXON +- ENCLS, ENCLU +- GETSEC +- RSM +- ENQCMD +- RDMSR*,WRMSR* + +RDMSR/WRMSR Behavior +-------------------- + +MSR access behavior falls into three categories: + +- #GP generated +- #VE generated +- "Just works" + +In general, the #GP MSRs should not be used in guests. Their use likely +indicates a bug in the guest. The guest may try to handle the #GP with a +hypercall but it is unlikely to succeed. + +The #VE MSRs are typically able to be handled by the hypervisor. Guests +can make a hypercall to the hypervisor to handle the #VE. + +The "just works" MSRs do not need any special guest handling. They might +be implemented by directly passing through the MSR to the hardware or by +trapping and handling in the TDX module. Other than possibly being slow, +these MSRs appear to function just as they would on bare metal. + +CPUID Behavior +-------------- + +For some CPUID leaves and sub-leaves, the virtualized bit fields of CPUID +return values (in guest EAX/EBX/ECX/EDX) are configurable by the +hypervisor. For such cases, the Intel TDX module architecture defines two +virtualization types: + +- Bit fields for which the hypervisor configures the value seen by the + guest TD. + +- Bit fields for which the hypervisor configures the value such that the + guest TD either sees their native value or a value of 0 + +#VE generated for CPUID leaves and sub-leaves that TDX module doesn't know +how to handle. The guest kernel may ask the hypervisor for the value with +a hypercall. + +#VE on Memory Accesses +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +There are essentially two classes of TDX memory: private and shared. +Private memory receives full TDX protections. Its content is protected +against access from the hypervisor. Shared memory is expected to be +shared between guest and hypervisor. + +A TD guest is in control of whether its memory accesses are treated as +private or shared. It selects the behavior with a bit in its page table +entries. This helps ensure that a guest does not place sensitive +information in shared memory, exposing it to the untrusted hypervisor. + +#VE on Shared Memory +-------------------- + +Access to shared mappings can cause a #VE. The hypervisor ultimately +controls whether a shared memory access causes a #VE, so the guest must be +careful to only reference shared pages it can safely handle a #VE. For +instance, the guest should be careful not to access shared memory in the +#VE handler before it reads the #VE info structure (TDG.VP.VEINFO.GET). + +Shared mapping content is entirely controlled by the hypervisor. Shared +mappings must never be used for sensitive memory content like stacks or +kernel text, only for I/O buffers and MMIO regions. A good rule of thumb +is that hypervisor-shared memory should be treated the same as memory +mapped to userspace. Both the hypervisor and userspace are completely +untrusted. + +MMIO for virtual devices is implemented as shared memory. The guest must +be careful not to access device MMIO regions unless it is also prepared to +handle a #VE. + +#VE on Private Pages +-------------------- + +Accesses to private mappings can also cause #VEs. Since all kernel memory +is also private memory, the kernel might theoretically need to handle a +#VE on arbitrary kernel memory accesses. This is not feasible, so TDX +guests ensure that all guest memory has been "accepted" before memory is +used by the kernel. + +A modest amount of memory (typically 512M) is pre-accepted by the firmware +before the kernel runs to ensure that the kernel can start up without +being subjected to #VE's. + +The hypervisor is permitted to unilaterally move accepted pages to a +"blocked" state. However, if it does this, page access will not generate a +#VE. It will, instead, cause a "TD Exit" where the hypervisor is required +to handle the exception. + +Linux #VE handler +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +Just like page faults or #GP's, #VE exceptions can be either handled or be +fatal. Typically, unhandled userspace #VE's result in a SIGSEGV. +Unhandled kernel #VE's result in an oops. + +Handling nested exceptions on x86 is typically nasty business. A #VE +could be interrupted by an NMI which triggers another #VE and hilarity +ensues. TDX #VE's have a novel solution to make it slightly less nasty. + +During #VE handling, the TDX module ensures that all interrupts (including +NMIs) are blocked. The block remains in place until the guest makes a +TDG.VP.VEINFO.GET TDCALL. This allows the guest to choose when interrupts +or new #VE's can be delivered. + +However, the guest kernel must still be careful to avoid potential +#VE-triggering actions (discussed above) while this block is in place. +While the block is in place, #VE's are elevated to double faults (#DF) +which are not recoverable. + +MMIO handling +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +In non-TDX VMs, MMIO is usually implemented by giving a guest access to +a mapping which will cause a VMEXIT on access, and then the hypervisor emu= lates +the access. That is not possible in TDX guests because VMEXIT will expose= the +register state to the host. TDX guests don't trust the host and can't have +their state exposed to the host. + +In TDX, the MMIO regions typically trigger a #VE exception in the guest. +The guest #VE handler then emulates the MMIO instruction inside the guest +and converts it into a controlled TDCALL to the host, rather than exposing +guest state to the host. + +MMIO addresses on x86 are just special physical addresses. They can +theoretically be accessed with any instruction that accesses memory. +However, the kernel instruction decoding method is limited. It is only +designed to decode instructions like those generated by io.h macros. + +MMIO access via other means (like structure overlays) may result in an +oops. + +Shared Memory Conversions +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D + +All TDX guest memory starts out as private at boot. This memory can not +be accessed by the hypervisor. However some kernel users like device +drivers might have a need to share data with the hypervisor. To do this, +memory must be converted between shared and private. This can be +accomplished using some existing memory encryption helpers: + +set_memory_decrypted() converts a range of pages to shared. +set_memory_encrypted() converts memory back to private. + +Device drivers are the primary user of shared memory, but there's no need +to touch every driver. DMA buffers and ioremap()'ed do the conversions +automatically. + +TDX uses SWIOTLB for most DMA allocations. The SWIOTLB buffer is +converted to shared on boot. + +For coherent DMA allocation, the DMA buffer gets converted on the +allocation. Check force_dma_unencrypted() for details. + +References +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +TDX reference material is collected here: + +https://www.intel.com/content/www/us/en/developer/articles/technical/intel= -trust-domain-extensions.html --=20 2.34.1