From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 20045C433F5 for ; Mon, 24 Jan 2022 15:02:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238999AbiAXPCQ (ORCPT ); Mon, 24 Jan 2022 10:02:16 -0500 Received: from mga01.intel.com ([192.55.52.88]:64635 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234477AbiAXPCO (ORCPT ); Mon, 24 Jan 2022 10:02:14 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036534; x=1674572534; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=/A3yuQETcYGHh0zJl4ypTfWdoT0Qd+Amo3JnNUQZ+5U=; b=i1/ziA95PgPylILearWcq3NZW4/v3U0noCpvLeVrOTYBfRS7tPID7Npq gVOtCB6GiZafl4SdUZGPXyA6T4Xo6r2Ng88RKJ5EKHN/LmPd1YfWaZAz9 JRoWoSGw/TusfvnNZ5yrmY4zP63jg7ShnTtHliJYNxKRfi9npDLcsdrcE R9wxmi2wv8+DGId2KDfxM7NCQi/hZvGV0Rksawwa8MNEU15JWhFf7tOAj 1erZOsd3uHLwBEnABkGDO49jS6WQm/qRB7vu06rLgl1Vdy7fIBeQBOBVv ZU90PyFNNDDz/cnT3Bc67OOqrsi1lvEy5yJzT2RB4WPmfKzWCQuIivfNa g==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="270498500" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="270498500" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:12 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="673649998" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga001.fm.intel.com with ESMTP; 24 Jan 2022 07:02:06 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 4E5A115C; Mon, 24 Jan 2022 17:02:19 +0200 (EET) From: "Kirill A. 
Shutemov"
To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
    dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org
Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com,
    ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com,
    hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org,
    jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com,
    sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com,
    vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org,
    linux-kernel@vger.kernel.org, "Kirill A . Shutemov"
Subject: [PATCHv2 01/29] x86/tdx: Detect running as a TDX guest in early boot
Date: Mon, 24 Jan 2022 18:01:47 +0300
Message-Id: <20220124150215.36893-2-kirill.shutemov@linux.intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com>
References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Kuppuswamy Sathyanarayanan

The cc_platform_has() API is used in the kernel to enable confidential
computing features. Since a TDX guest is a confidential computing
platform, it also needs to use this API.

In preparation for extending the cc_platform_has() API to support TDX
guests, use the CPUID instruction to detect support for TDX guests in
the early boot code (via tdx_early_init()). Since copy_bootdata() is
the first user of the cc_platform_has() API, detect the TDX guest
status before it.

Since the cc_platform_has() API will be used frequently across the boot
code, detect the TDX guest status once and cache the result instead of
repeatedly detecting it with the CPUID instruction. Add a function
(is_tdx_guest()) to read the cached TDX guest status in CC APIs.

Define a synthetic feature flag (X86_FEATURE_TDX_GUEST) and set this
bit in a valid TDX guest platform.
This feature bit will be used to do TDX-specific handling in some areas of the ARCH code where a function call to check for TDX guest status is not cost-effective (for example, TDX hypercall support). Signed-off-by: Kuppuswamy Sathyanarayanan Reviewed-by: Andi Kleen Reviewed-by: Tony Luck Signed-off-by: Kirill A. Shutemov --- arch/x86/Kconfig | 12 +++++++++ arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/disabled-features.h | 8 +++++- arch/x86/include/asm/tdx.h | 23 ++++++++++++++++++ arch/x86/kernel/Makefile | 1 + arch/x86/kernel/head64.c | 4 +++ arch/x86/kernel/tdx.c | 31 ++++++++++++++++++++++++ 7 files changed, 79 insertions(+), 1 deletion(-) create mode 100644 arch/x86/include/asm/tdx.h create mode 100644 arch/x86/kernel/tdx.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 6fddb63271d9..09e6744af3f8 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -880,6 +880,18 @@ config ACRN_GUEST IOT with small footprint and real-time features. More details can be found in https://projectacrn.org/. =20 +config INTEL_TDX_GUEST + bool "Intel TDX (Trust Domain Extensions) - Guest Support" + depends on X86_64 && CPU_SUP_INTEL + depends on X86_X2APIC + help + Support running as a guest under Intel TDX. Without this support, + the guest kernel can not boot or run under TDX. + TDX includes memory encryption and integrity capabilities + which protect the confidentiality and integrity of guest + memory contents and CPU state. TDX guests are protected from + potential attacks from the VMM. 
+ endif #HYPERVISOR_GUEST =20 source "arch/x86/Kconfig.cpu" diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpuf= eatures.h index 6db4e2932b3d..defed3bd543b 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -238,6 +238,7 @@ #define X86_FEATURE_VMW_VMMCALL ( 8*32+19) /* "" VMware prefers VMMCALL h= ypercall instruction */ #define X86_FEATURE_PVUNLOCK ( 8*32+20) /* "" PV unlock function */ #define X86_FEATURE_VCPUPREEMPT ( 8*32+21) /* "" PV vcpu_is_preempted fun= ction */ +#define X86_FEATURE_TDX_GUEST ( 8*32+22) /* Intel Trust Domain Extensions= Guest */ =20 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */ #define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, = WRGSBASE instructions*/ diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/as= m/disabled-features.h index 8f28fafa98b3..f556086e6093 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -65,6 +65,12 @@ # define DISABLE_SGX (1 << (X86_FEATURE_SGX & 31)) #endif =20 +#ifdef CONFIG_INTEL_TDX_GUEST +# define DISABLE_TDX_GUEST 0 +#else +# define DISABLE_TDX_GUEST (1 << (X86_FEATURE_TDX_GUEST & 31)) +#endif + /* * Make sure to add features to the correct mask */ @@ -76,7 +82,7 @@ #define DISABLED_MASK5 0 #define DISABLED_MASK6 0 #define DISABLED_MASK7 (DISABLE_PTI) -#define DISABLED_MASK8 0 +#define DISABLED_MASK8 (DISABLE_TDX_GUEST) #define DISABLED_MASK9 (DISABLE_SMAP|DISABLE_SGX) #define DISABLED_MASK10 0 #define DISABLED_MASK11 0 diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h new file mode 100644 index 000000000000..e375a950a033 --- /dev/null +++ b/arch/x86/include/asm/tdx.h @@ -0,0 +1,23 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright (C) 2021-2022 Intel Corporation */ +#ifndef _ASM_X86_TDX_H +#define _ASM_X86_TDX_H + +#include + +#define TDX_CPUID_LEAF_ID 0x21 +#define TDX_IDENT "IntelTDX " + +#ifdef 
CONFIG_INTEL_TDX_GUEST + +void __init tdx_early_init(void); +bool is_tdx_guest(void); + +#else + +static inline void tdx_early_init(void) { }; +static inline bool is_tdx_guest(void) { return false; } + +#endif /* CONFIG_INTEL_TDX_GUEST */ + +#endif /* _ASM_X86_TDX_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 6aef9ee28a39..211d9fcdd729 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -130,6 +130,7 @@ obj-$(CONFIG_PARAVIRT_CLOCK) +=3D pvclock.o obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) +=3D pmem.o =20 obj-$(CONFIG_JAILHOUSE_GUEST) +=3D jailhouse.o +obj-$(CONFIG_INTEL_TDX_GUEST) +=3D tdx.o =20 obj-$(CONFIG_EISA) +=3D eisa.o obj-$(CONFIG_PCSPKR_PLATFORM) +=3D pcspeaker.o diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index de563db9cdcd..1cb6346ec3d1 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -40,6 +40,7 @@ #include #include #include +#include =20 /* * Manage page tables very early on. @@ -516,6 +517,9 @@ asmlinkage __visible void __init x86_64_start_kernel(ch= ar * real_mode_data) =20 copy_bootdata(__va(real_mode_data)); =20 + /* Needed before cc_platform_has() can be used for TDX */ + tdx_early_init(); + /* * Load microcode early on BSP. 
*/ diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c new file mode 100644 index 000000000000..1ef6979a6434 --- /dev/null +++ b/arch/x86/kernel/tdx.c @@ -0,0 +1,31 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2021-2022 Intel Corporation */ + +#undef pr_fmt +#define pr_fmt(fmt) "tdx: " fmt + +#include +#include + +static bool tdx_guest_detected __ro_after_init; + +bool is_tdx_guest(void) +{ + return tdx_guest_detected; +} + +void __init tdx_early_init(void) +{ + u32 eax, sig[3]; + + cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &sig[0], &sig[2], &sig[1]); + + if (memcmp(TDX_IDENT, sig, 12)) + return; + + tdx_guest_detected =3D true; + + setup_force_cpu_cap(X86_FEATURE_TDX_GUEST); + + pr_info("Guest detected\n"); +} --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2D5B1C433FE for ; Mon, 24 Jan 2022 15:02:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243104AbiAXPCm (ORCPT ); Mon, 24 Jan 2022 10:02:42 -0500 Received: from mga02.intel.com ([134.134.136.20]:19100 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239970AbiAXPCX (ORCPT ); Mon, 24 Jan 2022 10:02:23 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036542; x=1674572542; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=/z7OrzV1y617WdUQU00LsxqK+cIPCzyFS6bpxBGqoJM=; b=OHQXKjEG3fR4xObY61Ww/flHGTvL+s3Baj63c0vJjv64ivNgIVX8uyz3 EoztkWmspa9Ndj9Mul/Ymt6pCQtyB19RtlrH0Ozjv6tl3NszOtVZdAEnI QxEexf06G/Irc7MZ4Em5H+0x2gH15ohOjkUm2KRk8FaHHZZnL/UwgipWG Qxab06q1G6aQIguLOMNBEc099D3ANvaTMAoE2lk6BQWo0gNOU2VYrqKSE ZGEGYtHpV9Hl3TPV0eY3Ahficd+XowcSVPNc4wjemlMp+uzxJ323gazI+ 
B3HGSvnl8ziBbpdzxIGEddukx1RV+3xENgtOoRRcVpYfb9vmQEL9312Z2 A==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="233423226" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="233423226" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:12 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="479104629" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga006.jf.intel.com with ESMTP; 24 Jan 2022 07:02:06 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 5BEFA2DD; Mon, 24 Jan 2022 17:02:19 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A . 
Shutemov"
Subject: [PATCHv2 02/29] x86/tdx: Extend the cc_platform_has() API to support TDX guests
Date: Mon, 24 Jan 2022 18:01:48 +0300
Message-Id: <20220124150215.36893-3-kirill.shutemov@linux.intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com>
References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Kuppuswamy Sathyanarayanan

Confidential Computing (CC) features (like string I/O unroll support,
memory encryption/decryption support, etc.) are conditionally enabled
in the kernel using the cc_platform_has() API. Since TDX guests also
need to use these CC features, extend the cc_platform_has() API and add
support for TDX guest-specific CC attributes.

Use the is_tdx_guest() API to detect the TDX guest status and return
TDX-specific CC attributes. To enable use of CC APIs in the TDX guest,
select ARCH_HAS_CC_PLATFORM in the CONFIG_INTEL_TDX_GUEST case.

This is a preparatory patch and just creates the framework for adding
TDX guest-specific CC attributes.

Since the is_tdx_guest() function (through the cc_platform_has() API)
is used in the early boot code, disable the instrumentation flags and
the function tracer. This is similar to AMD SEV and cc_platform.c.

Since the intel_cc_platform_has() function only gets called when
is_tdx_guest() is true (valid CONFIG_INTEL_TDX_GUEST case), remove the
redundant #ifdef in intel_cc_platform_has().

Signed-off-by: Kuppuswamy Sathyanarayanan
Signed-off-by: Kirill A.
Shutemov Reviewed-by: Thomas Gleixner --- arch/x86/Kconfig | 1 + arch/x86/kernel/Makefile | 3 +++ arch/x86/kernel/cc_platform.c | 9 ++++----- 3 files changed, 8 insertions(+), 5 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 09e6744af3f8..1491f25c844e 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -884,6 +884,7 @@ config INTEL_TDX_GUEST bool "Intel TDX (Trust Domain Extensions) - Guest Support" depends on X86_64 && CPU_SUP_INTEL depends on X86_X2APIC + select ARCH_HAS_CC_PLATFORM help Support running as a guest under Intel TDX. Without this support, the guest kernel can not boot or run under TDX. diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 211d9fcdd729..67415037c33c 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -22,6 +22,7 @@ CFLAGS_REMOVE_early_printk.o =3D -pg CFLAGS_REMOVE_head64.o =3D -pg CFLAGS_REMOVE_sev.o =3D -pg CFLAGS_REMOVE_cc_platform.o =3D -pg +CFLAGS_REMOVE_tdx.o =3D -pg endif =20 KASAN_SANITIZE_head$(BITS).o :=3D n @@ -31,6 +32,7 @@ KASAN_SANITIZE_stacktrace.o :=3D n KASAN_SANITIZE_paravirt.o :=3D n KASAN_SANITIZE_sev.o :=3D n KASAN_SANITIZE_cc_platform.o :=3D n +KASAN_SANITIZE_tdx.o :=3D n =20 # With some compiler versions the generated code results in boot hangs, ca= used # by several compilation units. To be safe, disable all instrumentation. 
@@ -50,6 +52,7 @@ KCOV_INSTRUMENT :=3D n =20 CFLAGS_head$(BITS).o +=3D -fno-stack-protector CFLAGS_cc_platform.o +=3D -fno-stack-protector +CFLAGS_tdx.o +=3D -fno-stack-protector =20 CFLAGS_irq.o :=3D -I $(srctree)/$(src)/../include/asm/trace =20 diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c index 6a6ffcd978f6..c72b3919bca9 100644 --- a/arch/x86/kernel/cc_platform.c +++ b/arch/x86/kernel/cc_platform.c @@ -13,14 +13,11 @@ =20 #include #include +#include =20 -static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr) +static bool intel_cc_platform_has(enum cc_attr attr) { -#ifdef CONFIG_INTEL_TDX_GUEST return false; -#else - return false; -#endif } =20 /* @@ -76,6 +73,8 @@ bool cc_platform_has(enum cc_attr attr) { if (sme_me_mask) return amd_cc_platform_has(attr); + else if (is_tdx_guest()) + return intel_cc_platform_has(attr); =20 if (hv_is_isolation_supported()) return hyperv_cc_platform_has(attr); --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C68D6C433EF for ; Mon, 24 Jan 2022 15:02:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240255AbiAXPCe (ORCPT ); Mon, 24 Jan 2022 10:02:34 -0500 Received: from mga17.intel.com ([192.55.52.151]:63896 "EHLO mga17.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239183AbiAXPCV (ORCPT ); Mon, 24 Jan 2022 10:02:21 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036541; x=1674572541; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Be7yIAQHIpAuRjuHid+cUNvWzrmqEPVws+GfaHNYB9Q=; b=FvhkOMUrfpbdg6ChrMYXWZLVmqc+K/dMEcCtHPddqg+GpzJVyhmmjLot 
9wS/aNHJjPL9YEX+52MVYbbrb7zXCTtR3s5zzWQhisqLq0Y9zThdfO7hV GkZ0zniQd86FYZIvjGfUqzys9AMbjIl5C2k8RVP8VB0NEtva0I30HjpS3 mf7OuQPnztr1LFtbVu4D/hoVNzgVZm0jk68U3nfzbbNcEzWyu71mLuSHZ c4SbLugn07J9MzUUxmnj/myxreDtTbFjLtMZDAKH+aEazYLSWngomI7SE FsU5sl9cLbZPm6dAJYp9LfZokhYcoET0rIaJ/MhphY2vhAKrJKLqGgvSW Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="226734623" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="226734623" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:18 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="476743202" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga003.jf.intel.com with ESMTP; 24 Jan 2022 07:02:06 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 6A1873CC; Mon, 24 Jan 2022 17:02:19 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A . 
Shutemov"
Subject: [PATCHv2 03/29] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions
Date: Mon, 24 Jan 2022 18:01:49 +0300
Message-Id: <20220124150215.36893-4-kirill.shutemov@linux.intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com>
References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Kuppuswamy Sathyanarayanan

Guests communicate with VMMs using hypercalls. Historically, these are
implemented using instructions that are known to cause VMEXITs like
VMCALL, VMLAUNCH, etc. However, with TDX, VMEXITs no longer expose the
guest state to the host. This prevents the old hypercall mechanisms
from working. So, to communicate with the VMM, the TDX specification
defines a new instruction called TDCALL.

In a TDX-based VM, since the VMM is an untrusted entity, an
intermediary layer -- the TDX module -- facilitates secure
communication between the host and the guest. The TDX module is loaded
like firmware into a special CPU mode called SEAM. TDX guests
communicate with the TDX module using the TDCALL instruction.

A guest uses TDCALL to communicate with both the TDX module and the
VMM. The value of the RAX register when executing the TDCALL
instruction is used to determine the TDCALL type. The variant of TDCALL
used to communicate with the VMM is called TDVMCALL.

Add generic interfaces to communicate with the TDX module and the VMM
(using the TDCALL instruction).

__tdx_hypercall()   - Used by the guest to request services from
                      the VMM (via TDVMCALL).
__tdx_module_call() - Used to communicate with the TDX module (via
                      TDCALL).

Also define an additional wrapper, _tdx_hypercall(), which adds error
handling for TDCALL failures.
The __tdx_module_call() and __tdx_hypercall() helper functions are implemented in assembly in a .S file. The TDCALL ABI requires shuffling arguments in and out of registers, which proved to be awkward with inline assembly. Just like syscalls, not all TDVMCALL use cases need to use the same number of argument registers. The implementation here picks the current worst-case scenario for TDCALL (4 registers). For TDCALLs with fewer than 4 arguments, there will end up being a few superfluous (cheap) instructions. But, this approach maximizes code reuse. For registers used by the TDCALL instruction, please check TDX GHCI specification, the section titled "TDCALL instruction" and "TDG.VP.VMCALL Interface". Based on previous patch by Sean Christopherson. Reviewed-by: Tony Luck Signed-off-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kirill A. Shutemov --- arch/x86/include/asm/tdx.h | 40 +++++ arch/x86/kernel/Makefile | 2 +- arch/x86/kernel/asm-offsets.c | 20 +++ arch/x86/kernel/tdcall.S | 269 ++++++++++++++++++++++++++++++++++ arch/x86/kernel/tdx.c | 23 +++ 5 files changed, 353 insertions(+), 1 deletion(-) create mode 100644 arch/x86/kernel/tdcall.S diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index e375a950a033..5107a4d9ba8f 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -8,11 +8,51 @@ #define TDX_CPUID_LEAF_ID 0x21 #define TDX_IDENT "IntelTDX " =20 +#define TDX_HYPERCALL_STANDARD 0 + +/* + * Used in __tdx_module_call() to gather the output registers' + * values of the TDCALL instruction when requesting services from + * the TDX module. This is a software only structure and not part + * of the TDX module/VMM ABI + */ +struct tdx_module_output { + u64 rcx; + u64 rdx; + u64 r8; + u64 r9; + u64 r10; + u64 r11; +}; + +/* + * Used in __tdx_hypercall() to gather the output registers' values + * of the TDCALL instruction when requesting services from the VMM. 
+ * This is a software only structure and not part of the TDX + * module/VMM ABI. + */ +struct tdx_hypercall_output { + u64 r10; + u64 r11; + u64 r12; + u64 r13; + u64 r14; + u64 r15; +}; + #ifdef CONFIG_INTEL_TDX_GUEST =20 void __init tdx_early_init(void); bool is_tdx_guest(void); =20 +/* Used to communicate with the TDX module */ +u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9, + struct tdx_module_output *out); + +/* Used to request services from the VMM */ +u64 __tdx_hypercall(u64 type, u64 fn, u64 r12, u64 r13, u64 r14, + u64 r15, struct tdx_hypercall_output *out); + #else =20 static inline void tdx_early_init(void) { }; diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 67415037c33c..ce3e044f7f12 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -133,7 +133,7 @@ obj-$(CONFIG_PARAVIRT_CLOCK) +=3D pvclock.o obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) +=3D pmem.o =20 obj-$(CONFIG_JAILHOUSE_GUEST) +=3D jailhouse.o -obj-$(CONFIG_INTEL_TDX_GUEST) +=3D tdx.o +obj-$(CONFIG_INTEL_TDX_GUEST) +=3D tdcall.o tdx.o =20 obj-$(CONFIG_EISA) +=3D eisa.o obj-$(CONFIG_PCSPKR_PLATFORM) +=3D pcspeaker.o diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c index 9fb0a2f8b62a..8a3c6b34be7d 100644 --- a/arch/x86/kernel/asm-offsets.c +++ b/arch/x86/kernel/asm-offsets.c @@ -18,6 +18,7 @@ #include #include #include +#include =20 #ifdef CONFIG_XEN #include @@ -65,6 +66,25 @@ static void __used common(void) OFFSET(XEN_vcpu_info_arch_cr2, vcpu_info, arch.cr2); #endif =20 +#ifdef CONFIG_INTEL_TDX_GUEST + BLANK(); + /* Offset for fields in tdx_module_output */ + OFFSET(TDX_MODULE_rcx, tdx_module_output, rcx); + OFFSET(TDX_MODULE_rdx, tdx_module_output, rdx); + OFFSET(TDX_MODULE_r8, tdx_module_output, r8); + OFFSET(TDX_MODULE_r9, tdx_module_output, r9); + OFFSET(TDX_MODULE_r10, tdx_module_output, r10); + OFFSET(TDX_MODULE_r11, tdx_module_output, r11); + + /* Offset for fields in tdx_hypercall_output */ + 
OFFSET(TDX_HYPERCALL_r10, tdx_hypercall_output, r10); + OFFSET(TDX_HYPERCALL_r11, tdx_hypercall_output, r11); + OFFSET(TDX_HYPERCALL_r12, tdx_hypercall_output, r12); + OFFSET(TDX_HYPERCALL_r13, tdx_hypercall_output, r13); + OFFSET(TDX_HYPERCALL_r14, tdx_hypercall_output, r14); + OFFSET(TDX_HYPERCALL_r15, tdx_hypercall_output, r15); +#endif + BLANK(); OFFSET(BP_scratch, boot_params, scratch); OFFSET(BP_secure_boot, boot_params, secure_boot); diff --git a/arch/x86/kernel/tdcall.S b/arch/x86/kernel/tdcall.S new file mode 100644 index 000000000000..46a49a96cf6c --- /dev/null +++ b/arch/x86/kernel/tdcall.S @@ -0,0 +1,269 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#include +#include +#include +#include + +#include +#include +#include + +/* + * Bitmasks of exposed registers (with VMM). + */ +#define TDX_R10 BIT(10) +#define TDX_R11 BIT(11) +#define TDX_R12 BIT(12) +#define TDX_R13 BIT(13) +#define TDX_R14 BIT(14) +#define TDX_R15 BIT(15) + +/* Frame offset + 8 (for arg1) */ +#define ARG7_SP_OFFSET (FRAME_OFFSET + 0x08) + +/* + * These registers are clobbered to hold arguments for each + * TDVMCALL. They are safe to expose to the VMM. + * Each bit in this mask represents a register ID. Bit field + * details can be found in TDX GHCI specification, section + * titled "TDCALL [TDG.VP.VMCALL] leaf". + */ +#define TDVMCALL_EXPOSE_REGS_MASK ( TDX_R10 | TDX_R11 | \ + TDX_R12 | TDX_R13 | \ + TDX_R14 | TDX_R15 ) + +/* + * TDX guests use the TDCALL instruction to make requests to the + * TDX module and hypercalls to the VMM. It is supported in + * Binutils >=3D 2.36. + */ +#define tdcall .byte 0x66,0x0f,0x01,0xcc + +/* + * __tdx_module_call() - Used by TDX guests to request services from + * the TDX module (does not include VMM services). + * + * Transforms function call register arguments into the TDCALL + * register ABI. 
After TDCALL operation, TDX module output is saved + * in @out (if it is provided by the user) + * + *------------------------------------------------------------------------- + * TDCALL ABI: + *------------------------------------------------------------------------- + * Input Registers: + * + * RAX - TDCALL Leaf number. + * RCX,RDX,R8-R9 - TDCALL Leaf specific input registers. + * + * Output Registers: + * + * RAX - TDCALL instruction error code. + * RCX,RDX,R8-R11 - TDCALL Leaf specific output registers. + * + *------------------------------------------------------------------------- + * + * __tdx_module_call() function ABI: + * + * @fn (RDI) - TDCALL Leaf ID, moved to RAX + * @rcx (RSI) - Input parameter 1, moved to RCX + * @rdx (RDX) - Input parameter 2, moved to RDX + * @r8 (RCX) - Input parameter 3, moved to R8 + * @r9 (R8) - Input parameter 4, moved to R9 + * + * @out (R9) - struct tdx_module_output pointer + * stored temporarily in R12 (not + * shared with the TDX module). It + * can be NULL. + * + * Return status of TDCALL via RAX. + */ +SYM_FUNC_START(__tdx_module_call) + FRAME_BEGIN + + /* + * R12 will be used as temporary storage for + * struct tdx_module_output pointer. Since R12-R15 + * registers are not used by TDCALL services supported + * by this function, it can be reused. + */ + + /* Callee saved, so preserve it */ + push %r12 + + /* + * Push output pointer to stack. + * After the TDCALL operation, it will be fetched + * into R12 register. 
+ */ + push %r9 + + /* Mangle function call ABI into TDCALL ABI: */ + /* Move TDCALL Leaf ID to RAX */ + mov %rdi, %rax + /* Move input 4 to R9 */ + mov %r8, %r9 + /* Move input 3 to R8 */ + mov %rcx, %r8 + /* Move input 1 to RCX */ + mov %rsi, %rcx + /* Leave input param 2 in RDX */ + + tdcall + + /* + * Fetch output pointer from stack to R12 (It is used + * as temporary storage) + */ + pop %r12 + + /* Check for TDCALL success: 0 - Successful, otherwise failed */ + test %rax, %rax + jnz .Lno_output_struct + + /* + * Since this function can be initiated without an output pointer, + * check if caller provided an output struct before storing + * output registers. + */ + test %r12, %r12 + jz .Lno_output_struct + + /* Copy TDCALL result registers to output struct: */ + movq %rcx, TDX_MODULE_rcx(%r12) + movq %rdx, TDX_MODULE_rdx(%r12) + movq %r8, TDX_MODULE_r8(%r12) + movq %r9, TDX_MODULE_r9(%r12) + movq %r10, TDX_MODULE_r10(%r12) + movq %r11, TDX_MODULE_r11(%r12) + +.Lno_output_struct: + /* Restore the state of R12 register */ + pop %r12 + + FRAME_END + ret +SYM_FUNC_END(__tdx_module_call) + +/* + * __tdx_hypercall() - Make hypercalls to a TDX VMM. + * + * Transforms function call register arguments into the TDCALL + * register ABI. After TDCALL operation, VMM output is saved in @out. + * + *------------------------------------------------------------------------- + * TD VMCALL ABI: + *------------------------------------------------------------------------- + * + * Input Registers: + * + * RAX - TDCALL instruction leaf number (0 - TDG.VP.VMCALL) + * RCX - BITMAP which controls which part of TD Guest GPR + * is passed as-is to the VMM and back. + * R10 - Set 0 to indicate TDCALL follows standard TDX ABI + * specification. Non zero value indicates vendor + * specific ABI. + * R11 - VMCALL sub function number + * RBX, RBP, RDI, RSI - Used to pass VMCALL sub function specific argumen= ts. + * R8-R9, R12-R15 - Same as above. 
+ * + * Output Registers: + * + * RAX - TDCALL instruction status (Not related to hyperca= ll + * output). + * R10 - Hypercall output error code. + * R11-R15 - Hypercall sub function specific output values. + * + *------------------------------------------------------------------------- + * + * __tdx_hypercall() function ABI: + * + * @type (RDI) - TD VMCALL type, moved to R10 + * @fn (RSI) - TD VMCALL sub function, moved to R11 + * @r12 (RDX) - Input parameter 1, moved to R12 + * @r13 (RCX) - Input parameter 2, moved to R13 + * @r14 (R8) - Input parameter 3, moved to R14 + * @r15 (R9) - Input parameter 4, moved to R15 + * + * @out (stack) - struct tdx_hypercall_output pointer (cannot be NU= LL) + * + * On successful completion, return TDCALL status or -EINVAL for invalid + * inputs. + */ +SYM_FUNC_START(__tdx_hypercall) + FRAME_BEGIN + + /* Move argument 7 from caller stack to RAX */ + movq ARG7_SP_OFFSET(%rsp), %rax + + /* Check if caller provided an output struct */ + test %rax, %rax + /* If out pointer is NULL, return -EINVAL */ + jz .Lret_err + + /* Save callee-saved GPRs as mandated by the x86_64 ABI */ + push %r15 + push %r14 + push %r13 + push %r12 + + /* + * Save output pointer (rax) on the stack, it will be used again + * when storing the output registers after the TDCALL operation. 
+ */ + push %rax + + /* Mangle function call ABI into TDCALL ABI: */ + /* Set TDCALL leaf ID (TDVMCALL (0)) in RAX */ + xor %eax, %eax + /* Move TDVMCALL type (standard vs vendor) in R10 */ + mov %rdi, %r10 + /* Move TDVMCALL sub function id to R11 */ + mov %rsi, %r11 + /* Move input 1 to R12 */ + mov %rdx, %r12 + /* Move input 2 to R13 */ + mov %rcx, %r13 + /* Move input 3 to R14 */ + mov %r8, %r14 + /* Move input 4 to R15 */ + mov %r9, %r15 + + movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx + + tdcall + + /* Restore output pointer to R9 */ + pop %r9 + + /* Copy hypercall result registers to output struct: */ + movq %r10, TDX_HYPERCALL_r10(%r9) + movq %r11, TDX_HYPERCALL_r11(%r9) + movq %r12, TDX_HYPERCALL_r12(%r9) + movq %r13, TDX_HYPERCALL_r13(%r9) + movq %r14, TDX_HYPERCALL_r14(%r9) + movq %r15, TDX_HYPERCALL_r15(%r9) + + /* + * Zero out registers exposed to the VMM to avoid + * speculative execution with VMM-controlled values. + * This needs to include all registers present in + * TDVMCALL_EXPOSE_REGS_MASK (except R12-R15). + * R12-R15 context will be restored. + */ + xor %r10d, %r10d + xor %r11d, %r11d + + /* Restore callee-saved GPRs as mandated by the x86_64 ABI */ + pop %r12 + pop %r13 + pop %r14 + pop %r15 + + jmp .Lhcall_done +.Lret_err: + movq $-EINVAL, %rax +.Lhcall_done: + FRAME_END + + retq +SYM_FUNC_END(__tdx_hypercall) diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c index 1ef6979a6434..d40b6df51e26 100644 --- a/arch/x86/kernel/tdx.c +++ b/arch/x86/kernel/tdx.c @@ -9,6 +9,29 @@ =20 static bool tdx_guest_detected __ro_after_init; =20 +/* + * Wrapper for standard use of __tdx_hypercall with panic report + * for TDCALL error. 
+ */ +static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14, + u64 r15, struct tdx_hypercall_output *out) +{ + struct tdx_hypercall_output dummy_out; + u64 err; + + /* __tdx_hypercall() does not accept NULL output pointer */ + if (!out) + out =3D &dummy_out; + + /* Non zero return value indicates buggy TDX module, so panic */ + err =3D __tdx_hypercall(TDX_HYPERCALL_STANDARD, fn, r12, r13, r14, + r15, out); + if (err) + panic("Hypercall fn %llu failed (Buggy TDX module!)\n", fn); + + return out->r10; +} + bool is_tdx_guest(void) { return tdx_guest_detected; --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 37803C433EF for ; Mon, 24 Jan 2022 15:02:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238921AbiAXPCP (ORCPT ); Mon, 24 Jan 2022 10:02:15 -0500 Received: from mga01.intel.com ([192.55.52.88]:64635 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238258AbiAXPCO (ORCPT ); Mon, 24 Jan 2022 10:02:14 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036533; x=1674572533; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=YfNJ+Z5jmbyRfVApmGRzvisUXCbBpskwPlYNL4Rp32I=; b=iDfVQSGeR40frPkkWaDvpLIVa4Ro1e0uwReQ490czglyovf4/e9cSimN 0pO9fYg9wrEZwZ4aH7y/bqFuBL1sAx9/pte8IUbYJ5hqqMNaZBt6YDFv8 Oi2Hbch+nPpLvdp/b6/y5eMPNLRmCYMbvC8Y7suYGlQ/6TdCh/wHBqdRK s0HCD5J5/FbU5gw63lLKSDb0/4rn+BWBYvviOp3CAHX+0QuMNA4R9SnQ4 WKYmTj+XhYNFiVhjslUKMFbmxN3xMsfkZfgJBbGt5HScXQif2qBKXLun5 GZgXrfdr2behni0oykr5KT8AD3823WA72fMG0uoeMIfXcllmVnMxGn62y g==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="270498501" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; 
d="scan'208";a="270498501" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:12 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="580395540" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga008.fm.intel.com with ESMTP; 24 Jan 2022 07:02:06 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 78AC449F; Mon, 24 Jan 2022 17:02:19 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. 
Shutemov" , Sean Christopherson Subject: [PATCHv2 04/29] x86/traps: Add #VE support for TDX guest Date: Mon, 24 Jan 2022 18:01:50 +0300 Message-Id: <20220124150215.36893-5-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Virtualization Exceptions (#VE) are delivered to TDX guests due to specific guest actions which may happen in either user space or the kernel: * Specific instructions (WBINVD, for example) * Specific MSR accesses * Specific CPUID leaf accesses * Access to unmapped pages (EPT violation) In the settings that Linux will run in, virtualization exceptions are never generated on accesses to normal, TD-private memory that has been accepted. Syscall entry code has a critical window where the kernel stack is not yet set up. Any exception in this window leads to hard to debug issues and can be exploited for privilege escalation. Exceptions in the NMI entry code also cause issues. Returning from the exception handler with IRET will re-enable NMIs and nested NMI will corrupt the NMI stack. For these reasons, the kernel avoids #VEs during the syscall gap and the NMI entry code. Entry code paths do not access TD-shared memory, MMIO regions, use #VE triggering MSRs, instructions, or CPUID leaves that might generate #VE. VMM can remove memory from TD at any point, but access to unaccepted (or missing) private memory leads to VM termination, not to #VE. Similarly to page faults and breakpoints, #VEs are allowed in NMI handlers once the kernel is ready to deal with nested NMIs. During #VE delivery, all interrupts, including NMIs, are blocked until TDGETVEINFO is called. It prevents #VE nesting until the kernel reads the VE info. 
If a guest kernel action which would normally cause a #VE occurs in
the interrupt-disabled region before TDGETVEINFO, a #DF (double fault
exception) is delivered to the guest which will result in an oops.

Add basic infrastructure to handle any #VE which occurs in the kernel
or userspace. Later patches will add handling for specific #VE
scenarios.

For now, convert unhandled #VEs (everything, until later in this
series) so that they appear just like a #GP by calling
ve_raise_fault() directly. The ve_raise_fault() function is similar
to the #GP handler: it sends SIGSEGV to userspace, invokes the die
path for kernel faults, and notifies debuggers and other die chain
users.

Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Co-developed-by: Kuppuswamy Sathyanarayanan
Signed-off-by: Kuppuswamy Sathyanarayanan
Reviewed-by: Andi Kleen
Reviewed-by: Tony Luck
Signed-off-by: Kirill A. Shutemov
---
 arch/x86/include/asm/idtentry.h |   4 ++
 arch/x86/include/asm/tdx.h      |  21 ++++++
 arch/x86/kernel/idt.c           |   3 +
 arch/x86/kernel/tdx.c           |  63 ++++++++++++++++++
 arch/x86/kernel/traps.c         | 110 ++++++++++++++++++++++++++++++++
 5 files changed, 201 insertions(+)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 1345088e9902..8ccc81d653b3 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -625,6 +625,10 @@ DECLARE_IDTENTRY_XENCB(X86_TRAP_OTHER, exc_xen_hypervisor_callback);
 DECLARE_IDTENTRY_RAW(X86_TRAP_OTHER, exc_xen_unknown_trap);
 #endif
 
+#ifdef CONFIG_INTEL_TDX_GUEST
+DECLARE_IDTENTRY(X86_TRAP_VE, exc_virtualization_exception);
+#endif
+
 /* Device interrupts common/spurious */
 DECLARE_IDTENTRY_IRQ(X86_TRAP_OTHER, common_interrupt);
 #ifdef CONFIG_X86_LOCAL_APIC
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 5107a4d9ba8f..d17143290f0a 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -4,6 +4,7 @@
 #define _ASM_X86_TDX_H
 
 #include
+#include
 
 #define TDX_CPUID_LEAF_ID	0x21
 #define TDX_IDENT		"IntelTDX    "
@@ -40,6 +41,22 @@ struct tdx_hypercall_output {
 	u64 r15;
 };
 
+/*
+ * Used by the #VE exception handler to gather the #VE exception
+ * info from the TDX module. This is a software only structure
+ * and not part of the TDX module/VMM ABI.
+ */
+struct ve_info {
+	u64 exit_reason;
+	u64 exit_qual;
+	/* Guest Linear (virtual) Address */
+	u64 gla;
+	/* Guest Physical Address */
+	u64 gpa;
+	u32 instr_len;
+	u32 instr_info;
+};
+
 #ifdef CONFIG_INTEL_TDX_GUEST
 
 void __init tdx_early_init(void);
@@ -53,6 +70,10 @@ u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
 u64 __tdx_hypercall(u64 type, u64 fn, u64 r12, u64 r13, u64 r14,
 		    u64 r15, struct tdx_hypercall_output *out);
 
+bool tdx_get_ve_info(struct ve_info *ve);
+
+bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve);
+
 #else
 
 static inline void tdx_early_init(void) { };
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index df0fa695bb09..1da074123c16 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -68,6 +68,9 @@ static const __initconst struct idt_data early_idts[] = {
 	 */
 	INTG(X86_TRAP_PF,		asm_exc_page_fault),
 #endif
+#ifdef CONFIG_INTEL_TDX_GUEST
+	INTG(X86_TRAP_VE,		asm_exc_virtualization_exception),
+#endif
 };
 
 /*
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index d40b6df51e26..5a5b25f9c4d3 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -7,6 +7,9 @@
 #include
 #include
 
+/* TDX module Call Leaf IDs */
+#define TDX_GET_VEINFO			3
+
 static bool tdx_guest_detected __ro_after_init;
 
 /*
@@ -32,6 +35,66 @@ static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14,
 	return out->r10;
 }
 
+bool tdx_get_ve_info(struct ve_info *ve)
+{
+	struct tdx_module_output out;
+
+	/*
+	 * NMIs and machine checks are suppressed. Before this point any
+	 * #VE is fatal. After this point (TDGETVEINFO call), NMIs and
+	 * additional #VEs are permitted (but they are not expected to
+	 * happen unless the kernel panics).
+	 */
+	if (__tdx_module_call(TDX_GET_VEINFO, 0, 0, 0, 0, &out))
+		return false;
+
+	ve->exit_reason = out.rcx;
+	ve->exit_qual   = out.rdx;
+	ve->gla         = out.r8;
+	ve->gpa         = out.r9;
+	ve->instr_len   = lower_32_bits(out.r10);
+	ve->instr_info  = upper_32_bits(out.r10);
+
+	return true;
+}
+
+/*
+ * Handle the user initiated #VE.
+ *
+ * For example, executing the CPUID instruction from user space
+ * is a valid case and hence the resulting #VE has to be handled.
+ *
+ * For disallowed or invalid #VEs, just return failure.
+ */
+static bool tdx_virt_exception_user(struct pt_regs *regs, struct ve_info *ve)
+{
+	pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
+	return false;
+}
+
+/* Handle the kernel #VE */
+static bool tdx_virt_exception_kernel(struct pt_regs *regs, struct ve_info *ve)
+{
+	pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
+	return false;
+}
+
+bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve)
+{
+	bool ret;
+
+	if (user_mode(regs))
+		ret = tdx_virt_exception_user(regs, ve);
+	else
+		ret = tdx_virt_exception_kernel(regs, ve);
+
+	/* After successful #VE handling, move the IP */
+	if (ret)
+		regs->ip += ve->instr_len;
+
+	return ret;
+}
+
 bool is_tdx_guest(void)
 {
 	return tdx_guest_detected;
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c9d566dcf89a..428504535912 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -61,6 +61,7 @@
 #include
 #include
 #include
+#include
 
 #ifdef CONFIG_X86_64
 #include
@@ -1212,6 +1213,115 @@ DEFINE_IDTENTRY(exc_device_not_available)
 	}
 }
 
+#ifdef CONFIG_INTEL_TDX_GUEST
+
+#define VE_FAULT_STR "VE fault"
+
+static void ve_raise_fault(struct pt_regs *regs, long error_code)
+{
+	struct task_struct *tsk = current;
+
+	if (user_mode(regs)) {
+		tsk->thread.error_code = error_code;
+		tsk->thread.trap_nr = X86_TRAP_VE;
+		show_signal(tsk, SIGSEGV, "", VE_FAULT_STR, regs, error_code);
+		force_sig(SIGSEGV);
+		return;
+	}
+
+	/*
+	 * Attempt to recover from #VE exception failure without
+	 * triggering OOPS (useful for MSR read/write failures)
+	 */
+	if (fixup_exception(regs, X86_TRAP_VE, error_code, 0))
+		return;
+
+	tsk->thread.error_code = error_code;
+	tsk->thread.trap_nr = X86_TRAP_VE;
+
+	/*
+	 * To be potentially processing a kprobe fault and to trust the result
+	 * from kprobe_running(), the context must be non-preemptible.
+	 */
+	if (!preemptible() && kprobe_running() &&
+	    kprobe_fault_handler(regs, X86_TRAP_VE))
+		return;
+
+	/* Notify about #VE handling failure, useful for debugger hooks */
+	if (notify_die(DIE_GPF, VE_FAULT_STR, regs, error_code,
+		       X86_TRAP_VE, SIGSEGV) == NOTIFY_STOP)
+		return;
+
+	/* Trigger OOPS and panic */
+	die_addr(VE_FAULT_STR, regs, error_code, 0);
+}
+
+/*
+ * Virtualization Exceptions (#VE) are delivered to TDX guests due to
+ * specific guest actions which may happen in either user space or the
+ * kernel:
+ *
+ *  * Specific instructions (WBINVD, for example)
+ *  * Specific MSR accesses
+ *  * Specific CPUID leaf accesses
+ *  * Access to unmapped pages (EPT violation)
+ *
+ * In the settings that Linux will run in, virtualization exceptions are
+ * never generated on accesses to normal, TD-private memory that has been
+ * accepted.
+ *
+ * Syscall entry code has a critical window where the kernel stack is not
+ * yet set up. Any exception in this window leads to hard-to-debug issues
+ * and can be exploited for privilege escalation. Exceptions in the NMI
+ * entry code also cause issues. Returning from the exception handler with
+ * IRET will re-enable NMIs and a nested NMI will corrupt the NMI stack.
+ *
+ * For these reasons, the kernel avoids #VEs during the syscall gap and
+ * the NMI entry code. Entry code paths do not access TD-shared memory or
+ * MMIO regions, and do not use MSRs, instructions, or CPUID leaves that
+ * might generate #VE. The VMM can remove memory from a TD at any point,
+ * but access to unaccepted (or missing) private memory leads to VM
+ * termination, not to #VE.
+ *
+ * Similarly to page faults and breakpoints, #VEs are allowed in NMI
+ * handlers once the kernel is ready to deal with nested NMIs.
+ *
+ * During #VE delivery, all interrupts, including NMIs, are blocked until
+ * TDGETVEINFO is called. It prevents #VE nesting until the kernel reads
+ * the VE info.
+ *
+ * If a guest kernel action which would normally cause a #VE occurs in
+ * the interrupt-disabled region before TDGETVEINFO, a #DF (double fault
+ * exception) is delivered to the guest which will result in an oops.
+ */
+DEFINE_IDTENTRY(exc_virtualization_exception)
+{
+	struct ve_info ve;
+	bool ret;
+
+	/*
+	 * NMIs/Machine-checks/Interrupts will be in a disabled state
+	 * till TDGETVEINFO TDCALL is executed. This ensures that VE
+	 * info cannot be overwritten by a nested #VE.
+	 */
+	ret = tdx_get_ve_info(&ve);
+
+	cond_local_irq_enable(regs);
+
+	if (ret)
+		ret = tdx_handle_virt_exception(regs, &ve);
+	/*
+	 * If tdx_handle_virt_exception() could not process
+	 * it successfully, treat it as #GP(0) and handle it.
+ */ + if (!ret) + ve_raise_fault(regs, 0); + + cond_local_irq_disable(regs); +} + +#endif + #ifdef CONFIG_X86_32 DEFINE_IDTENTRY_SW(iret_error) { --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 68515C433EF for ; Mon, 24 Jan 2022 15:02:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240399AbiAXPCj (ORCPT ); Mon, 24 Jan 2022 10:02:39 -0500 Received: from mga17.intel.com ([192.55.52.151]:63899 "EHLO mga17.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239874AbiAXPCV (ORCPT ); Mon, 24 Jan 2022 10:02:21 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036541; x=1674572541; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=bFXr12xG4kfNh0tU9QaIEKqimwMPvI73xrWlRH6vZxE=; b=jNIZrJDXNO6dDE5mboLTYcX28wbWvtAEEpY4HRqtgjhnQ5XcNflKAF53 N3oYlfUEBaROWR1cUZzHMJSnGwyAJ0Nuw27b7zFVUxoNormLE+4VFwjle Ko1AlCe6fQPL+iIhJ0sSmj0LQJWmSLsXrnUW63jKA3b5ACqRW3CpzM4wc J7f0F6UoA3Ej0CZ4LVC+j9xereWH+sz1KnpWrbK2NhxkpS3nF70CH8GV4 kJp5EDx3s6kT/jqbIw/miIy5upFfi5iHQH/Up5DqNhX4MnR5pNrY/BkRE 73RY30XHK+Xd5kqHltlsSOI6fnv/nsr2iy5I1Njd/IOn6xeILxiLHucA8 Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="226734654" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="226734654" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:20 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="562680159" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga001.jf.intel.com with ESMTP; 24 Jan 2022 07:02:13 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 
86E724BB; Mon, 24 Jan 2022 17:02:19 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv2 05/29] x86/tdx: Add HLT support for TDX guests Date: Mon, 24 Jan 2022 18:01:51 +0300 Message-Id: <20220124150215.36893-6-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" The HLT instruction is a privileged instruction, executing it stops instruction execution and places the processor in a HALT state. It is used in kernel for cases like reboot, idle loop and exception fixup handlers. For the idle case, interrupts will be enabled (using STI) before the HLT instruction (this is also called safe_halt()). To support the HLT instruction in TDX guests, it needs to be emulated using TDVMCALL (hypercall to VMM). More details about it can be found in Intel Trust Domain Extensions (Intel TDX) Guest-Host-Communication Interface (GHCI) specification, section TDVMCALL[Instruction.HLT]. In TDX guests, executing HLT instruction will generate a #VE, which is used to emulate the HLT instruction. 
But #VE based emulation will not work for the safe_halt() flavor,
because it requires the STI instruction to be executed just before the
TDCALL. Since the idle loop is the only user of the safe_halt()
variant, handle it as a special case.

To avoid the *safe_halt() call in the idle function, define
tdx_safe_halt() and use it to override the "x86_idle" function pointer
for a valid TDX guest.

Alternative choices like PV ops have been considered for adding
safe_halt() support. They were rejected because HLT paravirt calls
only exist under PARAVIRT_XXL, and enabling it in a TDX guest just for
the safe_halt() use case is not worth the cost.

Co-developed-by: Kuppuswamy Sathyanarayanan
Signed-off-by: Kuppuswamy Sathyanarayanan
Reviewed-by: Andi Kleen
Reviewed-by: Tony Luck
Signed-off-by: Kirill A. Shutemov
---
 arch/x86/include/asm/tdx.h |  3 ++
 arch/x86/kernel/process.c  |  5 +++
 arch/x86/kernel/tdcall.S   | 31 +++++++++++++++++
 arch/x86/kernel/tdx.c      | 70 ++++++++++++++++++++++++++++++++++++--
 4 files changed, 107 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index d17143290f0a..9b4714a45bb9 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -74,10 +74,13 @@ bool tdx_get_ve_info(struct ve_info *ve);
 
 bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve);
 
+void tdx_safe_halt(void);
+
 #else
 
 static inline void tdx_early_init(void) { };
 static inline bool is_tdx_guest(void) { return false; }
+static inline void tdx_safe_halt(void) { };
 
 #endif /* CONFIG_INTEL_TDX_GUEST */
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 81d8ef036637..d48afc69ebfa 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -46,6 +46,7 @@
 #include
 #include
 #include
+#include
 
 #include "process.h"
 
@@ -870,6 +871,10 @@ void select_idle_routine(const struct cpuinfo_x86 *c)
 	} else if (prefer_mwait_c1_over_halt(c)) {
 		pr_info("using mwait in idle threads\n");
 		x86_idle = mwait_idle;
+	} else if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
+		pr_info("using TDX aware idle routine\n");
+		x86_idle = tdx_safe_halt;
+		return;
 	} else
 		x86_idle = default_idle;
 }
diff --git a/arch/x86/kernel/tdcall.S b/arch/x86/kernel/tdcall.S
index 46a49a96cf6c..ae74da33ccc6 100644
--- a/arch/x86/kernel/tdcall.S
+++ b/arch/x86/kernel/tdcall.S
@@ -3,6 +3,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -39,6 +40,12 @@
  */
 #define tdcall .byte 0x66,0x0f,0x01,0xcc
 
+/*
+ * Used in __tdx_hypercall() to determine whether to enable interrupts
+ * before issuing TDCALL for the EXIT_REASON_HLT case.
+ */
+#define ENABLE_IRQS_BEFORE_HLT 0x01
+
 /*
  * __tdx_module_call()  - Used by TDX guests to request services from
  * the TDX module (does not include VMM services).
@@ -230,6 +237,30 @@ SYM_FUNC_START(__tdx_hypercall)
 
 	movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
 
+	/*
+	 * For the idle loop STI needs to be called directly before
+	 * the TDCALL that enters idle (EXIT_REASON_HLT case). STI
+	 * instruction enables interrupts only one instruction later.
+	 * If there is a window between STI and the instruction that
+	 * emulates the HALT state, there is a chance for interrupts to
+	 * happen in this window, which can delay the HLT operation
+	 * indefinitely. Since this is not the desired result,
+	 * conditionally call STI before TDCALL.
+	 *
+	 * Since STI instruction is only required for the idle case
+	 * (a special case of EXIT_REASON_HLT), use the R15 register
+	 * value to identify it. Since the R15 register is not used
+	 * by the VMM as per EXIT_REASON_HLT ABI, re-use it in
+	 * software to identify the STI case.
+ */ + cmpl $EXIT_REASON_HLT, %r11d + jne .Lskip_sti + cmpl $ENABLE_IRQS_BEFORE_HLT, %r15d + jne .Lskip_sti + /* Set R15 register to 0, it is unused in EXIT_REASON_HLT case */ + xor %r15, %r15 + sti +.Lskip_sti: tdcall =20 /* Restore output pointer to R9 */ diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c index 5a5b25f9c4d3..eeb456631a65 100644 --- a/arch/x86/kernel/tdx.c +++ b/arch/x86/kernel/tdx.c @@ -6,6 +6,7 @@ =20 #include #include +#include =20 /* TDX module Call Leaf IDs */ #define TDX_GET_VEINFO 3 @@ -35,6 +36,61 @@ static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r1= 3, u64 r14, return out->r10; } =20 +static u64 __cpuidle _tdx_halt(const bool irq_disabled, const bool do_sti) +{ + /* + * Emulate HLT operation via hypercall. More info about ABI + * can be found in TDX Guest-Host-Communication Interface + * (GHCI), sec 3.8 TDG.VP.VMCALL. + * + * The VMM uses the "IRQ disabled" param to understand IRQ + * enabled status (RFLAGS.IF) of the TD guest and to determine + * whether or not it should schedule the halted vCPU if an + * IRQ becomes pending. E.g. if IRQs are disabled, the VMM + * can keep the vCPU in virtual HLT, even if an IRQ is + * pending, without hanging/breaking the guest. + * + * do_sti parameter is used by the __tdx_hypercall() to decide + * whether to call the STI instruction before executing the + * TDCALL instruction. + */ + return _tdx_hypercall(EXIT_REASON_HLT, irq_disabled, 0, 0, + do_sti, NULL); +} + +static bool tdx_halt(void) +{ + /* + * Since non safe halt is mainly used in CPU offlining + * and the guest will always stay in the halt state, don't + * call the STI instruction (set do_sti as false). + */ + const bool irq_disabled =3D irqs_disabled(); + const bool do_sti =3D false; + + if (_tdx_halt(irq_disabled, do_sti)) + return false; + + return true; +} + +void __cpuidle tdx_safe_halt(void) +{ + /* + * For do_sti=3Dtrue case, __tdx_hypercall() function enables + * interrupts using the STI instruction before the TDCALL. 
So + * set irq_disabled as false. + */ + const bool irq_disabled =3D false; + const bool do_sti =3D true; + + /* + * Use WARN_ONCE() to report the failure. + */ + if (_tdx_halt(irq_disabled, do_sti)) + WARN_ONCE(1, "HLT instruction emulation failed\n"); +} + bool tdx_get_ve_info(struct ve_info *ve) { struct tdx_module_output out; @@ -75,8 +131,18 @@ static bool tdx_virt_exception_user(struct pt_regs *reg= s, struct ve_info *ve) /* Handle the kernel #VE */ static bool tdx_virt_exception_kernel(struct pt_regs *regs, struct ve_info= *ve) { - pr_warn("Unexpected #VE: %lld\n", ve->exit_reason); - return false; + bool ret =3D false; + + switch (ve->exit_reason) { + case EXIT_REASON_HLT: + ret =3D tdx_halt(); + break; + default: + pr_warn("Unexpected #VE: %lld\n", ve->exit_reason); + break; + } + + return ret; } =20 bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve) --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 63E1CC433EF for ; Mon, 24 Jan 2022 15:03:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240164AbiAXPDf (ORCPT ); Mon, 24 Jan 2022 10:03:35 -0500 Received: from mga17.intel.com ([192.55.52.151]:63899 "EHLO mga17.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239195AbiAXPC2 (ORCPT ); Mon, 24 Jan 2022 10:02:28 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036548; x=1674572548; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=rz6Q1RPO6JdnPBoGEUw9pTSfnxOZX28u8jhsGtM79xQ=; b=Lj9eNTrtVnaQUJ6w9bgmE7yfa06J8FS9YXhYqRoEP1fRofU+5fBiJJhG wenCDBOX2kHy82Tk4n9Po0n9O/4Z275T6TcZNUlZt+1Mvld97Q59dbgCM 
zghYRynmccCGDdTWEyL5ymS8zRG6hcEz5RKLQQ9cs+8lcho2IliRLMzKO iS06HNCFJq8DIZr8BXS99sKF4MzrUXF9KkEo+rfGXbvtrgjAUK/ZRS8Tw 576H7FtR3twmIuhPbwB5JTTC004UlB4nsqumVCjSdHDtg10USebN2qN7e KVK9rLJp/O1OCKkKzkduz8O2uHgK1DWIoKk5prAnqjsIkxpj4uUOxxXJa g==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="226734697" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="226734697" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:25 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="617258453" Received: from black.fi.intel.com ([10.237.72.28]) by FMSMGA003.fm.intel.com with ESMTP; 24 Jan 2022 07:02:13 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 942654DB; Mon, 24 Jan 2022 17:02:19 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv2 06/29] x86/tdx: Add MSR support for TDX guests Date: Mon, 24 Jan 2022 18:01:52 +0300 Message-Id: <20220124150215.36893-7-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Use hypercall to emulate MSR read/write for the TDX platform. 
There are two viable approaches for doing MSRs in a TD guest: 1. Execute the RDMSR/WRMSR instructions like most VMs and bare metal do. Some will succeed, others will cause a #VE. All of those that cause a #VE will be handled with a TDCALL. 2. Use paravirt infrastructure. The paravirt hook has to keep a list of which MSRs would cause a #VE and use a TDCALL. All other MSRs execute RDMSR/WRMSR instructions directly. The second option can be ruled out because the list of MSRs was challenging to maintain. That leaves option #1 as the only viable solution for the minimal TDX support. For performance-critical MSR writes (like TSC_DEADLINE), future patches will replace the WRMSR/#VE sequence with the direct TDCALL. RDMSR and WRMSR specification details can be found in Guest-Host-Communication Interface (GHCI) for Intel Trust Domain Extensions (Intel TDX) specification, sec titled "TDG.VP. VMCALL" and "TDG.VP.VMCALL". Co-developed-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kuppuswamy Sathyanarayanan Reviewed-by: Andi Kleen Reviewed-by: Tony Luck Signed-off-by: Kirill A. Shutemov --- arch/x86/kernel/tdx.c | 44 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+) diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c index eeb456631a65..29a03a4bdb53 100644 --- a/arch/x86/kernel/tdx.c +++ b/arch/x86/kernel/tdx.c @@ -91,6 +91,39 @@ void __cpuidle tdx_safe_halt(void) WARN_ONCE(1, "HLT instruction emulation failed\n"); } =20 +static bool tdx_read_msr(unsigned int msr, u64 *val) +{ + struct tdx_hypercall_output out; + + /* + * Emulate the MSR read via hypercall. More info about ABI + * can be found in TDX Guest-Host-Communication Interface + * (GHCI), sec titled "TDG.VP.VMCALL". + */ + if (_tdx_hypercall(EXIT_REASON_MSR_READ, msr, 0, 0, 0, &out)) + return false; + + *val =3D out.r11; + + return true; +} + +static bool tdx_write_msr(unsigned int msr, unsigned int low, + unsigned int high) +{ + u64 ret; + + /* + * Emulate the MSR write via hypercall. 
More info about ABI + * can be found in TDX Guest-Host-Communication Interface + * (GHCI) sec titled "TDG.VP.VMCALL". + */ + ret =3D _tdx_hypercall(EXIT_REASON_MSR_WRITE, msr, (u64)high << 32 | low, + 0, 0, NULL); + + return ret ? false : true; +} + bool tdx_get_ve_info(struct ve_info *ve) { struct tdx_module_output out; @@ -132,11 +165,22 @@ static bool tdx_virt_exception_user(struct pt_regs *r= egs, struct ve_info *ve) static bool tdx_virt_exception_kernel(struct pt_regs *regs, struct ve_info= *ve) { bool ret =3D false; + u64 val; =20 switch (ve->exit_reason) { case EXIT_REASON_HLT: ret =3D tdx_halt(); break; + case EXIT_REASON_MSR_READ: + ret =3D tdx_read_msr(regs->cx, &val); + if (ret) { + regs->ax =3D lower_32_bits(val); + regs->dx =3D upper_32_bits(val); + } + break; + case EXIT_REASON_MSR_WRITE: + ret =3D tdx_write_msr(regs->cx, regs->ax, regs->dx); + break; default: pr_warn("Unexpected #VE: %lld\n", ve->exit_reason); break; --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 54F38C433EF for ; Mon, 24 Jan 2022 15:02:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240167AbiAXPC3 (ORCPT ); Mon, 24 Jan 2022 10:02:29 -0500 Received: from mga01.intel.com ([192.55.52.88]:64659 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239696AbiAXPCV (ORCPT ); Mon, 24 Jan 2022 10:02:21 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036541; x=1674572541; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Djfyc1VSSpZn3ALeKiQbg54iksPf4MVncppAVM6UlBY=; b=Nib/8vHeyFGyW0RdrrAeCo+GnnIGo15JSmR/kIHfqXxuMNqB9PRlRB+p 
vnEtB5zFHRMO4vWQBbLM7CwPwyzdDUNSoxHsl8t+76/tkSX5pfVHK11JG h+Y4NmdfnluQNPM5iyU/BTzWv1QwsOs8G+GfYTW8bf7vkKlwM4zSy6baz 5zL3gNpuQjW/wx6O2r2yq6Wc1pdnFq4dsPNuQVZdLL0Q6V8ooQtXpOYzC ClrCYcX6corEv9YEstoTR51mCSEt6pRHUURIw4DOEseF13NA2/+n9PYCM uUlFxn5u2uB5tssoJdfWbGiVgik+65I8PnN9wnfq9PWq7GtQV9eWd5EhJ Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="270498557" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="270498557" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:20 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="580395625" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga008.fm.intel.com with ESMTP; 24 Jan 2022 07:02:13 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id A1BE9501; Mon, 24 Jan 2022 17:02:19 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. 
Shutemov" Subject: [PATCHv2 07/29] x86/tdx: Handle CPUID via #VE Date: Mon, 24 Jan 2022 18:01:53 +0300 Message-Id: <20220124150215.36893-8-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" In TDX guests, most CPUID leaf/sub-leaf combinations are virtualized by the TDX module while some trigger #VE. Implement the #VE handling for EXIT_REASON_CPUID by handing it through the hypercall, which in turn lets the TDX module handle it by invoking the host VMM. More details on CPUID Virtualization can be found in the TDX module specification, the section titled "CPUID Virtualization". Co-developed-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kuppuswamy Sathyanarayanan Reviewed-by: Andi Kleen Reviewed-by: Tony Luck Signed-off-by: Kirill A. Shutemov --- arch/x86/kernel/tdx.c | 42 ++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 40 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c index 29a03a4bdb53..f213c67b4ecc 100644 --- a/arch/x86/kernel/tdx.c +++ b/arch/x86/kernel/tdx.c @@ -124,6 +124,31 @@ static bool tdx_write_msr(unsigned int msr, unsigned i= nt low, return ret ? false : true; } =20 +static bool tdx_handle_cpuid(struct pt_regs *regs) +{ + struct tdx_hypercall_output out; + + /* + * Emulate the CPUID instruction via a hypercall. More info about + * ABI can be found in TDX Guest-Host-Communication Interface + * (GHCI), section titled "VP.VMCALL". + */ + if (_tdx_hypercall(EXIT_REASON_CPUID, regs->ax, regs->cx, 0, 0, &out)) + return false; + + /* + * As per TDX GHCI CPUID ABI, r12-r15 registers contain contents of + * EAX, EBX, ECX, EDX registers after the CPUID instruction execution. 
+ * So copy the register contents back to pt_regs. + */ + regs->ax =3D out.r12; + regs->bx =3D out.r13; + regs->cx =3D out.r14; + regs->dx =3D out.r15; + + return true; +} + bool tdx_get_ve_info(struct ve_info *ve) { struct tdx_module_output out; @@ -157,8 +182,18 @@ bool tdx_get_ve_info(struct ve_info *ve) */ static bool tdx_virt_exception_user(struct pt_regs *regs, struct ve_info *= ve) { - pr_warn("Unexpected #VE: %lld\n", ve->exit_reason); - return false; + bool ret =3D false; + + switch (ve->exit_reason) { + case EXIT_REASON_CPUID: + ret =3D tdx_handle_cpuid(regs); + break; + default: + pr_warn("Unexpected #VE: %lld\n", ve->exit_reason); + break; + } + + return ret; } =20 /* Handle the kernel #VE */ @@ -181,6 +216,9 @@ static bool tdx_virt_exception_kernel(struct pt_regs *r= egs, struct ve_info *ve) case EXIT_REASON_MSR_WRITE: ret =3D tdx_write_msr(regs->cx, regs->ax, regs->dx); break; + case EXIT_REASON_CPUID: + ret =3D tdx_handle_cpuid(regs); + break; default: pr_warn("Unexpected #VE: %lld\n", ve->exit_reason); break; --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 609FFC433FE for ; Mon, 24 Jan 2022 15:02:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240330AbiAXPCg (ORCPT ); Mon, 24 Jan 2022 10:02:36 -0500 Received: from mga14.intel.com ([192.55.52.115]:43409 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239154AbiAXPCV (ORCPT ); Mon, 24 Jan 2022 10:02:21 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036541; x=1674572541; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=YHy4p1HIVc+vzZYrEdBeLSslF/unh4oDHdSwuZYIthU=; 
b=CT8qzOvFdUqIFEf6BmR43zSefKaP2czld+0dGVhbGPWuTtipFGfB5i3C H+IsXO9PeiwnsH7QWKvj+hizzHiO0q9oE4OoL4kSaxkMgwZFNehfetxgl T/IAvLQ5XuX6UnT49FLfXplc7prqDFIoH6F24RAZIWgWclcnpdDb3sbqz PK60BDb1oU+c7i9C5G2ZQE60A1c2hnhsYSGJWNrDs6NkN70X4DD+QKRVJ YRrAM2TuRf3yr55L422BGiSLdVSqmkcYMsOx99IHd7a0eLmLHqUXGvTFL jaqauJWEQsLfc30nS9v2Q3b8nHNIqKcFwJ1NyRemz6iw1h7df8FMQULYk w==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="246280270" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="246280270" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:19 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="596810264" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga004.fm.intel.com with ESMTP; 24 Jan 2022 07:02:13 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id AF4CC557; Mon, 24 Jan 2022 17:02:19 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. 
Shutemov" Subject: [PATCHv2 08/29] x86/tdx: Handle in-kernel MMIO Date: Mon, 24 Jan 2022 18:01:54 +0300 Message-Id: <20220124150215.36893-9-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8"

In non-TDX VMs, MMIO is implemented by providing the guest a mapping which will cause a VMEXIT on access and then the VMM emulating the instruction that caused the VMEXIT. That's not possible for a TDX VM.

To emulate an instruction an emulator needs two things:

 - R/W access to the register file to read/modify instruction arguments and see RIP of the faulted instruction.

 - Read access to memory where the instruction is placed to see what to emulate. In this case it is guest kernel text.

Neither is available to the VMM in the TDX environment:

 - The register file is never exposed to the VMM. When a TD exits to the module, it saves registers into the state-save area allocated for that TD. The module then scrubs these registers before returning execution control to the VMM, to help prevent leakage of TD state.

 - Memory is encrypted with a TD-private key. The CPU disallows software other than the TDX module and TDs from making memory accesses using the private key.

In TDX the MMIO regions are instead configured to trigger a #VE exception in the guest. The guest #VE handler then emulates the MMIO instruction inside the guest and converts it into a controlled hypercall to the host.

MMIO addresses can be used with any CPU instruction that accesses memory. This patch, however, covers only MMIO accesses done via io.h helpers, such as 'readl()' or 'writeq()'.

readX()/writeX() helpers limit the range of instructions which can trigger MMIO. This makes MMIO instruction emulation feasible.
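
To illustrate why a small instruction set keeps the emulation manageable: once the access is classified, the handler only has to write the value read from the device back into the destination register under a handful of rules. The following is a minimal user-space sketch of those write-back rules, not the patch's code. The names emu_kind and emu_writeback are hypothetical, the register is modelled as a plain 64-bit integer, and a little-endian host (as on x86) is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical classification, loosely mirroring the MMIO_READ* cases. */
enum emu_kind { EMU_READ, EMU_READ_ZERO_EXTEND, EMU_READ_SIGN_EXTEND };

/*
 * Write the low 'size' bytes of 'val' (the data returned by the device)
 * into a 64-bit register image. 'opnd_bytes' is the width of the
 * destination operand of the extending load.
 * Assumes a little-endian host, so byte 0 of *reg is the low byte.
 */
static void emu_writeback(uint64_t *reg, uint64_t val, int size,
			  int opnd_bytes, enum emu_kind kind)
{
	assert(size > 0 && size <= 8);
	assert(opnd_bytes > 0 && opnd_bytes <= 8);

	switch (kind) {
	case EMU_READ:
		/* A 32-bit load clears the upper half; 8/16-bit loads do not. */
		if (size == 4)
			*reg = 0;
		break;
	case EMU_READ_ZERO_EXTEND:
		/* Clear the destination operand before copying the data in. */
		memset(reg, 0, opnd_bytes);
		break;
	case EMU_READ_SIGN_EXTEND: {
		/* Source is 8 or 16 bits wide; replicate its top bit. */
		int msb = size > 1 ? 15 : 7;
		int fill = ((val >> msb) & 1) ? 0xff : 0x00;

		memset(reg, fill, opnd_bytes);
		break;
	}
	}

	/* Finally place the bytes that actually came from the device. */
	memcpy(reg, &val, size);
}
```

For instance, a sign-extending one-byte read of 0x80 into a 64-bit operand leaves 0xffffffffffffff80 in the register image, while a plain two-byte read only replaces the low 16 bits.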
Raw access to an MMIO region allows the compiler to generate whatever instruction it wants. Supporting all possible instructions is a task of a different scope.

MMIO access with anything other than helpers from io.h may result in MMIO_DECODE_FAILED and an oops.

AMD SEV has the same limitations in its MMIO handling.

=== Potential alternative approaches ===

== Paravirtualizing all MMIO ==

An alternative to letting MMIO induce a #VE exception is to avoid the #VE in the first place. Similar to the port I/O case, it is theoretically possible to paravirtualize MMIO accesses.

Like the exception-based approach offered here, a fully paravirtualized approach would be limited to MMIO users that leverage common infrastructure like the io.h macros.

However, any paravirtual approach would have to patch approximately 120k call sites. With a conservative overhead estimate of 5 bytes per call site (CALL instruction), that bloats the code by about 600k bytes.

Many drivers will never be used in the TDX environment and the bloat cannot be justified.

== Patching TDX drivers ==

Rather than touching the entire kernel, it might also be possible to just go after drivers that use MMIO in TDX guests. Right now, that's limited only to virtio and some x86-specific drivers.

All virtio MMIO appears to be done through a single function, which makes virtio eminently easy to patch. This will be implemented in the future, removing the bulk of MMIO #VEs.

Co-developed-by: Kuppuswamy Sathyanarayanan
Signed-off-by: Kuppuswamy Sathyanarayanan
Reviewed-by: Andi Kleen
Reviewed-by: Tony Luck
Signed-off-by: Kirill A.
Shutemov --- arch/x86/kernel/tdx.c | 114 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 114 insertions(+) diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c index f213c67b4ecc..8e630eeb765d 100644 --- a/arch/x86/kernel/tdx.c +++ b/arch/x86/kernel/tdx.c @@ -7,6 +7,8 @@ #include #include #include +#include +#include =20 /* TDX module Call Leaf IDs */ #define TDX_GET_VEINFO 3 @@ -149,6 +151,112 @@ static bool tdx_handle_cpuid(struct pt_regs *regs) return true; } =20 +static bool tdx_mmio(int size, bool write, unsigned long addr, + unsigned long *val) +{ + struct tdx_hypercall_output out; + u64 err; + + err =3D _tdx_hypercall(EXIT_REASON_EPT_VIOLATION, size, write, + addr, *val, &out); + if (err) + return true; + + *val =3D out.r11; + return false; +} + +static bool tdx_mmio_read(int size, unsigned long addr, unsigned long *val) +{ + return tdx_mmio(size, false, addr, val); +} + +static bool tdx_mmio_write(int size, unsigned long addr, unsigned long *va= l) +{ + return tdx_mmio(size, true, addr, val); +} + +static int tdx_handle_mmio(struct pt_regs *regs, struct ve_info *ve) +{ + char buffer[MAX_INSN_SIZE]; + unsigned long *reg, val =3D 0; + struct insn insn =3D {}; + enum mmio_type mmio; + int size; + bool err; + + if (copy_from_kernel_nofault(buffer, (void *)regs->ip, MAX_INSN_SIZE)) + return -EFAULT; + + if (insn_decode(&insn, buffer, MAX_INSN_SIZE, INSN_MODE_64)) + return -EFAULT; + + mmio =3D insn_decode_mmio(&insn, &size); + if (WARN_ON_ONCE(mmio =3D=3D MMIO_DECODE_FAILED)) + return -EFAULT; + + if (mmio !=3D MMIO_WRITE_IMM && mmio !=3D MMIO_MOVS) { + reg =3D insn_get_modrm_reg_ptr(&insn, regs); + if (!reg) + return -EFAULT; + } + + switch (mmio) { + case MMIO_WRITE: + memcpy(&val, reg, size); + err =3D tdx_mmio_write(size, ve->gpa, &val); + break; + case MMIO_WRITE_IMM: + val =3D insn.immediate.value; + err =3D tdx_mmio_write(size, ve->gpa, &val); + break; + case MMIO_READ: + err =3D tdx_mmio_read(size, ve->gpa, &val); + if (err) + break; + 
/* Zero-extend for 32-bit operation */ + if (size =3D=3D 4) + *reg =3D 0; + memcpy(reg, &val, size); + break; + case MMIO_READ_ZERO_EXTEND: + err =3D tdx_mmio_read(size, ve->gpa, &val); + if (err) + break; + + /* Zero extend based on operand size */ + memset(reg, 0, insn.opnd_bytes); + memcpy(reg, &val, size); + break; + case MMIO_READ_SIGN_EXTEND: { + u8 sign_byte =3D 0, msb =3D 7; + + err =3D tdx_mmio_read(size, ve->gpa, &val); + if (err) + break; + + if (size > 1) + msb =3D 15; + + if (val & BIT(msb)) + sign_byte =3D -1; + + /* Sign extend based on operand size */ + memset(reg, sign_byte, insn.opnd_bytes); + memcpy(reg, &val, size); + break; + } + case MMIO_MOVS: + case MMIO_DECODE_FAILED: + return -EFAULT; + } + + if (err) + return -EFAULT; + + return insn.length; +} + bool tdx_get_ve_info(struct ve_info *ve) { struct tdx_module_output out; @@ -219,6 +327,12 @@ static bool tdx_virt_exception_kernel(struct pt_regs *= regs, struct ve_info *ve) case EXIT_REASON_CPUID: ret =3D tdx_handle_cpuid(regs); break; + case EXIT_REASON_EPT_VIOLATION: + ve->instr_len =3D tdx_handle_mmio(regs, ve); + ret =3D ve->instr_len > 0; + if (!ret) + pr_warn_once("MMIO failed\n"); + break; default: pr_warn("Unexpected #VE: %lld\n", ve->exit_reason); break; --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0E22DC433EF for ; Mon, 24 Jan 2022 15:03:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240368AbiAXPDP (ORCPT ); Mon, 24 Jan 2022 10:03:15 -0500 Received: from mga14.intel.com ([192.55.52.115]:43409 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239188AbiAXPC2 (ORCPT ); Mon, 24 Jan 2022 10:02:28 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; 
q=dns/txt; s=Intel; t=1643036548; x=1674572548; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=qcUqbzAY6M/ii9A7rG49fqPPHN0dWo5mN7mZmaHffu4=; b=cNStg1N01fqC1hP0HtlVF87eTL36pkMt8awZxzX1it4zCJvbLwAXUKj8 8WOAodIIGdlnyeYYHV5hFaLnouaqgZ8ez/TEI9sQJjzZV2qwcY9On0omO riPw8+i747WFm/aQXMQOIMGeWaR3VY9gR/Xo/1QfHBRAJs8gQcD3UCgjB CGxC70CNLZ4hnoiRkXWWB903xoPBBvYqQa//23eUuSih5qjm6WSrk/+JN n6jsGaSBP3iah3/logAVH0+KqqJ0zXivGdE1tE+svXt1gArLux14bBA/L NtnrSC+CqjL+mWA4Huack4CJN+zg4D3K06NJjZYgwHm+h3XSi8PWUmr6+ Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="246280269" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="246280269" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:19 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="532102238" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga007.fm.intel.com with ESMTP; 24 Jan 2022 07:02:13 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id BC3DC56A; Mon, 24 Jan 2022 17:02:19 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A . 
Shutemov" Subject: [PATCHv2 09/29] x86/tdx: Detect TDX at early kernel decompression time Date: Mon, 24 Jan 2022 18:01:55 +0300 Message-Id: <20220124150215.36893-10-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8"

From: Kuppuswamy Sathyanarayanan

The early decompression code does port I/O for its console output. Handling port I/O at decompression time, however, demands a different approach from normal runtime because the IDT required to support #VE-based port I/O emulation is not yet set up. Paravirtualizing I/O calls during the decompression step is acceptable because the decompression code is small, so patching it will not bloat the image much.

To support port I/O in decompression code, TDX must be detected before the decompression code might do port I/O. Detect TDX guest support before console_init() in extract_kernel(). Detecting it before console_init() is early enough for patching port I/O.

Add an early_is_tdx_guest() interface to get the cached TDX guest status in the decompression code.

The actual port I/O paravirtualization will come later in the series.

Reviewed-by: Tony Luck
Signed-off-by: Kuppuswamy Sathyanarayanan
Signed-off-by: Kirill A.
Shutemov --- arch/x86/boot/compressed/Makefile | 1 + arch/x86/boot/compressed/misc.c | 8 ++++++++ arch/x86/boot/compressed/misc.h | 2 ++ arch/x86/boot/compressed/tdx.c | 29 +++++++++++++++++++++++++++++ arch/x86/boot/compressed/tdx.h | 16 ++++++++++++++++ arch/x86/boot/cpuflags.c | 3 +-- arch/x86/boot/cpuflags.h | 1 + arch/x86/include/asm/shared/tdx.h | 7 +++++++ arch/x86/include/asm/tdx.h | 4 +--- 9 files changed, 66 insertions(+), 5 deletions(-) create mode 100644 arch/x86/boot/compressed/tdx.c create mode 100644 arch/x86/boot/compressed/tdx.h create mode 100644 arch/x86/include/asm/shared/tdx.h diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/M= akefile index 6115274fe10f..732f6b21ecbd 100644 --- a/arch/x86/boot/compressed/Makefile +++ b/arch/x86/boot/compressed/Makefile @@ -101,6 +101,7 @@ ifdef CONFIG_X86_64 endif =20 vmlinux-objs-$(CONFIG_ACPI) +=3D $(obj)/acpi.o +vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) +=3D $(obj)/tdx.o =20 vmlinux-objs-$(CONFIG_EFI_MIXED) +=3D $(obj)/efi_thunk_$(BITS).o efi-obj-$(CONFIG_EFI_STUB) =3D $(objtree)/drivers/firmware/efi/libstub/lib= .a diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/mis= c.c index a4339cb2d247..d8373d766672 100644 --- a/arch/x86/boot/compressed/misc.c +++ b/arch/x86/boot/compressed/misc.c @@ -370,6 +370,14 @@ asmlinkage __visible void *extract_kernel(void *rmode,= memptr heap, lines =3D boot_params->screen_info.orig_video_lines; cols =3D boot_params->screen_info.orig_video_cols; =20 + /* + * Detect TDX guest environment. + * + * It has to be done before console_init() in order to use + * paravirtualized port I/O operations if needed.
+ */ + early_tdx_detect(); + console_init(); =20 /* diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/mis= c.h index 16ed360b6692..0d8e275a9d96 100644 --- a/arch/x86/boot/compressed/misc.h +++ b/arch/x86/boot/compressed/misc.h @@ -28,6 +28,8 @@ #include #include =20 +#include "tdx.h" + #define BOOT_CTYPE_H #include =20 diff --git a/arch/x86/boot/compressed/tdx.c b/arch/x86/boot/compressed/tdx.c new file mode 100644 index 000000000000..6853376fe69a --- /dev/null +++ b/arch/x86/boot/compressed/tdx.c @@ -0,0 +1,29 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * tdx.c - Early boot code for TDX + */ + +#include "../cpuflags.h" +#include "../string.h" + +#include + +static bool tdx_guest_detected; + +bool early_is_tdx_guest(void) +{ + return tdx_guest_detected; +} + +void early_tdx_detect(void) +{ + u32 eax, sig[3]; + + cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &sig[0], &sig[2], &sig[1]); + + if (memcmp(TDX_IDENT, sig, 12)) + return; + + /* Cache TDX guest feature status */ + tdx_guest_detected =3D true; +} diff --git a/arch/x86/boot/compressed/tdx.h b/arch/x86/boot/compressed/tdx.h new file mode 100644 index 000000000000..18970c09512e --- /dev/null +++ b/arch/x86/boot/compressed/tdx.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright (C) 2021 Intel Corporation */ +#ifndef BOOT_COMPRESSED_TDX_H +#define BOOT_COMPRESSED_TDX_H + +#include + +#ifdef CONFIG_INTEL_TDX_GUEST +void early_tdx_detect(void); +bool early_is_tdx_guest(void); +#else +static inline void early_tdx_detect(void) { }; +static inline bool early_is_tdx_guest(void) { return false; } +#endif + +#endif diff --git a/arch/x86/boot/cpuflags.c b/arch/x86/boot/cpuflags.c index a0b75f73dc63..a83d67ec627d 100644 --- a/arch/x86/boot/cpuflags.c +++ b/arch/x86/boot/cpuflags.c @@ -71,8 +71,7 @@ int has_eflag(unsigned long mask) # define EBX_REG "=3Db" #endif =20 -static inline void cpuid_count(u32 id, u32 count, - u32 *a, u32 *b, u32 *c, u32 *d) +void cpuid_count(u32 id, u32 
count, u32 *a, u32 *b, u32 *c, u32 *d) { asm volatile(".ifnc %%ebx,%3 ; movl %%ebx,%3 ; .endif \n\t" "cpuid \n\t" diff --git a/arch/x86/boot/cpuflags.h b/arch/x86/boot/cpuflags.h index 2e20814d3ce3..475b8fde90f7 100644 --- a/arch/x86/boot/cpuflags.h +++ b/arch/x86/boot/cpuflags.h @@ -17,5 +17,6 @@ extern u32 cpu_vendor[3]; =20 int has_eflag(unsigned long mask); void get_cpuflags(void); +void cpuid_count(u32 id, u32 count, u32 *a, u32 *b, u32 *c, u32 *d); =20 #endif diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/share= d/tdx.h new file mode 100644 index 000000000000..12bede46d048 --- /dev/null +++ b/arch/x86/include/asm/shared/tdx.h @@ -0,0 +1,7 @@ +#ifndef _ASM_X86_SHARED_TDX_H +#define _ASM_X86_SHARED_TDX_H + +#define TDX_CPUID_LEAF_ID 0x21 +#define TDX_IDENT "IntelTDX " + +#endif diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 9b4714a45bb9..53f7dd0fbe58 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -5,9 +5,7 @@ =20 #include #include - -#define TDX_CPUID_LEAF_ID 0x21 -#define TDX_IDENT "IntelTDX " +#include =20 #define TDX_HYPERCALL_STANDARD 0 =20 --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C1478C433F5 for ; Mon, 24 Jan 2022 15:02:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240103AbiAXPCx (ORCPT ); Mon, 24 Jan 2022 10:02:53 -0500 Received: from mga02.intel.com ([134.134.136.20]:19111 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239003AbiAXPC0 (ORCPT ); Mon, 24 Jan 2022 10:02:26 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036546; x=1674572546; h=from:to:cc:subject:date:message-id:in-reply-to: 
references:mime-version:content-transfer-encoding; bh=qHjdp065/pE2dGbXagUIgqF5Caf1mnBv+CAb9DUvCm4=; b=CZr6FmxHiajhsW3ltsN0+BzL+4sj4ug/EVS3cHkkv87UW6Q1rAT0dPGn gKClei+tPWUaBrSyCuLdvuZ6AiGvXfVtJzrj+N/RKM1WLkRdqEbhK41Ik P2Et2s108uPNAsxjheLjzuiGS64vjOIMeC2vY1/84u19VsEAKbQ/KiQSQ hwRphSf/mGwUe4dbukODaEEkq11r2BedN5YpztoYp4Un42stY/WdJ1g6S bIicOGjyZcVKmE301rRLOvS7GgHIQfrOWZysA7iOy2xqDj4uUcpbqYIFd jMYSREzeS1v3r5SQlwT6mx7DH67IwnnszgBKMn8CyPlgi77Zua3sNIX8z Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="233423270" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="233423270" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:19 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="766422581" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga006.fm.intel.com with ESMTP; 24 Jan 2022 07:02:13 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id CA4F15E4; Mon, 24 Jan 2022 17:02:19 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. 
Shutemov" Subject: [PATCHv2 10/29] x86: Consolidate port I/O helpers Date: Mon, 24 Jan 2022 18:01:56 +0300 Message-Id: <20220124150215.36893-11-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8"

There are two implementations of port I/O helpers: one in the kernel and one in the boot stub.

Move the helpers required for both to <asm/shared/io.h> and use the one implementation everywhere.

Signed-off-by: Kirill A. Shutemov
--- arch/x86/boot/boot.h | 35 +------------------------------- arch/x86/boot/compressed/misc.h | 2 +- arch/x86/include/asm/io.h | 22 ++------------------ arch/x86/include/asm/shared/io.h | 32 +++++++++++++++++++++++++++++ 4 files changed, 36 insertions(+), 55 deletions(-) create mode 100644 arch/x86/include/asm/shared/io.h diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h index 34c9dbb6a47d..22a474c5b3e8 100644 --- a/arch/x86/boot/boot.h +++ b/arch/x86/boot/boot.h @@ -23,6 +23,7 @@ #include #include #include +#include #include "bitops.h" #include "ctype.h" #include "cpuflags.h" @@ -35,40 +36,6 @@ extern struct boot_params boot_params; =20 #define cpu_relax() asm volatile("rep; nop") =20 -/* Basic port I/O */ -static inline void outb(u8 v, u16 port) -{ - asm volatile("outb %0,%1" : : "a" (v), "dN" (port)); -} -static inline u8 inb(u16 port) -{ - u8 v; - asm volatile("inb %1,%0" : "=3Da" (v) : "dN" (port)); - return v; -} - -static inline void outw(u16 v, u16 port) -{ - asm volatile("outw %0,%1" : : "a" (v), "dN" (port)); -} -static inline u16 inw(u16 port) -{ - u16 v; - asm volatile("inw %1,%0" : "=3Da" (v) : "dN" (port)); - return v; -} - -static inline void outl(u32 v, u16 port) -{ - asm volatile("outl %0,%1" : : "a" (v), "dN" (port)); -}
-static inline u32 inl(u16 port) -{ - u32 v; - asm volatile("inl %1,%0" : "=3Da" (v) : "dN" (port)); - return v; -} - static inline void io_delay(void) { const u16 DELAY_PORT =3D 0x80; diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/mis= c.h index 0d8e275a9d96..8a253e85f990 100644 --- a/arch/x86/boot/compressed/misc.h +++ b/arch/x86/boot/compressed/misc.h @@ -22,11 +22,11 @@ #include #include #include -#include #include #include #include #include +#include =20 #include "tdx.h" =20 diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h index f6d91ecb8026..8ce0a40379de 100644 --- a/arch/x86/include/asm/io.h +++ b/arch/x86/include/asm/io.h @@ -44,6 +44,7 @@ #include #include #include +#include =20 #define build_mmio_read(name, size, type, reg, barrier) \ static inline type name(const volatile void __iomem *addr) \ @@ -258,20 +259,6 @@ static inline void slow_down_io(void) #endif =20 #define BUILDIO(bwl, bw, type) \ -static inline void out##bwl(unsigned type value, int port) \ -{ \ - asm volatile("out" #bwl " %" #bw "0, %w1" \ - : : "a"(value), "Nd"(port)); \ -} \ - \ -static inline unsigned type in##bwl(int port) \ -{ \ - unsigned type value; \ - asm volatile("in" #bwl " %w1, %" #bw "0" \ - : "=3Da"(value) : "Nd"(port)); \ - return value; \ -} \ - \ static inline void out##bwl##_p(unsigned type value, int port) \ { \ out##bwl(value, port); \ @@ -320,10 +307,8 @@ static inline void ins##bwl(int port, void *addr, unsi= gned long count) \ BUILDIO(b, b, char) BUILDIO(w, w, short) BUILDIO(l, , int) +#undef BUILDIO =20 -#define inb inb -#define inw inw -#define inl inl #define inb_p inb_p #define inw_p inw_p #define inl_p inl_p @@ -331,9 +316,6 @@ BUILDIO(l, , int) #define insw insw #define insl insl =20 -#define outb outb -#define outw outw -#define outl outl #define outb_p outb_p #define outw_p outw_p #define outl_p outl_p diff --git a/arch/x86/include/asm/shared/io.h b/arch/x86/include/asm/shared= /io.h new file mode 100644 index 
000000000000..f17247f6c471 --- /dev/null +++ b/arch/x86/include/asm/shared/io.h @@ -0,0 +1,32 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_SHARED_IO_H +#define _ASM_X86_SHARED_IO_H + +#define BUILDIO(bwl, bw, type) \ +static inline void out##bwl(unsigned type value, int port) \ +{ \ + asm volatile("out" #bwl " %" #bw "0, %w1" \ + : : "a"(value), "Nd"(port)); \ +} \ + \ +static inline unsigned type in##bwl(int port) \ +{ \ + unsigned type value; \ + asm volatile("in" #bwl " %w1, %" #bw "0" \ + : "=3Da"(value) : "Nd"(port)); \ + return value; \ +} + +BUILDIO(b, b, char) +BUILDIO(w, w, short) +BUILDIO(l, , int) +#undef BUILDIO + +#define inb inb +#define inw inw +#define inl inl +#define outb outb +#define outw outw +#define outl outl + +#endif --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 074A2C433F5 for ; Mon, 24 Jan 2022 15:02:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240391AbiAXPCz (ORCPT ); Mon, 24 Jan 2022 10:02:55 -0500 Received: from mga01.intel.com ([192.55.52.88]:64673 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239029AbiAXPCZ (ORCPT ); Mon, 24 Jan 2022 10:02:25 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036545; x=1674572545; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=OwNk3t0v5UzekLxVMaUjWa3gpsufOzXDlwCNNAuGzok=; b=MtGPg2G+/ATzf702hZFKSv6dCrZHFW3Bvr9nzf/v0q1+Ye6b8hfOLhKV JYwjfC19IttzAViyI+jA6KwIOfdNjDgKzHT2aOY8RpsJ7ju91y9r3n8ax jeY+fVNR/Pe5kM4iMchHhRz3X3BQcVuFguZZL4UWPyEnMyWGP1TivsVot FKvYZrsl5hAW/YWQznHgwFOCwRxY/ts+UJ7hdYVdAxhSRJlPEduXpBifw 2D8dPcfxfMlsrL5EpjUpB8+oAedVdNyQ8jP1yhV9SuXZTkXUIsLXMazTh 
13w290kpMDJXTipP8HTe+OoJzpwIGpmg+fzrX5VvUMUWx0QU7RvlnDRVX A==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="270498595" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="270498595" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:24 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="479104662" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga006.jf.intel.com with ESMTP; 24 Jan 2022 07:02:13 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id D88A3898; Mon, 24 Jan 2022 17:02:19 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv2 11/29] x86/boot: Allow to hook up alternative port I/O helpers Date: Mon, 24 Jan 2022 18:01:57 +0300 Message-Id: <20220124150215.36893-12-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Port I/O instructions trigger #VE in the TDX environment. In response to the exception, kernel emulates these instructions using hypercalls. 
But during early boot, at the decompression stage, it is cumbersome to deal with #VE. It is cleaner to go to hypercalls directly, bypassing #VE handling.

Add a way to hook up alternative port I/O helpers in the boot stub. All port I/O operations are routed via 'pio_ops'. By default, 'pio_ops' is initialized with the native port I/O implementations.

This is a preparation patch. The next patch will override 'pio_ops' if the kernel is booted in the TDX environment.

Signed-off-by: Kirill A. Shutemov
--- arch/x86/boot/a20.c | 14 +++++++------- arch/x86/boot/boot.h | 2 +- arch/x86/boot/compressed/misc.c | 18 ++++++++++++------ arch/x86/boot/compressed/misc.h | 2 +- arch/x86/boot/early_serial_console.c | 28 ++++++++++++++-------------- arch/x86/boot/io.h | 28 ++++++++++++++++++++++++++++ arch/x86/boot/main.c | 4 ++++ arch/x86/boot/pm.c | 10 +++++----- arch/x86/boot/tty.c | 4 ++-- arch/x86/boot/video-vga.c | 6 +++--- arch/x86/boot/video.h | 8 +++++--- arch/x86/realmode/rm/wakemain.c | 14 +++++++++----- 12 files changed, 91 insertions(+), 47 deletions(-) create mode 100644 arch/x86/boot/io.h diff --git a/arch/x86/boot/a20.c b/arch/x86/boot/a20.c index a2b6b428922a..7f6dd5cc4670 100644 --- a/arch/x86/boot/a20.c +++ b/arch/x86/boot/a20.c @@ -25,7 +25,7 @@ static int empty_8042(void) while (loops--) { io_delay(); =20 - status =3D inb(0x64); + status =3D pio_ops.inb(0x64); if (status =3D=3D 0xff) { /* FF is a plausible, but very unlikely status */ if (!--ffs) @@ -34,7 +34,7 @@ static int empty_8042(void) if (status & 1) { /* Read and discard input data */ io_delay(); - (void)inb(0x60); + (void)pio_ops.inb(0x60); } else if (!(status & 2)) { /* Buffers empty, finished!
*/ return 0; @@ -99,13 +99,13 @@ static void enable_a20_kbc(void) { empty_8042(); =20 - outb(0xd1, 0x64); /* Command write */ + pio_ops.outb(0xd1, 0x64); /* Command write */ empty_8042(); =20 - outb(0xdf, 0x60); /* A20 on */ + pio_ops.outb(0xdf, 0x60); /* A20 on */ empty_8042(); =20 - outb(0xff, 0x64); /* Null command, but UHCI wants it */ + pio_ops.outb(0xff, 0x64); /* Null command, but UHCI wants it */ empty_8042(); } =20 @@ -113,10 +113,10 @@ static void enable_a20_fast(void) { u8 port_a; =20 - port_a =3D inb(0x92); /* Configuration port A */ + port_a =3D pio_ops.inb(0x92); /* Configuration port A */ port_a |=3D 0x02; /* Enable A20 */ port_a &=3D ~0x01; /* Do not reset machine */ - outb(port_a, 0x92); + pio_ops.outb(port_a, 0x92); } =20 /* diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h index 22a474c5b3e8..bd8f640ca15f 100644 --- a/arch/x86/boot/boot.h +++ b/arch/x86/boot/boot.h @@ -23,10 +23,10 @@ #include #include #include -#include #include "bitops.h" #include "ctype.h" #include "cpuflags.h" +#include "io.h" =20 /* Useful macros */ #define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x))) diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/mis= c.c index d8373d766672..cc47cf239c67 100644 --- a/arch/x86/boot/compressed/misc.c +++ b/arch/x86/boot/compressed/misc.c @@ -47,6 +47,8 @@ void *memmove(void *dest, const void *src, size_t n); */ struct boot_params *boot_params; =20 +struct port_io_ops pio_ops; + memptr free_mem_ptr; memptr free_mem_end_ptr; =20 @@ -103,10 +105,12 @@ static void serial_putchar(int ch) { unsigned timeout =3D 0xffff; =20 - while ((inb(early_serial_base + LSR) & XMTRDY) =3D=3D 0 && --timeout) + while ((pio_ops.inb(early_serial_base + LSR) & XMTRDY) =3D=3D 0 && + --timeout) { cpu_relax(); + } =20 - outb(ch, early_serial_base + TXR); + pio_ops.outb(ch, early_serial_base + TXR); } =20 void __putstr(const char *s) @@ -152,10 +156,10 @@ void __putstr(const char *s) boot_params->screen_info.orig_y =3D y; =20 pos =3D (x + cols 
* y) * 2; /* Update cursor position */ - outb(14, vidport); - outb(0xff & (pos >> 9), vidport+1); - outb(15, vidport); - outb(0xff & (pos >> 1), vidport+1); + pio_ops.outb(14, vidport); + pio_ops.outb(0xff & (pos >> 9), vidport+1); + pio_ops.outb(15, vidport); + pio_ops.outb(0xff & (pos >> 1), vidport+1); } =20 void __puthex(unsigned long value) @@ -370,6 +374,8 @@ asmlinkage __visible void *extract_kernel(void *rmode, = memptr heap, lines =3D boot_params->screen_info.orig_video_lines; cols =3D boot_params->screen_info.orig_video_cols; =20 + init_io_ops(); + /* * Detect TDX guest environment. * diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/mis= c.h index 8a253e85f990..ea71cf3d64e1 100644 --- a/arch/x86/boot/compressed/misc.h +++ b/arch/x86/boot/compressed/misc.h @@ -26,7 +26,6 @@ #include #include #include -#include =20 #include "tdx.h" =20 @@ -35,6 +34,7 @@ =20 #define BOOT_BOOT_H #include "../ctype.h" +#include "../io.h" =20 #ifdef CONFIG_X86_64 #define memptr long diff --git a/arch/x86/boot/early_serial_console.c b/arch/x86/boot/early_ser= ial_console.c index 023bf1c3de8b..03e43d770571 100644 --- a/arch/x86/boot/early_serial_console.c +++ b/arch/x86/boot/early_serial_console.c @@ -28,17 +28,17 @@ static void early_serial_init(int port, int baud) unsigned char c; unsigned divisor; =20 - outb(0x3, port + LCR); /* 8n1 */ - outb(0, port + IER); /* no interrupt */ - outb(0, port + FCR); /* no fifo */ - outb(0x3, port + MCR); /* DTR + RTS */ + pio_ops.outb(0x3, port + LCR); /* 8n1 */ + pio_ops.outb(0, port + IER); /* no interrupt */ + pio_ops.outb(0, port + FCR); /* no fifo */ + pio_ops.outb(0x3, port + MCR); /* DTR + RTS */ =20 divisor =3D 115200 / baud; - c =3D inb(port + LCR); - outb(c | DLAB, port + LCR); - outb(divisor & 0xff, port + DLL); - outb((divisor >> 8) & 0xff, port + DLH); - outb(c & ~DLAB, port + LCR); + c =3D pio_ops.inb(port + LCR); + pio_ops.outb(c | DLAB, port + LCR); + pio_ops.outb(divisor & 0xff, port + DLL); + 
pio_ops.outb((divisor >> 8) & 0xff, port + DLH); + pio_ops.outb(c & ~DLAB, port + LCR); =20 early_serial_base =3D port; } @@ -104,11 +104,11 @@ static unsigned int probe_baud(int port) unsigned char lcr, dll, dlh; unsigned int quot; =20 - lcr =3D inb(port + LCR); - outb(lcr | DLAB, port + LCR); - dll =3D inb(port + DLL); - dlh =3D inb(port + DLH); - outb(lcr, port + LCR); + lcr =3D pio_ops.inb(port + LCR); + pio_ops.outb(lcr | DLAB, port + LCR); + dll =3D pio_ops.inb(port + DLL); + dlh =3D pio_ops.inb(port + DLH); + pio_ops.outb(lcr, port + LCR); quot =3D (dlh << 8) | dll; =20 return BASE_BAUD / quot; diff --git a/arch/x86/boot/io.h b/arch/x86/boot/io.h new file mode 100644 index 000000000000..2659180e3210 --- /dev/null +++ b/arch/x86/boot/io.h @@ -0,0 +1,28 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef BOOT_IO_H +#define BOOT_IO_H + +#include + +struct port_io_ops { + unsigned char (*inb)(int port); + unsigned short (*inw)(int port); + unsigned int (*inl)(int port); + void (*outb)(unsigned char v, int port); + void (*outw)(unsigned short v, int port); + void (*outl)(unsigned int v, int port); +}; + +extern struct port_io_ops pio_ops; + +static inline void init_io_ops(void) +{ + pio_ops.inb =3D inb; + pio_ops.inw =3D inw; + pio_ops.inl =3D inl; + pio_ops.outb =3D outb; + pio_ops.outw =3D outw; + pio_ops.outl =3D outl; +} + +#endif diff --git a/arch/x86/boot/main.c b/arch/x86/boot/main.c index e3add857c2c9..447a797891be 100644 --- a/arch/x86/boot/main.c +++ b/arch/x86/boot/main.c @@ -17,6 +17,8 @@ =20 struct boot_params boot_params __attribute__((aligned(16))); =20 +struct port_io_ops pio_ops; + char *HEAP =3D _end; char *heap_end =3D _end; /* Default end of heap =3D no heap */ =20 @@ -133,6 +135,8 @@ static void init_heap(void) =20 void main(void) { + init_io_ops(); + /* First, copy the boot header into the "zeropage" */ copy_boot_params(); =20 diff --git a/arch/x86/boot/pm.c b/arch/x86/boot/pm.c index 40031a614712..4180b6a264c9 100644 --- 
a/arch/x86/boot/pm.c +++ b/arch/x86/boot/pm.c @@ -25,7 +25,7 @@ static void realmode_switch_hook(void) : "eax", "ebx", "ecx", "edx"); } else { asm volatile("cli"); - outb(0x80, 0x70); /* Disable NMI */ + pio_ops.outb(0x80, 0x70); /* Disable NMI */ io_delay(); } } @@ -35,9 +35,9 @@ static void realmode_switch_hook(void) */ static void mask_all_interrupts(void) { - outb(0xff, 0xa1); /* Mask all interrupts on the secondary PIC */ + pio_ops.outb(0xff, 0xa1); /* Mask all interrupts on the secondary PIC */ io_delay(); - outb(0xfb, 0x21); /* Mask all but cascade on the primary PIC */ + pio_ops.outb(0xfb, 0x21); /* Mask all but cascade on the primary PIC */ io_delay(); } =20 @@ -46,9 +46,9 @@ static void mask_all_interrupts(void) */ static void reset_coprocessor(void) { - outb(0, 0xf0); + pio_ops.outb(0, 0xf0); io_delay(); - outb(0, 0xf1); + pio_ops.outb(0, 0xf1); io_delay(); } =20 diff --git a/arch/x86/boot/tty.c b/arch/x86/boot/tty.c index f7eb976b0a4b..ee8700682801 100644 --- a/arch/x86/boot/tty.c +++ b/arch/x86/boot/tty.c @@ -29,10 +29,10 @@ static void __section(".inittext") serial_putchar(int c= h) { unsigned timeout =3D 0xffff; =20 - while ((inb(early_serial_base + LSR) & XMTRDY) =3D=3D 0 && --timeout) + while ((pio_ops.inb(early_serial_base + LSR) & XMTRDY) =3D=3D 0 && --time= out) cpu_relax(); =20 - outb(ch, early_serial_base + TXR); + pio_ops.outb(ch, early_serial_base + TXR); } =20 static void __section(".inittext") bios_putchar(int ch) diff --git a/arch/x86/boot/video-vga.c b/arch/x86/boot/video-vga.c index 4816cb9cf996..17baac542ee7 100644 --- a/arch/x86/boot/video-vga.c +++ b/arch/x86/boot/video-vga.c @@ -131,7 +131,7 @@ static void vga_set_80x43(void) /* I/O address of the VGA CRTC */ u16 vga_crtc(void) { - return (inb(0x3cc) & 1) ? 0x3d4 : 0x3b4; + return (pio_ops.inb(0x3cc) & 1) ? 
0x3d4 : 0x3b4; } =20 static void vga_set_480_scanlines(void) @@ -148,10 +148,10 @@ static void vga_set_480_scanlines(void) out_idx(0xdf, crtc, 0x12); /* Vertical display end */ out_idx(0xe7, crtc, 0x15); /* Vertical blank start */ out_idx(0x04, crtc, 0x16); /* Vertical blank end */ - csel =3D inb(0x3cc); + csel =3D pio_ops.inb(0x3cc); csel &=3D 0x0d; csel |=3D 0xe2; - outb(csel, 0x3c2); + pio_ops.outb(csel, 0x3c2); } =20 static void vga_set_vertical_end(int lines) diff --git a/arch/x86/boot/video.h b/arch/x86/boot/video.h index 04bde0bb2003..87a5f726e731 100644 --- a/arch/x86/boot/video.h +++ b/arch/x86/boot/video.h @@ -15,6 +15,8 @@ =20 #include =20 +#include "boot.h" + /* * This code uses an extended set of video mode numbers. These include: * Aliases for standard modes @@ -96,13 +98,13 @@ extern int graphic_mode; /* Graphics mode with linear f= rame buffer */ /* Accessing VGA indexed registers */ static inline u8 in_idx(u16 port, u8 index) { - outb(index, port); - return inb(port+1); + pio_ops.outb(index, port); + return pio_ops.inb(port+1); } =20 static inline void out_idx(u8 v, u16 port, u8 index) { - outw(index+(v << 8), port); + pio_ops.outw(index+(v << 8), port); } =20 /* Writes a value to an indexed port and then reads the port again */ diff --git a/arch/x86/realmode/rm/wakemain.c b/arch/x86/realmode/rm/wakemai= n.c index 1d6437e6d2ba..b49404d0d63c 100644 --- a/arch/x86/realmode/rm/wakemain.c +++ b/arch/x86/realmode/rm/wakemain.c @@ -17,18 +17,18 @@ static void beep(unsigned int hz) } else { u16 div =3D 1193181/hz; =20 - outb(0xb6, 0x43); /* Ctr 2, squarewave, load, binary */ + pio_ops.outb(0xb6, 0x43); /* Ctr 2, squarewave, load, binary */ io_delay(); - outb(div, 0x42); /* LSB of counter */ + pio_ops.outb(div, 0x42); /* LSB of counter */ io_delay(); - outb(div >> 8, 0x42); /* MSB of counter */ + pio_ops.outb(div >> 8, 0x42); /* MSB of counter */ io_delay(); =20 enable =3D 0x03; /* Turn on speaker */ } - inb(0x61); /* Dummy read of System Control Port B */ 
+ pio_ops.inb(0x61); /* Dummy read of System Control Port B */ io_delay(); - outb(enable, 0x61); /* Enable timer 2 output to speaker */ + pio_ops.outb(enable, 0x61); /* Enable timer 2 output to speaker */ io_delay(); } =20 @@ -62,8 +62,12 @@ static void send_morse(const char *pattern) } } =20 +struct port_io_ops pio_ops; + void main(void) { + init_io_ops(); + /* Kill machine if structures are wrong */ if (wakeup_header.real_magic !=3D 0x12345678) while (1) --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 62A1DC433FE for ; Mon, 24 Jan 2022 15:03:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243291AbiAXPDg (ORCPT ); Mon, 24 Jan 2022 10:03:36 -0500 Received: from mga02.intel.com ([134.134.136.20]:19100 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240206AbiAXPCb (ORCPT ); Mon, 24 Jan 2022 10:02:31 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036551; x=1674572551; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=cmM37ONetmfy+H/QLu0tbdNtpNyWydlaLbgWJ54yjjA=; b=W8uFv7VBoT16cJ/Mdev/lSajPYl0Uzmb6ptX8HVIwuVPpFo3YD60snTx 5ZnL2fviT/tJhkKSx0ye4BEyBiNE4sTYIKl9FZKJBuJvW1rRE+AWN3yuv 9FBd8dymKiadv6I9jpDY1n8mlzJ7VJ+5kWAyNDvSdkek8VgmMdkOIsEHF R/GZq8bWV6ZzPZXNchHmGttDQTCrOKiUPNYAXKCugN7nFe6jl8hzPtHiH JuOZxUdex6r88G4X3AM0nq+vN5h4f0PLGVW3fqWtxWNGVsL10C2AAPnA0 Jhfq33hFA58KkHQNx1FydB1jUKkVmzRCHDLiGoM2hC/+4PpQcxAf1BvAz Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="233423319" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="233423319" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:26 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="695447691" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga005.jf.intel.com with ESMTP; 24 Jan 2022 07:02:19 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id E64BA9E0; Mon, 24 Jan 2022 17:02:19 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv2 12/29] x86/boot/compressed: Support TDX guest port I/O at decompression time Date: Mon, 24 Jan 2022 18:01:58 +0300 Message-Id: <20220124150215.36893-13-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Port I/O instructions trigger #VE in the TDX environment. In response to the exception, kernel emulates these instructions using hypercalls. But during early boot, on the decompression stage, it is cumbersome to deal with #VE. It is cleaner to go to hypercalls directly, bypassing #VE handling. Hook up TDX-specific port I/O helpers if booting in TDX environment. Signed-off-by: Kirill A. 
Shutemov --- arch/x86/boot/compressed/Makefile | 2 +- arch/x86/boot/compressed/tdcall.S | 3 ++ arch/x86/boot/compressed/tdx.c | 59 +++++++++++++++++++++++++++++++ arch/x86/include/asm/shared/tdx.h | 23 ++++++++++++ arch/x86/include/asm/tdx.h | 21 ----------- 5 files changed, 86 insertions(+), 22 deletions(-) create mode 100644 arch/x86/boot/compressed/tdcall.S diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/M= akefile index 732f6b21ecbd..8fd0e6ae2e1f 100644 --- a/arch/x86/boot/compressed/Makefile +++ b/arch/x86/boot/compressed/Makefile @@ -101,7 +101,7 @@ ifdef CONFIG_X86_64 endif =20 vmlinux-objs-$(CONFIG_ACPI) +=3D $(obj)/acpi.o -vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) +=3D $(obj)/tdx.o +vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) +=3D $(obj)/tdx.o $(obj)/tdcall.o =20 vmlinux-objs-$(CONFIG_EFI_MIXED) +=3D $(obj)/efi_thunk_$(BITS).o efi-obj-$(CONFIG_EFI_STUB) =3D $(objtree)/drivers/firmware/efi/libstub/lib= .a diff --git a/arch/x86/boot/compressed/tdcall.S b/arch/x86/boot/compressed/t= dcall.S new file mode 100644 index 000000000000..aafadc136c88 --- /dev/null +++ b/arch/x86/boot/compressed/tdcall.S @@ -0,0 +1,3 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#include "../../kernel/tdcall.S" diff --git a/arch/x86/boot/compressed/tdx.c b/arch/x86/boot/compressed/tdx.c index 6853376fe69a..f2e1449c74cd 100644 --- a/arch/x86/boot/compressed/tdx.c +++ b/arch/x86/boot/compressed/tdx.c @@ -5,6 +5,10 @@ =20 #include "../cpuflags.h" #include "../string.h" +#include "../io.h" + +#include +#include =20 #include =20 @@ -15,6 +19,54 @@ bool early_is_tdx_guest(void) return tdx_guest_detected; } =20 +static inline unsigned int tdx_io_in(int size, int port) +{ + struct tdx_hypercall_output out; + + __tdx_hypercall(TDX_HYPERCALL_STANDARD, EXIT_REASON_IO_INSTRUCTION, + size, 0, port, 0, &out); + + return out.r10 ? 
UINT_MAX : out.r11; +} + +static inline void tdx_io_out(int size, int port, u64 value) +{ + struct tdx_hypercall_output out; + + __tdx_hypercall(TDX_HYPERCALL_STANDARD, EXIT_REASON_IO_INSTRUCTION, + size, 1, port, value, &out); +} + +static inline unsigned char tdx_inb(int port) +{ + return tdx_io_in(1, port); +} + +static inline unsigned short tdx_inw(int port) +{ + return tdx_io_in(2, port); +} + +static inline unsigned int tdx_inl(int port) +{ + return tdx_io_in(4, port); +} + +static inline void tdx_outb(unsigned char value, int port) +{ + tdx_io_out(1, port, value); +} + +static inline void tdx_outw(unsigned short value, int port) +{ + tdx_io_out(2, port, value); +} + +static inline void tdx_outl(unsigned int value, int port) +{ + tdx_io_out(4, port, value); +} + void early_tdx_detect(void) { u32 eax, sig[3]; @@ -26,4 +78,11 @@ void early_tdx_detect(void) =20 /* Cache TDX guest feature status */ tdx_guest_detected =3D true; + + pio_ops.inb =3D tdx_inb; + pio_ops.inw =3D tdx_inw; + pio_ops.inl =3D tdx_inl; + pio_ops.outb =3D tdx_outb; + pio_ops.outw =3D tdx_outw; + pio_ops.outl =3D tdx_outl; } diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/share= d/tdx.h index 12bede46d048..4a0218bedc75 100644 --- a/arch/x86/include/asm/shared/tdx.h +++ b/arch/x86/include/asm/shared/tdx.h @@ -1,7 +1,30 @@ #ifndef _ASM_X86_SHARED_TDX_H #define _ASM_X86_SHARED_TDX_H =20 +#include + +/* + * Used in __tdx_hypercall() to gather the output registers' values + * of the TDCALL instruction when requesting services from the VMM. + * This is a software only structure and not part of the TDX + * module/VMM ABI. 
+ */ +struct tdx_hypercall_output { + u64 r10; + u64 r11; + u64 r12; + u64 r13; + u64 r14; + u64 r15; +}; + +#define TDX_HYPERCALL_STANDARD 0 + #define TDX_CPUID_LEAF_ID 0x21 #define TDX_IDENT "IntelTDX " =20 +/* Used to request services from the VMM */ +u64 __tdx_hypercall(u64 type, u64 fn, u64 r12, u64 r13, u64 r14, + u64 r15, struct tdx_hypercall_output *out); + #endif diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 53f7dd0fbe58..27eb4ab2fdd2 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -7,8 +7,6 @@ #include #include =20 -#define TDX_HYPERCALL_STANDARD 0 - /* * Used in __tdx_module_call() to gather the output registers' * values of the TDCALL instruction when requesting services from @@ -24,21 +22,6 @@ struct tdx_module_output { u64 r11; }; =20 -/* - * Used in __tdx_hypercall() to gather the output registers' values - * of the TDCALL instruction when requesting services from the VMM. - * This is a software only structure and not part of the TDX - * module/VMM ABI. - */ -struct tdx_hypercall_output { - u64 r10; - u64 r11; - u64 r12; - u64 r13; - u64 r14; - u64 r15; -}; - /* * Used by the #VE exception handler to gather the #VE exception * info from the TDX module. 
This is a software only structure @@ -64,10 +47,6 @@ bool is_tdx_guest(void); u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9, struct tdx_module_output *out); =20 -/* Used to request services from the VMM */ -u64 __tdx_hypercall(u64 type, u64 fn, u64 r12, u64 r13, u64 r14, - u64 r15, struct tdx_hypercall_output *out); - bool tdx_get_ve_info(struct ve_info *ve); =20 bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve); --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DF895C433F5 for ; Mon, 24 Jan 2022 15:03:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243356AbiAXPDI (ORCPT ); Mon, 24 Jan 2022 10:03:08 -0500 Received: from mga17.intel.com ([192.55.52.151]:63896 "EHLO mga17.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239091AbiAXPC1 (ORCPT ); Mon, 24 Jan 2022 10:02:27 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036547; x=1674572547; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=c1HfnQEB9O1TcYNGiH9nKcRDKpMRlZ4Nz2csO46l184=; b=CT5e2TETwZqPFUnSVgESHjkDAtpWBZMp8SHiw/0jzKp4lWGQJgl97nTV 92yOgIp4GXoFPUB25uLjp242JI4kk63FijOsDq9Z1znv9nYtctBP3qVfo URcGBMsauSlSbdtF1yAvJ2IpzyMDYSPALeWiCSjxqOE0mV3mYCakkVsl1 DCEd3oPLXirOBQ72aUwo99hAWaaJsyD8AE1vrlkqx4kbQU/yN2Qww5Lri yWtWxwmMfJBcTbHLpkm0xXnGuGVQvCGddRE78a85zIsIJQE00ZRT5SgCa C+Dc39Xn2r9MXxhGHMj8UyxWzMTiwX4rNVG6jviATWvh6PuCzeW1kTDPs g==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="226734684" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="226734684" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga107.fm.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:23 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="476743304" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga003.jf.intel.com with ESMTP; 24 Jan 2022 07:02:16 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id F19C69F0; Mon, 24 Jan 2022 17:02:19 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A . Shutemov" Subject: [PATCHv2 13/29] x86/tdx: Add port I/O emulation Date: Mon, 24 Jan 2022 18:01:59 +0300 Message-Id: <20220124150215.36893-14-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Kuppuswamy Sathyanarayanan TDX hypervisors cannot emulate instructions directly. This includes port I/O which is normally emulated in the hypervisor. All port I/O instructions inside TDX trigger the #VE exception in the guest and would be normally emulated there. Use a hypercall to emulate port I/O. Extend the tdx_handle_virt_exception() and add support to handle the #VE due to port I/O instructions. String I/O operations are not supported in TDX. 
Unroll them by declaring the CC_ATTR_GUEST_UNROLL_STRING_IO confidential computing attribute. Signed-off-by: Kuppuswamy Sathyanarayanan Reviewed-by: Andi Kleen Reviewed-by: Dan Williams Signed-off-by: Kirill A. Shutemov --- arch/x86/kernel/cc_platform.c | 3 +++ arch/x86/kernel/tdx.c | 48 +++++++++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+) diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c index c72b3919bca9..8da246ab4339 100644 --- a/arch/x86/kernel/cc_platform.c +++ b/arch/x86/kernel/cc_platform.c @@ -17,6 +17,9 @@ =20 static bool intel_cc_platform_has(enum cc_attr attr) { + if (attr =3D=3D CC_ATTR_GUEST_UNROLL_STRING_IO) + return true; + return false; } =20 diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c index 8e630eeb765d..e73af22a4c11 100644 --- a/arch/x86/kernel/tdx.c +++ b/arch/x86/kernel/tdx.c @@ -13,6 +13,12 @@ /* TDX module Call Leaf IDs */ #define TDX_GET_VEINFO 3 =20 +/* See Exit Qualification for I/O Instructions in VMX documentation */ +#define VE_IS_IO_IN(exit_qual) (((exit_qual) & 8) ? 1 : 0) +#define VE_GET_IO_SIZE(exit_qual) (((exit_qual) & 7) + 1) +#define VE_GET_PORT_NUM(exit_qual) ((exit_qual) >> 16) +#define VE_IS_IO_STRING(exit_qual) (((exit_qual) & 16) ? 1 : 0) + static bool tdx_guest_detected __ro_after_init; =20 /* @@ -257,6 +263,45 @@ static int tdx_handle_mmio(struct pt_regs *regs, struc= t ve_info *ve) return insn.length; } =20 +/* + * Emulate I/O using a hypercall. + * + * Assumes the I/O instruction was using ax, which is enforced + * by the standard io.h macros. + * + * Return true on success or false on failure. 
+ */ +static bool tdx_handle_io(struct pt_regs *regs, u32 exit_qual) +{ + struct tdx_hypercall_output out; + int size, port, ret; + u64 mask; + bool in; + + if (VE_IS_IO_STRING(exit_qual)) + return false; + + in =3D VE_IS_IO_IN(exit_qual); + size =3D VE_GET_IO_SIZE(exit_qual); + port =3D VE_GET_PORT_NUM(exit_qual); + /* GENMASK(h, l) is inclusive of bit h, so the top bit is size * 8 - 1 */ + mask =3D GENMASK(BITS_PER_BYTE * size - 1, 0); + + /* + * Emulate the I/O read/write via hypercall. More info about + * the ABI can be found in the TDX Guest-Host-Communication Interface + * (GHCI) section titled "TDG.VP.VMCALL". + */ + ret =3D _tdx_hypercall(EXIT_REASON_IO_INSTRUCTION, size, !in, port, + in ? 0 : regs->ax, &out); + if (!in) + return !ret; + + regs->ax &=3D ~mask; + regs->ax |=3D ret ? UINT_MAX : out.r11 & mask; + + return !ret; +} + bool tdx_get_ve_info(struct ve_info *ve) { struct tdx_module_output out; @@ -333,6 +378,9 @@ static bool tdx_virt_exception_kernel(struct pt_regs *r= egs, struct ve_info *ve) if (!ret) pr_warn_once("MMIO failed\n"); break; + case EXIT_REASON_IO_INSTRUCTION: + ret =3D tdx_handle_io(regs, ve->exit_qual); + break; default: pr_warn("Unexpected #VE: %lld\n", ve->exit_reason); break; --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0E067C43217 for ; Mon, 24 Jan 2022 15:02:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240242AbiAXPCo (ORCPT ); Mon, 24 Jan 2022 10:02:44 -0500 Received: from mga17.intel.com ([192.55.52.151]:63901 "EHLO mga17.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239990AbiAXPCX (ORCPT ); Mon, 24 Jan 2022 10:02:23 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; 
t=1643036542; x=1674572542; h=from:to:cc:subject:date:message-id:in-reply-to: 
references:mime-version:content-transfer-encoding; bh=U2RBFvSabGQbSWUrVUR6IfnDqFHD0udYc5VGLJl32RM=; b=NSXkfMRxa1RHY8y7cNxbPAetjes42o8pspNcJTYAhtIYIyaF368uXRhx fmpU0Ax75ORWokI+Y+HWTPPO4YAQAYWBCqWhlfCIWqtX3v+UXjEaVHK1s V8QlBpcdq7o119+FPPBLcn9COqDVKwreIgW9AGrLemk8KJFAzQh4c7Q06 e8XK4C8/nv+PrZgSRCFFa4KCTGyBsbmbU8yiw2Eounetwo2L87dTZRehC wlQElrzTLJZS0oJZ9ktURxsDI8UbC2/nncLPAzFcYKiSZHvkOP5aBsIPd BJiuNtLduD+bEQJs2cbwbckHmPXhrPXQLtezj36cVD2Dr1LUqIotmhRIB g==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="226734674" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="226734674" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:22 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="562680192" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga001.jf.intel.com with ESMTP; 24 Jan 2022 07:02:16 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 07CADA03; Mon, 24 Jan 2022 17:02:20 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A . 
Shutemov" Subject: [PATCHv2 14/29] x86/tdx: Early boot handling of port I/O Date: Mon, 24 Jan 2022 18:02:00 +0300 Message-Id: <20220124150215.36893-15-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Andi Kleen TDX guests cannot do port I/O directly. The TDX module triggers a #VE exception to let the guest kernel emulate port I/O, by converting them into TDCALLs to call the host. But before IDT handlers are set up, port I/O cannot be emulated using normal kernel #VE handlers. To support the #VE-based emulation during this boot window, add a minimal early #VE handler support in early exception handlers. This is similar to what AMD SEV does. This is mainly to support earlyprintk's serial driver, as well as potentially the VGA driver (although it is expected not to be used). The early handler only supports I/O-related #VE exceptions. Unhandled or failed exceptions will be handled via early_fixup_exceptions() (like normal exception failures). This early handler enables the use of normal in*/out* macros without patching them for every driver. Since there is no expectation that early port I/O is performance-critical, the #VE emulation cost is worth the simplicity benefit of not patching the port I/O usage in early code. There are also no concerns with nesting, since there should be no NMIs or interrupts this early. Signed-off-by: Andi Kleen Reviewed-by: Dan Williams Signed-off-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kirill A. 
Shutemov --- arch/x86/include/asm/tdx.h | 4 ++++ arch/x86/kernel/head64.c | 3 +++ arch/x86/kernel/tdx.c | 17 +++++++++++++++++ 3 files changed, 24 insertions(+) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 27eb4ab2fdd2..8013686192fd 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -53,12 +53,16 @@ bool tdx_handle_virt_exception(struct pt_regs *regs, st= ruct ve_info *ve); =20 void tdx_safe_halt(void); =20 +bool tdx_early_handle_ve(struct pt_regs *regs); + #else =20 static inline void tdx_early_init(void) { }; static inline bool is_tdx_guest(void) { return false; } static inline void tdx_safe_halt(void) { }; =20 +static inline bool tdx_early_handle_ve(struct pt_regs *regs) { return fals= e; } + #endif /* CONFIG_INTEL_TDX_GUEST */ =20 #endif /* _ASM_X86_TDX_H */ diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index 1cb6346ec3d1..76d298ddfe75 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -417,6 +417,9 @@ void __init do_early_exception(struct pt_regs *regs, in= t trapnr) trapnr =3D=3D X86_TRAP_VC && handle_vc_boot_ghcb(regs)) return; =20 + if (trapnr =3D=3D X86_TRAP_VE && tdx_early_handle_ve(regs)) + return; + early_fixup_exception(regs, trapnr); } =20 diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c index e73af22a4c11..ebb29dfb3ad4 100644 --- a/arch/x86/kernel/tdx.c +++ b/arch/x86/kernel/tdx.c @@ -302,6 +302,23 @@ static bool tdx_handle_io(struct pt_regs *regs, u32 ex= it_qual) return !ret; } =20 +/* + * Early #VE exception handler. Only handles a subset of port I/O. + * Intended only for earlyprintk. If failed, return false. 
+ */ +__init bool tdx_early_handle_ve(struct pt_regs *regs) +{ + struct ve_info ve; + + if (tdx_get_ve_info(&ve)) + return false; + + if (ve.exit_reason !=3D EXIT_REASON_IO_INSTRUCTION) + return false; + + return tdx_handle_io(regs, ve.exit_qual); +} + bool tdx_get_ve_info(struct ve_info *ve) { struct tdx_module_output out; --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 555D3C433EF for ; Mon, 24 Jan 2022 15:04:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243309AbiAXPEP (ORCPT ); Mon, 24 Jan 2022 10:04:15 -0500 Received: from mga12.intel.com ([192.55.52.136]:5640 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243216AbiAXPDL (ORCPT ); Mon, 24 Jan 2022 10:03:11 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036591; x=1674572591; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=f42jF0db27DH6xwdarJr96m9QIQKkwx3aAFrMyHkc2s=; b=Q1IUitG/tapnBqr6cYEDbEKs/shuc+RYvFfBeXJVZe6Bo6OuDvJzUdMg bLxrjAo8pLnUqexCOQZVlPjTCjPVf50lJFl8yz02MIgJE6H3dm3bCvdU1 CNsquyXxsXMI/TqnHYsXpD8RUZRbX365rmGlzBgva3KABtyJ5zjUc7hNP NugkJwtgwTF45m22yqogCdjGU73AlnxCGnvtn7ASP0AlE5kfwzw1KCuuK XsxTEEdvCKglUL/V2zlYzAc259LUROt51X899jAp+AT1D7iabvSaz8WEb yb64yzPPcC/AQjTqFAaTxcj52kCD/4klIQ7jZXys14Q/nPYTyvhbB9y5x w==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="226043250" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="226043250" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:28 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="532102276" 
Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga007.fm.intel.com with ESMTP; 24 Jan 2022 07:02:20 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 11AA6A8F; Mon, 24 Jan 2022 17:02:20 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A . Shutemov" Subject: [PATCHv2 15/29] x86/tdx: Wire up KVM hypercalls Date: Mon, 24 Jan 2022 18:02:01 +0300 Message-Id: <20220124150215.36893-16-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Kuppuswamy Sathyanarayanan KVM hypercalls use the VMCALL or VMMCALL instructions. Although the ABI is similar, those instructions no longer function for TDX guests. Make vendor-specific TDVMCALLs instead of VMCALL. This enables TDX guests to run with KVM acting as the hypervisor. Among other things, KVM hypercall is used to send IPIs. Since the KVM driver can be built as a kernel module, export tdx_kvm_hypercall() to make the symbols visible to kvm.ko. Signed-off-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kirill A. 
Shutemov --- arch/x86/include/asm/kvm_para.h | 22 ++++++++++++++++++++++ arch/x86/include/asm/tdx.h | 11 +++++++++++ arch/x86/kernel/tdx.c | 15 +++++++++++++++ 3 files changed, 48 insertions(+) diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_par= a.h index 56935ebb1dfe..57bc74e112f2 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -7,6 +7,8 @@ #include #include =20 +#include + #ifdef CONFIG_KVM_GUEST bool kvm_check_and_clear_guest_paused(void); #else @@ -32,6 +34,10 @@ static inline bool kvm_check_and_clear_guest_paused(void) static inline long kvm_hypercall0(unsigned int nr) { long ret; + + if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) + return tdx_kvm_hypercall(nr, 0, 0, 0, 0); + asm volatile(KVM_HYPERCALL : "=3Da"(ret) : "a"(nr) @@ -42,6 +48,10 @@ static inline long kvm_hypercall0(unsigned int nr) static inline long kvm_hypercall1(unsigned int nr, unsigned long p1) { long ret; + + if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) + return tdx_kvm_hypercall(nr, p1, 0, 0, 0); + asm volatile(KVM_HYPERCALL : "=3Da"(ret) : "a"(nr), "b"(p1) @@ -53,6 +63,10 @@ static inline long kvm_hypercall2(unsigned int nr, unsig= ned long p1, unsigned long p2) { long ret; + + if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) + return tdx_kvm_hypercall(nr, p1, p2, 0, 0); + asm volatile(KVM_HYPERCALL : "=3Da"(ret) : "a"(nr), "b"(p1), "c"(p2) @@ -64,6 +78,10 @@ static inline long kvm_hypercall3(unsigned int nr, unsig= ned long p1, unsigned long p2, unsigned long p3) { long ret; + + if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) + return tdx_kvm_hypercall(nr, p1, p2, p3, 0); + asm volatile(KVM_HYPERCALL : "=3Da"(ret) : "a"(nr), "b"(p1), "c"(p2), "d"(p3) @@ -76,6 +94,10 @@ static inline long kvm_hypercall4(unsigned int nr, unsig= ned long p1, unsigned long p4) { long ret; + + if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) + return tdx_kvm_hypercall(nr, p1, p2, p3, p4); + asm volatile(KVM_HYPERCALL : "=3Da"(ret) : "a"(nr), 
"b"(p1), "c"(p2), "d"(p3), "S"(p4) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 8013686192fd..4bcaadf21dc6 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -65,4 +65,15 @@ static inline bool tdx_early_handle_ve(struct pt_regs *r= egs) { return false; } =20 #endif /* CONFIG_INTEL_TDX_GUEST */ =20 +#if defined(CONFIG_KVM_GUEST) && defined(CONFIG_INTEL_TDX_GUEST) +long tdx_kvm_hypercall(unsigned int nr, unsigned long p1, unsigned long p2, + unsigned long p3, unsigned long p4); +#else +static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1, + unsigned long p2, unsigned long p3, + unsigned long p4) +{ + return -ENODEV; +} +#endif /* CONFIG_INTEL_TDX_GUEST && CONFIG_KVM_GUEST */ #endif /* _ASM_X86_TDX_H */ diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c index ebb29dfb3ad4..a4e696f12666 100644 --- a/arch/x86/kernel/tdx.c +++ b/arch/x86/kernel/tdx.c @@ -44,6 +44,21 @@ static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r1= 3, u64 r14, return out->r10; } =20 +#ifdef CONFIG_KVM_GUEST +long tdx_kvm_hypercall(unsigned int nr, unsigned long p1, unsigned long p2, + unsigned long p3, unsigned long p4) +{ + struct tdx_hypercall_output out; + + /* Non zero return value indicates buggy TDX module, so panic */ + if (__tdx_hypercall(nr, p1, p2, p3, p4, 0, &out)) + panic("KVM hypercall %u failed. 
Buggy TDX module?\n", nr); + + return out.r10; +} +EXPORT_SYMBOL_GPL(tdx_kvm_hypercall); +#endif + static u64 __cpuidle _tdx_halt(const bool irq_disabled, const bool do_sti) { /* --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0C2D7C433F5 for ; Mon, 24 Jan 2022 15:03:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243243AbiAXPDM (ORCPT ); Mon, 24 Jan 2022 10:03:12 -0500 Received: from mga17.intel.com ([192.55.52.151]:63901 "EHLO mga17.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239694AbiAXPC2 (ORCPT ); Mon, 24 Jan 2022 10:02:28 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036548; x=1674572548; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Ny8hPu4sqyibLkN69362ey04CLPXYx99o5E/SzjH7Yk=; b=OzFHx6w5Oc3JkCrLOQyECwxcM281PFPHWEk9uZ8GqkzUAgZlGZscZOfK dT3purxiz9wvqOtT6Kj0RuGkmkzXt2s/4m0OQdqA+qUqPhxQo87LdBDmw 9RJ8z0Wt50eJvSA7okWXeRm9PF4+3glgNjiQ/NQPBumOBxHbIj4ZPX2/W iimxa6Bz90nAyfJTjq1Vie8A9V6b/zDNTWoOWtE7PnP3elnE46z21Skiu PbvnL8qBNakoH6BIx0Tgv1cCPsLXam3UMBSwy6T9oOFpDqcYfhZCIpjVw io9KHhb5/nehd8qb8uI5xITipHPlVwziFnGIMzF5Cj3+O24aGkb5f+/CL A==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="226734707" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="226734707" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:26 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="562680259" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga001.jf.intel.com with ESMTP; 24 Jan 2022 07:02:20 -0800 Received: by black.fi.intel.com 
(Postfix, from userid 1000) id 1CBD8ACE; Mon, 24 Jan 2022 17:02:20 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, Sean Christopherson , Kai Huang , "Kirill A . Shutemov" Subject: [PATCHv2 16/29] x86/boot: Add a trampoline for booting APs via firmware handoff Date: Mon, 24 Jan 2022 18:02:02 +0300 Message-Id: <20220124150215.36893-17-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Historically, x86 platforms have booted secondary processors (APs) using INIT followed by the start up IPI (SIPI) messages. In regular VMs, this boot sequence is supported by the VMM emulation. But such a wakeup model is fatal for secure VMs like TDX in which VMM is an untrusted entity. To address this issue, a new wakeup model was added in ACPI v6.4, in which firmware (like TDX virtual BIOS) will help boot the APs. More details about this wakeup model can be found in ACPI specification v6.4, the section titled "Multiprocessor Wakeup Structure". Since the existing trampoline code requires processors to boot in real mode with 16-bit addressing, it will not work for this wakeup model (because it boots the AP in 64-bit mode). 
To handle it, extend the trampoline code to support 64-bit mode firmware handoff. Also, extend IDT and GDT pointers to support 64-bit mode hand off. There is no TDX-specific detection for this new boot method. The kernel will rely on it as the sole boot method whenever the new ACPI structure is present. The ACPI table parser for the MADT multiprocessor wake up structure and the wakeup method that uses this structure will be added by the following patch in this series. Reported-by: Kai Huang Signed-off-by: Sean Christopherson Reviewed-by: Andi Kleen Reviewed-by: Dan Williams Signed-off-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kirill A. Shutemov --- arch/x86/include/asm/apic.h | 2 ++ arch/x86/include/asm/realmode.h | 1 + arch/x86/kernel/smpboot.c | 12 ++++++-- arch/x86/realmode/rm/header.S | 1 + arch/x86/realmode/rm/trampoline_64.S | 38 ++++++++++++++++++++++++ arch/x86/realmode/rm/trampoline_common.S | 12 +++++++- 6 files changed, 63 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index 48067af94678..35006e151774 100644 --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -328,6 +328,8 @@ struct apic { =20 /* wakeup_secondary_cpu */ int (*wakeup_secondary_cpu)(int apicid, unsigned long start_eip); + /* wakeup secondary CPU using 64-bit wakeup point */ + int (*wakeup_secondary_cpu_64)(int apicid, unsigned long start_eip); =20 void (*inquire_remote_apic)(int apicid); =20 diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmod= e.h index 331474b150f1..fd6f6e5b755a 100644 --- a/arch/x86/include/asm/realmode.h +++ b/arch/x86/include/asm/realmode.h @@ -25,6 +25,7 @@ struct real_mode_header { u32 sev_es_trampoline_start; #endif #ifdef CONFIG_X86_64 + u32 trampoline_start64; u32 trampoline_pgd; #endif /* ACPI S3 wakeup */ diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 617012f4619f..6269dd126dba 100644 --- a/arch/x86/kernel/smpboot.c +++ 
b/arch/x86/kernel/smpboot.c @@ -1088,6 +1088,11 @@ static int do_boot_cpu(int apicid, int cpu, struct t= ask_struct *idle, unsigned long boot_error =3D 0; unsigned long timeout; =20 +#ifdef CONFIG_X86_64 + /* If 64-bit wakeup method exists, use the 64-bit mode trampoline IP */ + if (apic->wakeup_secondary_cpu_64) + start_ip =3D real_mode_header->trampoline_start64; +#endif idle->thread.sp =3D (unsigned long)task_pt_regs(idle); early_gdt_descr.address =3D (unsigned long)get_cpu_gdt_rw(cpu); initial_code =3D (unsigned long)start_secondary; @@ -1129,11 +1134,14 @@ static int do_boot_cpu(int apicid, int cpu, struct = task_struct *idle, =20 /* * Wake up a CPU in difference cases: - * - Use the method in the APIC driver if it's defined + * - Use a method from the APIC driver if one defined, with wakeup + * straight to 64-bit mode preferred over wakeup to RM. * Otherwise, * - Use an INIT boot APIC message for APs or NMI for BSP. */ - if (apic->wakeup_secondary_cpu) + if (apic->wakeup_secondary_cpu_64) + boot_error =3D apic->wakeup_secondary_cpu_64(apicid, start_ip); + else if (apic->wakeup_secondary_cpu) boot_error =3D apic->wakeup_secondary_cpu(apicid, start_ip); else boot_error =3D wakeup_cpu_via_init_nmi(cpu, start_ip, apicid, diff --git a/arch/x86/realmode/rm/header.S b/arch/x86/realmode/rm/header.S index 8c1db5bf5d78..2eb62be6d256 100644 --- a/arch/x86/realmode/rm/header.S +++ b/arch/x86/realmode/rm/header.S @@ -24,6 +24,7 @@ SYM_DATA_START(real_mode_header) .long pa_sev_es_trampoline_start #endif #ifdef CONFIG_X86_64 + .long pa_trampoline_start64 .long pa_trampoline_pgd; #endif /* ACPI S3 wakeup */ diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/tr= ampoline_64.S index cc8391f86cdb..ae112a91592f 100644 --- a/arch/x86/realmode/rm/trampoline_64.S +++ b/arch/x86/realmode/rm/trampoline_64.S @@ -161,6 +161,19 @@ SYM_CODE_START(startup_32) ljmpl $__KERNEL_CS, $pa_startup_64 SYM_CODE_END(startup_32) =20 +SYM_CODE_START(pa_trampoline_compat) + /* + 
* In compatibility mode. Prep ESP and DX for startup_32, then disable + * paging and complete the switch to legacy 32-bit mode. + */ + movl $rm_stack_end, %esp + movw $__KERNEL_DS, %dx + + movl $X86_CR0_PE, %eax + movl %eax, %cr0 + ljmpl $__KERNEL32_CS, $pa_startup_32 +SYM_CODE_END(pa_trampoline_compat) + .section ".text64","ax" .code64 .balign 4 @@ -169,6 +182,20 @@ SYM_CODE_START(startup_64) jmpq *tr_start(%rip) SYM_CODE_END(startup_64) =20 +SYM_CODE_START(trampoline_start64) + /* + * APs start here on a direct transfer from 64-bit BIOS with identity + * mapped page tables. Load the kernel's GDT in order to gear down to + * 32-bit mode (to handle 4-level vs. 5-level paging), and to (re)load + * segment registers. Load the zero IDT so any fault triggers a + * shutdown instead of jumping back into BIOS. + */ + lidt tr_idt(%rip) + lgdt tr_gdt64(%rip) + + ljmpl *tr_compat(%rip) +SYM_CODE_END(trampoline_start64) + .section ".rodata","a" # Duplicate the global descriptor table # so the kernel can live anywhere @@ -182,6 +209,17 @@ SYM_DATA_START(tr_gdt) .quad 0x00cf93000000ffff # __KERNEL_DS SYM_DATA_END_LABEL(tr_gdt, SYM_L_LOCAL, tr_gdt_end) =20 +SYM_DATA_START(tr_gdt64) + .short tr_gdt_end - tr_gdt - 1 # gdt limit + .long pa_tr_gdt + .long 0 +SYM_DATA_END(tr_gdt64) + +SYM_DATA_START(tr_compat) + .long pa_trampoline_compat + .short __KERNEL32_CS +SYM_DATA_END(tr_compat) + .bss .balign PAGE_SIZE SYM_DATA(trampoline_pgd, .space PAGE_SIZE) diff --git a/arch/x86/realmode/rm/trampoline_common.S b/arch/x86/realmode/r= m/trampoline_common.S index 5033e640f957..4331c32c47f8 100644 --- a/arch/x86/realmode/rm/trampoline_common.S +++ b/arch/x86/realmode/rm/trampoline_common.S @@ -1,4 +1,14 @@ /* SPDX-License-Identifier: GPL-2.0 */ .section ".rodata","a" .balign 16 -SYM_DATA_LOCAL(tr_idt, .fill 1, 6, 0) + +/* + * When a bootloader hands off to the kernel in 32-bit mode an + * IDT with a 2-byte limit and 4-byte base is needed. 
When a boot + * loader hands off to a kernel 64-bit mode the base address + * extends to 8-bytes. Reserve enough space for either scenario. + */ +SYM_DATA_START_LOCAL(tr_idt) + .short 0 + .quad 0 +SYM_DATA_END(tr_idt) --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 94455C433F5 for ; Mon, 24 Jan 2022 15:03:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235309AbiAXPDx (ORCPT ); Mon, 24 Jan 2022 10:03:53 -0500 Received: from mga03.intel.com ([134.134.136.65]:2612 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240370AbiAXPCi (ORCPT ); Mon, 24 Jan 2022 10:02:38 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036558; x=1674572558; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=cV4z4YKFQARzdqCciUMKqkAkSeqeXmSbqMOZknKC3MQ=; b=TIk9eflzA74EsunZvCENBQ69CPndxp36jPqVUbKHn1wd0T+SHl3PuWMR XNP7V4YVLRGubedXgJsBFU2HHIAo4jcb7mDebR0zsYr6KacZ6+B13vsCw PEqjhgafm9ewUvf9elTrfGlKfyu3eBQUAvnnRbuRoS7p8rTlPUTucbyRB d+VEIC7OoaRsIakSzAa3y+BWMBkHtB/unZGgSSOmGo4y919ASqMI+Zl1i p/gpdw82pjcTXwXxJADs/chF7sPQVi+6+LuLPNoiAuPGxxht0oqI+iv+B 4avUhGK4tA7wRfBUNSfVJakI5FOFACHEthlP8OWarbGYLWu/rr3JCQbnK A==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="246007654" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="246007654" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:26 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="596810315" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga004.fm.intel.com with ESMTP; 24 Jan 2022 07:02:20 
-0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 27741BAF; Mon, 24 Jan 2022 17:02:20 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, Sean Christopherson , "Rafael J . Wysocki" , "Kirill A . Shutemov" Subject: [PATCHv2 17/29] x86/acpi, x86/boot: Add multiprocessor wake-up support Date: Mon, 24 Jan 2022 18:02:03 +0300 Message-Id: <20220124150215.36893-18-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Kuppuswamy Sathyanarayanan TDX cannot use INIT/SIPI protocol to bring up secondary CPUs because it requires assistance from untrusted VMM. For platforms that do not support SIPI/INIT, ACPI defines a wakeup model (using mailbox) via MADT multiprocessor wakeup structure. More details about it can be found in ACPI specification v6.4, the section titled "Multiprocessor Wakeup Structure". If a platform firmware produces the multiprocessor wakeup structure, then OS may use this new mailbox-based mechanism to wake up the APs. Add ACPI MADT wake structure parsing support for x86 platform and if MADT wake table is present, update apic->wakeup_secondary_cpu_64 with new API which uses MADT wake mailbox to wake-up CPU. 
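The mailbox handshake described above can be sketched in user-space C as follows. This is only an illustrative model, not the kernel code from this patch: struct wake_mailbox, os_post_wakeup(), fw_poll_once() and os_wait_ack() are hypothetical names, the field layout is simplified, and the firmware side is an assumption based on the patch's own description (firmware resets the command field to zero once it has received the wakeup command).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Command values; 0 = no-op, 1 = wakeup (names are illustrative). */
#define MP_WAKE_COMMAND_NOOP   0
#define MP_WAKE_COMMAND_WAKEUP 1

/* Simplified, hypothetical model of the shared mailbox. */
struct wake_mailbox {
	_Atomic uint16_t command;
	uint32_t apic_id;
	uint64_t wakeup_vector;
};

/*
 * OS side: fill in the payload first, then publish the command with
 * release ordering so the payload is visible before the command is.
 */
static void os_post_wakeup(struct wake_mailbox *mb, uint32_t apic_id,
			   uint64_t vector)
{
	mb->apic_id = apic_id;
	mb->wakeup_vector = vector;
	atomic_store_explicit(&mb->command, MP_WAKE_COMMAND_WAKEUP,
			      memory_order_release);
}

/*
 * Firmware side (assumed behaviour): a parked AP checks the mailbox once.
 * Returns 1 and stores the vector if the wakeup targets my_apic_id,
 * acknowledging it by resetting the command to no-op. Returns 0 otherwise.
 */
static int fw_poll_once(struct wake_mailbox *mb, uint32_t my_apic_id,
			uint64_t *vector_out)
{
	if (atomic_load_explicit(&mb->command, memory_order_acquire) !=
	    MP_WAKE_COMMAND_WAKEUP)
		return 0;
	if (mb->apic_id != my_apic_id)
		return 0;

	*vector_out = mb->wakeup_vector;
	atomic_store_explicit(&mb->command, MP_WAKE_COMMAND_NOOP,
			      memory_order_release);
	return 1;
}

/*
 * OS side: bounded wait for the acknowledgement.
 * Returns 0 once the command is back to no-op, -1 if spins run out.
 */
static int os_wait_ack(struct wake_mailbox *mb, unsigned int spins)
{
	while (spins--) {
		if (atomic_load_explicit(&mb->command, memory_order_acquire) ==
		    MP_WAKE_COMMAND_NOOP)
			return 0;
	}
	return -1;
}
```

In this model the OS calls os_post_wakeup() and then os_wait_ack(); a timeout corresponds to the error path of the wakeup handler, and a successful acknowledgement is the point at which the CPU would be recorded as woken so that it is never woken through the mailbox a second time.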
Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Reviewed-by: Andi Kleen Reviewed-by: Rafael J. Wysocki Signed-off-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kirill A. Shutemov --- arch/x86/include/asm/apic.h | 5 ++ arch/x86/kernel/acpi/boot.c | 114 ++++++++++++++++++++++++++++++++++++ arch/x86/kernel/apic/apic.c | 10 ++++ 3 files changed, 129 insertions(+) diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index 35006e151774..bd8ae0a7010a 100644 --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -490,6 +490,11 @@ static inline unsigned int read_apic_id(void) return apic->get_apic_id(reg); } =20 +#ifdef CONFIG_X86_64 +typedef int (*wakeup_cpu_handler)(int apicid, unsigned long start_eip); +extern void acpi_wake_cpu_handler_update(wakeup_cpu_handler handler); +#endif + extern int default_apic_id_valid(u32 apicid); extern int default_acpi_madt_oem_check(char *, char *); extern void default_setup_apic_routing(void); diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c index 5b6d1a95776f..af204a217575 100644 --- a/arch/x86/kernel/acpi/boot.c +++ b/arch/x86/kernel/acpi/boot.c @@ -65,6 +65,15 @@ static u64 acpi_lapic_addr __initdata =3D APIC_DEFAULT_P= HYS_BASE; static bool acpi_support_online_capable; #endif =20 +#ifdef CONFIG_X86_64 +/* Physical address of the Multiprocessor Wakeup Structure mailbox */ +static u64 acpi_mp_wake_mailbox_paddr; +/* Virtual address of the Multiprocessor Wakeup Structure mailbox */ +static struct acpi_madt_multiproc_wakeup_mailbox *acpi_mp_wake_mailbox; +/* Lock to protect mailbox (acpi_mp_wake_mailbox) from parallel access */ +static DEFINE_SPINLOCK(mailbox_lock); +#endif + #ifdef CONFIG_X86_IO_APIC /* * Locks related to IOAPIC hotplug @@ -336,6 +345,80 @@ acpi_parse_lapic_nmi(union acpi_subtable_headers * hea= der, const unsigned long e return 0; } =20 +#ifdef CONFIG_X86_64 +/* Virtual address of the Multiprocessor Wakeup Structure mailbox */ +static int 
acpi_wakeup_cpu(int apicid, unsigned long start_ip) +{ + static physid_mask_t apic_id_wakemap =3D PHYSID_MASK_NONE; + unsigned long flags; + u8 timeout; + + /* Remap mailbox memory only for the first call to acpi_wakeup_cpu() */ + if (physids_empty(apic_id_wakemap)) { + acpi_mp_wake_mailbox =3D memremap(acpi_mp_wake_mailbox_paddr, + sizeof(*acpi_mp_wake_mailbox), + MEMREMAP_WB); + } + + /* + * According to the ACPI specification r6.4, section titled + * "Multiprocessor Wakeup Structure" the mailbox-based wakeup + * mechanism cannot be used more than once for the same CPU. + * Skip wakeups if they are attempted more than once. + */ + if (physid_isset(apicid, apic_id_wakemap)) { + pr_err("CPU already awake (APIC ID %x), skipping wakeup\n", + apicid); + return -EINVAL; + } + + spin_lock_irqsave(&mailbox_lock, flags); + + /* + * Mailbox memory is shared between firmware and OS. Firmware will + * listen on mailbox command address, and once it receives the wakeup + * command, CPU associated with the given apicid will be booted. + * + * The value of apic_id and wakeup_vector has to be set before updating + * the wakeup command. To let compiler preserve order of writes, use + * smp_store_release. + */ + smp_store_release(&acpi_mp_wake_mailbox->apic_id, apicid); + smp_store_release(&acpi_mp_wake_mailbox->wakeup_vector, start_ip); + smp_store_release(&acpi_mp_wake_mailbox->command, + ACPI_MP_WAKE_COMMAND_WAKEUP); + + /* + * After writing the wakeup command, wait for maximum timeout of 0xFF + * for firmware to reset the command address back zero to indicate + * the successful reception of command. + * NOTE: 0xFF as timeout value is decided based on our experiments. + * + * XXX: Change the timeout once ACPI specification comes up with + * standard maximum timeout value. 
+ */ + timeout =3D 0xFF; + while (READ_ONCE(acpi_mp_wake_mailbox->command) && --timeout) + cpu_relax(); + + /* If timed out (timeout =3D=3D 0), return error */ + if (!timeout) { + spin_unlock_irqrestore(&mailbox_lock, flags); + return -EIO; + } + + /* + * If the CPU wakeup process is successful, store the + * status in apic_id_wakemap to prevent re-wakeup + * requests. + */ + physid_set(apicid, apic_id_wakemap); + + spin_unlock_irqrestore(&mailbox_lock, flags); + + return 0; +} +#endif #endif /*CONFIG_X86_LOCAL_APIC */ =20 #ifdef CONFIG_X86_IO_APIC @@ -1083,6 +1166,29 @@ static int __init acpi_parse_madt_lapic_entries(void) } return 0; } + +#ifdef CONFIG_X86_64 +static int __init acpi_parse_mp_wake(union acpi_subtable_headers *header, + const unsigned long end) +{ + struct acpi_madt_multiproc_wakeup *mp_wake; + + if (!IS_ENABLED(CONFIG_SMP)) + return -ENODEV; + + mp_wake =3D (struct acpi_madt_multiproc_wakeup *)header; + if (BAD_MADT_ENTRY(mp_wake, end)) + return -EINVAL; + + acpi_table_print_madt_entry(&header->common); + + acpi_mp_wake_mailbox_paddr =3D mp_wake->base_address; + + acpi_wake_cpu_handler_update(acpi_wakeup_cpu); + + return 0; +} +#endif /* CONFIG_X86_64 */ #endif /* CONFIG_X86_LOCAL_APIC */ =20 #ifdef CONFIG_X86_IO_APIC @@ -1278,6 +1384,14 @@ static void __init acpi_process_madt(void) =20 smp_found_config =3D 1; } + +#ifdef CONFIG_X86_64 + /* + * Parse MADT MP Wake entry. 
+ */ + acpi_table_parse_madt(ACPI_MADT_TYPE_MULTIPROC_WAKEUP, + acpi_parse_mp_wake, 1); +#endif } if (error =3D=3D -EINVAL) { /* diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index b70344bf6600..3c8f2c797a98 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -2551,6 +2551,16 @@ u32 x86_msi_msg_get_destid(struct msi_msg *msg, bool= extid) } EXPORT_SYMBOL_GPL(x86_msi_msg_get_destid); =20 +#ifdef CONFIG_X86_64 +void __init acpi_wake_cpu_handler_update(wakeup_cpu_handler handler) +{ + struct apic **drv; + + for (drv =3D __apicdrivers; drv < __apicdrivers_end; drv++) + (*drv)->wakeup_secondary_cpu_64 =3D handler; +} +#endif + /* * Override the generic EOI implementation with an optimized version. * Only called during early boot when only one CPU is active and with --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 50FF7C433EF for ; Mon, 24 Jan 2022 15:03:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240377AbiAXPDT (ORCPT ); Mon, 24 Jan 2022 10:03:19 -0500 Received: from mga09.intel.com ([134.134.136.24]:24561 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239064AbiAXPC2 (ORCPT ); Mon, 24 Jan 2022 10:02:28 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036548; x=1674572548; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=itd9JwVebQjamRb5OjISwJgBcwd47o7FlQXvt16/lwM=; b=GW53SU/HC3gmnpmjO+13QD9E4QQ+lOUO1XGOdvOCf6lHP1inrSJhxWx5 kHRxqqpYikB+WgXnM3SKxmdyx/OeMv/LjjZb0N3drmyBwWiuSJhjyIe2q ja4HKYtMCE2sRmLOOAsl6O4KzysAqSspCyKA/FMQduAxHSDmODm7Y/+MZ QFqANr3gxowV13QhdPbbzzQSHAgAVa6wMU/B933Axb/sIqLY6tSrLQjUr 
MP0tTFKmsDfzHRq1VCnQxJGhLim8DCO4dSHKox3TbwValZ6wxzVEonilH o4OeP3rppsblYnDSLCwAT3vTfHzvyRN9mr0FIvK5mIHYYODRsTRvLeLR1 w==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="245843897" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="245843897" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:26 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="766422599" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga006.fm.intel.com with ESMTP; 24 Jan 2022 07:02:20 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 32D5FBC6; Mon, 24 Jan 2022 17:02:20 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A . Shutemov" Subject: [PATCHv2 18/29] x86/boot: Avoid #VE during boot for TDX platforms Date: Mon, 24 Jan 2022 18:02:04 +0300 Message-Id: <20220124150215.36893-19-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Sean Christopherson There are a few MSRs and control register bits that the kernel normally needs to modify during boot. 
But, TDX disallows modification of these registers to help provide consistent security guarantees. Fortunately, TDX ensures that these are all in the correct state before the kernel loads, which means the kernel does not need to modify them. The conditions to avoid are: * Any writes to the EFER MSR * Clearing CR0.NE * Clearing CR4.MCE This theoretically makes the guest boot more fragile. If, for instance, EFER was set up incorrectly and a WRMSR was performed, it would trigger an early exception panic, or a triple fault if it happens before early exceptions are set up. However, this is likely to trip up the guest BIOS long before control reaches the kernel. In any case, these kinds of problems are unlikely to occur in production environments, and developers have good debug tools to fix them quickly. Change the common boot code to work on TDX and non-TDX systems. This should have no functional effect on non-TDX systems. Signed-off-by: Sean Christopherson Reviewed-by: Andi Kleen Reviewed-by: Dan Williams Signed-off-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kirill A. Shutemov --- arch/x86/Kconfig | 1 + arch/x86/boot/compressed/head_64.S | 25 +++++++++++++++++++++---- arch/x86/boot/compressed/pgtable.h | 2 +- arch/x86/kernel/head_64.S | 24 ++++++++++++++++++++++-- arch/x86/realmode/rm/trampoline_64.S | 27 +++++++++++++++++++++++---- 5 files changed, 68 insertions(+), 11 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 1491f25c844e..1c59e02792e4 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -885,6 +885,7 @@ config INTEL_TDX_GUEST depends on X86_64 && CPU_SUP_INTEL depends on X86_X2APIC select ARCH_HAS_CC_PLATFORM + select X86_MCE help Support running as a guest under Intel TDX. Without this support, the guest kernel can not boot or run under TDX. 
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index fd9441f40457..b576d23d37cb 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -643,12 +643,25 @@ SYM_CODE_START(trampoline_32bit_src)
 	movl	$MSR_EFER, %ecx
 	rdmsr
 	btsl	$_EFER_LME, %eax
+	/* Avoid writing EFER if no change was made (for TDX guest) */
+	jc	1f
 	wrmsr
-	popl	%edx
+1:	popl	%edx
 	popl	%ecx
 
 	/* Enable PAE and LA57 (if required) paging modes */
-	movl	$X86_CR4_PAE, %eax
+	movl	%cr4, %eax
+
+#ifdef CONFIG_X86_MCE
+	/*
+	 * Preserve CR4.MCE if the kernel will enable #MC support. Clearing
+	 * MCE may fault in some environments (that also force #MC support).
+	 * Any machine check that occurs before #MC support is fully configured
+	 * will crash the system regardless of the CR4.MCE value set here.
+	 */
+	andl	$X86_CR4_MCE, %eax
+#endif
+	orl	$X86_CR4_PAE, %eax
 	testl	%edx, %edx
 	jz	1f
 	orl	$X86_CR4_LA57, %eax
@@ -662,8 +675,12 @@ SYM_CODE_START(trampoline_32bit_src)
 	pushl	$__KERNEL_CS
 	pushl	%eax
 
-	/* Enable paging again */
-	movl	$(X86_CR0_PG | X86_CR0_PE), %eax
+	/*
+	 * Enable paging again. Keep CR0.NE set, FERR# is no longer used
+	 * to handle x87 FPU errors and clearing NE may fault in some
+	 * environments.
+	 */
+	movl	$(X86_CR0_PG | X86_CR0_NE | X86_CR0_PE), %eax
 	movl	%eax, %cr0
 
 	lret
diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
index 6ff7e81b5628..cc9b2529a086 100644
--- a/arch/x86/boot/compressed/pgtable.h
+++ b/arch/x86/boot/compressed/pgtable.h
@@ -6,7 +6,7 @@
 #define TRAMPOLINE_32BIT_PGTABLE_OFFSET	0
 
 #define TRAMPOLINE_32BIT_CODE_OFFSET	PAGE_SIZE
-#define TRAMPOLINE_32BIT_CODE_SIZE	0x70
+#define TRAMPOLINE_32BIT_CODE_SIZE	0x80
 
 #define TRAMPOLINE_32BIT_STACK_END	TRAMPOLINE_32BIT_SIZE
 
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 9c63fc5988cd..652845cc527e 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -141,7 +141,17 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
 1:
 
 	/* Enable PAE mode, PGE and LA57 */
-	movl	$(X86_CR4_PAE | X86_CR4_PGE), %ecx
+	movq	%cr4, %rcx
+#ifdef CONFIG_X86_MCE
+	/*
+	 * Preserve CR4.MCE if the kernel will enable #MC support. Clearing
+	 * MCE may fault in some environments (that also force #MC support).
+	 * Any machine check that occurs before #MC support is fully configured
+	 * will crash the system regardless of the CR4.MCE value set here.
+	 */
+	andl	$X86_CR4_MCE, %ecx
+#endif
+	orl	$(X86_CR4_PAE | X86_CR4_PGE), %ecx
 #ifdef CONFIG_X86_5LEVEL
 	testl	$1, __pgtable_l5_enabled(%rip)
 	jz	1f
@@ -246,13 +256,23 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
 	/* Setup EFER (Extended Feature Enable Register) */
 	movl	$MSR_EFER, %ecx
 	rdmsr
+	/*
+	 * Preserve current value of EFER for comparison and to skip
+	 * EFER writes if no change was made (for TDX guest)
+	 */
+	movl	%eax, %edx
 	btsl	$_EFER_SCE, %eax	/* Enable System Call */
 	btl	$20,%edi		/* No Execute supported? */
 	jnc	1f
 	btsl	$_EFER_NX, %eax
 	btsq	$_PAGE_BIT_NX,early_pmd_flags(%rip)
-1:	wrmsr				/* Make changes effective */
 
+	/* Avoid writing EFER if no change was made (for TDX guest) */
+1:	cmpl	%edx, %eax
+	je	1f
+	xor	%edx, %edx
+	wrmsr				/* Make changes effective */
+1:
 	/* Setup cr0 */
 	movl	$CR0_STATE, %eax
 	/* Make changes effective */
diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/trampoline_64.S
index ae112a91592f..170f248d5769 100644
--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -143,13 +143,28 @@ SYM_CODE_START(startup_32)
 	movl	%eax, %cr3
 
 	# Set up EFER
+	movl	$MSR_EFER, %ecx
+	rdmsr
+	/*
+	 * Skip writing to EFER if the register already has desired
+	 * value (to avoid #VE for the TDX guest).
+	 */
+	cmp	pa_tr_efer, %eax
+	jne	.Lwrite_efer
+	cmp	pa_tr_efer + 4, %edx
+	je	.Ldone_efer
+.Lwrite_efer:
 	movl	pa_tr_efer, %eax
 	movl	pa_tr_efer + 4, %edx
-	movl	$MSR_EFER, %ecx
 	wrmsr
 
-	# Enable paging and in turn activate Long Mode
-	movl	$(X86_CR0_PG | X86_CR0_WP | X86_CR0_PE), %eax
+.Ldone_efer:
+	/*
+	 * Enable paging and in turn activate Long Mode. Keep CR0.NE set, FERR#
+	 * is no longer used to handle x87 FPU errors and clearing NE may fault
+	 * in some environments.
+	 */
+	movl	$(X86_CR0_PG | X86_CR0_WP | X86_CR0_NE | X86_CR0_PE), %eax
 	movl	%eax, %cr0
 
 	/*
@@ -169,7 +184,11 @@ SYM_CODE_START(pa_trampoline_compat)
 	movl	$rm_stack_end, %esp
 	movw	$__KERNEL_DS, %dx
 
-	movl	$X86_CR0_PE, %eax
+	/*
+	 * Keep CR0.NE set, FERR# is no longer used to handle x87 FPU errors
+	 * and clearing NE may fault in some environments.
+ */ + movl $(X86_CR0_NE | X86_CR0_PE), %eax movl %eax, %cr0 ljmpl $__KERNEL32_CS, $pa_startup_32 SYM_CODE_END(pa_trampoline_compat) --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3CDEBC433EF for ; Mon, 24 Jan 2022 15:03:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243304AbiAXPD0 (ORCPT ); Mon, 24 Jan 2022 10:03:26 -0500 Received: from mga01.intel.com ([192.55.52.88]:64659 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240188AbiAXPCa (ORCPT ); Mon, 24 Jan 2022 10:02:30 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036550; x=1674572550; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=cSRToTCHMEOWJc9OK9FLAffR3Q9AvG81cVXR2avjTXE=; b=fhjCvstedb0/QedXgyGn6A1Axce3V3Gw7Mr0n0bv1Lj5LG11FCLBd51f wCSxMn6MfewS7DL4SjSdSaUTwy0u72jm4QUfGy2cTAeX7G4tF1n5uo22R n/vsAh/eFOdHvJPlj8iu0WACbdPBbzflxJPnu7zc/AWuhvfin94aXZCMR 7vWgJ5WngV4WuFjejlMOQl4ugokqHSA1Uo0qO0ClGVbcO3K09yOYxyWgc OVPv/84YRexXlpGuqiPIElbpHwgRgo9Nla5R+4mLzTbg8mj+c3yYEJOft +it4dGNPXaDr5HGXwGMmC839P/eJ0F8bXmpRReIN7BES8OjqDF3kMbo+3 Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="270498643" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="270498643" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:30 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="479104733" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga006.jf.intel.com with ESMTP; 24 Jan 2022 07:02:20 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 3D94CBF8; Mon, 24 
Jan 2022 17:02:20 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A . Shutemov" Subject: [PATCHv2 19/29] x86/topology: Disable CPU online/offline control for TDX guests Date: Mon, 24 Jan 2022 18:02:05 +0300 Message-Id: <20220124150215.36893-20-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Kuppuswamy Sathyanarayanan Unlike regular VMs, TDX guests use the firmware hand-off wakeup method to wake up the APs during the boot process. This wakeup model uses a mailbox to communicate with firmware to bring up the APs. As per the design, this mailbox can only be used once for the given AP, which means after the APs are booted, the same mailbox cannot be used to offline/online the given AP. More details about this requirement can be found in Intel TDX Virtual Firmware Design Guide, sec titled "AP initialization in OS" and in sec titled "Hotplug Device". Since the architecture does not support any method of offlining the CPUs, disable CPU hotplug support in the kernel. 
Since this hotplug disable feature can be re-used by other VM guests,
add a new CC attribute CC_ATTR_HOTPLUG_DISABLED and use it to disable
the hotplug support.

With hotplug disabled, /sys/devices/system/cpu/cpuX/online sysfs option
will not exist for TDX guests.

Signed-off-by: Kuppuswamy Sathyanarayanan
Reviewed-by: Andi Kleen
Reviewed-by: Tony Luck
Signed-off-by: Kirill A. Shutemov
---
 arch/x86/kernel/cc_platform.c |  7 ++++++-
 include/linux/cc_platform.h   | 10 ++++++++++
 kernel/cpu.c                  |  3 +++
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
index 8da246ab4339..dcb31d6a7554 100644
--- a/arch/x86/kernel/cc_platform.c
+++ b/arch/x86/kernel/cc_platform.c
@@ -17,8 +17,13 @@
 
 static bool intel_cc_platform_has(enum cc_attr attr)
 {
-	if (attr == CC_ATTR_GUEST_UNROLL_STRING_IO)
+	switch (attr) {
+	case CC_ATTR_GUEST_UNROLL_STRING_IO:
+	case CC_ATTR_HOTPLUG_DISABLED:
 		return true;
+	default:
+		return false;
+	}
 
 	return false;
 }
diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
index efd8205282da..691494bbaf5a 100644
--- a/include/linux/cc_platform.h
+++ b/include/linux/cc_platform.h
@@ -72,6 +72,16 @@ enum cc_attr {
 	 * Examples include TDX guest & SEV.
 	 */
 	CC_ATTR_GUEST_UNROLL_STRING_IO,
+
+	/**
+	 * @CC_ATTR_HOTPLUG_DISABLED: Hotplug is not supported or disabled.
+	 *
+	 * The platform/OS is running as a guest/virtual machine that does
+	 * not support the CPU hotplug feature.
+	 *
+	 * Examples include TDX Guest.
+ */ + CC_ATTR_HOTPLUG_DISABLED, }; =20 #ifdef CONFIG_ARCH_HAS_CC_PLATFORM diff --git a/kernel/cpu.c b/kernel/cpu.c index 407a2568f35e..58fd06ebc2c8 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -34,6 +34,7 @@ #include #include #include +#include =20 #include #define CREATE_TRACE_POINTS @@ -1185,6 +1186,8 @@ static int __ref _cpu_down(unsigned int cpu, int task= s_frozen, =20 static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target) { + if (cc_platform_has(CC_ATTR_HOTPLUG_DISABLED)) + return -EOPNOTSUPP; if (cpu_hotplug_disabled) return -EBUSY; return _cpu_down(cpu, 0, target); --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BD54EC433EF for ; Mon, 24 Jan 2022 15:04:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240669AbiAXPD7 (ORCPT ); Mon, 24 Jan 2022 10:03:59 -0500 Received: from mga06.intel.com ([134.134.136.31]:23292 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243112AbiAXPCm (ORCPT ); Mon, 24 Jan 2022 10:02:42 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036562; x=1674572562; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=RJwVQAG7o45vV5vgUUxYVws+NoT7gBA+wwZMCRe/ZT4=; b=gnJvZYPc03glISxxTUPeF8nM9j+KAPaNhyeP4sqklcbjQdQT2EC2eBFH F+Yj5Vb18WxNR9vs8K5pqXd4dfghRrE+H/vciVVpUvUxfPgGDhoke176c AGW1CKE+ByMq4oW5oM1ZGs8OYa/+rpidhiUDGKexleddZcVmY4FqRxZh6 Y/pNuiVNK8KQTBSDdfpSOGv+ilxrQxqr1MRTbevi5bM2nZVOJAbA4EhxV bnOoTIqmMmrMdKy7kWAUA2c0eZNM6F70hYuI/H6jiMS8/NxVLNCIJYqyv IY1hQ3AseG3AiAjbFhKcznld9mcx77/Z74rfwJMSfsvU2MusUsEs0Pk3Z Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="306776624" X-IronPort-AV: 
E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="306776624" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:26 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="580395682" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga008.fm.intel.com with ESMTP; 24 Jan 2022 07:02:20 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 47F0EC62; Mon, 24 Jan 2022 17:02:20 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv2 20/29] x86/tdx: Get page shared bit info from the TDX module Date: Mon, 24 Jan 2022 18:02:06 +0300 Message-Id: <20220124150215.36893-21-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Intel TDX doesn't allow VMM to access guest private memory. Any memory that is required for communication with the VMM must be shared explicitly by setting a bit in the page table entry. Details about which bit in the page table entry to be used to indicate shared/private state can be determined by using the TDINFO TDCALL (call to TDX module). 
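For illustration only, a minimal userspace sketch of how the shared bit
position could be derived from the TDINFO output. It assumes that the GPA
width is reported in bits 5:0 of RCX and that the shared bit is the highest
GPA bit. The helper names are hypothetical and are not part of the kernel
or of the TDX module ABI.

```c
#include <stdint.h>

/* Hypothetical helper: GPA width is assumed to live in bits 5:0 of RCX. */
static inline unsigned int tdinfo_gpa_width(uint64_t rcx)
{
	return (unsigned int)(rcx & 0x3f);
}

/*
 * Hypothetical helper: the shared bit is assumed to be the topmost GPA bit.
 * The caller must pass a width in the range 1..64.
 */
static inline uint64_t shared_bit_mask(unsigned int gpa_width)
{
	return UINT64_C(1) << (gpa_width - 1);
}
```

With a reported width of 52, for example, the mask would select bit 51.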
Fetch and save the guest TD execution environment information at
initialization time. The next patch will use the information.

More details about the TDINFO TDCALL can be found in
Guest-Host-Communication Interface (GHCI) for Intel Trust Domain
Extensions (Intel TDX) specification, sec titled "TDCALL[TDINFO]".

Co-developed-by: Kuppuswamy Sathyanarayanan
Signed-off-by: Kuppuswamy Sathyanarayanan
Reviewed-by: Andi Kleen
Reviewed-by: Tony Luck
Signed-off-by: Kirill A. Shutemov
---
 arch/x86/kernel/tdx.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index a4e696f12666..b27c4261bfd2 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -11,6 +11,7 @@
 #include
 
 /* TDX module Call Leaf IDs */
+#define TDX_GET_INFO			1
 #define TDX_GET_VEINFO			3
 
 /* See Exit Qualification for I/O Instructions in VMX documentation */
@@ -19,6 +20,12 @@
 #define VE_GET_PORT_NUM(exit_qual)	((exit_qual) >> 16)
 #define VE_IS_IO_STRING(exit_qual)	((exit_qual) & 16 ? 1 : 0)
 
+/* Guest TD execution environment information */
+static struct {
+	unsigned int gpa_width;
+	unsigned long attributes;
+} td_info __ro_after_init;
+
 static bool tdx_guest_detected __ro_after_init;
 
 /*
@@ -59,6 +66,28 @@ long tdx_kvm_hypercall(unsigned int nr, unsigned long p1, unsigned long p2,
 EXPORT_SYMBOL_GPL(tdx_kvm_hypercall);
 #endif
 
+static void tdx_get_info(void)
+{
+	struct tdx_module_output out;
+	u64 ret;
+
+	/*
+	 * TDINFO TDX module call is used to get the TD execution environment
+	 * information like GPA width, number of available vcpus, debug mode
+	 * information, etc. More details about the ABI can be found in TDX
+	 * Guest-Host-Communication Interface (GHCI), sec 2.4.2 TDCALL
+	 * [TDG.VP.INFO].
+ */ + ret =3D __tdx_module_call(TDX_GET_INFO, 0, 0, 0, 0, &out); + + /* Non zero return value indicates buggy TDX module, so panic */ + if (ret) + panic("TDINFO TDCALL failed (Buggy TDX module!)\n"); + + td_info.gpa_width =3D out.rcx & GENMASK(5, 0); + td_info.attributes =3D out.rdx; +} + static u64 __cpuidle _tdx_halt(const bool irq_disabled, const bool do_sti) { /* @@ -455,5 +484,7 @@ void __init tdx_early_init(void) =20 setup_force_cpu_cap(X86_FEATURE_TDX_GUEST); =20 + tdx_get_info(); + pr_info("Guest detected\n"); } --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CAE54C433FE for ; Mon, 24 Jan 2022 15:03:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240306AbiAXPDo (ORCPT ); Mon, 24 Jan 2022 10:03:44 -0500 Received: from mga02.intel.com ([134.134.136.20]:19111 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240231AbiAXPCd (ORCPT ); Mon, 24 Jan 2022 10:02:33 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036553; x=1674572553; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=R/2/2Vf7aAJY5Z0gTQ9vw6gI7x3m6wTJrRpGLg8JppI=; b=SeYa2kJgY4u8vieN3NsiqzacHqTXNbYSzhrtL4uS0WY8qOw6agkJPwum g0VNn0nmT/jOQE4p3QLUiB1E6YB/gjaATuq7ZOxkmpDOZiBMAdfqT8ExS wh1atElMoaFgfKVmwvgL8P5pwN0RYf9VWQu/uEmm0PcWpU7w7T1Pb/a8+ MWPi3YqaqG5dUQWI3/cN4PEf5ttAVcMr+E6dJZWiwX1+wZjTgce8Dccul WIBCIPcy0tQzE+5xLOa0RNM87YUXUAs+Eb/qMCuP+gWrxMzESB8/pzbK5 whFPhE4+foQJlbHT+SNXapDpXesuaqM1ZZyI4Mnk/aXWQVX2dNwkWNKKD Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="233423327" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="233423327" Received: from fmsmga002.fm.intel.com 
([10.253.24.26]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:27 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="624104372" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga002.fm.intel.com with ESMTP; 24 Jan 2022 07:02:21 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 52933C98; Mon, 24 Jan 2022 17:02:20 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv2 21/29] x86/tdx: Exclude shared bit from __PHYSICAL_MASK Date: Mon, 24 Jan 2022 18:02:07 +0300 Message-Id: <20220124150215.36893-22-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" In TDX guests, by default memory is protected from host access. If a guest needs to communicate with the VMM (like the I/O use case), it uses a single bit in the physical address to communicate the protected/shared attribute of the given page. In the x86 ARCH code, __PHYSICAL_MASK macro represents the width of the physical address in the given architecture. It is used in creating physical PAGE_MASK for address bits in the kernel. 
Since a TDX guest uses a single bit as metadata, that bit needs to be
excluded from the valid physical address bits to avoid using incorrect
address bits in the kernel.

Enable DYNAMIC_PHYSICAL_MASK to support updating the __PHYSICAL_MASK.

Co-developed-by: Kuppuswamy Sathyanarayanan
Signed-off-by: Kuppuswamy Sathyanarayanan
Reviewed-by: Andi Kleen
Reviewed-by: Tony Luck
Signed-off-by: Kirill A. Shutemov
---
 arch/x86/Kconfig      | 1 +
 arch/x86/kernel/tdx.c | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1c59e02792e4..680c3cad9422 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -886,6 +886,7 @@ config INTEL_TDX_GUEST
 	depends on X86_X2APIC
 	select ARCH_HAS_CC_PLATFORM
 	select X86_MCE
+	select DYNAMIC_PHYSICAL_MASK
 	help
 	  Support running as a guest under Intel TDX. Without this support,
 	  the guest kernel can not boot or run under TDX.
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index b27c4261bfd2..beeaf61934bc 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -486,5 +486,13 @@ void __init tdx_early_init(void)
 
 	tdx_get_info();
 
+	/*
+	 * All bits above GPA width are reserved and kernel treats shared bit
+	 * as flag, not as part of physical address.
+	 *
+	 * Adjust physical mask to only cover valid GPA bits.
+ */ + physical_mask &=3D GENMASK_ULL(td_info.gpa_width - 2, 0); + pr_info("Guest detected\n"); } --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 37754C433EF for ; Mon, 24 Jan 2022 15:03:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243323AbiAXPDB (ORCPT ); Mon, 24 Jan 2022 10:03:01 -0500 Received: from mga01.intel.com ([192.55.52.88]:64684 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239088AbiAXPC0 (ORCPT ); Mon, 24 Jan 2022 10:02:26 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036547; x=1674572547; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=/1mJoSxaEj3D4K7PWjhYEhCnOZraFsCr3YgPjs2epB0=; b=gfWSLIBxDr08SpKcnZqny/BVLAWMbZmTJM25YCHqARYVid1E7Im5EeZz OQ6i3Hpa0P//F3cyDuLDzDlZZ7mmdlCvKbfTaijQZZT0dVYB4+JhmOnkQ /u/IHYBg5ebMl9KnsAeITb90wOXpzwy8e8Mi/FKGtcqLjHhqNh3Tbfg+/ PzBMIEX+ar8pA62OiNPVNiy7XYc0mMA/x6bgOUbjSjA7IF1SrqdAM8eOm KL84qEadi3ZLcUYMBJHQKYI8oqa/dBnHyU4ooC4PDeyWIZkCxZNRXSVcB EwQKVOxWRCcbGaUw6wXqlYno2oYCEedssdF3cMHQ1e9soKYrOVzYa/wKA Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="270498616" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="270498616" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:26 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="534254358" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga008.jf.intel.com with ESMTP; 24 Jan 2022 07:02:20 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 5DCDACCF; Mon, 24 Jan 2022 17:02:20 +0200 (EET) From: 
"Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv2 22/29] x86/tdx: Make pages shared in ioremap() Date: Mon, 24 Jan 2022 18:02:08 +0300 Message-Id: <20220124150215.36893-23-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" In TDX guests, guest memory is protected from host access. If a guest performs I/O, it needs to explicitly share the I/O memory with the host. Make all ioremap()ed pages that are not backed by normal memory (IORES_DESC_NONE or IORES_DESC_RESERVED) mapped as shared. Since TDX memory encryption support is similar to AMD SEV architecture, reuse the infrastructure from AMD SEV code. Add tdx_shared_mask() interface to get the TDX guest shared bitmask. pgprot_decrypted() is used by drivers (i915, virtio_gpu, vfio). Export both pgprot_encrypted() and pgprot_decrypted(). Co-developed-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kuppuswamy Sathyanarayanan Reviewed-by: Andi Kleen Reviewed-by: Tony Luck Signed-off-by: Kirill A. 
Shutemov
---
 arch/x86/include/asm/pgtable.h | 19 +++++++++++++------
 arch/x86/include/asm/tdx.h     |  4 ++++
 arch/x86/kernel/cc_platform.c  | 23 +++++++++++++++++++++++
 arch/x86/kernel/tdx.c          |  9 +++++++++
 arch/x86/mm/ioremap.c          |  5 +++++
 5 files changed, 54 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 8a9432fb3802..40e22db48319 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -15,12 +15,6 @@
 		     cachemode2protval(_PAGE_CACHE_MODE_UC_MINUS)))	\
 	 : (prot))
 
-/*
- * Macros to add or remove encryption attribute
- */
-#define pgprot_encrypted(prot)	__pgprot(__sme_set(pgprot_val(prot)))
-#define pgprot_decrypted(prot)	__pgprot(__sme_clr(pgprot_val(prot)))
-
 #ifndef __ASSEMBLY__
 #include
 #include
@@ -38,6 +32,19 @@ void ptdump_walk_pgd_level_debugfs(struct seq_file *m, struct mm_struct *mm,
 void ptdump_walk_pgd_level_checkwx(void);
 void ptdump_walk_user_pgd_level_checkwx(void);
 
+/*
+ * Macros to add or remove encryption attribute
+ */
+#ifdef CONFIG_ARCH_HAS_CC_PLATFORM
+pgprot_t pgprot_encrypted(pgprot_t prot);
+pgprot_t pgprot_decrypted(pgprot_t prot);
+#define pgprot_encrypted(prot)	pgprot_encrypted(prot)
+#define pgprot_decrypted(prot)	pgprot_decrypted(prot)
+#else
+#define pgprot_encrypted(prot) (prot)
+#define pgprot_decrypted(prot) (prot)
+#endif
+
 #ifdef CONFIG_DEBUG_WX
 #define debug_checkwx()		ptdump_walk_pgd_level_checkwx()
 #define debug_checkwx_user()	ptdump_walk_user_pgd_level_checkwx()
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 4bcaadf21dc6..c6a279e67dff 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -55,6 +55,8 @@ void tdx_safe_halt(void);
 
 bool tdx_early_handle_ve(struct pt_regs *regs);
 
+phys_addr_t tdx_shared_mask(void);
+
 #else
 
 static inline void tdx_early_init(void) { };
@@ -63,6 +65,8 @@ static inline void tdx_safe_halt(void) { };
 
 static inline bool tdx_early_handle_ve(struct pt_regs *regs) { return false; }
 
+static inline phys_addr_t tdx_shared_mask(void) { return 0; }
+
 #endif /* CONFIG_INTEL_TDX_GUEST */
 
 #if defined(CONFIG_KVM_GUEST) && defined(CONFIG_INTEL_TDX_GUEST)
diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
index dcb31d6a7554..be8722ad4792 100644
--- a/arch/x86/kernel/cc_platform.c
+++ b/arch/x86/kernel/cc_platform.c
@@ -12,6 +12,7 @@
 #include
 
 #include
+#include
 #include
 #include
 
@@ -90,3 +91,25 @@ bool cc_platform_has(enum cc_attr attr)
 	return false;
 }
 EXPORT_SYMBOL_GPL(cc_platform_has);
+
+pgprot_t pgprot_encrypted(pgprot_t prot)
+{
+	if (sme_me_mask)
+		return __pgprot(__sme_set(pgprot_val(prot)));
+	else if (is_tdx_guest())
+		return __pgprot(pgprot_val(prot) & ~tdx_shared_mask());
+
+	return prot;
+}
+EXPORT_SYMBOL_GPL(pgprot_encrypted);
+
+pgprot_t pgprot_decrypted(pgprot_t prot)
+{
+	if (sme_me_mask)
+		return __pgprot(__sme_clr(pgprot_val(prot)));
+	else if (is_tdx_guest())
+		return __pgprot(pgprot_val(prot) | tdx_shared_mask());
+
+	return prot;
+}
+EXPORT_SYMBOL_GPL(pgprot_decrypted);
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index beeaf61934bc..3bf6621eae7d 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -66,6 +66,15 @@ long tdx_kvm_hypercall(unsigned int nr, unsigned long p1, unsigned long p2,
 EXPORT_SYMBOL_GPL(tdx_kvm_hypercall);
 #endif
 
+/*
+ * The highest bit of a guest physical address is the "sharing" bit.
+ * Set it for shared pages and clear it for private pages.
+ */ +phys_addr_t tdx_shared_mask(void) +{ + return BIT_ULL(td_info.gpa_width - 1); +} + static void tdx_get_info(void) { struct tdx_module_output out; diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index 026031b3b782..a5d4ec1afca2 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -242,10 +242,15 @@ __ioremap_caller(resource_size_t phys_addr, unsigned = long size, * If the page being mapped is in memory and SEV is active then * make sure the memory encryption attribute is enabled in the * resulting mapping. + * In TDX guests, memory is marked private by default. If encryption + * is not requested (using encrypted), explicitly set decrypt + * attribute in all IOREMAPPED memory. */ prot =3D PAGE_KERNEL_IO; if ((io_desc.flags & IORES_MAP_ENCRYPTED) || encrypted) prot =3D pgprot_encrypted(prot); + else + prot =3D pgprot_decrypted(prot); =20 switch (pcm) { case _PAGE_CACHE_MODE_UC: --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D35DAC433F5 for ; Mon, 24 Jan 2022 15:03:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243447AbiAXPDk (ORCPT ); Mon, 24 Jan 2022 10:03:40 -0500 Received: from mga01.intel.com ([192.55.52.88]:64673 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240210AbiAXPCb (ORCPT ); Mon, 24 Jan 2022 10:02:31 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036551; x=1674572551; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=cIq4wkLg68PMcrcMSOZIfkxpvIMypkjtFrhM/WVfcdg=; b=gHuSh9yRFLeBX6J0TUyJHWIgjZY1zlENzl1JiD2+geV2HrRibKzQNikQ +uGkviFUtZLovAvSIpDEME95aa6TejeAb6kqgcwmDIlDJ6gT5PWsqwB6/ 
7PCkn59gIZgrgR1VcfxRuum/zMfY4fqJTQKHv91hkys3Wtjjc6xC2LnSJ 2Ekne76K8lLfRG8bxwe+OwLtpybHWZSAUCWRLC7BY8BADyM2A998pNwC2 TZDOgs3PVqUhZ1jHDsaS5jq14nyVMnFrus8FXMS779KEzynhWq1H42CVW DJ2AU3jKawt3+Iz+lYhkppoKKs4UUR9YkZyaFfkjMTXRuWtM1Z75uLVBB g==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="270498646" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="270498646" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:30 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="479104738" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga006.jf.intel.com with ESMTP; 24 Jan 2022 07:02:24 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 68E7CCD2; Mon, 24 Jan 2022 17:02:20 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. 
Shutemov"
Subject: [PATCHv2 23/29] x86/tdx: Add helper to convert memory between shared and private
Date: Mon, 24 Jan 2022 18:02:09 +0300
Message-Id: <20220124150215.36893-24-kirill.shutemov@linux.intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com>
References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Intel TDX protects guest memory from VMM access. Any memory that is
required for communication with the VMM must be explicitly shared.

It is a two-step process: the guest sets the shared bit in the page
table entry and notifies the VMM about the change. The notification
happens using the MapGPA hypercall.

Conversion back to private memory requires clearing the shared bit,
notifying the VMM with the MapGPA hypercall, and then accepting the
memory with the AcceptPage TDX module call.

Provide a helper to do conversion between shared and private memory.
It is going to be used by the following patch.

Co-developed-by: Kuppuswamy Sathyanarayanan
Signed-off-by: Kuppuswamy Sathyanarayanan
Reviewed-by: Andi Kleen
Reviewed-by: Tony Luck
Signed-off-by: Kirill A.
Shutemov
---
 arch/x86/include/asm/tdx.h |  9 +++++
 arch/x86/kernel/tdx.c      | 78 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 87 insertions(+)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index c6a279e67dff..f6a5fb4bf72c 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -57,6 +57,8 @@ bool tdx_early_handle_ve(struct pt_regs *regs);
 
 phys_addr_t tdx_shared_mask(void);
 
+int tdx_hcall_request_gpa_type(phys_addr_t start, phys_addr_t end, bool enc);
+
 #else
 
 static inline void tdx_early_init(void) { };
@@ -67,6 +69,13 @@ static inline bool tdx_early_handle_ve(struct pt_regs *regs) { return false; }
 
 static inline phys_addr_t tdx_shared_mask(void) { return 0; }
 
+
+static inline int tdx_hcall_request_gpa_type(phys_addr_t start,
+					     phys_addr_t end, bool enc)
+{
+	return -ENODEV;
+}
+
 #endif /* CONFIG_INTEL_TDX_GUEST */
 
 #if defined(CONFIG_KVM_GUEST) && defined(CONFIG_INTEL_TDX_GUEST)
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 3bf6621eae7d..ea638c6ecb92 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -13,6 +13,10 @@
 /* TDX module Call Leaf IDs */
 #define TDX_GET_INFO			1
 #define TDX_GET_VEINFO			3
+#define TDX_ACCEPT_PAGE			6
+
+/* TDX hypercall Leaf IDs */
+#define TDVMCALL_MAP_GPA		0x10001
 
 /* See Exit Qualification for I/O Instructions in VMX documentation */
 #define VE_IS_IO_IN(exit_qual)		(((exit_qual) & 8) ? 1 : 0)
@@ -97,6 +101,80 @@ static void tdx_get_info(void)
 	td_info.attributes = out.rdx;
 }
 
+static bool tdx_accept_page(phys_addr_t gpa, enum pg_level pg_level)
+{
+	/*
+	 * Pass the page physical address to the TDX module to accept the
+	 * pending, private page.
+	 *
+	 * Bits 2:0 of GPA encode page size: 0 - 4K, 1 - 2M, 2 - 1G.
+	 */
+	switch (pg_level) {
+	case PG_LEVEL_4K:
+		break;
+	case PG_LEVEL_2M:
+		gpa |= 1;
+		break;
+	case PG_LEVEL_1G:
+		gpa |= 2;
+		break;
+	default:
+		return true;
+	}
+
+	return __tdx_module_call(TDX_ACCEPT_PAGE, gpa, 0, 0, 0, NULL);
+}
+
+/*
+ * Inform the VMM of the guest's intent for this physical page: shared with
+ * the VMM or private to the guest. The VMM is expected to change its mapping
+ * of the page in response.
+ */
+int tdx_hcall_request_gpa_type(phys_addr_t start, phys_addr_t end, bool enc)
+{
+	u64 ret;
+
+	if (end <= start)
+		return -EINVAL;
+
+	if (!enc) {
+		start |= tdx_shared_mask();
+		end |= tdx_shared_mask();
+	}
+
+	/*
+	 * Notify the VMM about page mapping conversion. More info about ABI
+	 * can be found in TDX Guest-Host-Communication Interface (GHCI),
+	 * sec "TDG.VP.VMCALL"
+	 */
+	ret = _tdx_hypercall(TDVMCALL_MAP_GPA, start, end - start, 0, 0, NULL);
+
+	if (ret)
+		ret = -EIO;
+
+	if (ret || !enc)
+		return ret;
+
+	/*
+	 * For shared->private conversion, accept the page using
+	 * TDX_ACCEPT_PAGE TDX module call.
+ */ + while (start < end) { + /* Try 2M page accept first if possible */ + if (!(start & ~PMD_MASK) && end - start >=3D PMD_SIZE && + !tdx_accept_page(start, PG_LEVEL_2M)) { + start +=3D PMD_SIZE; + continue; + } + + if (tdx_accept_page(start, PG_LEVEL_4K)) + return -EIO; + start +=3D PAGE_SIZE; + } + + return 0; +} + static u64 __cpuidle _tdx_halt(const bool irq_disabled, const bool do_sti) { /* --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5E956C433EF for ; Mon, 24 Jan 2022 15:04:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243423AbiAXPEM (ORCPT ); Mon, 24 Jan 2022 10:04:12 -0500 Received: from mga12.intel.com ([192.55.52.136]:5630 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240314AbiAXPDK (ORCPT ); Mon, 24 Jan 2022 10:03:10 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036590; x=1674572590; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=tKPEY1KQTqIVQ760tEEDgDD58NKNldq3MW2/0q9iqtM=; b=L89JNlrKz7B/qFGfq9NRLYNwpregWvEAJYDisCJTVURzmI31WOVYLcLR dysQwuB6PmiDBswhXUYeXChmHcYm+CWkCRKStEFsH8/eSXLHxLN80SuS7 E2dD0KCN0qkw3FkpqBJgyHBw/le1NbU8oPK2rRMNFaGyJCoTn856i9fsU +XbUWLWfgoihbWQBKIOWV1JOgSwxTgOaWOmMINrZfaKLoTOJFmdsr9PNr 7dh6NkMM01PBUqfxhqVkK6B/NKKWGsLGBrDbRQG9AmqBzzpZzRaBLdEMU LhDLVtzaBiNFCLS566C1k/cHnuqa2lrSkcsuPN0CIUjWq7uBqUkJAkH/9 Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="226043252" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="226043252" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:28 -0800 X-ExtLoop1: 1 
X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="532102282"
Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga007.fm.intel.com with ESMTP; 24 Jan 2022 07:02:22 -0800
Received: by black.fi.intel.com (Postfix, from userid 1000) id 76389D08; Mon, 24 Jan 2022 17:02:20 +0200 (EET)
From: "Kirill A. Shutemov"
To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org
Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov", Sean Christopherson, Kai Huang
Subject: [PATCHv2 24/29] x86/mm/cpa: Add support for TDX shared memory
Date: Mon, 24 Jan 2022 18:02:10 +0300
Message-Id: <20220124150215.36893-25-kirill.shutemov@linux.intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com>
References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

TDX steals a bit from the physical address and uses it to indicate whether the page is private to the guest (bit set to 0) or unprotected and shared with the VMM (bit set to 1). AMD SEV uses a similar scheme, repurposing a bit from the physical address to indicate encrypted or decrypted pages.

The kernel already has the infrastructure to deal with encrypted/decrypted pages for AMD SEV. Modify __set_memory_enc_pgtable() to make it aware of TDX.
After modifying page table entries, the kernel needs to notify VMM about the change with tdx_hcall_request_gpa_type(). Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Co-developed-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kuppuswamy Sathyanarayanan Tested-by: Kai Huang Reviewed-by: Andi Kleen Reviewed-by: Tony Luck Signed-off-by: Kirill A. Shutemov --- arch/x86/Kconfig | 2 +- arch/x86/include/asm/mem_encrypt.h | 8 ++++++ arch/x86/include/asm/set_memory.h | 1 - arch/x86/kernel/cc_platform.c | 2 ++ arch/x86/mm/mem_encrypt_amd.c | 10 ++++--- arch/x86/mm/pat/set_memory.c | 44 ++++++++++++++++++++++++++---- include/linux/cc_platform.h | 9 ++++++ 7 files changed, 64 insertions(+), 12 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 680c3cad9422..33e6ec6fd89f 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -886,7 +886,7 @@ config INTEL_TDX_GUEST depends on X86_X2APIC select ARCH_HAS_CC_PLATFORM select X86_MCE - select DYNAMIC_PHYSICAL_MASK + select X86_MEM_ENCRYPT help Support running as a guest under Intel TDX. Without this support, the guest kernel can not boot or run under TDX. 
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_= encrypt.h index e2c6f433ed10..f45a9ea2dec9 100644 --- a/arch/x86/include/asm/mem_encrypt.h +++ b/arch/x86/include/asm/mem_encrypt.h @@ -52,6 +52,8 @@ void __init mem_encrypt_free_decrypted_mem(void); /* Architecture __weak replacement functions */ void __init mem_encrypt_init(void); =20 +int amd_notify_range_enc_status_changed(unsigned long vaddr, int npages, b= ool enc); + void __init sev_es_init_vc_handling(void); =20 #define __bss_decrypted __section(".bss..decrypted") @@ -85,6 +87,12 @@ early_set_mem_enc_dec_hypercall(unsigned long vaddr, int= npages, bool enc) {} =20 static inline void mem_encrypt_free_decrypted_mem(void) { } =20 +static inline int amd_notify_range_enc_status_changed(unsigned long vaddr, + int npages, bool enc) +{ + return 0; +} + #define __bss_decrypted =20 #endif /* CONFIG_AMD_MEM_ENCRYPT */ diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_m= emory.h index ff0f2d90338a..ce8dd215f5b3 100644 --- a/arch/x86/include/asm/set_memory.h +++ b/arch/x86/include/asm/set_memory.h @@ -84,7 +84,6 @@ int set_pages_rw(struct page *page, int numpages); int set_direct_map_invalid_noflush(struct page *page); int set_direct_map_default_noflush(struct page *page); bool kernel_page_present(struct page *page); -void notify_range_enc_status_changed(unsigned long vaddr, int npages, bool= enc); =20 extern int kernel_set_to_readonly; =20 diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c index be8722ad4792..1fbcf19fa20d 100644 --- a/arch/x86/kernel/cc_platform.c +++ b/arch/x86/kernel/cc_platform.c @@ -21,6 +21,8 @@ static bool intel_cc_platform_has(enum cc_attr attr) switch (attr) { case CC_ATTR_GUEST_UNROLL_STRING_IO: case CC_ATTR_HOTPLUG_DISABLED: + case CC_ATTR_GUEST_TDX: + case CC_ATTR_GUEST_MEM_ENCRYPT: return true; default: return false; diff --git a/arch/x86/mm/mem_encrypt_amd.c b/arch/x86/mm/mem_encrypt_amd.c index 
2b2d018ea345..6aa4e0c27368 100644 --- a/arch/x86/mm/mem_encrypt_amd.c +++ b/arch/x86/mm/mem_encrypt_amd.c @@ -256,7 +256,8 @@ static unsigned long pg_level_to_pfn(int level, pte_t *= kpte, pgprot_t *ret_prot) return pfn; } =20 -void notify_range_enc_status_changed(unsigned long vaddr, int npages, bool= enc) +int amd_notify_range_enc_status_changed(unsigned long vaddr, int npages, + bool enc) { #ifdef CONFIG_PARAVIRT unsigned long sz =3D npages << PAGE_SHIFT; @@ -270,7 +271,7 @@ void notify_range_enc_status_changed(unsigned long vadd= r, int npages, bool enc) kpte =3D lookup_address(vaddr, &level); if (!kpte || pte_none(*kpte)) { WARN_ONCE(1, "kpte lookup for vaddr\n"); - return; + return 0; } =20 pfn =3D pg_level_to_pfn(level, kpte, NULL); @@ -285,6 +286,7 @@ void notify_range_enc_status_changed(unsigned long vadd= r, int npages, bool enc) vaddr =3D (vaddr & pmask) + psize; } #endif + return 0; } =20 static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc) @@ -392,7 +394,7 @@ static int __init early_set_memory_enc_dec(unsigned lon= g vaddr, =20 ret =3D 0; =20 - notify_range_enc_status_changed(start, PAGE_ALIGN(size) >> PAGE_SHIFT, en= c); + amd_notify_range_enc_status_changed(start, PAGE_ALIGN(size) >> PAGE_SHIFT= , enc); out: __flush_tlb_all(); return ret; @@ -410,7 +412,7 @@ int __init early_set_memory_encrypted(unsigned long vad= dr, unsigned long size) =20 void __init early_set_mem_enc_dec_hypercall(unsigned long vaddr, int npage= s, bool enc) { - notify_range_enc_status_changed(vaddr, npages, enc); + amd_notify_range_enc_status_changed(vaddr, npages, enc); } =20 void __init mem_encrypt_free_decrypted_mem(void) diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index b4072115c8ef..06c65689d6fb 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -32,6 +32,7 @@ #include #include #include +#include =20 #include "../mm_internal.h" =20 @@ -1983,6 +1984,27 @@ int set_memory_global(unsigned long addr, int 
numpag= es) __pgprot(_PAGE_GLOBAL), 0); } =20 +static pgprot_t pgprot_cc_mask(bool enc) +{ + if (enc) + return pgprot_encrypted(__pgprot(0)); + else + return pgprot_decrypted(__pgprot(0)); +} + +static int notify_range_enc_status_changed(unsigned long vaddr, int npages, + bool enc) +{ + if (cc_platform_has(CC_ATTR_GUEST_TDX)) { + phys_addr_t start =3D __pa(vaddr); + phys_addr_t end =3D __pa(vaddr + npages * PAGE_SIZE); + + return tdx_hcall_request_gpa_type(start, end, enc); + } else { + return amd_notify_range_enc_status_changed(vaddr, npages, enc); + } +} + /* * __set_memory_enc_pgtable() is used for the hypervisors that get * informed about "encryption" status via page tables. @@ -1999,8 +2021,10 @@ static int __set_memory_enc_pgtable(unsigned long ad= dr, int numpages, bool enc) memset(&cpa, 0, sizeof(cpa)); cpa.vaddr =3D &addr; cpa.numpages =3D numpages; - cpa.mask_set =3D enc ? __pgprot(_PAGE_ENC) : __pgprot(0); - cpa.mask_clr =3D enc ? __pgprot(0) : __pgprot(_PAGE_ENC); + + cpa.mask_set =3D pgprot_cc_mask(enc); + cpa.mask_clr =3D pgprot_cc_mask(!enc); + cpa.pgd =3D init_mm.pgd; =20 /* Must avoid aliasing mappings in the highmem code */ @@ -2008,9 +2032,17 @@ static int __set_memory_enc_pgtable(unsigned long ad= dr, int numpages, bool enc) vm_unmap_aliases(); =20 /* - * Before changing the encryption attribute, we need to flush caches. + * Before changing the encryption attribute, flush caches. + * + * For TDX, guest is responsible for flushing caches on private->shared + * transition. VMM is responsible for flushing on shared->private. 
*/ - cpa_flush(&cpa, !this_cpu_has(X86_FEATURE_SME_COHERENT)); + if (cc_platform_has(CC_ATTR_GUEST_TDX)) { + if (!enc) + cpa_flush(&cpa, 1); + } else { + cpa_flush(&cpa, !this_cpu_has(X86_FEATURE_SME_COHERENT)); + } =20 ret =3D __change_page_attr_set_clr(&cpa, 1); =20 @@ -2027,8 +2059,8 @@ static int __set_memory_enc_pgtable(unsigned long add= r, int numpages, bool enc) * Notify hypervisor that a given memory range is mapped encrypted * or decrypted. */ - notify_range_enc_status_changed(addr, numpages, enc); - + if (!ret) + ret =3D notify_range_enc_status_changed(addr, numpages, enc); return ret; } =20 diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h index 691494bbaf5a..16c0ad925bf0 100644 --- a/include/linux/cc_platform.h +++ b/include/linux/cc_platform.h @@ -82,6 +82,15 @@ enum cc_attr { * Examples include TDX Guest. */ CC_ATTR_HOTPLUG_DISABLED, + + /** + * @CC_ATTR_GUEST_TDX: Trust Domain Extension Support + * + * The platform/OS is running as a TDX guest/virtual machine. + * + * Examples include Intel TDX. 
+ */ + CC_ATTR_GUEST_TDX =3D 0x100, }; =20 #ifdef CONFIG_ARCH_HAS_CC_PLATFORM --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E7CC0C433F5 for ; Mon, 24 Jan 2022 15:04:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243265AbiAXPEC (ORCPT ); Mon, 24 Jan 2022 10:04:02 -0500 Received: from mga06.intel.com ([134.134.136.31]:23292 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243272AbiAXPC4 (ORCPT ); Mon, 24 Jan 2022 10:02:56 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036576; x=1674572576; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Qp+xpLoEv3ZzqsbcA1kKKaSYMc1Xo1lK2tI+kDEsRVU=; b=gEZ+YAyxQlVPkxyyWZqfzcGhzhUA55XhYSd7sx+e/k60Hj8D5lxBqWiM aZ+emwISk/xhNF170i2pqn9RnBwqQv2hVa9RxGnPORNNbx3pYy04eGLHu forn5y+q4PLd811rnXkBaH0N/WaPFhcIrm4M1sRuGSxnocDvQiGzZk4aB Wrvf9xHwfA3tbjcDaFd7wU9DcNwnp6cSQsBepW+nE+AOeGU81Mn0flLzM 64ro/uo6pf+sIswbZVJ2DiO8XnGtrYVeeyughfAZ24PHImQqR2XmMLBtS jkINqvfn3+iqcMdHaO/qRPtKry+Cq/LX2TGZce+ypRpva4TO/YKijA1he g==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="306776628" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="306776628" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:27 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="580395692" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga008.fm.intel.com with ESMTP; 24 Jan 2022 07:02:21 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 83825D66; Mon, 24 Jan 2022 17:02:20 +0200 (EET) From: "Kirill A. 
Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv2 25/29] x86/kvm: Use bounce buffers for TD guest Date: Mon, 24 Jan 2022 18:02:11 +0300 Message-Id: <20220124150215.36893-26-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Intel TDX doesn't allow VMM to directly access guest private memory. Any memory that is required for communication with the VMM must be shared explicitly. The same rule applies for any DMA to and from the TDX guest. All DMA pages have to be marked as shared pages. A generic way to achieve this without any changes to device drivers is to use the SWIOTLB framework. Force SWIOTLB on TD guest and make SWIOTLB buffer shared by generalizing mem_encrypt_init() to cover TDX. Co-developed-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kuppuswamy Sathyanarayanan Reviewed-by: Andi Kleen Reviewed-by: Tony Luck Signed-off-by: Kirill A. 
Shutemov --- arch/x86/kernel/cc_platform.c | 1 + arch/x86/kernel/tdx.c | 3 +++ arch/x86/mm/mem_encrypt.c | 9 ++++++++- 3 files changed, 12 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c index 1fbcf19fa20d..62c89c077cdd 100644 --- a/arch/x86/kernel/cc_platform.c +++ b/arch/x86/kernel/cc_platform.c @@ -23,6 +23,7 @@ static bool intel_cc_platform_has(enum cc_attr attr) case CC_ATTR_HOTPLUG_DISABLED: case CC_ATTR_GUEST_TDX: case CC_ATTR_GUEST_MEM_ENCRYPT: + case CC_ATTR_MEM_ENCRYPT: return true; default: return false; diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c index ea638c6ecb92..6048887ac846 100644 --- a/arch/x86/kernel/tdx.c +++ b/arch/x86/kernel/tdx.c @@ -5,6 +5,7 @@ #define pr_fmt(fmt) "tdx: " fmt =20 #include +#include #include #include #include @@ -581,5 +582,7 @@ void __init tdx_early_init(void) */ physical_mask &=3D GENMASK_ULL(td_info.gpa_width - 2, 0); =20 + swiotlb_force =3D SWIOTLB_FORCE; + pr_info("Guest detected\n"); } diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c index 50d209939c66..194ace3a748a 100644 --- a/arch/x86/mm/mem_encrypt.c +++ b/arch/x86/mm/mem_encrypt.c @@ -42,7 +42,14 @@ bool force_dma_unencrypted(struct device *dev) =20 static void print_mem_encrypt_feature_info(void) { - pr_info("AMD Memory Encryption Features active:"); + pr_info("Memory Encryption Features active:"); + + if (cc_platform_has(CC_ATTR_GUEST_TDX)) { + pr_cont(" Intel TDX\n"); + return; + } + + pr_cont("AMD "); =20 /* Secure Memory Encryption */ if (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT)) { --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3E337C433EF for ; Mon, 24 Jan 2022 15:03:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via 
listexpand id S243390AbiAXPDW (ORCPT ); Mon, 24 Jan 2022 10:03:22 -0500 Received: from mga09.intel.com ([134.134.136.24]:24570 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240178AbiAXPCa (ORCPT ); Mon, 24 Jan 2022 10:02:30 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036550; x=1674572550; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=6U2oeDtFWUqMQn6DVvOOEWiJP1jJ+wjAZGt0rG0H/Pg=; b=DveB9u0rpReyJSfiztnR05JsVwZgmgh8klf35FTwzyoj1bPhk/hktRtz ODvS/YkeSRP7E5XgZ5KSZcwAX8RVzQ9eTrH520Qi0+G+kRWtJCrCvqqfQ dzlmVugpqvsTtZexoP1zyK5R8ihIlr5Izr+2qWf1eO6+pXtdxjUcfjU1C LXJDkAuGPzuWQzakR6yG+202ju5L1KmHsSuFqwZIutRwzHvdowCSopGUo ynMM4+fe7KpxE2WaDgpqAHKQuOyMQh0bUULoTd+sHV2so4FfZn6rD18qm zXAAIuh82rrUMvXDg27A7mP1PMdGtPHec9Ap8vuMkO6wOlA/+Ouyhq/a9 Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="245843912" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="245843912" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:27 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="766422605" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga006.fm.intel.com with ESMTP; 24 Jan 2022 07:02:21 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 91276D99; Mon, 24 Jan 2022 17:02:20 +0200 (EET) From: "Kirill A. 
Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, Isaku Yamahata , "Kirill A . Shutemov" Subject: [PATCHv2 26/29] x86/tdx: ioapic: Add shared bit for IOAPIC base address Date: Mon, 24 Jan 2022 18:02:12 +0300 Message-Id: <20220124150215.36893-27-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata The kernel interacts with each bare-metal IOAPIC with a special MMIO page. When running under KVM, the guest's IOAPICs are emulated by KVM. When running as a TDX guest, the guest needs to mark each IOAPIC mapping as "shared" with the host. This ensures that TDX private protections are not applied to the page, which allows the TDX host emulation to work. Earlier patches in this series modified ioremap() so that ioremap()-created mappings such as virtio will be marked as shared. However, the IOAPIC code does not use ioremap() and instead uses the fixmap mechanism. Introduce a special fixmap helper just for the IOAPIC code. Ensure that it marks IOAPIC pages as "shared". This replaces set_fixmap_nocache() with __set_fixmap() since __set_fixmap() allows custom 'prot' values. 
Signed-off-by: Isaku Yamahata Reviewed-by: Andi Kleen Reviewed-by: Tony Luck Signed-off-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kirill A. Shutemov --- arch/x86/kernel/apic/io_apic.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index c1bb384935b0..d2fef5893e41 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -49,6 +49,7 @@ #include #include #include +#include =20 #include #include @@ -65,6 +66,7 @@ #include #include #include +#include =20 #define for_each_ioapic(idx) \ for ((idx) =3D 0; (idx) < nr_ioapics; (idx)++) @@ -2677,6 +2679,18 @@ static struct resource * __init ioapic_setup_resourc= es(void) return res; } =20 +static void io_apic_set_fixmap_nocache(enum fixed_addresses idx, + phys_addr_t phys) +{ + pgprot_t flags =3D FIXMAP_PAGE_NOCACHE; + + /* Set TDX guest shared bit in pgprot flags */ + if (cc_platform_has(CC_ATTR_GUEST_TDX)) + flags =3D pgprot_decrypted(flags); + + __set_fixmap(idx, phys, flags); +} + void __init io_apic_init_mappings(void) { unsigned long ioapic_phys, idx =3D FIX_IO_APIC_BASE_0; @@ -2709,7 +2723,7 @@ void __init io_apic_init_mappings(void) __func__, PAGE_SIZE, PAGE_SIZE); ioapic_phys =3D __pa(ioapic_phys); } - set_fixmap_nocache(idx, ioapic_phys); + io_apic_set_fixmap_nocache(idx, ioapic_phys); apic_printk(APIC_VERBOSE, "mapped IOAPIC to %08lx (%08lx)\n", __fix_to_virt(idx) + (ioapic_phys & ~PAGE_MASK), ioapic_phys); @@ -2838,7 +2852,7 @@ int mp_register_ioapic(int id, u32 address, u32 gsi_b= ase, ioapics[idx].mp_config.flags =3D MPC_APIC_USABLE; ioapics[idx].mp_config.apicaddr =3D address; =20 - set_fixmap_nocache(FIX_IO_APIC_BASE_0 + idx, address); + io_apic_set_fixmap_nocache(FIX_IO_APIC_BASE_0 + idx, address); if (bad_ioapic_register(idx)) { clear_fixmap(FIX_IO_APIC_BASE_0 + idx); return -ENODEV; --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: 
SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E2155C433F5 for ; Mon, 24 Jan 2022 15:03:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243568AbiAXPDu (ORCPT ); Mon, 24 Jan 2022 10:03:50 -0500 Received: from mga17.intel.com ([192.55.52.151]:63901 "EHLO mga17.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240212AbiAXPCi (ORCPT ); Mon, 24 Jan 2022 10:02:38 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036558; x=1674572558; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=aMt9dMQFnuWJAh9cVzkqMUEeYOnZHiNvm0HEf2p9qHU=; b=SxnucDkiQgUA75N32Y5DrXTWnRRHSEPDmbDzP8U8pD2imqSbgjG03DPI Gx9hNlRqZdtTDSdrKV9vqXRSqzx1r7G3KsROfjxrTEBxCMBZjaaH7n/QH zogxmbzPQE4e3NIUkwkrF4WVpGOYU8LTJUSMAu+4Wh20CDZrKy4bxfuAS zQcZWJQAzv9zxmpLT9fzbJW0yLG0/75qMBtY/S8iTXd5TP4N+L4t8x8r4 HuKKq8QVKirgfMo/mD0OyKubvauPL4hOQku2PPTRA6gDCHZthMTg0yGHa R/iHEHE39W2mw7+N5SZSUzB5zOdFA8GZZpe+nK/l3fRMz9xbmNLnC79HX g==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="226734722" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="226734722" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:28 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="562680298" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga001.jf.intel.com with ESMTP; 24 Jan 2022 07:02:22 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 9E09CDA5; Mon, 24 Jan 2022 17:02:20 +0200 (EET) From: "Kirill A. 
Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv2 27/29] ACPICA: Avoid cache flush on TDX guest Date: Mon, 24 Jan 2022 18:02:13 +0300 Message-Id: <20220124150215.36893-28-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" ACPI_FLUSH_CPU_CACHE() flushes caches on entering sleep states. It is required to prevent data loss. While running inside TDX guest, the kernel can bypass cache flushing. Changing sleep state in a virtual machine doesn't affect the host system sleep state and cannot lead to data loss. The approach can be generalized to all guest kernels, but, to be cautious, let's limit it to TDX for now. Signed-off-by: Kirill A. Shutemov --- arch/x86/include/asm/acenv.h | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/acenv.h b/arch/x86/include/asm/acenv.h index 9aff97f0de7f..d19deca6dd27 100644 --- a/arch/x86/include/asm/acenv.h +++ b/arch/x86/include/asm/acenv.h @@ -13,7 +13,21 @@ =20 /* Asm macros */ =20 -#define ACPI_FLUSH_CPU_CACHE() wbinvd() +/* + * ACPI_FLUSH_CPU_CACHE() flushes caches on entering sleep states. + * It is required to prevent data loss. 
+ * + * While running inside TDX guest, the kernel can bypass cache flushing. + * Changing sleep state in a virtual machine doesn't affect the host system + * sleep state and cannot lead to data loss. + * + * TODO: Is it safe to generalize this from TDX guests to all guest kernel= s? + */ +#define ACPI_FLUSH_CPU_CACHE() \ +do { \ + if (!cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) \ + wbinvd(); \ +} while (0) =20 int __acpi_acquire_global_lock(unsigned int *lock); int __acpi_release_global_lock(unsigned int *lock); --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8403FC433F5 for ; Mon, 24 Jan 2022 15:03:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243359AbiAXPDq (ORCPT ); Mon, 24 Jan 2022 10:03:46 -0500 Received: from mga09.intel.com ([134.134.136.24]:24561 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240334AbiAXPCg (ORCPT ); Mon, 24 Jan 2022 10:02:36 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036556; x=1674572556; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=uav+UwLJTiQzJso3hjEQdqtULUSM2290C08sG/C0E5c=; b=i81tDKDNHkhJumD/BI+dj3j2K1JNMwNh1+/OywP64pNwBqIUpenAnh8U Iv5o9UFbp49L74E0M9nOFWiMatBJxgi1zFNiwsEdcgpwC9fY1RGsi40Hu trtPGvgeMQCpahuGY2Ctj4vq+SJ3kSCDAVWbX5Ah6pX9YvbAvniKbel9d 8MDPdH+3GMhYNMwJ6CtW2ZCm7lDpAeibJfghOOvxSS4D6BaXdUdJlmVTX z/FCRXFMy89+zKad8Z4HssET1PWoj56sTWavEZ4J1joSRLebJw/9RRtly 5unXHko86V83OKhFHRuvaOuXCB2krrRVhTbWB6wBIwUL/6IAs9kibVC4P Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="245843930" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="245843930" Received: from orsmga004.jf.intel.com 
([10.7.209.38]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:29 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="627523593" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga004.jf.intel.com with ESMTP; 24 Jan 2022 07:02:23 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id AAC23DD9; Mon, 24 Jan 2022 17:02:20 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv2 28/29] x86/tdx: Warn about unexpected WBINVD Date: Mon, 24 Jan 2022 18:02:14 +0300 Message-Id: <20220124150215.36893-29-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" WBINVD causes #VE in TDX guests. There's no reliable way to emulate it. The kernel can ask for VMM assistance, but VMM is untrusted and can ignore the request. Fortunately, there is no use case for WBINVD inside TDX guests. Warn about any unexpected WBINVD. Signed-off-by: Kirill A. 
Shutemov --- arch/x86/kernel/tdx.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c index 6048887ac846..22c785c2059c 100644 --- a/arch/x86/kernel/tdx.c +++ b/arch/x86/kernel/tdx.c @@ -530,6 +530,10 @@ static bool tdx_virt_exception_kernel(struct pt_regs *= regs, struct ve_info *ve) case EXIT_REASON_IO_INSTRUCTION: ret =3D tdx_handle_io(regs, ve->exit_qual); break; + case EXIT_REASON_WBINVD: + WARN_ONCE(1, "Unexpected WBINVD\n"); + ret =3D true; + break; default: pr_warn("Unexpected #VE: %lld\n", ve->exit_reason); break; --=20 2.34.1 From nobody Tue Jun 30 05:31:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1F00AC433EF for ; Mon, 24 Jan 2022 15:03:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243512AbiAXPDn (ORCPT ); Mon, 24 Jan 2022 10:03:43 -0500 Received: from mga05.intel.com ([192.55.52.43]:18497 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240234AbiAXPCd (ORCPT ); Mon, 24 Jan 2022 10:02:33 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643036553; x=1674572553; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=dsefuNW/uaII71woGzX+BNGxwSbHexqdQ8a8s4IKvyM=; b=g4T87JfpM71ptSD6Rb3hHfvmuyZeoAwJefWn/ejGqBSKP+QcWlDxWWoG Klqc7segTo84ylW0PGv7MmVeYf5odlV9cwkwFhQh0eZgCvL8uyg4VB39v bRWSkCkmaHaT11/a9JeAFkUIKX+I57k1hzabmE5KnYubPswZj4ojC9Ce1 dHkbtvQsfuyk8BbpEECg4D3lWrTOW8lH71lizjzyK2L+eDCpAPjSg5KOA OgCOzRPcg07rR7ZucJXPX71kF4GymfvJPmEhDOZeIKoVZo65+oCNILRAq WFgzxfXbYrCX81gSelwszPkTkTlyXNXtf92azjdLG8CFUudQ2hyEpk1uT g==; X-IronPort-AV: E=McAfee;i="6200,9189,10236"; a="332416548" X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; 
d="scan'208";a="332416548" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jan 2022 07:02:30 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,311,1635231600"; d="scan'208";a="476743361" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga003.jf.intel.com with ESMTP; 24 Jan 2022 07:02:23 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id B89EADDA; Mon, 24 Jan 2022 17:02:20 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A . Shutemov" Subject: [PATCHv2 29/29] Documentation/x86: Document TDX kernel architecture Date: Mon, 24 Jan 2022 18:02:15 +0300 Message-Id: <20220124150215.36893-30-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> References: <20220124150215.36893-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Kuppuswamy Sathyanarayanan Document the TDX guest architecture details like #VE support, shared memory, etc. Signed-off-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kirill A. 
Shutemov
---
 Documentation/x86/index.rst |   1 +
 Documentation/x86/tdx.rst   | 194 ++++++++++++++++++++++++++++++++++++
 2 files changed, 195 insertions(+)
 create mode 100644 Documentation/x86/tdx.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index f498f1d36cd3..382e53ca850a 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -24,6 +24,7 @@ x86-specific Documentation
    intel-iommu
    intel_txt
    amd-memory-encryption
+   tdx
    pti
    mds
    microcode
diff --git a/Documentation/x86/tdx.rst b/Documentation/x86/tdx.rst
new file mode 100644
index 000000000000..903c9cecccbd
--- /dev/null
+++ b/Documentation/x86/tdx.rst
@@ -0,0 +1,194 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================================
+Intel Trust Domain Extensions (TDX)
+=====================================
+
+Intel's Trust Domain Extensions (TDX) protect confidential guest VMs
+from the host and from physical attacks by isolating the guest register
+state and by encrypting the guest memory. In TDX, a special TDX module
+sits between the host and the guest, runs in a special mode, and
+manages the guest/host separation.
+
+Since the host cannot directly access guest registers or memory, much
+of the normal functionality of a hypervisor (such as trapping MMIO, some
+MSRs, some CPUIDs, and some other instructions) has to be moved into the
+guest. This is implemented using a Virtualization Exception (#VE) that
+is handled by the guest kernel. Some #VEs are handled inside the guest
+kernel, but some require the hypervisor (VMM) to be involved. The TD
+hypercall mechanism allows TD guests to call TDX module or hypervisor
+functions.
+
+#VE Exceptions:
+===============
+
+In TDX guests, #VE exceptions are delivered in the following
+scenarios:
+
+* Execution of certain instructions (see list below)
+* Certain MSR accesses
+* CPUID usage (only for certain leaves)
+* Shared memory access (including MMIO)
+
+#VE due to instruction execution
+---------------------------------
+
+Intel TDX disallows execution of certain instructions in non-root
+mode. Execution of these instructions leads to #VE or #GP.
+
+Details are as follows.
+
+Instructions that can cause a #VE:
+
+* String I/O (INS, OUTS), IN, OUT
+* HLT
+* MONITOR, MWAIT
+* WBINVD, INVD
+* VMCALL
+
+Instructions that can cause a #GP:
+
+* All VMX instructions: INVEPT, INVVPID, VMCLEAR, VMFUNC, VMLAUNCH,
+  VMPTRLD, VMPTRST, VMREAD, VMRESUME, VMWRITE, VMXOFF, VMXON
+* ENCLS, ENCLV
+* GETSEC
+* RSM
+* ENQCMD
+
+#VE due to MSR access
+----------------------
+
+In TDX guests, MSR access behavior falls into the following categories:
+
+* Natively supported (also called "context switched MSR")
+  No special handling is required for these MSRs in TDX guests.
+* #GP triggered
+  Disallowed MSR reads/writes lead to #GP.
+* #VE triggered
+  All MSRs that are neither natively supported nor disallowed
+  (#GP-triggering) trigger #VE. To support access to these
+  MSRs, the access must be emulated using TDCALL.
+
+See the Intel TDX Module Specification, section "MSR Virtualization", for the
+complete list of MSRs that fall under the categories above.
+
+#VE due to CPUID instruction
+----------------------------
+
+In TDX guests, most CPUID leaf/sub-leaf combinations are virtualized by
+the TDX module, while some trigger #VE. The CPUID leaf/sub-leaf combinations
+that trigger #VE are configured by the VMM during TD initialization
+(using TDH.MNG.INIT).
+
+#VE on Memory Accesses
+----------------------
+
+A TD guest is in control of whether its memory accesses are treated as
+private or shared.
+It selects the behavior with a bit in its page table entries.
+
+#VE on Shared Pages
+-------------------
+
+Access to shared mappings can cause a #VE. The hypervisor controls whether
+access to a shared mapping causes a #VE, so the guest must only reference
+shared pages where it can safely handle a #VE, to avoid nested #VEs.
+
+The content of shared mappings is not trusted since shared memory is
+writable by the hypervisor. Shared mappings are never used for sensitive
+memory content like stacks or kernel text, only for I/O buffers and MMIO
+regions. The kernel will not encounter shared mappings in sensitive
+contexts like syscall entry or NMIs.
+
+#VE on Private Pages
+--------------------
+
+Some accesses to private mappings may cause #VEs. Before a mapping is
+accepted (that is, while it is in the SEPT_PENDING state), a reference
+causes a #VE. After acceptance, references typically succeed.
+
+The hypervisor can cause a private page reference to fail if it chooses
+to move an accepted page to a "blocked" state. However, if it does
+this, page access will not generate a #VE. It will, instead, cause a
+"TD Exit" where the hypervisor is required to handle the exception.
+
+Linux #VE handler
+-----------------
+
+Both user and kernel #VE exceptions are handled by the
+tdx_handle_virt_exception() handler. If the #VE is handled successfully, the
+instruction pointer is advanced to complete the handling process. Otherwise,
+it is treated as a regular exception and handled via fixup handlers.
+
+In TD guests, the #VE nesting problem (a #VE triggered before the current
+one is handled, also known as the syscall gap issue) is addressed by the TDX
+module, which ensures that interrupts, including NMIs, are blocked. The
+hardware blocks interrupts from #VE delivery until TDGETVEINFO is called.
+
+The kernel must avoid triggering #VE in entry paths: do not touch TD-shared
+memory, including MMIO regions, and do not use MSRs, instructions, or CPUID
+leaves that might generate #VE.
+
+MMIO handling:
+==============
+
+In non-TDX VMs, MMIO is usually implemented by giving a guest access to a
+mapping which will cause a VMEXIT on access, and then the VMM emulates the
+access. That's not possible in TDX guests because VMEXIT would expose the
+register state to the host. TDX guests don't trust the host and can't have
+their state exposed to the host.
+
+In TDX, MMIO regions are instead configured to trigger a #VE
+exception in the guest. The guest #VE handler then emulates the MMIO
+instructions inside the guest and converts them into a controlled TDCALL
+to the host, rather than completely exposing the state to the host.
+
+MMIO addresses on x86 are just special physical addresses. They can be
+accessed with any instruction that accesses memory. However, the
+introduced instruction decoding method is limited. It is only designed
+to decode instructions like those generated by io.h macros.
+
+MMIO access via other means (like structure overlays) may result in
+MMIO_DECODE_FAILED and an oops.
+
+Shared memory:
+==============
+
+Intel TDX doesn't allow the VMM to access guest private memory. Any
+memory that is required for communication with the VMM must be shared
+explicitly by setting the shared bit in the page table entry. The shared
+bit can be enumerated with TDX_GET_INFO.
+
+After setting the shared bit, the conversion must be completed with the
+MapGPA hypercall. The call informs the VMM about the conversion between
+private/shared mappings.
+
+set_memory_decrypted() converts a range of pages to shared.
+set_memory_encrypted() converts memory back to private.
+
+Device drivers are the primary users of shared memory, but there's no
+need to touch every driver. DMA buffers and ioremap()'ed regions are
+converted to shared automatically.
+
+TDX uses SWIOTLB for most DMA allocations. The SWIOTLB buffer is
+converted to shared on boot.
+
+For coherent DMA allocations, the DMA buffer is converted at allocation
+time. See force_dma_unencrypted() for details.
+
+References
+==========
+
+More details about the TDX module (and its response to MSR, memory access,
+I/O, CPUID, etc.) can be found at:
+
+https://www.intel.com/content/dam/develop/external/us/en/documents/tdx-module-1.0-public-spec-v0.931.pdf
+
+More details about the TDX hypercall and TDX module call ABI can be found
+at:
+
+https://www.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-guest-hypervisor-communication-interface-1.0-344426-002.pdf
+
+More details about TDVF requirements can be found at:
+
+https://www.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1.01.pdf
-- 
2.34.1
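
The "Linux #VE handler" section above says that a successfully handled #VE ends with the instruction pointer being incremented, and patch 28/29 adds a WBINVD case to the exit-reason switch. The following is a minimal user-space sketch of that flow in C, not the kernel code. The type and function names ending in _model, the VE_REASON_* values, and the instr_len field are assumptions made for illustration only. In particular, advancing the instruction pointer by an instruction length is one plausible reading of "incremented"; the patches quoted here do not show how it is done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical exit-reason codes for this sketch only; the kernel uses
 * its own EXIT_REASON_* constants. */
enum ve_exit_reason {
	VE_REASON_IO_INSTRUCTION,
	VE_REASON_WBINVD,
	VE_REASON_UNKNOWN,
};

/* Simplified stand-ins for the kernel's struct ve_info and struct pt_regs. */
struct ve_info_model {
	enum ve_exit_reason exit_reason;
	uint32_t instr_len;	/* assumed: length of the faulting instruction */
};

struct regs_model {
	uint64_t ip;
};

/* Returns true when the #VE counts as handled, like `ret` in the patch. */
static bool ve_dispatch_model(const struct ve_info_model *ve)
{
	switch (ve->exit_reason) {
	case VE_REASON_IO_INSTRUCTION:
		/* The kernel emulates the port I/O here via tdx_handle_io(). */
		return true;
	case VE_REASON_WBINVD:
		/* The patch uses WARN_ONCE(); this model prints every time. */
		fprintf(stderr, "Unexpected WBINVD\n");
		return true;
	default:
		fprintf(stderr, "Unexpected #VE: %d\n", (int)ve->exit_reason);
		return false;
	}
}

/* On success, step the instruction pointer past the handled instruction.
 * On failure, leave it untouched so the caller can fall back to its
 * regular exception handling. */
static bool handle_ve_model(struct regs_model *regs,
			    const struct ve_info_model *ve)
{
	bool handled = ve_dispatch_model(ve);

	if (handled)
		regs->ip += ve->instr_len;
	return handled;
}
```

Calling handle_ve_model() with a VE_REASON_WBINVD entry prints the warning and still advances ip, which matches the intent of the patch: the guest skips the instruction instead of faulting. An unknown reason returns false and leaves ip unchanged.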