From nobody Tue Dec 16 23:27:13 2025 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CCD09290BD2; Thu, 22 May 2025 15:10:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747926649; cv=none; b=Gfh3kfRmS+PfNEj7k0JhDryOrjqe/vpghtdxEgSdTbs0+vai8X/ctElHpdOwvtYa1zHkkYHTiKh55IMSDAoio2MxT3moI0cJXZEiA914OsEb/rOWshxj4Dy8ebxFA4/PqNUR64FiMNWeBW1kqKkKl/VMqrSeRVsXWVmNorzQjkA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747926649; c=relaxed/simple; bh=t2m92IG+lESfSveuVAhyAr9qEmQos8t6zPYOu82WPfE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=TgkNkEhmIscG3d4toIwBUqHqZRD+Zlwe9nJoKfx493tkvQEvuu9Yjt4ND2B4D+frkfl+tEr68KJLaqcd4fpMCvWxGpgA+2NnP/J2cwNhBUTtu6QP3IV8BdAZAwjwiVtALxj3tbV9R3pyQqsgq34ZY0lDIn6YyFl5oEanlJkie5E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=JvJipe03; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="JvJipe03" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1747926648; x=1779462648; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=t2m92IG+lESfSveuVAhyAr9qEmQos8t6zPYOu82WPfE=; b=JvJipe03Ubge9i+FuE6QbnGMRLdluSaBxy+9ecdyqXV6IgDzBN4NGLx6 DWK6E2CzUr8OugOCXohpFnr5XsNF7xGH9LQvkpuMigqR7TeCa8w0//i0b bjFWFaCNC3yUXlzENkxTrMAuVerZIL1hfKu6+p+aMzUYMbQapZ+Ex57gu GwtuAz2LQxZk+hqZ0am2VH6ZsOfSsKMSRTNWOkagAScbZCZ772N17Vl6M gPJ0C73yVQQ9ACB+pKygqR7CIO/Q4DSY3fWHaVFvlQMt/OEB5njCHXtoo LyPF9lLY+2D6W+qo/D9WFnYO7bWBks+EBkPfar5Fw2aB6S3gfq6a6vQUJ w==; X-CSE-ConnectionGUID: RIcwb7g0TAWBMWFqoIS9Xw== X-CSE-MsgGUID: 8tyTZTHpTxWXZ5MVHtsVQw== X-IronPort-AV: E=McAfee;i="6700,10204,11441"; a="61006718" X-IronPort-AV: E=Sophos;i="6.15,306,1739865600"; d="scan'208";a="61006718" Received: from fmviesa001.fm.intel.com ([10.60.135.141]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 May 2025 08:10:48 -0700 X-CSE-ConnectionGUID: r8jPIcKCQmqqvUc06U2Kyw== X-CSE-MsgGUID: tgsrSSyrSwe3KqWzyL+ogQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.15,306,1739865600"; d="scan'208";a="171627635" Received: from 984fee019967.jf.intel.com ([10.165.54.94]) by smtpauth.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 May 2025 08:10:47 -0700 From: Chao Gao To: x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, tglx@linutronix.de, dave.hansen@intel.com, seanjc@google.com, pbonzini@redhat.com Cc: peterz@infradead.org, rick.p.edgecombe@intel.com, weijiang.yang@intel.com, john.allen@amd.com, bp@alien8.de, chang.seok.bae@intel.com, xin3.li@intel.com, Chao Gao , Maxim Levitsky , Ingo Molnar , Dave Hansen , "H. Peter Anvin" , Mitchell Levy , Vignesh Balasubramanian Subject: [PATCH v8 6/6] x86/fpu/xstate: Add CET supervisor xfeature support as a guest-only feature Date: Thu, 22 May 2025 08:10:09 -0700 Message-ID: <20250522151031.426788-7-chao.gao@intel.com> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20250522151031.426788-1-chao.gao@intel.com> References: <20250522151031.426788-1-chao.gao@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Yang Weijiang =3D=3D Background =3D=3D CET defines two register states: CET user, which includes user-mode control registers, and CET supervisor, which consists of shadow-stack pointers for privilege levels 0-2. Current kernels disable shadow stacks in kernel mode, making the CET supervisor state unused and eliminating the need for context switching. =3D=3D Problem =3D=3D To virtualize CET for guests, KVM must accurately emulate hardware behavior. A key challenge arises because there is no CPUID flag to indicate that shadow stack is supported only in user mode. Therefore, KVM cannot assume guests will not enable shadow stacks in kernel mode and must preserve the CET supervisor state of vCPUs. =3D=3D Solution =3D=3D An initial proposal to manually save and restore CET supervisor states using raw RDMSR/WRMSR in KVM was rejected due to performance concerns and its impact on KVM's ABI. Instead, leveraging the kernel's FPU infrastructure for context switching was favored [1]. The main question then became whether to enable the CET supervisor state globally for all processes or restrict it to vCPU processes. This decision involves a trade-off between a 24-byte XSTATE buffer waste for all non-vCPU processes and approximately 100 lines of code complexity in the kernel [2]. The agreed approach is to first try this optimal solution [3], i.e., restricting the CET supervisor state to guest FPUs only and eliminating unnecessary space waste. The guest-only xfeature infrastructure has already been added. Now, introduce CET supervisor xstate support as the first guest-only feature to prepare for the upcoming CET virtualization in KVM. Signed-off-by: Yang Weijiang Signed-off-by: Chao Gao Reviewed-by: Rick Edgecombe Reviewed-by: Maxim Levitsky Link: https://lore.kernel.org/kvm/ZM1jV3UPL0AMpVDI@google.com/ [1] Link: https://lore.kernel.org/kvm/1c2fd06e-2e97-4724-80ab-8695aa4334e7@inte= l.com/ [2] Link: https://lore.kernel.org/kvm/2597a87b-1248-b8ce-ce60-94074bc67ea4@inte= l.com/ [3] Reviewed-by: John Allen --- v5: Introduce CET supervisor xfeature directly as a guest-only feature, rather than first introducing it in one patch and then converting it to guest-only in a subsequent patch. (Chang) Add new features after cleanups/bug fixes (Chang, Dave, Ingo) Improve the commit message to follow the suggested background-problem-solution pattern. --- arch/x86/include/asm/fpu/types.h | 14 ++++++++++++-- arch/x86/include/asm/fpu/xstate.h | 5 ++--- arch/x86/kernel/fpu/xstate.c | 5 ++++- 3 files changed, 18 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/ty= pes.h index 54ba567258d6..93e99d2583d6 100644 --- a/arch/x86/include/asm/fpu/types.h +++ b/arch/x86/include/asm/fpu/types.h @@ -118,7 +118,7 @@ enum xfeature { XFEATURE_PKRU, XFEATURE_PASID, XFEATURE_CET_USER, - XFEATURE_CET_KERNEL_UNUSED, + XFEATURE_CET_KERNEL, XFEATURE_RSRVD_COMP_13, XFEATURE_RSRVD_COMP_14, XFEATURE_LBR, @@ -142,7 +142,7 @@ enum xfeature { #define XFEATURE_MASK_PKRU (1 << XFEATURE_PKRU) #define XFEATURE_MASK_PASID (1 << XFEATURE_PASID) #define XFEATURE_MASK_CET_USER (1 << XFEATURE_CET_USER) -#define XFEATURE_MASK_CET_KERNEL (1 << XFEATURE_CET_KERNEL_UNUSED) +#define XFEATURE_MASK_CET_KERNEL (1 << XFEATURE_CET_KERNEL) #define XFEATURE_MASK_LBR (1 << XFEATURE_LBR) #define XFEATURE_MASK_XTILE_CFG (1 << XFEATURE_XTILE_CFG) #define XFEATURE_MASK_XTILE_DATA (1 << XFEATURE_XTILE_DATA) @@ -268,6 +268,16 @@ struct cet_user_state { u64 user_ssp; }; =20 +/* + * State component 12 is Control-flow Enforcement supervisor states. + * This state includes SSP pointers for privilege levels 0 through 2. + */ +struct cet_supervisor_state { + u64 pl0_ssp; + u64 pl1_ssp; + u64 pl2_ssp; +} __packed; + /* * State component 15: Architectural LBR configuration state. * The size of Arch LBR state depends on the number of LBRs (lbr_depth). diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/x= state.h index a3cd25453f94..7a7dc9d56027 100644 --- a/arch/x86/include/asm/fpu/xstate.h +++ b/arch/x86/include/asm/fpu/xstate.h @@ -47,7 +47,7 @@ #define XFEATURE_MASK_USER_DYNAMIC XFEATURE_MASK_XTILE_DATA =20 /* Supervisor features which are enabled only in guest FPUs */ -#define XFEATURE_MASK_GUEST_SUPERVISOR 0 +#define XFEATURE_MASK_GUEST_SUPERVISOR XFEATURE_MASK_CET_KERNEL =20 /* All currently supported supervisor features */ #define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \ @@ -79,8 +79,7 @@ * Unsupported supervisor features. When a supervisor feature in this mask= is * supported in the future, move it to the supported supervisor feature ma= sk. */ -#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT | \ - XFEATURE_MASK_CET_KERNEL) +#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT) =20 /* All supervisor states including supported and unsupported states. */ #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED |= \ diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index f5eb3e84c3dc..bcccb0977809 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -56,7 +56,7 @@ static const char *xfeature_names[] =3D "Protection Keys User registers", "PASID state", "Control-flow User registers", - "Control-flow Kernel registers (unused)", + "Control-flow Kernel registers (KVM only)", "unknown xstate feature", "unknown xstate feature", "unknown xstate feature", @@ -80,6 +80,7 @@ static unsigned short xsave_cpuid_features[] __initdata = =3D { [XFEATURE_PKRU] =3D X86_FEATURE_OSPKE, [XFEATURE_PASID] =3D X86_FEATURE_ENQCMD, [XFEATURE_CET_USER] =3D X86_FEATURE_SHSTK, + [XFEATURE_CET_KERNEL] =3D X86_FEATURE_SHSTK, [XFEATURE_XTILE_CFG] =3D X86_FEATURE_AMX_TILE, [XFEATURE_XTILE_DATA] =3D X86_FEATURE_AMX_TILE, [XFEATURE_APX] =3D X86_FEATURE_APX, @@ -371,6 +372,7 @@ static __init void os_xrstor_booting(struct xregs_state= *xstate) XFEATURE_MASK_BNDCSR | \ XFEATURE_MASK_PASID | \ XFEATURE_MASK_CET_USER | \ + XFEATURE_MASK_CET_KERNEL | \ XFEATURE_MASK_XTILE | \ XFEATURE_MASK_APX) =20 @@ -572,6 +574,7 @@ static bool __init check_xstate_against_struct(int nr) case XFEATURE_PASID: return XCHECK_SZ(sz, nr, struct ia32_pasid_state); case XFEATURE_XTILE_CFG: return XCHECK_SZ(sz, nr, struct xtile_cfg); case XFEATURE_CET_USER: return XCHECK_SZ(sz, nr, struct cet_user_state); + case XFEATURE_CET_KERNEL: return XCHECK_SZ(sz, nr, struct cet_supervisor_= state); case XFEATURE_APX: return XCHECK_SZ(sz, nr, struct apx_state); case XFEATURE_XTILE_DATA: check_xtile_data_against_struct(sz); return tru= e; default: --=20 2.47.1