[PATCH v3 00/10] Introduce CET supervisor state support

Chao Gao posted 10 patches 11 months, 1 week ago
There is a newer version of this series
Posted by Chao Gao 11 months, 1 week ago
==Changelog==
v2->v3:
 - reorder patches to add fpu_guest_cfg first and then introduce dynamic kernel
   feature concept (Dave)
 - Revise changelog for all patches except the first and the last one (Dave)
 - Split up patches that do multiple things into separate patches.
 - collect tags for patch 1

v1->v2:
 - rebase onto the latest kvm-x86/next
 - Add performance data to the cover-letter
 - v1: https://lore.kernel.org/kvm/73802bff-833c-4233-9a5b-88af0d062c82@intel.com/

==Background==

This series spins off from the CET KVM virtualization enabling series [1].
The purpose is to get this preparation work resolved ahead of the KVM part
landing. There was a discussion about introducing CET supervisor state
support [2] [3].

CET supervisor state, i.e., IA32_PL{0,1,2}_SSP, consists of xsave-managed
MSRs; it can be enabled via IA32_XSS[bit 12]. KVM relies on host-side CET
supervisor state support to fully enable storage of guest CET MSR contents.
The benefits are: 1) No need to manually save/restore the three MSRs when
the vCPU fpu context is scheduled in/out. 2) No need to manually swap the
three MSRs at VM-Exit/VM-Entry for guest/host. 3) Guest CET user/supervisor
states are managed in a consistent manner within the host kernel FPU
framework.
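
For illustration only, the state component described above can be sketched
as a standalone userspace snippet. This is not the code from the series: the
struct and macro names are hypothetical, chosen to mirror the three MSRs and
the IA32_XSS bit position mentioned above.

```c
#include <stdint.h>

/* Bit 12 of IA32_XSS selects the CET supervisor state component. */
#define SKETCH_CET_S_BIT   12
#define SKETCH_CET_S_MASK  (UINT64_C(1) << SKETCH_CET_S_BIT)

/*
 * Hypothetical layout of the component: one 64-bit shadow stack pointer
 * per privilege level 0..2, i.e. IA32_PL{0,1,2}_SSP.
 */
struct cet_supervisor_sketch {
	uint64_t pl0_ssp;
	uint64_t pl1_ssp;
	uint64_t pl2_ssp;
};

/* Three 64-bit MSRs add up to the 24 bytes discussed in this letter. */
_Static_assert(sizeof(struct cet_supervisor_sketch) == 24,
	       "CET supervisor state is expected to be 24 bytes");

/* Returns non-zero if the given IA32_XSS value enables CET supervisor state. */
static inline int sketch_xss_has_cet_s(uint64_t xss)
{
	return (xss & SKETCH_CET_S_MASK) != 0;
}
```

With this layout, sketch_xss_has_cet_s() is true for any IA32_XSS value that
has bit 12 set, regardless of the other bits.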

==Solution==

This series tries to:
1) Fix existing issues with enabling guest supervisor state support.
2) Add CET supervisor state support in core kernel.
3) Introduce new FPU config for guest fpstate setup.

With the preparation work landed, for guest fpstate, we have xstate_bv[12]
== xcomp_bv[12] == 1, and CET supervisor state is saved/reloaded when
xsaves/xrstors execute on guest fpstate.
For non-guest/normal fpstate, we have xstate_bv[12] == xcomp_bv[12] == 0,
so HW can optimize xsaves/xrstors operations.
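
The two header states above can be sketched as follows. This is a simplified
userspace model, not kernel code: the struct, macro, and helper names are
hypothetical, and xstate_bv is simply set to the full feature mask, whereas
real hardware only sets bits for components that are not in their initial
state.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_XFEATURE_MASK_CET_S  (UINT64_C(1) << 12)
/* Bit 63 of XCOMP_BV marks the compacted format used by XSAVES/XRSTORS. */
#define SKETCH_XCOMP_BV_COMPACTED   (UINT64_C(1) << 63)

/* First two fields of the XSAVE header; the reserved bytes are omitted. */
struct xsave_hdr_sketch {
	uint64_t xstate_bv;
	uint64_t xcomp_bv;
};

/* Build a header for a buffer that holds the given feature set (simplified). */
static inline struct xsave_hdr_sketch sketch_make_hdr(uint64_t features)
{
	struct xsave_hdr_sketch hdr = {
		.xstate_bv = features,
		.xcomp_bv  = features | SKETCH_XCOMP_BV_COMPACTED,
	};
	return hdr;
}

/* True if both xstate_bv[12] and xcomp_bv[12] are set, as for guest fpstate. */
static inline bool sketch_hdr_has_cet_s(const struct xsave_hdr_sketch *hdr)
{
	return (hdr->xstate_bv & SKETCH_XFEATURE_MASK_CET_S) &&
	       (hdr->xcomp_bv & SKETCH_XFEATURE_MASK_CET_S);
}
```

A header built from a mask that includes bit 12 models the guest fpstate case;
one built without it models the normal fpstate case, where no CET-S space is
part of the buffer.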

==Performance==

We measured context-switching performance with the benchmark [4] in the
following three cases:

case 1: the baseline, i.e., this series isn't applied.
case 2: baseline + this series. CET-S space is allocated for guest fpu only.
case 3: baseline + allocate CET-S space for all tasks. Hardware init
        optimization avoids writing out CET-S space on each XSAVES.

The data are as follows:

case | IA32_XSS[12] | Space | RFBM[12] | Drop%
-----+--------------+-------+----------+------
  1  |      0       | None  |    0     |  0.0%
  2  |      1       | None  |    0     |  0.2%
  3  |      1       | 24B?  |    1     |  0.2%

Cases 2 and 3 show no difference in performance. But case 2 is preferred
because it saves 24B of CET-S space for all non-vCPU threads with just a
one-line change:

+	fpu_kernel_cfg.default_features &= ~XFEATURE_MASK_KERNEL_DYNAMIC;

fpu_guest_cfg has its own merits. Regardless of the approach we take, using
different FPU configuration settings for the guest and the kernel improves
readability, decouples them from each other, and arguably enhances
extensibility.
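
As a rough sketch of the idea, the split between the two configurations can
be modeled as below. This is a simplified userspace illustration, not the
code from the series: the struct and helper are hypothetical, and it assumes
that the dynamic kernel feature set consists of the CET supervisor bit only.

```c
#include <stdint.h>

#define SKETCH_MASK_CET_KERNEL      (UINT64_C(1) << 12)
/* Assumption: the dynamic kernel feature set is just CET supervisor state. */
#define SKETCH_MASK_KERNEL_DYNAMIC  SKETCH_MASK_CET_KERNEL

/* Hypothetical, trimmed-down stand-in for an FPU configuration. */
struct fpu_cfg_sketch {
	uint64_t max_features;
	uint64_t default_features;
};

/*
 * Derive separate kernel and guest configurations from one supported mask.
 * The kernel default drops the dynamic features, so normal tasks get no
 * CET-S space; the guest default keeps them.
 */
static inline void sketch_split_cfgs(uint64_t supported,
				     struct fpu_cfg_sketch *kernel_cfg,
				     struct fpu_cfg_sketch *guest_cfg)
{
	kernel_cfg->max_features = supported;
	kernel_cfg->default_features = supported & ~SKETCH_MASK_KERNEL_DYNAMIC;

	guest_cfg->max_features = supported;
	guest_cfg->default_features = supported;
}
```

Keeping the two structures separate means the guest defaults can change
without touching the mask used for every other task.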

[1]: https://lore.kernel.org/all/20240219074733.122080-1-weijiang.yang@intel.com/
[2]: https://lore.kernel.org/all/ZM1jV3UPL0AMpVDI@google.com/
[3]: https://lore.kernel.org/all/2597a87b-1248-b8ce-ce60-94074bc67ea4@intel.com/
[4]: https://github.com/antonblanchard/will-it-scale/blob/master/tests/context_switch1.c



Chao Gao (2):
  x86/fpu/xstate: Drop @perm from guest pseudo FPU container
  x86/fpu/xstate: Correct xfeatures cache in guest pseudo fpu container

Sean Christopherson (1):
  x86/fpu/xstate: Always preserve non-user xfeatures/flags in
    __state_perm

Yang Weijiang (7):
  x86/fpu/xstate: Correct guest fpstate size calculation
  x86/fpu/xstate: Introduce guest FPU configuration
  x86/fpu/xstate: Initialize guest perm with fpu_guest_cfg
  x86/fpu/xstate: Initialize guest fpstate with fpu_guest_config
  x86/fpu/xstate: Add CET supervisor xfeature support
  x86/fpu/xstate: Introduce XFEATURE_MASK_KERNEL_DYNAMIC xfeature set
  x86/fpu/xstate: Warn if CET supervisor state is detected in normal
    fpstate

 arch/x86/include/asm/fpu/types.h  | 31 +++++++++++++++------------
 arch/x86/include/asm/fpu/xstate.h | 11 ++++++----
 arch/x86/kernel/fpu/core.c        | 34 +++++++++++++++++-------------
 arch/x86/kernel/fpu/xstate.c      | 35 ++++++++++++++++++++++++-------
 arch/x86/kernel/fpu/xstate.h      |  2 ++
 5 files changed, 74 insertions(+), 39 deletions(-)

-- 
2.46.1
Re: [PATCH v3 00/10] Introduce CET supervisor state support
Posted by Dave Hansen 11 months ago
On 3/7/25 08:41, Chao Gao wrote:
> case |IA32_XSS[12] | Space | RFBM[12] | Drop%	
> -----+-------------+-------+----------+------
>   1  |	   0	   | None  |	0     |  0.0%
>   2  |	   1	   | None  |	0     |  0.2%
>   3  |	   1	   | 24B?  |	1     |  0.2%

So, 0.2% is still, what, dozens of cycles? Are you sure that it really
takes the CPU dozens of cycles to skip over the feature during XSAVE?

If it really turns out to be this measurable, we should probably follow
up with the folks that implement XSAVE and see what's going on under the
covers.

On a separate note, I was bugging Thomas a bit on IRC. His memory was
that the AMX-era FPU rework only expected KVM to support user features.
You might want to dig through the history a bit and see if _that_ was
ever properly addressed because that would change the problem you're
trying to solve.
Re: [PATCH v3 00/10] Introduce CET supervisor state support
Posted by Chao Gao 10 months, 3 weeks ago
On Fri, Mar 07, 2025 at 11:09:42AM -0800, Dave Hansen wrote:
>On 3/7/25 08:41, Chao Gao wrote:
>> case |IA32_XSS[12] | Space | RFBM[12] | Drop%	
>> -----+-------------+-------+----------+------
>>   1  |	   0	   | None  |	0     |  0.0%
>>   2  |	   1	   | None  |	0     |  0.2%
>>   3  |	   1	   | 24B?  |	1     |  0.2%
>
>So, 0.2% is still, what, dozens of cycles? Are you sure that it really
>takes the CPU dozens of cycles to skip over the feature during XSAVE?
>
>If it really turns out to be this measurable, we should probably follow
>up with the folks that implement XSAVE and see what's going on under the
>covers.

I reran the performance tests and observed a run-to-run variation of 0.4%
to 0.7%. So, I don't think there is any measurable performance difference.
I will update the performance statements in the cover letter.

>
>On a separate note, I was bugging Thomas a bit on IRC. His memory was
>that the AMX-era FPU rework only expected KVM to support user features.
>You might want to dig through the history a bit and see if _that_ was
>ever properly addressed because that would change the problem you're
>trying to solve.

I went through the email discussions and found only one relevant thread:

https://lore.kernel.org/kvm/87wnmf66m5.ffs@tglx/#t

where Thomas mentioned that guest_perm would be set as follows:

	guest_fpu::__state_perm = supported_xcr0 & xstate_get_group_perm();

If implemented this way, KVM would only support user features. However, the
committed change is:

980fe2fddcff ("x86/fpu: Extend fpu_xstate_prctl() with guest permissions")

In this change, fpu->guest_perm is copied from fpu->perm:

+	/* Same defaults for guests */
+	fpu->guest_perm = fpu->perm;
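
To make the difference concrete, here is a small userspace sketch of the two
approaches. The helper names and bit values are hypothetical and only serve
to show the effect of the mask: XCR0 covers user state components only, so
ANDing with it drops any supervisor bit, while a plain copy keeps it.

```c
#include <stdint.h>

/* x87, SSE and AVX (bits 0..2) stand in for XCR0-managed user features. */
#define SKETCH_USER_FEATURES   UINT64_C(0x7)
/* CET supervisor state (bit 12) is managed via IA32_XSS, not XCR0. */
#define SKETCH_MASK_CET_S      (UINT64_C(1) << 12)

/* Approach from the thread: mask the permissions with supported XCR0 bits. */
static inline uint64_t sketch_guest_perm_masked(uint64_t supported_xcr0,
						uint64_t group_perm)
{
	return supported_xcr0 & group_perm;
}

/* Approach of the committed change: guest permissions start as a copy. */
static inline uint64_t sketch_guest_perm_copied(uint64_t perm)
{
	return perm;
}
```

For a permission mask of SKETCH_USER_FEATURES | SKETCH_MASK_CET_S, the masked
variant yields user features only, while the copied variant retains bit 12.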

There are indeed some issues with enabling supervisor features for guest
FPUs, but they have been addressed by recent changes in the tip-tree ([1],
[2]) and patch 1 of this series.

[1]: https://lore.kernel.org/all/20250218141045.85201-1-stanspas@amazon.de/
[2]: https://lore.kernel.org/all/20250317140613.1761633-1-chao.gao@intel.com/