[v7] Introduce CET supervisor state support

[PATCH v7 0/6] Introduce CET supervisor state support

Posted by Chao Gao 9 months ago

Dear maintainers and reviewers,

I kindly request your consideration for merging this series. Most of
patches have received Reviewed-by/Acked-by tags.

Thanks Chang, Rick, Xin, Sean and Dave for their help with this series.

== Changelog ==
v6->v7:
 - Collect reviews from Rick
 - Tweak __fpstate_reset() to handle guest fpstate rather than adding a
   guest-specific reset function (Sean & Dave)
 - Fold xfd initialization into __fpstate_reset() (Sean)
 - v6: https://lore.kernel.org/all/20250506093740.2864458-1-chao.gao@intel.com/

== Background ==

CET defines two register states: CET user, which includes user-mode control
registers, and CET supervisor, which consists of shadow-stack pointers for
privilege levels 0-2.

Current kernel disables shadow stacks in kernel mode, making the CET
supervisor state unused and eliminating the need for context switching.

== Problem ==

To virtualize CET for guests, KVM must accurately emulate hardware
behavior. A key challenge arises because there is no CPUID flag to indicate
that shadow stack is supported only in user mode. Therefore, KVM cannot
assume guests will not enable shadow stacks in kernel mode and must
preserve the CET supervisor state of vCPUs.

== Solution ==

An initial proposal to manually save and restore CET supervisor states
using raw RDMSR/WRMSR in KVM was rejected due to performance concerns and
its impact on KVM's ABI. Instead, leveraging the kernel's FPU
infrastructure for context switching was favored [1].

The main question then became whether to enable the CET supervisor state
globally for all processes or restrict it to vCPU processes. This decision
involves a trade-off between a 24-byte XSTATE buffer waste for all non-vCPU
processes and approximately 100 lines of code complexity in the kernel [2].
The agreed approach is to first try this optimal solution [3], i.e.,
restricting the CET supervisor state to guest FPUs only and eliminating
unnecessary space waste.

Key changes in this series are:

1) Fix existing issue regarding enabling guest supervisor states support.
2) Add default features and size for guest FPUs.
3) Add infrastructure to support guest-only features.
4) Add CET supervisor state as the first guest-only feature.

With the series in place, guest FPUs have xstate_bv[12] == xcomp_bv[12] == 1
and CET supervisor state is saved/reloaded when xsaves/xrstors executes on
guest FPUs. non-guest FPUs have xstate_bv[12] == xcomp_bv[12] == 0, then
CET supervisor state is not saved/restored.

== Performance ==

We measured context-switching performance with the benchmark [4] in following
three cases.

case 1: the baseline. i.e., this series isn't applied
case 2: baseline + this series. CET-S space is allocated for guest fpu only.
case 3: baseline + allocate CET-S space for all tasks. Hardware init
        optimization avoids writing out CET-S space on each XSAVES.

The performance differences in the three cases are very small and fall within the
run-to-run variation.

[1]: https://lore.kernel.org/kvm/ZM1jV3UPL0AMpVDI@google.com/
[2]: https://lore.kernel.org/kvm/1c2fd06e-2e97-4724-80ab-8695aa4334e7@intel.com/
[3]: https://lore.kernel.org/kvm/2597a87b-1248-b8ce-ce60-94074bc67ea4@intel.com/
[4]: https://github.com/antonblanchard/will-it-scale/blob/master/tests/context_switch1.c

Chao Gao (4):
  x86/fpu/xstate: Differentiate default features for host and guest FPUs
  x86/fpu: Initialize guest FPU permissions from guest defaults
  x86/fpu: Initialize guest fpstate and FPU pseudo container from guest
    defaults
  x86/fpu: Remove xfd argument from __fpstate_reset()

Yang Weijiang (2):
  x86/fpu/xstate: Introduce "guest-only" supervisor xfeature set
  x86/fpu/xstate: Add CET supervisor xfeature support as a guest-only
    feature

 arch/x86/include/asm/fpu/types.h  | 49 +++++++++++++++++++++++++++----
 arch/x86/include/asm/fpu/xstate.h |  9 ++++--
 arch/x86/kernel/fpu/core.c        | 49 ++++++++++++++++++++++---------
 arch/x86/kernel/fpu/init.c        |  1 +
 arch/x86/kernel/fpu/xstate.c      | 40 ++++++++++++++++++++-----
 arch/x86/kernel/fpu/xstate.h      |  5 ++++
 6 files changed, 123 insertions(+), 30 deletions(-)

-- 
2.47.1

Re: [PATCH v7 0/6] Introduce CET supervisor state support

Posted by Uros Bizjak 8 months, 3 weeks ago

On Mon, May 12, 2025 at 10:57 AM Chao Gao <chao.gao@intel.com> wrote:
>
> Dear maintainers and reviewers,
>
> I kindly request your consideration for merging this series. Most of
> patches have received Reviewed-by/Acked-by tags.
>
> Thanks Chang, Rick, Xin, Sean and Dave for their help with this series.
>
> == Changelog ==
> v6->v7:
>  - Collect reviews from Rick
>  - Tweak __fpstate_reset() to handle guest fpstate rather than adding a
>    guest-specific reset function (Sean & Dave)
>  - Fold xfd initialization into __fpstate_reset() (Sean)
>  - v6: https://lore.kernel.org/all/20250506093740.2864458-1-chao.gao@intel.com/
>
> == Background ==
>
> CET defines two register states: CET user, which includes user-mode control
> registers, and CET supervisor, which consists of shadow-stack pointers for
> privilege levels 0-2.
>
> Current kernel disables shadow stacks in kernel mode, making the CET
> supervisor state unused and eliminating the need for context switching.
>
> == Problem ==
>
> To virtualize CET for guests, KVM must accurately emulate hardware
> behavior. A key challenge arises because there is no CPUID flag to indicate
> that shadow stack is supported only in user mode. Therefore, KVM cannot
> assume guests will not enable shadow stacks in kernel mode and must
> preserve the CET supervisor state of vCPUs.
>
> == Solution ==
>
> An initial proposal to manually save and restore CET supervisor states
> using raw RDMSR/WRMSR in KVM was rejected due to performance concerns and
> its impact on KVM's ABI. Instead, leveraging the kernel's FPU
> infrastructure for context switching was favored [1].

Dear Chao,

I wonder if the same approach can be used to optimize switching of
Intel PT configuration context. There was a patch series [1] posted
some time ago that showed substantial reduction of overhead when
switching Intel PT configuration context on VM-Entry/Exit using
XSAVES/XRSTORS instructions:

Manual save(rdmsr):     ~334  cycles
Manual restore(wrmsr):  ~1668 cycles

XSAVES insturction:     ~124  cycles
XRSTORS instruction:    ~378  cycles

[1] https://lore.kernel.org/lkml/1557995114-21629-1-git-send-email-luwei.kang@intel.com/

Best regards,
Uros.

Re: [PATCH v7 0/6] Introduce CET supervisor state support

Posted by Dave Hansen 8 months, 3 weeks ago

On 5/16/25 00:51, Uros Bizjak wrote:
> I wonder if the same approach can be used to optimize switching of
> Intel PT configuration context. There was a patch series [1] posted
> some time ago that showed substantial reduction of overhead when
> switching Intel PT configuration context on VM-Entry/Exit using
> XSAVES/XRSTORS instructions:
> 
> Manual save(rdmsr):     ~334  cycles
> Manual restore(wrmsr):  ~1668 cycles
> 
> XSAVES insturction:     ~124  cycles
> XRSTORS instruction:    ~378  cycles

There's nothing stopping us from using XSAVES/XRSTORS for PT,
independent of the kernel FPU infrastructure. RFBM exists for a reason.

There's also WRMSRLIST which we're not using yet either. It's an even
better fit and doesn't have the goofiness that using XSAVE does like the
legacy portion of the save area including the header.

Re: [PATCH v7 0/6] Introduce CET supervisor state support

Posted by Chao Gao 8 months, 3 weeks ago

On Fri, May 16, 2025 at 09:51:50AM +0200, Uros Bizjak wrote:
>On Mon, May 12, 2025 at 10:57 AM Chao Gao <chao.gao@intel.com> wrote:
>>
>> Dear maintainers and reviewers,
>>
>> I kindly request your consideration for merging this series. Most of
>> patches have received Reviewed-by/Acked-by tags.
>>
>> Thanks Chang, Rick, Xin, Sean and Dave for their help with this series.
>>
>> == Changelog ==
>> v6->v7:
>>  - Collect reviews from Rick
>>  - Tweak __fpstate_reset() to handle guest fpstate rather than adding a
>>    guest-specific reset function (Sean & Dave)
>>  - Fold xfd initialization into __fpstate_reset() (Sean)
>>  - v6: https://lore.kernel.org/all/20250506093740.2864458-1-chao.gao@intel.com/
>>
>> == Background ==
>>
>> CET defines two register states: CET user, which includes user-mode control
>> registers, and CET supervisor, which consists of shadow-stack pointers for
>> privilege levels 0-2.
>>
>> Current kernel disables shadow stacks in kernel mode, making the CET
>> supervisor state unused and eliminating the need for context switching.
>>
>> == Problem ==
>>
>> To virtualize CET for guests, KVM must accurately emulate hardware
>> behavior. A key challenge arises because there is no CPUID flag to indicate
>> that shadow stack is supported only in user mode. Therefore, KVM cannot
>> assume guests will not enable shadow stacks in kernel mode and must
>> preserve the CET supervisor state of vCPUs.
>>
>> == Solution ==
>>
>> An initial proposal to manually save and restore CET supervisor states
>> using raw RDMSR/WRMSR in KVM was rejected due to performance concerns and
>> its impact on KVM's ABI. Instead, leveraging the kernel's FPU
>> infrastructure for context switching was favored [1].
>
>Dear Chao,
>
>I wonder if the same approach can be used to optimize switching of
>Intel PT configuration context. There was a patch series [1] posted
>some time ago that showed substantial reduction of overhead when
>switching Intel PT configuration context on VM-Entry/Exit using
>XSAVES/XRSTORS instructions:

No, the guest-only infrastructure utilizes the FPU core to switch states
during context switches, whereas Intel PT state is switched at different
points, i.e., on VM entry/exit.

Switching Intel PT state on VM entry/exit is necessary only for the
"host-guest" mode, which is currently marked as BROKEN. Unless functional
issues are addressed first, there's no point in optimizing its state
switching.

If we ever reinstate support for the "host-guest" mode, I think Intel PT
state probably could be implemented as an independent feature, similar to
LBR state.

Re: [PATCH v7 0/6] Introduce CET supervisor state support

Posted by Ingo Molnar 8 months, 4 weeks ago

* Chao Gao <chao.gao@intel.com> wrote:

> Dear maintainers and reviewers,
> 
> I kindly request your consideration for merging this series. Most of 
> patches have received Reviewed-by/Acked-by tags.

I don't see anything objectionable in this series.

The upcoming v6.16 merge window is already quite crowded in terms of 
FPU changes, so I think at this point we are looking at a v6.17 merge, 
done shortly after v6.16-rc1 if everything goes well. Dave, what do you 
think?

Thanks,

	Ingo

Re: [PATCH v7 0/6] Introduce CET supervisor state support

Posted by Dave Hansen 8 months, 3 weeks ago

On 5/15/25 08:41, Ingo Molnar wrote:
> * Chao Gao <chao.gao@intel.com> wrote:
>> I kindly request your consideration for merging this series. Most of 
>> patches have received Reviewed-by/Acked-by tags.
> I don't see anything objectionable in this series.
> 
> The upcoming v6.16 merge window is already quite crowded in terms of 
> FPU changes, so I think at this point we are looking at a v6.17 merge, 
> done shortly after v6.16-rc1 if everything goes well. Dave, what do you 
> think?

It's getting into shape, but it has a slight shortage of reviews. For
now, it's an all-Intel patch even though I _thought_ AMD had this
feature too. It's also purely for KVM and has some suggested-by's from
Sean, but no KVM acks on it.

I have the feeling Sean would speak up if this was going in a bad
direction for KVM, but I do love to see acks accompanying suggested-by's
to indicate that the suggestion was interpreted properly.

Re: [PATCH v7 0/6] Introduce CET supervisor state support

Posted by Dave Hansen 8 months, 3 weeks ago

On 5/15/25 08:41, Ingo Molnar wrote:
> * Chao Gao <chao.gao@intel.com> wrote:
>> I kindly request your consideration for merging this series. Most of 
>> patches have received Reviewed-by/Acked-by tags.
> I don't see anything objectionable in this series.
> 
> The upcoming v6.16 merge window is already quite crowded in terms of 
> FPU changes, so I think at this point we are looking at a v6.17 merge, 
> done shortly after v6.16-rc1 if everything goes well. Dave, what do you 
> think?

It's getting into shape, but it has a slight shortage of reviews. For
now, it's an all-Intel patch even though I _thought_ AMD had this
feature too. It's also purely for KVM and has some suggested-by's from
Sean, but no KVM acks on it.

Sean is not exactly the quiet type about things, but it always warms me
heart to see an acked-by accompanying a suggested-by because it
indicates that the suggestion was heard and implemented properly.

Re: [PATCH v7 0/6] Introduce CET supervisor state support

Posted by Peter Zijlstra 8 months, 3 weeks ago

On Fri, May 16, 2025 at 08:20:54AM -0700, Dave Hansen wrote:

> It's getting into shape, but it has a slight shortage of reviews. For
> now, it's an all-Intel patch even though I _thought_ AMD had this
> feature too. 

Yeah, AMD should have this. While AMD does not implement IBT, they did
do implement SS, and as such, they should be having this stuff as well.

Re: [PATCH v7 0/6] Introduce CET supervisor state support

Posted by Chao Gao 8 months, 3 weeks ago

On Fri, May 16, 2025 at 08:20:54AM -0700, Dave Hansen wrote:
>On 5/15/25 08:41, Ingo Molnar wrote:
>> * Chao Gao <chao.gao@intel.com> wrote:
>>> I kindly request your consideration for merging this series. Most of 
>>> patches have received Reviewed-by/Acked-by tags.
>> I don't see anything objectionable in this series.
>> 
>> The upcoming v6.16 merge window is already quite crowded in terms of 
>> FPU changes, so I think at this point we are looking at a v6.17 merge, 
>> done shortly after v6.16-rc1 if everything goes well. Dave, what do you 
>> think?
>
>It's getting into shape, but it has a slight shortage of reviews. For
>now, it's an all-Intel patch even though I _thought_ AMD had this
>feature too. It's also purely for KVM and has some suggested-by's from
>Sean, but no KVM acks on it.
>
>Sean is not exactly the quiet type about things, but it always warms me
>heart to see an acked-by accompanying a suggested-by because it
>indicates that the suggestion was heard and implemented properly.

Hi Sean, John,

Based on Dave's feedback, could you please review this series and provide your
reviewed-by/acked-by if appropriate?

Re: [PATCH v7 0/6] Introduce CET supervisor state support

Posted by Sean Christopherson 8 months, 3 weeks ago

On Wed, May 21, 2025, Chao Gao wrote:
> On Fri, May 16, 2025 at 08:20:54AM -0700, Dave Hansen wrote:
> >On 5/15/25 08:41, Ingo Molnar wrote:
> >> * Chao Gao <chao.gao@intel.com> wrote:
> >>> I kindly request your consideration for merging this series. Most of 
> >>> patches have received Reviewed-by/Acked-by tags.
> >> I don't see anything objectionable in this series.
> >> 
> >> The upcoming v6.16 merge window is already quite crowded in terms of 
> >> FPU changes, so I think at this point we are looking at a v6.17 merge, 
> >> done shortly after v6.16-rc1 if everything goes well. Dave, what do you 
> >> think?
> >
> >It's getting into shape, but it has a slight shortage of reviews. For
> >now, it's an all-Intel patch even though I _thought_ AMD had this
> >feature too. It's also purely for KVM and has some suggested-by's from
> >Sean, but no KVM acks on it.
> >
> >Sean is not exactly the quiet type about things, but it always warms me
> >heart to see an acked-by accompanying a suggested-by because it
> >indicates that the suggestion was heard and implemented properly.
> 
> Hi Sean, John,
> 
> Based on Dave's feedback, could you please review this series and provide your
> reviewed-by/acked-by if appropriate?

The initialization of default features is a bit gnarly and I think can be improved
without too much fuss, but otherwise this looks good.