From: Sean Christopherson <seanjc@google.com>
Load the guest's FPU state if userspace is accessing MSRs whose values
are managed by XSAVES. Introduce two helpers, kvm_{get,set}_xstate_msr(),
to facilitate access to such MSRs.

If MSRs supported in kvm_caps.supported_xss are passed through to the
guest, the guest MSRs are swapped with the host's before the vCPU exits to
userspace, and swapped back after it re-enters the kernel, before the next
VM-entry.

Because the modified code is also used for the KVM_GET_MSRS device ioctl(),
explicitly check that @vcpu is non-null before attempting to load guest
state. The XSAVE-managed MSRs cannot be retrieved via the device ioctl()
without loading guest FPU state (which doesn't exist).

Note that guest_cpuid_has() is not queried, as host userspace is allowed to
access MSRs that have not been exposed to the guest, e.g. it might do
KVM_SET_MSRS prior to KVM_SET_CPUID2.

The two helpers exist to make it explicit that accessing XSAVE-managed
MSRs requires special checks and handling to guarantee correct reads and
writes.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v14:
- s/rdmsrl/rdmsrq, s/wrmsrl/wrmsrq (Xin)
- return true in is_xstate_managed_msr() for MSR_IA32_S_CET
---
arch/x86/kvm/x86.c | 36 +++++++++++++++++++++++++++++++++++-
arch/x86/kvm/x86.h | 24 ++++++++++++++++++++++++
2 files changed, 59 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c15e8c00dc7d..7c0a07be6b64 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -136,6 +136,9 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
 static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
 
 static DEFINE_MUTEX(vendor_module_lock);
+static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
+static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
+
 struct kvm_x86_ops kvm_x86_ops __read_mostly;
 
 #define KVM_X86_OP(func) \
@@ -4566,6 +4569,22 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 }
 EXPORT_SYMBOL_GPL(kvm_get_msr_common);
 
+/*
+ * Returns true if the MSR in question is managed via XSTATE, i.e. is context
+ * switched with the rest of guest FPU state.
+ */
+static bool is_xstate_managed_msr(u32 index)
+{
+	switch (index) {
+	case MSR_IA32_S_CET:
+	case MSR_IA32_U_CET:
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+		return true;
+	default:
+		return false;
+	}
+}
+
 /*
  * Read or write a bunch of msrs. All parameters are kernel addresses.
  *
@@ -4576,11 +4595,26 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
 		    int (*do_msr)(struct kvm_vcpu *vcpu,
 				  unsigned index, u64 *data))
 {
+	bool fpu_loaded = false;
 	int i;
 
-	for (i = 0; i < msrs->nmsrs; ++i)
+	for (i = 0; i < msrs->nmsrs; ++i) {
+		/*
+		 * If userspace is accessing one or more XSTATE-managed MSRs,
+		 * temporarily load the guest's FPU state so that the guest's
+		 * MSR value(s) is resident in hardware, i.e. so that KVM can
+		 * get/set the MSR via RDMSR/WRMSR.
+		 */
+		if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&
+		    is_xstate_managed_msr(entries[i].index)) {
+			kvm_load_guest_fpu(vcpu);
+			fpu_loaded = true;
+		}
 		if (do_msr(vcpu, entries[i].index, &entries[i].data))
 			break;
+	}
+	if (fpu_loaded)
+		kvm_put_guest_fpu(vcpu);
 
 	return i;
 }
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index eb3088684e8a..34afe43579bb 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -701,4 +701,28 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, int cpl,
 
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
 
+/*
+ * Lock and/or reload guest FPU and access xstate MSRs. For accesses initiated
+ * by host, guest FPU is loaded in __msr_io(). For accesses initiated by guest,
+ * guest FPU should have been loaded already.
+ */
+
+static inline void kvm_get_xstate_msr(struct kvm_vcpu *vcpu,
+				      struct msr_data *msr_info)
+{
+	KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);
+	kvm_fpu_get();
+	rdmsrq(msr_info->index, msr_info->data);
+	kvm_fpu_put();
+}
+
+static inline void kvm_set_xstate_msr(struct kvm_vcpu *vcpu,
+				      struct msr_data *msr_info)
+{
+	KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);
+	kvm_fpu_get();
+	wrmsrq(msr_info->index, msr_info->data);
+	kvm_fpu_put();
+}
+
 #endif
--
2.47.3
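The batching logic added to __msr_io() can be exercised outside the kernel. The following is a minimal user-space sketch, not KVM code: struct vcpu, struct msr_entry, load_guest_fpu(), put_guest_fpu(), msr_io(), the supported_xss global and the two example callbacks are hypothetical stand-ins, and the MSR index values are assumed to match msr-index.h. It models the MSR set as posted in v14 (S_CET included) and shows the guest FPU being loaded at most once per batch, skipped entirely for a NULL vCPU (the device ioctl case), and put exactly once even when do_msr() stops the batch early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MSR indices, assumed to match arch/x86/include/asm/msr-index.h. */
#define MSR_IA32_TSC     0x010u  /* not XSTATE-managed */
#define MSR_IA32_U_CET   0x6a0u
#define MSR_IA32_S_CET   0x6a2u
#define MSR_IA32_PL0_SSP 0x6a4u
#define MSR_IA32_PL3_SSP 0x6a7u

/* Hypothetical stand-ins for struct kvm_vcpu and struct kvm_msr_entry. */
struct vcpu {
	bool fpu_in_use;  /* guest FPU state currently loaded */
	int fpu_loads;
	int fpu_puts;
};

struct msr_entry {
	uint32_t index;
	uint64_t data;
};

/* Stand-in for kvm_caps.supported_xss: any nonzero value enables the logic. */
static uint64_t supported_xss = 1;

/* Same MSR set as the v14 is_xstate_managed_msr(), without GNU case ranges. */
static bool is_xstate_managed_msr(uint32_t index)
{
	return index == MSR_IA32_S_CET || index == MSR_IA32_U_CET ||
	       (index >= MSR_IA32_PL0_SSP && index <= MSR_IA32_PL3_SSP);
}

static void load_guest_fpu(struct vcpu *v)
{
	assert(!v->fpu_in_use);  /* never loaded twice in one batch */
	v->fpu_in_use = true;
	v->fpu_loads++;
}

static void put_guest_fpu(struct vcpu *v)
{
	assert(v->fpu_in_use);
	v->fpu_in_use = false;
	v->fpu_puts++;
}

/* A nonzero return stops the batch, as with do_msr() in __msr_io(). */
typedef int (*do_msr_fn)(struct vcpu *v, uint32_t index, uint64_t *data);

/* Returns the number of entries processed successfully. */
static int msr_io(struct vcpu *v, struct msr_entry *entries, int n,
		  do_msr_fn do_msr)
{
	bool fpu_loaded = false;
	int i;

	for (i = 0; i < n; ++i) {
		/* Load at most once per batch, and never without a vCPU. */
		if (v && !fpu_loaded && supported_xss &&
		    is_xstate_managed_msr(entries[i].index)) {
			load_guest_fpu(v);
			fpu_loaded = true;
		}
		if (do_msr(v, entries[i].index, &entries[i].data))
			break;
	}
	if (fpu_loaded)
		put_guest_fpu(v);  /* also reached after an early break */
	return i;
}

/* Example callbacks: one always succeeds, one fails on PL0_SSP. */
static int ok_msr(struct vcpu *v, uint32_t index, uint64_t *data)
{
	(void)v;
	(void)index;
	*data = 0;
	return 0;
}

static int fail_on_pl0_ssp(struct vcpu *v, uint32_t index, uint64_t *data)
{
	(void)v;
	(void)data;
	return index == MSR_IA32_PL0_SSP;
}
```

Loading lazily and putting once after the loop means a single KVM_GET_MSRS or KVM_SET_MSRS call pays for at most one FPU load/put pair, no matter how many XSTATE-managed MSRs it contains.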
On 9/9/2025 5:39 PM, Chao Gao wrote:
> ...
>
> +	bool fpu_loaded = false;
>  	int i;
>
> -	for (i = 0; i < msrs->nmsrs; ++i)
> +	for (i = 0; i < msrs->nmsrs; ++i) {
> +		/*
> +		 * If userspace is accessing one or more XSTATE-managed MSRs,
> +		 * temporarily load the guest's FPU state so that the guest's
> +		 * MSR value(s) is resident in hardware, i.e. so that KVM can
> +		 * get/set the MSR via RDMSR/WRMSR.
> +		 */
> +		if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&
why not check vcpu->arch.guest_supported_xss?
On Wed, Sep 10, 2025 at 05:37:50PM +0800, Xiaoyao Li wrote:
>On 9/9/2025 5:39 PM, Chao Gao wrote:
>> ...
>>
>> Note that guest_cpuid_has() is not queried as host userspace is allowed to
>> access MSRs that have not been exposed to the guest, e.g. it might do
>> KVM_SET_MSRS prior to KVM_SET_CPUID2.
...
>> +	bool fpu_loaded = false;
>>  	int i;
>> -	for (i = 0; i < msrs->nmsrs; ++i)
>> +	for (i = 0; i < msrs->nmsrs; ++i) {
>> +		/*
>> +		 * If userspace is accessing one or more XSTATE-managed MSRs,
>> +		 * temporarily load the guest's FPU state so that the guest's
>> +		 * MSR value(s) is resident in hardware, i.e. so that KVM can
>> +		 * get/set the MSR via RDMSR/WRMSR.
>> +		 */
>> +		if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&
>
>why not check vcpu->arch.guest_supported_xss?
Looks like Sean anticipated someone would ask this question.
On Wed, Sep 10, 2025, Chao Gao wrote:
> On Wed, Sep 10, 2025 at 05:37:50PM +0800, Xiaoyao Li wrote:
> >On 9/9/2025 5:39 PM, Chao Gao wrote:
> >> ...
> >>
> >> Note that guest_cpuid_has() is not queried as host userspace is allowed to
> >> access MSRs that have not been exposed to the guest, e.g. it might do
> >> KVM_SET_MSRS prior to KVM_SET_CPUID2.
>
> ...
>
> >> +	bool fpu_loaded = false;
> >>  	int i;
> >> -	for (i = 0; i < msrs->nmsrs; ++i)
> >> +	for (i = 0; i < msrs->nmsrs; ++i) {
> >> +		/*
> >> +		 * If userspace is accessing one or more XSTATE-managed MSRs,
> >> +		 * temporarily load the guest's FPU state so that the guest's
> >> +		 * MSR value(s) is resident in hardware, i.e. so that KVM can
> >> +		 * get/set the MSR via RDMSR/WRMSR.
> >> +		 */
> >> +		if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&
> >
> >why not check vcpu->arch.guest_supported_xss?
>
> Looks like Sean anticipated someone would ask this question.
I don't think so; I'm pretty sure querying kvm_caps.supported_xss is a holdover
from the early days of this patch, e.g. before guest_cpu_cap_has() existed, and
potentially even before vcpu->arch.guest_supported_xss existed.

I'm pretty sure we can make this less weird and more accurate:
/*
 * Returns true if the MSR in question is managed via XSTATE, i.e. is context
 * switched with the rest of guest FPU state. Note! S_CET is _not_ context
 * switched via XSTATE even though it _is_ saved/restored via XSAVES/XRSTORS.
 * Because S_CET is loaded on VM-Enter and VM-Exit via dedicated VMCS fields,
 * the value saved/restored via XSTATE is always the host's value. That detail
 * is _extremely_ important, as the guest's S_CET must _never_ be resident in
 * hardware while executing in the host. Loading guest values for U_CET and
 * PL[0-3]_SSP while executing in the kernel is safe, as U_CET is specific to
 * userspace, and PL[0-3]_SSP are only consumed when transitioning to lower
 * privilege levels, i.e. are effectively only consumed by userspace as well.
 */
static bool is_xstate_managed_msr(struct kvm_vcpu *vcpu, u32 msr)
{
	if (!vcpu)
		return false;

	switch (msr) {
	case MSR_IA32_U_CET:
		return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) ||
		       guest_cpu_cap_has(vcpu, X86_FEATURE_IBT);
	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
		return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
	default:
		return false;
	}
}
Which is very desirable because the KVM_{G,S}ET_ONE_REG path also needs to
load/put the FPU, as found via a WIP selftest that tripped:
KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);
And if we simplify is_xstate_managed_msr(), then the accessors can also do:
KVM_BUG_ON(!is_xstate_managed_msr(vcpu, msr_info->index), vcpu->kvm);
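Sean's two suggestions, a vCPU-aware predicate that excludes S_CET and accessors that assert both preconditions, can likewise be modeled in plain C. This is a hedged user-space sketch: the struct vcpu fields are hypothetical stand-ins for guest_cpu_cap_has(), guest_fpu.fpstate->in_use, the physical MSRs and KVM_BUG_ON(); get_xstate_msr() is an illustrative accessor rather than the kernel helper; and the MSR indices are assumed to match msr-index.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MSR indices, assumed to match arch/x86/include/asm/msr-index.h. */
#define MSR_IA32_U_CET   0x6a0u
#define MSR_IA32_S_CET   0x6a2u
#define MSR_IA32_PL0_SSP 0x6a4u
#define MSR_IA32_PL3_SSP 0x6a7u

/* Hypothetical vCPU model; every field is an illustrative stand-in. */
struct vcpu {
	bool cap_shstk;      /* guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) */
	bool cap_ibt;        /* guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) */
	bool fpu_in_use;     /* vcpu->arch.guest_fpu.fpstate->in_use */
	bool bugged;         /* set where KVM would hit KVM_BUG_ON() */
	uint64_t hw_msr[8];  /* fake registers for MSRs 0x6a0..0x6a7 */
};

/* Mirrors the proposed predicate: S_CET is deliberately excluded. */
static bool is_xstate_managed_msr(const struct vcpu *v, uint32_t msr)
{
	if (!v)
		return false;
	if (msr == MSR_IA32_U_CET)
		return v->cap_shstk || v->cap_ibt;
	if (msr >= MSR_IA32_PL0_SSP && msr <= MSR_IA32_PL3_SSP)
		return v->cap_shstk;
	return false;
}

/* Hypothetical accessor with both sanity checks from the reply above. */
static bool get_xstate_msr(struct vcpu *v, uint32_t msr, uint64_t *data)
{
	if (!v->fpu_in_use || !is_xstate_managed_msr(v, msr)) {
		v->bugged = true;
		return false;
	}
	/* Only U_CET and PL0..PL3_SSP get here, so the index is in range. */
	assert(msr >= MSR_IA32_U_CET && msr <= MSR_IA32_PL3_SSP);
	*data = v->hw_msr[msr - MSR_IA32_U_CET];  /* stands in for rdmsrq() */
	return true;
}
```

The sketch records a flag and returns false so that the checks can be tested; in KVM, KVM_BUG_ON() instead warns and marks the VM as bugged.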
On 9/10/2025 7:18 PM, Chao Gao wrote:
> On Wed, Sep 10, 2025 at 05:37:50PM +0800, Xiaoyao Li wrote:
>> On 9/9/2025 5:39 PM, Chao Gao wrote:
>>> ...
>>>
>>> Note that guest_cpuid_has() is not queried as host userspace is allowed to
>>> access MSRs that have not been exposed to the guest, e.g. it might do
>>> KVM_SET_MSRS prior to KVM_SET_CPUID2.
>
> ...
>
>>> +	bool fpu_loaded = false;
>>>  	int i;
>>> -	for (i = 0; i < msrs->nmsrs; ++i)
>>> +	for (i = 0; i < msrs->nmsrs; ++i) {
>>> +		/*
>>> +		 * If userspace is accessing one or more XSTATE-managed MSRs,
>>> +		 * temporarily load the guest's FPU state so that the guest's
>>> +		 * MSR value(s) is resident in hardware, i.e. so that KVM can
>>> +		 * get/set the MSR via RDMSR/WRMSR.
>>> +		 */
>>> +		if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&
>>
>> why not check vcpu->arch.guest_supported_xss?
>
> Looks like Sean anticipated someone would ask this question.
Here it determines whether to call kvm_load_guest_fpu().

- Based on kvm_caps.supported_xss, it will always load the guest FPU.
- Based on vcpu->arch.guest_supported_xss, it depends on whether
  userspace calls KVM_SET_CPUID2 and whether it enables any XSS feature.

So the difference is when no XSS feature is enabled for the VM.

In this case, checking vcpu->arch.guest_supported_xss would skip
kvm_load_guest_fpu(). As a result, GET_MSR would get userspace's value
and SET_MSR would change userspace's value, if the MSR access is
eventually allowed by the later do_msr() callback. Is my understanding
correct?
On Wed, Sep 10, 2025 at 09:46:01PM +0800, Xiaoyao Li wrote:
>On 9/10/2025 7:18 PM, Chao Gao wrote:
>> On Wed, Sep 10, 2025 at 05:37:50PM +0800, Xiaoyao Li wrote:
>> > On 9/9/2025 5:39 PM, Chao Gao wrote:
>> > > ...
>> > >
>> > > Note that guest_cpuid_has() is not queried as host userspace is allowed to
>> > > access MSRs that have not been exposed to the guest, e.g. it might do
>> > > KVM_SET_MSRS prior to KVM_SET_CPUID2.
>>
>> ...
>>
>> > > +	bool fpu_loaded = false;
>> > >  	int i;
>> > > -	for (i = 0; i < msrs->nmsrs; ++i)
>> > > +	for (i = 0; i < msrs->nmsrs; ++i) {
>> > > +		/*
>> > > +		 * If userspace is accessing one or more XSTATE-managed MSRs,
>> > > +		 * temporarily load the guest's FPU state so that the guest's
>> > > +		 * MSR value(s) is resident in hardware, i.e. so that KVM can
>> > > +		 * get/set the MSR via RDMSR/WRMSR.
>> > > +		 */
>> > > +		if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&
>> >
>> > why not check vcpu->arch.guest_supported_xss?
>>
>> Looks like Sean anticipated someone would ask this question.
>
>Here it determines whether to call kvm_load_guest_fpu().
>
>- Based on kvm_caps.supported_xss, it will always load the guest FPU.
>- Based on vcpu->arch.guest_supported_xss, it depends on whether userspace
>calls KVM_SET_CPUID2 and whether it enables any XSS feature.
>
>So the difference is when no XSS feature is enabled for the VM.
>
>In this case, checking vcpu->arch.guest_supported_xss would skip
>kvm_load_guest_fpu(). As a result, GET_MSR would get userspace's value
>and SET_MSR would change userspace's value, if the MSR access is eventually
>allowed by the later do_msr() callback. Is my understanding correct?
Actually, there will be no functional issue.

Those MSR accesses are always "rejected" with KVM_MSR_RET_UNSUPPORTED by
__kvm_set/get_msr() and then fixed up in kvm_do_msr_access() if they are
"host_initiated". KVM doesn't access any hardware MSRs in the process.

Using vcpu->arch.guest_supported_xss here also works, but its correctness
isn't as obvious for this special case.
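Chao's point, that an unsupported host-initiated access is rejected before any RDMSR/WRMSR, so the FPU load decision cannot affect the result, can be checked with a toy model. This sketch rests on simplifying assumptions: struct vcpu, get_msr(), do_msr_access(), host_read() and the return codes are hypothetical stand-ins for __kvm_get_msr(), kvm_do_msr_access() and KVM_MSR_RET_UNSUPPORTED; the fixup is reduced to "a host-initiated unsupported read succeeds and returns 0"; and only U_CET reads are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_U_CET 0x6a0u  /* assumed to match msr-index.h */

/* Hypothetical return codes; the values are arbitrary. */
enum { MSR_RET_OK = 0, MSR_RET_UNSUPPORTED = 2 };

/* Hypothetical vCPU model; field names are illustrative only. */
struct vcpu {
	uint64_t host_supported_xss;  /* like kvm_caps.supported_xss */
	uint64_t guest_supported_xss; /* like vcpu->arch.guest_supported_xss */
	bool guest_has_cet;           /* CET exposed via KVM_SET_CPUID2 */
	bool fpu_loaded;              /* guest FPU state resident in "hardware" */
	int hw_accesses;              /* RDMSR/WRMSR actually executed */
};

/* Like __kvm_get_msr(): bail out before any hardware access if unsupported. */
static int get_msr(struct vcpu *v, uint32_t msr, uint64_t *data)
{
	assert(msr == MSR_IA32_U_CET);  /* the sketch only models U_CET */
	if (!v->guest_has_cet)
		return MSR_RET_UNSUPPORTED;
	v->hw_accesses++;
	*data = v->fpu_loaded ? 0x1 : 0x2;  /* 0x1 = guest value, 0x2 = host value */
	return MSR_RET_OK;
}

/* Like kvm_do_msr_access(), with the fixup simplified as described above. */
static int do_msr_access(struct vcpu *v, uint32_t msr, uint64_t *data,
			 bool host_initiated)
{
	int ret = get_msr(v, msr, data);

	if (ret == MSR_RET_UNSUPPORTED && host_initiated) {
		*data = 0;
		return MSR_RET_OK;
	}
	return ret;
}

/* Host-initiated read; gate_on_guest picks which XSS mask decides the load. */
static uint64_t host_read(struct vcpu *v, uint32_t msr, bool gate_on_guest)
{
	uint64_t xss = gate_on_guest ? v->guest_supported_xss
				     : v->host_supported_xss;
	uint64_t data = ~0ull;
	int ret;

	v->fpu_loaded = xss != 0;  /* mirrors the load decision in __msr_io() */
	ret = do_msr_access(v, msr, &data, true);
	assert(ret == MSR_RET_OK);  /* host-initiated reads always succeed here */
	(void)ret;
	v->fpu_loaded = false;      /* mirrors kvm_put_guest_fpu() */
	return data;
}
```

With CET hidden from the guest, both gates return 0 and execute no RDMSR; with CET exposed (and XSS bits set for the guest), both gates load the FPU and return the guest's value.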