From: Sean Christopherson <seanjc@google.com>

Load the guest's FPU state if userspace is accessing MSRs whose values
are managed by XSAVES. Introduce two helpers, kvm_{get,set}_xstate_msr(),
to facilitate access to such MSRs.

If MSRs covered by kvm_caps.supported_xss are passed through to the guest,
the guest's values are swapped with the host's before the vCPU exits to
userspace, and swapped back after it re-enters the kernel, before the next
VM-entry.

Because the modified code is also used for the KVM_GET_MSRS device ioctl(),
explicitly check that @vcpu is non-NULL before attempting to load guest
state. The XSAVE-managed MSRs cannot be retrieved via the device ioctl()
without loading guest FPU state (which doesn't exist).

Note that guest_cpuid_has() is not queried as host userspace is allowed to
access MSRs that have not been exposed to the guest, e.g. it might do
KVM_SET_MSRS prior to KVM_SET_CPUID2.

The two helpers are placed here to make it explicit that accessing
XSAVE-managed MSRs requires special checks and handling to guarantee that
reads and writes of the MSRs are correct.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v14:
- s/rdmsrl/rdmsrq, s/wrmsrl/wrmsrq (Xin)
- return true in is_xstate_managed_msr() for MSR_IA32_S_CET
---
arch/x86/kvm/x86.c | 36 +++++++++++++++++++++++++++++++++++-
arch/x86/kvm/x86.h | 24 ++++++++++++++++++++++++
2 files changed, 59 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c15e8c00dc7d..7c0a07be6b64 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -136,6 +136,9 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
static DEFINE_MUTEX(vendor_module_lock);
+static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
+static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
+
struct kvm_x86_ops kvm_x86_ops __read_mostly;
#define KVM_X86_OP(func) \
@@ -4566,6 +4569,22 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
}
EXPORT_SYMBOL_GPL(kvm_get_msr_common);
+/*
+ * Returns true if the MSR in question is managed via XSTATE, i.e. is context
+ * switched with the rest of guest FPU state.
+ */
+static bool is_xstate_managed_msr(u32 index)
+{
+	switch (index) {
+	case MSR_IA32_S_CET:
+	case MSR_IA32_U_CET:
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+		return true;
+	default:
+		return false;
+	}
+}
+
/*
* Read or write a bunch of msrs. All parameters are kernel addresses.
*
@@ -4576,11 +4595,26 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
 		    int (*do_msr)(struct kvm_vcpu *vcpu,
 				  unsigned index, u64 *data))
 {
+	bool fpu_loaded = false;
 	int i;

-	for (i = 0; i < msrs->nmsrs; ++i)
+	for (i = 0; i < msrs->nmsrs; ++i) {
+		/*
+		 * If userspace is accessing one or more XSTATE-managed MSRs,
+		 * temporarily load the guest's FPU state so that the guest's
+		 * MSR value(s) is resident in hardware, i.e. so that KVM can
+		 * get/set the MSR via RDMSR/WRMSR.
+		 */
+		if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&
+		    is_xstate_managed_msr(entries[i].index)) {
+			kvm_load_guest_fpu(vcpu);
+			fpu_loaded = true;
+		}
 		if (do_msr(vcpu, entries[i].index, &entries[i].data))
 			break;
+	}
+	if (fpu_loaded)
+		kvm_put_guest_fpu(vcpu);

 	return i;
 }
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index eb3088684e8a..34afe43579bb 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -701,4 +701,28 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, int cpl,
int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
+/*
+ * Lock and/or reload guest FPU and access xstate MSRs. For accesses initiated
+ * by host, guest FPU is loaded in __msr_io(). For accesses initiated by guest,
+ * guest FPU should have been loaded already.
+ */
+
+static inline void kvm_get_xstate_msr(struct kvm_vcpu *vcpu,
+				      struct msr_data *msr_info)
+{
+	KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);
+	kvm_fpu_get();
+	rdmsrq(msr_info->index, msr_info->data);
+	kvm_fpu_put();
+}
+
+static inline void kvm_set_xstate_msr(struct kvm_vcpu *vcpu,
+				      struct msr_data *msr_info)
+{
+	KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);
+	kvm_fpu_get();
+	wrmsrq(msr_info->index, msr_info->data);
+	kvm_fpu_put();
+}
+
#endif
--
2.47.3
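
The load-once/put-once flow in the __msr_io() hunk can be sketched in plain userspace C. This is only a rough model under stated assumptions: struct mock_vcpu, struct mock_msr_entry, mock_do_msr() and mock_msr_io() are hypothetical stand-ins that count load/put calls instead of touching real FPU state, and the kvm_caps.supported_xss check from the patch is left out for brevity. The MSR indices are the architectural CET values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural CET MSR indices. */
#define MSR_IA32_U_CET   0x6a0u
#define MSR_IA32_S_CET   0x6a2u
#define MSR_IA32_PL0_SSP 0x6a4u
#define MSR_IA32_PL3_SSP 0x6a7u

/* Hypothetical stand-in for struct kvm_vcpu: only counts load/put calls. */
struct mock_vcpu {
	int fpu_loads;
	int fpu_puts;
};

/* Hypothetical stand-in for struct kvm_msr_entry. */
struct mock_msr_entry {
	uint32_t index;
	uint64_t data;
};

/* Same classification as the v14 patch, written without case ranges. */
static bool is_xstate_managed_msr(uint32_t index)
{
	return index == MSR_IA32_S_CET || index == MSR_IA32_U_CET ||
	       (index >= MSR_IA32_PL0_SSP && index <= MSR_IA32_PL3_SSP);
}

/* Hypothetical per-MSR callback: always reports success. */
static int mock_do_msr(struct mock_vcpu *vcpu, uint32_t index, uint64_t *data)
{
	(void)vcpu;
	(void)index;
	(void)data;
	return 0;
}

/*
 * Load the (mock) guest FPU lazily on the first XSTATE-managed MSR, keep it
 * loaded for the rest of the batch, and put it exactly once at the end.
 * Returns the number of entries processed.
 */
static int mock_msr_io(struct mock_vcpu *vcpu, struct mock_msr_entry *entries,
		       int nmsrs)
{
	bool fpu_loaded = false;
	int i;

	for (i = 0; i < nmsrs; ++i) {
		if (vcpu && !fpu_loaded &&
		    is_xstate_managed_msr(entries[i].index)) {
			vcpu->fpu_loads++;	/* models kvm_load_guest_fpu() */
			fpu_loaded = true;
		}
		if (mock_do_msr(vcpu, entries[i].index, &entries[i].data))
			break;
	}
	if (fpu_loaded)
		vcpu->fpu_puts++;		/* models kvm_put_guest_fpu() */

	return i;
}
```

In this model, a batch that mixes ordinary and XSTATE-managed MSRs triggers a single load and a single put, and a NULL vcpu (the device ioctl() case) never triggers a load.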
On 9/9/2025 5:39 PM, Chao Gao wrote:
> From: Sean Christopherson <seanjc@google.com>

...

> +	for (i = 0; i < msrs->nmsrs; ++i) {
> +		/*
> +		 * If userspace is accessing one or more XSTATE-managed MSRs,
> +		 * temporarily load the guest's FPU state so that the guest's
> +		 * MSR value(s) is resident in hardware, i.e. so that KVM can
> +		 * get/set the MSR via RDMSR/WRMSR.
> +		 */
> +		if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&

why not check vcpu->arch.guest_supported_xss?

> +		    is_xstate_managed_msr(entries[i].index)) {
> +			kvm_load_guest_fpu(vcpu);
> +			fpu_loaded = true;
> +		}

...
On Wed, Sep 10, 2025 at 05:37:50PM +0800, Xiaoyao Li wrote:
>On 9/9/2025 5:39 PM, Chao Gao wrote:
>> From: Sean Christopherson <seanjc@google.com>
>>
>> Note that guest_cpuid_has() is not queried as host userspace is allowed to
>> access MSRs that have not been exposed to the guest, e.g. it might do
>> KVM_SET_MSRS prior to KVM_SET_CPUID2.

...

>> +		if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&
>
>why not check vcpu->arch.guest_supported_xss?

Looks like Sean anticipated someone would ask this question.
On Wed, Sep 10, 2025, Chao Gao wrote:
> On Wed, Sep 10, 2025 at 05:37:50PM +0800, Xiaoyao Li wrote:
> >On 9/9/2025 5:39 PM, Chao Gao wrote:
> >> Note that guest_cpuid_has() is not queried as host userspace is allowed to
> >> access MSRs that have not been exposed to the guest, e.g. it might do
> >> KVM_SET_MSRS prior to KVM_SET_CPUID2.
>
> ...
>
> >> +		if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&
> >
> >why not check vcpu->arch.guest_supported_xss?
>
> Looks like Sean anticipated someone would ask this question.

I don't think so, I'm pretty sure querying kvm_caps.supported_xss is a
holdover from the early days of this patch, e.g. before guest_cpu_cap_has()
existed, and potentially even before vcpu->arch.guest_supported_xss existed.

I'm pretty sure we can make this less weird and more accurate:

/*
 * Returns true if the MSR in question is managed via XSTATE, i.e. is context
 * switched with the rest of guest FPU state.  Note!  S_CET is _not_ context
 * switched via XSTATE even though it _is_ saved/restored via XSAVES/XRSTORS.
 * Because S_CET is loaded on VM-Enter and VM-Exit via dedicated VMCS fields,
 * the value saved/restored via XSTATE is always the host's value.  That detail
 * is _extremely_ important, as the guest's S_CET must _never_ be resident in
 * hardware while executing in the host.  Loading guest values for U_CET and
 * PL[0-3]_SSP while executing in the kernel is safe, as U_CET is specific to
 * userspace, and PL[0-3]_SSP are only consumed when transitioning to lower
 * privilege levels, i.e. are effectively only consumed by userspace as well.
 */
static bool is_xstate_managed_msr(struct kvm_vcpu *vcpu, u32 msr)
{
	if (!vcpu)
		return false;

	switch (msr) {
	case MSR_IA32_U_CET:
		return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) ||
		       guest_cpu_cap_has(vcpu, X86_FEATURE_IBT);
	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
		return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
	default:
		return false;
	}
}

Which is very desirable because the KVM_{G,S}ET_ONE_REG path also needs to
load/put the FPU, as found via a WIP selftest that tripped:

	KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);

And if we simplify is_xstate_managed_msr(), then the accessors can also do:

	KVM_BUG_ON(!is_xstate_managed_msr(vcpu, msr_info->index), vcpu->kvm);
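
The decision table implied by the proposed is_xstate_managed_msr() can be checked with a small standalone model. This is a sketch only: struct mock_vcpu_caps and its has_shstk/has_ibt flags are hypothetical stand-ins for guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) and guest_cpu_cap_has(vcpu, X86_FEATURE_IBT), and the case range is replaced with an explicit bounds check so the code does not depend on a compiler extension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural CET MSR indices. */
#define MSR_IA32_U_CET   0x6a0u
#define MSR_IA32_S_CET   0x6a2u
#define MSR_IA32_PL0_SSP 0x6a4u
#define MSR_IA32_PL3_SSP 0x6a7u

/* Hypothetical stand-in for the guest CPU capabilities of a vCPU. */
struct mock_vcpu_caps {
	bool has_shstk;	/* models X86_FEATURE_SHSTK */
	bool has_ibt;	/* models X86_FEATURE_IBT */
};

/*
 * Model of the proposed helper: no vCPU means no guest FPU state, S_CET is
 * never treated as XSTATE-managed, U_CET needs SHSTK or IBT, and
 * PL[0-3]_SSP need SHSTK.
 */
static bool mock_is_xstate_managed_msr(const struct mock_vcpu_caps *vcpu,
				       uint32_t msr)
{
	if (!vcpu)
		return false;

	if (msr == MSR_IA32_U_CET)
		return vcpu->has_shstk || vcpu->has_ibt;

	if (msr >= MSR_IA32_PL0_SSP && msr <= MSR_IA32_PL3_SSP)
		return vcpu->has_shstk;

	return false;
}
```

With an IBT-only guest, for example, the model reports U_CET as XSTATE-managed but not PL0_SSP, and S_CET is rejected regardless of the guest's capabilities.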
On 9/10/2025 7:18 PM, Chao Gao wrote:
> On Wed, Sep 10, 2025 at 05:37:50PM +0800, Xiaoyao Li wrote:
>> On 9/9/2025 5:39 PM, Chao Gao wrote:
>>> Note that guest_cpuid_has() is not queried as host userspace is allowed to
>>> access MSRs that have not been exposed to the guest, e.g. it might do
>>> KVM_SET_MSRS prior to KVM_SET_CPUID2.
>
> ...
>
>>> +		if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&
>>
>> why not check vcpu->arch.guest_supported_xss?
>
> Looks like Sean anticipated someone would ask this question.

Here it determines whether to call kvm_load_guest_fpu().

- Based on kvm_caps.supported_xss, it will always load the guest FPU.
- Based on vcpu->arch.guest_supported_xss, it depends on whether userspace
  calls KVM_SET_CPUID2 and whether it enables any XSS feature.

So the difference is when no XSS feature is enabled for the VM.

In this case, if checking vcpu->arch.guest_supported_xss, it will skip
kvm_load_guest_fpu(). That would result in GET_MSR getting userspace's value
and SET_MSR changing userspace's value, when the MSR access is eventually
allowed in the later do_msr() callback.

Is my understanding correct?
On Wed, Sep 10, 2025 at 09:46:01PM +0800, Xiaoyao Li wrote:
>On 9/10/2025 7:18 PM, Chao Gao wrote:
>> On Wed, Sep 10, 2025 at 05:37:50PM +0800, Xiaoyao Li wrote:
>> > On 9/9/2025 5:39 PM, Chao Gao wrote:

...

>> > > +		if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&
>> >
>> > why not check vcpu->arch.guest_supported_xss?
>>
>> Looks like Sean anticipated someone would ask this question.
>
>Here it determines whether to call kvm_load_guest_fpu().
>
>- Based on kvm_caps.supported_xss, it will always load the guest FPU.
>- Based on vcpu->arch.guest_supported_xss, it depends on whether userspace
>  calls KVM_SET_CPUID2 and whether it enables any XSS feature.
>
>So the difference is when no XSS feature is enabled for the VM.
>
>In this case, if checking vcpu->arch.guest_supported_xss, it will skip
>kvm_load_guest_fpu(). That would result in GET_MSR getting userspace's value
>and SET_MSR changing userspace's value, when the MSR access is eventually
>allowed in the later do_msr() callback.
>
>Is my understanding correct?

Actually, there will be no functional issue. Those MSR accesses are always
"rejected" with KVM_MSR_RET_UNSUPPORTED by __kvm_set/get_msr() and get fixed
up if they are "host_initiated" in kvm_do_msr_access(). KVM doesn't access
any hardware MSRs in the process.

Using vcpu->arch.guest_supported_xss here also works, but the correctness
isn't that obvious for this special case.
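
The "rejected, then fixed up" flow described above can be modelled roughly as follows. This is a hedged sketch, not KVM's code: MOCK_MSR_RET_UNSUPPORTED, struct mock_vcpu_state, mock_get_msr() and mock_do_msr_access() are hypothetical names, the numeric return value is arbitrary, and the fix-up policy shown (a host-initiated read of an unsupported MSR succeeds with a zero value) is an assumption made for illustration. The point it demonstrates is the one from the reply: the hardware access counter stays at zero when the guest lacks the feature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical return code standing in for KVM_MSR_RET_UNSUPPORTED. */
#define MOCK_MSR_RET_UNSUPPORTED 2

/* Hypothetical vCPU model: feature flag plus a hardware access counter. */
struct mock_vcpu_state {
	bool has_cet;		/* guest was given a CET feature */
	int hw_accesses;	/* counts modelled RDMSR executions */
};

/* Models the inner getter: bail out before touching hardware. */
static int mock_get_msr(struct mock_vcpu_state *vcpu, uint64_t *data)
{
	if (!vcpu->has_cet)
		return MOCK_MSR_RET_UNSUPPORTED;

	vcpu->hw_accesses++;	/* models the RDMSR in kvm_get_xstate_msr() */
	*data = 0x1234;		/* arbitrary placeholder value */
	return 0;
}

/*
 * Models the outer wrapper: an unsupported access is fixed up to success
 * only when it is host-initiated and the value is zero (assumed policy).
 */
static int mock_do_msr_access(struct mock_vcpu_state *vcpu, uint64_t *data,
			      bool host_initiated)
{
	int ret;

	*data = 0;
	ret = mock_get_msr(vcpu, data);
	if (ret != MOCK_MSR_RET_UNSUPPORTED)
		return ret;

	if (host_initiated && !*data)
		return 0;

	return ret;
}
```

In this model, whether the FPU was loaded beforehand makes no difference for a guest without CET, because the hardware read is never reached.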