[PATCH] x86/fpu/xstate: clear XSAVE features if DISABLED_MASK set

Jon Kohler posted 1 patch 2 years, 8 months ago
arch/x86/kernel/fpu/xstate.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
[PATCH] x86/fpu/xstate: clear XSAVE features if DISABLED_MASK set
Posted by Jon Kohler 2 years, 8 months ago
Respect DISABLED_MASK when clearing XSAVE features, such that features
that are disabled do not appear in the xfeatures mask.

This is important for kvm_load_{guest|host}_xsave_state, which look
at host_xcr0 and will do an expensive xsetbv when the guest and host
do not match.

A prime example if CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS is disabled,
the guest OS will not see PKU masked; however, the guest will incur
xsetbv since the host mask will never match the guest, even though
DISABLED_MASK16 has DISABLE_PKU set.

Signed-off-by: Jon Kohler <jon@nutanix.com>
CC: kvm@vger.kernel.org
CC: Sean Christopherson <seanjc@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kernel/fpu/xstate.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 0bab497c9436..211ef82b53e3 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -798,7 +798,8 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
 		unsigned short cid = xsave_cpuid_features[i];

 		/* Careful: X86_FEATURE_FPU is 0! */
-		if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid))
+		if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid) ||
+		    DISABLED_MASK_BIT_SET(cid))
 			fpu_kernel_cfg.max_features &= ~BIT_ULL(i);
 	}

--
2.30.1 (Apple Git-130)
Re: [PATCH] x86/fpu/xstate: clear XSAVE features if DISABLED_MASK set
Posted by Dave Hansen 2 years, 8 months ago
On 5/30/23 13:01, Jon Kohler wrote:
> Respect DISABLED_MASK when clearing XSAVE features, such that features
> that are disabled do not appear in the xfeatures mask.

One sanity check that I'd suggest adopting is "How many other users in
the code do this?"  How many DISABLED_MASK_BIT_SET() users are there?

> This is important for kvm_load_{guest|host}_xsave_state, which look
> at host_xcr0 and will do an expensive xsetbv when the guest and host
> do not match.

Is that the only problem?  kvm_load_guest_xsave_state() seems to have
some #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS code and I can't
imagine that KVM guests can even use PKRU if this code is compiled out.

Also, this will set XFEATURE_PKRU in xcr0 and go to the trouble of
XSAVE/XRSTOR'ing it all over the place even for regular tasks.

> A prime example if CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS is disabled,
> the guest OS will not see PKU masked; however, the guest will incur
> xsetbv since the host mask will never match the guest, even though
> DISABLED_MASK16 has DISABLE_PKU set.

OK, so you care because you're seeing KVM go slow.  You tracked it down
to lots of XSETBV's?  You said, "what the heck, why is it doing XSETBV
so much?" and tracked it down to 'max_features' which we use to populate
XCR0?

> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 0bab497c9436..211ef82b53e3 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -798,7 +798,8 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
>  		unsigned short cid = xsave_cpuid_features[i];
> 
>  		/* Careful: X86_FEATURE_FPU is 0! */
> -		if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid))
> +		if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid) ||
> +		    DISABLED_MASK_BIT_SET(cid))
>  			fpu_kernel_cfg.max_features &= ~BIT_ULL(i);
>  	}

I _think_ I'd rather this just be cpu_feature_enabled(cid) rather than
using DISABLED_MASK_BIT_SET() directly.

But, I guess this probably also isn't a big deal for _most_ people.  Any
sane distro kernel will just set CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
since it's pretty widespread on modern CPUs and works across Intel and
AMD now.
Re: [PATCH] x86/fpu/xstate: clear XSAVE features if DISABLED_MASK set
Posted by Jon Kohler 2 years, 8 months ago

> On May 30, 2023, at 6:22 PM, Dave Hansen <dave.hansen@intel.com> wrote:
> 
> On 5/30/23 13:01, Jon Kohler wrote:
>> Respect DISABLED_MASK when clearing XSAVE features, such that features
>> that are disabled do not appear in the xfeatures mask.
> 
> One sanity check that I'd suggest adopting is "How many other users in
> the code do this?"  How many DISABLED_MASK_BIT_SET() users are there?

Good tip, thank you. Just cpu_feature_enabled(), though I felt that using
DISABLED_MASK_BIT_SET() really does capture *exactly* what I’m trying to
do here. 

Happy to take suggestions, perhaps !cpu_feature_enabled(cid) instead?

Or, I did noodle with the idea of making a cpu_feature_disabled() as an 
alias for DISABLED_MASK_BIT_SET(), but that felt like bloating the change
for little gain.

> 
>> This is important for kvm_load_{guest|host}_xsave_state, which look
>> at host_xcr0 and will do an expensive xsetbv when the guest and host
>> do not match.
> 
> Is that the only problem?  kvm_load_guest_xsave_state() seems to have
> some #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS code and I can't
> imagine that KVM guests can even use PKRU if this code is compiled out.
> 
> Also, this will set XFEATURE_PKRU in xcr0 and go to the trouble of
> XSAVE/XRSTOR'ing it all over the place even for regular tasks.

Correct, KVM isn’t the only beneficiary here as you rightly pointed out. I’m
happy to clarify that in the commit msg if you’d like. 

Also, ack on the ifdef’s, I added those myself way back when, this change is
an addendum to that one to nuke the xsetbv overhead.

>> A prime example if CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS is disabled,
>> the guest OS will not see PKU masked; however, the guest will incur
>> xsetbv since the host mask will never match the guest, even though
>> DISABLED_MASK16 has DISABLE_PKU set.
> 
> OK, so you care because you're seeing KVM go slow.  You tracked it down
> to lots of XSETBV's?  You said, "what the heck, why is it doing XSETBV
> so much?" and tracked it down to 'max_features' which we use to populate
> XCR0?

Yes and Yes, that is exactly how I arrived here. kvm_load_{guest|host}_xsave_state
is on the critical path for vmexit / vmentry. This overhead sticks out like a sore
thumb when looking at perf top or visualizing threads with a flamegraph if the guest,
for whatever reason, has a different xcr0 than the host. That is easy to do if guests
have PKU masked out.

> 
>> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
>> index 0bab497c9436..211ef82b53e3 100644
>> --- a/arch/x86/kernel/fpu/xstate.c
>> +++ b/arch/x86/kernel/fpu/xstate.c
>> @@ -798,7 +798,8 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
>> 		unsigned short cid = xsave_cpuid_features[i];
>> 
>> 		/* Careful: X86_FEATURE_FPU is 0! */
>> -		if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid))
>> +		if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid) ||
>> +		    DISABLED_MASK_BIT_SET(cid))
>> 			fpu_kernel_cfg.max_features &= ~BIT_ULL(i);
>> 	}
> 
> I _think_ I'd rather this just be cpu_feature_enabled(cid) rather than
> using DISABLED_MASK_BIT_SET() directly.
> 
> But, I guess this probably also isn't a big deal for _most_ people.  Any
> sane distro kernel will just set CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
> since it's pretty widespread on modern CPUs and works across Intel and
> AMD now.

Ack, I’m using PKU as the key example here, but looking forward this is more of a
correctness thing than anything else. If for any reason, any xsave feature is disabled
In the way that PKU is disabled, it will slip thru the cracks.

If it would make it cleaner, I’m happy to drop the example from the commit msg to
prevent any confusion that this is PKU specific in any way.

Thoughts?

Re: [PATCH] x86/fpu/xstate: clear XSAVE features if DISABLED_MASK set
Posted by Sean Christopherson 2 years, 8 months ago
On Wed, May 31, 2023, Jon Kohler wrote:
> 
> > On May 30, 2023, at 6:22 PM, Dave Hansen <dave.hansen@intel.com> wrote:
> > 
> > On 5/30/23 13:01, Jon Kohler wrote:
> > Is that the only problem?  kvm_load_guest_xsave_state() seems to have
> > some #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS code and I can't
> > imagine that KVM guests can even use PKRU if this code is compiled out.

...

> >> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> >> index 0bab497c9436..211ef82b53e3 100644
> >> --- a/arch/x86/kernel/fpu/xstate.c
> >> +++ b/arch/x86/kernel/fpu/xstate.c
> >> @@ -798,7 +798,8 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
> >> 		unsigned short cid = xsave_cpuid_features[i];
> >> 
> >> 		/* Careful: X86_FEATURE_FPU is 0! */
> >> -		if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid))
> >> +		if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid) ||
> >> +		    DISABLED_MASK_BIT_SET(cid))
> >> 			fpu_kernel_cfg.max_features &= ~BIT_ULL(i);
> >> 	}
> > 
> > I _think_ I'd rather this just be cpu_feature_enabled(cid) rather than
> > using DISABLED_MASK_BIT_SET() directly.

+1, xstate.c uses cpu_feature_enabled() all over the place, and IMO effectively
open coding cpu_feature_enabled() yields less intuitive code.

And on the KVM side, we can and should replace the #ifdef with cpu_feature_enabled()
(I'll post a patch), as modern compilers are clever enough to completely optimize
out the code when CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS=n.  At that point, using
cpu_feature_enabled() in both KVM and xstate.c will provide a nice bit of symmetry.

Caveat #1: cpu_feature_enabled() has a flaw that's relevant to this code: in the
unlikely scenario that the compiler doesn't resolve "cid" to a compile-time
constant value, cpu_feature_enabled() won't query DISABLED_MASK_BIT_SET().  I don't
see any other use of cpu_feature_enabled() without a hardcoded X86_FEATURE_*, and
the below compiles with my config, so I think/hope we can just require a compile-time
constant when using cpu_feature_enabled().

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index ce0c8f7d3218..886200fbf8d9 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -141,8 +141,11 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
  * supporting a possible guest feature where host support for it
  * is not relevant.
  */
-#define cpu_feature_enabled(bit)       \
-       (__builtin_constant_p(bit) && DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit))
+#define cpu_feature_enabled(bit)                               \
+({                                                             \
+       BUILD_BUG_ON(!__builtin_constant_p(bit));               \
+       DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit);   \
+})
 
 #define boot_cpu_has(bit)      cpu_has(&boot_cpu_data, bit)
 
Caveat #2: Using cpu_feature_enabled() could subtly break KVM, as KVM advertises
support for features based on boot_cpu_data.  E.g. if a feature were disabled by
Kconfig but present in hardware, KVM would allow the guest to use the feature
without properly context switching the data.  PKU isn't problematic because KVM
explicitly gates PKU on boot_cpu_has(X86_FEATURE_OSPKE), but KVM learned that
lesson the hard way (see commit c469268cd523, "KVM: x86: block guest protection
keys unless the host has them enabled").  Exposing a feature that's disabled in
the host isn't completely absurd, e.g. KVM already effectively does this for MPX.
The only reason using cpu_feature_enabled() wouldn't be problematic for MPX is
because there's no longer a Kconfig for MPX.

I'm totally ok gating xfeature bits on cpu_feature_enabled(), but there should be
a prep patch for KVM to clear features bits in kvm_cpu_caps if the corresponding
XCR0/XSS bit is not set in the host.  If KVM ever wants to expose an xstate feature
(other than MPX) that's disabled in the host, then we can revisit
fpu__init_system_xstate().  But we need to ensure the "failure" mode is that
KVM doesn't advertise the feature, as opposed to advertising a feature without
without context switching its data.

> > But, I guess this probably also isn't a big deal for _most_ people.  Any
> > sane distro kernel will just set CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
> > since it's pretty widespread on modern CPUs and works across Intel and
> > AMD now.
> 
> Ack, I’m using PKU as the key example here, but looking forward this is more of a
> correctness thing than anything else. If for any reason, any xsave feature is disabled
> In the way that PKU is disabled, it will slip thru the cracks.

I'd be careful about billing this as a correctness thing.  See above.