[v2] KVM: EFER.LMSLE cleanup

[PATCH v2 0/3] KVM: EFER.LMSLE cleanup

Posted by Jim Mattson 3 years, 6 months ago

KVM has never properly virtualized EFER.LMSLE. However, when the
"nested" module parameter is set, KVM lets the guest set EFER.LMSLE.
Ostensibly, this is so that SLES11 Xen 4.0 will boot as a nested
hypervisor.

KVM passes EFER.LMSLE to the hardware through the VMCB, so
the setting works most of the time, but the KVM instruction emulator
completely ignores the bit, so incorrect guest behavior is almost
assured.

With Zen3, AMD has abandoned EFER.LMSLE. KVM still allows it, though, as
long as "nested" is set. However, since the hardware doesn't support it,
the next VMRUN after the emulated WRMSR will fail with "invalid VMCB."

To clean things up, revert the hack that allowed a KVM guest to set
EFER.LMSLE, and enumerate CPUID.80000008H:EBX.EferLmsleUnsupported[bit
20] in KVM_GET_SUPPORTED_CPUID on SVM hosts.

Jim Mattson (3):
  Revert "KVM: SVM: Allow EFER.LMSLE to be set with nested svm"
  x86/cpufeatures: Introduce X86_FEATURE_NO_LMSLE
  KVM: SVM: Unconditionally enumerate EferLmsleUnsupported

 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/include/asm/msr-index.h   | 2 --
 arch/x86/kvm/svm/svm.c             | 3 ++-
 3 files changed, 3 insertions(+), 3 deletions(-)

v1 -> v2: Make no attempt to preserve existing behavior [Sean, Borislav]

-- 
2.37.3.968.ga6b4b080e4-goog

Re: [PATCH v2 0/3] KVM: EFER.LMSLE cleanup

Posted by Borislav Petkov 3 years, 6 months ago

On Tue, Sep 20, 2022 at 01:59:19PM -0700, Jim Mattson wrote:
> Jim Mattson (3):
>   Revert "KVM: SVM: Allow EFER.LMSLE to be set with nested svm"
>   x86/cpufeatures: Introduce X86_FEATURE_NO_LMSLE
>   KVM: SVM: Unconditionally enumerate EferLmsleUnsupported

Why do you need those two if you revert the hack? After the revert,
anything that tries to set LMSLE should get a #GP anyway, no?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH v2 0/3] KVM: EFER.LMSLE cleanup

Posted by Sean Christopherson 3 years, 6 months ago

On Tue, Sep 20, 2022, Borislav Petkov wrote:
> On Tue, Sep 20, 2022 at 01:59:19PM -0700, Jim Mattson wrote:
> > Jim Mattson (3):
> >   Revert "KVM: SVM: Allow EFER.LMSLE to be set with nested svm"
> >   x86/cpufeatures: Introduce X86_FEATURE_NO_LMSLE
> >   KVM: SVM: Unconditionally enumerate EferLmsleUnsupported
> 
> Why do you need those two if you revert the hack? After the revert,
> anything that tries to set LMSLE should get a #GP anyway, no?

Yes, but ideally KVM would explicitly tell the guest "you don't have LMSLE".
Probably a moot point, but at the same time I don't see a reason not to be
explicit.

Re: [PATCH v2 0/3] KVM: EFER.LMSLE cleanup

Posted by Jim Mattson 3 years, 6 months ago

On Tue, Sep 20, 2022 at 2:17 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Tue, Sep 20, 2022 at 01:59:19PM -0700, Jim Mattson wrote:
> > Jim Mattson (3):
> >   Revert "KVM: SVM: Allow EFER.LMSLE to be set with nested svm"
> >   x86/cpufeatures: Introduce X86_FEATURE_NO_LMSLE
> >   KVM: SVM: Unconditionally enumerate EferLmsleUnsupported
>
> Why do you need those two if you revert the hack? After the revert,
> anything that tries to set LMSLE should get a #GP anyway, no?

Reporting that CPUID bit gives us the right to raise #GP. AMD CPUs
(going way back) that don't report EferLmsleUnsupported do not raise
#GP.

Re: [PATCH v2 0/3] KVM: EFER.LMSLE cleanup

Posted by Borislav Petkov 3 years, 6 months ago

On Tue, Sep 20, 2022 at 09:36:18PM +0000, Sean Christopherson wrote:
> Yes, but ideally KVM would explicitly tell the guest "you don't have LMSLE".
> Probably a moot point, but at the same time I don't see a reason not to be
> explicit.

Yes but...

On Tue, Sep 20, 2022 at 02:36:34PM -0700, Jim Mattson wrote:
> Reporting that CPUID bit gives us the right to raise #GP. AMD CPUs
> (going way back) that don't report EferLmsleUnsupported do not raise
> #GP.

... what does "gives us the right" mean exactly?

I'm pretty sure I'm missing something about how KVM works but wouldn't
it raise a guest #GP when the guest tries to set an unsupported EFER
bit? I.e., why do you need to explicitly do

	kvm_cpu_cap_set(X86_FEATURE_NO_LMSLE);

and not handle this like any other EFER reserved bit?

Thx.

-- 
Regards/Gruss,
    Boris.

Re: [PATCH v2 0/3] KVM: EFER.LMSLE cleanup

Posted by Jim Mattson 3 years, 6 months ago

On Wed, Sep 21, 2022 at 2:28 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Tue, Sep 20, 2022 at 09:36:18PM +0000, Sean Christopherson wrote:
> > Yes, but ideally KVM would explicitly tell the guest "you don't have LMSLE".
> > Probably a moot point, but at the same time I don't see a reason not to be
> > explicit.
>
> Yes but...
>
> On Tue, Sep 20, 2022 at 02:36:34PM -0700, Jim Mattson wrote:
> > Reporting that CPUID bit gives us the right to raise #GP. AMD CPUs
> > (going way back) that don't report EferLmsleUnsupported do not raise
> > #GP.
>
> ... what does "gives us the right" mean exactly?
>
> I'm pretty sure I'm missing something about how KVM works but wouldn't
> it raise a guest #GP when the guest tries to set an unsupported EFER
> bit? I.e., why do you need to explicitly do
>
>         kvm_cpu_cap_set(X86_FEATURE_NO_LMSLE);
>
> and not handle this like any other EFER reserved bit?

EFER.LMSLE is not a reserved bit on AMD64 CPUs, unless
CPUID.80000008:EBX[20] is set (or you're running very, very old
hardware).

We really shouldn't just decide on a whim to treat EFER.LMSLE as
reserved under KVM. The guest CPUID information represents our
detailed contract with the guest software. By setting
CPUID.80000008:EBX[20], we are telling the guest that if it tries to
set EFER.LMSLE, we will raise a #GP.

If we don't set that bit in the guest CPUID information and we raise
#GP on an attempt to set EFER.LMSLE, the virtual hardware is
defective. We could document this behavior as an erratum, but since a
mechanism exists to declare that the guest can expect EFER.LMSLE to
#GP, doesn't it make sense to use it?

Re: [PATCH v2 0/3] KVM: EFER.LMSLE cleanup

Posted by Borislav Petkov 3 years, 6 months ago

On Wed, Sep 21, 2022 at 06:45:24AM -0700, Jim Mattson wrote:
> EFER.LMSLE is not a reserved bit on AMD64 CPUs, unless
> CPUID.80000008:EBX[20] is set (or you're running very, very old
> hardware).
> 
> We really shouldn't just decide on a whim to treat EFER.LMSLE as
> reserved under KVM. The guest CPUID information represents our
> detailed contract with the guest software. By setting
> CPUID.80000008:EBX[20], we are telling the guest that if it tries to
> set EFER.LMSLE, we will raise a #GP.

I understand all that. What I'm asking is, what happens in KVM *after*
your patch 1/3 is applied when a guest tries to set EFER.LMSLE? Does it
#GP or does it allow the WRMSR to succeed? I.e., does KVM check when
reserved bits in that MSR are being set?

By looking at it, there's kvm_enable_efer_bits() so it looks like KVM
does control which bits are allowed to be set and which are not...?

> If we don't set that bit in the guest CPUID information and we raise
> #GP on an attempt to set EFER.LMSLE, the virtual hardware is
> defective.

See, this is what I don't get - why is it defective? After the revert,
that bit to KVM is reserved.

> We could document this behavior as an erratum, but since a
> mechanism exists to declare that the guest can expect EFER.LMSLE to
> #GP, doesn't it make sense to use it?

I don't mind all that and the X86_FEATURE bit and so on - I'm just
trying to ask you guys: what is KVM's behavior when the guest tries to
set a reserved EFER bit.

Maybe I'm not expressing myself precisely enough...

-- 
Regards/Gruss,
    Boris.

Re: [PATCH v2 0/3] KVM: EFER.LMSLE cleanup

Posted by Jim Mattson 3 years, 6 months ago

On Wed, Sep 21, 2022 at 6:54 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Wed, Sep 21, 2022 at 06:45:24AM -0700, Jim Mattson wrote:
> > EFER.LMSLE is not a reserved bit on AMD64 CPUs, unless
> > CPUID.80000008:EBX[20] is set (or you're running very, very old
> > hardware).
> >
> > We really shouldn't just decide on a whim to treat EFER.LMSLE as
> > reserved under KVM. The guest CPUID information represents our
> > detailed contract with the guest software. By setting
> > CPUID.80000008:EBX[20], we are telling the guest that if it tries to
> > set EFER.LMSLE, we will raise a #GP.
>
> I understand all that. What I'm asking is, what happens in KVM *after*
> your patch 1/3 is applied when a guest tries to set EFER.LMSLE? Does it
> #GP or does it allow the WRMSR to succeed? I.e., does KVM check when
> reserved bits in that MSR are being set?
>
> By looking at it, there's kvm_enable_efer_bits() so it looks like KVM
> does control which bits are allowed to be set and which are not...?

Yes, after the revert, KVM will treat the bit as reserved, and it will
synthesize a #GP, *in violation of the architectural specification.*
As I said, we could document this behavior as a KVM erratum.
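
The reserved-bit handling discussed above can be sketched as a small user-space model. This is only an illustration, not KVM's code: the `model_*` names are hypothetical, and the only facts assumed are the architectural EFER bit positions and the idea (as with kvm_enable_efer_bits()) that a bit is rejected on write until it has been explicitly enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EFER bit positions as defined by the AMD64 architecture. */
#define EFER_SCE   (1ULL << 0)   /* SYSCALL/SYSRET enable */
#define EFER_LME   (1ULL << 8)   /* long mode enable */
#define EFER_LMA   (1ULL << 10)  /* long mode active */
#define EFER_NX    (1ULL << 11)  /* no-execute enable */
#define EFER_SVME  (1ULL << 12)  /* SVM enable */
#define EFER_LMSLE (1ULL << 13)  /* long mode segment limit enable */

/*
 * Hypothetical model state: every bit set in this mask is treated as
 * reserved. Only a baseline set of bits starts out writable.
 */
static uint64_t model_efer_reserved = ~(EFER_SCE | EFER_LME | EFER_LMA);

/* Make additional EFER bits writable by clearing them from the mask. */
static void model_enable_efer_bits(uint64_t mask)
{
	model_efer_reserved &= ~mask;
}

/*
 * Returns true if the write would be accepted, false if the model
 * would synthesize a #GP because a reserved bit is set.
 */
static bool model_efer_write_ok(uint64_t value)
{
	return (value & model_efer_reserved) == 0;
}
```

In this model, a write with EFER_LMSLE set is rejected unless model_enable_efer_bits(EFER_LMSLE) has been called, which mirrors what the revert does: it removes the call that made the bit writable.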

> > If we don't set that bit in the guest CPUID information and we raise
> > #GP on an attempt to set EFER.LMSLE, the virtual hardware is
> > defective.
>
> See, this is what I don't get - why is it defective? After the revert,
> that bit to KVM is reserved.

KVM can't just decide willy nilly to reserve arbitrary bits. If it is
in violation of AMD's architectural specification, the virtual CPU is
defective.

> > We could document this behavior as an erratum, but since a
> > mechanism exists to declare that the guest can expect EFER.LMSLE to
> > #GP, doesn't it make sense to use it?
>
> I don't mind all that and the X86_FEATURE bit and so on - I'm just
> trying to ask you guys: what is KVM's behavior when the guest tries to
> set a reserved EFER bit.
>
> Maybe I'm not expressing myself precisely enough...

I feel the same way. :-(

The two patches after the revert are to amend the contract with the
guest (as expressed by the guest CPUID table) so that the KVM virtual
CPU can raise a #GP on EFER.LMSLE and still conform to the
architectural specification.

From the APM, volume 2, 4.12.2 Data Limit Checks in 64-bit Mode:

> Data segment limit checking in 64-bit mode is not supported by all processor implementations and has been deprecated. If CPUID Fn8000_0008_EBX[EferLmlseUnsupported](bit 20) = 1, 64-bit mode segment limit checking is not supported and attempting to enable this feature by setting EFER.LMSLE =1 will result in a #GP exception.
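
A minimal sketch of how guest software could honor that contract, assuming it has already read EBX from CPUID leaf 0x80000008. The macro and function names are hypothetical, and the EBX value is passed in as a parameter so the example stays portable. It also ignores the very old hardware mentioned earlier, which has neither the feature nor the bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 20 of CPUID Fn8000_0008_EBX, per the APM text quoted above. */
#define LMSLE_UNSUPPORTED_BIT (1u << 20)  /* hypothetical macro name */

/* True if the CPU (or virtual CPU) declares that EFER.LMSLE is unsupported. */
static bool efer_lmsle_unsupported(uint32_t cpuid_8000_0008_ebx)
{
	return (cpuid_8000_0008_ebx & LMSLE_UNSUPPORTED_BIT) != 0;
}

/*
 * Guest-side decision: only attempt to set EFER.LMSLE when the CPUID
 * bit is clear. When the bit is set, the write is documented to #GP.
 */
static bool guest_may_set_lmsle(uint32_t cpuid_8000_0008_ebx)
{
	return !efer_lmsle_unsupported(cpuid_8000_0008_ebx);
}
```

Software that predates the CPUID bit performs no such check, so it would simply take the #GP.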

Re: [PATCH v2 0/3] KVM: EFER.LMSLE cleanup

Posted by Borislav Petkov 3 years, 6 months ago

On Wed, Sep 21, 2022 at 08:11:29AM -0700, Jim Mattson wrote:
> Yes, after the revert, KVM will treat the bit as reserved, and it will
> synthesize a #GP, *in violation of the architectural specification.*

Architectural, schmarchitectural... Intel hasn't implemented it so meh.

> KVM can't just decide willy nilly to reserve arbitrary bits. If it is
> in violation of AMD's architectural specification, the virtual CPU is
> defective.

Grrr, after your revert this bit is *only* reserved and nothing
else to KVM. Like every other reserved bit in EFER. Yeah, yeah, AMD
specified it as architectural but Intel didn't implement it so there's
this thing on paper and there's reality...

Anyway, enough about this - we're on the same page.

-- 
Regards/Gruss,
    Boris.

Re: [PATCH v2 0/3] KVM: EFER.LMSLE cleanup

Posted by Jim Mattson 3 years, 6 months ago

On Wed, Sep 21, 2022 at 9:06 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Wed, Sep 21, 2022 at 08:11:29AM -0700, Jim Mattson wrote:
> > Yes, after the revert, KVM will treat the bit as reserved, and it will
> > synthesize a #GP, *in violation of the architectural specification.*
>
> Architectural, schmarchitectural... Intel hasn't implemented it so meh.
>
> > KVM can't just decide willy nilly to reserve arbitrary bits. If it is
> > in violation of AMD's architectural specification, the virtual CPU is
> > defective.
>
> Grrr, after your revert this bit is *only* reserved and nothing
> else to KVM. Like every other reserved bit in EFER. Yeah, yeah, AMD
> specified it as architectural but Intel didn't implement it so there's
> this thing on paper and there's reality...

AMD defined the 64-bit x86 extensions while Intel was distracted with
their VLIW science fair project. In this space, Intel produces AMD64
compatible CPUs. The definitive specification comes from AMD (which is
sad, because AMD's documentation is abysmal).

Re: [PATCH v2 0/3] KVM: EFER.LMSLE cleanup

Posted by Borislav Petkov 3 years, 6 months ago

On Wed, Sep 21, 2022 at 09:23:40AM -0700, Jim Mattson wrote:
> AMD defined the 64-bit x86 extensions while Intel was distracted with
> their VLIW science fair project. In this space, Intel produces AMD64
> compatible CPUs.

Almost-compatible. And maybe, just maybe, because Intel were probably
and practically forced to implement AMD64 but then thought, oh well,
we'll do some things differently.

> The definitive specification comes from AMD (which is sad, because
> AMD's documentation is abysmal).

Just don't tell me the SDM is better...

But you and I are really talking past each other: there's nothing
definitive about a spec if, while implementing it, the other vendor is
doing some subtle, but very software visible things differently.

I.e., the theory vs reality point I'm trying to get across.

-- 
Regards/Gruss,
    Boris.

Re: [PATCH v2 0/3] KVM: EFER.LMSLE cleanup

Posted by Jim Mattson 3 years, 6 months ago

On Wed, Sep 21, 2022 at 10:11 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Wed, Sep 21, 2022 at 09:23:40AM -0700, Jim Mattson wrote:
> > AMD defined the 64-bit x86 extensions while Intel was distracted with
> > their VLIW science fair project. In this space, Intel produces AMD64
> > compatible CPUs.
>
> Almost-compatible. And maybe, just maybe, because Intel were probably
> and practically forced to implement AMD64 but then thought, oh well,
> we'll do some things differently.
>
> > The definitive specification comes from AMD (which is sad, because
> > AMD's documentation is abysmal).
>
> Just don't tell me the SDM is better...
>
> But you and I are really talking past each other: there's nothing
> definitive about a spec if, while implementing it, the other vendor is
> doing some subtle, but very software visible things differently.
>
> I.e., the theory vs reality point I'm trying to get across.

I get it. In reality, all of the reverse polarity CPUID feature bits
are essentially useless.

The only software that actually uses LMSLE is defunct. That software
predated the definition of CPUID.80000008:EBX.EferLmsleUnsupported and
is no longer being updated, so it isn't going to check the feature
bit. It's just going to fail with an unexpected #GP.

Maybe you think I'm being overly pedantic, but the code to do the
right thing is trivial, so why not do it?

This way, if anyone files a bug against KVM because an old VMware
hypervisor dies with an unexpected #GP, I can point to the spec and
say that it's user error.