[PATCH RFC v1 00/10] KVM: nVMX: Use vmcs_config for setting up nested VMX MSRs

Vitaly Kuznetsov posted 10 patches 3 years, 10 months ago
There is a newer version of this series
arch/x86/kvm/vmx/capabilities.h | 16 +++---
arch/x86/kvm/vmx/nested.c       | 37 +++++---------
arch/x86/kvm/vmx/nested.h       |  2 +-
arch/x86/kvm/vmx/vmx.c          | 91 +++++++++++++++++++++++++--------
4 files changed, 94 insertions(+), 52 deletions(-)
[PATCH RFC v1 00/10] KVM: nVMX: Use vmcs_config for setting up nested VMX MSRs
Posted by Vitaly Kuznetsov 3 years, 10 months ago
vmcs_config is a sanitized version of host VMX MSRs where some controls are
filtered out (e.g. when Enlightened VMCS is enabled, some know bugs are 
discovered, some inconsistencies in controls are detected,...) but
nested_vmx_setup_ctls_msrs() uses raw host MSRs instead. This may end up
in exposing undesired controls to L1. Switch to using vmcs_config instead.

RFC part: vmcs_config's sanitization now is a mix of "what can't be enabled"
and "what KVM doesn't want" and we need to separate these as for nested VMX
MSRs only the first category makes sense. This gives vmcs_config a slightly
different meaning "controls which can be (theoretically) used". An alternative
approach would be to store sanitized host MSRs values separately, sanitize
them and and use in nested_vmx_setup_ctls_msrs() but currently I don't see
any benefits. Comments welcome!

Very lightly tested with KVM selftests / kvm-unit-tests.

Vitaly Kuznetsov (10):
  KVM: VMX: Move CPU_BASED_CR8_{LOAD,STORE}_EXITING filtering out of
    setup_vmcs_config()
  KVM: VMX: Add missing CPU based VM execution controls to vmcs_config
  KVM: VMX: Move CPU_BASED_{CR3_LOAD,CR3_STORE,INVLPG}_EXITING filtering
    out of setup_vmcs_config()
  KVM: VMX: Add missing VMEXIT controls to vmcs_config
  KVM: VMX: Add missing VMENTRY controls to vmcs_config
  KVM: nVMX: Use sanitized allowed-1 bits for VMX control MSRs
  KVM: VMX: Store required-1 VMX controls in vmcs_config
  KVM: nVMX: Use sanitized required-1 bits for VMX control MSRs
  KVM: VMX: Cache MSR_IA32_VMX_MISC in vmcs_config
  KVM: nVMX: Use cached host MSR_IA32_VMX_MISC value for setting up
    nested MSR

 arch/x86/kvm/vmx/capabilities.h | 16 +++---
 arch/x86/kvm/vmx/nested.c       | 37 +++++---------
 arch/x86/kvm/vmx/nested.h       |  2 +-
 arch/x86/kvm/vmx/vmx.c          | 91 +++++++++++++++++++++++++--------
 4 files changed, 94 insertions(+), 52 deletions(-)

-- 
2.35.3
Re: [PATCH RFC v1 00/10] KVM: nVMX: Use vmcs_config for setting up nested VMX MSRs
Posted by Sean Christopherson 3 years, 10 months ago
On Wed, Jun 22, 2022, Vitaly Kuznetsov wrote:
> vmcs_config is a sanitized version of host VMX MSRs where some controls are
> filtered out (e.g. when Enlightened VMCS is enabled, some know bugs are 
> discovered, some inconsistencies in controls are detected,...) but
> nested_vmx_setup_ctls_msrs() uses raw host MSRs instead. This may end up
> in exposing undesired controls to L1. Switch to using vmcs_config instead.
> 
> RFC part: vmcs_config's sanitization now is a mix of "what can't be enabled"
> and "what KVM doesn't want" and we need to separate these as for nested VMX
> MSRs only the first category makes sense. This gives vmcs_config a slightly
> different meaning "controls which can be (theoretically) used". An alternative
> approach would be to store sanitized host MSRs values separately, sanitize
> them and and use in nested_vmx_setup_ctls_msrs() but currently I don't see
> any benefits. Comments welcome!

I like the idea overall, even though it's a decent amount of churn.  It seems
easier to maintain than separate paths for nested.  The alternative would be to
add common helpers to adjust the baseline configurations, but I don't see any
way to programmatically make that approach more robust.

An idea to further harden things.  Or: an excuse to extend macro shenanigans :-)

If we throw all of the "opt" and "min" lists into macros, e.g. KVM_REQUIRED_*
and KVM_OPTIONAL_*, and then use those to define a KVM_KNOWN_* field, we can
prevent using the mutators to set/clear unknown bits at runtime via BUILD_BUG_ON().
The core builders, e.g. vmx_exec_control(), can still set unknown bits, i.e. set
bits that aren't enumerated to L1, but that's easier to audit and this would catch
regressions for the issues fixed in patches.

It'll required making add_atomic_switch_msr_special() __always_inline (or just
open code it), but that's not a big deal.

E.g. if we have

  #define KVM_REQUIRED_CPU_BASED_VM_EXEC_CONTROL <blah blah blah>
  #define KVM_OPTIONAL_CPU_BASED_VM_EXEC_CONTROL <blah blah blah>

Then the builders for the controls shadows can do:

diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 286c88e285ea..5eb75822a09e 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -468,6 +468,8 @@ static inline u8 vmx_get_rvi(void)
 }
 
 #define BUILD_CONTROLS_SHADOW(lname, uname, bits)                              \
+#define KVM_KNOWN_ ## uname                                                    \
+       (KVM_REQUIRED_ ## uname | KVM_OPTIONAL_ ## uname)                       \
 static inline void lname##_controls_set(struct vcpu_vmx *vmx, u##bits val)     \
 {                                                                              \
        if (vmx->loaded_vmcs->controls_shadow.lname != val) {                   \
@@ -485,10 +487,12 @@ static inline u##bits lname##_controls_get(struct vcpu_vmx *vmx)          \
 }                                                                              \
 static inline void lname##_controls_setbit(struct vcpu_vmx *vmx, u##bits val)  \
 {                                                                              \
+       BUILD_BUG_ON(!(val & KVM_KNOWN_ ## uname));                             \
        lname##_controls_set(vmx, lname##_controls_get(vmx) | val);             \
 }                                                                              \
 static inline void lname##_controls_clearbit(struct vcpu_vmx *vmx, u##bits val)        \
 {                                                                              \
+       BUILD_BUG_ON(!(val & KVM_KNOWN_ ## uname));                             \
        lname##_controls_set(vmx, lname##_controls_get(vmx) & ~val);            \
 }
 BUILD_CONTROLS_SHADOW(vm_entry, VM_ENTRY_CONTROLS, 32)



Handling the controls that are restricted to CONFIG_X86_64=y will be midly annoying,
but adding a base set isn't too bad, e.g.

#define __KVM_REQUIRED_CPU_BASED_VM_EXEC_CONTROL <blah blah blah>
#ifdef CONFIG_X86_64
#define KVM_REQUIRED_CPU_BASED_VM_EXEC_CONTROL (__KVM_REQUIRED_CPU_BASED_VM_EXEC_CONTROL | \
						CPU_BASED_CR8_LOAD_EXITING |		   \
						CPU_BASED_CR8_STORE_EXITING)
#else
#define KVM_REQUIRED_CPU_BASED_VM_EXEC_CONTROL __KVM_REQUIRED_CPU_BASED_VM_EXEC_CONTROL
#endif
Re: [PATCH RFC v1 00/10] KVM: nVMX: Use vmcs_config for setting up nested VMX MSRs
Posted by Vitaly Kuznetsov 3 years, 9 months ago
Sean Christopherson <seanjc@google.com> writes:

> On Wed, Jun 22, 2022, Vitaly Kuznetsov wrote:
>> vmcs_config is a sanitized version of host VMX MSRs where some controls are
>> filtered out (e.g. when Enlightened VMCS is enabled, some know bugs are 
>> discovered, some inconsistencies in controls are detected,...) but
>> nested_vmx_setup_ctls_msrs() uses raw host MSRs instead. This may end up
>> in exposing undesired controls to L1. Switch to using vmcs_config instead.
>> 
>> RFC part: vmcs_config's sanitization now is a mix of "what can't be enabled"
>> and "what KVM doesn't want" and we need to separate these as for nested VMX
>> MSRs only the first category makes sense. This gives vmcs_config a slightly
>> different meaning "controls which can be (theoretically) used". An alternative
>> approach would be to store sanitized host MSRs values separately, sanitize
>> them and and use in nested_vmx_setup_ctls_msrs() but currently I don't see
>> any benefits. Comments welcome!
>
> I like the idea overall, even though it's a decent amount of churn.  It seems
> easier to maintain than separate paths for nested.  The alternative would be to
> add common helpers to adjust the baseline configurations, but I don't see any
> way to programmatically make that approach more robust.
>
> An idea to further harden things.  Or: an excuse to extend macro shenanigans :-)
>
> If we throw all of the "opt" and "min" lists into macros, e.g. KVM_REQUIRED_*
> and KVM_OPTIONAL_*, and then use those to define a KVM_KNOWN_* field, we can
> prevent using the mutators to set/clear unknown bits at runtime via BUILD_BUG_ON().
> The core builders, e.g. vmx_exec_control(), can still set unknown bits, i.e. set
> bits that aren't enumerated to L1, but that's easier to audit and this would catch
> regressions for the issues fixed in patches.
>
> It'll required making add_atomic_switch_msr_special() __always_inline (or just
> open code it), but that's not a big deal.
>
> E.g. if we have
>
>   #define KVM_REQUIRED_CPU_BASED_VM_EXEC_CONTROL <blah blah blah>
>   #define KVM_OPTIONAL_CPU_BASED_VM_EXEC_CONTROL <blah blah blah>
>
> Then the builders for the controls shadows can do:
>
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 286c88e285ea..5eb75822a09e 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -468,6 +468,8 @@ static inline u8 vmx_get_rvi(void)
>  }
>  
>  #define BUILD_CONTROLS_SHADOW(lname, uname, bits)                              \
> +#define KVM_KNOWN_ ## uname                                                    \
> +       (KVM_REQUIRED_ ## uname | KVM_OPTIONAL_ ## uname)                       \

I'm certainly not a macro jedi and I failed to make this compile as gcc
hates when I put '#define's in macros but I made a simpler version with
(presumeably) the same outcome. v1 is out, thanks for the suggestion!

>  static inline void lname##_controls_set(struct vcpu_vmx *vmx, u##bits val)     \
>  {                                                                              \
>         if (vmx->loaded_vmcs->controls_shadow.lname != val) {                   \
> @@ -485,10 +487,12 @@ static inline u##bits lname##_controls_get(struct vcpu_vmx *vmx)          \
>  }                                                                              \
>  static inline void lname##_controls_setbit(struct vcpu_vmx *vmx, u##bits val)  \
>  {                                                                              \
> +       BUILD_BUG_ON(!(val & KVM_KNOWN_ ## uname));                             \
>         lname##_controls_set(vmx, lname##_controls_get(vmx) | val);             \
>  }                                                                              \
>  static inline void lname##_controls_clearbit(struct vcpu_vmx *vmx, u##bits val)        \
>  {                                                                              \
> +       BUILD_BUG_ON(!(val & KVM_KNOWN_ ## uname));                             \
>         lname##_controls_set(vmx, lname##_controls_get(vmx) & ~val);            \
>  }
>  BUILD_CONTROLS_SHADOW(vm_entry, VM_ENTRY_CONTROLS, 32)
>
>
>
> Handling the controls that are restricted to CONFIG_X86_64=y will be midly annoying,
> but adding a base set isn't too bad, e.g.
>
> #define __KVM_REQUIRED_CPU_BASED_VM_EXEC_CONTROL <blah blah blah>
> #ifdef CONFIG_X86_64
> #define KVM_REQUIRED_CPU_BASED_VM_EXEC_CONTROL (__KVM_REQUIRED_CPU_BASED_VM_EXEC_CONTROL | \
> 						CPU_BASED_CR8_LOAD_EXITING |		   \
> 						CPU_BASED_CR8_STORE_EXITING)
> #else
> #define KVM_REQUIRED_CPU_BASED_VM_EXEC_CONTROL __KVM_REQUIRED_CPU_BASED_VM_EXEC_CONTROL
> #endif
>

-- 
Vitaly