i386/kvm: fix enlightened VMCS with fine-grained VMX feature enablement

[PATCH RFC] i386/kvm: fix enlightened VMCS with fine-grained VMX feature enablement

Posted by Vitaly Kuznetsov 4 years, 4 months ago

When enlightened VMCS is enabled, certain VMX controls disappear, e.g.
posted interrupts for PINBASED_CTLS. With fine-grained VMX feature
enablement QEMU tries to do KVM_SET_MSRS with default (matching CPU
model) values and fails as KVM doesn't allow to set now-unsupported
controls.

The ideal solution for the issue would probably be to re-read VMX
feature MSRs after enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS, however,
this doesn't seem to be possible: currently, KVM returns global
&vmcs_config.nested values for VMX MSRs when userspace does KVM_GET_MSR.

It is also possible to modify KVM to apply 'evmcs filtering' to VMX
MSRs when userspace tries to set them and hide the issue but this doesn't
seem to be entirely correct.

It is unfortunate that we now need to support the list of VMX features
disabled by enlightened VMCS in QEMU. When (and if) enlightened VMCS v2
arrives we'll need to fix QEMU and allow previously disabled features.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
- I don't quite like this workaround myself, thus RFC. I'm sure someone
 will suggest a better alternative.
---
 target/i386/kvm.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 0b511906e3fe..1b0589b79358 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -1198,6 +1198,30 @@ static int hyperv_handle_properties(CPUState *cs,
         }
 
         if (!r) {
+            /*
+             * Certain VMX controls are unsupported when enlightened VMCS is
+             * enabled, filter them out here so we don't attempt to set them
+             * with KVM_SET_MSR even if they are supported by CPU model.
+             * The list below is for eVMCS version 1.
+             */
+            env->features[FEAT_VMX_PINBASED_CTLS] &=
+                ~(VMX_PIN_BASED_VMX_PREEMPTION_TIMER |
+                  VMX_PIN_BASED_POSTED_INTR);
+            env->features[FEAT_VMX_SECONDARY_CTLS] &=
+                ~(VMX_SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
+                  VMX_SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
+                  VMX_SECONDARY_EXEC_APIC_REGISTER_VIRT |
+                  VMX_SECONDARY_EXEC_ENABLE_PML |
+                  VMX_SECONDARY_EXEC_ENABLE_VMFUNC |
+                  VMX_SECONDARY_EXEC_SHADOW_VMCS |
+                  /* VMX_SECONDARY_EXEC_TSC_SCALING | */
+                  VMX_SECONDARY_EXEC_PAUSE_LOOP_EXITING);
+            env->features[FEAT_VMX_ENTRY_CTLS] &=
+                ~VMX_VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
+            env->features[FEAT_VMX_EXIT_CTLS] &=
+                ~VMX_VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
+            env->features[FEAT_VMX_VMFUNC] &= ~MSR_VMX_VMFUNC_EPT_SWITCHING;
+
             env->features[FEAT_HV_RECOMM_EAX] |=
                 HV_ENLIGHTENED_VMCS_RECOMMENDED;
             env->features[FEAT_HV_NESTED_EAX] = evmcs_version;
-- 
2.24.1

Re: [PATCH RFC] i386/kvm: fix enlightened VMCS with fine-grained VMX feature enablement

Posted by Paolo Bonzini 4 years, 4 months ago

On 02/01/20 21:39, Vitaly Kuznetsov wrote:
> When enlightened VMCS is enabled, certain VMX controls disappear, e.g.
> posted interrupts for PINBASED_CTLS. With fine-grained VMX feature
> enablement QEMU tries to do KVM_SET_MSRS with default (matching CPU
> model) values and fails as KVM doesn't allow to set now-unsupported
> controls.
> 
> The ideal solution for the issue would probably be to re-read VMX
> feature MSRs after enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS, however,
> this doesn't seem to be possible: currently, KVM returns global
> &vmcs_config.nested values for VMX MSRs when userspace does KVM_GET_MSR.
> 
> It is also possible to modify KVM to apply 'evmcs filtering' to VMX
> MSRs when userspace tries to set them and hide the issue but this doesn't
> seem to be entirely correct.
> 
> It is unfortunate that we now need to support the list of VMX features
> disabled by enlightened VMCS in QEMU. When (and if) enlightened VMCS v2
> arrives we'll need to fix QEMU and allow previously disabled features.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> - I don't quite like this workaround myself, thus RFC. I'm sure someone
>  will suggest a better alternative.

The patch itself is not a big deal, the only thing is that we should
probably check that we get warnings if the "forbidden" features are
explicitly requested by the user.  On the other hand, for something like
"-cpu Haswell,vmx,hv_evmcs" there should be no warnings.

That said, I'm not sure about the whole idea of disabling features, even
in the kernel.  I would prefer if the guest knew that it cannot use
these features if using eVMCS.  Is this the case for Windows?  If so, we
should teach guest-side KVM about this, not QEMU.

Paolo

> ---
>  target/i386/kvm.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> index 0b511906e3fe..1b0589b79358 100644
> --- a/target/i386/kvm.c
> +++ b/target/i386/kvm.c
> @@ -1198,6 +1198,30 @@ static int hyperv_handle_properties(CPUState *cs,
>          }
>  
>          if (!r) {
> +            /*
> +             * Certain VMX controls are unsupported when enlightened VMCS is
> +             * enabled, filter them out here so we don't attempt to set them
> +             * with KVM_SET_MSR even if they are supported by CPU model.
> +             * The list below is for eVMCS version 1.
> +             */
> +            env->features[FEAT_VMX_PINBASED_CTLS] &=
> +                ~(VMX_PIN_BASED_VMX_PREEMPTION_TIMER |
> +                  VMX_PIN_BASED_POSTED_INTR);
> +            env->features[FEAT_VMX_SECONDARY_CTLS] &=
> +                ~(VMX_SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
> +                  VMX_SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
> +                  VMX_SECONDARY_EXEC_APIC_REGISTER_VIRT |
> +                  VMX_SECONDARY_EXEC_ENABLE_PML |
> +                  VMX_SECONDARY_EXEC_ENABLE_VMFUNC |
> +                  VMX_SECONDARY_EXEC_SHADOW_VMCS |
> +                  /* VMX_SECONDARY_EXEC_TSC_SCALING | */
> +                  VMX_SECONDARY_EXEC_PAUSE_LOOP_EXITING);
> +            env->features[FEAT_VMX_ENTRY_CTLS] &=
> +                ~VMX_VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
> +            env->features[FEAT_VMX_EXIT_CTLS] &=
> +                ~VMX_VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
> +            env->features[FEAT_VMX_VMFUNC] &= ~MSR_VMX_VMFUNC_EPT_SWITCHING;
> +
>              env->features[FEAT_HV_RECOMM_EAX] |=
>                  HV_ENLIGHTENED_VMCS_RECOMMENDED;
>              env->features[FEAT_HV_NESTED_EAX] = evmcs_version;
>

Re: [PATCH RFC] i386/kvm: fix enlightened VMCS with fine-grained VMX feature enablement

Posted by Vitaly Kuznetsov 4 years, 4 months ago

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 02/01/20 21:39, Vitaly Kuznetsov wrote:
>> When enlightened VMCS is enabled, certain VMX controls disappear, e.g.
>> posted interrupts for PINBASED_CTLS. With fine-grained VMX feature
>> enablement QEMU tries to do KVM_SET_MSRS with default (matching CPU
>> model) values and fails as KVM doesn't allow to set now-unsupported
>> controls.
>> 
>> The ideal solution for the issue would probably be to re-read VMX
>> feature MSRs after enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS, however,
>> this doesn't seem to be possible: currently, KVM returns global
>> &vmcs_config.nested values for VMX MSRs when userspace does KVM_GET_MSR.
>> 
>> It is also possible to modify KVM to apply 'evmcs filtering' to VMX
>> MSRs when userspace tries to set them and hide the issue but this doesn't
>> seem to be entirely correct.
>> 
>> It is unfortunate that we now need to support the list of VMX features
>> disabled by enlightened VMCS in QEMU. When (and if) enlightened VMCS v2
>> arrives we'll need to fix QEMU and allow previously disabled features.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>> - I don't quite like this workaround myself, thus RFC. I'm sure someone
>>  will suggest a better alternative.
>
> The patch itself is not a big deal, the only thing is that we should
> probably check that we get warnings if the "forbidden" features are
> explicitly requested by the user.  On the other hand, for something like
> "-cpu Haswell,vmx,hv_evmcs" there should be no warnings.
>
> That said, I'm not sure about the whole idea of disabling features, even
> in the kernel.  I would prefer if the guest knew that it cannot use
> these features if using eVMCS.  Is this the case for Windows?

Honestly I forgot the story why we filtered out these features upon
eVMCS enablement in KVM. As there are no corresponding eVMCS fields,
there's no way a guest can actually use them.

I'm going to check that nothing breaks if we remove the filter. I'll go
and test Hyper-V 2016 and 2019.

> If so, we should teach guest-side KVM about this, not QEMU.

This is not required when enabling eVMCS on a genuine Hyper-V because it
correctly filters out unsupported features, however, to not break
KVM-on-KVM-using-eVMCS case we'll have to move the filter from host to
guest.

Thanks!

-- 
Vitaly

Re: [PATCH RFC] i386/kvm: fix enlightened VMCS with fine-grained VMX feature enablement

Posted by Paolo Bonzini 4 years, 4 months ago

On 07/01/20 13:08, Vitaly Kuznetsov wrote:
> Honestly I forgot the story why we filtered out these features upon
> eVMCS enablement in KVM. As there are no corresponding eVMCS fields,
> there's no way a guest can actually use them.

Well, mostly because we mimicked what Hyper-V was doing I guess.

> I'm going to check that nothing breaks if we remove the filter. I'll go
> and test Hyper-V 2016 and 2019.

KVM would break, right?  But we can mark that patch as stable material.

Paolo

>> If so, we should teach guest-side KVM about this, not QEMU.
> 
> This is not required when enabling eVMCS on a genuine Hyper-V because it
> correctly filters out unsupported features, however, to not break
> KVM-on-KVM-using-eVMCS case we'll have to move the filter from host to
> guest.
> 
> Thanks!
>

Re: [PATCH RFC] i386/kvm: fix enlightened VMCS with fine-grained VMX feature enablement

Posted by Vitaly Kuznetsov 4 years, 4 months ago

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 07/01/20 13:08, Vitaly Kuznetsov wrote:
>> Honestly I forgot the story why we filtered out these features upon
>> eVMCS enablement in KVM. As there are no corresponding eVMCS fields,
>> there's no way a guest can actually use them.
>
> Well, mostly because we mimicked what Hyper-V was doing I guess.
>

An update from reverse-engineering trenches.

I ran some tests to see if we can just drop the filtering and there is
only one problematic control which Hyper-V enables:

SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES

the problem with it is that we don't have 'apic_access_addr' field in
eVMCS ('virtual_apic_page_addr' is there). By running the same setup
with eVMCS disabled I figured out which address can be hardcoded to make
it boot. My guess was that the fields is present but not documented
properly, I tried scanning eVMCS for the value but with no luck so far.

I'll try to fish some information out of Microsoft.

-- 
Vitaly

Re: [PATCH RFC] i386/kvm: fix enlightened VMCS with fine-grained VMX feature enablement

Posted by Paolo Bonzini 4 years, 4 months ago

SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES
>
> the problem with it is that we don't have 'apic_access_addr' field in
> eVMCS ('virtual_apic_page_addr' is there). By running the same setup
> with eVMCS disabled I figured out which address can be hardcoded to make
> it boot.
>

Maybe it's really hard coded (what is the value? Is it consistent across
Hyper-V version?) Can you try changing KVM to enable it and see if the
hardcoded APIC access address works?

Paolo

>

Re: [PATCH RFC] i386/kvm: fix enlightened VMCS with fine-grained VMX feature enablement

Posted by Vitaly Kuznetsov 4 years, 4 months ago

Paolo Bonzini <pbonzini@redhat.com> writes:

> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES
>>
>> the problem with it is that we don't have 'apic_access_addr' field in
>> eVMCS ('virtual_apic_page_addr' is there). By running the same setup
>> with eVMCS disabled I figured out which address can be hardcoded to make
>> it boot.
>>
>
> Maybe it's really hard coded (what is the value? Is it consistent across
> Hyper-V version?) Can you try changing KVM to enable it and see if the
> hardcoded APIC access address works?

Unfortunately, it's not the same even between BIOS and UEFI booted
WS2016 (Gen1/Gen2). It is, however, the same across reboot for the same
image so for example for WS2016Gen1 it is '0x294000' so if I do:

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index fab7451a5793..8366b2a02b3b 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1433,6 +1433,8 @@ static int copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx)
        vmcs12->tpr_threshold = evmcs->tpr_threshold;
        vmcs12->guest_rip = evmcs->guest_rip;
 
+       vmcs12->apic_access_addr = 0x294000;

it all works. I'm really puzzled.

-- 
Vitaly

Re: [PATCH RFC] i386/kvm: fix enlightened VMCS with fine-grained VMX feature enablement

Posted by Vitaly Kuznetsov 4 years, 4 months ago

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 07/01/20 13:08, Vitaly Kuznetsov wrote:
>> Honestly I forgot the story why we filtered out these features upon
>> eVMCS enablement in KVM. As there are no corresponding eVMCS fields,
>> there's no way a guest can actually use them.
>
> Well, mostly because we mimicked what Hyper-V was doing I guess.
>
>> I'm going to check that nothing breaks if we remove the filter. I'll go
>> and test Hyper-V 2016 and 2019.
>
> KVM would break, right?  But we can mark that patch as stable material.
>

While we are trying to understand how APIC virtualization works without
apic_access_addr field (maybe it doesn't?), should we fix the immediate
issue with QEMU-4.2 with a hack like:

diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
index 72359709cdc1..038297e63396 100644
--- a/arch/x86/kvm/vmx/evmcs.c
+++ b/arch/x86/kvm/vmx/evmcs.c
@@ -357,15 +357,15 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
 	if (vmcs_version)
 		*vmcs_version = nested_get_evmcs_version(vcpu);
 
-	/* We don't support disabling the feature for simplicity. */
-	if (evmcs_already_enabled)
-		return 0;
-
-	vmx->nested.msrs.pinbased_ctls_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
-	vmx->nested.msrs.entry_ctls_high &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
-	vmx->nested.msrs.exit_ctls_high &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
-	vmx->nested.msrs.secondary_ctls_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
-	vmx->nested.msrs.vmfunc_controls &= ~EVMCS1_UNSUPPORTED_VMFUNC;
-
 	return 0;
 }
+
+void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata) {
+	/*
+	 * Enlightened VMCS doesn't have apic_access_addr field but Hyper-V
+	 * still tries to enable SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES when
+	 * it is available, filter it out
+	 */
+	if (msr_index == MSR_IA32_VMX_PROCBASED_CTLS2)
+		*pdata &= ~((u64)SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES << 32);
+}
diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
index 07ebf6882a45..b88d9807a796 100644
--- a/arch/x86/kvm/vmx/evmcs.h
+++ b/arch/x86/kvm/vmx/evmcs.h
@@ -201,5 +201,6 @@ bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmcs_gpa);
 uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu);
 int nested_enable_evmcs(struct kvm_vcpu *vcpu,
 			uint16_t *vmcs_version);
+void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata);
 
 #endif /* __KVM_X86_VMX_EVMCS_H */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e3394c839dea..8eb74618b8d8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1849,8 +1849,14 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
 		if (!nested_vmx_allowed(vcpu))
 			return 1;
-		return vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
-				       &msr_info->data);
+		if (vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
+				    &msr_info->data))
+			return 1;
+		if (!msr_info->host_initiated &&
+		    vmx->nested.enlightened_vmcs_enabled)
+			nested_evmcs_filter_control_msr(msr_info->index,
+							&msr_info->data);
+		break;
 	case MSR_IA32_RTIT_CTL:
 		if (pt_mode != PT_MODE_HOST_GUEST)
 			return 1;

This should probably be complemented with a patch to not enable
unsupported controls when KVM is acting as a guest on eVMCS + a check
that none of the unsupported controls are enabled.

What do you think?

-- 
Vitaly