KVM: VMX: Fix APICv activation bugs

[PATCH v3 06/10] KVM: nVMX: Switch to vmcs01 to update SVI on-demand if L2 is active

Posted by Sean Christopherson 2 months ago

If APICv is activated while L2 is running and triggers an SVI update,
temporarily load vmcs01 and immediately update SVI instead of deferring
the update until the next nested VM-Exit.  This will eventually allow
killing off kvm_apic_update_hwapic_isr(), and all of nVMX's deferred
APICv updates.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c |  5 -----
 arch/x86/kvm/vmx/vmx.c    | 19 +++++++------------
 arch/x86/kvm/vmx/vmx.h    |  1 -
 3 files changed, 7 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 8efab1cf833f..c2c96e4fe20e 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5160,11 +5160,6 @@ void __nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 		vmx_refresh_apicv_exec_ctrl(vcpu);
 	}
 
-	if (vmx->nested.update_vmcs01_hwapic_isr) {
-		vmx->nested.update_vmcs01_hwapic_isr = false;
-		kvm_apic_update_hwapic_isr(vcpu);
-	}
-
 	if ((vm_exit_reason != -1) &&
 	    (enable_shadow_vmcs || nested_vmx_is_evmptr12_valid(vmx)))
 		vmx->nested.need_vmcs12_to_shadow_sync = true;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3ee86665d8de..74a815cddd37 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6963,21 +6963,16 @@ void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
 	u16 status;
 	u8 old;
 
-	/*
-	 * If L2 is active, defer the SVI update until vmcs01 is loaded, as SVI
-	 * is only relevant for if and only if Virtual Interrupt Delivery is
-	 * enabled in vmcs12, and if VID is enabled then L2 EOIs affect L2's
-	 * vAPIC, not L1's vAPIC.  KVM must update vmcs01 on the next nested
-	 * VM-Exit, otherwise L1 with run with a stale SVI.
-	 */
-	if (is_guest_mode(vcpu)) {
-		to_vmx(vcpu)->nested.update_vmcs01_hwapic_isr = true;
-		return;
-	}
-
 	if (max_isr == -1)
 		max_isr = 0;
 
+	/*
+	 * Always update SVI in vmcs01, as SVI is only relevant for L2 if and
+	 * only if Virtual Interrupt Delivery is enabled in vmcs12, and if VID
+	 * is enabled then L2 EOIs affect L2's vAPIC, not L1's vAPIC.
+	 */
+	guard(vmx_vmcs01)(vcpu);
+
 	status = vmcs_read16(GUEST_INTR_STATUS);
 	old = status >> 8;
 	if (max_isr != old) {
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 36f48c4b39c0..53969e49d9d1 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -134,7 +134,6 @@ struct nested_vmx {
 	bool change_vmcs01_virtual_apic_mode;
 	bool reload_vmcs01_apic_access_page;
 	bool update_vmcs01_apicv_status;
-	bool update_vmcs01_hwapic_isr;
 
 	/*
 	 * Enlightened VMCS has been enabled. It does not mean that L1 has to
-- 
2.52.0.223.gf5cc29aaa4-goog

Re: [PATCH v3 06/10] KVM: nVMX: Switch to vmcs01 to update SVI on-demand if L2 is active

Posted by Chao Gao 1 month, 2 weeks ago

On Fri, Dec 05, 2025 at 03:19:09PM -0800, Sean Christopherson wrote:
>If APICv is activated while L2 is running and triggers an SVI update,
>temporarily load vmcs01 and immediately update SVI instead of deferring
>the update until the next nested VM-Exit.  This will eventually allow
>killing off kvm_apic_update_hwapic_isr(), and all of nVMX's deferred
>APICv updates.
>
>Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Chao Gao <chao.gao@intel.com>

A side topic below:

<snip>

>@@ -6963,21 +6963,16 @@ void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
> 	u16 status;
> 	u8 old;
> 
>-	/*
>-	 * If L2 is active, defer the SVI update until vmcs01 is loaded, as SVI
>-	 * is only relevant for if and only if Virtual Interrupt Delivery is
>-	 * enabled in vmcs12, and if VID is enabled then L2 EOIs affect L2's
>-	 * vAPIC, not L1's vAPIC.  KVM must update vmcs01 on the next nested
>-	 * VM-Exit, otherwise L1 with run with a stale SVI.
>-	 */
>-	if (is_guest_mode(vcpu)) {
>-		to_vmx(vcpu)->nested.update_vmcs01_hwapic_isr = true;
>-		return;
>-	}
>-
> 	if (max_isr == -1)
> 		max_isr = 0;
> 
>+	/*
>+	 * Always update SVI in vmcs01, as SVI is only relevant for L2 if and
>+	 * only if Virtual Interrupt Delivery is enabled in vmcs12, and if VID
>+	 * is enabled then L2 EOIs affect L2's vAPIC, not L1's vAPIC.
>+	 */
>+	guard(vmx_vmcs01)(vcpu);

KVM calls this function when virtualizing EOI for L2, and in a previous
discussion, you mentioned that the overhead of switching to VMCS01 is
"non-trivial and unnecessary" (see [1]).

My testing shows that guard(vmx_vmcs01) takes about 140-250 cycles. I think
this overhead is acceptable for nested scenarios, since it only affects
EOI-induced VM-exits in specific/suboptimal configurations.

But I'm wondering whether KVM should update SVI on every VM-entry instead of
doing it on-demand (i.e., when vISR gets changed). We've encountered two
SVI-related bugs [1][2] that were difficult to debug. Preventing these issues
entirely seems worthwhile, and the overhead of always updating SVI during
VM-entry should be minimal since KVM already updates RVI (RVI and SVI are in
the the same VMCS field) in vmx_sync_irr_to_pir() when APICv is enabled.

[1]: https://lore.kernel.org/kvm/ZxAL6thxEH67CpW7@google.com/
[2]: https://lore.kernel.org/kvm/20251205231913.441872-2-seanjc@google.com/

e.g.,

---
 arch/x86/include/asm/kvm-x86-ops.h |  1 -
 arch/x86/include/asm/kvm_host.h    |  1 -
 arch/x86/kvm/lapic.c               | 24 ++++++++--------
 arch/x86/kvm/lapic.h               |  1 +
 arch/x86/kvm/vmx/main.c            |  9 ------
 arch/x86/kvm/vmx/vmx.c             | 45 +++++++-----------------------
 arch/x86/kvm/vmx/x86_ops.h         |  1 -
 7 files changed, 22 insertions(+), 60 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index de709fb5bd76..69aa64aa9910 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -84,7 +84,6 @@ KVM_X86_OP(enable_nmi_window)
 KVM_X86_OP(enable_irq_window)
 KVM_X86_OP_OPTIONAL(update_cr8_intercept)
 KVM_X86_OP(refresh_apicv_exec_ctrl)
-KVM_X86_OP_OPTIONAL(hwapic_isr_update)
 KVM_X86_OP_OPTIONAL(load_eoi_exitmap)
 KVM_X86_OP_OPTIONAL(set_virtual_apic_mode)
 KVM_X86_OP_OPTIONAL(set_apic_access_page_addr)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5a3bfa293e8b..de57a90dac2f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1824,7 +1824,6 @@ struct kvm_x86_ops {
	const unsigned long required_apicv_inhibits;
	bool allow_apicv_in_x2apic_without_x2apic_virtualization;
	void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu);
-	void (*hwapic_isr_update)(struct kvm_vcpu *vcpu, int isr);
	void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
	void (*set_virtual_apic_mode)(struct kvm_vcpu *vcpu);
	void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 7be4d759884c..b0c87fa68f2a 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -715,9 +715,7 @@ static inline void apic_set_isr(int vec, struct kvm_lapic *apic)
	 * because the processor can modify ISR under the hood.  Instead
	 * just set SVI.
	 */
-	if (unlikely(apic->apicv_active))
-		kvm_x86_call(hwapic_isr_update)(apic->vcpu, vec);
-	else {
+	if (likely(!apic->apicv_active)) {
		++apic->isr_count;
		BUG_ON(apic->isr_count > MAX_APIC_VECTOR);
		/*
@@ -761,9 +759,7 @@ static inline void apic_clear_isr(int vec, struct kvm_lapic *apic)
	 * on the other hand isr_count and highest_isr_cache are unused
	 * and must be left alone.
	 */
-	if (unlikely(apic->apicv_active))
-		kvm_x86_call(hwapic_isr_update)(apic->vcpu, apic_find_highest_isr(apic));
-	else {
+	if (likely(!apic->apicv_active)) {
		--apic->isr_count;
		BUG_ON(apic->isr_count < 0);
		apic->highest_isr_cache = -1;
@@ -781,6 +777,12 @@ int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_lapic_find_highest_irr);
 
+int kvm_lapic_find_highest_isr(struct kvm_vcpu *vcpu)
+{
+	return apic_find_highest_isr(vcpu->arch.apic);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_lapic_find_highest_isr);
+
 static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
			     int vector, int level, int trig_mode,
			     struct dest_map *dest_map);
@@ -2774,12 +2776,10 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
	 */
	apic->irr_pending = true;
 
-	if (apic->apicv_active) {
+	if (apic->apicv_active)
		apic->isr_count = 1;
-		kvm_x86_call(hwapic_isr_update)(vcpu, apic_find_highest_isr(apic));
-	} else {
+	else
		apic->isr_count = count_vectors(apic->regs + APIC_ISR);
-	}
 
	apic->highest_isr_cache = -1;
 }
@@ -2907,10 +2907,8 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 
	vcpu->arch.pv_eoi.msr_val = 0;
	apic_update_ppr(apic);
-	if (apic->apicv_active) {
+	if (apic->apicv_active)
		kvm_x86_call(apicv_post_state_restore)(vcpu);
-		kvm_x86_call(hwapic_isr_update)(vcpu, -1);
-	}
 
	vcpu->arch.apic_arb_prio = 0;
	vcpu->arch.apic_attention = 0;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index aa0a9b55dbb7..6242c8c7a682 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -127,6 +127,7 @@ int kvm_apic_set_base(struct kvm_vcpu *vcpu, u64 value, bool host_initiated);
 int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s);
 int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s);
 int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
+int kvm_lapic_find_highest_isr(struct kvm_vcpu *vcpu);
 
 u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
 void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data);
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index a46ccd670785..73dd74efcf9a 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -282,14 +282,6 @@ static void vt_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
	return vmx_set_virtual_apic_mode(vcpu);
 }
 
-static void vt_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
-{
-	if (is_td_vcpu(vcpu))
-		return;
-
-	return vmx_hwapic_isr_update(vcpu, max_isr);
-}
-
 static int vt_sync_pir_to_irr(struct kvm_vcpu *vcpu)
 {
	if (is_td_vcpu(vcpu))
@@ -953,7 +945,6 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
	.load_eoi_exitmap = vt_op(load_eoi_exitmap),
	.apicv_pre_state_restore = pi_apicv_pre_state_restore,
	.required_apicv_inhibits = VMX_REQUIRED_APICV_INHIBITS,
-	.hwapic_isr_update = vt_op(hwapic_isr_update),
	.sync_pir_to_irr = vt_op(sync_pir_to_irr),
	.deliver_interrupt = vt_op(deliver_interrupt),
	.dy_apicv_has_pending_interrupt = pi_has_pending_interrupt,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ef8d29c677b9..e7883bf7665f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6957,45 +6957,20 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
	read_unlock(&vcpu->kvm->mmu_lock);
 }
 
-void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
+static void vmx_set_rvi_svi(int rvi, int svi)
 {
	u16 status;
-	u8 old;
+	u16 new;
 
-	if (max_isr == -1)
-		max_isr = 0;
-
-	/*
-	 * Always update SVI in vmcs01, as SVI is only relevant for L2 if and
-	 * only if Virtual Interrupt Delivery is enabled in vmcs12, and if VID
-	 * is enabled then L2 EOIs affect L2's vAPIC, not L1's vAPIC.
-	 */
-	guard(vmx_vmcs01)(vcpu);
+	if (rvi == -1)
+		rvi = 0;
+	if (svi == -1)
+		svi = 0;
 
	status = vmcs_read16(GUEST_INTR_STATUS);
-	old = status >> 8;
-	if (max_isr != old) {
-		status &= 0xff;
-		status |= max_isr << 8;
-		vmcs_write16(GUEST_INTR_STATUS, status);
-	}
-}
-
-static void vmx_set_rvi(int vector)
-{
-	u16 status;
-	u8 old;
-
-	if (vector == -1)
-		vector = 0;
-
-	status = vmcs_read16(GUEST_INTR_STATUS);
-	old = (u8)status & 0xff;
-	if ((u8)vector != old) {
-		status &= ~0xff;
-		status |= (u8)vector;
-		vmcs_write16(GUEST_INTR_STATUS, status);
-	}
+	new = (rvi & 0xff) | ((u8)svi << 8);
+	if (new != status)
+		vmcs_write16(GUEST_INTR_STATUS, new);
 }
 
 int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
@@ -7037,7 +7012,7 @@ int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
	 * a VM-Exit and the subsequent entry will call sync_pir_to_irr.
	 */
	if (!is_guest_mode(vcpu) && kvm_vcpu_apicv_active(vcpu))
-		vmx_set_rvi(max_irr);
+		vmx_set_rvi_svi(max_irr, kvm_lapic_find_highest_isr(vcpu));
	else if (got_posted_interrupt)
		kvm_make_request(KVM_REQ_EVENT, vcpu);
 
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index d09abeac2b56..ab7349e67809 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -46,7 +46,6 @@ int vmx_check_intercept(struct kvm_vcpu *vcpu,
 bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu);
 void vmx_migrate_timers(struct kvm_vcpu *vcpu);
 void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu);
-void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr);
 int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu);
 void vmx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
			   int trig_mode, int vector);
-- 
2.47.3

Re: [PATCH v3 06/10] KVM: nVMX: Switch to vmcs01 to update SVI on-demand if L2 is active

Posted by Sean Christopherson 1 month, 1 week ago

On Thu, Dec 25, 2025, Chao Gao wrote:
> On Fri, Dec 05, 2025 at 03:19:09PM -0800, Sean Christopherson wrote:
> >@@ -6963,21 +6963,16 @@ void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
> > 	u16 status;
> > 	u8 old;
> > 
> >-	/*
> >-	 * If L2 is active, defer the SVI update until vmcs01 is loaded, as SVI
> >-	 * is only relevant for if and only if Virtual Interrupt Delivery is
> >-	 * enabled in vmcs12, and if VID is enabled then L2 EOIs affect L2's
> >-	 * vAPIC, not L1's vAPIC.  KVM must update vmcs01 on the next nested
> >-	 * VM-Exit, otherwise L1 with run with a stale SVI.
> >-	 */
> >-	if (is_guest_mode(vcpu)) {
> >-		to_vmx(vcpu)->nested.update_vmcs01_hwapic_isr = true;
> >-		return;
> >-	}
> >-
> > 	if (max_isr == -1)
> > 		max_isr = 0;
> > 
> >+	/*
> >+	 * Always update SVI in vmcs01, as SVI is only relevant for L2 if and
> >+	 * only if Virtual Interrupt Delivery is enabled in vmcs12, and if VID
> >+	 * is enabled then L2 EOIs affect L2's vAPIC, not L1's vAPIC.
> >+	 */
> >+	guard(vmx_vmcs01)(vcpu);
> 
> KVM calls this function when virtualizing EOI for L2, and in a previous
> discussion, you mentioned that the overhead of switching to VMCS01 is
> "non-trivial and unnecessary" (see [1]).
> 
> My testing shows that guard(vmx_vmcs01) takes about 140-250 cycles. I think
> this overhead is acceptable for nested scenarios, since it only affects
> EOI-induced VM-exits in specific/suboptimal configurations.
> 
> But I'm wondering whether KVM should update SVI on every VM-entry instead of
> doing it on-demand (i.e., when vISR gets changed). We've encountered two
> SVI-related bugs [1][2] that were difficult to debug. Preventing these issues
> entirely seems worthwhile, and the overhead of always updating SVI during
> VM-entry should be minimal since KVM already updates RVI (RVI and SVI are in
> the the same VMCS field) in vmx_sync_irr_to_pir() when APICv is enabled.

Hmm.  At first glance, I _really_ like this idea, but I'm leaning fairly strongly
towards keeping .hwapic_isr_update().

While small (~28 cycles on EMR), the runtime cost isn't zero, and it affects the
fastpath.  And number of useful updates is comically small.  E.g. without a nested
VM, AFAICT they basically never happen post-boot.  Even when running nested VMs,
the number of useful update when running L1 hovers around ~0.001%.

More importantly, KVM will carry most of the complexity related to vISR updates
regardless of how KVM handles SVI because of the ISR caching for non-APICv
systems.  So while I acknowledge that we've had some nasty bugs and 100% agree
that squashing them entirely is _very_ enticing, I think those bugs were due to
what were effectively two systemic flaws in KVM: (1) not aligning SVI with KVM's
ISR caching code, and (2) the whole "defer updates to nested VM-Exit" mess.

At the end of this series, both (1) and (2) are "solved".  Huh.  And now that I
look at (1) again, the last patch is wrong (benignly wrong, but still wrong).
The changelog says this:

  First, it adds a call during kvm_lapic_reset(), but that's a glorified nop as
  the ISR has already been zeroed.

but that's simply not true.  There's already a call in kvm_lapic_reset().  So
that patch can be amended with:

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 7be4d759884c..55a7a2be3a2e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2907,10 +2907,8 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)

        vcpu->arch.pv_eoi.msr_val = 0;
        apic_update_ppr(apic);
-       if (apic->apicv_active) {
+       if (apic->apicv_active)
                kvm_x86_call(apicv_post_state_restore)(vcpu);
-               kvm_x86_call(hwapic_isr_update)(vcpu, -1);
-       }

        vcpu->arch.apic_arb_prio = 0;
        vcpu->arch.apic_attention = 0;

At which point updates to highest_isr_cache and .hwapic_isr_update() are fully
symmetrical (ignoring that KVM simply invalidates highest_isr_cache instead of
scanning the vISR on EOI and APICv changes).

So yeah, the more I look at all of this, the more I'm in favor of keeping
.hwapic_isr_update(), e.g. if only to let it serve as a canary for finding issues
related to highest_isr_cache and/or isr_count.

> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index ef8d29c677b9..e7883bf7665f 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6957,45 +6957,20 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
> 	read_unlock(&vcpu->kvm->mmu_lock);
>  }
>  
> -void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
> +static void vmx_set_rvi_svi(int rvi, int svi)

If this ever goes anywhere, my vote would be to call this vmx_sync_guest_intr_status(),
and pass in only @rvi, e.g. 

  static void vmx_sync_guest_intr_status(struct kvm_vcpu *vcpu, int rvi)
  {
	int svi = kvm_lapic_find_highest_isr(vcpu);
	u16 status, new;

	...
  }

> 	status = vmcs_read16(GUEST_INTR_STATUS);
> +	new = (rvi & 0xff) | ((u8)svi << 8);

I think this is technically undefined behavior?  Due to a shift larger than type
(casting to an 8-bit value and then shifting by 8).  svi[31:8] should always be
'0', but to be paranoid we could do:

	new = (rvi & 0xff) | ((svi & 0xff) << 8);

> +	if (new != status)
> +		vmcs_write16(GUEST_INTR_STATUS, new);
>  }

Re: [PATCH v3 06/10] KVM: nVMX: Switch to vmcs01 to update SVI on-demand if L2 is active

Posted by Chao Gao 1 month, 1 week ago

>> But I'm wondering whether KVM should update SVI on every VM-entry instead of
>> doing it on-demand (i.e., when vISR gets changed). We've encountered two
>> SVI-related bugs [1][2] that were difficult to debug. Preventing these issues
>> entirely seems worthwhile, and the overhead of always updating SVI during
>> VM-entry should be minimal since KVM already updates RVI (RVI and SVI are in
>> the the same VMCS field) in vmx_sync_irr_to_pir() when APICv is enabled.
>
>Hmm.  At first glance, I _really_ like this idea, but I'm leaning fairly strongly
>towards keeping .hwapic_isr_update().
>
>While small (~28 cycles on EMR), the runtime cost isn't zero, and it affects the
>fastpath.  And number of useful updates is comically small.  E.g. without a nested
>VM, AFAICT they basically never happen post-boot.  Even when running nested VMs,
>the number of useful update when running L1 hovers around ~0.001%.
>
>More importantly, KVM will carry most of the complexity related to vISR updates
>regardless of how KVM handles SVI because of the ISR caching for non-APICv
>systems.  So while I acknowledge that we've had some nasty bugs and 100% agree
>that squashing them entirely is _very_ enticing, I think those bugs were due to
>what were effectively two systemic flaws in KVM: (1) not aligning SVI with KVM's
>ISR caching code, and (2) the whole "defer updates to nested VM-Exit" mess.
>
>At the end of this series, both (1) and (2) are "solved".
 
Fair enough. Thanks for this explanation.

> Huh.  And now that I
>look at (1) again, the last patch is wrong (benignly wrong, but still wrong).
>The changelog says this:
>
>  First, it adds a call during kvm_lapic_reset(), but that's a glorified nop as
>  the ISR has already been zeroed.
>
>but that's simply not true.  There's already a call in kvm_lapic_reset().  So
>that patch can be amended with:

Yeah, the fix looks good to me.

>
>diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>index 7be4d759884c..55a7a2be3a2e 100644
>--- a/arch/x86/kvm/lapic.c
>+++ b/arch/x86/kvm/lapic.c
>@@ -2907,10 +2907,8 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
> 
>        vcpu->arch.pv_eoi.msr_val = 0;
>        apic_update_ppr(apic);
>-       if (apic->apicv_active) {
>+       if (apic->apicv_active)
>                kvm_x86_call(apicv_post_state_restore)(vcpu);
>-               kvm_x86_call(hwapic_isr_update)(vcpu, -1);
>-       }
> 
>        vcpu->arch.apic_arb_prio = 0;
>        vcpu->arch.apic_attention = 0;
>