[PATCH] KVM: x86: skip userspace IOAPIC EOI exit when Directed EOI is enabled

Jon Kohler posted 1 patch 1 week, 6 days ago
arch/x86/kvm/lapic.c | 4 ++++
1 file changed, 4 insertions(+)
[PATCH] KVM: x86: skip userspace IOAPIC EOI exit when Directed EOI is enabled
Posted by Jon Kohler 1 week, 6 days ago
From: Khushit Shah <khushit.shah@nutanix.com>

Problem:
We observed Windows w/ HyperV getting stuck during boot because of
level triggered interrupt storm. This is because KVM currently
does not respect Directed EOI bit set by guest in split-irqchip
mode.

We observed the following ACTUAL sequence on Windows guests with
Directed EOI enabled:
  1. Guest issues an APIC EOI.
  2. The interrupt is injected into L2 and serviced.
  3. Guest issues an IOAPIC EOI.

But, with the current behavior in split-irqchip mode:
  1. Guest issues an APIC EOI.
  2. KVM exits to userspace and QEMU's ioapic_service reasserts the
     interrupt because the line is not yet deasserted.
  3. Steps 1 and 2 keeps looping, and hence no progress is made.
(logs at the bug linked below).

This is because in split-irqchip mode, KVM requests a userspace IOAPIC
EOI exit on every APIC EOI. However, if the guest sets the Directed EOI
bit in the APIC Spurious Interrupt Vector Register (SPIV, bit 12), per
the x2APIC specification, the APIC does not broadcast EOIs to the IOAPIC.
In this case, it is the guest's responsibility to explicitly EOI the
IOAPIC by writing to its EOI register.

kernel-irqchip mode already handles this similarly in
kvm_ioapic_update_eoi_one().

Link: https://lore.kernel.org/kvm/7D497EF1-607D-4D37-98E7-DAF95F099342@nutanix.com/

Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
---
 arch/x86/kvm/lapic.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0725d2cae742..a81e71ad5bda 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1473,6 +1473,10 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
 
 	/* Request a KVM exit to inform the userspace IOAPIC. */
 	if (irqchip_split(apic->vcpu->kvm)) {
+		/* EOI the ioapic only if the Directed EOI is disabled. */
+		if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
+			return;
+
 		apic->vcpu->arch.pending_ioapic_eoi = vector;
 		kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
 		return;
-- 
2.43.0
Re: [PATCH] KVM: x86: skip userspace IOAPIC EOI exit when Directed EOI is enabled
Posted by Sean Christopherson 1 week, 2 days ago
It'd be super helpful for downstream folks to mention "split IRQCHIP" somewhere
in the shortlog, e.g.

  KVM: x86: Suppress EOI broadcasts with split IRQCHIP if Directed EOI is enabled

On Thu, Sep 18, 2025, Jon Kohler wrote:
> From: Khushit Shah <khushit.shah@nutanix.com>
> 
> Problem:

Please read Documentation/process/maintainer-kvm-x86.rst.  I'll die on the hill
that leading with a problem statement results in a less efficient changelog.

And nowhere in the changelog does it state what change is actually being made.
The shortlog calls it out, but the shortlog isn't always visible, e.g. when
reviewing the initial patch email.

> We observed Windows w/ HyperV getting stuck during boot because of

No "we".  Over time, who "we" is can become unclear.

> level triggered interrupt storm. This is because KVM currently
> does not respect Directed EOI bit set by guest in split-irqchip
> mode.
>
> We observed the following ACTUAL sequence on Windows guests with

Uber nit, generally speaking, use asterisks or underscores to emphasive a word.
Using all caps suggests there is special meaning to the word, e.g. that it's an
acronym or something.  I am guilty of using ALL CAPS, but I'm pretty sure only
for "do NOT", where I think/hope the intent is clear.  For whatever reason, I
kept doing double-takes when reading this sentence.

Though to be honest, I would omit the whole "actual" part.  There's an assumption
that everyone is acting in good faith, i.e. that contributors aren't fabricating
a bug report or making things up.

> Directed EOI enabled:
>   1. Guest issues an APIC EOI.
>   2. The interrupt is injected into L2 and serviced.
>   3. Guest issues an IOAPIC EOI.
> 
> But, with the current behavior in split-irqchip mode:
>   1. Guest issues an APIC EOI.
>   2. KVM exits to userspace and QEMU's ioapic_service reasserts the
>      interrupt because the line is not yet deasserted.
>   3. Steps 1 and 2 keeps looping, and hence no progress is made.
> (logs at the bug linked below).

Eh, for the logs, there's a bug report, just use Closes.  And honestly, I would
shorten all of the above.  This is a fairly straightforward bug, providing a
super detailed play-by-play actually makes it _harder_ to understand what's

> This is because in split-irqchip mode, KVM requests a userspace IOAPIC

Please don't use QEMU's terminology when describing KVM bugs.

> EOI exit on every APIC EOI.

This is wrong.  Every *intercepted* EOI, but not all EOIs are intercepted.

> However, if the guest sets the Directed EOI bit in the APIC Spurious
> Interrupt Vector Register (SPIV, bit 12), per the x2APIC specification,

There is no singular x2APIC specification.  x2APIC is defined by both Intel's SDM
and AMD's APM.  Usually the Intel and AMD specs are compatible, key word "usually".
Case in point, I can't find anything in the APM that suggests AMD CPUs support
"Suppress EOI Broadcasts".

Of course, that could just be an APM documentation bug, since the bit appears
writable on Turin (though not on Milan).

> the APIC does not broadcast EOIs to the IOAPIC.  In this case, it is the
> guest's responsibility to explicitly EOI the IOAPIC by writing to its EOI
> register.
> 
> kernel-irqchip mode already handles this similarly in
> kvm_ioapic_update_eoi_one().

Hmm, I'm pretty sure that's dead code since commit 0bcc3fb95b97 ("KVM: lapic: stop
advertising DIRECTED_EOI when in-kernel IOAPIC is in use").

And Fudge with a capital 'F', because I'm pretty sure we're going to need at least
a new CAP, and probably a quirk too.

As per the aforementioned commit, advertising DIRECTED_EOI without an I/O APIC
that emulates the EOI register will make for a very sad guest.  So simply fixing
KVM will break existing setups, even though it's unequivocally the correct behavior.
And unfortunately, I _know_ there is at least one VMM in production that doesn't
support EOIs in the I/O APIC.

> Link: https://lore.kernel.org/kvm/7D497EF1-607D-4D37-98E7-DAF95F099342@nutanix.com/

As above, this should be Closes, not Link.

  Closes: https://lore.kernel.org/kvm/7D497EF1-607D-4D37-98E7-DAF95F099342@nutanix.com

This also needs:

  Fixes: 7543a635aa09 ("KVM: x86: Add KVM exit for IOAPIC EOIs")
  Cc: stable@vger.kernel.org

> Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
> ---
>  arch/x86/kvm/lapic.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 0725d2cae742..a81e71ad5bda 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1473,6 +1473,10 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
>  
>  	/* Request a KVM exit to inform the userspace IOAPIC. */
>  	if (irqchip_split(apic->vcpu->kvm)) {
> +		/* EOI the ioapic only if the Directed EOI is disabled. */

Capitalize I/O APIC (ugh, the above uses IOAPIC, and further above uses ioapic).
I'd just avoid mentioning I/O APIC at this point, it should be super obvious
what's happening.

"the Directed EOI" doesn't parse.  Directed EOI is a feature, not a "thing".

"the ioapic" is also wrong in the sense that KVM has no idea how many I/O APICs
are being emulated by userspace.  Could be 1, could be 0 or 2.

Don't bother sending a v2, at least not yet.  I'll follow-up with various folks
to try and figure out the least awful way to get out of this mess, and will
probably post a v2 with a CAP and/or quirk.

In the meantime, I have the below sitting in a local branch:

--
From: Khushit Shah <khushit.shah@nutanix.com>
Date: Thu, 18 Sep 2025 09:25:28 -0700
Subject: [PATCH] KVM: x86: Suppress EOI broadcasts with split IRQCHIP if
 Directed EOI is enabled

Do not generate a KVM_EXIT_IOAPIC_EOI exit to userspace when handling EOIs
for a split IRQCHIP and the vCPU has enabled Directed EOIs in its local
APIC, i.e. if the guest has set "Suppress EOI Broadcasts" in Intel
parlance.

Incorrectly broadcasting EOIs can lead to a potentially fatal interrupt
storm if the IRQ line is still asserted and userspace reacts to the EOI by
re-injecting the IRQ.  E.g. Windows with Hyper-V enabled gets stuck during
boot when running under QEMU with a split IRQCHIP.

Note, Suppress EOI Broadcasts is defined only in Intel's SDM, not in AMD's
APM.  But the bit is writable on some AMD CPUs, e.g. Turin, and KVM's ABI
is to support Directed EOI (KVM's name) irrespective of guest CPU vendor.

Note #2, KVM doesn't support Directed EOIs for its in-kernel I/O APIC.
See commit 0bcc3fb95b97 ("KVM: lapic: stop advertising DIRECTED_EOI when
in-kernel IOAPIC is in use").

Fixes: 7543a635aa09 ("KVM: x86: Add KVM exit for IOAPIC EOIs")
Cc: stable@vger.kernel.org
Closes: https://lore.kernel.org/kvm/7D497EF1-607D-4D37-98E7-DAF95F099342@nutanix.com
Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
Link: https://lore.kernel.org/r/20250918162529.640943-1-jon@nutanix.com
[sean: rewrite changelog and comment]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/lapic.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 5fc437341e03..4d77112b887d 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1429,6 +1429,15 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
 
 	/* Request a KVM exit to inform the userspace IOAPIC. */
 	if (irqchip_split(apic->vcpu->kvm)) {
+		/*
+		 * Don't exit to userspace if the guest has enabled Directed
+		 * EOI, a.k.a. Suppress EOI Broadcasts, in which the local APIC
+		 * doesn't broadcast EOIs (the the guest must EOI the target
+		 * I/O APIC(s) directly).
+		 */
+		if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
+			return;
+
 		apic->vcpu->arch.pending_ioapic_eoi = vector;
 		kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
 		return;

base-commit: 07e27ad16399afcd693be20211b0dfae63e0615f
--
Re: [PATCH] KVM: x86: skip userspace IOAPIC EOI exit when Directed EOI is enabled
Posted by Khushit Shah 1 week, 2 days ago
Thanks for the very detailed review.

> On 23 Sep 2025, at 3:21 AM, Sean Christopherson <seanjc@google.com> wrote:
> 
> Don't bother sending a v2, at least not yet.  I'll follow-up with various folks
> to try and figure out the least awful way to get out of this mess, and will
> probably post a v2 with a CAP and/or quirk.

I am happy to help with v2!

Reported-by: Khushit Shah <khushit.shah@nutanix.com>
Tested-by: Khushit Shah <khushit.shah@nutanix.com>

> On 23 Sep 2025, at 3:21 AM, Sean Christopherson <seanjc@google.com> wrote:
> 
> /* Request a KVM exit to inform the userspace IOAPIC. */
> if (irqchip_split(apic->vcpu->kvm)) {
> + /*
> +  * Don't exit to userspace if the guest has enabled Directed
> +  * EOI, a.k.a. Suppress EOI Broadcasts, in which the local APIC
> +  * doesn't broadcast EOIs (the the guest must EOI the target
> +  * I/O APIC(s) directly).
> +  */
> +

Nit: small typo in the comment (“the the”).
Re: [PATCH] KVM: x86: skip userspace IOAPIC EOI exit when Directed EOI is enabled
Posted by Huang, Kai 1 week, 2 days ago
> From: Khushit Shah <khushit.shah@nutanix.com>
> Date: Thu, 18 Sep 2025 09:25:28 -0700
> Subject: [PATCH] KVM: x86: Suppress EOI broadcasts with split IRQCHIP if
>  Directed EOI is enabled
> 
> Do not generate a KVM_EXIT_IOAPIC_EOI exit to userspace when handling EOIs
> for a split IRQCHIP and the vCPU has enabled Directed EOIs in its local
> APIC, i.e. if the guest has set "Suppress EOI Broadcasts" in Intel
> parlance.
> 
> Incorrectly broadcasting EOIs can lead to a potentially fatal interrupt
> storm if the IRQ line is still asserted and userspace reacts to the EOI by
> re-injecting the IRQ.  E.g. Windows with Hyper-V enabled gets stuck during
> boot when running under QEMU with a split IRQCHIP.
> 
> Note, Suppress EOI Broadcasts is defined only in Intel's SDM, not in AMD's
> APM.  But the bit is writable on some AMD CPUs, e.g. Turin, and KVM's ABI
> is to support Directed EOI (KVM's name) irrespective of guest CPU vendor.
> 
> Note #2, KVM doesn't support Directed EOIs for its in-kernel I/O APIC.
> See commit 0bcc3fb95b97 ("KVM: lapic: stop advertising DIRECTED_EOI when
> in-kernel IOAPIC is in use").
> 
> Fixes: 7543a635aa09 ("KVM: x86: Add KVM exit for IOAPIC EOIs")
> Cc: stable@vger.kernel.org
> Closes: https://lore.kernel.org/kvm/7D497EF1-607D-4D37-98E7-DAF95F099342@nutanix.com
> Signed-off-by: Khushit Shah <khushit.shah@nutanix.com>
> Link: https://lore.kernel.org/r/20250918162529.640943-1-jon@nutanix.com
> [sean: rewrite changelog and comment]
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Kai Huang <kai.huang@intel.com>

> ---
>  arch/x86/kvm/lapic.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 5fc437341e03..4d77112b887d 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1429,6 +1429,15 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
>  
>  	/* Request a KVM exit to inform the userspace IOAPIC. */
>  	if (irqchip_split(apic->vcpu->kvm)) {
> +		/*
> +		 * Don't exit to userspace if the guest has enabled Directed
> +		 * EOI, a.k.a. Suppress EOI Broadcasts, in which the local APIC
> +		 * doesn't broadcast EOIs (the the guest must EOI the target
> +		 * I/O APIC(s) directly).
> +		 */
> +		if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> +			return;
> +
>  		apic->vcpu->arch.pending_ioapic_eoi = vector;
>  		kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
>  		return;
> 
> base-commit: 07e27ad16399afcd693be20211b0dfae63e0615f
> --