[RFC PATCH v2 0/1] target/i386/kvm: Configure proper KVM SEOIB behavior

Khushit Shah posted 1 patch 1 month ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20260309054243.440453-1-khushit.shah@nutanix.com
Maintainers: Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, Eduardo Habkost <eduardo@habkost.net>, "Michael S. Tsirkin" <mst@redhat.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Marcelo Tosatti <mtosatti@redhat.com>
hw/i386/x86-common.c         | 99 ++++++++++++++++++++++++++++++++++++
hw/i386/x86.c                |  1 +
hw/intc/ioapic.c             |  2 -
include/hw/i386/x86.h        | 12 +++++
include/hw/intc/ioapic.h     |  2 +
include/system/system.h      |  1 +
system/vl.c                  |  5 ++
target/i386/kvm/kvm.c        | 43 ++++++++++++++++
target/i386/kvm/kvm_i386.h   | 12 +++++
target/i386/kvm/trace-events |  4 ++
10 files changed, 179 insertions(+), 2 deletions(-)
[RFC PATCH v2 0/1] target/i386/kvm: Configure proper KVM SEOIB behavior
Posted by Khushit Shah 1 month ago
This is an RFC posted for design feedback rather than merge.

Problem
-------

In split-irqchip mode, KVM unconditionally advertises x2APIC Suppress EOI
Broadcast (SEOIB) support to the guest. This is wrong in two ways:

  - IOAPIC v0x11 has no EOI register, so advertising SEOIB is incorrect.
  - Even with IOAPIC v0x20, KVM ignores the guest's suppression request
    and continues to broadcast LAPIC EOIs to the userspace IOAPIC.

This can cause interrupt storms in guests that rely on Directed EOI
semantics (e.g. Windows with Credential Guard, which hangs during boot).

KVM fix
-------

KVM now exposes two new x2APIC API flags to give userspace control:

  - KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST
  - KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST

This patch
----------

This patch wires those flags into QEMU via a machine-level field
(kvm_lapic_seoib_state) with three policy states:

  - SEOIB_STATE_QUIRKED (default): legacy behavior, no flags set
  - SEOIB_STATE_RESPECTED: advertise SEOIB and honor guest suppression
  - SEOIB_STATE_NOT_ADVERTISED: hide SEOIB from guest (for IOAPIC v0x11)

The current implementation automatically selects a policy based on IOAPIC
version at VM power-on time, and migrates the state as a VMState subsection.
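
As a rough illustration of the automatic selection described above, the sketch
below maps an IOAPIC version to one of the three policy states, and then maps
the state to the flag that would be requested from KVM. The state names follow
this cover letter. The helper names, the `>= 0x20` comparison, and the numeric
flag values are assumptions made for the example; they are not taken from the
patch or from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Policy states, named as in the cover letter. */
typedef enum {
    SEOIB_STATE_QUIRKED,
    SEOIB_STATE_RESPECTED,
    SEOIB_STATE_NOT_ADVERTISED,
} SeoibState;

/*
 * Placeholder bit values, for illustration only. The real flag values
 * are defined by the kernel's KVM UAPI headers.
 */
#define EXAMPLE_ENABLE_SEOIB  (UINT64_C(1) << 0)
#define EXAMPLE_DISABLE_SEOIB (UINT64_C(1) << 1)

/* Hypothetical helper: choose a policy from the IOAPIC version. */
static SeoibState seoib_state_for_ioapic(uint8_t ioapic_version)
{
    /* Assumption: any version from 0x20 up has the EOI register. */
    if (ioapic_version >= 0x20) {
        return SEOIB_STATE_RESPECTED;
    }
    /* v0x11 has no EOI register, so SEOIB should not be advertised. */
    return SEOIB_STATE_NOT_ADVERTISED;
}

/* Hypothetical helper: flag bits to request from KVM for a policy. */
static uint64_t seoib_flags_for_state(SeoibState state)
{
    uint64_t flags = 0;

    switch (state) {
    case SEOIB_STATE_RESPECTED:
        flags = EXAMPLE_ENABLE_SEOIB;
        break;
    case SEOIB_STATE_NOT_ADVERTISED:
        flags = EXAMPLE_DISABLE_SEOIB;
        break;
    case SEOIB_STATE_QUIRKED:
        break; /* legacy behavior: no flags set */
    }
    /* The two flags are mutually exclusive. */
    assert(flags != (EXAMPLE_ENABLE_SEOIB | EXAMPLE_DISABLE_SEOIB));
    return flags;
}
```

With these helpers, an IOAPIC v0x11 machine would resolve to
SEOIB_STATE_NOT_ADVERTISED and the "disable" flag, while QUIRKED requests no
flag at all.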

Design challenges
-----------------

The KVM x2APIC API is one-way: once a flag is set, it cannot be reverted
back to the quirked state (consistent with other x2APIC API flags). This
has several implications:

  - During incoming migration, we must defer setting the flags until after
    the SEOIB state is loaded from the migration stream, since we cannot
    know the source VM's policy in advance.

  - Snapshot restore (loadvm) is problematic: if the running VM has already
    set enabled/disabled, restoring a QUIRKED snapshot cannot revert the
    KVM state. QMP/HMP loadvm makes this worse since it cannot be detected
    at init time.

  - The flags only take effect when kvm_apic_set_version() is called inside
    KVM, which happens from limited paths such as IOAPIC initialization or
    APIC state setting. This requires setting the flags before IOAPIC is
    initialized, necessitating fetching the IOAPIC version from global
    properties rather than from the initialized device.

  - Automatic policy selection (current approach) changes behavior for
    existing machine types without user opt-in, breaking migratability
    from a new QEMU to an older QEMU for the same machine type.

  - Machine-type gating (restrict to 10.2+) helps with older machine types,
    but still breaks migration to a destination with an older kernel that
    lacks the SEOIB API, again without user opt-in.
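
The one-way constraint behind the migration and loadvm items above can be
modelled as a small state check. This is only a sketch: the function names are
hypothetical, and it conservatively assumes that once either flag has been set
no further change is possible (the text above only states that returning to
the quirked state is impossible).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    SEOIB_STATE_QUIRKED,
    SEOIB_STATE_RESPECTED,
    SEOIB_STATE_NOT_ADVERTISED,
} SeoibState;

/*
 * QUIRKED means no flag has been set yet, so every state is still
 * reachable. After a flag is set, only the same state can be kept.
 */
static bool seoib_transition_allowed(SeoibState applied, SeoibState wanted)
{
    if (applied == SEOIB_STATE_QUIRKED) {
        return true;
    }
    return applied == wanted;
}

/*
 * Hypothetical hook for an incoming-migration or loadvm path.
 * Returns 0 and updates *applied on success, or -1 if the loaded
 * state would require reverting flags that KVM cannot clear.
 */
static int seoib_apply_loaded_state(SeoibState *applied, SeoibState loaded)
{
    assert(applied != NULL);

    if (!seoib_transition_allowed(*applied, loaded)) {
        return -1;
    }
    *applied = loaded;
    return 0;
}
```

Deferring the flag setup until the migration stream has been read keeps the
destination in QUIRKED, from which any loaded state is still reachable. A
running VM that has already applied RESPECTED and then loads a QUIRKED snapshot
hits the -1 path.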

Preferred direction
-------------------

Given the above, I am leaning toward a KVM accelerator property (for
example, `seoib-policy=respected|not-advertised|quirked`) with the default
being quirked.

This approach:

  - Requires explicit user opt-in, at the cost of the user having to
    understand and configure the property.
  - Works with any machine type, no machine-type gating needed.
  - No need to fetch the IOAPIC version from globals.
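
To make the proposed property values concrete, here is a minimal sketch of
turning a `seoib-policy` string into a policy state. The function name is
hypothetical and the hand-written parser is only for illustration; real QEMU
code would more likely go through the existing property infrastructure.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum {
    SEOIB_STATE_QUIRKED,
    SEOIB_STATE_RESPECTED,
    SEOIB_STATE_NOT_ADVERTISED,
} SeoibState;

/*
 * Hypothetical parser for the proposed seoib-policy values.
 * Returns true and stores the state in *out on success; returns
 * false and leaves *out untouched for an unknown value.
 */
static bool seoib_policy_parse(const char *value, SeoibState *out)
{
    assert(value != NULL && out != NULL);

    if (strcmp(value, "quirked") == 0) {
        *out = SEOIB_STATE_QUIRKED;
    } else if (strcmp(value, "respected") == 0) {
        *out = SEOIB_STATE_RESPECTED;
    } else if (strcmp(value, "not-advertised") == 0) {
        *out = SEOIB_STATE_NOT_ADVERTISED;
    } else {
        return false;
    }
    return true;
}
```

An unset property would leave the state at SEOIB_STATE_QUIRKED, matching the
proposed default.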


Note: The current implementation also does not handle the QMP/HMP
loadvm corner cases. 

I would appreciate feedback on the preferred approach.

Changes in v2:
	- Update flags in patch description to match kernel naming.
  
Khushit Shah (1):
  target/i386/kvm: Configure proper KVM SEOIB behavior

 hw/i386/x86-common.c         | 99 ++++++++++++++++++++++++++++++++++++
 hw/i386/x86.c                |  1 +
 hw/intc/ioapic.c             |  2 -
 include/hw/i386/x86.h        | 12 +++++
 include/hw/intc/ioapic.h     |  2 +
 include/system/system.h      |  1 +
 system/vl.c                  |  5 ++
 target/i386/kvm/kvm.c        | 43 ++++++++++++++++
 target/i386/kvm/kvm_i386.h   | 12 +++++
 target/i386/kvm/trace-events |  4 ++
 10 files changed, 179 insertions(+), 2 deletions(-)

-- 
2.39.3
Re: [RFC PATCH v2 0/1] target/i386/kvm: Configure proper KVM SEOIB behavior
Posted by Mohamed Mediouni 1 month ago

> On 9. Mar 2026, at 06:42, Khushit Shah <khushit.shah@nutanix.com> wrote:
> 
>  - Machine-type gating (restrict to 10.2+) helps with older machine types,
>    but still breaks migration to a destination with an older kernel that
>    lacks the SEOIB API, again without user opt-in.
Hi,

Ugh about this one, but am still convinced that a machine model version
is _the_ right way to deal with this. Will this be backported to Linux LTSes on the KVM side?

Or alternatively, what are the odds of having it fit as a CPU flag?
For example -cpu host,x2apic-suppress-eoi-broadcast=on.
 
Re: [RFC PATCH v2 0/1] target/i386/kvm: Configure proper KVM SEOIB behavior
Posted by Khushit Shah 1 month ago

> On 9 Mar 2026, at 11:27 AM, Mohamed Mediouni <mohamed@unpredictable.fr> wrote:
> 
> Hi,
> 
> Ugh about this one, but am still convinced that a machine model version
> is _the_ right way to deal with this. Will this be backported to Linux LTSes on the KVM side?
> 
> Or alternatively, what are the odds of having it fit as a CPU flag?
> For example -cpu host,x2apic-suppress-eoi-broadcast=on.


Hi, 

Thanks for the review.

Regarding the -cpu flag: This doesn't feel right because SEOIB is an accelerator-specific (KVM)
value. Tying it to the CPU suggests an architectural feature that wouldn't apply to TCG. What
specifically feels off about keeping it as a KVM/Machine property?

Regarding the backports: The KVM kernel side is currently in 6.18 and 6.19. I haven’t yet gotten
around to manually backporting it to the older stable releases.

Thanks, 
Khushit.
Re: [RFC PATCH v2 0/1] target/i386/kvm: Configure proper KVM SEOIB behavior
Posted by Mohamed Mediouni 1 month ago

> On 9. Mar 2026, at 07:33, Khushit Shah <khushit.shah@nutanix.com> wrote:
> 
> 
> 
>> On 9 Mar 2026, at 11:27 AM, Mohamed Mediouni <mohamed@unpredictable.fr> wrote:
>> 
>> Hi,
>> 
>> Ugh about this one, but am still convinced that a machine model version
>> is _the_ right way to deal with this. Will this be backported to Linux LTSes on the KVM side?
>> 
>> Or alternatively, what are the odds of having it fit as a CPU flag?
>> For example -cpu host,x2apic-suppress-eoi-broadcast=on.
> 
> 
> Hi, 
> 
> Thanks for the review.
> 
> Regarding the -cpu flag: This doesn't feel right because SEOIB is an accelerator-specific (KVM)
> value. Tying it to the CPU suggests an architectural feature that wouldn't apply to TCG. What
> specifically feels off about keeping it as a KVM/Machine property?
Hello,

The reason why I considered this is that QEMU CPU models are versioned separately from the machine
model, with an understanding that an older CPU model might not work on an older KVM.

A common CPU model is also a requirement for live migration.

Historically this has mostly been leveraged for security mitigations, but it could be a good fit here.

That could provide a path towards it becoming the default when -cpu host on a supported kernel (which doesn’t entail having LM support), while keeping LM working and having it stepped in with newer CPU model versions, which people will update to more often than new kernels. Maybe the best shape could be a no_seoib flag for older predefined CPU models…

My concern is primarily somebody installing a new QEMU and running qemu-system-x86_64 -M q35,kernel-irqchip=split [whatever], which would pick the latest machine model version and fail straight away if this is rolled out as part of a new machine model and the host kernel does not have the KVM side merged in.

And having a custom flag not turned on by default for a long time for this comes with the risk that almost nobody will enable it instead of it being eventually phased in as the default configuration… Or maybe it matters a lot less because kernel-irqchip=split is not the default today anyway?

Thank you,
-Mohamed
> 
> Regarding the backports: The KVM kernel side is currently in 6.18 and 6.19. I haven’t yet gotten
> around to manually backporting it to the older stable releases.
> 
> Thanks, 
> Khushit.
Re: [RFC PATCH v2 0/1] target/i386/kvm: Configure proper KVM SEOIB behavior
Posted by Mohamed Mediouni 1 month ago

> On 9. Mar 2026, at 07:55, Mohamed Mediouni <mohamed@unpredictable.fr> wrote:
> 
> 
> 
>> On 9. Mar 2026, at 07:33, Khushit Shah <khushit.shah@nutanix.com> wrote:
>> 
>> 
>> 
>>> On 9 Mar 2026, at 11:27 AM, Mohamed Mediouni <mohamed@unpredictable.fr> wrote:
>>> 
>>> Hi,
>>> 
>>> Ugh about this one, but am still convinced that a machine model version
>>> is _the_ right way to deal with this. Will this be backported to Linux LTSes on the KVM side?
>>> 
>>> Or alternatively, what are the odds of having it fit as a CPU flag?
>>> For example -cpu host,x2apic-suppress-eoi-broadcast=on.
>> 
>> 
>> Hi, 
>> 
>> Thanks for the review.
>> 
>> Regarding the -cpu flag: This doesn't feel right because SEOIB is an accelerator-specific (KVM)
>> value. Tying it to the CPU suggests an architectural feature that wouldn't apply to TCG. What
>> specifically feels off about keeping it as a KVM/Machine property?
> Hello,
> 
> The reason why I considered this is that QEMU CPU models are versioned separately from the machine
> model, with an understanding that an older CPU model might not work on an older KVM.
> 
> And with a common CPU model being also a requirement for live migration too.
> 
> This has historically been generally leveraged for security mitigations but it could be a good fit here.
> 
> That could provide a path towards it becoming the default when -cpu host on a supported kernel (which doesn’t entail having LM support), while keeping LM working and having it stepped in with newer CPU model versions, which people will update to more often than new kernels. Maybe the best shape could be a no_seoib flag for older predefined CPU models…
s/new kernels/new machine models/g and no_seoib -> no_seoib_fix
> 
> My concern is primarily somebody installing a new QEMU, running qemu-system-x86_64 -M q35,kernel-irqchip=split [whatever], which would pick the latest machine model version and fail straight away if this is rolled as part of a new machine model if running on a kernel without the KVM side merged in.
> 
> And having a custom flag not turned on by default for a long time for this comes with the risk that almost nobody will enable it instead of it being eventually phased in as the default configuration… Or maybe it matters a lot less because kernel-irqchip=split is not the default today anyway?
Which is why I think that an accelerator property isn’t the ideal choice here - as there’s no -accel kvm-10.2 for example :/ And machine type-gating is more tied to the QEMU version than CPU model versions which have less of a tie.

But I might be wrong on this one, with machine type being potentially better in practice - although it’s way too late to define this as part of q35-10.2 baseline as that has been long released, so it could be a slower rollout with the default tied to a new QEMU release. And an error message to use a newer kernel in that case if using the newer machine model explicitly or implicitly.

> Thank you,
> -Mohamed
>> 
>> Regarding the backports: The KVM kernel side is currently in 6.18 and 6.19. I haven’t yet gotten
>> around to manually backporting it to the older stable releases.
>> 
>> Thanks, 
>> Khushit.
Re: [RFC PATCH v2 0/1] target/i386/kvm: Configure proper KVM SEOIB behavior
Posted by Khushit Shah 1 month ago

> On 9 Mar 2026, at 12:44 PM, Mohamed Mediouni <mohamed@unpredictable.fr> wrote:
> 
>> And having a custom flag not turned on by default for a long time for this comes with the risk that almost nobody will enable it instead of it being eventually phased in as the default configuration… 

Yes, I agree with this :/. This is something that concerns me too. Although for the majority of
guests either they do not use Directed EOI or the spurious EOI exits from KVM just work, users will
end up using the quirked setup. Maybe this is fine with a printed warning? If they see something
breaking (apparently not often, as the bug has been around since 2019), they can move to
respected/not-advertised. Definitely not happy with this, though.

Hmm, machine-type gating looks like the way forward.

> On 9 Mar 2026, at 12:44 PM, Mohamed Mediouni <mohamed@unpredictable.fr> wrote:
> 
> But I might be wrong on this one, with machine type being potentially better in practice - although it’s way too late to define this as part of q35-10.2 baseline as that has been long released, so it could be a slower rollout with the default tied to a new QEMU release. And an error message to use a newer kernel in that case if using the newer machine model explicitly or implicitly.

I like this idea: it is on by default, and on error we hint to use quirked mode explicitly.
This also fixes the live migration state issue. So basically:

For the latest (or next) QEMU machine type version:
- SEOIB is on or off by default, based on the IOAPIC version.
- user can explicitly set it to quirked using the command line.
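
A minimal sketch of how that resolution could look, purely as an illustration:
the function name, its parameters, and the `>= 0x20` comparison are assumptions
for the example, not code from the patch. A machine type that does not opt in,
or an explicit user request, yields the quirked state. Otherwise the IOAPIC
version decides, and a kernel without the SEOIB API is reported as an error so
the caller can print a hint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    SEOIB_STATE_QUIRKED,
    SEOIB_STATE_RESPECTED,
    SEOIB_STATE_NOT_ADVERTISED,
} SeoibState;

/*
 * Hypothetical policy resolution.
 *   machine_defaults_on: the machine type version enables the fix
 *   user_forced_quirked: the user asked for quirked mode explicitly
 *   kernel_has_api:      the host kernel offers the SEOIB flags
 * Returns 0 and stores the policy in *out, or -1 when the new default
 * is wanted but the kernel cannot provide it.
 */
static int seoib_resolve(bool machine_defaults_on, bool user_forced_quirked,
                         uint8_t ioapic_version, bool kernel_has_api,
                         SeoibState *out)
{
    assert(out != NULL);

    if (!machine_defaults_on || user_forced_quirked) {
        *out = SEOIB_STATE_QUIRKED;
        return 0;
    }
    if (!kernel_has_api) {
        /* Caller reports the error and hints at quirked mode. */
        return -1;
    }
    /* Assumption: versions from 0x20 up have the EOI register. */
    *out = ioapic_version >= 0x20 ? SEOIB_STATE_RESPECTED
                                  : SEOIB_STATE_NOT_ADVERTISED;
    return 0;
}
```

Because a new machine type never resolves to the quirked state implicitly, the
destination of a migration can derive the same policy from the machine type and
IOAPIC version alone.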

Thanks,
Khushit.
Re: [RFC PATCH v2 0/1] target/i386/kvm: Configure proper KVM SEOIB behavior
Posted by Mohamed Mediouni 1 month ago

> On 9. Mar 2026, at 06:42, Khushit Shah <khushit.shah@nutanix.com> wrote:
> 
> Preferred direction
> -------------------
> 
> Given the above, I am leaning toward a KVM accelerator property (for
> example, `seoib-policy=respected|not-advertised|quirked`) with the default
> being quirked.
> 
> This approach:
> 
>  - The user explicitly opts in. But requires user to understand/configure
>    the property.
>  - Works with any machine type, no machine-type gating needed.
>  - No need to fetch the IOAPIC version from globals.
> 
> 
Hello,

I don’t think that makes much sense. Tying it to a machine model version
sounds like it is a much better approach.

It would also allow a saner way to deal with migration, as you’d
never have the quirked state for the new machine model version.

Thank you,
-Mohamed
Re: [RFC PATCH v2 0/1] target/i386/kvm: Configure proper KVM SEOIB behavior
Posted by Khushit Shah 1 month ago

> On 9 Mar 2026, at 11:12 AM, Khushit Shah <khushit.shah@nutanix.com> wrote:
> 

Missed maintainers in the initial mail, fixed.