When using split irqchip mode, the IOAPIC is handled by QEMU while the LAPIC
is emulated by KVM. When the guest disables LINT0, KVM doesn't exit to QEMU
for synchronization, leaving the IOAPIC unaware of this change. This may
cause the vCPU to be kicked when external devices (e.g. the PIT) keep
sending interrupts.

This patch ensures that KVM exits to QEMU for synchronization when the guest
disables LINT0.
Signed-off-by: Yuguo Li <hugoolli@tencent.com>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/lapic.c            | 4 ++++
 arch/x86/kvm/x86.c              | 5 +++++
 include/uapi/linux/kvm.h        | 1 +
 4 files changed, 11 insertions(+)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f19a76d3ca0e..f69ce111bbe0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -129,6 +129,7 @@
 	KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE \
 	KVM_ARCH_REQ_FLAGS(34, KVM_REQUEST_WAIT)
+#define KVM_REQ_LAPIC_UPDATE		KVM_ARCH_REQ(35)
 
 #define CR0_RESERVED_BITS \
 	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 8172c2042dd6..65ffa89bf8a6 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2329,6 +2329,10 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 			val |= APIC_LVT_MASKED;
 		val &= apic_lvt_mask[index];
 		kvm_lapic_set_reg(apic, reg, val);
+		if (irqchip_split(apic->vcpu->kvm) && (val & APIC_LVT_MASKED)) {
+			kvm_make_request(KVM_REQ_LAPIC_UPDATE, apic->vcpu);
+			kvm_vcpu_kick(apic->vcpu);
+		}
 		break;
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a1c49bc681c4..0d6d29488ee9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10779,6 +10779,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			r = 0;
 			goto out;
 		}
+		if (kvm_check_request(KVM_REQ_LAPIC_UPDATE, vcpu)) {
+			vcpu->run->exit_reason = KVM_EXIT_APIC_SYNC;
+			r = 0;
+			goto out;
+		}
 
 		/*
 		 * KVM_REQ_HV_STIMER has to be processed after
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f0f0d49d2544..3425076d5c7b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -179,6 +179,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_LOONGARCH_IOCSR  38
 #define KVM_EXIT_MEMORY_FAULT     39
 #define KVM_EXIT_TDX              40
+#define KVM_EXIT_APIC_SYNC        41
 
/* For KVM_EXIT_INTERNAL_ERROR */
/* Emulate instruction failed. */
--
2.43.5
On Wed, Aug 06, 2025, Yuguo Li wrote:
> When using split irqchip mode, the IOAPIC is handled by QEMU while the
> LAPIC is emulated by KVM. When the guest disables LINT0, KVM doesn't exit
> to QEMU for synchronization, leaving the IOAPIC unaware of this change.
> This may cause the vCPU to be kicked when external devices (e.g. the PIT)
> keep sending interrupts.

I don't entirely follow what the problem is. Is the issue that QEMU injects
an IRQ that should have been blocked? Or is QEMU forcing the vCPU to exit
unnecessarily?

> This patch ensures that KVM exits to QEMU for synchronization when the
> guest disables LINT0.

Please wrap at ~75 characters.

> Signed-off-by: Yuguo Li <hugoolli@tencent.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 1 +
>  arch/x86/kvm/lapic.c            | 4 ++++
>  arch/x86/kvm/x86.c              | 5 +++++
>  include/uapi/linux/kvm.h        | 1 +
>  4 files changed, 11 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f19a76d3ca0e..f69ce111bbe0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -129,6 +129,7 @@
>  	KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>  #define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE \
>  	KVM_ARCH_REQ_FLAGS(34, KVM_REQUEST_WAIT)
> +#define KVM_REQ_LAPIC_UPDATE		KVM_ARCH_REQ(35)
>
>  #define CR0_RESERVED_BITS \
>  	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 8172c2042dd6..65ffa89bf8a6 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2329,6 +2329,10 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
>  			val |= APIC_LVT_MASKED;
>  		val &= apic_lvt_mask[index];
>  		kvm_lapic_set_reg(apic, reg, val);
> +		if (irqchip_split(apic->vcpu->kvm) && (val & APIC_LVT_MASKED)) {

This applies to much more than just LINT0, and for at least LVTPC and
LVTCMCI, KVM definitely doesn't want to exit on every change.

Even for LINT0, it's not obvious that "pushing" from KVM is a better option
than having QEMU "pull" as needed.

At the very least, this would need to be guarded by a capability, otherwise
the new userspace exit would confuse existing VMMs (and probably result in
the VM being terminated).

> +			kvm_make_request(KVM_REQ_LAPIC_UPDATE, apic->vcpu);
> +			kvm_vcpu_kick(apic->vcpu);

Why kick? Cross-vCPU writes to LINT0 shouldn't be a thing, i.e. the kick
should effectively be a nop.
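For illustration, the opt-in guard Sean asks for might look roughly like the
sketch below. This is a minimal sketch under stated assumptions, not the
actual patch or existing KVM API: KVM_CAP_APIC_LVT_SYNC and
kvm->arch.lapic_sync_enabled are hypothetical names (the capability would be
enabled via KVM_ENABLE_CAP and recorded in a per-VM flag), and only the
LVT-write predicate is shown.

    /*
     * Sketch only: lapic_sync_enabled is a hypothetical per-VM flag set by
     * a hypothetical KVM_CAP_APIC_LVT_SYNC, illustrating how the userspace
     * exit could be made opt-in and limited to LINT0/LINT1, per the review
     * feedback.
     */
    static bool lapic_sync_exit_wanted(struct kvm_lapic *apic, u32 reg, u32 val)
    {
    	/*
    	 * Only LINT0/LINT1 are relevant to the userspace irqchip; never
    	 * exit for e.g. LVTPC, which Intel CPUs mask on every PMI.
    	 */
    	if (reg != APIC_LVT0 && reg != APIC_LVT1)
    		return false;

    	return irqchip_split(apic->vcpu->kvm) &&
    	       apic->vcpu->kvm->arch.lapic_sync_enabled &&	/* hypothetical */
    	       (val & APIC_LVT_MASKED);
    }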
On Thu, Aug 7, 2025 Sean Christopherson wrote:
>
> On Wed, Aug 06, 2025, Yuguo Li wrote:
> > When using split irqchip mode, the IOAPIC is handled by QEMU while the
> > LAPIC is emulated by KVM. When the guest disables LINT0, KVM doesn't
> > exit to QEMU for synchronization, leaving the IOAPIC unaware of this
> > change. This may cause the vCPU to be kicked when external devices
> > (e.g. the PIT) keep sending interrupts.
>
> I don't entirely follow what the problem is. Is the issue that QEMU
> injects an IRQ that should have been blocked? Or is QEMU forcing the vCPU
> to exit unnecessarily?
>

The issue is that QEMU keeps injecting should-be-blocked IRQs (blocked by
the guest; QEMU just doesn't know that). As a result, QEMU forces the vCPU
to exit unnecessarily.

> > This patch ensures that KVM exits to QEMU for synchronization when the
> > guest disables LINT0.
>
> Please wrap at ~75 characters.

Thanks for the reminder, will do in the next version.

> > Signed-off-by: Yuguo Li <hugoolli@tencent.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h | 1 +
> >  arch/x86/kvm/lapic.c            | 4 ++++
> >  arch/x86/kvm/x86.c              | 5 +++++
> >  include/uapi/linux/kvm.h        | 1 +
> >  4 files changed, 11 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index f19a76d3ca0e..f69ce111bbe0 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -129,6 +129,7 @@
> >  	KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> >  #define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE \
> >  	KVM_ARCH_REQ_FLAGS(34, KVM_REQUEST_WAIT)
> > +#define KVM_REQ_LAPIC_UPDATE		KVM_ARCH_REQ(35)
> >
> >  #define CR0_RESERVED_BITS \
> >  	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 8172c2042dd6..65ffa89bf8a6 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -2329,6 +2329,10 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
> >  			val |= APIC_LVT_MASKED;
> >  		val &= apic_lvt_mask[index];
> >  		kvm_lapic_set_reg(apic, reg, val);
> > +		if (irqchip_split(apic->vcpu->kvm) && (val & APIC_LVT_MASKED)) {
>
> This applies to much more than just LINT0, and for at least LVTPC and
> LVTCMCI, KVM definitely doesn't want to exit on every change.

Actually, every masking on the LAPIC should be synchronized with the IOAPIC.
Any desynchronization may cause unnecessary kicks, which rarely happens
thanks to well-behaving guests. Exits here won't hurt, but maybe only exit
when LINT0 is being masked? The others are unlikely to cause exits.

>
> Even for LINT0, it's not obvious that "pushing" from KVM is a better option
> than having QEMU "pull" as needed.
>

QEMU has no idea when LINT0 is masked by the guest, so the problem becomes
knowing when it is needed to "pull". Guessing at that could incur extra
costs.

> At the very least, this would need to be guarded by a capability, otherwise
> the new userspace exit would confuse existing VMMs (and probably result in
> the VM being terminated).

True, I'll add this protection.

> > +			kvm_make_request(KVM_REQ_LAPIC_UPDATE, apic->vcpu);
> > +			kvm_vcpu_kick(apic->vcpu);
>
> Why kick? Cross-vCPU writes to LINT0 shouldn't be a thing, i.e. the kick
> should effectively be a nop.

It is unnecessary; I will fix it in the next version.
On Thu, Aug 07, 2025, hugo lee wrote:
> On Thu, Aug 7, 2025 Sean Christopherson wrote:
> >
> > On Wed, Aug 06, 2025, Yuguo Li wrote:
> > > When using split irqchip mode, the IOAPIC is handled by QEMU while the
> > > LAPIC is emulated by KVM. When the guest disables LINT0, KVM doesn't
> > > exit to QEMU for synchronization, leaving the IOAPIC unaware of this
> > > change. This may cause the vCPU to be kicked when external devices
> > > (e.g. the PIT) keep sending interrupts.
> >
> > I don't entirely follow what the problem is. Is the issue that QEMU
> > injects an IRQ that should have been blocked? Or is QEMU forcing the
> > vCPU to exit unnecessarily?
> >
>
> The issue is that QEMU keeps injecting should-be-blocked IRQs (blocked by
> the guest; QEMU just doesn't know that). As a result, QEMU forces the vCPU
> to exit unnecessarily.

Is the problem that the guest receives spurious IRQs, or that QEMU is
forcing unnecessary exits, i.e. hurting performance?

> > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > index 8172c2042dd6..65ffa89bf8a6 100644
> > > --- a/arch/x86/kvm/lapic.c
> > > +++ b/arch/x86/kvm/lapic.c
> > > @@ -2329,6 +2329,10 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
> > >  			val |= APIC_LVT_MASKED;
> > >  		val &= apic_lvt_mask[index];
> > >  		kvm_lapic_set_reg(apic, reg, val);
> > > +		if (irqchip_split(apic->vcpu->kvm) && (val & APIC_LVT_MASKED)) {
> >
> > This applies to much more than just LINT0, and for at least LVTPC and
> > LVTCMCI, KVM definitely doesn't want to exit on every change.
>
> Actually, every masking on the LAPIC should be synchronized with the IOAPIC.

No, because not all LVT entries can be wired up to the I/O APIC.

> Any desynchronization may cause unnecessary kicks, which rarely happens
> thanks to well-behaving guests. Exits here won't hurt, but maybe only exit
> when LINT0 is being masked?

Exits here absolutely will harm the VM by generating spurious slow path
exits.

> The others are unlikely to cause exits.

On Intel, LVTPC is masked on every PMI.

> > Even for LINT0, it's not obvious that "pushing" from KVM is a better
> > option than having QEMU "pull" as needed.
> >
>
> QEMU has no idea when LINT0 is masked by the guest, so the problem becomes
> knowing when it is needed to "pull". Guessing at that could incur extra
> costs.

So this patch is motivated by performance?
On Fri, Aug 8, 2025, Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Aug 07, 2025, hugo lee wrote:
> > On Thu, Aug 7, 2025 Sean Christopherson wrote:
> > >
> > > On Wed, Aug 06, 2025, Yuguo Li wrote:
> > > > When using split irqchip mode, the IOAPIC is handled by QEMU while
> > > > the LAPIC is emulated by KVM. When the guest disables LINT0, KVM
> > > > doesn't exit to QEMU for synchronization, leaving the IOAPIC unaware
> > > > of this change. This may cause the vCPU to be kicked when external
> > > > devices (e.g. the PIT) keep sending interrupts.
> > >
> > > I don't entirely follow what the problem is. Is the issue that QEMU
> > > injects an IRQ that should have been blocked? Or is QEMU forcing the
> > > vCPU to exit unnecessarily?
> > >
> >
> > The issue is that QEMU keeps injecting should-be-blocked IRQs (blocked
> > by the guest; QEMU just doesn't know that). As a result, QEMU forces
> > the vCPU to exit unnecessarily.
>
> Is the problem that the guest receives spurious IRQs, or that QEMU is
> forcing unnecessary exits, i.e. hurting performance?
>

It is QEMU forcing unnecessary exits, which hurts performance by trying to
acquire the Big QEMU Lock in qemu_wait_io_event.

> > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > index 8172c2042dd6..65ffa89bf8a6 100644
> > > > --- a/arch/x86/kvm/lapic.c
> > > > +++ b/arch/x86/kvm/lapic.c
> > > > @@ -2329,6 +2329,10 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
> > > >  			val |= APIC_LVT_MASKED;
> > > >  		val &= apic_lvt_mask[index];
> > > >  		kvm_lapic_set_reg(apic, reg, val);
> > > > +		if (irqchip_split(apic->vcpu->kvm) && (val & APIC_LVT_MASKED)) {
> > >
> > > This applies to much more than just LINT0, and for at least LVTPC and
> > > LVTCMCI, KVM definitely doesn't want to exit on every change.
> >
> > Actually, every masking on the LAPIC should be synchronized with the
> > IOAPIC.
>
> No, because not all LVT entries can be wired up to the I/O APIC.
>
> > Any desynchronization may cause unnecessary kicks, which rarely happens
> > thanks to well-behaving guests. Exits here won't hurt, but maybe only
> > exit when LINT0 is being masked?
>
> Exits here absolutely will harm the VM by generating spurious slow path
> exits.
>
> > The others are unlikely to cause exits.
>
> On Intel, LVTPC is masked on every PMI.
>

So I will make it only exit when LINT0/1 is being masked.

> > > Even for LINT0, it's not obvious that "pushing" from KVM is a better
> > > option than having QEMU "pull" as needed.
> > >
> >
> > QEMU has no idea when LINT0 is masked by the guest, so the problem
> > becomes knowing when it is needed to "pull". Guessing at that could
> > incur extra costs.
>
> So this patch is motivated by performance?

Yes.
On Fri, Aug 08, 2025, hugo lee wrote:
> On Fri, Aug 8, 2025, Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, Aug 07, 2025, hugo lee wrote:
> > > On Thu, Aug 7, 2025 Sean Christopherson wrote:
> > > >
> > > > On Wed, Aug 06, 2025, Yuguo Li wrote:
> > > > > When using split irqchip mode, the IOAPIC is handled by QEMU while
> > > > > the LAPIC is emulated by KVM. When the guest disables LINT0, KVM
> > > > > doesn't exit to QEMU for synchronization, leaving the IOAPIC
> > > > > unaware of this change. This may cause the vCPU to be kicked when
> > > > > external devices (e.g. the PIT) keep sending interrupts.
> > > >
> > > > I don't entirely follow what the problem is. Is the issue that QEMU
> > > > injects an IRQ that should have been blocked? Or is QEMU forcing the
> > > > vCPU to exit unnecessarily?
> > > >
> > >
> > > The issue is that QEMU keeps injecting should-be-blocked IRQs (blocked
> > > by the guest; QEMU just doesn't know that). As a result, QEMU forces
> > > the vCPU to exit unnecessarily.
> >
> > Is the problem that the guest receives spurious IRQs, or that QEMU is
> > forcing unnecessary exits, i.e. hurting performance?
> >
>
> It is QEMU forcing unnecessary exits, which hurts performance by trying to
> acquire the Big QEMU Lock in qemu_wait_io_event.

Please elaborate on the performance impact and why the issue can't be solved
in QEMU.
On Tue, Aug 12, 2025, Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Aug 08, 2025, hugo lee wrote:
> > On Fri, Aug 8, 2025, Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Thu, Aug 07, 2025, hugo lee wrote:
> > > > On Thu, Aug 7, 2025 Sean Christopherson wrote:
> > > > >
> > > > > On Wed, Aug 06, 2025, Yuguo Li wrote:
> > > > > > When using split irqchip mode, the IOAPIC is handled by QEMU
> > > > > > while the LAPIC is emulated by KVM. When the guest disables
> > > > > > LINT0, KVM doesn't exit to QEMU for synchronization, leaving the
> > > > > > IOAPIC unaware of this change. This may cause the vCPU to be
> > > > > > kicked when external devices (e.g. the PIT) keep sending
> > > > > > interrupts.
> > > > >
> > > > > I don't entirely follow what the problem is. Is the issue that
> > > > > QEMU injects an IRQ that should have been blocked? Or is QEMU
> > > > > forcing the vCPU to exit unnecessarily?
> > > > >
> > > >
> > > > The issue is that QEMU keeps injecting should-be-blocked IRQs
> > > > (blocked by the guest; QEMU just doesn't know that). As a result,
> > > > QEMU forces the vCPU to exit unnecessarily.
> > >
> > > Is the problem that the guest receives spurious IRQs, or that QEMU is
> > > forcing unnecessary exits, i.e. hurting performance?
> > >
> >
> > It is QEMU forcing unnecessary exits, which hurts performance by trying
> > to acquire the Big QEMU Lock in qemu_wait_io_event.
>
> Please elaborate on the performance impact and why the issue can't be
> solved in QEMU.

Some guests using legacy BIOS images may disable the PIT after booting. When
irqchip=split is on, QEMU will keep kicking the guest and trying to take the
Big QEMU Lock.

This could be solved in QEMU by guessing when to synchronize, since QEMU
doesn't know what's happening in the in-kernel LAPIC. It could synchronize
on every qemu_cpu_kick, but that would also cause unnecessary syncs and hurt
performance. So I think it is more reasonable for the writer to trigger
synchronization than for QEMU to guess.
On Tue, 2025-08-12 at 18:08 +0800, hugo lee wrote:
>
> Some guests using legacy BIOS images may disable the PIT after booting.

Do you mean they may *not* disable the PIT after booting? Linux had that
problem for a long time, until I fixed it with
https://git.kernel.org/torvalds/c/70e6b7d9ae3

> When irqchip=split is on, QEMU will keep kicking the guest and trying to
> take the Big QEMU Lock.

If it's the PIT, surely QEMU will keep stealing time pointlessly unless we
actually disable the PIT itself? Not just the IRQ delivery? Or do you use
this to realise that the IRQ output from the PIT isn't going anywhere and
thus disable the event in QEMU completely?
On Tue, Aug 12, David Woodhouse <dwmw2@infradead.org> wrote:
>
> On Tue, 2025-08-12 at 18:08 +0800, hugo lee wrote:
> >
> > Some guests using legacy BIOS images may disable the PIT after booting.
>
> Do you mean they may *not* disable the PIT after booting? Linux had that
> problem for a long time, until I fixed it with
> https://git.kernel.org/torvalds/c/70e6b7d9ae3
>

True, they disabled LINT0 and left the PIT unaware.

> > When irqchip=split is on, QEMU will keep kicking the guest and trying to
> > take the Big QEMU Lock.
>
> If it's the PIT, surely QEMU will keep stealing time pointlessly unless we
> actually disable the PIT itself? Not just the IRQ delivery? Or do you use
> this to realise that the IRQ output from the PIT isn't going anywhere and
> thus disable the event in QEMU completely?
>

I'm using this to disable the PIT event in QEMU.

I'm aiming to solve the desynchronization caused by irqchip=split, so the VM
will behave more like a physical machine. And this synchronization could
eliminate most of the performance loss here.

The meaningless PIT, which causes the pointless time cost, is the guest's
problem, and I don't think we should disable it without clear instructions.
On Tue, 2025-08-12 at 19:50 +0800, hugo lee wrote:
> On Tue, Aug 12, David Woodhouse <dwmw2@infradead.org> wrote:
> >
> > On Tue, 2025-08-12 at 18:08 +0800, hugo lee wrote:
> > >
> > > Some guests using legacy BIOS images may disable the PIT after booting.
> >
> > Do you mean they may *not* disable the PIT after booting? Linux had that
> > problem for a long time, until I fixed it with
> > https://git.kernel.org/torvalds/c/70e6b7d9ae3
> >
>
> True, they disabled LINT0 and left the PIT unaware.
>
> > > When irqchip=split is on, QEMU will keep kicking the guest and trying
> > > to take the Big QEMU Lock.
> >
> > If it's the PIT, surely QEMU will keep stealing time pointlessly unless
> > we actually disable the PIT itself? Not just the IRQ delivery? Or do you
> > use this to realise that the IRQ output from the PIT isn't going
> > anywhere and thus disable the event in QEMU completely?
> >
>
> I'm using this to disable the PIT event in QEMU.
>
> I'm aiming to solve the desynchronization caused by irqchip=split, so the
> VM will behave more like a physical machine.

I suspect I'm going to hate your QEMU patch when I see it.

KVM has a callback when the IRQ is acked, which it uses to retrigger the
next interrupt in reinject mode.

Even in !reinject mode, the kvm_pit_ack_irq() callback could just as easily
be used to allow the hrtimer to stop completely until the interrupt gets
acked. Which I understand is basically what you want to do in QEMU?

There shouldn't be any reason to special-case it on the LINT0 setup; if the
interrupt just remains pending in the PIC and is never serviced, that should
*also* mean we stop wasting steal time on it, right?

So ideally, QEMU would have the same infrastructure to 'resample' an IRQ
when it gets acked. And then it would know when the guest is ignoring the
PIT and it needn't bother to generate any more interrupts.

Except QEMU's interrupt controllers don't yet support that. So for VFIO INTx
interrupts, for example, QEMU unmaps the MMIO BARs of the device while an
interrupt is outstanding, then sends an event to the kernel's resample irqfd
when the guest touches a register therein!

I'd love to see you fix this in QEMU by hooking up that 'resample' signal
when the interrupt is acked in the interrupt controller, and then wouldn't
the kernel side of this and the special case for LINT0 be unneeded?
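For the record, the ack-gated pattern David describes (the one used by the
in-kernel PIT's kvm_pit_ack_irq()) would translate to QEMU's PIT model
roughly as below. This is a conceptual sketch only: QEMU's interrupt
controllers do not currently provide an ack notifier, so pit_irq_acked(),
pit_rearm_timer() and the irq_pending field are all hypothetical, and the
field names on PITChannelState are illustrative.

    /* Conceptual sketch; the ack hook and helpers are hypothetical. */
    static void pit_timer_fired(void *opaque)
    {
        PITChannelState *s = opaque;

        qemu_irq_raise(s->irq);     /* waggle the PIT output pin */
        s->irq_pending = true;      /* hypothetical field */
        /*
         * Deliberately do NOT rearm the timer here: if the guest never
         * services the interrupt, the timer stays quiet and QEMU stops
         * burning steal time on it.
         */
    }

    /* Hypothetical callback, invoked when the PIC/APIC acks the IRQ. */
    static void pit_irq_acked(void *opaque)
    {
        PITChannelState *s = opaque;

        s->irq_pending = false;
        pit_rearm_timer(s);         /* hypothetical: schedule the next tick */
    }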
On Tue, Aug 12, David Woodhouse <dwmw2@infradead.org> wrote:
>
> On Tue, 2025-08-12 at 19:50 +0800, hugo lee wrote:
> > On Tue, Aug 12, David Woodhouse <dwmw2@infradead.org> wrote:
> > >
> > > On Tue, 2025-08-12 at 18:08 +0800, hugo lee wrote:
> > > >
> > > > Some guests using legacy BIOS images may disable the PIT after
> > > > booting.
> > >
> > > Do you mean they may *not* disable the PIT after booting? Linux had
> > > that problem for a long time, until I fixed it with
> > > https://git.kernel.org/torvalds/c/70e6b7d9ae3
> > >
> >
> > True, they disabled LINT0 and left the PIT unaware.
> >
> > > > When irqchip=split is on, QEMU will keep kicking the guest and
> > > > trying to take the Big QEMU Lock.
> > >
> > > If it's the PIT, surely QEMU will keep stealing time pointlessly
> > > unless we actually disable the PIT itself? Not just the IRQ delivery?
> > > Or do you use this to realise that the IRQ output from the PIT isn't
> > > going anywhere and thus disable the event in QEMU completely?
> > >
> >
> > I'm using this to disable the PIT event in QEMU.
> >
> > I'm aiming to solve the desynchronization caused by irqchip=split, so
> > the VM will behave more like a physical machine.
>
> I suspect I'm going to hate your QEMU patch when I see it.
>
> KVM has a callback when the IRQ is acked, which it uses to retrigger the
> next interrupt in reinject mode.
>
> Even in !reinject mode, the kvm_pit_ack_irq() callback could just as
> easily be used to allow the hrtimer to stop completely until the interrupt
> gets acked. Which I understand is basically what you want to do in QEMU?
>
> There shouldn't be any reason to special-case it on the LINT0 setup; if
> the interrupt just remains pending in the PIC and is never serviced, that
> should *also* mean we stop wasting steal time on it, right?
>
> So ideally, QEMU would have the same infrastructure to 'resample' an IRQ
> when it gets acked. And then it would know when the guest is ignoring the
> PIT and it needn't bother to generate any more interrupts.
>
> Except QEMU's interrupt controllers don't yet support that. So for VFIO
> INTx interrupts, for example, QEMU unmaps the MMIO BARs of the device
> while an interrupt is outstanding, then sends an event to the kernel's
> resample irqfd when the guest touches a register therein!
>
> I'd love to see you fix this in QEMU by hooking up that 'resample' signal
> when the interrupt is acked in the interrupt controller, and then wouldn't
> the kernel side of this and the special case for LINT0 be unneeded?
>

Sorry for the confusion; what I meant is that QEMU would do only
cpu_synchronize_state() on this new userspace exit reason and do nothing to
the PIT. So QEMU will ignore the PIT just as the guest does.

The resample is great and needed, but the synchronization makes more sense
to me for this problem.
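Concretely, the QEMU-side handling described here would presumably amount to
a new case in the exit-reason switch of QEMU's run loop, roughly as sketched
below. KVM_EXIT_APIC_SYNC is the exit reason proposed by this patch; the
helper name is made up for illustration, and cpu_synchronize_state() is
QEMU's existing API for pulling vCPU state from the kernel.

    /* Sketch of the QEMU side, assuming the proposed KVM_EXIT_APIC_SYNC. */
    static int kvm_handle_apic_sync(CPUState *cpu)
    {
        /*
         * Refresh QEMU's copy of the vCPU registers, including the LAPIC
         * state, so the userspace IOAPIC/PIT models can observe that
         * LINT0 is now masked.
         */
        cpu_synchronize_state(cpu);
        return 0;
    }

    /* ...dispatched from the exit-reason switch in kvm_cpu_exec():
     *     case KVM_EXIT_APIC_SYNC:
     *         ret = kvm_handle_apic_sync(cpu);
     *         break;
     */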
On Wed, 2025-08-13 at 17:30 +0800, hugo lee wrote:
>
> Sorry for the confusion; what I meant is that QEMU would do only
> cpu_synchronize_state() on this new userspace exit reason and do nothing
> to the PIT. So QEMU will ignore the PIT just as the guest does.
>
> The resample is great and needed, but the synchronization makes more sense
> to me for this problem.

So if the guest doesn't actually quiesce the PIT, QEMU will *still* keep
waking up to waggle the PIT output pin, it's just that QEMU won't bother
telling the kernel about it?
On Wed, Aug 13, 2025 David Woodhouse <dwmw2@infradead.org> wrote:
>
> On Wed, 2025-08-13 at 17:30 +0800, hugo lee wrote:
> >
> > Sorry for the confusion; what I meant is that QEMU would do only
> > cpu_synchronize_state() on this new userspace exit reason and do nothing
> > to the PIT. So QEMU will ignore the PIT just as the guest does.
> >
> > The resample is great and needed, but the synchronization makes more
> > sense to me for this problem.
>
> So if the guest doesn't actually quiesce the PIT, QEMU will *still* keep
> waking up to waggle the PIT output pin, it's just that QEMU won't bother
> telling the kernel about it?

Yes, just as the guest wishes. This would eliminate most of the performance
loss.

But I guess the resample approach is more acceptable.
On Thu, 2025-08-14 at 16:54 +0800, hugo lee wrote:
> On Wed, Aug 13, 2025 David Woodhouse <dwmw2@infradead.org> wrote:
> >
> > On Wed, 2025-08-13 at 17:30 +0800, hugo lee wrote:
> > >
> > > Sorry for the confusion; what I meant is that QEMU would do only
> > > cpu_synchronize_state() on this new userspace exit reason and do
> > > nothing to the PIT. So QEMU will ignore the PIT just as the guest
> > > does.
> > >
> > > The resample is great and needed, but the synchronization makes more
> > > sense to me for this problem.
> >
> > So if the guest doesn't actually quiesce the PIT, QEMU will *still* keep
> > waking up to waggle the PIT output pin, it's just that QEMU won't bother
> > telling the kernel about it?
>
> Yes, just as the guest wishes. This would eliminate most of the
> performance loss.
>
> But I guess the resample approach is more acceptable.

Simpler, cleaner, and it solves the problem for more use cases, including
when the interrupt *is* delivered to the PIC but just never serviced. And it
allows us to fix the VFIO INTx abomination and use the kernel's irqfd API
properly... But that's a discussion for the qemu-devel list, I suppose.
On Mon, 2025-08-11 at 09:32 -0700, Sean Christopherson wrote:
> On Fri, Aug 08, 2025, hugo lee wrote:
> > On Fri, Aug 8, 2025, Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Thu, Aug 07, 2025, hugo lee wrote:
> > > > On Thu, Aug 7, 2025 Sean Christopherson wrote:
> > > > >
> > > > > On Wed, Aug 06, 2025, Yuguo Li wrote:
> > > > > > When using split irqchip mode, the IOAPIC is handled by QEMU
> > > > > > while the LAPIC is emulated by KVM. When the guest disables
> > > > > > LINT0, KVM doesn't exit to QEMU for synchronization, leaving the
> > > > > > IOAPIC unaware of this change. This may cause the vCPU to be
> > > > > > kicked when external devices (e.g. the PIT) keep sending
> > > > > > interrupts.
> > > > >
> > > > > I don't entirely follow what the problem is. Is the issue that
> > > > > QEMU injects an IRQ that should have been blocked? Or is QEMU
> > > > > forcing the vCPU to exit unnecessarily?
> > > > >
> > > >
> > > > The issue is that QEMU keeps injecting should-be-blocked IRQs
> > > > (blocked by the guest; QEMU just doesn't know that). As a result,
> > > > QEMU forces the vCPU to exit unnecessarily.
> > >
> > > Is the problem that the guest receives spurious IRQs, or that QEMU is
> > > forcing unnecessary exits, i.e. hurting performance?
> > >
> >
> > It is QEMU forcing unnecessary exits, which hurts performance by trying
> > to acquire the Big QEMU Lock in qemu_wait_io_event.
>
> Please elaborate on the performance impact and why the issue can't be
> solved in QEMU.

Is there a corresponding QEMU patch to use this new exit reason?
On Tue, Aug 12, 2025 at 5:39 PM David Woodhouse <dwmw2@infradead.org> wrote:
>
> On Mon, 2025-08-11 at 09:32 -0700, Sean Christopherson wrote:
> > On Fri, Aug 08, 2025, hugo lee wrote:
> > > On Fri, Aug 8, 2025, Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > On Thu, Aug 07, 2025, hugo lee wrote:
> > > > > On Thu, Aug 7, 2025 Sean Christopherson wrote:
> > > > > >
> > > > > > On Wed, Aug 06, 2025, Yuguo Li wrote:
> > > > > > > When using split irqchip mode, the IOAPIC is handled by QEMU
> > > > > > > while the LAPIC is emulated by KVM. When the guest disables
> > > > > > > LINT0, KVM doesn't exit to QEMU for synchronization, leaving
> > > > > > > the IOAPIC unaware of this change. This may cause the vCPU to
> > > > > > > be kicked when external devices (e.g. the PIT) keep sending
> > > > > > > interrupts.
> > > > > >
> > > > > > I don't entirely follow what the problem is. Is the issue that
> > > > > > QEMU injects an IRQ that should have been blocked? Or is QEMU
> > > > > > forcing the vCPU to exit unnecessarily?
> > > > > >
> > > > >
> > > > > The issue is that QEMU keeps injecting should-be-blocked IRQs
> > > > > (blocked by the guest; QEMU just doesn't know that). As a result,
> > > > > QEMU forces the vCPU to exit unnecessarily.
> > > >
> > > > Is the problem that the guest receives spurious IRQs, or that QEMU
> > > > is forcing unnecessary exits, i.e. hurting performance?
> > > >
> > >
> > > It is QEMU forcing unnecessary exits, which hurts performance by
> > > trying to acquire the Big QEMU Lock in qemu_wait_io_event.
> >
> > Please elaborate on the performance impact and why the issue can't be
> > solved in QEMU.
>
> Is there a corresponding QEMU patch to use this new exit reason?

No, but the patch is done and will be submitted soon.