When using split irqchip mode, the IOAPIC is handled by QEMU while the LAPIC
is emulated by KVM. When the guest disables LINT0, KVM doesn't exit to QEMU
for synchronization, leaving the IOAPIC unaware of this change. This may
cause the vCPU to be kicked when external devices (e.g. the PIT) keep
sending interrupts.

This patch ensures that KVM exits to QEMU for synchronization when the guest
disables LINT0.
Signed-off-by: Yuguo Li <hugoolli@tencent.com>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/lapic.c            | 4 ++++
 arch/x86/kvm/x86.c              | 5 +++++
 include/uapi/linux/kvm.h        | 1 +
 4 files changed, 11 insertions(+)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f19a76d3ca0e..f69ce111bbe0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -129,6 +129,7 @@
 	KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE \
 	KVM_ARCH_REQ_FLAGS(34, KVM_REQUEST_WAIT)
+#define KVM_REQ_LAPIC_UPDATE		KVM_ARCH_REQ(35)
 
 #define CR0_RESERVED_BITS \
 	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 8172c2042dd6..65ffa89bf8a6 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2329,6 +2329,10 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 			val |= APIC_LVT_MASKED;
 		val &= apic_lvt_mask[index];
 		kvm_lapic_set_reg(apic, reg, val);
+		if (irqchip_split(apic->vcpu->kvm) && (val & APIC_LVT_MASKED)) {
+			kvm_make_request(KVM_REQ_LAPIC_UPDATE, apic->vcpu);
+			kvm_vcpu_kick(apic->vcpu);
+		}
 		break;
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a1c49bc681c4..0d6d29488ee9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10779,6 +10779,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			r = 0;
 			goto out;
 		}
+		if (kvm_check_request(KVM_REQ_LAPIC_UPDATE, vcpu)) {
+			vcpu->run->exit_reason = KVM_EXIT_APIC_SYNC;
+			r = 0;
+			goto out;
+		}
 
 		/*
 		 * KVM_REQ_HV_STIMER has to be processed after
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f0f0d49d2544..3425076d5c7b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -179,6 +179,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_LOONGARCH_IOCSR  38
 #define KVM_EXIT_MEMORY_FAULT     39
 #define KVM_EXIT_TDX              40
+#define KVM_EXIT_APIC_SYNC        41
 
/* For KVM_EXIT_INTERNAL_ERROR */
/* Emulate instruction failed. */
--
2.43.5
On Wed, Aug 06, 2025, Yuguo Li wrote:
> When using split irqchip mode, the IOAPIC is handled by QEMU while the
> LAPIC is emulated by KVM. When the guest disables LINT0, KVM doesn't exit
> to QEMU for synchronization, leaving the IOAPIC unaware of this change.
> This may cause the vCPU to be kicked when external devices (e.g. the PIT)
> keep sending interrupts.

I don't entirely follow what the problem is. Is the issue that QEMU injects
an IRQ that should have been blocked? Or is QEMU forcing the vCPU to exit
unnecessarily?

> This patch ensures that KVM exits to QEMU for synchronization when the
> guest disables LINT0.

Please wrap at ~75 characters.

> Signed-off-by: Yuguo Li <hugoolli@tencent.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 1 +
>  arch/x86/kvm/lapic.c            | 4 ++++
>  arch/x86/kvm/x86.c              | 5 +++++
>  include/uapi/linux/kvm.h        | 1 +
>  4 files changed, 11 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f19a76d3ca0e..f69ce111bbe0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -129,6 +129,7 @@
>  	KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>  #define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE \
>  	KVM_ARCH_REQ_FLAGS(34, KVM_REQUEST_WAIT)
> +#define KVM_REQ_LAPIC_UPDATE		KVM_ARCH_REQ(35)
>
>  #define CR0_RESERVED_BITS \
>  	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 8172c2042dd6..65ffa89bf8a6 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2329,6 +2329,10 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
>  			val |= APIC_LVT_MASKED;
>  		val &= apic_lvt_mask[index];
>  		kvm_lapic_set_reg(apic, reg, val);
> +		if (irqchip_split(apic->vcpu->kvm) && (val & APIC_LVT_MASKED)) {

This applies to much more than just LINT0, and for at least LVTPC and
LVTCMCI, KVM definitely doesn't want to exit on every change.

Even for LINT0, it's not obvious that "pushing" from KVM is a better option
than having QEMU "pull" as needed.

At the very least, this would need to be guarded by a capability, otherwise
the new userspace exit would confuse existing VMMs (and probably result in
the VM being terminated).

> +			kvm_make_request(KVM_REQ_LAPIC_UPDATE, apic->vcpu);
> +			kvm_vcpu_kick(apic->vcpu);

Why kick? Cross-vCPU writes to LINT0 shouldn't be a thing, i.e. the kick
should effectively be a nop.
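For illustration, the opt-in guard Sean asks for might look roughly like the
sketch below. This is a minimal sketch under stated assumptions, not the
actual patch or existing KVM API: KVM_CAP_APIC_LVT_SYNC and
kvm->arch.lapic_sync_enabled are hypothetical names (the capability would be
enabled via KVM_ENABLE_CAP and recorded in a per-VM flag), and only the
LVT-write predicate is shown.

    /*
     * Sketch only: lapic_sync_enabled is a hypothetical per-VM flag set by
     * a hypothetical KVM_CAP_APIC_LVT_SYNC, illustrating how the userspace
     * exit could be made opt-in and limited to LINT0/LINT1, per the review
     * feedback.
     */
    static bool lapic_sync_exit_wanted(struct kvm_lapic *apic, u32 reg, u32 val)
    {
    	/*
    	 * Only LINT0/LINT1 are relevant to the userspace irqchip; never
    	 * exit for e.g. LVTPC, which Intel CPUs mask on every PMI.
    	 */
    	if (reg != APIC_LVT0 && reg != APIC_LVT1)
    		return false;

    	return irqchip_split(apic->vcpu->kvm) &&
    	       apic->vcpu->kvm->arch.lapic_sync_enabled &&	/* hypothetical */
    	       (val & APIC_LVT_MASKED);
    }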
On Thu, Aug 7, 2025 Sean Christopherson wrote:
>
> On Wed, Aug 06, 2025, Yuguo Li wrote:
> > When using split irqchip mode, the IOAPIC is handled by QEMU while the
> > LAPIC is emulated by KVM. When the guest disables LINT0, KVM doesn't
> > exit to QEMU for synchronization, leaving the IOAPIC unaware of this
> > change. This may cause the vCPU to be kicked when external devices
> > (e.g. the PIT) keep sending interrupts.
>
> I don't entirely follow what the problem is. Is the issue that QEMU
> injects an IRQ that should have been blocked? Or is QEMU forcing the vCPU
> to exit unnecessarily?
>

The issue is that QEMU keeps injecting should-be-blocked IRQs (blocked by
the guest; QEMU just doesn't know that). As a result, QEMU forces the vCPU
to exit unnecessarily.

> > This patch ensures that KVM exits to QEMU for synchronization when the
> > guest disables LINT0.
>
> Please wrap at ~75 characters.

Thanks for the reminder, will do in the next version.

> > Signed-off-by: Yuguo Li <hugoolli@tencent.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h | 1 +
> >  arch/x86/kvm/lapic.c            | 4 ++++
> >  arch/x86/kvm/x86.c              | 5 +++++
> >  include/uapi/linux/kvm.h        | 1 +
> >  4 files changed, 11 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index f19a76d3ca0e..f69ce111bbe0 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -129,6 +129,7 @@
> >  	KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> >  #define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE \
> >  	KVM_ARCH_REQ_FLAGS(34, KVM_REQUEST_WAIT)
> > +#define KVM_REQ_LAPIC_UPDATE		KVM_ARCH_REQ(35)
> >
> >  #define CR0_RESERVED_BITS \
> >  	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 8172c2042dd6..65ffa89bf8a6 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -2329,6 +2329,10 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
> >  			val |= APIC_LVT_MASKED;
> >  		val &= apic_lvt_mask[index];
> >  		kvm_lapic_set_reg(apic, reg, val);
> > +		if (irqchip_split(apic->vcpu->kvm) && (val & APIC_LVT_MASKED)) {
>
> This applies to much more than just LINT0, and for at least LVTPC and
> LVTCMCI, KVM definitely doesn't want to exit on every change.

Actually, every masking on the LAPIC should be synchronized with the IOAPIC.
Any desynchronization may cause unnecessary kicks, which rarely happens
thanks to well-behaving guests. Exits here won't hurt, but maybe only exit
when LINT0 is being masked? The others are unlikely to cause exits.

>
> Even for LINT0, it's not obvious that "pushing" from KVM is a better option
> than having QEMU "pull" as needed.
>

QEMU has no idea when LINT0 is masked by the guest, so the problem becomes
knowing when it is needed to "pull". Guessing at that could incur extra
costs.

> At the very least, this would need to be guarded by a capability, otherwise
> the new userspace exit would confuse existing VMMs (and probably result in
> the VM being terminated).

True, I'll add this protection.

> > +			kvm_make_request(KVM_REQ_LAPIC_UPDATE, apic->vcpu);
> > +			kvm_vcpu_kick(apic->vcpu);
>
> Why kick? Cross-vCPU writes to LINT0 shouldn't be a thing, i.e. the kick
> should effectively be a nop.

It is unnecessary; I will fix it in the next version.
On Thu, Aug 07, 2025, hugo lee wrote:
> On Thu, Aug 7, 2025 Sean Christopherson wrote:
> >
> > On Wed, Aug 06, 2025, Yuguo Li wrote:
> > > When using split irqchip mode, the IOAPIC is handled by QEMU while the
> > > LAPIC is emulated by KVM. When the guest disables LINT0, KVM doesn't
> > > exit to QEMU for synchronization, leaving the IOAPIC unaware of this
> > > change. This may cause the vCPU to be kicked when external devices
> > > (e.g. the PIT) keep sending interrupts.
> >
> > I don't entirely follow what the problem is. Is the issue that QEMU
> > injects an IRQ that should have been blocked? Or is QEMU forcing the
> > vCPU to exit unnecessarily?
> >
>
> The issue is that QEMU keeps injecting should-be-blocked IRQs (blocked by
> the guest; QEMU just doesn't know that). As a result, QEMU forces the vCPU
> to exit unnecessarily.

Is the problem that the guest receives spurious IRQs, or that QEMU is
forcing unnecessary exits, i.e. hurting performance?

> > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > index 8172c2042dd6..65ffa89bf8a6 100644
> > > --- a/arch/x86/kvm/lapic.c
> > > +++ b/arch/x86/kvm/lapic.c
> > > @@ -2329,6 +2329,10 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
> > >  			val |= APIC_LVT_MASKED;
> > >  		val &= apic_lvt_mask[index];
> > >  		kvm_lapic_set_reg(apic, reg, val);
> > > +		if (irqchip_split(apic->vcpu->kvm) && (val & APIC_LVT_MASKED)) {
> >
> > This applies to much more than just LINT0, and for at least LVTPC and
> > LVTCMCI, KVM definitely doesn't want to exit on every change.
>
> Actually, every masking on the LAPIC should be synchronized with the IOAPIC.

No, because not all LVT entries can be wired up to the I/O APIC.

> Any desynchronization may cause unnecessary kicks, which rarely happens
> thanks to well-behaving guests. Exits here won't hurt, but maybe only exit
> when LINT0 is being masked?

Exits here absolutely will harm the VM by generating spurious slow path
exits.

> The others are unlikely to cause exits.

On Intel, LVTPC is masked on every PMI.

> > Even for LINT0, it's not obvious that "pushing" from KVM is a better
> > option than having QEMU "pull" as needed.
> >
>
> QEMU has no idea when LINT0 is masked by the guest, so the problem becomes
> knowing when it is needed to "pull". Guessing at that could incur extra
> costs.

So this patch is motivated by performance?
On Fri, Aug 8, 2025, Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Aug 07, 2025, hugo lee wrote:
> > On Thu, Aug 7, 2025 Sean Christopherson wrote:
> > >
> > > On Wed, Aug 06, 2025, Yuguo Li wrote:
> > > > When using split irqchip mode, the IOAPIC is handled by QEMU while
> > > > the LAPIC is emulated by KVM. When the guest disables LINT0, KVM
> > > > doesn't exit to QEMU for synchronization, leaving the IOAPIC unaware
> > > > of this change. This may cause the vCPU to be kicked when external
> > > > devices (e.g. the PIT) keep sending interrupts.
> > >
> > > I don't entirely follow what the problem is. Is the issue that QEMU
> > > injects an IRQ that should have been blocked? Or is QEMU forcing the
> > > vCPU to exit unnecessarily?
> > >
> >
> > The issue is that QEMU keeps injecting should-be-blocked IRQs (blocked
> > by the guest; QEMU just doesn't know that). As a result, QEMU forces
> > the vCPU to exit unnecessarily.
>
> Is the problem that the guest receives spurious IRQs, or that QEMU is
> forcing unnecessary exits, i.e. hurting performance?
>

It is QEMU forcing unnecessary exits, which hurts performance by trying to
acquire the Big QEMU Lock in qemu_wait_io_event.

> > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > index 8172c2042dd6..65ffa89bf8a6 100644
> > > > --- a/arch/x86/kvm/lapic.c
> > > > +++ b/arch/x86/kvm/lapic.c
> > > > @@ -2329,6 +2329,10 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
> > > >  			val |= APIC_LVT_MASKED;
> > > >  		val &= apic_lvt_mask[index];
> > > >  		kvm_lapic_set_reg(apic, reg, val);
> > > > +		if (irqchip_split(apic->vcpu->kvm) && (val & APIC_LVT_MASKED)) {
> > >
> > > This applies to much more than just LINT0, and for at least LVTPC and
> > > LVTCMCI, KVM definitely doesn't want to exit on every change.
> >
> > Actually, every masking on the LAPIC should be synchronized with the
> > IOAPIC.
>
> No, because not all LVT entries can be wired up to the I/O APIC.
>
> > Any desynchronization may cause unnecessary kicks, which rarely happens
> > thanks to well-behaving guests. Exits here won't hurt, but maybe only
> > exit when LINT0 is being masked?
>
> Exits here absolutely will harm the VM by generating spurious slow path
> exits.
>
> > The others are unlikely to cause exits.
>
> On Intel, LVTPC is masked on every PMI.
>

So I will make it only exit when LINT0/1 is being masked.

> > > Even for LINT0, it's not obvious that "pushing" from KVM is a better
> > > option than having QEMU "pull" as needed.
> > >
> >
> > QEMU has no idea when LINT0 is masked by the guest, so the problem
> > becomes knowing when it is needed to "pull". Guessing at that could
> > incur extra costs.
>
> So this patch is motivated by performance?

Yes.
On Fri, Aug 08, 2025, hugo lee wrote:
> On Fri, Aug 8, 2025, Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, Aug 07, 2025, hugo lee wrote:
> > > On Thu, Aug 7, 2025 Sean Christopherson wrote:
> > > >
> > > > On Wed, Aug 06, 2025, Yuguo Li wrote:
> > > > > When using split irqchip mode, the IOAPIC is handled by QEMU while
> > > > > the LAPIC is emulated by KVM. When the guest disables LINT0, KVM
> > > > > doesn't exit to QEMU for synchronization, leaving the IOAPIC
> > > > > unaware of this change. This may cause the vCPU to be kicked when
> > > > > external devices (e.g. the PIT) keep sending interrupts.
> > > >
> > > > I don't entirely follow what the problem is. Is the issue that QEMU
> > > > injects an IRQ that should have been blocked? Or is QEMU forcing the
> > > > vCPU to exit unnecessarily?
> > > >
> > >
> > > The issue is that QEMU keeps injecting should-be-blocked IRQs (blocked
> > > by the guest; QEMU just doesn't know that). As a result, QEMU forces
> > > the vCPU to exit unnecessarily.
> >
> > Is the problem that the guest receives spurious IRQs, or that QEMU is
> > forcing unnecessary exits, i.e. hurting performance?
> >
>
> It is QEMU forcing unnecessary exits, which hurts performance by trying to
> acquire the Big QEMU Lock in qemu_wait_io_event.

Please elaborate on the performance impact and why the issue can't be solved
in QEMU.
On Tue, Aug 12, 2025, Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Aug 08, 2025, hugo lee wrote:
> > On Fri, Aug 8, 2025, Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Thu, Aug 07, 2025, hugo lee wrote:
> > > > On Thu, Aug 7, 2025 Sean Christopherson wrote:
> > > > >
> > > > > On Wed, Aug 06, 2025, Yuguo Li wrote:
> > > > > > When using split irqchip mode, the IOAPIC is handled by QEMU
> > > > > > while the LAPIC is emulated by KVM. When the guest disables
> > > > > > LINT0, KVM doesn't exit to QEMU for synchronization, leaving the
> > > > > > IOAPIC unaware of this change. This may cause the vCPU to be
> > > > > > kicked when external devices (e.g. the PIT) keep sending
> > > > > > interrupts.
> > > > >
> > > > > I don't entirely follow what the problem is. Is the issue that
> > > > > QEMU injects an IRQ that should have been blocked? Or is QEMU
> > > > > forcing the vCPU to exit unnecessarily?
> > > > >
> > > >
> > > > The issue is that QEMU keeps injecting should-be-blocked IRQs
> > > > (blocked by the guest; QEMU just doesn't know that). As a result,
> > > > QEMU forces the vCPU to exit unnecessarily.
> > >
> > > Is the problem that the guest receives spurious IRQs, or that QEMU is
> > > forcing unnecessary exits, i.e. hurting performance?
> > >
> >
> > It is QEMU forcing unnecessary exits, which hurts performance by trying
> > to acquire the Big QEMU Lock in qemu_wait_io_event.
>
> Please elaborate on the performance impact and why the issue can't be
> solved in QEMU.

Some guests using legacy BIOS images may disable the PIT after booting. When
irqchip=split is on, QEMU will keep kicking the guest and trying to take the
Big QEMU Lock.

This could be solved in QEMU by guessing when to synchronize, since QEMU
doesn't know what's happening in the in-kernel LAPIC. It could synchronize
on every qemu_cpu_kick, but that would also cause unnecessary syncs and hurt
performance. So I think it is more reasonable for the writer to trigger
synchronization than for QEMU to guess.
On Tue, 2025-08-12 at 18:08 +0800, hugo lee wrote:
>
> Some guests using legacy BIOS images may disable the PIT after booting.

Do you mean they may *not* disable the PIT after booting? Linux had that
problem for a long time, until I fixed it with
https://git.kernel.org/torvalds/c/70e6b7d9ae3

> When irqchip=split is on, QEMU will keep kicking the guest and trying to
> take the Big QEMU Lock.

If it's the PIT, surely QEMU will keep stealing time pointlessly unless we
actually disable the PIT itself? Not just the IRQ delivery? Or do you use
this to realise that the IRQ output from the PIT isn't going anywhere and
thus disable the event in QEMU completely?
On Tue, Aug 12, David Woodhouse <dwmw2@infradead.org> wrote:
>
> On Tue, 2025-08-12 at 18:08 +0800, hugo lee wrote:
> >
> > Some guests using legacy BIOS images may disable the PIT after booting.
>
> Do you mean they may *not* disable the PIT after booting? Linux had that
> problem for a long time, until I fixed it with
> https://git.kernel.org/torvalds/c/70e6b7d9ae3
>

True, they disabled LINT0 and left the PIT unaware.

> > When irqchip=split is on, QEMU will keep kicking the guest and trying to
> > take the Big QEMU Lock.
>
> If it's the PIT, surely QEMU will keep stealing time pointlessly unless we
> actually disable the PIT itself? Not just the IRQ delivery? Or do you use
> this to realise that the IRQ output from the PIT isn't going anywhere and
> thus disable the event in QEMU completely?
>

I'm using this to disable the PIT event in QEMU.

I'm aiming to solve the desynchronization caused by irqchip=split, so the VM
will behave more like a physical machine. And this synchronization could
eliminate most of the performance loss here.

The meaningless PIT, which causes the pointless time cost, is the guest's
problem, and I don't think we should disable it without clear instructions.
On Tue, 2025-08-12 at 19:50 +0800, hugo lee wrote:
> On Tue, Aug 12, David Woodhouse <dwmw2@infradead.org> wrote:
> >
> > On Tue, 2025-08-12 at 18:08 +0800, hugo lee wrote:
> > >
> > > Some guests using legacy BIOS images may disable the PIT after booting.
> >
> > Do you mean they may *not* disable the PIT after booting? Linux had that
> > problem for a long time, until I fixed it with
> > https://git.kernel.org/torvalds/c/70e6b7d9ae3
> >
>
> True, they disabled LINT0 and left the PIT unaware.
>
> > > When irqchip=split is on, QEMU will keep kicking the guest and trying
> > > to take the Big QEMU Lock.
> >
> > If it's the PIT, surely QEMU will keep stealing time pointlessly unless
> > we actually disable the PIT itself? Not just the IRQ delivery? Or do you
> > use this to realise that the IRQ output from the PIT isn't going
> > anywhere and thus disable the event in QEMU completely?
> >
>
> I'm using this to disable the PIT event in QEMU.
>
> I'm aiming to solve the desynchronization caused by irqchip=split, so the
> VM will behave more like a physical machine.

I suspect I'm going to hate your QEMU patch when I see it.

KVM has a callback when the IRQ is acked, which it uses to retrigger the
next interrupt in reinject mode.

Even in !reinject mode, the kvm_pit_ack_irq() callback could just as easily
be used to allow the hrtimer to stop completely until the interrupt gets
acked. Which I understand is basically what you want to do in QEMU?

There shouldn't be any reason to special-case it on the LINT0 setup; if the
interrupt just remains pending in the PIC and is never serviced, that should
*also* mean we stop wasting steal time on it, right?

So ideally, QEMU would have the same infrastructure to 'resample' an IRQ
when it gets acked. And then it would know when the guest is ignoring the
PIT and it needn't bother to generate any more interrupts.

Except QEMU's interrupt controllers don't yet support that. So for VFIO INTx
interrupts, for example, QEMU unmaps the MMIO BARs of the device while an
interrupt is outstanding, then sends an event to the kernel's resample irqfd
when the guest touches a register therein!

I'd love to see you fix this in QEMU by hooking up that 'resample' signal
when the interrupt is acked in the interrupt controller, and then wouldn't
the kernel side of this and the special case for LINT0 be unneeded?
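For the record, the ack-gated pattern David describes (the one used by the
in-kernel PIT's kvm_pit_ack_irq()) would translate to QEMU's PIT model
roughly as below. This is a conceptual sketch only: QEMU's interrupt
controllers do not currently provide an ack notifier, so pit_irq_acked(),
pit_rearm_timer() and the irq_pending field are all hypothetical, and the
field names on PITChannelState are illustrative.

    /* Conceptual sketch; the ack hook and helpers are hypothetical. */
    static void pit_timer_fired(void *opaque)
    {
        PITChannelState *s = opaque;

        qemu_irq_raise(s->irq);     /* waggle the PIT output pin */
        s->irq_pending = true;      /* hypothetical field */
        /*
         * Deliberately do NOT rearm the timer here: if the guest never
         * services the interrupt, the timer stays quiet and QEMU stops
         * burning steal time on it.
         */
    }

    /* Hypothetical callback, invoked when the PIC/APIC acks the IRQ. */
    static void pit_irq_acked(void *opaque)
    {
        PITChannelState *s = opaque;

        s->irq_pending = false;
        pit_rearm_timer(s);         /* hypothetical: schedule the next tick */
    }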
On Tue, Aug 12, David Woodhouse <dwmw2@infradead.org> wrote:
>
> On Tue, 2025-08-12 at 19:50 +0800, hugo lee wrote:
> > On Tue, Aug 12, David Woodhouse <dwmw2@infradead.org> wrote:
> > >
> > > On Tue, 2025-08-12 at 18:08 +0800, hugo lee wrote:
> > > >
> > > > Some guests using legacy BIOS images may disable the PIT after
> > > > booting.
> > >
> > > Do you mean they may *not* disable the PIT after booting? Linux had
> > > that problem for a long time, until I fixed it with
> > > https://git.kernel.org/torvalds/c/70e6b7d9ae3
> > >
> >
> > True, they disabled LINT0 and left the PIT unaware.
> >
> > > > When irqchip=split is on, QEMU will keep kicking the guest and
> > > > trying to take the Big QEMU Lock.
> > >
> > > If it's the PIT, surely QEMU will keep stealing time pointlessly
> > > unless we actually disable the PIT itself? Not just the IRQ delivery?
> > > Or do you use this to realise that the IRQ output from the PIT isn't
> > > going anywhere and thus disable the event in QEMU completely?
> > >
> >
> > I'm using this to disable the PIT event in QEMU.
> >
> > I'm aiming to solve the desynchronization caused by irqchip=split, so
> > the VM will behave more like a physical machine.
>
> I suspect I'm going to hate your QEMU patch when I see it.
>
> KVM has a callback when the IRQ is acked, which it uses to retrigger the
> next interrupt in reinject mode.
>
> Even in !reinject mode, the kvm_pit_ack_irq() callback could just as
> easily be used to allow the hrtimer to stop completely until the interrupt
> gets acked. Which I understand is basically what you want to do in QEMU?
>
> There shouldn't be any reason to special-case it on the LINT0 setup; if
> the interrupt just remains pending in the PIC and is never serviced, that
> should *also* mean we stop wasting steal time on it, right?
>
> So ideally, QEMU would have the same infrastructure to 'resample' an IRQ
> when it gets acked. And then it would know when the guest is ignoring the
> PIT and it needn't bother to generate any more interrupts.
>
> Except QEMU's interrupt controllers don't yet support that. So for VFIO
> INTx interrupts, for example, QEMU unmaps the MMIO BARs of the device
> while an interrupt is outstanding, then sends an event to the kernel's
> resample irqfd when the guest touches a register therein!
>
> I'd love to see you fix this in QEMU by hooking up that 'resample' signal
> when the interrupt is acked in the interrupt controller, and then wouldn't
> the kernel side of this and the special case for LINT0 be unneeded?
>

Sorry for the confusion; what I meant is that QEMU would do only
cpu_synchronize_state() on this new userspace exit reason and do nothing to
the PIT. So QEMU will ignore the PIT just as the guest does.

The resample is great and needed, but the synchronization makes more sense
to me for this problem.
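Concretely, the QEMU-side handling described here would presumably amount to
a new case in the exit-reason switch of QEMU's run loop, roughly as sketched
below. KVM_EXIT_APIC_SYNC is the exit reason proposed by this patch; the
helper name is made up for illustration, and cpu_synchronize_state() is
QEMU's existing API for pulling vCPU state from the kernel.

    /* Sketch of the QEMU side, assuming the proposed KVM_EXIT_APIC_SYNC. */
    static int kvm_handle_apic_sync(CPUState *cpu)
    {
        /*
         * Refresh QEMU's copy of the vCPU registers, including the LAPIC
         * state, so the userspace IOAPIC/PIT models can observe that
         * LINT0 is now masked.
         */
        cpu_synchronize_state(cpu);
        return 0;
    }

    /* ...dispatched from the exit-reason switch in kvm_cpu_exec():
     *     case KVM_EXIT_APIC_SYNC:
     *         ret = kvm_handle_apic_sync(cpu);
     *         break;
     */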
On Wed, 2025-08-13 at 17:30 +0800, hugo lee wrote:
>
> Sorry for the confusion; what I meant is that QEMU would do only
> cpu_synchronize_state() on this new userspace exit reason and do nothing
> to the PIT. So QEMU will ignore the PIT just as the guest does.
>
> The resample is great and needed, but the synchronization makes more sense
> to me for this problem.

So if the guest doesn't actually quiesce the PIT, QEMU will *still* keep
waking up to waggle the PIT output pin, it's just that QEMU won't bother
telling the kernel about it?
On Wed, Aug 13, 2025 David Woodhouse <dwmw2@infradead.org> wrote:
>
> On Wed, 2025-08-13 at 17:30 +0800, hugo lee wrote:
> >
> > Sorry for the confusion; what I meant is that QEMU would do only
> > cpu_synchronize_state() on this new userspace exit reason and do nothing
> > to the PIT. So QEMU will ignore the PIT just as the guest does.
> >
> > The resample is great and needed, but the synchronization makes more
> > sense to me for this problem.
>
> So if the guest doesn't actually quiesce the PIT, QEMU will *still* keep
> waking up to waggle the PIT output pin, it's just that QEMU won't bother
> telling the kernel about it?

Yes, just as the guest wishes. This would eliminate most of the performance
loss.

But I guess the resample approach is more acceptable.
On Thu, 2025-08-14 at 16:54 +0800, hugo lee wrote:
> On Wed, Aug 13, 2025 David Woodhouse <dwmw2@infradead.org> wrote:
> >
> > On Wed, 2025-08-13 at 17:30 +0800, hugo lee wrote:
> > >
> > > Sorry for the confusion; what I meant is that QEMU would do only
> > > cpu_synchronize_state() on this new userspace exit reason and do
> > > nothing to the PIT. So QEMU will ignore the PIT just as the guest
> > > does.
> > >
> > > The resample is great and needed, but the synchronization makes more
> > > sense to me for this problem.
> >
> > So if the guest doesn't actually quiesce the PIT, QEMU will *still* keep
> > waking up to waggle the PIT output pin, it's just that QEMU won't bother
> > telling the kernel about it?
>
> Yes, just as the guest wishes. This would eliminate most of the
> performance loss.
>
> But I guess the resample approach is more acceptable.

Simpler, cleaner, and it solves the problem for more use cases, including
when the interrupt *is* delivered to the PIC but just never serviced. And it
allows us to fix the VFIO INTx abomination and use the kernel's irqfd API
properly... But that's a discussion for the qemu-devel list, I suppose.
On Mon, 2025-08-11 at 09:32 -0700, Sean Christopherson wrote:
> On Fri, Aug 08, 2025, hugo lee wrote:
> > On Fri, Aug 8, 2025, Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Thu, Aug 07, 2025, hugo lee wrote:
> > > > On Thu, Aug 7, 2025 Sean Christopherson wrote:
> > > > >
> > > > > On Wed, Aug 06, 2025, Yuguo Li wrote:
> > > > > > When using split irqchip mode, the IOAPIC is handled by QEMU
> > > > > > while the LAPIC is emulated by KVM. When the guest disables
> > > > > > LINT0, KVM doesn't exit to QEMU for synchronization, leaving the
> > > > > > IOAPIC unaware of this change. This may cause the vCPU to be
> > > > > > kicked when external devices (e.g. the PIT) keep sending
> > > > > > interrupts.
> > > > >
> > > > > I don't entirely follow what the problem is. Is the issue that
> > > > > QEMU injects an IRQ that should have been blocked? Or is QEMU
> > > > > forcing the vCPU to exit unnecessarily?
> > > > >
> > > >
> > > > The issue is that QEMU keeps injecting should-be-blocked IRQs
> > > > (blocked by the guest; QEMU just doesn't know that). As a result,
> > > > QEMU forces the vCPU to exit unnecessarily.
> > >
> > > Is the problem that the guest receives spurious IRQs, or that QEMU is
> > > forcing unnecessary exits, i.e. hurting performance?
> > >
> >
> > It is QEMU forcing unnecessary exits, which hurts performance by trying
> > to acquire the Big QEMU Lock in qemu_wait_io_event.
>
> Please elaborate on the performance impact and why the issue can't be
> solved in QEMU.

Is there a corresponding QEMU patch to use this new exit reason?
On Tue, Aug 12, 2025 at 5:39 PM David Woodhouse <dwmw2@infradead.org> wrote:
>
> On Mon, 2025-08-11 at 09:32 -0700, Sean Christopherson wrote:
> > On Fri, Aug 08, 2025, hugo lee wrote:
> > > On Fri, Aug 8, 2025, Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > On Thu, Aug 07, 2025, hugo lee wrote:
> > > > > On Thu, Aug 7, 2025 Sean Christopherson wrote:
> > > > > >
> > > > > > On Wed, Aug 06, 2025, Yuguo Li wrote:
> > > > > > > When using split irqchip mode, the IOAPIC is handled by QEMU
> > > > > > > while the LAPIC is emulated by KVM. When the guest disables
> > > > > > > LINT0, KVM doesn't exit to QEMU for synchronization, leaving
> > > > > > > the IOAPIC unaware of this change. This may cause the vCPU to
> > > > > > > be kicked when external devices (e.g. the PIT) keep sending
> > > > > > > interrupts.
> > > > > >
> > > > > > I don't entirely follow what the problem is. Is the issue that
> > > > > > QEMU injects an IRQ that should have been blocked? Or is QEMU
> > > > > > forcing the vCPU to exit unnecessarily?
> > > > > >
> > > > >
> > > > > The issue is that QEMU keeps injecting should-be-blocked IRQs
> > > > > (blocked by the guest; QEMU just doesn't know that). As a result,
> > > > > QEMU forces the vCPU to exit unnecessarily.
> > > >
> > > > Is the problem that the guest receives spurious IRQs, or that QEMU
> > > > is forcing unnecessary exits, i.e. hurting performance?
> > > >
> > >
> > > It is QEMU forcing unnecessary exits, which hurts performance by
> > > trying to acquire the Big QEMU Lock in qemu_wait_io_event.
> >
> > Please elaborate on the performance impact and why the issue can't be
> > solved in QEMU.
>
> Is there a corresponding QEMU patch to use this new exit reason?

No, but the patch is done and will be submitted soon.