There are several bugs in the handling of the ISR register:
- switching level->edge was not lowering the interrupt and
clearing ISR
- switching on the enable bit was not raising a level-triggered
interrupt if the timer had fired
- the timer must be kept running even if not enabled, in
order to set the ISR flag, so writes to HPET_TN_CFG must
not call hpet_del_timer()
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
hw/timer/hpet.c | 60 +++++++++++++++++++++++++++++++++----------------
1 file changed, 41 insertions(+), 19 deletions(-)
diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c
index 4cb5393c0b5..58073df02b5 100644
--- a/hw/timer/hpet.c
+++ b/hw/timer/hpet.c
@@ -196,21 +196,31 @@ static void update_irq(struct HPETTimer *timer, int set)
}
s = timer->state;
mask = 1 << timer->tn;
- if (!set || !timer_enabled(timer) || !hpet_enabled(timer->state)) {
+
+ if (set && (timer->config & HPET_TN_TYPE_LEVEL)) {
+ /*
+ * If HPET_TN_ENABLE bit is 0, "the timer will still operate and
+ * generate appropriate status bits, but will not cause an interrupt"
+ */
+ s->isr |= mask;
+ } else {
s->isr &= ~mask;
+ }
+
+ if (set && timer_enabled(timer) && hpet_enabled(s)) {
+ if (timer_fsb_route(timer)) {
+ address_space_stl_le(&address_space_memory, timer->fsb >> 32,
+ timer->fsb & 0xffffffff, MEMTXATTRS_UNSPECIFIED,
+ NULL);
+ } else if (timer->config & HPET_TN_TYPE_LEVEL) {
+ qemu_irq_raise(s->irqs[route]);
+ } else {
+ qemu_irq_pulse(s->irqs[route]);
+ }
+ } else {
if (!timer_fsb_route(timer)) {
qemu_irq_lower(s->irqs[route]);
}
- } else if (timer_fsb_route(timer)) {
- address_space_stl_le(&address_space_memory, timer->fsb >> 32,
- timer->fsb & 0xffffffff, MEMTXATTRS_UNSPECIFIED,
- NULL);
- } else if (timer->config & HPET_TN_TYPE_LEVEL) {
- s->isr |= mask;
- qemu_irq_raise(s->irqs[route]);
- } else {
- s->isr &= ~mask;
- qemu_irq_pulse(s->irqs[route]);
}
}
@@ -414,8 +424,13 @@ static void hpet_set_timer(HPETTimer *t)
static void hpet_del_timer(HPETTimer *t)
{
+ HPETState *s = t->state;
timer_del(t->qemu_timer);
- update_irq(t, 0);
+
+ if (s->isr & (1 << t->tn)) {
+ /* For level-triggered interrupt, this leaves ISR set but lowers irq. */
+ update_irq(t, 1);
+ }
}
static uint64_t hpet_ram_read(void *opaque, hwaddr addr,
@@ -515,20 +530,26 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
switch ((addr - 0x100) % 0x20) {
case HPET_TN_CFG:
trace_hpet_ram_write_tn_cfg();
- if (activating_bit(old_val, new_val, HPET_TN_FSB_ENABLE)) {
+ if (deactivating_bit(old_val, new_val, HPET_TN_TYPE_LEVEL)) {
+ /*
+ * Do this before changing timer->config; otherwise, if
+ * HPET_TN_FSB is set, update_irq will not lower the qemu_irq.
+ */
update_irq(timer, 0);
}
val = hpet_fixup_reg(new_val, old_val, HPET_TN_CFG_WRITE_MASK);
timer->config = (timer->config & 0xffffffff00000000ULL) | val;
+ if (activating_bit(old_val, new_val, HPET_TN_ENABLE)
+ && (s->isr & (1 << timer_id))) {
+ update_irq(timer, 1);
+ }
+
if (new_val & HPET_TN_32BIT) {
timer->cmp = (uint32_t)timer->cmp;
timer->period = (uint32_t)timer->period;
}
- if (activating_bit(old_val, new_val, HPET_TN_ENABLE) &&
- hpet_enabled(s)) {
+ if (hpet_enabled(s)) {
hpet_set_timer(timer);
- } else if (deactivating_bit(old_val, new_val, HPET_TN_ENABLE)) {
- hpet_del_timer(timer);
}
break;
case HPET_TN_CFG + 4: // Interrupt capabilities
@@ -606,9 +627,10 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
s->hpet_offset =
ticks_to_ns(s->hpet_counter) - qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
for (i = 0; i < s->num_timers; i++) {
- if ((&s->timer[i])->cmp != ~0ULL) {
- hpet_set_timer(&s->timer[i]);
+ if (timer_enabled(&s->timer[i]) && (s->isr & (1 << i))) {
+ update_irq(&s->timer[i], 1);
}
+ hpet_set_timer(&s->timer[i]);
}
} else if (deactivating_bit(old_val, new_val, HPET_CFG_ENABLE)) {
/* Halt main counter and disable interrupt generation. */
--
2.45.2
Hi,

On 23.07.24 16:15, Paolo Bonzini wrote:
> There are several bugs in the handling of the ISR register:
>
> - switching level->edge was not lowering the interrupt and
>   clearing ISR
>
> - switching on the enable bit was not raising a level-triggered
>   interrupt if the timer had fired
>
> - the timer must be kept running even if not enabled, in
>   order to set the ISR flag, so writes to HPET_TN_CFG must
>   not call hpet_del_timer()

We've been getting user reports about increased CPU usage for QEMU processes on the host after they updated from QEMU 9.0 to QEMU 9.2. Bisecting points to this change, and users confirmed that adding the machine option hpet=off reduces the CPU usage again. Some quick experimentation suggests that in particular the last part here, i.e. "timer must be kept running even if not enabled", is the reason for this.

I reproduced the issue with a Debian 12.10 guest with kernel 6.1. It enables and then immediately disables the HPET timer again during boot (i.e. the HPET_TN_ENABLE bit). I also installed 6.13.7 in the guest for comparison, but that just made the issue worse, because it uses a higher frequency for the timer.

Is this something Linux should/could handle differently? I suppose there is nothing that can be done on the QEMU side to avoid this while aiming to keep the implementation correct?

If necessary, our management layer downstream could disable the HPET timer by default and make users explicitly enable it if desired/required. I just wanted to ask whether the issue is known and whether there are any other suggestions. Thanks!

Best Regards,
Fiona
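As a point of reference, the hpet=off workaround mentioned above is set as a machine property on the QEMU command line. A minimal sketch of such an invocation follows; the machine type, memory size, and disk path are placeholders, not taken from the report:

```shell
# Sketch: start a guest with the emulated HPET disabled.
# hpet=off is the machine property that replaced the removed -no-hpet
# flag; machine type, memory size, and disk path are placeholders.
qemu-system-x86_64 \
    -machine q35,hpet=off \
    -m 2048 \
    -drive file=/path/to/disk.qcow2,if=virtio
```

With the HPET gone from the guest's ACPI tables, Linux falls back to other clocksources (kvmclock under KVM) and Windows falls back to the RTC, as discussed further down in the thread.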
On Wed, Mar 19, 2025 at 4:29 PM Fiona Ebner <f.ebner@proxmox.com> wrote:
>
> Hi,
> If necessary, our management layer downstream could disable the hpet
> timer by default and make users explicitly enable it if
> desired/required. I just wanted to ask if the issue is known and if
> there are any other suggestions? Thanks!

We can try to skip the timer if the interrupt would cause no change, similar to how the RTC works; it's more complex but very effective in reducing CPU usage.

That said, if you can disable the HPET timer by default without problems with e.g. live migration, I strongly suggest you do. And in the meantime you can also revert these patches; they were actually reported as bugs, but it's not clear what guest OS was affected.

Paolo
On 19.03.25 16:30, Paolo Bonzini wrote:
> On Wed, Mar 19, 2025 at 4:29 PM Fiona Ebner <f.ebner@proxmox.com> wrote:
>>
>> Hi,
>> If necessary, our management layer downstream could disable the hpet
>> timer by default and make users explicitly enable it if
>> desired/required. I just wanted to ask if the issue is known and if
>> there are any other suggestions? Thanks!
>
> We can try to skip the timer if the interrupt would have no change -
> similar to how the RTC works, it's more complex but very effective in
> reducing CPU usage.
>
> That said, if you can disable the HPET timer by default without
> problems with e.g. live migration I strongly suggest you do. And in
> the mean time you can also revert these patches, they were actually
> reported as bugs but it's not clear what guest OS was affected.

We'll only be able to disable it starting from a new (downstream) machine version, but that is fine. For now, I'll go for the revert; thank you for the suggestion! Is disabling it strongly suggested because of those bug reports, or are there issues in general?

Best Regards,
Fiona
On Wed, Mar 19, 2025 at 4:47 PM Fiona Ebner <f.ebner@proxmox.com> wrote:
> We'll only be able to disable it starting from a new (downstream)
> machine version, but that is fine. For now, I'll go for the revert,
> thank you for the suggestion! Is disabling it strongly suggested because
> of those bug reports? Or are there issues in general?

No, the bug reports are really just for corner cases and there are no huge issues. However, both Linux and Windows give the HPET a relatively high priority that it probably does not deserve. :)

For Linux you should be using kvmclock already, and for Windows the Hyper-V paravirtualized clock. If you don't have the Hyper-V pv clock, the RTC periodic timer is more battle-tested as an emulated clock/timer device; disabling the HPET ensures that Windows uses the RTC.

Paolo
On 19.03.25 16:51, Paolo Bonzini wrote:
> On Wed, Mar 19, 2025 at 4:47 PM Fiona Ebner <f.ebner@proxmox.com> wrote:
>> We'll only be able to disable it starting from a new (downstream)
>> machine version, but that is fine. For now, I'll go for the revert,
>> thank you for the suggestion! Is disabling it strongly suggested because
>> of those bug reports? Or are there issues in general?
>
> No, the bug reports are really just for corner cases and there are no
> huge issues. However, both Linux and Windows give the HPET a
> relatively high priority that it probably does not deserve. :)
>
> For Linux you should be using kvmclock already, and for Windows the
> Hyper-V paravirtualized clock. If you don't have the Hyper-V pv clock,
> the RTC periodic timer is more battle-tested as an emulated
> clock/timer device; disabling the HPET ensures that Windows uses the
> RTC.

Good to know! Right, for Windows guests, we actually already disable the HPET. :)

Best Regards,
Fiona