[PULL 05/11] hpet: fix and cleanup persistence of interrupt status

Paolo Bonzini posted 11 patches 1 year, 6 months ago
[PULL 05/11] hpet: fix and cleanup persistence of interrupt status
Posted by Paolo Bonzini 1 year, 6 months ago
There are several bugs in the handling of the ISR register:

- switching level->edge was not lowering the interrupt and
  clearing ISR

- switching on the enable bit was not raising a level-triggered
  interrupt if the timer had fired

- the timer must be kept running even if not enabled, in
  order to set the ISR flag, so writes to HPET_TN_CFG must
  not call hpet_del_timer()

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/timer/hpet.c | 60 +++++++++++++++++++++++++++++++++----------------
 1 file changed, 41 insertions(+), 19 deletions(-)

diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c
index 4cb5393c0b5..58073df02b5 100644
--- a/hw/timer/hpet.c
+++ b/hw/timer/hpet.c
@@ -196,21 +196,31 @@ static void update_irq(struct HPETTimer *timer, int set)
     }
     s = timer->state;
     mask = 1 << timer->tn;
-    if (!set || !timer_enabled(timer) || !hpet_enabled(timer->state)) {
+
+    if (set && (timer->config & HPET_TN_TYPE_LEVEL)) {
+        /*
+         * If HPET_TN_ENABLE bit is 0, "the timer will still operate and
+         * generate appropriate status bits, but will not cause an interrupt"
+         */
+        s->isr |= mask;
+    } else {
         s->isr &= ~mask;
+    }
+
+    if (set && timer_enabled(timer) && hpet_enabled(s)) {
+        if (timer_fsb_route(timer)) {
+            address_space_stl_le(&address_space_memory, timer->fsb >> 32,
+                                 timer->fsb & 0xffffffff, MEMTXATTRS_UNSPECIFIED,
+                                 NULL);
+        } else if (timer->config & HPET_TN_TYPE_LEVEL) {
+            qemu_irq_raise(s->irqs[route]);
+        } else {
+            qemu_irq_pulse(s->irqs[route]);
+        }
+    } else {
         if (!timer_fsb_route(timer)) {
             qemu_irq_lower(s->irqs[route]);
         }
-    } else if (timer_fsb_route(timer)) {
-        address_space_stl_le(&address_space_memory, timer->fsb >> 32,
-                             timer->fsb & 0xffffffff, MEMTXATTRS_UNSPECIFIED,
-                             NULL);
-    } else if (timer->config & HPET_TN_TYPE_LEVEL) {
-        s->isr |= mask;
-        qemu_irq_raise(s->irqs[route]);
-    } else {
-        s->isr &= ~mask;
-        qemu_irq_pulse(s->irqs[route]);
     }
 }
 
@@ -414,8 +424,13 @@ static void hpet_set_timer(HPETTimer *t)
 
 static void hpet_del_timer(HPETTimer *t)
 {
+    HPETState *s = t->state;
     timer_del(t->qemu_timer);
-    update_irq(t, 0);
+
+    if (s->isr & (1 << t->tn)) {
+        /* For level-triggered interrupt, this leaves ISR set but lowers irq.  */
+        update_irq(t, 1);
+    }
 }
 
 static uint64_t hpet_ram_read(void *opaque, hwaddr addr,
@@ -515,20 +530,26 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
         switch ((addr - 0x100) % 0x20) {
         case HPET_TN_CFG:
             trace_hpet_ram_write_tn_cfg();
-            if (activating_bit(old_val, new_val, HPET_TN_FSB_ENABLE)) {
+            if (deactivating_bit(old_val, new_val, HPET_TN_TYPE_LEVEL)) {
+                /*
+                 * Do this before changing timer->config; otherwise, if
+                 * HPET_TN_FSB is set, update_irq will not lower the qemu_irq.
+                 */
                 update_irq(timer, 0);
             }
             val = hpet_fixup_reg(new_val, old_val, HPET_TN_CFG_WRITE_MASK);
             timer->config = (timer->config & 0xffffffff00000000ULL) | val;
+            if (activating_bit(old_val, new_val, HPET_TN_ENABLE)
+                && (s->isr & (1 << timer_id))) {
+                update_irq(timer, 1);
+            }
+
             if (new_val & HPET_TN_32BIT) {
                 timer->cmp = (uint32_t)timer->cmp;
                 timer->period = (uint32_t)timer->period;
             }
-            if (activating_bit(old_val, new_val, HPET_TN_ENABLE) &&
-                hpet_enabled(s)) {
+            if (hpet_enabled(s)) {
                 hpet_set_timer(timer);
-            } else if (deactivating_bit(old_val, new_val, HPET_TN_ENABLE)) {
-                hpet_del_timer(timer);
             }
             break;
         case HPET_TN_CFG + 4: // Interrupt capabilities
@@ -606,9 +627,10 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
                 s->hpet_offset =
                     ticks_to_ns(s->hpet_counter) - qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
                 for (i = 0; i < s->num_timers; i++) {
-                    if ((&s->timer[i])->cmp != ~0ULL) {
-                        hpet_set_timer(&s->timer[i]);
+                    if (timer_enabled(&s->timer[i]) && (s->isr & (1 << i))) {
+                        update_irq(&s->timer[i], 1);
                     }
+                    hpet_set_timer(&s->timer[i]);
                 }
             } else if (deactivating_bit(old_val, new_val, HPET_CFG_ENABLE)) {
                 /* Halt main counter and disable interrupt generation. */
-- 
2.45.2
Re: [PULL 05/11] hpet: fix and cleanup persistence of interrupt status
Posted by Fiona Ebner 10 months, 3 weeks ago
Hi,

Am 23.07.24 um 16:15 schrieb Paolo Bonzini:
> There are several bugs in the handling of the ISR register:
> 
> - switching level->edge was not lowering the interrupt and
>   clearing ISR
> 
> - switching on the enable bit was not raising a level-triggered
>   interrupt if the timer had fired
> 
> - the timer must be kept running even if not enabled, in
>   order to set the ISR flag, so writes to HPET_TN_CFG must
>   not call hpet_del_timer()

we've been getting user reports about increased CPU usage for QEMU
processes on the host after they updated from QEMU 9.0 to QEMU 9.2.
Bisecting points to this change, and users confirmed that adding the
machine option hpet=off reduces the CPU usage again. Some quick
experimentation suggests that in particular the last part here, i.e.
"timer must be kept running even if not enabled" is the reason for this.

I reproduced the issue with a Debian 12.10 guest with kernel 6.1. It
enables and then immediately disables the HPET timer again during boot
(i.e. the HPET_TN_ENABLE bit).

I also installed 6.13.7 in the guest for comparison, but that just made
the issue worse, because it uses a higher frequency for the timer.

Is this something Linux should/could handle differently? I suppose there
is nothing that can be done on the QEMU side to avoid this while aiming
to keep the implementation correct?

If necessary, our management layer downstream could disable the hpet
timer by default and make users explicitly enable it if
desired/required. I just wanted to ask if the issue is known and if
there are any other suggestions? Thanks!

Best Regards,
Fiona
Re: [PULL 05/11] hpet: fix and cleanup persistence of interrupt status
Posted by Paolo Bonzini 10 months, 3 weeks ago
On Wed, Mar 19, 2025 at 4:29 PM Fiona Ebner <f.ebner@proxmox.com> wrote:
>
> Hi,
> If necessary, our management layer downstream could disable the hpet
> timer by default and make users explicitly enable it if
> desired/required. I just wanted to ask if the issue is known and if
> there are any other suggestions? Thanks!

We can try to skip the timer if the interrupt would have no change -
similar to how the RTC works, it's more complex but very effective in
reducing CPU usage.

That said, if you can disable the HPET timer by default without
problems with e.g. live migration I strongly suggest you do. And in
the mean time you can also revert these patches, they were actually
reported as bugs but it's not clear what guest OS was affected.

Paolo
Re: [PULL 05/11] hpet: fix and cleanup persistence of interrupt status
Posted by Fiona Ebner 10 months, 3 weeks ago
Am 19.03.25 um 16:30 schrieb Paolo Bonzini:
> On Wed, Mar 19, 2025 at 4:29 PM Fiona Ebner <f.ebner@proxmox.com> wrote:
>>
>> Hi,
>> If necessary, our management layer downstream could disable the hpet
>> timer by default and make users explicitly enable it if
>> desired/required. I just wanted to ask if the issue is known and if
>> there are any other suggestions? Thanks!
> 
> We can try to skip the timer if the interrupt would have no change -
> similar to how the RTC works, it's more complex but very effective in
> reducing CPU usage.
> 
> That said, if you can disable the HPET timer by default without
> problems with e.g. live migration I strongly suggest you do. And in
> the mean time you can also revert these patches, they were actually
> reported as bugs but it's not clear what guest OS was affected.

We'll only be able to disable it starting from a new (downstream)
machine version, but that is fine. For now, I'll go for the revert,
thank you for the suggestion! Is disabling it strongly suggested because
of those bug reports? Or are there issues in general?

Best Regards,
Fiona


Re: [PULL 05/11] hpet: fix and cleanup persistence of interrupt status
Posted by Paolo Bonzini 10 months, 3 weeks ago
On Wed, Mar 19, 2025 at 4:47 PM Fiona Ebner <f.ebner@proxmox.com> wrote:
> We'll only be able to disable it starting from a new (downstream)
> machine version, but that is fine. For now, I'll go for the revert,
> thank you for the suggestion! Is disabling it strongly suggested because
> of those bug reports? Or are there issues in general?

No, the bug reports are really just for corner cases and there are no
huge issues. However, both Linux and Windows give the HPET a
relatively high priority that it probably does not deserve. :)

For Linux you should be using kvmclock already, and for Windows the
Hyper-V paravirtualized clock. If you don't have the Hyper-V pv clock,
the RTC periodic timer is more battle-tested as an emulated
clock/timer device; disabling the HPET ensures that Windows uses the
RTC.

Paolo
Re: [PULL 05/11] hpet: fix and cleanup persistence of interrupt status
Posted by Fiona Ebner 10 months, 3 weeks ago
Am 19.03.25 um 16:51 schrieb Paolo Bonzini:
> On Wed, Mar 19, 2025 at 4:47 PM Fiona Ebner <f.ebner@proxmox.com> wrote:
>> We'll only be able to disable it starting from a new (downstream)
>> machine version, but that is fine. For now, I'll go for the revert,
>> thank you for the suggestion! Is disabling it strongly suggested because
>> of those bug reports? Or are there issues in general?
> 
> No, the bug reports are really just for corner cases and there are no
> huge issues. However, both Linux and Windows give the HPET a
> relatively high priority that it probably does not deserve. :)
> 
> For Linux you should be using kvmclock already, and for Windows the
> Hyper-V paravirtualized clock. If you don't have the Hyper-V pv clock,
> the RTC periodic timer is more battle-tested as an emulated
> clock/timer device; disabling the HPET ensures that Windows uses the
> RTC.

Good to know! Right, for Windows guests, we actually already disable HPET :)

Best Regards,
Fiona