[PATCH v4 5/8] hpet: make main counter read lock-less

Igor Mammedov posted 8 patches 2 months, 2 weeks ago
Maintainers: Richard Henderson <richard.henderson@linaro.org>, Paolo Bonzini <pbonzini@redhat.com>, Riku Voipio <riku.voipio@iki.fi>, "Michael S. Tsirkin" <mst@redhat.com>, Igor Mammedov <imammedo@redhat.com>, Ani Sinha <anisinha@redhat.com>, Thomas Huth <thuth@redhat.com>, Halil Pasic <pasic@linux.ibm.com>, Christian Borntraeger <borntraeger@linux.ibm.com>, David Hildenbrand <david@redhat.com>, Jason Herne <jjherne@linux.ibm.com>, Stafford Horne <shorne@gmail.com>, Eduardo Habkost <eduardo@habkost.net>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, "Philippe Mathieu-Daudé" <philmd@linaro.org>, Yanan Wang <wangyanan55@huawei.com>, Zhao Liu <zhao1.liu@intel.com>, Peter Xu <peterx@redhat.com>, Peter Maydell <peter.maydell@linaro.org>, Alexander Graf <agraf@csgraf.de>, Mads Ynddal <mads@ynddal.dk>, Michael Rolnik <mrolnik@gmail.com>, Helge Deller <deller@gmx.de>, Cameron Esfahani <dirty@apple.com>, Roman Bolshakov <rbolshakov@ddn.com>, Phil Dennis-Jordan <phil@philjordan.eu>, Marcelo Tosatti <mtosatti@redhat.com>, Reinoud Zandijk <reinoud@netbsd.org>, Sunil Muthuswamy <sunilmut@microsoft.com>, Song Gao <gaosong@loongson.cn>, Laurent Vivier <laurent@vivier.eu>, "Edgar E. Iglesias" <edgar.iglesias@gmail.com>, Aurelien Jarno <aurelien@aurel32.net>, Jiaxun Yang <jiaxun.yang@flygoat.com>, Aleksandar Rikalo <arikalo@gmail.com>, Huacai Chen <chenhuacai@kernel.org>, Nicholas Piggin <npiggin@gmail.com>, Chinmay Rath <rathc@linux.ibm.com>, Harsh Prateek Bora <harshpb@linux.ibm.com>, Yoshinori Sato <yoshinori.sato@nifty.com>, Ilya Leoshkevich <iii@linux.ibm.com>, Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>, Artyom Tarasenko <atar4qemu@gmail.com>
[PATCH v4 5/8] hpet: make main counter read lock-less
Posted by Igor Mammedov 2 months, 2 weeks ago
Make access to main HPET counter lock-less.

In the unlikely event of an update in progress, readers will busy wait
until the update is finished.

As a result, a micro-benchmark of concurrent HPET counter reads with a
large number of vCPUs shows over 80% lower latency.
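
For reference, a minimal sketch of the seqlock usage pattern adopted here
(DemoState and the demo_* helpers are illustrative names only, not part of
this patch; writers are assumed to stay serialized externally, e.g. by
s->lock elsewhere in this series):

  #include "qemu/osdep.h"
  #include "qemu/seqlock.h"

  /* illustrative stand-in for the device state, not the real HPETState */
  typedef struct {
      QemuSeqLock version;
      uint64_t counter;
  } DemoState;

  static uint64_t demo_read(DemoState *s)
  {
      unsigned v;
      uint64_t val;

      /* lock-less reader: retry while a writer is in progress */
      do {
          v = seqlock_read_begin(&s->version);
          val = s->counter;
      } while (seqlock_read_retry(&s->version, v));
      return val;
  }

  static void demo_write(DemoState *s, uint64_t new_val)
  {
      /* writer side; callers must serialize concurrent writers themselves */
      seqlock_write_begin(&s->version);
      s->counter = new_val;
      seqlock_write_end(&s->version);
  }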

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
v3:
  * make reader busy wait during update and reuse existing seqlock API
       Peter Xu <peterx@redhat.com>
---
 hw/timer/hpet.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c
index c776afc0f2..789a31d0a0 100644
--- a/hw/timer/hpet.c
+++ b/hw/timer/hpet.c
@@ -39,6 +39,7 @@
 #include "system/address-spaces.h"
 #include "qom/object.h"
 #include "qemu/lockable.h"
+#include "qemu/seqlock.h"
 #include "trace.h"
 
 struct hpet_fw_config hpet_fw_cfg = {.count = UINT8_MAX};
@@ -74,6 +75,7 @@ struct HPETState {
     MemoryRegion iomem;
     uint64_t hpet_offset;
     bool hpet_offset_saved;
+    QemuSeqLock state_version;
     qemu_irq irqs[HPET_NUM_IRQ_ROUTES];
     uint32_t flags;
     uint8_t rtc_irq_level;
@@ -430,17 +432,25 @@ static uint64_t hpet_ram_read(void *opaque, hwaddr addr,
     trace_hpet_ram_read(addr);
     addr &= ~4;
 
-    QEMU_LOCK_GUARD(&s->lock);
     if (addr == HPET_COUNTER) {
-        if (hpet_enabled(s)) {
-            cur_tick = hpet_get_ticks(s);
-        } else {
-            cur_tick = s->hpet_counter;
-        }
+        unsigned version;
+
+        /*
+         * Write update is rare, so busywait here is unlikely to happen
+         */
+        do {
+            version = seqlock_read_begin(&s->state_version);
+            if (unlikely(!hpet_enabled(s))) {
+                cur_tick = s->hpet_counter;
+            } else {
+                cur_tick = hpet_get_ticks(s);
+            }
+        } while (seqlock_read_retry(&s->state_version, version));
         trace_hpet_ram_read_reading_counter(addr & 4, cur_tick);
         return cur_tick >> shift;
     }
 
+    QEMU_LOCK_GUARD(&s->lock);
     /*address range of all global regs*/
     if (addr <= 0xff) {
         switch (addr) {
@@ -500,6 +510,7 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
             old_val = s->config;
             new_val = deposit64(old_val, shift, len, value);
             new_val = hpet_fixup_reg(new_val, old_val, HPET_CFG_WRITE_MASK);
+            seqlock_write_begin(&s->state_version);
             s->config = new_val;
             if (activating_bit(old_val, new_val, HPET_CFG_ENABLE)) {
                 /* Enable main counter and interrupt generation. */
@@ -518,6 +529,8 @@ static void hpet_ram_write(void *opaque, hwaddr addr,
                     hpet_del_timer(&s->timer[i]);
                 }
             }
+            seqlock_write_end(&s->state_version);
+
             /* i8254 and RTC output pins are disabled
              * when HPET is in legacy mode */
             if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
@@ -686,6 +699,7 @@ static void hpet_init(Object *obj)
     HPETState *s = HPET(obj);
 
     qemu_mutex_init(&s->lock);
+    seqlock_init(&s->state_version);
     /* HPET Area */
     memory_region_init_io(&s->iomem, obj, &hpet_ram_ops, s, "hpet", HPET_LEN);
     memory_region_enable_lockless_io(&s->iomem);
-- 
2.47.1
Re: [PATCH v4 5/8] hpet: make main counter read lock-less
Posted by Zhao Liu 2 months ago
On Thu, Aug 14, 2025 at 06:05:57PM +0200, Igor Mammedov wrote:
> Date: Thu, 14 Aug 2025 18:05:57 +0200
> From: Igor Mammedov <imammedo@redhat.com>
> Subject: [PATCH v4 5/8] hpet: make main counter read lock-less
> 
> Make access to main HPET counter lock-less.
> 
> In the unlikely event of an update in progress, readers will busy wait
> until the update is finished.
> 
> As a result, a micro-benchmark of concurrent HPET counter reads with a
> large number of vCPUs shows over 80% lower latency.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> ---
> v3:
>   * make reader busy wait during update and reuse existing seqlock API
>        Peter Xu <peterx@redhat.com>
> ---
>  hw/timer/hpet.c | 26 ++++++++++++++++++++------
>  1 file changed, 20 insertions(+), 6 deletions(-)
 
...

> -    QEMU_LOCK_GUARD(&s->lock);
>      if (addr == HPET_COUNTER) {
> -        if (hpet_enabled(s)) {
> -            cur_tick = hpet_get_ticks(s);
> -        } else {
> -            cur_tick = s->hpet_counter;
> -        }
> +        unsigned version;
> +
> +        /*
> +         * Write update is rare, so busywait here is unlikely to happen
> +         */
> +        do {
> +            version = seqlock_read_begin(&s->state_version);
> +            if (unlikely(!hpet_enabled(s))) {

Is there any particular reason for rearranging the order of the
conditional branches here (rather than directly using likely(hpet_enabled()))?

> +                cur_tick = s->hpet_counter;
> +            } else {
> +                cur_tick = hpet_get_ticks(s);
> +            }
> +        } while (seqlock_read_retry(&s->state_version, version));
>          trace_hpet_ram_read_reading_counter(addr & 4, cur_tick);
>          return cur_tick >> shift;
>      }

Nice improvement!

Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Re: [PATCH v4 5/8] hpet: make main counter read lock-less
Posted by Igor Mammedov 2 months ago
On Mon, 25 Aug 2025 22:55:43 +0800
Zhao Liu <zhao1.liu@intel.com> wrote:

> On Thu, Aug 14, 2025 at 06:05:57PM +0200, Igor Mammedov wrote:
> > Date: Thu, 14 Aug 2025 18:05:57 +0200
> > From: Igor Mammedov <imammedo@redhat.com>
> > Subject: [PATCH v4 5/8] hpet: make main counter read lock-less
> > 
> > Make access to main HPET counter lock-less.
> > 
> > In the unlikely event of an update in progress, readers will busy wait
> > until the update is finished.
> > 
> > As a result, a micro-benchmark of concurrent HPET counter reads with a
> > large number of vCPUs shows over 80% lower latency.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > Reviewed-by: Peter Xu <peterx@redhat.com>
> > ---
> > v3:
> >   * make reader busy wait during update and reuse existing seqlock API
> >        Peter Xu <peterx@redhat.com>
> > ---
> >  hw/timer/hpet.c | 26 ++++++++++++++++++++------
> >  1 file changed, 20 insertions(+), 6 deletions(-)  
>  
> ...
> 
> > -    QEMU_LOCK_GUARD(&s->lock);
> >      if (addr == HPET_COUNTER) {
> > -        if (hpet_enabled(s)) {
> > -            cur_tick = hpet_get_ticks(s);
> > -        } else {
> > -            cur_tick = s->hpet_counter;
> > -        }
> > +        unsigned version;
> > +
> > +        /*
> > +         * Write update is rare, so busywait here is unlikely to happen
> > +         */
> > +        do {
> > +            version = seqlock_read_begin(&s->state_version);
> > +            if (unlikely(!hpet_enabled(s))) {  
> 
> Is there any particular reason for rearranging the order of the
> conditional branches here (rather than directly using likely(hpet_enabled()))?

Not really, I suppose it should be the same either way.
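For example, the form you suggest would look like this (untested sketch):

    if (likely(hpet_enabled(s))) {
        cur_tick = hpet_get_ticks(s);
    } else {
        cur_tick = s->hpet_counter;
    }

and I'd expect the compiler to treat both branch hints identically.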

> 
> > +                cur_tick = s->hpet_counter;
> > +            } else {
> > +                cur_tick = hpet_get_ticks(s);
> > +            }
> > +        } while (seqlock_read_retry(&s->state_version, version));
> >          trace_hpet_ram_read_reading_counter(addr & 4, cur_tick);
> >          return cur_tick >> shift;
> >      }  
> 
> Nice improvement!
> 
> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> 

thanks!