Cc'ing Alex, Darren and Bandan.
On 30/7/25 14:39, Igor Mammedov wrote:
> v2:
> * Make both read and write paths BQL-less (Gerd)
> * Refactor HPET to handle lock-less access correctly
> when stopping/starting counter in parallel. (Peter Maydell)
> * Publish kvm-unit-tests HPET bench/torture test [1] to verify
> HPET lock-less handling
>
> When booting WS2025 with the following CLI
> 1) -M q35,hpet=off -cpu host -enable-kvm -smp 240,sockets=4
> the guest boots very slowly and is sluggish after boot,
> or it gets stuck during boot at the spinning circle (most of the time).
>
> perf shows that the VM is experiencing heavy BQL contention on the IO path
> which happens to be ACPI PM timer read access. A variation with
> HPET enabled moves contention to HPET timer read access.
> And it only gets worse with increasing number of VCPUs.
>
> The series prevents vCPUs of large VMs from contending on the BQL due to
> PM|HPET timer access and lets Windows move on with the boot process.
>
> Testing lock-less IO with HPET micro benchmark [1] shows approx 80%
> better performance than the current BQL-locked path.
> [chart https://ibb.co/MJY9999 shows much better scaling of lock-less
> IO compared to BQL one.]
>
> In my tests with the CLI above, the WS2025 guest wasn't able to boot
> within 30min on both hosts
> * 32 cores, 2 NUMA nodes
> * 448 cores, 8 NUMA nodes
> With ACPI PM timer in BQL-free read mode, guest boots within approx:
> * 2min
> * 1min
> respectively.
>
> With HPET enabled boot time shrinks ~2x
> * 4m13 -> 2m21
> * 2m19 -> 1m15
> respectively.
>
> 1) "[kvm-unit-tests PATCH v4 0/5] x86: add HPET counter tests"
> https://lore.kernel.org/kvm/20250725095429.1691734-1-imammedo@redhat.com/T/#t
> PS:
> Using hv-time=on cpu option helps a lot (when it works) and
> lets [1] guest boot fine in ~1-2min. Series doesn't make
> a significant impact in this case.
>
> PS2:
> Tested series with a bunch of different guests:
> RHEL-[6..10]x64, WS2012R2, WS2016, WS2022, WS2025
>
> PS3:
> dropped mention of https://bugzilla.redhat.com/show_bug.cgi?id=1322713
> as it's not reproducible with current software stack or even with
> the same qemu/seabios as reported (kernel versions mentioned in
> the report were interim ones and no longer available,
> so I've used nearest released at the time for testing)
>
> Igor Mammedov (6):
> memory: reintroduce BQL-free fine-grained PIO/MMIO
> acpi: mark PMTIMER as unlocked
> hpet: switch to fine-grained device locking
> hpet: move out main counter read into a separate block
> hpet: make main counter read lock-less
> kvm: i386: irqchip: take BQL only if there is an interrupt
>
>  include/system/memory.h | 10 +++++++
>  hw/acpi/core.c          |  1 +
>  hw/timer/hpet.c         | 64 +++++++++++++++++++++++++++++++++++------
>  system/memory.c         |  6 ++++
>  system/physmem.c        |  2 +-
>  target/i386/kvm/kvm.c   | 58 +++++++++++++++++++++++--------------
>  6 files changed, 111 insertions(+), 30 deletions(-)
>
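
For readers following along, here is a minimal sketch of what a lock-less
main counter read that stays correct while the counter is stopped/started
in parallel could look like. It assumes a seqlock-style scheme built on C11
atomics; it is not taken from the series, and DemoCounter,
demo_counter_read() and demo_counter_set_enabled() are hypothetical names
for illustration only. Writers are assumed to be serialized by some
device-level lock that is not shown here.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counter state; not the real HPET device state. */
typedef struct {
    atomic_uint seq;          /* even: stable, odd: update in progress */
    _Atomic uint64_t offset;  /* added to the clock while running */
    _Atomic uint64_t frozen;  /* value reported while stopped */
    _Atomic int enabled;      /* non-zero while the counter runs */
} DemoCounter;

/* Reader: takes no lock, retries if it raced with a writer. */
static uint64_t demo_counter_read(DemoCounter *c, uint64_t now)
{
    unsigned start;
    uint64_t val;

    do {
        start = atomic_load_explicit(&c->seq, memory_order_acquire);
        if (atomic_load_explicit(&c->enabled, memory_order_relaxed)) {
            val = now + atomic_load_explicit(&c->offset,
                                             memory_order_relaxed);
        } else {
            val = atomic_load_explicit(&c->frozen, memory_order_relaxed);
        }
        /* Order the data loads before the sequence re-check. */
        atomic_thread_fence(memory_order_acquire);
    } while ((start & 1) ||
             start != atomic_load_explicit(&c->seq, memory_order_relaxed));

    return val;
}

/* Writer: start/stop the counter. Assumes callers are serialized. */
static void demo_counter_set_enabled(DemoCounter *c, int enable, uint64_t now)
{
    unsigned s = atomic_load_explicit(&c->seq, memory_order_relaxed);

    /* Mark the update as in progress (sequence becomes odd). */
    atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);

    if (enable) {
        /* Resume from the frozen value: frozen == now + offset. */
        uint64_t f = atomic_load_explicit(&c->frozen, memory_order_relaxed);
        atomic_store_explicit(&c->offset, f - now, memory_order_relaxed);
    } else {
        /* Latch the current value so reads stay constant while stopped. */
        uint64_t o = atomic_load_explicit(&c->offset, memory_order_relaxed);
        atomic_store_explicit(&c->frozen, now + o, memory_order_relaxed);
    }
    atomic_store_explicit(&c->enabled, enable, memory_order_relaxed);

    /* Publish the update (sequence becomes even again). */
    atomic_store_explicit(&c->seq, s + 2, memory_order_release);
}
```

The point of the sketch is that readers never take a lock: they only retry
in the rare case where a start/stop update overlaps the read, which is why
this kind of scheme scales with the number of vCPUs polling the counter.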