[PATCH v2 0/6] Reinvent BQL-free PIO/MMIO

Igor Mammedov posted 6 patches 6 months, 1 week ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20250730123934.1787379-1-imammedo@redhat.com
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Igor Mammedov <imammedo@redhat.com>, Ani Sinha <anisinha@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>, David Hildenbrand <david@redhat.com>, "Philippe Mathieu-Daudé" <philmd@linaro.org>, Marcelo Tosatti <mtosatti@redhat.com>
There is a newer version of this series
Posted by Igor Mammedov 6 months, 1 week ago
v2:
  * Make both read and write paths BQL-less (Gerd)
  * Refactor HPET to handle lock-less access correctly
    when stopping/starting counter in parallel. (Peter Maydell)
  * Publish kvm-unit-tests HPET bench/torture test [1] to verify
    HPET lock-less handling

When booting WS2025 with the following CLI
 1)   -M q35,hpet=off -cpu host -enable-kvm -smp 240,sockets=4
the guest boots very slowly and is sluggish after boot,
or (most of the time) it gets stuck at the boot spinner.

perf shows that the VM is experiencing heavy BQL contention on the IO
path, which turns out to be ACPI PM timer read access. A variation with
HPET enabled moves the contention to HPET timer read access.
And it only gets worse as the number of vCPUs increases.

The series prevents the vCPUs of a large VM from contending on the BQL
due to PM|HPET timer access and lets Windows move on with the boot process.
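
To make the idea concrete, here is a minimal standalone sketch of a
dispatcher that skips a global lock for regions flagged as safe for
lock-less access. It is an illustration only, not code from the patches:
SketchRegion, lockless_io, sketch_big_lock and sketch_dispatch_read()
are all hypothetical names, and a C11 atomic_flag spinlock stands in for
the BQL.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a global lock (hypothetical, not the real BQL). */
static atomic_flag sketch_big_lock = ATOMIC_FLAG_INIT;
/* Counts how often the global lock was taken, for demonstration. */
static unsigned sketch_big_lock_taken;

typedef uint64_t (*SketchReadFn)(void *opaque, uint64_t addr);

/* Hypothetical region descriptor; not the real MemoryRegion. */
typedef struct {
    SketchReadFn read;
    void *opaque;
    bool lockless_io;   /* device promises its handler is thread-safe */
} SketchRegion;

static uint64_t sketch_dispatch_read(SketchRegion *r, uint64_t addr)
{
    uint64_t val;

    if (r->lockless_io) {
        /* Fast path: the device handles concurrency on its own. */
        return r->read(r->opaque, addr);
    }

    /* Slow path: serialise the access behind the global lock. */
    while (atomic_flag_test_and_set(&sketch_big_lock)) {
        /* spin */
    }
    sketch_big_lock_taken++;
    val = r->read(r->opaque, addr);
    atomic_flag_clear(&sketch_big_lock);
    return val;
}

/* Example read handler: returns the value the opaque pointer refers to. */
static uint64_t sketch_timer_read(void *opaque, uint64_t addr)
{
    (void)addr;
    return *(uint64_t *)opaque;
}
```

With such a flag, many vCPUs reading a timer register no longer queue
up on one lock; the cost is that the flagged device has to provide its
own synchronisation.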

Testing lock-less IO with the HPET micro benchmark [1] shows approx 80%
better performance than the current BQL-locked path.
[chart https://ibb.co/MJY9999 shows much better scaling of lock-less
IO compared to the BQL one.]
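
For readers unfamiliar with lock-less counter reads, one common way to
keep a reader consistent while the counter is being stopped or started
in parallel is a sequence counter (seqlock). The sketch below shows that
general technique with C11 atomics. It is an assumption chosen for
illustration, not a copy of the HPET patches: SketchCounter and the
sketch_* functions are hypothetical, and writers are assumed to be
serialised by some other lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counter state; not the real HPETState layout. */
typedef struct {
    atomic_uint seq;          /* odd while a writer is mid-update */
    atomic_bool enabled;
    _Atomic uint64_t offset;  /* counter = clock + offset while enabled */
    _Atomic uint64_t frozen;  /* value latched while stopped */
} SketchCounter;

/* Compute the counter value from the current fields. */
static uint64_t sketch_value(SketchCounter *c, uint64_t clock_now)
{
    return atomic_load(&c->enabled)
           ? clock_now + atomic_load(&c->offset)
           : atomic_load(&c->frozen);
}

/* Writer: start or stop the counter. Writers must not run concurrently. */
static void sketch_set_enabled(SketchCounter *c, bool on, uint64_t clock_now)
{
    uint64_t cur = sketch_value(c, clock_now);

    atomic_fetch_add(&c->seq, 1);            /* seq becomes odd */
    if (on) {
        atomic_store(&c->offset, cur - clock_now);
    } else {
        atomic_store(&c->frozen, cur);
    }
    atomic_store(&c->enabled, on);
    atomic_fetch_add(&c->seq, 1);            /* seq is even again */
}

/* Reader: takes no lock, retries if a writer interfered. */
static uint64_t sketch_read(SketchCounter *c, uint64_t clock_now)
{
    unsigned start;
    uint64_t val;

    do {
        start = atomic_load(&c->seq);
        val = sketch_value(c, clock_now);
    } while ((start & 1) || start != atomic_load(&c->seq));
    return val;
}
```

Readers never block each other here; they only repeat the read in the
rare case that a start/stop happened in the middle of it.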

In my tests with CLI 1), the WS2025 guest wasn't able to boot within
30min on either of these hosts:
  * 32 cores, 2 NUMA nodes
  * 448 cores, 8 NUMA nodes
With the ACPI PM timer in BQL-free read mode, the guest boots within approx:
 * 2min
 * 1min
respectively.

With HPET enabled, boot time shrinks ~2x
 * 4m13 -> 2m21
 * 2m19 -> 1m15
respectively.
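
As a quick check of the "~2x" figure, the boot times above can be
converted to seconds and divided. The helper names below are made up
for this illustration.

```c
#include <assert.h>

/* Convert a "XmYY" boot time to seconds. */
static int to_seconds(int minutes, int seconds)
{
    return minutes * 60 + seconds;
}

/* Speedup factor: time before the series divided by time after it. */
static double speedup(int before_s, int after_s)
{
    return (double)before_s / after_s;
}
```

speedup(to_seconds(4, 13), to_seconds(2, 21)) is 253/141, about 1.79,
and speedup(to_seconds(2, 19), to_seconds(1, 15)) is 139/75, about 1.85,
so both hosts land a bit under 2x.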

[1] "[kvm-unit-tests PATCH v4 0/5] x86: add HPET counter tests"
    https://lore.kernel.org/kvm/20250725095429.1691734-1-imammedo@redhat.com/T/#t
PS:
Using the hv-time=on cpu option helps a lot (when it works) and
lets the guest from CLI 1) boot fine in ~1-2min. The series doesn't
make a significant difference in this case.

PS2:
Tested series with a bunch of different guests:
 RHEL-[6..10]x64, WS2012R2, WS2016, WS2022, WS2025

PS3:
 Dropped the mention of https://bugzilla.redhat.com/show_bug.cgi?id=1322713
 as it's not reproducible with the current software stack, or even with
 the same qemu/seabios as reported (the kernel versions mentioned in
 the report were interim ones and are no longer available,
 so I used the nearest ones released at the time for testing).

Igor Mammedov (6):
  memory: reintroduce BQL-free fine-grained PIO/MMIO
  acpi: mark PMTIMER as unlocked
  hpet: switch to fain-grained device locking
  hpet: move out main counter read into a separate block
  hpet: make main counter read lock-less
  kvm: i386: irqchip: take BQL only if there is an interrupt

 include/system/memory.h | 10 +++++++
 hw/acpi/core.c          |  1 +
 hw/timer/hpet.c         | 64 +++++++++++++++++++++++++++++++++++------
 system/memory.c         |  6 ++++
 system/physmem.c        |  2 +-
 target/i386/kvm/kvm.c   | 58 +++++++++++++++++++++++--------------
 6 files changed, 111 insertions(+), 30 deletions(-)

-- 
2.47.1
Re: [PATCH v2 0/6] Reinvent BQL-free PIO/MMIO
Posted by Philippe Mathieu-Daudé 6 months, 1 week ago
Cc'ing Alex, Darren and Bandan.