[PATCH v3 01/10] memory: reintroduce BQL-free fine-grained PIO/MMIO

Igor Mammedov posted 10 patches 3 months, 1 week ago
Maintainers: Richard Henderson <richard.henderson@linaro.org>, Paolo Bonzini <pbonzini@redhat.com>, "Michael S. Tsirkin" <mst@redhat.com>, Igor Mammedov <imammedo@redhat.com>, Ani Sinha <anisinha@redhat.com>, Eduardo Habkost <eduardo@habkost.net>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, "Philippe Mathieu-Daudé" <philmd@linaro.org>, Yanan Wang <wangyanan55@huawei.com>, Zhao Liu <zhao1.liu@intel.com>, Peter Xu <peterx@redhat.com>, David Hildenbrand <david@redhat.com>, Peter Maydell <peter.maydell@linaro.org>, Alexander Graf <agraf@csgraf.de>, Mads Ynddal <mads@ynddal.dk>, Michael Rolnik <mrolnik@gmail.com>, Helge Deller <deller@gmx.de>, Cameron Esfahani <dirty@apple.com>, Roman Bolshakov <rbolshakov@ddn.com>, Phil Dennis-Jordan <phil@philjordan.eu>, Marcelo Tosatti <mtosatti@redhat.com>, Reinoud Zandijk <reinoud@netbsd.org>, Sunil Muthuswamy <sunilmut@microsoft.com>, Song Gao <gaosong@loongson.cn>, Laurent Vivier <laurent@vivier.eu>, "Edgar E. Iglesias" <edgar.iglesias@gmail.com>, Aurelien Jarno <aurelien@aurel32.net>, Jiaxun Yang <jiaxun.yang@flygoat.com>, Aleksandar Rikalo <arikalo@gmail.com>, Huacai Chen <chenhuacai@kernel.org>, Stafford Horne <shorne@gmail.com>, Nicholas Piggin <npiggin@gmail.com>, Chinmay Rath <rathc@linux.ibm.com>, Harsh Prateek Bora <harshpb@linux.ibm.com>, Yoshinori Sato <yoshinori.sato@nifty.com>, Ilya Leoshkevich <iii@linux.ibm.com>, Thomas Huth <thuth@redhat.com>, Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>, Artyom Tarasenko <atar4qemu@gmail.com>
There is a newer version of this series
[PATCH v3 01/10] memory: reintroduce BQL-free fine-grained PIO/MMIO
Posted by Igor Mammedov 3 months, 1 week ago
This patch brings back Jan's idea [1] of BQL-free IO access

This will let us make access to ACPI PM/HPET timers cheaper,
and prevent BQL contention in case of workload that heavily
uses the timers with a lot of vCPUs.

1) 196ea13104f (memory: Add global-locking property to memory regions)
   ... de7ea885c539 (kvm: Switch to unlocked MMIO)

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
v3:
  add comment for 'mr->disable_reentrancy_guard = true'
    Peter Xu <peterx@redhat.com>
---
 include/system/memory.h | 10 ++++++++++
 system/memory.c         | 15 +++++++++++++++
 system/physmem.c        |  2 +-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/system/memory.h b/include/system/memory.h
index e2cd6ed126..d04366c994 100644
--- a/include/system/memory.h
+++ b/include/system/memory.h
@@ -833,6 +833,7 @@ struct MemoryRegion {
     bool nonvolatile;
     bool rom_device;
     bool flush_coalesced_mmio;
+    bool lockless_io;
     bool unmergeable;
     uint8_t dirty_log_mask;
     bool is_iommu;
@@ -2341,6 +2342,15 @@ void memory_region_set_flush_coalesced(MemoryRegion *mr);
  */
 void memory_region_clear_flush_coalesced(MemoryRegion *mr);
 
+/**
+ * memory_region_enable_lockless_io: Enable lockless (BQL free) acceess.
+ *
+ * Enable BQL-free access for devices with fine-grained locking.
+ *
+ * @mr: the memory region to be updated.
+ */
+void memory_region_enable_lockless_io(MemoryRegion *mr);
+
 /**
  * memory_region_add_eventfd: Request an eventfd to be triggered when a word
  *                            is written to a location.
diff --git a/system/memory.c b/system/memory.c
index 5646547940..44701c465c 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2546,6 +2546,21 @@ void memory_region_clear_flush_coalesced(MemoryRegion *mr)
     }
 }
 
+void memory_region_enable_lockless_io(MemoryRegion *mr)
+{
+    mr->lockless_io = true;
+    /*
+     * reentrancy_guard has per device scope, that when enabled
+     * will effectively prevent concurrent access to device's IO
+     * MemoryRegion(s) by not calling accessor callback.
+     *
+     * Turn it off for lock-less IO enabled devices, to allow
+     * concurrent IO.
+     * TODO: remove this when reentrancy_guard becomes per transaction.
+     */
+    mr->disable_reentrancy_guard = true;
+}
+
 void memory_region_add_eventfd(MemoryRegion *mr,
                                hwaddr addr,
                                unsigned size,
diff --git a/system/physmem.c b/system/physmem.c
index e5dd760e0b..f498572fc8 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2900,7 +2900,7 @@ bool prepare_mmio_access(MemoryRegion *mr)
 {
     bool release_lock = false;
 
-    if (!bql_locked()) {
+    if (!bql_locked() && !mr->lockless_io) {
         bql_lock();
         release_lock = true;
     }
-- 
2.47.1
Re: [PATCH v3 01/10] memory: reintroduce BQL-free fine-grained PIO/MMIO
Posted by Peter Xu 3 months ago
On Fri, Aug 08, 2025 at 02:01:28PM +0200, Igor Mammedov wrote:
> This patch brings back Jan's idea [1] of BQL-free IO access
> 
> This will let us make access to ACPI PM/HPET timers cheaper,
> and prevent BQL contention in case of workload that heavily
> uses the timers with a lot of vCPUs.
> 
> 1) 196ea13104f (memory: Add global-locking property to memory regions)
>    ... de7ea885c539 (kvm: Switch to unlocked MMIO)
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu
Re: [PATCH v3 01/10] memory: reintroduce BQL-free fine-grained PIO/MMIO
Posted by David Hildenbrand 3 months, 1 week ago
On 08.08.25 14:01, Igor Mammedov wrote:
> This patch brings back Jan's idea [1] of BQL-free IO access
> 
> This will let us make access to ACPI PM/HPET timers cheaper,
> and prevent BQL contention in case of workload that heavily
> uses the timers with a lot of vCPUs.
> 
> 1) 196ea13104f (memory: Add global-locking property to memory regions)
>     ... de7ea885c539 (kvm: Switch to unlocked MMIO)
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> v3:
>    add comment for 'mr->disable_reentrancy_guard = true'
>      Peter Xu <peterx@redhat.com>
> ---
>   include/system/memory.h | 10 ++++++++++
>   system/memory.c         | 15 +++++++++++++++
>   system/physmem.c        |  2 +-
>   3 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/include/system/memory.h b/include/system/memory.h
> index e2cd6ed126..d04366c994 100644
> --- a/include/system/memory.h
> +++ b/include/system/memory.h
> @@ -833,6 +833,7 @@ struct MemoryRegion {
>       bool nonvolatile;
>       bool rom_device;
>       bool flush_coalesced_mmio;
> +    bool lockless_io;
>       bool unmergeable;
>       uint8_t dirty_log_mask;
>       bool is_iommu;
> @@ -2341,6 +2342,15 @@ void memory_region_set_flush_coalesced(MemoryRegion *mr);
>    */
>   void memory_region_clear_flush_coalesced(MemoryRegion *mr);
>   
> +/**
> + * memory_region_enable_lockless_io: Enable lockless (BQL free) acceess.
> + *
> + * Enable BQL-free access for devices with fine-grained locking.
> + *
> + * @mr: the memory region to be updated.
> + */
> +void memory_region_enable_lockless_io(MemoryRegion *mr);

Is this safe to use on any IO region, or could actually something break 
when mis-used? In case it's the latter, I assume we would want to 
carefully document under which scenarios this is safe to use.

-- 
Cheers,

David / dhildenb
Re: [PATCH v3 01/10] memory: reintroduce BQL-free fine-grained PIO/MMIO
Posted by Igor Mammedov 3 months, 1 week ago
On Fri, 8 Aug 2025 14:12:54 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 08.08.25 14:01, Igor Mammedov wrote:
> > This patch brings back Jan's idea [1] of BQL-free IO access
> > 
> > This will let us make access to ACPI PM/HPET timers cheaper,
> > and prevent BQL contention in case of workload that heavily
> > uses the timers with a lot of vCPUs.
> > 
> > 1) 196ea13104f (memory: Add global-locking property to memory regions)
> >     ... de7ea885c539 (kvm: Switch to unlocked MMIO)
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > v3:
> >    add comment for 'mr->disable_reentrancy_guard = true'
> >      Peter Xu <peterx@redhat.com>
> > ---
> >   include/system/memory.h | 10 ++++++++++
> >   system/memory.c         | 15 +++++++++++++++
> >   system/physmem.c        |  2 +-
> >   3 files changed, 26 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/system/memory.h b/include/system/memory.h
> > index e2cd6ed126..d04366c994 100644
> > --- a/include/system/memory.h
> > +++ b/include/system/memory.h
> > @@ -833,6 +833,7 @@ struct MemoryRegion {
> >       bool nonvolatile;
> >       bool rom_device;
> >       bool flush_coalesced_mmio;
> > +    bool lockless_io;
> >       bool unmergeable;
> >       uint8_t dirty_log_mask;
> >       bool is_iommu;
> > @@ -2341,6 +2342,15 @@ void memory_region_set_flush_coalesced(MemoryRegion *mr);
> >    */
> >   void memory_region_clear_flush_coalesced(MemoryRegion *mr);
> >   
> > +/**
> > + * memory_region_enable_lockless_io: Enable lockless (BQL free) acceess.
> > + *
> > + * Enable BQL-free access for devices with fine-grained locking.
> > + *
> > + * @mr: the memory region to be updated.
> > + */
> > +void memory_region_enable_lockless_io(MemoryRegion *mr);  
> 
> Is this safe to use on any IO region, or could actually something break 
> when mis-used? In case it's the latter, I assume we would want to 
> carefully document under which scenarios this is safe to use.

"for devices with fine-grained locking" is defining scope of where it's
applicable, in another words devices enabling this need to take care
of any locking if necessary.

in this series PM timer didn't need any, while HPET required
some refactoring to make it lock-less on main timer reads,
while taking per device lock for everything else.
Re: [PATCH v3 01/10] memory: reintroduce BQL-free fine-grained PIO/MMIO
Posted by David Hildenbrand 3 months, 1 week ago
On 08.08.25 16:36, Igor Mammedov wrote:
> On Fri, 8 Aug 2025 14:12:54 +0200
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 08.08.25 14:01, Igor Mammedov wrote:
>>> This patch brings back Jan's idea [1] of BQL-free IO access
>>>
>>> This will let us make access to ACPI PM/HPET timers cheaper,
>>> and prevent BQL contention in case of workload that heavily
>>> uses the timers with a lot of vCPUs.
>>>
>>> 1) 196ea13104f (memory: Add global-locking property to memory regions)
>>>      ... de7ea885c539 (kvm: Switch to unlocked MMIO)
>>>
>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>>> ---
>>> v3:
>>>     add comment for 'mr->disable_reentrancy_guard = true'
>>>       Peter Xu <peterx@redhat.com>
>>> ---
>>>    include/system/memory.h | 10 ++++++++++
>>>    system/memory.c         | 15 +++++++++++++++
>>>    system/physmem.c        |  2 +-
>>>    3 files changed, 26 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/system/memory.h b/include/system/memory.h
>>> index e2cd6ed126..d04366c994 100644
>>> --- a/include/system/memory.h
>>> +++ b/include/system/memory.h
>>> @@ -833,6 +833,7 @@ struct MemoryRegion {
>>>        bool nonvolatile;
>>>        bool rom_device;
>>>        bool flush_coalesced_mmio;
>>> +    bool lockless_io;
>>>        bool unmergeable;
>>>        uint8_t dirty_log_mask;
>>>        bool is_iommu;
>>> @@ -2341,6 +2342,15 @@ void memory_region_set_flush_coalesced(MemoryRegion *mr);
>>>     */
>>>    void memory_region_clear_flush_coalesced(MemoryRegion *mr);
>>>    
>>> +/**
>>> + * memory_region_enable_lockless_io: Enable lockless (BQL free) acceess.
>>> + *
>>> + * Enable BQL-free access for devices with fine-grained locking.
>>> + *
>>> + * @mr: the memory region to be updated.
>>> + */
>>> +void memory_region_enable_lockless_io(MemoryRegion *mr);
>>
>> Is this safe to use on any IO region, or could actually something break
>> when mis-used? In case it's the latter, I assume we would want to
>> carefully document under which scenarios this is safe to use.
> 
> "for devices with fine-grained locking" is defining scope of where it's
> applicable, in another words devices enabling this need to take care
> of any locking if necessary.

Okay, I would have stressed that a bit more, something like

"Enable BQL-free access for devices that are well prepared to handle 
locking during I/O themselves: either by doing fine grained locking or 
by providing lock-free I/O schemes."

Nothing else jumped at me.

-- 
Cheers,

David / dhildenb
Re: [PATCH v3 01/10] memory: reintroduce BQL-free fine-grained PIO/MMIO
Posted by Igor Mammedov 3 months ago
On Fri, 8 Aug 2025 17:24:54 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 08.08.25 16:36, Igor Mammedov wrote:
> > On Fri, 8 Aug 2025 14:12:54 +0200
> > David Hildenbrand <david@redhat.com> wrote:
> >   
> >> On 08.08.25 14:01, Igor Mammedov wrote:  
> >>> This patch brings back Jan's idea [1] of BQL-free IO access
> >>>
> >>> This will let us make access to ACPI PM/HPET timers cheaper,
> >>> and prevent BQL contention in case of workload that heavily
> >>> uses the timers with a lot of vCPUs.
> >>>
> >>> 1) 196ea13104f (memory: Add global-locking property to memory regions)
> >>>      ... de7ea885c539 (kvm: Switch to unlocked MMIO)
> >>>
> >>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> >>> ---
> >>> v3:
> >>>     add comment for 'mr->disable_reentrancy_guard = true'
> >>>       Peter Xu <peterx@redhat.com>
> >>> ---
> >>>    include/system/memory.h | 10 ++++++++++
> >>>    system/memory.c         | 15 +++++++++++++++
> >>>    system/physmem.c        |  2 +-
> >>>    3 files changed, 26 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/include/system/memory.h b/include/system/memory.h
> >>> index e2cd6ed126..d04366c994 100644
> >>> --- a/include/system/memory.h
> >>> +++ b/include/system/memory.h
> >>> @@ -833,6 +833,7 @@ struct MemoryRegion {
> >>>        bool nonvolatile;
> >>>        bool rom_device;
> >>>        bool flush_coalesced_mmio;
> >>> +    bool lockless_io;
> >>>        bool unmergeable;
> >>>        uint8_t dirty_log_mask;
> >>>        bool is_iommu;
> >>> @@ -2341,6 +2342,15 @@ void memory_region_set_flush_coalesced(MemoryRegion *mr);
> >>>     */
> >>>    void memory_region_clear_flush_coalesced(MemoryRegion *mr);
> >>>    
> >>> +/**
> >>> + * memory_region_enable_lockless_io: Enable lockless (BQL free) acceess.
> >>> + *
> >>> + * Enable BQL-free access for devices with fine-grained locking.
> >>> + *
> >>> + * @mr: the memory region to be updated.
> >>> + */
> >>> +void memory_region_enable_lockless_io(MemoryRegion *mr);  
> >>
> >> Is this safe to use on any IO region, or could actually something break
> >> when mis-used? In case it's the latter, I assume we would want to
> >> carefully document under which scenarios this is safe to use.  
> > 
> > "for devices with fine-grained locking" is defining scope of where it's
> > applicable, in another words devices enabling this need to take care
> > of any locking if necessary.  
> 
> Okay, I would have stressed that a bit more, something like
> 
> "Enable BQL-free access for devices that are well prepared to handle 
> locking during I/O themselves: either by doing fine grained locking or 
> by providing lock-free I/O schemes."

Thank,
I'll fix it up on respin

> 
> Nothing else jumped at me.
>