[PATCH 01/14] hw/pci-bridge: Do not assume immediate MemoryRegion finalization

Akihiko Odaki posted 14 patches 4 months, 3 weeks ago
Maintainers: Richard Henderson <richard.henderson@linaro.org>, Helge Deller <deller@gmx.de>, "Marc-André Lureau" <marcandre.lureau@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, "Michael S. Tsirkin" <mst@redhat.com>, "Daniel P. Berrangé" <berrange@redhat.com>, Eduardo Habkost <eduardo@habkost.net>, Gerd Hoffmann <kraxel@redhat.com>, John Snow <jsnow@redhat.com>, Keith Busch <kbusch@kernel.org>, Klaus Jensen <its@irrelevant.dk>, Jesper Devantier <foss@defmacro.it>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Nicholas Piggin <npiggin@gmail.com>, Daniel Henrique Barboza <danielhb413@gmail.com>, Harsh Prateek Bora <harshpb@linux.ibm.com>, John Levon <john.levon@nutanix.com>, Thanos Makatos <thanos.makatos@nutanix.com>, Peter Xu <peterx@redhat.com>, David Hildenbrand <david@redhat.com>, "Philippe Mathieu-Daudé" <philmd@linaro.org>
[PATCH 01/14] hw/pci-bridge: Do not assume immediate MemoryRegion finalization
Posted by Akihiko Odaki 4 months, 3 weeks ago
When updating memory mappings, pci_bridge_update_mappings() performed
the following operations:
1. Start a transaction
2. Delete the subregions from the container
3. Unparent the subregions
4. Initialize the subregions
5. End the transaction

This assumes the old subregion instances are finalized immediately after
3, but it is not true; the finalization is delayed until 5. Remove the
assumption by using functions to dynamically update MemoryRegions.

Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
---
 include/hw/pci/pci.h |  1 +
 hw/pci/pci.c         |  2 +-
 hw/pci/pci_bridge.c  | 96 ++++++++++++++++++++++++++++------------------------
 3 files changed, 54 insertions(+), 45 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 6b7d3ac8a361..d0bd214b4e11 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -256,6 +256,7 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
                       uint8_t attr, MemoryRegion *memory);
 void pci_register_vga(PCIDevice *pci_dev, MemoryRegion *mem,
                       MemoryRegion *io_lo, MemoryRegion *io_hi);
+void pci_update_vga(PCIDevice *pci_dev);
 void pci_unregister_vga(PCIDevice *pci_dev);
 pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num);
 
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index c70b5ceebaf1..516029f66cda 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1521,7 +1521,7 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
     }
 }
 
-static void pci_update_vga(PCIDevice *pci_dev)
+void pci_update_vga(PCIDevice *pci_dev)
 {
     uint16_t cmd;
 
diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
index 76255c4cd892..240d0f904de9 100644
--- a/hw/pci/pci_bridge.c
+++ b/hw/pci/pci_bridge.c
@@ -145,11 +145,10 @@ pcibus_t pci_bridge_get_limit(const PCIDevice *bridge, uint8_t type)
     return limit;
 }
 
-static void pci_bridge_init_alias(PCIBridge *bridge, MemoryRegion *alias,
-                                  uint8_t type, const char *name,
-                                  MemoryRegion *space,
-                                  MemoryRegion *parent_space,
-                                  bool enabled)
+static void pci_bridge_update_alias(PCIBridge *bridge, bool init,
+                                    MemoryRegion *alias, uint8_t type,
+                                    const char *name, MemoryRegion *space,
+                                    MemoryRegion *parent_space, bool enabled)
 {
     PCIDevice *bridge_dev = PCI_DEVICE(bridge);
     pcibus_t base = pci_bridge_get_base(bridge_dev, type);
@@ -158,25 +157,37 @@ static void pci_bridge_init_alias(PCIBridge *bridge, MemoryRegion *alias,
      * Apparently no way to do this with existing memory APIs. */
     pcibus_t size = enabled && limit >= base ? limit + 1 - base : 0;
 
-    memory_region_init_alias(alias, OBJECT(bridge), name, space, base, size);
+    if (init) {
+        memory_region_init_alias(alias, OBJECT(bridge), name, space, base, size);
+    } else {
+        memory_region_set_size(alias, size);
+        memory_region_set_alias_offset(alias, base);
+    }
+
     memory_region_add_subregion_overlap(parent_space, base, alias, 1);
 }
 
-static void pci_bridge_init_vga_aliases(PCIBridge *br, PCIBus *parent,
-                                        MemoryRegion *alias_vga)
+static void pci_bridge_update_vga_aliases(PCIBridge *br, bool init,
+                                          PCIBus *parent,
+                                          MemoryRegion *alias_vga)
 {
     PCIDevice *pd = PCI_DEVICE(br);
     uint16_t brctl = pci_get_word(pd->config + PCI_BRIDGE_CONTROL);
 
-    memory_region_init_alias(&alias_vga[QEMU_PCI_VGA_IO_LO], OBJECT(br),
-                             "pci_bridge_vga_io_lo", &br->address_space_io,
-                             QEMU_PCI_VGA_IO_LO_BASE, QEMU_PCI_VGA_IO_LO_SIZE);
-    memory_region_init_alias(&alias_vga[QEMU_PCI_VGA_IO_HI], OBJECT(br),
-                             "pci_bridge_vga_io_hi", &br->address_space_io,
-                             QEMU_PCI_VGA_IO_HI_BASE, QEMU_PCI_VGA_IO_HI_SIZE);
-    memory_region_init_alias(&alias_vga[QEMU_PCI_VGA_MEM], OBJECT(br),
-                             "pci_bridge_vga_mem", &br->address_space_mem,
-                             QEMU_PCI_VGA_MEM_BASE, QEMU_PCI_VGA_MEM_SIZE);
+    if (init) {
+        memory_region_init_alias(&alias_vga[QEMU_PCI_VGA_IO_LO], OBJECT(br),
+                                 "pci_bridge_vga_io_lo", &br->address_space_io,
+                                 QEMU_PCI_VGA_IO_LO_BASE,
+                                 QEMU_PCI_VGA_IO_LO_SIZE);
+        memory_region_init_alias(&alias_vga[QEMU_PCI_VGA_IO_HI], OBJECT(br),
+                                 "pci_bridge_vga_io_hi", &br->address_space_io,
+                                 QEMU_PCI_VGA_IO_HI_BASE,
+                                 QEMU_PCI_VGA_IO_HI_SIZE);
+        memory_region_init_alias(&alias_vga[QEMU_PCI_VGA_MEM], OBJECT(br),
+                                 "pci_bridge_vga_mem", &br->address_space_mem,
+                                 QEMU_PCI_VGA_MEM_BASE,
+                                 QEMU_PCI_VGA_MEM_SIZE);
+    }
 
     if (brctl & PCI_BRIDGE_CTL_VGA) {
         pci_register_vga(pd, &alias_vga[QEMU_PCI_VGA_MEM],
@@ -185,33 +196,33 @@ static void pci_bridge_init_vga_aliases(PCIBridge *br, PCIBus *parent,
     }
 }
 
-static void pci_bridge_region_init(PCIBridge *br)
+static void pci_bridge_region_update(PCIBridge *br, bool init)
 {
     PCIDevice *pd = PCI_DEVICE(br);
     PCIBus *parent = pci_get_bus(pd);
     PCIBridgeWindows *w = &br->windows;
     uint16_t cmd = pci_get_word(pd->config + PCI_COMMAND);
 
-    pci_bridge_init_alias(br, &w->alias_pref_mem,
-                          PCI_BASE_ADDRESS_MEM_PREFETCH,
-                          "pci_bridge_pref_mem",
-                          &br->address_space_mem,
-                          parent->address_space_mem,
-                          cmd & PCI_COMMAND_MEMORY);
-    pci_bridge_init_alias(br, &w->alias_mem,
-                          PCI_BASE_ADDRESS_SPACE_MEMORY,
-                          "pci_bridge_mem",
-                          &br->address_space_mem,
-                          parent->address_space_mem,
-                          cmd & PCI_COMMAND_MEMORY);
-    pci_bridge_init_alias(br, &w->alias_io,
-                          PCI_BASE_ADDRESS_SPACE_IO,
-                          "pci_bridge_io",
-                          &br->address_space_io,
-                          parent->address_space_io,
-                          cmd & PCI_COMMAND_IO);
-
-    pci_bridge_init_vga_aliases(br, parent, w->alias_vga);
+    pci_bridge_update_alias(br, init, &w->alias_pref_mem,
+                            PCI_BASE_ADDRESS_MEM_PREFETCH,
+                            "pci_bridge_pref_mem",
+                            &br->address_space_mem,
+                            parent->address_space_mem,
+                            cmd & PCI_COMMAND_MEMORY);
+    pci_bridge_update_alias(br, init, &w->alias_mem,
+                            PCI_BASE_ADDRESS_SPACE_MEMORY,
+                            "pci_bridge_mem",
+                            &br->address_space_mem,
+                            parent->address_space_mem,
+                            cmd & PCI_COMMAND_MEMORY);
+    pci_bridge_update_alias(br, init, &w->alias_io,
+                            PCI_BASE_ADDRESS_SPACE_IO,
+                            "pci_bridge_io",
+                            &br->address_space_io,
+                            parent->address_space_io,
+                            cmd & PCI_COMMAND_IO);
+
+    pci_bridge_update_vga_aliases(br, init, parent, w->alias_vga);
 }
 
 static void pci_bridge_region_del(PCIBridge *br, PCIBridgeWindows *w)
@@ -237,14 +248,11 @@ static void pci_bridge_region_cleanup(PCIBridge *br, PCIBridgeWindows *w)
 
 void pci_bridge_update_mappings(PCIBridge *br)
 {
-    PCIBridgeWindows *w = &br->windows;
-
     /* Make updates atomic to: handle the case of one VCPU updating the bridge
      * while another accesses an unaffected region. */
     memory_region_transaction_begin();
-    pci_bridge_region_del(br, w);
-    pci_bridge_region_cleanup(br, w);
-    pci_bridge_region_init(br);
+    pci_bridge_region_del(br, &br->windows);
+    pci_bridge_region_update(br, false);
     memory_region_transaction_commit();
 }
 
@@ -386,7 +394,7 @@ void pci_bridge_initfn(PCIDevice *dev, const char *typename)
     memory_region_init(&br->address_space_io, OBJECT(br), "pci_bridge_io",
                        4 * GiB);
     address_space_init(&br->as_io, &br->address_space_io, "pci_bridge_pci_io");
-    pci_bridge_region_init(br);
+    pci_bridge_region_update(br, true);
     QLIST_INIT(&sec_bus->child);
     QLIST_INSERT_HEAD(&parent->child, sec_bus, sibling);
 

-- 
2.51.0
Re: [PATCH 01/14] hw/pci-bridge: Do not assume immediate MemoryRegion finalization
Posted by Peter Xu 4 months, 1 week ago
On Wed, Sep 17, 2025 at 07:32:47PM +0900, Akihiko Odaki wrote:
> When updating memory mappings, pci_bridge_update_mappings() performed
> the following operations:
> 1. Start a transaction
> 2. Delete the subregions from the container
> 3. Unparent the subregions
> 4. Initialize the subregions
> 5. End the transaction
> 
> This assumes the old subregion instances are finalized immediately after
> 3, but it is not true; the finalization is delayed until 5. Remove the
> assumption by using functions to dynamically update MemoryRegions.

To Paolo - you can ignore this if you'll merge it.  However I'm still
raising this as a pure question.

Could there be an explanation why it'll be delayed until step 5?

All the MRs involved in this patch are all aliases.  I believe this rules
out any map() ref-takers.  IIUC it is exactly what was marked exceptions in
the memory.rst here:

  There is an exception to the above rule: it is okay to call
  object_unparent at any time for an alias or a container region.  It is
  therefore also okay to create or destroy alias and container regions
  dynamically during a device's lifetime.
  
  This exceptional usage is valid because aliases and containers only help
  QEMU building the guest's memory map; they are never accessed directly.
  memory_region_ref and memory_region_unref are never called on aliases
  or containers, and the above situation then cannot happen.  Exploiting
  this exception is rarely necessary, and therefore it is discouraged,
  but nevertheless it is used in a few places.

I was expecting the current code should at least be fine, no?

Thanks,

-- 
Peter Xu
Re: [PATCH 01/14] hw/pci-bridge: Do not assume immediate MemoryRegion finalization
Posted by Akihiko Odaki 4 months, 1 week ago
On 2025/10/03 3:47, Peter Xu wrote:
> On Wed, Sep 17, 2025 at 07:32:47PM +0900, Akihiko Odaki wrote:
>> When updating memory mappings, pci_bridge_update_mappings() performed
>> the following operations:
>> 1. Start a transaction
>> 2. Delete the subregions from the container
>> 3. Unparent the subregions
>> 4. Initialize the subregions
>> 5. End the transaction
>>
>> This assumes the old subregion instances are finalized immediately after
>> 3, but it is not true; the finalization is delayed until 5. Remove the
>> assumption by using functions to dynamically update MemoryRegions.
> 
> To Paolo - you can ignore this if you'll merge it.  However I'm still
> raising this as a pure question.
> 
> Could there be an explanation why it'll be delayed until step 5?
> 
> All the MRs involved in this patch are all aliases.  I believe this rules
> out any map() ref-takers.  IIUC it is exactly what was marked exceptions in
> the memory.rst here:
> 
>    There is an exception to the above rule: it is okay to call
>    object_unparent at any time for an alias or a container region.  It is
>    therefore also okay to create or destroy alias and container regions
>    dynamically during a device's lifetime.
>    
>    This exceptional usage is valid because aliases and containers only help
>    QEMU building the guest's memory map; they are never accessed directly.
>    memory_region_ref and memory_region_unref are never called on aliases
>    or containers, and the above situation then cannot happen.  Exploiting
>    this exception is rarely necessary, and therefore it is discouraged,
>    but nevertheless it is used in a few places.
> 
> I was expecting the current code should at least be fine, no?

You are right. This patch does nothing good so I'll drop it.

Regards,
Akihiko Odaki