[PATCH v8 2/2] memory: Do not create circular reference with subregion

Akihiko Odaki posted 2 patches 1 year, 1 month ago
Maintainers: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>, David Hildenbrand <david@redhat.com>, "Philippe Mathieu-Daudé" <philmd@linaro.org>
There is a newer version of this series
[PATCH v8 2/2] memory: Do not create circular reference with subregion
Posted by Akihiko Odaki 1 year, 1 month ago
memory_region_update_container_subregions() used to call
memory_region_ref(), which creates a reference to the owner of the
subregion, on behalf of the owner of the container. This results in a
circular reference if the subregion and container have the same owner.

memory_region_ref() creates a reference to the owner instead of the
memory region to match the lifetime of the owner and memory region. We
do not need such a hack if the subregion and container have the same
owner because the owner will be alive as long as the container is.
Therefore, create a reference to the subregion itself instead ot its
owner in such a case; the reference to the subregion is still necessary
to ensure that the subregion gets finalized after the container.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 system/memory.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/system/memory.c b/system/memory.c
index 78e17e0efa83..972617b0192d 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -1855,6 +1855,24 @@ void memory_region_unref(MemoryRegion *mr)
     }
 }
 
+static void memory_region_ref_subregion(MemoryRegion *subregion)
+{
+    if (subregion->container->owner == subregion->owner) {
+        object_ref(OBJECT(subregion));
+    } else {
+        memory_region_ref(subregion);
+    }
+}
+
+static void memory_region_unref_subregion(MemoryRegion *subregion)
+{
+    if (subregion->container->owner == subregion->owner) {
+        object_unref(OBJECT(subregion));
+     } else {
+        memory_region_unref(subregion);
+     }
+}
+
 uint64_t memory_region_size(MemoryRegion *mr)
 {
     if (int128_eq(mr->size, int128_2_64())) {
@@ -2644,7 +2662,8 @@ static void memory_region_update_container_subregions(MemoryRegion *subregion)
 
     memory_region_transaction_begin();
 
-    memory_region_ref(subregion);
+    memory_region_ref_subregion(subregion);
+
     QTAILQ_FOREACH(other, &mr->subregions, subregions_link) {
         if (subregion->priority >= other->priority) {
             QTAILQ_INSERT_BEFORE(other, subregion, subregions_link);
@@ -2702,7 +2721,8 @@ void memory_region_del_subregion(MemoryRegion *mr,
         assert(alias->mapped_via_alias >= 0);
     }
     QTAILQ_REMOVE(&mr->subregions, subregion, subregions_link);
-    memory_region_unref(subregion);
+    memory_region_unref_subregion(subregion);
+
     memory_region_update_pending |= mr->enabled && subregion->enabled;
     memory_region_transaction_commit();
 }

-- 
2.47.1
Re: [PATCH v8 2/2] memory: Do not create circular reference with subregion
Posted by Peter Maydell 5 months, 3 weeks ago
On Fri, 10 Jan 2025 at 09:20, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>
> memory_region_update_container_subregions() used to call
> memory_region_ref(), which creates a reference to the owner of the
> subregion, on behalf of the owner of the container. This results in a
> circular reference if the subregion and container have the same owner.
>
> memory_region_ref() creates a reference to the owner instead of the
> memory region to match the lifetime of the owner and memory region. We
> do not need such a hack if the subregion and container have the same
> owner because the owner will be alive as long as the container is.
> Therefore, create a reference to the subregion itself instead ot its
> owner in such a case; the reference to the subregion is still necessary
> to ensure that the subregion gets finalized after the container.
>
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>

Hi; I came back to this patchset because it was never applied,
and so we still have various leaks of MemoryRegion objects
in devices which use the "device owns a container MemoryRegion C
and puts another MemoryRegion X which it owns into that container"
pattern.

Unfortunately, with this patch applied, I see a segfault when running
the device-introspect-test for Arm:

$ (cd build/x86 && QTEST_QEMU_BINARY=./qemu-system-arm
./tests/qtest/device-introspect-test -p
/arm/device/introspect/concrete/defaults/none)
[...]
# Testing device 'imx7.analog'
Broken pipe
../../tests/qtest/libqtest.c:208: kill_qemu() detected QEMU death from
signal 11 (Segmentation fault) (core dumped)

Here's a backtrace collected with gdb:

$ (cd build/x86 && QTEST_QEMU_BINARY="xterm -e gdb --tty $(tty) --args
./qemu-system-arm" ./tests/qtest/device-introspect-test -p
/arm/device/introspect/concrete/defaults/none)
[hit 'r' in the gdb prompt in the new window]

Thread 1 "qemu-system-arm" received signal SIGSEGV, Segmentation fault.
memory_region_unref_subregion (subregion=0x555557d58370) at
../../system/memory.c:1862
1862        if (subregion->container->owner == subregion->owner) {
(gdb) bt
#0  memory_region_unref_subregion (subregion=0x555557d58370) at
../../system/memory.c:1862
#1  0x0000555555d5f97b in memory_region_del_subregion
(mr=0x555557d58150, subregion=0x555557d58370)
    at ../../system/memory.c:2705
#2  0x0000555555d5cc1c in memory_region_finalize (obj=0x555557d58150)
at ../../system/memory.c:1811
#3  0x0000555556197595 in object_deinit (obj=0x555557d58150,
type=0x5555579d2ea0) at ../../qom/object.c:715
#4  0x0000555556197613 in object_finalize (data=0x555557d58150) at
../../qom/object.c:729
#5  0x00005555561988f9 in object_unref (objptr=0x555557d58150) at
../../qom/object.c:1232
#6  0x000055555619a3d6 in object_finalize_child_property
    (obj=0x555557d57e20, name=0x555557c59810 "imx7.analog[0]",
opaque=0x555557d58150) at ../../qom/object.c:1818
#7  0x0000555556197344 in object_property_del_all (obj=0x555557d57e20)
at ../../qom/object.c:667
#8  0x0000555556197600 in object_finalize (data=0x555557d57e20) at
../../qom/object.c:728
#9  0x00005555561988f9 in object_unref (objptr=0x555557d57e20) at
../../qom/object.c:1232
#10 0x00005555562f4600 in qmp_device_list_properties
(typename=0x555557d3ee70 "imx7.analog", errp=0x7fffffffdc00)
    at ../../qom/qom-qmp-cmds.c:237
#11 0x00005555563981ba in qmp_marshal_device_list_properties
(args=0x7fffe00031f0, ret=0x7fffef6afda8, errp=0x7fffef6afda0)
    at qapi/qapi-commands-qdev.c:65
#12 0x00005555563bcb53 in do_qmp_dispatch_bh (opaque=0x7fffef6afe40)
at ../../qapi/qmp-dispatch.c:128
#13 0x00005555563ed87f in aio_bh_call (bh=0x555557d3f540) at
../../util/async.c:172
#14 0x00005555563ed9cb in aio_bh_poll (ctx=0x555557a00770) at
../../util/async.c:219
#15 0x00005555563cd517 in aio_dispatch (ctx=0x555557a00770) at
../../util/aio-posix.c:436
#16 0x00005555563edebe in aio_ctx_dispatch (source=0x555557a00770,
callback=0x0, user_data=0x0) at ../../util/async.c:361
#17 0x00007ffff6e965c5 in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#18 0x00007ffff6e96710 in g_main_context_dispatch () at
/lib/x86_64-linux-gnu/libglib-2.0.so.0
#19 0x00005555563ef89e in glib_pollfds_poll () at ../../util/main-loop.c:287
#20 0x00005555563ef930 in os_host_main_loop_wait (timeout=0) at
../../util/main-loop.c:310
#21 0x00005555563efa6b in main_loop_wait (nonblocking=0) at
../../util/main-loop.c:589
#22 0x0000555555d7d181 in qemu_main_loop () at ../../system/runstate.c:905
#23 0x00005555562f94c7 in qemu_default_main (opaque=0x0) at
../../system/main.c:50
#24 0x00005555562f9585 in main (argc=18, argv=0x7fffffffe078) at
../../system/main.c:93


In memory_region_unref_subregion(), subregion->container is NULL.

This is because in memory_region_del_subregion() we do:

    subregion->container = NULL;

and then after that we call
    memory_region_unref_subregion(subregion);
which dereferences subregion->container.

Won't this always SEGV ?

thanks
-- PMM
Re: [PATCH v8 2/2] memory: Do not create circular reference with subregion
Posted by Peter Maydell 5 months, 3 weeks ago
(updated cc list with Akihiko's new email address)

On Thu, 21 Aug 2025 at 13:40, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Fri, 10 Jan 2025 at 09:20, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> >
> > memory_region_update_container_subregions() used to call
> > memory_region_ref(), which creates a reference to the owner of the
> > subregion, on behalf of the owner of the container. This results in a
> > circular reference if the subregion and container have the same owner.
> >
> > memory_region_ref() creates a reference to the owner instead of the
> > memory region to match the lifetime of the owner and memory region. We
> > do not need such a hack if the subregion and container have the same
> > owner because the owner will be alive as long as the container is.
> > Therefore, create a reference to the subregion itself instead ot its
> > owner in such a case; the reference to the subregion is still necessary
> > to ensure that the subregion gets finalized after the container.
> >
> > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > Reviewed-by: Peter Xu <peterx@redhat.com>
>
> Hi; I came back to this patchset because it was never applied,
> and so we still have various leaks of MemoryRegion objects
> in devices which use the "device owns a container MemoryRegion C
> and puts another MemoryRegion X which it owns into that container"
> pattern.
>
> Unfortunately, with this patch applied, I see a segfault when running
> the device-introspect-test for Arm:
>
> $ (cd build/x86 && QTEST_QEMU_BINARY=./qemu-system-arm
> ./tests/qtest/device-introspect-test -p
> /arm/device/introspect/concrete/defaults/none)
> [...]
> # Testing device 'imx7.analog'
> Broken pipe
> ../../tests/qtest/libqtest.c:208: kill_qemu() detected QEMU death from
> signal 11 (Segmentation fault) (core dumped)
>
> Here's a backtrace collected with gdb:
>
> $ (cd build/x86 && QTEST_QEMU_BINARY="xterm -e gdb --tty $(tty) --args
> ./qemu-system-arm" ./tests/qtest/device-introspect-test -p
> /arm/device/introspect/concrete/defaults/none)
> [hit 'r' in the gdb prompt in the new window]
>
> Thread 1 "qemu-system-arm" received signal SIGSEGV, Segmentation fault.
> memory_region_unref_subregion (subregion=0x555557d58370) at
> ../../system/memory.c:1862
> 1862        if (subregion->container->owner == subregion->owner) {
> (gdb) bt
> #0  memory_region_unref_subregion (subregion=0x555557d58370) at
> ../../system/memory.c:1862
> #1  0x0000555555d5f97b in memory_region_del_subregion
> (mr=0x555557d58150, subregion=0x555557d58370)
>     at ../../system/memory.c:2705
> #2  0x0000555555d5cc1c in memory_region_finalize (obj=0x555557d58150)
> at ../../system/memory.c:1811
> #3  0x0000555556197595 in object_deinit (obj=0x555557d58150,
> type=0x5555579d2ea0) at ../../qom/object.c:715
> #4  0x0000555556197613 in object_finalize (data=0x555557d58150) at
> ../../qom/object.c:729
> #5  0x00005555561988f9 in object_unref (objptr=0x555557d58150) at
> ../../qom/object.c:1232
> #6  0x000055555619a3d6 in object_finalize_child_property
>     (obj=0x555557d57e20, name=0x555557c59810 "imx7.analog[0]",
> opaque=0x555557d58150) at ../../qom/object.c:1818
> #7  0x0000555556197344 in object_property_del_all (obj=0x555557d57e20)
> at ../../qom/object.c:667
> #8  0x0000555556197600 in object_finalize (data=0x555557d57e20) at
> ../../qom/object.c:728
> #9  0x00005555561988f9 in object_unref (objptr=0x555557d57e20) at
> ../../qom/object.c:1232
> #10 0x00005555562f4600 in qmp_device_list_properties
> (typename=0x555557d3ee70 "imx7.analog", errp=0x7fffffffdc00)
>     at ../../qom/qom-qmp-cmds.c:237
> #11 0x00005555563981ba in qmp_marshal_device_list_properties
> (args=0x7fffe00031f0, ret=0x7fffef6afda8, errp=0x7fffef6afda0)
>     at qapi/qapi-commands-qdev.c:65
> #12 0x00005555563bcb53 in do_qmp_dispatch_bh (opaque=0x7fffef6afe40)
> at ../../qapi/qmp-dispatch.c:128
> #13 0x00005555563ed87f in aio_bh_call (bh=0x555557d3f540) at
> ../../util/async.c:172
> #14 0x00005555563ed9cb in aio_bh_poll (ctx=0x555557a00770) at
> ../../util/async.c:219
> #15 0x00005555563cd517 in aio_dispatch (ctx=0x555557a00770) at
> ../../util/aio-posix.c:436
> #16 0x00005555563edebe in aio_ctx_dispatch (source=0x555557a00770,
> callback=0x0, user_data=0x0) at ../../util/async.c:361
> #17 0x00007ffff6e965c5 in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
> #18 0x00007ffff6e96710 in g_main_context_dispatch () at
> /lib/x86_64-linux-gnu/libglib-2.0.so.0
> #19 0x00005555563ef89e in glib_pollfds_poll () at ../../util/main-loop.c:287
> #20 0x00005555563ef930 in os_host_main_loop_wait (timeout=0) at
> ../../util/main-loop.c:310
> #21 0x00005555563efa6b in main_loop_wait (nonblocking=0) at
> ../../util/main-loop.c:589
> #22 0x0000555555d7d181 in qemu_main_loop () at ../../system/runstate.c:905
> #23 0x00005555562f94c7 in qemu_default_main (opaque=0x0) at
> ../../system/main.c:50
> #24 0x00005555562f9585 in main (argc=18, argv=0x7fffffffe078) at
> ../../system/main.c:93
>
>
> In memory_region_unref_subregion(), subregion->container is NULL.
>
> This is because in memory_region_del_subregion() we do:
>
>     subregion->container = NULL;
>
> and then after that we call
>     memory_region_unref_subregion(subregion);
> which dereferences subregion->container.
>
> Won't this always SEGV ?
>
> thanks
> -- PMM
Re: [PATCH v8 2/2] memory: Do not create circular reference with subregion
Posted by Peter Xu 5 months, 3 weeks ago
On Thu, Aug 21, 2025 at 01:45:20PM +0100, Peter Maydell wrote:
> (updated cc list with Akihiko's new email address)
> 
> On Thu, 21 Aug 2025 at 13:40, Peter Maydell <peter.maydell@linaro.org> wrote:
> >
> > On Fri, 10 Jan 2025 at 09:20, Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> > >
> > > memory_region_update_container_subregions() used to call
> > > memory_region_ref(), which creates a reference to the owner of the
> > > subregion, on behalf of the owner of the container. This results in a
> > > circular reference if the subregion and container have the same owner.
> > >
> > > memory_region_ref() creates a reference to the owner instead of the
> > > memory region to match the lifetime of the owner and memory region. We
> > > do not need such a hack if the subregion and container have the same
> > > owner because the owner will be alive as long as the container is.
> > > Therefore, create a reference to the subregion itself instead ot its
> > > owner in such a case; the reference to the subregion is still necessary
> > > to ensure that the subregion gets finalized after the container.
> > >
> > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > Reviewed-by: Peter Xu <peterx@redhat.com>
> >
> > Hi; I came back to this patchset because it was never applied,
> > and so we still have various leaks of MemoryRegion objects
> > in devices which use the "device owns a container MemoryRegion C
> > and puts another MemoryRegion X which it owns into that container"
> > pattern.
> >
> > Unfortunately, with this patch applied, I see a segfault when running
> > the device-introspect-test for Arm:
> >
> > $ (cd build/x86 && QTEST_QEMU_BINARY=./qemu-system-arm
> > ./tests/qtest/device-introspect-test -p
> > /arm/device/introspect/concrete/defaults/none)
> > [...]
> > # Testing device 'imx7.analog'
> > Broken pipe
> > ../../tests/qtest/libqtest.c:208: kill_qemu() detected QEMU death from
> > signal 11 (Segmentation fault) (core dumped)
> >
> > Here's a backtrace collected with gdb:
> >
> > $ (cd build/x86 && QTEST_QEMU_BINARY="xterm -e gdb --tty $(tty) --args
> > ./qemu-system-arm" ./tests/qtest/device-introspect-test -p
> > /arm/device/introspect/concrete/defaults/none)
> > [hit 'r' in the gdb prompt in the new window]
> >
> > Thread 1 "qemu-system-arm" received signal SIGSEGV, Segmentation fault.
> > memory_region_unref_subregion (subregion=0x555557d58370) at
> > ../../system/memory.c:1862
> > 1862        if (subregion->container->owner == subregion->owner) {
> > (gdb) bt
> > #0  memory_region_unref_subregion (subregion=0x555557d58370) at
> > ../../system/memory.c:1862
> > #1  0x0000555555d5f97b in memory_region_del_subregion
> > (mr=0x555557d58150, subregion=0x555557d58370)
> >     at ../../system/memory.c:2705
> > #2  0x0000555555d5cc1c in memory_region_finalize (obj=0x555557d58150)
> > at ../../system/memory.c:1811
> > #3  0x0000555556197595 in object_deinit (obj=0x555557d58150,
> > type=0x5555579d2ea0) at ../../qom/object.c:715
> > #4  0x0000555556197613 in object_finalize (data=0x555557d58150) at
> > ../../qom/object.c:729
> > #5  0x00005555561988f9 in object_unref (objptr=0x555557d58150) at
> > ../../qom/object.c:1232
> > #6  0x000055555619a3d6 in object_finalize_child_property
> >     (obj=0x555557d57e20, name=0x555557c59810 "imx7.analog[0]",
> > opaque=0x555557d58150) at ../../qom/object.c:1818
> > #7  0x0000555556197344 in object_property_del_all (obj=0x555557d57e20)
> > at ../../qom/object.c:667
> > #8  0x0000555556197600 in object_finalize (data=0x555557d57e20) at
> > ../../qom/object.c:728
> > #9  0x00005555561988f9 in object_unref (objptr=0x555557d57e20) at
> > ../../qom/object.c:1232
> > #10 0x00005555562f4600 in qmp_device_list_properties
> > (typename=0x555557d3ee70 "imx7.analog", errp=0x7fffffffdc00)
> >     at ../../qom/qom-qmp-cmds.c:237
> > #11 0x00005555563981ba in qmp_marshal_device_list_properties
> > (args=0x7fffe00031f0, ret=0x7fffef6afda8, errp=0x7fffef6afda0)
> >     at qapi/qapi-commands-qdev.c:65
> > #12 0x00005555563bcb53 in do_qmp_dispatch_bh (opaque=0x7fffef6afe40)
> > at ../../qapi/qmp-dispatch.c:128
> > #13 0x00005555563ed87f in aio_bh_call (bh=0x555557d3f540) at
> > ../../util/async.c:172
> > #14 0x00005555563ed9cb in aio_bh_poll (ctx=0x555557a00770) at
> > ../../util/async.c:219
> > #15 0x00005555563cd517 in aio_dispatch (ctx=0x555557a00770) at
> > ../../util/aio-posix.c:436
> > #16 0x00005555563edebe in aio_ctx_dispatch (source=0x555557a00770,
> > callback=0x0, user_data=0x0) at ../../util/async.c:361
> > #17 0x00007ffff6e965c5 in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
> > #18 0x00007ffff6e96710 in g_main_context_dispatch () at
> > /lib/x86_64-linux-gnu/libglib-2.0.so.0
> > #19 0x00005555563ef89e in glib_pollfds_poll () at ../../util/main-loop.c:287
> > #20 0x00005555563ef930 in os_host_main_loop_wait (timeout=0) at
> > ../../util/main-loop.c:310
> > #21 0x00005555563efa6b in main_loop_wait (nonblocking=0) at
> > ../../util/main-loop.c:589
> > #22 0x0000555555d7d181 in qemu_main_loop () at ../../system/runstate.c:905
> > #23 0x00005555562f94c7 in qemu_default_main (opaque=0x0) at
> > ../../system/main.c:50
> > #24 0x00005555562f9585 in main (argc=18, argv=0x7fffffffe078) at
> > ../../system/main.c:93
> >
> >
> > In memory_region_unref_subregion(), subregion->container is NULL.
> >
> > This is because in memory_region_del_subregion() we do:
> >
> >     subregion->container = NULL;
> >
> > and then after that we call
> >     memory_region_unref_subregion(subregion);
> > which dereferences subregion->container.
> >
> > Won't this always SEGV ?
> >
> > thanks
> > -- PMM
> 

Peter, could you try the v3 version patch 8/9 instead?

https://lore.kernel.org/all/20240708-san-v3-8-b03f671c40c6@daynix.com/

I still prefer that one, and I hope that one doesn't have this issue.

If no one objects, IMHO we could merge v3's patch 8/9, and v8's patch 1 (I
can touch up some commit messages and details, e.g. v3's 8/9 commit
message is a bit too simple to me).

Thanks,

-- 
Peter Xu
Re: [PATCH v8 2/2] memory: Do not create circular reference with subregion
Posted by Peter Maydell 5 months, 3 weeks ago
On Thu, 21 Aug 2025 at 15:28, Peter Xu <peterx@redhat.com> wrote:
> > On Thu, 21 Aug 2025 at 13:40, Peter Maydell <peter.maydell@linaro.org> wrote:
> > > In memory_region_unref_subregion(), subregion->container is NULL.
> > >
> > > This is because in memory_region_del_subregion() we do:
> > >
> > >     subregion->container = NULL;
> > >
> > > and then after that we call
> > >     memory_region_unref_subregion(subregion);
> > > which dereferences subregion->container.
> > >
> > > Won't this always SEGV ?

> Peter, could you try the v3 version patch 8/9 instead?
>
> https://lore.kernel.org/all/20240708-san-v3-8-b03f671c40c6@daynix.com/
>
> I still prefer that one, and I hope that one doesn't have this issue.

That one fails like this:
qemu-system-arm: ../../system/memory.c:1799: memory_region_finalize:
Assertion `!mr->container' failed.

See the discussion on v2 (which was the same for this patch):
https://lore.kernel.org/all/CAFEAcA9KTSjwF1rABpM5nv9UFuKqZZk6+Qo4PEF4+rTirNi5fQ@mail.gmail.com/

thanks
-- PMM
Re: [PATCH v8 2/2] memory: Do not create circular reference with subregion
Posted by Peter Xu 5 months, 3 weeks ago
On Thu, Aug 21, 2025 at 03:47:06PM +0100, Peter Maydell wrote:
> On Thu, 21 Aug 2025 at 15:28, Peter Xu <peterx@redhat.com> wrote:
> > > On Thu, 21 Aug 2025 at 13:40, Peter Maydell <peter.maydell@linaro.org> wrote:
> > > > In memory_region_unref_subregion(), subregion->container is NULL.
> > > >
> > > > This is because in memory_region_del_subregion() we do:
> > > >
> > > >     subregion->container = NULL;
> > > >
> > > > and then after that we call
> > > >     memory_region_unref_subregion(subregion);
> > > > which dereferences subregion->container.
> > > >
> > > > Won't this always SEGV ?
> 
> > Peter, could you try the v3 version patch 8/9 instead?
> >
> > https://lore.kernel.org/all/20240708-san-v3-8-b03f671c40c6@daynix.com/
> >
> > I still prefer that one, and I hope that one doesn't have this issue.
> 
> That one fails like this:
> qemu-system-arm: ../../system/memory.c:1799: memory_region_finalize:
> Assertion `!mr->container' failed.
> 
> See the discussion on v2 (which was the same for this patch):
> https://lore.kernel.org/all/CAFEAcA9KTSjwF1rABpM5nv9UFuKqZZk6+Qo4PEF4+rTirNi5fQ@mail.gmail.com/

My apologies, it was a long discussion and I totally forgot that..

I remember I provided a draft somewhere during the discussion, anyway.. I
redid it, and attached a formal patch below that will contain the removal
of the mr->container check (plus auto-detach when it happens).  The hope is
this should work.. and comparing to the v8 of Akihiko's, it won't make MR's
own refcount any more complicated.

If necessary, I can send a formal patch.

Thanks,

===8<===
From f985c54af3e276b175bcb02725a5fb996c3f5fe5 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Thu, 21 Aug 2025 12:59:02 -0400
Subject: [PATCH] memory: Fix leaks due to owner-shared MRs circular references

Currently, QEMU refcounts the MR by always taking it from the owner.

It should be non-issue if all the owners of MRs will properly detach all
the MRs from their parents by memory_region_del_subregion() when the owner
will be freed.  However, it turns out many of the device emulations do not
do that properly.  It might be a challenge to fix all of such occurances.

Without fixing it, QEMU faces circular reference issue when an owner can
contain more than one MRs, then when they are linked within each other in
form of subregions, those links keep the owner alive forever, causing
memory leaks even if all the external refcounts are released for the owner.

To fix that, teach memory API to stop refcount on MRs that share the same
owner because if they share the lifecycle of the owner, then they share the
same lifecycle between themselves.

Meanwhile, allow auto-detachments of MRs during finalize() of MRs even
against its container, as long as they belong to the same owner.

The latter is needed because now it's possible to have MR finalize() happen
in any order when they exactly share the same lifespan.  In this case, we
should allow finalize() to happen in any order of either the parent or
child MR, however it should be guaranteed that the mr->container shares the
same owner of this MR to be finalized.

Proper document this behavior in code.

This patch is heavily based on the work done by Akihiko Odaki:

https://lore.kernel.org/r/CAFEAcA8DV40fGsci76r4yeP1P-SP_QjNRDD2OzPxjx5wRs0GEg@mail.gmail.com

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 docs/devel/memory.rst |  9 +++++++--
 system/memory.c       | 45 ++++++++++++++++++++++++++++++++++---------
 2 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/docs/devel/memory.rst b/docs/devel/memory.rst
index 57fb2aec76..1367c7caf7 100644
--- a/docs/devel/memory.rst
+++ b/docs/devel/memory.rst
@@ -158,8 +158,13 @@ ioeventfd) can be changed during the region lifecycle.  They take effect
 as soon as the region is made visible.  This can be immediately, later,
 or never.
 
-Destruction of a memory region happens automatically when the owner
-object dies.
+Destruction of a memory region happens automatically when the owner object
+dies.  Note that the MRs under the same owner can be destroyed in any order
+when the owner object dies.  It's because the MRs will share the same
+lifespan of the owner, no matter if its a container or child MR.  It's
+suggested to always cleanly detach the MRs under the owner object when the
+owner object is going to be destroyed, however it is not required, as the
+memory core will automatically detach the links when MRs are destroyed.
 
 If however the memory region is part of a dynamically allocated data
 structure, you should call object_unparent() to destroy the memory region
diff --git a/system/memory.c b/system/memory.c
index 5646547940..d7f6ad9be2 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -1796,16 +1796,36 @@ static void memory_region_finalize(Object *obj)
 {
     MemoryRegion *mr = MEMORY_REGION(obj);
 
-    assert(!mr->container);
-
-    /* We know the region is not visible in any address space (it
-     * does not have a container and cannot be a root either because
-     * it has no references, so we can blindly clear mr->enabled.
-     * memory_region_set_enabled instead could trigger a transaction
-     * and cause an infinite loop.
+    /*
+     * Each memory region (that can be dynamically freed..) must has an
+     * owner, and it always has the same lifecycle of its owner.  It means
+     * when reaching here, the memory region's owner refcount is zero.
+     *
+     * Here it is possible that the MR has:
+     *
+     * (1) mr->container set, which means this MR can be a subregion of a
+     *     container MR, in this case it must share the same owner
+     *     (otherwise the container should have kept a refcount of this
+     *     MR's owner), or,
+     *
+     * (2) mr->subregions non-empty, which means this MR can be a container
+     *     of other MRs (share the owner or not).
+     *
+     * We know the MR, or any MR that is attached to this one as either
+     * container or children, is not visible in any address space, because
+     * otherwise the address space should have taken at least one refcount
+     * of this MR's owner.  So we can blindly clear mr->enabled.
+     *
+     * memory_region_set_enabled instead could trigger a transaction and
+     * cause an infinite loop.
      */
     mr->enabled = false;
     memory_region_transaction_begin();
+    if (mr->container) {
+        /* Must share the owner; see above comments */
+        assert(mr->container->owner == mr->owner);
+        memory_region_del_subregion(mr->container, mr);
+    }
     while (!QTAILQ_EMPTY(&mr->subregions)) {
         MemoryRegion *subregion = QTAILQ_FIRST(&mr->subregions);
         memory_region_del_subregion(mr, subregion);
@@ -2625,7 +2645,10 @@ static void memory_region_update_container_subregions(MemoryRegion *subregion)
 
     memory_region_transaction_begin();
 
-    memory_region_ref(subregion);
+    if (mr->owner != subregion->owner) {
+        memory_region_ref(subregion);
+    }
+
     QTAILQ_FOREACH(other, &mr->subregions, subregions_link) {
         if (subregion->priority >= other->priority) {
             QTAILQ_INSERT_BEFORE(other, subregion, subregions_link);
@@ -2683,7 +2706,11 @@ void memory_region_del_subregion(MemoryRegion *mr,
         assert(alias->mapped_via_alias >= 0);
     }
     QTAILQ_REMOVE(&mr->subregions, subregion, subregions_link);
-    memory_region_unref(subregion);
+
+    if (mr->owner != subregion->owner) {
+        memory_region_unref(subregion);
+    }
+
     memory_region_update_pending |= mr->enabled && subregion->enabled;
     memory_region_transaction_commit();
 }
-- 
2.50.1


-- 
Peter Xu
Re: [PATCH v8 2/2] memory: Do not create circular reference with subregion
Posted by Peter Maydell 5 months, 2 weeks ago
On Thu, 21 Aug 2025 at 19:18, Peter Xu <peterx@redhat.com> wrote:
> I remember I provided a draft somewhere during the discussion, anyway.. I
> redid it, and attached a formal patch below that will contain the removal
> of the mr->container check (plus auto-detach when it happens).  The hope is
> this should work.. and comparing to the v8 of Akihiko's, it won't make MR's
> own refcount any more complicated.
>
> If necessary, I can send a formal patch.

This patch seems to work, in that it fixes the memory-region
related leaks I previously was seeing. Review comments below
(which are only about the commit message and docs).

I also can't remember the details of the previous discussions about
these patches, so I'm reviewing the one below essentially from
scratch. Apologies in advance if we end up going back around
a conversation loop that we've already had...

> Thanks,
>
> ===8<===
> From f985c54af3e276b175bcb02725a5fb996c3f5fe5 Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Thu, 21 Aug 2025 12:59:02 -0400
> Subject: [PATCH] memory: Fix leaks due to owner-shared MRs circular references
>
> Currently, QEMU refcounts the MR by always taking it from the owner.
>
> It should be non-issue if all the owners of MRs will properly detach all
> the MRs from their parents by memory_region_del_subregion() when the owner
> will be freed.  However, it turns out many of the device emulations do not
> do that properly.  It might be a challenge to fix all of such occurances.

I think that it's not really right to cast this as "some devices
don't do this right". The pattern of "a device has a container MR C
and some other MRs A, B... which it puts into C" is a legitimate one.
If you do this then (with the current code in QEMU) there is *no*
place where a device emulation can do a manual "remove A, B.. from
the container C so the refcounts go down": the place where devices
undo what they did in instance_init is instance_deinit, but we
will never call instance_deinit because the refcount of the owning
device never goes to 0.

Further, even if we had some place where devices could somehow
undo the "put A, B... in the container so the refcounts go down
correctly", it is better API design to have the memory.c code
automatically handle this situation. "This just works" is more
reliable than "this works if you do cleanup step X", because it's
impossible to have the bug where you forget to write the code to
do the cleanup step.

> Without fixing it, QEMU faces circular reference issue when an owner can
> contain more than one MRs, then when they are linked within each other in
> form of subregions, those links keep the owner alive forever, causing
> memory leaks even if all the external refcounts are released for the owner.
>
> To fix that, teach memory API to stop refcount on MRs that share the same
> owner because if they share the lifecycle of the owner, then they share the
> same lifecycle between themselves.
>
> Meanwhile, allow auto-detachments of MRs during finalize() of MRs even
> against its container, as long as they belong to the same owner.
>
> The latter is needed because now it's possible to have MR finalize() happen
> in any order when they exactly share the same lifespan.  In this case, we
> should allow finalize() to happen in any order of either the parent or
> child MR, however it should be guaranteed that the mr->container shares the
> same owner of this MR to be finalized.
>
> Proper document this behavior in code.
>
> This patch is heavily based on the work done by Akihiko Odaki:
>
> https://lore.kernel.org/r/CAFEAcA8DV40fGsci76r4yeP1P-SP_QjNRDD2OzPxjx5wRs0GEg@mail.gmail.com
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  docs/devel/memory.rst |  9 +++++++--
>  system/memory.c       | 45 ++++++++++++++++++++++++++++++++++---------
>  2 files changed, 43 insertions(+), 11 deletions(-)
>
> diff --git a/docs/devel/memory.rst b/docs/devel/memory.rst
> index 57fb2aec76..1367c7caf7 100644
> --- a/docs/devel/memory.rst
> +++ b/docs/devel/memory.rst
> @@ -158,8 +158,13 @@ ioeventfd) can be changed during the region lifecycle.  They take effect
>  as soon as the region is made visible.  This can be immediately, later,
>  or never.
>
> -Destruction of a memory region happens automatically when the owner
> -object dies.
> +Destruction of a memory region happens automatically when the owner object
> +dies.  Note that the MRs under the same owner can be destroyed in any order
> +when the owner object dies.  It's because the MRs will share the same
> +lifespan of the owner, no matter if its a container or child MR.  It's
> +suggested to always cleanly detach the MRs under the owner object when the
> +owner object is going to be destroyed, however it is not required, as the
> +memory core will automatically detach the links when MRs are destroyed.

I think we should not say "we suggest you always cleanly detach the MRs":
instead we should say "you can rely on this happening, so you don't
need to write manual code to do it".

The actual code changes look OK to me.

thanks
-- PMM
Re: [PATCH v8 2/2] memory: Do not create circular reference with subregion
Posted by Peter Xu 5 months, 2 weeks ago
On Tue, Aug 26, 2025 at 06:45:43PM +0100, Peter Maydell wrote:
> On Thu, 21 Aug 2025 at 19:18, Peter Xu <peterx@redhat.com> wrote:
> > I remember I provided a draft somewhere during the discussion, anyway.. I
> > redid it, and attached a formal patch below that will contain the removal
> > of the mr->container check (plus auto-detach when it happens).  The hope is
> > this should work.. and comparing to the v8 of Akihiko's, it won't make MR's
> > own refcount any more complicated.
> >
> > If necessary, I can send a formal patch.
> 
> This patch seems to work, in that it fixes the memory-region
> related leaks I previously was seeing. Review comments below
> (which are only about the commit message and docs).
> 
> I also can't remember the details of the previous discussions about
> these patches, so I'm reviewing the one below essentially from
> scratch. Apologies in advance if we end up going back around
> a conversation loop that we've already had...
> 
> > Thanks,
> >
> > ===8<===
> > From f985c54af3e276b175bcb02725a5fb996c3f5fe5 Mon Sep 17 00:00:00 2001
> > From: Peter Xu <peterx@redhat.com>
> > Date: Thu, 21 Aug 2025 12:59:02 -0400
> > Subject: [PATCH] memory: Fix leaks due to owner-shared MRs circular references
> >
> > Currently, QEMU refcounts the MR by always taking it from the owner.
> >
> > It should be non-issue if all the owners of MRs will properly detach all
> > the MRs from their parents by memory_region_del_subregion() when the owner
> > will be freed.  However, it turns out many of the device emulations do not
> > do that properly.  It might be a challenge to fix all of such occurances.
> 
> I think that it's not really right to cast this as "some devices
> don't do this right". The pattern of "a device has a container MR C
> and some other MRs A, B... which it puts into C" is a legitimate one.
> If you do this then (with the current code in QEMU) there is *no*
> place where a device emulation can do a manual "remove A, B.. from
> the container C so the refcounts go down": the place where devices
> undo what they did in instance_init is instance_deinit, but we
> will never call instance_deinit because the refcount of the owning
> device never goes to 0.

It should still be able to reach zero if we skip the refcount of internal
MR subregions like what this patch does.

Said that, I think you're right..  the problem is we have object_deinit()
after object_property_del_all() (in object_finalize()), which means
memory_region_finalize() will be invoked before object_deinit()..  Then it
looks wrong now to delete subregions in a deinit() even if reachable,
because the MRs should have been freed..

> 
> Further, even if we had some place where devices could somehow
> undo the "put A, B... in the container so the refcounts go down
> correctly", it is better API design to have the memory.c code
> automatically handle this situation. "This just works" is more
> reliable than "this works if you do cleanup step X", because it's
> impossible to have the bug where you forget to write the code to
> do the cleanup step.

Fair enough.

> 
> > Without fixing it, QEMU faces circular reference issue when an owner can
> > contain more than one MRs, then when they are linked within each other in
> > form of subregions, those links keep the owner alive forever, causing
> > memory leaks even if all the external refcounts are released for the owner.
> >
> > To fix that, teach memory API to stop refcount on MRs that share the same
> > owner because if they share the lifecycle of the owner, then they share the
> > same lifecycle between themselves.
> >
> > Meanwhile, allow auto-detachments of MRs during finalize() of MRs even
> > against its container, as long as they belong to the same owner.
> >
> > The latter is needed because now it's possible to have MR finalize() happen
> > in any order when they exactly share the same lifespan.  In this case, we
> > should allow finalize() to happen in any order of either the parent or
> > child MR, however it should be guaranteed that the mr->container shares the
> > same owner of this MR to be finalized.
> >
> > Proper document this behavior in code.
> >
> > This patch is heavily based on the work done by Akihiko Odaki:
> >
> > https://lore.kernel.org/r/CAFEAcA8DV40fGsci76r4yeP1P-SP_QjNRDD2OzPxjx5wRs0GEg@mail.gmail.com
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  docs/devel/memory.rst |  9 +++++++--
> >  system/memory.c       | 45 ++++++++++++++++++++++++++++++++++---------
> >  2 files changed, 43 insertions(+), 11 deletions(-)
> >
> > diff --git a/docs/devel/memory.rst b/docs/devel/memory.rst
> > index 57fb2aec76..1367c7caf7 100644
> > --- a/docs/devel/memory.rst
> > +++ b/docs/devel/memory.rst
> > @@ -158,8 +158,13 @@ ioeventfd) can be changed during the region lifecycle.  They take effect
> >  as soon as the region is made visible.  This can be immediately, later,
> >  or never.
> >
> > -Destruction of a memory region happens automatically when the owner
> > -object dies.
> > +Destruction of a memory region happens automatically when the owner object
> > +dies.  Note that the MRs under the same owner can be destroyed in any order
> > +when the owner object dies.  It's because the MRs will share the same
> > +lifespan of the owner, no matter if its a container or child MR.  It's
> > +suggested to always cleanly detach the MRs under the owner object when the
> > +owner object is going to be destroyed, however it is not required, as the
> > +memory core will automatically detach the links when MRs are destroyed.
> 
> I think we should not say "we suggest you always cleanly detach the MRs":
> instead we should say "you can rely on this happening, so you don't
> need to write manual code to do it".
> 
> The actual code changes look OK to me.

I'll send the patch out officially for review, with above point taken.

Akihiko, let me know anytime when you want to either add your SoB or take
over the ownership of the patch.  I'll be OK with it.

Thanks,

-- 
Peter Xu
Re: [PATCH v8 2/2] memory: Do not create circular reference with subregion
Posted by Akihiko Odaki 5 months, 2 weeks ago
On 2025/08/27 6:57, Peter Xu wrote:
> On Tue, Aug 26, 2025 at 06:45:43PM +0100, Peter Maydell wrote:
>> On Thu, 21 Aug 2025 at 19:18, Peter Xu <peterx@redhat.com> wrote:
>>> I remember I provided a draft somewhere during the discussion, anyway.. I
>>> redid it, and attached a formal patch below that will contain the removal
>>> of the mr->container check (plus auto-detach when it happens).  The hope is
>>> this should work.. and comparing to the v8 of Akihiko's, it won't make MR's
>>> own refcount any more complicated.
>>>
>>> If necessary, I can send a formal patch.
>>
>> This patch seems to work, in that it fixes the memory-region
>> related leaks I previously was seeing. Review comments below
>> (which are only about the commit message and docs).
>>
>> I also can't remember the details of the previous discussions about
>> these patches, so I'm reviewing the one below essentially from
>> scratch. Apologies in advance if we end up going back around
>> a conversation loop that we've already had...
>>
>>> Thanks,
>>>
>>> ===8<===
>>>  From f985c54af3e276b175bcb02725a5fb996c3f5fe5 Mon Sep 17 00:00:00 2001
>>> From: Peter Xu <peterx@redhat.com>
>>> Date: Thu, 21 Aug 2025 12:59:02 -0400
>>> Subject: [PATCH] memory: Fix leaks due to owner-shared MRs circular references
>>>
>>> Currently, QEMU refcounts the MR by always taking it from the owner.
>>>
>>> It should be non-issue if all the owners of MRs will properly detach all
>>> the MRs from their parents by memory_region_del_subregion() when the owner
>>> will be freed.  However, it turns out many of the device emulations do not
>>> do that properly.  It might be a challenge to fix all of such occurances.
>>
>> I think that it's not really right to cast this as "some devices
>> don't do this right". The pattern of "a device has a container MR C
>> and some other MRs A, B... which it puts into C" is a legitimate one.
>> If you do this then (with the current code in QEMU) there is *no*
>> place where a device emulation can do a manual "remove A, B.. from
>> the container C so the refcounts go down": the place where devices
>> undo what they did in instance_init is instance_deinit, but we
>> will never call instance_deinit because the refcount of the owning
>> device never goes to 0.
> 
> It should still be able to reach zero if we skip the refcount of internal
> MR subregions like what this patch does.
> 
> Said that, I think you're right..  the problem is we have object_deinit()
> after object_property_del_all() (in object_finalize()), which means
> memory_region_finalize() will be invoked before object_deinit()..  Then it
> looks wrong now to delete subregions in a deinit() even if reachable,
> because the MRs should have been freed..
> 
>>
>> Further, even if we had some place where devices could somehow
>> undo the "put A, B... in the container so the refcounts go down
>> correctly", it is better API design to have the memory.c code
>> automatically handle this situation. "This just works" is more
>> reliable than "this works if you do cleanup step X", because it's
>> impossible to have the bug where you forget to write the code to
>> do the cleanup step.
> 
> Fair enough.
> 
>>
>>> Without fixing it, QEMU faces circular reference issue when an owner can
>>> contain more than one MRs, then when they are linked within each other in
>>> form of subregions, those links keep the owner alive forever, causing
>>> memory leaks even if all the external refcounts are released for the owner.
>>>
>>> To fix that, teach memory API to stop refcount on MRs that share the same
>>> owner because if they share the lifecycle of the owner, then they share the
>>> same lifecycle between themselves.
>>>
>>> Meanwhile, allow auto-detachments of MRs during finalize() of MRs even
>>> against its container, as long as they belong to the same owner.
>>>
>>> The latter is needed because now it's possible to have MR finalize() happen
>>> in any order when they exactly share the same lifespan.  In this case, we
>>> should allow finalize() to happen in any order of either the parent or
>>> child MR, however it should be guaranteed that the mr->container shares the
>>> same owner of this MR to be finalized.
>>>
>>> Proper document this behavior in code.
>>>
>>> This patch is heavily based on the work done by Akihiko Odaki:
>>>
>>> https://lore.kernel.org/r/CAFEAcA8DV40fGsci76r4yeP1P-SP_QjNRDD2OzPxjx5wRs0GEg@mail.gmail.com
>>>
>>> Signed-off-by: Peter Xu <peterx@redhat.com>
>>> ---
>>>   docs/devel/memory.rst |  9 +++++++--
>>>   system/memory.c       | 45 ++++++++++++++++++++++++++++++++++---------
>>>   2 files changed, 43 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/docs/devel/memory.rst b/docs/devel/memory.rst
>>> index 57fb2aec76..1367c7caf7 100644
>>> --- a/docs/devel/memory.rst
>>> +++ b/docs/devel/memory.rst
>>> @@ -158,8 +158,13 @@ ioeventfd) can be changed during the region lifecycle.  They take effect
>>>   as soon as the region is made visible.  This can be immediately, later,
>>>   or never.
>>>
>>> -Destruction of a memory region happens automatically when the owner
>>> -object dies.
>>> +Destruction of a memory region happens automatically when the owner object
>>> +dies.  Note that the MRs under the same owner can be destroyed in any order
>>> +when the owner object dies.  It's because the MRs will share the same
>>> +lifespan of the owner, no matter if its a container or child MR.  It's
>>> +suggested to always cleanly detach the MRs under the owner object when the
>>> +owner object is going to be destroyed, however it is not required, as the
>>> +memory core will automatically detach the links when MRs are destroyed.
>>
>> I think we should not say "we suggest you always cleanly detach the MRs":
>> instead we should say "you can rely on this happening, so you don't
>> need to write manual code to do it".
>>
>> The actual code changes look OK to me.
> 
> I'll send the patch out officially for review, with above point taken.
> 
> Akihiko, let me know anytime when you want to either add your SoB or take
> over the ownership of the patch.  I'll be OK with it.

I'm not in favor of the change. There was a long discussion in the past, 
but I think the following email is the most comprehensive description of 
my point and includes comparison between the two approaches:
https://lore.kernel.org/qemu-devel/2fe8b128-dda1-40af-89ac-e86ba53138f5@daynix.com/

So please check the email and reply it.

Regards,
Akihiko Odaki
Re: [PATCH v8 2/2] memory: Do not create circular reference with subregion
Posted by Peter Maydell 5 months, 2 weeks ago
On Thu, 28 Aug 2025 at 02:13, Akihiko Odaki
<odaki@rsg.ci.i.u-tokyo.ac.jp> wrote:
> I'm not in favor of the change. There was a long discussion in the past,
> but I think the following email is the most comprehensive description of
> my point and includes comparison between the two approaches:
> https://lore.kernel.org/qemu-devel/2fe8b128-dda1-40af-89ac-e86ba53138f5@daynix.com/
>
> So please check the email and reply it.

Do you have a version of your patch which works (i.e. doesn't
segfault) ?

-- PMM
Re: [PATCH v8 2/2] memory: Do not create circular reference with subregion
Posted by Akihiko Odaki 5 months, 2 weeks ago
On 2025/08/28 18:54, Peter Maydell wrote:
> On Thu, 28 Aug 2025 at 02:13, Akihiko Odaki
> <odaki@rsg.ci.i.u-tokyo.ac.jp> wrote:
>> I'm not in favor of the change. There was a long discussion in the past,
>> but I think the following email is the most comprehensive description of
>> my point and includes comparison between the two approaches:
>> https://lore.kernel.org/qemu-devel/2fe8b128-dda1-40af-89ac-e86ba53138f5@daynix.com/
>>
>> So please check the email and reply it.
> 
> Do you have a version of your patch which works (i.e. doesn't
> segfault) ?

Sorry, I missed the context. And indeed v7 and v8 do not work; It seems 
I failed to test v7.

I posted v9, which fixes the NULL dereference.

Regards,
Akihiko Odaki