When a vCPU is created, it typically calls cpu_address_space_init()
one or more times to set up its address spaces. We don't currently
do anything to destroy these address spaces, which means that we
will leak them on a vcpu hot-plug -> hot-unplug cycle.

This patchset fixes the leak by replacing the current
cpu_address_space_destroy() (which has an awkward API, includes
a bug, and is never called from anywhere) with a new
cpu_destroy_address_spaces() which cleans up all the ASes a CPU
has and is called from generic unrealize code.

Patch 1 is just a comment improvement to clarify that
address_space_destroy() defers most of its work to RCU and you
can't free the memory for the AS struct itself until it's done.

Patch 2 is from Peter Xu; we need to be able to do "destroy and
free an AS" via RCU, and at the moment you can't do that.

Patch 3 is the bugfix proper.

thanks
-- PMM

Peter Maydell (2):
  include/system/memory.h: Clarify address_space_destroy() behaviour
  physmem: Destroy all CPU AddressSpaces on unrealize

Peter Xu (1):
  memory: New AS helper to serialize destroy+free

 include/exec/cpu-common.h          | 10 ++++-----
 include/hw/core/cpu.h              |  1 -
 include/system/memory.h            | 24 ++++++++++++++++++---
 hw/core/cpu-common.c               |  1 +
 stubs/cpu-destroy-address-spaces.c | 15 +++++++++++++
 system/memory.c                    | 20 +++++++++++++++++-
 system/physmem.c                   | 34 ++++++++++++++----------------
 stubs/meson.build                  |  1 +
 8 files changed, 78 insertions(+), 28 deletions(-)
 create mode 100644 stubs/cpu-destroy-address-spaces.c

--
2.43.0
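To make the shape of the fix concrete, a cleanup helper along the lines
below is one way a cpu_destroy_address_spaces() could look. This is a rough
sketch only, not the actual patch: it assumes the existing cpu->cpu_ases /
cpu->num_ases bookkeeping in system/physmem.c, and it glosses over
unregistering the TCG listener and over freeing the AddressSpace structs
themselves (which is exactly what patch 2's RCU destroy+free helper is
needed for).

/*
 * Rough sketch only (not the actual patch): tear down every AS that
 * cpu_address_space_init() created for this CPU, called from unrealize.
 */
void cpu_destroy_address_spaces(CPUState *cpu)
{
    int i;

    if (!cpu->cpu_ases) {
        return;
    }
    for (i = 0; i < cpu->num_ases; i++) {
        AddressSpace *as = cpu->cpu_ases[i].as;

        if (as) {
            /*
             * address_space_destroy() only starts the teardown; the final
             * cleanup runs from an RCU callback, so the AddressSpace
             * memory must stay valid until then.  The real series would
             * use the new destroy+free helper from patch 2 here.
             */
            address_space_destroy(as);
        }
    }
    g_free(cpu->cpu_ases);
    cpu->cpu_ases = NULL;
    cpu->num_ases = 0;
}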
On 9/29/25 07:42, Peter Maydell wrote:
> When a vCPU is created, it typically calls cpu_address_space_init()
> one or more times to set up its address spaces. We don't currently
> do anything to destroy these address spaces, which means that we
> will leak them on a vcpu hot-plug -> hot-unplug cycle.
>
> This patchset fixes the leak by replacing the current
> cpu_address_space_destroy() (which has an awkward API, includes
> a bug, and is never called from anywhere) with a new
> cpu_destroy_address_spaces() which cleans up all the ASes a CPU
> has and is called from generic unrealize code.
>
> Patch 1 is just a comment improvement to clarify that
> address_space_destroy() defers most of its work to RCU and you
> can't free the memory for the AS struct itself until it's done.
>
> Patch 2 is from Peter Xu; we need to be able to do "destroy and
> free an AS" via RCU, and at the moment you can't do that.
>
> Patch 3 is the bugfix proper.

So... I wonder if this is really the right direction.

To be specific:

Over in split-accel-land we recently had a bit of a kerfuffle over TCG
registering its MemoryListener with all of the per-cpu address spaces and
HVF not doing so. Things get even more complicated when one finds out
that some MemoryListener callbacks are only used with "global" address
spaces and some are only used with the per-cpu address spaces.

On reflection, it seems to me that no address spaces should be owned by
the cpus. All address spaces should be owned by the board model, and it
should be the board model that configures the address spaces used by each
cpu.

If we do this then address spaces, and even the array of address spaces,
may be created once by the board, shared between all (relevant) cpus, and
not destroyed by cpu unplug.

Moreover, in the context of virtualization, there's now exactly one
address space (since Arm Secure and Tag memory spaces are not required or
supported except by emulation; I assume the same is true for x86 smm),
and the aforementioned kerfuffle goes away. There's no difference between
global and per-cpu address spaces.

Anyway, it seems like this provides the same flexibility in the complex
heterogeneous case and simplifies things for the homogeneous
virtualization case.

Thoughts?

r~
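As a very rough illustration of the direction being proposed here (board-
owned address spaces shared by all CPUs), not anything that exists in QEMU
today: the board would create each distinct view once and merely hand out
pointers, so CPU unplug would have nothing to destroy. MyBoardState and
cpu_set_address_space() below are made-up names for the hooks such a design
would need.

#include "system/memory.h"
#include "hw/core/cpu.h"

typedef struct MyBoardState {
    MemoryRegion sysmem;     /* root container for the machine's memory map */
    AddressSpace as_memory;  /* the single shared view in the KVM/HVF case  */
} MyBoardState;

static void my_board_init_memory(MyBoardState *s)
{
    memory_region_init(&s->sysmem, NULL, "system", UINT64_MAX);
    address_space_init(&s->as_memory, &s->sysmem, "board-memory");
}

/* Hypothetical plug hook: point the CPU at the board's AddressSpace
 * instead of letting cpu_address_space_init() allocate a private one. */
static void my_board_plug_cpu(MyBoardState *s, CPUState *cpu)
{
    cpu_set_address_space(cpu, 0, &s->as_memory);   /* made-up API */
}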
On Wed, Oct 01, 2025 at 01:36:16PM -0700, Richard Henderson wrote:
> On 9/29/25 07:42, Peter Maydell wrote:
> > When a vCPU is created, it typically calls cpu_address_space_init()
> > one or more times to set up its address spaces. We don't currently
> > do anything to destroy these address spaces, which means that we
> > will leak them on a vcpu hot-plug -> hot-unplug cycle.
> >
> > This patchset fixes the leak by replacing the current
> > cpu_address_space_destroy() (which has an awkward API, includes
> > a bug, and is never called from anywhere) with a new
> > cpu_destroy_address_spaces() which cleans up all the ASes a CPU
> > has and is called from generic unrealize code.
> >
> > Patch 1 is just a comment improvement to clarify that
> > address_space_destroy() defers most of its work to RCU and you
> > can't free the memory for the AS struct itself until it's done.
> >
> > Patch 2 is from Peter Xu; we need to be able to do "destroy and
> > free an AS" via RCU, and at the moment you can't do that.
> >
> > Patch 3 is the bugfix proper.
>
> So... I wonder if this is really the right direction.
>
> To be specific:
>
> Over in split-accel-land we recently had a bit of a kerfuffle over TCG
> registering its MemoryListener with all of the per-cpu address spaces and
> HVF not doing so. Things get even more complicated when one finds out
> that some MemoryListener callbacks are only used with "global" address
> spaces and some are only used with the per-cpu address spaces.

I only have a very preliminary understanding on this, so please bear with
me if I make silly mistakes...

I was expecting QEMU to provide both the global view (address_space_memory),
and the per-vcpu view. Then we can register with any of them on demand.

Then the global views can be the so-called "board model" mentioned below.
Is that not the case? The root MR is also shared in this case, making sure
the address space operations internally will share the same flatview.

> On reflection, it seems to me that no address spaces should be owned by
> the cpus. All address spaces should be owned by the board model, and it
> should be the board model that configures the address spaces used by each
> cpu.
>
> If we do this then address spaces, and even the array of address spaces,
> may be created once by the board, shared between all (relevant) cpus, and
> not destroyed by cpu unplug.
>
> Moreover, in the context of virtualization, there's now exactly one
> address space (since Arm Secure and Tag memory spaces are not required or
> supported except by emulation; I assume the same is true for x86 smm),
> and the aforementioned kerfuffle goes away. There's no difference between
> global and per-cpu address spaces.
>
> Anyway, it seems like this provides the same flexibility in the complex
> heterogeneous case and simplifies things for the homogeneous
> virtualization case.

We have another question to ask besides this design discussion: Peter's
series here solidly fixes a memory leak that can easily be reproduced with
x86_64 + KVM on cpu hotplug.

Should we still merge it first, considering it doesn't change how we manage
per-vcpu address spaces but only fixes it? Then anything like a big
overhaul can happen on top too.

Thanks,

--
Peter Xu
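For reference, the two registration patterns being contrasted in this
subthread look roughly like the sketch below with the existing
MemoryListener API. The listener objects and their callbacks are assumed
to be set up by the caller; nothing here is specific to this series.

#include "system/memory.h"     /* MemoryListener, address_space_memory */
#include "exec/cpu-common.h"   /* cpu_get_address_space()              */
#include "hw/core/cpu.h"       /* CPUState                             */

/* Illustrative only: one listener on the global view, plus one listener
 * per per-CPU address space. */
static void example_register_listeners(CPUState *cpu,
                                       MemoryListener *global_listener,
                                       MemoryListener *per_as_listeners)
{
    /*
     * KVM/HVF-style: the kernel requires the guest RAM layout to be the
     * same for every vCPU, so registering once on the global view is
     * enough.
     */
    memory_listener_register(global_listener, &address_space_memory);

    /*
     * TCG-style: the cached state is per address space, so register a
     * listener on every AS this CPU can issue transactions to.
     */
    for (int i = 0; i < cpu->num_ases; i++) {
        memory_listener_register(&per_as_listeners[i],
                                 cpu_get_address_space(cpu, i));
    }
}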
On 10/1/25 14:37, Peter Xu wrote:
> I only have a very preliminary understanding on this, so please bear with
> me if I make silly mistakes...
>
> I was expecting QEMU to provide both the global view (address_space_memory),
> and the per-vcpu view.

My hypothesis is that separate views for each cpu don't make sense.

In the true symmetric multiprocessor case, which is the vast majority of
everything we emulate in QEMU, each processor has the exact same view(s).

In the rare asymmetric multiprocessor case, of which we have a few, we
have a couple of clusters and within each cluster all cpus share the
exact same view(s).

Thus access to address spaces on a per-cpu basis makes sense, but
allocating them on a per-cpu basis does not.

> We have another question to ask besides this design discussion: Peter's
> series here solidly fixes a memory leak that can easily be reproduced with
> x86_64 + KVM on cpu hotplug.
>
> Should we still merge it first, considering it doesn't change how we manage
> per-vcpu address spaces but only fixes it? Then anything like a big
> overhaul can happen on top too.

Sure, I'm happy with that. I just wanted to raise the related problem
that we encountered over this past week.

r~
On Wed, 1 Oct 2025 at 22:59, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 10/1/25 14:37, Peter Xu wrote:
> > I only have a very preliminary understanding on this, so please bear with
> > me if I make silly mistakes...
> >
> > I was expecting QEMU to provide both the global view (address_space_memory),
> > and the per-vcpu view.
>
> My hypothesis is that separate views for each cpu don't make sense.
>
> In the true symmetric multiprocessor case, which is the vast majority of
> everything we emulate in QEMU, each processor has the exact same view(s).

This is not the case in general. Some CPUs have per-CPU devices which
show up only for that CPU. Examples include the A9's per-CPU peripherals
(which we don't currently model that way, but perhaps ought to) and some
stuff in one of the M-profile boards.

TCG registers its CPU listener with every AS because the information TCG
caches (and which can become stale) is something it cares about for every
AS the CPU has to deal with. HVF and KVM register their CPU listener only
with address_space_memory because the information those accelerators are
caching is the details of RAM layout in the system, which in both cases
the kernel requires to be the same for every CPU.

Similarly in target/arm's kvm_arm_register_device() we listen on
address_space_memory because the thing we're listening for is "where has
this device's MemoryRegion been put", and again the kernel imposes the
requirement that the GIC (or whatever) is at the same location for all
CPUs.

The "global" view of address_space_memory and system_memory is a bit
theoretically dubious because there's nothing saying all CPUs have to
have some underlying mostly-the-same view of things. But in practice
almost all systems do, plus KVM and HVF impose that world-model, so it's
easier to retain this than try to think about how we would eliminate it.

> In the rare asymmetric multiprocessor case, of which we have a few, we
> have a couple of clusters and within each cluster all cpus share the
> exact same view(s).
>
> Thus access to address spaces on a per-cpu basis makes sense, but
> allocating them on a per-cpu basis does not.

Also, in general, per-CPU address spaces fit the rest of how we model
memory-transaction-capable devices. The device (e.g. a DMA controller or
similar) has a QOM property which takes a pointer to a MemoryRegion
describing the view of the world that that device has; it then internally
creates an AddressSpace whose purpose is to provide the APIs for making
memory transactions into that MemoryRegion. cpu_address_space_init() is
just some convenience wrapping around that.

thanks
-- PMM
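A minimal sketch of the device-modelling pattern described above, with all
device names illustrative (MyDMAState and the my_dma_* helpers are not a
real QEMU device; only the AddressSpace calls are real API): the board
hands the device a MemoryRegion, and the device wraps it in its own
AddressSpace to issue transactions.

#include "qemu/osdep.h"
#include "system/memory.h"

typedef struct MyDMAState {
    MemoryRegion *downstream;   /* view wired up by the board, e.g. via a
                                 * QOM link property */
    AddressSpace as;            /* transaction API over that view */
} MyDMAState;

/* Called from the device's realize method once 'downstream' is set. */
static void my_dma_init_as(MyDMAState *s)
{
    address_space_init(&s->as, s->downstream, "my-dma");
}

/* Called from unrealize.  address_space_destroy() defers the final
 * teardown to RCU, so the embedded AddressSpace (and hence the device
 * struct) must stay allocated until that callback has run. */
static void my_dma_cleanup_as(MyDMAState *s)
{
    address_space_destroy(&s->as);
}

/* A DMA read then goes through the device's own view of the world: */
static MemTxResult my_dma_read(MyDMAState *s, hwaddr addr,
                               void *buf, hwaddr len)
{
    return address_space_read(&s->as, addr, MEMTXATTRS_UNSPECIFIED,
                              buf, len);
}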
Cc'ing Salil for previous discussions on
https://lore.kernel.org/qemu-devel/20230918160257.30127-1-philmd@linaro.org/

On 29/9/25 16:42, Peter Maydell wrote:
> When a vCPU is created, it typically calls cpu_address_space_init()
> one or more times to set up its address spaces. We don't currently
> do anything to destroy these address spaces, which means that we
> will leak them on a vcpu hot-plug -> hot-unplug cycle.
>
> This patchset fixes the leak by replacing the current
> cpu_address_space_destroy() (which has an awkward API, includes
> a bug, and is never called from anywhere) with a new
> cpu_destroy_address_spaces() which cleans up all the ASes a CPU
> has and is called from generic unrealize code.
>
> Patch 1 is just a comment improvement to clarify that
> address_space_destroy() defers most of its work to RCU and you
> can't free the memory for the AS struct itself until it's done.
>
> Patch 2 is from Peter Xu; we need to be able to do "destroy and
> free an AS" via RCU, and at the moment you can't do that.
>
> Patch 3 is the bugfix proper.
>
> thanks
> -- PMM
>
> Peter Maydell (2):
>   include/system/memory.h: Clarify address_space_destroy() behaviour
>   physmem: Destroy all CPU AddressSpaces on unrealize
>
> Peter Xu (1):
>   memory: New AS helper to serialize destroy+free
>
>  include/exec/cpu-common.h          | 10 ++++-----
>  include/hw/core/cpu.h              |  1 -
>  include/system/memory.h            | 24 ++++++++++++++++++---
>  hw/core/cpu-common.c               |  1 +
>  stubs/cpu-destroy-address-spaces.c | 15 +++++++++++++
>  system/memory.c                    | 20 +++++++++++++++++-
>  system/physmem.c                   | 34 ++++++++++++++----------------
>  stubs/meson.build                  |  1 +
>  8 files changed, 78 insertions(+), 28 deletions(-)
>  create mode 100644 stubs/cpu-destroy-address-spaces.c
>
On Mon, Sep 29, 2025 at 05:02:40PM +0200, Philippe Mathieu-Daudé wrote:
> Cc'ing Salil for previous discussions on
> https://lore.kernel.org/qemu-devel/20230918160257.30127-1-philmd@linaro.org/

Thanks. While waiting for more comments, I pre-queued this series.

Regarding patch 1: I still think it's almost impossible to use
address_space_destroy() correctly, even with the doc.. but I agree that's
the best we can have right now.

The problem is, afaiu we don't even have a way to know when a rcu callback
is processed, unlike the grace period (which corresponds to
synchronize_rcu()).

I think it means there's almost no existing way for the user to properly
release the AddressSpace memory if it is e.g. embedded inside a Device
object. Here, we would need to rcu-free the Device object too (which I
don't think we do at all..), making sure it's called after the ASes' rcu
callback.

Even if we did that, we'd have yet another problem of making the Device
free() rcu callback happen after the AddressSpace's destroy rcu callback;
we could rely on the FIFO behavior of the rcu queue that Paolo mentioned,
but that looks unwanted.

For the long term, IMHO we may need to convert all embedded AddressSpaces
into dynamically allocated AddressSpaces (for CPUs, we can stick with the
cpu's address space API, but we have other places using ASes too that may
also need to be converted to heap allocations). Then, after all users have
switched to that, we may be able to drop address_space_destroy().

Thanks,

--
Peter Xu
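To spell out the hazard described above with the existing API (MyDev and
my_dev_free() are hypothetical; only the AddressSpace and RCU calls are
real QEMU API): with an embedded AddressSpace there is no obvious point at
which the containing object may be freed, because address_space_destroy()
finishes its work from an RCU callback, and synchronize_rcu() only waits
for a grace period, not for that callback to have run.

#include "qemu/osdep.h"
#include "system/memory.h"

/* Hypothetical object embedding an AddressSpace. */
struct MyDev {
    AddressSpace as;
    /* ... */
};

static void my_dev_free(struct MyDev *dev)
{
    address_space_destroy(&dev->as);
    /*
     * BUG if we free here: address_space_destroy() schedules its final
     * teardown via call_rcu(), and that callback still dereferences
     * dev->as.  Calling synchronize_rcu() first does not help, because a
     * grace period elapsing does not mean the callback has already run.
     */
    g_free(dev);
}

Heap-allocating the AddressSpace and releasing it from inside its own
destroy callback, which is what the destroy+free helper in this series
presumably does, sidesteps that ordering problem for the AS itself.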