When a vCPU is created, it typically calls cpu_address_space_init()
one or more times to set up its address spaces. We don't currently
do anything to destroy these address spaces, which means that we
will leak them on a vcpu hot-plug -> hot-unplug cycle.

This patchset fixes the leak by replacing the current
cpu_address_space_destroy() (which has an awkward API, includes
a bug, and is never called from anywhere) with a new
cpu_destroy_address_spaces() which cleans up all the ASes a CPU
has and is called from generic unrealize code.

Patch 1 is just a comment improvement to clarify that
address_space_destroy() defers most of its work to RCU and you
can't free the memory for the AS struct itself until it's done.

Patch 2 is from Peter Xu; we need to be able to do "destroy and
free an AS" via RCU, and at the moment you can't do that.

Patch 3 is the bugfix proper.

thanks
-- PMM

Peter Maydell (2):
  include/system/memory.h: Clarify address_space_destroy() behaviour
  physmem: Destroy all CPU AddressSpaces on unrealize

Peter Xu (1):
  memory: New AS helper to serialize destroy+free

 include/exec/cpu-common.h          | 10 ++++-----
 include/hw/core/cpu.h              |  1 -
 include/system/memory.h            | 24 ++++++++++++++++++---
 hw/core/cpu-common.c               |  1 +
 stubs/cpu-destroy-address-spaces.c | 15 +++++++++++++
 system/memory.c                    | 20 +++++++++++++++++-
 system/physmem.c                   | 34 ++++++++++++++----------------
 stubs/meson.build                  |  1 +
 8 files changed, 78 insertions(+), 28 deletions(-)
 create mode 100644 stubs/cpu-destroy-address-spaces.c

--
2.43.0
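To make the shape of the fix concrete, a cleanup helper along the lines
below is one way a cpu_destroy_address_spaces() could look. This is a rough
sketch only, not the actual patch: it assumes the existing cpu->cpu_ases /
cpu->num_ases bookkeeping in system/physmem.c, and it glosses over
unregistering the TCG listener and over freeing the AddressSpace structs
themselves (which is exactly what patch 2's RCU destroy+free helper is
needed for).

/*
 * Rough sketch only (not the actual patch): tear down every AS that
 * cpu_address_space_init() created for this CPU, called from unrealize.
 */
void cpu_destroy_address_spaces(CPUState *cpu)
{
    int i;

    if (!cpu->cpu_ases) {
        return;
    }
    for (i = 0; i < cpu->num_ases; i++) {
        AddressSpace *as = cpu->cpu_ases[i].as;

        if (as) {
            /*
             * address_space_destroy() only starts the teardown; the final
             * cleanup runs from an RCU callback, so the AddressSpace
             * memory must stay valid until then.  The real series would
             * use the new destroy+free helper from patch 2 here.
             */
            address_space_destroy(as);
        }
    }
    g_free(cpu->cpu_ases);
    cpu->cpu_ases = NULL;
    cpu->num_ases = 0;
}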
On 9/29/25 07:42, Peter Maydell wrote:
> When a vCPU is created, it typically calls cpu_address_space_init()
> one or more times to set up its address spaces. We don't currently
> do anything to destroy these address spaces, which means that we
> will leak them on a vcpu hot-plug -> hot-unplug cycle.
>
> This patchset fixes the leak by replacing the current
> cpu_address_space_destroy() (which has an awkward API, includes
> a bug, and is never called from anywhere) with a new
> cpu_destroy_address_spaces() which cleans up all the ASes a CPU
> has and is called from generic unrealize code.
>
> Patch 1 is just a comment improvement to clarify that
> address_space_destroy() defers most of its work to RCU and you
> can't free the memory for the AS struct itself until it's done.
>
> Patch 2 is from Peter Xu; we need to be able to do "destroy and
> free an AS" via RCU, and at the moment you can't do that.
>
> Patch 3 is the bugfix proper.

So... I wonder if this is really the right direction.

To be specific:

Over in split-accel-land we recently had a bit of a kerfuffle over TCG
registering its MemoryListener with all of the per-cpu address spaces and
HVF not doing so. Things get even more complicated when one finds out
that some MemoryListener callbacks are only used with "global" address
spaces and some are only used with the per-cpu address spaces.

On reflection, it seems to me that no address spaces should be owned by
the cpus. All address spaces should be owned by the board model, and it
should be the board model that configures the address spaces used by each
cpu.

If we do this then address spaces, and even the array of address spaces,
may be created once by the board, shared between all (relevant) cpus, and
not destroyed by cpu unplug.

Moreover, in the context of virtualization, there's now exactly one
address space (since Arm Secure and Tag memory spaces are not required or
supported except by emulation; I assume the same is true for x86 smm),
and the aforementioned kerfuffle goes away. There's no difference between
global and per-cpu address spaces.

Anyway, it seems like this provides the same flexibility in the complex
heterogeneous case and simplifies things for the homogeneous
virtualization case.

Thoughts?

r~
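As a very rough illustration of the direction being proposed here (board-
owned address spaces shared by all CPUs), not anything that exists in QEMU
today: the board would create each distinct view once and merely hand out
pointers, so CPU unplug would have nothing to destroy. MyBoardState and
cpu_set_address_space() below are made-up names for the hooks such a design
would need.

#include "system/memory.h"
#include "hw/core/cpu.h"

typedef struct MyBoardState {
    MemoryRegion sysmem;     /* root container for the machine's memory map */
    AddressSpace as_memory;  /* the single shared view in the KVM/HVF case  */
} MyBoardState;

static void my_board_init_memory(MyBoardState *s)
{
    memory_region_init(&s->sysmem, NULL, "system", UINT64_MAX);
    address_space_init(&s->as_memory, &s->sysmem, "board-memory");
}

/* Hypothetical plug hook: point the CPU at the board's AddressSpace
 * instead of letting cpu_address_space_init() allocate a private one. */
static void my_board_plug_cpu(MyBoardState *s, CPUState *cpu)
{
    cpu_set_address_space(cpu, 0, &s->as_memory);   /* made-up API */
}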
On Wed, Oct 01, 2025 at 01:36:16PM -0700, Richard Henderson wrote:
> On 9/29/25 07:42, Peter Maydell wrote:
> > When a vCPU is created, it typically calls cpu_address_space_init()
> > one or more times to set up its address spaces. We don't currently
> > do anything to destroy these address spaces, which means that we
> > will leak them on a vcpu hot-plug -> hot-unplug cycle.
> >
> > This patchset fixes the leak by replacing the current
> > cpu_address_space_destroy() (which has an awkward API, includes
> > a bug, and is never called from anywhere) with a new
> > cpu_destroy_address_spaces() which cleans up all the ASes a CPU
> > has and is called from generic unrealize code.
> >
> > Patch 1 is just a comment improvement to clarify that
> > address_space_destroy() defers most of its work to RCU and you
> > can't free the memory for the AS struct itself until it's done.
> >
> > Patch 2 is from Peter Xu; we need to be able to do "destroy and
> > free an AS" via RCU, and at the moment you can't do that.
> >
> > Patch 3 is the bugfix proper.
>
> So... I wonder if this is really the right direction.
>
> To be specific:
>
> Over in split-accel-land we recently had a bit of a kerfuffle over TCG
> registering its MemoryListener with all of the per-cpu address spaces and
> HVF not doing so. Things get even more complicated when one finds out
> that some MemoryListener callbacks are only used with "global" address
> spaces and some are only used with the per-cpu address spaces.

I only have a very preliminary understanding on this, so please bear with
me if I make silly mistakes...

I was expecting QEMU to provide both the global view (address_space_memory),
and the per-vcpu view. Then we can register with any of them on demand.

Then the global views can be the so-called "board model" mentioned below.
Is that not the case? The root MR is also shared in this case, making sure
the address space operations internally will share the same flatview.

> On reflection, it seems to me that no address spaces should be owned by
> the cpus. All address spaces should be owned by the board model, and it
> should be the board model that configures the address spaces used by each
> cpu.
>
> If we do this then address spaces, and even the array of address spaces,
> may be created once by the board, shared between all (relevant) cpus, and
> not destroyed by cpu unplug.
>
> Moreover, in the context of virtualization, there's now exactly one
> address space (since Arm Secure and Tag memory spaces are not required or
> supported except by emulation; I assume the same is true for x86 smm),
> and the aforementioned kerfuffle goes away. There's no difference between
> global and per-cpu address spaces.
>
> Anyway, it seems like this provides the same flexibility in the complex
> heterogeneous case and simplifies things for the homogeneous
> virtualization case.

We have another question to ask besides this design discussion: Peter's
series here solidly fixes a memory leak that can easily be reproduced with
x86_64 + KVM on cpu hotplug.

Should we still merge it first, considering it doesn't change how we manage
per-vcpu address spaces but only fixes it? Then anything like a big
overhaul can happen on top too.

Thanks,

--
Peter Xu
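For reference, the two registration patterns being contrasted in this
subthread look roughly like the sketch below with the existing
MemoryListener API. The listener objects and their callbacks are assumed
to be set up by the caller; nothing here is specific to this series.

#include "system/memory.h"     /* MemoryListener, address_space_memory */
#include "exec/cpu-common.h"   /* cpu_get_address_space()              */
#include "hw/core/cpu.h"       /* CPUState                             */

/* Illustrative only: one listener on the global view, plus one listener
 * per per-CPU address space. */
static void example_register_listeners(CPUState *cpu,
                                       MemoryListener *global_listener,
                                       MemoryListener *per_as_listeners)
{
    /*
     * KVM/HVF-style: the kernel requires the guest RAM layout to be the
     * same for every vCPU, so registering once on the global view is
     * enough.
     */
    memory_listener_register(global_listener, &address_space_memory);

    /*
     * TCG-style: the cached state is per address space, so register a
     * listener on every AS this CPU can issue transactions to.
     */
    for (int i = 0; i < cpu->num_ases; i++) {
        memory_listener_register(&per_as_listeners[i],
                                 cpu_get_address_space(cpu, i));
    }
}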
On 10/1/25 14:37, Peter Xu wrote:
> I only have a very preliminary understanding on this, so please bear with
> me if I make silly mistakes...
>
> I was expecting QEMU to provide both the global view (address_space_memory),
> and the per-vcpu view.

My hypothesis is that separate views for each cpu don't make sense.

In the true symmetric multiprocessor case, which is the vast majority of
everything we emulate in QEMU, each processor has the exact same view(s).

In the rare asymmetric multiprocessor case, of which we have a few, we
have a couple of clusters and within each cluster all cpus share the
exact same view(s).

Thus access to address spaces on a per-cpu basis makes sense, but
allocating them on a per-cpu basis does not.

> We have another question to ask besides this design discussion: Peter's
> series here solidly fixes a memory leak that can easily be reproduced with
> x86_64 + KVM on cpu hotplug.
>
> Should we still merge it first, considering it doesn't change how we manage
> per-vcpu address spaces but only fixes it? Then anything like a big
> overhaul can happen on top too.

Sure, I'm happy with that. I just wanted to raise the related problem
that we encountered over this past week.

r~
On Wed, 1 Oct 2025 at 22:59, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 10/1/25 14:37, Peter Xu wrote:
> > I only have a very preliminary understanding on this, so please bear with
> > me if I make silly mistakes...
> >
> > I was expecting QEMU to provide both the global view (address_space_memory),
> > and the per-vcpu view.
>
> My hypothesis is that separate views for each cpu don't make sense.
>
> In the true symmetric multiprocessor case, which is the vast majority of
> everything we emulate in QEMU, each processor has the exact same view(s).

This is not the case in general. Some CPUs have per-CPU devices which
show up only for that CPU. Examples include the A9's per-CPU peripherals
(which we don't currently model that way, but perhaps ought to) and some
stuff in one of the M-profile boards.

TCG registers its CPU listener with every AS because the information TCG
caches (and which can become stale) is something it cares about for every
AS the CPU has to deal with. HVF and KVM register their CPU listener only
with address_space_memory because the information those accelerators are
caching is the details of RAM layout in the system, which in both cases
the kernel requires to be the same for every CPU.

Similarly in target/arm's kvm_arm_register_device() we listen on
address_space_memory because the thing we're listening for is "where has
this device's MemoryRegion been put", and again the kernel imposes the
requirement that the GIC (or whatever) is at the same location for all
CPUs.

The "global" view of address_space_memory and system_memory is a bit
theoretically dubious because there's nothing saying all CPUs have to
have some underlying mostly-the-same view of things. But in practice
almost all systems do, plus KVM and HVF impose that world-model, so it's
easier to retain this than try to think about how we would eliminate it.

> In the rare asymmetric multiprocessor case, of which we have a few, we
> have a couple of clusters and within each cluster all cpus share the
> exact same view(s).
>
> Thus access to address spaces on a per-cpu basis makes sense, but
> allocating them on a per-cpu basis does not.

Also, in general, per-CPU address spaces fit the rest of how we model
memory-transaction-capable devices. The device (e.g. a DMA controller or
similar) has a QOM property which takes a pointer to a MemoryRegion
describing the view of the world that that device has; it then internally
creates an AddressSpace whose purpose is to provide the APIs for making
memory transactions into that MemoryRegion. cpu_address_space_init() is
just some convenience wrapping around that.

thanks
-- PMM
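A minimal sketch of the device-modelling pattern described above, with all
device names illustrative (MyDMAState and the my_dma_* helpers are not a
real QEMU device; only the AddressSpace calls are real API): the board
hands the device a MemoryRegion, and the device wraps it in its own
AddressSpace to issue transactions.

#include "qemu/osdep.h"
#include "system/memory.h"

typedef struct MyDMAState {
    MemoryRegion *downstream;   /* view wired up by the board, e.g. via a
                                 * QOM link property */
    AddressSpace as;            /* transaction API over that view */
} MyDMAState;

/* Called from the device's realize method once 'downstream' is set. */
static void my_dma_init_as(MyDMAState *s)
{
    address_space_init(&s->as, s->downstream, "my-dma");
}

/* Called from unrealize.  address_space_destroy() defers the final
 * teardown to RCU, so the embedded AddressSpace (and hence the device
 * struct) must stay allocated until that callback has run. */
static void my_dma_cleanup_as(MyDMAState *s)
{
    address_space_destroy(&s->as);
}

/* A DMA read then goes through the device's own view of the world: */
static MemTxResult my_dma_read(MyDMAState *s, hwaddr addr,
                               void *buf, hwaddr len)
{
    return address_space_read(&s->as, addr, MEMTXATTRS_UNSPECIFIED,
                              buf, len);
}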
Cc'ing Salil for previous discussions on
https://lore.kernel.org/qemu-devel/20230918160257.30127-1-philmd@linaro.org/

On 29/9/25 16:42, Peter Maydell wrote:
> When a vCPU is created, it typically calls cpu_address_space_init()
> one or more times to set up its address spaces. We don't currently
> do anything to destroy these address spaces, which means that we
> will leak them on a vcpu hot-plug -> hot-unplug cycle.
>
> This patchset fixes the leak by replacing the current
> cpu_address_space_destroy() (which has an awkward API, includes
> a bug, and is never called from anywhere) with a new
> cpu_destroy_address_spaces() which cleans up all the ASes a CPU
> has and is called from generic unrealize code.
>
> Patch 1 is just a comment improvement to clarify that
> address_space_destroy() defers most of its work to RCU and you
> can't free the memory for the AS struct itself until it's done.
>
> Patch 2 is from Peter Xu; we need to be able to do "destroy and
> free an AS" via RCU, and at the moment you can't do that.
>
> Patch 3 is the bugfix proper.
>
> thanks
> -- PMM
>
> Peter Maydell (2):
>   include/system/memory.h: Clarify address_space_destroy() behaviour
>   physmem: Destroy all CPU AddressSpaces on unrealize
>
> Peter Xu (1):
>   memory: New AS helper to serialize destroy+free
>
>  include/exec/cpu-common.h          | 10 ++++-----
>  include/hw/core/cpu.h              |  1 -
>  include/system/memory.h            | 24 ++++++++++++++++++---
>  hw/core/cpu-common.c               |  1 +
>  stubs/cpu-destroy-address-spaces.c | 15 +++++++++++++
>  system/memory.c                    | 20 +++++++++++++++++-
>  system/physmem.c                   | 34 ++++++++++++++----------------
>  stubs/meson.build                  |  1 +
>  8 files changed, 78 insertions(+), 28 deletions(-)
>  create mode 100644 stubs/cpu-destroy-address-spaces.c
>
On Mon, Sep 29, 2025 at 05:02:40PM +0200, Philippe Mathieu-Daudé wrote:
> Cc'ing Salil for previous discussions on
> https://lore.kernel.org/qemu-devel/20230918160257.30127-1-philmd@linaro.org/

Thanks. While waiting for more comments, I pre-queued this series.

Regarding patch 1: I still think it's almost impossible to use
address_space_destroy() correctly, even with the doc.. but I agree that's
the best we can have right now.

The problem is, afaiu we don't even have a way to know when a rcu callback
is processed, unlike the grace period (which corresponds to
synchronize_rcu()).

I think it means there's almost no existing way for the user to properly
release the AddressSpace memory if it is e.g. embedded inside a Device
object. Here, we would need to rcu-free the Device object too (which I
don't think we do at all..), making sure it's called after the ASes' rcu
callback.

Even if we did that, we'd have yet another problem of making the Device
free() rcu callback happen after the AddressSpace's destroy rcu callback;
we could rely on the FIFO behavior of the rcu queue that Paolo mentioned,
but that looks unwanted.

For the long term, IMHO we may need to convert all embedded AddressSpaces
into dynamically allocated AddressSpaces (for CPUs, we can stick with the
cpu's address space API, but we have other places using ASes too that may
also need to be converted to heap allocations). Then, after all users have
switched to that, we may be able to drop address_space_destroy().

Thanks,

--
Peter Xu
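To spell out the hazard described above with the existing API (MyDev and
my_dev_free() are hypothetical; only the AddressSpace and RCU calls are
real QEMU API): with an embedded AddressSpace there is no obvious point at
which the containing object may be freed, because address_space_destroy()
finishes its work from an RCU callback, and synchronize_rcu() only waits
for a grace period, not for that callback to have run.

#include "qemu/osdep.h"
#include "system/memory.h"

/* Hypothetical object embedding an AddressSpace. */
struct MyDev {
    AddressSpace as;
    /* ... */
};

static void my_dev_free(struct MyDev *dev)
{
    address_space_destroy(&dev->as);
    /*
     * BUG if we free here: address_space_destroy() schedules its final
     * teardown via call_rcu(), and that callback still dereferences
     * dev->as.  Calling synchronize_rcu() first does not help, because a
     * grace period elapsing does not mean the callback has already run.
     */
    g_free(dev);
}

Heap-allocating the AddressSpace and releasing it from inside its own
destroy callback, which is what the destroy+free helper in this series
presumably does, sidesteps that ordering problem for the AS itself.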