[Qemu-devel] [RFC PATCH qemu] exec: Destroy dispatch immediately

Alexey Kardashevskiy posted 1 patch 6 years, 8 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20170825083123.47432-1-aik@ozlabs.ru
Test FreeBSD passed
Test checkpatch passed
Test docker passed
Test s390x passed
exec.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[Qemu-devel] [RFC PATCH qemu] exec: Destroy dispatch immediately
Posted by Alexey Kardashevskiy 6 years, 8 months ago
Otherwise old dispatch holds way too much memory before RCU gets
a chance to free old dispatches.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

This is a follow-up to the "Memory use with >100 virtio devices"
thread.


I assume this is a dirty hack (which fixes the problem though)
and I wonder what the proper solution would be. Thanks.


What happens here is that every virtio block device creates 2 address
spaces - for modern config space (called "virtio-pci-cfg-as") and
for busmaster (common pci thing, called after the device name,
in my case "virtio-blk-pci").

Each address_space_init() updates topology for _every_ address space.
Every topology update (address_space_update_topology()) creates a new
dispatch tree - AddressSpaceDispatch with nodes (1KB) and
sections (48KB) and destroys the old one.

However the dispatch destructor is postponed via RCU which does not
get a chance to execute until the machine is initialized but before
we get there, memory is not returned to the pool, and this is a lot
of memory which grows n^2.


Interestingly, mem_add() from exec.c is called twice:
as as->dispatch_listener.region_add() and
as as->dispatch_listener.region_nop() - I did not understand
the trick but it does not work if I remove the .region_nop() hook.
How does it work? :)

---
 exec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/exec.c b/exec.c
index 01ac21e3cd..ea5f3eb209 100644
--- a/exec.c
+++ b/exec.c
@@ -2707,7 +2707,7 @@ static void mem_commit(MemoryListener *listener)
 
     atomic_rcu_set(&as->dispatch, next);
     if (cur) {
-        call_rcu(cur, address_space_dispatch_free, rcu);
+        address_space_dispatch_free(cur);
     }
 }
 
-- 
2.11.0


Re: [Qemu-devel] [RFC PATCH qemu] exec: Destroy dispatch immediately
Posted by Paolo Bonzini 6 years, 8 months ago
On 25/08/2017 10:31, Alexey Kardashevskiy wrote:
> Otherwise old dispatch holds way too much memory before RCU gets
> a chance to free old dispatches.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> 
> This is a follow-up to the "Memory use with >100 virtio devices"
> thread.
> 
> I assume this is a dirty hack (which fixes the problem though)

This doesn't work.  AddressSpaceDispatch can be accessed outside the big
QEMU lock, which is why it is destroyed via call_rcu.

The solution is to: 1) share the FlatView structures if they refer to
the same root memory region; 2) have one AddressSpaceDispatch per
FlatView instead of one per AddressSpace, so that FlatView reference
counting takes care of clearing the AddressSpaceDispatch too.  Neither
is particularly hard.  The second requires getting rid of the
as->dispatch_listener and "hard coding" the creation of the
AddressSpaceDispatch from the FlatView in memory.c.

Paolo

> and I wonder what the proper solution would be. Thanks.
> 
> 
> What happens here is that every virtio block device creates 2 address
> spaces - for modern config space (called "virtio-pci-cfg-as") and
> for busmaster (common pci thing, called after the device name,
> in my case "virtio-blk-pci").
> 
> Each address_space_init() updates topology for _every_ address space.
> Every topology update (address_space_update_topology()) creates a new
> dispatch tree - AddressSpaceDispatch with nodes (1KB) and
> sections (48KB) and destroys the old one.
> 
> However the dispatch destructor is postponed via RCU which does not
> get a chance to execute until the machine is initialized but before
> we get there, memory is not returned to the pool, and this is a lot
> of memory which grows n^2.
> 
> 
> Interestingly, mem_add() from exec.c is called twice:
> as as->dispatch_listener.region_add() and
> as as->dispatch_listener.region_nop() - I did not understand
> the trick but it does not work if I remove the .region_nop() hook.
> How does it work? :)
> 
> ---
>  exec.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/exec.c b/exec.c
> index 01ac21e3cd..ea5f3eb209 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2707,7 +2707,7 @@ static void mem_commit(MemoryListener *listener)
>  
>      atomic_rcu_set(&as->dispatch, next);
>      if (cur) {
> -        call_rcu(cur, address_space_dispatch_free, rcu);
> +        address_space_dispatch_free(cur);
>      }
>  }
>  
> 


Re: [Qemu-devel] [RFC PATCH qemu] exec: Destroy dispatch immediately
Posted by Peter Maydell 6 years, 8 months ago
On 25 August 2017 at 09:53, Paolo Bonzini <pbonzini@redhat.com> wrote:
> The solution is to: 1) share the FlatView structures if they refer to
> the same root memory region; 2) have one AddressSpaceDispatch per
> FlatView instead of one per AddressSpace, so that FlatView reference
> counting takes care of clearing the AddressSpaceDispatch too.  Neither
> is particularly hard.

If we did this we could get rid of address_space_init_shareable(),
right? (It's a bit of a cheesy hack aimed at avoiding having duplicate
address space structures for the same root memory region, but if
the underlying code is better at not duplicating all the data
structures unless necessary then the benefit of having the
separate API goes away I think.)

thanks
-- PMM

Re: [Qemu-devel] [RFC PATCH qemu] exec: Destroy dispatch immediately
Posted by Paolo Bonzini 6 years, 8 months ago
On 25/08/2017 11:22, Peter Maydell wrote:
> On 25 August 2017 at 09:53, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> The solution is to: 1) share the FlatView structures if they refer to
>> the same root memory region; 2) have one AddressSpaceDispatch per
>> FlatView instead of one per AddressSpace, so that FlatView reference
>> counting takes care of clearing the AddressSpaceDispatch too.  Neither
>> is particularly hard.
> If we did this we could get rid of address_space_init_shareable(),
> right? (It's a bit of a cheesy hack aimed at avoiding having duplicate
> address space structures for the same root memory region, but if
> the underlying code is better at not duplicating all the data
> structures unless necessary then the benefit of having the
> separate API goes away I think.)

Yes, indeed.

Paolo

Re: [Qemu-devel] [RFC PATCH qemu] exec: Destroy dispatch immediately
Posted by David Gibson 6 years, 8 months ago
On Fri, Aug 25, 2017 at 11:57:26AM +0200, Paolo Bonzini wrote:
> On 25/08/2017 11:22, Peter Maydell wrote:
> > On 25 August 2017 at 09:53, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >> The solution is to: 1) share the FlatView structures if they refer to
> >> the same root memory region; 2) have one AddressSpaceDispatch per
> >> FlatView instead of one per AddressSpace, so that FlatView reference
> >> counting takes care of clearing the AddressSpaceDispatch too.  Neither
> >> is particularly hard.
> > If we did this we could get rid of address_space_init_shareable(),
> > right? (It's a bit of a cheesy hack aimed at avoiding having duplicate
> > address space structures for the same root memory region, but if
> > the underlying code is better at not duplicating all the data
> > structures unless necessary then the benefit of having the
> > separate API goes away I think.)
> 
> Yes, indeed.

Hm.  Why do we need to construct full ASes for virtio-blk, rather than
just MRs?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] [RFC PATCH qemu] exec: Destroy dispatch immediately
Posted by Peter Maydell 6 years, 8 months ago
On 25 August 2017 at 14:19, David Gibson <david@gibson.dropbear.id.au> wrote:
> Hm.  Why do we need to construct full ASes for virtio-blk, rather than
> just MRs?

It's the PCI layer that's doing it, but the overall reason is because
virtio-blk needs to make memory transactions (DMA reads and writes).
The AddressSpace is our construct for "thing you use to be able
to efficiently make memory transactions on a MemoryRegion".
(This is what the FlatView and AddressSpaceDispatch data structures
are for -- they let you efficiently go from "write to address 0x4000
in this thing" to the actual destination RAM or device read/write
pointers.)

thanks
-- PMM

Re: [Qemu-devel] [RFC PATCH qemu] exec: Destroy dispatch immediately
Posted by Paolo Bonzini 6 years, 8 months ago
On 25/08/2017 15:19, David Gibson wrote:
> On Fri, Aug 25, 2017 at 11:57:26AM +0200, Paolo Bonzini wrote:
>> On 25/08/2017 11:22, Peter Maydell wrote:
>>> On 25 August 2017 at 09:53, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>> The solution is to: 1) share the FlatView structures if they refer to
>>>> the same root memory region; 2) have one AddressSpaceDispatch per
>>>> FlatView instead of one per AddressSpace, so that FlatView reference
>>>> counting takes care of clearing the AddressSpaceDispatch too.  Neither
>>>> is particularly hard.
>>> If we did this we could get rid of address_space_init_shareable(),
>>> right? (It's a bit of a cheesy hack aimed at avoiding having duplicate
>>> address space structures for the same root memory region, but if
>>> the underlying code is better at not duplicating all the data
>>> structures unless necessary then the benefit of having the
>>> separate API goes away I think.)
>>
>> Yes, indeed.
> 
> Hm.  Why do we need to construct full ASes for virtio-blk, rather than
> just MRs?

It's an artifact of how virtio_address_space_read and
virtio_address_space_write are implemented.

Paolo

Re: [Qemu-devel] [RFC PATCH qemu] exec: Destroy dispatch immediately
Posted by Alexey Kardashevskiy 6 years, 7 months ago
On 25/08/17 18:53, Paolo Bonzini wrote:
> On 25/08/2017 10:31, Alexey Kardashevskiy wrote:
>> Otherwise old dispatch holds way too much memory before RCU gets
>> a chance to free old dispatches.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>
>> This is a follow-up to the "Memory use with >100 virtio devices"
>> thread.
>>
>> I assume this is a dirty hack (which fixes the problem though)
> 
> This doesn't work.  AddressSpaceDispatch can be accessed outside the big
> QEMU lock, which is why it is destroyed via call_rcu.
> 
> The solution is to: 1) share the FlatView structures if they refer to
> the same root memory region; 2) have one AddressSpaceDispatch per
> FlatView instead of one per AddressSpace, so that FlatView reference
> counting takes care of clearing the AddressSpaceDispatch too.  Neither
> is particularly hard.  The second requires getting rid of the
> as->dispatch_listener and "hard coding" the creation of the
> AddressSpaceDispatch from the FlatView in memory.c.


While I am trying this approach, here is a cheap workaround -

diff --git a/vl.c b/vl.c
index 8e247cc2a2..4d95bc2a6a 100644
--- a/vl.c
+++ b/vl.c
@@ -4655,12 +4655,16 @@ int main(int argc, char **argv, char **envp)
     igd_gfx_passthru();

     /* init generic devices */
+    memory_region_transaction_begin();
+
     rom_set_order_override(FW_CFG_ORDER_OVERRIDE_DEVICE);
     if (qemu_opts_foreach(qemu_find_opts("device"),
                           device_init_func, NULL, NULL)) {
         exit(1);
     }

+    memory_region_transaction_commit();
+
     cpu_synchronize_all_post_init();

     rom_reset_order_override();



Just for self-education - how dirty is this hack?

This effectively postpones all dispatch trees allocation/disposal till MR
transation depth is 0 (which happens in this
memory_region_transaction_commit()), for 500 virtio devices it reduces the
amount of resident RAM from 45GB to 11GB with 2GB guest, good for speed too
- time to build a machine is reduced from 4:00 to 1:20.




> 
> Paolo
> 
>> and I wonder what the proper solution would be. Thanks.
>>
>>
>> What happens here is that every virtio block device creates 2 address
>> spaces - for modern config space (called "virtio-pci-cfg-as") and
>> for busmaster (common pci thing, called after the device name,
>> in my case "virtio-blk-pci").
>>
>> Each address_space_init() updates topology for _every_ address space.
>> Every topology update (address_space_update_topology()) creates a new
>> dispatch tree - AddressSpaceDispatch with nodes (1KB) and
>> sections (48KB) and destroys the old one.
>>
>> However the dispatch destructor is postponed via RCU which does not
>> get a chance to execute until the machine is initialized but before
>> we get there, memory is not returned to the pool, and this is a lot
>> of memory which grows n^2.
>>
>>
>> Interestingly, mem_add() from exec.c is called twice:
>> as as->dispatch_listener.region_add() and
>> as as->dispatch_listener.region_nop() - I did not understand
>> the trick but it does not work if I remove the .region_nop() hook.
>> How does it work? :)
>>
>> ---
>>  exec.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/exec.c b/exec.c
>> index 01ac21e3cd..ea5f3eb209 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -2707,7 +2707,7 @@ static void mem_commit(MemoryListener *listener)
>>  
>>      atomic_rcu_set(&as->dispatch, next);
>>      if (cur) {
>> -        call_rcu(cur, address_space_dispatch_free, rcu);
>> +        address_space_dispatch_free(cur);
>>      }
>>  }
>>  
>>
> 


-- 
Alexey

Re: [Qemu-devel] [RFC PATCH qemu] exec: Destroy dispatch immediately
Posted by Paolo Bonzini 6 years, 8 months ago
On 25/08/2017 10:31, Alexey Kardashevskiy wrote:
> 
> Interestingly, mem_add() from exec.c is called twice:
> as as->dispatch_listener.region_add() and
> as as->dispatch_listener.region_nop() - I did not understand
> the trick but it does not work if I remove the .region_nop() hook.
> How does it work? :)

Didn't note this.

The hooks are:

- region_add: a new MemoryRegionSection appeared compared to the
previous FlatView

- region_nop: a region that was in the previous FlatView stayed there

- region_del: a MemoryRegionSection disappeared compared to the previous
FlatView

Because the AddressSpaceDispatch is rebuilt from scratch, it cares about
both new (region_add) and existing (region_nop) regions.

Paolo

Re: [Qemu-devel] [RFC PATCH qemu] exec: Destroy dispatch immediately
Posted by Paolo Bonzini 6 years, 8 months ago
On 25/08/2017 10:31, Alexey Kardashevskiy wrote:
> Each address_space_init() updates topology for _every_ address space.

And finally, does this patch help with the above?

diff --git a/memory.c b/memory.c
index c0adc35..97c16cc 100644
--- a/memory.c
+++ b/memory.c
@@ -2607,10 +2607,16 @@ void memory_region_invalidate_mmio_ptr(MemoryRegion *mr, hwaddr offset,
                           RUN_ON_CPU_HOST_PTR(invalidate_data));
 }
 
+static void address_space_rebuild(AddressSpace *as)
+{
+    MEMORY_LISTENER_CALL(as, begin, Forward);
+    address_space_update_topology(as);
+    MEMORY_LISTENER_CALL(as, commit, Forward);
+}
+
 void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name)
 {
     memory_region_ref(root);
-    memory_region_transaction_begin();
     as->ref_count = 1;
     as->root = root;
     as->malloced = false;
@@ -2622,8 +2628,7 @@ void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name)
     QTAILQ_INSERT_TAIL(&address_spaces, as, address_spaces_link);
     as->name = g_strdup(name ? name : "anonymous");
     address_space_init_dispatch(as);
-    memory_region_update_pending |= root->enabled;
-    memory_region_transaction_commit();
+    address_space_rebuild(as);
 }
 
 static void do_address_space_destroy(AddressSpace *as)


Completely untested because vacation is coming.

Thanks,

Paolo

Re: [Qemu-devel] [RFC PATCH qemu] exec: Destroy dispatch immediately
Posted by Alexey Kardashevskiy 6 years, 8 months ago
On 25/08/17 19:01, Paolo Bonzini wrote:
> On 25/08/2017 10:31, Alexey Kardashevskiy wrote:
>> Each address_space_init() updates topology for _every_ address space.
> 
> And finally, does this patch help with the above?
> 
> diff --git a/memory.c b/memory.c
> index c0adc35..97c16cc 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -2607,10 +2607,16 @@ void memory_region_invalidate_mmio_ptr(MemoryRegion *mr, hwaddr offset,
>                            RUN_ON_CPU_HOST_PTR(invalidate_data));
>  }
>  
> +static void address_space_rebuild(AddressSpace *as)
> +{
> +    MEMORY_LISTENER_CALL(as, begin, Forward);
> +    address_space_update_topology(as);
> +    MEMORY_LISTENER_CALL(as, commit, Forward);
> +}
> +
>  void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name)
>  {
>      memory_region_ref(root);
> -    memory_region_transaction_begin();
>      as->ref_count = 1;
>      as->root = root;
>      as->malloced = false;
> @@ -2622,8 +2628,7 @@ void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name)
>      QTAILQ_INSERT_TAIL(&address_spaces, as, address_spaces_link);
>      as->name = g_strdup(name ? name : "anonymous");
>      address_space_init_dispatch(as);
> -    memory_region_update_pending |= root->enabled;
> -    memory_region_transaction_commit();
> +    address_space_rebuild(as);
>  }
>  
>  static void do_address_space_destroy(AddressSpace *as)
> 
> 
> Completely untested because vacation is coming.


Friday night arrived here already ;)


The patch did not help though (I also had to define section-less
MEMORY_LISTENER_CALL()). I'll look into your first suggestion on Monday.



-- 
Alexey

Re: [Qemu-devel] [RFC PATCH qemu] exec: Destroy dispatch immediately
Posted by Paolo Bonzini 6 years, 8 months ago
On 25/08/2017 11:16, Alexey Kardashevskiy wrote:
> On 25/08/17 19:01, Paolo Bonzini wrote:
>> On 25/08/2017 10:31, Alexey Kardashevskiy wrote:
>>> Each address_space_init() updates topology for _every_ address space.
>>
>> And finally, does this patch help with the above?
>>
>> diff --git a/memory.c b/memory.c
>> index c0adc35..97c16cc 100644
>> --- a/memory.c
>> +++ b/memory.c
>> @@ -2607,10 +2607,16 @@ void memory_region_invalidate_mmio_ptr(MemoryRegion *mr, hwaddr offset,
>>                            RUN_ON_CPU_HOST_PTR(invalidate_data));
>>  }
>>  
>> +static void address_space_rebuild(AddressSpace *as)
>> +{
>> +    MEMORY_LISTENER_CALL(as, begin, Forward);
>> +    address_space_update_topology(as);
>> +    MEMORY_LISTENER_CALL(as, commit, Forward);
>> +}
>> +
>>  void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name)
>>  {
>>      memory_region_ref(root);
>> -    memory_region_transaction_begin();
>>      as->ref_count = 1;
>>      as->root = root;
>>      as->malloced = false;
>> @@ -2622,8 +2628,7 @@ void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name)
>>      QTAILQ_INSERT_TAIL(&address_spaces, as, address_spaces_link);
>>      as->name = g_strdup(name ? name : "anonymous");
>>      address_space_init_dispatch(as);
>> -    memory_region_update_pending |= root->enabled;
>> -    memory_region_transaction_commit();
>> +    address_space_rebuild(as);
>>  }
>>  
>>  static void do_address_space_destroy(AddressSpace *as)
>>
>>
>> Completely untested because vacation is coming.
> 
> 
> Friday night arrived here already ;)
> 
> The patch did not help though (I also had to define section-less
> MEMORY_LISTENER_CALL()). I'll look into your first suggestion on Monday.

Note that this is not a full solution, only for the O(n^2) transaction
commits.

Paolo