This is a quick and dirty (1.5 days of hacking) prototype to make
vfio and virtio-mem play together. The basic idea was the result of Alex
brainstorming with me on how to tackle this.

A virtio-mem device manages a memory region in guest physical address
space, represented as a single (currently large) memory region in QEMU.
Before the guest is allowed to use memory blocks, it must coordinate with
the hypervisor (plug blocks). After a reboot, all memory is usually
unplugged - when the guest comes up, it detects the virtio-mem device and
selects memory blocks to plug (based on requests from the hypervisor).

Memory hot(un)plug consists of (un)plugging memory blocks via a virtio-mem
device (triggered by the guest). When unplugging blocks, we discard the
memory. In contrast to memory ballooning, we always know which memory
blocks a guest may use - especially during a reboot, after a crash, or
after kexec.

The issue with vfio is that it cannot deal with random discards - for this
reason, virtio-mem and vfio can currently only run mutually exclusive.
In particular, vfio would currently map the whole memory region (with
possibly only few or no plugged blocks), resulting in all pages getting
pinned and therefore in a higher memory consumption than expected (turning
virtio-mem basically useless in these environments).

To make vfio work nicely with virtio-mem, we have to map only the plugged
blocks, and map/unmap properly when plugging/unplugging blocks (including
discarding of RAM when unplugging). We achieve that by using a new notifier
mechanism that communicates changes.

It's important to map memory in the granularity in which we could see
unmaps again (-> virtio-mem block size) - so when, e.g., plugging
consecutive 100 MB with a block size of 2 MB, we need 50 mappings. When
unmapping, we can use a single vfio_unmap call for the applicable range.
We expect that the block size of virtio-mem devices will be fairly large
in the future (to not run out of mappings and to improve hot(un)plug
performance), configured by the user when used with vfio (e.g., 128 MB,
1 GB, ...) - Linux guests will still have to be optimized for that.

We try to handle errors when plugging memory (mapping in VFIO) gracefully
- especially to cope with too many mappings in VFIO.

As I basically have no experience with vfio, all I did for testing is
pass through a secondary GPU (NVIDIA GK208B) via vfio-pci to my guest
and saw it pop up in dmesg. I did *not* actually try to use it (I know
...), so there might still be plenty of BUGs regarding the actual mappings
in the code. When I resize virtio-mem devices (resulting in memory
hot(un)plug), I can spot the memory consumption of my host adjusting
accordingly - in contrast to before, whereby my machine would always
consume the maximum size of my VM, as if all memory provided by
virtio-mem devices were fully plugged.

I even tested it with 2 MB huge pages (sadly for the first time with
virtio-mem ever) - and it worked like a charm on the hypervisor side as
well. The number of free hugepages adjusted accordingly (again, did not
properly test the device in the guest ...).

If anybody wants to play with it and needs some guidance, please feel
free to ask. I might add some vfio-related documentation to
https://virtio-mem.gitlab.io/ (but it really isn't that special - only
the block size limitations have to be considered).
David Hildenbrand (6):
  memory: Introduce sparse RAM handler for memory regions
  virtio-mem: Impelement SparseRAMHandler interface
  vfio: Implement support for sparse RAM memory regions
  memory: Extend ram_block_discard_(require|disable) by two discard types
  virtio-mem: Require only RAM_BLOCK_DISCARD_T_COORDINATED discards
  vfio: Disable only RAM_BLOCK_DISCARD_T_UNCOORDINATED discards

 exec.c                         | 109 +++++++++++++++++----
 hw/vfio/common.c               | 169 ++++++++++++++++++++++++++++++++-
 hw/virtio/virtio-mem.c         | 164 +++++++++++++++++++++++++++++++-
 include/exec/memory.h          | 151 ++++++++++++++++++++++++++++-
 include/hw/vfio/vfio-common.h  |  12 +++
 include/hw/virtio/virtio-mem.h |   3 +
 softmmu/memory.c               |   7 ++
 7 files changed, 583 insertions(+), 32 deletions(-)

-- 
2.26.2
Patchew URL: https://patchew.org/QEMU/20200924160423.106747-1-david@redhat.com/

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20200924160423.106747-1-david@redhat.com
Subject: [PATCH PROTOTYPE 0/6] virtio-mem: vfio support

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update] patchew/20200922210101.4081073-1-jsnow@redhat.com -> patchew/20200922210101.4081073-1-jsnow@redhat.com
 - [tag update] patchew/20200924185414.28642-1-vsementsov@virtuozzo.com -> patchew/20200924185414.28642-1-vsementsov@virtuozzo.com
Switched to a new branch 'test'
8afd1df vfio: Disable only RAM_BLOCK_DISCARD_T_UNCOORDINATED discards
d676b32 virtio-mem: Require only RAM_BLOCK_DISCARD_T_COORDINATED discards
9492f67 memory: Extend ram_block_discard_(require|disable) by two discard types
9eeec69 vfio: Implement support for sparse RAM memory regions
3e21d3f virtio-mem: Impelement SparseRAMHandler interface
2cfc417 memory: Introduce sparse RAM handler for memory regions

=== OUTPUT BEGIN ===
1/6 Checking commit 2cfc4176fbf5 (memory: Introduce sparse RAM handler for memory regions)
ERROR: "foo* bar" should be "foo *bar"
#149: FILE: include/exec/memory.h:1952:
+static inline SparseRAMHandler* memory_region_get_sparse_ram_handler(

total: 1 errors, 0 warnings, 162 lines checked

Patch 1/6 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
2/6 Checking commit 3e21d3f59244 (virtio-mem: Impelement SparseRAMHandler interface)
3/6 Checking commit 9eeec69031b0 (vfio: Implement support for sparse RAM memory regions)
4/6 Checking commit 9492f6715512 (memory: Extend ram_block_discard_(require|disable) by two discard types)
5/6 Checking commit d676b32336b5 (virtio-mem: Require only RAM_BLOCK_DISCARD_T_COORDINATED discards)
6/6 Checking commit 8afd1df27b99 (vfio: Disable only RAM_BLOCK_DISCARD_T_UNCOORDINATED discards)
=== OUTPUT END ===

Test command exited with code: 1

The full log is available at
http://patchew.org/logs/20200924160423.106747-1-david@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
* David Hildenbrand (david@redhat.com) wrote:
> [...]
> We expect that the block size of virtio-mem devices will be fairly large
> in the future (to not run out of mappings and to improve hot(un)plug
> performance), configured by the user, when used with vfio (e.g., 128MB,
> 1G, ...) - Linux guests will still have to be optimized for that.

This seems pretty painful for those few TB mappings.
Also the calls seem pretty painful; maybe it'll be possible to have
calls that are optimised for making multiple consecutive mappings.

Dave

> [...]
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 29.09.20 19:02, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (david@redhat.com) wrote:
>> [...]
>>
>> It's important to map memory in the granularity in which we could see
>> unmaps again (-> virtio-mem block size) - so when e.g., plugging
>> consecutive 100 MB with a block size of 2MB, we need 50 mappings.
>> When unmapping, we can use a single vfio_unmap call for the applicable
>> range.
>>
>> We expect that the block size of virtio-mem devices will be fairly large
>> in the future (to not run out of mappings and to improve hot(un)plug
>> performance), configured by the user, when used with vfio (e.g., 128MB,
>> 1G, ...) - Linux guests will still have to be optimized for that.
>
> This seems pretty painful for those few TB mappings.
> Also the calls seem pretty painful; maybe it'll be possible to have
> calls that are optimised for making multiple consecutive mappings.

Exactly the future I imagine. This patchset is with no kernel interface
additions - once we have an optimized interface that understands
consecutive mappings (and the granularity), we can use that instead. The
prototype already prepared for that by notifying about consecutive
ranges. Thanks!

-- 
Thanks,

David / dhildenb