This is a quick and dirty (1.5 days of hacking) prototype to make
vfio and virtio-mem play together. The basic idea was the result of Alex
brainstorming with me on how to tackle this.

A virtio-mem device manages a memory region in guest physical address
space, represented as a single (currently large) memory region in QEMU.
Before the guest is allowed to use memory blocks, it must coordinate with
the hypervisor (plug blocks). After a reboot, all memory is usually
unplugged - when the guest comes up, it detects the virtio-mem device and
selects memory blocks to plug (based on requests from the hypervisor).

Memory hot(un)plug consists of (un)plugging memory blocks via a virtio-mem
device (triggered by the guest). When unplugging blocks, we discard the
memory. In contrast to memory ballooning, we always know which memory
blocks a guest may use - especially during a reboot, after a crash, or
after kexec.

The issue with vfio is that it cannot deal with random discards - for this
reason, virtio-mem and vfio can currently only run mutually exclusive.
In particular, vfio would currently map the whole memory region (with
possibly only few or no plugged blocks), resulting in all pages getting
pinned and therefore in a higher memory consumption than expected (turning
virtio-mem basically useless in these environments).

To make vfio work nicely with virtio-mem, we have to map only the plugged
blocks, and map/unmap properly when plugging/unplugging blocks (including
discarding of RAM when unplugging). We achieve that by using a new notifier
mechanism that communicates changes.

It's important to map memory in the granularity in which we could see
unmaps again (-> virtio-mem block size) - so when, e.g., plugging
consecutive 100 MB with a block size of 2 MB, we need 50 mappings. When
unmapping, we can use a single vfio_unmap call for the applicable range.
We expect that the block size of virtio-mem devices will be fairly large
in the future (to not run out of mappings and to improve hot(un)plug
performance), configured by the user when used with vfio (e.g., 128 MB,
1 GB, ...) - Linux guests will still have to be optimized for that.

We try to handle errors when plugging memory (mapping in VFIO) gracefully
- especially to cope with too many mappings in VFIO.

As I basically have no experience with vfio, all I did for testing is
pass through a secondary GPU (NVIDIA GK208B) via vfio-pci to my guest
and saw it pop up in dmesg. I did *not* actually try to use it (I know
...), so there might still be plenty of BUGs regarding the actual mappings
in the code. When I resize virtio-mem devices (resulting in memory
hot(un)plug), I can spot the memory consumption of my host adjusting
accordingly - in contrast to before, whereby my machine would always
consume the maximum size of my VM, as if all memory provided by
virtio-mem devices were fully plugged.

I even tested it with 2 MB huge pages (sadly for the first time with
virtio-mem ever) - and it worked like a charm on the hypervisor side as
well. The number of free hugepages adjusted accordingly (again, did not
properly test the device in the guest ...).

If anybody wants to play with it and needs some guidance, please feel
free to ask. I might add some vfio-related documentation to
https://virtio-mem.gitlab.io/ (but it really isn't that special - only
the block size limitations have to be considered).
David Hildenbrand (6):
  memory: Introduce sparse RAM handler for memory regions
  virtio-mem: Impelement SparseRAMHandler interface
  vfio: Implement support for sparse RAM memory regions
  memory: Extend ram_block_discard_(require|disable) by two discard types
  virtio-mem: Require only RAM_BLOCK_DISCARD_T_COORDINATED discards
  vfio: Disable only RAM_BLOCK_DISCARD_T_UNCOORDINATED discards

 exec.c                         | 109 +++++++++++++++++----
 hw/vfio/common.c               | 169 ++++++++++++++++++++++++++++++++-
 hw/virtio/virtio-mem.c         | 164 +++++++++++++++++++++++++++++++-
 include/exec/memory.h          | 151 ++++++++++++++++++++++++++++-
 include/hw/vfio/vfio-common.h  |  12 +++
 include/hw/virtio/virtio-mem.h |   3 +
 softmmu/memory.c               |   7 ++
 7 files changed, 583 insertions(+), 32 deletions(-)

-- 
2.26.2
Patchew URL: https://patchew.org/QEMU/20200924160423.106747-1-david@redhat.com/

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20200924160423.106747-1-david@redhat.com
Subject: [PATCH PROTOTYPE 0/6] virtio-mem: vfio support

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update] patchew/20200922210101.4081073-1-jsnow@redhat.com -> patchew/20200922210101.4081073-1-jsnow@redhat.com
 - [tag update] patchew/20200924185414.28642-1-vsementsov@virtuozzo.com -> patchew/20200924185414.28642-1-vsementsov@virtuozzo.com
Switched to a new branch 'test'
8afd1df vfio: Disable only RAM_BLOCK_DISCARD_T_UNCOORDINATED discards
d676b32 virtio-mem: Require only RAM_BLOCK_DISCARD_T_COORDINATED discards
9492f67 memory: Extend ram_block_discard_(require|disable) by two discard types
9eeec69 vfio: Implement support for sparse RAM memory regions
3e21d3f virtio-mem: Impelement SparseRAMHandler interface
2cfc417 memory: Introduce sparse RAM handler for memory regions

=== OUTPUT BEGIN ===
1/6 Checking commit 2cfc4176fbf5 (memory: Introduce sparse RAM handler for memory regions)
ERROR: "foo* bar" should be "foo *bar"
#149: FILE: include/exec/memory.h:1952:
+static inline SparseRAMHandler* memory_region_get_sparse_ram_handler(

total: 1 errors, 0 warnings, 162 lines checked

Patch 1/6 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
2/6 Checking commit 3e21d3f59244 (virtio-mem: Impelement SparseRAMHandler interface)
3/6 Checking commit 9eeec69031b0 (vfio: Implement support for sparse RAM memory regions)
4/6 Checking commit 9492f6715512 (memory: Extend ram_block_discard_(require|disable) by two discard types)
5/6 Checking commit d676b32336b5 (virtio-mem: Require only RAM_BLOCK_DISCARD_T_COORDINATED discards)
6/6 Checking commit 8afd1df27b99 (vfio: Disable only RAM_BLOCK_DISCARD_T_UNCOORDINATED discards)
=== OUTPUT END ===

Test command exited with code: 1

The full log is available at
http://patchew.org/logs/20200924160423.106747-1-david@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
* David Hildenbrand (david@redhat.com) wrote:
> [...]
> We expect that the block size of virtio-mem devices will be fairly large
> in the future (to not run out of mappings and to improve hot(un)plug
> performance), configured by the user, when used with vfio (e.g., 128MB,
> 1G, ...) - Linux guests will still have to be optimized for that.

This seems pretty painful for those few TB mappings.
Also the calls seem pretty painful; maybe it'll be possible to have
calls that are optimised for making multiple consecutive mappings.

Dave

> [...]
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 29.09.20 19:02, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (david@redhat.com) wrote:
>> [...]
>>
>> It's important to map memory in the granularity in which we could see
>> unmaps again (-> virtio-mem block size) - so when e.g., plugging
>> consecutive 100 MB with a block size of 2MB, we need 50 mappings.
>> When unmapping, we can use a single vfio_unmap call for the applicable
>> range.
>>
>> We expect that the block size of virtio-mem devices will be fairly large
>> in the future (to not run out of mappings and to improve hot(un)plug
>> performance), configured by the user, when used with vfio (e.g., 128MB,
>> 1G, ...) - Linux guests will still have to be optimized for that.
>
> This seems pretty painful for those few TB mappings.
> Also the calls seem pretty painful; maybe it'll be possible to have
> calls that are optimised for making multiple consecutive mappings.

Exactly the future I imagine. This patchset is with no kernel interface
additions - once we have an optimized interface that understands
consecutive mappings (and the granularity), we can use that instead. The
prototype already prepared for that by notifying about consecutive
ranges. Thanks!

-- 
Thanks,

David / dhildenb