From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Hi,
  This is an RFC/WIP series that enables postcopy migration
with shared memory to a vhost-user process.
It's based off current-head + Juan's load_cleanup series, and
Alexey's bitmap series (v4).  It's very lightly tested and seems
to work, but it's quite rough.

I've modified the vhost-user-bridge (aka vhub) in qemu's tests/ to
use the new feature, since this is about the simplest client around.

Structure:

The basic idea is that near the start of postcopy, the client
opens its own userfaultfd fd and sends that back to QEMU over
the socket it's already using for VHOST_USER_* commands.
Then when VHOST_USER_SET_MEM_TABLE arrives it registers the
areas with userfaultfd and sends the mapped addresses back to QEMU.

QEMU then reads the client's UFD in its fault thread and issues
requests back to the source as needed.
QEMU also issues 'WAKE' ioctls on the UFD to let the client know
that the page has arrived and it can carry on.

A new feature (VHOST_USER_PROTOCOL_F_POSTCOPY) is added so that
QEMU knows the client can talk postcopy.
Three new messages (VHOST_USER_POSTCOPY_{ADVISE/LISTEN/END}) are
added to guide the process along.

Current known issues:
  I've not tested it with hugepages yet, and I suspect the madvises
  will need tweaking for it.

  QEMU gets to see the base addresses that the client has its
  regions mapped at; that's not great for security.

  Care is needed to avoid deadlocking; any thread in the client that
  accesses a userfault-protected page can stall.

  There's a nasty hack of a lock around the set_mem_table message.

  I've not looked at the recent IOMMU code.

  Some cleanup and a lot of corner cases need thinking about.

  There are probably plenty of unknown issues as well.

Test setup:
  I'm running on one host at the moment, with the guest
scping a large file from the host as it migrates.
The setup is based on one I found in the vhost-user setups.
You'll need a recent kernel for the shared memory support
in userfaultfd, and userfault isn't that happy if a process
using shared memory dumps core - so make sure you have the
latest fixes.

SESS=vhost
ulimit -c unlimited
tmux -L $SESS new-session -d
tmux -L $SESS set-option -g history-limit 30000
# Start a router using the system qemu
tmux -L $SESS new-window -n router ./x86_64-softmmu/qemu-system-x86_64 -M none -nographic -net socket,vlan=0,udp=localhost:4444,localaddr=localhost:5555 -net socket,vlan=0,udp=localhost:4445,localaddr=localhost:5556 -net user,vlan=0
tmux -L $SESS set-option -g set-remain-on-exit on
# Start source vhost bridge
tmux -L $SESS new-window -n srcvhostbr "./tests/vhost-user-bridge -u /tmp/vubrsrc.sock 2>src-vub-log"
sleep 0.5
tmux -L $SESS new-window -n source "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backend-file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/tmp/vubrsrc.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :0 -monitor stdio -trace events=/root/trace-file 2>src-qemu-log"
# Start dest vhost bridge
tmux -L $SESS new-window -n destvhostbr "./tests/vhost-user-bridge -u /tmp/vubrdst.sock -l 127.0.0.1:4445 -r 127.0.0.1:5556 2>dst-vub-log"
sleep 0.5
tmux -L $SESS new-window -n dest "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backend-file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/tmp/vubrdst.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :1 -monitor stdio -incoming tcp::8888 -trace events=/root/trace-file 2>dst-qemu-log"
tmux -L $SESS send-keys -t source "migrate_set_capability postcopy-ram on
tmux -L $SESS send-keys -t source "migrate_set_speed 20M
tmux -L $SESS send-keys -t dest "migrate_set_capability postcopy-ram on

then once booted:
tmux -L vhost send-keys -t source 'migrate -d tcp:0:8888^M'
tmux -L vhost send-keys -t source 'migrate_start_postcopy^M'
(Note those ^M's are actual ctrl-M's, i.e. ctrl-v ctrl-M)

Dave

Dr. David Alan Gilbert (29):
  RAMBlock/migration: Add migration flags
  migrate: Update ram_block_discard_range for shared
  qemu_ram_block_host_offset
  migration/ram: ramblock_recv_bitmap_test_byte_offset
  postcopy: use UFFDIO_ZEROPAGE only when available
  postcopy: Add notifier chain
  postcopy: Add vhost-user flag for postcopy and check it
  vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message
  vhub: Support sending fds back to qemu
  vhub: Open userfaultfd
  postcopy: Allow registering of fd handler
  vhost+postcopy: Register shared ufd with postcopy
  vhost+postcopy: Transmit 'listen' to client
  vhost+postcopy: Register new regions with the ufd
  vhost+postcopy: Send address back to qemu
  vhost+postcopy: Stash RAMBlock and offset
  vhost+postcopy: Send requests to source for shared pages
  vhost+postcopy: Resolve client address
  postcopy: wake shared
  postcopy: postcopy_notify_shared_wake
  vhost+postcopy: Add vhost waker
  vhost+postcopy: Call wakeups
  vub+postcopy: madvises
  vhost+postcopy: Lock around set_mem_table
  vhu: enable = false on get_vring_base
  vhost: Add VHOST_USER_POSTCOPY_END message
  vhost+postcopy: Wire up POSTCOPY_END notify
  postcopy: Allow shared memory
  vhost-user: Claim support for postcopy

 contrib/libvhost-user/libvhost-user.c | 178 ++++++++++++++++-
 contrib/libvhost-user/libvhost-user.h |   8 +
 exec.c                                |  44 +++--
 hw/virtio/trace-events                |  13 ++
 hw/virtio/vhost-user.c                | 293 +++++++++++++++++++++++++++-
 include/exec/cpu-common.h             |   3 +
 include/exec/ram_addr.h               |   2 +
 migration/migration.c                 |   3 +
 migration/migration.h                 |   8 +
 migration/postcopy-ram.c              | 357 +++++++++++++++++++++++++++-------
 migration/postcopy-ram.h              |  69 +++++++
 migration/ram.c                       |   5 +
 migration/ram.h                       |   1 +
 migration/savevm.c                    |  13 ++
 migration/trace-events                |   6 +
 trace-events                          |   3 +
 vl.c                                  |   4 +-
 17 files changed, 926 insertions(+), 84 deletions(-)

-- 
2.13.0
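For anyone not familiar with the userfaultfd calls involved, the client-side sequence described above boils down to roughly the following. This is only a sketch, not the libvhost-user code from the series: the function names are invented, error handling is minimal, and the SCM_RIGHTS plumbing that actually passes the fd back over the vhost-user socket is elided.

/* Sketch: what the vhost-user client does for postcopy. */
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/* On VHOST_USER_POSTCOPY_ADVISE: open a userfaultfd and enable the API.
 * The fd is then sent back to QEMU as SCM_RIGHTS ancillary data on the
 * existing vhost-user socket (elided here). */
static int open_postcopy_ufd(void)
{
    int ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    struct uffdio_api api = { .api = UFFD_API, .features = 0 };

    if (ufd < 0 || ioctl(ufd, UFFDIO_API, &api) < 0) {
        return -1;
    }
    return ufd;
}

/* On VHOST_USER_SET_MEM_TABLE (while postcopy-listening): after mmap'ing a
 * region from the fd QEMU passed, register it so that accesses to
 * not-yet-present pages generate fault messages on the ufd. */
static int register_region(int ufd, void *base, size_t len)
{
    struct uffdio_register reg = {
        .range = { .start = (uintptr_t)base, .len = len },
        .mode  = UFFDIO_REGISTER_MODE_MISSING,
    };

    if (ioctl(ufd, UFFDIO_REGISTER, &reg) < 0) {
        return -1;
    }
    /* The mapped base address goes back to QEMU in the SET_MEM_TABLE reply
     * so it can translate fault addresses into RAMBlock offsets. */
    return 0;
}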
* Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote: > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > Hi, > This is a RFC/WIP series that enables postcopy migration > with shared memory to a vhost-user process. > It's based off current-head + Juan's load_cleanup series, and > Alexey's bitmap series (v4). It's very lightly tested and seems > to work, but it's quite rough. Marc-André asked if I had a git with it all applied; so here we are: https://github.com/dagrh/qemu/commits/vhost git@github.com:dagrh/qemu.git on the vhost branch Dave > I've modified the vhost-user-bridge (aka vhub) in qemu's tests/ to > use the new feature, since this is about the simplest > client around. > > Structure: > > The basic idea is that near the start of postcopy, the client > opens its own userfaultfd fd and sends that back to QEMU over > the socket it's already using for VHUST_USER_* commands. > Then when VHOST_USER_SET_MEM_TABLE arrives it registers the > areas with userfaultfd and sends the mapped addresses back to QEMU. > > QEMU then reads the clients UFD in it's fault thread and issues > requests back to the source as needed. > QEMU also issues 'WAKE' ioctls on the UFD to let the client know > that the page has arrived and can carry on. > > A new feature (VHOST_USER_PROTOCOL_F_POSTCOPY) is added so that > the QEMU knows the client can talk postcopy. > Three new messages (VHOST_USER_POSTCOPY_{ADVISE/LISTEN/END}) are > added to guide the process along. > > Current known issues: > I've not tested it with hugepages yet; and I suspect the madvises > will need tweaking for it. > > The qemu gets to see the base addresses that the client has its > regions mapped at; that's not great for security > > Take care of deadlocking; any thread in the client that > accesses a userfault protected page can stall. > > There's a nasty hack of a lock around the set_mem_table message. > > I've not looked at the recent IOMMU code. > > Some cleanup and a lot of corner cases need thinking about. > > There are probably plenty of unknown issues as well. > > Test setup: > I'm running on one host at the moment, with the guest > scping a large file from the host as it migrates. > The setup is based on one I found in the vhost-user setups. > You'll need a recent kernel for the shared memory support > in userfaultfd, and userfault isn't that happy if a process > using shared memory core's - so make sure you have the > latest fixes. 
> > SESS=vhost > ulimit -c unlimited > tmux -L $SESS new-session -d > tmux -L $SESS set-option -g history-limit 30000 > # Start a router using the system qemu > tmux -L $SESS new-window -n router ./x86_64-softmmu/qemu-system-x86_64 -M none -nographic -net socket,vlan=0,udp=loca > lhost:4444,localaddr=localhost:5555 -net socket,vlan=0,udp=localhost:4445,localaddr=localhost:5556 -net user,vlan=0 > tmux -L $SESS set-option -g set-remain-on-exit on > # Start source vhost bridge > tmux -L $SESS new-window -n srcvhostbr "./tests/vhost-user-bridge -u /tmp/vubrsrc.sock 2>src-vub-log" > sleep 0.5 > tmux -L $SESS new-window -n source "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backe > nd-file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/ > tmp/vubrsrc.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :0 -monitor stdio -trace events=/root/trace-file 2>src-qemu-log " > # Start dest vhost bridge > tmux -L $SESS new-window -n destvhostbr "./tests/vhost-user-bridge -u /tmp/vubrdst.sock -l 127.0.0.1:4445 -r 127.0.0. > 1:5556 2>dst-vub-log" > sleep 0.5 > tmux -L $SESS new-window -n dest "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backend > -file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/tm > p/vubrdst.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :1 -monitor stdio -incoming tcp::8888 -trace events=/root/trace-file 2>dst-qemu-log" > tmux -L $SESS send-keys -t source "migrate_set_capability postcopy-ram on > tmux -L $SESS send-keys -t source "migrate_set_speed 20M > tmux -L $SESS send-keys -t dest "migrate_set_capability postcopy-ram on > > then once booted: > tmux -L vhost send-keys -t source 'migrate -d tcp:0:8888^M' > tmux -L vhost send-keys -t source 'migrate_start_postcopy^M' > (Note those ^M's are actual ctrl-M's i.e. ctrl-v ctrl-M) > > > Dave > > Dr. 
David Alan Gilbert (29): > RAMBlock/migration: Add migration flags > migrate: Update ram_block_discard_range for shared > qemu_ram_block_host_offset > migration/ram: ramblock_recv_bitmap_test_byte_offset > postcopy: use UFFDIO_ZEROPAGE only when available > postcopy: Add notifier chain > postcopy: Add vhost-user flag for postcopy and check it > vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message > vhub: Support sending fds back to qemu > vhub: Open userfaultfd > postcopy: Allow registering of fd handler > vhost+postcopy: Register shared ufd with postcopy > vhost+postcopy: Transmit 'listen' to client > vhost+postcopy: Register new regions with the ufd > vhost+postcopy: Send address back to qemu > vhost+postcopy: Stash RAMBlock and offset > vhost+postcopy: Send requests to source for shared pages > vhost+postcopy: Resolve client address > postcopy: wake shared > postcopy: postcopy_notify_shared_wake > vhost+postcopy: Add vhost waker > vhost+postcopy: Call wakeups > vub+postcopy: madvises > vhost+postcopy: Lock around set_mem_table > vhu: enable = false on get_vring_base > vhost: Add VHOST_USER_POSTCOPY_END message > vhost+postcopy: Wire up POSTCOPY_END notify > postcopy: Allow shared memory > vhost-user: Claim support for postcopy > > contrib/libvhost-user/libvhost-user.c | 178 ++++++++++++++++- > contrib/libvhost-user/libvhost-user.h | 8 + > exec.c | 44 +++-- > hw/virtio/trace-events | 13 ++ > hw/virtio/vhost-user.c | 293 +++++++++++++++++++++++++++- > include/exec/cpu-common.h | 3 + > include/exec/ram_addr.h | 2 + > migration/migration.c | 3 + > migration/migration.h | 8 + > migration/postcopy-ram.c | 357 +++++++++++++++++++++++++++------- > migration/postcopy-ram.h | 69 +++++++ > migration/ram.c | 5 + > migration/ram.h | 1 + > migration/savevm.c | 13 ++ > migration/trace-events | 6 + > trace-events | 3 + > vl.c | 4 +- > 17 files changed, 926 insertions(+), 84 deletions(-) > > -- > 2.13.0 > > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Hi On Thu, Jun 29, 2017 at 8:56 PM Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > * Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote: > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > > > Hi, > > This is a RFC/WIP series that enables postcopy migration > > with shared memory to a vhost-user process. > > It's based off current-head + Juan's load_cleanup series, and > > Alexey's bitmap series (v4). It's very lightly tested and seems > > to work, but it's quite rough. > > Marc-André asked if I had a git with it all applied; so here we are: > https://github.com/dagrh/qemu/commits/vhost > git@github.com:dagrh/qemu.git on the vhost branch > > I started looking at the series, but I am not familiar with ufd/postcopy. Could you update vhost-user.txt to describe the new messages? Otherwise, make check hangs in /x86_64/vhost-user/connect-fail (might be an unrelated regression?) Thanks -- Marc-André Lureau
* Marc-André Lureau (marcandre.lureau@gmail.com) wrote:
> Hi
>
> On Thu, Jun 29, 2017 at 8:56 PM Dr. David Alan Gilbert <dgilbert@redhat.com>
> wrote:
>
> > * Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > >
> > > Hi,
> > >   This is a RFC/WIP series that enables postcopy migration
> > > with shared memory to a vhost-user process.
> > > It's based off current-head + Juan's load_cleanup series, and
> > > Alexey's bitmap series (v4).  It's very lightly tested and seems
> > > to work, but it's quite rough.
> >
> > Marc-André asked if I had a git with it all applied; so here we are:
> > https://github.com/dagrh/qemu/commits/vhost
> > git@github.com:dagrh/qemu.git on the vhost branch
>
> I started looking at the series, but I am not familiar with ufd/postcopy.

I was similarly unfamiliar with the vhost code when I started this (which
probably shows!).
The main thing about ufd is that a process registers with the ufd system
and registers an area of memory with it; accesses to that memory then block
until the page is available: a message is sent down the ufd, and whoever
receives that message may then respond by atomically copying a page into
memory, or waking the process when it knows the page is there.
This is the first time we've tried to use userfaultfd with shared memory,
and it does need a very recent kernel for it (4.11.0 or rhel 7.4 beta).

> Could you update vhost-user.txt to describe the new messages?

See below; I'll add that in.

> Otherwise,
> make check hangs in /x86_64/vhost-user/connect-fail (might be an unrelated
> regression?) Thanks

Entirely possible I broke it; I'll have a look - at the moment I'm more
interested in comments on the structure of this set.

Dave

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index 481ab56e35..fec4cd0ffe 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -273,6 +273,14 @@
 Once the source has finished migration, rings will be stopped by
 the source. No further update must be done before rings are restarted.
 
+In postcopy migration the slave is started before all the memory has been
+received from the source host, and care must be taken to avoid accessing pages
+that have yet to be received.  The slave opens a 'userfault'-fd and registers
+the memory with it; this fd is then passed back over to the master.
+The master services requests on the userfaultfd for pages that are accessed
+and when the page is available it performs WAKE ioctls on the userfaultfd
+to wake the stalled slave.
+
 IOMMU support
 -------------
 
@@ -326,6 +334,7 @@ Protocol features
 #define VHOST_USER_PROTOCOL_F_REPLY_ACK 3
 #define VHOST_USER_PROTOCOL_F_MTU       4
 #define VHOST_USER_PROTOCOL_F_SLAVE_REQ 5
+#define VHOST_USER_PROTOCOL_F_POSTCOPY  6
 
 Master message types
 --------------------
 
@@ -402,12 +411,17 @@ Master message types
       Id: 5
       Equivalent ioctl: VHOST_SET_MEM_TABLE
       Master payload: memory regions description
+      Slave payload: (postcopy only) memory regions description
 
       Sets the memory map regions on the slave so it can translate the vring
       addresses. In the ancillary data there is an array of file descriptors
       for each memory mapped region. The size and ordering of the fds matches
       the number and ordering of memory regions.
 
+      When postcopy-listening has been received, SET_MEM_TABLE replies with
+      the bases of the memory mapped regions to the master. It must have mmap'd
+      the regions and enabled userfaultfd on them.
+
 * VHOST_USER_SET_LOG_BASE
 
       Id: 6
 
@@ -580,6 +594,29 @@ Master message types
       This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
       has been successfully negotiated.
 
+ * VHOST_USER_POSTCOPY_ADVISE
+      Id: 23
+      Master payload: N/A
+      Slave payload: userfault fd + u64
+
+      Master advises slave that a migration with postcopy enabled is underway,
+      the slave must open a userfaultfd for later use.
+      Note that at this stage the migration is still in precopy mode.
+
+ * VHOST_USER_POSTCOPY_LISTEN
+      Id: 24
+      Master payload: N/A
+
+      Master advises slave that a transition to postcopy mode has happened.
+
+ * VHOST_USER_POSTCOPY_END
+      Id: 25
+      Slave payload: u64
+
+      Master advises that postcopy migration has now completed.  The
+      slave must disable the userfaultfd.  The response is an acknowledgement
+      only.
+
 Slave message types
 -------------------

> --
> Marc-André Lureau
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
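For reference, the master-side behaviour the added text describes - reading fault events off the slave's userfaultfd and waking the slave once the page has landed - looks roughly like the sketch below. It is not the QEMU code from the series; request_page_from_source() is an invented stand-in for the real return-path request, and the address translation is only noted in a comment.

/* Sketch: master-side servicing of the slave's userfaultfd. */
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/* Stand-in for asking the migration source for the page backing this
 * address (in the real code this goes over the migration return path). */
static void request_page_from_source(uint64_t slave_addr)
{
    (void)slave_addr;
}

/* Called from the fault thread when the slave's ufd becomes readable. */
static void service_shared_fault(int slave_ufd)
{
    struct uffd_msg msg;

    if (read(slave_ufd, &msg, sizeof(msg)) != sizeof(msg) ||
        msg.event != UFFD_EVENT_PAGEFAULT) {
        return;
    }
    /* The fault address is in the slave's address space; it has to be
     * translated back to a RAMBlock/offset using the bases the slave
     * reported in its SET_MEM_TABLE reply. */
    request_page_from_source(msg.arg.pagefault.address);
}

/* Once the page is present in the shared region, unblock the slave. */
static void wake_shared(int slave_ufd, uint64_t slave_addr, size_t pagesize)
{
    struct uffdio_range range = {
        .start = slave_addr & ~(uint64_t)(pagesize - 1),
        .len   = pagesize,
    };

    ioctl(slave_ufd, UFFDIO_WAKE, &range);
}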
* Marc-André Lureau (marcandre.lureau@gmail.com) wrote: > Hi > > On Thu, Jun 29, 2017 at 8:56 PM Dr. David Alan Gilbert <dgilbert@redhat.com> > wrote: > > > * Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote: > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > > > > > Hi, > > > This is a RFC/WIP series that enables postcopy migration > > > with shared memory to a vhost-user process. > > > It's based off current-head + Juan's load_cleanup series, and > > > Alexey's bitmap series (v4). It's very lightly tested and seems > > > to work, but it's quite rough. > > > > Marc-André asked if I had a git with it all applied; so here we are: > > https://github.com/dagrh/qemu/commits/vhost > > git@github.com:dagrh/qemu.git on the vhost branch > > > > > I started looking at the series, but I am not familiar with ufd/postcopy. > Could you update vhost-user.txt to describe the new messages? Otherwise, > make check hangs in /x86_64/vhost-user/connect-fail (might be an unrelated > regression?) Thanks OK, figured that one out: the postcopy notifier's cleanup path was trying to remove the notifier when, because we were testing a failure path, it had never been added in the first place. (That was really nasty to find; for some reason those tests refuse to generate core dumps, so I ended up with a while loop doing gdb --pid $(pgrep....) to nail the qemu that was about to segfault.) Dave > -- > Marc-André Lureau -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Wed, Jun 28, 2017 at 08:00:18PM +0100, Dr. David Alan Gilbert (git) wrote: > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > Hi, > This is a RFC/WIP series that enables postcopy migration > with shared memory to a vhost-user process. > It's based off current-head + Juan's load_cleanup series, and > Alexey's bitmap series (v4). It's very lightly tested and seems > to work, but it's quite rough. > > I've modified the vhost-user-bridge (aka vhub) in qemu's tests/ to > use the new feature, since this is about the simplest > client around. > > Structure: > > The basic idea is that near the start of postcopy, the client > opens its own userfaultfd fd and sends that back to QEMU over > the socket it's already using for VHUST_USER_* commands. > Then when VHOST_USER_SET_MEM_TABLE arrives it registers the > areas with userfaultfd and sends the mapped addresses back to QEMU. > > QEMU then reads the clients UFD in it's fault thread and issues > requests back to the source as needed. > QEMU also issues 'WAKE' ioctls on the UFD to let the client know > that the page has arrived and can carry on. > > A new feature (VHOST_USER_PROTOCOL_F_POSTCOPY) is added so that > the QEMU knows the client can talk postcopy. > Three new messages (VHOST_USER_POSTCOPY_{ADVISE/LISTEN/END}) are > added to guide the process along. > > Current known issues: > I've not tested it with hugepages yet; and I suspect the madvises > will need tweaking for it. > > The qemu gets to see the base addresses that the client has its > regions mapped at; that's not great for security Not urgent to fix. > Take care of deadlocking; any thread in the client that > accesses a userfault protected page can stall. And it can happen under a lock quite easily. What exactly is proposed here? Maybe we want to reuse the new channel that the IOMMU uses. > There's a nasty hack of a lock around the set_mem_table message. Yes. > I've not looked at the recent IOMMU code. > > Some cleanup and a lot of corner cases need thinking about. > > There are probably plenty of unknown issues as well. At the protocol level, I'd like to rename the feature to USER_PAGEFAULT. Client does not really know anything about copies, it's all internal to qemu. Spec can document that it's used by qemu for postcopy. > Test setup: > I'm running on one host at the moment, with the guest > scping a large file from the host as it migrates. > The setup is based on one I found in the vhost-user setups. > You'll need a recent kernel for the shared memory support > in userfaultfd, and userfault isn't that happy if a process > using shared memory core's - so make sure you have the > latest fixes. 
> > SESS=vhost > ulimit -c unlimited > tmux -L $SESS new-session -d > tmux -L $SESS set-option -g history-limit 30000 > # Start a router using the system qemu > tmux -L $SESS new-window -n router ./x86_64-softmmu/qemu-system-x86_64 -M none -nographic -net socket,vlan=0,udp=loca > lhost:4444,localaddr=localhost:5555 -net socket,vlan=0,udp=localhost:4445,localaddr=localhost:5556 -net user,vlan=0 > tmux -L $SESS set-option -g set-remain-on-exit on > # Start source vhost bridge > tmux -L $SESS new-window -n srcvhostbr "./tests/vhost-user-bridge -u /tmp/vubrsrc.sock 2>src-vub-log" > sleep 0.5 > tmux -L $SESS new-window -n source "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backe > nd-file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/ > tmp/vubrsrc.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :0 -monitor stdio -trace events=/root/trace-file 2>src-qemu-log " > # Start dest vhost bridge > tmux -L $SESS new-window -n destvhostbr "./tests/vhost-user-bridge -u /tmp/vubrdst.sock -l 127.0.0.1:4445 -r 127.0.0. > 1:5556 2>dst-vub-log" > sleep 0.5 > tmux -L $SESS new-window -n dest "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backend > -file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/tm > p/vubrdst.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :1 -monitor stdio -incoming tcp::8888 -trace events=/root/trace-file 2>dst-qemu-log" > tmux -L $SESS send-keys -t source "migrate_set_capability postcopy-ram on > tmux -L $SESS send-keys -t source "migrate_set_speed 20M > tmux -L $SESS send-keys -t dest "migrate_set_capability postcopy-ram on > > then once booted: > tmux -L vhost send-keys -t source 'migrate -d tcp:0:8888^M' > tmux -L vhost send-keys -t source 'migrate_start_postcopy^M' > (Note those ^M's are actual ctrl-M's i.e. ctrl-v ctrl-M) > > > Dave > > Dr. 
David Alan Gilbert (29): > RAMBlock/migration: Add migration flags > migrate: Update ram_block_discard_range for shared > qemu_ram_block_host_offset > migration/ram: ramblock_recv_bitmap_test_byte_offset > postcopy: use UFFDIO_ZEROPAGE only when available > postcopy: Add notifier chain > postcopy: Add vhost-user flag for postcopy and check it > vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message > vhub: Support sending fds back to qemu > vhub: Open userfaultfd > postcopy: Allow registering of fd handler > vhost+postcopy: Register shared ufd with postcopy > vhost+postcopy: Transmit 'listen' to client > vhost+postcopy: Register new regions with the ufd > vhost+postcopy: Send address back to qemu > vhost+postcopy: Stash RAMBlock and offset > vhost+postcopy: Send requests to source for shared pages > vhost+postcopy: Resolve client address > postcopy: wake shared > postcopy: postcopy_notify_shared_wake > vhost+postcopy: Add vhost waker > vhost+postcopy: Call wakeups > vub+postcopy: madvises > vhost+postcopy: Lock around set_mem_table > vhu: enable = false on get_vring_base > vhost: Add VHOST_USER_POSTCOPY_END message > vhost+postcopy: Wire up POSTCOPY_END notify > postcopy: Allow shared memory > vhost-user: Claim support for postcopy > > contrib/libvhost-user/libvhost-user.c | 178 ++++++++++++++++- > contrib/libvhost-user/libvhost-user.h | 8 + > exec.c | 44 +++-- > hw/virtio/trace-events | 13 ++ > hw/virtio/vhost-user.c | 293 +++++++++++++++++++++++++++- > include/exec/cpu-common.h | 3 + > include/exec/ram_addr.h | 2 + > migration/migration.c | 3 + > migration/migration.h | 8 + > migration/postcopy-ram.c | 357 +++++++++++++++++++++++++++------- > migration/postcopy-ram.h | 69 +++++++ > migration/ram.c | 5 + > migration/ram.h | 1 + > migration/savevm.c | 13 ++ > migration/trace-events | 6 + > trace-events | 3 + > vl.c | 4 +- > 17 files changed, 926 insertions(+), 84 deletions(-) > > -- > 2.13.0
* Michael S. Tsirkin (mst@redhat.com) wrote: > On Wed, Jun 28, 2017 at 08:00:18PM +0100, Dr. David Alan Gilbert (git) wrote: > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > > > Hi, > > This is a RFC/WIP series that enables postcopy migration > > with shared memory to a vhost-user process. > > It's based off current-head + Juan's load_cleanup series, and > > Alexey's bitmap series (v4). It's very lightly tested and seems > > to work, but it's quite rough. > > > > I've modified the vhost-user-bridge (aka vhub) in qemu's tests/ to > > use the new feature, since this is about the simplest > > client around. > > > > Structure: > > > > The basic idea is that near the start of postcopy, the client > > opens its own userfaultfd fd and sends that back to QEMU over > > the socket it's already using for VHUST_USER_* commands. > > Then when VHOST_USER_SET_MEM_TABLE arrives it registers the > > areas with userfaultfd and sends the mapped addresses back to QEMU. > > > > QEMU then reads the clients UFD in it's fault thread and issues > > requests back to the source as needed. > > QEMU also issues 'WAKE' ioctls on the UFD to let the client know > > that the page has arrived and can carry on. > > > > A new feature (VHOST_USER_PROTOCOL_F_POSTCOPY) is added so that > > the QEMU knows the client can talk postcopy. > > Three new messages (VHOST_USER_POSTCOPY_{ADVISE/LISTEN/END}) are > > added to guide the process along. > > > > Current known issues: > > I've not tested it with hugepages yet; and I suspect the madvises > > will need tweaking for it. > > > > The qemu gets to see the base addresses that the client has its > > regions mapped at; that's not great for security > > Not urgent to fix. > > > Take care of deadlocking; any thread in the client that > > accesses a userfault protected page can stall. > > And it can happen under a lock quite easily. > What exactly is proposed here? > Maybe we want to reuse the new channel that the IOMMU uses. There's no fundamental reason to get deadlocks as long as you get it right; the qemu thread that processes the user-fault's is a separate independent thread, so once it's going the client can do whatever it likes and it will get woken up without intervention. Some care is needed around the postcopy-end; reception of the message that tells you to drop the userfault enables (which frees anything that hasn't been woken) must be allowed to happen for the postcopy complete; we take care that QEMUs fault thread lives on until that message is acknowledged. I'm more worried about how this will work in a full packet switch when one vhost-user client for an incoming migration stalls the whole switch unless care is taken about the design. How do we figure out whether this is going to fly on a full stack? That's my main reason for getting this WIP set out here to get comments. > > There's a nasty hack of a lock around the set_mem_table message. > > Yes. > > > I've not looked at the recent IOMMU code. > > > > Some cleanup and a lot of corner cases need thinking about. > > > > There are probably plenty of unknown issues as well. > > At the protocol level, I'd like to rename the feature to > USER_PAGEFAULT. Client does not really know anything about > copies, it's all internal to qemu. > Spec can document that it's used by qemu for postcopy. OK, tbh I suspect that using it for anything else would be tricky without adding more protocol features for that other use case. 
Dave > > Test setup: > > I'm running on one host at the moment, with the guest > > scping a large file from the host as it migrates. > > The setup is based on one I found in the vhost-user setups. > > You'll need a recent kernel for the shared memory support > > in userfaultfd, and userfault isn't that happy if a process > > using shared memory core's - so make sure you have the > > latest fixes. > > > > SESS=vhost > > ulimit -c unlimited > > tmux -L $SESS new-session -d > > tmux -L $SESS set-option -g history-limit 30000 > > # Start a router using the system qemu > > tmux -L $SESS new-window -n router ./x86_64-softmmu/qemu-system-x86_64 -M none -nographic -net socket,vlan=0,udp=loca > > lhost:4444,localaddr=localhost:5555 -net socket,vlan=0,udp=localhost:4445,localaddr=localhost:5556 -net user,vlan=0 > > tmux -L $SESS set-option -g set-remain-on-exit on > > # Start source vhost bridge > > tmux -L $SESS new-window -n srcvhostbr "./tests/vhost-user-bridge -u /tmp/vubrsrc.sock 2>src-vub-log" > > sleep 0.5 > > tmux -L $SESS new-window -n source "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backe > > nd-file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/ > > tmp/vubrsrc.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :0 -monitor stdio -trace events=/root/trace-file 2>src-qemu-log " > > # Start dest vhost bridge > > tmux -L $SESS new-window -n destvhostbr "./tests/vhost-user-bridge -u /tmp/vubrdst.sock -l 127.0.0.1:4445 -r 127.0.0. > > 1:5556 2>dst-vub-log" > > sleep 0.5 > > tmux -L $SESS new-window -n dest "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backend > > -file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/tm > > p/vubrdst.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :1 -monitor stdio -incoming tcp::8888 -trace events=/root/trace-file 2>dst-qemu-log" > > tmux -L $SESS send-keys -t source "migrate_set_capability postcopy-ram on > > tmux -L $SESS send-keys -t source "migrate_set_speed 20M > > tmux -L $SESS send-keys -t dest "migrate_set_capability postcopy-ram on > > > > then once booted: > > tmux -L vhost send-keys -t source 'migrate -d tcp:0:8888^M' > > tmux -L vhost send-keys -t source 'migrate_start_postcopy^M' > > (Note those ^M's are actual ctrl-M's i.e. ctrl-v ctrl-M) > > > > > > Dave > > > > Dr. 
David Alan Gilbert (29): > > RAMBlock/migration: Add migration flags > > migrate: Update ram_block_discard_range for shared > > qemu_ram_block_host_offset > > migration/ram: ramblock_recv_bitmap_test_byte_offset > > postcopy: use UFFDIO_ZEROPAGE only when available > > postcopy: Add notifier chain > > postcopy: Add vhost-user flag for postcopy and check it > > vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message > > vhub: Support sending fds back to qemu > > vhub: Open userfaultfd > > postcopy: Allow registering of fd handler > > vhost+postcopy: Register shared ufd with postcopy > > vhost+postcopy: Transmit 'listen' to client > > vhost+postcopy: Register new regions with the ufd > > vhost+postcopy: Send address back to qemu > > vhost+postcopy: Stash RAMBlock and offset > > vhost+postcopy: Send requests to source for shared pages > > vhost+postcopy: Resolve client address > > postcopy: wake shared > > postcopy: postcopy_notify_shared_wake > > vhost+postcopy: Add vhost waker > > vhost+postcopy: Call wakeups > > vub+postcopy: madvises > > vhost+postcopy: Lock around set_mem_table > > vhu: enable = false on get_vring_base > > vhost: Add VHOST_USER_POSTCOPY_END message > > vhost+postcopy: Wire up POSTCOPY_END notify > > postcopy: Allow shared memory > > vhost-user: Claim support for postcopy > > > > contrib/libvhost-user/libvhost-user.c | 178 ++++++++++++++++- > > contrib/libvhost-user/libvhost-user.h | 8 + > > exec.c | 44 +++-- > > hw/virtio/trace-events | 13 ++ > > hw/virtio/vhost-user.c | 293 +++++++++++++++++++++++++++- > > include/exec/cpu-common.h | 3 + > > include/exec/ram_addr.h | 2 + > > migration/migration.c | 3 + > > migration/migration.h | 8 + > > migration/postcopy-ram.c | 357 +++++++++++++++++++++++++++------- > > migration/postcopy-ram.h | 69 +++++++ > > migration/ram.c | 5 + > > migration/ram.h | 1 + > > migration/savevm.c | 13 ++ > > migration/trace-events | 6 + > > trace-events | 3 + > > vl.c | 4 +- > > 17 files changed, 926 insertions(+), 84 deletions(-) > > > > -- > > 2.13.0 -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Fri, Jul 07, 2017 at 01:01:56PM +0100, Dr. David Alan Gilbert wrote: > > > Take care of deadlocking; any thread in the client that > > > accesses a userfault protected page can stall. > > > > And it can happen under a lock quite easily. > > What exactly is proposed here? > > Maybe we want to reuse the new channel that the IOMMU uses. > > There's no fundamental reason to get deadlocks as long as you > get it right; the qemu thread that processes the user-fault's > is a separate independent thread, so once it's going the client > can do whatever it likes and it will get woken up without > intervention. You take a lock for the channel, then access guest memory. Then the thread that gets messages from qemu can't get on the channel to mark range as populated. > Some care is needed around the postcopy-end; reception of the > message that tells you to drop the userfault enables (which > frees anything that hasn't been woken) must be allowed to happen > for the postcopy complete; we take care that QEMUs fault > thread lives on until that message is acknowledged. > > I'm more worried about how this will work in a full packet switch > when one vhost-user client for an incoming migration stalls > the whole switch unless care is taken about the design. > How do we figure out whether this is going to fly on a full stack? It's performance though. Client could run in a separate thread for a while until migration finishes. We need to make sure there's explicit documentation that tells clients at what point they might block. > That's my main reason for getting this WIP set out here to > get comments. What will happen if QEMU dies? Is there a way to unblock the client? > > > There's a nasty hack of a lock around the set_mem_table message. > > > > Yes. > > > > > I've not looked at the recent IOMMU code. > > > > > > Some cleanup and a lot of corner cases need thinking about. > > > > > > There are probably plenty of unknown issues as well. > > > > At the protocol level, I'd like to rename the feature to > > USER_PAGEFAULT. Client does not really know anything about > > copies, it's all internal to qemu. > > Spec can document that it's used by qemu for postcopy. > > OK, tbh I suspect that using it for anything else would be tricky > without adding more protocol features for that other use case. > > Dave Why exactly? How does client have to know it's migration? -- MST
* Michael S. Tsirkin (mst@redhat.com) wrote: > On Fri, Jul 07, 2017 at 01:01:56PM +0100, Dr. David Alan Gilbert wrote: > > > > Take care of deadlocking; any thread in the client that > > > > accesses a userfault protected page can stall. > > > > > > And it can happen under a lock quite easily. > > > What exactly is proposed here? > > > Maybe we want to reuse the new channel that the IOMMU uses. > > > > There's no fundamental reason to get deadlocks as long as you > > get it right; the qemu thread that processes the user-fault's > > is a separate independent thread, so once it's going the client > > can do whatever it likes and it will get woken up without > > intervention. > > You take a lock for the channel, then access guest memory. > Then the thread that gets messages from qemu can't get > on the channel to mark range as populated. It doesn't need to get the message from qemu to know it's populated though; qemu performs a WAKE ioctl on the userfaultfd to cause it to wake, so there's no action needed by the client. (If it does need to take a lock then ye we have a problem). > > Some care is needed around the postcopy-end; reception of the > > message that tells you to drop the userfault enables (which > > frees anything that hasn't been woken) must be allowed to happen > > for the postcopy complete; we take care that QEMUs fault > > thread lives on until that message is acknowledged. > > > > I'm more worried about how this will work in a full packet switch > > when one vhost-user client for an incoming migration stalls > > the whole switch unless care is taken about the design. > > How do we figure out whether this is going to fly on a full stack? > > It's performance though. Client could run in a separate > thread for a while until migration finishes. > We need to make sure there's explicit documentation > that tells clients at what point they might block. Right. > > That's my main reason for getting this WIP set out here to > > get comments. > > What will happen if QEMU dies? Is there a way to unblock the client? If the client can detect this and close it's userfaultfd then yes; of course that detection has to be done in a thread that can't be being blocked by anything related to the userfaultfd that it might be blocked on. > > > > There's a nasty hack of a lock around the set_mem_table message. > > > > > > Yes. > > > > > > > I've not looked at the recent IOMMU code. > > > > > > > > Some cleanup and a lot of corner cases need thinking about. > > > > > > > > There are probably plenty of unknown issues as well. > > > > > > At the protocol level, I'd like to rename the feature to > > > USER_PAGEFAULT. Client does not really know anything about > > > copies, it's all internal to qemu. > > > Spec can document that it's used by qemu for postcopy. > > > > OK, tbh I suspect that using it for anything else would be tricky > > without adding more protocol features for that other use case. > > > > Dave > > Why exactly? How does client have to know it's migration? It's more the sequence I worry about; we're reliant on making sure that the userfaultfd is registered with the RAM before it's ever accessed, and we unregister at the end. This all keys in with migration requesting registration at the right point before loading the devices. Dave > -- > MST -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
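One way to implement the 'client detects QEMU death and closes its userfaultfd' idea discussed above is sketched below. This is purely illustrative and not part of the series: it assumes a dedicated watchdog thread that never touches guest memory itself, and it relies on closing the ufd dropping the registrations so any blocked accesses can continue.

/* Sketch: watchdog that unblocks the client if the master goes away. */
#include <poll.h>
#include <unistd.h>

static void watch_master(int vhost_user_sock, int ufd)
{
    /* events = 0: we only care about POLLHUP/POLLERR, which poll()
     * reports regardless, so we don't steal POLLIN from the main loop. */
    struct pollfd pfd = { .fd = vhost_user_sock, .events = 0 };

    for (;;) {
        if (poll(&pfd, 1, -1) < 0) {
            continue;
        }
        if (pfd.revents & (POLLHUP | POLLERR)) {
            /* Master died mid-postcopy: closing the ufd removes the
             * userfault registrations and lets faulting threads proceed. */
            close(ufd);
            return;
        }
    }
}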
Hello, David! Thank you for your patch set. On Wed, Jun 28, 2017 at 08:00:18PM +0100, Dr. David Alan Gilbert (git) wrote: > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > Hi, > This is a RFC/WIP series that enables postcopy migration > with shared memory to a vhost-user process. > It's based off current-head + Juan's load_cleanup series, and > Alexey's bitmap series (v4). It's very lightly tested and seems > to work, but it's quite rough. > > I've modified the vhost-user-bridge (aka vhub) in qemu's tests/ to > use the new feature, since this is about the simplest > client around. > > Structure: > > The basic idea is that near the start of postcopy, the client > opens its own userfaultfd fd and sends that back to QEMU over > the socket it's already using for VHUST_USER_* commands. > Then when VHOST_USER_SET_MEM_TABLE arrives it registers the > areas with userfaultfd and sends the mapped addresses back to QEMU. There should be only one userfault fd across all affected processes. But why are you opening the userfaultfd on the client side; why not pass the userfault fd which was opened on the QEMU side? I guess there could be several virtual switches with different ports (that's an exotic configuration, but a configuration with one QEMU, one vswitchd, and several vhost-user ports is typical), and, as an example, QEMU could be connected to these vswitches through these ports. In this case you will end up with 2 different userfault fds in QEMU. In the case of one QEMU, one vswitchd and several vhost-user ports, you keep the userfaultfd in the VuDev structure on the client side, which looks like the virtio_net sibling from DPDK, and that structure is per vhost-user connection (one per port). So from my point of view it's better to open the fd on the QEMU side and pass it the same way as the shared mem fd in SET_MEM_TABLE, but in POSTCOPY_ADVISE. > > QEMU then reads the clients UFD in it's fault thread and issues > requests back to the source as needed. > QEMU also issues 'WAKE' ioctls on the UFD to let the client know > that the page has arrived and can carry on. It's not so clear to me why QEMU has to inform the vhost client; with a single userfault fd the kernel should wake up the other faulted threads/processes. In my approach I just send information about the copied/received page to the vhost client, to be able to enable the previously disabled VRING. > > A new feature (VHOST_USER_PROTOCOL_F_POSTCOPY) is added so that > the QEMU knows the client can talk postcopy. > Three new messages (VHOST_USER_POSTCOPY_{ADVISE/LISTEN/END}) are > added to guide the process along. > > Current known issues: > I've not tested it with hugepages yet; and I suspect the madvises > will need tweaking for it. I saw you didn't change the order of the SET_MEM_TABLE call on the QEMU side; some pages have already arrived and been copied by then, so I punch a hole here according to the received map. > > The qemu gets to see the base addresses that the client has its > regions mapped at; that's not great for security > > Take care of deadlocking; any thread in the client that > accesses a userfault protected page can stall. That's why I decided to disable VRINGs, but not the way you did in GET_VRING_BASE; I send the received bitmap right after SET_MEM_TABLE. There could be a synchronization problem here, maybe similar to the one you described in "vhost+postcopy: Lock around set_mem_table". Unfortunately, my patches aren't ready yet. > > There's a nasty hack of a lock around the set_mem_table message. > > I've not looked at the recent IOMMU code. > > Some cleanup and a lot of corner cases need thinking about.
> > There are probably plenty of unknown issues as well. > > Test setup: > I'm running on one host at the moment, with the guest > scping a large file from the host as it migrates. > The setup is based on one I found in the vhost-user setups. > You'll need a recent kernel for the shared memory support > in userfaultfd, and userfault isn't that happy if a process > using shared memory core's - so make sure you have the > latest fixes. > > SESS=vhost > ulimit -c unlimited > tmux -L $SESS new-session -d > tmux -L $SESS set-option -g history-limit 30000 > # Start a router using the system qemu > tmux -L $SESS new-window -n router ./x86_64-softmmu/qemu-system-x86_64 -M none -nographic -net socket,vlan=0,udp=loca > lhost:4444,localaddr=localhost:5555 -net socket,vlan=0,udp=localhost:4445,localaddr=localhost:5556 -net user,vlan=0 > tmux -L $SESS set-option -g set-remain-on-exit on > # Start source vhost bridge > tmux -L $SESS new-window -n srcvhostbr "./tests/vhost-user-bridge -u /tmp/vubrsrc.sock 2>src-vub-log" > sleep 0.5 > tmux -L $SESS new-window -n source "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backe > nd-file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/ > tmp/vubrsrc.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :0 -monitor stdio -trace events=/root/trace-file 2>src-qemu-log " > # Start dest vhost bridge > tmux -L $SESS new-window -n destvhostbr "./tests/vhost-user-bridge -u /tmp/vubrdst.sock -l 127.0.0.1:4445 -r 127.0.0. > 1:5556 2>dst-vub-log" > sleep 0.5 > tmux -L $SESS new-window -n dest "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backend > -file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/tm > p/vubrdst.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :1 -monitor stdio -incoming tcp::8888 -trace events=/root/trace-file 2>dst-qemu-log" > tmux -L $SESS send-keys -t source "migrate_set_capability postcopy-ram on > tmux -L $SESS send-keys -t source "migrate_set_speed 20M > tmux -L $SESS send-keys -t dest "migrate_set_capability postcopy-ram on > > then once booted: > tmux -L vhost send-keys -t source 'migrate -d tcp:0:8888^M' > tmux -L vhost send-keys -t source 'migrate_start_postcopy^M' > (Note those ^M's are actual ctrl-M's i.e. ctrl-v ctrl-M) > > > Dave > > Dr. 
David Alan Gilbert (29): > RAMBlock/migration: Add migration flags > migrate: Update ram_block_discard_range for shared > qemu_ram_block_host_offset > migration/ram: ramblock_recv_bitmap_test_byte_offset > postcopy: use UFFDIO_ZEROPAGE only when available > postcopy: Add notifier chain > postcopy: Add vhost-user flag for postcopy and check it > vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message > vhub: Support sending fds back to qemu > vhub: Open userfaultfd > postcopy: Allow registering of fd handler > vhost+postcopy: Register shared ufd with postcopy > vhost+postcopy: Transmit 'listen' to client > vhost+postcopy: Register new regions with the ufd > vhost+postcopy: Send address back to qemu > vhost+postcopy: Stash RAMBlock and offset > vhost+postcopy: Send requests to source for shared pages > vhost+postcopy: Resolve client address > postcopy: wake shared > postcopy: postcopy_notify_shared_wake > vhost+postcopy: Add vhost waker > vhost+postcopy: Call wakeups > vub+postcopy: madvises > vhost+postcopy: Lock around set_mem_table > vhu: enable = false on get_vring_base > vhost: Add VHOST_USER_POSTCOPY_END message > vhost+postcopy: Wire up POSTCOPY_END notify > postcopy: Allow shared memory > vhost-user: Claim support for postcopy > > contrib/libvhost-user/libvhost-user.c | 178 ++++++++++++++++- > contrib/libvhost-user/libvhost-user.h | 8 + > exec.c | 44 +++-- > hw/virtio/trace-events | 13 ++ > hw/virtio/vhost-user.c | 293 +++++++++++++++++++++++++++- > include/exec/cpu-common.h | 3 + > include/exec/ram_addr.h | 2 + > migration/migration.c | 3 + > migration/migration.h | 8 + > migration/postcopy-ram.c | 357 +++++++++++++++++++++++++++------- > migration/postcopy-ram.h | 69 +++++++ > migration/ram.c | 5 + > migration/ram.h | 1 + > migration/savevm.c | 13 ++ > migration/trace-events | 6 + > trace-events | 3 + > vl.c | 4 +- > 17 files changed, 926 insertions(+), 84 deletions(-) > > -- > 2.13.0 > > -- BR Alexey
* Alexey (a.perevalov@samsung.com) wrote: > > Hello, David! > > Thank for you patch set. > > On Wed, Jun 28, 2017 at 08:00:18PM +0100, Dr. David Alan Gilbert (git) wrote: > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > > > Hi, > > This is a RFC/WIP series that enables postcopy migration > > with shared memory to a vhost-user process. > > It's based off current-head + Juan's load_cleanup series, and > > Alexey's bitmap series (v4). It's very lightly tested and seems > > to work, but it's quite rough. > > > > I've modified the vhost-user-bridge (aka vhub) in qemu's tests/ to > > use the new feature, since this is about the simplest > > client around. > > > > Structure: > > > > The basic idea is that near the start of postcopy, the client > > opens its own userfaultfd fd and sends that back to QEMU over > > the socket it's already using for VHUST_USER_* commands. > > Then when VHOST_USER_SET_MEM_TABLE arrives it registers the > > areas with userfaultfd and sends the mapped addresses back to QEMU. > > userfault fd should be only one per all affected processes. But > why are you opening userfaultfd on client side, why not to pass > userfault fd which was opened at QEMU side? I just checked with Andrea on the semantics, and ufd don't work like that. Any given userfaultfd only works on the address space of the process that opened it; so if you want a process to block on it's memory space it's the one that has to open the ufd. (I don't think I knew that when I wrote the code!) The nice thing about that is that you never get too confused about address spaces - any one ufd always has one address space in it's ioctls associated with one process. > I guess, it could > be several virtual switches with different ports (it's exotic > configuration, but configuration when we have one QEMU, one vswitchd, > and serveral vhost-user ports is typical), and as example, > QEMU could be connected to these vswitches through these ports. > In this case you will obtain 2 different userfault fd in QEMU. > In case of one QEMU, one vswitchd and several vhost-user ports, > you are keeping userfaultfd in VuDev structure on client side, > looks like it's virtion_net sibling from DPDK, and that structure > is per vhost-user connection (per one port). Multiple switches make sense to me actually; running two switches and having redundant routes in each VM let you live update the switch process one at a time. > So from my point of view it's better to open fd on QEMU side, and pass it > the same way as shared mem fd in SET_MEM_TABLE, but in POSTCOPY_ADVISE. Yes I see where you're coming from; but it's one address space per-ufd; If you had one ufd then you'd have to change the messages to be 'pid ... is waiting on address ....' and all the ioctls for doing wakes etc would have to gain a PID. > > > > QEMU then reads the clients UFD in it's fault thread and issues > > requests back to the source as needed. > > QEMU also issues 'WAKE' ioctls on the UFD to let the client know > > that the page has arrived and can carry on. > Not so clear for me why QEMU have to inform vhost client, > due to single userfault fd, and kernel should wake up another faulted > thread/processes. > In my approach I just to send information about copied/received page > to vhot client, to be able to enable previously disabled VRING. The client itself doesn't get notified; it's a UFFDIO_WAKE ioctl on the ufd that tells the kernel it can unblock a process that's trying to access the page. 
(Their is potential to remove some of that - if we can get the kernel to wake all the waiters for a physical page when a UFFDIO_COPY is done it would remove a lot of those). > > A new feature (VHOST_USER_PROTOCOL_F_POSTCOPY) is added so that > > the QEMU knows the client can talk postcopy. > > Three new messages (VHOST_USER_POSTCOPY_{ADVISE/LISTEN/END}) are > > added to guide the process along. > > > > Current known issues: > > I've not tested it with hugepages yet; and I suspect the madvises > > will need tweaking for it. > I saw you didn't change order of SET_MEM_TABLE call in QEMU side, > some part or pages already arrived and copied, so I'm doing > hole here according to received map. right, so I'm assuming they'll hit ufd faults and be immediately WAKEd when I find the bit is set in the received-bitmap. > > The qemu gets to see the base addresses that the client has its > > regions mapped at; that's not great for security > > > > Take care of deadlocking; any thread in the client that > > accesses a userfault protected page can stall. > That's why I decided to disable VRINGs, but not the way as you did > in GET_VRING_BASE, I send received bitmap, right after SET_MEM_TABLE, > here could be synchronization problem, maybe similar problem as you described in > "vhost+postcopy: Lock around set_mem_table" > > Unfortunately, my patches isn't yet ready. That's OK; these patches just-about work; only enough for me to post them and ask for opinions. Dave > > > > There's a nasty hack of a lock around the set_mem_table message. > > > > I've not looked at the recent IOMMU code. > > > > Some cleanup and a lot of corner cases need thinking about. > > > > There are probably plenty of unknown issues as well. > > > > Test setup: > > I'm running on one host at the moment, with the guest > > scping a large file from the host as it migrates. > > The setup is based on one I found in the vhost-user setups. > > You'll need a recent kernel for the shared memory support > > in userfaultfd, and userfault isn't that happy if a process > > using shared memory core's - so make sure you have the > > latest fixes. > > > > SESS=vhost > > ulimit -c unlimited > > tmux -L $SESS new-session -d > > tmux -L $SESS set-option -g history-limit 30000 > > # Start a router using the system qemu > > tmux -L $SESS new-window -n router ./x86_64-softmmu/qemu-system-x86_64 -M none -nographic -net socket,vlan=0,udp=loca > > lhost:4444,localaddr=localhost:5555 -net socket,vlan=0,udp=localhost:4445,localaddr=localhost:5556 -net user,vlan=0 > > tmux -L $SESS set-option -g set-remain-on-exit on > > # Start source vhost bridge > > tmux -L $SESS new-window -n srcvhostbr "./tests/vhost-user-bridge -u /tmp/vubrsrc.sock 2>src-vub-log" > > sleep 0.5 > > tmux -L $SESS new-window -n source "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backe > > nd-file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/ > > tmp/vubrsrc.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :0 -monitor stdio -trace events=/root/trace-file 2>src-qemu-log " > > # Start dest vhost bridge > > tmux -L $SESS new-window -n destvhostbr "./tests/vhost-user-bridge -u /tmp/vubrdst.sock -l 127.0.0.1:4445 -r 127.0.0. 
> > 1:5556 2>dst-vub-log" > > sleep 0.5 > > tmux -L $SESS new-window -n dest "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backend > > -file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/tm > > p/vubrdst.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :1 -monitor stdio -incoming tcp::8888 -trace events=/root/trace-file 2>dst-qemu-log" > > tmux -L $SESS send-keys -t source "migrate_set_capability postcopy-ram on > > tmux -L $SESS send-keys -t source "migrate_set_speed 20M > > tmux -L $SESS send-keys -t dest "migrate_set_capability postcopy-ram on > > > > then once booted: > > tmux -L vhost send-keys -t source 'migrate -d tcp:0:8888^M' > > tmux -L vhost send-keys -t source 'migrate_start_postcopy^M' > > (Note those ^M's are actual ctrl-M's i.e. ctrl-v ctrl-M) > > > > > > Dave > > > > Dr. David Alan Gilbert (29): > > RAMBlock/migration: Add migration flags > > migrate: Update ram_block_discard_range for shared > > qemu_ram_block_host_offset > > migration/ram: ramblock_recv_bitmap_test_byte_offset > > postcopy: use UFFDIO_ZEROPAGE only when available > > postcopy: Add notifier chain > > postcopy: Add vhost-user flag for postcopy and check it > > vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message > > vhub: Support sending fds back to qemu > > vhub: Open userfaultfd > > postcopy: Allow registering of fd handler > > vhost+postcopy: Register shared ufd with postcopy > > vhost+postcopy: Transmit 'listen' to client > > vhost+postcopy: Register new regions with the ufd > > vhost+postcopy: Send address back to qemu > > vhost+postcopy: Stash RAMBlock and offset > > vhost+postcopy: Send requests to source for shared pages > > vhost+postcopy: Resolve client address > > postcopy: wake shared > > postcopy: postcopy_notify_shared_wake > > vhost+postcopy: Add vhost waker > > vhost+postcopy: Call wakeups > > vub+postcopy: madvises > > vhost+postcopy: Lock around set_mem_table > > vhu: enable = false on get_vring_base > > vhost: Add VHOST_USER_POSTCOPY_END message > > vhost+postcopy: Wire up POSTCOPY_END notify > > postcopy: Allow shared memory > > vhost-user: Claim support for postcopy > > > > contrib/libvhost-user/libvhost-user.c | 178 ++++++++++++++++- > > contrib/libvhost-user/libvhost-user.h | 8 + > > exec.c | 44 +++-- > > hw/virtio/trace-events | 13 ++ > > hw/virtio/vhost-user.c | 293 +++++++++++++++++++++++++++- > > include/exec/cpu-common.h | 3 + > > include/exec/ram_addr.h | 2 + > > migration/migration.c | 3 + > > migration/migration.h | 8 + > > migration/postcopy-ram.c | 357 +++++++++++++++++++++++++++------- > > migration/postcopy-ram.h | 69 +++++++ > > migration/ram.c | 5 + > > migration/ram.h | 1 + > > migration/savevm.c | 13 ++ > > migration/trace-events | 6 + > > trace-events | 3 + > > vl.c | 4 +- > > 17 files changed, 926 insertions(+), 84 deletions(-) > > > > -- > > 2.13.0 > > > > > > -- > > BR > Alexey -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
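For reference, the UFFDIO_COPY side mentioned above: on QEMU's own userfaultfd the atomic copy both places the page and wakes that fd's waiters, which is why waiters blocked via the slave's separate ufd still need the explicit UFFDIO_WAKE. A minimal sketch (not the QEMU code) of how a received page is placed:

/* Sketch: placing a received page via one's own userfaultfd. */
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

static int place_page(int ufd, void *host_addr, void *from, size_t pagesize)
{
    struct uffdio_copy copy = {
        .dst  = (uintptr_t)host_addr,
        .src  = (uintptr_t)from,
        .len  = pagesize,
        .mode = 0,   /* mode 0: also wake waiters on *this* ufd */
    };

    /* Atomically copies the page into place; fails (errno == EEXIST)
     * if the page was already populated. */
    return ioctl(ufd, UFFDIO_COPY, &copy);
}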
On Mon, Jul 03, 2017 at 05:49:26PM +0100, Dr. David Alan Gilbert wrote:
> * Alexey (a.perevalov@samsung.com) wrote:
> >
> > Hello, David!
> >
> > Thank you for the patch set.
> >
> > On Wed, Jun 28, 2017 at 08:00:18PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > >
> > > The basic idea is that near the start of postcopy, the client
> > > opens its own userfaultfd fd and sends that back to QEMU over
> > > the socket it's already using for VHOST_USER_* commands.
> > > Then when VHOST_USER_SET_MEM_TABLE arrives it registers the
> > > areas with userfaultfd and sends the mapped addresses back to QEMU.
> >
> > The userfault fd should be only one for all the affected processes. But
> > why are you opening the userfaultfd on the client side, why not pass
> > the userfault fd which was opened on the QEMU side?
>
> I just checked with Andrea on the semantics, and ufds don't work like that.
> Any given userfaultfd only works on the address space of the process
> that opened it; so if you want a process to block on its memory space
> it's the one that has to open the ufd.

Yes, it's obtained from the vma in handle_userfault():

    ctx = vmf->vma->vm_userfaultfd_ctx.ctx;

so it's per-vma, and it is set into the vma

    vma->vm_userfaultfd_ctx.ctx = ctx;

in userfaultfd_register(struct userfaultfd_ctx *ctx, ...), but the ctx passed
into userfaultfd_register() comes from

    struct userfaultfd_ctx *ctx = file->private_data;

Because the file descriptor can be transferred over a unix domain socket
(SOL_SOCKET), it's logical to assume the userfaultfd context would be the same.

> (I don't think I knew that when I wrote the code!)
> The nice thing about that is that you never get too confused about
> address spaces - any one ufd always has one address space in its ioctls,
> associated with one process.
>
> > I guess there could be several virtual switches with different ports
> > (it's an exotic configuration, but a configuration with one QEMU, one
> > vswitchd, and several vhost-user ports is typical), and, as an example,
> > QEMU could be connected to these vswitches through these ports.
> > In this case you will end up with 2 different userfault fds in QEMU.
> > In the case of one QEMU, one vswitchd and several vhost-user ports,
> > you are keeping the userfaultfd in the VuDev structure on the client
> > side (it looks like the virtio_net sibling from DPDK), and that
> > structure is per vhost-user connection (per port).
>
> Multiple switches make sense to me actually; running two switches
> and having redundant routes in each VM lets you live-update the switch
> processes one at a time.
>
> > So from my point of view it's better to open the fd on the QEMU side,
> > and pass it the same way as the shared mem fd in SET_MEM_TABLE,
> > but in POSTCOPY_ADVISE.
>
> Yes I see where you're coming from; but it's one address space per ufd;
> if you had one ufd then you'd have to change the messages to be
> 'pid ... is waiting on address ....'
> and all the ioctls for doing wakes etc would have to gain a PID.
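A minimal sketch of the client-side mechanics being discussed here, assuming only the userfaultfd(2) API: because a userfaultfd covers only the address space of the process that opened it, the client opens the fd itself (around POSTCOPY_ADVISE time in this series) and registers its own mapping of each shared region (when SET_MEM_TABLE arrives). client_open_and_register() is an illustrative name, not a function from the series, and the two steps are shown together here only for brevity.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

static int client_open_and_register(void *region_base, size_t region_len)
{
    /* The fd is tied to *this* process's address space. */
    int ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (ufd < 0) {
        return -1;
    }

    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    if (ioctl(ufd, UFFDIO_API, &api) < 0) {
        close(ufd);
        return -1;
    }

    /* Register the client's own mapping of the shared region. */
    struct uffdio_register reg = {
        .range = { .start = (uintptr_t)region_base, .len = region_len },
        .mode  = UFFDIO_REGISTER_MODE_MISSING,
    };
    if (ioctl(ufd, UFFDIO_REGISTER, &reg) < 0) {
        close(ufd);
        return -1;
    }

    /* ...the fd is then sent back to QEMU over the vhost-user socket. */
    return ufd;
}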
> > > QEMU then reads the client's UFD in its fault thread and issues
> > > requests back to the source as needed.
> > > QEMU also issues 'WAKE' ioctls on the UFD to let the client know
> > > that the page has arrived and can carry on.
> >
> > It's not clear to me why QEMU has to inform the vhost client: with a
> > single userfault fd the kernel should wake up the other faulted
> > threads/processes.
> > In my approach I just send information about copied/received pages
> > to the vhost client, to be able to re-enable the previously disabled
> > VRINGs.
>
> The client itself doesn't get notified; it's a UFFDIO_WAKE ioctl
> on the ufd that tells the kernel it can unblock a process that's
> trying to access the page.
> (There is potential to remove some of that - if we can get the
> kernel to wake all the waiters for a physical page when a UFFDIO_COPY
> is done it would remove a lot of those).
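For context, the "extra" wakes mentioned above come from the fact that placing the page and unblocking the client are two separate steps on two separate userfaultfds: the copy goes in through QEMU's own ufd, while the client blocked on its own ufd still has to be woken explicitly. A rough sketch, illustrative only (place_and_wake() is not a function from the series):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

static int place_and_wake(int qemu_ufd, int client_ufd,
                          uint64_t qemu_addr, uint64_t client_addr,
                          const void *page, uint64_t page_size)
{
    /* Place the page via QEMU's userfaultfd / mapping of the shared region. */
    struct uffdio_copy copy = {
        .dst  = qemu_addr,
        .src  = (uintptr_t)page,
        .len  = page_size,
        .mode = 0,
    };
    if (ioctl(qemu_ufd, UFFDIO_COPY, &copy) < 0) {
        return -1;
    }

    /* Waiters blocked on the client's separate ufd are not released by the
     * copy above, so wake them for the same page in the client's mapping. */
    struct uffdio_range range = { .start = client_addr, .len = page_size };
    return ioctl(client_ufd, UFFDIO_WAKE, &range);
}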
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

--
BR
Alexey
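As background to the "vhub: Support sending fds back to qemu" step in the series, a minimal sketch of passing a file descriptor (such as the client's userfaultfd) back over the vhost-user unix socket using SCM_RIGHTS ancillary data; send_fd() is an illustrative name, not the series' code:

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one fd (e.g. the client's userfaultfd) alongside a message payload. */
static int send_fd(int sock, const void *payload, size_t len, int fd)
{
    struct iovec iov = { .iov_base = (void *)payload, .iov_len = len };
    char control[CMSG_SPACE(sizeof(int))];
    memset(control, 0, sizeof(control));

    struct msghdr msg = {
        .msg_iov        = &iov,
        .msg_iovlen     = 1,
        .msg_control    = control,
        .msg_controllen = sizeof(control),
    };

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type  = SCM_RIGHTS;
    cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}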