[Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
Posted by Dr. David Alan Gilbert (git) 6 years, 10 months ago
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Hi,
  This is a RFC/WIP series that enables postcopy migration
with shared memory to a vhost-user process.
It's based off current-head + Juan's load_cleanup series, and
Alexey's bitmap series (v4).  It's very lightly tested and seems
to work, but it's quite rough.

I've modified the vhost-user-bridge (aka vhub) in qemu's tests/ to
use the new feature, since this is about the simplest
client around.

Structure:

The basic idea is that near the start of postcopy, the client
opens its own userfaultfd fd and sends that back to QEMU over
the socket it's already using for VHOST_USER_* commands.
Then when VHOST_USER_SET_MEM_TABLE arrives it registers the
areas with userfaultfd and sends the mapped addresses back to QEMU.
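
To illustrate the client side (a minimal sketch only, assuming the
kernel's userfaultfd API; error handling is omitted and
'region_base'/'region_len' are hypothetical names rather than the real
libvhost-user fields):

#include <fcntl.h>
#include <stdint.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* At POSTCOPY_ADVISE time: open the ufd and do the API handshake;
   this is the fd that gets sent back to QEMU. */
static int open_postcopy_ufd(void)
{
    int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    struct uffdio_api api = { .api = UFFD_API };

    ioctl(uffd, UFFDIO_API, &api);
    return uffd;
}

/* At SET_MEM_TABLE time: register each mmap'd region for
   missing-page faults, then report its base back to QEMU. */
static void register_region(int uffd, void *region_base, uint64_t region_len)
{
    struct uffdio_register reg = {
        .range = { .start = (uintptr_t)region_base, .len = region_len },
        .mode  = UFFDIO_REGISTER_MODE_MISSING,
    };
    ioctl(uffd, UFFDIO_REGISTER, &reg);
}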

QEMU then reads the client's UFD in its fault thread and issues
requests back to the source as needed.
QEMU also issues 'WAKE' ioctls on the UFD to let the client know
that the page has arrived and it can carry on.
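
Conceptually that fault-thread loop is something like this (again only a
sketch; 'page_arrived' and 'request_page_from_source' are hypothetical
helpers standing in for the real migration code):

#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

extern int  page_arrived(uint64_t addr);               /* hypothetical */
extern void request_page_from_source(uint64_t addr);   /* hypothetical */

static void handle_client_fault(int client_uffd, uint64_t page_size)
{
    struct uffd_msg msg;

    if (read(client_uffd, &msg, sizeof(msg)) != sizeof(msg) ||
        msg.event != UFFD_EVENT_PAGEFAULT) {
        return;
    }
    uint64_t addr = msg.arg.pagefault.address & ~(page_size - 1);

    if (page_arrived(addr)) {
        /* Page already placed: just wake the stalled client thread */
        struct uffdio_range r = { .start = addr, .len = page_size };
        ioctl(client_uffd, UFFDIO_WAKE, &r);
    } else {
        /* Otherwise ask the source; the WAKE happens once it lands */
        request_page_from_source(addr);
    }
}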

A new feature (VHOST_USER_PROTOCOL_F_POSTCOPY) is added so that
the QEMU knows the client can talk postcopy.
Three new messages (VHOST_USER_POSTCOPY_{ADVISE/LISTEN/END}) are
added to guide the process along.
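
In header form the additions are (values as given in the vhost-user.txt
update quoted later in this thread):

#define VHOST_USER_PROTOCOL_F_POSTCOPY 6

enum {
    VHOST_USER_POSTCOPY_ADVISE = 23,  /* master -> slave: open your ufd  */
    VHOST_USER_POSTCOPY_LISTEN = 24,  /* entering postcopy mode          */
    VHOST_USER_POSTCOPY_END    = 25,  /* postcopy done; drop userfaultfd */
};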

Current known issues:
   I've not tested it with hugepages yet; and I suspect the madvises
   will need tweaking for it.

   The qemu gets to see the base addresses that the client has its
   regions mapped at; that's not great for security

   Take care of deadlocking; any thread in the client that
   accesses a userfault protected page can stall.

   There's a nasty hack of a lock around the set_mem_table message.

   I've not looked at the recent IOMMU code.

   Some cleanup and a lot of corner cases need thinking about.

   There are probably plenty of unknown issues as well.

Test setup:
  I'm running on one host at the moment, with the guest
  scping a large file from the host as it migrates.
  The setup is based on one I found in the vhost-user setups.
  You'll need a recent kernel for the shared memory support
  in userfaultfd, and userfault isn't that happy if a process
  using shared memory dumps core - so make sure you have the
  latest fixes.

SESS=vhost
ulimit -c unlimited
tmux -L $SESS new-session -d
tmux -L $SESS set-option -g history-limit 30000
# Start a router using the system qemu
tmux -L $SESS new-window -n router ./x86_64-softmmu/qemu-system-x86_64 -M none -nographic -net socket,vlan=0,udp=localhost:4444,localaddr=localhost:5555 -net socket,vlan=0,udp=localhost:4445,localaddr=localhost:5556 -net user,vlan=0
tmux -L $SESS set-option -g set-remain-on-exit on
# Start source vhost bridge
tmux -L $SESS new-window -n srcvhostbr "./tests/vhost-user-bridge -u /tmp/vubrsrc.sock 2>src-vub-log"
sleep 0.5
tmux -L $SESS new-window -n source "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backend-file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/tmp/vubrsrc.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :0 -monitor stdio -trace events=/root/trace-file 2>src-qemu-log "
# Start dest vhost bridge
tmux -L $SESS new-window -n destvhostbr "./tests/vhost-user-bridge -u /tmp/vubrdst.sock -l 127.0.0.1:4445 -r 127.0.0.1:5556 2>dst-vub-log"
sleep 0.5
tmux -L $SESS new-window -n dest "./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 8G -smp 2 -object memory-backend-file,id=mem,size=8G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -chardev socket,id=char0,path=/tmp/vubrdst.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 my.qcow2 -net none -vnc :1 -monitor stdio -incoming tcp::8888 -trace events=/root/trace-file 2>dst-qemu-log"
tmux -L $SESS send-keys -t source "migrate_set_capability postcopy-ram on
tmux -L $SESS send-keys -t source "migrate_set_speed 20M
tmux -L $SESS send-keys -t dest "migrate_set_capability postcopy-ram on

then once booted:
tmux -L vhost send-keys -t source 'migrate -d tcp:0:8888^M'
tmux -L vhost send-keys -t source 'migrate_start_postcopy^M'
(Note those ^M's are actual ctrl-M's i.e. ctrl-v ctrl-M)


Dave

Dr. David Alan Gilbert (29):
  RAMBlock/migration: Add migration flags
  migrate: Update ram_block_discard_range for shared
  qemu_ram_block_host_offset
  migration/ram: ramblock_recv_bitmap_test_byte_offset
  postcopy: use UFFDIO_ZEROPAGE only when available
  postcopy: Add notifier chain
  postcopy: Add vhost-user flag for postcopy and check it
  vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message
  vhub: Support sending fds back to qemu
  vhub: Open userfaultfd
  postcopy: Allow registering of fd handler
  vhost+postcopy: Register shared ufd with postcopy
  vhost+postcopy: Transmit 'listen' to client
  vhost+postcopy: Register new regions with the ufd
  vhost+postcopy: Send address back to qemu
  vhost+postcopy: Stash RAMBlock and offset
  vhost+postcopy: Send requests to source for shared pages
  vhost+postcopy: Resolve client address
  postcopy: wake shared
  postcopy: postcopy_notify_shared_wake
  vhost+postcopy: Add vhost waker
  vhost+postcopy: Call wakeups
  vub+postcopy: madvises
  vhost+postcopy: Lock around set_mem_table
  vhu: enable = false on get_vring_base
  vhost: Add VHOST_USER_POSTCOPY_END message
  vhost+postcopy: Wire up POSTCOPY_END notify
  postcopy: Allow shared memory
  vhost-user: Claim support for postcopy

 contrib/libvhost-user/libvhost-user.c | 178 ++++++++++++++++-
 contrib/libvhost-user/libvhost-user.h |   8 +
 exec.c                                |  44 +++--
 hw/virtio/trace-events                |  13 ++
 hw/virtio/vhost-user.c                | 293 +++++++++++++++++++++++++++-
 include/exec/cpu-common.h             |   3 +
 include/exec/ram_addr.h               |   2 +
 migration/migration.c                 |   3 +
 migration/migration.h                 |   8 +
 migration/postcopy-ram.c              | 357 +++++++++++++++++++++++++++-------
 migration/postcopy-ram.h              |  69 +++++++
 migration/ram.c                       |   5 +
 migration/ram.h                       |   1 +
 migration/savevm.c                    |  13 ++
 migration/trace-events                |   6 +
 trace-events                          |   3 +
 vl.c                                  |   4 +-
 17 files changed, 926 insertions(+), 84 deletions(-)

-- 
2.13.0


Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
Posted by Dr. David Alan Gilbert 6 years, 10 months ago
* Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Hi,
>   This is a RFC/WIP series that enables postcopy migration
> with shared memory to a vhost-user process.
> It's based off current-head + Juan's load_cleanup series, and
> Alexey's bitmap series (v4).  It's very lightly tested and seems
> to work, but it's quite rough.

Marc-André asked if I had a git with it all applied; so here we are:
https://github.com/dagrh/qemu/commits/vhost
git@github.com:dagrh/qemu.git on the vhost branch

Dave

> [...]
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
Posted by Marc-André Lureau 6 years, 9 months ago
Hi

On Thu, Jun 29, 2017 at 8:56 PM Dr. David Alan Gilbert <dgilbert@redhat.com>
wrote:

> * Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Hi,
> >   This is a RFC/WIP series that enables postcopy migration
> > with shared memory to a vhost-user process.
> > It's based off current-head + Juan's load_cleanup series, and
> > Alexey's bitmap series (v4).  It's very lightly tested and seems
> > to work, but it's quite rough.
>
> Marc-André asked if I had a git with it all applied; so here we are:
> https://github.com/dagrh/qemu/commits/vhost
> git@github.com:dagrh/qemu.git on the vhost branch
>
>
I started looking at the series, but I am not familiar with ufd/postcopy.
Could you update vhost-user.txt to describe the new messages? Otherwise,
make check hangs in /x86_64/vhost-user/connect-fail (might be an unrelated
regression?) Thanks
-- 
Marc-André Lureau
Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
Posted by Dr. David Alan Gilbert 6 years, 9 months ago
* Marc-André Lureau (marcandre.lureau@gmail.com) wrote:
> Hi
> 
> On Thu, Jun 29, 2017 at 8:56 PM Dr. David Alan Gilbert <dgilbert@redhat.com>
> wrote:
> 
> > * Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > >
> > > Hi,
> > >   This is a RFC/WIP series that enables postcopy migration
> > > with shared memory to a vhost-user process.
> > > It's based off current-head + Juan's load_cleanup series, and
> > > Alexey's bitmap series (v4).  It's very lightly tested and seems
> > > to work, but it's quite rough.
> >
> > Marc-André asked if I had a git with it all applied; so here we are:
> > https://github.com/dagrh/qemu/commits/vhost
> > git@github.com:dagrh/qemu.git on the vhost branch
> >
> >
> I started looking at the series, but I am not familiar with ufd/postcopy.

I was similarly unfamiliar with the vhost code when I started this
(which probably shows!).
The main thing about ufd is that a process registers with the ufd system
and registers an area of memory with it; accesses to that memory then
block until the page is available.  On a fault, a message is sent down
the ufd, and whoever receives that message may then respond by atomically
copying a page into memory, or waking the process once it knows the page
is there.
This is the first time we've tried to use userfaultfd with shared memory,
and it does need a very recent kernel for it (4.11.0 or the RHEL 7.4 beta).
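
The 'atomically copying' part is the UFFDIO_COPY ioctl; roughly (a
sketch, error handling omitted and the names hypothetical):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Atomically place one received page; by default this also wakes
   anyone stalled on it via this ufd. */
static int place_page(int uffd, uint64_t fault_addr, void *page_buf,
                      uint64_t page_size)
{
    struct uffdio_copy copy = {
        .dst  = fault_addr & ~(page_size - 1),
        .src  = (uintptr_t)page_buf,
        .len  = page_size,
        .mode = 0,  /* UFFDIO_COPY_MODE_DONTWAKE would defer the wake */
    };
    return ioctl(uffd, UFFDIO_COPY, &copy);
}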

> Could you update vhost-user.txt to describe the new messages?

See below; I'll add that in.

> Otherwise,
> make check hangs in /x86_64/vhost-user/connect-fail (might be an unrelated
> regression?) Thanks

Entirely possible I broke it; I'll have a look - at the moment I'm more
interested in comments on the structure of this set.

Dave

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index 481ab56e35..fec4cd0ffe 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -273,6 +273,14 @@ Once the source has finished migration, rings will be stopped by
 the source. No further update must be done before rings are
 restarted.

+In postcopy migration the slave is started before all the memory has been
+received from the source host, and care must be taken to avoid accessing pages
+that have yet to be received.  The slave opens a 'userfault'-fd and registers
+the memory with it; this fd is then passed back over to the master.
+The master services requests on the userfaultfd for pages that are accessed
+and when the page is available it performs WAKE ioctls on the userfaultfd
+to wake the stalled slave.
+
 IOMMU support
 -------------

@@ -326,6 +334,7 @@ Protocol features
 #define VHOST_USER_PROTOCOL_F_REPLY_ACK      3
 #define VHOST_USER_PROTOCOL_F_MTU            4
 #define VHOST_USER_PROTOCOL_F_SLAVE_REQ      5
+#define VHOST_USER_PROTOCOL_F_POSTCOPY       6

 Master message types
 --------------------
@@ -402,12 +411,17 @@ Master message types
       Id: 5
       Equivalent ioctl: VHOST_SET_MEM_TABLE
       Master payload: memory regions description
+      Slave payload: (postcopy only) memory regions description

       Sets the memory map regions on the slave so it can translate the vring
       addresses. In the ancillary data there is an array of file descriptors
       for each memory mapped region. The size and ordering of the fds matches
       the number and ordering of memory regions.

+      When postcopy-listening has been received, SET_MEM_TABLE replies with
+      the bases of the memory mapped regions to the master.  It must have mmap'd
+      the regions and enabled userfaultfd on them.
+
  * VHOST_USER_SET_LOG_BASE

       Id: 6
@@ -580,6 +594,29 @@ Master message types
       This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
       has been successfully negotiated.

+ * VHOST_USER_POSTCOPY_ADVISE
+      Id: 23
+      Master payload: N/A
+      Slave payload: userfault fd + u64
+
+      Master advises slave that a migration with postcopy enabled is underway,
+      the slave must open a userfaultfd for later use.
+      Note that at this stage the migration is still in precopy mode.
+
+ * VHOST_USER_POSTCOPY_LISTEN
+      Id: 24
+      Master payload: N/A
+
+      Master advises slave that a transition to postcopy mode has happened.
+
+ * VHOST_USER_POSTCOPY_END
+      Id: 25
+      Slave payload: u64
+
+      Master advises that postcopy migration has now completed.  The
+      slave must disable the userfaultfd. The response is an acknowledgement
+      only.
+
 Slave message types
 -------------------
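
For the 'userfault fd' payload in the ADVISE reply, the slave hands the
fd back as ancillary data on the existing socket; the usual SCM_RIGHTS
pattern, sketched generically here rather than as the actual
libvhost-user code:

#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send 'len' bytes of message plus one fd over a connected unix socket. */
static int send_with_fd(int sock, const void *buf, size_t len, int fd)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    char control[CMSG_SPACE(sizeof(int))];
    struct msghdr mh = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = control, .msg_controllen = sizeof(control),
    };
    struct cmsghdr *cm;

    memset(control, 0, sizeof(control));
    cm = CMSG_FIRSTHDR(&mh);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type  = SCM_RIGHTS;
    cm->cmsg_len   = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));
    return sendmsg(sock, &mh, 0) == (ssize_t)len ? 0 : -1;
}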


> -- 
> Marc-André Lureau
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
Posted by Dr. David Alan Gilbert 6 years, 9 months ago
* Marc-André Lureau (marcandre.lureau@gmail.com) wrote:
> Hi
> 
> On Thu, Jun 29, 2017 at 8:56 PM Dr. David Alan Gilbert <dgilbert@redhat.com>
> wrote:
> 
> > * Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > >
> > > Hi,
> > >   This is a RFC/WIP series that enables postcopy migration
> > > with shared memory to a vhost-user process.
> > > It's based off current-head + Juan's load_cleanup series, and
> > > Alexey's bitmap series (v4).  It's very lightly tested and seems
> > > to work, but it's quite rough.
> >
> > Marc-André asked if I had a git with it all applied; so here we are:
> > https://github.com/dagrh/qemu/commits/vhost
> > git@github.com:dagrh/qemu.git on the vhost branch
> >
> >
> I started looking at the series, but I am not familiar with ufd/postcopy.
> Could you update vhost-user.txt to describe the new messages? Otherwise,
> make check hangs in /x86_64/vhost-user/connect-fail (might be an unrelated
> regression?) Thanks

OK, figured that one out; it was a cleanup path for the postcopy
notifier trying to remove the notifier when, because we were testing a
failure path, the notifier hadn't been added in the first place.

(That was really nasty to find; for some reason those tests refuse to
generate core dumps, so I ended up with a while loop doing gdb --pid
$(pgrep....) to nail the qemu that was about to segfault.)

Dave

> -- 
> Marc-André Lureau
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
Posted by Michael S. Tsirkin 6 years, 9 months ago
On Wed, Jun 28, 2017 at 08:00:18PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Hi,
>   This is a RFC/WIP series that enables postcopy migration
> with shared memory to a vhost-user process.
> It's based off current-head + Juan's load_cleanup series, and
> Alexey's bitmap series (v4).  It's very lightly tested and seems
> to work, but it's quite rough.
> 
> I've modified the vhost-user-bridge (aka vhub) in qemu's tests/ to
> use the new feature, since this is about the simplest
> client around.
> 
> Structure:
> 
> The basic idea is that near the start of postcopy, the client
> opens its own userfaultfd fd and sends that back to QEMU over
> the socket it's already using for VHOST_USER_* commands.
> Then when VHOST_USER_SET_MEM_TABLE arrives it registers the
> areas with userfaultfd and sends the mapped addresses back to QEMU.
> 
> QEMU then reads the client's UFD in its fault thread and issues
> requests back to the source as needed.
> QEMU also issues 'WAKE' ioctls on the UFD to let the client know
> that the page has arrived and it can carry on.
> 
> A new feature (VHOST_USER_PROTOCOL_F_POSTCOPY) is added so that
> the QEMU knows the client can talk postcopy.
> Three new messages (VHOST_USER_POSTCOPY_{ADVISE/LISTEN/END}) are
> added to guide the process along.
> 
> Current known issues:
>    I've not tested it with hugepages yet; and I suspect the madvises
>    will need tweaking for it.
> 
>    The qemu gets to see the base addresses that the client has its
>    regions mapped at; that's not great for security

Not urgent to fix.

>    Take care of deadlocking; any thread in the client that
>    accesses a userfault protected page can stall.

And it can happen under a lock quite easily.
What exactly is proposed here?
Maybe we want to reuse the new channel that the IOMMU uses.


>    There's a nasty hack of a lock around the set_mem_table message.

Yes.

>    I've not looked at the recent IOMMU code.
> 
>    Some cleanup and a lot of corner cases need thinking about.
> 
>    There are probably plenty of unknown issues as well.

At the protocol level, I'd like to rename the feature to
USER_PAGEFAULT. Client does not really know anything about
copies, it's all internal to qemu.
Spec can document that it's used by qemu for postcopy.


> [...]

Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
Posted by Dr. David Alan Gilbert 6 years, 9 months ago
* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Wed, Jun 28, 2017 at 08:00:18PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Hi,
> >   This is a RFC/WIP series that enables postcopy migration
> > with shared memory to a vhost-user process.
> > It's based off current-head + Juan's load_cleanup series, and
> > Alexey's bitmap series (v4).  It's very lightly tested and seems
> > to work, but it's quite rough.
> > 
> > I've modified the vhost-user-bridge (aka vhub) in qemu's tests/ to
> > use the new feature, since this is about the simplest
> > client around.
> > 
> > Structure:
> > 
> > The basic idea is that near the start of postcopy, the client
> > opens its own userfaultfd fd and sends that back to QEMU over
> > the socket it's already using for VHOST_USER_* commands.
> > Then when VHOST_USER_SET_MEM_TABLE arrives it registers the
> > areas with userfaultfd and sends the mapped addresses back to QEMU.
> > 
> > QEMU then reads the client's UFD in its fault thread and issues
> > requests back to the source as needed.
> > QEMU also issues 'WAKE' ioctls on the UFD to let the client know
> > that the page has arrived and it can carry on.
> > 
> > A new feature (VHOST_USER_PROTOCOL_F_POSTCOPY) is added so that
> > the QEMU knows the client can talk postcopy.
> > Three new messages (VHOST_USER_POSTCOPY_{ADVISE/LISTEN/END}) are
> > added to guide the process along.
> > 
> > Current known issues:
> >    I've not tested it with hugepages yet; and I suspect the madvises
> >    will need tweaking for it.
> > 
> >    The qemu gets to see the base addresses that the client has its
> >    regions mapped at; that's not great for security
> 
> Not urgent to fix.
> 
> >    Take care of deadlocking; any thread in the client that
> >    accesses a userfault protected page can stall.
> 
> And it can happen under a lock quite easily.
> What exactly is proposed here?
> Maybe we want to reuse the new channel that the IOMMU uses.

There's no fundamental reason to get deadlocks as long as you
get it right; the qemu thread that processes the user-faults
is a separate independent thread, so once it's going the client
can do whatever it likes and it will get woken up without
intervention.
Some care is needed around the postcopy-end; reception of the
message that tells you to drop the userfault enables (which
frees anything that hasn't been woken) must be allowed to happen
for the postcopy to complete; we take care that QEMU's fault
thread lives on until that message is acknowledged.

I'm more worried about how this will work in a full packet switch
when one vhost-user client for an incoming migration stalls
the whole switch unless care is taken about the design.
How do we figure out whether this is going to fly on a full stack?
That's my main reason for getting this WIP set out here to
get comments.

> >    There's a nasty hack of a lock around the set_mem_table message.
> 
> Yes.
> 
> >    I've not looked at the recent IOMMU code.
> > 
> >    Some cleanup and a lot of corner cases need thinking about.
> > 
> >    There are probably plenty of unknown issues as well.
> 
> At the protocol level, I'd like to rename the feature to
> USER_PAGEFAULT. Client does not really know anything about
> copies, it's all internal to qemu.
> Spec can document that it's used by qemu for postcopy.

OK, tbh I suspect that using it for anything else would be tricky
without adding more protocol features for that other use case.

Dave

> [...]
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
Posted by Michael S. Tsirkin 6 years, 9 months ago
On Fri, Jul 07, 2017 at 01:01:56PM +0100, Dr. David Alan Gilbert wrote:
> > >    Take care of deadlocking; any thread in the client that
> > >    accesses a userfault protected page can stall.
> > 
> > And it can happen under a lock quite easily.
> > What exactly is proposed here?
> > Maybe we want to reuse the new channel that the IOMMU uses.
> 
> There's no fundamental reason to get deadlocks as long as you
> get it right; the qemu thread that processes the user-faults
> is a separate independent thread, so once it's going the client
> can do whatever it likes and it will get woken up without
> intervention.

You take a lock for the channel, then access guest memory.
Then the thread that gets messages from qemu can't get
on the channel to mark range as populated.

> Some care is needed around the postcopy-end; reception of the
> message that tells you to drop the userfault enables (which
> frees anything that hasn't been woken) must be allowed to happen
> for the postcopy to complete; we take care that QEMU's fault
> thread lives on until that message is acknowledged.
>
> I'm more worried about how this will work in a full packet switch
> when one vhost-user client for an incoming migration stalls
> the whole switch unless care is taken about the design.
> How do we figure out whether this is going to fly on a full stack?

It's performance though. Client could run in a separate
thread for a while until migration finishes.
We need to make sure there's explicit documentation
that tells clients at what point they might block.

> That's my main reason for getting this WIP set out here to
> get comments.

What will happen if QEMU dies? Is there a way to unblock the client?

> > >    There's a nasty hack of a lock around the set_mem_table message.
> > 
> > Yes.
> > 
> > >    I've not looked at the recent IOMMU code.
> > > 
> > >    Some cleanup and a lot of corner cases need thinking about.
> > > 
> > >    There are probably plenty of unknown issues as well.
> > 
> > At the protocol level, I'd like to rename the feature to
> > USER_PAGEFAULT. Client does not really know anything about
> > copies, it's all internal to qemu.
> > Spec can document that it's used by qemu for postcopy.
> 
> OK, tbh I suspect that using it for anything else would be tricky
> without adding more protocol features for that other use case.
> 
> Dave

Why exactly? How does client have to know it's migration?

-- 
MST

Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
Posted by Dr. David Alan Gilbert 6 years, 9 months ago
* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Fri, Jul 07, 2017 at 01:01:56PM +0100, Dr. David Alan Gilbert wrote:
> > > >    Take care of deadlocking; any thread in the client that
> > > >    accesses a userfault protected page can stall.
> > > 
> > > And it can happen under a lock quite easily.
> > > What exactly is proposed here?
> > > Maybe we want to reuse the new channel that the IOMMU uses.
> > 
> > There's no fundamental reason to get deadlocks as long as you
> > get it right; the qemu thread that processes the user-faults
> > is a separate independent thread, so once it's going the client
> > can do whatever it likes and it will get woken up without
> > intervention.
> 
> You take a lock for the channel, then access guest memory.
> Then the thread that gets messages from qemu can't get
> on the channel to mark range as populated.

It doesn't need to get the message from qemu to know it's populated
though; qemu performs a WAKE ioctl on the userfaultfd to cause
it to wake, so there's no action needed by the client.
(If it does need to take a lock then yes, we have a problem).

> > Some care is needed around the postcopy-end; reception of the
> > message that tells you to drop the userfault enables (which
> > frees anything that hasn't been woken) must be allowed to happen
> > for the postcopy to complete; we take care that QEMU's fault
> > thread lives on until that message is acknowledged.
> >
> > I'm more worried about how this will work in a full packet switch
> > when one vhost-user client for an incoming migration stalls
> > the whole switch unless care is taken about the design.
> > How do we figure out whether this is going to fly on a full stack?
> 
> It's performance though. Client could run in a separate
> thread for a while until migration finishes.
> We need to make sure there's explicit documentation
> that tells clients at what point they might block.

Right.

> > That's my main reason for getting this WIP set out here to
> > get comments.
> 
> What will happen if QEMU dies? Is there a way to unblock the client?

If the client can detect this and close its userfaultfd then yes;
of course that detection has to be done in a thread that can't
itself be blocked by anything related to the userfaultfd.
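
One hypothetical way to arrange that (a sketch, not something from this
series): a watcher thread that never touches the userfault-registered
regions polls the vhost-user socket, and on hangup closes the uffd,
which releases everything still blocked on it:

#include <poll.h>
#include <unistd.h>

/* Must run in a thread that never touches registered regions. */
static void watch_qemu(int vhost_sock, int uffd)
{
    struct pollfd pfd = { .fd = vhost_sock, .events = POLLIN };

    for (;;) {
        if (poll(&pfd, 1, -1) > 0 && (pfd.revents & (POLLHUP | POLLERR))) {
            close(uffd);   /* releasing the ufd unblocks stalled faults */
            return;
        }
    }
}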

> > > >    There's a nasty hack of a lock around the set_mem_table message.
> > > 
> > > Yes.
> > > 
> > > >    I've not looked at the recent IOMMU code.
> > > > 
> > > >    Some cleanup and a lot of corner cases need thinking about.
> > > > 
> > > >    There are probably plenty of unknown issues as well.
> > > 
> > > At the protocol level, I'd like to rename the feature to
> > > USER_PAGEFAULT. Client does not really know anything about
> > > copies, it's all internal to qemu.
> > > Spec can document that it's used by qemu for postcopy.
> > 
> > OK, tbh I suspect that using it for anything else would be tricky
> > without adding more protocol features for that other use case.
> > 
> > Dave
> 
> Why exactly? How does client have to know it's migration?

It's more the sequence I worry about; we're reliant on
making sure that the userfaultfd is registered with the RAM before
it's ever accessed, and we unregister at the end.
This all keys in with migration requesting registration at the right
point before loading the devices.

Dave

> -- 
> MST
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
Posted by Alexey 6 years, 9 months ago
Hello, David!

Thank you for your patch set.

On Wed, Jun 28, 2017 at 08:00:18PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Hi,
>   This is a RFC/WIP series that enables postcopy migration
> with shared memory to a vhost-user process.
> It's based off current-head + Juan's load_cleanup series, and
> Alexey's bitmap series (v4).  It's very lightly tested and seems
> to work, but it's quite rough.
> 
> I've modified the vhost-user-bridge (aka vhub) in qemu's tests/ to
> use the new feature, since this is about the simplest
> client around.
> 
> Structure:
> 
> The basic idea is that near the start of postcopy, the client
> opens its own userfaultfd fd and sends that back to QEMU over
> > the socket it's already using for VHOST_USER_* commands.
> Then when VHOST_USER_SET_MEM_TABLE arrives it registers the
> areas with userfaultfd and sends the mapped addresses back to QEMU.

There should be only one userfault fd across all affected processes. But
why are you opening the userfaultfd on the client side, rather than
passing a userfault fd that was opened on the QEMU side? I guess there
could be several virtual switches with different ports (an exotic
configuration, but one QEMU, one vswitchd and several vhost-user ports
is typical), and, for example, QEMU could be connected to these vswitches
through these ports. In this case you will end up with 2 different
userfault fds in QEMU. In the case of one QEMU, one vswitchd and several
vhost-user ports, you are keeping the userfaultfd in the VuDev structure
on the client side (which looks like the virtio_net sibling from DPDK),
and that structure is per vhost-user connection (one per port).

So from my point of view it's better to open the fd on the QEMU side and
pass it the same way as the shared-mem fd in SET_MEM_TABLE, but in
POSTCOPY_ADVISE.


> 
> > QEMU then reads the client's UFD in its fault thread and issues
> > requests back to the source as needed.
> > QEMU also issues 'WAKE' ioctls on the UFD to let the client know
> > that the page has arrived and it can carry on.
It's not so clear to me why QEMU has to inform the vhost client:
with a single userfault fd, the kernel should wake up the other faulted
threads/processes.
In my approach I just send information about the copied/received page
to the vhost client, to be able to re-enable a previously disabled VRING.

> 
> A new feature (VHOST_USER_PROTOCOL_F_POSTCOPY) is added so that
> the QEMU knows the client can talk postcopy.
> Three new messages (VHOST_USER_POSTCOPY_{ADVISE/LISTEN/END}) are
> added to guide the process along.
> 
> Current known issues:
>    I've not tested it with hugepages yet; and I suspect the madvises
>    will need tweaking for it.
I saw you didn't change the order of the SET_MEM_TABLE call on the QEMU
side; some pages have already arrived and been copied by then, so I punch
a hole there according to the received map.

> 
>    The qemu gets to see the base addresses that the client has its
>    regions mapped at; that's not great for security
> 
>    Take care of deadlocking; any thread in the client that
>    accesses a userfault protected page can stall.
That's why I decided to disable VRINGs, but not the way you did it
in GET_VRING_BASE; I send the received bitmap right after SET_MEM_TABLE.
There could be a synchronization problem here, maybe similar to the one
you described in "vhost+postcopy: Lock around set_mem_table".

Unfortunately, my patches aren't ready yet.

> [...]

-- 

BR
Alexey

Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
Posted by Dr. David Alan Gilbert 6 years, 9 months ago
* Alexey (a.perevalov@samsung.com) wrote:
> 
> Hello, David!
> 
> Thank you for your patch set.
> 
> On Wed, Jun 28, 2017 at 08:00:18PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Hi,
> >   This is a RFC/WIP series that enables postcopy migration
> > with shared memory to a vhost-user process.
> > It's based off current-head + Juan's load_cleanup series, and
> > Alexey's bitmap series (v4).  It's very lightly tested and seems
> > to work, but it's quite rough.
> > 
> > I've modified the vhost-user-bridge (aka vhub) in qemu's tests/ to
> > use the new feature, since this is about the simplest
> > client around.
> > 
> > Structure:
> > 
> > The basic idea is that near the start of postcopy, the client
> > opens its own userfaultfd fd and sends that back to QEMU over
> > the socket it's already using for VHOST_USER_* commands.
> > Then when VHOST_USER_SET_MEM_TABLE arrives it registers the
> > areas with userfaultfd and sends the mapped addresses back to QEMU.
> 
> There should be only one userfault fd across all affected processes. But
> why are you opening the userfaultfd on the client side, rather than
> passing a userfault fd that was opened on the QEMU side?

I just checked with Andrea on the semantics, and ufds don't work like that.
Any given userfaultfd only works on the address space of the process
that opened it; so if you want a process to block on its memory space,
it's the one that has to open the ufd.
(I don't think I knew that when I wrote the code!)
The nice thing about that is that you never get too confused about
address spaces - any one ufd always has one address space associated
with one process in its ioctls.

> I guess there could be several virtual switches with different ports
> (an exotic configuration, but one QEMU, one vswitchd and several
> vhost-user ports is typical), and, for example, QEMU could be connected
> to these vswitches through these ports. In this case you will end up
> with 2 different userfault fds in QEMU. In the case of one QEMU, one
> vswitchd and several vhost-user ports, you are keeping the userfaultfd
> in the VuDev structure on the client side (which looks like the
> virtio_net sibling from DPDK), and that structure is per vhost-user
> connection (one per port).

Multiple switches make sense to me actually; running two switches
and having redundant routes in each VM lets you live-update the switch
process one at a time.

> So from my point of view it's better to open the fd on the QEMU side and
> pass it the same way as the shared-mem fd in SET_MEM_TABLE, but in
> POSTCOPY_ADVISE.

Yes, I see where you're coming from; but it's one address space per ufd.
If you had one ufd then you'd have to change the messages to be
  'pid ... is waiting on address ....'
and all the ioctls for doing wakes etc. would have to gain a PID.

> > 
> > QEMU then reads the client's UFD in its fault thread and issues
> > requests back to the source as needed.
> > QEMU also issues 'WAKE' ioctls on the UFD to let the client know
> > that the page has arrived and it can carry on.
> It's not so clear to me why QEMU has to inform the vhost client:
> with a single userfault fd, the kernel should wake up the other faulted
> threads/processes.
> In my approach I just send information about the copied/received page
> to the vhost client, to be able to re-enable a previously disabled VRING.

The client itself doesn't get notified; it's a UFFDIO_WAKE ioctl
on the ufd that tells the kernel it can unblock a process that's
trying to access the page.
(There is potential to remove some of that - if we can get the
kernel to wake all the waiters for a physical page when a UFFDIO_COPY
is done it would remove a lot of those.)

> > A new feature (VHOST_USER_PROTOCOL_F_POSTCOPY) is added so that
> > the QEMU knows the client can talk postcopy.
> > Three new messages (VHOST_USER_POSTCOPY_{ADVISE/LISTEN/END}) are
> > added to guide the process along.
> > 
> > Current known issues:
> >    I've not tested it with hugepages yet; and I suspect the madvises
> >    will need tweaking for it.
> I saw that you didn't change the order of the SET_MEM_TABLE call on
> the QEMU side; some of the pages have already arrived and been copied,
> so I punch a hole here according to the received map.

Right, so I'm assuming they'll hit ufd faults and be immediately
WAKEd when I find the bit set in the received-bitmap.
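
In the fault-handling path that's roughly the following (a sketch: the
helper names echo the series' patch titles, but treat the exact
signatures here as illustrative):

    /* Fault reported on the client's ufd: if the page had already
     * arrived before the region was registered, just wake the faulting
     * thread; otherwise request the page from the source as usual. */
    if (ramblock_recv_bitmap_test(rb, host_addr)) {
        postcopy_wake_shared(pcfd, client_addr, rb);   /* UFFDIO_WAKE */
    } else {
        migrate_send_rp_req_pages(mis, rb->idstr, offset, pagesize);
    }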

> >    QEMU gets to see the base addresses that the client has its
> >    regions mapped at; that's not great for security
> > 
> >    Take care of deadlocking; any thread in the client that
> >    accesses a userfault protected page can stall.
> That's why I decided to disable the VRINGs, but not the way you did it
> in GET_VRING_BASE; I send the received bitmap right after
> SET_MEM_TABLE. There could be a synchronization problem here, maybe
> similar to the one you described in
> "vhost+postcopy: Lock around set_mem_table"
> 
> Unfortunately, my patches aren't ready yet.

That's OK; these patches only just-about work - just enough for
me to post them and ask for opinions.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [RFC 00/29] postcopy+vhost-user/shared ram
Posted by Alexey 6 years, 9 months ago
On Mon, Jul 03, 2017 at 05:49:26PM +0100, Dr. David Alan Gilbert wrote:
> * Alexey (a.perevalov@samsung.com) wrote:
> > 
> > Hello, David!
> > 
> > Thank you for your patch set.
> > 
> > On Wed, Jun 28, 2017 at 08:00:18PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > The basic idea is that near the start of postcopy, the client
> > > opens its own userfaultfd fd and sends that back to QEMU over
> > > the socket it's already using for VHOST_USER_* commands.
> > > Then when VHOST_USER_SET_MEM_TABLE arrives it registers the
> > > areas with userfaultfd and sends the mapped addresses back to QEMU.
> > 
> > There should be only one userfault fd for all the affected processes. But
> > why are you opening the userfaultfd on the client side - why not pass
> > the userfault fd that was opened on the QEMU side?
> 
> I just checked with Andrea on the semantics, and ufds don't work like that.
> Any given userfaultfd only works on the address space of the process
> that opened it; so if you want a process to block on its memory space,
> it's the one that has to open the ufd.

Yes - in handle_userfault() the context is taken from the vma:

    ctx = vmf->vma->vm_userfaultfd_ctx.ctx;

so that's per vma.

And it is set into the vma:

    vma->vm_userfaultfd_ctx.ctx = ctx;

in userfaultfd_register(), which in turn takes it from

    struct userfaultfd_ctx *ctx = file->private_data;

Because the file descriptor was transferred over a unix domain socket
(SOL_SOCKET), it's logical to assume the userfaultfd context would be
the same.


-- 

BR
Alexey