[Qemu-devel] [PATCH 00/16] Multifd v4

Juan Quintela posted 16 patches 8 years, 7 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20170313124434.1043-1-quintela@redhat.com
Test checkpatch passed
Test docker passed
[Qemu-devel] [PATCH 00/16] Multifd v4
Posted by Juan Quintela 8 years, 7 months ago
Hi

This is the 4th version of multifd. Changes:
- XBZRLE doesn't need to be checked for
- Documentation and defaults are consistent
- split socketArgs
- use iovec instead of creating something similar.
- We now use the exported target page size (another HACK removal)
- created qio_channel_{writev,readv}_all functions.  The _full() name
  was already taken.
  They do the same as the functions without _all(), but if a call
  returns early because it would block, they retry until everything
  has been transferred (a rough sketch follows below).
- it is checkpatch.pl clean now.
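
To make the retry semantics concrete, here is a rough sketch of what such
an _all() helper could look like.  This is not the code from the series;
the name example_writev_all and the manual iovec advancing are only for
illustration, while qio_channel_writev(), qio_channel_wait() and
QIO_CHANNEL_ERR_BLOCK are existing QIOChannel API:

#include "qemu/osdep.h"
#include "io/channel.h"

/*
 * Hypothetical sketch only: keep calling qio_channel_writev() until the
 * whole iovec has been sent, waiting whenever the channel would block.
 * Note that it modifies the caller's iovec in place for brevity.
 */
static int example_writev_all(QIOChannel *ioc,
                              struct iovec *iov, size_t niov,
                              Error **errp)
{
    while (niov > 0) {
        ssize_t len = qio_channel_writev(ioc, iov, niov, errp);

        if (len == QIO_CHANNEL_ERR_BLOCK) {
            /* Non-blocking channel: wait until it is writable, then retry */
            qio_channel_wait(ioc, G_IO_OUT);
            continue;
        }
        if (len < 0) {
            return -1;
        }
        /* Skip the bytes that were written and go around for the rest */
        while (len > 0) {
            if ((size_t)len >= iov->iov_len) {
                len -= iov->iov_len;
                iov++;
                niov--;
            } else {
                iov->iov_base = (char *)iov->iov_base + len;
                iov->iov_len -= len;
                len = 0;
            }
        }
    }
    return 0;
}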

Please comment, Juan.




[v3]

- comments for the previous version addressed
- lots of bugs fixed
- remove DPRINTF from ram.c

- add the multifd-group parameter; it sets how many pages we send each
  time to the worker threads.  I am open to better names.
- Better flush support.
- with migrate_set_speed 2G it is able to migrate "stress --vm 2
  --vm-bytes 512M" over loopback.

Please review.

Thanks, Juan.

[v2]

This is a version against current code.  It is based on top of the QIO
work.  It improves the thread synchronization and fixes the problem
where we could have two threads handling the same page.

Please comment, Juan.


Juan Quintela (16):
  qio: create new qio_channel_write_all
  qio: create new qio_channel_read_all
  migration: Test for disabled features on reception
  migration: Don't create decompression threads if not enabled
  migration: Add multifd capability
  migration: Create x-multifd-threads parameter
  migration: Create x-multifd-group parameter
  migration: Create multifd migration threads
  migration: Start of multiple fd work
  migration: Create ram_multifd_page
  migration: Really use multiple pages at a time
  migration: Send the fd number which we are going to use for this page
  migration: Create thread infrastructure for multifd recv side
  migration: Test new fd infrastructure
  migration: Transfer pages over new channels
  migration: Flush receive queue

 hmp.c                         |  18 ++
 include/io/channel.h          |  46 ++++
 include/migration/migration.h |  17 ++
 io/channel.c                  |  76 ++++++
 migration/migration.c         |  85 ++++++-
 migration/qemu-file-channel.c |  29 +--
 migration/ram.c               | 522 +++++++++++++++++++++++++++++++++++++++++-
 migration/socket.c            |  67 +++++-
 qapi-schema.json              |  30 ++-
 9 files changed, 848 insertions(+), 42 deletions(-)

-- 
2.9.3


Re: [Qemu-devel] [PATCH 00/16] Multifd v4
Posted by Dr. David Alan Gilbert 8 years, 7 months ago
* Juan Quintela (quintela@redhat.com) wrote:
> Hi
> 
> This is the 4th version of multifd. Changes:
> - XBZRLE don't need to be checked for
> - Documentation and defaults are consistent
> - split socketArgs
> - use iovec instead of creating something similar.
> - We use now the exported size of target page (another HACK removal)
> - created qio_chanel_{wirtev,readv}_all functions.  the _full() name
>   was already taken.
>   What they do is the same that the without _all() function, but if it
>   returns due to blocking it redo the call.
> - it is checkpatch.pl clean now.
> 
> Please comment, Juan.

High-level things:
  a) I think you probably need to do some bandwidth measurements to show
     that multifd actually provides some benefit - it would be good
     for the cover letter.
  b) By my count I think this is actually v5 (and I think you're missing
     the -v to git).

Dave
    
> 
> 
> 
> [v3]
> 
> - comments for previous verion addressed
> - lots of bugs fixed
> - remove DPRINTF from ram.c
> 
> - add multifd-group parameter, it gives how many pages we sent each
>   time to the worker threads.  I am open to better names.
> - Better flush support.
> - with migration_set_speed 2G it is able to migrate "stress -vm 2
>   -vm-bytes 512M" over loopback.
> 
> Please review.
> 
> Thanks, Juan.
> 
> [v2]
> 
> This is a version against current code.  It is based on top of QIO
> work. It improves the thread synchronization and fixes the problem
> when we could have two threads handing the same page.
> 
> Please comment, Juan.
> 
> 
> Juan Quintela (16):
>   qio: create new qio_channel_write_all
>   qio: create new qio_channel_read_all
>   migration: Test for disabled features on reception
>   migration: Don't create decompression threads if not enabled
>   migration: Add multifd capability
>   migration: Create x-multifd-threads parameter
>   migration: Create x-multifd-group parameter
>   migration: Create multifd migration threads
>   migration: Start of multiple fd work
>   migration: Create ram_multifd_page
>   migration: Really use multiple pages at a time
>   migration: Send the fd number which we are going to use for this page
>   migration: Create thread infrastructure for multifd recv side
>   migration: Test new fd infrastructure
>   migration: Transfer pages over new channels
>   migration: Flush receive queue
> 
>  hmp.c                         |  18 ++
>  include/io/channel.h          |  46 ++++
>  include/migration/migration.h |  17 ++
>  io/channel.c                  |  76 ++++++
>  migration/migration.c         |  85 ++++++-
>  migration/qemu-file-channel.c |  29 +--
>  migration/ram.c               | 522 +++++++++++++++++++++++++++++++++++++++++-
>  migration/socket.c            |  67 +++++-
>  qapi-schema.json              |  30 ++-
>  9 files changed, 848 insertions(+), 42 deletions(-)
> 
> -- 
> 2.9.3
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH 00/16] Multifd v4
Posted by Daniel P. Berrange 8 years, 7 months ago
On Tue, Mar 14, 2017 at 10:21:43AM +0000, Dr. David Alan Gilbert wrote:
> * Juan Quintela (quintela@redhat.com) wrote:
> > Hi
> > 
> > This is the 4th version of multifd. Changes:
> > - XBZRLE don't need to be checked for
> > - Documentation and defaults are consistent
> > - split socketArgs
> > - use iovec instead of creating something similar.
> > - We use now the exported size of target page (another HACK removal)
> > - created qio_chanel_{wirtev,readv}_all functions.  the _full() name
> >   was already taken.
> >   What they do is the same that the without _all() function, but if it
> >   returns due to blocking it redo the call.
> > - it is checkpatch.pl clean now.
> > 
> > Please comment, Juan.
> 
> High level things,
>   a) I think you probably need to do some bandwidth measurements to show
>     that multifd is managing to have some benefit - it would be good
>     for the cover letter.

multi-fd will certainly benefit encrypted migration, since we'll be able
to burn multiple CPUs for AES instead of bottlenecking on one CPU, and
thus be able to take greater advantage of networks with > 1-GigE bandwidth.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|

Re: [Qemu-devel] [PATCH 00/16] Multifd v4
Posted by Dr. David Alan Gilbert 8 years, 7 months ago
* Daniel P. Berrange (berrange@redhat.com) wrote:
> On Tue, Mar 14, 2017 at 10:21:43AM +0000, Dr. David Alan Gilbert wrote:
> > * Juan Quintela (quintela@redhat.com) wrote:
> > > Hi
> > > 
> > > This is the 4th version of multifd. Changes:
> > > - XBZRLE don't need to be checked for
> > > - Documentation and defaults are consistent
> > > - split socketArgs
> > > - use iovec instead of creating something similar.
> > > - We use now the exported size of target page (another HACK removal)
> > > - created qio_chanel_{wirtev,readv}_all functions.  the _full() name
> > >   was already taken.
> > >   What they do is the same that the without _all() function, but if it
> > >   returns due to blocking it redo the call.
> > > - it is checkpatch.pl clean now.
> > > 
> > > Please comment, Juan.
> > 
> > High level things,
> >   a) I think you probably need to do some bandwidth measurements to show
> >     that multifd is managing to have some benefit - it would be good
> >     for the cover letter.
> 
> multi-fd will certainly benefit encrypted migration, since we'll be able
> to burn multiple CPUs for AES instead of bottlenecking on one CPU, and
> thus able to take greater advantage of networks with > 1-GigE bandwidth.

Yes, that's one I really want to see.  It might be odd using lots of fds
just to do that, but it's probably the easiest way.

Dave

> Regards,
> Daniel
> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH 00/16] Multifd v4
Posted by Daniel P. Berrange 8 years, 7 months ago
On Tue, Mar 14, 2017 at 11:40:20AM +0000, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (berrange@redhat.com) wrote:
> > On Tue, Mar 14, 2017 at 10:21:43AM +0000, Dr. David Alan Gilbert wrote:
> > > * Juan Quintela (quintela@redhat.com) wrote:
> > > > Hi
> > > > 
> > > > This is the 4th version of multifd. Changes:
> > > > - XBZRLE don't need to be checked for
> > > > - Documentation and defaults are consistent
> > > > - split socketArgs
> > > > - use iovec instead of creating something similar.
> > > > - We use now the exported size of target page (another HACK removal)
> > > > - created qio_chanel_{wirtev,readv}_all functions.  the _full() name
> > > >   was already taken.
> > > >   What they do is the same that the without _all() function, but if it
> > > >   returns due to blocking it redo the call.
> > > > - it is checkpatch.pl clean now.
> > > > 
> > > > Please comment, Juan.
> > > 
> > > High level things,
> > >   a) I think you probably need to do some bandwidth measurements to show
> > >     that multifd is managing to have some benefit - it would be good
> > >     for the cover letter.
> > 
> > multi-fd will certainly benefit encrypted migration, since we'll be able
> > to burn multiple CPUs for AES instead of bottlenecking on one CPU, and
> > thus able to take greater advantage of networks with > 1-GigE bandwidth.
> 
> Yes, that's one I really want to see.  It might be odd using lots of fd's
> just to do that, but probably the easiest way.

In theory you could have multiple threads doing encryption, all writing to
a single FD, but AFAICT the GNUTLS library doesn't make this possible,
as it encapsulates both encryption + I/O behind a single API call. Even if
we called that API from multiple threads, I'm pretty sure it would
serialize the encryption with internal locking, so you would not gain
anything. So using multiple distinct TLS connections is the only viable
option.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|

Re: [Qemu-devel] [PATCH 00/16] Multifd v4
Posted by Daniel P. Berrange 8 years, 7 months ago
On Tue, Mar 14, 2017 at 10:21:43AM +0000, Dr. David Alan Gilbert wrote:
> * Juan Quintela (quintela@redhat.com) wrote:
> > Hi
> > 
> > This is the 4th version of multifd. Changes:
> > - XBZRLE don't need to be checked for
> > - Documentation and defaults are consistent
> > - split socketArgs
> > - use iovec instead of creating something similar.
> > - We use now the exported size of target page (another HACK removal)
> > - created qio_chanel_{wirtev,readv}_all functions.  the _full() name
> >   was already taken.
> >   What they do is the same that the without _all() function, but if it
> >   returns due to blocking it redo the call.
> > - it is checkpatch.pl clean now.
> > 
> > Please comment, Juan.
> 
> High level things,
>   a) I think you probably need to do some bandwidth measurements to show
>     that multifd is managing to have some benefit - it would be good
>     for the cover letter.

Presumably this would be a building block for solving the latency problems
with post-copy, by reserving one channel for transferring out-of-band
pages required by target host page faults.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|

Re: [Qemu-devel] [PATCH 00/16] Multifd v4
Posted by Dr. David Alan Gilbert 8 years, 7 months ago
* Daniel P. Berrange (berrange@redhat.com) wrote:
> On Tue, Mar 14, 2017 at 10:21:43AM +0000, Dr. David Alan Gilbert wrote:
> > * Juan Quintela (quintela@redhat.com) wrote:
> > > Hi
> > > 
> > > This is the 4th version of multifd. Changes:
> > > - XBZRLE don't need to be checked for
> > > - Documentation and defaults are consistent
> > > - split socketArgs
> > > - use iovec instead of creating something similar.
> > > - We use now the exported size of target page (another HACK removal)
> > > - created qio_chanel_{wirtev,readv}_all functions.  the _full() name
> > >   was already taken.
> > >   What they do is the same that the without _all() function, but if it
> > >   returns due to blocking it redo the call.
> > > - it is checkpatch.pl clean now.
> > > 
> > > Please comment, Juan.
> > 
> > High level things,
> >   a) I think you probably need to do some bandwidth measurements to show
> >     that multifd is managing to have some benefit - it would be good
> >     for the cover letter.
> 
> Presumably this would be a building block to solving the latency problems
> with post-copy, by reserving one channel for use transferring out of band
> pages required by target host page faults.

Right, it's on my list to look at; there are some interesting questions about
the way in which the main fd carrying the headers interacts, and also what
happens to pages immediately after the requested page. For example, let's
say we're currently streaming at address 'S' and a postcopy request (P) comes in;
so what we currently have on one FD is:

    S, S+1, ..., S+n, P, P+1, P+2, ..., P+n

Note that when a request comes in we flip location so we start sending background
pages from P+1 on the assumption that they'll be wanted soon.

with 3 FDs this would go initially as:
    S    S+3 P+1 P+4
    S+1  S+4 P+2 ..
    S+2  P   P+3 ..

now if we had a spare FD for postcopy we'd do:
    S    S+3 P+1 P+4
    S+1  S+4 P+2 ..
    S+2  S+5 P+3 ..
    -    P   -   -

So 'P' got there quickly - but P+1 is stuck behind the S's; is that what we want?
An interesting alternative would be to switch which fd we keep free:
    S    S+3 -   -   -
    S+1  S+4 P+2 P+4
    S+2  S+5 P+3 P+5
    -    P   P+1 P+6
  
So depending on your buffering, P+1 might also now be pretty fast; but that's
starting to get into heuristics about guessing how much you should put on
your previously low-queued fd.
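
Purely as an illustration of the "spare fd" variant above (nothing here is
code from the series; pick_channel and NUM_CHANNELS are made-up names), the
scheduling boils down to something like:

#include <stdio.h>

#define NUM_CHANNELS 4   /* made-up number of send fds */

/*
 * Toy scheduler: background pages go round-robin over channels 1..N-1,
 * while channel 0 stays idle so an urgent postcopy page never queues
 * behind background data.
 */
static int pick_channel(unsigned long page_index, int urgent)
{
    if (urgent) {
        return 0;
    }
    return 1 + (int)(page_index % (NUM_CHANNELS - 1));
}

int main(void)
{
    for (unsigned long i = 0; i < 6; i++) {
        printf("S+%lu -> fd %d\n", i, pick_channel(i, 0));
    }
    printf("P    -> fd %d\n", pick_channel(0, 1));
    return 0;
}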

Dave

> Regards,
> Daniel
> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH 00/16] Multifd v4
Posted by Daniel P. Berrange 8 years, 7 months ago
On Tue, Mar 14, 2017 at 12:22:23PM +0000, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (berrange@redhat.com) wrote:
> > On Tue, Mar 14, 2017 at 10:21:43AM +0000, Dr. David Alan Gilbert wrote:
> > > * Juan Quintela (quintela@redhat.com) wrote:
> > > > Hi
> > > > 
> > > > This is the 4th version of multifd. Changes:
> > > > - XBZRLE don't need to be checked for
> > > > - Documentation and defaults are consistent
> > > > - split socketArgs
> > > > - use iovec instead of creating something similar.
> > > > - We use now the exported size of target page (another HACK removal)
> > > > - created qio_chanel_{wirtev,readv}_all functions.  the _full() name
> > > >   was already taken.
> > > >   What they do is the same that the without _all() function, but if it
> > > >   returns due to blocking it redo the call.
> > > > - it is checkpatch.pl clean now.
> > > > 
> > > > Please comment, Juan.
> > > 
> > > High level things,
> > >   a) I think you probably need to do some bandwidth measurements to show
> > >     that multifd is managing to have some benefit - it would be good
> > >     for the cover letter.
> > 
> > Presumably this would be a building block to solving the latency problems
> > with post-copy, by reserving one channel for use transferring out of band
> > pages required by target host page faults.
> 
> Right, it's on my list to look at;  there's some interesting questions about
> the way in which the main fd carrying the headers interacts, and also what
> happens to pages immediately after the requested page; for example, lets
> say we're currently streaming at address 'S' and a postcopy request (P) comes in;
> so what we currently have on one FD is:
> 
>     S,S+1....S+n,P,P+1,P+2,P+n
> 
> Note that when a request comes in we flip location so we start sending background
> pages from P+1 on the assumption that they'll be wanted soon.
> 
> with 3 FDs this would go initially as:
>     S    S+3 P+1 P+4
>     S+1  S+4 P+2 ..
>     S+2  P   P+3 ..
> 
> now if we had a spare FD for postcopy we'd do:
>     S    S+3 P+1 P+4
>     S+1  S+4 P+2 ..
>     S+2  S+5 P+3 ..
>     -    P   -   -
> 
> So 'P' got there quickly - but P+1 is stuck behind the S's; is that what we want?
> An interesting alternative would be to switch which fd we keep free:
>     S    S+3 -   -   -
>     S+1  S+4 P+2 P+4
>     S+2  S+5 P+3 P+5
>     -    P   P+1 P+6
>   
> So depending on your buffering P+1 might also now be pretty fast; but that's
> starting to get into heuristics about guessing how much you should put on
> your previously low-queue'd fd.

Ah, I see, so you're essentially trying to do read-ahead when a post-copy
fault comes in. It becomes even more fun when you have multiple page faults
coming in (quite likely with multi-vCPU guests), as you have P, Q, R, S
come in, all of which want servicing quickly. So if you queue up too
many P+n pages for read-ahead, you'd delay Q, R & S:

     S    S+3 -   -   -
     S+1  S+4 P+2 P+4 Q   R   ...
     S+2  S+5 P+3 P+5 Q+1 R+1 ...
     -    P   P+1 P+6 Q+2 ... ...

This tends to argue for overcommitting threads vs CPUs, e.g. even if QEMU
is confined to only use 2 host CPUs, it would be worth having 4 migration
threads. They would contend for CPU time for AES encryption, but you
would reduce the chance of getting stuck behind large send buffers.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|

Re: [Qemu-devel] [PATCH 00/16] Multifd v4
Posted by Dr. David Alan Gilbert 8 years, 7 months ago
* Daniel P. Berrange (berrange@redhat.com) wrote:
> On Tue, Mar 14, 2017 at 12:22:23PM +0000, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrange (berrange@redhat.com) wrote:
> > > On Tue, Mar 14, 2017 at 10:21:43AM +0000, Dr. David Alan Gilbert wrote:
> > > > * Juan Quintela (quintela@redhat.com) wrote:
> > > > > Hi
> > > > > 
> > > > > This is the 4th version of multifd. Changes:
> > > > > - XBZRLE don't need to be checked for
> > > > > - Documentation and defaults are consistent
> > > > > - split socketArgs
> > > > > - use iovec instead of creating something similar.
> > > > > - We use now the exported size of target page (another HACK removal)
> > > > > - created qio_chanel_{wirtev,readv}_all functions.  the _full() name
> > > > >   was already taken.
> > > > >   What they do is the same that the without _all() function, but if it
> > > > >   returns due to blocking it redo the call.
> > > > > - it is checkpatch.pl clean now.
> > > > > 
> > > > > Please comment, Juan.
> > > > 
> > > > High level things,
> > > >   a) I think you probably need to do some bandwidth measurements to show
> > > >     that multifd is managing to have some benefit - it would be good
> > > >     for the cover letter.
> > > 
> > > Presumably this would be a building block to solving the latency problems
> > > with post-copy, by reserving one channel for use transferring out of band
> > > pages required by target host page faults.
> > 
> > Right, it's on my list to look at;  there's some interesting questions about
> > the way in which the main fd carrying the headers interacts, and also what
> > happens to pages immediately after the requested page; for example, lets
> > say we're currently streaming at address 'S' and a postcopy request (P) comes in;
> > so what we currently have on one FD is:
> > 
> >     S,S+1....S+n,P,P+1,P+2,P+n
> > 
> > Note that when a request comes in we flip location so we start sending background
> > pages from P+1 on the assumption that they'll be wanted soon.
> > 
> > with 3 FDs this would go initially as:
> >     S    S+3 P+1 P+4
> >     S+1  S+4 P+2 ..
> >     S+2  P   P+3 ..
> > 
> > now if we had a spare FD for postcopy we'd do:
> >     S    S+3 P+1 P+4
> >     S+1  S+4 P+2 ..
> >     S+2  S+5 P+3 ..
> >     -    P   -   -
> > 
> > So 'P' got there quickly - but P+1 is stuck behind the S's; is that what we want?
> > An interesting alternative would be to switch which fd we keep free:
> >     S    S+3 -   -   -
> >     S+1  S+4 P+2 P+4
> >     S+2  S+5 P+3 P+5
> >     -    P   P+1 P+6
> >   
> > So depending on your buffering P+1 might also now be pretty fast; but that's
> > starting to get into heuristics about guessing how much you should put on
> > your previously low-queue'd fd.
> 
> Ah, I see, so you're essentially trying todo read-ahead when post-copy
> faults. It becomes even more fun when you have multiple page faults
> coming in, (quite likely with multi-vCPU guests), as you have P, Q, R, S
> come in, all of which want servicing quickly. So if you queue up too
> many P+n pages for read-ahead, you'd delay Q, R & S
> 
>      S    S+3 -   -   -
>      S+1  S+4 P+2 P+4 Q   R   ...
>      S+2  S+5 P+3 P+5 Q+1 R+1 ...
>      -    P   P+1 P+6 Q+2 ... ...
> 
> this tends to argue for overcommitting threads vs cpus. eg even if QEMU
> is confined to only use 2 host CPUs, it would be worth having 4 migration
> threads. They would contend for CPU time for AES encryption, but you
> would reduce chance of getting stuck behind large send-buffers.

Possibly, although it becomes very heuristicky; and then I'm not sure what
happens when you find you've got AES offload hardware.
I also worry again about the fd carrying the headers: if the destination
gets bottlenecked reading pages off the other fds, it might not get to the
postcopy page.
So you can bottleneck on any of network bandwidth, source CPU bandwidth,
or destination CPU bandwidth (which I think is where the current bottleneck
on one fd tends to be with no encryption/compression).

I think there's a syscall where you can ask how much is buffered in a socket;
of course, that can only tell you about the sender, so really you do want to be
set up so that the source is trying to send no faster than the destination can
read it.
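
For reference: on Linux this is presumably the SIOCOUTQ ioctl, which reports
how many bytes are still sitting unsent in a socket's send queue (SIOCINQ is
the receive-side counterpart). A minimal sketch, not part of the series;
socket_unsent_bytes is a made-up helper name:

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/sockios.h>

/* Return how many bytes are still unsent in sockfd's send queue,
 * or -1 on error. */
static int socket_unsent_bytes(int sockfd)
{
    int pending = 0;

    if (ioctl(sockfd, SIOCOUTQ, &pending) < 0) {
        perror("ioctl(SIOCOUTQ)");
        return -1;
    }
    return pending;
}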

Dave

> Regards,
> Daniel
> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK