[PATCH v4 00/34] migration: File based migration with multifd and fixed-ram

Fabiano Rosas posted 34 patches 2 months, 3 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20240220224138.24759-1-farosas@suse.de
Maintainers: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>, David Hildenbrand <david@redhat.com>, "Philippe Mathieu-Daudé" <philmd@linaro.org>, "Daniel P. Berrangé" <berrange@redhat.com>, Fabiano Rosas <farosas@suse.de>, Eric Blake <eblake@redhat.com>, Markus Armbruster <armbru@redhat.com>, Thomas Huth <thuth@redhat.com>, Laurent Vivier <lvivier@redhat.com>
[PATCH v4 00/34] migration: File based migration with multifd and fixed-ram
Posted by Fabiano Rosas 2 months, 3 weeks ago
Hi,

In this v4:

- Added support for 'fd:'. With fixed-ram, this comes for free via the
  existing routing to file.c. With multifd, I added a loop to create
  the channels.

- Dropped support for direct-io with fixed-ram _without_ multifd. This
  is something I said I would do for this version, but I had to drop
  it because performance is really bad. I think the single-threaded
  precopy code cannot cope with the extra latency/synchronicity of
  O_DIRECT.

- Dropped QIOTask related changes. The file migration now calls
  multifd_channel_connect() directly. Any error can now be returned
  all the way up to migrate_fd_connect(). We can also skip the
  channels_created semaphore logic when using fixed-ram.

- Moved the pwritev_read_contiguous code into a migration-specific
  file and dropped the write_base trick.

- Reduced the number of syncs to just one per RAM iteration plus one
  at the end on the send side, and a single one at the end on the recv
  side. The EOS flag cannot be skipped because it is used for control
  flow in ram_load_precopy.

The rest are minor changes; I have noted them in the patches
themselves.

CI run: https://gitlab.com/farosas/qemu/-/pipelines/1183853433

Series structure
================

This series enables fixed-ram in steps:

0) Cleanups                           [1-5]
1) QIOChannel interfaces              [6-10]
2) Fixed-ram format for precopy       [11-15]
3) Multifd adaptation without packets [16-19]
4) Fixed-ram format for multifd       [20-26]
5) Direct-io generic support          [27]
6) Direct-io for fixed-ram multifd with file: URI  [28-29]
7) Fdset interface for fixed-ram multifd  [30-34]

The majority of changes for this version are at step 3 due to the
rebase on top of the recent multifd cleanups.

Please take a look at the later patches in the series, step 5 onwards.

About fixed-ram
===============

Fixed-ram is a new stream format for the RAM section designed to
supplement the existing ``file:`` migration and make it compatible
with ``multifd``. This enables parallel migration of a guest's RAM to
a file.

The core of the feature is to ensure that each RAM page has a specific
offset in the resulting migration file. This enables the ``multifd``
threads to write exclusively to those offsets even if the guest is
constantly dirtying pages (i.e. live migration).

Another benefit is that the resulting file will have a bounded size,
since pages which are dirtied multiple times will always go to a fixed
location in the file, rather than constantly being added to a
sequential stream.

Having the pages at fixed offsets also allows the use of O_DIRECT
for save/restore of the migration stream, since the pages are
guaranteed to be written at offsets that satisfy O_DIRECT's alignment
restrictions.

Latest numbers
==============

=> guest: 128 GB RAM - 120 GB dirty - 1 vcpu in tight loop dirtying memory
=> host: 128 CPU AMD EPYC 7543 - 2 NVMe disks in RAID0 (8586 MiB/s) - xfs
=> pinned vcpus w/ NUMA shortest distances - average of 3 runs - results
   from query-migrate

non-live           | time (ms)   pages/s   mb/s   MB/s
-------------------+-----------------------------------
file               |    110512    256258   9549   1193
  + bg-snapshot    |    245660    119581   4303    537
-------------------+-----------------------------------
fixed-ram          |    157975    216877   6672    834
  + multifd 8 ch.  |     95922    292178  10982   1372
     + direct-io   |     23268   1936897  45330   5666
-------------------------------------------------------

live               | time (ms)   pages/s   mb/s   MB/s
-------------------+-----------------------------------
file               |         -         -      -      - (file grew 4x the VM size)
  + bg-snapshot    |    357635    141747   2974    371
-------------------+-----------------------------------
fixed-ram          |         -         -      -      - (no convergence in 5 min)
  + multifd 8 ch.  |    230812    497551  14900   1862
     + direct-io   |     27475   1788025  46736   5842
-------------------------------------------------------

Previous versions of this patchset have shown performance closer to
disk saturation, but due to the query-migrate bug[1] it's hard to be
confident in the previous numbers. I can't rule out a performance
regression, but for now I can't spot anything that could have caused
it.

1- https://lore.kernel.org/r/20240219194457.26923-1-farosas@suse.de

v3:
https://lore.kernel.org/r/20231127202612.23012-1-farosas@suse.de
v2:
https://lore.kernel.org/r/20231023203608.26370-1-farosas@suse.de
v1:
https://lore.kernel.org/r/20230330180336.2791-1-farosas@suse.de

Fabiano Rosas (31):
  docs/devel/migration.rst: Document the file transport
  tests/qtest/migration: Rename fd_proto test
  tests/qtest/migration: Add a fd + file test
  migration/multifd: Remove p->quit from recv side
  migration/multifd: Release recv sem_sync earlier
  io: fsync before closing a file channel
  migration/qemu-file: add utility methods for working with seekable
    channels
  migration/ram: Introduce 'fixed-ram' migration capability
  migration: Add fixed-ram URI compatibility check
  migration/ram: Add outgoing 'fixed-ram' migration
  migration/ram: Add incoming 'fixed-ram' migration
  tests/qtest/migration: Add tests for fixed-ram file-based migration
  migration/multifd: Rename MultiFDSend|RecvParams::data to
    compress_data
  migration/multifd: Decouple recv method from pages
  migration/multifd: Allow multifd without packets
  migration/multifd: Allow receiving pages without packets
  migration/multifd: Add outgoing QIOChannelFile support
  migration/multifd: Add incoming QIOChannelFile support
  migration/multifd: Prepare multifd sync for fixed-ram migration
  migration/multifd: Support outgoing fixed-ram stream format
  migration/multifd: Support incoming fixed-ram stream format
  migration/multifd: Add fixed-ram support to fd: URI
  tests/qtest/migration: Add a multifd + fixed-ram migration test
  migration: Add direct-io parameter
  migration/multifd: Add direct-io support
  tests/qtest/migration: Add tests for file migration with direct-io
  monitor: Honor QMP request for fd removal immediately
  monitor: Extract fdset fd flags comparison into a function
  monitor: fdset: Match against O_DIRECT
  migration: Add support for fdset with multifd + file
  tests/qtest/migration: Add a test for fixed-ram with passing of fds

Nikolay Borisov (3):
  io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file
  io: Add generic pwritev/preadv interface
  io: implement io_pwritev/preadv for QIOChannelFile

 docs/devel/migration/features.rst   |   1 +
 docs/devel/migration/fixed-ram.rst  | 137 +++++++++
 docs/devel/migration/main.rst       |  22 ++
 include/exec/ramblock.h             |  13 +
 include/io/channel.h                |  83 ++++++
 include/migration/qemu-file-types.h |   2 +
 include/qemu/bitops.h               |  13 +
 include/qemu/osdep.h                |   2 +
 io/channel-file.c                   |  69 +++++
 io/channel.c                        |  58 ++++
 migration/fd.c                      |  30 ++
 migration/fd.h                      |   1 +
 migration/file.c                    | 258 +++++++++++++++-
 migration/file.h                    |   9 +
 migration/migration-hmp-cmds.c      |  11 +
 migration/migration.c               |  68 ++++-
 migration/multifd-zlib.c            |  26 +-
 migration/multifd-zstd.c            |  26 +-
 migration/multifd.c                 | 436 +++++++++++++++++++++-------
 migration/multifd.h                 |  27 +-
 migration/options.c                 |  66 +++++
 migration/options.h                 |   2 +
 migration/qemu-file.c               | 106 +++++++
 migration/qemu-file.h               |   6 +
 migration/ram.c                     | 333 ++++++++++++++++++++-
 migration/ram.h                     |   1 +
 migration/savevm.c                  |   1 +
 monitor/fds.c                       |  27 +-
 qapi/migration.json                 |  24 +-
 tests/qtest/migration-helpers.c     |  42 +++
 tests/qtest/migration-helpers.h     |   1 +
 tests/qtest/migration-test.c        | 303 ++++++++++++++++++-
 util/osdep.c                        |   9 +
 33 files changed, 2041 insertions(+), 172 deletions(-)
 create mode 100644 docs/devel/migration/fixed-ram.rst

-- 
2.35.3
Re: [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram
Posted by Peter Xu 2 months, 2 weeks ago
On Tue, Feb 20, 2024 at 07:41:04PM -0300, Fabiano Rosas wrote:
> 0) Cleanups                           [1-5]

While I am still reading the rest.. I queued these five first.

-- 
Peter Xu
Re: [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram
Posted by Peter Xu 2 months, 3 weeks ago
On Tue, Feb 20, 2024 at 07:41:04PM -0300, Fabiano Rosas wrote:
> Latest numbers
> ==============
> 
> => guest: 128 GB RAM - 120 GB dirty - 1 vcpu in tight loop dirtying memory
> => host: 128 CPU AMD EPYC 7543 - 2 NVMe disks in RAID0 (8586 MiB/s) - xfs
> => pinned vcpus w/ NUMA shortest distances - average of 3 runs - results
>    from query-migrate
> 
> non-live           | time (ms)   pages/s   mb/s   MB/s
> -------------------+-----------------------------------
> file               |    110512    256258   9549   1193
>   + bg-snapshot    |    245660    119581   4303    537

Is this the one using userfault?  I'm surprised it's much slower when
enabled; logically for a non-live snapshot it should take similar loops
like a normal migration as it should have zero faults, then it should be
similar performance.

> -------------------+-----------------------------------
> fixed-ram          |    157975    216877   6672    834
>   + multifd 8 ch.  |     95922    292178  10982   1372
>      + direct-io   |     23268   1936897  45330   5666
> -------------------------------------------------------
> 
> live               | time (ms)   pages/s   mb/s   MB/s
> -------------------+-----------------------------------
> file               |         -         -      -      - (file grew 4x the VM size)
>   + bg-snapshot    |    357635    141747   2974    371
> -------------------+-----------------------------------
> fixed-ram          |         -         -      -      - (no convergence in 5 min)
>   + multifd 8 ch.  |    230812    497551  14900   1862
>      + direct-io   |     27475   1788025  46736   5842
> -------------------------------------------------------

Also surprised on direct-io too.. that is definitely something tremendous.

-- 
Peter Xu
Re: [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram
Posted by Fabiano Rosas 2 months, 3 weeks ago
Peter Xu <peterx@redhat.com> writes:

> On Tue, Feb 20, 2024 at 07:41:04PM -0300, Fabiano Rosas wrote:
>> Latest numbers
>> ==============
>> 
>> => guest: 128 GB RAM - 120 GB dirty - 1 vcpu in tight loop dirtying memory
>> => host: 128 CPU AMD EPYC 7543 - 2 NVMe disks in RAID0 (8586 MiB/s) - xfs
>> => pinned vcpus w/ NUMA shortest distances - average of 3 runs - results
>>    from query-migrate
>> 
>> non-live           | time (ms)   pages/s   mb/s   MB/s
>> -------------------+-----------------------------------
>> file               |    110512    256258   9549   1193
>>   + bg-snapshot    |    245660    119581   4303    537
>
> Is this the one using userfault?  I'm surprised it's much slower when
> enabled; logically for a non-live snapshot it should take similar loops
> like a normal migration as it should have zero faults, then it should be
> similar performance.

I just enabled the background-snapshot capability. Is there extra setup
that must be done to enable this properly? The ufd_version_check from
migration-test returns true on this system.

>> -------------------+-----------------------------------
>> fixed-ram          |    157975    216877   6672    834
>>   + multifd 8 ch.  |     95922    292178  10982   1372
>>      + direct-io   |     23268   1936897  45330   5666
>> -------------------------------------------------------
>> 
>> live               | time (ms)   pages/s   mb/s   MB/s
>> -------------------+-----------------------------------
>> file               |         -         -      -      - (file grew 4x the VM size)
>>   + bg-snapshot    |    357635    141747   2974    371
>> -------------------+-----------------------------------
>> fixed-ram          |         -         -      -      - (no convergence in 5 min)
>>   + multifd 8 ch.  |    230812    497551  14900   1862
>>      + direct-io   |     27475   1788025  46736   5842
>> -------------------------------------------------------
>
> Also surprised on direct-io too.. that is definitely something tremendous.

Indeed. That was the intention of this series all along.
Re: [PATCH v4 00/34] migration: File based migration with multifd and fixed-ram
Posted by Claudio Fontana 2 months, 3 weeks ago
On 2/23/24 03:59, Peter Xu wrote:
> On Tue, Feb 20, 2024 at 07:41:04PM -0300, Fabiano Rosas wrote:
>> Latest numbers
>> ==============
>>
>> => guest: 128 GB RAM - 120 GB dirty - 1 vcpu in tight loop dirtying memory
>> => host: 128 CPU AMD EPYC 7543 - 2 NVMe disks in RAID0 (8586 MiB/s) - xfs
>> => pinned vcpus w/ NUMA shortest distances - average of 3 runs - results
>>    from query-migrate
>>
>> non-live           | time (ms)   pages/s   mb/s   MB/s
>> -------------------+-----------------------------------
>> file               |    110512    256258   9549   1193
>>   + bg-snapshot    |    245660    119581   4303    537
> 
> Is this the one using userfault?  I'm surprised it's much slower when
> enabled; logically for a non-live snapshot it should take similar loops
> like a normal migration as it should have zero faults, then it should be
> similar performance.
> 
>> -------------------+-----------------------------------
>> fixed-ram          |    157975    216877   6672    834
>>   + multifd 8 ch.  |     95922    292178  10982   1372
>>      + direct-io   |     23268   1936897  45330   5666
>> -------------------------------------------------------
>>
>> live               | time (ms)   pages/s   mb/s   MB/s
>> -------------------+-----------------------------------
>> file               |         -         -      -      - (file grew 4x the VM size)
>>   + bg-snapshot    |    357635    141747   2974    371
>> -------------------+-----------------------------------
>> fixed-ram          |         -         -      -      - (no convergence in 5 min)
>>   + multifd 8 ch.  |    230812    497551  14900   1862
>>      + direct-io   |     27475   1788025  46736   5842
>> -------------------------------------------------------
> 
> Also surprised on direct-io too.. that is definitely something tremendous.
> 

Awesome! Can't wait to have this available for our customers.

Ciao,

Claudio