[PATCH v3 00/24] Multifd 🔀 device state transfer support with VFIO consumer
Posted by Maciej S. Szmigiero 1 year, 2 months ago
From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>

This is an updated v3 patch series of the v2 series located here:
https://lore.kernel.org/qemu-devel/cover.1724701542.git.maciej.szmigiero@oracle.com/

Changes from v2:
* Reworked the non-AIO (generic) thread pool to use GLib's GThreadPool
instead of making the current QEMU AIO thread pool generic (see the
GThreadPool usage sketch after this changelog).

* Added a QEMU_VM_COMMAND MIG_CMD_SWITCHOVER_START sub-command to the
migration bit stream protocol via a migration compatibility flag.
This new bit stream sub-command is used to achieve a barrier between the main
migration channel device state data and the multifd device state data, instead
of introducing save_live_complete_precopy_{begin,end} handlers for that as
the previous patch set version did.

* Added a new migration core thread pool of optional load threads and used
it to implement the VFIO load thread instead of introducing a load_finish
handler as the previous patch set version did.

* Made the VFIO device config state load operation happen from that device
load thread instead of from the (now gone) load_finish handler that did such
a load on the main migration thread.
In the future this may allow pushing the BQL deeper into the device config
state load operation internals and so doing more of it in parallel.

* Switched multifd_send() to using a serializing mutex for thread safety
instead of atomics, as suggested by Peter, since this seems not to cause
any performance regression while being simpler.

* Added two patches improving SaveVMHandlers documentation: one documenting
the BQL behavior of load SaveVMHandlers, another one explaining
{load,save}_cleanup handlers semantics.

* Added Peter's proposed patch making MultiFDSendData a struct from
https://lore.kernel.org/qemu-devel/ZuCickYhs3nf2ERC@x1n/
The other two patches from that message bring no performance benefits, so they
were skipped (as discussed in that e-mail thread).

* Switched the x-migration-multifd-transfer VFIO property to a tri-state (On,
Off, Auto), with Auto now being the default value.
This means that VFIO device state transfer via multifd channels is
automatically attempted in configurations that otherwise support it.
Note that in this patch set version (in contrast with the previous version)
the x-migration-multifd-transfer setting is meaningful both on the source AND
the destination QEMU.

* Fixed a race condition with respect to the final multifd channel SYNC
packet sent by the RAM transfer code.

* Made VFIO's bytes_transferred counter atomic since it is accessed from
multiple threads (thanks Avihai for spotting it).

* Fixed an issue where the VFIO device config sender QEMUFile wouldn't be
closed in some error conditions, and switched to QEMUFile g_autoptr() automatic
memory management there to avoid such bugs in the future (also thanks
to Avihai for spotting the issue).

* Many, MANY small changes, like renamed functions, added review tags,
lock annotations, code formatting, changes split out into separate
commits, etc.

* Redid benchmarks.
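
As a quick illustration of the GThreadPool-based approach from the first
changelog item above, here is a minimal, self-contained sketch of the general
GLib API usage pattern it builds on (names and parameters here are
illustrative, this is not the actual QEMU thread-pool code):

#include <glib.h>

/* Work function: each queued item is processed on one of the pool's
 * worker threads. */
static void pool_worker(gpointer data, gpointer user_data)
{
    g_print("processing item %d\n", GPOINTER_TO_INT(data));
}

int main(void)
{
    GError *err = NULL;

    /* Up to 4 worker threads; FALSE = non-exclusive (shared) threads. */
    GThreadPool *pool = g_thread_pool_new(pool_worker, NULL, 4, FALSE, &err);
    if (!pool) {
        g_error("g_thread_pool_new() failed: %s", err->message);
    }

    for (int i = 1; i <= 8; i++) {
        g_thread_pool_push(pool, GINT_TO_POINTER(i), NULL);
    }

    /* immediate=FALSE, wait=TRUE: let the queued work finish before freeing. */
    g_thread_pool_free(pool, FALSE, TRUE);
    return 0;
}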

========================================================================

Benchmark results:
These are the 25th percentile of downtime results from 70-100 back-and-forth
live migrations with the same VM config (the guest wasn't restarted during
these migrations).

Previous benchmarks reported the lowest downtime results ("0th percentile")
instead, but these were subject to variation due to often being outliers.

The benchmarking setup was the same as the one used for the RFC version of
this patch set.


Results with 6 multifd channels:
            4 VFs   2 VFs    1 VF
Disabled: 1900 ms  859 ms  487 ms
Enabled:  1095 ms  556 ms  366 ms 

Results with 4 VFs but varied multifd channel count:
             6 ch     8 ch    15 ch
Enabled:  1095 ms  1104 ms  1125 ms 


Important note:
4 VF benchmarks were done with commit 5504a8126115
("KVM: Dynamic sized kvm memslots array") and its revert-dependencies
reverted since this seems to improve performance in this VM config if the
multifd transfer is enabled: the downtime performance with this commit
present is 1141 ms enabled / 1730 ms disabled.

Smaller VF counts actually do seem to benefit from this commit, so it's
likely that in the future adding some kind of a memslot pre-allocation
bit stream message might make sense to avoid this downtime regression for
4 VF configs (and likely higher VF count too).

========================================================================

This series is obviously targeting the post-QEMU-9.2 release by now
(AFAIK to be called 10.0).

It will need to be changed to use hw_compat_10_0 once that becomes available.

========================================================================

Maciej S. Szmigiero (23):
  migration: Clarify that {load,save}_cleanup handlers can run without
    setup
  thread-pool: Remove thread_pool_submit() function
  thread-pool: Rename AIO pool functions to *_aio() and data types to
    *Aio
  thread-pool: Implement generic (non-AIO) pool support
  migration: Add MIG_CMD_SWITCHOVER_START and its load handler
  migration: Add qemu_loadvm_load_state_buffer() and its handler
  migration: Document the BQL behavior of load SaveVMHandlers
  migration: Add thread pool of optional load threads
  migration/multifd: Split packet into header and RAM data
  migration/multifd: Device state transfer support - receive side
  migration/multifd: Make multifd_send() thread safe
  migration/multifd: Add an explicit MultiFDSendData destructor
  migration/multifd: Device state transfer support - send side
  migration/multifd: Add migration_has_device_state_support()
  migration/multifd: Send final SYNC only after device state is complete
  migration: Add save_live_complete_precopy_thread handler
  vfio/migration: Don't run load cleanup if load setup didn't run
  vfio/migration: Add x-migration-multifd-transfer VFIO property
  vfio/migration: Add load_device_config_state_start trace event
  vfio/migration: Convert bytes_transferred counter to atomic
  vfio/migration: Multifd device state transfer support - receive side
  migration/qemu-file: Define g_autoptr() cleanup function for QEMUFile
  vfio/migration: Multifd device state transfer support - send side

Peter Xu (1):
  migration/multifd: Make MultiFDSendData a struct

 hw/core/machine.c                  |   2 +
 hw/vfio/migration.c                | 588 ++++++++++++++++++++++++++++-
 hw/vfio/pci.c                      |  11 +
 hw/vfio/trace-events               |  11 +-
 include/block/aio.h                |   8 +-
 include/block/thread-pool.h        |  20 +-
 include/hw/vfio/vfio-common.h      |  21 ++
 include/migration/client-options.h |   4 +
 include/migration/misc.h           |  16 +
 include/migration/register.h       |  67 +++-
 include/qemu/typedefs.h            |   5 +
 migration/colo.c                   |   3 +
 migration/meson.build              |   1 +
 migration/migration-hmp-cmds.c     |   2 +
 migration/migration.c              |   3 +
 migration/migration.h              |   2 +
 migration/multifd-device-state.c   | 193 ++++++++++
 migration/multifd-nocomp.c         |  45 ++-
 migration/multifd.c                | 228 +++++++++--
 migration/multifd.h                |  73 +++-
 migration/options.c                |   9 +
 migration/qemu-file.h              |   2 +
 migration/ram.c                    |  10 +-
 migration/savevm.c                 | 183 ++++++++-
 migration/savevm.h                 |   4 +
 migration/trace-events             |   1 +
 scripts/analyze-migration.py       |  11 +
 tests/unit/test-thread-pool.c      |   2 +-
 util/async.c                       |   6 +-
 util/thread-pool.c                 | 174 +++++++--
 util/trace-events                  |   6 +-
 31 files changed, 1586 insertions(+), 125 deletions(-)
 create mode 100644 migration/multifd-device-state.c
Re: [PATCH v3 00/24] Multifd 🔀 device state transfer support with VFIO consumer
Posted by Cédric Le Goater 1 year, 2 months ago
On 11/17/24 20:19, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
> 
> This is an updated v3 patch series of the v2 series located here:
> https://lore.kernel.org/qemu-devel/cover.1724701542.git.maciej.szmigiero@oracle.com/
> 
> Changes from v2:
> * Reworked the non-AIO (generic) thread pool to use Glib's GThreadPool
> instead of making the current QEMU AIO thread pool generic.
> 
> * Added QEMU_VM_COMMAND MIG_CMD_SWITCHOVER_START sub-command to the
> migration bit stream protocol via migration compatibility flag.
> Used this new bit stream sub-command to achieve barrier between main
> migration channel device state data and multifd device state data instead
> of introducing save_live_complete_precopy_{begin,end} handlers for that as
> the previous patch set version did,
> 
> * Added a new migration core thread pool of optional load threads and used
> it to implement VFIO load thread instead of introducing load_finish handler
> as the previous patch set version did.
> 
> * Made VFIO device config state load operation happen from that device load
> thread instead of from (now gone) load_finish handler that did such load on
> the main migration thread.
> In the future this may allow pushing BQL deeper into the device config
> state load operation internals and so doing more of it in parallel.
> 
> * Switched multifd_send() to using a serializing mutex for thread safety
> instead of atomics as suggested by Peter since this seems to not cause
> any performance regression while being simpler.
> 
> * Added two patches improving SaveVMHandlers documentation: one documenting
> the BQL behavior of load SaveVMHandlers, another one explaining
> {load,save}_cleanup handlers semantics.
> 
> * Added Peter's proposed patch making MultiFDSendData a struct from
> https://lore.kernel.org/qemu-devel/ZuCickYhs3nf2ERC@x1n/
> Other two patches from that message bring no performance benefits so they
> were skipped (as discussed in that e-mail thread).
> 
> * Switched x-migration-multifd-transfer VFIO property to tri-state (On,
> Off, Auto), with Auto being now the default value.
> This means hat VFIO device state transfer via multifd channels is
> automatically attempted in configurations that otherwise support it.
> Note that in this patch set version (in contrast with the previous version)
> x-migration-multifd-transfer setting is meaningful both on source AND
> destination QEMU.
> 
> * Fixed a race condition with respect to the final multifd channel SYNC
> packet sent by the RAM transfer code.
> 
> * Made VFIO's bytes_transferred counter atomic since it is accessed from
> multiple threads (thanks Avihai for spotting it).
> 
> * Fixed an issue where VFIO device config sender QEMUFile wouldn't be
> closed in some error conditions, switched to QEMUFile g_autoptr() automatic
> memory management there to avoid such bugs in the future (also thanks
> to Avihai for spotting the issue).
> 
> * Many, MANY small changes, like renamed functions, added review tags,
> locks annotations, code formatting, split out changes into separate
> commits, etc.
> 
> * Redid benchmarks.
> 
> ========================================================================
> 
> Benchmark results:
> These are 25th percentile of downtime results from 70-100 back-and-forth
> live migrations with the same VM config (guest wasn't restarted during
> these migrations).
> 
> Previous benchmarks reported the lowest downtime results ("0th percentile")
> instead but these were subject to variation due to often being one of
> outliers.
> 
> The used setup for bechmarking was the same as the RFC version of patch set
> used.
> 
> 
> Results with 6 multifd channels:
>              4 VFs   2 VFs    1 VF
> Disabled: 1900 ms  859 ms  487 ms
> Enabled:  1095 ms  556 ms  366 ms
> 
> Results with 4 VFs but varied multifd channel count:
>               6 ch     8 ch    15 ch
> Enabled:  1095 ms  1104 ms  1125 ms
> 
> 
> Important note:
> 4 VF benchmarks were done with commit 5504a8126115
> ("KVM: Dynamic sized kvm memslots array") and its revert-dependencies
> reverted since this seems to improve performance in this VM config if the
> multifd transfer is enabled: the downtime performance with this commit
> present is 1141 ms enabled / 1730 ms disabled.
> 
> Smaller VF counts actually do seem to benefit from this commit, so it's
> likely that in the future adding some kind of a memslot pre-allocation
> bit stream message might make sense to avoid this downtime regression for
> 4 VF configs (and likely higher VF count too).
> 
> ========================================================================
> 
> This series is obviously targeting post QEMU 9.2 release by now
> (AFAIK called 10.0).
> 
> Will need to be changed to use hw_compat_10_0 once these become available.
> 
> ========================================================================
> 
> Maciej S. Szmigiero (23):
>    migration: Clarify that {load,save}_cleanup handlers can run without
>      setup
>    thread-pool: Remove thread_pool_submit() function
>    thread-pool: Rename AIO pool functions to *_aio() and data types to
>      *Aio
>    thread-pool: Implement generic (non-AIO) pool support
>    migration: Add MIG_CMD_SWITCHOVER_START and its load handler
>    migration: Add qemu_loadvm_load_state_buffer() and its handler
>    migration: Document the BQL behavior of load SaveVMHandlers
>    migration: Add thread pool of optional load threads
>    migration/multifd: Split packet into header and RAM data
>    migration/multifd: Device state transfer support - receive side
>    migration/multifd: Make multifd_send() thread safe
>    migration/multifd: Add an explicit MultiFDSendData destructor
>    migration/multifd: Device state transfer support - send side
>    migration/multifd: Add migration_has_device_state_support()
>    migration/multifd: Send final SYNC only after device state is complete
>    migration: Add save_live_complete_precopy_thread handler
>    vfio/migration: Don't run load cleanup if load setup didn't run
>    vfio/migration: Add x-migration-multifd-transfer VFIO property
>    vfio/migration: Add load_device_config_state_start trace event
>    vfio/migration: Convert bytes_transferred counter to atomic
>    vfio/migration: Multifd device state transfer support - receive side
>    migration/qemu-file: Define g_autoptr() cleanup function for QEMUFile
>    vfio/migration: Multifd device state transfer support - send side
> 
> Peter Xu (1):
>    migration/multifd: Make MultiFDSendData a struct
> 
>   hw/core/machine.c                  |   2 +
>   hw/vfio/migration.c                | 588 ++++++++++++++++++++++++++++-
>   hw/vfio/pci.c                      |  11 +
>   hw/vfio/trace-events               |  11 +-
>   include/block/aio.h                |   8 +-
>   include/block/thread-pool.h        |  20 +-
>   include/hw/vfio/vfio-common.h      |  21 ++
>   include/migration/client-options.h |   4 +
>   include/migration/misc.h           |  16 +
>   include/migration/register.h       |  67 +++-
>   include/qemu/typedefs.h            |   5 +
>   migration/colo.c                   |   3 +
>   migration/meson.build              |   1 +
>   migration/migration-hmp-cmds.c     |   2 +
>   migration/migration.c              |   3 +
>   migration/migration.h              |   2 +
>   migration/multifd-device-state.c   | 193 ++++++++++
>   migration/multifd-nocomp.c         |  45 ++-
>   migration/multifd.c                | 228 +++++++++--
>   migration/multifd.h                |  73 +++-
>   migration/options.c                |   9 +
>   migration/qemu-file.h              |   2 +
>   migration/ram.c                    |  10 +-
>   migration/savevm.c                 | 183 ++++++++-
>   migration/savevm.h                 |   4 +
>   migration/trace-events             |   1 +
>   scripts/analyze-migration.py       |  11 +
>   tests/unit/test-thread-pool.c      |   2 +-
>   util/async.c                       |   6 +-
>   util/thread-pool.c                 | 174 +++++++--
>   util/trace-events                  |   6 +-
>   31 files changed, 1586 insertions(+), 125 deletions(-)
>   create mode 100644 migration/multifd-device-state.c


I did a quick run of a VM with a mlx5 VF and a vGPU and I didn't see
any issue when migrating. I used 4 channels for multifd. The trace
events looked ok and useful. We will tune these with time. I wished
we had some way to dump the thread and channel usage on each side.

A build was provided to RHEL QE. This to get more results when under
stress and with larger device states. Don't expect feedback before
next year though !

Having a small cookbook to run the migration from QEMU and from
libvirt would be a plus.

Thanks,

C.
Re: [PATCH v3 00/24] Multifd 🔀 device state transfer support with VFIO consumer
Posted by Maciej S. Szmigiero 1 year, 2 months ago
On 5.12.2024 22:27, Cédric Le Goater wrote:
> On 11/17/24 20:19, Maciej S. Szmigiero wrote:
>> [...]
>
> I did a quick run of a VM with a mlx5 VF and a vGPU and I didn't see
> any issue when migrating. I used 4 channels for multifd. The trace
> events looked ok and useful. We will tune these with time. I wished
> we had some way to dump the thread and channel usage on each side.
> 
> A build was provided to RHEL QE. This to get more results when under
> stress and with larger device states. Don't expect feedback before
> next year though !

Thanks Cédric, more testing of a complex code change is always
appreciated.

Especially since your test environment is probably significantly
different from mine.

> Having a small cookbook to run the migration from QEMU and from
> libvirt would be a plus.
> 
> Thanks,
> 
> C.

Thanks,
Maciej


Re: [PATCH v3 00/24] Multifd 🔀 device state transfer support with VFIO consumer
Posted by Peter Xu 1 year, 2 months ago
On Thu, Dec 05, 2024 at 10:27:09PM +0100, Cédric Le Goater wrote:

[...]

> > Important note:
> > 4 VF benchmarks were done with commit 5504a8126115
> > ("KVM: Dynamic sized kvm memslots array") and its revert-dependencies
> > reverted since this seems to improve performance in this VM config if the
> > multifd transfer is enabled: the downtime performance with this commit
> > present is 1141 ms enabled / 1730 ms disabled.

[1]

> > 
> > Smaller VF counts actually do seem to benefit from this commit, so it's
> > likely that in the future adding some kind of a memslot pre-allocation
> > bit stream message might make sense to avoid this downtime regression for
> > 4 VF configs (and likely higher VF count too).

[...]

> I did a quick run of a VM with a mlx5 VF and a vGPU and I didn't see
> any issue when migrating. I used 4 channels for multifd. The trace
> events looked ok and useful. We will tune these with time. I wished
> we had some way to dump the thread and channel usage on each side.
> 
> A build was provided to RHEL QE. This to get more results when under
> stress and with larger device states. Don't expect feedback before
> next year though !
> 
> Having a small cookbook to run the migration from QEMU and from
> libvirt would be a plus.

Cédric,

Did you test also with commit 5504a8126115 and relevant patches reverted,
per mentioned above [1]?  Or vanilla master branch?

I wonder whether it shows the same regression in your setup.

Thanks,

-- 
Peter Xu


Re: [PATCH v3 00/24] Multifd 🔀 device state transfer support with VFIO consumer
Posted by Cédric Le Goater 1 year, 2 months ago
Hello !

[ ... ]

> Did you test also with commit 5504a8126115 and relevant patches reverted,
> per mentioned above [1]?  Or vanilla master branch?

I am on master and I didn't revert the "Dynamic sized kvm memslots array"
series, which we know has benefits for other VM configs and workloads.

For testing purposes, we could add a toggle defining a constant number of
memslots maybe? And 0 would mean grow-on-demand?

> I wonder whether it shows the same regression in your setup.

I didn't look at performance yet and downtime with mlx5 VFs was not a
big issue either. So I am expecting QE to share test results with vGPUs
under load.

Thanks,

C.
Re: [PATCH v3 00/24] Multifd 🔀 device state transfer support with VFIO consumer
Posted by Peter Xu 1 year, 2 months ago
On Sun, Nov 17, 2024 at 08:19:55PM +0100, Maciej S. Szmigiero wrote:
> Important note:
> 4 VF benchmarks were done with commit 5504a8126115
> ("KVM: Dynamic sized kvm memslots array") and its revert-dependencies
> reverted since this seems to improve performance in this VM config if the
> multifd transfer is enabled: the downtime performance with this commit
> present is 1141 ms enabled / 1730 ms disabled.
> 
> Smaller VF counts actually do seem to benefit from this commit, so it's
> likely that in the future adding some kind of a memslot pre-allocation
> bit stream message might make sense to avoid this downtime regression for
> 4 VF configs (and likely higher VF count too).

I'm confused why reverting 5504a8126115 could be faster, and why it affects as
much as 600ms.  Also how that effect can differ with the number of VFs.

Could you share more on this regression?  Because if that's problematic we
need to fix it, or upstream QEMU (after this series merged) will still not
work.

-- 
Peter Xu
Re: [PATCH v3 00/24] Multifd 🔀 device state transfer support with VFIO consumer
Posted by Maciej S. Szmigiero 1 year, 2 months ago
On 4.12.2024 20:10, Peter Xu wrote:
> On Sun, Nov 17, 2024 at 08:19:55PM +0100, Maciej S. Szmigiero wrote:
>> Important note:
>> 4 VF benchmarks were done with commit 5504a8126115
>> ("KVM: Dynamic sized kvm memslots array") and its revert-dependencies
>> reverted since this seems to improve performance in this VM config if the
>> multifd transfer is enabled: the downtime performance with this commit
>> present is 1141 ms enabled / 1730 ms disabled.
>>
>> Smaller VF counts actually do seem to benefit from this commit, so it's
>> likely that in the future adding some kind of a memslot pre-allocation
>> bit stream message might make sense to avoid this downtime regression for
>> 4 VF configs (and likely higher VF count too).
> 
> I'm confused why reverting 5504a8126115 could be faster, and why it affects as
> much as 600ms.  Also how that effect can differ with the number of VFs.
> 
> Could you share more on this regression?  Because if that's problematic we
> need to fix it, or upstream QEMU (after this series merged) will still not
> work.
> 

The number of memslots that the VM uses seems to differ depending on its
VF count, each VF using 2 memslots:
2 VFs, used slots: 13
4 VFs, used slots: 17
5 VFs, used slots: 19

So I suspect this performance difference is due to these higher counts
of memslots possibly benefiting from being preallocated on the previous
QEMU code (before commit 5504a8126115).

I can see that with this commit:
> #define  KVM_MEMSLOTS_NR_ALLOC_DEFAULT                      16

So it would explain why the difference is visible on 4 VFs only (and
possibly higher VF counts, just I don't have an ability to test migrating
it) since with 4 VF configs we exceed KVM_MEMSLOTS_NR_ALLOC_DEFAULT.
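
As a rough illustration of the grow-on-demand behaviour being discussed here,
a simplified model assuming a doubling policy (this is not the actual
kvm_slots_grow() code, just a sketch of why a 4 VF config needing 17 slots
crosses the 16-slot default while 2 VF configs do not):

#define KVM_MEMSLOTS_NR_ALLOC_DEFAULT 16

/* Simplified model: the slots array starts at the default capacity and is
 * doubled whenever the used slot count exceeds it, so 17 used slots trigger
 * one grow while 13 used slots never do. */
static unsigned int slots_capacity_for(unsigned int used_slots)
{
    unsigned int capacity = KVM_MEMSLOTS_NR_ALLOC_DEFAULT;

    while (capacity < used_slots) {
        capacity *= 2;
    }

    return capacity;
}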

Thanks,
Maciej
Re: [PATCH v3 00/24] Multifd 🔀 device state transfer support with VFIO consumer
Posted by Peter Xu 1 year, 2 months ago
On Fri, Dec 06, 2024 at 07:03:36PM +0100, Maciej S. Szmigiero wrote:
> On 4.12.2024 20:10, Peter Xu wrote:
> > On Sun, Nov 17, 2024 at 08:19:55PM +0100, Maciej S. Szmigiero wrote:
> > > Important note:
> > > 4 VF benchmarks were done with commit 5504a8126115
> > > ("KVM: Dynamic sized kvm memslots array") and its revert-dependencies
> > > reverted since this seems to improve performance in this VM config if the
> > > multifd transfer is enabled: the downtime performance with this commit
> > > present is 1141 ms enabled / 1730 ms disabled.
> > > 
> > > Smaller VF counts actually do seem to benefit from this commit, so it's
> > > likely that in the future adding some kind of a memslot pre-allocation
> > > bit stream message might make sense to avoid this downtime regression for
> > > 4 VF configs (and likely higher VF count too).
> > 
> > I'm confused why reverting 5504a8126115 could be faster, and why it affects as
> > much as 600ms.  Also how that effect can differ with the number of VFs.
> > 
> > Could you share more on this regression?  Because if that's problematic we
> > need to fix it, or upstream QEMU (after this series merged) will still not
> > work.
> > 
> 
> The number of memslots that the VM uses seems to differ depending on its
> VF count, each VF using 2 memslots:
> 2 VFs, used slots: 13
> 4 VFs, used slots: 17
> 5 VFs, used slots: 19

It's still pretty less.

> 
> So I suspect this performance difference is due to these higher counts
> of memslots possibly benefiting from being preallocated on the previous
> QEMU code (before commit 5504a8126115).
> 
> I can see that with this commit:
> > #define  KVM_MEMSLOTS_NR_ALLOC_DEFAULT                      16
> 
> So it would explain why the difference is visible on 4 VFs only (and
> possibly higher VF counts, just I don't have an ability to test migrating
> it) since with 4 VF configs we exceed KVM_MEMSLOTS_NR_ALLOC_DEFAULT.

I suppose it means kvm_slots_grow() is called once, but I don't understand
why it caused 500ms downtime!

Not to mention, that patchset should at least reduce downtime OTOH due to
the small num of slots, because some of the dirty sync / clear path would
need to walk the whole slot array (our lookup is pretty slow for now, but
probably no good reason to rework it yet if it's mostly 10-20).

In general, I would still expect that dynamic memslot work to speedup
(instead of slowing down) VFIO migrations.

There's something off here, or something I overlooked.  I suggest we figure
it out..  Even if we need to revert the kvm series on master, but I so far
doubt it.

Otherwise we should at least report the number with things on the master
branch, and we evaluate merging this series with that real number, because
fundamentally that's the numbers people will get when start using this
feature on master later.

-- 
Peter Xu
Re: [PATCH v3 00/24] Multifd 🔀 device state transfer support with VFIO consumer
Posted by Maciej S. Szmigiero 1 year, 1 month ago
On 6.12.2024 23:20, Peter Xu wrote:
> On Fri, Dec 06, 2024 at 07:03:36PM +0100, Maciej S. Szmigiero wrote:
>> On 4.12.2024 20:10, Peter Xu wrote:
>>> On Sun, Nov 17, 2024 at 08:19:55PM +0100, Maciej S. Szmigiero wrote:
>>>> Important note:
>>>> 4 VF benchmarks were done with commit 5504a8126115
>>>> ("KVM: Dynamic sized kvm memslots array") and its revert-dependencies
>>>> reverted since this seems to improve performance in this VM config if the
>>>> multifd transfer is enabled: the downtime performance with this commit
>>>> present is 1141 ms enabled / 1730 ms disabled.
>>>>
>>>> Smaller VF counts actually do seem to benefit from this commit, so it's
>>>> likely that in the future adding some kind of a memslot pre-allocation
>>>> bit stream message might make sense to avoid this downtime regression for
>>>> 4 VF configs (and likely higher VF count too).
>>>
>>> I'm confused why reverting 5504a8126115 could be faster, and why it affects as
>>> much as 600ms.  Also how that effect can differ with the number of VFs.
>>>
>>> Could you share more on this regression?  Because if that's problematic we
>>> need to fix it, or upstream QEMU (after this series merged) will still not
>>> work.
>>>
>>
>> The number of memslots that the VM uses seems to differ depending on its
>> VF count, each VF using 2 memslots:
>> 2 VFs, used slots: 13
>> 4 VFs, used slots: 17
>> 5 VFs, used slots: 19
> 
> It's still pretty less.
> 
>>
>> So I suspect this performance difference is due to these higher counts
>> of memslots possibly benefiting from being preallocated on the previous
>> QEMU code (before commit 5504a8126115).
>>
>> I can see that with this commit:
>>> #define  KVM_MEMSLOTS_NR_ALLOC_DEFAULT                      16
>>
>> So it would explain why the difference is visible on 4 VFs only (and
>> possibly higher VF counts, just I don't have an ability to test migrating
>> it) since with 4 VF configs we exceed KVM_MEMSLOTS_NR_ALLOC_DEFAULT.
> 
> I suppose it means kvm_slots_grow() is called once, but I don't understand
> why it caused 500ms downtime!

In this cover letter sentence:
> "the downtime performance with this commit present is 1141 ms enabled / 1730 ms disabled"
"enabled" and "disabled" refer to *multifd transfer* being enabled, not
your patch being present (sorry for not being 100% clear there).

So the difference that the memslot patch makes is 1141 ms - 1095ms = 46 ms extra
downtime, not 500 ms.

I can guess this is because of extra contention on BQL, with unfortunate timing.

> Not to mention, that patchset should at least reduce downtime OTOH due to
> the small num of slots, because some of the dirty sync / clear path would
> need to walk the whole slot array (our lookup is pretty slow for now, but
> probably no good reason to rework it yet if it's mostly 10-20).

With multifd transfer being disabled your memslot patch indeed improves the
downtime by 1900 ms - 1730 ms = 170 ms.

> In general, I would still expect that dynamic memslot work to speedup
> (instead of slowing down) VFIO migrations.
> 
> There's something off here, or something I overlooked.  I suggest we figure
> it out..  Even if we need to revert the kvm series on master, but I so far
> doubt it.
> 
> Otherwise we should at least report the number with things on the master
> branch, and we evaluate merging this series with that real number, because
> fundamentally that's the numbers people will get when start using this
> feature on master later.

Sure, that's why in the cover letter I provided the numbers with your commit
present, too.

Thanks,
Maciej
Re: [PATCH v3 00/24] Multifd 🔀 device state transfer support with VFIO consumer
Posted by Peter Xu 1 year, 1 month ago
On Wed, Dec 11, 2024 at 12:06:04AM +0100, Maciej S. Szmigiero wrote:
> On 6.12.2024 23:20, Peter Xu wrote:
> > On Fri, Dec 06, 2024 at 07:03:36PM +0100, Maciej S. Szmigiero wrote:
> > > On 4.12.2024 20:10, Peter Xu wrote:
> > > > On Sun, Nov 17, 2024 at 08:19:55PM +0100, Maciej S. Szmigiero wrote:
> > > > > Important note:
> > > > > 4 VF benchmarks were done with commit 5504a8126115
> > > > > ("KVM: Dynamic sized kvm memslots array") and its revert-dependencies
> > > > > reverted since this seems to improve performance in this VM config if the
> > > > > multifd transfer is enabled: the downtime performance with this commit
> > > > > present is 1141 ms enabled / 1730 ms disabled.
> > > > > 
> > > > > Smaller VF counts actually do seem to benefit from this commit, so it's
> > > > > likely that in the future adding some kind of a memslot pre-allocation
> > > > > bit stream message might make sense to avoid this downtime regression for
> > > > > 4 VF configs (and likely higher VF count too).
> > > > 
> > > > I'm confused why reverting 5504a8126115 could be faster, and why it affects as
> > > > much as 600ms.  Also how that effect can differ with the number of VFs.
> > > > 
> > > > Could you share more on this regression?  Because if that's problematic we
> > > > need to fix it, or upstream QEMU (after this series merged) will still not
> > > > work.
> > > > 
> > > 
> > > The number of memslots that the VM uses seems to differ depending on its
> > > VF count, each VF using 2 memslots:
> > > 2 VFs, used slots: 13
> > > 4 VFs, used slots: 17
> > > 5 VFs, used slots: 19
> > 
> > It's still pretty less.
> > 
> > > 
> > > So I suspect this performance difference is due to these higher counts
> > > of memslots possibly benefiting from being preallocated on the previous
> > > QEMU code (before commit 5504a8126115).
> > > 
> > > I can see that with this commit:
> > > > #define  KVM_MEMSLOTS_NR_ALLOC_DEFAULT                      16
> > > 
> > > So it would explain why the difference is visible on 4 VFs only (and
> > > possibly higher VF counts, just I don't have an ability to test migrating
> > > it) since with 4 VF configs we exceed KVM_MEMSLOTS_NR_ALLOC_DEFAULT.
> > 
> > I suppose it means kvm_slots_grow() is called once, but I don't understand
> > why it caused 500ms downtime!
> 
> In this cover letter sentence:
> > "the downtime performance with this commit present is 1141 ms enabled / 1730 ms disabled"
> "enabled" and "disabled" refer to *multifd transfer* being enabled, not
> your patch being present (sorry for not being 100% clear there).
> 
> So the difference that the memslot patch makes is 1141 ms - 1095ms = 46 ms extra
> downtime, not 500 ms.
> 
> I can guess this is because of extra contention on BQL, with unfortunate timing.

Hmm, I wonder why the address space changed during switchover.  I was
expecting the whole address space to be updated when qemu boots up, and to
stay as is during migration.  Especially why that matters with multifd at
all..  I could have overlooked something.

> 
> > Not to mention, that patchset should at least reduce downtime OTOH due to
> > the small num of slots, because some of the dirty sync / clear path would
> > need to walk the whole slot array (our lookup is pretty slow for now, but
> > probably no good reason to rework it yet if it's mostly 10-20).
> 
> With multifd transfer being disabled your memslot patch indeed improves the
> downtime by 1900 ms - 1730 ms = 170 ms.

That's probably the other side of the change when slots grow, compared to
the pure win where the series definitely should speed up the dirty track
operations quite a bit.

> 
> > In general, I would still expect that dynamic memslot work to speedup
> > (instead of slowing down) VFIO migrations.
> > 
> > There's something off here, or something I overlooked.  I suggest we figure
> > it out..  Even if we need to revert the kvm series on master, but I so far
> > doubt it.
> > 
> > Otherwise we should at least report the number with things on the master
> > branch, and we evaluate merging this series with that real number, because
> > fundamentally that's the numbers people will get when start using this
> > feature on master later.
> 
> Sure, that's why in the cover letter I provided the numbers with your commit
> present, too.

It seems to me we're not far away from the truth.  Anyway, feel free to
update if you figure out the reason, or get some news on profiling.

Thanks,

-- 
Peter Xu
Re: [PATCH v3 00/24] Multifd 🔀 device state transfer support with VFIO consumer
Posted by Yanghang Liu 1 year, 1 month ago
FYI.  The following data comes from the first ping-pong mlx VF
migration after rebooting the host.


1. Test for multifd=0:

1.1 Outgoing migration:
VF number:                     1 VF                           4 VF
Time elapsed:             10194 ms                   10650 ms
Memory processed:    903.911 MiB             783.698 MiB
Memory bandwidth:    108.722 MiB/s          101.978 MiB/s
Iteration:                               4                              6
Normal data:                881.297 MiB             747.613 MiB
Total downtime                358ms                       518ms
Setup time                        52ms                        450ms

1.2 In coming migration:
VF number:                       1 VF                            4 VF
Time elapsed:                10161 ms                    10569 ms
Memory processed:     903.881 MiB                785.400 MiB
Memory bandwidth:     107.952 MiB/s             100.512 MiB/s
Iteration:                               4                                7
Normal data:                 881.262 MiB               749.297 MiB
Total downtime                315ms                        513ms
Setup time                        47ms                         414ms


2. Test for multifd=1:

2.1 Outgoing migration:
VF number                     1 VF                           1 VF
Channel number               4                                  5
Time elapsed:              10962 ms                  10071 ms
Memory processed:     908.968 MiB             908.424 MiB
Memory bandwidth:     108.378 MiB/s         110.109 MiB/s
Iteration:                               4
  4
Normal data:               882.852 MiB              882.566 MiB
Total downtime                318ms                       255ms
Setup time                         54ms                        43ms


VF number                    4 VFs                        4 VFs
Channel number             8                               16
Time elapsed:            10805 ms                  10943 ms
Setup time                   445 ms                       463ms
Memory processed:  786.334 MiB            784.926 MiB
Memory bandwidth   109.062 MiB/s         108.610 MiB/s
Iteration:                              5                           7
Normal data:            746.758 MiB             744.938 MiB
Total downtime            344 ms                     335ms


2.2 Incoming migration:
VF number                       1 VF                      1 VF
Channel number                4                            5
Time elapsed:                10064ms               10072 ms
Memory processed:     909.786 MiB           923.746 MiB
Memory bandwidth:      109.997 MiB/s       111.308 MiB/s
Iteration:                               4                          4
Normal data:               883.664 MiB            897.848 MiB
Total downtime                 313ms                   328ms
Setup time                        46ms                      47ms

VF number                   4 VFs                        4 VFs
Channel number             8                              16
Time elapsed:             10126 ms                 9941 ms
Memory processed:   791.308 MiB           779.560 MiB
Memory bandwidth:  108.876 MiB/s         110.170 MiB/s
Iteration:                          7                               5
 Normal data:             751.672 MiB           739.680 MiB
Total downtime             304 ms                    309ms
Setup time                    442 ms                    446ms


Best Regards,
Yanghang Liu




On Fri, Dec 13, 2024 at 1:36 AM Peter Xu <peterx@redhat.com> wrote:
> [...]
Re: [PATCH v3 00/24] Multifd 🔀 device state transfer support with VFIO consumer
Posted by Cédric Le Goater 1 year, 1 month ago
Hello Yanghang

On 12/19/24 08:55, Yanghang Liu wrote:
> FYI.  The following data comes from the first ping-pong mlx VF
> migration after rebooting the host.
>
> [...]

This is difficult to read. Could you please resend with a fixed
indentation?

We would need more information on the host and VM config too.

Thanks,

C.
Re: [PATCH v3 00/24] Multifd 🔀 device state transfer support with VFIO consumer
Posted by Yanghang Liu 1 year, 1 month ago
Sorry for the inconvenience.  Let me try to re-send my data via my Gmail client.

Test environment:
Host: Dell 7625
CPU: EPYC-Genoa
VM config: 4 vCPUs, 8G memory
Network device: MT2910
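
For reference: the field names in the report below ("Time elapsed",
"Memory processed", "Memory bandwidth", "Iteration", "Normal data",
"Total downtime", "Setup time") match what "virsh domjobinfo" prints,
so the runs were presumably driven through libvirt. A minimal sketch of
such a run, assuming a hypothetical domain name "vm1", a destination
host "dst-host", and channel counts as in the "Channel" rows of the
tables:

  # multifd=0 baseline: plain live migration
  virsh migrate --live --verbose vm1 qemu+ssh://dst-host/system

  # multifd=1: parallel (multifd) migration, here with 8 channels
  virsh migrate --live --parallel --parallel-connections 8 \
        --verbose vm1 qemu+ssh://dst-host/system

  # per-job statistics, read back once the migration job has finished
  virsh domjobinfo vm1 --completed

The exact commands, URIs and QEMU/VFIO device options behind the
numbers below are not spelled out in this report.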

Test report:
+------------------+---------------+----------------+
| multifd=0        |     outgoing migration         |
+------------------+---------------+----------------+
| VF(s) number     | 1             | 4              |
| Time elapsed     | 10194 ms      | 10650 ms       |
| Memory processed | 903.911 MiB   | 783.698 MiB    |
| Memory bandwidth | 108.722 MiB/s | 101.978 MiB/s  |
| Iteration        | 4             | 6              |
| Normal data      | 881.297 MiB   | 747.613 MiB    |
| Total downtime   | 358ms         | 518ms          |
| Setup time       | 52ms          | 450ms          |
+------------------+---------------+----------------+

+------------------+---------------+----------------+
| multifd=0        |     incoming migration         |
+------------------+---------------+----------------+
| VF(s) number     | 1             | 4              |
| Time elapsed     | 10161 ms      | 10569 ms       |
| Memory processed | 903.881 MiB   | 785.400 MiB    |
| Memory bandwidth | 107.952 MiB/s | 100.512 MiB/s  |
| Iteration        | 4             | 7              |
| Normal data      | 881.262 MiB   | 749.297 MiB    |
| Total downtime   | 315ms         | 513ms          |
| Setup time       | 47ms          | 414ms          |
+------------------+---------------+----------------+



+------------------+---------------+---------------+
| multifd=1        |     outgoing migration        |
+------------------+---------------+---------------+
| VF(s) number     | 1             | 1             |
| Channel          | 4             | 5             |
| Time elapsed     | 10962 ms      | 10071 ms      |
| Memory processed | 908.968 MiB   | 908.424 MiB   |
| Memory bandwidth | 108.378 MiB/s | 110.109 MiB/s |
| Iteration        | 4             | 4             |
| Normal data      | 882.852 MiB   | 882.566 MiB   |
| Total downtime   | 318ms         | 255ms         |
| Setup time       | 54ms          | 43ms          |
+------------------+---------------+---------------+


+------------------+---------------+----------------+
| multifd=1        |     incoming migration         |
+------------------+---------------+----------------+
| VF(s) number     | 1             | 1              |
| Channel          | 4             | 5              |
| Time elapsed     | 10064ms       | 10072 ms       |
| Memory processed | 909.786 MiB   | 923.746 MiB    |
| Memory bandwidth | 109.997 MiB/s | 111.308 MiB/s  |
| Iteration        | 4             | 4              |
| Normal data      | 883.664 MiB   | 897.848 MiB    |
| Total downtime   | 313ms         | 328ms          |
| Setup time       | 46ms          | 47ms           |
+------------------+---------------+----------------+


+------------------+---------------+----------------+
| multifd=1        |     outgoing migration         |
+------------------+---------------+----------------+
| VF(s) number     | 4             | 4              |
| Channel          | 8             | 16             |
| Time elapsed     | 10805 ms      | 10943 ms       |
| Memory processed | 786.334 MiB   | 784.926 MiB    |
| Memory bandwidth | 109.062 MiB/s | 108.610 MiB/s  |
| Iteration        | 5             | 7              |
| Normal data      | 746.758 MiB   | 744.938 MiB    |
| Total downtime   | 344 ms        | 335ms          |
| Setup time       | 445 ms        | 463ms          |
+------------------+---------------+----------------+


+------------------+---------------+------------------+
| multifd=1        |     incoming migration           |
+------------------+---------------+------------------+
| VF(s) number     | 4             | 4                |
| Channel          | 8             | 16               |
| Time elapsed     | 10126 ms      | 9941 ms          |
| Memory processed | 791.308 MiB   | 779.560 MiB      |
| Memory bandwidth | 108.876 MiB/s | 110.170 MiB/s    |
| Iteration        | 7             | 5                |
| Normal data      | 751.672 MiB   | 739.680 MiB      |
| Total downtime   | 304 ms        | 309ms            |
| Setup time       | 442 ms        | 446ms            |
+------------------+---------------+------------------+

Best Regards,
Yanghang Liu


On Thu, Dec 19, 2024 at 4:53 PM Cédric Le Goater <clg@redhat.com> wrote:
>
> Hello Yanghang
>
> On 12/19/24 08:55, Yanghang Liu wrote:
> > FYI.  The following data comes from the first ping-pong mlx VF
> > migration after rebooting the host.
> >
> >
> > [badly indented test data snipped; resent as readable tables above]
> >
>
> This is difficult to read. Could you please resend with a fixed
> indentation?
>
> We would need more information on the host and VM config too.
>
> Thanks,
>
> C.
>