[PATCH v6 00/36] Multifd 🔀 device state transfer support with VFIO consumer

Maciej S. Szmigiero posted 36 patches 3 weeks, 6 days ago
docs/devel/migration/vfio.rst      |  79 ++-
hw/core/machine.c                  |   2 +
hw/vfio/meson.build                |   1 +
hw/vfio/migration-multifd.c        | 786 +++++++++++++++++++++++++++++
hw/vfio/migration-multifd.h        |  37 ++
hw/vfio/migration.c                | 111 ++--
hw/vfio/pci.c                      |  40 ++
hw/vfio/trace-events               |  13 +-
include/block/aio.h                |   8 +-
include/block/thread-pool.h        |  62 ++-
include/hw/vfio/vfio-common.h      |  34 ++
include/migration/client-options.h |   4 +
include/migration/misc.h           |  25 +
include/migration/register.h       |  52 +-
include/qapi/error.h               |   2 +
include/qemu/typedefs.h            |   5 +
migration/colo.c                   |   3 +
migration/meson.build              |   1 +
migration/migration-hmp-cmds.c     |   2 +
migration/migration.c              |  20 +-
migration/migration.h              |   7 +
migration/multifd-device-state.c   | 212 ++++++++
migration/multifd-nocomp.c         |  30 +-
migration/multifd.c                | 248 +++++++--
migration/multifd.h                |  74 ++-
migration/options.c                |   9 +
migration/qemu-file.h              |   2 +
migration/savevm.c                 | 201 +++++++-
migration/savevm.h                 |   6 +-
migration/trace-events             |   1 +
scripts/analyze-migration.py       |  11 +
tests/unit/test-thread-pool.c      |   6 +-
util/async.c                       |   6 +-
util/thread-pool.c                 | 184 +++++--
util/trace-events                  |   6 +-
35 files changed, 2125 insertions(+), 165 deletions(-)
create mode 100644 hw/vfio/migration-multifd.c
create mode 100644 hw/vfio/migration-multifd.h
create mode 100644 migration/multifd-device-state.c
[PATCH v6 00/36] Multifd 🔀 device state transfer support with VFIO consumer
Posted by Maciej S. Szmigiero 3 weeks, 6 days ago
From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>

This is an updated v6 patch series of the v5 series located here:
https://lore.kernel.org/qemu-devel/cover.1739994627.git.maciej.szmigiero@oracle.com/

What is this patch set about?
Currently, live migration device state transfer is done via the main (single)
migration channel, which reduces performance and severely impacts the
migration downtime for VMs with large device state that needs to be
transferred during the switchover phase.

Example devices that have such large switchover phase device state are some
types of VFIO SmartNICs and GPUs.

This patch set allows parallelizing this transfer by using multifd channels
for it.
It also introduces new load and save threads per VFIO device for decoupling
these operations from the main migration thread.
These threads run on newly introduced generic (non-AIO) thread pools,
instantiated by the core migration code.
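
As a rough illustration of what spreading one device's state across several
channels implies for the receiver, here is a minimal sketch in C. It assumes
that each buffer carries an index and may arrive in any order, so the load side
parks buffers until the next expected index shows up. ReorderQueue,
reorder_queue_put() and reorder_queue_drain() are hypothetical names made up
for this example, not the functions added by the series; the fixed-size array
and the optional in-flight cap are simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BUFFERS 16          /* hypothetical fixed capacity for the sketch */

typedef struct {
    bool present[MAX_BUFFERS];  /* slot idx holds a received, unloaded buffer */
    uint32_t next_idx;          /* next index the loader expects */
    unsigned int queued;        /* buffers received but not yet loaded */
    unsigned int max_queued;    /* in-flight limit, 0 means unlimited */
} ReorderQueue;

void reorder_queue_init(ReorderQueue *q, unsigned int max_queued)
{
    *q = (ReorderQueue){ .max_queued = max_queued };
}

/* Called when a buffer arrives on any channel; returns false on error. */
bool reorder_queue_put(ReorderQueue *q, uint32_t idx)
{
    if (idx == UINT32_MAX || idx >= MAX_BUFFERS) {
        return false;           /* invalid or out-of-range index */
    }
    if (idx < q->next_idx || q->present[idx]) {
        return false;           /* already loaded or already queued */
    }
    if (q->max_queued && q->queued >= q->max_queued) {
        return false;           /* too many buffers waiting to be loaded */
    }
    q->present[idx] = true;
    q->queued++;
    return true;
}

/* Loads every buffer that is now contiguous; returns how many were loaded. */
unsigned int reorder_queue_drain(ReorderQueue *q)
{
    unsigned int loaded = 0;

    while (q->next_idx < MAX_BUFFERS && q->present[q->next_idx]) {
        assert(q->queued > 0);
        q->present[q->next_idx] = false;
        q->next_idx++;
        q->queued--;
        loaded++;
    }
    return loaded;
}
```

Feeding indices 2, 0, 1 into such a queue loads nothing after the first
arrival, one buffer after the second, and the remaining two after the third.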

Changes from v5:
* Add bql_locked() assertion to migration_incoming_state_destroy() with a
comment describing why holding BQL there is necessary.

* Add SPDX-License-Identifier to newly added files.

* Move consistency of multifd transfer settings check to the patch adding
x-migration-multifd-transfer property.

* Change packet->idx == UINT32_MAX message to the suggested one.

* Use WITH_QEMU_LOCK_GUARD() in vfio_load_state_buffer().

* Add vfio_load_bufs_thread_{start,end} trace events.

* Invert "ret" value computation logic in vfio_load_bufs_thread() and
  vfio_multifd_save_complete_precopy_thread() - initialize "ret" to false
  at definition, remove "ret = false" at every failure/early exit block and
  add "ret = true" just before the early exit jump label.

* Make vfio_load_bufs_thread_load_config() return a bool and take an
  "Error **" parameter.

* Make vfio_multifd_setup() (previously called vfio_multifd_transfer_setup())
  allocate struct VFIOMultifd if requested by "alloc_multifd" parameter.

* Add vfio_multifd_cleanup() call to vfio_save_cleanup() (for consistency
  with the load code), with a comment describing that it is currently a NOP
  there.

* Move vfio_multifd_cleanup() to migration-multifd.c.

* Move general multifd migration description in docs/devel/migration/vfio.rst
  from the top section to new "Multifd" section at the bottom.

* Add comment describing why x-migration-multifd-transfer needs to be
  a custom property above the variable containing that custom property type
  in register_vfio_pci_dev_type().

* Add object_class_property_set_description() description for all 3 newly
  added parameters: x-migration-multifd-transfer,
  x-migration-load-config-after-iter and x-migration-max-queued-buffers.

* Split out wiring vfio_multifd_setup() and vfio_multifd_cleanup() into
  general VFIO load/save setup and cleanup methods into a brand new
  patch/commit.

* Squash the patch introducing VFIOStateBuffer(s) into the "received buffers
  queuing" commit to fix building the interim code form at the time of this
  patch with "-Werror".
  
* Change device state packet "idstr" field to NUL-terminated and drop
  QEMU_NONSTRING marking from its definition.

* Add vbasedev->name to VFIO error messages to know which device caused
  that error.

* Move BQL lock ordering assert closer to the other lock in the lock order
  in vfio_load_state_buffer().

* Drop orphan "QemuThread load_bufs_thread" VFIOMultifd member leftover
  from the days of the version 2 of this patch set.

* Change "guint" into an "unsigned int" where it was present in this
  patch set.

* Use g_autoptr() for QEMUFile also in vfio_load_bufs_thread_load_config().

* Call multifd_abort_device_state_save_threads() if a migration error is
  already set in the save path to avoid needlessly waiting for the remaining
  threads to do all of their normal work.

* Other minor changes that should not have functional impact, like:
  renamed functions/labels, moved code lines between patches contained
  in this patch set, added review tags, code formatting, rebased on top
  of the latest QEMU git master, etc.
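
For readers unfamiliar with the inverted "ret" idiom mentioned in the list
above, the following self-contained C sketch shows its general shape.
process_state() and its parameters are made up for illustration and do not
come from the series; only the control flow (pessimistic default, a single
success assignment just before the exit label) mirrors what the changelog
describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical worker: copies src into a scratch buffer of at most limit bytes. */
bool process_state(const char *src, size_t len, size_t limit)
{
    bool ret = false;           /* assume failure until proven otherwise */
    char *buf = NULL;

    assert(limit > 0);          /* a zero limit would be a caller bug */

    if (src == NULL || len == 0) {
        goto out;               /* no "ret = false" needed here */
    }
    if (len > limit) {
        goto out;
    }

    buf = malloc(len);
    if (buf == NULL) {
        goto out;
    }
    memcpy(buf, src, len);

    ret = true;                 /* reached only if every step succeeded */
out:
    free(buf);                  /* free(NULL) is a no-op */
    return ret;
}
```

With this layout a newly added failure path only needs a jump to the exit
label, so it cannot accidentally report success.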

========================================================================

This patch set is targeting QEMU 10.0.

It is also exported as a git tree:
https://gitlab.com/maciejsszmigiero/qemu/-/commits/multifd-device-state-transfer-vfio

========================================================================

Maciej S. Szmigiero (35):
  migration: Clarify that {load,save}_cleanup handlers can run without
    setup
  thread-pool: Remove thread_pool_submit() function
  thread-pool: Rename AIO pool functions to *_aio() and data types to
    *Aio
  thread-pool: Implement generic (non-AIO) pool support
  migration: Add MIG_CMD_SWITCHOVER_START and its load handler
  migration: Add qemu_loadvm_load_state_buffer() and its handler
  migration: postcopy_ram_listen_thread() should take BQL for some calls
  error: define g_autoptr() cleanup function for the Error type
  migration: Add thread pool of optional load threads
  migration/multifd: Split packet into header and RAM data
  migration/multifd: Device state transfer support - receive side
  migration/multifd: Make multifd_send() thread safe
  migration/multifd: Add an explicit MultiFDSendData destructor
  migration/multifd: Device state transfer support - send side
  migration/multifd: Add multifd_device_state_supported()
  migration: Add save_live_complete_precopy_thread handler
  vfio/migration: Add load_device_config_state_start trace event
  vfio/migration: Convert bytes_transferred counter to atomic
  vfio/migration: Add vfio_add_bytes_transferred()
  vfio/migration: Move migration channel flags to vfio-common.h header
    file
  vfio/migration: Multifd device state transfer support - basic types
  vfio/migration: Multifd device state transfer - add support checking
    function
  vfio/migration: Multifd setup/cleanup functions and associated
    VFIOMultifd
  vfio/migration: Setup and cleanup multifd transfer in these general
    methods
  vfio/migration: Multifd device state transfer support - received
    buffers queuing
  vfio/migration: Multifd device state transfer support - load thread
  migration/qemu-file: Define g_autoptr() cleanup function for QEMUFile
  vfio/migration: Multifd device state transfer support - config loading
    support
  vfio/migration: Multifd device state transfer support - send side
  vfio/migration: Add x-migration-multifd-transfer VFIO property
  vfio/migration: Make x-migration-multifd-transfer VFIO property
    mutable
  hw/core/machine: Add compat for x-migration-multifd-transfer VFIO
    property
  vfio/migration: Max in-flight VFIO device state buffer count limit
  vfio/migration: Add x-migration-load-config-after-iter VFIO property
  vfio/migration: Update VFIO migration documentation

Peter Xu (1):
  migration/multifd: Make MultiFDSendData a struct

 docs/devel/migration/vfio.rst      |  79 ++-
 hw/core/machine.c                  |   2 +
 hw/vfio/meson.build                |   1 +
 hw/vfio/migration-multifd.c        | 786 +++++++++++++++++++++++++++++
 hw/vfio/migration-multifd.h        |  37 ++
 hw/vfio/migration.c                | 111 ++--
 hw/vfio/pci.c                      |  40 ++
 hw/vfio/trace-events               |  13 +-
 include/block/aio.h                |   8 +-
 include/block/thread-pool.h        |  62 ++-
 include/hw/vfio/vfio-common.h      |  34 ++
 include/migration/client-options.h |   4 +
 include/migration/misc.h           |  25 +
 include/migration/register.h       |  52 +-
 include/qapi/error.h               |   2 +
 include/qemu/typedefs.h            |   5 +
 migration/colo.c                   |   3 +
 migration/meson.build              |   1 +
 migration/migration-hmp-cmds.c     |   2 +
 migration/migration.c              |  20 +-
 migration/migration.h              |   7 +
 migration/multifd-device-state.c   | 212 ++++++++
 migration/multifd-nocomp.c         |  30 +-
 migration/multifd.c                | 248 +++++++--
 migration/multifd.h                |  74 ++-
 migration/options.c                |   9 +
 migration/qemu-file.h              |   2 +
 migration/savevm.c                 | 201 +++++++-
 migration/savevm.h                 |   6 +-
 migration/trace-events             |   1 +
 scripts/analyze-migration.py       |  11 +
 tests/unit/test-thread-pool.c      |   6 +-
 util/async.c                       |   6 +-
 util/thread-pool.c                 | 184 +++++--
 util/trace-events                  |   6 +-
 35 files changed, 2125 insertions(+), 165 deletions(-)
 create mode 100644 hw/vfio/migration-multifd.c
 create mode 100644 hw/vfio/migration-multifd.h
 create mode 100644 migration/multifd-device-state.c
Re: [PATCH v6 00/36] Multifd 🔀 device state transfer support with VFIO consumer
Posted by Cédric Le Goater 3 weeks, 5 days ago
Hello,

On 3/4/25 23:03, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
> 
> This is an updated v6 patch series of the v5 series located here:
> https://lore.kernel.org/qemu-devel/cover.1739994627.git.maciej.szmigiero@oracle.com/
> 
> What this patch set is about?
> Current live migration device state transfer is done via the main (single)
> migration channel, which reduces performance and severally impacts the
> migration downtime for VMs having large device state that needs to be
> transferred during the switchover phase.
> 
> Example devices that have such large switchover phase device state are some
> types of VFIO SmartNICs and GPUs.
> 
> This patch set allows parallelizing this transfer by using multifd channels
> for it.
> It also introduces new load and save threads per VFIO device for decoupling
> these operations from the main migration thread.
> These threads run on newly introduced generic (non-AIO) thread pools,
> instantiated by the core migration core.

I think we are ready to apply 1-33. Avihai, please take a look!

7, 15 and 17 still need an Ack from Peter and/or Fabiano though.

34 can be reworked a bit before -rc0.
35 is for QEMU 10.1.
36 needs some massaging. I will do that.

This can go through the vfio tree if everyone agrees.

Thanks,

C.




Re: [PATCH v6 00/36] Multifd 🔀 device state transfer support with VFIO consumer
Posted by Cédric Le Goater 3 weeks, 5 days ago
On 3/5/25 10:29, Cédric Le Goater wrote:
> Hello,
> 
> On 3/4/25 23:03, Maciej S. Szmigiero wrote:
>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>
>> This is an updated v6 patch series of the v5 series located here:
>> https://lore.kernel.org/qemu-devel/cover.1739994627.git.maciej.szmigiero@oracle.com/
>>
>> What this patch set is about?
>> Current live migration device state transfer is done via the main (single)
>> migration channel, which reduces performance and severally impacts the
>> migration downtime for VMs having large device state that needs to be
>> transferred during the switchover phase.
>>
>> Example devices that have such large switchover phase device state are some
>> types of VFIO SmartNICs and GPUs.
>>
>> This patch set allows parallelizing this transfer by using multifd channels
>> for it.
>> It also introduces new load and save threads per VFIO device for decoupling
>> these operations from the main migration thread.
>> These threads run on newly introduced generic (non-AIO) thread pools,
>> instantiated by the core migration core.
> 
> I think we are ready to apply 1-33. Avihai, please take a look !

Applied to vfio-next with changes for documentation.

Avihai, I will wait for your input before sending a PR.

Thanks,

C.



Re: [PATCH v6 00/36] Multifd 🔀 device state transfer support with VFIO consumer
Posted by Avihai Horon 3 weeks, 5 days ago
On 05/03/2025 19:45, Cédric Le Goater wrote:
> On 3/5/25 10:29, Cédric Le Goater wrote:
>> Hello,
>>
>> On 3/4/25 23:03, Maciej S. Szmigiero wrote:
>>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>>
>>> This is an updated v6 patch series of the v5 series located here:
>>> https://lore.kernel.org/qemu-devel/cover.1739994627.git.maciej.szmigiero@oracle.com/ 
>>>
>>>
>>> What this patch set is about?
>>> Current live migration device state transfer is done via the main 
>>> (single)
>>> migration channel, which reduces performance and severally impacts the
>>> migration downtime for VMs having large device state that needs to be
>>> transferred during the switchover phase.
>>>
>>> Example devices that have such large switchover phase device state 
>>> are some
>>> types of VFIO SmartNICs and GPUs.
>>>
>>> This patch set allows parallelizing this transfer by using multifd 
>>> channels
>>> for it.
>>> It also introduces new load and save threads per VFIO device for 
>>> decoupling
>>> these operations from the main migration thread.
>>> These threads run on newly introduced generic (non-AIO) thread pools,
>>> instantiated by the core migration core.
>>
>> I think we are ready to apply 1-33. Avihai, please take a look !
>
> Applied to vfio-next with changes for documentation.
>
> Avihai, I will wait for your input before sending a PR.

Other than the comment I left, everything looks fine to me.

Thanks.


Re: [PATCH v6 00/36] Multifd 🔀 device state transfer support with VFIO consumer
Posted by Avihai Horon 3 weeks, 5 days ago
On 05/03/2025 11:29, Cédric Le Goater wrote:
> Hello,
>
> On 3/4/25 23:03, Maciej S. Szmigiero wrote:
>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>
>> This is an updated v6 patch series of the v5 series located here:
>> https://lore.kernel.org/qemu-devel/cover.1739994627.git.maciej.szmigiero@oracle.com/ 
>>
>>
>> What this patch set is about?
>> Current live migration device state transfer is done via the main 
>> (single)
>> migration channel, which reduces performance and severally impacts the
>> migration downtime for VMs having large device state that needs to be
>> transferred during the switchover phase.
>>
>> Example devices that have such large switchover phase device state 
>> are some
>> types of VFIO SmartNICs and GPUs.
>>
>> This patch set allows parallelizing this transfer by using multifd 
>> channels
>> for it.
>> It also introduces new load and save threads per VFIO device for 
>> decoupling
>> these operations from the main migration thread.
>> These threads run on newly introduced generic (non-AIO) thread pools,
>> instantiated by the core migration core.
>
> I think we are ready to apply 1-33. Avihai, please take a look !

Sure, will try to do it by EOW.

When were you planning to apply?

>
> 7,15 and 17 still need an Ack from Peter and/or Fabiano though.
>
> 34 can be reworked a bit before -rc0.
> 35 is for QEMU 10.1.
> 36 needs some massaging. I will do that.
>
> This can go through the vfio tree if everyone agrees.
>
> Thanks,
>
> C.

Re: [PATCH v6 00/36] Multifd 🔀 device state transfer support with VFIO consumer
Posted by Cédric Le Goater 3 weeks, 5 days ago
On 3/5/25 10:33, Avihai Horon wrote:
> 
> On 05/03/2025 11:29, Cédric Le Goater wrote:
>> Hello,
>>
>> On 3/4/25 23:03, Maciej S. Szmigiero wrote:
>>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>>
>>> This is an updated v6 patch series of the v5 series located here:
>>> https://lore.kernel.org/qemu-devel/cover.1739994627.git.maciej.szmigiero@oracle.com/
>>>
>>> What is this patch set about?
>>> Current live migration device state transfer is done via the main (single)
>>> migration channel, which reduces performance and severely impacts the
>>> migration downtime for VMs having large device state that needs to be
>>> transferred during the switchover phase.
>>>
>>> Example devices that have such large switchover phase device state are some
>>> types of VFIO SmartNICs and GPUs.
>>>
>>> This patch set allows parallelizing this transfer by using multifd channels
>>> for it.
>>> It also introduces new load and save threads per VFIO device for decoupling
>>> these operations from the main migration thread.
>>> These threads run on newly introduced generic (non-AIO) thread pools,
>>> instantiated by the migration core.
>>
>> I think we are ready to apply 1-33. Avihai, please take a look !
> 
> Sure, will try to do it by EOW.

Thanks,

> When were you planning to apply?

before EOW :)

C.


Re: [PATCH v6 00/36] Multifd 🔀 device state transfer support with VFIO consumer
Posted by Avihai Horon 3 weeks, 5 days ago
On 05/03/2025 11:35, Cédric Le Goater wrote:
> On 3/5/25 10:33, Avihai Horon wrote:
>>
>> On 05/03/2025 11:29, Cédric Le Goater wrote:
>>> Hello,
>>>
>>> On 3/4/25 23:03, Maciej S. Szmigiero wrote:
>>>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>>>
>>>> This is an updated v6 patch series of the v5 series located here:
>>>> https://lore.kernel.org/qemu-devel/cover.1739994627.git.maciej.szmigiero@oracle.com/
>>>>
>>>> What is this patch set about?
>>>> Current live migration device state transfer is done via the main (single)
>>>> migration channel, which reduces performance and severely impacts the
>>>> migration downtime for VMs having large device state that needs to be
>>>> transferred during the switchover phase.
>>>>
>>>> Example devices that have such large switchover phase device state are some
>>>> types of VFIO SmartNICs and GPUs.
>>>>
>>>> This patch set allows parallelizing this transfer by using multifd channels
>>>> for it.
>>>> It also introduces new load and save threads per VFIO device for decoupling
>>>> these operations from the main migration thread.
>>>> These threads run on newly introduced generic (non-AIO) thread pools,
>>>> instantiated by the migration core.
>>>
>>> I think we are ready to apply 1-33. Avihai, please take a look !
>>
>> Sure, will try to do it by EOW.
>
> Thanks,
>
>> When were you planning to apply?
>
> before EOW :)

Hehe, OK will go over it today/tomorrow.

Thanks.