Hello,
Changes from v1 [2]:
* Rebased on top of master.
* Added a new patch that runs a final save_query_pending at switchover
(replaces RAM-specific code). (Peter)
* Dropped patches #1 and #2 as they were already taken by Cedric (in
one form or another).
* Patch #3: Used Error variable of migration_completion() instead of
adding a new one in migration_completion_precopy(). (Cedric, Peter)
* Patch #6: Dropped renaming of qemu_loadvm_approve_switchover to legacy
since the next patch renames it back again. (Peter)
* Patch #7:
- Rephrased switchover-ack QAPI documentation to be more general and
not mention number of ACKs sent. (Peter)
- Dropped the new request_switchover_ack SaveVMHandler and instead
requested switchover ACKs via save_query_pending SaveVMHandler.
(Peter)
- Added documentation comment for legacy/new switchover-ack. (Peter)
* Patch #8: Adjusted according to changes in patch #7.
* Dropped patch #9 since the scenario it was meant to solve can't
really happen (kept only the part that extracts sending of
INIT_DATA_SENT flag to a helper). (Peter, Cedric)
* Patch #10: Used error_prepend() to add error prefix in
vfio_migration_init(). (Cedric)
* Patch #11-13: Adjusted according to changes in patch #7.
* Patch #12: Added explanation about VFIO_PRECOPY_INFO_REINIT and an
example to vfio migration documentation. (Peter)
* Collected R-bs.
===
This series adds support for the kernel VFIO_PRECOPY_INFO_REINIT feature
[1], which can reduce migration downtime in some VFIO precopy scenarios
(see the Tests section below). Supporting it requires refactoring
switchover-ack and making it re-usable; that work comes first.
=== Background ===
Switchover-ack is a mechanism to synchronize between source and
destination QEMU during migration to prevent the source from switching
over prematurely.
VFIO uses switchover-ack to ensure that switchover happens only after
the destination side has loaded the precopy initial bytes. This matters
for VFIO because switching over earlier could increase downtime.
In its current state, switchover-ack is a one-time mechanism:
switchover is acked only once, and another ACK cannot be requested
afterwards. This was sufficient until now, because VFIO precopy initial
bytes were defined to be monotonically decreasing. Thus, once precopy
initial bytes reached zero for all VFIO devices, a single ACK was sent
and remained valid.
=== Problem ===
Now the new VFIO_PRECOPY_INFO_REINIT feature allows precopy initial
bytes to be re-initialized during precopy. Specifically, it means that
initial bytes can grow after reaching zero, which would invalidate a
previously sent switchover ACK.
=== Solution ===
To support VFIO_PRECOPY_INFO_REINIT, this series makes switchover-ack
re-usable and allows devices to request another switchover ACK when
needed.
In the new mechanism, a switchover ACK can be requested for a specific
device and at different times, so switchover ACK is changed to be
per-device (instead of a single ACK for all devices), and the source
side now does the accounting of pending ACKs (instead of the
destination side).
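For illustration only, the source-side accounting described above can
be modeled roughly as follows. This is a minimal sketch, not code from
the series: SwitchoverAckState, request_switchover_ack(),
receive_switchover_ack(), switchover_allowed() and MAX_DEVICES are
hypothetical names, and the actual implementation may track pending
ACKs differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not QEMU code: one pending flag per device. */
#define MAX_DEVICES 8

typedef struct {
    bool ack_pending[MAX_DEVICES]; /* true while a device awaits an ACK */
    size_t num_pending;            /* number of true entries above */
} SwitchoverAckState;

/*
 * A device requests a switchover ACK. Requesting again while an ACK is
 * already pending for the same device does not change the count.
 */
static void request_switchover_ack(SwitchoverAckState *s, size_t dev)
{
    assert(dev < MAX_DEVICES);
    if (!s->ack_pending[dev]) {
        s->ack_pending[dev] = true;
        s->num_pending++;
    }
}

/* The destination acked this device; clear its pending request, if any. */
static void receive_switchover_ack(SwitchoverAckState *s, size_t dev)
{
    assert(dev < MAX_DEVICES);
    if (s->ack_pending[dev]) {
        s->ack_pending[dev] = false;
        s->num_pending--;
    }
}

/* The source may switch over only when no device has a pending ACK. */
static bool switchover_allowed(const SwitchoverAckState *s)
{
    return s->num_pending == 0;
}
```

In this model, a device whose initial bytes grow again after it was
acked calls request_switchover_ack() once more, which blocks switchover
until the destination acks that device again.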
The old (legacy) switchover-ack mechanism is kept for backward
compatibility and is turned on by a compatibility property for older
machines.
With that infrastructure in place, VFIO uses the new switchover-ack path
and implements VFIO_PRECOPY_INFO_REINIT.
=== Tests ===
Functional and performance tests were run.
Performance tests were done by migrating a single VM with:
* 8 GB RAM
* 4 mlx5 VFIO devices:
- One device with 1 GB of device data (stopcopy data) that runs a
workload during precopy, so that VFIO_PRECOPY_INFO_REINIT is exercised
(new initial_bytes chunks are generated during precopy).
- The other 3 devices are idle.
In this setup, VFIO_PRECOPY_INFO_REINIT reduced total migration downtime by
about 43%, and downtime attributed to the busy VFIO device by about 67%:
With VFIO_PRECOPY_INFO_REINIT:
1335ms total (~520ms from the VFIO device running the workload).
Without VFIO_PRECOPY_INFO_REINIT:
2352ms total (~1600ms from the VFIO device running the workload).
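The percentages above follow directly from the measured downtimes. As a
quick check, a small helper (reduction_percent() is a hypothetical name
used only for this illustration) computes the relative reduction:

```c
/*
 * Relative reduction, in percent, when going from before_ms to after_ms.
 * Illustration only; before_ms must be non-zero.
 */
static double reduction_percent(double before_ms, double after_ms)
{
    return (before_ms - after_ms) / before_ms * 100.0;
}
```

reduction_percent(2352, 1335) gives roughly 43.2 for the total downtime,
and reduction_percent(1600, 520) gives 67.5 for the device running the
workload (the per-device figures are themselves approximate).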
Functional tests covered the main code paths and combinations, including
legacy and new switchover-ack across versions, for example:
* Migration between QEMU 11.0 (old binary) and 11.1 (new binary).
* Migration between two 11.1 binaries with different machine versions.
* Migration when the VFIO device has no VFIO_PRECOPY_INFO_REINIT.
* Migration when the VFIO device has no VFIO precopy support.
=== Patch Breakdown ===
* Patches 1-7,13: Migration cleanups and the new switchover-ack mechanism
* Patches 8-12: VFIO cleanups and VFIO_PRECOPY_INFO_REINIT
Thanks.
[1] https://lore.kernel.org/all/20260317161753.18964-1-yishaih@nvidia.com/
[2] https://lore.kernel.org/qemu-devel/20260505081423.28326-1-avihaih@nvidia.com/
Avihai Horon (13):
migration: Propagate errors in migration_completion_precopy()
migration: Run final save_query_pending at switchover
migration: Log the approver in qemu_loadvm_approve_switchover()
migration: Replace switchover_ack_needed SaveVMHandler
migration: Rename switchover-ack code to legacy
migration: Make switchover-ack re-usable
migration: Fail migration if switchover-ack is requested after
switchover decision
vfio/migration: Extract VFIO_MIG_FLAG_DEV_INIT_DATA_SENT sending to
helper
vfio/migration: Add Error ** parameter to vfio_migration_init()
vfio/migration: Add new switchover-ack mechanism
vfio/migration: Implement VFIO_PRECOPY_INFO_REINIT feature
vfio/migration: Check VFIO_PRECOPY_INFO_REINIT during switchover
migration: Enable new switchover-ack
docs/devel/migration/vfio.rst | 17 ++-
qapi/migration.json | 16 ++-
hw/vfio/vfio-migration-internal.h | 2 +
include/migration/client-options.h | 1 +
include/migration/misc.h | 2 +
include/migration/register.h | 56 ++++-----
migration/migration.h | 34 +++++-
migration/savevm.h | 7 +-
hw/core/machine.c | 1 +
hw/s390x/s390-stattrib.c | 9 +-
hw/vfio/migration.c | 189 +++++++++++++++++++++++------
migration/block-dirty-bitmap.c | 6 +-
migration/migration.c | 92 ++++++++++++--
migration/options.c | 9 ++
migration/ram.c | 37 +++---
migration/savevm.c | 134 +++++++++++++-------
hw/vfio/trace-events | 7 +-
migration/trace-events | 9 +-
18 files changed, 461 insertions(+), 167 deletions(-)
--
2.40.1