Pre-copy support allows the VFIO device data to be transferred while the
VM is running. This helps to accommodate VFIO devices that have a large
amount of data that needs to be transferred, and it can reduce migration
downtime.

Pre-copy support is optional in VFIO migration protocol v2.
Implement pre-copy of VFIO migration protocol v2 and use it for devices
that support it. A full description of it can be found in the following
Linux commit: 4db52602a607 ("vfio: Extend the device migration protocol
with PRE_COPY").

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
docs/devel/vfio-migration.rst | 35 +++++---
include/hw/vfio/vfio-common.h | 2 +
hw/vfio/common.c | 6 +-
hw/vfio/migration.c | 165 ++++++++++++++++++++++++++++++++--
hw/vfio/trace-events | 4 +-
5 files changed, 190 insertions(+), 22 deletions(-)
diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
index 1b68ccf115..e896b2a673 100644
--- a/docs/devel/vfio-migration.rst
+++ b/docs/devel/vfio-migration.rst
@@ -7,12 +7,14 @@ the guest is running on source host and restoring this saved state on the
destination host. This document details how saving and restoring of VFIO
devices is done in QEMU.
-Migration of VFIO devices currently consists of a single stop-and-copy phase.
-During the stop-and-copy phase the guest is stopped and the entire VFIO device
-data is transferred to the destination.
-
-The pre-copy phase of migration is currently not supported for VFIO devices.
-Support for VFIO pre-copy will be added later on.
+Migration of VFIO devices consists of two phases: the optional pre-copy phase,
+and the stop-and-copy phase. The pre-copy phase is iterative and makes it
+possible to accommodate VFIO devices that have a large amount of data to
+transfer. The iterative pre-copy phase of migration allows the guest to
+continue running whilst the VFIO device state is transferred to the
+destination, which helps to reduce the total downtime of the VM. VFIO devices
+opt in to pre-copy support by reporting the VFIO_MIGRATION_PRE_COPY flag in
+the VFIO_DEVICE_FEATURE_MIGRATION ioctl.
Note that currently VFIO migration is supported only for a single device. This
is due to VFIO migration's lack of P2P support. However, P2P support is planned
@@ -29,10 +31,20 @@ VFIO implements the device hooks for the iterative approach as follows:
* A ``load_setup`` function that sets the VFIO device on the destination in
_RESUMING state.
+* A ``state_pending_estimate`` function that reports an estimate of the
+ remaining pre-copy data that the vendor driver has yet to save for the VFIO
+ device.
+
* A ``state_pending_exact`` function that reads pending_bytes from the vendor
driver, which indicates the amount of data that the vendor driver has yet to
save for the VFIO device.
+* An ``is_active_iterate`` function that indicates ``save_live_iterate`` is
+ active only when the VFIO device is in pre-copy states.
+
+* A ``save_live_iterate`` function that reads the VFIO device's data from the
+ vendor driver during the iterative pre-copy phase.
+
* A ``save_state`` function to save the device config space if it is present.
* A ``save_live_complete_precopy`` function that sets the VFIO device in
@@ -111,8 +123,10 @@ Flow of state changes during Live migration
===========================================
Below is the flow of state change during live migration.
-The values in the brackets represent the VM state, the migration state, and
+The values in the parentheses represent the VM state, the migration state, and
the VFIO device state, respectively.
+The text in the square brackets represents the flow if the VFIO device supports
+pre-copy.
Live migration save path
------------------------
@@ -124,11 +138,12 @@ Live migration save path
|
migrate_init spawns migration_thread
Migration thread then calls each device's .save_setup()
- (RUNNING, _SETUP, _RUNNING)
+ (RUNNING, _SETUP, _RUNNING [_PRE_COPY])
|
- (RUNNING, _ACTIVE, _RUNNING)
- If device is active, get pending_bytes by .state_pending_exact()
+ (RUNNING, _ACTIVE, _RUNNING [_PRE_COPY])
+ If device is active, get pending_bytes by .state_pending_{estimate,exact}()
If total pending_bytes >= threshold_size, call .save_live_iterate()
+ [Data of VFIO device for pre-copy phase is copied]
Iterate till total pending bytes converge and are less than threshold
|
On migration completion, vCPU stops and calls .save_live_complete_precopy for
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 5f29dab839..1db901c194 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -67,6 +67,8 @@ typedef struct VFIOMigration {
void *data_buffer;
size_t data_buffer_size;
uint64_t mig_flags;
+ uint64_t precopy_init_size;
+ uint64_t precopy_dirty_size;
} VFIOMigration;
typedef struct VFIOAddressSpace {
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 78358ede27..b73086e17a 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -492,7 +492,8 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
}
if (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF &&
- migration->device_state == VFIO_DEVICE_STATE_RUNNING) {
+ (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
+ migration->device_state == VFIO_DEVICE_STATE_PRE_COPY)) {
return false;
}
}
@@ -537,7 +538,8 @@ static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
return false;
}
- if (migration->device_state == VFIO_DEVICE_STATE_RUNNING) {
+ if (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
+ migration->device_state == VFIO_DEVICE_STATE_PRE_COPY) {
continue;
} else {
return false;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 8d33414379..d8f6a22ae1 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -68,6 +68,8 @@ static const char *mig_state_to_str(enum vfio_device_mig_state state)
return "STOP_COPY";
case VFIO_DEVICE_STATE_RESUMING:
return "RESUMING";
+ case VFIO_DEVICE_STATE_PRE_COPY:
+ return "PRE_COPY";
default:
return "UNKNOWN STATE";
}
@@ -241,6 +243,25 @@ static int vfio_query_stop_copy_size(VFIODevice *vbasedev,
return 0;
}
+static int vfio_query_precopy_size(VFIOMigration *migration)
+{
+ struct vfio_precopy_info precopy = {
+ .argsz = sizeof(precopy),
+ };
+
+ migration->precopy_init_size = 0;
+ migration->precopy_dirty_size = 0;
+
+ if (ioctl(migration->data_fd, VFIO_MIG_GET_PRECOPY_INFO, &precopy)) {
+ return -errno;
+ }
+
+ migration->precopy_init_size = precopy.initial_bytes;
+ migration->precopy_dirty_size = precopy.dirty_bytes;
+
+ return 0;
+}
+
/* Returns the size of saved data on success and -errno on error */
static ssize_t vfio_save_block(QEMUFile *f, VFIOMigration *migration)
{
@@ -249,6 +270,14 @@ static ssize_t vfio_save_block(QEMUFile *f, VFIOMigration *migration)
data_size = read(migration->data_fd, migration->data_buffer,
migration->data_buffer_size);
if (data_size < 0) {
+ /*
+ * Pre-copy emptied all the device state for now. For more information,
+ * please refer to the Linux kernel VFIO uAPI.
+ */
+ if (errno == ENOMSG) {
+ return 0;
+ }
+
return -errno;
}
if (data_size == 0) {
@@ -265,6 +294,38 @@ static ssize_t vfio_save_block(QEMUFile *f, VFIOMigration *migration)
return qemu_file_get_error(f) ?: data_size;
}
+static void vfio_update_estimated_pending_data(VFIOMigration *migration,
+ uint64_t data_size)
+{
+ if (!data_size) {
+ /*
+ * Pre-copy emptied all the device state for now, update estimated sizes
+ * accordingly.
+ */
+ migration->precopy_init_size = 0;
+ migration->precopy_dirty_size = 0;
+
+ return;
+ }
+
+ if (migration->precopy_init_size) {
+ uint64_t init_size = MIN(migration->precopy_init_size, data_size);
+
+ migration->precopy_init_size -= init_size;
+ data_size -= init_size;
+ }
+
+ migration->precopy_dirty_size -= MIN(migration->precopy_dirty_size,
+ data_size);
+}
+
+static bool vfio_precopy_supported(VFIODevice *vbasedev)
+{
+ VFIOMigration *migration = vbasedev->migration;
+
+ return migration->mig_flags & VFIO_MIGRATION_PRE_COPY;
+}
+
/* ---------------------------------------------------------------------- */
static int vfio_save_setup(QEMUFile *f, void *opaque)
@@ -285,6 +346,28 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
return -ENOMEM;
}
+ if (vfio_precopy_supported(vbasedev)) {
+ int ret;
+
+ switch (migration->device_state) {
+ case VFIO_DEVICE_STATE_RUNNING:
+ ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_PRE_COPY,
+ VFIO_DEVICE_STATE_RUNNING);
+ if (ret) {
+ return ret;
+ }
+
+ vfio_query_precopy_size(migration);
+
+ break;
+ case VFIO_DEVICE_STATE_STOP:
+ /* vfio_save_complete_precopy() will go to STOP_COPY */
+ break;
+ default:
+ return -EINVAL;
+ }
+ }
+
trace_vfio_save_setup(vbasedev->name, migration->data_buffer_size);
qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
@@ -299,26 +382,42 @@ static void vfio_save_cleanup(void *opaque)
g_free(migration->data_buffer);
migration->data_buffer = NULL;
+ migration->precopy_init_size = 0;
+ migration->precopy_dirty_size = 0;
vfio_migration_cleanup(vbasedev);
trace_vfio_save_cleanup(vbasedev->name);
}
+static void vfio_state_pending_estimate(void *opaque, uint64_t *must_precopy,
+ uint64_t *can_postcopy)
+{
+ VFIODevice *vbasedev = opaque;
+ VFIOMigration *migration = vbasedev->migration;
+
+ if (migration->device_state != VFIO_DEVICE_STATE_PRE_COPY) {
+ return;
+ }
+
+ *must_precopy +=
+ migration->precopy_init_size + migration->precopy_dirty_size;
+
+ trace_vfio_state_pending_estimate(vbasedev->name, *must_precopy,
+ *can_postcopy,
+ migration->precopy_init_size,
+ migration->precopy_dirty_size);
+}
+
/*
* Migration size of VFIO devices can be as little as a few KBs or as big as
* many GBs. This value should be big enough to cover the worst case.
*/
#define VFIO_MIG_STOP_COPY_SIZE (100 * GiB)
-/*
- * Only exact function is implemented and not estimate function. The reason is
- * that during pre-copy phase of migration the estimate function is called
- * repeatedly while pending RAM size is over the threshold, thus migration
- * can't converge and querying the VFIO device pending data size is useless.
- */
static void vfio_state_pending_exact(void *opaque, uint64_t *must_precopy,
uint64_t *can_postcopy)
{
VFIODevice *vbasedev = opaque;
+ VFIOMigration *migration = vbasedev->migration;
uint64_t stop_copy_size = VFIO_MIG_STOP_COPY_SIZE;
/*
@@ -328,8 +427,48 @@ static void vfio_state_pending_exact(void *opaque, uint64_t *must_precopy,
vfio_query_stop_copy_size(vbasedev, &stop_copy_size);
*must_precopy += stop_copy_size;
+ if (migration->device_state == VFIO_DEVICE_STATE_PRE_COPY) {
+ vfio_query_precopy_size(migration);
+
+ *must_precopy +=
+ migration->precopy_init_size + migration->precopy_dirty_size;
+ }
+
trace_vfio_state_pending_exact(vbasedev->name, *must_precopy, *can_postcopy,
- stop_copy_size);
+ stop_copy_size, migration->precopy_init_size,
+ migration->precopy_dirty_size);
+}
+
+static bool vfio_is_active_iterate(void *opaque)
+{
+ VFIODevice *vbasedev = opaque;
+ VFIOMigration *migration = vbasedev->migration;
+
+ return migration->device_state == VFIO_DEVICE_STATE_PRE_COPY;
+}
+
+static int vfio_save_iterate(QEMUFile *f, void *opaque)
+{
+ VFIODevice *vbasedev = opaque;
+ VFIOMigration *migration = vbasedev->migration;
+ ssize_t data_size;
+
+ data_size = vfio_save_block(f, migration);
+ if (data_size < 0) {
+ return data_size;
+ }
+ qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
+
+ vfio_update_estimated_pending_data(migration, data_size);
+
+ trace_vfio_save_iterate(vbasedev->name, migration->precopy_init_size,
+ migration->precopy_dirty_size);
+
+ /*
+ * A VFIO device's pre-copy dirty_bytes is not guaranteed to reach zero.
+ * Return 1 so following handlers will not be potentially blocked.
+ */
+ return 1;
}
static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
@@ -338,7 +477,7 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
ssize_t data_size;
int ret;
- /* We reach here with device state STOP only */
+ /* We reach here with device state STOP or STOP_COPY only */
ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY,
VFIO_DEVICE_STATE_STOP);
if (ret) {
@@ -457,7 +596,10 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
static const SaveVMHandlers savevm_vfio_handlers = {
.save_setup = vfio_save_setup,
.save_cleanup = vfio_save_cleanup,
+ .state_pending_estimate = vfio_state_pending_estimate,
.state_pending_exact = vfio_state_pending_exact,
+ .is_active_iterate = vfio_is_active_iterate,
+ .save_live_iterate = vfio_save_iterate,
.save_live_complete_precopy = vfio_save_complete_precopy,
.save_state = vfio_save_state,
.load_setup = vfio_load_setup,
@@ -470,13 +612,18 @@ static const SaveVMHandlers savevm_vfio_handlers = {
static void vfio_vmstate_change(void *opaque, bool running, RunState state)
{
VFIODevice *vbasedev = opaque;
+ VFIOMigration *migration = vbasedev->migration;
enum vfio_device_mig_state new_state;
int ret;
if (running) {
new_state = VFIO_DEVICE_STATE_RUNNING;
} else {
- new_state = VFIO_DEVICE_STATE_STOP;
+ new_state =
+ (migration->device_state == VFIO_DEVICE_STATE_PRE_COPY &&
+ (state == RUN_STATE_FINISH_MIGRATE || state == RUN_STATE_PAUSED)) ?
+ VFIO_DEVICE_STATE_STOP_COPY :
+ VFIO_DEVICE_STATE_STOP;
}
/*
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 646e42fd27..548f9488a7 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -162,6 +162,8 @@ vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
vfio_save_cleanup(const char *name) " (%s)"
vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d"
vfio_save_device_config_state(const char *name) " (%s)"
+vfio_save_iterate(const char *name, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer size 0x%"PRIx64
-vfio_state_pending_exact(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t stopcopy_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" stopcopy size 0x%"PRIx64
+vfio_state_pending_estimate(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
+vfio_state_pending_exact(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" stopcopy size 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
--
2.26.3
On 5/28/23 16:06, Avihai Horon wrote:
> Pre-copy support allows the VFIO device data to be transferred while the
> VM is running. This helps to accommodate VFIO devices that have a large
> amount of data that needs to be transferred, and it can reduce migration
> downtime.
>
> Pre-copy support is optional in VFIO migration protocol v2.
> Implement pre-copy of VFIO migration protocol v2 and use it for devices
> that support it. Full description of it can be found in the following
> Linux commit: 4db52602a607 ("vfio: Extend the device migration protocol
> with PRE_COPY").
>
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
LGTM,

Reviewed-by: Cédric Le Goater <clg@redhat.com>

one minor issue below,
> [...]
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 646e42fd27..548f9488a7 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -162,6 +162,8 @@ vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
> vfio_save_cleanup(const char *name) " (%s)"
> vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d"
> vfio_save_device_config_state(const char *name) " (%s)"
> +vfio_save_iterate(const char *name, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64"
The extra '"' at the end breaks compilation. No need to resend just for that.
It can be fixed.
Thanks,
C.
> vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer size 0x%"PRIx64
> -vfio_state_pending_exact(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t stopcopy_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" stopcopy size 0x%"PRIx64
> +vfio_state_pending_estimate(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
> +vfio_state_pending_exact(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" stopcopy size 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
> vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
On 30/05/2023 12:28, Cédric Le Goater wrote:
> On 5/28/23 16:06, Avihai Horon wrote:
>> Pre-copy support allows the VFIO device data to be transferred while the
>> VM is running. This helps to accommodate VFIO devices that have a large
>> amount of data that needs to be transferred, and it can reduce migration
>> downtime.
>>
>> Pre-copy support is optional in VFIO migration protocol v2.
>> Implement pre-copy of VFIO migration protocol v2 and use it for devices
>> that support it. Full description of it can be found in the following
>> Linux commit: 4db52602a607 ("vfio: Extend the device migration protocol
>> with PRE_COPY").
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>
> LGTM,
>
> Reviewed-by: Cédric Le Goater <clg@redhat.com>
>
> one minor issue below,
>
>> ---
>> docs/devel/vfio-migration.rst | 35 +++++---
>> include/hw/vfio/vfio-common.h | 2 +
>> hw/vfio/common.c | 6 +-
>> hw/vfio/migration.c | 165 ++++++++++++++++++++++++++++++++--
>> hw/vfio/trace-events | 4 +-
>> 5 files changed, 190 insertions(+), 22 deletions(-)
>>
>> diff --git a/docs/devel/vfio-migration.rst
>> b/docs/devel/vfio-migration.rst
>> index 1b68ccf115..e896b2a673 100644
>> --- a/docs/devel/vfio-migration.rst
>> +++ b/docs/devel/vfio-migration.rst
>> @@ -7,12 +7,14 @@ the guest is running on source host and restoring this saved state on the
>> destination host. This document details how saving and restoring of VFIO
>> devices is done in QEMU.
>>
>> -Migration of VFIO devices currently consists of a single stop-and-copy phase.
>> -During the stop-and-copy phase the guest is stopped and the entire VFIO device
>> -data is transferred to the destination.
>> -
>> -The pre-copy phase of migration is currently not supported for VFIO devices.
>> -Support for VFIO pre-copy will be added later on.
>> +Migration of VFIO devices consists of two phases: the optional pre-copy phase,
>> +and the stop-and-copy phase. The pre-copy phase is iterative and allows to
>> +accommodate VFIO devices that have a large amount of data that needs to be
>> +transferred. The iterative pre-copy phase of migration allows for the guest to
>> +continue whilst the VFIO device state is transferred to the destination, this
>> +helps to reduce the total downtime of the VM. VFIO devices opt-in to pre-copy
>> +support by reporting the VFIO_MIGRATION_PRE_COPY flag in the
>> +VFIO_DEVICE_FEATURE_MIGRATION ioctl.
>>
>> Note that currently VFIO migration is supported only for a single device. This
>> is due to VFIO migration's lack of P2P support. However, P2P support is planned
>> @@ -29,10 +31,20 @@ VFIO implements the device hooks for the iterative approach as follows:
>> * A ``load_setup`` function that sets the VFIO device on the destination in
>> _RESUMING state.
>>
>> +* A ``state_pending_estimate`` function that reports an estimate of the
>> + remaining pre-copy data that the vendor driver has yet to save for the VFIO
>> + device.
>> +
>> * A ``state_pending_exact`` function that reads pending_bytes from the vendor
>> driver, which indicates the amount of data that the vendor driver has yet to
>> save for the VFIO device.
>>
>> +* An ``is_active_iterate`` function that indicates ``save_live_iterate`` is
>> + active only when the VFIO device is in pre-copy states.
>> +
>> +* A ``save_live_iterate`` function that reads the VFIO device's data from the
>> + vendor driver during iterative pre-copy phase.
>> +
>> * A ``save_state`` function to save the device config space if it is present.
>>
>> * A ``save_live_complete_precopy`` function that sets the VFIO device in
>> @@ -111,8 +123,10 @@ Flow of state changes during Live migration
>> ===========================================
>>
>> Below is the flow of state change during live migration.
>> -The values in the brackets represent the VM state, the migration state, and
>> +The values in the parentheses represent the VM state, the migration state, and
>> the VFIO device state, respectively.
>> +The text in the square brackets represents the flow if the VFIO device supports
>> +pre-copy.
>>
>> Live migration save path
>> ------------------------
>> @@ -124,11 +138,12 @@ Live migration save path
>> |
>> migrate_init spawns migration_thread
>> Migration thread then calls each device's .save_setup()
>> - (RUNNING, _SETUP, _RUNNING)
>> + (RUNNING, _SETUP, _RUNNING [_PRE_COPY])
>> |
>> - (RUNNING, _ACTIVE, _RUNNING)
>> - If device is active, get pending_bytes by .state_pending_exact()
>> + (RUNNING, _ACTIVE, _RUNNING [_PRE_COPY])
>> + If device is active, get pending_bytes by .state_pending_{estimate,exact}()
>> If total pending_bytes >= threshold_size, call .save_live_iterate()
>> + [Data of VFIO device for pre-copy phase is copied]
>> Iterate till total pending bytes converge and are less than threshold
>> |
>> On migration completion, vCPU stops and calls .save_live_complete_precopy for
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 5f29dab839..1db901c194 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -67,6 +67,8 @@ typedef struct VFIOMigration {
>> void *data_buffer;
>> size_t data_buffer_size;
>> uint64_t mig_flags;
>> + uint64_t precopy_init_size;
>> + uint64_t precopy_dirty_size;
>> } VFIOMigration;
>>
>> typedef struct VFIOAddressSpace {
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 78358ede27..b73086e17a 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -492,7 +492,8 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
>> }
>>
>> if (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF &&
>> - migration->device_state == VFIO_DEVICE_STATE_RUNNING) {
>> + (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
>> + migration->device_state == VFIO_DEVICE_STATE_PRE_COPY)) {
>> return false;
>> }
>> }
>> @@ -537,7 +538,8 @@ static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
>> return false;
>> }
>>
>> - if (migration->device_state == VFIO_DEVICE_STATE_RUNNING) {
>> + if (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
>> + migration->device_state == VFIO_DEVICE_STATE_PRE_COPY) {
>> continue;
>> } else {
>> return false;
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 8d33414379..d8f6a22ae1 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -68,6 +68,8 @@ static const char *mig_state_to_str(enum vfio_device_mig_state state)
>> return "STOP_COPY";
>> case VFIO_DEVICE_STATE_RESUMING:
>> return "RESUMING";
>> + case VFIO_DEVICE_STATE_PRE_COPY:
>> + return "PRE_COPY";
>> default:
>> return "UNKNOWN STATE";
>> }
>> @@ -241,6 +243,25 @@ static int vfio_query_stop_copy_size(VFIODevice *vbasedev,
>> return 0;
>> }
>>
>> +static int vfio_query_precopy_size(VFIOMigration *migration)
>> +{
>> + struct vfio_precopy_info precopy = {
>> + .argsz = sizeof(precopy),
>> + };
>> +
>> + migration->precopy_init_size = 0;
>> + migration->precopy_dirty_size = 0;
>> +
>> + if (ioctl(migration->data_fd, VFIO_MIG_GET_PRECOPY_INFO, &precopy)) {
>> + return -errno;
>> + }
>> +
>> + migration->precopy_init_size = precopy.initial_bytes;
>> + migration->precopy_dirty_size = precopy.dirty_bytes;
>> +
>> + return 0;
>> +}
>> +
>> /* Returns the size of saved data on success and -errno on error */
>> static ssize_t vfio_save_block(QEMUFile *f, VFIOMigration *migration)
>> {
>> @@ -249,6 +270,14 @@ static ssize_t vfio_save_block(QEMUFile *f, VFIOMigration *migration)
>> data_size = read(migration->data_fd, migration->data_buffer,
>> migration->data_buffer_size);
>> if (data_size < 0) {
>> + /*
>> + * Pre-copy emptied all the device state for now. For more information,
>> + * please refer to the Linux kernel VFIO uAPI.
>> + */
>> + if (errno == ENOMSG) {
>> + return 0;
>> + }
>> +
>> return -errno;
>> }
>> if (data_size == 0) {
>> @@ -265,6 +294,38 @@ static ssize_t vfio_save_block(QEMUFile *f, VFIOMigration *migration)
>> return qemu_file_get_error(f) ?: data_size;
>> }
>>
>> +static void vfio_update_estimated_pending_data(VFIOMigration *migration,
>> + uint64_t data_size)
>> +{
>> + if (!data_size) {
>> + /*
>> + * Pre-copy emptied all the device state for now, update estimated sizes
>> + * accordingly.
>> + */
>> + migration->precopy_init_size = 0;
>> + migration->precopy_dirty_size = 0;
>> +
>> + return;
>> + }
>> +
>> + if (migration->precopy_init_size) {
>> + uint64_t init_size = MIN(migration->precopy_init_size, data_size);
>> +
>> + migration->precopy_init_size -= init_size;
>> + data_size -= init_size;
>> + }
>> +
>> + migration->precopy_dirty_size -= MIN(migration->precopy_dirty_size,
>> + data_size);
>> +}
>> +
>> +static bool vfio_precopy_supported(VFIODevice *vbasedev)
>> +{
>> + VFIOMigration *migration = vbasedev->migration;
>> +
>> + return migration->mig_flags & VFIO_MIGRATION_PRE_COPY;
>> +}
>> +
>> /* ---------------------------------------------------------------------- */
>>
>> static int vfio_save_setup(QEMUFile *f, void *opaque)
>> @@ -285,6 +346,28 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
>> return -ENOMEM;
>> }
>>
>> + if (vfio_precopy_supported(vbasedev)) {
>> + int ret;
>> +
>> + switch (migration->device_state) {
>> + case VFIO_DEVICE_STATE_RUNNING:
>> + ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_PRE_COPY,
>> + VFIO_DEVICE_STATE_RUNNING);
>> + if (ret) {
>> + return ret;
>> + }
>> +
>> + vfio_query_precopy_size(migration);
>> +
>> + break;
>> + case VFIO_DEVICE_STATE_STOP:
>> + /* vfio_save_complete_precopy() will go to STOP_COPY */
>> + break;
>> + default:
>> + return -EINVAL;
>> + }
>> + }
>> +
>> trace_vfio_save_setup(vbasedev->name, migration->data_buffer_size);
>>
>> qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>> @@ -299,26 +382,42 @@ static void vfio_save_cleanup(void *opaque)
>>
>> g_free(migration->data_buffer);
>> migration->data_buffer = NULL;
>> + migration->precopy_init_size = 0;
>> + migration->precopy_dirty_size = 0;
>> vfio_migration_cleanup(vbasedev);
>> trace_vfio_save_cleanup(vbasedev->name);
>> }
>>
>> +static void vfio_state_pending_estimate(void *opaque, uint64_t *must_precopy,
>> + uint64_t *can_postcopy)
>> +{
>> + VFIODevice *vbasedev = opaque;
>> + VFIOMigration *migration = vbasedev->migration;
>> +
>> + if (migration->device_state != VFIO_DEVICE_STATE_PRE_COPY) {
>> + return;
>> + }
>> +
>> + *must_precopy +=
>> + migration->precopy_init_size + migration->precopy_dirty_size;
>> +
>> + trace_vfio_state_pending_estimate(vbasedev->name, *must_precopy,
>> + *can_postcopy,
>> + migration->precopy_init_size,
>> + migration->precopy_dirty_size);
>> +}
>> +
>> /*
>> * Migration size of VFIO devices can be as little as a few KBs or as big as
>> * many GBs. This value should be big enough to cover the worst case.
>> */
>> #define VFIO_MIG_STOP_COPY_SIZE (100 * GiB)
>>
>> -/*
>> - * Only exact function is implemented and not estimate function. The reason is
>> - * that during pre-copy phase of migration the estimate function is called
>> - * repeatedly while pending RAM size is over the threshold, thus migration
>> - * can't converge and querying the VFIO device pending data size is useless.
>> static void vfio_state_pending_exact(void *opaque, uint64_t *must_precopy,
>> uint64_t *can_postcopy)
>> {
>> VFIODevice *vbasedev = opaque;
>> + VFIOMigration *migration = vbasedev->migration;
>> uint64_t stop_copy_size = VFIO_MIG_STOP_COPY_SIZE;
>>
>> /*
>> @@ -328,8 +427,48 @@ static void vfio_state_pending_exact(void *opaque, uint64_t *must_precopy,
>> vfio_query_stop_copy_size(vbasedev, &stop_copy_size);
>> *must_precopy += stop_copy_size;
>>
>> + if (migration->device_state == VFIO_DEVICE_STATE_PRE_COPY) {
>> + vfio_query_precopy_size(migration);
>> +
>> + *must_precopy +=
>> + migration->precopy_init_size + migration->precopy_dirty_size;
>> + }
>> +
>> trace_vfio_state_pending_exact(vbasedev->name, *must_precopy, *can_postcopy,
>> - stop_copy_size);
>> + stop_copy_size, migration->precopy_init_size,
>> + migration->precopy_dirty_size);
>> +}
>> +
>> +static bool vfio_is_active_iterate(void *opaque)
>> +{
>> + VFIODevice *vbasedev = opaque;
>> + VFIOMigration *migration = vbasedev->migration;
>> +
>> + return migration->device_state == VFIO_DEVICE_STATE_PRE_COPY;
>> +}
>> +
>> +static int vfio_save_iterate(QEMUFile *f, void *opaque)
>> +{
>> + VFIODevice *vbasedev = opaque;
>> + VFIOMigration *migration = vbasedev->migration;
>> + ssize_t data_size;
>> +
>> + data_size = vfio_save_block(f, migration);
>> + if (data_size < 0) {
>> + return data_size;
>> + }
>> + qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>> +
>> + vfio_update_estimated_pending_data(migration, data_size);
>> +
>> + trace_vfio_save_iterate(vbasedev->name, migration->precopy_init_size,
>> + migration->precopy_dirty_size);
>> +
>> + /*
>> + * A VFIO device's pre-copy dirty_bytes is not guaranteed to reach zero.
>> + * Return 1 so following handlers will not be potentially blocked.
>> + */
>> + return 1;
>> }
>>
>> static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>> @@ -338,7 +477,7 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>> ssize_t data_size;
>> int ret;
>>
>> - /* We reach here with device state STOP only */
>> + /* We reach here with device state STOP or STOP_COPY only */
>> ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY,
>> VFIO_DEVICE_STATE_STOP);
>> if (ret) {
>> @@ -457,7 +596,10 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>> static const SaveVMHandlers savevm_vfio_handlers = {
>> .save_setup = vfio_save_setup,
>> .save_cleanup = vfio_save_cleanup,
>> + .state_pending_estimate = vfio_state_pending_estimate,
>> .state_pending_exact = vfio_state_pending_exact,
>> + .is_active_iterate = vfio_is_active_iterate,
>> + .save_live_iterate = vfio_save_iterate,
>> .save_live_complete_precopy = vfio_save_complete_precopy,
>> .save_state = vfio_save_state,
>> .load_setup = vfio_load_setup,
>> @@ -470,13 +612,18 @@ static const SaveVMHandlers savevm_vfio_handlers = {
>> static void vfio_vmstate_change(void *opaque, bool running, RunState state)
>> {
>> VFIODevice *vbasedev = opaque;
>> + VFIOMigration *migration = vbasedev->migration;
>> enum vfio_device_mig_state new_state;
>> int ret;
>>
>> if (running) {
>> new_state = VFIO_DEVICE_STATE_RUNNING;
>> } else {
>> - new_state = VFIO_DEVICE_STATE_STOP;
>> + new_state =
>> + (migration->device_state == VFIO_DEVICE_STATE_PRE_COPY &&
>> + (state == RUN_STATE_FINISH_MIGRATE || state == RUN_STATE_PAUSED)) ?
>> + VFIO_DEVICE_STATE_STOP_COPY :
>> + VFIO_DEVICE_STATE_STOP;
>> }
>>
>> /*
>> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
>> index 646e42fd27..548f9488a7 100644
>> --- a/hw/vfio/trace-events
>> +++ b/hw/vfio/trace-events
>> @@ -162,6 +162,8 @@ vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
>> vfio_save_cleanup(const char *name) " (%s)"
>> vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d"
>> vfio_save_device_config_state(const char *name) " (%s)"
>> +vfio_save_iterate(const char *name, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64"
>
> the extra '"' at the end breaks compile. No need to resend just for that.
> It can be fixed.
>
Oh, strange that it doesn't break when I compile it.
Do you have any idea why would that be?
Thanks!
>
>
>> vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer size 0x%"PRIx64
>> -vfio_state_pending_exact(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t stopcopy_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" stopcopy size 0x%"PRIx64
>> +vfio_state_pending_estimate(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
>> +vfio_state_pending_exact(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" stopcopy size 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
>> vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
>
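For readers following the pending-data logic in the patch: vfio_update_estimated_pending_data() charges each chunk read in save_live_iterate() to the initial bytes first and only then to the dirty bytes, clamping at zero, and vfio_state_pending_estimate() reports the sum of the two. A self-contained sketch of that accounting follows. The struct and function names are hypothetical stand-ins for the two new VFIOMigration fields and the QEMU functions; the overflow assert is an addition of this sketch only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the precopy_* fields added to VFIOMigration. */
struct precopy_estimate {
    uint64_t init_size;  /* like precopy_init_size (initial_bytes) */
    uint64_t dirty_size; /* like precopy_dirty_size (dirty_bytes) */
};

static uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/* Same logic as vfio_update_estimated_pending_data() in the patch. */
static void update_estimate(struct precopy_estimate *est, uint64_t data_size)
{
    if (data_size == 0) {
        /* Nothing was read: treat the device as drained for now. */
        est->init_size = 0;
        est->dirty_size = 0;
        return;
    }

    if (est->init_size) {
        /* Consume initial bytes first... */
        uint64_t init = min_u64(est->init_size, data_size);

        est->init_size -= init;
        data_size -= init;
    }

    /* ...then charge the remainder to dirty bytes, never going below 0. */
    est->dirty_size -= min_u64(est->dirty_size, data_size);
}

/* Same sum that vfio_state_pending_estimate() adds to *must_precopy. */
static uint64_t pending_estimate(const struct precopy_estimate *est)
{
    /* Sketch-only guard: the sum must not wrap around. */
    assert(est->init_size <= UINT64_MAX - est->dirty_size);
    return est->init_size + est->dirty_size;
}
```

For example, starting from 100 initial and 50 dirty bytes, a 120-byte chunk leaves 0 initial and 30 dirty bytes. Because the counters only shrink between iterations, the patch refreshes them from the kernel with vfio_query_precopy_size() (VFIO_MIG_GET_PRECOPY_INFO) in save_setup and in state_pending_exact.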
>>> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
>>> index 646e42fd27..548f9488a7 100644
>>> --- a/hw/vfio/trace-events
>>> +++ b/hw/vfio/trace-events
>>> @@ -162,6 +162,8 @@ vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
>>> vfio_save_cleanup(const char *name) " (%s)"
>>> vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d"
>>> vfio_save_device_config_state(const char *name) " (%s)"
>>> +vfio_save_iterate(const char *name, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64"
>>
>> the extra '"' at the end breaks compile. No need to resend just for that.
>> It can be fixed.
>>
> Oh, strange that it doesn't break when I compile it.
> Do you have any idea why would that be?

It generates a -Werror=format= .

Did you configure the build with --disable-werror ?

C.
On 30/05/2023 13:17, Cédric Le Goater wrote:
>>>> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
>>>> index 646e42fd27..548f9488a7 100644
>>>> --- a/hw/vfio/trace-events
>>>> +++ b/hw/vfio/trace-events
>>>> @@ -162,6 +162,8 @@ vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
>>>> vfio_save_cleanup(const char *name) " (%s)"
>>>> vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d"
>>>> vfio_save_device_config_state(const char *name) " (%s)"
>>>> +vfio_save_iterate(const char *name, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64"
>>>
>>> the extra '"' at the end breaks compile. No need to resend just for that.
>>> It can be fixed.
>>>
>> Oh, strange that it doesn't break when I compile it.
>> Do you have any idea why would that be?
>
> It generates a -Werror=format= .
>
> Did you configure the build with --disable-werror ?

Nope.
configure prints this:

User defined options
[...]
werror : true
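Back on the patch itself, the one stop-path branch worth spelling out is in vfio_vmstate_change(): a device that is in PRE_COPY when the VM stops with RUN_STATE_FINISH_MIGRATE or RUN_STATE_PAUSED goes straight to STOP_COPY, and any other stop goes to STOP. This is why the comment in vfio_save_complete_precopy() now says it is reached with STOP or STOP_COPY. A minimal sketch of that decision follows, using hypothetical trimmed-down enums in place of QEMU's RunState and the kernel's enum vfio_device_mig_state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subsets of enum vfio_device_mig_state and RunState. */
enum dev_state { DEV_STOP, DEV_RUNNING, DEV_STOP_COPY, DEV_PRE_COPY };
enum run_state { RS_RUNNING, RS_PAUSED, RS_FINISH_MIGRATE, RS_SHUTDOWN };

/* Mirrors the new_state selection in the patched vfio_vmstate_change(). */
static enum dev_state next_dev_state(enum dev_state cur, bool running,
                                     enum run_state rs)
{
    if (running) {
        return DEV_RUNNING;
    }

    /*
     * A device still pre-copying moves directly to STOP_COPY when the VM
     * stops to finish migration or is paused.
     */
    if (cur == DEV_PRE_COPY &&
        (rs == RS_FINISH_MIGRATE || rs == RS_PAUSED)) {
        return DEV_STOP_COPY;
    }

    /* Every other stop behaves as before the patch. */
    return DEV_STOP;
}
```

Since vfio_save_setup() only moves the device to PRE_COPY when VFIO_MIGRATION_PRE_COPY is reported, devices without pre-copy support never take the STOP_COPY branch here.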