The postcopy-recover migration state in QEMU means a connection for the
migration stream was established. Depending on the schedulers on both
hosts, the relative timing of the corresponding MIGRATION event on the
source host and the destination host may differ. Specifically, it's
possible that the source sees postcopy-recover while the destination is
still in postcopy-paused.
Currently the Perform phase on the source host ends when we get the
postcopy-recover event and the Finish phase on the destination host is
called. If this happens fast enough, we can still see the postcopy-paused
state when the Finish phase starts waiting for migration to complete.
This is interpreted as a failure and reported back to the caller, even
though the recovery may actually start just a few moments later.
To avoid this race we no longer consider post-copy migration active in
the postcopy-recover state and keep waiting for the postcopy-active event
(in the success path). Thus the Finish phase is entered only after the
migration switches to postcopy-active. In this state QEMU guarantees the
destination has already switched at least to postcopy-recover and we
won't be confused by seeing an old postcopy-failed state.
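As a rough illustration of the new behavior, the status mapping and the
resulting wait condition can be modeled as below. This is only a
simplified sketch: the enum names and both helper functions are
hypothetical stand-ins invented for this example, not libvirt's real
identifiers, and the real wait logic in libvirt is more involved.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for QEMU migration states (not libvirt's enums). */
typedef enum {
    MIG_POSTCOPY_ACTIVE,
    MIG_POSTCOPY_PAUSED,
    MIG_POSTCOPY_RECOVER_SETUP,
    MIG_POSTCOPY_RECOVER,
} MigStatus;

/* Hypothetical stand-ins for the job status derived from the QEMU state. */
typedef enum {
    JOB_POSTCOPY,
    JOB_POSTCOPY_PAUSED,
    JOB_POSTCOPY_RECOVER,
} JobStatus;

/* Mapping as described by this patch: postcopy-recover is grouped with
 * postcopy-recover-setup instead of with active post-copy. */
static JobStatus
jobStatusFromMig(MigStatus mig)
{
    switch (mig) {
    case MIG_POSTCOPY_ACTIVE:
        return JOB_POSTCOPY;

    case MIG_POSTCOPY_RECOVER_SETUP:
    case MIG_POSTCOPY_RECOVER:
        return JOB_POSTCOPY_RECOVER;

    case MIG_POSTCOPY_PAUSED:
        return JOB_POSTCOPY_PAUSED;
    }
    return JOB_POSTCOPY_PAUSED;
}

/* Assumed success-path condition: the source keeps waiting in the Perform
 * phase until the job is reported as active post-copy. */
static bool
performPhaseMayFinish(MigStatus mig)
{
    return jobStatusFromMig(mig) == JOB_POSTCOPY;
}
```

In this model performPhaseMayFinish() returns false for
MIG_POSTCOPY_RECOVER, so the source would not hand over to the Finish
phase until it sees postcopy-active.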
https://issues.redhat.com/browse/RHEL-73085
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
src/qemu/qemu_migration.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index 50e350b0c4..1582a738a3 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -1872,11 +1872,11 @@ qemuMigrationUpdateJobType(virDomainJobData *jobData)
     switch ((qemuMonitorMigrationStatus) priv->stats.mig.status) {
     case QEMU_MONITOR_MIGRATION_STATUS_POSTCOPY:
-    case QEMU_MONITOR_MIGRATION_STATUS_POSTCOPY_RECOVER:
         jobData->status = VIR_DOMAIN_JOB_STATUS_POSTCOPY;
         break;
     case QEMU_MONITOR_MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
+    case QEMU_MONITOR_MIGRATION_STATUS_POSTCOPY_RECOVER:
         jobData->status = VIR_DOMAIN_JOB_STATUS_POSTCOPY_RECOVER;
         break;
--
2.47.1
On 1/10/25 18:41, Jiri Denemark wrote:
> The postcopy-recover migration state in QEMU means a connection for the
> migration stream was established. Depending on the schedulers on both
> hosts, the relative timing of the corresponding MIGRATION event on the
> source host and the destination host may differ. Specifically, it's
> possible that the source sees postcopy-recover while the destination is
> still in postcopy-paused.
>
> Currently the Perform phase on the source host ends when we get the
> postcopy-recover event and the Finish phase on the destination host is
> called. If this happens fast enough, we can still see the postcopy-paused
> state when the Finish phase starts waiting for migration to complete.
> This is interpreted as a failure and reported back to the caller, even
> though the recovery may actually start just a few moments later.
>
> To avoid this race we no longer consider post-copy migration active in
> the postcopy-recover state and keep waiting for the postcopy-active event
> (in the success path). Thus the Finish phase is entered only after the
> migration switches to postcopy-active. In this state QEMU guarantees the
> destination has already switched at least to postcopy-recover and we
> won't be confused by seeing an old postcopy-failed state.
>
> https://issues.redhat.com/browse/RHEL-73085
>
> Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
> ---
>  src/qemu/qemu_migration.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>

Michal