[libvirt PATCH 2/2] qemu: Ignore failure in post-copy migration when QEMU says completed

Jiri Denemark posted 2 patches 3 years, 2 months ago
Posted by Jiri Denemark 3 years, 2 months ago
When a post-copy migration is in the Finish phase, we have already done
everything needed and are just waiting for the remaining memory to be
transferred to the destination. The domain is already running there at
this point. Once all data is transferred (QEMU sends a MIGRATION
completed event), we're done. So in this specific post-copy case the
source does not need to care about the result of the Finish call as
long as QEMU says the migration completed. The Finish call to the
destination daemon may fail for reasons that do not affect QEMU, e.g.,
the libvirt daemon was restarted there or the libvirt connection broke.

Currently we just mark the post-copy migration as failed on the source
and keep the domain paused there. But when the libvirt daemon is
restarted at this point, it detects that the migration finished
successfully and kills the domain as migrated. It makes sense to do
this even without having to restart the daemon.

Closes: https://gitlab.com/libvirt/libvirt/-/issues/338

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
 src/qemu/qemu_migration.c | 8 ++++++++
 1 file changed, 8 insertions(+)
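
For readers who want to see the decision in isolation, the check added
to qemuMigrationSrcConfirmPhase can be sketched as a small standalone
function. This is only an illustration under stated assumptions: the
enum and the function name are hypothetical stand-ins, not libvirt
APIs, and the real code reads the status from the job's private data
rather than taking it as a parameter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for QEMU's migration status; only the values
 * needed for this sketch are modeled. */
typedef enum {
    SKETCH_MIG_STATUS_ACTIVE,
    SKETCH_MIG_STATUS_POSTCOPY,
    SKETCH_MIG_STATUS_COMPLETED,
    SKETCH_MIG_STATUS_ERROR,
} sketchMigStatus;

/* Return the retcode the Confirm phase should act on.
 *
 * A failed Finish call (retcode != 0) is overridden only when the
 * migration is a post-copy one and QEMU itself reports completion.
 * Every other combination keeps the original retcode. */
static int
sketchConfirmRetcode(int retcode,
                     bool isPostcopy,
                     sketchMigStatus status)
{
    if (retcode != 0 &&
        isPostcopy &&
        status == SKETCH_MIG_STATUS_COMPLETED)
        return 0;

    return retcode;
}
```

With this sketch, a failed Finish call for a completed post-copy
migration yields 0, while the same failure for a pre-copy migration or
for a post-copy migration that QEMU has not reported as completed is
passed through unchanged.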

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index bba4e1dbf3..bef06f4caf 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -3901,6 +3901,7 @@ qemuMigrationSrcConfirmPhase(virQEMUDriver *driver,
     g_autoptr(qemuMigrationCookie) mig = NULL;
     qemuDomainObjPrivate *priv = vm->privateData;
     qemuDomainJobPrivate *jobPriv = vm->job->privateData;
+    qemuDomainJobDataPrivate *currentData = vm->job->current->privateData;
     virDomainJobData *jobData = NULL;
     qemuMigrationJobPhase phase;
 
@@ -3911,6 +3912,13 @@ qemuMigrationSrcConfirmPhase(virQEMUDriver *driver,
 
     virCheckFlags(QEMU_MIGRATION_FLAGS, -1);
 
+    if (retcode != 0 &&
+        virDomainObjIsPostcopy(vm, VIR_DOMAIN_JOB_OPERATION_MIGRATION_OUT) &&
+        currentData->stats.mig.status == QEMU_MONITOR_MIGRATION_STATUS_COMPLETED) {
+        VIR_DEBUG("Finish phase failed, but QEMU reports post-copy migration is completed; forcing success");
+        retcode = 0;
+    }
+
     if (flags & VIR_MIGRATE_POSTCOPY_RESUME) {
         phase = QEMU_MIGRATION_PHASE_CONFIRM_RESUME;
     } else if (virDomainObjIsFailedPostcopy(vm)) {
-- 
2.38.1
Re: [libvirt PATCH 2/2] qemu: Ignore failure in post-copy migration when QEMU says completed
Posted by Peter Krempa 3 years, 2 months ago
On Fri, Nov 18, 2022 at 16:37:22 +0100, Jiri Denemark wrote:
> When a post-copy migration is in the Finish phase, we have already done
> everything needed and are just waiting for the remaining memory to be
> transferred to the destination. The domain is already running there at
> this point. Once all data is transferred (QEMU sends a MIGRATION
> completed event), we're done. So in this specific post-copy case the
> source does not need to care about the result of the Finish call as
> long as QEMU says the migration completed. The Finish call to the
> destination daemon may fail for reasons that do not affect QEMU, e.g.,
> the libvirt daemon was restarted there or the libvirt connection broke.
> 
> Currently we just mark the post-copy migration as failed on the source
> and keep the domain paused there. But when the libvirt daemon is
> restarted at this point, it detects that the migration finished
> successfully and kills the domain as migrated. It makes sense to do
> this even without having to restart the daemon.
> 
> Closes: https://gitlab.com/libvirt/libvirt/-/issues/338
> 
> Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
> ---
>  src/qemu/qemu_migration.c | 8 ++++++++
>  1 file changed, 8 insertions(+)

Reviewed-by: Peter Krempa <pkrempa@redhat.com>