[libvirt] [PATCH] qemu: Fix post-copy migration on the source

Jiri Denemark posted 1 patch 5 years, 5 months ago
git fetch https://github.com/patchew-project/libvirt tags/patchew/db827da851b60fd9d4d99714862f514e6a7e515a.1542374193.git.jdenemar@redhat.com
src/qemu/qemu_process.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)
Posted by Jiri Denemark 5 years, 5 months ago
Post-copy migration has been broken on the source since commit
v3.8.0-245-g32c29f10db, which implemented support for the
pause-before-switchover QEMU migration capability.

Even though the migration itself went well, the source did not know
when it switched to post-copy mode, despite the messages logged by the
MIGRATION event handler. As a result, the events emitted by the source
libvirtd were not accurate and the statistics of the completed migration
covered only the pre-copy part of the migration. Moreover, if migration
failed during the post-copy phase for some reason, the source libvirtd
would just happily resume the domain, which could lead to disk
corruption.

With the pause-before-switchover capability enabled, the order of events
emitted by QEMU changed:

                    pause-before-switchover
           disabled                        enabled
    MIGRATION, postcopy-active      STOP
    STOP                            MIGRATION, pre-switchover
                                    MIGRATION, postcopy-active

The STOP event handler checks the migration status (postcopy-active) and
sets the domain state accordingly. This is sufficient when
pause-before-switchover is disabled, but once we enable it, the
migration status is still active when we get STOP from QEMU. Thus the
domain state set in the STOP handler has to be corrected once we are
notified that migration changed to postcopy-active.
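
The ordering problem can be reproduced in isolation. What follows is
only a minimal sketch, not libvirt code: every type and function name in
it (MigStatus, DomState, handle_stop, handle_migration, replay_*) is
hypothetical and stands in for the much richer internal structures. The
"fix" flag toggles the correction described above so both behaviours can
be compared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for libvirt's internal state. */
typedef enum { MIG_ACTIVE, MIG_PRE_SWITCHOVER, MIG_POSTCOPY } MigStatus;
typedef enum { DOM_RUNNING, DOM_PAUSED_MIGRATION, DOM_PAUSED_POSTCOPY } DomState;
typedef enum { EV_STOP, EV_MIG_PRE_SWITCHOVER, EV_MIG_POSTCOPY } QemuEvent;

typedef struct {
    MigStatus mig;   /* last migration status reported by a MIGRATION event */
    DomState state;  /* domain state together with its paused reason */
} Domain;

/* STOP handler: chooses the paused reason from the status known so far. */
static void handle_stop(Domain *d)
{
    d->state = (d->mig == MIG_POSTCOPY) ? DOM_PAUSED_POSTCOPY
                                        : DOM_PAUSED_MIGRATION;
}

/* MIGRATION handler: records the status and, when fix is set, corrects a
 * paused reason that STOP chose before post-copy was known. */
static void handle_migration(Domain *d, MigStatus status, bool fix)
{
    d->mig = status;
    if (fix && status == MIG_POSTCOPY && d->state == DOM_PAUSED_MIGRATION)
        d->state = DOM_PAUSED_POSTCOPY;
}

/* Feed a sequence of events to a fresh domain and return its final state. */
static DomState replay(const QemuEvent *events, size_t n, bool fix)
{
    Domain d = { MIG_ACTIVE, DOM_RUNNING };

    assert(events != NULL);
    for (size_t i = 0; i < n; i++) {
        switch (events[i]) {
        case EV_STOP:
            handle_stop(&d);
            break;
        case EV_MIG_PRE_SWITCHOVER:
            handle_migration(&d, MIG_PRE_SWITCHOVER, fix);
            break;
        case EV_MIG_POSTCOPY:
            handle_migration(&d, MIG_POSTCOPY, fix);
            break;
        }
    }
    return d.state;
}

/* Event order with pause-before-switchover disabled. */
DomState replay_capability_disabled(bool fix)
{
    const QemuEvent ev[] = { EV_MIG_POSTCOPY, EV_STOP };
    return replay(ev, sizeof(ev) / sizeof(ev[0]), fix);
}

/* Event order with pause-before-switchover enabled. */
DomState replay_capability_enabled(bool fix)
{
    const QemuEvent ev[] = { EV_STOP, EV_MIG_PRE_SWITCHOVER, EV_MIG_POSTCOPY };
    return replay(ev, sizeof(ev) / sizeof(ev[0]), fix);
}
```

In this model, replay_capability_enabled(false) ends in
DOM_PAUSED_MIGRATION, which is the stale reason, while
replay_capability_enabled(true) ends in DOM_PAUSED_POSTCOPY. The
disabled ordering ends in DOM_PAUSED_POSTCOPY either way, because STOP
arrives after the status is already known.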

This results in two SUSPENDED events being emitted by the source
libvirtd during post-copy migration: the first one with the
VIR_DOMAIN_EVENT_SUSPENDED_MIGRATED detail and the second one with the
corrected VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY detail. This is
inevitable because we don't know whether migration will eventually
switch to post-copy at the time we emit the first event.
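
For a consumer of these events, one reasonable reading is that a later
SUSPENDED event supersedes an earlier one. The sketch below assumes
exactly that and nothing more. The enum and the ClientView helpers are
hypothetical local stand-ins; a real client would register a lifecycle
event callback through the libvirt public API and use the
VIR_DOMAIN_EVENT_SUSPENDED_* constants from its headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical local mirrors of the two event details discussed above. */
typedef enum {
    SUSPENDED_NONE,      /* no SUSPENDED event seen yet */
    SUSPENDED_MIGRATED,  /* mirrors VIR_DOMAIN_EVENT_SUSPENDED_MIGRATED */
    SUSPENDED_POSTCOPY   /* mirrors VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY */
} SuspendedDetail;

/* What a client remembers about one domain on the source host. */
typedef struct {
    SuspendedDetail last_detail;
    unsigned int suspended_events;
} ClientView;

/* Record a SUSPENDED event; the most recent detail replaces the old one. */
void client_on_suspended(ClientView *view, SuspendedDetail detail)
{
    assert(view != NULL);
    view->suspended_events++;
    view->last_detail = detail;
}

/* Non-zero once the corrected post-copy detail has been seen. */
int client_in_postcopy(const ClientView *view)
{
    assert(view != NULL);
    return view->last_detail == SUSPENDED_POSTCOPY;
}
```

Starting from { SUSPENDED_NONE, 0 }, feeding SUSPENDED_MIGRATED and then
SUSPENDED_POSTCOPY leaves the view with two counted events and
client_in_postcopy() returning non-zero only after the second one.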

https://bugzilla.redhat.com/show_bug.cgi?id=1647365

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
 src/qemu/qemu_process.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 622341b8a4..2c89978996 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -1543,9 +1543,13 @@ static int
 qemuProcessHandleMigrationStatus(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
                                  virDomainObjPtr vm,
                                  int status,
-                                 void *opaque ATTRIBUTE_UNUSED)
+                                 void *opaque)
 {
     qemuDomainObjPrivatePtr priv;
+    virQEMUDriverPtr driver = opaque;
+    virObjectEventPtr event = NULL;
+    virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
+    int reason;
 
     virObjectLock(vm);
 
@@ -1562,8 +1566,28 @@ qemuProcessHandleMigrationStatus(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
     priv->job.current->stats.mig.status = status;
     virDomainObjBroadcast(vm);
 
+    if (status == QEMU_MONITOR_MIGRATION_STATUS_POSTCOPY &&
+        virDomainObjGetState(vm, &reason) == VIR_DOMAIN_PAUSED &&
+        reason == VIR_DOMAIN_PAUSED_MIGRATION) {
+        VIR_DEBUG("Correcting paused state reason for domain %s to %s",
+                  vm->def->name,
+                  virDomainPausedReasonTypeToString(VIR_DOMAIN_PAUSED_POSTCOPY));
+
+        virDomainObjSetState(vm, VIR_DOMAIN_PAUSED, VIR_DOMAIN_PAUSED_POSTCOPY);
+        event = virDomainEventLifecycleNewFromObj(vm,
+                                                  VIR_DOMAIN_EVENT_SUSPENDED,
+                                                  VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY);
+
+        if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm, driver->caps) < 0) {
+            VIR_WARN("Unable to save status on vm %s after state change",
+                     vm->def->name);
+        }
+    }
+
  cleanup:
     virObjectUnlock(vm);
+    virObjectEventStateQueue(driver->domainEventState, event);
+    virObjectUnref(cfg);
     return 0;
 }
 
-- 
2.19.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: Fix post-copy migration on the source
Posted by Ján Tomko 5 years, 4 months ago
On Fri, Nov 16, 2018 at 02:16:33PM +0100, Jiri Denemark wrote:
>Post-copy migration has been broken on the source since commit
>v3.8.0-245-g32c29f10db which implemented support for
>pause-before-switchover QEMU migration capability.
>
>Even though the migration itself went well, the source did not really
>know when it switched to the post-copy mode despite the messages logged
>by MIGRATION event handler. As a result of this, the events emitted by
>source libvirtd were not accurate and statistics of the completed
>migration would cover only the pre-copy part of migration. Moreover, if
>migration failed during the post-copy phase for some reason, the source
>libvirtd would just happily resume the domain, which could lead to disk
>corruption.
>
>With the pause-before-switchover capability enabled, the order of events
>emitted by QEMU changed:
>
>                    pause-before-switchover
>           disabled                        enabled
>    MIGRATION, potcopy-active       STOP

s/pot/post/

But I guess it's just a matter of time until someone invents a
virtualization technology using plants, pots and gardening for the
terminology.

>    STOP                            MIGRATION, pre-switchover
>                                    MIGRATION, postcopy-active
>
>The STOP event handler checks the migration status (postcopy-active) and
>sets the domain state accordingly. Which is sufficient when
>pause-before-switchover is disabled, but once we enable it, the
>migration status is still active when we get STOP from QEMU. Thus the
>domain state set in the STOP handler has to be corrected once we are
>notified that migration changed to postcopy-active.
>
>This results in two SUSPENDED events to be emitted by the source
>libvirtd during post-copy migration. The first one with
>VIR_DOMAIN_EVENT_SUSPENDED_MIGRATED detail, while the second one reports
>the corrected VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY detail. This is
>inevitable because we don't know whether migration will eventually
>switch to post-copy at the time we emit the first event.
>
>https://bugzilla.redhat.com/show_bug.cgi?id=1647365
>
>Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
>---
> src/qemu/qemu_process.c | 26 +++++++++++++++++++++++++-
> 1 file changed, 25 insertions(+), 1 deletion(-)
>

Reviewed-by: Ján Tomko <jtomko@redhat.com>

Jano