[PATCHv2 4/4] qemu, libxl: migration: Call hook script on source host when migration fails

Guy Godfroy via Devel posted 4 patches 2 days, 9 hours ago
There is a newer version of this series
Posted by Guy Godfroy via Devel 2 days, 9 hours ago
When outgoing migration fails after the "migrate-outgoing begin source"
hook was executed, the source host has no way to be notified and cannot
undo changes made in the begin hook (e.g., restoring exclusive storage
locks after switching to shared mode).

Add a hook call on the source host when migration fails, using the
"migrate-outgoing" operation with "end" as the sub-operation and
"failed" as the extra argument. The hook output and return code are
ignored since migration has already failed.

For qemu, the fail hook is called in the confirm phase when retcode != 0.
For libxl, the fail hook is called in both the perform phase (when
perform fails, since confirm is not called) and the confirm phase when
cancelled is true.

Resolves: https://gitlab.com/libvirt/libvirt/-/issues/37
Signed-off-by: Guy Godfroy <guy.godfroy@gugod.fr>
---
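Note for reviewers and hook authors (not part of the commit message): the
following is a minimal sketch of how a hook script might pair the existing
"migrate-outgoing begin source" call with the new "migrate-outgoing end
failed" call. Only the argument layout is taken from this series. The
function names and the set_lock_mode() helper are hypothetical placeholders,
and a real script would replace the placeholder with its own storage lock
handling.

```python
#!/usr/bin/env python3
"""Sketch of a qemu hook script reacting to the migrate-outgoing calls.

Only the argument layout comes from this series; the helper names below
are hypothetical placeholders.
"""
import sys


def decide_action(argv):
    """Map hook arguments to "prepare", "rollback", or None (not handled).

    argv layout: [script, guest_name, operation, sub_operation, extra]
    """
    if len(argv) < 5 or argv[2] != "migrate-outgoing":
        return None
    sub_operation, extra = argv[3], argv[4]
    if sub_operation == "begin" and extra == "source":
        return "prepare"
    if sub_operation == "end" and extra == "failed":
        return "rollback"
    return None


def set_lock_mode(guest_name, mode):
    """Hypothetical placeholder: a real script would switch the lock here."""
    print(f"{guest_name}: would switch storage lock to {mode}", file=sys.stderr)


def main(argv, stdin):
    action = decide_action(argv)
    if action is None:
        return 0  # not a call this sketch handles
    stdin.read()  # drain the domain XML that libvirt sends on standard input
    if action == "prepare":
        # A non-zero return here would cancel the migration.
        set_lock_mode(argv[1], "shared")
    else:
        # Migration failed: undo the begin hook. The return code is ignored.
        set_lock_mode(argv[1], "exclusive")
    return 0


# libvirt always passes arguments; do nothing when run without any.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv, sys.stdin))
```

Keeping the argument matching in decide_action() separate from the side
effects makes the dispatch easy to check without a running libvirtd.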
 docs/hooks.rst              | 27 ++++++++++++++++++++++++++-
 src/libxl/libxl_migration.c | 22 ++++++++++++++++++++++
 src/qemu/qemu_migration.c   | 12 ++++++++++++
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/docs/hooks.rst b/docs/hooks.rst
index cff1711536..4fe02b7c28 100644
--- a/docs/hooks.rst
+++ b/docs/hooks.rst
@@ -269,6 +269,17 @@ operation. There is no specific operation to indicate a "restart" is occurring.
    script returns failure, the migration will be canceled. This hook may be used,
    e.g., to change storage lock mode from exclusive to shared before migration.
 
+-  :since:`Since 12.3.0`, when migration fails on the source host, the qemu
+   hook script is called to allow the source host to undo any changes made in
+   the begin hook. It is called as:
+
+   ::
+
+      /etc/libvirt/hooks/qemu guest_name migrate-outgoing end failed
+
+   with domain XML sent to standard input of the script. Any output and the
+   return code of the script are ignored.
+
 -  :since:`Since 1.2.9`, the qemu hook script is also called when restoring a
    saved image either via the API or automatically when restoring a managed save
    machine. It is called as:
@@ -454,6 +465,17 @@ operation. There is no specific operation to indicate a "restart" is occurring.
    script returns failure, the migration will be canceled. This hook may be used,
    e.g., to change storage lock mode from exclusive to shared before migration.
 
+-  :since:`Since 12.3.0`, when migration fails on the source host, the libxl
+   hook script is called to allow the source host to undo any changes made in
+   the begin hook. It is called as:
+
+   ::
+
+      /etc/libvirt/hooks/libxl guest_name migrate-outgoing end failed
+
+   with domain XML sent to standard input of the script. Any output and the
+   return code of the script are ignored.
+
 -  :since:`Since 6.5.0`, you can also place several hook scripts in the
    directory ``/etc/libvirt/hooks/libxl.d/``. They are executed in alphabetical
    order after main script. In this case each script also acts as filter and can
@@ -599,7 +621,10 @@ destination hosts:
    called for domain start are executed on **destination** host.
 #. If all of these hook script executions exit successfully (exit status 0),
    the migration continues. Any other exit code indicates failure, and the
-   migration is aborted.
+   migration is aborted. In that case, the *qemu* hook script on the
+   **source** host is executed with the "migrate-outgoing" operation,
+   "end" sub-operation, and "failed" as the extra argument
+   (:since:`since 12.3.0`). Any output and the return code are ignored.
 #. The QEMU guest is then migrated to the destination host.
 #. Unless an error occurs during the migration process, the *qemu* hook script
    on the **source** host is then executed with the "stopped" and "release"
diff --git a/src/libxl/libxl_migration.c b/src/libxl/libxl_migration.c
index be51cfd316..4e7e2bdbe9 100644
--- a/src/libxl/libxl_migration.c
+++ b/src/libxl/libxl_migration.c
@@ -1237,6 +1237,17 @@ libxlDomainMigrationSrcPerform(libxlDriverPrivate *driver,
             VIR_WARN("Unable to release lease on %s", vm->def->name);
         }
     } else {
+        /* Call hook to notify source host that migration failed */
+        if (virHookPresent(VIR_HOOK_DRIVER_LIBXL)) {
+            g_autofree char *hookxml = NULL;
+
+            if ((hookxml = virDomainDefFormat(vm->def, driver->xmlopt,
+                                              VIR_DOMAIN_DEF_FORMAT_SECURE)))
+                virHookCall(VIR_HOOK_DRIVER_LIBXL, vm->def->name,
+                            VIR_HOOK_LIBXL_OP_MIGRATE_OUTGOING, VIR_HOOK_SUBOP_END,
+                            "failed", hookxml, NULL);
+        }
+
         /*
          * Confirm phase will not be executed if perform fails. End the
          * job started in begin phase.
@@ -1358,6 +1369,17 @@ libxlDomainMigrationSrcConfirm(libxlDriverPrivate *driver,
     virObjectEvent *event = NULL;
 
     if (cancelled) {
+        /* Call hook to notify source host that migration failed */
+        if (virHookPresent(VIR_HOOK_DRIVER_LIBXL)) {
+            g_autofree char *hookxml = NULL;
+
+            if ((hookxml = virDomainDefFormat(vm->def, driver->xmlopt,
+                                              VIR_DOMAIN_DEF_FORMAT_SECURE)))
+                virHookCall(VIR_HOOK_DRIVER_LIBXL, vm->def->name,
+                            VIR_HOOK_LIBXL_OP_MIGRATE_OUTGOING, VIR_HOOK_SUBOP_END,
+                            "failed", hookxml, NULL);
+        }
+
         /* Resume lock process that was paused in MigrationSrcPerform */
         virDomainLockProcessResume(driver->lockManager,
                                    "xen:///system",
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index e9ce2d8b8b..afb5161ea3 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -4218,6 +4218,18 @@ qemuMigrationSrcConfirmPhase(virQEMUDriver *driver,
             qemuDomainSetMaxMemLock(vm, 0, &priv->preMigrationMemlock);
         }
 
+        /* Call hook to notify source host that migration failed */
+        if (virHookPresent(VIR_HOOK_DRIVER_QEMU)) {
+            g_autofree char *xml = NULL;
+
+            if ((xml = qemuDomainDefFormatLive(driver, priv->qemuCaps,
+                                               vm->def, priv->origCPU,
+                                               false, true)))
+                virHookCall(VIR_HOOK_DRIVER_QEMU, vm->def->name,
+                            VIR_HOOK_QEMU_OP_MIGRATE_OUTGOING, VIR_HOOK_SUBOP_END,
+                            "failed", xml, NULL);
+        }
+
         qemuDomainSaveStatus(vm);
         virErrorRestore(&orig_err);
     }
-- 
2.53.0