[PATCH] qemu: Fix hang when migration is canceled at the last moment

Jiri Denemark posted 1 patch 7 months, 2 weeks ago
git fetch https://github.com/patchew-project/libvirt tags/patchew/e27da909783f4dad1bad43c5e0f6a9c9959b943a.1737461159.git.jdenemar@redhat.com
[PATCH] qemu: Fix hang when migration is canceled at the last moment
Posted by Jiri Denemark 7 months, 2 weeks ago
When a migration is canceled very late, after the virtual CPUs have
already been stopped, QEMU automatically resumes them. If this happens
after we exit the waiting loop in qemuMigrationSrcWaitForCompletion, but
before the loop that makes sure the CPUs are stopped by waiting for the
appropriate event, we may end up waiting forever: the CPUs are running
(they were resumed by migrate_cancel), but the STOP event is already
gone.

This is possible because we enter the monitor to fetch migration
statistics, at which point other APIs can be processed and the migration
may change its state. We should recheck the state when we get back from
the monitor code.

https://issues.redhat.com/browse/RHEL-52493

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
 src/qemu/qemu_migration.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index bb4d11e196..53bbbee629 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -2169,6 +2169,13 @@ qemuMigrationSrcWaitForCompletion(virDomainObj *vm,
 
     ignore_value(qemuMigrationAnyFetchStats(vm, asyncJob, jobData, NULL));
 
+    /* We need to recheck migration status here as it might have changed while
+     * we were fetching statistics. For example, the migration might have been
+     * canceled.
+     */
+    if ((rv = qemuMigrationAnyCompleted(vm, asyncJob, dconn, flags)) < 0)
+        return rv;
+
     qemuDomainJobDataUpdateTime(jobData);
     qemuDomainJobDataUpdateDowntime(jobData);
     g_clear_pointer(&vm->job->completed, virDomainJobDataFree);
-- 
2.48.1
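
The race described in the commit message can be illustrated with a small
standalone model. This is only a sketch under stated assumptions, not
libvirt code: struct domain, fetch_stats, cancel_late, check_completed and
wait_for_completion are hypothetical names, the return convention of
check_completed is simplified, and the "other API" that runs while the
monitor is entered is modeled as a plain callback instead of a second
thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified migration states; hypothetical, not libvirt's enum. */
enum mig_status { MIG_ACTIVE, MIG_COMPLETED, MIG_CANCELLED };

struct domain {
    enum mig_status status;
    bool cpus_running;
};

/* Models another API that gets to run while we are in the monitor. */
typedef void (*concurrent_api)(struct domain *);

/* Models a late migrate_cancel: per the commit message, QEMU resumes
 * the vCPUs on its own when this happens. */
static void cancel_late(struct domain *d)
{
    d->status = MIG_CANCELLED;
    d->cpus_running = true;
}

/* Models fetching statistics. Entering the monitor lets other APIs
 * run, so the domain state may be different when this returns. */
static void fetch_stats(struct domain *d, concurrent_api other)
{
    if (other)
        other(d);
}

/* Assumed convention for this sketch: 1 = completed, 0 = still
 * running, -1 = canceled. */
static int check_completed(const struct domain *d)
{
    if (d->status == MIG_CANCELLED)
        return -1;
    return d->status == MIG_COMPLETED ? 1 : 0;
}

/* Returns 0 if the caller may go on to wait for the STOP event,
 * or -1 if the migration is no longer completing. */
static int wait_for_completion(struct domain *d, concurrent_api other,
                               bool recheck)
{
    int rv;

    /* The waiting loop has exited at this point. */
    if (check_completed(d) < 0)
        return -1;

    fetch_stats(d, other);

    /* The fix: look at the state again after leaving the monitor. */
    if (recheck && (rv = check_completed(d)) < 0)
        return rv;

    return 0;
}
```

With recheck set to false and cancel_late passed as the concurrent API,
wait_for_completion returns 0 even though the CPUs are running again, so a
caller that then waits for a STOP event would never see one. With recheck
set to true the same sequence returns -1 and the caller can bail out.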
Re: [PATCH] qemu: Fix hang when migration is canceled at the last moment
Posted by Michal Prívozník 7 months, 2 weeks ago
On 1/21/25 13:05, Jiri Denemark wrote:
> When a migration is canceled very late, after the virtual CPUs have
> already been stopped, QEMU automatically resumes them. If this happens
> after we exit the waiting loop in qemuMigrationSrcWaitForCompletion, but
> before the loop that makes sure the CPUs are stopped by waiting for the
> appropriate event, we may end up waiting forever: the CPUs are running
> (they were resumed by migrate_cancel), but the STOP event is already
> gone.
> 
> This is possible because we enter the monitor to fetch migration
> statistics, at which point other APIs can be processed and the migration
> may change its state. We should recheck the state when we get back from
> the monitor code.
> 
> https://issues.redhat.com/browse/RHEL-52493
> 
> Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
> ---
>  src/qemu/qemu_migration.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Michal