[PATCH] migration: Fix blocking in POSTCOPY_DEVICE during package load

Pranav Tyagi posted 1 patch 7 hours ago
Patches applied successfully.
git fetch https://github.com/patchew-project/qemu tags/patchew/20260421052227.8278-1-prtyagi@redhat.com
Maintainers: Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>
The package_loaded event is not set if MIG_RP_MSG_PONG from the
destination never reaches the return path thread on the source. The
migration thread then blocks indefinitely in the POSTCOPY_DEVICE state,
waiting for that event, even though the source VM could safely resume
because the destination has not started yet. The PONG can be lost if
the network fails or the destination crashes before sending it.
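
The hang can be modelled outside QEMU. This is a minimal sketch, not QEMU code. A hypothetical SketchEvent (a pthread mutex, condition variable and flag) stands in for postcopy_package_loaded_event, and return_path() stands in for the return path thread as it behaved before this patch. The pong_arrives parameter decides whether the PONG ever shows up. To keep the demo finite, the waiter uses pthread_cond_timedwait() with a deadline. The migration thread in the patch context calls qemu_event_wait() with no timeout, so in the lost-PONG case it would block forever.

```c
/*
 * Hypothetical model of the POSTCOPY_DEVICE hang; not QEMU code.
 * A mutex/condvar flag stands in for postcopy_package_loaded_event.
 */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool set;          /* package_loaded event state */
    bool pong_arrives; /* does MIG_RP_MSG_PONG reach the source? */
} SketchEvent;

/* Stand-in for the return path thread *before* the fix. */
static void *return_path(void *opaque)
{
    SketchEvent *ev = opaque;

    if (ev->pong_arrives) {
        pthread_mutex_lock(&ev->lock);
        ev->set = true;
        pthread_cond_broadcast(&ev->cond);
        pthread_mutex_unlock(&ev->lock);
    }
    /* On error (PONG lost) the thread just exits: nobody sets the event. */
    return NULL;
}

/* Returns true if the event was set within timeout_ms. */
static bool wait_package_loaded(bool pong_arrives, long timeout_ms)
{
    SketchEvent ev = { .set = false, .pong_arrives = pong_arrives };
    struct timespec deadline;
    pthread_t tid;
    int rc = 0;

    pthread_mutex_init(&ev.lock, NULL);
    pthread_cond_init(&ev.cond, NULL);
    if (pthread_create(&tid, NULL, return_path, &ev) != 0) {
        pthread_cond_destroy(&ev.cond);
        pthread_mutex_destroy(&ev.lock);
        return false;
    }

    /* Default condvar clock is CLOCK_REALTIME; build an absolute deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&ev.lock);
    while (!ev.set && rc != ETIMEDOUT) {
        rc = pthread_cond_timedwait(&ev.cond, &ev.lock, &deadline);
    }
    bool loaded = ev.set;
    pthread_mutex_unlock(&ev.lock);

    pthread_join(tid, NULL);
    pthread_cond_destroy(&ev.cond);
    pthread_mutex_destroy(&ev.lock);
    return loaded;
}
```

For example, assert(!wait_package_loaded(false, 50)) holds. With pong_arrives false, the wait ends only because of the artificial deadline.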

When the return path thread exits with an error caused by a network
failure or a destination crash while in the POSTCOPY_DEVICE state, set
the package_loaded event on its exit path. This kicks the migration
thread out of its indefinite wait. The migration thread then detects
the error, fails early, and breaks out of the migration loop so that
the VM resumes on the source side.
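
The fix follows a familiar pattern. A thread that gives up on a handshake still signals the waiter, and the waiter re-checks shared error state after waking instead of assuming success. The sketch below is a minimal model under stated assumptions and is not QEMU code. SketchEvent again stands in for QemuEvent, and has_error stands in for the error recorded with migrate_error_propagate() and tested with migrate_has_error(). A two-value enum stands in for the FAILING and POSTCOPY_ACTIVE states. The sketch also omits the patch's check that the state is still POSTCOPY_DEVICE before kicking.

```c
/*
 * Hypothetical model of the fix; not QEMU code. SketchEvent stands in
 * for QemuEvent, has_error for migrate_has_error(), and the enum for the
 * migration states the patch moves between.
 */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool set;
} SketchEvent;

static void sketch_event_set(SketchEvent *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->set = true;
    pthread_cond_broadcast(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}

static void sketch_event_wait(SketchEvent *ev)
{
    pthread_mutex_lock(&ev->lock);
    while (!ev->set) {
        pthread_cond_wait(&ev->cond, &ev->lock);
    }
    pthread_mutex_unlock(&ev->lock);
}

typedef enum { SKETCH_FAILING, SKETCH_POSTCOPY_ACTIVE } SketchState;

typedef struct {
    SketchEvent package_loaded;
    bool pong_arrives; /* does MIG_RP_MSG_PONG reach the source? */
    bool has_error;    /* written before the event is set */
} SketchMigration;

/* Stand-in for the return path thread *after* the fix. */
static void *return_path(void *opaque)
{
    SketchMigration *ms = opaque;

    if (ms->pong_arrives) {
        /* Normal path: the PONG arrived, so the package is loaded. */
        sketch_event_set(&ms->package_loaded);
        return NULL;
    }
    /* Error exit: record the error, then kick the waiter (the fix). */
    ms->has_error = true;
    sketch_event_set(&ms->package_loaded);
    return NULL;
}

/* Stand-in for the POSTCOPY_DEVICE step of migration_iteration_run(). */
static SketchState postcopy_device_step(bool pong_arrives)
{
    SketchMigration ms = { .pong_arrives = pong_arrives, .has_error = false };
    SketchState next;
    pthread_t tid;

    pthread_mutex_init(&ms.package_loaded.lock, NULL);
    pthread_cond_init(&ms.package_loaded.cond, NULL);
    ms.package_loaded.set = false;

    if (pthread_create(&tid, NULL, return_path, &ms) == 0) {
        sketch_event_wait(&ms.package_loaded);
        /* Mirrors the new error check after qemu_event_wait(). */
        next = ms.has_error ? SKETCH_FAILING : SKETCH_POSTCOPY_ACTIVE;
        pthread_join(tid, NULL);
    } else {
        next = SKETCH_FAILING; /* could not start the sketch thread */
    }

    pthread_cond_destroy(&ms.package_loaded.cond);
    pthread_mutex_destroy(&ms.package_loaded.lock);
    return next;
}
```

For example, assert(postcopy_device_step(false) == SKETCH_FAILING) holds. The waiter is woken, sees the recorded error and fails instead of hanging. The mutex inside the event orders the has_error write before the waiter's read.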

Fixes: 7b842fe354c6 ("migration: Introduce POSTCOPY_DEVICE state")
Signed-off-by: Pranav Tyagi <prtyagi@redhat.com>
---
 migration/migration.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 5c9aaa6e58..1656c1203c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2386,6 +2386,15 @@ out:
     if (err) {
         migrate_error_propagate(ms, err);
         trace_source_return_path_thread_bad_end();
+        if (ms->state == MIGRATION_STATUS_POSTCOPY_DEVICE) {
+            /*
+             * Kick the migration thread, which may be stuck in
+             * POSTCOPY_DEVICE state waiting for
+             * postcopy_package_loaded_event. Without this kick the event
+             * is never set, because MIG_RP_MSG_PONG never arrives.
+             */
+            qemu_event_set(&ms->postcopy_package_loaded_event);
+        }
     }
 
     if (ms->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
@@ -3232,6 +3241,17 @@ static MigIterateState migration_iteration_run(MigrationState *s)
              * package before actually completing.
              */
             qemu_event_wait(&s->postcopy_package_loaded_event);
+            /*
+             * The return path thread also sets this event when it exits
+             * with an error before MIG_RP_MSG_PONG arrives, so check for
+             * errors before moving to POSTCOPY_ACTIVE. If there is one,
+             * fail now and break out of the iteration.
+             */
+            if (migrate_has_error(s)) {
+                migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_DEVICE,
+                                  MIGRATION_STATUS_FAILING);
+                return MIG_ITERATE_BREAK;
+            }
             migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_DEVICE,
                               MIGRATION_STATUS_POSTCOPY_ACTIVE);
         }
-- 
2.53.0