The postcopy_package_loaded_event is never set if MIG_RP_MSG_PONG from
the destination does not reach the return path thread on the source.
The migration thread then blocks indefinitely in the POSTCOPY_DEVICE
state, waiting for that event. In this situation, however, the source
VM can safely resume, because the destination has not started running
yet. The pong can be lost if the network fails or if the destination
crashes before sending it.

This patch uses the error detected on a network failure or destination
crash to set the event on the out: path of the return path thread, but
only while the migration is in the POSTCOPY_DEVICE state. This wakes
the migration thread, which would otherwise wait for the event forever.
The migration thread then sees the error, fails early, and breaks out
of the migration loop so that the VM resumes on the source side.

Fixes: 7b842fe354c6 ("migration: Introduce POSTCOPY_DEVICE state")
Signed-off-by: Pranav Tyagi <prtyagi@redhat.com>
---
migration/migration.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/migration/migration.c b/migration/migration.c
index 5c9aaa6e58..1656c1203c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2386,6 +2386,15 @@ out:
     if (err) {
         migrate_error_propagate(ms, err);
         trace_source_return_path_thread_bad_end();
+        if (ms->state == MIGRATION_STATUS_POSTCOPY_DEVICE) {
+            /*
+             * Kick the migration thread if it gets stuck in
+             * POSTCOPY_DEVICE state waiting for
+             * postcopy_package_loaded_event. The event will never be
+             * set because MIG_RP_MSG_PONG from the destination was lost.
+             */
+            qemu_event_set(&ms->postcopy_package_loaded_event);
+        }
     }
 
     if (ms->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
@@ -3232,6 +3241,17 @@ static MigIterateState migration_iteration_run(MigrationState *s)
          * package before actually completing.
          */
         qemu_event_wait(&s->postcopy_package_loaded_event);
+        /*
+         * The return path thread may have set the event only to kick
+         * this thread after an error (e.g. MIG_RP_MSG_PONG was lost),
+         * in which case the package was never loaded. If so, fail
+         * now and break out of the iteration.
+         */
+        if (migrate_has_error(s)) {
+            migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_DEVICE,
+                              MIGRATION_STATUS_FAILING);
+            return MIG_ITERATE_BREAK;
+        }
         migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_DEVICE,
                           MIGRATION_STATUS_POSTCOPY_ACTIVE);
     }
--
2.53.0