[v3] migration: Introduce POSTCOPY_DEVICE state

[PATCH v3 4/7] migration: Refactor all incoming cleanup info migration_incoming_destroy()

Posted by Juraj Marcin 3 months, 1 week ago

From: Juraj Marcin <jmarcin@redhat.com>

Currently, there are two functions that are responsible for calling the
cleanup of the incoming migration state. With successful precopy, it's
the incoming migration coroutine, and with successful postcopy it's the
postcopy listen thread. However, if postcopy fails during in the device
load, both functions will try to do the cleanup.

This patch refactors all cleanup that needs to be done on the incoming
side into a common function and defines a clear boundary, who is
responsible for the cleanup. The incoming migration coroutine is
responsible for calling the cleanup function, unless the listen thread
has been started, in which case the postcopy listen thread runs the
incoming migration cleanup in its BH.

Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
---
 migration/migration.c    | 44 +++++++++-------------------
 migration/migration.h    |  1 +
 migration/postcopy-ram.c | 63 +++++++++++++++++++++-------------------
 migration/trace-events   |  2 +-
 4 files changed, 49 insertions(+), 61 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 9a367f717e..637be71bfe 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -438,10 +438,15 @@ void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
 
 void migration_incoming_state_destroy(void)
 {
-    struct MigrationIncomingState *mis = migration_incoming_get_current();
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    PostcopyState ps = postcopy_state_get();
 
     multifd_recv_cleanup();
 
+    if (ps != POSTCOPY_INCOMING_NONE) {
+        postcopy_incoming_cleanup(mis);
+    }
+
     /*
      * RAM state cleanup needs to happen after multifd cleanup, because
      * multifd threads can use some of its states (receivedmap).
@@ -866,7 +871,6 @@ process_incoming_migration_co(void *opaque)
 {
     MigrationState *s = migrate_get_current();
     MigrationIncomingState *mis = migration_incoming_get_current();
-    PostcopyState ps;
     int ret;
     Error *local_err = NULL;
 
@@ -883,25 +887,14 @@ process_incoming_migration_co(void *opaque)
 
     trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
 
-    ps = postcopy_state_get();
-    trace_process_incoming_migration_co_end(ret, ps);
-    if (ps != POSTCOPY_INCOMING_NONE) {
-        if (ps == POSTCOPY_INCOMING_ADVISE) {
-            /*
-             * Where a migration had postcopy enabled (and thus went to advise)
-             * but managed to complete within the precopy period, we can use
-             * the normal exit.
-             */
-            postcopy_incoming_cleanup(mis);
-        } else if (ret >= 0) {
-            /*
-             * Postcopy was started, cleanup should happen at the end of the
-             * postcopy thread.
-             */
-            trace_process_incoming_migration_co_postcopy_end_main();
-            goto out;
-        }
-        /* Else if something went wrong then just fall out of the normal exit */
+    trace_process_incoming_migration_co_end(ret);
+    if (mis->have_listen_thread) {
+        /*
+         * Postcopy was started, cleanup should happen at the end of the
+         * postcopy listen thread.
+         */
+        trace_process_incoming_migration_co_postcopy_end_main();
+        goto out;
     }
 
     if (ret < 0) {
@@ -933,15 +926,6 @@ fail:
         }
 
         exit(EXIT_FAILURE);
-    } else {
-        /*
-         * Report the error here in case that QEMU abruptly exits
-         * when postcopy is enabled.
-         */
-        WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
-            error_report_err(s->error);
-            s->error = NULL;
-        }
     }
 out:
     /* Pairs with the refcount taken in qmp_migrate_incoming() */
diff --git a/migration/migration.h b/migration/migration.h
index 01329bf824..4a37f7202c 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -254,6 +254,7 @@ struct MigrationIncomingState {
 MigrationIncomingState *migration_incoming_get_current(void);
 void migration_incoming_state_destroy(void);
 void migration_incoming_transport_cleanup(MigrationIncomingState *mis);
+void migration_incoming_qemu_exit(void);
 /*
  * Functions to work with blocktime context
  */
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index b47c955763..48cbb46c27 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -2078,6 +2078,24 @@ bool postcopy_is_paused(MigrationStatus status)
         status == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP;
 }
 
+static void postcopy_listen_thread_bh(void *opaque)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    migration_incoming_state_destroy();
+
+    if (mis->state == MIGRATION_STATUS_FAILED) {
+        /*
+         * If something went wrong then we have a bad state so exit;
+         * we only could have gotten here if something failed before
+         * POSTCOPY_INCOMING_RUNNING (for example device load), otherwise
+         * postcopy migration would pause inside qemu_loadvm_state_main().
+         * Failing dirty-bitmaps won't fail the whole migration.
+         */
+        exit(1);
+    }
+}
+
 /*
  * Triggered by a postcopy_listen command; this thread takes over reading
  * the input stream, leaving the main thread free to carry on loading the rest
@@ -2131,53 +2149,38 @@ static void *postcopy_listen_thread(void *opaque)
                          "bitmaps are correctly migrated and valid.",
                          __func__, load_res, error_get_pretty(local_err));
             g_clear_pointer(&local_err, error_free);
-            load_res = 0; /* prevent further exit() */
         } else {
+            /*
+             * Something went fatally wrong and we have a bad state, QEMU will
+             * exit depending on if postcopy-exit-on-error is true, but the
+             * migration cannot be recovered.
+             */
             error_prepend(&local_err,
                           "loadvm failed during postcopy: %d: ", load_res);
             migrate_set_error(migr, local_err);
             g_clear_pointer(&local_err, error_report_err);
             migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
                                            MIGRATION_STATUS_FAILED);
+            goto out;
         }
     }
-    if (load_res >= 0) {
-        /*
-         * This looks good, but it's possible that the device loading in the
-         * main thread hasn't finished yet, and so we might not be in 'RUN'
-         * state yet; wait for the end of the main thread.
-         */
-        qemu_event_wait(&mis->main_thread_load_event);
-    }
-    postcopy_incoming_cleanup(mis);
-
-    if (load_res < 0) {
-        /*
-         * If something went wrong then we have a bad state so exit;
-         * depending how far we got it might be possible at this point
-         * to leave the guest running and fire MCEs for pages that never
-         * arrived as a desperate recovery step.
-         */
-        rcu_unregister_thread();
-        exit(EXIT_FAILURE);
-    }
+    /*
+     * This looks good, but it's possible that the device loading in the
+     * main thread hasn't finished yet, and so we might not be in 'RUN'
+     * state yet; wait for the end of the main thread.
+     */
+    qemu_event_wait(&mis->main_thread_load_event);
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
                                    MIGRATION_STATUS_COMPLETED);
-    /*
-     * If everything has worked fine, then the main thread has waited
-     * for us to start, and we're the last use of the mis.
-     * (If something broke then qemu will have to exit anyway since it's
-     * got a bad migration state).
-     */
-    bql_lock();
-    migration_incoming_state_destroy();
-    bql_unlock();
 
+out:
     rcu_unregister_thread();
     mis->have_listen_thread = false;
     postcopy_state_set(POSTCOPY_INCOMING_END);
 
+    migration_bh_schedule(postcopy_listen_thread_bh, NULL);
+
     object_unref(OBJECT(migr));
 
     return NULL;
diff --git a/migration/trace-events b/migration/trace-events
index e8edd1fbba..772636f3ac 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -193,7 +193,7 @@ source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
 source_return_path_thread_switchover_acked(void) ""
 migration_thread_low_pending(uint64_t pending) "%" PRIu64
 migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " switchover_bw %" PRIu64 " max_size %" PRId64
-process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
+process_incoming_migration_co_end(int ret) "ret=%d"
 process_incoming_migration_co_postcopy_end_main(void) ""
 postcopy_preempt_enabled(bool value) "%d"
 migration_precopy_complete(void) ""
-- 
2.51.0

Re: [PATCH v3 4/7] migration: Refactor all incoming cleanup info migration_incoming_destroy()

Posted by Peter Xu 3 months, 1 week ago

On Thu, Oct 30, 2025 at 10:49:08PM +0100, Juraj Marcin wrote:
> From: Juraj Marcin <jmarcin@redhat.com>
> 
> Currently, there are two functions that are responsible for calling the
> cleanup of the incoming migration state. With successful precopy, it's
> the incoming migration coroutine, and with successful postcopy it's the
> postcopy listen thread. However, if postcopy fails during in the device
> load, both functions will try to do the cleanup.
> 
> This patch refactors all cleanup that needs to be done on the incoming
> side into a common function and defines a clear boundary, who is
> responsible for the cleanup. The incoming migration coroutine is
> responsible for calling the cleanup function, unless the listen thread
> has been started, in which case the postcopy listen thread runs the
> incoming migration cleanup in its BH.
> 
> Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
> ---
>  migration/migration.c    | 44 +++++++++-------------------
>  migration/migration.h    |  1 +
>  migration/postcopy-ram.c | 63 +++++++++++++++++++++-------------------
>  migration/trace-events   |  2 +-
>  4 files changed, 49 insertions(+), 61 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 9a367f717e..637be71bfe 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -438,10 +438,15 @@ void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
>  
>  void migration_incoming_state_destroy(void)
>  {
> -    struct MigrationIncomingState *mis = migration_incoming_get_current();
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    PostcopyState ps = postcopy_state_get();
>  
>      multifd_recv_cleanup();
>  
> +    if (ps != POSTCOPY_INCOMING_NONE) {
> +        postcopy_incoming_cleanup(mis);
> +    }
> +
>      /*
>       * RAM state cleanup needs to happen after multifd cleanup, because
>       * multifd threads can use some of its states (receivedmap).
> @@ -866,7 +871,6 @@ process_incoming_migration_co(void *opaque)
>  {
>      MigrationState *s = migrate_get_current();
>      MigrationIncomingState *mis = migration_incoming_get_current();
> -    PostcopyState ps;
>      int ret;
>      Error *local_err = NULL;
>  
> @@ -883,25 +887,14 @@ process_incoming_migration_co(void *opaque)
>  
>      trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
>  
> -    ps = postcopy_state_get();
> -    trace_process_incoming_migration_co_end(ret, ps);
> -    if (ps != POSTCOPY_INCOMING_NONE) {
> -        if (ps == POSTCOPY_INCOMING_ADVISE) {
> -            /*
> -             * Where a migration had postcopy enabled (and thus went to advise)
> -             * but managed to complete within the precopy period, we can use
> -             * the normal exit.
> -             */
> -            postcopy_incoming_cleanup(mis);
> -        } else if (ret >= 0) {
> -            /*
> -             * Postcopy was started, cleanup should happen at the end of the
> -             * postcopy thread.
> -             */
> -            trace_process_incoming_migration_co_postcopy_end_main();
> -            goto out;
> -        }
> -        /* Else if something went wrong then just fall out of the normal exit */
> +    trace_process_incoming_migration_co_end(ret);
> +    if (mis->have_listen_thread) {
> +        /*
> +         * Postcopy was started, cleanup should happen at the end of the
> +         * postcopy listen thread.
> +         */
> +        trace_process_incoming_migration_co_postcopy_end_main();
> +        goto out;
>      }
>  
>      if (ret < 0) {
> @@ -933,15 +926,6 @@ fail:
>          }
>  
>          exit(EXIT_FAILURE);
> -    } else {
> -        /*
> -         * Report the error here in case that QEMU abruptly exits
> -         * when postcopy is enabled.
> -         */
> -        WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> -            error_report_err(s->error);
> -            s->error = NULL;
> -        }

The patch looks all good itself.  Here a pure question: is the old code
wrong?  If user sets exit_on_error=false, then this path seems to be
releasing the error object, then query-migrate will see nothing?

>      }
>  out:
>      /* Pairs with the refcount taken in qmp_migrate_incoming() */
> diff --git a/migration/migration.h b/migration/migration.h
> index 01329bf824..4a37f7202c 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -254,6 +254,7 @@ struct MigrationIncomingState {
>  MigrationIncomingState *migration_incoming_get_current(void);
>  void migration_incoming_state_destroy(void);
>  void migration_incoming_transport_cleanup(MigrationIncomingState *mis);
> +void migration_incoming_qemu_exit(void);
>  /*
>   * Functions to work with blocktime context
>   */
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index b47c955763..48cbb46c27 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -2078,6 +2078,24 @@ bool postcopy_is_paused(MigrationStatus status)
>          status == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP;
>  }
>  
> +static void postcopy_listen_thread_bh(void *opaque)
> +{
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +
> +    migration_incoming_state_destroy();
> +
> +    if (mis->state == MIGRATION_STATUS_FAILED) {
> +        /*
> +         * If something went wrong then we have a bad state so exit;
> +         * we only could have gotten here if something failed before
> +         * POSTCOPY_INCOMING_RUNNING (for example device load), otherwise
> +         * postcopy migration would pause inside qemu_loadvm_state_main().
> +         * Failing dirty-bitmaps won't fail the whole migration.
> +         */
> +        exit(1);
> +    }
> +}
> +
>  /*
>   * Triggered by a postcopy_listen command; this thread takes over reading
>   * the input stream, leaving the main thread free to carry on loading the rest
> @@ -2131,53 +2149,38 @@ static void *postcopy_listen_thread(void *opaque)
>                           "bitmaps are correctly migrated and valid.",
>                           __func__, load_res, error_get_pretty(local_err));
>              g_clear_pointer(&local_err, error_free);
> -            load_res = 0; /* prevent further exit() */
>          } else {
> +            /*
> +             * Something went fatally wrong and we have a bad state, QEMU will
> +             * exit depending on if postcopy-exit-on-error is true, but the
> +             * migration cannot be recovered.
> +             */
>              error_prepend(&local_err,
>                            "loadvm failed during postcopy: %d: ", load_res);
>              migrate_set_error(migr, local_err);
>              g_clear_pointer(&local_err, error_report_err);
>              migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
>                                             MIGRATION_STATUS_FAILED);
> +            goto out;
>          }
>      }
> -    if (load_res >= 0) {
> -        /*
> -         * This looks good, but it's possible that the device loading in the
> -         * main thread hasn't finished yet, and so we might not be in 'RUN'
> -         * state yet; wait for the end of the main thread.
> -         */
> -        qemu_event_wait(&mis->main_thread_load_event);
> -    }
> -    postcopy_incoming_cleanup(mis);
> -
> -    if (load_res < 0) {
> -        /*
> -         * If something went wrong then we have a bad state so exit;
> -         * depending how far we got it might be possible at this point
> -         * to leave the guest running and fire MCEs for pages that never
> -         * arrived as a desperate recovery step.
> -         */
> -        rcu_unregister_thread();
> -        exit(EXIT_FAILURE);
> -    }
> +    /*
> +     * This looks good, but it's possible that the device loading in the
> +     * main thread hasn't finished yet, and so we might not be in 'RUN'
> +     * state yet; wait for the end of the main thread.
> +     */
> +    qemu_event_wait(&mis->main_thread_load_event);
>  
>      migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
>                                     MIGRATION_STATUS_COMPLETED);
> -    /*
> -     * If everything has worked fine, then the main thread has waited
> -     * for us to start, and we're the last use of the mis.
> -     * (If something broke then qemu will have to exit anyway since it's
> -     * got a bad migration state).
> -     */
> -    bql_lock();
> -    migration_incoming_state_destroy();
> -    bql_unlock();
>  
> +out:
>      rcu_unregister_thread();
>      mis->have_listen_thread = false;
>      postcopy_state_set(POSTCOPY_INCOMING_END);
>  
> +    migration_bh_schedule(postcopy_listen_thread_bh, NULL);
> +
>      object_unref(OBJECT(migr));
>  
>      return NULL;
> diff --git a/migration/trace-events b/migration/trace-events
> index e8edd1fbba..772636f3ac 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -193,7 +193,7 @@ source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
>  source_return_path_thread_switchover_acked(void) ""
>  migration_thread_low_pending(uint64_t pending) "%" PRIu64
>  migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " switchover_bw %" PRIu64 " max_size %" PRId64
> -process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
> +process_incoming_migration_co_end(int ret) "ret=%d"
>  process_incoming_migration_co_postcopy_end_main(void) ""
>  postcopy_preempt_enabled(bool value) "%d"
>  migration_precopy_complete(void) ""
> -- 
> 2.51.0
> 

-- 
Peter Xu

Re: [PATCH v3 4/7] migration: Refactor all incoming cleanup info migration_incoming_destroy()

Posted by Juraj Marcin 3 months, 1 week ago

On 2025-10-30 18:49, Peter Xu wrote:
> On Thu, Oct 30, 2025 at 10:49:08PM +0100, Juraj Marcin wrote:
> > From: Juraj Marcin <jmarcin@redhat.com>
> > 
> > Currently, there are two functions that are responsible for calling the
> > cleanup of the incoming migration state. With successful precopy, it's
> > the incoming migration coroutine, and with successful postcopy it's the
> > postcopy listen thread. However, if postcopy fails during in the device
> > load, both functions will try to do the cleanup.
> > 
> > This patch refactors all cleanup that needs to be done on the incoming
> > side into a common function and defines a clear boundary, who is
> > responsible for the cleanup. The incoming migration coroutine is
> > responsible for calling the cleanup function, unless the listen thread
> > has been started, in which case the postcopy listen thread runs the
> > incoming migration cleanup in its BH.
> > 
> > Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
> > ---
> >  migration/migration.c    | 44 +++++++++-------------------
> >  migration/migration.h    |  1 +
> >  migration/postcopy-ram.c | 63 +++++++++++++++++++++-------------------
> >  migration/trace-events   |  2 +-
> >  4 files changed, 49 insertions(+), 61 deletions(-)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 9a367f717e..637be71bfe 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -438,10 +438,15 @@ void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
> >  
> >  void migration_incoming_state_destroy(void)
> >  {
> > -    struct MigrationIncomingState *mis = migration_incoming_get_current();
> > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > +    PostcopyState ps = postcopy_state_get();
> >  
> >      multifd_recv_cleanup();
> >  
> > +    if (ps != POSTCOPY_INCOMING_NONE) {
> > +        postcopy_incoming_cleanup(mis);
> > +    }
> > +
> >      /*
> >       * RAM state cleanup needs to happen after multifd cleanup, because
> >       * multifd threads can use some of its states (receivedmap).
> > @@ -866,7 +871,6 @@ process_incoming_migration_co(void *opaque)
> >  {
> >      MigrationState *s = migrate_get_current();
> >      MigrationIncomingState *mis = migration_incoming_get_current();
> > -    PostcopyState ps;
> >      int ret;
> >      Error *local_err = NULL;
> >  
> > @@ -883,25 +887,14 @@ process_incoming_migration_co(void *opaque)
> >  
> >      trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
> >  
> > -    ps = postcopy_state_get();
> > -    trace_process_incoming_migration_co_end(ret, ps);
> > -    if (ps != POSTCOPY_INCOMING_NONE) {
> > -        if (ps == POSTCOPY_INCOMING_ADVISE) {
> > -            /*
> > -             * Where a migration had postcopy enabled (and thus went to advise)
> > -             * but managed to complete within the precopy period, we can use
> > -             * the normal exit.
> > -             */
> > -            postcopy_incoming_cleanup(mis);
> > -        } else if (ret >= 0) {
> > -            /*
> > -             * Postcopy was started, cleanup should happen at the end of the
> > -             * postcopy thread.
> > -             */
> > -            trace_process_incoming_migration_co_postcopy_end_main();
> > -            goto out;
> > -        }
> > -        /* Else if something went wrong then just fall out of the normal exit */
> > +    trace_process_incoming_migration_co_end(ret);
> > +    if (mis->have_listen_thread) {
> > +        /*
> > +         * Postcopy was started, cleanup should happen at the end of the
> > +         * postcopy listen thread.
> > +         */
> > +        trace_process_incoming_migration_co_postcopy_end_main();
> > +        goto out;
> >      }
> >  
> >      if (ret < 0) {
> > @@ -933,15 +926,6 @@ fail:
> >          }
> >  
> >          exit(EXIT_FAILURE);
> > -    } else {
> > -        /*
> > -         * Report the error here in case that QEMU abruptly exits
> > -         * when postcopy is enabled.
> > -         */
> > -        WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> > -            error_report_err(s->error);
> > -            s->error = NULL;
> > -        }
> 
> The patch looks all good itself.  Here a pure question: is the old code
> wrong?  If user sets exit_on_error=false, then this path seems to be
> releasing the error object, then query-migrate will see nothing?

I have tested this, and it is indeed what happens with the old code.
There is no "error-desc" in the response of "query-migrate" command,
just "status": "failed".

> 
> >      }
> >  out:
> >      /* Pairs with the refcount taken in qmp_migrate_incoming() */
> > diff --git a/migration/migration.h b/migration/migration.h
> > index 01329bf824..4a37f7202c 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -254,6 +254,7 @@ struct MigrationIncomingState {
> >  MigrationIncomingState *migration_incoming_get_current(void);
> >  void migration_incoming_state_destroy(void);
> >  void migration_incoming_transport_cleanup(MigrationIncomingState *mis);
> > +void migration_incoming_qemu_exit(void);
> >  /*
> >   * Functions to work with blocktime context
> >   */
> > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > index b47c955763..48cbb46c27 100644
> > --- a/migration/postcopy-ram.c
> > +++ b/migration/postcopy-ram.c
> > @@ -2078,6 +2078,24 @@ bool postcopy_is_paused(MigrationStatus status)
> >          status == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP;
> >  }
> >  
> > +static void postcopy_listen_thread_bh(void *opaque)
> > +{
> > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > +
> > +    migration_incoming_state_destroy();
> > +
> > +    if (mis->state == MIGRATION_STATUS_FAILED) {
> > +        /*
> > +         * If something went wrong then we have a bad state so exit;
> > +         * we only could have gotten here if something failed before
> > +         * POSTCOPY_INCOMING_RUNNING (for example device load), otherwise
> > +         * postcopy migration would pause inside qemu_loadvm_state_main().
> > +         * Failing dirty-bitmaps won't fail the whole migration.
> > +         */
> > +        exit(1);
> > +    }
> > +}
> > +
> >  /*
> >   * Triggered by a postcopy_listen command; this thread takes over reading
> >   * the input stream, leaving the main thread free to carry on loading the rest
> > @@ -2131,53 +2149,38 @@ static void *postcopy_listen_thread(void *opaque)
> >                           "bitmaps are correctly migrated and valid.",
> >                           __func__, load_res, error_get_pretty(local_err));
> >              g_clear_pointer(&local_err, error_free);
> > -            load_res = 0; /* prevent further exit() */
> >          } else {
> > +            /*
> > +             * Something went fatally wrong and we have a bad state, QEMU will
> > +             * exit depending on if postcopy-exit-on-error is true, but the
> > +             * migration cannot be recovered.
> > +             */
> >              error_prepend(&local_err,
> >                            "loadvm failed during postcopy: %d: ", load_res);
> >              migrate_set_error(migr, local_err);
> >              g_clear_pointer(&local_err, error_report_err);
> >              migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> >                                             MIGRATION_STATUS_FAILED);
> > +            goto out;
> >          }
> >      }
> > -    if (load_res >= 0) {
> > -        /*
> > -         * This looks good, but it's possible that the device loading in the
> > -         * main thread hasn't finished yet, and so we might not be in 'RUN'
> > -         * state yet; wait for the end of the main thread.
> > -         */
> > -        qemu_event_wait(&mis->main_thread_load_event);
> > -    }
> > -    postcopy_incoming_cleanup(mis);
> > -
> > -    if (load_res < 0) {
> > -        /*
> > -         * If something went wrong then we have a bad state so exit;
> > -         * depending how far we got it might be possible at this point
> > -         * to leave the guest running and fire MCEs for pages that never
> > -         * arrived as a desperate recovery step.
> > -         */
> > -        rcu_unregister_thread();
> > -        exit(EXIT_FAILURE);
> > -    }
> > +    /*
> > +     * This looks good, but it's possible that the device loading in the
> > +     * main thread hasn't finished yet, and so we might not be in 'RUN'
> > +     * state yet; wait for the end of the main thread.
> > +     */
> > +    qemu_event_wait(&mis->main_thread_load_event);
> >  
> >      migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> >                                     MIGRATION_STATUS_COMPLETED);
> > -    /*
> > -     * If everything has worked fine, then the main thread has waited
> > -     * for us to start, and we're the last use of the mis.
> > -     * (If something broke then qemu will have to exit anyway since it's
> > -     * got a bad migration state).
> > -     */
> > -    bql_lock();
> > -    migration_incoming_state_destroy();
> > -    bql_unlock();
> >  
> > +out:
> >      rcu_unregister_thread();
> >      mis->have_listen_thread = false;
> >      postcopy_state_set(POSTCOPY_INCOMING_END);
> >  
> > +    migration_bh_schedule(postcopy_listen_thread_bh, NULL);
> > +
> >      object_unref(OBJECT(migr));
> >  
> >      return NULL;
> > diff --git a/migration/trace-events b/migration/trace-events
> > index e8edd1fbba..772636f3ac 100644
> > --- a/migration/trace-events
> > +++ b/migration/trace-events
> > @@ -193,7 +193,7 @@ source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
> >  source_return_path_thread_switchover_acked(void) ""
> >  migration_thread_low_pending(uint64_t pending) "%" PRIu64
> >  migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " switchover_bw %" PRIu64 " max_size %" PRId64
> > -process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
> > +process_incoming_migration_co_end(int ret) "ret=%d"
> >  process_incoming_migration_co_postcopy_end_main(void) ""
> >  postcopy_preempt_enabled(bool value) "%d"
> >  migration_precopy_complete(void) ""
> > -- 
> > 2.51.0
> > 
> 
> -- 
> Peter Xu
>

Re: [PATCH v3 4/7] migration: Refactor all incoming cleanup info migration_incoming_destroy()

Posted by Peter Xu 3 months, 1 week ago

On Fri, Oct 31, 2025 at 12:03:57PM +0100, Juraj Marcin wrote:
> On 2025-10-30 18:49, Peter Xu wrote:
> > On Thu, Oct 30, 2025 at 10:49:08PM +0100, Juraj Marcin wrote:
> > > From: Juraj Marcin <jmarcin@redhat.com>
> > > 
> > > Currently, there are two functions that are responsible for calling the
> > > cleanup of the incoming migration state. With successful precopy, it's
> > > the incoming migration coroutine, and with successful postcopy it's the
> > > postcopy listen thread. However, if postcopy fails during in the device
> > > load, both functions will try to do the cleanup.
> > > 
> > > This patch refactors all cleanup that needs to be done on the incoming
> > > side into a common function and defines a clear boundary, who is
> > > responsible for the cleanup. The incoming migration coroutine is
> > > responsible for calling the cleanup function, unless the listen thread
> > > has been started, in which case the postcopy listen thread runs the
> > > incoming migration cleanup in its BH.
> > > 
> > > Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
> > > ---
> > >  migration/migration.c    | 44 +++++++++-------------------
> > >  migration/migration.h    |  1 +
> > >  migration/postcopy-ram.c | 63 +++++++++++++++++++++-------------------
> > >  migration/trace-events   |  2 +-
> > >  4 files changed, 49 insertions(+), 61 deletions(-)
> > > 
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 9a367f717e..637be71bfe 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -438,10 +438,15 @@ void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
> > >  
> > >  void migration_incoming_state_destroy(void)
> > >  {
> > > -    struct MigrationIncomingState *mis = migration_incoming_get_current();
> > > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > > +    PostcopyState ps = postcopy_state_get();
> > >  
> > >      multifd_recv_cleanup();
> > >  
> > > +    if (ps != POSTCOPY_INCOMING_NONE) {
> > > +        postcopy_incoming_cleanup(mis);
> > > +    }
> > > +
> > >      /*
> > >       * RAM state cleanup needs to happen after multifd cleanup, because
> > >       * multifd threads can use some of its states (receivedmap).
> > > @@ -866,7 +871,6 @@ process_incoming_migration_co(void *opaque)
> > >  {
> > >      MigrationState *s = migrate_get_current();
> > >      MigrationIncomingState *mis = migration_incoming_get_current();
> > > -    PostcopyState ps;
> > >      int ret;
> > >      Error *local_err = NULL;
> > >  
> > > @@ -883,25 +887,14 @@ process_incoming_migration_co(void *opaque)
> > >  
> > >      trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
> > >  
> > > -    ps = postcopy_state_get();
> > > -    trace_process_incoming_migration_co_end(ret, ps);
> > > -    if (ps != POSTCOPY_INCOMING_NONE) {
> > > -        if (ps == POSTCOPY_INCOMING_ADVISE) {
> > > -            /*
> > > -             * Where a migration had postcopy enabled (and thus went to advise)
> > > -             * but managed to complete within the precopy period, we can use
> > > -             * the normal exit.
> > > -             */
> > > -            postcopy_incoming_cleanup(mis);
> > > -        } else if (ret >= 0) {
> > > -            /*
> > > -             * Postcopy was started, cleanup should happen at the end of the
> > > -             * postcopy thread.
> > > -             */
> > > -            trace_process_incoming_migration_co_postcopy_end_main();
> > > -            goto out;
> > > -        }
> > > -        /* Else if something went wrong then just fall out of the normal exit */
> > > +    trace_process_incoming_migration_co_end(ret);
> > > +    if (mis->have_listen_thread) {
> > > +        /*
> > > +         * Postcopy was started, cleanup should happen at the end of the
> > > +         * postcopy listen thread.
> > > +         */
> > > +        trace_process_incoming_migration_co_postcopy_end_main();
> > > +        goto out;
> > >      }
> > >  
> > >      if (ret < 0) {
> > > @@ -933,15 +926,6 @@ fail:
> > >          }
> > >  
> > >          exit(EXIT_FAILURE);
> > > -    } else {
> > > -        /*
> > > -         * Report the error here in case that QEMU abruptly exits
> > > -         * when postcopy is enabled.
> > > -         */
> > > -        WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> > > -            error_report_err(s->error);
> > > -            s->error = NULL;
> > > -        }
> > 
> > The patch looks all good itself.  Here a pure question: is the old code
> > wrong?  If user sets exit_on_error=false, then this path seems to be
> > releasing the error object, then query-migrate will see nothing?
> 
> I have tested this, and it is indeed what happens with the old code.
> There is no "error-desc" in the response of "query-migrate" command,
> just "status": "failed".

Looks like it was a regression.

At the meantime, I feel like it's indeed fine if we do not dump error when
exit-on-error=false, like what you did here.

So I'll attach this:

Fixes: 9535435795 ("migration: push Error **errp into qemu_loadvm_state()")

And as the current patch looks all correct to me:

Reviewed-by: Peter Xu <peterx@redhat.com>

I've queued the series, thanks!

I'll post a small patch shortly to test the error-desc for
exit-on-error=false.

+Arun

> 
> > 
> > >      }
> > >  out:
> > >      /* Pairs with the refcount taken in qmp_migrate_incoming() */
> > > diff --git a/migration/migration.h b/migration/migration.h
> > > index 01329bf824..4a37f7202c 100644
> > > --- a/migration/migration.h
> > > +++ b/migration/migration.h
> > > @@ -254,6 +254,7 @@ struct MigrationIncomingState {
> > >  MigrationIncomingState *migration_incoming_get_current(void);
> > >  void migration_incoming_state_destroy(void);
> > >  void migration_incoming_transport_cleanup(MigrationIncomingState *mis);
> > > +void migration_incoming_qemu_exit(void);
> > >  /*
> > >   * Functions to work with blocktime context
> > >   */
> > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > index b47c955763..48cbb46c27 100644
> > > --- a/migration/postcopy-ram.c
> > > +++ b/migration/postcopy-ram.c
> > > @@ -2078,6 +2078,24 @@ bool postcopy_is_paused(MigrationStatus status)
> > >          status == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP;
> > >  }
> > >  
> > > +static void postcopy_listen_thread_bh(void *opaque)
> > > +{
> > > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > > +
> > > +    migration_incoming_state_destroy();
> > > +
> > > +    if (mis->state == MIGRATION_STATUS_FAILED) {
> > > +        /*
> > > +         * If something went wrong then we have a bad state so exit;
> > > +         * we only could have gotten here if something failed before
> > > +         * POSTCOPY_INCOMING_RUNNING (for example device load), otherwise
> > > +         * postcopy migration would pause inside qemu_loadvm_state_main().
> > > +         * Failing dirty-bitmaps won't fail the whole migration.
> > > +         */
> > > +        exit(1);
> > > +    }
> > > +}
> > > +
> > >  /*
> > >   * Triggered by a postcopy_listen command; this thread takes over reading
> > >   * the input stream, leaving the main thread free to carry on loading the rest
> > > @@ -2131,53 +2149,38 @@ static void *postcopy_listen_thread(void *opaque)
> > >                           "bitmaps are correctly migrated and valid.",
> > >                           __func__, load_res, error_get_pretty(local_err));
> > >              g_clear_pointer(&local_err, error_free);
> > > -            load_res = 0; /* prevent further exit() */
> > >          } else {
> > > +            /*
> > > +             * Something went fatally wrong and we have a bad state, QEMU will
> > > +             * exit depending on if postcopy-exit-on-error is true, but the
> > > +             * migration cannot be recovered.
> > > +             */
> > >              error_prepend(&local_err,
> > >                            "loadvm failed during postcopy: %d: ", load_res);
> > >              migrate_set_error(migr, local_err);
> > >              g_clear_pointer(&local_err, error_report_err);
> > >              migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > >                                             MIGRATION_STATUS_FAILED);
> > > +            goto out;
> > >          }
> > >      }
> > > -    if (load_res >= 0) {
> > > -        /*
> > > -         * This looks good, but it's possible that the device loading in the
> > > -         * main thread hasn't finished yet, and so we might not be in 'RUN'
> > > -         * state yet; wait for the end of the main thread.
> > > -         */
> > > -        qemu_event_wait(&mis->main_thread_load_event);
> > > -    }
> > > -    postcopy_incoming_cleanup(mis);
> > > -
> > > -    if (load_res < 0) {
> > > -        /*
> > > -         * If something went wrong then we have a bad state so exit;
> > > -         * depending how far we got it might be possible at this point
> > > -         * to leave the guest running and fire MCEs for pages that never
> > > -         * arrived as a desperate recovery step.
> > > -         */
> > > -        rcu_unregister_thread();
> > > -        exit(EXIT_FAILURE);
> > > -    }
> > > +    /*
> > > +     * This looks good, but it's possible that the device loading in the
> > > +     * main thread hasn't finished yet, and so we might not be in 'RUN'
> > > +     * state yet; wait for the end of the main thread.
> > > +     */
> > > +    qemu_event_wait(&mis->main_thread_load_event);
> > >  
> > >      migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > >                                     MIGRATION_STATUS_COMPLETED);
> > > -    /*
> > > -     * If everything has worked fine, then the main thread has waited
> > > -     * for us to start, and we're the last use of the mis.
> > > -     * (If something broke then qemu will have to exit anyway since it's
> > > -     * got a bad migration state).
> > > -     */
> > > -    bql_lock();
> > > -    migration_incoming_state_destroy();
> > > -    bql_unlock();
> > >  
> > > +out:
> > >      rcu_unregister_thread();
> > >      mis->have_listen_thread = false;
> > >      postcopy_state_set(POSTCOPY_INCOMING_END);
> > >  
> > > +    migration_bh_schedule(postcopy_listen_thread_bh, NULL);
> > > +
> > >      object_unref(OBJECT(migr));
> > >  
> > >      return NULL;
> > > diff --git a/migration/trace-events b/migration/trace-events
> > > index e8edd1fbba..772636f3ac 100644
> > > --- a/migration/trace-events
> > > +++ b/migration/trace-events
> > > @@ -193,7 +193,7 @@ source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
> > >  source_return_path_thread_switchover_acked(void) ""
> > >  migration_thread_low_pending(uint64_t pending) "%" PRIu64
> > >  migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " switchover_bw %" PRIu64 " max_size %" PRId64
> > > -process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
> > > +process_incoming_migration_co_end(int ret) "ret=%d"
> > >  process_incoming_migration_co_postcopy_end_main(void) ""
> > >  postcopy_preempt_enabled(bool value) "%d"
> > >  migration_precopy_complete(void) ""
> > > -- 
> > > 2.51.0
> > > 
> > 
> > -- 
> > Peter Xu
> > 
> 

-- 
Peter Xu

Re: [PATCH v3 4/7] migration: Refactor all incoming cleanup info migration_incoming_destroy()

Posted by Peter Xu 3 months, 1 week ago

On Fri, Oct 31, 2025 at 12:29:28PM -0400, Peter Xu wrote:
> I've queued the series, thanks!

Ahh, this series seems to break iotests 194..

https://gitlab.com/peterx/qemu/-/jobs/11916678476

Juraj, please have a look.  If the fix is small feel free to send a fixup
on top of a patch.

IIUC, we at least need below, but maybe not enough:

diff --git a/tests/qemu-iotests/194 b/tests/qemu-iotests/194
index e114c0b269..f6b34814a2 100755
--- a/tests/qemu-iotests/194
+++ b/tests/qemu-iotests/194
@@ -76,7 +76,7 @@ with iotests.FilePath('source.img') as source_img_path, \
 
     while True:
         event1 = source_vm.event_wait('MIGRATION')
-        if event1['data']['status'] == 'postcopy-active':
+        if event1['data']['status'] in ('postcopy-active', 'postcopy-device'):
             # This event is racy, it depends do we really do postcopy or bitmap
             # was migrated during downtime (and no data to migrate in postcopy
             # phase). So, don't log it.

-- 
Peter Xu

[PATCH v3 1/7] migration: Do not try to start VM if disk activation fails
[PATCH v3 2/7] migration: Move postcopy_ram_listen_thread() to postcopy-ram.c
[PATCH v3 3/7] migration: Introduce postcopy incoming setup and cleanup functions
[PATCH v3 4/7] migration: Refactor all incoming cleanup info migration_incoming_destroy()
[PATCH v3 5/7] migration: Respect exit-on-error when migration fails before resuming
[PATCH v3 6/7] migration: Make postcopy listen thread joinable
[PATCH v3 7/7] migration: Introduce POSTCOPY_DEVICE state