[PATCH v5 13/21] migration: Postcopy recover with preempt enabled

Peter Xu posted 21 patches 3 years, 9 months ago
Maintainers: Juan Quintela <quintela@redhat.com>, "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Eric Blake <eblake@redhat.com>, Markus Armbruster <armbru@redhat.com>, Thomas Huth <thuth@redhat.com>, Laurent Vivier <lvivier@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, "Daniel P. Berrangé" <berrange@redhat.com>
There is a newer version of this series.
To allow postcopy recovery, the ram fast load (preempt-only) dest QEMU thread
needs similar fault-tolerance handling.  When ram_load_postcopy() fails,
instead of stopping, the thread halts on a semaphore, ready to be kicked
again when recovery is detected.

A mutex is introduced to make sure there's no concurrent operation upon the
socket.  To keep it simple, the fast ram load thread takes the mutex for its
whole lifetime and only releases it while paused.  When a network failure
happens during postcopy, the main loading thread can then take that mutex
and safely release the fast-path socket.
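
For clarity, the thread handshake can be modeled as a tiny standalone
pthread program (a sketch only: prio_mutex, pause_sem and channel_ok are
illustrative stand-ins, not the QEMU identifiers; the real implementation
is in the diff below):

    /*
     * Standalone model of the handshake above; build with: cc -pthread
     * model.c.  All names here are illustrative, not QEMU code.
     */
    #include <pthread.h>
    #include <semaphore.h>
    #include <stdbool.h>
    #include <stdio.h>

    static pthread_mutex_t prio_mutex = PTHREAD_MUTEX_INITIALIZER;
    static sem_t pause_sem;
    static bool channel_ok;     /* stands in for a healthy postcopy channel */

    static void *fast_load_thread(void *arg)
    {
        pthread_mutex_lock(&prio_mutex);
        while (!channel_ok) {   /* i.e. ram_load_postcopy() failed */
            /* Release the mutex only while paused, so the main thread
             * can tear down and replace the broken channel safely. */
            pthread_mutex_unlock(&prio_mutex);
            sem_wait(&pause_sem);   /* kicked again on recovery */
            pthread_mutex_lock(&prio_mutex);
        }
        pthread_mutex_unlock(&prio_mutex);
        return NULL;
    }

    int main(void)
    {
        pthread_t thread;

        sem_init(&pause_sem, 0, 0);
        pthread_create(&thread, NULL, fast_load_thread, NULL);

        /* Main thread: while holding the mutex, the fast load thread is
         * guaranteed not to touch the channel, so "recover" it here. */
        pthread_mutex_lock(&prio_mutex);
        channel_ok = true;
        pthread_mutex_unlock(&prio_mutex);
        sem_post(&pause_sem);   /* kick the fast load thread */

        pthread_join(thread, NULL);
        puts("fast load thread resumed and exited");
        return 0;
    }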

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c    | 27 +++++++++++++++++++++++----
 migration/migration.h    | 19 +++++++++++++++++++
 migration/postcopy-ram.c | 25 +++++++++++++++++++++++--
 migration/qemu-file.c    | 27 +++++++++++++++++++++++++++
 migration/qemu-file.h    |  1 +
 migration/savevm.c       | 26 ++++++++++++++++++++++++--
 migration/trace-events   |  2 ++
 7 files changed, 119 insertions(+), 8 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 8264b03d4d..a0db5de685 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -215,9 +215,11 @@ void migration_object_init(void)
     current_incoming->postcopy_remote_fds =
         g_array_new(FALSE, TRUE, sizeof(struct PostCopyFD));
     qemu_mutex_init(&current_incoming->rp_mutex);
+    qemu_mutex_init(&current_incoming->postcopy_prio_thread_mutex);
     qemu_event_init(&current_incoming->main_thread_load_event, false);
     qemu_sem_init(&current_incoming->postcopy_pause_sem_dst, 0);
     qemu_sem_init(&current_incoming->postcopy_pause_sem_fault, 0);
+    qemu_sem_init(&current_incoming->postcopy_pause_sem_fast_load, 0);
     qemu_mutex_init(&current_incoming->page_request_mutex);
     current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
 
@@ -697,9 +699,9 @@ static bool postcopy_try_recover(void)
 
         /*
          * Here, we only wake up the main loading thread (while the
-         * fault thread will still be waiting), so that we can receive
+         * other threads will still be waiting), so that we can receive
          * commands from source now, and answer it if needed. The
-         * fault thread will be woken up afterwards until we are sure
+         * other threads will be woken up afterwards once we are sure
          * that source is ready to reply to page requests.
          */
         qemu_sem_post(&mis->postcopy_pause_sem_dst);
@@ -3470,6 +3472,18 @@ static MigThrError postcopy_pause(MigrationState *s)
         qemu_file_shutdown(file);
         qemu_fclose(file);
 
+        /*
+         * Do the same to the postcopy fast path socket too, if there is
+         * one.  No locking is needed because there is no racer as long as
+         * we do this before setting the status to paused.
+         */
+        if (s->postcopy_qemufile_src) {
+            migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
+            qemu_file_shutdown(s->postcopy_qemufile_src);
+            qemu_fclose(s->postcopy_qemufile_src);
+            s->postcopy_qemufile_src = NULL;
+        }
+
         migrate_set_state(&s->state, s->state,
                           MIGRATION_STATUS_POSTCOPY_PAUSED);
 
@@ -3525,8 +3539,13 @@ static MigThrError migration_detect_error(MigrationState *s)
         return MIG_THR_ERR_FATAL;
     }
 
-    /* Try to detect any file errors */
-    ret = qemu_file_get_error_obj(s->to_dst_file, &local_error);
+    /*
+     * Try to detect any file errors.  Note that postcopy_qemufile_src will
+     * be NULL when postcopy preempt is not enabled.
+     */
+    ret = qemu_file_get_error_obj_any(s->to_dst_file,
+                                      s->postcopy_qemufile_src,
+                                      &local_error);
     if (!ret) {
         /* Everything is fine */
         assert(!local_error);
diff --git a/migration/migration.h b/migration/migration.h
index b8aacfe3af..91f845e9e4 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -118,6 +118,18 @@ struct MigrationIncomingState {
     /* Postcopy priority thread is used to receive postcopy requested pages */
     QemuThread postcopy_prio_thread;
     bool postcopy_prio_thread_created;
+    /*
+     * Used to sync between the ram load main thread and the fast ram load
+     * thread.  It protects postcopy_qemufile_dst, which is the postcopy
+     * fast channel.
+     *
+     * The ram fast load thread will hold it for most of its lifecycle,
+     * because it needs to continuously read data from the channel, and
+     * will only release the mutex when postcopy is interrupted, so that
+     * the ram load main thread can take it over and properly release
+     * the broken channel.
+     */
+    QemuMutex postcopy_prio_thread_mutex;
     /*
      * An array of temp host huge pages to be used, one for each postcopy
      * channel.
@@ -147,6 +159,13 @@ struct MigrationIncomingState {
     /* notify PAUSED postcopy incoming migrations to try to continue */
     QemuSemaphore postcopy_pause_sem_dst;
     QemuSemaphore postcopy_pause_sem_fault;
+    /*
+     * This semaphore is used to allow the ram fast load thread (only when
+     * postcopy preempt is enabled) to fall asleep when a network
+     * interruption is detected.  When the recovery is done, the main load
+     * thread will kick the fast ram load thread using this semaphore.
+     */
+    QemuSemaphore postcopy_pause_sem_fast_load;
 
     /* List of listening socket addresses  */
     SocketAddressList *socket_address_list;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index e92db0556b..b3c81b46f6 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1580,6 +1580,15 @@ int postcopy_preempt_setup(MigrationState *s, Error **errp)
     return 0;
 }
 
+static void postcopy_pause_ram_fast_load(MigrationIncomingState *mis)
+{
+    trace_postcopy_pause_fast_load();
+    qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
+    qemu_sem_wait(&mis->postcopy_pause_sem_fast_load);
+    qemu_mutex_lock(&mis->postcopy_prio_thread_mutex);
+    trace_postcopy_pause_fast_load_continued();
+}
+
 void *postcopy_preempt_thread(void *opaque)
 {
     MigrationIncomingState *mis = opaque;
@@ -1592,11 +1601,23 @@ void *postcopy_preempt_thread(void *opaque)
     qemu_sem_post(&mis->thread_sync_sem);
 
     /* Sending RAM_SAVE_FLAG_EOS to terminate this thread */
-    ret = ram_load_postcopy(mis->postcopy_qemufile_dst, RAM_CHANNEL_POSTCOPY);
+    qemu_mutex_lock(&mis->postcopy_prio_thread_mutex);
+    while (1) {
+        ret = ram_load_postcopy(mis->postcopy_qemufile_dst,
+                                RAM_CHANNEL_POSTCOPY);
+        /* If error happened, go into recovery routine */
+        if (ret) {
+            postcopy_pause_ram_fast_load(mis);
+        } else {
+            /* We're done */
+            break;
+        }
+    }
+    qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
 
     rcu_unregister_thread();
 
     trace_postcopy_preempt_thread_exit();
 
-    return ret == 0 ? NULL : (void *)-1;
+    return NULL;
 }
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 1479cddad9..397652f0ba 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -139,6 +139,33 @@ int qemu_file_get_error_obj(QEMUFile *f, Error **errp)
     return f->last_error;
 }
 
+/*
+ * Get the last error for either stream f1 or f2, with an optional Error*.
+ * The (non-zero) error returned can come from either f1 or f2.
+ *
+ * If either QEMUFile is NULL, the check on that file is skipped.
+ *
+ * When neither file has an error, zero is returned.
+ */
+int qemu_file_get_error_obj_any(QEMUFile *f1, QEMUFile *f2, Error **errp)
+{
+    int ret = 0;
+
+    if (f1) {
+        ret = qemu_file_get_error_obj(f1, errp);
+        /* If there's already error detected, return */
+        if (ret) {
+            return ret;
+        }
+    }
+
+    if (f2) {
+        ret = qemu_file_get_error_obj(f2, errp);
+    }
+
+    return ret;
+}
+
 /*
  * Set the last error for stream f with optional Error*
  */
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 3f36d4dc8c..2564e5e1c7 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -156,6 +156,7 @@ void qemu_file_update_transfer(QEMUFile *f, int64_t len);
 void qemu_file_set_rate_limit(QEMUFile *f, int64_t new_rate);
 int64_t qemu_file_get_rate_limit(QEMUFile *f);
 int qemu_file_get_error_obj(QEMUFile *f, Error **errp);
+int qemu_file_get_error_obj_any(QEMUFile *f1, QEMUFile *f2, Error **errp);
 void qemu_file_set_error_obj(QEMUFile *f, int ret, Error *err);
 void qemu_file_set_error(QEMUFile *f, int ret);
 int qemu_file_shutdown(QEMUFile *f);
diff --git a/migration/savevm.c b/migration/savevm.c
index ecee05e631..050874650a 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2152,6 +2152,13 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
      */
     qemu_sem_post(&mis->postcopy_pause_sem_fault);
 
+    if (migrate_postcopy_preempt()) {
+        /* The channel should already be setup again; make sure of it */
+        assert(mis->postcopy_qemufile_dst);
+        /* Kick the fast ram load thread too */
+        qemu_sem_post(&mis->postcopy_pause_sem_fast_load);
+    }
+
     return 0;
 }
 
@@ -2597,6 +2604,21 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
     mis->to_src_file = NULL;
     qemu_mutex_unlock(&mis->rp_mutex);
 
+    /*
+     * NOTE: this must happen before resetting the PostcopyTmpPages below;
+     * otherwise it's racy to reset those fields while the fast load thread
+     * may still be accessing them in parallel.
+     */
+    if (mis->postcopy_qemufile_dst) {
+        qemu_file_shutdown(mis->postcopy_qemufile_dst);
+        /* Take the mutex to make sure the fast ram load thread has halted */
+        qemu_mutex_lock(&mis->postcopy_prio_thread_mutex);
+        migration_ioc_unregister_yank_from_file(mis->postcopy_qemufile_dst);
+        qemu_fclose(mis->postcopy_qemufile_dst);
+        mis->postcopy_qemufile_dst = NULL;
+        qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
+    }
+
     migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
                       MIGRATION_STATUS_POSTCOPY_PAUSED);
 
@@ -2634,8 +2656,8 @@ retry:
     while (true) {
         section_type = qemu_get_byte(f);
 
-        if (qemu_file_get_error(f)) {
-            ret = qemu_file_get_error(f);
+        ret = qemu_file_get_error_obj_any(f, mis->postcopy_qemufile_dst, NULL);
+        if (ret) {
             break;
         }
 
diff --git a/migration/trace-events b/migration/trace-events
index 69f311169a..0e385c3a07 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -270,6 +270,8 @@ mark_postcopy_blocktime_begin(uint64_t addr, void *dd, uint32_t time, int cpu, i
 mark_postcopy_blocktime_end(uint64_t addr, void *dd, uint32_t time, int affected_cpu) "addr: 0x%" PRIx64 ", dd: %p, time: %u, affected_cpu: %d"
 postcopy_pause_fault_thread(void) ""
 postcopy_pause_fault_thread_continued(void) ""
+postcopy_pause_fast_load(void) ""
+postcopy_pause_fast_load_continued(void) ""
 postcopy_ram_fault_thread_entry(void) ""
 postcopy_ram_fault_thread_exit(void) ""
 postcopy_ram_fault_thread_fds_core(int baseufd, int quitfd) "ufd: %d quitfd: %d"
-- 
2.32.0
Re: [PATCH v5 13/21] migration: Postcopy recover with preempt enabled
Posted by manish.mishra 3 years, 8 months ago
On 26/04/22 5:08 am, Peter Xu wrote:
LGTM.
Peter, I wanted to give a review tag for this and the earlier patch too. I
am new to the qemu review process, so I am not sure how to give a review
tag, and did not find any reference on google either. Please let me know
how to do it.
> [...]
>
> +static void postcopy_pause_ram_fast_load(MigrationIncomingState *mis)
> +{
> +    trace_postcopy_pause_fast_load();
> +    qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);

I may have misunderstood the synchronisation here, but as both threads
work independently, is it possible (a very, very rare chance) that
qemu_sem_post on postcopy_pause_sem_fast_load is called by the main
thread even before we get to qemu_sem_wait on the next line, causing
some kind of deadlock? That should not be possible, as I understand it
requires manually calling qmp_migration_recover, so the chances are
almost nil. Just wanted to confirm it.

> +    qemu_sem_wait(&mis->postcopy_pause_sem_fast_load);

Just wanted to confirm why postcopy_pause_incoming is not called here
itself. Is it based on the assumption that if there is an error on any of
the channels, migration will eventually be paused on the source side,
closing both channels, so that postcopy_pause_incoming will be called
from the main thread on the destination? Usually it should be good to
call it as early as possible. Is it left to the main thread's default
path so that we do not have any synchronisation overhead?

Also, Peter, I was trying to understand the postcopy recovery model: is
the use case of qmp_migrate_pause just for debugging purposes?

> +    qemu_mutex_lock(&mis->postcopy_prio_thread_mutex);
> +    trace_postcopy_pause_fast_load_continued();
> +}
> [...]
Re: [PATCH v5 13/21] migration: Postcopy recover with preempt enabled
Posted by Peter Xu 3 years, 8 months ago
Hi, Manish,

On Mon, May 16, 2022 at 07:01:35PM +0530, manish.mishra wrote:
> 
> On 26/04/22 5:08 am, Peter Xu wrote:
> LGTM.
> Peter, I wanted to give a review tag for this and the earlier patch
> too. I am new to the qemu review process, so I am not sure how to give
> a review tag, and did not find any reference on google either. Please
> let me know how to do it.

It's here:

https://git.qemu.org/?p=qemu.git;a=blob;f=docs/devel/submitting-a-patch.rst;hb=HEAD#l492

Since afaict QEMU is mostly following what Linux does, you can also
reference to this one with more context:

https://www.kernel.org/doc/html/v4.17/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes

But since you're still having questions regarding this patch, no rush on
providing your R-bs; let's finish the discussion first.

[...]

> > +static void postcopy_pause_ram_fast_load(MigrationIncomingState *mis)
> > +{
> > +    trace_postcopy_pause_fast_load();
> > +    qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
> 
> I may have misunderstood the synchronisation here, but as both threads
> work independently, is it possible (a very, very rare chance) that
> qemu_sem_post on postcopy_pause_sem_fast_load is called by the main
> thread even before we get to qemu_sem_wait on the next line, causing
> some kind of deadlock? That should not be possible, as I understand it
> requires manually calling qmp_migration_recover, so the chances are
> almost nil. Just wanted to confirm it.

Sorry I don't quite get the question, could you elaborate?  E.g., when the
described deadlock happens, what are both the main thread and the preempt
load thread doing?  What are they waiting at?

Note: we have already released the mutex before waiting on the sem.

> 
> > +    qemu_sem_wait(&mis->postcopy_pause_sem_fast_load);
> 
> Just wanted to confirm why postcopy_pause_incoming is not called here
> itself.

postcopy_pause_incoming() is only used in the main ram load thread, while
this function (postcopy_pause_ram_fast_load) is only called by the preempt
load thread.

> 
> Is it based on the assumption that if there is an error on any of the
> channels, migration will eventually be paused on the source side,
> closing both channels, so that postcopy_pause_incoming will be called
> from the main thread on the destination?

Yes.

> 
> Usually it should be good to call it as early as possible. Is it left
> to the main thread's default path so that we do not have any
> synchronisation overhead?

What's the sync overhead you mentioned? What we want to do here is simply
to put all the dest QEMU migration threads into a halted state rather than
quitting, so that they can be continued when necessary.

> 
> Also, Peter, I was trying to understand the postcopy recovery model: is
> the use case of qmp_migrate_pause just for debugging purposes?

Yes.  It's also a way to cleanly stop using the network (compared to
force-unplugging the nic ports?) for whatever reason, with a shutdown()
syscall upon the socket.  I just don't know whether there's any real use
case of that in reality.
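
On a socket channel, qemu_file_shutdown() boils down to roughly the
following (a sketch with a made-up helper name, assuming the channel
wraps a plain socket fd):

    #include <sys/socket.h>

    /* Sketch: any thread blocked reading or writing sockfd gets an
     * error immediately, while the fd itself stays valid until it is
     * closed later under the proper lock. */
    static int stop_socket_io(int sockfd)
    {
        return shutdown(sockfd, SHUT_RDWR);
    }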

Thanks,

-- 
Peter Xu
Re: [PATCH v5 13/21] migration: Postcopy recover with preempt enabled
Posted by manish.mishra 3 years, 8 months ago
On 16/05/22 7:41 pm, Peter Xu wrote:
> Hi, Manish,
>
> On Mon, May 16, 2022 at 07:01:35PM +0530, manish.mishra wrote:
> [...]
>
> Sorry I don't quite get the question, could you elaborate?  E.g., when the
> described deadlock happens, what are both the main thread and the preempt
> load thread doing?  What are they waiting at?
>
> Note: we have already released the mutex before waiting on the sem.

What I meant here is that the deadlock could happen if we wait infinitely
on qemu_sem_wait(&mis->postcopy_pause_sem_fast_load), because the main
thread already called post on postcopy_pause_sem_fast_load after recovery,
even before we moved to qemu_sem_wait(&mis->postcopy_pause_sem_fast_load)
on the next line; basically, if we miss a post on
postcopy_pause_sem_fast_load. This is a nearly impossible case because it
requires the full recovery path to complete before this thread executes
just the next line, and recovery needs to be called manually. So please
ignore this.

Basically, I wanted to check if we should use something like

int pthread_cond_wait(pthread_cond_t *restrict cond,
                      pthread_mutex_t *restrict mutex);

so that there is no race between releasing the mutex and calling wait.
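
For illustration, a condvar-based pause of that shape could look like the
sketch below (made-up names, not the patch's code). Unlike a semaphore, a
condvar needs an explicit predicate, because a signal sent before the wait
is lost:

    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    static bool resumed;

    static void pause_fast_load(void)
    {
        pthread_mutex_lock(&lock);
        while (!resumed) {
            /* Atomically drops "lock" and sleeps; re-acquires on wakeup */
            pthread_cond_wait(&cond, &lock);
        }
        resumed = false;            /* consume the wakeup */
        pthread_mutex_unlock(&lock);
    }

    static void kick_fast_load(void)
    {
        pthread_mutex_lock(&lock);
        resumed = true;
        pthread_cond_signal(&cond); /* a no-op if nobody waits; hence the flag */
        pthread_mutex_unlock(&lock);
    }

The semaphore in the patch gets away without such a flag precisely because
a post is remembered, which is the point made further down the thread.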

>
>>> +    qemu_sem_wait(&mis->postcopy_pause_sem_fast_load);
>> Just wanted to confirm why postcopy_pause_incoming is not called here
>> itself.
> postcopy_pause_incoming() is only used in the main ram load thread, while
> this function (postcopy_pause_ram_fast_load) is only called by the preempt
> load thread.
>
Ok, got it, thanks Peter. I meant whether we should close both channels as
soon as we realise there is some failure, instead of the main thread
waiting for the error event and then closing and pausing postcopy. But I
agree the current approach is good.

Re: [PATCH v5 13/21] migration: Postcopy recover with preempt enabled
Posted by manish.mishra 3 years, 8 months ago
On 16/05/22 8:21 pm, manish.mishra wrote:
> [...]
>
> Basically, I wanted to check if we should use something like
>
> int pthread_cond_wait(pthread_cond_t *restrict cond,
>                       pthread_mutex_t *restrict mutex);
>
> so that there is no race between releasing the mutex and calling wait.
>
Really sorry, please ignore this: it is sem_post and sem_wait, so that is not possible.
Re: [PATCH v5 13/21] migration: Postcopy recover with preempt enabled
Posted by Peter Xu 3 years, 8 months ago
On Mon, May 16, 2022 at 08:21:23PM +0530, manish.mishra wrote:
> [...]
>
> 
> What I meant here is that the deadlock could happen if we wait
> infinitely on qemu_sem_wait(&mis->postcopy_pause_sem_fast_load),
> because the main thread already called post on
> postcopy_pause_sem_fast_load after recovery, even before we moved to
> qemu_sem_wait() on the next line; basically, if we miss a post on
> postcopy_pause_sem_fast_load. This is a nearly impossible case because
> it requires the full recovery path to complete before this thread
> executes just the next line, and recovery needs to be called manually.
> So please ignore this.

Yes the migration state has a dependency.

The other, more obvious reason it won't happen is that a semaphore is
counter based and it remembers posts.  Please try below:

    sem_t *sem = sem_open("/test", O_CREAT, 0644, 0);
    sem_post(sem);
    sem_wait(sem);

And sem_wait() will return immediately because post() already set it to 1.

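A self-contained version of that experiment, using an unnamed semaphore to
skip the sem_open() setup (illustrative only; build with cc -pthread):

    #include <semaphore.h>
    #include <stdio.h>

    int main(void)
    {
        sem_t sem;

        sem_init(&sem, 0, 0);   /* unnamed semaphore, initial value 0 */

        sem_post(&sem);         /* value becomes 1; the post is remembered */
        sem_wait(&sem);         /* returns immediately, value back to 0 */

        puts("sem_wait returned immediately after the earlier sem_post");
        sem_destroy(&sem);
        return 0;
    }
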
> 
> Basically, I wanted to check if we should use something like
>
> int pthread_cond_wait(pthread_cond_t *restrict cond,
>                       pthread_mutex_t *restrict mutex);
>
> so that there is no race between releasing the mutex and calling wait.

Yes, I think a condvar should also work here, but it would be stricter.

Thanks,

-- 
Peter Xu