[Qemu-devel] [PATCH] migration: Fix race of image locking between src and dst
Posted by Fam Zheng 6 years, 9 months ago
Previously, the dst side would immediately try to lock the write byte upon
receiving QEMU_VM_EOF, but the src side only does bdrv_inactivate_all()
after sending it. If the src host is under load, the dst may fail to
acquire the lock due to racing with the src unlocking it.

Fix this by hoisting the bdrv_inactivate_all() operation to before
QEMU_VM_EOF is sent.

N.B. A further improvement could be to cleanly hand over the locks between
src and dst, so that there is no window where a third QEMU could steal the
locks and prevent src and dst from running.
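
To make the ordering concrete, below is a minimal, self-contained sketch in
plain POSIX C (not QEMU code): the temporary file image.tmp stands in for the
image's locked write byte and a pipe byte stands in for QEMU_VM_EOF, and all
names in it are illustrative. Swapping the unlock and the pipe write in the
parent reopens exactly the window described above.

/*
 * Illustrative sketch only: src (parent) must drop its write lock before
 * signalling "stream complete" to dst (child), otherwise dst's lock
 * attempt can race with the release.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static void set_write_lock(int fd, int lock)
{
    /* Lock or unlock byte 0, standing in for the image write-byte lock. */
    struct flock fl = {
        .l_type = lock ? F_WRLCK : F_UNLCK,
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 1,
    };

    if (fcntl(fd, F_SETLK, &fl) < 0) {
        perror(lock ? "lock" : "unlock");
        exit(EXIT_FAILURE);
    }
}

int main(void)
{
    int pipefd[2];
    int fd = open("image.tmp", O_RDWR | O_CREAT, 0600);

    if (fd < 0 || pipe(pipefd) < 0) {
        perror("setup");
        return EXIT_FAILURE;
    }

    set_write_lock(fd, 1);                  /* src owns the image */

    if (fork() == 0) {                      /* child plays the dst side */
        char c;
        int dst_fd = open("image.tmp", O_RDWR);

        if (dst_fd < 0 || read(pipefd[0], &c, 1) != 1) {
            return EXIT_FAILURE;
        }
        set_write_lock(dst_fd, 1);          /* succeeds: src already unlocked */
        printf("dst: lock acquired\n");
        return EXIT_SUCCESS;
    }

    /*
     * Fixed ordering: release the lock first, then send "QEMU_VM_EOF".
     * Doing these two steps in the other order reintroduces the race.
     */
    set_write_lock(fd, 0);
    if (write(pipefd[1], "E", 1) != 1) {
        return EXIT_FAILURE;
    }

    wait(NULL);
    return EXIT_SUCCESS;
}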

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
---
 migration/colo.c      |  2 +-
 migration/migration.c | 19 +++++++------------
 migration/savevm.c    | 19 +++++++++++++++----
 migration/savevm.h    |  3 ++-
 4 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index c436d63..c4ba4c3 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -352,7 +352,7 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
     qemu_savevm_state_header(fb);
     qemu_savevm_state_begin(fb);
     qemu_mutex_lock_iothread();
-    qemu_savevm_state_complete_precopy(fb, false);
+    qemu_savevm_state_complete_precopy(fb, false, false);
     qemu_mutex_unlock_iothread();
 
     qemu_fflush(fb);
diff --git a/migration/migration.c b/migration/migration.c
index b9d8798..f588329 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1553,7 +1553,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
      * Cause any non-postcopiable, but iterative devices to
      * send out their final data.
      */
-    qemu_savevm_state_complete_precopy(ms->to_dst_file, true);
+    qemu_savevm_state_complete_precopy(ms->to_dst_file, true, false);
 
     /*
      * in Finish migrate and with the io-lock held everything should
@@ -1597,7 +1597,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
      */
     qemu_savevm_send_postcopy_listen(fb);
 
-    qemu_savevm_state_complete_precopy(fb, false);
+    qemu_savevm_state_complete_precopy(fb, false, false);
     qemu_savevm_send_ping(fb, 3);
 
     qemu_savevm_send_postcopy_run(fb);
@@ -1695,20 +1695,15 @@ static void migration_completion(MigrationState *s, int current_active_state,
         ret = global_state_store();
 
         if (!ret) {
+            bool inactivate = !migrate_colo_enabled();
             ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
             if (ret >= 0) {
                 qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
-                qemu_savevm_state_complete_precopy(s->to_dst_file, false);
+                ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
+                                                         inactivate);
             }
-            /*
-             * Don't mark the image with BDRV_O_INACTIVE flag if
-             * we will go into COLO stage later.
-             */
-            if (ret >= 0 && !migrate_colo_enabled()) {
-                ret = bdrv_inactivate_all();
-                if (ret >= 0) {
-                    s->block_inactive = true;
-                }
+            if (inactivate && ret >= 0) {
+                s->block_inactive = true;
             }
         }
         qemu_mutex_unlock_iothread();
diff --git a/migration/savevm.c b/migration/savevm.c
index f32a82d..6bfd489 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1104,7 +1104,8 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f)
     qemu_fflush(f);
 }
 
-void qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
+int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
+                                       bool inactivate_disks)
 {
     QJSON *vmdesc;
     int vmdesc_len;
@@ -1138,12 +1139,12 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
         save_section_footer(f, se);
         if (ret < 0) {
             qemu_file_set_error(f, ret);
-            return;
+            return -1;
         }
     }
 
     if (iterable_only) {
-        return;
+        return 0;
     }
 
     vmdesc = qjson_new();
@@ -1173,6 +1174,15 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
         json_end_object(vmdesc);
     }
 
+    if (inactivate_disks) {
+        /* Inactivate before sending QEMU_VM_EOF so that the
+         * bdrv_invalidate_cache_all() on the other end won't fail. */
+        ret = bdrv_inactivate_all();
+        if (ret) {
+            qemu_file_set_error(f, ret);
+            return ret;
+        }
+    }
     if (!in_postcopy) {
         /* Postcopy stream will still be going */
         qemu_put_byte(f, QEMU_VM_EOF);
@@ -1190,6 +1200,7 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
     qjson_destroy(vmdesc);
 
     qemu_fflush(f);
+    return 0;
 }
 
 /* Give an estimate of the amount left to be transferred,
@@ -1263,7 +1274,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
 
     ret = qemu_file_get_error(f);
     if (ret == 0) {
-        qemu_savevm_state_complete_precopy(f, false);
+        qemu_savevm_state_complete_precopy(f, false, false);
         ret = qemu_file_get_error(f);
     }
     qemu_savevm_state_cleanup();
diff --git a/migration/savevm.h b/migration/savevm.h
index 45b59c1..5a2ed11 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -35,7 +35,8 @@ void qemu_savevm_state_header(QEMUFile *f);
 int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy);
 void qemu_savevm_state_cleanup(void);
 void qemu_savevm_state_complete_postcopy(QEMUFile *f);
-void qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only);
+int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
+                                       bool inactivate_disks);
 void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
                                uint64_t *res_non_postcopiable,
                                uint64_t *res_postcopiable);
-- 
2.9.4


Re: [Qemu-devel] [PATCH] migration: Fix race of image locking between src and dst
Posted by Daniel P. Berrange 6 years, 9 months ago
On Sat, Jun 17, 2017 at 12:06:58AM +0800, Fam Zheng wrote:
> Previously, the dst side would immediately try to lock the write byte upon
> receiving QEMU_VM_EOF, but the src side only does bdrv_inactivate_all()
> after sending it. If the src host is under load, the dst may fail to
> acquire the lock due to racing with the src unlocking it.
>
> Fix this by hoisting the bdrv_inactivate_all() operation to before
> QEMU_VM_EOF is sent.
>
> N.B. A further improvement could be to cleanly hand over the locks between
> src and dst, so that there is no window where a third QEMU could steal the
> locks and prevent src and dst from running.
> 
> Reported-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  migration/colo.c      |  2 +-
>  migration/migration.c | 19 +++++++------------
>  migration/savevm.c    | 19 +++++++++++++++----
>  migration/savevm.h    |  3 ++-
>  4 files changed, 25 insertions(+), 18 deletions(-)

[snip]

> @@ -1695,20 +1695,15 @@ static void migration_completion(MigrationState *s, int current_active_state,
>          ret = global_state_store();
>  
>          if (!ret) {
> +            bool inactivate = !migrate_colo_enabled();
>              ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>              if (ret >= 0) {
>                  qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
> -                qemu_savevm_state_complete_precopy(s->to_dst_file, false);
> +                ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
> +                                                         inactivate);
>              }
> -            /*
> -             * Don't mark the image with BDRV_O_INACTIVE flag if
> -             * we will go into COLO stage later.
> -             */
> -            if (ret >= 0 && !migrate_colo_enabled()) {
> -                ret = bdrv_inactivate_all();
> -                if (ret >= 0) {
> -                    s->block_inactive = true;
> -                }
> +            if (inactivate && ret >= 0) {
> +                s->block_inactive = true;
>              }
>          }
>          qemu_mutex_unlock_iothread();

[snip]

> @@ -1173,6 +1174,15 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
>          json_end_object(vmdesc);
>      }
>  
> +    if (inactivate_disks) {
> +        /* Inactivate before sending QEMU_VM_EOF so that the
> +         * bdrv_invalidate_cache_all() on the other end won't fail. */
> +        ret = bdrv_inactivate_all();
> +        if (ret) {
> +            qemu_file_set_error(f, ret);
> +            return ret;
> +        }
> +    }

IIUC as well as fixing the race condition, you're also improving
error reporting by using qemu_file_set_error() which was not done
previously. Would be nice to mention that in the commit message
too if you respin for any other reason, but that's just a nit-pick
so

  Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
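
For readers who have not seen this pattern, here is a tiny self-contained
stand-in (not the real QEMUFile API; every name in it is illustrative) for
what latching the error on the stream buys: a failure deep in the completion
path is recorded once and then surfaces at the caller's final check, the same
role qemu_file_get_error() plays after qemu_savevm_state_complete_precopy()
in the patch.

#include <errno.h>
#include <stdio.h>

/* Stand-in for the migration stream object and its latched error state. */
typedef struct Stream {
    int last_error;             /* 0, or the first negative errno recorded */
} Stream;

static void stream_set_error(Stream *s, int err)
{
    if (s->last_error == 0 && err < 0) {
        s->last_error = err;    /* keep only the first failure */
    }
}

static int stream_get_error(const Stream *s)
{
    return s->last_error;
}

/* Deep in the save path: a step fails and is latched on the stream,
 * analogous to the qemu_file_set_error() call in the hunk above. */
static int complete_precopy(Stream *s, int inactivate_disks)
{
    if (inactivate_disks) {
        int ret = -EBUSY;       /* pretend bdrv_inactivate_all() failed */

        if (ret < 0) {
            stream_set_error(s, ret);
            return ret;
        }
    }
    return 0;
}

int main(void)
{
    Stream s = { 0 };

    complete_precopy(&s, 1);

    /* Caller-side check: the earlier failure is not silently lost. */
    if (stream_get_error(&s) < 0) {
        fprintf(stderr, "stream error: %d\n", stream_get_error(&s));
        return 1;
    }
    return 0;
}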

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH] migration: Fix race of image locking between src and dst
Posted by Peter Maydell 6 years, 9 months ago
On 19 June 2017 at 15:49, Daniel P. Berrange <berrange@redhat.com> wrote:
> IIUC as well as fixing the race condition, you're also improving
> error reporting by using qemu_file_set_error() which was not done
> previously. Would be nice to mention that in the commit message
> too if you respin for any other reason, but that's just a nit-pick

I've added
    N.B. This commit includes a minor improvement to the error handling
    by using qemu_file_set_error().

to the commit message -- is that OK or do you want to suggest a
different wording?

thanks
-- PMM

Re: [Qemu-devel] [PATCH] migration: Fix race of image locking between src and dst
Posted by Daniel P. Berrange 6 years, 9 months ago
On Mon, Jun 19, 2017 at 04:27:53PM +0100, Peter Maydell wrote:
> On 19 June 2017 at 15:49, Daniel P. Berrange <berrange@redhat.com> wrote:
> > IIUC as well as fixing the race condition, you're also improving
> > error reporting by using qemu_file_set_error() which was not done
> > previously. Would be nice to mention that in the commit message
> > too if you respin for any other reason, but that's just a nit-pick
> 
> I've added
>     N.B. This commit includes a minor improvement to the error handling
>     by using qemu_file_set_error().
> 
> to the commit message -- is that OK or do you want to suggest a
> different wording?

That's fine with me.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH] migration: Fix race of image locking between src and dst
Posted by Juan Quintela 6 years, 9 months ago
Fam Zheng <famz@redhat.com> wrote:
> Previously, the dst side would immediately try to lock the write byte upon
> receiving QEMU_VM_EOF, but the src side only does bdrv_inactivate_all()
> after sending it. If the src host is under load, the dst may fail to
> acquire the lock due to racing with the src unlocking it.
>
> Fix this by hoisting the bdrv_inactivate_all() operation to before
> QEMU_VM_EOF is sent.
>
> N.B. A further improvement could be to cleanly hand over the locks between
> src and dst, so that there is no window where a third QEMU could steal the
> locks and prevent src and dst from running.
>
> Reported-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Fam Zheng <famz@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

Re: [Qemu-devel] [PATCH] migration: Fix race of image locking between src and dst
Posted by Peter Maydell 6 years, 9 months ago
On 19 June 2017 at 16:26, Juan Quintela <quintela@redhat.com> wrote:
> Fam Zheng <famz@redhat.com> wrote:
>> Previously, the dst side would immediately try to lock the write byte
>> upon receiving QEMU_VM_EOF, but the src side only does
>> bdrv_inactivate_all() after sending it. If the src host is under load,
>> the dst may fail to acquire the lock due to racing with the src
>> unlocking it.
>>
>> Fix this by hoisting the bdrv_inactivate_all() operation to before
>> QEMU_VM_EOF is sent.
>>
>> N.B. A further improvement could be to cleanly hand over the locks
>> between src and dst, so that there is no window where a third QEMU
>> could steal the locks and prevent src and dst from running.
>>
>> Reported-by: Peter Maydell <peter.maydell@linaro.org>
>> Signed-off-by: Fam Zheng <famz@redhat.com>
>
> Reviewed-by: Juan Quintela <quintela@redhat.com>

Applied to master, thanks all.

-- PMM