[v1] memory: Fix qemu crash on starting dirty log twice with stopped VM

[PATCH] memory: Fix qemu crash on starting dirty log twice with stopped VM

Posted by Peter Xu 4 years ago

QEMU can now easily crash with two continuous migration carried out:

(qemu) migrate -d exec:cat>out
(qemu) migrate_cancel
(qemu) migrate -d exec:cat>out
[crash] ../softmmu/memory.c:2782: memory_global_dirty_log_start: Assertion
`!(global_dirty_tracking & flags)' failed.

It's because memory API provides a way to postpone dirty log stop if the VM is
stopped, and that'll be re-done until the next VM start.  It was added in 2017
with commit 1931076077 ("migration: optimize the downtime", 2017-08-01).

However the recent work on allowing dirty tracking to be bitmask broke it,
which is commit 63b41db4bc ("memory: make global_dirty_tracking a bitmask",
2021-11-01).

The fix proposed in this patch contains two things:

  (1) Instead of passing over the flags to postpone stop dirty track, we add a
      global variable (along with current vmstate_change variable) to record
      what flags to stop dirty tracking.

  (2) When start dirty tracking, instead if remove the vmstate hook directly,
      we also execute the postponed stop process so that we make sure all the
      starts and stops will be paired.

This procedure is overlooked in the bitmask-ify work in 2021.

Actually with the postponed stop flags, a smarter thing to do is when start
dirty logging with specific flag, we can ignore the start if the flag is
contained in the "postponed to stop" flags, however that'll slightly complicate
the code, and it shouldn't be a major use case for QEMU.  Considering that this
should also copy stable, keep it simple while fixing the crash.

Cc: Hyman Huang <huangy81@chinatelecom.cn>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044818
Fixes: 63b41db4bc ("memory: make global_dirty_tracking a bitmask")
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 softmmu/memory.c | 53 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 38 insertions(+), 15 deletions(-)

diff --git a/softmmu/memory.c b/softmmu/memory.c
index 678dc62f06..cbb9b241ea 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -2790,16 +2790,25 @@ void memory_global_after_dirty_log_sync(void)
     MEMORY_LISTENER_CALL_GLOBAL(log_global_after_sync, Forward);
 }
 
+/*
+ * Dirty track stop flags that are postponed due to VM being stopped.  Should
+ * only be used within vmstate_change hook.
+ */
+static unsigned int postponed_stop_flags;
 static VMChangeStateEntry *vmstate_change;
+static void memory_global_dirty_log_stop_postponed_run(void);
 
 void memory_global_dirty_log_start(unsigned int flags)
 {
     unsigned int old_flags = global_dirty_tracking;
 
-    if (vmstate_change) {
-        qemu_del_vm_change_state_handler(vmstate_change);
-        vmstate_change = NULL;
-    }
+    /*
+     * If there is postponed dirty log stop flags, do it, so that start/stop
+     * will always be paired.  A smarter thing to do is ignore start process if
+     * the same flag has got postponed on stop, but it shouldn't matter a lot
+     * in major user scenarios, so keep the code simple for now.
+     */
+    memory_global_dirty_log_stop_postponed_run();
 
     assert(flags && !(flags & (~GLOBAL_DIRTY_MASK)));
     assert(!(global_dirty_tracking & flags));
@@ -2830,29 +2839,43 @@ static void memory_global_dirty_log_do_stop(unsigned int flags)
     }
 }
 
+/*
+ * Execute the postponed dirty log stop operations if there is, then reset
+ * everything (including the flags and the vmstate change hook).
+ */
+static void memory_global_dirty_log_stop_postponed_run(void)
+{
+    if (!vmstate_change) {
+        /* It means nothing is postponed */
+        return;
+    }
+
+    memory_global_dirty_log_do_stop(postponed_stop_flags);
+    postponed_stop_flags = 0;
+    qemu_del_vm_change_state_handler(vmstate_change);
+    vmstate_change = NULL;
+}
+
 static void memory_vm_change_state_handler(void *opaque, bool running,
                                            RunState state)
 {
-    unsigned int flags = (unsigned int)(uintptr_t)opaque;
     if (running) {
-        memory_global_dirty_log_do_stop(flags);
-
-        if (vmstate_change) {
-            qemu_del_vm_change_state_handler(vmstate_change);
-            vmstate_change = NULL;
-        }
+        memory_global_dirty_log_stop_postponed_run();
     }
 }
 
 void memory_global_dirty_log_stop(unsigned int flags)
 {
     if (!runstate_is_running()) {
+        /* Postpone the dirty log stop, e.g., to when VM starts again */
         if (vmstate_change) {
-            return;
+            /* Batch with previous postponed flags */
+            postponed_stop_flags |= flags;
+        } else {
+            postponed_stop_flags = flags;
+            vmstate_change = qemu_add_vm_change_state_handler(
+                memory_vm_change_state_handler, NULL);
         }
-        vmstate_change = qemu_add_vm_change_state_handler(
-                                memory_vm_change_state_handler,
-                                (void *)(uintptr_t)flags);
         return;
     }
 
-- 
2.32.0

Re: [PATCH] memory: Fix qemu crash on starting dirty log twice with stopped VM

Posted by Paolo Bonzini 3 years, 12 months ago

On 2/7/22 04:26, Peter Xu wrote:
> QEMU can now easily crash with two continuous migration carried out:
> 
> (qemu) migrate -d exec:cat>out
> (qemu) migrate_cancel
> (qemu) migrate -d exec:cat>out
> [crash] ../softmmu/memory.c:2782: memory_global_dirty_log_start: Assertion
> `!(global_dirty_tracking & flags)' failed.
> 
> It's because memory API provides a way to postpone dirty log stop if the VM is
> stopped, and that'll be re-done until the next VM start.  It was added in 2017
> with commit 1931076077 ("migration: optimize the downtime", 2017-08-01).
> 
> However the recent work on allowing dirty tracking to be bitmask broke it,
> which is commit 63b41db4bc ("memory: make global_dirty_tracking a bitmask",
> 2021-11-01).
> 
> The fix proposed in this patch contains two things:
> 
>    (1) Instead of passing over the flags to postpone stop dirty track, we add a
>        global variable (along with current vmstate_change variable) to record
>        what flags to stop dirty tracking.
> 
>    (2) When start dirty tracking, instead if remove the vmstate hook directly,
>        we also execute the postponed stop process so that we make sure all the
>        starts and stops will be paired.
> 
> This procedure is overlooked in the bitmask-ify work in 2021.
> 
> Actually with the postponed stop flags, a smarter thing to do is when start
> dirty logging with specific flag, we can ignore the start if the flag is
> contained in the "postponed to stop" flags, however that'll slightly complicate
> the code, and it shouldn't be a major use case for QEMU.  Considering that this
> should also copy stable, keep it simple while fixing the crash.
> 
> Cc: Hyman Huang <huangy81@chinatelecom.cn>
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044818
> Fixes: 63b41db4bc ("memory: make global_dirty_tracking a bitmask")
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   softmmu/memory.c | 53 ++++++++++++++++++++++++++++++++++--------------
>   1 file changed, 38 insertions(+), 15 deletions(-)
> 
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index 678dc62f06..cbb9b241ea 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -2790,16 +2790,25 @@ void memory_global_after_dirty_log_sync(void)
>       MEMORY_LISTENER_CALL_GLOBAL(log_global_after_sync, Forward);
>   }
>   
> +/*
> + * Dirty track stop flags that are postponed due to VM being stopped.  Should
> + * only be used within vmstate_change hook.
> + */
> +static unsigned int postponed_stop_flags;
>   static VMChangeStateEntry *vmstate_change;
> +static void memory_global_dirty_log_stop_postponed_run(void);
>   
>   void memory_global_dirty_log_start(unsigned int flags)
>   {
>       unsigned int old_flags = global_dirty_tracking;
>   
> -    if (vmstate_change) {
> -        qemu_del_vm_change_state_handler(vmstate_change);
> -        vmstate_change = NULL;
> -    }
> +    /*
> +     * If there is postponed dirty log stop flags, do it, so that start/stop
> +     * will always be paired.  A smarter thing to do is ignore start process if
> +     * the same flag has got postponed on stop, but it shouldn't matter a lot
> +     * in major user scenarios, so keep the code simple for now.
> +     */
> +    memory_global_dirty_log_stop_postponed_run();

I think this should be as easy as doing here:

     postponed_stop_flags &= ~flags;
     memory_global_dirty_log_stop_postponed_run();
     flags &= ~global_dirty_tracking_flags;
     if (!flags) {
         return;
     }

plus adding an "if (postponed_stop_flags)" in 
memory_global_dirty_log_stop_postponed_run?

Paolo

Re: [PATCH] memory: Fix qemu crash on starting dirty log twice with stopped VM

Posted by Peter Xu 3 years, 12 months ago

On Mon, Feb 07, 2022 at 10:08:44AM +0100, Paolo Bonzini wrote:
> >   void memory_global_dirty_log_start(unsigned int flags)
> >   {
> >       unsigned int old_flags = global_dirty_tracking;
> > -    if (vmstate_change) {
> > -        qemu_del_vm_change_state_handler(vmstate_change);
> > -        vmstate_change = NULL;
> > -    }
> > +    /*
> > +     * If there is postponed dirty log stop flags, do it, so that start/stop
> > +     * will always be paired.  A smarter thing to do is ignore start process if
> > +     * the same flag has got postponed on stop, but it shouldn't matter a lot
> > +     * in major user scenarios, so keep the code simple for now.
> > +     */
> > +    memory_global_dirty_log_stop_postponed_run();
> 
> I think this should be as easy as doing here:
> 
>     postponed_stop_flags &= ~flags;
>     memory_global_dirty_log_stop_postponed_run();
>     flags &= ~global_dirty_tracking_flags;
>     if (!flags) {
>         return;
>     }
> 
> plus adding an "if (postponed_stop_flags)" in
> memory_global_dirty_log_stop_postponed_run?

Yeah I can do.  Though the latter "if (!flags)" check will also start to allow
nesting of memory_global_dirty_log_start(), and it'll make this assert useless:

    assert(!(global_dirty_tracking & flags));

I'll probably drop it too, then.

Curious: do we have any real case of nesting calls of starting dirty log?  I
always thought there's none, but I could miss something.

I'll post a v2 soon.  Thanks for the quick review!

-- 
Peter Xu

Re: [PATCH] memory: Fix qemu crash on starting dirty log twice with stopped VM

Posted by Paolo Bonzini 3 years, 12 months ago

On 2/7/22 11:36, Peter Xu wrote:
> Yeah I can do.  Though the latter "if (!flags)" check will also start to allow
> nesting of memory_global_dirty_log_start(), and it'll make this assert useless:
> 
>      assert(!(global_dirty_tracking & flags));
> 
> I'll probably drop it too, then.
> 
> Curious: do we have any real case of nesting calls of starting dirty log?  I
> always thought there's none, but I could miss something.

I don't think so, but I think there's no disadvantage in allowing it.

Paolo