[Qemu-devel] [PATCH] qemu-iotests: fix 203 migration completion race

Stefan Hajnoczi posted 1 patch 7 years, 7 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20180305155926.25858-1-stefanha@redhat.com
Test checkpatch passed
Test docker-build@min-glib passed
Test docker-mingw@fedora passed
Test docker-quick@centos6 passed
Test ppcbe passed
Test ppcle passed
Test s390x passed
tests/qemu-iotests/203     | 15 +++++++++++----
tests/qemu-iotests/203.out |  5 +++++
2 files changed, 16 insertions(+), 4 deletions(-)
[Qemu-devel] [PATCH] qemu-iotests: fix 203 migration completion race
Posted by Stefan Hajnoczi 7 years, 7 months ago
There is a race between the test's 'query-migrate' QMP command after the
QMP 'STOP' event and completing the migration:

The test case invokes 'query-migrate' upon receiving 'STOP'.  At this
point the migration thread may still be in the process of completing.
Therefore 'query-migrate' can return 'status': 'active' for a brief
window of time instead of 'status': 'completed'.  This results in
qemu-iotests 203 hanging.

Solve the race by enabling the 'events' migration capability, which
causes QEMU to emit migration-specific QMP events that do not suffer
from this race condition.  Wait for the QMP 'MIGRATION' event with
'status': 'completed'.

Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/qemu-iotests/203     | 15 +++++++++++----
 tests/qemu-iotests/203.out |  5 +++++
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/203 b/tests/qemu-iotests/203
index 2c811917d8..4874a1a0d8 100755
--- a/tests/qemu-iotests/203
+++ b/tests/qemu-iotests/203
@@ -49,11 +49,18 @@ with iotests.FilePath('disk0.img') as disk0_img_path, \
                        node_name='drive1-node', iothread='iothread0',
                        force=True))
 
+    iotests.log('Enabling migration QMP events...')
+    iotests.log(vm.qmp('migrate-set-capabilities', capabilities=[
+        {
+            'capability': 'events',
+            'state': True
+        }
+    ]))
+
     iotests.log('Starting migration...')
     iotests.log(vm.qmp('migrate', uri='exec:cat >/dev/null'))
     while True:
-        vm.get_qmp_event(wait=60.0)
-        result = vm.qmp('query-migrate')
-        status = result.get('return', {}).get('status', None)
-        if status == 'completed':
+        event = vm.event_wait('MIGRATION')
+        iotests.log(event, filters=[iotests.filter_qmp_event])
+        if event['data']['status'] == 'completed':
             break
diff --git a/tests/qemu-iotests/203.out b/tests/qemu-iotests/203.out
index 3f1ff900e4..1a11f0975c 100644
--- a/tests/qemu-iotests/203.out
+++ b/tests/qemu-iotests/203.out
@@ -2,5 +2,10 @@ Launching VM...
 Setting IOThreads...
 {u'return': {}}
 {u'return': {}}
+Enabling migration QMP events...
+{u'return': {}}
 Starting migration...
 {u'return': {}}
+{u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'setup'}, u'event': u'MIGRATION'}
+{u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'active'}, u'event': u'MIGRATION'}
+{u'timestamp': {u'seconds': 'SECS', u'microseconds': 'USECS'}, u'data': {u'status': u'completed'}, u'event': u'MIGRATION'}
-- 
2.14.3


Re: [Qemu-devel] [PATCH] qemu-iotests: fix 203 migration completion race
Posted by Max Reitz 7 years, 7 months ago
On 2018-03-05 16:59, Stefan Hajnoczi wrote:
> There is a race between the test's 'query-migrate' QMP command after the
> QMP 'STOP' event and completing the migration:
> 
> The test case invokes 'query-migrate' upon receiving 'STOP'.  At this
> point the migration thread may still be in the process of completing.
> Therefore 'query-migrate' can return 'status': 'active' for a brief
> window of time instead of 'status': 'completed'.  This results in
> qemu-iotests 203 hanging.
> 
> Solve the race by enabling the 'events' migration capability, which
> causes QEMU to emit migration-specific QMP events that do not suffer
> from this race condition.  Wait for the QMP 'MIGRATION' event with
> 'status': 'completed'.
> 
> Reported-by: Max Reitz <mreitz@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tests/qemu-iotests/203     | 15 +++++++++++----
>  tests/qemu-iotests/203.out |  5 +++++
>  2 files changed, 16 insertions(+), 4 deletions(-)

So much for "the ppoll() dungeon"...

Thanks!

Reviewed-by: Max Reitz <mreitz@redhat.com>

Re: [Qemu-devel] [Qemu-block] [PATCH] qemu-iotests: fix 203 migration completion race
Posted by Stefan Hajnoczi 7 years, 7 months ago
On Mon, Mar 05, 2018 at 05:04:52PM +0100, Max Reitz wrote:
> On 2018-03-05 16:59, Stefan Hajnoczi wrote:
> > There is a race between the test's 'query-migrate' QMP command after the
> > QMP 'STOP' event and completing the migration:
> > 
> > The test case invokes 'query-migrate' upon receiving 'STOP'.  At this
> > point the migration thread may still be in the process of completing.
> > Therefore 'query-migrate' can return 'status': 'active' for a brief
> > window of time instead of 'status': 'completed'.  This results in
> > qemu-iotests 203 hanging.
> > 
> > Solve the race by enabling the 'events' migration capability, which
> > causes QEMU to emit migration-specific QMP events that do not suffer
> > from this race condition.  Wait for the QMP 'MIGRATION' event with
> > 'status': 'completed'.
> > 
> > Reported-by: Max Reitz <mreitz@redhat.com>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  tests/qemu-iotests/203     | 15 +++++++++++----
> >  tests/qemu-iotests/203.out |  5 +++++
> >  2 files changed, 16 insertions(+), 4 deletions(-)
> 
> So much for "the ppoll() dungeon"...

It was still a pain to debug :).

I put a ring buffer into the QMP monitor input/output code.  Then it was
possible to figure out the issue via GDB on a hung QEMU:

  (gdb) p current_run_state
  RUN_STATE_POSTMIGRATE
  (gdb) p current_migration->status
  MIGRATION_STATUS_COMPLETED
  (gdb) p monitor_out_ring
  ...'STOP' event...
  (gdb) p monitor_in_ring
  ...query-migrate...  <-- okay, the test checked if migration finished

Then looking at the code:

  static void migration_completion(MigrationState *s)
  {
      ...
      if (s->state == MIGRATION_STATUS_ACTIVE) {
          qemu_mutex_lock_iothread();
          s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
          qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
          s->vm_was_running = runstate_is_running();
          ret = global_state_store();

          if (!ret) {
              bool inactivate = !migrate_colo_enabled();

	        v---- The stop event comes from here
              ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
	      ...
          }
          qemu_mutex_unlock_iothread(); <--- oh, no!
      ...
      if (!migrate_colo_enabled()) {
          migrate_set_state(&s->state, current_active_state,
                            MIGRATION_STATUS_COMPLETED); <-- too late!
      }

      return;
Re: [Qemu-devel] [Qemu-block] [PATCH] qemu-iotests: fix 203 migration completion race
Posted by Max Reitz 7 years, 7 months ago
On 2018-03-06 17:18, Stefan Hajnoczi wrote:
> On Mon, Mar 05, 2018 at 05:04:52PM +0100, Max Reitz wrote:
>> On 2018-03-05 16:59, Stefan Hajnoczi wrote:
>>> There is a race between the test's 'query-migrate' QMP command after the
>>> QMP 'STOP' event and completing the migration:
>>>
>>> The test case invokes 'query-migrate' upon receiving 'STOP'.  At this
>>> point the migration thread may still be in the process of completing.
>>> Therefore 'query-migrate' can return 'status': 'active' for a brief
>>> window of time instead of 'status': 'completed'.  This results in
>>> qemu-iotests 203 hanging.
>>>
>>> Solve the race by enabling the 'events' migration capability, which
>>> causes QEMU to emit migration-specific QMP events that do not suffer
>>> from this race condition.  Wait for the QMP 'MIGRATION' event with
>>> 'status': 'completed'.
>>>
>>> Reported-by: Max Reitz <mreitz@redhat.com>
>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>> ---
>>>  tests/qemu-iotests/203     | 15 +++++++++++----
>>>  tests/qemu-iotests/203.out |  5 +++++
>>>  2 files changed, 16 insertions(+), 4 deletions(-)
>>
>> So much for "the ppoll() dungeon"...
> 
> It was still a pain to debug :).
> 
> I put a ring buffer into the QMP monitor input/output code.

Oh, wow.

>                                                              Then it was
> possible to figure out the issue via GDB on a hung QEMU:
> 
>   (gdb) p current_run_state
>   RUN_STATE_POSTMIGRATE
>   (gdb) p current_migration->status
>   MIGRATION_STATUS_COMPLETED
>   (gdb) p monitor_out_ring
>   ...'STOP' event...
>   (gdb) p monitor_in_ring
>   ...query-migrate...  <-- okay, the test checked if migration finished
> 
> Then looking at the code:
> 
>   static void migration_completion(MigrationState *s)
>   {
>       ...
>       if (s->state == MIGRATION_STATUS_ACTIVE) {
>           qemu_mutex_lock_iothread();
>           s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>           qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
>           s->vm_was_running = runstate_is_running();
>           ret = global_state_store();
> 
>           if (!ret) {
>               bool inactivate = !migrate_colo_enabled();
> 
> 	        v---- The stop event comes from here
>               ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> 	      ...
>           }
>           qemu_mutex_unlock_iothread(); <--- oh, no!
>       ...
>       if (!migrate_colo_enabled()) {
>           migrate_set_state(&s->state, current_active_state,
>                             MIGRATION_STATUS_COMPLETED); <-- too late!
>       }
> 
>       return;

OK...  I guess the answer to this just is "the stop event doesn't mean
anything, use the migration events instead" (i.e. what your patch does).

Thanks a lot, applied to my block branch:

https://github.com/XanClic/qemu/commits/block

Max