[Qemu-devel] [PATCH 1/2] qemu-iotests: reduce chance of races in 185

Stefan Hajnoczi posted 2 patches 7 years, 5 months ago
There is a newer version of this series
[Qemu-devel] [PATCH 1/2] qemu-iotests: reduce chance of races in 185
Posted by Stefan Hajnoczi 7 years, 5 months ago
Commit 8565c3ab537e78f3e69977ec2c609dc9417a806e ("qemu-iotests: fix
185") identified a race condition in a sub-test.

Similar issues also affect the other sub-tests.  If disk I/O completes
quickly, it races with the QMP 'quit' command.  This causes spurious
test failures because QMP events are emitted in an unpredictable order.

This test relies on QEMU internals and there is no QMP API for getting
deterministic behavior needed to make this test 100% reliable.  At the
same time, the test is useful and it would be a shame to remove it.

Add sleep 0.5 to reduce the chance of races.  This is not a real fix but
appears to reduce spurious failures in practice.

Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/qemu-iotests/185     | 12 ++++++++++++
 tests/qemu-iotests/185.out |  4 ++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/185 b/tests/qemu-iotests/185
index 298d88d04e..975404c99d 100755
--- a/tests/qemu-iotests/185
+++ b/tests/qemu-iotests/185
@@ -118,6 +118,9 @@ _send_qemu_cmd $h \
                       'speed': 65536 } }" \
     "return"
 
+# If we don't sleep here 'quit' command races with disk I/O
+sleep 0.5
+
 _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
 wait=1 _cleanup_qemu
 
@@ -137,6 +140,9 @@ _send_qemu_cmd $h \
                       'speed': 65536 } }" \
     "return"
 
+# If we don't sleep here 'quit' command races with disk I/O
+sleep 0.5
+
 _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
 wait=1 _cleanup_qemu
 
@@ -183,6 +189,9 @@ _send_qemu_cmd $h \
                       'speed': 65536 } }" \
     "return"
 
+# If we don't sleep here 'quit' command races with disk I/O
+sleep 0.5
+
 _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
 wait=1 _cleanup_qemu
 
@@ -201,6 +210,9 @@ _send_qemu_cmd $h \
                       'speed': 65536 } }" \
     "return"
 
+# If we don't sleep here 'quit' command races with disk I/O
+sleep 0.5
+
 _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
 wait=1 _cleanup_qemu
 
diff --git a/tests/qemu-iotests/185.out b/tests/qemu-iotests/185.out
index 2c4b04de73..1f21e79c13 100644
--- a/tests/qemu-iotests/185.out
+++ b/tests/qemu-iotests/185.out
@@ -26,9 +26,9 @@ Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.q
 
 {"return": {}}
 {"return": {}}
-{"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "disk", "len": 4194304, "offset": 4194304, "speed": 65536, "type": "commit"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "disk", "len": 4194304, "offset": 4194304, "speed": 65536, "type": "commit"}}
 
 === Start mirror job and exit qemu ===
-- 
2.14.3


Re: [Qemu-devel] [PATCH 1/2] qemu-iotests: reduce chance of races in 185
Posted by Vladimir Sementsov-Ogievskiy 7 years, 5 months ago
04.05.2018 18:50, Stefan Hajnoczi wrote:
> Commit 8565c3ab537e78f3e69977ec2c609dc9417a806e ("qemu-iotests: fix
> 185") identified a race condition in a sub-test.
>
> Similar issues also affect the other sub-tests.  If disk I/O completes
> quickly, it races with the QMP 'quit' command.  This causes spurious
> test failures because QMP events are emitted in an unpredictable order.
>
> This test relies on QEMU internals and there is no QMP API for getting
> deterministic behavior needed to make this test 100% reliable.  At the
> same time, the test is useful and it would be a shame to remove it.
>
> Add sleep 0.5 to reduce the chance of races.  This is not a real fix but
> appears to reduce spurious failures in practice.
>
> Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   tests/qemu-iotests/185     | 12 ++++++++++++
>   tests/qemu-iotests/185.out |  4 ++--
>   2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/tests/qemu-iotests/185 b/tests/qemu-iotests/185
> index 298d88d04e..975404c99d 100755
> --- a/tests/qemu-iotests/185
> +++ b/tests/qemu-iotests/185
> @@ -118,6 +118,9 @@ _send_qemu_cmd $h \
>                         'speed': 65536 } }" \
>       "return"
>   
> +# If we don't sleep here 'quit' command races with disk I/O
> +sleep 0.5
> +
>   _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
>   wait=1 _cleanup_qemu
>   
> @@ -137,6 +140,9 @@ _send_qemu_cmd $h \
>                         'speed': 65536 } }" \
>       "return"
>   
> +# If we don't sleep here 'quit' command races with disk I/O
> +sleep 0.5
> +
>   _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
>   wait=1 _cleanup_qemu
>   
> @@ -183,6 +189,9 @@ _send_qemu_cmd $h \
>                         'speed': 65536 } }" \
>       "return"
>   
> +# If we don't sleep here 'quit' command races with disk I/O
> +sleep 0.5
> +
>   _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
>   wait=1 _cleanup_qemu
>   
> @@ -201,6 +210,9 @@ _send_qemu_cmd $h \
>                         'speed': 65536 } }" \
>       "return"
>   
> +# If we don't sleep here 'quit' command races with disk I/O
> +sleep 0.5
> +
>   _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
>   wait=1 _cleanup_qemu
>   
> diff --git a/tests/qemu-iotests/185.out b/tests/qemu-iotests/185.out
> index 2c4b04de73..1f21e79c13 100644
> --- a/tests/qemu-iotests/185.out
> +++ b/tests/qemu-iotests/185.out
> @@ -26,9 +26,9 @@ Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.q
>   
>   {"return": {}}
>   {"return": {}}
> -{"return": {}}
> -{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "disk", "len": 4194304, "offset": 4194304, "speed": 65536, "type": "commit"}}
> +{"return": {}}
> +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "disk", "len": 4194304, "offset": 4194304, "speed": 65536, "type": "commit"}}
>   
>   === Start mirror job and exit qemu ===

Hm, it looks strange, that commit was done with such a small speed in 
0.5 seconds...

-- 
Best regards,
Vladimir


Re: [Qemu-devel] [Qemu-block] [PATCH 1/2] qemu-iotests: reduce chance of races in 185
Posted by Stefan Hajnoczi 7 years, 5 months ago
On Fri, May 04, 2018 at 08:43:43PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 04.05.2018 18:50, Stefan Hajnoczi wrote:
> > Commit 8565c3ab537e78f3e69977ec2c609dc9417a806e ("qemu-iotests: fix
> > 185") identified a race condition in a sub-test.
> > 
> > Similar issues also affect the other sub-tests.  If disk I/O completes
> > quickly, it races with the QMP 'quit' command.  This causes spurious
> > test failures because QMP events are emitted in an unpredictable order.
> > 
> > This test relies on QEMU internals and there is no QMP API for getting
> > deterministic behavior needed to make this test 100% reliable.  At the
> > same time, the test is useful and it would be a shame to remove it.
> > 
> > Add sleep 0.5 to reduce the chance of races.  This is not a real fix but
> > appears to reduce spurious failures in practice.
> > 
> > Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >   tests/qemu-iotests/185     | 12 ++++++++++++
> >   tests/qemu-iotests/185.out |  4 ++--
> >   2 files changed, 14 insertions(+), 2 deletions(-)
> > 
> > diff --git a/tests/qemu-iotests/185 b/tests/qemu-iotests/185
> > index 298d88d04e..975404c99d 100755
> > --- a/tests/qemu-iotests/185
> > +++ b/tests/qemu-iotests/185
> > @@ -118,6 +118,9 @@ _send_qemu_cmd $h \
> >                         'speed': 65536 } }" \
> >       "return"
> > +# If we don't sleep here 'quit' command races with disk I/O
> > +sleep 0.5
> > +
> >   _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
> >   wait=1 _cleanup_qemu
> > @@ -137,6 +140,9 @@ _send_qemu_cmd $h \
> >                         'speed': 65536 } }" \
> >       "return"
> > +# If we don't sleep here 'quit' command races with disk I/O
> > +sleep 0.5
> > +
> >   _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
> >   wait=1 _cleanup_qemu
> > @@ -183,6 +189,9 @@ _send_qemu_cmd $h \
> >                         'speed': 65536 } }" \
> >       "return"
> > +# If we don't sleep here 'quit' command races with disk I/O
> > +sleep 0.5
> > +
> >   _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
> >   wait=1 _cleanup_qemu
> > @@ -201,6 +210,9 @@ _send_qemu_cmd $h \
> >                         'speed': 65536 } }" \
> >       "return"
> > +# If we don't sleep here 'quit' command races with disk I/O
> > +sleep 0.5
> > +
> >   _send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
> >   wait=1 _cleanup_qemu
> > diff --git a/tests/qemu-iotests/185.out b/tests/qemu-iotests/185.out
> > index 2c4b04de73..1f21e79c13 100644
> > --- a/tests/qemu-iotests/185.out
> > +++ b/tests/qemu-iotests/185.out
> > @@ -26,9 +26,9 @@ Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.q
> >   {"return": {}}
> >   {"return": {}}
> > -{"return": {}}
> > -{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
> >   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "disk", "len": 4194304, "offset": 4194304, "speed": 65536, "type": "commit"}}
> > +{"return": {}}
> > +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
> >   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "disk", "len": 4194304, "offset": 4194304, "speed": 65536, "type": "commit"}}
> >   === Start mirror job and exit qemu ===
> 
> Hm, it looks strange, that commit was done with such a small speed in 0.5
> seconds...

Rate-limiting is broken in the mirror job (qemu.git/master), so that's
why it completes within 0.5s.

A separate patch (already in Jeff's tree) fixes the rate-limiting issue.

I will rebase onto Jeff's tree and resend.  Thanks for pointing this
out!

Stefan