[Qemu-devel] [PATCH] iotests: Fix intermittent failure in 219

Max Reitz posted 1 patch 4 years, 11 months ago
Test asan passed
Test docker-mingw@fedora passed
Test docker-clang@ubuntu passed
Test checkpatch passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20190516161114.27596-1-mreitz@redhat.com
Maintainers: Max Reitz <mreitz@redhat.com>, Kevin Wolf <kwolf@redhat.com>
tests/qemu-iotests/219 | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
[Qemu-devel] [PATCH] iotests: Fix intermittent failure in 219
Posted by Max Reitz 4 years, 11 months ago
In 219, we wait for the job to make progress before we emit its status.
This makes the output reliable.

Unfortunately, there is a bug: We do not wait for any more progress if
the job has already reached its total-progress.  Right after the job has
been started, it is possible that total-progress is still 0, though.  In
that case, we may skip the first progress-making step and keep ending up
64 kB short.

To fix that bug, we cab simply wait for total-progress to reach 4 MB
(the image size) after starting the job.

Reported-by: Karen Mezick <kmezick@redhat.com>
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1686651
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/219 | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/219 b/tests/qemu-iotests/219
index c03bbdb294..e0c51662c0 100755
--- a/tests/qemu-iotests/219
+++ b/tests/qemu-iotests/219
@@ -23,6 +23,8 @@ import iotests
 
 iotests.verify_image_format(supported_fmts=['qcow2'])
 
+img_size = 4 * 1024 * 1024
+
 def pause_wait(vm, job_id):
     with iotests.Timeout(3, "Timeout waiting for job to pause"):
         while True:
@@ -62,6 +64,8 @@ def test_pause_resume(vm):
                 iotests.log(vm.qmp('query-jobs'))
 
 def test_job_lifecycle(vm, job, job_args, has_ready=False):
+    global img_size
+
     iotests.log('')
     iotests.log('')
     iotests.log('Starting block job: %s (auto-finalize: %s; auto-dismiss: %s)' %
@@ -84,6 +88,10 @@ def test_job_lifecycle(vm, job, job_args, has_ready=False):
     iotests.log(iotests.filter_qmp_event(vm.event_wait('JOB_STATUS_CHANGE')))
     iotests.log(iotests.filter_qmp_event(vm.event_wait('JOB_STATUS_CHANGE')))
 
+    # Wait for total-progress to stabilize
+    while vm.qmp('query-jobs')['return'][0]['total-progress'] < img_size:
+        pass
+
     # RUNNING state:
     # pause/resume should work, complete/finalize/dismiss should error out
     iotests.log('')
@@ -173,9 +181,8 @@ with iotests.FilePath('disk.img') as disk_path, \
      iotests.FilePath('copy.img') as copy_path, \
      iotests.VM() as vm:
 
-    img_size = '4M'
-    iotests.qemu_img_create('-f', iotests.imgfmt, disk_path, img_size)
-    iotests.qemu_io('-c', 'write 0 %s' % (img_size),
+    iotests.qemu_img_create('-f', iotests.imgfmt, disk_path, str(img_size))
+    iotests.qemu_io('-c', 'write 0 %i' % (img_size),
                     '-f', iotests.imgfmt, disk_path)
 
     iotests.log('Launching VM...')
-- 
2.21.0


Re: [Qemu-devel] [Qemu-block] [PATCH] iotests: Fix intermittent failure in 219
Posted by John Snow 4 years, 11 months ago

On 5/16/19 12:11 PM, Max Reitz wrote:
> In 219, we wait for the job to make progress before we emit its status.
> This makes the output reliable.
> 
> Unfortunately, there is a bug: We do not wait for any more progress if
> the job has already reached its total-progress.  Right after the job has
> been started, it is possible that total-progress is still 0, though.  In
> that case, we may skip the first progress-making step and keep ending up
> 64 kB short.
> 

Oh, this took a while to understand.

The bug is that when the job has started, its current progress is the
same as its total progress, 0, which we might confuse as as being
sufficiently done.

You avoid this by forcing the job to update its total progress at least
once before proceeding.

We do this checking pre-patch in test_pause_resume, not excerpted below
in the diffstat.

I might recommend:

"In 219, we wait for the job to make progress before we emit its status.
This makes the output reliable. We do not wait for any more progress if
the job's current-progress already matches its total-progress.

Unfortunately, there is a bug: Right after the job has been started,
it's possible that total-progress is still 0."

But, well, that's just like, my opinion, man.

> To fix that bug, we cab simply wait for total-progress to reach 4 MB
> (the image size) after starting the job.
> 
> Reported-by: Karen Mezick <kmezick@redhat.com>
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1686651
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tests/qemu-iotests/219 | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/qemu-iotests/219 b/tests/qemu-iotests/219
> index c03bbdb294..e0c51662c0 100755
> --- a/tests/qemu-iotests/219
> +++ b/tests/qemu-iotests/219
> @@ -23,6 +23,8 @@ import iotests
>  
>  iotests.verify_image_format(supported_fmts=['qcow2'])
>  
> +img_size = 4 * 1024 * 1024
> +
>  def pause_wait(vm, job_id):
>      with iotests.Timeout(3, "Timeout waiting for job to pause"):
>          while True:
> @@ -62,6 +64,8 @@ def test_pause_resume(vm):
>                  iotests.log(vm.qmp('query-jobs'))
>  
>  def test_job_lifecycle(vm, job, job_args, has_ready=False):
> +    global img_size
> +

You hate to see it happen, folks

>      iotests.log('')
>      iotests.log('')
>      iotests.log('Starting block job: %s (auto-finalize: %s; auto-dismiss: %s)' %
> @@ -84,6 +88,10 @@ def test_job_lifecycle(vm, job, job_args, has_ready=False):
>      iotests.log(iotests.filter_qmp_event(vm.event_wait('JOB_STATUS_CHANGE')))
>      iotests.log(iotests.filter_qmp_event(vm.event_wait('JOB_STATUS_CHANGE')))
>  

In my recent writing of tests, I have come to be quite afraid of naked
waits on JOB_STATUS_CHANGE without actually waiting on a specific one. I
suppose it's fine because we log it, so we will eventually see what went
wrong, but ...

Ehm, just a style thing. Not related to this patch.

> +    # Wait for total-progress to stabilize
> +    while vm.qmp('query-jobs')['return'][0]['total-progress'] < img_size:
> +        pass
> +
>      # RUNNING state:
>      # pause/resume should work, complete/finalize/dismiss should error out
>      iotests.log('')
> @@ -173,9 +181,8 @@ with iotests.FilePath('disk.img') as disk_path, \
>       iotests.FilePath('copy.img') as copy_path, \
>       iotests.VM() as vm:
>  
> -    img_size = '4M'
> -    iotests.qemu_img_create('-f', iotests.imgfmt, disk_path, img_size)
> -    iotests.qemu_io('-c', 'write 0 %s' % (img_size),
> +    iotests.qemu_img_create('-f', iotests.imgfmt, disk_path, str(img_size))
> +    iotests.qemu_io('-c', 'write 0 %i' % (img_size),
>                      '-f', iotests.imgfmt, disk_path)
>  
>      iotests.log('Launching VM...')
> 

Regardless of my spurious noisemaking:

Reviewed-by: John Snow <jsnow@redhat.com>

Re: [Qemu-devel] [PATCH] iotests: Fix intermittent failure in 219
Posted by Max Reitz 4 years, 11 months ago
On 16.05.19 18:11, Max Reitz wrote:
> In 219, we wait for the job to make progress before we emit its status.
> This makes the output reliable.
> 
> Unfortunately, there is a bug: We do not wait for any more progress if
> the job has already reached its total-progress.  Right after the job has
> been started, it is possible that total-progress is still 0, though.  In
> that case, we may skip the first progress-making step and keep ending up
> 64 kB short.
> 
> To fix that bug, we cab simply wait for total-progress to reach 4 MB
> (the image size) after starting the job.
> 
> Reported-by: Karen Mezick <kmezick@redhat.com>
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1686651
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tests/qemu-iotests/219 | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)

Adjusted the commit message to what I understood John to mean, and
applied to my block(-on-kevin) branch.

(Thanks for the review!)

Max

Re: [Qemu-devel] [PATCH] iotests: Fix intermittent failure in 219
Posted by Max Reitz 4 years, 11 months ago
On 16.05.19 18:11, Max Reitz wrote:
> In 219, we wait for the job to make progress before we emit its status.
> This makes the output reliable.
> 
> Unfortunately, there is a bug: We do not wait for any more progress if
> the job has already reached its total-progress.  Right after the job has
> been started, it is possible that total-progress is still 0, though.  In
> that case, we may skip the first progress-making step and keep ending up
> 64 kB short.
> 
> To fix that bug, we cab simply wait for total-progress to reach 4 MB

s/cab/can/...

> (the image size) after starting the job.
> 
> Reported-by: Karen Mezick <kmezick@redhat.com>
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1686651
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tests/qemu-iotests/219 | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)