[libvirt] [PATCH] qemu: blockjob: Don't report block job progress at 100% if job isn't ready

Peter Krempa posted 1 patch 5 years, 1 month ago
Test syntax-check passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/cbaa0f0d6a2eafc9692bd349289d6c1356ab4f22.1548779049.git.pkrempa@redhat.com
src/qemu/qemu_driver.c | 8 ++++++++
1 file changed, 8 insertions(+)
[libvirt] [PATCH] qemu: blockjob: Don't report block job progress at 100% if job isn't ready
Posted by Peter Krempa 5 years, 1 month ago
Some clients take the advice to poll virDomainGetBlockJobInfo rather
than wait for the ready event. In some cases qemu can get to 100% and
still not reach the synchronised phase.

Since we are dealing with a computer interacting, the error that the job
can't be finalized yet is not handled very well by those specific
implementations.

Our docs now correctly state to use the event.

We already munge the output for the start of the job, so we can do it
even here.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
---
 src/qemu/qemu_driver.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 26edbf799f..cf500507da 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -17477,6 +17477,14 @@ qemuBlockJobInfoTranslate(qemuMonitorBlockJobInfoPtr rawInfo,
         }
     }

+    /* If qemu reports that it's not ready yet don't make the job go to
+     * cur == end as some apps wrote code polling this instead of waiting for
+     * the ready event */
+    if (rawInfo->ready == 0 &&
+        info->cur == info->end &&
+        info->cur > 0)
+        info->cur -= 1;
+
     info->type = rawInfo->type;
     if (info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COMMIT &&
         disk->mirrorJob == VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT)
-- 
2.20.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: blockjob: Don't report block job progress at 100% if job isn't ready
Posted by Eric Blake 5 years, 1 month ago
On 1/29/19 10:24 AM, Peter Krempa wrote:
> Some clients take the advice to poll virDomainGetBlockJobInfo rather
> than wait for the ready event. In some cases qemu can get to 100% and
> still not reach the synchronised phase.
> 
> Since we are dealing with a computer interacting, the error that the job
> can't be finalized yet is not handled very well by those specific
> implementations.
> 
> Our docs now correctly state to use the event.
> 
> We already munge the output for the start of the job, so we can do it
> even here.
> 
> Signed-off-by: Peter Krempa <pkrempa@redhat.com>
> ---
>  src/qemu/qemu_driver.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index 26edbf799f..cf500507da 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -17477,6 +17477,14 @@ qemuBlockJobInfoTranslate(qemuMonitorBlockJobInfoPtr rawInfo,
>          }
>      }
> 
> +    /* If qemu reports that it's not ready yet don't make the job go to
> +     * cur == end as some apps wrote code polling this instead of waiting for
> +     * the ready event */
> +    if (rawInfo->ready == 0 &&
> +        info->cur == info->end &&
> +        info->cur > 0)
> +        info->cur -= 1;

Why -= 1 instead of --?

But the patch looks sane to me.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: blockjob: Don't report block job progress at 100% if job isn't ready
Posted by Kashyap Chamarthy 5 years, 1 month ago
On Tue, Jan 29, 2019 at 05:24:09PM +0100, Peter Krempa wrote:
> Some clients take the advice to poll virDomainGetBlockJobInfo rather
> than wait for the ready event. In some cases qemu can get to 100% and
> still not reach the synchronised phase.
> 
> Since we are dealing with a computer interacting, 

OCD nit: The above sounds awkward to me, might want to rephrase it,
maybe something like: "Given that computers are interacting here ..."

> the error that the job
> can't be finalized yet is not handled very well by those specific
> implementations.
> 
> Our docs now correctly state to use the event.

Nit: Calling out the event name, VIR_DOMAIN_BLOCK_JOB_READY can be
useful when reading Git logs at a later time.

> We already munge the output for the start of the job, so we can do it
> even here.

Optional, a bit of explanation of "munging the output" at the start (and
end) of the job would be useful for those new to these block APIs.

Reading between the lines, and from what I recall Eric explaining once,
what you're implying here is: that even if QEMU reported "cur==end", but
if the _READY event is not emitted, libvirt manipulates it to report
"cur<end".

Thanks for the workaround (because "QEMU is not accurately reporting
data") fix.

/me wonders if he could make a reliable reproducer test for this...

> Signed-off-by: Peter Krempa <pkrempa@redhat.com>
> ---
>  src/qemu/qemu_driver.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index 26edbf799f..cf500507da 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -17477,6 +17477,14 @@ qemuBlockJobInfoTranslate(qemuMonitorBlockJobInfoPtr rawInfo,
>          }
>      }
> 
> +    /* If qemu reports that it's not ready yet don't make the job go to
> +     * cur == end as some apps wrote code polling this instead of waiting for
> +     * the ready event */
> +    if (rawInfo->ready == 0 &&
> +        info->cur == info->end &&
> +        info->cur > 0)
> +        info->cur -= 1;
> +
>      info->type = rawInfo->type;
>      if (info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COMMIT &&
>          disk->mirrorJob == VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT)
> -- 
> 2.20.1
> 
> --
> libvir-list mailing list
> libvir-list@redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list

-- 
/kashyap

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: blockjob: Don't report block job progress at 100% if job isn't ready
Posted by Peter Krempa 5 years, 1 month ago
On Tue, Jan 29, 2019 at 18:52:37 +0100, Kashyap Chamarthy wrote:
> On Tue, Jan 29, 2019 at 05:24:09PM +0100, Peter Krempa wrote:
> > Some clients take the advice to poll virDomainGetBlockJobInfo rather
> > than wait for the ready event. In some cases qemu can get to 100% and
> > still not reach the synchronised phase.
> > 
> > Since we are dealing with a computer interacting, 
> 
> OCD nit: The above sounds awkward to me, might want to rephrase it,
> maybe something like: "Given that computers are interacting here ..."
> 
> > the error that the job
> > can't be finalized yet is not handled very well by those specific
> > implementations.
> > 
> > Our docs now correctly state to use the event.
> 
> Nit: Calling out the event name, VIR_DOMAIN_BLOCK_JOB_READY can be
> useful when reading Git logs at a later time.
> 
> > We already munge the output for the start of the job, so we can do it
> > even here.
> 
> Optional, a bit of explanation of "munging the output" at the start (and
> end) of the job would be useful for those new to these block APIs.
> 
> Reading between the lines, and from what I recall Eric explaining once,
> what you're implying here is: that even if QEMU reported "cur==end", but
> if the _READY event is not emitted, libvirt manipulates it to report
> "cur<end".
> 
> Thanks for the workaround (because "QEMU is not accurately reporting
> data") fix.
> 
> /me wonders if he could make a reliable reproducer test for this...

Technically you'll need to patch either libvirt or qemu to do so. E.g.
add a delay to processing of the READY event from qemu. This is a race
across two applications which is not easy to reproduce.

Also I really don't want to think this is a solution. This is a dirty
hack to work around software not willing to adapt to the correct
approach.

I did not want do do this, but I've found that we do something similar
for the job startup. I really hate this solution, but given that it's two years since
I've pointed out that relying on polling the info via
virDomainBlockJobInfo is broken and apps should use the _READY event and
it's still an issue today I'm going to do this, but really don't want to
advertise this as a solution. It's not a solution. It's a hack.
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: blockjob: Don't report block job progress at 100% if job isn't ready
Posted by Kashyap Chamarthy 5 years, 1 month ago
On Wed, Jan 30, 2019 at 10:46:24AM +0100, Peter Krempa wrote:
> On Tue, Jan 29, 2019 at 18:52:37 +0100, Kashyap Chamarthy wrote:

[...]

> > Optional, a bit of explanation of "munging the output" at the start (and
> > end) of the job would be useful for those new to these block APIs.
> > 
> > Reading between the lines, and from what I recall Eric explaining once,
> > what you're implying here is: that even if QEMU reported "cur==end", but
> > if the _READY event is not emitted, libvirt manipulates it to report
> > "cur<end".
> > 
> > Thanks for the workaround (because "QEMU is not accurately reporting
> > data") fix.
> > 
> > /me wonders if he could make a reliable reproducer test for this...
> 
> Technically you'll need to patch either libvirt or qemu to do so. E.g.
> add a delay to processing of the READY event from qemu. This is a race
> across two applications which is not easy to reproduce.

Yep, noted.  For now I won't spend time on it.

> Also I really don't want to think this is a solution. This is a dirty
> hack to work around software not willing to adapt to the correct
> approach.

It's not about "not willing" but the "small matter of implementing" it
correctly without regressing. :-)  And I realize it's not a perfect
"solution", that's why I called it a workaround.  And FWIW, Eric seems
to agree that it is reasonable given the circumstances.

> I did not want do do this, but I've found that we do something similar
> for the job startup. I really hate this solution, but given that it's
> two years since I've pointed out that relying on polling the info via
> virDomainBlockJobInfo is broken and apps should use the _READY event
> and it's still an issue today I'm going to do this, but really don't
> want to advertise this as a solution. It's not a solution. It's a
> hack.

Right, at least for Nova, I _did_ notify upstream that the correct way
is to use the events.  It's a matter of prioritizing that task.  (I'll
file a formal bug.)

-- 
/kashyap

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list