Some clients take the advice to poll virDomainGetBlockJobInfo rather
than wait for the ready event. In some cases qemu can get to 100% and
still not reach the synchronised phase.
Since we are dealing with a computer interacting, the error that the job
can't be finalized yet is not handled very well by those specific
implementations.
Our docs now correctly state to use the event.
We already munge the output for the start of the job, so we can do it
even here.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
---
src/qemu/qemu_driver.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 26edbf799f..cf500507da 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -17477,6 +17477,14 @@ qemuBlockJobInfoTranslate(qemuMonitorBlockJobInfoPtr rawInfo,
}
}
+ /* If qemu reports that it's not ready yet don't make the job go to
+ * cur == end as some apps wrote code polling this instead of waiting for
+ * the ready event */
+ if (rawInfo->ready == 0 &&
+ info->cur == info->end &&
+ info->cur > 0)
+ info->cur -= 1;
+
info->type = rawInfo->type;
if (info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COMMIT &&
disk->mirrorJob == VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT)
--
2.20.1
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
On 1/29/19 10:24 AM, Peter Krempa wrote: > Some clients take the advice to poll virDomainGetBlockJobInfo rather > than wait for the ready event. In some cases qemu can get to 100% and > still not reach the synchronised phase. > > Since we are dealing with a computer interacting, the error that the job > can't be finalized yet is not handled very well by those specific > implementations. > > Our docs now correctly state to use the event. > > We already munge the output for the start of the job, so we can do it > even here. > > Signed-off-by: Peter Krempa <pkrempa@redhat.com> > --- > src/qemu/qemu_driver.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c > index 26edbf799f..cf500507da 100644 > --- a/src/qemu/qemu_driver.c > +++ b/src/qemu/qemu_driver.c > @@ -17477,6 +17477,14 @@ qemuBlockJobInfoTranslate(qemuMonitorBlockJobInfoPtr rawInfo, > } > } > > + /* If qemu reports that it's not ready yet don't make the job go to > + * cur == end as some apps wrote code polling this instead of waiting for > + * the ready event */ > + if (rawInfo->ready == 0 && > + info->cur == info->end && > + info->cur > 0) > + info->cur -= 1; Why -= 1 instead of --? But the patch looks sane to me. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
On Tue, Jan 29, 2019 at 05:24:09PM +0100, Peter Krempa wrote: > Some clients take the advice to poll virDomainGetBlockJobInfo rather > than wait for the ready event. In some cases qemu can get to 100% and > still not reach the synchronised phase. > > Since we are dealing with a computer interacting, OCD nit: The above sounds awkward to me, might want to rephrase it, maybe something like: "Given that computers are interacting here ..." > the error that the job > can't be finalized yet is not handled very well by those specific > implementations. > > Our docs now correctly state to use the event. Nit: Calling out the event name, VIR_DOMAIN_BLOCK_JOB_READY can be useful when reading Git logs at a later time. > We already munge the output for the start of the job, so we can do it > even here. Optional, a bit of explanation of "munging the output" at the start (and end) of the job would be useful for those new to these block APIs. Reading between the lines, and from what I recall Eric explaining once, what you're implying here is: that even if QEMU reported "cur==end", but if the _READY event is not emitted, libvirt manipulates it to report "cur<end". Thanks for the workaround (because "QEMU is not accurately reporting data") fix. /me wonders if he could make a reliable reproducer test for this... > Signed-off-by: Peter Krempa <pkrempa@redhat.com> > --- > src/qemu/qemu_driver.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c > index 26edbf799f..cf500507da 100644 > --- a/src/qemu/qemu_driver.c > +++ b/src/qemu/qemu_driver.c > @@ -17477,6 +17477,14 @@ qemuBlockJobInfoTranslate(qemuMonitorBlockJobInfoPtr rawInfo, > } > } > > + /* If qemu reports that it's not ready yet don't make the job go to > + * cur == end as some apps wrote code polling this instead of waiting for > + * the ready event */ > + if (rawInfo->ready == 0 && > + info->cur == info->end && > + info->cur > 0) > + info->cur -= 1; > + > info->type = rawInfo->type; > if (info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COMMIT && > disk->mirrorJob == VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT) > -- > 2.20.1 > > -- > libvir-list mailing list > libvir-list@redhat.com > https://www.redhat.com/mailman/listinfo/libvir-list -- /kashyap -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
On Tue, Jan 29, 2019 at 18:52:37 +0100, Kashyap Chamarthy wrote: > On Tue, Jan 29, 2019 at 05:24:09PM +0100, Peter Krempa wrote: > > Some clients take the advice to poll virDomainGetBlockJobInfo rather > > than wait for the ready event. In some cases qemu can get to 100% and > > still not reach the synchronised phase. > > > > Since we are dealing with a computer interacting, > > OCD nit: The above sounds awkward to me, might want to rephrase it, > maybe something like: "Given that computers are interacting here ..." > > > the error that the job > > can't be finalized yet is not handled very well by those specific > > implementations. > > > > Our docs now correctly state to use the event. > > Nit: Calling out the event name, VIR_DOMAIN_BLOCK_JOB_READY can be > useful when reading Git logs at a later time. > > > We already munge the output for the start of the job, so we can do it > > even here. > > Optional, a bit of explanation of "munging the output" at the start (and > end) of the job would be useful for those new to these block APIs. > > Reading between the lines, and from what I recall Eric explaining once, > what you're implying here is: that even if QEMU reported "cur==end", but > if the _READY event is not emitted, libvirt manipulates it to report > "cur<end". > > Thanks for the workaround (because "QEMU is not accurately reporting > data") fix. > > /me wonders if he could make a reliable reproducer test for this... Technically you'll need to patch either libvirt or qemu to do so. E.g. add a delay to processing of the READY event from qemu. This is a race across two applications which is not easy to reproduce. Also I really don't want to think this is a solution. This is a dirty hack to work around software not willing to adapt to the correct approach. I did not want do do this, but I've found that we do something similar for the job startup. I really hate this solution, but given that it's two years since I've pointed out that relying on polling the info via virDomainBlockJobInfo is broken and apps should use the _READY event and it's still an issue today I'm going to do this, but really don't want to advertise this as a solution. It's not a solution. It's a hack. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
On Wed, Jan 30, 2019 at 10:46:24AM +0100, Peter Krempa wrote: > On Tue, Jan 29, 2019 at 18:52:37 +0100, Kashyap Chamarthy wrote: [...] > > Optional, a bit of explanation of "munging the output" at the start (and > > end) of the job would be useful for those new to these block APIs. > > > > Reading between the lines, and from what I recall Eric explaining once, > > what you're implying here is: that even if QEMU reported "cur==end", but > > if the _READY event is not emitted, libvirt manipulates it to report > > "cur<end". > > > > Thanks for the workaround (because "QEMU is not accurately reporting > > data") fix. > > > > /me wonders if he could make a reliable reproducer test for this... > > Technically you'll need to patch either libvirt or qemu to do so. E.g. > add a delay to processing of the READY event from qemu. This is a race > across two applications which is not easy to reproduce. Yep, noted. For now I won't spend time on it. > Also I really don't want to think this is a solution. This is a dirty > hack to work around software not willing to adapt to the correct > approach. It's not about "not willing" but the "small matter of implementing" it correctly without regressing. :-) And I realize it's not a perfect "solution", that's why I called it a workaround. And FWIW, Eric seems to agree that it is reasonable given the circumstances. > I did not want do do this, but I've found that we do something similar > for the job startup. I really hate this solution, but given that it's > two years since I've pointed out that relying on polling the info via > virDomainBlockJobInfo is broken and apps should use the _READY event > and it's still an issue today I'm going to do this, but really don't > want to advertise this as a solution. It's not a solution. It's a > hack. Right, at least for Nova, I _did_ notify upstream that the correct way is to use the events. It's a matter of prioritizing that task. (I'll file a formal bug.) -- /kashyap -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
© 2016 - 2024 Red Hat, Inc.