[libvirt] [PATCH] qemu: hotplug: fail when net device detach times out

Jonathan Davies posted 1 patch 6 years, 1 month ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/20180313140708.13855-1-jonathan.davies@nutanix.com
Test syntax-check passed
src/qemu/qemu_hotplug.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
[libvirt] [PATCH] qemu: hotplug: fail when net device detach times out
Posted by Jonathan Davies 6 years, 1 month ago
qemuDomainDetachNetDevice should not treat a timeout in a call to
qemuDomainWaitForDeviceRemoval as success. Instead, it should treat it
as failure -- this is the intention behind having a timeout.

If qemuDomainWaitForDeviceRemoval times out, it returns 0. In this
instance, if qemuDomainDetachNetDevice returns 0, virsh detach-interface
reports:

  Interface detached successfully

with a zero exit status.

But with a negative return value from qemuDomainDetachNetDevice, virsh
detach-interface reports:

  error: Failed to detach interface
  error: An error occurred, but the cause is unknown

with a non-zero exit status, which is more appropriate.

Signed-off-by: Jonathan Davies <jonathan.davies@nutanix.com>
---
 src/qemu/qemu_hotplug.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index 4be0f546c..d9a2f2d4d 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -5082,8 +5082,13 @@ qemuDomainDetachNetDevice(virQEMUDriverPtr driver,
     if (qemuDomainObjExitMonitor(driver, vm) < 0)
         goto cleanup;
 
-    if ((ret = qemuDomainWaitForDeviceRemoval(vm)) == 1)
+    ret = qemuDomainWaitForDeviceRemoval(vm);
+    if (ret == 1) {
         ret = qemuDomainRemoveNetDevice(driver, vm, detach);
+    } else if (ret == 0) {
+        VIR_WARN("Detach of device %s timed out; treating as a failure", detach->ifname);
+        ret = -1;
+    }
 
  cleanup:
     qemuDomainResetDeviceRemoval(vm);
-- 
2.14.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: hotplug: fail when net device detach times out
Posted by Peter Krempa 6 years, 1 month ago
On Tue, Mar 13, 2018 at 14:07:08 +0000, Jonathan Davies wrote:
> qemuDomainDetachNetDevice should not treat a timeout in a call to
> qemuDomainWaitForDeviceRemoval as success. Instead, it should treat it
> as failure -- this is the intention behind having a timeout.
> 
> If qemuDomainWaitForDeviceRemoval times out, it returns 0. In this
> instance, if qemuDomainDetachNetDevice returns 0, virsh detach-interface
> reports:
> 
>   Interface detached successfully
> 
> with a zero exit status.
> 
> But with a negative return value from qemuDomainDetachNetDevice, virsh
> detach-interface reports:
> 
>   error: Failed to detach interface
>   error: An error occurred, but the cause is unknown
> 
> with a non-zero exit status, which is more appropriate.
> 
> Signed-off-by: Jonathan Davies <jonathan.davies@nutanix.com>
> ---

This is unfortunately pre-existing behavior that we can't change. Some
versions of qemu don't support reporting of successful detach using the
event infrastructure, thus we can't ever know when that was successful.

Applications which need to be sure that the device was detached need to
register handlers for the libvirt device detached event.

NACK, this is broken behavior, but can't be changed.
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: hotplug: fail when net device detach times out
Posted by Daniel P. Berrangé 6 years, 1 month ago
On Tue, Mar 13, 2018 at 02:07:08PM +0000, Jonathan Davies wrote:
> qemuDomainDetachNetDevice should not treat a timeout in a call to
> qemuDomainWaitForDeviceRemoval as success. Instead, it should treat it
> as failure -- this is the intention behind having a timeout.

Actually this is intentional current behaviour. See the API docs for
virDomainDetachDeviceFlags:

 * Beware that depending on the hypervisor and device type, detaching a device
 * from a running domain may be asynchronous. That is, calling
 * virDomainDetachDeviceFlags may just request device removal while the device
 * is actually removed later (in cooperation with a guest OS). Previously,
 * this fact was ignored and the device could have been removed from domain
 * configuration before it was actually removed by the hypervisor causing
 * various failures on subsequent operations. To check whether the device was
 * successfully removed, either recheck domain configuration using
 * virDomainGetXMLDesc() or add a handler for the VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED
 * event. In case the device is already gone when virDomainDetachDeviceFlags
 * returns, the event is delivered before this API call ends. To help existing
 * clients work better in most cases, this API will try to transform an
 * asynchronous device removal that finishes shortly after the request into
 * a synchronous removal. In other words, this API may wait a bit for the
 * removal to complete in case it was not synchronous.



> 
> If qemuDomainWaitForDeviceRemoval times out, it returns 0. In this
> instance, if qemuDomainDetachNetDevice returns 0, virsh detach-interface
> reports:
> 
>   Interface detached successfully

If anything needs changing it is this message from virsh.

Virsh should not assume return value of 0 means the device has been
detached. Virsh should do what the API docs suggest and check the
XML config to see if the asynchronous detach requets has completed
or not. IOW, it should report either

 Interface detached successfully

or

 Interface detach request still pending



> with a zero exit status.
> 
> But with a negative return value from qemuDomainDetachNetDevice, virsh
> detach-interface reports:
> 
>   error: Failed to detach interface
>   error: An error occurred, but the cause is unknown
> 
> with a non-zero exit status, which is more appropriate.
> 
> Signed-off-by: Jonathan Davies <jonathan.davies@nutanix.com>
> ---
>  src/qemu/qemu_hotplug.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
> index 4be0f546c..d9a2f2d4d 100644
> --- a/src/qemu/qemu_hotplug.c
> +++ b/src/qemu/qemu_hotplug.c
> @@ -5082,8 +5082,13 @@ qemuDomainDetachNetDevice(virQEMUDriverPtr driver,
>      if (qemuDomainObjExitMonitor(driver, vm) < 0)
>          goto cleanup;
>  
> -    if ((ret = qemuDomainWaitForDeviceRemoval(vm)) == 1)
> +    ret = qemuDomainWaitForDeviceRemoval(vm);
> +    if (ret == 1) {
>          ret = qemuDomainRemoveNetDevice(driver, vm, detach);
> +    } else if (ret == 0) {
> +        VIR_WARN("Detach of device %s timed out; treating as a failure", detach->ifname);
> +        ret = -1;
> +    }
>  
>   cleanup:
>      qemuDomainResetDeviceRemoval(vm);
> -- 
> 2.14.1
> 
> --
> libvir-list mailing list
> libvir-list@redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: hotplug: fail when net device detach times out
Posted by Jiri Denemark 6 years, 1 month ago
On Tue, Mar 13, 2018 at 14:23:22 +0000, Daniel P. Berrangé wrote:
> On Tue, Mar 13, 2018 at 02:07:08PM +0000, Jonathan Davies wrote:
> > If qemuDomainWaitForDeviceRemoval times out, it returns 0. In this
> > instance, if qemuDomainDetachNetDevice returns 0, virsh detach-interface
> > reports:
> > 
> >   Interface detached successfully
> 
> If anything needs changing it is this message from virsh.
> 
> Virsh should not assume return value of 0 means the device has been
> detached. Virsh should do what the API docs suggest and check the
> XML config to see if the asynchronous detach requets has completed
> or not. IOW, it should report either
> 
>  Interface detached successfully
> 
> or
> 
>  Interface detach request still pending

Yeah, it's been on my nice-to-have todo list for a long time now.
Unfortunately, virsh doesn't know what device (i.e., its alias) is going
to be removed based on the XML passed by a user. I was thinking about
adding a new public API which would accept a device XML and return the
matching full device XML from domain definition to help virsh with this
task, but I was too lazy^Wbusy.

Jirka

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: hotplug: fail when net device detach times out
Posted by Daniel P. Berrangé 6 years, 1 month ago
On Tue, Mar 13, 2018 at 03:42:10PM +0100, Jiri Denemark wrote:
> On Tue, Mar 13, 2018 at 14:23:22 +0000, Daniel P. Berrangé wrote:
> > On Tue, Mar 13, 2018 at 02:07:08PM +0000, Jonathan Davies wrote:
> > > If qemuDomainWaitForDeviceRemoval times out, it returns 0. In this
> > > instance, if qemuDomainDetachNetDevice returns 0, virsh detach-interface
> > > reports:
> > > 
> > >   Interface detached successfully
> > 
> > If anything needs changing it is this message from virsh.
> > 
> > Virsh should not assume return value of 0 means the device has been
> > detached. Virsh should do what the API docs suggest and check the
> > XML config to see if the asynchronous detach requets has completed
> > or not. IOW, it should report either
> > 
> >  Interface detached successfully
> > 
> > or
> > 
> >  Interface detach request still pending
> 
> Yeah, it's been on my nice-to-have todo list for a long time now.
> Unfortunately, virsh doesn't know what device (i.e., its alias) is going
> to be removed based on the XML passed by a user. I was thinking about
> adding a new public API which would accept a device XML and return the
> matching full device XML from domain definition to help virsh with this
> task, but I was too lazy^Wbusy.

Then, we should at least make virsh say

  Interface detach request accepted

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: hotplug: fail when net device detach times out
Posted by Jonathan Davies 6 years, 1 month ago
On 13/03/18 14:23, Daniel P. Berrangé wrote:
> On Tue, Mar 13, 2018 at 02:07:08PM +0000, Jonathan Davies wrote:
>> qemuDomainDetachNetDevice should not treat a timeout in a call to
>> qemuDomainWaitForDeviceRemoval as success. Instead, it should treat it
>> as failure -- this is the intention behind having a timeout.
> 
> Actually this is intentional current behaviour. See the API docs for
> virDomainDetachDeviceFlags:
> 
>   * Beware that depending on the hypervisor and device type, detaching a device
>   * from a running domain may be asynchronous. That is, calling
>   * virDomainDetachDeviceFlags may just request device removal while the device
>   * is actually removed later (in cooperation with a guest OS). Previously,
>   * this fact was ignored and the device could have been removed from domain
>   * configuration before it was actually removed by the hypervisor causing
>   * various failures on subsequent operations. To check whether the device was
>   * successfully removed, either recheck domain configuration using
>   * virDomainGetXMLDesc() or add a handler for the VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED
>   * event. In case the device is already gone when virDomainDetachDeviceFlags
>   * returns, the event is delivered before this API call ends. To help existing
>   * clients work better in most cases, this API will try to transform an
>   * asynchronous device removal that finishes shortly after the request into
>   * a synchronous removal. In other words, this API may wait a bit for the
>   * removal to complete in case it was not synchronous.

Thanks all the helpful replies.

Just to check I've understood: The design is that the call is 
synchronous but may become asynchronous if it times out. The purpose of 
the timeout is just to limit the window in which it might be synchronous.

But there's no way for the client to know whether it was synchronous or 
asynchronous since the return value is the same in both cases.

Therefore a well-written client would always need to check whether the 
operation completed successfully or not.

In which case, why bother with the synchronous mode and the server-side 
timeout? Wouldn't it be simpler for it to be always asynchronous, 
allowing the client to wait for as long as it likes before giving up?

Please correct me if I've misunderstood!

Thanks,
Jonathan

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: hotplug: fail when net device detach times out
Posted by Daniel P. Berrangé 6 years, 1 month ago
On Tue, Mar 13, 2018 at 06:13:46PM +0000, Jonathan Davies wrote:
> On 13/03/18 14:23, Daniel P. Berrangé wrote:
> > On Tue, Mar 13, 2018 at 02:07:08PM +0000, Jonathan Davies wrote:
> > > qemuDomainDetachNetDevice should not treat a timeout in a call to
> > > qemuDomainWaitForDeviceRemoval as success. Instead, it should treat it
> > > as failure -- this is the intention behind having a timeout.
> > 
> > Actually this is intentional current behaviour. See the API docs for
> > virDomainDetachDeviceFlags:
> > 
> >   * Beware that depending on the hypervisor and device type, detaching a device
> >   * from a running domain may be asynchronous. That is, calling
> >   * virDomainDetachDeviceFlags may just request device removal while the device
> >   * is actually removed later (in cooperation with a guest OS). Previously,
> >   * this fact was ignored and the device could have been removed from domain
> >   * configuration before it was actually removed by the hypervisor causing
> >   * various failures on subsequent operations. To check whether the device was
> >   * successfully removed, either recheck domain configuration using
> >   * virDomainGetXMLDesc() or add a handler for the VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED
> >   * event. In case the device is already gone when virDomainDetachDeviceFlags
> >   * returns, the event is delivered before this API call ends. To help existing
> >   * clients work better in most cases, this API will try to transform an
> >   * asynchronous device removal that finishes shortly after the request into
> >   * a synchronous removal. In other words, this API may wait a bit for the
> >   * removal to complete in case it was not synchronous.
> 
> Thanks all the helpful replies.
> 
> Just to check I've understood: The design is that the call is synchronous
> but may become asynchronous if it times out. The purpose of the timeout is
> just to limit the window in which it might be synchronous.

I'd say the opposite - the API is designed to be asynchronous, but you
might get lucky and have it complete before it returns.

> But there's no way for the client to know whether it was synchronous or
> asynchronous since the return value is the same in both cases.
> 
> Therefore a well-written client would always need to check whether the
> operation completed successfully or not.
> 
> In which case, why bother with the synchronous mode and the server-side
> timeout? Wouldn't it be simpler for it to be always asynchronous, allowing
> the client to wait for as long as it likes before giving up?

Yes Well written applications should always wait and check for actual
removal. Adding the server side wait was just a slightly gross
hack to make it slightly more reliable with badly written apps that
blindly assume completion

> Please correct me if I've misunderstood!

You're right, and yes this is an unpleasnt API design. If we could go
back and do it again we'd do things differently, but we're fairly stuck
with it now.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: hotplug: fail when net device detach times out
Posted by Jiri Denemark 6 years, 1 month ago
On Tue, Mar 13, 2018 at 18:20:51 +0000, Daniel P. Berrangé wrote:
> On Tue, Mar 13, 2018 at 06:13:46PM +0000, Jonathan Davies wrote:
> > Thanks all the helpful replies.
> > 
> > Just to check I've understood: The design is that the call is synchronous
> > but may become asynchronous if it times out. The purpose of the timeout is
> > just to limit the window in which it might be synchronous.
> 
> I'd say the opposite - the API is designed to be asynchronous, but you
> might get lucky and have it complete before it returns.

It's both actually :-) The API was originally designed as synchronous,
although it was implemented asynchronously for most cases. Mainly
because the action requires a guest OS to cooperate and it may not even
allow or support detaching some devices at all and the request will be
completely ignored. On the other hand, detaching USB devices in QEMU is
a rare example of implementation which is always synchronous. And if a
guest OS cooperates, the process may be very fast so in reality users
were seeing this API as synchronous because in "normal" cases the
devices were already removed at the time the API returned success (or
when they immediately checked the device after the API finished).

When we fixed the API documentation to be explicit about the
asynchronous behavior, we wanted to make sure existing users still see
it as synchronous in normal cases and the short timeout was introduced.

At the same time we could not change the return value to distinguish
between finished and requested detach.

This is quite a mess, but it was still considered better than
introducing a new detach API which could have better semantics, but all
apps using libvirt would need to adapt and possibly be able to work with
both versions anyway.

Jirka

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list