[PATCH v2] qemuDomainWaitForDeviceRemoval: recheck the value of priv->unplug.alias when timeout

Peter Krempa posted 1 patch 10 months, 4 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/33c3ddcd40b48a90c1043cb1548ccae4a150991d.1687528003.git.pkrempa@redhat.com
src/qemu/qemu_hotplug.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
[PATCH v2] qemuDomainWaitForDeviceRemoval: recheck the value of priv->unplug.alias when timeout
Posted by Peter Krempa 10 months, 4 weeks ago
From: zuoboqun <zuoboqun@baidu.com>

When detaching a device, the following race condition may happen:
Once qemuDomainSignalDeviceRemoval() marks the device for
removal, it returns true, which means it is the caller
that marked the device for removal is going to remove the
device from domain definition.

But qemuDomainWaitForDeviceRemoval() may still receive
timeout from virDomainObjWaitUntil() which is implemented
by pthread_cond_timedwait() due to an unavoidable race
between the expiration of the timeout and the predicate
state(priv->unplug.alias) change.

And then qemuDomainWaitForDeviceRemoval() will return 0,
thus the caller will not remove the device from domain
definition.

In this situation, the device is still present in the domain
definition but doesn't exist in qemu anymore. Worse, there is
no way to remove it from the domain definition.

Solution is to recheck the value of priv->unplug.alias to
determine who is going to remove the device from domain
definition.

Signed-off-by: zuo boqun <zuoboqun@baidu.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
---

v2:
 - rewrote waiting loop so that we always check the unplug status if the
   thread was notified
 - added comments explaining the logic

 src/qemu/qemu_hotplug.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index ba9e44945b..2e3c99760d 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -5391,21 +5391,27 @@ qemuDomainWaitForDeviceRemoval(virDomainObj *vm)
 {
     qemuDomainObjPrivate *priv = vm->privateData;
     unsigned long long until;
-    int rc;

     if (virTimeMillisNow(&until) < 0)
         return 1;
     until += qemuDomainGetUnplugTimeout(vm);

-    while (priv->unplug.alias) {
-        if ((rc = virDomainObjWaitUntil(vm, until)) == 1)
-            return 0;
+    while (true) {
+        int rc;

-        if (rc < 0) {
-            VIR_WARN("Failed to wait on unplug condition for domain '%s' "
-                     "device '%s'", vm->def->name, priv->unplug.alias);
+        if ((rc = virDomainObjWaitUntil(vm, until)) < 0) {
+            VIR_WARN("Failed to wait on unplug condition for domain '%s' device '%s'",
+                     vm->def->name, priv->unplug.alias);
             return 1;
         }
+
+        /* unplug event for this device was received, check the status */
+        if (!priv->unplug.alias)
+            break;
+
+        /* timeout */
+        if (rc == 1)
+            return 0;
     }

     if (priv->unplug.status == QEMU_DOMAIN_UNPLUGGING_DEVICE_STATUS_GUEST_REJECTED) {
-- 
2.40.1
Re: [PATCH v2] qemuDomainWaitForDeviceRemoval: recheck the value of priv->unplug.alias when timeout
Posted by Michal Prívozník 10 months, 3 weeks ago
On 6/23/23 15:48, Peter Krempa wrote:
> From: zuoboqun <zuoboqun@baidu.com>
> 
> When detaching a device, the following race condition may happen:
> Once qemuDomainSignalDeviceRemoval() marks the device for
> removal, it returns true, which means it is the caller
> that marked the device for removal is going to remove the
> device from domain definition.
> 
> But qemuDomainWaitForDeviceRemoval() may still receive
> timeout from virDomainObjWaitUntil() which is implemented
> by pthread_cond_timedwait() due to an unavoidable race
> between the expiration of the timeout and the predicate
> state(priv->unplug.alias) change.
> 
> And then qemuDomainWaitForDeviceRemoval() will return 0,
> thus the caller will not remove the device from domain
> definition.
> 
> In this situation, the device is still present in the domain
> definition but doesn't exist in qemu anymore. Worse, there is
> no way to remove it from the domain definition.
> 
> Solution is to recheck the value of priv->unplug.alias to
> determine who is going to remove the device from domain
> definition.
> 
> Signed-off-by: zuo boqun <zuoboqun@baidu.com>
> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
> Signed-off-by: Peter Krempa <pkrempa@redhat.com>
> ---
> 
> v2:
>  - rewrote waiting loop so that we always check the unplug status if the
>    thread was notified
>  - added comments explaining the logic
> 
>  src/qemu/qemu_hotplug.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>

Michal