[PATCH] qemuDomainWaitForDeviceRemoval: recheck the value of priv->unplug.alias when timeout

zuoboqun posted 1 patch 1 year, 5 months ago
Failed in applying to current master (apply log)
There is a newer version of this series
src/qemu/qemu_hotplug.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
[PATCH] qemuDomainWaitForDeviceRemoval: recheck the value of priv->unplug.alias when timeout
Posted by zuoboqun 1 year, 5 months ago
When detaching a device, the following race condition may happen:
Once qemuDomainSignalDeviceRemoval() marks the device for
removal, it returns true, which means it is the caller
that marked the device for removal is going to remove the
device from domain definition.

But qemuDomainWaitForDeviceRemoval() may still receive
timeout from virDomainObjWaitUntil() which is implemented
by pthread_cond_timedwait() due to an unavoidable race
between the expiration of the timeout and the predicate
state(priv->unplug.alias) change.

And then qemuDomainWaitForDeviceRemoval() will return 0,
thus the caller will not remove the device from domain
definition.

In this situation, the device is still present in the domain
definition but doesn't exist in qemu anymore. Worse, there is
no way to remove it from the domain definition.

Solution is to recheck the value of priv->unplug.alias to
determine who is going to remove the device from domain
definition.

Signed-off-by: zuo boqun <zuoboqun@baidu.com>
---
 src/qemu/qemu_hotplug.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index 972df572a7..c8028df93a 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -5399,8 +5399,12 @@ qemuDomainWaitForDeviceRemoval(virDomainObj *vm)
     until += qemuDomainGetUnplugTimeout(vm);
 
     while (priv->unplug.alias) {
-        if ((rc = virDomainObjWaitUntil(vm, until)) == 1)
-            return 0;
+        if ((rc = virDomainObjWaitUntil(vm, until)) == 1) {
+            if (priv->unplug.alias)
+                return 0;
+            else
+                return 1;
+        }
 
         if (rc < 0) {
             VIR_WARN("Failed to wait on unplug condition for domain '%s' "
-- 
2.34.0.windows.1
Re: [PATCH] qemuDomainWaitForDeviceRemoval: recheck the value of priv->unplug.alias when timeout
Posted by Peter Krempa 1 year, 5 months ago
On Mon, Jun 05, 2023 at 11:59:36 +0800, zuoboqun wrote:
> When detaching a device, the following race condition may happen:
> Once qemuDomainSignalDeviceRemoval() marks the device for
> removal, it returns true, which means it is the caller
> that marked the device for removal is going to remove the
> device from domain definition.
> 
> But qemuDomainWaitForDeviceRemoval() may still receive
> timeout from virDomainObjWaitUntil() which is implemented
> by pthread_cond_timedwait() due to an unavoidable race
> between the expiration of the timeout and the predicate
> state(priv->unplug.alias) change.
> 
> And then qemuDomainWaitForDeviceRemoval() will return 0,
> thus the caller will not remove the device from domain
> definition.
> 
> In this situation, the device is still present in the domain
> definition but doesn't exist in qemu anymore. Worse, there is
> no way to remove it from the domain definition.
> 
> Solution is to recheck the value of priv->unplug.alias to
> determine who is going to remove the device from domain
> definition.
> 
> Signed-off-by: zuo boqun <zuoboqun@baidu.com>
> ---
>  src/qemu/qemu_hotplug.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
> index 972df572a7..c8028df93a 100644
> --- a/src/qemu/qemu_hotplug.c
> +++ b/src/qemu/qemu_hotplug.c
> @@ -5399,8 +5399,12 @@ qemuDomainWaitForDeviceRemoval(virDomainObj *vm)
>      until += qemuDomainGetUnplugTimeout(vm);
>  
>      while (priv->unplug.alias) {
> -        if ((rc = virDomainObjWaitUntil(vm, until)) == 1)
> -            return 0;
> +        if ((rc = virDomainObjWaitUntil(vm, until)) == 1) {
> +            if (priv->unplug.alias)
> +                return 0;
> +            else
> +                return 1;
> +        }
>  
>          if (rc < 0) {
>              VIR_WARN("Failed to wait on unplug condition for domain '%s' "

In general your patch makes semantically sense, but in the
implementation the logic skips the step after the loop which looks
whether unplug was successful or not and if it was unsuccessful reports
an error. Since that is based on the notification too we need to
preserve that logic too.

I'll post a fixed version and possibly rewrite the logic a big.
答复: [PATCH] qemuDomainWaitForDeviceRemoval: recheck the value of priv->unplug.alias when timeout
Posted by Zuo,Boqun 1 year, 5 months ago
ping

-----邮件原件-----
发件人: Zuo,Boqun <zuoboqun@baidu.com> 
发送时间: 2023年6月5日 12:00
收件人: libvir-list@redhat.com
抄送: Zuo,Boqun <zuoboqun@baidu.com>
主题: [PATCH] qemuDomainWaitForDeviceRemoval: recheck the value of priv->unplug.alias when timeout

When detaching a device, the following race condition may happen:
Once qemuDomainSignalDeviceRemoval() marks the device for removal, it returns true, which means it is the caller that marked the device for removal is going to remove the device from domain definition.

But qemuDomainWaitForDeviceRemoval() may still receive timeout from virDomainObjWaitUntil() which is implemented by pthread_cond_timedwait() due to an unavoidable race between the expiration of the timeout and the predicate
state(priv->unplug.alias) change.

And then qemuDomainWaitForDeviceRemoval() will return 0, thus the caller will not remove the device from domain definition.

In this situation, the device is still present in the domain definition but doesn't exist in qemu anymore. Worse, there is no way to remove it from the domain definition.

Solution is to recheck the value of priv->unplug.alias to determine who is going to remove the device from domain definition.

Signed-off-by: zuo boqun <zuoboqun@baidu.com>
---
 src/qemu/qemu_hotplug.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 972df572a7..c8028df93a 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -5399,8 +5399,12 @@ qemuDomainWaitForDeviceRemoval(virDomainObj *vm)
     until += qemuDomainGetUnplugTimeout(vm);
 
     while (priv->unplug.alias) {
-        if ((rc = virDomainObjWaitUntil(vm, until)) == 1)
-            return 0;
+        if ((rc = virDomainObjWaitUntil(vm, until)) == 1) {
+            if (priv->unplug.alias)
+                return 0;
+            else
+                return 1;
+        }
 
         if (rc < 0) {
             VIR_WARN("Failed to wait on unplug condition for domain '%s' "
--
2.34.0.windows.1