[Xen-devel] [PATCH v2] libxl_qmp: wait for completion of device removal

Chao Gao posted 1 patch 2 weeks ago
git fetch https://github.com/patchew-project/xen tags/patchew/1562133373-19208-1-git-send-email-chao.gao@intel.com

[Xen-devel] [PATCH v2] libxl_qmp: wait for completion of device removal

Posted by Chao Gao 2 weeks ago
To remove a device from a domain, a QMP command is sent to QEMU. However,
QEMU handles the removal asynchronously: even after the QMP command reports
completion, the actual removal on the QEMU side may happen later.
This behavior causes two problems:
1. Re-attaching a device to a domain right after detaching it from that
domain fails with the error:

libxl: error: libxl_qmp.c:341:qmp_handle_error_response: Domain 1:received an
error message from QMP server: Duplicate ID 'pci-pt-60_00.0' for device

2. Accesses to PCI configuration space in QEMU may overlap with a later
device reset issued by 'xl' or by pciback.

To avoid these problems, wait for the device removal to complete by
querying all PCI devices with the QMP command "query-pci" and checking
that the target device is no longer listed. Retry at most 5 times, one
second apart, so that 'xl' cannot be blocked indefinitely by QEMU.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
Changes in v2:
 - Break out early if an error occurs while querying PCI devices.
 - Print a message to warn the admin that device removal may not have
   completed on the device model side.
---
 tools/libxl/libxl_qmp.c | 61 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 60 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
index 42c8ab8d8d..9c4480a2b1 100644
--- a/tools/libxl/libxl_qmp.c
+++ b/tools/libxl/libxl_qmp.c
@@ -916,6 +916,38 @@ out:
     return rc;
 }
 
+static int pci_del_callback(libxl__qmp_handler *qmp,
+                            const libxl__json_object *response, void *opaque)
+{
+    const char *asked_id = opaque;
+    const libxl__json_object *bus = NULL;
+    GC_INIT(qmp->ctx);
+    int i, j, rc = 0;
+
+    for (i = 0; (bus = libxl__json_array_get(response, i)); i++) {
+        const libxl__json_object *devices = NULL;
+        const libxl__json_object *device = NULL;
+        const libxl__json_object *o = NULL;
+        const char *id = NULL;
+
+        devices = libxl__json_map_get("devices", bus, JSON_ARRAY);
+
+        for (j = 0; (device = libxl__json_array_get(devices, j)); j++) {
+            o = libxl__json_map_get("qdev_id", device, JSON_STRING);
+            id = libxl__json_object_get_string(o);
+
+            if (id && strcmp(asked_id, id) == 0) {
+                rc = 1;
+                goto out;
+            }
+        }
+    }
+
+out:
+    GC_FREE;
+    return rc;
+}
+
 static int qmp_run_command(libxl__gc *gc, int domid,
                            const char *cmd, libxl__json_object *args,
                            qmp_callback_t callback, void *opaque)
@@ -1000,9 +1032,36 @@ int libxl__qmp_pci_add(libxl__gc *gc, int domid, libxl_device_pci *pcidev)
 static int qmp_device_del(libxl__gc *gc, int domid, char *id)
 {
     libxl__json_object *args = NULL;
+    libxl__qmp_handler *qmp = NULL;
+    int rc = 0;
+
+    qmp = libxl__qmp_initialize(gc, domid);
+    if (!qmp)
+        return ERROR_FAIL;
 
     qmp_parameters_add_string(gc, &args, "id", id);
-    return qmp_run_command(gc, domid, "device_del", args, NULL, NULL);
+    rc = qmp_synchronous_send(qmp, "device_del", args,
+                              NULL, NULL, qmp->timeout);
+    if (rc == 0) {
+        unsigned int retry = 0;
+
+        do {
+            rc = qmp_synchronous_send(qmp, "query-pci", NULL,
+                                      pci_del_callback, id, qmp->timeout);
+            if (rc != 1) {
+                break;
+            }
+            sleep(1);
+        } while (retry++ < 5);
+
+        if (rc != 0) {
+            LOGD(WARN, qmp->domid,
+                 "device model may not complete removing device %s", id);
+        }
+    }
+
+    libxl__qmp_close(qmp);
+    return rc;
 }
 
 int libxl__qmp_pci_del(libxl__gc *gc, int domid, libxl_device_pci *pcidev)
-- 
2.17.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] libxl_qmp: wait for completion of device removal

Posted by Anthony PERARD 2 weeks ago
On Wed, Jul 03, 2019 at 01:56:13PM +0800, Chao Gao wrote:
> To remove a device from a domain, a QMP command is sent to QEMU. However,
> QEMU handles the removal asynchronously: even after the QMP command reports
> completion, the actual removal on the QEMU side may happen later.
> This behavior causes two problems:
> 1. Re-attaching a device to a domain right after detaching it from that
> domain fails with the error:
> 
> libxl: error: libxl_qmp.c:341:qmp_handle_error_response: Domain 1:received an
> error message from QMP server: Duplicate ID 'pci-pt-60_00.0' for device
> 
> 2. Accesses to PCI configuration space in QEMU may overlap with a later
> device reset issued by 'xl' or by pciback.
> 
> To avoid these problems, wait for the device removal to complete by
> querying all PCI devices with the QMP command "query-pci" and checking
> that the target device is no longer listed. Retry at most 5 times, one
> second apart, so that 'xl' cannot be blocked indefinitely by QEMU.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> Changes in v2:
>  - Break out early if an error occurs while querying PCI devices.
>  - Print a message to warn the admin that device removal may not have
>    completed on the device model side.

Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>

Thanks,

-- 
Anthony PERARD


Re: [Xen-devel] [PATCH v2] libxl_qmp: wait for completion of device removal

Posted by Chao Gao 1 day ago
On Wed, Jul 03, 2019 at 05:03:17PM +0100, Anthony PERARD wrote:
>On Wed, Jul 03, 2019 at 01:56:13PM +0800, Chao Gao wrote:
>> To remove a device from a domain, a QMP command is sent to QEMU. However,
>> QEMU handles the removal asynchronously: even after the QMP command reports
>> completion, the actual removal on the QEMU side may happen later.
>> This behavior causes two problems:
>> 1. Re-attaching a device to a domain right after detaching it from that
>> domain fails with the error:
>> 
>> libxl: error: libxl_qmp.c:341:qmp_handle_error_response: Domain 1:received an
>> error message from QMP server: Duplicate ID 'pci-pt-60_00.0' for device
>> 
>> 2. Accesses to PCI configuration space in QEMU may overlap with a later
>> device reset issued by 'xl' or by pciback.
>> 
>> To avoid these problems, wait for the device removal to complete by
>> querying all PCI devices with the QMP command "query-pci" and checking
>> that the target device is no longer listed. Retry at most 5 times, one
>> second apart, so that 'xl' cannot be blocked indefinitely by QEMU.
>> 
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> ---
>> Changes in v2:
>>  - Break out early if an error occurs while querying PCI devices.
>>  - Print a message to warn the admin that device removal may not have
>>    completed on the device model side.
>
>Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
>

Could this patch be merged, or does it need comments from someone else?

Thanks
Chao
