[PATCH 2/5] qemuProcessHandleIOError: Fix logic for lookup of disk

Peter Krempa posted 5 patches 7 months, 2 weeks ago
[PATCH 2/5] qemuProcessHandleIOError: Fix logic for lookup of disk
Posted by Peter Krempa 7 months, 2 weeks ago
There are two bugs in the current disk lookup code in
qemuProcessHandleIOError:

 1) The QOM name isn't matched qemuProcessFindDomainDiskByAliasOrQOM()

   We pass NULL as the second argument, but the diskAlias argument can
   be the QOM path for e.g. virtio disks. This means that the IO error
   event doesn't actually contain the disk information:

     event 'io-error' for domain 'cd':  () report due to other: Input/output error

   rather than:

     event 'io-error' for domain 'cd': /dev/mapper/errdev0 (virtio-disk0) report due to other: Input/output error

 2) nodenames are not preferred

   We now do everything based on node names, which also allow you to
   pinpoint a image within the backing chain. With the current code the
   path would always refer to the top image rather than the actual image
   causing the problem.

This patch fixes both issues by re-ordering the lookup to prefer
nodenames and selecting the image path based on the node name and also
populates the third argument of qemuProcessFindDomainDiskByAliasOrQOM().

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
---
 src/qemu/qemu_process.c | 32 +++++++++++++++++---------------
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 34a755a49a..95d0a40f84 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -833,9 +833,9 @@ qemuProcessHandleIOError(qemuMonitor *mon G_GNUC_UNUSED,
     virObjectEvent *ioErrorEvent = NULL;
     virObjectEvent *ioErrorEvent2 = NULL;
     virObjectEvent *lifecycleEvent = NULL;
-    const char *srcPath;
-    const char *devAlias;
-    virDomainDiskDef *disk;
+    virStorageSource *errsrc = NULL;
+    const char *srcPath = "";
+    const char *devAlias = "";

     virObjectLock(vm);
     priv = QEMU_DOMAIN_PRIVATE(vm);
@@ -843,19 +843,21 @@ qemuProcessHandleIOError(qemuMonitor *mon G_GNUC_UNUSED,
     if (*diskAlias == '\0')
         diskAlias = NULL;

-    if (diskAlias)
-        disk = qemuProcessFindDomainDiskByAliasOrQOM(vm, diskAlias, NULL);
-    else if (nodename)
-        disk = qemuDomainDiskLookupByNodename(vm->def, NULL, nodename, NULL);
-    else
-        disk = NULL;
+    if (nodename) {
+        virDomainDiskDef *disk = qemuDomainDiskLookupByNodename(vm->def, priv->backup, nodename, &errsrc);

-    if (disk) {
-        srcPath = virDomainDiskGetSource(disk);
-        devAlias = disk->info.alias;
-    } else {
-        srcPath = "";
-        devAlias = "";
+        if (errsrc->path)
+            srcPath = errsrc->path;
+
+        if (disk)
+            devAlias = disk->info.alias;
+    } else if (diskAlias) {
+        virDomainDiskDef *disk = qemuProcessFindDomainDiskByAliasOrQOM(vm, diskAlias, diskAlias);
+
+        if (disk) {
+            srcPath = virDomainDiskGetSource(disk);
+            devAlias = disk->info.alias;
+        }
     }

     ioErrorEvent = virDomainEventIOErrorNewFromObj(vm, srcPath, devAlias, action);
-- 
2.48.1
Re: [PATCH 2/5] qemuProcessHandleIOError: Fix logic for lookup of disk
Posted by Peter Krempa 7 months, 2 weeks ago
On Fri, Jan 24, 2025 at 17:33:03 +0100, Peter Krempa wrote:
> There are two bugs in the current disk lookup code in
> qemuProcessHandleIOError:
> 
>  1) The QOM name isn't matched qemuProcessFindDomainDiskByAliasOrQOM()
> 
>    We pass NULL as the second argument, but the diskAlias argument can
>    be the QOM path for e.g. virtio disks. This means that the IO error
>    event doesn't actually contain the disk information:

I've found the root cause for this one to be in QEMU where starting from
qemu-9.2 a 'qom-path' field was introduced for the event in qemu, but it
was plumbed in wrong where the 'device' and 'qom-path' fields were
exchanged.

Based on that I'll slightly modify this patch to also take QOM path if
present and plumb it in properly via an explicit path rather than how
this patch did it.

I'll also post a patch fixing the qemu bug.