[PATCH v2] virtio: add VIRTQUEUE_ERROR QAPI event

Vladimir Sementsov-Ogievskiy posted 1 patch 7 months, 2 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20230912175756.151047-1-vsementsov@yandex-team.ru
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Markus Armbruster <armbru@redhat.com>, "Dr. David Alan Gilbert" <dave@treblig.org>, Eric Blake <eblake@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, "Daniel P. Berrangé" <berrange@redhat.com>, Eduardo Habkost <eduardo@habkost.net>
There is a newer version of this series
hw/virtio/vhost.c | 12 +++++++++---
monitor/monitor.c | 10 ++++++++++
qapi/qdev.json    | 25 +++++++++++++++++++++++++
3 files changed, 44 insertions(+), 3 deletions(-)
[PATCH v2] virtio: add VIRTQUEUE_ERROR QAPI event
Posted by Vladimir Sementsov-Ogievskiy 7 months, 2 weeks ago
For now we only log the vhost device error, when virtqueue is actually
stopped. Let's add a QAPI event, which makes possible:

 - collect statistics of such errors
 - make immediate actions: take core dumps or do some other debugging
 - inform the user through a management API or UI, so that (s)he can
  react somehow, e.g. reset the device driver in the guest or even
  build up some automation to do so

Note that basically every inconsistency discovered during virtqueue
processing results in a silent virtqueue stop.  The guest then just
sees the requests getting stuck somewhere in the device for no visible
reason.  This event provides a means to inform the management layer of
this situation in a timely fashion.

The event could be reused for some other virtqueue problems (not only
for vhost devices) in future. For this it gets a generic name and
structure.

We keep original VHOST_OPS_DEBUG(), to keep original debug output as is
here, it's not the only call to VHOST_OPS_DEBUG in the file.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---

v2: - improve commit message (just stole wording by Roman, hope he don't
                              mind:)
    - add event throttling

 hw/virtio/vhost.c | 12 +++++++++---
 monitor/monitor.c | 10 ++++++++++
 qapi/qdev.json    | 25 +++++++++++++++++++++++++
 3 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index e2f6ffb446..162899feee 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -15,6 +15,7 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
+#include "qapi/qapi-events-qdev.h"
 #include "hw/virtio/vhost.h"
 #include "qemu/atomic.h"
 #include "qemu/range.h"
@@ -1332,11 +1333,16 @@ static void vhost_virtqueue_error_notifier(EventNotifier *n)
     struct vhost_virtqueue *vq = container_of(n, struct vhost_virtqueue,
                                               error_notifier);
     struct vhost_dev *dev = vq->dev;
-    int index = vq - dev->vqs;
 
     if (event_notifier_test_and_clear(n) && dev->vdev) {
-        VHOST_OPS_DEBUG(-EINVAL,  "vhost vring error in virtqueue %d",
-                        dev->vq_index + index);
+        int ind = vq - dev->vqs + dev->vq_index;
+        DeviceState *ds = &dev->vdev->parent_obj;
+
+        VHOST_OPS_DEBUG(-EINVAL,  "vhost vring error in virtqueue %d", ind);
+        qapi_event_send_virtqueue_error(ds->id, ds->canonical_path, ind,
+                                        VIRTQUEUE_ERROR_VHOST_VRING_ERR,
+                                        "vhost reported failure through vring "
+                                        "error fd");
     }
 }
 
diff --git a/monitor/monitor.c b/monitor/monitor.c
index 941f87815a..cb1ee31156 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -313,6 +313,7 @@ static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = {
     [QAPI_EVENT_BALLOON_CHANGE]    = { 1000 * SCALE_MS },
     [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS },
     [QAPI_EVENT_QUORUM_FAILURE]    = { 1000 * SCALE_MS },
+    [QAPI_EVENT_VIRTQUEUE_ERROR]   = { 1000 * SCALE_MS },
     [QAPI_EVENT_VSERPORT_CHANGE]   = { 1000 * SCALE_MS },
     [QAPI_EVENT_MEMORY_DEVICE_SIZE_CHANGE] = { 1000 * SCALE_MS },
 };
@@ -497,6 +498,10 @@ static unsigned int qapi_event_throttle_hash(const void *key)
         hash += g_str_hash(qdict_get_str(evstate->data, "qom-path"));
     }
 
+    if (evstate->event == QAPI_EVENT_VIRTQUEUE_ERROR) {
+        hash += g_str_hash(qdict_get_str(evstate->data, "device"));
+    }
+
     return hash;
 }
 
@@ -524,6 +529,11 @@ static gboolean qapi_event_throttle_equal(const void *a, const void *b)
                        qdict_get_str(evb->data, "qom-path"));
     }
 
+    if (eva->event == QAPI_EVENT_VIRTQUEUE_ERROR) {
+        return !strcmp(qdict_get_str(eva->data, "device"),
+                       qdict_get_str(evb->data, "device"));
+    }
+
     return TRUE;
 }
 
diff --git a/qapi/qdev.json b/qapi/qdev.json
index 6bc5a733b8..199e21cae7 100644
--- a/qapi/qdev.json
+++ b/qapi/qdev.json
@@ -161,3 +161,28 @@
 ##
 { 'event': 'DEVICE_UNPLUG_GUEST_ERROR',
   'data': { '*device': 'str', 'path': 'str' } }
+
+##
+# @VirtqueueError:
+#
+# Since: 8.2
+##
+{ 'enum': 'VirtqueueError',
+  'data': [ 'vhost-vring-err' ] }
+
+##
+# @VIRTQUEUE_ERROR:
+#
+# Emitted when a device virtqueue fails in runtime.
+#
+# @device: the device's ID if it has one
+# @path: the device's QOM path
+# @virtqueue: virtqueue index
+# @error: error identifier
+# @description: human readable description
+#
+# Since: 8.2
+##
+{ 'event': 'VIRTQUEUE_ERROR',
+ 'data': { '*device': 'str', 'path': 'str', 'virtqueue': 'int',
+            'error': 'VirtqueueError', 'description': 'str'} }
-- 
2.34.1
Re: [PATCH v2] virtio: add VIRTQUEUE_ERROR QAPI event
Posted by Markus Armbruster 7 months, 1 week ago
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes:

> For now we only log the vhost device error, when virtqueue is actually
> stopped. Let's add a QAPI event, which makes possible:
>
>  - collect statistics of such errors
>  - make immediate actions: take core dumps or do some other debugging
>  - inform the user through a management API or UI, so that (s)he can
>   react somehow, e.g. reset the device driver in the guest or even
>   build up some automation to do so
>
> Note that basically every inconsistency discovered during virtqueue
> processing results in a silent virtqueue stop.  The guest then just
> sees the requests getting stuck somewhere in the device for no visible
> reason.  This event provides a means to inform the management layer of
> this situation in a timely fashion.
>
> The event could be reused for some other virtqueue problems (not only
> for vhost devices) in future. For this it gets a generic name and
> structure.
>
> We keep original VHOST_OPS_DEBUG(), to keep original debug output as is
> here, it's not the only call to VHOST_OPS_DEBUG in the file.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> ---

[...]

> diff --git a/qapi/qdev.json b/qapi/qdev.json
> index 6bc5a733b8..199e21cae7 100644
> --- a/qapi/qdev.json
> +++ b/qapi/qdev.json
> @@ -161,3 +161,28 @@
>  ##
>  { 'event': 'DEVICE_UNPLUG_GUEST_ERROR',
>    'data': { '*device': 'str', 'path': 'str' } }
> +
> +##
> +# @VirtqueueError:
> +#

Missing member documentation:

    # @vhost-vring-err: Lorem ipsum dolor sit amet, consectetur adipiscing
    #     elit, sed do eiusmod tempor incididunt ut labore et dolore magna
    #     aliqua.

> +# Since: 8.2
> +##
> +{ 'enum': 'VirtqueueError',
> +  'data': [ 'vhost-vring-err' ] }

Let's not abbreviate "error" to "err" here.

> +
> +##
> +# @VIRTQUEUE_ERROR:
> +#
> +# Emitted when a device virtqueue fails in runtime.
> +#
> +# @device: the device's ID if it has one
> +# @path: the device's QOM path
> +# @virtqueue: virtqueue index
> +# @error: error identifier
> +# @description: human readable description
> +#
> +# Since: 8.2
> +##
> +{ 'event': 'VIRTQUEUE_ERROR',
> + 'data': { '*device': 'str', 'path': 'str', 'virtqueue': 'int',
> +            'error': 'VirtqueueError', 'description': 'str'} }
Re: [PATCH v2] virtio: add VIRTQUEUE_ERROR QAPI event
Posted by Denis Plotnikov 7 months, 2 weeks ago
Reviewed-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>

On 9/12/23 20:57, Vladimir Sementsov-Ogievskiy wrote:
> For now we only log the vhost device error, when virtqueue is actually
> stopped. Let's add a QAPI event, which makes possible:
>
>   - collect statistics of such errors
>   - make immediate actions: take core dumps or do some other debugging
>   - inform the user through a management API or UI, so that (s)he can
>    react somehow, e.g. reset the device driver in the guest or even
>    build up some automation to do so
>
> Note that basically every inconsistency discovered during virtqueue
> processing results in a silent virtqueue stop.  The guest then just
> sees the requests getting stuck somewhere in the device for no visible
> reason.  This event provides a means to inform the management layer of
> this situation in a timely fashion.
>
> The event could be reused for some other virtqueue problems (not only
> for vhost devices) in future. For this it gets a generic name and
> structure.
>
> We keep original VHOST_OPS_DEBUG(), to keep original debug output as is
> here, it's not the only call to VHOST_OPS_DEBUG in the file.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> ---
>
> v2: - improve commit message (just stole wording by Roman, hope he don't
>                                mind:)
>      - add event throttling
>
>   hw/virtio/vhost.c | 12 +++++++++---
>   monitor/monitor.c | 10 ++++++++++
>   qapi/qdev.json    | 25 +++++++++++++++++++++++++
>   3 files changed, 44 insertions(+), 3 deletions(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index e2f6ffb446..162899feee 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -15,6 +15,7 @@
>   
>   #include "qemu/osdep.h"
>   #include "qapi/error.h"
> +#include "qapi/qapi-events-qdev.h"
>   #include "hw/virtio/vhost.h"
>   #include "qemu/atomic.h"
>   #include "qemu/range.h"
> @@ -1332,11 +1333,16 @@ static void vhost_virtqueue_error_notifier(EventNotifier *n)
>       struct vhost_virtqueue *vq = container_of(n, struct vhost_virtqueue,
>                                                 error_notifier);
>       struct vhost_dev *dev = vq->dev;
> -    int index = vq - dev->vqs;
>   
>       if (event_notifier_test_and_clear(n) && dev->vdev) {
> -        VHOST_OPS_DEBUG(-EINVAL,  "vhost vring error in virtqueue %d",
> -                        dev->vq_index + index);
> +        int ind = vq - dev->vqs + dev->vq_index;
> +        DeviceState *ds = &dev->vdev->parent_obj;
> +
> +        VHOST_OPS_DEBUG(-EINVAL,  "vhost vring error in virtqueue %d", ind);
> +        qapi_event_send_virtqueue_error(ds->id, ds->canonical_path, ind,
> +                                        VIRTQUEUE_ERROR_VHOST_VRING_ERR,
> +                                        "vhost reported failure through vring "
> +                                        "error fd");
>       }
>   }
>   
> diff --git a/monitor/monitor.c b/monitor/monitor.c
> index 941f87815a..cb1ee31156 100644
> --- a/monitor/monitor.c
> +++ b/monitor/monitor.c
> @@ -313,6 +313,7 @@ static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = {
>       [QAPI_EVENT_BALLOON_CHANGE]    = { 1000 * SCALE_MS },
>       [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS },
>       [QAPI_EVENT_QUORUM_FAILURE]    = { 1000 * SCALE_MS },
> +    [QAPI_EVENT_VIRTQUEUE_ERROR]   = { 1000 * SCALE_MS },
>       [QAPI_EVENT_VSERPORT_CHANGE]   = { 1000 * SCALE_MS },
>       [QAPI_EVENT_MEMORY_DEVICE_SIZE_CHANGE] = { 1000 * SCALE_MS },
>   };
> @@ -497,6 +498,10 @@ static unsigned int qapi_event_throttle_hash(const void *key)
>           hash += g_str_hash(qdict_get_str(evstate->data, "qom-path"));
>       }
>   
> +    if (evstate->event == QAPI_EVENT_VIRTQUEUE_ERROR) {
> +        hash += g_str_hash(qdict_get_str(evstate->data, "device"));
> +    }
> +
>       return hash;
>   }
>   
> @@ -524,6 +529,11 @@ static gboolean qapi_event_throttle_equal(const void *a, const void *b)
>                          qdict_get_str(evb->data, "qom-path"));
>       }
>   
> +    if (eva->event == QAPI_EVENT_VIRTQUEUE_ERROR) {
> +        return !strcmp(qdict_get_str(eva->data, "device"),
> +                       qdict_get_str(evb->data, "device"));
> +    }
> +
>       return TRUE;
>   }
>   
> diff --git a/qapi/qdev.json b/qapi/qdev.json
> index 6bc5a733b8..199e21cae7 100644
> --- a/qapi/qdev.json
> +++ b/qapi/qdev.json
> @@ -161,3 +161,28 @@
>   ##
>   { 'event': 'DEVICE_UNPLUG_GUEST_ERROR',
>     'data': { '*device': 'str', 'path': 'str' } }
> +
> +##
> +# @VirtqueueError:
> +#
> +# Since: 8.2
> +##
> +{ 'enum': 'VirtqueueError',
> +  'data': [ 'vhost-vring-err' ] }
> +
> +##
> +# @VIRTQUEUE_ERROR:
> +#
> +# Emitted when a device virtqueue fails in runtime.
> +#
> +# @device: the device's ID if it has one
> +# @path: the device's QOM path
> +# @virtqueue: virtqueue index
> +# @error: error identifier
> +# @description: human readable description
> +#
> +# Since: 8.2
> +##
> +{ 'event': 'VIRTQUEUE_ERROR',
> + 'data': { '*device': 'str', 'path': 'str', 'virtqueue': 'int',
> +            'error': 'VirtqueueError', 'description': 'str'} }